File handling
Let’s see how to set up your trajectory to be handled with SOAPify.
Getting the example trajectory
Assuming that we are on linux, let’s download the small example lammps trajectory (it is a 55 gold atoms nanoparticle NVT at few temperatures):
wget https://github.com/GMPavanLab/SOAPify/releases/download/0.1.0rc0/SmallExample.zip
and unzip it
unzip SmallExample.zip
now we should have some files in the SmallExample
directory
[1]:
%ls SmallExample
ih55.data ih55-T_100.lammpsdump ih55-T_200.lammpsdump
[2]:
import SOAPify
import SOAPify.HDF5er as HDF5er
from MDAnalysis import Universe
import h5py
exampleHDF5 = "ih55.hdf5"
exampleSOAPHDF5 = "ih55soap.hdf5"
The base file
Let’s create the base .hdf5 file with the trajectory
The next cell is equivalent to the cli commands:
SOAPify-prepareTrajectory ./SmallExample/ih55.data ih55.hdf5 \
-n SmallExample_100 \
-a "pair_style" "smatb/single" \
-a "pair_coeff" "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666" \
-a T 100 \
-u atom_style "id type x y z" \
-t ./SmallExample/ih55-T_100.lammpsdump \
--types Au
SOAPify-prepareTrajectory ./SmallExample/ih55.data ih55.hdf5 \
-n SmallExample_200 \
-a "pair_style" "smatb/single" \
-a "pair_coeff" "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666" \
-a T 200 \
-u atom_style "id type x y z" \
-t ./SmallExample/ih55-T_200.lammpsdump \
--types Au
the -a
are extra attributes that are saved in the hdf5 file and can be useful for storing data about the simulation
the -u
are extra options to pass to the MDAnalysis.Universe
[3]:
def createTrajFile(
trajname: str,
topologyFile: str,
trajectories: "list[str]",
outFile: str,
extraAttrs=None,
):
u = Universe(topologyFile, *trajectories, atom_style="id type x y z")
u.atoms.types = ["Au"] * len(u.atoms)
HDF5er.MDA2HDF5(u, outFile, trajname, trajChunkSize=1000, attrs=extraAttrs)
extraAttrs = {
"ts": "5fs",
"pair_style": "smatb/single",
"pair_coeff": "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666",
}
for T in [100, 200]:
createTrajFile(
trajname=f"SmallExample_{T}",
topologyFile="./SmallExample/ih55.data",
trajectories=[f"./SmallExample/ih55-T_{T}.lammpsdump"],
outFile=exampleHDF5,
extraAttrs=extraAttrs,
)
[0:1000] 1000 1000 chunk of 9016 B
[1000:2000] 1000 1000 chunk of 9016 B
[2000:3000] 1000 1000 chunk of 9016 B
[3000:4000] 1000 1000 chunk of 9016 B
[4000:5000] 1000 1000 chunk of 9016 B
[5000:6000] 1000 1000 chunk of 9016 B
[6000:7000] 1000 1000 chunk of 9016 B
[7000:8000] 1000 1000 chunk of 9016 B
[8000:9000] 1000 1000 chunk of 9016 B
[9000:10000] 1000 1000 chunk of 9016 B
[10000:11000] 1000 1000 chunk of 9016 B
[11000:12000] 1000 1000 chunk of 9016 B
[12000:13000] 1000 1000 chunk of 9016 B
[13000:14000] 1000 1000 chunk of 9016 B
[14000:15000] 1000 1000 chunk of 9016 B
[15000:16000] 1000 1000 chunk of 9016 B
[16000:17000] 1000 1000 chunk of 9016 B
[17000:18000] 1000 1000 chunk of 9016 B
[18000:19000] 1000 1000 chunk of 9016 B
[19000:20000] 1000 1000 chunk of 9016 B
[0:1000] 1000 1000 chunk of 9016 B
[1000:2000] 1000 1000 chunk of 9016 B
[2000:3000] 1000 1000 chunk of 9016 B
[3000:4000] 1000 1000 chunk of 9016 B
[4000:5000] 1000 1000 chunk of 9016 B
[5000:6000] 1000 1000 chunk of 9016 B
[6000:7000] 1000 1000 chunk of 9016 B
[7000:8000] 1000 1000 chunk of 9016 B
[8000:9000] 1000 1000 chunk of 9016 B
[9000:10000] 1000 1000 chunk of 9016 B
[10000:11000] 1000 1000 chunk of 9016 B
[11000:12000] 1000 1000 chunk of 9016 B
[12000:13000] 1000 1000 chunk of 9016 B
[13000:14000] 1000 1000 chunk of 9016 B
[14000:15000] 1000 1000 chunk of 9016 B
[15000:16000] 1000 1000 chunk of 9016 B
[16000:17000] 1000 1000 chunk of 9016 B
[17000:18000] 1000 1000 chunk of 9016 B
[18000:19000] 1000 1000 chunk of 9016 B
[19000:20000] 1000 1000 chunk of 9016 B
The attributes are then accessible, and can be used to our advantage (or to reproduce the simulations)
[4]:
with h5py.File(exampleHDF5, "r") as workFile:
trjContainers=workFile['Trajectories']
for name, trjGroup in trjContainers.items():
print(f"Trajectory group name: {name}")
print("Attributes:")
for attname, attval in trjGroup.attrs.items():
print(f'\t{attname}: "{attval}"')
Trajectory group name: SmallExample_100
Attributes:
pair_coeff: "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666"
pair_style: "smatb/single"
ts: "5fs"
Trajectory group name: SmallExample_200
Attributes:
pair_coeff: "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666"
pair_style: "smatb/single"
ts: "5fs"
Applying SOAP
We will deactivate the verbosity within this jupyter.
Then let’s calculate the SOAP fingerprints using dscribe to the all of the trajectories in the file.
Here we changed the lMax between the two iterations The next cell is equivalent to the command:
SOAPify-traj2SOAP ih55.hdf5 \
-s ih55soap.hdf5 -g SOAP4_4_4 -l 4 -n 4 -r 4.48023312 -j 16
followed by
SOAPify-traj2SOAP ih55.hdf5 \
-s ih55soap.hdf5 -g SOAP6_4_4 -l 6 -n 4 -r 4.48023312 -j 16
[5]:
def worker(trajFileName: str, soapFileName: str, soapGroup, **kwargs) -> None:
with h5py.File(trajFileName, "r") as workFile, h5py.File(
soapFileName, "a"
) as soapFile:
SOAPify.saponifyMultipleTrajectories(
trajContainers=workFile["Trajectories"],
SOAPoutContainers=soapFile.require_group(soapGroup),
SOAPOutputChunkDim=1000,
verbose=False,
**kwargs,
)
for l in [4, 6]:
worker(
trajFileName=exampleHDF5,
soapFileName=exampleSOAPHDF5,
soapGroup=f"SOAP{l}_4_4 ",
SOAPnJobs=16,
SOAPrcut=4.48023312,
SOAPnmax=4,
SOAPlmax=l,
)
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_100"
extra SOAP arguments: None
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_100"
extra SOAP arguments: None
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None
The information about the SOAP calculation are stored in the attributes of the SOAP fingerprint datasets:
[6]:
with h5py.File(exampleSOAPHDF5, "r") as workFile:
for name, trjGroup in workFile.items():
print(f"SOAP group name: \"{name}\"")
print("Attributes:")
for dsname, trjDS in trjGroup.items():
print(f"\tSOAP dataset: {dsname}, shape {trjDS.shape}")
for attname, attval in trjDS.attrs.items():
print(f'\t\t{attname}: "{attval}"')
SOAP group name: "SOAP4_4_4 "
Attributes:
SOAP dataset: SmallExample_100, shape (20000, 55, 50)
SOAPengine: "dscribe"
l_max: "4"
n_max: "4"
r_cut: "4.48023312"
species: "['Au']"
species_location_Au-Au: "[ 0 50]"
SOAP dataset: SmallExample_200, shape (20000, 55, 50)
SOAPengine: "dscribe"
l_max: "4"
n_max: "4"
r_cut: "4.48023312"
species: "['Au']"
species_location_Au-Au: "[ 0 50]"
SOAP group name: "SOAP6_4_4 "
Attributes:
SOAP dataset: SmallExample_100, shape (20000, 55, 70)
SOAPengine: "dscribe"
l_max: "6"
n_max: "4"
r_cut: "4.48023312"
species: "['Au']"
species_location_Au-Au: "[ 0 70]"
SOAP dataset: SmallExample_200, shape (20000, 55, 70)
SOAPengine: "dscribe"
l_max: "6"
n_max: "4"
r_cut: "4.48023312"
species: "['Au']"
species_location_Au-Au: "[ 0 70]"
Selecting a trajectory
You can calculate SOAP only for a single trajectory: This is equivalent to
SOAPify-traj2SOAP ih55.hdf5 \
-s ih55soap.hdf5 -g SOAP6_6_4 -l 6 -n 6 -r 4.48023312 -j 16 \
-t /Trajectories/SmallExample_200
[7]:
with h5py.File(exampleHDF5, "r") as workFile, h5py.File(
exampleSOAPHDF5, "a"
) as soapFile:
SOAPify.saponifyTrajectory(
trajContainer=workFile["/Trajectories/SmallExample_200"],
SOAPoutContainer=soapFile.require_group("SOAP_6_6_4"),
SOAPOutputChunkDim=1000,
SOAPnJobs=16,
SOAPrcut=4.48023312,
SOAPnmax=6,
SOAPlmax=6,
verbose=False,
)
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None
Fingerprints for a subsystem
You can also calculate the soap fingerprints of a subgroup of atoms.
Here, for example we will calculate the SOAP fingerprints of only the 0th and the 15th atoms
There is (for now) no equivalent cli command.
[8]:
with h5py.File(exampleHDF5, "r") as workFile, h5py.File(
exampleSOAPHDF5, "a"
) as soapFile:
SOAPify.saponifyTrajectory(
trajContainer=workFile["/Trajectories/SmallExample_200"],
SOAPoutContainer=soapFile.require_group("SOAP_4_4_4_FEW"),
centersMask=[0, 15],
SOAPOutputChunkDim=1000,
SOAPnJobs=16,
SOAPrcut=4.48023312,
SOAPnmax=4,
SOAPlmax=4,
verbose=False,
)
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None
Note the the new attribute centersIndexes
and the different shape of the dataset that reflects that SOAP fingerprints have been calculated only for atom 0 and 15:
[9]:
with h5py.File(exampleSOAPHDF5, "r") as workFile:
name = "SOAP_4_4_4_FEW"
trjGroup = workFile[name]
print(f'SOAP group name: "{name}"')
print("Attributes:")
for dsname, trjDS in trjGroup.items():
print(f"\tSOAP dataset: {dsname}, shape {trjDS.shape}")
for attname, attval in trjDS.attrs.items():
print(f'\t\t{attname}: "{attval}"')
SOAP group name: "SOAP_4_4_4_FEW"
Attributes:
SOAP dataset: SmallExample_200, shape (20000, 2, 50)
SOAPengine: "dscribe"
centersIndexes: "[ 0 15]"
l_max: "4"
n_max: "4"
r_cut: "4.48023312"
species: "['Au']"
species_location_Au-Au: "[ 0 50]"
You can call getSOAPSettings
on a SOAP dataset to get the necessary data to ‘fill’ the vector (aka restore the repetition in the data that have been removed to save space) by simpy passing the returned dictionary to fillSOAPVectorFromdscribe
[10]:
with h5py.File(exampleSOAPHDF5, "r") as workFile:
for name, trjGroup in workFile.items():
print(f"SOAP group name: \"{name}\"")
for dsname, trjDS in trjGroup.items():
print(f"\tSOAP dataset: {dsname}, shape {trjDS.shape}:")
fillInfo=SOAPify.getSOAPSettings(trjDS)
print(f"\t{fillInfo}")
example= SOAPify.fillSOAPVectorFromdscribe(trjDS[:],**fillInfo)
print(f"\tFilled shape : {example.shape}")
SOAP group name: "SOAP4_4_4 "
SOAP dataset: SmallExample_100, shape (20000, 55, 50):
{'nMax': 4, 'lMax': 4, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 50, None)}}
Filled shape : (20000, 55, 80)
SOAP dataset: SmallExample_200, shape (20000, 55, 50):
{'nMax': 4, 'lMax': 4, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 50, None)}}
Filled shape : (20000, 55, 80)
SOAP group name: "SOAP6_4_4 "
SOAP dataset: SmallExample_100, shape (20000, 55, 70):
{'nMax': 4, 'lMax': 6, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 70, None)}}
Filled shape : (20000, 55, 112)
SOAP dataset: SmallExample_200, shape (20000, 55, 70):
{'nMax': 4, 'lMax': 6, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 70, None)}}
Filled shape : (20000, 55, 112)
SOAP group name: "SOAP_4_4_4_FEW"
SOAP dataset: SmallExample_200, shape (20000, 2, 50):
{'nMax': 4, 'lMax': 4, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 50, None)}}
Filled shape : (20000, 2, 80)
SOAP group name: "SOAP_6_6_4"
SOAP dataset: SmallExample_200, shape (20000, 55, 147):
{'nMax': 6, 'lMax': 6, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 147, None)}}
Filled shape : (20000, 55, 252)
About the command line interface
You can get the options for the command line interface(cli) interfaces with
SOAPify-prepareTrajectory --help
and
SOAPify-traj2SOAP --help