File handling

Let’s see how to set up your trajectory to be handled with SOAPify.

Getting the example trajectory

Assuming that we are on linux, let’s download the small example lammps trajectory (it is a 55 gold atoms nanoparticle NVT at few temperatures):

wget https://github.com/GMPavanLab/SOAPify/releases/download/0.1.0rc0/SmallExample.zip

and unzip it

unzip SmallExample.zip

now we should have some files in the SmallExample directory

[1]:
%ls SmallExample
ih55.data  ih55-T_100.lammpsdump  ih55-T_200.lammpsdump
[2]:
import SOAPify
import SOAPify.HDF5er as HDF5er
from MDAnalysis import Universe
import h5py

exampleHDF5 = "ih55.hdf5"
exampleSOAPHDF5 = "ih55soap.hdf5"

The base file

Let’s create the base .hdf5 file with the trajectory

The next cell is equivalent to the cli commands:

SOAPify-prepareTrajectory ./SmallExample/ih55.data ih55.hdf5 \
    -n SmallExample_100 \
    -a "pair_style" "smatb/single" \
    -a "pair_coeff" "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666" \
    -a T 100 \
    -u atom_style "id type x y z" \
    -t ./SmallExample/ih55-T_100.lammpsdump \
    --types Au

SOAPify-prepareTrajectory ./SmallExample/ih55.data ih55.hdf5 \
    -n SmallExample_200 \
    -a "pair_style" "smatb/single" \
    -a "pair_coeff" "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666" \
    -a T 200 \
    -u atom_style "id type x y z" \
    -t ./SmallExample/ih55-T_200.lammpsdump \
    --types Au

the -a are extra attributes that are saved in the hdf5 file and can be useful for storing data about the simulation

the -u are extra options to pass to the MDAnalysis.Universe

[3]:
def createTrajFile(
    trajname: str,
    topologyFile: str,
    trajectories: "list[str]",
    outFile: str,
    extraAttrs=None,
):
    u = Universe(topologyFile, *trajectories, atom_style="id type x y z")
    u.atoms.types = ["Au"] * len(u.atoms)

    HDF5er.MDA2HDF5(u, outFile, trajname, trajChunkSize=1000, attrs=extraAttrs)


extraAttrs = {
    "ts": "5fs",
    "pair_style": "smatb/single",
    "pair_coeff": "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666",
}
for T in [100, 200]:
    createTrajFile(
        trajname=f"SmallExample_{T}",
        topologyFile="./SmallExample/ih55.data",
        trajectories=[f"./SmallExample/ih55-T_{T}.lammpsdump"],
        outFile=exampleHDF5,
        extraAttrs=extraAttrs,
    )

[0:1000] 1000 1000 chunk of 9016 B
[1000:2000] 1000 1000 chunk of 9016 B
[2000:3000] 1000 1000 chunk of 9016 B
[3000:4000] 1000 1000 chunk of 9016 B
[4000:5000] 1000 1000 chunk of 9016 B
[5000:6000] 1000 1000 chunk of 9016 B
[6000:7000] 1000 1000 chunk of 9016 B
[7000:8000] 1000 1000 chunk of 9016 B
[8000:9000] 1000 1000 chunk of 9016 B
[9000:10000] 1000 1000 chunk of 9016 B
[10000:11000] 1000 1000 chunk of 9016 B
[11000:12000] 1000 1000 chunk of 9016 B
[12000:13000] 1000 1000 chunk of 9016 B
[13000:14000] 1000 1000 chunk of 9016 B
[14000:15000] 1000 1000 chunk of 9016 B
[15000:16000] 1000 1000 chunk of 9016 B
[16000:17000] 1000 1000 chunk of 9016 B
[17000:18000] 1000 1000 chunk of 9016 B
[18000:19000] 1000 1000 chunk of 9016 B
[19000:20000] 1000 1000 chunk of 9016 B
[0:1000] 1000 1000 chunk of 9016 B
[1000:2000] 1000 1000 chunk of 9016 B
[2000:3000] 1000 1000 chunk of 9016 B
[3000:4000] 1000 1000 chunk of 9016 B
[4000:5000] 1000 1000 chunk of 9016 B
[5000:6000] 1000 1000 chunk of 9016 B
[6000:7000] 1000 1000 chunk of 9016 B
[7000:8000] 1000 1000 chunk of 9016 B
[8000:9000] 1000 1000 chunk of 9016 B
[9000:10000] 1000 1000 chunk of 9016 B
[10000:11000] 1000 1000 chunk of 9016 B
[11000:12000] 1000 1000 chunk of 9016 B
[12000:13000] 1000 1000 chunk of 9016 B
[13000:14000] 1000 1000 chunk of 9016 B
[14000:15000] 1000 1000 chunk of 9016 B
[15000:16000] 1000 1000 chunk of 9016 B
[16000:17000] 1000 1000 chunk of 9016 B
[17000:18000] 1000 1000 chunk of 9016 B
[18000:19000] 1000 1000 chunk of 9016 B
[19000:20000] 1000 1000 chunk of 9016 B

The attributes are then accessible, and can be used to our advantage (or to reproduce the simulations)

[4]:
with h5py.File(exampleHDF5, "r") as workFile:
    trjContainers=workFile['Trajectories']
    for name, trjGroup in trjContainers.items():
        print(f"Trajectory group name: {name}")
        print("Attributes:")
        for attname, attval in trjGroup.attrs.items():
            print(f'\t{attname}: "{attval}"')
Trajectory group name: SmallExample_100
Attributes:
        pair_coeff: "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666"
        pair_style: "smatb/single"
        ts: "5fs"
Trajectory group name: SmallExample_200
Attributes:
        pair_coeff: "1 1 2.88 10.35 4.178 0.210 1.818 4.07293506 4.9883063257983666"
        pair_style: "smatb/single"
        ts: "5fs"

Applying SOAP

We will deactivate the verbosity within this jupyter.

Then let’s calculate the SOAP fingerprints using dscribe to the all of the trajectories in the file.

Here we changed the lMax between the two iterations The next cell is equivalent to the command:

SOAPify-traj2SOAP ih55.hdf5 \
    -s ih55soap.hdf5 -g SOAP4_4_4 -l 4 -n 4 -r 4.48023312 -j 16

followed by

SOAPify-traj2SOAP ih55.hdf5 \
    -s ih55soap.hdf5 -g SOAP6_4_4 -l 6 -n 4 -r 4.48023312 -j 16
[5]:
def worker(trajFileName: str, soapFileName: str, soapGroup, **kwargs) -> None:
    with h5py.File(trajFileName, "r") as workFile, h5py.File(
        soapFileName, "a"
    ) as soapFile:
        SOAPify.saponifyMultipleTrajectories(
            trajContainers=workFile["Trajectories"],
            SOAPoutContainers=soapFile.require_group(soapGroup),
            SOAPOutputChunkDim=1000,
            verbose=False,
            **kwargs,
        )


for l in [4, 6]:
    worker(
        trajFileName=exampleHDF5,
        soapFileName=exampleSOAPHDF5,
        soapGroup=f"SOAP{l}_4_4 ",
        SOAPnJobs=16,
        SOAPrcut=4.48023312,
        SOAPnmax=4,
        SOAPlmax=l,
    )

using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_100"
extra SOAP arguments: None
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_100"
extra SOAP arguments: None
using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None

The information about the SOAP calculation are stored in the attributes of the SOAP fingerprint datasets:

[6]:
with h5py.File(exampleSOAPHDF5, "r") as workFile:
    for name, trjGroup in workFile.items():
        print(f"SOAP group name: \"{name}\"")
        print("Attributes:")
        for dsname, trjDS in trjGroup.items():
            print(f"\tSOAP dataset: {dsname}, shape {trjDS.shape}")
            for attname, attval in trjDS.attrs.items():
                print(f'\t\t{attname}: "{attval}"')
SOAP group name: "SOAP4_4_4 "
Attributes:
        SOAP dataset: SmallExample_100, shape (20000, 55, 50)
                SOAPengine: "dscribe"
                l_max: "4"
                n_max: "4"
                r_cut: "4.48023312"
                species: "['Au']"
                species_location_Au-Au: "[ 0 50]"
        SOAP dataset: SmallExample_200, shape (20000, 55, 50)
                SOAPengine: "dscribe"
                l_max: "4"
                n_max: "4"
                r_cut: "4.48023312"
                species: "['Au']"
                species_location_Au-Au: "[ 0 50]"
SOAP group name: "SOAP6_4_4 "
Attributes:
        SOAP dataset: SmallExample_100, shape (20000, 55, 70)
                SOAPengine: "dscribe"
                l_max: "6"
                n_max: "4"
                r_cut: "4.48023312"
                species: "['Au']"
                species_location_Au-Au: "[ 0 70]"
        SOAP dataset: SmallExample_200, shape (20000, 55, 70)
                SOAPengine: "dscribe"
                l_max: "6"
                n_max: "4"
                r_cut: "4.48023312"
                species: "['Au']"
                species_location_Au-Au: "[ 0 70]"

Selecting a trajectory

You can calculate SOAP only for a single trajectory: This is equivalent to

SOAPify-traj2SOAP ih55.hdf5 \
    -s ih55soap.hdf5 -g SOAP6_6_4 -l 6 -n 6 -r 4.48023312 -j 16 \
    -t /Trajectories/SmallExample_200
[7]:
with h5py.File(exampleHDF5, "r") as workFile, h5py.File(
    exampleSOAPHDF5, "a"
) as soapFile:
    SOAPify.saponifyTrajectory(
        trajContainer=workFile["/Trajectories/SmallExample_200"],
        SOAPoutContainer=soapFile.require_group("SOAP_6_6_4"),
        SOAPOutputChunkDim=1000,
        SOAPnJobs=16,
        SOAPrcut=4.48023312,
        SOAPnmax=6,
        SOAPlmax=6,
        verbose=False,
    )

using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None

Fingerprints for a subsystem

You can also calculate the soap fingerprints of a subgroup of atoms.

Here, for example we will calculate the SOAP fingerprints of only the 0th and the 15th atoms

There is (for now) no equivalent cli command.

[8]:
with h5py.File(exampleHDF5, "r") as workFile, h5py.File(
    exampleSOAPHDF5, "a"
) as soapFile:
    SOAPify.saponifyTrajectory(
        trajContainer=workFile["/Trajectories/SmallExample_200"],
        SOAPoutContainer=soapFile.require_group("SOAP_4_4_4_FEW"),
        centersMask=[0, 15],
        SOAPOutputChunkDim=1000,
        SOAPnJobs=16,
        SOAPrcut=4.48023312,
        SOAPnmax=4,
        SOAPlmax=4,
        verbose=False,
    )

using "dscribe" to calculate SOAP for "/Trajectories/SmallExample_200"
extra SOAP arguments: None

Note the the new attribute centersIndexes and the different shape of the dataset that reflects that SOAP fingerprints have been calculated only for atom 0 and 15:

[9]:
with h5py.File(exampleSOAPHDF5, "r") as workFile:
    name = "SOAP_4_4_4_FEW"
    trjGroup = workFile[name]
    print(f'SOAP group name: "{name}"')
    print("Attributes:")
    for dsname, trjDS in trjGroup.items():
        print(f"\tSOAP dataset: {dsname}, shape {trjDS.shape}")
        for attname, attval in trjDS.attrs.items():
            print(f'\t\t{attname}: "{attval}"')

SOAP group name: "SOAP_4_4_4_FEW"
Attributes:
        SOAP dataset: SmallExample_200, shape (20000, 2, 50)
                SOAPengine: "dscribe"
                centersIndexes: "[ 0 15]"
                l_max: "4"
                n_max: "4"
                r_cut: "4.48023312"
                species: "['Au']"
                species_location_Au-Au: "[ 0 50]"

You can call getSOAPSettings on a SOAP dataset to get the necessary data to ‘fill’ the vector (aka restore the repetition in the data that have been removed to save space) by simpy passing the returned dictionary to fillSOAPVectorFromdscribe

[10]:
with h5py.File(exampleSOAPHDF5, "r") as workFile:
    for name, trjGroup in workFile.items():
        print(f"SOAP group name: \"{name}\"")
        for dsname, trjDS in trjGroup.items():
            print(f"\tSOAP dataset: {dsname}, shape {trjDS.shape}:")
            fillInfo=SOAPify.getSOAPSettings(trjDS)
            print(f"\t{fillInfo}")
            example= SOAPify.fillSOAPVectorFromdscribe(trjDS[:],**fillInfo)
            print(f"\tFilled shape : {example.shape}")
SOAP group name: "SOAP4_4_4 "
        SOAP dataset: SmallExample_100, shape (20000, 55, 50):
        {'nMax': 4, 'lMax': 4, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 50, None)}}
        Filled shape : (20000, 55, 80)
        SOAP dataset: SmallExample_200, shape (20000, 55, 50):
        {'nMax': 4, 'lMax': 4, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 50, None)}}
        Filled shape : (20000, 55, 80)
SOAP group name: "SOAP6_4_4 "
        SOAP dataset: SmallExample_100, shape (20000, 55, 70):
        {'nMax': 4, 'lMax': 6, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 70, None)}}
        Filled shape : (20000, 55, 112)
        SOAP dataset: SmallExample_200, shape (20000, 55, 70):
        {'nMax': 4, 'lMax': 6, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 70, None)}}
        Filled shape : (20000, 55, 112)
SOAP group name: "SOAP_4_4_4_FEW"
        SOAP dataset: SmallExample_200, shape (20000, 2, 50):
        {'nMax': 4, 'lMax': 4, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 50, None)}}
        Filled shape : (20000, 2, 80)
SOAP group name: "SOAP_6_6_4"
        SOAP dataset: SmallExample_200, shape (20000, 55, 147):
        {'nMax': 6, 'lMax': 6, 'atomTypes': array(['Au'], dtype=object), 'atomicSlices': {'AuAu': slice(0, 147, None)}}
        Filled shape : (20000, 55, 252)

About the command line interface

You can get the options for the command line interface(cli) interfaces with

SOAPify-prepareTrajectory --help

and

SOAPify-traj2SOAP --help