DeepMD (official website, github, documentation) is a powerful tool for modelling the time-evolution of a chemical system by using molecular dynamics with machine-learning force field.

The theoretical foundation of DeepMD can be found in here: zhang2018deeppotential, wang2018deepmdkit, zhang2020dpgen. We also will give a short introduction in below (Part 0).

DeepMD has many applications, one of the most significnat work is: lu202186pflops, where they have studied the dynamics of over 100 million atoms with ab-initio accuracy (thanks to the machine-learning force field). This is very impressive.

So in this article, I'm going to introduce you how to use DeepMD, we are going to start from installing this package, and end this article with a practical example that you could modify it in your own research. I think it is very important that all the workflows in computational chemistry should be open, accessible to everyone in the field, so that we could improve those workflows together and that will benefit everyone.

Also I'm a firm believer in developing useful tools for the task at hand. If you do not have proper tools, your productivity will be sub-optimial. In my opinion, software development and discovering novel tools (codes, procedures, workflows) are really important in computational chemistry. I believe they are more important than the application to different chemical systems (they are also very important, but I think they can be only seen as the test-ground for our methods, they should not be the main focus of computational chemist).

Now let's move to the juicy part.

Part 0: Understand the theoretical background of DeepMD

Before we try to understand how to use a code, we need to understand how it is constructed, what method did it use and how do those methods work. So in this section, I will introduce the theory behind deepmd. Hope it will give you a firm background of this software.

The goal for DeePMD-kit: (from wang2018deepmdkit):

Build deep learning based representation of potential energy and force field and to perform molecular dynamics.
DeepMD-kit is interfaced with: (1) Tensorflow (to build the machine learning model)  (2) high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, e.g. LAMMPS and the i-PI, respectively.

There are two fundamental properties in molecular dynamics: (1) energy of the system (2) force of the system (based on the energy gradient to the location \(\vec{f}_i=-\partial E/\partial \vec{r}_i\))

In DeepMD, the total energy of the system (\(E_{tot}=E_{tot}(\{\vec{R}_I\}\)) can be written in the summation of the contribution of each atom:

\begin{equation} E_{tot}=\sum_{i=1}^N E_i  \end{equation}

The energy for each atom is determined by its own and its nearest neighbours:

\begin{equation} E_i=E_{s(i)}(\vec{R}_i,\{\vec{R}_j|j\in N_{R_c}(i)\})  \end{equation}

But how do we determine which atoms we should consider as the "nearest neighbour" of i-th atom? By using the cutting off raidus \(\vec{R}_c\). If the distance between i-th atom and the other atom is shorter than \(\vec{R}_c\), then we consider them as "neighbour".

Since the energy depends on the location of atoms, intuitively, we would like to think that if we input all the coordinates of atoms in a system to a deep neural network (DNN), it will be OK. But that's not the case.

In physical system, we have certain symmetry. For example, if I rotate the system, the energy will stay the same. This we call "rotational symmetry". There are other types of symmetries also, namely "translational symmetry", "rotational symmetry", and "permutational symmetry", if we want to create features that can used to model the energy (energy has all three symmetries), we need to create features that also have all symmetries. That's why we need to transform our cartesian coordinates of all atoms to a series of "descriptors", which we will discuss in below.

Atomic environment descriptor

The DeepMD method use "internal coordinates" as a descriptor (or feature) for DNN method. In the following are several steps that they take.

Step 1: Calculate the relative distance between i-th atom and its nearest neighbour

The distance between i-th atom and its nearest neighbour is: \(\vec{R}_{ij}=\vec{R}_i-\vec{R}_j\), it will satisfy the translational symmetry (because no matter what you move the system, the relative distance doesn't change. That's good!)

In conventional cartesian coordinates (named "lab frame" in the paper) (\(\{\vec{e}_x^0, \vec{e}_y^0, \vec{e}_z^0\}\)), the \(\vec{R}_{ij}\) can be written as:


So that we could use (\(x_{ij}^0, y_{ij}^0, z_{ij}^0\)) to represent this vector. We will use this later for preserving rotational symmetry.

Step 2: Perserve the rotational symmetry

The way DeepMD perserve rotational symmetry is by using "local coordination system".  How do we construct a local coordination system? First we need our i-th atom (as the center), then we need two other atoms in order to create the basis set of our coordination system, let's call them \(a(i)\) and \(b(i)\) (as the convention used in wang2018deepmdkit. So we can have two distances: \(\vec{R}_{ia(i)}\) and \(\vec{R}_{ib(i)}\). Now it is very easy to create a coordination system by using two vectors, just use the Gram-Schmidt process in linera algebra, and do not forget to normalize the vector, you are good to go!

According to the discussion above, we can establish the local coordination system with the center at i-th atom:


$$\vec{e}_{i2}=e(\vec{R}_{ib(i)}-(\vec{R}_{ib(i)}\cdot \vec{e}_{i1})\vec{e}_{i1})$$

$$\vec{e}_{i3}=\vec{e}_{i1}\times \vec{e}_{i2}$$

The logic behind above procedure is: First, normalize \(\vec{e}_{i1}\); Then, make \(\vec{R}_{ib(1)}\) orthogonal to \(\vec{R}_{ia(1)}\), also normalize the results; after we have two vectors which are both normalized and perpendicular to each other, then we could use cross product to create the third vector, which direction will be perpendicular to both vectors, and also its magnitude can be calculated by this formula: \(|v_3|=|v_1||v_2|\sin(\theta)=1\times 1\times 1=1\).

The \(e(\vec{R})\) represnts the normalization of vector \(\vec{R}\).

The change between local coordinates and lab coordinates can be done by the following:

$$(x_{ij}, y_{ij}, z_{ij})=(x_{ij}^0, y_{ij}^0, z_{ij}^0)\cdot [\vec{e}_{i1}, \vec{e}_{i2}, \vec{e}_{i3}]$$

In this local coordination system, if we rotate the system, the coordination system will be the same.

Quick exercise: can you prove that mathematically?

Now we have established a local coordination system for i-th atom, then we can use this coordination system to calculate the energy term of i-th atom \(E_i\). Right now all the features have the same symmetry as the energy.

Step 3: Choose the correct complexity

By the end of step 2, we have a local coordination system, now we have two ways to construct the feature:

  1. For i-th atom, and its neighbour, we only consider the distance between two atoms (\(\vec{R}_{ij}\)), where \(|\vec{R}_i-\vec{R}_j|<\vec{R}_c\)
  2. For i-th atom, and its neighbour, not only we consider the distance, we also consider the direction of the bond (namely the coordinates for the bond vector), in this case there will be four numbers for each pair: (\(\vec{R}_{ij}, x_{ij}, y_{ij}, z_{ij}\)).
How to choose the correct complexity?

This is a question that we need to think about.

Chain rules for force/virial computations

Now we have all our features, we can construct the atomic energy by using a deep neural network (DNN), the mathematical description of DNN is really simple, it is just a series of nested linear functions. If we assume that we have \(h\) hidden layer in our DNN, then the energy contribution of i-th atom can be represented as:

$$E_{s(i)}=L_{s(i)}^{out}\circ L_{s(i)}^{N_h}\circ\cdots L_{s(i)}^1(\vec{D}_i)$$

where \(\vec{D}_i\) is the feature vector that we choose.

The function \(L\) is called an "activation function", which can be written as:


This is very easy mathematics, and can be found in any textbooks about machine learning or deep learning.

Once we have the expression for the total energy, we can derived the expression for the force and virial of the system:

Expression for the force and the virial on each atom, from wang2018deepmdkit

Loss function for the DNN

In order to train the DNN, which means we need to find the optimal value for (\(\vec{W}_{s(i)}^p, b_{s(i)}^p\)). In order to do that we need a loss function that we can minimize in order to track our progress of the training.

The loss function proposed in the paper is:

Where \(\Delta E\), \(\Delta \vec{F}_i\), \(\Delta\Sigma\) denotes the root mean square (RMS) error in energy, force and virial respectively.

In DeepMD, the values for \(p_{\epsilon}\), \(p_{f}\) and \(p_{\xi}\) are free to change during the optimization process in order to achieve better accuracy. The choice given by the paper is:

The choice for the prefactors.

I think the loss function that the paper given is just one possibility, we can have other loss functions which could behave similarly (or even better). This is still an open question.

Part 1: Install the package

There are different ways for installing the deepmd package, each for different user cases.

The installation guide can be seen in here:

But if you want to use CUDA 11.0 version, then you need to download this version:

Upload to your supercomputer, and just run this script, it will install automatically. If you have encounter some problems, please contact me at I'll be very happy to answer your questions. Or you could report an issue on DeePMD-kit github:

A deep learning package for many-body potential energy representation and molecular dynamics - deepmodeling/deepmd-kit

Part 2: Use DeePMD-kit

Structure of DeePMD-kit

DeePMD-kit contians 3 parts:

  1. A library contains the implementation of the computation of descriptors, forces, and virial. Written in C++; including the interface to TensorFlow and third-party MD packages.
  2. A training and testing program builts on TensorFlow's python API
  3. supports for molecular dynamics code (e.g. LAMMPS & i-PI) – basically how to use the machine learning force field (MLFF) in those softwares

Workflow in using DeePMD-kit

There are four steps in using DeePMD-kit:

  1. Preparing data
  2. Training the model
  3. Testing the model
  4. Running classical / path-integral MD simulations with the mode (MLFF)

Step 1 - Data preparation

The data used in DeePMD-kit is composed as a list of systems. Each system contains lots of frames. Some of the frames are used as training data, others are used as validating or testing data. For each frame, the software will need the information below (each one related to one raw file): (1) size of your simulation box (box.raw) (2) cartesian coordinates of each atom in the system (coord.raw) (3) types of atoms (type.raw) (4) energy (energy.raw) (5) force (force.raw) (6) virial (virial.raw).

Besides type.raw, other files are constructed frame by frame, which means each line represent a frame in the MD simulation.

It is very important to illustarte the unit of the physical variables. For length it is A, for energy is eV, for force is eV/A, respectively.

Once you have all the raw files, you can use the following line to convert them into training sets, validation sets and testing sets:

$deepmd_source_dir/data/raw/ 2000

Where 2000 means that we want to split the frames by the factor of 2000, but this number can be adjusted accordingly.

In the end you will have a series of set files, then you could use those set files in your training process, which we will discuss later.

Step 2 - Training the model

In order to train the model, first you need to construct the model. According to the theoretical background mentioned above, there are two steps in building the DNN model: (1) transform all the cartesian coordinates to the descriptors (local coordination system) (2) build the DNN model from those descriptors. The code for doing that is called dp_train, and it needs an input file, which is a json file, an example is given in below:

This example is copied from deepmd github repository: link.

The name of the json file can be given by the user, but the execution command is:

dp train your_name.json

Insider your_name.json, there are many important parameters that determines how you want to train your model.

An example (water_se_a.json)

"model": {
    "type_map":	["O", "H"],
    "descriptor" :{
        "type":		"se_a",
        "rcut_smth":	5.80,
        "rcut":		6.00,
        "sel":		[46, 92],
        "neuron":		[25, 50, 100],
        "axis_neuron":	16,
        "resnet_dt":	false,
        "seed":		1,
        "_comment":		" that's all"
    "fitting_net" : {
        "neuron":		[240, 240, 240],
        "resnet_dt":	true,	    
        "seed":		1,
        "_comment":		" that's all"
    "_comment":	" that's all"

"learning_rate" :{
    "type":		"exp",
    "start_lr":	0.005,
    "decay_steps":	5000,
    "decay_rate":	0.95,
    "_comment":	"that's all"

"loss" : {
    "start_pref_e":	0.02,
    "limit_pref_e":	1,
    "start_pref_f":	1000,
    "limit_pref_f":	1,
    "start_pref_v":	0,
    "limit_pref_v":	0,
    "_comment":	" that's all"

"training" : {
    "systems":	["../data1/", "../data2/"],
    "set_prefix":	"set",    
    "stop_batch":	1000000,
    "_comment": " batch_size can be supplied with, e.g. 1, or auto (string) or [10, 20]",
    "batch_size":	1,

    "seed":		1,

    "_comment": " display and restart",
    "_comment": " frequencies counted in batch",
    "disp_file":	"lcurve.out",
    "disp_freq":	100,
    "_comment": " numb_test can be supplied with, e.g. 1, or XX% (string) or [10, 20]",
    "numb_test":	10,
    "save_freq":	1000,
    "save_ckpt":	"model.ckpt",
    "load_ckpt":	"model.ckpt",
    "profiling":	false,
    "_comment":	"that's all"

When learning any codes, I always ask myself a question: "Can I list all the variables that I need to determine before looking into the documentation of the input file of the code?" If I can do that, then I have understood the structure of the code, if not, then I will go back to the fundamental, and trying to understand it better. We are going to do just that in the following, once you understand why these parameters exist, you will have deeper understanding into the DeePMD-kit package.

In DeePMD-kit, we need to consider: (1) system (2) DNN. And in DNN, we need to consider: (1) architecture of the DNN (2) how to train it (loss function, learning rate etc.). Based on this simple logic, we can understand the above script clearer. For more detailed information about the parameters, you can check this website:

It is very important to note that in DeePMD-kit, they use TensorFlow checkpoints to store the intermediate results of your training, so you do not have to restart the training from scratch (if your calculation breaks due to time limit on your supercomputer, for example).

The parameters for saving the training results are:

"save_freq":	1000,
"save_ckpt":	"model.ckpt",
"load_ckpt":	"model.ckpt",

which means we will save the training results after 1000 epochs.

And if we want to restart our training, we could use dp train --restart (checkpoint)

Once the model has been trained (reaches the convergence), then we can freeze the model by using the following command:

dp freeze -o graph.pb

Step 3 - Test your model

Testing the accuracy of your model in DeePMD-kit is really easy, just using this command:

dp test -m graph.pb -s /path/to/system -n 30

where -m gives the model, -s gives the path to the tested systems, -n is the number of frames that we used as test.

Once your model has been tested, it can be used as the foundation of your molecular dynamics simulation, we will talk about that later.

Step 4 - Applications


In ASE, we can use DeePMD-kit as a calculator for the potential energy, also the preliminary geometric optimization. The example code is shown in below:

from ase import Atoms
from deepmd.calculator import DP

water = Atoms('H2O',
              positions=[(0.7601, 1.9270, 1),
                         (1.9575, 1, 1),
                         (1., 1., 1.)],
              cell=[100, 100, 100],
from ase.optimize import BFGS
dyn = BFGS(water)

I think it will be extremely helpful in the initialization of our structures for the catalytic systems.

This is a really long article, which contains almost everything you need to know about DeePMD-kit if you are a complete beginner like me. If you like this article, please subscribe for more like this in the future.

Take care, be happy! I'll see you in the next post.