The Basics
This tutorial will introduce the basic functionality and usage of MDAnalysis. It will focus on analyzing a Lithium-ion battery electrolyte composed of 49 Li+, 49 PF6-, 363 Butyro Nitride, and 237 Ethylene Carbonate. This tutorial will assume a basic familiarity with the MDAnalysis core data structures.
Creating a Solute
First, we import solvation_analysis and MDA and instantiate a universe object.
[19]:
# imports
import MDAnalysis as mda
import solvation_analysis
# we can use a test trajectory supplied with the package
from solvation_analysis.tests import datafiles
# instantiate Universe
u = mda.Universe(datafiles.bn_fec_data, datafiles.bn_fec_dcd_wrap)
Second, we need to select the solute and solvents AtomGroups that compose the solution. Here, I am selecting the relevant AtomGroups using Atom types, in general, you can select the AtomGroups any way you like. Generally, solvation_analysis will work best when 1) the solvents are disjoint sets and 2) the set of solute and solvents includes all atoms.
[20]:
# define solute AtomGroup
li_atoms = u.atoms.select_atoms("type 22")
# define solvent AtomGroups
PF6 = u.atoms.select_atoms("byres type 21")
BN = u.atoms.select_atoms("byres type 5")
FEC = u.atoms.select_atoms("byres type 19")
Finally, we can instantiate the Solute from the solute and solvents! The solvents are supplied with a dict of {str: AtomGroups}
. The strings should be the names of the solvents or some other convenient identifier because Solute will use this to identify the solvent in all future analysis.
[21]:
# instantiate solute
from solvation_analysis.solute import Solute
solute = Solute.from_atoms(li_atoms, {'PF6': PF6, 'BN': BN, 'FEC': FEC}, solute_name="Li")
We can see that solution now contains our atoms of interest as attributes:
[22]:
print(solute.solute_atoms)
print(solute.solvents['BN'])
<AtomGroup [<Atom 6734: of type 22 resid 651 and segid SYSTEM>, <Atom 6742: of type 22 resid 652 and segid SYSTEM>, <Atom 6750: of type 22 resid 653 and segid SYSTEM>, ..., <Atom 7102: of type 22 resid 697 and segid SYSTEM>, <Atom 7110: of type 22 resid 698 and segid SYSTEM>, <Atom 7118: of type 22 resid 699 and segid SYSTEM>]>
<AtomGroup [<Atom 1: of type 1 resid 1 and segid SYSTEM>, <Atom 2: of type 2 resid 1 and segid SYSTEM>, <Atom 3: of type 3 resid 1 and segid SYSTEM>, ..., <Atom 4354: of type 10 resid 363 and segid SYSTEM>, <Atom 4355: of type 11 resid 363 and segid SYSTEM>, <Atom 4356: of type 12 resid 363 and segid SYSTEM>]>
Running and Validating a Solute
Now that we have a solution, we can run the analysis by calling analysis.run()
. When run
is called, a few things happen:
First, Solute calculates the RDF between the solute and each solvent and uses it to identify the cutoff radius of the first solvation shell.
Second, Solute uses the cutoff radii to find all atoms in the first solvation shell. This is repeated for each solute at every frame in the analysis and the data is compiled into a pandas DataFrame, hereafter called solvation_data
.
Finally, Solute instantiates the analysis objects from the solvation_data
(by default Speciation, Coordination, and Pairing), providing a convenient interface for further analysis. These will be discussed later in the tutorial.
Now let’s do it!
[23]:
# run analysis
solute.run()
[23]:
<solvation_analysis.solute.Solute at 0x7fe038b14040>
It’s important to check that Solute did a good job identifying the solvation cutoff radii. Let’s inspect the cutoff radii that were calculated by Solute.
[24]:
# we need this just to display our plot
import matplotlib.pyplot as plt
# iterate through solvents
for solvent in solute.solvents.keys():
# plot the RDF!
solute.plot_solvation_radius('Li', solvent)
plt.show()
These look pretty good!
However, one could argue that our solvation cutoff for PF6 actually misses the first cutoff. This shouldn’t make a huge difference, but it’s not too hard to change either, so let’s fix it.
The trough of the first peak looks to be about 2.6 A. We can simply use the radii keyword of from_atoms
to specify the radii.
[25]:
# assign a new PF6 radius
solute = Solute.from_atoms(
li_atoms,
{'PF6': PF6, 'BN': BN, 'FEC': FEC},
solute_name="Li",
radii={"PF6": 2.6}
)
solute.run()
[25]:
<solvation_analysis.solute.Solute at 0x7fe05890c190>
[26]:
# inspect the radii
print(solute.radii)
# check that our new radius looks good
solute.plot_solvation_radius('Li', 'PF6')
plt.show()
{'PF6': 2.6, 'BN': 2.6500000000000004, 'FEC': 2.75}
There! Now we have solvation radii for all our species!
Occasionally, Solute will fail to identify the first peak, in that case you can add a solvation cutoff as shown above.
Basic Analysis with Coordination and Pairing
Solute instantiates Coordination, Pairing, and Speciation objects as attributes so that we can easily access detailed analysis through the Solute class. As an example, lets get the coordination numbers of our solvents. Here, the coordination number can be interpreted as the mean number of solvent coordinated with each solute. So in the case below, there are on average 0.253 FEC coordinated with each Li ion.
[27]:
# inspect the coordination numbers
solute.coordination.coordination_numbers
[27]:
{'BN': 4.351020408163265,
'FEC': 0.3346938775510204,
'PF6': 0.12040816326530612}
We use the same interface to view the percentage of solutes paired with each solvent. The pairing is defined as the percentage of solutes that are coordinated with ANY solvent. So in the example below, 100% of Li ions are coordinated with BN and 21% of Li ions are coordinated with FEC.
[28]:
# inspect the pairing percentages
solute.pairing.solvent_pairing
[28]:
{'BN': 1.0, 'FEC': 0.2653061224489796, 'PF6': 0.12040816326530612}
If we want to view this information across timesteps, we can instead look at the by-frame data. This can show us the coordination numbers or pairing fraction at each frame instead of averaged over the whole trajectory.
[29]:
# inspect coordination numbers by frame
solute.coordination.coordination_numbers_by_frame
[29]:
frame | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
solvent | ||||||||||
BN | 0.434694 | 0.436735 | 0.428571 | 0.432653 | 0.432653 | 0.436735 | 0.434694 | 0.444898 | 0.434694 | 0.434694 |
FEC | 0.024490 | 0.026531 | 0.030612 | 0.036735 | 0.030612 | 0.048980 | 0.030612 | 0.036735 | 0.038776 | 0.030612 |
PF6 | 0.016327 | 0.012245 | 0.014286 | 0.012245 | 0.012245 | 0.006122 | 0.010204 | 0.014286 | 0.012245 | 0.010204 |
This systems was already equilibrated, so unsurpringly, there isn’t much change.
Exploratory Analysis with Speciation
Speciation aims to enable a more interactive style of analysis. It helps identify the most common solvation shells, find specific examples of them, and select those examples for further analysis or visualization. It also collects summary statistics in the speciation_fraction
attribute, which shows the fraction of solutes that are solvated by a shell with the given composition.
[30]:
solute.speciation.speciation_fraction.head(8)
[30]:
BN | FEC | PF6 | count | |
---|---|---|---|---|
0 | 5 | 0 | 0 | 0.359184 |
1 | 4 | 0 | 0 | 0.257143 |
2 | 4 | 1 | 0 | 0.138776 |
3 | 4 | 0 | 1 | 0.085714 |
4 | 5 | 1 | 0 | 0.048980 |
5 | 4 | 2 | 0 | 0.030612 |
6 | 3 | 2 | 0 | 0.024490 |
7 | 3 | 0 | 1 | 0.022449 |
We can also use Speciation to determine the fraction of shells with some composition, such as having 4 BN.
[31]:
# calculate # of shells with 4 BN and any number of FEC or PF6
solute.speciation.calculate_shell_fraction({'BN': 4})
[31]:
0.516326530612245
[32]:
# make it more specific by specifying 1 PF6
solute.speciation.calculate_shell_fraction({'BN': 4, 'PF6': 1})
[32]:
0.08979591836734693
Perhaps we are interested in shells with 4 BN, 0 FEC, and 1 PF6. We can find these with speciation.find_shells
. This will find every solvation shell with that composition in every frame of the trajectory. Let’s see it in action
[33]:
# find all shells matching the given dictionary
solute.speciation.get_shells({'BN': 4, 'FEC': 0, 'PF6': 1}).head(12)
[33]:
solvent | BN | FEC | PF6 | |
---|---|---|---|---|
frame | solute_ix | |||
0 | 655 | 4 | 0 | 1 |
667 | 4 | 0 | 1 | |
670 | 4 | 0 | 1 | |
683 | 4 | 0 | 1 | |
690 | 4 | 0 | 1 | |
693 | 4 | 0 | 1 | |
1 | 667 | 4 | 0 | 1 |
668 | 4 | 0 | 1 | |
670 | 4 | 0 | 1 | |
671 | 4 | 0 | 1 | |
683 | 4 | 0 | 1 | |
693 | 4 | 0 | 1 |
Awesome! Let’s see what we have here by inspecting the data associated with the cluster. I like prime numbers, so lets go with solute 683 in frame 7. To do this, we will use the get_shell
method in the solution class. Setting as_df=True
will make the function return a DataFrame instead of an AtomGroup.
[34]:
# get the shell DataFrame
solute.get_shell(683, 7, as_df=True)
[34]:
distance | solute | solvent | solvent_ix | ||
---|---|---|---|---|---|
solute_atom_ix | solvent_atom_ix | ||||
7005 | 3700 | 2.047750 | Li | BN | 308 |
568 | 2.129007 | Li | BN | 47 | |
412 | 2.147081 | Li | BN | 34 | |
892 | 2.294223 | Li | BN | 74 | |
6755 | 2.435834 | Li | PF6 | 603 |
We can see that this matches the desired shell composition and shows us the indexes of the involved Atoms and Residues. Additionally, it shows us the distane of the closest atom of each solvent. If we want to return the AtomGroup instead of the DataFrame that’s easy as well.
[35]:
# get the shell AtomGroup
solute.get_shell(683, 7)
[35]:
<AtomGroup with 56 atoms>
The Core Solvation Data Structure
All of the analyses in solvation_analysis
are built on top of the solvation data, which is collected in a pandas DataFrame so that it can be parsed with vectorized numpy and pandas operations. Unless you want to design your own analyses, you probably won’t have a need to interact with this much, but it still might be useful to take a look at.
[36]:
# the core data structure
solute.solvation_data
[36]:
distance | solute | solvent | solvent_ix | ||||
---|---|---|---|---|---|---|---|
frame | solute_ix | solute_atom_ix | solvent_atom_ix | ||||
0 | 649 | 6733 | 4168 | 2.103129 | Li | BN | 347 |
2308 | 2.127130 | Li | BN | 192 | |||
6110 | 2.176079 | Li | FEC | 538 | |||
1312 | 2.316887 | Li | BN | 109 | |||
2608 | 2.376575 | Li | BN | 217 | |||
... | ... | ... | ... | ... | ... | ... | ... |
9 | 697 | 7117 | 652 | 2.018652 | Li | BN | 54 |
4000 | 2.092055 | Li | BN | 333 | |||
1468 | 2.148709 | Li | BN | 122 | |||
3328 | 2.184715 | Li | BN | 277 | |||
1804 | 2.371709 | Li | BN | 150 |
2355 rows × 4 columns
As you can see, the solvation data is indexed by the frame, solute_ix, solute_atom_ix, and solvent_atom_ix, which provide a unique identifier for each atom involved in solvation. Critically, the solvation data only includes atoms within the first solvation radius. By looking at the res_id’s we can determine the molecules involved in solvation. From only solvation_data, we can derive almost every interesting property of the solvation structure.