Solute

Solute

Author:

Orion Cohen

Year:

2021

Copyright:

GNU Public License v3

Solute is the core of solvation_analysis. Once you have specified the solute and solvent, Solute can calculate the coordination, networking, pairing, speciation, and residence information for every solute in the system.

If you are just getting started with solvation_analysis, the tutorials are the best place to start. If you are looking to learn more, this page will provide a full API overview of the Solute class and go into more detail on it’s inner working. There are three core concepts to understand in the Solute class: solute_atoms vs solvents, solvation data DataFrame, and the atom_solutes dictionary.

solute vs solvents

To instantiate a Solute, one must first use MDAnalysis to select the solute_atoms and solvents. The solute_atoms can be organized in different ways (described below) but the solvents are always a dictionary of solvent names (strings) and solvent AtomGroups. In some cases, specifying the solvent and solutes is the hardest part. Note that one species might be both the solute and a solvent. Such as if you are interested in hydrogen bond networks in water.

the solvation data DataFrame

The solvation data DataFrame is the core of the Solute class. It contains the full solvation shell of every solute in the system at every timestep. It is a labeled adjacency matrix of the solvation network. From this, all other solvation information is derived: coordination numbers, residence times, solute-solvent pairing, etc. The DataFrame does not need to be accessed by the user, but it is at the core of how solvation_analysis works.

the atom_solutes dictionary

Solvation can be considered on two scales, the scale of the solute molecule or the scale of atoms in the solute. For example, one might be interested in the solvation shell around a whole solute molecule, or around just a specific functional group. The Solute allows both scales to be analyzed simultaneously. It first constructs a Solute for every unique atom in the solute_atoms AtomGroup and then combines them into a single Solute representing the whole molecule. The Solute for each atom is stored in the atom_solutes dictionary. The key’s of the dictionary are the solute_name of each Solute, and depending on on how the Solute was instantiated, these might be generic names (e.g. “solute_0”, “solute_1”, etc.) or user-specified.

A Solute can be instantiated with the from_atoms, from_atoms_dict, or from_solutes_list methods. Below, the basic use of each constructor is demonstrated.

from_atoms takes an MDAnalysis.AtomGroup as input and automatically parses it into individual atom_solutes. This is the least flexible constructor and does the most work behind the scenes.

water = u.select_atoms("resname water")
water_O = water.select_atoms("element O")
water_O_solute = Solute.from_atoms(water_O, {"water": water})
# OR
water_solute = Solute.from_atoms(water, {"water": water})

The from_atoms_dict constructor takes a dict of solute names (strings) and AtomGroups. Each AtomGroup must have only have one atom per solute. This provides an intermediate level of flexibility because it allows you to name your atom solutes.

water_solute = Solute.from_atoms_dict({"water_O": water_O, "water_H1": water_H1, "water_H2": water_H2}, {"water": water)

Finally, the from_solutes_list is the most flexible because it allows you to create all the solutes individually with different settings. It might be useful if you wanted to use specify radii for each solute_atom.

water_H1 = u.select_atoms(...)
water_H2 = u.select_atoms(...)
solute_O = Solute(water_O, {"water": water})
solute_H1 = Solute(water_H1, {"water": water})
solute_H2 = Solute(water_H2, {"water": water})

water_solute = Solute.from_solutes_list([solute_O, solute_H1, solute_H2])

Once the Solute is instantiated, it must be run by calling Solute.run(). This will calculate the solvation data and instantiate the analysis classes. At this point, the Solute is ready to be analyzed. Solute inherits from MDAnalysis.analysis.base.AnalysisBase, so run supports all arguments from the base class.

By default, Coordination, Pairing, and Speciation are instantiated as attributes of the Solute. Networking and Residence can also be added. These are covered in greater detail in their respective API pages.

Solute also provides several functions to select a particular solute and its solvation shell, returning an AtomGroup for visualization or further analysis. This is covered in the visualization tutorial.

class solvation_analysis.solute.Solute[source]

Analyze the solvation structure of a solute.

Solute determines the coordination between the solute and solvents, collecting that data in a pandas.DataFrame (solvation_data) for further analysis. The solvation data DataFrame is a labeled adjacency matrix of the first-shell solvation network.

By default, Solute will determine the coordination cutoff radii from the Solute-solvent RDF. Radii can also be supplied with the radii parameter. The RDF’s of each solute-solvent pari can be visualize with the plot_solvation_radius method.

Solute instantiates Speciation, Coordination, and Pairing objects from the solvation_data, providing a convenient interface for further analysis. Networking and Residence can be added by modifying the analysis_classes parameter.

The parameters below are the kwargs supplied to the from_* constructors.

Parameters:
  • radii (dict of {str: float}, optional) – an optional dictionary of solvent names and associated solvation radii e.g. {“name_2”: radius_2, “name_5”: radius_5} Any radii not given will be calculated. The solvent names should match the solvents parameter.

  • rdf_kernel (function, optional) – this function must take RDF bins and data as input and return a solvation radius as output. e.g. rdf_kernel(bins, data) -> 3.2. By default, the rdf_kernel is solvation_analysis.rdf_parser.identify_solvation_cutoff.

  • kernel_kwargs (dict, optional) – kwargs passed to rdf_kernel

  • rdf_init_kwargs (dict, optional) – kwargs passed to the initialization of the MDAnalysis.InterRDF used to plot the solute-solvent RDFs. By default, range will be set to (0, 7.5).

  • rdf_run_kwargs (dict, optional) – kwargs passed to the internal MDAnalysis.InterRDF.run() command e.g. inner_rdf.run(**rdf_run_kwargs). By default, step, start, and stop will use any kwargs provided to solute.run(**kwargs).

  • solute_name (str, optional) – the name of the solute, used for labeling.

  • analysis_classes (List[str] or str, optional) – a list of the analysis classes to be instantiated, current options are: “pairing”, “coordination”, “speciation”, “residence”, and “networking”. By default, only “pairing”, “coordination”, and “residence” are instantiated. If networking is included, the networking_solvents kwarg must be specified. Enter ‘all’ to include all analysis classes.

  • networking_solvents (str, optional) – see the solvents parameter of the Networking class.

  • verbose (bool, optional) – Turn on more logging and debugging, default False

Variables:
  • u (Universe) – An MDAnalysis.Universe object

  • atom_solutes (dict of {str: Solute}) – a dictionary of solute names and their associated Solute objects.

  • n_solutes (int) – number of solute atoms

  • radii (dict) – a dictionary of solvation radii for each solvent e.g. {“name_2”: radius_2, “name_2”: radius_2, …}

  • rdf_data (dict) – a dictionary of RDF data, keys are solvent names and values are (bins, data) tuples.

  • solvation_data (pandas.DataFrame) – a dataframe of solvation data with columns “frame”, “solute_atom”, “solvent_atom”, “distance”, “solvent_name”, and “solvent”. If multiple entries share a frame, solute_atom, and solvent_atom, all but the closest atom is dropped.

  • solvation_data_dupicates (pandas.DataFrame) – All rows that are dropped from solvation_data when duplicates are dropped.

  • solute_res_ix (np.array) – a numpy array of the residue indices of every solute.

  • solute_atom_ix (np.array) – a numpy array of the atom indices of every solute.

  • solvent_counts (dict of {str: int}) – a dictionary of the number of residues for each solvent.

  • res_name_map (pandas.Series) – a map from residue indices in the Universe to solvent and solute names from the solute. For example, if the first residue in the universe was in self.solvent['res_one'], then res_name_map[0] == 'res_one'.

  • pairing (pairing.Pairing (optional)) – pairing analyzes the fraction solutes and solvents that are coordinated.

  • coordination (coordination.Coordination (optional)) – coordination analyses the coordination numbers of solvents and which solvent atoms are coordinated.

  • speciation (speciation.Speciation (optional)) – speciation provides an interface for finding and selecting the solvation shells surrounding each solute.

  • residence (residence.Residence (optional)) – residence calculates the residence times of each solvent on the solute. Only instantiated if ‘residence’ is included in the analysis_classes kwarg.

  • networking (networking.Networking (optional)) – networking analyses the connectivity of solute-solvent networks. Only instantiated if ‘networking’ is included in the analysis_classes kwarg. the networking_solvents kwarg must be specified.

__init__(solute_atoms, solvents, atom_solutes=None, radii=None, rdf_kernel=None, kernel_kwargs=None, rdf_init_kwargs=None, rdf_run_kwargs=None, solute_name='solute_0', analysis_classes=None, networking_solvents=None, verbose=False, internal_call=False)[source]

This method is not intended to be called directly. Instead, use from_atoms, from_atoms_dict or from_solute_list to create a Solute.

static from_atoms(solute_atoms, solvents, rename_solutes=None, **kwargs)[source]

Create a Solute from a single AtomGroup. The solute_atoms AtomGroup must should contain identical residues and identical atoms on each residue. For example, if one wanted to look at the O and H atoms of an ethanol molecule, solute_atoms should contain the O and H atoms of all ethanol molecules. It could not contain C atoms from some ethanol or include residues that were not ethanol.

Parameters:
  • solute_atoms (AtomGroup) – an AtomGroup or ResidueGroup containing one atom per residue.

  • solvents (dict of {str: MDAnalysis.AtomGroup}) – a dictionary of solvent names and associated MDAnalysis.AtomGroups. e.g. {“name_1”: solvent_group_1,”name_2”: solvent_group_2, …}

  • rename_solutes (dict, optional) – a dictionary of solute names to rename the solutes. Keys are the default solute_name, and values are the new names. For example, from_atoms might return a solute with solute_name “solute_0”, but you want to rename it to “functional_group_X”. In this case, rename_solutes={“solute_0”: “functional_group_X”}. If None, solutes will be numbered in the order their atoms appear on the residue.

  • kwargs (dict) – All kwargs listed in the Parameters section of the Solute class.

Return type:

Solute

static from_atoms_dict(solute_atoms_dict, solvents, **kwargs)[source]

Create a Solute object from a dictionary of solute atoms.

Parameters:
  • solute_atoms_dict (dict of {str: MDAnalysis.AtomGroup}) – a dictionary of solute atoms, e.g. {“name_1”: solute_1, “name_2”: solute_2}. Each AtomGroup should contain one type of atom, and only one atom per residue.

  • solvents (dict of {str: MDAnalysis.AtomGroup}) – a dictionary of solvent names and associated MDAnalysis.AtomGroups. e.g. {“name_1”: solvent_group_1,”name_2”: solvent_group_2, …} kwargs

  • kwargs (dict) – All kwargs listed in the Parameters section of the Solute class.

Return type:

Solute

static from_solute_list(solutes, solvents, **kwargs)[source]

Create a Solute from a list of Solutes. All Solutes must have only a single solute atom on each solute residue. Essentially, from_solute_list allows you to preconstruct the atom_solutes dictionary. This is useful if you want to individually specify initialization parameters for each atom Solute, otherwise, a different constructor is recommended.

Parameters:
  • solutes (list of Solute) – A list of Solutes. All Solutes must have only a single solute atom.

  • on each solute residue.

  • solvents (dict of {str: MDAnalysis.AtomGroup}) – a dictionary of solvent names and associated MDAnalysis.AtomGroups. e.g. {“name_1”: solvent_group_1,”name_2”: solvent_group_2, …}

  • kwargs (dict) – All kwargs listed in the Parameters section of the Solute class.

Return type:

Solute

plot_solvation_radius(solute_name, solvent_name)[source]

Plot the RDF of a solvent molecule

Specified by the residue name in the solvents dict. Includes a vertical line at the radius of interest.

Parameters:

solvent_name (str) – the name of the residue of interest, as written in the solvents dict

Returns:

  • fig (matplotlib.Figure)

  • ax (matplotlib.Axes)

draw_molecule(residue, filename=None)[source]

Returns

Parameters:
  • residue (the str name of the solute or any electrolytes or a MDAnalysis.Residue)

  • filename (If true, writes a file using the RDKit backend.)

Return type:

RDKit.Chem.rdchem.Mol

get_shell(solute_index, frame, as_df=False, remove_mols=None, closest_n_only=None)[source]

Select the solvation shell of the solute.

The solvation shell can be returned either as an AtomGroup, to be visualized or passed to other routines, or as a pandas.DataFrame for convenient inspection.

The solvation shell can be truncated before being returned, either by removing specific residue types with remove_mols or by introducing a hard cutoff with closest_n_only.

Parameters:
  • solute_index (int) – The index of the solute of interest

  • frame (int) – the frame in the trajectory to perform selection at. Defaults to the current trajectory frame.

  • as_df (bool, default False) – if true, this function will return a DataFrame representing the shell instead of a AtomGroup.

  • remove_mols (dict of {str: int}, optional) – remove_dict lets you remove specific residues from the final shell. It should be a dict of molnames and ints e.g. {'mol1': n, 'mol2', m}. It will remove up to n of mol1 and up to m of mol2. So if the dict is {'mol1': 1, 'mol2', 1} and the shell has 4 mol1 and 0 mol2, get_shell will return a shell with 3 mol1 and 0 mol2.

  • closest_n_only (int, optional) – if given, only the closest n residues will be included

Return type:

MDAnalysis.AtomGroup or pandas.DataFrame

get_closest_n_mol(solute_atom_ix, n_mol, guess_radius=3, return_ordered_resix=False, return_radii=False)[source]

Select the n closest mols to the solute.

The solute is specified by it’s index within solvation_data. n is specified with the n_mol argument. Optionally returns an array of their resids and an array of the distance of the closest atom in each molecule.

Parameters:
  • solute_atom_ix (int) – The index of the solute of interest

  • n_mol (int) – The number of molecules to return

  • guess_radius (float or int) – an initial search radius to look for closest n mol

  • return_ordered_resix (bool, default False) – whether to return the resix of the closest n molecules, ordered by radius

  • return_radii (bool, default False) – whether to return the distance of the closest atom of each of the n molecules

Returns:

  • full shell (MDAnalysis.AtomGroup) – the atoms in the shell

  • ordered_resids (numpy.array of int, optional) – the residue id of the n_mol closest atoms

  • radii (numpy.array of float, optional) – the distance of each atom from the center

radial_shell(solute_atom_ix, radius)[source]

Select all residues with atoms within r of the solute.

The solute is specified by it’s index within solvation_data. r is specified with the radius argument. Thin wrapper around solvation.get_radial_shell.

Parameters:
  • solute_atom_ix (int) – the index of the solute of interest

  • radius (float or int) – radius used for atom selection

Return type:

MDAnalysis.AtomGroup