cluster : group trajectory frames by structural similarity

class idpflex.cluster.ClusterTrove

Bases: idpflex.cluster.ClusterTrove

A namedtuple with a keys() method for easy access to its fields, which are described under Parameters below.

Parameters
  • idx (list) – Frame indexes for the representative structures (indexes start at zero)

  • rmsd (ndarray) – Distance matrix between representative structures.

  • tree (Tree) – Clustering of representative structures. Each leaf node is associated with one centroid and contains the property iframe, the index of the trajectory frame that holds the centroid's atomic structure.

keys()

Return the list of field names
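
The pattern described above, a namedtuple that also offers keys(), can be sketched with the standard library alone. The class name TroveSketch and the field values are hypothetical placeholders; this is not the idpflex implementation, only an illustration of how such a container behaves.

```python
from collections import namedtuple


# Hypothetical stand-in for ClusterTrove, reusing the documented field names.
class TroveSketch(namedtuple("TroveSketch", "idx rmsd tree")):
    """A namedtuple that also exposes its field names through keys()."""

    def keys(self):
        # _fields is the tuple of field names generated by namedtuple
        return list(self._fields)


# Placeholder values: three frame indexes, a dummy matrix, and no tree
trove = TroveSketch(idx=[0, 42, 977], rmsd=[[0.0]], tree=None)
field_names = trove.keys()   # list of field names
first_frame = trove.idx[0]   # fields are reachable by name ...
same_idx = trove[0]          # ... and by position, as in any tuple
```

Because the object is still a tuple, it can be unpacked directly, for example idx, rmsd, tree = trove.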

save(filename)

Serialize the cluster trove and save to file

Parameters

filename (str) – File name

idpflex.cluster.cluster_trajectory(a_universe, selection='not name H*', segment_length=1000, n_representatives=1000)

Cluster a set of representative structures by structural similarity (RMSD)

The simulated trajectory is divided into segments, and hierarchical clustering is performed on each segment to yield a limited number of representative structures. These are then clustered into the final hierarchical tree.

Parameters
  • a_universe (Universe) – Topology and trajectory.

  • selection (str) – atoms for which to calculate RMSD. See the selections page for atom selection syntax.

  • segment_length (int) – divide trajectory into segments of this length

  • n_representatives (int) – Desired total number of representative structures. The final number may be close but not equal to the desired number.

  • distance_matrix (ndarray)

Returns

clustering results for the representatives

Return type

ClusterTrove
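
To make the notion of an RMSD distance matrix between representative structures concrete, here is a minimal pure-Python sketch. It assumes the structures are already expressed in a common frame and applies no superposition, which may differ from how the library computes RMSD. The helper names rmsd and rmsd_matrix and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) positions, no superposition."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsd_matrix(structures):
    """Symmetric matrix of pairwise RMSD values with a zero diagonal."""
    n = len(structures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(structures[i], structures[j])
    return matrix


# Three toy two-atom "structures"; the third is the first shifted by 3 along y
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],
]
matrix = rmsd_matrix(structures)
```

A matrix of this shape is the kind of input a hierarchical clustering routine takes to build a tree of representatives.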

idpflex.cluster.cluster_with_properties(a_universe, pcls, p_names=None, selection='not name H*', segment_length=1000, n_representatives=1000)

Cluster a set of representative structures by structural similarity (RMSD) and by a set of properties

The simulated trajectory is divided into segments, and hierarchical clustering is performed on each segment to yield a limited number of representative structures (the centroids). Properties are calculated for each centroid, so each centroid is described by a property vector whose dimensionality depends on the number of properties and the dimensionality of each property. The distance between any two centroids is calculated as the Euclidean distance between their respective property vectors. The distance matrix containing the distances between all centroid pairs is used as the similarity measure to generate the hierarchical tree of centroids.

The properties calculated for the centroids are stored in the leaf nodes of the hierarchical tree. Properties are then propagated up to the tree’s root node.

Parameters
  • a_universe (Universe) – Topology and trajectory.

  • pcls (list) – Property classes, such as Asphericity or SaSa

  • p_names (list) – Property names. If None, then default property names are used

  • selection (str) – atoms for which to calculate RMSD. See the selections page for atom selection syntax.

  • segment_length (int) – divide trajectory into segments of this length

  • n_representatives (int) – Desired total number of representative structures. The final number may be close but not equal to the desired number.

Returns

Hierarchical clustering tree of the centroids

Return type

ClusterTrove
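
The distance matrix over property vectors described above can be sketched as follows. The property values are made up, the helper name euclidean_distance_matrix is hypothetical, and the sketch applies no scaling or normalization of the properties; whether the library does so is not stated here.

```python
import math

# Hypothetical property vectors, one per centroid (for example two scalar properties)
property_vectors = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]


def euclidean_distance_matrix(vectors):
    """Pairwise Euclidean distances between equally long property vectors."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


distances = euclidean_distance_matrix(property_vectors)
```

Note that with unscaled vectors, a property with a large numeric range dominates the distance, so in practice the choice of units matters.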

idpflex.cluster.load_cluster_trove(filename)

Load a previously saved ClusterTrove instance

Parameters

filename (str) – File name containing the serialized ClusterTrove

Returns

Cluster trove instance stored in file

Return type

ClusterTrove

idpflex.cluster.propagator_size_weighted_sum(values, tree, *, weights=<function weights_by_size>)

Calculate a property of the node as the sum of its siblings’ property values, weighted by the relative cluster sizes of the siblings.

Parameters
  • values (list) – List of property values (of same type), one item for each leaf node.

  • tree (Tree) – Tree of ClusterNodeX nodes
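
A minimal sketch of the size-weighted sum for a single node, assuming that "siblings" refers to the clusters merged to form that node and that cluster size means the number of leaves in each cluster. The function name and its arguments are hypothetical and do not mirror the idpflex signature.

```python
def size_weighted_sum(child_values, child_sizes):
    """Combine property values, weighting each by its share of the total leaf count."""
    if len(child_values) != len(child_sizes):
        raise ValueError("need exactly one size per value")
    total = sum(child_sizes)
    return sum(value * size / total for value, size in zip(child_values, child_sizes))


# Two clusters holding 3 and 1 leaves, with scalar property values 2.0 and 6.0:
# 2.0 * 3/4 + 6.0 * 1/4
parent_value = size_weighted_sum([2.0, 6.0], [3, 1])
```

Applying the same rule from the leaves upward gives every internal node a value that is the plain average over the leaves it contains.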

idpflex.cluster.trajectory_centroids(a_universe, selection='not name H*', segment_length=1000, n_representatives=1000)

Cluster a set of consecutive trajectory segments into a set of representative structures via structural similarity (RMSD)

The simulated trajectory is divided into consecutive segments, and hierarchical clustering is performed on each segment to yield a limited number of representative structures (centroids) per segment.

Parameters
  • a_universe (Universe) – Topology and trajectory.

  • selection (str) – atoms for which to calculate RMSD. See the selections page for atom selection syntax.

  • segment_length (int) – divide trajectory into segments of this length

  • n_representatives (int) – Desired total number of representative structures. The final number may be close but not equal to the desired number.

Returns

rep_ifr – Frame indexes of representative structures (centroids)

Return type

list
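
One way to see why the final number of representatives may be close to, but not equal to, the requested number is to sketch the segment bookkeeping. The integer-division rule below is an assumption made for illustration, as is dropping any trailing frames that do not fill a whole segment; the helper name plan_segments is hypothetical.

```python
def plan_segments(n_frames, segment_length=1000, n_representatives=1000):
    """Return (start, stop) frame bounds per segment and the centroids wanted per segment."""
    n_segments = max(1, n_frames // segment_length)
    # Assumed rule: share the requested representatives evenly, at least one per segment
    per_segment = max(1, n_representatives // n_segments)
    bounds = [
        (i * segment_length, min((i + 1) * segment_length, n_frames))
        for i in range(n_segments)
    ]
    return bounds, per_segment


bounds, per_segment = plan_segments(n_frames=7000, segment_length=1000, n_representatives=1000)
total = per_segment * len(bounds)  # 7 segments * 142 centroids = 994, not 1000
```

With 7000 frames the request for 1000 representatives cannot be split evenly over 7 segments, so this sketch ends up with 994.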