cluster: group trajectory frames by structural similarity
-
class
idpflex.cluster.
ClusterTrove
Bases:
idpflex.cluster.ClusterTrove
A namedtuple with a keys() method for easy access to its fields, which are described below under Parameters.
- Parameters
idx (
list
) – Frame indexes for the representative structures (indexes start at zero).
rmsd (
ndarray
) – Distance matrix between representative structures.
tree (
Tree
) – Clustering of representative structures. Leaf nodes associated with each centroid contain property iframe, which is the frame index in the trajectory pointing to the atomic structure corresponding to the centroid.
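The "namedtuple with a keys() method" pattern described above can be sketched as follows. This is a minimal illustration, not the idpflex class: the name `TroveSketch` and the sample values are hypothetical, and `tree` is left as `None` because no clustering is run here.

```python
from collections import namedtuple


# Hypothetical stand-in for ClusterTrove: same three fields, plus keys()
class TroveSketch(namedtuple('TroveSketch', 'idx rmsd tree')):
    __slots__ = ()

    def keys(self):
        """Return the field names, giving the tuple a dict-like feel."""
        return self._fields


trove = TroveSketch(
    idx=[0, 7, 42],                      # made-up frame indexes (zero-based)
    rmsd=[[0.0, 1.2, 2.3],
          [1.2, 0.0, 1.9],
          [2.3, 1.9, 0.0]],              # made-up distance matrix
    tree=None,                           # placeholder, no tree is built here
)

# Fields can be reached by attribute or discovered through keys()
for name in trove.keys():
    print(name, getattr(trove, name))
```

Access by attribute (`trove.idx`) and by position (`trove[0]`) both work, since the object is still a tuple.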
-
idpflex.cluster.
cluster_trajectory
(a_universe, selection='not name H*', segment_length=1000, n_representatives=1000) Cluster a set of representative structures by structural similarity (RMSD)
The simulated trajectory is divided into segments, and hierarchical clustering is performed on each segment to yield a limited number of representative structures. These are then clustered into the final hierarchical tree.
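The division into segments can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, and the even-share rule in `quota_per_segment` is an assumption made here to show why the final number of representatives may differ from the requested one. It is not taken from the idpflex source.

```python
def segment_bounds(n_frames, segment_length):
    """Return (start, stop) frame ranges, stop exclusive, covering all frames."""
    return [(start, min(start + segment_length, n_frames))
            for start in range(0, n_frames, segment_length)]


def quota_per_segment(n_segments, n_representatives):
    """Assumed rule: share the requested total evenly, at least one per segment."""
    return max(1, n_representatives // n_segments)


bounds = segment_bounds(n_frames=2500, segment_length=1000)
quota = quota_per_segment(len(bounds), n_representatives=100)
total = quota * len(bounds)

print(bounds)   # frame ranges of each segment
print(quota)    # representatives requested from each segment
print(total)    # resulting total, which need not equal the request
```

Under this assumed rule, integer division is what makes the total approximate rather than exact.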
- Parameters
a_universe (
Universe
) – Topology and trajectory.
selection (str) – atoms for which to calculate RMSD. See the selections page for atom selection syntax.
segment_length (int) – divide trajectory into segments of this length
n_representatives (int) – Desired total number of representative structures. The final number may be close but not equal to the desired number.
distance_matrix (
ndarray
)
- Returns
clustering results for the representatives
- Return type
-
idpflex.cluster.
cluster_with_properties
(a_universe, pcls, p_names=None, selection='not name H*', segment_length=1000, n_representatives=1000) Cluster a set of representative structures by structural similarity (RMSD) and by a set of properties
The simulated trajectory is divided into segments, and hierarchical clustering is performed on each segment to yield a limited number of representative structures (the centroids). Properties are calculated for each centroid, so each centroid is described by a property vector. The dimensionality of the vector depends on the number of properties and the dimensionality of each property. The distance between any two centroids is calculated as the Euclidean distance between their respective property vectors. The distance matrix containing the distances between all possible centroid pairs is used as the similarity measure to generate the hierarchical tree of centroids.
The properties calculated for the centroids are stored in the leaf nodes of the hierarchical tree. Properties are then propagated up to the tree’s root node.
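The property-vector distance described above can be sketched with the standard library alone. The helper names and the sample property values are hypothetical, and any scaling or normalization of properties that a real workflow might apply is left out.

```python
import math


def property_vector(properties):
    """Flatten scalar and sequence-valued properties into one flat vector."""
    vector = []
    for value in properties:
        if isinstance(value, (list, tuple)):
            vector.extend(float(v) for v in value)
        else:
            vector.append(float(value))
    return vector


def distance_matrix(vectors):
    """Euclidean distance between every pair of property vectors."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Each centroid has one scalar property and one two-component property,
# so every property vector has dimension 1 + 2 = 3 (values are made up)
centroids = [
    [1.0, [0.0, 0.0]],
    [1.0, [3.0, 4.0]],
    [5.0, [0.0, 3.0]],
]
vectors = [property_vector(p) for p in centroids]
dmat = distance_matrix(vectors)
```

The resulting square matrix is symmetric with a zero diagonal, which is the form a hierarchical clustering routine expects as input.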
- Parameters
a_universe (
Universe
) – Topology and trajectory.
pcls (list) – Property classes, such as
Asphericity
or SaSa
p_names (list) – Property names. If None, then default property names are used
selection (str) – atoms for which to calculate RMSD. See the selections page for atom selection syntax.
segment_length (int) – divide trajectory into segments of this length
n_representatives (int) – Desired total number of representative structures. The final number may be close but not equal to the desired number.
- Returns
Hierarchical clustering tree of the centroids
- Return type
-
idpflex.cluster.
load_cluster_trove
(filename) Load a previously saved
ClusterTrove
instance
- Parameters
filename (str) – File name containing the serialized
ClusterTrove
- Returns
Cluster trove instance stored in file
- Return type
-
idpflex.cluster.
propagator_size_weighted_sum
(values, tree, *, weights=<function weights_by_size>) Calculate a property of the node as the sum of its siblings’ property values, weighted by the relative cluster sizes of the siblings.
- Parameters
values (list) – List of property values (of same type), one item for each leaf node.
node_tree (
Tree
) – Tree of ClusterNodeX
nodes
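A size-weighted sum can be sketched on a toy tree. The `Node` class below is a hypothetical stand-in, not ClusterNodeX, and the sketch assumes that a node's value is built from its direct children, each weighted by its share of the leaves under that node.

```python
class Node:
    """Toy tree node; its size is the number of leaves beneath it."""

    def __init__(self, children=(), value=None):
        self.children = list(children)
        self.value = value
        self.size = sum(c.size for c in self.children) if self.children else 1


def propagate(node):
    """Fill in node.value bottom-up as a size-weighted sum of child values."""
    if node.children:
        node.value = sum(propagate(child) * child.size / node.size
                         for child in node.children)
    return node.value


# Three leaves with made-up property values
a, b, c = Node(value=1.0), Node(value=3.0), Node(value=8.0)
inner = Node(children=[a, b])      # size 2
root = Node(children=[inner, c])   # size 3

propagate(root)
print(inner.value, root.value)
```

With these weights the root value equals the plain average over all leaves, because each leaf ends up contributing 1/size of the root.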
-
idpflex.cluster.
trajectory_centroids
(a_universe, selection='not name H*', segment_length=1000, n_representatives=1000) Cluster a set of consecutive trajectory segments into a set of representative structures via structural similarity (RMSD)
The simulated trajectory is divided into consecutive segments, and hierarchical clustering is performed on each segment to yield a limited number of representative structures (centroids) per segment.
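The text does not say how a representative is chosen within a cluster. One common choice, shown here purely as an assumption, is the medoid: the member with the smallest summed distance to the other members. The function name, the toy distance matrix, and the cluster memberships below are all hypothetical.

```python
def medoid(frame_indexes, dist):
    """Return the member frame with the smallest summed distance to the rest."""
    return min(frame_indexes,
               key=lambda i: sum(dist[i][j] for j in frame_indexes))


# Made-up RMSD-like distances between four frames
dist = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.0, 9.0],
    [2.0, 1.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]

# Assumed outcome of clustering one segment: two clusters of frames
clusters = [[0, 1, 2], [3]]

# One representative frame index per cluster, in the spirit of rep_ifr
rep_ifr = sorted(medoid(members, dist) for members in clusters)
print(rep_ifr)
```

Because a medoid is always an actual frame, its index points back to a real atomic structure in the trajectory.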
- Parameters
a_universe (
Universe
) – Topology and trajectory.
selection (str) – atoms for which to calculate RMSD. See the selections page for atom selection syntax.
segment_length (int) – divide trajectory into segments of this length
n_representatives (int) – Desired total number of representative structures. The final number may be close but not equal to the desired number.
- Returns
rep_ifr – Frame indexes of representative structures (centroids)
- Return type