cnextend : Functionality for the hierarchical tree

class idpflex.cnextend.ClusterNodeX(*args, **kwargs)[source]

Bases: scipy.cluster.hierarchy.ClusterNode

Extension of ClusterNode to accommodate a parent reference and a protected dictionary of properties.

add_property(a_property)[source]

Insert or update a property in the set of properties

Parameters:a_property (ProfileProperty) – a property instance
distance_submatrix(dist_mat)[source]

Extract the matrix of distances between the leaves under the node.

Parameters:dist_mat (numpy.ndarray) – Distance matrix (square or in condensed form) among all N leaves of the tree to which the node belongs. The row indexes of dist_mat must correspond to the node IDs of the leaves.
Returns:square distance matrix MxM between the M leaves under the node
Return type:ndarray
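
The row-index convention described above can be illustrated with a minimal NumPy sketch. This is not the idpflex implementation: distance_submatrix_sketch and its leaf_ids argument are hypothetical names, and only the square form is handled (a condensed matrix could first be expanded with scipy.spatial.distance.squareform).

```python
import numpy as np


def distance_submatrix_sketch(dist_mat, leaf_ids):
    """Return the MxM block of a square NxN distance matrix.

    leaf_ids (hypothetical argument) holds the IDs of the M leaves under
    a node; the IDs double as row and column indexes of dist_mat.
    """
    ids = np.asarray(sorted(leaf_ids), dtype=int)
    # np.ix_ selects the same set of rows and columns in one step
    return dist_mat[np.ix_(ids, ids)]


# Toy distance matrix among the four leaves 0..3 of a tree
full = np.array([[0.0, 1.0, 2.0, 3.0],
                 [1.0, 0.0, 4.0, 5.0],
                 [2.0, 4.0, 0.0, 6.0],
                 [3.0, 5.0, 6.0, 0.0]])

# Distances restricted to leaves 1 and 3
sub = distance_submatrix_sketch(full, [3, 1])
```

Sorting the IDs keeps the rows of the result in the same order as leaf_ids, that is, by increasing ID.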
leaf_ids

IDs of the leaves under the node, ordered by increasing ID.

Returns:
Return type:list
leafs

Find the leaf nodes under this cluster node.

Returns:leaf nodes ordered by increasing ID
Return type:list
representative(dist_mat, similarity=<function mean>)[source]

Find leaf under node that is most similar to all other leaves under the node

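Find the leaf that minimizes the similarity scalar between itself and all the other leaves under the node. For instance, the average of all distances between one leaf and all the other leaves results in a similarity scalar for the leaf. Because the scalar is computed from distances, the leaf with the smallest value is the one closest to the rest.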

Parameters:
  • dist_mat (ndarray) – condensed or square distance matrix, either NxN among all N leaves in the tree or MxM among the M leaves under the node. If the matrix covers all leaves in the tree, self.distance_submatrix is applied first.
  • similarity (function object) – reduction operation on the list of distances between one leaf and the other (M-1) leaves.
Returns:representative leaf node
Return type:ClusterNodeX
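
The selection rule can be sketched as follows, assuming a square MxM matrix that is already restricted to the leaves under the node. representative_index is a hypothetical helper, not part of idpflex, and it returns a row index rather than a node.

```python
import numpy as np


def representative_index(sub_mat, similarity=np.mean):
    """Row index of the leaf with the smallest reduced distance to the others."""
    scores = []
    for i in range(sub_mat.shape[0]):
        # Drop the zero self-distance so only the other (M-1) leaves count
        others = np.delete(sub_mat[i], i)
        scores.append(similarity(others))
    return int(np.argmin(scores))


# Three leaves with pairwise distances d01 = 1, d02 = 4, d12 = 2
sub_mat = np.array([[0.0, 1.0, 4.0],
                    [1.0, 0.0, 2.0],
                    [4.0, 2.0, 0.0]])

by_mean = representative_index(sub_mat)                    # mean distance
by_max = representative_index(sub_mat, similarity=np.max)  # worst-case distance
```

Here the middle leaf has the smallest mean distance (1.5) and the smallest maximum distance (2.0), so both reductions select row 1.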

tree

Tree object owning the node

Returns:
Return type:Tree
class idpflex.cnextend.Tree(z=None)[source]

Bases: object

Hierarchical binary tree.

Parameters:z (ndarray) – linkage matrix from which to create the tree. See linkage()
from_linkage_matrix(z, node_class=<class idpflex.cnextend.ClusterNodeX>)[source]

Refactored version of to_tree() that converts a hierarchical clustering encoded in matrix z (by linkage) into a convenient tree object.

Each node_class instance has a left, right, dist, id, and count attribute. The left and right attributes point to the node_class instances that were combined to generate the cluster. If both are None, the instance is a leaf node, its count must be 1, and its distance is meaningless but set to 0.
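
The conversion can be sketched in plain Python, assuming the SciPy linkage layout in which row i holds the two merged cluster IDs, their distance, and the leaf count, and creates cluster n + i for n leaves. Node and tree_from_linkage are hypothetical stand-ins, not the idpflex classes.

```python
class Node:
    """Minimal stand-in for a cluster node (hypothetical)."""

    def __init__(self, id, left=None, right=None, dist=0.0, count=1):
        self.id = id
        self.left = left
        self.right = right
        self.dist = dist
        self.count = count
        self.parent = None  # filled in when the node is merged into a cluster

    def is_leaf(self):
        return self.left is None and self.right is None


def tree_from_linkage(z):
    """Build nodes from a linkage matrix and return the root node."""
    n = len(z) + 1  # a linkage matrix for n leaves has n - 1 rows
    nodes = [Node(i) for i in range(n)]  # leaves get IDs 0..n-1
    for i, (a, b, dist, count) in enumerate(z):
        left, right = nodes[int(a)], nodes[int(b)]
        merged = Node(n + i, left, right, dist, int(count))
        left.parent = right.parent = merged
        nodes.append(merged)
    return nodes[-1]  # the last merge is the root


# Three leaves: 0 and 1 merge into cluster 3, then leaf 2 joins cluster 3
z = [[0, 1, 0.5, 2],
     [2, 3, 1.2, 3]]
root = tree_from_linkage(z)
```

The parent attribute set during each merge is what lets a node walk back up to the root.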

Parameters:
leafs
Returns:leaf nodes ordered by increasing ID
Return type:list
nodes_above_depth(depth=0)[source]

Nodes at or above depth from the root node

Parameters:depth (int) – Depth level starting from the root level (depth=0)
Returns:List of nodes ordered by increasing ID. Last one is the root node
Return type:list
nodes_at_depth(depth=0)[source]

Nodes at a given depth from the root node

Parameters:depth (int) – Depth level starting from the root level (depth=0)
Returns:List of nodes corresponding to that particular level
Return type:list
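
The two depth queries can be sketched as a level-by-level walk from the root. The Node class and both functions below are hypothetical stand-ins; in particular, dropping branches that end before the requested depth is an assumption of this sketch, not a documented idpflex behavior.

```python
class Node:
    """Minimal binary-tree node (hypothetical)."""

    def __init__(self, id, left=None, right=None):
        self.id, self.left, self.right = id, left, right


def nodes_at_depth(root, depth=0):
    """Nodes exactly `depth` levels below the root."""
    level = [root]
    for _ in range(depth):
        # Replace the current level by its children, skipping missing ones
        level = [child for node in level
                 for child in (node.left, node.right) if child is not None]
    return level


def nodes_above_depth(root, depth=0):
    """Nodes at or above `depth`, ordered by increasing ID."""
    found = []
    for d in range(depth + 1):
        found.extend(nodes_at_depth(root, d))
    return sorted(found, key=lambda node: node.id)


# Root 4 has children leaf 2 and node 3; node 3 has children leaves 0 and 1
root = Node(4, Node(2), Node(3, Node(0), Node(1)))

level_one = [node.id for node in nodes_at_depth(root, 1)]
upper = [node.id for node in nodes_above_depth(root, 1)]
```

Because a merged cluster always receives a higher ID than the clusters it contains, sorting by ID places the root last.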
save(filename)[source]

Serialize the tree and save to file

Parameters:filename (str) – File name
idpflex.cnextend.load_tree(filename)[source]

Load a previously saved tree

Parameters:filename (str) – File name containing the serialized tree
Returns:Tree instance stored in file
Return type:Tree
idpflex.cnextend.random_distance_tree(*args, **kwargs)[source]

Instantiate a tree where leaves and nodes have random distances to each other.

Distances are drawn at random from a uniform distribution between 0 and 1.

Parameters:n_leafs (int) – Number of tree leaves
Returns:Elements of the named tuple:
  • tree: Tree
    Tree instance
  • distance_matrix: ndarray
    square distance matrix between pairs of tree leaves
Return type:namedtuple
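
A comparable result can be sketched with NumPy and SciPy alone. This is an illustration under stated assumptions, not the idpflex code: RandomTree, random_distance_tree_sketch, the seed argument, and the 'complete' linkage method are all hypothetical choices, and the tree element is a plain SciPy ClusterNode rather than an idpflex Tree.

```python
from collections import namedtuple

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform

# Hypothetical container mirroring the documented field names
RandomTree = namedtuple('RandomTree', 'tree distance_matrix')


def random_distance_tree_sketch(n_leafs, seed=None):
    """Cluster n_leafs points whose pairwise distances are uniform in [0, 1)."""
    rng = np.random.default_rng(seed)
    n_pairs = n_leafs * (n_leafs - 1) // 2
    condensed = rng.uniform(0.0, 1.0, size=n_pairs)  # one distance per leaf pair
    z = linkage(condensed, method='complete')  # linkage method is an assumption
    root = to_tree(z)  # root ClusterNode of the hierarchy
    return RandomTree(root, squareform(condensed))  # expand to a square matrix


result = random_distance_tree_sketch(5, seed=0)
```

The fields are then available by name, as in result.tree and result.distance_matrix.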