cnextend : Functionality for the hierarchical tree

class idpflex.cnextend.ClusterNodeX(*args, **kwargs)[source]

Bases: scipy.cluster.hierarchy.ClusterNode

Extension of ClusterNode to accommodate a parent reference and a dictionary-like of properties.

distance_submatrix(dist_mat)[source]

Extract the matrix of distances between the leaves under the node.

Parameters

dist_mat (numpy.ndarray) – Distance matrix (square or in condensed form) among all N leaves of the tree to which the node belongs. The row indexes of dist_mat must correspond to the node IDs of the leaves.

Returns

square MxM distance matrix between the M leaves under the node

Return type

ndarray
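As a rough illustration of the extraction described above, the following sketch picks the rows and columns of the node's leaves out of a full distance matrix given in either form. It uses plain Python lists rather than NumPy arrays, and the names condensed_index, leaf_ids, and n_leaves are assumptions made for this example, not part of idpflex.

```python
def condensed_index(i, j, n):
    """Position of the pair (i, j), i != j, in a condensed vector of n points."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def distance_submatrix(dist_mat, leaf_ids, n_leaves):
    """Square M x M matrix of distances among the leaves listed in leaf_ids.

    dist_mat is either a square N x N nested list or a flat condensed
    list of length N * (N - 1) / 2 (hypothetical stand-ins for ndarray).
    """
    ids = sorted(leaf_ids)
    is_square = isinstance(dist_mat[0], (list, tuple))

    def dist(i, j):
        if i == j:
            return 0.0
        if is_square:
            return dist_mat[i][j]
        return dist_mat[condensed_index(i, j, n_leaves)]

    return [[dist(i, j) for j in ids] for i in ids]


# Four leaves in the tree; the node of interest holds leaves 0, 2 and 3.
square = [[0.0, 1.0, 2.0, 3.0],
          [1.0, 0.0, 4.0, 5.0],
          [2.0, 4.0, 0.0, 6.0],
          [3.0, 5.0, 6.0, 0.0]]
condensed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sub_from_square = distance_submatrix(square, [3, 0, 2], n_leaves=4)
sub_from_condensed = distance_submatrix(condensed, [3, 0, 2], n_leaves=4)
```

With NumPy arrays, the square case reduces to dist_mat[numpy.ix_(ids, ids)], and scipy.spatial.distance.squareform converts between the condensed and square forms.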

property leaf_ids

IDs of the leaves under the node, ordered by increasing ID.

Returns

Return type

list

property leafs

Find the leaf nodes under this cluster node.

Returns

leaf nodes ordered by increasing ID

Return type

list
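One way to collect the leaves under a node is a stack-based traversal followed by a sort on the node ID. The sketch below assumes a hypothetical Node class with id, left, and right attributes; it is not the ClusterNodeX implementation.

```python
class Node:
    """Hypothetical stand-in for a cluster node: an ID plus optional children."""

    def __init__(self, node_id, left=None, right=None):
        self.id = node_id
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def leafs(node):
    """Leaf nodes under `node`, ordered by increasing ID."""
    stack, found = [node], []
    while stack:
        current = stack.pop()
        if current.is_leaf():
            found.append(current)
        else:
            # Visit whichever children exist
            stack.extend(child for child in (current.left, current.right)
                         if child is not None)
    return sorted(found, key=lambda leaf: leaf.id)


# Node 3 merges leaves 2 and 0; root 4 merges node 3 and leaf 1.
inner = Node(3, Node(2), Node(0))
root = Node(4, inner, Node(1))
root_leaf_ids = [leaf.id for leaf in leafs(root)]
inner_leaf_ids = [leaf.id for leaf in leafs(inner)]
```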

representative(dist_mat, similarity=<function mean>)[source]

Find the leaf under the node that is most similar to all the other leaves under the node.

Find the leaf that minimizes the similarity measure between itself and all the other leaves under the node. For instance, averaging the distances between one leaf and all the other leaves yields a similarity scalar for that leaf.

Parameters
  • dist_mat (ndarray) – condensed or square distance matrix, either NxN among all N leaves in the tree or MxM among all M leaves under the node. If the matrix covers all leaves in the tree, self.distance_submatrix is applied first.

  • similarity (function object) – reduction operation on the list of distances between one leaf and the other (M-1) leaves.

Returns

representative leaf node

Return type

ClusterNodeX
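The selection rule described above can be sketched as follows: reduce each leaf's distances to the other leaves into one scalar, then keep the leaf with the smallest scalar. The sketch assumes a square MxM submatrix given as nested lists and at least two leaves, and it returns the leaf ID rather than a node object; both the function signature and the leaf_ids argument are assumptions for illustration.

```python
from statistics import mean


def representative(sub_matrix, leaf_ids, similarity=mean):
    """ID of the leaf whose reduced distance to the other leaves is smallest.

    sub_matrix is the square M x M distance matrix among the leaves under
    the node, and leaf_ids lists their IDs in the same row order.
    """
    scores = []
    for row_index, row in enumerate(sub_matrix):
        # Drop the zero self-distance before applying the reduction
        others = [d for col_index, d in enumerate(row) if col_index != row_index]
        scores.append(similarity(others))
    best_row = min(range(len(scores)), key=scores.__getitem__)
    return leaf_ids[best_row]


sub_matrix = [[0.0, 2.0, 3.0],
              [2.0, 0.0, 6.0],
              [3.0, 6.0, 0.0]]
central_by_mean = representative(sub_matrix, [0, 2, 3])
central_by_max = representative(sub_matrix, [0, 2, 3], similarity=max)
```

Passing a different reduction, such as max, changes what "most similar" means: the mean favors the leaf that is close to the others on average, while max favors the leaf whose farthest neighbor is nearest.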

property tree

Tree object owning the node

Returns

Return type

Tree

class idpflex.cnextend.Tree(z=None, dm=None)[source]

Bases: object

Hierarchical binary tree.

Parameters
  • z (ndarray) – linkage matrix from which to create the tree. See linkage()

  • dm (ndarray) – distance matrix from which to create the linkage matrix and tree

from_linkage_matrix(z, node_class=<class 'idpflex.cnextend.ClusterNodeX'>)[source]

Refactored version of to_tree() that converts a hierarchical clustering encoded in the matrix z (by linkage) into a convenient tree object.

Each node_class instance has a left, right, dist, id, and count attribute. The left and right attributes point to the node_class instances that were combined to generate the cluster. If both are None, then the node_class instance is a leaf node, its count must be 1, and its distance is meaningless but set to 0.
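The conversion can be sketched in a few lines, assuming the SciPy linkage convention in which row i of z holds [left_id, right_id, dist, count] and creates the node with ID n_leaves + i. The Node class and the tree_from_linkage function below are hypothetical names for this illustration, and the parent attribute mirrors the parent reference that ClusterNodeX is described as carrying.

```python
class Node:
    """Hypothetical node with the attributes described above plus a parent."""

    def __init__(self, node_id, left=None, right=None, dist=0.0, count=1):
        self.id = node_id
        self.left = left
        self.right = right
        self.dist = dist
        self.count = count
        self.parent = None


def tree_from_linkage(z, node_class=Node):
    """Build nodes from linkage rows [left_id, right_id, dist, count].

    Returns the list of all nodes indexed by ID; the last one is the root.
    """
    n_leaves = len(z) + 1
    # Leaves come first: no children, count 1, distance 0
    nodes = [node_class(i) for i in range(n_leaves)]
    for row_index, (left_id, right_id, dist, count) in enumerate(z):
        left, right = nodes[int(left_id)], nodes[int(right_id)]
        node = node_class(n_leaves + row_index, left, right, dist, int(count))
        if node.count != left.count + right.count:
            raise ValueError('count in row %d does not match its children'
                             % row_index)
        left.parent = right.parent = node
        nodes.append(node)
    return nodes


# Three leaves: leaves 0 and 2 merge first (node 3), then leaf 1 joins (node 4).
z = [[0, 2, 0.5, 2],
     [1, 3, 1.5, 3]]
nodes = tree_from_linkage(z)
root = nodes[-1]
```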

Parameters

property leafs

Returns

leaf nodes ordered by increasing ID

Return type

list

nodes_above_depth(depth=0)[source]

Nodes at or above depth from the root node

Parameters

depth (int) – Depth level starting from the root level (depth=0)

Returns

List of nodes ordered by increasing ID. Last one is the root node

Return type

list

nodes_at_depth(depth=0)[source]

Nodes at a given depth from the root node

Parameters

depth (int) – Depth level starting from the root level (depth=0)

Returns

List of nodes corresponding to that particular level

Return type

list
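The two depth queries can be sketched with a level-by-level walk from the root. This sketch assumes that depth counts the number of edges from the root, which the text above does not spell out, so idpflex may define depth differently. The Node class and the free functions are hypothetical names for this illustration.

```python
class Node:
    """Hypothetical stand-in for a cluster node: an ID plus optional children."""

    def __init__(self, node_id, left=None, right=None):
        self.id = node_id
        self.left = left
        self.right = right


def nodes_at_depth(root, depth=0):
    """Nodes exactly `depth` edges below the root (assumed meaning of depth)."""
    level = [root]
    for _ in range(depth):
        level = [child for node in level
                 for child in (node.left, node.right) if child is not None]
    return level


def nodes_above_depth(root, depth=0):
    """Nodes at most `depth` edges below the root, ordered by increasing ID."""
    collected = []
    for current_depth in range(depth + 1):
        collected.extend(nodes_at_depth(root, current_depth))
    return sorted(collected, key=lambda node: node.id)


# Leaves 0..3; node 4 = (0, 1), node 5 = (2, node 4), root 6 = (3, node 5).
node4 = Node(4, Node(0), Node(1))
node5 = Node(5, Node(2), node4)
root = Node(6, Node(3), node5)
ids_at_depth_2 = [node.id for node in nodes_at_depth(root, 2)]
ids_above_depth_1 = [node.id for node in nodes_above_depth(root, 1)]
```

In a tree built from a linkage matrix, every parent has a larger ID than its children, which is why sorting by ID leaves the root in the last position.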

save(filename)[source]

Serialize the tree and save to file

Parameters

filename (str) – File name

idpflex.cnextend.load_tree(filename)[source]

Load a previously saved tree

Parameters

filename (str) – File name containing the serialized tree

Returns

Tree instance stored in file

Return type

Tree
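A save and load round trip can be sketched with the standard pickle module. The text above does not state which serialization format idpflex uses, so pickle is an assumption here, and nested dictionaries stand in for a Tree instance.

```python
import os
import pickle
import tempfile


def save(tree, filename):
    """Serialize `tree` to `filename` (pickle is an assumed format)."""
    with open(filename, 'wb') as handle:
        pickle.dump(tree, handle)


def load_tree(filename):
    """Load an object previously written by save()."""
    with open(filename, 'rb') as handle:
        return pickle.load(handle)


# Nested dictionaries stand in for a tree object in this round trip.
tree = {'id': 2, 'left': {'id': 0}, 'right': {'id': 1}}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'tree.pkl')
    save(tree, path)
    restored = load_tree(path)
```

Unpickling can execute arbitrary code, so files in this format should only be loaded from trusted sources.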

idpflex.cnextend.random_distance_tree(n_leafs)[source]

Instantiate a tree where leaves and nodes have random distances to each other.

Distances are drawn at random from a uniform distribution of numbers between 0 and 1.

Parameters

n_leafs (int) – Number of tree leaves

Returns

Elements of the named tuple:

  • tree: Tree

    Tree instance

  • distance_matrix: ndarray

    square distance matrix between pairs of tree leaves

Return type

namedtuple
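The distance-matrix half of this function can be sketched with the standard random module: draw one uniform number per pair of leaves and mirror it across the diagonal. The function name random_distance_matrix and its seed parameter are assumptions for this example, and building the tree from the matrix is left out.

```python
import random


def random_distance_matrix(n_leafs, seed=None):
    """Symmetric n_leafs x n_leafs matrix with uniform random distances.

    Off-diagonal entries are drawn from [0, 1); the diagonal stays at 0.
    """
    rng = random.Random(seed)
    matrix = [[0.0] * n_leafs for _ in range(n_leafs)]
    for i in range(n_leafs):
        for j in range(i + 1, n_leafs):
            # One draw per pair, stored in both triangles
            matrix[i][j] = matrix[j][i] = rng.random()
    return matrix


distance_matrix = random_distance_matrix(5, seed=42)
```

A matrix like this, converted to condensed form, is the kind of input that scipy.cluster.hierarchy.linkage accepts to produce a linkage matrix; which linkage method idpflex applies is not stated here.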