Inheritance diagram for nipy.algorithms.clustering.hierarchical_clustering:
These routines perform some hierrachical agglomerative clustering of some input data. The following alternatives are proposed: - Distance based average-link - Similarity-based average-link - Distance based maximum-link - Ward’s algorithm under graph constraints - Ward’s algorithm without graph constraints
In this latest version, the results are returned in a ‘WeightedForest’ structure, which gives access to the clustering hierarchy, facilitates the plot of the result etc.
For back-compatibility, *_segment versions of the algorithms have been appended, with the old API (except the qmax parameter, which now represents the number of wanted clusters)
Author : Bertrand Thirion,Pamela Guevara, 2006-2009
Bases: nipy.algorithms.graph.forest.Forest
This is a weighted Forest structure, i.e. a tree - each node has one parent and children (hierarchical structure) - some of the nodes can be viewed as leaves, other as roots - the edges within a tree are associated with a weight: +1 from child to parent -1 from parent to child - additionally, the nodes have a value, which is called ‘height’, especially useful from dendrograms
Methods
adjacency() | returns the adjacency matrix of the graph as a sparse coo matrix |
all_distances([seed]) | returns all the distances of the graph as a tree |
anti_symmeterize() | anti-symmeterize self, i.e. produces the graph |
cc() | Compte the different connected components of the graph. |
check() | Check that self is indeed a forest, i.e. |
check_compatible_height() | Check that height[parents[i]]>=height[i] for all nodes |
cliques() | Extraction of the graphe cliques |
compact_neighb() | returns a compact representation of self |
compute_children() | Define the children of each node (stored in self.children) |
copy() | returns a copy of self |
cut_redundancies() | Returns a graph with redundant edges removed: ecah edge (ab) is present ony once in the edge matrix: the correspondng weights are added. |
define_graph_attributes() | define the edge and weights array |
degrees() | Returns the degree of the graph vertices. |
depth_from_leaves() | compute an index for each node: 0 for the leaves, 1 for |
dijkstra([seed]) | Returns all the [graph] geodesic distances starting from seed |
floyd([seed]) | Compute all the geodesic distances starting from seeds |
from_3d_grid(xyz[, k]) | Sets the graph to be the topological neighbours graph |
get_E() | To get the number of edges in the graph |
get_V() | To get the number of vertices in the graph |
get_children([v]) | Get the children of a node/each node |
get_descendants(v[, exclude_self]) | returns the nodes that are children of v as a list |
get_edges() | To get the graph’s edges |
get_height() | Get the height array |
get_vertices() | To get the graph’s vertices (as id) |
get_weights() | |
is_connected() | States whether self is connected or not |
isleaf() | Identification of the leaves of the forest |
isroot() | Returns an indicator of nodes being roots |
kruskal() | Creates the Minimum Spanning Tree of self using Kruskal’s algo. |
leaves_of_a_subtree(ids[, custom]) | tests whether the given nodes are the leaves of a certain subtree |
left_incidence() | Return left incidence matrix |
list_of_neighbors() | returns the set of neighbors of self as a list of arrays |
list_of_subtrees() | returns the list of all non-trivial subtrees in the graph |
main_cc() | Returns the indexes of the vertices within the main cc |
merge_simple_branches() | Return a subforest, where chained branches are collapsed |
normalize([c]) | Normalize the graph according to the index c |
partition(threshold) | Partition the tree according to a cut criterion |
plot([ax]) | Plot the dendrogram associated with self |
plot_height() | Plot the height of the non-leaves nodes |
propagate_upward(label) | Propagation of a certain labelling from leves to roots |
propagate_upward_and(prop) | propagates from leaves to roots some binary property of the nodes |
remove_edges(valid) | Removes all the edges for which valid==0 |
remove_trivial_edges() | Removes trivial edges, i.e. |
reorder_from_leaves_to_roots() | reorder the tree so that the leaves come first then their |
right_incidence() | Return right incidence matrix |
set_edges(edges) | Sets the graph’s edges |
set_euclidian(X) | Compute the weights of the graph as the distances between the |
set_gaussian(X[, sigma]) | Compute the weights of the graph as a gaussian function |
set_height([height]) | Set the height array |
set_weights(weights) | Set edge weights |
show([X, ax]) | Plots the current graph in 2D |
split(k) | idem as partition, but a number of components are supplied instead |
subforest(valid) | Creates a subforest with the vertices for which valid > 0 |
subgraph(valid) | Creates a subgraph with the vertices for which valid>0 |
symmeterize() | Symmeterize self, modify edges and weights so that |
to_coo_matrix() | Return adjacency matrix as coo sparse |
tree_depth() | Returns the number of hierarchical levels in the tree |
voronoi_diagram(seeds, samples) | Defines the graph as the Voronoi diagram (VD) |
voronoi_labelling(seed) | Performs a voronoi labelling of the graph |
Parameters: | V: the number of edges of the graph : parents=None: array of shape (V) :
height=None: array of shape(V) :
|
---|
returns the adjacency matrix of the graph as a sparse coo matrix
Returns: | adj: scipy.sparse matrix instance, :
|
---|
returns all the distances of the graph as a tree
Parameters: | seed=None array of shape(nbseed) with valuesin [0..self.V-1] :
|
---|---|
Returns: | dg: array of shape(nseed, self.V), the resulting distances : |
Notes
By convention infinite distances are given the distance np.inf
anti-symmeterize self, i.e. produces the graph whose adjacency matrix would be the antisymmetric part of its current adjacency matrix
Compte the different connected components of the graph.
Returns: | label: array of shape(self.V), labelling of the vertices : |
---|
Check that self is indeed a forest, i.e. contains no loop
Returns: | a boolean b=0 iff there are loops, 1 otherwise : |
---|
Notes
Slow implementation, might be rewritten in C or cython
Check that height[parents[i]]>=height[i] for all nodes
Extraction of the graphe cliques these are defined using replicator dynamics equations
Returns: | cliques: array of shape (self.V), type (np.int) :
|
---|
returns a compact representation of self
Returns: | idx: array of of shape(self.V + 1): :
neighb: array of shape(self.E), concatenated list of neighbors : weights: array of shape(self.E), concatenated list of weights : |
---|
Define the children of each node (stored in self.children)
returns a copy of self
Returns a graph with redundant edges removed: ecah edge (ab) is present ony once in the edge matrix: the correspondng weights are added.
Returns: | the resulting WeightedGraph : |
---|
define the edge and weights array
Returns the degree of the graph vertices.
Returns: | rdegree: (array, type=int, shape=(self.V,)), the right degrees : ldegree: (array, type=int, shape=(self.V,)), the left degrees : |
---|
compute an index for each node: 0 for the leaves, 1 for their parents etc. and maximal for the roots.
Returns: | depth: array of shape (self.V): the depth values of the vertices : |
---|
Returns all the [graph] geodesic distances starting from seed x
- seed (int, >-1, <self.V) or array of shape(p)
- edge(s) from which the distances are computed
Returns: | dg: array of shape (self.V), :
|
---|
Notes
It is mandatory that the graph weights are non-negative
Compute all the geodesic distances starting from seeds
Parameters: | seed= None: array of shape (nbseed), type np.int :
|
---|---|
Returns: | dg array of shape (nbseed, self.V) :
|
Notes
It is mandatory that the graph weights are non-negative. The algorithm proceeds by repeating Dijkstra’s algo for each seed. Floyd’s algo is not used (O(self.V)^3 complexity...)
Sets the graph to be the topological neighbours graph of the three-dimensional coordinates set xyz, in the k-connectivity scheme
Parameters: | xyz: array of shape (self.V, 3) and type np.int, : k = 18: the number of neighbours considered. (6, 18 or 26) : |
---|---|
Returns: | E(int): the number of edges of self : |
To get the number of edges in the graph
To get the number of vertices in the graph
Get the children of a node/each node
Parameters: | v: int, optional :
|
---|---|
Returns: | children: list of int the list of children of node v (if v is provided) :
|
returns the nodes that are children of v as a list
Parameters: | v: int, a node index : |
---|---|
Returns: | desc: list of int, the list of all descendant of the input node : |
To get the graph’s edges
Get the height array
To get the graph’s vertices (as id)
States whether self is connected or not
Identification of the leaves of the forest
Returns: | leaves: bool array of shape(self.V), indicator of the forest’s leaves : |
---|
Returns an indicator of nodes being roots
Returns: | roots, array of shape(self.V, bool), indicator of the forest’s roots : |
---|
Creates the Minimum Spanning Tree of self using Kruskal’s algo. efficient is self is sparse
Returns: | K, WeightedGraph instance: the resulting MST : |
---|
Notes
If self contains several connected components, will have the same number k of connected components
tests whether the given nodes are the leaves of a certain subtree
Parameters: | ids: array of shape (n) that takes values in [0..self.V-1] : custom == False, boolean :
|
---|
Return left incidence matrix
Returns: | left_incid: list :
|
---|
returns the set of neighbors of self as a list of arrays
returns the list of all non-trivial subtrees in the graph Caveat: theis function assumes that the vertices are sorted in a way such that parent[i]>i forall i Only the leaves are listeed, not the subtrees themselves
Returns the indexes of the vertices within the main cc
Returns: | idx: array of shape (sizeof main cc) : |
---|
Return a subforest, where chained branches are collapsed
Returns: | sf, Forest instance, same as self, without any chain : |
---|
Normalize the graph according to the index c Normalization means that the sum of the edges values that go into or out each vertex must sum to 1
Parameters: | c=0 in {0, 1, 2}, optional: index that designates the way :
|
---|
Notes
Note that when sum_{edge[e, .] == a } D[e] = 0, nothing is performed
Partition the tree according to a cut criterion
Plot the dendrogram associated with self the rank of the data in the dendogram is returned
Parameters: | ax: axis handle, optional : |
---|---|
Returns: | ax, the axis handle : |
Plot the height of the non-leaves nodes
Propagation of a certain labelling from leves to roots Assuming that label is a certain positive integer field this propagates these labels to the parents whenever the children nodes have coherent properties otherwise the parent value is unchanged
Parameters: | label: array of shape(self.V) : |
---|---|
Returns: | label: array of shape(self.V) : |
propagates from leaves to roots some binary property of the nodes so that prop[parents] = logical_and(prop[children])
Parameters: | prop, array of shape(self.V), the input property : |
---|---|
Returns: | prop, array of shape(self.V), the output property field : |
Removes all the edges for which valid==0
Parameters: | valid : (self.E,) array |
---|
Removes trivial edges, i.e. edges that are (vv)-like self.weights and self.E are corrected accordingly
Returns: | self.E (int): The number of edges : |
---|
reorder the tree so that the leaves come first then their parents and so on, and the roots are last.
Returns: | order: array of shape(self.V) :
|
---|
Return right incidence matrix
Returns: | right_incid: list :
|
---|
Sets the graph’s edges
Compute the weights of the graph as the distances between the corresponding rows of X, which represents an embdedding of self
Parameters: | X array of shape (self.V, edim), :
|
---|
Compute the weights of the graph as a gaussian function of the distance between the corresponding rows of X, which represents an embdedding of self
Parameters: | X array of shape (self.V, dim) :
sigma=0, float: the parameter of the gaussian function : |
---|
Notes
When sigma == 0, the following value is used: sigma = sqrt(mean(||X[self.edges[:, 0], :]-X[self.edges[:, 1], :]||^2))
Set the height array
Set edge weights
Parameters: | weights: array :
|
---|
Plots the current graph in 2D
Parameters: | X : None or array of shape (self.V, 2)
ax: None or int, optional :
|
---|---|
Returns: | ax: axis handle : |
Notes
This should be used only for small graphs.
idem as partition, but a number of components are supplied instead
Creates a subforest with the vertices for which valid > 0
Parameters: | valid: array of shape (self.V): idicator of the selected nodes : |
---|---|
Returns: | subforest: a new forest instance, with a reduced set of nodes : |
Notes
The children of deleted vertices become their own parent
Creates a subgraph with the vertices for which valid>0 and with the correponding set of edges
Parameters: | valid, array of shape (self.V): nonzero for vertices to be retained : |
---|---|
Returns: | G, WeightedGraph instance, the desired subgraph of self : |
Notes
The vertices are renumbered as [1..p] where p = sum(valid>0) when sum(valid==0) then None is returned
Symmeterize self, modify edges and weights so that self.adjacency becomes the symmetric part of the current self.adjacency.
Return adjacency matrix as coo sparse
Returns: | sp: scipy.sparse matrix instance :
|
---|
Returns the number of hierarchical levels in the tree
Defines the graph as the Voronoi diagram (VD) that links the seeds. The VD is defined using the sample points.
Parameters: | seeds: array of shape (self.V, dim) : samples: array of shape (nsamples, dim) : |
---|
Notes
By default, the weights are a Gaussian function of the distance The implementation is not optimal
Performs a voronoi labelling of the graph
Parameters: | seed: array of shape (nseeds), type (np.int), :
|
---|---|
Returns: | labels: array of shape (self.V) the labelling of the vertices : |
Agglomerative function based on a (hopefully sparse) similarity graph
Parameters: | G the input graph : |
---|---|
Returns: | t a weightForest structure that represents the dendrogram of the data : |
Agglomerative function based on a (hopefully sparse) similarity graph
Parameters: | G the input graph : stop: float :
qmax: int, optional :
verbose : bool, optional
|
---|---|
Returns: | u: array of shape (G.V) :
cost: array of shape (G.V (?)) :
|
Modifies the graph K to merge nodes i and j into nodes k
The similarity values are weighted averaged, where pop[i] and pop[j] yield the relative weights. this is used in average_link_slow (deprecated)
Agglomerative function based on a topology-defining graph and a feature matrix.
Parameters: | G : graph
feature : array of shape (G.V,dim_feature)
verbose : bool, optional
|
---|---|
Returns: | t : WeightedForest instance
|
Notes
When G has more than 1 connected component, t is no longer a tree. This case is handled cleanly now
Agglomerative function based on a field structure
Parameters: | F the input field (graph+feature) : stop: float, optional :
qmax: int, optional :
verbose : bool, optional
|
---|---|
Returns: | u: array of shape (F.V) :
cost array of shape (F.V - 1) :
|
Notes
See ward_quick_segment for more information
Caveat : only approximate
Agglomerative function based on a topology-defining graph and a feature matrix.
Parameters: | G : graph instance
feature: array of shape (G.V,dim_feature) :
verbose : bool, optional
|
---|---|
Returns: | t: weightForest instance, :
Notes : —- : Hopefully a quicker version : A euclidean distance is used in the feature space : Caveat : only approximate |
Agglomerative function based on a topology-defining graph and a feature matrix.
Parameters: | G: labs.graph.WeightedGraph instance :
feature array of shape (G.V,dim_feature) :
stop1 : int or float, optional
qmax : int, optional
verbose : bool, optional
|
---|---|
Returns: | u: array of shape (G.V) :
cost: array of shape (G.V - 1) :
|
Notes
Hopefully a quicker version
A euclidean distance is used in the feature space
Caveat : only approximate
Agglomerative function based on a topology-defining graph and a feature matrix.
Parameters: | G : graph object
feature : array of shape (G.V,dim_feature)
stop : int or float, optional
qmax : int, optional
verbose : bool, optional
|
---|---|
Returns: | u: array of shape (G.V): :
cost: array of shape (G.V - 1) :
|
Notes
A euclidean distance is used in the feature space
Caveat : when the number of cc in G (nbcc) is greter than qmax, u contains nbcc values, not qmax !