rdkit.ML.Cluster.Murtagh module
Interface to the C++ Murtagh hierarchic clustering code
- rdkit.ML.Cluster.Murtagh.ClusterData(data, nPts, method, isDistData=0)
Clusters the data points passed in and returns the cluster tree.
Arguments
- data: a list of lists (or an array or similar sequence) holding the input data (see the isDistData argument for the exception)
- nPts: the number of points to be used
- method: determines which clustering algorithm is used. The defined constants are WARDS, SLINK, CLINK, and UPGMA.
- isDistData: set this flag when the data passed in is a distance matrix. The distance matrix should be stored symmetrically so that the module's _LookupDist helper can retrieve the results:
for i<j: d_ij = dists[j*(j-1)//2 + i]
Returns
a single-entry list containing the cluster tree
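Example
The index formula given for isDistData can be sketched in plain Python. This is only an illustration of the flattened layout, not code from the module: packed_index, pack_distances, euclidean, and the sample points are hypothetical names introduced here, and the sketch assumes that only the off-diagonal pairs are stored (one entry per pair with i<j).

```python
def packed_index(i, j):
    """Position of d_ij in the flattened layout (hypothetical helper)."""
    if i == j:
        raise ValueError("the diagonal is not stored in this layout")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so order the pair as i < j
    return j * (j - 1) // 2 + i


def pack_distances(points, dist):
    """Flatten all pairwise distances so that entry j*(j-1)//2 + i holds d_ij."""
    dists = []
    for j in range(1, len(points)):
        for i in range(j):
            dists.append(dist(points[i], points[j]))
    return dists


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Four sample points, chosen only for illustration.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
dists = pack_distances(points, euclidean)

# Look up the distance between points 0 and 2 in either argument order.
d_02 = dists[packed_index(2, 0)]
```

A sequence built this way, possibly converted to a NumPy array first, is the kind of input the isDistData flag describes; it would be passed as data with nPts set to len(points). Whether a plain Python list is accepted in that mode is not stated in this page, so treat that as an assumption to verify.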