RDKit
Open-source cheminformatics and machine learning.
RDPickers::HierarchicalClusterPicker Class Reference

Diversity picker based on hierarchical clustering.

#include <HierarchicalClusterPicker.h>

Inheritance diagram for RDPickers::HierarchicalClusterPicker:
RDPickers::DistPicker

Public Types

enum  ClusterMethod {
  WARD = 1, SLINK = 2, CLINK = 3, UPGMA = 4,
  MCQUITTY = 5, GOWER = 6, CENTROID = 7
}
 The type of hierarchical clustering algorithm to use.
 

Public Member Functions

 HierarchicalClusterPicker (ClusterMethod clusterMethod)
 Constructor - takes a ClusterMethod as an argument.
 
RDKit::INT_VECT pick (const double *distMat, unsigned int poolSize, unsigned int pickSize) const
 This is the function that does the picking.
 
RDKit::VECT_INT_VECT cluster (const double *distMat, unsigned int poolSize, unsigned int pickSize) const
 This is the function that does the clustering of the items - used by the picker.
 
- Public Member Functions inherited from RDPickers::DistPicker
 DistPicker ()
 Default constructor.
 
virtual ~DistPicker ()
 

Detailed Description

Diversity picker based on hierarchical clustering.

This class inherits from DistPicker because it uses the distance matrix for diversity picking. The clustering itself is done using the Murtagh code in $RDBASE/Code/ML/Cluster/Murtagh/

Definition at line 24 of file HierarchicalClusterPicker.h.

Member Enumeration Documentation

The type of hierarchical clustering algorithm to use.

Enumerator
WARD 
SLINK 
CLINK 
UPGMA 
MCQUITTY 
GOWER 
CENTROID 

Definition at line 28 of file HierarchicalClusterPicker.h.

Constructor & Destructor Documentation

RDPickers::HierarchicalClusterPicker::HierarchicalClusterPicker ( ClusterMethod  clusterMethod)
inline explicit

Constructor - takes a ClusterMethod as an argument.

Sets the hierarchical clustering method.

Definition at line 42 of file HierarchicalClusterPicker.h.

References cluster(), and pick().

Member Function Documentation

RDKit::VECT_INT_VECT RDPickers::HierarchicalClusterPicker::cluster ( const double *  distMat,
unsigned int  poolSize,
unsigned int  pickSize 
) const

This is the function that does the clustering of the items - used by the picker.

ARGUMENTS:

Parameters
distMat- distance matrix - a vector of doubles. It is assumed that only the lower-triangle elements of the matrix are supplied, in a 1D array
NOTE: this matrix WILL BE ALTERED during the picking
poolSize- the size of the pool to pick the items from. It is assumed that the distance matrix above contains the right number of elements, i.e. poolSize*(poolSize-1)/2
pickSize- the number of clusters to divide the pool into (<= poolSize)
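
The packed layout expected for distMat can be illustrated with a small, self-contained sketch. A lower triangle without the diagonal holds poolSize*(poolSize-1)/2 values. The helper names below (ltmSize, ltmIndex, packLowerTriangle) are hypothetical and not part of RDKit, and the row-by-row ordering (1,0), (2,0), (2,1), (3,0), ... is an assumption about the packing that should be checked against the DistPicker sources before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of entries in a lower triangle that excludes the diagonal.
inline std::size_t ltmSize(std::size_t poolSize) {
  return poolSize * (poolSize - 1) / 2;
}

// Hypothetical helper: position of element (i, j), i != j, in the packed
// array. ASSUMPTION: rows are stored one after another, i.e.
// (1,0), (2,0), (2,1), (3,0), (3,1), (3,2), ...
inline std::size_t ltmIndex(std::size_t i, std::size_t j) {
  assert(i != j);
  if (i < j) std::swap(i, j);  // the matrix is symmetric
  return i * (i - 1) / 2 + j;
}

// Packs a full symmetric n x n matrix (row-major) into the 1D lower triangle.
inline std::vector<double> packLowerTriangle(const std::vector<double> &full,
                                             std::size_t n) {
  assert(full.size() == n * n);
  std::vector<double> packed(ltmSize(n));
  for (std::size_t i = 1; i < n; ++i) {
    for (std::size_t j = 0; j < i; ++j) {
      packed[ltmIndex(i, j)] = full[i * n + j];
    }
  }
  return packed;
}
```

Under these assumptions, the vector returned by packLowerTriangle would be passed as distMat (via its data() pointer), with n as poolSize. Because the documentation notes that the matrix is altered, callers who need the distances afterwards may prefer to pass a copy.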

Referenced by HierarchicalClusterPicker().

RDKit::INT_VECT RDPickers::HierarchicalClusterPicker::pick ( const double *  distMat,
unsigned int  poolSize,
unsigned int  pickSize 
) const
virtual

This is the function that does the picking.

Here is how the algorithm works:
FIX: Supply reference

  • The entire pool is clustered from the distance matrix using one of the hierarchical clustering methods (specified via the constructor).
  • Starting with the individual items in the pool, clusters are merged based on the output of the clustering method.
  • Merging stops when the number of clusters equals the number of picks.
  • For each item in a cluster, the sum of the squared distances to the rest of the items in the cluster is computed. The item with the smallest sum is picked as the representative of the cluster; in effect, this picks the item closest to the centroid of the cluster.

Parameters
distMat- distance matrix - a vector of doubles. It is assumed that only the lower-triangle elements of the matrix are supplied, in a 1D array
NOTE: this matrix WILL BE ALTERED during the picking
poolSize- the size of the pool to pick the items from. It is assumed that the distance matrix above contains the right number of elements, i.e. poolSize*(poolSize-1)/2
pickSize- the number of items to pick from the pool (<= poolSize)
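
The last step of the algorithm described above, choosing one representative per cluster, can be sketched as follows. This is a minimal illustration and not the RDKit implementation: ltmDistance and pickRepresentative are hypothetical helpers, and the packed index i*(i-1)/2 + j (for i > j) is an assumed layout for the lower-triangle array.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: distance between items i and j read from a packed
// lower triangle. ASSUMPTION: element (i, j) with i > j is stored at
// i*(i-1)/2 + j.
inline double ltmDistance(const std::vector<double> &distMat, std::size_t i,
                          std::size_t j) {
  if (i == j) return 0.0;
  if (i < j) std::swap(i, j);
  return distMat[i * (i - 1) / 2 + j];
}

// Hypothetical helper: returns the member of `cluster` whose sum of squared
// distances to the other members is smallest, i.e. the member closest to the
// cluster centroid in the sense described above.
inline int pickRepresentative(const std::vector<double> &distMat,
                              const std::vector<int> &cluster) {
  assert(!cluster.empty());
  int best = cluster.front();
  double bestSum = std::numeric_limits<double>::max();
  for (int candidate : cluster) {
    double sumSq = 0.0;
    for (int other : cluster) {
      const double d = ltmDistance(distMat, static_cast<std::size_t>(candidate),
                                   static_cast<std::size_t>(other));
      sumSq += d * d;
    }
    if (sumSq < bestSum) {  // ties keep the earliest member
      bestSum = sumSq;
      best = candidate;
    }
  }
  return best;
}
```

Applied to each entry of the cluster list returned by cluster(), a helper of this kind would yield one index per cluster, which matches the shape of the result that pick() is documented to return.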

Implements RDPickers::DistPicker.

Referenced by HierarchicalClusterPicker().


The documentation for this class was generated from the following file: