RDKit
Open-source cheminformatics and machine learning.
RDPickers::HierarchicalClusterPicker Class Reference

Diversity picker based on hierarchical clustering.

#include <HierarchicalClusterPicker.h>

Inheritance diagram for RDPickers::HierarchicalClusterPicker:
RDPickers::DistPicker

Public Types

enum  ClusterMethod {
  WARD = 1 , SLINK = 2 , CLINK = 3 , UPGMA = 4 ,
  MCQUITTY = 5 , GOWER = 6 , CENTROID = 7
}
 The type of hierarchical clustering algorithm to use.
 

Public Member Functions

 HierarchicalClusterPicker (ClusterMethod clusterMethod)
 Constructor - takes a ClusterMethod as an argument.
 
RDKit::INT_VECT pick (const double *distMat, unsigned int poolSize, unsigned int pickSize) const override
 This is the function that does the picking.
 
RDKit::VECT_INT_VECT cluster (const double *distMat, unsigned int poolSize, unsigned int pickSize) const
 This is the function that does the clustering of the items - used by the picker.
 
Public Member Functions inherited from RDPickers::DistPicker
 DistPicker ()
 Default constructor.
 
virtual ~DistPicker ()
 

Detailed Description

Diversity picker based on hierarchical clustering.

This class inherits from DistPicker because it uses a distance matrix for diversity picking. The clustering itself is done by the Murtagh code in $RDBASE/Code/ML/Cluster/Murtagh/.

Definition at line 25 of file HierarchicalClusterPicker.h.

Member Enumeration Documentation

◆ ClusterMethod

The type of hierarchical clustering algorithm to use.

Enumerator
WARD 
SLINK 
CLINK 
UPGMA 
MCQUITTY 
GOWER 
CENTROID 

Definition at line 29 of file HierarchicalClusterPicker.h.

Constructor & Destructor Documentation

◆ HierarchicalClusterPicker()

RDPickers::HierarchicalClusterPicker::HierarchicalClusterPicker ( ClusterMethod clusterMethod)
inline explicit

Constructor - takes a ClusterMethod as an argument.

Sets the hierarchical clustering method.

Definition at line 43 of file HierarchicalClusterPicker.h.

Member Function Documentation

◆ cluster()

RDKit::VECT_INT_VECT RDPickers::HierarchicalClusterPicker::cluster ( const double * distMat,
unsigned int poolSize,
unsigned int pickSize ) const

This is the function that does the clustering of the items - used by the picker.

ARGUMENTS:

Parameters
distMat- distance matrix, an array of doubles. Only the lower-triangle elements of the matrix are expected, supplied as a 1D array.
NOTE: this matrix WILL BE ALTERED during the picking
poolSize- the size of the pool to pick the items from. The distance matrix above is assumed to contain the right number of elements, i.e. poolSize*(poolSize-1)/2
pickSize- the number of clusters to divide the pool into (<= poolSize)
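
The packed distance matrix described above can be illustrated with a small, self-contained sketch. The helper names (`lowerTriangleSize`, `lowerTriangleIndex`, `packLowerTriangle`) are hypothetical and not part of RDKit, and the element ordering shown (row by row: (1,0), (2,0), (2,1), ...) is an assumption, since the parameter description does not state the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of entries in the strict lower triangle of an n x n matrix.
std::size_t lowerTriangleSize(std::size_t n) { return n * (n - 1) / 2; }

// Offset of the (i, j) distance, i != j, in the packed 1D array.
// Assumed layout (not stated in the reference): (1,0), (2,0), (2,1), (3,0), ...
std::size_t lowerTriangleIndex(std::size_t i, std::size_t j) {
  assert(i != j);
  if (i < j) std::swap(i, j);
  return i * (i - 1) / 2 + j;
}

// Pack a full symmetric matrix into that layout (diagonal is skipped).
std::vector<double> packLowerTriangle(
    const std::vector<std::vector<double>> &full) {
  std::vector<double> packed;
  packed.reserve(lowerTriangleSize(full.size()));
  for (std::size_t i = 1; i < full.size(); ++i) {
    for (std::size_t j = 0; j < i; ++j) {
      packed.push_back(full[i][j]);
    }
  }
  return packed;
}
```

Under this assumption a pool of 4 items needs 4*3/2 = 6 distances, and a pointer to the packed buffer (for example `packed.data()`) is what would be passed as `distMat`.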

◆ pick()

RDKit::INT_VECT RDPickers::HierarchicalClusterPicker::pick ( const double * distMat,
unsigned int poolSize,
unsigned int pickSize ) const
override virtual

This is the function that does the picking.

Here is how the algorithm works:
FIX: Supply reference

  • The entire pool is clustered from the distance matrix using one of the hierarchical clustering methods (specified via the constructor).
  • Starting with the individual items in the pool, clusters are merged based on the output of the clustering method.
  • Merging stops when the number of clusters equals the number of picks.
  • For each item in a cluster, the sum of the squared distances to the other items in that cluster is computed. The item with the smallest sum is picked as the representative of the cluster; in effect, this picks the item closest to the cluster centroid.
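
The merge-until-enough-clusters idea in the steps above can be sketched as follows. This is a naive single-linkage loop written for illustration only; it is not the Murtagh code the class actually delegates to, and the names `packedDist`, `singleLink`, and `mergeUntil` are hypothetical. The packed lower-triangle layout used by `packedDist` is likewise an assumption.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Clusters = std::vector<std::vector<int>>;

// Distance lookup in a packed strict lower triangle (layout is an assumption).
double packedDist(const std::vector<double> &d, int i, int j) {
  if (i == j) return 0.0;
  if (i < j) std::swap(i, j);
  return d[static_cast<std::size_t>(i) * (i - 1) / 2 + j];
}

// Single-linkage distance between two clusters: their closest pair of members.
double singleLink(const std::vector<double> &d, const std::vector<int> &a,
                  const std::vector<int> &b) {
  double best = std::numeric_limits<double>::infinity();
  for (int x : a) {
    for (int y : b) {
      double v = packedDist(d, x, y);
      if (v < best) best = v;
    }
  }
  return best;
}

// Start from singletons and merge the closest pair until numClusters remain.
Clusters mergeUntil(const std::vector<double> &d, std::size_t poolSize,
                    std::size_t numClusters) {
  Clusters clusters;
  for (std::size_t i = 0; i < poolSize; ++i) {
    clusters.push_back({static_cast<int>(i)});
  }
  while (clusters.size() > numClusters && clusters.size() > 1) {
    std::size_t bestA = 0, bestB = 1;
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t a = 0; a < clusters.size(); ++a) {
      for (std::size_t b = a + 1; b < clusters.size(); ++b) {
        double v = singleLink(d, clusters[a], clusters[b]);
        if (v < best) {
          best = v;
          bestA = a;
          bestB = b;
        }
      }
    }
    // Fold cluster bestB into cluster bestA, then drop bestB.
    clusters[bestA].insert(clusters[bestA].end(), clusters[bestB].begin(),
                           clusters[bestB].end());
    clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bestB));
  }
  return clusters;
}
```

Single linkage was chosen here only because it is the shortest criterion to write down; the other `ClusterMethod` values differ in how the distance between two clusters is defined.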

Parameters
distMat- distance matrix, an array of doubles. Only the lower-triangle elements of the matrix are expected, supplied as a 1D array.
NOTE: this matrix WILL BE ALTERED during the picking
poolSize- the size of the pool to pick the items from. The distance matrix above is assumed to contain the right number of elements, i.e. poolSize*(poolSize-1)/2
pickSize- the number of items to pick from the pool (<= poolSize)
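
The final selection step (choosing, in each cluster, the item with the smallest sum of squared distances to the other members) can be sketched as below. This is an illustrative reading of the description, not the RDKit implementation; `packedDist`, `clusterRepresentative`, and `pickRepresentatives` are hypothetical names, and the packed lower-triangle layout is an assumption.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Distance lookup in a packed strict lower triangle (layout is an assumption).
double packedDist(const std::vector<double> &d, int i, int j) {
  if (i == j) return 0.0;
  if (i < j) std::swap(i, j);
  return d[static_cast<std::size_t>(i) * (i - 1) / 2 + j];
}

// Return the member whose summed squared distance to the other members of
// the cluster is smallest; ties go to the first such member.
int clusterRepresentative(const std::vector<double> &d,
                          const std::vector<int> &cluster) {
  int best = -1;  // stays -1 for an empty cluster
  double bestSum = std::numeric_limits<double>::infinity();
  for (int candidate : cluster) {
    double sum = 0.0;
    for (int other : cluster) {
      double v = packedDist(d, candidate, other);
      sum += v * v;
    }
    if (sum < bestSum) {
      bestSum = sum;
      best = candidate;
    }
  }
  return best;
}

// One representative per cluster, in cluster order.
std::vector<int> pickRepresentatives(
    const std::vector<double> &d,
    const std::vector<std::vector<int>> &clusters) {
  std::vector<int> picks;
  picks.reserve(clusters.size());
  for (const auto &cluster : clusters) {
    picks.push_back(clusterRepresentative(d, cluster));
  }
  return picks;
}
```

For three points on a line at positions 0, 1, and 3, the middle point has the smallest sum of squared distances (1 + 4 = 5) and would be chosen as the representative of a cluster containing all three.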

Implements RDPickers::DistPicker.


The documentation for this class was generated from the following file: