RDKit
Open-source cheminformatics and machine learning.
RDPickers::MaxMinPicker Class Reference

Implements the MaxMin algorithm for picking a subset of items from a pool.

#include <MaxMinPicker.h>

Inheritance diagram for RDPickers::MaxMinPicker:
RDPickers::DistPicker

Public Member Functions

 MaxMinPicker ()
 Default Constructor.
 
template<typename T >
RDKit::INT_VECT lazyPick (T &func, unsigned int poolSize, unsigned int pickSize) const
 Contains the implementation for a lazy MaxMin diversity picker.
 
template<typename T >
RDKit::INT_VECT lazyPick (T &func, unsigned int poolSize, unsigned int pickSize, const RDKit::INT_VECT &firstPicks, int seed=-1) const
 
template<typename T >
RDKit::INT_VECT lazyPick (T &func, unsigned int poolSize, unsigned int pickSize, const RDKit::INT_VECT &firstPicks, int seed, double &threshold) const
 
RDKit::INT_VECT pick (const double *distMat, unsigned int poolSize, unsigned int pickSize, RDKit::INT_VECT firstPicks, int seed=-1) const
 Contains the implementation for the MaxMin diversity picker.
 
RDKit::INT_VECT pick (const double *distMat, unsigned int poolSize, unsigned int pickSize) const
 
Public Member Functions inherited from RDPickers::DistPicker
 DistPicker ()
 Default constructor.
 
virtual ~DistPicker ()
 

Detailed Description

Implements the MaxMin algorithm for picking a subset of items from a pool.

This class inherits from DistPicker and implements a picking strategy aimed at diversity. See the documentation for the pick() member function for details of the algorithm.

Definition at line 46 of file MaxMinPicker.h.

Constructor & Destructor Documentation

RDPickers::MaxMinPicker::MaxMinPicker ( )
inline

Default Constructor.

Definition at line 51 of file MaxMinPicker.h.

Member Function Documentation

template<typename T >
RDKit::INT_VECT RDPickers::MaxMinPicker::lazyPick ( T &  func,
unsigned int  poolSize,
unsigned int  pickSize 
) const

Contains the implementation for a lazy MaxMin diversity picker.

See the documentation for the pick() method for details about the algorithm.

Parameters
func- a function (or functor) taking two unsigned ints as arguments and returning the distance (as a double) between those two elements.
poolSize- the size of the pool to pick the items from. No distance matrix is passed to this function; func is assumed to return a distance for any pair of indices below poolSize.
pickSize- the number of items to pick from the pool (<= poolSize)
firstPicks- (optional) the first items in the pick list
seed- (optional) seed for the random number generator
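
To illustrate the kind of callable that func is expected to be, here is a minimal standalone sketch. It does not call RDKit and is not the library's implementation: CountingPointDistance and lazyMaxMinSketch are hypothetical names, the sketch always seeds the pick list with item 0 (ignoring firstPicks and seed), and it assumes non-negative distances.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical functor (not part of RDKit): distance between 1-D points,
// computed only when requested. It also counts how often it is called.
struct CountingPointDistance {
  std::vector<double> coords;
  unsigned int calls;
  explicit CountingPointDistance(const std::vector<double> &c)
      : coords(c), calls(0) {}
  double operator()(unsigned int i, unsigned int j) {
    ++calls;
    return std::fabs(coords[i] - coords[j]);
  }
};

// Hypothetical stand-in with the same argument order as the documented
// three-argument lazyPick. Simplification: the first pick is always item 0.
template <typename T>
std::vector<int> lazyMaxMinSketch(T &func, unsigned int poolSize,
                                  unsigned int pickSize) {
  assert(pickSize <= poolSize);
  std::vector<int> picks;
  if (pickSize == 0) return picks;
  // minDist[c]: smallest distance from candidate c to any item picked so far
  std::vector<double> minDist(poolSize, std::numeric_limits<double>::max());
  std::vector<bool> picked(poolSize, false);
  unsigned int last = 0;
  picked[last] = true;
  picks.push_back(static_cast<int>(last));
  while (picks.size() < pickSize) {
    unsigned int best = 0;
    double bestDist = -1.0;  // assumes distances are never negative
    for (unsigned int c = 0; c < poolSize; ++c) {
      if (picked[c]) continue;
      // Only the distance to the most recent pick is requested; distances
      // to earlier picks are already folded into minDist[c].
      double d = func(c, last);
      if (d < minDist[c]) minDist[c] = d;
      if (minDist[c] > bestDist) {
        bestDist = minDist[c];
        best = c;
      }
    }
    picked[best] = true;
    picks.push_back(static_cast<int>(best));
    last = best;
  }
  return picks;
}
```

For points at 0, 1, 5, 6 and 10, picking 3 items with this sketch gives indices 0, 4, 2 and evaluates 7 distances, whereas a full lower-triangle matrix for 5 items holds 10. How many evaluations the real lazyPick performs is not stated here.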

Definition at line 290 of file MaxMinPicker.h.

Referenced by lazyPick().

template<typename T >
RDKit::INT_VECT RDPickers::MaxMinPicker::lazyPick ( T &  func,
unsigned int  poolSize,
unsigned int  pickSize,
const RDKit::INT_VECT &  firstPicks,
int  seed = -1 
) const

Definition at line 280 of file MaxMinPicker.h.

References lazyPick().

template<typename T >
RDKit::INT_VECT RDPickers::MaxMinPicker::lazyPick ( T &  func,
unsigned int  poolSize,
unsigned int  pickSize,
const RDKit::INT_VECT &  firstPicks,
int  seed,
double &  threshold 
) const

RDKit::INT_VECT RDPickers::MaxMinPicker::pick ( const double *  distMat,
unsigned int  poolSize,
unsigned int  pickSize,
RDKit::INT_VECT  firstPicks,
int  seed = -1 
) const
inline

Contains the implementation for the MaxMin diversity picker.

Here is how the picking algorithm works; refer to Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604 for more detail:

A subset of k items is to be selected from a pool containing N molecules. Then the MaxMin method is as follows:

  1. Initialise Subset with some appropriately chosen seed compound and set x = 1.
  2. For each of the N-x remaining compounds in Dataset calculate its dissimilarity with each of the x compounds in Subset and retain the smallest of these x dissimilarities for each compound in Dataset.
  3. Select the molecule from Dataset with the largest value for the smallest dissimilarity calculated in Step 2 and transfer it to Subset.
  4. Set x = x + 1 and return to Step 2 if x < k.

    Parameters
    distMat- distance matrix - an array of doubles. It is assumed that only the lower-triangle elements of the matrix are supplied, in a 1D array
    poolSize- the size of the pool to pick the items from. It is assumed that the distance matrix above contains the right number of elements; i.e. poolSize*(poolSize-1)/2
    pickSize- the number of items to pick from the pool (<= poolSize)
    firstPicks- indices of the items used to seed the pick set.
    seed- (optional) seed for the random number generator
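
The four steps above, applied to a packed lower-triangle array, can be sketched as follows. This is a standalone illustration and not the RDKit implementation: lowerTriIndex and maxMinSketch are hypothetical names, a single firstPick index replaces the firstPicks vector and seed, and the packing order (row i holds the distances to items 0..i-1, with no diagonal) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Position of distance (i, j) in a packed lower triangle without the
// diagonal. Rows 0..i-1 hold i*(i-1)/2 entries in total, so entry (i, j)
// with i > j sits at i*(i-1)/2 + j. The layout is assumed, see lead-in.
inline std::size_t lowerTriIndex(std::size_t i, std::size_t j) {
  assert(i != j);
  if (i < j) std::swap(i, j);
  return i * (i - 1) / 2 + j;
}

// Hypothetical MaxMin sketch over a packed distance array.
std::vector<int> maxMinSketch(const std::vector<double> &distMat,
                              unsigned int poolSize, unsigned int pickSize,
                              unsigned int firstPick) {
  assert(pickSize >= 1 && pickSize <= poolSize && firstPick < poolSize);
  assert(distMat.size() == std::size_t(poolSize) * (poolSize - 1) / 2);
  // Step 1: initialise Subset with the seed item.
  std::vector<int> picks(1, static_cast<int>(firstPick));
  std::vector<bool> picked(poolSize, false);
  picked[firstPick] = true;
  // minDist[c]: smallest dissimilarity between c and any item in Subset
  std::vector<double> minDist(poolSize, std::numeric_limits<double>::max());
  unsigned int last = firstPick;
  while (picks.size() < pickSize) {  // Step 4: repeat until k items
    unsigned int best = 0;
    double bestDist = -1.0;  // assumes distances are never negative
    for (unsigned int c = 0; c < poolSize; ++c) {
      if (picked[c]) continue;
      // Step 2: keep the smallest dissimilarity to Subset for candidate c.
      double d = distMat[lowerTriIndex(c, last)];
      if (d < minDist[c]) minDist[c] = d;
      // Step 3: remember the candidate with the largest such minimum.
      if (minDist[c] > bestDist) {
        bestDist = minDist[c];
        best = c;
      }
    }
    picked[best] = true;
    picks.push_back(static_cast<int>(best));
    last = best;
  }
  return picks;
}
```

As a usage example, five points at 0, 1, 5, 6 and 10 give the packed array {1, 5, 4, 6, 5, 1, 10, 9, 5, 4} (10 = 5*4/2 entries). Calling maxMinSketch on it with poolSize 5, pickSize 3 and firstPick 0 returns indices 0, 4, 2.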

Definition at line 120 of file MaxMinPicker.h.

References CHECK_INVARIANT.

RDKit::INT_VECT RDPickers::MaxMinPicker::pick ( const double *  distMat,
unsigned int  poolSize,
unsigned int  pickSize 
) const
inline virtual

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Implements RDPickers::DistPicker.

Definition at line 132 of file MaxMinPicker.h.


The documentation for this class was generated from the following file: