rdkit.SimDivFilters.rdSimDivPickers module

Module containing the diversity and similarity pickers

class rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod

Bases: enum

CENTROID = rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.CENTROID
CLINK = rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.CLINK
GOWER = rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.GOWER
MCQUITTY = rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.MCQUITTY
SLINK = rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.SLINK
UPGMA = rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.UPGMA
WARD = rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.WARD
names = {'CENTROID': rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.CENTROID, 'CLINK': rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.CLINK, 'GOWER': rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.GOWER, 'MCQUITTY': rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.MCQUITTY, 'SLINK': rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.SLINK, 'UPGMA': rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.UPGMA, 'WARD': rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.WARD}
values = {1: rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.WARD, 2: rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.SLINK, 3: rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.CLINK, 4: rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.UPGMA, 5: rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.MCQUITTY, 6: rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.GOWER, 7: rdkit.SimDivFilters.rdSimDivPickers.ClusterMethod.CENTROID}
class rdkit.SimDivFilters.rdSimDivPickers.HierarchicalClusterPicker((object)self, (ClusterMethod)clusterMethod)

Bases: instance

A class for diversity picking of items using Hierarchical Clustering

C++ signature :

void __init__(_object*,RDPickers::HierarchicalClusterPicker::ClusterMethod)

Cluster((HierarchicalClusterPicker)self, (AtomPairsParameters)distMat, (int)poolSize, (int)pickSize) _vectSt6vectorIiSaIiEE :

Return a list of clusters of items from the pool, computed using hierarchical clustering

Parameters:
  • distMat – 1D distance matrix (only the lower triangle elements)

  • poolSize – number of items in the pool

  • pickSize – number of clusters to divide the pool into

C++ signature :

std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > Cluster(RDPickers::HierarchicalClusterPicker*,boost::python::api::object {lvalue},int,int)
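The distMat arguments take the strict lower triangle of a symmetric distance matrix, flattened into a single sequence. The sketch below assumes the row-by-row ordering d(1,0), d(2,0), d(2,1), d(3,0), ... that is commonly used with RDKit (for example when preparing input for Butina clustering); the helpers lower_triangle and tri_index are hypothetical and not part of rdSimDivPickers.

```python
from typing import Callable, List


def lower_triangle(n: int, dist: Callable[[int, int], float]) -> List[float]:
    """Flatten the strict lower triangle of an n x n distance matrix, row by row.

    Row i contributes d(i, 0), d(i, 1), ..., d(i, i - 1), so the result has
    n * (n - 1) // 2 entries. (Ordering is an assumption of this sketch.)
    """
    return [dist(i, j) for i in range(1, n) for j in range(i)]


def tri_index(i: int, j: int) -> int:
    """Position of d(i, j) in the flattened lower triangle (i != j)."""
    if i < j:
        i, j = j, i  # distances are symmetric, so swap into the lower triangle
    if i == j:
        raise ValueError("the diagonal is not stored")
    return i * (i - 1) // 2 + j


# Four points on a line; the distance is the absolute difference.
points = [0.0, 1.0, 3.0, 6.0]
dmat = lower_triangle(len(points), lambda i, j: abs(points[i] - points[j]))
print(dmat)                   # [1.0, 3.0, 2.0, 6.0, 5.0, 3.0]
print(dmat[tri_index(1, 3)])  # d(3, 1) -> 5.0
```

A list built this way (or a 1-D NumPy array made from it) is the kind of value that would be passed as distMat, e.g. HierarchicalClusterPicker(ClusterMethod.UPGMA).Cluster(dmat, len(points), 2); check the expected ordering against your RDKit version before relying on it.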

Pick((HierarchicalClusterPicker)self, (AtomPairsParameters)distMat, (int)poolSize, (int)pickSize) _vecti :

Pick a diverse subset of items from a pool of items using hierarchical clustering

Parameters:
  • distMat – 1D distance matrix (only the lower triangle elements)

  • poolSize – number of items in the pool

  • pickSize – number of items to pick from the pool

C++ signature :

std::vector<int, std::allocator<int> > Pick(RDPickers::HierarchicalClusterPicker*,boost::python::api::object {lvalue},int,int)
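Pick returns one item per cluster rather than whole clusters. Our understanding of the C++ implementation (an assumption, not stated on this page) is that the pool is first split into pickSize clusters and that each cluster is then represented by the member with the smallest sum of squared distances to the other members, i.e. the member nearest the cluster centre. The hypothetical pick_representatives helper below sketches only that second step.

```python
from typing import Callable, List, Sequence


def pick_representatives(
    clusters: Sequence[Sequence[int]],
    dist: Callable[[int, int], float],
) -> List[int]:
    """Return one item per cluster: the member whose summed squared distance
    to the other members of its cluster is smallest (ties go to the member
    listed first)."""
    picks = []
    for members in clusters:
        best = min(
            members,
            key=lambda i: sum(dist(i, j) ** 2 for j in members if j != i),
        )
        picks.append(best)
    return picks


points = [0.0, 1.0, 2.0, 10.0, 11.0]
clusters = [[0, 1, 2], [3, 4]]
print(pick_representatives(clusters, lambda i, j: abs(points[i] - points[j])))
# [1, 3]
```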

class rdkit.SimDivFilters.rdSimDivPickers.LeaderPicker((object)arg1)

Bases: instance

A class for diversity picking of items using Roger Sayle’s Leader algorithm (analogous to sphere exclusion). The algorithm is currently unpublished, but a description is available in this presentation from the 2019 RDKit UGM: https://github.com/rdkit/UGM_2019/raw/master/Presentations/Sayle_Clustering.pdf

C++ signature :

void __init__(_object*)
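The general idea behind leader (sphere-exclusion) picking can be sketched in a few lines. This is the textbook leader scheme that the description alludes to, not Sayle's optimized algorithm; visiting items in index order, the strict ">" comparison and the leader_pick name are all assumptions of this sketch.

```python
from typing import Callable, List, Optional


def leader_pick(
    pool_size: int,
    dist: Callable[[int, int], float],
    threshold: float,
    pick_size: Optional[int] = None,
) -> List[int]:
    """Greedy leader / sphere-exclusion picking (sketch).

    Items are visited in index order. An item becomes a new leader only if its
    distance to every existing leader exceeds `threshold`; otherwise it lies
    inside an existing leader's sphere and is skipped.
    """
    leaders: List[int] = []
    for i in range(pool_size):
        if pick_size is not None and len(leaders) >= pick_size:
            break
        if all(dist(i, j) > threshold for j in leaders):
            leaders.append(i)
    return leaders


points = [0.0, 0.2, 1.0, 1.1, 3.0]
print(leader_pick(len(points), lambda i, j: abs(points[i] - points[j]), 0.5))
# [0, 2, 4]
```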

LazyBitVectorPick((LeaderPicker)self, (AtomPairsParameters)objects, (int)poolSize, (float)threshold[, (int)pickSize=0[, (AtomPairsParameters)firstPicks=()[, (int)numThreads=1]]]) _vecti :

Pick a subset of items from a collection of bit vectors using Tanimoto distance. The threshold value is a distance (i.e. 1 - similarity). Note that the numThreads argument is currently ignored.

C++ signature :

std::vector<int, std::allocator<int> > LazyBitVectorPick(RDPickers::LeaderPicker*,boost::python::api::object,int,double [,int=0 [,boost::python::api::object=() [,int=1]]])
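Because threshold is a Tanimoto distance, a similarity cut-off has to be converted before it is passed in. The sketch uses Python ints as stand-ins for RDKit bit vectors (it does not call RDKit) to show the arithmetic; tanimoto_distance is a hypothetical helper, and returning 0.0 for two empty vectors is a convention chosen here, not necessarily RDKit's.

```python
def tanimoto_distance(a: int, b: int) -> float:
    """Tanimoto distance between two bit vectors stored as Python ints.

    similarity = |a AND b| / |a OR b|; distance = 1 - similarity.
    """
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen in this sketch for two empty vectors
    return 1.0 - bin(a & b).count("1") / union


fp1 = 0b1101  # bits 0, 2, 3 set
fp2 = 0b1011  # bits 0, 1, 3 set
print(tanimoto_distance(fp1, fp2))  # 1 - 2/4 = 0.5

# A similarity cut-off of 0.65 corresponds to a distance threshold of about 0.35.
sim_cutoff = 0.65
threshold = 1.0 - sim_cutoff
```

With real RDKit fingerprints, the call would then be along the lines of LeaderPicker().LazyBitVectorPick(fps, len(fps), threshold), following the signature above.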

LazyPick((LeaderPicker)self, (AtomPairsParameters)distFunc, (int)poolSize, (float)threshold[, (int)pickSize=0[, (AtomPairsParameters)firstPicks=()[, (int)numThreads=1]]]) _vecti :

Pick a subset of items from a pool of items using the user-provided function to determine distances. Note that the numThreads argument is currently ignored.

C++ signature :

std::vector<int, std::allocator<int> > LazyPick(RDPickers::LeaderPicker*,boost::python::api::object,int,double [,int=0 [,boost::python::api::object=() [,int=1]]])
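LazyPick only needs a callable that maps two pool indices to a distance, so any metric can be used, not just Tanimoto. The sketch below builds such a callable over hypothetical 2-D descriptor vectors using Euclidean distance; make_dist_func and the data are illustrative only. Presumably the threshold is then interpreted in the same units as the values the callable returns.

```python
import math
from typing import Callable, List, Sequence


def make_dist_func(descriptors: List[Sequence[float]]) -> Callable[[int, int], float]:
    """Return a callback with the shape LazyPick expects: two indices in,
    one float distance out."""

    def dist_func(i: int, j: int) -> float:
        return math.dist(descriptors[i], descriptors[j])  # Euclidean distance

    return dist_func


descs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist_func = make_dist_func(descs)
print(dist_func(0, 1))  # 5.0
print(dist_func(0, 2))  # 10.0
```

Following the signature above, this would be passed as LeaderPicker().LazyPick(dist_func, len(descs), 2.0).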

class rdkit.SimDivFilters.rdSimDivPickers.MaxMinPicker((object)arg1)

Bases: instance

A class for diversity picking of items using the MaxMin Algorithm

C++ signature :

void __init__(_object*)
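The core MaxMin idea is greedy: keep adding the item that is farthest from its nearest already-picked item. The sketch below is a plain illustration, not RDKit's optimized lazy implementation. RDKit uses a random number generator (see seed), presumably to choose a starting item when firstPicks is empty; this sketch starts deterministically from item 0 instead, and maxmin_pick is a hypothetical name.

```python
from typing import Callable, List, Sequence


def maxmin_pick(
    pool_size: int,
    dist: Callable[[int, int], float],
    pick_size: int,
    first_picks: Sequence[int] = (),
) -> List[int]:
    """Greedy MaxMin selection (sketch).

    Starts from first_picks (or item 0 if none are given), then repeatedly adds
    the unpicked item whose distance to its nearest picked item is largest.
    """
    picks = list(first_picks) or [0]
    picked = set(picks)
    # min_dist[k]: distance from item k to the closest item picked so far
    min_dist = [min(dist(k, p) for p in picks) for k in range(pool_size)]
    while len(picks) < min(pick_size, pool_size):
        best = max(
            (k for k in range(pool_size) if k not in picked),
            key=lambda k: min_dist[k],
        )
        picks.append(best)
        picked.add(best)
        # Only the new pick can lower each item's nearest-pick distance.
        for k in range(pool_size):
            min_dist[k] = min(min_dist[k], dist(k, best))
    return picks


points = [0.0, 1.0, 2.0, 5.0, 9.0]
print(maxmin_pick(len(points), lambda i, j: abs(points[i] - points[j]), 3))
# [0, 4, 3]
```

With RDKit, the corresponding call would be MaxMinPicker().LazyPick(dist_func, poolSize, pickSize), optionally with firstPicks and seed as documented below.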

LazyBitVectorPick((MaxMinPicker)self, (AtomPairsParameters)objects, (int)poolSize, (int)pickSize[, (AtomPairsParameters)firstPicks=()[, (int)seed=-1[, (AtomPairsParameters)useCache=None]]]) _vecti :

Pick a subset of items from a pool of bit vectors using the MaxMin Algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604).

Parameters:
  • objects – a sequence of the bit vectors that should be picked from

  • poolSize – number of items in the pool

  • pickSize – number of items to pick from the pool

  • firstPicks – (optional) the first items to be picked (seeds the list)

  • seed – (optional) seed for the random number generator

  • useCache – IGNORED

C++ signature :

std::vector<int, std::allocator<int> > LazyBitVectorPick(RDPickers::MaxMinPicker*,boost::python::api::object,int,int [,boost::python::api::object=() [,int=-1 [,boost::python::api::object=None]]])

LazyBitVectorPickWithThreshold((MaxMinPicker)self, (AtomPairsParameters)objects, (int)poolSize, (int)pickSize, (float)threshold[, (AtomPairsParameters)firstPicks=()[, (int)seed=-1]]) tuple :

Pick a subset of items from a pool of bit vectors using the MaxMin Algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604).

Parameters:
  • objects – a sequence of the bit vectors that should be picked from

  • poolSize – number of items in the pool

  • pickSize – number of items to pick from the pool

  • threshold – stop picking when the distance goes below this value

  • firstPicks – (optional) the first items to be picked (seeds the list)

  • seed – (optional) seed for the random number generator

C++ signature :

boost::python::tuple LazyBitVectorPickWithThreshold(RDPickers::MaxMinPicker*,boost::python::api::object,int,int,double [,boost::python::api::object=() [,int=-1]])
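The threshold variants add an early stop: once the best remaining candidate is closer than threshold to some already-picked item, picking ends even if fewer than pickSize items have been chosen. The C++ signature shows a tuple return value whose contents are not described on this page, so the hypothetical sketch below simply returns the picks together with the last MaxMin distance it accepted, as an illustration rather than a description of RDKit's tuple.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def maxmin_pick_with_threshold(
    pool_size: int,
    dist: Callable[[int, int], float],
    pick_size: int,
    threshold: float,
    first_picks: Sequence[int] = (),
) -> Tuple[List[int], Optional[float]]:
    """MaxMin selection that stops once the best candidate's nearest-pick
    distance drops below `threshold` (sketch).

    Returns (picks, last_accepted_distance); the second element is None if
    nothing was added beyond the starting picks.
    """
    picks = list(first_picks) or [0]
    picked = set(picks)
    min_dist = [min(dist(k, p) for p in picks) for k in range(pool_size)]
    last: Optional[float] = None
    while len(picks) < min(pick_size, pool_size):
        best = max(
            (k for k in range(pool_size) if k not in picked),
            key=lambda k: min_dist[k],
        )
        if min_dist[best] < threshold:
            break  # every remaining item is closer than the threshold
        last = min_dist[best]
        picks.append(best)
        picked.add(best)
        for k in range(pool_size):
            min_dist[k] = min(min_dist[k], dist(k, best))
    return picks, last


points = [0.0, 1.0, 2.0, 5.0, 9.0]
picks, last = maxmin_pick_with_threshold(
    len(points), lambda i, j: abs(points[i] - points[j]), 5, 3.0
)
print(picks, last)  # [0, 4, 3] 4.0
```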

LazyPick((MaxMinPicker)self, (AtomPairsParameters)distFunc, (int)poolSize, (int)pickSize[, (AtomPairsParameters)firstPicks=()[, (int)seed=-1[, (AtomPairsParameters)useCache=None]]]) _vecti :

Pick a subset of items from a pool of items using the MaxMin Algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604).

NOTE: the implementation caches distance values, so the client code does not need to do so; indeed, it should not.

Parameters:
  • distFunc – a function that takes two indices and returns the distance between those two points

  • poolSize – number of items in the pool

  • pickSize – number of items to pick from the pool

  • firstPicks – (optional) the first items to be picked (seeds the list)

  • seed – (optional) seed for the random number generator

  • useCache – IGNORED

C++ signature :

std::vector<int, std::allocator<int> > LazyPick(RDPickers::MaxMinPicker*,boost::python::api::object,int,int [,boost::python::api::object=() [,int=-1 [,boost::python::api::object=None]]])
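Because the picker caches distance values itself, distFunc should be a plain, stateless function. To show what such a cache saves, the sketch below illustrates the general idea of memoizing a symmetric distance by unordered index pair. It is not RDKit's internal data structure, PairCache is a hypothetical name, and per the NOTE you should not wrap your own distFunc like this before passing it to LazyPick.

```python
from typing import Callable, Dict, Tuple


class PairCache:
    """Memoize a symmetric distance function by unordered index pair."""

    def __init__(self, dist: Callable[[int, int], float]) -> None:
        self._dist = dist
        self._cache: Dict[Tuple[int, int], float] = {}
        self.misses = 0  # how many times the wrapped function was evaluated

    def __call__(self, i: int, j: int) -> float:
        key = (i, j) if i <= j else (j, i)  # d(i, j) == d(j, i)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._dist(i, j)
        return self._cache[key]


cached = PairCache(lambda i, j: abs(i - j) * 0.5)
values = [cached(1, 4), cached(4, 1), cached(1, 4), cached(2, 3)]
print(values, cached.misses)  # [1.5, 1.5, 1.5, 0.5] 2
```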

LazyPickWithThreshold((MaxMinPicker)self, (AtomPairsParameters)distFunc, (int)poolSize, (int)pickSize, (float)threshold[, (AtomPairsParameters)firstPicks=()[, (int)seed=-1]]) tuple :

Pick a subset of items from a pool of items using the MaxMin Algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604).

NOTE: the implementation caches distance values, so the client code does not need to do so; indeed, it should not.

Parameters:
  • distFunc – a function that takes two indices and returns the distance between those two points

  • poolSize – number of items in the pool

  • pickSize – number of items to pick from the pool

  • threshold – stop picking when the distance goes below this value

  • firstPicks – (optional) the first items to be picked (seeds the list)

  • seed – (optional) seed for the random number generator

C++ signature :

boost::python::tuple LazyPickWithThreshold(RDPickers::MaxMinPicker*,boost::python::api::object,int,int,double [,boost::python::api::object=() [,int=-1]])

Pick((MaxMinPicker)self, (AtomPairsParameters)distMat, (int)poolSize, (int)pickSize[, (AtomPairsParameters)firstPicks=()[, (int)seed=-1]]) _vecti :

Pick a subset of items from a pool of items using the MaxMin Algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604).

Parameters:
  • distMat – 1D distance matrix (only the lower triangle elements)

  • poolSize – number of items in the pool

  • pickSize – number of items to pick from the pool

  • firstPicks – (optional) the first items to be picked (seeds the list)

  • seed – (optional) seed for the random number generator

C++ signature :

std::vector<int, std::allocator<int> > Pick(RDPickers::MaxMinPicker*,boost::python::api::object,int,int [,boost::python::api::object=() [,int=-1]])