RDKit
Open-source cheminformatics and machine learning.
RDInfoTheory Namespace Reference

Class used to rank bits based on a specified measure of information.

Classes

class  BitCorrMatGenerator
 
class  InfoBitRanker
 

Typedefs

typedef std::vector<RDKit::USHORT> USHORT_VECT

typedef std::vector<USHORT_VECT> VECT_USHORT_VECT
 

Functions

template<class T >
double ChiSquare (T *dMat, long int dim1, long int dim2)
 
template<class T >
double InfoEntropy (T *tPtr, long int dim)
 
template<class T >
double InfoEntropyGain (T *dMat, long int dim1, long int dim2)
 

Detailed Description

Class used to rank bits based on a specified measure of information.

Basically a primitive mimic of the CombiChem "signal" functionality. To use:

  • create an instance of this class
  • loop over the fingerprints in the dataset, calling the accumulateVotes method for each one
  • call getTopN to get the top N ranked bits
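The three steps above can be sketched in pure Python. `ToyBitRanker`, `accumulate_votes` and `get_top_n` are hypothetical names for this illustration, not the RDKit wrapper API. Fingerprints are passed as lists of set-bit indices, and only the default information-gain measure is implemented, assuming the usual definition: the entropy of the class distribution minus the size-weighted entropies of the bit-set and bit-clear subsets. The data are the four vectors and activities from the Python session below.

```python
import math


def entropy(counts):
    """Shannon entropy, in bits, of a sequence of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


class ToyBitRanker:
    """Hypothetical stand-in for InfoBitRanker: info-gain ranking only."""

    def __init__(self, n_bits, n_classes):
        self.n_bits = n_bits
        # hits[b][c] = number of class-c fingerprints with bit b set
        self.hits = [[0] * n_classes for _ in range(n_bits)]
        self.class_counts = [0] * n_classes

    def accumulate_votes(self, on_bits, label):
        """Record one fingerprint, given as the indices of its set bits."""
        self.class_counts[label] += 1
        for b in on_bits:
            self.hits[b][label] += 1

    def gain(self, b):
        """Information gain from splitting the accumulated data on bit b."""
        on = self.hits[b]
        off = [n - h for n, h in zip(self.class_counts, on)]
        total = sum(self.class_counts)
        if total == 0:
            return 0.0
        remainder = (sum(on) * entropy(on) + sum(off) * entropy(off)) / total
        return entropy(self.class_counts) - remainder

    def get_top_n(self, n):
        """(bit, gain, hits in class 0, hits in class 1, ...), best first."""
        scores = {b: self.gain(b) for b in range(self.n_bits)}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return [(b, scores[b], *self.hits[b]) for b in ranked[:n]]


# Step 1: a ranker for 4-bit fingerprints and 2 activity classes.
ranker = ToyBitRanker(4, 2)
# Step 2: accumulate votes, one fingerprint at a time.
for bits, act in zip(["0001", "0101", "0010", "1110"], [0, 0, 1, 1]):
    ranker.accumulate_votes([i for i, ch in enumerate(bits) if ch == "1"], act)
# Step 3: report the top 3 bits.
for bit, gain, n0, n1 in ranker.get_top_n(3):
    print(bit, "%.3f" % gain, n0, n1)
```

This prints the same three rows as the wrapper's default ranker, except that the tied bits 2 and 3 come out in the opposite order: Python's sort is stable, so the lower bit index wins a tie here, whereas the RDKit ranker evidently breaks ties differently.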

Sample usage and results from the Python wrapper. Here's a small set of vectors and their activity classes:

    >>> for i, bv in enumerate(bvs): print(bv.ToBitString(), acts[i])
    ...
    0001 0
    0101 0
    0010 1
    1110 1

Default ranker, using infogain:

    >>> ranker = InfoBitRanker(4, 2)
    >>> for i, bv in enumerate(bvs): ranker.AccumulateVotes(bv, acts[i])
    ...
    >>> for bit, gain, n0, n1 in ranker.GetTopN(3): print(int(bit), '%.3f' % gain, int(n0), int(n1))
    ...
    3 1.000 2 0
    2 1.000 0 2
    0 0.311 0 1
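To see where the last row comes from: bit 0 is set in a single vector (class 1) and clear in three (two of class 0, one of class 1). Assuming the standard base-2 information-gain definition, which reproduces the numbers above, the arithmetic is as follows. Bits 2 and 3 each split the classes perfectly, so both conditional entropies are 0 and their gain equals the full class entropy of 1 bit.

```latex
\begin{aligned}
H(\mathrm{class}) &= -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1\\
H(\mathrm{class}\mid \mathrm{bit\ 0\ set}) &= -\tfrac{1}{1}\log_2\tfrac{1}{1} = 0\\
H(\mathrm{class}\mid \mathrm{bit\ 0\ clear}) &= -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} \approx 0.918\\
\mathrm{gain}_{\mathrm{bit\ 0}} &= 1 - \left(\tfrac{1}{4}\cdot 0 + \tfrac{3}{4}\cdot 0.918\right) \approx 0.311
\end{aligned}
```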

Using the biased infogain:

    >>> ranker = InfoBitRanker(4, 2, InfoTheory.InfoType.BIASENTROPY)
    >>> ranker.SetBiasList((1,))
    >>> for i, bv in enumerate(bvs): ranker.AccumulateVotes(bv, acts[i])
    ...
    >>> for bit, gain, n0, n1 in ranker.GetTopN(3): print(int(bit), '%.3f' % gain, int(n0), int(n1))
    ...
    2 1.000 0 2
    0 0.311 0 1
    1 0.000 1 1
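With the bias list set to class 1, bit 3 (set only in class-0 vectors) drops out of the ranking, while bit 1 (set equally often in both classes) survives with zero gain. The filter below is one rule consistent with that output: a bit is kept unless some class outside the bias list sets it in a larger fraction of its members than the best biased class does. `passes_bias` is a hypothetical helper, and its tie handling is inferred from this example rather than taken from the RDKit source.

```python
def passes_bias(hits, class_counts, bias_classes):
    """Hypothetical bias filter: keep a bit unless some class outside
    bias_classes sets it in a larger fraction of its members than the
    best class inside bias_classes does (ties are kept)."""
    # Fraction of each class's fingerprints that have this bit set.
    fracs = [h / n if n else 0.0 for h, n in zip(hits, class_counts)]
    best_biased = max(fracs[c] for c in bias_classes)
    return all(f <= best_biased
               for c, f in enumerate(fracs) if c not in bias_classes)


# (n0, n1) hit counts per bit from the session above; each class has 2 vectors.
hit_counts = {0: (0, 1), 1: (1, 1), 2: (0, 2), 3: (2, 0)}
kept = [b for b, hits in hit_counts.items()
        if passes_bias(hits, (2, 2), bias_classes=(1,))]
print(kept)  # [0, 1, 2]
```

The surviving bits are then ranked by information gain exactly as in the unbiased case.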

A chi squared ranker is also available:

    >>> ranker = InfoBitRanker(4, 2, InfoTheory.InfoType.CHISQUARE)
    >>> for i, bv in enumerate(bvs): ranker.AccumulateVotes(bv, acts[i])
    ...
    >>> for bit, gain, n0, n1 in ranker.GetTopN(3): print(int(bit), '%.3f' % gain, int(n0), int(n1))
    ...
    3 4.000 2 0
    2 4.000 0 2
    0 1.333 0 1
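These values are consistent with Pearson's statistic on the 2×2 table of bit state versus class, with expected counts E = (row total)(column total)/N and N = 4. For bit 0 the set row holds (0, 1) and the clear row (2, 1), giving expected counts of 0.5 in the set row and 1.5 in the clear row; for bit 3 the table is (2, 0) / (0, 2) and every expected count is 1:

```latex
\begin{aligned}
\chi^2_{\mathrm{bit\ 0}} &= \frac{(0-0.5)^2}{0.5} + \frac{(1-0.5)^2}{0.5} + \frac{(2-1.5)^2}{1.5} + \frac{(1-1.5)^2}{1.5} = 0.5 + 0.5 + \tfrac{1}{6} + \tfrac{1}{6} \approx 1.333\\
\chi^2_{\mathrm{bit\ 3}} &= \frac{(2-1)^2}{1} + \frac{(0-1)^2}{1} + \frac{(0-1)^2}{1} + \frac{(2-1)^2}{1} = 4
\end{aligned}
```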

As is a biased chi squared:

    >>> ranker = InfoBitRanker(4, 2, InfoTheory.InfoType.BIASCHISQUARE)
    >>> ranker.SetBiasList((1,))
    >>> for i, bv in enumerate(bvs): ranker.AccumulateVotes(bv, acts[i])
    ...
    >>> for bit, gain, n0, n1 in ranker.GetTopN(3): print(int(bit), '%.3f' % gain, int(n0), int(n1))
    ...
    2 4.000 0 2
    0 1.333 0 1
    1 0.000 1 1

Typedef Documentation

typedef std::vector<RDKit::USHORT> RDInfoTheory::USHORT_VECT

Definition at line 83 of file InfoBitRanker.h.

typedef std::vector<USHORT_VECT> RDInfoTheory::VECT_USHORT_VECT

Definition at line 84 of file InfoBitRanker.h.

Function Documentation

template<class T >
double RDInfoTheory::ChiSquare (T *dMat, long int dim1, long int dim2)

Definition at line 14 of file InfoGainFuncs.h.
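The header documents only the signature. Below is a minimal Python sketch of a Pearson chi-squared statistic, assuming `dMat` is a `dim1` × `dim2` contingency table stored row-major, with rows as variable states (for example bit set / bit clear) and columns as classes. `chi_square` is an illustrative name, not a wrapper function; the sketch reproduces the 4.000 and 1.333 values from the chi-squared ranker example.

```python
def chi_square(d_mat, dim1, dim2):
    """Pearson chi-squared statistic of a dim1 x dim2 contingency table
    stored row-major in the flat sequence d_mat (assumed layout)."""
    row_totals = [sum(d_mat[i * dim2:(i + 1) * dim2]) for i in range(dim1)]
    col_totals = [sum(d_mat[i * dim2 + j] for i in range(dim1))
                  for j in range(dim2)]
    total = sum(row_totals)
    if total == 0:
        return 0.0
    chi2 = 0.0
    for i in range(dim1):
        for j in range(dim2):
            expected = row_totals[i] * col_totals[j] / total
            if expected:  # skip empty rows/columns to avoid dividing by zero
                chi2 += (d_mat[i * dim2 + j] - expected) ** 2 / expected
    return chi2


# Rows: bit set, bit clear; columns: class 0, class 1.
print("%.3f" % chi_square([2, 0, 0, 2], 2, 2))  # bit 3 -> 4.000
print("%.3f" % chi_square([0, 1, 2, 1], 2, 2))  # bit 0 -> 1.333
```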

template<class T >
double RDInfoTheory::InfoEntropy (T *tPtr, long int dim)

Definition at line 67 of file InfoGainFuncs.h.

Referenced by InfoEntropyGain().
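A minimal Python sketch of the Shannon entropy of `dim` counts. Base-2 logarithms are an assumption, chosen because they make an evenly split two-class dataset come out to exactly 1.000, as in the ranker examples; `info_entropy` is an illustrative name.

```python
import math


def info_entropy(t_ptr, dim):
    """Shannon entropy, in bits, of the first dim counts in t_ptr."""
    counts = t_ptr[:dim]
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing (the limit of p*log p as p -> 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


print(info_entropy([2, 2], 2))           # 1.0
print("%.3f" % info_entropy([2, 1], 2))  # 0.918
```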

template<class T >
double RDInfoTheory::InfoEntropyGain (T *dMat, long int dim1, long int dim2)

Definition at line 88 of file InfoGainFuncs.h.

References InfoEntropy().
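A minimal Python sketch of information gain over a contingency table, using the same assumed row-major layout as the chi-squared sketch (rows are variable states, columns are classes): the entropy of the column totals minus the row-size-weighted entropy of each row. It includes its own entropy helper so it runs on its own; both names are illustrative.

```python
import math


def _entropy(counts):
    """Shannon entropy, in bits, of a sequence of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def info_entropy_gain(d_mat, dim1, dim2):
    """Information gain of a dim1 x dim2 row-major contingency table
    (rows: variable states, columns: classes; assumed layout)."""
    rows = [d_mat[i * dim2:(i + 1) * dim2] for i in range(dim1)]
    class_totals = [sum(row[j] for row in rows) for j in range(dim2)]
    n = sum(class_totals)
    if n == 0:
        return 0.0
    # Entropy left after splitting on the variable, weighted by row size.
    remainder = sum(sum(row) * _entropy(row) for row in rows) / n
    return _entropy(class_totals) - remainder


# Rows: bit set, bit clear; columns: class 0, class 1.
print("%.3f" % info_entropy_gain([2, 0, 0, 2], 2, 2))  # bit 3 -> 1.000
print("%.3f" % info_entropy_gain([0, 1, 2, 1], 2, 2))  # bit 0 -> 0.311
```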