RDKit
Open-source cheminformatics and machine learning.
RDInfoTheory::InfoBitRanker Class Reference

#include <InfoBitRanker.h>

Public Types

enum  InfoType { ENTROPY = 1, BIASENTROPY = 2, CHISQUARE = 3, BIASCHISQUARE = 4 }
 The type of information measure.
 

Public Member Functions

 InfoBitRanker (unsigned int nBits, unsigned int nClasses, InfoType infoType=InfoBitRanker::ENTROPY)
 Constructor.
 
 ~InfoBitRanker ()
 
void accumulateVotes (const ExplicitBitVect &bv, unsigned int label)
 Accumulate the votes for all the bits turned on in a bit vector.
 
void accumulateVotes (const SparseBitVect &bv, unsigned int label)
 
double * getTopN (unsigned int num)
 Returns the top n bits ranked by the information metric.
 
unsigned int getNumInstances () const
 Return the number of labelled instances (examples), i.e. fingerprints, seen so far.
 
unsigned int getNumClasses () const
 Return the number of classes.
 
void setBiasList (RDKit::INT_VECT &classList)
 Set the classes to which the entropy calculation should be biased.
 
void setMaskBits (RDKit::INT_VECT &maskBits)
 Set the bits to be used as a mask.
 
void writeTopBitsToStream (std::ostream *outStream) const
 Write the top N bits to a stream.
 
void writeTopBitsToFile (const std::string &fileName) const
 Write the top bits to a file.
 

Detailed Description

Definition at line 86 of file InfoBitRanker.h.

Member Enumeration Documentation

The type of information measure.

Enumerator
ENTROPY 
BIASENTROPY 
CHISQUARE 
BIASCHISQUARE 

Definition at line 91 of file InfoBitRanker.h.
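This page does not define the individual measures. As a point of reference for the ENTROPY option, the sketch below computes the textbook information gain of a single bit from per-class counts. It is an assumption that the ranker scores bits this way; the function names `entropyOfCounts` and `infoGainForBit` are hypothetical and not part of RDKit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy (in bits) of a vector of non-negative counts.
double entropyOfCounts(const std::vector<unsigned int> &counts) {
  double total = 0.0;
  for (unsigned int c : counts) total += c;
  if (total == 0.0) return 0.0;
  double h = 0.0;
  for (unsigned int c : counts) {
    if (c == 0) continue;  // 0 * log(0) is taken as 0
    double p = c / total;
    h -= p * std::log2(p);
  }
  return h;
}

// Textbook information gain of one bit (hypothetical helper, not RDKit API).
//   onCounts[c]    : number of class-c fingerprints with the bit set
//   classTotals[c] : number of class-c fingerprints seen overall
double infoGainForBit(const std::vector<unsigned int> &onCounts,
                      const std::vector<unsigned int> &classTotals) {
  assert(onCounts.size() == classTotals.size());
  std::vector<unsigned int> offCounts(onCounts.size());
  double nOn = 0.0, nAll = 0.0;
  for (std::size_t c = 0; c < onCounts.size(); ++c) {
    assert(onCounts[c] <= classTotals[c]);
    offCounts[c] = classTotals[c] - onCounts[c];
    nOn += onCounts[c];
    nAll += classTotals[c];
  }
  if (nAll == 0.0) return 0.0;
  double pOn = nOn / nAll;
  // H(class) minus the expected entropy after splitting on the bit.
  return entropyOfCounts(classTotals) - pOn * entropyOfCounts(onCounts) -
         (1.0 - pOn) * entropyOfCounts(offCounts);
}
```

With two balanced classes, a bit that is set in every member of one class and in none of the other gets the maximum gain of 1 bit, while a bit set in half of each class gets a gain of 0.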

Constructor & Destructor Documentation

RDInfoTheory::InfoBitRanker::InfoBitRanker ( unsigned int  nBits,
unsigned int  nClasses,
InfoType  infoType = InfoBitRanker::ENTROPY 
)
inline

Constructor.

ARGUMENTS:

  • nBits: the dimension of the bit vectors, i.e. the fingerprint length
  • nClasses: the number of classes used in the classification problem (e.g. active, moderately active, inactive, etc.). It is assumed that the classes are numbered from 0 to (nClasses - 1).
  • infoType: the type of information metric

Definition at line 110 of file InfoBitRanker.h.

RDInfoTheory::InfoBitRanker::~InfoBitRanker ( )
inline

Definition at line 127 of file InfoBitRanker.h.

References accumulateVotes(), and getTopN().

Member Function Documentation

void RDInfoTheory::InfoBitRanker::accumulateVotes ( const ExplicitBitVect &  bv,
unsigned int  label 
)

Accumulate the votes for all the bits turned on in a bit vector.

ARGUMENTS:

  • bv : bit vector that supports the [] operator
  • label : the class label for the bit vector. It is assumed that 0 <= label < nClasses.
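The bookkeeping described above can be pictured with a standard-library stand-in. This is a minimal sketch, not RDKit's implementation: the `VoteTable` type, its fields, and the use of `std::vector<bool>` in place of ExplicitBitVect are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ranker's vote bookkeeping (not RDKit API).
struct VoteTable {
  unsigned int nBits;
  unsigned int nClasses;
  std::vector<unsigned int> votes;        // votes[bit * nClasses + label]
  std::vector<unsigned int> classCounts;  // fingerprints seen per class
  unsigned int nInstances = 0;            // fingerprints seen overall

  VoteTable(unsigned int bits, unsigned int classes)
      : nBits(bits),
        nClasses(classes),
        votes(static_cast<std::size_t>(bits) * classes, 0u),
        classCounts(classes, 0u) {}

  // For every bit that is on in bv, add one vote for the given class label.
  void accumulate(const std::vector<bool> &bv, unsigned int label) {
    assert(bv.size() == nBits);
    assert(label < nClasses);  // labels run from 0 to nClasses - 1
    for (unsigned int i = 0; i < nBits; ++i) {
      if (bv[i]) ++votes[static_cast<std::size_t>(i) * nClasses + label];
    }
    ++classCounts[label];
    ++nInstances;
  }
};
```

After calling `accumulate` once per labelled fingerprint, `votes` holds, for each bit, how many fingerprints of each class had that bit set, which is the raw material a ranking metric needs.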

Referenced by ~InfoBitRanker().

void RDInfoTheory::InfoBitRanker::accumulateVotes ( const SparseBitVect &  bv,
unsigned int  label 
)

Overload of accumulateVotes() that takes a SparseBitVect.

unsigned int RDInfoTheory::InfoBitRanker::getNumClasses ( ) const
inline

Return the number of classes.

Definition at line 164 of file InfoBitRanker.h.

References setBiasList(), setMaskBits(), writeTopBitsToFile(), and writeTopBitsToStream().

unsigned int RDInfoTheory::InfoBitRanker::getNumInstances ( ) const
inline

Return the number of labelled instances (examples), i.e. fingerprints, seen so far.

Definition at line 159 of file InfoBitRanker.h.

double* RDInfoTheory::InfoBitRanker::getTopN ( unsigned int  num)

Returns the top n bits ranked by the information metric.

This is the function where most of the ranking work is done.

Parameters
  num: the number of top-ranked bits that are required
Returns
  a pointer to an information array. The client should not delete this array.
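To illustrate the selection step only, the sketch below picks the `num` best-scoring bits from a vector of per-bit scores. It is a simplification under stated assumptions: the helper `topNBits` is hypothetical, it returns bit indices in a `std::vector` rather than the ranker-owned `double*` described above, and the tie-breaking rule (lower bit index first) is an arbitrary choice for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices of the num highest-scoring bits, best first.
// Hypothetical helper for illustration; not part of RDKit.
std::vector<std::size_t> topNBits(const std::vector<double> &scores,
                                  std::size_t num) {
  assert(num <= scores.size());
  std::vector<std::size_t> order(scores.size());
  std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
  // Only the first num positions need to be sorted.
  std::partial_sort(
      order.begin(), order.begin() + static_cast<std::ptrdiff_t>(num),
      order.end(), [&scores](std::size_t a, std::size_t b) {
        if (scores[a] != scores[b]) return scores[a] > scores[b];
        return a < b;  // assumed tie-break: lower bit index first
      });
  order.resize(num);
  return order;
}
```

For scores `{0.1, 0.9, 0.0, 0.5}` and `num = 2`, the helper returns bits 1 and 3, in that order. With the real class, the returned pointer stays owned by the ranker, so callers should copy out what they need rather than free it.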

Referenced by ~InfoBitRanker().

void RDInfoTheory::InfoBitRanker::setBiasList ( RDKit::INT_VECT &  classList)

Set the classes to which the entropy calculation should be biased.

This list contains the set of class ids used in the BIASENTROPY mode of ranking bits. In this mode, a bit must be more highly correlated with one of the biased classes than with all the other classes. For example, in a two-class problem with actives and inactives, the fraction of actives that hit the bit has to be greater than the fraction of inactives that hit the bit.

ARGUMENTS:

  • classList: list of class ids that we want a bias towards
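The bias condition can be read as a per-bit filter on hit fractions. The sketch below is one possible reading under stated assumptions: the helper `passesBias` is hypothetical, the class list is passed as a `std::vector<int>` standing in for RDKit::INT_VECT, and a bit that ties between a biased and an unbiased class is rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// True if some biased class hits the bit with a strictly higher fraction
// than every class outside the bias list. Hypothetical helper, not RDKit API.
//   onCounts[c]    : number of class-c fingerprints with the bit set
//   classTotals[c] : number of class-c fingerprints seen overall
bool passesBias(const std::vector<unsigned int> &onCounts,
                const std::vector<unsigned int> &classTotals,
                const std::vector<int> &biasList) {
  assert(onCounts.size() == classTotals.size());
  double bestBiased = 0.0;
  double bestOther = 0.0;
  for (std::size_t c = 0; c < onCounts.size(); ++c) {
    // Fraction of class-c fingerprints that hit the bit (0 for empty classes).
    double frac = classTotals[c] == 0
                      ? 0.0
                      : static_cast<double>(onCounts[c]) / classTotals[c];
    bool biased = std::find(biasList.begin(), biasList.end(),
                            static_cast<int>(c)) != biasList.end();
    if (biased) {
      bestBiased = std::max(bestBiased, frac);
    } else {
      bestOther = std::max(bestOther, frac);
    }
  }
  return bestBiased > bestOther;
}
```

In the two-class example from the text, with class 0 as actives and a bias list of `{0}`, a bit hit by 8 of 10 actives and 2 of 10 inactives passes, while the reverse split does not.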

Referenced by getNumClasses().

void RDInfoTheory::InfoBitRanker::setMaskBits ( RDKit::INT_VECT &  maskBits)

Set the bits to be used as a mask.

If this function is called, only the bits which are present in the maskBits list will be used.

ARGUMENTS:

  • maskBits: the bits to be considered
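The effect of a mask on the set of bits considered for ranking can be sketched as follows. This is an illustration only: the helper `candidateBits` and its `maskSet` flag are hypothetical, and `std::vector<int>` stands in for RDKit::INT_VECT.

```cpp
#include <cassert>
#include <vector>

// Bits that would be considered for ranking: every bit when no mask has been
// set, otherwise only the bits listed in maskBits.
// Hypothetical helper for illustration; not part of RDKit.
std::vector<unsigned int> candidateBits(unsigned int nBits,
                                        const std::vector<int> &maskBits,
                                        bool maskSet) {
  std::vector<unsigned int> result;
  if (!maskSet) {
    for (unsigned int i = 0; i < nBits; ++i) result.push_back(i);
    return result;
  }
  for (int b : maskBits) {
    // Mask entries must be valid bit indices.
    assert(b >= 0 && static_cast<unsigned int>(b) < nBits);
    result.push_back(static_cast<unsigned int>(b));
  }
  return result;
}
```

With 8 bits and a mask of `{1, 5}`, only bits 1 and 5 remain candidates; without a mask, all 8 bits do.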

Referenced by getNumClasses().

void RDInfoTheory::InfoBitRanker::writeTopBitsToFile ( const std::string &  fileName) const

Write the top bits to a file.

Referenced by getNumClasses().

void RDInfoTheory::InfoBitRanker::writeTopBitsToStream ( std::ostream *  outStream) const

Write the top N bits to a stream.

Referenced by getNumClasses().


The documentation for this class was generated from the following file: