RDKit
Open-source cheminformatics and machine learning.
RDInfoTheory::InfoBitRanker Class Reference

#include <InfoBitRanker.h>

Public Types

enum  InfoType { ENTROPY = 1 , BIASENTROPY = 2 , CHISQUARE = 3 , BIASCHISQUARE = 4 }
 The type of information measure.
 

Public Member Functions

 InfoBitRanker (unsigned int nBits, unsigned int nClasses, InfoType infoType=InfoBitRanker::ENTROPY)
 Constructor.
 
 ~InfoBitRanker ()
 
void accumulateVotes (const ExplicitBitVect &bv, unsigned int label)
 Accumulate the votes for all the bits turned on in a bit vector.
 
void accumulateVotes (const SparseBitVect &bv, unsigned int label)
 
double * getTopN (unsigned int num)
 Returns the top n bits ranked by the information metric.
 
unsigned int getNumInstances () const
 Returns the number of labelled instances (examples), i.e. fingerprints, seen so far.
 
unsigned int getNumClasses () const
 Returns the number of classes.
 
void setBiasList (RDKit::INT_VECT &classList)
 Set the classes to which the entropy calculation should be biased.
 
void setMaskBits (RDKit::INT_VECT &maskBits)
 Set the bits to be used as a mask.
 
void writeTopBitsToStream (std::ostream *outStream) const
 Write the top N bits to a stream.
 
void writeTopBitsToFile (const std::string &fileName) const
 Write the top bits to a file.
 

Detailed Description

Definition at line 87 of file InfoBitRanker.h.

Member Enumeration Documentation

◆ InfoType

The type of information measure.

Enumerator
ENTROPY 
BIASENTROPY 
CHISQUARE 
BIASCHISQUARE 

Definition at line 92 of file InfoBitRanker.h.

Constructor & Destructor Documentation

◆ InfoBitRanker()

RDInfoTheory::InfoBitRanker::InfoBitRanker ( unsigned int  nBits,
unsigned int  nClasses,
InfoType  infoType = InfoBitRanker::ENTROPY 
)
inline

Constructor.

ARGUMENTS:

  • nBits: the dimension of the bit vectors or the fingerprint length
  • nClasses: the number of classes used in the classification problem (e.g. active, moderately active, inactive, etc.). The classes are assumed to be numbered from 0 to (nClasses - 1).
  • infoType: the type of information metric

Definition at line 111 of file InfoBitRanker.h.

◆ ~InfoBitRanker()

RDInfoTheory::InfoBitRanker::~InfoBitRanker ( )
inline

Definition at line 128 of file InfoBitRanker.h.

Member Function Documentation

◆ accumulateVotes() [1/2]

void RDInfoTheory::InfoBitRanker::accumulateVotes ( const ExplicitBitVect &  bv,
unsigned int  label 
)

Accumulate the votes for all the bits turned on in a bit vector.

ARGUMENTS:

  • bv : bit vector that supports [] operator
  • label : the class label for the bit vector. It is assumed that 0 <= label < nClasses
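
The bookkeeping described above can be pictured as a per-bit, per-class count table. The following is a minimal sketch of that idea, not RDKit's implementation: `VoteTable` and its members are hypothetical names, and any container indexable with `[]` stands in for the bit vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ranker's vote bookkeeping; not RDKit code.
struct VoteTable {
  unsigned int nBits;
  unsigned int nClasses;
  std::vector<unsigned int> counts;          // counts[bit * nClasses + label]
  std::vector<unsigned int> classInstances;  // fingerprints seen per class
  unsigned int nInstances = 0;               // fingerprints seen in total

  VoteTable(unsigned int bits, unsigned int classes)
      : nBits(bits),
        nClasses(classes),
        counts(static_cast<std::size_t>(bits) * classes, 0u),
        classInstances(classes, 0u) {}

  // BitVec is any type whose operator[] yields something convertible to bool.
  template <typename BitVec>
  void accumulateVotes(const BitVec &bv, unsigned int label) {
    assert(label < nClasses);  // labels are assumed to lie in [0, nClasses)
    for (unsigned int i = 0; i < nBits; ++i) {
      if (bv[i]) {
        // Each bit that is on casts one vote for the fingerprint's class.
        ++counts[static_cast<std::size_t>(i) * nClasses + label];
      }
    }
    ++classInstances[label];
    ++nInstances;
  }

  // Number of fingerprints of class `label` that had `bit` turned on.
  unsigned int votes(unsigned int bit, unsigned int label) const {
    return counts[static_cast<std::size_t>(bit) * nClasses + label];
  }
};
```

A dense table like this costs nBits × nClasses counters, which is why the constructor needs both values up front.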

◆ accumulateVotes() [2/2]

void RDInfoTheory::InfoBitRanker::accumulateVotes ( const SparseBitVect &  bv,
unsigned int  label 
)

◆ getNumClasses()

unsigned int RDInfoTheory::InfoBitRanker::getNumClasses ( ) const
inline

Returns the number of classes.

Definition at line 169 of file InfoBitRanker.h.

◆ getNumInstances()

unsigned int RDInfoTheory::InfoBitRanker::getNumInstances ( ) const
inline

Returns the number of labelled instances (examples), i.e. fingerprints, seen so far.

Definition at line 164 of file InfoBitRanker.h.

◆ getTopN()

double * RDInfoTheory::InfoBitRanker::getTopN ( unsigned int  num)

Returns the top n bits ranked by the information metric.

Most of the ranking work happens in this function.

Parameters
num: the number of top-ranked bits required
Returns
A pointer to an information array. The client should not delete it.
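
To make the ranking step concrete, here is a minimal sketch of an entropy-based ranking over accumulated vote counts. It illustrates the general information-gain technique and is not taken from RDKit's source: the helper names (`entropyOfCounts`, `infoGain`, `topNBits`) and the return type are assumptions made for this example, and the layout of the array actually returned by `getTopN` is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Shannon entropy, in bits, of a vector of non-negative counts.
double entropyOfCounts(const std::vector<double> &counts) {
  double total = 0.0;
  for (double c : counts) total += c;
  if (total <= 0.0) return 0.0;
  double h = 0.0;
  for (double c : counts) {
    if (c > 0.0) {
      const double p = c / total;
      h -= p * std::log2(p);
    }
  }
  return h;
}

// Information gain of one bit: H(class) - sum_v p(v) * H(class | bit = v).
// onCounts[k] is the number of class-k fingerprints with the bit on;
// classTotals[k] is the number of class-k fingerprints seen overall.
double infoGain(const std::vector<double> &onCounts,
                const std::vector<double> &classTotals) {
  assert(onCounts.size() == classTotals.size());
  std::vector<double> offCounts(classTotals.size());
  double nOn = 0.0, nAll = 0.0;
  for (std::size_t k = 0; k < classTotals.size(); ++k) {
    offCounts[k] = classTotals[k] - onCounts[k];
    nOn += onCounts[k];
    nAll += classTotals[k];
  }
  if (nAll <= 0.0) return 0.0;
  const double nOff = nAll - nOn;
  return entropyOfCounts(classTotals) -
         (nOn / nAll) * entropyOfCounts(onCounts) -
         (nOff / nAll) * entropyOfCounts(offCounts);
}

// Returns (bit id, gain) pairs for the `num` highest-gain bits, best first.
std::vector<std::pair<unsigned int, double>> topNBits(
    const std::vector<std::vector<double>> &onCountsPerBit,
    const std::vector<double> &classTotals, unsigned int num) {
  std::vector<std::pair<unsigned int, double>> ranked;
  for (unsigned int i = 0; i < onCountsPerBit.size(); ++i) {
    ranked.emplace_back(i, infoGain(onCountsPerBit[i], classTotals));
  }
  // stable_sort keeps the lower bit id first when two gains tie.
  std::stable_sort(ranked.begin(), ranked.end(),
                   [](const std::pair<unsigned int, double> &a,
                      const std::pair<unsigned int, double> &b) {
                     return a.second > b.second;
                   });
  if (ranked.size() > num) ranked.resize(num);
  return ranked;
}
```

A bit that is on for every member of one class and off for every member of the other gets the maximum gain, while a bit that is on equally often in both classes gets a gain of zero.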

◆ setBiasList()

void RDInfoTheory::InfoBitRanker::setBiasList ( RDKit::INT_VECT &  classList)

Set the classes to which the entropy calculation should be biased.

This list contains the class ids used in the BIASENTROPY mode of ranking bits. In this mode, a bit must be more strongly correlated with one of the biased classes than with all the other classes. For example, in a two-class problem with actives and inactives, the fraction of actives that hit the bit must be greater than the fraction of inactives that hit the bit.

ARGUMENTS:

  • classList: list of class ids that we want a bias towards
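
The bias condition can be sketched as a simple predicate over per-class hit fractions. This is one reading of the description above, not RDKit's code: `passesBias` is a hypothetical helper, "the other classes" is taken to mean the classes not in the bias list, and a plain `std::vector<int>` stands in for `RDKit::INT_VECT`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if at least one biased class hits the bit with a strictly
// higher fraction than every class outside the bias list.
// onCounts[k]:    class-k fingerprints with the bit on
// classTotals[k]: class-k fingerprints seen overall
bool passesBias(const std::vector<unsigned int> &onCounts,
                const std::vector<unsigned int> &classTotals,
                const std::vector<int> &biasClasses) {
  assert(onCounts.size() == classTotals.size());
  const std::size_t nClasses = classTotals.size();

  // Fraction of each class that hits the bit (0 for classes never seen).
  std::vector<double> frac(nClasses, 0.0);
  for (std::size_t k = 0; k < nClasses; ++k) {
    if (classTotals[k] > 0) {
      frac[k] = static_cast<double>(onCounts[k]) / classTotals[k];
    }
  }

  for (int b : biasClasses) {
    assert(b >= 0 && static_cast<std::size_t>(b) < nClasses);
    bool beatsAll = true;
    for (std::size_t k = 0; k < nClasses; ++k) {
      const bool kIsBias =
          std::find(biasClasses.begin(), biasClasses.end(),
                    static_cast<int>(k)) != biasClasses.end();
      if (!kIsBias && frac[b] <= frac[k]) {
        beatsAll = false;
        break;
      }
    }
    if (beatsAll) return true;
  }
  return false;
}
```

In the two-class example, with class 1 as the actives and a bias list of `{1}`, a bit hit by 3 of 4 actives and 1 of 4 inactives passes, and the same bit fails if the bias list is `{0}`.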

◆ setMaskBits()

void RDInfoTheory::InfoBitRanker::setMaskBits ( RDKit::INT_VECT &  maskBits)

Set the bits to be used as a mask.

If this function is called, only the bits which are present in the maskBits list will be used.

ARGUMENTS:

  • maskBits: the bits to be considered
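
The effect of a mask can be sketched as a restriction of the candidate bit ids that get ranked. The helper below is hypothetical and only illustrates the behaviour described above. The `maskSet` flag is an assumption used to model "if this function is called"; how RDKit tracks that internally is not stated here.

```cpp
#include <cassert>
#include <vector>

// Returns the bit ids that would be considered for ranking.
// Without a mask every bit in [0, nBits) is a candidate; with a mask only
// the listed bits are.
std::vector<unsigned int> bitsToRank(unsigned int nBits,
                                     const std::vector<int> &maskBits,
                                     bool maskSet) {
  std::vector<unsigned int> candidates;
  if (!maskSet) {
    for (unsigned int i = 0; i < nBits; ++i) candidates.push_back(i);
    return candidates;
  }
  for (int b : maskBits) {
    // Mask entries are assumed to be valid bit ids.
    assert(b >= 0 && static_cast<unsigned int>(b) < nBits);
    candidates.push_back(static_cast<unsigned int>(b));
  }
  return candidates;
}
```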

◆ writeTopBitsToFile()

void RDInfoTheory::InfoBitRanker::writeTopBitsToFile ( const std::string &  fileName) const

Write the top bits to a file.

◆ writeTopBitsToStream()

void RDInfoTheory::InfoBitRanker::writeTopBitsToStream ( std::ostream *  outStream) const

Write the top N bits to a stream.


The documentation for this class was generated from the following file: