RDKit
Open-source cheminformatics and machine learning.
RDKit::MorganFingerprints Namespace Reference

Typedefs

typedef std::map< boost::uint32_t, std::vector< std::pair< boost::uint32_t, boost::uint32_t > > > BitInfoMap
 

Functions

SparseIntVect< boost::uint32_t > * getFingerprint (const ROMol &mol, unsigned int radius, std::vector< boost::uint32_t > *invariants=0, const std::vector< boost::uint32_t > *fromAtoms=0, bool useChirality=false, bool useBondTypes=true, bool useCounts=true, bool onlyNonzeroInvariants=false, BitInfoMap *atomsSettingBits=0)
 returns the Morgan fingerprint for a molecule
 
SparseIntVect< boost::uint32_t > * getHashedFingerprint (const ROMol &mol, unsigned int radius, unsigned int nBits=2048, std::vector< boost::uint32_t > *invariants=0, const std::vector< boost::uint32_t > *fromAtoms=0, bool useChirality=false, bool useBondTypes=true, bool onlyNonzeroInvariants=false, BitInfoMap *atomsSettingBits=0)
 returns the Morgan fingerprint for a molecule
 
ExplicitBitVect * getFingerprintAsBitVect (const ROMol &mol, unsigned int radius, unsigned int nBits, std::vector< boost::uint32_t > *invariants=0, const std::vector< boost::uint32_t > *fromAtoms=0, bool useChirality=false, bool useBondTypes=true, bool onlyNonzeroInvariants=false, BitInfoMap *atomsSettingBits=0)
 returns the Morgan fingerprint for a molecule as a bit vector
 
void getConnectivityInvariants (const ROMol &mol, std::vector< boost::uint32_t > &invars, bool includeRingMembership=true)
 returns the connectivity invariants for a molecule
 
void getFeatureInvariants (const ROMol &mol, std::vector< boost::uint32_t > &invars, std::vector< const ROMol * > *patterns=0)
 returns the feature invariants for a molecule
 

Variables

std::vector< std::string > defaultFeatureSmarts
 
const std::string morganFingerprintVersion = "1.0.0"
 
const std::string morganConnectivityInvariantVersion = "1.0.0"
 
const std::string morganFeatureInvariantVersion = "0.1.0"
 

Typedef Documentation

typedef std::map<boost::uint32_t, std::vector<std::pair<boost::uint32_t, boost::uint32_t> > > RDKit::MorganFingerprints::BitInfoMap

Definition at line 56 of file MorganFingerprints.h.
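
As described for the atomsSettingBits parameter of the functions below, the keys of a BitInfoMap are bit ids and the values are lists of (atomId, radius) pairs. The following is a minimal sketch of reading such a map. The alias BitInfoMapLike and the helper centersForBit are hypothetical names introduced here; the alias uses std::uint32_t instead of boost::uint32_t so that the sketch compiles without RDKit or Boost.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Local stand-in with the same shape as BitInfoMap (an assumption made so the
// sketch compiles without RDKit): bit id -> list of (atomId, radius) pairs.
using BitInfoMapLike =
    std::map<std::uint32_t,
             std::vector<std::pair<std::uint32_t, std::uint32_t>>>;

// Hypothetical helper: atom ids of the environments that set `bitId`.
std::vector<std::uint32_t> centersForBit(const BitInfoMapLike &info,
                                         std::uint32_t bitId) {
  std::vector<std::uint32_t> centers;
  auto it = info.find(bitId);
  if (it == info.end()) return centers;  // this bit was never set
  for (const auto &entry : it->second) {
    centers.push_back(entry.first);  // entry.second holds the radius
  }
  return centers;
}
```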

Function Documentation

void RDKit::MorganFingerprints::getConnectivityInvariants ( const ROMol &  mol,
std::vector< boost::uint32_t > &  invars,
bool  includeRingMembership = true 
)

returns the connectivity invariants for a molecule

Parameters
mol: the molecule to be considered
invars: used to return the results
includeRingMembership: if set, whether or not the atom is in a ring will be used in the invariant list.

void RDKit::MorganFingerprints::getFeatureInvariants ( const ROMol &  mol,
std::vector< boost::uint32_t > &  invars,
std::vector< const ROMol * > *  patterns = 0 
)

returns the feature invariants for a molecule

Parameters
mol: the molecule to be considered
invars: used to return the results
patterns: if provided, should contain the queries used to assign atom types. If not provided, feature definitions adapted from Gobbi and Poppinger, Biotech. Bioeng. 61:47-54 (1998) will be used for Donor, Acceptor, Aromatic, Halogen, Basic, and Acidic.

SparseIntVect<boost::uint32_t>* RDKit::MorganFingerprints::getFingerprint ( const ROMol &  mol,
unsigned int  radius,
std::vector< boost::uint32_t > *  invariants = 0,
const std::vector< boost::uint32_t > *  fromAtoms = 0,
bool  useChirality = false,
bool  useBondTypes = true,
bool  useCounts = true,
bool  onlyNonzeroInvariants = false,
BitInfoMap *  atomsSettingBits = 0 
)

returns the Morgan fingerprint for a molecule

These fingerprints are similar to the well-known ECFP or FCFP fingerprints, depending on which invariants are used.

The algorithm used is described in the paper Rogers, D. & Hahn, M. Extended-Connectivity Fingerprints. JCIM 50:742-54 (2010) http://dx.doi.org/10.1021/ci100050t

The original implementation was done using this paper: D. Rogers, R.D. Brown, M. Hahn J. Biomol. Screen. 10:682-6 (2005) and an unpublished technical report: http://www.ics.uci.edu/~welling/teaching/ICS274Bspring06/David%20Rogers%20-%20ECFP%20Manuscript.doc

Parameters
mol: the molecule to be fingerprinted
radius: the number of iterations to grow the fingerprint
invariants: optional pointer to a set of atom invariants to be used. By default ECFP-type invariants are used (calculated by getConnectivityInvariants())
fromAtoms: if this is provided, only the atoms in the vector will be used as centers in the fingerprint
useChirality: if set, additional information will be added to the fingerprint when chiral atoms are discovered. This will cause C[C@H](F)Cl, C[C@@H](F)Cl, and CC(F)Cl to generate different fingerprints.
useBondTypes: if set, bond types will be included as part of the hash for calculating bits
useCounts: if set, counts of the features will be used
onlyNonzeroInvariants: if set, bits will only be set from atoms that have a nonzero invariant.
atomsSettingBits: if nonzero, this will be used to return information about the atoms that set each particular bit. The keys of the map are bit ids; the values are lists of (atomId, radius) pairs.
Returns
a pointer to the fingerprint. The client is responsible for calling delete on this.

ExplicitBitVect* RDKit::MorganFingerprints::getFingerprintAsBitVect ( const ROMol &  mol,
unsigned int  radius,
unsigned int  nBits,
std::vector< boost::uint32_t > *  invariants = 0,
const std::vector< boost::uint32_t > *  fromAtoms = 0,
bool  useChirality = false,
bool  useBondTypes = true,
bool  onlyNonzeroInvariants = false,
BitInfoMap *  atomsSettingBits = 0 
)

returns the Morgan fingerprint for a molecule as a bit vector

see documentation for getFingerprint() for theory/references

Parameters
mol: the molecule to be fingerprinted
radius: the number of iterations to grow the fingerprint
nBits: the number of bits in the final fingerprint
invariants: optional pointer to a set of atom invariants to be used. By default ECFP-type invariants are used (calculated by getConnectivityInvariants())
fromAtoms: if this is provided, only the atoms in the vector will be used as centers in the fingerprint
useChirality: if set, additional information will be added to the fingerprint when chiral atoms are discovered. This will cause C[C@H](F)Cl, C[C@@H](F)Cl, and CC(F)Cl to generate different fingerprints.
useBondTypes: if set, bond types will be included as part of the hash for calculating bits
onlyNonzeroInvariants: if set, bits will only be set from atoms that have a nonzero invariant.
atomsSettingBits: if nonzero, this will be used to return information about the atoms that set each particular bit. The keys of the map are bit ids; the values are lists of (atomId, radius) pairs.
Returns
a pointer to the fingerprint. The client is responsible for calling delete on this.

SparseIntVect<boost::uint32_t>* RDKit::MorganFingerprints::getHashedFingerprint ( const ROMol &  mol,
unsigned int  radius,
unsigned int  nBits = 2048,
std::vector< boost::uint32_t > *  invariants = 0,
const std::vector< boost::uint32_t > *  fromAtoms = 0,
bool  useChirality = false,
bool  useBondTypes = true,
bool  onlyNonzeroInvariants = false,
BitInfoMap *  atomsSettingBits = 0 
)

returns the Morgan fingerprint for a molecule

These fingerprints are similar to the well-known ECFP or FCFP fingerprints, depending on which invariants are used.

The algorithm used is described in the paper Rogers, D. & Hahn, M. Extended-Connectivity Fingerprints. JCIM 50:742-54 (2010) http://dx.doi.org/10.1021/ci100050t

The original implementation was done using this paper: D. Rogers, R.D. Brown, M. Hahn J. Biomol. Screen. 10:682-6 (2005) and an unpublished technical report: http://www.ics.uci.edu/~welling/teaching/ICS274Bspring06/David%20Rogers%20-%20ECFP%20Manuscript.doc

Parameters
mol: the molecule to be fingerprinted
radius: the number of iterations to grow the fingerprint
nBits: the number of bits in the final fingerprint
invariants: optional pointer to a set of atom invariants to be used. By default ECFP-type invariants are used (calculated by getConnectivityInvariants())
fromAtoms: if this is provided, only the atoms in the vector will be used as centers in the fingerprint
useChirality: if set, additional information will be added to the fingerprint when chiral atoms are discovered. This will cause C[C@H](F)Cl, C[C@@H](F)Cl, and CC(F)Cl to generate different fingerprints.
useBondTypes: if set, bond types will be included as part of the hash for calculating bits
onlyNonzeroInvariants: if set, bits will only be set from atoms that have a nonzero invariant.
atomsSettingBits: if nonzero, this will be used to return information about the atoms that set each particular bit. The keys of the map are bit ids; the values are lists of (atomId, radius) pairs.
Returns
a pointer to the fingerprint. The client is responsible for calling delete on this.

Variable Documentation

std::vector<std::string> RDKit::MorganFingerprints::defaultFeatureSmarts

const std::string RDKit::MorganFingerprints::morganConnectivityInvariantVersion = "1.0.0"

Definition at line 209 of file MorganFingerprints.h.

const std::string RDKit::MorganFingerprints::morganFeatureInvariantVersion = "0.1.0"

Definition at line 227 of file MorganFingerprints.h.

const std::string RDKit::MorganFingerprints::morganFingerprintVersion = "1.0.0"

Definition at line 58 of file MorganFingerprints.h.