Package rdkit :: Package Chem :: Module MACCSkeys
[hide private]
[frames] | no frames]

Module MACCSkeys

source code

SMARTS definitions for the publically available MACCS keys
and a MACCS fingerprinter

I compared the MACCS fingerprints generated here with those from two
other packages (not MDL, unfortunately). Of course there are
disagreements between the various fingerprints still, but I think
these definitions work pretty well. Some notes:

1) most of the differences have to do with aromaticity
2) there's a discrepancy sometimes because the current RDKit
definitions do not require multiple matches to be distinct. e.g. the
SMILES C(=O)CC(=O) can match the (hypothetical) key O=CC twice in my
definition. It's not clear to me what the correct behavior is.
3) Some keys are not fully defined in the MDL documentation
4) Two keys, 125 and 166, have to be done outside of SMARTS.
5) Key 1 (ISOTOPE) isn't defined

Rev history:
2006 (gl): Original open-source release
May 2011 (gl): Update some definitions based on feedback from Andrew Dalke

Functions [hide private]
 
_InitKeys(keyList, keyDict)
*Internal Use Only*
source code
 
_pyGenMACCSKeys(mol, **kwargs)
generates the MACCS fingerprint for a molecules
source code
 
_test() source code
Variables [hide private]
  smartsPatts = {1: ('?', 0), 2: ('[#104]', 0), 3: ('[#32,#33,#3...
  maccsKeys = None
hash(x)
  __package__ = 'rdkit.Chem'

Imports: Chem, rdMolDescriptors, DataStructs, GenMACCSKeys, FingerprintMol


Function Details [hide private]

_InitKeys(keyList, keyDict)

source code 
*Internal Use Only*

generates SMARTS patterns for the keys, run once

_pyGenMACCSKeys(mol, **kwargs)

source code 
generates the MACCS fingerprint for a molecules

 **Arguments**

   - mol: the molecule to be fingerprinted

   - any extra keyword arguments are ignored
   
 **Returns**

    a _DataStructs.SparseBitVect_ containing the fingerprint.

>>> m = Chem.MolFromSmiles('CNO')
>>> bv = GenMACCSKeys(m)
>>> tuple(bv.GetOnBits())
(24, 68, 69, 71, 93, 94, 102, 124, 131, 139, 151, 158, 160, 161, 164)
>>> bv = GenMACCSKeys(Chem.MolFromSmiles('CCC'))
>>> tuple(bv.GetOnBits())
(74, 114, 149, 155, 160)


Variables Details [hide private]

smartsPatts

Value:
{1: ('?', 0),
 2: ('[#104]', 0),
 3: ('[#32,#33,#34,#50,#51,#52,#82,#83,#84]', 0),
 4: ('[Ac,Th,Pa,U,Np,Pu,Am,Cm,Bk,Cf,Es,Fm,Md,No,Lr]', 0),
 5: ('[Sc,Ti,Y,Zr,Hf]', 0),
 6: ('[La,Ce,Pr,Nd,Pm,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu]', 0),
 7: ('[V,Cr,Mn,Nb,Mo,Tc,Ta,W,Re]', 0),
 8: ('[!#6;!#1]1~*~*~*~1', 0),
...