
Class SubstructLibrary

 object --+    
          |    
??.instance --+
              |
             SubstructLibrary

SubstructLibrary provides a simple API for substructure searching of large datasets.
Searches are multithreaded and by default use all available threads.
Basic operation is simple:
>>> from __future__ import print_function
>>> import os
>>> from rdkit import Chem, RDConfig
>>> from rdkit.Chem import rdSubstructLibrary
>>> library = rdSubstructLibrary.SubstructLibrary()
>>> for mol in Chem.SDMolSupplier(os.path.join(RDConfig.RDDataDir, 
...                               'NCI', 'first_200.props.sdf')):
...   idx = library.AddMol(mol)
>>> core = Chem.MolFromSmarts('CCCCOC')
>>> indices = library.GetMatches(core)
>>> len(indices)
11


Substructure matching options can be sent into GetMatches:
>>> indices = library.GetMatches(core, useChirality=False) 
>>> len(indices)
11

The number of threads and the maximum number of matches returned can also be
controlled (the default is to run on all cores):
>>> indices = library.GetMatches(core, numThreads=2, maxResults=10) 
>>> len(indices)
10

Working on larger datasets:

Molecules are fairly large objects, which limits how many can be kept in memory.
To help with this, three other molecule holders are supplied:
  CachedMolHolder - stores molecules as their pickled representation
  CachedSmilesMolHolder - stores molecules internally as SMILES strings
  CachedTrustedSmilesMolHolder - accepts (and stores) molecules as trusted SMILES strings

Using Pattern fingerprints as a pre-filter:
Pattern fingerprints provide an easy way to indicate whether the substructure search should
be done at all.  This is particularly useful with the binary (pickled) and SMILES-based molecule
holders, as they have an expensive molecule creation step in addition to the substructure searching step.
 
>>> library = rdSubstructLibrary.SubstructLibrary(rdSubstructLibrary.CachedSmilesMolHolder(), 
...                                               rdSubstructLibrary.PatternHolder())
>>> for mol in Chem.SDMolSupplier(os.path.join(RDConfig.RDDataDir, 
...                               'NCI', 'first_200.props.sdf')):
...   idx = library.AddMol(mol)
>>> indices = library.GetMatches(core)
>>> len(indices)
11
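
The pre-filter idea described above can be illustrated with a toy model in plain Python. This is a sketch of the general screening technique, not RDKit's implementation: fingerprints are modelled as integers used as bit sets, and the names passes_screen and screened_search are hypothetical. A molecule can only contain the query if every bit set in the query fingerprint is also set in the molecule fingerprint, so the expensive matcher is called only for molecules that pass.

```python
def passes_screen(query_fp, mol_fp):
    """Return True if every bit set in query_fp is also set in mol_fp."""
    return (query_fp & mol_fp) == query_fp


def screened_search(query_fp, mol_fps, expensive_match):
    """Run expensive_match(idx) only for entries that pass the bit screen.

    Returns (matching indices, number of expensive calls made).
    """
    hits = []
    calls = 0
    for idx, mol_fp in enumerate(mol_fps):
        if not passes_screen(query_fp, mol_fp):
            continue  # cannot match, so skip molecule creation and matching
        calls += 1
        if expensive_match(idx):
            hits.append(idx)
    return hits, calls


# Toy data: hypothetical 4-bit fingerprints, not real pattern fingerprints.
mol_fps = [0b1011, 0b0100, 0b1111, 0b0011]
query_fp = 0b0011
# Stand-in for the real substructure test. Index 3 passes the bit screen
# but is not a true match, so the screen only reduces work; it never decides.
true_matches = {0, 2}
hits, calls = screened_search(query_fp, mol_fps, lambda i: i in true_matches)
print(hits, calls)
```

In this toy run the screen rejects one of the four entries outright, so the stand-in matcher is called three times instead of four.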

This takes longer to initialize.  However, both the molecule and pattern
holders can be populated with raw data.  A simple example is below:
>>> import csv
>>> molholder = rdSubstructLibrary.CachedSmilesMolHolder()
>>> pattern_holder = rdSubstructLibrary.PatternHolder()
>>> for i, row in enumerate(csv.reader(open(os.path.join(RDConfig.RDDataDir, 
...                               'NCI', 'first_200.tpsa.csv')))):
...   if i:  # skip the first (header) row
...     idx = molholder.AddSmiles(row[0])
...     idx2 = pattern_holder.AddFingerprint(
...         pattern_holder.MakeFingerprint(Chem.MolFromSmiles(row[0])))
...     assert idx==idx2
>>> library = rdSubstructLibrary.SubstructLibrary(molholder,pattern_holder)
>>> indices = library.GetMatches(core)
>>> len(indices)
11

Instance Methods
 
AddMol(...)
AddMol( (SubstructLibrary)arg1, (Mol)mol) -> int : Adds a molecule to the substruct library
 
CountMatches(...)
CountMatches( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> int : Counts the matches for the query.
 
GetMatches(...)
GetMatches( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1 [, (int)maxResults=1000]]]]]) -> _vectj : Get the matches for the query.
 
GetMol(...)
GetMol( (SubstructLibrary)arg1, (int)arg2) -> Mol : Returns a particular molecule in the molecule holder
 
HasMatch(...)
HasMatch( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> bool : Checks whether any molecule matches the query.
 
__init__(...)
__init__( (object)arg1) -> None :
 
__len__(...)
__len__( (SubstructLibrary)arg1) -> int :
 
__reduce__(...)
helper for pickle

Inherited from unreachable.instance: __new__

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  __instance_size__ = 24
Properties

Inherited from object: __class__

Method Details

AddMol(...)

 

AddMol( (SubstructLibrary)arg1, (Mol)mol) -> int :
    Adds a molecule to the substruct library

    C++ signature :
        unsigned int AddMol(RDKit::SubstructLibrary {lvalue},RDKit::ROMol)

CountMatches(...)

 

CountMatches( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> int :
    Counts the matches for the query.
    
     Arguments:
      - query:      substructure query
      - numThreads: number of threads to use, -1 means all threads
    

    C++ signature :
        unsigned int CountMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol [,bool=True [,bool=True [,bool=False [,int=-1]]]])

CountMatches( (SubstructLibrary)arg1, (Mol)query, (int)startIdx, (int)endIdx [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> int :
    Counts the matches for the query.
    
     Arguments:
      - query:      substructure query
      - startIdx:   index to search from
      - endIdx:     index (non-inclusive) to search to
      - numThreads: number of threads to use, -1 means all threads
    

    C++ signature :
        unsigned int CountMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol,unsigned int,unsigned int [,bool=True [,bool=True [,bool=False [,int=-1]]]])
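
Because endIdx is non-inclusive, consecutive windows [startIdx, endIdx) tile the library without overlap, so per-window counts add up to the count for the whole library. A minimal sketch, assuming only the range overload documented above; half_open_windows and count_in_windows are hypothetical helper names, and library is any object with __len__ and CountMatches(query, startIdx, endIdx):

```python
def half_open_windows(n_items, window_size):
    """Yield (start, end) pairs covering range(n_items); end is exclusive."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, n_items, window_size):
        yield start, min(start + window_size, n_items)


def count_in_windows(library, query, window_size):
    """Sum CountMatches over consecutive half-open index windows."""
    total = 0
    for start, end in half_open_windows(len(library), window_size):
        total += library.CountMatches(query, start, end)
    return total


windows = list(half_open_windows(10, 4))
print(windows)  # [(0, 4), (4, 8), (8, 10)]
```

With a populated library this could be called as count_in_windows(library, core, 50), for example to report progress between windows.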

GetMatches(...)

 

GetMatches( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1 [, (int)maxResults=1000]]]]]) -> _vectj :
    Get the matches for the query.
    
     Arguments:
      - query:      substructure query
      - numThreads: number of threads to use, -1 means all threads
      - maxResults: maximum number of results to return

    C++ signature :
        std::vector<unsigned int, std::allocator<unsigned int> > GetMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol [,bool=True [,bool=True [,bool=False [,int=-1 [,int=1000]]]]])

GetMatches( (SubstructLibrary)arg1, (Mol)query, (int)startIdx, (int)endIdx [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1 [, (int)maxResults=1000]]]]]) -> _vectj :
    Get the matches for the query.
    
     Arguments:
      - query:      substructure query
      - startIdx:   index to search from
      - endIdx:     index (non-inclusive) to search to
      - numThreads: number of threads to use, -1 means all threads
      - maxResults: maximum number of results to return

    C++ signature :
        std::vector<unsigned int, std::allocator<unsigned int> > GetMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol,unsigned int,unsigned int [,bool=True [,bool=True [,bool=False [,int=-1 [,int=1000]]]]])
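
Since maxResults defaults to 1000, a search over a large library can stop at 1000 hits. One way to collect every hit, sketched here on the assumption that the range overload above is available, is to search window by window and set maxResults to the window length, since a window cannot contain more matches than molecules. all_matches is a hypothetical helper, and library is any object exposing __len__ and this GetMatches overload. The sketch does not depend on how the returned indices are numbered, which the text above does not state.

```python
def all_matches(library, query, window_size=1000):
    """Collect match indices window by window so no window is truncated."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    hits = []
    n_items = len(library)
    for start in range(0, n_items, window_size):
        end = min(start + window_size, n_items)  # endIdx is non-inclusive
        # A window holds at most (end - start) molecules, so this cap is
        # never the limiting factor.
        hits.extend(library.GetMatches(query, start, end,
                                       maxResults=end - start))
    return hits
```

Smaller windows mean more calls but smaller intermediate results; the default of 1000 simply mirrors the documented maxResults default.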

GetMol(...)

 

GetMol( (SubstructLibrary)arg1, (int)arg2) -> Mol :
    Returns a particular molecule in the molecule holder
    
      ARGUMENTS:
        - idx: which molecule to return
    
      NOTE: molecule indices start at 0
    

    C++ signature :
        boost::shared_ptr<RDKit::ROMol> GetMol(RDKit::SubstructLibrary {lvalue},unsigned int)

HasMatch(...)

 

HasMatch( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> bool :
    Checks whether any molecule matches the query.
    
     Arguments:
      - query:      substructure query
      - numThreads: number of threads to use, -1 means all threads
    

    C++ signature :
        bool HasMatch(RDKit::SubstructLibrary {lvalue},RDKit::ROMol [,bool=True [,bool=True [,bool=False [,int=-1]]]])

HasMatch( (SubstructLibrary)arg1, (Mol)query, (int)startIdx, (int)endIdx [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> bool :
    Checks whether any molecule in the index range matches the query.
    
     Arguments:
      - query:      substructure query
      - startIdx:   index to search from
      - endIdx:     index (non-inclusive) to search to
      - numThreads: number of threads to use, -1 means all threads
    

    C++ signature :
        bool HasMatch(RDKit::SubstructLibrary {lvalue},RDKit::ROMol,unsigned int,unsigned int [,bool=True [,bool=True [,bool=False [,int=-1]]]])

__init__(...)
(Constructor)

 

__init__( (object)arg1) -> None :

    C++ signature :
        void __init__(_object*)

__init__( (object)arg1, (MolHolderBase)arg2) -> None :

    C++ signature :
        void __init__(_object*,boost::shared_ptr<RDKit::MolHolderBase>)

__init__( (object)arg1, (MolHolderBase)arg2, (FPHolderBase)arg3) -> None :

    C++ signature :
        void __init__(_object*,boost::shared_ptr<RDKit::MolHolderBase>,boost::shared_ptr<RDKit::FPHolderBase>)

Overrides: object.__init__

__len__(...)
(Length operator)

 

__len__( (SubstructLibrary)arg1) -> int :

    C++ signature :
        unsigned int __len__(RDKit::SubstructLibrary {lvalue})

__reduce__(...)

 
helper for pickle

Overrides: object.__reduce__
(inherited documentation)