object --+
|
??.instance --+
|
SubstructLibrary
SubstructLibrary: This provides a simple API for substructure searching of large datasets.
The SubstructLibrary takes full advantage of available threads during the search operation.
Basic operation is simple:
>>> from __future__ import print_function
>>> import os
>>> from rdkit import Chem, RDConfig
>>> from rdkit.Chem import rdSubstructLibrary
>>> library = rdSubstructLibrary.SubstructLibrary()
>>> for mol in Chem.SDMolSupplier(os.path.join(RDConfig.RDDataDir,
...                               'NCI', 'first_200.props.sdf')):
...   idx = library.AddMol(mol)
>>> core = Chem.MolFromSmarts('CCCCOC')
>>> indices = library.GetMatches(core)
>>> len(indices)
11
Substructure matching options can be sent into GetMatches:
>>> indices = library.GetMatches(core, useChirality=False)
>>> len(indices)
11
The number of threads and the maximum number of matches returned can also be controlled
(the default is to run on all cores):
>>> indices = library.GetMatches(core, numThreads=2, maxResults=10)
>>> len(indices)
10
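GetMatches returns indices into the library rather than molecules, and GetMol turns an index back into a molecule. The following is a minimal sketch of combining the two; `iter_matches` is a hypothetical helper name, not part of rdSubstructLibrary, and it assumes only that `library` exposes GetMatches and GetMol as documented on this page.

```python
def iter_matches(library, query, **search_options):
    """Yield (index, molecule) pairs for every hit of `query` in `library`.

    `library` is assumed to behave like a SubstructLibrary: GetMatches
    returns an iterable of indices and GetMol maps an index to a molecule.
    `search_options` is forwarded unchanged to GetMatches
    (for example numThreads=2 or maxResults=10).
    """
    for idx in library.GetMatches(query, **search_options):
        yield idx, library.GetMol(idx)
```

With the library and core built above, `for idx, mol in iter_matches(library, core, maxResults=10): ...` would visit at most ten hits.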
Working on larger datasets:
Molecules are fairly large objects, which limits the number that can be kept in memory.
To help with this, we supply three other molecule holders:
CachedMolHolder - stores molecules as their pickled representation
CachedSmilesMolHolder - stores molecules internally as SMILES strings
CachedTrustedSmilesMolHolder - accepts (and stores) molecules as trusted SMILES strings
Using Pattern fingerprints as a pre-filter:
Pattern fingerprints provide an easy way to indicate whether the substructure search should
be done at all. This is particularly useful with the binary (pickled) and SMILES-based molecule
holders, as they have an expensive molecule creation step in addition to the substructure searching step.
>>> library = rdSubstructLibrary.SubstructLibrary(rdSubstructLibrary.CachedSmilesMolHolder(),
...                                               rdSubstructLibrary.PatternHolder())
>>> for mol in Chem.SDMolSupplier(os.path.join(RDConfig.RDDataDir,
...                               'NCI', 'first_200.props.sdf')):
...   idx = library.AddMol(mol)
>>> indices = library.GetMatches(core)
>>> len(indices)
11
This (obviously) takes longer to initialize. However, both the molecule and pattern
holders can be populated with raw data. A simple example is below:
>>> import csv
>>> molholder = rdSubstructLibrary.CachedSmilesMolHolder()
>>> pattern_holder = rdSubstructLibrary.PatternHolder()
>>> for i, row in enumerate(csv.reader(open(os.path.join(RDConfig.RDDataDir,
...                               'NCI', 'first_200.tpsa.csv')))):
...   if i:
...     idx = molholder.AddSmiles(row[0])
...     idx2 = pattern_holder.AddFingerprint(
...         pattern_holder.MakeFingerprint(Chem.MolFromSmiles(row[0])))
...     assert idx==idx2
>>> library = rdSubstructLibrary.SubstructLibrary(molholder,pattern_holder)
>>> indices = library.GetMatches(core)
>>> len(indices)
11
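The pre-filter idea described above can be illustrated without RDKit. The sketch below models fingerprints as plain Python integers used as bit sets: a molecule can only contain the query if every bit set in the query fingerprint is also set in the molecule fingerprint, so anything failing that test skips the expensive step. This is a conceptual illustration only, not how PatternHolder is implemented; `passes_screen`, `screened_search`, and `expensive_match` are hypothetical names.

```python
def passes_screen(query_fp, mol_fp):
    """Return True if every bit set in query_fp is also set in mol_fp."""
    return (query_fp & mol_fp) == query_fp


def screened_search(query_fp, mol_fps, expensive_match):
    """Return indices that pass the fingerprint screen AND the real match.

    `mol_fps` is a sequence of integer bit sets, one per stored molecule.
    `expensive_match(idx)` stands in for molecule creation plus the full
    substructure match; it is only called for molecules that pass the screen.
    """
    hits = []
    for idx, mol_fp in enumerate(mol_fps):
        if not passes_screen(query_fp, mol_fp):
            continue  # cannot match, so skip the expensive step entirely
        if expensive_match(idx):
            hits.append(idx)
    return hits
```

The screen can produce false positives (a molecule passes but does not match), which is why the full match still runs on the survivors; it should never reject a molecule that truly matches.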
__instance_size__ = 24
AddMol( (SubstructLibrary)arg1, (Mol)mol) -> int :
Adds a molecule to the substruct library
C++ signature :
unsigned int AddMol(RDKit::SubstructLibrary {lvalue},RDKit::ROMol)
|
CountMatches( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> int :
Count the matches for the query.
Arguments:
- query: substructure query
- numThreads: number of threads to use, -1 means all threads
C++ signature :
unsigned int CountMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol [,bool=True [,bool=True [,bool=False [,int=-1]]]])
CountMatches( (SubstructLibrary)arg1, (Mol)query, (int)startIdx, (int)endIdx [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> int :
Count the matches for the query within an index range.
Arguments:
- query: substructure query
- startIdx: index to search from
- endIdx: index (non-inclusive) to search to
- numThreads: number of threads to use, -1 means all threads
C++ signature :
unsigned int CountMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol,unsigned int,unsigned int [,bool=True [,bool=True [,bool=False [,int=-1]]]])
|
GetMatches( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1 [, (int)maxResults=1000]]]]]) -> _vectj :
Get the matches for the query.
Arguments:
- query: substructure query
- numThreads: number of threads to use, -1 means all threads
- maxResults: maximum number of results to return
C++ signature :
std::vector<unsigned int, std::allocator<unsigned int> > GetMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol [,bool=True [,bool=True [,bool=False [,int=-1 [,int=1000]]]]])
GetMatches( (SubstructLibrary)arg1, (Mol)query, (int)startIdx, (int)endIdx [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1 [, (int)maxResults=1000]]]]]) -> _vectj :
Get the matches for the query.
Arguments:
- query: substructure query
- startIdx: index to search from
- endIdx: index (non-inclusive) to search to
- numThreads: number of threads to use, -1 means all threads
- maxResults: maximum number of results to return
C++ signature :
std::vector<unsigned int, std::allocator<unsigned int> > GetMatches(RDKit::SubstructLibrary {lvalue},RDKit::ROMol,unsigned int,unsigned int [,bool=True [,bool=True [,bool=False [,int=-1 [,int=1000]]]]])
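The startIdx/endIdx overload makes it possible to search a library one slice at a time, for example to report progress or to bound the work done per call. Below is a minimal sketch of generating the half-open ranges and issuing one call per range. `index_ranges` and `get_matches_by_chunk` are hypothetical helper names, and the code assumes only that `library` supports `len()` and the GetMatches(query, startIdx, endIdx, ...) overload documented above.

```python
def index_ranges(total, chunk_size):
    """Yield half-open (startIdx, endIdx) pairs that cover range(total)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def get_matches_by_chunk(library, query, chunk_size, **search_options):
    """Yield (startIdx, endIdx, hits) for each slice of the library.

    endIdx is non-inclusive, matching the documented argument.
    `search_options` is forwarded unchanged (e.g. numThreads, maxResults);
    note that maxResults then applies to each call separately.
    """
    for start, end in index_ranges(len(library), chunk_size):
        yield start, end, library.GetMatches(query, start, end, **search_options)
```

For a library of 10 molecules and a chunk size of 4, the ranges are (0, 4), (4, 8), and (8, 10).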
|
GetMol( (SubstructLibrary)arg1, (int)arg2) -> Mol :
Returns a particular molecule in the molecule holder
ARGUMENTS:
- idx: which molecule to return
NOTE: molecule indices start at 0
C++ signature :
boost::shared_ptr<RDKit::ROMol> GetMol(RDKit::SubstructLibrary {lvalue},unsigned int)
|
HasMatch( (SubstructLibrary)arg1, (Mol)query [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> bool :
Check whether any molecule in the library matches the query.
Arguments:
- query: substructure query
- numThreads: number of threads to use, -1 means all threads
C++ signature :
bool HasMatch(RDKit::SubstructLibrary {lvalue},RDKit::ROMol [,bool=True [,bool=True [,bool=False [,int=-1]]]])
HasMatch( (SubstructLibrary)arg1, (Mol)query, (int)startIdx, (int)endIdx [, (bool)recursionPossible=True [, (bool)useChirality=True [, (bool)useQueryQueryMatches=False [, (int)numThreads=-1]]]]) -> bool :
Check whether any molecule in the given index range matches the query.
Arguments:
- query: substructure query
- startIdx: index to search from
- endIdx: index (non-inclusive) to search to
- numThreads: number of threads to use, -1 means all threads
C++ signature :
bool HasMatch(RDKit::SubstructLibrary {lvalue},RDKit::ROMol,unsigned int,unsigned int [,bool=True [,bool=True [,bool=False [,int=-1]]]])
|
__init__( (object)arg1) -> None :
C++ signature :
void __init__(_object*)
__init__( (object)arg1, (MolHolderBase)arg2) -> None :
C++ signature :
void __init__(_object*,boost::shared_ptr<RDKit::MolHolderBase>)
__init__( (object)arg1, (MolHolderBase)arg2, (FPHolderBase)arg3) -> None :
C++ signature :
void __init__(_object*,boost::shared_ptr<RDKit::MolHolderBase>,boost::shared_ptr<RDKit::FPHolderBase>)
|
__len__( (SubstructLibrary)arg1) -> int :
C++ signature :
unsigned int __len__(RDKit::SubstructLibrary {lvalue})
|
helper for pickle
|
| Generated by Epydoc 3.0.1 on Sun Oct 8 11:32:01 2017 | http://epydoc.sourceforge.net |