
Module rdmolops



Module containing RDKit functionality for manipulating and querying molecules.



Functions

AddHs(...)
Adds hydrogens to the graph of a molecule.

AssignAtomChiralCodes(...)
Does the CIP chirality assignment (R/S) for the molecule's atoms.

AssignBondStereoCodes(...)
Does the CIP stereochemistry assignment (Z/E) for the molecule's bonds.

DaylightFingerprint(...)
Returns a "Daylight"-type fingerprint for a molecule.

DeleteSubstructs(...)
Removes atoms matching a substructure query from a molecule.

FindAllPathsOfLengthN(...)
Finds all paths of a particular length in a molecule.

FindAllSubgraphsOfLengthN(...)
Finds all subgraphs of a particular length in a molecule.

FindUniqueSubgraphsOfLengthN(...)
Finds unique subgraphs of a particular length in a molecule.

GetAdjacencyMatrix(...)
Returns the molecule's adjacency matrix.

GetDistanceMatrix(...)
Returns the molecule's distance matrix.

GetFormalCharge(...)
Returns the formal charge for the molecule.

GetMolFrags(...)
Finds the disconnected fragments from a molecule.

GetSSSR(...)
Get the smallest set of simple rings for a molecule.

GetSymmSSSR(...)
Get a symmetrized SSSR for a molecule.

Kekulize(...)
Kekulizes the molecule.

RDKFingerprint(...)
Returns an RDKit topological fingerprint for a molecule.

RemoveHs(...)
Removes any hydrogens from the graph of a molecule.

ReplaceCore(...)
Removes the core of a molecule and labels the sidechains with dummy atoms.

ReplaceSidechains(...)
Replaces sidechains in a molecule with dummy atoms for their attachment points.

ReplaceSubstructs(...)
Replaces atoms matching a substructure query in a molecule.

SanitizeMol(...)
Kekulize, check valencies, set aromaticity, conjugation and hybridization. The molecule is modified in place.

WedgeMolBonds(...)
Set the wedging on single bonds in a molecule.

Function Details

AddHs(...)

Adds hydrogens to the graph of a molecule.

  ARGUMENTS:

    - mol: the molecule to be modified

    - explicitOnly: (optional) if this toggle is set, only explicit Hs will
      be added to the molecule.  Default value is 0 (add implicit and explicit Hs).

    - addCoords: (optional) if this toggle is set, the Hs will have 3D coordinates
      set.  Default value is 0 (no 3D coords).

  RETURNS: a new molecule with added Hs

  NOTES:

    - The original molecule is *not* modified.

    - Much of the code assumes that Hs are not included in the molecular
      topology, so be *very* careful with the molecule that comes back from
      this function.


C++ signature:
    AddHs(RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False) -> RDKit::ROMol*

AssignAtomChiralCodes(...)

Does the CIP chirality assignment (R/S) for the molecule's atoms.

  Chiral atoms will have a property '_CIPCode' indicating their chiral code.

  ARGUMENTS:

    - mol: the molecule to use
    - cleanIt: (optional) if provided, atoms with a chiral specifier that aren't
      actually chiral (e.g. atoms with duplicate substituents or only 2 substituents)
      will have their chiral code set to CHI_UNSPECIFIED
    - force: (optional) causes the calculation to be repeated, even if it has already
      been done


C++ signature:
    AssignAtomChiralCodes(RDKit::ROMol {lvalue} mol, bool cleanIt=False, bool force=False) -> void*

AssignBondStereoCodes(...)

Does the CIP stereochemistry assignment (Z/E) for the molecule's bonds.

  Qualifying bonds will have a property '_CIPCode' indicating their stereochemistry.

  ARGUMENTS:

    - mol: the molecule to use
    - cleanIt: (optional) ignored
    - force: (optional) causes the calculation to be repeated, even if it has already
      been done


C++ signature:
    AssignBondStereoCodes(RDKit::ROMol {lvalue} mol, bool cleanIt=False, bool force=False) -> void*

DaylightFingerprint(...)

Returns a "Daylight"-type fingerprint for a molecule.

  The algorithm is explained below.

  ARGUMENTS:

    - mol: the molecule to use

    - minPath: (optional) minimum number of bonds to include in the subgraphs
      Defaults to 1.

    - maxPath: (optional) maximum number of bonds to include in the subgraphs
      Defaults to 7.

    - fpSize: (optional) number of bits in the fingerprint
      Defaults to 2048.

    - nBitsPerHash: (optional) number of bits to set per path
      Defaults to 4.

    - useHs: (optional) include information about number of Hs on each
      atom when calculating path hashes.
      Defaults to 1.

    - tgtDensity: (optional) fold the fingerprint until this minimum density has
      been reached
      Defaults to 0.

    - minSize: (optional) the minimum size the fingerprint will be folded to when
      trying to reach tgtDensity
      Defaults to 128.

  RETURNS: a DataStructs.ExplicitBitVect with _fpSize_ bits

  ALGORITHM:

    This algorithm works by finding all paths between minPath and maxPath in
    length.  For each path:

     1) The Balaban J value is calculated.

     2) The 32-bit Balaban J value is used to seed a random-number generator.

     3) _nBitsPerHash_ random numbers are generated and used to set the corresponding
        bits in the fingerprint.
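
The tgtDensity and minSize arguments describe folding. The following is a minimal pure-Python sketch of one common way to fold a bit vector (OR-ing the upper half onto the lower half); it is an illustration under that assumption, not the RDKit implementation, and `fold_to_density` is a hypothetical helper name.

```python
def fold_to_density(bits, tgt_density, min_size):
    """Halve a 0/1 list until its bit density reaches tgt_density.

    Assumption: folding ORs the upper half onto the lower half, and the
    result is never made shorter than min_size.
    """
    bits = list(bits)
    while len(bits) // 2 >= min_size and sum(bits) / len(bits) < tgt_density:
        half = len(bits) // 2
        # Combine bit i with bit i + half.
        bits = [bits[i] | bits[i + half] for i in range(half)]
    return bits


# A sparse 16-bit toy fingerprint with three bits set.
fp = [0] * 16
for idx in (1, 9, 14):
    fp[idx] = 1

folded = fold_to_density(fp, tgt_density=0.3, min_size=4)
print(len(folded), folded)
```

With a target density of 0 (the documented default) the loop never runs, so the sketch returns the fingerprint unfolded.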



C++ signature:
    DaylightFingerprint(RDKit::ROMol mol, unsigned int minPath=1, unsigned int maxPath=7, unsigned int fpSize=2048, unsigned int nBitsPerHash=4, bool useHs=True, double tgtDensity=0.0, unsigned int minSize=128) -> ExplicitBitVect*

DeleteSubstructs(...)

Removes atoms matching a substructure query from a molecule

  ARGUMENTS:

    - mol: the molecule to be modified

    - query: the molecule to be used as a substructure query

    - onlyFrags: (optional) if this toggle is set, atoms will only be removed if
      the entire fragment in which they are found is matched by the query.
      See below for examples.
      Default value is 0 (remove the atoms whether or not the entire fragment matches)

  RETURNS: a new molecule with the substructure removed

  NOTES:

    - The original molecule is *not* modified.

  EXAMPLES:

   The following examples use SMILES/SMARTS strings in place of molecules; real calls
   require molecule objects:

    - DeleteSubstructs('CCOC','OC') -> 'CC'

    - DeleteSubstructs('CCOC','OC',1) -> 'CCOC'

    - DeleteSubstructs('CCOCCl.Cl','Cl',1) -> 'CCOCCl'

    - DeleteSubstructs('CCOCCl.Cl','Cl') -> 'CCOC'


C++ signature:
    DeleteSubstructs(RDKit::ROMol mol, RDKit::ROMol query, bool onlyFrags=False) -> RDKit::ROMol*

FindAllPathsOfLengthN(...)

Finds all paths of a particular length in a molecule

  ARGUMENTS:

    - mol: the molecule to use

    - length: an integer with the target length for the paths.

    - useBonds: (optional) toggles the use of bond indices in the paths.
      Otherwise atom indices are used.  *Note* this behavior is different
      from that for subgraphs.
      Defaults to 1.

  RETURNS: a tuple of tuples with IDs for the bonds (or for the atoms, if useBonds is not set).

  NOTES: 

   - Difference between _subgraphs_ and _paths_ :: 

       Subgraphs are potentially branched, whereas paths (in our 
       terminology at least) cannot be.  So, the following graph: 

            C--0--C--1--C--3--C
                  |
                  2
                  |
                  C

       has 3 _subgraphs_ of length 3: (0,1,2),(0,1,3),(2,1,3)
       but only 2 _paths_ of length 3: (0,1,3),(2,1,3)
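
The counts above can be checked with a small pure-Python sketch that enumerates bond subsets of the drawn graph by brute force. It is only an illustration of the terminology, not how RDKit searches; the `BONDS` table and helper names are hypothetical, and tuples are printed in sorted order, so (2,1,3) appears as (1, 2, 3).

```python
from itertools import combinations

# Bond index -> (atom, atom) for the graph drawn above; atom numbering is arbitrary.
BONDS = {0: (0, 1), 1: (1, 2), 2: (1, 4), 3: (2, 3)}


def is_connected(bond_ids):
    """Return True if the bonds form a single connected piece."""
    remaining = set(bond_ids)
    atoms = set(BONDS[remaining.pop()])
    grew = True
    while grew:
        grew = False
        for b in list(remaining):
            if atoms & set(BONDS[b]):
                atoms |= set(BONDS[b])
                remaining.discard(b)
                grew = True
    return not remaining


def is_path(bond_ids):
    """Connected, unbranched and acyclic: no atom carries more than two bonds."""
    degree = {}
    for b in bond_ids:
        for atom in BONDS[b]:
            degree[atom] = degree.get(atom, 0) + 1
    return (is_connected(bond_ids)
            and max(degree.values()) <= 2
            and len(degree) == len(bond_ids) + 1)


def subgraphs_of_length(n):
    """All connected sets of n bonds (branching allowed)."""
    return [c for c in combinations(sorted(BONDS), n) if is_connected(c)]


def paths_of_length(n):
    """The subset of those bond sets that are unbranched."""
    return [c for c in subgraphs_of_length(n) if is_path(c)]


print(subgraphs_of_length(3))
print(paths_of_length(3))
```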


C++ signature:
    FindAllPathsOfLengthN(RDKit::ROMol mol, unsigned int length, bool useBonds=True, bool useHs=False) -> std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >

FindAllSubgraphsOfLengthN(...)

Finds all subgraphs of a particular length in a molecule

  ARGUMENTS:

    - mol: the molecule to use

    - length: an integer with the target number of bonds for the subgraphs.

    - useHs: (optional) toggles whether or not bonds to Hs that are part of the graph
      should be included in the results.
      Defaults to 0.

    - verbose: (optional, internal use) toggles verbosity in the search algorithm.
      Defaults to 0.

  RETURNS: a tuple of tuples with bond IDs

  NOTES: 

   - Difference between _subgraphs_ and _paths_ :: 

       Subgraphs are potentially branched, whereas paths (in our 
       terminology at least) cannot be.  So, the following graph: 

            C--0--C--1--C--3--C
                  |
                  2
                  |
                  C

       has 3 _subgraphs_ of length 3: (0,1,2),(0,1,3),(2,1,3)
       but only 2 _paths_ of length 3: (0,1,3),(2,1,3)


C++ signature:
    FindAllSubgraphsOfLengthN(RDKit::ROMol mol, unsigned int length, bool useHs=False) -> std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >

FindUniqueSubgraphsOfLengthN(...)

Finds unique subgraphs of a particular length in a molecule

  ARGUMENTS:

    - mol: the molecule to use

    - length: an integer with the target number of bonds for the subgraphs.

    - useHs: (optional) toggles whether or not bonds to Hs that are part of the graph
      should be included in the results.
      Defaults to 0.

    - useBO: (optional) Toggles use of bond orders in distinguishing one subgraph from
      another.
      Defaults to 1.

  RETURNS: a tuple of tuples with bond IDs



C++ signature:
    FindUniqueSubgraphsOfLengthN(RDKit::ROMol mol, unsigned int length, bool useHs=False, bool useBO=True) -> std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >

GetAdjacencyMatrix(...)

Returns the molecule's adjacency matrix.

  ARGUMENTS:

    - mol: the molecule to use

    - useBO: (optional) toggles use of bond orders in calculating the matrix.
      Default value is 0.

    - emptyVal: (optional) sets the value of the matrix elements between non-adjacent atoms
      Default value is 0.

    - force: (optional) forces the calculation to proceed, even if there is a cached value.
      Default value is 0.

    - prefix: (optional, internal use) sets the prefix used in the property cache
      Default value is '' (an empty string).

  RETURNS: a Numeric array of floats containing the adjacency matrix


C++ signature:
    GetAdjacencyMatrix(RDKit::ROMol {lvalue} mol, bool useBO=False, int emptyVal=0, bool force=False, char const* prefix='') -> _object*

GetDistanceMatrix(...)

Returns the molecule's distance matrix.

  ARGUMENTS:

    - mol: the molecule to use

    - useBO: (optional) toggles use of bond orders in calculating the distance matrix.
      Default value is 0.

    - useAtomWts: (optional) toggles using atom weights for the diagonal elements of the
      matrix (to return a "Balaban" distance matrix).
      Default value is 0.

    - force: (optional) forces the calculation to proceed, even if there is a cached value.
      Default value is 0.

    - prefix: (optional, internal use) sets the prefix used in the property cache
      Default value is '' (an empty string).

  RETURNS: a Numeric array of floats with the distance matrix
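
To make the two matrix types concrete, here is a pure-Python sketch that builds an adjacency matrix and a topological (bond-count) distance matrix for a four-atom chain. It assumes every bond counts as 1 (as when useBO is off), uses nested lists instead of arrays, and fills the diagonal of the adjacency matrix with `empty_val`; the function names are hypothetical and the shortest-path step uses Floyd-Warshall, which may differ from what RDKit does internally.

```python
def adjacency_matrix(n_atoms, bonds, empty_val=0):
    """1 for bonded atom pairs, empty_val everywhere else."""
    mat = [[empty_val] * n_atoms for _ in range(n_atoms)]
    for a, b in bonds:
        mat[a][b] = mat[b][a] = 1
    return mat


def distance_matrix(n_atoms, bonds):
    """Shortest path, in bonds, between every pair of atoms (Floyd-Warshall)."""
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(n_atoms)]
            for i in range(n_atoms)]
    for a, b in bonds:
        dist[a][b] = dist[b][a] = 1
    for k in range(n_atoms):
        for i in range(n_atoms):
            for j in range(n_atoms):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# A linear chain of four atoms: 0-1-2-3.
chain_bonds = [(0, 1), (1, 2), (2, 3)]
adj = adjacency_matrix(4, chain_bonds)
dmat = distance_matrix(4, chain_bonds)
for row in dmat:
    print(row)
```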


C++ signature:
    GetDistanceMatrix(RDKit::ROMol {lvalue} mol, bool useBO=False, bool useAtomWts=False, bool force=False, char const* prefix='') -> _object*

GetFormalCharge(...)

Returns the formal charge for the molecule.

  ARGUMENTS:

    - mol: the molecule to use


C++ signature:
    GetFormalCharge(RDKit::ROMol) -> int

GetMolFrags(...)

Finds the disconnected fragments from a molecule.

  For example, for the molecule 'CC(=O)[O-].[NH3+]C' GetMolFrags() returns
  ((0, 1, 2, 3), (4, 5))

  ARGUMENTS:

    - mol: the molecule to use

  RETURNS: a tuple of tuples with IDs for the atoms in each fragment.
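
The fragment result in the example can be reproduced with a plain connected-components search. The sketch below is pure Python and works on a hand-written bond list for 'CC(=O)[O-].[NH3+]C' (atoms numbered in SMILES order); `mol_frags` is a hypothetical helper, not the RDKit function.

```python
def mol_frags(n_atoms, bonds):
    """Group atom indices into connected fragments with a depth-first search."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = set()
    frags = []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        frag = []
        while stack:
            atom = stack.pop()
            frag.append(atom)
            for nbr in neighbors[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        frags.append(tuple(sorted(frag)))
    return tuple(frags)


# Acetate (atoms 0-3) and methylammonium (atoms 4-5); bond orders are ignored.
bonds = [(0, 1), (1, 2), (1, 3), (4, 5)]
frags = mol_frags(6, bonds)
print(frags)  # ((0, 1, 2, 3), (4, 5))
```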


C++ signature:
    GetMolFrags(RDKit::ROMol) -> std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >

GetSSSR(...)

Get the smallest set of simple rings for a molecule.

  ARGUMENTS:

    - mol: the molecule to use.

  RETURNS: the number of rings found
         This will be equal to NumBonds-NumAtoms+1 for single-fragment molecules.
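
As a quick worked check of the NumBonds-NumAtoms+1 relation, the sketch below evaluates it for a few hand-counted heavy-atom skeletons. The `n_frags` argument is an added generalisation (the standard graph-theory count for several disconnected pieces), not something stated in the docstring, and `expected_ring_count` is a hypothetical helper.

```python
def expected_ring_count(n_atoms, n_bonds, n_frags=1):
    """Ring count expected from the bond and atom counts of a molecular graph."""
    return n_bonds - n_atoms + n_frags


# Heavy-atom skeletons, counted by hand.
print(expected_ring_count(6, 6))    # cyclohexane: 6 atoms, 6 bonds
print(expected_ring_count(10, 11))  # decalin: 10 atoms, 11 bonds
print(expected_ring_count(8, 12))   # cubane: 8 atoms, 12 bonds
```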


C++ signature:
    GetSSSR(RDKit::ROMol {lvalue}) -> int

GetSymmSSSR(...)

Get a symmetrized SSSR for a molecule.

  The symmetrized SSSR is at least as large as the SSSR for a molecule.
  In certain highly-symmetric cases (e.g. cubane), the symmetrized SSSR can be
  a bit larger (i.e. the number of symmetrized rings is >= NumBonds-NumAtoms+1).

  ARGUMENTS:

    - mol: the molecule to use.

  RETURNS: a sequence of sequences, one per ring found, holding atom IDs


C++ signature:
    GetSymmSSSR(RDKit::ROMol {lvalue}) -> std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >

Kekulize(...)

Kekulizes the molecule

  ARGUMENTS:

    - mol: the molecule to use

    - clearAromaticFlags: (optional) if this toggle is set, all atoms and bonds in the 
      molecule will be marked non-aromatic following the kekulization.
      Default value is 0.

  NOTES:

    - The molecule is modified in place.


C++ signature:
    Kekulize(RDKit::ROMol {lvalue} mol, bool clearAromaticFlags=False) -> void*

RDKFingerprint(...)

Returns an RDKit topological fingerprint for a molecule.

  The algorithm is explained below.

  ARGUMENTS:

    - mol: the molecule to use

    - minPath: (optional) minimum number of bonds to include in the subgraphs
      Defaults to 1.

    - maxPath: (optional) maximum number of bonds to include in the subgraphs
      Defaults to 7.

    - fpSize: (optional) number of bits in the fingerprint
      Defaults to 2048.

    - nBitsPerHash: (optional) number of bits to set per path
      Defaults to 4.

    - useHs: (optional) include information about number of Hs on each
      atom when calculating path hashes.
      Defaults to 1.

    - tgtDensity: (optional) fold the fingerprint until this minimum density has
      been reached
      Defaults to 0.

    - minSize: (optional) the minimum size the fingerprint will be folded to when
      trying to reach tgtDensity
      Defaults to 128.

  RETURNS: a DataStructs.ExplicitBitVect with _fpSize_ bits

  ALGORITHM:

    This algorithm works by finding all paths between minPath and maxPath in
    length.  For each path:

     1) A hash is calculated.

     2) The hash is used to seed a random-number generator.

     3) _nBitsPerHash_ random numbers are generated and used to set the corresponding
        bits in the fingerprint.
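
The three steps can be sketched in pure Python as follows. This is only an illustration of the hash, seed, and set-bits pattern: the real path hash and random-number generator are not specified here, so the sketch substitutes CRC32 of the path's text form and Python's `random.Random`, and represents paths as plain tuples of bond indices. The name `set_path_bits` and its parameters are hypothetical.

```python
import random
import zlib


def set_path_bits(paths, fp_size=2048, n_bits_per_path=4):
    """Set n_bits_per_path pseudo-random bits per path in a 0/1 list."""
    bits = [0] * fp_size
    for path in paths:
        # 1) Hash the path (stand-in hash: CRC32 of its text form).
        seed = zlib.crc32(repr(tuple(path)).encode("ascii"))
        # 2) Seed a random-number generator with the hash.
        rng = random.Random(seed)
        # 3) Draw bit positions and switch them on.
        for _ in range(n_bits_per_path):
            bits[rng.randrange(fp_size)] = 1
    return bits


# Toy "paths" given as tuples of bond indices.
paths = [(0,), (1,), (0, 1)]
fp = set_path_bits(paths, fp_size=64, n_bits_per_path=4)
print(sum(fp), "bits set out of", len(fp))
```

Because the generator is seeded from the path hash, the same path always sets the same bits, which is what makes two fingerprints comparable.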



C++ signature:
    RDKFingerprint(RDKit::ROMol mol, unsigned int minPath=1, unsigned int maxPath=7, unsigned int fpSize=2048, unsigned int nBitsPerHash=4, bool useHs=True, double tgtDensity=0.0, unsigned int minSize=128) -> ExplicitBitVect*

RemoveHs(...)

Removes any hydrogens from the graph of a molecule.

  ARGUMENTS:

    - mol: the molecule to be modified

    - implicitOnly: (optional) if this toggle is set, only implicit Hs will
      be removed from the graph.  Default value is 0 (remove implicit and explicit Hs).

  RETURNS: a new molecule with the Hs removed

  NOTES:

    - The original molecule is *not* modified.


C++ signature:
    RemoveHs(RDKit::ROMol mol, bool implicitOnly=False) -> RDKit::ROMol*

ReplaceCore(...)

Removes the core of a molecule and labels the sidechains with dummy atoms.

  ARGUMENTS:

    - mol: the molecule to be modified

    - coreQuery: the molecule to be used as a substructure query for recognizing the core

    - replaceDummies: toggles replacement of atoms that match dummies in the query

  RETURNS: a new molecule with the core removed

  NOTES:

    - The original molecule is *not* modified.

  EXAMPLES:

   The following examples use SMILES/SMARTS strings in place of molecules; real calls
   require molecule objects:

    - ReplaceCore('CCC1CCC1','C1CCC1') -> 'CC[Xa]'

    - ReplaceCore('CCC1CC1','C1CCC1') -> ''

    - ReplaceCore('C1CC2C1CCC2','C1CCC1') -> '[Xa]C1CCC1[Xb]'

    - ReplaceCore('C1CNCC1','N') -> '[Xa]CCCC[Xb]'

    - ReplaceCore('C1CCC1CN','C1CCC1[*]',False) -> '[Xa]CN'


C++ signature:
    ReplaceCore(RDKit::ROMol mol, RDKit::ROMol coreQuery, bool replaceDummies=True) -> RDKit::ROMol*

ReplaceSidechains(...)

Replaces sidechains in a molecule with dummy atoms for their attachment points.

  ARGUMENTS:

    - mol: the molecule to be modified

    - coreQuery: the molecule to be used as a substructure query for recognizing the core

  RETURNS: a new molecule with the sidechains removed

  NOTES:

    - The original molecule is *not* modified.

  EXAMPLES:

   The following examples use SMILES/SMARTS strings in place of molecules; real calls
   require molecule objects:

    - ReplaceSidechains('CCC1CCC1','C1CCC1') -> '[Xa]C1CCC1'

    - ReplaceSidechains('CCC1CC1','C1CCC1') -> ''

    - ReplaceSidechains('C1CC2C1CCC2','C1CCC1') -> '[Xa]C1CCC1[Xb]'


C++ signature:
    ReplaceSidechains(RDKit::ROMol mol, RDKit::ROMol coreQuery) -> RDKit::ROMol*

ReplaceSubstructs(...)

Replaces atoms matching a substructure query in a molecule

  ARGUMENTS:

    - mol: the molecule to be modified

    - query: the molecule to be used as a substructure query

    - replacement: the molecule to be used as the replacement

    - replaceAll: (optional) if this toggle is set, all substructures matching
      the query will be replaced in a single result, otherwise each result will
      contain a separate replacement.
      Default value is False (return multiple replacements)

  RETURNS: a tuple of new molecules with the substructures replaced

  NOTES:

    - The original molecule is *not* modified.

  EXAMPLES:

   The following examples use SMILES/SMARTS strings in place of molecules; real calls
   require molecule objects:

    - ReplaceSubstructs('CCOC','OC','NC') -> ('CCNC',)

    - ReplaceSubstructs('COCCOC','OC','NC') -> ('COCCNC','CNCCOC')

    - ReplaceSubstructs('COCCOC','OC','NC',True) -> ('CNCCNC',)


C++ signature:
    ReplaceSubstructs(RDKit::ROMol mol, RDKit::ROMol query, RDKit::ROMol replacement, bool replaceAll=False) -> _object*

SanitizeMol(...)

Kekulize, check valencies, set aromaticity, conjugation and hybridization

  ARGUMENTS:

    - mol: the molecule to be modified

  NOTES:

    - The molecule is modified in place.

    - If sanitization fails, an exception will be thrown.


C++ signature:
    SanitizeMol(RDKit::ROMol {lvalue}) -> void*

WedgeMolBonds(...)

Set the wedging on single bonds in a molecule.

  The wedging scheme used is that from Mol files.
 
  ARGUMENTS:

    - molecule: the molecule to update
 


C++ signature:
    WedgeMolBonds(RDKit::ROMol {lvalue}, RDKit::Conformer const*) -> void*