
Module Parser

The "parser" for compound descriptors.

I almost hesitate to document this, because it's not the prettiest
thing the world has ever seen... but it does work (for at least some
definitions of the word).

Rather than getting into the whole mess of writing a parser for the
compound descriptor expressions, I'm just using string substitutions
and python's wonderful ability to *eval* code.

It would probably be a good idea at some point to replace this with a
real parser, if only for the flexibility and intelligent error
messages that would become possible.

The general idea is that we're going to deal with expressions where
atomic descriptors have some kind of method applied to them which
reduces them to a single number for the entire composition.  Compound
descriptors (those applicable to the compound as a whole) are not
operated on by anything in particular (except for standard math stuff).

Here's the general flow of things:

  1) Compound descriptor references ($a, $b, etc.) are replaced with the
     corresponding descriptor names using string substitution.
     (*_SubForCompoundDescriptors*)

  2) Atomic descriptor references ($1, $2, etc.) are replaced with lookups
     into the atomic dict with "DEADBEEF" in place of the atom name.
     (*_SubForAtomicVars*)

  3) Calls to Calculator Functions are augmented with a reference to
     the composition and atomic dictionary
     (*_SubMethodArgs*)
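
The first two steps above can be sketched as plain string replacement. This is only an illustration under stated assumptions, not the module's own code: the helper names `sub_compound_refs` and `sub_atomic_refs` are hypothetical, and the exact text of the generated lookups (`propDict["..."]`, `atomDict["DEADBEEF"]["..."]`) is assumed from the descriptions above.

```python
import string

ATOM_PLACEHOLDER = "DEADBEEF"  # stand-in for the atom name, as described above


def sub_compound_refs(expr, names, dict_name):
    """Replace $a, $b, ... with lookups of the named compound descriptors.

    Hypothetical helper; the generated lookup text is an assumption.
    """
    for letter, name in zip(string.ascii_lowercase, names):
        expr = expr.replace("$" + letter, '%s["%s"]' % (dict_name, name))
    return expr


def sub_atomic_refs(expr, names, dict_name):
    """Replace $1, $2, ... with placeholder lookups into the atomic dict."""
    # Work from the highest index down so "$1" cannot clobber "$10".
    for idx in range(len(names), 0, -1):
        lookup = '%s["%s"]["%s"]' % (dict_name, ATOM_PLACEHOLDER, names[idx - 1])
        expr = expr.replace("$%d" % idx, lookup)
    return expr


expr = "SUM($1)/$a"
expr = sub_compound_refs(expr, ["c1"], "propDict")
expr = sub_atomic_refs(expr, ["d1", "d2"], "atomDict")
print(expr)
```

The placeholder stays in the string at this stage; it is only swapped for a real atom name later, when a calculator method walks over the composition.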

**NOTE:**

  anytime we don't know the answer for a descriptor, rather than
  throwing a (completely incomprehensible) exception, we just return
  -666.  So bad descriptor values should stand out like sore thumbs.
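
A minimal sketch of that sentinel convention, assuming the final expression is evaluated with *eval*. The helper name `safe_eval` and the constant `BAD_VALUE` are hypothetical and not part of this module.

```python
from math import sqrt

BAD_VALUE = -666  # sentinel value named in the note above


def safe_eval(expr, namespace):
    """Evaluate expr; return the sentinel instead of raising on any failure."""
    try:
        # Copy the namespace so eval's bookkeeping does not leak into the caller's dict.
        return eval(expr, dict(namespace))
    except Exception:
        return BAD_VALUE


print(safe_eval("sqrt(x)", {"sqrt": sqrt, "x": 4.0}))
print(safe_eval("foo", {}))  # unknown name, so the sentinel comes back
```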

Functions

HAS(strArg, composList, atomDict)
*Calculator Method*

SUM(strArg, composList, atomDict)
*Calculator Method*

MEAN(strArg, composList, atomDict)
*Calculator Method*

AVG(strArg, composList, atomDict)
*Calculator Method*

DEV(strArg, composList, atomDict)
*Calculator Method*

MIN(strArg, composList, atomDict)
*Calculator Method*

MAX(strArg, composList, atomDict)
*Calculator Method*

_SubForAtomicVars(cExpr, varList, dictName)
replace atomic variables with the appropriate dictionary lookup

_SubForCompoundDescriptors(cExpr, varList, dictName)
replace compound variables with the appropriate list index

_SubMethodArgs(cExpr, knownMethods)
alters the arguments of calls to calculator methods

CalcSingleCompoundDescriptor(compos, argVect, atomDict, propDict)
calculates the value of the descriptor for a single compound

CalcMultipleCompoundsDescriptor(composVect, argVect, atomDict, propDictList)
calculates the value of the descriptor for a list of compounds

Variables
  __DEBUG = 0
  knownMethods = ['SUM', 'MIN', 'MAX', 'MEAN', 'AVG', 'DEV', 'HAS']
  __package__ = 'rdkit.ML.Descriptors'
  e = 2.71828182846
  pi = 3.14159265359

Imports: RDConfig, string, acos, acosh, asin, asinh, atan, atan2, atanh, ceil, copysign, cos, cosh, degrees, erf, erfc, exp, expm1, fabs, factorial, floor, fmod, frexp, fsum, gamma, hypot, isinf, isnan, ldexp, lgamma, log, log10, log1p, modf, pow, radians, sin, sinh, sqrt, tan, tanh, trunc


Function Details

HAS(strArg, composList, atomDict)

*Calculator Method*

does a string search

**Arguments**

  - strArg: the arguments in string form

  - composList: the composition vector

  - atomDict: the atomic dictionary

**Returns**

  1 or 0

SUM(strArg, composList, atomDict)

*Calculator Method*

calculates the sum of a descriptor across a composition

**Arguments**

  - strArg: the arguments in string form

  - composList: the composition vector

  - atomDict: the atomic dictionary

**Returns**

  a float
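
A minimal sketch of how such a calculator method could work, assuming that *strArg* holds a lookup with the "DEADBEEF" placeholder and that each atom's value is weighted by its amount in the composition. The weighting and the helper name `sum_over_composition` are assumptions, not confirmed by this page.

```python
def sum_over_composition(str_arg, compos_list, atomDict):
    """Sum a per-atom expression over a composition (hypothetical helper)."""
    total = 0.0
    for atom, amount in compos_list:
        # Swap the placeholder for the real atom name, then evaluate the lookup.
        lookup = str_arg.replace("DEADBEEF", atom)
        # Assumption: each atom contributes in proportion to its amount.
        total += eval(lookup, {"atomDict": atomDict}) * amount
    return total


# Made-up descriptor values, for illustration only.
atomDict = {"Fe": {"d1": 1.0}, "Pt": {"d1": 10.0}}
compos = [("Fe", 1.0), ("Pt", 2.0)]
print(sum_over_composition('atomDict["DEADBEEF"]["d1"]', compos, atomDict))  # 21.0
```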

MEAN(strArg, composList, atomDict)

*Calculator Method*

calculates the average of a descriptor across a composition

**Arguments**

  - strArg: the arguments in string form

  - composList: the composition vector

  - atomDict: the atomic dictionary

**Returns**

  a float

AVG(strArg, composList, atomDict)

*Calculator Method*

calculates the average of a descriptor across a composition

**Arguments**

  - strArg: the arguments in string form

  - composList: the composition vector

  - atomDict: the atomic dictionary

**Returns**

  a float

DEV(strArg, composList, atomDict)

*Calculator Method*

calculates the average deviation of a descriptor across a composition

**Arguments**

  - strArg: the arguments in string form

  - composList: the composition vector

  - atomDict: the atomic dictionary

**Returns**

  a float
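
As a worked illustration, "average deviation" is read here as the mean absolute deviation from the mean, with each atom weighted by its amount. Both that reading and the weighting are assumptions, and the helpers `weighted_mean` and `average_deviation` are hypothetical names that take already-evaluated descriptor values rather than a string argument.

```python
def weighted_mean(values, weights):
    """Amount-weighted mean of per-atom descriptor values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def average_deviation(values, weights):
    """Amount-weighted mean absolute deviation from the weighted mean."""
    mean = weighted_mean(values, weights)
    return sum(abs(v - mean) * w for v, w in zip(values, weights)) / sum(weights)


# Made-up values: descriptor 1.0 for an atom with amount 3, 5.0 for an atom with amount 1.
values = [1.0, 5.0]
weights = [3.0, 1.0]
print(weighted_mean(values, weights))      # (3 + 5) / 4 = 2.0
print(average_deviation(values, weights))  # (1*3 + 3*1) / 4 = 1.5
```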

MIN(strArg, composList, atomDict)

*Calculator Method*

calculates the minimum value of a descriptor across a composition

**Arguments**

  - strArg: the arguments in string form

  - composList: the composition vector

  - atomDict: the atomic dictionary

**Returns**

  a float

MAX(strArg, composList, atomDict)

*Calculator Method*

calculates the maximum value of a descriptor across a composition

**Arguments**

  - strArg: the arguments in string form

  - composList: the composition vector

  - atomDict: the atomic dictionary

**Returns**

  a float

_SubForAtomicVars(cExpr, varList, dictName)

replace atomic variables with the appropriate dictionary lookup

*Not intended for client use*

_SubForCompoundDescriptors(cExpr, varList, dictName)

replace compound variables with the appropriate list index

*Not intended for client use*

_SubMethodArgs(cExpr, knownMethods)

alters the arguments of calls to calculator methods

*Not intended for client use*

This is kind of putrid (and the code ain't so pretty either).
The general idea is that the various special methods for atomic
descriptors need two extra arguments (the composition and the atomic
dict).  Rather than make the user type those in, we just find
invocations of these methods and fill out the function calls using
string replacements.
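
The idea can be sketched roughly as follows. This is a hypothetical re-implementation, not the module's code: the function name `add_method_args`, the quoting of the original argument as a string, and the names `compos` and `atomDict` in the generated call are all assumptions made for illustration.

```python
KNOWN_METHODS = ["SUM", "MIN", "MAX", "MEAN", "AVG", "DEV", "HAS"]


def add_method_args(expr, known_methods=KNOWN_METHODS, extra="compos,atomDict"):
    """Rewrite NAME(args) as NAME('args',compos,atomDict) for known methods.

    Limitation of this sketch: parentheses inside string literals are not
    treated specially when looking for the matching close parenthesis.
    """
    out = []
    i = 0
    while i < len(expr):
        for name in known_methods:
            if not expr.startswith(name + "(", i):
                continue
            # Skip matches that are only the tail of a longer identifier.
            if i > 0 and (expr[i - 1].isalnum() or expr[i - 1] == "_"):
                continue
            start = i + len(name) + 1
            depth, j = 1, start
            while j < len(expr) and depth:
                if expr[j] == "(":
                    depth += 1
                elif expr[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unbalanced parentheses in %r" % expr)
            inner = expr[start:j - 1]
            # repr() quotes the original argument so it is passed as a string.
            out.append("%s(%r,%s)" % (name, inner, extra))
            i = j
            break
        else:
            # No known method starts here; copy the character through.
            out.append(expr[i])
            i += 1
    return "".join(out)


print(add_method_args('SUM(atomDict["DEADBEEF"]["d1"])/propDict["c1"]'))
```

Passing the argument through as a string matches the *strArg* parameter of the calculator methods, which can then substitute each atom name for the placeholder before evaluating it.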

CalcSingleCompoundDescriptor(compos, argVect, atomDict, propDict)

calculates the value of the descriptor for a single compound

**ARGUMENTS:**

  - compos: a vector/tuple containing the composition
     information... in the form:
     '[("Fe",1.),("Pt",2.),("Rh",0.02)]'

  - argVect: a vector/tuple with three elements:
  
       1) AtomicDescriptorNames:  a list/tuple of the names of the
         atomic descriptors being used. These determine the
         meaning of $1, $2, etc. in the expression

       2) CompoundDescriptorNames:  a list/tuple of the names of the
         compound descriptors being used. These determine the
         meaning of $a, $b, etc. in the expression

       3) Expr: a string containing the expression to be used to
         evaluate the final result.

  - atomDict:
       a dictionary of atomic descriptors.  Each atomic entry is
       another dictionary containing the individual descriptors
       and their values

  - propDict:
       a list of descriptors for the composition.

**RETURNS:**

  the value of the descriptor, -666 if a problem was encountered
  
**NOTE:**

  - because it takes rather a lot of work to get everything set
      up to calculate a descriptor, if you are calculating the
      same descriptor for multiple compounds, you probably want to
      be calling *CalcMultipleCompoundsDescriptor()*.
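
The argument shapes described above can be illustrated as follows. All descriptor names and values are made up, and treating *propDict* as a dictionary keyed by compound descriptor name is an assumption suggested by the parameter name. The import is guarded so the snippet still runs where RDKit is not installed.

```python
# Atomic descriptors: one inner dictionary per element.
atomDict = {
    "Fe": {"d1": 1.0, "d2": 2.0},
    "Pt": {"d1": 10.0, "d2": 20.0},
}

# Compound descriptors (assumption: keyed by descriptor name).
propDict = {"c1": 100.0}

# Composition as (element, amount) pairs.
compos = [("Fe", 1.0), ("Pt", 2.0)]

# Three elements: atomic names ($1, $2), compound names ($a), expression.
argVect = (["d1", "d2"], ["c1"], "SUM($1)/$a")

try:
    from rdkit.ML.Descriptors import Parser
except ImportError:
    Parser = None  # RDKit not available; only the input shapes are shown

if Parser is not None:
    value = Parser.CalcSingleCompoundDescriptor(compos, argVect, atomDict, propDict)
    print(value)  # -666 would indicate that evaluation failed
```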

CalcMultipleCompoundsDescriptor(composVect, argVect, atomDict, propDictList)

calculates the value of the descriptor for a list of compounds

**ARGUMENTS:**

  - composVect: a vector of vectors/tuples containing the composition
     information.
     See *CalcSingleCompoundDescriptor()* for an explanation of the elements.

  - argVect: a vector/tuple with three elements:

       1) AtomicDescriptorNames:  a list/tuple of the names of the
         atomic descriptors being used. These determine the
         meaning of $1, $2, etc. in the expression

       2) CompoundDescriptorNames:  a list/tuple of the names of the
         compound descriptors being used. These determine the
         meaning of $a, $b, etc. in the expression

       3) Expr: a string containing the expression to be used to
         evaluate the final result.

  - atomDict:
       a dictionary of atomic descriptors.  Each atomic entry is
       another dictionary containing the individual descriptors
       and their values

  - propDictList:
     a vector of vectors of descriptors for the composition.

**RETURNS:**

  a vector containing the values of the descriptor for each
  compound.  Any given entry will be -666 if problems were
  encountered
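
A minimal sketch of the "prepare once, evaluate many times" pattern that a batch call makes possible, with a per-entry -666 on failure. It assumes the prepared expression is a plain Python expression evaluated against one namespace per compound; `eval_for_many` and `BAD_VALUE` are hypothetical names, not part of this module.

```python
BAD_VALUE = -666  # sentinel for entries that could not be evaluated


def eval_for_many(expr, namespaces):
    """Compile expr once, then evaluate it against each namespace in turn."""
    try:
        code = compile(expr, "<descriptor>", "eval")
    except SyntaxError:
        # The expression itself is broken, so every entry fails.
        return [BAD_VALUE] * len(namespaces)
    results = []
    for ns in namespaces:
        try:
            results.append(eval(code, dict(ns)))
        except Exception:
            # Only this compound fails; the others are still evaluated.
            results.append(BAD_VALUE)
    return results


print(eval_for_many("x/y", [{"x": 1.0, "y": 2.0}, {"x": 1.0, "y": 0.0}]))
```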