Package Chem :: Module BuildFragmentCatalog

Module BuildFragmentCatalog

command line utility for working with FragmentCatalogs (CASE-type analysis)

**Usage**

  BuildFragmentCatalog [optional args] <filename>

 filename, the name of a delimited text file containing InData, is required
 for some modes of operation (see below)

**Command Line Arguments**

 - -n *maxNumMols*:  specify the maximum number of molecules to be processed

 - -b: build the catalog and OnBitLists
    *requires InData*

 - -s: score compounds
    *requires InData and a Catalog, can use OnBitLists*

 - -g: calculate info gains
    *requires Scores*

 - -d: show details about high-ranking fragments
    *requires a Catalog and Gains*

 - --catalog=*filename*: filename with the pickled catalog.
    If -b is provided, this file will be overwritten.

 - --onbits=*filename*: filename to hold the pickled OnBitLists.
   If -b is provided, this file will be overwritten
  
 - --scores=*filename*: filename to hold the text score data.
   If -s is provided, this file will be overwritten

 - --gains=*filename*: filename to hold the text gains data.
   If -g is provided, this file will be overwritten

 - --details=*filename*: filename to hold the text details data.
   If -d is provided, this file will be overwritten.

 - --minPath=2: specify the minimum length for a path

 - --maxPath=6: specify the maximum length for a path

 - --smiCol=1: specify which column in the input data file contains
     SMILES

 - --actCol=-1: specify which column in the input data file contains
     activities

 - --nActs=2: specify the number of possible activity values

 - --nBits=-1: specify the maximum number of bits to show details for
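
The option list above can be parsed with the standard `getopt` module. The following is only a minimal sketch, not the module's own `ParseArgs()`: the function name `parse_args` and the keys of the settings dictionary are hypothetical, and `None` as the default for `-n` is an assumption because no default is documented.

```python
import getopt


def parse_args(argv):
    """Parse argv (without the program name) into (settings, positional args)."""
    settings = {
        "maxNumMols": None,  # -n; no default is documented, None is an assumption
        "build": False,      # -b
        "score": False,      # -s
        "gains": False,      # -g
        "details": False,    # -d
        "catalogFile": None, "onbitsFile": None, "scoresFile": None,
        "gainsFile": None, "detailsFile": None,
        # numeric defaults copied from the option list above
        "minPath": 2, "maxPath": 6, "smiCol": 1, "actCol": -1,
        "nActs": 2, "nBits": -1,
    }
    long_opts = ["catalog=", "onbits=", "scores=", "gains=", "details=",
                 "minPath=", "maxPath=", "smiCol=", "actCol=", "nActs=", "nBits="]
    opts, extras = getopt.getopt(argv, "n:bsgd", long_opts)

    flags = {"-b": "build", "-s": "score", "-g": "gains", "-d": "details"}
    int_opts = ("minPath", "maxPath", "smiCol", "actCol", "nActs", "nBits")
    for opt, val in opts:
        name = opt.lstrip("-")
        if opt == "-n":
            settings["maxNumMols"] = int(val)
        elif opt in flags:
            settings[flags[opt]] = True
        elif name in int_opts:
            settings[name] = int(val)
        else:
            # remaining long options all carry a filename
            settings[name + "File"] = val
    return settings, extras


settings, extras = parse_args(
    ["-b", "-n", "100", "--catalog=cat.pkl", "--maxPath=5", "data.txt"])
```

Here `extras` holds the positional input filename, and `settings` carries the mode flags together with any overridden defaults.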



Classes
  RunDetails
Functions

message(msg, dest=sys.stdout)

BuildCatalog(suppl, maxPts=-1, groupFileName=None, minPath=2, maxPath=6, reportFreq=10)
builds a fragment catalog from a set of molecules in a delimited text block...

ScoreMolecules(suppl, catalog, maxPts=-1, actName='', acts=None, nActs=2, reportFreq=10)
scores the compounds in a supplier using a catalog...

ScoreFromLists(bitLists, suppl, catalog, maxPts=-1, actName='', acts=None, nActs=2, reportFreq=10)
similar to _ScoreMolecules()_, but uses pre-calculated bit lists for the molecules (this speeds things up a lot)...

CalcGains(suppl, catalog, topN=-1, actName='', acts=None, nActs=2, reportFreq=10, biasList=None, collectFps=0)
calculates info gains by constructing fingerprints...

CalcGainsFromFps(suppl, fps, topN=-1, actName='', acts=None, nActs=2, reportFreq=10, biasList=None)
calculates info gains from a set of fingerprints...

OutputGainsData(outF, gains, cat, nActs=2)

ProcessGainsData(inF, delim=',', idCol=0, gainCol=1)
reads a list of ids and info gains out of an input file...

ShowDetails(catalog, gains, nToDo=-1, outF=sys.stdout, idCol=0, gainCol=1, outDelim=',')
gains should be a sequence of sequences.

SupplierFromDetails(details)

Usage()

ParseArgs(details)
Variables
  _cvsVersion = '$Revision: 2 $'
  idx1 = 10
  idx2 = 13
  __VERSION_STRING = ' 2 '
Function Details

BuildCatalog(suppl, maxPts=-1, groupFileName=None, minPath=2, maxPath=6, reportFreq=10)

builds a fragment catalog from a set of molecules in a delimited text block

**Arguments**

  - suppl: a mol supplier

  - maxPts: (optional) if provided, this will set an upper bound on the
    number of points to be considered

  - groupFileName: (optional) name of the file containing functional group
    information

  - minPath, maxPath: (optional) the minimum and maximum path lengths
    to be considered

  - reportFreq: (optional) how often to display status information  

**Returns**

  a FragmentCatalog
  

ScoreMolecules(suppl, catalog, maxPts=-1, actName='', acts=None, nActs=2, reportFreq=10)

scores the compounds in a supplier using a catalog

**Arguments**

  - suppl: a mol supplier

  - catalog: the FragmentCatalog

  - maxPts: (optional) the maximum number of molecules to be
    considered

  - actName: (optional) the name of the molecule's activity property.
    If this is not provided, the molecule's last property will be used.

  - acts: (optional) a sequence of activity values (integers).
    If not provided, the activities will be read from the molecules.

  - nActs: (optional) number of possible activity values

  - reportFreq: (optional) how often to display status information  

**Returns**

  a 2-tuple:

    1) the results table (a 3D array of ints nBits x 2 x nActs)

    2) a list containing the on bit lists for each molecule

ScoreFromLists(bitLists, suppl, catalog, maxPts=-1, actName='', acts=None, nActs=2, reportFreq=10)

similar to _ScoreMolecules()_, but uses pre-calculated bit lists
for the molecules (this speeds things up a lot)


**Arguments**

  - bitLists: sequence of on bit sequences for the input molecules

  - suppl: the input supplier (we read activities from here)

  - catalog: the FragmentCatalog

  - maxPts: (optional) the maximum number of molecules to be
    considered

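  - actName: (optional) the name of the molecule's activity property.
    If this is not provided, the molecule's last property will be used.

  - acts: (optional) a sequence of activity values (integers).
    If not provided, the activities will be read from the molecules.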

  - nActs: (optional) number of possible activity values

  - reportFreq: (optional) how often to display status information  

**Returns**

   the results table (a 3D array of ints nBits x 2 x nActs)
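
The shape of the results table can be illustrated without any chemistry toolkit. The sketch below is not the module's implementation: `score_from_lists` is a hypothetical helper, bit IDs are assumed to be 0-based, and the meaning of the middle axis (index 0 for "bit set", index 1 for "bit not set") is an assumption made for the example.

```python
def score_from_lists(bit_lists, acts, n_bits, n_acts=2):
    """Build an n_bits x 2 x n_acts table of molecule counts.

    table[bit][0][act] counts molecules of class `act` with the bit set,
    table[bit][1][act] counts those without it (index meaning is assumed).
    """
    table = [[[0] * n_acts for _ in range(2)] for _ in range(n_bits)]
    for on_bits, act in zip(bit_lists, acts):
        on = set(on_bits)
        for bit in range(n_bits):
            table[bit][0 if bit in on else 1][act] += 1
    return table


# three molecules: on-bit lists and integer activity classes
bit_lists = [[0, 2], [0], [1]]
acts = [1, 1, 0]
table = score_from_lists(bit_lists, acts, n_bits=3)
```

Because the on-bit lists are already available, no fragment matching is needed in this step, which is why the list-based route is the faster one.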

CalcGains(suppl, catalog, topN=-1, actName='', acts=None, nActs=2, reportFreq=10, biasList=None, collectFps=0)

calculates info gains by constructing fingerprints
*DOC*

Returns a 2-tuple:
   1) gains matrix
   2) list of fingerprints
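
Since the docstring does not say how a gain is computed, the sketch below shows the textbook Shannon information gain for a single bit, given a 2 x nActs table of counts. Whether the module uses exactly this formula is an assumption, and `entropy` and `info_gain` are hypothetical helper names.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def info_gain(var_table):
    """Information gain of one bit.

    var_table[v][a] is the number of molecules with bit value v
    and activity class a.
    """
    n_acts = len(var_table[0])
    act_totals = [sum(row[a] for row in var_table) for a in range(n_acts)]
    total = sum(act_totals)
    if total == 0:
        return 0.0
    # entropy of the activities, minus the weighted entropy within each bit value
    gain = entropy(act_totals)
    for row in var_table:
        gain -= sum(row) / total * entropy(row)
    return gain


perfect = info_gain([[2, 0], [0, 2]])   # bit separates the classes completely
useless = info_gain([[1, 1], [1, 1]])   # bit says nothing about activity
```

A bit that splits the activity classes cleanly gets the full entropy of the activity distribution as its gain, while a bit that is equally common in every class gets zero.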

CalcGainsFromFps(suppl, fps, topN=-1, actName='', acts=None, nActs=2, reportFreq=10, biasList=None)

calculates info gains from a set of fingerprints

*DOC*

ProcessGainsData(inF, delim=',', idCol=0, gainCol=1)

reads a list of ids and info gains out of an input file
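
Reading such a file amounts to splitting each line on the delimiter and picking two columns. The following is a rough sketch under stated assumptions, not the module's code: `read_gains` is a hypothetical name, the `skip_header` parameter is an addition for illustration, and IDs are left as strings.

```python
import io


def read_gains(in_f, delim=",", id_col=0, gain_col=1, skip_header=True):
    """Return a list of (id, gain) tuples read from a delimited text file."""
    rows = []
    for line_no, line in enumerate(in_f):
        if skip_header and line_no == 0:
            continue  # assumed header line
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(delim)
        rows.append((fields[id_col], float(fields[gain_col])))
    return rows


text = "id,gain\n12,0.25\n7,0.5\n"
gains = read_gains(io.StringIO(text))
```

The result is a sequence of sequences, so it has the shape that a `gains` argument with `idCol=0` and `gainCol=1` expects.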

  

ShowDetails(catalog, gains, nToDo=-1, outF=sys.stdout, idCol=0, gainCol=1, outDelim=',')

gains should be a sequence of sequences.  The idCol entry of each
sub-sequence should be a catalog ID.  _ProcessGainsData()_ provides
suitable input.
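
To make the expected input concrete, here is a small sketch of how such a `gains` sequence might be consumed. It is not the module's `ShowDetails()`: the callable `describe` is a hypothetical stand-in for a catalog lookup, the fragment descriptions are made-up placeholders, and the output columns (ID, description, gain) are a guess.

```python
import io
import sys


def show_details(describe, gains, n_to_do=-1, out_f=sys.stdout,
                 id_col=0, gain_col=1, out_delim=","):
    """Write one delimited line per entry for the first n_to_do entries."""
    if n_to_do < 0:
        n_to_do = len(gains)  # negative means "show everything"
    for row in gains[:n_to_do]:
        frag_id = int(row[id_col])
        gain = float(row[gain_col])
        fields = (str(frag_id), describe(frag_id), str(gain))
        out_f.write(out_delim.join(fields) + "\n")


# placeholder descriptions standing in for catalog entries
names = {3: "C-C-O", 8: "c:c:n"}
buf = io.StringIO()
show_details(names.get, [("8", 0.5), ("3", 0.25)], n_to_do=1, out_f=buf)
```

The entries are processed in the order given, so any ranking by gain has to happen before the sequence is passed in.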