Module Quantize

Automatic search for quantization bounds

This uses the expected information gain to determine where quantization bounds should
lie.
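
As a rough illustration of the criterion, the sketch below computes information gain as the entropy of the result codes minus the size-weighted entropy inside each bin. The helper names `entropy` and `info_gain` are hypothetical, and measuring entropy in bits (log base 2) is an assumption, not something this page states.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(results, bins):
    """Entropy of `results` minus the size-weighted entropy within each bin."""
    n = len(results)
    by_bin = {}
    for r, b in zip(results, bins):
        by_bin.setdefault(b, []).append(r)
    remainder = sum(len(group) / n * entropy(group) for group in by_bin.values())
    return entropy(results) - remainder


results = [0, 0, 0, 1, 1, 1]
perfect = info_gain(results, [0, 0, 0, 1, 1, 1])  # bins separate the two classes
mixed = info_gain(results, [0, 1, 0, 1, 0, 1])    # bins mix the two classes
print(perfect, mixed)
```

A binning that separates the result codes cleanly yields a higher gain than one that mixes them, which is what makes the gain usable as a score for candidate bounds.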

**Notes**:

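  - bounds are applied with a strict less-than test: a value falls into bin i
    when it is less than bound i but not less than any earlier bound. With
    bounds [1.,2.], [0.9,1.,1.1,2.,2.2] -> [0,1,1,2,2]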
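
The less-than convention can be reproduced with the standard library. This is a minimal sketch, not the module's own code; the function name `quantize` is hypothetical. It reproduces the mapping given in the note.

```python
from bisect import bisect_right


def quantize(values, bounds):
    """Map each value to the number of bounds that are <= that value.

    `bounds` must be sorted in ascending order. A value equal to a bound
    lands in the bin above it, because only values strictly less than the
    bound stay below it.
    """
    return [bisect_right(bounds, v) for v in values]


codes = quantize([0.9, 1.0, 1.1, 2.0, 2.2], [1.0, 2.0])
print(codes)  # [0, 1, 1, 2, 2]
```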

Functions

feq(v1, v2, tol=1e-08)
    floating point equality with a tolerance factor

FindVarQuantBound(vals, results, nPossibleRes)
    Uses FindVarMultQuantBounds, only here for historic reasons

_GenVarTable(vals, cuts, starts, results, nPossibleRes)
    Primarily intended for internal use

_PyRecurseOnBounds(vals, cuts, which, starts, results, nPossibleRes, varTable=None)
    Primarily intended for internal use

_NewPyRecurseOnBounds(vals, cuts, which, starts, results, nPossibleRes, varTable=None)
    Primarily intended for internal use

_NewPyFindStartPoints(sortVals, sortResults, nData)

FindVarMultQuantBounds(vals, nBounds, results, nPossibleRes)
    finds multiple quantization bounds for a single variable

Variables
  hascQuantize = 1
  _float_tol = 1e-08
  __package__ = 'rdkit.ML.Data'

Imports: numpy, entropy, zip, map, range, cQuantize, _RecurseOnBounds, _FindStartPoints


Function Details

feq(v1, v2, tol=1e-08)

Floating point equality with a tolerance factor

**Arguments**

  - v1: a float

  - v2: a float

  - tol: the tolerance for comparison

**Returns**

  1 if _v1_ and _v2_ are equal to within _tol_, 0 otherwise
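
A tolerance comparison of this kind can be sketched as follows. The name `feq_sketch` is hypothetical, and whether the real function uses a strict or non-strict comparison against `tol` is an assumption here.

```python
def feq_sketch(v1, v2, tol=1e-8):
    """Return 1 if v1 and v2 differ by less than tol, otherwise 0."""
    # Assumption: an absolute (not relative) tolerance with a strict comparison.
    return int(abs(v1 - v2) < tol)


a = feq_sketch(0.1 + 0.2, 0.3)  # rounding error is far below the tolerance
b = feq_sketch(1.0, 1.001)      # difference is well above the tolerance
print(a, b)
```

An absolute tolerance is simple but scale-dependent, so very large or very small inputs may need a different `tol`.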

FindVarQuantBound(vals, results, nPossibleRes)

Uses FindVarMultQuantBounds; kept only for historic reasons
  

_GenVarTable(vals, cuts, starts, results, nPossibleRes)

Primarily intended for internal use

Constructs a variable table for the data passed in.
The table for a given variable records the number of times each possible value
of that variable appears for each possible result of the function.

**Arguments**

  - vals: a 1D Numeric array with the values of the variables

  - cuts: a list with the indices of the quantization bounds
    (indices are into _starts_ )

  - starts: a list of potential starting points for quantization bounds

  - results: a 1D Numeric array of integer result codes

  - nPossibleRes: an integer with the number of possible result codes

**Returns**

  the varTable, a 2D Numeric array which is nVarValues x nPossibleRes

**Notes**

  - _vals_ should be sorted!
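
The counting described above can be sketched in plain Python. This is an illustration under stated assumptions, not the module's implementation: the name `gen_var_table_sketch` is hypothetical, the table is a list of lists rather than an array, and each entry of _starts_ is assumed to be an index into the sorted _vals_ at which a new bin may begin.

```python
def gen_var_table_sketch(vals, cuts, starts, results, n_possible_res):
    """Count result codes per bin for sorted vals.

    Assumption: starts[c] is the index into vals where the bin after
    cut c begins, and cuts is in ascending order.
    """
    n_bins = len(cuts) + 1
    table = [[0] * n_possible_res for _ in range(n_bins)]
    idx = 0
    for row, cut in enumerate(cuts):
        # Every point before starts[cut] belongs to bin `row`.
        while idx < starts[cut]:
            table[row][results[idx]] += 1
            idx += 1
    # Whatever is left falls into the last bin.
    while idx < len(vals):
        table[-1][results[idx]] += 1
        idx += 1
    return table


vals = [1.1, 1.2, 1.3, 2.1, 2.2, 2.3]  # already sorted
results = [0, 0, 1, 1, 1, 1]
starts = [2, 3]                         # hypothetical candidate positions
cuts = [0]                              # use starts[0], i.e. split before index 2
table = gen_var_table_sketch(vals, cuts, starts, results, 2)
print(table)  # [[2, 0], [0, 4]]
```

Each row is one quantized value of the variable and each column is one result code, matching the nVarValues x nPossibleRes shape described under Returns.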
  

_PyRecurseOnBounds(vals, cuts, which, starts, results, nPossibleRes, varTable=None)

Primarily intended for internal use

Recursively finds the best quantization boundaries

**Arguments**

  - vals: a 1D Numeric array with the values of the variables,
    this should be sorted

  - cuts: a list with the indices of the quantization bounds
    (indices are into _starts_ )

  - which: an integer indicating which bound is being adjusted here
    (an index into _cuts_ )

  - starts: a list of potential starting points for quantization bounds

  - results: a 1D Numeric array of integer result codes

  - nPossibleRes: an integer with the number of possible result codes

**Returns**

  - a 2-tuple containing:

    1) the best information gain found so far

    2) a list of the quantization bound indices ( _cuts_ for the best case)

**Notes**

 - this is not even remotely efficient, which is why a C replacement
   was written
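
To show what the search optimizes, the sketch below replaces the recursion with a plain enumeration of every combination of candidate positions and keeps the one with the highest information gain. It is a swapped-in brute-force technique, not the recursive algorithm documented here. All names are hypothetical, entropy in bits is an assumption, and _starts_ is assumed to hold indices into the sorted data.

```python
from itertools import combinations
from math import log2


def table_gain(table):
    """Information gain, in bits, of a bins x results count table."""
    total = sum(map(sum, table))

    def h(counts):
        n = sum(counts)
        return -sum(c / n * log2(c / n) for c in counts if c) if n else 0.0

    overall = [sum(col) for col in zip(*table)]
    return h(overall) - sum(sum(row) / total * h(row) for row in table)


def count_table(results, positions, n_possible_res):
    """Count result codes in the bins delimited by ascending index positions."""
    edges = [0] + list(positions) + [len(results)]
    table = []
    for lo, hi in zip(edges, edges[1:]):
        row = [0] * n_possible_res
        for r in results[lo:hi]:
            row[r] += 1
        table.append(row)
    return table


def best_cuts_sketch(results, starts, n_bounds, n_possible_res):
    """Try every choice of n_bounds entries of starts; return (gain, cuts)."""
    best_gain, best = -1e8, None
    for cuts in combinations(range(len(starts)), n_bounds):
        positions = [starts[c] for c in cuts]
        gain = table_gain(count_table(results, positions, n_possible_res))
        if gain > best_gain:
            best_gain, best = gain, list(cuts)
    return best_gain, best


results = [0, 0, 0, 1, 1, 1]  # result codes, ordered by sorted variable value
starts = [1, 3, 5]            # hypothetical candidate split positions
gain, cuts = best_cuts_sketch(results, starts, 1, 2)
print(gain, cuts)
```

The return value mirrors the 2-tuple described above: the best gain found and the indices into _starts_ that produced it. The number of combinations grows quickly with the number of bounds, which is consistent with the efficiency warning in the notes.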

_NewPyRecurseOnBounds(vals, cuts, which, starts, results, nPossibleRes, varTable=None)

Primarily intended for internal use

Recursively finds the best quantization boundaries

**Arguments**

  - vals: a 1D Numeric array with the values of the variables,
    this should be sorted

  - cuts: a list with the indices of the quantization bounds
    (indices are into _starts_ )

  - which: an integer indicating which bound is being adjusted here
    (an index into _cuts_ )

  - starts: a list of potential starting points for quantization bounds

  - results: a 1D Numeric array of integer result codes

  - nPossibleRes: an integer with the number of possible result codes

**Returns**

  - a 2-tuple containing:

    1) the best information gain found so far

    2) a list of the quantization bound indices ( _cuts_ for the best case)

**Notes**

 - this is not even remotely efficient, which is why a C replacement
   was written

FindVarMultQuantBounds(vals, nBounds, results, nPossibleRes)

Finds multiple quantization bounds for a single variable

**Arguments**

  - vals: sequence of variable values (assumed to be floats)

  - nBounds: the number of quantization bounds to find

  - results: a list of result codes (should be integers)

  - nPossibleRes: an integer with the number of possible values of the
    result variable

**Returns**

  - a 2-tuple containing:

    1) a list of the quantization bounds (floats)

    2) the information gain associated with this quantization
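
A possible call, using the signature and return value documented above. The data are made up for illustration, the `demo` wrapper is hypothetical, and the import is guarded so the snippet still runs where RDKit is not installed.

```python
def demo():
    """Call FindVarMultQuantBounds on toy data, if RDKit is available."""
    try:
        from rdkit.ML.Data import Quantize
    except ImportError:
        return None  # RDKit is not installed; nothing to demonstrate

    vals = [1.1, 1.2, 1.3, 2.1, 2.2, 2.3]  # variable values (floats)
    results = [0, 0, 0, 1, 1, 1]           # integer result codes
    # Ask for one bound; there are two possible result codes (0 and 1).
    bounds, gain = Quantize.FindVarMultQuantBounds(vals, 1, results, 2)
    return bounds, gain


out = demo()
if out is not None:
    print("bounds:", out[0], "gain:", out[1])
```

With data this cleanly separated, one would expect the single bound to fall somewhere between the two groups of values.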