rdkit.ML.Data.MLData module
Classes to help work with data sets.
- class rdkit.ML.Data.MLData.MLDataSet(data, nVars=None, nPts=None, nPossibleVals=None, qBounds=None, varNames=None, ptNames=None, nResults=1)
Bases: object
A data set for holding general data (floats, ints, and strings).
- Note
This is intended to be a read-only data structure (i.e., it should not be modified after the constructor has been called).
Constructor
Arguments
- data: a list of lists containing the data. The data are copied, so the caller's lists will not be overwritten.
- nVars: the number of variables
- nPts: the number of points
- nPossibleVals: a list containing the number of possible values for each variable (0 when not relevant). This is _nVars_ long.
- qBounds: a list of lists containing quantization bounds for the variables that are to be quantized (note: this class does not quantize the variables itself; it merely stores the quantization bounds). An empty sublist indicates no quantization for a given variable. This is _nVars_ long.
- varNames: a list of the names of the variables. This is _nVars_ long.
- ptNames: the names (labels) of the individual data points. This is _nPts_ long.
- nResults: the number of result columns in each data list (its last entries). This is usually 1, but can be higher.
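The arguments above describe a row-oriented layout: each inner list of `data` is one point, and its last _nResults_ entries are results. The sketch below is plain Python, not RDKit code. `describe_layout` is a hypothetical helper; it assumes every row has the same length and that _nVars_ equals the row length minus _nResults_, and it imitates the documented copy-on-construction behavior with `copy.deepcopy` (how the library actually copies is not stated here).

```python
import copy


def describe_layout(data, nResults=1):
    """Hypothetical helper: summarize a row-oriented data set.

    Assumes all rows have the same length and that the last `nResults`
    entries of each row are result columns.
    """
    stored = copy.deepcopy(data)  # independent copy of the caller's lists
    nPts = len(stored)
    nVars = len(stored[0]) - nResults  # assumption: inputs = row minus results
    return {"data": stored, "nPts": nPts, "nVars": nVars}


# Three points, each with two input variables and one result column.
raw = [[1.0, "a", 0],
       [2.5, "b", 1],
       [0.3, "a", 0]]
info = describe_layout(raw, nResults=1)

raw[0][0] = 99.0            # modifying the caller's list...
print(info["data"][0][0])   # ...leaves the stored copy alone: prints 1.0
print(info["nPts"], info["nVars"])  # prints 3 2
```

The same `raw` list and `nResults=1` are the kind of values the documented signature `MLDataSet(data, ..., nResults=1)` expects.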
- AddPoint(pt)
- AddPoints(pts, names)
- GetAllData()
Returns a copy of the data.
- GetInputData()
Returns the input data.
Note
- _inputData_ means the examples without their result fields (the last _nResults_ entries).
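Stripping the result fields amounts to slicing off the last _nResults_ entries of every example. The following is a minimal plain-Python sketch of that idea, not the library's implementation; `input_data` is a hypothetical helper. It rejects `nResults=0` because `row[:-0]` would produce an empty list rather than the whole row.

```python
def input_data(data, nResults=1):
    """Hypothetical helper: drop the last `nResults` entries of each example."""
    if nResults < 1:
        # row[:-0] is row[:0], i.e. an empty list, so guard against it
        raise ValueError("nResults must be at least 1")
    return [row[:-nResults] for row in data]


examples = [[1.0, 2.0, 0], [3.0, 4.0, 1]]
print(input_data(examples))                    # [[1.0, 2.0], [3.0, 4.0]]
print(input_data([[1, 2, 3, 4]], nResults=2))  # [[1, 2]]
```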
- GetNPossibleVals()
- GetNPts()
- GetNResults()
- GetNVars()
- GetNamedData()
Returns a list of named examples.
Note
- a named example is the result of prepending the example name to the data list
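Prepending a name to each data list can be sketched as follows. This is plain Python, not RDKit code; `named_data` is a hypothetical helper that pairs each example with the matching entry of a _ptNames_-style list and assumes there is exactly one name per example.

```python
def named_data(data, ptNames):
    """Hypothetical helper: prepend each example's name to its data list."""
    if len(ptNames) != len(data):
        raise ValueError("need exactly one name per example")
    # Build new lists so the original rows are not modified.
    return [[name] + list(row) for name, row in zip(ptNames, data)]


examples = [[1.0, 2.0, 0], [3.0, 4.0, 1]]
print(named_data(examples, ["mol1", "mol2"]))
# [['mol1', 1.0, 2.0, 0], ['mol2', 3.0, 4.0, 1]]
```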
- GetPtNames()
- GetQuantBounds()
- GetResults()
Returns the result fields (the last _nResults_ entries) of each example.
- GetVarNames()
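Extracting the result fields is the complementary slice: the last _nResults_ entries of each example. The plain-Python sketch below uses a hypothetical `results` helper and always returns a list per example. Whether the library returns bare values instead of one-element lists when _nResults_ is 1 is not stated in this reference, so do not rely on the sketch's return shape for the real method.

```python
def results(data, nResults=1):
    """Hypothetical helper: keep only the last `nResults` entries of each example."""
    if nResults < 1:
        raise ValueError("nResults must be at least 1")
    return [row[-nResults:] for row in data]


examples = [[1.0, 2.0, 0], [3.0, 4.0, 1]]
print(results(examples))                    # [[0], [1]]
print(results([[1, 2, 3, 4]], nResults=2))  # [[3, 4]]

# The input slice and the result slice together rebuild each example.
row, n = [1, 2, 3, 4], 2
print(row[:-n] + row[-n:] == row)           # True
```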
- class rdkit.ML.Data.MLData.MLQuantDataSet(data, nVars=None, nPts=None, nPossibleVals=None, qBounds=None, varNames=None, ptNames=None, nResults=1)
Bases: MLDataSet
A data set for holding quantized data.
Note
This is intended to be a read-only data structure (i.e., it should not be modified after the constructor has been called).
Main differences from MLDataSet
- the data are stored in a numpy array, since they are homogeneous
- the results are assumed to be quantized (i.e., no qBounds entry is required for them)
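Because this class expects data that are already quantized, continuous input values have to be binned before construction, for example with bounds like those passed as _qBounds_. The sketch below uses the standard-library `bisect` module. `quantize_value` and `quantize_rows` are hypothetical helpers, and the binning convention (values below the first bound map to 0, values at or above the last bound map to `len(bounds)`) is an assumption, not necessarily the one RDKit's own quantization code uses.

```python
import bisect


def quantize_value(value, bounds):
    """Hypothetical helper: map a value to a bin index using sorted bounds.

    Assumed convention: value < bounds[0] -> 0, value >= bounds[-1] -> len(bounds).
    An empty `bounds` list means "not quantized", so the value is returned unchanged.
    """
    if not bounds:
        return value
    return bisect.bisect_right(bounds, value)


def quantize_rows(rows, qBounds):
    """Bin the first len(qBounds) entries of each row; results are left as is."""
    nVars = len(qBounds)
    return [[quantize_value(v, b) for v, b in zip(row[:nVars], qBounds)] + list(row[nVars:])
            for row in rows]


rows = [[0.2, 5, 1], [1.7, 3, 0], [3.0, 4, 1]]
qBounds = [[1.0, 2.5], []]  # bin the first variable, leave the second alone
print(quantize_rows(rows, qBounds))  # [[0, 5, 1], [1, 3, 0], [2, 4, 1]]
```

The resulting rows contain only integers, which is the kind of homogeneous data this class stores in a numpy array.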
Constructor
Arguments
- data: a list of lists containing the data. The data are copied, so the caller's lists will not be overwritten.
- nVars: the number of variables
- nPts: the number of points
- nPossibleVals: a list containing the number of possible values for each variable (0 when not relevant). This is _nVars_ long.
- qBounds: a list of lists containing quantization bounds for the variables that are to be quantized (note: this class does not quantize the variables itself; it merely stores the quantization bounds). An empty sublist indicates no quantization for a given variable. This is _nVars_ long.
- varNames: a list of the names of the variables. This is _nVars_ long.
- ptNames: the names (labels) of the individual data points. This is _nPts_ long.
- nResults: the number of result columns in each data list (its last entries). This is usually 1, but can be higher.
- GetAllData()
Returns a copy of the data.
- GetInputData()
Returns the input data.
Note
- _inputData_ means the examples without their result fields (the last _nResults_ entries).
- GetNamedData()
Returns a list of named examples.
Note
- a named example is the result of prepending the example name to the data list
- GetResults()
Returns the result fields (the last _nResults_ entries) of each example.