rdkit.ML.DecTree.Forest module

code for dealing with forests (collections) of decision trees

NOTE: This code should be obsolete now that ML.Composite.Composite is up and running.

class rdkit.ML.DecTree.Forest.Forest

Bases: object

A forest of unique decision trees.

Adding a tree that is already present just increments its count field and adds the new error to its accumulated error; AverageErrors later converts the sums to averages.

typical usage:

  1. grow the forest with AddTree until happy with it
  2. call AverageErrors to calculate the average error values
  3. call SortTrees to put things in order by either error or count

AddTree(tree, error)

Adds a tree to the forest.

If an identical tree is already present, its count is incremented and the new error is added to its accumulated error.

Arguments

  • tree: the new tree
  • error: its error value

NOTE: errList is run as an accumulator, so it holds summed errors until you call AverageErrors; you probably want to do that after finishing the forest.

AverageErrors()

convert summed error to average error

This does the conversion in place
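The accumulate-then-average bookkeeping described for AddTree and AverageErrors can be sketched in plain Python. This is a minimal illustration and not RDKit's implementation: ToyForest, its method names, and its attribute names are hypothetical stand-ins, and strings stand in for tree objects.

```python
class ToyForest:
    """Hypothetical stand-in that mimics the bookkeeping described above."""

    def __init__(self):
        self.trees = []   # unique trees, in the order first seen
        self.counts = []  # how many times each tree was added
        self.errors = []  # summed errors until average_errors() is called

    def add_tree(self, tree, error):
        if tree in self.trees:
            # Duplicate: bump the count and accumulate the error.
            idx = self.trees.index(tree)
            self.counts[idx] += 1
            self.errors[idx] += error
        else:
            self.trees.append(tree)
            self.counts.append(1)
            self.errors.append(error)

    def average_errors(self):
        # Replace each summed error by the mean error per occurrence.
        self.errors = [e / c for e, c in zip(self.errors, self.counts)]


forest = ToyForest()
forest.add_tree("tree-A", 0.25)
forest.add_tree("tree-B", 0.5)
forest.add_tree("tree-A", 0.75)  # duplicate of the first tree

summed = list(forest.errors)     # sums, before averaging
forest.average_errors()
averaged = list(forest.errors)   # means, after averaging
```

In this sketch, calling average_errors a second time would divide by the counts again, which is why the averaging step belongs after the last tree has been added.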

ClassifyExample(example)

classifies the given example using the entire forest

returns a result and a measure of confidence in it.

FIX: statistics sucks... I’m not seeing an obvious way to get the confidence intervals. For that matter, I’m not seeing an unobvious way.

For now, this is just treated as a voting problem with the confidence measure being the percent of trees which voted for the winning result.
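The voting scheme described above can be sketched as a simple majority vote. This is an illustration under stated assumptions, not the library's code: classify_by_vote is a hypothetical helper, every tree gets exactly one vote (whether the real method weights trees by their counts is not stated here), the confidence is reported as a fraction between 0 and 1, and ties go to the label seen first.

```python
from collections import Counter


def classify_by_vote(votes):
    """Return (winning_label, fraction_of_votes_for_the_winner)."""
    if not votes:
        raise ValueError("need at least one vote")
    tally = Counter(votes)
    # most_common(1) yields the label with the highest tally; among equal
    # tallies it keeps the label that was encountered first.
    winner, n_winner = tally.most_common(1)[0]
    return winner, n_winner / len(votes)


# One predicted class label per tree (hypothetical values).
votes = [1, 0, 1, 1, 0]
result, confidence = classify_by_vote(votes)
```

The confidence here is only the share of agreeing votes; as the FIX note says, it is not a statistical confidence interval.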

CollectVotes(example)

collects votes across every member of the forest for the given example

Returns

a list of the results

GetAllData()

Returns all of the data stored in the forest

Returns

a 3-tuple consisting of:

  1. our list of trees
  2. our list of tree counts
  3. our list of tree errors

GetCount(i)

GetDataTuple(i)

returns all relevant data about a particular tree in the forest

Arguments

i: an integer indicating which tree should be returned

Returns

a 3-tuple consisting of:

  1. the tree
  2. its count
  3. its error

GetError(i)

GetTree(i)

GetVoteDetails()

Returns the details of the last vote the forest conducted

this will be an empty list if no voting has yet been done

Grow(examples, attrs, nPossibleVals, nTries=10, pruneIt=0, lessGreedy=0)

Grows the forest by adding trees

Arguments

  • examples: the examples to be used for training
  • attrs: a list of the attributes to be used in training
  • nPossibleVals: a list with the number of possible values each variable (as well as the result) can take on
  • nTries: the number of new trees to add
  • pruneIt: a toggle for whether or not the tree should be pruned
  • lessGreedy: toggles the use of a less greedy construction algorithm where each possible tree root is used. The best tree from each step is actually added to the forest.

MakeHistogram()

creates a histogram of error/count pairs

Pickle(fileName='foo.pkl')

Writes this forest off to a file so that it can be easily loaded later

Arguments

fileName: the name of the file to be written
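The text above does not say how a saved forest is read back. Assuming the file is an ordinary Python pickle (an assumption; the on-disk format is not stated here), a round trip with the standard pickle module might look like the sketch below. The dictionary is a hypothetical stand-in for a forest object, and the file name is arbitrary.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a forest: any picklable object behaves the same way.
forest_like = {
    "trees": ["tree-A", "tree-B"],
    "counts": [2, 1],
    "errors": [0.5, 0.5],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "forest.pkl")

    # Write the object in binary mode.
    with open(file_name, "wb") as out_file:
        pickle.dump(forest_like, out_file)

    # Read it back; the result is an equal but distinct object.
    with open(file_name, "rb") as in_file:
        restored = pickle.load(in_file)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a trusted source.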
SetCount(i, val)

SetDataTuple(i, tup)

sets all relevant data for a particular tree in the forest

Arguments

  • i: an integer indicating which tree should be set
  • tup: a 3-tuple consisting of:
    1. the tree
    2. its count
    3. its error

SetError(i, val)

SetTree(i, val)

SortTrees(sortOnError=1)

sorts the list of trees

Arguments

sortOnError: toggles sorting on the trees’ errors rather than their counts
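Because the trees, counts, and errors are kept as separate lists, sorting has to reorder all three together. The sketch below shows one way to do that; sort_parallel is a hypothetical helper, not the library's method, and ascending order for both keys is an assumption since the sort direction is not stated above.

```python
def sort_parallel(trees, counts, errors, sort_on_error=True):
    """Reorder three parallel lists by error (default) or by count."""
    keys = errors if sort_on_error else counts
    # Compute the permutation once, then apply it to every list.
    order = sorted(range(len(trees)), key=lambda i: keys[i])
    return (
        [trees[i] for i in order],
        [counts[i] for i in order],
        [errors[i] for i in order],
    )


trees = ["A", "B", "C"]
counts = [3, 1, 2]
errors = [0.2, 0.1, 0.3]

by_error = sort_parallel(trees, counts, errors, sort_on_error=True)
by_count = sort_parallel(trees, counts, errors, sort_on_error=False)
```

Sorting a list of indices rather than the lists themselves keeps each tree aligned with its own count and error.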