Package rdkit :: Package ML :: Package DecTree :: Module Forest :: Class Forest

Class Forest

object --+
         |
        Forest

A forest of unique decision trees.

Adding a tree that is already in the forest does not create a duplicate:
    its count field is incremented and its errors are averaged.

Typical usage:

  1) Grow the forest with AddTree until you are happy with it.

  2) Call AverageErrors to calculate the average error values.

  3) Call SortTrees to put the trees in order by either error or count.
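
The three steps above can be sketched with a small stand-in class. This is not RDKit's implementation: `MiniForest`, its method names, and the string "trees" are hypothetical, and the ascending sort order is an assumption made only for this illustration.

```python
class MiniForest:
    """Hypothetical stand-in for illustration; not RDKit's Forest class."""

    def __init__(self):
        self.trees = []    # unique trees
        self.counts = []   # how many times each tree was added
        self.errors = []   # summed errors until average_errors() is called

    def add_tree(self, tree, error):
        # An identical tree bumps its count and accumulates its error.
        if tree in self.trees:
            idx = self.trees.index(tree)
            self.counts[idx] += 1
            self.errors[idx] += error
        else:
            self.trees.append(tree)
            self.counts.append(1)
            self.errors.append(error)

    def average_errors(self):
        # Convert summed errors to per-tree averages, in place.
        for idx, count in enumerate(self.counts):
            self.errors[idx] /= count

    def sort_trees(self, sort_on_error=True):
        # Ascending order for both keys is an assumption of this sketch.
        keys = self.errors if sort_on_error else self.counts
        order = sorted(range(len(self.trees)), key=keys.__getitem__)
        self.trees = [self.trees[i] for i in order]
        self.counts = [self.counts[i] for i in order]
        self.errors = [self.errors[i] for i in order]


forest = MiniForest()
for tree, err in [("treeA", 0.25), ("treeB", 0.125), ("treeA", 0.75)]:
    forest.add_tree(tree, err)          # step 1: grow
forest.average_errors()                 # step 2: average
forest.sort_trees(sort_on_error=True)   # step 3: sort
print(forest.trees, forest.counts, forest.errors)
```

Calling the averaging step only once, after all trees have been added, matters here because the error list holds running sums until then.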

Instance Methods
 
MakeHistogram(self)
Creates a histogram of error/count pairs.

CollectVotes(self, example)
Collects votes across every member of the forest for the given example.

ClassifyExample(self, example)
Classifies the given example using the entire forest.

GetVoteDetails(self)
Returns the details of the last vote the forest conducted.

Grow(self, examples, attrs, nPossibleVals, nTries=10, pruneIt=0, lessGreedy=0)
Grows the forest by adding trees.

Pickle(self, fileName='foo.pkl')
Writes this forest to a file so that it can be easily loaded later.

AddTree(self, tree, error)
Adds a tree to the forest.

AverageErrors(self)
Converts summed errors to average errors.

SortTrees(self, sortOnError=1)
Sorts the list of trees.

GetTree(self, i)

SetTree(self, i, val)

GetCount(self, i)

SetCount(self, i, val)

GetError(self, i)

SetError(self, i, val)

GetDataTuple(self, i)
Returns all relevant data about a particular tree in the forest.

SetDataTuple(self, i, tup)
Sets all relevant data for a particular tree in the forest.

GetAllData(self)
Returns everything we know.

__len__(self)
Allows len(forest) to work.

__getitem__(self, which)
Allows forest[i] to work.

__str__(self)
Allows the forest to show itself as a string.

__init__(self)
x.__init__(...) initializes x; see help(type(x)) for signature

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

CollectVotes(self, example)

Collects votes across every member of the forest for the given example.

**Returns**

  a list of the results

ClassifyExample(self, example)

Classifies the given example using the entire forest.

**Returns**

  a result and a measure of confidence in it.

**FIX:** there is no obvious way (or, for that matter, an unobvious one)
     to get confidence intervals here.

     For now, this is just treated as a voting problem, with the confidence
     measure being the percentage of trees that voted for the winning result.
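
The voting scheme described above can be sketched as follows. This is an illustration only, not the library's code: `classify_by_vote` is a hypothetical helper, the votes are assumed to be a plain list with one predicted label per tree, and the confidence is expressed as a fraction between 0 and 1 (an assumption; multiply by 100 for a percentage).

```python
from collections import Counter


def classify_by_vote(votes):
    """Return (winner, confidence) for a list holding one vote per tree."""
    if not votes:
        raise ValueError("cannot classify without any votes")
    tally = Counter(votes)
    # most_common(1) yields the label with the highest vote count.
    winner, n_winning = tally.most_common(1)[0]
    # Confidence is the share of trees that voted for the winner.
    return winner, n_winning / len(votes)


votes = [1, 0, 1, 1]  # hypothetical per-tree predictions
result, confidence = classify_by_vote(votes)
print(result, confidence)
```

With ties, `Counter.most_common` returns the label that was counted first, so the outcome of a tied vote depends on tree order in this sketch.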

GetVoteDetails(self)

Returns the details of the last vote the forest conducted.

This will be an empty list if no voting has been done yet.

Grow(self, examples, attrs, nPossibleVals, nTries=10, pruneIt=0, lessGreedy=0)

Grows the forest by adding trees.

**Arguments**

 - examples: the examples to be used for training

 - attrs: a list of the attributes to be used in training

 - nPossibleVals: a list with the number of possible values each variable
   (as well as the result) can take on

 - nTries: the number of new trees to add

 - pruneIt: a toggle for whether or not the tree should be pruned

 - lessGreedy: toggles the use of a less greedy construction algorithm in
   which each possible tree root is tried.  The best tree from each step is
   the one added to the forest.

Pickle(self, fileName='foo.pkl')

Writes this forest to a file so that it can be easily loaded later.

**Arguments**

  - fileName: the name of the file to be written
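
A write-then-reload round trip might look like the sketch below. It assumes the file is a standard Python pickle, which the method name suggests but this page does not state, and it uses a plain dictionary as a hypothetical stand-in for a forest so that the example runs without RDKit.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in data; a real forest object would be pickled instead.
forest_data = {
    "trees": ["treeA", "treeB"],
    "counts": [2, 1],
    "errors": [0.5, 0.125],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "forest.pkl")

    # Write the object to disk in binary mode.
    with open(file_name, "wb") as out_file:
        pickle.dump(forest_data, out_file)

    # Load it back later; only unpickle files from a trusted source.
    with open(file_name, "rb") as in_file:
        restored = pickle.load(in_file)

print(restored == forest_data)
```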
  

AddTree(self, tree, error)

Adds a tree to the forest.

If an identical tree is already present, its count is incremented.

**Arguments**

  - tree: the new tree

  - error: its error value

**NOTE:** errList is run as an accumulator, so you probably want
    to call AverageErrors after finishing the forest.

AverageErrors(self)

Converts summed errors to average errors.

The conversion is done in place.

SortTrees(self, sortOnError=1)

Sorts the list of trees.

**Arguments**

  sortOnError: toggles sorting on the trees' errors rather than their counts

GetDataTuple(self, i)

Returns all relevant data about a particular tree in the forest.

**Arguments**

  i: an integer indicating which tree should be returned

**Returns**

  a 3-tuple consisting of:

    1) the tree

    2) its count

    3) its error

SetDataTuple(self, i, tup)

Sets all relevant data for a particular tree in the forest.

**Arguments**

  - i: an integer indicating which tree should be set

  - tup: a 3-tuple consisting of:

    1) the tree

    2) its count

    3) its error

GetAllData(self)

Returns everything we know.

**Returns**

  a 3-tuple consisting of:

    1) our list of trees

    2) our list of tree counts

    3) our list of tree errors

__getitem__(self, which)
(Indexing operator)

Allows forest[i] to work.  Returns the data tuple for tree i.
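
As a rough illustration of how indexing and `len()` can be supported, the sketch below keeps three parallel lists and returns a (tree, count, error) tuple on lookup. `TupleForest` and its attribute names are hypothetical and are not taken from the RDKit source.

```python
class TupleForest:
    """Hypothetical stand-in for illustration; not RDKit's Forest class."""

    def __init__(self, trees, counts, errors):
        self.trees = trees
        self.counts = counts
        self.errors = errors

    def __len__(self):
        # Lets len(forest) report the number of unique trees.
        return len(self.trees)

    def __getitem__(self, which):
        # Lets forest[i] return the (tree, count, error) data tuple.
        return self.trees[which], self.counts[which], self.errors[which]


forest = TupleForest(["treeA", "treeB"], [2, 1], [0.5, 0.125])
n_trees = len(forest)
tree, count, error = forest[1]
print(n_trees, tree, count, error)
```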

    

__str__(self)
(Informal representation operator)

Allows the forest to show itself as a string.

    

Overrides: object.__str__

__init__(self)
(Constructor)

x.__init__(...) initializes x; see help(type(x)) for signature

Overrides: object.__init__
(inherited documentation)