rdkit.ML.DecTree.ID3 module

ID3 Decision Trees

contains an implementation of the ID3 decision tree algorithm as described in Tom Mitchell’s book “Machine Learning”

It relies upon the _Tree.TreeNode_ data structure (or something
with the same API) defined locally to represent the trees
rdkit.ML.DecTree.ID3.CalcTotalEntropy(examples, nPossibleVals)

Calculates the total entropy of the data set (w.r.t. the results)

Arguments

  • examples: a list (nInstances long) of lists of variable values + instance

    values

  • nPossibleVals: a list (nVars long) of the number of possible values each variable can adopt.

Returns

a float containing the informational entropy of the data set.
rdkit.ML.DecTree.ID3.GenVarTable(examples, nPossibleVals, vars)

Generates a list of variable tables for the examples passed in.

The table for a given variable records the number of times each possible value of that variable appears for each possible result of the function.

Arguments

  • examples: a list (nInstances long) of lists of variable values + instance

    values

  • nPossibleVals: a list containing the number of possible values of

    each variable + the number of values of the function.

  • vars: a list of the variables to include in the var table

Returns

a list of variable result tables. Each table is a Numeric array
which is varValues x nResults
rdkit.ML.DecTree.ID3.ID3(examples, target, attrs, nPossibleVals, depth=0, maxDepth=-1, **kwargs)

Implements the ID3 algorithm for constructing decision trees.

From Mitchell’s book, page 56

This is slightly modified from Mitchell’s book because it supports
multivalued (non-binary) results.

Arguments

  • examples: a list (nInstances long) of lists of variable values + instance

    values

  • target: an int

  • attrs: a list of ints indicating which variables can be used in the tree

  • nPossibleVals: a list containing the number of possible values of

    every variable.

  • depth: (optional) the current depth in the tree

  • maxDepth: (optional) the maximum depth to which the tree

    will be grown

Returns

a DecTree.DecTreeNode with the decision tree
NOTE: This code cannot bootstrap (start from nothing...)
use _ID3Boot_ (below) for that.
rdkit.ML.DecTree.ID3.ID3Boot(examples, attrs, nPossibleVals, initialVar=None, depth=0, maxDepth=-1, **kwargs)

Bootstrapping code for the ID3 algorithm

see ID3 for descriptions of the arguments

If _initialVar_ is not set, the algorithm will automatically
choose the first variable in the tree (the standard greedy approach). Otherwise, _initialVar_ will be used as the first split.