
Module Risk


Code for calculating empirical risk.

Functions
 
log2(x)
 
BurgesRiskBound(VCDim, nData, nWrong, conf)
Calculates Burges's formulation of the risk bound
 
CristianiRiskBound(VCDim, nData, nWrong, conf)
The formulation here is from pg 58, Theorem 4.6 of the book "An Introduction to Support Vector Machines" by Cristianini and Shawe-Taylor, Cambridge University Press, 2000.
 
CherkasskyRiskBound(VCDim, nData, nWrong, conf, a1=1.0, a2=2.0)
The formulation here is from Eqns 4.22 and 4.23 on pg 108 of Cherkassky and Mulier's book "Learning From Data" Wiley, 1998.
Variables
  __package__ = 'rdkit.ML.SLT'

Imports: math


Function Details

BurgesRiskBound(VCDim, nData, nWrong, conf)

Calculates Burges's formulation of the risk bound

The formulation is from Eqn. 3 of Burges's review
article "A Tutorial on Support Vector Machines for Pattern Recognition"
in _Data Mining and Knowledge Discovery_, Kluwer Academic Publishers
(1998), Vol. 2.

**Arguments**

  - VCDim: the VC dimension of the system

  - nData: the number of data points used

  - nWrong: the number of data points misclassified

  - conf: the confidence to be used for this risk bound


**Returns**

  - a float
  
**Notes**

 - This has been validated against the Burges paper

 - I believe that this is only technically valid for binary classification
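
For orientation, Eqn. 3 of Burges's tutorial bounds the true risk by the empirical risk plus a VC confidence term, sqrt((h (ln(2l/h) + 1) - ln(eta/4)) / l). The following is a minimal standalone sketch of that formula, not the module's own source. The name `burges_bound_sketch` is hypothetical, and it assumes that `conf` plays the role of eta in the paper and that natural logarithms are used.

```python
import math


def burges_bound_sketch(vc_dim, n_data, n_wrong, conf):
    """Hypothetical re-derivation of Burges's Eqn. 3; not RDKit's own code."""
    h = float(vc_dim)   # VC dimension
    l = float(n_data)   # number of training points
    eta = conf          # assumption: conf corresponds to eta in the paper

    # Empirical risk: fraction of training points misclassified.
    r_emp = n_wrong / l

    # VC confidence term: sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l).
    vc_conf = math.sqrt((h * (math.log(2.0 * l / h) + 1.0) - math.log(eta / 4.0)) / l)

    return r_emp + vc_conf


# Same error rate (5%) and VC dimension, but 100x more data in the second call.
loose = burges_bound_sketch(vc_dim=10, n_data=100, n_wrong=5, conf=0.05)
tight = burges_bound_sketch(vc_dim=10, n_data=10000, n_wrong=500, conf=0.05)
```

Under these assumptions the bound shrinks as `n_data` grows with the error rate held fixed, because the VC confidence term decays roughly like sqrt(ln(l) / l).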

CristianiRiskBound(VCDim, nData, nWrong, conf)


The formulation here is from pg 58, Theorem 4.6 of the book
"An Introduction to Support Vector Machines" by Cristianini and Shawe-Taylor,
Cambridge University Press, 2000.


**Arguments**

  - VCDim: the VC dimension of the system

  - nData: the number of data points used

  - nWrong: the number of data points misclassified

  - conf: the confidence to be used for this risk bound


**Returns**

  - a float
  
**Notes**

  - This generates odd (mismatching) values

CherkasskyRiskBound(VCDim, nData, nWrong, conf, a1=1.0, a2=2.0)



The formulation here is from Eqns 4.22 and 4.23 on pg 108 of
Cherkassky and Mulier's book "Learning From Data" Wiley, 1998.

**Arguments**

  - VCDim: the VC dimension of the system

  - nData: the number of data points used

  - nWrong: the number of data points misclassified

  - conf: the confidence to be used for this risk bound

  - a1, a2: constants in the risk equation. Restrictions on these values:

      - 0 <= a1 <= 4

      - 0 <= a2 <= 2

**Returns**

  - a float
  

**Notes**

 - This appears to behave reasonably

 - The default a1=1.0 is by analogy to Burges's paper.
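
To make the role of `a1` and `a2` concrete, here is a minimal standalone sketch of the bound as it is commonly stated for classification: R <= R_emp + (eps/2)(1 + sqrt(1 + 4 R_emp / eps)), with eps = a1 (h (ln(a2 n / h) + 1) - ln(eta/4)) / n. This is an assumption about the form of Eqns 4.22 and 4.23, not a copy of the module's source. The name `cherkassky_bound_sketch` is hypothetical, and `conf` is assumed to correspond to eta.

```python
import math


def cherkassky_bound_sketch(vc_dim, n_data, n_wrong, conf, a1=1.0, a2=2.0):
    """Hypothetical sketch of the Cherkassky-Mulier bound; not RDKit's own code."""
    h = float(vc_dim)   # VC dimension
    n = float(n_data)   # number of training points
    eta = conf          # assumption: conf corresponds to eta in the book

    # Empirical risk: fraction of training points misclassified.
    r_emp = n_wrong / n

    # eps = a1 * (h * (ln(a2*n/h) + 1) - ln(eta/4)) / n
    # Note: eps must be positive, so a1 = 0 would divide by zero below.
    eps = a1 * (h * (math.log(a2 * n / h) + 1.0) - math.log(eta / 4.0)) / n

    # Structural term: (eps/2) * (1 + sqrt(1 + 4*r_emp/eps)).
    struct_risk = (eps / 2.0) * (1.0 + math.sqrt(1.0 + 4.0 * r_emp / eps))

    return r_emp + struct_risk


no_errors = cherkassky_bound_sketch(vc_dim=10, n_data=1000, n_wrong=0, conf=0.05)
some_errors = cherkassky_bound_sketch(vc_dim=10, n_data=1000, n_wrong=50, conf=0.05)
```

In this sketch, when there are no training errors the bound reduces to eps itself, and it grows as `n_wrong` increases. With a1=1.0 and a2=2.0, eps is the square of the VC confidence term in Burges's Eqn. 3, which is one way to read the "by analogy" note above.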