rdkit.ML.SLT.Risk module¶
Code for calculating empirical risk bounds from statistical learning theory (SLT)
- rdkit.ML.SLT.Risk.BurgesRiskBound(VCDim, nData, nWrong, conf)¶
Calculates Burges's formulation of the risk bound.
The formulation is from Eqn. 3 of Burges's review article "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, Vol. 2, Kluwer Academic Publishers, 1998.
Arguments
VCDim: the VC dimension of the system
nData: the number of data points used
nWrong: the number of data points misclassified
conf: the confidence to be used for this risk bound
Returns
a float
Notes
This has been validated against the Burges paper.
I believe that this is only technically valid for binary classification.
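For reference, Eqn. 3 can be written out directly. The sketch below is a reconstruction from the cited paper, not the RDKit source; it assumes conf plays the role of eta, i.e. the bound holds with probability 1 - conf:

```python
import math

def burges_risk_bound(vc_dim, n_data, n_wrong, conf):
    """Sketch of Burges's Eqn. 3 (a reconstruction, not the RDKit source)."""
    emp_risk = float(n_wrong) / n_data  # empirical risk R_emp
    # "VC confidence" term: sqrt((h*(ln(2l/h) + 1) - ln(eta/4)) / l)
    vc_confidence = math.sqrt(
        (vc_dim * (math.log(2.0 * n_data / vc_dim) + 1.0)
         - math.log(conf / 4.0)) / n_data
    )
    return emp_risk + vc_confidence
```

Under this reading, burges_risk_bound(10, 1000, 50, 0.05) gives roughly 0.31: a 5% training error rate plus a VC confidence term of about 0.26.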
- rdkit.ML.SLT.Risk.CherkasskyRiskBound(VCDim, nData, nWrong, conf, a1=1.0, a2=2.0)¶
Calculates Cherkassky's formulation of the risk bound.
The formulation here is from Eqns. 4.22 and 4.23 on pg. 108 of Cherkassky and Mulier's book "Learning From Data", Wiley, 1998.
Arguments
VCDim: the VC dimension of the system
nData: the number of data points used
nWrong: the number of data points misclassified
conf: the confidence to be used for this risk bound
a1, a2: constants in the risk equation. Restrictions on these values:
0 <= a1 <= 4
0 <= a2 <= 2
Returns
a float
Notes
This appears to behave reasonably.
The default a1=1.0 is chosen by analogy to Burges's paper; a sketch of the two equations follows below.
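A sketch of how Eqns. 4.22 and 4.23 combine, as read from the cited book. This is a reconstruction, not the RDKit source, and it again assumes conf plays the role of eta:

```python
import math

def cherkassky_risk_bound(vc_dim, n_data, n_wrong, conf, a1=1.0, a2=2.0):
    """Sketch of Cherkassky & Mulier's Eqns. 4.22/4.23 (a reconstruction)."""
    emp_risk = float(n_wrong) / n_data
    # Eqn. 4.23 (as read here): epsilon folds together the VC dimension,
    # sample size, confidence, and the two free constants a1 and a2
    eps = a1 * (vc_dim * (math.log(a2 * n_data / vc_dim) + 1.0)
                - math.log(conf / 4.0)) / n_data
    # Eqn. 4.22: bound on the true risk
    return emp_risk + (eps / 2.0) * (1.0 + math.sqrt(1.0 + 4.0 * emp_risk / eps))
```

With the defaults a1=1.0 and a2=2.0, the eps in this sketch reduces to the square of the VC confidence term in BurgesRiskBound, which is the analogy mentioned in the note above.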
- rdkit.ML.SLT.Risk.CristianiRiskBound(VCDim, nData, nWrong, conf)¶
Calculates Cristianini and Shawe-Taylor's formulation of the risk bound.
The formulation here is from Theorem 4.6 on pg. 58 of the book "An Introduction to Support Vector Machines" by Cristianini and Shawe-Taylor, Cambridge University Press, 2000. (Note that the function name misspells "Cristianini" as "Cristiani".)
Arguments
VCDim: the VC dimension of the system
nData: the number of data points used
nWrong: the number of data points misclassified
conf: the confidence to be used for this risk bound
Returns
a float
Notes
This generates odd values that do not match the other bounds.
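One way to see the mismatch noted above is to evaluate all three bounds on the same inputs. The numbers here are hypothetical, and conf is again assumed to be the allowed failure probability of the bound:

```python
from rdkit.ML.SLT import Risk

# hypothetical example: VC dimension 10, 1000 training points,
# 50 of them misclassified, 95% confidence (conf = 0.05)
args = (10, 1000, 50, 0.05)
for fn in (Risk.BurgesRiskBound, Risk.CherkasskyRiskBound,
           Risk.CristianiRiskBound):
    print(fn.__name__, fn(*args))
```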
- rdkit.ML.SLT.Risk.log2(x)¶
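No description is attached to this function; from the name, it is presumably a base-2 logarithm helper used by the bounds above. A minimal sketch of such a helper (an assumption, not the RDKit source):

```python
import math

def log2(x):
    # base-2 logarithm: ln(x) / ln(2); equivalent to math.log2(x)
    return math.log(x) / math.log(2.0)
```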