
Module Utils


Functions
 
ExplainAtomCode(code, branchSubtract=0)
Decodes an atom code into (element symbol, number of neighbors, number of pi electrons)
 
NumPiElectrons(atom)
Returns the number of electrons an atom is using for pi bonding
 
BitsInCommon(v1, v2)
Returns the number of bits in common between two vectors
 
DiceSimilarity(v1, v2, bounds=None)
Implements the Dice similarity metric.
 
Dot(v1, v2)
Returns the dot product of two vectors
 
CosineSimilarity(v1, v2)
Implements the Cosine similarity metric.
 
_test()
Variables
  __package__ = 'rdkit.Chem.AtomPairs'

Imports: Chem, rdMolDescriptors, math, GetAtomCode


Function Details

ExplainAtomCode(code, branchSubtract=0)



**Arguments**:

  - code: the atom code to be decoded

  - branchSubtract: (optional) the constant that was subtracted from
    the number of neighbors before it was integrated into the code.
    This is used by the topological torsions code.

**Returns**: a tuple (element symbol, number of neighbors, number of pi electrons)

>>> m = Chem.MolFromSmiles('C=CC(=O)O')
>>> code = GetAtomCode(m.GetAtomWithIdx(0))
>>> ExplainAtomCode(code)
('C', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(1))
>>> ExplainAtomCode(code)
('C', 2, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(2))
>>> ExplainAtomCode(code)
('C', 3, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(3))
>>> ExplainAtomCode(code)
('O', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(4))
>>> ExplainAtomCode(code)
('O', 1, 0)
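
ExplainAtomCode turns a single integer back into three small values, which suggests the code was built by packing those values into separate bit fields. The sketch below illustrates that bit-packing idea in plain Python. Everything in it is an assumption made for illustration: the field widths (BRANCH_BITS, PI_BITS, TYPE_BITS), the field order, the ATOM_TYPES table and the names pack_atom_code and explain_atom_code are hypothetical and are not taken from GetAtomCode.

```python
# Hypothetical field widths, chosen for illustration only; the real
# encoder may use different widths and a different field order.
BRANCH_BITS = 3
PI_BITS = 2
TYPE_BITS = 4

# Hypothetical lookup table: type index -> element symbol.
ATOM_TYPES = ['C', 'N', 'O', 'S']


def pack_atom_code(type_idx, n_branch, n_pi, branch_subtract=0):
    """Pack (type index, neighbor count, pi count) into one int.

    The neighbor count goes in the lowest bits, then the pi count, then
    the type index. Counts are clamped to the largest value their field
    can hold; type_idx is assumed to fit in TYPE_BITS.
    """
    max_branch = (1 << BRANCH_BITS) - 1
    max_pi = (1 << PI_BITS) - 1
    n_branch = min(max(0, n_branch - branch_subtract), max_branch)
    n_pi = min(n_pi, max_pi)
    return (((type_idx << PI_BITS) | n_pi) << BRANCH_BITS) | n_branch


def explain_atom_code(code, branch_subtract=0):
    """Unpack a code made by pack_atom_code into (symbol, n_branch, n_pi)."""
    n_branch = code & ((1 << BRANCH_BITS) - 1)
    code >>= BRANCH_BITS
    n_pi = code & ((1 << PI_BITS) - 1)
    code >>= PI_BITS
    type_idx = code & ((1 << TYPE_BITS) - 1)
    symbol = ATOM_TYPES[type_idx] if type_idx < len(ATOM_TYPES) else 'X'
    # Add the subtracted constant back so the original neighbor count
    # is recovered (a choice made for this sketch).
    return symbol, n_branch + branch_subtract, n_pi


# Atoms 2 and 4 of C=CC(=O)O, mirroring the doctest above.
print(explain_atom_code(pack_atom_code(0, 3, 1)))  # ('C', 3, 1)
print(explain_atom_code(pack_atom_code(2, 1, 0)))  # ('O', 1, 0)
```

Passing the same branch_subtract to both functions round-trips the neighbor count, which mirrors how the topological torsions code is described as using branchSubtract.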

NumPiElectrons(atom)

Returns the number of electrons an atom is using for pi bonding

>>> m = Chem.MolFromSmiles('C=C')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1

>>> m = Chem.MolFromSmiles('C#CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2

>>> m = Chem.MolFromSmiles('O=C=CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2
>>> NumPiElectrons(m.GetAtomWithIdx(2))
1
>>> NumPiElectrons(m.GetAtomWithIdx(3))
0

FIX: the results are inconsistent for these two sulfur compounds:
>>> m = Chem.MolFromSmiles('S(=O)(=O)')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2

>>> m = Chem.MolFromSmiles('S(=O)(=O)(O)O')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
0

In the second case the S atom is tagged as sp3 hybridized, so it is reported as using no pi electrons despite its two S=O double bonds.
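
One simple rule reproduces every doctest above, including the sulfur quirk: for an atom that is not sp3, count its unsaturation, i.e. the sum of its bond orders minus its number of bonds. The sketch below is an illustrative reconstruction under that assumption, not NumPiElectrons itself. The name count_pi_electrons is hypothetical, it takes a plain list of bond orders and a hybridization flag instead of an RDKit atom, and it does not handle aromatic bonds.

```python
def count_pi_electrons(bond_orders, is_sp3=False):
    """Hypothetical rule: pi electrons = total bond order - number of bonds.

    bond_orders lists the order (1, 2 or 3) of each bond to a neighbor.
    Atoms flagged as sp3 are treated as using no pi electrons.
    Aromatic bonds are not handled by this sketch.
    """
    if is_sp3:
        return 0
    return sum(bond_orders) - len(bond_orders)


print(count_pi_electrons([3, 1]))                      # middle C of C#CC: 2
print(count_pi_electrons([2, 2]))                      # S of S(=O)(=O): 2
print(count_pi_electrons([2, 2, 1, 1], is_sp3=True))   # S of S(=O)(=O)(O)O: 0
```

Dropping is_sp3=True in the last call gives 2, the same as for S(=O)(=O), which suggests the sp3 tag alone accounts for the inconsistency noted above.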

BitsInCommon(v1, v2)

Returns the number of bits in common between two vectors

**Arguments**:

  - two vectors (sequences of bit ids)

**Returns**: an integer

**Notes**

  - the vectors must be sorted

  - duplicate bit IDs are counted more than once: an ID that occurs
    n1 times in v1 and n2 times in v2 contributes min(n1, n2)

>>> BitsInCommon( (1,2,3,4,10), (2,4,6) )
2

Here's how duplicates are handled:
>>> BitsInCommon( (1,2,2,3,4), (2,2,4,5,6) )
3
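
The requirement that both vectors be sorted points to a merge-style scan: walk both sequences in step and advance whichever side holds the smaller ID. The following is a minimal sketch of that technique (bits_in_common is a hypothetical name, not this module's code); it runs in O(len(v1) + len(v2)) time and reproduces the duplicate handling shown above.

```python
def bits_in_common(v1, v2):
    """Count shared bit IDs in two sorted sequences with one merge pass.

    Each matching pair of entries is consumed once, so an ID that occurs
    n1 times in v1 and n2 times in v2 is counted min(n1, n2) times.
    """
    i = j = 0
    common = 0
    while i < len(v1) and j < len(v2):
        if v1[i] == v2[j]:
            common += 1
            i += 1
            j += 1
        elif v1[i] < v2[j]:
            i += 1
        else:
            j += 1
    return common


print(bits_in_common((1, 2, 3, 4, 10), (2, 4, 6)))       # 2
print(bits_in_common((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))  # 3
```

For unsorted input, sum((collections.Counter(v1) & collections.Counter(v2)).values()) gives the same count, at the cost of building two hash tables.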
 

DiceSimilarity(v1, v2, bounds=None)

Implements the Dice similarity metric, 2*(bits in common)/(len(v1)+len(v2)).
This is the recommended metric in both the Topological Torsions
and Atom Pairs papers.

**Arguments**:

  - two vectors (sequences of bit ids)

**Returns**: a float.

**Notes**

  - the vectors must be sorted

  
>>> DiceSimilarity( (1,2,3), (1,2,3) )
1.0
>>> DiceSimilarity( (1,2,3), (5,6) )
0.0
>>> DiceSimilarity( (1,2,3,4), (1,3,5,7) )
0.5
>>> DiceSimilarity( (1,2,3,4,5,6), (1,3) )
0.5

Note that duplicate bit IDs count multiple times:
>>> DiceSimilarity( (1,1,3,4,5,6), (1,1) )
0.5

but only if they are duplicated in both vectors:
>>> DiceSimilarity( (1,1,3,4,5,6), (1,) )==2./7
True
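
The Dice values above follow from 2 * (bits in common) / (len(v1) + len(v2)), with shared IDs counted min(n1, n2) times. This minimal sketch computes that ratio; the name dice_similarity is hypothetical, it omits DiceSimilarity's optional bounds argument (whose behavior is not documented here), and returning 0.0 for two empty vectors is an assumption of the sketch.

```python
from collections import Counter


def dice_similarity(v1, v2):
    """Dice similarity: 2 * |common bits| / (len(v1) + len(v2)).

    Shared IDs are counted min(n1, n2) times. This sketch returns 0.0
    when both vectors are empty (an assumption).
    """
    denom = len(v1) + len(v2)
    if denom == 0:
        return 0.0
    # Counter intersection keeps min(count in v1, count in v2) per ID,
    # so this version does not need sorted input.
    common = sum((Counter(v1) & Counter(v2)).values())
    return 2.0 * common / denom


print(dice_similarity((1, 2, 3, 4), (1, 3, 5, 7)))    # 0.5
print(dice_similarity((1, 1, 3, 4, 5, 6), (1,)) == 2. / 7)  # True
```

Using a multiset intersection instead of a merge trades the sorted-input requirement for hashing overhead; on sorted input both approaches give the same count.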

Dot(v1, v2)

Returns the dot product of two vectors.

**Arguments**:

  - two vectors (sequences of bit ids)

**Returns**: an integer

**Notes**

  - the vectors must be sorted

  - duplicate bit IDs are counted more than once: an ID that occurs
    n1 times in v1 and n2 times in v2 contributes min(n1, n2)**2

>>> Dot( (1,2,3,4,10), (2,4,6) )
2

Here's how duplicates are handled:
>>> Dot( (1,2,2,3,4), (2,2,4,5,6) )
5
>>> Dot( (1,2,2,3,4), (2,4,5,6) )
2
>>> Dot( (1,2,2,3,4), (5,6) )
0
>>> Dot( (), (5,6) )
0
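
The duplicate examples above are worth checking by hand. For (1,2,2,3,4) and (2,4,5,6), a textbook dot product of count vectors would give 2*1 + 1*1 = 3, yet Dot returns 2. A rule that reproduces all five doctests is to add min(n1, n2)**2 for every shared bit ID. The sketch below implements that inferred rule; the name dot is hypothetical and sorted input is not required.

```python
from collections import Counter


def dot(v1, v2):
    """Sum of min(n1, n2)**2 over bit IDs shared by v1 and v2.

    n1 and n2 are the multiplicities of an ID in each vector. This rule
    is inferred from the doctests, not from the module's source.
    """
    shared = Counter(v1) & Counter(v2)  # keeps min(n1, n2) per shared ID
    return sum(n * n for n in shared.values())


print(dot((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))  # 2**2 + 1**2 = 5
print(dot((1, 2, 2, 3, 4), (2, 4, 5, 6)))     # 1**2 + 1**2 = 2
```

For a vector paired with itself, min(n, n)**2 = n**2, so self products agree with the ordinary sum of squared counts.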

CosineSimilarity(v1, v2)

Implements the cosine similarity metric, Dot(v1, v2)/sqrt(Dot(v1, v1)*Dot(v2, v2)).
This is the recommended metric in the LaSSI paper.

**Arguments**:

  - two vectors (sequences of bit ids)

**Returns**: a float.

**Notes**

  - the vectors must be sorted

>>> print('%.3f'%CosineSimilarity( (1,2,3,4,10), (2,4,6) ))
0.516
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (2,2,4,5,6) ))
0.714
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (1,2,2,3,4) ))
1.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (5,6,7) ))
0.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), () ))
0.000
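
Every cosine doctest above is reproduced by normalizing Dot in the usual way: Dot(v1, v2) divided by the square root of the two self products. For the second example, Dot gives 5 and each self product is 7, so the result is 5/7, about 0.714. The sketch below assumes that formula and the duplicate rule inferred from the Dot doctests (min(n1, n2)**2 per shared ID); the names dot and cosine_similarity are hypothetical.

```python
import math
from collections import Counter


def dot(v1, v2):
    # Duplicate rule inferred from the Dot doctests: min(n1, n2)**2 per shared ID.
    return sum(n * n for n in (Counter(v1) & Counter(v2)).values())


def cosine_similarity(v1, v2):
    """dot(v1, v2) / sqrt(dot(v1, v1) * dot(v2, v2)); 0.0 if either is empty."""
    denom = math.sqrt(dot(v1, v1) * dot(v2, v2))
    if denom == 0:
        return 0.0
    return dot(v1, v2) / denom


print('%.3f' % cosine_similarity((1, 2, 3, 4, 10), (2, 4, 6)))     # 0.516
print('%.3f' % cosine_similarity((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))  # 0.714
```

A non-empty vector always has a self product of at least 1, so the zero-denominator branch is taken exactly when one of the inputs is empty.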