rdkit.Chem.AtomPairs.Utils module

rdkit.Chem.AtomPairs.Utils.BitsInCommon(v1, v2)

Returns the number of bits in common between two vectors

Arguments:

  • two vectors (sequences of bit ids)

Returns: an integer

Notes

  • the vectors must be sorted
  • duplicate bit IDs are counted more than once
>>> BitsInCommon( (1,2,3,4,10), (2,4,6) )
2

Here’s how duplicates are handled:
>>> BitsInCommon( (1,2,2,3,4), (2,2,4,5,6) )
3
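The sorted-input requirement suggests a single merge-style pass over both vectors. The following is a minimal pure-Python sketch under that assumption, not RDKit's implementation; the name bits_in_common_sketch is hypothetical. It reproduces the two results shown above.

```python
def bits_in_common_sketch(v1, v2):
    """Count shared bit IDs in two sorted sequences (hypothetical helper)."""
    i = j = count = 0
    while i < len(v1) and j < len(v2):
        if v1[i] == v2[j]:
            # A match consumes one entry from each vector, so a duplicate
            # is counted again only if it is duplicated in both vectors.
            count += 1
            i += 1
            j += 1
        elif v1[i] < v2[j]:
            i += 1
        else:
            j += 1
    return count


print(bits_in_common_sketch((1, 2, 3, 4, 10), (2, 4, 6)))
print(bits_in_common_sketch((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))
```

Because the pass only ever moves forward, this sketch can miss matches when given unsorted input, which is one way to see why the sorting note matters.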

rdkit.Chem.AtomPairs.Utils.CosineSimilarity(v1, v2)
Implements the cosine similarity metric.
This is the recommended metric in the LaSSI paper.

Arguments:

  • two vectors (sequences of bit ids)

Returns: a float.

Notes

  • the vectors must be sorted
>>> print('%.3f'%CosineSimilarity( (1,2,3,4,10), (2,4,6) ))
0.516
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (2,2,4,5,6) ))
0.714
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (1,2,2,3,4) ))
1.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (5,6,7) ))
0.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), () ))
0.000
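The values above are consistent with dividing the dot product of the two vectors by the geometric mean of their self dot products. The sketch below assumes exactly that, and assumes a dot product in which a shared bit ID contributes the square of its smaller count; both are assumptions chosen because they reproduce the documented outputs, and the names dot_sketch and cosine_similarity_sketch are hypothetical.

```python
import math
from collections import Counter


def dot_sketch(v1, v2):
    """Assumed dot product: each shared bit ID adds min(count)**2."""
    c1, c2 = Counter(v1), Counter(v2)
    return sum(min(c1[k], c2[k]) ** 2 for k in c1.keys() & c2.keys())


def cosine_similarity_sketch(v1, v2):
    """Cosine similarity of two bit-ID sequences (hypothetical helper)."""
    denom = math.sqrt(dot_sketch(v1, v1) * dot_sketch(v2, v2))
    if not denom:
        # An empty vector has zero norm; return 0.0 instead of dividing.
        return 0.0
    return dot_sketch(v1, v2) / denom


print('%.3f' % cosine_similarity_sketch((1, 2, 3, 4, 10), (2, 4, 6)))
print('%.3f' % cosine_similarity_sketch((1, 2, 2, 3, 4), ()))
```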
rdkit.Chem.AtomPairs.Utils.DiceSimilarity(v1, v2, bounds=None)
Implements the DICE similarity metric.
This is the recommended metric in both the Topological torsions and Atom pairs papers.

Arguments:

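  • two vectors (sequences of bit ids)
  • bounds: (optional) a cutoff; as the bounds-check examples below show, the function returns 0.0 when the cutoff is not met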

Returns: a float.

Notes

  • the vectors must be sorted
>>> DiceSimilarity( (1,2,3), (1,2,3) )
1.0
>>> DiceSimilarity( (1,2,3), (5,6) )
0.0
>>> DiceSimilarity( (1,2,3,4), (1,3,5,7) )
0.5
>>> DiceSimilarity( (1,2,3,4,5,6), (1,3) )
0.5

Note that duplicate bit IDs count multiple times:
>>> DiceSimilarity( (1,1,3,4,5,6), (1,1) )
0.5

but only if they are duplicated in both vectors:
>>> DiceSimilarity( (1,1,3,4,5,6), (1,) )==2./7
True

Edge case:
>>> DiceSimilarity( (), () )
0.0

Bounds check:
>>> DiceSimilarity( (1,1,3,4), (1,1))
0.666...
>>> DiceSimilarity( (1,1,3,4), (1,1), bounds=0.3)
0.666...
>>> DiceSimilarity( (1,1,3,4), (1,1), bounds=0.33)
0.666...
>>> DiceSimilarity( (1,1,3,4,5,6), (1,1), bounds=0.34)
0.0
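The examples above fit the usual Dice formula, twice the number of shared bits divided by the sum of the two vector lengths. The sketch below is a pure-Python illustration, not RDKit's code; dice_similarity_sketch is a hypothetical name. Its handling of bounds is an assumption inferred from the documented outputs: when the shorter length divided by the combined length falls below bounds, the result is 0.0 without counting shared bits.

```python
from collections import Counter


def dice_similarity_sketch(v1, v2, bounds=None):
    """Dice similarity of two bit-ID sequences (hypothetical helper)."""
    denom = len(v1) + len(v2)
    if not denom:
        # Both vectors empty: defined as 0.0, matching the edge case.
        return 0.0
    # Assumed bounds rule: skip the comparison when the shorter vector
    # is too small relative to the combined length.
    if bounds and min(len(v1), len(v2)) / denom < bounds:
        return 0.0
    # Multiset intersection: a duplicate counts only as often as it
    # appears in both vectors.
    common = sum((Counter(v1) & Counter(v2)).values())
    return 2.0 * common / denom


print(dice_similarity_sketch((1, 2, 3, 4), (1, 3, 5, 7)))
print(dice_similarity_sketch((1, 1, 3, 4, 5, 6), (1, 1), bounds=0.34))
```

Under this reading, bounds acts as a cheap pre-filter: it depends only on the vector lengths, so a pair can be rejected before any bits are compared.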

rdkit.Chem.AtomPairs.Utils.Dot(v1, v2)

Returns the dot product between two vectors

Arguments:

  • two vectors (sequences of bit ids)

Returns: an integer

Notes

  • the vectors must be sorted
  • duplicate bit IDs are counted more than once
>>> Dot( (1,2,3,4,10), (2,4,6) )
2

Here’s how duplicates are handled:
>>> Dot( (1,2,2,3,4), (2,2,4,5,6) )
5
>>> Dot( (1,2,2,3,4), (2,4,5,6) )
2
>>> Dot( (1,2,2,3,4), (5,6) )
0
>>> Dot( (), (5,6) )
0
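The duplicate handling above is not what a plain count-times-count dot product would give: (1,2,2,3,4) against (2,4,5,6) yields 2, not 3. One rule that reproduces every documented result is that each shared bit ID contributes the square of its smaller count. The sketch below assumes that rule; it is an illustration that matches the examples, not RDKit's implementation, and dot_sketch is a hypothetical name.

```python
from collections import Counter


def dot_sketch(v1, v2):
    """Assumed rule: each shared bit ID adds min(count1, count2) ** 2."""
    c1, c2 = Counter(v1), Counter(v2)
    total = 0
    for bit_id in c1.keys() & c2.keys():
        shared = min(c1[bit_id], c2[bit_id])
        total += shared * shared
    return total


print(dot_sketch((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))
print(dot_sketch((1, 2, 2, 3, 4), (2, 4, 5, 6)))
```

In the first call, bit 2 appears twice in both vectors and contributes 4, and bit 4 contributes 1. In the second, bit 2 appears only once in the second vector, so it contributes 1.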

rdkit.Chem.AtomPairs.Utils.ExplainAtomCode(code, branchSubtract=0)

Arguments:

  • the code to be considered
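  • branchSubtract: (optional) the constant that was subtracted off the number of neighbors before integrating it into the code. This is used by the topological torsions code.

Returns: a tuple; in the examples below it reads as (element symbol, number of neighbors, number of pi electrons)
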
>>> m = Chem.MolFromSmiles('C=CC(=O)O')
>>> code = GetAtomCode(m.GetAtomWithIdx(0))
>>> ExplainAtomCode(code)
('C', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(1))
>>> ExplainAtomCode(code)
('C', 2, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(2))
>>> ExplainAtomCode(code)
('C', 3, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(3))
>>> ExplainAtomCode(code)
('O', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(4))
>>> ExplainAtomCode(code)
('O', 1, 0)
rdkit.Chem.AtomPairs.Utils.NumPiElectrons(atom)

Returns the number of electrons an atom is using for pi bonding

>>> m = Chem.MolFromSmiles('C=C')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
>>> m = Chem.MolFromSmiles('C#CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2
>>> m = Chem.MolFromSmiles('O=C=CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2
>>> NumPiElectrons(m.GetAtomWithIdx(2))
1
>>> NumPiElectrons(m.GetAtomWithIdx(3))
0
>>> m = Chem.MolFromSmiles('c1ccccc1')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1

FIX: this behaves oddly in these cases:
>>> m = Chem.MolFromSmiles('S(=O)(=O)')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2

>>> m = Chem.MolFromSmiles('S(=O)(=O)(O)O')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
0

In the second case, the S atom is tagged as sp3 hybridized.