rdkit.Chem.AtomPairs.Utils module
-
rdkit.Chem.AtomPairs.Utils.BitsInCommon(v1, v2)
Returns the number of bits in common between two vectors.
Arguments:
- two vectors (sequences of bit ids)
Returns: an integer
Notes
- the vectors must be sorted
- duplicate bit IDs are counted more than once
>>> BitsInCommon( (1,2,3,4,10), (2,4,6) )
2
Here’s how duplicates are handled:
>>> BitsInCommon( (1,2,2,3,4), (2,2,4,5,6) )
3
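The sorted-input requirement and the duplicate handling can be illustrated with a two-pointer merge. This is a minimal sketch that reproduces the two examples above; `bits_in_common_sketch` is a hypothetical name, and the library's own implementation may differ.

```python
def bits_in_common_sketch(v1, v2):
    """Count shared bit IDs in two sorted sequences.

    Hypothetical helper, not the RDKit implementation. Duplicates pair
    off one-to-one, so an ID that appears twice in both vectors adds 2.
    """
    count = 0
    i = j = 0
    while i < len(v1) and j < len(v2):
        if v1[i] == v2[j]:
            # Matching IDs: count the pair and advance both pointers.
            count += 1
            i += 1
            j += 1
        elif v1[i] < v2[j]:
            i += 1
        else:
            j += 1
    return count


print(bits_in_common_sketch((1, 2, 3, 4, 10), (2, 4, 6)))
print(bits_in_common_sketch((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))
```

A merge of this kind only advances forward, which is why unsorted input can miss matches: in this sketch, `(3, 1)` against `(1, 3)` yields 1 rather than 2.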
-
rdkit.Chem.AtomPairs.Utils.CosineSimilarity(v1, v2)
Implements the Cosine similarity metric.
This is the recommended metric in the LaSSI paper.
Arguments:
- two vectors (sequences of bit ids)
Returns: a float.
Notes
- the vectors must be sorted
>>> print('%.3f'%CosineSimilarity( (1,2,3,4,10), (2,4,6) ))
0.516
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (2,2,4,5,6) ))
0.714
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (1,2,2,3,4) ))
1.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), (5,6,7) ))
0.000
>>> print('%.3f'%CosineSimilarity( (1,2,2,3,4), () ))
0.000
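The values above are consistent with dividing the dot product of the two vectors by the square root of the product of their self dot products. The sketch below makes that arithmetic explicit. It rests on an assumption inferred from the Dot examples in this module: each shared bit ID contributes the square of its smaller occurrence count. Both function names are hypothetical, and the sketch uses `collections.Counter`, so unlike the library functions it does not need sorted input.

```python
import math
from collections import Counter


def dot_sketch(v1, v2):
    """Hypothetical dot product over bit-ID sequences.

    Assumption: each shared ID contributes min(count1, count2) ** 2,
    which matches the documented Dot outputs.
    """
    c1, c2 = Counter(v1), Counter(v2)
    return sum(min(c1[b], c2[b]) ** 2 for b in c1.keys() & c2.keys())


def cosine_similarity_sketch(v1, v2):
    """Hypothetical cosine similarity built on dot_sketch."""
    denom = math.sqrt(dot_sketch(v1, v1) * dot_sketch(v2, v2))
    if not denom:
        # An empty vector has zero norm; report 0.0 instead of dividing.
        return 0.0
    return dot_sketch(v1, v2) / denom


print('%.3f' % cosine_similarity_sketch((1, 2, 3, 4, 10), (2, 4, 6)))
print('%.3f' % cosine_similarity_sketch((1, 2, 2, 3, 4), (2, 2, 4, 5, 6)))
```

For the first pair the dot product is 2 and the self dot products are 5 and 3, giving 2/sqrt(15). For the second pair the dot product is 5 and both self dot products are 7, giving 5/7.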
-
rdkit.Chem.AtomPairs.Utils.DiceSimilarity(v1, v2, bounds=None)
Implements the DICE similarity metric.
This is the recommended metric in both the Topological torsions and Atom pairs papers.
Arguments:
- two vectors (sequences of bit ids)
Returns: a float.
Notes
- the vectors must be sorted
>>> DiceSimilarity( (1,2,3), (1,2,3) )
1.0
>>> DiceSimilarity( (1,2,3), (5,6) )
0.0
>>> DiceSimilarity( (1,2,3,4), (1,3,5,7) )
0.5
>>> DiceSimilarity( (1,2,3,4,5,6), (1,3) )
0.5
Note that duplicate bit IDs count multiple times:
>>> DiceSimilarity( (1,1,3,4,5,6), (1,1) )
0.5
but only if they are duplicated in both vectors:
>>> DiceSimilarity( (1,1,3,4,5,6), (1,) )==2./7
True
Edge case:
>>> DiceSimilarity( (), () )
0.0
Bounds check:
>>> DiceSimilarity( (1,1,3,4), (1,1))
0.666...
>>> DiceSimilarity( (1,1,3,4), (1,1), bounds=0.3)
0.666...
>>> DiceSimilarity( (1,1,3,4), (1,1), bounds=0.33)
0.666...
>>> DiceSimilarity( (1,1,3,4,5,6), (1,1), bounds=0.34)
0.0
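The examples above match the usual Dice formula, twice the number of bits in common divided by the combined length of the two vectors. The sketch below reproduces them under one assumption about `bounds`, which the text does not define: when the shorter vector's share of the combined length falls below `bounds`, the result is reported as 0.0 without counting common bits. That reading is inferred from the documented outputs only. `dice_similarity_sketch` is a hypothetical name.

```python
from collections import Counter


def dice_similarity_sketch(v1, v2, bounds=None):
    """Hypothetical Dice similarity over bit-ID sequences."""
    denom = float(len(v1) + len(v2))
    if not denom:
        # Two empty vectors: avoid dividing by zero.
        return 0.0
    # Assumed meaning of bounds: skip the comparison when the shorter
    # vector makes up too small a share of the combined length.
    if bounds and min(len(v1), len(v2)) / denom < bounds:
        return 0.0
    # Counter intersection keeps the smaller count of each shared ID,
    # so a duplicate counts twice only if it is duplicated in both.
    common = sum((Counter(v1) & Counter(v2)).values())
    return 2.0 * common / denom


print(dice_similarity_sketch((1, 2, 3, 4), (1, 3, 5, 7)))
print(dice_similarity_sketch((1, 1, 3, 4, 5, 6), (1, 1), bounds=0.34))
```

In the last documented example the shorter vector contributes 2 of 8 entries (0.25), which is below 0.34, so the sketch returns 0.0. For `(1,1,3,4)` against `(1,1)` the share is 2 of 6 (about 0.333), which passes both the 0.3 and 0.33 thresholds.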
-
rdkit.Chem.AtomPairs.Utils.Dot(v1, v2)
Returns the dot product between two vectors.
Arguments:
- two vectors (sequences of bit ids)
Returns: an integer
Notes
- the vectors must be sorted
- duplicate bit IDs are counted more than once
>>> Dot( (1,2,3,4,10), (2,4,6) )
2
Here’s how duplicates are handled:
>>> Dot( (1,2,2,3,4), (2,2,4,5,6) )
5
>>> Dot( (1,2,2,3,4), (2,4,5,6) )
2
>>> Dot( (1,2,2,3,4), (5,6) )
0
>>> Dot( (), (5,6) )
0
-
rdkit.Chem.AtomPairs.Utils.ExplainAtomCode(code, branchSubtract=0)
Arguments:
- the code to be considered
- branchSubtract: (optional) the constant that was subtracted off the number of neighbors before integrating it into the code. This is used by the topological torsions code.
>>> from rdkit import Chem
>>> from rdkit.Chem.AtomPairs.Utils import GetAtomCode, ExplainAtomCode
>>> m = Chem.MolFromSmiles('C=CC(=O)O')
>>> code = GetAtomCode(m.GetAtomWithIdx(0))
>>> ExplainAtomCode(code)
('C', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(1))
>>> ExplainAtomCode(code)
('C', 2, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(2))
>>> ExplainAtomCode(code)
('C', 3, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(3))
>>> ExplainAtomCode(code)
('O', 1, 1)
>>> code = GetAtomCode(m.GetAtomWithIdx(4))
>>> ExplainAtomCode(code)
('O', 1, 0)
-
rdkit.Chem.AtomPairs.Utils.NumPiElectrons(atom)
Returns the number of electrons an atom is using for pi bonding.
>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('C=C')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
>>> m = Chem.MolFromSmiles('C#CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2
>>> m = Chem.MolFromSmiles('O=C=CC')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
>>> NumPiElectrons(m.GetAtomWithIdx(1))
2
>>> NumPiElectrons(m.GetAtomWithIdx(2))
1
>>> NumPiElectrons(m.GetAtomWithIdx(3))
0
>>> m = Chem.MolFromSmiles('c1ccccc1')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
1
FIX: this behaves oddly in these cases:
>>> m = Chem.MolFromSmiles('S(=O)(=O)')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
2
>>> m = Chem.MolFromSmiles('S(=O)(=O)(O)O')
>>> NumPiElectrons(m.GetAtomWithIdx(0))
0
In the second case, the S atom is tagged as sp3 hybridized.