RDKit
Open-source cheminformatics and machine learning.
Embedder.h
//
// Copyright (C) 2004-2017 Greg Landrum and Rational Discovery LLC
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//

#ifndef RD_EMBEDDER_H_GUARD
#define RD_EMBEDDER_H_GUARD

#include <map>
#include <Geometry/point.h>
#include <GraphMol/ROMol.h>

namespace RDKit {
namespace DGeomHelpers {

//! Compute an embedding (in 3D) for the specified molecule using Distance
//! Geometry
/*!
  The following operations are performed (in order) here:
   -# Build a distance bounds matrix based on the topology, including 1-5
      distances but not vdW scaling
   -# Triangle smooth this bounds matrix
   -# If step 2 fails - repeat step 1, this time without 1-5 bounds and with
      vdW scaling, and repeat step 2
   -# Pick a distance matrix at random using the bounds matrix
   -# Compute initial coordinates from the distance matrix
   -# Repeat steps 4 and 5 until maxIterations is reached or embedding is
      successful
   -# Adjust initial coordinates by minimizing a Distance Violation error
      function

  **NOTE**: if the molecule has multiple fragments, they will be embedded
  separately; this means that they will likely occupy the same region of
  space.

  \param mol  Molecule of interest
  \param maxIterations  Max. number of times the embedding will be tried if
      coordinates are not obtained successfully. The default value is 10x
      the number of atoms.
  \param seed  provides a seed for the random number generator (so that the
      same coordinates can be obtained for a molecule on multiple runs).
      If negative, the RNG will not be seeded.
  \param clearConfs  Clear all existing conformations on the molecule
  \param useRandomCoords  Start the embedding from random coordinates
      instead of using eigenvalues of the distance matrix.
  \param boxSizeMult  Determines the size of the box that is used for random
      coordinates. If this is a positive number, the side length will equal
      the largest element of the distance matrix times \c boxSizeMult. If
      this is a negative number, the side length will equal \c -boxSizeMult
      (i.e. independent of the elements of the distance matrix).
  \param randNegEig  Picks coordinates at random when an embedding process
      produces negative eigenvalues
  \param numZeroFail  Fail embedding if we find this many or more zero
      eigenvalues (within a tolerance)
  \param coordMap  a map of int to Point3D, between atom IDs and their
      locations. If this container is provided, the coordinates are used to
      set distance constraints on the embedding. The resulting conformer(s)
      should have distances between the specified atoms that reproduce
      those between the points in \c coordMap. Because the embedding
      produces a molecule in an arbitrary reference frame, an alignment
      step is required to actually reproduce the provided coordinates.
  \param optimizerForceTol  set the tolerance on forces in the DGeom
      optimizer (this shouldn't normally be altered in client code).
  \param ignoreSmoothingFailures  try to embed the molecule even if triangle
      bounds smoothing fails
  \param enforceChirality  enforce the correct chirality if chiral centers
      are present

  \param useExpTorsionAnglePrefs  impose experimental torsion-angle
      preferences
  \param useBasicKnowledge  impose "basic knowledge" terms such as flat
      aromatic rings, ketones, etc.
  \param verbose  print output of experimental torsion-angle preferences

  \param basinThresh  set the basin threshold for the DGeom force field
      (this shouldn't normally be altered in client code).

  \param onlyHeavyAtomsForRMS  only use the heavy atoms when doing RMS
      filtering

  \return ID of the conformation added to the molecule, -1 if the embedding
      failed
*/
int EmbedMolecule(ROMol &mol, unsigned int maxIterations = 0, int seed = -1,
                  bool clearConfs = true, bool useRandomCoords = false,
                  double boxSizeMult = 2.0, bool randNegEig = true,
                  unsigned int numZeroFail = 1,
                  const std::map<int, RDGeom::Point3D> *coordMap = 0,
                  double optimizerForceTol = 1e-3,
                  bool ignoreSmoothingFailures = false,
                  bool enforceChirality = true,
                  bool useExpTorsionAnglePrefs = false,
                  bool useBasicKnowledge = false, bool verbose = false,
                  double basinThresh = 5.0, bool onlyHeavyAtomsForRMS = false);

//! Embed multiple conformations for a molecule
/*!
  This is roughly equivalent to calling EmbedMolecule multiple times, except
  that the bounds matrix is computed only once from the topology.

  **NOTE**: if the molecule has multiple fragments, they will be embedded
  separately; this means that they will likely occupy the same region of
  space.

  \param mol  Molecule of interest
  \param res  Used to return the resulting conformer ids
  \param numConfs  Number of conformations to be generated
  \param numThreads  Sets the number of threads to use (more than one thread
      will only be used if the RDKit was built with multithread support).
      If set to zero, the max supported by the system will be used.
  \param maxIterations  Max. number of times the embedding will be tried if
      coordinates are not obtained successfully. The default value is 10x
      the number of atoms.
  \param seed  provides a seed for the random number generator (so that the
      same coordinates can be obtained for a molecule on multiple runs).
      If negative, the RNG will not be seeded.
  \param clearConfs  Clear all existing conformations on the molecule
  \param useRandomCoords  Start the embedding from random coordinates
      instead of using eigenvalues of the distance matrix.
  \param boxSizeMult  Determines the size of the box that is used for random
      coordinates. If this is a positive number, the side length will equal
      the largest element of the distance matrix times \c boxSizeMult. If
      this is a negative number, the side length will equal \c -boxSizeMult
      (i.e. independent of the elements of the distance matrix).
  \param randNegEig  Picks coordinates at random when an embedding process
      produces negative eigenvalues
  \param numZeroFail  Fail embedding if we find this many or more zero
      eigenvalues (within a tolerance)
  \param pruneRmsThresh  Retain only the conformations out of 'numConfs'
      after embedding that are at least this far apart from each other.
      RMSD is computed on the heavy atoms. Pruning is greedy; i.e. the
      first embedded conformation is retained and from then on only those
      that are at least \c pruneRmsThresh away from already retained
      conformations are kept. The pruning is done after embedding and
      bounds violation minimization. No pruning by default.
  \param coordMap  a map of int to Point3D, between atom IDs and their
      locations. If this container is provided, the coordinates are used to
      set distance constraints on the embedding. The resulting conformer(s)
      should have distances between the specified atoms that reproduce
      those between the points in \c coordMap. Because the embedding
      produces a molecule in an arbitrary reference frame, an alignment
      step is required to actually reproduce the provided coordinates.

  \param optimizerForceTol  set the tolerance on forces in the DGeom
      optimizer (this shouldn't normally be altered in client code).

  \param ignoreSmoothingFailures  try to embed the molecule even if triangle
      bounds smoothing fails
  \param enforceChirality  enforce the correct chirality if chiral centers
      are present

  \param useExpTorsionAnglePrefs  impose experimental torsion-angle
      preferences
  \param useBasicKnowledge  impose "basic knowledge" terms such as flat
      aromatic rings, ketones, etc.
  \param verbose  print output of experimental torsion-angle preferences

  \param basinThresh  set the basin threshold for the DGeom force field
      (this shouldn't normally be altered in client code).

  \param onlyHeavyAtomsForRMS  only use the heavy atoms when doing RMS
      filtering

*/
void EmbedMultipleConfs(
    ROMol &mol, INT_VECT &res, unsigned int numConfs = 10, int numThreads = 1,
    unsigned int maxIterations = 30, int seed = -1, bool clearConfs = true,
    bool useRandomCoords = false, double boxSizeMult = 2.0,
    bool randNegEig = true, unsigned int numZeroFail = 1,
    double pruneRmsThresh = -1.0,
    const std::map<int, RDGeom::Point3D> *coordMap = 0,
    double optimizerForceTol = 1e-3, bool ignoreSmoothingFailures = false,
    bool enforceChirality = true, bool useExpTorsionAnglePrefs = false,
    bool useBasicKnowledge = false, bool verbose = false,
    double basinThresh = 5.0, bool onlyHeavyAtomsForRMS = false);
//! \overload
INT_VECT EmbedMultipleConfs(
    ROMol &mol, unsigned int numConfs = 10, unsigned int maxIterations = 30,
    int seed = -1, bool clearConfs = true, bool useRandomCoords = false,
    double boxSizeMult = 2.0, bool randNegEig = true,
    unsigned int numZeroFail = 1, double pruneRmsThresh = -1.0,
    const std::map<int, RDGeom::Point3D> *coordMap = 0,
    double optimizerForceTol = 1e-3, bool ignoreSmoothingFailures = false,
    bool enforceChirality = true, bool useExpTorsionAnglePrefs = false,
    bool useBasicKnowledge = false, bool verbose = false,
    double basinThresh = 5.0, bool onlyHeavyAtomsForRMS = false);

//! Parameter object for controlling embedding
/*!
  numConfs       Number of conformations to be generated

  numThreads     Sets the number of threads to use (more than one thread
                 will only be used if the RDKit was built with multithread
                 support). If set to zero, the max supported by the system
                 will be used.

  maxIterations  Max. number of times the embedding will be tried if
                 coordinates are not obtained successfully. The default
                 value is 10x the number of atoms.

  randomSeed     provides a seed for the random number generator (so that
                 the same coordinates can be obtained for a molecule on
                 multiple runs).
                 If negative, the RNG will not be seeded.

  clearConfs     Clear all existing conformations on the molecule

  useRandomCoords  Start the embedding from random coordinates instead of
                 using eigenvalues of the distance matrix.

  boxSizeMult    Determines the size of the box that is used for
                 random coordinates. If this is a positive number, the
                 side length will equal the largest element of the distance
                 matrix times \c boxSizeMult. If this is a negative number,
                 the side length will equal \c -boxSizeMult (i.e. independent
                 of the elements of the distance matrix).

  randNegEig     Picks coordinates at random when an embedding process
                 produces negative eigenvalues

  numZeroFail    Fail embedding if we find this many or more zero eigenvalues
                 (within a tolerance)

  pruneRmsThresh Retain only the conformations out of 'numConfs' after
                 embedding that are at least this far apart from each other.
                 RMSD is computed on the heavy atoms.
                 Pruning is greedy; i.e. the first embedded conformation is
                 retained and from then on only those that are at least
                 \c pruneRmsThresh away from already
                 retained conformations are kept. The pruning is done
                 after embedding and bounds violation minimization.
                 No pruning by default.

  coordMap       a map of int to Point3D, between atom IDs and their
                 locations. If this container is provided, the
                 coordinates are used to set distance constraints on the
                 embedding. The resulting conformer(s) should have distances
                 between the specified atoms that reproduce those between the
                 points in \c coordMap. Because the embedding produces a
                 molecule in an arbitrary reference frame, an alignment step
                 is required to actually reproduce the provided coordinates.

  optimizerForceTol  set the tolerance on forces in the DGeom optimizer
                 (this shouldn't normally be altered in client code).

  ignoreSmoothingFailures  try to embed the molecule even if triangle bounds
                 smoothing fails

  enforceChirality  enforce the correct chirality if chiral centers are
                 present

  useExpTorsionAnglePrefs  impose experimental torsion-angle preferences

  useBasicKnowledge  impose "basic knowledge" terms such as flat
                 aromatic rings, ketones, etc.

  verbose        print output of experimental torsion-angle preferences

  basinThresh    set the basin threshold for the DGeom force field
                 (this shouldn't normally be altered in client code).

  onlyHeavyAtomsForRMS  only use the heavy atoms when doing RMS filtering
*/
struct EmbedParameters {
  unsigned int maxIterations;
  int numThreads;
  int randomSeed;
  bool clearConfs;
  bool useRandomCoords;
  double boxSizeMult;
  bool randNegEig;
  unsigned int numZeroFail;
  const std::map<int, RDGeom::Point3D> *coordMap;
  double optimizerForceTol;
  bool ignoreSmoothingFailures;
  bool enforceChirality;
  bool useExpTorsionAnglePrefs;
  bool useBasicKnowledge;
  bool verbose;
  double basinThresh;
  double pruneRmsThresh;
  bool onlyHeavyAtomsForRMS;
  EmbedParameters()
      : maxIterations(0),
        numThreads(1),
        randomSeed(-1),
        clearConfs(true),
        useRandomCoords(false),
        boxSizeMult(2.0),
        randNegEig(true),
        numZeroFail(1),
        coordMap(NULL),
        optimizerForceTol(1e-3),
        ignoreSmoothingFailures(false),
        enforceChirality(true),
        useExpTorsionAnglePrefs(false),
        useBasicKnowledge(false),
        verbose(false),
        basinThresh(5.0),
        pruneRmsThresh(-1.0),
        onlyHeavyAtomsForRMS(false) {}
  EmbedParameters(unsigned int maxIterations, int numThreads, int randomSeed,
                  bool clearConfs, bool useRandomCoords, double boxSizeMult,
                  bool randNegEig, unsigned int numZeroFail,
                  const std::map<int, RDGeom::Point3D> *coordMap,
                  double optimizerForceTol, bool ignoreSmoothingFailures,
                  bool enforceChirality, bool useExpTorsionAnglePrefs,
                  bool useBasicKnowledge, bool verbose, double basinThresh,
                  double pruneRmsThresh, bool onlyHeavyAtomsForRMS)
      : maxIterations(maxIterations),
        numThreads(numThreads),
        randomSeed(randomSeed),
        clearConfs(clearConfs),
        useRandomCoords(useRandomCoords),
        boxSizeMult(boxSizeMult),
        randNegEig(randNegEig),
        numZeroFail(numZeroFail),
        coordMap(coordMap),
        optimizerForceTol(optimizerForceTol),
        ignoreSmoothingFailures(ignoreSmoothingFailures),
        enforceChirality(enforceChirality),
        useExpTorsionAnglePrefs(useExpTorsionAnglePrefs),
        useBasicKnowledge(useBasicKnowledge),
        verbose(verbose),
        basinThresh(basinThresh),
        pruneRmsThresh(pruneRmsThresh),
        onlyHeavyAtomsForRMS(onlyHeavyAtomsForRMS) {}
};

//! Parameters corresponding to Sereina Riniker's KDG approach
extern const EmbedParameters KDG;
//! Parameters corresponding to Sereina Riniker's ETDG approach
extern const EmbedParameters ETDG;
//! Parameters corresponding to Sereina Riniker's ETKDG approach
extern const EmbedParameters ETKDG;

//! \overload
inline int EmbedMolecule(ROMol &mol, const EmbedParameters &params) {
  return EmbedMolecule(
      mol, params.maxIterations, params.randomSeed, params.clearConfs,
      params.useRandomCoords, params.boxSizeMult, params.randNegEig,
      params.numZeroFail, params.coordMap, params.optimizerForceTol,
      params.ignoreSmoothingFailures, params.enforceChirality,
      params.useExpTorsionAnglePrefs, params.useBasicKnowledge, params.verbose,
      params.basinThresh, params.onlyHeavyAtomsForRMS);
}
//! \overload
inline void EmbedMultipleConfs(ROMol &mol, INT_VECT &res, unsigned int numConfs,
                               const EmbedParameters &params) {
  EmbedMultipleConfs(
      mol, res, numConfs, params.numThreads, params.maxIterations,
      params.randomSeed, params.clearConfs, params.useRandomCoords,
      params.boxSizeMult, params.randNegEig, params.numZeroFail,
      params.pruneRmsThresh, params.coordMap, params.optimizerForceTol,
      params.ignoreSmoothingFailures, params.enforceChirality,
      params.useExpTorsionAnglePrefs, params.useBasicKnowledge, params.verbose,
      params.basinThresh, params.onlyHeavyAtomsForRMS);
}
//! \overload
inline INT_VECT EmbedMultipleConfs(ROMol &mol, unsigned int numConfs,
                                   const EmbedParameters &params) {
  INT_VECT res;
  EmbedMultipleConfs(mol, res, numConfs, params);
  return res;
}
}  // namespace DGeomHelpers
}  // namespace RDKit

#endif
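
The EmbedMolecule documentation lists "Triangle smooth this bounds matrix" as the second step of the embedding. The sketch below illustrates what such a smoothing pass can look like on a small symmetric bounds matrix, using only the standard library. It is not RDKit's implementation: the Bounds struct, the triangleSmooth function, and the tolerance value are hypothetical names chosen for this illustration, and the single Floyd-Warshall-style sweep is the textbook form of the technique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container (not an RDKit type): symmetric matrices of upper
// and lower distance bounds between every pair of atoms.
struct Bounds {
  std::vector<std::vector<double>> upper;
  std::vector<std::vector<double>> lower;
};

// Tightens the bounds with the triangle inequality:
//   u_ij <= u_ik + u_kj
//   l_ij >= l_ik - u_kj   and   l_ij >= l_jk - u_ki
// Returns false as soon as a lower bound exceeds its upper bound, i.e. the
// bounds cannot be satisfied by any set of coordinates.
bool triangleSmooth(Bounds &b, double tol = 1e-9) {
  const std::size_t n = b.upper.size();
  assert(b.lower.size() == n);
  for (std::size_t k = 0; k < n; ++k) {
    for (std::size_t i = 0; i < n; ++i) {
      if (i == k) continue;
      for (std::size_t j = i + 1; j < n; ++j) {
        if (j == k) continue;
        // Tighten the upper bound via the path i -> k -> j.
        const double u = b.upper[i][k] + b.upper[k][j];
        if (u < b.upper[i][j]) {
          b.upper[i][j] = b.upper[j][i] = u;
        }
        // Raise the lower bound using the inverse triangle inequality.
        const double l = std::max(b.lower[i][k] - b.upper[k][j],
                                  b.lower[j][k] - b.upper[k][i]);
        if (l > b.lower[i][j]) {
          b.lower[i][j] = b.lower[j][i] = l;
        }
        if (b.lower[i][j] > b.upper[i][j] + tol) {
          return false;  // inconsistent bounds: smoothing fails
        }
      }
    }
  }
  return true;
}
```

For a three-atom chain whose bonded distances are bounded by [1.4, 1.6], the pass tightens a loose 1-3 upper bound to 3.2. A false return corresponds to the "step 2 fails" case in the documentation, where the bounds are rebuilt and smoothed again.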
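
The pruneRmsThresh documentation describes a greedy filter: the first conformation is kept, and each later one is kept only if it is at least the threshold away from every conformation already retained. A minimal standard-library sketch of that selection rule follows. Point, Conformer, plainRmsd, and greedyPrune are hypothetical names, not RDKit API, and the RMSD here is computed on the coordinates as given, without any superposition step, purely to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only (not RDKit classes).
struct Point {
  double x, y, z;
};
using Conformer = std::vector<Point>;

// RMSD over matching atoms, with no alignment (a simplifying assumption).
double plainRmsd(const Conformer &a, const Conformer &b) {
  assert(a.size() == b.size() && !a.empty());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double dx = a[i].x - b[i].x;
    const double dy = a[i].y - b[i].y;
    const double dz = a[i].z - b[i].z;
    sum += dx * dx + dy * dy + dz * dz;
  }
  return std::sqrt(sum / static_cast<double>(a.size()));
}

// Returns the indices of the conformers that survive greedy pruning.
// A non-positive threshold disables pruning, mirroring the documented
// default of -1.0 ("No pruning by default").
std::vector<std::size_t> greedyPrune(const std::vector<Conformer> &confs,
                                     double thresh) {
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < confs.size(); ++i) {
    bool farEnough = true;
    if (thresh > 0.0) {
      for (std::size_t k : kept) {
        if (plainRmsd(confs[i], confs[k]) < thresh) {
          farEnough = false;  // too close to a conformer we already kept
          break;
        }
      }
    }
    if (farEnough) kept.push_back(i);
  }
  return kept;
}
```

Because the filter is greedy, the result depends on the order in which conformations were generated: an early conformation can exclude later ones that would otherwise have been retained.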