""" command line utility for building composite models

#DOC

**Usage**

  BuildComposite [optional args] filename

Unless indicated otherwise (via command line arguments), _filename_ is
a QDAT file.

**Command Line Arguments**

  - -o *filename*: name of the output file for the pickled composite

  - -n *num*: number of separate models to add to the composite

  - -p *tablename*: store persistence data in the database
     in table *tablename*

  - -N *note*: attach some arbitrary text to the persistence data

  - -b *filename*: name of the text file to hold examples from the
     hold-out set which are misclassified

  - -s: split the data into training and hold-out sets before building
     the composite

  - -f *frac*: the fraction of data to use in the training set when the
     data is split

  - -r: randomize the activities (for testing purposes).  This ignores
     the initial distribution of activity values and produces each
     possible activity value with equal likelihood.

  - -S: shuffle the activities (for testing purposes).  This produces
     a permutation of the input activity values.

  - -l: locks the random number generator to give consistent sets
     of training and hold-out data.  This is primarily intended
     for testing purposes.

  - -B: use a so-called Bayesian composite model.

  - -d *database name*: instead of reading the data from a QDAT file,
     pull it from a database.  In this case, the _filename_ argument
     provides the name of the database table containing the data set.

  - -D: show a detailed breakdown of the composite model performance
     across the training and, when appropriate, hold-out sets.

  - -P *pickle file name*: write out the pickled data set to the file

  - -F *filter frac*: filters the data before training to change the
     distribution of activity values in the training set.  *filter
     frac* is the fraction of the training set that should have the
     target value.  **See note below on data filtering.**

  - -v *filter value*: filters the data before training to change the
     distribution of activity values in the training set.  *filter
     value* is the target value to use in filtering.  **See note below
     on data filtering.**

  - --modelFiltFrac *model filter frac*: similar to *filter frac* above,
     but the data is filtered for each model in the composite rather
     than once for the composite as a whole.  *model filter frac* is
     the fraction of the training set for each model that should have
     the target value (*model filter value*).

  - --modelFiltVal *model filter value*: target value to use for
     filtering data before training each model in the composite.

  - -t *threshold value*: use high-confidence predictions for the
     final analysis of the hold-out data.

  - -Q *list string*: the values of quantization bounds for the
     activity value.  See the _-q_ argument for the format of *list
     string*.

  - --nRuns *count*: build *count* composite models

  - --prune: prune any models built

  - -h: print a usage message and exit.

  - -V: print the version number and exit
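
The difference between _-r_ and _-S_ can be sketched in a few lines.
This is only an illustration of the two behaviours described above, not
the code the script uses; the function names are hypothetical.

```python
import random

def randomize_activities(activities, possible_values, rng):
    # -r style: draw every label uniformly from the possible values,
    # ignoring how often each value occurred in the input
    return [rng.choice(possible_values) for _ in activities]

def shuffle_activities(activities, rng):
    # -S style: permute the existing labels, so the overall
    # distribution of activity values is preserved
    shuffled = list(activities)
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(23)
acts = [0] * 8 + [1] * 2
shuffled = shuffle_activities(acts, rng)
randomized = randomize_activities(acts, [0, 1], rng)
```

The shuffled list always contains eight 0s and two 1s; the randomized
list has no such guarantee.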

*-*-*-*-*-*-*-*- Tree-Related Options -*-*-*-*-*-*-*-*

  - -g: be less greedy when training the models.

  - -G *number*: force trees to be rooted at descriptor *number*.

  - -L *limit*: provide an (integer) limit on individual model
     complexity

  - -q *list string*: Add QuantTrees to the composite and use the list
     specified in *list string* as the number of target quantization
     bounds for each descriptor.  Don't forget to include 0's at the
     beginning and end of *list string* for the name and value fields.
     For example, if there are 4 descriptors and you want 2 quant
     bounds apiece, you would use _-q "[0,2,2,2,2,0]"_.
     Two special cases:
       1) If you would like to ignore a descriptor in the model
          building, use '-1' for its number of quant bounds.
       2) If you have integer valued data that should not be quantized
          further, enter 0 for that descriptor.

  - --recycle: allow descriptors to be used more than once in a tree

  - --randomDescriptors=val: toggles growing random forests with val
     randomly-selected descriptors available at each node.
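
The *list string* format expected by _-q_ can be built programmatically.
The helper below is a hypothetical sketch, not part of this module; it
only assembles the string described above.

```python
import ast

def make_quant_list_string(bounds_per_descriptor):
    # the 0 placeholders at either end stand for the name and value fields
    counts = [0] + list(bounds_per_descriptor) + [0]
    return '[' + ','.join(str(c) for c in counts) + ']'

# four descriptors, two quant bounds apiece
example = make_quant_list_string([2, 2, 2, 2])

# ignore the second descriptor (-1), leave the third unquantized (0)
mixed = make_quant_list_string([2, -1, 0, 2])

# the string parses back to a plain list of integers
parsed = ast.literal_eval(example)
```

Here _example_ is the string from the text above, "[0,2,2,2,2,0]".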


*-*-*-*-*-*-*-*- KNN-Related Options -*-*-*-*-*-*-*-*

  - --doKnn: use K-Nearest Neighbors models

  - --knnK=*value*: the value of K to use in the KNN models

  - --knnTanimoto: use the Tanimoto metric in KNN models

  - --knnEuclid: use a Euclidean metric in KNN models

*-*-*-*-*-*-*- Naive Bayes Classifier Options -*-*-*-*-*-*-*-*

  - --doNaiveBayes: use Naive Bayes classifiers

  - --mEstimateVal=*value*: the value to be used in the m-estimate
     formula.  If this is greater than 0.0, it is used to compute the
     conditional probabilities by the m-estimate.

*-*-*-*-*-*-*-*- SVM-Related Options -*-*-*-*-*-*-*-*

**** NOTE: THESE ARE DISABLED ****

##   - --doSVM: use Support-vector machines

##   - --svmKernel=*kernel*: choose the type of kernel to be used for
##     the SVMs.  Options are:
##     The default is:

##   - --svmType=*type*: choose the type of support-vector machine
##     to be used.  Options are:
##     The default is:

##   - --svmGamma=*gamma*: provide the gamma value for the SVMs.  If this
##     is not provided, a grid search will be carried out to determine an
##     optimal *gamma* value for each SVM.

##   - --svmCost=*cost*: provide the cost value for the SVMs.  If this is
##     not provided, a grid search will be carried out to determine an
##     optimal *cost* value for each SVM.

##   - --svmWeights=*weights*: provide the weight values for the
##     activities.  If provided this should be a sequence of (label,
##     weight) 2-tuples *nActs* long.  If not provided, a weight of 1
##     will be used for each activity.

##   - --svmEps=*epsilon*: provide the epsilon value used to determine
##     when the SVM has converged.  Defaults to 0.001

##   - --svmDegree=*degree*: provide the degree of the kernel (when
##     sensible).  Defaults to 3

##   - --svmCoeff=*coeff*: provide the coefficient for the kernel (when
##     sensible).  Defaults to 0

##   - --svmNu=*nu*: provide the nu value for the kernel (when sensible).
##     Defaults to 0.5

##   - --svmDataType=*float*: if the data contains only 1s and 0s, specify
##     this by using binary.  Defaults to float

##   - --svmCache=*cache*: provide the size of the memory cache (in MB)
##     to be used while building the SVM.  Defaults to 40

**Notes**

  - *Data filtering*: When there is a large disparity between the
    numbers of points with various activity levels present in the
    training set it is sometimes desirable to train on a more
    homogeneous data set.  This can be accomplished using filtering.
    The filtering process works by selecting a particular target
    fraction and target value.  For example, in a case where 95% of
    the original training set has activity 0 and only 5% activity 1, we
    could filter (by randomly removing points with activity 0) so that
    30% of the data set used to build the composite has activity 1.
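
The arithmetic behind the filtering example can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
filtering routine the script calls: the function name and the use of a
simple random shuffle are hypothetical.  If there are n1 points with the
target value and the target fraction is f, roughly n1*(1-f)/f of the
other points are kept.

```python
import random

def filter_indices(activities, target_val, target_frac, rng):
    # indices of points with and without the target activity value
    target = [i for i, a in enumerate(activities) if a == target_val]
    other = [i for i, a in enumerate(activities) if a != target_val]
    # number of non-target points to keep so that the target value makes
    # up (approximately) target_frac of the filtered set
    n_other_keep = int(len(target) * (1.0 - target_frac) / target_frac)
    n_other_keep = min(n_other_keep, len(other))
    rng.shuffle(other)
    kept = sorted(target + other[:n_other_keep])
    dropped = sorted(other[n_other_keep:])
    return kept, dropped

# 95% activity 0, 5% activity 1, filtered towards 30% activity 1
acts = [0] * 950 + [1] * 50
kept, dropped = filter_indices(acts, 1, 0.3, random.Random(42))
frac = sum(acts[i] for i in kept) / float(len(kept))
```

With these numbers all 50 active points are kept along with 116 of the
950 inactive ones, so about 30% of the filtered set has activity 1.  In
_RunOnData()_ the points removed by the filter are added to the test
examples rather than discarded.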


"""
import RDConfig
from utils import listutils
from ML.Composite import Composite,BayesComposite

from Numeric import *
from ML.Data import DataUtils,SplitData
from ML import ScreenComposite
from Dbase import DbModule
from Dbase.DbConnection import DbConnect
from ML import CompositeRun
import sys,cPickle,time
import DataStructs

_runDetails = CompositeRun.CompositeRun()

__VERSION_STRING="3.2.3"

_verbose = 1
def message(msg):
  """ emits messages to _sys.stdout_
    override this in modules which import this one to redirect output

    **Arguments**

      - msg: the string to be displayed

  """
  if _verbose: sys.stdout.write('%s\n'%(msg))


def testall(composite,examples,badExamples=[]):
  """ screens a number of examples past a composite

    **Arguments**

      - composite: a composite model

      - examples: a list of examples (with results) to be screened

      - badExamples: a list to which misclassified examples are appended

    **Returns**

      a list of 2-tuples containing:

        1) a vote

        2) a confidence

      these are the votes and confidence levels for **misclassified** examples

  """
  wrong = []
  for example in examples:
    if composite.GetActivityQuantBounds():
      answer = composite.QuantizeActivity(example)[-1]
    else:
      answer = example[-1]
    res,conf = composite.ClassifyExample(example)
    if res != answer:
      wrong.append((res,conf))
      badExamples.append(example)

  return wrong


def RunOnData(details,data,progressCallback=None,saveIt=1,setDescNames=0):
  nExamples = data.GetNPts()
  if details.lockRandom:
    seed = details.randomSeed
  else:
    import random
    seed = (random.randint(0,1e6),random.randint(0,1e6))
  DataUtils.InitRandomNumbers(seed)
  testExamples = []
  if details.shuffleActivities == 1:
    DataUtils.RandomizeActivities(data,shuffle=1,runDetails=details)
  elif details.randomActivities == 1:
    DataUtils.RandomizeActivities(data,shuffle=0,runDetails=details)

  namedExamples = data.GetNamedData()
  if details.splitRun == 1:
    trainIdx,testIdx = SplitData.SplitIndices(len(namedExamples),details.splitFrac,
                                              silent=not _verbose)

    trainExamples = [namedExamples[x] for x in trainIdx]
    testExamples = [namedExamples[x] for x in testIdx]
  else:
    testExamples = []
    testIdx = []
    trainIdx = range(len(namedExamples))
    trainExamples = namedExamples

  if details.filterFrac != 0.0:
    # when activity bounds are present, quantize the activities of copies
    # of the training points so that the filter sees class labels
    if hasattr(details,'activityBounds') and details.activityBounds:
      tExamples = []
      bounds = details.activityBounds
      for pt in trainExamples:
        pt = pt[:]
        act = pt[-1]
        placed=0
        bound=0
        while not placed and bound < len(bounds):
          if act < bounds[bound]:
            pt[-1] = bound
            placed = 1
          else:
            bound += 1
        if not placed:
          pt[-1] = bound
        tExamples.append(pt)
    else:
      bounds = None
      tExamples = trainExamples
    trainIdx,temp = DataUtils.FilterData(tExamples,details.filterVal,
                                         details.filterFrac,-1,
                                         indicesOnly=1)
    tmp = [trainExamples[x] for x in trainIdx]
    # points removed by the filter are added to the test examples
    testExamples += [trainExamples[x] for x in temp]
    trainExamples = tmp

    counts = DataUtils.CountResults(trainExamples,bounds=bounds)
    ks = counts.keys()
    ks.sort()
    message('Result Counts in training set:')
    for k in ks:
      message(str((k, counts[k])))
    counts = DataUtils.CountResults(testExamples,bounds=bounds)
    ks = counts.keys()
    ks.sort()
    message('Result Counts in test set:')
    for k in ks:
      message(str((k, counts[k])))
  nExamples = len(trainExamples)
  message('Training with %d examples'%(nExamples))

  nVars = data.GetNVars()
  attrs = range(1,nVars+1)
  nPossibleVals = data.GetNPossibleVals()
  for i in range(1,len(nPossibleVals)):
    if nPossibleVals[i-1] == -1:
      attrs.remove(i)

  if details.pickleDataFileName != '':
    pickleDataFile = open(details.pickleDataFileName,'wb+')
    cPickle.dump(trainExamples,pickleDataFile)
    cPickle.dump(testExamples,pickleDataFile)
    pickleDataFile.close()

  if details.bayesModel:
    composite = BayesComposite.BayesComposite()
  else:
    composite = Composite.Composite()

  composite._randomSeed = seed
  composite._splitFrac = details.splitFrac
  composite._shuffleActivities = details.shuffleActivities
  composite._randomizeActivities = details.randomActivities

  if hasattr(details,'filterFrac'):
    composite._filterFrac = details.filterFrac
  if hasattr(details,'filterVal'):
    composite._filterVal = details.filterVal

  composite.SetModelFilterData(details.modelFilterFrac, details.modelFilterVal)

  composite.SetActivityQuantBounds(details.activityBounds)
  nPossibleVals = data.GetNPossibleVals()
  if details.activityBounds:
    nPossibleVals[-1] = len(details.activityBounds)+1


  if setDescNames:
    composite.SetInputOrder(data.GetVarNames())
    composite.SetDescriptorNames(details._descNames)
  else:
    composite.SetDescriptorNames(data.GetVarNames())
  composite.SetActivityQuantBounds(details.activityBounds)
  if details.nModels==1:
    details.internalHoldoutFrac=0.0
  if details.useTrees:
    from ML.DecTree import CrossValidate,PruneTree
    if details.qBounds != []:
      from ML.DecTree import BuildQuantTree
      builder = BuildQuantTree.QuantTreeBoot
    else:
      from ML.DecTree import ID3
      builder = ID3.ID3Boot
    driver = CrossValidate.CrossValidationDriver
    pruner = PruneTree.PruneTree

    composite.SetQuantBounds(details.qBounds)
    nPossibleVals = data.GetNPossibleVals()
    if details.activityBounds:
      nPossibleVals[-1] = len(details.activityBounds)+1
    composite.Grow(trainExamples,attrs,nPossibleVals=[0]+nPossibleVals,
                   buildDriver=driver,
                   pruner=pruner,
                   nTries=details.nModels,pruneIt=details.pruneIt,
                   lessGreedy=details.lessGreedy,needsQuantization=0,
                   treeBuilder=builder,nQuantBounds=details.qBounds,
                   startAt=details.startAt,
                   maxDepth=details.limitDepth,
                   progressCallback=progressCallback,
                   holdOutFrac=details.internalHoldoutFrac,
                   replacementSelection=details.replacementSelection,
                   recycleVars=details.recycleVars,
                   randomDescriptors=details.randomDescriptors,
                   silent=not _verbose)

  elif details.useSigTrees:
    from ML.DecTree import CrossValidate
    from ML.DecTree import BuildSigTree
    builder = BuildSigTree.SigTreeBuilder
    driver = CrossValidate.CrossValidationDriver
    nPossibleVals = data.GetNPossibleVals()
    if details.activityBounds:
      nPossibleVals[-1] = len(details.activityBounds)+1
    if hasattr(details,'sigTreeBiasList'):
      biasList = details.sigTreeBiasList
    else:
      biasList=None
    if hasattr(details,'useCMIM'):
      useCMIM=details.useCMIM
    else:
      useCMIM=0
    if hasattr(details,'allowCollections'):
      allowCollections = details.allowCollections
    else:
      allowCollections=False
    composite.Grow(trainExamples,attrs,nPossibleVals=[0]+nPossibleVals,
                   buildDriver=driver,
                   nTries=details.nModels,
                   needsQuantization=0,
                   treeBuilder=builder,
                   maxDepth=details.limitDepth,
                   progressCallback=progressCallback,
                   holdOutFrac=details.internalHoldoutFrac,
                   replacementSelection=details.replacementSelection,
                   recycleVars=details.recycleVars,
                   randomDescriptors=details.randomDescriptors,
                   biasList=biasList,
                   useCMIM=useCMIM,
                   allowCollection=allowCollections,
                   silent=not _verbose)

  elif details.useKNN:
    from ML.KNN import CrossValidate
    from ML.KNN import DistFunctions

    driver = CrossValidate.CrossValidationDriver
    dfunc = ''
    if (details.knnDistFunc == "Euclidean") :
      dfunc = DistFunctions.EuclideanDist
    elif (details.knnDistFunc == "Tanimoto"):
      dfunc = DistFunctions.TanimotoDist
    else:
      assert 0,"Bad KNN distance metric value"


    composite.Grow(trainExamples, attrs, nPossibleVals=[0]+nPossibleVals,
                   buildDriver=driver, nTries=details.nModels,
                   needsQuantization=0,
                   numNeigh=details.knnNeighs,
                   holdOutFrac=details.internalHoldoutFrac,
                   distFunc=dfunc)

  elif details.useNaiveBayes or details.useSigBayes:
    from ML.NaiveBayes import CrossValidate
    driver = CrossValidate.CrossValidationDriver
    if not (hasattr(details,'useSigBayes') and details.useSigBayes):
      composite.Grow(trainExamples, attrs, nPossibleVals=[0]+nPossibleVals,
                     buildDriver=driver, nTries=details.nModels,
                     needsQuantization=0, nQuantBounds=details.qBounds,
                     holdOutFrac=details.internalHoldoutFrac,
                     replacementSelection=details.replacementSelection,
                     mEstimateVal=details.mEstimateVal,
                     silent=not _verbose)
    else:
      if hasattr(details,'useCMIM'):
        useCMIM=details.useCMIM
      else:
        useCMIM=0

      composite.Grow(trainExamples, attrs, nPossibleVals=[0]+nPossibleVals,
                     buildDriver=driver, nTries=details.nModels,
                     needsQuantization=0, nQuantBounds=details.qBounds,
                     mEstimateVal=details.mEstimateVal,
                     useSigs=True,useCMIM=useCMIM,
                     holdOutFrac=details.internalHoldoutFrac,
                     replacementSelection=details.replacementSelection,
                     silent=not _verbose)

  else:
    from ML.Neural import CrossValidate
    driver = CrossValidate.CrossValidationDriver
    composite.Grow(trainExamples,attrs,[0]+nPossibleVals,nTries=details.nModels,
                   buildDriver=driver,needsQuantization=0)

  composite.AverageErrors()
  composite.SortModels()
  modelList,counts,avgErrs = composite.GetAllData()
  counts = array(counts)
  avgErrs = array(avgErrs)
  composite._varNames = data.GetVarNames()

  for i in xrange(len(modelList)):
    modelList[i].NameModel(composite._varNames)


  weightedErrs = counts*avgErrs
  averageErr = sum(weightedErrs)/sum(counts)
  devs = (avgErrs - averageErr)
  devs = devs * counts
  devs = sqrt(devs*devs)
  avgDev = sum(devs)/sum(counts)
  message('# Overall Average Error: %%% 5.2f, Average Deviation: %%% 6.2f'%(100.*averageErr,100.*avgDev))

  if details.bayesModel:
    composite.Train(trainExamples,verbose=0)


  composite.ClearModelExamples()
  if saveIt:
    composite.Pickle(details.outName)
  details.model = DbModule.binaryHolder(cPickle.dumps(composite))

  badExamples = []
  if not details.detailedRes and (not hasattr(details,'noScreen') or not details.noScreen):
    if details.splitRun:
      message('Testing all hold-out examples')
      wrong = testall(composite,testExamples,badExamples)
      message('%d examples (%% %5.2f) were misclassified'%(len(wrong),
                                                          100.*float(len(wrong))/float(len(testExamples))))
      _runDetails.holdout_error = float(len(wrong))/len(testExamples)
    else:
      message('Testing all examples')
      wrong = testall(composite,namedExamples,badExamples)
      message('%d examples (%% %5.2f) were misclassified'%(len(wrong),
                                                          100.*float(len(wrong))/float(len(namedExamples))))
      _runDetails.overall_error = float(len(wrong))/len(namedExamples)

  if details.detailedRes:
    message('\nEntire data set:')
    resTup = ScreenComposite.ShowVoteResults(range(data.GetNPts()),data,composite,
                                             nPossibleVals[-1],details.threshold)
    nGood,nBad,nSkip,avgGood,avgBad,avgSkip,voteTab = resTup
    nPts = len(namedExamples)
    nClass = nGood+nBad
    _runDetails.overall_error = float(nBad) / nClass
    _runDetails.overall_correct_conf = avgGood
    _runDetails.overall_incorrect_conf = avgBad
    _runDetails.overall_result_matrix = repr(voteTab)
    nRej = nClass-nPts
    if nRej > 0:
      _runDetails.overall_fraction_dropped = float(nRej)/nPts

    if details.splitRun:
      message('\nHold-out data:')
      resTup = ScreenComposite.ShowVoteResults(range(len(testExamples)),testExamples,
                                               composite,
                                               nPossibleVals[-1],details.threshold)
      nGood,nBad,nSkip,avgGood,avgBad,avgSkip,voteTab = resTup
      nPts = len(testExamples)
      nClass = nGood+nBad
      _runDetails.holdout_error = float(nBad) / nClass
      _runDetails.holdout_correct_conf = avgGood
      _runDetails.holdout_incorrect_conf = avgBad
      _runDetails.holdout_result_matrix = repr(voteTab)
      nRej = nClass-nPts
      if nRej > 0:
        _runDetails.holdout_fraction_dropped = float(nRej)/nPts


  if details.persistTblName and details.dbName:
    message('Updating results table %s:%s'%(details.dbName,details.persistTblName))
    details.Store(db=details.dbName,table=details.persistTblName)

  if details.badName != '':
    badFile = open(details.badName,'w+')
    for i in xrange(len(badExamples)):
      ex = badExamples[i]
      vote = wrong[i]
      outStr = '%s\t%s\n'%(ex,vote)
      badFile.write(outStr)
    badFile.close()

  composite.ClearModelExamples()
  return composite

def RunIt(details,progressCallback=None,saveIt=1,setDescNames=0):
  """ does the actual work of building a composite model

    **Arguments**

      - details:  a _CompositeRun.CompositeRun_ object containing details
        (options, parameters, etc.) about the run

      - progressCallback: (optional) a function which is called with a single
        argument (the number of models built so far) after each model is built.

      - saveIt: (optional) if this is nonzero, the resulting model will be pickled
        and dumped to the filename specified in _details.outName_

      - setDescNames: (optional) if nonzero, the composite's _SetInputOrder()_ method
        will be called using the results of the data set's _GetVarNames()_ method;
        it is assumed that the details object has a _descNames attribute which
        is passed to the composite's _SetDescriptorNames()_ method.  Otherwise
        (the default), _SetDescriptorNames()_ gets the results of _GetVarNames()_.
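
A minimal sketch of a _progressCallback_.  Both names below are
hypothetical, and the loop is only a stand-in for model building: it
assumes, following the description above, that the callback receives
the running count of models built, starting at 1.

```python
class ProgressRecorder(object):
    # hypothetical callback object: remembers every count it receives
    def __init__(self):
        self.counts = []

    def __call__(self, nModelsBuilt):
        self.counts.append(nModelsBuilt)

def build_models(nModels, progressCallback=None):
    # stand-in for the model-building loop; no real models are built
    for i in range(nModels):
        if progressCallback is not None:
            progressCallback(i + 1)

recorder = ProgressRecorder()
build_models(3, progressCallback=recorder)
```

Any callable taking one argument can be used in the same way, for
example a function that updates a progress bar.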

    **Returns**

      the composite model constructed


  """
  details.rundate = time.asctime()

  fName = details.tableName.strip()
  if details.outName == '':
    details.outName = fName + '.pkl'
  if not details.dbName:
    if details.qBounds != []:
      data = DataUtils.TextFileToData(fName)
    else:
      data = DataUtils.BuildQuantDataSet(fName)
  elif details.useSigTrees or details.useSigBayes:
    details.tableName = fName
    data = details.GetDataSet(pickleCol=0,pickleClass=DataStructs.ExplicitBitVect)
  elif details.qBounds != [] or not details.useTrees:
    details.tableName = fName
    data = details.GetDataSet()
  else:
    data = DataUtils.DBToQuantData(details.dbName,fName,quantName=details.qTableName,
                                   user=details.dbUser,password=details.dbPassword)

  composite = RunOnData(details,data,progressCallback=progressCallback,
                        saveIt=saveIt,setDescNames=setDescNames)
  return composite


def ShowVersion(includeArgs=0):
  """ prints the version number

  """
  print 'This is BuildComposite.py version %s'%(__VERSION_STRING)
  if includeArgs:
    import sys
    print 'command line was:'
    print ' '.join(sys.argv)

def Usage():
  """ provides a list of arguments for when this is used from the command line

  """
  import sys
  print __doc__
  sys.exit(-1)

def SetDefaults(runDetails=None):
  """  initializes a details object with default values

      **Arguments**

        - runDetails:  (optional) a _CompositeRun.CompositeRun_ object.
          If this is not provided, the global _runDetails will be used.

      **Returns**

        the initialized _CompositeRun_ object.


  """
  if runDetails is None: runDetails = _runDetails
  return CompositeRun.SetDefaults(runDetails)

def ParseArgs(runDetails):
  """ parses command line arguments and updates _runDetails_

      **Arguments**

        - runDetails:  a _CompositeRun.CompositeRun_ object.

  """
  import getopt
  args,extra = getopt.getopt(sys.argv[1:],'P:o:n:p:b:sf:F:v:hlgd:rSTt:BQ:q:DVG:N:L:',
                             ['nRuns=','prune','profile',
                              'seed=','noScreen',

                              'modelFiltFrac=', 'modelFiltVal=',

                              'recycle','randomDescriptors=',

                              'doKnn','knnK=','knnTanimoto','knnEuclid',

                              'doSigTree','doCMIM=','allowCollections',

                              'doNaiveBayes', 'mEstimateVal=',
                              'doSigBayes',

                              'replacementSelection',

                              ])
  runDetails.profileIt=0
  for arg,val in args:
    if arg == '-n':
      runDetails.nModels = int(val)
    elif arg == '-N':
      runDetails.note=val
    elif arg == '-o':
      runDetails.outName = val
    elif arg == '-Q':
      qBounds = eval(val)
      assert type(qBounds) in [type([]),type(())],'bad argument type for -Q, specify a list as a string'
      runDetails.activityBounds=qBounds
      runDetails.activityBoundsVals=val
    elif arg == '-p':
      runDetails.persistTblName=val
    elif arg == '-P':
      runDetails.pickleDataFileName= val
    elif arg == '-r':
      runDetails.randomActivities = 1
    elif arg == '-S':
      runDetails.shuffleActivities = 1
    elif arg == '-b':
      runDetails.badName = val
    elif arg == '-B':
      # RunOnData() reads details.bayesModel, so set that attribute
      runDetails.bayesModel=1
    elif arg == '-s':
      runDetails.splitRun = 1
    elif arg == '-f':
      runDetails.splitFrac=float(val)
    elif arg == '-F':
      runDetails.filterFrac=float(val)
    elif arg == '-v':
      runDetails.filterVal=float(val)
    elif arg == '-l':
      runDetails.lockRandom = 1
    elif arg == '-g':
      runDetails.lessGreedy=1
    elif arg == '-G':
      runDetails.startAt = int(val)
    elif arg == '-d':
      runDetails.dbName=val
    elif arg == '-T':
      runDetails.useTrees = 0
    elif arg == '-t':
      runDetails.threshold=float(val)
    elif arg == '-D':
      runDetails.detailedRes = 1
    elif arg == '-L':
      runDetails.limitDepth = int(val)
    elif arg == '-q':
      qBounds = eval(val)
      assert type(qBounds) in [type([]),type(())],'bad argument type for -q, specify a list as a string'
      runDetails.qBoundCount=val
      runDetails.qBounds = qBounds
    elif arg == '-V':
      ShowVersion()
      sys.exit(0)
    elif arg == '--nRuns':
      runDetails.nRuns = int(val)
    elif arg == '--modelFiltFrac':
      runDetails.modelFilterFrac=float(val)
    elif arg == '--modelFiltVal':
      runDetails.modelFilterVal=float(val)
    elif arg == '--prune':
      runDetails.pruneIt=1
    elif arg == '--profile':
      runDetails.profileIt=1

    elif arg == '--recycle':
      runDetails.recycleVars=1
    elif arg == '--randomDescriptors':
      runDetails.randomDescriptors=int(val)

    elif arg == '--doKnn':
      runDetails.useKNN=1
      runDetails.useTrees=0

      runDetails.useNaiveBayes=0
    elif arg == '--knnK':
      runDetails.knnNeighs = int(val)
    elif arg == '--knnTanimoto':
      runDetails.knnDistFunc="Tanimoto"
    elif arg == '--knnEuclid':
      runDetails.knnDistFunc="Euclidean"

    elif arg == '--doSigTree':

      runDetails.useKNN=0
      runDetails.useTrees=0
      runDetails.useNaiveBayes=0
      runDetails.useSigTrees=1
    elif arg == '--doCMIM':
      runDetails.useCMIM=int(val)
    elif arg == '--allowCollections':
      runDetails.allowCollections=True

    elif arg == '--doNaiveBayes':
      runDetails.useNaiveBayes=1

      runDetails.useKNN=0
      runDetails.useTrees=0
      runDetails.useSigBayes=0
    elif arg == '--doSigBayes':
      runDetails.useSigBayes=1
      runDetails.useNaiveBayes=0

      runDetails.useKNN=0
      runDetails.useTrees=0
    elif arg == '--mEstimateVal':
      runDetails.mEstimateVal=float(val)

    elif arg== '--seed':

      runDetails.randomSeed = eval(val)

    elif arg== '--noScreen':
      runDetails.noScreen=1

    elif arg== '--replacementSelection':
      runDetails.replacementSelection = 1

    elif arg == '-h':
      Usage()

    else:
      Usage()
  runDetails.tableName=extra[0]

if __name__ == '__main__':
  if len(sys.argv) < 2:
    Usage()

  _runDetails.cmd = ' '.join(sys.argv)
  SetDefaults(_runDetails)
  ParseArgs(_runDetails)


  ShowVersion(includeArgs=1)

  if _runDetails.nRuns > 1:
    for i in range(_runDetails.nRuns):
      sys.stderr.write('---------------------------------\n\tDoing %d of %d\n---------------------------------\n'%(i+1,_runDetails.nRuns))
      RunIt(_runDetails)
  else:
    if _runDetails.profileIt:
      import hotshot,hotshot.stats
      prof=hotshot.Profile('prof.dat')
      prof.runcall(RunIt,_runDetails)
      stats = hotshot.stats.load('prof.dat')
      stats.strip_dirs()
      stats.sort_stats('time','calls')
      stats.print_stats(30)
    else:
      RunIt(_runDetails)