NAME

tpot - Automated Machine Learning tool

DESCRIPTION

usage: tpot [-h] [-is INPUT_SEPARATOR] [-target TARGET_NAME] [-mode {classification,regression}] [-o OUTPUT_FILE] [-g GENERATIONS] [-p POPULATION_SIZE] [-os OFFSPRING_SIZE] [-mr MUTATION_RATE] [-xr CROSSOVER_RATE] [-scoring SCORING_FN] [-cv NUM_CV_FOLDS] [-sub SUBSAMPLE] [-njobs NUM_JOBS] [-maxtime MAX_TIME_MINS] [-maxeval MAX_EVAL_MINS] [-s RANDOM_STATE] [-config CONFIG_FILE] [-template TEMPLATE] [-memory MEMORY] [-cf CHECKPOINT_FOLDER] [-es EARLY_STOP] [-v {0,1,2,3}] [-log LOG] [--version] INPUT_FILE

A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.

positional arguments:

INPUT_FILE: Data file to use in the TPOT optimization process. Ensure that the class label column is labeled as "class".

optional arguments:

-h, --help: Show this help message and exit.
-is INPUT_SEPARATOR: Character used to separate columns in the input file.
-target TARGET_NAME: Name of the target column in the input file.
-mode {classification,regression}: Whether TPOT is being used for a supervised classification or regression problem.
-o OUTPUT_FILE: File to export the code for the final optimized pipeline.
-g GENERATIONS: Number of iterations to run the pipeline optimization process. It must be a positive number or None. If None, the parameter max_time_mins must be defined as the runtime limit. Generally, TPOT will work better when you give it more generations (and therefore time) to optimize the pipeline. TPOT will evaluate POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE pipelines in total.
-p POPULATION_SIZE: Number of individuals to retain in the GP population every generation. Generally, TPOT will work better when you give it more individuals (and therefore time) to optimize the pipeline. TPOT will evaluate POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE pipelines in total.
-os OFFSPRING_SIZE: Number of offspring to produce in each GP generation. By default, OFFSPRING_SIZE = POPULATION_SIZE.
-mr MUTATION_RATE: GP mutation rate in the range [0.0, 1.0]. This tells the GP algorithm how many pipelines to apply random changes to every generation. We recommend using the default parameter unless you understand how the mutation rate affects GP algorithms.
-xr CROSSOVER_RATE: GP crossover rate in the range [0.0, 1.0]. This tells the GP algorithm how many pipelines to "breed" every generation. We recommend using the default parameter unless you understand how the crossover rate affects GP algorithms.
-scoring SCORING_FN: Function used to evaluate the quality of a given pipeline for the problem. By default, accuracy is used for classification problems and mean squared error (mse) is used for regression problems. Note: If you wrote your own function, set this argument to mymodule.myfunction and TPOT will import your module and take the function from there. TPOT will assume the module can be imported from the current workdir. TPOT assumes that any function with "error" or "loss" in the name is meant to be minimized, whereas any other functions will be maximized. Offers the same options as cross_val_score: accuracy, adjusted_rand_score, average_precision, f1, f1_macro, f1_micro, f1_samples, f1_weighted, neg_log_loss, neg_mean_absolute_error, neg_mean_squared_error, neg_median_absolute_error, precision, precision_macro, precision_micro, precision_samples, precision_weighted, r2, recall, recall_macro, recall_micro, recall_samples, recall_weighted, roc_auc.
-cv NUM_CV_FOLDS: Number of folds to evaluate each pipeline over in stratified k-fold cross-validation during the TPOT optimization process.
-sub SUBSAMPLE: Subsample ratio of the training instances. Setting it to 0.5 means that TPOT will use a random subsample of half of the training data for the pipeline optimization process.
-njobs NUM_JOBS: Number of CPUs for evaluating pipelines in parallel during the TPOT optimization process. Assigning this to -1 will use as many cores as available on the computer. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
-maxtime MAX_TIME_MINS: How many minutes TPOT has to optimize the pipeline. If not None, this setting will allow TPOT to run until max_time_mins minutes have elapsed and then stop. TPOT will stop earlier if generations is set and all generations are already evaluated.
-maxeval MAX_EVAL_MINS: How many minutes TPOT has to evaluate a single pipeline. Setting this parameter to higher values will allow TPOT to explore more complex pipelines but will also allow TPOT to run longer.
-s RANDOM_STATE: Random number generator seed for reproducibility. Set this seed if you want your TPOT run to be reproducible with the same seed and data set in the future.
-config CONFIG_FILE: Configuration file for customizing the operators and parameters that TPOT uses in the optimization process. Must be a Python module containing a dict export named "tpot_config" or the name of a built-in configuration.
-template TEMPLATE: Template of a predefined pipeline structure. This option specifies a desired structure for the machine learning pipelines evaluated in TPOT. So far this option only supports a linear pipeline structure. Each step in the pipeline should be a main class of operators (Selector, Transformer, Classifier or Regressor) or a specific operator (e.g. SelectPercentile) defined in the TPOT operator configuration. If a step is a main class, TPOT will randomly assign all subclass operators (subclasses of SelectorMixin, TransformerMixin, ClassifierMixin or RegressorMixin in scikit-learn) to that step. Steps in the template are delimited by "-", e.g. "SelectPercentile-Transformer-Classifier". By default the value of template is None, and TPOT generates tree-based pipelines randomly.
-memory MEMORY: Path of a directory for pipeline caching or "auto" for using a temporary caching directory during the optimization process. If supplied, pipelines will cache each transformer after fitting it. This feature avoids repeated computation by transformers within a pipeline when the parameters and input data are identical to those of another fitted pipeline during the optimization process.
-cf CHECKPOINT_FOLDER: If supplied, a folder in which TPOT will periodically save the best pipeline so far while optimizing. This is useful in multiple cases: sudden death before TPOT could save an optimized pipeline, progress tracking, grabbing a pipeline while it's still optimizing, etc.
-es EARLY_STOP: Number of consecutive generations without improvement that TPOT tolerates. The optimization process ends if there is no improvement within the set number of generations.
-v {0,1,2,3}: How much information TPOT communicates while it is running: 0 = none, 1 = minimal, 2 = high, 3 = all. A setting of 2 or higher will add a progress bar during the optimization procedure.
-log LOG: Save progress content to a file.
--version: Show the TPOT version number and exit.
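
The arithmetic in the -g, -os, -njobs and -template descriptions above can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not TPOT's own code: the function names are hypothetical, and the handling of a positive NUM_JOBS (returned unchanged) is an assumption.

```python
import os


def total_pipelines_evaluated(population_size, generations, offspring_size=None):
    """POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE, as stated for -g and -p."""
    if offspring_size is None:
        # Per -os: by default OFFSPRING_SIZE = POPULATION_SIZE.
        offspring_size = population_size
    return population_size + generations * offspring_size


def effective_cpu_count(n_jobs, n_cpus=None):
    """CPUs used for a given -njobs value, following the rule given for -njobs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, i.e. n_cpus + 1 + n_jobs.
        return n_cpus + 1 + n_jobs
    # Assumption: a positive value is used as given.
    return n_jobs


def split_template(template):
    """Split a -template string into its steps, which are delimited by '-'."""
    return template.split("-")


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    print(total_pipelines_evaluated(population_size=50, generations=10, offspring_size=20))
    print(effective_cpu_count(-2, n_cpus=8))
    print(split_template("SelectPercentile-Transformer-Classifier"))
```

Estimating the total number of evaluated pipelines this way can help when choosing GENERATIONS and POPULATION_SIZE against a MAX_TIME_MINS budget.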

January 2021

tpot 0.11.7+dfsg