NAME

mlpack_decision_tree - decision tree

SYNOPSIS

 mlpack_decision_tree [-m unknown] [-l string] [-g double] [-n int] [-e bool] [-T string] [-L string] [-t string] [-V bool] [-w string] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION

Train and evaluate using a decision tree. Given a dataset containing numeric or categorical features, and associated labels for each point in the dataset, this program can train a decision tree on that data.

The training set and associated labels are specified with the '--training_file (-t)' and '--labels_file (-l)' parameters, respectively. The labels should be in the range [0, num_classes - 1]. If '--labels_file (-l)' is not specified, the labels are assumed to be the last dimension of the training dataset.
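Because the labels must lie in [0, num_classes - 1], it can help to check a labels file before training. The following is a minimal sketch, not part of mlpack: 'check_labels' is a hypothetical helper, and it assumes the labels file holds one non-negative integer label per line.

```shell
# Hypothetical helper (not part of mlpack): verify that every label in a
# one-column labels file is an integer in [0, num_classes - 1].
# Assumes one label per line, written as a plain non-negative integer.
check_labels() {
  # $1: labels file, $2: number of classes
  awk -v k="$2" '
    ($1 !~ /^[0-9]+$/) || (($1 + 0) >= k) { bad++ }   # count offending lines
    END {
      if (bad) { print "invalid"; exit 1 }
      print "ok"
    }
  ' "$1"
}
```

Calling 'check_labels labels.csv 3' prints 'ok' and returns success if every label is 0, 1, or 2; otherwise it prints 'invalid' and returns a non-zero status.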

When a model is trained, the '--output_model_file (-M)' output parameter may be used to save the trained model. A model may be loaded for predictions with the '--input_model_file (-m)' parameter. The '--input_model_file (-m)' parameter may not be specified when the '--training_file (-t)' parameter is specified.

The '--minimum_leaf_size (-n)' parameter specifies the minimum number of training points that must fall into each leaf for it to be split. The '--minimum_gain_split (-g)' parameter specifies the minimum gain that is needed for the node to split. If '--print_training_error (-e)' is specified, the training error will be printed.

Test data may be specified with the '--test_file (-T)' parameter, and if performance numbers are desired for that test set, labels may be specified with the '--test_labels_file (-L)' parameter. Predictions for each test point may be saved via the '--predictions_file (-p)' output parameter. Class probabilities for each prediction may be saved with the '--probabilities_file (-P)' output parameter.
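The program reports performance itself when '--test_labels_file (-L)' is given. To re-check a saved predictions file against the true labels afterwards, something like the following sketch may be used. 'count_correct' is a hypothetical helper, not part of mlpack, and it assumes both files hold one numeric value per line in the same point order.

```shell
# Hypothetical helper (not part of mlpack): count how many saved predictions
# match the true labels. Assumes both files contain one value per line and
# list the points in the same order.
count_correct() {
  # $1: predictions file, $2: true labels file
  paste -d, "$1" "$2" | awk -F, '
    { n++; if (($1 + 0) == ($2 + 0)) hit++ }     # compare numerically
    END { printf "%d of %d correct\n", hit, n }
  '
}
```

For instance, 'count_correct predictions.csv test_labels.csv' prints a line of the form 'X of Y correct'.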

For example, to train a decision tree with a minimum leaf size of 20 and a minimum gain of 0.001 on the dataset contained in 'data.arff' with labels 'labels.csv', saving the output model to 'tree.bin' and printing the training error, one could call

$ mlpack_decision_tree --training_file data.arff --labels_file labels.csv --output_model_file tree.bin --minimum_leaf_size 20 --minimum_gain_split 0.001 --print_training_error

Then, to use that model to classify the points in 'test_set.arff' and print the test error given the labels 'test_labels.csv', while saving the predictions for each point to 'predictions.csv', one could call

$ mlpack_decision_tree --input_model_file tree.bin --test_file test_set.arff --test_labels_file test_labels.csv --predictions_file predictions.csv

OPTIONAL INPUT OPTIONS

--help (-h) [bool]: Default help info.
--info [string]: Get help on a specific module or option. Default value ''.
--input_model_file (-m) [unknown]: Pre-trained decision tree, to be used with test points. Default value ''.
--labels_file (-l) [string]: Training labels. Default value ''.
--minimum_gain_split (-g) [double]: Minimum gain for node splitting. Default value 1e-07.
--minimum_leaf_size (-n) [int]: Minimum number of points in a leaf. Default value 20.
--print_training_error (-e) [bool]: Print the training error.
--test_file (-T) [string]: Testing dataset (may be categorical). Default value ''.
--test_labels_file (-L) [string]: Test point labels, if accuracy calculation is desired. Default value ''.
--training_file (-t) [string]: Training dataset (may be categorical). Default value ''.
--verbose (-v) [bool]: Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) [bool]: Display the version of mlpack.
--weights_file (-w) [string]: The weight of labels. Default value ''.

OPTIONAL OUTPUT OPTIONS

--output_model_file (-M) [unknown]: Output for trained decision tree. Default value ''.
--predictions_file (-p) [string]: Class predictions for each test point. Default value ''.
--probabilities_file (-P) [string]: Class probabilities for each test point. Default value ''.
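
A file saved with '--probabilities_file (-P)' can be turned back into class labels by picking the most probable class for each point. The sketch below assumes the file is comma-separated with one test point per line and one column per class, ordered from class 0 upward; that layout is an assumption and should be confirmed against an actual output file. 'argmax_class' is a hypothetical helper, not part of mlpack, and on ties it picks the lowest class index, which may differ from what the program does.

```shell
# Hypothetical helper (not part of mlpack): print the index of the most
# probable class for each point in a probabilities file.
# Assumes comma-separated values, one point per line, one column per class,
# with the first column corresponding to class 0.
argmax_class() {
  # $1: probabilities file
  awk -F, '
    {
      best = 1                                   # field index of the largest value so far
      for (i = 2; i <= NF; i++)
        if (($i + 0) > ($best + 0)) best = i     # strict ">" keeps the first maximum
      print best - 1                             # convert field index to class label
    }
  ' "$1"
}
```

For example, 'argmax_class probabilities.csv' prints one class index per line, which can then be compared with the contents of the predictions file.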

ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.

18 November 2018

mlpack-3.0.4