mlpack_decision_tree(1) User Commands mlpack_decision_tree(1)

mlpack_decision_tree - decision tree


mlpack_decision_tree [-m unknown] [-l string] [-g double] [-n int] [-e bool] [-T string] [-L string] [-t string] [-V bool] [-w string] [-M unknown] [-p string] [-P string] [-h -v]

Train and evaluate using a decision tree. Given a dataset containing numeric or categorical features, and associated labels for each point in the dataset, this program can train a decision tree on that data.

The training set and associated labels are specified with the '--training_file (-t)' and '--labels_file (-l)' parameters, respectively. The labels should be in the range [0, num_classes - 1]. If '--labels_file (-l)' is not specified, the labels are assumed to be the last dimension of the training dataset.
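If the raw class labels are strings or arbitrary integers, they have to be mapped into [0, num_classes - 1] before being written to the labels file. A minimal Python sketch of such a mapping follows; the helper name remap_labels is hypothetical and is not part of mlpack.

```python
def remap_labels(raw_labels):
    """Map arbitrary label values to integers in [0, num_classes - 1].

    Returns the remapped labels and the sorted list of original classes,
    so that predictions can be translated back afterwards.
    """
    classes = sorted(set(raw_labels))
    index = {value: i for i, value in enumerate(classes)}
    return [index[value] for value in raw_labels], classes


if __name__ == "__main__":
    labels, classes = remap_labels(["spam", "ham", "spam", "eggs"])
    print(labels)   # integer labels, one per point
    print(classes)  # classes[i] is the original value for label i
```

The returned class list can be kept alongside the model so that predicted integers can be converted back to the original values.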

When a model is trained, the '--output_model_file (-M)' output parameter may be used to save the trained model. A model may be loaded for predictions with the '--input_model_file (-m)' parameter. The '--input_model_file (-m)' parameter may not be specified when the '--training_file (-t)' parameter is specified. The '--minimum_leaf_size (-n)' parameter specifies the minimum number of training points that must fall into each leaf for it to be split. The '--minimum_gain_split (-g)' parameter specifies the minimum gain that is needed for the node to split. If '--print_training_error (-e)' is specified, the training error will be printed.
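To illustrate how the two thresholds interact, the following Python sketch checks whether a candidate split would be accepted. It is an illustration only and not mlpack's implementation: it assumes Gini impurity as the gain measure (this page does not state which measure is used), it reads the leaf-size rule as "each child must hold at least the minimum number of points", and the names gini and split_allowed are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_allowed(left, right, minimum_leaf_size=20, minimum_gain_split=1e-7):
    """Return True if splitting into `left` and `right` passes both thresholds.

    Assumption: each child needs at least `minimum_leaf_size` points, and the
    impurity reduction must exceed `minimum_gain_split`.
    """
    if len(left) < minimum_leaf_size or len(right) < minimum_leaf_size:
        return False
    n = len(left) + len(right)
    parent_impurity = gini(left + right)
    children_impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (parent_impurity - children_impurity) > minimum_gain_split
```

The default arguments mirror the default values listed for '--minimum_leaf_size (-n)' and '--minimum_gain_split (-g)'.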

Test data may be specified with the '--test_file (-T)' parameter, and if performance numbers are desired for that test set, labels may be specified with the '--test_labels_file (-L)' parameter. Predictions for each test point may be saved via the '--predictions_file (-p)' output parameter. Class probabilities for each prediction may be saved with the '--probabilities_file (-P)' output parameter.
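The saved class probabilities can be post-processed outside of mlpack. The Python sketch below picks the most probable class for each test point; it assumes the probabilities file is a CSV with one row per test point and one column per class, which should be checked against the actual output. The function name most_likely_classes is hypothetical.

```python
import csv


def most_likely_classes(path):
    """Return, for each non-empty CSV row, the index of its largest value.

    Assumption: one row per test point, one column per class.
    Ties are resolved in favour of the lowest class index.
    """
    result = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            values = [float(cell) for cell in row if cell.strip()]
            if values:
                result.append(max(range(len(values)), key=values.__getitem__))
    return result
```

Under that layout assumption, most_likely_classes('probabilities.csv') would be expected to agree with the contents of the predictions file, where 'probabilities.csv' is a placeholder file name.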

For example, to train a decision tree with a minimum leaf size of 20 and a minimum gain of 0.001 on the dataset contained in 'data.arff' with labels 'labels.csv', saving the output model to 'tree.bin' and printing the training error, one could call

$ mlpack_decision_tree --training_file data.arff --labels_file labels.csv --output_model_file tree.bin --minimum_leaf_size 20 --minimum_gain_split 0.001 --print_training_error

Then, to use that model to classify points in 'test_set.arff' and print the test error given the labels 'test_labels.csv', while saving the predictions for each point to 'predictions.csv', one could call

$ mlpack_decision_tree --input_model_file tree.bin --test_file test_set.arff --test_labels_file test_labels.csv --predictions_file predictions.csv
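The saved predictions can also be compared against the true labels outside of the program. The following Python sketch computes the fraction of correctly classified points from two label files such as 'predictions.csv' and 'test_labels.csv'. It assumes each file holds one numeric label per point and flattens all cells, so both a single column and a single row are accepted; the helper names read_labels and accuracy are hypothetical.

```python
import csv


def read_labels(path):
    """Read every numeric cell of a CSV file as an integer label."""
    with open(path, newline="") as f:
        return [int(float(cell))
                for row in csv.reader(f)
                for cell in row
                if cell.strip()]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and label counts differ")
    if not actual:
        raise ValueError("no labels given")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

A call such as accuracy(read_labels('predictions.csv'), read_labels('test_labels.csv')) then yields a value between 0 and 1; the test error is one minus that value.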

--help (-h)  Default help info.
--info  Get help on a specific module or option. Default value ''.
--input_model_file (-m)  Pre-trained decision tree, to be used with test points. Default value ''.
--labels_file (-l)  Training labels. Default value ''.
--minimum_gain_split (-g)  Minimum gain for node splitting. Default value 1e-07.
--minimum_leaf_size (-n)  Minimum number of points in a leaf. Default value 20.
--print_training_error (-e)  Print the training error.
--test_file (-T)  Testing dataset (may be categorical). Default value ''.
--test_labels_file (-L)  Test point labels, if accuracy calculation is desired. Default value ''.
--training_file (-t)  Training dataset (may be categorical). Default value ''.
--verbose (-v)  Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V)  Display the version of mlpack.

--output_model_file (-M)  Output for trained decision tree. Default value ''.
--predictions_file (-p)  Class predictions for each test point. Default value ''.
--probabilities_file (-P)  Class probabilities for each test point. Default value ''.

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.

18 November 2018 mlpack-3.0.4