mlpack_logistic_regression(1) | User Commands | mlpack_logistic_regression(1) |
mlpack_logistic_regression - L2-regularized logistic regression and prediction
mlpack_logistic_regression [-b int] [-d double] [-m unknown] [-l string] [-L double] [-n int] [-O string] [-s double] [-T string] [-e double] [-t string] [-V bool] [-o string] [-M unknown] [-p string] [-h -v]
An implementation of L2-regularized logistic regression using either the L-BFGS optimizer or SGD (stochastic gradient descent). This solves the regression problem

y = 1 / (1 + e^(-X * b))

where y takes values 0 or 1.
This program allows loading a logistic regression model (via the '--input_model_file (-m)' parameter) or training a logistic regression model given training data (specified with the '--training_file (-t)' parameter), or both those things at once. In addition, this program allows classification on a test dataset (specified with the '--test_file (-T)' parameter) and the classification results may be saved with the '--output_file (-o)' output parameter. The trained logistic regression model may be saved using the '--output_model_file (-M)' output parameter.
The training data, if specified, may have class labels as its last dimension. Alternately, the '--labels_file (-l)' parameter may be used to specify a separate matrix of labels.
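For example, if the class labels are already stored as the last dimension of the training data, a model could be trained without a separate labels file; the filename below is only a placeholder:

$ mlpack_logistic_regression --training_file data_with_labels.csv --output_model_file lr_model.bin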
When a model is being trained, there are many options. L2 regularization (to prevent overfitting) can be specified with the '--lambda (-L)' option, and the optimizer used to train the model can be specified with the '--optimizer (-O)' parameter. Available options are 'sgd' (stochastic gradient descent) and 'lbfgs' (the L-BFGS optimizer). There are also various parameters for the optimizer; the '--max_iterations (-n)' parameter specifies the maximum number of allowed iterations, and the '--tolerance (-e)' parameter specifies the tolerance for convergence. For the SGD optimizer, the '--step_size (-s)' parameter controls the step size taken at each iteration by the optimizer. The batch size for SGD is controlled with the '--batch_size (-b)' parameter. If the objective function for your data is oscillating between Inf and 0, the step size is probably too large. There are more parameters for the optimizers, but the C++ interface must be used to access these.
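For instance, a model might be trained with the SGD optimizer, a smaller step size, and a batch size of 32 as follows; the filenames and parameter values are illustrative only:

$ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --optimizer sgd --step_size 0.001 --batch_size 32 --tolerance 1e-7 --output_model_file lr_model.bin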
For SGD, an iteration refers to a single point. So to take a single pass over the dataset with SGD, '--max_iterations (-n)' should be set to the number of points in the dataset.
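For example, if 'data.csv' contained 1000 points (a count chosen only for illustration), a single SGD pass over the data would correspond to:

$ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --optimizer sgd --max_iterations 1000 --output_model_file lr_model.bin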
Optionally, the model can be used to predict the responses for another matrix of data points, if '--test_file (-T)' is specified. The '--test_file (-T)' parameter can be specified without the '--training_file (-t)' parameter, so long as an existing logistic regression model is given with the '--input_model_file (-m)' parameter. The output predictions from the logistic regression model may be saved with the '--output_file (-o)' parameter.
This implementation of logistic regression supports only the two-class case, not the general multi-class case, so any labels must be either 0 or 1. For more than two classes, see the softmax_regression program.
As an example, to train a logistic regression model on the data 'data.csv' with labels 'labels.csv', using L2 regularization of 0.1 and saving the model to 'lr_model.bin', the following command may be used:
$ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --lambda 0.1 --output_model_file lr_model.bin
Then, to use that model to predict classes for the dataset 'test.csv', storing the output predictions in 'predictions.csv', the following command may be used:
$ mlpack_logistic_regression --input_model_file lr_model.bin --test_file test.csv --output_file predictions.csv
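As noted above, training and classification may also be performed in a single invocation; for instance, the following command (using the same illustrative filenames) trains on 'data.csv' and immediately classifies 'test.csv':

$ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --lambda 0.1 --test_file test.csv --output_file predictions.csv --output_model_file lr_model.bin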
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.
18 November 2018 | mlpack-3.0.4 |