mlpack_linear_svm(1) | User Commands | mlpack_linear_svm(1) |
mlpack_linear_svm - linear SVM is an L2-regularized support vector machine.
mlpack_linear_svm [-d double] [-E int] [-m unknown] [-l string] [-r double] [-n int] [-N bool] [-c int] [-O string] [-s int] [-S bool] [-a double] [-T string] [-L string] [-e double] [-t string] [-V bool] [-M unknown] [-P string] [-p string] [-h -v]
An implementation of linear SVMs that uses either L-BFGS or parallel SGD (stochastic gradient descent) to train the model.
This program allows loading a linear SVM model (via the '--input_model_file (-m)' parameter) or training a linear SVM model given training data (specified with the '--training_file (-t)' parameter), or both at once. In addition, this program allows classification on a test dataset (specified with the '--test_file (-T)' parameter), and the classification results may be saved with the '--predictions_file (-P)' output parameter. The trained linear SVM model may be saved using the '--output_model_file (-M)' output parameter.
The training data, if specified, may have class labels as its last dimension. Alternately, the '--labels_file (-l)' parameter may be used to specify a separate vector of labels.
When a model is being trained, there are many options. L2 regularization (to prevent overfitting) can be specified with the '--lambda (-r)' option, and the number of classes can be manually specified with the '--num_classes (-c)' option. If an intercept term is not desired in the model, the '--no_intercept (-N)' parameter can be specified. The margin of difference between the correct class and the other classes can be specified with the '--delta (-d)' option.
The optimizer used to train the model can be specified with the '--optimizer (-O)' parameter. Available options are 'psgd' (parallel stochastic gradient descent) and 'lbfgs' (the L-BFGS optimizer). There are also various parameters for the optimizer: the '--max_iterations (-n)' parameter specifies the maximum number of allowed iterations, and the '--tolerance (-e)' parameter specifies the tolerance for convergence. For the parallel SGD optimizer, the '--step_size (-a)' parameter controls the step size taken at each iteration, and the '--epochs (-E)' parameter sets the maximum number of epochs. If the objective function for your data is oscillating between Inf and 0, the step size is probably too large. There are more parameters for the optimizers, but the C++ interface must be used to access these.
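As a rough sketch of the optimizer options described above, the following script selects the parallel SGD optimizer and sets its step size and epoch limit. The file names and the numeric values (0.01, 50, 0.1) are illustrative assumptions, not recommended settings. The script only prints the command when mlpack or the input files are not present.

```shell
#!/bin/sh
# Sketch: train a linear SVM with the parallel SGD optimizer.
# File names are placeholders; numeric values are illustrative, not tuned.
set -- \
  --training_file data.csv \
  --labels_file labels.csv \
  --optimizer psgd \
  --step_size 0.01 \
  --epochs 50 \
  --lambda 0.1 \
  --output_model_file lsvm_psgd_model.bin

# Keep the full command line in a variable so it can be shown or logged.
cmd="mlpack_linear_svm $*"

if command -v mlpack_linear_svm >/dev/null 2>&1 \
   && [ -f data.csv ] && [ -f labels.csv ]; then
  mlpack_linear_svm "$@"
else
  # mlpack or the input files are not available here; show the command only.
  echo "$cmd"
fi
```

If the objective oscillates between Inf and 0 with settings like these, lowering the value passed to '--step_size (-a)' is the first thing to try.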
Optionally, the model can be used to predict the labels for another matrix of data points, if '--test_file (-T)' is specified. The '--test_file (-T)' parameter can be specified without the '--training_file (-t)' parameter, so long as an existing linear SVM model is given with the '--input_model_file (-m)' parameter. The output predictions from the linear SVM model may be saved with the '--predictions_file (-P)' parameter.
As an example, to train a linear SVM on the data ''data.csv'' with labels ''labels.csv'' with L2 regularization of 0.1, saving the model to ''lsvm_model.bin'', the following command may be used:
$ mlpack_linear_svm --training_file data.csv --labels_file labels.csv --lambda 0.1 --delta 1 --num_classes 0 --output_model_file lsvm_model.bin
Then, to use that model to predict classes for the dataset ''test.csv'', storing the output predictions in ''predictions.csv'', the following command may be used:
$ mlpack_linear_svm --input_model_file lsvm_model.bin --test_file test.csv --predictions_file predictions.csv
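Since '--test_file (-T)' may be given together with '--training_file (-t)', the two steps above can plausibly be combined into a single invocation. The sketch below assumes the same placeholder file names as the examples above and only prints the command when mlpack or the input files are not present.

```shell
#!/bin/sh
# Sketch: train, save the model, and classify a test set in one run.
# All file names are placeholders carried over from the examples above.
set -- \
  --training_file data.csv \
  --labels_file labels.csv \
  --lambda 0.1 \
  --test_file test.csv \
  --predictions_file predictions.csv \
  --output_model_file lsvm_model.bin

# Keep the full command line in a variable so it can be shown or logged.
cmd="mlpack_linear_svm $*"

if command -v mlpack_linear_svm >/dev/null 2>&1 \
   && [ -f data.csv ] && [ -f labels.csv ] && [ -f test.csv ]; then
  mlpack_linear_svm "$@"
else
  # mlpack or the input files are not available here; show the command only.
  echo "$cmd"
fi
```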
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.
12 December 2020 | mlpack-3.4.2 |