mlpack_approx_kfn(1) | User Commands | mlpack_approx_kfn(1) |
mlpack_approx_kfn - approximate furthest neighbor search
mlpack_approx_kfn [-a string] [-e bool] [-x string] [-m unknown] [-k int] [-p int] [-t int] [-q string] [-r string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]
This program implements two strategies for approximate furthest neighbor search: 'ds' (DrusillaSelect) and 'qdafn'.
These two strategies give approximate results for the furthest neighbor search problem and can be used as fast replacements for other furthest neighbor techniques such as those found in the mlpack_kfn program. Note that typically, the 'ds' algorithm requires far fewer tables and projections than the 'qdafn' algorithm.
Specify a reference set (the set to search in) with '--reference_file (-r)' and a query set with '--query_file (-q)'. Algorithm parameters can be set with '--num_tables (-t)' and '--num_projections (-p)'; if they are omitted, defaults are used. The algorithm to be used (either 'ds', the default, or 'qdafn') may be specified with '--algorithm (-a)'. Also specify the number of neighbors to search for with '--k (-k)'.
Note that for 'qdafn' in lower dimensions, '--num_projections (-p)' may need to be set to a high value in order to return results for each query point.
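As a rough sketch of the note above, the following call selects 'qdafn' and raises both the table and projection counts. The values 10 and 40 are arbitrary illustrations rather than recommended settings, the file names are placeholders, and the guard only skips the call when the program or the input file is absent.

```shell
# Illustrative values only; tune them for your own data.
NUM_TABLES=10
NUM_PROJECTIONS=40

# Run the search only if the program and the (placeholder) input file exist.
if command -v mlpack_approx_kfn >/dev/null 2>&1 && [ -f reference_set.csv ]; then
  mlpack_approx_kfn --reference_file reference_set.csv \
      --k 5 --algorithm qdafn \
      --num_tables "$NUM_TABLES" --num_projections "$NUM_PROJECTIONS" \
      --neighbors_file neighbors.csv --distances_file distances.csv
else
  echo "skipped: mlpack_approx_kfn or reference_set.csv not available" >&2
fi
```

If some query points still receive no results, increasing NUM_PROJECTIONS further is the adjustment the note suggests.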
If no query set is specified, the reference set will be used as the query set. The '--output_model_file (-M)' output parameter may be used to store the built model, and an input model may be loaded instead of specifying a reference set with the '--input_model_file (-m)' option.
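A minimal sketch of storing the built model during a search, so that it can later be loaded with '--input_model_file (-m)'. The file names are placeholders, and the guard only skips the call when the program or the input file is not present.

```shell
# Placeholder file names; substitute your own.
REFERENCE=reference_set.csv
MODEL=model.bin

# No query set is given, so the reference set is also used as the query set.
if command -v mlpack_approx_kfn >/dev/null 2>&1 && [ -f "$REFERENCE" ]; then
  mlpack_approx_kfn --reference_file "$REFERENCE" --k 5 \
      --neighbors_file neighbors.csv \
      --output_model_file "$MODEL"
else
  echo "skipped: mlpack_approx_kfn or $REFERENCE not available" >&2
fi
```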
Results for each query point can be stored with the '--neighbors_file (-n)' and '--distances_file (-d)' output parameters. Each row of these output matrices holds the k neighbor indices or the k distances for one query point.
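One way to sanity-check that layout is to confirm that every row of an output file has exactly k comma-separated values. The helper below, check_columns, is a hypothetical convenience and not part of mlpack; it assumes the output was saved as CSV.

```shell
# check_columns FILE K
# Succeeds only if every row of FILE has exactly K comma-separated fields.
# Hypothetical helper for illustration; not provided by mlpack.
check_columns() {
  awk -F, -v k="$2" 'NF != k { bad = 1 } END { exit bad + 0 }' "$1"
}

# Example use on a result file produced with --k 5, if one is present.
if [ -f neighbors.csv ]; then
  if check_columns neighbors.csv 5; then
    echo "neighbors.csv: 5 values per row"
  else
    echo "neighbors.csv: unexpected number of values in some row" >&2
  fi
fi
```

The number of rows (for example, from 'wc -l') should then match the number of query points.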
For example, to find the 5 approximate furthest neighbors with 'reference_set.csv' as the reference set and 'query_set.csv' as the query set using DrusillaSelect, storing the furthest neighbor indices to 'neighbors.csv' and the furthest neighbor distances to 'distances.csv', one could call
$ mlpack_approx_kfn --query_file query_set.csv --reference_file reference_set.csv --k 5 --algorithm ds --neighbors_file neighbors.csv --distances_file distances.csv
and to perform approximate all-furthest-neighbors search with k=1 on the set 'data.csv', storing only the furthest neighbor distances to 'distances.csv', one could call
$ mlpack_approx_kfn --reference_file data.csv --k 1 --distances_file distances.csv
A trained model can be re-used. If a model has been previously saved to 'model.bin', then we may find 3 approximate furthest neighbors on a query set 'new_query_set.csv' using that model and store the furthest neighbor indices into 'neighbors.csv' by calling
$ mlpack_approx_kfn --input_model_file model.bin --query_file new_query_set.csv --k 3 --neighbors_file neighbors.csv
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.
12 December 2020 | mlpack-3.4.2 |