mlpack_knn - k-nearest-neighbors search
mlpack_knn [-a string] [-e double] [-m unknown] [-k int] [-l int] [-q string] [-R bool] [-r string] [-b double] [-s int] [-u double] [-t string] [-D string] [-T string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]
This program will calculate the k-nearest-neighbors of a set of
points using kd-trees or cover trees (cover tree support is experimental and
may be slow). You may specify a separate set of reference points and query
points, or just a reference set which will be used as both the reference and
query set.
For example, the following command will calculate the 5 nearest
neighbors of each point in 'input.csv' and store the distances in
'distances.csv' and the neighbors in 'neighbors.csv':
$ mlpack_knn --k 5 --reference_file input.csv --neighbors_file neighbors.csv --distances_file distances.csv
The output is organized such that row i and column j in the
neighbors output matrix holds the index of the point in the
reference set which is the j'th nearest neighbor of the point in the query
set with index i. Row i and column j in the distances output matrix
holds the distance between those two points.
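The layout of the two output matrices can be illustrated with a small brute-force search. This is only a sketch, not how mlpack_knn is implemented: the function name knn_brute_force and the sample points are hypothetical, Euclidean distance is assumed, and the sketch stores both matrices with one row per query point.

```python
import math

def knn_brute_force(reference, query, k):
    """Return (neighbors, distances), each with one row per query point.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    neighbors, distances = [], []
    for q in query:
        # Distance from this query point to every reference point,
        # sorted so that the closest reference point comes first.
        scored = sorted((math.dist(q, r), idx) for idx, r in enumerate(reference))
        nearest = scored[:k]
        # Column j of this row describes the j'th nearest neighbor.
        neighbors.append([idx for _, idx in nearest])
        distances.append([d for d, _ in nearest])
    return neighbors, distances

# Made-up sample data: four reference points and two query points.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = [(0.1, 0.0), (4.0, 4.0)]
neighbors, distances = knn_brute_force(reference, query, k=2)
```

Here neighbors[i][j] is the index of the reference point that is the j'th nearest neighbor of query point i, and distances[i][j] is the distance between that pair of points.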
- --algorithm (-a) [string]
  Type of neighbor search: 'naive', 'single_tree', 'dual_tree', 'greedy'. Default value 'dual_tree'.
- --epsilon (-e) [double]
  If specified, will do approximate nearest neighbor search with given relative error. Default value 0.
- --help (-h) [bool]
  Default help info.
- --info [string]
  Print help on a specific option. Default value ''.
- --input_model_file (-m) [unknown]
  Pre-trained kNN model.
- --k (-k) [int]
  Number of nearest neighbors to find. Default value 0.
- --leaf_size (-l) [int]
  Leaf size for tree building (used for kd-trees, vp trees, random projection trees, UB trees, R trees, R* trees, X trees, Hilbert R trees, R+ trees, R++ trees, spill trees, and octrees). Default value 20.
- --query_file (-q) [string]
  Matrix containing query points (optional).
- --random_basis (-R) [bool]
  Before tree-building, project the data onto a random orthogonal basis.
- --reference_file (-r) [string]
  Matrix containing the reference dataset.
- --rho (-b) [double]
  Balance threshold (only valid for spill trees). Default value 0.7.
- --seed (-s) [int]
  Random seed (if 0, std::time(NULL) is used). Default value 0.
- --tau (-u) [double]
  Overlapping size (only valid for spill trees). Default value 0.
- --tree_type (-t) [string]
  Type of tree to use: 'kd', 'vp', 'rp', 'max-rp', 'ub', 'cover', 'r', 'r-star', 'x', 'ball', 'hilbert-r', 'r-plus', 'r-plus-plus', 'spill', 'oct'. Default value 'kd'.
- --true_distances_file (-D) [string]
  Matrix of true distances to compute the effective error (average relative error) (it is printed when -v is specified).
- --true_neighbors_file (-T) [string]
  Matrix of true neighbors to compute the recall (it is printed when -v is specified).
- --verbose (-v) [bool]
  Display informational messages and the full list of parameters and timers at the end of execution.
- --version (-V) [bool]
  Display the version of mlpack.
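The quantities behind --epsilon, --true_neighbors_file and --true_distances_file can be sketched as follows. The definitions below are common textbook ones and are assumptions here; mlpack's exact formulas may differ. The function names and sample matrices are hypothetical, and every matrix is assumed to hold one row per query point.

```python
def within_relative_error(approx_dist, true_dist, epsilon):
    # Assumed meaning of "relative error": an approximate neighbor is
    # acceptable if it is at most (1 + epsilon) times farther away
    # than the true neighbor.
    return approx_dist <= (1.0 + epsilon) * true_dist

def recall(found_neighbors, true_neighbors):
    # Assumed definition: fraction of the true neighbors that also
    # appear in the found neighbors, counted per query point.
    hits = total = 0
    for found_row, true_row in zip(found_neighbors, true_neighbors):
        hits += len(set(found_row) & set(true_row))
        total += len(true_row)
    return hits / total

def effective_error(found_distances, true_distances):
    # Average relative error over all entries whose true distance is
    # non-zero (zero distances are skipped to avoid division by zero).
    errors = []
    for found_row, true_row in zip(found_distances, true_distances):
        for found, true in zip(found_row, true_row):
            if true > 0:
                errors.append(abs(found - true) / true)
    return sum(errors) / len(errors) if errors else 0.0

# Made-up results for two query points with k = 2.
true_neighbors = [[0, 1], [3, 2]]
found_neighbors = [[0, 2], [3, 2]]
true_distances = [[1.0, 2.0], [1.0, 4.0]]
found_distances = [[1.0, 2.2], [1.0, 4.0]]

sample_recall = recall(found_neighbors, true_neighbors)
sample_error = effective_error(found_distances, true_distances)
```

With these sample matrices, three of the four true neighbors are recovered, and only one of the four distances differs from its true value.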
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or included
with your distribution of mlpack.