NAME

mlpack_krann - k-rank-approximate-nearest-neighbors (krann)

SYNOPSIS



 mlpack_krann [-a double] [-X bool] [-m unknown] [-k int] [-l int] [-N bool] [-q string] [-R bool] [-r string] [-L bool] [-s int] [-S bool] [-z int] [-T double] [-t string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]

DESCRIPTION

This program will calculate the k rank-approximate-nearest-neighbors of a set of points. You may specify a separate set of reference points and query points, or just a reference set which will be used as both the reference and query set. You must specify the rank approximation as a percentage (--tau); you may optionally specify the success probability (--alpha).

For example, the following will return 5 neighbors from the top 0.1% of the data (with probability 0.95) for each point in 'input.csv' and store the distances in 'distances.csv' and the neighbors in 'neighbors.csv':

$ mlpack_krann --reference_file input.csv --k 5 --distances_file distances.csv --neighbors_file neighbors.csv --tau 0.1

Note that tau must be set such that the number of points in the corresponding percentile of the data is greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000 points and k = 5, then the top 0.1% of the data contains only 1 point, and we are attempting to choose 5 nearest neighbors out of the closest 1 point -- this is invalid and the program will terminate with an error message.
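
The constraint on tau can be checked by hand before running the program. The following is a minimal Python sketch of that arithmetic, not mlpack's own validation code: the helper names points_in_percentile and tau_is_valid are hypothetical, and the sketch assumes the percentile size is simply tau percent of the number of points, without any rounding that mlpack may apply internally.

```python
def points_in_percentile(num_points, tau):
    """Size of the top tau percent of the data (may be fractional)."""
    return num_points * tau / 100.0


def tau_is_valid(num_points, k, tau):
    """Apply the rule stated above: the percentile must hold more than k points."""
    return points_in_percentile(num_points, tau) > k


# The invalid case from the text: 1000 points, k = 5, tau = 0.1.
print(tau_is_valid(1000, 5, 0.1))
# The same dataset with the default tau of 5, i.e. the top 50 points.
print(tau_is_valid(1000, 5, 5.0))
```

With the default tau of 5, a 1000-point dataset leaves 50 candidate points, which comfortably exceeds k = 5.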

The output matrices are organized such that the entry at row i and column j in the neighbors output file is the index of the point in the reference set which is the i'th nearest neighbor of the point in the query set with index j. The entry at row i and column j in the distances output file is the distance between those two points.
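
To illustrate the layout described above, here is a small Python sketch that looks up the neighbors of one query point. The matrices are invented for illustration (k = 2, three query points), and the helper neighbors_of_query is hypothetical; the sketch assumes the matrices have already been loaded into memory with the row and column orientation given in this description.

```python
# Hypothetical outputs for k = 2 neighbors and 3 query points.
# Row i holds the i'th nearest neighbor; column j is the query point index.
neighbors = [
    [4, 0, 7],
    [2, 5, 1],
]
distances = [
    [0.5, 0.2, 1.1],
    [0.9, 0.4, 1.3],
]


def neighbors_of_query(neighbors, distances, j):
    """Return (reference_index, distance) pairs for query j, nearest first."""
    return [(neighbors[i][j], distances[i][j]) for i in range(len(neighbors))]


# Neighbors of the query point with index 1.
print(neighbors_of_query(neighbors, distances, 1))
```

Reading down a column gives all k neighbors of a single query point, while reading along a row gives the neighbor of the same rank for every query point.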

OPTIONAL INPUT OPTIONS

--alpha (-a) [double]: The desired success probability. Default value 0.95.
--first_leaf_exact (-X) [bool]: The flag to trigger sampling only after exactly exploring the first leaf.
--help (-h) [bool]: Default help info.
--info [string]: Get help on a specific module or option. Default value ''.
--input_model_file (-m) [unknown]: Pre-trained kNN model. Default value ''.
--k (-k) [int]: Number of nearest neighbors to find. Default value 0.
--leaf_size (-l) [int]: Leaf size for tree building (used for kd-trees, UB trees, R trees, R* trees, X trees, Hilbert R trees, R+ trees, R++ trees, and octrees). Default value 20.
--naive (-N) [bool]: If true, sampling will be done without using a tree.
--query_file (-q) [string]: Matrix containing query points (optional). Default value ''.
--random_basis (-R) [bool]: Before tree-building, project the data onto a random orthogonal basis.
--reference_file (-r) [string]: Matrix containing the reference dataset. Default value ''.
--sample_at_leaves (-L) [bool]: The flag to trigger sampling at leaves.
--seed (-s) [int]: Random seed (if 0, std::time(NULL) is used). Default value 0.
--single_mode (-S) [bool]: If true, single-tree search is used (as opposed to dual-tree search).
--single_sample_limit (-z) [int]: The limit on the maximum number of samples (and hence the largest node you can approximate). Default value 20.
--tau (-T) [double]: The allowed rank-error in terms of the percentile of the data. Default value 5.
--tree_type (-t) [string]: Type of tree to use: 'kd', 'ub', 'cover', 'r', 'x', 'r-star', 'hilbert-r', 'r-plus', 'r-plus-plus', 'oct'. Default value 'kd'.
--verbose (-v) [bool]: Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) [bool]: Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

--distances_file (-d) [string]: Matrix to output distances into. Default value ''.
--neighbors_file (-n) [string]: Matrix to output neighbors into. Default value ''.
--output_model_file (-M) [unknown]: If specified, the kNN model will be output here. Default value ''.

ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.

18 November 2018

mlpack-3.0.4