SIMSTRING(1) General Commands Manual SIMSTRING(1)

simstring - build database and find similar words

simstring [OPTIONS]

This utility reads query strings from STDIN and retrieves the strings in the database (DB) whose similarity to each query, under the similarity measure (SIM), is no smaller than the threshold (TH). When the -b (--build) option is specified, it instead builds a database (DB) from the strings read from STDIN.

This program follows the usual GNU command line syntax, with long options starting with two dashes (`--'). A summary of options is included below. For a complete description, see the Info files.

build a database for strings read from STDIN
specify a database file
use Unicode (wchar_t) for representing characters
specify the unit of n-grams (DEFAULT=3)
include marks for begins and ends of strings
specify a similarity measure (DEFAULT='cosine'):

    exact    exact match
    dice     dice coefficient
    cosine   cosine coefficient
    jaccard  jaccard coefficient
    overlap  overlap coefficient
specify the threshold (DEFAULT=0.7)
echo back query strings to the output
suppress supplemental information from the output
show benchmark result (retrieved strings are suppressed)
show this version information and exit
show summary of options and exit
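
To illustrate how the n-gram unit, the begin/end marks, the similarity measure, and the threshold described above interact, here is a minimal Python sketch. It is not simstring's implementation: the helper names (ngrams, similarity, search) and the pad character are hypothetical, n-grams are treated as plain sets, and the search is a brute-force scan rather than an indexed lookup. The coefficients follow the standard textbook definitions for sets; the tool's own feature extraction may differ in details such as repeated n-grams or very short strings.

```python
import math


def ngrams(s, n=3, mark=False, pad="\x01"):
    """Return the set of character n-grams of s.

    With mark=True, both ends are padded with n-1 marker characters so that
    n-grams at the beginning and end of the string are distinguishable.
    (The pad character is an assumption made for this sketch.)
    """
    if mark:
        s = pad * (n - 1) + s + pad * (n - 1)
    # Strings shorter than n yield an empty set in this sketch.
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(x, y, measure="cosine"):
    """Similarity between two n-gram sets x and y."""
    if measure == "exact":
        return 1.0 if x == y else 0.0
    if not x or not y:
        return 0.0
    common = len(x & y)
    if measure == "dice":
        return 2.0 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError("unknown similarity measure: %r" % measure)


def search(query, strings, measure="cosine", threshold=0.7, n=3, mark=False):
    """Brute-force scan: keep strings whose similarity to query is >= threshold."""
    q = ngrams(query, n, mark)
    return [s for s in strings
            if similarity(q, ngrams(s, n, mark), measure) >= threshold]
```

With the defaults (3-grams, cosine, threshold 0.7), search("abcd", ["abcd", "abcde", "xyz"]) keeps "abcd" and "abcde"; switching the measure to "jaccard" keeps only "abcd", because the Jaccard coefficient of "abcd" and "abcde" is 2/3, which is below the threshold.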

/usr/share/doc/simstring-dev/examples

January 26, 2015