Concepts
Addok is a geocoder: from an unstructured string, it finds matching structured geographical documents.
Data
Addok batch-loads geo-related documents such as addresses. Each indexed document needs at least a latitude and a longitude. Once loaded, the data should not be treated as your reference copy but as disposable: every reindexation requires the original source files to be loaded again.
By default, one Redis database stores the indexes while another stores the raw documents. Since Redis keeps everything in memory, all of this data lives in RAM. To save memory, you can install a plugin that keeps the documents in another database engine (SQLite or PostgreSQL, for instance).
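As a minimal sketch of the loading requirement above, the following reads newline-delimited JSON and keeps only the documents that carry usable coordinates. The one-document-per-line format and the `lat`/`lon` key names are assumptions for illustration, not a description of Addok's own import format. Because the index is disposable, keeping such source files around is what makes a later reindexation possible.

```python
import json
from typing import Iterable, Iterator


def load_documents(lines: Iterable[str]) -> Iterator[dict]:
    """Yield documents that have both coordinates and skip the others.

    Assumes one JSON document per line with hypothetical "lat"/"lon" keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        doc = json.loads(line)
        try:
            # Coordinates must be present and convertible to numbers.
            doc["lat"] = float(doc["lat"])
            doc["lon"] = float(doc["lon"])
        except (KeyError, TypeError, ValueError):
            continue  # a document without a position cannot be indexed
        yield doc


source = [
    '{"id": "a1", "name": "rue des Lilas", "lat": 48.85, "lon": 2.35}',
    '{"id": "a2", "name": "place sans position"}',
]
docs = list(load_documents(source))
print([d["id"] for d in docs])  # ['a1']
```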
Indexation
Each document goes through a series of processors defined in your configuration file. Handling a document takes two main steps: preparing its strings and computing its indexes.
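To picture a configuration-defined processor chain, here is a minimal sketch in which each processor is a generator consuming the output of the previous one, and the "configuration" is simply an ordered list of functions. The function names (`normalize`, `tokenize`, `trigrams`) and the chaining mechanism are illustrative assumptions rather than Addok's own helpers; the trigram step only loosely approximates what a trigram-based tokenization could look like.

```python
import unicodedata
from functools import reduce
from typing import Callable, Iterable, Iterator

Processor = Callable[[Iterable[str]], Iterator[str]]


def normalize(pipe: Iterable[str]) -> Iterator[str]:
    """Lowercase and strip accents (e.g. 'Évry' -> 'evry')."""
    for text in pipe:
        decomposed = unicodedata.normalize("NFKD", text.lower())
        yield "".join(c for c in decomposed if not unicodedata.combining(c))


def tokenize(pipe: Iterable[str]) -> Iterator[str]:
    """Split strings into word tokens on non-alphanumeric characters."""
    for text in pipe:
        yield from "".join(c if c.isalnum() else " " for c in text).split()


def trigrams(pipe: Iterable[str]) -> Iterator[str]:
    """Optional step: turn each word into overlapping 3-character tokens."""
    for word in pipe:
        if len(word) <= 3:
            yield word
        else:
            for i in range(len(word) - 2):
                yield word[i:i + 3]


def run(processors: list[Processor], text: str) -> list[str]:
    """Feed one string through the processors, in configuration order."""
    return list(reduce(lambda pipe, proc: proc(pipe), processors, [text]))


WORD_PIPELINE = [normalize, tokenize]
print(run(WORD_PIPELINE, "12 Rue de l'Église"))  # ['12', 'rue', 'de', 'l', 'eglise']
print(run(WORD_PIPELINE + [trigrams], "Évry"))   # ['evr', 'vry']
```

Using generators keeps each processor independent: adding or removing a step only changes the list, not the other functions.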
The document is split into tokens (words by default, but trigrams when using the addok-trigrams plugin). Each token becomes a Redis sorted set storing the list of documents that contain it. Filters and geographical properties are stored as Redis sorted sets too. A search query then consists of intersecting these sets.
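The sketch below mimics that layout in plain Python so it runs without a Redis server: each token maps to a "sorted set" (a dict from document id to score), and a query intersects the sets of its tokens. In Redis itself, indexing would roughly correspond to `ZADD` and the intersection to `ZINTERSTORE`, whose default aggregation sums the scores. The `w|` key prefix and the count-based score are hypothetical choices for illustration.

```python
from collections import defaultdict

# key -> {document id: score}; a stand-in for one Redis sorted set per token.
index: dict[str, dict[str, float]] = defaultdict(dict)


def key(token: str) -> str:
    """Hypothetical key naming for a token's sorted set."""
    return f"w|{token}"


def add_document(doc_id: str, tokens: list[str]) -> None:
    """Like ZADD: record doc_id in the sorted set of every token it contains."""
    for token in tokens:
        members = index[key(token)]
        # Illustrative score: how many times the token appears in the document.
        members[doc_id] = members.get(doc_id, 0.0) + 1.0


def intersect(tokens: list[str]) -> dict[str, float]:
    """Like ZINTERSTORE with SUM aggregation over the tokens' sets."""
    sets = [index.get(key(t), {}) for t in tokens]
    if not sets:
        return {}
    common = set(sets[0]).intersection(*sets[1:])
    return {doc: sum(s[doc] for s in sets) for doc in common}


add_document("a1", ["rue", "des", "lilas", "paris"])
add_document("a2", ["rue", "des", "roses", "paris"])
add_document("a3", ["avenue", "des", "lilas", "lyon"])
print(intersect(["lilas", "paris"]))        # {'a1': 2.0}
print(sorted(intersect(["des", "lilas"])))  # ['a1', 'a3']
```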
Search
Search is a three-step process: first, the query is cleaned and tokenized (with the same processors as during indexation); then, all candidates for the query are collected; finally, the candidates are iterated over and ordered by relevance.
Using heuristics that cope with noise, typos and wrong input, we try to collect a reasonable number of candidates (about 100). Once retrieved, the candidates are ordered mainly by string comparison.
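The three steps can be illustrated with a self-contained toy: it builds its own tiny index, collects candidates by intersecting token sets, and orders them by string similarity. The fallback heuristic (retrying with progressively fewer tokens so a misspelled word is ignored) and the use of `difflib.SequenceMatcher` for string comparison are stand-ins chosen for this sketch, not Addok's actual algorithms. Trying every token subset also grows quickly with query length, which a real implementation would avoid.

```python
import difflib
import re
from itertools import combinations

DOCUMENTS = {
    "a1": "8 rue des Lilas Paris",
    "a2": "rue des Roses Paris",
    "a3": "avenue des Lilas Lyon",
}
MAX_CANDIDATES = 100  # the "about 100" mentioned in the text


def tokens(text: str) -> list[str]:
    """Step 1: clean and tokenize; the same function serves documents and queries."""
    return re.findall(r"\w+", text.lower())


# Inverted index: token -> set of document ids.
INDEX: dict[str, set[str]] = {}
for doc_id, label in DOCUMENTS.items():
    for tok in tokens(label):
        INDEX.setdefault(tok, set()).add(doc_id)


def candidates(query_tokens: list[str]) -> set[str]:
    """Step 2: intersect the token sets; if nothing matches, retry with
    progressively fewer tokens so that a noisy or misspelled token is ignored."""
    for size in range(len(query_tokens), 0, -1):
        found: set[str] = set()
        for subset in combinations(query_tokens, size):
            found |= set.intersection(*(INDEX.get(t, set()) for t in subset))
        if found:
            return found
    return set()


def search(query: str, limit: int = 5) -> list[tuple[str, float]]:
    """Step 3: order the candidates by string similarity to the query."""
    q = " ".join(tokens(query))
    pool = sorted(candidates(tokens(query)))[:MAX_CANDIDATES]  # naive cap
    scored = [
        (doc_id, difflib.SequenceMatcher(None, q, " ".join(tokens(DOCUMENTS[doc_id]))).ratio())
        for doc_id in pool
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


print(search("rue des lilas paris")[0][0])     # a1
print([d for d, _ in search("lilas parris")])  # ['a1', 'a3'] despite the typo
```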
Document importance and geographical position may also be taken into account. Additionally, the caller can explicitly filter a query on document fields to narrow down the potential results.
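One way these extra signals could be combined is sketched below: candidates are first restricted by exact-match filters on document fields, then a string score is blended with an `importance` value and a distance-based boost around an optional center. The field names, the weights (0.7 / 0.2 / 0.1) and the decay formula are all assumptions for illustration; Addok's real scoring may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    doc_id: str
    string_score: float  # 0..1, result of the string comparison step
    importance: float    # 0..1, hypothetical precomputed document weight
    lat: float
    lon: float
    fields: dict[str, str] = field(default_factory=dict)


def approx_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation, good enough to rank nearby points."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371.0 * math.hypot(x, y)


def rank(cands: list[Candidate], filters: Optional[dict[str, str]] = None,
         center: Optional[tuple[float, float]] = None,
         w_string: float = 0.7, w_importance: float = 0.2,
         w_geo: float = 0.1) -> list[Candidate]:
    """Drop candidates failing the field filters, then sort by a weighted score."""
    filters = filters or {}
    kept = [c for c in cands if all(c.fields.get(k) == v for k, v in filters.items())]

    def score(c: Candidate) -> float:
        geo = 0.0
        if center is not None:
            # Geographical bias: 1.0 at the center, 0.5 at 10 km, fading further out.
            geo = 1.0 / (1.0 + approx_distance_km(center[0], center[1], c.lat, c.lon) / 10.0)
        return w_string * c.string_score + w_importance * c.importance + w_geo * geo

    return sorted(kept, key=score, reverse=True)


cands = [
    Candidate("paris-street", 0.8, 0.5, 48.8566, 2.3522, {"type": "street"}),
    Candidate("lyon-street", 0.8, 0.5, 45.7640, 4.8357, {"type": "street"}),
    Candidate("lyon-city", 0.8, 0.9, 45.7640, 4.8357, {"type": "municipality"}),
]
print([c.doc_id for c in rank(cands, filters={"type": "street"}, center=(45.76, 4.84))])
# ['lyon-street', 'paris-street']
print(rank(cands)[0].doc_id)  # lyon-city
```

With equal string scores, the filter removes the municipality, the location bias then favours the nearby street, and without either, importance decides.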
HTTP API
Addok provides an API to query the indexed data via HTTP. It has been developed with performance as a key constraint. By default, it serves results as flat GeoCodeJSON.
By default, the API has three entry points, and you can extend it: the first performs a search query, the second handles reverse geocoding (see below), and the third retrieves a single document.
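The following sketch only builds request URLs, so it runs without a server. The base URL, the `/search/`, `/reverse/` and `/get/<id>` paths, and the `q`, `limit`, `lat` and `lon` parameter names, as well as passing filters as extra query parameters, are assumptions to check against your own instance's documentation.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:7878"  # assumption: adjust to where your server listens


def search_url(q: str, limit: int = 5, lat: Optional[float] = None,
               lon: Optional[float] = None, **filters: str) -> str:
    """Search entry point; lat/lon add a geographical bias, kwargs act as filters."""
    params: dict[str, object] = {"q": q, "limit": limit}
    if lat is not None and lon is not None:
        params.update(lat=lat, lon=lon)
    params.update(filters)
    return f"{BASE_URL}/search/?{urlencode(params)}"


def reverse_url(lat: float, lon: float) -> str:
    """Reverse geocoding entry point: from a position to nearby documents."""
    return f"{BASE_URL}/reverse/?{urlencode({'lat': lat, 'lon': lon})}"


def get_url(doc_id: str) -> str:
    """Document retrieval entry point (the path shape is an assumption)."""
    return f"{BASE_URL}/get/{quote(doc_id, safe='')}"


print(search_url("8 bd du port", lat=48.789, lon=2.789))
# http://localhost:7878/search/?q=8+bd+du+port&limit=5&lat=48.789&lon=2.789
print(reverse_url(48.357, 2.37))
# http://localhost:7878/reverse/?lat=48.357&lon=2.37
```

To actually query a running server, such a URL could be passed to `urllib.request.urlopen` and the body decoded with `json.load`.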
A search query can be given a geographical bias, boosting candidates around a given location. The API also supports reverse geocoding: going from a location to, for instance, the closest known address.
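Reverse geocoding can be reduced, at its simplest, to a nearest-neighbour search. This sketch scans a hypothetical list of addresses (approximate real-world positions) and sorts them by great-circle distance computed with the haversine formula. A full scan is only for clarity; a real index would use some spatial structure to avoid it, and we make no claim about how Addok does so.

```python
import math

# Hypothetical sample data: (label, latitude, longitude), approximate positions.
ADDRESSES = [
    ("Place de la Concorde, Paris", 48.8656, 2.3212),
    ("Place Bellecour, Lyon", 45.7578, 4.8320),
    ("Vieux-Port, Marseille", 43.2951, 5.3740),
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def reverse(lat: float, lon: float, limit: int = 1) -> list[str]:
    """Return the labels of the `limit` known addresses closest to (lat, lon)."""
    by_distance = sorted(ADDRESSES, key=lambda a: haversine_km(lat, lon, a[1], a[2]))
    return [label for label, _, _ in by_distance[:limit]]


print(reverse(45.76, 4.83))  # ['Place Bellecour, Lyon']
```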
Hacking
A dedicated command-line tool launches an interactive shell with a few useful commands to debug Addok and understand how it works. For instance, you can explain a given result, show autocomplete results for a given token, inspect how a string is tokenized, and so on. Oh, and of course, perform a search!
Although Addok focuses on the particular problem of addresses, trying to do one job and (hopefully) do it well, it has been developed with extensibility in mind. You can adapt it to your own needs with plugins and/or additional API entry points.
See also this presentation for more details.