SWISH-LIBRARY(1) | SWISH-E Documentation | SWISH-LIBRARY(1) |
SWISH-LIBRARY - Interface to the Swish-e C library
The C library is an interface to the Swish-e search code. It provides a way to embed Swish-e into your applications. This API is based on Swish-e version 2.3.
Note: This is a NEW API as of Swish-e version 2.3. The C language interface has changed as has the perl interface to Swish-e. The new Perl interface is the SWISH::API module and is included with the Swish-e distribution. The old SWISHE perl module has been rewritten to work with the new API. The SWISHE perl module is no longer included with the Swish-e distribution, but can be downloaded from the Swish-e web site.
The advantage of the library is that the index file or files can be opened once and many queries made on the open index. This saves the startup time required to fork and run the swish-e binary, and the expensive step of opening the index file. Some benchmarks have shown a threefold increase in speed.
The downside is that your program now has more code and data in it (the index tables can use quite a bit of memory), and if a fatal error happens in swish it will bring down your program. These are things to think about, especially if embedding swish into a web server such as Apache where there are many processes serving requests.
The best way to learn about the library is to look at two files included with the Swish-e distribution that make use of the library: src/libtest.c and perl/API.xs.
To build and run libtest chdir to the src directory and run the commands:
    $ make libtest
    $ ./libtest [optional name of index file]
You will be prompted for the search words. The default index used is index.swish-e. This can be overridden by placing a list of index files in a quote-protected string.
$ ./libtest 'index1 index2 index3'
The Swish-e library is installed when you run "make install" when building Swish-e. No extra installation steps are required.
The library consists of a header file "swish-e.h" and a library "libswish-e.*" that can either be a static or shared library depending on your platform.
When you first attach to an index file (or index files) you are returned a "swish handle". From the handle you create one or more "search objects", which hold the parameters used to query the index, such as the query string, sort order, search phrase delimiter, limit parameters and HTML structure bits. The "object" is really just a pointer to a C structure, but it's helpful to think of it as an object that has data and functionality associated with it.
The search object is used to query the index. A query returns a "results object". The results object holds the number of hits, the parsed query per index, and the result set. The results object keeps track of the current position in the result set. You may "seek" to a specific record within the result set (useful for displaying a page of results).
Finally, a result object represents a single result from the result list. A result object provides access to the result's properties (such as file name, rank, etc.).
In addition to results, there are functions available to access the header values stored in the index file, functions to check and report errors, and a few utility functions.
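The paging use of seeking mentioned above comes down to a little arithmetic. The following is a minimal sketch, not part of the Swish-e API: the struct and function names are hypothetical, and it assumes that seek positions are zero-based. It only works out which record to seek to and how many results to read for a given page.

```c
/* Hypothetical helper type, not declared in swish-e.h. */
struct page_window {
    long first;  /* assumed zero-based position of the first hit on the page */
    long count;  /* number of hits to read starting at that position */
};

/*
 * Work out which slice of a result set belongs to a page.
 *   total_hits: number of hits reported for the query
 *   page:       page number, starting at 1
 *   page_size:  hits shown per page
 * Returns a window with count == 0 when the page is out of range.
 */
static struct page_window page_window_for(long total_hits, long page, long page_size)
{
    struct page_window w = { 0, 0 };

    if (total_hits <= 0 || page < 1 || page_size < 1)
        return w;

    /* Same test as (page - 1) * page_size >= total_hits, without overflow. */
    if (page - 1 > (total_hits - 1) / page_size)
        return w;

    w.first = (page - 1) * page_size;
    w.count = total_hits - w.first < page_size ? total_hits - w.first : page_size;
    return w;
}
```

With 25 hits and 10 hits per page, page 3 starts at position 20 and holds 5 results. The caller would seek to "first" and then fetch "count" results.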
Below is the list of available functions included in the Swish-e C language API.
These functions (and typedefs) are defined in the swish-e.h header file. The common objects (e.g. structures) used are:
    SW_HANDLE    - swish handle that associates with an index file
    SW_SEARCH    - search "object" that holds search parameters
    SW_RESULTS   - results "object" that holds a result set
    SW_RESULT    - a single result used for accessing the result's properties
    SW_FUZZYWORD - used for fuzzy (stemming) word conversion
Searching
    SW_HANDLE myhandle;
    myhandle = SwishInit("file1.idx");
Typically you will open a handle at the beginning of your program and use it to make multiple queries on an index.
This function will always return a swish handle. You must check for errors, and on error free the memory used by the handle, or abort.
Here's an example of aborting:
    SW_HANDLE swish_handle;
    swish_handle = SwishInit("file1.idx file2.idx");
    if ( SwishError( swish_handle ) )
        SwishAbortLastError( swish_handle );
And here's an example of catching the error:
    SW_HANDLE swish_handle;
    swish_handle = SwishInit("file1.idx file2.idx");
    if ( SwishError( swish_handle ) )
    {
        printf("Failed to connect to swish. %s\n", SwishErrorString( swish_handle ) );
        SwishClose( swish_handle ); /* free the memory used */
        return 0;
    }
You may have more than one handle active at a time.
Swish-e will not tell you if the index file changes on disk (such as after reindexing). In a persistent environment (e.g. mod_perl) the calling program should check to see if the index file has changed on disk. A common way to do this is to store the inode number before opening the index file(s), and then stat the file name every so often and reopen the index files if the inode number changes.
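The inode check described above can be sketched with plain POSIX stat(). This is a minimal illustration and not part of the Swish-e API: the struct and function names are hypothetical. It also compares the device number, since an inode number is only unique within one file system.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper type: the on-disk identity of an index file. */
struct index_stamp {
    dev_t device;
    ino_t inode;
};

/* Record the identity of path. Returns 0 on success, -1 if stat() fails. */
static int index_stamp_take(const char *path, struct index_stamp *stamp)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    stamp->device = st.st_dev;
    stamp->inode = st.st_ino;
    return 0;
}

/*
 * Compare the file now at path with a previously recorded stamp.
 * Returns 1 if it is a different file, 0 if it is the same file,
 * and -1 if stat() fails (for example, the file is missing).
 */
static int index_stamp_changed(const char *path, const struct index_stamp *stamp)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return st.st_dev != stamp->device || st.st_ino != stamp->inode;
}
```

A persistent program would take the stamp just before opening the index, call index_stamp_changed() every so often, and on a return value of 1 close the old handle and open the index again. This only detects an index that was replaced by a new file; an index rewritten in place keeps its inode.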
Unlike the other settings on the search object, once you run a query on the search object you must call SwishResetSearchLimit() to change or clear the limit parameters.
You may free the search object before freeing any generated results objects.
You should always check for errors after calling SwishExecute().
You should always check for errors after calling SwishQuery().
Reading Results
The "index_name" is the name of the index supplied in the SwishInit() function call.
Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a char **. See src/libtest.c for an example of accessing the strings in this list, but in general you may cast this to a (char **).
Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a char **. See src/libtest.c for an example of accessing the strings in this list, but in general you may cast this to a (char **).
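Once the value has been cast to a (char **), walking it is ordinary pointer work. The sketch below assumes the list ends with a NULL pointer, which is how the fuzzy word list example later in this document is read; src/libtest.c remains the reference. The helper names are hypothetical and nothing here calls into the library.

```c
#include <stddef.h>
#include <string.h>

/* Count the entries in a NULL-terminated list of strings. */
static size_t string_list_length(const char **list)
{
    size_t n = 0;

    if (list == NULL)
        return 0;
    while (list[n] != NULL)
        n++;
    return n;
}

/* Return 1 if name appears in the NULL-terminated list, otherwise 0. */
static int string_list_contains(const char **list, const char *name)
{
    if (list == NULL || name == NULL)
        return 0;
    for (; *list != NULL; list++) {
        if (strcmp(*list, name) == 0)
            return 1;
    }
    return 0;
}
```

A caller could use string_list_contains() to confirm that an index or header name exists before asking for its value.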
Returns the position or a negative number on error.
The result object returned does not need to be freed after use (unlike the swish handle, search object, and results object).
Aborts if called with a NULL SW_RESULT object
Returns a string value of the specified property.
Returns the empty string "" if the current result does not have the specified property assigned.
Returns the string "(null)" on invalid property name (i.e. property name is not defined in the index) and sets an error (see below) indicating the invalid property name.
The string returned does not need to be freed, but is only valid for the current result. If you wish to save the string you must copy it locally.
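Copying the string locally can be as simple as the helper below. It is a sketch with a hypothetical name, not a library function, and the caller owns the returned memory.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate a property string so that it stays valid after the
 * program moves on to the next result.
 * Returns NULL if value is NULL or if memory cannot be allocated.
 * The caller must free() the returned copy.
 */
static char *copy_property_string(const char *value)
{
    size_t len;
    char *copy;

    if (value == NULL)
        return NULL;
    len = strlen(value) + 1;          /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, value, len);
    return copy;
}
```

Unlike the string returned by the library, the copy must be freed by the caller when it is no longer needed.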
Dates are formatted using the hard-coded format string: "%Y-%m-%d %H:%M:%S" in localtime.
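To see what that format string produces, the same format can be passed to the standard strftime() function. This sketch does not call Swish-e and the helper name is hypothetical; it only shows the shape of the text a caller can expect, for example when deciding how wide a date column must be.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format a timestamp in local time as "YYYY-MM-DD HH:MM:SS".
 * Returns the number of characters written (19 for four-digit years),
 * or 0 if the buffer is too small or the time cannot be converted.
 * Note: localtime() uses a shared static buffer and is not thread-safe.
 */
static size_t format_local_timestamp(time_t when, char *out, size_t out_size)
{
    struct tm *local = localtime(&when);

    if (local == NULL)
        return 0;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", local);
}
```

The output buffer needs room for at least 20 bytes: 19 characters plus the terminating NUL.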
Swish-e will abort if called with a NULL SW_RESULT object. Without the SW_RESULT object swish-e cannot set any error codes.
On error returns ULONG_MAX. This is commonly defined in limits.h. Check SwishError() (see below) for the type of error.
If SwishError() returns false (zero) then it simply means that this result does not have any data for the specified property.
If SwishError() returns true (non-zero) then either the propertyname specified is invalid, or the property requested is not a numeric (or date) property (e.g. it's a string property).
See below on how to fetch the specific error message when SwishError() is true.
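The three outcomes described above can be separated with one comparison and one flag test. The sketch below is only an illustration: the enum and function are hypothetical, the sentinel is assumed to be the ULONG_MAX constant from limits.h, and "error_flag" stands for whatever SwishError() returned immediately after the lookup.

```c
#include <limits.h>

/* Hypothetical classification of a numeric property lookup. */
enum ulong_lookup {
    ULONG_LOOKUP_VALUE,    /* the returned number is real data */
    ULONG_LOOKUP_NO_DATA,  /* sentinel returned, no error: result lacks the property */
    ULONG_LOOKUP_ERROR     /* sentinel returned, error set: bad name or not numeric */
};

/*
 * value:      the number returned by the property lookup
 * error_flag: non-zero if an error was reported right after the lookup
 */
static enum ulong_lookup classify_ulong_lookup(unsigned long value, int error_flag)
{
    if (value != ULONG_MAX)
        return ULONG_LOOKUP_VALUE;
    return error_flag ? ULONG_LOOKUP_ERROR : ULONG_LOOKUP_NO_DATA;
}
```

The error flag is consulted only when the sentinel comes back, which matches the order of checks described above.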
Swish-e will abort if called with a NULL SW_RESULT object. Propertyname is the name of the property. ID is the id number of the property, if known. ID is not normally used in the API, but its purpose is to avoid looking up the property ID for every result displayed.
The returned PropValue is a structure that contains a flag to indicate the type, and a union that holds the property value. The flags and structure are defined in swish-e.h.
The property must be copied locally and the returned "PropValue" value must be freed by calling freeResultPropValue() to avoid a memory leak.
On error returns NULL. Check SwishError() (see below) for the type of error.
If the function returns NULL but SwishError() returns false (zero) then it simply means that this result does not have any data for the specified property.
If SwishError() returns true (non-zero) then the property name specified is invalid (i.e. not defined for the index).
See below on how to fetch the specific error message when SwishError() is true.
See perl/API.xs for an example on using this function.
Accessing the Index Header Values
Each index file has associated header values that describe the index. These functions provide access to this data. The header data is returned as a union SWISH_HEADER_VALUE, and a pointer to a SWISH_HEADER_TYPE is passed in and the returned value indicates the type of data that is returned. See src/libtest.c and perl/API.xs for examples.
See src/libtest.c and perl/API.xs for examples.
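The pattern described in this section, a union whose meaning is reported through a separate type value, can be illustrated without the library. Everything in the sketch below is hypothetical: the "demo_" types merely stand in for SWISH_HEADER_VALUE and SWISH_HEADER_TYPE, whose real definitions are in swish-e.h, and the list is assumed to be NULL-terminated.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the library's type tag and value union (hypothetical). */
typedef enum { DEMO_NUMBER, DEMO_STRING, DEMO_LIST, DEMO_BOOL } demo_header_type;

typedef union {
    const char *string;
    const char **string_list;   /* assumed NULL-terminated */
    unsigned long number;
    int boolean;
} demo_header_value;

/* True when an snprintf() result fit in a buffer of the given size. */
static int fits(int n, size_t size)
{
    return n >= 0 && (size_t)n < size;
}

/*
 * Render a header value as text, reading the union member named by type.
 * Returns 0 on success, -1 if the text does not fit or type is unknown.
 */
static int demo_header_format(demo_header_value value, demo_header_type type,
                              char *out, size_t out_size)
{
    size_t used = 0;
    const char **item;

    switch (type) {
    case DEMO_NUMBER:
        return fits(snprintf(out, out_size, "%lu", value.number), out_size) ? 0 : -1;
    case DEMO_STRING:
        return fits(snprintf(out, out_size, "%s",
                             value.string ? value.string : ""), out_size) ? 0 : -1;
    case DEMO_BOOL:
        return fits(snprintf(out, out_size, "%s",
                             value.boolean ? "yes" : "no"), out_size) ? 0 : -1;
    case DEMO_LIST:
        if (out_size > 0)
            out[0] = '\0';
        /* Join the entries with single spaces. */
        for (item = value.string_list; item != NULL && *item != NULL; item++) {
            int n = snprintf(out + used, out_size - used, "%s%s",
                             used ? " " : "", *item);
            if (!fits(n, out_size - used))
                return -1;
            used += (size_t)n;
        }
        return 0;
    }
    return -1;
}
```

Reading any member other than the one named by the type value is an error, which is why the switch on the type comes before any access to the union.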
Accessing Property Meta Data
In addition to the pre-defined standard properties, you have the option of adding additional "meta" properties to be indexed and/or added to the list of properties returned with each result. Consult the sections on the MetaNames and PropertyNames directives in the CONFIGURATION FILE for an explanation of how to do this.
These functions provide access to the meta data stored in an index. You can use them to determine what meta/property information is available for an index including all the pre-defined standard properties. See libtest.c for an example.
Checking for Errors
You should check for errors after all calls. The last error is stored in the swish handle object, and is only valid until the next operation (which resets the error flags).
Currently, some errors are flagged as "critical" errors. In these cases you should destroy (by calling the SwishClose() function ) the current swish handle. If you have other objects in scope (e.g. a search object or results object) destroy those first.
The types of errors that are critical can be seen in src/error.c. Currently the list includes:
    Could not open index file
    Unknown index file format
    Index file(s) is empty
    Index file error
    Invalid swish handle
    Invalid results object
Utility Functions
This function may change in the future since only 8-bit chars can currently be used.
This can be used to convert a word to its stem. It uses only the original Porter Stemmer.
The fuzzy mode used during indexing is stored in the index file. Since each result is linked to a given index file this method allows stemming a word based on its index file.
One possible use for this is to highlight search terms in a document summary, which would be based on a given result.
The methods below can be used to access the data returned. The SW_FUZZYWORD object must be freed when done to avoid a memory leak.
Here's an example:
    SW_FUZZYWORD fuzzy_word = SwishFuzzyWord( result );
    const char **word_list = SwishFuzzyWordList( fuzzy_word );
    while ( *word_list )
    {
        printf("%s\n", *word_list );
        word_list++;
    }
    SwishFuzzyWordFree( fuzzy_word );
If the stemmer does not convert the string (for example attempting to stem numeric data) the word_list will contain the original word. To tell if the stemmer actually stemmed the word check the return value with SwishFuzzyWordError().
Not all stemmers set this value correctly. But since SwishFuzzyWordList() will return a valid string regardless of the return value, you can often just ignore this setting. That's what I do.
This is normally just one, but in the case of DoubleMetaphone it can be one or two (i.e. DoubleMetaphone can return one or two strings).
Please report bugs to the Swish-e discussion group. Feel free also to improve or enhance this feature.
Original interface: Aug 2000 Jose Ruiz jmruiz@boe.es
Updated: Aug 22, 2002 - Bill Moseley
Interface redesigned for Swish-e version 2.3 Oct 17, 2002 - Bill Moseley
$Id: SWISH-LIBRARY.pod 1906 2007-02-07 19:25:16Z moseley $
2009-04-04 | 2.4.7 |