Bloom::Filter - Sample Perl Bloom filter implementation
A Bloom filter is a probabilistic algorithm for doing existence
tests in less memory than a full list of keys would require. The tradeoff to
using Bloom filters is a certain configurable risk of false positives. This
module implements a simple Bloom filter with configurable capacity and false
positive rate. Bloom filters were first described in a 1970 paper by Burton
Bloom, see
<http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal>.
use Bloom::Filter
my $bf = Bloom::Filter->new( capacity => 10, error_rate => .001 );
$bf->add( @keys );
while ( <> ) {
chomp;
print "Found $_\n" if $bf->check( $_ );
}
- new %PARAMS
- Create a brand new instance. Allowable params are
"error_rate",
"capacity".
- init
- Calculates the best number of hash functions and optimum filter length,
creates some random salts, and generates a blank bit vector. Called
automatically by constructor.
- capacity
- Returns the total capacity of the Bloom filter
- error_rate
- Returns the configured maximum error rate
- length
- Returns the length of the Bloom filter in bits
- key_count
- Returns the number of items currently stored in the filter
- on_bits
- Returns the number of 'on' bits in the filter
- salts
- Returns the list of salts used to create the hash functions
- add @KEYS
- Adds the list of keys to the filter. Will fail, return
"undef" and complain if the number of
keys in the filter exceeds the configured capacity.
- check @KEYS
- Checks the provided key list against the Bloom filter, and returns a list
of equivalent length, with true or false values depending on whether there
was a match.
- _calculate_shortest_filter_length CAPACITY ERR_RATE
- Given a desired error rate and maximum capacity, returns the optimum
combination of vector length (in bits) and number of hash functions to use
in building the filter, where "optimum" means shortest vector
length.
- _get_cells KEY
- Given a key, hashes it using the list of salts and returns an array of
cell indexes corresponding to the key.
Originally written by Maciej Ceglowski
<maciej@ceglowski.com>. Currently maintained by Grzegorz
Rożniecki <xaerxess@gmail.com>.
Dmitriy Ryaboy <dmitriy.ryaboy@ask.com> (big speedup in
February 2007, thanks!)
(c) 2004 Maciej Ceglowski
This is free software, distributed under version 2 of the GNU
Public License (GPL).