bloom package - github.com/yourbasic/bloom - Go Packages
Package bloom provides a Bloom filter implementation.
Bloom filters ¶
A Bloom filter is a fast and space-efficient probabilistic data structure used to test set membership.
A membership test returns either ”likely member” or ”definitely not a member”. Only false positives can occur: an element that has been added to the filter will always be identified as ”likely member”.
The probabilities of different outcomes of a membership test at a false-positives rate of 1/100 are:
Test(s) true false -------------------------------------- s has been added 1 0 s has not been added 0.01 0.99
Elements can be added, but not removed. With more elements in the filter, the probability of false positives increases.
Performance ¶
A full filter with a false-positives rate of 1/p uses roughly 0.26ln(p) bytes per element and performs ⌈1.4ln(p)⌉ bit array lookups per test:
p bytes lookups
-------------------------
4 0.4 2
8 0.5 3
16 0.7 4
32 0.9 5
64 1.1 6
128 1.3 7
256 1.5 8
512 1.6 9
1024 1.8 10
Each membership test makes a single call to a 128-bit hash function. This improves speed without increasing the false-positives rate as shown by Kirsch and Mitzenmacher.
Limitations ¶
This implementation is not intended for cryptographic use.
The internal data representation is different for big-endian and little-endian machines.
Typical use case ¶
The Basics example contains a typcial use case: a blacklist of shady websites.
Build a blacklist of shady websites.
package main
import (
"fmt"
"github.com/yourbasic/bloom"
)
func main() {
// Create a Bloom filter with room for 10000 elements
// at a false-positives rate less than 0.5 percent.
blacklist := bloom.New(10000, 200)
// Add an element to the filter.
url := "https://rascal.com"
blacklist.Add(url)
// Test for membership.
if blacklist.Test(url) {
fmt.Println(url, "seems to be shady.")
} else {
fmt.Println(url, "has not yet been added to our blacklist.")
}
}
Output: https://rascal.com seems to be shady.
This section is empty.
This section is empty.
This section is empty.
Filter represents a Bloom filter.
New creates an empty Bloom filter with room for n elements at a false-positives rate less than 1/p.
Add adds s to the filter and tells if s was already a likely member.
AddByte adds b to the filter and tells if b was already a likely member.
Count returns an estimate of the number of elements in the filter.
Test tells if s is a likely member of the filter. If true, s is probably a member; if false, s is definitely not a member.
TestByte tells if b is a likely member of the filter. If true, b is probably a member; if false, b is definitely not a member.
Union returns a new Bloom filter that consists of all elements that belong to either f1 or f2. The two filters must be of the same size n and have the same false-positives rate p.
The resulting filter is the same as the filter created from scratch using the union of the two sets.
Compute the union of two filters.
package main
import (
"fmt"
"github.com/yourbasic/bloom"
"strconv"
)
func main() {
// Create two Bloom filters, each with room for 1000 elements
// at a false-positives rate less than 1/100.
n, p := 1000, 100
f1 := bloom.New(n, p)
f2 := bloom.New(n, p)
// Add "0", "2", …, "498" to f1
for i := 0; i < n/2; i += 2 {
f1.Add(strconv.Itoa(i))
}
// Add "1", "3", …, "499" to f2
for i := 1; i < n/2; i += 2 {
f2.Add(strconv.Itoa(i))
}
// Compute the approximate size of f1 ∪ f2.
fmt.Println("f1 ∪ f2:", f1.Union(f2).Count())
}
Output: f1 ∪ f2: 505