In our last article, we discussed some common probabilistic data structures and their use cases. Today we’re going to follow up on that post and show how to implement a simple bloom filter in C# to provide extra learning by example.
Bloom Filter Recap
In case you didn’t read our last article, here’s a TLDR of bloom filters. A Bloom filter is a space-efficient, probabilistic data structure used to determine whether an element is a member of a set. It allows for false positives but guarantees no false negatives, meaning it may mistakenly report that an element is in the set when it isn’t, but never the opposite. By adjusting the size of the underlying bit array and the number of hash functions, you can control the false positive rate.
Building in C#
Step 1: Create the App
We’re going to do this as a console app. Run the following command in your terminal to get started
|
|
Step 2: Add the bloom filter class.
First, we’ll define the BloomFilter class with the following attributes:
- bitArray: A BitArray to store the data
- size: The size of the BitArray
- hashFunctionsCount: The number of hash functions we’ll use
|
|
Step 3: Implement a hash function
We’ll use the built-in GetHashCode() method to create multiple hash functions. We’ll vary the hash functions by using a different seed value for each. Add the following method to the Bloomfilter
class
|
|
Step 4: Add elements to the Bloom Filter
To add an element to the Bloom Filter, we’ll use the Add() method. We’ll compute the hash value for each hash function and set the corresponding bit in the bitArray. Add the following method to the Bloomfilter
class
|
|
Step 5: Check for the presence of elements in the Bloom Filter
To check if an element is present in the Bloom Filter, we’ll use the Contains() method. We’ll compute the hash value for each hash function and check the corresponding bit in the bitArray. If all bits are set to true, the element is probably in the set. Add the following method to the Bloomfilter
class
|
|
Step 6: Test it out!
A complex sounding data structure is really simple! Now we can test it out. Add the following code to your Program.cs
|
|
Keep in mind that Bloom Filters can have false positives but never false negatives. You can control the false positive rate by adjusting the size of the bitArray and the number of hash functions.
We hope this simple walk through helps break down a complex idea into simple steps for you. Let us know how it goes when you try it out!