Featured image of post Implementing a bloom filter in C#

Implementing a bloom filter in C#

How to create a basic bloom filter in c#

In our last article, we discussed some common probabilistic data structures and their use cases. Today we’re going to follow up on that post and show how to implement a simple bloom filter in C# to provide extra learning by example.

Bloom Filter Recap

In case you didn’t read our last article, here’s a TLDR of bloom filters. A Bloom filter is a space-efficient, probabilistic data structure used to determine whether an element is a member of a set. It allows for false positives but guarantees no false negatives, meaning it may mistakenly report that an element is in the set when it isn’t, but never the opposite. By adjusting the size of the underlying bit array and the number of hash functions, you can control the false positive rate.

Building in C#

Step 1: Create the App

We’re going to do this as a console app. Run the following command in your terminal to get started

1
dotnet new console -n bloom

Step 2: Add the bloom filter class.

First, we’ll define the BloomFilter class with the following attributes:

  • bitArray: A BitArray to store the data
  • size: The size of the BitArray
  • hashFunctionsCount: The number of hash functions we’ll use
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
using System;
using System.Collections;

public class BloomFilter
{
    private BitArray bitArray;
    private int size;
    private int hashFunctionsCount;

    public BloomFilter(int size, int hashFunctionsCount)
    {
        this.size = size;
        this.hashFunctionsCount = hashFunctionsCount;
        bitArray = new BitArray(size);
    }
}

Step 3: Implement a hash function

We’ll use the built-in GetHashCode() method to create multiple hash functions. We’ll vary the hash functions by using a different seed value for each. Add the following method to the Bloomfilter class

1
2
3
4
5
private int HashFunction(string item, int seed)
{
    int hash = item.GetHashCode() ^ seed;
    return Math.Abs(hash) % size;
}

Step 4: Add elements to the Bloom Filter

To add an element to the Bloom Filter, we’ll use the Add() method. We’ll compute the hash value for each hash function and set the corresponding bit in the bitArray. Add the following method to the Bloomfilter class

1
2
3
4
5
6
7
8
public void Add(string item)
{
    for (int i = 0; i < hashFunctionsCount; i++)
    {
        int hashValue = HashFunction(item, i);
        bitArray.Set(hashValue, true);
    }
}

Step 5: Check for the presence of elements in the Bloom Filter

To check if an element is present in the Bloom Filter, we’ll use the Contains() method. We’ll compute the hash value for each hash function and check the corresponding bit in the bitArray. If all bits are set to true, the element is probably in the set. Add the following method to the Bloomfilter class

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
public bool Contains(string item)
{
    for (int i = 0; i < hashFunctionsCount; i++)
    {
        int hashValue = HashFunction(item, i);
        if (!bitArray.Get(hashValue))
        {
            return false;
        }
    }

    return true;
}

Step 6: Test it out!

A complex sounding data structure is really simple! Now we can test it out. Add the following code to your Program.cs

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
public static void Main(string[] args)
{
    BloomFilter bloomFilter = new BloomFilter(1000, 3);

    // Add items
    bloomFilter.Add("apple");
    bloomFilter.Add("banana");
    bloomFilter.Add("grape");

    // Check if items are present
    Console.WriteLine(bloomFilter.Contains("apple")); // Output: True
    Console.WriteLine(bloomFilter.Contains("banana")); // Output: True
    Console.WriteLine(bloomFilter.Contains("grape")); // Output: True
    Console.WriteLine(bloomFilter.Contains("orange")); // Output: False (or True with a small probability)
}

Keep in mind that Bloom Filters can have false positives but never false negatives. You can control the false positive rate by adjusting the size of the bitArray and the number of hash functions.

We hope this simple walk through helps break down a complex idea into simple steps for you. Let us know how it goes when you try it out!

comments powered by Disqus
Built with Hugo
Theme Stack designed by Jimmy