Bloom Filter [Systems Design Sundays]
Reducing the load on your system for complex queries.
Imagine you had a very large data set. Many many members, with each member being very complex. Maybe your social media platform took off, and you have millions of users with funky usernames (and even more complex user IDs).
Now every time you want to check whether a new ID is a part of the set, what would you do? LinkedList and Arrays would be too slow. Tries would need to be constantly updated and balanced, making them problematic. Why not try Hash maps/sets? Everyone always says that they have O(1) insertion, lookup, etc.
However, think back to the piece on hashing we did. Remember that hashes are implemented on top of arrays. If you had to access the underlying storage every time, your costs would shoot through the roof. Remember keeping lots of data stored is cheap. Loading it to work on is much costlier. So how can we test for membership w/o killing our financials? That is what we will learn about.
Fruit flies use a modified version of Bloom filters to detect novelty of odors, with additional features including similarity of novel odor to that of previously experienced examples, and time elapsed since previous experience of the same odor.
- This is Wikipedia’s first example of the use of Bloom Filter.
What are bloom filters- Bloom filters are probabilistic data structures meant to test for membership in an efficient manner.
How they work (insertion)- Have a bit array of size m (all values initialized to 0). Have k hash functions, all of which map to index values in the array (go from 0-(m-1) ). Given an input, run it through all the hash functions. This gives us k different values. Go over the array, and set all the indices for 1.
Querying- Take input, take it through our k functions. If any of the indices are at value 0, then we haven’t seen this input.
False Positives/Negatives- Bloom filters are probabilistic. They can tell you if an element is probably in the storage. Sometimes they will say yes to elements that don’t exist. They have no false negatives. They will never say an element DNE if it does.
Regeneration- As we start adding more values, the rate of False Positives will go up. Eventually, every query will return true. Once our filter fills up beyond a point, it makes sense to create a bigger one/replace it. Fun fact- a filter of any fixed size can be used for an infinite number of elements. Don’t do this, but keep this as trivia.
Storage- Bloom Filters don’t actually store the elements in the database. You need something else for it. They just make the process of searching much cheaper. The tradeoff is that we will
Bloom Filters are one of the most important data structures being implemented right now. There are many variants of the filter, each specialized for different tasks. For example, Content Delivery Networks use them to optimize the storage used in the cache.
If you have enjoyed this post so far, please make sure you like it (the little heart button in the email/post). I also have a special request for you.
This newsletter has received a lot of love. If you haven’t already, I would really appreciate it if you could take 5 seconds to let Substack know that they should feature this publication on their pages. This will allow more people to see the newsletter.
There is a simple form in Substack that you can fill up for it. Here it is. Thank you.
To get your Substack URL, follow the following steps-
Open - https://substack.com/
If you haven’t already, log in with your email.
In the top right corner, you will see your icon. Click on it. You will see the drop-down. Click on your name/profile. That will show you the link.
You will be redirected to your URL. Please put that in to the survey. Appreciate your help.
In the comments below, share what topic you want to focus on next. I’d be interested in learning and will cover them. To learn more about the newsletter, check our detailed About Page + FAQs
If you liked this post, make sure you fill out this survey. It’s anonymous and will take 2 minutes of your time. It will help me understand you better, allowing for better content.
I’ll see you living the dream.
Go kill all and Stay Woke,
To make sure you get the most out of System Design Sundays, make sure you’re checking in the rest of the days as well. Leverage all the techniques I have discovered through my successful tutoring to easily succeed in your interviews and save your time and energy by joining the premium subscribers down below. Get a discount (for a whole year) using the button below
Reach out to me on:
Message me on Twitter: https://twitter.com/Machine01776819
My LinkedIn: https://www.linkedin.com/in/devansh-devansh-516004168/
Read my articles: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Get a free stock on Robinhood. No risk to you, so not using the link is losing free money: https://join.robinhood.com/fnud75