How to ARCHITECT a search engine like Google Search [System Design Sundays]
Summarizing one of the best write-ups on the topic online
If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here.
To learn more about the newsletter, check out our detailed About Page + FAQs.
To help me understand you better, please fill out this anonymous, 2-min survey. If you liked this post, make sure you hit the heart icon in this email.
Everyone wants to be a software engineer at Google.
Today I will go a step further. I will teach you how to be the software engineer of Google :))
I came across a wonderful post on this topic by the exceptional AI Edge newsletter. Damien is a very new writer on Substack, but his content is god-tier. If you are interested in learning about Machine Learning System Design, make sure you subscribe to it. It is one of the newsletters I recommend on my other Substack publication, AI Made Simple. Damien’s post is extremely detailed, so I will only cover the concepts that stood out to me. If you want to learn more about this topic, I will link his original post below. Go check it out.
There are more than 100B indexed websites and ~40K searches per second. The system needs to be fast, scalable and needs to adapt to the latest news!
-An idea of the scale of this system.
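To get a feel for what these numbers mean, here is a quick back-of-envelope sketch. The two figures from the quote are used as-is; the per-server throughput, peak factor, and bytes-per-page values are hypothetical assumptions for illustration, not numbers from Damien's post. It also simplifies a lot: a real search query fans out across many index shards rather than landing on one server.

```python
import math

# Figures from the quote above.
SEARCHES_PER_SECOND = 40_000
INDEXED_PAGES = 100_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

def searches_per_day(qps: int) -> int:
    """Queries served in one day at a constant average rate."""
    return qps * SECONDS_PER_DAY

def servers_needed(qps: int, qps_per_server: int, peak_factor: float) -> int:
    """Machines needed if peak traffic is peak_factor times the average.
    qps_per_server and peak_factor are hypothetical inputs."""
    return math.ceil(qps * peak_factor / qps_per_server)

def index_size_tb(pages: int, bytes_per_page: int) -> float:
    """Raw index size in terabytes, assuming a hypothetical fixed cost per page."""
    return pages * bytes_per_page / 1e12

if __name__ == "__main__":
    print(f"{searches_per_day(SEARCHES_PER_SECOND):,} searches/day")  # 3,456,000,000 searches/day
    print(servers_needed(SEARCHES_PER_SECOND, qps_per_server=100, peak_factor=2.0))  # 800
    print(index_size_tb(INDEXED_PAGES, bytes_per_page=1_000))  # 100.0
```

Even with made-up constants, the takeaway holds: roughly 3.5 billion queries a day and an index far too large for a single machine, which is why it has to be split up and replicated.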
Key Principles
Good Cakes have layers- Far too often, the temptation when designing systems is to spend a lot of time and resources engineering the perfect solution for each individual task. This is costly and time-consuming. You’re much better off layering multiple smaller solutions on top of each other to create a bigger system. As Damien explores a possible architecture for Google, you will see multiple examples where he combines smaller systems for better results. This is generally done by using simple, cheap filters to narrow a large corpus of documents down to a small set of candidates for a query, and then using a more complex (and more expensive) system to rank those candidates and produce the final result.
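Here is a minimal sketch of that layering idea, assuming a toy in-memory corpus. Stage 1 is a cheap inverted-index lookup that keeps any document sharing a query term; stage 2 is a costlier TF-IDF score that only runs on those candidates. The corpus, function names, and the use of TF-IDF as a stand-in for a learned ranker are all hypothetical choices for illustration, not Damien's or Google's actual design.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: doc_id -> text.
DOCS = {
    1: "google search ranks web pages",
    2: "layered systems combine cheap filters with expensive rankers",
    3: "cake recipes with many layers",
    4: "search engines filter candidates then rank them",
}

def build_inverted_index(docs):
    """Map each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def compute_idf(docs):
    """Inverse document frequency: rarer terms get higher weight."""
    n = len(docs)
    df = Counter(term for text in docs.values() for term in set(text.split()))
    return {t: math.log(n / c) for t, c in df.items()}

def retrieve_candidates(query, index, min_overlap=1):
    """Stage 1 (cheap): keep docs sharing at least min_overlap query terms."""
    counts = defaultdict(int)
    for term in set(query.split()):
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    return sorted(d for d, c in counts.items() if c >= min_overlap)

def expensive_score(query, text, idf):
    """Stage 2 (costly): TF-IDF relevance, a stand-in for a learned ranker."""
    terms = text.split()
    tf = Counter(terms)
    return sum(tf[t] / len(terms) * idf.get(t, 0.0) for t in set(query.split()))

def search(query, docs, index, idf, k=10):
    """Run the cheap filter first, then rank only the survivors."""
    candidates = retrieve_candidates(query, index)
    ranked = sorted(candidates,
                    key=lambda d: expensive_score(query, docs[d], idf),
                    reverse=True)
    return ranked[:k]

INDEX = build_inverted_index(DOCS)
IDF = compute_idf(DOCS)

if __name__ == "__main__":
    print(search("search rank", DOCS, INDEX, IDF))  # [4, 1]
```

The design point is that `expensive_score` never touches document 2 or 3 for this query. At web scale, the same shape lets a heavy model score thousands of candidates instead of billions of pages. (There is no stemming here, so "ranks" in document 1 does not match "rank".)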
Handling Multi-Modal Search- Google Search consists of far more than text. We have images, text, GIFs, and videos (in multiple languages). How can you build an engine that accommodates all of this? How can we compare the quality of search results across these different content types? One easy solution is to use user engagement as a common signal. Using user engagement to train a Universal Search Aggregator and your other models would work well. If I had to make a prediction, my guess is that Google is working on mapping these different content types into a common latent space. This would allow them to integrate YouTube into their search results more heavily. As of now, the knowledge on YouTube has been largely untouched (videos are linked in search results, but that’s it).
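As a rough illustration of engagement as a common currency across content types, here is a minimal sketch that ranks mixed web, image, and video results by a smoothed click-through rate (CTR). The `Result` fields, the prior values, and the example URLs are hypothetical, and this is not how Google's aggregator actually works: a real system would also correct for position bias and feed engagement into a trained model rather than sorting on it directly.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    vertical: str      # content type: "web", "image", "video", ...
    clicks: int
    impressions: int

def smoothed_ctr(r: Result, prior_ctr: float = 0.05, prior_weight: float = 100.0) -> float:
    """Click-through rate blended with a prior (pseudo-counts), so a result
    with very few impressions can't jump to the top on a lucky streak."""
    return (r.clicks + prior_ctr * prior_weight) / (r.impressions + prior_weight)

def aggregate(results, k=3):
    """Rank results from every vertical on the same engagement scale."""
    return sorted(results, key=smoothed_ctr, reverse=True)[:k]

# Hypothetical engagement logs for one query.
RESULTS = [
    Result("example.com/old-page", "web", 10, 1000),
    Result("example.com/photo", "image", 2, 5),
    Result("example.com/article", "web", 300, 2000),
    Result("example.com/video-tutorial", "video", 90, 500),
]

if __name__ == "__main__":
    for r in aggregate(RESULTS):
        print(r.vertical, r.url, round(smoothed_ctr(r), 3))
```

Note how the image result has a raw CTR of 40% (2 of 5) but ranks below the article and video once smoothing accounts for how little evidence it has.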
On the labeling of data- Damien rightly points out that manually labeling data at Google’s scale is not practical. I think they handle this problem using semi-supervised learning (SSL), where a small labeled dataset is combined with a much larger pool of unlabeled data. This approach lets you make use of huge amounts of unlabeled data, and it tends to work very well. The image below is one such example of Google researchers using SSL to achieve top performance.
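One common SSL technique is self-training (pseudo-labeling): train on the labeled data, label the unlabeled points the model is confident about, add them to the training set, and repeat. Below is a minimal sketch of that loop, assuming hypothetical 2-D feature vectors and a nearest-centroid classifier as a stand-in for a real model. It is not a claim about which SSL method Google uses.

```python
import math

def centroids(labeled):
    """Mean feature vector per class."""
    sums = {}
    for (x, y), label in labeled:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

def predict(point, cents):
    """Nearest centroid; confidence is a softmax over negative distances."""
    dists = {lab: math.dist(point, c) for lab, c in cents.items()}
    weights = {lab: math.exp(-d) for lab, d in dists.items()}
    label = min(dists, key=dists.get)
    return label, weights[label] / sum(weights.values())

def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly pseudo-label confident points and retrain on them."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for p in pool:
            label, conf = predict(p, cents)
            (confident if conf >= threshold else rest).append((p, label))
        if not confident:
            break  # nothing left that the model is sure about
        labeled.extend(confident)
        pool = [p for p, _ in rest]
    return labeled, pool

# Hypothetical data: one hand-labeled example per class, five unlabeled points.
LABELED = [((0, 0), "irrelevant"), ((10, 10), "relevant")]
UNLABELED = [(1, 1), (9, 9), (5, 5), (2, 0), (8, 10)]

if __name__ == "__main__":
    labeled, pool = self_train(LABELED, UNLABELED)
    print(len(labeled), pool)  # 6 [(5, 5)]
```

The ambiguous point (5, 5) sits halfway between the classes, so it is never pseudo-labeled. The confidence threshold is what keeps the model from reinforcing its own mistakes.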
These were the important principles that stood out to me in Damien’s write-up. Now go ahead and read his breakdown of the Search System. It is fairly dense, but it’s worth it (especially if you are into Machine Learning).
I created Technology Made Simple using new techniques I discovered while tutoring multiple people at top tech firms. The newsletter is designed to help you succeed, saving you from hours wasted on mediocre resources or on the Leetcode grind. Everything you need is in one place. I have a 100% satisfaction policy, so you can try it out at no risk to you. Use the button below to get 20% off for up to a whole year. Using this discount will drop the prices:
800 INR (10 USD) → 640 INR (8 USD) per month
8000 INR (100 USD) → 6400 INR (80 USD) per year
In the comments below, share the topics you want me to focus on. I’d be interested in learning about them and will cover them.
If you liked this post, make sure you fill out this survey. It’s anonymous and will take 2 minutes of your time. It will help me understand you better, allowing for better content.
https://forms.gle/XfTXSjnC8W2wR9qT9
Stay Woke,
Go kill all,
Devansh <3
Reach out to me on:
Instagram: https://www.instagram.com/iseethings404/
Message me on Twitter: https://twitter.com/Machine01776819
My LinkedIn: https://www.linkedin.com/in/devansh-devansh-516004168/
My content:
Read my articles: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd