How Vector Databases enable Generative AI [Math Mondays]
Why Vector DBs are gaining so much attention in today's markets
Hey, it’s your favorite cult leader here 🐱👤
Mondays are dedicated to theoretical concepts. We’ll cover ideas in Computer Science💻💻, math, software engineering, and much more. Use these days to strengthen your grip on the fundamentals and 10x your skills 🚀🚀.
To get access to all the articles and support my crippling chocolate milk addiction, consider subscribing if you haven’t already!
p.s. you can learn more about the paid plan here.
Here’s a crazy statistic- Vector Database startups have raised over 350 Million USD to enable the next generation of Gen AI products.
This might have you scratching your head. After all, Vector DBs are not nearly as mainstream as other kinds of Databases. Many people in the software space haven’t even heard of them. So what’s got the whole field buzzing about them- with investors dropping 100 Million USD on just one startup (Pinecone, which has a valuation of 750 Million USD)?
In today’s edition of Tech Made Simple, we will be covering the reason why Vector DBs are taking over the headlines in the Database sphere. We will explore what Vector DBs are, how they work, and why they synergize so well with AI.
Vector databases have many use cases across different domains and applications that involve natural language processing (NLP), computer vision (CV), recommendation systems (RS), and other areas that require semantic understanding and matching of data.
-Microsoft
Understanding Vector DBs
What is a Vector DB- Simply put, Vector DBs store Vectors (shocker, I know). This might seem trivial at first, but it is more powerful than you’d realize. If you’d like a full definition, here is a quote from Microsoft’s writeup on this topic-
“A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. ... The vectors are usually generated by applying some kind of transformation or embedding function to the raw data, such as text, images, audio, video, and others. The embedding function can be based on various methods, such as machine learning models, word embeddings, feature extraction algorithms.”
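To make "data as vectors" a bit more concrete, here is a minimal Python sketch. The three-number vectors and their feature meanings (is_animal, is_vehicle, size) are made up for illustration; real embeddings come out of a model and typically have hundreds or thousands of dimensions. Cosine similarity is used here as one common way to compare two vectors, not the only one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-written "embeddings": [is_animal, is_vehicle, size]
vectors = {
    "cat":   [0.9, 0.0, 0.2],
    "dog":   [0.9, 0.1, 0.3],
    "truck": [0.0, 0.9, 0.9],
}

cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
cat_truck = cosine_similarity(vectors["cat"], vectors["truck"])

# Items with similar features point in similar directions.
print(cat_dog > cat_truck)
```

The point is only that "similar meaning" turns into "nearby vectors", which is the property a Vector DB is built to search on.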
Why Vectors are so important for AI- To answer this question, let’s first understand how Gen AI works. To grossly oversimplify a complex system, Gen AI models like ChatGPT work in three steps-
The datasets are encoded into a latent space (we call these embeddings).
The latent space is used to train the model.
ChatGPT then uses this encoded space to process your query. The query is fed into the latent space. The AI traverses the latent space to find the best outputs.
This is where we see Vector DBs being very useful. The vectors stored in our Vector DB are not random values, but the latent space embeddings of our data (we’ve got bigger plot twists than Game of Thrones here). The specialization of Vector DBs in handling Vectors makes them the KDB to the Haaland of LLMs/other giant Gen AI models.
How Vector Databases work- Here is your Vector DBs for dummies crash course-
To use Vector DBs, we need the Vectors we will insert. We generate these Vectors by using AI to create vector embeddings for the data we want to index into our DB. The AI used is called our Embedding Model (EM).
The vector embeddings are inserted into our vector database. Generally, you’d want to keep some reference to the original content the embedding was created from, to help your embeddings stand out and to improve performance when we want to search through our DB.
When our application issues a query, we use the same EM to create embeddings for the query and use those embeddings to search the database for similar vector embeddings. When it comes to Gen AI like ChatGPT, we tack on another layer to this- the model uses these similarity computations to compute the most likely next word. This, in essence, is also why ChatGPT hallucinates- it will eventually pick probable words/sentences that are actually untrue. Notice that this has nothing to do with stale or incorrect data in the samples (as some claim)- it is fundamentally tied to the architecture. If you want to learn more about why LLMs hallucinate, we have a breakdown on the sister publication, AI Made Simple, over here.
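The embed, insert, and query steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real Vector DB is implemented: `toy_embed`, `VOCAB`, and `ToyVectorDB` are hypothetical names invented for this example, the "Embedding Model" is just word counts over a tiny fixed vocabulary, and the search is a brute-force scan (production systems generally rely on approximate nearest-neighbor indexes instead).

```python
import math

# Hypothetical fixed vocabulary; a real Embedding Model learns its own features.
VOCAB = ["train", "neural", "network", "chocolate", "milk",
         "recipes", "sql", "joins", "tables"]

def toy_embed(text):
    """Toy stand-in for an Embedding Model: word counts over VOCAB, scaled to unit length."""
    words = text.lower().split()
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]

class ToyVectorDB:
    """In-memory store that keeps each vector next to the content it came from."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []  # list of (vector, original content)

    def insert(self, content):
        self.rows.append((self.embed(content), content))

    def query(self, text, k=1):
        q = self.embed(text)  # same embedding function as at insert time
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), content)
                  for vec, content in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]  # brute-force scan over every stored row

db = ToyVectorDB(toy_embed)
db.insert("how to train a neural network")
db.insert("best chocolate milk recipes")
db.insert("sql joins across multiple tables")

results = db.query("chocolate milk", k=1)
print(results[0][1])
```

Note that the query goes through the same `toy_embed` function as the inserted content, and that each row keeps the original text so a match can be handed back as something readable rather than a bare vector.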
Ultimately, Vector DBs add a level of flexibility not found in traditional databases. When I worked on changing English statements (written by business users) to SQL queries that had to be executed (potentially joining multiple tables), one thing I quickly learned was that AI could only do so much. I was able to build a somewhat working prototype by instead using a relatively basic AI (compared to the monstrosities we see these days) and focusing all of my efforts on restructuring the datasets in ways that made it easier for the AI to interact with them. Vector DBs take that principle, and raise millions of dollars to crank it up to 11.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
Save the time, energy, and money you would burn by going through all those videos, courses, products, and ‘coaches’ and easily find all your needs met in one place at ‘Tech Made Simple’! Stay ahead of the curve in AI, software engineering, and the tech industry with expert insights, tips, and resources. 20% off for new subscribers by clicking this link. Subscribe now and simplify your tech journey!
Using this discount will drop the prices-
800 INR (10 USD) → 640 INR (8 USD) per Month
8000 INR (100 USD) → 6400 INR (80 USD) per year (533 INR/month)
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819