So... what is multi-modal AI? And why is the internet losing its mind about it? [Math Mondays]
For the first time in forever, the hype around GPT somewhat matches the development
Hey, it’s your favorite cult leader here 🐱👤
Mondays are dedicated to theoretical concepts. We’ll cover ideas in Computer Science 💻💻, math, software engineering, and much more. Use these days to strengthen your grip on the fundamentals and 10x your skills 🚀🚀.
To get access to all the articles and support my crippling chocolate milk addiction, consider subscribing if you haven’t already!
p.s. you can learn more about the paid plan here.
If you went on LinkedIn over the last week or two, you were probably inundated with people losing their minds over GPT gaining multi-modal capabilities. Normally, I would take some time to tell you that this is another example of the hype machine working overtime to sell you another fundamentally useless idea.
Well, this time is different. Multi-modality is a genuinely powerful development, one that does warrant the attention it is receiving. In this article, I will give you a quick introduction to multi-modality, why it’s a big deal for AI models, and some problems it can come with (remember, nothing is a silver bullet).
What is multi-modal AI- Simply put, multi-modal AI refers to AI that integrates multiple types of data (multiple modalities of information). Traditionally, we develop language models for language, acoustic models for sound, statistical models for tabular data, etc. Multi-modal models are instead trained on a mixture of these inputs in the same training process. This is typically done by running each input through an embedding model that creates a vector representation of your data in a common n-dimensional space, so that, for example, an image and a caption describing it can end up as nearby vectors.
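To make the "common n-dimensional space" idea concrete, here is a minimal sketch in Python with NumPy. It assumes two hypothetical encoders that output 512-dimensional image features and 300-dimensional text features, and it uses random matrices as stand-ins for the projection heads that a real multi-modal model would learn during training. The point is only the mechanics: inputs of different shapes from different modalities land in the same 64-dimensional space, where they can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

SHARED_DIM = 64          # hypothetical size of the common embedding space
IMAGE_FEATURE_DIM = 512  # hypothetical output size of an image encoder
TEXT_FEATURE_DIM = 300   # hypothetical output size of a text encoder

# Random stand-ins for projection heads; a real model learns these in training.
W_image = rng.normal(size=(SHARED_DIM, IMAGE_FEATURE_DIM)) / np.sqrt(IMAGE_FEATURE_DIM)
W_text = rng.normal(size=(SHARED_DIM, TEXT_FEATURE_DIM)) / np.sqrt(TEXT_FEATURE_DIM)


def embed(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project modality-specific features into the shared space and L2-normalize."""
    z = projection @ features
    return z / np.linalg.norm(z)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Both inputs are unit vectors, so their dot product is the cosine similarity."""
    return float(a @ b)


# Hypothetical encoder outputs for one image and one piece of text.
image_features = rng.normal(size=IMAGE_FEATURE_DIM)
text_features = rng.normal(size=TEXT_FEATURE_DIM)

image_vec = embed(image_features, W_image)
text_vec = embed(text_features, W_text)

print(image_vec.shape, text_vec.shape)  # (64,) (64,)
print(cosine_similarity(image_vec, text_vec))
```

Because the projections here are untrained, the similarity score itself is meaningless; it is the training process (for example, with a contrastive objective) that pushes matching image/text pairs toward each other in the shared space.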
Why multi-modality is a big deal- Instead of getting all mathy, I want you to go outside right now. Take a walk. Now look at the sky and imagine that you had a jet pack. Think of how many more paths you could take, even if you stayed in the same geographic area. Multi-modality adds another dimension to your data, allowing your model to sample from a far larger search space. In our walking example, we went from x^2 possible points to hit to x^3 points: x times as many, which is at least an order of magnitude more once x is 10 or larger. Google made a similar case when introducing Pathways, its multi-modal AI infrastructure.
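The jet pack arithmetic can be checked with a toy count. The sketch below, in plain Python, enumerates positions on a grid with a hypothetical 10 positions per axis, first on the ground (2 dimensions) and then with altitude added (3 dimensions). It illustrates the analogy only; it is not a model of how an embedding space is actually searched.

```python
from itertools import product


def reachable_points(side: int, dims: int) -> int:
    """Count grid points in a `dims`-dimensional cube with `side` positions per axis."""
    return sum(1 for _ in product(range(side), repeat=dims))


side = 10  # hypothetical number of positions along each axis (the "x" above)
walking = reachable_points(side, dims=2)  # ground only: x^2
jetpack = reachable_points(side, dims=3)  # ground plus altitude: x^3

print(walking, jetpack, jetpack // walking)  # 100 1000 10
```

Each added axis multiplies the count by the number of positions along it, which is why going from x^2 to x^3 is a factor of x rather than a fixed amount.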
The danger of multi-modality- While multi-modality is great for model performance, it doesn’t really address the more fundamental issues with GPT and LLMs that hinder their larger-scale adoption. The larger search space drives up costs, multi-modality does little to address hallucination, and many of the problems with unreliability and fragility still persist. I still maintain that many of the use cases people hype around these models can be better addressed by simpler technologies. Don’t let the shiny new thing distract you from the fundamentals.
Overall, multi-modality is really cool. It enables all kinds of applications in compression, data annotation, labeling, etc. This might be a bit of a heretical take, but I'm personally more excited by multi-modal embeddings than I am by the multi-modal AI models themselves. I might be the only one here, but I just see more utility in developing better embeddings than in building better models. That being said, in the right circumstances, integrating multi-modal capabilities into your AI models can definitely be a big dub.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
Save the time, energy, and money you would burn by going through all those videos, courses, products, and ‘coaches’ and easily find all your needs met in one place at ‘Tech Made Simple’! Stay ahead of the curve in AI, software engineering, and the tech industry with expert insights, tips, and resources. 20% off for new subscribers by clicking this link. Subscribe now and simplify your tech journey!
Using this discount will drop the prices-
800 INR (10 USD) → 640 INR (8 USD) per Month
8000 INR (100 USD) → 6400 INR (80 USD) per year (about 533 INR/month)
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819