When Statistics Lie. Anscombe's Quartet [Math Mondays]
This is a mistake made by a lot of Data Scientists, Statisticians, Deep Learning Engineers, and Mathematicians.
To learn more about the newsletter, check our detailed About Page + FAQs
To help me understand you better, please fill out this anonymous, 2-min survey. If you liked this post, make sure you hit the heart icon in this email.
The most important stage in Data Analysis is called Data Exploration.
This involves looking at the data you have collected, studying it for clues about the underlying problem, and combining those clues with your prior domain knowledge to gain useful insights. These insights guide your next step.
Often, this is an iterative process. The end of this iteration leaves you with a dataset that you can use to run more computationally expensive experiments. However, this is where a lot of people mess up. In this post/email, I will cover one of the most overlooked ideas in Data Science, Anscombe’s Quartet.
Key Highlights
What is it- Anscombe's quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x,y) points.
Why this happens- Summary statistics such as the mean, variance, and correlation compress a whole distribution into a few numbers. Outliers and influential points can pull those numbers toward them, so datasets with very different shapes can end up with the same stats.
How you can deal with it- Visualizing your data can help you spot patterns and trends that summary statistics hide. If your data has a lot of features, high-dimensional visualization techniques can be a game-changer. I like using t-SNE, which I’ve covered here
An interesting variant- This idea is so important that there are a few people dedicated to exploring variants. Below is my favorite variant, called Datasaurus Dozen. The 12 data distributions have very different shapes despite similar base stats. If you looked just at their statistics and nothing else, you would make the mistake of assuming that these were the same distributions.
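To make the highlights above concrete, here is a minimal sketch that computes the basic stats for each of the four datasets. The numbers are the values from Anscombe's published quartet as I know them (they are not listed in this post), and the names QUARTET, summarize, and summaries are made up for this example.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the simple descriptive stats and the least-squares line for one dataset."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),  # sample variance (divides by n - 1)
        "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed rows should come out nearly identical for all four datasets. Putting each (x, y) pair on a scatter plot with your plotting library of choice is what reveals how different they really are.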
If you are interested in it, especially from a more Machine Learning perspective, then watch the video below. I cover the idea in more detail, talk about its relevance in AI and Machine Learning, etc. Don’t forget to hit that like button :)
I created Technology Made Simple using new techniques discovered through tutoring multiple people into top tech firms. The newsletter is designed to help you succeed, saving you from hours wasted on the Leetcode grind. I have a 100% satisfaction policy, so you can try it out at no risk to you. You can read the FAQs and find out more here. Use the button below to get 20% off for up to a whole year.
Before proceeding, if you have enjoyed this post so far, please make sure you like it (the little heart button in the email/post). I also have a special request for you.
***Special Request***
This newsletter has received a lot of love. If you haven’t already, I would really appreciate it if you could take 5 seconds to let Substack know that they should feature this publication on their pages. This will allow more people to see the newsletter.
There is a simple form in Substack that you can fill out for it. Here it is. Thank you.
https://docs.google.com/forms/d/e/1FAIpQLScs-yyToUvWUXIUuIfxz17dmZfzpNp5g7Gw7JUgzbFEhSxsvw/viewform
To get your Substack URL, follow these steps-
Open - https://substack.com/
If you haven’t already, log in with your email.
In the top right corner, you will see your icon. Click on it. You will see the drop-down. Click on your name/profile. That will show you the link.
You will be redirected to your URL. Please put that into the survey. Appreciate your help.
In the comments below, share which topics you want me to focus on. I’d be interested in learning about them and will cover them.
If you liked this post, make sure you fill out this survey. It’s anonymous and will take 2 minutes of your time. It will help me understand you better, allowing for better content.
https://forms.gle/XfTXSjnC8W2wR9qT9
I see you living the dream.
Go kill all and Stay Woke,
Devansh <3
To get the most out of Math Mondays, make sure you check in on the other days of the week as well. Leverage all the techniques I have discovered through my successful tutoring to easily succeed in your interviews and save your time and energy by joining the premium subscribers down below. Get a discount (for a whole year) using the button below.
Reach out to me on:
Instagram: https://www.instagram.com/iseethings404/
Message me on Twitter: https://twitter.com/Machine01776819
My LinkedIn: https://www.linkedin.com/in/devansh-devansh-516004168/
My content:
Read my articles: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd