To learn more about the newsletter, check our detailed About Page + FAQs
To help me understand you better, please fill out this anonymous, 2-min survey. If you liked this post, make sure you hit the heart icon in this email.
Recommend this publication to Substack over here
Take the next step by subscribing here
Are you ready for this one?
This problem can be found as problem 49. Group Anagrams
Problem
Given an array of strings strs
, group the anagrams together. You can return the answer in any order.
An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.
Example 1:
Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]
Example 2:
Input: strs = [""]
Output: [[""]]
Example 3:
Input: strs = ["a"]
Output: [["a"]]
Constraints:
1 <= strs.length <= 104
0 <= strs[i].length <= 100
strs[i]
consists of lowercase English letters.
The wording of the question has a very interesting hidden easter-egg. You should latch on to that in your interviews for the best results. This post will discuss this at the end.
You can test your solution here
Step 1: Analyzing the Input and Output
In a question like this, I love to analyze the inputs and outputs of the problem. This is generally a good first step if nothing immediately stands out because it can help you find some potential threads to unravel the problems.
In our case, the input is simple, but the output is interesting. We’re grouping the strings by the distribution of the chars. We know that this can be in any order. The unordered nature and grouping make me think of a particular tool that we really like in this newsletter. I’ll rewrite these more explicitly give you a hint-
We want to group the strings together by a certain property. As we build the return value, we’ll be checking if a group with this property exists within whatever data structure we use to store the return value.
The final ordering doesn’t matter. We can use a data structure that is unordered.
By this time, it should be obvious which DS we use. Our good ol’ buddy the HashMap/Dictionary. Our map will look like-
dict(charsDistribution)= [all strings that have this distibution]
where charsDistribution
is some way you store the distribution/spread of characters. This gives us the perfect next step to tackle. And we are going to run after it, with all the enthusiasm of a billion-dollar crypto firm chasing bankruptcy (click to read more).
Step 2: charsDistribution
The next logical question is simple- how do we represent the distribution of characters in a way that is hashable? We have a few options.
Use the Sorted version of every string in strs as the key. This way all anagrams will have the same form. This is simple to build. Sort every string and see if it already exists in the dict. If not, just mash the original string in there. However this is slow. If the average length of the string is m in str, then our solution will n*m logm. Not the battle you want to pick.
Use a genetic string as the key. To encode the distribution you use a modified string that tracks the characters and their counts. So the word ‘keyyz’ would have a string like 1k1e2y1z. This is a term I made up because you see this kind of representation of states in Genetic Algorithms (which are super underrated in AI). Coding this up is technically going to use the least extra space, but it will be pretty challenging and not worth it for your interview. Handling the different orderings of anagrams efficiently (ab vs ba) will present a challenge. Do it if you want to practice your coding and have nothing else going on in your life.
Sounds like we’re stuck between 2 tough choices. Here is where we will do something that physically sickens me. We will do something that I was born to crusade against (it’s not wearing pants, although that is a very close second). We will compromise.
Our compromise will be simple. We will use a little extra space to make our code quick and easy. We won’t use a ton of it, and the amount of time you save on the time complexity and not coming up with a convoluted solution will be worth it. Let’s cover that next.
Step 3: Our compromise
Here is what our compromise will look like-
Every key will be an array of length 26. Every index in that array will tell us how much of that corresponding char is present. So the key ‘abc’ gives us [1,1,1, 0,0….0].
Using this key removes the need for sorting. All the anagrams will lead to the same key. An array of this nature is inherently sorted.
To those of you that don’t know index mapping, check it for your language. In Python we use the ord function to find the ASCII of a character. We then subtract the ASCII of a character by the ASCII ‘a’ to find the correct location in our array.
correct index in distribution= ord(char) - ord("a")
Clearly, we are using some extra space. In this case, it’s not a huge deal because we are dealing with only 26 chars. But this solution will scale linearly with a larger input alphabet (time will too, but that is true for every approach). This is not a problem, but make sure you mention it for those extra brownie points.
With this out of the way, let’s code it up.
Step 4: Code
If you’ve followed along so far, then you should have no problems understanding the code.
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
ans = collections.defaultdict(list)
for s in strs:
charsDistribution = [0] * 26
for c in s:
charsDistribution[ord(c) - ord("a")] += 1
ans[tuple(charsDistribution)].append(s)
return ans.values()
To those of you confused by the tuple() that is done because arrays in Python are mutable. Mutable structures can’t be hashed. Converting to tuple() makes it immutable. If you come from a language like Java, where arrays and array lists are different, then this step is not needed.
Time Complexity- O(m*n) where m is the average length of the strings in strs
Space Complexity- O(n)
Loved the solution? Hate it? Want to talk to me about your crypto portfolio? Get my predictions on the UCL or MMA PPVs? Want an in-depth analysis of the best chocolate milk brands? Reach out to me by replying to this email, in the comments, or using the links below.
PS: The use of the word ‘typically’. If you were interviewing for a role, and you saw the word ‘typically’ dropped into the Leetcode style question, I want you to proceed like normal. It generally means nothing (like in this case). However, you want to mention it to your interviewer either in the beginning (as a clarifying question) or at the end (when talking about possible heuristic optimizations). Either one is a great demonstration of your attention to detail.
Stay Woke. I’ll see you living the dream.
Go kill all,
Devansh <3
Reach out to me on:
Instagram: https://www.instagram.com/iseethings404/
Message me on Twitter: https://twitter.com/Machine01776819
My LinkedIn: https://www.linkedin.com/in/devansh-devansh-516004168/
My content:
Read my articles: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Get a free stock on Robinhood. No risk to you, so not using the link is losing free money: https://join.robinhood.com/fnud75