An Amazon Sr. Manager's Guide to Managing AI Teams Effectively [Storytime Saturdays]
How to build teams that can help you take abstract ideas to tangible AI products.
Hey, it’s your favorite cult leader here 🐱👤
On Saturdays, I will cover stories/writeups about various people’s experiences 📚📚. These stories will help you learn from the mistakes and successes of others, across topics like Leadership, Productivity, and Personal/Professional Development. Use these to zoom ahead on your goals 🚀🚀
To get access to all my articles and support my crippling chocolate milk addiction, consider subscribing if you haven’t already!
p.s. you can learn more about the paid plan here.
Nvidia’s stock is on a tear right now. They are close to touching the Trillion Dollar mark, which would put them in a rare league with Google, Apple, and other heavy hitters. Their growth has been fueled by one thing in particular- AI Hardware. As everyone and their grandmas continue to get hot and bothered about AI, Nvidia is uniquely positioned to profit. More and more companies will continue to set up AI teams and integrate Data into their operations.
However, this leads to an interesting challenge. Managing anyone is hard- but AI teams can be a special kind of nightmare to handle. The multi-disciplinary and opaque nature of the field adds extra challenges, ones that need very robust systems to tackle. Fortunately, the internet can be a great source of knowledge. Christopher Walton, a Sr Applied Science Manager at Amazon, shared 5 amazing insights about how he recommends AI teams be managed. In this article, I will be expanding upon his tips, while also adding some others that will be useful to you. Use these to get ahead of the curve in adopting AI the right way. If you don’t want to fall behind in the AI race, keep reading 📖📖.
How to Effectively Manage AI/ML Teams
There are a few principles to managing AI Teams. These involve-
Focus on rapid experimentation. AI is all about experimentation. The faster your team can experiment, the more insights they will generate. We covered this principle more generally in the piece on why you should ship code fast. This means focusing on removing blockers to experimentation, such as standardizing data quality, increasing data access, and increasing automation in different stages. We have a post on CI/CD here that might be useful to you in this regard.
Embrace failure. Experiments will fail. This is not a bad thing, since your teams can still learn. Allow time for multiple iterations of each experiment, find ways to fail fast, and have multiple alternatives available. And for the love of god, don’t punish failure/non-results. Punishing failure will disincentivize your teams from taking risks, which will kill innovation and initiative. We have a whole piece on the benefits of autonomous teams and how to foster autonomy and initiative here.
Build a community of scientists. Give the scientists within your organization spaces where they can collaborate and learn from each other. This will exponentially push the velocity of ideas in your org. I would add to this, and recommend that you integrate voices from outside academic circles into this community. When we covered why Open Source was wrecking OpenAI and Google, we pinpointed the echo chamber-esque nature of research often done in Academia/Big Tech research teams. Here is a snippet- If most people in your research teams are from this background, they will replicate many of the flaws found in academia and upper-level education.
Don’t skimp on the hiring. Scientists are not engineers or data analysts. Although many scientists are capable of writing code and performing analytics, these tasks require very different skills and interests. Hire experts in each area, or rotate scientists between tasks to avoid frustration. Losing skilled workers is a cost that you want to avoid when you can.
Invest in MLOps. Productionizing AI/ML is hard and requires dedicated engineering effort. A lot of great ML science does not reach production due to a lack of MLOps capability. Invest in MLOps engineers to ensure that successful experiments are fully deployed. Chris recommends a 2:1 ratio of Ops bros to scientists. If your ML systems aren’t mature as Amazon, I would say go for overkill. Over-index on the engineering so that you can get systems developed and the infrastructures set up.
Consider the cost of experiments. Too many people blindly follow the most popular/State of the Art methods in AI and end up burning through their budgets quicker than Haaland can break scoring records. Everything has an opportunity cost- don’t let that slip your mind. Take this recent comment on my breakdown of Google’s SpineNet architecture on our sister publication AI Made Simple. It very rightly points out that the cost of the search that Google used would not be practical for a majority of organizations. This is something that you should consider as well.
Honor your ancestors. Confucian philosophy is well known for its emphasis on bringing honor to your ancestors. A similar principle applies to effective AI. Remember, AI/Machine Learning is not the goal- it’s a tool to solve a problem. To rush into implementing Deep Learning without consulting domain experts and end-users is foolish. They can give you insights on what problems are worth solving, what factors can improve efficiency, and whether insights from AI Models even make sense to begin with. To blindly swing tech around without understanding the underlying domain is a great way to end up like Uber, Zillow, and Doordash- fancy tech, no profits.
These are points that merit their own articles. So for now, I will elaborate on a few key principles to get started. If you want more detailed follow-ups, you know how to reach me.
How to speed up your experimentation process
The first step to improving the speed of experimentation is to invest in robust data infrastructure and data processing techniques to ensure high-quality data for experiments. This may involve establishing data pipelines, implementing data validation processes, and conducting regular data audits.
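To make this concrete, here is a minimal sketch of what an automated validation gate could look like. The column names, thresholds, and file path are all hypothetical- treat this as a starting point, not the implementation:

```python
import pandas as pd

# Hypothetical validation rules- tune these for your own datasets.
REQUIRED_COLUMNS = {"user_id", "timestamp", "label"}
MAX_NULL_FRACTION = 0.05  # reject the batch if any column is >5% null

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable issues; an empty list means the batch passes."""
    issues = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        issues.append(f"Missing required columns: {sorted(missing)}")
    for col in df.columns:
        null_frac = df[col].isna().mean()
        if null_frac > MAX_NULL_FRACTION:
            issues.append(f"Column '{col}' is {null_frac:.1%} null")
    if df.duplicated().any():
        issues.append(f"{df.duplicated().sum()} duplicate rows found")
    return issues

# Run this as a gate before experiments consume the data.
batch = pd.read_parquet("data/daily_batch.parquet")  # hypothetical path
problems = validate_batch(batch)
if problems:
    raise ValueError("Data audit failed:\n" + "\n".join(problems))
```

Running a check like this on every batch turns data audits from an occasional chore into a standing guarantee.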
This is where Automation can really kick things up a notch. Automation saves valuable developer time and allows scientists and engineers to focus on building as opposed to verification and other time-consuming but critical tasks. It can also reduce errors from human oversight. Integrating routine checks and tests will only help you in designing better systems. In a similar vein, improving your org’s hierarchies to allow for quicker, more democratic data access can be very beneficial to faster experimentation. If you’re worried about Data Privacy- there are lots of data masking techniques that can improve privacy while keeping your data somewhat intact. This might be worth pursuing if you handle sensitive data.
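As one example of masking, salted hashing of identifier columns keeps values consistent (so joins and group-bys still work) without exposing the raw data. A quick sketch- the column names and file path are made up:

```python
import hashlib
import pandas as pd

# A secret salt prevents trivial reversal via rainbow tables.
SALT = "load-me-from-a-secrets-manager"  # placeholder, not a real secret

def mask_column(series: pd.Series) -> pd.Series:
    """Replace raw values with salted SHA-256 hashes.

    The mapping is deterministic (same input -> same hash), so the masked
    data still supports joins and aggregations.
    """
    return series.astype(str).map(
        lambda v: hashlib.sha256((SALT + v).encode()).hexdigest()
    )

df = pd.read_parquet("data/users.parquet")   # hypothetical path
for pii_col in ("email", "phone"):           # hypothetical PII columns
    df[pii_col] = mask_column(df[pii_col])
```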
Lastly, you want to implement a comprehensive experiment-tracking system to capture experiment details, parameters, results, and insights. This allows scientists to analyze and learn from previous experiments, avoiding duplication of effort and facilitating knowledge sharing within the team. This is an investment that will continue to pay for itself as your operations expand.
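If you use a tool like MLflow (any tracking tool works- the point is capturing parameters, results, and takeaways in one searchable place), a tracked run might look like this sketch. The experiment name, parameters, and metric values are placeholders:

```python
import mlflow

mlflow.set_experiment("churn-model-v2")  # hypothetical experiment name

params = {"model": "xgboost", "max_depth": 6, "learning_rate": 0.1}

with mlflow.start_run():
    mlflow.log_params(params)
    # ... train and evaluate the model here ...
    mlflow.log_metric("validation_auc", 0.87)  # placeholder result
    mlflow.set_tag("owner", "science-team-a")
    mlflow.set_tag("insight", "deeper trees overfit on sparse features")
```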
Using Failure for better results
The single most important thing you can do to utilize failure effectively is to not punish it. Remember, humans are geared toward loss aversion, so punishing failure will wreck morale and drive. Instead, you want to treat mistakes as learning opportunities. A senior AI manager from Amazon told me that his team had a simple policy for mistakes/errors- a mistake would not be penalized if the person who made it documented it, analyzed what caused it, and explained how it would be avoided in the future. This approach is great because it allows people to learn from their (and other people’s) mistakes. If you start building internal productivity tools, then having such a log will also help you design the best tools.
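To illustrate, here is a minimal sketch of what a structured, blameless mistake-log entry could look like. The fields are my own suggestion, not the Amazon team’s actual template:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PostmortemEntry:
    """One blameless record per mistake- the 'price' of not being penalized."""
    summary: str      # what went wrong, in one or two sentences
    root_cause: str   # what actually caused it (not who)
    prevention: str   # the concrete change that avoids a repeat
    author: str
    date_logged: date = field(default_factory=date.today)

# Example entry- the contents are illustrative.
entry = PostmortemEntry(
    summary="Training job consumed last week's stale feature snapshot.",
    root_cause="Pipeline date parameter defaulted to a hardcoded value.",
    prevention="Fail the job if the snapshot is older than 24 hours.",
    author="jdoe",
)
```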
Collaborative Science: Leveraging peer reviews and cross-team partnerships
To foster collaboration and peer review processes within AI teams, organizations can implement the following practices:
Regular Peer Review Sessions: Establish a culture of peer reviews where scientists/developers can share their experimental designs, methodologies, and preliminary results with their peers. Make this a part of their work, so you don’t stack too much on your scientists’ plates.
Leverage Async formats for idea discussion: Group chats and social media platforms (even internal comms platforms count here) are all async, which comes with its own benefits. Develop async channels for the explicit purpose of sharing ideas and approaches. This facilitates the cross-pollination of ideas, creates a better community, and exposes scientists to diverse perspectives.
Cross-Team Partnerships: This is a favorite of Fidel Rodriguez, Director of Analytics at LinkedIn. In my conversation with him, he mentioned that he loves Gemba Walks- where people from other teams come and ask questions about a product/feature. Their outside perspective allows for more diverse views. To see the other leadership techniques that Fidel loves, check out the article Organizational lessons from a Director of Analytics at LinkedIn.
I don’t have much to say about the others. If you really want, I will research and do a dedicated follow-up. However, based on early readings- they feel a lot more context-dependent- so it would be hard to give any truly useful ideas here. If you have anything to share, please go ahead.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma's favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Read my articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819