How to Create Good Documentation in Software Engineering and Tech [Technique Tuesdays]
With an emphasis on Documentation for Data Science and Machine Learning
To learn more about the newsletter, check our detailed About Page + FAQs
To help me understand you better, please fill out this anonymous, 2-min survey. If you liked this post, make sure you hit the heart icon in this email.
A lot of you loved the Saturday post on how to become an effective leader in tech.
In it, I covered the importance of asynchronous communication and how using documentation could help streamline your team’s productivity.
As promised in that post, today I will cover how you can create better documentation standards. Applying these standards to your teams will pay off in spades. I will spend a lot of time referencing Machine Learning, AI and Data Science because that is my domain and the field I read the most about. However, the core of this post will be applicable across software engineering and tech in general.
Key Highlights
Why Good Documentation Matters- Good documentation provides clarity to your whole team about the desired goals and the work done to achieve them. Clarity will supercharge a team’s productivity.
The Pillars of Good Documentation- Vision for Company and Products (Trust me, this is very important); Resource/Situational Constraints; Data Sources Used, Datasets available, and processing done; Projects currently in progress; The actual code you have; Ownership
The post will cover each of these pillars individually and what you need to do to ensure that each serves its purpose to the full extent. Let’s get right into it.
Why good data documentation matters for Tech
Before we get into the specifics of how to write good documentation, let’s briefly cover why good documentation matters. This will allow us to have guiding principles for our documentation. Remember, once you establish your “why”, it is easier to figure out your how.
So why does good documentation matter in Software Engineering (and Machine Learning in particular)? After all, it can be expensive and time-consuming for a developer to go through and spend all those hours writing the documentation. Good documentation allows for the following benefits-
Helps people be on the same page. Having good documentation gives people from different teams a common understanding.
Makes the vision and plans clear. The correct actions vary based on an organization's plans, vision, and constraints. Having detailed documentation will allow everyone to figure out the next steps better. Remember, it’s hard to hit a target you can’t see. Documentation makes your targets a lot more concrete.
Reduces the onboarding time. Every time I work on a new project, the first thing I do is look through what has been done already. This involves poring over the methods, information about the data collected, the rationale behind the ML pipelines, etc. Having good documentation significantly reduces the time I would otherwise spend catching up.
If I had to summarize the benefits in a word, it would be clarity. Good documentation adds a lot of clarity across the board. This will save your business a lot of developer hours that would have otherwise been wasted looking up things repeatedly.
In an increasingly remote world, asynchronicity becomes the norm. Fantastic Documentation promotes asynchronicity. If you hate needless meetings, then promote thorough documentation. This will allow you to cut a lot of dead time going over the same ideas repeatedly. If you want to be a remote worker, then learning documentation and/or these other skills will allow you to thrive.
The details of good documentation for Software Engineering.
So now to answer the multi-million dollar question: “How do you write Good Documentation in Software Engineering/Tech?” To have good documentation, you need to address the following areas-
Vision for Company and Products (Trust me this is very important)
Resource/Situational Constraints
Data Sources Used, Datasets available, and processing done (this one is very Machine Learning Specific, but chances are that most big projects you interact with will have some degree of data processing).
Projects currently in progress
The actual code you have
Ownership
Let’s address each of these individually.
1: Company Vision and Plans
This should always be the number one priority within a company. As mentioned earlier, you can only hit the target you see. Having a clear direction and purpose will save your group a lot of energy wasted running around like a headless chicken.
So what does this look like? Documentation should clearly outline each product developed or in development, along with the business cases, how those products integrate into the larger ecosystem, and the ideal customer that your business/group is targeting. This might seem like something for the finance bros, but these are key for developers. Remember, in the end, we have to develop products that are useful to society in order to generate long-term value. Not knowing who you’re building a product for is setting yourself up for failure.
Let me drill this point home with an example of a Large Scale Automated Data Analysis company. This company takes data from their clients and does some analysis for them and returns nice insights. If this sounds familiar to my regulars, this is because the company I helped build, Clientell, does exactly this.
There are two ways our new hypothetical company can go. One is to try serving a lot of clients and make money by achieving a lot of volume (tons of clients/orders). The other is to only work for a few high-ticket clients and build very heavy solutions geared towards these clients. Either way will allow you to build a thriving business. However, the engineering challenges for each are very different. Having a clear vision will help your whole organization move towards the right goals.
2: Resource/Situational Constraints
With a clear vision, you also need to outline the constraints your team is currently working with. These constraints might be physical/resource-oriented (lack of manpower, cloud computing, finances), or domain-oriented (rules and regulations). They might even be self-imposed (meet certain baselines, use certain tools/solutions, integrate within a framework). Making these clear will be crucial, and all documentation should cover these.
3: Information about Data
Every time an organization tells me they don’t have this, I shake my head. Any serious Data Processing/Analysis/AI Company should do this. Your documentation should cover information about the data sources used, what the pipeline looks like, and what kind of processing is being done to features/information from our raw data.
Each feature being used for Data Science/Machine Learning should have its own breakdown with information about its nature (Categorical/Boolean/Numerical, etc.), the rationale for using it, and its expected range/distribution. This would also be a good place to document any priors and how you came to them.
Even if you aren’t a major Data Analysis team/project, chances are you’re working with Data in some capacity. Documenting the kind of data you’re working with, the expected ranges, etc. will help a ton in debugging and identifying next steps.
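To make the per-feature breakdown concrete, here is a minimal Python sketch of what one data dictionary entry could look like. Everything in it is an assumption for illustration: the FeatureDoc class, its field names, the deal_size_usd feature, and the crm_export source are all hypothetical and not taken from any real pipeline. The point is only that once the expected range is written down, checking incoming values against it becomes a one-liner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureDoc:
    """One entry in a hypothetical data dictionary (illustrative only)."""
    name: str
    kind: str        # nature of the feature: "numerical", "categorical", "boolean"
    rationale: str   # why this feature is used
    source: str      # where the raw data comes from
    min_value: Optional[float] = None  # lower bound of the expected range, if any
    max_value: Optional[float] = None  # upper bound of the expected range, if any

    def in_expected_range(self, value: float) -> bool:
        """Return True if value respects whichever bounds are documented."""
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


def out_of_range(doc: FeatureDoc, values: list[float]) -> list[float]:
    """Collect the values that fall outside the documented range."""
    return [v for v in values if not doc.in_expected_range(v)]


# Hypothetical feature, with placeholder text where real documentation would go.
deal_size = FeatureDoc(
    name="deal_size_usd",
    kind="numerical",
    rationale="Placeholder: explain here why the feature is used.",
    source="crm_export",
    min_value=0.0,
    max_value=1_000_000.0,
)

suspicious = out_of_range(deal_size, [1200.0, -50.0, 2_500_000.0])
print(suspicious)  # [-50.0, 2500000.0]
```

Keeping the documentation in a structured form like this is a design choice, not a requirement. A shared spreadsheet or wiki table covers the same ground, as long as the nature, rationale, and expected range of each feature are written down somewhere people can find them.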
4: Projects currently in progress
An engineer working on one project should be able to look up other projects also in the pipeline. This can help developers build a bird’s-eye view of the organization and is a must for building cross-team collaborations. It can also help your engineers build solutions with the big picture in mind, which will pay many dividends.
5: The actual code you have
We already covered this one in depth in an earlier post.
6: Ownership
My original version did not contain this section. Most organizations I’ve consulted for don’t have software operations that are extensive enough to need this, and this is rarely brought up in the engineering blogs. However, I often send my drafts to various experts to review, and one of them brought this up. Major thank you to Dhruv Garg. He is an Applied Science Manager at Amazon and someone who regularly gives me his input. He is currently hiring Machine Learning people for his team, so if you’re interested, reach out to him on LinkedIn.
He had the following to say-
Simply put, as documentation gets larger, maintaining it and searching through it becomes a pain. This is not a problem for 90% of companies in the world, since they operate locally and technology is the tool they use for their business. However, you’re all here because you want to make the big bucks in Tech, which means you will be working with Tech Companies operating at Huge Scale. Keep this advice in mind.
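One way to picture ownership is as a small registry that records who is responsible for each piece of documentation. The Python sketch below is only an illustration under my own assumptions: the DocPage class, the page titles, and the team names are hypothetical, and a real setup might live in a wiki or a repository's metadata instead. It shows two lookups that get harder as documentation grows: finding pages nobody owns, and listing what each owner is responsible for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocPage:
    """A documentation page and its (hypothetical) owner."""
    title: str
    owner: Optional[str] = None  # team or person responsible for keeping it current


def unowned(pages: list[DocPage]) -> list[str]:
    """Titles of pages with no owner assigned."""
    return [p.title for p in pages if not p.owner]


def pages_by_owner(pages: list[DocPage]) -> dict[str, list[str]]:
    """Group page titles under the owner responsible for them."""
    grouped: dict[str, list[str]] = {}
    for page in pages:
        if page.owner:
            grouped.setdefault(page.owner, []).append(page.title)
    return grouped


# Hypothetical pages and teams, for illustration only.
pages = [
    DocPage("Company vision", owner="leadership"),
    DocPage("Feature dictionary", owner="data-team"),
    DocPage("Ingestion pipeline", owner="data-team"),
    DocPage("Legacy reporting notes"),
]

print(unowned(pages))         # ['Legacy reporting notes']
print(pages_by_owner(pages))
```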
I see you living the dream.
Go kill all and Stay Woke,
Devansh <3
Reach out to me on:
Instagram: https://www.instagram.com/iseethings404/
Message me on Twitter: https://twitter.com/Machine01776819
My LinkedIn: https://www.linkedin.com/in/devansh-devansh-516004168/
My content:
Read my articles: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd