How GitHub uses GitHub Actions and Actions larger runners to build and test GitHub.com [Technique Tuesdays]
Enhancing Continuous Integration with GitHub Actions and Larger Runners
Hey, it’s your favorite cult leader here 🦹‍♂️🦹‍♂️
On Tuesdays, I cover problem-solving techniques that show up in software engineering, computer science, and LeetCode-style questions 📚📚📝📝.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction.
PS- We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee. Check out this post for more details and to find a plan that works for you.
Executive Highlights (TL;DR of the article)
Recently, I came across a piece called “How GitHub uses GitHub Actions and Actions larger runners to build and test GitHub.com”, an overview of how GitHub uses GitHub Actions for CI/CD (learn more about what CI/CD is and how it enables smooth collaboration across large, diverse teams here). I think it’s always good to study different software engineering tools to see how we can improve our own work, and the scale here stands out:
“we run 15,000 CI jobs within an hour across 150,000 cores of compute”
This article is my overview and analysis of that piece, aimed at understanding how GitHub achieves speed, efficiency, and reliability at a massive scale. To follow along, it helps to first understand GitHub Actions and Actions runners.
GitHub Actions is a workflow automation platform that allows developers to automate various tasks within the GitHub development process. These tasks can include building and testing code, deploying applications, and running other repetitive tasks. Integrating actions has 2 major benefits. First, it standardizes your process, ensuring better stability, security, and performance (you don’t miss silly little things). An example of a set of actions is shown below-
Second, it reduces the cognitive load on your devs, allowing them to focus on more important tasks. This can be a huge boost to productivity and cuts down on the amount of busy work that needs to be done.
Larger Runners- “Larger runners are GitHub Actions runners that are hosted by GitHub. They are managed virtual machines (VMs) with more RAM, CPU, and disk space than standard GitHub-hosted runners. There are a variety of different machine sizes offered for the runners as well as some additional features compared to the standard GitHub-hosted runners.”
Once we understand this, we can discuss the following ideas in more detail-
Evolving CI/CD for Enterprise Needs: GitHub’s CI/CD was facing scaling and stability issues, which led to the development of a more robust system powered by Actions.
Larger Runners: The Scalability Powerhouse:
Handling GitHub’s immense workload requires the power and flexibility of Larger Runners. These managed VMs, with their customizable hardware and autoscaling capabilities, form the backbone of their CI/CD infrastructure.
“We wanted to share some raw numbers on our current peak utilization of larger runners:
Uses 4,500 concurrent 32-core runners
Runs 125,000 build minutes per hour
Queues and runs approximately 15,000 jobs within an hour
Allocates around 150,000 cores of compute”
Larger runners give GitHub two important efficiency benefits:
Dynamic Autoscaling: Larger Runners are deeply integrated with GitHub’s infrastructure, enabling seamless autoscaling. When CI workloads increase, new runner instances are provisioned automatically to meet demand, ensuring consistently fast build and test times. This also works the other way, automatically scaling down when needed.
Fine-Tuned Resource Control: Larger Runners offer customizable virtual machine sizes, allowing GitHub to specify the precise CPU, RAM, and storage resources required for different types of workflows.
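The peak figures quoted above are easy to sanity-check with a few lines of arithmetic. The sketch below is only that, a sanity check: it assumes every concurrent runner is a 32-core machine and that one "build minute" equals one minute of a single runner's time (the write-up doesn't define the term). The function names are made up for illustration.

```python
# Peak figures quoted in GitHub's write-up.
CONCURRENT_RUNNERS = 4_500
CORES_PER_RUNNER = 32
BUILD_MINUTES_PER_HOUR = 125_000
JOBS_PER_HOUR = 15_000


def total_cores(runners: int, cores_per_runner: int) -> int:
    """Cores in use if every concurrent runner has the same core count."""
    return runners * cores_per_runner


def average_job_minutes(build_minutes: float, jobs: int) -> float:
    """Mean build minutes per job (assumes 1 build minute = 1 runner-minute)."""
    return build_minutes / jobs


def average_busy_runners(build_minutes_per_hour: float) -> float:
    """Runners kept busy on average across the hour (same assumption)."""
    return build_minutes_per_hour / 60


if __name__ == "__main__":
    print(total_cores(CONCURRENT_RUNNERS, CORES_PER_RUNNER))
    print(round(average_job_minutes(BUILD_MINUTES_PER_HOUR, JOBS_PER_HOUR), 1))
    print(round(average_busy_runners(BUILD_MINUTES_PER_HOUR)))
```

4,500 runners × 32 cores comes to 144,000 cores, which lines up with the "around 150,000" in the quote. Under the assumptions above, the average job takes a little over 8 minutes, and the average number of busy runners sits well below the 4,500 peak, which is exactly the kind of gap autoscaling is meant to absorb.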
Custom Images: Pre-built for Speed: GitHub leverages custom VM images to pre-configure build environments with project-specific tools and cached source code. This, coupled with regular image updates via a dedicated API, significantly reduces build times and optimizes resource usage.
“In practice, this has reduced the bootstrapping time of our projects significantly. Without custom images, our workflows would take around 50 minutes from start to finish, versus the 12 minutes they take today. This is a game changer for our engineers.”
Actions Features- Workflow Efficiency at Scale: Specific Actions features enhance GitHub’s CI/CD efficiency:
Reusable Workflows: Centralized, reusable workflows ensure consistency and streamline CI/CD adoption across thousands of repositories.
Reusable Workflow Outcomes: This innovative feature minimizes redundant CI runs by intelligently reusing previous successful outcomes.
Securing the Pipeline: Private Service Access: GitHub implemented a custom solution using OIDC tokens to securely grant CI workflows access to sensitive internal services within their VPC. “With this solution we are able to securely provide remote access from larger runners running GitHub Actions to our private resources within our VPC.” We won’t discuss this here, since it’s not central to the focus of this article, but check out their blog if you want to learn more.
If that sounds interesting, let’s get into it.
The Problems with Traditional CI at GitHub
Early iterations of GitHub’s Continuous Integration systems encountered challenges in:
Scaling to Meet Growing Demands: As the platform and its engineering team grew rapidly, legacy systems struggled to keep pace, leading to slow build times and resource constraints.
Maintaining Consistent Build Environments: Ensuring consistent and reliable build environments across thousands of concurrent builds proved difficult with previous CI solutions.
GitHub Actions emerged as the ideal solution, enabling a more robust and scalable CI/CD pipeline. This transition has yielded significant advantages:
Meeting Real-World Demands: Using Actions internally allows GitHub to rigorously test the capabilities of its own product under the most demanding real-world conditions, driving continuous improvement. Since GitHub builds tools for developers, internal feedback feeds directly into the development of features that customers would appreciate.
Empowering Internal Developers: The transition to a more powerful and reliable CI/CD system empowers GitHub’s engineers to focus on innovative features without worrying about infrastructure limitations.
Larger Runners came in clutch to push Actions to large scales.
Larger Runners: Engineered for Scale and Performance
We covered the most important points of Larger Runners in the TL;DR, so here’s a quote from the write-up to re-emphasize the main ideas:
“Coming from previous iterations of GitHub’s CI systems, we needed the ability to create CI machines on demand to meet the fast feedback cycles needed by GitHub engineers and to scale with the rate of change of the site.
With larger runners, we maintain the ability to autoscale our CI system because GitHub will automatically create multiple instances of a runner that scale up and down to match the job demands of our engineers. An added benefit is that the GitHub DX team no longer has to worry about the scaling of the runners since all of those complexities are handled by GitHub itself!”
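The write-up doesn't describe how the autoscaler actually decides on a runner count, so the following is a toy model of the general idea of scaling "up and down to match the job demands", not GitHub's implementation. `ScalingPolicy`, `desired_runners`, and the one-job-per-runner default are all assumptions made for illustration; only the 4,500 ceiling is taken from the peak figures quoted earlier.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical limits for a pool of runners."""
    min_runners: int = 0
    max_runners: int = 4_500  # peak concurrency quoted in the write-up
    jobs_per_runner: int = 1  # assumption: a runner handles one job at a time


def desired_runners(queued_jobs: int, running_jobs: int, policy: ScalingPolicy) -> int:
    """Return how many runners the pool should have for the current demand."""
    demand = queued_jobs + running_jobs
    needed = math.ceil(demand / policy.jobs_per_runner)
    # Clamp to the pool limits: never below the floor, never above the ceiling.
    return max(policy.min_runners, min(policy.max_runners, needed))


if __name__ == "__main__":
    policy = ScalingPolicy()
    for queued, running in [(0, 0), (120, 30), (9_000, 4_000)]:
        print(queued, running, desired_runners(queued, running, policy))
```

The point of the sketch is the shape of the decision: the target follows demand in both directions, so an idle pool shrinks to its floor and a burst is capped at whatever ceiling the pool is configured with.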
The additional compute enables something very cool:
Using Custom Images for Speed and Efficiency
Custom VM images within Actions further optimize GitHub’s CI/CD pipeline:
Pre-Configured Environments: Custom images serve as pre-configured build environments, containing all the necessary software, libraries, and tools specific to a project. This eliminates the time-consuming process of downloading and installing dependencies during each CI run, dramatically speeding up build times.
Cached Source Code for Even Faster Builds: GitHub goes beyond pre-installed dependencies by bundling a cached version of a project’s source code into the custom image. This means subsequent builds can leverage the cached code, further accelerating the process.
Dedicated API for Image Updates: To keep the cached source code and dependencies up to date, GitHub has developed a dedicated API that automatically updates custom images multiple times a day. This ensures that CI builds always start with the latest code and avoid inconsistencies.
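To put numbers on the points above, here is a small back-of-the-envelope sketch. The 50-minute and 12-minute figures come from the GitHub quote earlier in this article. The staleness helper is my own assumption: it supposes image rebuilds are evenly spaced through the day, and the rebuild count passed to it is hypothetical, since the write-up only says "multiple times a day".

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster the workflow is after the change."""
    return before_minutes / after_minutes


def percent_saved(before_minutes: float, after_minutes: float) -> float:
    """Share of the original wall-clock time that was eliminated."""
    return 100 * (before_minutes - after_minutes) / before_minutes


def worst_case_image_age_hours(rebuilds_per_day: int) -> float:
    """Oldest a cached image can be, assuming evenly spaced rebuilds."""
    if rebuilds_per_day <= 0:
        raise ValueError("rebuilds_per_day must be positive")
    return 24 / rebuilds_per_day


if __name__ == "__main__":
    print(round(speedup(50, 12), 2))      # figures from the quoted write-up
    print(percent_saved(50, 12))
    print(worst_case_image_age_hours(4))  # 4 rebuilds/day is a made-up example
```

Going from 50 minutes to 12 is roughly a 4x speedup, or about three quarters of the wall-clock time saved per run. The staleness calculation shows why frequent rebuilds matter: the more often the image is refreshed, the smaller the gap a CI run has to fetch on top of the cached source.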
To round out their use of Actions to speed up builds, GitHub also relies on a few specific Actions features, which we will discuss next.
Optimizing Actions Features for Reuse
At GitHub’s scale, any added operation can add a lot of cost. GitHub makes strategic use of specific Actions features to improve the efficiency of running Actions at scale:
Reusable Workflows for Consistency and Speed: Reusable workflows allow teams to define standard CI/CD processes that can be easily shared and implemented across thousands of repositories. This promotes consistency, eliminates redundancy, and simplifies the onboarding of new projects onto CI/CD.
Reusable Workflow Outcomes: Developed in collaboration with the Actions team, this feature intelligently identifies situations where repeating CI checks would be redundant (e.g., when file changes don’t affect the outcome of a workflow). By reusing previous successful outcomes, GitHub saves significant time and compute resources.
This helps keep costs from growing linearly with the number of CI runs, leading to significant savings in the long run.
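The write-up doesn't go into how Reusable Workflow Outcomes is implemented, so treat the following as one plausible way to model the idea, not as the feature's actual mechanism. The sketch fingerprints the files a workflow depends on and skips the run when an identical fingerprint has already passed. `outcome_key`, `OutcomeCache`, and `run_if_needed` are hypothetical names invented for this example.

```python
from __future__ import annotations

import hashlib
from typing import Callable


def outcome_key(workflow: str, relevant_files: dict[str, bytes]) -> str:
    """Fingerprint a workflow plus the contents of the files it depends on."""
    digest = hashlib.sha256(workflow.encode("utf-8"))
    for path in sorted(relevant_files):  # sorted so dict order doesn't matter
        digest.update(b"\0" + path.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(relevant_files[path]).digest())
    return digest.hexdigest()


class OutcomeCache:
    """Remembers fingerprints whose workflow run already succeeded."""

    def __init__(self) -> None:
        self._passed: set[str] = set()

    def record_success(self, key: str) -> None:
        self._passed.add(key)

    def has_passed(self, key: str) -> bool:
        return key in self._passed


def run_if_needed(
    workflow: str,
    relevant_files: dict[str, bytes],
    cache: OutcomeCache,
    run: Callable[[], bool],
) -> str:
    """Reuse a previous success when nothing relevant changed, else run."""
    key = outcome_key(workflow, relevant_files)
    if cache.has_passed(key):
        return "reused"
    if run():
        cache.record_success(key)  # only successes are worth reusing
        return "passed"
    return "failed"
```

Only successful outcomes are stored, so a failing run is always retried, and any change to a relevant file produces a new fingerprint and forces a fresh run. Changes to files outside `relevant_files` leave the fingerprint untouched, which is where the savings come from.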
All of these combine to give GitHub a much smoother CI process and a better developer experience.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
I regularly share mini-updates on what I read on the Microblogging sites X(https://twitter.com/Machine01776819), Threads(https://www.threads.net/@iseethings404), and TikTok(https://www.tiktok.com/@devansh_ai_made_simple)- so follow me there if you’re interested in keeping up with my learnings.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819