Building a Simple Live Streaming Platform [System Design Sundays]
Putting various concepts together for the interview
Hey, it’s your favorite cult leader here 🐱👤
On Sundays, I will go over various Systems Design topics ⚙️. These can be mock interviews, writeups by various organizations, or overviews of topics that you need to know to design better systems. 📝
To get access to all the articles, support my crippling chocolate milk addiction, and become a premium member of this cult, use the button below-
p.s. you can learn more about the paid plan here.
“Design a platform like Twitch” is one of the most common system design interview questions you might be asked. In today’s post on System Design, I thought it would be useful to walk through the various steps involved in live-streaming and what I would look at in each one:
The Journey of a Twitch Stream
Ingestion: “In the world of video, video ingesting refers to the process of taking raw footage from a camera and getting it into your editing system. It’s not just about getting your content from point A to point B — it’s also about ensuring everything is digitized properly while preserving its original quality. This also means capturing all the metadata that comes with every video file your company produces.” For a live-streaming platform, this would be getting the video into the encoders and storage. It all starts with the broadcaster. Streaming software (like OBS Studio) encodes the raw video and audio stream, typically using RTMP (Real-Time Messaging Protocol). This stream is then pushed to one of Twitch's geographically distributed ingest servers.
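To make ingestion concrete, here's a minimal sketch of the broadcaster's side, using ffmpeg to encode a local file and push it over RTMP (roughly what OBS does when you hit "Start Streaming"). The ingest URL and stream key are made up for illustration:

```python
import subprocess

# Hypothetical ingest endpoint and stream key -- replace with real values.
INGEST_URL = "rtmp://live.example-ingest.tv/app"
STREAM_KEY = "my_stream_key"

cmd = [
    "ffmpeg",
    "-re",                       # read input at its native frame rate (simulate live)
    "-i", "sample_footage.mp4",  # source; in real life this is the camera/capture card
    "-c:v", "libx264",           # H.264 video encoding
    "-preset", "veryfast",       # favor encoding speed over compression
    "-b:v", "3000k",             # target video bitrate
    "-c:a", "aac",               # AAC audio encoding
    "-b:a", "160k",
    "-f", "flv",                 # RTMP expects an FLV container
    f"{INGEST_URL}/{STREAM_KEY}",
]

subprocess.run(cmd, check=True)
```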
Transcoding: Twitch can't simply send the raw stream to viewers because internet speeds and device capabilities vary between users and geographies. Transcoding is the process of converting the original stream into multiple resolutions and bitrates (e.g., 1080p, 720p, 480p). This creates a 'quality ladder' that allows viewers to adjust playback seamlessly based on their network conditions. For this, you might consider the open-source transcoder HandBrake.
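Here's a rough sketch of how a transcoding job might fan one source out into a quality ladder by shelling out to ffmpeg. The ladder values are illustrative rather than Twitch's actual settings, and a real pipeline would transcode the live feed rather than a file:

```python
import subprocess

# An illustrative quality ladder -- real platforms tune these numbers carefully.
QUALITY_LADDER = [
    {"name": "1080p", "height": 1080, "video_bitrate": "6000k"},
    {"name": "720p",  "height": 720,  "video_bitrate": "3000k"},
    {"name": "480p",  "height": 480,  "video_bitrate": "1500k"},
]

def transcode_rendition(source: str, rendition: dict) -> None:
    """Transcode the source stream into one rung of the quality ladder."""
    cmd = [
        "ffmpeg", "-i", source,
        # scale to the target height, keeping the aspect ratio (-2 = even width)
        "-vf", f"scale=-2:{rendition['height']}",
        "-c:v", "libx264", "-b:v", rendition["video_bitrate"],
        "-c:a", "aac", "-b:a", "128k",
        f"output_{rendition['name']}.mp4",
    ]
    subprocess.run(cmd, check=True)

for rung in QUALITY_LADDER:
    transcode_rendition("original_stream.mp4", rung)
```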
Packaging and Segmentation: Transcoded streams are neatly packaged into formats suitable for internet delivery, primarily HLS (HTTP Live Streaming). In HLS, the video is broken down into short segments (a few seconds each), making it easier to distribute and buffer.
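As a small illustration, here's how a packager might generate an HLS master playlist that points viewers at the different renditions. The rendition paths and bandwidth numbers are made up; in practice a packager or ffmpeg itself would emit this:

```python
# Illustrative renditions; BANDWIDTH is in bits per second.
RENDITIONS = [
    {"playlist": "1080p/index.m3u8", "bandwidth": 6_500_000, "resolution": "1920x1080"},
    {"playlist": "720p/index.m3u8",  "bandwidth": 3_500_000, "resolution": "1280x720"},
    {"playlist": "480p/index.m3u8",  "bandwidth": 1_700_000, "resolution": "854x480"},
]

def build_master_playlist(renditions: list[dict]) -> str:
    """Build an HLS master playlist listing every rung of the quality ladder."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r['bandwidth']},RESOLUTION={r['resolution']}"
        )
        lines.append(r["playlist"])
    return "\n".join(lines) + "\n"

print(build_master_playlist(RENDITIONS))
```

The player downloads this master playlist first, then picks one of the rendition playlists based on its bandwidth estimate.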
Content Delivery Network (CDN): A CDN is a vast network of servers spread around the world. CDNs help reduce latency and improve performance by ensuring that users are served video from a server geographically close to them. CDNs also allow platforms to offload the cost and effort of maintaining edge servers to a specialized provider and focus on their own strengths. Many platforms like Twitch tend to utilize a combination of their own servers and an external CDN.
Fast, Temporary Storage
RAM (In-Memory Caches): Ideal for holding the most recent HLS segments for each stream, across all quality renditions. This ensures minimal delay between a segment being transcoded and its availability to viewers. Redis is often used for this type of caching layer (a minimal sketch follows this list).
SSDs (Solid State Drives): Used for slightly less immediate needs. They might store a larger buffer of HLS segments, offering quick retrieval if a viewer's device requests a 'chunk' from further back in the stream.
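Here's a minimal sketch of that in-memory caching layer using Redis, assuming a locally running instance; the key naming scheme and 60-second TTL are just illustrative choices:

```python
import redis

# Assumes a local Redis instance; keys and TTLs are illustrative.
cache = redis.Redis(host="localhost", port=6379)

def cache_segment(stream_id: str, rendition: str, seq: int, data: bytes) -> None:
    """Store a freshly transcoded HLS segment with a short TTL."""
    key = f"segment:{stream_id}:{rendition}:{seq}"
    # Segments are only useful for a short window in a live stream,
    # so let Redis evict them automatically after 60 seconds.
    cache.setex(key, 60, data)

def get_segment(stream_id: str, rendition: str, seq: int) -> bytes | None:
    """Serve a segment from cache; fall back to slower storage on a miss."""
    return cache.get(f"segment:{stream_id}:{rendition}:{seq}")
```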
Long-term Object Storage
Cloud Object Storage (Amazon S3, Azure Blob Storage): Highly scalable storage solutions designed for storing huge volumes of data. This is where recorded streams (VODs) would reside. These services offer different storage tiers, allowing for cost optimization by moving older, less watched VODs to less expensive storage classes.
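A quick sketch of archiving a finished stream to S3 with boto3; the bucket name, key layout, and storage class choice are hypothetical, and a real setup would likely pair this with lifecycle rules that push old VODs into even cheaper archival tiers:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name.
VOD_BUCKET = "example-vod-archive"

def archive_vod(local_path: str, stream_id: str) -> None:
    """Upload a finished recording, putting it straight into a cheaper tier."""
    s3.upload_file(
        local_path,
        VOD_BUCKET,
        f"vods/{stream_id}.mp4",
        # Older, less-watched VODs don't need the hottest storage class.
        ExtraArgs={"StorageClass": "STANDARD_IA"},
    )

archive_vod("recordings/stream_12345.mp4", "stream_12345")
```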
Databases
Relational Databases: Structured storage and querying for core system data: user accounts, stream metadata (titles, categories), subscription information, and other essential platform state.
NoSQL Databases: Provide low-latency access to high-volume data like chat messages, real-time viewer counts, emotes, and other elements that scale heavily as viewership grows. For some of this data, like chat messages sent during a stream, a platform might even consider deleting it after the stream ends, since it no longer provides much value (a sketch of this follows the database notes below).
“Twitch has approximately 125 database hosts serving OLTP workloads in production, usually as part of a cluster. Approximately 4% run MySQL, 2% run Amazon Aurora, and the rest run PostgreSQL. We manage the provisioning, system image, replication and backup of several of the databases though most new clusters are RDS for PostgreSQL.
The most interesting cluster we manage is the original central database dating back to the origins of Twitch. In aggregate, that cluster averages over 300,000 transactions per second. We build and maintain our own specialized infrastructure to keep this stable, responsive, and capable of dealing with the varied use cases it supports.”
- This quote is from a very old article, but it's still an interesting read.
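To make the NoSQL point above concrete, here's a hedged sketch of writing chat messages to DynamoDB with a TTL attribute so old chat expires on its own. The table layout and attribute names are hypothetical, and TTL would need to be enabled on the table separately:

```python
import time
import uuid
import boto3

# Hypothetical table; assumes DynamoDB TTL is enabled on the "expires_at" attribute.
chat_table = boto3.resource("dynamodb").Table("chat_messages")

def post_chat_message(stream_id: str, user: str, text: str) -> None:
    """Write a chat message that the database will expire a day later."""
    now = int(time.time())
    chat_table.put_item(
        Item={
            "stream_id": stream_id,           # partition key
            "message_id": str(uuid.uuid4()),  # sort key
            "user": user,
            "text": text,
            "sent_at": now,
            # Old chat adds little value, so let the database delete it.
            "expires_at": now + 24 * 3600,
        }
    )
```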
Key Technical Challenges and Solutions
Low Latency: Live streaming is all about immediacy. A live streaming platform like Twitch might employ several tricks to minimize delay:
Small Segment Sizes: Shorter HLS segments mean less buffering on the viewer's side. Using smaller segments also ties in well with adaptive bitrates (a rough latency estimate follows this list).
Optimized Protocols: Newer protocols, like WebRTC or low-latency variations on HLS, are emerging to further reduce streaming latency.
Global Edge Network: Placing servers as close to viewers as possible reduces network transit times.
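As a rough back-of-the-envelope on why segment size matters: players usually buffer a few full segments before playback starts, so segment length dominates the glass-to-glass delay. All the numbers here are illustrative:

```python
def estimated_latency(segment_seconds: float, buffered_segments: int,
                      encode_seconds: float = 1.0, network_seconds: float = 0.5) -> float:
    """Very rough glass-to-glass latency estimate for segment-based delivery."""
    # The player buffers several full segments before it starts playing,
    # so segment length is multiplied, while encode/network delay is fixed.
    return encode_seconds + network_seconds + segment_seconds * buffered_segments

# Illustrative numbers only:
print(estimated_latency(segment_seconds=6, buffered_segments=3))  # ~19.5 s
print(estimated_latency(segment_seconds=2, buffered_segments=3))  # ~7.5 s
```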
Scalability: Millions of concurrent streams are not easy to manage. Twitch relies heavily on:
Load Balancing: Requests are intelligently distributed across its ingest and transcoding infrastructure to prevent bottlenecks (a toy sketch follows this list).
Autoscaling: Resources are spun up or down elastically based on demand, ensuring viewers get a consistent experience.
Custom Solutions: Twitch has moved away from some off-the-shelf tools (like HAProxy for load balancing, which it replaced with Intelligest) towards its own tailored solutions for maximum performance.
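For intuition on the load-balancing piece, here's a toy consistent-hashing sketch for mapping stream keys to ingest servers. This is not how Intelligest works internally, just a common technique for spreading keys across servers while minimizing reshuffling when servers come and go:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring for mapping stream keys to ingest servers."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        self._ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):  # virtual nodes smooth out the distribution
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, stream_key: str) -> str:
        """Pick the first server clockwise from the stream key's position."""
        idx = bisect.bisect(self._keys, self._hash(stream_key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["ingest-us-east-1", "ingest-eu-west-1", "ingest-ap-south-1"])
print(ring.server_for("streamer_42"))
```

The virtual nodes keep load roughly even, and removing one ingest server only remaps the streams that were assigned to it.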
Reliability: With so many streams, things are bound to go wrong. Twitch must ensure resilience through:
Redundancy: Multiple copies of data and backup servers minimize the effect of failures.
Resilient Encoding: Streams might be encoded multiple times, in different formats, to ensure viewers can continue watching even if there are encoder problems.
Thorough Monitoring: Extensive systems to track performance metrics help engineers spot and address issues swiftly. This is extremely important because a platform like Twitch loses a lot of money whenever it goes down. It's better to have proactive monitoring and anomaly detection to catch potential problems before they escalate.
A Step Beyond
The components mentioned above are enough to cover the basics. However, to go above and beyond, it's good to study some more advanced technologies in depth. Here are some other things related to live-streaming that would be worth studying:
Adaptive Bitrate Algorithms: Deciding which quality to serve a viewer isn't trivial. Complex algorithms take into account network conditions, device type, and more (a simple sketch follows this list).
Advanced Caching: Smart caching strategies at the edge mean less data needs to travel all the way from origin servers. Intelligent techniques like caching the most popular streams/clips at the edge to reduce serving time might be worth exploring.
Community Features: Chat systems, synchronized viewing, and other interactive features all add layers of complexity.
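As a taste of the adaptive-bitrate problem, here's the simplest possible throughput-based heuristic. Real players (hls.js, dash.js, and the like) also weigh buffer occupancy, device capabilities, and throughput variance; the ladder and safety factor below are illustrative:

```python
# Illustrative ladder: (name, required bandwidth in kilobits per second), best-first.
LADDER = [("1080p", 6500), ("720p", 3500), ("480p", 1700), ("160p", 400)]

def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Pick the best rung we can sustain with some headroom for throughput dips."""
    budget = measured_kbps * safety_factor
    for name, required_kbps in LADDER:
        if required_kbps <= budget:
            return name
    return LADDER[-1][0]  # fall back to the lowest rung

print(pick_rendition(5000))  # -> "720p" (budget 4000 kbps can't sustain 1080p)
print(pick_rendition(1500))  # -> "160p" (budget 1200 kbps)
```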
These are just some of the factors that I would consider while designing apps like Twitch. What would you add/change? Let me know in the comments or by replying to the email.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
Save the time, energy, and money you would burn by going through all those videos, courses, products, and ‘coaches’ and easily find all your needs met in one place at ‘Tech Made Simple’! Stay ahead of the curve in AI, software engineering, and the tech industry with expert insights, tips, and resources. 20% off for new subscribers by clicking this link. Subscribe now and simplify your tech journey!
Using this discount will drop the prices-
800 INR (10 USD) → 640 INR (8 USD) per Month
8000 INR (100 USD) → 6400 INR (80 USD) per year (533 INR/month)
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819