Understanding Branchless Programming [Technique Tuesdays]
What is it, how it works, and does Branchless Programming actually speed up your code?
Hey, it’s your favorite cult leader here 🦹♂️🦹♂️
On Tuesdays, I will cover problem-solving techniques that show up in software engineering, computer science, and Leetcode Style Questions📚📚📝📝.
To get access to all my articles and support my crippling chocolate milk addiction, consider subscribing if you haven’t already!
p.s. you can learn more about the paid plan here.
Recently, I came across a very interesting video going over this idea called Branchless Programming. It talked about how control structures like the if statement were very slow and taxing for the CPU and then introduced the idea of branchless programming. I’m linking it below, for any of you that want to watch.
This video had me interested in exploring the topic in more depth. So I did some digging into Branchless Programming, and I learned some very interesting things from it. So in today’s edition of Tech Made Simple, we will be going over Branchless Programming. This is a surprisingly divisive topic (is there anything that programmers don’t start forum wars over), so we will go over what branchless programming is, how it’s theoretically supposed to work, and what the debate is about.
I love this, but I'm reminded of Kernighan's law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
-One of the comments about this technique. Branchless programming is often disliked because it can make code needlessly complex.
What you should know about Branchless Programming
What is branchless programming- Branchless programming is a programming technique that eliminates or minimizes the use of branches in code. Branches are conditional statements, such as if-else statements and loops, that can cause performance penalties in modern processors.
When a branch is executed, the processor must stop what it is doing and check the condition of the branch. It can’t plan ahead- making optimization harder.
Is this really a big deal- When it comes to compiling code, CPUs use a method called Instruction Pipelining. This involves fetching the next instructions while running the current one. Running an instruction involves 4 steps-
Fetch: Fetches the next instruction needs to be excited from the memory (usually available in processor cache itself.)
Decode: After the fetch stage is completed, the instruction will be decoded
Execute: The decoder will identify the instruction and issues control signals to execute the instruction
Write Back: After execution, the results will be written back
When the CPU decodes a branch instruction it has to flush the stages and restart the pipeline. This can add up, especially when we’re dealing with conditionals in loops.
Branchless is also useful for SIMD since that lacks branches to begin with. An edge case, but worth mentioning. If you’re looking to go deeper into this subject, this is a great blog post outlining some cases where Branchless Programming might be a winner, along with some concerns about it.
Why the Debate- It should be clear that Branchless Programming comes with optimization benefits. So why is this not more mainstream? Why do colleges not teach it, why does it not show up in interviews etc? Why do so many developers dislike it? Simply put- complexity. As alluded to earlier, branchless programming gets more complicated than your last breakup (it’s right up there with Tax Law in terms of complexity). This complexity often overwrites any performance gains- simple code is better than clever code in most cases. As we covered in our post on clean code- clean code is code that can be meaningfully changed by other developers with minimal supervision. Branchless Programming is the opposite- it often requires a lot of thinking to understand the solution fully. Not great for maintenance.
My take on it- In a nutshell, I’m not a huge fan of this technique. Given my whole schtick of simplifying things- this should come as no surprise. For even moderately complex code, I don’t see this being a particularly viable solution. In my experience, you’re much better off building a simpler system over trying to optimize a complicated one. Of course, my experience is heavily filtered by my experiences in Machine Learning Engineering- where the biggest ROIs are often in picking simpler models, constraining user interactions to a few areas, and data engineering/selection. But for whatever it’s worth- the programming subreddit seems to agree with me on Branchless Programming.
Ultimately, from my research, it seems like Branchless Programming is not something that most people should be putting most of their time on. It can be very performant, but the complexity would probably make the juice not worth the squeeze. But given the size of our reader base, I’m sure some of you are assembly nerds that find your God in optimizing every micro routine. If that is you, who am I to stop you? If you have any optimizations you’re particularly proud of, share them and I will be happy to share them with the audience.
That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. If you refer people using the button below, you will get many cool benefits such as magic swords, dragons, and complimentary premium subscriptions
Save the time, energy, and money you would burn by going through all those videos, courses, products, and ‘coaches’ and easily find all your needs met in one place at ‘Tech Made Simple’! Stay ahead of the curve in AI, software engineering, and the tech industry with expert insights, tips, and resources. 20% off for new subscribers by clicking this link. Subscribe now and simplify your tech journey!
Using this discount will drop the prices-
800 INR (10 USD) → 640 INR (8 USD) per Month
8000 INR (100 USD) → 6400INR (80 USD) per year (533 INR /month)
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
It doesn't make sense to use branchless programming the vast majority of the time. As you stated, it improves performance, but at the cost of complexity - and the cost of complexity far outweighs the benefits to performance.
At least most of the time. Where the optimizations of branchless programming are useful is in the (usually very small) areas of code where performance bottlenecks are known to exist.
Implementing these techniques in just a few key places will have an outsized impact on performance.
And since these areas are small, we can put more time into commenting them extensively so it's clear to those who follow in our slime trai... uh, footsteps, what the code is doing.
So, this all means that, as in literally every other cool new programming fad/idea, branchless programming has application in specific circumstances, but isn't a panacea. A truly professional programmer needs to choose the appropriate technique for the specific circumstances at hand, not try to hammer the round pegs of a favored technique into the square holes of reality.