How to use Machine Learning for your Small Business [Storytime Saturdays]

Effective Machine Learning can be done at smaller scales

Dec 17, 2022

To learn more about the newsletter, check our detailed About Page + FAQs

To help me understand you better, please fill out this anonymous, 2-min survey. If you liked this post, make sure you hit the heart icon in this email.

Get a free Weekly Summary of the important updates in AI and Machine Learning here

Recommend this publication to Substack over here

Take the next step by subscribing here

Recently, a reader + viewer of my content reached out to me about Machine Learning and some specific concerns/challenges he had with his Machine Learning Journey. He asked me a list of questions, all of which I was happy to answer (he asked good, well-structured questions). Among the questions was a particular video request:

The answer to question 2, if you were wondering, is absolutely. Shocking, I know.

I realized that this would definitely be an interesting topic for me to cover. I normally cover the technical details of Machine Learning, but I don’t really talk about the business implementation side too much, especially for smaller organizations, which have a much smaller margin for error and experimentation than larger firms. In this article, I will cover how you can implement machine learning in your small organization. We’ll cover the process end to end, going over some considerations and challenges you will face on the way.

Now a lot of you are not going to work in AI directly. But there is no doubt that organizations all over the world have decided to integrate Data Analysis and AI into their procedures. Whether you’re a product manager or a developer, having insight into the procedure of implementing AI into your systems will put you in the prime position to take advantage of the new opening. As I’ve shown you in my article on what leads companies to promote the wrong people, many people get promoted by being at the right place and at the right time. Add to this the right skills, and your chances go through the roof.

Important Highlights

Step 1- Figure out your problem- Yes, this sounds obvious. But many people don’t fully understand the importance of this step. Your problem is not as simple as “Use AI in X”. It involves thinking about where you will benefit from Machine Learning. It involves thinking about what your monetization strategy is, and how implementing ML into it will benefit you. Sometimes you will see that the cost of implementing a data-driven solution will be more than the expected returns. I’ve talked to many teams that rushed to implement Deep-Learning/Neural Network based solutions while avoiding simpler solutions because they didn’t spend time thinking about the overall problem. By the end of this step, you want to have established a baseline, your expectations from the project, the minimum viable performance for your system, etc.
Step 2- Building the Pipeline- A machine learning pipeline refers to all the steps implemented in your ML system, from your data collection to the final model inference and usage of your results. Think of it as the skeleton and nervous system of your system. Many teams make the mistake of focusing mostly on the ML models, and neglect building the entire pipeline first. Their final models often become out of place and clunky. Build your full pipeline first, even if the individual components are relatively basic. By the end of this step, you should have a fully functioning, well-designed system. This system is now ready for the next steps.
Step 3- Trial and Error- This is the step where you add the horsepower to your solutions. Often, this will involve trying to add new features, adding more models, implementing alternative inference structures etc. This can be quite grindy because there is a lot of trying something out, seeing if it clicks, and going back to the drawing board. Once you’ve hit your minimum requirements, it is now time to proceed to the next step.
Step 4- Deployment- Once you have the system reaching the minimum acceptable performance, you want to send your system out into the wild. See how it interacts with your live product, ensure things are working fine, etc. If you see the returns you expected, you can always go back and add more resources to your system for better performance. However, it is important to do this only after your system has proven that it can function in the final product.

Want more deets? Here is some more information on the steps-

Step 1: Figure out your problem/solution domain

This seems trivial, but it really isn’t. When defining your problem, it is important to hash out several details. Where would you get possible sources of data? Is there a dataset the client already has that they want analyzed, or will you have to figure out the features? The former is more straightforward (I will create a video on my favorite pipeline for it soon). For the latter option, you will need to do a lot of domain research. Let’s say you wanted to identify which customers were likely to churn based on behavior. Look at the features different industries have used. Talk to a few domain experts in your own industry to see which features might be relevant and how you can relate those features to the edutech domain.

Figure out why prospective customers engage with your products. This will sharpen your approach.

Another important question concerns finances. What is your plan for profitability? For example, you could establish a firm that handles the process end to end, from data collection to recommendations. This requires a more significant investment but will provide you with a ton of experience in the field. It will also allow you to approach organizations and show them exactly how you can help. For such a setup, you will need to arrange an initial bit of cash as runway until you hit profitability. This money might come from loans, investors, or personal savings. Each of these methods has pros and cons, so make sure you look into each and decide what is best for your situation.

Your problem/domain discovery will help you figure out the solutions. Photo by Sigmund on Unsplash

When I worked with smaller organizations, I typically had the org handle the data collection side of things. I would do the domain research, tell them what they needed to collect, and then work with that data. This approach saves you from a lot of headaches. It worked out for me because I was using these smaller projects to develop my experience and land bigger projects. Either approach will work very well for you.

One of the most important things to do when implementing for smaller organizations is to set up minimum baselines. There is a big chance that your project will run out of funding/resources before you can try all your ideas. You want to plan for this event. What are the minimum acceptable results? What would the compensation look like? What other arrangements need to be made? These conversations can be awkward but must be had before significant time, energy, and resources are put into the project. The alternative is getting caught unaware when the constraints of real life sneak up on you and liver-punch you.
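To make the "minimum acceptable results" conversation concrete, here is a minimal sketch of a baseline check for the churn example. Everything in it is an assumption for illustration: the labels are made up, and `majority_baseline`, `clears_minimum`, and the 5-point `min_lift` are hypothetical names and numbers you would agree on with the client, not a standard.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

def clears_minimum(model_accuracy, baseline_accuracy, min_lift=0.05):
    """True if the model beats the do-nothing baseline by the agreed margin.

    min_lift is a hypothetical, negotiated number, not a rule.
    """
    return model_accuracy >= baseline_accuracy + min_lift

# Made-up churn labels: 1 = churned, 0 = stayed
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
baseline_label, baseline_acc = majority_baseline(labels)
print(f"Always predicting {baseline_label} gives accuracy {baseline_acc}")

# A model at 0.83 accuracy sounds good but does not clear an 0.80 baseline by 5 points
print(clears_minimum(0.83, baseline_acc))
print(clears_minimum(0.90, baseline_acc))
```

The point of the sketch is that "80% accuracy" means nothing on its own. If 80% of customers stay anyway, a model has to beat that number by a margin everyone signed off on before the project counts as viable.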

Step 2: Building the Pipeline

The next step comes in building your machine learning pipeline. Building a solid pipeline is very important because it will help you integrate different imputation policies, model training protocols, and other sources of variance.

Watch this short video to understand Machine Learning Pipelines:

When developing your pipeline, make sure you use a heavily truncated version of your dataset. This will save you a lot of time when testing out your pipeline. The goal at this stage is NOT to get the analysis done. It is just to make sure your pipeline works. Check out this video for a project that will give you the skills required to build these pipelines. I will make a video covering the details of such a pipeline soon, so make sure you subscribe.
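Here is a rough sketch of what "build the full pipeline first, with basic components, on a truncated dataset" might look like. It uses only the Python standard library and synthetic data. The stage names (`load_rows`, `fit_mean_imputer`, `train_threshold`, `run_pipeline`), the `hours` feature, and the single-cutoff "model" are all assumptions made up for this example; in a real project each stage would be swapped for your own data source, imputation policy, and model.

```python
import random
from statistics import mean

def load_rows(n=1000, seed=0):
    """Stand-in for data collection: synthetic usage hours plus a churn label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hours = rng.uniform(0, 40)
        churned = 1 if hours < 10 else 0
        # Knock out roughly 10% of values to simulate missing data
        rows.append({"hours": None if rng.random() < 0.1 else hours,
                     "churned": churned})
    return rows

def fit_mean_imputer(rows):
    """Learn the fill value from the rows it is given (training data only)."""
    return mean(r["hours"] for r in rows if r["hours"] is not None)

def apply_imputer(rows, fill):
    return [{**r, "hours": fill if r["hours"] is None else r["hours"]} for r in rows]

def accuracy(cutoff, rows):
    """Share of rows where 'hours below cutoff' matches the churn label."""
    return mean(1 if (r["hours"] < cutoff) == bool(r["churned"]) else 0 for r in rows)

def train_threshold(rows):
    """Deliberately basic model: the integer cutoff with the best training accuracy."""
    return max(range(0, 41), key=lambda cutoff: accuracy(cutoff, rows))

def run_pipeline(rows, fit_imputer=fit_mean_imputer, trainer=train_threshold):
    """Every stage is a plain function, so each can be swapped out later."""
    split = int(len(rows) * 0.8)
    train, test = rows[:split], rows[split:]
    fill = fit_imputer(train)
    train, test = apply_imputer(train, fill), apply_imputer(test, fill)
    cutoff = trainer(train)
    return {"cutoff": cutoff, "accuracy": accuracy(cutoff, test)}

# Smoke-test the whole pipeline on a truncated sample, not the full dataset
sample = load_rows()[:50]
report = run_pipeline(sample)
print(report)
```

Passing the imputer and trainer in as arguments is the design choice that matters here. Once the skeleton runs end to end on 50 rows, trying a different imputation policy or model is a one-argument change instead of a rewrite.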

Step 3: Trial and Error

Next is the non-sexy part of this process. You will need to do a lot of trial and error. Once you run your pipeline on your full datasets, you will have a ton of reports to go through. You will need to understand the different data imputation policies and the other moving parts, and evaluate their impact on the datasets. You will discover the weirdest things.

One of the things you will notice is that you will have to revisit a lot of features. You will have to test and drop a lot of them. Every time your dataset changes, you will have to rerun the pipeline. This is one of the reasons I recommend that people use smaller models. In this constant process of rebuilding and iterating, you most likely won’t be able to afford the costly models that larger firms use.
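The "test and drop features, then rerun" loop can be sketched as a simple ablation run. This is an illustration under stated assumptions, not a recipe: the data is synthetic, the feature names (`logins_per_week`, `support_tickets`, `noise`) are invented, and the nearest-centroid classifier is just a cheap stand-in chosen because it retrains almost instantly.

```python
import random

def make_rows(n=200, seed=1):
    """Synthetic churn data with two informative features and one pure-noise feature."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        churned = rng.random() < 0.3
        rows.append({
            "logins_per_week": rng.gauss(2 if churned else 6, 1.5),
            "support_tickets": rng.gauss(3 if churned else 1, 1.0),
            "noise": rng.gauss(0, 1),
            "churned": int(churned),
        })
    return rows

def fit(rows, features):
    """Nearest-centroid model: the mean feature vector for each class."""
    model = {}
    for label in (0, 1):
        group = [r for r in rows if r["churned"] == label]
        model[label] = [sum(r[f] for r in group) / len(group) for f in features]
    return model

def predict(model, row, features):
    def squared_distance(center):
        return sum((row[f] - c) ** 2 for f, c in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))

def run(rows, features):
    """Train on the first 80% of rows and return accuracy on the rest."""
    split = int(len(rows) * 0.8)
    train, test = rows[:split], rows[split:]
    model = fit(train, features)
    return sum(predict(model, r, features) == r["churned"] for r in test) / len(test)

def ablation(rows, features):
    """Rerun the whole train/evaluate cycle once per dropped feature."""
    report = {"all features": run(rows, features)}
    for dropped in features:
        kept = [f for f in features if f != dropped]
        report[f"without {dropped}"] = run(rows, kept)
    return report

features = ["logins_per_week", "support_tickets", "noise"]
for name, acc in ablation(make_rows(), features).items():
    print(f"{name}: {acc:.2f}")
```

Notice that three features already mean four full reruns. With dozens of candidate features and a dataset that keeps changing, the cost of each rerun adds up quickly, which is the practical argument for smaller models at this stage.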

Trial and Error will also give you a feel for the code and processes, allowing you to build, expand, and fine-tune the specifics of your solutions.

Step 4: Deployment

Once your theoretical solution is working, it is time to deploy it in practice. Often you will face some challenges integrating the data sources into the pipelines, to fully automate the whole process. Based on the domain, and nature of features used, you might need to set up retraining and feature monitoring protocols.

At this stage, you are basically done. Your next step should be to ensure the health of your system. If your algorithms are very client-facing (chat-bots, etc.), then you want to integrate a lot of chaos testing and robustness checks periodically into your systems. If your algorithms operate more on the backend (like recommendation systems), then this step is less important. Instead, monitoring for things like Data Drift, Model Generalization, etc. becomes a higher priority.
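As one example of what monitoring for Data Drift can look like, here is a small sketch of the Population Stability Index (PSI), a common way to compare a feature's distribution at training time against what you see in production. This is a technique I am swapping in for illustration, not the only option. The function name `psi`, the 10 bins, and the 0.25 alert level are assumptions; 0.25 is a widely quoted rule of thumb, so treat it as a starting point and tune it for your own data.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample.

    Bins are equal-width over the reference range; live values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from causing log(0)
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Made-up feature values: what the model was trained on vs. what production sends
training_values = list(range(100))
live_values = [v + 50 for v in training_values]

score = psi(training_values, live_values)
if score > 0.25:  # rule-of-thumb alert level, an assumption to tune
    print(f"PSI {score:.2f}: feature has drifted, consider retraining")
```

Running a check like this on a schedule for each important feature gives you an early warning before the drift shows up as worse predictions.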

Technology Made Simple

A guide to Chaos Engineering [Technique Tuesdays]



Final Thoughts

Obviously, setting up Machine Learning Solutions for smaller organizations can be a delicate operation. They often lack the resources that allow for larger data operations, repeated data collection, and analysis. Fortunately, you can mitigate this by spending a lot of time on Step 1, really breaking down the components.

Next, keep in mind that software is always a work in progress. In each of the steps, you want to integrate a whole suite of logs, tests, and mock ‘viruses’ to ensure that your system is capable of handling various kinds of loads and tensions. Having automated tests to monitor thresholds can be key to ensuring that you can catch problems as they start to occur.
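A threshold monitor does not need to be fancy. Here is a minimal sketch using Python's built-in `logging` module. The metric names and limits in `THRESHOLDS` and the `check_thresholds` helper are hypothetical; pick the metrics and numbers that match the minimum requirements you set in Step 1.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ml_health")

# Hypothetical limits: ("min", x) means the value must be at least x,
# ("max", x) means it must be at most x
THRESHOLDS = {
    "accuracy": ("min", 0.80),
    "latency_ms": ("max", 200),
    "missing_rate": ("max", 0.10),
}

def check_thresholds(metrics, thresholds=THRESHOLDS):
    """Return a list of breach messages and log each one as a warning."""
    breaches = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            breaches.append(f"{name}: not reported")
        elif kind == "min" and value < limit:
            breaches.append(f"{name}: {value} is below minimum {limit}")
        elif kind == "max" and value > limit:
            breaches.append(f"{name}: {value} is above maximum {limit}")
    for message in breaches:
        log.warning(message)
    return breaches

# A healthy run, then a run with low accuracy and a metric that never arrived
healthy = check_thresholds({"accuracy": 0.91, "latency_ms": 120, "missing_rate": 0.02})
unhealthy = check_thresholds({"accuracy": 0.70, "latency_ms": 120})
print(len(healthy), len(unhealthy))
```

Treating a missing metric as a breach is deliberate. A monitor that silently passes when a number stops arriving is often how problems go unnoticed.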

Loved the post? Hate it? Want to talk to me about your crypto portfolio? Get my predictions on the UCL or MMA PPVs? Want an in-depth analysis of the best chocolate milk brands? Reach out to me by replying to this email, in the comments, or using the links below.