2021-02-27

Practical ML end-to-end project: labelling, training, inference, serverless deployment

The End

The goal of this project was to make a system that could play the card game Quiddler alongside human players. The reason for doing it was to have a vehicle to learn new technology. Therefore, to make this challenging I wanted a cloud hosted system to read the physical cards from a camera and deduce the best play.

And here's the result: quiddler.jerbly.net


When you visit this web page you can open the webcam and then take a photo of the cards in your hand and the card on the deck. Provide the number of cards in your hand as a hint and then ask for the best play. The screenshot shows that the cards have been recognised successfully and the suggested play is to pick up the deck card "o" and drop the "a", then make the words "xenon" and "udo" for a score of 37.

So what happened?

  1. quiddler.jerbly.net is a custom domain assigned to an Azure Static Web App.
  2. The vue.js web app takes the webcam images and sends them to an Azure Function App.
  3. The function runs an object detection model over the image to find the letters on the cards and returns them to the web app. This IceVision model was built using hand labelled photos and trained using AzureML.
  4. The web app then sends the letters to a second function to get the suggested play.
  5. This second function uses a recursive searching method over a prefix tree index of a dictionary.
  6. The result is sent back to the page and rendered.  

There's a lot going on here. This long blog post will take you through the journey:

  • Object detection: Data labelling, model training and inference on AzureML.
  • The Game - using prefix trees to find the best play
  • Using a serverless Azure Function App for Object Detection Inference and the game
  • Creating and hosting the vue.js web app as an Azure Static Web App

Object Detection

My first attempt at a system to read the cards was through OCR. Surely reading characters is a solved problem? I went down a rabbit hole here looking at classic CV techniques and I even recreated the EAST paper so I could train my own text detector. I learned a lot about deep neural networks, pytorch, fastai and machine learning in general doing this, but it didn't make a good Quiddler card recognizer! EAST gives you rotated bounding boxes over areas of the scene that look like text. The idea was to get these rboxes, cut them out, rotate them back to horizontal, filter and pass to Tesseract. It was not great.

Then it dawned on me to stop seeing the characters on the cards as text and to treat them just as things, objects. What I needed was a good object detection framework with built-in image augmentation. When studying deep learning it's common to train a network to recognize cats and dogs, the idea being that if I show it enough cats and dogs it learns cattiness and doggy-ness. Then if I show it a picture of a cat it's never seen before, it'll recognize it as a cat since it scores higher for cattiness. In that experiment you spend a lot of time worrying about generalizing for any cat, not just the ones in the training set. In the case of these cards, though, it's subtly different. I only want to detect this exact text: if it's an A, I'm not trying to generalize for all the different fonts, handwriting etc. I only need to detect the A as it appears on these cards. The variations that do need to be covered are to do with how the cards are held up to the camera: rotation, zoom, lighting and perspective warping. Since these are relatively easy to synthesize through image augmentation, I didn't have to go crazy and take 10,000 photos and label them. In fact I only used 120 photos for 31 different cards!

Labelling

I used an iPhone, a webcam and VGG Image Annotator to make the dataset. Since I started on the EAST project and rotated boxes, I had marked these up as polygons rather than flat boxes. This doesn't matter though as you can easily put a flat box around a polygon by just finding the min and max.
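As a minimal sketch of that min/max step: VIA stores a polygon region as two lists, all_points_x and all_points_y, so the enclosing flat box falls straight out of them. The function name and the sample coordinates below are made up for illustration.

```python
def polygon_to_bbox(all_points_x, all_points_y):
    """Return (xmin, ymin, xmax, ymax) of the axis-aligned box enclosing a polygon."""
    return min(all_points_x), min(all_points_y), max(all_points_x), max(all_points_y)


# Made-up rotated quadrilateral around one letter, in VIA's polygon layout
shape = {
    "name": "polygon",
    "all_points_x": [120, 180, 170, 110],
    "all_points_y": [40, 55, 130, 115],
}

bbox = polygon_to_bbox(shape["all_points_x"], shape["all_points_y"])
print(bbox)
```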

It's pretty amazing what you can achieve with such a small dataset. Here's a screenshot from the VIA grid view of the entire labelled training set:

IceVision

I found the IceVision object detection framework via the fastai Discord. It seemed to be the perfect fit for my problem, making it really simple to train a Faster R-CNN model on my dataset. All that was needed was a parser for the VIA json output. I originally wrote this standalone, but I later submitted a PR to add it to the library.

Training

Training went well locally on my TensorBook but I was interested to try this out on AzureML to capture statistics from multiple training runs and to use the cloud compute for larger batch sizes (the RTX 2080 in the TensorBook has 8GB). The AzureML Quiddler notebook shows you how to upload and register a dataset, train the model and then register the best performing model in your workspace. The main concept in AzureML is that you write a training python script that takes in arguments for hyperparameters etc. You then provide the SDK with an environment containing the conda or pip dependencies and it builds and registers a docker container for you. Then you launch the script via the container with the parameters you want on the compute you specify. The compute can be your local machine or a cluster in your AzureML workspace.
The gist above runs the training.py script with 3 epochs, batch_size 6 etc. on local compute. The default base image for the environment contains python 3.6 on Ubuntu 16.04. To match my dev environment I wanted python 3.8 on Ubuntu 18.04. To achieve this you need to specify an alternate base image and specify python 3.8 in the dependencies yaml file:

Dataset

Going back to the launch code for a moment you'll see the input_data argument. Here I pass it a dataset to be used by the training script as a download since I only have 120 photos. Alternatively you can ask for this to be a mount when you have a larger dataset. AzureML takes care of getting the dataset in a location for you and provides the script with a path.

To look at the registered dataset you can use the AzureML Workspace via the Portal:

 

Statistics

While the training is running you get a nice view inside your notebook. To make this really useful you can use a group of logging functions from the Run class inside your script to post data to your training run. By hooking this up to the fastai callback system you can log your losses and metrics for each epoch quite easily:
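Here is a rough sketch of the logging side only, not the original callback. In a real training script the run object would come from Run.get_context() in azureml.core, and a fastai callback would call something like log_epoch() from its after_epoch event. The helper name, the PrintRun stand-in and the metric values are all invented so the sketch runs without a workspace.

```python
def log_epoch(run, metric_names, values):
    """Post one epoch's worth of named values to an AzureML-style run."""
    for name, value in zip(metric_names, values):
        if value is not None:          # some recorder columns can be empty
            run.log(name, float(value))


class PrintRun:
    """Stand-in for azureml.core.Run: same log(name, value) shape, no cloud."""

    def __init__(self):
        self.logged = []

    def log(self, name, value):
        self.logged.append((name, value))
        print(f"{name}: {value}")


run = PrintRun()  # in a real script: run = Run.get_context()
log_epoch(run, ["train_loss", "valid_loss"], [0.52, 0.61])  # invented numbers
```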
Then the notebook widget produces this:
I am working on another project to make an AzureML / fastai / IceVision helper library where you can get this and more. So by the time you're reading this there could be another dependency.

Not only do the statistics go to your notebook but they're also logged in your AzureML workspace and visible in the portal. You can then do neat things like compare experiment runs:
The Run class also contains log methods for other data types like tables and arrays. You can also log images - at the end of the training I run inference on a test set and log the output as a table with its images:
You can then look at these in the portal:

Results

My best run as determined by my test set accuracy ran for 200 epochs in 35 minutes with a small batch size, 6. This run correctly identified every card in the test set. I wanted a practical way to confirm that the model was going to be good enough though so I created a quick script to predict cards in real-time from a webcam. This allowed me to manipulate the cards in front of the camera and spot any weaknesses:

The Game

On your turn you need to try to make the best scoring word or words from your hand. You can also substitute one of your cards with the card face up on the discard pile, or take the next unseen card from the face-down deck instead. The complication for the algorithm is that you're not trying to make the longest single word but to use up all the cards in your hand on multiple words. Also, some of the cards are double-letter cards, IN for example.

Prefix Trees are used to hold the structure for all possible words and the permutations given the double-letter cards. For example the word: "inquiring" can be constructed from the cards in 8 ways:

            'in/qu/i/r/in/g': 36,
            'in/qu/i/r/i/n/g': 36,
            'in/q/u/i/r/in/g': 46,
            'in/q/u/i/r/i/n/g': 46,
            'i/n/qu/i/r/in/g': 36,
            'i/n/qu/i/r/i/n/g': 36,
            'i/n/q/u/i/r/in/g': 46,
            'i/n/q/u/i/r/i/n/g': 46
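
To illustrate where those 8 ways come from, here is a small sketch that enumerates the card splits of a word. It is not the code from the project: the function name is made up, it ignores scores, and the set of double-letter cards is an assumption based on the standard deck.

```python
# Assumed set of double-letter cards in the deck
DOUBLE_CARDS = ("qu", "in", "er", "cl", "th")


def card_splits(word, doubles=DOUBLE_CARDS):
    """Return every way to spell `word` as a list of single- and double-letter cards."""
    if not word:
        return [[]]
    # Option 1: spend a single-letter card on the first letter
    splits = [[word[0]] + rest for rest in card_splits(word[1:], doubles)]
    # Option 2: spend a double-letter card if the word starts with one
    if word[:2] in doubles:
        splits += [[word[:2]] + rest for rest in card_splits(word[2:], doubles)]
    return splits


ways = ["/".join(split) for split in card_splits("inquiring")]
for way in ways:
    print(way)
```

Each of the three places where a double-letter card could be used ("in", "qu", "in") doubles the number of spellings, which gives the 2 x 2 x 2 = 8 entries listed above.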

As the game progresses you start each round with an increasing number of cards in your hand. The last round has 10 cards. The implementation takes the hand cards and deck card and suggests the best play as a result.

Hand:     a/cl/g/in/th/m/p/o/u/y
Deck:     n
Score:    58
Complete: True
Words:    ['cl/o/th', 'm/u/n/g', 'p/in/y']
Pick up:  n
Drop:     a
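
The recursive search can be sketched roughly as follows. This is a simplified illustration, not the project code: it uses a plain nested dict instead of pygtrie, a four-word stand-in dictionary, and it only finds single words that can be spelled from the available cards. The real solver also has to combine words to use up the hand, score them, and choose the pick-up and drop.

```python
from collections import Counter

END = "$"  # marker key meaning "a dictionary word ends at this node"


def build_trie(words):
    """Build a letter-by-letter prefix tree as nested dicts."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[END] = True
    return root


def walk(node, card):
    """Follow the one or two letters of a card down the trie; None if no such path."""
    for letter in card:
        node = node.get(letter)
        if node is None:
            return None
    return node


def find_words(node, hand, used=()):
    """Yield each word, as a tuple of cards, that can be spelled from `hand`."""
    if END in node and used:
        yield used
    for card in list(hand):
        if hand[card] == 0:
            continue
        child = walk(node, card)
        if child is None:
            continue  # no dictionary word continues with this card: prune the branch
        hand[card] -= 1
        yield from find_words(child, hand, used + (card,))
        hand[card] += 1  # put the card back before trying the next one


trie = build_trie(["cloth", "mung", "piny", "zebra"])  # tiny stand-in dictionary
cards = Counter(["a", "cl", "g", "in", "th", "m", "p", "o", "u", "y", "n"])  # hand plus deck card
found = sorted("/".join(word) for word in find_words(trie, cards))
print(found)
```

The prefix tree is what keeps this cheap: as soon as the cards played so far are not the start of any dictionary word, the whole branch is abandoned.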

Inference Service

AzureML has a few different ways that you can deploy a model for inference. This notebook shows the flow and interaction with a local, ACI (Azure Container Instance) and AKS (Azure Kubernetes Service) deployed AzureML service. 

The pattern is very similar to what we did for training: define an environment, write a scoring script, deploy.

The dependencies for the environment are similar to those used for training but we can use icevision[inference] instead of icevision[all] to reduce the size a bit. Plus we need to add pygtrie for the prefix tree code. I did hit on a problem here though that might help people out. I wanted to use a CPU Ubuntu 18.04 base image but for some reason it threw errors about a missing library. Thankfully the AzureML SDK allows you to specify an inline dockerfile so we can add the missing library:
Note that we're defining the scoring script, score.py, in the InferenceConfig. This contains two functions which you need to define: init() and run(). In the init() function you load the model and any other initialization, then run() is called every time a request comes in through the web server.
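The shape of such a scoring script looks roughly like this. It is a skeleton under assumptions rather than the real score.py: the model is a placeholder function and the payload field name is invented. A real script would load the registered model file in init(), typically from the directory named by the AZUREML_MODEL_DIR environment variable.

```python
import json

model = None  # populated once by init(), reused by every run() call


def init():
    """Called once when the service starts: do the slow loading here."""
    global model
    # Placeholder model: echoes the hint instead of detecting cards
    model = lambda payload: {"cards": [], "hint": payload.get("hint")}


def run(raw_data):
    """Called for every request; raw_data is the request body as a string."""
    payload = json.loads(raw_data)
    return model(payload)


# Local smoke test of the two entry points
init()
result = run(json.dumps({"hint": 3}))
print(result)
```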
To consume the service you just need to know the service_uri and optionally the access key. You can get these from the portal or by calling get_keys() and grabbing the service_uri property from the service object returned when you call Model.deploy(). The gist below shows deploying to AKS and getting those properties:
Note I'm referring here to an inference cluster named jb-inf which I previously set up through the portal.

Finally we can call the web service and get the results. The scoring script takes a json payload with base64 encoded images for the hand and the deck and a hint for the number of cards in the hand:
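
A sketch of building that request body is below. The field names ("hand", "deck", "hint") are assumptions for illustration, not necessarily the ones the real scoring script expects, and the image bytes are fake.

```python
import base64
import json


def build_payload(hand_image: bytes, deck_image: bytes, hand_count: int) -> str:
    """Pack two images and the card-count hint into a JSON string."""
    return json.dumps({
        "hand": base64.b64encode(hand_image).decode("ascii"),
        "deck": base64.b64encode(deck_image).decode("ascii"),
        "hint": hand_count,
    })


body = build_payload(b"fake-hand-bytes", b"fake-deck-bytes", 10)
headers = {"Content-Type": "application/json"}
# With key auth enabled the key is sent as a bearer token, for example:
#   headers["Authorization"] = f"Bearer {key}"
#   requests.post(service_uri, data=body, headers=headers)
print(body)
```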

Serverless

The inference service through AKS is great for serious usage but it's not a cost effective way to host a hobby service. Your AKS cluster can scale down pretty low but it can't go to zero. Therefore you'd be paying for 24x7 uptime even if no one visits the site for months! This is where serverless can really shine. Azure Function Apps on the Consumption Plan can scale to zero and you only pay for the time when your functions are running. On top of this there's a significant level of free tier usage per month before you would have to start paying anyway. I haven't paid a cent yet!

So what's the downside? Cold-start:
"Apps may scale to zero when idle, meaning some requests may have additional latency at startup. The consumption plan does have some optimizations to help decrease cold start time, including pulling from pre-warmed placeholder functions that already have the function host and language processes running."

Here's some data taken from running a few invocations in a row on the card detection function which I'll go through next:

As you can see, there's a slow invocation taking 50 seconds, then all subsequent ones take 10.

Function App

Azure Function Apps are essentially hosts that contain multiple functions. This is a little different if you are used to AWS Lambda functions. As you can imagine there are many different ways to develop, test and deploy Azure Functions but I have found using the Azure Functions extension for VSCode is really nice. There are great getting started guides that I won't repeat here.

When you're working with Azure functions you define settings for the host in the host.json and for python you have a single requirements.txt file to define the dependencies for the host. I decided to divide the problem of finding the best play for a set of cards into two functions. One, cards, that does the object detection on the hand and deck and a second, game, that takes two strings for the hand and deck and returns the best play. This basically splits what we had earlier from the single scoring script in the AzureML AKS deployment. Here's what the directory tree looks like:

├── cards
│   ├── function.json
│   ├── __init__.py
│   └── quiddler.pt
├── game
│   ├── function.json
│   ├── __init__.py
│   ├── quiddler_game.py
│   └── sowpods.txt
├── host.json
├── local.settings.json
└── requirements.txt

You can see the layout with the two functions and the host. Note, in the cards folder there's quiddler.pt - the trained model, and in the game folder there's sowpods.txt - the word dictionary. These are both heavyweight items for which there is some overhead to initially load and process. In both cases we use a global from the entry-point function: main() to lazy initialize, just once. Subsequent calls to this running instance will not need to perform this initialization:
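The lazy-initialization pattern, stripped of the Azure specifics, looks something like this. In the real function main() receives an HTTP request object and the loader parses sowpods.txt or loads quiddler.pt. Here the names, the three-word set and the load counter are stand-ins so the sketch runs on its own.

```python
_words = None   # module-level cache; survives between calls on a warm instance
load_count = 0  # only here to show how often the expensive load happens


def load_words():
    """Stand-in for the slow step (reading the dictionary, building the index)."""
    global load_count
    load_count += 1
    return {"xenon", "udo", "cloth"}


def main(word: str) -> bool:
    """Entry point: initialize on first use, then reuse the cached data."""
    global _words
    if _words is None:
        _words = load_words()
    return word in _words


main("xenon")
main("udo")
print(load_count)  # the load ran once, not once per call
```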
The game function is quite straightforward. Let's look at the cards function. There are a couple of changes from the AKS implementation mostly to support the web app front-end that we'll get to later. Support has been added for URLs - if you pass in a URL instead of base64 encoded image it will fetch it. We're also now returning the images with the predict mark-up to display in the UI. This borrowed a bit of code from the training run when we uploaded test result images to the workspace:
A final note on the requirements.txt - to save some load time we want to keep the dependencies fairly tight. So CPU pytorch libraries are loaded and icevision with no additional options:
After deploying to Azure you can see the functions in the portal and in VSCode:
If you right-click on the function name you can select "Copy Function Url" to get the api end-point. In my case for the cards it's: https://quiddler.azurewebsites.net/api/cards

Now we can post a json payload to this URL and get a result, great! So let's make a front-end to do just that:

Static Web App

The Static Web Apps service in Azure is relatively new. At the time of writing it's still marked as Preview. Basically we want to host some html, css, javascript etc. - this is achievable using a Storage Account, but this new service has some cool additions.

First things first, let's take a look at the vue.js web app. Front-end design is not my strength, so let's just focus on how we call the serverless functions and display the results. In main.js this is handled with uploadImages(). This builds the json payload and uses axios to send it to the URL of the function we discovered earlier. If this call is successful the marked-up card images are displayed and the strings representing the detected cards are sent to the game serverless function. If this second call is successful, the results are displayed.

A really useful feature of all the Azure services used in this blog post is that they can all be developed and tested locally before you deploy. You saw how the training and inference in AzureML can all be done locally. Function Apps can run up in a local environment and you can debug right in VSCode. One complication with this is CORS. When developing locally you need to define the CORS setting for the function app in local.settings.json:

With this in place you can just open the index.html page as a file in Chrome and temporarily set the api URLs to the local endpoint. e.g. http://localhost:7071/api/

The new Static Web App features Github Action based deployment. So, once you're all set with the local development you can use the VSCode extension and provide access to the repo containing your app. It will then install an Action and deploy to Azure. Now, whenever you push to main it will deploy to production! There are also staging environments for PRs, free DevOps! You can even see the action history in VSCode:

Static Web Apps defines a domain for you that you can change by setting a Custom Domain through the portal - I assigned this to quiddler.jerbly.net. Finally, we have to set CORS in production through the portal:

Wrap-up

And that's it! Go to quiddler.jerbly.net and have a go. If you don't have a set of cards you can still see it in action by clicking "Random" for hand and deck. Oh, and if you don't recognize some of the words it comes up with (this will happen a lot) just click the link through to the Collins Dictionary definition.


2018-09-16

Serverless Machine Learning Classifier SlackBot

In this post I show the steps to build and deploy a SlackBot into AWS using Chalice. The bot uses a trained machine learning model to predict the R&D team most able to answer a question that the user has entered. Like so:

The detail on training the machine learning model will be in a later post. For now just know that text from Jira tickets assigned to scrum teams labelled A to I is used to build a classifier so we can predict a team given a question. The trained model is stored in S3.

Serverless hosting

The SlackBot is basically a web server responding to incoming requests via a rest endpoint. You could just run an Internet accessible server somewhere but this seems like a perfect application to run in AWS Lambda; each request is short-lived and stateless. The problem with AWS is there's a lot to learn to put in place all the bits and pieces you need to achieve this: Lambda functions, API Gateway, IAM roles etc. To simplify this, open source tools and frameworks can take care of a lot of this overhead. AWS Chalice is:
"...a microframework for writing serverless apps in python. It allows you to quickly create and deploy applications that use AWS Lambda."
I won't repeat here what you can read in the Chalice docs. I'd suggest building a Hello World rest api to begin with and then following along with this post.

Slack client

To interact with Slack from our Python code we need to use slackclient. Since our application is already running in the Chalice framework we won't be using a lot of the Slack client framework. In fact it's only used to make the api call to return the response to the question.

The Bot does need to be registered in your Slack account. I won't step-by-step through that but here is a screenshot of the critical page showing the configuration of the Event Subscriptions:


You'll see above that there's a Request URL to fill out and in this screenshot it's referring to an AWS location. We haven't got that far yet, and there's a bit of a chicken and egg situation here: we have to build the bot and deploy it via Chalice to get the URL to plug in here. However, in order to deploy that bot we need some info from Slack:

  1. Basic Information / Verification Token -- (for the SLACK_VERIFICATION_TOKEN)
  2. OAuth & Permissions / Bot User OAuth Access Token -- (for the SLACK_BOT_TOKEN)

The Code

If you made the Hello World Chalice application you'll be familiar with the main components:
  • app.py - where we'll build the slackbot
  • .chalice/config.json - where we'll configure the app
  • requirements.txt - the python libs we need

This first block of code does some imports and some initialization. The imports refer to some packages which we'll need to bundle up into the deployment zip for the Lambda function. They are declared in the requirements.txt file for Chalice:


Part of the initialization is reading in environment variables. It's good practice to separate these configuration settings from your code. In addition to the Slack tokens mentioned earlier, we also specify the S3 bucket and key for the machine learning model's location. Here's an example config.json file:


There are a couple of other settings in this config file to look at:

lambda_memory_size - I've set this to 1536 MB - note that the memory allocation is used to calculate a proportional CPU power. Since we're loading and running an ML model the default allocation is not sufficient. My strategy is to start reasonably high and then tune down. The cloudwatch logs show the amount of time spent and memory used for each run - this is good information to use when tuning.

manage_iam_role and iam_role_arn - Chalice will attempt to build an iam role for you with the correct permissions but this doesn't always work. If S3 access has not been granted on the role you can add this in the AWS console and then provide the arn in config.json. You'll also need to set manage_iam_role to false.

Delays

Eagle-eyed readers will have noticed that in the cloudwatch log screenshot above there was a big time difference between two runs: one was 10 seconds and the next less than 1 millisecond! The reason for this is to do with the way AWS Lambda works. If the function is "cold", the first run in a new container, then we have to perform the costly process of retrieving the ML model from S3 and loading it into scikit-learn. For all subsequent calls while the function is "warm" the model will be in memory. We have to code for these "cold" and "warm" states to provide acceptable behaviour.


The code snippet above shows where we load and use the model. load_model() simply checks whether the clf has been initialized. If not it downloads the model from S3 to a /tmp location in the Lambda container and loads it. At the time of writing Lambda provides 512 MB of tmp space so watch the size of your model. While the function is "warm" subsequent calls to load_model() will return quickly without needing to load the model.

You might be wondering why we didn't just load the model at the top of the script. This would only load it once, and it would do so naturally on the first "cold" run. The problem with this is to do with how Slack interacts with a SlackBot.


The function above is called from Slack whenever there's an incoming event based on the event types we've registered for (screenshot near the top of this post). It's also called during registration as the "challenge" to check that this is the bot matching the registration. The first if statement checks that the incoming token from Slack matches our verification token. Next, if this is a challenge request, we simply turn around and send the challenge string back.

The third if statement is an interesting one, and a reason why we had to handle the model loading the way we did. In the event api doc one of the failure conditions is "we wait longer than 3 seconds to receive a valid response from your server". This also covers the "challenge" request above, so if we loaded the ML model for 10 seconds we wouldn't even be able to register our bot! The doc goes on to say that failures are retried immediately, then after 1 minute and then 5 minutes. This code could deal more thoroughly with the various failure scenarios and handle each accordingly but, for simplicity and because this isn't a mission critical app, I've written the code to simply absorb retry attempts by immediately returning with a 200 OK. I did this because I was finding that my first "cold" run would take longer than the 3 seconds and so a retry would come in. This might even cause AWS to spawn another Lambda function because my first one was still running. Depending on timing, I might then get two or three identical answers in the Slack channel as the runs eventually completed.

So, finally we get onto the bot actually performing the inference over the incoming question and returning a string suggesting the team to contact. This is actually quite straightforward - it looks for the word "who" in the question and then passes the whole string into the predict function to get a classification. Note that load_model() is only called if "who" is found.
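
The order of those checks can be sketched as a plain function. This is an illustration of the flow described above, not the bot's actual code: the function and callback names are made up, the 403 for a bad token is my own choice, and the retry check assumes Slack marks retries with the X-Slack-Retry-Num request header. In the real bot the on_question step would call load_model(), run the prediction and post the answer back through slackclient.

```python
VERIFICATION_TOKEN = "example-token"  # would come from SLACK_VERIFICATION_TOKEN


def handle_event(body: dict, headers: dict, on_question) -> tuple:
    """Return (status_code, response_text) for one incoming Slack request."""
    # 1. Reject anything that does not carry our verification token
    if body.get("token") != VERIFICATION_TOKEN:
        return 403, "invalid token"
    # 2. Registration handshake: echo the challenge string straight back
    if body.get("type") == "url_verification":
        return 200, body["challenge"]
    # 3. Absorb retries so a slow cold start does not produce duplicate answers
    if any(name.lower() == "x-slack-retry-num" for name in headers):
        return 200, "retry ignored"
    # 4. Only now do the slow work, and only for questions containing "who"
    text = body.get("event", {}).get("text", "")
    if "who" in text.lower():
        on_question(text)
    return 200, "ok"


asked = []
question = {"token": "example-token", "event": {"text": "Who owns the build?"}}
print(handle_event(question, {}, asked.append))
```

Keeping the cheap checks ahead of anything that touches the model is what lets the challenge request answer within Slack's 3 second window even on a cold start.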

Deployment

If you ran through the Chalice Hello World app earlier you will have got to the point where you ran "chalice deploy" and it did its magic, creating all the AWS resources for you so you were up and running. This might work for this SlackBot, but one problem I've run into is the 50MB zip size limit that AWS Lambda imposes. If your deployment zip (all the code and packages) is larger than this limit the deployment will fail. All is not lost though: for whatever reason, if you deploy your zip from S3 rather than as part of your upload you can go beyond this limit.

Chalice provides an alternative to the automatic deployment method, allowing you to use AWS CloudFormation instead. This way of deploying has a side-benefit for the size limit problem in that it uses S3 to hold the package for deployment. Calling "chalice package" rather than "chalice deploy" creates the zip and the SAM template, which you then deploy using the AWS CLI. To automate this for my SlackBot I built a bash script:


Registration

Finally, with the bot deployed we can register it with Slack. Remember that Request URL and the challenge process? Take the URL that either "chalice deploy" or "deploy.sh" printed out and add "/slack/events" to it so it looks something like this:

  • https://##########.execute-api.us-east-1.amazonaws.com/api/slack/events

Paste this into the request URL form field and Slack will immediately attempt to verify your bot with a challenge request.

We're done!

The Serverless SlackBot is up and running and ready to answer your users' questions. In the next post I'll explain how the machine learning model was created, with a walkthrough of the code to collect the data from Jira, clean it, train a model on it and deliver it to S3.

Code available on Github

2017-05-06

ControlMyPi shutting down :(

Unfortunately ControlMyPi will be shutting down on September 1st due to Google removing support for XMPP :(

Since I first wrote ControlMyPi many well supported IoT platforms have come about. Most of these use MQTT for the messaging where ControlMyPi was using XMPP.

I would suggest you take a look at the IoT systems provided by Adafruit and AWS. Here’s a tutorial I wrote for Adafruit: Monitor PiCam and temperature on a PiTFT via adafruit.io

There are many, many more on the Adafruit learning system.