2018-09-16

Serverless Machine Learning Classifier SlackBot

In this post I show the steps to build and deploy a SlackBot into AWS using Chalice. The bot uses a trained machine learning model to predict the R&D team most able to answer a question that the user has entered. Like so:

The details of training the machine learning model will be in a later post. For now, just know that text from Jira tickets assigned to scrum teams labelled A to I is used to build a classifier, so we can predict a team given a question. The trained model is stored in S3.

Serverless hosting

The SlackBot is basically a web server responding to incoming requests via a REST endpoint. You could just run an Internet-accessible server somewhere, but this seems like a perfect application for AWS Lambda: each request is short-lived and stateless. The problem with AWS is that there's a lot to learn to put in place all the bits and pieces you need to achieve this: Lambda functions, API Gateway, IAM roles etc. To simplify this, open source tools and frameworks can take care of a lot of the overhead. AWS Chalice is
...a microframework for writing serverless apps in python. It allows you to quickly create and deploy applications that use AWS Lambda. 
I won't repeat here what you can read in the Chalice docs. I'd suggest building a Hello World REST API to begin with and then following along with this post.

Slack client

To interact with Slack from our Python code we use slackclient. Since our application is already running in the Chalice framework we won't be using much of the Slack client framework; in fact it's only used to make the API call that returns the response to the question.
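For example, replying in a channel is a single Web API call. A minimal sketch (the channel ID and message text here are just illustrations):

```python
import os

from slackclient import SlackClient

# The bot token comes from an environment variable (see config.json later)
slack_client = SlackClient(os.environ['SLACK_BOT_TOKEN'])

# chat.postMessage is the only Slack Web API method this bot needs
slack_client.api_call('chat.postMessage',
                      channel='C0123456789',
                      text='Sounds like one for team B')
```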

The bot does need to be registered in your Slack account. I won't go through that step by step, but here is a screenshot of the critical page showing the configuration of the Event Subscriptions:


You'll see above that there's a Request URL to fill out, and in this screenshot it's referring to an AWS location. We haven't got that far yet; there's a bit of a chicken-and-egg situation here. We have to build the bot and deploy it via Chalice to get the URL to plug in here. However, in order to deploy that bot we need some info from Slack:

  1. Basic Information / Verification Token -- (for the SLACK_VERIFICATION_TOKEN)
  2. OAuth & Permissions / Bot User OAuth Access Token -- (for the SLACK_BOT_TOKEN)

The Code

If you made the Hello World Chalice application you'll be familiar with the main components:
  • app.py - where we'll build the slackbot
  • .chalice/config.json - where we'll configure the app
  • requirements.txt - the Python libs we need

This first block of code does some imports and some initialization. The imports refer to some packages which we'll need to bundle up into the deployment zip for the Lambda function. They are declared in the requirements.txt file for Chalice:
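A sketch of roughly what that looks like - the variable names are my choices, but the pieces (Chalice app, S3 client, Slack client and an empty model reference) are as described:

```python
import os
import pickle

import boto3
from chalice import Chalice
from slackclient import SlackClient

app = Chalice(app_name='teambot')

# Configuration is read from environment variables (see config.json below)
SLACK_VERIFICATION_TOKEN = os.environ['SLACK_VERIFICATION_TOKEN']
SLACK_BOT_TOKEN = os.environ['SLACK_BOT_TOKEN']
MODEL_BUCKET = os.environ['MODEL_BUCKET']
MODEL_KEY = os.environ['MODEL_KEY']

slack_client = SlackClient(SLACK_BOT_TOKEN)
s3 = boto3.client('s3')
clf = None  # the classifier is loaded lazily - see load_model() below
```

The matching requirements.txt would be along the lines of:

```
boto3
slackclient
scikit-learn
```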


Part of the initialization is reading in environment variables. It's good practice to separate these configuration settings from your code. In addition to the Slack tokens mentioned earlier, we also specify the S3 bucket and key for the machine learning model's location. Here's an example config.json file:
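Something along these lines - the tokens, ARN and bucket/key names are placeholders:

```json
{
  "version": "2.0",
  "app_name": "teambot",
  "stages": {
    "dev": {
      "api_gateway_stage": "api",
      "lambda_memory_size": 1536,
      "manage_iam_role": false,
      "iam_role_arn": "arn:aws:iam::123456789012:role/teambot-role",
      "environment_variables": {
        "SLACK_VERIFICATION_TOKEN": "xxxx",
        "SLACK_BOT_TOKEN": "xoxb-xxxx",
        "MODEL_BUCKET": "my-model-bucket",
        "MODEL_KEY": "models/model.pkl"
      }
    }
  }
}
```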


There are a couple of other settings in this config file to look at:

lambda_memory_size - I've set this to 1536 MB. Note that Lambda allocates CPU power in proportion to the memory setting, and since we're loading and running an ML model the default allocation is not sufficient. My strategy is to start reasonably high and then tune down. The CloudWatch logs show the time spent and memory used for each run - good information to use when tuning.

manage_iam_role and iam_role_arn - Chalice will attempt to build an IAM role for you with the correct permissions, but this doesn't always work. If S3 access has not been granted on the role, you can add it in the AWS console and then provide the ARN in config.json. You'll also need to set manage_iam_role to false.
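If you do edit the role by hand, the extra permission is just read access to the model object - something like this policy statement (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-model-bucket/*"
    }
  ]
}
```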

Delays

Eagle-eyed readers will have noticed that in the CloudWatch log screenshot above there was a big time difference between two runs: one took 10 seconds and the next less than 1 millisecond! The reason for this is the way AWS Lambda works. If the function is "cold" - that is, the first run in a new container - then we have to perform the costly process of retrieving the ML model from S3 and loading it into scikit-learn. For all subsequent calls, while the function is "warm", the model will already be in memory. We have to code for these "cold" and "warm" states to provide acceptable behaviour.
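Here's a sketch of that lazy-loading pattern, continuing the snippet from earlier (pickle is an assumption - use whatever format you saved the model in):

```python
def load_model():
    # Fetch the classifier from S3 on a "cold" start; no-op when "warm"
    global clf
    if clf is None:
        # /tmp is the only writable path in a Lambda container (512 MB limit)
        s3.download_file(MODEL_BUCKET, MODEL_KEY, '/tmp/model.pkl')
        with open('/tmp/model.pkl', 'rb') as f:
            clf = pickle.load(f)
    return clf


def predict_team(question):
    # Classify the question text and return a team label (A to I)
    return load_model().predict([question])[0]
```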


The code snippet above shows where we load and use the model. load_model() simply checks whether clf has been initialized. If it hasn't, it downloads the model from S3 to a /tmp location in the Lambda container and loads it. At the time of writing Lambda provides 512 MB of /tmp space, so watch the size of your model. While the function is "warm", subsequent calls to load_model() will return quickly without reloading the model.

You might be wondering why we didn't just load the model at the top of the script. That would only load it once, and it would happen naturally on the first "cold" run. The problem with this is to do with how Slack interacts with a SlackBot.
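Here's a sketch of the event endpoint, continuing the snippets above (the route path matches the Request URL we'll register; the reply wording and error handling are illustrative):

```python
from chalice import UnauthorizedError


@app.route('/slack/events', methods=['POST'])
def slack_events():
    request = app.current_request
    body = request.json_body

    # 1. Only accept requests that carry our verification token
    if body.get('token') != SLACK_VERIFICATION_TOKEN:
        raise UnauthorizedError('bad verification token')

    # 2. During registration Slack sends a challenge; echo it straight back
    if body.get('type') == 'url_verification':
        return {'challenge': body['challenge']}

    # 3. Absorb retries - Slack marks them with the X-Slack-Retry-Num header
    if 'X-Slack-Retry-Num' in request.headers:
        return {}

    event = body.get('event', {})
    text = event.get('text', '')
    if 'who' in text.lower():
        team = predict_team(text)  # only now is the model loaded
        slack_client.api_call('chat.postMessage',
                              channel=event.get('channel'),
                              text='Sounds like one for team %s' % team)
    return {}
```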


The function above is called from Slack whenever there's an incoming event based on the event types we've registered for (screenshot near the top of this post). It's also called during registration as the "challenge" to check that this is the bot matching the registration. The first if statement checks that the incoming token from Slack matches our verification token. Next, if this is a challenge request, we simply turn around and send the challenge string back.

The third if statement is an interesting one, and a reason why we had to handle the model loading the way we did. In the Events API doc one of the failure conditions is "we wait longer than 3 seconds to receive a valid response from your server". This also covers the "challenge" request above, so if we spent 10 seconds loading the ML model we wouldn't even be able to register our bot! The doc goes on to say that failures are retried immediately, then after 1 minute, and then after 5 minutes. This code could deal with the various failure scenarios more thoroughly, but for simplicity, and because this isn't a mission-critical app, I've written it to simply absorb retry attempts by immediately returning a 200 OK. The reason is that my first "cold" run would take longer than the 3 seconds, so a retry would come in. That retry might even cause AWS to spawn a second Lambda function because the first was still busy. Eventually, depending on timing, I might get two or three identical answers in the Slack channel as the runs completed.

So, finally, we get to the bot actually performing inference on the incoming question and returning a string suggesting the team to contact. This is quite straightforward: it looks for the word "who" in the question and then passes the whole string into the predict function to get a classification. Note that load_model() is only called if "who" is found, so we never pay the loading cost for events we're going to ignore.

Deployment

If you ran through the Chalice Hello World app earlier you will have got to the point where you ran "chalice deploy", it did its magic and created all the AWS pieces for you, and you were up and running. That might work for this SlackBot too, but one problem I've run into is the 50 MB zip size limit that AWS Lambda imposes. If your deployment zip (all the code and packages) is larger than this limit the deploy will fail. All is not lost though: for whatever reason, if you deploy your zip from S3 rather than uploading it directly, you can go beyond this limit.

Chalice provides an alternative to the automatic deployment method, allowing you to use AWS CloudFormation instead. This way of deploying has a side benefit for the size-limit problem in that it stages the deployment package in S3. Calling "chalice package" rather than "chalice deploy" creates the zip and a SAM template, which you then deploy using the AWS CLI. To automate this for my SlackBot I built a bash script:
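Something like this sketch (bucket and stack names are placeholders; the final query assumes the EndpointURL output in Chalice's generated SAM template):

```bash
#!/bin/bash
set -e

S3_BUCKET=my-deployment-bucket
STACK_NAME=teambot

# Build the deployment zip and SAM template
chalice package packaged/

# Upload the zip to S3 and rewrite the template to point at it
aws cloudformation package \
    --template-file packaged/sam.json \
    --s3-bucket "$S3_BUCKET" \
    --output-template-file packaged/packaged.yaml

# Create or update the stack
aws cloudformation deploy \
    --template-file packaged/packaged.yaml \
    --stack-name "$STACK_NAME" \
    --capabilities CAPABILITY_IAM

# Print the API endpoint to paste into Slack (with /slack/events appended)
aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" \
    --query "Stacks[0].Outputs[?OutputKey=='EndpointURL'].OutputValue" \
    --output text
```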


Registration

Finally, with the bot deployed we can register it with Slack. Remember that Request URL and the challenge process? Take the URL that either "chalice deploy" or "deploy.sh" printed out and add "/slack/events" to it so it looks something like this:

  • https://##########.execute-api.us-east-1.amazonaws.com/api/slack/events

Paste this into the request URL form field and Slack will immediately attempt to verify your bot with a challenge request.

We're done!

The serverless SlackBot is up and running and ready to answer your users' questions. In the next post I'll explain how the machine learning model was created, with a walkthrough of the code to collect the data from Jira, clean it, train a model on it and deliver it to S3.

Code available on GitHub

2017-05-06

ControlMyPi shutting down :(

Unfortunately ControlMyPi will be shutting down on September 1st due to Google removing support for XMPP :(

Since I first wrote ControlMyPi, many well-supported IoT platforms have come about. Most of these use MQTT for messaging, where ControlMyPi was using XMPP.

I would suggest you take a look at the IoT systems provided by Adafruit and AWS. Here’s a tutorial I wrote for Adafruit: Monitor PiCam and temperature on a PiTFT via adafruit.io

There are many many more on the Adafruit learning system.

2015-06-14

Motion Google Drive Uploader for OAuth 2.0

Three years ago I wrote the original script to upload video to Google Drive as a trigger from Motion. Unfortunately Google recently removed support for the old security model and you now have to use OAuth 2.0. Also the gdata libraries have been superseded by the Google API Python Client. A few people have written alternatives to my original script quicker than I did, but here's my new version.

Note: I'm not using any third-party libraries like PyDrive; this script goes straight to the Google APIs. (PyDrive has not been touched by its author for a long time and is missing features.)

Motion Google Drive Uploader on GitHub

Installation

Google setup:


  1. Go to https://code.google.com/apis/console - make sure you're logged in as the correct user for the Drive you want to use.
  2. Use the drop-down at the top of the page to create a project called 'Uploader'.
  3. Click on "Enable an API", then "Drive API", and then click the "Enable API" button.
  4. Next go to the "APIs & auth -> Credentials" menu and click "Create new Client ID".
  5. Set Application type to "Installed application" and click "Configure Consent Screen".
  6. Set the Product Name to "Uploader".
  7. You'll be returned to the dialog. Make sure "Installed application" and "other" are selected and then click "Create Client ID".
  8. Download the JSON file.
  9. Rename it to client_secrets.json and put it somewhere like /home/pi/.

Script setup:

  1. Go to /home/pi/ and get the uploader.py from github: git clone https://github.com/jerbly/motion-uploader.git
  2. Update/install Google Python API: sudo pip install --upgrade google-api-python-client
  3. Make uploader.py executable: chmod a+x uploader.py

Configuration:

  1. If you used the git clone command you'll find the example uploader.cfg file in the motion-uploader directory. This now has a couple more options which need setting up. 
  2. oauth/folder - set this to the directory where you have put the client_secrets.json file and where the script can write a credentials file, e.g. /home/pi/
  3. docs/snapshot-folder - set this to a public folder name in your Drive if you wish to use this feature (explained below).

Initial authentication:

  1. From the command line run the script by hand with your config file and a test avi file e.g. ./uploader.py /home/pi/uploader.cfg /home/pi/test.avi
  2. Copy the URL to your clipboard and paste it into a browser where you are already logged in to Google as the correct user.
  3. Accept and copy the authentication code to the clipboard.
  4. Paste it back into the command line and hit enter.
  5. This will authenticate you and create a credentials file which will be used for future logins.
Now, finally, you can run up Motion and it should work like it used to.
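For the curious, this copy-and-paste dance is oauth2client's out-of-band flow. Roughly, the script does something like this (paths are examples; the real code is in the repo):

```python
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage

# Out-of-band flow: we print a URL and the user pastes back a code
flow = flow_from_clientsecrets('/home/pi/client_secrets.json',
                               scope='https://www.googleapis.com/auth/drive',
                               redirect_uri='urn:ietf:wg:oauth:2.0:oob')

storage = Storage('/home/pi/credentials')
credentials = storage.get()
if credentials is None or credentials.invalid:
    print(flow.step1_get_authorize_url())
    code = raw_input('Enter the authentication code: ')  # Python 2 era
    credentials = flow.step2_exchange(code)
    storage.put(credentials)  # saved for future logins
```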

New feature: Public Snapshot

Motion comes with a feature to periodically take a snapshot regardless of whether motion has been detected. This is a nice feature if you want to have a web site with the latest view from your webcam. You can use Google Drive to host this image rather than installing a web server on your Pi and opening firewalls etc.
  1. Create a public folder in your Google Drive: 
    1. Create a new folder called 'public'
    2. Double click on it
    3. Go to the sharing menu (person icon with + sign)
    4. Click "Advanced" and then "Change..."
    5. Set it to "On - Public on the Web"
  2. Configure your uploader.cfg so docs/snapshot-folder is set to this folder 'public'.
  3. Configure motion.conf to take a snapshot every n seconds named lastsnap.jpg and upload it:
    1. snapshot_interval 300
    2. snapshot_filename lastsnap
    3. on_picture_save /home/pi/motion-uploader/uploader.py /home/pi/uploader.cfg %f snap
To find the public URL that you can bookmark and embed in other pages run the command line with the 'snapurl' option: ./uploader.py /home/pi/uploader.cfg /home/pi/lastsnap.jpg snapurl

It will print something like this: https://googledrive.com/host/{your-folder-id-here}/lastsnap.jpg