Docker for Deep Learning on AWS

This post will help you set up a GPU-enabled Docker container on an AWS EC2 instance. We’ll also cover how to access a Jupyter server running inside the container from your local machine.

Why? One of my favourite things about Docker is that it allows you to easily reproduce your local development environment on a remote machine. This comes in real handy whenever you need to train a model on a GPU. This post assumes some familiarity with building Docker images and launching EC2 instances on AWS. For a primer on Docker and how it can help improve your Data Science workflow, check out this excellent post.

TL;DR: Use nvidia/cuda as your base image and pass the --gpus flag (for example, --gpus all) when launching a container.

Step 0: Your Git Repository

The working directory for this post will be your project’s local git repository. While not a strict prerequisite for setting up a GPU-enabled Docker container on AWS, it will make your life much easier by allowing you to simply git clone your GitHub repo on your EC2 instance.

Step 1: The Docker Image

The first step is to build the image we need to train a Deep Learning model. We’ll do that by adding the following Dockerfile to our repository.
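
A minimal sketch of such a Dockerfile, written here as a shell snippet that creates the file in your repository root, might look like the following. The image tag, the apt packages, and the Jupyter flags are assumptions rather than details from this post, so adjust them to the CUDA version and dependencies your project needs.

```shell
# Write a minimal Dockerfile into the current directory (your repository root).
# Assumptions: the CUDA image tag, Python installed via apt, and a
# requirements.txt in the repository that lists jupyter.
cat > Dockerfile <<'EOF'
# GPU-ready base image; pick the tag that matches the CUDA version you need.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# The CUDA base images ship without Python, so install it first.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Working directory that the repository is mounted onto later.
WORKDIR /src

# Install the project's Python dependencies (must include jupyter).
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Start a Jupyter server that is reachable from outside the container.
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
EOF
```

Binding to 0.0.0.0 matters here: a server bound only to the container's own localhost would not be reachable through the published port.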

The key component of this Dockerfile is the nvidia/cuda base image, which does all of the legwork needed for a container to access system GPUs. Your requirements file should contain all of the packages you need to train your model and will need to include Jupyter for the final CMD to work. If you have a script that trains your model, just replace the above CMD with the command that runs your script. You can also enable your favourite Jupyter extensions by adding RUN jupyter contrib nbextension install after the requirements step.

Step 2: Launch & Connect to EC2

The next step is to launch an EC2 instance and ssh into it. Your instance needs to have Docker and nvidia-docker installed. The easiest option is to choose an Ubuntu Deep Learning AMI, which comes with both installed. Once you’ve chosen your instance’s AMI, select an instance type with a GPU. Sadly, they’re not cheap. At the time of writing, a p2.xlarge instance in us-west-2 will cost you $0.90/hour 😭.

At the review page before you launch your instance, edit the security groups to open up port 8888 so that we can connect to the Jupyter server running inside the container from our local machine. Once your instance is up and running, ssh into it by running this command on your local terminal:

% ssh -i <path-to-aws-pem-file> <user>@<EC2-hostname>

You can find your user and EC2-hostname by selecting your instance and clicking Connect on the AWS Console. It should look something like ubuntu@EC2–

Step 3: Clone Your Repo & Build Your Image

Now that we’ve connected to a running EC2 instance, it’s time to build our Docker image. At your instance’s command-line, clone your GitHub repo:

$ git clone https://github.com/<username>/<repo>.git

Then cd into it and run this command to build your Docker image:

$ docker build -t <image-name> .

If you have a pre-built image stored in a container registry like Docker Hub, you can pull it with docker pull instead, which can save time if your image takes a while to build.
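
As a rough sketch of that pull-or-build choice, the hypothetical helper below (get_image is not part of Docker) tries the registry first and falls back to building from the Dockerfile in the current directory. It assumes the image name you pass is the same one you would push to your registry.

```shell
# Hypothetical helper: pull the image if the registry has it,
# otherwise build it from the Dockerfile in the current directory.
# Usage: get_image <image-name>
get_image() {
    docker pull "$1" || docker build -t "$1" .
}
```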

Step 4: Launch a GPU Enabled Container

With our image built, we’re ready to launch our container. At your instance’s command-line, run:

$ docker run --gpus all -d -p 8888:8888 -v $(pwd):/src <image-name>

The key bit is the --gpus flag, which gives the container access to your instance’s GPUs (thanks to nvidia-docker). As for the rest,

  • The -d flag runs the container in the background (detached mode).
  • The -p flag binds port 8888 on the container to port 8888 on the EC2 instance (which we opened up to inbound connections earlier).
  • The -v flag mounts the current directory (your cloned repository) to the working directory we set in our Dockerfile (which we called /src), so that changes we make to our notebooks modify the files on disk.

The above command both starts a container and launches the Jupyter server inside of it (thanks to the last CMD in our Dockerfile). You’ll see the ID of the container printed to the screen. Copy that and run the next command to get the access token for the Jupyter server:

$ docker exec <container-ID> jupyter notebook list

Copy the printed access token and head back to your local command-line.
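
If you would rather not pick the token out by eye, a small filter can do it. This is a sketch that assumes the server list prints lines of the form http://0.0.0.0:8888/?token=<token> :: /src. The function name extract_token and the CONTAINER_ID variable are made up for illustration.

```shell
# Hypothetical helper: read `jupyter notebook list` output on stdin
# and print the first access token found in it.
extract_token() {
    sed -n 's/.*[?&]token=\([^ &]*\).*/\1/p' | head -n 1
}

# Usage on the EC2 instance (CONTAINER_ID is the ID printed by docker run):
#   docker exec "$CONTAINER_ID" jupyter notebook list | extract_token
```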

Step 5: Connect to Your Container’s Jupyter Server

Congrats! You’ve done all of the hard work to set up a running GPU enabled container on a remote AWS instance. The last step is to forward a local port on your machine to the Jupyter server running inside the container:

% ssh -i <path-to-aws-pem-file> -NfL 8080:localhost:8888 <user>@<EC2-hostname>

The above command forwards port 8080 on your machine to localhost:8888 on your EC2 instance, where the Jupyter server is running. Since port 8888 is the default port for Jupyter, we forwarded port 8080 to avoid clashing with any notebooks running on our local machine. Now navigate to localhost:8080 in your browser and paste the Jupyter access token from the previous step.
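
Two small conveniences for your local machine, sketched under the assumption that you kept the 8080-to-8888 mapping from the command above. Both function names are hypothetical. Jupyter accepts the token as a ?token= query parameter, so you can build the full URL up front, and because the -f flag leaves ssh running in the background, you need a way to close the tunnel when you are done.

```shell
# Hypothetical helper: print the local URL to open in your browser.
# Usage: notebook_url <token> [local-port]
notebook_url() {
    printf 'http://localhost:%s/?token=%s\n' "${2:-8080}" "$1"
}

# Hypothetical helper: stop the background ssh tunnel by matching its
# port-forwarding argument on the command line.
# Usage: close_tunnel [local-port]
close_tunnel() {
    pkill -f "${1:-8080}:localhost:8888"
}
```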

Ta-dah 🎉! You’re now talking to a Jupyter server running inside a GPU-enabled Docker container on a remote EC2 instance. If you’re working with PyTorch, you can check that CUDA is available with torch.cuda.is_available().
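
You can run the same check from the EC2 instance's command-line without opening a notebook. The sketch below uses a hypothetical check_gpu helper and assumes the image was built on nvidia/cuda (so nvidia-smi is available inside the container) and that torch is listed in your requirements file.

```shell
# Hypothetical helper: confirm that a running container can see the GPU.
# Usage: check_gpu <container-ID>
check_gpu() {
    # The driver's view: lists the GPUs visible inside the container.
    docker exec "$1" nvidia-smi
    # PyTorch's view: prints True when CUDA is usable (requires torch in the image).
    docker exec "$1" python3 -c "import torch; print(torch.cuda.is_available())"
}
```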

Final Thoughts

I hope this post will help ease your development workflow when it comes to training Deep Learning models. Incorporating Docker into your everyday workflow takes time, but it’ll save future you lots of headaches and time spent trying to reproduce your local environment in a remote setting.

P.S. Remember to shut down your p2.xlarge instance when you’re done!
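
If you have the AWS CLI configured on your local machine, stopping the instance can be a one-liner. This is a sketch: stop_instance is a hypothetical wrapper, and the default region is only an assumption taken from the pricing example earlier in the post.

```shell
# Hypothetical helper: stop (not terminate) an EC2 instance.
# Usage: stop_instance <instance-id> [region]
stop_instance() {
    aws ec2 stop-instances --instance-ids "$1" --region "${2:-us-west-2}"
}
```

A stopped instance no longer bills for compute, but its EBS volumes still incur storage charges until you terminate it.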