This post will help you set up a GPU-enabled Docker container on an AWS EC2 instance. We’ll also cover how to access a Jupyter server running inside the container from your local machine.
Why? One of my favourite things about Docker is that it allows you to easily reproduce your local development environment on a remote machine. This comes in real handy whenever you need to train a model on a GPU. This post assumes some familiarity with building Docker images and launching EC2 instances on AWS. For a primer on Docker and how it can help improve your Data Science workflow, check out this excellent post.
In short: use `nvidia/cuda` as your base image and pass the `--gpus all` flag when launching a container.
## Step 0: Your Git Repository
The working directory for this post will be your project’s local git repository. While not a strict prerequisite of our goal of setting up a GPU-enabled Docker container on AWS, it will make your life much easier by allowing you to simply `git clone` your GitHub repo on your EC2 instance.
## Step 1: The Docker Image
The first step is to build the image we need to train a Deep Learning model. We’ll do that by adding the following Dockerfile to our repository.
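A minimal Dockerfile along these lines might look like the following — the exact `nvidia/cuda` tag and Python setup here are illustrative assumptions, so pin versions that match your framework:

```dockerfile
# Base image with CUDA libraries preinstalled (tag is illustrative)
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# Install Python and pip
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Working directory we'll mount the repo into at run time
WORKDIR /src

# Install project dependencies (requirements.txt must include jupyter)
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Launch a Jupyter server listening on all interfaces
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
```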
The key component of this Dockerfile is the `nvidia/cuda` base image, which does all of the legwork needed for a container to access system GPUs. Your requirements file should contain all of the packages you need to train your model, and it will need to include `jupyter` for the last `CMD` to work. If you have a script that trains your model, just replace the above `CMD` with the command that runs your script. You can also enable your favourite Jupyter extensions by adding `RUN jupyter contrib nbextension install` after the requirements step.
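As a hypothetical example, a `requirements.txt` for a PyTorch project might look like this (the package list is illustrative, not prescriptive):

```text
torch
torchvision
numpy
pandas
jupyter
```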
## Step 2: Launch & Connect to EC2
The next step is to launch an EC2 instance and `ssh` into it. Your instance needs to have Docker and nvidia-docker installed; the easiest option is to choose an Ubuntu Deep Learning AMI, which comes with both preinstalled. Once you’ve chosen your instance’s AMI, select an instance type with a GPU. Sadly, they’re not cheap. At the time of writing, a `p2.xlarge` instance in `us-west-2` will cost you $0.90/hour 😭.
At the review page before you launch your instance, edit the security groups to open up
port 8888 so that we can connect to the Jupyter server running inside the container from our local machine. Once your instance is up and running,
ssh into it by running this command on your local terminal:
```shell
% ssh -i <path-to-aws-pem-file> <user>@<EC2-hostname>
```
You can find your EC2 hostname by selecting your instance and clicking `Connect` in the AWS Console. It should look something like `ec2-<public-ip>.<region>.compute.amazonaws.com`.
## Step 3: Clone Your Repo & Build Your Image
Now that we’ve connected to a running EC2 instance, it’s time to build our Docker image. At your instance’s command-line, clone your GitHub repo:
```shell
$ git clone https://github.com/<username>/<repo>.git
```
Then `cd` into it and run this command to build your Docker image:
```shell
$ docker build -t <image-name> .
```
If you have a pre-built image stored in a container registry like Docker Hub, you can fetch it with `docker pull` instead, which can save time if your image takes a while to build.
## Step 4: Launch a GPU-Enabled Container
With our image built, we’re ready to launch our container. At your instance’s command-line, run:
```shell
$ docker run --gpus all -d -p 8888:8888 -v $(pwd):/src <image-name>
```
The key bit is the `--gpus` flag, which gives the container access to your instance’s GPUs (thanks to nvidia-docker). As for the rest:

- The `-d` flag runs the container in the background (detached mode).
- The `-p` flag maps port 8888 on the container to port 8888 on the EC2 instance (which we opened up to inbound connections earlier).
- The `-v` flag mounts the current directory (your cloned repository) to the working directory we set in our Dockerfile (which we called `/src`), so that changes we make to our notebooks modify the files on disk.

The above command both starts a container and launches the Jupyter server inside of it (thanks to the last `CMD` in our Dockerfile). You’ll see the ID of the container printed to the screen. Copy that and run the next command to get the access token for the Jupyter server:
```shell
$ docker exec <container-ID> jupyter notebook list
```
Copy the printed access token and head back to your local command-line.
## Step 5: Connect to Your Container’s Jupyter Server
Congrats! You’ve done all of the hard work to set up a running GPU-enabled container on a remote AWS instance. The last step is to forward a local port on your machine to the Jupyter server running inside the container:
```shell
% ssh -NfL 8080:localhost:8888 <user>@<EC2-hostname>
```
The above command forwards
port 8080 on your machine to
localhost:8888 on your EC2 instance, where the Jupyter server is running. Since port 8888 is the default port for Jupyter, we forwarded
port 8080 to avoid clashing with any notebooks running on our local machine. Now navigate to
localhost:8080 in your browser and paste the Jupyter access token from the previous step.
Ta-dah 🎉! You’re now talking to a Jupyter server running inside a GPU-enabled Docker container on a remote EC2 instance. If you’re working with PyTorch, you can check that CUDA is available with `torch.cuda.is_available()`.
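For example, inside a notebook (assuming PyTorch is in your requirements file):

```python
import torch

# True when the container can see the instance's GPU
print(torch.cuda.is_available())

# If a GPU is visible, inspect which device you got
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```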
I hope this post will help ease your development workflow when it comes to training Deep Learning models. Incorporating Docker into your everyday workflow takes time, but it’ll save future you lots of headaches and time spent trying to reproduce your local environment in a remote setting.
P.S. Remember to shut down your
p2.xlarge instance when you’re done!