Docker data saved in persistent storage

ML Training

I know that we can store our models, training data, etc in persistent storage so that we don’t have to load it every time we spin up a new instance.

Can we do the same with Docker data (/var/lib/docker/)?

I assume we each get our own VM, but I'm not sure whether we'll have root privileges to symlink our /var/lib/docker dir to persistent storage.

I have not signed up with Lambda Labs yet, so knowing that we won't have to rebuild our Docker environment every time we spin up a new instance will help with the decision.

Thanks

In short: yes, you can definitely do this, and you have root access to your instances. Rather than symlinking, I find it less of a headache (largely personal preference) to just tell Docker your preferred data-root location.

I've just run through this myself on two Lambda Cloud GPU instances to make sure I was giving you accurate info. Here are the steps, if you're interested:

First I set up a Filesystem:
(Important to point out: Lambda notes on their website that this is a beta feature. I can only speak from personal experience, but stability has been excellent for me.)

Started up an Instance in the same location as my filesystem:

(A6000 instance if curious, though it does not impact this test)

# SSH in normally:
ssh ubuntu@<IP> -i <YOUR_KEY>.pem 

Check that our filesystem is mounted:

ubuntu:~$ ls -l
total 0
drwxr-xr-x 2 ubuntu ubuntu 4096 Jun 23 18:20 TestFilesystem
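
Since `ls -l` only shows that a directory exists, it can be worth confirming which filesystem actually backs it. A minimal sketch, assuming the filesystem shows up at /home/ubuntu/TestFilesystem as in this thread; fs_backing is a hypothetical helper name, not a Lambda or Docker tool:

```shell
# Hypothetical helper: print the source and mount point backing a path.
# Uses POSIX `df -P` so each filesystem is reported on a single line.
# Assumption: the mount point contains no spaces (only the last field is printed).
fs_backing() {
    df -P "$1" | awk 'NR == 2 { print $1, $NF }'
}

# Example (path is the one used in this thread):
# fs_backing /home/ubuntu/TestFilesystem
```

If the mount point printed is / rather than the filesystem's own path, the directory most likely lives on the instance's root disk and would not survive termination.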

Set Docker's data root to your desired filesystem (it should be mounted at /home/ubuntu/<YOUR_FS_NAME>) and restart the Docker daemon:

echo '{ "data-root": "/home/ubuntu/TestFilesystem" }' | sudo tee /etc/docker/daemon.json && sudo systemctl restart docker
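
Note that the one-liner above overwrites any existing /etc/docker/daemon.json and does nothing to check that the filesystem is actually attached. A slightly more defensive sketch, assuming the same filesystem path; make_daemon_json is a hypothetical helper name, and it does no JSON escaping of the path:

```shell
# Hypothetical helper: print a daemon.json body for the given data-root,
# but only if that directory actually exists.
# Assumption: the path contains no quotes or backslashes (no JSON escaping is done).
make_daemon_json() {
    fs_dir="$1"
    if [ ! -d "$fs_dir" ]; then
        echo "error: $fs_dir is not a directory (is the filesystem attached?)" >&2
        return 1
    fi
    printf '{ "data-root": "%s" }\n' "$fs_dir"
}

# Example usage; the config is written only if the check above passes:
# json=$(make_daemon_json /home/ubuntu/TestFilesystem) \
#   && echo "$json" | sudo tee /etc/docker/daemon.json \
#   && sudo systemctl restart docker
```

Like the original one-liner, this still replaces the whole file, so any settings already in daemon.json would need to be carried over by hand.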

Pulling down a Docker image just to test/demonstrate:

sudo docker pull fedora
sudo docker images
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
fedora       latest    6126bb20b2b5   2 days ago   190MB

At this point I completely terminated the first instance, but of course I still had my persistent storage.

Then…

Started up a completely fresh/new instance:

SSH into this new instance and:

# instantly run the same one-liner as last time
echo '{ "data-root": "/home/ubuntu/TestFilesystem" }' | sudo tee /etc/docker/daemon.json && sudo systemctl restart docker

Checking and making sure our images are still present:

# are all of our images still there?
ubuntu:~$ sudo docker images
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
fedora       latest    6126bb20b2b5   2 days ago   190MB

Works like a charm. I did notice that when storing Docker images on these separate filesystems you may need to set your storage driver in Docker's JSON configuration, for example:

{
    "data-root":"/home/ubuntu/TestFilesystem",
    "storage-driver": "devicemapper"
}

However, I did not need to do this for the Fedora image in this demonstration; it's just something of note. I'm not fully aware of how the storage types/drivers differ on these separate filesystems.
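
To see which storage driver and data root Docker actually ended up with after the restart, `docker info` can print both. A small sketch; show_docker_storage is a hypothetical wrapper name, while `.Driver` and `.DockerRootDir` are fields of `docker info`'s template output. Also worth knowing: devicemapper has long been deprecated in Docker Engine and, as far as I'm aware, is no longer available in recent releases, so it is probably worth checking what the default driver reports before setting one explicitly.

```shell
# Hypothetical wrapper: report Docker's active storage driver and data root.
show_docker_storage() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH" >&2
        return 1
    fi
    sudo docker info --format 'driver={{.Driver}} data-root={{.DockerRootDir}}'
}

# Example:
# show_docker_storage
```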

TL;DR: it all worked for me in these basic tests without issue. Everything works out of the box at the cost of a one-liner on boot, which sounds like what you were after.


@mpapili Amazing answer!!! Thank you very much. I will be spinning up my first instance next week.
