My intro to Docker - Part 1 of 5

It’s been quite a while since I blogged, and there are several reasons for this. First of all, I haven’t really had the time, but I also haven’t really found a topic that I feel passionate enough about to blog about. But having played around with Docker, I now have! So I thought I would jot down some stuff about Docker… If nothing else, it gives me a way to come back to what I used to know when I have forgotten all of it again…

What is Docker?

The first thing to cover is what Docker really is. I have seen a lot of explanations, both of what it is, and why it is so good. But I have had a hard time grasping it in the way that it has been explained to me. So here is my explanation. And by my explanation, I mean the way I think about it. It might not be 100% correct from an implementation point of view, but it is the way I see it…

Many people ask what the difference is between Virtual Machines and Docker containers, so I thought I would take that viewpoint.

When we run VMs, we basically emulate all the hardware of the machine, including a virtual hard drive, and then boot an operating system from that virtual HDD inside of our existing operating system. So we are booting up a full machine, as we normally would, but all the hardware that it runs on is virtualized by the host. This way, we can run many machines on one physical machine, but each virtual machine is a full machine with an OS and so on. Docker is different…

When we run Docker, we have our existing operating system as the base, and then create an area on top of that where our Docker container runs. It still uses the same operating system, but it’s isolated from everything else on the host, so that it can’t interact with the other things installed on the machine. That way, we don’t have to boot up a whole operating system, because that is already booted up with the host. Instead, all we need to do is set up a context that is isolated from the rest of the machine, and have our application run in there. Inside that context, we can then set up the environment that is needed for our application to run. But the main thing is that it doesn’t run its own OS, it just runs in an isolated context on top of the host.

The way this works inside the container is a lot like the differential disks that are used when running virtual machines. A diff disk contains all the changes from the base disk to the current state. So you start out with a base disk that has all the “base” information. The base disk could for example be a disk that contains a clean Windows Server install. Then you add a disk on top of that, and install IIS. That second disk only contains the added/changed bytes that were required to install IIS. And then on top of that you might add another diff disk that contains your application. And that disk only contains the bytes needed for that… Slowly building up to what you need. In the end, the disks are merged together and booted up like a full server. But you can, for example, use the same base disk for multiple diff disks. So the Windows Server base disk could be used for all VMs running on that Windows version.

In Docker, your initial base disk is actually the host’s own operating system. In most cases that would be your Linux machine, but with Windows Server 2016 you can use Windows containers as well. And then your images are basically diff disks on top of that host operating system. But they aren’t based on the actual host disk, with all the installed applications and so on. Instead, they’re based on a clean/empty version of that OS disk. Basically the raw OS part of your host.

That means that we don’t have to boot up a server to get everything running. Instead, we can use the host’s OS, and then start a new isolated context with the “diff disks” added on top of the empty OS. This makes it MUCH faster to start a container, compared to starting a physical or virtual machine.

This is also why you can’t run Windows containers on a Linux host and vice versa. At least at the moment. In the near future, it seems like you will be able to run Linux containers on Windows. But that is done through some magic and virtualization.

In Docker, we don’t talk about differential disks though. We talk about images. But an image is basically like a diff disk. It contains all the changes you want to do on top of the host OS for your application. And just as in my previous example, you could create an image that contained IIS. And then add another image based on that image, that contains your application.

Kind of like this

[Diagram: a physical/virtual host machine at the bottom, the host OS on top of it, and a container (green) with an IIS image and an application image (orange) layered on top]

My visualization skills might not be the best, but what I’m trying to show is that at the bottom, we have the physical or virtual machine that will be the host. On top of that, we have the host operating system. In that OS, we might have files and apps that we have installed on the machine, but it also contains a container (green). The container has 2 images (orange) “added on top of” the host OS. In this case, one image that adds IIS, and then one on top of that that includes the actual application that I want to host. But the container is completely isolated from the installed apps and files on the host. It even has its own network. So the isolation is pretty complete, unless we go in and mess with it.

This is a very simple example, and also a potentially stupid one, as I chose to use IIS. Normally, you would probably use Apache or nginx for the example, as most Docker stuff is Linux-based. But I thought that it might be a little easier to grasp using Microsoft-centric technologies. And I consider it simple since the example only runs one container on the host. In most cases you would run multiple containers on the same host, all isolated from each other. This enables much better utilization of resources on the host, and higher density of applications on each machine.

Images and containers…?

As I explained in the previous section, a Docker image is a lot like a diff disk. It contains the bytes that need to be added to get the environment you need inside your container. And the cool thing is that there are public repositories containing images for you to use. This makes it really easy to get started with new stuff. If you want to play around with Redis for example, you just pull down the Redis image to your machine, and start a container based on that image, and all of a sudden you have Redis running with a default configuration on your machine. No installation. No configuration. And no risk of messing up your machine.

So the image contains the bytes needed to set up the environment you need to run your application. And it’s based on some other image, or the host OS. The image could be a Redis install with default configuration, or maybe Apache, or maybe just an image preloaded with the stuff you need to run .NET Core on Linux. Either way, it is just a predefined set-up of the environment you need.

The image is then used to start containers. So you don’t start or run the image as you would start a VM by using its disk. Instead, you start/run a container based on that image. So the image is just the blueprint for the environment that you want inside of your container, and it’s immutable. You can start as many containers as you want based on the same image. Each container will use the image to set up the environment, and then start whatever processes are configured inside that environment, but the container will never change the image. Any writes inside the container go to another layer on top of the image…
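A fun way to see this layering for yourself, once you have Docker installed (more on that below), is the history command, which lists every layer in an image together with the size it adds

docker history alpine

Each row in the output corresponds to one “diff disk” in the analogy, and the SIZE column shows how many bytes that layer adds on top of the one below it.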

Installation

The first step to getting started with Docker is to install it on your machine. And since you are on my blog, I’m assuming that you are running Windows, or maybe Mac. That means that you want to install Docker Desktop. This is an application that runs on both Windows and Mac, enabling Docker. On the Windows side, it uses Hyper-V to host a Linux VM that runs Docker for us, and on the Mac side it does much the same thing, using Mac-based virtualization to host a Linux machine that runs Docker.

Once you have either of these installed, and started, you can start using Docker. I use Windows, so I will be using PowerShell to run my Docker commands, but it should be pretty much identical if you do it in the Terminal on a Mac…

The first thing you can try, just to verify that everything is working, is to run

docker version

This will tell you what version of Docker you are running, both on the server and on the client, as well as some other stuff.
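The output is split into a client section and a server section, and will look something like this, with version numbers and details varying depending on your setup

Client:
 Version:      &lt;your client version&gt;
 OS/Arch:      windows/amd64

Server:
 Version:      &lt;your server version&gt;
 OS/Arch:      linux/amd64

Note the OS/Arch values. The client runs on Windows, while the server is the Docker engine running inside the Linux VM that Docker Desktop set up for us.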

If you don’t get a printout telling you that, something is wrong…

Getting images and setting up a container

Next, you can try and run

docker images

This will show you what images you have installed on your machine. I assume that you get very little back since you just installed it. And by very little, I mean nothing. But to be honest, I’m not 100% sure of what you get by default when you install it. But either way, let’s try and pull down a small and simple base image that we can try to run.
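By the way, the output of docker images is just a table like this, which will be empty apart from the header row until you have pulled something down

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE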

Images are stored in repositories, either public, private, or local. When you ran docker images, you asked Docker to list all images in the local repository. But there is also a huge public collection of Docker repos called Docker Hub. This is a place where people upload useful images for the public to use. And the image I want is a tiny one called alpine, which is a roughly 5 MB Alpine Linux image.

There are two ways to pull an image from a Docker Hub repo. Either you just ask Docker to set up a container based on the image you want, and Docker automatically pulls it down for you if you don’t have it, like this

docker run -it alpine

or, you can manually pull it down to your machine first, and then run it like this

docker pull alpine
docker run -it alpine

Either way, it will pull down the alpine Docker image to the local host, set up a new container based on that image, and attach the input and output from that container’s terminal to your PowerShell window. So after running the docker run command, you can go ahead and run whatever Linux shell command you want inside that container… And when you are done, you just type exit to exit the shell, which causes the container to stop.

Note: When pulling down the image, it mentions that it is using the default tag “latest”. “Latest” in this case is the “version” of the image. All images in a repo have a tag, or version. By default, an image that is pushed to a repo without an explicit tag gets the “latest” tag. This way, you can ask for a specific version of an image in a repo, or just leave the tag out and get the latest by default. And since I don’t care about what version of the alpine image I get, I just leave out the tag for now to get the latest.
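If you do want a specific version, you just add the tag after a colon. So to get a specific version of the alpine image, assuming that tag exists in the repo, you would run something like

docker run -it alpine:3.6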

So, what am I really doing here? Well… the docker pull alpine is pretty self-explanatory I think. But the docker run one is a bit more complicated, as I have added some options to that command.

docker run tells the Docker client to use the run command, which sets up, and starts, a new container. The simplest version of that command is just

docker run <IMAGE NAME>

This will set up a new container using the defined image, and start it. However, if you do that with the alpine image, it will seem to do nothing. It will start the container, but the default command (a shell) exits immediately since it has no input attached to it, so Docker will just stop the container straight away.

This isn’t very useful for this image. But you can also give it a command to run when it starts, which can be useful. Like this

docker run alpine ls

This will set up a new container based on alpine, start the container, run the ls command, output the returned result, and then consider the command completed and stop the container again.
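The output is simply the contents of the root of the Alpine file system, something like

bin    dev    etc    home   lib    media  mnt    proc   root   run    sbin   srv    sys    tmp    usr    var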

So what is the -it option? Well, first of all, it is a concatenated version of -i -t. The -i keeps the container’s input open (interactive), and the -t attaches a terminal to it. Together they mean that you attach to both the input and output of the running container, allowing you to execute commands inside the container. So when the container starts, the PowerShell prompt turns into a remote prompt for the Linux container you are running.
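In practice, an interactive session looks something like this, where the / # prompt is the shell inside the container

PS&gt; docker run -it alpine
/ # echo Hello from inside the container!
Hello from inside the container!
/ # exit
PS&gt;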

Listing and removing containers

Starting up and playing around with containers is fun, but when you run a container like this, once it stops, it isn’t actually deleted.

If you want to see all the containers on your machine, you use the ps command like this

docker ps

This will list all the running containers on your machine.

In this case, it will probably be empty, because all of your containers have been stopped. But if you tell it to list ALL the containers like this

docker ps -a

you probably get a list of containers that are stopped.

The list includes the container’s ID, the image it is based on, the command that is run when it starts, when it was created, its current status, its ports, and the name of the container. A good set of information about the container…
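In other words, a table with these columns

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES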

If you want to clean up the list, and free some space on your machine, you can remove a container by running the rm command.

docker rm <CONTAINER ID/NAME>

All containers get a unique ID, as well as a name. If you don’t set one manually, the name is an auto-generated two-part name, which is a little easier to work with than the auto-generated ID. You can use either when you remove a container though. And if you use the ID, you only need to type enough of its characters for it to be unique. You don’t need to use all of them…
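For example, given a completely made-up container with the ID 3f2a9bc1d04e and the auto-generated name brave_curie, any of these would remove it, as long as the shortened ID is unique among your containers

docker rm 3f2a9bc1d04e
docker rm 3f2
docker rm brave_curie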

Every so often you end up starting up and playing around with a container for a very short time, while trying something out. Having to manually remove it afterwards can be a bit tedious. So you can actually automate the removal if you want, by adding the option --rm to the run command. Like this

docker run -it --rm alpine

This way, you tell Docker that you want it to remove the container as soon as it stops, so you don’t end up with a heap of useless containers on your machine.
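You can easily verify this by running a container with --rm, exiting it, and then listing all containers again

docker run -it --rm alpine
docker ps -a

The container you just exited won’t show up in the list.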

This is a nice convenience thing, but beware, once in a while you might wish that your test container hadn’t been removed…

That was it for this post. In the next post I’ll look at creating your own images.

zerokoll

Chris

Developer-Badass-as-a-Service at your service