A brief look at Azure Container Service

Yesterday I had a couple of hours left over as I was on a train on the way to do a presentation. So I thought I would play around a little with the Azure Container Service. Seeing that I have gotten hooked on Docker, being able to spin up a Docker cluster while on the train, using just my mobile phone for the connection, seemed like a really cool thing. I guess normal people read books and watch Netflix on the train. Me…I spin up 5 Docker clusters…

Setting up an Azure Container Service is really simple, so let’s have a look at it.

Setting up a new Container Service

You just go to the portal and choose to add a new Azure Container Service, and accept that it will use the Resource Manager deployment model. Then you have to fill out a bit of information.

First you have to fill out some basic information. It wants a name for the service, the subscription that should pay for it, a resource group to place it in and a region to put it in. Pretty basic stuff.

Next it wants to know what kind of orchestrator you want to use, and how you want to set up the masters in the cluster. In my case, I chose Docker Swarm as the orchestrator. Then you need to give it a unique DNS name prefix. It needs to be unique as it will be used as a prefix for the 2 domain names that are set up for your cluster. Once you have defined that, you need to tell it what username should be used for the admin account on the machines, as well as the SSH public key to use when connecting to them. And finally, you have to tell it how many masters you want. You get the option of 1, 3 or 5, which are all automatically distributed among fault and update domains to make sure that they don’t all go down at the same time. It would be kind of useless to have multiple masters if they all went down when something goes wrong, or when they are updated…

But before I go any further, I want to go out on a tangent and talk about SSH keys…

SSH keys

A lot of developers work with SSH keys and SSH connections all the time, but not all of us. To be perfectly honest, this is actually one of the first times ever that I have had to create SSH keys. It all depends on the environment you work in.

So what are they, and what do we need them for? Well, it is actually not that hard. It is just a private/public key pair that is used to secure the communication between 2 endpoints. When you connect to a master in the cluster, you do that by setting up an SSH tunnel. This is a secure connection between your machine and the server, where the server holds the public key that you just provided in the portal, and your machine proves who it is using the matching private key.

If you haven’t created a set of SSH keys before, I thought I would quickly mention how to do that. It is really simple. All you have to do is run the following command in the terminal

ssh-keygen -t rsa -b 4096 -C "[YOUR E-MAIL ADDRESS]"

This will then ask you where to put the generated files. I just accepted the default, which is C:\Users\[USERNAME]\.ssh\. And then it wants you to input a passphrase to make sure that you are the only one that can use the keys.

Once the command has completed, you will have a brand new set of SSH keys!

However, if you are on a Windows machine, like me, you probably get an error saying that it can’t find the executable ssh-keygen. So where do you go to get that? Well, if you have Git installed on your machine, you are in luck. That installation actually gives you that executable, as well as a few other ones, as part of the package. All you have to do is add %ProgramFiles%\Git\usr\bin to your PATH, and you are good to go.
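If you only need it for the current session, instead of permanently modifying the PATH, something like this does the trick in CMD (assuming Git is installed in its default location):

REM Add Git's bundled Unix tools to the PATH for this session only
set PATH=%PATH%;%ProgramFiles%\Git\usr\bin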

Once you have the keys generated, you can just open the file called id_rsa.pub in Notepad, and copy the key.
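If you prefer the terminal over Notepad, you can also send the key straight to the clipboard (assuming the keys ended up in the default location):

REM Copy the public SSH key to the clipboard
type %USERPROFILE%\.ssh\id_rsa.pub | clip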

Ok, after that side note, let’s get back to creating the cluster!

More configuration

As soon as we have set up the master node configuration, we need to configure the agents. However, this is a lot easier than the rest of the config. You just need to set up how many agents you want, and what size of machines you want to use. In other words, how much load do you need to handle, and how big is your wallet.

That’s it! As soon as the portal has verified the values, you can click OK, and get your cluster set up! It takes a bit of time though… On the other hand, once you see what it has set up, you will understand why. There are quite a few pieces involved in setting up the cluster. And that is actually part of the reason I wanted to write this post. I wanted to have a better look at what is actually being created. Since everything is created automatically, the pieces are named less than perfectly, and since there is so much stuff being created, it can be hard to figure out how it all fits together.
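By the way, if you prefer the command line over the portal, the same kind of cluster can be created with the Azure CLI. This is just a rough sketch with placeholder names, and the exact switches are worth double checking against az acs create --help before relying on them:

# Rough sketch of creating the same cluster with the Azure CLI instead of the portal
# (bash-style line breaks, placeholder names)
az acs create --orchestrator-type Swarm \
  --resource-group my-resource-group \
  --name my-container-service \
  --dns-prefix mydnsprefix \
  --admin-username [ADMIN ACCOUNT] \
  --ssh-key-value ~/.ssh/id_rsa.pub \
  --master-count 3 \
  --agent-count 2 \
  --agent-vm-size Standard_D2_v2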

When the creation is done, you get all your new services added to the resource group you chose. So what is actually being created? Well, let’s walk through it and have a look!
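If you want to see the full list for yourself, one way (assuming you have the Azure CLI installed, and using a placeholder resource group name) is to simply list everything in the resource group:

# List everything that ended up in the resource group
az resource list --resource-group my-resource-group --output table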

What do you get?

I’ll go outside in, from the internet to the machines, instead of by name or service, as I think that makes it easier to understand how the different parts fit together.

First of all, we get a Container Service with the name we set up. This service in itself is pretty boring, and doesn’t give us a whole lot of information.

“Inside” that, we get a Virtual Network (swarm-vnet-XXX) that connects all of the VMs in the cluster. Actually, it connects the network interfaces attached to the master VMs and a Virtual Machine Scale Set containing the agents. But for simplicity, let’s just say that it connects the machines in the cluster… The virtual network is set up with 2 address spaces, 10.0.0.0/8 and 172.16.0.0/24, which are used by 2 subnets, swarm-subnetMaster (172.16.0.0) and swarm-subnet (10.0.0.0).

Seeing that the network is split into 2 parts, one for the masters and one for the agents, I’ll look at the cluster based on that. I’ll start out by looking at the master side of things…

On the swarm-subnetMaster subnet, there is a Load Balancer (swarm-master-lb-XXX) that manages the masters. It adds NAT rules, mapping TCP ports 22 and 2200 to port 22 on the master.

And what is port 22? Well, it’s SSH. So this NAT allows us to connect to the master using SSH.

And if you have more than one master, it maps port 220x to port 22 on each of the masters, where x is a sequential number, giving us ports 2200, 2201 and 2202 in a 3 master cluster. This makes it possible to connect to each one of the masters by choosing the correct port to use.
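So if you, for example, wanted to open a plain SSH session to the second master in a 3 master cluster, it would look something like this (using the management DNS name that I’ll get to in a minute):

ssh -p 2201 [ADMIN ACCOUNT]@[ACS SERVICE NAME]mgmt.[REGION].cloudapp.azure.com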

The Load Balancer connects to the master machines using one or more Network Interfaces (swarm-master-XXX-nic-x). So depending on the number of masters you chose, you will have one or more Network Interfaces defined.

Each one of these Network Interfaces is then connected to a Virtual Machine (swarm-master-XXX-x) that plays the role of Swarm Master in the cluster.

And that is pretty much it for the master side of the network. However, there are 2 more things that are set up for us.

First of all, all the masters are connected to an Availability Set (swarm-master-availabilitySet-XXX). This availability set is responsible for spreading the masters across fault and update domains, to make sure that they don’t all go down together if something happens.

And secondly, there is a Public IP Address (swarm-master-ip-[ACS SERVICE NAME]mgmt-XXX). This is a public IP address that is connected to the Load Balancer, giving us a public endpoint that we can use to reach our master nodes. It also happens to be configured with a DNS name, which is defined as [ACS SERVICE NAME]mgmt.[REGION].cloudapp.azure.com. So whenever we want to connect to our Swarm masters, we can connect to that address.

Ok, so that takes care of the master end of things. But what’s the situation on the agent side of things? Well, a lot of it is very similar…

On the swarm-subnet, there is another Load Balancer (swarm-agent-lb-XXX) that manages the incoming requests and forwards them to the agents. However, this Load Balancer actually does load balance the incoming traffic, instead of just forwarding ports to specific machines. It has load balancing rules set up for ports 80, 443 and 8080, passing the requests on to something called a backend pool. These “pools” are sets of machines that should handle the incoming requests. In this case, there is a single pool (swarm-agent-pool-XXX), containing…no, not a set of agents…but a Virtual Machine Scale Set.

So the Load Balancer forwards the incoming requests to a Virtual Machine Scale Set (swarm-agent-XXX-vmss). This is a uniform set of machines that is handled as a single “unit”. The set can then be scaled out and in, and Azure takes care of setting everything up for us. So all we have to do is tell it what OS, machine size and count we want, and Azure takes care of the rest. It even makes sure to spread the machines across fault and update domains and so on, to make the set as highly available as possible.
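Just as a hypothetical example, scaling the agent part of the cluster out to 5 machines could be done by scaling that scale set with the Azure CLI (the names below are placeholders for the auto-generated ones in your resource group):

# Scale the agent scale set out to 5 instances (placeholder names)
az vmss scale --resource-group my-resource-group --name swarm-agent-XXX-vmss --new-capacity 5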

And finally, we get a bunch of Storage Accounts to hold the OS disks and diagnostics data.

So that’s pretty much what we get when we set up a new Azure Container Service! There are quite a few pieces that have to be set up, and work together, for everything to function. So it’s kind of nice that Azure sorts all of that out once we have defined the basic requirements! But how do we go about starting up a container in the cluster we just created? Well…that isn’t that hard!

Connecting to the master, a.k.a. getting your SSH on

The first thing we need to do is to set up an SSH tunnel to the Swarm Master. This isn’t complicated as such, but for me, as a Windows person, it feels a bit awkward. But just relax and let go of that! It’s not hard, and it is pretty cool once you have tried it. Just open a terminal and run the following command

ssh -fNL 2375:localhost:2375 -p 2200 [ADMIN ACCOUNT]@[CONTAINER SERVICE NAME]mgmt.northeurope.cloudapp.azure.com

This opens up an SSH tunnel from your machine to the Swarm manager in the cloud. It connects to the manager on port 2200, securing the traffic using the SSH keys. It then maps port 2375 on your local machine to port 2375 on the remote machine. So anything you send to localhost:2375 will be sent to port 2375 on the remote machine, tunneled securely over SSH. The endpoint [ADMIN ACCOUNT]@[CONTAINER SERVICE NAME]mgmt.northeurope.cloudapp.azure.com just says that you want to connect to [CONTAINER SERVICE NAME]mgmt.northeurope.cloudapp.azure.com, using the account [ADMIN ACCOUNT].

Actually, that is the -L part of the command. The -f and -N flags tell SSH to go to the background and to not run any command on the remote machine, so the tunnel just sits there forwarding traffic while you keep using your terminal for other things.

Once the tunnel is up and running, we need to tell our Docker client that we want to communicate with our Docker master in the cloud using that port. So, to do that, we need to set the DOCKER_HOST environment variable, which is easily done by calling

set DOCKER_HOST=:2375

if you are using CMD, or

export DOCKER_HOST=:2375

if you are in Bash.
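If you don’t feel like fiddling with environment variables at all, you can also pass the host explicitly to each command using Docker’s -H flag, for example

docker -H tcp://localhost:2375 ps

to list the containers running in the cluster.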

And if you want to verify that you are connected, you can just try running

docker info

to see if you get a response.

Starting your first container

Once you are connected, you can run a new container like you would in any other scenario… For example, I ran

docker run -d -p 80:80 --name demo_app zerokoll/demoapp

which started a very simple ASP.NET Core container that I had uploaded to my own Docker Hub repo. And as you can see, I mapped the external port 80 to the container’s port 80. This is possible because port 80 is mapped in the agent load balancer by default, together with ports 443 and 8080. But if you want to use any other ports, you need to remember to map those ports through the load balancer as well.
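So spinning up a second container on one of the other pre-mapped ports is just a matter of picking one of them, like 8080. Here I’m using the official nginx image as a stand-in, just to show the port mapping:

docker run -d -p 8080:80 --name demo_app2 nginx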

Once the container has started, you can just browse to the agent endpoint ([CONTAINER SERVICE NAME]agents.[REGION].cloudapp.azure.com) in your browser.
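Or, if you prefer to stay in the terminal, just curl it:

curl http://[CONTAINER SERVICE NAME]agents.[REGION].cloudapp.azure.com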

If you want to deploy something more complicated than a single container like that, you can go ahead and use a docker-compose.yml file. However, since the Swarm endpoint in ACS only supports Docker API version 1.24, we can’t use docker stack, as that requires API version 1.25. Instead, we need to use docker-compose.
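As a small, hypothetical example, a docker-compose.yml for the demo app above could look something like this, using the version 2 compose format:

# docker-compose.yml - minimal sketch for the demo app above
version: '2'
services:
  demo_app:
    image: zerokoll/demoapp
    ports:
      - "80:80"

With the DOCKER_HOST variable still pointing at the tunnel, running docker-compose up -d in the same folder will then deploy it to the cluster.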

Ok…I think that was all I had to cover. It is probably just a bunch of stuff you could have figured out on your own, but hopefully it was faster to read about it here. And to be honest, the post is mostly written with myself in mind. I needed to write this down to sort it all out in my head, and to have a place to come back to when I forget how it is all connected. But hopefully it might help someone else out there as well at some point!

Cheers!

zerokoll

Chris

Developer-Badass-as-a-Service at your service