Multicloud Kubernetes

Introduction

In this blog post, I’m going to go over how I deploy a multi-cloud Kubernetes cluster. But first, why do this?

I run OpenStack at home in my basement. I do not own a UPS, nor do I pay for dual internet connections. But I have a lot of hardware that costs me less than $40/month in electricity. Power outages and consumer ISP outages can be devastating to an application I would like to keep running, so I’m going to leverage some cheap cloud servers and the power and flexibility of Kubernetes to make sure a power outage does not bring down my application.

I have a GitHub repo with all of my example code. Go look at it, because I do not plan on putting a ton of code in here. Just enough to solve some non-standard problems.

Outline

  1. Install VMs on OpenStack at home, OpenStack on RamNode, and DigitalOcean. This will be done with Terraform.
  2. Create a Kubernetes cluster with these VMs. This will be done with RKE and Ansible.
  3. Solve some gotcha problems.
  4. Deploy your app.
  5. Profit.

VPN

How will these instances talk to each other? My homelab is all behind NAT and I doubt it’s good practice to have your pod network running on public networks.

Hey I have an idea! VPN!… Ugh… But now where do I host the VPN server? That’s now a single point of failure…

Enter ZeroTier. What is ZeroTier? It’s a peer-to-peer VPN. It requires a central server when nodes first get added to the VPN, but after that all traffic is peer-to-peer. This means I no longer have to worry about a single point of failure for the VPN.

Installing the VMs with Terraform

In a previous post on Immutable Infrastructure, I suggested using Terraform. Terraform will give us immutability on the VMs and will allow us to easily deploy VMs to multiple platforms at a time.

In this post, I’m not going to go over everything that is involved in creating VMs on multiple platforms. I will, however, show you how I’m creating an Ansible inventory file with Terraform.

# Render the Ansible inventory from the addresses of every VM Terraform created.
resource "local_file" "hosts_cfg" {
  content = templatefile("./hosts.cfg",
    {
      # One address per line for each group of nodes.
      ramnode_workers = join("\n", openstack_compute_instance_v2.ramnode-worker[*].network[0].fixed_ip_v4)
      ramnode_masters = join("\n", openstack_compute_instance_v2.ramnode-master[*].network[0].fixed_ip_v4)
      home_workers    = join("\n", openstack_compute_instance_v2.home-worker[*].network[0].fixed_ip_v4)
      home_masters    = join("\n", openstack_compute_instance_v2.home-master[*].network[0].fixed_ip_v4)
      do_workers      = join("\n", digitalocean_droplet.worker[*].ipv4_address)
      do_masters      = join("\n", digitalocean_droplet.master[*].ipv4_address)
    }
  )
  filename = "inventory"
}

Looking at this, I’m joining each group’s addresses with a newline so that every node lands on its own line. Here is what the hosts.cfg file looks like:

[workers]
${ramnode_workers}
${home_workers}
${do_workers}

[masters]
${ramnode_masters}
${home_masters}
${do_masters}

And after Terraform renders it, here is what my inventory file looks like:

[workers]
107.191.111.61
10.0.0.183
161.35.105.36

[masters]
107.191.111.187
10.0.0.137
161.35.14.32

Now, I have a working file to pass to Ansible!
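
If the rendering step feels abstract, here is a minimal Python sketch that mimics what templatefile and join do. It is not Terraform, only an illustration: Python's string.Template happens to understand the same ${name} placeholders. The addresses are the ones from my rendered inventory, and render_inventory is a made-up helper name.

```python
from string import Template

# Same layout as hosts.cfg; string.Template understands ${name} placeholders.
HOSTS_CFG = """[workers]
${ramnode_workers}
${home_workers}
${do_workers}

[masters]
${ramnode_masters}
${home_masters}
${do_masters}
"""


def render_inventory(groups):
    """Join each group's addresses with newlines, then fill in the template."""
    joined = {name: "\n".join(addresses) for name, addresses in groups.items()}
    return Template(HOSTS_CFG).substitute(joined)


inventory = render_inventory({
    "ramnode_workers": ["107.191.111.61"],
    "home_workers": ["10.0.0.183"],
    "do_workers": ["161.35.105.36"],
    "ramnode_masters": ["107.191.111.187"],
    "home_masters": ["10.0.0.137"],
    "do_masters": ["161.35.14.32"],
})
print(inventory)
```

A group with more than one VM simply produces more lines under the same heading, which is why the join uses a newline as its separator.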

Deploying Kubernetes

To deploy Kubernetes, I like using RKE (Rancher Kubernetes Engine). I use Ansible to template out the RKE config. So my flow looks like this:

Terraform -> Does the separate cloud stuff and creates VMs -> Outputs inventory file for Ansible
Ansible -> configures basic stuff on VMs -> templates out the RKE config -> runs RKE for you
RKE (run locally by Ansible) -> deploys Kubernetes and configures it to use the ZeroTier network.
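
As a rough sketch of that flow, here is a small Python driver that runs the two top-level steps in order. The playbook name site.yml is hypothetical, and the script defaults to a dry run that only lists the commands. RKE does not appear as a step because, in this flow, Ansible runs it.

```python
import subprocess

# Hypothetical file names; adjust them to your own repo layout.
PIPELINE = [
    ["terraform", "apply", "-auto-approve"],
    ["ansible-playbook", "-i", "inventory", "site.yml"],
]


def run_pipeline(commands, dry_run=True):
    """Run each step in order; check=True stops at the first failing step."""
    executed = []
    for command in commands:
        executed.append(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return executed


for step in run_pipeline(PIPELINE):
    print(step)
```

Calling run_pipeline(PIPELINE, dry_run=False) would actually execute the commands, so only do that from a directory that holds your Terraform and Ansible code.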

Here is what the RKE config (as an Ansible template) looks like:

---

ssh_key_path: {{ rke_ssh_key_location }}

cluster_name: {{ rke_cluster_name }}
ignore_docker_version: true
system_images:
    kubernetes: rancher/hyperkube:v{{ kubernetes_version }}-rancher1

services:
{% if (longhorn_enabled is defined and longhorn_enabled | bool == True) %}
  kubelet:
    extra_args:
      volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
    extra_binds:
      - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec
{% endif %}
  kube_api:
    service_cluster_ip_range: 192.168.0.0/16

network:
    plugin: flannel
    {% if (kubernetes_network_interface is defined) %}

    options:
        flannel_iface: {{ kubernetes_network_interface }}
    {% endif %}

nodes:
  {% for node in groups['masters'] %}

  - address: {{node}}
    name: {{hostvars[node]['ansible_hostname']}}
    hostname_override: {{hostvars[node]['ansible_hostname']}}
    internal_address: {{ hostvars[node][kubernetes_ansible_interface]['ipv4']['address'] }}
    user: {{standard_user}}
    role:
    - controlplane
    - etcd

  {% endfor %}
  {% for node in groups['workers'] %}

  - address: {{node}}
    internal_address: {{ hostvars[node][kubernetes_ansible_interface]['ipv4']['address'] }}
    name: {{hostvars[node]['ansible_hostname']}}
    hostname_override: {{hostvars[node]['ansible_hostname']}}
    user: {{standard_user}}
    role:
    - worker

  {% endfor %}

dns:
    provider: coredns
    upstreamnameservers:
    - 1.1.1.1
    - 8.8.4.4

addons_include:
  - {{ rke_directory }}/cert-manager-namespace.yaml
  - {{ rke_directory }}/configs/cloudflare-updater.yaml
  - {{ rke_directory }}/configs/test-app.yaml
  - https://github.com/jetstack/cert-manager/releases/download/v0.13.1/cert-manager.yaml
  - {{ rke_directory }}/cert-manager-prod-issuer.yaml

The key pieces here are internal_address, which is the ZeroTier address of the node, and flannel_iface near the top, which is the ZeroTier interface name. internal_address tells each Kubernetes node to advertise itself with that address. flannel_iface tells Kubernetes to run the flannel pod network over the ZeroTier interface.
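
To show where those values come from, here is a minimal Python sketch that builds one nodes entry from Ansible-style facts, roughly what the template loop does. It assumes the ZeroTier interface name starts with "zt", which is what I see on Linux. The interface name, the 172.22.0.11 address, the user, and both helper names are made up for illustration.

```python
def zerotier_interface(facts):
    """Return the first interface whose name starts with 'zt', or None."""
    for name in facts.get("ansible_interfaces", []):
        if name.startswith("zt"):
            return name
    return None


def rke_node(public_address, facts, user, roles):
    """Build one entry of the RKE nodes list from Ansible-style facts."""
    iface = zerotier_interface(facts)
    if iface is None:
        raise ValueError("no ZeroTier interface found on " + public_address)
    return {
        "address": public_address,  # used by RKE to reach the node over SSH
        "internal_address": facts["ansible_" + iface]["ipv4"]["address"],
        "hostname_override": facts["ansible_hostname"],
        "user": user,
        "role": roles,
    }


# Hypothetical facts for one node; real ones come from Ansible fact gathering.
facts = {
    "ansible_hostname": "multicloud-home-master",
    "ansible_interfaces": ["lo", "eth0", "ztabc12345"],
    "ansible_ztabc12345": {"ipv4": {"address": "172.22.0.11"}},
}
node = rke_node("10.0.0.137", facts, "debian", ["controlplane", "etcd"])
print(node)
```

The point is the split between the two addresses: address is how RKE connects to the machine, while internal_address is the VPN address the nodes use to talk to each other.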

$ kubectl get nodes
NAME                             STATUS   ROLES               AGE     VERSION
multicloud-digitalocean-master   Ready    controlplane,etcd   2m47s   v1.15.12
multicloud-digitalocean-worker   Ready    worker              2m36s   v1.15.12
multicloud-home-master           Ready    controlplane,etcd   2m49s   v1.15.12
multicloud-home-worker           Ready    worker              2m41s   v1.15.12
multicloud-ramnode-master        Ready    controlplane,etcd   2m45s   v1.15.12
multicloud-ramnode-worker        Ready    worker              2m35s   v1.15.12

We now have one master and one worker in each cloud. You will always want a master in each cloud to make sure your API layer is HA. Workers are up to you. With my application at home, I deploy more workers at home because VMs at home are “free” and I have capacity. If my home “cloud” goes down, I make sure there are enough resources on the other clouds to handle regular traffic until the home cloud comes back up. Alternatively, I can scale out the other clouds easily with Terraform.
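
Why one master per cloud across three clouds? Each master here also runs etcd, and etcd needs a majority of its members to stay available. The arithmetic is simple enough to sketch in Python (the function names are mine):

```python
def etcd_quorum(members):
    """Smallest majority of an etcd cluster with this many members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be lost while a majority is still reachable."""
    return members - etcd_quorum(members)


for members in (1, 2, 3, 5):
    print(members, etcd_quorum(members), failures_tolerated(members))
```

With three members the quorum is two, so losing any one cloud still leaves a working control plane. Two members tolerate no failures at all, which is why two clouds would not be enough.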

Gotchas

There are two gotchas in this setup.

  1. You cannot use cloud block storage. Why? Because Kubernetes defines that storage cluster-wide and expects all nodes in the cluster to be able to use it. RamNode and my home OpenStack nodes cannot attach DigitalOcean block storage. This is solved by deploying Longhorn or another in-cluster storage system such as Rook Ceph. My GitHub repo uses Longhorn.
  2. What is the entrypoint? Enter Cloudflare (or another DNS provider with an API).

I wrote a quick Docker image with a script that updates Cloudflare with my public IP address. You can see that here. I deploy it as a pod that runs on the cluster and updates Cloudflare with the public address of whichever node it lands on.

Now, if a node goes down, the Cloudflare updater will move to another node in the cluster and update Cloudflare to point to its public address!

This trick even works behind NAT. At home, I have a public IP -> ports 80 and 443 forwarded to HAProxy -> HAProxy load balancing across my Kubernetes workers.
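
This is not the script from my image, but a minimal Python sketch of the idea, assuming Cloudflare's v4 API with an API token and the ipify service for discovering the public address. The zone ID, record ID, hostname, and token are placeholders, and the example only builds the request instead of sending it.

```python
import json
import urllib.request

API = "https://api.cloudflare.com/client/v4"


def current_public_ip():
    """Ask an external service which address our traffic leaves from."""
    with urllib.request.urlopen("https://api.ipify.org", timeout=10) as response:
        return response.read().decode().strip()


def build_update_request(zone_id, record_id, name, ip, token):
    """Build (but do not send) the request that points an A record at ip."""
    body = json.dumps({"type": "A", "name": name, "content": ip, "ttl": 120})
    return urllib.request.Request(
        url=f"{API}/zones/{zone_id}/dns_records/{record_id}",
        data=body.encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder IDs and token; 203.0.113.10 is a documentation-only address.
request = build_update_request(
    "ZONE_ID", "RECORD_ID", "helloworld.example.com", "203.0.113.10", "API_TOKEN"
)
print(request.get_method(), request.full_url)
```

A real updater would call current_public_ip() in a loop, compare the result with the last value it pushed, and pass the request to urllib.request.urlopen only when the address changed. Behind NAT, the address it discovers is the router's public IP, which is exactly what the DNS record should point at.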

EXTRA OPTIONAL CONFIG: Because I’m using Debian 10, systemd-resolved is a little wonky at times with Docker. This is why I added upstream DNS servers to the RKE config above.

Deploy App

I have temporarily deployed a test app for this Kubernetes cluster to https://helloworld.test.codywimer.com. This is reflected in the GitHub repo. I don’t plan on leaving it up for very long, so I’ve included a screenshot.

Profit/Final Thoughts

Here is the final diagram for this multi-cloud Kubernetes cluster:

Architecture Diagram

This setup will give you a highly available multi-cloud application deployment platform. At home, it provides me HA. In an enterprise, it can provide flexibility: you can use cheap on-prem infrastructure for normal load and then (with more Terraform) scale horizontally up and down with cloud resources.

IMO a hybrid (in house + cloud) approach to infrastructure gives you the best bang for your buck and gives you the most flexibility.

Happy Infrastructuring!

Jacob Cody Wimer

Immutable Infrastructure

Introduction

What is normal behavior when a server goes haywire? SSH into it and see what’s going on, right? How do you normally update a server? SSH into it and (on Ubuntu) run sudo apt-get upgrade, right? How do you normally deploy your applications? SSH into your server and run git pull or docker pull, right? That behavior is the behavior of a traditional mutable infrastructure. Mutable meaning these servers change after their original deploy.

What if I told you there’s another paradigm to consider? In the paragraphs ahead, I’ll explain the benefits of immutable infrastructure and give examples of how to implement immutable infrastructure. Immutable meaning servers not changing after they’re deployed. Need to update a server? Deploy a new one. Need to deploy your code? Deploy a new server. Need to fix a server? Nope, shoot it in the head and deploy a new one.

Benefits of Immutable Infrastructure

The biggest benefit to immutable infrastructure is infrastructure as code. In order to be capable of always deploying a new server, you’ll be forced to automate and you’ll be forced to continue automating. Other benefits include:

  • Forcing you to aggregate all logging information (you can’t retrieve logs from a dead server)
  • No more configuration drift. You’ll no longer have differences between servers that are alike because 1. they don’t change after being deployed and 2. they’re being deployed with the same automation.
  • No more snowflake servers. No more servers that only Joe Blow, who used to be on your team but is now on another team, knows how to configure.
  • Troubleshooting becomes easier. Delete the server and deploy a new one. Your servers are no longer pets. They’re cattle.
  • The ability to test new server configurations before they’re deployed via staging environments.
  • The ability to quickly roll back to a known working state if a new server update breaks things.

Tools of the Trade

The number one thing you need for immutable infrastructure is “cloud” infrastructure. Wait, does that mean I have to pay for the cloud in order to have immutable infrastructure? No, on-prem you can use a number of VM technologies or you can use Kubernetes (sigh, yes, you’re not escaping this post without me talking about this cult revolution). On-prem VM solutions that all have an API for deploying servers include OpenStack, VMware, and Proxmox. I use OpenStack at work and at home. It can look very daunting at first, but it’s much easier than it seems. I will expand on that in a future blog post.

Alternatively, you can use public clouds too: AWS, GCP, Azure, OVHcloud, and DigitalOcean, to name a few.

Configuration Management

Now that you can easily deploy servers, it’s time to talk about automation to configure them.

Everyone starts this conversation off by ignoring user-data. If your VM platform supports it (OpenStack, AWS, GCP, OVHcloud, and DigitalOcean, to name a few), user-data is a script that can be passed to your VM provisioner and that runs as soon as the VM boots. You can technically automate your entire infrastructure with user-data and Docker. I recommend doing initial configuration this way before moving on to something more complicated like what I suggest below.

Other common configuration management tools include Chef, Puppet, SaltStack, and Ansible. Nothing against the others, they’re fine tools if you design your code properly, but I recommend Ansible for two reasons. First, it’s a push model: you push Ansible configuration to a server instead of the server pulling it. Second, it’s agentless: you do not need a long-running “master” that servers pull from, and you do not need an agent on the servers to pull from the “master”.

Simple Immutable Infrastructure

Again, user-data combined with Docker can be very powerful. I recommend the following:

  1. Create a user-data script for each “role” of server in your environment.
  2. Package applications for each “role” into Docker images.
  3. Your user-data script will handle installing Docker and running the Docker image for that role.
  4. Use a tool like Terraform or Bash to deploy all of your instances with the user-data script passed to them. Terraform has an advantage over Bash in that it is designed around immutability. If it senses a configuration change for a server (like a new network added or a change in image), it will handle deleting that server and replacing it with a new one for you.
  5. Ansible can be used after the server is deployed for any extra configuration. Normally, this is needed for things like load balancers that need to know about other servers in the environment (like your app servers).
  6. Profit
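
The first three steps above can be sketched in a few lines of Python that render one user-data script per role. This is only an illustration under a few assumptions: the role names and image references are hypothetical, the VM has curl available, and Docker is installed with Docker's convenience script from get.docker.com.

```python
import textwrap


def render_user_data(role, image):
    """Return a shell script that installs Docker and starts the role's image."""
    return textwrap.dedent(f"""\
        #!/bin/bash
        set -euo pipefail
        # Install Docker using Docker's convenience script.
        curl -fsSL https://get.docker.com | sh
        # Run this role's container and restart it if it exits or the VM reboots.
        docker run -d --name {role} --restart unless-stopped {image}
        """)


# Hypothetical roles and images; replace them with your own.
ROLES = {
    "web": "registry.example.com/web:1.0.0",
    "worker": "registry.example.com/worker:1.0.0",
}

for role, image in ROLES.items():
    print(render_user_data(role, image))
```

Each rendered script is what you would hand to your provisioner as user-data. Because the image tag is baked into the script, shipping a new version means rendering a new script and replacing the server, never editing a running one.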

Enter Kubernetes

Yes, another blog post talking about the Cult that is Kubernetes. Why am I talking about Kubernetes in a post about immutable infrastructure? Because Kubernetes deployments are actually immutable.

Kubernetes operates with a master/worker paradigm. You apply YAML definitions to tell Kubernetes the state you want your infrastructure to be in, and the masters spread Docker containers out to the workers to create that state. If a worker goes down, the master notices and spreads new containers out to other workers without your intervention. You can then kill off the worker that went down, create a new worker, and add it to the Kubernetes cluster.
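
Here is a toy Python model of that behavior. It is nothing like the real Kubernetes scheduler, only a sketch of the desired-state idea: you declare how many replicas you want, and a reconcile step moves anything that was running on a dead worker. The replica names are made up, and the worker names are borrowed from my multi-cloud cluster.

```python
def reconcile(desired_replicas, placements, healthy_workers):
    """Return placements with exactly desired_replicas on healthy workers.

    placements maps a replica name to the worker it currently runs on.
    """
    if not healthy_workers:
        raise RuntimeError("no healthy workers available")
    names = [f"app-{index}" for index in range(desired_replicas)]
    # Keep replicas whose worker is still healthy.
    kept = {r: placements[r] for r in names if placements.get(r) in healthy_workers}
    load = {worker: 0 for worker in healthy_workers}
    for worker in kept.values():
        load[worker] += 1
    # Reschedule every missing replica onto the least-loaded healthy worker.
    for replica in names:
        if replica not in kept:
            target = min(healthy_workers, key=lambda w: (load[w], w))
            kept[replica] = target
            load[target] += 1
    return kept


before = {
    "app-0": "multicloud-home-worker",
    "app-1": "multicloud-ramnode-worker",
    "app-2": "multicloud-home-worker",
}
# The home worker has gone down, so only two workers are healthy.
after = reconcile(3, before, ["multicloud-digitalocean-worker", "multicloud-ramnode-worker"])
print(after)
```

Nothing in the input says "move app-0". You only state the desired count, and the loop works out what has to change, which is the same contract you get from a Kubernetes deployment.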

Within the past year I have actually moved away from the “simple immutable infrastructure” model above and moved to Kubernetes. Why? Because the model above just pushes Docker images around. Kubernetes was written to do just that, and its YAML deployment definitions are much more portable if you want to switch cloud providers or even use a different provider as a failover site. Instead of rewriting step 4 for a different cloud, you can just apply your already written YAML to a new Kubernetes cluster in a different cloud.

Every major cloud provider that supports user-data also provides Kubernetes as a service if you do not want to deploy it yourself: OpenStack, AWS, GCP, Azure, and DigitalOcean, to name a few. There are also plenty of on-prem deployment options. One I highly recommend is Rancher.

If you know nothing about Kubernetes, you can get up to speed by reading a great book called Kubernetes: Up and Running. This book goes over deploying a simple cluster on-prem and all of the API pieces that help you deploy your apps onto Kubernetes.

Final Thoughts

Infrastructure is not something you want changing out from underneath you while it’s running live. Creating immutable infrastructure simplifies troubleshooting in many ways, allows you opportunities to test your infrastructure before putting it in place, forces good practice like infrastructure as code, automation, and aggregated logging, and also allows you to more easily roll back to a previous state if need be.

Thank you for reading and happy infrastructuring!

Jacob Cody Wimer