Skip to content

Setup Kamaji on Azure

This guide will lead you through the process of creating a working Kamaji setup on on MS Azure.

The material here is relatively dense. We strongly encourage you to dedicate time to walk through these instructions, with a mind to learning. We do NOT provide any "one-click" deployment here. However, once you've understood the components involved it is encouraged that you build suitable, auditable GitOps deployment processes around your final infrastructure.

The guide requires:

  • one bootstrap workstation
  • an AKS Kubernetes cluster to run the Admin and Tenant Control Planes
  • an arbitrary number of Azure virtual machines to host Tenants' workloads

Summary

Prepare the bootstrap workspace

This guide is supposed to be run from a remote or local bootstrap machine. First, clone the repo and prepare the workspace directory:

git clone https://github.com/clastix/kamaji
cd kamaji/deploy

We assume you have installed on your workstation:

Make sure you have a valid Azure subscription, and login to Azure:

az account set --subscription "MySubscription"
az login

Currently, the Kamaji setup, including Admin and Tenant clusters need to be deployed within the same Azure region. Cross-regions deployments are not supported.

Access Admin cluster

In Kamaji, an Admin Cluster is a regular Kubernetes cluster which hosts zero to many Tenant Cluster Control Planes. The admin cluster acts as management cluster for all the Tenant clusters and implements Monitoring, Logging, and Governance of all the Kamaji setup, including all Tenant clusters. For this guide, we're going to use an instance of Azure Kubernetes Service - AKS as the Admin Cluster.

Throughout the following instructions, shell variables are used to indicate values that you should adjust to your own Azure environment:

source kamaji-azure.env

az group create \
  --name $KAMAJI_RG \
  --location $KAMAJI_REGION

az network vnet create \
  --resource-group $KAMAJI_RG \
  --name $KAMAJI_VNET_NAME \
  --location $KAMAJI_REGION \
  --address-prefix $KAMAJI_VNET_ADDRESS

az network vnet subnet create \
  --resource-group $KAMAJI_RG \
  --name $KAMAJI_SUBNET_NAME \
  --vnet-name $KAMAJI_VNET_NAME \
  --address-prefixes $KAMAJI_SUBNET_ADDRESS

KAMAJI_SUBNET_ID=$(az network vnet subnet show \
  --resource-group ${KAMAJI_RG} \
  --vnet-name ${KAMAJI_VNET_NAME} \
  --name ${KAMAJI_SUBNET_NAME} \
  --query id --output tsv)

az aks create \
  --resource-group $KAMAJI_RG \
  --name $KAMAJI_CLUSTER \
  --location $KAMAJI_REGION \
  --vnet-subnet-id $KAMAJI_SUBNET_ID \
  --zones 1 2 3 \
  --node-count 3 \
  --nodepool-name $KAMAJI_CLUSTER

Once the cluster formation succedes, get credentials to access the cluster as admin

az aks get-credentials  \
  --resource-group $KAMAJI_RG \
  --name $KAMAJI_CLUSTER

And check you can access:

kubectl cluster-info

Install Kamaji Controller

Kamaji takes advantage of the dynamic admission control, such as validating and mutating webhook configurations. These webhooks are secured by a TLS communication, and the certificates are managed by cert-manager, making it a prerequisite that must be installed.

The Kamaji controller needs to access a default datastore in order to save data of the tenants' clusters. The Kamaji Helm Chart provides the installation of a basic unamanaged etcd, out of box.

Install Kamaji with helm using an unmanaged etcd as default datastore:

helm repo add clastix https://clastix.github.io/charts
helm repo update
helm install kamaji clastix/kamaji -n kamaji-system --create-namespace

A managed datastore is highly recommended in production. The kamaji-etcd project provides a viable option to setup a managed multi-tenant etcd running as StatefulSet made of three replicas. Optionally, Kamaji offers support for a different storage system, as MySQL or PostgreSQL compatible database, thanks to the native kine integration.

Create Tenant Cluster

Tenant Control Plane

With Kamaji on AKS, the tenant control plane is accessible:

  • from tenant worker nodes through an internal loadbalancer
  • from tenant admin user through an external loadbalancer responding to https://${TENANT_NAME}.${TENANT_NAME}.${TENANT_DOMAIN}:443

Create a tenant control plane of example:

cat > ${TENANT_NAMESPACE}-${TENANT_NAME}-tcp.yaml <<EOF
apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: ${TENANT_NAME}
  namespace: ${TENANT_NAMESPACE}
spec:
  dataStore: default
  controlPlane:
    deployment:
      replicas: 3
      additionalMetadata:
        labels:
          tenant.clastix.io: ${TENANT_NAME}
      extraArgs:
        apiServer: []
        controllerManager: []
        scheduler: []
      resources:
        apiServer:
          requests:
            cpu: 250m
            memory: 512Mi
          limits: {}
        controllerManager:
          requests:
            cpu: 125m
            memory: 256Mi
          limits: {}
        scheduler:
          requests:
            cpu: 125m
            memory: 256Mi
          limits: {}
    service:
      additionalMetadata:
        labels:
          tenant.clastix.io: ${TENANT_NAME}
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      serviceType: LoadBalancer
  kubernetes:
    version: ${TENANT_VERSION}
    kubelet:
      cgroupfs: systemd
    admissionControllers:
      - ResourceQuota
      - LimitRanger
  networkProfile:
    port: ${TENANT_PORT}
    certSANs:
    - ${TENANT_NAME}.${TENANT_DOMAIN}
    serviceCidr: ${TENANT_SVC_CIDR}
    podCidr: ${TENANT_POD_CIDR}
    dnsServiceIPs:
    - ${TENANT_DNS_SERVICE}
  addons:
    coreDNS: {}
    kubeProxy: {}
    konnectivity:
      server:
        port: ${TENANT_PROXY_PORT}
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits: {}
---
apiVersion: v1
kind: Service
metadata:
  name: ${TENANT_NAME}-public
  namespace: ${TENANT_NAMESPACE}
  annotations:
    service.beta.kubernetes.io/azure-dns-label-name: ${TENANT_NAME}
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: ${TENANT_PORT}
  selector:
    kamaji.clastix.io/name: ${TENANT_NAME}
  type: LoadBalancer
EOF

kubectl -n ${TENANT_NAMESPACE} apply -f ${TENANT_NAMESPACE}-${TENANT_NAME}-tcp.yaml

Make sure:

  • the following annotation: service.beta.kubernetes.io/azure-load-balancer-internal=true is set on the tcp service. It tells Azure to expose the service within an internal loadbalancer.

  • the following annotation: service.beta.kubernetes.io/azure-dns-label-name=${TENANT_NAME} is set the public loadbalancer service. It tells Azure to expose the Tenant Control Plane with public domain name: ${TENANT_NAME}.${TENANT_DOMAIN}.

Working with Tenant Control Plane

Check the access to the Tenant Control Plane:

curl -k https://${TENANT_NAME}.${KAMAJI_REGION}.cloudapp.azure.com/healthz
curl -k https://${TENANT_NAME}.${KAMAJI_REGION}.cloudapp.azure.com/version

Let's retrieve the kubeconfig in order to work with it:

kubectl get secrets -n ${TENANT_NAMESPACE} ${TENANT_NAME}-admin-kubeconfig -o json \
  | jq -r '.data["admin.conf"]' \
  | base64 --decode \
  > ${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig

kubectl --kubeconfig=${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig config \
  set-cluster ${TENANT_NAME} \
  --server https://${TENANT_NAME}.${KAMAJI_REGION}.cloudapp.azure.com

and let's check it out:

kubectl --kubeconfig=${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig get svc

NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
default       kubernetes   ClusterIP   10.32.0.1    <none>        443/TCP     6m

Check out how the Tenant Control Plane advertises itself:

kubectl --kubeconfig=${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig get ep

NAME         ENDPOINTS           AGE
kubernetes   10.240.0.100:6443   57m

Prepare worker nodes to join

Currently Kamaji does not provide any helper for creation of tenant worker nodes. You should get a set of machines from your infrastructure provider, turn them into worker nodes, and then join to the tenant control plane with the kubeadm. In the future, we'll provide integration with Cluster APIs and other tools, as for example, Terrform.

Create an Azure VM Stateful Set to host worker nodes

az network vnet subnet create \
   --resource-group $KAMAJI_RG \
   --name $TENANT_SUBNET_NAME \
   --vnet-name $KAMAJI_VNET_NAME \
   --address-prefixes $TENANT_SUBNET_ADDRESS

az vmss create \
   --name $TENANT_VMSS \
   --resource-group $KAMAJI_RG \
   --image $TENANT_VM_IMAGE \
   --vnet-name $KAMAJI_VNET_NAME \
   --subnet $TENANT_SUBNET_NAME \
   --computer-name-prefix $TENANT_NAME- \
   --custom-data ./tenant-cloudinit.yaml \
   --load-balancer "" \
   --instance-count 0

az vmss update \
   --resource-group $KAMAJI_RG \
   --name $TENANT_VMSS \
   --set virtualMachineProfile.networkProfile.networkInterfaceConfigurations[0].enableIPForwarding=true

az vmss scale \
   --resource-group $KAMAJI_RG \
   --name $TENANT_VMSS \
   --new-capacity 3

Join worker nodes

The current approach for joining nodes is to use kubeadm and therefore, we will create a bootstrap token to perform the action. In order to facilitate the step, we will store the entire command of joining in a variable:

TENANT_ADDR=$(kubectl -n ${TENANT_NAMESPACE} get svc ${TENANT_NAME} -o json | jq -r ."spec.loadBalancerIP")
JOIN_CMD=$(echo "sudo kubeadm join ${TENANT_ADDR}:6443 ")$(kubeadm --kubeconfig=${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig token create --print-join-command |cut -d" " -f4-)

A bash loop will be used to join all the available nodes.

VMIDS=($(az vmss list-instances \
   --resource-group $KAMAJI_RG \
   --name $TENANT_VMSS \
   --query [].instanceId \
   --output tsv))

for i in ${!VMIDS[@]}; do
  VMID=${VMIDS[$i]}
  az vmss run-command create \
      --name join-tenant-control-plane \
      --vmss-name  $TENANT_VMSS \
      --resource-group $KAMAJI_RG \
      --instance-id ${VMID} \
      --script "${JOIN_CMD}"
done

Checking the nodes:

kubectl --kubeconfig=${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig get nodes

NAME               STATUS     ROLES    AGE    VERSION
tenant-00-000000   NotReady   <none>   112s   v1.25.0
tenant-00-000002   NotReady   <none>   92s    v1.25.0
tenant-00-000003   NotReady   <none>   71s    v1.25.0

The cluster needs a CNI plugin to get the nodes ready. In this guide, we are going to install calico, but feel free to use one of your taste.

Download the latest stable Calico manifest:

curl https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/calico.yaml -O

As per documentation, Calico in VXLAN mode is supported on Azure while IPIP packets are blocked by the Azure network fabric. Make sure you edit the manifest above and set the following variables:

  • CLUSTER_TYPE="k8s"
  • CALICO_IPV4POOL_IPIP="Never"
  • CALICO_IPV4POOL_VXLAN="Always"

Apply to the tenant cluster:

kubectl --kubeconfig=${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig apply -f calico.yaml

And after a while, nodes will be ready

kubectl --kubeconfig=${TENANT_NAMESPACE}-${TENANT_NAME}.kubeconfig get nodes 

NAME               STATUS   ROLES    AGE     VERSION
tenant-00-000000   Ready    <none>   3m38s   v1.25.0
tenant-00-000002   Ready    <none>   3m18s   v1.25.0
tenant-00-000003   Ready    <none>   2m57s   v1.25.0

Cleanup

To get rid of the Kamaji infrastructure, remove the RESOURCE_GROUP:

az group delete --name $KAMAJI_RG --yes --no-wait

That's all folks!