Tenant Control Plane Monitoring

Kamaji exposes a set of metrics that can be used to monitor the health of the Tenant Control Plane (TCP) and its components. The metrics are exposed in Prometheus format and can be scraped by a Prometheus server instance running in the Management Cluster.

Prerequisites

Ensure the Prometheus Operator is installed and properly configured in the Management Cluster. Also verify that the ServiceMonitor CRD is installed there, since it is used to tell Prometheus how to scrape the metrics from the TCP.

Enable metrics scraping

On the Management Cluster, in the same namespace as the Tenant Control Plane, create a Service that exposes the metrics endpoints and a ServiceMonitor that instructs Prometheus how to scrape them.

First, create a Service that exposes the metrics endpoints of the TCP components. The following is an example for a Tenant Control Plane named charlie deployed in the default namespace:

apiVersion: v1
kind: Service
metadata:
  labels:
    kamaji.clastix.io/name: charlie-metrics
  name: charlie-metrics
  namespace: default
spec:
  ports:
  - name: kube-apiserver-metrics
    port: 6443
    protocol: TCP
    targetPort: 6443
  - name: kube-controller-manager-metrics
    port: 10257
    protocol: TCP
    targetPort: 10257
  - name: kube-scheduler-metrics
    port: 10259
    protocol: TCP
    targetPort: 10259
  selector:
    kamaji.clastix.io/name: charlie
  type: ClusterIP

Then create a ServiceMonitor that tells Prometheus how to scrape the metrics from the TCP:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: kube-prometheus-stack
  name: kube-prometheus-stack-tcp-charlie
  namespace: default
spec:
  endpoints:
  # API Server endpoint
  - port: kube-apiserver-metrics
    scheme: https
    path: /metrics
    interval: 15s
    scrapeTimeout: 10s
    tlsConfig:
      # skip certificate verification
      insecureSkipVerify: true
      # Client certificate for authentication
      cert:
        secret:
          name: charlie-api-server-kubelet-client-certificate
          key: apiserver-kubelet-client.crt
      # Client key for authentication
      keySecret:
        name: charlie-api-server-kubelet-client-certificate
        key: apiserver-kubelet-client.key
    metricRelabelings:
    - action: drop
      regex: apiserver_request_duration_seconds_bucket;(0.15|0.2|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2|3|3.5|4|4.5|6|7|8|9|15|25|40|50)
      sourceLabels:
      - __name__
      - le
    relabelings:
    - action: replace
      targetLabel: cluster
      replacement: charlie
    - action: replace
      targetLabel: job
      replacement: apiserver
  # Controller Manager endpoint
  - port: kube-controller-manager-metrics
    scheme: https
    path: /metrics
    interval: 15s
    scrapeTimeout: 10s
    tlsConfig:
      # skip certificate verification
      insecureSkipVerify: true
      # Client certificate for authentication
      cert:
        secret:
          name: charlie-api-server-kubelet-client-certificate
          key: apiserver-kubelet-client.crt
      # Client key for authentication
      keySecret:
        name: charlie-api-server-kubelet-client-certificate
        key: apiserver-kubelet-client.key
    relabelings:
    - action: replace
      targetLabel: cluster
      replacement: charlie
    - action: replace
      targetLabel: job
      replacement: kube-controller-manager
  # Scheduler endpoint
  - port: kube-scheduler-metrics
    scheme: https
    path: /metrics
    interval: 15s
    scrapeTimeout: 10s
    tlsConfig:
      # skip certificate verification
      insecureSkipVerify: true
      # Client certificate for authentication
      cert:
        secret:
          name: charlie-api-server-kubelet-client-certificate
          key: apiserver-kubelet-client.crt
      # Client key for authentication
      keySecret:
        name: charlie-api-server-kubelet-client-certificate
        key: apiserver-kubelet-client.key
    relabelings:
    - action: replace
      targetLabel: cluster
      replacement: charlie
    - action: replace
      targetLabel: job
      replacement: kube-scheduler
  selector:
    matchLabels:
      kamaji.clastix.io/name: charlie-metrics

TLS certificates

To access the metrics endpoints, Prometheus must authenticate against the control plane components. You can use the <tcp_name>-api-server-kubelet-client-certificate secret. Kamaji creates this secret automatically in the Tenant Control Plane's namespace, and it contains the client certificate and key needed for the control plane components.

Finally, ensure that the Prometheus service account, e.g. kube-prometheus-stack-prometheus, has the necessary permissions to access the secret containing the certificates. The following is an example of a Role and RoleBinding that grant the required permissions in the default namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-secret-access
  namespace: default
subjects:
- kind: ServiceAccount
  name: kube-prometheus-stack-prometheus
  namespace: monitoring-system
roleRef:
  kind: Role
  name: prometheus-secret-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-secret-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch"] 

For production environments, a fine-grained approach is recommended, restricting access to only the secrets that contain the required certificates.
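
As a hedged sketch of such a restriction, the Role below limits access to the single certificate secret of the charlie Tenant Control Plane through resourceNames. It assumes that the get verb on that one secret is sufficient for your Prometheus setup, which the example above does not confirm. Kubernetes cannot restrict unfiltered list or watch requests by resource name, so those verbs are left out here.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-secret-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["secrets"]
  # Assumption: only the certificate secret of the TCP named charlie is needed
  resourceNames: ["charlie-api-server-kubelet-client-certificate"]
  # Assumption: get is enough; list and watch are intentionally omitted
  verbs: ["get"]
```

The RoleBinding shown earlier can stay unchanged, since it references the Role by name.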

Accessing metrics

Scraped metrics are available in the Prometheus server. You can access the Prometheus dashboard to view the metrics and create alerts based on them. If you use the same Prometheus instance for monitoring both the Management Cluster and Tenant Control Planes, you must relabel the scraped metrics to differentiate between them. This can be achieved in the values.yaml file used to install the Prometheus Operator Helm Chart:

...
prometheus:
...
kubeApiServer:
  serviceMonitor:
    relabelings:
    - action: replace
      targetLabel: cluster
      replacement: kamaji
kubeControllerManager:
  serviceMonitor:
    relabelings:
    - action: replace
      targetLabel: cluster
      replacement: kamaji
kubeScheduler:
  serviceMonitor:
    relabelings:
    - action: replace
      targetLabel: cluster
      replacement: kamaji
...
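
With the cluster label in place, alerts can target a single Tenant Control Plane. The following is a minimal sketch of a PrometheusRule that fires when the charlie API server target cannot be scraped. It relies on the cluster and job labels set by the ServiceMonitor relabelings shown earlier, and it assumes your Prometheus instance selects rules by the release: kube-prometheus-stack label. The rule name, alert name, duration, and severity are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    # Assumption: rules are selected by the same release label as ServiceMonitors
    release: kube-prometheus-stack
  name: tcp-charlie-alerts
  namespace: default
spec:
  groups:
  - name: tcp-charlie
    rules:
    # Hypothetical alert: the up series is 0 when the last scrape of the target failed
    - alert: TenantAPIServerDown
      expr: up{cluster="charlie", job="apiserver"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: The API server of Tenant Control Plane charlie is not being scraped successfully
```

Similar rules can be written for the kube-controller-manager and kube-scheduler jobs by changing the job label in the expression.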

Grafana

Grafana is a widely used tool for visualizing metrics. You can create custom dashboards for Tenant Control Planes and visualize the metrics scraped by Prometheus. The Prometheus Operator Helm Chart also installs Grafana with a set of predefined dashboards for Kubernetes Control Plane components: kube-apiserver, kube-scheduler, and kube-controller-manager. These dashboards can serve as a starting point for creating custom dashboards for Tenant Control Planes or can be used as-is.

Multi-Cluster Mode

In Grafana, enable the "Multi-Cluster Mode" option for improved visualization of metrics. This option is available in the Grafana settings.
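
If Grafana was installed through the kube-prometheus-stack Helm chart, the bundled dashboards can typically be switched to multi-cluster mode from the same values.yaml file. The fragment below is a sketch based on that chart's values layout, which is an assumption here. Verify the keys against the chart version you run before applying it.

```yaml
grafana:
  sidecar:
    dashboards:
      multicluster:
        global:
          # Assumption: key layout as found in recent kube-prometheus-stack releases
          enabled: true
```

With this enabled, the dashboards expose a cluster selector that uses the cluster label set by the relabelings above.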

That's it!