- https://blog.pipetail.io/posts/2020-05-04-most-common-mistakes-k8s/
- https://banzaicloud.com/blog/k8s-vertical-pod-autoscaler/
- https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
- https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler
- https://blog.colinbreck.com/kubernetes-liveness-and-readiness-probes-how-to-avoid-shooting-yourself-in-the-foot/
- https://blog.marekbartik.com/posts/2018-06-29_kubernetes-in-production-poddisruptionbudget/
- https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
Heeeeello, everyone!
Intro
Some time ago, I was working with Kubernetes and MicroService deployments. That was not a bare-metal deployment of k8s, but AWS EKS. It was configured via Terraform and, maybe, someday I’ll write an article about that…
At that time, I read a lot of recommendations and collected the most important of them into one article. Now, let's look at them!
This article does not contain recommendations like "every service should have its own Service Account configured and RBAC should be explicitly defined for each account" or "network policies must be set for each application deployment". Those are a must-have.
Resource Requests/Limits
Root cause
Those values must be set!
The reason is quite simple: if you don't set limits, the container is allowed to consume all resources available on the worker node, which affects every other container running on the same node. We faced that during mass service starts, whether caused by a node dying, by the cluster-autoscaler scaling down unused nodes, or simply by a parallel deployment of newer service versions.
We had Java Spring Boot based services and, as all of you know, they’re quite greedy for the resources, especially while starting up.
After some testing, we came up with these values:
## ...skipped...
resources:
  limits:
    cpu: 1000m
    memory: 500Mi
  requests:
    cpu: 200m
    memory: 256Mi
CPU Requests/Limits
To determine a proper value, you need to know how many cores your node has! Note that 1000m means one whole core!
In this example, we requested a guaranteed 256Mi of RAM and 200m of CPU, and set the (burstable) limits to a maximum of 500Mi of RAM and 1 CPU. The m for CPU stands for milli, so 1000m is 1 CPU; based on this definition, you can write cpu: 1 and it will be equal to cpu: 1000m.
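If you are not sure how many cores your nodes actually have, one quick way to check is to inspect a node directly (the node name below is just a placeholder):
kubectl get nodes
kubectl describe node <node-name>
The Capacity and Allocatable sections of the describe output show how much CPU and memory the node offers for scheduling.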
Tricky Memory Limit
But I want to warn you about the memory limit: in most cases, at some point your container can be killed with OOM. We can only guarantee 256Mi of RAM (the request in our example), and that does not mean the node will actually have the additional memory available up to the 500Mi limit when the container needs it. In that case, your container will be killed with the OOM reason.
In the case of CPU, the node will just be overcommitted, which could affect other containers running on the same node (they get throttled), but nothing is killed.
Choosing the right node for CPU-heavy apps
For high-CPU-consuming apps, it's highly recommended to run them on separate nodes designated for high CPU usage, like AWS EC2 c-type instances. That requires an additional nodeSelector configuration in the Deployment resource, as shown in the sketch below.
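As a rough sketch, such a nodeSelector could look like the snippet below; the node-type label and its high-cpu value are made up for illustration and depend on how your node groups are labeled:
## ...skipped...
spec:
  template:
    spec:
      nodeSelector:
        node-type: high-cpu
## ...skipped...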
You can always get the current CPU and memory usage for each pod, for the containers inside pods, and for the worker nodes overall:
kubectl top pods
kubectl top pods --containers
kubectl top nodes
These commands require the Metrics Server to be deployed in the cluster.
Metrics Server
Metrics Server is a cluster-wide aggregator of resource usage data. By default, it is deployed as a Deployment object in clusters created by the kube-up.sh script. If you use a different Kubernetes setup mechanism, you can deploy it using the provided components.yaml deployment file.
Metrics Server collects metrics from the Summary API, exposed by Kubelet on each node, and is registered with the main API server via Kubernetes aggregator.
Learn more about the metrics server in the design doc.
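If your cluster doesn't have it yet, the Metrics Server project ships a components.yaml manifest that can usually be applied directly (verify the URL and version against the project's releases page first):
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
After a minute or so, the kubectl top commands above should start returning data.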
Recommended configuration
It's hard to say what exact numbers should be set for your deployment; that heavily depends on the framework your application uses.
The only recommendation is to run load tests and measure the values…
For SpringBoot based applications, we used these:
## ...skipped...
resources:
  limits:
    cpu: 1000m
    memory: 500Mi
  requests:
    cpu: 200m
    memory: 384Mi
That was enough to run the application without high load, and those CPU limits helped the services start flawlessly.
Conclusion
Resources must be set. There is an option to make them dynamic – Vertical Pod Autoscaler.
Vertical Pod Autoscaler
Worth to read
- Vertical Pod Autoscaler article by Banzaicloud: https://banzaicloud.com/blog/k8s-vertical-pod-autoscaler/
- Google Cloud article about Vertical Pod Autoscaler: https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler#limitations_for_vertical_pod_autoscaling
- Visualize VPA Recommendations: https://github.com/FairwindsOps/goldilocks/
What is Vertical Pod Autoscaler?
The VerticalPodAutoscaler resource can help you automate away the manual process of setting Resource Requests/Limits by looking at CPU/memory usage over time and setting or recommending new requests and limits based on it.
Vertical Pod Autoscaler is a set of services, so it needs to be deployed first. More details on the installation process can be found on this page.
VPA usage warning
The official documentation warns that VPA is not yet ready for use with JVM-based workloads due to limited visibility into the actual memory usage of the workload. More details on the limitations can be found on this page.
Vertical Pod Autoscaler consists of several components. The most interesting one here is the Recommender, which watches resource consumption and provides recommendations for Resource Requests and Limits.
Let's enable VPA for our service:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: my-namespace
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Off"
After some time, you can inspect the created VPA resource, which contains the recommendations, by invoking the following command:
kubectl describe vpa my-service-vpa
and its output:
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Name:         my-service-vpa
Namespace:    my-namespace
Labels:       <none>
Annotations:  <none>
Metadata:     <none>
Spec:
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         my-service
  Update Policy:
    Update Mode:  Off
Status:
  Conditions:
    Last Transition Time:  2020-05-23T14:28:35Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  my-service
      Lower Bound:
        Cpu:     25m
        Memory:  350837913
      Target:
        Cpu:     143m
        Memory:  410771395
      Uncapped Target:
        Cpu:     143m
        Memory:  410771395
      Upper Bound:
        Cpu:     246m
        Memory:  670336299
At the end of this output, we can see the Container Recommendations, which can be used to set Requests/Limits manually; with an Update mode enabled, they will be applied automatically.
How does it work?
First of all, the VPA services should be deployed into the cluster. All instructions can be found here. It also requires the Metrics Server to be deployed. VPA can work in three different modes:
- Off: VPA just recommends its best estimate for resources in the status section of the VerticalPodAutoscaler CRD.
- Initial: VPA changes the container resource section to follow its estimate when new pods are created.
- Auto: VPA adjusts the container resource spec of running pods, destroying them and recreating them (this process will honor the Pod Disruption Budget, if set).
Update mode: Auto
Update mode: Auto will trigger the pod to be restarted on every Resource Request/Limit change! So configuring a Pod Disruption Budget for the service is highly recommended in order to avoid a service outage.
So, what can we get from the Container Recommendations for a Guaranteed and Burstable configuration?
Name | Description |
---|---|
Lower bound | Minimal amount of resources that should be set for the container. |
Upper bound | Maximum amount of resources that should be set (above which you are likely wasting resources). |
Target | VPA's recommendation based on the algorithm described here, considering any additional constraints specified in the VPA CRD. The constraints are not shown in the VPA example above; they allow you to set minimum and maximum caps on VPA's recommendation (see here for more details, and the sketch after this table). |
Uncapped target | VPA's recommendation without considering any additional constraints. |
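Those caps on the recommendation are configured via a resourcePolicy section in the VPA spec. A minimal sketch, with purely illustrative numbers, could look like this:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: my-namespace
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: my-service
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
With these caps in place, the Target recommendation will never go below minAllowed or above maxAllowed, while Uncapped Target keeps showing the raw estimate.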
Recommended configuration
As of now, I don't have a common configuration for this resource like for the others, so I'd prefer it to be deployed in recommendation mode (updateMode: "Off") to get some statistics and recommendations only.
Conclusion
Deploying VPA is recommended, but with automated resource changes enabled, it could trigger the scaling of worker nodes. So the current recommendation is to deploy it, but only to get recommendations for setting Resource Requests and Limits.
Horizontal Pod Autoscaler
Resource types
Requires some kind of Metrics API server deployed into the cluster, depending on which metrics we want to use to scale our application. See the Liveness/Readiness probes details below.
Requires Resource Requests/Limits defined
Resource Requests and Limits must be defined in the Deployment's pod template to use this.
Root cause
The main reason is that we want to automate the scaling of our service when we start receiving more traffic and our pods are getting too hot.
Recommended configuration
We will use simple scaling based on CPU and memory.
Since we can use percentage values from the Metrics API, I'll provide only a common configuration here that can be applied to all services as a starting point. You can get more precise values after some load testing.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 7
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 150
API Version
We use API version autoscaling/v2beta2 here. This is important since the previous API versions have a different definition!
Resource values
Those values are percentages based on the Resource Request (Guaranteed) value! That is why we set averageUtilization to 150 for the memory resource. For example, with a memory request of 256Mi, an averageUtilization of 150 means the HPA targets an average usage of about 384Mi per pod before scaling out.
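Once the HPA is deployed, you can watch how the current utilization compares to the targets:
kubectl get hpa myapp-hpa
The TARGETS column shows the current vs. desired utilization for each metric, and REPLICAS shows what the autoscaler has currently scaled the Deployment to.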
Liveness and readiness probes
Root cause
How do we know when a service is up and can serve traffic? People usually do not know the difference between Liveness and Readiness probes.
First of all, probes help Kubernetes to understand two things:
- Pod is up in general (Liveness probe)
- Pod is ready to receive traffic (Readiness probe)
Moreover, probes work in tight integration with the Kubernetes Service resource. The Service always checks whether a Deployment's pods have a successful Readiness check; if not, the pod is excluded from receiving traffic until the Readiness check succeeds again. For example, one of the Deployment's pods starts handling too much traffic or an expensive computation; in that case, we might want to let this pod cool down and finish its current tasks, and only after that should it continue serving incoming requests. That's the purpose of the Readiness check. If you defined the same endpoint for Liveness, Kubernetes could decide that the pod is dead and restart it, losing all related in-flight data. Why would you restart a pod that is healthy and just doing a lot of work? This configuration should also work in tight integration with the Horizontal Pod Autoscaler.
You can find more details on this page.
Recommended configuration
Liveness probe
There are two ways to perform the Liveness check.
Liveness via a tcpSocket check of the application port:
livenessProbe:
  initialDelaySeconds: 60
  periodSeconds: 20
  tcpSocket:
    port: 8080
and via httpGet:
livenessProbe:
  initialDelaySeconds: 60
  periodSeconds: 20
  httpGet:
    path: /health/liveness
    port: 8080
Spring Boot based application
Spring Boot has special endpoints for Liveness and Readiness probes. You can find more details on this page.
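As a sketch, assuming Spring Boot 2.3+ with Actuator on the classpath, the probe health groups can be enabled explicitly in application.yaml (outside a detected Kubernetes environment they are not exposed by default):
# application.yaml -- a sketch, assuming Spring Boot 2.3+ with Actuator
management:
  endpoint:
    health:
      probes:
        enabled: true   # exposes /actuator/health/liveness and /actuator/health/readiness
Note that the default paths live under /actuator/health/...; the /health/... paths used in the probe examples here assume the Actuator base path was adjusted accordingly.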
Readiness probe
The Readiness probe will be the same regardless of the chosen Liveness check:
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 8080
  failureThreshold: 10
  initialDelaySeconds: 30
  periodSeconds: 10
  successThreshold: 2
  timeoutSeconds: 10
Thresholds, delays, etc
To find proper values, measure your application's startup timing, how long initialization takes, etc. Those values can differ between frameworks, as well as between the classes, providers, external resources, or services your application uses.
Conclusion
Never use the same endpoint for the Liveness and Readiness probes. If the liveness probe is equal to the readiness probe, you are in big trouble. Probes must be set up with proper endpoints!
Do not fail either of the probes when one of your shared dependencies is down; that would cause a cascading failure of all the pods. You would be shooting yourself in the foot.
Pod Disruption Budget
This will only help with voluntary disruptions and will not work for involuntary disruptions!
Root cause
From time to time, your nodes and cluster have to be upgraded, decommissioned, or scaled up or down. That can make your service unresponsive: for example, all pods of the service/deployment were running on the same node, which got "marked" for deletion by the cluster-autoscaler. In this case, all pods of the Deployment will be down until they are recreated on another live worker node. But we can tell the cluster that we want at least one pod to stay up, and the "marked" node will not be deleted until at least one pod is ready to serve traffic on another live node; in other words, the pods are migrated by recreation without an outage.
Another example. A quorum-based application would like to ensure that the number of replicas running is never brought below the number needed for a quorum. A web front end might want to ensure that the number of replicas serving load never falls below a certain percentage of the total.
There are two main approaches (only one of them can be set in a single PodDisruptionBudget):
Option 1: minAvailable
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
Option 2: maxUnavailable
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
You can find more information on Pod Disruptions on this page.
Recommended configuration
This is a common configuration for any service. The only thing that should be changed is the number.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
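After applying it, you can check how many voluntary disruptions the budget currently allows:
kubectl get pdb myapp-pdb
The ALLOWED DISRUPTIONS column shows how many pods may currently be evicted voluntarily; with a single replica and minAvailable: 1 it stays at 0, which is exactly the situation described in the note below.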
Important note about `minAvailable`
The Deployment must have its replica count set to at least 2! Otherwise, such a PDB configuration will block automated pod eviction, and that will have to be resolved manually.
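In other words, the Deployment paired with this PDB should look roughly like this (only the relevant field shown):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2   # at least 2, so minAvailable: 1 still leaves one pod free to be evicted
## ...skipped...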
Conclusion
Pod Disruption Budget must be defined!
Pod Anti Affinity
Root cause
You are running, for example, 3 pod replicas of your application's Deployment; a node goes down and all the replicas with it. Huh? All the replicas were running on one node? Wasn't Kubernetes supposed to be magical and provide HA?! Nope, you can't expect the Kubernetes scheduler to enforce anti-affinities for your pods. You have to define them explicitly.
Recommended configuration
## ...skipped...
  labels:
    app: myapp
## ...skipped...
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: "app"
            operator: In
            values:
            - myapp
        topologyKey: "kubernetes.io/hostname"
So, here we strictly defined that only one pod of this application can run per node, using a "hard" rule, prefixed with required*. To avoid triggering the autoscaler, we can use a "soft" rule instead, prefixed with preferred*:
## ...skipped...
  labels:
    app: myapp
## ...skipped...
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - myapp
          topologyKey: "kubernetes.io/hostname"
This is a simple check of whether the node already has a pod with the label app set to myapp. That's it. No magic here, sorry :)
Conclusion
This is highly recommended to be defined but, bear in mind, if you use required* rules, HPA activity could cause additional nodes to be scaled up to at least the number of replicas defined by the HPA :)
So it's recommended to use the "soft" rule with the preferred* prefix.
Outro
I believe there should be more, but at this time, these look like a must-have!
Happy deployments!