
How to Optimize Kubernetes on Linux Server – Easy Guide

To optimize Kubernetes on a Linux server, tune the OS (sysctl, cgroups, swap), use a modern container runtime (containerd/CRI-O), configure kubelet (reservations, eviction, CPU/Topology Managers), pick a fast CNI (Cilium/Calico) and storage, right-size requests/limits, optimize etcd, and measure with SLO-driven monitoring. Steps and examples below.

Running Kubernetes is easy; running it fast and reliably on a Linux server requires deliberate tuning. In this guide, I’ll show you how to optimize Kubernetes on Linux servers with practical, production-tested steps you can apply today—covering kernel tuning, kubelet configuration, networking, storage, scheduling, and autoscaling.

Search Intent and What You’ll Learn

This tutorial is for platform engineers, DevOps, and sysadmins seeking hands-on Kubernetes performance tuning on Linux. You’ll get a prioritized checklist, rationale behind each change, and copy-paste configurations. We’ll keep language beginner-friendly, but the techniques reflect real-world production experience.

Prerequisites and Baseline Checks

  • Linux kernel 5.4+ (newer kernels offer better cgroups, BPF, and networking performance).
  • cgroups v1 or v2 supported by your K8s version and runtime (v2 is fully supported in modern Kubernetes/containerd).
  • SSD/NVMe for etcd and container storage; XFS or ext4 formatted correctly for overlayfs.
  • Swap disabled (unless you explicitly configure NodeSwap and understand the trade-offs).
  • Time synced via chrony/systemd-timesyncd and entropy available (rngd) for TLS-heavy clusters.
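The checks above can be scripted. A minimal pre-flight sketch (assumes util-linux and a systemd-based distro; the timedatectl line is optional on minimal images):

```shell
# Pre-flight: confirm the node meets the baseline before joining the cluster
uname -r                        # kernel version; want 5.4 or newer
stat -fc %T /sys/fs/cgroup/     # prints "cgroup2fs" when cgroups v2 is active
swapon --show                   # no output means swap is already off
timedatectl show -p NTPSynchronized 2>/dev/null || true  # "=yes" means clock is synced
```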

Optimize the Linux OS for Kubernetes

These sysctl settings improve connection tracking, networking throughput, and resource limits. Adjust to your workload scale; test before global rollout.

# /etc/sysctl.d/99-kubernetes-tuning.conf
# Required for most CNIs using bridges
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1

# Increase conntrack table size for high connection churn
net.netfilter.nf_conntrack_max=262144
net.netfilter.nf_conntrack_buckets=65536

# Socket buffers and backlog
net.core.rmem_max=134217728
net.core.wmem_max=134217728
net.core.somaxconn=4096
net.ipv4.tcp_max_syn_backlog=4096

# Ephemeral port range for bursty traffic
net.ipv4.ip_local_port_range=1024 65000

# ARP/Neighbor cache thresholds (large node counts)
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=8192
net.ipv4.neigh.default.gc_thresh3=16384

# File watchers (fixes issues with dev tools/controllers)
fs.inotify.max_user_watches=1048576
fs.inotify.max_user_instances=1024

# Apply immediately (run these as shell commands, not lines in the file above)
sudo modprobe br_netfilter
sudo sysctl --system
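A single modprobe does not survive a reboot. The standard kubeadm prerequisite is to list br_netfilter (plus overlay, which containerd needs) in modules-load.d:

```shell
# Load required modules now and on every boot
cat <<'EOF' | sudo tee /etc/modules-load.d/kubernetes.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
```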

Disable swap and align cgroups/systemd

  • Disable swap to avoid unpredictable memory reclaim: sudo swapoff -a && sudo sed -i '/ swap / s/^/#/' /etc/fstab.
  • Use systemd as the cgroup driver for kubelet and container runtime to prevent resource accounting drift.
# Verify cgroup driver alignment
# containerd: SystemdCgroup = true (see next section)
# kubelet: KubeletConfiguration cgroupDriver: "systemd"
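A quick way to confirm the two sides agree (paths assume the default containerd and kubelet locations):

```shell
# Both should report systemd; a mismatch causes resource-accounting drift
grep -n 'SystemdCgroup' /etc/containerd/config.toml
grep -n 'cgroupDriver' /var/lib/kubelet/config.yaml
```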

Filesystem and container storage

  • Use XFS with ftype=1 or ext4 with d_type support for overlayfs.
  • Prefer NVMe/SSD for container and etcd storage; mount with noatime.
  • Enable image garbage collection thresholds in kubelet to prevent disk pressure.
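To verify the backing filesystem on an existing node (the path assumes containerd's default data directory):

```shell
# XFS must report ftype=1 for overlayfs to work correctly
xfs_info /var/lib/containerd 2>/dev/null | grep -o 'ftype=[01]'
# Check that the mount options actually include noatime
findmnt -no OPTIONS --target /var/lib/containerd
```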

Choose and Tune the Container Runtime

containerd and CRI-O are the fastest options for Kubernetes. Docker Engine still works via cri-dockerd (the dockershim replacement) but adds an extra layer. For most clusters, containerd balances performance, features, and ecosystem support.

containerd optimal settings

# /etc/containerd/config.toml (key excerpts)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"

[plugins."io.containerd.grpc.v1.cri".registry]
  # Optional: private mirror/cache to speed image pulls
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."your-mirror.local"]
    endpoint = ["https://your-mirror.local"]

Restart containerd after changes and pre-pull base images for latency-sensitive deployments.
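Applying the config and warming the image cache might look like this (the pause image is just an example; substitute your own hot base images):

```shell
sudo systemctl restart containerd
# Pre-pull a latency-sensitive base image so the first pod start skips the pull
sudo crictl pull registry.k8s.io/pause:3.9
sudo crictl images   # confirm it landed in the local store
```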

Kubelet and Node-Level Configuration

KubeletConfiguration best practices

# /var/lib/kubelet/config.yaml (key excerpts)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
evictionHard:
  "memory.available": "200Mi"
  "nodefs.available": "10%"
  "imagefs.available": "10%"
kubeReserved:
  cpu: "500m"
  memory: "1Gi"
  ephemeral-storage: "2Gi"
systemReserved:
  cpu: "250m"
  memory: "512Mi"
  ephemeral-storage: "1Gi"
imageGCHighThresholdPercent: 80
imageGCLowThresholdPercent: 60
maxPods: 110
cpuManagerPolicy: "static"           # For CPU pinning of Guaranteed pods
topologyManagerPolicy: "restricted"  # Align CPU/memory/PCIe on NUMA nodes

Use Guaranteed QoS for latency-critical apps: set CPU and memory requests = limits. For general workloads, set realistic requests and avoid very low CPU limits to reduce throttling.
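With the reservations above, a node's Allocatable is capacity minus kubeReserved, minus systemReserved, minus the hard eviction threshold. A quick sanity check for a hypothetical 16 GiB node:

```shell
# Allocatable memory = capacity - kubeReserved - systemReserved - evictionHard
CAPACITY_MI=16384    # 16 GiB node (assumption for this example)
KUBE_RESERVED=1024   # 1Gi
SYS_RESERVED=512     # 512Mi
EVICTION=200         # 200Mi
echo $((CAPACITY_MI - KUBE_RESERVED - SYS_RESERVED - EVICTION))   # 14648 Mi for pods
```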

Networking Performance: CNI, kube-proxy, and MTU

CNI selection and tuning

  • Cilium (eBPF): Excellent performance, kube-proxy replacement, advanced observability.
  • Calico: Mature, policy-rich, good performance with IPVS; supports eBPF dataplane.
  • Flannel: Simple overlay; fine for small clusters, not the fastest.

Match MTU to your network. For VXLAN overlays, MTU often needs lowering (e.g., 1450) to avoid fragmentation. Enable kube-proxy IPVS mode or use Cilium’s kube-proxy replacement for better service load balancing.
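As a rule of thumb, subtract the encapsulation overhead from the underlay MTU (50 bytes for VXLAN over IPv4):

```shell
# CNI MTU = underlay MTU - encapsulation overhead
PHYS_MTU=1500      # check yours with: ip -o link show eth0 (interface name varies)
VXLAN_OVERHEAD=50  # outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14)
echo $((PHYS_MTU - VXLAN_OVERHEAD))   # 1450
```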

# kube-proxy in IPVS mode (excerpt for kubeadm-managed clusters)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"
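kube-proxy can fall back to iptables if the IPVS kernel modules are missing, so load them on every node before switching modes:

```shell
# Load the IPVS family of modules (add ip_vs_wrr / ip_vs_sh if you change schedulers)
for m in ip_vs ip_vs_rr nf_conntrack; do sudo modprobe "$m"; done
lsmod | grep -E '^ip_vs'   # confirm they are loaded
```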

Storage and Volume Optimization

  • Use CSI drivers optimized for your platform (EBS, Ceph, Longhorn, OpenEBS, local PVs).
  • Prefer NVMe for etcd and high-IO workloads; avoid network-attached storage for etcd.
  • Tune filesystem (XFS with correct reflink/ftype, ext4 with journaling mode ordered, noatime).
  • Right-size PVCs and use ReadWriteOnce where possible for consistency and performance.

Workload Scheduling and Resource Management

Requests, limits, and QoS classes

  • Requests drive scheduling and capacity planning; set them to realistic averages.
  • Limits protect nodes, but strict CPU limits can cause CFS throttling. For throughput services, consider no CPU limit or a higher one.
  • Guaranteed QoS for latency-sensitive services (set requests = limits).
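To see whether a service is actually being throttled, read the CFS statistics from inside the container (cgroup v2 path shown; v1 uses cpu.stat under the cpu controller):

```shell
# nr_throttled counts enforcement periods where the quota ran out;
# throttled_usec is total time spent throttled
grep -E 'nr_periods|nr_throttled|throttled_usec' /sys/fs/cgroup/cpu.stat
```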

Topology and NUMA awareness

  • Enable CPU Manager (static) and Topology Manager (restricted/single-numa-node) for consistent latency on multi-socket servers.
  • Use nodeAffinity, podAntiAffinity, and topologySpreadConstraints to reduce contention and hotspots.
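Before enabling single-numa-node, check what the hardware actually looks like (numactl may need installing):

```shell
# How many NUMA nodes exist and which CPUs belong to each
lscpu | grep -i 'numa'
numactl --hardware 2>/dev/null || true   # optional, more detail if installed
```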

Autoscaling That Actually Works

Autoscaling is performance insurance. Use it to match capacity to demand and prevent overloaded nodes.

# Horizontal Pod Autoscaler example (CPU-based)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

  • HPA for replicas based on CPU/memory/custom metrics.
  • VPA for rightsizing requests (run in recommendation mode for safety on critical apps).
  • Cluster Autoscaler to add/remove nodes (or provider-native tools). Ensure your cloud provider/node group integrates properly.

Control Plane and etcd Tuning

  • Place etcd on dedicated, local NVMe with low latency; avoid remote/networked disks.
  • Run 3–5 etcd members; more is not always better due to quorum latency.
  • Set appropriate etcd resource limits; monitor WAL fsync latency and apply periodic defrag.
  • Co-locate control plane components on adequately sized nodes; pin CPU/memory if noisy neighbors exist.
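A common way to validate an etcd disk before trusting it is the fio fdatasync benchmark (fio must be installed; the parameters below follow the widely cited etcd disk test):

```shell
# Simulates etcd's WAL write pattern; run this against the actual etcd data disk
mkdir -p /var/lib/etcd-bench
fio --name=etcd-fsync --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Look at the fsync/fdatasync percentiles: p99 should stay under roughly 10ms
rm -rf /var/lib/etcd-bench
```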

Observability and SLO-Driven Optimization

  • Metrics: Prometheus + kube-state-metrics; watch scheduler latency, API server requests, etcd commit latency, container CPU throttling, throttled seconds, and node pressure signals.
  • Logs: Use fluent-bit (lightweight) and enable log rotation for containerd JSON logs.
  • Tracing: OpenTelemetry for request path visibility; invaluable for pinpointing network vs CPU bottlenecks.
  • Dashboards: Define SLOs per service (p95 latency, error budget) and optimize only what impacts them.
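For example, the CPU-throttling signal mentioned above comes from two cAdvisor counters; queried against a Prometheus endpoint, it might look like this (the URL is a placeholder):

```shell
# Fraction of CFS periods in which each container was throttled over 5m
QUERY='rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])'
curl -sG 'http://prometheus.example.local:9090/api/v1/query' --data-urlencode "query=${QUERY}"
```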

Quick Wins and Common Bottlenecks

  • Switch kube-proxy to IPVS or use Cilium eBPF dataplane.
  • Align cgroupDriver=systemd for both kubelet and containerd.
  • Right-size requests and remove overly strict CPU limits if throttling is observed.
  • Increase nf_conntrack_max for connection-heavy microservices.
  • Lower MTU on overlays to prevent fragmentation.
  • Enable CPU/Topology Manager for latency-sensitive workloads.
  • Use NVMe for etcd and container storage; set image GC thresholds.

Security Settings That Also Improve Performance

  • Use seccomp and AppArmor to reduce kernel attack surface and syscall overhead variability.
  • Drop unnecessary Linux capabilities (e.g., remove NET_RAW) to limit risk and reduce packet handling overhead in some cases.
  • Run minimal, distroless images to shrink attack surface and startup times.

Example: Putting It All Together

For a new node pool handling user-facing APIs with high connection churn:

  • Kernel/sysctl: apply the config above, ensure nf_conntrack_max=262144, load br_netfilter.
  • Runtime: containerd with SystemdCgroup=true, registry mirror enabled.
  • Kubelet: CPU Manager static, Topology Manager restricted, eviction and image GC thresholds set, reservations applied.
  • CNI: Cilium with eBPF kube-proxy replacement, MTU tuned to 1450.
  • Workloads: Guaranteed QoS for gateway pods, HPA target 60% CPU, no CPU limit for high-throughput services, only requests.
  • Observability: Prometheus alerts on API latency, conntrack usage, CPU throttling, and disk pressure.

FAQs: Optimizing Kubernetes on Linux

1. What sysctl settings are best for Kubernetes performance?

Increase conntrack capacity (nf_conntrack_max), enable bridge netfilter for CNIs, raise socket buffers (rmem_max/wmem_max), widen ephemeral ports, and increase neighbor cache thresholds. See the sysctl snippet above for a production-ready baseline.

2. Is containerd faster than Docker for Kubernetes?

Yes, containerd or CRI-O typically offer lower overhead and tighter CRI integration than Docker’s shim-based setups. containerd is a strong default for performance, stability, and ecosystem support. Ensure SystemdCgroup=true for best results.

3. Should I disable swap on Kubernetes nodes?

In most cases, yes. Disabling swap avoids unpredictable reclaim latency and aligns with kubelet’s memory management. Swap support exists in newer Kubernetes versions but is advanced and not recommended for typical production clusters.

4. Which CNI is best for high performance?

Cilium (eBPF) leads for performance and advanced features, including kube-proxy replacement and deep observability. Calico is mature and fast, especially with IPVS or its eBPF dataplane. Flannel is simple but not the fastest at scale.

5. How do CPU limits affect performance in Kubernetes?

Strict CPU limits enforce CFS quotas, which can cause throttling and latency spikes under load. For throughput-critical services, consider setting only requests (no limits) or higher limits, and use Guaranteed QoS for predictable CPU allocation.

Final Checklist

  • Linux tuned: sysctl applied, swap off, cgroup/systemd aligned.
  • containerd/CRI-O configured; registry mirror and log rotation enabled.
  • kubelet reservations, eviction, image GC, CPU/Topology managers set.
  • Fast CNI (Cilium/Calico), MTU correct; kube-proxy in IPVS or eBPF replacement.
  • NVMe-backed etcd and container storage; CSI driver optimized.
  • Requests/limits set for QoS; HPA/VPA and Cluster Autoscaler in place.
  • Observability with Prometheus and alerts on key latency and pressure metrics.

Follow this roadmap and you’ll run a Kubernetes cluster on Linux that is faster, more predictable, and easier to scale. If you prefer expert help, YouStable’s engineers can design, benchmark, and manage a fully optimized Kubernetes stack tailored to your workloads.

Prahlad Prajapati

Prahlad is a web hosting specialist and SEO-focused organic growth expert from India. Active in the digital space since 2019, he helps people grow their websites through clean, sustainable strategies. Passionate about learning and adapting fast, he believes small details create big success. Discover his insights on web hosting and SEO to elevate your online presence.
