K8S Performance Optimization - OS sysctl Tuning
Preface
This is the first article in the K8S performance optimization series: best practices for OS sysctl performance tuning parameters.
Parameter Overview
The sysctl tuning parameters at a glance:

```
# Kubernetes Settings
vm.max_map_count = 262144
kernel.softlockup_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
# Increase the number of connections
net.core.somaxconn = 32768
# Maximum Socket Receive Buffer
net.core.rmem_max = 16777216
# Maximum Socket Send Buffer
net.core.wmem_max = 16777216
# Increase the maximum total buffer-space allocatable
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
# Increase the number of outstanding syn requests allowed
net.ipv4.tcp_max_syn_backlog = 8096
# For persistent HTTP connections
net.ipv4.tcp_slow_start_after_idle = 0
# Allow to reuse TIME_WAIT sockets for new connections
# when it is safe from protocol viewpoint
net.ipv4.tcp_tw_reuse = 1
# Max number of packets that can be queued on interface input
# If kernel is receiving packets faster than can be processed
# this queue increases
net.core.netdev_max_backlog = 16384
# Increase size of file handles and inode cache
fs.file-max = 2097152
# Max number of inotify instances and watches for a user
# Since dockerd runs as a single user, the default instances value of 128 per user is too low
# e.g. uses of inotify: nginx ingress controller, kubectl logs -f
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
# Additional sysctl flags that kubelet expects
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1
# Prevent docker from changing iptables: https://github.com/kubernetes/kubernetes/issues/40182
net.ipv4.ip_forward=1
```
If you are running on AWS, additionally add the following:

```
# AWS settings
# Issue #23395
net.ipv4.neigh.default.gc_thresh1=0
```
If IPv6 is enabled, additionally add the following:

```
# Enable IPv6 forwarding for network plugins that don't do it themselves
net.ipv6.conf.all.forwarding=1
```
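These settings only help if they are loaded on every node and survive a reboot. Below is a minimal sketch of one way to persist and apply them, assuming direct shell access to the node; the file name `/etc/sysctl.d/99-kubernetes.conf` is only an example, and only a few parameters are repeated for brevity:

```bash
# Write the tuning parameters to a drop-in file so they persist across reboots.
# 99-kubernetes.conf is an illustrative name; follow your own naming convention.
cat <<'CONF' | sudo tee /etc/sysctl.d/99-kubernetes.conf
vm.max_map_count = 262144
net.core.somaxconn = 32768
fs.inotify.max_user_watches = 524288
# ... the rest of the parameters listed above ...
CONF

# Reload every *.conf under /etc/sysctl.d/ and the other system locations
sudo sysctl --system

# Or set a single value on the fly (not persisted) while experimenting
sudo sysctl -w net.core.somaxconn=32768
```

In practice such a file is usually baked into the node image or distributed by configuration management or a privileged DaemonSet rather than edited by hand on each node.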
Parameter Explanation
| Category | Kernel Parameter | Description | Reference |
|---|---|---|---|
| Kubernetes | vm.max_map_count = 262144 | Limits the number of VMAs (virtual memory areas) a process may own; a larger value is very useful for elasticsearch, mongo, and other heavy mmap users | ES Configuration |
| Kubernetes | kernel.softlockup_panic = 1 | Works around K8S kernel soft-lockup bugs | root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com) |
| Kubernetes | kernel.softlockup_all_cpu_backtrace = 1 | Works around K8S kernel soft-lockup bugs | root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com) |
| Kubernetes | net.ipv4.ip_local_reserved_ports = 30000-32767 | Reserves the default K8S NodePort range | service-node-port-range and ip_local_port_range collision · Issue #6342 · kubernetes/kops (github.com) |
| Network | net.core.somaxconn = 32768 | Upper limit of the socket listen() backlog. The backlog is the socket's listen queue: a request that has not yet been processed or fully established waits there. Increases the number of connections. | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.core.rmem_max = 16777216 | Maximum receive socket buffer size, in bytes. Maximizes the socket receive buffer. | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.core.wmem_max = 16777216 | Maximum send socket buffer size, in bytes. Maximizes the socket send buffer. | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.ipv4.tcp_wmem = 4096 87380 16777216<br>net.ipv4.tcp_rmem = 4096 87380 16777216 | Increases the maximum total allocatable buffer space | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.ipv4.tcp_max_syn_backlog = 8096 | Length of the queue for connections that have not yet been acknowledged by the client (SYN queue); the default is 1024. Increases the number of outstanding SYN requests allowed. | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.ipv4.tcp_slow_start_after_idle = 0 | Helps persistent HTTP connections | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.ipv4.tcp_tw_reuse = 1 | Allows reusing sockets in the TIME_WAIT state for new TCP connections when it is safe from the protocol viewpoint; the default is 0 (disabled). | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.core.netdev_max_backlog = 16384 | When the NIC receives packets faster than the kernel can process them, they are held in a queue; this parameter sets that queue's maximum size. | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Filesystem | fs.file-max = 2097152 | Maximum number of file handles allowed system-wide, i.e. how many files can be open on the Linux system. Increases the size of the file handle and inode cache. | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Filesystem | fs.inotify.max_user_instances = 8192<br>fs.inotify.max_user_watches = 524288 | Maximum number of inotify instances and watches per user. Since dockerd runs as a single user, the default of 128 instances per user is too low. Typical inotify users: nginx ingress controller, kubectl logs -f | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| kubelet | vm.overcommit_memory = 1 | Memory allocation policy: 1 means the kernel allows allocating all physical memory regardless of the current memory state | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| kubelet | kernel.panic = 10 | Automatically reboot 10 seconds after a kernel panic | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| kubelet | kernel.panic_on_oops = 1 | Call panic() when an Oops occurs | We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com) |
| Network | net.ipv4.ip_forward=1 | Enables IP forwarding; also prevents Docker from changing iptables | Upgrading docker 1.13 on nodes causes outbound container traffic to stop working · Issue #40182 · kubernetes/kubernetes (github.com) |
| Network | net.ipv4.neigh.default.gc_thresh1=0 | Fixes the AWS "arp_cache: neighbor table overflow!" error | arp_cache: neighbor table overflow! · Issue #4533 · kubernetes/kops (github.com) |
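To confirm that a node actually picked up the tuned values, you can read them back with sysctl. A small sketch follows; the keys checked here are just a sample from the table above:

```bash
# Spot-check a few of the tuned values on a node
for key in net.core.somaxconn net.ipv4.tcp_max_syn_backlog \
           net.ipv4.ip_local_reserved_ports fs.inotify.max_user_watches; do
  printf '%-40s %s\n' "$key" "$(sysctl -n "$key")"
done
```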
EOF
"Among any three people walking together, there is always one I can learn from; knowledge shared belongs to all." This article was written by the 東風微鳴 technical blog, EWhisper.cn.