Kubernetes Practical Tips and Command Cheat Sheet

Learning Resources

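Resource	Link	Description
K8S Training Camp	qikqiak.com	Systematic course
kubectl cheat sheet	jimmysong.io	Command reference
Kubernetes docs in Chinese	docs.kubernetes.org.cn	Chinese-language documentation
API reference	kubernetes.io	API documentation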

Deployment Configuration Essentials

# Required fields
spec:
  selector:
    matchLabels:      # 1. Required
      app: myapp
  template:
    metadata:
      labels:         # 2. Must match the selector
        app: myapp
    spec:
      containers:     # 3. Containers have no labels field; labels belong on template.metadata
      - name: app
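
One way to keep the selector and the template labels from drifting apart is to generate both from a single variable. The sketch below assumes a hypothetical helper named render_deployment that takes an app name and an image; neither the helper nor the image argument comes from the notes above.

```shell
# render_deployment APP IMAGE
# Hypothetical helper: prints a minimal Deployment manifest in which
# selector.matchLabels and template.metadata.labels share the same value.
render_deployment() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $1
  template:
    metadata:
      labels:
        app: $1
    spec:
      containers:
      - name: app
        image: $2
EOF
}
```

The output can be checked without touching the cluster, for example with render_deployment myapp <image> | kubectl apply --dry-run=client -f -.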

Common Commands

Resource Cleanup

# List every resource in a namespace
kubectl api-resources --verbs=list --namespaced -o name | \
  xargs -n 1 kubectl get --show-kind --ignore-not-found -n <ns>

# Delete every ingress whose name matches <name>
kubectl get ingress -n <ns> --no-headers | grep <name> | awk '{print $1}' | \
  xargs kubectl delete ingress -n <ns>
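
Because a loose grep pattern can match more than intended, it may help to preview a bulk delete first. The following is a minimal sketch, assuming a hypothetical helper named delete_matching; it only previews (via --dry-run=client) unless the word delete is passed as the fourth argument.

```shell
# delete_matching KIND NAMESPACE PATTERN [delete]
# Hypothetical helper: previews by default, deletes only when the fourth
# argument is the literal word "delete".
delete_matching() {
  kind="$1"; ns="$2"; pattern="$3"; mode="${4:-preview}"
  # -o name prints one "kind/name" per line; match PATTERN on the name part only
  names="$(kubectl get "$kind" -n "$ns" -o name |
    awk -F/ -v p="$pattern" 'index($2, p) > 0')"
  if [ -z "$names" ]; then
    echo "nothing matches '$pattern' in namespace $ns"
    return 0
  fi
  if [ "$mode" = "delete" ]; then
    # $names is left unquoted on purpose so each line becomes one argument
    kubectl delete -n "$ns" $names
  else
    # --dry-run=client prints what would be deleted without deleting it
    kubectl delete -n "$ns" --dry-run=client $names
  fi
}
```

Typical use would be delete_matching ingress <ns> <name> to review the list, then the same command with delete appended.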

Resource Monitoring

# Show CPU/memory allocation (requests and limits) per node
kubectl describe node | \
  grep -E '((Name|Roles):\s{6,})|(\s+(memory|cpu)\s+[0-9]+\w{0,2}.+%\))'

# Show resource usage of the pods on one node, sorted by CPU
NAME_SPACE=default
NODE=node1
kubectl get pods -n "$NAME_SPACE" --field-selector spec.nodeName="$NODE" \
  --no-headers -o custom-columns=NAME:.metadata.name | \
  while read -r name; do
    # kubectl top columns: NAME CPU(cores) MEMORY(bytes)
    kubectl top pod "$name" -n "$NAME_SPACE" --no-headers | \
      awk -v node="$NODE" '{print $1, node, $2, $3}'
  done | sort -nrk 3
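
The per-pod figures can also be totalled. This is a small sketch, assuming a hypothetical filter named sum_top and assuming kubectl top reports CPU in millicores (for example 12m) and memory in Mi (for example 34Mi); other units would need conversion first.

```shell
# sum_top: read `kubectl top pod --no-headers` lines (NAME CPU MEMORY) on
# stdin and print the total CPU and memory.
# Assumption: CPU values end in "m" and memory values end in "Mi".
sum_top() {
  # awk uses the leading digits of "12m" / "34Mi" when adding
  awk '{ cpu += $2; mem += $3 } END { printf "%dm %dMi\n", cpu, mem }'
}
```

For example: kubectl top pod -n default --no-headers | sum_top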

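Container Operations

# Run a command inside a container
kubectl exec <pod> -c <container> -n <ns> -- <command>

# List the containers in a pod (quote the jsonpath so the shell does not expand it)
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[*].name}'

# Force-restart a pod (deletes and recreates it)
kubectl get pod <pod> -n <ns> -o yaml | kubectl replace --force -f -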
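
The container listing and kubectl exec combine naturally when the same command should run in every container of a pod. A minimal sketch, assuming a hypothetical helper named exec_all_containers and container names without whitespace:

```shell
# exec_all_containers POD NAMESPACE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in each container of POD in turn.
exec_all_containers() {
  pod="$1"; ns="$2"; shift 2
  # jsonpath prints the container names separated by spaces
  for c in $(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}'); do
    echo "== $c =="
    kubectl exec "$pod" -c "$c" -n "$ns" -- "$@"
  done
}
```

For example: exec_all_containers <pod> <ns> date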

Label Management

# Show node labels
kubectl get node <node> --show-labels

# Remove a label
kubectl label nodes <node> <label-key>-

replace vs apply

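# replace: replaces the whole object (the object must already exist)
kubectl replace -f config.yaml

# apply: updates via PATCH (creates the object if it is missing; does nothing when unchanged)
kubectl apply -f config.yaml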
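
Since replace fails when the object does not exist yet, a script that prefers replace semantics has to handle the missing case itself. One possible sketch, assuming a hypothetical helper named replace_or_create:

```shell
# replace_or_create FILE
# Hypothetical helper: replace the objects in FILE if they already exist,
# otherwise create them.
replace_or_create() {
  # `kubectl get -f` exits non-zero when an object in FILE is not found
  if kubectl get -f "$1" >/dev/null 2>&1; then
    kubectl replace -f "$1"
  else
    kubectl create -f "$1"
  fi
}
```

For example: replace_or_create config.yaml. For most day-to-day updates, kubectl apply covers both cases in one command.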

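Pod Rescheduling

Default Configuration

When a node fails, its Pods take more than 5 minutes to be rescheduled. The default settings responsible:

kubelet settings:

node-status-update-frequency: 10s (how often the kubelet reports node status)

controller-manager settings:

node-monitor-period: 5s (how often node status is checked)
node-monitor-grace-period: 40s (how long before the node is marked unhealthy)
pod-eviction-timeout: 5m0s (how long to wait before evicting Pods)

Calculation: 40s (grace) + 5m (eviction) ≈ 5 minutes 40 seconds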
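
The same arithmetic can be scripted to compare settings. A minimal sketch, assuming a hypothetical helper named reschedule_delay that takes the grace period and the eviction wait in seconds; it is only a rough worst-case estimate and ignores scheduling and container start-up time.

```shell
# reschedule_delay GRACE_SECONDS EVICTION_SECONDS
# Hypothetical helper: adds the two waits and prints the total as XmYs.
reschedule_delay() {
  total=$(( $1 + $2 ))
  printf '%dm%ds\n' $(( total / 60 )) $(( total % 60 ))
}

# reschedule_delay 40 300   # prints 5m40s (the defaults listed above)
# reschedule_delay 40 30    # prints 1m10s (eviction wait cut to 30s)
```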

Optimization

Lower the toleration time (v1.13+):

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30  # default is 300 seconds; lowered to 30 here
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30