References:

https://mp.weixin.qq.com/s/D8efjj9ZhLyEu7zEqWvJiQ

https://stackoverflow.com/questions/71860152/actuator-health-endpoint-returns-out-of-service-when-all-groups-are-up

https://docs.spring.io/spring-boot/docs/2.6.x/reference/htmlsingle/#actuator.endpoints.kubernetes-probes

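This article uses K8s and Spring Boot to achieve zero-downtime releases: health checks + rolling updates + graceful shutdown + autoscaling + Prometheus monitoring + configuration separation (image reuse).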

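Configuration

Health checks

  • Health check types: readiness probe + liveness probe
  • Probe mechanisms: exec (run a script inside the container), tcpSocket (probe a port), httpGet (call an HTTP endpoint)

Application side

Project dependency in pom.xml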

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Define the access port, path, and exposure in application.yaml

management:
  server:
    port: 50000 # Dedicated management port (optional; if unset, actuator is served on the application port)
  endpoint: # Enable the health endpoint probes
    health:
      probes:
        enabled: true
        # With add-additional-paths=true, liveness is also served at /livez and readiness at /readyz on the main server port.
        # add-additional-paths: true
      # With show-details: always, the endpoint shows more component details (ping result, livenessState, readinessState, diskSpace, etc.)
      # show-details: always
      # Use group.*.include to choose which indicators appear in the /health/(liveness|readiness) responses
      # group:
      #   readiness:
      #     include:
      #       - readinessState
      #       - ping
      #   liveness:
      #     include:
      #       - livenessState
      #       - ping
  endpoints:
    web:
      base-path: /actuator # Context path for the endpoints; /actuator is already the default
      exposure:
        include: health # Endpoints exposed over HTTP

This exposes two endpoints, /actuator/health/readiness and /actuator/health/liveness, which can be accessed as follows:

http://127.0.0.1:50000/actuator/health -> returns the aggregate health information
http://127.0.0.1:50000/actuator/health/readiness -> returns the readiness group
http://127.0.0.1:50000/actuator/health/liveness -> returns the liveness group
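
To make the link between these responses and the K8s probes concrete, here is a minimal sketch of the default status mapping. By default Spring Boot answers 503 for DOWN and OUT_OF_SERVICE and 200 for everything else, and an httpGet probe treats any code from 200 up to (but not including) 400 as success. The class and method names are illustrative and not part of either project.

```java
final class HealthStatusHttp {
    /** Default Spring Boot mapping from a health status to an HTTP status code. */
    static int httpStatusFor(String healthStatus) {
        switch (healthStatus) {
            case "DOWN":
            case "OUT_OF_SERVICE":
                return 503;
            default:
                return 200; // UP, UNKNOWN and any other unmapped status
        }
    }

    /** A Kubernetes httpGet probe succeeds for codes in [200, 400). */
    static boolean probePasses(int httpStatus) {
        return httpStatus >= 200 && httpStatus < 400;
    }

    public static void main(String[] args) {
        for (String status : new String[] {"UP", "OUT_OF_SERVICE", "DOWN"}) {
            int code = httpStatusFor(status);
            System.out.println(status + " -> HTTP " + code + ", probe passes: " + probePasses(code));
        }
    }
}
```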

Operations side

K8s deployment template deployment.yaml

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: {APP_NAME}
          image: {IMAGE_URL}
          imagePullPolicy: Always
          ports:
            - containerPort: {APP_PORT}
            - name: management-port
              containerPort: 50000 # Application management port
          readinessProbe: # Readiness probe
            httpGet:
              path: /actuator/health/readiness
              port: management-port
            initialDelaySeconds: 90 # Delay before the first probe
            periodSeconds: 30 # Interval between probes
            timeoutSeconds: 30 # Probe timeout
            successThreshold: 1 # Consecutive successes needed to count as healthy
            failureThreshold: 3 # Consecutive failures needed to count as unhealthy
          livenessProbe: # Liveness probe
            httpGet:
              path: /actuator/health/liveness
              port: management-port
            initialDelaySeconds: 90 # Delay before the first probe
            periodSeconds: 30 # Interval between probes
            timeoutSeconds: 30 # Probe timeout
            successThreshold: 1 # Consecutive successes needed to count as healthy
            failureThreshold: 3 # Consecutive failures needed to count as unhealthy
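
As a rough sanity check of these probe values, the sketch below estimates how long a failing container can go unnoticed. It assumes a simplified model: the threshold is reached after failureThreshold consecutive failed probes, started periodSeconds apart, with the last one taking up to timeoutSeconds. The class and method names are hypothetical, and real kubelet timing can differ.

```java
final class ProbeTiming {
    /**
     * Estimated seconds from the start of the first failing probe until the
     * failure threshold is reached, under the simplified model described above.
     */
    static int worstCaseDetectionSeconds(int periodSeconds, int timeoutSeconds, int failureThreshold) {
        // (failureThreshold - 1) periods separate the first and last failing probe;
        // the last probe may then take up to timeoutSeconds to fail.
        return (failureThreshold - 1) * periodSeconds + timeoutSeconds;
    }

    public static void main(String[] args) {
        // Values taken from the deployment template: period 30s, timeout 30s, threshold 3.
        System.out.println(worstCaseDetectionSeconds(30, 30, 3) + " seconds");
    }
}
```

Under that model, shorter periodSeconds and timeoutSeconds values shorten the window in which an unhealthy pod keeps receiving traffic, at the cost of more frequent probe calls.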

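Rolling updates

This is the K8s rolling update strategy for resource scheduling. To achieve zero-downtime releases, it must be combined with health checks.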

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  selector:
    matchLabels:
      app: {APP_NAME}
  replicas: {REPLICAS} # Number of Pod replicas
  strategy:
    type: RollingUpdate # Rolling update strategy
    rollingUpdate:
      maxSurge: 1 # Maximum number of Pods above the desired replica count during the update
      maxUnavailable: 1 # Maximum number of Pods that may be unavailable during the update
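
The effect of maxSurge and maxUnavailable can be worked out with simple arithmetic. The sketch below assumes both are given as absolute numbers (as in the template above, not percentages); the class name and the replica count of 3 are only illustrative.

```java
final class RolloutBounds {
    /** Upper bound on the total number of Pods during a rolling update. */
    static int maxTotalPods(int replicas, int maxSurge) {
        return replicas + maxSurge;
    }

    /** Lower bound on the number of available Pods during a rolling update. */
    static int minAvailablePods(int replicas, int maxUnavailable) {
        return Math.max(0, replicas - maxUnavailable);
    }

    public static void main(String[] args) {
        int replicas = 3; // assumed value for {REPLICAS}
        System.out.println("max total pods: " + maxTotalPods(replicas, 1));
        System.out.println("min available pods: " + minAvailablePods(replicas, 1));
    }
}
```

With a single replica and maxUnavailable: 1 the lower bound is 0, which means a brief outage is permitted; setting maxUnavailable: 0 keeps every old Pod serving until its replacement is ready.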

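Graceful shutdown

In K8s, application-level graceful shutdown must be in place before rolling upgrades are used; otherwise a rolling upgrade still affects the business. Graceful shutdown lets the application finish its running threads and release connection resources before the service stops.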
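
The idea can be illustrated with plain JDK classes. This is only an analogy for what graceful shutdown does, not Spring Boot's implementation: stop accepting new work, wait up to a timeout for in-flight work, then force a stop. All names in the sketch are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class GracefulStopSketch {
    /** Returns true if all in-flight tasks finished within the timeout. */
    static boolean stopGracefully(ExecutorService pool, long timeoutMillis) {
        pool.shutdown(); // reject new tasks, let running ones continue
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow(); // timeout exceeded: interrupt what is still running
        return false;
    }

    /** Starts `tasks` jobs of `taskMillis` each, stops the pool, returns how many completed. */
    static int runDemo(int tasks, long taskMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(taskMillis); // simulated in-flight request
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // forced stop
                }
            });
        }
        stopGracefully(pool, timeoutMillis);
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed within timeout: " + runDemo(3, 50, 2000));
        System.out.println("completed when timeout is too short: " + runDemo(2, 5000, 100));
    }
}
```

The timeout plays the same role as a shutdown phase timeout: work that finishes in time completes normally, and anything still running afterwards is cut off.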

Application side

Project dependency in pom.xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Define the access port, path, and exposure in application.yaml

spring:
  application:
    name: <xxx>
  profiles:
    active: @profileActive@
  lifecycle:
    timeout-per-shutdown-phase: 30s # Shutdown phase timeout; after 30s the application stops regardless

server:
  port: 8080
  shutdown: graceful # Default is IMMEDIATE (stop at once); GRACEFUL enables graceful shutdown

management:
  server:
    port: 50000 # Dedicated management port
  endpoint: # Enable the shutdown and health endpoints
    shutdown:
      enabled: true
    health:
      probes:
        enabled: true
        # With add-additional-paths=true, liveness is also served at /livez and readiness at /readyz on the main server port.
        # add-additional-paths: true
      # With show-details: always, the endpoint shows more component details (ping result, livenessState, readinessState, diskSpace, etc.)
      # show-details: always
      # Use group.*.include to choose which indicators appear in the /health/(liveness|readiness) responses
      # group:
      #   readiness:
      #     include:
      #       - readinessState
      #       - ping
      #   liveness:
      #     include:
      #       - livenessState
      #       - ping
  endpoints:
    web:
      base-path: /actuator # Context path for the endpoints
      exposure:
        include: health,shutdown # Endpoints exposed over HTTP

This exposes the /actuator/shutdown endpoint, which is called as follows:

curl -X POST 127.0.0.1:50000/actuator/shutdown

Operations side

Make sure the Dockerfile template installs curl; otherwise the curl command is unavailable in the container.

FROM openjdk:8-jdk-alpine
# Build arguments
ARG JAR_FILE
ARG WORK_PATH="/app"
ARG EXPOSE_PORT=8080

# Environment variables
ENV JAVA_OPTS="" \
    JAR_FILE=${JAR_FILE}

# Set the time zone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositories \
    && apk add --no-cache curl
# Copy the jar built by Maven into the image's working directory
COPY target/$JAR_FILE $WORK_PATH/

# Set the working directory
WORKDIR $WORK_PATH

# Port exposed to the outside
EXPOSE $EXPOSE_PORT
# Make the container executable; exec replaces the shell so the JVM receives SIGTERM directly
ENTRYPOINT exec java $JAVA_OPTS -jar $JAR_FILE

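K8s deployment template deployment.yaml

Note: in testing, Java projects can omit the preStop hook configuration.

If the hook is used, the image must contain curl, and the application management port (50000) must not be exposed to the public network.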

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: {APP_NAME}
          image: {IMAGE_URL}
          imagePullPolicy: Always
          ports:
            - containerPort: {APP_PORT}
            - name: management-port
              containerPort: 50000
          lifecycle:
            preStop: # preStop hook
              # Choose one of the two handlers
              # httpGet: # sends a GET request, whereas /actuator/shutdown only accepts POST
              #   port: management-port
              #   path: /actuator/shutdown
              exec:
                command: ["curl", "-XPOST", "127.0.0.1:50000/actuator/shutdown"]

Autoscaling

After setting resource limits for the pod, create an HPA.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  template:
    spec:
      containers:
        - name: {APP_NAME}
          image: {IMAGE_URL}
          imagePullPolicy: Always
          resources: # Container resource management
            limits: # Resource limits (monitor usage)
              cpu: 0.5
              memory: 1Gi
            requests: # Minimum guaranteed resources (used for scheduling)
              cpu: 0.15
              memory: 300Mi
---
kind: HorizontalPodAutoscaler # Autoscaling controller
apiVersion: autoscaling/v2beta2 # Use autoscaling/v2 on Kubernetes 1.23+; v2beta2 was removed in 1.26
metadata:
  name: {APP_NAME}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {APP_NAME}
  minReplicas: {REPLICAS} # Scaling range
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu # Resource metric to scale on
        target:
          type: Utilization
          averageUtilization: 50
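
To see how averageUtilization: 50 turns into a replica count, here is a sketch of the core formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max range. It deliberately ignores the tolerance band and stabilization windows the real controller applies. Utilization is measured relative to the container's CPU request. The class name and the sample numbers are illustrative.

```java
final class HpaEstimate {
    /** Core HPA formula, clamped to [minReplicas, maxReplicas]. */
    static int desiredReplicas(int currentReplicas, double currentUtilization,
                               double targetUtilization, int minReplicas, int maxReplicas) {
        int raw = (int) Math.ceil(currentReplicas * currentUtilization / targetUtilization);
        return Math.max(minReplicas, Math.min(maxReplicas, raw));
    }

    public static void main(String[] args) {
        // Assumed scenario: 2 replicas averaging 100% of their CPU request, target 50%.
        System.out.println("desired replicas: " + desiredReplicas(2, 100, 50, 2, 6));
    }
}
```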

Prometheus integration

Application side

Project dependencies in pom.xml

<!-- Spring Boot monitoring support -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Define the access port, path, and exposure in application.yaml

management:
  server:
    port: 50000 # Dedicated management port
  metrics:
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      base-path: /actuator # Context path for the endpoints
      exposure:
        include: metrics,prometheus # Endpoints exposed over HTTP

This exposes the /actuator/metrics and /actuator/prometheus endpoints, accessed as follows:

http://127.0.0.1:50000/actuator/metrics
http://127.0.0.1:50000/actuator/prometheus

Operations side

deployment.yaml

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        prometheus.io/port: "50000"
        prometheus.io/path: /actuator/prometheus # Assigned in the pipeline
        prometheus.io/scrape: "true" # Pod-based service discovery

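Configuration separation

Approach: mount an external configuration file through a ConfigMap and run with a specified active profile.

Benefits: configuration is kept out of the image, which avoids leaking sensitive information; the same image is reused across environments, which speeds up delivery.

Generate the ConfigMap from a file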

# Generate the yaml file with a dry run
kubectl create cm -n <namespace> <APP_NAME> --from-file=application-test.yaml --dry-run=client -o yaml > configmap.yaml

# Apply the update
kubectl apply -f configmap.yaml

Mount the ConfigMap and set the active profile

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  template:
    spec:
      containers:
        - name: {APP_NAME}
          image: {IMAGE_URL}
          imagePullPolicy: Always
          env:
            - name: SPRING_PROFILES_ACTIVE # Active profile
              value: test
          volumeMounts: # Mount the ConfigMap
            - name: conf
              mountPath: "/app/config" # config/ under the Dockerfile's working directory, where Spring Boot looks for external configuration
              readOnly: true
      volumes:
        - name: conf
          configMap:
            name: {APP_NAME}

Combined configuration

Application side

Project dependencies in pom.xml

<!-- Spring Boot monitoring support -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Define the access port, path, and exposure in application.yaml

spring:
  application:
    name: project-sample
  profiles:
    active: @profileActive@
  lifecycle:
    timeout-per-shutdown-phase: 30s # Shutdown phase timeout; after 30s the application stops regardless

server:
  port: 8080
  shutdown: graceful # Default is IMMEDIATE (stop at once); GRACEFUL enables graceful shutdown

management:
  server:
    port: 50000 # Dedicated management port
  metrics:
    tags:
      application: ${spring.application.name}
  endpoint: # Enable the shutdown and health endpoints
    shutdown:
      enabled: true
    health:
      probes:
        enabled: true
        # With add-additional-paths=true, liveness is also served at /livez and readiness at /readyz on the main server port.
        # add-additional-paths: true
      # With show-details: always, the endpoint shows more component details (ping result, livenessState, readinessState, diskSpace, etc.)
      # show-details: always
      # Use group.*.include to choose which indicators appear in the /health/(liveness|readiness) responses
      # group:
      #   readiness:
      #     include:
      #       - readinessState
      #       - ping
      #   liveness:
      #     include:
      #       - livenessState
      #       - ping
  endpoints:
    web:
      base-path: /actuator # Context path for the endpoints
      exposure:
        include: health,shutdown,metrics,prometheus # Endpoints exposed over HTTP

Operations side

Make sure the Dockerfile template installs curl; otherwise the curl command is unavailable in the container.

FROM openjdk:8-jdk-alpine
# Build arguments
ARG JAR_FILE
ARG WORK_PATH="/app"
ARG EXPOSE_PORT=8080

# Environment variables
ENV JAVA_OPTS="" \
    JAR_FILE=${JAR_FILE}

# Set the time zone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositories \
    && apk add --no-cache curl
# Copy the jar built by Maven into the image's working directory
COPY target/$JAR_FILE $WORK_PATH/

# Set the working directory
WORKDIR $WORK_PATH

# Port exposed to the outside
EXPOSE $EXPOSE_PORT
# Make the container executable; exec replaces the shell so the JVM receives SIGTERM directly
ENTRYPOINT exec java $JAVA_OPTS -jar $JAR_FILE

K8s deployment template deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  selector:
    matchLabels:
      app: {APP_NAME}
  replicas: {REPLICAS} # Number of Pod replicas
  strategy:
    type: RollingUpdate # Rolling update strategy
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      name: {APP_NAME}
      labels:
        app: {APP_NAME}
      annotations:
        timestamp: {TIMESTAMP}
        prometheus.io/port: "50000" # Cannot be assigned dynamically
        prometheus.io/path: /actuator/prometheus
        prometheus.io/scrape: "true" # Pod-based service discovery
    spec:
      affinity: # Scheduling policy: spread Pods across hosts / availability zones
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - {APP_NAME}
                topologyKey: "kubernetes.io/hostname" # For multiple availability zones use "topology.kubernetes.io/zone"
      terminationGracePeriodSeconds: 30 # Grace period for graceful termination
      containers:
        - name: {APP_NAME}
          image: {IMAGE_URL}
          imagePullPolicy: Always
          ports:
            - containerPort: {APP_PORT}
            - name: management-port
              containerPort: 50000 # Application management port
          readinessProbe: # Readiness probe
            httpGet:
              path: /actuator/health/readiness
              port: management-port
            initialDelaySeconds: 90 # Delay before the first probe
            periodSeconds: 30 # Interval between probes
            timeoutSeconds: 30 # Probe timeout
            successThreshold: 1 # Consecutive successes needed to count as healthy
            failureThreshold: 3 # Consecutive failures needed to count as unhealthy
          livenessProbe: # Liveness probe
            httpGet:
              path: /actuator/health/liveness
              port: management-port
            initialDelaySeconds: 90 # Delay before the first probe
            periodSeconds: 30 # Interval between probes
            timeoutSeconds: 30 # Probe timeout
            successThreshold: 1 # Consecutive successes needed to count as healthy
            failureThreshold: 3 # Consecutive failures needed to count as unhealthy
          resources: # Container resource management
            limits: # Resource limits (monitor usage)
              cpu: 0.5
              memory: 1Gi
            requests: # Minimum guaranteed resources (used for scheduling)
              cpu: 0.1
              memory: 200Mi
          env:
            - name: TZ
              value: Asia/Shanghai
---
kind: HorizontalPodAutoscaler # Autoscaling controller
apiVersion: autoscaling/v2beta2 # Use autoscaling/v2 on Kubernetes 1.23+; v2beta2 was removed in 1.26
metadata:
  name: {APP_NAME}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {APP_NAME}
  minReplicas: {REPLICAS} # Scaling range
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu # Resource metric to scale on
        target:
          type: Utilization
          averageUtilization: 50

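Problem

The program contains code that runs while(true){...} inside CommandLineRunner.run, executing a block of code forever.

This causes a problem: run() never returns, so the application never finishes starting and can never stop normally. Spring Boot only switches the readiness state to accepting traffic after the runners have completed, so the /readiness health check always returns status 503.

Solution: run the while(true) loop in a separate child thread.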
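
A minimal sketch of that fix, written with plain JDK classes so it runs on its own. The static run() method stands in for CommandLineRunner.run, and the class name, thread name, and counter are all hypothetical. The loop moves to its own thread, so run() returns immediately; it also checks the interrupt flag so the thread can be stopped.

```java
import java.util.concurrent.atomic.AtomicLong;

final class BackgroundLoopSketch {
    static final AtomicLong iterations = new AtomicLong();

    /** Stand-in for CommandLineRunner.run: starts the loop on a worker thread and returns at once. */
    static Thread run() {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                iterations.incrementAndGet(); // placeholder for the real work
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag and leave the loop
                }
            }
        }, "background-loop");
        worker.setDaemon(true); // do not keep the JVM alive just for this loop
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = run();
        System.out.println("run() returned, worker alive: " + worker.isAlive());
        worker.interrupt(); // simulate shutdown
        worker.join(1000);
        System.out.println("worker alive after interrupt: " + worker.isAlive());
    }
}
```

Because run() now returns, startup can complete and the readiness state can change; interrupting the worker gives shutdown a way to end the loop.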