Requirement: one shared environment, two microservices serviceA and serviceB, each with two versions: original and v1.
Call chain: serviceA -> serviceB
The routing breaks down into the following cases:
If both serviceA and serviceB have a v1 version:
serviceA(v1) -> serviceB(v1)
If serviceA has a v1 version but serviceB does not:
serviceA(v1) -> serviceB(original)
If serviceA has no v1 version but serviceB does:
serviceA(original) -> serviceB(v1)
Technical approach: traffic coloring
What is traffic coloring?
In the metadata center (here, our Kubernetes cluster), maintain the list of services belonging to each environment; at the traffic entry point, tag each request with an identifier; in the base framework layer, parse the tag, propagate it, and use it for service routing.
In practice this usually means adding identifiers such as the environment or the user to the HTTP request, so that requests can be classified and forwarded based on those identifiers.
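For example, tagging a request at the entry point might look like the sketch below (the header names and URL are purely illustrative; the header actually used in this article is introduced in the Istio section):

```bash
# Illustrative only: attach environment / user identifiers to the request at the entry point,
# so every hop further down the chain can classify and route on them.
curl -H 'x-env: feature-v1' -H 'x-user-id: 10086' https://gateway.example.com/api/orders
```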
Why do we need traffic coloring?
It lets different services share a single environment.
It lets you debug a specific service locally without disturbing the normal operation of the other services.
In short: lower cost, faster testing, and controllable environment governance.
Istio routing control (Istio version: 1.14.1)
Routing is one of the most important and most frequently used features of traffic management. In Istio, dynamic routing is normally configured through two API resources: VirtualService and DestinationRule.
Virtual Service: defines how requests addressed to a host are matched and routed.
Destination Rule: defines the policies applied after routing, including the subsets (versions) of a service.
Our scheme is to add a request header (project-version); every request forwards this header to the next service, so the whole call chain carries the identifier.
The overall flow then looks like this:
Technical implementation (including what the tests need):

| Function | Tech stack |
| --- | --- |
| CI/CD | gitlab-runner |
| Traffic control (routing rules) | Istio (VirtualService/DestinationRule) |
| Two microservices (serviceA, serviceB) | Node.js, Koa |
| External access to serviceA | Istio ingressgateway |
| Management tool for VirtualService/DestinationRule | Rancher |
| Propagating the identifier (forwarding the request header) | axios |
Test procedure
Prepare two Node.js microservices (testaaa and testbbb). testaaa forwards the project-version header it receives from upstream to the downstream service:
```js
router.get('/', async (ctx) => {
  const header = ctx.headers['project-version'];
  const res = await ctx.rest.get(`${bbbUrl}`, {}, {
    headers: { 'project-version': header || '-' }
  });
  ctx.body = res;
});
```
testbbb returns the current pod's version number to testaaa:
```js
router.get('/', async (ctx) => {
  ctx.body = process.env.PROJECT_VERSION;
});
```
Prepare the configuration and resource files for both services. Dockerfile (identical for both projects):
```dockerfile
FROM node:12.22.0-alpine3.12 as build
WORKDIR /user/src/app
RUN set -eux \
    && sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositories \
    && apk add --no-cache curl gcc g++ make linux-headers python2 python3 python3-dev
COPY package.json package-lock.json ./
RUN npm install

FROM node:12.22.0-alpine3.12 as runtime
WORKDIR /user/src/app
RUN set -eux \
    && apk add --no-cache tzdata \
    && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && apk del tzdata
COPY --from=build /user/src/app/node_modules ./node_modules/
COPY . .
CMD [ "npm", "run", "start" ]
```
testaaa, deployment.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: testaaa
  namespace: test-istio
spec:
  selector:
    app: testaaa
  ports:
    - port: 31000
      targetPort: 31000
      appProtocol: HTTP
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testaaa-${CI_COMMIT_REF_NAME}
  namespace: test-istio
  labels:
    app: testaaa
    version: ${CI_COMMIT_REF_NAME}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testaaa
      version: ${CI_COMMIT_REF_NAME}
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: 'true'
      labels:
        app: testaaa
        version: ${CI_COMMIT_REF_NAME}
    spec:
      containers:
        - image: $REGISTRY_ADDRESS/${NODE_ENV}/${CI_PROJECT_NAME}:v${CI_PIPELINE_ID}
          env:
            - name: NODE_ENV
              value: development
            - name: PROJECT_VERSION
              value: ${CI_COMMIT_REF_NAME}
          imagePullPolicy: IfNotPresent
          livenessProbe:
            tcpSocket:
              port: 31000
          readinessProbe:
            tcpSocket:
              port: 31000
          name: testaaa
          ports:
            - containerPort: 31000
      dnsPolicy: ClusterFirst
      restartPolicy: Always
```
testbbb, deployment.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: testbbb
  namespace: test-istio
spec:
  selector:
    app: testbbb
  ports:
    - port: 32000
      targetPort: 32000
      appProtocol: HTTP
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testbbb-${CI_COMMIT_REF_NAME}
  namespace: test-istio
  labels:
    app: testbbb
    version: ${CI_COMMIT_REF_NAME}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testbbb
      version: ${CI_COMMIT_REF_NAME}
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: 'true'
      labels:
        app: testbbb
        version: ${CI_COMMIT_REF_NAME}
    spec:
      containers:
        - image: $REGISTRY_ADDRESS/${NODE_ENV}/${CI_PROJECT_NAME}:v${CI_PIPELINE_ID}
          env:
            - name: NODE_ENV
              value: development
            - name: PROJECT_VERSION
              value: ${CI_COMMIT_REF_NAME}
          imagePullPolicy: IfNotPresent
          livenessProbe:
            tcpSocket:
              port: 32000
          readinessProbe:
            tcpSocket:
              port: 32000
          name: testbbb
          ports:
            - containerPort: 32000
      dnsPolicy: ClusterFirst
      restartPolicy: Always
```
Because the Istio sidecar must run inside every pod to manage its traffic, we need to enable automatic sidecar injection for the target namespace (required):

```bash
kubectl label namespace test-istio istio-injection=enabled
```
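A quick sanity check (not from the original article) that injection is actually in effect:

```bash
# The namespace should now carry the istio-injection=enabled label...
kubectl get namespace test-istio --show-labels
# ...and every pod created afterwards should show 2/2 containers (app + istio-proxy).
kubectl -n test-istio get pods
```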
Prepare the gitlab-ci configuration file and deploy the release and release-v1 versions of both projects:
```yaml
stages:
  - build
  - deploy

build-release:
  stage: build
  variables:
    IMAGE: release/${CI_PROJECT_NAME}:v${CI_PIPELINE_ID}
  script:
    - echo "Building application..."
    - echo "$REGISTRY_PASSWORD" | sudo docker login -u ${REGISTRY_USERNAME} --password-stdin ${REGISTRY_ADDRESS}
    - echo "registry login success"
    - sudo docker build -t ${REGISTRY_ADDRESS}/${IMAGE} .
    - sudo docker push ${REGISTRY_ADDRESS}/${IMAGE}
    - echo "docker push && push success"
  tags:
    - build-runner
  only:
    - release
    - /^release-.*/

deploy-release:
  stage: deploy
  variables:
    NODE_ENV: release
  script:
    - echo "Deploying application..."
    - envsubst < deployment.yaml > deployment_new.yaml
    - ssh -p ${RELEASE_CI_PORT} -tt ${RELEASE_CI_USER}@${RELEASE_CI_IP} "[ -d ${DEPLOY_PATH} ] && echo ok || mkdir -p ${DEPLOY_PATH}"
    - scp deployment_new.yaml ${RELEASE_CI_USER}@${RELEASE_CI_IP}:${DEPLOY_PATH}/deployment_${CI_PROJECT_NAME}.yaml
    - ssh -p ${RELEASE_CI_PORT} -tt ${RELEASE_CI_USER}@${RELEASE_CI_IP} "cd ${DEPLOY_PATH} && kubectl apply -f deployment_${CI_PROJECT_NAME}.yaml"
    - echo "Application successfully deployed."
  tags:
    - back-release
  only:
    - release
    - /^release-.*/
```
Configure the ingress so that testaaa is reachable from outside the cluster (the next step references it in the VirtualService):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway
  namespace: test-istio
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```
Prepare a DestinationRule and a VirtualService for each of the two services. The DestinationRule manages all subsets (versions) of the service:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: testaaa
  namespace: test-istio
spec:
  host: testaaa.test-istio.svc.cluster.local
  subsets:
    - labels:
        version: release
      name: release
    - labels:
        version: release-v1
      name: release-v1
```
The VirtualService manages the service's routing rules (traffic control):
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: testaaa
  namespace: test-istio
spec:
  gateways:
    - gateway
  hosts:
    - "*"
  http:
    - match:
        - headers:
            project-version:
              exact: release-v1
      route:
        - destination:
            host: testaaa.test-istio.svc.cluster.local
            subset: release-v1
    - route:
        - destination:
            host: testaaa.test-istio.svc.cluster.local
            subset: release
```
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: testbbb
  namespace: test-istio
spec:
  hosts:
    - testbbb.test-istio.svc.cluster.local
  http:
    - match:
        - headers:
            project-version:
              exact: release-v1
      route:
        - destination:
            host: testbbb.test-istio.svc.cluster.local
            subset: release-v1
    - route:
        - destination:
            host: testbbb.test-istio.svc.cluster.local
            subset: release
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: testbbb
  namespace: test-istio
spec:
  host: testbbb.test-istio.svc.cluster.local
  subsets:
    - labels:
        version: release
      name: release
    - labels:
        version: release-v1
      name: release-v1
```
Once all the services are deployed, you will find that every pod created in this namespace contains one extra container plus an init container.
Test access (note: the traffic in this test does not originate inside the cluster, but it passes through the istio ingressgateway first, so the routing rules for testaaa and testbbb are guaranteed to take effect). First get the ingress gateway's port and IP; on k3s-release-server1 the NodePort was 31380 and the node IP was 10.1.4.5.
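The original transcript only preserves the outputs; assuming the default istio-ingressgateway Service in istio-system, the two values can be looked up roughly like this:

```bash
# NodePort of the ingressgateway's HTTP port (port name assumed to be "http2", the Istio default)
kubectl -n istio-system get svc istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'
# Internal IP of the node the test requests will be sent to
kubectl get nodes -o wide
```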
The five test requests that follow returned, in order: release, release-v1, release, release, and release-v3.
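The commands behind those outputs are not preserved; with the gateway address obtained above they would presumably look something like this:

```bash
# No project-version header: falls through to the default route (release subset)
curl -s http://10.1.4.5:31380/
# With the header: matched by the VirtualService and routed to the release-v1 subset
curl -s -H 'project-version: release-v1' http://10.1.4.5:31380/
```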
Call graph of the testaaa service (screenshot). Call graph of the testbbb service (screenshot).
Side note: if a VirtualService does not set the gateways field, it gets a default value of mesh, which stands for all sidecars inside the cluster; in other words, the VirtualService's rules then apply to in-cluster access.
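A small sketch of that point (not part of the original setup): binding testaaa's VirtualService to mesh as well as the gateway would make its rules apply to in-cluster callers too.

```bash
# Hypothetical tweak: add "mesh" to the VirtualService's gateways list
kubectl -n test-istio patch virtualservice testaaa --type merge \
  -p '{"spec":{"gateways":["gateway","mesh"]}}'
```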
Summary
The test behaviour above only holds when the upstream request also enters from within the current namespace:
if testaaa is accessed from outside the current namespace without going through the gateway, testaaa's rules do not take effect, but testbbb's rules do;
if testbbb is accessed directly from outside the current namespace, testbbb's rules do not take effect;
if testbbb is accessed directly from inside the current namespace, testbbb's rules do take effect.
Accessing a Service directly from the host does not trigger the rules either: Istio's canary/routing rules are enforced by the sidecar on the requesting side. Access the Service from a pod that has a sidecar injected (or make sure the request passes through such a pod) so that the traffic rules take effect.
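For example (a hypothetical check, relying on the namespace's automatic injection to give the temporary pod a sidecar):

```bash
# Run a throwaway curl pod inside test-istio; its sidecar applies the routing rules,
# so the project-version header is honoured even when calling the testbbb Service directly.
# (The sidecar may keep the pod alive after curl exits; delete the pod manually if --rm hangs.)
kubectl -n test-istio run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s -H 'project-version: release-v1' http://testbbb.test-istio:32000/
```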
Problem 1. This scheme covers multi-branch parallel development for the vast majority of our scenarios, but a few cases remain problematic, for example third-party callbacks: unless the callback's request payload is under our control, the callback can only land on the original version.
Update: we later solved this with the EnvoyFilter resource provided by Istio. The sample below targets every workload whose pod label app is orange-gateway-a (i.e. the gateway), listens on its inbound traffic, and inspects the :authority header and the project-version header to decide whether an extra project-version header needs to be added.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: header-envoy-filter
  namespace: sopei-biz
spec:
  workloadSelector:
    labels:
      app: orange-gateway-a
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.lua
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua"
            inlineCode: |
              function envoy_on_request(request_handle)
                local authority = request_handle:headers():get(":authority")
                local version_header = request_handle:headers():get("project-version")
                if authority == "aaa.com" then
                  if version_header == nil then
                    request_handle:headers():add("project-version", "release-aaa")
                  end
                elseif authority == 'bbb.com' then
                  if version_header == nil then
                    request_handle:headers():add("project-version", "release-bbb")
                  end
                end
              end
```
Problem 2. Istio uses Envoy as the data plane to forward HTTP requests, and Envoy requires HTTP/1.1 or HTTP/2 by default; when a client uses HTTP/1.0, it returns 426 Upgrade Required.
Our gateway is built on OpenResty, which in turn is based on Nginx, and Nginx proxies with HTTP/1.0 by default, so in the end we set proxy_http_version 1.1; on the gateway.
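The symptom is easy to reproduce by hand (a sketch; the target address is illustrative and the request must reach a sidecar-injected workload):

```bash
# Forcing HTTP/1.0 against a meshed service is expected to come back as 426 Upgrade Required
curl --http1.0 -i http://testbbb.test-istio:32000/
```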
Problem 3. References:
https://github.com/istio/istio/issues/41709
https://discuss.istio.io/t/nginx-proxy-pass-to-istio-ingress-gateway-404/4330/3
When Nginx is used as the service gateway, a proxy_set_header Host xxx; directive may be configured in front of the proxied service. Be aware that this affects the routing of downstream requests, because a VirtualService uses its hosts field to decide whether its traffic rules apply.
So when proxy_pass is combined with an upstream block, configure proxy_set_header Host <service-name>; in Nginx; when it is not combined with an upstream block, proxy_set_header Host $proxy_host; can be used.
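A rough sketch of what the gateway-side Nginx configuration ends up looking like for problems 2 and 3 combined (file path, names, and ports are illustrative, not taken from the original setup):

```bash
# Hypothetical location block combining both fixes: speak HTTP/1.1 to Envoy and pin the
# Host header to the service name so the VirtualService's hosts match still applies.
cat > /etc/nginx/conf.d/testbbb-proxy.conf <<'EOF'
upstream testbbb_upstream {
    server testbbb.test-istio.svc.cluster.local:32000;
}
server {
    listen 8080;
    location / {
        proxy_http_version 1.1;                                       # problem 2: avoid Envoy's 426
        proxy_set_header Host testbbb.test-istio.svc.cluster.local;   # problem 3: keep VS hosts matching
        proxy_pass http://testbbb_upstream;
    }
}
EOF
nginx -s reload
```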
Problem 4. References:
https://github.com/istio/istio/issues/41826
https://github.com/envoyproxy/envoy/issues/14981
https://blog.csdn.net/luo15242208310/article/details/96480095
Occasionally a request would time out and then return a 503.
Response shown on the page: upstream connect error or disconnect/reset before headers. reset reason: connection termination
This is a problem seen quite often with Istio, but it has many possible causes, and I never fully pinned down the trigger this time. Following the call graph provided by Kiali, the problem appeared where external traffic reaches the service gateway, so I looked at the service gateway's Envoy logs, shown below:
```json
{
  "downstream_local_address": "10.42.2.122:80",
  "bytes_received": 0,
  "route_name": "default",
  "downstream_remote_address": "10.42.1.0:0",
  "upstream_cluster": "inbound|80||",
  "upstream_local_address": "127.0.0.6:48839",
  "upstream_transport_failure_reason": null,
  "connection_termination_details": null,
  "duration": 3825,
  "x_forwarded_for": "175.162.8.253,10.42.1.0",
  "path": "/xxxxxx/api/weapp/v2.0/products?product_type=BEST_SELL",
  "start_time": "2022-11-08T07:23:09.887Z",
  "requested_server_name": "outbound_.80_.release-k3s_.orange-gateway-a.sopei-biz.svc.cluster.local",
  "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.3 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1 wechatdevtools/1.06.2210310 MicroMessenger/8.0.5 Language/zh_CN webview/",
  "upstream_host": "10.42.2.122:80",
  "method": "GET",
  "protocol": "HTTP/1.1",
  "bytes_sent": 95,
  "response_code_details": "upstream_reset_before_response_started{connection_termination}",
  "response_flags": "UC",
  "authority": "xxxxxx",
  "upstream_service_time": null,
  "request_id": "717b6d28-3cfc-40a1-88c2-dec26f9b54b8",
  "response_code": 503
}
```
The key phrase is upstream_reset_before_response_started{connection_termination}.
As discussed in issue 14981, one possible explanation for this class of problem is that the upstream server closed the connection just as the proxy started sending the request; it can help to find out how long the upstream connection had been open before the first request was sent on it.
Roughly speaking: the request reached the service gateway, the downstream API took too long to respond, and the gateway closed the connection (this particular endpoint really does have a long response time).
The eventual fix: mostly as an experiment, I put an ingressgateway layer in front of the service gateway and tested again, and the problem went away:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http-orange
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orange-gateway
spec:
  gateways:
    - gateway
  hosts:
    - "*"
  http:
    - route:
        - destination:
            host: orange-gateway-a
            subset: release-k3s
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: orange-gateway-a
spec:
  host: orange-gateway-a
  subsets:
    - labels:
        version: release-k3s
      name: release-k3s
```
Problem 4 update. The 503 problem can generally be mitigated in the following four ways:
(1) Set an error-retry policy via HTTPRetry in the VirtualService (attempts, perTryTimeout, retryOn). Note that in Envoy a timeout must be set at the same time, i.e. the total retry time must stay below the timeout; in Istio this means also setting HttpRoute.timeout.
(2) Set HTTPSettings.idleTimeout in the DestinationRule to control how long idle connections are kept in Envoy's connection pool.
(3) Set HTTPSettings.maxRequestsPerConnection in the DestinationRule to 1 (disables keep-alive, so connections are not reused; this costs performance).
(4) Increase the web container's idle-connection timeout, e.g. Tomcat's connectionTimeout (server.connectionTimeout in Spring Boot).
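A sketch of options (1) and (2) applied to the gateway's routing objects from earlier (the names follow the example above and the values are illustrative, not tuned):

```bash
# Hypothetical retry/timeout and connection-pool settings; adjust names and values to your services.
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orange-gateway
spec:
  gateways:
    - gateway
  hosts:
    - "*"
  http:
    - timeout: 15s                # total budget; keep it above attempts * perTryTimeout
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: connect-failure,refused-stream,gateway-error
      route:
        - destination:
            host: orange-gateway-a
            subset: release-k3s
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: orange-gateway-a
spec:
  host: orange-gateway-a
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 30s          # drop pooled connections that have been idle this long
  subsets:
    - labels:
        version: release-k3s
      name: release-k3s
EOF
```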
For methods of troubleshooting 503s in Istio, the following articles are also worth reading:
[English] Istio: 503's with UC's and TCP Fun Times
[Chinese translation] Istio:503、UC 和 TCP