#kubernetes (2023-05)
Archive: https://archive.sweetops.com/kubernetes/
2023-05-03
![gajanandsingh1612 avatar](https://secure.gravatar.com/avatar/0465644198f1db8275760cb7a2e8d4d2.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0021-72.png)
Hi everyone, I want to create a production-level, highly available Kubernetes cluster on premises. What are the things I should take care of? Can you guide me or point me to some resources so I can read up on this?
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
The etcd certificates will expire after a while; you will need to renew them manually.
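For a kubeadm-based cluster (an assumption; other installers manage certificates differently), expiry can be checked and the renewal done with kubeadm's built-in commands:

```shell
# Show when all kubeadm-managed certificates (including the etcd certs) expire
kubeadm certs check-expiration

# Renew every certificate managed by kubeadm; run on each control-plane node
kubeadm certs renew all

# Afterwards, restart the static control-plane pods (kube-apiserver, etcd, etc.)
# so they pick up the renewed certificates
```

Note that from Kubernetes 1.15 onward, kubeadm also renews all certificates automatically during `kubeadm upgrade`, so clusters upgraded at least once a year generally avoid manual renewal.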
2023-05-08
2023-05-17
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
If anyone is willing to field an Istio/Envoy question in here: I have an issue where all services are behind the same Gateway (Istio 1.13.x) attached to a GCP LB. The calls are svcA -> svcB -> svcC -> svcB -> svcD, and this results in `downstream_remote_disconnect` in the `istio-proxy` logs for svcB. When I clone svcB in another namespace behind a different VirtualService and do svcA -> svcB -> svcC -> svcB(clone) -> svcD, I get the expected 200. Is there anything obvious as to why this is the case? I am happy to provide more details as needed.
```
[2023-05-16T22:15:56.839Z] "GET /uri?query=test HTTP/1.1" 0 DC downstream_remote_disconnect - "-" 0 0 119998 - "34.29.X.X, 34.111.X.X,35.191.X.X" "axios/0.27.2" "53b3a438-a3dc-4df7-86a9-7229048d993e" "svcB.stage.XXX.com" "10.88.X.X:8080" inbound|8080|| 127.0.0.6:46205 10.88.X.X:8080 35.191.X.X:0 outbound_.8080_._.svcB.svcB.svc.cluster.local default
```
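Read against Istio's default access-log format, that line says: response code 0, response flag `DC` (downstream connection termination), detail `downstream_remote_disconnect`, and a duration of 119998 ms, i.e. the request died essentially at a 120s timeout. A small sketch pulling the relevant fields apart (field positions assume the default Istio/Envoy log format):

```python
import re

# The access-log line from the istio-proxy sidecar (tail truncated here)
log = ('[2023-05-16T22:15:56.839Z] "GET /uri?query=test HTTP/1.1" 0 DC '
       'downstream_remote_disconnect - "-" 0 0 119998 -')

# After the quoted request line, the default Istio format emits:
# %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS%
# %CONNECTION_TERMINATION_DETAILS% "%UPSTREAM_TRANSPORT_FAILURE_REASON%"
# %BYTES_RECEIVED% %BYTES_SENT% %DURATION% ...
m = re.search(r'HTTP/[\d.]+" (\d+) (\S+) (\S+) \S+ "\S*" (\d+) (\d+) (\d+)', log)
code, flags, detail = m.group(1), m.group(2), m.group(3)
duration_ms = int(m.group(6))

print(code, flags, detail, duration_ms)
# code="0", flags="DC", detail="downstream_remote_disconnect", duration 119998 ms
```

The 119998 ms duration lining up with the 120s route timeout is the interesting part: Envoy saw the downstream close the connection right as the timeout fired.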
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
Is there network policy in the stack?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
No defined NetworkPolicy
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
any logs in Envoy container?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
The one I pasted above with the downstream_remote_disconnect
. What other logs would you be looking for?
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
Istio has a proxy pod or sidecar container, right?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
That was from the sidecar container istio-proxy
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
oh sorry it is lol
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
The new pods in another namespace run on the same node or different node?
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
either a security group rule issue, or you need to ssh into the node and take a look
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
Let me check. It is definitely possible it is on another node
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
svcB and svcB(clone) are actually on the same node
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
any difference between VirtualServices?
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
only name?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
Only the 3rd level domain
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
What is the output of `describe` on them?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
```
Spec:
  Gateways:
    istio-system/istio-default-external
  Hosts:
    svcB.stg.XXX.com
  Http:
    Route:
      Destination:
        Host: svcB
        Port:
          Number: 8080
    Timeout: 120s
```
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
Very straightforward
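For reference, that `describe` output corresponds roughly to a manifest like the following (the name is a placeholder; the gateway, host, port, and timeout are taken from the thread):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: svcB
spec:
  gateways:
    - istio-system/istio-default-external
  hosts:
    - svcB.stg.XXX.com
  http:
    - route:
        - destination:
            host: svcB
            port:
              number: 8080
      timeout: 120s
```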
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
any events or status in outputs?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
<none>
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
Title: What are possible scenarios where we get downstream_remote_disconnect response ?
Description:
We are using envoy proxy to route requests based on header value to respective upstreams. While doing performance testing, few of our requests fail with response code details - downstream_remote_disconnect. Wanted to understand when can we experience this ?
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
This is set when the stream is terminated due to a downstream FIN. If you're encountering a situation where that detail is set and a wireshark trace shows downstream isn't sending the FIN we'd be happy to look into it, but by default we'd assume it's client-caused at which point there's not much we can do to diagnose what's going on :-)
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
It is the same source code so the client/container image artifact are identical. I did see that ticket but that can’t be accurate in my case
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
I am guessing there is a routing issue based on a call back to svcB after the initial downstream but I don’t see anything to support that or to resolve it
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
if the error was seen in svcB’s pods, there may be more logs in either C or D
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
C was sending back timeouts that I have set to 120s
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
But only in that flow. C does not send any timeouts outside of that specific call
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
is iptables used?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
And I can test that call to the C pod from local and from Apollo without timeout
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
seems to run into a corner case
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
how about sending requests to C and D?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
No issue other than that specific flow
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
or use tcpdump in B pods
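If it comes to a packet capture, one approach is an ephemeral debug container sharing the pod's network namespace, so the capture sees exactly what the sidecar sees (the pod name and debug image below are only examples):

```shell
# Attach an ephemeral container to the svcB pod and capture its app-port traffic
kubectl debug -n stage pod/svcB-xxxxx -it --image=nicolaka/netshoot -- \
  tcpdump -i any -w /tmp/svcB.pcap port 8080

# Copy the pcap out and inspect in Wireshark to see which side sends the FIN
```

Per the Envoy maintainer comment quoted above, the key question for the trace is whether the downstream really sends the FIN, or whether the flag is being set without one.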
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
Yeah I was going to escalate this to Wireshark but I was trying to avoid that
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
or try Cilium Mesh
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
which supports a sidecar-less mode
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
Yeah, and Traefik Mesh as well. Istio is also releasing its own sidecarless option
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
How about Linkerd?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
I’d rather not throw away Istio for this one potential bug
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
they rewrote 2.0 in Rust
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
yeah, that is a huge change
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
do you use CNI in the stack?
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
the default GKE CNI
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
oh, if GKE Dataplane V2 is used — Dataplane V2 is implemented using Cilium
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
The legacy dataplane for GKE is implemented using Calico
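Which dataplane a GKE cluster runs can be read off the cluster description (cluster name and zone below are placeholders):

```shell
# Prints ADVANCED_DATAPATH for Dataplane V2 (Cilium);
# empty or LEGACY_DATAPATH means the legacy (Calico-based) dataplane
gcloud container clusters describe my-cluster --zone us-central1-a \
  --format='value(networkConfig.datapathProvider)'
```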
![timduhenchanter avatar](https://avatars.slack-edge.com/2019-03-17/579912413365_835e6cb6a5f7a9b47d79_72.jpg)
Yes I believe it’s Calico
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
ok, v1 -> v2 may be worth a test
![Hao Wang avatar](https://secure.gravatar.com/avatar/aa01de6ab42f1576bbb56a203c660939.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0013-72.png)
or there may be some leftover policies in the namespace