SweetOps #kubernetes for May, 2023

Archive: https://archive.sweetops.com/kubernetes/

2023-05-03

gajanandsingh1612

Hi everyone, I want to create a production-grade, highly available Kubernetes cluster on premises. What should I take care of? Can you guide me or point me to some resources I can read on this?

Hao Wang

12:34:01 AM

The etcd certificate expires after a while, so you will need to renew it manually.
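
As a rough illustration of watching for that expiry, here is a minimal Python sketch. It assumes you already have the certificate's notAfter string (for example from `openssl x509 -enddate -noout -in <cert>`); the function names and the 30-day threshold are hypothetical choices, not something from this thread.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` is the OpenSSL text form, e.g. "May  3 00:00:00 2024 GMT".
    `now` must be timezone-aware (UTC).
    """
    # ssl.cert_time_to_seconds converts the OpenSSL date string to epoch seconds.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when the certificate expires within `threshold_days` (assumed policy)."""
    return days_until_expiry(not_after, now) < threshold_days
```

Something like this could run from a cron job or a monitoring check so the renewal does not come as a surprise.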

2023-05-08

2023-05-17

timduhenchanter

07:06:59 PM

If anyone is willing to field an Istio/Envoy question in here:

I have an issue where all services are behind the same Gateway (Istio 1.13.x) attached to a GCP LB. The call chain is svcA -> svcB -> svcC -> svcB -> svcD, and it results in downstream_remote_disconnect in the istio-proxy logs for svcB. When I clone svcB into another namespace behind a different VirtualService and do svcA -> svcB -> svcC -> svcB(clone) -> svcD, I get the expected 200. Is there anything obvious as to why this is the case? I am happy to provide more details as needed.

[2023-05-16T22:15:56.839Z] "GET /uri?query=test HTTP/1.1" 0 DC downstream_remote_disconnect - "-" 0 0 119998 - "34.29.X.X, 34.111.X.X,35.191.X.X" "axios/0.27.2" "53b3a438-a3dc-4df7-86a9-7229048d993e" "svcB.stage.XXX.com" "10.88.X.X:8080" inbound|8080|| 127.0.0.6:46205 10.88.X.X:8080 35.191.X.X:0 outbound_.8080_._.svcB.svcB.svc.cluster.local default
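
To make a line like the one above easier to read, here is a small Python sketch that splits it into named fields. The field order is an assumption based on Istio's default Envoy access log format; if the mesh uses a custom format the names will not line up. The `FIELDS` list and `parse_access_log` helper are hypothetical names for illustration.

```python
import shlex

# Field order assumed from Istio's default access log format (an assumption,
# not confirmed in this thread).
FIELDS = [
    "request", "response_code", "response_flags", "response_code_details",
    "connection_termination_details", "upstream_transport_failure_reason",
    "bytes_received", "bytes_sent", "duration_ms", "upstream_service_time",
    "x_forwarded_for", "user_agent", "request_id", "authority",
    "upstream_host", "upstream_cluster", "upstream_local_address",
    "downstream_local_address", "downstream_remote_address",
    "requested_server_name", "route_name",
]

SAMPLE = (
    '[2023-05-16T22:15:56.839Z] "GET /uri?query=test HTTP/1.1" 0 DC '
    'downstream_remote_disconnect - "-" 0 0 119998 - '
    '"34.29.X.X, 34.111.X.X,35.191.X.X" "axios/0.27.2" '
    '"53b3a438-a3dc-4df7-86a9-7229048d993e" "svcB.stage.XXX.com" '
    '"10.88.X.X:8080" inbound|8080|| 127.0.0.6:46205 10.88.X.X:8080 '
    '35.191.X.X:0 outbound_.8080_._.svcB.svcB.svc.cluster.local default'
)


def parse_access_log(line: str) -> dict:
    """Split one access log line into a dict keyed by the assumed field names."""
    # The timestamp is wrapped in [...]; everything after "] " is
    # whitespace-separated, with some fields double-quoted.
    timestamp, _, rest = line.partition("] ")
    record = dict(zip(FIELDS, shlex.split(rest)))
    record["start_time"] = timestamp.lstrip("[")
    return record


record = parse_access_log(SAMPLE)
```

Under that assumed format, the parsed record has response code 0, response flag DC, zero bytes in either direction, and a duration of 119998 ms, which is roughly 120 seconds.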

Hao Wang

10:17:10 PM

Is there a NetworkPolicy in the stack?

timduhenchanter

10:17:23 PM

No defined NetworkPolicy

Hao Wang

10:18:15 PM

any logs in Envoy container?

timduhenchanter

10:30:07 PM

The one I pasted above with the downstream_remote_disconnect. What other logs would you be looking for?

Hao Wang

10:32:33 PM

Does Istio have a proxy pod or container?

timduhenchanter

10:32:52 PM

That was from the sidecar container istio-proxy

Hao Wang

10:33:26 PM

oh sorry it is lol

Hao Wang

10:34:12 PM

Do the new pods in the other namespace run on the same node or a different node?

Hao Wang

10:34:50 PM

It's either a security group rule issue, or you'll need to SSH into the node and take a look.

timduhenchanter

10:34:59 PM

Let me check. It is definitely possible it is on another node

timduhenchanter

10:37:16 PM

svcB and svcB(clone) are actually on the same node

Hao Wang

10:38:25 PM

any difference between VirtualServices?

Hao Wang

10:38:40 PM

only name?

timduhenchanter

10:38:42 PM

Only the 3rd level domain

Hao Wang

10:40:14 PM

What is the output of describe for them?

timduhenchanter

10:41:41 PM

Spec:
  Gateways:
    istio-system/istio-default-external
  Hosts:
    svcB.stg.XXX.com
  Http:
    Route:
      Destination:
        Host:  svcB
        Port:
          Number:  8080
    Timeout:       120s

timduhenchanter

10:41:48 PM

Very straightforward

Hao Wang

10:42:44 PM

any events or status in outputs?

timduhenchanter

10:42:57 PM

<none>

Hao Wang

10:45:06 PM

https://github.com/envoyproxy/envoy/issues/14908

#14908 Downstream remote disconnect response

Title: What are possible scenarios where we get downstream_remote_disconnect response ?

Description:

We are using Envoy proxy to route requests to their respective upstreams based on a header value. During performance testing, a few of our requests fail with the response code details downstream_remote_disconnect. We wanted to understand when this can happen.

Hao Wang

10:45:14 PM

This is set when the stream is terminated due to a downstream FIN. If you're encountering a situation where that detail is set and a wireshark trace shows downstream isn't sending the FIN we'd be happy to look into it, but by default we'd assume it's client-caused at which point there's not much we can do to diagnose what's going on :-)

timduhenchanter

10:45:38 PM

It is the same source code, so the client and the container image artifact are identical. I did see that ticket, but that can’t be accurate in my case.

timduhenchanter

10:48:33 PM

I am guessing there is a routing issue caused by the call back to svcB after the initial downstream request, but I don’t see anything to support that or to resolve it.

Hao Wang

10:48:55 PM

if the error was seen in svcB’s pods, there may be more logs in either C or D

timduhenchanter

10:49:15 PM

C was sending back timeouts that I have set to 120s

timduhenchanter

10:49:28 PM

But only in that flow. C does not send any timeouts outside of that specific call

Hao Wang

10:49:43 PM

is iptables used?