#kubernetes (2023-05)

kubernetes

Archive: https://archive.sweetops.com/kubernetes/

2023-05-03

gajanandsingh1612

Hi everyone, I want to create a production-grade, high-availability Kubernetes cluster on-premises. What are the things I should take care of? Can you guide me or point me to some resources to read up on this?

Hao Wang

The etcd certificates expire after a while, so you will need to renew them manually
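
A minimal sketch of how the expiry could be monitored, assuming the certificate's end date has already been obtained with `openssl x509 -noout -enddate -in <cert>` (which prints a `notAfter=...` line). The function name `days_until_expiry` is hypothetical, and the certificate location depends on how the cluster was built (on kubeadm-built clusters it is commonly `/etc/kubernetes/pki/etcd/server.crt`, which is an assumption here, not something stated in the thread).

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days until a certificate expires.

    `not_after` is the end date as printed by
    `openssl x509 -noout -enddate`, with or without the
    leading "notAfter=" prefix, e.g. "May 12 00:00:00 2024 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    date_text = not_after.split("=", 1)[-1].strip()
    expires_at = ssl.cert_time_to_seconds(date_text)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400
```

A result close to zero or negative would be the signal to renew before etcd starts rejecting connections.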


2023-05-08

2023-05-17

timduhenchanter

If anyone is willing to field an Istio/Envoy question in here:

I have an issue where all services are behind the same Gateway (Istio 1.13.x) attached to a GCP LB. The call chain is svcA -> svcB -> svcC -> svcB -> svcD, and it results in downstream_remote_disconnect in the istio-proxy logs for svcB. When I clone svcB into another namespace behind a different VirtualService and call svcA -> svcB -> svcC -> svcB(clone) -> svcD, I get the expected 200. Is there anything obvious as to why this is the case? I am happy to provide more details as needed.

[2023-05-16T22:15:56.839Z] "GET /uri?query=test HTTP/1.1" 0 DC downstream_remote_disconnect - "-" 0 0 119998 - "34.29.X.X, 34.111.X.X,35.191.X.X" "axios/0.27.2" "53b3a438-a3dc-4df7-86a9-7229048d993e" "svcB.stage.XXX.com" "10.88.X.X:8080" inbound|8080|| 127.0.0.6:46205 10.88.X.X:8080 35.191.X.X:0 outbound_.8080_._.svcB.svcB.svc.cluster.local default
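
To make the log line above easier to read, here is a small sketch that splits it into named fields. The field order in `FIELDS` is an assumption: it follows what I understand to be Istio's default text access log format, and a cluster with a customized format would need a different list. The names `FIELDS` and `parse_access_log` are hypothetical.

```python
import shlex

# Assumed field order of Istio's default text access log format.
FIELDS = [
    "start_time", "request", "response_code", "response_flags",
    "response_code_details", "connection_termination_details",
    "upstream_transport_failure_reason", "bytes_received", "bytes_sent",
    "duration_ms", "upstream_service_time", "x_forwarded_for", "user_agent",
    "request_id", "authority", "upstream_host", "upstream_cluster",
    "upstream_local_address", "downstream_local_address",
    "downstream_remote_address", "requested_server_name", "route_name",
]


def parse_access_log(line):
    """Split one access log line into a dict keyed by FIELDS."""
    tokens = shlex.split(line)  # keeps quoted fields together
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))


LINE = (
    '[2023-05-16T22:15:56.839Z] "GET /uri?query=test HTTP/1.1" 0 DC '
    'downstream_remote_disconnect - "-" 0 0 119998 - '
    '"34.29.X.X, 34.111.X.X,35.191.X.X" "axios/0.27.2" '
    '"53b3a438-a3dc-4df7-86a9-7229048d993e" "svcB.stage.XXX.com" '
    '"10.88.X.X:8080" inbound|8080|| 127.0.0.6:46205 10.88.X.X:8080 '
    '35.191.X.X:0 outbound_.8080_._.svcB.svcB.svc.cluster.local default'
)

entry = parse_access_log(LINE)
print(entry["response_flags"], entry["response_code_details"],
      int(entry["duration_ms"]) / 1000, "seconds")
```

Read this way, the request carried no response code, the `DC` flag, and a duration of 119998 ms, which is just under the 120s timeout mentioned later in the thread. That may suggest the caller gave up at roughly the timeout mark, though the log line alone does not prove it.
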
Hao Wang

Is there network policy in the stack?

timduhenchanter

No defined NetworkPolicy

Hao Wang

any logs in Envoy container?

timduhenchanter

The one I pasted above with the downstream_remote_disconnect. What other logs would you be looking for?

Hao Wang

Istio has its own proxy pod or container

timduhenchanter

That was from the sidecar container istio-proxy

Hao Wang

oh sorry it is lol

Hao Wang

Do the new pods in the other namespace run on the same node or a different node?

Hao Wang

it's either a security group rule issue, or you need to ssh into the node and take a look

timduhenchanter

Let me check. It is definitely possible it is on another node

timduhenchanter

svcB and svcB(clone) are actually on the same node

Hao Wang

any difference between VirtualServices?

Hao Wang

only name?

timduhenchanter

Only the 3rd level domain

Hao Wang

What is the output of describing them?

timduhenchanter
Spec:
  Gateways:
    istio-system/istio-default-external
  Hosts:
    svcB.stg.XXX.com
  Http:
    Route:
      Destination:
        Host:  svcB
        Port:
          Number:  8080
    Timeout:       120s
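
When comparing the original and the cloned VirtualService, it can help to render both as manifests and diff them. The sketch below rebuilds a manifest from the fields in the describe output above and prints it as JSON, which `kubectl` also accepts. The `apiVersion`, the `name` and `namespace` values, and the helper name `virtual_service` are assumptions for illustration; they do not appear in the thread.

```python
import json


def virtual_service(name, namespace, gateway, host, service, port, timeout):
    """Build a VirtualService manifest with a single HTTP route."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",  # assumed API version
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "gateways": [gateway],
            "hosts": [host],
            "http": [
                {
                    "route": [
                        {"destination": {"host": service,
                                         "port": {"number": port}}}
                    ],
                    # The timeout applies to the whole HTTP route.
                    "timeout": timeout,
                }
            ],
        },
    }


manifest = virtual_service(
    name="svcb",          # hypothetical resource name
    namespace="svcb",     # hypothetical namespace
    gateway="istio-system/istio-default-external",
    host="svcB.stg.XXX.com",
    service="svcB",
    port=8080,
    timeout="120s",
)
print(json.dumps(manifest, indent=2))
```

Generating the clone's manifest with the same helper and diffing the two outputs would confirm whether the third-level domain really is the only difference.
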
timduhenchanter

Very straightforward

Hao Wang

any events or status in outputs?

timduhenchanter

<none>

Hao Wang
#14908 Downstream remote disconnect response

Title: What are possible scenarios where we get downstream_remote_disconnect response ?

Description:

We are using envoy proxy to route requests based on header value to respective upstreams. While doing performance testing, few of our requests fail with response code details - downstream_remote_disconnect. Wanted to understand when can we experience this ?

Hao Wang
This is set when the stream is terminated due to a downstream FIN. If you're encountering a situation where that detail is set and a wireshark trace shows downstream isn't sending the FIN we'd be happy to look into it, but by default we'd assume it's client-caused at which point there's not much we can do to diagnose what's going on :-)
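
To illustrate what a "downstream FIN" looks like from the receiving side, here is a self-contained sketch using plain TCP sockets on loopback. It is not Envoy and does not reproduce the issue in this thread; it only shows the generic mechanism where a client gives up waiting and closes the connection, and the server then sees end-of-stream. All names (`slow_server`, `run_demo`) and the timings are hypothetical.

```python
import socket
import threading
import time


def slow_server(listener, delay, result):
    """Accept one connection, stall, then check whether the peer hung up."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)    # read the request
        time.sleep(delay)  # simulate a slow upstream call
        # recv() returning b"" means the peer closed its side (sent FIN).
        result["peer_closed"] = conn.recv(1024) == b""


def run_demo(client_timeout=0.2, server_delay=0.5):
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    result = {}
    worker = threading.Thread(
        target=slow_server, args=(listener, server_delay, result))
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(client_timeout)
    client.sendall(b"GET / HTTP/1.1\r\n\r\n")
    try:
        client.recv(1024)
        result["client_timed_out"] = False
    except socket.timeout:
        result["client_timed_out"] = True
    finally:
        client.close()  # the client gives up and closes the connection

    worker.join()
    listener.close()
    return result


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the disconnect is client-initiated because the client's timeout is shorter than the server's delay, which is the kind of situation the quoted answer describes as client-caused.
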
timduhenchanter

It is the same source code, so the client and container image artifacts are identical. I did see that ticket, but that can’t be accurate in my case

timduhenchanter

I am guessing there is a routing issue caused by the call back to svcB after the initial downstream request, but I don’t see anything to support that or to resolve it

Hao Wang

if the error was seen in svcB’s pods, there may be more logs in either C or D

timduhenchanter

C was sending back timeouts that I have set to 120s

timduhenchanter

But only in that flow. C does not send any timeouts outside of that specific call

Hao Wang

is iptables used?

timduhenchanter

And I can test that call to the C pod from local and from Apollo without timeout

Hao Wang

seems you've run into a corner case

Hao Wang

how about sending requests to C and D?

timduhenchanter

No issue other than that specific flow

Hao Wang

or use tcpdump in the svcB pods

timduhenchanter

Yeah I was going to escalate this to Wireshark but I was trying to avoid that

Hao Wang

or try Cilium Mesh

Hao Wang

which supports a sidecarless mode

timduhenchanter

Yeah, and Traefik Mesh (formerly Maesh) as well. Istio is releasing its own sidecarless option too

Hao Wang

How about Linkerd?

timduhenchanter

I’d rather not throw away Istio for this one potential bug

Hao Wang

they rewrote the proxy in Rust for 2.0

Hao Wang

yeah, that is a huge change

Hao Wang

do you use a CNI plugin in the stack?

timduhenchanter

the default GKE CNI

Hao Wang

oh, if GKE Dataplane V2 is in use, note that it is implemented using Cilium

Hao Wang

The legacy dataplane for GKE is implemented using Calico

timduhenchanter

Yes I believe it’s Calico

Hao Wang

ok, moving from v1 to v2 may be worth a test

Hao Wang

or there may be some leftover policies in the namespace
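
A quick way to check for leftover policies could look like the sketch below. It assumes the output of `kubectl get networkpolicy --all-namespaces -o json` has been saved to a string, and it only reads the standard `items[].metadata` fields of that list. The function name `policies_in_namespace` and the sample data are hypothetical.

```python
import json


def policies_in_namespace(kubectl_json, namespace):
    """Return the names of NetworkPolicies found in `namespace`.

    `kubectl_json` is the JSON text of a Kubernetes List object,
    as produced by `kubectl get networkpolicy -A -o json`.
    """
    doc = json.loads(kubectl_json)
    return sorted(
        item["metadata"]["name"]
        for item in doc.get("items", [])
        if item["metadata"].get("namespace") == namespace
    )


# Hypothetical sample input for illustration only.
SAMPLE = json.dumps({
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {"kind": "NetworkPolicy",
         "metadata": {"name": "default-deny", "namespace": "svcb"}},
        {"kind": "NetworkPolicy",
         "metadata": {"name": "allow-dns", "namespace": "other"}},
    ],
})

print(policies_in_namespace(SAMPLE, "svcb"))
```

An empty result for both the original and the clone's namespace would rule out NetworkPolicy objects, although mesh-level policies (for example Istio AuthorizationPolicy) would need a separate check.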
