#kubernetes (2022-05)
Archive: https://archive.sweetops.com/kubernetes/
2022-05-03
2022-05-04
I remember a tool that would check a cluster for usage of deprecated manifests and spit them out. I know there is pluto, but I believe there is a second one; I'm not sure. Does anyone have an idea?
@Pierre Humberdroz - you might be thinking of https://github.com/FairwindsOps/pluto
A cli tool to help discover deprecated apiVersions in Kubernetes
Kubent
2022-05-05
2022-05-09
2022-05-11
Hi there, are any teams using API Gateways within their Kubernetes clusters? Just curious what current solutions people have had good experience with. Searching through the Slack history here I see:
• Kong
• Ambassador Edge Stack - (looks like you have to pay for JWT use though)
• Amazon API Gateway (we use EKS)
I’m deploying a Validating Webhook with the operation rule “create” (and it’s working fine), but when I add the “update” operation it doesn’t work. I’m getting this error: Failed to update endpoint default/webhook-server: Internal error occurred: failed calling webhook "webhook-server.default.svc": failed to call webhook: Post "https://webhook-server.default.svc:443/validate?timeout=5s": dial tcp ip.ip.ip.ip: connect: connection refused
Any help please?
These are my YAML files:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webhook-server
  namespace: default
  labels:
    app: webhook-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webhook-server
  template:
    metadata:
      labels:
        app: webhook-server
    spec:
      containers:
        - name: server
          image: webhook-server
          imagePullPolicy: Never
          env:
            - name: FAIL
              value: "false"
          ports:
            - containerPort: 8443
              name: webhook-api
          volumeMounts:
            - name: webhook-tls-certs
              mountPath: /run/secrets/tls
              readOnly: true
      volumes:
        - name: webhook-tls-certs
          secret:
            secretName: webhook-server-tls
---
apiVersion: v1
kind: Service
metadata:
  name: webhook-server
  namespace: default
spec:
  selector:
    app: webhook-server
  ports:
    - port: 443
      targetPort: webhook-api
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: demo-webhook
webhooks:
  - name: webhook-server.default.svc
    namespaceSelector:
      matchExpressions:
        - key: validate
          operator: NotIn
          values: [ "skip" ]
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["*"]
    clientConfig:
      service:
        namespace: "default"
        name: "webhook-server"
        path: "/validate"
      caBundle: ${CA_PEM_B64}
    admissionReviewVersions: ["v1", "v1beta1"]
    sideEffects: None
    timeoutSeconds: 5
    # failurePolicy: Ignore
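One thing the error hints at (a hypothesis, not a confirmed diagnosis): with operations: ["CREATE", "UPDATE"] and resources: ["*"], the webhook also intercepts updates to the Endpoints object of its own Service, and with the default failurePolicy: Fail those updates are rejected while the webhook pod is still unreachable, which can keep it unreachable. A hedged sketch of a narrower rule block that avoids that loop (the resource list is illustrative):
```yaml
# Sketch only: scope the webhook to the resources you actually validate,
# so Endpoints/EndpointSlice updates for the webhook's own Service are
# not sent to the (not-yet-ready) webhook backend.
rules:
  - apiGroups: ["", "apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods", "deployments"]   # illustrative, instead of "*"
# Alternatively, keep resources: ["*"] but uncomment failurePolicy: Ignore
# while debugging, so a failed webhook call does not block API requests.
```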
2022-05-12
Hello everyone! We’ve been facing an issue with some latency-sensitive services that we deployed to EKS and expose using the Nginx Ingress Controller. The issue is the conntrack table (used by iptables) filling up, at which point packets start getting dropped. The solution is simply increasing the kernel parameter net.ipv4.netfilter.ip_conntrack_max to a higher value, piece of cake. As we are using the EKS/AWS-maintained AMI for the worker nodes, this value comes predefined with a relatively small value (our services/apps handle several thousand reqs per sec). We’ve been exploring different ways of properly setting this value, and the preferred way would be modifying the kube-proxy-config ConfigMap, which contains conntrack-specific config:
conntrack:
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
The problem is that kube-proxy is managed as an EKS add-on, so if we modify this config (by, let’s say, Terraform) it will be overridden by the add-on. But we don’t want to self-manage kube-proxy, as that would slow down our fully automatic EKS upgrades that we handle with Terraform.
Any ideas or suggestions?
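One workaround pattern (not from the thread, just a hedged sketch): raise the conntrack limit directly on each node with a small privileged DaemonSet and leave the kube-proxy add-on untouched. All names and the target value below are illustrative, and kube-proxy may still re-apply its own conntrack settings on restart, so treat this as a stopgap rather than a definitive fix.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: conntrack-tuning          # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: conntrack-tuning
  template:
    metadata:
      labels:
        app: conntrack-tuning
    spec:
      hostNetwork: true            # apply the sysctl in the host network namespace
      containers:
        - name: sysctl
          image: busybox:1.35
          securityContext:
            privileged: true       # needed to write /proc/sys on the node
          # Apply the sysctl once, then idle so the DaemonSet stays Running.
          # net.netfilter.nf_conntrack_max is the modern name for the legacy
          # net.ipv4.netfilter.ip_conntrack_max key; the value is hypothetical.
          command:
            - sh
            - -c
            - "sysctl -w net.netfilter.nf_conntrack_max=1048576 && sleep 2147483647"
```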
2022-05-13
Hello, can somebody tell me the conceptual difference between a controller and a mutating webhook?
https://kubernetes.io/docs/concepts/architecture/controller/#controller-pattern
A controller tracks at least one Kubernetes resource type. These objects have a spec field that represents the desired state. The controller(s) for that resource are responsible for making the current state come closer to that desired state.
The controller might carry the action out itself; more commonly, in Kubernetes, a controller will send messages to the API server …
In robotics and automation, a control loop is a non-terminating loop that regulates the state of a system. Here is one example of a control loop: a thermostat in a room. When you set the temperature, that’s telling the thermostat about your desired state. The actual room temperature is the current state. The thermostat acts to bring the current state closer to the desired state, by turning equipment on or off.
Admission webhooks are HTTP callbacks that receive admission requests and do something with them. You can define two types of admission webhooks, validating admission webhook and mutating admission webhook. Mutating admission webhooks are invoked first, and can modify objects sent to the API server to enforce custom defaults. After all object modifications are complete, and after the incoming object is validated by the API server, validating admission webhooks are invoked and can reject requests to enforce custom policies.
In addition to compiled-in admission plugins, admission plugins can be developed as extensions and run as webhooks configured at runtime. This page describes how to build, configure, use, and monitor admission webhooks.
So…
Mutating webhooks can influence what a controller is trying to do … because it “sits” as one of the final “filters” before admission to the API server.
As far as I understand, a ‘webhook’ (let’s say mutating) receives events upon a resource creation/update/delete and can act on them (by applying further modifications to the resource).
A controller, on the other hand, is simply an application living inside Kubernetes that can use the Kubernetes API to create/update/delete resources too, I think?
In this case, why prefer one over the other?
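To make the contrast concrete, a minimal sketch (names are illustrative): an admission webhook is registered declaratively and called synchronously by the API server while a request is being admitted, before the object is persisted; a controller watches objects after they have been persisted and reconciles them asynchronously.
```yaml
# Sketch of a mutating webhook registration: the API server calls this
# endpoint inline, during admission, before the object is stored.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: demo-mutator               # illustrative name
webhooks:
  - name: defaulter.example.com    # illustrative name
    clientConfig:
      service:
        namespace: default
        name: defaulter            # hypothetical Service backing the webhook
        path: /mutate
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["deployments"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
# A controller, by contrast, is just a workload running a reconcile loop
# against the API (watch -> compare desired vs actual -> act), operating
# on objects that have already been admitted and stored.
```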
2022-05-18
What does cloudposse use for ingress controller?
We have used ingress-nginx (open source) (and still use it for some clients), and we mostly use the aws-load-balancer-controller
I didn’t know aws lb controller was an ingress controller. What do you recommend if customers need pervasive network encryption for compliance?
network encryption such as ALB + ACM? Would that not suffice?
I think that will terminate HTTPS on the ALB then plaintext in the private network, right?
I mean like network encryption for all node-to-node traffic too
and also ALB to node
In this blog post, I’ll show you how to set up end-to-end encryption on Amazon Elastic Kubernetes Service(Amazon EKS). End-to-end encryption in this case refers to traffic that originates from your client and terminates at an NGINX server running inside a sample app. I work with regulated customers who need to satisfy regulatory requirements like […]
you’d need at least 2 CAs, one root unshared and one subordinate shared across the org to create the certs.
Ok I see, in this approach each pod runs a sidecar that terminates SSL, so basically just using HTTPS services in each app and then routing to them
I think it would meet compliance to just use a normal ACM cert on the LB, then self-signed certs the rest of the way, and configure the ALB to trust the self-signed cert, since we only care about encryption, not authentication, for in-cluster networking
You should be able to use a custom signed cert with cert-manager to do this instead of using PCA
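A minimal sketch of that cert-manager approach, assuming you already have a CA key pair stored in a Secret named ca-key-pair (all names and hosts below are illustrative):
```yaml
# A ClusterIssuer backed by your own CA (instead of AWS Private CA).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca              # illustrative name
spec:
  ca:
    secretName: ca-key-pair      # Secret holding the CA's tls.crt/tls.key
---
# A backend certificate issued from that CA, mounted by the app/sidecar
# so traffic from the ALB to the pod stays encrypted.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: backend-tls              # illustrative name
  namespace: default
spec:
  secretName: backend-tls
  dnsNames:
    - backend.default.svc.cluster.local
  issuerRef:
    kind: ClusterIssuer
    name: internal-ca
```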
Shot in the dark but does anyone have experience securing API Keys with Istio directly? The AuthorizationPolicy CR does not support reading request header values from a secret. In previous implementations I have used Kong to secure the API keys downstream from Istio but would prefer not to support Kong and the additional overhead for this specific use case.
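For reference, this is roughly what the AuthorizationPolicy approach looks like; the key values have to be written literally into the CR (or templated in at deploy time), since, as noted, Istio does not resolve them from a Secret. Everything below is an illustrative sketch, not a recommendation:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-api-key          # illustrative name
  namespace: my-app              # illustrative namespace
spec:
  selector:
    matchLabels:
      app: my-app                # hypothetical workload label
  action: ALLOW
  rules:
    - when:
        - key: request.headers[x-api-key]
          # Literal values only; there is no way to reference a Secret here.
          values: ["example-key-1", "example-key-2"]
```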
2022-05-19
@Erik Osterman (Cloud Posse), where can I find that K8S ingress controllers comparison sheet ? I opened it from one of your posts another day and I need it now, but can’t find it …
Comparison of Kubernetes Ingress controllers. Someone has painstakingly compared…
2022-05-20
What would you do if you had a requirement to redirect traffic from one app to different app based on the route existing or not existing in the first app?
• all requests always first go to the primary app
• if the primary app knows the request (non 404 http response code) it answers with a appropriate response
• if the primary app does not know the request (404) it redirects the request to a different fallback secondary app
• the solution should work across multiple clusters so even if the fallback secondary app is on another cluster the redirect/shift traffic scenario should work
My personal recommendation would be for the app/proxy to handle this logic and conditionally offer a permanent redirect to the secondary app.
This can be done in code or by having a proxy (i.e. nginx) in front of the app where you catch 404s and redirect to the secondary app.
Otherwise, if you’re just referring to path-based routing and you’re aware of all the primary service routes, you can define them in a K8S Ingress for the primary app and have all requests that aren’t explicitly defined go to the secondary app.
ok. thanks for the recommendation. also, to do all this behind the scenes there needs to be a proxy plus something that enables cross-cluster communication?
Yes, a federated service mesh or an endpoint that was exposed to the internet or intranet on a shared network in a multi-cluster scenario
so what I would like to do is connect the clusters via a federated service mesh, as you mentioned, and then use a proxy, as you mentioned, in front of the app that catches 404s and caches that route so next time we save ourselves the app startup time. unfortunately, from half an hour of research I wasn’t able to find an obvious tool for this. do you have any recommendations there?
Istio + a Deployment with nginx & the primary app where the service points to nginx and the nginx config redirects 404’s to the secondary app is what I had on the top of my head but there is a variety of configurations possible. Also, I don’t know your timeline but federating Istio in a production environment is not trivial
ok, thanks for the heads up
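A rough sketch of the nginx side of that setup (an nginx container in front of the primary app that intercepts 404s and proxies to the secondary app). All hostnames and ports are illustrative, and the cross-cluster address for the secondary app would come from whatever mesh federation or exposed endpoint you use:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: primary-fallback-proxy    # illustrative name
  namespace: default
data:
  nginx.conf: |
    events {}
    http {
      server {
        listen 8080;
        location / {
          proxy_pass http://127.0.0.1:3000;   # primary app in the same pod
          # Let nginx handle upstream 404s instead of passing them through.
          proxy_intercept_errors on;
          error_page 404 = @fallback;
        }
        location @fallback {
          # Secondary app; could be a mesh-federated or external hostname.
          proxy_pass http://secondary-app.other-cluster.example;
        }
      }
    }
```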
2022-05-21
2022-05-22
2022-05-23
Hey, I’m trying to install Prometheus on EKS Fargate but I’m getting an error from the prometheus-alertmanager & prometheus-server pods because the PVC is pending: Pod not supported on Fargate: volumes not supported: storage-volume not supported because: PVC prometheus-alertmanager not bound.
I installed the aws-ebs-csi-driver add-on on the EKS cluster and it still doesn’t work.
I installed everything using “AWS blueprint”
I also saw this error: 2022-05-23T13:06:04+03:00 E0523 10:06:04.065256 1 reflector.go:127] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: Failed to watch *v1beta1.VolumeSnapshotClass: failed to list *v1beta1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)
from the ebs-plugin pod
Anyone knows how can I install prometheus on fargate?
Thank you!
Today, we are introducing a new open-source project called EKS Blueprints that makes it easier and faster for you to adopt Amazon Elastic Kubernetes Service (Amazon EKS). EKS Blueprints is a collection of Infrastructure as Code (IaC) modules that will help you configure and deploy consistent, batteries-included EKS clusters across accounts and regions. You can […]
EKS on Fargate only supports 2 types of storage: local ephemeral storage and EFS. You are trying to use EBS.
There is a feature request for EKS on Fargate to support EBS volumes, but that’s not supported yet.
Tell us about your request
Allow mounting of EBS volumes onto EKS pods running on Fargate
Which service(s) is this request for?
EKS on fargate
Tell us about the problem you’re trying to solve. What are you trying to do, and why is it hard?
We’re looking to migrate a cluster from EC2 nodes to Fargate. There are some stateful pods using EBS which is preventing us from adopting fargate.
Are you currently working around this issue?
We are staying on standard worker nodes
You have a couple options:
• use local ephemeral storage (less than ideal considering that gets deleted when the pod is deleted)
• use EFS instead of EBS (see the static-provisioning sketch after this list)
• use EC2 workers instead of Fargate workers
• use Amazon’s Managed Prometheus service
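For the EFS option above, a hedged sketch of what static provisioning with the EFS CSI driver looks like (Fargate has the EFS CSI driver built in; the file system ID, names, and sizes are placeholders you’d replace with your own, and the PVC name must match what the chart expects):
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-efs-pv            # illustrative name
spec:
  capacity:
    storage: 5Gi                     # EFS ignores the size, but the field is required
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-server            # illustrative; match your chart's PVC name
  namespace: prometheus
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi
  volumeName: prometheus-efs-pv
```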
So I’m trying to use Amazon’s Managed Prometheus service, and as far as I understand I still need to install the Prometheus Helm chart on my cluster according to their guide (I’m doing it through the AWS EKS blueprint). Am I right @Vlad Ionescu (he/him)?
I am not fully up-to-date with Amazon’s Managed Prometheus so I can’t comment with certainty. Sorry!
As far as I know, you can use either Prometheus with remote_write (does that still need storage, or can you eliminate the EBS requirement?) or OpenTelemetry agents to send metrics to Amazon’s Managed Prometheus, but again, high chance I am wrong.
Thank you!
I will try
Has anyone had problems with multipass since upgrading to Monterey 12.4? Am frequently seeing time-outs. Managed to start VMs once since the upgrade yesterday.
multipass start k3s-sandbox
start failed: The following errors occurred:
k3s-sandbox: timed out waiting for response
To anyone else who sees this issue looks like it might be related to networking https://github.com/canonical/multipass/issues/2387
Multipass crashes unexpectedly, and all instances (including primary) will not restart. I am using multipass 1.8.1+mac, multipassd 1.8.1+mac (multipass --version), on macOS Monterey 12.1, M1 Max, 32 GB RAM. I installed multipass for macOS directly from Canonical’s website.
start failed: The following errors occurred:
dbarn: timed out waiting for response
This has happened twice. The first time I uninstalled multipass, reinstalled, and re-built my instances thinking it was a fluke. Multipass performed fine for roughly a week, then crashed again and I lost some work. I am not able to determine what the cause is, or how to reproduce it. System restarts do not help. I’m hoping someone can spot the issue in the logs.
Logs here:
``` [2022-01-04T1356.044] [warning] [qemu-system-aarch64] [2022-01-04T1356.044] [debug] [dbarn] QMP: {“QMP”: {“version”: {“qemu”: {“micro”: 0, “minor”: 1, “major”: 6}, “package”: “”}, “capabilities”: [“oob”]}}
[2022-01-04T1356.062] [debug] [dbarn] QMP: {“return”: {}}
[2022-01-04T1308.663] [debug] [dbarn] QMP: {“timestamp”: {“seconds”: 1641321068, “microseconds”: 663550}, “event”: “NIC_RX_FILTER_CHANGED”, “data”: {“path”: “/machine/unattached/device[7]/virtio-backend”}}
[2022-01-04T1337.352] [debug] [dbarn] process working dir ‘’ [2022-01-04T1337.353] [info] [dbarn] process program ‘qemu-system-aarch64’ [2022-01-04T1337.353] [info] [dbarn] process arguments ‘-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -nic, vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff:d3, -cpu, cortex-a72, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dbarn/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 2, -m, 4096M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dbarn/cloud-init-config.iso’ [2022-01-04T1337.353] [warning] [Qt] QProcess: Destroyed while process (“qemu-system-aarch64”) is still running. [2022-01-04T1337.370] [error] [dbarn] process error occurred Crashed program: qemu-system-aarch64; error: Process crashed [2022-01-04T1337.371] [info] [dbarn] process state changed to NotRunning [2022-01-04T1337.371] [error] [dbarn] error: program: qemu-system-aarch64; error: Process crashed [2022-01-04T1337.372] [warning] [Qt] QObject::Process, Unknown): invalid nullptr parameter [2022-01-04T1337.372] [warning] [Qt] QObject::Process, Unknown): invalid nullptr parameter [2022-01-04T1337.372] [warning] [Qt] QObject::Process, Unknown): invalid nullptr parameter [2022-01-04T1337.372] [warning] [Qt] QObject::Process, Unknown): invalid nullptr parameter [2022-01-04T1337.372] [warning] [Qt] QObject::Process, Unknown): invalid nullptr parameter [2022-01-04T1337.372] [warning] [Qt] QObject::Process, Unknown): invalid nullptr parameter [2022-01-04T1338.572] [debug] [update] Latest Multipass release available is version 1.8.0 [2022-01-04T1340.666] [info] [VMImageHost] Did not find any supported products in “appliance” [2022-01-04T1340.671] [info] [rpc] gRPC listening on unix:/var/run/multipass_socket, SSL:on [2022-01-04T1340.673] [debug] [qemu-img] [1300] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/tractorgeeks/ubuntu-20.04-server-cloudimg-arm64.img [2022-01-04T1340.677] [debug] [qemu-img] [1301] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-20.04-server-cloudimg-arm64.img [2022-01-04T1340.681] [debug] [qemu-img] [1302] started: qemu-img snapshot -l /var/root/Library/Application Support/multipassd/qemu/vault/instances/dbarn/ubuntu-20.04-server-cloudimg-arm64.img [2022-01-04T1340.686] [info] [daemon] Starting Multipass 1.8.1+mac [2022-01-04T1340.686] [info] [daemon] Daemon arguments: /Library/Application Support/com.canonical.multipass/bin/multipassd –verbosity debug [2022-01-04T1316.997] [debug] [dbarn] process working dir ‘’ [2022-01-04T1316.998] [info] [dbarn] process program ‘qemu-system-aarch64’ [2022-01-04T1316.998] [info] [dbarn] process arguments ‘-machine, virt,highmem=off, -accel, hvf, -drive, file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on, -nic, vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff:d3, -cpu, cortex-a72, -device, virtio-scsi-pci,id=scsi0, -drive, file=/var/root/Library/Application 
Support/multipassd/qemu/vault/instances/dbarn/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda, -device, scsi-hd,drive=hda,bus=scsi0.0, -smp, 2, -m, 4096M, -qmp, stdio, -chardev, null,id=char0, -serial, chardev:char0, -nographic, -cdrom, /var/root/Library/Application Support/multipassd/qemu/vault/instances/dbarn/cloud-init-config.iso’ [2022-01-04T1317.004] [debug] [qemu-system-aarch64] [1384] started: qemu-system-aarch64 -machine virt,highmem=off -nographic -dump-vmstate /private/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/multipassd.pRWjeY [2022-01-04T1317.061] [info] [dbarn] process state changed to Starting [2022-01-04T1317.063] [info] [dbarn] process state changed to Running [2022-01-04T1317.063] [debug] [qemu-system-aarch64] [1385] started: qemu-system-aarch64 -machine virt,highmem=off -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff:d3 -cpu cortex-a72 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/dbarn/ubuntu-20.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 2 -m 4096M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/dbarn/cloud-init-config.iso [2022-01-04T1317.063] [info] [dbarn] process started [2022-01-04T1317.063] [debug] [dbarn] Waiting for SSH to be up [2022-01-04T1317.347] [warning] [dbarn] qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: Started vmnet interface with configuration: qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: MTU: 1500 qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: Max packet size: 1514 qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: MAC: de9b58:c4 qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: DHCP IPv4 start: 192.168.64.1 qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: DHCP IPv4 end: 192.168.64.254 qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: IPv4 subnet mask: 255.255.255.0 qemu-system-aarch64: -nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=5200ff info: UUID: 9736AE0D-0FAE-4223-847E-517E35533FEA
[2022-01-04T1317.348] [warning] [qemu-system-aarch64] [2022-01-04T1317.351] [debug] [dbarn] QMP: {“QMP”: {“version”: {“qemu”: {“micro”: 0, “minor”: 1, “major”: 6}, “package”: “”}, “capabilities”: [“oob”]}}
[2022-01-04T1317.399] [debug] [dbarn…
```
2022-05-25
Hi there, @Nimesh Amin and I are trying to upgrade the kube-prometheus-stack to use the AWS Managed Prometheus service, and we’re facing an issue where we can’t seem to figure out how to enable Sigv4 on the Grafana deployed by the kube-prometheus-stack chart.
More on this issue https://github.com/prometheus-community/helm-charts/issues/2092
TL;DR: how to enable sigv4 auth in grafana.ini for the kube-prometheus-stack Helm chart.
Is your feature request related to a problem ?
We have configured Prometheus to remoteWrite to the AWS Managed Prometheus service endpoint, however there is no option in the Grafana deployed with kube-prometheus-stack to enable the “Sigv4” auth on the prometheus datasource; this is needed to configure the AWS Managed Prometheus query endpoint as a datasource and use Sigv4 to assume the ServiceAccount IAM Role permissions.
Please advise if you have been able to solve this issue.
Describe the solution you’d like.
Settings / values to enable the sigv4 auth for grafana
Describe alternatives you’ve considered.
Deploy the https://github.com/grafana/helm-charts in a new namespace as the default installation allows to use “Sigv4” auth for the prometheus datasource.
Additional context.
No response
Sorry for the late reply, and this is not really an answer - just saying these kinds of issues are why we ended up abandoning kube-prometheus. Just so much overhead to do the upgrades. We started using it when it was an open source project at Ticketmaster, then it moved to CoreOS, then it moved to the “charts official” helm/charts, then that was deprecated, then we started using the Bitnami version, then we said NEVER AGAIN.
All good Erik - we have figured out the right config that works on EKS with sigv4 - it’s a YAML jungle out there. :)
https://github.com/prometheus-community/helm-charts/issues/2092#issuecomment-1140332174
Here is the configuration that worked for us.
Thank you for your response.
We figured out that the following settings worked for us:
grafana:
  serviceAccount:
    create: false
    name: iamproxy-service-account
  grafana.ini:
    auth:
      sigv4_auth_enabled: true
  additionalDataSources:
    - name: prometheus-amp
      editable: true
      jsonData:
        sigV4Auth: true
        sigV4Region: us-west-2
      type: prometheus
      isDefault: true
      url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/<ID>
prometheus:
  serviceAccount:
    create: false
    name: iamproxy-service-account
  prometheusSpec:
    remoteWrite:
      - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/<ID>/api/v1/remote_write
        sigv4:
          region: us-west-2