Hi all, we’re trying to use omsagent but the pods are giving the following error
td-agent-bit 1.4.2
2020-10-20T06:14:42Z I! Starting Telegraf
2020-10-20T06:14:42Z E! [agent] Failed to connect to output socket_writer, retrying in 15s, error was 'dial tcp 0.0.0.0:25226: getsockopt: connection refused'
any suggestions? the pod is marked as running but nothing is being collected to our workspace
fixed after full restart
Do you use AKS?
Yes - we just restarted the daemonset running and the errors went away
Has anyone got some good reading on how best to manage keys/secrets/vault with a Terraform setup in Azure? I’m looking for best practices and strategies to manage secrets that need to be rotated (yes, I see it’s discussed above; this is a bit more of a beginner question).
Use this tutorial to learn how to automate the rotation of a secret for resources that use two sets of authentication credentials.
thanks a lot, hadn’t seen this, will take a read
Also, there are some lightning talks about security at HashiConf in October that you might be interested in.
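For anyone new to the idea, the two-credential rotation that the tutorial preview above refers to can be sketched roughly like this. It is a toy in-memory model, not Azure SDK code: `DualKeyResource`, `rotate` and the key names `key1`/`key2` are all made up for illustration. The point is only the ordering: regenerate the key that is not in use, switch consumers over to it, and leave the old key valid until the next rotation.

```python
import secrets
from dataclasses import dataclass, field


def _new_key() -> str:
    # Stand-in for the provider regenerating a credential (hypothetical).
    return secrets.token_hex(16)


@dataclass
class DualKeyResource:
    """Toy model of a resource that accepts either of two keys."""

    keys: dict = field(
        default_factory=lambda: {"key1": _new_key(), "key2": _new_key()}
    )

    def regenerate(self, name: str) -> str:
        # Replace one key; the other key is left untouched.
        self.keys[name] = _new_key()
        return self.keys[name]

    def accepts(self, value: str) -> bool:
        return value in self.keys.values()


def rotate(resource: DualKeyResource, active: str):
    """Regenerate the key that is NOT in use and return it as the new active key.

    Returns (new_active_name, new_active_value). The previously active key
    stays valid, so consumers still holding it keep working until the
    following rotation regenerates it.
    """
    standby = "key2" if active == "key1" else "key1"
    value = resource.regenerate(standby)
    return standby, value
```

In this model, a consumer that is still using the old key only breaks on the second rotation, which leaves a full rotation period to update the stored secret and restart whatever uses it.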
Hi all - I use AKS and have several deployments that use azureFile volumes to mount storage accounts into pods. One of the recommendations from Microsoft and CIS is to rotate storage account access keys periodically. One of these keys is stored in a k8s secret object to allow the pod to mount the share. However, when the key that is being used is regenerated, the mount breaks (this is expected; I see "Host is down" when I execute ls in the pod in the mounted directory). When I update the secret value, the mount doesn’t fix itself, and I am forced to restart the pod to have it remount using the new secret value. Is this the expected behavior? Is there any way to have it “hot remount” when the secret value changes?
This issue seems to have a list of possible solutions wrt updated secrets: https://github.com/hashicorp/terraform-provider-kubernetes/issues/737
Hi there, I have a code that creates configmap & secret. I noticed that when I change a variable the pods are not restarted. This is a known issue: https://stackoverflow.com/questions/37317003/r…
Depending on your setup, I’d use the stakater approach rather than the watch approach. As for remounting the share: I wouldn’t remount it, but restart the pod instead, because of possible open file handles etc.
@geertn - thanks. If I’m reading right, those solutions all end up with the config/secret change being detected and the pods being restarted by something, correct? I’m using helm and use the “Automatically roll deployments” trick (https://helm.sh/docs/howto/charts_tips_and_tricks/) quite a bit, so my plan was to do that here as well if a pod restart is indeed required. My question was mostly whether the mount should automatically heal itself without a pod restart (i.e. k8s detects an unhealthy mount and attempts to rebuild it); it seems that isn’t the case and the pod must be restarted to get it to mount the share with the new secret.
Covers some of the tips and tricks Helm chart developers have learned while building production-quality charts.
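For readers who haven’t used the “Automatically roll deployments” trick: the idea is to put a hash of the config/secret content into a pod-template annotation, so that any change to the content changes the pod template and triggers a rollout. Helm does this in the chart template itself; the Python below is only a rough sketch of the mechanism, and the annotation key `checksum/secret`, the trimmed pod template and the example secret values are illustrative assumptions.

```python
import hashlib
import json


def checksum_annotation(secret_data: dict) -> dict:
    """Return an annotation whose value changes whenever secret_data changes."""
    # Serialize with sorted keys so the same content always hashes the same.
    canonical = json.dumps(secret_data, sort_keys=True).encode("utf-8")
    return {"checksum/secret": hashlib.sha256(canonical).hexdigest()}


def pod_template(secret_data: dict) -> dict:
    # Hypothetical, heavily trimmed pod template; only the annotation matters.
    return {
        "metadata": {"annotations": checksum_annotation(secret_data)},
        "spec": {"containers": [{"name": "app", "image": "example/app:1.0"}]},
    }
```

Two templates built from the same secret content compare equal, while a rotated key yields a different annotation and therefore a different template, which is what makes the deployment roll.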
Yes, the pods get restarted. AFAICS there is no supported way to do what you want, not even with the new CSI storage driver. You might be able to use the weave watch method (but that’ll probably cause problems with open files / operations), or use a liveness check to see if the volume is broken and then automatically restart the pod.
Or work around the access key rotation requirement by applying network rules, dynamically creating your file share, or moving to NFS?
This is the home of Azure Terraform deployment modules. You will find landing zones now at: https://github.com/Azure/caf-terraform-landingzones - Cloud Adoption Framework for Azure - Terraform land…
Is there anything like Cloud Posse for Azure?