SweetOps #azure for October, 2020

Archive: https://archive.sweetops.com/azure/

2020-10-02

ballew

09:29:57 PM

Anything like cloud posse for azure?

2020-10-03

github140

07:44:49 AM

https://github.com/aztfmod

Cloud Adoption Framework for Azure - Terraform landing zones modules

This is the home of Azure Terraform deployment modules. You will find landing zones now at: https://github.com/Azure/caf-terraform-landingzones - Cloud Adoption Framework for Azure - Terraform land…

2020-10-06

Craig Dunford

02:58:11 PM

Hi all - I use AKS and have several deployments that use azureFile volumes to mount storage accounts into pods. One of the recommendations from Microsoft and CIS is to rotate storage account access keys periodically. One of these keys is stored in a k8s secret object to allow the pod to mount the share; however, when the key that is being used is regenerated, the mount breaks (this would be expected; I see Host is down when I execute ls in the pod in the mounted directory). When I update the secret value, the mount doesn’t fix itself, and I am forced to restart the pod to have it remount using the new secret value. Is this the expected behavior? Is there any way to have it “hot remount” when the secret value changes?

geertn

03:51:59 PM

This issue seems to have a list of possible solutions wrt updated secrets: https://github.com/hashicorp/terraform-provider-kubernetes/issues/737

Restart pods when configmap/secret is changed · Issue #737 · hashicorp/terraform-provider-kubernetes

Hi there, I have a code that creates configmap & secret. I noticed that when I change a variable the pods are not restarted. This is a known issue: https://stackoverflow.com/questions/37317003/r…

geertn

03:54:36 PM

Depending on your setup, I’d rather use the stakater approach than the watch approach. As for remounting the share: I would not remount it but restart the pod, because of possible open file handles etc.

Craig Dunford

04:10:29 PM

@geertn - thanks. If I’m reading right, those solutions all end up with the config/secret change being detected and the pods being restarted by something, correct? I’m using helm and use the “Automatically roll deployments” trick (https://helm.sh/docs/howto/charts_tips_and_tricks/) quite a bit, so my plan was to do that here as well if a pod restart is indeed required. My question was mostly whether the mount should automatically heal itself without a pod restart (i.e. k8s detects an unhealthy mount and attempts to rebuild it); it seems that isn’t the case and the pod must be restarted to get it to mount the share with the new secret.

Chart Development Tips and Tricks

Covers some of the tips and tricks Helm chart developers have learned while building production-quality charts.
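
For reference, the idea behind the “Automatically roll deployments” trick is to put a hash of the config/secret content into a pod template annotation, so that any change to the content changes the pod spec and triggers a rollout. Helm does this inside the chart template; the Python below is only a rough sketch of the same idea outside Helm. The annotation name checksum/secret, the function names, and the sample key values are illustrative assumptions, not something taken from the thread.

```python
import hashlib
import json


def secret_checksum(secret_data: dict) -> str:
    """Return a stable SHA-256 hex digest of the secret's key/value pairs."""
    # Sorting keys makes the digest independent of dict ordering.
    canonical = json.dumps(secret_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rollout_patch(secret_data: dict, annotation: str = "checksum/secret") -> dict:
    """Build a patch body that sets the checksum on the pod template.

    A changed pod template annotation changes the Deployment's pod spec,
    which causes the pods to be replaced.
    """
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {annotation: secret_checksum(secret_data)}
                }
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; a real secret would hold the actual account key.
    old = rollout_patch({"azurestorageaccountname": "example",
                         "azurestorageaccountkey": "key-v1"})
    new = rollout_patch({"azurestorageaccountname": "example",
                         "azurestorageaccountkey": "key-v2"})
    print(json.dumps(new, indent=2))
    print("rollout needed:", old != new)
```

The resulting dict could be applied as a patch to the Deployment with whatever tooling is already in use; the sketch deliberately stops short of calling any cluster API.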

geertn

07:29:40 AM

Yes, pods restarted. AFAICS there is no supported way to do what you want, also not with the new CSI storage driver. You might be able to use the weave watch method (but that’ll probably cause problems with open files / operations) or use a liveness check to see if the volume is broken and then automatically restart the pod.

Or work around the access key rotation requirement by applying network rules, dynamically creating your file share, or moving to NFS?
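
The liveness check idea mentioned above could look roughly like the following: a small script, run as an exec probe inside the container, that tries to list the mounted directory and exits non-zero if that fails. This is a minimal sketch under assumptions: the path /mnt/share and the function names are placeholders, and a mount that hangs instead of returning an error would have to be caught by the probe's own timeout rather than by this script.

```python
import os
import sys


def mount_is_healthy(path: str) -> bool:
    """Return True if the directory at `path` can be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        # Covers errors such as "Host is down" from a broken share,
        # as well as a missing directory.
        return False


def main(argv: list) -> int:
    # /mnt/share is a placeholder; pass the real mount path as the first argument.
    path = argv[1] if len(argv) > 1 else "/mnt/share"
    # Exit code 0 means healthy; any other code fails the probe.
    return 0 if mount_is_healthy(path) else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Once the probe fails often enough, the container is restarted; whether that alone is enough to remount the share with the updated secret would need to be verified, since the thread only confirms that restarting the pod works.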

2020-10-07

2020-10-08

Padarn

02:13:09 PM

anyone got some good reading on how to best manage keys/secrets/vault with a terraform setup in Azure? looking for some best practices, and strategies to manage secrets that need to be rotated (yes, I see it’s discussed above; this is a bit more of a beginner question)

Pierre-Yves

01:01:50 PM

like this https://docs.microsoft.com/en-us/azure/key-vault/secrets/tutorial-rotation-dual ?

Rotation tutorial for resources with two sets of credentials

Use this tutorial to learn how to automate the rotation of a secret for resources that use two sets of authentication credentials.
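
As a rough illustration of the two-credential idea: with two access keys, a rotation can regenerate the key that is not currently in use and then switch the stored secret over to it, so the previously active key stays valid until the next rotation. The sketch below simulates that entirely in memory. FakeStorageAccount, FakeSecretStore, and rotate are hypothetical names for illustration and do not correspond to any Azure SDK API.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class FakeStorageAccount:
    """In-memory stand-in for an account with two access keys."""
    keys: dict = field(default_factory=lambda: {
        "key1": secrets.token_hex(8),
        "key2": secrets.token_hex(8),
    })

    def regenerate(self, name: str) -> str:
        # Replace the named key with a fresh random value and return it.
        self.keys[name] = secrets.token_hex(8)
        return self.keys[name]


@dataclass
class FakeSecretStore:
    """In-memory stand-in for the secret that the workload reads."""
    active_key_name: str
    value: str


def rotate(account: FakeStorageAccount, store: FakeSecretStore) -> None:
    """Regenerate the key that is NOT in use, then point the secret at it."""
    alternate = "key2" if store.active_key_name == "key1" else "key1"
    store.value = account.regenerate(alternate)
    store.active_key_name = alternate


if __name__ == "__main__":
    account = FakeStorageAccount()
    store = FakeSecretStore("key1", account.keys["key1"])
    previous = store.value
    rotate(account, store)
    print("active key:", store.active_key_name)
    print("old key still valid:", account.keys["key1"] == previous)
```

Consumers that still hold the old key keep working for one full rotation interval, which leaves time to roll pods onto the new value.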

Padarn

01:40:43 PM

thanks a lot, hadn’t seen this, will take a read

Pierre-Yves

04:00:23 PM

also there are some lightning talks about security you might be interested in at HashiConf in October

2020-10-13

2020-10-20

Padarn

09:08:11 AM

Hi all, we’re trying to use omsagent but the pods are giving the following error

td-agent-bit 1.4.2
2020-10-20T06:14:42Z I! Starting Telegraf 
2020-10-20T06:14:42Z E! [agent] Failed to connect to output socket_writer, retrying in 15s, error was 'dial tcp 0.0.0.0:25226: getsockopt: connection refused'

any suggestions? the pod is marked as running but nothing is being collected to our workspace

Padarn

10:19:40 AM

fixed after full restart

geertn

10:28:26 AM

Do you use AKS?

Padarn

10:54:12 AM

Yes - we just restarted the running daemonset and the errors went away