/r/kubernetes

Kubernetes discussion, news, support, and link sharing.

151,890 Subscribers

1

Weekly: Share your EXPLOSIONS thread

Did anything explode this week (or recently)? Share the details for our mutual betterment.

0 Comments
2024/12/18
11:00 UTC

1

I found that the framework/formulas in this writeup on measuring the cost of production issues could be useful for the team I lead. I would agree that targeted improvements can reclaim significant team capacity.

0 Comments
2024/12/18
09:24 UTC

1

Kubernetes on bare metal with multiple kube servers

I am planning to implement a Kubernetes cluster within our vSphere environment, utilizing 6 virtual machines: 3 designated as control plane nodes and 3 as worker nodes. I am seeking guidance regarding the optimal networking configuration for this architecture. Additionally, I would appreciate recommendations regarding load balancer selection based on your implementation experience.
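For the control plane side of a setup like this, a common pattern is a virtual IP shared across the three control plane nodes, with something like MetalLB handling Service-type LoadBalancers afterwards. Below is a minimal sketch using kube-vip in ARP mode to generate a static pod manifest on each control plane node; the VIP 192.168.10.100, the interface ens192, and the kube-vip version are placeholders I picked for illustration, not values from the post:

# hypothetical values: adjust the VIP, interface, and version to your environment
export VIP=192.168.10.100
export INTERFACE=ens192
export KVVERSION=v0.8.7
ctr image pull ghcr.io/kube-vip/kube-vip:$KVVERSION
ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip \
  /kube-vip manifest pod --interface $INTERFACE --address $VIP \
  --controlplane --arp --leaderElection \
  | tee /etc/kubernetes/manifests/kube-vip.yaml
# kubeadm init/join can then target the VIP as the control-plane endpoint,
# and MetalLB can hand out addresses for Services of type LoadBalancer.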

1 Comment
2024/12/18
09:20 UTC

1

How to restore Rancher projects after a Velero restore of a cluster

Hi,

Has anyone worked with Velero and Rancher clusters? I am trying to restore an old cluster to a new cluster. Right now, I have a script that runs as a Velero pre-hook and captures the Rancher projects and their IDs in one .json file and the namespaces and their projectIds in another. During the Velero restore, the namespaces are restored but the projects aren't. I am writing a script that restores the projects to the new cluster and moves the Velero-restored namespaces into their respective projects. Does anyone have experience with this? I have attached my script below. It keeps failing once it reaches the namespace part, specifically on the jq portion:

#!/bin/bash

# Variables
S3_BUCKET="s3://{{ .Values.s3_bucket }}"
OUTPUT_DIR="/tmp/output"
CACERT_PATH="/path/to/your/ca-cert.pem"
ACCESS_KEY="{{ .Values.secrets.accessKey }}"
SECRET_KEY="{{ .Values.secrets.secretKey }}"
NEW_CLUSTER_ID="{{ .Values.cluster_id }}"
RANCHER_SERVER="{{ .Values.rancher_server }}"
KUBECONFIG="/path/to/your/kube/config"

export KUBECONFIG="$KUBECONFIG"

# Create output directory
mkdir -p "$OUTPUT_DIR"

# Fetch the most recent JSON files from S3
echo "Fetching JSON files from S3..."
PROJECTS_FILE="$OUTPUT_DIR/projects.json"
NAMESPACES_FILE="$OUTPUT_DIR/namespaces.json"

# `aws s3 ls --recursive` prints bare keys, so prefix the bucket again for `cp`
aws s3 cp "$S3_BUCKET/$(aws s3 ls "$S3_BUCKET/" --recursive | grep "projects" | sort -k1,2r | awk '{print $NF}' | head -n 1)" "$PROJECTS_FILE"
aws s3 cp "$S3_BUCKET/$(aws s3 ls "$S3_BUCKET/" --recursive | grep "namespaces" | sort -k1,2r | awk '{print $NF}' | head -n 1)" "$NAMESPACES_FILE"

# Validate JSON files
if ! jq empty "$PROJECTS_FILE" >/dev/null 2>&1 || ! jq empty "$NAMESPACES_FILE" >/dev/null 2>&1; then
  echo "Error: Invalid JSON files. Exiting."
  exit 1
fi

# Fetch existing project names in the new cluster
echo "Fetching existing projects in the cluster..."
EXISTING_PROJECT_NAMES=$(curl -s -u "$ACCESS_KEY:$SECRET_KEY" --cacert "$CACERT_PATH" \
  "$RANCHER_SERVER/v3/projects?clusterId=$NEW_CLUSTER_ID" | jq -r '.data[].name')

declare -A PROJECT_ID_MAP

# Restore projects and build PROJECT_ID_MAP (old project ID -> new project ID)
echo "Restoring projects..."
# Note: '.[]' expects a top-level JSON array; if the capture saved the full
# Rancher API response object, use '.data[]' here instead.
jq -c '.[]' "$PROJECTS_FILE" > "$OUTPUT_DIR/projects_clean.json"

while read -r project; do
  PROJECT_NAME=$(echo "$project" | jq -r '.name // empty')
  OLD_PROJECT_ID=$(echo "$project" | jq -r '.id // empty')

  # Skip built-in projects and malformed entries
  if [[ "$PROJECT_NAME" == "Default" || "$PROJECT_NAME" == "System" || -z "$PROJECT_NAME" || -z "$OLD_PROJECT_ID" ]]; then
    continue
  fi

  if echo "$EXISTING_PROJECT_NAMES" | grep -qw "$PROJECT_NAME"; then
    echo "Skipping existing project: $PROJECT_NAME"
    continue
  fi

  # Create the project in the new cluster (inner quotes must be escaped,
  # otherwise the payload is not valid JSON)
  RESPONSE=$(curl -s -u "$ACCESS_KEY:$SECRET_KEY" --cacert "$CACERT_PATH" \
    -X POST -H "Content-Type: application/json" \
    -d "{\"name\":\"$PROJECT_NAME\",\"displayName\":\"$PROJECT_NAME\",\"clusterId\":\"$NEW_CLUSTER_ID\"}" \
    "$RANCHER_SERVER/v3/projects")

  NEW_PROJECT_ID=$(echo "$RESPONSE" | jq -r '.id // empty')
  if [[ -z "$NEW_PROJECT_ID" ]]; then
    echo "Failed to create project: $PROJECT_NAME"
    continue
  fi

  PROJECT_ID_MAP["$OLD_PROJECT_ID"]="$NEW_PROJECT_ID"
  echo "Mapped old project ID $OLD_PROJECT_ID to new project ID $NEW_PROJECT_ID"
done < "$OUTPUT_DIR/projects_clean.json"

# Merge projects.json and namespaces.json
echo "Merging namespaces with new project IDs..."
MERGED_FILE="$OUTPUT_DIR/merged_namespaces.json"

# Same caveat as above: use '.data[]' if the file is a full API response
jq -c '.[]' "$NAMESPACES_FILE" | while read -r namespace; do
  NAMESPACE_NAME=$(echo "$namespace" | jq -r '.metadata.name // empty')
  OLD_PROJECT_ID=$(echo "$namespace" | jq -r '.metadata.annotations["field.cattle.io/projectId"] // empty')

  if [[ -n "${PROJECT_ID_MAP[$OLD_PROJECT_ID]}" ]]; then
    echo "{\"namespace\":\"$NAMESPACE_NAME\",\"NEW_PROJECT_ID\":\"${PROJECT_ID_MAP[$OLD_PROJECT_ID]}\"}" >> "$MERGED_FILE"
  fi
done

if [[ ! -s "$MERGED_FILE" ]]; then
  echo "Error: Merged file is empty. Exiting."
  exit 1
fi

# Patch namespaces to the new project IDs
echo "Patching namespaces..."
while read -r entry; do
  NAMESPACE_NAME=$(echo "$entry" | jq -r '.namespace // empty')
  NEW_PROJECT_ID=$(echo "$entry" | jq -r '.NEW_PROJECT_ID // empty')

  if [[ -n "$NAMESPACE_NAME" && -n "$NEW_PROJECT_ID" ]]; then
    echo "Patching namespace $NAMESPACE_NAME to project ID $NEW_PROJECT_ID"
    kubectl patch namespace "$NAMESPACE_NAME" \
      -p "{\"metadata\":{\"annotations\":{\"field.cattle.io/projectId\":\"$NEW_CLUSTER_ID:$NEW_PROJECT_ID\"}}}" \
      || echo "Failed to patch namespace: $NAMESPACE_NAME"
  fi
done < "$MERGED_FILE"

echo "Script completed successfully."

0 Comments
2024/12/18
03:06 UTC

2

Multi-WAN support

Hello, I am currently helping my brother set up Kubernetes in our home for redundancy. The plan is to have a control plane and 3 servers here, as well as in a few other locations.

To ensure network reliability we have two internet connections: a 750/75 asymmetric DOCSIS line and a 500/500 symmetric fiber line. I have bought a TP-Link ER605 v1 router and plan to use its multi-WAN capabilities to either load balance or use one connection as a failover.

However, given that we are talking about different ISPs and different public IP addresses, how would one arrange things so that whenever we switch to the other WAN the cluster doesn't collapse because it is suddenly using a different IP address than expected?
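One common way to sidestep the WAN-IP problem entirely is to have the nodes talk over a VPN/overlay with stable private addresses, so the cluster never sees the public IPs change when the router fails over. A minimal sketch, assuming k3s and an existing WireGuard interface wg0 with a placeholder 10.10.0.0/24 subnet (none of this is from the post):

# server (control plane) node, reachable by the others as 10.10.0.1 over wg0
curl -sfL https://get.k3s.io | sh -s - server \
  --node-ip 10.10.0.1 --advertise-address 10.10.0.1 --flannel-iface wg0

# each worker joins via the stable overlay address, not a public WAN IP
curl -sfL https://get.k3s.io | K3S_URL=https://10.10.0.1:6443 K3S_TOKEN=<token> \
  sh -s - agent --node-ip 10.10.0.2 --flannel-iface wg0

With this layout the multi-WAN failover only changes which uplink carries the tunnel; the node addresses the cluster depends on stay the same.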

Thank you for any help.

2 Comments
2024/12/18
02:33 UTC

5

Kubernetes configuration file tutorials

Hi, I've been reading the docs and it feels like they mostly give examples of configuration files. However, I'm really looking for a bottom-up understanding, starting from simple configurations and building to more advanced ones. Can someone recommend a reading guide, or a particular source, that would help me learn from the ground up? Am I right in feeling that the docs are focused on particular examples rather than explaining how the configuration works from the ground up?
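For the ground-up view, it can help to see that every manifest is built from the same four top-level fields (apiVersion, kind, metadata, spec), and that kubectl explain walks the schema field by field. A minimal sketch (the Pod name and image are arbitrary choices for illustration):

kubectl explain pod                    # shows apiVersion, kind, metadata, spec
kubectl explain pod.spec.containers    # drills into any field of the schema

# the smallest useful manifest, validated client-side without touching the cluster
cat <<'EOF' | kubectl apply --dry-run=client -f -
apiVersion: v1
kind: Pod
metadata:
  name: hello
spec:
  containers:
  - name: web
    image: nginx:1.27
    ports:
    - containerPort: 80
EOF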

2 Comments
2024/12/17
23:56 UTC

80

[Blog post] Infra engineer at Render breaks down OpenAI's outage

Post is here: https://render.com/blog/a-hidden-dns-dependency-in-kubernetes

Explores how an avoidable DNS dependency likely contributed to the severity of the recent OpenAI outage. Demonstrates how a few small configuration changes have helped to properly isolate control plane and data plane responsibilities in Render's clusters, meaningfully mitigating the impact of similar high-load events.

[Affiliation note: I work for Render.]

19 Comments
2024/12/17
23:15 UTC

9

VPC-Native Clusters now available in GA on DigitalOcean Kubernetes (DOKS)

This feature brings seamless integration between DOKS clusters and VPC resources. With this update, any new cluster created is by default a VPC-native DOKS cluster, a significant advancement that helps ensure secure and isolated networking for your workloads. Learn more in the blog post announcement.

1 Comment
2024/12/17
22:31 UTC

2

Like minikube for local development but hate its CLI?

I was browsing the kubernetes-sigs github and noticed a project I hadn't seen before.
https://github.com/kubernetes-sigs/minikube-gui

Has anyone tried this? I'm interested in the community's thoughts on it.

15 Comments
2024/12/17
21:56 UTC

3

Kubernetes Stretched Cluster

Hey guys,

I have been running a production-grade k8s cluster via Rancher (RKE2) and it has been working great. However, I recently ran into a scenario where I have to deploy an independent application stack in a remote datacenter (no redundancy needed).

The thought I have been playing with is to just deploy a worker node there and join it to the main cluster. The connection is low latency via an IPsec tunnel.

The goal here is only to simplify the management and orchestration of the application deployment.

But during my research, I haven't seen any documentation on this particular case. I was hoping to get some insight here. Any advice on this approach is appreciated!
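If you do join the remote node to the main cluster, the usual pattern is to label and taint it so only that application stack schedules there and nothing else drifts across the tunnel. A minimal sketch; the node name, label, and taint key are placeholders:

kubectl label node remote-node-1 topology.kubernetes.io/zone=remote-dc
kubectl taint node remote-node-1 dedicated=remote-dc:NoSchedule
# the app's pod spec then needs a matching nodeSelector and toleration, e.g.
#   nodeSelector: { topology.kubernetes.io/zone: remote-dc }
#   tolerations:  [{ key: dedicated, value: remote-dc, effect: NoSchedule }]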

6 Comments
2024/12/17
20:37 UTC

23

Could someone explain, or point me to documentation on, the purpose of using the Gateway API (K8s v1.31) and Istio in conjunction?

I have been using Istio with the Istio Ingress Gateway and VirtualServices in an AWS EKS setting and it has worked wonders. We have also been looking at strengthening our security with mTLS, so I'm looking forward to using that. Always glad to see Istio's improvements.

Now I have a couple of questions about why people ALWAYS seem to combine different flavors in their network setup.

  1. With the recent Gateway API release alongside k8s v1.31, am I understanding correctly that it adds onto Istio? I'd like to understand what it means for improving an Istio setup, or whether it's something not to implement (see the sketch after this list).
  2. I have seen projects combining, say, Kong + Istio, Istio + NGINX (ingresses together), or Cilium + Istio. Wouldn't this be a pain to manage and confusing for other DevOps/SREs to understand? I find that just sticking with Istio or Cilium (which is also great) is sufficient for many companies' needs.
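For context on how the two fit together: Istio can act as an implementation of the Gateway API, so the same Istio data plane is programmed through Gateway/HTTPRoute resources instead of (or alongside) Istio's own Gateway/VirtualService. A minimal sketch applied as a heredoc; the resource names, namespace, and backend Service are placeholders:

cat <<'EOF' | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: web-gateway
  namespace: default
spec:
  gatewayClassName: istio          # the GatewayClass Istio registers
  listeners:
  - name: http
    port: 80
    protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
  namespace: default
spec:
  parentRefs:
  - name: web-gateway
  rules:
  - backendRefs:
    - name: web-svc                # placeholder backend Service
      port: 8080
EOF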

I would appreciate any help on this, and if you have any documentation that would help me better understand the networking landscape in K8s, please send it over. I'll read whatever.

12 Comments
2024/12/17
19:38 UTC

4

Kubernetes for ML Engineers/ MLOps Engineers?

For building scalable ML systems, I think Kubernetes is a really important tool that MLEs/MLOps engineers should master, as well as an industry standard. If I'm right about this, how can I get started with Kubernetes for ML?

Is there a learning path specific to ML? Can anyone shed some light and suggest a starting point? (Courses, articles, anything is appreciated!)

3 Comments
2024/12/17
17:54 UTC

2

Which Workload Identity Federation auth method is better to use in GKE?

I've just started working on a GCP/GKE project.

Existing GKE clusters are missing Workload Identity Federation, and workloads are using static credentials to access services :facepalm:

So I'm going to set up WIF, and reading the docs, there are two options for how to authorize apps to use services (after you configure the cluster and node pools).

  • Configure "direct" authorization and principals link
  • Link Kubernetes ServiceAccounts to IAM link

I don't know which one to choose. The second option, linking a KSA to IAM and using the KSA annotation `iam.gke.io/gcp-service-account`, seems more natural to me since I'm coming from AWS.
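For reference, the KSA-to-IAM linking flow is two steps: grant the Google service account the workloadIdentityUser role for the KSA, then annotate the KSA. The "direct" option instead grants IAM roles straight to the principal:// identity of the KSA, with no GSA or annotation involved. A minimal sketch of the linking flow; PROJECT_ID, my-gsa, my-ksa, and my-namespace are placeholders:

# allow the KSA to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  my-gsa@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[my-namespace/my-ksa]"

# point the KSA at the Google service account
kubectl annotate serviceaccount my-ksa -n my-namespace \
  iam.gke.io/gcp-service-account=my-gsa@PROJECT_ID.iam.gserviceaccount.com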

Maybe the community can share opinions on which is better to use, or which one they use? (I probably should have created a poll.)

P.S. Personally, I have vast experience in Kubernetes and AWS, but for some reason, I have never had real "prod" exposure to a complex business running solely on GCP.

16 Comments
2024/12/17
13:11 UTC

0

k3s + MetalLB + Traefik isn't allowing me to define ports for ArgoCD

Hi all.

I have a single node k3s deployed, with metallb as the load balancer.

For a test I have deployed nginx listening on port 80, and exposed it with a load balancer service on port 80. Works great.

Next test: nginx listening on port 8123, exposed on port 80 with a target port of 8123. Works great.

Next up, trying to get the ArgoCD server exposed on port 80. I do a similar thing:

kubectl -n argocd expose deploy argocd-server --name=argocd-server-lb-80 --type=LoadBalancer --load-balancer-ip=192.168.2.1 --port=80 --target-port=8080

And it doesn't work. I can get it to work with --port 8080, no problem. I have also tried exposing the service, which listens on port 80, so:

kubectl -n argocd expose service argocd-server --name=argocd-server-service-lb-80 --type=LoadBalancer --load-balancer-ip=192.168.2.2 --port=80 --target-port=80

Also no joy. If I go directly to the ClusterIP of argocd-server I get a response on port 80, so in my head creating a load balancer pointing at port 80 of the service should work.

I must be missing something fundamental about what I'm doing.
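If nothing obvious turns up, it can help to write the Service out by hand so the selector and port mapping are explicit, and to rule out k3s's bundled ServiceLB/Traefik already holding port 80 on the node. A minimal sketch, assuming MetalLB ≥ 0.13, a free address 192.168.2.3, and the stock ArgoCD install labels (all names and IPs are placeholders):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: argocd-server-lb
  namespace: argocd
  annotations:
    metallb.universe.tf/loadBalancerIPs: 192.168.2.3
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: argocd-server
  ports:
  - name: http
    port: 80
    targetPort: 8080
EOF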

Many thanks

2 Comments
2024/12/17
12:09 UTC

2

Migrate PVC from local-path to NFS

Hi everyone,
I just setup my NFS server and deployed NFS subdir on my K3S server as an NFS StorageClass (as suggested in another post from an user here on reddit). To be clear this one:

https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/blob/master/charts/nfs-subdir-external-provisioner/README.md

I currently have Nextcloud deployed with this Helm chart:
https://github.com/nextcloud/helm/tree/main/charts/nextcloud

It's a home installation where everything is on a PVC (even the database, which is SQLite) due to the low number of users (never more than one online at the same time, three in total). No Redis. No nothing.

Currently the PVC uses the K3s local-path StorageClass and lives on one node of the cluster. I would like to migrate it to the new nfs-client StorageClass, ideally without losing anything.

What is the safest way to do this?

If I SSH directly into the host OS (totally bypassing K3s), rsync the folder, and then change the PVC, will it work?
Is there a better solution?
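A minimal sketch of one common approach, assuming the local-path volume lives under the K3s default /var/lib/rancher/k3s/storage/<pvc-id> and the NFS export is mounted at /mnt/nfs (both paths, the namespace, and the deployment name are placeholders):

kubectl -n nextcloud scale deploy nextcloud --replicas=0     # stop writers first
rsync -aHAX --numeric-ids \
  /var/lib/rancher/k3s/storage/<pvc-id>/ /mnt/nfs/nextcloud-data/
# then create a new PVC on the nfs-client StorageClass, point the chart's
# persistence.existingClaim at it, and scale the deployment back up

The key point is stopping Nextcloud before the copy so the SQLite database and user files are consistent.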

Other information:

  • This migration is only from one kind of storage to another, but stays on the same K3s cluster.
  • I have a backup of all the user data, but not of the configuration. So if something goes wrong I don't lose anything, but I would need to manually recreate the configuration (and the preview images). Not a big impact, but I'd rather not lose the time.

Thanks!

7 Comments
2024/12/17
11:23 UTC

1

Weekly: Questions and advice

Have any questions about Kubernetes, related tooling, or how to adopt or use Kubernetes? Ask away!

0 Comments
2024/12/17
11:00 UTC

3

Options for running old version of Kubernetes / AKS

We are currently running an old version of a COTS application on AKS 1.24

We have had to pin it to that particular version because when we tried upgrading to 1.25 and above, the applications crashed with a load of memory-related issues. The vendor has said that we must upgrade the app, while the business is preventing that due to the business changes required. The updated application is significantly different.

We are stuck given that 1.24 is deprecated and cannot be deployed on Azure using AKS. We really need to build out another environment to support other work - it's a highly integrated suite of applications.

What are my options given that the application upgrade is out of reach for now? Is there any easy way to deploy 1.24 within Azure?

6 Comments
2024/12/17
09:30 UTC

6

NFS storage

Hi, what is the best way to mount an NFS folder in Kubernetes?

I started by trying Longhorn for Nextcloud, but I found it very slow. Upgrading my homelab server network to 2.5 Gbit/s helped a bit but didn't solve the problem.

Now I've set up a small server with some USB 3 storage and shared it over NFS. I would like Nextcloud (and any other app) to be able to start on any of the 3 nodes of my K3s HA cluster, and the same goes for the media server and so on.

In the future, if NFS works well, I would like to reach a sort of high availability by replicating the NFS server with keepalived in an active/passive configuration (e.g., syncing the two NFS servers every hour).

Is there a good way to do this? I would also like to avoid any sort of active/active configuration, because I think it would be resource-demanding and end up slowing things down.
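For the mounting question, a common pattern is to run the NFS subdir provisioner as a StorageClass and let every node mount the share on demand, so apps can start on any node. A minimal sketch, with placeholder values for the server address and export path:

helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=192.168.1.50 \
  --set nfs.path=/srv/nfs/k8s \
  --set storageClass.name=nfs-client
# PVCs that request storageClassName: nfs-client then get their own
# subdirectory on the share, mountable from any node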

Thanks!

8 Comments
2024/12/16
21:33 UTC

32

A Simpler ArgoCD CI/CD Workflow

Hello,

I’ve joined a project where I’m the sole developer, and here’s the current CI/CD process:

  1. Merge to main
  2. A GitHub Action builds and pushes an image to GHCR.
  3. A new branch is created, updating the deployment image reference.
  4. A PR is opened.
  5. Once merged, the ArgoCD GitOps sync deploys the new image.

This feels overly complex. I often end up deleting branches and performing two manual steps before deployment. I’d prefer a simpler GitHub Actions workflow that deploys directly to production.

I would need to use the :latest tag to avoid ArgoCD repeatedly trying to sync the wrong commits. But then I lose track of what’s actually running in production and can’t easily roll back cleanly.

What would be the best way to streamline these steps, use tagged images (with a short SHA), maintain clarity about what’s in production, and enable quick, clean rollbacks?
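One common simplification is to drop the PR step: the same GitHub Action that builds the image also bumps the manifest to the commit SHA and pushes directly, so ArgoCD (with automated sync enabled) always tracks a pinned, traceable tag instead of :latest. A minimal sketch of the deploy step's shell, assuming the manifests live in the same repo under deploy/ and are managed with kustomize (all names and paths are placeholders):

TAG="ghcr.io/my-org/my-app:${GITHUB_SHA::7}"    # image tag pinned to the short SHA
cd deploy
kustomize edit set image my-app="$TAG"          # update the image reference in place
git config user.name "ci-bot"
git config user.email "ci-bot@users.noreply.github.com"
git commit -am "deploy: ${GITHUB_SHA::7}"
git push
# Rollback is then a git revert of the bump commit (or `argocd app rollback`),
# and the running tag always tells you exactly which commit is in production.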

Thank you!

10 Comments
2024/12/16
18:34 UTC

8

A video explaining multi-cluster resource management

Video covers how to deploy resources and helm charts from a management cluster to a set of managed clusters using Sveltos.

Full disclosure: I am biased because I have been working on Sveltos for almost 3 years now, but I think the video is worth watching, so I'm sharing it (don't kill me, it is ALL open source in the end :-) ).

https://www.youtube.com/watch?v=bsWEo79w09c

7 Comments
2024/12/16
16:29 UTC

1

Kubernetes Podcast episode 244: 2024 Recap, with Abdel Sghiouar, Kaslin Fields and Mofi Rahman

This episode is a recap of 2024. Co-hosts Abdel and Kaslin and guest host Mofi got together to reflect on how 2024 has been in the Cloud Native and Kubernetes space.

https://kubernetespodcast.com/episode/244-2024-recap/index.html

0 Comments
2024/12/16
15:07 UTC

0

Try the new Kubernetes v1.32 from Canonical!

Canonical Kubernetes Platform v1.32 stable is now released!

https://itnext.io/seamless-cluster-creation-management-announcing-canonical-kubernetes-platform-a6a03f345ca5

OK, I'm gonna be honest here. I've heard a lot of negative things about snaps, but why don't you give this one a try and share your thoughts? You can spin up a cluster anywhere using:

sudo snap install k8s --classic --channel=1.32-classic/stable
sudo k8s bootstrap
sudo k8s status
sudo k8s kubectl get pods -A

And you'll have a production-grade, performant, and scalable cluster with very low resource consumption. Oh! And you don't need to worry about newer versions of upstream Kubernetes! Just refresh the snap!

6 Comments
2024/12/16
12:24 UTC

0

What should I do?

I want to get into DevOps or cloud. What should I be learning? I know I should learn Docker and Kubernetes, but what else should I learn? I graduated this June and recently joined a job at one of the WITCH companies... The job is about monitoring something and creating incidents in ServiceNow. I don't like this job; what should I do? I feel so worried that I might not be able to switch later. Someone help me. Does irrelevant experience count as experience? What should I do?

6 Comments
2024/12/16
11:42 UTC

0

Help Needed: Exposing Redis Master-Slave in EKS for External Access

Hey everyone,

I deployed a Redis master-slave setup in EKS using the Bitnami Redis Helm chart. My application is hosted outside the EKS cluster, so I needed to make Redis accessible externally. Here's what I did:

I created an Ingress to expose the Redis service, which was deployed by the Helm chart.

The read operations work perfectly fine.

However, when I try to perform a write operation, I get this error:

Error: you can't write against a readonly replica.

My Questions:

  1. How can I expose the Redis service externally in a way that supports both read and write operations?

  2. Is there a better way to achieve this that avoids using an Ingress (e.g., a LoadBalancer Service or other methods)? (See the sketch after this list.)
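For question 2, the usual approach is to skip the Ingress for the Redis protocol and expose the chart's master Service (not the replicas) over a TCP LoadBalancer, so writes always land on the master. A minimal sketch; the release name my-redis, the namespace, and the Service name are placeholders that depend on how the Bitnami chart was installed:

# expose only the master Service externally over TCP
kubectl -n redis expose service my-redis-master \
  --name=my-redis-master-external --type=LoadBalancer \
  --port=6379 --target-port=6379

# or let the chart create the LoadBalancer itself
helm upgrade my-redis bitnami/redis -n redis --reuse-values \
  --set master.service.type=LoadBalancer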

I'm looking for guidance from anyone who has successfully deployed a similar setup or knows the best practices. Any help or suggestions would be greatly appreciated!

Calling out the tech wizards and DevOps gurus—please help me figure this out. Thanks in advance!


Additional Info:

Redis is deployed as a master-replica setup.

The error clearly indicates that my application is hitting the replica instead of the master for write operations.

I’m open to modifying the setup if needed for better performance and scalability.

Looking forward to your insights!

7 Comments
2024/12/16
11:01 UTC

5

Ask r/kubernetes: What are you working on this week?

What are you up to with Kubernetes this week? Evaluating a new tool? In the process of adopting? Working on an open source project or contribution? Tell /r/kubernetes what you're up to this week!

5 Comments
2024/12/16
11:00 UTC

4

Help: “Easiest and low maintenance” AWS EKS Option

What is the best, easiest, and lowest-maintenance way to spin up an EKS cluster that I can support and upgrade in the long run?

My company is getting rid of the EKS platform support team, so EKS is going to fall back on each team. I am still learning Kubernetes and have a “basic” understanding of it. The job market sucks, so I have to survive this huge task if it somehow gets assigned to me. I figure I have to be one step ahead before my head is underwater.

Eksctl, kubeadm, or just straight vanilla EKS?
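For what it's worth, eksctl (which drives vanilla EKS with managed node groups under the hood) is usually the lowest-maintenance of those options; kubeadm is for self-managed clusters, not EKS. A minimal sketch with placeholder cluster name, region, version, and node sizes:

# create a cluster with a managed node group
eksctl create cluster --name my-cluster --region us-east-1 \
  --version 1.31 --managed --nodes 3 --node-type m5.large

# later control-plane upgrades are a single command
eksctl upgrade cluster --name my-cluster --region us-east-1 --approve
# (node groups are upgraded separately with `eksctl upgrade nodegroup`)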

TIA

8 Comments
2024/12/16
05:53 UTC

0

How to set up a 3 master node cluster for high availability? HELP

Hi everyone,

I currently have a Kubernetes cluster with 3 nodes (1 master and 2 workers), but I need to create a 5-node cluster (3 masters and 2 workers) for high availability in an on-premises setup. I'm particularly concerned about configuring etcd and quorum correctly. If there's anything else I should be doing, I'd appreciate any guidance or material you have followed.

What additional steps should I follow to achieve this setup? Any advice or guidance on best practices for setting up the HA control plane and ensuring etcd quorum would be greatly appreciated.
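For reference, the stock way to do this with kubeadm is stacked etcd behind a load-balanced control-plane endpoint; with three control-plane nodes, etcd keeps quorum through the loss of any one of them. A minimal sketch; the endpoint DNS name is a placeholder and the token/hash/key values are printed by kubeadm itself:

# first control-plane node
sudo kubeadm init \
  --control-plane-endpoint "k8s-api.example.local:6443" \
  --upload-certs

# the other two control-plane nodes, using the join command kubeadm prints
sudo kubeadm join k8s-api.example.local:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>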

Thanks in advance!

26 Comments
2024/12/16
04:55 UTC
