Categories: FAANG

Managing your cloud ecosystems: Maintaining workload continuity during worker node upgrades

Planning and managing your cloud ecosystem and environments is critical for reducing production downtime and maintaining a functioning workload. In the “Managing your cloud ecosystems” blog series, we cover different strategies for ensuring that your setup functions smoothly with minimal downtime.

To start things off, the first topic in this blog series is ensuring workload continuity during worker node upgrades.

What are worker node upgrades?

Worker node upgrades apply important security updates and patches and should be completed regularly. For more information on types of worker node upgrades, see Updating VPC worker nodes and Updating Classic worker nodes in the IBM Cloud Kubernetes Service documentation.

During an upgrade, some of your worker nodes may become unavailable. It’s important to make sure your cluster has enough capacity to continue running your workload throughout the upgrade process. Building a pipeline to update your worker nodes without causing application downtime will allow you to easily apply worker node upgrades regularly.

For classic worker nodes

Create a Kubernetes configmap that defines the maximum number of worker nodes that can be unavailable at a time, including during an upgrade. The maximum value is specified as a percentage. You can also use labels to apply different rules to different worker nodes. For complete instructions, see Updating Classic worker nodes in the CLI with a configmap in the Kubernetes service documentation. If you choose not to create a configmap, the default maximum amount of worker nodes that become unavailable is 20%.

If you need your total number of worker nodes to remain up and running, use the ibmcloud ks worker-pool resize command to temporarily add extra worker nodes to your cluster for the duration of the upgrade process. When the upgrade is complete, use the same command to remove the additional worker nodes and return your worker pool to its previous size.

For VPC worker nodes

VPC worker nodes are replaced by removing the old worker node and provisioning a new worker node that runs at the new version. You can upgrade one or more worker nodes at the same time, but if you upgrade multiple at once, they become unavailable at the same time. To make sure you have enough capacity to run your workload during the upgrade, you can plan to either resize your worker pools to temporarily add extra worker nodes (similar to the process described for classic worker nodes) or plan to upgrade your worker nodes one by one.

Wrap up

Whether you choose to implement a configmap, resize your worker pool or upgrade components one-by-one, creating a workload continuity plan before you upgrade your worker nodes can help you create a more streamlined, efficient setup with limited downtime.

Now that you have a plan to prevent disruptions during worker node upgrades, keep an eye out for the next blog in our series, which will discuss how, when and why to implement major, minor or patch upgrades to your clusters and worker nodes.

Learn more about IBM Cloud Kubernetes Service clusters

The post Managing your cloud ecosystems: Maintaining workload continuity during worker node upgrades appeared first on IBM Blog.

AI Generated Robotic Content

Recent Posts

Brad Pitt casts Elliot for Achilles – an Ai acting performance experiment

I am putting most of my efforts to achieve more realistic Ai acting with natural…

4 hours ago

New light-based switch could cut chip energy use and speed future AI photonics

Photonic devices are hardware systems that can process information using light instead of electricity. These…

5 hours ago

Microsoft Lens First Tests: It’s Pretty Decent! – ComfyUI Native Support About to Be Merged

Model weights: https://huggingface.co/Comfy-Org/Lens PR: https://github.com/Comfy-Org/ComfyUI/pull/14077 You'll need to git the merge pull request if you're…

1 day ago

Tencent released Z-Image 6B with pixel space gen. No VAE & 1k Resolution.

Link: https://nju-pcalab.github.io/projects/L2P/ submitted by /u/switch2stock [link] [comments]

2 days ago

Building Context-Aware Search in Python with LLM Embeddings + Metadata

Keyword search breaks the moment a user types something a document doesn't literally say.

2 days ago

The Blueprint: How Movix fills a gap in dental skills with specialized agentic AI

Welcome to The Blueprint, a regular feature where we highlight how Google Cloud customers are…

2 days ago