The cloud provider KaaS support myth

Using a cloud provider's KaaS solution means they'll look after everything in your cluster for you, right? Think again.

Written by

Neil Cresswell

Portainer CEO

May 20, 2025

July 8, 2025

Think your cloud provider will help when your Kubernetes cluster breaks? Are you sure?

Cloud-managed Kubernetes services like Amazon EKS, Azure AKS, and Google GKE are marketed as a way to escape the pain of managing Kubernetes infrastructure. They offer a neatly packaged Kubernetes-as-a-Service (KaaS) experience, where the cloud provider provisions your control plane, wires up etcd, secures access to the Kubernetes API, installs DNS and networking plugins, and handles upgrades with the click of a button. They also provide tooling that ties cluster operations into the rest of their platform (e.g., authentication and authorization).

For many teams, this seems like a perfect trade-off. Kubernetes is hard to manage well, and handing over responsibility for the control plane to a cloud provider feels like a smart way to reduce operational risk, simplify platform management, and accelerate adoption.

But here’s the question most teams don’t ask until something goes wrong: what actually happens when your cluster breaks? Will your cloud provider step in to help? Will they troubleshoot a failed upgrade? Will they fix a CSI driver that has stopped provisioning volumes? Will they investigate strange DNS behavior or inconsistent etcd state?

The answer is almost always no, unless you are paying for the right level of support, and in some cases that means paying quite a lot.

This post explains exactly what is and isn’t included in a managed Kubernetes service. It breaks down what the provider is actually responsible for, what you still need to maintain, and why relying on the default level of support is often a risky and expensive mistake.

What is actually included in KaaS?

Each cloud provider offers a slightly different version of managed Kubernetes, but at a high level, here’s what is typically integrated and managed as part of the service:

A fully managed Kubernetes control plane (API server, scheduler, controller manager)
A secured etcd datastore, operated by the provider
A load-balanced API endpoint with integrated authentication
CoreDNS for service discovery, deployed and maintained by the platform
A curated Kubernetes distribution, patched and validated by the provider
A tested and supported CSI driver for block or file storage integration
A default CNI plugin designed to integrate cleanly with the provider’s networking model
Automated upgrade tooling for the control plane and node pools

This stack provides strong infrastructure scaffolding and is often a vast improvement over managing the control plane yourself. But it does not guarantee resilience, and it is not a support contract. While the provider installs and maintains the components they ship, they do not offer active remediation when something breaks unless you have purchased the right tier of support.

What’s managed vs what’s your responsibility?

Component	AWS EKS	Azure AKS	Google GKE	Customer Responsibility?
Kubernetes API Server	Fully managed	Fully managed	Fully managed	No
etcd	Managed (no user access)	Managed (30-min backups)	Managed	No
Control Plane Upgrades	User-initiated, managed execution	User-initiated, managed execution	Auto or user-initiated	Yes (initiate and validate)
CoreDNS	Managed via EKS add-on	Managed via system pod	Fully managed	No
Kubernetes Distribution	AWS-validated version	Microsoft-validated version	Google-patched version	No
CNI Plugin (default)	AWS VPC CNI (addon-managed)	Azure CNI or kubenet (builtin)	GKE CNI (Calico-based)	No
Custom CNI Plugin	Installable but unsupported	Installable but unsupported	Not installable	Yes
CSI Driver (cloud storage)	Must install and upgrade manually	Must enable and manage version	Auto-managed	Yes (AWS and Azure only)
Custom CSI Driver	Installable but unsupported	Installable but unsupported	Installable but unsupported	Yes
Node Pool Upgrades	Optional auto-upgrade	Optional auto-upgrade	Optional auto-upgrade	Yes
Kubernetes Add-ons (custom)	User responsibility	User responsibility	User responsibility	Yes
Application Compatibility	User responsibility	User responsibility	User responsibility	Yes
Workload Troubleshooting	User responsibility	User responsibility	User responsibility	Yes

Upgrades: Not as “managed” as you might expect

All three cloud providers offer version upgrade tooling for the control plane and node pools, either via the web console or CLI. In most cases, these upgrades are not automatic by default. You are required to initiate them and ensure they complete successfully.

More importantly, you are fully responsible for validating that your applications, Helm charts, CRDs, admission plugins, and networking stack are compatible with the new Kubernetes version. If your cluster becomes unstable or stops working after an upgrade, the cloud provider will not assist unless you are on a paid support tier that includes SLA-backed engineering response.
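
One part of that validation can be automated: checking whether your manifests still use API versions that the target Kubernetes release no longer serves. The following is a minimal sketch, not a complete tool. The function names are illustrative, the removal table is a small, non-exhaustive sample of well-known upstream removals, and the manifests are assumed to be already parsed into dictionaries.

```python
# Kubernetes minor version in which each (apiVersion, kind) stopped being served.
# Small, non-exhaustive sample; consult the upstream deprecation guide for the full list.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def parse_minor(version):
    """Turn '1.25' or 'v1.25.3' into the tuple (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def blocked_by_upgrade(manifests, target_version):
    """Return one message per manifest whose API is gone in target_version."""
    target = parse_minor(target_version)
    problems = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed = REMOVED_IN.get(key)
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            problems.append(
                f"{key[1]}/{name} uses {key[0]}, removed in {removed[0]}.{removed[1]}"
            )
    return problems


if __name__ == "__main__":
    sample = [
        {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    ]
    for line in blocked_by_upgrade(sample, "1.25"):
        print(line)
```

A check like this only covers removed APIs. It says nothing about Helm chart logic, admission plugins, or networking behavior, which still need testing in a non-production cluster before you upgrade.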

In AWS and Azure, the CSI driver for persistent storage (which provisions volumes for your workloads) must be manually upgraded to stay in sync with the control plane version. If it is not upgraded, your workloads may silently fail to mount volumes, or the driver may enter crash loops. This is not something the cloud provider monitors for you.

Only Google GKE provides automatic management of the default CSI driver, and only for their supported storage backends.
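
Because nobody else is watching for driver drift, it can help to make the check explicit in your own upgrade runbook. The sketch below shows the idea under stated assumptions: the driver names and supported version ranges are entirely hypothetical placeholders, and real ranges must come from the release notes of the driver you actually run.

```python
# HYPOTHETICAL compatibility data: driver release -> (lowest, highest) supported
# Kubernetes minor version. These values are made up for illustration only.
SUPPORTED_RANGE = {
    "example-csi-1.0": ((1, 24), (1, 27)),
    "example-csi-2.0": ((1, 27), (1, 30)),
}


def driver_supports(driver_release, cluster_version):
    """True if cluster_version, given as (major, minor), is inside the driver's range."""
    lowest, highest = SUPPORTED_RANGE[driver_release]
    return lowest <= cluster_version <= highest


def safe_to_upgrade(driver_release, target_cluster_version):
    """Return a short human-readable verdict for the upgrade runbook."""
    if driver_supports(driver_release, target_cluster_version):
        return "ok: driver supports the target version"
    return "blocked: upgrade the CSI driver before the control plane"


if __name__ == "__main__":
    print(safe_to_upgrade("example-csi-1.0", (1, 28)))
```

Running a gate like this before every control plane upgrade turns a silent failure mode into an explicit step that someone has to clear.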

When something breaks, what will your provider actually do?

Let’s consider some real-world examples:

A control plane upgrade gets stuck mid-way and never finishes
CoreDNS starts failing due to a resource constraint or version mismatch
The default CSI driver stops creating volumes after a minor version bump
etcd begins returning stale reads or refusing writes under load
A networking plugin introduces latency or pod connectivity issues

Unless you have a paid support plan that includes a response SLA, the provider will not investigate these issues on your behalf. You will be directed to public documentation or user forums, even if the failure is in a managed component.

With basic support, there is no guaranteed assistance. With developer support, you may open tickets, but critical incident response is not guaranteed. To get real-time help with a cluster issue, you must be on a support tier that includes 24/7 coverage and production-down response targets.

What does real support actually cost?

If you want the provider to step in during outages, diagnose infrastructure failures, and engage engineering teams, you need to pay for it. Below is a summary of the costs required to unlock support tiers that include a 1-hour SLA for critical issues:

Environment Size	AWS (Business Support)	Azure (Standard Support + SLA Tier)	GCP (Enhanced Support)
1 Cluster	$172/month	$172/month	$572/month
3 Clusters	$316/month	$316/month	$716/month
30 Clusters	$2,260+/month	$2,260+/month	$2,860+/month
100 Clusters	$7,300+/month	$7,300+/month	$7,800+/month
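
To turn these monthly figures into a budget line, the arithmetic is simple enough to script. This is a rough sketch that only reuses the numbers from the table above. The function names are illustrative, and because several entries carry a "+", every result should be read as a minimum rather than a quote.

```python
# Monthly figures (USD) copied from the table above. Entries marked "+" in the
# table are lower bounds, so treat every result here as a minimum.
MONTHLY_SUPPORT_USD = {
    "AWS": {1: 172, 3: 316, 30: 2260, 100: 7300},
    "Azure": {1: 172, 3: 316, 30: 2260, 100: 7300},
    "GCP": {1: 572, 3: 716, 30: 2860, 100: 7800},
}


def annual_minimum(provider, clusters):
    """Minimum yearly spend for a provider at one of the tabulated sizes."""
    return MONTHLY_SUPPORT_USD[provider][clusters] * 12


def monthly_per_cluster(provider, clusters):
    """Minimum monthly spend divided evenly across the clusters."""
    return MONTHLY_SUPPORT_USD[provider][clusters] / clusters


if __name__ == "__main__":
    for provider in MONTHLY_SUPPORT_USD:
        yearly = annual_minimum(provider, 30)
        print(f"{provider}: at least ${yearly:,} per year for 30 clusters")
```

Looking at the yearly total rather than the monthly one makes it easier to compare provider support against the cost of building in-house expertise or using a third party.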

So, what's the reality?

Cloud-managed Kubernetes services reduce the operational overhead of provisioning and managing the control plane, but they do not eliminate risk. They provide a framework for automation, not a safety net. They install infrastructure components, but they do not monitor your workloads for breakage or validate your deployments for compatibility.

You still carry the operational responsibility for everything running in your cluster. If your Kubernetes environment underpins production systems or customer-facing applications, it is vital that you budget for support accordingly, treat upgrades with caution, and build the internal expertise required to bridge the gap between infrastructure automation and operational resilience.

Because when something fails, it is not the SLA that restores service. Your preparedness, your support contract, and your ability to respond determine the outcome.

Of course, you can also outsource this support to a trusted third party, and this is something that Portainer offers to our customers. With Portainer you get a management control plane that is simple to understand and operate and, optionally, a level 3/4 escalation channel, so you have an environment you can rely on.