You don't need KAOS!
Why running the cluster yourself first is the right way to start.
Stub.
The motivation behind KAOS is to derive and automate collective
operating experience. But “collective experience” is built one
incident at a time, and every individual contribution starts the same
way: someone — a human, or a robot — runs kubectl against the
cluster, or receives an alert notification, notices something is off,
and fixes it.
Real experience is earned through trial. You can run those kubectl
commands yourself, or you can hand them to an agentic CLI with access
to the cluster — either path produces experience. Each path leaves
a different gap, though. Humans produce organisational knowledge,
but take it with them when they leave the organisation. Standalone
agentic CLIs don’t retain knowledge across sessions without extensive
prompting. And human operators of agentic CLIs tend to miss the
solution entirely: the agent did the fixing, and you only learn
through pain.
That gap — between “I fixed it” and “we now know how to fix it” — is what KAOS is for.
Becoming a Kubernetes expert is time-consuming, and not every team has the runway for it. Skipping some understanding, and so taking on what we’d call knowledge debt, is sometimes the right choice, and KAOS is one of the options in that case. Where there’s a proven path through an incident class, letting an agent walk it is safe; where there isn’t, the decision to pass the situation to a human operator comes down to gut and metrics, not heroism. The booklet walks the long way around for the team that wants it; going back to learn the hard way is always available later.