Debugging Controller OOM When Cache Grows Unbounded
The Problem
The Kubewarden sbomscanner controller was OOMKilled in production. Memory usage grew unbounded as the cluster scaled.
GitHub Issue #438 tracked a user report that the sbombastic-controller pod was consuming significantly more memory than other pods in the cluster, eventually hitting its resource limits and being terminated by Kubernetes.
Root Cause: Unbounded Cache Growth
Controller-runtime caches every watched resource in memory:
- Resources with reconcilers → full cache
- Resources with indexers → full cache (often overlooked)
- Large fields you don’t need → still cached
Example: VulnerabilityReport reconciler needs to query Images by reference, so it creates an indexer:
mgr.GetFieldIndexer().IndexField(ctx, &VulnerabilityReport{},
    ".spec.image", indexFunc)
The Solution: Field Stripping
The fix, implemented in PR #444, reduces memory usage by stripping unnecessary fields from cached resources before they enter the cache.
The solution removes three types of fields:
- Managed Fields (metadata.managedFields): Kubernetes tracks field managers here, but controllers typically don’t need this for reconciliation
- Data Fields from Image resources: Image resources are used for indexing to help search the VulnerabilityReport resources, which does not require their data payloads
- Data Fields from VulnerabilityReport resources: Similar to Image resources, these can contain large payloads
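A minimal sketch of what such a stripping function could look like is shown here. It is not the code from PR #444: the `Image`, `ObjectMeta`, and `ManagedFieldsEntry` types are simplified stand-ins, and the `Layers` field is a hypothetical name for a large data payload. The one detail taken from the real libraries is the function shape, which mirrors client-go's `cache.TransformFunc` (`func(interface{}) (interface{}, error)`).

```go
package main

import "fmt"

// ManagedFieldsEntry and ObjectMeta are simplified stand-ins for the
// Kubernetes metadata types (assumption for illustration only).
type ManagedFieldsEntry struct {
	Manager  string
	FieldsV1 []byte
}

type ObjectMeta struct {
	Name          string
	Namespace     string
	ManagedFields []ManagedFieldsEntry
}

// Image is a hypothetical, trimmed-down resource. Layers stands in for
// whatever large data fields the real type carries.
type Image struct {
	ObjectMeta
	Reference string   // small field the controller still needs
	Layers    [][]byte // large payload that reconciliation does not read
}

// stripImage has the same shape as client-go's cache.TransformFunc.
// It drops managed fields and the data payload, keeping basic metadata.
func stripImage(obj any) (any, error) {
	img, ok := obj.(*Image)
	if !ok {
		// Pass through anything that is not an *Image unchanged.
		return obj, nil
	}
	img.ManagedFields = nil
	img.Layers = nil
	return img, nil
}

func main() {
	img := &Image{
		ObjectMeta: ObjectMeta{
			Name:          "app-image",
			Namespace:     "default",
			ManagedFields: []ManagedFieldsEntry{{Manager: "controller", FieldsV1: make([]byte, 4096)}},
		},
		Reference: "registry.example/app:1.0",
		Layers:    [][]byte{make([]byte, 1<<20)},
	}

	out, err := stripImage(img)
	if err != nil {
		panic(err)
	}
	stripped := out.(*Image)
	fmt.Printf("name=%s reference=%s managedFields=%d layers=%d\n",
		stripped.Name, stripped.Reference, len(stripped.ManagedFields), len(stripped.Layers))
}
```

In a real controller, a function of this shape would typically be registered through controller-runtime's `cache.Options`, either as `DefaultTransform` for every type or as the `Transform` of a per-type `ByObject` entry. Recent controller-runtime versions also provide a `cache.TransformStripManagedFields()` helper for the managed-fields case.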
Why This Works
- Managed fields accumulate: Every update adds entries to managedFields. For frequently updated resources, this can grow to hundreds of KB per resource
- Data fields are large: Image manifests and vulnerability reports can contain MBs of data
- Controllers don’t need everything: Reconciliation logic typically only needs spec fields and basic metadata
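To get a feel for the proportions involved, the following sketch compares the JSON-encoded size of an object before and after stripping. The `report` type, its field names, and the payload sizes are all made up for illustration, and encoded size is only a rough proxy for the heap memory an informer cache would actually hold.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// managedFieldsEntry and report are hypothetical, simplified types.
type managedFieldsEntry struct {
	Manager  string `json:"manager"`
	FieldsV1 string `json:"fieldsV1"`
}

type report struct {
	Name          string               `json:"name"`
	ImageRef      string               `json:"imageRef"`
	ManagedFields []managedFieldsEntry `json:"managedFields,omitempty"`
	Data          string               `json:"data,omitempty"`
}

// sampleReport builds a report whose Data payload is payloadBytes long.
func sampleReport(payloadBytes int) report {
	return report{
		Name:     "report-1",
		ImageRef: "registry.example/app:1.0",
		ManagedFields: []managedFieldsEntry{
			{Manager: "controller", FieldsV1: strings.Repeat("f", 2000)},
		},
		Data: strings.Repeat("x", payloadBytes),
	}
}

// strip returns a copy without managed fields and without the payload.
func strip(r report) report {
	r.ManagedFields = nil
	r.Data = ""
	return r
}

// encodedSize returns the JSON-encoded size in bytes, or -1 on error.
func encodedSize(v any) int {
	b, err := json.Marshal(v)
	if err != nil {
		return -1
	}
	return len(b)
}

func main() {
	r := sampleReport(1000000) // roughly 1 MB of payload
	before := encodedSize(r)
	after := encodedSize(strip(r))
	fmt.Printf("before=%d bytes, after=%d bytes\n", before, after)
}
```

With thousands of such objects in a cache, the difference between the two numbers is what separates a stable controller from one that keeps growing toward its memory limit.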
Key Takeaways
- Controller caches can grow unbounded: Without limits, watching many resources or large resources can cause OOM
- Indexers cache too: Creating an index on a field caches the entire resource
- Strip early: Transform before cache, not during reconcile
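The last two takeaways can be illustrated with a toy cache. This is a simplified model, not controller-runtime's informer implementation: `store`, `report`, and `stripData` are hypothetical names, and the store only handles additions. It shows that applying the transform before the object is stored keeps the payload out of memory, while the index still works because the indexed field is retained.

```go
package main

import "fmt"

// report is a hypothetical, simplified stand-in for a VulnerabilityReport.
type report struct {
	Name     string
	ImageRef string // small field used as the index key
	Data     []byte // large payload the reconciler does not read
}

// store is a toy stand-in for an informer cache with one field index.
type store struct {
	transform func(*report) *report
	objects   map[string]*report  // name -> cached object
	byImage   map[string][]string // image reference -> report names
}

func newStore(transform func(*report) *report) *store {
	return &store{
		transform: transform,
		objects:   map[string]*report{},
		byImage:   map[string][]string{},
	}
}

// Add applies the transform first, so the payload is never retained,
// then stores the whole (already stripped) object and indexes it.
func (s *store) Add(r *report) {
	r = s.transform(r)
	s.objects[r.Name] = r
	s.byImage[r.ImageRef] = append(s.byImage[r.ImageRef], r.Name)
}

// ByImage returns the cached reports that reference the given image.
func (s *store) ByImage(ref string) []*report {
	var out []*report
	for _, name := range s.byImage[ref] {
		out = append(out, s.objects[name])
	}
	return out
}

// stripData drops the payload and keeps the fields the index needs.
func stripData(r *report) *report {
	r.Data = nil
	return r
}

func main() {
	s := newStore(stripData)
	s.Add(&report{
		Name:     "report-1",
		ImageRef: "registry.example/app:1.0",
		Data:     make([]byte, 1<<20),
	})
	for _, r := range s.ByImage("registry.example/app:1.0") {
		fmt.Printf("%s cached with %d payload bytes\n", r.Name, len(r.Data))
	}
}
```

Stripping inside the reconcile loop instead would not help, because by then the full object is already sitting in the cache.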
References
- Kubewarden sbomscanner Issue #438 - Original OOM issue
- Kubewarden sbomscanner PR #444 - Memory optimization fix
- Controller-Runtime Cache Documentation
- Kubernetes Managed Fields