Crossplane Evaluation
Decision: NOT ADOPTING
Crossplane will not be used in this stack. The lack of a plan/preview mechanism is a dealbreaker for enterprise adoption and safe infrastructure management.
Why We Evaluated It
The core problem: Terraform/OpenTofu requires re-implementing the same infrastructure concepts per platform (AWS, XCP-ng, bare metal). At thousands of nodes across multiple platforms, this is a massive maintenance burden. Crossplane's XRD/Composition model promised a unified API:
XRD: "VirtualMachine" (universal API)
├── Composition: AWS → EC2 instance
├── Composition: XCP-ng → XO VM
└── Composition: bare metal → MAAS / Ansible
One API, multiple backends — teams request a "VirtualMachine" and the right composition handles it.
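The routing idea can be sketched in plain Python. This is a conceptual model only, not Crossplane's API: the function names, the `platform` field, and the returned dictionaries are all hypothetical stand-ins for what a Composition selected for an XRD would render.

```python
from typing import Callable, Dict

# Hypothetical request shape: one platform-neutral "VirtualMachine" request.
VMRequest = Dict[str, object]


def compose_aws(req: VMRequest) -> dict:
    # Stand-in for a Composition that renders an EC2 instance.
    return {"kind": "EC2Instance", "name": req["name"], "cpus": req["cpus"]}


def compose_xcpng(req: VMRequest) -> dict:
    # Stand-in for a Composition that renders a Xen Orchestra VM.
    return {"kind": "XOVM", "name": req["name"], "cpus": req["cpus"]}


def compose_bare_metal(req: VMRequest) -> dict:
    # Stand-in for a Composition that hands off to MAAS / Ansible.
    return {"kind": "MAASMachine", "name": req["name"], "cpus": req["cpus"]}


# One API, several backends: the platform field selects the composition.
COMPOSITIONS: Dict[str, Callable[[VMRequest], dict]] = {
    "aws": compose_aws,
    "xcp-ng": compose_xcpng,
    "bare-metal": compose_bare_metal,
}


def render(req: VMRequest) -> dict:
    """Route a VirtualMachine request to the matching backend."""
    platform = str(req["platform"])
    if platform not in COMPOSITIONS:
        raise ValueError(f"no composition for platform {platform!r}")
    return COMPOSITIONS[platform](req)
```

A team would only ever fill in the request; which backend handles it is decided by the platform selector, which is the property that made the model attractive.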
Strengths
- CNCF Graduated (Nov 2025, v2.2) — Apache 2.0 license, top-tier maturity
- Continuous drift detection — automatically reverts manual changes, unlike Terraform's on-demand plan/apply
- No state file management — no remote backends, locking issues, or state corruption
- Kubernetes-native — works with ArgoCD, Flux, kubectl, RBAC out of the box
- XRDs/Compositions — genuine multi-platform abstraction layer, solves the "re-implement per cloud" problem
- Eventual consistency: resources with complex dependencies are retried until they converge, rather than stalling a run the way a failure in Terraform's dependency graph does
- Enterprise adoption — Deutsche Kreditbank, Elastic, Nike, Apple, NASA, Grafana Labs, 60+ orgs
- Deutsche Kreditbank replaced Terraform; deployments went from weeks to under one hour
Dealbreaker: No Plan/Preview
The single biggest issue. Terraform's terraform plan lets operators see exactly what will change
before applying. Crossplane applies changes immediately upon resource creation/modification.
- Discussed in the community for 2+ years with no resolution
- A Kubernetes-native solution would be a Plan CRD that shows proposed changes before approval
- ArgoCD sync --dry-run is a partial workaround, but it only shows k8s resource diffs, not what the cloud provider will actually do underneath
- For regulated environments and SRE teams at scale, change preview is non-negotiable
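To make the missing capability concrete, here is a minimal sketch of a plan-then-approve flow in Python. It is not how Terraform or Crossplane is implemented; the `plan` and `apply` functions and the flat key/value state are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

# (action, field, old value, new value)
Change = Tuple[str, str, Any, Any]


def plan(observed: Dict[str, Any], desired: Dict[str, Any]) -> List[Change]:
    """Compute the changes needed to reach desired, without applying them."""
    changes: List[Change] = []
    for key in sorted(desired.keys() | observed.keys()):
        if key not in observed:
            changes.append(("add", key, None, desired[key]))
        elif key not in desired:
            changes.append(("remove", key, observed[key], None))
        elif observed[key] != desired[key]:
            changes.append(("update", key, observed[key], desired[key]))
    return changes


def apply(observed: Dict[str, Any], changes: List[Change], approved: bool) -> Dict[str, Any]:
    """Apply a previously reviewed plan; do nothing unless it was approved."""
    result = dict(observed)
    if not approved:
        return result
    for action, key, _old, new in changes:
        if action == "remove":
            result.pop(key, None)
        else:
            result[key] = new
    return result
```

The point of the separation is the gap between the two calls: an operator can read the change list and refuse it. A controller that reconciles as soon as a resource is created or modified has no such gap.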
Possible reasons it hasn't been implemented:
- The continuous reconciliation architecture may make point-in-time snapshots fundamentally hard
- Upbound (commercial entity) may be reserving it for their paid platform
- Or simply not prioritised
Other Significant Concerns
CRD Bloat
- provider-aws installs 900+ CRDs, which can make the API server unresponsive for up to an hour (GitHub #2649)
- Exceeds Kubernetes' recommended ~500 CRD limit
- Mitigated by "Provider Families" (install per-service sub-providers) but requires careful planning
Debugging Difficulty
- Errors propagate through layers: Claim → XR → Composition → Managed Resource → Provider → Cloud API
- Multiple sources report debugging compositions is painful
- Pipeline Inspector (alpha in v2.2) is being introduced but not production-ready
Chicken-and-Egg Problem
- Crossplane runs inside Kubernetes — cannot provision the cluster it runs on
- Requires a "management cluster" bootstrapped by other means (Terraform, Puppet, etc.)
- If the management cluster dies, no drift detection or reconciliation runs
- Recovery: applying YAMLs to a new cluster works if deterministic resource names are used, otherwise risks creating duplicate cloud resources
Cluster Loss / Immutability Concerns
- State lives in etcd, not a versionable state file
- No independent audit trail or easy way to diff historical states
- On new cluster: resources with explicit external names get adopted; auto-named resources get duplicated
- Need etcd backups as insurance, and deterministic naming everywhere
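The adopt-versus-duplicate behaviour can be illustrated with a small in-memory model. `FakeCloud` and its `ensure` method are hypothetical and exist only for this sketch; they are not a Crossplane or cloud-provider API.

```python
import uuid
from typing import Dict, Optional


class FakeCloud:
    """In-memory stand-in for a cloud API, keyed by external resource name."""

    def __init__(self) -> None:
        self.resources: Dict[str, dict] = {}

    def ensure(self, spec: dict, external_name: Optional[str] = None) -> str:
        # Without an explicit name a fresh random one is generated, so the
        # same spec applied from a rebuilt cluster cannot match what exists.
        name = external_name or f"{spec['kind']}-{uuid.uuid4().hex[:8]}"
        if name in self.resources:
            return name  # adopted: the resource already exists
        self.resources[name] = dict(spec)  # created: a new cloud resource
        return name
```

Calling `ensure(spec, "logs-prod")` twice leaves one resource, which models re-applying the same YAML from a replacement cluster. Calling `ensure(spec)` twice leaves two, which is the duplication risk described above.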
Performance at Scale
- ~2000 composites took 6+ minutes to reconcile on k3d (GitHub #2256)
- Reconciliation interval not easily configurable globally (GitHub #5934)
YAML Limitations
- No native loops, conditionals, or programming constructs
- Complex compositions require changes in multiple locations
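For contrast, this is the kind of loop and conditional a general-purpose language provides and plain YAML does not. The function name and the resource shape are hypothetical and do not correspond to a real Crossplane schema.

```python
from typing import Dict, List


def render_subnets(name: str, zones: List[str], public: bool) -> List[Dict[str, object]]:
    """Render one subnet per zone, adding a route only for public networks."""
    resources: List[Dict[str, object]] = []
    for index, zone in enumerate(zones):
        resource: Dict[str, object] = {
            "name": f"{name}-{zone}",
            "zone": zone,
            "cidr": f"10.0.{index}.0/24",
        }
        if public:
            resource["route"] = "internet-gateway"
        resources.append(resource)
    return resources
```

Adding a zone here is a one-element change to the input list; in a purely declarative template the equivalent change means copying and editing a whole resource block.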
XCP-ng Provider Gap
- No Crossplane provider for XCP-ng exists today
- A mature Terraform provider (terraform-provider-xenorchestra) exists, maintained by Vates
- Could be wrapped via Upjet to auto-generate a Crossplane provider, but nobody has done it
- Would be a greenfield open-source project
Real Issues Reported
- API server unresponsiveness with too many CRDs (GitHub #2649)
- CRD scaling issues beyond ~500 CRDs (GitHub #2895)
- GCP SQL resources randomly marked for deletion — dangerous for production databases
- Reconciliation rate limiting at scale (GitHub #2256)
Conclusion
Crossplane solves a real problem (multi-platform abstraction) that we need, but the lack of plan/preview makes it unsuitable for enterprise-scale production infrastructure management. The operational concerns (CRD bloat, debugging, cluster dependency) add further risk.
We need to find an alternative approach to the multi-platform abstraction problem that Crossplane solves, while retaining plan/preview capabilities.