Files
lab/crossplane-evaluation.md
Michal Rydlikowski ac695f506f first commit
2026-03-15 23:50:43 +00:00

5.0 KiB

Crossplane Evaluation

Decision: NOT ADOPTING

Crossplane will not be used in this stack. The lack of a plan/preview mechanism is a dealbreaker for enterprise adoption and safe infrastructure management.


Why We Evaluated It

The core problem: Terraform/OpenTofu requires re-implementing the same infrastructure concepts per platform (AWS, XCP-ng, bare metal). At thousands of nodes across multiple platforms, this is a massive maintenance burden. Crossplane's XRD/Composition model promised a unified API:

XRD: "VirtualMachine" (universal API)
  ├── Composition: AWS      → EC2 instance
  ├── Composition: XCP-ng   → XO VM
  └── Composition: bare metal → MAAS / Ansible

One API, multiple backends — teams request a "VirtualMachine" and the right composition handles it.

Strengths

  • CNCF Graduated (Nov 2025, v2.2) — Apache 2.0 license, top-tier maturity
  • Continuous drift detection — automatically reverts manual changes, unlike Terraform's on-demand plan/apply
  • No state file management — no remote backends, locking issues, or state corruption
  • Kubernetes-native — works with ArgoCD, Flux, kubectl, RBAC out of the box
  • XRDs/Compositions — genuine multi-platform abstraction layer, solves the "re-implement per cloud" problem
  • Eventual consistency — resources with complex dependencies don't get stuck like Terraform's dependency graph
  • Enterprise adoption — Deutsche Kreditbank, Elastic, Nike, Apple, NASA, Grafana Labs, 60+ orgs
  • Deutsche Kreditbank replaced Terraform; deployments went from weeks to under one hour

Dealbreaker: No Plan/Preview

The single biggest issue. Terraform's terraform plan lets operators see exactly what will change before applying. Crossplane applies changes immediately upon resource creation/modification.

  • Discussed in the community for 2+ years with no resolution
  • A Kubernetes-native solution would be a Plan CRD that shows proposed changes before approval
  • ArgoCD sync --dry-run is a partial workaround but only shows k8s resource diffs, not what the cloud provider will actually do underneath
  • For regulated environments and SRE teams at scale, change preview is non-negotiable

Possible reasons it hasn't been implemented:

  • The continuous reconciliation architecture may make point-in-time snapshots fundamentally hard
  • Upbound (commercial entity) may be reserving it for their paid platform
  • Or simply not prioritised

Other Significant Concerns

CRD Bloat

  • provider-aws installs 900+ CRDs — can make API server unresponsive for up to an hour (GitHub #2649)
  • Exceeds Kubernetes' recommended ~500 CRD limit
  • Mitigated by "Provider Families" (install per-service sub-providers) but requires careful planning

Debugging Difficulty

  • Errors propagate through layers: Claim → XR → Composition → Managed Resource → Provider → Cloud API
  • Multiple sources report debugging compositions is painful
  • Pipeline Inspector (alpha in v2.2) is being introduced but not production-ready

Chicken-and-Egg Problem

  • Crossplane runs inside Kubernetes — cannot provision the cluster it runs on
  • Requires a "management cluster" bootstrapped by other means (Terraform, Puppet, etc.)
  • If the management cluster dies, no drift detection or reconciliation runs
  • Recovery: applying YAMLs to a new cluster works if deterministic resource names are used, otherwise risks creating duplicate cloud resources

Cluster Loss / Immutability Concerns

  • State lives in etcd, not a versionable state file
  • No independent audit trail or easy way to diff historical states
  • On new cluster: resources with explicit external names get adopted; auto-named resources get duplicated
  • Need etcd backups as insurance, and deterministic naming everywhere

Performance at Scale

  • ~2000 composites took 6+ minutes to reconcile on k3d (GitHub #2256)
  • Reconciliation interval not easily configurable globally (GitHub #5934)

YAML Limitations

  • No native loops, conditionals, or programming constructs
  • Complex compositions require changes in multiple locations

XCP-ng Provider Gap

  • No Crossplane provider for XCP-ng exists today
  • A mature Terraform provider (terraform-provider-xenorchestra) exists, maintained by Vates
  • Could be wrapped via Upjet to auto-generate a Crossplane provider — but nobody has done it
  • Would be a greenfield open-source project

Real Issues Reported

  • API server unresponsiveness with too many CRDs (GitHub #2649)
  • CRD scaling issues beyond ~500 CRDs (GitHub #2895)
  • GCP SQL resources randomly marked for deletion — dangerous for production databases
  • Reconciliation rate limiting at scale (GitHub #2256)

Conclusion

Crossplane solves a real problem (multi-platform abstraction) that we need, but the lack of plan/preview makes it unsuitable for enterprise-scale production infrastructure management. The operational concerns (CRD bloat, debugging, cluster dependency) add further risk.

We need to find an alternative approach to the multi-platform abstraction problem that Crossplane solves, while retaining plan/preview capabilities.