first commit
This commit is contained in:
106
crossplane-evaluation.md
Normal file
106
crossplane-evaluation.md
Normal file
@@ -0,0 +1,106 @@
|
||||
# Crossplane Evaluation
|
||||
|
||||
## Decision: NOT ADOPTING
|
||||
|
||||
Crossplane will not be used in this stack. The lack of a plan/preview mechanism is a dealbreaker
|
||||
for enterprise adoption and safe infrastructure management.
|
||||
|
||||
---
|
||||
|
||||
## Why We Evaluated It
|
||||
|
||||
The core problem: Terraform/OpenTofu requires re-implementing the same infrastructure concepts
|
||||
per platform (AWS, XCP-ng, bare metal). At thousands of nodes across multiple platforms, this is
|
||||
a massive maintenance burden. Crossplane's XRD/Composition model promised a unified API:
|
||||
|
||||
```
|
||||
XRD: "VirtualMachine" (universal API)
|
||||
├── Composition: AWS → EC2 instance
|
||||
├── Composition: XCP-ng → XO VM
|
||||
└── Composition: bare metal → MAAS / Ansible
|
||||
```
|
||||
|
||||
One API, multiple backends — teams request a "VirtualMachine" and the right composition handles it.
|
||||
|
||||
## Strengths
|
||||
|
||||
- **CNCF Graduated** (Nov 2025, v2.2) — Apache 2.0 license, top-tier maturity
|
||||
- **Continuous drift detection** — automatically reverts manual changes, unlike Terraform's on-demand plan/apply
|
||||
- **No state file management** — no remote backends, locking issues, or state corruption
|
||||
- **Kubernetes-native** — works with ArgoCD, Flux, kubectl, RBAC out of the box
|
||||
- **XRDs/Compositions** — genuine multi-platform abstraction layer, solves the "re-implement per cloud" problem
|
||||
- **Eventual consistency** — resources with complex dependencies don't get stuck like Terraform's dependency graph
|
||||
- **Enterprise adoption** — Deutsche Kreditbank, Elastic, Nike, Apple, NASA, Grafana Labs, 60+ orgs
|
||||
- **Deutsche Kreditbank** replaced Terraform; deployments went from weeks to under one hour
|
||||
|
||||
## Dealbreaker: No Plan/Preview
|
||||
|
||||
The single biggest issue. Terraform's `terraform plan` lets operators see exactly what will change
|
||||
before applying. Crossplane applies changes immediately upon resource creation/modification.
|
||||
|
||||
- Discussed in the community for 2+ years with no resolution
|
||||
- A Kubernetes-native solution would be a `Plan` CRD that shows proposed changes before approval
|
||||
- ArgoCD `sync --dry-run` is a partial workaround but only shows k8s resource diffs, not what the
|
||||
cloud provider will actually do underneath
|
||||
- **For regulated environments and SRE teams at scale, change preview is non-negotiable**
|
||||
|
||||
Possible reasons it hasn't been implemented:
|
||||
- The continuous reconciliation architecture may make point-in-time snapshots fundamentally hard
|
||||
- Upbound (commercial entity) may be reserving it for their paid platform
|
||||
- Or simply not prioritised
|
||||
|
||||
## Other Significant Concerns
|
||||
|
||||
### CRD Bloat
|
||||
- `provider-aws` installs 900+ CRDs — can make API server unresponsive for up to an hour (GitHub #2649)
|
||||
- Exceeds Kubernetes' recommended ~500 CRD limit
|
||||
- Mitigated by "Provider Families" (install per-service sub-providers) but requires careful planning
|
||||
|
||||
### Debugging Difficulty
|
||||
- Errors propagate through layers: Claim → XR → Composition → Managed Resource → Provider → Cloud API
|
||||
- Multiple sources report debugging compositions is painful
|
||||
- Pipeline Inspector (alpha in v2.2) is being introduced but not production-ready
|
||||
|
||||
### Chicken-and-Egg Problem
|
||||
- Crossplane runs inside Kubernetes — cannot provision the cluster it runs on
|
||||
- Requires a "management cluster" bootstrapped by other means (Terraform, Puppet, etc.)
|
||||
- If the management cluster dies, no drift detection or reconciliation runs
|
||||
- Recovery: applying YAMLs to a new cluster works if deterministic resource names are used,
|
||||
otherwise risks creating duplicate cloud resources
|
||||
|
||||
### Cluster Loss / Immutability Concerns
|
||||
- State lives in etcd, not a versionable state file
|
||||
- No independent audit trail or easy way to diff historical states
|
||||
- On new cluster: resources with explicit external names get adopted; auto-named resources get duplicated
|
||||
- Need etcd backups as insurance, and deterministic naming everywhere
|
||||
|
||||
### Performance at Scale
|
||||
- ~2000 composites took 6+ minutes to reconcile on k3d (GitHub #2256)
|
||||
- Reconciliation interval not easily configurable globally (GitHub #5934)
|
||||
|
||||
### YAML Limitations
|
||||
- No native loops, conditionals, or programming constructs
|
||||
- Complex compositions require changes in multiple locations
|
||||
|
||||
## XCP-ng Provider Gap
|
||||
|
||||
- No Crossplane provider for XCP-ng exists today
|
||||
- A mature Terraform provider (`terraform-provider-xenorchestra`) exists, maintained by Vates
|
||||
- Could be wrapped via Upjet to auto-generate a Crossplane provider — but nobody has done it
|
||||
- Would be a greenfield open-source project
|
||||
|
||||
## Real Issues Reported
|
||||
|
||||
- API server unresponsiveness with too many CRDs (GitHub #2649)
|
||||
- CRD scaling issues beyond ~500 CRDs (GitHub #2895)
|
||||
- GCP SQL resources randomly marked for deletion — dangerous for production databases
|
||||
- Reconciliation rate limiting at scale (GitHub #2256)
|
||||
|
||||
## Conclusion
|
||||
|
||||
Crossplane solves a real problem (multi-platform abstraction) that we need, but the lack of
|
||||
plan/preview makes it unsuitable for enterprise-scale production infrastructure management.
|
||||
The operational concerns (CRD bloat, debugging, cluster dependency) add further risk.
|
||||
|
||||
We need to find an alternative approach to the multi-platform abstraction problem that Crossplane
|
||||
solves, while retaining plan/preview capabilities.
|
||||
Reference in New Issue
Block a user