2026-03-15 23:50:43 +00:00
# DNS Solution Research
## Decision: PowerDNS Authoritative + ExternalDNS
### Why PowerDNS
| Feature | PowerDNS | CoreDNS | BIND9 | Technitium |
|---------|----------|---------|-------|------------|
| REST API | Full | No (needs etcd) | No (nsupdate) | Yes |
| Database backend | PostgreSQL/MySQL/SQLite | etcd | Zone files | Custom |
| Health-aware DNS | Lua records (ifportup, ifurlup) | No | No | No |
| ExternalDNS provider | Yes | Yes (via etcd) | Yes (RFC 2136) | No |
| DNSSEC | Yes | Limited | Best | Yes |
| Split DNS | dnsdist routing | Corefile blocks | Views (best) | APP records |
| Maturity | ISP-grade | K8s-focused | Oldest | Newer |
PowerDNS wins on: REST API (critical for Lab), health-check-aware Lua records,
database backend for HA, and ExternalDNS integration.
### Architecture
```
               Lab Server
             (control plane)
                    │
                    │ PowerDNS REST API
                    ▼
            ┌───────────────┐
            │   PowerDNS    │
            │ Authoritative │──── PostgreSQL/SQLite backend
            │    Server     │
            └───────┬───────┘
        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
  Internal DNS ExternalDNS   dnsdist
 .lab.internal (k8s syncs   (split DNS
               Services/     routing)
               Ingress)
```
### How Lab Uses DNS
#### Auto-registration on onboard
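When `lab onboard` completes, Lab calls the PowerDNS API to write:
- A record: `<server>.lab.internal → <ip>`
- PTR record: `<reverse-ip>.in-addr.arpa → <server>.lab.internal`
- Both are created/updated together; they live in different zones (forward and reverse), so this is one API update per zone, and Lab must handle a failure between the two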
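A minimal sketch of the onboarding write, assuming the zone IDs `lab.internal.` and `0.0.10.in-addr.arpa.`, the server ID `localhost`, and a 300 s TTL (all placeholders). The PowerDNS API changes rrsets with a `PATCH` against one zone at a time, so the A and PTR records travel as two separate request bodies. `onboardChanges` and `reverseName` are hypothetical helper names, and IPv6 (`ip6.arpa`) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strings"
)

// RRSet and Record mirror the JSON objects the PowerDNS HTTP API accepts in
// PATCH /api/v1/servers/{server_id}/zones/{zone_id}.
type RRSet struct {
	Name       string   `json:"name"`
	Type       string   `json:"type"`
	TTL        int      `json:"ttl"`
	ChangeType string   `json:"changetype"` // REPLACE upserts, DELETE removes
	Records    []Record `json:"records"`
}

type Record struct {
	Content  string `json:"content"`
	Disabled bool   `json:"disabled"`
}

type ZonePatch struct {
	RRSets []RRSet `json:"rrsets"`
}

// ZoneChange pairs a zone ID with the PATCH body destined for it.
type ZoneChange struct {
	Zone  string
	Patch ZonePatch
}

// reverseName maps an IPv4 address to its PTR owner name,
// e.g. 10.0.0.5 -> 5.0.0.10.in-addr.arpa. (IPv6 is not handled here.)
func reverseName(ip string) (string, error) {
	v4 := net.ParseIP(ip).To4()
	if v4 == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", ip)
	}
	return fmt.Sprintf("%d.%d.%d.%d.in-addr.arpa.", v4[3], v4[2], v4[1], v4[0]), nil
}

// onboardChanges builds one PATCH body per zone: the A record in lab.internal.
// and the PTR in reverseZone. All names are absolute (trailing dot).
func onboardChanges(server, ip, reverseZone string, ttl int) ([]ZoneChange, error) {
	fqdn := server + ".lab.internal."
	ptr, err := reverseName(ip)
	if err != nil {
		return nil, err
	}
	if !strings.HasSuffix(ptr, "."+reverseZone) {
		return nil, fmt.Errorf("%s is not inside reverse zone %s", ptr, reverseZone)
	}
	return []ZoneChange{
		{Zone: "lab.internal.", Patch: ZonePatch{RRSets: []RRSet{{
			Name: fqdn, Type: "A", TTL: ttl, ChangeType: "REPLACE",
			Records: []Record{{Content: ip}},
		}}}},
		{Zone: reverseZone, Patch: ZonePatch{RRSets: []RRSet{{
			Name: ptr, Type: "PTR", TTL: ttl, ChangeType: "REPLACE",
			Records: []Record{{Content: fqdn}},
		}}}},
	}, nil
}

func main() {
	changes, err := onboardChanges("web1", "10.0.0.5", "0.0.10.in-addr.arpa.", 300)
	if err != nil {
		panic(err)
	}
	for _, c := range changes {
		body, err := json.Marshal(c.Patch)
		if err != nil {
			panic(err)
		}
		// A real client would send this with method PATCH and an X-API-Key header.
		fmt.Printf("PATCH /api/v1/servers/localhost/zones/%s\n%s\n", c.Zone, body)
	}
}
```

`REPLACE` makes the call idempotent, so re-running onboarding for the same server and IP rewrites the same rrsets instead of failing.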
#### Domain claims via labels
Labels can claim shared domain names:
```yaml
labels:
  mailserver:
    dns:
      records:
        - type: A
          name: "{{server.name}}.lab.internal"
      claims:
        - name: mail.example.com
          type: A
          health_check: { port: 25 }
```
All servers with the `mailserver` label contribute their IPs to the `mail.example.com` round-robin set.
PowerDNS Lua records (here `ifportup` on port 25) drop unhealthy servers from answers automatically.
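How a label claim could be turned into a Lua record, sketched under assumptions: the `Server` type and the `claimIPs`/`luaIfPortUp` helpers are hypothetical, and the claim's `health_check: { port: 25 }` is mapped onto PowerDNS's `ifportup` function. The rendered string is the content of an rrset of type `LUA`; PowerDNS only evaluates such records when `enable-lua-records` is switched on.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Server is a hypothetical, minimal view of Lab's inventory.
type Server struct {
	Name   string
	IP     string
	Labels []string
}

// claimIPs collects the IPs of every server that carries label.
func claimIPs(servers []Server, label string) []string {
	var ips []string
	for _, s := range servers {
		for _, l := range s.Labels {
			if l == label {
				ips = append(ips, s.IP)
				break
			}
		}
	}
	return ips
}

// luaIfPortUp renders LUA record content that answers only with those ips
// currently accepting TCP connections on port, e.g.
// A "ifportup(25, {'10.0.0.1', '10.0.0.2'})"
func luaIfPortUp(recordType string, port int, ips []string) string {
	sorted := append([]string(nil), ips...)
	sort.Strings(sorted) // deterministic content, so re-syncs do not rewrite an unchanged rrset
	quoted := make([]string, len(sorted))
	for i, ip := range sorted {
		quoted[i] = "'" + ip + "'"
	}
	return fmt.Sprintf("%s \"ifportup(%d, {%s})\"", recordType, port, strings.Join(quoted, ", "))
}

func main() {
	servers := []Server{
		{Name: "mail2", IP: "10.0.0.2", Labels: []string{"mailserver"}},
		{Name: "mail1", IP: "10.0.0.1", Labels: []string{"mailserver"}},
		{Name: "web1", IP: "10.0.0.9", Labels: []string{"web-frontend"}},
	}
	// Would be stored as an rrset of type LUA at mail.example.com.
	fmt.Println(luaIfPortUp("A", 25, claimIPs(servers, "mailserver")))
}
```

Running it prints `A "ifportup(25, {'10.0.0.1', '10.0.0.2'})"`. Because PowerDNS evaluates the probe at query time, Lab only has to rewrite this record when membership changes, not when health changes.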
#### IP mobility
The Lab agent on the machine reports an IP change → Lab server calls the PowerDNS API →
A record, PTR, and all claimed domains are updated.
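A sketch of the update plan for one IP change, assuming IPv4 and the hypothetical `Change`/`Plan` types and `planIPChange` helper. The new PTR is written before the old one is deleted, and claims are updated only at the member-list level; turning each list back into LUA content would reuse whatever renderer the claim sync already uses. A real client would group these operations into one PATCH per zone.

```go
package main

import (
	"fmt"
	"net"
)

// Change is a simplified, hypothetical rrset operation for the PowerDNS API;
// Content is empty for DELETE.
type Change struct {
	Name, Type, ChangeType, Content string
}

// Plan is what Lab would apply after an agent reports a new IP.
type Plan struct {
	Changes []Change            // A and PTR updates for the server itself
	Claims  map[string][]string // claimed name -> member IPs to re-render as LUA records
}

func reverseName(ip string) (string, error) {
	v4 := net.ParseIP(ip).To4()
	if v4 == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", ip)
	}
	return fmt.Sprintf("%d.%d.%d.%d.in-addr.arpa.", v4[3], v4[2], v4[1], v4[0]), nil
}

// planIPChange lists the updates for server moving from oldIP to newIP.
// claims maps each claimed name to its current member IPs (including oldIP).
func planIPChange(server, oldIP, newIP string, claims map[string][]string) (Plan, error) {
	if oldIP == newIP {
		return Plan{}, nil // nothing to do
	}
	oldPTR, err := reverseName(oldIP)
	if err != nil {
		return Plan{}, err
	}
	newPTR, err := reverseName(newIP)
	if err != nil {
		return Plan{}, err
	}
	fqdn := server + ".lab.internal."
	p := Plan{
		Changes: []Change{
			{Name: fqdn, Type: "A", ChangeType: "REPLACE", Content: newIP},
			{Name: newPTR, Type: "PTR", ChangeType: "REPLACE", Content: fqdn},
			{Name: oldPTR, Type: "PTR", ChangeType: "DELETE"},
		},
		Claims: make(map[string][]string, len(claims)),
	}
	// Swap oldIP for newIP in every claim this server contributes to.
	for name, members := range claims {
		updated := make([]string, len(members))
		for i, ip := range members {
			if ip == oldIP {
				ip = newIP
			}
			updated[i] = ip
		}
		p.Claims[name] = updated
	}
	return p, nil
}

func main() {
	p, err := planIPChange("web1", "10.0.0.5", "10.0.0.7",
		map[string][]string{"www.example.com.": {"10.0.0.5", "10.0.0.6"}})
	if err != nil {
		panic(err)
	}
	for _, c := range p.Changes {
		fmt.Printf("%-8s %-4s %s %s\n", c.ChangeType, c.Type, c.Name, c.Content)
	}
	fmt.Println(p.Claims)
}
```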
#### K8s integration
ExternalDNS runs in k8s and syncs Service/Ingress records to the same PowerDNS instance (via its `pdns` provider).
One DNS server therefore serves both bare-metal and k8s records.
#### Groups claiming domains
Groups can claim domains for all member servers:
```yaml
groups:
  production-web:
    match:
      labels: [web-frontend]
      environment: prod
    dns:
      claims:
        - name: www.example.com
          type: A
          health_check: { url: "https://{{server.ip}}/healthz" }
```
### DNS Plugin Interface
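```go
package dns

// Record is one record set as returned by ListRecords (field layout assumed).
type Record struct {
	Name, Type string
	Targets    []string
	TTL        int
}

// HealthCheck: Port for a TCP probe or URL for an HTTP(S) probe (field layout assumed).
type HealthCheck struct {
	Port int
	URL  string
}

type DNSPlugin interface {
	Name() string
	// Record management
	CreateRecord(zone, name, recordType string, targets []string, ttl int) error
	UpdateRecord(zone, name, recordType string, targets []string, ttl int) error
	DeleteRecord(zone, name, recordType string) error
	ListRecords(zone string) ([]Record, error)
	// Health-checked records
	CreateHealthCheckedRecord(zone, name string, targets []string, check HealthCheck) error
	// Zone management
	CreateZone(name string, kind string) error
	DeleteZone(name string) error
}
```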
Built-in:
- `dns-powerdns` — PowerDNS REST API (primary)
- `dns-route53` — AWS Route53 (for cloud deployments)
- `dns-rfc2136` — RFC 2136 dynamic updates (BIND/Knot fallback)
### Split DNS Setup
Internal zones (`.lab.internal`) are served authoritatively by PowerDNS.
PowerDNS Authoritative does not recurse, so external queries are forwarded upstream (8.8.8.8, ISP DNS) by a resolver layer in front of it.
Options:
- **dnsdist** (PowerDNS ecosystem) routes by source subnet
- **CoreDNS as resolver** — serves internal from PowerDNS, forwards external
- **BIND views** — if we need view-based split on same zone (unlikely)
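Whichever option is picked, the core decision is the same: names at or below an internal zone go to PowerDNS, everything else goes upstream. A minimal sketch of that rule follows (the CoreDNS-style split by zone; dnsdist's source-subnet routing would add a check on the client address). The resolver addresses and the `10.in-addr.arpa.` reverse zone are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical addresses; substitute the real deployment values.
const (
	authoritative = "10.0.0.53:53" // PowerDNS Authoritative
	upstream      = "8.8.8.8:53"   // public or ISP resolver
)

// Internal zones served by PowerDNS (assumed set).
var internalZones = []string{"lab.internal.", "10.in-addr.arpa."}

// route returns the backend for qname: PowerDNS if the name is at or below an
// internal zone, otherwise upstream. Matching is case-insensitive and
// label-aligned, so "notlab.internal." does not match "lab.internal.".
func route(qname string) string {
	q := strings.ToLower(strings.TrimSuffix(qname, ".")) + "."
	for _, zone := range internalZones {
		if q == zone || strings.HasSuffix(q, "."+zone) {
			return authoritative
		}
	}
	return upstream
}

func main() {
	for _, q := range []string{"web1.lab.internal", "5.0.0.10.in-addr.arpa.", "www.example.com."} {
		fmt.Printf("%-24s -> %s\n", q, route(q))
	}
}
```

Matching on whole labels rather than raw string suffixes is the detail that matters: a plain `HasSuffix(q, "lab.internal.")` would also capture unrelated names such as `notlab.internal.`.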
### Evaluated and Not Chosen
| Tool | Why Not |
|------|---------|
| CoreDNS | No REST API, needs etcd intermediary, k8s-focused |
| BIND9 | No REST API, nsupdate is cumbersome for automation |
| Technitium | No ExternalDNS provider, newer/smaller community |
| dnsmasq | Not suitable — caching forwarder, no API, ~1000 client limit |
| Knot DNS | No REST API, better as secondary/downstream |
### DNS-as-Code (Optional Layer)
For static DNS infrastructure (SOA, NS, MX, base zone config):
- **octoDNS** (GitHub) or **DNSControl** (Stack Exchange)
- GitOps workflow: PR → review → merge → sync to PowerDNS
- Dynamic records (server A records, claims) managed by Lab directly via API
- Static records managed via DNS-as-code in Git