Security & Authentication
bilbycast-relay is stateless and zero-knowledge — it forwards encrypted bytes between edges without ever being able to read them — but it still has security-relevant surfaces that operators need to configure correctly. This page covers all of them.
Threat model in one paragraph
bilbycast-relay sits between two edge nodes that are typically behind NAT. It pairs them by tunnel UUID and forwards packets. The edges use a shared 32-byte ChaCha20-Poly1305 key (distributed by the manager) to encrypt every payload before it touches the relay, so a compromised relay leaks only ciphertext, packet sizes, and timing. Attackers we defend against: an attacker who can run a malicious relay, an attacker on the network between edge and relay, an attacker who tries to bind to a tunnel UUID they don’t own, and an attacker who tries to call the relay’s REST API to enumerate or disrupt other tenants’ tunnels.
Layer 1 — TLS 1.3 via QUIC
The transport from any edge to the relay is QUIC, which mandates TLS 1.3. ALPN is enforced: the relay only accepts the bilbycast-relay protocol identifier, so a client offering a different ALPN cannot complete the handshake even if it reaches the QUIC port.
The relay generates a self-signed cert at startup if none is configured (BILBYCAST_RELAY_CERT / BILBYCAST_RELAY_KEY). Edges connecting to a self-signed relay must explicitly opt in (accept_self_signed_cert: true plus BILBYCAST_ALLOW_INSECURE=1) — the same safety guard as the manager. For production, supply a real cert.
Edges can also pin the relay’s cert via cert_fingerprint (SHA-256), which validates the exact cert without trusting any CA store.
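As a rough illustration of what pinning checks, the sketch below computes a SHA-256 fingerprint over a certificate's DER encoding using only the Python standard library. The source does not say which textual form cert_fingerprint expects (hex case, colon separators), so the lowercase-hex output is an assumption, and both function names are hypothetical.

```python
import hashlib
import ssl


def fingerprint_der(der_bytes: bytes) -> str:
    """Return the SHA-256 digest of DER-encoded certificate bytes as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def fingerprint_pem(pem_text: str) -> str:
    """Convert a single PEM certificate to DER, then fingerprint it.

    ASSUMPTION: the pinned value is computed over the DER encoding of the
    whole certificate, which is the common convention for cert pinning.
    """
    return fingerprint_der(ssl.PEM_cert_to_DER_cert(pem_text))
```

An operator could run fingerprint_pem over the relay's certificate file and paste the result into the edge config, after confirming the expected format against the edge's own documentation.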
Layer 2 — End-to-end ChaCha20-Poly1305
This is the crucial layer. The relay is zero-knowledge by design: every payload is encrypted by the source edge with ChaCha20-Poly1305 (AEAD) using a 32-byte key (tunnel_encryption_key) generated by the manager and distributed by the manager to both edges, out of band with respect to the relay. The relay sees only:
- The tunnel UUID (used to route to the peer)
- The ciphertext + 16-byte authentication tag
- A 12-byte nonce
- Packet sizes and timing
It cannot read the plaintext, modify it without breaking the auth tag, or replay packets across tunnels (the nonce + key combination is per-tunnel).
Per-packet overhead: 28 bytes (12-byte nonce + 16-byte Poly1305 tag).
Wire framing
| Transport | Framing |
|---|---|
| TCP | [4-byte BE length][nonce + ciphertext + tag] per encrypted record |
| UDP | Tunnel ID prefix + (nonce + ciphertext + tag) — payload encrypted before the tunnel ID is prepended |
This means the relay can de-multiplex by tunnel UUID without ever having to decrypt anything.
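The framing in the table can be sketched in a few lines of Python. The payload is treated as an opaque, already-encrypted record (nonce + ciphertext + tag), which is all the relay ever handles. The source does not state how the tunnel ID is encoded on the wire, so the 16 raw UUID bytes used for the UDP prefix are an assumption, and all function names are hypothetical.

```python
import struct
import uuid
from typing import Optional, Tuple

NONCE_LEN = 12
TAG_LEN = 16
ENCRYPTION_OVERHEAD = NONCE_LEN + TAG_LEN  # 28 bytes per packet


def frame_tcp(record: bytes) -> bytes:
    """Prefix an encrypted record with its 4-byte big-endian length."""
    return struct.pack(">I", len(record)) + record


def read_tcp_record(buf: bytes) -> Tuple[Optional[bytes], bytes]:
    """Return (record, remaining bytes), or (None, buf) if the record is incomplete."""
    if len(buf) < 4:
        return None, buf
    (length,) = struct.unpack(">I", buf[:4])
    if len(buf) < 4 + length:
        return None, buf
    return buf[4:4 + length], buf[4 + length:]


def frame_udp(tunnel_id: uuid.UUID, record: bytes) -> bytes:
    """Prepend the tunnel ID to an encrypted record.

    ASSUMPTION: the prefix is the raw 16-byte UUID.
    """
    return tunnel_id.bytes + record


def split_udp(datagram: bytes) -> Tuple[uuid.UUID, bytes]:
    """Recover the tunnel ID for routing without touching the ciphertext."""
    return uuid.UUID(bytes=datagram[:16]), datagram[16:]
```

Note that split_udp never needs a key: routing depends only on the cleartext prefix, which is the property that lets the relay stay zero-knowledge.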
Layer 3 — Per-tunnel HMAC bind tokens
The end-to-end encryption protects the payload, but it doesn’t on its own prevent an attacker from binding to a tunnel UUID they don’t own and exhausting relay resources. To close that gap, the relay supports optional per-tunnel bind authentication managed by the manager:
- The manager generates a 32-byte secret per tunnel (tunnel_bind_secret).
- The manager sends an authorize_tunnel command to the relay, providing the tunnel UUID and a precomputed HMAC-SHA256 token derived from the secret.
- The manager distributes the secret to both edges.
- When an edge binds to the tunnel, it computes the same HMAC and includes it in the TunnelBind message as bind_token.
- The relay compares the bind token to its stored authorisation with constant-time comparison (so timing attacks can’t recover the expected token byte by byte).
- Mismatched or missing tokens cause the bind to be rejected with a TunnelDown notification, which surfaces to the manager as an event.
To revoke an authorisation, the manager sends revoke_tunnel — subsequent binds with the old token are rejected.
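The token derivation and comparison can be sketched as follows. The source says only that the token is an HMAC-SHA256 "derived from the secret"; it does not say what message is authenticated, so using the raw tunnel UUID bytes as the HMAC message is an assumption, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
import uuid


def new_bind_secret() -> bytes:
    """Manager side: generate a fresh 32-byte per-tunnel secret."""
    return secrets.token_bytes(32)


def compute_bind_token(bind_secret: bytes, tunnel_id: uuid.UUID) -> bytes:
    """Manager and edge side: derive the bind token from the secret.

    ASSUMPTION: the HMAC message is the 16-byte tunnel UUID.
    """
    return hmac.new(bind_secret, tunnel_id.bytes, hashlib.sha256).digest()


def token_matches(stored_token: bytes, presented_token: bytes) -> bool:
    """Relay side: constant-time comparison of stored and presented tokens."""
    return hmac.compare_digest(stored_token, presented_token)
```

Because the relay receives only the precomputed token, it can verify binds without ever holding the secret itself.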
Backwards compatibility
If no authorisation has been registered for a tunnel UUID, the relay falls back to unauthenticated bind (any edge that knows the UUID can bind). This is for backwards compatibility with older managers that don’t yet send authorize_tunnel. To enforce bind authentication everywhere, make sure the manager is configured to call authorize_tunnel for every tunnel it creates.
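The fallback behaviour amounts to a three-way decision, sketched below. This is an illustration of the rule as described, not the relay's actual code: the BindDecision enum, the decide_bind function, and the dictionary of registered authorisations are all hypothetical, and revocation is deliberately left out because the source does not describe how revoked tunnels are tracked.

```python
import hmac
from enum import Enum
from typing import Dict, Optional


class BindDecision(Enum):
    ACCEPT_AUTHENTICATED = "accept_authenticated"
    ACCEPT_UNAUTHENTICATED = "accept_unauthenticated"  # legacy fallback
    REJECT = "reject"


def decide_bind(
    authorised: Dict[str, bytes],
    tunnel_id: str,
    bind_token: Optional[bytes],
) -> BindDecision:
    """Decide how to treat a bind request for tunnel_id."""
    expected = authorised.get(tunnel_id)
    if expected is None:
        # No authorisation registered: any edge that knows the UUID may bind.
        return BindDecision.ACCEPT_UNAUTHENTICATED
    if bind_token is None:
        # An authorisation exists, so a missing token is a failure.
        return BindDecision.REJECT
    if hmac.compare_digest(expected, bind_token):
        return BindDecision.ACCEPT_AUTHENTICATED
    return BindDecision.REJECT
```

The first branch is the one to eliminate in production: once every tunnel has a registered authorisation, no bind can reach it.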
Layer 4 — REST API Bearer token
The relay exposes a small REST API for stats and topology inspection:
| Endpoint | Auth required (when api_token is set) |
|---|---|
| GET /health | No (always public) |
| GET /metrics | Yes |
| GET /api/v1/tunnels | Yes |
| GET /api/v1/edges | Yes |
| GET /api/v1/stats | Yes |
To enable token auth, set api_token in the relay config to a 32–128 character string:
api_token = "f3a6b8c1d4e7..."
Clients must then send Authorization: Bearer <token> on every request to a non-/health endpoint. The token is checked with constant-time comparison.
If api_token is unset, all endpoints are open and the relay logs a startup warning. This is permitted for development and isolated networks but not recommended for anything reachable from the public internet.
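A minimal sketch of the rules above, assuming nothing about the relay's real HTTP stack: one helper generates a token inside the 32–128 character range, and another applies the public-path, open-mode, and Bearer checks in order. The function names and the PUBLIC_PATHS set are hypothetical.

```python
import hmac
import secrets
from typing import Optional

PUBLIC_PATHS = {"/health"}  # always reachable without a token


def generate_api_token() -> str:
    """Return 64 hex characters (32 random bytes), within the 32-128 range."""
    return secrets.token_hex(32)


def is_authorized(
    path: str,
    authorization_header: Optional[str],
    api_token: Optional[str],
) -> bool:
    """Apply the documented access rules to a single request."""
    if path in PUBLIC_PATHS:
        return True
    if api_token is None:
        # api_token unset: every endpoint is open (development mode).
        return True
    if authorization_header is None:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison, as the relay is described as doing.
    return hmac.compare_digest(presented.encode(), api_token.encode())
```

A client would then send the header as, for example, Authorization: Bearer followed by the generated value on every call except /health.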
Layer 5 — Manager WebSocket
The relay can optionally connect outbound to a bilbycast-manager via the same WebSocket protocol used by edges. The auth model is identical:
- Initial registration uses a short-lived token issued by the manager.
- The manager mints a permanent node_secret that the relay stores in its config.
- Subsequent reconnects authenticate with the secret.
- The relay enforces wss:// and supports accept_self_signed_cert (gated by BILBYCAST_ALLOW_INSECURE=1) and cert pinning (cert_fingerprint).
This connection is the channel the manager uses to call authorize_tunnel, revoke_tunnel, disconnect_edge, and close_tunnel.
Hardening checklist
For production deployments:
- Provide a real TLS cert for the relay (BILBYCAST_RELAY_CERT / BILBYCAST_RELAY_KEY). Don’t rely on the self-signed fallback.
- Set api_token in the relay config to a long random value.
- Configure the manager to issue authorize_tunnel for every tunnel; never rely on the unauthenticated-bind fallback.
- Distribute tunnel_encryption_key only via the manager, never out-of-band by hand.
- On edge configs, prefer cert_fingerprint over accept_self_signed_cert.
- Run the relay behind a firewall that only exposes the QUIC port (default 4433) and the REST API port to the systems that need them.
- Monitor the relay’s event stream for tunnel.bind_rejected events; repeated failures indicate either misconfiguration or an active attack.
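One way to act on the last checklist item is a sliding-window counter over rejected binds, sketched below. The source does not document the event payload, so the (event type, tunnel ID, timestamp) shape, the class name, and the default threshold and window are all assumptions to adapt to the real event stream.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

BIND_REJECTED = "tunnel.bind_rejected"  # event name from the checklist


class BindRejectionMonitor:
    """Flag tunnels that see repeated bind rejections within a time window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, event_type: str, tunnel_id: str, timestamp: float) -> bool:
        """Record one event; return True if the tunnel has reached the threshold."""
        if event_type != BIND_REJECTED:
            return False
        recent = self._recent[tunnel_id]
        recent.append(timestamp)
        # Drop rejections that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.threshold
```

A True result does not distinguish misconfiguration from an attack; it only marks the tunnel as worth a closer look.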
What the relay logs and what it doesn’t
| Logged | Not logged |
|---|---|
| Connection lifecycle (edge connect/disconnect, tunnel bind/unbind) | Tunnel ciphertext or any decrypted payload |
| Bind authentication failures | Tunnel encryption keys or bind secrets |
| Push status updates from manager commands | Edge-to-edge media content |
| Stats and bandwidth counters | Specific source/destination IPs of the encapsulated traffic |
| TLS handshake errors | Anything that would let an attacker correlate observed bytes back to a flow |