Security & Authentication

bilbycast-relay is stateless and zero-knowledge — it forwards encrypted bytes between edges without ever being able to read them — but it still has security-relevant surfaces that operators need to configure correctly. This page covers all of them.

Threat model in one paragraph

bilbycast-relay sits between two edge nodes that are typically behind NAT. It pairs them by tunnel UUID and forwards packets. The edges use a shared 32-byte ChaCha20-Poly1305 key (distributed by the manager) to encrypt every payload before it touches the relay, so a compromised relay leaks only tunnel UUIDs, ciphertext, packet sizes, and timing. We defend against four attackers: one who runs a malicious relay, one on the network path between edge and relay, one who tries to bind to a tunnel UUID they don’t own, and one who calls the relay’s REST API to enumerate or disrupt other tenants’ tunnels.

Layer 1 — TLS 1.3 via QUIC

On the QUIC carrier, every edge-to-relay connection is protected by TLS 1.3, which QUIC mandates. ALPN is enforced: the relay only accepts the bilbycast-relay protocol identifier, so a client offering any other ALPN cannot complete the handshake even if it reaches the QUIC port.

The native-UDP carrier (plain UDP, used for native SRT/RIST and bond legs) has no transport TLS — its confidentiality rests entirely on Layer 2 below. That is by design: the carrier exists precisely to avoid QUIC’s per-packet overhead for inner protocols that already encrypt and recover their own traffic, and the relay never sees anything but Layer-2 ciphertext on it either.

The relay generates a self-signed cert at startup if none is configured (tls_cert_path / tls_key_path in the relay config). Edges connecting to a self-signed relay must explicitly opt in (accept_self_signed_cert: true plus BILBYCAST_ALLOW_INSECURE=1) — the same safety guard as the manager. For production, supply a real cert.
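
The double opt-in can be pictured as follows. This is a minimal sketch, not the edge's actual code: the function name `self_signed_allowed` is hypothetical, and treating only the literal value `1` as enabling is an assumption. Only `accept_self_signed_cert` and `BILBYCAST_ALLOW_INSECURE` come from the text above.

```python
import os


def self_signed_allowed(accept_self_signed_cert: bool, environ=None) -> bool:
    """Return True only when BOTH opt-ins are present (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: only the literal value "1" enables the override.
    env_ok = env.get("BILBYCAST_ALLOW_INSECURE") == "1"
    return bool(accept_self_signed_cert) and env_ok


# The config flag alone is not enough; the environment variable must agree.
print(self_signed_allowed(True, {"BILBYCAST_ALLOW_INSECURE": "1"}))
print(self_signed_allowed(True, {}))
```

Requiring both settings means a stray config flag copied from a lab setup cannot silently disable certificate validation in production.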

Edges can also pin the relay’s cert via cert_fingerprint (SHA-256), which validates the exact cert without trusting any CA store.
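
As an illustration of what pinning checks, the sketch below hashes a DER-encoded certificate with SHA-256 and compares it to a pinned value. The helper names are hypothetical, and the accepted pin format (hex, optionally colon-separated, any case) is an assumption; consult the edge configuration reference for the exact format `cert_fingerprint` expects.

```python
import hashlib
import hmac


def sha256_fingerprint(cert_der: bytes) -> str:
    """Lowercase hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()


def matches_pin(cert_der: bytes, pinned: str) -> bool:
    # Assumption: the pin may be written with or without colons, in any case.
    normalized = pinned.replace(":", "").lower()
    return hmac.compare_digest(sha256_fingerprint(cert_der), normalized)


# Placeholder bytes standing in for a real DER certificate.
fake_cert = b"not a real certificate"
pin = sha256_fingerprint(fake_cert)
print(matches_pin(fake_cert, pin))
print(matches_pin(b"a different certificate", pin))
```

Because the comparison is against the exact certificate bytes, rotating the relay's certificate requires updating the pin on every edge.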

Layer 2 — End-to-end ChaCha20-Poly1305

This is the crucial layer. The relay is zero-knowledge by design: every payload is encrypted by the source edge with ChaCha20-Poly1305 (AEAD) using a 32-byte key (tunnel_encryption_key) that the manager generates and distributes to both edges, out of band from the relay. The relay sees only:

The tunnel UUID (used to route to the peer)
The ciphertext + 16-byte authentication tag
A 12-byte nonce
Packet sizes and timing

It cannot read the plaintext, modify it without breaking the auth tag, or replay packets across tunnels: each tunnel has its own key, so a packet captured on one tunnel fails authentication on any other.

Per-packet overhead: 28 bytes (12-byte nonce + 16-byte Poly1305 tag).
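
A quick worked example of that arithmetic. The 1316-byte payload is an illustrative choice (seven 188-byte MPEG-TS packets, a common SRT payload size), not a bilbycast requirement, and the figure excludes the tunnel ID or length prefix described under wire framing.

```python
NONCE_LEN = 12  # bytes
TAG_LEN = 16    # bytes, Poly1305 authentication tag
OVERHEAD = NONCE_LEN + TAG_LEN


def encrypted_size(plaintext_len: int) -> int:
    """Size of nonce + ciphertext + tag for a given plaintext length."""
    return plaintext_len + OVERHEAD


payload = 1316  # illustrative: 7 x 188-byte MPEG-TS packets
total = encrypted_size(payload)
print(total, f"{OVERHEAD / payload:.2%}")  # prints: 1344 2.13%
```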

Wire framing

Transport	Framing
TCP	`[4-byte BE length][nonce + ciphertext + tag]` per encrypted record
UDP	Tunnel ID prefix + (`nonce + ciphertext + tag`) — payload encrypted before the tunnel ID is prepended

This means the relay can de-multiplex by tunnel UUID without ever having to decrypt anything.
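
The two framings can be sketched as below. This is an illustration under stated assumptions rather than the wire specification: the function names are hypothetical, and encoding the tunnel ID as the 16 raw bytes of the UUID is an assumption. The encrypted record is treated as an opaque blob, which is exactly how the relay handles it.

```python
import struct
import uuid

NONCE_LEN = 12
TAG_LEN = 16
TUNNEL_ID_LEN = 16  # assumption: the tunnel UUID travels as 16 raw bytes


def frame_tcp(record: bytes) -> bytes:
    """Prefix an encrypted record (nonce + ciphertext + tag) with a 4-byte BE length."""
    return struct.pack(">I", len(record)) + record


def frame_udp(tunnel_id: uuid.UUID, record: bytes) -> bytes:
    """Prepend the tunnel ID to an already-encrypted record."""
    return tunnel_id.bytes + record


def demux_udp(datagram: bytes) -> tuple[uuid.UUID, bytes]:
    """Split a datagram into (tunnel ID, opaque record) without decrypting."""
    if len(datagram) < TUNNEL_ID_LEN + NONCE_LEN + TAG_LEN:
        raise ValueError("datagram too short")
    return uuid.UUID(bytes=datagram[:TUNNEL_ID_LEN]), datagram[TUNNEL_ID_LEN:]


# Stand-in for real AEAD output: 12-byte nonce, 5 bytes of ciphertext, 16-byte tag.
record = bytes(NONCE_LEN) + b"\xaa" * 5 + bytes(TAG_LEN)
tid = uuid.uuid4()
routed_id, opaque = demux_udp(frame_udp(tid, record))
print(routed_id == tid, len(opaque), frame_tcp(record)[:4].hex())
```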

Layer 3 — Per-tunnel HMAC bind tokens

The end-to-end encryption protects the payload, but it doesn’t on its own prevent an attacker from binding to a tunnel UUID they don’t own and exhausting relay resources. To close that gap, the relay supports optional per-tunnel bind authentication managed by the manager:

The manager generates a 32-byte secret per tunnel (tunnel_bind_secret).
The manager sends an authorize_tunnel command to the relay, providing the tunnel UUID and a precomputed HMAC-SHA256 token derived from the secret.
The manager distributes the secret to both edges.
When an edge binds to the tunnel, it computes the same HMAC and includes it in the TunnelBind message as bind_token.
The relay compares the bind token to its stored authorisation using a constant-time comparison (so timing attacks can’t recover the expected token byte by byte).
Mismatched or missing tokens cause the bind to be rejected with a TunnelDown notification, which surfaces to the manager as an event.
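
The token exchange above can be sketched with the standard library. The text does not say what message the HMAC covers, so using the raw 16-byte tunnel UUID as the message is an assumption, and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets
import uuid


def bind_token(tunnel_bind_secret: bytes, tunnel_id: uuid.UUID) -> bytes:
    # Assumption: the HMAC message is the raw 16-byte tunnel UUID.
    return hmac.new(tunnel_bind_secret, tunnel_id.bytes, hashlib.sha256).digest()


def token_matches(stored: bytes, presented: bytes) -> bool:
    """Constant-time comparison, as the relay is described as doing."""
    return hmac.compare_digest(stored, presented)


secret = secrets.token_bytes(32)         # tunnel_bind_secret, generated by the manager
tunnel = uuid.uuid4()
authorized = bind_token(secret, tunnel)  # sent to the relay with authorize_tunnel
from_edge = bind_token(secret, tunnel)   # computed by an edge holding the secret
forged = bind_token(secrets.token_bytes(32), tunnel)
print(token_matches(authorized, from_edge))
print(token_matches(authorized, forged))
```

Note that in this scheme the relay only ever holds the derived token, never the secret itself.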

To revoke an authorisation, the manager sends revoke_tunnel — subsequent binds with the old token are rejected.

Backwards compatibility

If no authorisation has been registered for a tunnel UUID, the relay falls back to unauthenticated bind (any edge that knows the UUID can bind). This is for backwards compatibility with older managers that don’t yet send authorize_tunnel. To enforce bind authentication everywhere, make sure the manager is configured to call authorize_tunnel for every tunnel it creates.
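
The resulting bind decision, including the fallback, looks roughly like this. `BindAuthorizer` is a hypothetical in-memory model, not the relay's real data structure. Handling of revoke_tunnel is deliberately left out, because the text does not describe how a revocation is stored.

```python
import hmac
from typing import Optional


class BindAuthorizer:
    """Hypothetical model of the relay's per-tunnel authorisation table."""

    def __init__(self) -> None:
        self._tokens: dict[str, bytes] = {}

    def authorize_tunnel(self, tunnel_id: str, token: bytes) -> None:
        self._tokens[tunnel_id] = token

    def allow_bind(self, tunnel_id: str, bind_token: Optional[bytes]) -> bool:
        expected = self._tokens.get(tunnel_id)
        if expected is None:
            # No authorisation registered: unauthenticated fallback.
            return True
        if bind_token is None:
            return False
        return hmac.compare_digest(expected, bind_token)


relay = BindAuthorizer()
print(relay.allow_bind("tunnel-a", None))          # fallback: accepted
relay.authorize_tunnel("tunnel-a", b"\x01" * 32)
print(relay.allow_bind("tunnel-a", None))          # token now required
print(relay.allow_bind("tunnel-a", b"\x01" * 32))  # correct token
```

The first call shows why the fallback matters: until the manager registers an authorisation, knowing the UUID is sufficient to bind.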

Layer 4 — REST API Bearer token

The relay exposes a small REST API for stats and topology inspection:

Endpoint	Auth required (when `api_token` is set)
`GET /health`	No — always public
`GET /metrics`	Yes
`GET /api/v1/tunnels`	Yes
`GET /api/v1/edges`	Yes
`GET /api/v1/stats`	Yes

To enable token auth, set api_token in the relay config to a 32–128 character string:

api_token = "f3a6b8c1d4e7..."

Clients must then send Authorization: Bearer <token> on every request except GET /health. The token is checked with a constant-time comparison.
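
One way to produce a suitable token and the matching request header is sketched below. The helper names are hypothetical, and hex encoding is just one convenient choice that stays inside the 32–128 character range.

```python
import secrets


def new_api_token() -> str:
    """Generate a 64-character hex token from 32 random bytes."""
    return secrets.token_hex(32)


def auth_headers(token: str) -> dict[str, str]:
    """Build the header a client sends to authenticated endpoints."""
    if not 32 <= len(token) <= 128:
        raise ValueError("api_token must be 32-128 characters")
    return {"Authorization": f"Bearer {token}"}


token = new_api_token()
headers = auth_headers(token)
print(len(token), headers["Authorization"].startswith("Bearer "))
```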

If api_token is unset, all endpoints are open and the relay logs a startup warning. This is permitted for development and isolated networks but not recommended for anything reachable from the public internet.
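
Taken together, the access rules can be modelled as a single decision function. This is a sketch of the documented behaviour, not the relay's code: `request_allowed` is a hypothetical name, and it assumes header names have already been normalised to the canonical `Authorization` spelling.

```python
import hmac
from typing import Mapping, Optional


def request_allowed(path: str, headers: Mapping[str, str],
                    api_token: Optional[str]) -> bool:
    if path == "/health":
        return True  # always public
    if api_token is None:
        return True  # api_token unset: every endpoint is open
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison of the presented and configured tokens.
    return hmac.compare_digest(presented.encode(), api_token.encode())


configured = "x" * 32
print(request_allowed("/metrics", {}, configured))
print(request_allowed("/metrics", {"Authorization": f"Bearer {configured}"}, configured))
print(request_allowed("/api/v1/tunnels", {}, None))
```

The last call returns True, which is the open mode the startup warning is about.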

Layer 5 — Manager WebSocket

The relay can optionally connect outbound to a bilbycast-manager via the same WebSocket protocol used by edges. The auth model is identical:

Initial registration uses a short-lived token issued by the manager.
The manager mints a permanent node_secret that the relay stores in its config.
Subsequent reconnects authenticate with the secret.
The relay enforces wss:// and supports accept_self_signed_cert (gated by BILBYCAST_ALLOW_INSECURE=1) and cert pinning (cert_fingerprint).

This connection is the channel the manager uses to call authorize_tunnel, revoke_tunnel, disconnect_edge, and close_tunnel.
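
The wss:// requirement amounts to a scheme check on the configured manager URL before any connection is attempted. The sketch below shows the idea; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def check_manager_url(url: str) -> str:
    """Reject any manager URL that is not wss:// (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "wss":
        raise ValueError(f"manager URL must use wss://, got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("manager URL has no host")
    return url


print(check_manager_url("wss://manager.example.com:8443/ws"))
try:
    check_manager_url("ws://manager.example.com/ws")
except ValueError as err:
    print(err)
```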

Hardening checklist

For production deployments:

Provide a real TLS cert for the relay (tls_cert_path / tls_key_path in the relay config). Don’t rely on the self-signed fallback.
Set api_token in the relay config to a long random value.
Configure the manager to issue authorize_tunnel for every tunnel — never rely on the unauthenticated-bind fallback.
Distribute tunnel_encryption_key only via the manager, never out-of-band by hand.
On edge configs, prefer cert_fingerprint over accept_self_signed_cert.
Run the relay behind a firewall that only exposes the QUIC port (default 4433), the native-UDP carrier port (default 4434, if you use native SRT/RIST or bond legs over relay), and the REST API port to the systems that need them.
Monitor the relay’s event stream for tunnel.bind_rejected events — repeated failures indicate either misconfiguration or an active attack.
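
For the last item, a simple sliding-window counter is often enough to separate a one-off misconfiguration from a sustained attack. The sketch below assumes each event is a dict with `type`, `tunnel_id`, and `timestamp` (seconds) fields; that shape, the class name, and the default thresholds are all assumptions, and only the event name tunnel.bind_rejected comes from this page.

```python
from collections import defaultdict, deque


class BindRejectionMonitor:
    """Flag tunnels with many bind rejections inside a time window."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._seen: dict[str, deque] = defaultdict(deque)

    def observe(self, event: dict) -> bool:
        """Return True when this event brings a tunnel to the threshold."""
        if event.get("type") != "tunnel.bind_rejected":
            return False
        times = self._seen[event["tunnel_id"]]
        now = event["timestamp"]
        times.append(now)
        # Drop rejections that have aged out of the window.
        while times and now - times[0] > self.window_s:
            times.popleft()
        return len(times) >= self.threshold


monitor = BindRejectionMonitor(threshold=3, window_s=10.0)
for t in (0.0, 1.0, 2.0):
    alert = monitor.observe(
        {"type": "tunnel.bind_rejected", "tunnel_id": "tunnel-a", "timestamp": t}
    )
print(alert)
```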

What the relay logs and what it doesn’t

Logged	Not logged
Connection lifecycle (edge connect/disconnect, tunnel bind/unbind)	Tunnel ciphertext or any decrypted payload
Bind authentication failures	Tunnel encryption keys or bind secrets
Push status updates from manager commands	Edge-to-edge media content
Stats and bandwidth counters	Specific source/destination IPs of the encapsulated traffic
TLS handshake errors	Anything that would let an attacker correlate observed bytes back to a flow