Security & Authentication

bilbycast-relay is stateless and zero-knowledge — it forwards encrypted bytes between edges without ever being able to read them — but it still has security-relevant surfaces that operators need to configure correctly. This page covers all of them.

bilbycast-relay sits between two edge nodes that are typically behind NAT. It pairs them by tunnel UUID and forwards packets. The edges use a shared 32-byte ChaCha20-Poly1305 key (distributed by the manager) to encrypt every payload before it touches the relay, so a compromised relay leaks only ciphertext, packet sizes, and timing.

Attackers we defend against:

  • An attacker who can run a malicious relay
  • An attacker on the network between edge and relay
  • An attacker who tries to bind to a tunnel UUID they don’t own
  • An attacker who tries to call the relay’s REST API to enumerate or disrupt other tenants’ tunnels

The transport from any edge to the relay is QUIC, which mandates TLS 1.3. ALPN is enforced — the relay only accepts the bilbycast-relay protocol identifier, which prevents anyone speaking a different ALPN from completing the handshake even if they reach the QUIC port.

If no certificate is configured (BILBYCAST_RELAY_CERT / BILBYCAST_RELAY_KEY), the relay generates a self-signed cert at startup. Edges connecting to a self-signed relay must explicitly opt in by setting both accept_self_signed_cert: true and BILBYCAST_ALLOW_INSECURE=1, the same safety guard the manager uses. For production, supply a real cert.

Edges can also pin the relay’s cert via cert_fingerprint (SHA-256), which validates the exact cert without trusting any CA store.
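As an illustration of what pinning means, here is a minimal Python sketch of a fingerprint check. It assumes the fingerprint is the SHA-256 digest of the DER-encoded certificate rendered as hex, optionally colon-separated. The exact format cert_fingerprint expects is not stated on this page, and the function names are hypothetical.

```python
import hashlib


def cert_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 digest of a DER-encoded certificate as lowercase hex.

    Assumption: the pin covers the whole DER certificate, not just the public key.
    """
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned: str) -> bool:
    """Compare a presented certificate against a configured fingerprint.

    Accepts "ab:cd:..." as well as "abcd..." and ignores case (an assumption
    about how operators might paste the value).
    """
    normalised = pinned.replace(":", "").strip().lower()
    return cert_fingerprint(der_cert) == normalised
```

Because the check compares the exact certificate bytes, it works the same for a self-signed cert and a CA-issued one; no CA store is consulted.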

This is the crucial layer. The relay is zero-knowledge by design: every payload is encrypted by the source edge with ChaCha20-Poly1305 (AEAD) using a 32-byte key (tunnel_encryption_key) that the manager generates and distributes to both edges. The key never passes through the relay. The relay sees only:

  • The tunnel UUID (used to route to the peer)
  • The ciphertext + 16-byte authentication tag
  • A 12-byte nonce
  • Packet sizes and timing

It cannot read the plaintext, modify it without breaking the auth tag, or replay packets across tunnels (the nonce + key combination is per-tunnel).

Per-packet overhead: 28 bytes (12-byte nonce + 16-byte Poly1305 tag).

| Transport | Framing |
|---|---|
| TCP | [4-byte BE length][nonce + ciphertext + tag] per encrypted record |
| UDP | Tunnel ID prefix + (nonce + ciphertext + tag); the payload is encrypted before the tunnel ID is prepended |

This means the relay can de-multiplex by tunnel UUID without ever having to decrypt anything.
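The framing above can be sketched in Python as follows. This is an illustration under stated assumptions, not the relay's actual code: it assumes the 4-byte length counts the bytes that follow it, and that the UDP tunnel ID prefix is the 16 raw bytes of the tunnel UUID. All function names are hypothetical, and the record is treated as an opaque blob because the relay never decrypts it.

```python
import struct
import uuid

NONCE_LEN = 12                  # ChaCha20-Poly1305 nonce
TAG_LEN = 16                    # Poly1305 authentication tag
OVERHEAD = NONCE_LEN + TAG_LEN  # 28 bytes per packet
TUNNEL_ID_LEN = 16              # assumption: raw UUID bytes as the UDP prefix


def frame_tcp(record: bytes) -> bytes:
    """Prefix an encrypted record (nonce + ciphertext + tag) with a 4-byte BE length."""
    return struct.pack(">I", len(record)) + record


def read_tcp_frames(buf: bytes):
    """Split a byte stream into complete records; return (records, leftover bytes)."""
    records = []
    offset = 0
    while len(buf) - offset >= 4:
        (length,) = struct.unpack_from(">I", buf, offset)
        if len(buf) - offset - 4 < length:
            break  # partial record, wait for more bytes
        records.append(buf[offset + 4:offset + 4 + length])
        offset += 4 + length
    return records, buf[offset:]


def frame_udp(tunnel_id: uuid.UUID, record: bytes) -> bytes:
    """Prepend the tunnel ID to an already-encrypted record."""
    return tunnel_id.bytes + record


def split_udp(datagram: bytes):
    """Recover (tunnel UUID, opaque record) for routing, without any decryption."""
    if len(datagram) < TUNNEL_ID_LEN + OVERHEAD:
        raise ValueError("datagram too short to hold a tunnel ID and an AEAD record")
    return uuid.UUID(bytes=datagram[:TUNNEL_ID_LEN]), datagram[TUNNEL_ID_LEN:]
```

With a 5-byte plaintext, the record is 5 + 28 = 33 bytes, so the TCP length prefix would be 00 00 00 21 under these assumptions.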

The end-to-end encryption protects the payload, but it doesn’t on its own prevent an attacker from binding to a tunnel UUID they don’t own and exhausting relay resources. To close that gap, the relay supports optional per-tunnel bind authentication managed by the manager:

  1. The manager generates a 32-byte secret per tunnel (tunnel_bind_secret).
  2. The manager sends an authorize_tunnel command to the relay, providing the tunnel UUID and a precomputed HMAC-SHA256 token derived from the secret.
  3. The manager distributes the secret to both edges.
  4. When an edge binds to the tunnel, it computes the same HMAC and includes it in the TunnelBind message as bind_token.
  5. The relay compares the bind token to its stored authorisation with constant-time comparison (so timing differences can’t be used to recover the expected token byte by byte).
  6. Mismatched or missing tokens cause the bind to be rejected with a TunnelDown notification, which surfaces to the manager as an event.

To revoke an authorisation, the manager sends revoke_tunnel — subsequent binds with the old token are rejected.

If no authorisation has been registered for a tunnel UUID, the relay falls back to unauthenticated bind (any edge that knows the UUID can bind). This is for backwards compatibility with older managers that don’t yet send authorize_tunnel. To enforce bind authentication everywhere, make sure the manager is configured to call authorize_tunnel for every tunnel it creates.
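The bind flow, including the fallback, can be sketched in Python as below. Several details are assumptions, because this page does not specify them: the HMAC input is taken to be the raw tunnel UUID bytes, and a revoked tunnel is modelled with a tombstone so that revocation does not drop back to the unauthenticated path. The class and function names are hypothetical; only authorize_tunnel and revoke_tunnel mirror the command names above.

```python
import hashlib
import hmac
import uuid
from typing import Dict, Optional, Set


def bind_token(tunnel_bind_secret: bytes, tunnel_id: uuid.UUID) -> bytes:
    """Derive a bind token. Assumption: HMAC-SHA256(secret, tunnel UUID bytes)."""
    return hmac.new(tunnel_bind_secret, tunnel_id.bytes, hashlib.sha256).digest()


class BindAuthorizer:
    """Relay-side state: it stores tokens only, never the bind secret itself."""

    def __init__(self) -> None:
        self._tokens: Dict[uuid.UUID, bytes] = {}
        self._revoked: Set[uuid.UUID] = set()

    def authorize_tunnel(self, tunnel_id: uuid.UUID, token: bytes) -> None:
        self._revoked.discard(tunnel_id)
        self._tokens[tunnel_id] = token

    def revoke_tunnel(self, tunnel_id: uuid.UUID) -> None:
        # Assumption: keep a tombstone so old tokens are rejected after revocation.
        self._tokens.pop(tunnel_id, None)
        self._revoked.add(tunnel_id)

    def check_bind(self, tunnel_id: uuid.UUID, token: Optional[bytes]) -> bool:
        if tunnel_id in self._revoked:
            return False
        expected = self._tokens.get(tunnel_id)
        if expected is None:
            return True  # no authorisation registered: unauthenticated fallback
        if token is None:
            return False  # authorisation exists but the edge sent no bind_token
        return hmac.compare_digest(expected, token)  # constant-time comparison
```

The manager would compute bind_token once and pass it in authorize_tunnel; each edge computes the same value from its copy of the secret and sends it as bind_token in TunnelBind.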

The relay exposes a small REST API for stats and topology inspection:

| Endpoint | Auth required (when api_token is set) |
|---|---|
| GET /health | No (always public) |
| GET /metrics | Yes |
| GET /api/v1/tunnels | Yes |
| GET /api/v1/edges | Yes |
| GET /api/v1/stats | Yes |

To enable token auth, set api_token in the relay config to a 32–128 character string:

api_token = "f3a6b8c1d4e7..."

Clients must then send Authorization: Bearer <token> on every request to a non-/health endpoint. The token is checked with constant-time comparison.

If api_token is unset, all endpoints are open and the relay logs a startup warning. This is permitted for development and isolated networks but not recommended for anything reachable from the public internet.
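The access rules for the REST API can be summarised in a short Python sketch. It is an illustration of the behaviour described here, not the relay's implementation; the function names are hypothetical, and generating the token with secrets.token_hex is just one way to get a value inside the 32–128 character range.

```python
import hmac
import secrets
from typing import Optional

PUBLIC_PATHS = {"/health"}  # always reachable without a token


def generate_api_token() -> str:
    """Return 64 random hex characters, within the 32-128 character range."""
    return secrets.token_hex(32)


def is_authorized(path: str,
                  authorization_header: Optional[str],
                  api_token: Optional[str]) -> bool:
    """Decide whether a request may proceed under the rules described above."""
    if path in PUBLIC_PATHS:
        return True
    if api_token is None:
        return True  # api_token unset: every endpoint is open
    prefix = "Bearer "
    if authorization_header is None or not authorization_header.startswith(prefix):
        return False
    presented = authorization_header[len(prefix):]
    # Constant-time comparison, so response timing does not leak the token.
    return hmac.compare_digest(presented.encode(), api_token.encode())
```

A client would then send the header Authorization: Bearer followed by the token on every request except GET /health.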

The relay can optionally connect outbound to a bilbycast-manager via the same WebSocket protocol used by edges. The auth model is identical:

  • Initial registration uses a short-lived token issued by the manager.
  • The manager mints a permanent node_secret that the relay stores in its config.
  • Subsequent reconnects authenticate with the secret.
  • The relay enforces wss:// and supports accept_self_signed_cert (gated by BILBYCAST_ALLOW_INSECURE=1) and cert pinning (cert_fingerprint).

This connection is the channel the manager uses to call authorize_tunnel, revoke_tunnel, disconnect_edge, and close_tunnel.

For production deployments:

  • Provide a real TLS cert for the relay (BILBYCAST_RELAY_CERT / BILBYCAST_RELAY_KEY). Don’t rely on the self-signed fallback.
  • Set api_token in the relay config to a long random value.
  • Configure the manager to issue authorize_tunnel for every tunnel — never rely on the unauthenticated-bind fallback.
  • Distribute tunnel_encryption_key only via the manager, never out-of-band by hand.
  • On edge configs, prefer cert_fingerprint over accept_self_signed_cert.
  • Run the relay behind a firewall that only exposes the QUIC port (default 4433) and the REST API port to the systems that need them.
  • Monitor the relay’s event stream for tunnel.bind_rejected events; repeated failures indicate either misconfiguration or an active attack.

| Logged | Not logged |
|---|---|
| Connection lifecycle (edge connect/disconnect, tunnel bind/unbind) | Tunnel ciphertext or any decrypted payload |
| Bind authentication failures | Tunnel encryption keys or bind secrets |
| Push status updates from manager commands | Edge-to-edge media content |
| Stats and bandwidth counters | Specific source/destination IPs of the encapsulated traffic |
| TLS handshake errors | Anything that would let an attacker correlate observed bytes back to a flow |