Events & Alarms
bilbycast-edge generates operational events and forwards them to bilbycast-manager via WebSocket. Events provide real-time visibility into connection state changes, failures, and significant operational conditions that go beyond periodic stats and health messages.

Event Protocol
Events are sent as WebSocket messages with type "event":

    {
      "type": "event",
      "timestamp": "2026-04-02T12:00:00Z",
      "payload": {
        "severity": "warning",
        "category": "srt",
        "message": "SRT input disconnected, reconnecting",
        "flow_id": "flow-1",
        "details": { ... }
      }
    }

Severity Levels
| Severity | Meaning | Action |
|---|---|---|
| critical | Service-impacting failure | Operator should investigate immediately |
| warning | Degradation or potential issue | Operator should investigate when possible |
| info | Notable state change | No action required, operational awareness |
Event Fields
| Field | Type | Required | Description |
|---|---|---|---|
| severity | string | yes | "info", "warning", or "critical" |
| category | string | yes | Event category (see tables below) |
| message | string | yes | Human-readable description |
| flow_id | string | no | Associated flow or tunnel ID |
| details | object | no | Structured context (error codes, peer addresses, etc.) |
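A consumer of this protocol might validate each frame against the field table before acting on it. The sketch below is one way to do that in Python; the function name `parse_event` and the normalised return shape are illustrative assumptions, not part of bilbycast.

```python
import json

SEVERITIES = {"info", "warning", "critical"}
REQUIRED_PAYLOAD_FIELDS = ("severity", "category", "message")


def parse_event(raw: str) -> dict:
    """Parse one WebSocket text frame and return a normalised event.

    Hypothetical helper: raises ValueError if the frame is not a
    well-formed event according to the field table above.
    """
    msg = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(msg, dict) or msg.get("type") != "event":
        raise ValueError("not an event message")
    payload = msg.get("payload")
    if not isinstance(payload, dict):
        raise ValueError("event has no payload object")
    for field in REQUIRED_PAYLOAD_FIELDS:
        if not isinstance(payload.get(field), str):
            raise ValueError(f"missing or non-string required field: {field}")
    if payload["severity"] not in SEVERITIES:
        raise ValueError(f"unknown severity: {payload['severity']!r}")
    # flow_id and details are optional; fill defaults so callers need no checks.
    return {
        "timestamp": msg.get("timestamp"),
        "severity": payload["severity"],
        "category": payload["category"],
        "message": payload["message"],
        "flow_id": payload.get("flow_id"),
        "details": payload.get("details") or {},
    }
```

Rejecting unknown severities outright is a design choice in this sketch; a more tolerant consumer could map them to "info" instead.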
Deduplication
Events are emitted on state transitions, not periodically. For example, an SRT disconnect event fires once when the connection drops, not repeatedly while disconnected. The corresponding reconnection event fires when the connection is restored.
For protocols with automatic reconnection loops (RTSP, SRT caller), disconnect events fire once per connection cycle — if the remote is unreachable and retries continue, subsequent retry failures are silent until a connection succeeds and then drops again.
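The transition-only behaviour can be pictured as a small state machine that remembers the last reported state. This is a minimal sketch, assuming the connection starts in the disconnected state; the class and method names are hypothetical and do not come from the bilbycast source.

```python
class ConnectionEventEmitter:
    """Emits an event only when the connection state changes (illustrative)."""

    def __init__(self, emit):
        self._emit = emit        # callable(severity, message), supplied by the caller
        self._connected = False  # last state that was reported

    def on_connect_ok(self):
        # disconnected -> connected: report once.
        if not self._connected:
            self._connected = True
            self._emit("info", "connected")

    def on_connect_failed_or_lost(self):
        # connected -> disconnected: report once. Retry failures that happen
        # while already disconnected change nothing and stay silent.
        if self._connected:
            self._connected = False
            self._emit("warning", "disconnected, reconnecting")
```

With this shape, a reconnect loop can call `on_connect_failed_or_lost()` on every failed attempt without flooding the manager.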
Buffering
Events are queued in an unbounded in-memory channel. When the edge is not connected to the manager (e.g., during reconnection), events accumulate and are delivered once the connection is re-established.
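As an illustration of the queue-then-flush behaviour, here is a minimal sketch built on an unbounded FIFO. `EventBuffer` and its methods are hypothetical names; the edge itself uses an in-memory channel as described above, not this class.

```python
from collections import deque


class EventBuffer:
    """Holds events while disconnected and flushes them in order on reconnect."""

    def __init__(self, send):
        self._send = send      # callable(event) that delivers to the manager
        self._queue = deque()  # unbounded FIFO
        self._connected = False

    def push(self, event):
        self._queue.append(event)
        if self._connected:
            self._flush()

    def set_connected(self, connected):
        self._connected = connected
        if connected:
            self._flush()

    def _flush(self):
        # Deliver oldest first so the manager sees events in emission order.
        while self._queue:
            self._send(self._queue.popleft())
```

Because nothing bounds the queue, memory use grows with the number of events produced during an outage. In this sketch, `deque(maxlen=N)` would cap it at the cost of silently discarding the oldest entries.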
Event Reference
Flow Lifecycle (flow)
| Severity | Message | Trigger |
|---|---|---|
| info | Flow ‘{id}’ started | Flow successfully created and running |
| info | Flow ‘{id}’ stopped | Flow stopped by command or config change |
| critical | Flow ‘{id}’ failed to start: {error} | Flow startup error (input bind, output bind, validation) |
| critical | Flow input lost: {error} | Input task exited unexpectedly |
| info | Output ‘{id}’ added to flow ‘{flow_id}’ | Hot-add output succeeded |
| info | Output ‘{id}’ removed from flow ‘{flow_id}’ | Hot-remove output succeeded |
| warning | Output ‘{id}’ failed to start on flow ‘{flow_id}’: {error} | Output startup failure within a running flow |
Source: src/engine/manager.rs, src/engine/input_srt.rs
Bandwidth (bandwidth)
| Severity | Message | Trigger | Details |
|---|---|---|---|
| warning | Flow ‘{id}’ bandwidth exceeded limit ({current} Mbps > {limit} Mbps) | Input bitrate exceeds configured bandwidth_limit.max_bitrate_mbps for the grace period (alarm action) | { current_mbps, limit_mbps, action: "alarm" } |
| critical | Flow ‘{id}’ blocked: bandwidth exceeded limit ({current} Mbps > {limit} Mbps) | Input bitrate exceeds configured limit for the grace period (block action) — packets dropped until bandwidth normalizes | { current_mbps, limit_mbps, action: "block" } |
| info | Flow ‘{id}’ bandwidth returned to normal, unblocked | Bitrate returned within limits after being blocked — flow resumes | { current_mbps, limit_mbps } |
| info | Flow ‘{id}’ bandwidth returned to normal ({current} Mbps <= {limit} Mbps) | Bitrate returned within limits after alarm | { current_mbps, limit_mbps } |
Source: src/engine/bandwidth_monitor.rs
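The grace-period logic in the table can be sketched as follows. This is an assumption-laden illustration, not the code in bandwidth_monitor.rs: the class name, the `sample` method, and the rule that any in-limit sample resets the grace timer are all hypothetical. Only the `current_mbps`, `limit_mbps`, and `action` detail keys come from the table.

```python
class BandwidthMonitor:
    """Trips after the bitrate stays over the limit for a full grace period."""

    def __init__(self, limit_mbps, grace_secs, action="alarm"):
        self.limit_mbps = limit_mbps
        self.grace_secs = grace_secs
        self.action = action      # "alarm" or "block"
        self._over_since = None   # time the bitrate first went over the limit
        self._tripped = False

    def sample(self, now, current_mbps):
        """Feed one bitrate sample; return an event dict on a transition, else None."""
        details = {"current_mbps": current_mbps, "limit_mbps": self.limit_mbps}
        if current_mbps > self.limit_mbps:
            if self._over_since is None:
                self._over_since = now
            if not self._tripped and now - self._over_since >= self.grace_secs:
                self._tripped = True
                severity = "critical" if self.action == "block" else "warning"
                return {"severity": severity, "category": "bandwidth",
                        "details": {**details, "action": self.action}}
            return None
        # Back within the limit: reset the grace timer (assumed behaviour).
        self._over_since = None
        if self._tripped:
            self._tripped = False
            return {"severity": "info", "category": "bandwidth", "details": details}
        return None
```

Short bursts above the limit produce no event in this sketch, because the timer restarts whenever a sample falls back within the limit.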
SRT (srt)
SRT Input
| Severity | Message | Trigger |
|---|---|---|
| info | SRT input connected (mode=listener) | Listener accepted a caller connection |
| info | SRT input connected (mode={mode}) | Caller connected to remote |
| info | SRT input connected (mode=listener, redundant leg 1) | Redundant leg 1 accepted |
| warning | SRT input disconnected, reconnecting | Peer disconnected, reconnection in progress |
| critical | SRT input connection failed: {error} | Listener accept or caller connect failed |
SRT Output
| Severity | Message | Trigger |
|---|---|---|
| info | SRT output ‘{id}’ connected | Peer connected (listener) or caller connected |
| warning | SRT output ‘{id}’ disconnected | Peer disconnected or connection lost |
| warning | SRT output ‘{id}’ stale connection detected | No ACK received after timeout, re-accepting |
| critical | SRT output ‘{id}’ connection failed: {error} | Caller can’t reach remote, or bind fails |
Source: src/engine/input_srt.rs, src/engine/output_srt.rs
SMPTE 2022-7 Redundancy (redundancy)
| Severity | Message | Trigger |
|---|---|---|
| warning | Redundant leg 1 lost | SRT redundant leg 1 stopped receiving |
| warning | Redundant leg 2 lost | SRT redundant leg 2 stopped receiving |
| critical | Both redundant legs lost | No data from either leg, will reconnect |
Source: src/engine/input_srt.rs
RTMP (rtmp)
| Severity | Message | Trigger |
|---|---|---|
| info | RTMP publisher connected | Client connected and started publishing |
| warning | RTMP publisher disconnected | Publisher disconnected |
| critical | RTMP server error: {error} | Server bind or accept failure |
Source: src/engine/input_rtmp.rs
RTSP (rtsp)
| Severity | Message | Trigger |
|---|---|---|
| info | RTSP connected to {url} | RTSP DESCRIBE/SETUP/PLAY succeeded |
| warning | RTSP input disconnected: {error}. Reconnecting in {n}s | Stream lost after a successful connection |
The disconnect event fires once per connection cycle — if the RTSP server is unreachable and the edge retries repeatedly, only the first failure emits an event. A new “connected” event followed by a new “disconnected” event will fire when the connection is established and then lost again.
Source: src/engine/input_rtsp.rs
HLS (hls)
| Severity | Message | Trigger |
|---|---|---|
| warning | HLS segment upload failed: {error} | HTTP PUT fails for a segment |
| critical | HLS output failed: {error} | Output task exited with error |
Source: src/engine/output_hls.rs
WebRTC (webrtc)
| Severity | Message | Trigger |
|---|---|---|
| info | WHIP publisher connected | ICE+DTLS complete on WHIP server input |
| info | WHIP publisher disconnected | Publisher left |
| info | WHEP connected | WHEP client input connected to remote |
| info | WHEP disconnected | WHEP client input disconnected |
| info | WHEP viewer connected | New WHEP viewer joined an output |
| info | WHEP viewer disconnected | WHEP viewer left an output |
| warning | WebRTC session failed: {error} | ICE failure, DTLS error, or session creation error |
| warning | WebRTC session creation failed: {error} | Output session could not be created |
Source: src/engine/input_webrtc.rs, src/engine/output_webrtc.rs
Audio Encoder (audio_encode)
ffmpeg-sidecar audio encoder lifecycle for the Phase B compressed-audio egress on RTMP, HLS, and WebRTC outputs. Emitted by the per-output build helpers (output_rtmp::build_encoder_state, output_hls::hls_output_loop startup gate, output_webrtc::build_webrtc_encoder_state) and the long-running encoder supervisor (engine::audio_encode::supervisor_loop).
| Severity | Message | Trigger |
|---|---|---|
| info | audio encoder started: output ‘{id}’ codec={codec} {N} kbps | First successful ffmpeg spawn for an RTMP/WebRTC output, or HLS startup with audio_encode set. Details payload: { output_id, codec, bitrate_kbps, sample_rate, channels } |
| warning | audio encoder restarted: output ‘{id}’ restart {N}/{max} | Supervisor restarted ffmpeg after a crash. Details payload: { output_id, restart_count, max_restarts } |
| warning | HLS output ‘{id}’: segment {n} remux failed: {error} | Per-segment ffmpeg fork failed on a single HLS segment. The next segment may succeed |
| critical | audio encoder failed: output ‘{id}’ exhausted {N} restarts in {S} s | Supervisor gave up after MAX_RESTARTS in RESTART_WINDOW |
| critical | RTMP/HLS/WebRTC output ‘{id}’: audio_encode requires ffmpeg in PATH but it is not installed | ffmpeg missing at lazy-build time |
| critical | output ‘{id}’: audio_encode requires AAC-LC input … got profile={p} | Phase A AacDecoder rejected the source AAC profile (HE-AAC, AAC-Main, multichannel, etc.) |
| critical | output ‘{id}’: audio_encode is set but the flow input cannot carry TS audio (PCM-only source) | compressed_audio_input is false (e.g. ST 2110-30, rtp_audio input) |
| critical | output ‘{id}’: audio_encode encoder spawn failed: {error} | AudioEncoder::spawn failed for any other reason (codec rejected by ffmpeg, etc.) |
Source: src/engine/audio_encode.rs, src/engine/output_rtmp.rs, src/engine/output_hls.rs, src/engine/output_webrtc.rs.
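The restart-cap behaviour (a warning per restart, then a critical event once MAX_RESTARTS is used up within RESTART_WINDOW) can be illustrated with a sliding-window counter. The sketch below assumes a sliding window and takes the cap and window as parameters; the actual values and windowing strategy of the supervisor are not stated here, and `RestartBudget` is a hypothetical name.

```python
from collections import deque


class RestartBudget:
    """Allows at most max_restarts restarts within any window_secs span."""

    def __init__(self, max_restarts, window_secs):
        self.max_restarts = max_restarts
        self.window_secs = window_secs
        self._restarts = deque()  # timestamps of recent restarts

    def on_crash(self, now):
        """Return ("warning", n) to restart, or ("critical", n) to give up."""
        # Forget restarts that have aged out of the window.
        while self._restarts and now - self._restarts[0] > self.window_secs:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return ("critical", len(self._restarts))
        self._restarts.append(now)
        return ("warning", len(self._restarts))
```

An encoder that crashes occasionally keeps restarting indefinitely under this scheme, while one that crashes in a tight loop is abandoned quickly.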
Tunnel (tunnel)
| Severity | Message | Trigger |
|---|---|---|
| info | Tunnel ‘{name}’ started | Tunnel created and connecting |
| info | Tunnel ‘{name}’ stopped | Tunnel stopped by command or config change |
| info | Tunnel connected to relay | QUIC connection established and TunnelReady received |
| warning | Tunnel disconnected from relay: {reason} | QUIC connection lost or forwarder exited |
| warning | Tunnel peer disconnected: {reason} | Relay reported peer unbound (TunnelDown) |
| warning | Tunnel connection to relay failed: {error} | QUIC connect or TLS error |
| critical | Tunnel ‘{name}’ failed: {error} | Tunnel task exited with fatal error |
| critical | Tunnel bind rejected by relay: {reason} | HMAC bind token verification failed |
Source: src/tunnel/manager.rs, src/tunnel/relay_client.rs
Manager Connection (manager)
| Severity | Message | Trigger |
|---|---|---|
| info | Connected to manager | WebSocket auth succeeded |
| warning | Manager connection lost, reconnecting | WebSocket closed or errored |
| critical | Manager authentication failed: {reason} | Auth rejected by manager |
Source: src/manager/client.rs
Configuration (config)
| Severity | Message | Trigger |
|---|---|---|
| info | Configuration updated | Config applied via manager command |
| warning | Failed to persist configuration: {error} | Config write to disk failed after update |
Source: src/manager/client.rs
PID Bus / Flow Assembly (flow)
Errors generated while bringing up or hot-swapping an assembled flow (assembly.kind = spts | mpts). All pid_bus_* codes except pid_bus_spts_stream_type_mismatch (a non-fatal warning) are emitted as Critical events with a structured details payload (error_code, input_id, input_type, program_number, …). The same error_code rides on the corresponding command_ack.error_code, so the manager UI can highlight the offending field on Create/Update modals without parsing the error string. See Flow Assembly (PID Bus).
| Error code | Trigger | Details payload | Remediation |
|---|---|---|---|
| pid_bus_spts_input_needs_audio_encode | A referenced input could produce TS via input-level audio_encode but isn’t configured. | { input_id, input_type } | Set audio_encode.codec = "aac_lc" (or HE-AAC / s302m) on the input. ST 2110-31 must use s302m. |
| pid_bus_audio_encode_codec_not_supported_on_input | audio_encode.codec validates but has no runtime path on the decoded-ES cache yet (today: mp2, ac3). | { input_id, input_type } | First-light codecs: aac_lc, he_aac_v1, he_aac_v2, s302m. mp2 / ac3 deferred. |
| pid_bus_spts_non_ts_input | Referenced input has no current path to TS (e.g. ST 2110-40 ANC). | { input_id, input_type } | ST 2110-40 ANC-to-TS wrapping is deferred. |
| pid_bus_no_program | assembly.kind = spts/mpts but programs is empty. | {} | Should not normally reach runtime; config validation catches this earlier. |
| pid_bus_essence_kind_not_implemented | SlotSource::Essence with a kind the resolver can’t yet satisfy. | { input_id, kind } | First-light supports video and audio; subtitle / data under development. |
| pid_bus_essence_no_catalogue | Essence slot but the named input has no PSI catalogue yet (non-TS input or ingress not warm). | { input_id } | Switch to a SlotSource::Pid slot, or wait for PSI; re-try with UpdateFlowAssembly. |
| pid_bus_essence_no_match | Essence slot of kind X, but no matching ES found in the input’s PMT. | { input_id, kind } | Check the input’s live PSI catalogue in the manager UI. |
| pid_bus_spts_stream_type_mismatch | Warning logged when a slot’s configured stream_type doesn’t match the source PMT’s declared stream_type. | { input_id, source_pid, configured, observed } | Non-fatal: the slot still forwards bytes. Fix the stream_type on the slot to match the upstream PMT. |
| pid_bus_hitless_leg_not_pid | A SlotSource::Hitless leg is neither Pid nor Essence. | { program_number, leg: "primary" \| "backup" } | Nested Hitless is rejected at config-save time; this fires only if a follow-up variant slips past validation. |
| pid_bus_mpts_pcr_source_required | MPTS program has no effective PCR (neither program-level pcr_source nor flow-level fallback). | { program_number } | Config validation also catches this; the runtime check is a belt-and-braces guard. |
| pid_bus_pcr_source_unresolved | Configured pcr_source (input_id, pid) doesn’t hit any slot in its program (or an Essence-slot’s input). | { input_id, pid, program_number } | Make sure the PCR PID is one of the PIDs you’re carrying into the program. |
Source: src/engine/flow.rs, src/engine/ts_assembler.rs, src/engine/ts_es_hitless.rs, src/engine/input_pcm_encode.rs.
Display (display)
The local-display output (Linux-only, display Cargo feature) emits events under category display. Every failure event sets details.error_code for command_ack correlation, plus details.output_id so the manager UI attributes the failure to the offending output row on a multi-output flow.
| Event | Severity | Trigger |
|---|---|---|
| display_started | info | Modeset succeeded, ALSA opened (or muted), first frame queued. |
| display_stopped | info | Cancellation token fired. Includes lifetime frames_displayed, frames_dropped_late, audio_underruns. |
| display_device_unavailable | critical | KMS connector vanished mid-flow (cable unplug). |
| display_mode_set_failed | critical | drmModeSetCrtc returned EINVAL / ENOSPC for the chosen resolution / refresh. |
| display_audio_open_failed | critical | snd_pcm_open returned non-zero, or ALSA writei returned ENODEV mid-stream. |
| display_decoder_overload | warning | frames_dropped_late > 5 % over a 5-s rolling window. |
| display_av_drift | warning | `
| display_subscriber_lagged | warning | broadcast Lagged(n); rate-limited to one event / second. |
Save-time command_ack.error_code values: display_device_invalid, display_audio_device_invalid, display_resolution_unsupported, display_program_not_found, display_audio_track_not_found, display_device_busy, display_decoder_overload_predicted.
Full reference: Display Output.
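The display_decoder_overload trigger (late drops above 5 % over a 5-second rolling window) can be illustrated with a per-frame window. This is a sketch under the assumption that the ratio is computed over frames seen in the last five seconds; `LateFrameWindow` is a hypothetical name and the real accounting may differ.

```python
from collections import deque


class LateFrameWindow:
    """Tracks the share of late-dropped frames over a rolling time window."""

    def __init__(self, window_secs=5.0, threshold=0.05):
        self.window_secs = window_secs
        self.threshold = threshold
        self._frames = deque()  # (timestamp, was_dropped_late)

    def record(self, now, dropped_late):
        """Record one frame; return True if the late-drop ratio exceeds the threshold."""
        self._frames.append((now, dropped_late))
        # The frame just appended is never older than the window, so the
        # deque cannot become empty inside this loop.
        while now - self._frames[0][0] > self.window_secs:
            self._frames.popleft()
        late = sum(1 for _, dropped in self._frames if dropped)
        return late / len(self._frames) > self.threshold
```

A caller would normally combine the boolean with a state flag or rate limiter so that a sustained overload yields one event rather than one per frame.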
Replay (replay)
The replay server writer + playback emit events under category replay. All replay_* error_code values lift onto command_ack.error_code.
| Event | Severity | Trigger |
|---|---|---|
| recording_started | info | A flow with recording.enabled = true brought up its writer. |
| recording_stopped | info | Writer cancelled. |
| recording_start_failed | critical | Disk I/O error before the first segment landed. |
| clip_created | info | mark_in + mark_out produced a new clip. |
| clip_deleted | info | Operator removed a clip. |
| playback_started | info | A replay input started serving a clip. |
| playback_stopped | info | Playback paused or cancelled. |
| playback_eof | info | Reached the end of the clip / recording with loop_playback: false. |
| writer_lagged | critical | The writer’s bounded mpsc filled; packets dropped to keep the broadcast channel non-blocking. Rate-limited to 1 per 5 s. |
| disk_pressure | warning | Recording disk usage crossed 80 % of max_bytes. Sticky until usage falls back below 70 %. |
| disk_full | critical | Out of disk space on the replay root. |
| index_corrupt | warning | index.bin failed parse on writer init; recovery scan re-aligned to the last valid 24-byte boundary. |
| recovery_alert | warning | Crash-recovery scan ran on writer init; .tmp/ orphans removed and / or recording.json was corrupt. |
| metadata_stale | warning | recording.json write failed on segment roll. |
| max_bytes_below_segment | warning | max_bytes smaller than one segment; retention can’t keep usage under the cap. |
Full reference: Replay and the operator-facing Replay UI.
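The sticky disk_pressure behaviour (raise at 80 % of max_bytes, hold until usage drops below 70 %) is a hysteresis latch. The sketch below shows the idea; `DiskPressureLatch` and its return values are hypothetical, and whether a separate event marks the clearing is not stated in the table.

```python
class DiskPressureLatch:
    """Raises at one threshold and clears at a lower one to avoid flapping."""

    def __init__(self, max_bytes, raise_at=0.80, clear_at=0.70):
        self.max_bytes = max_bytes
        self.raise_at = raise_at
        self.clear_at = clear_at
        self.active = False

    def update(self, used_bytes):
        """Return "raised", "cleared", or None for one usage reading."""
        ratio = used_bytes / self.max_bytes
        if not self.active and ratio >= self.raise_at:
            self.active = True
            return "raised"
        if self.active and ratio < self.clear_at:
            self.active = False
            return "cleared"
        return None
```

The gap between the two thresholds means usage hovering around 80 % produces a single warning instead of a stream of raise and clear transitions.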
Content Analysis (content_analysis)
Tier-gated content health events fired by the in-depth analysis subscribers. Tiers are configured per-flow on FlowConfig.content_analysis (lite / audio_full / video_full). All event names start with content_analysis_* and carry a structured details.error_code.
| Event | Severity | Trigger | Tier |
|---|---|---|---|
| content_analysis_scte35_pid | info | New SCTE-35 PID observed in the PMT. | lite |
| content_analysis_scte35_cue | info | SCTE-35 splice_insert / time_signal cue decoded. details.pts, details.cue_kind. | lite |
| content_analysis_caption_lost | warning | CEA-608 / CEA-708 caption presence dropped from active to absent for ≥ 5 s. | lite |
| content_analysis_mdi_above_threshold | warning | Media Delivery Index (RFC 4445) MLR or NDF exceeded the threshold. | lite |
| content_analysis_audio_silent | warning | Hard mute or silence detected on a decoded audio PID for ≥ 3 s. | audio_full |
| content_analysis_video_freeze | warning | YUV-SAD against previous frame indicated a freeze for ≥ 3 s. | video_full |
Tiers are recommended for monitor-only deployments — a flow with output_ids: [] and one or more analysis tiers on is the canonical shape for remote-site broadcast triage.
Bind / Port Conflict
Every runtime bind site on the edge emits a Critical event under either port_conflict (EADDRINUSE) or bind_failed (any other bind error) with a structured details = { error_code, component, addr, protocol, error }. The same error_code rides on command_ack.error_code, so the manager UI highlights the offending field on Create / Update modals without parsing the error string.
| Event | Severity | Trigger |
|---|---|---|
| port_conflict | critical | Bind attempt returned EADDRINUSE. Common at udp_input, srt_input, rtsp_server, rtmp_server, whip_server, udp_output, standby listeners. |
| bind_failed | critical | Any other bind error (permission, address-family mismatch, interface down). |
The manager preflights inputs and outputs against the node’s already-managed entities and rejects collisions with HTTP 422 + error_code: "port_conflict" before any WS round-trip — so most operator-visible failures surface as a save-time error rather than a runtime event.
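The split between port_conflict and bind_failed comes down to inspecting the errno of the failed bind. A minimal sketch of that classification, using Python's standard `errno` module, might look like this; `bind_error_event` is a hypothetical helper, and only the detail keys are taken from the description above.

```python
import errno


def bind_error_event(component, addr, protocol, err):
    """Build an event dict for a failed bind (illustrative helper).

    err is the OSError raised by the bind attempt.
    """
    # EADDRINUSE gets its own code; every other bind error is bind_failed.
    code = "port_conflict" if err.errno == errno.EADDRINUSE else "bind_failed"
    return {
        "severity": "critical",
        "category": code,
        "details": {
            "error_code": code,
            "component": component,
            "addr": addr,
            "protocol": protocol,
            "error": str(err),
        },
    }
```

Keeping the raw error text in `details.error` while exposing a stable `error_code` lets a UI branch on the code and still show the operating system's message.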
System Resources (system_resources)
CPU and RAM threshold events. Configured under resource_limits in config.json (see Resource Limits). Only emitted when resource_limits is set; the configurable grace_period_secs debounces flapping.
| Event | Severity | Trigger |
|---|---|---|
| system_resources_cpu_warning | warning | CPU usage crossed cpu_warning_percent for ≥ grace_period_secs. |
| system_resources_cpu_critical | critical | CPU usage crossed cpu_critical_percent for ≥ grace_period_secs. With critical_action: "gate_flows", new flow creation is rejected while in this state. |
| system_resources_ram_warning | warning | RAM usage crossed ram_warning_percent. |
| system_resources_ram_critical | critical | RAM usage crossed ram_critical_percent. |
| system_resources_recovered | info | Returned below the warning threshold. |
Distinct from the edge’s static HealthPayload.resource_budget snapshot — that’s a one-shot hardware-capability advertisement at startup.
Bonded (bonded)
Multi-path bonding stack events on the bonded input / output type. Drives the per-leg link indicators in the manager UI.
| Event | Severity | Trigger |
|---|---|---|
| bonded_path_up | info | A bonded path adapter completed handshake. |
| bonded_path_down | warning | Handshake heartbeat timed out on a bonded leg. |
| bonded_all_paths_down | critical | Every bonded leg has lost its peer; flow is down. |
| bonded_path_throughput_degraded | warning | A leg’s effective throughput dropped below the configured floor. |
Full reference: Bonding.
Manager-Generated Events
In addition to events sent by the edge, the manager itself generates events under several categories: connection, compatibility, config_sync, routine, media_library, replay-watchdog, and event_rate_limit.
| Severity | Category | Message | Trigger |
|---|---|---|---|
| info | connection | Node connected to manager | Edge successfully authenticates |
| warning | compatibility | Node WS protocol version differs | Protocol version mismatch during auth |
| critical | connection | Node disconnected from manager | Edge WebSocket closes |
| warning | config_sync | Drift detected on managed entity | Reconciliation found a mismatch between manager DB and the edge’s reported config |
| warning | routine | routine_fire_partial | Some actions in a fire failed |
| critical | routine | routine_fire_failed | Every action in a fire failed |
| warning | routine | routine_fire_missed | A scheduled fire was older than the 15-minute grace window and didn’t replay |
| warning | media_library | media_quota_exhausted | A media-library upload would exceed the per-file or per-node total cap |
| warning | media_library | media_deleted_in_use | An operator deleted a media file that still has media_player inputs referencing it |
| info | media_library | media_upload_aborted | An upload was cancelled mid-stream after at least 50 % had transferred |
| critical | replay-watchdog | recording_stalled | A flow’s recording writer hasn’t produced a new segment for > 2 × segment_seconds |
| warning | event_rate_limit | event_rate_limit_exceeded | An edge tripped the per-node event-rate limit and the manager started shedding events |
These are generated server-side in bilbycast-manager/crates/manager-server/src/ws/node_hub.rs and the matching reconciliation / routine / media-library / replay-watchdog modules.
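Two of the triggers in the table reduce to simple predicates: the routine fire outcome depends on how many actions failed, and recording_stalled compares segment age against twice segment_seconds. The functions below sketch both; their names and signatures are hypothetical and not taken from the manager source.

```python
def classify_routine_fire(action_results):
    """Map per-action success flags to a (severity, event) pair, or None.

    action_results is a list of booleans, True for each action that succeeded.
    """
    failed = action_results.count(False)
    if failed == 0:
        return None  # every action succeeded (or there were none): no event
    if failed == len(action_results):
        return ("critical", "routine_fire_failed")
    return ("warning", "routine_fire_partial")


def is_recording_stalled(secs_since_last_segment, segment_seconds):
    """True when no new segment has appeared for more than 2 x segment_seconds."""
    return secs_since_last_segment > 2 * segment_seconds
```

For example, a routine with three actions where one fails maps to routine_fire_partial, while a 6-second segment length tolerates up to 12 seconds without a new segment before the watchdog condition holds.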
Event Categories Summary
| Category | Count | Description |
|---|---|---|
| flow | 18 | Flow lifecycle (start/stop/fail, output add/remove) + PID bus / Flow Assembly errors |
| bandwidth | 4 | Per-flow bandwidth monitoring (alarm, block, recovery) |
| srt | 9 | SRT input and output connection state |
| redundancy | 3 | SMPTE 2022-7 dual-leg status |
| rtmp | 3 | RTMP publisher connections |
| rtsp | 2 | RTSP input state |
| hls | 2 | HLS output failures |
| webrtc | 8 | WHIP/WHEP session lifecycle |
| audio_encode | 8 | ffmpeg-sidecar audio encoder lifecycle (Phase B) |
| tunnel | 8 | Tunnel connection state |
| manager | 3 | Manager WebSocket connection |
| config | 2 | Configuration changes |
| ptp | — | SMPTE ST 2110 PTP slave clock state changes (Phase 1) |
| network_leg | — | SMPTE 2022-7 Red/Blue per-leg loss / recovery (Phase 1) |
| nmos | — | NMOS IS-04 / IS-05 / IS-08 controller activity (Phase 1) |
| scte104 | — | SCTE-104 splice events parsed from ST 2110-40 ANC (Phase 1) |
| Total | 70 | |
Phase 1 ST 2110 categories
| Category | Typical severity | Triggers |
|---|---|---|
| ptp | info / warning / critical | ptp_lock_acquired, ptp_lock_lost, ptp_holdover, ptp_unavailable |
| network_leg | warning / critical | red_leg_lost, blue_leg_lost, leg_recovered, both_legs_lost |
| nmos | info | NMOS controller IS-05 activations, IS-08 channel-map changes |
| scte104 | info | Cue-out / cue-in / cancel splice messages parsed from ANC |
The four categories are declared up-front in src/manager/events.rs so the manager UI’s category icons and filter dropdown render them as soon as the first event arrives. Producers wire them in step 9 of the Phase 1 plan.
By Severity
| Severity | Count | Description |
|---|---|---|
| critical | 26 | Service-impacting: flow/tunnel failures, auth rejection, both legs lost, bandwidth block, audio_encode build/restart-cap failures, PID-bus bring-up errors |
| warning | 21 | Degradation: disconnects, stale connections, upload failures, reconnects, bandwidth exceeded, audio_encode restart / per-segment HLS remux failure, PID-bus stream_type mismatch |
| info | 23 | State changes: connections established, flows started, config updated, bandwidth recovery, audio_encode started |