
Events & Alarms

bilbycast-edge generates operational events and forwards them to bilbycast-manager via WebSocket. Events provide real-time visibility into connection state changes, failures, and significant operational conditions that go beyond periodic stats and health messages.

[Screenshot: Events & Alarms view, a severity-graded table with badges, timestamps, node, category, and structured details, filterable by severity / category / date]

Events are sent as WebSocket messages with type "event":

    {
      "type": "event",
      "timestamp": "2026-04-02T12:00:00Z",
      "payload": {
        "severity": "warning",
        "category": "srt",
        "message": "SRT input disconnected, reconnecting",
        "flow_id": "flow-1",
        "details": { ... }
      }
    }

| Severity | Meaning | Action |
| --- | --- | --- |
| critical | Service-impacting failure | Operator should investigate immediately |
| warning | Degradation or potential issue | Operator should investigate when possible |
| info | Notable state change | No action required, operational awareness |

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| severity | string | yes | "info", "warning", or "critical" |
| category | string | yes | Event category (see tables below) |
| message | string | yes | Human-readable description |
| flow_id | string | no | Associated flow or tunnel ID |
| details | object | no | Structured context (error codes, peer addresses, etc.) |
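
A consumer of these messages might validate each frame against the field list above before acting on it. The following is a minimal sketch in Python, not code from bilbycast; the function name `parse_event` and the choice to default the optional fields are assumptions made for illustration.

```python
import json

VALID_SEVERITIES = {"info", "warning", "critical"}
REQUIRED_FIELDS = ("severity", "category", "message")


def parse_event(raw: str) -> dict:
    """Parse one WebSocket text frame and return its event payload.

    Hypothetical helper. Raises ValueError if the frame is not a
    well-formed event message (json.JSONDecodeError is a ValueError too).
    """
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") != "event":
        raise ValueError("not an event message")
    payload = msg.get("payload")
    if not isinstance(payload, dict):
        raise ValueError("missing payload object")
    for field in REQUIRED_FIELDS:
        if not isinstance(payload.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    if payload["severity"] not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {payload['severity']!r}")
    # flow_id and details are optional; fill defaults so callers can
    # read them without checking for presence.
    payload.setdefault("flow_id", None)
    payload.setdefault("details", {})
    return payload
```

Rejecting unknown severities early keeps later filtering code simple, at the cost of having to update the set if new levels are ever introduced.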

Events are emitted on state transitions, not periodically. For example, an SRT disconnect event fires once when the connection drops, not repeatedly while disconnected. The corresponding reconnection event fires when the connection is restored.

For protocols with automatic reconnection loops (RTSP, SRT caller), disconnect events fire once per connection cycle — if the remote is unreachable and retries continue, subsequent retry failures are silent until a connection succeeds and then drops again.
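
The once-per-connection-cycle behaviour can be modelled with a single flag that is set when a disconnect is reported and cleared on the next successful connect. This is an illustrative sketch only; the class `ReconnectEventGate` and its method names are hypothetical, and it assumes that a first failure before any successful connect also emits one event.

```python
class ReconnectEventGate:
    """Emit at most one disconnect event per connection cycle."""

    def __init__(self):
        self.disconnect_reported = False
        self.emitted = []  # stand-in for the real event channel

    def on_connected(self):
        # A successful connect starts a new cycle and re-arms the gate.
        self.disconnect_reported = False
        self.emitted.append(("info", "connected"))

    def on_failure(self, error):
        # Covers both a dropped connection and a failed retry.
        if self.disconnect_reported:
            return  # silent: this cycle already reported its failure
        self.disconnect_reported = True
        self.emitted.append(("warning", f"disconnected: {error}"))


gate = ReconnectEventGate()
gate.on_connected()
for _ in range(3):           # one drop followed by two failed retries
    gate.on_failure("timeout")
gate.on_connected()
gate.on_failure("reset by peer")
```

After this sequence `gate.emitted` holds four entries: connected, disconnected, connected, disconnected. The two failed retries in the middle produce nothing.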

Events are queued in an unbounded in-memory channel. When the edge is not connected to the manager (e.g., during reconnection), events accumulate and are delivered once the connection is re-established.
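
The queue-then-deliver behaviour described above can be sketched as follows. This is a simplified single-threaded model in Python, not the edge's actual channel; `EventOutbox` and its methods are hypothetical names.

```python
from collections import deque


class EventOutbox:
    """Buffer events while offline and flush them in order on reconnect."""

    def __init__(self):
        self._pending = deque()  # no maxlen, so the queue is unbounded
        self._send = None        # callable while connected, else None

    def emit(self, event):
        self._pending.append(event)
        self._flush()

    def on_connected(self, send):
        self._send = send
        self._flush()

    def on_disconnected(self):
        self._send = None

    def _flush(self):
        while self._send is not None and self._pending:
            # Send before removing, so an event is not lost if send raises.
            self._send(self._pending[0])
            self._pending.popleft()
```

Because nothing bounds the queue, memory use during an outage grows with the number of events emitted; the state-transition rule above is what keeps that number small in practice.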


| Severity | Message | Trigger |
| --- | --- | --- |
| info | Flow ‘{id}’ started | Flow successfully created and running |
| info | Flow ‘{id}’ stopped | Flow stopped by command or config change |
| critical | Flow ‘{id}’ failed to start: {error} | Flow startup error (input bind, output bind, validation) |
| critical | Flow input lost: {error} | Input task exited unexpectedly |
| info | Output ‘{id}’ added to flow ‘{flow_id}’ | Hot-add output succeeded |
| info | Output ‘{id}’ removed from flow ‘{flow_id}’ | Hot-remove output succeeded |
| warning | Output ‘{id}’ failed to start on flow ‘{flow_id}’: {error} | Output startup failure within a running flow |

Source: src/engine/manager.rs, src/engine/input_srt.rs


| Severity | Message | Trigger | Details |
| --- | --- | --- | --- |
| warning | Flow ‘{id}’ bandwidth exceeded limit ({current} Mbps > {limit} Mbps) | Input bitrate exceeds configured bandwidth_limit.max_bitrate_mbps for the grace period (alarm action) | { current_mbps, limit_mbps, action: "alarm" } |
| critical | Flow ‘{id}’ blocked: bandwidth exceeded limit ({current} Mbps > {limit} Mbps) | Input bitrate exceeds configured limit for the grace period (block action); packets are dropped until bandwidth normalizes | { current_mbps, limit_mbps, action: "block" } |
| info | Flow ‘{id}’ bandwidth returned to normal, unblocked | Bitrate returned within limits after being blocked; flow resumes | { current_mbps, limit_mbps } |
| info | Flow ‘{id}’ bandwidth returned to normal ({current} Mbps <= {limit} Mbps) | Bitrate returned within limits after alarm | { current_mbps, limit_mbps } |

Source: src/engine/bandwidth_monitor.rs
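
The alarm / block / recovery sequence in the table can be sketched as a small state machine. This is an assumption-laden illustration in Python, not a port of bandwidth_monitor.rs: the class name, the sampling interface, and the rule that any in-limit sample resets the grace timer are all hypothetical.

```python
class BandwidthMonitor:
    """Edge-triggered bandwidth limit check with a grace period."""

    def __init__(self, flow_id, limit_mbps, grace_secs, action):
        if action not in ("alarm", "block"):
            raise ValueError("action must be 'alarm' or 'block'")
        self.flow_id = flow_id
        self.limit_mbps = limit_mbps
        self.grace_secs = grace_secs
        self.action = action
        self._over_since = None  # when the bitrate first exceeded the limit
        self._tripped = False    # True between the exceed event and recovery

    def sample(self, now, current_mbps):
        """Feed one bitrate sample; return an event dict or None."""
        details = {"current_mbps": current_mbps, "limit_mbps": self.limit_mbps}
        event = {"category": "bandwidth", "flow_id": self.flow_id, "details": details}
        if current_mbps > self.limit_mbps:
            if self._over_since is None:
                self._over_since = now
            if not self._tripped and now - self._over_since >= self.grace_secs:
                self._tripped = True
                details["action"] = self.action
                event["severity"] = "critical" if self.action == "block" else "warning"
                return event
            return None
        # Back within the limit: reset the grace timer, report recovery once.
        self._over_since = None
        if self._tripped:
            self._tripped = False
            event["severity"] = "info"
            return event
        return None
```

With `limit_mbps=10` and `grace_secs=5`, samples of 12 Mbps at t=0 and t=3 return nothing, the sample at t=5 returns the exceed event, and the first sample at or below 10 Mbps afterwards returns the info recovery event.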


| Severity | Message | Trigger |
| --- | --- | --- |
| info | SRT input connected (mode=listener) | Listener accepted a caller connection |
| info | SRT input connected (mode={mode}) | Caller connected to remote |
| info | SRT input connected (mode=listener, redundant leg 1) | Redundant leg 1 accepted |
| warning | SRT input disconnected, reconnecting | Peer disconnected, reconnection in progress |
| critical | SRT input connection failed: {error} | Listener accept or caller connect failed |

| Severity | Message | Trigger |
| --- | --- | --- |
| info | SRT output ‘{id}’ connected | Peer connected (listener) or caller connected |
| warning | SRT output ‘{id}’ disconnected | Peer disconnected or connection lost |
| warning | SRT output ‘{id}’ stale connection detected | No ACK received after timeout, re-accepting |
| critical | SRT output ‘{id}’ connection failed: {error} | Caller can’t reach remote, or bind fails |

Source: src/engine/input_srt.rs, src/engine/output_srt.rs


| Severity | Message | Trigger |
| --- | --- | --- |
| warning | Redundant leg 1 lost | SRT redundant leg 1 stopped receiving |
| warning | Redundant leg 2 lost | SRT redundant leg 2 stopped receiving |
| critical | Both redundant legs lost | No data from either leg, will reconnect |

Source: src/engine/input_srt.rs


| Severity | Message | Trigger |
| --- | --- | --- |
| info | RTMP publisher connected | Client connected and started publishing |
| warning | RTMP publisher disconnected | Publisher disconnected |
| critical | RTMP server error: {error} | Server bind or accept failure |

Source: src/engine/input_rtmp.rs


| Severity | Message | Trigger |
| --- | --- | --- |
| info | RTSP connected to {url} | RTSP DESCRIBE/SETUP/PLAY succeeded |
| warning | RTSP input disconnected: {error}. Reconnecting in {n}s | Stream lost after a successful connection |

The disconnect event fires once per connection cycle — if the RTSP server is unreachable and the edge retries repeatedly, only the first failure emits an event. A new “connected” event followed by a new “disconnected” event will fire when the connection is established and then lost again.

Source: src/engine/input_rtsp.rs


| Severity | Message | Trigger |
| --- | --- | --- |
| warning | HLS segment upload failed: {error} | HTTP PUT fails for a segment |
| critical | HLS output failed: {error} | Output task exited with error |

Source: src/engine/output_hls.rs


| Severity | Message | Trigger |
| --- | --- | --- |
| info | WHIP publisher connected | ICE+DTLS complete on WHIP server input |
| info | WHIP publisher disconnected | Publisher left |
| info | WHEP connected | WHEP client input connected to remote |
| info | WHEP disconnected | WHEP client input disconnected |
| info | WHEP viewer connected | New WHEP viewer joined an output |
| info | WHEP viewer disconnected | WHEP viewer left an output |
| warning | WebRTC session failed: {error} | ICE failure, DTLS error, or session creation error |
| warning | WebRTC session creation failed: {error} | Output session could not be created |

Source: src/engine/input_webrtc.rs, src/engine/output_webrtc.rs


ffmpeg-sidecar audio encoder lifecycle for the Phase B compressed-audio egress on RTMP, HLS, and WebRTC outputs. Emitted by the per-output build helpers (output_rtmp::build_encoder_state, output_hls::hls_output_loop startup gate, output_webrtc::build_webrtc_encoder_state) and the long-running encoder supervisor (engine::audio_encode::supervisor_loop).

| Severity | Message | Trigger |
| --- | --- | --- |
| info | audio encoder started: output ‘{id}’ codec={codec} {N} kbps | First successful ffmpeg spawn for an RTMP/WebRTC output, or HLS startup with audio_encode set. Details payload: { output_id, codec, bitrate_kbps, sample_rate, channels } |
| warning | audio encoder restarted: output ‘{id}’ restart {N}/{max} | Supervisor restarted ffmpeg after a crash. Details payload: { output_id, restart_count, max_restarts } |
| warning | HLS output ‘{id}’: segment {n} remux failed: {error} | Per-segment ffmpeg fork failed on a single HLS segment. The next segment may succeed |
| critical | audio encoder failed: output ‘{id}’ exhausted {N} restarts in {S} s | Supervisor gave up after MAX_RESTARTS in RESTART_WINDOW |
| critical | RTMP/HLS/WebRTC output ‘{id}’: audio_encode requires ffmpeg in PATH but it is not installed | ffmpeg missing at lazy-build time |
| critical | output ‘{id}’: audio_encode requires AAC-LC input … got profile={p} | Phase A AacDecoder rejected the source AAC profile (HE-AAC, AAC-Main, multichannel, etc.) |
| critical | output ‘{id}’: audio_encode is set but the flow input cannot carry TS audio (PCM-only source) | compressed_audio_input is false (e.g. ST 2110-30, rtp_audio input) |
| critical | output ‘{id}’: audio_encode encoder spawn failed: {error} | AudioEncoder::spawn failed for any other reason (codec rejected by ffmpeg, etc.) |

Source: src/engine/audio_encode.rs, src/engine/output_rtmp.rs, src/engine/output_hls.rs, src/engine/output_webrtc.rs.


SeverityMessageTrigger
infoTunnel ‘{name}’ startedTunnel created and connecting
infoTunnel ‘{name}’ stoppedTunnel stopped by command or config change
infoTunnel connected to relayQUIC connection established and TunnelReady received
warningTunnel disconnected from relay: {reason}QUIC connection lost or forwarder exited
warningTunnel peer disconnected: {reason}Relay reported peer unbound (TunnelDown)
warningTunnel connection to relay failed: {error}QUIC connect or TLS error
criticalTunnel ‘{name}’ failed: {error}Tunnel task exited with fatal error
criticalTunnel bind rejected by relay: {reason}HMAC bind token verification failed

Source: src/tunnel/manager.rs, src/tunnel/relay_client.rs


SeverityMessageTrigger
| Severity | Message | Trigger |
| --- | --- | --- |
| info | Connected to manager | WebSocket auth succeeded |
| warning | Manager connection lost, reconnecting | WebSocket closed or errored |
| critical | Manager authentication failed: {reason} | Auth rejected by manager |

Source: src/manager/client.rs


SeverityMessageTrigger
| Severity | Message | Trigger |
| --- | --- | --- |
| info | Configuration updated | Config applied via manager command |
| warning | Failed to persist configuration: {error} | Config write to disk failed after update |

Source: src/manager/client.rs


Errors generated while bringing up or hot-swapping an assembled flow (assembly.kind = spts | mpts). pid_bus_* codes are emitted as Critical events (the exception is pid_bus_spts_stream_type_mismatch, which is a non-fatal warning) with a structured details payload (error_code, input_id, input_type, program_number, …). The same error_code rides on the corresponding command_ack.error_code, so the manager UI can highlight the offending field on Create/Update modals without parsing the error string. See Flow Assembly (PID Bus).

| Error code | Trigger | Details payload | Remediation |
| --- | --- | --- | --- |
| pid_bus_spts_input_needs_audio_encode | A referenced input could produce TS via input-level audio_encode but isn’t configured. | { input_id, input_type } | Set audio_encode.codec = "aac_lc" (or HE-AAC / s302m) on the input. ST 2110-31 must use s302m. |
| pid_bus_audio_encode_codec_not_supported_on_input | audio_encode.codec validates but has no runtime path on the decoded-ES cache yet (today: mp2, ac3). | { input_id, input_type } | First-light codecs: aac_lc, he_aac_v1, he_aac_v2, s302m. mp2 / ac3 deferred. |
| pid_bus_spts_non_ts_input | Referenced input has no current path to TS (e.g. ST 2110-40 ANC). | { input_id, input_type } | ST 2110-40 ANC-to-TS wrapping is deferred. |
| pid_bus_no_program | assembly.kind = spts/mpts but programs is empty. | {} | Should not normally reach runtime; config validation catches this earlier. |
| pid_bus_essence_kind_not_implemented | SlotSource::Essence with a kind the resolver can’t yet satisfy. | { input_id, kind } | First-light supports video and audio; subtitle / data under development. |
| pid_bus_essence_no_catalogue | Essence slot but the named input has no PSI catalogue yet (non-TS input or ingress not warm). | { input_id } | Switch to a SlotSource::Pid slot, or wait for PSI; re-try with UpdateFlowAssembly. |
| pid_bus_essence_no_match | Essence slot of kind X, but no matching ES found in the input’s PMT. | { input_id, kind } | Check the input’s live PSI catalogue in the manager UI. |
| pid_bus_spts_stream_type_mismatch | Warning logged when a slot’s configured stream_type doesn’t match the source PMT’s declared stream_type. | { input_id, source_pid, configured, observed } | Non-fatal: the slot still forwards bytes. Fix the stream_type on the slot to match the upstream PMT. |
| pid_bus_hitless_leg_not_pid | A SlotSource::Hitless leg is neither Pid nor Essence. | { program_number, leg: "primary" \| "backup" } | Nested Hitless is rejected at config-save time; this fires only if a follow-up variant slips past validation. |
| pid_bus_mpts_pcr_source_required | MPTS program has no effective PCR (neither program-level pcr_source nor flow-level fallback). | { program_number } | Config validation also catches this; the runtime check is a belt-and-braces guard. |
| pid_bus_pcr_source_unresolved | Configured pcr_source (input_id, pid) doesn’t hit any slot in its program (or an Essence-slot’s input). | { input_id, pid, program_number } | Make sure the PCR PID is one of the PIDs you’re carrying into the program. |

Source: src/engine/flow.rs, src/engine/ts_assembler.rs, src/engine/ts_es_hitless.rs, src/engine/input_pcm_encode.rs.


The local-display output (Linux-only, display Cargo feature) emits events under category display. Every failure event sets details.error_code for command_ack correlation, plus details.output_id so the manager UI attributes the failure to the offending output row on a multi-output flow.

| Event | Severity | Trigger |
| --- | --- | --- |
| display_started | info | Modeset succeeded, ALSA opened (or muted), first frame queued. |
| display_stopped | info | Cancellation token fired. Includes lifetime frames_displayed, frames_dropped_late, audio_underruns. |
| display_device_unavailable | critical | KMS connector vanished mid-flow (cable unplug). |
| display_mode_set_failed | critical | drmModeSetCrtc returned EINVAL / ENOSPC for the chosen resolution / refresh. |
| display_audio_open_failed | critical | snd_pcm_open returned non-zero, or ALSA writei returned ENODEV mid-stream. |
| display_decoder_overload | warning | frames_dropped_late > 5 % over a 5-s rolling window. |
| display_av_drift | warning | |
| display_subscriber_lagged | warning | broadcast Lagged(n); rate-limited to one event / second. |

Save-time command_ack.error_code values: display_device_invalid, display_audio_device_invalid, display_resolution_unsupported, display_program_not_found, display_audio_track_not_found, display_device_busy, display_decoder_overload_predicted.

Full reference: Display Output.
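
The display_decoder_overload condition (late drops above 5 % over a 5-second rolling window) can be sketched as below. This is an illustration in Python rather than the display output's actual code; the class name is hypothetical, and it assumes the percentage is late-dropped frames divided by all frames seen in the window.

```python
from collections import deque


class DropRateWindow:
    """Track the fraction of late-dropped frames over a rolling window."""

    def __init__(self, window_secs=5.0, threshold=0.05):
        self.window_secs = window_secs
        self.threshold = threshold
        self._frames = deque()  # (timestamp, dropped_late) pairs, oldest first
        self._dropped = 0       # late drops currently inside the window

    def record(self, now, dropped_late):
        self._frames.append((now, dropped_late))
        self._dropped += int(dropped_late)
        # Evict frames that have aged out of the window.
        while self._frames and now - self._frames[0][0] > self.window_secs:
            _, was_dropped = self._frames.popleft()
            self._dropped -= int(was_dropped)

    def overloaded(self):
        total = len(self._frames)
        return total > 0 and self._dropped / total > self.threshold
```

Keeping a running drop count alongside the deque makes each `record` call amortised constant time, which matters at video frame rates.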


The replay server writer + playback emit events under category replay. All replay_* error_code values lift onto command_ack.error_code.

| Event | Severity | Trigger |
| --- | --- | --- |
| recording_started | info | A flow with recording.enabled = true brought up its writer. |
| recording_stopped | info | Writer cancelled. |
| recording_start_failed | critical | Disk I/O error before the first segment landed. |
| clip_created | info | mark_in + mark_out produced a new clip. |
| clip_deleted | info | Operator removed a clip. |
| playback_started | info | A replay input started serving a clip. |
| playback_stopped | info | Playback paused or cancelled. |
| playback_eof | info | Reached the end of the clip / recording with loop_playback: false. |
| writer_lagged | critical | The writer’s bounded mpsc filled; packets dropped to keep the broadcast channel non-blocking. Rate-limited to 1 per 5 s. |
| disk_pressure | warning | Recording disk usage crossed 80 % of max_bytes. Sticky until usage falls back below 70 %. |
| disk_full | critical | Out of disk space on the replay root. |
| index_corrupt | warning | index.bin failed parse on writer init; recovery scan re-aligned to the last valid 24-byte boundary. |
| recovery_alert | warning | Crash-recovery scan ran on writer init; .tmp/ orphans removed and / or recording.json was corrupt. |
| metadata_stale | warning | recording.json write failed on segment roll. |
| max_bytes_below_segment | warning | max_bytes smaller than one segment, so retention can’t keep usage under the cap. |

Full reference: Replay and the operator-facing Replay UI.
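
The sticky behaviour of disk_pressure (raised at 80 % of max_bytes, held until usage falls below 70 %) is a hysteresis latch. A minimal sketch, assuming "crossed" means at or above the threshold; the class `DiskPressureLatch` is a hypothetical name, not part of the replay writer.

```python
class DiskPressureLatch:
    """Raise once at the high-water mark; re-arm only below the low one."""

    def __init__(self, max_bytes, raise_at=0.80, clear_below=0.70):
        self.max_bytes = max_bytes
        self.raise_at = raise_at
        self.clear_below = clear_below
        self.active = False

    def update(self, used_bytes):
        """Return "disk_pressure" when the alarm is newly raised, else None."""
        ratio = used_bytes / self.max_bytes
        if not self.active and ratio >= self.raise_at:
            self.active = True
            return "disk_pressure"
        if self.active and ratio < self.clear_below:
            self.active = False  # re-armed; the next crossing fires again
        return None
```

The gap between the two thresholds prevents a recording that hovers around 80 % from emitting a fresh warning on every segment roll.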


Tier-gated content health events fired by the in-depth analysis subscribers. Tiers are configured per-flow on FlowConfig.content_analysis (lite / audio_full / video_full). All event names start with content_analysis_* and carry a structured details.error_code.

| Event | Severity | Trigger | Tier |
| --- | --- | --- | --- |
| content_analysis_scte35_pid | info | New SCTE-35 PID observed in the PMT. | lite |
| content_analysis_scte35_cue | info | SCTE-35 splice_insert / time_signal cue decoded. details.pts, details.cue_kind. | lite |
| content_analysis_caption_lost | warning | CEA-608 / CEA-708 caption presence dropped from active to absent for ≥ 5 s. | lite |
| content_analysis_mdi_above_threshold | warning | Media Delivery Index (RFC 4445) MLR or NDF exceeded the threshold. | lite |
| content_analysis_audio_silent | warning | Hard mute or silence detected on a decoded audio PID for ≥ 3 s. | audio_full |
| content_analysis_video_freeze | warning | YUV-SAD against previous frame indicated a freeze for ≥ 3 s. | video_full |

Tiers are recommended for monitor-only deployments — a flow with output_ids: [] and one or more analysis tiers on is the canonical shape for remote-site broadcast triage.


Every runtime bind site on the edge emits a Critical event under either port_conflict (EADDRINUSE) or bind_failed (any other bind error) with a structured details = { error_code, component, addr, protocol, error }. The same error_code rides on command_ack.error_code, so the manager UI highlights the offending field on Create / Update modals without parsing the error string.

| Event | Severity | Trigger |
| --- | --- | --- |
| port_conflict | critical | Bind attempt returned EADDRINUSE. Common at udp_input, srt_input, rtsp_server, rtmp_server, whip_server, udp_output, standby listeners. |
| bind_failed | critical | Any other bind error (permission, address-family mismatch, interface down). |
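
The split between port_conflict and bind_failed comes down to one errno check. The sketch below shows the idea in Python using the details shape given above; `classify_bind_error` and the message wording are hypothetical, not the edge's implementation.

```python
import errno


def classify_bind_error(err: OSError, component: str, addr: str, protocol: str) -> dict:
    """Build a Critical event for a failed bind (illustrative shape only)."""
    # EADDRINUSE is a port conflict; every other bind error is bind_failed.
    code = "port_conflict" if err.errno == errno.EADDRINUSE else "bind_failed"
    return {
        "severity": "critical",
        "category": code,
        "message": f"{component} could not bind {addr}: {err.strerror}",
        "details": {
            "error_code": code,
            "component": component,
            "addr": addr,
            "protocol": protocol,
            "error": str(err),
        },
    }


conflict = classify_bind_error(
    OSError(errno.EADDRINUSE, "Address already in use"),
    component="srt_input", addr="0.0.0.0:9000", protocol="udp",
)
```

Here `conflict["details"]["error_code"]` is `"port_conflict"`; passing an `OSError` built from `errno.EACCES` instead yields `"bind_failed"`.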

The manager preflights inputs and outputs against the node’s already-managed entities and rejects collisions with HTTP 422 + error_code: "port_conflict" before any WS round-trip — so most operator-visible failures surface as a save-time error rather than a runtime event.


CPU and RAM threshold events. Configured under resource_limits in config.json — see Resource Limits. Only emitted when resource_limits is set; the configurable grace_period_secs debounces flapping.

| Event | Severity | Trigger |
| --- | --- | --- |
| system_resources_cpu_warning | warning | CPU usage crossed cpu_warning_percent for ≥ grace_period_secs. |
| system_resources_cpu_critical | critical | CPU usage crossed cpu_critical_percent for ≥ grace_period_secs. With critical_action: "gate_flows", new flow creation is rejected while in this state. |
| system_resources_ram_warning | warning | RAM usage crossed ram_warning_percent. |
| system_resources_ram_critical | critical | RAM usage crossed ram_critical_percent. |
| system_resources_recovered | info | Returned below the warning threshold. |

Distinct from the edge’s static HealthPayload.resource_budget snapshot — that’s a one-shot hardware-capability advertisement at startup.


Multi-path bonding stack events on the bonded input / output type. Drives the per-leg link indicators in the manager UI.

| Event | Severity | Trigger |
| --- | --- | --- |
| bonded_path_up | info | A bonded path adapter completed handshake. |
| bonded_path_down | warning | Handshake heartbeat timed out on a bonded leg. |
| bonded_all_paths_down | critical | Every bonded leg has lost its peer; the flow is down. |
| bonded_path_throughput_degraded | warning | A leg’s effective throughput dropped below the configured floor. |

Full reference: Bonding.


In addition to events sent by the edge, the manager itself generates events under several categories: connection, compatibility, config_sync, routine, media_library, replay-watchdog, and event_rate_limit.

| Severity | Category | Message | Trigger |
| --- | --- | --- | --- |
| info | connection | Node connected to manager | Edge successfully authenticates |
| warning | compatibility | Node WS protocol version differs | Protocol version mismatch during auth |
| critical | connection | Node disconnected from manager | Edge WebSocket closes |
| warning | config_sync | Drift detected on managed entity | Reconciliation found a mismatch between manager DB and the edge’s reported config |
| warning | routine | routine_fire_partial | Some actions in a fire failed |
| critical | routine | routine_fire_failed | Every action in a fire failed |
| warning | routine | routine_fire_missed | A scheduled fire was older than the 15-minute grace window and didn’t replay |
| warning | media_library | media_quota_exhausted | A media-library upload would exceed the per-file or per-node total cap |
| warning | media_library | media_deleted_in_use | An operator deleted a media file that still has media_player inputs referencing it |
| info | media_library | media_upload_aborted | An upload was cancelled mid-stream after at least 50 % had transferred |
| critical | replay-watchdog | recording_stalled | A flow’s recording writer hasn’t produced a new segment for > 2 × segment_seconds |
| warning | event_rate_limit | event_rate_limit_exceeded | An edge tripped the per-node event-rate limit and the manager started shedding events |

These are generated server-side in bilbycast-manager/crates/manager-server/src/ws/node_hub.rs and the matching reconciliation / routine / media-library / replay-watchdog modules.


| Category | Count | Description |
| --- | --- | --- |
| flow | 18 | Flow lifecycle (start/stop/fail, output add/remove) + PID bus / Flow Assembly errors |
| bandwidth | 4 | Per-flow bandwidth monitoring (alarm, block, recovery) |
| srt | 9 | SRT input and output connection state |
| redundancy | 3 | SMPTE 2022-7 dual-leg status |
| rtmp | 3 | RTMP publisher connections |
| rtsp | 2 | RTSP input state |
| hls | 2 | HLS output failures |
| webrtc | 8 | WHIP/WHEP session lifecycle |
| audio_encode | 7 | ffmpeg-sidecar audio encoder lifecycle (Phase B) |
| tunnel | 8 | Tunnel connection state |
| manager | 3 | Manager WebSocket connection |
| config | 2 | Configuration changes |
| ptp | | SMPTE ST 2110 PTP slave clock state changes (Phase 1) |
| network_leg | | SMPTE 2022-7 Red/Blue per-leg loss / recovery (Phase 1) |
| nmos | | NMOS IS-04 / IS-05 / IS-08 controller activity (Phase 1) |
| scte104 | | SCTE-104 splice events parsed from ST 2110-40 ANC (Phase 1) |
| Total | 69 | |

| Category | Typical severity | Triggers |
| --- | --- | --- |
| ptp | info / warning / critical | ptp_lock_acquired, ptp_lock_lost, ptp_holdover, ptp_unavailable |
| network_leg | warning / critical | red_leg_lost, blue_leg_lost, leg_recovered, both_legs_lost |
| nmos | info | NMOS controller IS-05 activations, IS-08 channel-map changes |
| scte104 | info | Cue-out / cue-in / cancel splice messages parsed from ANC |

The four categories are declared up-front in src/manager/events.rs so the manager UI’s category icons and filter dropdown render them as soon as the first event arrives. Producers wire them in step 9 of the Phase 1 plan.

| Severity | Count | Description |
| --- | --- | --- |
| critical | 26 | Service-impacting: flow/tunnel failures, auth rejection, both legs lost, bandwidth block, audio_encode build/restart-cap failures, PID-bus bring-up errors |
| warning | 21 | Degradation: disconnects, stale connections, upload failures, reconnects, bandwidth exceeded, audio_encode restart / per-segment HLS remux failure, PID-bus stream_type mismatch |
| info | 23 | State changes: connections established, flows started, config updated, bandwidth recovery, audio_encode started |