Three Layers of Safety: How Overwatch Stops a Drone

Every operator safety policy we've been audited against asks the same question in different words: "If the drone starts doing something you don't want, how do you stop it?" A one-layer answer is a failed audit. Overwatch has three layers, and they don't depend on each other.

Why a single kill switch isn't enough

A ground-station kill button is only as reliable as the radio link between the ground station and the drone. That link can fail for reasons the operator can't predict or control: the drone flies out of hotspot range, the ESP32 telemetry module loses association, the MacBook hotspot process crashes, there's RF interference from a passing vehicle, the operator accidentally closes their laptop lid. If the button is the only stop, and the link is down, you have no stop.

An onboard failsafe (the drone landing itself on datalink loss) is only as reliable as the param configuration on the drone. If the wrong parameter is set, or if the autopilot firmware has a bug in the failsafe path, the drone won't react. If the drone is in a flight mode that suppresses failsafes, they don't fire.

An RC transmitter kill is only as reliable as the RF link between the transmitter and the drone, and the operator's ability to reach the transmitter in time. If the RC radio is off, out of range, or the operator dropped it, you have no kill.

No single layer is reliable enough. With three independent layers, the probability of all three failing on the same mission is far smaller than the probability of any single one failing, and even if one fails, the other two are still enforcing. That's the compound reliability we need for operations where the consequence of a runaway drone is measured in injuries, deaths, or international incidents.
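The independence argument is plain multiplication. The sketch below uses made-up probabilities (1% per layer per mission) purely to show the arithmetic; the names and numbers are hypothetical, not measured Overwatch figures, and the product only holds if the layers really do fail independently.

```python
from math import prod

# Hypothetical per-mission failure probabilities, for illustration only;
# these are not measured Overwatch figures.
LAYER_FAILURE_PROB = {
    "ground_station_ui": 0.01,
    "px4_onboard_failsafe": 0.01,
    "rc_transmitter_kill": 0.01,
}


def prob_all_fail(probs) -> float:
    """P(every layer fails on the same mission), assuming independent failures."""
    return prod(probs)


def prob_at_least_one_works(probs) -> float:
    """Complement: P(at least one layer still stops the drone)."""
    return 1.0 - prob_all_fail(probs)


if __name__ == "__main__":
    worst_single = max(LAYER_FAILURE_PROB.values())
    p_all = prob_all_fail(LAYER_FAILURE_PROB.values())
    print(f"worst single layer fails: {worst_single:.0e}")  # 1e-02
    print(f"all three fail together:  {p_all:.0e}")         # 1e-06
```

A shared cause that takes out two layers at once breaks the independence assumption, so the real combined figure can only be as good as the product, never better.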

Layer 1 — Ground Station UI

Whenever any drone is airborne, the Overwatch top bar shows two buttons. They stay visible for the whole flight, so they're impossible to miss when you need them.

EMERGENCY RTH (yellow border). One click, a confirmation dialog, and every airborne drone switches to RTL mode and returns to base under its own power. Use this for recoverable situations — weather deteriorating, airspace conflict appearing, operator lost visual, general mission abort. The drones come home safely.

KILL (red border, gentle pulse). Opens a type-to-confirm modal. The operator types KILL into the input box, which enables the EXECUTE button; one click then cuts the motors on every drone instantly, and any airborne drone falls. This is the last-resort button: use it only when a drone is uncontrollable and flying toward a hazard that RTH cannot avoid in time. The type-to-confirm pattern makes accidental activation nearly impossible while keeping intentional activation fast under stress.
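The type-to-confirm gate can be sketched as a tiny state model. This is a hypothetical illustration, not Overwatch's UI code: the class and method names are invented, and treating the match as exact and case-sensitive is an assumption.

```python
from dataclasses import dataclass

CONFIRM_WORD = "KILL"  # the word the operator must type before EXECUTE enables


@dataclass
class KillModal:
    typed: str = ""
    is_open: bool = True

    def execute_enabled(self) -> bool:
        # Assumed exact, case-sensitive match: "kill", " KILL" or "KIL"
        # all leave EXECUTE disabled.
        return self.is_open and self.typed == CONFIRM_WORD

    def on_escape(self) -> None:
        # Esc closes the modal without executing and discards the input.
        self.is_open = False
        self.typed = ""

    def on_execute(self) -> bool:
        """Return True only if the kill command should actually be dispatched."""
        if not self.execute_enabled():
            return False
        self.is_open = False
        return True
```

Requiring an exact match trades a fraction of a second of typing for protection against a single stray keystroke or click.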

Both have keyboard shortcuts for operators who prefer them: Cmd+Shift+R for Emergency RTH, Cmd+Shift+K for the kill modal. Esc closes the modal without executing.

On MAVLink hardware, the buttons send real COMMAND_LONG messages to every connected drone: MAV_CMD_NAV_RETURN_TO_LAUNCH for RTH, and MAV_CMD_COMPONENT_ARM_DISARM with param1 = 0 (disarm) and the force flag param2 = 21196 for kill. On AirSDK the buttons are not available, because AirSDK drones don't have a live command channel from the ground station.
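For readers wiring this into their own tooling, the sketch below builds the two COMMAND_LONG payloads using only the Python standard library. The command IDs (20 and 400) and the 21196 force value come from the MAVLink common message set; the helper names are hypothetical, component ID 1 (the autopilot) is an assumed default, and only the 33-byte payload is built. Framing, sequence numbers, the CRC, and MAVLink 2 trailing-zero truncation are left to a real MAVLink library such as pymavlink.

```python
import struct
from dataclasses import dataclass

# Command IDs from the MAVLink common message set.
MAV_CMD_NAV_RETURN_TO_LAUNCH = 20
MAV_CMD_COMPONENT_ARM_DISARM = 400
FORCE_ARM_DISARM_MAGIC = 21196  # param2 value that forces arm/disarm


@dataclass(frozen=True)
class CommandLong:
    target_system: int
    target_component: int
    command: int
    confirmation: int = 0
    params: tuple = (0.0,) * 7  # param1..param7

    def pack_payload(self) -> bytes:
        # MAVLink serialises fields sorted by size: the seven float params,
        # then the uint16 command, then the three uint8 fields.
        return struct.pack(
            "<7fHBBB",
            *self.params,
            self.command,
            self.target_system,
            self.target_component,
            self.confirmation,
        )


def rth_command(sysid: int, compid: int = 1) -> CommandLong:
    return CommandLong(sysid, compid, MAV_CMD_NAV_RETURN_TO_LAUNCH)


def kill_command(sysid: int, compid: int = 1) -> CommandLong:
    # param1 = 0 means disarm; param2 = 21196 forces it even while flying.
    params = (0.0, float(FORCE_ARM_DISARM_MAGIC), 0.0, 0.0, 0.0, 0.0, 0.0)
    return CommandLong(sysid, compid, MAV_CMD_COMPONENT_ARM_DISARM, params=params)
```

Sending to every connected drone then amounts to one command per system ID, e.g. `[kill_command(sysid) for sysid in connected_sysids]`, where `connected_sysids` stands in for whatever list the ground station keeps.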

Every press is audit-logged with operator ID, timestamp, affected drones, and the reason code. If an operator pressed it, we know exactly when and why.
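One plausible shape for that audit record is an append-only JSON Lines file with one line per press. The field names, reason codes, and file layout below are hypothetical; only the four pieces of information (operator ID, timestamp, affected drones, reason code) come from the description above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class EmergencyAuditRecord:
    operator_id: str
    action: str             # hypothetical values: "EMERGENCY_RTH" or "KILL"
    affected_drones: tuple  # MAVLink system IDs the command went to
    reason_code: str        # hypothetical code, e.g. "WX_DETERIORATING"
    timestamp_utc: str = field(default_factory=_utc_now_iso)

    def to_json_line(self) -> str:
        # asdict keeps the tuple; json.dumps writes it as a JSON array.
        return json.dumps(asdict(self), sort_keys=True)


def append_audit(path: str, record: EmergencyAuditRecord) -> None:
    """Append one record per press; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record.to_json_line() + "\n")
```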

Layer 2 — PX4 Onboard Failsafes

Layer 1 only works while the drone has a telemetry link to the ground station. Layer 2 doesn't need one. These are PX4 parameters configured at commissioning time; they run on the Pixhawk with zero ground dependency.

NAV_DLL_ACT = 2: on MAVLink datalink loss, enter RTL (Return) mode. The timeout is 5 seconds (COM_DL_LOSS_T), so a brief WiFi hiccup doesn't trigger it but a real disconnection does.
NAV_RCL_ACT = 2: on RC signal loss, enter RTL mode. The timeout is 0.5 seconds (COM_RC_LOSS_T), deliberately fast because an RC dropout is usually a sign of an RF problem that could escalate.
COM_LOW_BAT_ACT = 1, COM_BAT_ACT = 3: land in place at the battery critical threshold. The drone won't try to fly home without the energy to get there; it commits to a controlled landing at its current position.
GF_ACTION = 3: on geofence breach, return to launch. The breach event triggers immediately, with no grace period for the drone to wander further.
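Because Layer 2 lives entirely in parameters, it can be checked mechanically at commissioning by comparing values read back from the autopilot with the intended ones. A minimal sketch, assuming the readback has already been collected into a dict by whatever parameter tool is in use; the function name is hypothetical, and the expected table shows only a few of the values above, with the rest added the same way.

```python
import math

# A subset of the intended Layer 2 values from the list above;
# the remaining parameters would be added the same way.
EXPECTED_FAILSAFE_PARAMS = {
    "COM_DL_LOSS_T": 5,    # datalink-loss timeout, seconds
    "COM_RC_LOSS_T": 0.5,  # RC-loss timeout, seconds
    "GF_ACTION": 3,        # geofence breach action
}


def failsafe_mismatches(readback: dict, expected: dict = EXPECTED_FAILSAFE_PARAMS) -> list:
    """Return human-readable problems; an empty list means the drone matches."""
    problems = []
    for name, want in expected.items():
        if name not in readback:
            problems.append(f"{name}: not present in readback")
            continue
        got = readback[name]
        # Float parameters round-trip through float32, so compare with a tolerance.
        if not math.isclose(float(got), float(want), rel_tol=1e-6, abs_tol=1e-9):
            problems.append(f"{name}: expected {want}, got {got}")
    return problems
```

Running it on every drone turns a silently wrong parameter into a visible failure before the tethered first-flight test.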

These fire automatically. The operator doesn't press anything. If the ground station crashes, the hotspot drops, the MacBook freezes — any of those conditions trips Layer 2 within seconds and the drone starts coming home on its own.
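The 5-second datalink timeout acts as a debounce: short gaps are ignored, sustained silence triggers the failsafe. The model below illustrates that behaviour and is not PX4's implementation; it assumes the link is judged by ground-station heartbeats and that the failsafe only arms once a heartbeat has been seen. The class and function names are hypothetical.

```python
from typing import Iterable, Optional

DL_LOSS_TIMEOUT_S = 5.0  # the COM_DL_LOSS_T value used above


class DatalinkLossMonitor:
    """Illustrative datalink-loss timeout; not PX4's code."""

    def __init__(self, timeout_s: float = DL_LOSS_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self.last_heartbeat_s: Optional[float] = None
        self.failsafe_triggered = False

    def on_heartbeat(self, now_s: float) -> None:
        self.last_heartbeat_s = now_s

    def tick(self, now_s: float) -> bool:
        """Return True the first time the link has been silent past the timeout."""
        if self.failsafe_triggered or self.last_heartbeat_s is None:
            return False  # already fired, or never had a link to lose
        if now_s - self.last_heartbeat_s > self.timeout_s:
            self.failsafe_triggered = True
            return True
        return False


def first_trigger_time(heartbeats: Iterable[float], end_s: float,
                       step_s: float = 0.1) -> Optional[float]:
    """Replay heartbeat timestamps and return when the failsafe first fires."""
    mon = DatalinkLossMonitor()
    beats = sorted(heartbeats)
    i, t = 0, 0.0
    while t <= end_s:
        while i < len(beats) and beats[i] <= t:
            mon.on_heartbeat(beats[i])
            i += 1
        if mon.tick(t):
            return t
        t = round(t + step_s, 10)  # avoid accumulating float error
    return None
```

With 1 Hz heartbeats that pause for 3 seconds and then stop for good at t = 10 s (`first_trigger_time([0, 1, 2, 3, 6, 7, 8, 9, 10], 30)`), the monitor ignores the pause and first fires at t = 15.1 s, the first 0.1 s tick past the 5-second timeout.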

Verification of Layer 2 is part of the mandatory first-flight procedure documented in PREFLIGHT_CHECKLIST.md: tether the drone, lift off to 10 metres, disable the MacBook hotspot, and watch the drone auto-RTL within 5–6 seconds with no operator action. Every drone does this once before it ever flies an operational mission.

Layer 3 — RC Transmitter Kill Switch

Layer 3 is hardware. The RadioMaster Pocket transmitter has a dedicated kill switch on the front. Flipping it sends a force-disarm signal via ELRS to the drone, regardless of:

Whether Overwatch is running, crashed, or never started
Whether MAVLink is connected to the drone
Whether the drone is in autonomous mode or not
Whether the internet, the cellular link, or the hotspot is up

The only thing Layer 3 needs is the operator's hand on the transmitter and a working RF link between the transmitter and the drone. For most operations that RF link is the most reliable radio in the stack: 2.4 GHz ELRS, with latency measured in milliseconds and a dependable link at several hundred metres.
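On the drone side, the kill switch comes down to one RC channel crossing a threshold; on PX4 the channel is normally assigned with the RC_MAP_KILL_SW parameter. The sketch below is a hypothetical illustration of that decision: the channel number, the 1700 µs threshold, and the PWM-equivalent 1000 to 2000 µs scale are assumptions, not the RadioMaster Pocket's or PX4's actual configuration.

```python
from typing import Sequence

KILL_CHANNEL = 5          # hypothetical: 1-based RC channel carrying the kill switch
KILL_THRESHOLD_US = 1700  # hypothetical: PWM-equivalent value above which the switch reads "kill"


def kill_switch_engaged(
    channels_us: Sequence[int],
    link_ok: bool,
    channel: int = KILL_CHANNEL,
    threshold_us: int = KILL_THRESHOLD_US,
) -> bool:
    """Decide whether the latest RC frame commands a motor kill."""
    if not link_ok:
        # With the RF link down this layer cannot act; the RC-loss
        # failsafe (NAV_RCL_ACT, Layer 2) covers that case instead.
        return False
    return channels_us[channel - 1] >= threshold_us
```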

Verification is also in the pre-flight procedure: on the ground with motors armed and spinning at idle, the operator flips the switch, and the motors must stop within 0.5 seconds. This runs for every drone at every commissioning. Operators are also encouraged to drill it until throwing the switch is muscle memory, and to tape over neighbouring switches so they can't hit the wrong one under stress.

Why three and not two or four

Two layers (GS kill + onboard failsafe) cover most scenarios but leave a gap: a drone in a normal flight mode, with comms up, but behaving unpredictably (firmware bug, sensor glitch, motor failure producing wrong IMU readings). The onboard failsafe doesn't fire, because no link has been lost, and the GS kill still depends on the ground-station software and the autopilot's MAVLink command handling, which may be caught up in the same fault. The operator needs a hardware-level authority. Layer 3 closes that gap.

A fourth layer is theoretically possible — a hardware kill wired into the battery harness, say, that cuts power at the BEC regardless of the autopilot's state. We don't currently need it; the X500 V2 reference hardware has no such wiring, and we haven't seen a failure mode on PX4 that three layers don't catch. If that changes, we'll add it.

Drill the procedure monthly

The layers only work if operators can activate them under stress. The pre-flight checklist recommends a monthly drill per operator: run a simulator session, trigger each layer, time the response. Goal: any layer activatable within 2 seconds of recognition. Operators who can't do that get re-trained before they fly real hardware.
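Scoring a drill from timestamped sim logs against the 2-second goal can be sketched as follows. The event format, layer labels, and the rule that an undrilled layer also counts as a failure are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_RESPONSE_S = 2.0  # checklist goal: any layer within 2 s of recognition
LAYERS = ("ground_station_ui", "px4_onboard_failsafe", "rc_transmitter_kill")


@dataclass(frozen=True)
class DrillEvent:
    layer: str
    recognised_at_s: float  # sim time the hazard became recognisable
    activated_at_s: float   # sim time the layer's activation registered

    @property
    def response_s(self) -> float:
        return self.activated_at_s - self.recognised_at_s


def needs_retraining(events: Iterable[DrillEvent]) -> bool:
    worst = {}
    for e in events:
        worst[e.layer] = max(worst.get(e.layer, 0.0), e.response_s)
    # An undrilled layer counts as a failure, as does any slow activation.
    if any(layer not in worst for layer in LAYERS):
        return True
    return any(worst[layer] > MAX_RESPONSE_S for layer in LAYERS)
```

Keeping the worst time per layer, rather than the average, matches the goal as stated: every activation has to land inside 2 seconds, not just most of them.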

Gov and defence procurement reviewers tend to ask for the drill cadence specifically — "how often do your operators practice emergency procedures?" The answer "monthly, with timestamped sim logs" is a different answer from "whenever they feel like it". We bake the cadence into the checklist so customers can adopt it without inventing one themselves.

What it doesn't do

Three layers of safety do not replace good mission planning, good airmanship, or good judgement about when not to fly. Every layer is a failure-response mechanism, not a prevention mechanism. The Pre-Flight Health Panel (eight automated checks + AI-assisted go/no-go) is the prevention layer — it stops missions that shouldn't be launched in the first place. The three kill layers are what catches the ones that do launch and go wrong.

The procurement question isn't just "how do you stop it?" It's "how many independent ways can you stop it, and have you tested them recently?" For Overwatch, the answer is three and yes.