Why a State Machine
Autonomous flight is a safety-critical system. Every possible condition — low battery, GPS dropout, wind gust, detection event, operator command — must map to a known state with a defined response. No ambiguity. No undefined behaviour. No conditional branches that lead to "I don't know what to do."
A finite state machine (FSM) provides this guarantee. The drone is always in exactly one of seven states, and every transition between states is explicit and deterministic. The set of valid transitions is constrained — not every state can reach every other state — which prevents pathological sequences like jumping from landing back to takeoff or from idle directly into emergency without passing through any active flight state.
We chose an FSM over a behaviour tree or planner architecture for one reason: auditability. After a mission, the flight log contains a linear sequence of states and transitions with timestamps and triggers. There is no tree traversal to reconstruct, no planner goal stack to unwind. The operator can read the log and know exactly what happened, in what order, and why. When things go wrong — and in field SAR operations, they will — this determinism is the difference between a useful incident report and a shrug.
The Overwatch Core flight supervisor implements this FSM as the top-level control loop on the drone. Every other subsystem — navigation, detection, telemetry — operates within the context of the current state. The supervisor decides what the drone does. Everything else decides how.
The Seven States
Idle. On the ground, motors off, awaiting a launch command. The supervisor runs pre-flight validation: mission data integrity (waypoint list non-empty, altitudes within range), battery state-of-charge above the launch threshold, GPS fix quality sufficient for home position lock, and communication link active. If any check fails, the drone stays in Idle and reports the specific failure to the operator. No silent failures. The drone does not launch until every pre-flight condition is satisfied.
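The pre-flight gate above can be sketched as a function that returns every failed check by name, so the operator sees the specific reason the drone stayed in Idle. This is a minimal illustration, not the Overwatch Core implementation: the field names, failure labels, launch threshold, and altitude range are all assumptions, since the post does not give them.

```python
from dataclasses import dataclass

# Hypothetical limits: the post does not state the launch threshold or altitude range.
LAUNCH_SOC_PCT = 80.0
ALTITUDE_RANGE_M = (10.0, 120.0)


@dataclass
class PreflightInputs:
    waypoint_count: int
    altitudes_m: list
    battery_soc_pct: float
    gps_fix_ok: bool
    link_active: bool


def preflight_failures(p: PreflightInputs) -> list:
    """Return the names of all failed checks; an empty list means cleared to launch."""
    lo, hi = ALTITUDE_RANGE_M
    failures = []
    if p.waypoint_count == 0:
        failures.append("mission_empty")
    if any(not (lo <= alt <= hi) for alt in p.altitudes_m):
        failures.append("altitude_out_of_range")
    if p.battery_soc_pct < LAUNCH_SOC_PCT:
        failures.append("battery_below_launch_threshold")
    if not p.gps_fix_ok:
        failures.append("gps_fix_insufficient")
    if not p.link_active:
        failures.append("link_down")
    return failures
```

Collecting every failure rather than stopping at the first one means the operator can fix all problems in a single pass.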
Takeoff. Vertical climb to the configured mission altitude. The supervisor monitors climb rate, altitude convergence, and GPS/VIO position stability during ascent. Transition to Grid Search occurs when the drone reaches mission altitude and holds position within tolerance for 3 seconds — a stabilisation window that confirms the aircraft is under control and not oscillating. If takeoff fails (motor fault, insufficient thrust, wind gust exceeding the climb envelope), the supervisor transitions directly to Emergency. There is no "retry takeoff" state. A failed takeoff is a serious event; the correct response is to get on the ground and diagnose.
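The 3-second stabilisation window can be illustrated with a small timer that only reports success after the aircraft has stayed within tolerance for the full window, and that starts over whenever it leaves tolerance. The class name and the way "within tolerance" is passed in are assumptions made for this sketch.

```python
class HoldTimer:
    """Reports True once a condition has held continuously for hold_s seconds."""

    def __init__(self, hold_s: float = 3.0):
        self.hold_s = hold_s
        self._held_since = None  # time at which the condition last became true

    def update(self, now_s: float, within_tolerance: bool) -> bool:
        if not within_tolerance:
            # Any excursion resets the window: the hold must be continuous.
            self._held_since = None
            return False
        if self._held_since is None:
            self._held_since = now_s
        return now_s - self._held_since >= self.hold_s
```

With this shape, an oscillating aircraft that dips out of tolerance every couple of seconds never satisfies the window, which is the behaviour the stabilisation check is meant to enforce.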
Grid Search. The primary operational state. The drone executes the boustrophedon sweep pattern, following the sequence of relative waypoints generated by the mission planner. Each waypoint is a {dx, dy, dz, dpsi} displacement executed via moveBy commands. Simultaneously, the detection pipeline runs inference on every camera frame. The supervisor monitors all safety constraints continuously during this state: battery SOC, GPS fix quality, VIO drift magnitude, wind speed, geofence distance, and communication link status. Any constraint violation triggers the appropriate transition. Grid Search is where the drone spends most of its flight time, and it is where most transitions originate.
Inspect. Triggered when the detection model reports a confidence score above threshold — currently set at 0.6. The drone pauses its sweep, descends or orbits the detection location to capture additional frames from multiple angles, geotags the detection with GPS coordinates and timestamp, and publishes an alert to the operator. The inspection duration is configurable (default 15 seconds). After inspection completes, the drone resumes the sweep from exactly where it paused. No waypoints are skipped. The sweep index is preserved across the Inspect detour, so coverage continuity is maintained.
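One way to picture the preserved sweep index: the sweep keeps a pointer to the next waypoint, and an inspection simply pauses consumption without touching that pointer. The sketch below assumes inspections begin between waypoints and uses hypothetical class and method names; only the 0.6 threshold comes from the description above.

```python
DETECTION_THRESHOLD = 0.6  # confidence threshold stated in the post


class SweepProgress:
    def __init__(self, waypoints):
        self.waypoints = list(waypoints)  # each item is a (dx, dy, dz, dpsi) tuple
        self.index = 0                    # next waypoint to fly
        self.inspecting = False

    def on_detection(self, confidence: float) -> bool:
        """Pause the sweep if confidence is above threshold. The index is untouched."""
        if confidence > DETECTION_THRESHOLD:
            self.inspecting = True
        return self.inspecting

    def on_inspect_complete(self) -> None:
        self.inspecting = False

    def next_waypoint(self):
        """Return the next waypoint, or None while inspecting or when the sweep is done."""
        if self.inspecting or self.index >= len(self.waypoints):
            return None
        waypoint = self.waypoints[self.index]
        self.index += 1
        return waypoint
```

Because the index only advances inside next_waypoint, no detour can skip a leg of the sweep.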
RTH (Return to Home). Triggered by three conditions: battery SOC reaching 30% (the RTH threshold), mission complete (all waypoints executed), or explicit operator command. The drone abandons the current sweep, climbs to a safe transit altitude above the search grid, and flies directly to the takeoff coordinates using GPS navigation. RTH is an orderly return — the drone has sufficient battery margin for the transit, plus reserves for headwind, navigation error, and landing approach. This is not an emergency. The drone is coming home because it should, not because it must.
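The three RTH conditions reduce to a small decision function. In this sketch the trigger labels match the ones shown later in the operator display, but the function signature and the order in which the conditions are checked are assumptions.

```python
from typing import Optional

BATTERY_RTH_PCT = 30.0  # RTH threshold stated in the post


def rth_trigger(battery_soc_pct: float, waypoints_done: int,
                waypoints_total: int, operator_rth: bool) -> Optional[str]:
    """Return the RTH trigger label, or None if the sweep should continue."""
    if battery_soc_pct <= BATTERY_RTH_PCT:
        return "battery_30pct"
    if waypoints_done >= waypoints_total:
        return "mission_complete"
    if operator_rth:
        return "operator_cmd"
    return None
```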
Landing. Final descent at the home position. The drone descends vertically at a fixed, controlled rate and shuts down its motors when ground contact is detected. After touchdown, the supervisor logs the mission summary: waypoints completed out of total, detections made, battery consumed, total flight time, maximum VIO drift observed during the mission, and peak wind speed encountered. This data feeds post-mission review and helps calibrate future mission planning: if a 500m x 500m area consumed 65% battery with gusts peaking at 12 m/s, the operator knows to plan smaller grids in similar conditions.
Emergency. The state of last resort. Triggered by any of: battery SOC below 15% (critical threshold), VIO drift exceeding 5m (position estimate unreliable), geofence breach (drone position more than 500m from takeoff), sustained wind exceeding 8 m/s, or communication loss exceeding the configurable timeout while battery is below the RTH threshold. In Emergency, the drone lands immediately at its current position. No transit, no return home, no attempt to complete the mission. Priority is getting the aircraft on the ground safely. The failure modes post covers what happens after an emergency landing in detail.
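The Emergency conditions can be sketched as a single check that returns the violated constraint as a trigger label. The numeric thresholds are the ones listed above; the telemetry field names, the link timeout value, and the geofence and link-loss labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

BATTERY_RTH_PCT = 30.0
BATTERY_EMERGENCY_PCT = 15.0
VIO_DRIFT_MAX_M = 5.0
GEOFENCE_RADIUS_M = 500.0
WIND_MAX_MS = 8.0
LINK_TIMEOUT_S = 30.0  # hypothetical: the post only says the timeout is configurable


@dataclass
class Telemetry:
    battery_soc_pct: float
    vio_drift_m: float
    home_distance_m: float
    sustained_wind_ms: float
    link_lost_s: float  # seconds since the link dropped, 0 if the link is up


def emergency_trigger(t: Telemetry) -> Optional[str]:
    """Return a label for the first violated constraint, or None if all are within limits."""
    if t.battery_soc_pct < BATTERY_EMERGENCY_PCT:
        return f"battery_{int(t.battery_soc_pct)}pct"
    if t.vio_drift_m > VIO_DRIFT_MAX_M:
        return f"vio_drift_{t.vio_drift_m:.1f}m"
    if t.home_distance_m > GEOFENCE_RADIUS_M:
        return f"geofence_{t.home_distance_m:.0f}m"
    if t.sustained_wind_ms > WIND_MAX_MS:
        return f"wind_{t.sustained_wind_ms:.1f}ms"
    # Link loss alone is not an emergency; it is one only when battery is also below RTH level.
    if t.link_lost_s > LINK_TIMEOUT_S and t.battery_soc_pct < BATTERY_RTH_PCT:
        return f"link_loss_{t.link_lost_s:.0f}s"
    return None
```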
Transition Rules
Each state has a defined set of possible outbound transitions. The transition graph is constrained — not fully connected. This is deliberate. Restricting transitions prevents pathological state sequences that would be difficult to reason about or debug.
The nominal path is: Idle → Takeoff → Grid Search → RTH → Landing. This is a successful mission with no detections, no safety events, and sufficient battery. The drone launches, searches, returns, and lands.
The detection loop is: Grid Search → Inspect → Grid Search. This can repeat any number of times during a mission. Each Inspect is a temporary detour; unless a safety threshold forces Emergency, the drone returns to Grid Search and resumes the sweep. If a detection triggers during the last waypoint, the drone completes the Inspect, then transitions to RTH because the mission is complete.
The safety escape is: any active flight state → Emergency. Takeoff, Grid Search, Inspect, and RTH can all transition to Emergency if a critical safety threshold is breached. Idle and Landing cannot — the drone is already on or near the ground in those states.
Some transitions do not exist, by design. Grid Search cannot transition directly to Landing; it must go through RTH first, which ensures an orderly return. Takeoff cannot transition to Inspect, because detection is not active during the climb. Emergency cannot transition to anything except the implicit "motors off" terminal state: once the supervisor enters Emergency, the only exit is ground contact.
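The constrained graph described in this section fits in a small lookup table. The sketch below is an illustration under assumptions: the enum and function names are hypothetical, the end-of-mission case is modelled as Inspect → Grid Search → RTH rather than a direct Inspect → RTH edge, and Landing and Emergency are treated as terminal because the post describes no outbound transitions for them.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    TAKEOFF = auto()
    GRID_SEARCH = auto()
    INSPECT = auto()
    RTH = auto()
    LANDING = auto()
    EMERGENCY = auto()


# Outbound transitions per state. Anything not listed here is invalid by construction.
ALLOWED = {
    State.IDLE: {State.TAKEOFF},
    State.TAKEOFF: {State.GRID_SEARCH, State.EMERGENCY},
    State.GRID_SEARCH: {State.INSPECT, State.RTH, State.EMERGENCY},
    State.INSPECT: {State.GRID_SEARCH, State.EMERGENCY},
    State.RTH: {State.LANDING, State.EMERGENCY},
    State.LANDING: set(),
    State.EMERGENCY: set(),
}


def can_transition(src: State, dst: State) -> bool:
    return dst in ALLOWED[src]
```

Keeping the graph as data rather than scattered conditionals means the full set of legal sequences can be reviewed in one place.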
Every transition is logged with a timestamp, the originating state, the destination state, and the trigger condition. A typical mission log might read: Idle→Takeoff (launch_cmd), Takeoff→GridSearch (alt_reached), GridSearch→Inspect (det_0.73), Inspect→GridSearch (inspect_complete), GridSearch→RTH (battery_30pct), RTH→Landing (home_reached). Six transitions, each with an unambiguous cause.
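A log entry with those four fields might look like the record below. The class and field names are assumptions made for illustration, and the timestamps are invented; only the rendered format follows the example log above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionRecord:
    timestamp_s: float  # mission time of the transition
    src: str            # originating state
    dst: str            # destination state
    trigger: str        # condition that caused the transition

    def __str__(self) -> str:
        return f"{self.src}→{self.dst} ({self.trigger})"


# Illustrative timestamps only; the state names and triggers follow the example log.
mission_log = [
    TransitionRecord(0.0, "Idle", "Takeoff", "launch_cmd"),
    TransitionRecord(21.5, "Takeoff", "GridSearch", "alt_reached"),
    TransitionRecord(310.2, "GridSearch", "Inspect", "det_0.73"),
]

summary = ", ".join(str(record) for record in mission_log)
```

Making the record immutable (frozen) reflects the intent that a flight log is append-only: entries are written once and never edited.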
The Safety Thresholds
Five thresholds are monitored continuously during active flight. Each has a numeric value, a rationale, and a defined consequence.
Battery RTH: 30% SOC. This threshold is calculated to provide margin for the worst-case return flight. If the drone is at maximum geofence distance (500m) flying into an 8 m/s headwind with a resulting ground speed of 2 m/s, return time is 250 seconds. Add the climb to transit altitude, the descent and landing approach, and a 5-minute hover reserve for GPS acquisition or operator override. The 30% threshold covers this scenario with margin. In calm conditions close to home, 30% is generous. That is the point — the threshold is sized for the worst case, not the average case.
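The worst-case arithmetic can be written out explicitly. The 10 m/s airspeed is implied by the figures above (an 8 m/s headwind leaving 2 m/s of ground speed); the climb and landing durations are not given in the post, so they are left as parameters here.

```python
GEOFENCE_RADIUS_M = 500.0
HEADWIND_MS = 8.0
AIRSPEED_MS = 10.0       # implied: 8 m/s headwind leaves 2 m/s ground speed
HOVER_RESERVE_S = 5 * 60


def worst_case_return_s(climb_s: float = 0.0, landing_s: float = 0.0) -> float:
    """Time budget in seconds that the RTH threshold has to cover in the worst case."""
    ground_speed_ms = AIRSPEED_MS - HEADWIND_MS        # 2 m/s
    transit_s = GEOFENCE_RADIUS_M / ground_speed_ms    # 250 s
    return climb_s + transit_s + landing_s + HOVER_RESERVE_S
```

Transit plus hover reserve alone comes to 550 seconds, before any climb or landing time is added. Turning that time budget into a state-of-charge figure requires the aircraft's discharge rate, which this sketch does not model.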
Battery Emergency: 15% SOC. Below 15%, the drone cannot be trusted to reach home under adverse conditions. The correct action is to land immediately, wherever the drone currently is. Losing an aircraft to a controlled off-site landing is preferable to losing it to an uncontrolled descent from altitude when the battery cuts out. The gap between 30% and 15% provides a buffer for scenarios where RTH is initiated at 30% but headwind or navigation error extends the return time.
Geofence: 500m from takeoff. A cylindrical boundary centred on the takeoff position. The supervisor checks horizontal distance continuously. If the drone drifts beyond 500m — possible in high sustained wind or with accumulated VIO drift — the Emergency transition fires. The 500m radius is configurable but defaults to a value that keeps the drone within visual line-of-sight range in most conditions and within reliable communication range of the ground station.
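A cylindrical geofence only needs horizontal distance, so altitude is ignored. The sketch below assumes positions arrive as latitude and longitude in degrees and uses the haversine great-circle formula on a spherical Earth; the supervisor may well work in a local metric frame instead, in which case the distance is a plain Euclidean norm.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation
GEOFENCE_RADIUS_M = 500.0     # default radius stated in the post


def horizontal_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geofence_breached(home: tuple, position: tuple) -> bool:
    """True when the drone is more than the geofence radius from home, at any altitude."""
    return horizontal_distance_m(*home, *position) > GEOFENCE_RADIUS_M
```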
VIO Drift: 5m. When the difference between the VIO-estimated position and the last known GPS fix exceeds 5m, the position estimate is unreliable. Continuing a precision search with 5m+ position error is pointless — the coverage gaps between sweep lines would be larger than the sweep strip width, defeating the purpose of systematic search. At that point the drone is not searching; it is flying a pattern with unknown coverage. The supervisor triggers Emergency because an aircraft with an unreliable position estimate should not be navigating autonomously.
Wind: 8 m/s sustained. Estimated via IMU-derived ground speed versus commanded airspeed. Above 8 m/s sustained, the ANAFI UKR's ground speed and position holding degrade to the point where systematic search is unreliable. The drone can physically fly in stronger wind, but it cannot hold the tight sweep lines required for guaranteed coverage. An aircraft that drifts 3m laterally on each sweep line is creating gaps, not searching. The 8 m/s threshold is the point where search quality degrades below acceptable levels.
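A rough sketch of the wind check, reduced to one dimension: the along-track wind estimate is taken as commanded airspeed minus measured ground speed, and "sustained" is interpreted as the mean of the last few estimates exceeding the threshold. Both the averaging approach and the window length are assumptions; the post does not say how the supervisor defines sustained.

```python
from collections import deque


class WindMonitor:
    def __init__(self, threshold_ms: float = 8.0, window: int = 5):
        self.threshold_ms = threshold_ms
        self._estimates = deque(maxlen=window)  # most recent wind estimates, m/s

    def update(self, commanded_airspeed_ms: float, ground_speed_ms: float) -> bool:
        """Add one sample; return True when the windowed mean exceeds the threshold."""
        self._estimates.append(commanded_airspeed_ms - ground_speed_ms)
        window_full = len(self._estimates) == self._estimates.maxlen
        mean_ms = sum(self._estimates) / len(self._estimates)
        # Require a full window so a single gust cannot trip the threshold on its own.
        return window_full and mean_ms > self.threshold_ms
```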
What the Operator Sees
The supervisor state is the single most important piece of telemetry the operator receives. The current state is displayed prominently in the operator's interface: GRID_SEARCH, RTH, EMERGENCY. Alongside the state, the display shows time in current state, the last transition and its trigger, and the current values of all monitored constraints — battery SOC, GPS fix quality, VIO drift, estimated wind speed, geofence distance.
When a transition occurs, the operator sees it immediately. If the drone transitions from Grid Search to RTH, the display shows the trigger: battery_30pct or mission_complete or operator_cmd. There is no ambiguity about why the drone stopped searching. If it transitions to Emergency, the operator sees exactly which constraint was violated: vio_drift_5.2m or wind_8.4ms or battery_14pct.
After landing, the complete mission log is available for review. Every state transition with timestamp and trigger, every detection event with confidence score and GPS coordinates, every safety constraint that approached its threshold. Post-incident analysis is deterministic. There is no ambiguity about what the drone did, when it did it, or why. The FSM guarantees this — the drone was always in exactly one state, and every transition had exactly one cause.
This transparency matters for SAR operations specifically because mission failures have real consequences. A drone that aborts a search without telling the operator why is worse than useless: it wastes time and erodes trust in the system. A drone that aborts and reports GridSearch→Emergency (vio_drift_5.2m) gives the operator actionable information: the GPS environment in that area is poor and VIO error has accumulated, so either the search grid needs to be re-planned with shorter legs or the mission needs to wait for better satellite geometry. The state machine does not just control the drone. It explains itself.