SAR's AI today runs in three places: a local physics-based decision engine on the ground station (always on, instant, offline); an optional cloud refinement layer for natural-language reasoning (Anthropic, OpenAI, Google — whichever's available); and onboard detection inference (visual / thermal classification, always on the drone). The cloud layer is what gives SAR AI its contextual-reasoning quality — pattern recognition across missions, natural-language explanations the operator can read at 03:00, weather-forecast integration. It's also the layer that needs internet, and exactly the one that customers in police, military, coastguard, and EU-sovereignty procurement are most uncomfortable with. Soon, that whole layer moves onto the drone. This post is about what changes.

The split today

SAR's current AI architecture is honest about what runs where:

  • Local decision engine (ground station, always offline) — physics-based. Battery curves, wind drag, transit power, charge times. Produces the baseline recommendation from live telemetry. Always available; never needs network.
  • Cloud refinement layer (optional, Anthropic → OpenAI → Google) — adds natural-language reasoning, pattern recognition, weather-forecast integration, contextual explanations the operator reads in the audit log. First successful response wins; a sketch of that chain follows this list.
  • Onboard detection (the drone's companion computer, always on the drone) — visual and thermal classification at 5–30 FPS. Person, vessel, vehicle. Edge inference, sub-200 ms latency, never streamed to the cloud.
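To make the fallback order concrete, here's a minimal sketch of a first-successful-response-wins chain, assuming a provider is just a name plus a callable. The function and type names are illustrative, not the shipped client code.

```python
from typing import Callable, Optional

# One (name, call) pair per configured provider, tried in order:
# Anthropic, then OpenAI, then Google. Names and signatures here are
# illustrative assumptions, not SAR's actual client code.
Provider = tuple[str, Callable[[str], str]]

def refine_with_cloud(prompt: str, providers: list[Provider]) -> Optional[dict]:
    for name, call in providers:
        try:
            return {"source": name, "text": call(prompt)}  # first success wins
        except Exception:
            continue  # provider unreachable or errored: fall through to the next
    return None  # no refinement available; the local decision-engine output stands
```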

The cloud refinement is genuinely optional — a deployment with zero API keys configured runs 100% on the local engine and the safety gate. We've shipped this from day one. Air-gapped deployments are already real today.

But there's an asterisk on the air-gapped story. Without the cloud refinement layer, the operator loses the natural-language reasoning quality. Decisions still come out — the physics engine is the load-bearing piece — but the narrative the operator reads in the audit log is thinner. Today's air-gapped mode shows decision-engine output verbatim; cloud-connected mode shows the same decisions wrapped in contextual reasoning that helps the operator understand why a swap-threshold change makes sense given today's wind pattern and yesterday's mission outcomes.

The shift coming up erases that asterisk.

What changes with on-device AI

Drone One's preferred companion-compute spec is moving to a Hailo-10H NPU with 8 GB of dedicated onboard DRAM. The architecturally important detail isn't TOPS — it's the dedicated DRAM. Without it, large-model inference would compete for the Pi's system RAM and starve the camera pipeline, the MAVLink bridge, the alert engine. With it, the NPU becomes a self-contained AI co-processor that doesn't fight for system resources.

What that compute unlocks: the natural-language reasoning layer of SAR AI moves from the cloud onto the drone. Not every layer — detection inference still runs on the drone's main compute (it always has, it always will). But the LLM-quality reasoning layer that today requires a Claude / GPT / Gemini API call now runs locally on the same airframe.

Five concrete LLM/VLM use cases, all running on-device:

1. Pre-flight health summariser. The Pre-Flight Health Panel reads eight live MAVLink checks (battery percentage, cell balance, GPS fix, compass, failsafe config, geofence, telemetry heartbeat, Remote ID) plus the current wind and 1-hour forecast. The on-device LLM reasons across those signals and produces structured JSON output the operator reads as: "Hold launch — drone-2 battery 22%, swap before launch; forecast gust 11.3 m/s within 1 h, abort risk." Today this is a Claude API call. With on-device AI, this runs on the drone's NPU and the operator sees the same output with a LOCAL-DEVICE source badge instead of CLAUDE; an illustrative payload follows this list.

2. In-flight relay reasoner. Every 30 seconds during a fleet patrol, SAR AI reads fleet telemetry — battery levels, positions, weather, detection density, coverage state — and produces optimised relay parameters: swap threshold, relay overlap, transit speed multiplier, cruise speed, preferred standby drone for the next handoff. The physics-based decision engine produces the recommendation; today's cloud layer adds the reasoning narrative. With on-device AI, the reasoning narrative runs locally — same 30-second cycle, same parameters, same audit log, no internet round-trip.

3. Per-relay-handoff debrief. New capability. After each successful handoff, the on-device AI produces a structured summary of the just-completed sortie: drone-id, sortie duration, battery consumption, detections observed, anomalies encountered, handoff quality. Per-relay debriefs are short and machine-parseable; they accumulate over a 24-hour patrol and feed the end-of-mission report below. Today this doesn't exist as a feature — adding it is part of the on-device AI capability uplift.

4. End-of-mission debrief. Also new. At mission end, the on-device AI produces a structured cumulative report: total flight hours, sortie count, detections observed (with VLM-generated descriptions), anomalies, handoff quality across the mission, operator override events, weather correlation analysis, and recommendations for the next deployment of the same fleet at the same AOI. The operator reads this in the after-action review; the audit-log version is machine-parseable for evidential review and procurement reporting.

5. Detection-confirmation VLM. When the onboard detection model fires (SSD MobileNet flags a person, vessel, or vehicle), the captured frame is passed through a Vision-Language Model that produces a natural-language description of what's actually in the image: "Figure in red jacket waving from rocky terrain, partially obscured by trees, approximately 5 m below the drone, posture suggests distress." That's actionable in a way a confidence score isn't. Operators dispatching ground crews want to know what kind of person — adult / child / multiple individuals / posture suggesting injury — not just that something is 87% likely to be a person.
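To make use case 1 concrete, this is roughly the kind of structured payload the on-device summariser would emit before the operator UI renders it as prose. The field names and schema are illustrative assumptions that echo the example above, not the shipped schema.

```json
{
  "schema_version": "1.0",
  "recommendation": "hold_launch",
  "drone_id": "drone-2",
  "findings": [
    {"check": "battery", "status": "fail", "detail": "22%, below swap threshold"},
    {"check": "wind_forecast", "status": "warn", "detail": "gust 11.3 m/s within 1 h"}
  ],
  "operator_summary": "Hold launch — drone-2 battery 22%, swap before launch; forecast gust 11.3 m/s within 1 h, abort risk.",
  "source": "LOCAL_DEVICE"
}
```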

The structured-JSON discipline

Every LLM and VLM output across all five use cases follows a single discipline: the model emits structured JSON; the operator UI renders it as natural-language prose for the operator's eyes.

Not because operators want to read JSON. Because:

  • Audit log. Every recommendation is parseable, versioned, comparable across missions.
  • Schema evolution. We can add fields without retraining a model that learnt to emit a particular paragraph format.
  • Translation. The operator UI renders any locale — JSON is locale-neutral.
  • Determinism. Schema compliance is automatable; we can evaluate model output for "did it produce the required fields" without subjective text-quality assessment.
  • Component swappability. When the NPU SDK adds support for a newer / better model, the JSON contract stays the same — the model file changes, the pipeline doesn't.

That last point matters more than it seems. The model layer is intentionally hot-swappable. We don't hardcode any specific model name in the inference pipeline; there's a LocalLlmAdapter interface, and the model behind it is a configuration choice that can change as the model zoo evolves. Today's choice may not be next quarter's choice; the integration cost of an upgrade is bounded.
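A minimal sketch of that boundary, assuming the adapter is a small interface the pipeline calls and a required-fields check stands in for full schema validation. Only the LocalLlmAdapter name comes from the design above; everything else is illustrative.

```python
from typing import Optional, Protocol

class LocalLlmAdapter(Protocol):
    """The hot-swap boundary: the pipeline calls this; the model behind it is config."""
    def generate(self, prompt: str, schema_version: str) -> dict:
        """Run on-device inference and return structured JSON as a dict."""
        ...

# Stand-in for full schema validation; real schemas are versioned per use case.
REQUIRED_FIELDS = {"schema_version", "recommendation", "operator_summary", "source"}

def refine_locally(adapter: LocalLlmAdapter, prompt: str) -> Optional[dict]:
    payload = adapter.generate(prompt, schema_version="1.0")
    if REQUIRED_FIELDS.issubset(payload):   # compliance check is on fields, not prose
        return payload
    return None  # non-compliant output is treated as "on-device AI unavailable"
```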

Three execution modes

The on-device AI stack ships with three feature-flagged execution modes:

  • Default — on-device, no fallback. The drone runs the LLM and VLM locally; the ground station displays the result; no cloud calls. Fully air-gapped capable. If on-device AI fails (latency spike, NPU error), the operator sees "on-device AI unavailable" and the local decision-engine output stands in; prompts and telemetry are never silently sent to a cloud API. This is the strong sovereignty default.
  • Feature flag #1 — on-device with ground-station fallback. The drone runs locally; if local AI fails or returns low confidence, the ground-station cloud AI takes over. Hybrid mode for resilience over isolation. Useful when the operator values the contextual reasoning over the air-gapped guarantee.
  • Feature flag #2 — ground-station replaces on-device. Equivalent to today's product. All AI on the ground station; the drone runs detection only. Useful for V1-spec hardware (no Hailo-10H NPU onboard), for centralised-AI deployments where one operator's command post runs the AI for many drones, or while on-device features are still maturing.

The default is sovereignty. Everything else is opt-in.
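As a sketch of how those modes can stay a pure configuration choice, here's one way to express the dispatch. The enum values and function signature are assumptions for illustration, not the shipped flag names.

```python
from enum import Enum
from typing import Callable, Optional

class AiExecutionMode(Enum):
    ON_DEVICE_ONLY = "on_device_only"            # default: no cloud calls, ever
    ON_DEVICE_WITH_GS_FALLBACK = "gs_fallback"   # feature flag #1
    GROUND_STATION_ONLY = "ground_station"       # feature flag #2: today's product

def run_reasoning(mode: AiExecutionMode,
                  prompt: str,
                  on_device: Callable[[str], Optional[dict]],
                  ground_station: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Dispatch per configured mode; None means the local decision-engine output stands."""
    if mode is AiExecutionMode.ON_DEVICE_ONLY:
        return on_device(prompt)                          # never touches the cloud
    if mode is AiExecutionMode.ON_DEVICE_WITH_GS_FALLBACK:
        return on_device(prompt) or ground_station(prompt)
    return ground_station(prompt)                         # ground station replaces on-device
```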

Detection inference: always on the drone

This part doesn't change. The on-drone detection pipeline (SSD MobileNet for person / vessel / vehicle classification) has always been on the drone, and stays on the drone, regardless of the AI mode. The reasons are operational, not philosophical: latency (sub-200 ms), bandwidth (a long-range datalink can't carry HD video to the ground for cloud inference), and degraded-network resilience (loss of the C2 link doesn't blind the detection pipeline).

Ground-station validation runs after the on-drone detection: dedupe across multiple detections of the same target, false-positive flagging, escalation routing to the operator. The ground station never re-runs the detection itself — it's the validation layer, not the inference layer. The on-device VLM described in use case 5 above adds natural-language description of detections; it doesn't re-classify them.
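A sketch of the dedupe step in that validation pass, assuming detections arrive as dicts with a class label and a position. The 25 m radius and the field names are illustrative assumptions, not shipped values.

```python
import math

DEDUPE_RADIUS_M = 25.0  # assumed radius: same-class detections closer than this are one target

def _ground_distance_m(a: dict, b: dict) -> float:
    # Equirectangular approximation; accurate enough at search-pattern scales.
    dlat = math.radians(b["lat"] - a["lat"])
    dlon = math.radians(b["lon"] - a["lon"]) * math.cos(math.radians(a["lat"]))
    return 6_371_000 * math.hypot(dlat, dlon)

def deduplicate(detections: list[dict]) -> list[dict]:
    """Keep one detection per target; repeat sightings of the same target are dropped."""
    accepted: list[dict] = []
    for det in detections:
        repeat = any(det["cls"] == t["cls"] and _ground_distance_m(det, t) < DEDUPE_RADIUS_M
                     for t in accepted)
        if not repeat:
            accepted.append(det)  # new target: escalate to the operator
    return accepted
```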

When this ships

Today, customer kits ship with the V1 onboard AI hardware — vision-only, ground-station / cloud AI for natural-language reasoning. That configuration has been validated through workshop development and is the right risk profile for first paid pilots.

The V2 hardware is what we're testing in the workshop now. It's the architectural lift that makes everything in this post real for shipped customers, and it ships once the on-device LLM features are integrated, the workshop fleet has logged 6+ months of field-equivalent flight time on V2, and we have side-by-side quality comparisons against today's cloud layer.

The marketing posture in the meantime: today's air-gapped mode runs the decision engine and the safety gate fully offline; tomorrow's air-gapped mode runs the entire SAR AI stack — including the natural-language reasoning layer — fully on the drone. Same product, same software-portable autonomy story, an architectural upgrade to the AI tier that lands when it's earned.

Where this fits

The on-device AI shift is the load-bearing capability for three customer segments who previously needed asterisks:

  • Police / forensic-evidence environments. Every AI recommendation lives in an evidential audit trail with no cloud-residency question. The audit log is parseable structured JSON; chain-of-custody is intact from on-drone inference through operator action.
  • Military / contested-RF deployments. No cloud dependency means no upstream link to deny. The autonomy stack runs the same way under a hostile spectrum environment as it does in good connectivity.
  • EU-sovereign procurement. Combined with the medium-term Drone One v2 chassis migration toward European-origin components, the on-device AI tier removes the last "but the AI calls a US cloud" objection. Procurement language can read "all autonomous decision-making runs on the airframe; no external-jurisdiction infrastructure is invoked during a mission" — truthfully.

For civilian SAR / coastguard / fire customers in normal connectivity environments, the change is invisible — except that it's faster (no internet round-trip) and free (no per-API-call cost). For procurement-constrained customers, the change is the differentiator.

Next

For first-pilot evaluation today on V1-spec hardware, the pricing page reflects what ships now: ground-station AI plus on-drone detection. For the forward roadmap, what you've just read is what we're building.

If you want to talk through whether on-device AI changes your procurement conversation — whether you're sovereignty-bound, contested-RF-bound, or just allergic to ongoing cloud-API costs — get in touch. The hardware is being validated now. The software ships through 2026.

Related reading: Meet SAR AI — the current architecture this builds on; Air-Gapped Drone Operations — today's offline mode and what tomorrow's adds; Three Layers of Safety — the safety architecture that runs underneath all of this; Building Onboard Human Detection — the detection pipeline that always stays on the drone.