Shield AI: Software-Defined Autonomy for Defense Aviation
Shield AI occupies a position in the defense technology landscape that is simultaneously commercially ambitious and technically precarious. The San Diego-based company has built Hivemind, an autonomy stack designed to fly military aircraft, including the F-16 Fighting Falcon and the V-BAT vertical-takeoff unmanned aircraft, without GPS, without communications, and without a human in the loop. It has flown. Its lineage includes the agent that won DARPA’s AlphaDogfight Trials, a simulation competition, built by Heron Systems before Shield AI acquired that company. It has operated in denied-environment exercises that would ground any system dependent on datalinks.
What Shield AI has not done—what no one has done—is certify a machine-learning-driven autonomy stack under a mature, binding airworthiness framework that regulators and program offices will consistently accept. That gap is where the real systems engineering story lives.
The Certification Problem Is Architectural, Not Procedural
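DO-178C, the software standard that governs airborne software across civil and most military programs, is requirements-based: every behavior is expected to trace to an enumerated requirement, and verification means demonstrating that the code satisfies each one, with the same inputs reliably producing the same outputs. For avionics software running deterministic control laws, DO-178C works well. For a neural network trained on simulated air combat, it does not map cleanly. A trained network is also deterministic at inference time, so the mismatch is not randomness; it is that the network’s behavior comes from training data rather than from requirements that can be traced to specific code.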
The issue is not that the standards community is ignorant of the problem. RTCA and EUROCAE have been working on companion guidance to DO-178C that addresses machine learning, and regulators have published their own material, most visibly the EASA concept paper on AI trustworthiness, alongside the RTCA SC-228 and SC-147 work relevant to unmanned systems. But no binding ML-specific airworthiness standard had cleared the regulatory pipeline as of early 2026. Programs like Hivemind operate under Special Conditions, bespoke Letters of Authorization, or test-and-evaluation frameworks negotiated program by program with AFMC, NAVAIR, or the relevant airworthiness authority.
This places Shield AI’s systems engineers in an unusual position. They must simultaneously develop the autonomy software and develop the evidence framework that will be used to evaluate it—arguing not just that their system is safe, but defining what safety evidence for a learned system should look like. That is a significant systems engineering burden on top of what is already a hard software development problem.
Hivemind’s Architecture and the Requirements Problem
Hivemind is, at its core, a hierarchical decision-making system. At the top layer sit mission-level objectives—reach this waypoint, prosecute this threat, survive this threat. Below that are tactical behaviors—maneuver, sensor management, weapon employment logic. At the lowest layer, tight control loops execute against the host aircraft’s flight control computers. The architecture is intentional: it creates natural decomposition boundaries that map onto the certification problem.
The upper layers, where learned policy drives behavior, carry the highest verification uncertainty. The lower layers, which translate Hivemind’s outputs into actuator commands through well-defined interfaces with host avionics, can be bounded and verified more conventionally.
This is the right architectural instinct. The systems engineering challenge is that requirements for the upper layers cannot be written in the same way as requirements for a flight management computer. A requirement that says “the system shall maintain safe separation from terrain” is straightforward to verify when separation is a deterministic control law. When separation behavior emerges from a trained policy that has seen millions of simulated scenarios, the requirement is harder to bound, harder to decompose, and harder to verify against specific test cases.
Shield AI’s published technical work and patent filings suggest their approach involves specifying behavioral requirements in terms of operational envelopes (bounded state-space regions within which the system must operate) and then demonstrating, through a combination of simulation and flight test, that the trained policy stays within those envelopes across a statistical sample of scenarios. This is closer to how the automotive industry approaches verification of learned functions under ISO 21448, the Safety of the Intended Functionality (SOTIF) standard, than to how traditional avionics software is verified. Whether defense airworthiness authorities accept that evidence model consistently is an open question that Shield AI is actively working to answer through engagement with DARPA, Air Force Research Laboratory, and DoD’s Chief Digital and Artificial Intelligence Office.
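To make the envelope-plus-statistics idea concrete, here is a minimal sketch in Python. It is not Shield AI’s method: the `Envelope` bounds, the `toy_policy_rollout` random walk standing in for a simulated scenario, and the choice of a one-sided Hoeffding bound are all assumptions made for illustration. The bound is also only meaningful if scenarios are sampled independently from a distribution that represents real operations, which is itself a hard requirement to justify.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical operational envelope: simple bounds on altitude and speed."""
    min_altitude_m: float
    max_altitude_m: float
    min_speed_mps: float
    max_speed_mps: float

    def contains(self, altitude_m: float, speed_mps: float) -> bool:
        return (self.min_altitude_m <= altitude_m <= self.max_altitude_m
                and self.min_speed_mps <= speed_mps <= self.max_speed_mps)


def toy_policy_rollout(rng: random.Random, steps: int = 50):
    """Stand-in for one simulated scenario; a real rollout would run the policy."""
    altitude, speed = 3000.0, 200.0
    states = []
    for _ in range(steps):
        altitude += rng.gauss(0.0, 40.0)
        speed += rng.gauss(0.0, 2.0)
        states.append((altitude, speed))
    return states


def violation_rate_upper_bound(violations: int, n: int,
                               confidence: float = 0.95) -> float:
    """One-sided Hoeffding upper bound on the true per-scenario violation rate."""
    delta = 1.0 - confidence
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, violations / n + slack)


def evaluate(envelope: Envelope, n_scenarios: int, seed: int = 0):
    """Count scenarios that leave the envelope and bound the violation rate."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_scenarios):
        states = toy_policy_rollout(rng)
        if any(not envelope.contains(a, s) for a, s in states):
            violations += 1
    return violations, violation_rate_upper_bound(violations, n_scenarios)


if __name__ == "__main__":
    env = Envelope(1000.0, 5000.0, 150.0, 250.0)
    count, bound = evaluate(env, n_scenarios=2000)
    print(f"violations={count}, 95% upper bound on violation rate={bound:.4f}")
```

The slack term shrinks only with the square root of the number of scenarios, which is one reason evidence strategies of this kind lean so heavily on large-scale simulation.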
Interface Requirements: The Boundary Between Hivemind and the Host Aircraft
The integration problem between Hivemind and a legacy platform like the F-16 is a canonical example of what systems engineers call the “integration seam”—and it is where the most tractable engineering work happens.
The F-16’s core avionics—the fire control computer, the flight control system, the mission computer—were designed decades before anyone contemplated an AI pilot. They communicate through defined interfaces: MIL-STD-1553 buses, ARINC 429 links, and platform-specific discrete signals. Hivemind cannot reach into the flight control system and rewrite guidance laws. It can only issue commands through the defined command interfaces that a human pilot would otherwise exercise—or through autopilot engagement interfaces designed for conventional guidance modes.
This is both a constraint and a blessing. It means Hivemind must respect the interface contract of the host platform, which bounds what the autonomy layer can actually do. Requirements for the Hivemind-to-avionics interface can be specified with relative precision: latency bounds, command format, mode management, failure behavior when Hivemind outputs are out of bounds or unavailable. This layer can be verified using conventional software testing, hardware-in-the-loop simulation, and ultimately flight test—approaches that existing airworthiness frameworks accommodate.
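An interface requirement of this kind can be sketched as a small command gate that sits between the autonomy layer and the host. Everything below is hypothetical: the `Command` fields, the `InterfaceContract` limits, and the zero-rate fallback are placeholders chosen for illustration, not values or behavior from any real F-16 or Hivemind interface.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class GateDecision(Enum):
    PASS = "pass"          # command forwarded unchanged
    CLAMPED = "clamped"    # command forwarded after limiting
    FALLBACK = "fallback"  # command discarded, safe default issued


@dataclass(frozen=True)
class Command:
    timestamp_s: float     # when the autonomy layer produced the command
    pitch_rate_dps: float
    roll_rate_dps: float


@dataclass(frozen=True)
class InterfaceContract:
    """Illustrative limits only; real values would come from the host ICD."""
    max_age_s: float = 0.05
    max_pitch_rate_dps: float = 20.0
    max_roll_rate_dps: float = 90.0


def _clamp(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))


def gate(cmd: Optional[Command], now_s: float,
         contract: InterfaceContract) -> Tuple[GateDecision, float, float]:
    """Apply the contract: fall back on missing, stale, or non-finite commands."""
    if cmd is None:
        return (GateDecision.FALLBACK, 0.0, 0.0)
    if now_s - cmd.timestamp_s > contract.max_age_s:
        return (GateDecision.FALLBACK, 0.0, 0.0)
    if not (math.isfinite(cmd.pitch_rate_dps) and math.isfinite(cmd.roll_rate_dps)):
        return (GateDecision.FALLBACK, 0.0, 0.0)

    pitch = _clamp(cmd.pitch_rate_dps, contract.max_pitch_rate_dps)
    roll = _clamp(cmd.roll_rate_dps, contract.max_roll_rate_dps)
    changed = (pitch != cmd.pitch_rate_dps) or (roll != cmd.roll_rate_dps)
    decision = GateDecision.CLAMPED if changed else GateDecision.PASS
    return (decision, pitch, roll)
```

Because the gate is a pure function of its inputs, each branch corresponds to one testable requirement (latency, range, availability), which is the property that lets this layer be verified conventionally.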
The harder interface requirements are behavioral: what does the host aircraft expect from an autonomous pilot in terms of mode discipline, emergency response, and graceful degradation? The F-16’s flight control system has its own envelope protection logic. If Hivemind commands a maneuver that envelope protection rejects, what happens? How is that rejection surfaced back to the autonomy layer? How does Hivemind adapt? These are requirement gaps that emerge at the boundary between two systems designed by different organizations at different times for different purposes, and closing them demands disciplined systems engineering that goes beyond software integration.
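One possible shape for an answer to those questions, offered only as a sketch, is an explicit acknowledgement message plus a running estimate of available authority on the autonomy side. The `HostAck` message, its fields, and the `AuthorityEstimate` class are hypothetical; nothing here describes how the F-16 flight control system or Hivemind actually exchange this information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostAck:
    """Hypothetical per-command acknowledgement returned by the host."""
    requested_g: float
    granted_g: float  # load factor the host's envelope protection allowed

    @property
    def limited(self) -> bool:
        return self.granted_g < self.requested_g


@dataclass
class AuthorityEstimate:
    """Autonomy layer's running estimate of the load factor it can rely on."""
    max_g: float
    margin: float = 0.9  # plan slightly inside the most restrictive limit seen

    def update(self, ack: HostAck) -> None:
        # Only tighten the estimate when the host actually limited a command.
        if ack.limited:
            self.max_g = min(self.max_g, ack.granted_g * self.margin)

    def plan_g(self, desired_g: float) -> float:
        # Never plan a maneuver the host is expected to reject.
        return min(desired_g, self.max_g)


if __name__ == "__main__":
    estimate = AuthorityEstimate(max_g=9.0)
    estimate.update(HostAck(requested_g=8.0, granted_g=6.0))
    print(estimate.plan_g(7.0))
```

This sketch only ever tightens the estimate. A real design would also need a rule for relaxing it as flight conditions change, and that rule is exactly the kind of behavioral interface requirement the two systems’ owners would have to agree on.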
Shield AI’s work on V-BAT, which it gained through its acquisition of Martin UAV in 2021, presents a somewhat different problem. V-BAT is a purpose-built unmanned platform with modern avionics designed to accept autonomous commands. Interface requirements can be specified from the ground up in a coordinated design process. The F-16 program, by contrast, is retrofitting autonomy onto an existing certified aircraft, a more complex systems engineering problem by almost every measure.
Engagement with the FAA and DoD on AI Airworthiness
The regulatory engagement picture for Shield AI spans two institutions with different mandates and different timelines.
The FAA’s jurisdiction over autonomous military aircraft is limited, but the agency’s work on AI airworthiness is relevant as a reference framework and because some Shield AI platforms may eventually operate in civil airspace. The FAA’s 2024 framework document on AI in aviation identified five properties for trustworthy aviation AI: explainability, reliability, safety, security, and alignment. These map imperfectly onto engineering requirements, but they signal the direction of regulatory thinking. Shield AI’s technical approach—particularly its emphasis on operational envelopes and statistical evidence—aligns reasonably well with the reliability and safety properties, but explainability remains a genuine tension. Neural network policies do not explain their decisions in the sense an airworthiness authority would find reassuring.
The DoD pathway is more immediate. The CDAO’s Responsible AI framework, combined with platform-specific airworthiness processes at AFMC and NAVAIR, creates the actual approval pathway for Hivemind on military platforms. The DoD’s approach has been more pragmatic than the FAA’s—risk-based, operationally bounded, with heavy reliance on flight test evidence rather than analytical verification. This suits Shield AI’s current capabilities better than a pure DO-178C approach would, but it also means each new platform integration requires a fresh negotiation rather than demonstrating compliance to a stable, accepted standard.
That negotiation overhead is a real cost—for Shield AI, for their government customers, and for program schedules. It also creates an uneven playing field: organizations with established relationships with airworthiness authorities and existing flight test infrastructure can navigate the bespoke certification process more efficiently than new entrants.
What Shield AI Gets Right, and What Remains Open
Shield AI’s engineering discipline around the decomposition of autonomy into certifiable layers is substantively correct. The instinct to push machine-learned behavior up the architectural stack and maintain conventional, verifiable interfaces at the actuator layer is sound. Their investment in simulation-based evidence generation, reportedly at the scale of billions of simulated flight hours, is arguably the only practical way to produce the statistical coverage that a behavioral requirements verification strategy demands.
The gap is on the requirements side. Specifying what an autonomous military aircraft must do, across the full range of scenarios it may encounter, in language precise enough to drive verification, is not a solved problem. Traditional requirements tools—hierarchical textual requirements in document-based systems—are poorly suited to capturing behavioral requirements for learned systems. The dimensionality of the behavioral space is too high, the interdependencies between behaviors are too complex, and the connection between a written requirement and a trained policy’s behavior is not direct in the way a software requirement maps to a function.
This is where the systems engineering toolchain matters as much as the engineering judgment. Teams working on this class of problem are increasingly moving toward graph-based models that can represent behavioral requirements as interconnected nodes with explicit dependencies, interface contracts, and verification methods—capturing not just what the system must do but the logical structure of why, and how those behaviors relate to the operational envelope commitments that underpin the airworthiness argument. Tools built natively around that model, like Flow Engineering, are starting to appear in defense programs precisely because document-based requirements management cannot represent these relationships with enough fidelity to be useful at scale.
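A toy version of such a graph model shows what the structure buys. This is a generic sketch, not the data model of Flow Engineering or any other tool: the `Requirement` fields, the identifiers such as `AW-1` and `BEH-2`, and both query functions are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Requirement:
    req_id: str
    text: str
    verification: Optional[str] = None  # e.g. "simulation", "flight test"
    depends_on: List[str] = field(default_factory=list)


def impacted_by(reqs: Dict[str, Requirement], changed_id: str) -> Set[str]:
    """Every requirement that transitively depends on the changed one."""
    impacted: Set[str] = set()
    frontier = [changed_id]
    while frontier:
        current = frontier.pop()
        for req in reqs.values():
            if current in req.depends_on and req.req_id not in impacted:
                impacted.add(req.req_id)
                frontier.append(req.req_id)
    return impacted


def unverified_leaves(reqs: Dict[str, Requirement]) -> List[str]:
    """Leaf requirements that have no verification method assigned yet."""
    return sorted(r.req_id for r in reqs.values()
                  if not r.depends_on and r.verification is None)


def build_example() -> Dict[str, Requirement]:
    items = [
        Requirement("AW-1", "Airworthiness argument for autonomous flight",
                    depends_on=["ENV-1", "IF-1"]),
        Requirement("ENV-1", "Policy stays inside the operational envelope",
                    depends_on=["BEH-1", "BEH-2"]),
        Requirement("BEH-1", "Maintain terrain separation",
                    verification="simulation"),
        Requirement("BEH-2", "Degrade gracefully on sensor loss"),
        Requirement("IF-1", "Command latency within bound",
                    verification="hardware-in-the-loop"),
    ]
    return {r.req_id: r for r in items}


if __name__ == "__main__":
    graph = build_example()
    print(sorted(impacted_by(graph, "BEH-1")))
    print(unverified_leaves(graph))
```

Questions like "what does the airworthiness argument lose if this behavior changes?" and "which commitments still lack evidence?" become simple graph queries here, whereas a document-based requirements set leaves them to manual cross-referencing.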
Honest Assessment
Shield AI is doing genuinely hard engineering in a space where the regulatory infrastructure has not caught up with the technical capability. Their architectural approach is defensible. Their simulation-based evidence strategy is appropriate given the constraints. Their flight test record demonstrates real-world capability.
What the company cannot yet claim—what no organization can claim—is a repeatable, standardized certification pathway for AI-driven autonomous flight software. Every program is a negotiation. Every airworthiness approval is a custom artifact. Until binding standards emerge from RTCA, EASA, or DoD policy, that will remain true.
The systems engineering burden that imposes on organizations building autonomous aviation systems is significant. It means the engineering discipline around requirements—how behavior is specified, how interfaces are defined, how verification evidence is structured—is not just a process concern. It is a competitive differentiator. Organizations that can produce coherent, traceable, technically credible requirements for autonomous behavior will move through certification faster than those that cannot.
Shield AI has demonstrated that autonomous military flight is technically feasible. The remaining question—how you certify it in a way that scales—is fundamentally a systems engineering question. The answer will come from the intersection of better standards, better tooling, and the hard-won experience of programs like this one.