Shield AI: Building Autonomous Defense Systems at Speed Without Losing Engineering Rigor

Shield AI was founded in 2015 with a premise that was simultaneously obvious and technically brutal: the U.S. military needed autonomous systems that could operate without GPS, without communications links, and without human pilots in the loop. The company’s flagship product, Hivemind, is an AI pilot designed to fly military aircraft in precisely those denied-environment conditions. Today it runs on the V-BAT vertical takeoff drone, and Shield AI has publicly stated ambitions to deploy Hivemind on the F-16 and F/A-18—crewed fighter aircraft with legacy avionics that were never designed to accommodate an AI decision layer.

That gap—between what Hivemind needs to do and what the surrounding system was built to support—is the central engineering challenge Shield AI faces. It is not primarily a machine learning problem. It is a systems engineering problem.

The Architecture of Ambition

Hivemind is not a single software artifact. It is a stack: perception, state estimation, decision-making, and control, layered across different compute substrates depending on the host aircraft. The V-BAT integration is relatively clean—Shield AI has significant influence over the vehicle design. The F-16 and F/A-18 programs are fundamentally different in character. Those aircraft have existing airworthiness approvals, certified avionics architectures, and interface control documents written before modern neural networks existed as a practical engineering tool.

This means Shield AI is not just building an AI pilot. It is building an AI pilot that must interface with systems whose interface requirements were defined under entirely different assumptions about what software is and how it behaves.

The systems engineering implication is significant. Every interface point between Hivemind and a legacy aircraft system becomes a requirements boundary where traceability must be established and maintained. When the AI stack changes—when a perception model is retrained, when a tactical decision policy is updated—the question is not just whether the system works better. The question is whether the changed component still satisfies every requirement at every interface it touches. On a modern, continuously trained AI system, that question can be very hard to answer with confidence.

The Certification Gap and How Industry Is Navigating It

DO-178C, the software guidance that the FAA, EASA, and many allied authorities recognize as an acceptable means of compliance for airborne software, is built on a deterministic model of software behavior. You specify what the software should do. You write code that does it. You test that the code does it. You produce an artifact trail showing that every requirement has been tested and every test has passed. The assumption underlying all of this is that the same inputs, in the same conditions, produce the same outputs.

Neural networks do not fit that model cleanly. A trained model is, in a meaningful sense, a function learned from data rather than specified by engineers. Its behavior emerges from training rather than from explicit logical construction. The structural coverage metrics in DO-178C (statement coverage, decision coverage, and modified condition/decision coverage) do not map cleanly onto learned functions. There is no obvious “branch” in a neural network layer to cover.
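The mismatch can be made concrete with a toy example. The sketch below is illustrative only: `tiny_policy`, its weights, and the `executed_lines` helper are hypothetical names, not anything taken from Hivemind or from DO-178C tooling. It records which source lines run when a one-neuron network is evaluated on two different inputs.

```python
import sys

# Hypothetical parameters for a one-neuron "policy"; values are arbitrary.
WEIGHTS = [1.0, -2.0]
BIAS = 0.5


def relu(x):
    # No if/else here: the nonlinearity is arithmetic, not a source-level branch.
    return max(0.0, x)


def tiny_policy(inputs, weights, bias):
    total = bias
    for w, x in zip(weights, inputs):
        total += w * x
    return relu(total)


def executed_lines(func, *args):
    """Run func(*args) and return (result, set of (function name, line number))."""
    lines = set()

    def tracer(frame, event, arg):
        if event == "line":
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, lines


out_a, lines_a = executed_lines(tiny_policy, [1.0, 0.0], WEIGHTS, BIAS)
out_b, lines_b = executed_lines(tiny_policy, [0.0, 1.0], WEIGHTS, BIAS)
print(out_a, out_b, lines_a == lines_b)
```

Both inputs execute exactly the same source lines, so a single test vector already achieves full statement coverage of this code. The outputs still differ, and the coverage figure says nothing about whether either output is the right one. The behavior lives in the weights, which structural coverage never inspects.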

The FAA and EASA have both acknowledged this gap. EASA published its first concept paper on AI in aviation in 2021 and has issued updated guidance since, acknowledging that existing standards require adaptation. The FAA has been pursuing research under its Roadmap for Artificial Intelligence Safety Assurance. But as of 2026, there is no ratified airworthiness standard for AI-based autonomous systems in crewed military aircraft. There are workarounds, such as runtime monitors, formal operational design domains, and safety cages that limit what the AI is permitted to command, but no clean, accepted path.
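Of those workarounds, the safety cage is the easiest to sketch. The example below is a minimal illustration under stated assumptions: the limit values, the `Command` fields, and the altitude-floor rule are invented for the example and do not describe any Shield AI product or aircraft interface. The idea is that a small, conventionally verifiable function sits between the learned policy and the flight controls and bounds whatever the policy asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandLimits:
    # Hypothetical envelope values, for illustration only.
    max_bank_deg: float = 60.0
    max_pitch_deg: float = 25.0
    min_altitude_m: float = 150.0


@dataclass(frozen=True)
class Command:
    bank_deg: float
    pitch_deg: float


def clamp(value, low, high):
    return max(low, min(high, value))


def safety_cage(cmd, altitude_m, limits=CommandLimits()):
    """Return (command to forward, whether the cage modified it)."""
    bank = clamp(cmd.bank_deg, -limits.max_bank_deg, limits.max_bank_deg)
    pitch = clamp(cmd.pitch_deg, -limits.max_pitch_deg, limits.max_pitch_deg)
    # Below the altitude floor, refuse any nose-down command.
    if altitude_m < limits.min_altitude_m:
        pitch = max(pitch, 0.0)
    safe = Command(bank, pitch)
    return safe, safe != cmd


# The policy asks for an aggressive bank and a descent while already low.
safe_cmd, intervened = safety_cage(Command(80.0, -10.0), altitude_m=100.0)
print(safe_cmd, intervened)
```

The appeal of this pattern is that the cage itself is ordinary deterministic code with explicit branches, so it can be specified and tested in the traditional way even when the policy behind it cannot. The cost is that the cage only bounds commands; it does not show that the policy's choices inside the envelope are good ones.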

For Shield AI, operating in the military aviation space rather than commercial aviation, the regulatory framework is different in important ways. Military airworthiness in the United States is governed by the applicable service’s airworthiness authority—the Army, Navy, and Air Force each have their own—rather than by the FAA. These authorities have more flexibility to approve novel approaches, and the risk tolerance for experimental programs is higher. But the underlying challenge does not disappear. Programs that move from experimental to operational still require an approval basis, and acquisition programs still require that contractors demonstrate their software meets specified requirements. The artifact trail still matters. The traceability still matters.

Shield AI is not unusual in facing this challenge. Every company attempting to field AI-based autonomous systems in military aviation faces the same structural problem. What distinguishes Shield AI is that it has committed to fielding on multiple platforms simultaneously, at speed, with a team that is large but still small relative to a traditional prime defense contractor.

Speed Culture and the Systems Engineering Tax

Shield AI has been public about its engineering culture. The company grew rapidly, attracting talent from Google, SpaceX, and other technology-forward organizations where fast iteration is the default mode. Co-founders Ryan Tseng and Brandon Tseng have spoken about building a company that moves with Silicon Valley velocity in a defense context that typically moves at the pace of acquisition bureaucracy.

This creates a genuine tension. Fast iteration is the right approach for an AI system that needs to improve through experience and testing. Reinforcement learning-based policies, in particular, get better by running in environments that approximate the real one. You want to cycle quickly: deploy, observe, retrain, evaluate, redeploy. The feedback loop is the mechanism of improvement.

But every cycle in that loop, on a safety-critical system, creates a new configuration that must be characterized. In traditional avionics development, configuration management is a solved problem—expensive and slow, but solved. Software versions are locked, tested against a fixed set of requirements, and released as discrete artifacts. In an AI development pipeline, the concept of a “version” gets complicated. A retrained model is technically a different artifact than its predecessor, even if the architecture is identical. The training data changed. The weights changed. The behavior may have changed in ways that are not immediately obvious from test performance metrics.
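One way to make a model “version” unambiguous is to derive its identifier from content rather than from a name or a date. The sketch below is an assumption about how that could look, not a description of Shield AI's pipeline: the function `configuration_id` and the record fields (architecture, code revision, dataset manifest, weights) are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def configuration_id(weights: bytes, dataset_manifest: dict,
                     architecture: str, code_revision: str) -> str:
    """Fingerprint a trained model from everything that determines its behavior."""
    record = {
        "architecture": architecture,
        "code_revision": code_revision,
        # Canonical JSON (sorted keys) so key order cannot change the hash.
        "dataset": sha256_hex(
            json.dumps(dataset_manifest, sort_keys=True).encode("utf-8")
        ),
        "weights": sha256_hex(weights),
    }
    return sha256_hex(json.dumps(record, sort_keys=True).encode("utf-8"))


# Hypothetical inputs: same architecture and code, retrained on one extra log.
manifest_v1 = {"flight_logs": ["log_001", "log_002"]}
manifest_v2 = {"flight_logs": ["log_001", "log_002", "log_003"]}

id_before = configuration_id(b"\x00\x01\x02", manifest_v1, "policy-net", "abc123")
id_after = configuration_id(b"\x00\x01\x03", manifest_v2, "policy-net", "abc123")
print(id_before != id_after)
```

Under this scheme a retrained model with an identical architecture still receives a new identifier, because the weights and the training data both feed the hash. Test evidence can then be attached to the identifier, which makes it explicit that results gathered against one configuration do not automatically carry over to the next.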

The systems engineering tax here is real. For every capability advance Hivemind achieves through iteration, there is a corresponding cost in verification, validation, and configuration management effort. Companies that underinvest in that infrastructure end up with systems they cannot explain to an approving authority, no matter how well those systems actually perform. Companies that overinvest in process slow down to the point where the AI falls behind the operational problem it is supposed to solve.

Requirements Traceability Across an Evolving Stack

Defense acquisition programs operate on requirements documents. System Requirements Reviews, Preliminary Design Reviews, and Critical Design Reviews all require that a contractor demonstrate traceability from top-level mission requirements down through system, subsystem, and component requirements to the implementation and test evidence. This is not optional bureaucracy. It is the mechanism by which the government confirms it is getting what it paid for, and by which the contractor protects itself from scope disputes.

For a traditional avionics system, requirements traceability is hard but tractable. The architecture is relatively stable. The requirements, once baselined, change slowly. Requirement changes go through a formal change management process that generates an audit trail.

For an autonomy stack like Hivemind, requirements traceability is harder in ways that are qualitatively different, not just quantitatively more burdensome. The fundamental problem is that the behavior of the system is not fully specified by its requirements; it is partly emergent from training. When engineers write a requirement like “the system shall maintain safe separation from terrain in all weather conditions,” they are specifying a desired outcome, not a mechanism. The mechanism is a trained neural policy, and whether that policy satisfies the requirement must be demonstrated through testing and formal verification, neither of which scales easily to the full operational envelope.

This means that requirements traceability for autonomous AI systems must capture not just the link from requirement to code, but the link from requirement to training objective, to evaluation metric, to test condition coverage, to observed performance. That is a significantly more complex traceability graph than a traditional requirements matrix captures. Static document management tools—the kind that represent requirements as rows in a spreadsheet or nodes in a hierarchical tree—struggle to represent these relationships faithfully.
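A small sketch shows why this is a graph rather than a matrix. Everything in it is hypothetical: the `TraceGraph` class, the artifact kinds in `CHAIN`, and the identifiers such as `REQ-1` are invented for illustration and do not reflect any particular requirements tool. Each artifact is a typed node, links are directed, and a requirement counts as fully traced only if every kind of artifact in the chain is reachable from it.

```python
# Artifact kinds an evidence chain is assumed to include (illustrative only).
CHAIN = ["requirement", "training_objective", "evaluation_metric",
         "test_condition", "observed_result"]


class TraceGraph:
    def __init__(self):
        self.kind = {}    # node id -> artifact kind
        self.links = {}   # node id -> set of downstream node ids

    def add(self, node_id, kind):
        if kind not in CHAIN:
            raise ValueError(f"unknown artifact kind: {kind}")
        self.kind[node_id] = kind
        self.links.setdefault(node_id, set())

    def link(self, source, target):
        if source not in self.kind or target not in self.kind:
            raise KeyError("both ends of a link must be added first")
        self.links[source].add(target)

    def missing_kinds(self, requirement_id):
        """Return the artifact kinds not reachable from the requirement."""
        reached, seen, stack = set(), set(), [requirement_id]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            reached.add(self.kind[node])
            stack.extend(self.links[node])
        return [k for k in CHAIN if k not in reached]


graph = TraceGraph()
graph.add("REQ-1", "requirement")          # e.g. terrain separation
graph.add("REQ-2", "requirement")          # a second requirement
graph.add("OBJ-1", "training_objective")
graph.add("MET-1", "evaluation_metric")
graph.add("TC-1", "test_condition")
graph.link("REQ-1", "OBJ-1")
graph.link("OBJ-1", "MET-1")
graph.link("MET-1", "TC-1")
graph.link("REQ-2", "MET-1")               # one metric serves two requirements

print(graph.missing_kinds("REQ-1"))
print(graph.missing_kinds("REQ-2"))
```

Here `REQ-1` is missing only observed results, while `REQ-2` shares an evaluation metric with `REQ-1` but has no training objective at all. That shared node is the kind of relationship a strict hierarchy cannot express, since a tree gives each artifact exactly one parent. The check is deliberately simple: it tests reachability by artifact kind, not the order of the chain or the adequacy of the evidence.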

The tooling challenge is one the broader defense autonomy community is actively working on. Requirements management platforms that support graph-based traceability, where a requirement can be linked to multiple artifact types in multiple directions, are better suited to this problem than those that enforce a strict hierarchical decomposition. The gap between what document-centric tools can represent and what autonomy programs actually need to trace is one reason teams in this space often end up building custom internal tooling or maintaining parallel traceability artifacts outside their official requirements management system. Tools like Flow Engineering, which are purpose-built around graph-based requirements models rather than adapted document management, are increasingly relevant to programs with this kind of complex, evolving traceability architecture.

Team Scale and the Organizational Challenge

Shield AI is a mid-sized company by Silicon Valley standards, with roughly 700 to 1,000 employees according to publicly available information. By defense prime contractor standards, it is small. By autonomy startup standards, it is large.

That intermediate size creates specific organizational challenges for systems engineering. A small startup can run on shared context: everyone knows what everyone else is doing, and informal coordination fills in for formal process. A large prime contractor has the process infrastructure—configuration management systems, systems engineering organizations with dedicated headcount, independent verification and validation teams—to manage complexity at scale. Shield AI is neither.

What Shield AI has is a culture that values engineering rigor alongside speed, and a set of programs—especially the crewed aircraft integrations—that are forcing the development of more rigorous systems engineering practices whether or not that was the original plan. Flying an AI on a V-BAT is forgiving of process immaturity in ways that attaching an AI pilot to an F-16 is not.

The Honest Assessment

Shield AI is attempting something that has not been done before at scale: fielding a continuously learning AI system across multiple military aircraft platforms under real operational constraints. The technical ambition is genuine. The engineering challenges are also genuine, and they are not fully solved—by Shield AI or by anyone else in the industry.

The most important systems engineering problems they face—certifying non-deterministic software against deterministic standards, maintaining meaningful requirements traceability through rapid AI iteration cycles, scaling process rigor without losing development velocity—are problems that define the next decade of defense autonomy development, not just Shield AI’s roadmap.

How they navigate those problems will matter beyond their own programs. Defense AI developers, acquisition professionals, and systems engineers across the industry are watching how companies like Shield AI handle the intersection of speed and rigor, because the approach that works—if one emerges—will become the template.

The systems engineering community has a stake in that outcome. The answers will not come from process documents alone. They will come from engineering teams that take requirements, traceability, and verification seriously as enabling disciplines rather than compliance burdens—and from the tooling infrastructure that makes doing so tractable at the pace autonomy programs actually move.