Shield AI: Building Autonomous Military Intelligence from First Principles
How the maker of Hivemind structures systems engineering for AI pilots that must fight without GPS, comms, or a human in the loop
The Problem No One Else Wanted
Most aerospace autonomy programs start with a capable aircraft and ask how much of the pilot’s cognitive load a computer can absorb. Shield AI started somewhere different: what does an aircraft need to do when every external reference—GPS, datalinks, command authority, real-time human oversight—has been stripped away by an adversary?
That question defines the company’s entire engineering posture. Founded in 2015 by Ryan Tseng, Brandon Tseng, and Andrew Reiter, Shield AI entered public awareness through its work on autonomous indoor reconnaissance for dismounted infantry: Nova quadrotors clearing rooms without GPS. The leap from an indoor quadrotor to an autonomous F-16 is not incremental, but the engineering constraint is identical: operate reliably in the absence of the infrastructure everyone else assumes.
Hivemind, Shield AI’s AI pilot stack, is now the company’s central product. It has flown on the X-62A VISTA, the modified F-16 used as the flight test aircraft in DARPA’s Air Combat Evolution (ACE) program, and on the company’s own V-BAT uncrewed platform. The claim is serious: a software stack that can perform tactical air combat maneuvering, manage aircraft systems, and coordinate multi-ship formations without human intervention once the mission begins.
What follows is a systems engineering examination of how that claim gets built, certified, and structured across an organization that spans airframes, chips, and learned behavior.
The Hivemind Architecture: A Pilot Replacement, Not a Copilot
The term “AI pilot” is used carelessly in aerospace. Most implementations are decision-support systems—they recommend, they alert, they automate discrete maneuvers. Hivemind is architected differently, as a full pilot replacement across the sense-plan-act loop.
Shield AI has been deliberately limited in its public technical disclosures, which is appropriate given the program sensitivities. But from published information, ACE program documentation, and the company’s own technical blog posts, the Hivemind architecture breaks into three functional layers:
Perception. The aircraft must maintain situational awareness using only onboard sensors—radar, electro-optical/infrared, and inertial navigation. No GPS correction, no off-board sensor fusion from a ground network, no datalink updates from AWACS. This makes the perception problem significantly harder than commercial autonomous vehicles, which can assume persistent connectivity and map updates. Shield AI’s perception stack must handle sensor degradation, electronic jamming, and the high-speed, three-dimensional geometry of an air combat environment simultaneously.
Reasoning and tactical decision-making. This is where Hivemind diverges most sharply from legacy autopilot logic. Rather than encoding tactics as deterministic state machines—if enemy is at bearing X and range Y, execute maneuver Z—Shield AI uses a trained policy approach refined through simulation. The ACE program publicly documented this: Hivemind agents trained through air combat simulation, then transferred to real aircraft. The tactical behavior emerges from training rather than from explicit rule encoding. This is the architectural bet that makes Hivemind powerful and that makes certifying it genuinely hard.
Vehicle management and execution. Translating tactical decisions into flight control commands on a specific airframe. This layer must interface with legacy avionics on crewed aircraft like the F-16, or with Shield AI’s own systems on purpose-built platforms. The execution layer is the most conventional part of the stack—closest to existing avionics software practices—but it must handle commands from a reasoning layer that does not behave like a traditional flight management system.
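The three layers can be pictured as a pipeline with narrow, typed interfaces between them. The sketch below is illustrative only: every name in it (`WorldState`, `Intent`, `Command`, `perceive`, `decide`, `execute`, the sensor keys) is hypothetical, the hand-written `decide` function merely stands in for a trained policy, and none of it reflects Shield AI’s actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldState:
    """Perception output, built from onboard sensors only."""
    own_speed_mps: float
    contact_bearing_deg: float  # relative bearing, positive to the right
    contact_range_m: float


@dataclass(frozen=True)
class Intent:
    """Reasoning output: a tactical goal, not a control-surface command."""
    heading_change_deg: float
    target_speed_mps: float


@dataclass(frozen=True)
class Command:
    """Execution output: values bounded for a specific airframe."""
    bank_deg: float
    throttle: float  # 0.0 (idle) to 1.0 (full)


def perceive(raw: dict) -> WorldState:
    # Hypothetical sensor keys; note that no GPS or datalink inputs appear here.
    return WorldState(raw["ins_speed_mps"], raw["radar_bearing_deg"], raw["radar_range_m"])


def decide(state: WorldState) -> Intent:
    # Stand-in for a trained policy: turn toward the contact, hold current speed.
    return Intent(state.contact_bearing_deg, state.own_speed_mps)


def execute(intent: Intent, state: WorldState, max_bank_deg: float = 60.0) -> Command:
    # The execution layer owns airframe limits, so it clamps whatever it is given.
    bank = max(-max_bank_deg, min(max_bank_deg, intent.heading_change_deg))
    speed_error = intent.target_speed_mps - state.own_speed_mps
    throttle = max(0.0, min(1.0, 0.5 + 0.01 * speed_error))
    return Command(bank, throttle)


def step(raw: dict) -> Command:
    """One pass through sense, plan, act."""
    state = perceive(raw)
    return execute(decide(state), state)


cmd = step({"ins_speed_mps": 200.0, "radar_bearing_deg": 95.0, "radar_range_m": 8000.0})
print(cmd)
```

The point of the separation is that the reasoning layer can be retrained without touching `execute`, while the execution layer can enforce airframe limits regardless of what the policy asks for.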
Systems Engineering Structure: The Hardware-Software-AI Triad
Shield AI’s organizational structure, as described in hiring documentation and executive interviews, reflects a deliberate choice to treat hardware, software, and AI as co-equal first-class systems rather than organizing around an airframe with software as a payload.
This matters for systems engineering practice. In traditional aerospace programs, the airframe is the system. Software is a subsystem with its own MIL-STD-882 hazard analysis and DO-178C assurance case. AI, if it exists at all, is typically a narrow function nested within the software subsystem—an object detection model, an anomaly classifier.
Shield AI inverts this. Hivemind is the system. The airframe is the execution substrate. Hardware platforms, including the company’s own V-BAT and third-party airframes such as the Kratos aircraft on which Hivemind has been integrated, are designed or selected to support what the AI pilot needs, not the other way around.
In practice this means the company maintains parallel engineering organizations that must integrate continuously:
- AI research and training: The team developing Hivemind policies, managing simulation environments, running curriculum learning pipelines. Their outputs are not code in the traditional sense; they are trained weights and policy artifacts that have emergent behavioral properties.
- Software systems: Integration of Hivemind into real-time flight software, sensor processing pipelines, and vehicle management functions. This team works at the boundary where neural network outputs become flight control commands, which is precisely where existing airworthiness standards offer the least guidance.
- Hardware and platform engineering: Avionics integration, sensor suites, compute hardware selection. Running a neural network inference pipeline at tactically relevant update rates on airborne hardware is a non-trivial compute problem, particularly when SWaP (size, weight, and power) constraints are severe.
The integration challenge between these three organizations is not organizational politics. It is a systems engineering problem with safety consequences. A change to the training curriculum that produces better tactical behavior might produce behavior that exceeds the actuator rate limits the hardware team modeled. A change to the onboard compute platform might introduce latency that the AI team’s policy was never trained to handle. These are coupling points that require explicit requirements management to track.
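One way to make such coupling points explicit is to turn them into automated interface checks that run whenever a new policy or compute platform is proposed. The following is a minimal sketch under stated assumptions: the `ActuatorLimits` fields, the function names, and the numeric values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActuatorLimits:
    max_rate_deg_s: float  # surface rate limit modeled by the hardware team (assumed value)
    max_latency_s: float   # end-to-end latency the policy was trained with (assumed value)


def max_commanded_rate(commands_deg, dt_s):
    """Largest step-to-step rate in a commanded deflection trace (needs 2+ samples)."""
    return max(abs(b - a) / dt_s for a, b in zip(commands_deg, commands_deg[1:]))


def check_interface(commands_deg, dt_s, measured_latency_s, limits):
    """Return a list of human-readable violations; an empty list means compatible."""
    violations = []
    rate = max_commanded_rate(commands_deg, dt_s)
    if rate > limits.max_rate_deg_s:
        violations.append(
            f"rate: commanded {rate:.1f} deg/s exceeds limit {limits.max_rate_deg_s:.1f} deg/s"
        )
    if measured_latency_s > limits.max_latency_s:
        violations.append(
            f"latency: measured {measured_latency_s * 1000:.0f} ms exceeds "
            f"trained budget {limits.max_latency_s * 1000:.0f} ms"
        )
    return violations


limits = ActuatorLimits(max_rate_deg_s=80.0, max_latency_s=0.040)
trace = [0.0, 1.0, 3.0, 3.5]  # commanded deflection in degrees, sampled every 20 ms
problems = check_interface(trace, dt_s=0.02, measured_latency_s=0.030, limits=limits)
for p in problems:
    print(p)
```

In this example the latency is within budget, but the jump from 1.0 to 3.0 degrees in a single 20 ms step implies a commanded rate above the modeled limit, so the check reports one violation.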
The Certification Problem Nobody Has Solved
Military aviation airworthiness is governed by a different framework than commercial aviation. RTCA’s DO-178C and DO-254, the software and hardware assurance standards the FAA recognizes for civil avionics, do not directly apply. MIL-HDBK-516 and service-specific airworthiness criteria govern military aircraft. But both share a common assumption: the software or hardware being certified implements a defined, analyzable function. Given input X, it produces output Y, and you can trace that relationship through requirements and testing.
Hivemind does not work that way. A trained neural network policy does not have a requirements-to-code correspondence that a traditional V&V process can audit. Given the same input, the policy produces consistent output—but the behavioral envelope was shaped by training, not by written specifications. You cannot read a requirements document and predict what the policy will do in a novel situation the way you can read an autopilot specification.
Shield AI has been public about engaging with DARPA, the Air Force, and the broader defense acquisition community on what an AI airworthiness framework should look like. The ACE program itself was partly structured to generate evidence about how trained AI agents behave in real aerial combat scenarios—data that feeds into an eventual assurance case.
The emerging approach, visible in civil and academic AI assurance work (EASA’s AI roadmap, the University of York’s AMLAS methodology) as well as in defense research, involves several elements:
Operational design domain definition. Rather than proving the system is safe everywhere, define precisely the conditions under which it is authorized to operate. Speed ranges, g-load envelopes, adversary behavior categories, sensor availability states. Hivemind’s operational design domain in a denied environment is likely narrower than it appears—carefully bounded so that the system operates only in conditions where training data provides coverage.
Monitor and bound architectures. A separate, simpler, formally verified layer watches the AI output and intervenes if the commanded action would violate hard physical or mission constraints. The AI has freedom within the bounds; the bounds themselves are verifiable. This is conceptually similar to runtime monitors in autonomous vehicle systems, adapted for the flight envelope.
Behavioral testing at scale. Monte Carlo simulation over the operational design domain, looking for failure modes statistically. The testing is not exhaustive, because exhaustive coverage is impossible, but it can be sufficient to characterize tail risks. The ACE program’s large-scale simulation infrastructure generated precisely this kind of evidence.
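The first two elements, the operational design domain and the monitor-and-bound layer, can be sketched together as a small gate that sits between the policy and the flight controls. This is a toy illustration, not a formally verified monitor and not Shield AI’s design: the `Envelope` and `State` fields, the limits, and the recovery behavior are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical operational design domain and hard limits."""
    min_speed_mps: float
    max_speed_mps: float
    min_g: float
    max_g: float
    altitude_floor_m: float


@dataclass(frozen=True)
class State:
    speed_mps: float
    altitude_m: float
    radar_ok: bool


def in_odd(state, env):
    """True only when the aircraft is inside the authorized operating conditions."""
    speed_ok = env.min_speed_mps <= state.speed_mps <= env.max_speed_mps
    return speed_ok and state.radar_ok


def monitor(state, policy_g, env, level_g=1.0, recovery_g=2.0):
    """Return (g_to_fly, reason): the policy request is passed, clamped, or overridden."""
    if not in_odd(state, env):
        return level_g, "outside_odd"        # policy not authorized here; fly level
    if state.altitude_m < env.altitude_floor_m:
        return recovery_g, "altitude_floor"  # fixed pull-up regardless of policy output
    clamped = max(env.min_g, min(env.max_g, policy_g))
    return clamped, ("pass" if clamped == policy_g else "clamped")


env = Envelope(min_speed_mps=100.0, max_speed_mps=300.0,
               min_g=-2.0, max_g=7.0, altitude_floor_m=500.0)
print(monitor(State(200.0, 3000.0, True), 4.0, env))   # request inside the bounds
print(monitor(State(200.0, 3000.0, True), 9.5, env))   # request above the g limit
print(monitor(State(200.0, 3000.0, False), 4.0, env))  # sensor state outside the ODD
```

The monitor contains no learned components and only a handful of branches, which is what makes it a plausible target for conventional verification even when the policy it supervises is not.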
None of these approaches closes the certification gap entirely. Shield AI is building toward airworthiness frameworks that do not yet fully exist. That is not a criticism; it is an accurate description of the frontier. The company that solves this problem will define the standard everyone else follows.
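A short calculation shows why statistical testing narrows the gap without closing it. If a campaign of N independent simulated runs produces zero failures, the one-sided upper confidence bound on the true failure probability is 1 - alpha^(1/N), which is roughly 3/N at 95% confidence. The sketch below assumes independent trials drawn from the real operating distribution and a simulator that is faithful to flight, both of which are strong assumptions; the function names and the toy scenario are hypothetical.

```python
import math
import random


def upper_bound_zero_failures(n_trials, confidence=0.95):
    """Upper confidence bound on failure probability after n_trials with no failures."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_trials)


def trials_needed(p_max, confidence=0.95):
    """Failure-free trials required to claim the failure probability is below p_max."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(1.0 - p_max))


def count_failures(run_scenario, n_trials, seed=0):
    """Run a seeded Monte Carlo campaign; run_scenario(rng) returns True when safe."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_trials) if not run_scenario(rng))


def toy_scenario(rng):
    # Placeholder for a full simulation run: sample a commanded load and check a limit.
    commanded_g = rng.uniform(0.0, 7.0)
    return commanded_g <= 7.0


n = trials_needed(1e-3)
failures = count_failures(toy_scenario, n)
print(n, failures, upper_bound_zero_failures(n))
```

About 3,000 clean runs support a claim of fewer than one failure per thousand engagements at 95% confidence. Pushing that bound toward the failure rates expected of flight-critical systems requires orders of magnitude more runs, and the bound says nothing about conditions the simulator does not model.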
Requirements in a Non-Deterministic System
The requirements challenge for Hivemind is worth examining independently because it exposes the limits of how aerospace programs have historically managed complexity.
A traditional fighter aircraft’s flight control system has requirements that are specific, testable, and traceable. Maximum roll rate at a given airspeed. Control law gain schedule. Failure mode response time. These requirements can be written in natural language, encoded in a model, traced to hardware and software implementations, and verified by test.
A requirement like “maintain offensive advantage against a representative adversary in beyond-visual-range engagement” cannot be directly implemented in code. It must be decomposed into training objectives, simulation scenarios, reward functions, and evaluation metrics—none of which map cleanly onto a traditional requirement hierarchy.
This creates a traceability problem that is more than bureaucratic. Requirements traceability exists to ensure that every customer or regulator need has been addressed and that every piece of the design exists for a reason. When the design is a trained policy, the path from high-level mission need to specific system behavior runs through training—a transformation that compresses, generalizes, and loses the explicit linkage that traceability tools are built to maintain.
Managing this requires tools and practices that can represent the AI development pipeline as part of the requirements hierarchy, not as an opaque subprocess that consumes requirements and emits behavior. Graph-based requirements models—where nodes represent requirements, design decisions, training objectives, evaluation results, and their relationships—handle this better than document-based hierarchies, which force a linear structure onto what is genuinely a network of interdependencies.
Tools like Flow Engineering, which model requirements as live graph structures with explicit relationship types, are better suited to this problem than document-centric platforms that assume requirements flow in one direction from specification to implementation. When a change to a training objective affects multiple downstream system properties simultaneously, a graph model makes the impact visible. A document model requires a human to hold the dependency map in their head.
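The impact query a graph model makes cheap can be sketched in a few lines. This is a generic illustration of the idea, not the data model or API of any particular tool: the `TraceGraph` class, the node identifiers, the node kinds, and the relation names are all hypothetical.

```python
from collections import defaultdict, deque


class TraceGraph:
    """Directed graph in which an edge means 'a change here can affect there'."""

    def __init__(self):
        self.kinds = {}                 # node id -> kind of artifact
        self._out = defaultdict(list)   # node id -> [(relation, target id)]

    def add_node(self, node_id, kind):
        self.kinds[node_id] = kind

    def link(self, source, relation, target):
        self._out[source].append((relation, target))

    def impacted_by(self, changed):
        """Every node reachable downstream of a changed node (breadth-first search)."""
        seen, queue = set(), deque([changed])
        while queue:
            node = queue.popleft()
            for _, target in self._out.get(node, []):
                if target != changed and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen


g = TraceGraph()
g.add_node("REQ-1", "mission_requirement")    # e.g. maintain offensive advantage in BVR
g.add_node("OBJ-reward", "training_objective")
g.add_node("POLICY-v3", "policy_artifact")
g.add_node("EVAL-bvr", "evaluation_result")
g.add_node("CHK-actuator-rate", "interface_check")

g.link("REQ-1", "decomposed_into", "OBJ-reward")
g.link("OBJ-reward", "produces", "POLICY-v3")
g.link("POLICY-v3", "evaluated_by", "EVAL-bvr")
g.link("POLICY-v3", "must_satisfy", "CHK-actuator-rate")

print(sorted(g.impacted_by("OBJ-reward")))
```

Changing the reward objective flags the retrained policy, its evaluation results, and the hardware interface check as needing review, while the upstream mission requirement is left alone. In a document hierarchy, the link from a training objective to an actuator rate check would typically live in someone’s head.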
Shield AI has not publicly disclosed its internal requirements tooling. But any organization building at this level of complexity—simultaneous hardware, software, and AI development with shared mission objectives—faces pressure to move away from document hierarchies toward connected models as the integration surface grows.
The Denied-Environment Constraint as Design Driver
The requirement to operate without GPS or communications in contested environments is worth examining not as a feature but as a design philosophy that propagates through every engineering decision.
Commercial autonomous vehicle programs assume connectivity. A connected driver-assistance system such as Tesla’s Autopilot can generally count on map updates, fleet learning data, and remote diagnostics arriving over the network. When connectivity or confidence is lost, the system degrades gracefully and asks for human takeover. For those programs, the denied environment is a failure mode.
For Hivemind, the denied environment is the primary case. This inverts architectural assumptions at every layer:
Perception must be self-contained. No map updates, no sensor fusion from off-platform sources, no IFF confirmation from a network. The onboard sensor suite must be sufficient. This drives investment in radar signal processing, passive RF sensing, and inertial navigation that can maintain accuracy over tactically relevant time horizons without GPS correction.
Reasoning cannot depend on external intelligence updates. The policy must have encoded, within its trained weights, enough tactical understanding to handle novel adversary behavior without calling home for updated threat libraries. This makes the training environment a critical system—garbage simulation produces brittle behavior that fails against real adversaries.
Coordination without communication. Multi-ship Hivemind operations—the wingman scenario—require aircraft to coordinate with each other without relying on continuous datalink. This implies that each aircraft’s policy must model the expected behavior of its teammates and coordinate through action rather than through explicit message passing. This is emergent coordination from independently learned policies, a research-grade problem that Shield AI is solving under operational constraints.
No graceful handoff. In a crewed aircraft with Hivemind as an optional mode, you can design a takeover protocol. In a fully autonomous wingman, the human is not present to take over. The system must either complete the mission or abort safely using only onboard reasoning. There is no external authority to escalate to.
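To give a sense of scale for the inertial navigation point above, a small uncorrected accelerometer bias integrates twice into a position error that grows with the square of time. The sketch below uses the short-horizon approximation error = 0.5 * bias * t^2 and ignores gyro drift, Schuler oscillation, and any aiding from other sensors. The bias value is a hypothetical round number, not a specification of any Shield AI system.

```python
import math

G0 = 9.80665  # standard gravity in m/s^2


def drift_m(bias_micro_g, seconds):
    """Position error from a constant accelerometer bias with no external correction."""
    bias_mps2 = bias_micro_g * 1e-6 * G0
    return 0.5 * bias_mps2 * seconds ** 2


def time_to_error_s(bias_micro_g, max_error_m):
    """How long the aircraft can fly before the bias alone exceeds max_error_m."""
    bias_mps2 = bias_micro_g * 1e-6 * G0
    return math.sqrt(2.0 * max_error_m / bias_mps2)


for minutes in (1, 5, 10):
    print(minutes, "min:", round(drift_m(100.0, minutes * 60.0), 1), "m")

print("time to 50 m error:", round(time_to_error_s(100.0, 50.0)), "s")
```

Under these assumptions a 100 micro-g bias produces roughly 175 meters of error after ten minutes. Because the growth is quadratic, doubling the GPS-denied flight time quadruples the error, which is why sensor quality and alternative position fixes dominate the design.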
This constraint set produces a system that is genuinely different from everything else flying. It also produces a system that is extraordinarily difficult to test, because the environment you most need to test in—high-fidelity electromagnetic warfare, live contested airspace, actual adversary behavior—is inaccessible during development.
Honest Assessment
Shield AI is doing something technically real. The DARPA results are not marketing: an agent built by Heron Systems, which Shield AI later acquired, defeated a human F-16 pilot in the simulated AlphaDogfight Trials, and the ACE program went on to fly AI agents in within-visual-range engagements against a crewed F-16. The V-BAT operations deployed with the U.S. military represent genuine operational use, not a demonstration. The company is on a trajectory toward a credible autonomous wingman capability.
The gaps are also real. The certification frameworks that would authorize Hivemind for unrestricted operational use in crewed airspace do not exist yet. The requirements and traceability practices for AI-driven systems are frontier work across the industry, not a solved problem Shield AI is executing against. And the transition from controlled test conditions to genuine adversarial contested environments—where the adversary specifically tries to make your AI behave badly—has not been publicly demonstrated at operational scale.
What Shield AI has done well is to structure its engineering around the hard problem rather than around the comfortable problem. They did not build an autopilot and call it an AI pilot. They did not build a demonstration capability and wait for a customer to define what operational use looks like. They identified the denied-environment constraint as load-bearing and built from there.
That is first-principles engineering. The result is a system that has no obvious precedent to certify against, no clean requirements framework to trace through, and no existing test infrastructure to validate against. It also has no obvious competitor doing the same thing at the same level of technical ambition.
For systems engineers watching from the outside, Shield AI is the clearest current example of what happens when AI stops being a feature and becomes the system. Every legacy practice—requirements documents, V&V plans, traceability matrices, certification standards—has to be reconsidered from the ground up. That is not comfortable. It is exactly where the field needs to go.