Safety Requirements for Autonomous Systems: How Programs Are Actually Doing It

Autonomous systems programs are accumulating safety requirements the same way a city accumulates traffic regulations: reactively, incrementally, and without a coherent map of how each rule connects to the others. The result is requirements documents that grow large, become internally contradictory, and fail to capture what actually matters about system behavior in the real world.

That dynamic is not a management failure. It reflects a genuine technical mismatch between the tools and methods developed for classical safety engineering and the nature of the problem that autonomous systems present. Programs that are making real progress have recognized this and are restructuring how they think about what a safety requirement even is.

The Core Problem: Autonomous Systems Break the Enumeration Assumption

Classical safety engineering—MIL-STD-882, IEC 61508, DO-178C and its cousins—rests on a foundational assumption that is rarely stated explicitly: that the failure modes of a system are enumerable. You identify hazards, trace them to failure modes, assign probability estimates, and write requirements that bound those probabilities. The assumption is that a sufficiently careful analysis will surface all the ways the system can go wrong before it does.

That assumption holds reasonably well for deterministic systems. A hydraulic actuator has a finite set of failure modes. A flight control law with explicit logic can be formally verified. The state space is large but bounded, and the relationship between design and behavior is traceable.

Autonomous systems operating in open-world environments violate this assumption structurally. A perception system trained on a distribution of sensor data will behave in ways that cannot be fully characterized from first principles. An edge case that was never in the training distribution is not a failure mode that appears in any FMEA—it is an unknown unknown that becomes known only when the system encounters it. The state space is not bounded in any practically useful sense, and the relationship between the training process and the resulting behavior is probabilistic and partially opaque.

This is not a problem that additional diligence solves. It is a structural feature of the technology, and it requires a different approach to requirements.

Operational Design Domain as a Requirements Artifact

The most important practical shift in autonomous systems safety engineering over the past several years is the elevation of the Operational Design Domain from a marketing document to a first-class engineering artifact. The ODD—the specific conditions under which a system is designed to operate—was originally borrowed from SAE J3016 as a way to scope claims about automation capability. Programs are now recognizing that it has to function as something more rigorous: a structured definition of the boundary conditions within which safety requirements are valid.

This matters because a safety requirement for an autonomous system is not a universal claim. A perception requirement that specifies detection probability above 95% at ranges up to 150 meters under daylight conditions is a different requirement from the same specification under fog, at night, or on sensor configurations that differ from the training set. Without an explicit ODD attached to that requirement, the requirement is underspecified in a way that creates real risk—not just audit exposure.
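To make this concrete, here is a minimal sketch of what it means for the ODD to travel with the requirement rather than live in a separate document. All names (`OddEnvelope`, `PerceptionRequirement`, the field values) are illustrative assumptions, not drawn from any particular tool or standard.

```python
from dataclasses import dataclass

# Hypothetical sketch: a perception requirement carries the ODD envelope
# in which it is valid. Outside that envelope, the claim is undefined.
@dataclass(frozen=True)
class OddEnvelope:
    lighting: str        # e.g. "daylight", "night"
    weather: str         # e.g. "clear", "fog"
    max_range_m: float   # range over which the detection claim applies

@dataclass(frozen=True)
class PerceptionRequirement:
    req_id: str
    min_detection_prob: float
    envelope: OddEnvelope  # the requirement is only meaningful inside this

# The requirement from the text: 95% detection to 150 m, daylight, clear.
daylight_req = PerceptionRequirement(
    req_id="PER-042",
    min_detection_prob=0.95,
    envelope=OddEnvelope(lighting="daylight", weather="clear", max_range_m=150.0),
)
```

The same numerical specification under fog or at night would be a distinct `PerceptionRequirement` with a different envelope, which is exactly the point: the envelope is part of the requirement's identity, not an annotation.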

The practical challenge is that ODDs are not static. They evolve as programs expand deployment, respond to field data, or change the underlying model. When the ODD changes, the validity envelope of every requirement anchored to it potentially changes as well. Managing that dependency is tractable when the ODD and the requirements it bounds are explicitly connected in a graph structure. It becomes extremely difficult when they live in separate documents connected only by cross-references that no one updates systematically.

Leading programs—particularly in defense autonomous systems, advanced air mobility, and automotive Level 4 development—are treating ODD elements as nodes in a requirements graph with explicit relationships to the functional and safety requirements they bound. This is not a theoretical nicety. It is the only way to answer questions like “if we expand the ODD to include operations in precipitation above 10 mm/hr, which requirements need to be revisited?” without weeks of manual analysis.
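The graph idea can be sketched in a few lines. This is a toy model, not any vendor's data schema: the `bounds` relation, node names, and requirement IDs are all hypothetical, but the shape of the query is the one described above.

```python
from collections import defaultdict

# Hypothetical ODD-to-requirement graph: each ODD element maps to the
# set of requirements whose validity it bounds.
bounds = defaultdict(set)

def bind(odd_element: str, requirement: str) -> None:
    """Record that a requirement is only valid inside this ODD element."""
    bounds[odd_element].add(requirement)

bind("odd/precipitation", "REQ-PER-012")
bind("odd/precipitation", "REQ-CTL-007")
bind("odd/lighting", "REQ-PER-012")

def impacted_requirements(changed_odd_element: str) -> set:
    """Which requirements must be revisited when this ODD element changes?"""
    return bounds.get(changed_odd_element, set())
```

With explicit edges, “we expanded the precipitation envelope—what do we revisit?” becomes a lookup (`impacted_requirements("odd/precipitation")`) instead of a cross-reference hunt through separate documents.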

Bridging Classical Safety Engineering and AI Development

The organizational challenge of autonomous systems safety is that the people who know how to write safety requirements and the people who build and train AI models are working from different mental models of what makes a system safe—and often using tools that don’t share any structure.

Classical safety engineers think in terms of functions, failure modes, and mitigations. They want to trace every hazard to a design requirement and every requirement to a verification method. They are trained to treat ambiguity as a defect.

AI development teams think in terms of datasets, evaluation benchmarks, and model behavior distributions. They are trained to treat probabilistic claims as normal and to iterate rapidly on model design based on empirical feedback. They are skeptical of requirements that specify exact numerical bounds on behavior that was learned rather than designed.

Both perspectives are correct within their own frame. The problem is that neither frame is sufficient for the actual problem, and the tools each community uses are not designed to talk to each other.

Programs that are making progress on this have done a few things consistently:

Defined the interface between the model and the system architecture explicitly. The safety case for an autonomous system needs to address the model as a component with a specified input-output contract—not a black box embedded in the system. Writing requirements for that interface (what sensor inputs are valid, what the expected output distribution is under defined conditions, what the failure behavior is when inputs fall outside the ODD) is work that requires collaboration between safety engineers and AI developers. The interface is where the two communities can actually share language.
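A minimal sketch of that input-output contract, under stated assumptions: the ODD bounds, the `SensorFrame` fields, and the fallback behavior are all illustrative, and `run_model` is a stand-in for an actual inference call. What matters is the shape: inputs are checked against the ODD, and out-of-ODD inputs get a defined failure behavior rather than an unqualified prediction.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SensorFrame:
    visibility_m: float
    illumination_lux: float

# Illustrative ODD bounds for the perception component.
ODD_MIN_VISIBILITY_M = 200.0
ODD_MIN_ILLUMINATION_LUX = 10.0

def in_odd(frame: SensorFrame) -> bool:
    """Is this input inside the envelope the model was validated for?"""
    return (frame.visibility_m >= ODD_MIN_VISIBILITY_M
            and frame.illumination_lux >= ODD_MIN_ILLUMINATION_LUX)

def run_model(frame: SensorFrame) -> list:
    return []  # stand-in for the real inference call

def detect(frame: SensorFrame) -> Optional[list]:
    """Contract: detections are returned only inside the ODD. Outside it,
    the component returns None so the system can fall back to a defined
    safe behavior instead of acting on an unvalidated output."""
    if not in_odd(frame):
        return None  # defined failure behavior: signal, don't guess
    return run_model(frame)
```

Writing the contract this way gives both communities something to hold: safety engineers get a requirement they can verify (out-of-ODD inputs must produce the fallback), and AI developers get a validity envelope they can evaluate against.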

Adopted scenario-based requirements structures. Instead of writing requirements purely as functional specifications, programs are structuring requirements around operational scenarios—specific combinations of environmental conditions, actor configurations, and system states that bound the space of situations the system must handle. This maps more naturally to how AI systems are evaluated and creates a cleaner connection between requirements and the test cases used to validate model behavior.
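One way to picture a scenario-based requirement structure—every field name and ID here is hypothetical—is a scenario that bundles environment, actors, and system state, with requirements referencing scenarios rather than abstract functions:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Scenario:
    """A specific combination of conditions the system must handle."""
    scenario_id: str
    environment: dict    # e.g. {"weather": "rain", "time": "night"}
    actors: tuple        # e.g. ("pedestrian_crossing",)
    system_state: str    # e.g. "autonomous_cruise"

@dataclass
class ScenarioRequirement:
    """A requirement anchored to a scenario, with its validating tests."""
    req_id: str
    scenario: Scenario
    acceptance: str
    test_cases: list = field(default_factory=list)

night_rain = Scenario(
    scenario_id="SCN-118",
    environment={"weather": "rain", "time": "night"},
    actors=("pedestrian_crossing",),
    system_state="autonomous_cruise",
)

req = ScenarioRequirement(
    req_id="REQ-203",
    scenario=night_rain,
    acceptance="detect crossing pedestrian within 1.5 s",
    test_cases=["TC-118-01", "TC-118-02"],
)
```

Because the scenario is a first-class object, the test cases that validate the requirement are literally the test cases that instantiate the scenario—the connection between requirement and evaluation is structural, not editorial.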

Built traceability into the development process rather than retrofitting it. Traceability matrices added at the end of a development cycle are a compliance artifact. Traceability built into the tooling from the start—where requirements are linked to model evaluation results, to test scenarios, to ODD elements, and to the safety case—is an engineering asset. Programs that have instrumented their development process this way report significantly shorter cycle times when responding to regulatory requests or field anomalies.
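What “traceability built into the tooling” reduces to, mechanically, is recording links as they are created rather than reconstructing them at audit time. A toy sketch—the relation names and artifact IDs are invented for illustration:

```python
# Traceability as explicit (source, relation, target) triples, recorded
# during development rather than assembled into a matrix afterward.
links = []

def link(source: str, relation: str, target: str) -> None:
    links.append((source, relation, target))

# Links accumulate as the work happens:
link("REQ-203", "validated_by", "TC-118-01")
link("TC-118-01", "evaluated_in", "EVAL-RUN-77")
link("REQ-203", "bounded_by", "odd/night_rain")
link("REQ-203", "supports", "safety_case/goal_2.1")

def evidence_for(requirement: str) -> list:
    """Everything directly linked to a requirement—the raw material for
    answering a certification query without a manual trace."""
    return [(rel, target) for (src, rel, target) in links if src == requirement]
```

When a regulator asks what supports REQ-203, `evidence_for("REQ-203")` returns its test case, its ODD bound, and its safety-case goal in one query—the difference between a compliance artifact and an engineering asset.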

Where the Tool Infrastructure Is Failing Programs

The requirements management tools that most programs inherited—DOORS, Polarion, Jama Connect—were designed for the document-centric, enumerable-failure-mode world of classical systems engineering. They are good at what they were built for. They are poor fits for the additional demands of autonomous system development.

The specific failure modes are predictable:

Document-centric data models make it structurally difficult to represent the kind of multi-dimensional relationships that autonomous system requirements need. An ODD element that bounds five requirements that are validated by twenty test scenarios that link to three model evaluation runs is a graph structure. Representing it in a hierarchical document model requires either significant customization or workarounds that degrade over time.

Change impact analysis—answering the question “if this changes, what else needs to change?”—depends on explicit, machine-readable relationships between requirements, design elements, and verification artifacts. Most classical tooling supports this through manual traceability links that require sustained discipline to maintain. In fast-moving autonomous programs where model behavior and ODD definitions change frequently, that discipline typically doesn’t survive contact with schedule pressure.

Integration with the AI development stack is nonexistent in legacy tools. Model training pipelines, evaluation frameworks, and dataset management systems don’t have native connections to requirements management platforms. The gap is bridged by spreadsheets and document exports, which means the traceability breaks the moment either side is updated.
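The contrast with document-centric tooling is easiest to see in code. With machine-readable relationships, change impact analysis is a graph traversal; without them, it is archaeology. This sketch assumes invented artifact names and a simple dependency relation:

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: maps each artifact to the artifacts
# that depend on it (and so must be revisited when it changes).
depends_on = defaultdict(set)

def add_dependency(dependent: str, dependency: str) -> None:
    depends_on[dependency].add(dependent)

add_dependency("REQ-PER-012", "odd/precipitation")  # requirement bounded by ODD
add_dependency("TC-118-01", "REQ-PER-012")          # test validates requirement
add_dependency("EVAL-RUN-77", "TC-118-01")          # eval run exercises test

def impact_set(changed: str) -> set:
    """Breadth-first walk: every artifact transitively affected by a change."""
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dep in depends_on[node]:
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected
```

Changing the precipitation ODD element flags the bounded requirement, its test case, and the evaluation run in one call—the query that takes weeks when the same relationships live in spreadsheets and document exports.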

Tools designed natively for this problem space—like Flow Engineering, which builds requirements management around graph-based models and was designed for connected traceability across complex system architectures—handle ODD-to-requirement-to-verification linkages as core functionality rather than customization work. Programs that have adopted this infrastructure report that the architectural difference is significant, particularly when managing requirements across system variants or responding to ODD changes. The tradeoff is that Flow Engineering is purpose-built for this class of problem, not a generalist platform—which matters for programs that have existing ecosystems built around legacy tools.

What Honest Progress Looks Like

The programs doing this well are not claiming to have solved autonomous system safety. They are claiming to have developed a tractable process for managing what they know and tracking what they don’t.

That means having a requirements structure that makes the ODD explicit and connected to the requirements it bounds. It means having a safety case that addresses the model-as-component honestly, including the limits of its validity. It means having traceability infrastructure that can answer change impact questions without multi-week manual analysis. And it means having a process that treats field data as a source of requirements updates, not just a quality metric.

None of that is as exciting as claiming to have built a safe autonomous system. But it is the actual work, and programs that have done it are measurably better positioned when certification authorities ask hard questions—or when the system encounters something it wasn’t designed for.

The classical safety engineering community built rigorous methods over decades for a hard problem. The AI development community built powerful capabilities for a different hard problem. Autonomous systems live at the intersection of both. The programs that figure out how to use the tools and methods from both worlds—connected by infrastructure that treats requirements as a live engineering model rather than a compliance document—are the ones closing the gap between what autonomous systems can do and what they can be trusted to do.