The Autonomy Stack and the Requirements Problem: Perception, Planning, and Control as Systems Engineering Challenges

Autonomous systems — self-driving vehicles, autonomous surface vessels, drone swarms, robotic inspection platforms — share a common three-layer architecture. Perception takes raw sensor data and builds a world model. Planning takes that world model and decides what to do. Control takes those decisions and acts on the physical world. This architecture is so well-established that it reads like a textbook entry.

The requirements problem, however, is not well-established. Despite the architectural similarity across autonomous platforms, most programs struggle with how to write, structure, and trace requirements across the autonomy stack. Perception engineers complain that their requirements don’t tell them what environments they’re designing for. Planning engineers inherit perception outputs that are underspecified as interfaces. Control engineers discover, too late, that their degraded-mode requirements were never written at all.

This article is not a textbook overview of the autonomy stack. It is an analysis of where requirements management goes wrong at each layer, what the leading teams are doing differently, and what the field is still getting wrong at the structural level.


Perception: The Operational Design Domain Problem

The most common perception requirements failure is writing sensor performance requirements in isolation from operating environment characterization.

A requirement that reads “the LiDAR subsystem shall detect objects ≥0.1 m² at a range of 150 m” is not wrong — but it is incomplete in a way that creates serious downstream problems. Detected at 150 m under what conditions? In fog? In direct sunlight? On a wet surface with retroreflective lane markings? At what ambient temperature? With what sensor contamination state?
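
One way to make those unanswered questions visible is to attach the operating conditions to the requirement as structured data. The following is a minimal sketch in Python, not a prescribed schema: the dimension names in `ODD_DIMENSIONS`, the `PerceptionRequirement` class, and the ID `PER-001` are all hypothetical and chosen only to mirror the questions above.

```python
from dataclasses import dataclass, field

# Hypothetical ODD dimensions a detection requirement should be qualified against.
ODD_DIMENSIONS = ("visibility", "illumination", "surface", "ambient_temp_c", "contamination")

@dataclass
class PerceptionRequirement:
    req_id: str
    text: str
    # Maps an ODD dimension to the condition under which the requirement must hold.
    conditions: dict[str, str] = field(default_factory=dict)

    def unqualified_dimensions(self) -> list[str]:
        """Return the ODD dimensions this requirement says nothing about."""
        return [d for d in ODD_DIMENSIONS if d not in self.conditions]

lidar_req = PerceptionRequirement(
    req_id="PER-001",
    text="The LiDAR subsystem shall detect objects >= 0.1 m^2 at a range of 150 m",
    conditions={"illumination": "direct sunlight", "surface": "dry"},
)

# Lists the dimensions still left to an implicit, unspoken environment.
print(lidar_req.unqualified_dimensions())
```

A check like this does not decide what the right conditions are. It only surfaces which dimensions the requirement leaves open.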

The Operational Design Domain (ODD) — or for non-vehicular autonomy, the Operational Environment Descriptor (OED) — is the artifact that should answer these questions. It characterizes the full envelope of environmental conditions the system is expected to encounter. Without it, perception requirements are written to an implicit, unspoken environment that different engineers on the team interpret differently.

The leading autonomous vehicle programs — and their counterparts in maritime and aerial autonomy — have learned to treat the ODD as a first-class systems engineering artifact that sits at the top of their requirements hierarchy, not as a paragraph in a concept of operations document. Every perception requirement traces to one or more ODD parameters. When the ODD changes, traceability analysis immediately surfaces which perception requirements need review.
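
The review step described above can be sketched as a simple index from ODD parameters to the requirements that trace to them. This is an illustration under stated assumptions: the class `OddTraceIndex`, the parameter names, and the requirement IDs are invented for the example and do not describe any particular tool.

```python
from collections import defaultdict

class OddTraceIndex:
    """Index from ODD parameter name to the requirement IDs that trace to it."""

    def __init__(self) -> None:
        self._by_param: dict[str, set[str]] = defaultdict(set)

    def trace(self, req_id: str, *odd_params: str) -> None:
        """Record that a requirement depends on one or more ODD parameters."""
        for param in odd_params:
            self._by_param[param].add(req_id)

    def needs_review(self, changed_params: list[str]) -> list[str]:
        """Requirements to re-review when the given ODD parameters change."""
        hits: set[str] = set()
        for param in changed_params:
            hits |= self._by_param.get(param, set())
        return sorted(hits)

index = OddTraceIndex()
index.trace("PER-001", "max_rain_rate_mm_h", "min_visibility_m")
index.trace("PER-002", "min_visibility_m")
index.trace("PER-003", "max_snow_depth_cm")

# A change to the visibility bound flags exactly the requirements that cite it.
print(index.needs_review(["min_visibility_m"]))  # ['PER-001', 'PER-002']
```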

Most programs, however, still treat the ODD as a marketing document. It is written once, during program initiation, and then ignored. Perception engineers don’t know it exists. When environmental edge cases surface in testing — heavy rain degrading camera performance, snow accumulation blinding ultrasonic sensors — the program scrambles to characterize what happened rather than having pre-specified requirements to verify against.

The underlying mistake is structural: the ODD lives in a different document, in a different tool, maintained by a different team than the requirements. There is no live traceability link, so when one changes, nothing flags the other for review.


The Interface Problem Between Perception and Planning

Perception outputs are inputs to planning. This sounds obvious. What is not obvious — and what most programs get wrong — is the requirements specification for that interface.

Planning modules consume perception outputs: object detections, classifications, velocity estimates, occupancy grids, localization confidence scores. Each of these has performance characteristics that matter to planning: latency, update rate, false positive rate, false negative rate, classification confidence bounds, and — critically — uncertainty quantification.

The failure mode is that perception engineers write requirements for what their subsystem produces, and planning engineers write requirements for what their subsystem needs, and nobody formally specifies the interface between them as a requirements artifact with bidirectional traceability.

In practice, this means:

  • Planning algorithms are tuned to perception characteristics that are never formally specified, so they break when perception changes.
  • Perception teams optimize for metrics (mAP, IoU) that don’t directly map to what planning actually needs.
  • Latency requirements on perception outputs are either absent or stated so conservatively that they read as performance goals rather than testable requirements.

The leading teams are doing something different. They treat the perception-to-planning interface as a system-level requirement artifact, sometimes called a derived interface requirement or an Interface Control Document (ICD) requirement, that is co-owned by both teams and traces bidirectionally into both perception and planning requirement hierarchies. When a planning engineer needs 95% per-class precision for certain object categories, that need is formally allocated down into a perception requirement, not informally communicated in a Slack message.

This requires a requirements tool capable of representing that bidirectional traceability clearly. Flat document-based tools — and this describes the majority of deployed DOORS installations — represent interfaces poorly. An interface requirement in a flat hierarchy is just another paragraph in a numbered list, with no structural representation of the fact that it has parents in two different requirement trees.
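
To show what "parents in two different requirement trees" means structurally, here is a small sketch in which a requirement holds a list of parents rather than a position in a numbered outline. The `Requirement` class, the IDs, and the requirement texts are hypothetical and exist only to illustrate the shape of the data.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    req_id: str
    text: str
    parents: list["Requirement"] = field(default_factory=list)

    def roots(self) -> set[str]:
        """IDs of every top-level requirement this one ultimately traces to."""
        if not self.parents:
            return {self.req_id}
        found: set[str] = set()
        for parent in self.parents:
            found |= parent.roots()
        return found

perception_root = Requirement("PER-ROOT", "Perception subsystem requirements")
planning_root = Requirement("PLN-ROOT", "Planning subsystem requirements")

planning_need = Requirement(
    "PLN-014", "Planner needs pedestrian classification precision >= 0.95",
    parents=[planning_root],
)
perception_capability = Requirement(
    "PER-031", "Classifier shall report per-class precision for pedestrians",
    parents=[perception_root],
)

# The interface requirement has a parent in each tree.
interface_req = Requirement(
    "ICD-007", "Pedestrian classification output shall have precision >= 0.95",
    parents=[planning_need, perception_capability],
)

print(sorted(interface_req.roots()))  # ['PER-ROOT', 'PLN-ROOT']
```

Because the interface requirement resolves to both roots, a change on either side can be traced to it without relying on a separately maintained matrix.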


Planning: The Edge Case Enumeration Problem

Planning requirements are structurally different from perception requirements. Perception requirements can, in principle, be fully specified: given these environmental conditions, the sensor shall produce outputs with these characteristics. The specification space, while large, is finite and enumerable.

Planning requirements cannot be fully specified by enumeration. The driving scenario space — or the maritime encounter space, or the airspace conflict space — is effectively infinite. You cannot write a requirement for every situation a planner will encounter. Programs that try end up with requirement sets that are simultaneously enormous and incomplete: thousands of individual scenario requirements that still don’t cover the edge cases that actually cause failures.

The alternative approach, which leading autonomous systems programs have begun adopting, is hierarchical scenario decomposition with coverage modeling. Rather than enumerating specific scenarios, you define:

  1. Behavioral competency requirements — what the system must be capable of, expressed at a functional level (e.g., “the system shall yield to emergency vehicles that are exhibiting auditory and visual signals”)
  2. Scenario class requirements — what categories of situations each competency must handle, with explicit enumeration of the environmental parameters that define each class
  3. Edge case extension requirements — explicit requirements derived from formal hazard analysis that address the specific parameter combinations most likely to cause failure

This three-level structure doesn’t eliminate the incompleteness problem — no requirements structure can — but it makes the incompleteness explicit and traceable. You know which scenario classes you haven’t fully characterized. You know which edge cases came from hazard analysis. When a novel failure occurs in testing or operation, you can trace it back to the gap in your requirements structure and fill it systematically rather than treating every incident as a surprise.
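
A rough sketch of what "making the incompleteness explicit" can look like for a single scenario class follows. The parameter names and values are invented for illustration, and real scenario classes would have many more dimensions, often continuous ones that need binning before a check like this applies.

```python
from itertools import product

# Hypothetical scenario class for one competency: yielding to emergency vehicles.
scenario_class = {
    "approach": ["from_behind", "oncoming", "cross_traffic"],
    "road_type": ["urban", "highway"],
    "visibility": ["clear", "fog"],
}

# Parameter combinations that have at least one written scenario requirement.
covered = {
    ("from_behind", "urban", "clear"),
    ("from_behind", "highway", "clear"),
    ("oncoming", "urban", "clear"),
    ("cross_traffic", "urban", "fog"),  # edge case added from hazard analysis
}

# Every combination the scenario class defines, in (approach, road_type, visibility) order.
all_combinations = set(product(*scenario_class.values()))
gaps = sorted(all_combinations - covered)
coverage = len(covered & all_combinations) / len(all_combinations)

print(f"coverage: {coverage:.0%}, uncharacterized combinations: {len(gaps)}")
for combo in gaps:
    print("  gap:", combo)
```

The point of the sketch is the `gaps` list: the uncovered combinations are named rather than unknown, which is what allows a novel failure to be matched against a known hole.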

The structural requirement for this approach is a requirements tool that supports hierarchical, graph-connected requirement structures with explicit coverage modeling. This is not a description of Microsoft Word, Confluence, or even most traditional requirements management tools. It is a description of what modern AI-native tools are being built to support.


Control: The Degraded Sensor State Problem

Control requirements are where the autonomy stack’s requirements failures become dangerous.

Nominal control requirements — how the system behaves when sensors are healthy, communications are intact, and the environment is within ODD — are generally well-specified. The failure is in degraded-mode requirements: what the system does when one or more inputs are missing, unreliable, or contradictory.

The most common failure pattern is that degraded-mode requirements are written as an afterthought, after the control architecture is already defined. The architecture team makes decisions about actuator authority, fail-safe states, and supervisory override logic based on nominal requirements. Then, during safety analysis, the degraded-mode requirements are reverse-engineered from what the architecture can do, rather than forward-allocated from what safety demands.

This is backwards. The consequence is autonomous systems with fail-safe behaviors that are safe in the sense of “the actuators go to a defined state” but not safe in the operational sense of “the vehicle/vessel/drone ends up somewhere that doesn’t cause harm.” A self-driving vehicle that pulls over and stops is not safe if it pulls over in a lane of traffic. An autonomous ship that holds position is not safe if it holds position in a shipping fairway. A drone that returns to home is not safe if the fault that triggered the return coincides with GPS spoofing that corrupts its position estimate.

The leading teams write degraded-mode requirements before the control architecture is defined, as part of safety requirements allocation from the top-level system hazard analysis. These requirements specify:

  • Detection requirements: the system shall detect [sensor failure mode] within [time bound] with [confidence threshold]
  • Classification requirements: the system shall distinguish between sensor hardware failure, sensor contamination, and environmental conditions that exceed ODD bounds
  • Response requirements: under [degraded state class], the system shall [specific behavior] within [time bound] and shall [transition condition]
  • Recovery requirements: the system shall [return to nominal operation / require manual intervention] when [specified conditions are met]

This structure — detection, classification, response, recovery — is a requirements pattern that experienced autonomy safety engineers use consistently. It is not universally adopted. Many programs still write degraded-mode requirements as a single list of “shall operate in degraded mode” statements that leave every meaningful question unanswered.
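
The four-part pattern lends itself to a simple completeness check. The sketch below is one possible encoding, not an established template: the `DegradedModeRequirement` class is hypothetical, and the failure mode, time bounds, and behaviors in the example are placeholder values rather than recommendations.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class DegradedModeRequirement:
    failure_mode: str
    detection: Optional[str] = None       # how fast and how confidently the failure is detected
    classification: Optional[str] = None  # hardware failure vs contamination vs out-of-ODD
    response: Optional[str] = None        # specific behavior, time bound, transition condition
    recovery: Optional[str] = None        # return to nominal or require manual intervention

    def missing_parts(self) -> list[str]:
        """Names of the pattern elements that have not been written yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

# A bare "shall operate in degraded mode" statement fills in none of the four parts.
vague = DegradedModeRequirement(failure_mode="front camera loss")

# Placeholder values only; real bounds come from the hazard analysis.
complete = DegradedModeRequirement(
    failure_mode="front camera loss",
    detection="Detect loss of valid camera frames within 200 ms",
    classification="Distinguish hardware fault, lens contamination, and out-of-ODD glare",
    response="Reduce speed and begin minimal-risk maneuver within 2 s",
    recovery="Require manual intervention before resuming nominal operation",
)

print(vague.missing_parts())     # ['detection', 'classification', 'response', 'recovery']
print(complete.missing_parts())  # []
```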


Where the Field Is Still Making Foundational Mistakes

Beyond the layer-specific problems, three systemic failures appear across autonomous systems programs regardless of domain:

Treating the stack as three independent workstreams. Perception, planning, and control are developed by different teams, using different tools, often with different requirements processes. The handoffs between them are managed through informal communication rather than formal interface requirements. The result is a requirements structure that has gaps at every layer boundary — exactly where the most consequential failures occur.

Confusing verification with validation. Programs write requirements that are easy to verify in isolation (“the classifier shall achieve mAP ≥ 0.85 on the internal test dataset”) but that don’t validate system-level safety (“the system shall not make a control action that endangers an unprotected road user”). The internal test dataset is not the operational environment. The gap between them is where autonomous systems fail.

No live traceability. Requirements documents are written, approved, and then disconnected from the engineering work that follows. Design decisions, architecture changes, test results, and incident findings are not traced back to requirements. When a new sensor is added or a planning algorithm changes, nobody systematically checks which requirements that change affects. Traceability matrices exist as spreadsheets that are updated once per program phase, not as live connections that surface risk in real time.
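
What a "live connection" buys, compared with a periodically updated spreadsheet, is that the question "what does this change affect?" becomes a graph walk. The following sketch assumes a hypothetical `downstream` map of trace links. The artifact names and IDs are invented, and a real program would hold these links in its requirements tool rather than in a dictionary.

```python
from collections import deque

# Hypothetical trace links: each artifact maps to the artifacts that depend on it.
downstream = {
    "SENSOR:front_radar": ["REQ:PER-010", "REQ:PER-011"],
    "REQ:PER-010": ["REQ:ICD-007"],
    "REQ:ICD-007": ["REQ:PLN-014", "TEST:T-220"],
    "REQ:PLN-014": ["TEST:T-305"],
}

def impacted(changed: str) -> list[str]:
    """Breadth-first walk over trace links, starting from the changed artifact."""
    seen = {changed}
    queue = deque([changed])
    order: list[str] = []
    while queue:
        node = queue.popleft()
        for dependent in downstream.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order

# Swapping the radar reaches planning requirements and tests two layers away.
for artifact in impacted("SENSOR:front_radar"):
    print(artifact)
```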


How Modern Tools Are Beginning to Address This

The requirements tooling that dominates the autonomy space today (IBM DOORS Next, Jama Connect, Polarion) was built for a different era of systems engineering. These tools do handle structured requirements, traceability, and review workflows. But their underlying model is document-oriented, with little structure beyond the outline of the document itself. They can represent that requirement A traces to requirement B, but they struggle to represent the network of dependencies that the autonomy stack actually has: interface requirements with multiple parents, coverage relationships between scenario classes and specific scenarios, bidirectional allocation between perception performance and planning needs.

Platforms like Flow Engineering are built on a graph model rather than a document model, which is a structural fit for how autonomy requirements actually relate to each other. The ability to represent an interface requirement as genuinely connected to both the perception requirement tree and the planning requirement tree — not just linked via a traceability matrix — changes what analysis is possible. Coverage gaps are visible. Impact analysis when requirements change is automatic rather than manual. AI-assisted requirement generation can suggest derived requirements at layer boundaries, where the hardest gaps occur.

This doesn’t solve the intellectual problem of writing good autonomy requirements — that still requires domain expertise in sensor physics, planning algorithms, and operational safety. But it removes the structural obstacles that cause teams to make the same foundational mistakes across programs.


Honest Assessment

The autonomy field is not failing at requirements management because engineers are careless. The failure is structural: the standard practices and standard tools were not designed for the specific challenges of hierarchical, cross-layer, environment-parameterized requirements that the autonomy stack demands.

Programs that are succeeding have typically made three moves: they treat the ODD/OED as a live systems engineering artifact with formal traceability; they write degraded-mode requirements before — not after — control architecture decisions; and they manage the stack as a connected system model rather than three separate workstreams with handoff documents between them.

The programs still making foundational mistakes are, in most cases, not aware they’re making them. The requirements look complete. The traceability matrices are filled in. The safety cases exist. The failures surface in testing, in field operation, or — worst case — in incidents. At that point, tracing back to the requirements gap that caused the failure is an exercise in forensic engineering rather than preventive engineering.

The difference between those two outcomes is not primarily a technology problem. It is a problem of requirements culture, requirements structure, and the willingness to treat the autonomy stack as the genuinely complex systems engineering challenge it is.