How Do You Manage Requirements for a System That Learns?
A product manager at an autonomous systems company sent this question to our editorial inbox last month. She’s responsible for the requirements baseline of a perception system that continuously updates its object detection models based on in-field sensor data. Her problem is concrete: every time the model improves, the system’s behavior changes. Her configuration control board wants a stable requirements set. Her certification team wants traceability. Her engineers tell her the system doesn’t behave the same way it did six months ago—and that’s by design.
Her question: if our system’s behavior improves through in-field learning, how do we write stable requirements for it?
The honest answer is that this is one of the genuinely unsolved problems in AI systems engineering. There is no clean solution. There are approaches with real utility, and there are frameworks emerging from regulatory bodies, but anyone who tells you requirements management for continuously learning systems is a solved problem is selling something. What follows is the best current thinking, stated as plainly as possible.
Why Standard Requirements Don’t Work for Learning Systems
Classical systems engineering assumes that a system’s behavior is a deterministic function of its inputs and its design. Requirements specify that function. Verification confirms it. The feedback loop is closed.
Machine learning breaks this in a specific way: the function changes. Not through a configuration change that triggers a new review cycle, but through a learning process that is itself part of the system’s designed behavior. When a perception model updates its weights after encountering new edge cases in the field, it becomes a meaningfully different function. A requirement written against the old function may no longer bind the new one.
This isn’t a documentation problem. It’s an architectural one. And requirements management has to work with it.
Approach 1: Require Behavioral Envelopes, Not Specific Behaviors
The most mature approach shifts requirements from point specifications to boundary specifications. Instead of requiring that the system classify a pedestrian with ≥95% confidence under specified lighting conditions, you require that the system never classify a pedestrian as a non-obstacle, regardless of lighting, model version, or learning state.
These are called behavioral envelopes. They define the outer boundary of acceptable system behavior—what the system must never do, what states it must never enter—without specifying the internal decision function that keeps it inside those boundaries.
Envelope requirements have a useful property: they are model-agnostic. The requirement holds whether you’re running model version 3.1 or version 3.7 after six months of field learning. Your verification task then becomes demonstrating, for each model state, that the envelope holds. The requirement itself doesn’t change. The evidence does.
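To make the idea concrete, here is a minimal sketch of what an envelope check might look like in code. Everything in it (the model interface, the sample fields, the label taxonomy) is an illustrative assumption, not a prescribed implementation; the point is that the check contains no confidence thresholds and no model-version logic, so it runs unchanged against every learning state.

```python
# A minimal envelope check. The model interface, sample fields, and label
# taxonomy are illustrative assumptions, not a prescribed implementation.

NON_OBSTACLE_LABELS = {"background", "road_marking", "shadow"}  # assumed taxonomy

def envelope_holds(model, safety_corpus) -> bool:
    """Envelope: a ground-truth pedestrian is never classified as a non-obstacle.

    Note what is absent: no confidence threshold, no model-version logic.
    The envelope constrains the worst case, so the same check runs
    unchanged against every learning state.
    """
    for sample in safety_corpus:  # spans lighting conditions, occlusion, etc.
        if sample.ground_truth != "pedestrian":
            continue
        prediction = model.predict(sample.sensor_frame)
        if prediction.label in NON_OBSTACLE_LABELS:
            return False  # envelope violated: this model state fails verification
    return True
```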
The practical challenge is writing good envelopes. This requires deep domain knowledge about failure modes, and it requires that your safety engineers and your ML engineers work from the same ontology. In practice, teams often write behavioral envelopes that are too loose (they don’t actually constrain dangerous behavior) or too tight (they prohibit beneficial adaptation). Getting the granularity right is more art than engineering today.
Approach 2: Require Properties of the Learning Process
The second approach accepts that you cannot fully specify outputs for a learning system, and instead places requirements on the process by which the system learns. You require specific properties of training data (provenance, labeling quality, class balance, coverage of safety-relevant scenarios), specific validation procedures before model updates are deployed, and specific approval gates with defined criteria.
This is sometimes called process assurance, and it has a long history in safety-critical software development. DO-178C uses it extensively. The argument is that a sufficiently rigorous process produces acceptable outputs, even when you can’t specify every output in advance.
For ML systems, process requirements look like this:
- Training data must include a defined minimum representation of safety-critical edge cases (fog, occlusion, sensor degradation)
- Each model update must pass a regression test suite before deployment; no update may degrade baseline performance on a defined holdout set (a minimal gate sketch follows this list)
- Anomalous detection events in the field must be reviewed and categorized before they are incorporated into training data
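The second item above, for instance, reduces to a deployment gate. A minimal sketch, assuming an evaluation harness that returns a metrics dictionary; the function and metric names are placeholders for whatever your pipeline provides:

```python
# Sketch of a pre-deployment regression gate (second item in the list above).
# The evaluate() harness and metric names are assumptions about your pipeline.

HOLDOUT_METRICS = ("precision", "recall", "f1")  # assumed safety-relevant metrics

def passes_regression_gate(candidate_model, baseline_model,
                           holdout_set, evaluate) -> bool:
    """Reject any update that degrades baseline performance on the holdout set."""
    baseline = evaluate(baseline_model, holdout_set)    # e.g. {"precision": 0.97, ...}
    candidate = evaluate(candidate_model, holdout_set)
    return all(candidate[m] >= baseline[m] for m in HOLDOUT_METRICS)
```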
Process requirements are traceable. You can write them, link them to procedures, and verify that the procedures were followed. What they don’t guarantee is that following the process produces a safe output in every case. The process is a proxy for quality, not a proof of it. Regulators understand this distinction, which is why process requirements alone are unlikely to be sufficient for high-criticality functions.
Approach 3: Define Performance Floors That Hold Regardless of Learning State
The third approach is the one most compatible with current regulatory frameworks: define quantitative performance thresholds that the system must maintain regardless of what learning has occurred.
These are sometimes called minimum performance standards. They function as non-negotiable floors: the system must maintain at least X precision and Y recall on a defined reference test dataset, must keep its false positive rate below Z on a defined set of adversarial cases, and must remain within defined latency bounds. Any model update that cannot be demonstrated to maintain these floors is not deployed.
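In code, a floor check is deliberately unsophisticated, which is much of its appeal. A minimal sketch; the metric names and floor values are placeholders standing in for X, Y, and Z above, not recommendations:

```python
# Sketch of a performance-floor check. The floor values are placeholders
# standing in for X, Y, and Z; set real values from your safety analysis.

PERFORMANCE_FLOORS = {
    "precision": 0.95,            # X: minimum precision on the reference test set
    "recall": 0.97,               # Y: minimum recall on the reference test set
    "false_positive_rate": 0.01,  # Z: ceiling on the adversarial case set
    "p99_latency_ms": 50.0,       # latency bound (also a ceiling)
}
CEILINGS = {"false_positive_rate", "p99_latency_ms"}  # compared downward

def meets_floors(measured: dict[str, float]) -> bool:
    """True only if every metric clears its floor (or stays under its ceiling)."""
    for metric, threshold in PERFORMANCE_FLOORS.items():
        value = measured[metric]
        ok = value <= threshold if metric in CEILINGS else value >= threshold
        if not ok:
            return False  # any single miss blocks deployment of the update
    return True
```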
Performance floor requirements are verifiable. They don’t tell you what the system will do in general—they tell you what it will not fall below on a defined evaluation. That’s a meaningful guarantee, and it’s one that a certification team can work with.
The limitation is coverage. A holdout test set is a finite sample. A field environment is not. A model that passes your performance floors may still exhibit degraded behavior on inputs your test set doesn’t represent. This is not a hypothetical concern—it is a documented failure mode of ML systems in production. Performance floors bound evaluated behavior, not all possible behavior.
How NASA, FAA, and EASA Are Approaching This
None of the major aerospace and aviation authorities has issued binding standards for machine-learning components in safety-critical systems. NASA, the FAA, and EASA are all actively developing frameworks, and the shape of those frameworks is visible enough to inform current practice.
NASA published its Assurance and Certification of Learning-Enabled Autonomous Systems framework (updated in 2025) through the Aeronautics Research Mission Directorate. It explicitly endorses the behavioral envelope approach for high-autonomy systems and emphasizes that requirements for ML components should be specified at the system level, with ML components treated as design solutions to system-level requirements rather than requirements-bearing entities themselves. The practical implication: your perception model doesn’t have its own requirements—the system has requirements that the perception model must satisfy. Versioning the model is a design change, not a requirements change.
FAA released an updated Machine Learning Risk Classification Framework in late 2025, building on its earlier Artificial Intelligence in Aviation roadmap. The FAA framework introduces three learning criticality tiers: static models (learned offline, not updated in field), supervised adaptation (field updates under human review), and continuous learning (field updates without human review per update). Requirements stringency scales with tier. For continuous learning systems in any function affecting safety, the FAA currently requires performance floor validation prior to each update cycle and human-in-the-loop authorization for deployment. Purely autonomous model updates in safety-affecting functions remain outside current certification pathways.
EASA has been the most forward-leaning regulatory body on this question. Their Artificial Intelligence Roadmap 2.0 (2023) and subsequent Guidance on Machine Learning for Airborne Systems (2025) introduced the concept of a “learning assurance” process—essentially a formalized version of approach 2 above, with specific requirements on data management, algorithmic architecture, and operational monitoring. EASA explicitly acknowledges that requirements for learning systems must include both behavioral envelopes and process properties. Their framework requires that an approved “Learning Management System” govern all in-field updates, with defined criteria for what constitutes an acceptable learning event.
The regulatory trajectory is clear: behavioral envelopes plus process requirements plus performance floors, with human authorization gates for safety-affecting updates. The specific thresholds and procedures remain under development.
How Modern Tooling Can Support Adaptive Requirements
Managing requirements for a learning system by hand—tracking behavioral envelopes, linking process requirements to validation procedures, and maintaining performance floor evidence across model versions—is not sustainable at scale. Spreadsheet-based traceability breaks the moment you have multiple model versions in flight across different deployment environments.
This is where tool architecture matters. Traditional requirements management tools (IBM DOORS, Polarion, Jama Connect) are designed around document-based requirements with version-controlled text. They can store performance floor requirements, but they have no native concept of a test dataset or a model validation procedure as first-class objects linked to a requirement. The traceability chain between a performance requirement and the evidence that it holds for a specific model version has to be constructed manually, usually through linked documents or external scripts.
Flow Engineering takes a different approach. Its graph-based model treats requirements, test datasets, model validation procedures, and system behaviors as distinct node types with typed relationships between them. A performance floor requirement for a perception function can be linked directly to the holdout dataset used to verify it, the validation procedure that defines how verification is run, and the specific model version that passed or failed that procedure. When the model updates, the traceability graph makes explicit which requirements need re-verification and which evidence packages are no longer current.
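For illustration only (this is the general idea of graph-based traceability, not Flow Engineering’s actual data model or API), here is a sketch of how typed edges make the stale-evidence question a trivial query:

```python
# Illustration of the general idea of graph-based traceability; this is
# not Flow Engineering's actual data model or API.
from dataclasses import dataclass, field

@dataclass
class TraceGraph:
    # Typed edges: (source, relation, destination)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def link(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def needs_reverification(self, superseded_model: str) -> set[str]:
        """Requirements whose verification evidence cites the superseded model."""
        return {src for src, rel, dst in self.edges
                if rel == "verified_against" and dst == superseded_model}

g = TraceGraph()
g.link("REQ-042", "verified_by", "PROC-7")           # requirement -> procedure
g.link("PROC-7", "uses_dataset", "HOLDOUT-v3")       # procedure -> holdout dataset
g.link("REQ-042", "verified_against", "model-3.1")   # evidence tied to model version
print(g.needs_reverification("model-3.1"))           # {'REQ-042'}: evidence now stale
```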
For behavioral envelope requirements, Flow Engineering’s AI-assisted requirement generation can work from a natural language description of a failure mode and produce a structured envelope requirement with defined boundary conditions—which an engineer then reviews and refines. This doesn’t solve the hard problem of writing good envelopes, but it accelerates the iteration between safety engineers and ML engineers that good envelope specification requires.
Flow Engineering’s current focus is on pre-deployment requirements management and model validation traceability, not real-time operational monitoring of deployed learning systems. Teams that need continuous runtime monitoring of in-field model behavior will need additional tooling alongside it. That’s a deliberate scope decision—Flow Engineering is built to get you to a defensible requirements baseline, not to replace operational observability infrastructure.
Where to Start If You’re Facing This Problem Now
If you are the product manager who sent us that question, or someone in an equivalent position, here is practical guidance for the near term:
First, separate your requirements by stability class. Some requirements for your system will be stable regardless of model state—interface requirements, safety-critical behavioral envelopes, performance floors. Others are inherently model-version-specific. Manage them differently. Don’t let model-version volatility contaminate your stable requirements baseline.
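One lightweight way to enforce that separation, sketched here with illustrative identifiers, is to tag every requirement with a stability class and build the controlled baseline by filtering on it:

```python
# Sketch of stability-class tagging; identifiers and classes are illustrative.
REQUIREMENTS = [
    {"id": "REQ-010", "stability": "stable",        "kind": "behavioral_envelope"},
    {"id": "REQ-022", "stability": "stable",        "kind": "performance_floor"},
    {"id": "REQ-105", "stability": "model_version", "kind": "tuning_target"},
]

# The controlled baseline contains only requirements that hold across model states.
stable_baseline = [r for r in REQUIREMENTS if r["stability"] == "stable"]
```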
Second, define your performance floors before your next model update cycle. Pick a reference test dataset, define the metrics that matter for safety, and establish the floor values that any model version must clear before deployment. This is achievable now, with current tooling, and it is the most defensible position you can take to a certification authority.
Third, document your learning process as a requirements-bearing artifact. Write down what constitutes an acceptable training data addition, what the validation procedure is for each update, and who authorizes deployment. Treat this as a controlled document. Regulators will ask for it.
Fourth, use your requirements tool to build explicit links between performance floors and their evidence. Whether you use Flow Engineering or another platform, the traceability chain from requirement to test dataset to validation result to model version should be navigable without manual reconstruction.
The problem that product manager described is real, and it is not going away. The systems engineering discipline is catching up to it, but the gap between the sophistication of learning systems and the maturity of requirements frameworks for them is still wide. The approaches above don’t close that gap. They work within it honestly.