The Systems Engineering Implications of AI-Enabled Avionics
How machine learning components are forcing a rethink of certification frameworks, requirements practice, and what “verifiable” means in aviation
Aviation has always been an early proving ground for systems engineering discipline. The determinism requirements baked into DO-178C, the system-level safety assessment rigor of ARP4754A, the failure mode coverage demanded by DO-254—these are not bureaucratic artifacts. They exist because the cost of a failure is measured in lives, not service credits.
Machine learning does not share aviation’s foundational assumptions. ML models are statistical: their behavior emerges from training data rather than from specified logic, and their failure modes do not decompose into the discrete component faults that fault tree analysis was designed to capture. And yet operators, airframers, and avionics suppliers are actively integrating ML-based components into systems that range from terrain awareness to runway detection to predictive maintenance, some of which sit in or adjacent to safety-critical signal paths.
The collision between ML’s fundamental architecture and aviation’s fundamental certification philosophy is not theoretical. It is happening now, in parallel with regulatory efforts by EASA and the FAA that are serious but incomplete, and it is forcing requirements engineers to confront questions about specification and verification that the profession has not had to answer before.
What the Current Frameworks Were Built to Handle
DO-178C (Software Considerations in Airborne Systems and Equipment Certification) defines five software levels, A through E, commonly referred to as design assurance levels (DALs), that map the severity of software failure conditions to required rigor. At DAL A, the standard demands complete requirements-based test coverage, structural coverage analysis to the MC/DC (modified condition/decision coverage) level, and independence for many of the verification objectives. The underlying assumption is that software behavior is a deterministic function of its code and its inputs. Given the same inputs, the system produces the same outputs. Every time.
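For readers less familiar with MC/DC, the core idea is that each condition in a decision must be shown to independently affect the outcome. The sketch below enumerates, for a made-up three-condition decision, the pairs of input vectors that demonstrate this independence. The function names and the `alert` decision are purely illustrative, and this is a teaching aid rather than a qualified coverage tool.

```python
from itertools import product

def independence_pairs(decision, n_conditions):
    """For each condition index, list pairs of input vectors that differ only
    in that condition and produce different decision outcomes."""
    pairs = {i: [] for i in range(n_conditions)}
    for vector in product((False, True), repeat=n_conditions):
        for i in range(n_conditions):
            if vector[i]:
                continue  # visit each pair once, starting from its False side
            flipped = vector[:i] + (True,) + vector[i + 1:]
            if decision(*vector) != decision(*flipped):
                pairs[i].append((vector, flipped))
    return pairs

# Hypothetical decision, invented for illustration only.
def alert(a, b, c):
    return a and (b or c)

pairs = independence_pairs(alert, 3)
for index, found in pairs.items():
    print(f"condition {index}: {len(found)} independence pair(s)")
```

A test set satisfies MC/DC for this decision if it contains at least one pair for every condition, which here can be done with four vectors. The contrast with a neural network is the point: a trained model has no comparable set of discrete decisions whose conditions can be enumerated and exercised this way.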
ARP4754A (Guidelines for Development of Civil Aircraft and Systems) operates at the system level, defining how aircraft functions are decomposed, how safety requirements flow down from aircraft-level failure condition analysis, and how derived requirements from hardware and software allocation get validated against those higher-level safety properties. Again, the framework assumes that the behavior of a system can be fully specified in advance and verified against that specification.
Both standards also assume that a system’s behavior is fully explained by its design artifacts—requirements documents, architecture descriptions, code, hardware schematics. There is a closed world: if you have the artifacts, you can reconstruct the behavior.
ML-based components break every one of these assumptions. A convolutional neural network trained to detect runway markings does not have requirements in any conventional sense: it has a training objective, a training dataset, a validation dataset, and an error metric. Its behavior is determined by millions of learned weights, not by specified logic. Its failure modes are statistical rather than traceable to a specific logic fault. Once the weights are frozen, inference is repeatable, so the same input produces the same output, but the relationship between weights and outputs is not auditable in the way code logic is auditable. And its behavior on inputs outside the training distribution is, in a precise sense, unknown.
What Regulators Are Actually Saying
Neither the FAA nor EASA has issued binding airworthiness criteria specifically governing ML-based avionics components as of mid-2026. What exists is guidance, roadmaps, and frameworks—substantive work, but not yet the equivalent of DO-178C for AI.
EASA’s AI Roadmap (first published in 2020, updated through subsequent phases) frames the certification challenge around three core concepts: trustworthiness, explainability, and assurance of the learning process. EASA’s guidance also relies on the concept of the Operational Design Domain (ODD), a term with roots in automated driving: the bounded set of operating conditions within which an AI component’s behavior has been characterized and for which the developer can make safety claims. The ODD is not just an environmental envelope in the traditional sense; it includes data distribution properties, sensor quality assumptions, and scenario coverage.
EASA has also proposed a learning assurance framework, structured similarly to software development assurance but covering the ML development lifecycle: data management, training processes, model validation, and operational monitoring. The key innovation here is that verification is not just about the artifact (the trained model) but about the process that produced it and the ongoing monitoring of its operational performance.
The FAA’s AI/ML Safety Framework, developed in coordination with its Aviation Rulemaking Advisory Committee (ARAC) and with inputs from EASA harmonization efforts, follows similar conceptual lines. It emphasizes concept-of-operations-level assurance—the idea that a system’s airworthiness argument must include a clear articulation of what the AI component is doing, under what conditions, with what human oversight, and how anomalies are detected and handled.
Both regulators have aligned on several key positions: first, that existing standards need not be completely replaced but that AI-specific supplements or means of compliance are necessary; second, that system-level isolation and mitigation may allow AI components to participate in avionics systems at lower functional risk levels even if the AI itself cannot be certified to traditional DAL criteria; and third, that operational monitoring and in-service learning assurance are part of the airworthiness picture in a way they never were for deterministic software.
What neither regulator has yet published is a clear, binding answer to the question every avionics engineer is actually asking: what, exactly, do I need to write down, verify, and submit to get an ML-based component approved for a specific function at a specific criticality level?
The Requirements Engineering Problem
The regulatory ambiguity reflects a genuine intellectual problem, not just a bureaucratic lag. The problem is this: how do you write a verifiable requirement for a system whose behavior is not fully determined by a specification?
Traditional requirements practice in aviation is built on a clear semantic: a requirement is a statement that the system shall exhibit some behavior under some condition, and verification is the activity of demonstrating that the system does exhibit that behavior. The behavior is predictable because it is specified. The verification is possible because the behavior is deterministic.
For an ML component, the behavior is a probability distribution over outputs given inputs. Writing “the terrain awareness function shall correctly identify terrain within the operational design domain” is not a requirement in any verifiable sense—it is a goal statement. Writing “the terrain awareness function shall correctly classify terrain in X% of cases as measured on validation dataset Y” is getting closer, but it raises immediate questions: how representative is dataset Y, how do you define “correctly,” what happens at the boundary of dataset Y’s coverage, and how does performance on dataset Y relate to performance on the actual operational input distribution?
The requirements engineering profession is adapting in several ways.
Performance envelope specification is replacing behavioral specification for ML components. Instead of specifying what the system shall do, requirements engineers are learning to specify the envelope within which the system must perform, with explicit statistical confidence levels. An AI performance requirement might read: “The runway detection function shall achieve a false negative rate no greater than 10⁻⁶ per approach, demonstrated at a stated statistical confidence level on a test dataset that satisfies the distributional coverage criteria defined in [ODD Specification Document].” This is still a requirement, but it is a statistical property claim, not a deterministic behavioral claim.
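It is worth seeing what a requirement of this kind implies for the size of the test dataset. The following is a rough sketch under simplifying assumptions: test samples are independent draws from the ODD, and the demonstration observes zero missed detections. The function name and the 95% confidence level are illustrative choices, not values taken from any standard.

```python
import math

def zero_failure_sample_size(max_failure_rate: float, confidence: float) -> int:
    """Smallest number of failure-free independent trials that supports the
    claim 'failure rate <= max_failure_rate' at the given one-sided confidence.

    Follows from requiring (1 - p)^n <= 1 - confidence.
    """
    if not (0.0 < max_failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("rate and confidence must lie strictly between 0 and 1")
    # log1p(-p) computes log(1 - p) accurately for very small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_failure_rate))

# Illustrative: a 1e-6 bound at an assumed 95% confidence level.
n = zero_failure_sample_size(1e-6, 0.95)
print(f"failure-free samples needed: {n:,}")
```

Under these assumptions the answer is roughly three million failure-free samples, and any observed failure pushes the number higher. This is one reason the representativeness of the test dataset, and therefore the ODD specification, carries so much of the weight in the argument.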
ODD specification is becoming a requirements deliverable in its own right. Defining the operational design domain—the inputs the system is designed to handle, the conditions under which it will be used, the distribution of scenarios it will encounter—is no longer just a design activity. It is a requirements activity, because the ODD determines the scope of the verification claim. Requirements engineers who do not specify the ODD tightly are leaving the airworthiness argument incomplete.
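One way to make an ODD reviewable as a requirements artifact is to express it as structured data that can be checked mechanically. The sketch below assumes a deliberately simplified ODD made of numeric ranges; the parameter names, limits, and units are hypothetical, and a real ODD would also need to capture distributional and scenario-coverage properties that do not reduce to simple bounds.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OddParameter:
    name: str
    minimum: float
    maximum: float
    unit: str

    def contains(self, value: float) -> bool:
        return self.minimum <= value <= self.maximum

# Hypothetical ODD for a vision-based runway detection function.
# All limits are invented for illustration.
RUNWAY_DETECTION_ODD = (
    OddParameter("visibility", 5000.0, 50000.0, "m"),
    OddParameter("sun_elevation", 10.0, 90.0, "deg"),
    OddParameter("distance_to_threshold", 500.0, 6000.0, "m"),
)

def odd_violations(odd, conditions: dict) -> list:
    """Return the names of ODD parameters that are missing or out of range."""
    violations = []
    for param in odd:
        value = conditions.get(param.name)
        if value is None or not param.contains(value):
            violations.append(param.name)
    return violations

current = {"visibility": 8000.0, "sun_elevation": 5.0, "distance_to_threshold": 3000.0}
print(odd_violations(RUNWAY_DETECTION_ODD, current))
```

Treating a missing parameter as a violation is a conservative design choice: if the system cannot establish that it is inside the ODD, the safety claim does not apply.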
Confidence output contracts are emerging as a way to make ML component interfaces certifiable even when the internal behavior is not fully auditable. The idea is that an ML component is required to output not just a classification or prediction but also a confidence or uncertainty estimate, and the system architecture is required to respond to low-confidence outputs in a specified, deterministic way. The requirement lives at the interface and on the response logic, not inside the ML model. This allows the surrounding deterministic system to be certified conventionally while the ML component is treated as a sensor with a characterized error model.
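A minimal sketch of such a contract might look like the following. The type names, the action set, and the 0.90 threshold are all assumptions made for illustration; the point is only that the response logic is small, deterministic, and testable by conventional means.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    DISPLAY_ADVISORY = "display_advisory"
    SUPPRESS_AND_FLAG = "suppress_and_flag"

@dataclass(frozen=True)
class Detection:
    runway_detected: bool
    confidence: float  # reported by the ML component, expected in [0, 1]

CONFIDENCE_FLOOR = 0.90  # hypothetical threshold, not from any standard

def respond(detection: Detection) -> Action:
    """Deterministic response logic on the consumer side of the interface."""
    c = detection.confidence
    # Out-of-range or NaN confidence violates the contract; NaN fails both
    # comparisons, so it is caught here and treated as untrustworthy.
    if not (0.0 <= c <= 1.0):
        return Action.SUPPRESS_AND_FLAG
    if c < CONFIDENCE_FLOOR:
        return Action.SUPPRESS_AND_FLAG
    return Action.DISPLAY_ADVISORY

print(respond(Detection(runway_detected=True, confidence=0.97)))
print(respond(Detection(runway_detected=True, confidence=0.42)))
```

The contract is only as good as the calibration of the confidence estimate, so the error-model characterization of the ML component still has to establish that reported confidence tracks actual correctness within the ODD.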
Distribution shift monitoring requirements are appearing in operational concepts for ML-enabled systems. These are requirements not on the system’s inference behavior but on its self-monitoring behavior—detecting when incoming operational data deviates significantly from the training distribution and triggering a fallback mode or crew alert. Writing these requirements demands a clear understanding of what “distribution shift” means operationally and how it can be detected in real time with acceptable false-alarm rates.
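To make the idea concrete, here is a deliberately simple sketch of a runtime monitor. It assumes a single scalar summary of each input (for example, mean image brightness) whose mean and standard deviation were recorded on the training data, and it flags a shift when the mean of a sliding window drifts too far from the training mean. The class name, window size, and threshold are hypothetical, and a fielded monitor would almost certainly track richer statistics.

```python
import math
from collections import deque

class ShiftMonitor:
    """Flags when a sliding-window mean departs from the training mean."""

    def __init__(self, train_mean: float, train_std: float,
                 window: int = 50, z_limit: float = 4.0):
        if train_std <= 0 or window < 1:
            raise ValueError("train_std must be positive and window at least 1")
        self.train_mean = train_mean
        self.train_std = train_std
        self.z_limit = z_limit
        self.samples = deque(maxlen=window)

    def update(self, value: float) -> bool:
        """Add one observation; return True if a shift is flagged."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # window not yet full, no decision made
        window_mean = sum(self.samples) / len(self.samples)
        standard_error = self.train_std / math.sqrt(len(self.samples))
        z = (window_mean - self.train_mean) / standard_error
        return abs(z) > self.z_limit

monitor = ShiftMonitor(train_mean=0.5, train_std=0.1, window=10, z_limit=4.0)
baseline = [monitor.update(0.5) for _ in range(10)]  # in-distribution inputs
shifted = [monitor.update(0.9) for _ in range(4)]    # inputs drift upward
print(baseline[-1], shifted)
```

The threshold sets the trade-off between detection delay and false alarms, and the standard-error calculation assumes independent samples. Consecutive sensor frames are usually correlated, so a real requirement would have to state how the false-alarm rate is to be demonstrated under realistic input sequences.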
Traceability at the Scale AI Demands
None of these new requirement types is manageable with the document-based requirements traceability matrix (RTM) practices that still dominate avionics development. The traceability challenge in an ML-enabled avionics system is qualitatively different from that in a conventional system.
In a conventional system, the traceability chain runs from aircraft-level safety requirement to system requirement to software requirement to code artifact to test case. Each link is explicit and the chain is finite.
In a system with ML components, the traceability chain extends into the data management domain: the ODD specification must trace to the training dataset composition, which must trace to the data collection process, which must trace to the operational domain model, which must trace back to the aircraft-level safety requirements. The model validation results must trace to the performance envelope requirements. The operational monitoring specification must trace to the distribution shift criteria. The confidence output interface must trace to both the ML component specification and the deterministic response logic requirements.
This is a graph, not a document. The relationship types are heterogeneous—some are “derived from,” some are “verified by,” some are “constrained by,” some are “monitors.” Capturing these in a spreadsheet RTM or a Word document with hyperlinks is not a practical approach at any scale. It is certainly not an approach that will hold up under certification scrutiny when regulators ask how the safety case is constructed end-to-end.
Tools like Flow Engineering are built for exactly this architecture. Flow Engineering’s graph-based requirements model allows teams to define heterogeneous relationship types between nodes—requirements, ODD parameters, dataset specifications, test results, architecture elements—and query the graph to answer airworthiness questions: which requirements are affected by a change to the ODD boundary? Which test cases cover which performance requirements? Which ML component interfaces have unsatisfied confidence-output traceability? These are questions that document-based tools answer slowly and incompletely, if at all. Flow Engineering’s AI-native approach also allows teams to surface gaps in requirement coverage automatically, which matters especially for ML systems where coverage gaps are not visible through conventional means.
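The kind of impact query described in this section can be illustrated with a generic typed-edge graph. The sketch below is not the API or data model of Flow Engineering or any other tool; the class, the node identifiers, and the relation names are hypothetical and exist only to show why a graph answers these questions more directly than a matrix does.

```python
from collections import deque

class TraceGraph:
    """Directed graph of typed trace links between engineering artifacts."""

    def __init__(self):
        self._incoming = {}  # target -> list of (relation, source)

    def link(self, source, relation, target):
        """Record 'source --relation--> target'."""
        self._incoming.setdefault(target, []).append((relation, source))

    def impacted_by(self, node, relations=None):
        """All artifacts that depend on `node`, directly or transitively.
        If `relations` is given, only links of those types are followed."""
        seen = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for relation, source in self._incoming.get(current, []):
                if relations is not None and relation not in relations:
                    continue
                if source not in seen:
                    seen.add(source)
                    queue.append(source)
        seen.discard(node)  # a cycle should not report the node itself
        return seen

g = TraceGraph()
g.link("PERF-REQ-1", "constrained_by", "ODD-visibility")
g.link("DATASET-SPEC-1", "derived_from", "ODD-visibility")
g.link("TEST-RESULT-7", "verifies", "PERF-REQ-1")
g.link("SHIFT-MON-REQ-2", "monitors", "DATASET-SPEC-1")
g.link("SYS-REQ-9", "derived_from", "AC-SAFETY-REQ-1")

print(sorted(g.impacted_by("ODD-visibility")))
```

Here a change to the visibility bound of the ODD reaches the performance requirement, the dataset specification, the test result that verifies the requirement, and the shift-monitoring requirement, while the unrelated system requirement is left out. Filtering by relation type narrows the query to a single kind of dependency.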
What Is Actually Changing in Practice
Aviation programs developing AI-enabled systems are not waiting for binding certification criteria. They are making practical decisions now.
The most common approach is functional isolation—architecting the system so that ML-based components perform non-safety-critical functions, with deterministic logic supervising their outputs and maintaining authority over safety-critical actions. A runway detection system based on computer vision is used as an advisory input, not as a direct control signal. The ML component may be certified at a lower DAL, or under a special condition, while the architecture ensures that its failures cannot propagate to catastrophic failure conditions without detection.
A second approach is means-of-compliance negotiation with the relevant authority. Both EASA and the FAA allow applicants to propose alternative means of compliance with airworthiness standards. Several programs are actively developing ML-specific means of compliance that address learning assurance, ODD specification, and statistical performance verification as substitutes for the line-by-line verification that DO-178C would require for conventional software.
A third approach is operational approval before design approval—starting in operational environments where AI components can demonstrate performance without formal airworthiness credit, building the statistical evidence base, and using that evidence to support future certification arguments. This is the path several predictive maintenance and prognostics applications have taken.
An Honest Assessment
The aviation industry is at the beginning of a long process of adapting its engineering and regulatory infrastructure to ML-based systems. The regulatory guidance that exists is thoughtful and directionally correct, but it is not yet sufficient to give programs a clear path to certification for ML components in high-criticality functions.
The requirements engineering challenge is real and unsolved at the profession level. Individual programs are developing local practices, but there is no widely adopted, validated standard for writing verifiable requirements for ML components in safety-critical aviation systems. The work being done inside programs, in industry working groups, and in academic research is converging toward the concepts outlined here—ODD specification, statistical performance envelopes, confidence output contracts, distribution shift monitoring—but the convergence is incomplete.
What is clear is that the tools and practices that serve conventional avionics development are not adequate for this transition. Requirements engineers who wait for the frameworks to be fully resolved before upgrading their infrastructure will face a compressed timeline when binding criteria arrive. The teams that are building connected, model-based requirements infrastructure now, infrastructure that can capture the graph relationships ML system assurance demands, will be positioned to move quickly when the regulatory picture clarifies.
The airworthiness of AI-enabled avionics will be argued system by system, program by program, in front of regulators who are learning alongside the industry. That argument will be won or lost on the quality of the engineering artifacts—particularly the requirements, their structure, and their traceability. That has always been true in aviation. It is more true now than it has ever been.