Can You Certify AI and Machine Learning Components Under DO-178C?
The direct answer is: not straightforwardly, and not yet with a fully approved means of compliance. DO-178C (published in Europe as ED-12C) remains the primary recognized means of compliance for airborne software assurance in the United States and Europe, but it was written for deterministic, logic-driven software. Machine learning components, particularly neural networks, violate several of the assumptions the standard rests on. That does not mean ML cannot enter avionics systems. It means the industry is still constructing the scaffolding required to certify it, and avionics programs integrating ML today must understand exactly where the gaps are.
This article explains what DO-178C requires, where ML-based systems structurally diverge from those requirements, what regulatory guidance currently exists, and what supplemental means of compliance are emerging. It also covers the practical engineering implications for programs that need to scope, trace, and verify ML components right now, before that scaffolding is complete.
What DO-178C Actually Requires
DO-178C, Software Considerations in Airborne Systems and Equipment Certification, specifies a set of objectives that avionics software must satisfy to receive certification credit. The standard is structured around software levels A through E, commonly referred to by the corresponding design assurance level (DAL A through E), and the number and rigor of the objectives scale with those levels. For DAL A software, the most stringent level, the objectives include:
Traceability. Every high-level requirement must trace to low-level requirements, and every low-level requirement must trace to source code. The chain is bidirectional and must be verified.
Structural coverage. Source code must be exercised by tests to specific structural coverage criteria. DAL A requires Modified Condition/Decision Coverage (MC/DC), which demands that each condition in a decision be shown to independently affect that decision’s outcome. DAL B requires decision coverage; DAL C requires statement coverage.
Determinism. DO-178C assumes that given the same inputs, software will always produce the same outputs. This assumption is embedded in the test repeatability and equivalence requirements throughout the standard.
Verification of development artifacts. Reviews, analyses, and tests must be applied to requirements, design, source code, and executables. Each artifact must be traceable to the process that produced it.
These are not bureaucratic formalities. They exist because the failure modes of logic-driven software are, in principle, analyzable. You can read code, trace its decision branches, bound its behavior, and prove that tests have exercised all of those branches. The architecture of the standard assumes that the behavior of the software is determined by human-written, inspectable logic.
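To make the MC/DC objective concrete, the following is a minimal sketch of a unique-cause MC/DC check for a single decision. The decision `(a and b) or c`, the function names, and the test vectors are all illustrative assumptions, not taken from DO-178C or from any qualified coverage tool.

```python
from itertools import combinations

def decision(a: bool, b: bool, c: bool) -> bool:
    """Example decision with three conditions: (a and b) or c."""
    return (a and b) or c

def mcdc_uncovered(decision_fn, tests):
    """Return indices of conditions lacking a unique-cause independence pair.

    A pair of test vectors demonstrates independence for condition i when
    the vectors differ only in condition i and the decision outcome differs.
    """
    n = len(tests[0])
    covered = set()
    for t1, t2 in combinations(tests, 2):
        diff = [i for i in range(n) if t1[i] != t2[i]]
        if len(diff) == 1 and decision_fn(*t1) != decision_fn(*t2):
            covered.add(diff[0])
    return sorted(set(range(n)) - covered)

# Four vectors (n + 1 for n = 3 conditions) are enough for this decision.
tests = [
    (True, True, False),   # outcome True
    (False, True, False),  # outcome False: pairs with the first vector for a
    (True, False, False),  # outcome False: pairs with the first vector for b
    (True, False, True),   # outcome True: pairs with the third vector for c
]
print(mcdc_uncovered(decision, tests))      # conditions still uncovered
print(mcdc_uncovered(decision, tests[:3]))  # dropping the last vector leaves c uncovered
```

The check only works because the decision is written as explicit boolean logic that can be enumerated, which is the property the rest of this article shows neural networks lack.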
Where Machine Learning Diverges
Neural networks invert nearly every assumption above.
The logic is not inspectable. A trained neural network’s behavior is encoded in millions or billions of floating-point weights. There is no equivalent of reading a conditional branch and understanding what condition triggers it. The model’s decision boundaries are implicit and distributed across the network.
MC/DC does not map onto neural architectures. Structural coverage for classical software means exercising code paths. A neural network has no code paths in that sense. The inference code that executes the network can be covered trivially, but that coverage says little about behavior, because the behavior lives in the weights rather than in branches. Neurons activate along a continuum; there are no discrete conditions whose independent effect can be demonstrated in the way MC/DC requires. Researchers have proposed analogues such as neuron coverage, layer coverage, and activation pattern diversity, but none of these has been accepted as an equivalent means of compliance by the FAA or EASA.
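For readers unfamiliar with the proposed analogues, here is a rough sketch of one common definition of neuron coverage: the fraction of neurons whose activation exceeds a threshold for at least one test input. The tiny network, its weights, and the function names are hypothetical, and definitions vary across the research literature.

```python
def relu_layer(weights, biases, x):
    """One dense layer with ReLU; weights holds one row per neuron."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def neuron_coverage(layers, inputs, threshold=0.0):
    """Fraction of neurons activated above threshold by at least one input."""
    total = sum(len(biases) for _, biases in layers)
    activated = set()
    for x in inputs:
        h = x
        for layer_idx, (weights, biases) in enumerate(layers):
            h = relu_layer(weights, biases, h)
            activated.update(
                (layer_idx, n) for n, a in enumerate(h) if a > threshold
            )
    return len(activated) / total

# Hypothetical 2-input network with two layers; weights are illustrative only.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0]),
]
one_input = neuron_coverage(layers, [[1.0, 2.0]])
two_inputs = neuron_coverage(layers, [[1.0, 2.0], [-1.0, -1.0]])
print(one_input, two_inputs)
```

In this toy case two inputs are enough to activate every neuron, which hints at why full neuron coverage is a much weaker statement about behavior than MC/DC is for branching logic.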
Training data is a first-class engineering artifact, and it has no counterpart in DO-178C. The behavior of a trained model is determined in large part by the distribution of its training data. DO-178C does not address the development lifecycle of training datasets, labeling processes, augmentation strategies, or distribution shift. This is less an oversight than an absence: that category of artifact was not part of airborne software development when the standard was written.
Non-determinism complicates verification repeatability. Many ML inference frameworks, particularly on GPU hardware, produce outputs that can vary slightly across runs due to floating-point operation ordering. DO-178C’s repeatability requirements assume identical outputs from identical inputs.
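The underlying cause is that floating-point addition is not associative, so a reduction that sums the same values in a different order can round differently. The sketch below shows the effect with three values in plain Python; it illustrates the arithmetic only and does not model any particular GPU or inference framework.

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, two summation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)   # the two orders do not agree bit for bit
print(abs(left_to_right - right_to_left))

# math.fsum returns the correctly rounded sum, so it is independent of order.
print(math.fsum(values) == math.fsum(reversed(values)))
```

A verification plan therefore has to choose between forcing a fixed operation order on the target and defining a justified numerical tolerance for what counts as the same output.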
The boundary between requirements and implementation is blurred. In classical development, a requirement states what the software shall do; the design and code show how. For an ML component, the requirement might be “classify runway surface condition with less than 2% false negative rate under the following operational conditions.” Whether the trained model satisfies that requirement is determined empirically across a test dataset, not by logical analysis of the implementation.
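Because the evidence is statistical, the size of the test dataset matters as much as the observed rate. As a sketch, the check below passes only when the upper limit of a Wilson score interval on the false negative rate is under the 2% threshold. The choice of the Wilson interval, z = 1.96 (a two-sided 95% interval), and the function names are assumptions made for illustration; an actual program would justify its own statistical method with the authority.

```python
import math

def wilson_upper_bound(failures: int, trials: int, z: float = 1.96) -> float:
    """Upper limit of the Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre + margin) / denom

def meets_fnr_requirement(false_negatives, positives, max_fnr=0.02, z=1.96):
    """Pass only if the upper confidence limit on FNR is below max_fnr."""
    return wilson_upper_bound(false_negatives, positives, z) < max_fnr

# Same observed rate (1.5%), different amounts of evidence.
print(meets_fnr_requirement(15, 1_000))    # small sample
print(meets_fnr_requirement(150, 10_000))  # ten times more positive samples
```

With 1,000 positive samples the observed 1.5% rate is not enough to support the claim under this method, while the same rate over 10,000 samples is. That dependence on sample size has no counterpart in a pass/fail test case for deterministic code.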
None of this means ML is inherently unsafe. It means the analytical framework DO-178C provides — which has worked extremely well for deterministic software for decades — does not transfer without significant modification.
Current Regulatory Guidance
Both the FAA and EASA have acknowledged the gap and are working to close it.
FAA — PSCP for Machine Learning-Based Airborne Systems. The FAA’s Project-Specific Certification Plan (PSCP) approach allows applicants to propose a custom certification basis for novel technologies. Several programs have submitted PSCPs that address ML components, and the FAA has published general guidance on what it expects such plans to contain: explicit identification of the ML component’s role in the safety architecture, clear statement of the operational design domain (ODD), dataset management documentation, and a verification strategy that addresses both nominal and out-of-distribution performance. The FAA has not published an approved means of compliance for ML — the PSCP process is, by design, project-specific.
EASA, Concept Paper: First Usable Guidance for Level 1 and 2 ML Applications. EASA’s 2023 and 2024 concept papers introduced the concept of Learning Assurance, organized around a machine learning development lifecycle (MLDLC). The papers propose objectives for data management, model training, model verification, and operational monitoring. EASA’s levels classify the degree of human-AI interaction rather than the learning mode: Level 1 covers AI that assists a human, and Level 2 covers human-AI teaming. Online learning in the deployed system is a separate question, and the current guidance addresses models whose learned parameters are frozen before deployment. The concept papers are guidance material, not approved means of compliance. EASA has signaled that a full Acceptable Means of Compliance (AMC) is in development, with no committed timeline.
ARP 6983 — in development. SAE is developing ARP 6983, Process Standard for Development and Certification/Approval of Aeronautical Safety-Related Products Implementing AI, which is intended to provide the industry-consensus lifecycle standard for ML in avionics, analogous to how DO-178C provides the lifecycle for classical software. Draft material has circulated within SAE committees but the standard is not yet released for public use.
The practical implication: programs being developed today that integrate ML components are operating in a regulatory environment where the guidance exists but the approved means of compliance does not. The FAA and EASA expect applicants to engage early, propose a certification basis through project-specific plans, and document their reasoning explicitly.
Supplemental and Emerging Means of Compliance
Several technical approaches are being developed or proposed to fill the gaps DO-178C leaves for ML:
Formal methods for bounded verification. For constrained ML architectures — small networks with bounded inputs — formal verification tools can prove certain safety properties hold across the entire input space. This is computationally expensive and currently applicable only to narrow, well-scoped functions, but it provides the closest analog to the exhaustive analysis DO-178C enables for classical software.
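One of the simplest techniques in this family is interval bound propagation, which pushes a box of possible inputs through the network and yields guaranteed (though often loose) output bounds. The sketch below applies it to a hypothetical two-layer ReLU network; the weights and function names are illustrative, and it ignores floating-point rounding in the bound arithmetic itself, which a real verification tool would have to handle soundly.

```python
def affine_bounds(weights, biases, lo, hi):
    """Propagate an input box [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, biases):
        # A positive weight takes its minimum at lo, a negative weight at hi.
        out_lo.append(b + sum(w * (l if w >= 0 else h)
                              for w, l, h in zip(row, lo, hi)))
        out_hi.append(b + sum(w * (h if w >= 0 else l)
                              for w, l, h in zip(row, lo, hi)))
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotonic, so it can be applied to each bound directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def output_bounds(layers, lo, hi):
    """Output bounds for a ReLU network whose last layer is affine only."""
    for i, (weights, biases) in enumerate(layers):
        lo, hi = affine_bounds(weights, biases, lo, hi)
        if i < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.25]),
    ([[1.0, 1.0]], [0.0]),
]
lo, hi = output_bounds(layers, [0.0, 0.0], [1.0, 1.0])
print(lo, hi)
```

If the safety property were "the output never exceeds 2.0 for inputs in the unit box", the computed upper bound would establish it for every input in the box, not just for sampled test points. The bound is conservative: the true maximum of this toy network over the box is lower than the bound reported.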
Adversarial robustness testing. In place of structural coverage, adversarial test suites systematically probe the model’s decision boundaries for unexpected failures. Results from gradient-based attacks such as FGSM and PGD, along with performance on natural adversarial examples, are increasingly being proposed as part of verification evidence packages, though acceptance criteria remain applicant-defined.
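As an illustration of what such a probe does, the following sketch applies a single FGSM step to a hypothetical two-feature logistic model. The weights, the input, and the perturbation budget `eps` are made up for the example; a real evaluation would run against the actual model and an ODD-justified perturbation budget.

```python
import math

def predict(w, b, x):
    """Probability of the positive class from a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(g):
    """Return -1, 0, or 1 according to the sign of g."""
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: shift each feature by eps in the loss-increasing direction."""
    p = predict(w, b, x)
    # Gradient of binary cross-entropy with respect to x for a logistic model.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

w, b = [2.0, -1.0], 0.0   # illustrative weights, not from any real model
x, y = [0.3, 0.2], 1      # input that the model classifies as positive
x_adv = fgsm(w, b, x, y, eps=0.25)

print(predict(w, b, x) > 0.5)      # original prediction is positive
print(predict(w, b, x_adv) > 0.5)  # perturbed prediction flips to negative
```

A perturbation of at most 0.25 per feature is enough to flip this toy classifier. Whether that counts as a failure depends on whether such a perturbation is physically plausible inside the operational design domain, which is why acceptance criteria have to be argued per application.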
Dataset management and data quality standards. EASA’s learning assurance framework places heavy emphasis on dataset governance — provenance, completeness, balance, labeling quality, and traceability of data transformations. Programs are beginning to treat training datasets as configuration-managed engineering artifacts with audit trails, analogous to how DO-178C treats source code.
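One way to put a dataset under configuration control is to pin each released version to a content digest, so that any change to a sample or label produces a different identifier. The sketch below assumes records are small JSON-serializable dictionaries and uses hypothetical field names; it is not drawn from EASA material or any specific tool.

```python
import hashlib
import json

def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of one labeled sample."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def dataset_manifest(records, version: str, source: str) -> dict:
    """Manifest that ties a dataset version to the exact content it holds."""
    digests = sorted(record_digest(r) for r in records)  # order-independent
    root = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return {
        "version": version,
        "source": source,
        "count": len(digests),
        "root_digest": root,
    }

# Hypothetical labeled samples; a real record would reference image content.
records = [
    {"image": "frame_0001.png", "label": "wet"},
    {"image": "frame_0002.png", "label": "dry"},
]
manifest = dataset_manifest(records, "v1", "hypothetical-collection-campaign")

# Relabeling a single sample changes the root digest.
relabeled = [dict(records[0], label="dry"), records[1]]
changed = dataset_manifest(relabeled, "v1", "hypothetical-collection-campaign")
print(manifest["root_digest"] != changed["root_digest"])
```

Storing the manifest alongside the trained model's configuration record lets an auditor confirm which data produced which model, much as a source baseline identifies the code behind an executable.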
Operational design domain (ODD) constraints and monitoring. Rather than verifying ML behavior across all possible inputs, ODD-constrained designs limit the operational envelope in which the ML component is authorized to act, and pair it with runtime monitors that detect out-of-distribution inputs. The safety argument shifts from “we have verified this model universally” to “we have verified this model within this envelope, and we detect departures from that envelope.”
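The simplest form of such a monitor is a range check on the parameters that define the envelope, with the ML output suppressed when any parameter is missing or out of range. The parameter names, limits, and fallback behavior below are hypothetical. Detecting inputs that are statistically out of distribution while still inside these ranges is a harder problem that this sketch does not address.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        # NaN fails both comparisons, so it is treated as outside the range.
        return self.low <= value <= self.high

# Hypothetical envelope; names and limits are illustrative only.
ODD = {
    "altitude_ft": Range(0.0, 3000.0),
    "visibility_m": Range(800.0, 20000.0),
}

def odd_violations(sample: dict, odd: dict = ODD) -> list:
    """Names of ODD parameters that are missing or outside their range."""
    return [name for name, rng in odd.items()
            if name not in sample or not rng.contains(sample[name])]

def gated_output(ml_output, sample: dict, fallback=None):
    """Pass the ML output through only when the input lies inside the ODD."""
    return ml_output if not odd_violations(sample) else fallback

inside = {"altitude_ft": 1500.0, "visibility_m": 5000.0}
outside = {"altitude_ft": 4500.0, "visibility_m": 5000.0}
print(gated_output("WET", inside))
print(gated_output("WET", outside, fallback="NO_ADVISORY"))
```

Because the monitor is ordinary deterministic logic, it can be developed and verified under DO-178C in the usual way, which is part of what makes this architecture attractive.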
Partitioning and DAL allocation. Several applicants have structured their architectures so that the ML component is allocated a lower DAL (or no DAL) by arguing that it functions as an advisory input to a deterministic, certified decision function. This approach can work for non-critical functions but requires the safety analysis to carefully bound what harm can result if the ML component produces a wrong output.
Practical Implications for Avionics Programs Today
If you are integrating an ML component into an avionics program right now, several steps apply regardless of whether the final means of compliance is settled:
Define the ML component’s requirements explicitly and narrowly. Every ML component needs performance requirements stated in terms of measurable outcomes — accuracy, recall, latency, confidence calibration — bounded by the operational design domain. Ambiguous requirements (“shall accurately detect obstacles”) are not sufficient. They will not support verification, and regulators will reject them.
Document the verification strategy before the model is trained. The choice of training data, evaluation datasets, test conditions, and acceptance thresholds should be part of a documented plan, not an ad hoc decision made after training.
Assign a certification risk flag to every ML-allocated requirement. Within your requirements management system, ML-allocated requirements carry a different verification posture than deterministic-software requirements. They need explicit linkage to the planned means of compliance, the ODD, and the monitoring strategy.
Engage the certification authority early. Neither the FAA nor EASA expects applicants to have solved the ML certification problem independently. Both expect early engagement, with open questions worked through formal mechanisms such as FAA issue papers or EASA certification review items, and a proposed certification basis that the authority can respond to. Waiting until late in development to raise these questions creates program-ending risk.
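The certification risk flag described above can be sketched as a requirement record with an automated completeness check. The field names, identifiers, and requirement text are hypothetical and do not reflect any particular requirements management tool's schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Requirement:
    req_id: str
    text: str
    ml_allocated: bool = False                 # the certification risk flag
    means_of_compliance: Optional[str] = None  # planned compliance argument
    odd_ref: Optional[str] = None              # operational design domain
    monitor_ref: Optional[str] = None          # runtime monitoring strategy

def missing_links(req: Requirement) -> List[str]:
    """Linkage fields that an ML-allocated requirement still lacks."""
    if not req.ml_allocated:
        return []
    fields = ("means_of_compliance", "odd_ref", "monitor_ref")
    return [name for name in fields if not getattr(req, name)]

reqs = [
    Requirement("REQ-SW-101", "Shall debounce the weight-on-wheels input."),
    Requirement(
        "REQ-ML-012",
        "Shall classify runway surface condition with a false negative "
        "rate below 2% within ODD-003.",
        ml_allocated=True,
        means_of_compliance="PSCP verification section (hypothetical)",
        odd_ref="ODD-003",
    ),
]
for r in reqs:
    print(r.req_id, missing_links(r))
```

Running a check like this in review gates makes an incomplete compliance argument visible while the requirement is still being written, rather than when the evidence package is assembled.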
Managing ML Requirements in Practice
The requirements management challenge for ML-integrated programs is materially harder than for classical avionics software. You are tracking not just functional and safety requirements allocated to code, but performance requirements allocated to models, data requirements allocated to training pipelines, ODD constraints allocated to runtime monitors, and verification strategy documentation that spans multiple engineering disciplines.
This is an area where the architecture of your requirements management tool matters. Tools built on document-based paradigms — where requirements live in structured text and traceability is maintained through manual link tables — struggle when the verification evidence for a requirement is not a test case but a statistical evaluation over a dataset. The link structure becomes unwieldy, and the relationships between a performance requirement, its evaluation methodology, its acceptance threshold, and its monitoring strategy are difficult to represent coherently.
Graph-based requirements management platforms handle this more naturally. When requirements, components, verification activities, and supporting artifacts are nodes in a connected model rather than rows in a document, you can represent the full verification network for an ML component — requirement → ODD constraint → evaluation dataset → acceptance threshold → runtime monitor — without forcing it into a flat traceability matrix.
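The idea of typed relationships between nodes can be shown with a generic adjacency structure. This is a sketch of the concept only: the node identifiers and relation names are invented, and it does not represent the data model or API of any specific product.

```python
from collections import defaultdict, deque

# (source, relation, target) triples; all identifiers are hypothetical.
EDGES = [
    ("REQ-ML-012", "constrained_by", "ODD-003"),
    ("REQ-ML-012", "evaluated_on", "DATASET-EVAL-v3"),
    ("REQ-ML-012", "accepted_at", "THRESHOLD-FNR-2PCT"),
    ("ODD-003", "enforced_by", "MONITOR-007"),
]

def build_graph(edges):
    """Adjacency map from each node to its outgoing (relation, target) pairs."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph

def trace(graph, start):
    """Breadth-first list of (source, relation, target) links reachable from start."""
    seen, queue, links = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for relation, target in graph.get(node, []):
            links.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return links

graph = build_graph(EDGES)
links = trace(graph, "REQ-ML-012")
for source, relation, target in links:
    print(f"{source} --{relation}--> {target}")
```

A single traversal from the requirement reaches the runtime monitor through the ODD constraint, a two-hop relationship that a flat requirement-to-test matrix has no natural place for.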
Flow Engineering, which was built as an AI-native requirements management platform for hardware and systems engineering programs, structures requirements as graph nodes with typed relationships. This lets teams explicitly tag requirements as ML-allocated, link them to the planned verification methodology and operational domain constraints, and track the status of each compliance argument as it develops. For programs navigating project-specific certification plans, where the verification approach itself is part of the deliverable to the authority, that kind of structured visibility is operationally valuable.
The platform also supports cross-discipline traceability — connecting systems-level safety requirements through to ML component performance requirements and down to dataset and evaluation artifacts — which is the traceability chain regulators will expect to see documented when they review an ML certification package.
Honest Assessment
DO-178C cannot be directly applied to ML components without supplemental means of compliance. The structural coverage objectives, determinism assumptions, and code-level traceability requirements do not have straightforward ML equivalents. Regulatory guidance from the FAA and EASA exists and is useful, but approved means of compliance are not yet finalized.
Programs integrating ML components today are not blocked, but they are operating with more regulatory uncertainty than programs using classical software architectures. The mitigation is to scope ML components precisely, document the verification strategy explicitly, engage certification authorities early, and maintain structured traceability that can support a project-specific certification case. The engineering work required to do that well is real, and it starts in requirements management before a model is ever trained.