Why Hardware Certification Programs Are Starting to Demand AI Tool Qualification

The conversation has shifted. Eighteen months ago, program managers at aerospace and automotive OEMs were asking whether they could use AI-assisted tools in certification workflows. Today, the question is what those tools must prove before they can stay.

The change is partly regulatory momentum and partly the consequence of early adoption. Programs that moved fast with AI-assisted requirements generation and traceability are now in detailed design or preliminary certification review—and their DERs, functional safety managers, and notified bodies are asking questions the tools weren’t designed to answer.

The standard at the center of this is DO-330, Software Tool Qualification Considerations, the tool qualification companion document to DO-178C. It was written for deterministic software tools. It is now being applied, formally or by analogy, to tools that contain probabilistic models. That application is not clean. But it is happening, and it is affecting procurement decisions on programs that have not yet started.


What DO-330 Actually Requires

DO-330 exists because a tool that automates, replaces, or reduces verification steps can introduce errors that the process no longer catches. The standard defines five Tool Qualification Levels, TQL-1 through TQL-5, with TQL-1 being the most rigorous.

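The level assigned depends on two factors: the software level of the software being certified (commonly referred to by its Development Assurance Level, or DAL), and what the tool actually does to the process, which DO-330 captures as one of three tool qualification criteria.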

A tool whose output flows into the certified product and could insert an error gets the highest scrutiny. If your requirements tool is generating derived requirements that flow into a safety-critical allocation, and those requirements are not being independently re-verified by a human, nothing downstream is positioned to catch a tool error. On a DAL-A or DAL-B program, that can trigger TQL-1 or TQL-2.
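The two-factor assignment can be sketched as a simple lookup. The table below follows the mapping as it is commonly summarized from DO-178C Table 12-1: Criterion 1 covers a tool whose output is part of the airborne software and could insert an error, Criterion 2 covers a verification tool whose output is used to justify eliminating or reducing other processes, and Criterion 3 covers a tool that could fail to detect an error. The function name and data layout are illustrative assumptions, and the values should be checked against the standard before any program relies on them.

```python
# Sketch only: mapping as commonly summarized from DO-178C Table 12-1.
# Verify against the published standard before using it on a program.
TQL_TABLE = {
    "A": {1: "TQL-1", 2: "TQL-4", 3: "TQL-5"},
    "B": {1: "TQL-2", 2: "TQL-4", 3: "TQL-5"},
    "C": {1: "TQL-3", 2: "TQL-5", 3: "TQL-5"},
    "D": {1: "TQL-4", 2: "TQL-5", 3: "TQL-5"},
}


def tool_qualification_level(software_level: str, criterion: int) -> str:
    """Return the TQL for a software level (A-D) and tool criterion (1-3)."""
    level = software_level.strip().upper()
    if level not in TQL_TABLE:
        raise ValueError(f"unknown software level: {software_level!r}")
    if criterion not in (1, 2, 3):
        raise ValueError(f"criterion must be 1, 2, or 3, got {criterion!r}")
    return TQL_TABLE[level][criterion]


# A requirements generator whose output is not independently verified
# (treated here as Criterion 1) on a Level A program:
print(tool_qualification_level("A", 1))
# The same tool with every output independently verified may be argued
# down to a verification-style criterion, or out of qualification entirely.
print(tool_qualification_level("A", 3))
```

The spread between the two results is the whole argument for keeping a human verification step in the loop: the same tool lands at opposite ends of the scale depending on what the process does with its output.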

TQL-1 qualification requires:

  • A Tool Operational Requirements document describing what the tool does
  • A Tool Development Plan and associated life cycle data
  • Evidence of testing sufficient to demonstrate the tool performs its stated function correctly
  • Configuration management and problem reporting records for the tool itself

For a COTS requirements tool like DOORS or Jama Connect—tools with deterministic, documented behavior—this qualification process is demanding but tractable. The tool does what the vendor documents. You can test it. You can produce evidence that it does it correctly.

For a tool that uses a large language model to suggest, generate, or evaluate requirements content, the tractability breaks down at several points.


Where AI Tools Collide With Qualification Logic

The DO-330 qualification framework assumes three things that large language models violate structurally.

Determinism. DO-330 qualification testing assumes you can run the same input, get the same output, and document that the output is correct. LLMs are not deterministic by default. Temperature settings, model version updates, and prompt context all affect outputs. Two runs of the same requirement through an AI traceability engine can produce different link suggestions. Qualification test suites depend on repeatability.
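A minimal sketch of the repeatability check a qualification test suite would need is shown here. Both engines are hypothetical stand-ins, not any vendor's API: one always returns the same link suggestions, the other alternates between two answers to mimic run-to-run variation.

```python
import hashlib
import itertools
from typing import Callable, Iterable


def output_digest(links: Iterable[str]) -> str:
    """Order-independent digest of a set of suggested link IDs."""
    canonical = "\n".join(sorted(set(links)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_repeatable(suggest: Callable[[str], Iterable[str]],
                  requirement: str, runs: int = 5) -> bool:
    """True if every run on the same input yields the same link set."""
    digests = {output_digest(suggest(requirement)) for _ in range(runs)}
    return len(digests) == 1


# Hypothetical stand-in for a deterministic traceability engine.
def stable_engine(requirement: str) -> list:
    return ["SYS-12", "SYS-40"]


# Hypothetical stand-in for an engine whose output drifts between runs.
_call_counter = itertools.count()


def drifting_engine(requirement: str) -> list:
    return ["SYS-12", "SYS-40"] if next(_call_counter) % 2 == 0 else ["SYS-12"]


text = "The pump shall shut down within 50 ms of an overpressure signal."
print(is_repeatable(stable_engine, text))
print(is_repeatable(drifting_engine, text))
```

A check like this only shows that outputs were stable across the runs sampled. It says nothing about whether the stable output is correct, which is a separate test obligation.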

Explainability. The Tool Operational Requirements document must describe what the tool does with sufficient precision that a test can confirm it. “The model identifies likely traceability links based on semantic similarity” is not an operational requirement—it is a capability description. An operational requirement specifies conditions, inputs, and outputs with enough precision to make a pass/fail determination. For most current LLM-based tools, that specification either doesn’t exist or can’t be made accurate without misrepresenting how the model works.

Configuration stability. DO-330 requires that the tool under qualification be configuration-controlled. When an AI vendor updates the underlying model—even a minor version update—the qualification evidence for the previous configuration may no longer apply. For tools built on top of third-party foundation models, configuration control of the model itself is often not available to the tool vendor, let alone the end user.
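One way to make that dependency explicit is to fingerprint everything that can change the tool's behavior and tie qualification evidence to that fingerprint. The sketch below assumes four configuration fields. The field names and values are hypothetical, and a real tool would likely need more of them.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ToolConfiguration:
    """Hypothetical set of items that define the qualified configuration."""
    tool_version: str
    model_id: str
    prompt_template_sha256: str
    temperature: float

    def fingerprint(self) -> str:
        # Canonical JSON so the digest does not depend on field order.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def evidence_still_applies(qualified: ToolConfiguration,
                           deployed: ToolConfiguration) -> bool:
    """Qualification evidence applies only to the exact qualified configuration."""
    return qualified.fingerprint() == deployed.fingerprint()


qualified = ToolConfiguration(
    tool_version="4.2.0",
    model_id="vendor-model-2024-05",  # illustrative identifier
    prompt_template_sha256="ab12cd34",  # illustrative digest
    temperature=0.0,
)
after_update = replace(qualified, model_id="vendor-model-2024-09")

print(evidence_still_applies(qualified, qualified))
print(evidence_still_applies(qualified, after_update))
```

The check is only as good as the inputs. If the tool vendor cannot see or pin the underlying model version, the `model_id` field cannot be populated honestly, and that gap is itself a qualification finding.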

None of these problems is fatal. But they require honesty about what the tool is actually doing in the workflow, and they require system-level mitigations if the tool is performing functions that affect certification evidence.


How the Industry Is Actually Responding

The response across aerospace, automotive, and medical devices is not uniform. It is splitting into three patterns.

Pattern 1: AI tools are placed strictly outside the certification boundary. The tool is used for exploration, early drafting, and internal review. Nothing the AI generates or evaluates is treated as certification-grade output without human re-generation or independent verification. The tool is classified as a development aid whose output is always verified downstream, so it carries minimal qualification evidence obligations. This is the most common current approach on active programs. It is also the most conservative and the most likely to leave performance gains on the table.

Pattern 2: Program offices are writing AI tool qualification criteria into SOWs and supplier requirements. This is the pattern that is changing procurement. Primes and Tier 1s with established qualification processes are starting to require that AI tool vendors provide Tool Operational Requirements documentation, version-specific qualification test results, and a configuration control plan before a tool can be used on activities that touch certification evidence. Several automotive OEMs with ISO 26262 programs and at least two major aerospace primes have added AI tool qualification clauses to their supplier quality agreements in the past twelve months.

Pattern 3: Tool vendors are beginning to differentiate on qualification transparency. This is the most significant long-term development. Vendors who can produce structured documentation of what their AI features do—and, critically, what they don’t do—are gaining traction on programs where qualification scrutiny is high. Vendors who describe their AI capabilities only in marketing terms are being asked harder questions in pre-award technical evaluations.


The Transparency Dimension That Procurement Is Starting to Score

There is a concept in qualification discussions that is underappreciated in vendor marketing but heavily weighted by safety managers: the mitigation envelope. If a tool can clearly document the conditions under which its AI outputs are reliable, the conditions under which they are not, and the human review steps required to make those outputs certification-grade, then the tool can be qualified—or excluded from the certification boundary in a disciplined way—regardless of whether the AI behavior itself is fully deterministic.

This is not a workaround. It is the correct application of DO-330 logic to tools that augment human judgment rather than replace it.

The tools that are failing qualification reviews are not failing because they contain AI. They are failing because their vendors cannot answer the following questions with documented specificity:

  1. What inputs does the AI component accept, and what are the preconditions for valid operation?
  2. What outputs does it produce, and what is the defined format and accuracy envelope for those outputs?
  3. What actions does the tool prevent the AI from taking without explicit engineer confirmation?
  4. How is the AI component version-controlled, and what is the process for re-qualification after a model update?
  5. Where in the workflow does the tool draw a hard line between AI suggestion and engineer decision?

Tools built on graph-based, structured data models have a natural advantage here over tools built on document-based or unstructured representations. A tool that tracks requirements, attributes, links, and change history as explicit data objects can define precisely which of those objects the AI touches and which it doesn’t. The qualification envelope can be drawn around specific operations on specific data types. That is not possible in a tool where requirements exist as formatted text in a word-processor-style interface with AI features layered on top.


Where Flow Engineering Fits This Picture

Flow Engineering is one of the tools that has built its architecture around the distinction between AI-suggested and engineer-confirmed. Its graph-native data model means that every requirement, every attribute, and every traceability link is a structured object with an owner and a history. When an AI feature proposes a link or flags a gap, that proposal is a distinct data state from a confirmed link. The engineer action that confirms it is logged separately.
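The general pattern of keeping proposal and confirmation as separate recorded states can be sketched as follows. This is an illustration of the idea only, not Flow Engineering's actual schema. Every class, field, and identifier here is a hypothetical assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional, Tuple


class Origin(Enum):
    AI = "ai"
    ENGINEER = "engineer"


class State(Enum):
    PROPOSED = "proposed"
    CONFIRMED = "confirmed"


@dataclass
class TraceLink:
    source_id: str
    target_id: str
    origin: Origin                      # who proposed the link; never rewritten
    state: State = State.PROPOSED
    confirmed_by: Optional[str] = None
    confirmed_at: Optional[datetime] = None
    history: List[Tuple[str, str, datetime]] = field(default_factory=list)

    def confirm(self, engineer: str) -> None:
        """Record an engineer's confirmation as its own logged event."""
        if not engineer:
            raise ValueError("confirmation requires a named engineer")
        if self.state is State.CONFIRMED:
            raise ValueError("link is already confirmed")
        now = datetime.now(timezone.utc)
        self.state = State.CONFIRMED
        self.confirmed_by = engineer
        self.confirmed_at = now
        self.history.append(("confirmed", engineer, now))


link = TraceLink("REQ-101", "TEST-9", origin=Origin.AI)
link.confirm("j.doe")
print(link.origin.value, link.state.value, link.confirmed_by)
```

The design choice that matters is that `origin` survives confirmation. An auditor can still see that the link began as an AI proposal after an engineer has accepted it.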

That architectural discipline is not incidental to qualification. It is what makes it possible to answer questions 1 through 5 above with precision rather than approximation. When a safety manager asks “did the AI create this traceability link or did an engineer create it?”—the answer is in the data model, not in a process description that someone has to reconstruct from memory.

Flow Engineering is also explicit about where its AI does and doesn’t operate. It does not use AI for output generation in contexts where the output would be treated as certification-grade without review. The AI assists with gap detection, consistency checking, and suggesting candidate links. The confirmation of those suggestions—and the recording of that confirmation—is a required human step. That design choice limits some of the speed gains that more aggressive AI automation would produce. It also means the tool can be placed in certification-proximate workflows without requiring the program to treat every AI suggestion as an unqualified tool output.

This is the tradeoff the qualification-aware programs are making: some capability ceiling in exchange for a clear, auditable boundary between machine suggestion and human judgment.


The Honest Assessment for Program Managers

AI tool qualification is not a barrier that regulators are erecting to slow adoption. It is the same question they have always asked about every tool that touches certification evidence: can you tell me what it does, show me that it does it correctly, and demonstrate that you would catch it if it didn’t?

The programs that are running into trouble are the ones where AI tools were adopted quickly on the assumption that the qualification question could be answered later. It cannot be answered later if the tool vendor doesn’t have the documentation and the program doesn’t have the workflow controls to produce it.

The practical guidance is straightforward:

Before procurement: Require that vendors provide, not just describe, a Tool Operational Requirements document for their AI features. Ask specifically how model updates are handled with respect to prior qualification evidence. Ask where the tool draws the line between AI output and engineer decision.

During program setup: Classify AI tool activities explicitly against DO-330 or the equivalent in your standard. ISO 26262-8 has analogous requirements under its tool confidence level (TCL) framework, and medical device programs working to IEC 62304 face comparable tool validation expectations. Don’t defer this classification to the first certification audit.
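For the ISO 26262 case, the classification can be sketched as below. It follows the tool confidence level determination in ISO 26262-8 as it is commonly summarized: tool impact TI1 means a malfunction cannot introduce or mask an error in the safety-related item, TI2 means it can, and tool error detection runs from TD1 (high confidence the malfunction would be prevented or detected) to TD3 (low confidence). The function name is an assumption, and the mapping should be confirmed against the standard.

```python
def tool_confidence_level(tool_impact: int, tool_error_detection: int) -> str:
    """Return TCL1-TCL3 from tool impact (1-2) and tool error detection (1-3).

    Sketch of the ISO 26262-8 determination as commonly summarized;
    confirm against the published standard.
    """
    if tool_impact not in (1, 2):
        raise ValueError("tool_impact must be 1 (TI1) or 2 (TI2)")
    if tool_error_detection not in (1, 2, 3):
        raise ValueError("tool_error_detection must be 1, 2, or 3")
    if tool_impact == 1:
        # No way for a malfunction to affect the item: lowest level.
        return "TCL1"
    return {1: "TCL1", 2: "TCL2", 3: "TCL3"}[tool_error_detection]


# AI link suggestions that an engineer must confirm before use
# (argued here as high detection confidence):
print(tool_confidence_level(2, 1))
# AI-generated content that reaches the work product unreviewed:
print(tool_confidence_level(2, 3))
```

As with DO-330, the recorded human review step is what moves the tool from the demanding end of the scale to the manageable one.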

In workflow design: Build the human confirmation step into the process, not as a checkbox but as a recorded action in the tool. The tool’s own data should be the evidence that the engineer reviewed and confirmed every AI suggestion that became certification-grade output.
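A recorded confirmation is only useful if someone checks for its absence. A minimal sketch of that audit is shown here, assuming the tool can export baseline items as records with `id`, `origin`, and `confirmed_by` fields. Those field names are hypothetical.

```python
from typing import Iterable, List, Mapping


def unconfirmed_ai_items(baseline: Iterable[Mapping]) -> List[str]:
    """IDs of AI-originated baseline items with no recorded confirmation."""
    return [
        item["id"]
        for item in baseline
        if item.get("origin") == "ai" and not item.get("confirmed_by")
    ]


# Illustrative export of a certification baseline.
baseline = [
    {"id": "LINK-1", "origin": "engineer", "confirmed_by": None},
    {"id": "LINK-2", "origin": "ai", "confirmed_by": "j.doe"},
    {"id": "LINK-3", "origin": "ai", "confirmed_by": None},
]

print(unconfirmed_ai_items(baseline))
```

An empty result before each baseline release is the kind of evidence a reviewer can accept without reconstructing the process from memory.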

The regulators are not asking for perfection. They are asking for honesty and traceability—the same things they have always asked for. The AI tools that will survive procurement scrutiny on safety-critical programs are the ones whose vendors understood that before the procurement, not after it.