How the FDA’s AI/ML Action Plan Is Changing Medical Device Requirements Practices

The FDA’s 2021 AI/ML Action Plan was not a surprise to anyone who had been watching the agency’s thinking evolve since 2019. But surprise and preparedness are different things. Most medical device companies, even sophisticated ones with mature quality management systems, built their requirements practices around a stable assumption: that a cleared device stays cleared. Software updates go through a defined change process, risk assessments get updated, and the cycle repeats. That model breaks down completely when the device’s core logic is an adaptive algorithm that improves — or drifts — on its own.

The requirements engineering consequences of this shift are still being worked out in real programs, in real organizations, under real regulatory timelines. This article examines what has actually changed, what the standards demand, and where the tools most teams are using today fall short.

The PCCP Problem Is Fundamentally a Requirements Problem

The Predetermined Change Control Plan is the FDA’s mechanism for allowing AI/ML-based Software as a Medical Device (SaMD) to evolve without a new premarket submission for every change. The agency issued its final guidance on PCCPs in December 2024. The core logic is straightforward: if you can specify in advance what will change, under what conditions, within what boundaries, and how you’ll know if it’s safe, the FDA can authorize that envelope up front rather than reviewing each instance.

That description makes a PCCP sound like a configuration document. It is not. A PCCP is a forward-looking requirements artifact. It specifies:

Modification scope: What aspects of the algorithm are permitted to change (architecture, weights, pre/post-processing, intended use population)
Performance boundaries: Quantified thresholds that define the boundary between a permitted change and a change that triggers a new submission
Methodology commitments: How changes will be validated, what data will be used, and how that process maintains device safety
Impact assessment protocols: How the manufacturer will determine whether a change has affected clinical performance in ways that matter to patients

Every one of these elements is a requirements artifact. Modification scope is a constraint on the design space. Performance boundaries are acceptance criteria. Methodology commitments are process requirements. Impact assessment protocols are system requirements on the monitoring and validation infrastructure.
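One way to see the point is to hold the four elements as structured data instead of prose. The sketch below is illustrative only: the class names, field names, identifiers, and the sensitivity figure are assumptions made for the example, not terms or values taken from the FDA guidance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PerformanceBoundary:
    """Acceptance criterion: a metric must stay at or above a floor."""
    metric: str
    minimum: float


@dataclass(frozen=True)
class PCCP:
    """Hypothetical in-memory form of a Predetermined Change Control Plan."""
    permitted_modifications: frozenset[str]      # modification scope
    boundaries: tuple[PerformanceBoundary, ...]  # performance boundaries
    validation_protocol_id: str                  # methodology commitment
    impact_assessment_id: str                    # impact assessment protocol


@dataclass
class ProposedChange:
    modification_type: str
    measured: dict[str, float] = field(default_factory=dict)


def assess(plan: PCCP, change: ProposedChange) -> list[str]:
    """Return the reasons a change falls outside the plan (empty list = inside)."""
    reasons = []
    if change.modification_type not in plan.permitted_modifications:
        reasons.append(f"modification type '{change.modification_type}' is out of scope")
    for boundary in plan.boundaries:
        value = change.measured.get(boundary.metric)
        if value is None:
            reasons.append(f"no measurement for '{boundary.metric}'")
        elif value < boundary.minimum:
            reasons.append(f"{boundary.metric}={value} is below the floor {boundary.minimum}")
    return reasons


# Example values are invented for illustration.
plan = PCCP(
    permitted_modifications=frozenset({"retrain_weights"}),
    boundaries=(PerformanceBoundary("sensitivity", 0.90),),
    validation_protocol_id="VP-001",
    impact_assessment_id="IA-001",
)
print(assess(plan, ProposedChange("retrain_weights", {"sensitivity": 0.93})))
```

The useful property is that scope and acceptance criteria become things a program can evaluate for every proposed update, which is what separates a requirements artifact from a narrative description.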

The problem is that most requirements management tools — IBM DOORS, DOORS Next, Jama Connect, Polarion, Codebeamer — were designed to manage requirements for a fixed design. They handle change through versioning: you capture what the requirement said, you capture what it says now, and you trace those versions through impact analysis. That is a perfectly coherent model for hardware and traditional software.

It is the wrong model for a PCCP. A PCCP does not describe how one version of a requirement transitions to another. It describes a space of permitted futures, and it requires the tool chain to maintain the relationship between that space and the current algorithm state at all times. That is a fundamentally different data structure. It is a graph, not a document.

What IEC 62304 and IEC 82304-1 Actually Require

IEC 62304 is the foundational standard for medical device software lifecycle processes. IEC 82304-1 extends that framework to health software products, including software that runs on general-purpose hardware. Together, they establish the regulatory floor for AI/ML SaMD development.

IEC 62304 is often described as a software process standard, which is accurate but incomplete. It is a traceability standard. Its core requirement — that every software item can be traced to a system requirement, and every system requirement to a safety classification — means that requirements are not just documentation. They are the structural spine of your entire risk and safety architecture.

For AI/ML components, this creates specific pressures:

Architectural decomposition: IEC 62304 requires that you decompose software into units with defined interfaces and behaviors. An ML model trained end-to-end does not decompose naturally. Its behavior emerges from the training data and optimization process, not from explicitly coded logic. Teams must therefore write requirements at the boundary of the model — what goes in, what comes out, what the acceptable output space is — rather than for the model’s internal behavior. This is a legitimate approach, but it requires requirements tooling that can represent those boundaries precisely and link them to risk controls.

Safety classification propagation: Under IEC 62304, an AI/ML component that can contribute to a hazardous situation is Class C software if the resulting harm could be death or serious injury (Class B if only non-serious injury is possible). A Class C classification propagates specific process requirements: full software development lifecycle documentation, rigorous unit and integration testing, and traceability between requirements and the tests that verify them. For adaptive algorithms, the question of which version of the algorithm is being tested becomes structurally important: your tool chain must be able to answer it unambiguously.

Change management with SOUP: AI/ML frameworks such as TensorFlow, PyTorch, and scikit-learn are Software of Unknown Provenance (SOUP) under IEC 62304. SOUP management, including evaluation of known anomalies and verification that each component is fit for its intended use, must be integrated into the requirements trace. Many teams track SOUP in a separate spreadsheet, which leaves a traceability gap.
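Bringing SOUP into the trace can start with something very small. The following is a minimal sketch, assuming each SOUP item is recorded with a pinned version, the requirements that depend on it, and a flag for whether its published anomaly list has been reviewed. The record layout, requirement IDs, and version strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SoupItem:
    name: str
    version: str                                              # exact version pinned in the build
    requirement_ids: list[str] = field(default_factory=list)  # what the device needs from it
    anomalies_reviewed: bool = False                          # anomaly list evaluated for this version


def soup_gaps(items):
    """Map each SOUP item name to the trace gaps found for it."""
    gaps = {}
    for item in items:
        problems = []
        if not item.requirement_ids:
            problems.append("no linked requirements")
        if not item.anomalies_reviewed:
            problems.append("known anomalies not reviewed")
        if problems:
            gaps[item.name] = problems
    return gaps


# Illustrative inventory; names are real frameworks, everything else is invented.
inventory = [
    SoupItem("PyTorch", "2.3.1", ["REQ-INF-004"], anomalies_reviewed=True),
    SoupItem("scikit-learn", "1.5.0"),
]
print(soup_gaps(inventory))
```

A check like this is only as good as the inventory behind it, but it turns "SOUP is in a spreadsheet somewhere" into a gap list that can be reviewed at each release.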

IEC 82304-1 adds a requirements layer that 62304 does not explicitly address: the intended purpose must be specified with enough precision that misuse can be identified and its risk assessed. For AI/ML SaMD, intended purpose includes the training data distribution, the clinical context, the target population, and the expected operating environment. These are not one-line statements. They are multi-attribute specifications that must remain linked to algorithm performance requirements and post-market surveillance triggers throughout the product lifecycle.

Post-Market Surveillance as a Requirements Input

This is the piece that most tool chains are entirely unprepared for.

Traditional post-market surveillance (PMS) in medical devices is a compliance output. You collect data, you analyze it, you update your risk file if something unexpected appears, and you file periodic safety update reports. The requirements process and the PMS process live in separate systems, often managed by separate teams, connected only through the periodic change control cycle.

The FDA’s AI/ML framework breaks this architecture. Under the PCCP model, post-market performance data is not just evidence that the current device is safe — it is an input to the ongoing requirements process. Specifically:

Real-world performance data defines whether the algorithm is still operating within its approved performance boundaries
Drift detection results can trigger requirement reviews (or submissions, if boundaries are exceeded)
Population distribution shifts in real-world data may require updates to intended use specifications
Adverse events involving AI outputs must be traceable to specific algorithm behaviors, which requires linking the event record to the requirements and validation artifacts for that algorithm version
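The last item depends on being able to say exactly which algorithm version produced an output. One option is to key verification evidence to a content hash of the deployed model artifact instead of a human-assigned label. This is a sketch under that assumption; the record fields and the `REQ-`/`TC-` identifiers are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(model_bytes: bytes) -> str:
    """SHA-256 digest of the serialized model; any change to the bytes changes it."""
    return hashlib.sha256(model_bytes).hexdigest()


@dataclass(frozen=True)
class VerificationRecord:
    requirement_id: str
    test_case_id: str
    model_fingerprint: str
    passed: bool


def evidence_for(model_fingerprint, records):
    """All verification records produced against this exact model artifact."""
    return [r for r in records if r.model_fingerprint == model_fingerprint]


def unverified_requirements(requirement_ids, records, model_fingerprint):
    """Requirements with no passing test recorded against this exact model."""
    covered = {
        r.requirement_id
        for r in evidence_for(model_fingerprint, records)
        if r.passed
    }
    return sorted(set(requirement_ids) - covered)
```

If each adverse event record carries the fingerprint of the model that produced the output, `evidence_for` returns the validation artifacts for that version, and `unverified_requirements` shows what was never demonstrated for it.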

This feedback loop does not exist in most requirements tools because no one designed it in. DOORS and DOORS Next are excellent at managing large, stable requirement hierarchies. They are not designed to ingest structured performance signals from monitoring systems and surface them as pending requirement reviews.
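At its simplest, surfacing a performance signal as a pending requirement review is a comparison against the approved floor plus an early-warning band. The sketch below assumes a metric where higher is better; the function name, the three-way outcome, and the margin parameter are illustrative choices, not part of any tool's API or of the guidance.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    REVIEW = "open requirement review"
    ESCALATE = "outside PCCP boundary: assess need for a new submission"


def classify(observed: float, floor: float, warning_margin: float) -> Action:
    """Compare a monitored metric with its PCCP floor.

    floor          -- lowest value the plan permits
    warning_margin -- distance above the floor at which a review is opened
    """
    if observed < floor:
        return Action.ESCALATE
    if observed < floor + warning_margin:
        return Action.REVIEW
    return Action.NONE


# Invented numbers: floor of 0.90 with a 0.02 early-warning band.
for value in (0.95, 0.91, 0.89):
    print(value, classify(value, floor=0.90, warning_margin=0.02).value)
```

The logic is trivial; the hard part the article describes is that the result has nowhere to land unless the requirements tool can accept it as an event.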

Jama Connect has made meaningful progress on this front through its review-center workflows and integration APIs. But the integrations are custom and maintenance-heavy, and they still treat PMS data as an external artifact to be manually linked, not as a live requirements input.

Algorithm Transparency: A New Class of Requirement

The FDA’s framework — and the parallel EU AI Act requirements for high-risk medical AI — introduces what practitioners are now calling “transparency requirements”: specifications about how an algorithm must be able to explain its outputs to clinicians and regulators.

These are not documentation requirements. They are functional requirements on the device. The algorithm must, in defined circumstances, be able to produce an explanation of why it produced a given output, what features drove that output, and what the confidence level is. For clinical users, these explanations must be legible and actionable. For regulators, they must be auditable.

Writing these requirements is hard because the field of explainable AI (XAI) does not yet have standardized performance metrics in the way that diagnostic accuracy has sensitivity and specificity. Teams are writing requirements like:

“The system shall provide a saliency map highlighting the image regions that contributed to a positive finding, with a minimum intersection-over-union of 0.65 against clinician-annotated regions of interest on the validation dataset”
“For any output with confidence below 0.72, the system shall display a structured uncertainty statement including the top three contributing factors to reduced confidence”
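To show what verifying the first example involves, here is a minimal intersection-over-union check in plain Python. Several things are assumptions of the sketch and would need to be pinned down in the requirement itself: the saliency map is binarized at a fixed cutoff (0.5 here), masks are nested lists, and "minimum IoU" is read as a floor on the mean over the dataset.

```python
def binarize(saliency, cutoff):
    """Turn a saliency map (rows of floats) into a 0/1 mask."""
    return [[1 if value >= cutoff else 0 for value in row] for row in saliency]


def iou(mask_a, mask_b):
    """Intersection-over-union of two equally sized binary masks."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as perfect agreement (a convention chosen here).
    return intersection / union if union else 1.0


def mean_iou(saliency_maps, annotations, cutoff=0.5):
    """Average IoU between binarized saliency maps and clinician annotations."""
    scores = [iou(binarize(s, cutoff), a) for s, a in zip(saliency_maps, annotations)]
    return sum(scores) / len(scores)


def meets_requirement(saliency_maps, annotations, minimum=0.65):
    return mean_iou(saliency_maps, annotations) >= minimum
```

Even this small example exposes parameters (cutoff, aggregation, dataset version) that the one-sentence requirement leaves implicit and that a test protocol has to fix.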

These requirements have a different structure than classical functional requirements. They reference statistical thresholds, they link to dataset artifacts, and they have test methods that involve human evaluation panels. Maintaining these in a flat-list requirements tool — where every requirement is essentially a paragraph of text with an ID and a status — creates serious maintenance problems. When the threshold changes from 0.65 to 0.70 because your validation dataset expanded, you need to propagate that change across the test protocol, the risk assessment, the PCCP, and the clinical evaluation report. That propagation is a graph traversal. In a document-based system, it is a manual hunt.
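The traversal itself is ordinary breadth-first search once the links exist as data. A sketch, with an invented trace graph in which an edge from A to B means "B depends on A"; all artifact names are hypothetical.

```python
from collections import deque

# Hypothetical trace links: key -> artifacts that depend on it.
TRACE = {
    "REQ-XAI-01": ["TEST-PROTOCOL-07", "RISK-CONTROL-12", "PCCP-BOUNDARY-03"],
    "TEST-PROTOCOL-07": ["CLINICAL-EVAL-REPORT"],
    "RISK-CONTROL-12": ["CLINICAL-EVAL-REPORT"],
    "PCCP-BOUNDARY-03": [],
    "CLINICAL-EVAL-REPORT": [],
}


def impacted(graph, start):
    """Breadth-first traversal returning every artifact downstream of `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


print(impacted(TRACE, "REQ-XAI-01"))
# ['TEST-PROTOCOL-07', 'RISK-CONTROL-12', 'PCCP-BOUNDARY-03', 'CLINICAL-EVAL-REPORT']
```

The clinical evaluation report is reached through two paths but listed once, which is the behavior a manual hunt through documents tends to get wrong in one direction or the other.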

Where the Existing Tool Chains Fall Short

To be fair to the established players: IBM DOORS has been managing safety-critical requirements in aerospace and defense for decades. It has sophisticated change impact analysis, proven integration with downstream verification tools, and a large installed base with mature processes built around it. The same is true of Polarion, which has particularly strong integration with the Siemens digital engineering stack, and of Codebeamer, which has made real investments in pharmaceutical and medical device workflows.

The gap is not competence. The gap is architecture. These tools were built when requirements were the start of a linear development process. You write requirements, you design to them, you test against them, you release, you maintain. The AI/ML SaMD lifecycle is not linear. It is iterative at the algorithm level and continuous at the monitoring level, and the requirements process must participate in both loops simultaneously.

Specifically, the gaps that matter for FDA AI/ML compliance are:

No native model for requirement spaces: PCCPs require expressing a permissible range of future states, not just current specifications. None of the major legacy tools have a native construct for this.

Weak graph semantics: The relationships between requirements, risk controls, algorithm behaviors, test datasets, and monitoring triggers are graph-structured. Document-based tools approximate this with custom attributes and external spreadsheets, which breaks down at scale.

No real-time external data integration: Connecting PMS signals to live requirement status requires event-driven integration, but most legacy tools offer only batch import APIs for external data.

Collaboration models designed for annual release cycles: AI/ML SaMD teams may update model weights continuously, with each update requiring at least a lightweight impact assessment against the PCCP boundaries. The review and approval workflows in most tools are designed for quarterly or annual change cycles, not continuous integration pipelines.

How Modern Platforms Are Addressing This

Teams that have moved to graph-based, AI-native requirements platforms are finding that the PCCP workflow becomes significantly more tractable. When requirements are nodes in a connected graph — linked directly to risk items, test cases, algorithm versions, and monitoring thresholds — the impact assessment for a proposed change is a query, not a manual review exercise.

Flow Engineering, for example, structures requirements as a connected model rather than a document hierarchy. When a team defines PCCP boundaries, those boundaries live as explicit relationships in the model: this performance threshold is the boundary condition for this set of algorithm requirements, which trace to these clinical safety controls. When post-market data signals that performance is approaching a boundary, the team can immediately surface which requirements are relevant and what the downstream implications are. That is a structural advantage for regulatory compliance, not a workflow nicety.

The practical benefit shows up in QMS audit preparation. Instead of assembling a traceability matrix manually from multiple systems before a submission or inspection, the team queries the model. The connections are already there, maintained continuously, not reconstructed for the occasion.

Practical Implications for Teams Now

If you are running a medical device program with an AI/ML component — whether you are in active development, in submission, or post-clearance — the following adjustments to your requirements practice are worth making now, regardless of your current tool chain:

1. Treat your PCCP as a living requirements document, not a static submission artifact. It should be under change control, linked to your algorithm versioning system, and reviewed whenever real-world performance data triggers an alert.

2. Write transparency requirements explicitly. Do not defer them to the validation phase. They need to be specified, reviewed, and traced to risk controls before you begin model training, because they constrain your architecture choices.

3. Define the PMS-to-requirements interface as a system design problem. What data will flow from your monitoring system to your requirements process, in what format, on what trigger? If you cannot answer this question, your PCCP is incomplete, whatever it looks like on paper.

4. Audit your SOUP traceability. If your AI/ML framework dependencies are not in your requirements trace with explicit verification of intended use and known anomaly management, you have a 62304 gap that will surface in inspection.

5. Evaluate whether your current tool chain can represent requirement spaces. If it cannot, the PCCP boundary conditions you are writing into Word documents will require manual verification for every algorithm update. That is not a sustainable process at any meaningful update frequency.
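For item 3, the interface can begin as a single typed message that the monitoring system emits and the requirements process refuses to accept if it is incomplete. The sketch below assumes a JSON payload; every field name is a hypothetical choice for illustration, not a schema from any standard or product.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"metric", "value", "model_fingerprint", "window_end", "requirement_ids"}


@dataclass(frozen=True)
class MonitoringEvent:
    metric: str
    value: float
    model_fingerprint: str            # identifies the exact algorithm version observed
    window_end: str                   # ISO 8601 timestamp closing the observation window
    requirement_ids: tuple[str, ...]  # requirements this metric is evidence for


def parse_event(payload: str) -> MonitoringEvent:
    """Parse a JSON monitoring message, rejecting it if any required field is absent."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"monitoring event is missing fields: {sorted(missing)}")
    return MonitoringEvent(
        metric=data["metric"],
        value=float(data["value"]),
        model_fingerprint=data["model_fingerprint"],
        window_end=data["window_end"],
        requirement_ids=tuple(data["requirement_ids"]),
    )
```

Writing the message down this way answers the three questions in item 3 directly: the fields are the data, JSON is the format, and whatever calls `parse_event` is the trigger.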

Honest Assessment

The FDA’s AI/ML framework is not finished. The PCCP guidance is final, but agency thinking on continuous learning, international harmonization with EU MDR and the AI Act, and real-world performance requirements continues to evolve. Teams that build rigid compliance processes around today’s specific requirements are likely to spend the next several years rebuilding them.

The more durable investment is in requirements infrastructure that can accommodate change: graph-based models over flat documents, connected traceability over assembled matrices, AI-native tooling over AI-added-on features. The teams that are navigating this most effectively are not the ones who have perfectly mapped today’s regulatory requirements into their existing tools. They are the ones who have built requirements processes that can evolve as the regulatory environment does — because in AI/ML SaMD, that is not a future consideration. It is the current condition.
