How Medical Device Companies Are Navigating the FDA’s Evolving AI/ML Guidance

The FDA did not write its original software guidance with a device that learns in mind. The 510(k) framework assumes the thing you submit is the thing that ships, indefinitely. A convolutional neural network retrained on new patient cohorts every six months is not that thing. Neither is a clinical decision support algorithm that adjusts its confidence thresholds based on site-specific performance data.

The agency has been working to close that gap since its 2019 discussion paper on AI/ML-based Software as a Medical Device (SaMD), and the pace has accelerated. The 2023 Marketing Submission Recommendations for AI/ML-Enabled Devices, the final guidance on Predetermined Change Control Plans (PCCPs), and the agency’s ongoing Total Product Lifecycle (TPLC) publications have collectively sketched a new regulatory architecture. For systems engineers, that architecture creates requirements problems that existing tooling and existing processes were not built to solve.

What the FDA Is Actually Requiring

The core regulatory innovation in the PCCP framework is conceptually simple: if your device’s algorithm will change after market clearance, you must describe those changes in advance, specify the limits of acceptable change, and define the performance monitoring that will confirm the device remains safe and effective after each update. The FDA is not saying you cannot have an adaptive AI. It is saying that adaptation must itself be pre-specified.

This transforms what would otherwise be a post-market change requiring a new submission into a pre-authorized modification pathway—but only if the engineering team has done the upstream work of bounding the change space. In practice, a PCCP submission requires three components:

Description of planned changes. Not vague intentions (“the model may be retrained”) but specific modification types: retraining on expanded datasets, architectural modifications within specified parameter ranges, threshold adjustments within defined statistical bounds.

Methodology for implementing changes. The training pipeline, validation protocols, and testing procedures that will govern each modification type. This is essentially a requirements specification for your MLOps process, written in language a regulatory reviewer can evaluate.

Impact assessment. A structured analysis of how each planned change type affects device safety and effectiveness, with the implicit requirement that changes outside the assessed envelope trigger a new submission.
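The structure of those three components lends itself to a machine-readable representation that engineering can enforce. A minimal sketch in Python — the schema, SOP identifiers, and field names are illustrative, not an FDA format:

```python
from dataclasses import dataclass

@dataclass
class PlannedChange:
    """One pre-specified modification type in a PCCP (illustrative schema)."""
    change_type: str        # e.g. "retrain_expanded_dataset"
    method: str             # pointer to the governing protocol / SOP
    bounds: dict            # {"floors": {...}, "caps": {...}}: limits of acceptable change
    impact_assessment: str  # pointer to the safety/effectiveness analysis

PCCP = [
    PlannedChange("retrain_expanded_dataset", "SOP-ML-012 retraining pipeline",
                  {"floors": {"sensitivity": 0.92, "specificity": 0.87}}, "RA-007"),
    PlannedChange("threshold_adjustment", "SOP-ML-015 threshold tuning protocol",
                  {"caps": {"threshold_delta": 0.05}}, "RA-008"),
]

def within_envelope(change_type: str, metrics: dict) -> bool:
    """True only if a proposed change matches a pre-authorized type and stays
    inside its bounds; anything else requires a new submission."""
    for pc in PCCP:
        if pc.change_type != change_type:
            continue
        floors_ok = all(metrics.get(m, float("-inf")) >= v
                        for m, v in pc.bounds.get("floors", {}).items())
        caps_ok = all(metrics.get(m, float("inf")) <= v
                      for m, v in pc.bounds.get("caps", {}).items())
        return floors_ok and caps_ok
    return False  # unplanned change type: outside the PCCP by definition
```

Note that an unplanned change type returns False by construction, which mirrors the regulatory default: anything not pre-specified is a new submission.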

The Total Product Lifecycle framework adds the longitudinal dimension. TPLC treats the device not as a static artifact that exits the regulatory process at clearance but as a system whose safety and effectiveness must be demonstrated continuously. Real-world performance data feeds back into the pre-market evidence base. Post-market surveillance becomes a technical input to future PCCP updates, not just an adverse event reporting obligation.

The IEC 62304 Tension Is Real

IEC 62304 remains the applicable standard for medical device software, and its lifecycle model was designed around a world where software versions are discrete, change is deliberate, and the change management process produces a new, identifiable release. That architecture handles traditional embedded software well. It does not map cleanly onto a system where the model weights—which are functionally part of the software—change on a schedule determined by incoming data rather than by a release engineering team.

The standard’s Clause 6, covering software maintenance, requires that changes be analyzed for potential safety impact, implemented through a controlled process, and verified. None of that is wrong. The problem is granularity. When a neural network is retrained, the “change” is a new weight matrix that may contain millions of parameters. Conventional change impact analysis, which assumes humans can read and reason about what changed, does not apply directly. The impact must be inferred from behavioral testing across a validated test set—which is a different epistemological approach to change verification than 62304 was written to support.
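That behavioral approach can be made concrete: instead of diffing weights, compare the old and new model versions case by case on the validated test set and report what flipped. A sketch under assumed interfaces (the function names are hypothetical):

```python
def behavioral_change_report(old_predict, new_predict, test_cases, labels):
    """Infer the impact of a retrain from behavior on a fixed, validated test
    set, since reading a diff of millions of weights is not meaningful."""
    old_out = [old_predict(x) for x in test_cases]
    new_out = [new_predict(x) for x in test_cases]
    flipped = [i for i, (a, b) in enumerate(zip(old_out, new_out)) if a != b]

    def accuracy(preds):
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    return {
        "n_cases": len(test_cases),
        "n_flipped": len(flipped),    # cases where the verdict changed
        "flipped_indices": flipped,   # candidates for clinical review
        "accuracy_old": accuracy(old_out),
        "accuracy_new": accuracy(new_out),
    }

# Toy usage: two threshold classifiers standing in for model versions.
report = behavioral_change_report(
    lambda x: x > 0.5, lambda x: x > 0.4,
    test_cases=[0.45, 0.6, 0.3], labels=[True, True, False])
```

The per-case flip list matters as much as the aggregate metrics: a retrain can hold accuracy constant while changing verdicts on clinically significant cases.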

Forward-thinking companies are handling this by creating a layered software architecture that makes the boundary between fixed and adaptive components explicit in both the design and the regulatory documentation:

  • Fixed software layer: Traditional 62304 lifecycle applies. This includes the application logic, user interface, data ingestion pipelines, and the inference engine itself. These are versioned, change-controlled, and submitted as conventional software components.
  • Model artifact layer: Treated as a configuration of the inference engine rather than a software change, governed by the PCCP-specified retraining protocol rather than standard change management. Regulatory status depends on whether the change falls within the pre-authorized PCCP envelope.

This architectural separation is not universally accepted—regulatory affairs professionals at different companies draw the boundary differently—but it represents the dominant approach among larger device manufacturers working through the current guidance ambiguity.

How Leading Companies Are Structuring Engineering Processes

Across public submissions, published engineering frameworks, and conversations with regulatory engineering teams, several structural patterns are emerging among companies that have successfully navigated PCCP submissions.

Requirements that specify the algorithm’s operating envelope, not just its outputs. Traditional medical device requirements specify what the device must do: sensitivity ≥ 0.92 across the specified patient population, specificity ≥ 0.87, inference latency ≤ 200ms. PCCP-ready requirements add a second axis: the bounds within which the algorithm’s behavior can change while the device remains compliant without a new submission. This includes performance floor definitions (the minimum acceptable performance on the validation cohort after any authorized retraining), dataset composition constraints (acceptable ranges for demographic and clinical variable distributions in retraining data), and architectural bounds (parameter count ranges, layer structure constraints that define what counts as the same model type).

Explicit traceability from regulatory commitments to engineering controls. Every statement in a PCCP creates a downstream engineering obligation. If the plan states that retraining will use datasets with specified demographic balance, there must be a traceable requirement on the data pipeline that enforces that balance, a verification procedure that confirms it was achieved, and a test record that documents it for each retraining event. Companies that manage this with static documents—spreadsheets, Word-based RTMs—report significant overhead in keeping the chain consistent across model versions. The traceability problem is not conceptually new, but the version multiplicity is.
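For the demographic balance example, the enforcing pipeline control can be as simple as a composition check that runs before retraining and writes its result into the retraining record. A sketch with hypothetical attribute names and ranges:

```python
def verify_dataset_composition(records, constraints):
    """Pipeline-level enforcement of a PCCP dataset composition commitment.
    `constraints` maps an (attribute, value) pair to the (lo, hi) range its
    share of the retraining dataset must fall within."""
    n = len(records)
    report = {}
    for (attr, value), (lo, hi) in constraints.items():
        share = sum(1 for r in records if r.get(attr) == value) / n
        report[(attr, value)] = {"share": share, "pass": lo <= share <= hi}
    return report

# Toy retraining cohort: 5 of 10 records are female; the commitment is 40-60%.
cohort = [{"sex": "female"}] * 5 + [{"sex": "male"}] * 5
result = verify_dataset_composition(cohort, {("sex", "female"): (0.4, 0.6)})
```

The full report, not just a pass/fail bit, goes into the test record for that retraining event — which is what closes the chain from PCCP statement to documented evidence.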

Dual-mode validation protocols. The validation approach for the fixed software layer (functional testing, boundary value analysis, regression testing against a fixed test suite) is distinct from the validation approach for the adaptive layer (performance benchmarking on held-out datasets, distribution shift analysis, fairness audits across demographic subgroups). Companies are documenting these as separate but linked validation protocols, with explicit handoff procedures that determine which protocol applies to a given change.
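The handoff procedure is essentially a dispatch rule over the layered architecture described earlier. A sketch in which the component and protocol names are illustrative, not any company's actual taxonomy:

```python
def validation_protocol_for(change: dict) -> list:
    """Route a change to the fixed-layer or adaptive-layer validation
    protocol; escalate when the change exits the PCCP envelope."""
    FIXED_COMPONENTS = {"application_logic", "ui", "ingestion", "inference_engine"}
    if change["component"] in FIXED_COMPONENTS:
        # Conventional IEC 62304 verification for the fixed software layer.
        return ["functional_tests", "boundary_value_analysis", "regression_suite"]
    if change["component"] == "model_artifact":
        protocol = ["holdout_benchmark", "distribution_shift_analysis", "fairness_audit"]
        if not change.get("within_pccp_envelope", False):
            protocol.append("new_regulatory_submission")  # exited the envelope
        return protocol
    raise ValueError(f"unclassified component: {change['component']}")
```

Encoding the rule makes the handoff auditable: every change record carries the protocol that was selected and why.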

Post-market data architecture as a requirements artifact. TPLC makes post-market performance data a regulatory input, which means the system that collects, stores, and analyzes that data is not just an operational tool—it is part of the device’s safety case. Requirements for the post-market surveillance system are being written with the same rigor as requirements for the device itself, including data completeness specifications, latency requirements on adverse signal detection, and defined triggers for PCCP amendment.
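A minimal sketch of such a trigger check, evaluating a rolling window of post-market cases against the PCCP's monitoring floors — the thresholds, window size, and field layout are assumptions:

```python
def check_surveillance_triggers(cases, floors, min_n=200):
    """`cases` is a window of (predicted_positive, truly_positive) pairs from
    post-market data. Returns the triggers that fired; a non-empty result
    feeds the PCCP amendment / retraining process."""
    if len(cases) < min_n:
        return ["insufficient_data"]  # data completeness floor not yet met
    tp = sum(1 for p, y in cases if p and y)
    fn = sum(1 for p, y in cases if not p and y)
    tn = sum(1 for p, y in cases if not p and not y)
    fp = sum(1 for p, y in cases if p and not y)
    fired = []
    if (tp + fn) and tp / (tp + fn) < floors["sensitivity"]:
        fired.append("sensitivity_below_floor")
    if (tn + fp) and tn / (tn + fp) < floors["specificity"]:
        fired.append("specificity_below_floor")
    return fired
```

The `insufficient_data` branch is itself a requirement: the surveillance system must know when it cannot yet make a statistically meaningful claim.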

The EU Parallel Problem

Companies targeting both FDA clearance and CE marking under EU MDR face a compounded challenge. EU MDR Article 61 requires clinical evaluation that demonstrates conformity with general safety and performance requirements throughout the device’s lifetime. The EU AI Act, which classifies AI systems in devices subject to notified-body conformity assessment under MDR as high-risk, adds requirements for post-market monitoring, transparency, and human oversight of AI decision-making that partially overlap with, but are not identical to, the FDA’s TPLC framework.

The two regimes are not harmonized on the mechanics of change control for adaptive AI. The EU AI Act requires that “substantial modifications” to high-risk AI systems trigger a new conformity assessment, but the definition of substantial modification is not yet fully operationalized: the harmonized standards work under CEN/CENELEC and supporting guidance from bodies such as ENISA are still in progress. In practice, companies are defining their change control boundaries conservatively enough to satisfy the FDA’s PCCP framework and then arguing to their notified body that the same architecture satisfies the EU AI Act’s substantial modification threshold—with mixed success depending on the notified body.

The practical burden is parallel documentation. A PCCP is an FDA-format document. A notified body expects a technical file structured around IVDR/MDR Annex II, with a clinical evaluation report and post-market clinical follow-up plan that may need to incorporate the adaptive algorithm’s performance trajectory. The engineering content is largely the same; the presentation architecture is entirely different. Teams that maintain requirements and traceability data in a living model rather than in formatted documents have a significant advantage when generating both outputs from the same underlying artifact set.

Traceability Approaches That Are Working

The traceability challenge for adaptive AI medical devices is fundamentally a graph problem, not a document problem. The relationships that matter—between regulatory commitments, design requirements, architecture decisions, validation protocols, test results, and post-market performance data—are many-to-many, version-dependent, and need to be queried in multiple directions. A requirement may be satisfied by a combination of a fixed software control and an algorithmic performance bound. A post-market signal may implicate multiple requirements simultaneously. A PCCP amendment may require propagating changes across dozens of linked artifacts.

Document-based RTMs, whether maintained in spreadsheets or generated from traditional requirements management systems, handle this through manual link maintenance. Every version update requires human review of every affected link. For a device with annual retraining cycles and a complex PCCP, this becomes a significant quality system burden.

Tools built on graph-native data models handle version-dependent traceability more naturally. Flow Engineering, for example, structures requirements as nodes in a connected model where relationships are first-class objects, queryable and analyzable rather than just documented. For medical device teams managing adaptive AI products, this architecture allows querying the model in ways that document-based systems cannot support directly: “What requirements are affected if the retraining dataset composition constraint changes?” or “What validation evidence would be invalidated by an architectural modification outside the current PCCP envelope?” Those are exactly the questions a regulatory engineer needs to answer quickly when a post-market signal arrives or when the clinical team proposes an enhancement.
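The underlying query pattern is ordinary graph traversal, independent of any particular tool. A stripped-down sketch of the first query above over a hand-built trace graph — the node names are invented for illustration, and this is not any vendor's actual data model:

```python
from collections import deque

# Trace graph: an edge means "the downstream artifact depends on the upstream one".
TRACE = {
    "PCCP:dataset_composition": ["REQ:pipeline_balance"],
    "REQ:pipeline_balance": ["VER:balance_check", "ARCH:data_pipeline"],
    "VER:balance_check": ["TEST:retrain_2024Q3_record"],
    "ARCH:data_pipeline": [],
    "TEST:retrain_2024Q3_record": [],
}

def impacted_by(node: str) -> list:
    """Answer 'what is affected if this artifact changes?' by walking the
    dependency graph breadth-first."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in TRACE.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)
```

The same structure answers the inverse question — which upstream commitment a given test record traces to — by walking a reversed edge set.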

The shift from document-based to graph-based requirements management is not cost-free. It requires the team to invest in model discipline—defining entity types, relationship types, and attribute schemas before the model is useful—and it requires integration with downstream tools (test management, post-market surveillance systems, document generation for regulatory submissions). But for adaptive AI medical devices, where the traceability problem will compound with every model version, the investment case is straightforward.

Honest Assessment

The FDA’s PCCP and TPLC frameworks are genuinely better regulatory architecture for adaptive AI than what existed before. They acknowledge that the device is a system that evolves, and they create a structured pathway for managing that evolution. The underlying logic—pre-specify the change space, define the controls, monitor performance, trigger new submissions when you exit the envelope—is sound engineering translated into regulatory language.

The implementation is incomplete. The guidance leaves significant discretion to manufacturers on questions that matter: What constitutes an architectural change sufficient to exit the PCCP envelope? How granular must the dataset composition constraints be? What statistical confidence is required on post-market performance monitoring before a retraining trigger fires? Companies are making defensible choices on these questions, but they are making different choices, and there is no harmonized standard yet to resolve the variance.

For systems engineers, the immediate implication is clear: the requirements problem for adaptive AI medical devices is harder than for conventional devices, the regulatory stakes for getting it wrong are higher, and the tooling most engineering teams are using was not designed for it. The companies gaining ground are those treating the PCCP as an engineering document first and a regulatory document second—building the requirements model that the PCCP describes, then generating the submission artifact from that model rather than writing the submission and hoping the engineering catches up.

That inversion of the traditional regulatory-to-engineering flow is the deeper shift the FDA’s guidance is forcing. It is overdue, and it is not optional.