What Is a Failure Mode and Effects Analysis (FMEA)?

What FMEA Is — and What It Is Not

Failure Mode and Effects Analysis is a structured, bottom-up method for identifying every way a system, subsystem, component, or process can fail, reasoning about the effect of each failure, and rating the risk so engineering teams can prioritize mitigations. It is proactive — the analysis happens before failures occur, during design or process planning — and it is systematic, working through potential failure modes one by one rather than relying on engineering intuition alone.

FMEA is not a test. It does not verify that a system works. It is an engineering reasoning exercise that surfaces what could go wrong and creates a documented record of how the team decided to address it.

The output of an FMEA is not just a filled-in worksheet. Done well, FMEA produces a set of risk ratings that drive actions: design changes, additional test coverage, tightened manufacturing tolerances, new requirements. The worksheet is evidence of the reasoning. The actions it generates — particularly the requirements it creates — are the actual engineering product.

Three Types of FMEA

The term FMEA covers a family of related analyses. Practitioners in automotive and aerospace commonly work with three distinct types, each with a different scope and focus.

Design FMEA (DFMEA)

Design FMEA examines how individual components or assemblies can fail to meet their intended function. The scope is the product design itself, not the process used to manufacture it. A DFMEA asks: given the design as specified, how can this part or subsystem fail? What is the effect on the next higher assembly? What is the effect on the end user?

DFMEA is the most common entry point for hardware teams. For a brake caliper, a DFMEA might identify failure modes including seal degradation leading to fluid loss, piston corrosion leading to sticking, and incorrect bore diameter leading to inadequate clamping force. Each failure mode is rated and linked to a design control or mitigation.

Process FMEA (PFMEA)

Process FMEA shifts focus from the design to the manufacturing and assembly process. The failure modes in a PFMEA are process failures: incorrect torque application, out-of-tolerance machining, wrong material lot, contamination during assembly. The effects are defects in the produced part and, ultimately, field failures in the product.

PFMEA is critical in high-volume manufacturing where process variation is a significant risk driver even when the design is sound. Automotive suppliers are required by customer quality systems to maintain PFMEAs for production processes.

System FMEA (SFMEA)

System FMEA operates at the highest level of abstraction, examining interactions between subsystems and identifying failure modes that emerge from those interactions rather than from any single component. An SFMEA might identify that two subsystems, each functioning correctly in isolation, interact in a way that produces a hazardous output under specific conditions.

System FMEA is particularly relevant in complex, multi-domain products: autonomous vehicles, satellite systems, avionics architectures. The cross-subsystem failure modes it identifies are often the ones that escape component-level analyses.

Severity, Occurrence, and Detectability: How Risk Gets Rated

Every failure mode in an FMEA receives three numeric ratings, each on a 1–10 scale, and those ratings combine into a single priority score.

Severity (S) rates the consequence of the failure effect on the customer or system. A rating of 1 represents no discernible effect. A rating of 10 represents a failure that compromises safe operation or regulatory compliance without warning. Severity ratings do not change based on controls; they represent the inherent consequence of the failure if it reaches the end user or the system boundary.

Occurrence (O) rates how frequently the failure mode is expected to occur, given current design or process controls. A rating of 1 represents a failure mode that is extremely unlikely. A rating of 10 represents a failure that is almost certain to occur in operation. Occurrence ratings are reduced by design changes that eliminate or reduce the root cause of the failure, not by detection controls.

Detectability (D) rates the likelihood that current controls will detect the failure mode before it reaches the customer or causes harm. A rating of 1 means the failure is almost certain to be detected. A rating of 10 means there is no detection mechanism — the failure will not be caught. Detectability ratings are improved by adding inspections, tests, or diagnostic coverage.

Risk Priority Number (RPN) is the product of the three ratings:

RPN = S × O × D

The theoretical range is 1–1000. High RPNs identify failure modes that demand mitigation action. Teams define an RPN threshold above which action is required, though many quality standards now discourage treating RPN as the sole criterion. A failure mode with a severity of 10 and an RPN of 100 may deserve more attention than a failure mode with a severity of 4 and an RPN of 200, because the product hides how severe the underlying effect is. RPN is a prioritization tool, not a pass/fail gate.
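A minimal Python sketch, using invented ratings, shows how a plain RPN sort and a severity-first sort can disagree. The FailureMode class, the prioritize helper, and the severity floor of 9 are illustrative assumptions, not part of any FMEA standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int       # S, 1-10
    occurrence: int     # O, 1-10
    detectability: int  # D, 1-10

    def __post_init__(self):
        # Reject ratings outside the 1-10 scale.
        for label, value in (("severity", self.severity),
                             ("occurrence", self.occurrence),
                             ("detectability", self.detectability)):
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # RPN = S x O x D
        return self.severity * self.occurrence * self.detectability


def prioritize(modes, severity_floor=9):
    """Put modes at or above severity_floor first, then sort by RPN descending.

    severity_floor is a hypothetical team-defined cutoff.
    """
    return sorted(modes, key=lambda m: (m.severity < severity_floor, -m.rpn))


# Invented example ratings.
modes = [
    FailureMode("cosmetic finish defect", severity=4, occurrence=5, detectability=10),
    FailureMode("seal degradation, fluid loss", severity=10, occurrence=2, detectability=5),
]

by_rpn = sorted(modes, key=lambda m: -m.rpn)
by_severity_first = prioritize(modes)

for mode in by_severity_first:
    print(f"S={mode.severity:2d}  RPN={mode.rpn:3d}  {mode.name}")
```

Sorting by RPN alone ranks the finish defect (RPN 200) above the seal failure (RPN 100). The severity-first key reverses that order, which is the behavior the paragraph above argues for.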

After mitigation actions are defined and implemented, the FMEA is updated with revised occurrence and detectability ratings, producing a revised RPN that documents the risk reduction achieved.
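The before-and-after comparison can be sketched in a few lines of Python. The Rating type, the after_mitigation helper, and the numbers are assumptions made for illustration. Severity is deliberately carried over unchanged, matching the rule that controls do not alter it.

```python
from typing import NamedTuple, Optional


class Rating(NamedTuple):
    severity: int
    occurrence: int
    detectability: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detectability


def after_mitigation(initial: Rating,
                     occurrence: Optional[int] = None,
                     detectability: Optional[int] = None) -> Rating:
    """Return a revised rating; only O and D may change, S is kept as-is."""
    return Rating(
        severity=initial.severity,
        occurrence=initial.occurrence if occurrence is None else occurrence,
        detectability=initial.detectability if detectability is None else detectability,
    )


# Invented ratings: a design change lowers occurrence, a new test improves detection.
initial = Rating(severity=8, occurrence=5, detectability=6)
revised = after_mitigation(initial, occurrence=2, detectability=3)

reduction = 1 - revised.rpn / initial.rpn
print(f"RPN {initial.rpn} -> {revised.rpn} ({reduction:.0%} reduction)")
```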

Automotive and Aerospace Variants

AIAG-VDA FMEA (Automotive)

The Automotive Industry Action Group (AIAG) and Verband der Automobilindustrie (VDA) jointly published a harmonized FMEA methodology in 2019, replacing both organizations’ prior independent handbooks. The AIAG-VDA handbook defines a seven-step approach:

1. Planning and preparation
2. Structure analysis
3. Function analysis
4. Failure analysis
5. Risk analysis
6. Optimization
7. Results documentation

The handbook also introduces Action Priority (AP) ratings (High, Medium, Low) as a replacement for RPN, addressing the known limitation that RPN can underweight high-severity, low-frequency failures. AP is assigned from lookup tables that weight severity first, then occurrence, then detection, which gives a more defensible basis for prioritization decisions than a single multiplied score.

IATF 16949 quality management systems require suppliers to perform FMEA, and many major OEMs specify the AIAG-VDA method through their customer-specific requirements.

Aerospace Variants

In aerospace, FMEA is described in SAE ARP4761 (Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment) for civil aircraft and in MIL-STD-1629A for defense applications. Aerospace FMEAs are typically part of a broader safety assessment process that includes Fault Tree Analysis (FTA), Common Cause Analysis (CCA), and Functional Hazard Assessment (FHA).

A key distinction in aerospace is that failure modes are typically classified by their effect on the aircraft-level function, with consequences ranging from No Safety Effect through Minor, Major, Hazardous, and Catastrophic. These classifications map to quantitative probability requirements: for transport-category aircraft, a catastrophic failure condition must have a probability of occurrence on the order of 10⁻⁹ per flight hour or less.
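The mapping from classification to probability target can be sketched as a lookup. Only the catastrophic figure comes from the text above. The Minor, Major, and Hazardous limits are the values commonly cited for large transport aircraft and are included here as assumptions to be checked against the applicable certification guidance. The meets_target helper is hypothetical.

```python
# Maximum average probability per flight hour for each classification.
# Only the Catastrophic value is taken from the article; the others are
# commonly cited large-transport figures and are assumptions here.
MAX_PROBABILITY_PER_FLIGHT_HOUR = {
    "No Safety Effect": None,  # no quantitative target
    "Minor": 1e-3,
    "Major": 1e-5,
    "Hazardous": 1e-7,
    "Catastrophic": 1e-9,
}


def meets_target(classification: str, predicted_probability: float) -> bool:
    """Return True if the predicted probability is below the limit for the class."""
    limit = MAX_PROBABILITY_PER_FLIGHT_HOUR[classification]
    return limit is None or predicted_probability < limit


print(meets_target("Catastrophic", 5e-10))  # below the 1e-9 limit
print(meets_target("Catastrophic", 2e-9))   # above the 1e-9 limit
```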

Aerospace FMEAs must trace to certification artifacts under DO-178C (software) and DO-254 (hardware), making the connection between failure mode analysis and requirements baseline a regulatory requirement, not just good practice.

FMEA Outputs as Requirements Inputs

This is where many teams fail to close the loop. An FMEA that sits in a spreadsheet, reviewed once and filed, provides compliance evidence but limited engineering value. The failure modes identified in an FMEA should generate two categories of downstream requirements:

Failure mitigation requirements specify what the design must do to prevent the failure mode from occurring or to reduce its likelihood. If an FMEA identifies that a power supply rail can experience voltage spike conditions that damage downstream logic, the mitigation requirement might be: The power conditioning circuit shall limit output voltage to no more than 3.6 V under any load condition specified in the operating envelope. That requirement belongs in the requirements baseline with a traceable link back to the failure mode that generated it.

Design constraints specify how the design must be implemented to ensure detectability or containment. These are often phrased as architectural constraints: The system shall provide a hardware fault detection signal within 10 ms of detecting any processor watchdog timeout. This constraint comes directly from an FMEA’s detectability mitigation for a failure mode in the processor subsystem.

The connection runs both directions. Requirements also reference failure modes: a safety requirement may exist because of a specific failure mode, and that lineage should be visible in the traceability record. When the failure mode changes — because the design changes — the team needs to know which requirements are affected.
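One way to picture the two-way link is an index that can be queried from either side. The sketch below is a minimal in-memory illustration. The TraceLinks class and every identifier such as FM-012 or REQ-PWR-031 are hypothetical, not taken from any tool or program.

```python
from collections import defaultdict


class TraceLinks:
    """Keeps failure-mode-to-requirement links queryable in both directions."""

    def __init__(self):
        self._reqs_by_fm = defaultdict(set)
        self._fms_by_req = defaultdict(set)

    def link(self, failure_mode_id, requirement_id):
        # Record the link on both sides so neither lookup needs a full scan.
        self._reqs_by_fm[failure_mode_id].add(requirement_id)
        self._fms_by_req[requirement_id].add(failure_mode_id)

    def requirements_for(self, failure_mode_id):
        return sorted(self._reqs_by_fm.get(failure_mode_id, ()))

    def failure_modes_for(self, requirement_id):
        return sorted(self._fms_by_req.get(requirement_id, ()))


links = TraceLinks()
links.link("FM-012", "REQ-PWR-031")  # voltage spike -> 3.6 V output limit
links.link("FM-012", "REQ-PWR-044")  # same failure mode, second mitigation
links.link("FM-027", "REQ-CPU-008")  # watchdog timeout -> 10 ms fault signal

# Forward: the failure mode changed, so which requirements need review?
print(links.requirements_for("FM-012"))
# Backward: why does this requirement exist?
print(links.failure_modes_for("REQ-CPU-008"))
```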

Maintaining this bidirectional connection between FMEA and requirements is where most hardware teams quietly accumulate technical debt. It is managed poorly in document-based environments where the FMEA lives in one spreadsheet, the requirements live in another tool, and the links between them exist only in the memory of a senior engineer.

How Flow Engineering Connects FMEA to the Systems Graph

Teams using Flow Engineering approach this problem through the tool’s graph-based requirements model. Rather than maintaining FMEA as a separate artifact that feeds requirements through manual copy-and-paste workflows, Flow Engineering allows teams to represent failure modes as nodes in the systems graph alongside requirements, functions, and components.

When a failure mode is identified — say, inadequate thermal dissipation in the power electronics module leading to reduced output voltage under high ambient temperature — it can be added to the graph as a failure mode node linked to the relevant component node. The mitigation requirement that addresses it is then created as a connected requirement node, with the link type explicitly marking the relationship as a failure-mitigation dependency.

This structure makes several things possible that are difficult or impossible in document-based environments. First, completeness checks become automatable: any failure mode with a severity rating above a defined threshold and no connected mitigation requirement is a gap the system flags. Second, impact analysis is immediate — when a component changes, the team can traverse the graph to find all failure modes associated with that component and all requirements linked to those failure modes. Third, when a reviewer or regulator asks “show me every requirement that was generated by a safety-critical failure mode,” that query is answerable in seconds rather than requiring a manual audit across multiple documents.
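The three queries described above can be sketched over a plain dictionary. This is not Flow Engineering's data model or API. The structure, the function names, and the identifiers are assumptions chosen only to show the shape of each query.

```python
# Hypothetical failure-mode records, keyed by an invented identifier.
failure_modes = {
    "FM-1": {"component": "power_module", "severity": 9, "mitigations": ["REQ-10"]},
    "FM-2": {"component": "power_module", "severity": 8, "mitigations": []},
    "FM-3": {"component": "processor", "severity": 4, "mitigations": []},
}


def unmitigated(failure_modes, severity_threshold):
    """Completeness check: severe failure modes with no linked requirement."""
    return sorted(
        fm_id for fm_id, fm in failure_modes.items()
        if fm["severity"] > severity_threshold and not fm["mitigations"]
    )


def impacted_by(failure_modes, component):
    """Impact analysis: failure modes on a component and their requirements."""
    modes = sorted(
        fm_id for fm_id, fm in failure_modes.items()
        if fm["component"] == component
    )
    requirements = sorted(
        {req for fm_id in modes for req in failure_modes[fm_id]["mitigations"]}
    )
    return modes, requirements


def requirements_from_severe_modes(failure_modes, min_severity):
    """Audit query: requirements generated by safety-critical failure modes."""
    return sorted(
        {req for fm in failure_modes.values()
         if fm["severity"] >= min_severity
         for req in fm["mitigations"]}
    )


print(unmitigated(failure_modes, severity_threshold=7))
print(impacted_by(failure_modes, "power_module"))
print(requirements_from_severe_modes(failure_modes, min_severity=9))
```

A real systems graph would carry typed nodes and link types rather than nested dictionaries, but each query reduces to the same kind of filter or traversal.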

Flow Engineering is deliberately specialized for this kind of connected systems reasoning. It is not a full-featured FMEA authoring tool — teams will still use AIAG-VDA worksheets or MIL-STD-1629A templates in their preferred format for the detailed rating work. What Flow Engineering handles is the connection layer: linking the outputs of that analysis to the rest of the engineering record in a way that survives design changes and supports traceability at scale.

For teams operating under DO-254, ARP4761, or ISO 26262, where demonstrating requirements traceability through the safety lifecycle is a certification requirement, this connected approach is the difference between traceability as a genuine engineering practice and traceability as a documentation exercise completed at program close.

Practical Starting Points

If you are implementing or improving FMEA practice on a hardware program, three structural decisions matter most:

Define what happens to high-RPN (or High AP) findings before the analysis ends. Every FMEA should have a clear protocol: findings above a threshold generate required actions, assigned owners, and due dates. Without this, the analysis produces risk information that no one acts on.
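Such a protocol can be enforced with a simple check before the analysis is closed. The sketch below assumes a hypothetical Finding record and an RPN threshold chosen by the team. The names, ratings, and dates are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    failure_mode: str
    rpn: int
    action: Optional[str] = None
    owner: Optional[str] = None
    due: Optional[date] = None


def open_items(findings, rpn_threshold):
    """Findings above the threshold that lack an action, an owner, or a due date."""
    return [
        f for f in findings
        if f.rpn > rpn_threshold and not (f.action and f.owner and f.due)
    ]


# Invented example findings.
findings = [
    Finding("seal degradation", rpn=240, action="change seal material",
            owner="design lead", due=date(2025, 3, 1)),
    Finding("piston corrosion", rpn=180, action="add coating"),  # no owner or date
    Finding("label misprint", rpn=40),                           # below threshold
]

for finding in open_items(findings, rpn_threshold=100):
    print(f"Unassigned: {finding.failure_mode} (RPN {finding.rpn})")
```

A team working with Action Priority rather than RPN could apply the same check by filtering on a High rating instead of a numeric threshold.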

Establish the link between FMEA outputs and the requirements baseline as a process requirement, not a suggestion. The requirements that emerge from failure mode mitigation should enter the same requirements management workflow as any other derived requirement, with attribution to the failure mode that generated them.

Revisit the FMEA when the design changes. FMEA is not a one-time activity. When a component is redesigned, a supplier changes, or an operating environment is updated, the affected failure modes need re-evaluation. Teams that treat FMEA as a program-start activity and never revisit it accumulate risk silently.

FMEA is one of the most mature structured analysis methods in hardware engineering. The methodology is well-defined, the standards are clear, and the tooling exists to support it. The persistent challenge is integration — connecting the risk reasoning captured in the FMEA to the requirements that must address it, and maintaining that connection as the design evolves. That integration problem is a systems engineering problem, and solving it requires treating FMEA outputs as first-class inputs to the requirements model rather than as parallel documentation.