What Is Failure Mode and Effects Analysis (FMEA)?

Failure Mode and Effects Analysis is a systematic, bottom-up method for identifying every way a component or process can fail, determining what effect each failure has on higher-level system function, and assessing how serious that effect is before it happens in a real product. The operative word is before. FMEA is a proactive discipline—it is performed during design, not extracted from field failure reports after the fact.

The method traces its lineage to U.S. military procedures from the 1940s and was formalized in MIL-P-1629 in 1949. It entered automotive and aerospace practice in the following decades and is now embedded in the safety standards that govern most safety-critical hardware development. When an engineer on an automotive brake-by-wire project or a medical device team asks “what could go wrong and how badly?”—FMEA is the structured answer to that question.

Understanding FMEA requires separating the concept from the worksheet. Many teams treat FMEA as a form to fill out. It is actually an analytical process that produces a structured dataset. The worksheet is just the record; the value is in the thinking that populates it and the decisions that follow from it.


Core Concepts

Design FMEA vs. Process FMEA

FMEA splits into two primary variants based on what is being analyzed.

Design FMEA (DFMEA) focuses on the product itself—its components, subsystems, and interfaces. The question is: given this design, in what ways can each element fail to perform its intended function? DFMEA is the responsibility of the product design team and is initiated during concept and detailed design phases. A failure mode in a DFMEA might be “Hall-effect sensor output shorted to ground,” and its effect at the system level might be “incorrect rotor position reported to motor controller, leading to torque ripple or loss of drive.” The cause might be “insufficient conformal coating on PCB exposed to condensation.”

Process FMEA (PFMEA) focuses on the manufacturing and assembly process—not the product design, but the operations that produce it. The failure mode here is a process defect: incorrect torque applied to a fastener, wrong component placed in a pick-and-place operation, inadequate solder joint. The effect is what that defect produces in the finished product and, by extension, in the field. PFMEA is owned by manufacturing engineering and is typically initiated when design is mature enough to define the production process.

The distinction matters because the responsible teams, the timing within the development program, and the corrective actions are entirely different. A DFMEA finding might drive a component change or an added redundancy. A PFMEA finding might drive a tooling change, a process control limit, or an added inspection step. Conflating the two produces analysis that is actionable by no one.

There is a third variant—System FMEA (SFMEA)—that operates at the top-level system architecture and examines interactions between subsystems. This is particularly relevant in complex systems where a failure in one subsystem propagates through interfaces to produce effects in another. System FMEA is often paired with Fault Tree Analysis (FTA), which is a top-down complement to FMEA’s bottom-up perspective.

The Risk Priority Number

FMEA assigns each failure mode a numerical risk score called the Risk Priority Number (RPN). RPN is the product of three separate 1–10 ratings:

RPN = Severity (S) × Occurrence (O) × Detection (D)

  • Severity rates the consequence of the failure effect on the end user or system function. A score of 1 means the user would not notice the effect. A score of 10 means the failure causes injury, death, or catastrophic system loss with no warning.
  • Occurrence rates the likelihood that the failure cause will occur during the design’s expected life and operating conditions. A score of 1 means the cause is extremely unlikely. A score of 10 means the cause is almost certain.
  • Detection rates how well the current design controls—tests, inspections, sensors, diagnostics—will catch the failure cause or failure mode before it reaches the customer. Counterintuitively, a score of 1 means detection is near-certain. A score of 10 means there is no current means of detection.

RPN values range from 1 to 1,000. Teams set internal thresholds—commonly anything above 100 or 120 triggers mandatory corrective action—but absolute RPN thresholds are a heuristic, not a standard. What matters more than the absolute number is Severity. Industry best practice, reflected in the AIAG-VDA FMEA Handbook (first edition 2019), is that any failure mode with a Severity rating of 9 or 10 receives mandatory design review regardless of its RPN score. A highly detectable, low-occurrence failure that would kill someone still demands attention.

The AIAG-VDA handbook also introduced the Action Priority (AP) rating—High, Medium, Low—as a complement to RPN, recognizing that teams were gaming RPN by inflating Detection scores rather than actually improving designs. AP bases priority primarily on Severity and treats Occurrence and Detection as secondary filters. The evolution reflects that RPN is a useful prioritization signal, not an autonomous decision engine.
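The RPN arithmetic and the Severity-first override can be sketched in a few lines. This is an illustrative model, not any standard's official implementation: the RPN threshold of 100 is one common internal heuristic, and the simple severity-override rule stands in for the full AIAG-VDA Action Priority tables.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1-10: consequence of the failure effect
    occurrence: int  # 1-10: likelihood of the failure cause
    detection: int   # 1-10: 1 = near-certain detection, 10 = none

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection, range 1-1000
        return self.severity * self.occurrence * self.detection

def needs_action(fm: FailureMode, rpn_threshold: int = 100) -> bool:
    # Severity 9-10 mandates review regardless of RPN (per AIAG-VDA practice);
    # the RPN threshold itself is a team heuristic, not a standard.
    return fm.severity >= 9 or fm.rpn > rpn_threshold

fm = FailureMode("Sensor output shorted to ground",
                 severity=9, occurrence=2, detection=2)
print(fm.rpn)           # 36 -- a low RPN
print(needs_action(fm)) # True -- Severity 9 overrides the low RPN
```

Note how the override captures the point made above: a highly detectable, low-occurrence failure with catastrophic severity still demands attention even though its RPN is modest.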

What FMEA Produces

A completed FMEA generates several outputs that feed downstream engineering activities:

  1. A prioritized list of failure modes requiring design action, ranked by RPN and flagged by Severity.
  2. Recommended actions assigned to owners with target completion dates—component changes, added redundancy, tighter tolerances, additional test coverage.
  3. Updated RPN scores after actions are implemented, demonstrating risk reduction.
  4. Design and process control recommendations that feed into the design specification, test plan, and manufacturing control plan.
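Output 1, the prioritized list, amounts to a ranking that sorts by RPN but floats Severity-flagged items to the top. A minimal sketch, with invented example rows and scores:

```python
# Illustrative ranking: flag Severity >= 9 items first, then sort by RPN.
rows = [
    # (failure mode, Severity, Occurrence, Detection)
    ("Fastener under-torqued", 6, 5, 4),
    ("Solder joint cold",      8, 3, 6),
    ("Sensor short to ground", 9, 2, 2),
]

scored = [(desc, s * o * d, s >= 9) for desc, s, o, d in rows]
ranked = sorted(scored, key=lambda r: (r[2], r[1]), reverse=True)

for desc, rpn, flagged in ranked:
    flag = "  [Severity >= 9]" if flagged else ""
    print(f"{desc:24s} RPN={rpn:3d}{flag}")
```

Running this puts the Severity-9 sensor short first despite its RPN of 36, ahead of the higher-RPN solder and fastener defects, which is exactly the Severity-first prioritization described above.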

The FMEA worksheet itself becomes a living document—it is updated when the design changes, when new failure data becomes available, or when a field issue requires retrospective analysis. In regulated industries, it is also a deliverable submitted to customers, certification bodies, and safety auditors.


FMEA in Safety Standards

ISO 26262 (Automotive Functional Safety)

ISO 26262 governs functional safety of electrical and electronic systems in road vehicles. It requires systematic hazard identification as part of the Hazard Analysis and Risk Assessment (HARA) at the vehicle level, and then requires that safety goals derived from HARA be allocated down to system and component requirements with specified Automotive Safety Integrity Levels (ASILs).

FMEA is not the only tool ISO 26262 accepts for this work, but it is explicitly named as a recommended method for hardware safety analysis at the component level (Part 5) and for design verification. A Design FMEA typically serves as evidence that the hardware design has been systematically analyzed for failure modes, and that the resulting safety mechanisms—redundancy, diagnostics, monitoring—are adequate for the ASIL assignment. PFMEA feeds into the production control plan and manufacturing quality evidence required under Part 8.

IEC 62304 (Medical Device Software)

IEC 62304 covers software development processes for medical devices. It does not mandate FMEA by name, but it requires software risk management in accordance with ISO 14971, the medical device risk management standard. ISO 14971 requires systematic hazard identification, estimation of the probability and severity of harm, and implementation of risk controls—a structure that maps naturally onto FMEA’s Occurrence and Severity ratings, with Detection informing the evaluation of risk controls.

In practice, medical device teams use DFMEA at the hardware level under ISO 14971 and apply FMEA-style analysis to software failure modes as a hazard identification input. The FMEA worksheet provides traceable evidence that each identified hazard has been addressed through design or process controls, which is exactly what a 510(k) submission or CE technical file requires.

Aerospace: ARP4761 and DO-178C

In aerospace, the primary safety analysis standard is SAE ARP4761, which defines recommended practices for civil airborne systems safety assessment. ARP4761 specifies a suite of analyses—Functional Hazard Assessment (FHA), Preliminary System Safety Assessment (PSSA), System Safety Assessment (SSA), and Common Cause Analysis (CCA)—that work together to demonstrate compliance with FAA AC 25.1309 and equivalent EASA requirements.

FMEA and Failure Modes, Effects, and Criticality Analysis (FMECA—FMEA with an added Criticality ranking) are core components of this suite. They produce the failure mode data that populates fault trees and are required artifacts in the SSA. DO-178C, which governs airborne software, requires similar systematic analysis of software failure conditions, with software levels (A through E) assigned based on the severity of effects—directly analogous to ASIL assignment in automotive.

The common thread across all three standards: regulators require documented evidence that someone systematically asked “what fails, how often, how badly, and how would you know?”—and that the answers drove specific design decisions with traceable justification.


Making FMEA Results Actionable: The Traceability Problem

FMEA, done rigorously, produces a detailed picture of risk. The failure modes are identified. The RPNs are calculated. The recommended actions are listed. This is where many programs stall.

The gap is traceability. A failure mode is only actionable if it can be linked to the specific requirements—system requirements, subsystem requirements, interface requirements—that it threatens or that must change to address it. Without that link, the FMEA lives in a separate document and the design team has no structured way to know which requirements need revision, which test cases need to verify the mitigation, and whether a design change that reduces one risk inadvertently introduces another.

This is the upstream traceability problem. FMEA is downstream of requirements—it analyzes a design that was specified by requirements. But its findings flow back upstream. A safety-critical failure mode with no current detection mechanism might require a new diagnostic requirement. A high-severity failure mode caused by an interface ambiguity might require a tighter interface control document. A failure mode that the current test plan cannot detect might require a new verification requirement. If requirements and FMEA exist in separate silos—FMEA in a spreadsheet, requirements in a document management system—those feedback paths are manual, error-prone, and often broken.

Modern requirements management tools address this by building the traceability graph that connects failure modes to requirements bidirectionally. Flow Engineering (flowengineering.com) is built specifically for this kind of connected systems engineering work. Rather than treating requirements as documents and FMEA as a separate worksheet, Flow Engineering represents system architecture, requirements, and safety analysis as nodes in a connected graph. When a failure mode is identified, it can be linked directly to the requirement it threatens, the component it originates in, and the verification method that must demonstrate the mitigation. Engineers can query the graph: “Which requirements have open FMEA findings against them with no closed corrective action?” or “Which failure modes with Severity ≥ 9 are not yet linked to a verification test?”

This kind of connected query is impossible when requirements are in a Word document and FMEA is in a spreadsheet, even if both are stored in the same folder. The structure of the data determines whether the analysis is usable.
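The queries described above are straightforward once failure modes, requirements, and verification tests live in one linked structure. A minimal sketch using plain dictionaries—all IDs and field names here are invented for illustration and do not reflect any particular tool's data model:

```python
# Hypothetical traceability records linking failure modes to the requirement
# each one threatens and the test that must verify its mitigation.
failure_modes = {
    "FM-01": {"severity": 9,  "action_closed": False,
              "requirement": "REQ-12", "verification": None},
    "FM-02": {"severity": 6,  "action_closed": True,
              "requirement": "REQ-07", "verification": "TEST-31"},
    "FM-03": {"severity": 10, "action_closed": True,
              "requirement": "REQ-12", "verification": "TEST-44"},
}

# "Which requirements have open FMEA findings with no closed corrective action?"
open_reqs = {fm["requirement"]
             for fm in failure_modes.values()
             if not fm["action_closed"]}

# "Which failure modes with Severity >= 9 are not yet linked to a verification test?"
unverified = [fid for fid, fm in failure_modes.items()
              if fm["severity"] >= 9 and fm["verification"] is None]

print(open_reqs)   # {'REQ-12'}
print(unverified)  # ['FM-01']
```

The point is not the dictionaries—a production tool would use a real graph store—but that the queries become one-liners once the links exist as data rather than as prose in separate documents.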

Flow Engineering’s AI-native architecture also helps during FMEA preparation. Deriving failure modes requires understanding what a component is supposed to do—its function—and then systematically considering how it could fail to perform that function. When functions and requirements are already captured in a structured model, the AI can propose candidate failure modes based on the component’s specified behavior, accelerating the FMEA kickoff and reducing the risk of a team missing a failure mode because someone forgot to add a row to the spreadsheet.

The practical implication for program teams is this: the value of FMEA is not in the worksheet—it is in the decisions the worksheet drives. Those decisions are requirements changes, design changes, and test additions. If your requirements tool cannot ingest FMEA findings and show you exactly which requirements are affected, you are managing risk through document coordination instead of through engineering analysis. The two approaches produce very different outcomes under audit and, more importantly, in the field.


Practical Starting Points

If you are implementing FMEA on a program for the first time, or improving an existing practice, the following sequence reflects how mature teams operate:

1. Define scope and boundaries before starting the worksheet. FMEA scope creep is real. Decide whether you are analyzing a subsystem or a full system, and where the boundaries are. An undefined scope produces an incomplete analysis that cannot be audited.

2. Build from a functional block diagram. Every element in the FMEA should correspond to a function or component that is already defined in your system architecture. If you are writing FMEA rows for things that do not appear in your design documentation, either your design documentation is incomplete or your FMEA scope is wrong.

3. Use Severity as your primary decision driver, not RPN. Prioritize high-Severity items regardless of Occurrence and Detection scores. Optimize RPN reduction secondarily.

4. Assign corrective actions to named owners with due dates. An FMEA with recommended actions and no owners is a documentation exercise, not an engineering activity.

5. Close the loop to requirements. Every corrective action that changes the design should produce a requirements change. Every new diagnostic or monitoring function should generate a verifiable requirement. Use your requirements management tool to capture and track that link—not a separate action item tracker.

6. Update the FMEA when the design changes. FMEA is not a one-time artifact. A design change that is not reflected in an updated FMEA means your safety analysis no longer represents your actual design.

FMEA is one of the most powerful tools available to hardware and systems engineers for identifying risk before it materializes. Its effectiveness is directly proportional to how well its findings connect to the rest of the engineering system—the requirements, the architecture, and the verification plan. The analysis and the design artifacts need to be in conversation. When they are, FMEA is a decision-making engine. When they are not, it is a compliance checkbox.