What FMEA Actually Is — and What It Is Not

Failure Mode and Effects Analysis (FMEA) is a bottom-up, inductive safety and reliability analysis method. Starting from individual components or process steps, it asks three systematic questions: What can fail? What happens when it fails? What can be done about it? The answers produce a ranked inventory of risk that drives corrective actions across design, process, and system development.

The word “inductive” is load-bearing here. Unlike fault tree analysis (FTA), which starts from an undesired top-level event and decomposes downward, FMEA starts from the lowest-level failure and propagates upward through the system to identify effects. Both methods belong in a complete safety analysis program; they are complementary, not interchangeable.

FMEA is codified across several industry standards: IEC 60812 (general), MIL-STD-1629A (military and aerospace), SAE J1739 (automotive), and ARP5580 (non-automotive commercial). The mechanics differ slightly between domains, but the analytical logic is the same.

What FMEA is not: it is not a compliance checkbox, and it is not a static document. An FMEA spreadsheet that gets filed after PDR and never updated has negative engineering value — it creates the illusion of analysis while the design evolves around it. Useful FMEA is a living model that feeds the requirement hierarchy in real time.


The Three Types: Design, Process, and System FMEA

Design FMEA (DFMEA)

DFMEA analyzes the hardware or software design itself. The item under analysis is typically a component, subassembly, or function. For each item, engineers enumerate potential failure modes — open circuit, short to ground, fracture under load, output out of tolerance — then trace the effect of each failure mode upward through the assembly hierarchy to the end user or mission-level effect.

DFMEA is performed during design development, ideally before detailed drawings are released. Its primary output is a set of design changes and design controls that reduce risk. When a design change is not feasible or cost-effective, the alternative is detection: adding a sensor, a diagnostic, or a test that catches the failure before it propagates. Either output — a design change or a detection control — should generate a derived requirement.

Process FMEA (PFMEA)

PFMEA analyzes manufacturing and assembly processes rather than the design itself. The failure mode is now a process defect: a fastener torqued out of spec, a solder joint insufficiently reflowed, a firmware version incorrectly flashed. The effect is what the process defect does to the finished product or the end customer.

PFMEA is the manufacturing engineer’s primary risk management tool. Its outputs drive process controls, inspection criteria, and acceptance test procedures — all of which are requirements on the manufacturing system. In safety-critical industries, PFMEA outputs are traceable to quality plans and production control documents in the same way DFMEA outputs are traceable to the system requirement specification.

System FMEA (SFMEA)

System FMEA operates at the highest level of abstraction: it analyzes functional failures at the system or subsystem boundary, focusing on how failure modes in one subsystem affect the behavior of adjacent subsystems or the overall mission. It is often performed early in concept development, before detailed design, and uses functional block diagrams or system architecture models as its input.

SFMEA answers the question: if this function fails in this mode, what does the system do, and is that tolerable? The outputs drive system-level architecture decisions — redundancy, fail-safe states, isolation boundaries — and these decisions are expressed as requirements on the system’s functional and non-functional behavior.


Risk Priority Numbers: What They Measure and Where They Break Down

The Risk Priority Number (RPN) is the most widely used quantification method in FMEA. It is calculated as:

RPN = Severity (S) × Occurrence (O) × Detection (D)

Each factor is rated on a scale, typically 1–10:

  • Severity rates the consequence of the failure effect on the customer or system. A rating of 10 is catastrophic — loss of life, loss of vehicle, regulatory non-compliance. A rating of 1 is an effect the customer would not notice.
  • Occurrence rates the likelihood of the failure cause occurring. A rating of 10 is virtually certain; a rating of 1 is extremely unlikely.
  • Detection rates the ability of current controls to detect the failure before it reaches the customer or causes a system effect. Counterintuitively, a rating of 10 means the failure is undetectable, and a rating of 1 means it is certain to be caught.

The RPN ranges from 1 to 1000. Teams typically set a threshold — often 100 or 125 — above which corrective action is mandatory. Actions target whichever factor is most reducible: you can rarely reduce severity (the physics are what they are), so most actions target occurrence (design changes, redundancy) or detection (diagnostics, inspection).

Where RPNs break down: An RPN of 120 composed of S=10, O=3, D=4 is very different from an RPN of 120 composed of S=4, O=5, D=6. The first involves a catastrophic severity that deserves urgent attention regardless of occurrence; the second is a moderate nuisance. Relying on the RPN scalar alone obscures these differences. This is why IEC 61508, ISO 26262 (automotive functional safety), and DO-178C (avionics software) require explicit severity-based analysis in addition to or instead of RPN thresholds. For safety-critical systems, any failure mode with a severity rating of 9 or 10 demands mitigation independent of its RPN.
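The RPN arithmetic and the severity override described above can be sketched in a few lines of Python. The threshold values and failure-mode names here are illustrative, not normative — each program sets its own:

```python
from dataclasses import dataclass

@dataclass
class FmeaEntry:
    failure_mode: str
    severity: int    # 1-10, consequence of the effect
    occurrence: int  # 1-10, likelihood of the cause
    detection: int   # 1-10, where 10 means undetectable

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

def needs_action(entry: FmeaEntry, rpn_threshold: int = 125) -> bool:
    # Catastrophic severities demand mitigation regardless of RPN.
    if entry.severity >= 9:
        return True
    return entry.rpn >= rpn_threshold

# Two entries with identical RPN (120) but very different risk profiles:
a = FmeaEntry("relay contact weld", severity=10, occurrence=3, detection=4)
b = FmeaEntry("label misprint", severity=4, occurrence=5, detection=6)
print(a.rpn, needs_action(a))  # 120 True  (severity override fires)
print(b.rpn, needs_action(b))  # 120 False (below the 125 threshold)
```

The point of the explicit severity check is exactly the failure of the scalar: sorting by RPN alone would rank these two entries as equals.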


The Critical Step Most Teams Skip: Converting FMEA Outputs to Derived Requirements

Here is where most FMEA implementations fail in practice. The analysis identifies a failure mode, assigns a high RPN, proposes a corrective action — and then the corrective action lives only in the FMEA spreadsheet. It never becomes a formal requirement. It is never assigned to a design function. It is never given a verification method. At the next program review, no one can answer the question: “How do we know the corrective action was implemented?”

The discipline that closes this gap is derived requirements generation from FMEA outputs.

The logic is straightforward:

  1. A failure mode with an unacceptable RPN or unacceptable severity requires a mitigation.
  2. The mitigation is a design decision: add redundancy, add a diagnostic, add a protection circuit, constrain an operating condition.
  3. A design decision that is not captured as a formal requirement is not binding — it is a wish.
  4. Therefore, every FMEA-driven mitigation must be expressed as a formal derived requirement, allocated to the appropriate system level, and placed in the requirement hierarchy.
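One minimal way to enforce step 4 is a closure gate: a finding that demands mitigation cannot be closed until the mitigation carries a requirement ID. A sketch, with invented field names and thresholds:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mitigation:
    description: str
    derived_req_id: Optional[str] = None  # e.g. "SYS-SW-0412"; None until written

@dataclass
class Finding:
    fmea_id: str
    severity: int
    rpn: int
    mitigation: Optional[Mitigation] = None

def can_close(finding: Finding, rpn_threshold: int = 125) -> bool:
    """A finding that requires mitigation may only be closed once the
    mitigation exists as a formal requirement, not just a description."""
    if finding.severity < 9 and finding.rpn < rpn_threshold:
        return True  # acceptable risk; no mitigation required
    m = finding.mitigation
    return m is not None and m.derived_req_id is not None

wish = Finding("FMEA-031", severity=9, rpn=108,
               mitigation=Mitigation("add software plausibility check"))
print(can_close(wish))  # False: the mitigation is described but not yet a requirement
```

The gate is deliberately strict: a well-written corrective action with no requirement ID is still, in the language of step 3, a wish.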

For example: a DFMEA on a battery management IC identifies a failure mode of “cell voltage measurement error > 50 mV due to ADC offset drift at high temperature.” The effect is overcharge of a lithium cell, severity 9. The proposed control is a software plausibility check comparing redundant measurements. The derived requirement should read:

“The battery management software shall detect a discrepancy greater than 25 mV between primary and secondary cell voltage measurements and shall command a safe-state transition within 100 ms of detection.”

That requirement now exists in the system requirement specification. It is allocated to the BMS software subsystem. It has a verification method (test). It can be traced back to the FMEA entry that generated it. And it can be traced forward to the test case that proves it works.

This is not optional housekeeping. In ISO 26262 automotive development, derived safety requirements must be traceable to the hazard analysis and risk assessment (HARA) that generated them. In DO-254 avionics hardware development, safety requirements generated from failure mode analysis must be verified with evidence. In IEC 61508 general functional safety, the safety requirements specification must be derived from and traceable to the hazard and risk analysis. The derived requirement is the engineering mechanism that connects the analysis to the product.


Traceability Architecture: Where FMEA Fits in the Requirement Hierarchy

A well-structured safety-critical program has a requirement hierarchy that flows from mission and safety goals down through functional requirements, system requirements, subsystem requirements, and component requirements. FMEA findings enter this hierarchy as derived requirements — requirements that were not explicitly stated by the customer or stakeholder, but which are logically necessary to satisfy a safety goal or mitigate an identified risk.

The traceability links that must exist:

  • Upward: The derived requirement traces to the FMEA entry that generated it, and the FMEA entry traces to the applicable safety goal or hazard category.
  • Downward: The derived requirement traces to the design element (component, function, software module) that implements the mitigation.
  • Laterally: The derived requirement traces to the verification test case that confirms the mitigation is effective.

Without all three links, you have incomplete traceability — and an incomplete audit trail. Regulators, certification bodies, and customers reviewing your safety case will look specifically for these links.
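The three-link rule lends itself to a mechanical completeness audit. A sketch over a toy link table — the node IDs and link names are invented for illustration:

```python
# Each derived requirement must carry all three traceability links:
# upward to its FMEA entry, downward to a design element, laterally to a test.
links = {
    "REQ-101": {"fmea": "FMEA-031", "design": "SW-MOD-BMS", "test": "TC-2204"},
    "REQ-102": {"fmea": "FMEA-044", "design": "HW-PCB-07",  "test": None},
    "REQ-103": {"fmea": None,       "design": "SW-MOD-CAN", "test": "TC-2310"},
}

REQUIRED = ("fmea", "design", "test")

def audit(links: dict) -> dict:
    """Return, per requirement, the list of missing link types."""
    return {req: [k for k in REQUIRED if not ends.get(k)]
            for req, ends in links.items()}

gaps = {req: missing for req, missing in audit(links).items() if missing}
print(gaps)  # {'REQ-102': ['test'], 'REQ-103': ['fmea']}
```

An empty `gaps` dictionary is the structural property a reviewer is actually asking for when they ask to see the audit trail.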


How Modern Tools Implement FMEA-to-Requirements Traceability

Document-based tools — Microsoft Word requirements specifications, Excel FMEA worksheets — make this traceability architecture nearly impossible to maintain at scale. The FMEA spreadsheet and the requirements document are separate artifacts. Linking them requires manual cross-referencing, which degrades as both documents evolve. By CDR, the links are typically fiction.

Graph-based requirements management tools address this structurally by making every artifact a node and every relationship an explicit edge. The FMEA entry, the derived requirement, the design element, and the verification test case all exist as nodes in a connected graph. Adding a link between them is a first-class operation, not a comment in a cell.

Flow Engineering is built on this graph model and extends it specifically to support the analytical workflows that safety-critical hardware programs run. An FMEA finding — failure mode, effect, cause, controls, RPN — can be captured as a structured node. From that node, engineers can generate a derived requirement directly, automatically inheriting the traceability link back to the source analysis. That requirement then participates in the full Flow Engineering requirement hierarchy: it can be allocated to subsystems, assigned a verification method, and linked to test cases.

The operational value appears at review time. When a safety reviewer asks “show me every requirement that was derived from a failure mode with severity ≥ 8,” Flow Engineering can answer that query by traversing the graph. When a test engineer asks “which test cases verify requirements that came from FMEA,” the answer is a graph query, not a spreadsheet search. When a failure mode is updated because the design changed, the impact on derived requirements is immediately visible because the connections are structural, not editorial.

Flow Engineering’s focus is on requirements authoring, management, and traceability — it is not a full FMEA tool with built-in RPN calculators and automotive-specific rating tables. Teams using dedicated FMEA applications (such as ReliaSoft XFMEA or Relyence FMEA) would import or reference FMEA findings into Flow Engineering to manage the downstream traceability. That integration boundary is a deliberate product choice: Flow Engineering is the system of record for requirements and their relationships, not a replacement for domain-specific analysis tools.


Practical Starting Points

If your program is currently running FMEA in spreadsheets and managing requirements in documents, the path to better traceability does not require a full toolchain replacement on day one. Three concrete steps that produce immediate value:

1. Add a “Derived Requirement ID” column to your FMEA. For every action item that resolves a high-RPN or high-severity finding, require that a requirement ID be entered before the action item is closed. This forces the question: does this corrective action exist as a formal requirement? If no ID can be entered, the requirement has not been written yet.

2. Tag derived requirements in your requirement specification. Whether your SRS is in a modern tool or a Word document, every requirement that originated from FMEA should be tagged as such, with a reference to the FMEA entry number. This makes the derived-requirement population queryable and auditable.

3. Verify coverage in both directions. For every FMEA action item, there should be a requirement. For every derived safety requirement, there should be a test case. Running this coverage check before each major review will surface gaps before reviewers do.
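Step 3 can be automated even in a spreadsheet workflow by exporting the relevant ID columns and diffing the sets. A minimal sketch, with invented IDs standing in for the exported data:

```python
# Exported from the FMEA sheet: action items and the requirement each claims.
fmea_actions = {"ACT-01": "REQ-101", "ACT-02": "REQ-102", "ACT-03": None}

# Exported from the SRS / test matrix: derived requirements and their test cases.
req_tests = {"REQ-101": "TC-2204", "REQ-102": None}

# Direction 1: every FMEA action item should name a derived requirement.
actions_without_req = sorted(a for a, r in fmea_actions.items() if r is None)

# Direction 2: every derived requirement should have a verifying test case.
reqs_without_test = sorted(r for r, t in req_tests.items() if t is None)

print(actions_without_req)  # ['ACT-03']
print(reqs_without_test)    # ['REQ-102']
```

Both lists should be empty before a major review; anything that prints is a gap a reviewer will otherwise find for you.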

As programs grow in complexity — more subsystems, more failure modes, more verification events — the spreadsheet approach to this coverage check becomes untenable. That is where a graph-based tool like Flow Engineering pays for itself: the coverage check is a structural property of the model, not a manual reconciliation exercise.


Honest Assessment

FMEA is one of the most mature and widely deployed safety analysis methods in engineering. Its value is not theoretical — it has caught catastrophic failure modes in automotive, aerospace, medical device, and industrial systems for decades. The process works.

The failure mode — if you will — is organizational. FMEA is treated as an analysis artifact rather than an input to engineering. The corrective actions stay in the spreadsheet. The derived requirements never get written. The test cases never get linked. The safety case at the end of the program has a gap that everyone hopes the auditor does not find.

Closing that gap requires discipline in process and, at scale, tooling that makes the connections structural rather than editorial. The combination of rigorous FMEA practice and a requirements management system built for traceability is not overhead — it is the mechanism by which hazard analysis actually changes what gets built.