What Is a Safety Case? Structure, Evidence, and Engineering Accountability

What a Safety Case Actually Is

A safety case is a structured argument — supported by evidence — that a system is acceptably safe for a specific use in a specific environment.

That definition carries three load-bearing words: structured, argument, and evidence. Strip out any one of them and what remains is not a safety case. A collection of test reports without an argument is a filing cabinet. An argument without evidence is an assertion. Evidence without structure is noise.

The concept took its modern regulatory form in the UK after the 1988 Piper Alpha offshore platform disaster. The Cullen Report, published in 1990, recommended that operators of offshore installations demonstrate safety through a structured case rather than simply complying with prescriptive rules. The Offshore Installations (Safety Case) Regulations followed in 1992, and the practice, already familiar in nuclear licensing, spread into defense, rail, and aerospace, wherever regulators concluded that prescriptive checklists were insufficient for complex, high-consequence systems.

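Today, safety cases or equivalent structured safety demonstrations are required or strongly expected under IEC 61508 (functional safety of E/E/PE systems), DEF STAN 00-056 (UK defense), EN 50129 (railway signaling), DO-178C and DO-254 (aerospace software and hardware), and nuclear licensing frameworks in multiple jurisdictions. Not all of these use the term "safety case," but each asks for the same kind of demonstration.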

The core obligation in all of these is the same: show your reasoning, show your evidence, and show that the two connect.

The Two Dominant Frameworks

Goal Structuring Notation (GSN)

GSN is a graphical argumentation language developed at the University of York. It provides a set of defined node types for building visual argument diagrams:

Goals — the claims being made (e.g., “The braking system is safe for operational use on urban rail lines at speeds up to 80 km/h”)
Strategies — the reasoning approach used to break a goal into sub-goals (e.g., “Argument over identified hazard classes”)
Solutions — the evidence items that support a lowest-level goal (e.g., test reports, analysis results, inspection records)
Contexts — the conditions, standards, or definitions within which the argument holds
Assumptions — claims taken as true without evidence
Justifications — rationale for why a strategy or goal decomposition is valid

A GSN diagram is read top-down. A top-level safety goal decomposes through strategies into sub-goals, which decompose further until they reach solutions — actual evidence artifacts. The diagram makes the argument structure explicit and auditable.

The GSN Community Standard (first published by Origin Consulting in 2011, now maintained by the Assurance Case Working Group of the Safety-Critical Systems Club, and adopted widely in UK defense and rail) defines formal rules for well-formed GSN arguments. Undeveloped goals, meaning goals whose supporting argument or evidence has not yet been provided, are explicitly marked, which forces incompleteness into view rather than allowing it to hide in prose.
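The node types and the top-down reading described above can be sketched as a small data structure. This is a minimal illustration, not an implementation of the GSN Community Standard: the class and function names are hypothetical, the sub-goal texts are invented for the example, and all links are stored in one child list even though GSN distinguishes "supported by" links from "in context of" links.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    GOAL = "goal"
    STRATEGY = "strategy"
    SOLUTION = "solution"
    CONTEXT = "context"
    ASSUMPTION = "assumption"
    JUSTIFICATION = "justification"


# Node types that can support a goal (simplifying assumption for this sketch).
SUPPORTING = (NodeType.GOAL, NodeType.STRATEGY, NodeType.SOLUTION)


@dataclass
class Node:
    node_id: str
    node_type: NodeType
    text: str
    children: list[Node] = field(default_factory=list)


def undeveloped_goals(node: Node) -> list[str]:
    """Return IDs of goals with no supporting strategy, sub-goal, or solution."""
    found = []
    has_support = any(c.node_type in SUPPORTING for c in node.children)
    if node.node_type is NodeType.GOAL and not has_support:
        found.append(node.node_id)
    for child in node.children:
        found.extend(undeveloped_goals(child))
    return found


# Hypothetical argument fragment built around the braking-system goal above.
g1 = Node("G1", NodeType.GOAL,
          "The braking system is safe for operational use on urban rail "
          "lines at speeds up to 80 km/h")
c1 = Node("C1", NodeType.CONTEXT, "Definition of the operating environment")
s1 = Node("S1", NodeType.STRATEGY, "Argument over identified hazard classes")
g2 = Node("G2", NodeType.GOAL, "Hazard class A is mitigated")
g3 = Node("G3", NodeType.GOAL, "Hazard class B is mitigated")  # no support yet
sn1 = Node("Sn1", NodeType.SOLUTION, "Test report for hazard class A")

g2.children.append(sn1)
s1.children.extend([g2, g3])
g1.children.extend([c1, s1])

print(undeveloped_goals(g1))  # ['G3']
```

The check only establishes that every goal has something beneath it. Whether the solution actually demonstrates the goal is a separate question that no structural traversal can answer.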

Claims, Arguments, Evidence (CAE)

CAE is a simpler framework, originating in work by Robin Bloomfield and Peter Bishop at Adelard and drawing on Stephen Toulmin’s model of argumentation. (Tim Kelly's work at York is associated with GSN, not CAE.) Where GSN is graphical and formal, CAE can be expressed in structured prose, tables, or simple diagrams.

The structure is exactly what the name says:

Claim — an assertion about safety (e.g., “All identified software failure modes have been mitigated to tolerable probability”)
Argument — the logical reasoning connecting the claim to supporting evidence
Evidence — the artifacts that justify the argument (test results, hazard analyses, verification records, certification data)

CAE is less visually rich than GSN but more flexible in how it is documented. Many organizations use CAE structure in prose sections of a safety case document, sometimes alongside GSN diagrams for the top-level argument.
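Because CAE maps so directly onto structured prose, one entry can be held as a three-field record and rendered into a document section. The sketch below assumes a hypothetical `CaeEntry` type and `render` function; the argument text and evidence names are invented for illustration and are not drawn from any standard template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaeEntry:
    claim: str
    argument: str
    evidence: tuple[str, ...]


def render(entry: CaeEntry) -> str:
    """Render one CAE entry as structured prose, flagging missing evidence."""
    lines = [
        f"Claim: {entry.claim}",
        f"Argument: {entry.argument}",
        "Evidence:",
    ]
    if entry.evidence:
        lines.extend(f"  - {item}" for item in entry.evidence)
    else:
        lines.append("  (none recorded)")
    return "\n".join(lines)


entry = CaeEntry(
    claim="All identified software failure modes have been mitigated "
          "to tolerable probability",
    argument="Each failure mode in the hazard analysis has a verified mitigation",
    evidence=("Hazard analysis report", "Verification records"),
)
print(render(entry))
```

Rendering an explicit "(none recorded)" line keeps an unsupported claim visible in the document instead of letting it read as complete.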

Both frameworks share the same failure mode: arguments that are syntactically complete but evidentially hollow. A goal can be fully decomposed in a GSN diagram while the solution nodes point to evidence that does not actually demonstrate what the argument requires. Reviewers — and regulators — look for this. The strength of a safety case is not the elegance of its notation; it is the quality and traceability of the evidence underneath.

How Safety Cases Are Used Across Domains

Aerospace

In civil aerospace, the primary regulatory frameworks are DO-178C (software), DO-254 (complex hardware), and ARP4754A (systems). None of these uses the term “safety case” explicitly, but all require structured demonstration of safety objectives through traceable plans, analyses, and verification records.

Military aerospace under DEF STAN 00-056 is more explicit: a Safety Case is a contractual deliverable. It must demonstrate that the system achieves an acceptable level of safety against defined safety requirements, using a structured argument supported by evidence from the design, analysis, and verification lifecycle.

The aircraft-level safety case typically references subordinate safety cases for major systems — avionics, propulsion, flight controls — which in turn reference software and hardware component safety arguments. The hierarchy must be coherent: the evidence at lower levels must actually support the claims made at higher levels.

Defense

UK defense programs under DEF STAN 00-056 require both a safety case and a Safety Management Plan. The safety case must be iteratively updated through the acquisition lifecycle — a preliminary safety case at concept, a developed safety case at design maturity, and an operational safety case before entry into service.

The US defense community uses somewhat different terminology (e.g., the Safety Assessment Report and formal risk acceptance under MIL-STD-882E), but the underlying requirement is structurally similar: demonstrate through analysis and evidence that residual risk is acceptable to the approving authority.

Defense safety cases are typically classified at some level, which creates its own version of the traceability problem: evidence must be accessible to reviewers who may have different clearance levels or organizational roles.

Rail

EN 50129 (railway signaling) explicitly requires a Safety Case as a defined deliverable. The standard distinguishes between a Generic Product Safety Case (for a type of system), a Generic Application Safety Case (for a class of applications), and a Specific Application Safety Case (for a particular installation).

The layered structure matters because railway signaling equipment is often developed by one party (a product vendor), applied by another (a systems integrator), and operated by a third (the infrastructure owner). The safety case hierarchy must maintain coherent argument chains across organizational boundaries, which in practice requires rigorous traceability of requirements, hazards, and verification records between parties.

Nuclear

Nuclear licensing worldwide requires demonstration of safety through structured cases, though terminology and formality vary by national regulator. In the UK, the Office for Nuclear Regulation expects a Safety Case that covers both design-basis and beyond-design-basis scenarios, with formal hazard analysis, safety classification of structures and systems, and documented verification that safety requirements are met.

Nuclear safety cases are notable for their lifetime scope: they must be maintained and updated across decades of operation, as modifications are made, aging effects are assessed, and operating experience is incorporated. This is where evidence chain integrity becomes a generational problem, not just a project-level one.

The Evidence Chain Problem

Understanding the structure of a safety case is the straightforward part. Building one that actually holds together is harder, and the difficulty concentrates in one place: the evidence chain.

A safety case argument is only as strong as the evidence that discharges its lowest-level goals. That evidence is generated throughout the engineering lifecycle — in requirements documents, hazard analyses, design specifications, verification plans, test procedures, test results, and review records. For large programs, this evidence may span hundreds of thousands of artifacts across multiple organizations, tools, and document repositories.

The practical consequence is that most safety case failures are not argument failures. They are evidence failures:

Evidence exists but cannot be shown to trace to the specific requirement or hazard it is claimed to address.
Verification was performed against a version of a requirement that was subsequently changed, and the change was not flagged for re-verification.
A hazard mitigation was specified in a requirement, the requirement was verified, but the verification method used does not actually demonstrate that the mitigation works.
A safety argument assumes a particular operational context, but that context was never formally specified or checked against the deployment environment.

These failures are not exotic. They recur in safety case audits of any significant scale, and they are failures of traceability.
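The first two failure types in the list above are mechanical enough to check automatically. The sketch below assumes each requirement carries a version number and each verification record stores the version it was run against. The types, field names, and identifiers are hypothetical. The last two failure types, inadequate verification method and unchecked operational context, need engineering judgment and are not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    req_id: str
    version: int  # current version of the requirement


@dataclass(frozen=True)
class VerificationRecord:
    record_id: str
    req_id: Optional[str]            # requirement the record claims to verify
    verified_version: Optional[int]  # requirement version it was run against


def audit(requirements, records):
    """Return (record_id, finding) pairs for untraced or stale evidence."""
    current = {r.req_id: r.version for r in requirements}
    findings = []
    for rec in records:
        if rec.req_id is None or rec.req_id not in current:
            findings.append((rec.record_id, "untraced"))
        elif rec.verified_version != current[rec.req_id]:
            findings.append((rec.record_id, "stale"))
    return findings


requirements = [Requirement("REQ-1", 1), Requirement("REQ-2", 3)]
records = [
    VerificationRecord("VR-1", "REQ-1", 1),  # traces to the current version
    VerificationRecord("VR-2", "REQ-2", 2),  # requirement changed after test
    VerificationRecord("VR-3", None, None),  # evidence with no trace link
]
print(audit(requirements, records))  # [('VR-2', 'stale'), ('VR-3', 'untraced')]
```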

How Modern Traceability Tools Build the Evidence Chain

The connection between safety cases and requirements management is direct: requirements are how safety objectives become engineering constraints, and verification records are how compliance with those constraints is demonstrated. A safety case that cannot trace its evidence to specific verified requirements is not a safety case — it is an unsubstantiated claim.

Legacy requirements tools were built around document management. IBM DOORS, in its original form, stores requirements in module hierarchies with link tables. The links exist, but understanding what the links mean — which requirements are safety-critical, which verification methods are adequate for which claim types, where coverage gaps exist — requires manual analysis and separate tooling. The evidence chain is technically present but not operationally interrogable.

More recent platforms have shifted toward graph-based models of requirements and traceability. In a graph model, every requirement, hazard, design element, verification record, and test result is a node, and the relationships between them are typed edges. This makes the evidence chain a first-class data structure rather than a byproduct of document linking.
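A graph model of this kind can be sketched in a few lines. The example below is a generic illustration under stated assumptions, not any vendor's data model or API: the `TraceGraph` class, the node kinds, and the relation names are all hypothetical.

```python
from collections import defaultdict


class TraceGraph:
    """Minimal typed-edge graph: nodes have a kind, edges have a relation."""

    def __init__(self):
        self.kinds = {}                   # node id -> kind, e.g. "requirement"
        self.incoming = defaultdict(set)  # (target, relation) -> source ids

    def add_node(self, node_id, kind):
        self.kinds[node_id] = kind

    def add_edge(self, source, relation, target):
        if source not in self.kinds or target not in self.kinds:
            raise KeyError("both endpoints must be added as nodes first")
        self.incoming[(target, relation)].add(source)

    def unverified_requirements(self):
        """Return requirements that have no incoming 'verifies' edge."""
        return sorted(
            node_id
            for node_id, kind in self.kinds.items()
            if kind == "requirement"
            and not self.incoming.get((node_id, "verifies"))
        )


graph = TraceGraph()
graph.add_node("HAZ-1", "hazard")
graph.add_node("REQ-1", "requirement")
graph.add_node("REQ-2", "requirement")
graph.add_node("TEST-1", "verification")
graph.add_edge("REQ-1", "mitigates", "HAZ-1")
graph.add_edge("REQ-2", "mitigates", "HAZ-1")
graph.add_edge("TEST-1", "verifies", "REQ-1")

print(graph.unverified_requirements())  # ['REQ-2']
```

Because the relation is part of the edge key, a coverage question becomes a lookup on the graph and no longer depends on someone interpreting a link table by hand.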

Flow Engineering takes this approach to requirements and traceability. Its graph-based model allows teams to define explicit relationship types — “satisfies,” “verifies,” “mitigates,” “derived from” — and to query the resulting network to identify coverage gaps before a safety case review rather than during one. A requirement that has no associated verification record is visible as an uncovered node. A change to a hazard mitigation requirement automatically surfaces the downstream verification records that need to be re-evaluated.

The practical implication for safety cases is that the evidence chain becomes something you can interrogate rather than something you have to reconstruct. When a solution node in a GSN diagram points to a verification record, that record can trace back through the graph to the requirement it verifies, the hazard it addresses, and the safety objective that hazard sits under. The argument and the evidence are structurally linked, not just narratively asserted.
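The trace-back from a verification record to a safety objective can be sketched as a walk over typed edges in a fixed order. The edge list, identifiers, and the relation name `falls_under` are assumptions made for this example. An empty result means the chain is broken somewhere between the evidence and the objective.

```python
# Each edge is (source, relation, target). All identifiers are hypothetical.
EDGES = [
    ("VR-1", "verifies", "REQ-1"),
    ("REQ-1", "mitigates", "HAZ-1"),
    ("HAZ-1", "falls_under", "OBJ-1"),
    ("VR-2", "verifies", "REQ-2"),  # REQ-2 is not linked to any hazard
]

# Order in which relations are followed from evidence up to the objective.
CHAIN = ("verifies", "mitigates", "falls_under")


def trace_chains(start, edges, relations):
    """Return every complete chain that follows `relations` in order."""
    if not relations:
        return [[start]]
    first, rest = relations[0], relations[1:]
    chains = []
    for source, relation, target in edges:
        if source == start and relation == first:
            for tail in trace_chains(target, edges, rest):
                chains.append([start] + tail)
    return chains


print(trace_chains("VR-1", EDGES, CHAIN))  # [['VR-1', 'REQ-1', 'HAZ-1', 'OBJ-1']]
print(trace_chains("VR-2", EDGES, CHAIN))  # []
```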

Flow Engineering is purpose-built for hardware and systems programs rather than software-only or general enterprise use, which reflects the domain where safety cases are most demanding — where requirements span multiple engineering disciplines, where hardware and software interact, and where verification methods include physical tests, analyses, and inspections rather than code coverage alone.

Practical Starting Points for Building a Safety Case

Start with the top-level safety claim. Before structuring anything else, write the claim you are trying to make. “This system is safe” is not a claim — it is a platitude. “This system satisfies the safety requirements specified in [reference standard] for [specific use] in [specific environment]” is a claim. If you cannot write this sentence precisely, the rest of the safety case will drift.

Map your hazards to your argument structure. The safety argument is typically organized around identified hazard classes or safety functions. Your hazard log is not separate from your safety case — it is the spine of the argument. Every branch of the GSN diagram or CAE structure should trace back to a specific hazard or set of hazards.

Identify your evidence before you structure your argument. It is tempting to build the argument top-down and then go find the evidence. The risk is constructing an argument that sounds right but cannot be discharged. Start with the evidence you actually have — test records, analyses, verification reports — and build the argument from what can be substantiated.

Treat the evidence chain as a living artifact. A safety case assembled once and filed is not a safety case — it is a snapshot that degrades. Every requirement change, every design modification, every new test result needs to be assessed for its effect on the argument. This requires a traceability model that makes change impact visible, not a document review process conducted annually.
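Making change impact visible amounts to a reachability query: given a changed artifact, find everything that depends on it, directly or indirectly. The sketch below uses a breadth-first search over a hypothetical `DEPENDS_ON` mapping. The artifact identifiers and the shape of the mapping are assumptions for illustration.

```python
from collections import deque

# Maps each artifact to the artifacts it depends on (hypothetical data).
DEPENDS_ON = {
    "REQ-7": ["HAZ-3"],     # requirement mitigates the hazard
    "TEST-12": ["REQ-7"],   # test verifies the requirement
    "SOL-4": ["TEST-12"],   # solution node cites the test result
    "TEST-13": ["REQ-8"],   # unrelated branch
}


def impacted(changed, depends_on):
    """Return all artifacts downstream of `changed`, sorted by identifier."""
    # Invert the mapping so each artifact points to its direct dependents.
    dependents = {}
    for artifact, upstream in depends_on.items():
        for item in upstream:
            dependents.setdefault(item, set()).add(artifact)

    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


print(impacted("HAZ-3", DEPENDS_ON))  # ['REQ-7', 'SOL-4', 'TEST-12']
```

Everything in the returned list is a candidate for re-assessment. Whether a given item actually needs re-verification remains an engineering decision.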

Plan for the review audience. Safety cases are reviewed by independent safety assessors, regulators, and in some domains, military or nuclear authorities with significant technical depth and no tolerance for hand-waving. The question they will ask of every solution node is: “Does this evidence actually demonstrate what the argument requires?” Structure your case so that question has an unambiguous answer.

A safety case is ultimately an accountability instrument. It documents who claimed what, on what basis, and with what evidence. The engineering work that produces safe systems is what actually makes systems safe — but the safety case is what makes that safety demonstrable, auditable, and defensible when the system is deployed in the real world.