Humanoid Robotics: The Systems Engineering Problem Nobody Is Talking About

The demos are extraordinary. Figure 02 hands a bag of groceries to a human and explains in plain English what it is doing. Agility Robotics’ Digit walks a warehouse floor, avoids a forklift, and resumes its route without intervention. Physical Intelligence’s foundation models adapt robot behavior to novel objects in real time. Apptronik’s Apollo stacks cases in a distribution center alongside human workers.

Watch these videos and you could reasonably conclude that humanoid robotics is an execution problem — just manufacturing, cost-down, and go-to-market. The hard science is done.

It is not done. There is a systems engineering crisis building underneath the industry’s commercial momentum, and almost nobody is discussing it publicly. The crisis has two interlocked parts: humanoids cannot be meaningfully specified using the requirements engineering tools the industry has inherited, and the functional safety standards designed for industrial robots are structurally incompatible with how humanoids actually operate.

This is not a criticism of any one company. It is a structural observation about a technology category that has matured faster than the engineering discipline around it.

What Makes Humanoids Different From Every Prior Robot Category

To understand why existing frameworks fail, you need to understand what makes humanoids categorically different from their predecessors.

Industrial manipulators — the kind governed by ISO 10218 and the collaborative robot technical specification ISO/TS 15066 — operate in a known, controlled environment. Their workspace is defined. The set of objects they interact with is enumerated. The humans who enter their operating zone do so in a structured way, under controlled conditions. The robot’s behavioral envelope is finite and auditable.

Autonomous mobile robots (AMRs) like those from MiR or Fetch added mobility but kept the task domain narrow: move a payload from point A to point B. Their operational design domain is the warehouse floor. The cast of characters — humans, forklifts, shelving — is known.

Humanoids break both of these constraints simultaneously. They are designed to operate in general environments alongside arbitrary humans performing arbitrary tasks. That is the product proposition. A humanoid that only works in one carefully controlled setting is just an expensive AMR with arms.

This means the canonical safety engineering question — “enumerate the hazards associated with this system in its operating environment” — becomes effectively unanswerable. You cannot enumerate hazards across an unbounded environment with unbounded human interaction patterns.

The autonomous vehicle industry encountered a version of this problem with its Operational Design Domain (ODD) concept. The ODD defines the conditions under which an autonomous system is designed to operate safely — weather, road type, speed range, geographic region. Every safety case for an AV is scoped to its ODD. When the vehicle leaves the ODD, it must either hand off to a human or bring itself to a safe state.
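
A minimal sketch of that pattern helps fix the idea; the condition names, thresholds, and fallback actions below are invented for illustration, not any vendor’s actual envelope.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    CONTINUE = auto()      # conditions are inside the design envelope
    HANDOFF = auto()       # request a human takeover
    MINIMAL_RISK = auto()  # bring the system to a safe state


@dataclass
class ODD:
    """Illustrative operational design domain: every field is a hard bound."""
    max_speed_mps: float
    allowed_weather: frozenset
    geofence: frozenset


@dataclass
class WorldState:
    speed_mps: float
    weather: str
    region: str
    operator_available: bool


def supervise(odd: ODD, state: WorldState) -> Action:
    """The core AV pattern: the safety case only covers states inside the ODD."""
    inside = (
        state.speed_mps <= odd.max_speed_mps
        and state.weather in odd.allowed_weather
        and state.region in odd.geofence
    )
    if inside:
        return Action.CONTINUE
    # Outside the ODD, the system must not keep operating autonomously.
    return Action.HANDOFF if state.operator_available else Action.MINIMAL_RISK


odd = ODD(max_speed_mps=20.0,
          allowed_weather=frozenset({"clear", "rain"}),
          geofence=frozenset({"zone_a"}))
state = WorldState(speed_mps=12.0, weather="snow",
                   region="zone_a", operator_available=False)
assert supervise(odd, state) is Action.MINIMAL_RISK  # snow is outside the ODD
```

The sketch turns on the single `inside` predicate, which is only writable because the ODD’s bounds are finite and known in advance.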

Humanoid robots have no ODD. That is the product proposition. And that means the entire safety case methodology inherited from both industrial robotics and autonomous vehicles breaks down at the definitional level.

The FMEA Problem

Traditional Failure Mode and Effects Analysis asks engineers to enumerate failure modes, assign severity and probability, and design mitigations. It is a powerful tool for systems with a finite, auditable behavioral space.

AI-driven systems do not have a finite behavioral space. A humanoid running a learned policy or a foundation model can exhibit behaviors that were never explicitly programmed and were not present during testing. This is not a bug — it is the mechanism by which these systems generalize to novel situations. But it renders FMEA incomplete by construction.

Consider a concrete example. A humanoid working alongside a human in a kitchen misidentifies a person’s forearm as a cylindrical object it should grasp. This failure mode does not appear in any component-level FMEA because no single component failed. The sensor returned accurate depth data. The perception model returned a plausible object classification. The grasp planner selected an appropriate grasp for the classified object. Every subsystem performed within specification. The system-level behavior was dangerous.
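
A sketch of that chain makes the point concrete: every interface check passes, yet the composed behavior is hazardous. The function names, thresholds, and classification label are invented for illustration, not drawn from any real perception or grasping stack.

```python
def perceive(depth_image):
    """Returns (label, confidence). Geometry is accurate; the label is merely plausible."""
    return "cylinder_7cm", 0.91  # a forearm really is roughly a 7 cm cylinder


def plan_grasp(label, confidence, min_confidence=0.85):
    """Component spec: only grasp objects classified above the confidence threshold."""
    assert confidence >= min_confidence  # passes: 0.91 >= 0.85
    return {"target": label, "grip_force_n": 35.0}


def check_grasp(plan, max_force_n=60.0):
    """Component spec: keep grip force under the object-handling limit."""
    assert plan["grip_force_n"] <= max_force_n  # passes: 35 <= 60
    return plan


plan = check_grasp(plan_grasp(*perceive(depth_image=None)))
# Every assertion passed and no FMEA row fired. The hazard lives in the gap
# between "7 cm cylinder" and "a person's forearm" -- a system-level property
# that no single component owns.
```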

This is an emergent failure mode — it arises from the interaction of correctly functioning components in an unanticipated context. Traditional FMEA has no native mechanism for capturing this class of failure. STPA (System-Theoretic Process Analysis) does better, because it focuses on control structure and unsafe control actions rather than component failures. But STPA was not designed for systems where the control structure itself adapts dynamically through learning.
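
By contrast, an STPA-style pass over the same scenario asks when the control action “close gripper” becomes unsafe, regardless of whether any component failed. The four guide-word categories below are standard STPA; the example contexts are ours.

```python
# STPA's four ways a control action can be unsafe, applied to "close gripper".
UNSAFE_CONTROL_ACTIONS = [
    {"type": "provided causes hazard",
     "context": "the target volume overlaps a human body part"},
    {"type": "not provided causes hazard",
     "context": "an object already lifted begins to slip above a person"},
    {"type": "too early / too late / wrong order",
     "context": "the grasp is issued before the human has released the object"},
    {"type": "stopped too soon / applied too long",
     "context": "the grip is maintained after a human pulls the object away"},
]

# Each entry becomes a safety constraint on the controller, e.g. "close gripper
# must not be provided when the target volume overlaps a human body part" --
# a constraint on behavior in context, not a failure rate on a component.
```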

The industry is aware of this problem but has not converged on a methodology. Several teams are adapting techniques from the AV space — scenario-based testing, simulation-based coverage metrics, behavioral specification languages. These are useful but incomplete. A simulation can only cover scenarios the simulation designers imagined.

The Standards Gap

ISO 10218-1 and -2 govern industrial robot safety. ISO/TS 15066 extends this to collaborative robots that share workspace with humans. These are mature, well-understood standards with clear conformance paths.

None of them apply to humanoids operating in unstructured environments. The standards assume:

A defined operating space. ISO 10218 is built around the concept of a safeguarded space — a zone that can be protected with physical barriers, light curtains, or presence-detection systems. A humanoid walking through a house has no safeguarded space. The entire environment is the operating space.

A fixed task structure. Collaborative robot risk assessments assume a known set of tasks with known force and speed profiles. The maximum permissible contact force for a cobot is derived from the task (a sketch of that calculation follows this list). A general-purpose humanoid does not have a fixed task structure — its task portfolio is theoretically open-ended.

A human who can anticipate robot behavior. ISO/TS 15066’s biomechanical injury thresholds assume the human can see and respond to robot motion. But emergent AI behavior may not be predictable even to a trained operator standing next to the system.
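
To see how tightly the fixed-task assumption binds, here is a rough sketch of the power-and-force-limiting arithmetic that sits behind ISO/TS 15066: the permissible approach speed falls out of which body region can be struck and the effective masses involved. The two-body contact model is textbook; the numeric constants are approximate illustrations, not values quoted from the specification’s tables.

```python
import math


def max_relative_speed(f_max_n, k_n_per_m, m_robot_kg, m_body_kg):
    """Transient-contact model: the energy the body region can absorb,
    E = F_max^2 / (2k), is equated with the kinetic energy of the
    two-body system, E = 0.5 * mu * v^2, and solved for v."""
    mu = 1.0 / (1.0 / m_robot_kg + 1.0 / m_body_kg)  # reduced mass
    return f_max_n / math.sqrt(mu * k_n_per_m)


# Illustrative, approximate inputs: chest transient force limit ~280 N,
# chest stiffness ~25 N/mm, effective mass of a moving humanoid arm ~15 kg,
# effective mass of the torso ~40 kg.
v = max_relative_speed(f_max_n=280.0, k_n_per_m=25_000.0,
                       m_robot_kg=15.0, m_body_kg=40.0)
print(f"permissible approach speed ~ {v:.2f} m/s")  # roughly 0.5 m/s
```

The calculation closes only because the task pins down which body region can be contacted and with what effective mass. An open-ended task portfolio leaves those inputs undefined, and with them the force and speed limits the standard is built on.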

The standards gap is not merely technical. It has commercial consequences. Industrial customers with safety compliance obligations — which is most of the target market for humanoids — cannot easily deploy these systems without a conformance path. Insurance underwriters cannot price coverage without a conformance path. Some regulated industries cannot deploy at all.

The ISO Technical Committee 299 (Robotics) has working groups examining this gap. IEC is doing similar work in the functional safety domain. Progress is real but slow — standards development cycles run five to seven years under normal conditions. Companies deploying humanoids commercially in 2026 and 2027 are doing so ahead of any applicable standard.

This is not unprecedented. The AV industry operated ahead of its regulatory framework for years. What it means in practice is that deployment decisions are being made on internal safety cases of widely varying rigor, with no external benchmark to evaluate them against.

How Requirements Engineering Fails at Humanoid Scale

Step back from safety standards for a moment and look at the requirements engineering problem upstream.

A humanoid robot has thousands of requirements spanning mechanical, electrical, software, perception, control, human-machine interaction, safety, regulatory, and operational domains. These requirements are not independent. A change to the torque limit on a wrist joint affects the grasp planner, which affects the task planner, which affects the safety envelope, which affects the human-robot interaction design, which may affect regulatory classification.

This is the classic systems engineering problem — managing interdependency across a complex, multi-domain system. The standard tools for this are requirements management platforms: IBM DOORS, Jama Connect, Polarion, Codebeamer. These tools are mature and widely deployed in aerospace and automotive.

The problem is that these tools were designed for document-centric requirements workflows. Requirements are written as text, organized into hierarchies, and linked via manual traceability matrices. This works adequately for systems with relatively stable, well-bounded requirements — a flight control computer, a braking system, a communication protocol.

Humanoids are not that system. Their requirements are not stable — they evolve rapidly as behavioral capabilities expand. They are not well-bounded — the interface between AI policy behavior and system-level requirements is not cleanly separable. And the traceability problem is not a hierarchy problem — it is a graph problem. The dependency structure between humanoid requirements does not decompose into a clean tree. It is a dense network of cross-domain dependencies that document-centric tools represent poorly.

The teams navigating this most effectively have moved toward graph-based requirements models — treating requirements as nodes in a connected graph rather than items in a hierarchy. This lets them trace how a behavioral change in a learned policy propagates through perception, actuation, safety, and regulatory requirements simultaneously, rather than discovering these dependencies when something breaks.
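
A minimal sketch of what graph-based means in practice, using the wrist-torque example from earlier. The requirement identifiers and edges are invented for illustration; a real program would have thousands of nodes and a persistent store rather than a dictionary.

```python
from collections import deque

# Directed edges: a change to the key requirement can invalidate the values.
DEPENDENTS = {
    "MECH-041 wrist torque limit": ["SW-112 grasp planner force model"],
    "SW-112 grasp planner force model": ["SW-088 task planner payload envelope"],
    "SW-088 task planner payload envelope": ["SAF-019 contact force safety envelope"],
    "SAF-019 contact force safety envelope": [
        "HRI-007 handover interaction design",
        "REG-003 collaborative operation classification",
    ],
}


def impact_set(changed):
    """Breadth-first traversal: every requirement reachable from the change."""
    seen, queue, impacted = {changed}, deque([changed]), []
    while queue:
        node = queue.popleft()
        for dep in DEPENDENTS.get(node, []):
            if dep not in seen:
                seen.add(dep)
                impacted.append(dep)
                queue.append(dep)
    return impacted


print(impact_set("MECH-041 wrist torque limit"))
# -> grasp planner, task planner, safety envelope, HRI design, regulatory class
```

In a document-centric tool, that traversal is a person reading traceability matrices; in a graph-native one, it is a query.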

Tools like Flow Engineering, built specifically for hardware and systems engineering on a graph-native architecture, are starting to see adoption in exactly this context — teams that have outgrown the document metaphor and need to reason about requirement propagation across a densely connected system. For humanoid programs where a single design decision can invalidate requirements across six subsystems simultaneously, the ability to query and traverse a live requirements graph is not a workflow convenience. It is a prerequisite for managing the system at all.

What the Industry Is Actually Doing

Despite the structural gaps described above, commercial deployments are proceeding. How are companies managing?

Constrained deployment environments. The commercial deployments happening now — warehouse logistics, light manufacturing — are not actually unstructured. They are semi-structured environments with limited human interaction zones, defined task repertoires, and operating procedures that approximate an ODD even if they are not formalized as one. Companies are essentially creating de facto ODDs through operational design rather than standards compliance.

Internal safety governance. Without external standards to conform to, leading companies are developing internal safety frameworks drawing on automotive functional safety (ISO 26262), AV methodology (UL 4600), and aerospace practices (ARP4754A). These frameworks are better than nothing, but they are not interoperable — a safety case built on Figure’s internal framework tells an industrial customer nothing they can rely on under their own compliance obligations.

Human supervisory control. Several deployments retain a human-in-the-loop for any novel or ambiguous situation. This sidesteps the emergent behavior problem in the near term but does not scale and does not constitute a safety architecture — it is a compensating control that defers the problem.

Simulation-based behavioral coverage. Physical Intelligence and similar teams with foundation model architectures are investing heavily in simulation diversity — attempting to cover as broad a behavioral distribution as possible before real-world deployment. This is genuinely promising but produces coverage metrics that are difficult to interpret in safety terms. “We tested 10 million simulation episodes” is not equivalent to a safety case.
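
A toy sketch of why episode counts and coverage diverge. The scenario parameters, the sampling distribution, and every number below are invented for illustration; the only point is that a sampler concentrated on typical conditions can run a million episodes while visiting a small slice of the scenario space.

```python
import random

# Toy scenario space: 5 discretized parameters, 10 levels each -> 100,000 cells.
PARAMS, LEVELS = 5, 10
total_cells = LEVELS ** PARAMS

random.seed(0)
covered = set()
for _ in range(1_000_000):  # a million cheap simulated episodes
    # Narrow sampler: most probability mass sits on a few "typical" levels.
    cell = tuple(min(LEVELS - 1, max(0, int(random.gauss(4, 1))))
                 for _ in range(PARAMS))
    covered.add(cell)

print(f"episodes: 1,000,000  cells covered: {len(covered)} / {total_cells}")
# Prints a small fraction of the 100,000 cells: high episode counts, thin
# coverage -- and the hazardous cells tend to sit in the tail the sampler
# never visits.
```

Multiplying the episode count by ten improves the covered fraction far less than proportionally; the argument a safety case needs is about the distribution of scenarios, not the number of rollouts.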

The Honest Assessment

Humanoid robotics will reach significant commercial scale. The hardware capabilities demonstrated by Figure, Agility, Apptronik, and their competitors are real, and the market demand from logistics and manufacturing is real. None of the systems engineering gaps described here are permanent barriers.

But the industry is carrying technical debt in its safety and requirements practices that will become increasingly expensive as deployment scale increases. A single high-profile injury incident in a commercial humanoid deployment will trigger regulatory responses that make the current ambiguity look comfortable by comparison.

The companies that will navigate this well are not necessarily the ones with the best hardware or the most capable AI. They are the ones building rigorous requirements engineering and safety practices now — before an incident forces the issue, and before standards bodies establish frameworks that may not reflect operational reality.

That means investing in requirements engineering infrastructure that can handle graph-scale interdependency, adopting safety analysis methods that address emergent AI behavior rather than just component failure, and engaging proactively with ISO/TC 299 and IEC working groups to shape standards that will govern this category.

The demos are extraordinary. The engineering work that makes them safe to deploy at scale has barely started.