Waabi: AI-First Autonomous Trucking and the Requirements Question

How simulation-first development reshapes what it means to write a verifiable requirement

Waabi is not trying to replicate the Waymo playbook at lower cost. The Toronto-based autonomous trucking company, founded in 2021 by Raquel Urtasun and built around a small team of researchers from the University of Toronto, Uber ATG, and Apple, is making a structurally different bet: that the correct architecture for autonomous driving is one where the AI system is trained, validated, and continuously improved inside a high-fidelity simulation environment — Waabi World — rather than primarily on public roads. The trucks accumulate real miles, but the core validation infrastructure is synthetic.

That bet has direct, underappreciated consequences for how Waabi’s engineering team writes and manages requirements. Autonomous trucking is not a domain where you can afford vague acceptance criteria. Federal Motor Carrier Safety Administration regulations, SAE Level 4 definitions, and the actuarial reality of commercial freight operations demand precision. The question Waabi’s systems engineering function has to answer is one that most autonomy programs treat as secondary: what does it actually mean for a requirement to be verifiable, when the primary verification environment is a simulation?

The Current State of Autonomy Validation

The dominant model for autonomous vehicle validation through 2024 has been accumulation. Waymo, Cruise (before its regulatory difficulties), and Aurora have all built programs centered on total autonomous miles, miles between disengagements, and scenario libraries derived from real-world encounters. The logic is defensible: if the system performs without intervention across enough diverse real-world conditions, statistical confidence in its safety accumulates.

The problem is the denominator. The National Highway Traffic Safety Administration estimates roughly 40,000 road deaths annually in the United States, which against roughly three trillion vehicle miles traveled per year is on the order of one fatality per 100 million miles. The scenarios behind the hardest failures are rarer still: unusual intersections, rare weather combinations, atypical pedestrian behavior, sensor-degrading conditions. To encounter those scenarios at statistically meaningful rates on public roads requires billions of miles. Aurora’s commercial launch on I-45 between Dallas and Houston represents genuine engineering progress, but even at commercial scale, real-world fleet operations sample tail-risk scenarios at rates that make statistical validation of safety-critical behavior extremely slow.
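The scale of that denominator can be made concrete with a short calculation. The sketch below assumes events arrive as a Poisson process and asks how many event-free miles are needed before one can claim, at a given confidence, that the true rate is below a target. The target of one event per 100 million miles is an illustrative assumption, and the function name is hypothetical.

```python
import math


def failure_free_miles_required(max_rate_per_mile: float, confidence: float) -> float:
    """Miles that must be driven with zero events to bound the event rate.

    Assumes events follow a Poisson process, so the probability of seeing
    zero events in n miles at rate r is exp(-r * n). Solving
    exp(-r * n) <= 1 - confidence for n gives the expression returned here.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if max_rate_per_mile <= 0.0:
        raise ValueError("max_rate_per_mile must be positive")
    return -math.log(1.0 - confidence) / max_rate_per_mile


# Illustrative target (an assumption): one event per 100 million miles.
target_rate = 1.0 / 100_000_000
miles = failure_free_miles_required(target_rate, confidence=0.95)
print(f"{miles:,.0f} event-free miles")  # roughly 300 million
```

That figure covers a single aggregate rate with no events observed. Bounding each rare scenario class separately, or demonstrating an improvement over the human baseline, pushes the requirement higher still.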

Waabi’s answer is to synthesize those scenarios at scale inside Waabi World, a simulation environment built on neural rendering and learned world models, rather than hand-authored physics engines. The distinction matters. Hand-authored simulators generate scenarios the authors anticipate. Learned world models can generalize to distributions of scenarios the authors did not explicitly program — including, in principle, the distribution of scenarios that actually cause accidents.

What Simulation-First Means for Requirements Engineering

The conventional requirements engineering workflow in automotive and aerospace looks roughly like this: derive requirements from customer and regulatory inputs, allocate them to subsystems, verify each requirement through test, document results in a traceability matrix. “Test” in this context means physical test — hardware-in-the-loop, vehicle-level validation drives, or formal type certification testing. Simulation plays a supporting role: it informs design, stress-tests edge cases, and catches regressions early. But the authoritative verification record is physical.

Waabi’s methodology inverts this priority ordering. Simulation is not the pre-screening layer before real-world validation. It is the primary verification environment. This creates three engineering challenges that conventional requirements tooling was not designed to handle.

First: requirements must be written against a simulation contract, not a physical test procedure. A requirement like “the system shall maintain a minimum following distance of 4 seconds under wet road conditions” is straightforward to verify on a proving ground in rain. In Waabi World, verifying the same requirement demands that the simulation’s wet road model produce physically accurate stopping distances, sensor degradation profiles, and tire-road interaction dynamics. If the simulation model is wrong, the verification is wrong, even if the test case executes cleanly. The simulation fidelity level becomes a precondition of the requirement’s verifiability, and that precondition must itself be documented, validated, and traced.

This is a requirements architecture problem. The fidelity envelope of Waabi World is not a fixed given; it is a living technical artifact that constrains what requirements can be considered verified at any point in the program. Engineering teams that treat simulation as a black-box test executor will miss this dependency entirely.
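One way to picture that dependency is a requirement record that names the simulation sub-models it relies on, with a status that cannot reach "verified" until each of those sub-models has a validation claim for the simulator version in use. The following is a minimal sketch under that assumption; the class names, sub-model names, requirement ID, and version strings are all hypothetical and do not describe Waabi's tooling.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNVERIFIED = "unverified"
    VERIFIED = "verified"
    BLOCKED_ON_FIDELITY = "blocked_on_fidelity"


@dataclass(frozen=True)
class FidelityClaim:
    """Records whether one simulation sub-model was validated at one version."""
    sub_model: str
    sim_version: str
    validated: bool


@dataclass
class Requirement:
    req_id: str
    text: str
    # Sub-models whose fidelity this requirement's verification depends on.
    fidelity_preconditions: list[str] = field(default_factory=list)


def verification_status(
    req: Requirement,
    sim_tests_passed: bool,
    claims: list[FidelityClaim],
    sim_version: str,
) -> tuple[Status, list[str]]:
    """A passing simulation run counts only if every precondition is validated."""
    validated = {
        c.sub_model for c in claims if c.validated and c.sim_version == sim_version
    }
    missing = [m for m in req.fidelity_preconditions if m not in validated]
    if missing:
        return Status.BLOCKED_ON_FIDELITY, missing
    return (Status.VERIFIED if sim_tests_passed else Status.UNVERIFIED), []


req = Requirement(
    "REQ-017",
    "Maintain a minimum following distance of 4 seconds under wet road conditions",
    ["wet_road_friction", "lidar_rain_degradation"],
)
claims = [
    FidelityClaim("wet_road_friction", "2.3", validated=True),
    FidelityClaim("lidar_rain_degradation", "2.2", validated=True),  # stale version
]
status, missing = verification_status(req, True, claims, sim_version="2.3")
print(status.value, missing)
```

In this example the test run passed, but the requirement stays blocked because the rain-degradation model was validated only against an older simulator version.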

Second: scenario coverage replaces procedural test cases as the primary traceability unit. In physical testing, a requirement is verified by executing a defined procedure and recording pass/fail. In simulation, a requirement is verified by demonstrating performance across a distribution of scenarios. “The system shall detect a stationary object in the travel lane at 200 meters under fog conditions” is not verified by running one fog scenario. It is verified by sampling the distribution of fog densities, object types, approach speeds, and lane geometries — and demonstrating that performance meets the criterion across that distribution with a specified confidence level.

This means requirements must be written with explicit scope: what scenario variables are in-scope for the requirement, what distributions those variables are drawn from, and what statistical criterion constitutes satisfaction. That is harder to write than a procedural test case. It also requires traceability tooling that can represent probabilistic verification evidence, not just binary pass/fail records.
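A sketch of what such a requirement could look like in executable form follows. It assumes three in-scope variables with made-up ranges, uses a toy placeholder in place of a real simulator run, and takes the lower end of a Wilson score interval (with z = 1.96, the two-sided 95% value) as the statistical criterion. Every name, range, and threshold here is an assumption chosen for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FogScenario:
    visibility_m: float        # meteorological visibility in fog
    approach_speed_mps: float  # truck speed as the object comes into range
    object_type: str


def sample_scenario(rng: random.Random) -> FogScenario:
    """Draw one scenario from the in-scope distributions (ranges are assumed)."""
    return FogScenario(
        visibility_m=rng.uniform(150.0, 600.0),
        approach_speed_mps=rng.uniform(20.0, 30.0),
        object_type=rng.choice(["stalled_car", "debris", "trailer"]),
    )


def toy_detects_at_200m(scenario: FogScenario) -> bool:
    """Placeholder for a simulator run; a real program would call its simulator."""
    return scenario.visibility_m >= 155.0


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial pass rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def requirement_satisfied(outcomes: list[bool], min_pass_rate: float) -> bool:
    """True only if the pass rate clears the criterion with statistical margin."""
    return wilson_lower_bound(sum(outcomes), len(outcomes)) >= min_pass_rate


rng = random.Random(0)  # fixed seed so the scenario set is reproducible
outcomes = [toy_detects_at_200m(sample_scenario(rng)) for _ in range(10_000)]
print(sum(outcomes) / len(outcomes), requirement_satisfied(outcomes, 0.98))
```

The design point is that the raw pass rate is never compared to the threshold directly. A small sample with a perfect record can still fail the criterion, which is the behavior a confidence-qualified requirement should have.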

Third: the sim-to-real gap becomes a first-class requirement in its own right. Every simulation-verified requirement carries an implicit assumption: that the simulated environment is a sufficiently accurate model of the real world for the requirement’s domain. When Waabi moves trucks from Waabi World onto Texas highways, they are not simply “confirming” simulation results. They are generating evidence about the fidelity of the simulation model itself — specifically, about which requirements were correctly verified in simulation and which were verified against a model that diverged from reality.

Managing this loop — using real-world data to bound simulation fidelity, using bounded fidelity to scope simulation verification claims, and propagating fidelity bounds back to requirement status — is a continuous process. It requires traceability infrastructure that connects physical test observations to simulation model parameters to requirement verification status. In most programs running conventional requirements management tools, that loop is closed manually, slowly, and incompletely.
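A minimal sketch of closing that loop automatically: road observations are compared against what the simulator predicted for the same conditions, any simulation parameter whose discrepancy exceeds its tolerance is flagged, and every verified requirement that depends on a flagged parameter is sent back for re-verification. The parameter names, tolerances, requirement IDs, and measurements below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoadObservation:
    sim_parameter: str       # simulation model parameter this observation tests
    sim_prediction: float    # what the simulator predicted for these conditions
    real_measurement: float  # what was measured on the road


def propagate_fidelity(
    observations: list[RoadObservation],
    tolerances: dict[str, float],
    depends_on: dict[str, list[str]],
    status: dict[str, str],
) -> tuple[dict[str, str], set[str]]:
    """Return updated requirement statuses and the set of diverged parameters."""
    diverged = {
        obs.sim_parameter
        for obs in observations
        if abs(obs.sim_prediction - obs.real_measurement) > tolerances[obs.sim_parameter]
    }
    updated = dict(status)
    for req_id, params in depends_on.items():
        # Only claims that were previously accepted need to be reopened.
        if updated.get(req_id) == "verified" and diverged.intersection(params):
            updated[req_id] = "reverify"
    return updated, diverged


observations = [
    RoadObservation("wet_braking_distance_m", 92.0, 101.5),
    RoadObservation("fog_lidar_range_m", 180.0, 178.0),
]
tolerances = {"wet_braking_distance_m": 5.0, "fog_lidar_range_m": 10.0}
depends_on = {
    "REQ-017": ["wet_braking_distance_m"],
    "REQ-042": ["fog_lidar_range_m"],
}
status = {"REQ-017": "verified", "REQ-042": "verified"}

updated, diverged = propagate_fidelity(observations, tolerances, depends_on, status)
print(updated, diverged)
```

Here the wet braking model misses by 9.5 m against a 5 m tolerance, so REQ-017 is reopened while REQ-042 keeps its status.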

What Waabi Is Actually Doing Well

Waabi World’s technical architecture addresses the scenario coverage problem more directly than most industry observers credit. The use of learned world models — neural representations of sensor behavior, environmental conditions, and agent interactions — allows Waabi to synthesize scenarios that are drawn from the actual distribution of real-world driving rather than from an engineer’s imagination. When a real-world encounter produces a novel sensor artifact, that artifact can be incorporated into the simulation model and immediately used to generate thousands of synthetic variants. The feedback loop between real-world data collection and simulation model update is substantially tighter than in hand-authored simulation environments.

This matters for requirements verification because it changes the nature of scenario coverage gaps. In a hand-authored simulator, a gap in scenario coverage is invisible until someone explicitly adds a new scenario type. In a learned simulator, coverage gaps are bounded by the distribution of the training data — which means they can be estimated, tracked, and systematically reduced as real-world data accumulates. The gap is not eliminated, but it is tractable in a way that hand-authored simulation gaps are not.

Waabi’s focus on long-haul trucking in structured highway environments also reflects sound systems engineering judgment. The operational design domain is narrow by design: highway trucking between fixed terminal pairs has a dramatically smaller scenario space than urban robotaxi operation. That restriction makes scenario coverage claims more defensible and the sim-to-real gap smaller and easier to characterize. It is not a concession to technical difficulty; it is a deliberate architecture choice that keeps the requirements verification problem tractable.

Where the Approach Leaves Open Questions

The clearest unresolved challenge is not technical — it is institutional. Simulation-generated verification evidence does not yet have an established pathway to regulatory acceptance in the United States. The FHWA, FMCSA, and NHTSA have not published frameworks for evaluating simulation-based safety cases with the same specificity as physical test standards. Waabi, like every simulation-first autonomy program, is operating in a regulatory environment that was built around the assumption that verification means physical testing.

The practical implication is that Waabi’s engineering team must maintain two parallel verification arguments: the simulation-based argument that drives internal development decisions, and the real-world operational record that forms the externally legible safety case. That duplication is not inefficiency — it is a necessary response to the regulatory environment — but it means that the productivity gains from simulation-first development are partially offset by the cost of maintaining that parallel evidence base.

The second open question is auditable traceability of probabilistic verification claims. When a requirement is verified against a distribution of 500,000 synthetic scenarios, the verification record is not a single pass/fail data point — it is a statistical summary over a scenario set that itself has provenance (how were the scenarios generated? what distributions were sampled? what was the simulation model version?). Capturing that provenance in a form that is auditable, reproducible, and traceable to the originating requirement is a tooling problem that the industry has not fully solved. Modern graph-based requirements platforms are better positioned to represent this kind of evidence structure than legacy document-based tools, but even they require careful schema design to handle probabilistic verification records correctly.
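As one illustration of what such a schema might capture, the sketch below stores the provenance fields named above alongside the statistical summary and derives a stable fingerprint from them, so that an auditor can confirm a cited record has not changed and can tell which inputs to replay. The field names, requirement ID, and distribution descriptions are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class VerificationRecord:
    requirement_id: str
    sim_model_version: str
    generator_seed: int
    sampled_distributions: dict  # variable name -> distribution description
    scenarios_run: int
    scenarios_passed: int

    def fingerprint(self) -> str:
        """SHA-256 over a canonical JSON form, stable across key ordering."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = VerificationRecord(
    requirement_id="REQ-042",
    sim_model_version="2.3",
    generator_seed=0,
    sampled_distributions={
        "visibility_m": "uniform(150, 600)",
        "approach_speed_mps": "uniform(20, 30)",
    },
    scenarios_run=500_000,
    scenarios_passed=497_500,
)

# The same evidence replayed on a different simulator version is a new record.
rerun = replace(record, sim_model_version="2.4")
print(record.fingerprint()[:12], rerun.fingerprint()[:12])
```

Hashing the provenance together with the result means a change to the simulator version, the seed, or the sampled distributions produces a different fingerprint even when the pass counts are identical.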

What This Means for the Autonomy Validation Field

Waabi’s simulation-first architecture is a preview of where high-assurance autonomy validation will eventually have to go for the entire industry. The math on road-miles-based validation does not close for tail-risk scenarios at any commercially feasible fleet size. At some point, every serious autonomy program will need to make simulation-generated evidence a primary input to its safety case, not a supplementary one.

The requirements engineering implications of that shift are consequential. Requirements will need to be written with explicit simulation scope and fidelity preconditions. Verification evidence will be statistical, not procedural. Traceability will need to connect real-world observations back to simulation model parameters and forward to requirement verification status. The organizational boundary between test engineering and simulation engineering will dissolve.

The tools that exist today — built around document-centric requirement capture, binary verification status, and manual traceability — will struggle with this model. What the field needs is requirements infrastructure that treats simulation fidelity as a traceable artifact, supports probabilistic verification records natively, and can propagate the real-world feedback loop back to requirement status automatically. Tools like Flow Engineering, which represent requirements as connected graph structures rather than flat documents, are closer to that architecture than legacy platforms — but even they will need to extend their evidence models to handle simulation-provenance data at the scale Waabi’s methodology demands.

Honest Assessment

Waabi is solving a real problem with a technically sound approach. The simulation-first methodology is not hype — it is a direct response to the scalability ceiling of road-miles-based validation, and the company’s technical leadership has the research depth to execute it credibly.

The genuine risks are institutional, not technical. Regulatory frameworks have not caught up. The tooling for managing simulation-based requirements verification evidence at scale does not fully exist. And the commercial timeline for long-haul autonomous trucking continues to be subject to the same infrastructure, insurance, and operational dependencies that have stretched every autonomy program’s horizon.

But the requirements question Waabi is forcing the industry to confront is the right one. If a requirement can only be verified on a public road, at road-test speeds, in conditions that occur at a rate of once per million miles, then it is not a requirement the industry can actually close. Waabi’s bet is that “verifiable in high-fidelity simulation with bounded real-world correspondence” is a better requirements engineering standard than “theoretically verifiable on a road we have not yet driven enough of.” That bet may be correct. The answer will determine how the autonomy industry writes requirements for the next decade.