What Is the Test and Evaluation Master Plan (TEMP)?
A Test and Evaluation Master Plan is the document that answers one question for a defense or large government program: how will we prove this system does what we said it would do? The TEMP is not a test script, a test report, or a test schedule. It is the governing strategy that connects the program’s requirements baseline to every test event planned across the system’s development and operational evaluation lifecycle.
For programs subject to Department of Defense acquisition policy — specifically those governed by DoDI 5000.02 and its successor directives — the TEMP is a mandatory acquisition artifact. It must exist before testing begins, it must be approved by the appropriate test authority, and it must be updated whenever the program’s requirements or test approach changes materially. Failure to maintain a current TEMP is not a documentation problem; it is a program execution problem, because decisions about test resource allocation, range scheduling, and operational test entry are all gated on what the TEMP says.
What the TEMP Actually Contains
A well-structured TEMP is organized around four core elements: the test strategy, the critical parameters, the test and evaluation outline, and the resource summary.
The test strategy states the integrated logic connecting developmental test and evaluation (DT&E), operational test and evaluation (OT&E), and — for applicable programs — live fire test and evaluation (LFT&E). It explains why tests are sequenced the way they are, what assumptions underlie the plan, and what conditions must be met before each test phase begins. The strategy section is where the program articulates how early developmental results will inform operational test planning, and how risks identified in DT&E will be dispositioned before OT&E entry.
Critical parameters are the quantified thresholds the system must meet. These appear in two forms. Critical Technical Parameters (CTPs) are engineering measures — range, accuracy, mean time between failures — that characterize whether the system was built correctly. Critical Operational Issues (COIs) are operationally framed questions — Can the system be maintained by the unit’s organic maintainers? Does it integrate with the existing command and control architecture? — that characterize whether the system will be effective and suitable in realistic operational conditions. Every CTP and COI in the TEMP must be derived from the program’s approved requirements documents: the Capability Development Document (CDD) or its equivalent.
The test and evaluation outline is the most detailed section. It maps each CTP and COI to specific test events, identifies who conducts each event, specifies the data collection requirements, and establishes the evaluation criteria that will be used to judge results. This is where the traceability requirement becomes concrete: if a parameter exists in the requirements baseline and does not appear in the T&E outline, the TEMP is incomplete by definition.
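The coverage rule stated above can be expressed as a simple set computation. The sketch below is illustrative only; the requirement identifiers and event names are hypothetical, not drawn from any real program.

```python
# Illustrative coverage check: every requirement in the baseline must map
# to at least one test event in the T&E outline. All identifiers below
# are hypothetical examples.

requirements_baseline = {"SRS-0147", "SRS-0148", "SRS-0152", "SRS-0201"}

# T&E outline: test event -> requirements it evaluates
te_outline = {
    "DT-Event-03": {"SRS-0147", "SRS-0148"},
    "OT-Scenario-1": {"SRS-0152"},
}

covered = set().union(*te_outline.values())
gaps = requirements_baseline - covered  # requirements with no mapped event

if gaps:
    print(f"TEMP incomplete: no test event maps to {sorted(gaps)}")
```

Run against a real requirements export, a check like this makes "incomplete by definition" a mechanical finding rather than a reviewer's judgment call.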
The resource summary covers ranges, instrumentation, threat representations, system-under-test articles, funding, and schedule. Defense testing is resource-constrained in ways that commercial testing rarely is. Hypersonic ranges, anechoic chambers, and threat emitter systems are shared national assets. The TEMP must demonstrate that the program has identified and coordinated access to what it needs, not simply assumed availability.
How the TEMP Relates to the Requirements Baseline
The relationship between the TEMP and the requirements baseline is bidirectional and dynamic, though programs often treat it as a one-time derivation exercise done at program inception.
In the correct model, the requirements baseline — the CDD, the System Requirements Specification, the Allocated Baseline — defines what must be demonstrated. The TEMP then specifies how and when each requirement will be demonstrated, by whom, and to what standard. When the requirements baseline changes, the TEMP must be evaluated for impact. A threshold relaxed in the CDD may eliminate a test event. A new requirement added after Milestone B may require a new test phase or a significant change to an existing one.
In practice, this relationship breaks down in two common ways. First, requirements change through Engineering Change Proposals, Configuration Control Board actions, or formal CDD amendments, but the TEMP update lags by months or is skipped entirely. Second, the TEMP is written at a level of abstraction that does not actually trace to specific requirement identifiers — it references “performance” without specifying which requirement paragraph, making gap analysis impossible without manual reconstruction.
Both failure modes create the same outcome: a test program that was designed to evaluate a system that no longer exists as specified, and a test report that cannot definitively answer whether requirements were met.
Milestone Reviews and TEMP Updates
The TEMP is formally reviewed at each major acquisition decision point. The structure of these reviews reflects the program’s maturity and the increasing rigor of test evidence required as the program advances.
At Milestone B (entry into Engineering and Manufacturing Development), the TEMP must be approved before the program can proceed. At this stage, DT&E is fully planned, OT&E is planned at a strategy level, and LFT&E sufficiency has been assessed. The test agency — typically the Army Test and Evaluation Command (ATEC), the Navy’s Commander, Operational Test and Evaluation Force (COTF), or the Air Force Operational Test and Evaluation Center (AFOTEC), depending on the service — provides an independent assessment of whether the TEMP is executable and sufficient.
At Milestone C (entry into Production and Deployment), the TEMP is updated to reflect the results of developmental testing and to finalize the operational test plan. By this point, the system’s configuration should be stable enough to support a complete OT&E plan with specific scenarios, threat environments, and sample sizes.
At the Full-Rate Production Decision, the TEMP is updated to reflect completed testing and document any residual evaluation activities planned for the production phase.
The Director, Operational Test and Evaluation (DOT&E) provides oversight for major defense acquisition programs and publishes annual reports assessing program test adequacy. Programs whose TEMPs are not current, not approved, or not traceable to their requirements baseline receive explicit findings in DOT&E’s annual report — a consequence that program managers take seriously.
Who Owns the TEMP
The TEMP has three stakeholders with distinct and sometimes competing interests, and understanding their roles is essential to understanding why TEMP development is politically as well as technically complex.
The program office writes the TEMP and is responsible for its currency. The program manager’s incentive is to demonstrate sufficient testing with available resources and schedule. This can create pressure toward test designs with marginal sample sizes, optimistic range availability assumptions, or evaluation criteria that are technically defensible but operationally thin.
The independent test agency reviews and approves the TEMP for the developmental test portion, and the operational test agency (which may be a separate organization from the developmental test agency) approves the OT&E plan. These agencies have statutory independence from the program office precisely because their incentive is to ensure the test is rigorous enough to catch problems, not to protect the program’s schedule.
The operational user — represented through the requirements authority and, for OT&E, through operational unit participation — defines what operational effectiveness and suitability mean in practice. The COIs in the TEMP must reflect what the warfighter actually needs to know, not what is convenient to measure. The user’s voice in TEMP development is frequently underweighted until OT&E, when soldiers or sailors or airmen operating the system in realistic conditions produce results the program office did not anticipate.
How Modern Tools Support TEMP-Driven Programs
The structural challenge of TEMP management is fundamentally a traceability problem. A program may have hundreds of requirements, dozens of test events, and a requirements baseline that evolves continuously across a development lifecycle that spans years. Maintaining the web of relationships between requirements and test coverage manually — through spreadsheets, word processor documents, and RTM exports — is not just inefficient; it is unreliable at program scale.
This is where tools built around live, graph-based traceability models offer a qualitative improvement over document-based approaches. Flow Engineering (flowengineering.com) is built specifically for hardware and systems engineering teams working in this environment. Rather than storing requirements and test events in separate documents that must be manually reconciled, Flow Engineering maintains a connected model where requirements, test events, COIs, and CTPs are nodes in a live graph. Relationships between them are first-class objects, not annotations in a spreadsheet cell.
The practical consequence for TEMP-driven programs is significant. When a requirement changes — threshold value revised, new derived requirement added, parent requirement deleted — the system immediately surfaces which test events are affected. A relaxed threshold might mean an existing test event can be retired. A new requirement with no test event mapped to it appears as an open gap, visible to the program office and the test agency simultaneously, before a test readiness review surfaces it as a finding.
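The change-impact behavior described above can be illustrated with a minimal in-memory trace graph. This is a generic sketch of the concept, not any tool's actual API, and every identifier in it is hypothetical.

```python
# Generic sketch of change-impact lookup over a requirements/test-event
# trace graph. Node identifiers are hypothetical; this illustrates the
# concept, not a specific product's data model.

trace_edges = {  # requirement -> test events that verify it
    "CDD-3.2.1": ["DT-Event-03", "OT-Scenario-1"],
    "CDD-3.2.2": ["DT-Event-03"],
    "CDD-3.4.7": [],  # newly added requirement, not yet mapped
}

def impacted_events(changed_req: str) -> list[str]:
    """Test events that must be re-evaluated when a requirement changes."""
    return trace_edges.get(changed_req, [])

def open_gaps() -> list[str]:
    """Requirements with no verifying test event, visible before a test readiness review."""
    return [req for req, events in trace_edges.items() if not events]
```

The point of the graph representation is that both queries are lookups, not reconciliation exercises: the answer is current the moment the edge set changes.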
Flow Engineering also supports the multi-stakeholder structure of TEMP development. Program office engineers, test agency reviewers, and requirements managers working from the operational user’s CDD can all work in the same connected model rather than exchanging document versions and maintaining parallel change logs. The traceability state is always current because it is never a separate artifact from the engineering data itself.
For programs that have historically managed TEMP currency through periodic manual reconciliation — a common practice that virtually guarantees the TEMP is out of date relative to the actual requirements baseline — this represents a structural change in how the document stays current, not just a productivity improvement.
Flow Engineering’s focus on defense and systems engineering programs means it does not attempt to serve the broader enterprise quality management or manufacturing execution markets. Teams that need those capabilities will need separate tooling. But for the core problem of maintaining live traceability between a requirements baseline and a test program across a multi-year acquisition, that deliberate focus is an asset rather than a limitation.
Practical Starting Points
If you are a program engineer or test manager working on TEMP development, three practices have the highest leverage regardless of what tools you use.
First, establish requirement identifiers in the TEMP at the paragraph level, not the document level. “This test event evaluates Section 3.2 performance requirements” is not traceability. “This test event evaluates SRS-0147, SRS-0148, and SRS-0152” is.
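The paragraph-level discipline can even be enforced mechanically. The sketch below is a hypothetical lint; the SRS-nnnn pattern is an illustrative convention, not a standard, and a real program would substitute its own identifier scheme.

```python
import re

# Hypothetical lint for trace references: accept paragraph-level requirement
# identifiers, reject section-level prose. The SRS-nnnn pattern is an
# illustrative convention only.

REQ_ID = re.compile(r"^SRS-\d{4}$")

def is_paragraph_level(reference: str) -> bool:
    """True only for a specific requirement identifier."""
    return bool(REQ_ID.fullmatch(reference))
```

A reference like "Section 3.2 performance requirements" fails the check; "SRS-0147" passes.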
Second, treat COI development as a user requirements activity, not a program office drafting activity. Bring the operational user into COI formulation before the TEMP is submitted for approval, not after OT&E produces unexpected results.
Third, define your TEMP update trigger criteria explicitly. What magnitude of requirements change triggers a formal TEMP update versus a program office memo? Answering this before the first CCB action prevents the ad hoc decisions that produce stale TEMPs.
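Explicit trigger criteria can be written down as a decision rule. The categories and the 10% threshold below are illustrative examples of criteria a program might adopt, not policy from any directive.

```python
# Hypothetical encoding of TEMP update trigger criteria as an explicit
# decision rule. Change categories and the 10% threshold are illustrative
# examples a program might adopt, not policy.

def temp_update_required(change_type: str, threshold_delta_pct: float = 0.0) -> bool:
    """Decide: formal TEMP update, or program office memo?"""
    if change_type in {"new_requirement", "requirement_deleted"}:
        return True  # test coverage itself changes: always a formal update
    if change_type == "threshold_revised":
        # a quantified threshold moved by 10% or more triggers a formal update
        return abs(threshold_delta_pct) >= 10.0
    return False  # editorial or administrative changes: memo only
```

The specific numbers matter less than the fact that they are written down before the first change arrives, so that every subsequent disposition is a rule application rather than a negotiation.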
The TEMP exists because defense acquisition learned, repeatedly and expensively, that systems which are not tested against realistic requirements in realistic conditions will fail in operational use. The document’s value is only realized when its connection to the requirements baseline is current, its coverage is complete, and its stakeholders have maintained shared understanding of what the test program is actually evaluating.