Skip to content

Scope & Philosophy

Substrate's defining principle — what it models, what it deliberately doesn't, and why that boundary is also what makes it useful.

What Substrate models — and what it does not

Substrate models what is observable through an AWS API call, not what software inside a resource does. A plugin's job is to make every API observation a caller can make accurate, seedable, and time-ordered — never to execute the workload behind the API.

In scope: request/response shapes, error codes, resource state and its transitions over the simulated clock (an instance moving pending → running, a job reporting Failed with a seeded reason, a command invocation going Pending → InProgress → Success), and seedable outcomes that let a consumer's poll/retry/wait loop be tested.

Out of scope: actually running the work — executing user-data or cloud-init, running a Lambda's code, performing an inference, running a training job, bootstrapping a node. Such inputs are captured as recorded intent with a seedable success/failure/completion signal; the internal semantics of the workload are not modelled.

This boundary is also why deterministic replay works: API observations can be recorded as events and replayed identically, whereas resource internals are nondeterministic (real time, scheduling, I/O). The scope boundary and the deterministic-replay guarantee are the same line viewed from two sides.

Seeding: determinism without sacrificing coverage

Determinism does not mean every test sees the same result. Seeding is the mechanism that lets a deterministic emulator produce different outcomes on demand. By default an operation returns its nominal success path; a test seeds an alternate outcome through a control-plane endpoint, and the plugin reads that seed at request time. The same launch can therefore be made to return InsufficientInstanceCapacity, a training job to come back Failed with a CapacityError, or a query to return a specific result set — each chosen by the test, each fully reproducible.

Crucially, the failure, capacity, and timing paths a consumer's retry/poll/wait loops exist to handle are exactly the paths that are rare, slow, or impossible to trigger on demand against real AWS. Seeding makes them first-class, instant, and deterministic.

Why determinism (vs. containers or real infrastructure)

The same testing need is reachable three ways, with very different trade-offs:

  • Real AWS / LocalStack-with-containers run actual workloads, so behaviour depends on wall-clock timing, process scheduling, network, and the live state of a remote account. Failure and edge-case paths are hard to trigger and rarely reproduce; a flake cannot be replayed.
  • Hand-written mocks are deterministic but bespoke per test, drift from the real API, and cannot model state transitions or be inspected over time.
  • Substrate records every request as an immutable event over a simulated clock, so a run is reproducible by construction.

What determinism and replay give you

  • No flakes — the same inputs always produce the same outputs, so a green test stays green and a red test is a real signal, not timing noise.
  • Exact reproduction — a failure replays identically from its recorded events; you debug the exact run, not an approximation of it.
  • Time-travel inspection — step backward through request history and read resource state at any point.
  • Testable rare paths — seeded outcomes make capacity failures, throttling, terminal job states, and slow transitions instant and repeatable.
  • Fast and free — no network, no real account, no provisioning latency or spend; suitable for unit tests and tight inner loops.
  • Regression fixtures — a recorded run can be exported as a standalone test, turning a once-seen scenario into a permanent guard.

The deliberate cost is fidelity to workload internals, which is out of scope: Substrate is the fast, deterministic tier for exercising how code drives and reacts to the AWS API — not for validating what runs inside a resource.

How it compares

Substrate is a different tier from container emulators and real accounts, not a drop-in replacement for them. It trades workload-execution fidelity for determinism, replayability, and cost insight.

SubstrateLocalStackmotoReal AWS
Deterministic replaypartial
Time-travel debugging
Cost visibility before deployafter the bill
Seedable failure / capacity / timing pathspartialpartial
Runs your actual workload code(by design)
Language-agnostic (any SDK/CLI over HTTP)Python-first
No account · offline · free

The single ❌ is the scope boundary above, by design. If you need to execute your Lambda's code or boot a real container, reach for LocalStack or a real account. If you need to test how your infrastructure code drives and reacts to the AWS API — fast, deterministically, and on every change — that is exactly what Substrate is for.

Why this fits AI-generated infrastructure

AI generates infrastructure code at volume, inside generate → test → fix loops, and those loops need what this tier provides:

  • Determinism, because a flaky failure is a false signal an agent acts on — wasting fix cycles chasing timing noise rather than real bugs.
  • Free, fast, and offline, so verification can run on every generated candidate without a real account, provisioning latency, or spend.
  • Cost visibility as a machine-gradeable guardrailIntent{MaxCost} lets a deploy fail when generated infra would blow a budget, a check an AI has no instinct for.
  • Seedable failure paths, so the retry/poll/wait logic generated code claims to have is actually exercised against capacity errors, throttling, and terminal states — not merely assumed.