BDD scenarios

Gherkin .feature files run via vitest + @amiceli/vitest-cucumber, wired to the Nwire test harness. Each scenario is one end-to-end story: real dispatches, real reactions, real assertions against telemetry + projections.

Why BDD here

Most code lives in handlers + reactions. The most valuable test isn't "this handler returns X"; it's "Avi submits → grade → mastery updates → notification sent." A scenario language reads like the user's journey and maps directly onto real Nwire actions.

It's also a forcing function: writing scenarios surfaces gaps in the domain language. If you can't write the step, you don't yet have the shape.

Setup

bash

pnpm add -D @amiceli/vitest-cucumber

(@nwire/test-kit/bdd provides the wrapper. The cucumber lib is a peer dep — install only if you use BDD.)

A complete scenario

gherkin

# features/avi-submits-and-auto-grades.feature
Feature: Avi submits an answer
  As a 9-year-old student learning Hebrew letters
  I want to submit my answer
  So that I get immediate feedback

  Background:
    Given Avi is enrolled in "Hebrew Letters"
    And the auto-grader confidence threshold is 0.75

  Scenario: High-confidence answer is auto-graded
    When Avi submits "alef" to exercise "letter-1"
    Then within 200ms event submissions.answer-submitted fires
    And within 500ms event submissions.auto-graded fires
    And the submission status is "graded"
    And no events leak across tenants

  Scenario: Low-confidence answer is flagged for review
    When Avi submits "?" to exercise "letter-1"
    Then within 200ms event submissions.answer-submitted fires
    And within 500ms event submissions.flagged-for-review fires
    And the submission status is "under-review"
    And a 3-day reminder timer is scheduled

// features/avi-submits-and-auto-grades.test.ts
import { expect } from "vitest"
import { feature } from "@nwire/test-kit/bdd"
import { harness, type Harness } from "@nwire/test-kit"
import { submissionsApp } from "../app"
import { enrollStudent, submitAnswer } from "../modules/submissions"

interface Ctx { h: Harness; submissionId?: string }

feature("./avi-submits-and-auto-grades.feature", (ctx) => {
  const c: Ctx = {} as Ctx
  ctx.background(async () => { c.h = await harness({ app: submissionsApp }) })
  ctx.afterScenario(async () => c.h.stop())

  ctx.given("Avi is enrolled in {course}", async (course: string) => {
    await c.h.dispatch(enrollStudent, { studentId: "avi", course })
  })

  ctx.given("the auto-grader confidence threshold is {n}", async (_n: string) => {
    // configured at module-load time; nothing to do
  })

  ctx.when("Avi submits {answer} to exercise {id}", async (answer: string, id: string) => {
    const result = await c.h.dispatch(submitAnswer, { studentId: "avi", exerciseId: id, answer })
    c.submissionId = (result as any).payload.submissionId
  })

  ctx.then("within {ms}ms event {name} fires", async (ms: string, name: string) => {
    await c.h.idle(parseInt(ms, 10))
    expect(c.h.telemetry.count("event.published", e => e.event.eventName === name)).toBeGreaterThan(0)
  })

  ctx.then("the submission status is {status}", async (status: string) => {
    const view = await c.h.app.runtime.getActorStore().load("submission", c.submissionId!, "")
    // Compare against the status named in the step so an unknown status fails loudly
    expect(view?.state).toBe(status)
  })

  ctx.then("a {duration} reminder timer is scheduled", async (_d: string) => {
    expect(c.h.telemetry.count("timer.scheduled")).toBeGreaterThan(0)
  })

  ctx.then("no events leak across tenants", async () => {
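    // declaredChainsForSubmissions is not imported above; bring it into scope from wherever the app declares its chains
    expect(c.h.telemetry.drift(declaredChainsForSubmissions)).toEqual([])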
  })
})

What scenarios cost you

Two layers:

The .feature file — the domain story; lives in plain English with product / QA stakeholders
The step bindings — TypeScript glue from each pattern to a harness call

The bindings file is small once you commit to a vocabulary: Avi submits X, event Y fires, submission status is Z. Reuse them across many features.
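
One way to reuse a vocabulary is to register the shared steps from a single helper that each feature's binding file calls. The sketch below is only an illustration under assumptions: `StepRegistrar`, `HarnessLike`, and `registerEventSteps` are hypothetical local stand-ins, not types or functions from `@nwire/test-kit`. They mirror only the `then`, `idle`, and `telemetry.count` shapes used in the complete scenario above.

```typescript
// Hypothetical stand-ins; the real types would come from @nwire/test-kit.
type StepFn = (...args: string[]) => Promise<void> | void

interface StepRegistrar {
  given(pattern: string, fn: StepFn): void
  when(pattern: string, fn: StepFn): void
  then(pattern: string, fn: StepFn): void
}

interface HarnessLike {
  idle(ms: number): Promise<void>
  telemetry: { count(kind: string, match?: (e: any) => boolean): number }
}

// Registers the shared assertion steps once so every feature file can reuse them.
// getHarness is a callback because the harness is created per scenario.
function registerEventSteps(ctx: StepRegistrar, getHarness: () => HarnessLike): void {
  ctx.then("within {ms}ms event {name} fires", async (ms, name) => {
    const h = getHarness()
    await h.idle(parseInt(ms, 10))
    const seen = h.telemetry.count("event.published", (e) => e.event.eventName === name)
    if (seen === 0) throw new Error(`expected event ${name} within ${ms}ms`)
  })

  ctx.then("a {duration} reminder timer is scheduled", async () => {
    if (getHarness().telemetry.count("timer.scheduled") === 0) {
      throw new Error("expected a scheduled timer")
    }
  })
}
```

A binding file would then call `registerEventSteps(ctx, () => c.h)` and define only the steps unique to its feature.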

Where scenarios shine

Acceptance tests with stakeholders

Hand the .feature file to a non-engineer. They can read it, suggest edits, push back. The TypeScript bindings stay in engineering's hands.

Regression suite for tricky chains

Every time a bug ships because "we forgot reaction X fires on event Y", write a scenario for it. The check is now permanent.

Onboarding new contributors

A new engineer reads features/*.feature and learns the domain in 20 minutes. No code dive needed.

Where scenarios are the wrong tool

Don't use BDD for unit tests

Validating a handler's input zod schema → vitest unit test. Asserting a state transition in isolation → vitest unit test. BDD shines when you're asserting a flow that crosses primitives.
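
For contrast, an isolated check like these needs no harness at all. The sketch below assumes a hypothetical pure `nextState` function (it is not part of Nwire or of the app above, and the threshold rule is a guess based on the 0.75 value in the Background step). In a real project the assertions would sit inside a vitest `test` block and use `expect`.

```typescript
// Hypothetical transition logic, isolated from dispatch, reactions, and telemetry.
type SubmissionState = "submitted" | "graded" | "under-review"

interface GraderReport {
  verdict: string
  confidence: number
}

// Assumed rule: confidence at or above the threshold grades, anything lower goes to review.
function nextState(report: GraderReport, threshold = 0.75): SubmissionState {
  return report.confidence >= threshold ? "graded" : "under-review"
}

// Plain assertion helper keeps the sketch dependency-free.
function assertEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`)
  }
}

assertEqual(nextState({ verdict: "passed", confidence: 0.95 }), "graded")
assertEqual(nextState({ verdict: "uncertain", confidence: 0.3 }), "under-review")
```

Nothing here boots the app or waits on events, which is why this kind of check belongs in a unit test rather than a scenario.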

Don't over-mock

The harness boots the WHOLE app. Don't write mocks for actors / reactions inside scenarios — that defeats the point. If you need mocks for external calls, pass them via runtime.registerExternalCallExecutor.

Scenario outlines for tables

gherkin

Scenario Outline: verdict drives next state
  Given Avi submitted "alef" to exercise "letter-1"
  When the grader reports verdict <verdict> with confidence <conf>
  Then the submission moves to <state>

  Examples:
    | verdict      | conf  | state          |
    | "passed"     | 0.95  | "graded"       |
    | "needs-redo" | 0.95  | "graded"       |
    | "uncertain"  | 0.30  | "under-review" |

vitest-cucumber expands each row into a separate test with substituted values.
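
To make the expansion concrete, the sketch below substitutes each Examples row into the outline's step text. It illustrates the idea only and is not vitest-cucumber's implementation; `expandOutline` and `ExampleRow` are hypothetical names.

```typescript
type ExampleRow = Record<string, string>

// Replaces every <placeholder> in each step with the matching column of the row.
// Placeholders with no matching column are left untouched.
function expandOutline(steps: string[], rows: ExampleRow[]): string[][] {
  return rows.map((row) =>
    steps.map((step) =>
      step.replace(/<(\w+)>/g, (whole, key: string) => row[key] ?? whole),
    ),
  )
}

const outlineSteps = [
  "When the grader reports verdict <verdict> with confidence <conf>",
  "Then the submission moves to <state>",
]

const exampleRows: ExampleRow[] = [
  { verdict: '"passed"', conf: "0.95", state: '"graded"' },
  { verdict: '"uncertain"', conf: "0.30", state: '"under-review"' },
]

// One inner array of concrete steps per Examples row.
const expanded = expandOutline(outlineSteps, exampleRows)
console.log(expanded[1]?.[1])
// prints: Then the submission moves to "under-review"
```

Each inner array corresponds to one generated test, so a failing row points at a specific combination of values.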
