Master Build Map
Purpose
This is the canonical development standard for building Nooterra into a real enterprise world-runtime company. It is not a pitch deck. It is not a loose roadmap. It is not a research note. It is the operating specification for how to build:
- a real product
- a real control system
- a real software company
- a real operational platform
Core Position
Nooterra is not an agent wrapper. Nooterra is not a prompt layer. Nooterra is not a dashboard over disconnected APIs. Nooterra is a policy-constrained, uncertainty-aware, intervention-driven enterprise control system built on top of a persistent business world model. The system exists to answer six questions better than a human operator:
- What is true right now?
- What is likely to happen next?
- What would happen if we act?
- What is allowed?
- What should be done now?
- When should the system abstain and ask a human?
It must do so with better:
- memory
- consistency
- calibration
- side-effect reasoning
- governance
- replayability
What “Real” Means
The system is only considered real when all of the following are true:
- It acts from connected system state, not only natural-language setup.
- It is fail-closed on missing evidence, missing auth, missing provenance, or high uncertainty.
- It records expected effects before action and compares them to observed effects later.
- It produces deterministic machine-readable artifacts at every safety-critical step.
- It supports replay, audit, and incident review without depending on ephemeral process memory.
- It can be operated by a small but serious team serving design partners and paying customers.
It is not real merely because:
- an LLM can draft an action
- a planner can rank tasks
- a dashboard looks convincing
- the codebase contains ambitious concepts
Company Standard
The company itself must be built as seriously as the software. Nooterra needs all of the following to become real:
- clear initial ICP
- consistent product positioning
- design-partner operating motion
- onboarding discipline
- billing discipline
- support workflow
- security posture
- observability
- incident handling
- product analytics
- release discipline
- deployment discipline
- reproducible environments
- real documentation
Every material piece of work must be able to state:
- what customer problem is solved
- what system capability is added
- what risk is reduced
- how success is measured
Product Definition
Initial Product
The first real product is: Stripe-first governed AR and finance-ops control.
The first user flow is:
- Connect Stripe
- Materialize company state
- Review overdue/risky invoices
- Launch the AR runtime in shadow mode
- Review proposed actions through the action gateway
- Observe outcomes and allow autonomy to expand only from evidence
Immediate Buyers
The initial buyer is not “every company.” The initial buyer is:
- founder-led SMB
- finance ops lead
- controller
- head of revenue operations
- operator responsible for collections, billing follow-up, disputes, or payment coordination
Initial Promise
The initial promise is: Connect Stripe, understand what is happening in your receivables, and safely govern what happens next.
Non-Goals for the Initial Product
- no fake Gmail world-model source
- no multi-domain “AI employee” promise
- no broad autonomous external sends by default
- no generic team generator as the main product
- no pretending simulation is universal before it is domain-specific and measured
Product Principles
1. Simplicity at the surface, sophistication underneath
The user experience must stay simple enough for a non-technical operator. Users should not need to understand:
- prompt design
- graph schemas
- agent routing
- autonomy math
- model calibration
They should only need to understand:
- connected systems
- company state
- what needs attention
- what Nooterra wants to do
- why it wants to do it
- whether it is safe to allow
2. No fake state
If data is not live, the UI must say so. Never fabricate coverage, projections, actions, or health state to make the product feel complete.
3. No autonomy without evidence
No action class may become more autonomous without:
- persisted execution history
- persisted grade history
- persisted incident history
- persisted uncertainty handling
- promotion evidence
4. No LLM as source of truth
The LLM can:
- extract
- summarize
- explain
- draft
- propose
The LLM is not:
- the canonical memory
- the policy authority
- the trust assignment system
- the source of company state
5. Build from one domain outward
Depth before breadth. The order is:
- AR collections
- finance control plane
- multi-source company state
- domain packs
- domain-agnostic enterprise control
Canonical System Model
Nooterra is built from five coupled models:
- World Model: persistent observed and estimated company state
- Causal Intervention Model: what changes if action A is taken instead of action B
- Policy Model: what is allowed, forbidden, escalated, reversible, or high-risk
- Operator Model: what the AI runtime is currently competent to do safely
- Objective Model: what the company is optimizing under constraints
Canonical Engineering Layers
Each layer below is required. Each has explicit standards.
Layer 1: Observation Plane
Purpose
Continuously ingest source-system events and normalize them into tenant-scoped, typed observations.
Required Capabilities
- webhooks
- polling
- sync cursors
- idempotent ingest
- provenance tracking
- raw payload retention
- extraction confidence
- failure retries
- dead-letter handling
Required Sources by Phase
Phase 1:
- Stripe only
Later phases:
- Gmail or email/conversation source
- accounting source
- CRM source
- support platform
- calendar/tasks
- documents/contracts
Standards
- Every inbound source event gets a stable dedupe key.
- Raw payloads are retained or referenced before transformation.
- Tenant identity must be explicit and fail closed if missing.
- Connector state must survive process restarts.
- Every connector must support resumption from last good cursor.
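A minimal sketch of these ingest standards, assuming an in-memory stand-in for the durable tables and a connector that already knows which tenant a delivery belongs to. `IngestStore`, `dedupe_key`, and `ingest` are hypothetical names. It shows a stable dedupe key, raw-payload retention before transformation, fail-closed tenant handling, and a cursor that advances only after a successful write.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


class MissingTenantError(Exception):
    """Raised when a delivery cannot be attributed to a tenant."""


@dataclass
class IngestStore:
    # Hypothetical in-memory stand-in for durable tables (Postgres in practice).
    raw_payloads: dict = field(default_factory=dict)  # dedupe key -> raw bytes
    world_events: list = field(default_factory=list)  # normalized observations
    cursors: dict = field(default_factory=dict)       # (tenant, source) -> last good cursor


def dedupe_key(source: str, tenant_id: str, source_event_id: str) -> str:
    # Stable: every redelivery of the same source event yields the same key.
    return hashlib.sha256(f"{source}|{tenant_id}|{source_event_id}".encode()).hexdigest()


def ingest(store: IngestStore, source: str, tenant_id: Optional[str],
           raw: bytes, cursor: str) -> bool:
    """Ingest one delivery. Returns True only if a new world event was written."""
    if not tenant_id:
        raise MissingTenantError(f"{source} delivery has no tenant")  # fail closed
    payload = json.loads(raw)  # malformed JSON raises instead of being guessed at
    key = dedupe_key(source, tenant_id, payload["id"])
    if key in store.raw_payloads:
        return False  # duplicate delivery: no second world event
    store.raw_payloads[key] = raw  # retain the raw payload before transforming it
    store.world_events.append({
        "dedupe_key": key,
        "tenant_id": tenant_id,
        "type": payload.get("type", "unknown"),
        "provenance": {"source": source, "source_event_id": payload["id"]},
    })
    store.cursors[(tenant_id, source)] = cursor  # advance only after the write
    return True
```

In a real deployment the raw-payload insert, event insert, and cursor update would share one database transaction so a crash cannot leave them out of step.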
Definition of Done
- no duplicate world events for duplicate source deliveries
- replay-safe ingest
- connector-specific tests for malformed payload, duplicate payload, missing tenant, and retry behavior
Layer 2: Temporal Event Ledger
Purpose
Maintain the immutable, append-only business history.
Required Capabilities
- append-only writes
- per-tenant hash chain
- bi-temporal semantics
- causal references
- object references
- event querying
- object history reconstruction
- replay support
Standards
- events are never mutated in place
- corrections are represented as new events
- hash chain integrity is verifiable
- write path must be deterministic where contractually required
- events must preserve provenance and confidence
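The hash-chain and correction standards can be illustrated with a per-tenant, append-only ledger. This is a sketch under stated assumptions: events are JSON-serializable dicts, canonical JSON is what gets hashed, and `TenantLedger` is a hypothetical in-memory class rather than a storage design. Bi-temporal fields are omitted for brevity.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # "previous hash" used for a tenant's first event


def event_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal events hash equally.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


@dataclass
class TenantLedger:
    tenant_id: str
    entries: list = field(default_factory=list)  # (event, hash) pairs, in order

    def append(self, event: dict) -> str:
        # Append-only by construction: there is no update or delete method.
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = event_hash(prev, event)
        self.entries.append((dict(event), digest))
        return digest

    def correct(self, target_hash: str, fields: dict) -> str:
        # A correction is a new event that references the one it supersedes.
        return self.append({"type": "correction", "corrects": target_hash, "fields": fields})

    def verify(self) -> bool:
        # Recompute every link; an in-place edit is detected at the edited entry.
        prev = GENESIS
        for event, digest in self.entries:
            if event_hash(prev, event) != digest:
                return False
            prev = digest
        return True
```

Because each hash covers the previous one, verification is deterministic and can be rerun any number of times with the same result.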
Definition of Done
- replay over the ledger can reconstruct downstream state
- chain verification succeeds under repeated runs
- missing or invalid provenance fails closed where required
Layer 3: Canonical Object Graph
Purpose
Represent the nouns of the business as typed, versioned objects and relationships.
Required Capabilities
- canonical object types
- object versioning
- relationship graph
- tenant isolation
- search
- history
- provenance linking back to ledger
- entity resolution
Initial Canonical Types
Phase 1:
- party
- invoice
- payment
- dispute
- obligation
- credit
- refund
- task
- approval
- action proposal
Later phases:
- conversation
- message
- contract
- opportunity
- account
Standards
- observed state and estimated state remain separate
- object updates must be explainable from events
- relationships carry type and strength
- object listing/search must be tenant-scoped
Definition of Done
- object state can be traced back to ledger events
- object history is reconstructable
- entity conflicts can be represented explicitly instead of silently overwritten
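One way to meet the traceability and conflict standards is to rebuild each object purely by folding its ledger events. In this hypothetical sketch, observations and model estimates land in separate maps, and when two different source systems disagree on an observed field, the disagreement is recorded rather than overwritten (the first value is kept). `WorldObject`, `replay_object`, and the event shape are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class WorldObject:
    object_id: str
    observed: dict = field(default_factory=dict)       # facts from source systems
    estimated: dict = field(default_factory=dict)      # model outputs, kept separate
    conflicts: list = field(default_factory=list)      # unresolved source disagreements
    version: int = 0
    source_events: list = field(default_factory=list)  # ledger provenance


def replay_object(object_id: str, events: list) -> WorldObject:
    """Rebuild one object from its ledger events, applied in ledger order."""
    obj = WorldObject(object_id)
    set_by = {}  # observed field -> source system that last set it
    for ev in events:
        if ev["object_id"] != object_id:
            continue
        for name, value in ev["fields"].items():
            if ev.get("kind") == "estimate":
                obj.estimated[name] = value  # estimates never touch observed state
                continue
            prior = set_by.get(name)
            if prior is not None and prior != ev["source"] and obj.observed[name] != value:
                # Two systems disagree: keep the first value and record the conflict.
                obj.conflicts.append({"field": name,
                                      "values": {prior: obj.observed[name], ev["source"]: value}})
                continue
            obj.observed[name] = value
            set_by[name] = ev["source"]
        obj.version += 1
        obj.source_events.append(ev["event_id"])
    return obj
```

Since the object is a pure function of its events, replaying the same events always yields the same state, and `source_events` traces that state back to the ledger.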
Layer 4: Beliefs, Predictions, and Calibration
Purpose
Represent hidden state and future state with explicit uncertainty and calibration.
Required Capabilities
- durable beliefs
- durable prediction history
- durable prediction outcomes
- calibration reporting
- confidence intervals
- drift detection
- OOD detection
- fallback behavior when sidecar/model is unavailable
Standards
- beliefs are first-class records, not only denormalized JSON
- predictions are versioned and timestamped
- observed outcomes are linked back to specific predictions
- uncertainty metadata must be preserved, not discarded
- sidecar/model failure must lower autonomy, not silently continue as if confidence were unchanged
Definition of Done
- predictions can be inspected historically
- outcomes can be joined back to predictions
- calibration reports are reproducible
- drift and OOD alter runtime behavior, not only monitoring
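Joining outcomes back to predictions and producing a reproducible calibration report can be as simple as a Brier score plus binned reliability. This sketch assumes binary outcomes keyed by a hypothetical `prediction_id`; predictions without a resolved outcome are skipped rather than counted as misses. The bin count is an arbitrary choice, and the joined pairs are sorted so the report does not depend on input order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    prediction_id: str
    model_version: str
    probability: float  # predicted probability of the event, e.g. "paid within 30 days"


def calibration_report(predictions, outcomes, n_bins=5):
    """Join predictions to observed outcomes and summarize calibration.

    outcomes maps prediction_id -> bool; unresolved predictions are skipped.
    """
    joined = sorted(
        (p.probability, bool(outcomes[p.prediction_id]))
        for p in predictions if p.prediction_id in outcomes
    )
    if not joined:
        return {"n": 0, "brier": None, "bins": []}
    brier = sum((prob - float(hit)) ** 2 for prob, hit in joined) / len(joined)
    bins = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        last = b == n_bins - 1
        members = [(p, h) for p, h in joined if lo <= p < hi or (last and p == 1.0)]
        if members:
            bins.append({
                "range": (lo, hi),
                "count": len(members),
                "mean_predicted": sum(p for p, _ in members) / len(members),
                "observed_rate": sum(h for _, h in members) / len(members),
            })
    return {"n": len(joined), "brier": brier, "bins": bins}
```

A well-calibrated model shows `mean_predicted` close to `observed_rate` in every populated bin.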
Layer 5: Action Ontology
Purpose
Represent actions as causal business interventions, not bare tool calls.
Required Capabilities
- typed action classes
- preconditions
- expected effects
- side-effect surface
- blast radius
- reversibility
- outcome delay
- outcome signals
- default intervention confidence
Standards
- every external-effect action must have a registered action type
- unsupported action types fail closed
- action types include observability expectations
- expected effects must be storable and comparable to actual outcomes later
Definition of Done
- the gateway uses action types, not string heuristics alone
- simulation and replay are action-type aware
- action class metadata is queryable and stable
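A sketch of a typed action registry that carries the metadata listed above and fails closed on unregistered types. Field names mirror the capabilities list; the action name `ar.send_payment_reminder` and the blast-radius labels are hypothetical examples.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    PARTIAL = "partial"
    IRREVERSIBLE = "irreversible"


@dataclass(frozen=True)
class ActionType:
    name: str
    preconditions: tuple        # predicate names checked against object state
    expected_effects: tuple     # (field, expected value) pairs compared to outcomes later
    side_effect_surface: tuple  # external systems the action touches
    blast_radius: str           # e.g. "single_invoice", "customer", "tenant"
    reversibility: Reversibility
    outcome_delay_hours: int    # how long to wait before judging the effect
    outcome_signals: tuple      # event types that indicate the effect occurred
    default_intervention_confidence: float


class UnsupportedActionType(Exception):
    pass


class ActionRegistry:
    def __init__(self) -> None:
        self._types = {}

    def register(self, action_type: ActionType) -> None:
        if action_type.name in self._types:
            raise ValueError(f"action type {action_type.name!r} already registered")
        self._types[action_type.name] = action_type

    def resolve(self, name: str) -> ActionType:
        # Fail closed: an unregistered action can never reach execution.
        if name not in self._types:
            raise UnsupportedActionType(name)
        return self._types[name]

    def names(self) -> list:
        return sorted(self._types)  # stable order for queries and replay
```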
Layer 6: Policy Runtime and Authority System
Purpose
Control what the system may do, on whose authority, and under what constraints.
Required Capabilities
- tenant auth
- user auth
- authority grants
- delegated authority
- budget limits
- policy overrides
- structured constraints
- approval requirements
- deny rules
- disclosure rules
Standards
- no bypass paths for risky or paid actions
- child authority only attenuates parent authority
- tenant mismatch fails closed
- production write routes require authenticated context
- policy evaluation must be auditable
Definition of Done
- every action can explain why it was allowed, denied, or escrowed
- grant lineage is reconstructable
- policy decisions are reproducible under replay
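The rule that child authority only attenuates parent authority can be checked mechanically at delegation time. This hypothetical `Grant` model covers only tenant, action types, budget, and an approval flag; real grants would carry more constraints, but each would need the same "no wider than the parent" comparison. `lineage` walks parent links so grant lineage can be reconstructed during an audit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    grant_id: str
    tenant_id: str
    action_types: frozenset
    budget_cents: int
    requires_approval: bool
    parent_id: Optional[str] = None


class AuthorityError(Exception):
    pass


def delegate(parent: Grant, child: Grant) -> Grant:
    """Accept a child grant only if it narrows, never widens, its parent."""
    if child.parent_id != parent.grant_id:
        raise AuthorityError("child does not name this parent")
    if child.tenant_id != parent.tenant_id:
        raise AuthorityError("tenant mismatch")  # fail closed
    if not child.action_types <= parent.action_types:
        raise AuthorityError("child adds action types the parent lacks")
    if child.budget_cents > parent.budget_cents:
        raise AuthorityError("child budget exceeds parent budget")
    if parent.requires_approval and not child.requires_approval:
        raise AuthorityError("child drops an approval requirement")
    return child


def lineage(grant_id: str, grants: dict) -> list:
    """Walk parent links so an auditor can see where authority came from."""
    chain = []
    current = grants.get(grant_id)
    while current is not None:
        if current.grant_id in chain:
            raise AuthorityError("cycle in grant lineage")
        chain.append(current.grant_id)
        current = grants.get(current.parent_id) if current.parent_id else None
    return chain
```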
Layer 7: Action Gateway
Purpose
Serve as the single chokepoint for external or risky side effects.
Required Capabilities
- validation
- rate limits
- budget checks
- disclosure enforcement
- simulation
- escrow decisions
- execution logging
- release/rejection flow
- evidence bundles
- replay-ready persistence
Standards
- gateway is always on the control path for external-effect actions
- all safety-critical steps must be persisted, not only returned in memory
- every gateway result produces machine-readable artifacts
- missing simulation or insufficient uncertainty support must degrade to approval or denial
Definition of Done
- every governed action has a durable gateway row
- preflight and simulation are persisted
- approval release preserves audit continuity
- no external action can occur without passing the gateway
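A minimal sketch of the gateway's fail-closed ordering, assuming a hypothetical `GatewayRequest` shape and a plain list standing in for the gateway table. The row is recorded before any decision, every check is logged on it, and missing simulation or uncertainty support degrades to escrow (hold for approval) rather than execution. The 0.3 uncertainty threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical registry contents; a real gateway would consult the action ontology.
SUPPORTED_ACTION_TYPES = frozenset({"ar.send_payment_reminder"})


@dataclass
class GatewayRequest:
    tenant_id: Optional[str]
    action_type: str
    cost_cents: int
    simulation: Optional[dict]    # None when simulation could not run
    uncertainty: Optional[float]  # None when the uncertainty sidecar is unavailable


@dataclass
class GatewayRow:
    request: GatewayRequest
    steps: list = field(default_factory=list)
    decision: str = "pending"


def evaluate(req: GatewayRequest, budget_remaining_cents: int, rows: list,
             max_uncertainty: float = 0.3) -> GatewayRow:
    row = GatewayRow(req)
    rows.append(row)  # stand-in for persisting the row before any decision
    checks = [
        # (check name, passed?, decision if it fails)
        ("tenant_present", bool(req.tenant_id), "deny"),
        ("action_type_supported", req.action_type in SUPPORTED_ACTION_TYPES, "deny"),
        ("within_budget", req.cost_cents <= budget_remaining_cents, "deny"),
        ("simulation_available",
         req.simulation is not None and req.uncertainty is not None, "escrow"),
        ("uncertainty_acceptable",
         req.uncertainty is not None and req.uncertainty <= max_uncertainty, "escrow"),
    ]
    for name, passed, on_fail in checks:
        row.steps.append({"check": name, "passed": passed})
        if not passed:
            row.decision = on_fail  # "escrow" means hold for human approval
            return row
    row.decision = "execute"
    return row
```

Ordering the deny checks before the escrow checks means a request that is both unsupported and uncertain is denied outright instead of landing in the approval queue.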
Layer 8: Objective Model and Planner
Purpose
Move from “what is likely next?” to “what should be done under explicit objectives and constraints?”
Required Capabilities
- tenant-scoped weighted objectives
- hard constraints
- uncertainty penalty
- deterministic candidate ranking
- action scoring
- planner summary
- reactive planning in Phase 1
- short-horizon planning in later phases
Standards
- objectives are explicit and persisted
- planner output is deterministic for fixed inputs
- uncertainty reduces score
- hard constraints can remove candidates entirely
- planner must not invent action classes not supported by the ontology
Definition of Done
- plan output can be replayed from the same state snapshot
- planner scoring is explainable
- top-ranked actions align with real operator judgment in pilot review
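The planner standards (explicit weights, an uncertainty penalty, hard constraints that remove candidates, no actions outside the ontology, deterministic output) can be sketched as a linear score with a stable tie-break. The linear form, the penalty coefficient, and the objective names are illustrative assumptions, not a prescribed scoring model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    action_type: str
    objective_gains: dict              # objective -> predicted normalized gain
    uncertainty: float                 # 0.0 (confident) to 1.0 (no idea)
    violates: frozenset = frozenset()  # hard constraints this action would break


def rank(candidates, weights, supported_types, hard_constraints,
         uncertainty_penalty=0.5):
    """Score and order candidates; identical inputs always give identical output."""
    scored = []
    for c in candidates:
        if c.action_type not in supported_types:
            continue  # the planner never keeps actions outside the ontology
        if c.violates & hard_constraints:
            continue  # hard constraints remove candidates entirely
        base = sum(weights.get(k, 0.0) * v for k, v in sorted(c.objective_gains.items()))
        scored.append((base - uncertainty_penalty * c.uncertainty, c.candidate_id))
    # Highest score first; ties break on candidate_id so the order is stable.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(cid, round(score, 6)) for score, cid in scored]
```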
Layer 9: Operator Model and Earned Autonomy
Purpose
Track what the runtime is currently competent to do and enforce autonomy accordingly.
Required Capabilities
- persisted coverage cells
- persisted autonomy decisions
- promotion proposals
- demotion on incidents
- abstention on uncertainty or drift
- per-action-class autonomy
- per-object-type autonomy
Standards
- autonomy enforcement must use persisted state
- critical incidents demote immediately
- uncertainty can cap effective autonomy below nominal autonomy
- promotion is recommendation-first, not silent escalation
Definition of Done
- autonomy level affects runtime behavior
- autonomy history survives restarts
- promotion/demotion can be audited after the fact
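A sketch of autonomy enforcement over persisted coverage cells, assuming four illustrative levels and a single uncertainty threshold (both hypothetical). Effective autonomy is the minimum of nominal autonomy and a cap derived from uncertainty, drift, and sidecar health; a critical incident demotes immediately and is written to the cell's history; promotion only produces a proposal for a human to accept.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Autonomy(IntEnum):
    SHADOW = 0         # observe and propose internally only
    APPROVAL = 1       # every external action needs human approval
    AUTO_LOW_RISK = 2  # low-risk actions run; the rest are escrowed
    AUTONOMOUS = 3


@dataclass
class CoverageCell:
    action_type: str
    object_type: str
    nominal: Autonomy
    history: list = field(default_factory=list)  # persisted decisions, oldest first


def effective_autonomy(cell: CoverageCell, uncertainty: float,
                       drift: bool, sidecar_up: bool) -> Autonomy:
    """Uncertainty, drift, or a missing model can only lower autonomy."""
    cap = Autonomy.AUTONOMOUS
    if drift or not sidecar_up:
        cap = Autonomy.APPROVAL
    elif uncertainty > 0.3:  # hypothetical threshold
        cap = Autonomy.AUTO_LOW_RISK
    return min(cell.nominal, cap)


def record_incident(cell: CoverageCell, severity: str) -> None:
    if severity != "critical":
        return
    demoted = min(cell.nominal, Autonomy.APPROVAL)  # immediate demotion
    cell.history.append({"from": cell.nominal.name, "to": demoted.name,
                         "reason": "critical_incident"})
    cell.nominal = demoted


def propose_promotion(cell: CoverageCell, graded_successes: int,
                      open_incidents: int, min_successes: int = 50) -> Optional[dict]:
    """Return a promotion proposal for human review; never mutates the cell."""
    if open_incidents or graded_successes < min_successes:
        return None
    if cell.nominal == Autonomy.AUTONOMOUS:
        return None
    return {"action_type": cell.action_type, "object_type": cell.object_type,
            "from": cell.nominal.name, "to": Autonomy(cell.nominal + 1).name}
```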
Layer 10: Feedback Loop, Effect Tracking, and Replay
Purpose
Measure whether actions caused the expected change and whether the system’s decision quality is improving.
Required Capabilities
- expected effect persistence
- delayed outcome observation
- effect comparison
- action outcome records
- replay endpoints
- watcher jobs
- objective achievement scoring
- side-effect recording
Standards
- expected effects are recorded before or at proposal time
- actual effects are computed from real object state and ledger events later
- replay exposes action, expected effects, observed effects, and verdict
- watcher logic must be deterministic for a fixed asOf time and dataset
Definition of Done
- the system can answer “what did we think would happen?”
- the system can answer “what actually happened?”
- the system can answer “did that intervention work?”
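A sketch of a deterministic effect watcher, assuming hypothetical dict shapes for the action record and ledger events. Given a fixed `as_of` time and event set it always returns the same verdict: it reports "pending" until the action's outcome delay has elapsed, then folds the events recorded inside the outcome window, in a stable order, and compares the result to the expected effects recorded at proposal time.

```python
from datetime import datetime, timedelta


def observe_effect(action: dict, events: list, as_of: datetime) -> dict:
    """Compare expected and observed effects using a fixed as_of time.

    action: {"action_id", "object_id", "executed_at", "outcome_delay", "expected"}
    events: ledger events with "event_id", "object_id", "recorded_at", "fields"
    """
    delay: timedelta = action["outcome_delay"]
    deadline = action["executed_at"] + delay
    if as_of < deadline:
        return {"action_id": action["action_id"], "verdict": "pending"}
    window = sorted(
        (e for e in events
         if e["object_id"] == action["object_id"]
         and action["executed_at"] <= e["recorded_at"] <= deadline),
        key=lambda e: (e["recorded_at"], e["event_id"]),  # stable order
    )
    observed = {}
    for e in window:
        observed.update(e["fields"])
    seen = {k: observed.get(k) for k in action["expected"]}
    verdict = "effective" if seen == action["expected"] else "ineffective"
    return {"action_id": action["action_id"], "expected": action["expected"],
            "observed": seen, "verdict": verdict}
```

The returned record exposes the action, the expected effects, the observed effects, and the verdict, which is the shape the replay standard asks for.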
Layer 11: Runtime Packs
Purpose
Package domain-specific operational behavior behind a stable, simple product surface.
Initial Runtime Packs
Phase 1:
- AR collections
Later phases:
- disputes
- refunds
- credits/write-offs
- payment plans
- finance ops suite
- support ops
- revops
Standards
- runtime packs expose simple business outcomes, not technical primitives
- each runtime pack declares:
  - supported action classes
  - objective defaults
  - policy defaults
  - approval defaults
  - allowed sources
- non-technical users do not need to hand-assemble workers
Definition of Done
- users can provision a runtime pack without writing prompts
- the runtime pack’s control path is fully governed
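A runtime pack's declarations can be captured as a small manifest that is validated before provisioning. This is a hypothetical sketch: `RuntimePack`, `validate_pack`, the approval-default labels, and the example objective names are assumptions. The checks keep the pack inside the governed path by rejecting action classes that the ontology does not know and action classes with no approval default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimePack:
    name: str
    supported_action_classes: frozenset
    objective_defaults: dict   # objective -> weight
    policy_defaults: dict
    approval_defaults: dict    # action class -> "always" | "above_threshold" | "never"
    allowed_sources: frozenset


def validate_pack(pack: RuntimePack, registered_action_types: set,
                  connected_sources: set) -> list:
    """Return provisioning problems; an empty list means the pack may be provisioned."""
    problems = []
    unknown = pack.supported_action_classes - set(registered_action_types)
    if unknown:
        problems.append(f"unregistered action classes: {sorted(unknown)}")
    no_approval = pack.supported_action_classes - set(pack.approval_defaults)
    if no_approval:
        problems.append(f"no approval default for: {sorted(no_approval)}")
    if not pack.allowed_sources & set(connected_sources):
        problems.append("none of the pack's allowed sources are connected")
    return problems
```

The operator only picks a pack; the manifest, not a prompt, supplies everything the runtime needs.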
Layer 12: Product UX and Dashboard
Purpose
Make the system usable by real operators without leaking internal complexity.
Required Surfaces
- onboarding
- company state
- runtime overview
- predictions
- approval queue
- policy runtime
- autonomy map
- action replay
- simulation / what-if
- incident review
Standards
- no fake data in production surfaces
- empty states are explicit
- language is runtime-first and world-model-first
- “agents/workers” are internal implementation details unless the user is technical
- approvals show evidence, not only buttons
Definition of Done
- a non-technical operator can connect Stripe and understand the first recommended action
- the UI reveals why an action is proposed and why it is blocked or escalated
Layer 13: Support, Admin, and Internal Operations
Purpose
Allow a small company to actually operate the product.
Required Capabilities
- partner onboarding checklist
- incident queue
- customer activity timelines
- billing support tooling
- runtime support tooling
- internal overrides with audit trails
- replay and evidence export
Standards
- support actions must be auditable
- internal operators use the same state system wherever possible
- no “support by database guessing” as standard practice
Definition of Done
- incidents can be diagnosed with product-native evidence
- partner onboarding does not depend on ad hoc memory
Security and Compliance Standard
This company will hold business state, financial state, action history, and approval data. Security is core product functionality.
Required Controls
- tenant isolation at every data path
- authenticated writes
- least-privilege service credentials
- secret rotation
- database backups and restore drills
- audit logging
- environment separation
- API abuse controls
- secure webhook verification
- encryption at rest and in transit
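Secure webhook verification usually means checking an HMAC over the timestamp and raw body and rejecting stale timestamps. The sketch below uses only the Python standard library and a generic "timestamp.payload" signing scheme with a hypothetical five-minute tolerance; for Stripe itself, the webhook verification helper in Stripe's official SDK should be preferred over hand-rolled code.

```python
import hashlib
import hmac
import time
from typing import Optional


class WebhookVerificationError(Exception):
    pass


def sign(secret: bytes, timestamp: int, payload: bytes) -> str:
    # Sign the timestamp together with the raw body so neither can be swapped.
    signed = str(timestamp).encode() + b"." + payload
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify(secret: bytes, payload: bytes, timestamp: int, signature: str,
           tolerance_s: int = 300, now: Optional[int] = None) -> None:
    """Fail closed unless the signature matches and the timestamp is fresh."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > tolerance_s:
        raise WebhookVerificationError("timestamp outside tolerance (possible replay)")
    expected = sign(secret, timestamp, payload)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, signature):
        raise WebhookVerificationError("signature mismatch")
```

Verification must run on the raw request bytes, before any JSON parsing, because re-serialized JSON will not reproduce the signed body.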
Required Security Workstreams
Phase 1:
- tenant auth hardening
- write-route auth hardening
- secrets inventory
- backup policy
- restore drill
- incident-response runbook
Later phases:
- SSO/SCIM for enterprise
- granular RBAC
- data retention/deletion workflows
- key rotation automation
- external security review
- compliance program sized to customers
- formal change management
- vendor-risk review process
Definition of Done
- the company can explain its trust boundaries in detail
- no critical path depends on shared human knowledge
Infrastructure Standard
Current Direction
- dashboard on Vercel
- runtime and auth on Railway
- Postgres as system of record
- object storage for evidence
- Sentry
- PostHog
- Resend
- Stripe
Required Infrastructure Capabilities
- repeatable environments
- managed Postgres with PITR
- alerting
- structured logs
- environment-specific secrets
- deploy rollback path
- scheduler reliability
- job monitoring
- object storage
- metrics on gateway, runtime, watcher, and connector paths
Standards
- do not rewrite frameworks for vanity reasons
- optimize for reliability and operational clarity before benchmark-driven micro-optimizations
- any background job must be restart-safe and idempotent
ML and Evaluation Standard
The ML system is part of the product, not a side experiment.
Required Evaluation Domains
- prediction quality
- calibration quality
- side-effect prediction quality
- intervention effect quality
- planner recommendation quality
- approval routing quality
- autonomy promotion quality
- replay accuracy
Required Artifacts
- model version
- feature version
- training window
- evaluation set
- calibration report
- rollback path
Standards
- no model upgrade without offline evaluation
- no autonomy upgrade based on model change alone
- rules remain available as fallbacks
- unknown distributions reduce autonomy
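The release standards can be encoded as a gate that a candidate model must pass against the current baseline on the same evaluation set. This is a hypothetical sketch: the metric choice (Brier score and calibration error), the zero-regression default, and the `ModelRelease` fields are assumptions drawn from the required-artifacts list. The gate's output deliberately carries no autonomy change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRelease:
    model_version: str
    feature_version: str
    training_window: tuple     # (start, end) as ISO dates
    evaluation_set: str
    brier: float               # lower is better
    calibration_error: float   # lower is better
    rollback_to: str           # model_version to restore if this release misbehaves


def release_gate(candidate: ModelRelease, baseline: ModelRelease,
                 max_regression: float = 0.0) -> dict:
    """Offline-evaluation gate; passing it never changes autonomy by itself."""
    reasons = []
    if candidate.evaluation_set != baseline.evaluation_set:
        reasons.append("not evaluated on the baseline's evaluation set")
    if candidate.brier > baseline.brier + max_regression:
        reasons.append("brier score regressed")
    if candidate.calibration_error > baseline.calibration_error + max_regression:
        reasons.append("calibration regressed")
    if candidate.rollback_to != baseline.model_version:
        reasons.append("rollback path does not point at the current model")
    return {"ship": not reasons, "reasons": reasons, "autonomy_change": None}
```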
Long-Term Research Standard
By the time Nooterra claims to be state of the art, it should have:
- its own benchmark suite
- reproducible evaluation harnesses
- intervention-effect experiments
- counterfactual replay experiments
- internal reports on domain transfer quality
Development Workflow Standard
Every material feature must follow this flow:
- write the product intent
- define the failure mode
- define the data contract
- define the audit artifact
- define the tests
- implement the smallest real slice
- run targeted tests
- instrument it
- document it
- only then expand scope
Required Per-Feature Deliverables
- code
- migration, if schema changes
- route contract, if API changes
- tests
- docs
- rollout note
Required Review Questions
- how does this fail closed?
- what is the tenant boundary?
- what artifacts are persisted?
- how is uncertainty represented?
- what are the deterministic guarantees?
- how would this be replayed during an incident?
Testing Standard
Required Test Classes
- unit tests
- route tests
- integration tests
- fail-closed tests
- determinism tests
- replay tests
- tenancy-isolation tests
- migration/bootstrap parity tests where applicable
Safety-Critical Paths Must Test
- missing tenant
- malformed body
- stale auth
- unsupported action class
- missing evidence
- sidecar unavailable
- drift/OOD
- denied policy
- approval required
- replay after restart
Operational Paths Must Test
- scheduler restart
- duplicate webhook delivery
- watcher idempotency
- release/reject flow
- calibration persistence
Data and Migration Standard
Rules
- migrations are additive and backward-safe by default
- new durable behavior requires schema before feature
- process memory is never the source of truth for safety-critical runtime state
- denormalized fields may exist, but durable source records must exist first
Required Tables by Maturity
Already required:
- world_events
- world_objects
- world_relationships
- gateway_actions
- world_beliefs
- world_predictions
- world_prediction_outcomes
- world_autonomy_coverage
- world_autonomy_decisions
- tenant_objectives
- world_action_outcomes
- world_action_effect_observations
Required at later maturity:
- action budgets
- intervention experiments
- counterfactual replay sets
- model release registry
Release and Rollout Standard
Every New Capability Must Have
- activation strategy
- rollback strategy
- monitoring strategy
- incident owner
- customer-facing truthfulness standard
Staged Rollout
- local and targeted tests
- internal staging
- design partner shadow mode
- limited approval-first rollout
- narrower autonomous rollout after evidence
Company-Building Standard
Nooterra needs more than engineers and code.
Required Functions
- product / founder
- backend / platform engineering
- frontend / product engineering
- ML / evaluation
- design
- support / customer success
- ops / reliability
- security / compliance ownership
Required Company Systems
- support tooling
- onboarding process
- incident process
- weekly product review
- analytics review
- design-partner review
- release checklist
- architecture review
- documentation ownership
Priority Build Streams
These streams should run in order of dependency, with some parallelism where safe.
Stream A: Wedge Excellence
- Stripe ingest
- company state
- AR runtime
- approval queue
- real action evidence
- realized outcomes
Stream B: Control-System Completion
- full gateway
- budgets
- temporal policy constraints
- watcher automation
- effect-aware evaluation
- autonomy tied to measured intervention quality
Stream C: Finance Control Plane
- disputes
- refunds
- credits
- write-offs
- payment plans
- finance-specific simulation
Stream D: Operations Maturity
- support tooling
- admin tooling
- incident review
- observability
- billing self-service
- analytics discipline
Stream E: Multi-Source State
- Gmail/conversations
- CRM
- accounting
- support
- entity resolution
- cross-system effects
Stream F: Research and Frontier
- intervention-effect learning
- counterfactual replay
- short-horizon control planning
- benchmark suite
- domain transfer
What Is Explicitly Deferred
The following should not distract from the core build order:
- framework rewrites for prestige
- fake broad “AI employee” packaging
- six shallow domains at once
- aggressive enterprise surface area before the wedge is proven
- autonomy claims unsupported by replayable evidence
Definition of Success
Nooterra succeeds when it can truthfully say:
- we maintain a live business world model
- we know what we observed vs what we inferred
- we know what we predicted vs what actually happened
- we know what actions were safe, unsafe, effective, or ineffective
- we know what autonomy has been earned and why
- we can replay and audit critical decisions
- non-technical operators can use the system to run real business workflows
Code Reality
Yes, this system requires a lot of code, not because complexity is fashionable, but because the real system includes:
- connectors
- ledger
- graph
- beliefs
- prediction history
- calibration
- action ontology
- gateway
- policy runtime
- autonomy persistence
- effect tracking
- replay
- watcher jobs
- planner
- runtime packs
- dashboard
- admin tooling
- billing
- auth
- observability
- support tooling