Master Build Map

Purpose

This is the canonical development standard for building Nooterra into a real enterprise world-runtime company. It is not a pitch deck. It is not a loose roadmap. It is not a research note. It is the operating specification for how to build:

a real product
a real control system
a real software company
a real operational platform

The standard applies to product, backend, runtime, data, ML, security, infrastructure, operations, and company-building work. If a proposed feature, refactor, or milestone conflicts with this document, this document wins unless it is explicitly superseded.

Core Position

Nooterra is not an agent wrapper. Nooterra is not a prompt layer. Nooterra is not a dashboard over disconnected APIs. Nooterra is a policy-constrained, uncertainty-aware, intervention-driven enterprise control system built on top of a persistent business world model. The system exists to answer six questions better than a human operator:

What is true right now?
What is likely to happen next?
What would happen if we act?
What is allowed?
What should be done now?
When should the system abstain and ask a human?

That means the system must be stronger than typical AI products in six dimensions:

memory
consistency
calibration
side-effect reasoning
governance
replayability

What “Real” Means

The system is only considered real when all of the following are true:

It acts from connected system state, not only natural-language setup.
It is fail-closed on missing evidence, missing auth, missing provenance, or high uncertainty.
It records expected effects before action and compares them to observed effects later.
It produces deterministic machine-readable artifacts at every safety-critical step.
It supports replay, audit, and incident review without depending on ephemeral process memory.
It can be operated by a small but serious team serving design partners and paying customers.

The system is not considered real merely because:

an LLM can draft an action
a planner can rank tasks
a dashboard looks convincing
the codebase contains ambitious concepts

Company Standard

The company itself must be built as seriously as the software. Nooterra needs all of the following to become real:

clear initial ICP
consistent product positioning
design-partner operating motion
onboarding discipline
billing discipline
support workflow
security posture
observability
incident handling
product analytics
release discipline
deployment discipline
reproducible environments
real documentation

Every engineering milestone must answer:

what customer problem is solved
what system capability is added
what risk is reduced
how success is measured

Product Definition

Initial Product

The first real product is: Stripe-first governed AR and finance-ops control The first user flow is:

Connect Stripe
Materialize company state
Review overdue/risky invoices
Launch the AR runtime in shadow mode
Review proposed actions through the action gateway
Observe outcomes and allow autonomy to expand only from evidence

Immediate Buyers

The initial buyer is not “every company.” The initial buyer is:

founder-led SMB
finance ops lead
controller
head of revenue operations
operator responsible for collections, billing follow-up, disputes, or payment coordination

Initial Promise

The initial promise is: Connect Stripe, understand what is happening in your receivables, and safely govern what happens next.

Non-Goals for the Initial Product

no fake Gmail world-model source
no multi-domain “AI employee” promise
no broad autonomous external sends by default
no generic team generator as the main product
no pretending simulation is universal before it is domain-specific and measured

Product Principles

1. Simplicity at the surface, sophistication underneath

The user experience must stay simple enough for a non-technical operator. Users should not need to understand:

prompt design
graph schemas
agent routing
autonomy math
model calibration

Users should experience:

connected systems
company state
what needs attention
what Nooterra wants to do
why it wants to do it
whether it is safe to allow

2. No fake state

If data is not live, the UI must say so. Never fabricate coverage, projections, actions, or health state to make the product feel complete.

3. No autonomy without evidence

No action class may become more autonomous without:

persisted execution history
persisted grade history
persisted incident history
persisted uncertainty handling
promotion evidence

4. No LLM as source of truth

The LLM can:

extract
summarize
explain
draft
propose

The LLM cannot be:

the canonical memory
the policy authority
the trust assignment system
the source of company state

5. Build from one domain outward

Depth before breadth. The order is:

AR collections
finance control plane
multi-source company state
domain packs
domain-agnostic enterprise control

Canonical System Model

Nooterra is built from five coupled models:

World Model Persistent observed and estimated company state
Causal Intervention Model What changes if action A is taken instead of action B
Policy Model What is allowed, forbidden, escalated, reversible, or high-risk
Operator Model What the AI runtime is currently competent to do safely
Objective Model What the company is optimizing under constraints

The runtime loop is: Observe -> Model -> Predict -> Simulate -> Plan -> Act -> Evaluate -> Learn

Canonical Engineering Layers

Each layer below is required. Each has explicit standards.

Layer 1: Observation Plane

Purpose

Continuously ingest source-system events and normalize them into tenant-scoped, typed observations.

Required Capabilities

webhooks
polling
sync cursors
idempotent ingest
provenance tracking
raw payload retention
extraction confidence
failure retries
dead-letter handling

Required Sources by Phase

Phase 1:

Stripe only

Phase 2:

Gmail or email/conversation source
accounting source
CRM source

Phase 3:

support platform
calendar/tasks
documents/contracts

Standards

Every inbound source event gets a stable dedupe key.
Raw payloads are retained or referenced before transformation.
Tenant identity must be explicit and fail closed if missing.
Connector state must survive process restarts.
Every connector must support resumption from last good cursor.

Definition of Done

no duplicate world events for duplicate source deliveries
replay-safe ingest
connector-specific tests for malformed payload, duplicate payload, missing tenant, and retry behavior

Layer 2: Temporal Event Ledger

Purpose

Maintain the immutable, append-only business history.

Required Capabilities

append-only writes
per-tenant hash chain
bi-temporal semantics
causal references
object references
event querying
object history reconstruction
replay support

Standards

events are never mutated in place
corrections are represented as new events
hash chain integrity is verifiable
write path must be deterministic where contractually required
events must preserve provenance and confidence

Definition of Done

replay over the ledger can reconstruct downstream state
chain verification succeeds under repeated runs
missing or invalid provenance fails closed where required

Layer 3: Canonical Object Graph

Purpose

Represent the nouns of the business as typed, versioned objects and relationships.

Required Capabilities

canonical object types
object versioning
relationship graph
tenant isolation
search
history
provenance linking back to ledger
entity resolution

Initial Canonical Types

Phase 1:

party
invoice
payment
dispute
obligation

Phase 2:

credit
refund
task
approval
action proposal

Phase 3:

conversation
message
contract
opportunity
account

Standards

observed state and estimated state remain separate
object updates must be explainable from events
relationships carry type and strength
object listing/search must be tenant-scoped

Definition of Done

object state can be traced back to ledger events
object history is reconstructable
entity conflicts can be represented explicitly instead of silently overwritten

Layer 4: Beliefs, Predictions, and Calibration

Purpose

Represent hidden state and future state with explicit uncertainty and calibration.

Required Capabilities

durable beliefs
durable prediction history
durable prediction outcomes
calibration reporting
confidence intervals
drift detection
OOD detection
fallback behavior when sidecar/model is unavailable

Standards

beliefs are first-class records, not only denormalized JSON
predictions are versioned and timestamped
observed outcomes are linked back to specific predictions
uncertainty metadata must be preserved, not discarded
sidecar/model failure must lower autonomy, not silently continue as if confidence were unchanged

Definition of Done

predictions can be inspected historically
outcomes can be joined back to predictions
calibration reports are reproducible
drift and OOD alter runtime behavior, not only monitoring

Layer 5: Action Ontology

Purpose

Represent actions as causal business interventions, not bare tool calls.

Required Capabilities

typed action classes
preconditions
expected effects
side-effect surface
blast radius
reversibility
outcome delay
outcome signals
default intervention confidence

Standards

every external-effect action must have a registered action type
unsupported action types fail closed
action types include observability expectations
expected effects must be storable and comparable to actual outcomes later

Definition of Done

the gateway uses action types, not string heuristics alone
simulation and replay are action-type aware
action class metadata is queryable and stable

Layer 6: Policy Runtime and Authority System

Purpose

Control what the system may do, on whose authority, and under what constraints.

Required Capabilities

tenant auth
user auth
authority grants
delegated authority
budget limits
policy overrides
structured constraints
approval requirements
deny rules
disclosure rules

Standards

no bypass paths for risky or paid actions
child authority only attenuates parent authority
tenant mismatch fails closed
production write routes require authenticated context
policy evaluation must be auditable

Definition of Done

every action can explain why it was allowed, denied, or escrowed
grant lineage is reconstructable
policy decisions are reproducible under replay

Layer 7: Action Gateway

Purpose

Serve as the single chokepoint for external or risky side effects.

Required Capabilities

validation
rate limits
budget checks
disclosure enforcement
simulation
escrow decisions
execution logging
release/rejection flow
evidence bundles
replay-ready persistence

Standards

gateway is always on the control path for external-effect actions
all safety-critical steps must be persisted, not only returned in memory
every gateway result produces machine-readable artifacts
missing simulation or insufficient uncertainty support must degrade to approval or denial

Definition of Done

every governed action has a durable gateway row
preflight and simulation are persisted
approval release preserves audit continuity
no external action can occur without passing the gateway

Layer 8: Objective Model and Planner

Purpose

Move from “what is likely next?” to “what should be done under explicit objectives and constraints?”

Required Capabilities

tenant-scoped weighted objectives
hard constraints
uncertainty penalty
deterministic candidate ranking
action scoring
planner summary
reactive planning in Phase 1
short-horizon planning in later phases

Standards

objectives are explicit and persisted
planner output is deterministic for fixed inputs
uncertainty reduces score
hard constraints can remove candidates entirely
planner must not invent action classes not supported by the ontology

Definition of Done

plan output can be replayed from the same state snapshot
planner scoring is explainable
top-ranked actions align with real operator judgment in pilot review

Layer 9: Operator Model and Earned Autonomy

Purpose

Track what the runtime is currently competent to do and enforce autonomy accordingly.

Required Capabilities

persisted coverage cells
persisted autonomy decisions
promotion proposals
demotion on incidents
abstention on uncertainty or drift
per-action-class autonomy
per-object-type autonomy

Standards

autonomy enforcement must use persisted state
critical incidents demote immediately
uncertainty can cap effective autonomy below nominal autonomy
promotion is recommendation-first, not silent escalation

Definition of Done

autonomy level affects runtime behavior
autonomy history survives restarts
promotion/demotion can be audited after the fact

Layer 10: Feedback Loop, Effect Tracking, and Replay

Purpose

Measure whether actions caused the expected change and whether the system’s decision quality is improving.

Required Capabilities

expected effect persistence
delayed outcome observation
effect comparison
action outcome records
replay endpoints
watcher jobs
objective achievement scoring
side-effect recording

Standards

expected effects are recorded before or at proposal time
actual effects are computed from real object state and ledger events later
replay exposes action, expected effects, observed effects, and verdict
watcher logic must be deterministic for a fixed asOf time and dataset

Definition of Done

the system can answer “what did we think would happen?”
the system can answer “what actually happened?”
the system can answer “did that intervention work?”

Layer 11: Runtime Packs

Purpose

Package domain-specific operational behavior behind a stable, simple product surface.

Initial Runtime Packs

Phase 1:

AR collections

Phase 2:

disputes
refunds
credits/write-offs
payment plans

Phase 3:

finance ops suite
support ops
revops

Standards

runtime packs expose simple business outcomes, not technical primitives
each runtime pack declares:
- supported action classes
- objective defaults
- policy defaults
- approval defaults
- allowed sources
non-technical users do not need to hand-assemble workers

Definition of Done

users can provision a runtime pack without writing prompts
the runtime pack’s control path is fully governed

Layer 12: Product UX and Dashboard

Purpose

Make the system usable by real operators without leaking internal complexity.

Required Surfaces

onboarding
company state
runtime overview
predictions
approval queue
policy runtime
autonomy map
action replay
simulation / what-if
incident review

Standards

no fake data in production surfaces
empty states are explicit
language is runtime-first and world-model-first
“agents/workers” are internal implementation details unless the user is technical
approvals show evidence, not only buttons

Definition of Done

a non-technical operator can connect Stripe and understand the first recommended action
the UI reveals why an action is proposed and why it is blocked or escalated

Layer 13: Support, Admin, and Internal Operations

Purpose

Allow a small company to actually operate the product.

Required Capabilities

partner onboarding checklist
incident queue
customer activity timelines
billing support tooling
runtime support tooling
internal overrides with audit trails
replay and evidence export

Standards

support actions must be auditable
internal operators use the same state system wherever possible
no “support by database guessing” as standard practice

Definition of Done

incidents can be diagnosed with product-native evidence
partner onboarding does not depend on ad hoc memory

Security and Compliance Standard

This company will hold business state, financial state, action history, and approval data. Security is core product functionality.

Required Controls

tenant isolation at every data path
authenticated writes
least-privilege service credentials
secret rotation
database backups and restore drills
audit logging
environment separation
API abuse controls
secure webhook verification
encryption at rest and in transit

Required Security Workstreams

Phase 1:

tenant auth hardening
write-route auth hardening
secrets inventory
backup policy
restore drill
incident-response runbook

Phase 2:

SSO/SCIM for enterprise
granular RBAC
data retention/deletion workflows
key rotation automation
external security review

Phase 3:

compliance program sized to customers
formal change management
vendor-risk review process

Definition of Done

the company can explain its trust boundaries in detail
no critical path depends on shared human knowledge

Infrastructure Standard

Current Direction

dashboard on Vercel
runtime and auth on Railway
Postgres as system of record
object storage for evidence
Sentry
PostHog
Resend
Stripe

Required Infrastructure Capabilities

repeatable environments
managed Postgres with PITR
alerting
structured logs
environment-specific secrets
deploy rollback path
scheduler reliability
job monitoring
object storage
metrics on gateway, runtime, watcher, and connector paths

Standards

do not rewrite frameworks for vanity reasons
optimize for reliability and operational clarity before benchmark-driven micro-optimizations
any background job must be restart-safe and idempotent

ML and Evaluation Standard

The ML system is part of the product, not a side experiment.

Required Evaluation Domains

prediction quality
calibration quality
side-effect prediction quality
intervention effect quality
planner recommendation quality
approval routing quality
autonomy promotion quality
replay accuracy

Required Artifacts

model version
feature version
training window
evaluation set
calibration report
rollback path

Standards

no model upgrade without offline evaluation
no autonomy upgrade based on model change alone
rules remain available as fallbacks
unknown distributions reduce autonomy

Long-Term Research Standard

By the time Nooterra claims to be state of the art, it should have:

its own benchmark suite
reproducible evaluation harnesses
intervention-effect experiments
counterfactual replay experiments
internal reports on domain transfer quality

Development Workflow Standard

Every material feature must follow this flow:

write the product intent
define the failure mode
define the data contract
define the audit artifact
define the tests
implement the smallest real slice
run targeted tests
instrument it
document it
only then expand scope

Required Per-Feature Deliverables

code
migration, if schema changes
route contract, if API changes
tests
docs
rollout note

Required Review Questions

how does this fail closed?
what is the tenant boundary?
what artifacts are persisted?
how is uncertainty represented?
what are the deterministic guarantees?
how would this be replayed during an incident?

Testing Standard

Required Test Classes

unit tests
route tests
integration tests
fail-closed tests
determinism tests
replay tests
tenancy-isolation tests
migration/bootstrap parity tests where applicable

Safety-Critical Paths Must Test

missing tenant
malformed body
stale auth
unsupported action class
missing evidence
sidecar unavailable
drift/OOD
denied policy
approval required
replay after restart

Operational Paths Must Test

scheduler restart
duplicate webhook delivery
watcher idempotency
release/reject flow
calibration persistence

Data and Migration Standard

Rules

migrations are additive and backward-safe by default
new durable behavior requires schema before feature
process memory is never the source of truth for safety-critical runtime state
denormalized fields may exist, but durable source records must exist first

Required Tables by Maturity

Already required:

world_events
world_objects
world_relationships
gateway_actions
world_beliefs
world_predictions
world_prediction_outcomes
world_autonomy_coverage
world_autonomy_decisions
tenant_objectives
world_action_outcomes
world_action_effect_observations

Later required:

action budgets
intervention experiments
counterfactual replay sets
model release registry

Release and Rollout Standard

Every New Capability Must Have

activation strategy
rollback strategy
monitoring strategy
incident owner
customer-facing truthfulness standard

Staged Rollout

local and targeted tests
internal staging
design partner shadow mode
limited approval-first rollout
narrower autonomous rollout after evidence

Never skip straight from “it works locally” to “it runs autonomously for customers.”

Company-Building Standard

Nooterra needs more than engineers and code.

Required Functions

product / founder
backend / platform engineering
frontend / product engineering
ML / evaluation
design
support / customer success
ops / reliability
security / compliance ownership

At the start, one person may cover multiple functions. That does not remove the need for those functions.

Required Company Systems

support tooling
onboarding process
incident process
weekly product review
analytics review
design-partner review
release checklist
architecture review
documentation ownership

Priority Build Streams

These streams should run in order of dependency, with some parallelism where safe.

Stream A: Wedge Excellence

Stripe ingest
company state
AR runtime
approval queue
real action evidence
realized outcomes

Stream B: Control-System Completion

full gateway
budgets
temporal policy constraints
watcher automation
effect-aware evaluation
autonomy tied to measured intervention quality

Stream C: Finance Control Plane

disputes
refunds
credits
write-offs
payment plans
finance-specific simulation

Stream D: Operations Maturity

support tooling
admin tooling
incident review
observability
billing self-service
analytics discipline

Stream E: Multi-Source State

Gmail/conversations
CRM
accounting
support
entity resolution
cross-system effects

Stream F: Research and Frontier

intervention-effect learning
counterfactual replay
short-horizon control planning
benchmark suite
domain transfer

What Is Explicitly Deferred

The following should not distract the core build order:

framework rewrites for prestige
fake broad “AI employee” packaging
six shallow domains at once
aggressive enterprise surface area before the wedge is proven
autonomy claims unsupported by replayable evidence

Definition of Success

Nooterra succeeds when it can truthfully say:

we maintain a live business world model
we know what we observed vs what we inferred
we know what we predicted vs what actually happened
we know what actions were safe, unsafe, effective, or ineffective
we know what autonomy has been earned and why
we can replay and audit critical decisions
non-technical operators can use the system to run real business workflows

That is the bar.

Code Reality

Yes, this system requires a lot of code. Not because complexity is fashionable, but because the real system includes:

connectors
ledger
graph
beliefs
prediction history
calibration
action ontology
gateway
policy runtime
autonomy persistence
effect tracking
replay
watcher jobs
planner
runtime packs
dashboard
admin tooling
billing
auth
observability
support tooling

The right question is not “is this a lot of code?” The right question is: Is every layer necessary to build a real, auditable, governed enterprise control system? For this company, the answer is yes.

Overview

World Runtime

Historical

​Master Build Map

​Purpose

​Core Position

​What “Real” Means

​Company Standard

​Product Definition

​Initial Product

​Immediate Buyers

​Initial Promise

​Non-Goals for the Initial Product

​Product Principles

​1. Simplicity at the surface, sophistication underneath

​2. No fake state

​3. No autonomy without evidence

​4. No LLM as source of truth

​5. Build from one domain outward

​Canonical System Model

​Canonical Engineering Layers

​Layer 1: Observation Plane

​Purpose

​Required Capabilities

​Required Sources by Phase

​Standards

​Definition of Done

​Layer 2: Temporal Event Ledger

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 3: Canonical Object Graph

​Purpose

​Required Capabilities

​Initial Canonical Types

​Standards

​Definition of Done

​Layer 4: Beliefs, Predictions, and Calibration

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 5: Action Ontology

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 6: Policy Runtime and Authority System

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 7: Action Gateway

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 8: Objective Model and Planner

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 9: Operator Model and Earned Autonomy

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 10: Feedback Loop, Effect Tracking, and Replay

​Purpose

​Required Capabilities

​Standards

​Definition of Done

​Layer 11: Runtime Packs

​Purpose

​Initial Runtime Packs

​Standards

​Definition of Done

​Layer 12: Product UX and Dashboard

​Purpose

Master Build Map

Purpose

Core Position

What “Real” Means

Company Standard

Product Definition

Initial Product

Immediate Buyers

Initial Promise

Non-Goals for the Initial Product

Product Principles

1. Simplicity at the surface, sophistication underneath

2. No fake state

3. No autonomy without evidence

4. No LLM as source of truth

5. Build from one domain outward

Canonical System Model

Canonical Engineering Layers

Layer 1: Observation Plane

Purpose

Required Capabilities

Required Sources by Phase

Standards

Definition of Done

Layer 2: Temporal Event Ledger

Purpose

Required Capabilities

Standards

Definition of Done

Layer 3: Canonical Object Graph

Purpose

Required Capabilities

Initial Canonical Types

Standards

Definition of Done

Layer 4: Beliefs, Predictions, and Calibration

Purpose

Required Capabilities

Standards

Definition of Done

Layer 5: Action Ontology

Purpose

Required Capabilities

Standards

Definition of Done

Layer 6: Policy Runtime and Authority System

Purpose

Required Capabilities

Standards

Definition of Done

Layer 7: Action Gateway

Purpose

Required Capabilities

Standards

Definition of Done

Layer 8: Objective Model and Planner

Purpose

Required Capabilities

Standards

Definition of Done

Layer 9: Operator Model and Earned Autonomy

Purpose

Required Capabilities

Standards

Definition of Done

Layer 10: Feedback Loop, Effect Tracking, and Replay

Purpose

Required Capabilities

Standards

Definition of Done

Layer 11: Runtime Packs

Purpose

Initial Runtime Packs

Standards

Definition of Done

Layer 12: Product UX and Dashboard

Purpose

Required Surfaces

Standards

Definition of Done