Chapters

DOC-05 / Technical reference · Chapter 03

The Agentic Core: Atlas and the Agents

Describes the architecture of the agentic engine: LLM classification of incoming emails, headless spawning of Claude Code via pseudo-TTY, injection of agent personas, and post-spawn orchestration (deploy, QA, recap).

The agentic core: orchestrator and agents

Purpose of this section — It describes the engine that transforms an incoming email (or a project task) into an executed action: LLM-based intent classification, launching a headless code session via a pseudo-terminal, injection of the designated agent's persona, and post-execution orchestration (pre-production deployment, quality control, summary email).

1. Overview

The agentic core rests on two main building blocks:

The orchestrator (directing agent, public codename: Atlas). It has no dedicated runtime: "being Atlas" consists of starting an LLM-assisted code session with a dedicated system prompt. It is driven by a set of Python modules and a Node coordinator script.
Specialised agents (around thirty in total, organised into functional families): personas — identity, cognitive framework, business scope — injected into an LLM's context at the moment a task is executed.

The main flow, called the Atlas Inbox pipeline, proceeds as follows:

Forwarded email → Atlas inbox
        │
        ▼  (ingestion → messages table + tracking row status='received')
 ┌──────────────────────────┐
 │ Intent classifier        │  intent ∈ {run, chantier, question,
 │  (LLM via facade)        │           noise, negociation, conseil}
 └──────────────────────────┘  → status='classified'
        │                        + materialises run / question /
        │                          negociation / conseil depending on intent
        ▼
 ┌──────────────────────────┐
 │ Spawn module             │  whitelist {run, chantier, question, negociation}
 │  → Node coordinator      │  launches headless code session via pseudo-TTY
 │    (JSON stream)         │  + Atlas system prompt (CODE + COMMIT only)
 └──────────────────────────┘
        │  The session writes a temporary result file then terminates
        ▼
 ┌──────────────────────────┐
 │ Post-spawn orchestration │  pre-prod deployment → Playwright QA →
 │                          │  Atlas summary email → UPDATE 'actioned'
 └──────────────────────────┘  (re-spawn up to 3 iterations if QA fails)

Ingestion step — The transition "email received → tracking row" is handled by an IMAP polling module. This module queries the Atlas inbox via the parent platform's messaging facade, performs an unread message search, then for each message: inserts a row into the messages table (idempotent operation enforced by a uniqueness constraint on the message identifier) and creates a tracking row with status received, before marking the message as read. It automatically chains the antivirus scan followed by classification — it is this chaining, and not a disabled external scheduler, that loops the pipeline end to end.

2. Agent model

2.1 Storage

Agents are stored in a central table within the main schema. A compatibility view, bearing a name inherited from an earlier migration (Strangler Fig pattern), exposes the same columns. Python code reads via this view; the web interface reads directly from the table via the ORM. Both surfaces are strictly equivalent.

2.2 Key columns

Column	Type	Role
`codename`	varchar(64)	Lookup key, unique kebab-case (e.g. `orchestrator`, `backend`, `securite`, `seo-technique`).
`nickname`	varchar(64)	Public human alias (e.g. Atlas, Gauss, Mitnick, Otlet).
`role`	varchar(255)	Job title (e.g. "SEO Technical Officer").
`group_name`	varchar(64)	Functional family: `direction`, `cadrage`, `execution`, `validation`.
`orbite`	integer	Orbit ring (1, 2 or 3). ⚠️ Does not always match the visually displayed ring (see section 7).
`heritage`	varchar(255)	Geographic/historical anchor of the persona (e.g. "Ancient Greece" for Atlas, "United States, 1963" for Mitnick).
`cognitive_frame`	text	"Way of thinking" — block injected with top priority into the briefing (the 4 cognitive dimensions).
`personality`, `quote`, `inspiration`, `proof`	text	Rich identity (`full` tier only).
`job_mission`, `job_perimeter`, `job_key_checks`	text	Business scope (`metier` tier and above).
`content_md`	text	Long markdown profile, consumed by the legacy agent call module.
`active`	integer	1 = agent is recruitable.
`prompt_domains`	varchar(255)	CSV of prompt domains (content, faq, cover…) for routing to generation agents.
`auto_spawn`, `spawn_count`, `error_count`	integer	Runtime counters.

Internationalisation columns (EN) are present for narrative fields: role_en, heritage_en, cognitive_frame_en, etc.

2.3 Satellite tables

Several satellite tables orbit the main agents table: activity, events, heartbeats, inter-agent relationships, skills, tools, experience and experience history. Some of these tables have a legacy compatibility view (activity, heartbeat, relationships, experience and history); others do not (events, skills, tools). The agents domain manifest lists the full surface of exposed tables and API routes.

2.4 Persona loading

Before an LLM executes a task, it "becomes" the designated agent through a three-layer context assembly mechanism:

Identity + cognitive framework + business scope sourced from the agents table.
Relevant scars via vector semantic search (RAG pgvector) on the embeddings table, prioritising scars belonging to the agent in question before supplementing with global scars. Explicit objective: prevent an agent from seeing another agent's lessons as a priority.
Project mission brief if the project can be resolved from the task.
User preferences (operator profile, core slice).

Three rendering tiers are calibrated by token budget, in order to limit attention dilution and maximise prompt cache hit rate (TTL ~5 min):

Tier	Approx. budget	Included content	Typical use
Core	~150 tokens	Cognitive framework only + minimal identity	Code-free fast path, Task-type sub-agents
Métier	~350 tokens	Core + role + business scope	Default for full Atlas spawn and task worker
Full	~600 tokens	Métier + heritage + quote + personality + inspiration + proof	Review, drill, mission brief

A backward-compatibility alias exposes the métier tier under the generic name format_briefing for legacy calls.

2.5 Note on the legacy agent call module

An older agent call module allows an agent to be invoked as a subprocess and the exchange to be logged in the reactor. ⚠️ This module carries significant technical debt: its scar and doctrine loading helpers still connect to a legacy MySQL database — a remnant predating the migration to PostgreSQL. Exceptions are silently swallowed, meaning the call does not crash but may return profiles with no scars or doctrines.

Point to confirm — Is this legacy path still invoked in production, or has it been superseded by the Atlas spawn pipeline associated with the current persona module? The module's internal comment indicates an ongoing migration to the PostgreSQL agents table, but both scar/doctrine helpers still point to the old MySQL source.

Atlas the orchestrator — the spawn engine

Restricted role of the spawned agent

The system prompt passed to the spawned agent strictly scopes its operational perimeter: investigation, code writing, and commit. The agent does not trigger deployments, does not send emails, and does not update the inbound message registry — all of these operations are orchestrated by the Python engine after the agent finishes execution.

Perimeter guardrails are encoded in the system prompt: modifications are permitted only within the hub, back-office, and the relevant tenant zones; never within a shared public component.

"Early checkpoint" instruction (introduced mid-2026): the system prompt now requires the agent to write an intermediate result file as early as possible — from the first partial diagnostic — and to overwrite it after each major milestone, rather than waiting until the work is fully complete. The rationale is straightforward: if the agent is interrupted by the timeout, the last persisted state serves as salvage to notify the team. A partial file is better than nothing. This instruction was introduced following an incident in which the result file had not yet been written at the moment the process was killed.

Spawn candidate selection

The selection engine queries the inbound message registry and retains only entries that meet all three of the following conditions:

status classified;
classified intent among: run, chantier, question, negociation;
no recent spawn audit (started, completed, or failed within the last hour) — idempotence guarantee against double-spawn.

Applied safety constants:

Allowed intents: run, chantier, question, negociation — the noise and conseil intents are excluded from spawn.
Maximum spawns per cycle: 3, to cap costs in the event of runaway behavior.
Timeout per spawn: 40 minutes. This timeout was raised (from 25 minutes) following an incident in which an investigation on a complex production environment was cut off just before the deliverable was written — the spawn was marked as failed even though the fix was already in place. This change was deployed together with the "early checkpoint" instruction.
Kill-switch: an environment variable allows the spawn engine to be disabled entirely without touching the code.
Default model: Sonnet.

Caution note: the header comment of the main module is stale — it still advertises a two-intent allowlist, a 10-minute timeout, and a different cron offset. The effective constants in the code are authoritative: four allowed intents, 40-minute timeout. This page uses the actual values; do not rely on the file's header comment.

Anti-race: advisory database lock

To cover the collision between the 5-minute cron and a simultaneous manual invocation, the engine acquires a session-level advisory lock on the identifier of the message being processed before any spawn. This lock is held via a dedicated interactive connection and released automatically upon its closure or process death. If the lock is already held by another instance, the message is skipped for that cycle and a spawn_skipped_lock_held audit record is written.

Prompt construction

The prompt passed to the agent is built from three sources:

the forwarded email (sender, subject, body truncated to 6,000 characters);
the tenant identity, resolved by domain heuristic (explicit lookup table first, then fallback to the registered contact);
the output of the intent classifier (intent, confidence score, model reasoning).

In the event of a QA failure, a re-spawn prompt is built by appending the previous commits and the detected errors (HTTP errors at the application layer, browser errors at the interface layer).

Agent contractual deliverable

Before terminating its execution, the agent must unconditionally write a structured result file containing: the status (ok or otherwise), the relevant tenant, the routes to submit to QA, the list of commits performed, a markdown summary, and a draft reply to the user. This file constitutes the contract read by the Python orchestrator to decide on post-spawn actions.

Programmatic invocation via pseudo-TTY

Fundamental harness rule: a Claude agent launched programmatically must go through a real pseudo-TTY via a dedicated Node.js wrapper — never via a direct Python subprocess call.

Why: launching claude in non-interactive mode from a Python subprocess causes a silent hang — buffer deadlock and/or non-TTY mode detection that refuses the prompt passed as an argument. A pseudo-TTY of type xterm-color works around both pitfalls.

Call chain

The Python engine writes the prompt and the system prompt to temporary files, then delegates execution to the Node.js wrapper by passing the paths to those files, the spawn identifier, the target model, and the timeout. The call looks schematically like:

node un composant interne \
  --id <ID> \
  --prompt-file <prompt-file> \
  --system-prompt-file <system-file> \
  --model sonnet \
  --stream-log <log-file> \
  --add-dir <project-directory> \
  --timeout-sec 2400

The Node wrapper

The Node.js wrapper (node-pty dependency) handles the following:

Locating the Claude binary on the host machine.
Building the arguments passed to Claude: non-interactive mode, allowed working directory, no session persistence, verbose stream-JSON output format with partial messages and hook events, model and system prompt appended.
Critical argument ordering: the --add-dir option must be placed before all other flags — otherwise its varargs behavior "consumes" the prompt as a directory. This constraint cost 30 minutes of debugging during a past incident.
Spawning the pseudo-TTY with generous dimensions (200 columns × 50 rows).
Injecting the environment variables required by the agent's Stop hooks (database coordinates, worker context) — the database password is inherited from the parent environment, never hardcoded.
Emitting each stream-JSON event as JSONL on stdout (with tee to a log file), detecting the final result event.
Exit codes: 0 on success, 1 on error, 2 on invalid arguments. A final wrapper event of type node_pty_exit is emitted for the parent Python process.

Live event persistence

On the Python side, each received JSONL line is parsed and persisted in real time to the spawn event log. Each record carries: a sequence number, an event type (sub-typed according to whether it is a stream event, a system event, or a tool result), the name of the tool involved if any, and the full JSON payload.

Architecture note: the event log table is in fact a simple updatable view layered on top of an underlying physical table — the same Strangler-Fig pattern used for the agent registry. Inserts through the view work normally on the exposed columns.

This real-time persistence allows a hang to be diagnosed (the last event visible in the database reveals where the agent is blocked) and enables cost, duration, and token aggregation from the final result event. Everything is also audited in the central audit log.

Intent classification

Intent enumeration

The classifier recognizes six valid intents. The following table describes their semantics and downstream effect:

Intent	Meaning	Downstream effect
`run`	Atomic task executable in a single command	Creation or update of a run in the run registry
`chantier`	Structured multi-step mission (skeleton + agents)	Creation of a chantier draft
`question`	Request for opinion or analysis, reply as draft	Creation or update of a question in the registry
`noise`	Newsletter, spam, nothing actionable	No action
`negociation`	Inbound commercial request (prospect, quote)	Insertion of a negotiation into the commercial registry
`conseil`	Request for opinion or expertise on an existing external item	Insertion of a conseil record into the dedicated registry

Each materialization is mutually exclusive and idempotent: unique indexes and conditional guards prevent duplicate creation even under repeated calls. The created identifier is traced back to the originating message record.

LLM engine

Classification relies on the AI provider facade — a centralized abstraction layer — never on a direct SDK call. Model routing is resolved dynamically by configuration; the default provider and model are Mistral / mistral-small-latest. Output is requested as strict JSON, with a 30-second timeout.

Guardrails

Anti-prompt-injection: the email body is sandwiched between two explicit delimiters and preceded by a warning instructing the model that this is email data, not instructions. The body is truncated to 8,000 characters.
Strict enumeration: any intent outside the six recognized values is rejected, regardless of the model's response. Any confidence score outside the [0, 1] range is also rejected.
Confidence auto-floor: a confidence score below 0.5 forces the intent to noise.
Uncertainty threshold: a confidence score below 0.7 triggers a classification_uncertain audit record.
Pre-LLM short-circuit: if the founder re-forwards an Atlas summary (subject starting with Fwd: [Atlas]…), the intent is forced to noise without an LLM call, to avoid a costly no-op spawn.
Antivirus scan first: classification is blocked if any attachment has a verdict other than clean — "scan before opening" doctrine.
False-noise guardrail with founder attachment: if an email is classified as noise but originates from the founder and carries attachments, the intent is reclassified as question. Rationale: actionable content may be present in a screenshot that the text model cannot see. A noise_override_founder_attachment audit record is written.
Existing client guardrail: if an email is classified as negociation or conseil but the sender address matches an active tenant, the intent is downgraded to run scoped to the relevant tenant. Client emails are not new commercial negotiations. An existing_client_guard audit record is written.

At the end of classification, the inbound message registry is updated with the intent, confidence score, model used, classifier reasoning, and the classified status. A classified audit record is written.

OCR enrichment of attachments

Before the LLM call, if attachments are present and the antivirus verdict is clean, an optical character recognition (OCR) text extraction module is invoked on the temporary attachment directory. The engine used is Tesseract, running locally — no cloud call is made.

The extracted text is injected into the sandwich prompt context, between its own delimiters (IMAGE_OCR_TEXT_START / END), after the email body. The OCR engine configuration is resolved by the dynamic routing system.

Important — personal data: the extracted OCR text is never persisted to the database (personal data risk — in-memory use only within the classification cycle). Only an ocr_extracted audit record is written, carrying the number of characters extracted and the approximate number of images processed.

Rationale: the text model cannot see screenshots attached to emails. Without OCR, tickets containing screenshots of issues were incorrectly classified as noise, blocking all automated processing. Local Tesseract OCR extracts visible text at no additional cost and without leaking data to a third-party service.

The two parallel AI routers

The system runs two distinct multi-provider AI routing engines that share neither code nor configuration. This is not an accidental duplicate: they serve two different execution environments. Knowing which one to modify depends on the development context.

Axis	Python router (automations)	TypeScript router (Nuxt hub)
Runtime	Scheduled automations (crons, nightly loop, reminder, classifier, learning…)	Nitro/Nuxt hub runtime (server API entry points)
Exposed surface	`embed(…)` + `complete(…)`	Content generation + provider resolution per client
Providers	Embeddings: Mistral / Voyage / OpenAI; completion: Claude / Mistral / OpenAI	Mistral / Anthropic / OpenAI
Routing source	AI routing configuration YAML file	Per request (explicit provider) or per client configuration
Default	Embeddings → Mistral; completion → Claude	Mistral (FR/EU data sovereignty)
Claude provider call	Delegated to the agent invocation mechanism (subprocess)	Direct HTTP request via a dedicated utility function
Doctrine	Mandatory pass-through via the validated routing facade	Internal Nuxt module service, no hook-guarded facade

The full detail of the Python router (embeddings, completion, YAML file, retries, guardrails) is documented in the Memory & Learning chapter. This section describes the TypeScript counterpart, symmetrically.

The generation function — single entry point of the Nuxt runtime

The content generation function is the single entry point of the Nuxt runtime for producing text via AI. It accepts as input: a prompt, an optional systemPrompt, an optional provider, an optional model, a maximum token count (default 4096), and a temperature (default 0.7). The response is normalized and exposes: the generated content, the provider that actually responded after any failover, the model used, the token count (input/output), and the duration in milliseconds.

Default provider = Mistral — a sovereignty choice (FR data, European API), consistent with the Python router's default for embeddings.
Default models: Mistral → mistral-large-latest; Anthropic → claude-sonnet-4-6; OpenAI → gpt-4o.

Automatic failover (sovereign order)

The failover order is: Mistral → Anthropic → OpenAI. On invocation, the system builds the effective order by placing the requested provider first, then iterates according to the following logic:

generateContent(request)
   effective provider = request.provider ?? 'mistral'
   order = [provider, ...sovereign_order \ provider]
        │
        ▼   for each provider in order:
   API key missing? ──yes──► skip (next provider, silently)
        │ no
        ▼
   callProvider()  ──success──► return { provider, durationMs, … }
        │ exception
        ▼
   last in order? ──yes──► error "All AI providers are unavailable"
        │ no
        ▼
   log failover → next provider

Two conditions trigger a move to the next provider:

Missing API key: the system silently skips the affected provider.
Call exception: the system fails over to the next provider and logs the event.

During a failover, the model used is realigned to the fallback provider's default model — and not to the model requested for the initial provider. The provider returned in the response always reflects who actually responded, which is useful to log on the caller side.

The three providers are called via distinct functions: Mistral and OpenAI via direct HTTP request (120 s timeout, identical message format with optional system + user); Anthropic via a dedicated utility function.

Point to confirm: the Anthropic utility function is referenced in the code but its implementation was not located within the module scope. Either it is resolved by Nitro auto-import from an unindexed utility, or it is latent technical debt that would cause the Anthropic branch to fail at runtime. The Mistral → Anthropic failover is therefore not guaranteed to be operational without prior verification.

API keys (anti-leak)

API keys are resolved exclusively by environment variable name; no value appears in the code or in this documentation. The variables concerned are: the Mistral API key, the Anthropic API key, and the OpenAI API key — all stored in environment files excluded from version control.

A missing key is not a blocking error: it simply causes the corresponding provider to be silently skipped in the failover loop.

Per-client provider selection — tenant configuration resolution

The per-client resolution function reads the AI configuration stored in the client configuration table (JSON column) and returns the pair { provider, model }. The extraction logic is:

AI provider configured → defaults to mistral if absent;
AI model configured → defaults to the provider's default model if absent.

Any error (unknown client, invalid JSON, helper failure) falls back silently to Mistral with its default model. The active=1 filter ensures that an inactive client is never returned, even if its configuration specifies a provider.

Observed state: the majority of clients have the AI provider field absent from their configuration (→ sovereign Mistral by default); only a few set it explicitly.

Point to confirm: the per-client resolution function has no identified caller within the module scope. The two actual consumers of the generation function pass the provider hard-coded in the request and never query the client configuration. Per-tenant routing is therefore wired but not yet connected to the generation flow.

Actual callers of the generation function

Two Nuxt entry points currently consume the AI gateway:

Entry point	Usage	Requested provider
Client reply draft generation (Atlas pipeline)	Drafts the client reply for a completed Atlas run (formal register, 3–6 sentences, team signature)	Mistral / `mistral-small-latest`
Multi-agent dialogue workspace (client)	Multi-agent perspectives on support notes, then lead response	Notes: Mistral / `mistral-small-latest`; lead response: Anthropic

Important reading: the Atlas reply draft does not use the Python router — although it is the final piece of the Atlas Inbox pipeline, it is served by a Nuxt endpoint, and therefore by the TypeScript gateway. This is the concrete illustration of the boundary:

Classification and triggering = Python router (automations);
Draft authoring from the hub cockpit = TypeScript router (Nuxt).

The multi-agent dialogue workspace is the only caller that deliberately mixes two providers within a single HTTP request (Mistral for the analysis notes, Anthropic for the lead's final voice).

Orbits: Direction, Scoping, Execution, Validation

The data model

Active agents are distributed along two axes that coexist and must be distinguished:

The functional family — the business source of truth, corresponding to the four structural roles: direction, scoping, execution, validation.
The numeric ring (1/2/3) — an attribute stored in the database that does not map 1:1 to the functional family (for example, the scoping family appears in both ring 1 and ring 2).

Observed distribution across active agents:

 ring   | family     | count
--------+------------+--------
   1    | direction  |   4     ← Atlas, Hill, Colbert, Winnicott
   1    | scoping    |   3     ← Montesquieu, Gauss, Clausewitz
   2    | scoping    |   2     ← Marco Polo, Socrate
   2    | validation |   8     ← Otlet, Mitnick, Itten…
   3    | execution  |  12
   3    | validation |   1

The visual reactor render

The reactor visualization page does not read the ring column from the database. It recomputes the display ring from the functional family via a local lookup table:

const ringMap = { direction: 1, cadrage: 2, execution: 2, validation: 3 }

Ring 1 — Direction (60 s rotation)
Ring 2 — Scoping + Execution (90 s, counter-clockwise)
Ring 3 — Validation (120 s)

Point to confirm: is the divergence between the ring column in the database and the interface lookup table intentional or technical debt? Since the visual render is driven exclusively by the functional family, it is considered reliable; the ring column appears underused on the front-end side.

Routing by prompt domain

In parallel with the rings, agents are selectable by domain via a CSV attribute. A dedicated endpoint allows querying agents compatible with a given domain (contenu, faq, cover, podcast, linkedin, reels), with cross-tenant proxy to the Mother Ship if the call originates from a remote client VPS.

Atlas classifier calibration

A monthly reporting script aggregates classification data and audit logs to produce a classifier health report. It computes the following metrics:

Volume by intent: number of classifications, average confidence, and standard deviation.
Proxy success rate: per intent, ratio of non-cancelled jobs to total drafts generated. A cancelled draft means the responsible party rejected the suggestion.
Uncertainty rate: proportion of classifications flagged as uncertain out of the total.
Failures: count of failed classifications and blocks due to unsafe attachments.
Phase 3 promotion threshold (multi-tenant): the promotion criterion is a success rate above 90% over at least 30 internal jobs. Until this threshold is reached, Atlas remains confined to internal use.

Important note: the "calibration" described here is observability and threshold management, not model retraining or automatic weight adjustment. Automatic calibration of the Atlas system prompt (example-based learning, continuous improvement) is a roadmap feature, not yet delivered.

Post-spawn orchestration: deployment, QA and notification

Once the agent spawn completes, the system automatically chains three phases: pre-production deployment, quality assurance, then e-mail notification.

Reading the spawn contract

The post-spawn orchestrator starts by reading the JSON contract produced by the agent. This contract may contain an intent correction decided by the agent itself: it is allowed to downgrade the initial intent to a non-code-producing intent (question, negotiation, run). This correction is possible because the agent has read the full conversation thread, whereas the input classifier only had access to a truncated excerpt.

No-code branch

When the final intent is of type question, negotiation or run, the system entirely bypasses the deployment and QA phases. It directly sends a summary and marks the record as handled.

Code branch (chantier)

When the intent is a chantier, the full sequence is triggered:

Pre-production deployment via the standard deployment script, with a 15-minute timeout.
Execution of the automated QA test suite (Playwright) against each affected route.
Depending on the QA result, the orchestrator takes one of the two decisions described below.

QA iteration loop

If QA fails, the orchestrator attempts a new spawn cycle enriched with the error context, up to a maximum of three iterations. Beyond that, it triggers a human escalation. If QA passes, it sends the summary and closes the record.

spawn ok ──► deploy preprod ──► QA route(s)
                                    │
              ┌── QA OK ────────────┴── QA FAIL ──┐
              ▼                                    ▼
        Atlas notification               iteration < 3 ?
        record 'actioned'       ┌── yes ──┴── no ──┐
                                ▼                   ▼
                      re-spawn (QA error    human escalation
                      context)             (max iter reached)

E-mail notification

The final notification is routed through the system's e-mail sending facade. It is always addressed to the internal Synedre team — never directly to the external requester. This rule is a fundamental doctrine: the AI never contacts the end client on its own initiative. The credentials for the sending mailbox are read from server environment variables and never appear in plaintext in the code.

Pipeline trigger loop

The Atlas pipeline is driven by standard system scheduled tasks. Each task goes through a supervision component that manages locks and execution logs.

Note: The application-level scheduling engine built into the server has been out of service since May 2026. The pipeline relies entirely on system scheduled tasks.

Three tasks form the end-to-end loop:

Frequency	Component	Role in the loop
Every minute	Atlas mailbox collector	Ingestion and chaining. Polls the dedicated inbox, inserts new messages, then cascades into antivirus analysis and intent classification. Classification is therefore triggered here, within the same cycle as ingestion.
Every 5 minutes (with a 2-minute offset)	Spawn orchestrator	Spawn and orchestration. Retrieves messages whose intent is classified and eligible, then launches the spawn and post-spawn orchestration (deployment, QA, notification).
Every 15 minutes	Lock guardian	Safety net. Emits an alert if a zombie lock persists beyond 10 minutes, preventing any silent pipeline blockage.

The actual execution order is therefore: ingestion (minute 0) → analysis + classification (same tick) → spawn (minute 2 of the 5-minute cycle). The two-minute offset between the collector and the spawn orchestrator ensures that classification is complete before the spawn looks for eligible messages. An advisory database-level lock covers the residual collision between a scheduled run and a manual trigger.

The Atlas mailbox collector is distinct from the platform's general mailbox collector, which feeds a separate table and must not be confused with it.

Pipeline components: roles and responsibilities

Main application components

Component	Role
Atlas mailbox collector	Polls the dedicated inbox, inserts raw messages and tracking entries, then cascades into antivirus analysis and classification. Runs every minute.
Classification engine	Classifies the message intent via an LLM and materialises the corresponding record (run, question, negotiation, advice, chantier). Since June 2026: OCR enrichment of attachments and a dual safeguard against false positives and requests from prospects not yet clients.
Spawn orchestrator	Handles the coder agent spawn, pre-production deployment, the QA loop and the final notification. Since June 2026: spawn timeout extended to 40 minutes, with an early checkpoint instruction in the system prompt.
Classifier calibration tool	Produces the monthly classification quality report and drives the threshold for advancing to the next phase.
Streaming spawn wrapper	Mandatory Node.js component for launching the AI agent in interactive JSON streaming mode via a virtual terminal.
Agent persona loader	Assembles the agent personality across three levels, enriches context via vector search in the scar database, and injects the mission letter.
Agent invocation client (legacy)	Subprocess-based agent invocation, maintained for compatibility. The scar and doctrine helpers in this component still rely on the legacy relational database.
Multi-provider AI router (server-side web)	Exposes a unified content generation interface with automatic failover between AI providers (configurable priority order). Resolves the preferred provider per client according to the topology configuration.
Agents domain (web interface)	Agent registry, event reactor, database facades and central dashboard API.

Tables and data structures

Agent registry: main table of declared agents, also exposed via a summary view.
Raw ingested messages: storage of incoming e-mails prior to processing.
Atlas tracking: lifecycle tracking table for each message processed by Atlas.
Response drafts: prepared response bodies, read by the classifier when materialising a question-type intent.
Atlas audit log: trace of all pipeline actions.
Spawn events: updatable view over the base spawn events table (see section on the updatable view).
Work records: dedicated tables for runs, questions, negotiations, advice and chantiers.
RAG database: storage of vector embeddings for scar retrieval.
Client topology: configuration of client VPS instances and their AI preferences.

External module: OCR

Since June 2026, the classifier calls a local OCR module (Tesseract) to extract text from image attachments before classification. This module is independent of external AI providers.

PreviousThe Data Layer NextWorksites, Jobs & Tasks — Data Model and API

All chapters