Chapters

DOC-05 / Technical reference · Chapter 07

Inbox hub & Atlas Inbox — two email pipelines

Describes the two IMAP→DB pipelines of the harness (contact@ sorted for the hub, atlas@ triggering agentic actions), the outbound sending facade, and the associated blocking doctrines (AV scan, zero direct client contact).

Overview — two distinct pipelines

The system hosts two mailboxes and two email storage spaces that share no join relationship whatsoever. This is the primary source of confusion; the following must be understood before anything else:

	Pipeline "Inbox hub"	Pipeline "Atlas Inbox"
IMAP mailbox	Main contact mailbox (shared hosting)	Dedicated agentic mailbox (shared hosting, separate mailbox)
Target storage	Hub email table	Raw message table + Atlas workflow table
Active collector	Python collection script (active since May 2026)	Direct Python poll script (no intermediate facade)
Disabled implementation	Legacy Nuxt facade disabled May 2026	(no Nuxt facade — direct Python poll)
Classification	Static heuristic (sender domain → client / MRR / spam)	LLM, intent among 6 possible values
Purpose	Triage and display in the hub (client bugs, MRR, priority)	Trigger actions: run, project, question, negotiation, advice

The general flow diagram is as follows:

Main contact mailbox (IMAP)
              │
              │ cron every minute — Python inbox hub collector
              │ (read-only, 7-day sliding window, dedup by UID)
              ▼
    Inbox hub storage
    (heuristic classification: client / priority / MRR / spam)
              │
              ▼  hub interface

  [DISABLED May 2026] legacy Nuxt facade / Nitro scheduled task
  — was saturating the event loop on IMAP timeouts

Atlas agentic mailbox (IMAP)
              │ cron every minute — Atlas Python collector
              │ (reads unread messages)
              ▼
  Raw messages ──1:1──▶ Atlas workflow entries (status "received")
              │
              ├─ attachments present → antivirus analysis + heuristics
              │        └─ safe attachments: yes / no
              ▼
  LLM classification → detected intent
              │
  ┌──────┬──────────┬──────────┬──────────────┐
 run  question  project  negotiation  advice / noise

Pipeline "Inbox hub" — main contact mailbox

A single active collector (since May 2026)

This email table has only one active feed path wired in production: the Python collector dedicated to the inbox hub. The legacy Nuxt facade is out of service.

Active Python collector — performs a direct IMAP poll of the main contact mailbox via the inbox connection facade (read-only mode). It queries messages from the last 7 days, processes UIDs from most recent to oldest, and inserts each message using an upsert logic: on conflict with an existing IMAP identifier, already-populated fields are never overwritten. It runs as a cron job every minute. This collector replaces the legacy Nitro scheduled task.
Legacy implementation — disabled: the old Nuxt facade (sync endpoint → Nitro task), triggered by a dedicated cron script, was disabled in May 2026. It was saturating the Nuxt server event loop due to repeated IMAP timeouts that also blocked the database connection. The corresponding cron line is commented out in the crontab.

There are therefore not two co-active paths: one live path (Python collector) and one dead path (legacy Nuxt facade). The emergency read skill (IMAP fallback, read-only) explicitly states that it never touches the main storage in order to avoid any double insertion.

Detection of a genuine insertion: the upsert query uses a mechanism that distinguishes a genuine insertion from a backfill on a duplicate — which allows new messages and silent updates to be counted separately.

Heuristic classification (no LLM)

Inbox hub classification relies on a static heuristic applied to the sender address only (the subject is not used):

The address is compared against a reference set of known client domains and email addresses → enriches the message with the client name, priority, and MRR.
Otherwise, spam markers are tested against the address (newsletter, noreply, no-reply, mailer-daemon, postmaster, notifications@, updates@, marketing@) → the message is flagged spam.

The initial status assigned is either spam or new. The message body is extracted in two variants (plain text and HTML) via multipart parsing, explicitly ignoring parts marked as attachments. Both bodies are persisted in full, without truncation. On re-sync of an already-known message, the upsert backfills the bodies only if they are empty (via COALESCE) — the existing classification remains intact.

Attachments: not captured by the active flow. The Python inbox hub collector does not persist any attachments — Content-Disposition: attachment parts are explicitly ignored during body extraction. Attachment storage (one row per file in binary data) belonged to the now-disabled legacy facade; as of the current state, the inbox hub attachment table is no longer fed by this pipeline.

IMAP connection credential management

The active collector does not read the IMAP connection variables directly: it delegates to a connection facade that resolves the credentials for the contact mailbox. These values reside exclusively in the host server environment file, never in the repository.

Several system components (outbound mail client, utility scripts, main collector) reference the IMAP password under slightly different variable names depending on their origin. In the event of a discrepancy, the host environment file takes precedence and defines the actual mapping.

Inbox hub storage structure

Field	Type	Role
Internal identifier	integer, primary key
IMAP identifier	string, unique	Deduplication key (upsert `ON CONFLICT`)
Sender (address, name), subject	string
Received date	timestamp with time zone
Client name, client priority, MRR	string / integer	Populated by heuristic classification
Bug flag, bug keyword, AI severity, AI summary	integer / string / text	Bug triage (set to 0 at initial insertion)
Archive path	string	Reference to the archived .eml file (not written by the active flow)
Status	string	`new`, `spam`, …
Processed by, processed on, notes	various	Internal annotations
Link to project	integer, foreign key	Association with a project
Plain text body, HTML body	text	Full untruncated body
Created at, updated at	timestamp	Written on every insertion or update

The hub interface exposes several endpoints to list messages, access message details, update a message, view its attachments, link it to a project, or manually trigger a synchronisation.

Inbound Email Reception and Classification Pipeline

This pipeline processes emails received on CodeMyShop's central inbox through three automatically chained steps: collection → attachment analysis → classification.

Message Collection

A scheduler triggers the collection cycle every minute via a non-blocking lock mechanism: if a cycle is already running, the next trigger is silently ignored. A watchdog periodically checks (every fifteen minutes) for orphaned locks and releases them as needed.

The collection cycle proceeds through the following steps:

IMAP connection via the mandatory facade — mailbox access is exclusively routed through the Synedre OS email abstraction layer (see the facade guardrail section below). Any direct IMAP connection is structurally blocked.
Unseen message query — the request targets messages flagged UNSEEN, with a configurable limit (50 by default). The fetch uses PEEK mode to avoid automatically marking messages as read.
Database persistence — each message is inserted idempotently: a conflict on the message identifier is silently ignored. The message body is capped (60,000 characters for plain text, 120,000 for HTML) to prevent capacity overflows. The \Seen flag is set only after insertion is confirmed, guaranteeing that any unflagged message will be reprocessed in the event of a crash.
Atlas registry entry — a second, equally idempotent insert creates the tracking entry with the initial status received.
Automatic chaining — if the message contains attachments, antivirus analysis is triggered before classification (scan-first doctrine), after which classification is launched in all cases.

Internal anti-loop: emails whose sender is the Atlas mailbox itself, or whose subject contains an internal tracking identifier, are marked as read and discarded without insertion, preventing self-spawn loops. Manual re-forwards from an identified founder sender are processed normally, as they constitute a legitimate workflow.

Audit: each cycle is recorded in an append-only forensic log. By design, no personal data is stored in plaintext: only a truncated cryptographic digest of the sender address is retained.

Attachment Analysis

When a message contains attachments, the analysis module is invoked before any classification. The mechanism is as follows:

Retrieval of message metadata (IMAP identifier, attachment presence) via a join between tracking tables.
Read-only re-fetch of the message by its IMAP identifier.
Extraction of attachments to an isolated temporary directory, with path-traversal protection applied to filenames.
Pre-scan rejection: known executable extensions (.exe, .scr, .bat, .cmd, .com, .vbs, .js, .jar, .msi, .dmg, .deb, .rpm, .ps1) and files exceeding 25 MB are immediately rejected with the verdict blocked.
Antivirus analysis of each attachment via the security agent facade (see §5).
Global verdict: attachments_safe = 1 only if at least one file was scanned and no suspicious, infected, or rejected file was detected. Otherwise, attachments are moved to quarantine and the message status is switched to failed.
Persistence of the detailed verdict (JSONB structured format) and audit logging.

AI-Based Classification

Classification determines the nature of each inbound email and triggers the materialisation of the corresponding business objects.

Pre-Classification Guardrails

Block on unsafe attachments: if a message contains attachments, the analysis verdict is available, and that verdict is negative (attachments_safe = 0), classification is refused and the event is audited. Tolerance: an absent verdict (message with no attachment or scan not yet performed) allows classification to proceed.

Pre-LLM short-circuit — founder re-forward to Atlas: if the sender is the founder and the subject indicates a re-forward of an Atlas summary, the intent is forced to noise without invoking the language model. This short-circuit avoids triggering a costly spawn on already-processed content.

Prompt Injection Protection

The email body is enclosed by explicit start and end markers.
An explicit warning instructs the model that the enclosed content is email data, not instructions.
The body is truncated to 8,000 characters before being submitted to the model.
Output is constrained by a strict enumeration of valid intents — any out-of-enum hallucination is rejected by validation.
Scope and recipient are never derived from the message content.

Local OCR — Context Enrichment

After antivirus verdict validation (safe attachments), if attachments are present, a local OCR engine (Tesseract) analyses images to extract their text. This text is appended to the prompt context between dedicated markers. Two doctrine rules apply:

OCR text is never persisted to the database (potentially personal data — in-memory use only).
OCR is triggered after the antivirus guardrail, so that the content of an unscanned image is never read.

Language Model Invocation

If no short-circuit has fired, the system queries a language model via the Synedre OS unified AI facade. The provider and model are defined by the routing configuration (default: Mistral, model mistral-small-latest). Output is requested as structured JSON with a 30-second timeout.

Output validation: the intent must belong to the set of valid values and the confidence score must be between 0 and 1. A score below 0.7 triggers a classification_uncertain audit event. A score below 0.5 automatically floors the intent to noise, independently of the audit threshold.

Post-LLM Guardrails

Founder re-forward with attachments incorrectly classified as noise: if the model returns noise while the sender is the founder and the message contains attachments, the intent is requalified to question. This case covers situations where actionable content is contained in an image that the small model cannot analyse directly.
Existing client rerouted: if the intent is negociation or conseil but the sender address matches an active tenant (verified by email domain against client reference tables), the intent is requalified to run scoped to that tenant. Prospects remain in negociation.

Valid Intents and Materialisation

The six valid intents are: run, chantier, question, noise, negociation, conseil. Each intent materialises at most one entry in the corresponding business table, idempotently.

Intent	Action	Object created	Deduplication	Link recorded
`run`	UPSERT	Work item (`source='atlas-inbox'`, `trigger='email'`, optional scope if rerouted)	Unique constraint on source reference	Linked work item identifier
`question`	UPSERT	Question (`status='pending'`)	Unique constraint on source reference	Linked question identifier
`negociation`	INSERT	Negotiation (`status='nouveau'`)	Guard on existing link	Linked negotiation identifier
`conseil`	INSERT	Advisory (`status='received'`)	Guard on existing link	Linked advisory identifier
`chantier`	Manual draft (CLI)	Workstream (`status='draft'`)	—	Linked workstream identifier
`noise`	Total no-op	—	—	—

For negociation and conseil intents, the company name is derived from the sender domain, except for generic domains (gmail.com, outlook.com, laposte.net, etc.).

Special case for the chantier intent: unlike other intents, workstream creation is not automatic upon classification. The message remains in classified status with the chantier intent. Workstream creation in draft mode is triggered manually by the founder via the --link-chantier option of the CLI, in accordance with the seven-step procedure doctrine.

Command-Line Interface — Classification

# Classify an email by its identifier
python3 -m un composant interne --id-atlas-email 42

# Classify all pending emails (received, safe or absent attachments)
python3 -m un composant interne --all-pending

# Reclassify an already-processed email
python3 -m un composant interne --reclassify 42

# Create a draft workstream from a chantier-classified email
python3 -m un composant interne --link-chantier 42

# Report low-confidence classifications (< 0.7)
python3 -m un composant interne --report-uncertain

Email Facade Guardrail

This guardrail is a technical mechanism, not a mere convention. It is a pre-execution hook wired to shell commands launched by agents. Its role is to intercept any attempt at direct mailbox access (inline Python IMAP/SMTP send or read, or via an unregistered script) and block it with a hard error code.

How it works:

An allowlist enumerates the authorised Synedre OS email modules (email client, IMAP sync, direct access). Any command referencing one of these modules is passed through.
Unlisted commands are analysed for direct-access patterns: smtplib and imaplib imports, SMTP_SSL/IMAP4_SSL classes, MIME message send methods, or calls to banned parallel facades (send_immediate, send_email_quick).
On a match, the command is blocked and an explanatory error message is displayed to the agent.

This guardrail was introduced following a production incident in which an unofficial facade bypassed the IMAP /Sent folder and corrupted the encoding of accented characters during shell escaping. The hook structurally prevents any recurrence.

Atlas Tracking Table Schema

The central tracking table stores all lifecycle information for an inbound email:

Identity: Atlas entry primary key, foreign key to the email messages table.
Classification: classified intent, confidence score (numeric), model name used, textual justification.
Status: progression through the pipeline (received → classified or failed).
Attachments: detailed verdict (JSONB), binary safety indicator.
Links to business objects: identifiers of created workstreams, work items, questions, negotiations, and advisories.
Dispatch authorisation block: dispatch token, issuance and consumption timestamps, consumer identifier, reply message identifier.
Provenance: source type and associated metadata (JSONB), allowing an IMAP-polled email to be distinguished from a manual re-forward. This information enables the founder re-forward short-circuit to function without wasting a spawn.

Forensic audit log: an append-only audit table records each significant action (action type, actor, structured details in JSONB, timestamp). By design, details never contain personal data in plaintext: only digests, identifiers, error codes, and metrics.

Note — Spawn and dispatch are two distinct pipelines:

The spawn module (scheduler every 5 minutes) triggers an agent in autonomous mode exclusively for chantier and question intents. The noise and run intents never trigger a spawn via this scheduler.

The dispatch module manages the email-based authorisation pipeline: dispatch token generation, sending the authorisation email via the official facade, then triple anti-spoofing verification (sender allowlist + DKIM + thread matching + single-use atomic token) before executing the dispatch command.

Both modules are outside the scope of this chapter.

Manual Attachment Extraction

A command-line tool allows attachments to be extracted from an email by message identifier (not by mailbox identifier). It accepts an optional output directory; by default, files are placed in a dedicated temporary folder.

Mailbox connection credentials (server, username, password) are read from the repository's environment files — the effective value takes precedence.
The parser is multipart-aware and preserves binary bytes on write.

Doctrine: extracted files must never be opened or parsed before the antivirus scanner has returned a clean verdict (see next section). The email search command lists attachments without extracting them, precisely to enforce this scan-first principle.

Outbound Professional Email Facade

A single facade centralises all outbound professional email sending. Direct use of the low-level SMTP library is prohibited outside this facade. This is the absolute rule for this scope.

Draft → Validation → Send Cycle

The workflow follows three distinct steps:

Draft (--draft): writes a timestamped JSON file in the drafts directory with status draft, and automatically sends a validation copy to the manager (never to the client). The message is not delivered to the client at this stage — a console message states this explicitly.
Validation (--validate): resends the validation copy to the manager for review. The --list command lists pending drafts; --preview displays the full content of an identified draft.
Actual Send (--send --draft-id N): sends the message to the final recipient and archives a copy in the IMAP mailbox's Sent folder. Only this explicit trigger, after human validation, delivers the message to the client.

The draft and send sub-commands are never chained automatically: this is the implementation of the show before send doctrine.

Available Arguments

Flag	Effect
`--to` / `--cc` / `--subject`	Recipients (repeatable or CSV) and message subject
`--body`	Inline message body
`--body-file`	Body read from a UTF-8 file (recommended, avoids inline)
`--markdown`	Converts the Markdown body to HTML before sending (recommended for all client emails)
`--html`	Body already in HTML format
`--attachment`	Attachment (repeatable) — fails immediately if the file is missing
`--draft-id`	Targets an existing draft for sending, preview, or validation

The --draft option requires --to, --subject, and a body (--body or --body-file) to be provided.

Automatic Validation Copy

Each time a draft is created, the system sends a copy to the manager with the subject prefixed [TO VALIDATE → <recipient>] <subject>. An anti-loop mechanism prevents this copy from being sent if the recipient is already the validation mailbox itself.

Before displaying the draft, the system queries the tone management facade to recall the appropriate register for the recipient — a best-effort, non-blocking operation.

Actual Send and Archiving

The actual send goes through SMTP and then archives a copy in the IMAP mailbox's Sent folder. Several naming variants for this folder are supported (OVH, Gmail conventions, UTF-7 encoding). If IMAP archiving fails, the email has still been sent — this is a non-blocking warning. The HTML signature is appended automatically at send time.

SMTP/IMAP Configuration

Connection parameters (SMTP server, port, credentials, sending mailbox, IMAP server) are read from the repository's environment files. A startup check lists any missing variable. Using port 465 activates SMTP over SSL mode.

Mandatory Antivirus Scan Before Opening (P0)

The antivirus scan facade is under the responsibility of agent Mitnick. No extracted attachment is opened or parsed before a clean verdict. Use of an external cloud service (such as an online analysis service) is excluded to protect client data. This safeguard is absolute and cannot be bypassed.

Analysis Layers

The scanner applies five successive layers:

Executable rejection: any recognised executable extension (Windows/Linux binaries, shell scripts, PowerShell, JavaScript, VBScript, Java archives…) immediately triggers an infected verdict.
ClamAV: signature-based analysis via the system binary. A positive match triggers infected in fail-fast mode. If ClamAV is not installed on the machine, the scan returns an explicit error.
PDF heuristics: regex-based detection of dangerous PDF constructs (embedded JavaScript, automatic actions, submitted forms, rich media…) on the first megabytes of the file → suspicious verdict.
Office macros: for macro-enabled Office formats, searches for auto-execution patterns and system calls via the olevba tool (best-effort) → suspicious.
Archives: no descent into archives; suspicious verdict by default, manual review required.

Clean Verdict Marker

On a clean verdict, an atomic JSON marker file (SHA-256 hash + timestamp) is written to a subdirectory of the attachment. Downstream processes operate in fail-close mode: the absence of the marker means "not yet scanned" and blocks processing. The write is atomic (temporary file + replacement); if it fails, the verdict remains correct — marker persistence is optional only.

Verdict Table

Verdict	Meaning	CLI Exit Code
`clean`	No signal detected — opening authorised	0
`suspicious`	Static signals present — manual review or sandbox	1
`infected`	ClamAV match or executable — DO NOT OPEN	2
`error`	Scan impossible (ClamAV unavailable, file missing…)	3

The command-line interface accepts one or more paths and returns the worst verdict across all analysed files. The inbound email processing pipeline consumes this scanner and refuses to proceed if attachments have not been declared safe.

Doctrine — Zero Direct Client Communication (P0)

The system never communicates directly with clients. All external communication goes through the human manager.

Several technical safeguards enforce this rule within this scope:

The sending facade never sends to the client when creating a draft; it writes the draft and sends a validation copy to the manager. Only the explicit post-validation send delivers the message to the client.
A prevention hook intercepts any attempt to directly use low-level mail libraries outside the whitelist of authorised facades. This safeguard makes the facade non-bypassable.
The inbound email classification engine never derives the recipient or the scope of a reply from the message body (anti-injection protection). Reply drafts remain in pending status until human validation.
Formal address (vouvoiement) is used systematically for all external contacts. Appointments are scheduled exclusively via an online booking link — a hard-coded time slot is never proposed in an email.

Active Scheduled Tasks

Frequency	Role	Pipeline
Every minute	Inbox synchronisation	IMAP collector → inbound email storage table (sole live collector)
Every minute	Processing pipeline polling	New email detection → processing queue
Every 15 minutes	Orphan lock monitoring	Automatic release of blocking pipeline locks
Every 5 minutes (with a 2-min initial delay)	Agentic spawn	Launch of headless instances for `chantier` and `question` intents only (fixed whitelist)

Disabled collector: the former collector that called the web application facade via an HTTP route has been commented out since May 2026. It caused saturation of the application server's event loop (cascading IMAP/database timeouts). It has been replaced by the external Python worker listed above. Do not reactivate it without fixing the root cause.

Known Pitfalls

Two distinct tables, two isolated worlds. The hub email storage table (indexed by IMAP identifier) and the processing pipeline tables (indexed by account and message identifier) cannot be joined.
Read flag set after insertion: a crash between fetch and insertion leaves the email unread, and therefore reprocessed on the next cycle. Idempotency is ensured by an insert conflict constraint.
Mandatory silent fetch (PEEK): the processing pipeline fetch must use PEEK mode to avoid consuming the unread flag prematurely.
--markdown recommended for all client emails (otherwise the body appears raw, unformatted). Always prefer --body-file over inline --body.
Scan-first non-negotiable: opening an attachment before a clean verdict constitutes a P0-level architectural debt.
un composant interne drift: the internal description of the email collector mentions a 5-minute interval, but it actually runs every minute.
Email anti-loop: the founder's contact mailbox intentionally forwards to the pipeline mailbox. Only the latter is filtered as an automatic sender. Do not add the founder's mailbox to the list of senders to ignore.
chantier intent: the classification engine never creates a chantier automatically — this is an explicit CLI action. This prevents the creation of orphan chantiers without a mission brief.

PreviousThe Hub (/hub/*)NextMemory & Learning — Three-Level Architecture

All chapters