Skip to main content

Architecture

System Overview

Crocbot is a personal AI assistant with a Gateway control plane and Telegram integration. It follows a lean, single-user deployment model optimized for VPS/Docker hosting.

Dependency Graph

          +------------------+
          |     Telegram     |
          |   (grammY bot)   |
          +--------+---------+
                   |
                   v
          +--------+---------+
          |     Gateway      |
          | (WebSocket + HTTP)|
          +--------+---------+
                   |
      +------+-----+------+---------+
      |      |            |         |
      v      v            v         v
  Agents  Sessions     Cron    MCP Server
  (Pi RT) (Storage)   (Jobs)  (SSE + HTTP)
      |                              ^
      +---> Model Router              |
      |    (reasoning / utility)   External AI
      +---> Rate Limiter + KeyPool  Systems
      |         |
      |    LLM Providers
      |   (Claude, OpenAI, ...)
      +---> MCP Client
      |    (stdio, SSE, HTTP)
      +---> Reasoning Adapter
      |    (reasoning_delta / tags)
      +---> Project Workspaces
      |    (isolated memory/prompts)
      +---> Knowledge Import
           (parse/chunk/embed/dedup)

Components

Gateway

  • Purpose: Central control plane for sessions, Telegram, tools, and events
  • Tech: Node.js, WebSocket, Express
  • Location: src/gateway/
  • Stability: Session reset aborts active runs, config merge by ID (preserves array ordering), session key normalization (case-insensitive), WebSocket max payload 5 MB, bounded agent run sequence map, expired hook auth state pruning

CLI

  • Purpose: Command-line interface for gateway management and agent invocation
  • Tech: Node.js, commander
  • Location: src/cli/, src/commands/

Telegram Channel

  • Purpose: Full bot integration via grammY with groups, DMs, media, and inline model selection
  • Tech: grammY, @grammyjs/runner
  • Location: src/telegram/, src/channels/
  • Key modules: bot-handlers.ts (callbacks), model-buttons.ts (inline keyboards), network-errors.ts (Grammy timeout recovery), monitor.ts (scoped rejection handler)

Agent Runtime

  • Purpose: Pi embedded runtime with tool streaming and block streaming
  • Tech: TypeScript, RPC mode
  • Location: src/agents/
  • Key modules: session-transcript-repair.ts (JSONL repair, tool call sanitization), session-file-repair.ts (crash-resilient file recovery)
  • Stability: Compaction deadlock prevention via withTimeout (30s default), token accounting fix after compaction (totalTokens update), exec override preservation across compaction, tool call ID sanitization for transcript integrity

Media Pipeline

  • Purpose: Image/audio/video processing, transcription, size caps
  • Tech: Node.js streams, temp file lifecycle
  • Location: src/media/

Security Layer

  • Purpose: SSRF protection, path traversal validation, exec allowlisting, input validation, auth hardening
  • Tech: DNS pinning, IP range blocking, AbortSignal timeouts, timing-safe comparison
  • Location: src/infra/net/ (ssrf.ts, fetch-guard.ts), src/infra/exec-approvals.ts, src/security/, src/gateway/
  • Key modules: ssrf.ts (private IP/hostname blocking, IPv6-mapped bypass prevention, redirect validation), fetch-guard.ts (guarded fetch wrapper), exec-approvals.ts (shell expansion blocking, heredoc handling, allowlist enforcement), security-headers.ts (CSP, X-Frame-Options, nosniff, path traversal filtering), auth-rate-limit.ts (sliding-window per-IP auth rate limiting with lockout), secret-equal.ts (timing-safe token comparison via crypto.timingSafeEqual), path-output.ts (output path containment), http-body.ts (bounded HTTP body reading), base64.ts (oversized base64 rejection)
  • Security domains: Network/SSRF, filesystem containment, input sanitization, authentication, execution hardening, data leak prevention, ACP tool safety

Secrets Masking (src/infra/secrets/)

  • Purpose: Prevent credential leakage across all output boundaries
  • Tech: Custom Aho-Corasick masker, SecretsRegistry singleton, value-based + pattern-based defense-in-depth
  • Key modules: registry.ts (singleton, auto-discovery from env/config), masker.ts (Aho-Corasick for 10+ patterns, sequential fallback), stream-masker.ts (cross-chunk boundary detection), logging-transport.ts (tslog masking transport), llm-masking.ts (context wrapper), tool-result-masking.ts (agent tool output), error-masking.ts (error messages)
  • Boundaries: (1) Logging, (2) Config snapshots, (3) LLM context, (4) Streaming output, (5) Tool results, (6) Telegram send, (7) Error formatting

Runtime Infrastructure (src/infra/)

  • Purpose: Cross-cutting runtime utilities for concurrency, timeouts, and memory safety
  • Key modules: async-mutex.ts (Promise-chain mutex replacing proper-lockfile for session locking), with-timeout.ts (generic withTimeout<T> wrapper with configurable deadline and cleanup)
  • Memory bounding: Diagnostic session state capped, directory cache bounded with LRU eviction, shell output buffers truncated at configurable limit, abort controller maps bounded, agent run sequence tracking bounded
  • Heartbeat hardening: Wake handler race prevention, runOnce error recovery (scheduler survives thrown errors), heartbeat exempt from empty-event skip

Logging & Observability

  • Purpose: Structured logging, metrics, error alerting
  • Tech: tslog (with secrets masking transport), OpenTelemetry-compatible metrics
  • Location: src/logging/, src/metrics/, src/alerting/

Plugin System

  • Purpose: Extensible plugin runtime with SDK
  • Tech: TypeScript plugin loader
  • Location: src/plugins/, src/plugin-sdk/

Cron Scheduler

  • Purpose: Scheduled jobs and wakeups
  • Tech: Node.js timers, JSONL persistence
  • Location: src/cron/

Memory

  • Purpose: Conversation memory, context management, and AI-powered consolidation
  • Tech: SQLite + sqlite-vec (vector similarity), file-based storage, utility model for consolidation/extraction
  • Location: src/memory/
  • Key modules: consolidation.ts (5-action consolidation engine), auto-memorize.ts (post-conversation extraction pipeline), consolidation-schema.ts (schema migration), consolidation-actions.ts (types, DI interfaces)
  • Consolidation engine: Processes new memory chunks through a pipeline: vector similarity search -> candidate validation -> LLM analysis (utility model) -> atomic DB action (MERGE, REPLACE, KEEP_SEPARATE, UPDATE, SKIP). Safety gates enforce minimum similarity (0.9) for destructive REPLACE. All decisions logged to consolidation_log audit table with reasoning, source IDs, and timestamps.
  • 4-area schema: Memories categorized into main (general), fragments (facts/preferences), solutions (problem/solution pairs), instruments (tools/techniques). Area metadata stored on each chunk; recall queries filter by area.
  • Auto-memorize hooks: Fire-and-forget extraction at session end. Three extraction types (solutions, fragments, instruments) run independently via Promise.allSettled. Budget-aware: each type checks rate limiter before LLM call, skips gracefully when exhausted. Extracted items stored with area metadata, triggering consolidation for dedup.
  • Composition: AutoMemorize (transcript extraction) -> storeExtractedChunk (categorized storage) -> ConsolidationEngine (dedup pipeline). All LLM calls use taskType: "consolidation" to route through the utility model role.

Rate Limiting (src/infra/)

  • Purpose: Per-provider rate limiting, API key rotation, and transient error retry
  • Tech: Sliding window log algorithm (RPM/TPM), health-aware round-robin key pool, exponential backoff with jitter
  • Key modules: provider-rate-limiter.ts (sliding window RPM/TPM enforcement), key-pool.ts (health-aware round-robin with rate limiter integration), llm-retry.ts (transient error classification and Retry-After parsing), rate-limit-middleware.ts (pre-flight/post-flight wrapper for LLM call sites)
  • Composition: Four-layer pipeline — ProviderRateLimiter (sliding window) -> KeyPool (key selection) -> retryAsync + createLlmRetryOptions (transient retry) -> withRateLimitCheck (call-site middleware). Zero overhead when no limits configured (pass-through mode).

Model Roles (src/agents/)

  • Purpose: Route LLM calls to specialized models by task type for cost optimization
  • Tech: Pattern-based task classification, 2-role architecture (reasoning + utility)
  • Key modules: model-router.ts (ModelRouter interface, createModelRouter factory), task-classifier.ts (fixed task-type-to-role mapping), model-roles.ts (config parsing, role resolution, fallback logic)
  • Composition: TaskClassifier (call-site classification) -> ModelRouter (role resolution) -> resolveModel (provider/model selection). Utility tasks (compaction, memory-flush, heartbeat, llm-task) route to cheap model; reasoning tasks use primary model. Missing config gracefully degrades to primary model.

MCP Client

  • Purpose: In-process client connecting to external MCP tool servers
  • Tech: @modelcontextprotocol/sdk, stdio/SSE/HTTP transports
  • Location: src/mcp/ (client.ts, client-transport.ts, transport-*.ts)
  • Key modules: client.ts (lifecycle manager with reconnect), tool-bridge.ts (MCP-to-agent tool conversion), transport-ssrf.ts (SSRF-guarded fetch for remote transports)

MCP Server

  • Purpose: Exposes crocbot as MCP infrastructure for external AI systems
  • Tech: @modelcontextprotocol/sdk, SSE + streamable HTTP transports
  • Location: src/mcp/ (server.ts, server-auth.ts, server-tools.ts, server-mount.ts)
  • Key modules: server-auth.ts (Bearer token with timing-safe comparison), server-tools.ts (send_message, finish_chat, query_memory, list_capabilities), server-mount.ts (HTTP route mounting)

Reasoning Model Support (src/agents/, src/shared/text/)

  • Purpose: Native reasoning stream parsing for o1/o3, DeepSeek-R1, and Claude extended thinking
  • Tech: Provider-specific stream adapters, tag-based fallback, SQLite trace storage
  • Key modules: reasoning-stream-adapter.ts (per-provider reasoning_delta detection), chat-generation-result.ts (chunk accumulator separating reasoning from response), reasoning-trace-storage.ts (queryable trace table by session/turn/model), reasoning-tags.ts (XML tag parsing with strict/preserve modes)
  • Capabilities: Native reasoning_delta streaming for providers that support it, tag-based fallback (<think>, <thinking>, <thought>) for others. ChatGenerationResult accumulator handles cross-chunk boundaries and provides delta computation. Reasoning token budget tracking per session. CLI display modes (on/stream/off), Telegram blocking of reasoning content, WebSocket broadcast of thinking events.

Project Workspaces (src/agents/, src/config/)

  • Purpose: Isolated memory, prompts, knowledge base, and sessions per project
  • Tech: Project-scoped directory layout under state dir, per-project sqlite-vec indexes
  • Key modules: project-scope.ts (10 exported functions: resolveProjectDir, listProjects, getProjectConfig, setProjectConfig, getActiveProject, setActiveProject, isDefaultProject, resolveProjectMemoryDir, resolveProjectSessionsDir, stripProjectFromSessionKey), types.projects.ts (ProjectConfig, ProjectsConfig types)
  • Storage layout: {stateDir}/agents/{agentId}/projects/{projectName}/ with subdirectories for memory/, sessions/, and config files. Default project uses agent-level paths (no “default” subdirectory).
  • Switching: CLI (crocbot --project <name>, /project subcommands), Telegram (/project <name>), RPC (projects.list, projects.current, projects.switch, projects.create, projects.delete).

Knowledge Import Pipeline (src/knowledge/)

  • Purpose: Ingest external documents and URLs into project-scoped vector knowledge base
  • Tech: Parser registry (strategy pattern), heading-aware chunking, content-hash dedup, sqlite-vec storage
  • Key modules: pipeline.ts (6-stage orchestrator: fetch/parse/chunk/embed/dedup/store), parsers/registry.ts (priority-ordered parser dispatch), chunker.ts (heading-aware splitting with overlap), dedup.ts (hash-first then similarity dedup), storage.ts (sqlite-vec knowledge_chunks/knowledge_vectors schema), state.ts (incremental re-import state machine), incremental.ts (new/unchanged/changed classification)
  • Parsers: text-parser.ts (universal fallback), markdown-parser.ts (frontmatter extraction), pdf-parser.ts (pdfjs-dist, lazy-loaded), url-parser.ts (cheerio + node-html-markdown, SSRF-guarded fetch)
  • CLI: crocbot knowledge import <source> (URL/file/text, --project, --category, --dry-run, --force, --batch), crocbot knowledge list, crocbot knowledge remove <source>
  • Composition: ParserRegistry (format detection) -> chunker (heading-aware splitting) -> embedChunksInBatches -> deduplicateChunks (hash + similarity) -> KnowledgeStorage (sqlite-vec). Incremental re-import via content-hash state machine skips unchanged sources.

Tech Stack Rationale

TechnologyPurposeWhy Chosen
Node.js 22+RuntimeModern ESM support, stable LTS
TypeScript (ES2023)LanguageType safety, strict mode, NodeNext modules
tsdown (rolldown)Bundler~5s builds, replaces tsc emit
pnpmPackage managerFast, disk-efficient
grammYTelegram SDKModern, well-maintained, middleware support
VitestTestingFast, ESM-native, good DX
@modelcontextprotocol/sdkMCP Client + ServerOfficial TypeScript SDK for Model Context Protocol
oxlint + oxfmtLint + FormatFast Rust-based toolchain, 134 type-aware rules

Data Flow

  1. User sends message via Telegram (or external AI calls MCP server endpoint)
  2. grammY bot receives message, routes to Gateway
  3. Gateway creates/resumes session, resolves active project (default or project-scoped)
  4. Agent builds context: system prompt + project-scoped memory recall + conversation history
  5. SecretsRegistry masks any credentials in LLM context before provider call
  6. Model router classifies the task type and resolves the appropriate model (reasoning or utility)
  7. Rate limiter checks RPM/TPM capacity; KeyPool selects best API key via round-robin
  8. Agent processes message with LLM provider (transient errors retried with backoff), invoking MCP client tools as needed
  9. Reasoning adapter separates reasoning_delta from text_delta streams; traces stored to reasoning_traces table
  10. StreamMasker masks secrets in streaming response chunks (cross-boundary detection)
  11. Tool results masked before persistence and display
  12. Response streamed back through Gateway to Telegram (masked) and persisted to project-scoped session transcript (masked)

Key Architectural Decisions

See Architecture Decision Records for detailed history:

Directory Structure

src/
  agents/           # Agent runtime, tools, session repair, model routing
  alerting/         # Error reporting and alerting
  auto-reply/       # Message dispatch and routing
  browser/          # Browser control (CDP)
  channels/         # Channel registry
  cli/              # CLI entry point
  commands/         # CLI commands
  config/           # Configuration loading
  cron/             # Scheduled jobs
  daemon/           # Daemon process management
  gateway/          # Gateway control plane
  hooks/            # Hook system
  infra/            # Infrastructure (exec, net/SSRF, secrets masking, rate limiting)
  knowledge/        # Knowledge import pipeline (parsers, chunking, dedup, storage)
  logging/          # Structured logging (tslog)
  media/            # Media pipeline
  media-understanding/  # Media analysis
  mcp/              # MCP client, server, transports, tool bridge
  memory/           # Memory management
  metrics/          # Metrics and monitoring
  plugins/          # Plugin runtime
  plugin-sdk/       # Plugin SDK
  providers/        # LLM provider integrations
  routing/          # Message routing
  telegram/         # Telegram bot (grammY)
docs/               # Documentation (Mintlify)
ui/                 # Control UI
test/               # Shared/e2e tests
scripts/            # Development scripts