8. Reasoning Model Support

Status: Accepted Date: 2026-02-16

Context

Phase 13 adds native reasoning model support to crocbot. Reasoning models — OpenAI o1/o3/o4-mini, Anthropic Claude with extended thinking, DeepSeek-R1 — generate intermediate chain-of-thought tokens before producing final responses. These reasoning tokens improve answer quality for complex tasks but introduce architectural challenges: heterogeneous wire formats across providers, streaming event model differences, token budget tracking needs, and trace storage requirements for audit and analysis. Crocbot already has tag-based reasoning infrastructure from the v0.1.117 upstream sync: <think> tag parsing, block promotion, streaming detection, CLI display, Telegram blocking, session persistence, WebSocket broadcast, and OpenAI reasoning replay. However, this tag-based approach only works when reasoning content arrives embedded in text content as <think> tags. Native reasoning events from the Pi-AI SDK (thinking_start, thinking_delta, thinking_end) are filtered out at line 59 of pi-embedded-subscribe.handlers.messages.ts and never reach the processing pipeline. The Pi-AI SDK already normalizes all three provider formats into a common thinking_delta event model. The key architectural question is how to consume these native events alongside the existing tag-based fallback, accumulate reasoning and response content, track token budgets per session, and persist reasoning traces for audit.

Decision

1. Adapter pattern for provider-specific reasoning events

A ReasoningStreamAdapter interface normalizes SDK streaming events to a common ReasoningChunk type. Four concrete adapters handle the provider landscape:

OpenAI adapter: Handles thinking_delta events from o1/o3/o4-mini models. Maps SDK-normalized events to ReasoningChunk.
Anthropic adapter: Handles thinking_delta events from Claude extended thinking. Maps SDK-normalized events to ReasoningChunk.
DeepSeek adapter: Handles thinking_delta events from DeepSeek-R1. Maps SDK-normalized events to ReasoningChunk.
Tag fallback adapter: Handles <think> tags embedded in text_delta events. Reuses existing THINKING_TAG_SCAN_RE / blockState.thinking / stripBlockTags() infrastructure. Activated when no native adapter matches or when models route through third-party providers that transform SSE streams.

Adapter selection uses a strategy pattern: at message_start, the resolver iterates adapters in priority order (OpenAI -> Anthropic -> DeepSeek -> Tag Fallback) and locks in the first adapter where canHandle(model, provider) returns true. The adapter is fixed for the message duration — no mid-stream fallback to avoid duplicate processing. The event filter gate at line 59 of pi-embedded-subscribe.handlers.messages.ts is expanded to also accept thinking_start, thinking_delta, and thinking_end events. These are routed through the active adapter to produce ReasoningChunk objects.

2. ChatGenerationResult accumulator

A ChatGenerationResult class accumulates streaming chunks, routing reasoning content (from adapters) and text content (from text_delta events) into separate buffers. It provides:

Chunk routing: Reasoning chunks from the adapter append to reasoningText; text deltas append to responseText.
Cross-chunk tag boundary handling: When the tag fallback adapter is active, text deltas pass through the existing stripBlockTags() function which maintains stateful blockState.thinking tracking. When a native adapter is active, tag stripping is bypassed.
Delta computation: Each appendReasoningChunk() and appendTextDelta() call returns the delta (new text since last call) for incremental UI updates.
Usage recording: recordUsage() normalizes provider-specific token fields (completion_tokens_details.reasoning_tokens for OpenAI/DeepSeek, computed from delta length for Anthropic) to a common reasoningTokens count.

The accumulator is created at message_start and finalized at message_end. It replaces the ad-hoc delta tracking in the current emitReasoningStream() function.

3. Reasoning trace storage

A reasoning_traces SQLite table stores completed reasoning traces for audit, export, and analysis. Schema follows the consolidation-schema.ts idempotent migration pattern:

CREATE TABLE IF NOT EXISTS reasoning_traces (
  id TEXT PRIMARY KEY,
  session_key TEXT NOT NULL,
  run_id TEXT NOT NULL,
  model TEXT NOT NULL,
  provider TEXT NOT NULL,
  reasoning_text TEXT NOT NULL,
  reasoning_tokens INTEGER NOT NULL DEFAULT 0,
  total_tokens INTEGER NOT NULL DEFAULT 0,
  duration_ms INTEGER NOT NULL DEFAULT 0,
  metadata TEXT DEFAULT '{}',
  created_at INTEGER NOT NULL
);

Indexed on session_key, run_id, model, and created_at for efficient queries. Retention is configurable with a default of 30 days, enforced by pruneOlderThan(). The trace store subscribes to agent events via onAgentEvent() at application startup. It listens for stream: "thinking" events to accumulate reasoning text during streaming, then persists the complete trace when the generation finishes. Reasoning text passes through the Phase 9 secrets masking pipeline before storage, adding a 6th masking boundary to the existing 5-boundary pipeline (logging, config, LLM call, tool result, Telegram output).

4. Per-session reasoning budget tracker

A ReasoningBudgetTracker tracks cumulative reasoning token consumption per session. Configuration:

maxReasoningTokensPerSession: Hard limit (default 500,000 tokens, 0 = unlimited)
warningThresholdPercent: Emit warning at this percentage (default 80%)

The tracker emits events via emitAgentEvent() with stream: "budget":

reasoning_budget_warning at the warning threshold
reasoning_budget_exceeded at 100%

The budget tracker operates at the session level (total reasoning tokens per session), complementing the Phase 10 rate limiter which operates at the provider level (RPM/TPM per sliding window). Budget checks run after rate limiter checks succeed.

5. Preserving tag-based fallback

The existing tag-based reasoning infrastructure is preserved in its entirety:

THINKING_TAG_SCAN_RE / FINAL_TAG_SCAN_RE regexes
blockState.thinking cross-chunk tracking
stripBlockTags() stateful stripping
promoteThinkingTagsToBlocks() post-stream conversion
extractAssistantThinking() final message extraction

These serve as the fallback adapter path and continue to handle models accessed through third-party providers, open-weight deployments (vLLM, Ollama), and any model that emits <think> tags in text content rather than native reasoning events.

Consequences

Enables

Native thinking_delta event consumption from the Pi-AI SDK (previously filtered out)
Real-time reasoning token streaming to CLI and WebSocket clients via native events (lower latency than tag scanning)
Per-session reasoning token budget tracking with configurable thresholds and warnings
Reasoning trace audit trail with SQLite storage, session/run queries, and configurable retention
Normalized provider abstraction: four adapters hide the OpenAI, Anthropic, DeepSeek, and tag-based format differences behind a common ReasoningChunk interface

Preserves

Tag-based reasoning fallback for third-party providers, open-weight models, and proxy deployments
Existing emitReasoningStream() and emitAgentEvent() infrastructure
Existing reasoning mode configuration ("off", "on", "stream")
Phase 9 secrets masking pipeline (extended with 6th boundary for trace storage)
Phase 10 per-provider rate limiting (budget tracker is complementary, not replacement)

Trade-offs

Adapter selection is locked at message_start — if a native adapter is selected but the model unexpectedly sends tag-based reasoning, the tags will not be processed (mitigated by conservative canHandle() checks and tag fallback as lowest priority)
Trace storage increases SQLite database size by 1-10KB per reasoning interaction (mitigated by configurable retention and pruning)
Budget tracker is in-memory per session — budget state is lost on process restart (acceptable: budget is a guardrail, not a billing mechanism)
Anthropic reasoning token count is estimated from delta text length (no dedicated thinking_tokens field in usage), introducing slight inaccuracy (mitigated by using characters/4 as conservative estimate)

Dependencies

Phase 11 (ADR-0006): Model role routing identifies reasoning models via ModelRouter
Phase 10 (ADR-0005): Per-provider rate limiting for complementary budget checks
Phase 9 (ADR-0004): Secrets masking pipeline for reasoning trace storage

Start Here

Help

Install & Updates

CLI

Core Concepts

Gateway & Ops

Web & Interfaces

Channels

Providers

Automation & Hooks

Tools & Skills

Nodes & Media

Platforms

Reference & Templates

0008 reasoning model support

8. Reasoning Model Support

Context

Decision

1. Adapter pattern for provider-specific reasoning events

2. ChatGenerationResult accumulator

3. Reasoning trace storage

4. Per-session reasoning budget tracker

5. Preserving tag-based fallback

Consequences

Enables

Preserves

Trade-offs

Dependencies

Start Here

Help

Install & Updates

CLI

Core Concepts

Gateway & Ops

Web & Interfaces

Channels

Providers

Automation & Hooks

Tools & Skills

Nodes & Media

Platforms

Reference & Templates

​8. Reasoning Model Support

​Context

​Decision

​1. Adapter pattern for provider-specific reasoning events

​2. ChatGenerationResult accumulator

​3. Reasoning trace storage

​4. Per-session reasoning budget tracker

​5. Preserving tag-based fallback

​Consequences

​Enables

​Preserves

​Trade-offs

​Dependencies

8. Reasoning Model Support

Context

Decision

1. Adapter pattern for provider-specific reasoning events

2. ChatGenerationResult accumulator

3. Reasoning trace storage

4. Per-session reasoning budget tracker

5. Preserving tag-based fallback

Consequences

Enables

Preserves

Trade-offs

Dependencies