8. Reasoning Model Support
Status: Accepted
Date: 2026-02-16

Context
Phase 13 adds native reasoning model support to crocbot. Reasoning models — OpenAI o1/o3/o4-mini, Anthropic Claude with extended thinking, DeepSeek-R1 — generate intermediate chain-of-thought tokens before producing final responses. These reasoning tokens improve answer quality for complex tasks but introduce architectural challenges: heterogeneous wire formats across providers, differences in streaming event models, token budget tracking needs, and trace storage requirements for audit and analysis. Crocbot already has tag-based reasoning infrastructure from the v0.1.117 upstream sync: `<think>` tag parsing, block promotion, streaming detection, CLI display, Telegram blocking, session persistence, WebSocket broadcast, and OpenAI reasoning replay. However, this tag-based approach only works when reasoning content arrives embedded in text content as `<think>` tags. Native reasoning events from the Pi-AI SDK (`thinking_start`, `thinking_delta`, `thinking_end`) are filtered out at line 59 of `pi-embedded-subscribe.handlers.messages.ts` and never reach the processing pipeline.
The Pi-AI SDK already normalizes all three provider formats into a common `thinking_delta` event model. The key architectural question is how to consume these native events alongside the existing tag-based fallback, accumulate reasoning and response content, track token budgets per session, and persist reasoning traces for audit.
Decision
1. Adapter pattern for provider-specific reasoning events
A `ReasoningStreamAdapter` interface normalizes SDK streaming events to a common `ReasoningChunk` type. Four concrete adapters cover the provider landscape:
- OpenAI adapter: Handles `thinking_delta` events from o1/o3/o4-mini models. Maps SDK-normalized events to `ReasoningChunk`.
- Anthropic adapter: Handles `thinking_delta` events from Claude extended thinking. Maps SDK-normalized events to `ReasoningChunk`.
- DeepSeek adapter: Handles `thinking_delta` events from DeepSeek-R1. Maps SDK-normalized events to `ReasoningChunk`.
- Tag fallback adapter: Handles `<think>` tags embedded in `text_delta` events. Reuses the existing `THINKING_TAG_SCAN_RE` / `blockState.thinking` / `stripBlockTags()` infrastructure. Activated when no native adapter matches or when models route through third-party providers that transform SSE streams.
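The adapter contract can be sketched as follows. The `ReasoningStreamAdapter` and `ReasoningChunk` names come from the decision above; the field names and the sample OpenAI adapter body are illustrative assumptions, not the shipped implementation:

```typescript
// Sketch of the adapter contract; chunk fields are assumptions.
export interface ReasoningChunk {
  kind: "start" | "delta" | "end";
  text: string;     // reasoning text carried by this chunk ("" for start/end)
  provider: string; // which adapter produced the chunk
}

export interface ReasoningStreamAdapter {
  /** True if this adapter should handle the given model/provider pair. */
  canHandle(model: string, provider: string): boolean;
  /** Map a raw SDK streaming event to zero or more ReasoningChunks. */
  handleEvent(event: { type: string; delta?: string }): ReasoningChunk[];
}

// Illustrative OpenAI adapter: accepts the o1/o3/o4 model families.
export class OpenAIReasoningAdapter implements ReasoningStreamAdapter {
  canHandle(model: string, provider: string): boolean {
    return provider === "openai" && /^o[134]/.test(model);
  }
  handleEvent(event: { type: string; delta?: string }): ReasoningChunk[] {
    if (event.type === "thinking_delta" && event.delta) {
      return [{ kind: "delta", text: event.delta, provider: "openai" }];
    }
    if (event.type === "thinking_start") return [{ kind: "start", text: "", provider: "openai" }];
    if (event.type === "thinking_end") return [{ kind: "end", text: "", provider: "openai" }];
    return []; // non-reasoning events are ignored by this adapter
  }
}
```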
At `message_start`, the resolver iterates adapters in priority order (OpenAI -> Anthropic -> DeepSeek -> Tag Fallback) and locks in the first adapter whose `canHandle(model, provider)` returns true. The adapter is fixed for the duration of the message — no mid-stream fallback, to avoid duplicate processing.
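The first-match selection step could look like the following sketch (`resolveAdapter` and the minimal `Adapter` shape are hypothetical names for illustration):

```typescript
// Hypothetical resolver: first matching adapter wins and stays fixed
// for the message; the caller supplies adapters in priority order
// (OpenAI -> Anthropic -> DeepSeek -> tag fallback).
interface Adapter {
  name: string;
  canHandle(model: string, provider: string): boolean;
}

function resolveAdapter(
  ordered: Adapter[],
  model: string,
  provider: string,
): Adapter | undefined {
  return ordered.find((a) => a.canHandle(model, provider));
}
```

Because the tag fallback adapter's `canHandle()` always returns true, it sits last and catches anything the native adapters decline.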
The event filter gate at line 59 of `pi-embedded-subscribe.handlers.messages.ts` is expanded to also accept `thinking_start`, `thinking_delta`, and `thinking_end` events. These are routed through the active adapter to produce `ReasoningChunk` objects.
2. ChatGenerationResult accumulator
A `ChatGenerationResult` class accumulates streaming chunks, routing reasoning content (from adapters) and text content (from `text_delta` events) into separate buffers. It provides:
- Chunk routing: Reasoning chunks from the adapter append to `reasoningText`; text deltas append to `responseText`.
- Cross-chunk tag boundary handling: When the tag fallback adapter is active, text deltas pass through the existing `stripBlockTags()` function, which maintains stateful `blockState.thinking` tracking. When a native adapter is active, tag stripping is bypassed.
- Delta computation: Each `appendReasoningChunk()` and `appendTextDelta()` call returns the delta (new text since the last call) for incremental UI updates.
- Usage recording: `recordUsage()` normalizes provider-specific token fields (`completion_tokens_details.reasoning_tokens` for OpenAI/DeepSeek, computed from delta length for Anthropic) to a common `reasoningTokens` count.
The accumulator is created at `message_start` and finalized at `message_end`. It replaces the ad-hoc delta tracking in the current `emitReasoningStream()` function.
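A minimal sketch of the accumulator, assuming the method names stated above; the internal buffer layout and the `finalize()` shape are illustrative:

```typescript
// Sketch of ChatGenerationResult: separate buffers for reasoning and
// response text, plus normalized reasoning-token usage.
class ChatGenerationResult {
  private reasoningText = "";
  private responseText = "";
  private reasoningTokens = 0;

  /** Append a reasoning chunk; returns the delta for incremental UI updates. */
  appendReasoningChunk(text: string): string {
    this.reasoningText += text;
    return text;
  }

  /** Append a text delta to the response buffer; returns the delta. */
  appendTextDelta(text: string): string {
    this.responseText += text;
    return text;
  }

  /** Normalize usage; falls back to a chars/4 estimate (Anthropic case). */
  recordUsage(usage: { reasoningTokens?: number }): void {
    this.reasoningTokens =
      usage.reasoningTokens ?? Math.ceil(this.reasoningText.length / 4);
  }

  /** Snapshot taken at message_end. */
  finalize() {
    return {
      reasoningText: this.reasoningText,
      responseText: this.responseText,
      reasoningTokens: this.reasoningTokens,
    };
  }
}
```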
3. Reasoning trace storage
A `reasoning_traces` SQLite table stores completed reasoning traces for audit, export, and analysis. The schema follows the `consolidation-schema.ts` idempotent migration pattern:
Indexes cover `session_key`, `run_id`, `model`, and `created_at` for efficient queries. Retention is configurable with a default of 30 days, enforced by `pruneOlderThan()`.
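A sketch of what the idempotent migration and pruning could look like; the column names beyond the indexed ones, and the `db.run` shape, are assumptions rather than the actual `consolidation-schema.ts` contents:

```typescript
// Idempotent DDL sketch for the reasoning_traces table. Column set is
// an assumption; the indexed columns follow the decision text.
const CREATE_REASONING_TRACES = `
  CREATE TABLE IF NOT EXISTS reasoning_traces (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    session_key TEXT NOT NULL,
    run_id      TEXT NOT NULL,
    model       TEXT NOT NULL,
    reasoning   TEXT NOT NULL,        -- masked before insert (Phase 9 pipeline)
    token_count INTEGER NOT NULL DEFAULT 0,
    created_at  INTEGER NOT NULL      -- unix epoch millis
  );
  CREATE INDEX IF NOT EXISTS idx_traces_session ON reasoning_traces(session_key);
  CREATE INDEX IF NOT EXISTS idx_traces_run     ON reasoning_traces(run_id);
  CREATE INDEX IF NOT EXISTS idx_traces_model   ON reasoning_traces(model);
  CREATE INDEX IF NOT EXISTS idx_traces_created ON reasoning_traces(created_at);
`;

// Retention enforcement: delete traces older than the cutoff.
function pruneOlderThan(
  db: { run(sql: string, ...args: unknown[]): void },
  cutoffMs: number,
): void {
  db.run(`DELETE FROM reasoning_traces WHERE created_at < ?`, cutoffMs);
}
```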
The trace store subscribes to agent events via `onAgentEvent()` at application startup. It listens for `stream: "thinking"` events to accumulate reasoning text during streaming, then persists the complete trace when the generation finishes.
Reasoning text passes through the Phase 9 secrets masking pipeline before storage, adding a 6th masking boundary to the existing 5-boundary pipeline (logging, config, LLM call, tool result, Telegram output).
4. Per-session reasoning budget tracker
A `ReasoningBudgetTracker` tracks cumulative reasoning token consumption per session. Configuration:
- `maxReasoningTokensPerSession`: Hard limit (default 500,000 tokens; 0 = unlimited)
- `warningThresholdPercent`: Emit a warning at this percentage (default 80%)
When a threshold is crossed, the tracker emits events via `emitAgentEvent()` with `stream: "budget"`:
- `reasoning_budget_warning` at the warning threshold
- `reasoning_budget_exceeded` at 100%
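The threshold logic above can be sketched as follows; the constructor shape and the `emit` callback are illustrative assumptions standing in for the real `emitAgentEvent()` wiring:

```typescript
// Sketch of the per-session budget tracker. In-memory only, matching
// the trade-off noted below (state is lost on restart).
type BudgetEvent = "reasoning_budget_warning" | "reasoning_budget_exceeded";

class ReasoningBudgetTracker {
  private used = new Map<string, number>();

  constructor(
    private maxPerSession = 500_000,        // 0 = unlimited
    private warningThresholdPercent = 80,
    private emit: (sessionKey: string, event: BudgetEvent) => void = () => {},
  ) {}

  /** Record reasoning tokens for a session and emit threshold events. */
  record(sessionKey: string, reasoningTokens: number): void {
    const total = (this.used.get(sessionKey) ?? 0) + reasoningTokens;
    this.used.set(sessionKey, total);
    if (this.maxPerSession === 0) return; // unlimited: track but never warn
    const pct = (total / this.maxPerSession) * 100;
    if (pct >= 100) this.emit(sessionKey, "reasoning_budget_exceeded");
    else if (pct >= this.warningThresholdPercent) this.emit(sessionKey, "reasoning_budget_warning");
  }

  usage(sessionKey: string): number {
    return this.used.get(sessionKey) ?? 0;
  }
}
```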
5. Preserving tag-based fallback
The existing tag-based reasoning infrastructure is preserved in its entirety:
- `THINKING_TAG_SCAN_RE` / `FINAL_TAG_SCAN_RE` regexes
- `blockState.thinking` cross-chunk tracking
- `stripBlockTags()` stateful stripping
- `promoteThinkingTagsToBlocks()` post-stream conversion
- `extractAssistantThinking()` final message extraction
This fallback path remains active for models that emit `<think>` tags in text content rather than native reasoning events.
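To illustrate why stateful cross-chunk tracking is needed at all, here is a deliberately simplified stripper that handles a `<think>` tag split across two streamed chunks. This is not the real `stripBlockTags()`, which handles more cases; it only demonstrates the boundary-buffering idea:

```typescript
// Simplified stateful <think> stripper for streamed chunks.
// Illustrative only; the real stripBlockTags() is more involved.
class ThinkTagStripper {
  private thinking = false; // currently inside a <think> block?
  private carry = "";       // held-back tail that may be a split tag

  /** Feed one chunk; returns the visible (non-reasoning) text. */
  push(chunk: string): string {
    let buf = this.carry + chunk;
    this.carry = "";
    let out = "";
    while (buf.length > 0) {
      const tag = this.thinking ? "</think>" : "<think>";
      const idx = buf.indexOf(tag);
      if (idx !== -1) {
        if (!this.thinking) out += buf.slice(0, idx); // text before the tag
        buf = buf.slice(idx + tag.length);
        this.thinking = !this.thinking;
        continue;
      }
      // Hold back the longest suffix that could be the start of a split tag.
      let keep = 0;
      for (let n = Math.min(tag.length - 1, buf.length); n > 0; n--) {
        if (tag.startsWith(buf.slice(buf.length - n))) { keep = n; break; }
      }
      const emit = buf.slice(0, buf.length - keep);
      if (!this.thinking) out += emit;
      this.carry = buf.slice(buf.length - keep);
      buf = "";
    }
    return out;
  }
}
```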
Consequences
Enables
- Native `thinking_delta` event consumption from the Pi-AI SDK (previously filtered out)
- Real-time reasoning token streaming to CLI and WebSocket clients via native events (lower latency than tag scanning)
- Per-session reasoning token budget tracking with configurable thresholds and warnings
- Reasoning trace audit trail with SQLite storage, session/run queries, and configurable retention
- Normalized provider abstraction: four adapters hide the OpenAI, Anthropic, DeepSeek, and tag-based format differences behind a common `ReasoningChunk` interface
Preserves
- Tag-based reasoning fallback for third-party providers, open-weight models, and proxy deployments
- Existing `emitReasoningStream()` and `emitAgentEvent()` infrastructure
- Existing reasoning mode configuration (`"off"`, `"on"`, `"stream"`)
- Phase 9 secrets masking pipeline (extended with a 6th boundary for trace storage)
- Phase 10 per-provider rate limiting (the budget tracker is complementary, not a replacement)
Trade-offs
- Adapter selection is locked at `message_start` — if a native adapter is selected but the model unexpectedly sends tag-based reasoning, the tags will not be processed (mitigated by conservative `canHandle()` checks and tag fallback as the lowest priority)
- Trace storage increases SQLite database size by 1-10KB per reasoning interaction (mitigated by configurable retention and pruning)
- Budget tracker is in-memory per session — budget state is lost on process restart (acceptable: the budget is a guardrail, not a billing mechanism)
- Anthropic reasoning token count is estimated from delta text length (no dedicated `thinking_tokens` field in usage), introducing slight inaccuracy (mitigated by using characters/4 as a conservative estimate)
Dependencies
- Phase 11 (ADR-0006): Model role routing identifies reasoning models via `ModelRouter`
- Phase 10 (ADR-0005): Per-provider rate limiting for complementary budget checks
- Phase 9 (ADR-0004): Secrets masking pipeline for reasoning trace storage
