Health Check Interpretation

Guide to understanding and troubleshooting health check responses from the crocbot gateway.

Health Endpoints Overview

Endpoint	Purpose	Use Case
`/health`	Gateway liveness	Platform probes (Docker, Fly.io, k8s)
`/metrics`	Prometheus metrics	Monitoring dashboards
`crocbot health`	CLI health check	Manual diagnostics
`crocbot status`	Full status report	Comprehensive debugging

Health Endpoint Response

Request

curl http://localhost:18789/health

Response

The /health endpoint is a lightweight liveness probe that returns a minimal response:

{"status": "healthy"}

Field	Type	Description
`status`	string	`"healthy"` if the gateway is running

A 200 response with status "healthy" confirms that the gateway process is alive and accepting HTTP connections; it reports nothing about memory, uptime, or individual components. If the endpoint does not respond, the gateway is down or unresponsive. For detailed diagnostics (memory, uptime, component health), use the CLI:

crocbot health --json
crocbot status --all

Interpreting Health Status

Status: Healthy (200 response)

{"status": "healthy"}

Meaning: Gateway is running and accepting connections.
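
A probe can check both the HTTP status code and the body rather than the body alone. The following is a minimal sketch, not the gateway's own tooling: the function name interpret_health and the "unexpected" label are hypothetical, and the only documented body is {"status": "healthy"}.

```shell
#!/bin/bash
# Classify a /health result from the HTTP status code and response body.
# Prints one of: healthy, unexpected, down (labels are illustrative).
interpret_health() {
  local http_code="$1" body="$2"
  # curl's %{http_code} is 000 when no HTTP response was received at all
  if [ -z "$http_code" ] || [ "$http_code" = "000" ]; then
    echo "down"
  elif [ "$http_code" = "200" ] &&
       printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'; then
    echo "healthy"
  else
    echo "unexpected"
  fi
}

# Usage against a live gateway (not run here):
#   out=$(curl -s --max-time 5 -w '\n%{http_code}' http://localhost:18789/health)
#   interpret_health "${out##*$'\n'}" "${out%$'\n'*}"
```

Anything other than a 200 with the documented body is reported as "unexpected" so that it gets a manual look instead of being silently treated as healthy.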

No Response / Connection Refused

Meaning: Gateway is down or unresponsive.

Actions:

Check if process is running
Check for crash in logs
Restart gateway
See Startup Shutdown

Memory Monitoring

The /health endpoint does not return memory metrics. Use the CLI or /metrics endpoint for memory monitoring:

# CLI health with diagnostics
crocbot health --json

# Prometheus metrics
curl -s http://localhost:18789/metrics | grep -E 'heap|rss'
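
Raw byte counts from /metrics are easier to read in MiB. The sketch below assumes the endpoint exposes the conventional Prometheus gauge process_resident_memory_bytes; that name and the sample text are assumptions, so check the actual grep output first. The helper metric_mib is hypothetical.

```shell
#!/bin/bash
# Print the value of an unlabeled metric from Prometheus text on stdin, in MiB.
# Exits non-zero if the metric is not present.
metric_mib() {
  LC_ALL=C awk -v name="$1" '
    $1 == name { printf "%.1f\n", $2 / 1048576; found = 1 }
    END { if (!found) exit 1 }
  '
}

# Illustrative sample only; real input would come from:
#   curl -s http://localhost:18789/metrics | metric_mib process_resident_memory_bytes
sample='# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 157286400'

printf '%s\n' "$sample" | metric_mib process_resident_memory_bytes   # prints 150.0
```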

Check for Restart Loop

# Docker restart count
docker inspect crocbot | jq '.[0].RestartCount'
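
A single RestartCount value is cumulative, so a restart loop shows up more clearly as an increase between two samples. A minimal sketch, assuming the container is named crocbot as above; the helper restarting and the 300-second window are illustrative choices.

```shell
#!/bin/bash
# Succeed (exit 0) if RestartCount rose by at least min_delta between two samples.
restarting() {
  local before="$1" after="$2" min_delta="${3:-1}"
  [ $(( after - before )) -ge "$min_delta" ]
}

# Usage against Docker (not run here):
#   before=$(docker inspect --format '{{.RestartCount}}' crocbot)
#   sleep 300
#   after=$(docker inspect --format '{{.RestartCount}}' crocbot)
#   restarting "$before" "$after" 2 && echo "WARNING: possible restart loop"
```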

CLI Health Commands

Basic Health Check

crocbot health
# Returns: OK or ERROR with details

JSON Output

crocbot health --json
# Returns full health snapshot as JSON

With Timeout

crocbot health --timeout 5000
# 5000 ms, i.e. a 5-second timeout (default is 10s)

Status Command (More Detail)

# Quick summary
crocbot status

# Full diagnostics
crocbot status --all

# With gateway probe
crocbot status --deep

Platform Health Probes

Docker Healthcheck

In docker-compose.yml or Dockerfile:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:18789/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
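
These settings determine how long a hung gateway can go unnoticed. As a rough upper bound, assuming Docker waits interval after each check completes and every failing check runs until timeout, the container is marked unhealthy after retries consecutive failures. The helper name detection_window is hypothetical.

```shell
#!/bin/bash
# Rough upper bound, in seconds, before a hung gateway is marked unhealthy:
# each of the `retries` failing checks waits `interval`, then may run for `timeout`.
detection_window() {
  local interval="$1" timeout="$2" retries="$3"
  echo $(( retries * (interval + timeout) ))
}

detection_window 30 10 3   # values from the healthcheck above; prints 120
```

Failures that return immediately, such as a refused connection, are detected sooner because the timeout term drops out.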

Check health status:

docker inspect crocbot | jq '.[0].State.Health'

Fly.io Health Checks

In fly.toml:

[[services.http_checks]]
  interval = "30s"
  timeout = "10s"
  grace_period = "30s"
  method = "GET"
  path = "/health"

Kubernetes Probes

livenessProbe:
  httpGet:
    path: /health
    port: 18789
  initialDelaySeconds: 30
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /health
    port: 18789
  initialDelaySeconds: 5
  periodSeconds: 10

Troubleshooting Health Issues

Health Endpoint Not Responding

# 1. Check if process is running
pgrep -f crocbot || docker ps -f name=crocbot

# 2. Check if port is listening
ss -tlnp | grep 18789
# or
lsof -i :18789

# 3. Check network binding
curl -v http://localhost:18789/health 2>&1 | head -20

Connection Refused

Cause: Gateway not running or not bound to expected port/interface.

# Check gateway is running
systemctl --user status crocbot-gateway
# or
docker ps -f name=crocbot

# Check environment for port override
grep -i port /path/to/.env

Timeout

Cause: Gateway overloaded or hung.

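# Check CPU usage (pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -n 1 -p "$(pgrep -d, -f crocbot)"

# Check if process is responsive: summarize syscalls of the oldest matching
# process, press Ctrl-C to stop and print the summary
strace -c -p "$(pgrep -o -f crocbot)"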

# Force restart if hung
systemctl --user restart crocbot-gateway
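
In scripts, curl's exit code distinguishes the cases above without parsing its output. The sketch below maps the standard curl exit codes 7 (failed to connect), 22 (HTTP error when -f is used), and 28 (operation timed out) to the matching troubleshooting case; the function name and message wording are illustrative.

```shell
#!/bin/bash
# Map a curl exit code to the troubleshooting case it most likely points to.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok: endpoint responded" ;;
    7)  echo "failed to connect: gateway not running or not bound to this port/interface" ;;
    22) echo "http error: gateway answered with status >= 400 (only reported with curl -f)" ;;
    28) echo "timeout: gateway overloaded or hung" ;;
    *)  echo "other: curl exit code $1, see the curl man page" ;;
  esac
}

# Usage (not run here):
#   curl -sf --max-time 5 -o /dev/null http://localhost:18789/health
#   explain_curl_exit $?
```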

High Memory Suspected

# Check process memory via CLI diagnostics
crocbot health --json

# Or check via system tools
ps -o rss,vsz -p $(pgrep -f crocbot)

# Restart if needed
docker restart crocbot

Metrics Endpoint

For detailed operational metrics, use /metrics:

curl http://localhost:18789/metrics

Key Metrics to Monitor

# Uptime
curl -s http://localhost:18789/metrics | grep crocbot_uptime

# Message counts
curl -s http://localhost:18789/metrics | grep crocbot_messages_total

# Errors
curl -s http://localhost:18789/metrics | grep crocbot_errors_total

# Telegram latency
curl -s http://localhost:18789/metrics | grep telegram_latency

See Metrics Documentation for full metric list.
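
Counters are more useful in relation to each other than as raw numbers. The following sketch sums all series of a counter and derives an error ratio. It assumes the metric names match the grep patterns above exactly; the label names in the sample are made up, and the helper sum_counter is hypothetical. It also assumes no trailing timestamps on metric lines.

```shell
#!/bin/bash
# Sum every series of a counter (with or without labels) from Prometheus text on stdin.
sum_counter() {
  LC_ALL=C awk -v name="$1" '
    /^#/ { next }
    {
      metric = $1
      sub(/[{].*/, "", metric)          # strip the label set, if any
      if (metric == name) total += $NF  # value is the last field
    }
    END { printf "%d\n", total }
  '
}

# Illustrative sample; real input would come from curl -s http://localhost:18789/metrics
sample='# TYPE crocbot_messages_total counter
crocbot_messages_total{channel="telegram"} 180
crocbot_messages_total{channel="web"} 20
# TYPE crocbot_errors_total counter
crocbot_errors_total{type="timeout"} 3
crocbot_errors_total{type="auth"} 1'

messages=$(printf '%s\n' "$sample" | sum_counter crocbot_messages_total)
errors=$(printf '%s\n' "$sample" | sum_counter crocbot_errors_total)
LC_ALL=C awk -v e="$errors" -v m="$messages" \
  'BEGIN { if (m > 0) printf "error ratio: %.1f%%\n", 100 * e / m }'   # prints error ratio: 2.0%
```

Because counters accumulate from process start, this ratio is a lifetime average; a dashboard would normally compute a rate over a time window instead.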

Health Check Script

#!/bin/bash
# health-check.sh

set -e

HEALTH_URL="http://localhost:18789/health"
TIMEOUT=5

# Fetch health. Testing curl inside the if keeps set -e from aborting
# the script before the failure can be reported.
if ! RESPONSE=$(curl -sf --max-time "$TIMEOUT" "$HEALTH_URL" 2>/dev/null); then
  echo "CRITICAL: Health endpoint unreachable"
  exit 2
fi

STATUS=$(echo "$RESPONSE" | jq -r '.status')

echo "Status: $STATUS"

if [ "$STATUS" != "healthy" ]; then
  echo "WARNING: Gateway status is $STATUS"
  exit 1
fi

echo "OK: Gateway healthy"
exit 0
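
A single failed probe can be a transient blip, so a scheduler may want to retry before raising an alarm. A minimal sketch of such a wrapper; the function name retry and the attempt and delay values are illustrative, not part of crocbot.

```shell
#!/bin/bash
# Run a command up to max_attempts times, pausing delay seconds between failures.
# Returns 0 on the first success, otherwise the exit code of the last attempt.
retry() {
  local max_attempts="$1" delay="$2" attempt=1 rc=0
  shift 2
  while :; do
    "$@" && return 0
    rc=$?
    [ "$attempt" -ge "$max_attempts" ] && return "$rc"
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Usage (not run here): three attempts, 10 seconds apart
#   retry 3 10 ./health-check.sh
```

The wrapper passes through the script's exit codes (1 for WARNING, 2 for CRITICAL), so existing monitoring thresholds keep working.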

Alerting on Health Issues

Configure alerting to send a notification when gateway health degrades:

gateway:
  alerting:
    enabled: true
    telegram:
      chatId: "YOUR_ADMIN_CHAT_ID"
      minSeverity: "critical"

Critical alerts trigger for:

Gateway crashes
Authentication failures
Persistent connection failures

See Alerting Documentation for full configuration.

Related Documentation

Health Checks (CLI) - CLI health commands
Metrics - Prometheus metrics endpoint
Alerting - Alert configuration
Startup Shutdown - Start/stop procedures
Incident Response - General troubleshooting
