Production Deployment

Deploy LRS-Agents to production with monitoring, logging, and high availability.

Overview

This guide covers:

  • Production-ready architecture

  • Monitoring and alerting

  • Structured logging

  • Scaling strategies

  • High availability

  • Security best practices

Architecture

Docker Deployment

Basic Docker Setup

# Build image
docker build -t lrs-agents:latest -f docker/Dockerfile .

# Run single container
docker run -d \
  -p 8000:8000 \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -e DATABASE_URL=$DATABASE_URL \
  lrs-agents:latest

Docker Compose

Use Docker Compose for local development and testing:

cd docker
docker-compose up -d

# Services available:
# - API: http://localhost:8000
# - Dashboard: http://localhost:8501
# - Database: localhost:5432

Production Docker Compose:

# docker-compose.prod.yml
version: '3.8'

services:
  lrs-api:
    image: lrsagents/lrs-agents:v0.2.0  # Pin version
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    environment:
      # Credentials and database name must match the postgres service below
      - DATABASE_URL=postgresql://postgres:${DB_PASSWORD}@postgres:5432/lrs
      - REDIS_URL=redis://redis:6379/0
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - LOG_LEVEL=INFO
    depends_on:
      - postgres
      - redis

  postgres:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: lrs
      POSTGRES_PASSWORD: ${DB_PASSWORD}

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data  # The official redis image persists data to /data

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - lrs-api

volumes:
  postgres_data:
  redis_data:

Kubernetes Deployment

Basic Deployment

# Create namespace
kubectl create namespace lrs-agents

# Apply configurations
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/hpa.yaml

Verify deployment:

# Check pods
kubectl get pods -n lrs-agents

# Check services
kubectl get svc -n lrs-agents

# View logs
kubectl logs -f deployment/lrs-agents -n lrs-agents

Production Configuration

# k8s/deployment-prod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lrs-agents
  namespace: lrs-agents
spec:
  replicas: 5  # Start with 5 replicas
  selector:
    matchLabels:
      app: lrs-agents
  template:
    metadata:
      labels:
        app: lrs-agents
    spec:
      containers:
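      - name: lrs-api
        image: lrsagents/lrs-agents:v0.2.0  # Pin version
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: lrs-secrets
              key: database-url
        - name: ANTHROPIC_API_KEY  # /health reports unhealthy without it
          valueFrom:
            secretKeyRef:
              name: lrs-secrets
              key: anthropic-api-key
        # /health also checks the database and Redis. Used for liveness, it makes
        # a dependency outage restart every pod; consider a lighter endpoint for
        # the livenessProbe and keep /health for readiness.
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5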

Auto-scaling:

# k8s/hpa-prod.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lrs-agents-hpa
  namespace: lrs-agents
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lrs-agents
  minReplicas: 5
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
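The HPA documentation gives the scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skips changes while the ratio is within a tolerance (0.1 by default), and takes the highest proposal when several metrics are configured. The sketch below applies that rule to the targets above so you can predict scaling behaviour; it deliberately omits stabilization windows and the handling of unready pods, and `proposed_replicas` / `hpa_decision` are illustrative names, not Kubernetes APIs.

```python
import math
from typing import Dict, Tuple


def proposed_replicas(current: int, utilization: float, target: float,
                      tolerance: float = 0.1) -> int:
    """Replica count the HPA formula proposes for one metric."""
    ratio = utilization / target
    # Inside the tolerance band the HPA keeps the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current
    return math.ceil(current * ratio)


def hpa_decision(current: int, metrics: Dict[str, Tuple[float, float]],
                 min_replicas: int = 5, max_replicas: int = 20) -> int:
    """Take the largest proposal across metrics, then clamp to the HPA bounds."""
    proposals = [proposed_replicas(current, used, target)
                 for used, target in metrics.values()]
    return max(min_replicas, min(max_replicas, max(proposals)))


# 5 pods at 90% CPU (target 70%) and 60% memory (target 80%)
print(hpa_decision(5, {"cpu": (90, 70), "memory": (60, 80)}))  # → 7
```

Here CPU drives the decision: memory alone would propose 4 pods, which `minReplicas: 5` overrides.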

Monitoring

Structured Logging

Set up structured logging for production:

from lrs.monitoring.structured_logging import create_logger_for_agent
import logging

# Create logger
logger = create_logger_for_agent(
    agent_id="production_agent",
    log_file="/var/log/lrs/agent.jsonl",
    console=False,  # Disable console in production
    level=logging.INFO
)

# Log events
logger.log_tool_execution(
    tool_name="fetch_api",
    success=True,
    execution_time=150.5,
    prediction_error=0.1
)

logger.log_adaptation_event(
    trigger="High prediction error",
    old_precision=0.6,
    new_precision=0.4,
    action="Explore alternatives"
)
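The log aggregation setup below expects one JSON object per line. If other modules need to write the same kind of file without going through `create_logger_for_agent`, a minimal standard-library formatter is enough. This is a sketch under assumptions: the field names and the `extra={"fields": ...}` convention are illustrative and not the LRS-Agents log schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLinesFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}})
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        # json.dumps escapes embedded newlines, so each record stays on one line
        return json.dumps(entry, default=str)


def make_jsonl_logger(name: str, path: str, level: int = logging.INFO) -> logging.Logger:
    """Attach a JSON-lines file handler (call once per logger name)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.FileHandler(path)
    handler.setFormatter(JsonLinesFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep these records out of the root logger's console output
    return logger
```

For example, `make_jsonl_logger("lrs.production_agent", "/var/log/lrs/agent.jsonl").info("tool_execution", extra={"fields": {"tool_name": "fetch_api", "success": True}})` writes one line whose keys Filebeat's `json.keys_under_root` setting lifts to top-level fields.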

Log Aggregation

Send logs to a centralized system:

ELK Stack (Elasticsearch, Logstash, Kibana):

# filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/lrs/*.jsonl
  json.keys_under_root: true
  json.add_error_key: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]

Datadog:

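import os

from datadog import initialize, statsd

# DogStatsD metrics go over UDP to a Datadog Agent, which holds the API key
initialize(
    statsd_host=os.getenv('DD_AGENT_HOST', 'localhost'),
    statsd_port=8125,
)

# Example values; in practice these come from the agent's state
precision_value = 0.75
success_rate = 0.95

# Send metrics
statsd.increment('lrs.agent.execution')
statsd.histogram('lrs.precision', precision_value)
statsd.gauge('lrs.tool.success_rate', success_rate)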
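The statsd calls above send small UDP datagrams to the Datadog Agent (port 8125 by default) in the DogStatsD text format `name:value|type|#tag:value`. Building the same datagrams with the standard library can help when checking with a UDP listener what the application actually emits. `dogstatsd_datagram` and `send_metric` are hypothetical helpers, not part of the `datadog` package, and only the counter (`c`), histogram (`h`) and gauge (`g`) types are shown.

```python
import socket
from typing import Dict, Optional


def dogstatsd_datagram(name: str, value: float, metric_type: str,
                       tags: Optional[Dict[str, str]] = None) -> bytes:
    """Encode one metric as name:value|type, with optional |#key:value tags."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{key}:{val}" for key, val in tags.items())
    return line.encode()


def send_metric(payload: bytes, host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget UDP send; no reply is expected from the Agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

`statsd.increment('lrs.agent.execution')` corresponds to `send_metric(dogstatsd_datagram("lrs.agent.execution", 1, "c"))`. Because UDP is fire-and-forget, a stopped Agent loses metrics silently rather than slowing the application.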

Metrics and Alerting

Expose Prometheus metrics:

from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Start the metrics server first so Prometheus can scrape it (serves on :9090)
start_http_server(9090)

# Define metrics
agent_runs = Counter('lrs_agent_runs_total', 'Total agent runs')
tool_executions = Counter('lrs_tool_executions_total', 'Total tool executions', ['tool', 'status'])
adaptation_events = Counter('lrs_adaptation_events_total', 'Total adaptation events')  # used by the Grafana panel
precision_value = Gauge('lrs_precision_value', 'Current precision', ['level'])
execution_time = Histogram('lrs_execution_time_seconds', 'Execution time')

# Record metrics (`agent` and `task` are your agent instance and its input)
agent_runs.inc()
tool_executions.labels(tool='fetch_api', status='success').inc()
adaptation_events.inc()
precision_value.labels(level='execution').set(0.75)

with execution_time.time():
    result = agent.run(task)

Prometheus configuration:

# prometheus.yml
scrape_configs:
  - job_name: 'lrs-agents'
    static_configs:
      - targets: ['lrs-api:9090']
    scrape_interval: 15s

Alerting rules:

# alerts.yml
groups:
- name: lrs_agents
  rules:
  - alert: HighFailureRate
    expr: rate(lrs_tool_executions_total{status="failure"}[5m]) > 0.5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High tool failure rate"

  - alert: LowPrecision
    expr: lrs_precision_value{level="execution"} < 0.3
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Agent precision consistently low"

  - alert: ServiceDown
    expr: up{job="lrs-agents"} == 0
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "LRS-Agents service is down"
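The `for:` clause delays notification: when `expr` first becomes true the alert is pending, it fires only if the expression is still true at every evaluation for the whole duration, and a single false evaluation resets it. Rules are evaluated every `evaluation_interval` (1 minute by default). A minimal model of that state machine, with `AlertRule` as an illustrative name rather than Prometheus code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Tracks one alert's state across rule evaluations."""
    for_seconds: float
    active_since: Optional[float] = None

    def evaluate(self, condition: bool, now: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for one evaluation."""
        if not condition:
            self.active_since = None  # a single false evaluation resets the clock
            return "inactive"
        if self.active_since is None:
            self.active_since = now   # first true evaluation: the alert goes pending
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# ServiceDown (for: 2m), with up == 0 at three evaluations one minute apart
service_down = AlertRule(for_seconds=120)
states = [service_down.evaluate(True, now=t) for t in (0, 60, 120)]
print(states)  # → ['pending', 'pending', 'firing']
```

This is why `ServiceDown` pages no earlier than about two minutes after the target stops answering, and why a flapping target may never fire.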

Dashboard

Run the Streamlit dashboard for real-time monitoring:

# In separate container/pod
streamlit run lrs/monitoring/dashboard.py --server.port=8501

Grafana dashboards:

{
  "dashboard": {
    "title": "LRS-Agents Monitoring",
    "panels": [
      {
        "title": "Precision Over Time",
        "targets": [
          {
            "expr": "lrs_precision_value{level=\"execution\"}"
          }
        ]
      },
      {
        "title": "Tool Success Rate",
        "targets": [
          {
            "expr": "sum(rate(lrs_tool_executions_total{status=\"success\"}[5m])) / sum(rate(lrs_tool_executions_total[5m]))"
          }
        ]
      },
      {
        "title": "Adaptation Events",
        "targets": [
          {
            "expr": "rate(lrs_adaptation_events_total[5m])"
          }
        ]
      }
    ]
  }
}
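Dividing one `rate()` vector by another matches series one-to-one on identical label sets, so without aggregation each `{tool, status="success"}` series is divided by itself and a success-rate panel shows a constant 1. Summing first (or using `sum by (tool)` for a per-tool view) gives the actual ratio. The arithmetic such a query performs can be sketched over counter increases for one window; the helpers are illustrative, not part of LRS-Agents:

```python
from collections import defaultdict
from typing import Dict, Tuple

# Counter increase over the window, keyed by the (tool, status) label pair
Increases = Dict[Tuple[str, str], float]


def success_ratio(increases: Increases) -> float:
    """Like sum(success) / sum(all): aggregate first, then divide."""
    total = sum(increases.values())
    success = sum(v for (_tool, status), v in increases.items() if status == "success")
    return success / total if total else float("nan")


def success_ratio_by_tool(increases: Increases) -> Dict[str, float]:
    """Like sum by (tool)(success) / sum by (tool)(all)."""
    totals: Dict[str, float] = defaultdict(float)
    successes: Dict[str, float] = defaultdict(float)
    for (tool, status), value in increases.items():
        totals[tool] += value
        if status == "success":
            successes[tool] += value
    return {tool: successes[tool] / totals[tool] for tool in totals if totals[tool]}


window = {("fetch_api", "success"): 90, ("fetch_api", "failure"): 10,
          ("search", "success"): 30, ("search", "failure"): 20}
print(success_ratio(window))          # → 0.8
print(success_ratio_by_tool(window))  # → {'fetch_api': 0.9, 'search': 0.6}
```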

Database Management

Schema Setup

Initialize the production database:

# Apply the initial schema (stop at the first error)
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f docker/init.sql

# Or use a migration tool
alembic upgrade head

Connection Pooling

Configure connection pooling:

import os

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,   # SQLAlchemy's default for PostgreSQL, shown explicitly
    pool_size=20,          # Persistent connections per process
    max_overflow=10,       # Extra connections allowed under burst load
    pool_timeout=30,       # Seconds to wait for a free connection
    pool_recycle=3600,     # Recycle connections after 1 hour
    pool_pre_ping=True     # Verify connections before use
)
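Pool settings apply per process, so the database sees them multiplied by the number of replicas (and by worker processes per pod, if any). With the HPA's `maxReplicas: 20`, the settings above allow up to 600 connections, while PostgreSQL ships with `max_connections = 100`. The sketch below does that arithmetic; the 10-connection `reserved` headroom for admin sessions, migrations and backups is an assumption, and both helper names are hypothetical.

```python
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100  # PostgreSQL's shipped default


def worst_case_connections(pool_size: int, max_overflow: int,
                           replicas: int, workers_per_pod: int = 1) -> int:
    """Every worker process holds its own engine, hence its own pool."""
    return (pool_size + max_overflow) * replicas * workers_per_pod


def per_process_budget(max_connections: int, replicas: int,
                       workers_per_pod: int = 1, reserved: int = 10) -> int:
    """Largest pool_size + max_overflow per process that fits the server limit."""
    return (max_connections - reserved) // (replicas * workers_per_pod)


print(worst_case_connections(20, 10, replicas=20))                        # → 600
print(per_process_budget(POSTGRES_DEFAULT_MAX_CONNECTIONS, replicas=20))  # → 4
```

If a budget of a few connections per process is too small, the usual options are raising `max_connections` on the server or placing a connection pooler such as PgBouncer between the pods and PostgreSQL.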

Backup Strategy

Automated backups:

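#!/bin/bash
# backup.sh
set -euo pipefail  # Abort on any error so a failed dump is never uploaded

BACKUP_DIR="/var/backups/lrs"  # Adjust to your environment
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/lrs_backup_$DATE.sql"

mkdir -p "$BACKUP_DIR"

# Create backup
pg_dump "$DATABASE_URL" > "$BACKUP_FILE"

# Compress (replaces the .sql file with .sql.gz)
gzip "$BACKUP_FILE"

# Upload to S3
aws s3 cp "$BACKUP_FILE.gz" s3://lrs-backups/

# Clean up local backups older than 30 days (S3 retention needs its own policy, e.g. a lifecycle rule)
find "$BACKUP_DIR" -name "lrs_backup_*.sql.gz" -mtime +30 -delete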

Schedule with cron:

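# Daily backups at 2 AM. cron starts jobs with a minimal environment, so make
# sure DATABASE_URL and AWS credentials are available to the script.
0 2 * * * /path/to/backup.sh >> /var/log/lrs/backup.log 2>&1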
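`find -mtime` relies on file modification times, which change when a backup is copied or restored. An alternative is to derive each backup's age from the timestamp the script writes into its filename, which also works on the keys returned by an S3 listing. A minimal sketch; `expired_backups` is a hypothetical helper and the 30-day window mirrors the script:

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List

BACKUP_NAME = re.compile(r"^lrs_backup_(\d{8}_\d{6})\.sql\.gz$")


def expired_backups(names: Iterable[str], now: datetime, keep_days: int = 30) -> List[str]:
    """Return backup names whose embedded timestamp is older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = BACKUP_NAME.match(name)
        if match is None:
            continue  # never touch files that don't follow the naming scheme
        taken = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)
```

Pass it the entries of the backup directory (for example from `os.listdir()`) and delete only what it returns.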

Security

API Authentication

Implement JWT authentication:

import os

import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

app = FastAPI()
security = HTTPBearer()

# Signing key shared with the token issuer (the variable name is an example)
SECRET_KEY = os.environ["JWT_SECRET_KEY"]

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)) -> dict:
    try:
        # Checks the signature and, if present, the "exp" claim
        payload = jwt.decode(
            credentials.credentials,
            SECRET_KEY,
            algorithms=["HS256"]
        )
        return payload
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/api/agent/run")
async def run_agent(task: str, token: dict = Depends(verify_token)):
    # Execute agent
    pass
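To show what `jwt.decode(..., algorithms=["HS256"])` verifies, here is a standard-library sketch of HS256 signing and checking as defined by RFC 7515/7519. It is for illustration only: keep PyJWT in the service, since it also validates other registered claims such as `nbf` and `aud`. `sign_hs256` and `verify_hs256` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWT segments use base64url without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Raise ValueError unless the token is a valid, unexpired HS256 JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    # Pinning the algorithm rejects "none" and algorithm-confusion tokens
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    exp = payload.get("exp")
    if exp is not None and (time.time() if now is None else now) >= exp:
        raise ValueError("token expired")
    return payload
```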

Environment Variables

Securely manage secrets:

# Never commit secrets to version control
# Use environment variables or secret management

# Development
export ANTHROPIC_API_KEY="sk-ant-..."

# Production - Use secret management
# AWS Secrets Manager
aws secretsmanager get-secret-value --secret-id lrs/api-keys

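# Kubernetes Secrets (same namespace and keys as the Deployment references)
kubectl create secret generic lrs-secrets -n lrs-agents \
  --from-literal=anthropic-api-key=sk-ant-... \
  --from-literal=database-url=postgresql://...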
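A related safeguard is to validate settings once at startup, so a missing secret stops the container immediately instead of surfacing later as a failing health check. A minimal sketch under assumptions: the `Settings` class, the `load_settings` helper, the Redis default and the `NAME_FILE` convention (used by images such as the official `postgres` one for file-mounted secrets) are illustrative and not part of LRS-Agents.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    database_url: str
    redis_url: str
    log_level: str


def _read(env: Mapping[str, str], name: str, default: Optional[str] = None) -> Optional[str]:
    # Prefer NAME_FILE (a mounted secret file) over the plain NAME variable
    file_path = env.get(f"{name}_FILE")
    if file_path:
        with open(file_path, encoding="utf-8") as fh:
            return fh.read().strip()
    return env.get(name, default)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    missing = [n for n in ("ANTHROPIC_API_KEY", "DATABASE_URL") if not _read(env, n)]
    if missing:
        # Report names only; never echo secret values into logs
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        anthropic_api_key=_read(env, "ANTHROPIC_API_KEY"),
        database_url=_read(env, "DATABASE_URL"),
        redis_url=_read(env, "REDIS_URL", "redis://localhost:6379/0"),
        log_level=_read(env, "LOG_LEVEL", "INFO").upper(),
    )
```

Calling `load_settings()` at import time of the API module makes a misconfigured pod crash-loop with a clear message rather than serve errors.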

Rate Limiting

Implement rate limiting:

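from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

app = FastAPI()

# Behind nginx, get_remote_address sees the proxy's IP unless the server trusts
# X-Forwarded-For (run uvicorn with --proxy-headers and --forwarded-allow-ips
# set to the proxy's address)
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)  # responds with HTTP 429

@app.post("/api/agent/run")
@limiter.limit("10/minute")  # 10 requests per minute per client address
async def run_agent(request: Request, task: str):
    # Execute agent
    pass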
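slowapi uses a fixed-window strategy by default, and its default storage is in process memory, so with several replicas each pod keeps its own counters unless `Limiter` is given a shared `storage_uri` (for example the Redis service). A simplified fixed-window counter makes "10/minute" concrete; it uses clock-aligned buckets, which may differ in detail from slowapi's implementation, and `FixedWindowLimiter` is an illustrative name.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` hits per client within each `window`-second bucket."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # client -> (window index, hits counted in that window)
        self._state: Dict[str, Tuple[int, int]] = {}

    def hit(self, client: str) -> bool:
        """Record one request; return False if it exceeds the limit."""
        bucket = int(self.clock() // self.window)
        stored_bucket, count = self._state.get(client, (bucket, 0))
        if stored_bucket != bucket:
            count = 0  # a new window starts with a fresh count
        allowed = count < self.limit
        self._state[client] = (bucket, count + 1 if allowed else count)
        return allowed
```

A known trade-off of fixed windows: a client can send up to twice the limit in a short burst that straddles a window boundary. Sliding-window strategies smooth this at the cost of more bookkeeping.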

Performance Optimization

Caching

Implement Redis caching:

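import hashlib
import json
import os

import redis

# Reuse REDIS_URL from the deployment configuration
redis_client = redis.Redis.from_url(os.getenv("REDIS_URL", "redis://redis:6379/0"))

def _cache_key(task: str) -> str:
    # Prefix keys so agent results don't collide with other data in this Redis db
    return "lrs:result:" + hashlib.md5(task.encode()).hexdigest()

def cache_agent_result(task: str, result: dict, ttl: int = 3600):
    """Cache agent execution result (must be JSON-serializable)"""
    redis_client.setex(_cache_key(task), ttl, json.dumps(result))

def get_cached_result(task: str):
    """Get cached result if available, otherwise None"""
    cached = redis_client.get(_cache_key(task))
    return json.loads(cached) if cached is not None else None

# Usage (`agent` and `task` as in the earlier examples)
result = get_cached_result(task)
if result is None:  # an empty cached result still counts as a hit
    result = agent.run(task)
    cache_agent_result(task, result)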

Async Execution

Use async for better throughput:

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=10)

async def run_agent_async(task: str):
    """Run the blocking agent.run() in a thread pool"""
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(
        executor,
        agent.run,
        task
    )
    return result

async def main(task_list: list):
    # Handle multiple requests concurrently (at most max_workers run at once)
    tasks = [run_agent_async(t) for t in task_list]
    return await asyncio.gather(*tasks)

# In a FastAPI handler, await run_agent_async() directly instead
results = asyncio.run(main(task_list))

Resource Limits

Set resource limits:

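import logging
import signal

logger = logging.getLogger(__name__)

# Limit maximum iterations
result = agent.run(task, max_iterations=50)

# Timeout protection. SIGALRM is Unix-only and signal.signal() must be called
# from the main thread, so this pattern does not work inside server worker threads.
def timeout_handler(signum, frame):
    raise TimeoutError("Agent execution timeout")

signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(300)  # 5 minute timeout

try:
    result = agent.run(task)
except TimeoutError:
    logger.error("Agent execution timed out")
finally:
    signal.alarm(0)  # Cancel the pending alarm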
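Inside an async web handler, where the agent runs in a thread pool and SIGALRM is not available, the timeout can be placed on the awaited future instead. A minimal sketch; `run_with_timeout` is a hypothetical helper. Note that Python cannot kill the worker thread: the call keeps running after the timeout, so iteration limits such as `max_iterations` remain the real bound on work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

executor = ThreadPoolExecutor(max_workers=10)


async def run_with_timeout(fn: Callable[..., Any], *args: Any, timeout: float) -> Any:
    """Run a blocking call in the pool; raise asyncio.TimeoutError after `timeout` s."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(executor, fn, *args)
    # wait_for stops waiting on timeout, but the worker thread keeps running
    # until fn returns and keeps occupying a pool slot until then.
    return await asyncio.wait_for(future, timeout=timeout)
```

In a handler this would read `result = await run_with_timeout(agent.run, task, timeout=300)`, with `asyncio.TimeoutError` mapped to an HTTP 504 or a logged error.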

Health Checks

Implement a health check endpoint:

import os

from fastapi import FastAPI, status
from fastapi.responses import JSONResponse
from sqlalchemy import text

app = FastAPI()

# `engine` and `redis_client` are configured as in the sections above.
# A plain `def` (not `async def`) lets FastAPI run these blocking checks in its
# thread pool instead of stalling the event loop.
@app.get("/health")
def health_check():
    """Health check endpoint"""
    health = {
        "status": "healthy",
        "version": "0.2.0",
        "checks": {}
    }

    # Check database
    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        health["checks"]["database"] = "ok"
    except Exception as e:
        health["status"] = "unhealthy"
        health["checks"]["database"] = f"error: {str(e)}"

    # Check Redis
    try:
        redis_client.ping()
        health["checks"]["redis"] = "ok"
    except Exception as e:
        health["status"] = "unhealthy"
        health["checks"]["redis"] = f"error: {str(e)}"

    # Check API keys
    if not os.getenv("ANTHROPIC_API_KEY"):
        health["status"] = "unhealthy"
        health["checks"]["api_keys"] = "missing"
    else:
        health["checks"]["api_keys"] = "ok"

    status_code = (
        status.HTTP_200_OK if health["status"] == "healthy"
        else status.HTTP_503_SERVICE_UNAVAILABLE
    )

    # FastAPI does not treat a (body, status) tuple as a response; return one
    # explicitly so Kubernetes probes actually see the 503
    return JSONResponse(content=health, status_code=status_code)

Troubleshooting

Common Issues

High Memory Usage:

# Check memory usage
kubectl top pods -n lrs-agents

# Increase memory limits
# Update deployment.yaml and apply

Database Connection Errors:

# Enable connection pooling
# Add pool_pre_ping=True
# Increase pool_size

Slow Response Times:

# Check logs for slow operations
kubectl logs -f deployment/lrs-agents -n lrs-agents | grep "execution_time"

# Enable caching
# Scale horizontally

Debug Mode

Enable debug logging:

# Set environment variable
export LOG_LEVEL=DEBUG

# Or in Kubernetes
kubectl set env deployment/lrs-agents LOG_LEVEL=DEBUG -n lrs-agents

Checklist

Pre-deployment:

  • [ ] API keys configured

  • [ ] Database initialized

  • [ ] Secrets properly managed

  • [ ] Resource limits set

  • [ ] Health checks implemented

  • [ ] Monitoring configured

  • [ ] Logging set up

  • [ ] Backups automated

  • [ ] Rate limiting enabled

  • [ ] Load balancing configured

Post-deployment:

  • [ ] Health checks passing

  • [ ] Metrics being collected

  • [ ] Logs aggregating correctly

  • [ ] Alerts configured

  • [ ] Dashboard accessible

  • [ ] Performance acceptable

  • [ ] Error rate within limits

Next Steps

  • Set up monitoring with Prometheus/Grafana

  • Configure log aggregation (ELK, Datadog)

  • Implement CI/CD pipeline

  • Load test your deployment

  • Document runbooks for common issues

  • Set up on-call rotation

Further Reading

  • Monitoring API - Monitoring API reference

  • ../tutorials/07_production_deployment - Production tutorial

  • Kubernetes documentation

  • Docker best practices

  • Prometheus operator guide