Production Deployment
Deploy LRS-Agents to production with monitoring, logging, and high availability.
Overview
This guide covers:
Production-ready architecture
Monitoring and alerting
Structured logging
Scaling strategies
High availability
Security best practices
Architecture
Recommended Stack
┌─────────────────────────────────────────┐
│ Load Balancer (Nginx) │
└─────────────────┬───────────────────────┘
│
┌───────────┴───────────┐
│ │
┌─────▼──────┐ ┌─────▼──────┐
│ LRS API │ │ LRS API │ (Multiple instances)
│ Instance │ │ Instance │
└─────┬──────┘ └─────┬──────┘
│ │
└───────────┬───────────┘
│
┌───────────┴───────────┐
│ │
┌─────▼──────┐ ┌─────▼──────┐
│ PostgreSQL │ │ Redis │
│ Database │ │ Cache │
└────────────┘ └────────────┘
Components:
Load Balancer: Distributes traffic across instances
LRS API Instances: Stateless agent execution servers
PostgreSQL: Persistent storage for execution history
Redis: Caching and job queue
Docker Deployment
Basic Docker Setup
# Build image
docker build -t lrs-agents:latest -f docker/Dockerfile .
# Run single container
docker run -d \
-p 8000:8000 \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-e DATABASE_URL=$DATABASE_URL \
lrs-agents:latest
Docker Compose
Use Docker Compose for local development and testing:
cd docker
docker-compose up -d
# Services available:
# - API: http://localhost:8000
# - Dashboard: http://localhost:8501
# - Database: localhost:5432
Production Docker Compose:
# docker-compose.prod.yml
version: '3.8'
services:
lrs-api:
image: lrsagents/lrs-agents:latest
deploy:
replicas: 3
resources:
limits:
cpus: '2'
memory: 4G
reservations:
cpus: '1'
memory: 2G
environment:
- DATABASE_URL=postgresql://user:pass@postgres:5432/lrs
- REDIS_URL=redis://redis:6379/0
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- LOG_LEVEL=INFO
depends_on:
- postgres
- redis
postgres:
image: postgres:15-alpine
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
redis:
image: redis:7-alpine
volumes:
- redis_data:/var/lib/redis
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
- ./ssl:/etc/nginx/ssl
depends_on:
- lrs-api
volumes:
postgres_data:
redis_data:
Kubernetes Deployment
Basic Deployment
# Create namespace
kubectl create namespace lrs-agents
# Apply configurations
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/hpa.yaml
Verify deployment:
# Check pods
kubectl get pods -n lrs-agents
# Check services
kubectl get svc -n lrs-agents
# View logs
kubectl logs -f deployment/lrs-agents -n lrs-agents
Production Configuration
# k8s/deployment-prod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: lrs-agents
namespace: lrs-agents
spec:
replicas: 5 # Start with 5 replicas
selector:
matchLabels:
app: lrs-agents
template:
metadata:
labels:
app: lrs-agents
spec:
containers:
- name: lrs-api
image: lrsagents/lrs-agents:v0.2.0 # Pin version
resources:
requests:
memory: "2Gi"
cpu: "500m"
limits:
memory: "4Gi"
cpu: "2000m"
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: lrs-secrets
key: database-url
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
Auto-scaling:
# k8s/hpa-prod.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: lrs-agents-hpa
namespace: lrs-agents
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: lrs-agents
minReplicas: 5
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Monitoring
Structured Logging
Set up structured logging for production:
from lrs.monitoring.structured_logging import create_logger_for_agent
import logging
# Create logger
logger = create_logger_for_agent(
agent_id="production_agent",
log_file="/var/log/lrs/agent.jsonl",
console=False, # Disable console in production
level=logging.INFO
)
# Log events
logger.log_tool_execution(
tool_name="fetch_api",
success=True,
execution_time=150.5,
prediction_error=0.1
)
logger.log_adaptation_event(
trigger="High prediction error",
old_precision=0.6,
new_precision=0.4,
action="Explore alternatives"
)
Log Aggregation
Send logs to centralized system:
ELK Stack (Elasticsearch, Logstash, Kibana):
# filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/lrs/*.jsonl
json.keys_under_root: true
json.add_error_key: true
output.elasticsearch:
hosts: ["elasticsearch:9200"]
Datadog:
from datadog import initialize, statsd
# Initialize Datadog
initialize(api_key=os.getenv('DATADOG_API_KEY'))
# Send metrics
statsd.increment('lrs.agent.execution')
statsd.histogram('lrs.precision', precision_value)
statsd.gauge('lrs.tool.success_rate', success_rate)
Metrics and Alerting
Expose Prometheus metrics:
from prometheus_client import Counter, Histogram, Gauge, start_http_server
# Define metrics
agent_runs = Counter('lrs_agent_runs_total', 'Total agent runs')
tool_executions = Counter('lrs_tool_executions_total', 'Total tool executions', ['tool', 'status'])
precision_value = Gauge('lrs_precision_value', 'Current precision', ['level'])
execution_time = Histogram('lrs_execution_time_seconds', 'Execution time')
# Record metrics
agent_runs.inc()
tool_executions.labels(tool='fetch_api', status='success').inc()
precision_value.labels(level='execution').set(0.75)
with execution_time.time():
result = agent.run(task)
# Start metrics server
start_http_server(9090)
Prometheus configuration:
# prometheus.yml
scrape_configs:
- job_name: 'lrs-agents'
static_configs:
- targets: ['lrs-api:9090']
scrape_interval: 15s
Alerting rules:
# alerts.yml
groups:
- name: lrs_agents
rules:
- alert: HighFailureRate
expr: rate(lrs_tool_executions_total{status="failure"}[5m]) > 0.5
for: 5m
labels:
severity: warning
annotations:
summary: "High tool failure rate"
- alert: LowPrecision
expr: lrs_precision_value{level="execution"} < 0.3
for: 10m
labels:
severity: warning
annotations:
summary: "Agent precision consistently low"
- alert: ServiceDown
expr: up{job="lrs-agents"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "LRS-Agents service is down"
Dashboard
Run Streamlit dashboard for real-time monitoring:
# In separate container/pod
streamlit run lrs/monitoring/dashboard.py --server.port=8501
Grafana dashboards:
{
"dashboard": {
"title": "LRS-Agents Monitoring",
"panels": [
{
"title": "Precision Over Time",
"targets": [
{
"expr": "lrs_precision_value{level=\"execution\"}"
}
]
},
{
"title": "Tool Success Rate",
"targets": [
{
"expr": "rate(lrs_tool_executions_total{status=\"success\"}[5m]) / rate(lrs_tool_executions_total[5m])"
}
]
},
{
"title": "Adaptation Events",
"targets": [
{
"expr": "rate(lrs_adaptation_events_total[5m])"
}
]
}
]
}
}
Database Management
Schema Setup
Initialize production database:
# Run migrations
psql $DATABASE_URL < docker/init.sql
# Or use migration tool
alembic upgrade head
Connection Pooling
Configure connection pooling:
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
engine = create_engine(
DATABASE_URL,
poolclass=QueuePool,
pool_size=20, # Connections per instance
max_overflow=10, # Additional connections
pool_timeout=30, # Wait timeout
pool_recycle=3600, # Recycle connections after 1 hour
pool_pre_ping=True # Verify connections before use
)
Backup Strategy
Automated backups:
#!/bin/bash
# backup.sh
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="lrs_backup_$DATE.sql"
# Create backup
pg_dump $DATABASE_URL > $BACKUP_FILE
# Compress
gzip $BACKUP_FILE
# Upload to S3
aws s3 cp $BACKUP_FILE.gz s3://lrs-backups/
# Cleanup old backups (keep last 30 days)
find . -name "lrs_backup_*.sql.gz" -mtime +30 -delete
Schedule with cron:
# Daily backups at 2 AM
0 2 * * * /path/to/backup.sh
Security
API Authentication
Implement JWT authentication:
from fastapi import Depends, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt
security = HTTPBearer()
def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
try:
payload = jwt.decode(
credentials.credentials,
SECRET_KEY,
algorithms=["HS256"]
)
return payload
except jwt.InvalidTokenError:
raise HTTPException(status_code=401, detail="Invalid token")
@app.post("/api/agent/run")
async def run_agent(task: str, token: dict = Depends(verify_token)):
# Execute agent
pass
Environment Variables
Securely manage secrets:
# Never commit secrets to version control
# Use environment variables or secret management
# Development
export ANTHROPIC_API_KEY="sk-ant-..."
# Production - Use secret management
# AWS Secrets Manager
aws secretsmanager get-secret-value --secret-id lrs/api-keys
# Kubernetes Secrets
kubectl create secret generic lrs-secrets \
--from-literal=anthropic-api-key=sk-ant-...
Rate Limiting
Implement rate limiting:
from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address
limiter = Limiter(key_func=get_remote_address)
@app.post("/api/agent/run")
@limiter.limit("10/minute") # 10 requests per minute
async def run_agent(request: Request, task: str):
# Execute agent
pass
Performance Optimization
Caching
Implement Redis caching:
import redis
import hashlib
import json
redis_client = redis.Redis(host='redis', port=6379, db=0)
def cache_agent_result(task: str, result: dict, ttl: int = 3600):
"""Cache agent execution result"""
cache_key = hashlib.md5(task.encode()).hexdigest()
redis_client.setex(cache_key, ttl, json.dumps(result))
def get_cached_result(task: str):
"""Get cached result if available"""
cache_key = hashlib.md5(task.encode()).hexdigest()
cached = redis_client.get(cache_key)
return json.loads(cached) if cached else None
# Usage
result = get_cached_result(task)
if not result:
result = agent.run(task)
cache_agent_result(task, result)
Async Execution
Use async for better throughput:
import asyncio
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=10)
async def run_agent_async(task: str):
"""Run agent in thread pool"""
loop = asyncio.get_event_loop()
result = await loop.run_in_executor(
executor,
agent.run,
task
)
return result
# Handle multiple requests concurrently
tasks = [run_agent_async(t) for t in task_list]
results = await asyncio.gather(*tasks)
Resource Limits
Set resource limits:
# Limit maximum iterations
result = agent.run(task, max_iterations=50)
# Timeout protection
import signal
def timeout_handler(signum, frame):
raise TimeoutError("Agent execution timeout")
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(300) # 5 minute timeout
try:
result = agent.run(task)
except TimeoutError:
logger.error("Agent execution timed out")
finally:
signal.alarm(0)
Health Checks
Implement health check endpoint:
from fastapi import FastAPI, status
from sqlalchemy import text
app = FastAPI()
@app.get("/health")
async def health_check():
"""Health check endpoint"""
health = {
"status": "healthy",
"version": "0.2.0",
"checks": {}
}
# Check database
try:
with engine.connect() as conn:
conn.execute(text("SELECT 1"))
health["checks"]["database"] = "ok"
except Exception as e:
health["status"] = "unhealthy"
health["checks"]["database"] = f"error: {str(e)}"
# Check Redis
try:
redis_client.ping()
health["checks"]["redis"] = "ok"
except Exception as e:
health["status"] = "unhealthy"
health["checks"]["redis"] = f"error: {str(e)}"
# Check API keys
if not os.getenv("ANTHROPIC_API_KEY"):
health["status"] = "unhealthy"
health["checks"]["api_keys"] = "missing"
else:
health["checks"]["api_keys"] = "ok"
status_code = (
status.HTTP_200_OK if health["status"] == "healthy"
else status.HTTP_503_SERVICE_UNAVAILABLE
)
return health, status_code
Troubleshooting
Common Issues
High Memory Usage:
# Check memory usage
kubectl top pods -n lrs-agents
# Increase memory limits
# Update deployment.yaml and apply
Database Connection Errors:
# Enable connection pooling
# Add pool_pre_ping=True
# Increase pool_size
Slow Response Times:
# Check logs for slow operations
kubectl logs -f deployment/lrs-agents -n lrs-agents | grep "execution_time"
# Enable caching
# Scale horizontally
Debug Mode
Enable debug logging:
# Set environment variable
export LOG_LEVEL=DEBUG
# Or in Kubernetes
kubectl set env deployment/lrs-agents LOG_LEVEL=DEBUG -n lrs-agents
Checklist
Pre-deployment:
[ ] API keys configured
[ ] Database initialized
[ ] Secrets properly managed
[ ] Resource limits set
[ ] Health checks implemented
[ ] Monitoring configured
[ ] Logging set up
[ ] Backups automated
[ ] Rate limiting enabled
[ ] Load balancing configured
Post-deployment:
[ ] Health checks passing
[ ] Metrics being collected
[ ] Logs aggregating correctly
[ ] Alerts configured
[ ] Dashboard accessible
[ ] Performance acceptable
[ ] Error rate within limits
Next Steps
Set up monitoring with Prometheus/Grafana
Configure log aggregation (ELK, Datadog)
Implement CI/CD pipeline
Load test your deployment
Document runbooks for common issues
Set up on-call rotation
Further Reading
Monitoring API - Monitoring API reference
../tutorials/07_production_deployment - Production tutorial
Kubernetes documentation
Docker best practices
Prometheus operator guide