Redis Rollback & Troubleshooting Guide
🔄 Rollback Procedures
If something goes wrong with Redis, you have several rollback options:
Option 1: Instant Fallback (Environment Variable)
Fastest: change only the environment variable; no code change is required.
Local Development
# Edit .env
SESSION_BACKEND=inmemory # Fastest, no persistence
# OR
SESSION_BACKEND=cloudsql # Slower, but works
# Restart bot
python slack_bot.py
Production (Cloud Run)
# Update env var via Cloud Console:
# SESSION_BACKEND=cloudsql
# Or via gcloud (--update-env-vars keeps the other variables;
# --set-env-vars would remove every variable not listed):
gcloud run services update ifriend-slack-bot \
--region=us-central1 \
--update-env-vars=SESSION_BACKEND=cloudsql
Advantages:
- ✅ No code change
- ✅ Rollback in < 1 minute
- ✅ Zero downtime
Disadvantages:
- ⚠️ CloudSQL is still slow (~800ms overhead)
- ⚠️ InMemory loses sessions on restart
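The switch above relies on the bot choosing its backend from SESSION_BACKEND at startup. A minimal sketch of such a selection with a fallback order follows. `choose_backend`, `FALLBACK_ORDER`, the `is_healthy` callback, and the default of `inmemory` when the variable is unset are all assumptions made for illustration; this is not the project's actual `get_session_service` implementation.

```python
import os
from typing import Callable, Mapping

# Assumed fallback order: try the persistent CloudSQL backend before
# falling back to the non-persistent in-memory backend.
FALLBACK_ORDER = {
    "redis": ["redis", "cloudsql", "inmemory"],
    "cloudsql": ["cloudsql", "inmemory"],
    "inmemory": ["inmemory"],
}


def choose_backend(
    env: Mapping[str, str],
    is_healthy: Callable[[str], bool],
) -> str:
    """Return the first healthy backend, starting from SESSION_BACKEND."""
    # Assumption: an unset variable means "inmemory".
    requested = env.get("SESSION_BACKEND", "inmemory").strip().lower()
    if requested not in FALLBACK_ORDER:
        raise ValueError(f"Unknown SESSION_BACKEND: {requested!r}")
    for name in FALLBACK_ORDER[requested]:
        if is_healthy(name):
            return name
    raise RuntimeError("No session backend is available")


if __name__ == "__main__":
    # Pretend Redis is unreachable while the other backends respond.
    print(choose_backend(os.environ, lambda name: name != "redis"))
```

With `SESSION_BACKEND=redis` and an unreachable Redis, this sketch would fall through to `cloudsql`, which mirrors the manual rollback described above.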
Option 2: Git Revert (Previous Code)
Full rollback: return to the code as it was before Redis.
# Check recent commits
git log --oneline -5
# Find the commit that introduced Redis
# Example: abc1234 "feat: implement Redis SessionService"
# Revert that commit (creates a new commit that undoes it)
git revert abc1234
# Or reset hard (careful: discards local changes, and only
# works if the Redis commit is the most recent one)
git reset --hard HEAD~1
# Force push (only needed after a reset that was already pushed)
git push origin feature/redis --force-with-lease
Advantages:
- ✅ Returns exactly to the previous state
- ✅ Removes the Redis dependency
Disadvantages:
- ❌ Requires rebuild/deploy
- ❌ Downtime during deploy
Option 3: Remove Redis Files (Manual Cleanup)
If you want to remove the Redis files completely:
# Remove Redis services
rm ifriend_agent/session/redis_session_service.py
rm ifriend_agent/memory/redis_memory_service.py
rm ifriend_agent/config/backends.py
# Remove from __init__.py
# Edit: ifriend_agent/session/__init__.py
# Remove: from .redis_session_service import RedisSessionService
# Edit: ifriend_agent/memory/__init__.py
# Remove: from .redis_memory_service import RedisMemoryService
# Remove dependency
# Edit: ifriend_agent/requirements.txt
# Remove: redis[asyncio]>=5.0.0
# Restore slack_bot.py imports
# Edit: slack_bot.py
# Change:
from ifriend_agent.config.backends import get_session_service
# To:
from ifriend_agent.session import CloudSQLSessionService
from ifriend_agent.memory import CloudSQLMemoryService
# And restore:
session_service = CloudSQLSessionService(...)
memory_service = CloudSQLMemoryService(...)
🐛 Troubleshooting Guide
Issue 1: "Cannot connect to Redis"
Symptoms:
ERROR - ❌ Erro ao inicializar SessionService: Error connecting to Redis
Diagnosis:
# Test Redis connection
redis-cli -u $REDIS_URL ping
# Check REDIS_URL format
echo $REDIS_URL
# Should be: redis://host:6379/0
# NOT: localhost:6379 (missing scheme)
Solutions:
A) Fix REDIS_URL format:
# Wrong:
REDIS_URL=localhost:6379
# Correct:
REDIS_URL=redis://localhost:6379/0
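The format check in solution A can also be automated before the bot starts. The sketch below uses only the standard library; `check_redis_url` is a hypothetical helper, and it deliberately accepts only the `redis://` and `rediss://` (TLS) schemes, leaving out Unix socket URLs.

```python
from urllib.parse import urlparse


def check_redis_url(url: str) -> list[str]:
    """Return a list of problems found in a Redis URL (empty if none)."""
    problems: list[str] = []
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        problems.append(
            "missing or unsupported scheme (expected redis:// or rediss://)"
        )
        return problems  # other fields are unreliable without a valid scheme
    if not parsed.hostname:
        problems.append("missing host")
    try:
        parsed.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("invalid port")
    db = parsed.path.lstrip("/")
    if db and not db.isdigit():
        problems.append("database index must be a non-negative integer")
    return problems


if __name__ == "__main__":
    for candidate in ("localhost:6379", "redis://localhost:6379/0"):
        print(candidate, "->", check_redis_url(candidate) or "OK")
```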
B) Start Redis (if not running):
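# Check if Redis is running
docker ps | grep redis
# If not, start it (named so that `docker logs ifriend-redis` and
# `docker restart ifriend-redis` elsewhere in this guide work):
docker run -d --name ifriend-redis -p 6379:6379 redis:7-alpine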
C) Fallback to CloudSQL:
SESSION_BACKEND=cloudsql
# Restart bot
Issue 2: "Redis timeout / slow response"
Symptoms:
WARNING - Redis operation took 500ms (expected < 10ms)
Diagnosis:
# Check Redis latency
redis-cli --latency
# Should be < 1ms locally
# Check Redis info
redis-cli INFO stats
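`redis-cli --latency` measures from the machine where it runs. To see the latency the bot process itself experiences, the same idea can be sketched in Python. `measure_latency_ms` is a hypothetical helper that times any zero-argument callable; nothing here is taken from the project's code.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(
    operation: Callable[[], object], samples: int = 20
) -> dict:
    """Call `operation` repeatedly and summarize its latency in milliseconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "samples": samples,
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


if __name__ == "__main__":
    # A no-op stands in for a real Redis call here.
    print(measure_latency_ms(lambda: None, samples=5))
```

With the redis-py package installed, passing the `ping` method of a client created by `redis.Redis.from_url(...)` as `operation` would time real round trips.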
Solutions:
A) Check Redis memory usage:
redis-cli INFO memory
# If memory full, increase or clear:
redis-cli FLUSHDB # WARNING: Deletes all data
B) Reduce TTL (less data):
# Edit .env
REDIS_SESSION_TTL=1800 # 30min instead of 1 hour
C) Switch to InMemory (dev only):
SESSION_BACKEND=inmemory
Issue 3: "Sessions not persisting"
Symptoms:
User: @Bot continuing the previous conversation
Bot: Sorry, I don't have any previous context
Diagnosis:
# Check if sessions are being created
# (--scan iterates incrementally; KEYS would block the server on a large dataset)
redis-cli --scan --pattern "session:*"
# Check TTL
redis-cli TTL "session:slack_C123_U456_T789:meta"
# If -2: key doesn't exist
# If -1: no expiration (wrong)
# If >0: expires in N seconds (correct)
# Check events
redis-cli LLEN "session:slack_C123_U456_T789:events"
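The TTL and LLEN replies above can be combined into a single verdict. The sketch below encodes the rules from the comments (-2 means the key is missing, -1 means no expiration); `diagnose_session` and its return strings are hypothetical names chosen for illustration.

```python
def diagnose_session(meta_ttl: int, event_count: int) -> str:
    """Classify a session from the TTL reply for its meta key and the
    LLEN reply for its events list."""
    if meta_ttl == -2:
        return "missing: the session was never written or has already expired"
    if meta_ttl == -1:
        return "no-expiry: the key exists but no TTL was applied"
    if meta_ttl < -2:
        raise ValueError(f"unexpected TTL reply: {meta_ttl}")
    if event_count == 0:
        return "empty: the session exists but has no stored events"
    return f"ok: {event_count} events, expires in {meta_ttl}s"


if __name__ == "__main__":
    print(diagnose_session(-2, 0))
    print(diagnose_session(1800, 3))
```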
Solutions:
A) Verify REDIS_SESSION_TTL:
# Edit .env
REDIS_SESSION_TTL=3600 # Should be number, not empty
# Check in bot logs:
python slack_bot.py 2>&1 | grep TTL
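An empty or non-numeric REDIS_SESSION_TTL is easy to miss, so the value is worth validating when it is read. A minimal sketch follows. `read_session_ttl` and `DEFAULT_TTL_SECONDS` are hypothetical, and treating an unset or empty value as one hour is an assumption, not confirmed project behavior.

```python
from typing import Mapping

# Assumed default: one hour, matching the example .env value.
DEFAULT_TTL_SECONDS = 3600


def read_session_ttl(env: Mapping[str, str]) -> int:
    """Parse REDIS_SESSION_TTL, rejecting non-numeric or non-positive values."""
    raw = env.get("REDIS_SESSION_TTL", "").strip()
    if not raw:
        # Unset or empty: fall back to the assumed default.
        return DEFAULT_TTL_SECONDS
    try:
        ttl = int(raw)
    except ValueError:
        raise ValueError(
            f"REDIS_SESSION_TTL must be an integer, got {raw!r}"
        ) from None
    if ttl <= 0:
        raise ValueError("REDIS_SESSION_TTL must be positive")
    return ttl


if __name__ == "__main__":
    print(read_session_ttl({"REDIS_SESSION_TTL": "1800"}))
```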
B) Check Redis persistence config:
# Redis config
redis-cli CONFIG GET save
# Should have persistence enabled
# If not:
docker run -d -p 6379:6379 redis:7-alpine redis-server --save 60 1
C) Fallback to CloudSQL (guaranteed persistence):
SESSION_BACKEND=cloudsql
Issue 4: "Performance not improving"
Symptoms:
Bot still slow (~2s response time)
Diagnosis:
# Verify backend in use
python slack_bot.py 2>&1 | grep "Session Backend"
# Should show: Session Backend: redis
# Run benchmark
python benchmark_session_performance.py
# Check Redis response time
redis-cli --latency-history
Solutions:
A) Verify Redis is actually being used:
# Check .env
cat .env | grep SESSION_BACKEND
# Should be: SESSION_BACKEND=redis
# Monitor Redis
redis-cli MONITOR
# Run bot, should see RPUSH commands
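B) Check network latency (Cloud Run → Memorystore):
# Inspect the Cloud Run service configuration:
gcloud run services describe ifriend-slack-bot \
--region=us-central1
# Verify that a VPC connector appears in the output.
# If it is missing, attach it:
gcloud run services update ifriend-slack-bot \
--region=us-central1 \
--vpc-connector=ifriend-connector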
C) Check whether CloudSQL is still being used for sessions:
# Check logs
gcloud run services logs read ifriend-slack-bot --region=us-central1 | grep -i cloudsql
# CloudSQL should appear only for Memory, not for Session
Issue 5: "Redis memory usage too high"
Symptoms:
Redis using > 80% memory
Diagnosis:
# Check memory
redis-cli INFO memory | grep used_memory_human
# Check number of keys
redis-cli DBSIZE
# Check largest keys
redis-cli --bigkeys
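The 80% symptom can be checked programmatically from the text that `redis-cli INFO memory` prints. The sketch below parses the standard `used_memory` and `maxmemory` fields; `parse_info` and `memory_usage_ratio` are hypothetical helpers. When `maxmemory` is 0 (no limit configured), no ratio can be computed and the function returns `None`.

```python
from typing import Optional


def parse_info(text: str) -> dict:
    """Parse `key:value` lines from Redis INFO output, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_ratio(info_text: str) -> Optional[float]:
    """Return used_memory / maxmemory, or None when no limit is configured."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:
        return None
    return used / limit


if __name__ == "__main__":
    sample = "# Memory\r\nused_memory:850\r\nmaxmemory:1000\r\n"
    ratio = memory_usage_ratio(sample)
    print("over 80%" if ratio is not None and ratio > 0.8 else "within limit")
```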
Solutions:
A) Reduce TTL (faster cleanup):
# Edit .env
REDIS_SESSION_TTL=1800 # 30min instead of 1 hour
B) Manually clean up sessions:
# WARNING: this deletes ALL session keys, not only old ones.
# Keys that have a TTL already expire on their own.
redis-cli --scan --pattern "session:*" | xargs redis-cli DEL
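To remove only sessions that have been idle for a while, rather than every session key, one option is to filter on the idle time Redis reports through OBJECT IDLETIME. The sketch below assumes a redis-py client is passed in as `client`. `select_stale_keys` and `delete_stale_sessions` are hypothetical names, and idle time is not available when Redis runs with an LFU eviction policy.

```python
from typing import Mapping


def select_stale_keys(
    idle_seconds_by_key: Mapping[str, int], max_idle_seconds: int = 3600
) -> list:
    """Return the keys whose idle time exceeds the threshold, sorted by name."""
    return sorted(
        key
        for key, idle in idle_seconds_by_key.items()
        if idle > max_idle_seconds
    )


def delete_stale_sessions(
    client, pattern: str = "session:*", max_idle_seconds: int = 3600
) -> int:
    """Unlink session keys idle for longer than `max_idle_seconds`.

    `client` is assumed to behave like redis.Redis: it must provide
    scan_iter(match=...), object("idletime", key) and unlink(*keys).
    """
    idle = {}
    for key in client.scan_iter(match=pattern):
        seconds = client.object("idletime", key)
        if seconds is not None:  # the key may vanish between SCAN and OBJECT
            idle[key] = seconds
    stale = select_stale_keys(idle, max_idle_seconds)
    if stale:
        client.unlink(*stale)
    return len(stale)


if __name__ == "__main__":
    print(select_stale_keys({"session:a": 10, "session:b": 7200}))
```

UNLINK frees the memory asynchronously, so it is gentler on a busy server than DEL when many keys are removed at once.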
C) Increase Redis memory:
# Google Cloud Memorystore
gcloud redis instances update ifriend-redis \
--size=2 \
--region=us-central1
🚨 Emergency Procedures
Critical Issue: Bot Completely Down
# 1. IMMEDIATE: Switch to inmemory (fastest recovery)
# Set the variable on the same command line so the bot process inherits it
SESSION_BACKEND=inmemory python slack_bot.py
# 2. Investigate Redis
redis-cli ping
docker logs ifriend-redis
# 3. Fallback to CloudSQL if needed
SESSION_BACKEND=cloudsql python slack_bot.py
Production Outage
# 1. IMMEDIATE: Update Cloud Run env var (--update-env-vars preserves the others)
gcloud run services update ifriend-slack-bot \
--region=us-central1 \
--update-env-vars=SESSION_BACKEND=cloudsql
# 2. Check Memorystore status
gcloud redis instances list --region=us-central1
# 3. Check VPC connector
gcloud compute networks vpc-access connectors list --region=us-central1
# 4. Check logs
gcloud run services logs read ifriend-slack-bot --region=us-central1 --limit 100
📊 Health Checks
Pre-Deployment Checklist
# ✅ Redis running
redis-cli ping # Should return: PONG
# ✅ Connection working
python -c "
import asyncio
from ifriend_agent.config.backends import get_session_service
async def test():
    svc = get_session_service('redis')
    s = await svc.create_session('test', 'u1')
    print(f'✅ {s.id}')
asyncio.run(test())
"
# ✅ Performance good
python benchmark_session_performance.py | grep "Average"
# Should show < 10ms for Redis
# ✅ Bot starts
python slack_bot.py 2>&1 | head -20
# Should show: "Redis: Performance otimizada"
Post-Deployment Monitoring
# Check response times
gcloud run services logs read ifriend-slack-bot --region=us-central1 | grep "⚡ Redis"
# Check error rate
gcloud run services logs read ifriend-slack-bot --region=us-central1 | grep ERROR | wc -l
# Check Memorystore metrics
gcloud redis instances describe ifriend-redis \
--region=us-central1
🔧 Configuration Validation
Validate .env File
# Check required vars
grep -E "SESSION_BACKEND|REDIS_URL|REDIS_SESSION_TTL" .env
# Should output:
# SESSION_BACKEND=redis
# REDIS_URL=redis://localhost:6379/0
# REDIS_SESSION_TTL=3600
# Test format
python -c "
import os
from dotenv import load_dotenv
load_dotenv()
assert os.getenv('SESSION_BACKEND') in ['redis', 'cloudsql', 'inmemory']
assert os.getenv('REDIS_URL').startswith('redis://')
assert int(os.getenv('REDIS_SESSION_TTL')) > 0
print('✅ Config valid')
"
📞 Support Escalation
Level 1: Environment Variable Fallback (< 1 min)
SESSION_BACKEND=inmemory # Fastest
# or
SESSION_BACKEND=cloudsql # Persistent
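Level 2: Restart Services (< 5 min)
# Local
docker restart ifriend-redis
python slack_bot.py
# Production
# (if `reboot` is not available in your gcloud version, check
# `gcloud redis instances --help`; Standard Tier instances support `failover`)
gcloud redis instances reboot ifriend-redis --region=us-central1
# An update with no flags changes nothing; changing any env var rolls out a
# new revision. RESTARTED_AT is an arbitrary marker name, not used by the bot.
gcloud run services update ifriend-slack-bot --region=us-central1 \
--update-env-vars=RESTARTED_AT="$(date +%s)"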
Level 3: Full Rollback (< 15 min)
git revert <redis-commit>
gcloud builds submit --config cloudbuild.slack.yaml
📋 Rollback Success Criteria
After rollback, verify:
- ✅ Bot responds to Slack messages
- ✅ No errors in logs
- ✅ Sessions persist across restarts (if using cloudsql)
- ✅ Response time acceptable (even if slower)
💡 Tips
Quick Backend Switch (No Code Change)
# Development
export SESSION_BACKEND=inmemory && python slack_bot.py
# Production
gcloud run services update ifriend-slack-bot \
--region=us-central1 \
--update-env-vars=SESSION_BACKEND=cloudsql
Monitor Real-Time
# Watch Redis activity
redis-cli MONITOR
# Watch bot logs (also saved to bot.log)
python slack_bot.py 2>&1 | tee bot.log
# Watch Cloud Run logs
gcloud beta run services logs tail ifriend-slack-bot --region=us-central1
Test Before Full Deploy
# 1. Test locally first
SESSION_BACKEND=redis python slack_bot.py
# 2. Test with staging environment
gcloud run deploy ifriend-slack-bot-staging \
--region=us-central1 \
--update-env-vars=SESSION_BACKEND=redis
# 3. Monitor for 1 hour
# 4. Then deploy to production
Remember:
- ⚡ Fastest rollback: Change SESSION_BACKEND env var
- 🔒 Safest rollback: Git revert + redeploy
- 🎯 Best practice: Test locally before production deploy