
Redis Rollback & Troubleshooting Guide

🔄 Rollback Procedures

If something goes wrong with Redis, you have several rollback options:


Option 1: Instant Fallback (Environment Variable)

Fastest: just change the environment variable; no code change required.

Local Development

# Edit .env
SESSION_BACKEND=inmemory    # Fastest, no persistence
# OR
SESSION_BACKEND=cloudsql    # Slower, but works

# Restart bot
python slack_bot.py

Production (Cloud Run)

# Update env var via Cloud Console:
# SESSION_BACKEND=cloudsql

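# Or via gcloud (--update-env-vars changes only this variable;
# --set-env-vars would replace ALL existing env vars on the service):
gcloud run services update ifriend-slack-bot \
  --region=us-central1 \
  --update-env-vars=SESSION_BACKEND=cloudsql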

Advantages:
  • ✅ No code change
  • ✅ Rollback in < 1 minute
  • ✅ Zero downtime

Disadvantages:
  • ⚠️ CloudSQL is still slow (~800 ms overhead)
  • ⚠️ InMemory loses sessions on restart and does not share them across Cloud Run instances
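Switching works without a code change because the backend is chosen from SESSION_BACKEND once, at startup. A minimal sketch of that selection pattern follows. The select_backend and build_session_service helpers and the factories mapping are hypothetical (they are not the project's real get_session_service API), and the "cloudsql" default is an assumption:

```python
from typing import Callable, Mapping

# Hypothetical: the backends this guide mentions.
VALID_BACKENDS = ("redis", "cloudsql", "inmemory")


def select_backend(env: Mapping[str, str], default: str = "cloudsql") -> str:
    """Return the normalized SESSION_BACKEND value, or `default` if unset.

    The "cloudsql" default is an assumption, not the project's actual behavior.
    """
    raw = env.get("SESSION_BACKEND", "").strip().lower()
    if not raw:
        return default
    if raw not in VALID_BACKENDS:
        raise ValueError(f"SESSION_BACKEND must be one of {VALID_BACKENDS}, got {raw!r}")
    return raw


def build_session_service(
    env: Mapping[str, str], factories: Mapping[str, Callable[[], object]]
) -> object:
    """Instantiate the service for the selected backend (hypothetical helper)."""
    return factories[select_backend(env)]()
```

At startup you would pass os.environ and your real constructors; a rollback then only needs a new env value and a restart, because nothing else in the code refers to a specific backend.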


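Option 2: Git Revert (Previous Code)

Full rollback: return to the code as it was before Redis.

# Check recent commits
git log --oneline -5

# Find the commit that introduced Redis
# Example: abc1234 "feat: implement Redis SessionService"

# Undo that commit with a new commit (safe on shared branches)
git revert abc1234
git push origin feature/redis

# Or reset hard (careful: discards uncommitted changes and rewrites history;
# HEAD~1 is only correct if the Redis commit is the most recent one)
git reset --hard HEAD~1

# Force push (needed only after a reset, if the branch was already pushed)
git push origin feature/redis --force-with-lease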

Advantages:
  • ✅ Returns exactly to the previous state
  • ✅ Removes the Redis dependency

Disadvantages:
  • ❌ Requires a rebuild and redeploy
  • ❌ Downtime during the deploy


Option 3: Remove Redis Files (Manual Cleanup)

To remove the Redis files completely:

# Remove Redis services
rm ifriend_agent/session/redis_session_service.py
rm ifriend_agent/memory/redis_memory_service.py
rm ifriend_agent/config/backends.py

# Remove from __init__.py
# Edit: ifriend_agent/session/__init__.py
# Remove: from .redis_session_service import RedisSessionService

# Edit: ifriend_agent/memory/__init__.py
# Remove: from .redis_memory_service import RedisMemoryService

# Remove dependency
# Edit: ifriend_agent/requirements.txt
# Remove: redis[asyncio]>=5.0.0

# Restore slack_bot.py imports
# Edit: slack_bot.py
# Change:
from ifriend_agent.config.backends import get_session_service
# To:
from ifriend_agent.session import CloudSQLSessionService
from ifriend_agent.memory import CloudSQLMemoryService

# And restore:
session_service = CloudSQLSessionService(...)
memory_service = CloudSQLMemoryService(...)
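After a manual cleanup it is easy to miss an import. The stdlib-only sketch below scans .py and .txt files under a directory for leftover references. The pattern list is an assumption built from the file, class, and dependency names above, so adjust it to your tree:

```python
from pathlib import Path

# Assumed patterns, taken from the files/classes/dependency removed in Option 3.
REDIS_PATTERNS = (
    "redis_session_service",
    "redis_memory_service",
    "config.backends",
    "RedisSessionService",
    "RedisMemoryService",
    "redis[asyncio]",
)


def find_redis_references(root: str, patterns=REDIS_PATTERNS):
    """Return (path, line_number, line) for every line mentioning a pattern."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in (".py", ".txt"):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p in line for p in patterns):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running find_redis_references(".") from the repository root should return an empty list once the cleanup is complete.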

🐛 Troubleshooting Guide

Issue 1: "Cannot connect to Redis"

Symptoms:

ERROR - ❌ Erro ao inicializar SessionService: Error connecting to Redis

Diagnosis:

# Test Redis connection
redis-cli -u $REDIS_URL ping

# Check REDIS_URL format
echo $REDIS_URL
# Should be: redis://host:6379/0 (or rediss://host:port/0 for TLS)
# NOT: localhost:6379 (missing scheme)
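If redis-cli is not installed where the bot runs (for example, inside the app container), you can run the same PING check over a raw TCP socket, because a Redis PING is a single RESP-encoded command. The stdlib-only sketch below does not send AUTH, so a password-protected server will reply with an error and the function returns False:

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Send a RESP-encoded PING and return True if the server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")  # array of one bulk string: "PING"
            reply = sock.recv(64)
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
    return reply.startswith(b"+PONG")
```

For the local Docker setup, redis_ping("localhost", 6379) should return True.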

Solutions:

A) Fix REDIS_URL format:

# Wrong:
REDIS_URL=localhost:6379

# Correct:
REDIS_URL=redis://localhost:6379/0
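To catch the missing-scheme mistake before the bot starts, you can parse REDIS_URL yourself. The sketch below uses only urllib.parse. It accepts redis:// and rediss:// (TLS) and falls back to port 6379 and database 0 when those are omitted:

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_redis_url(url: str) -> Tuple[str, int, int]:
    """Return (host, port, db) from REDIS_URL, or raise ValueError."""
    parts = urlparse(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"REDIS_URL needs a redis:// or rediss:// scheme: {url!r}")
    if not parts.hostname:
        raise ValueError(f"REDIS_URL has no host: {url!r}")
    port = parts.port or 6379  # .port itself raises ValueError for a non-numeric port
    db = parts.path.lstrip("/") or "0"
    if not db.isdigit():
        raise ValueError(f"REDIS_URL database must be a number: {url!r}")
    return parts.hostname, port, int(db)
```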

B) Start Redis (if it is not running):

# Check whether Redis is running
docker ps | grep redis

# If not, start it (named so the later docker logs/restart commands work):
docker run -d --name ifriend-redis -p 6379:6379 redis:7-alpine

C) Fallback to CloudSQL:

SESSION_BACKEND=cloudsql
# Restart bot


Issue 2: "Redis timeout / slow response"

Symptoms:

WARNING - Redis operation took 500ms (expected < 10ms)

Diagnosis:

# Check Redis latency
redis-cli --latency
# Should be < 1ms locally

# Check Redis info
redis-cli INFO stats
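To find out which operation is slow, and to produce a warning like the one above, you can wrap each Redis call in a timer. The sketch below is minimal. The timed helper, the logger name, and the 10 ms threshold are assumptions, not the bot's actual instrumentation. Because timed is a plain context manager, it also works around await calls in async code:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("ifriend.redis")  # assumed logger name


@contextmanager
def timed(op_name: str, threshold_ms: float = 10.0):
    """Measure the wrapped block and log a warning if it exceeds threshold_ms."""
    result = {"op": op_name, "elapsed_ms": 0.0, "slow": False}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["elapsed_ms"] = (time.perf_counter() - start) * 1000.0
        result["slow"] = result["elapsed_ms"] > threshold_ms
        if result["slow"]:
            logger.warning(
                "Redis operation %s took %.0fms (expected < %.0fms)",
                op_name, result["elapsed_ms"], threshold_ms,
            )


# Example inside async code; `client` is assumed to be a redis.asyncio client.
async def append_event(client, key: str, payload: str) -> None:
    with timed("RPUSH"):
        await client.rpush(key, payload)
```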

Solutions:

A) Check Redis memory usage:

redis-cli INFO memory

# If memory is full, raise the memory limit or clear the database:
redis-cli FLUSHDB  # WARNING: deletes ALL keys in the current database, including active sessions

B) Reduce TTL (less data):

# Edit .env
REDIS_SESSION_TTL=1800  # 30min instead of 1 hour

C) Switch to InMemory (dev only):

SESSION_BACKEND=inmemory


Issue 3: "Sessions not persisting"

Symptoms:

User: @Bot continuing our previous conversation
Bot: Sorry, I don't have any previous context

Diagnosis:

# Check if sessions are being created
redis-cli KEYS "session:*"

# Check TTL
redis-cli TTL "session:slack_C123_U456_T789:meta"
# If -2: key doesn't exist
# If -1: key exists but has no expiration (wrong: TTL was never set)
# If >0: expires in N seconds (correct)

# Check events
redis-cli LLEN "session:slack_C123_U456_T789:events"
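For a diagnostic script, you can encode the three TTL cases above in a small helper. The key layout (session:<id>:meta and session:<id>:events) is inferred from the example keys, so treat it as an assumption about the RedisSessionService internals. One likely cause of -1 is that RPUSH creates a missing list key with no expiry. Each write path therefore needs a matching EXPIRE on both keys.

```python
from typing import Dict


def session_keys(session_id: str) -> Dict[str, str]:
    """Hypothetical key layout inferred from the examples in this guide."""
    return {
        "meta": f"session:{session_id}:meta",
        "events": f"session:{session_id}:events",
    }


def describe_ttl(ttl: int) -> str:
    """Interpret the integer returned by the Redis TTL command."""
    if ttl == -2:
        return "missing: key does not exist (expired or never written)"
    if ttl == -1:
        return "no expiry: key exists but TTL was never set"
    if ttl >= 0:
        return f"ok: expires in {ttl}s"
    raise ValueError(f"unexpected TTL reply: {ttl}")
```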

Solutions:

A) Verify REDIS_SESSION_TTL:

# Edit .env
REDIS_SESSION_TTL=3600  # Must be a positive integer (seconds), not empty

# Check in bot logs:
python slack_bot.py 2>&1 | grep TTL

B) Check Redis persistence config:

# Redis config
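redis-cli CONFIG GET save
# An empty value means RDB snapshots are disabled

# If so, recreate the container with snapshots enabled (save if >= 1 key changed in 60 s):
docker rm -f ifriend-redis
docker run -d --name ifriend-redis -p 6379:6379 redis:7-alpine redis-server --save 60 1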

C) Fallback to CloudSQL (guaranteed persistence):

SESSION_BACKEND=cloudsql


Issue 4: "Performance not improving"

Symptoms:

Bot still slow (~2s response time)

Diagnosis:

# Verify backend in use
python slack_bot.py 2>&1 | grep "Session Backend"
# Should show: Session Backend: redis

# Run benchmark
python benchmark_session_performance.py

# Check Redis response time
redis-cli --latency-history
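For a quick number when you need to compare backends, a small async harness that reports mean and 95th-percentile latency is enough. This sketch is not the project's benchmark_session_performance.py. Here op stands for any zero-argument coroutine function you supply, for example one that reads a session from the configured service:

```python
import asyncio
import math
import statistics
import time
from typing import Awaitable, Callable, Dict


async def benchmark(op: Callable[[], Awaitable[object]], runs: int = 50) -> Dict[str, float]:
    """Await `op()` `runs` times sequentially and summarize latency in ms."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        await op()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[math.ceil(0.95 * runs) - 1]  # nearest-rank 95th percentile
    return {
        "mean_ms": statistics.mean(samples),
        "p95_ms": p95,
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in operation; replace with a call on your configured session service.
    async def fake_op() -> None:
        await asyncio.sleep(0.001)

    print(asyncio.run(benchmark(fake_op, runs=20)))
```

The p95 matters more than the mean here: a few slow round trips to Memorystore can hide behind a good average.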

Solutions:

A) Verify Redis is actually being used:

# Check .env
cat .env | grep SESSION_BACKEND
# Should be: SESSION_BACKEND=redis

# Monitor Redis (keep it short: MONITOR noticeably slows the server)
redis-cli MONITOR
# Run the bot; you should see RPUSH commands

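B) Check network latency (Cloud Run → Memorystore):

# Verify that the Cloud Run service has a VPC connector configured:
gcloud run services describe ifriend-slack-bot \
  --region=us-central1 | grep -i vpc

# If it is missing, attach it:
gcloud run services update ifriend-slack-bot \
  --region=us-central1 \
  --vpc-connector=ifriend-connector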

C) Check whether CloudSQL is still being used for sessions:

# Check logs
gcloud run services logs read ifriend-slack-bot --region=us-central1 | grep -i cloudsql

# CloudSQL should appear only for Memory, not for Session


Issue 5: "Redis memory usage too high"

Symptoms:

Redis using > 80% memory

Diagnosis:

# Check memory
redis-cli INFO memory | grep used_memory_human

# Check number of keys
redis-cli DBSIZE

# Check largest keys
redis-cli --bigkeys
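You can script the > 80% check by comparing used_memory with maxmemory from INFO memory. The sketch below parses the raw text reply printed by redis-cli INFO memory. With redis-py, client.info("memory") already returns a parsed dict that can be passed straight to memory_usage_ratio. A maxmemory of 0 means no limit is configured, so no ratio can be computed:

```python
from typing import Dict, Mapping, Optional, Union


def parse_info(text: str) -> Dict[str, str]:
    """Parse the `field:value` lines of a Redis INFO reply into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and "# Memory" headers
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def memory_usage_ratio(info: Mapping[str, Union[str, int]]) -> Optional[float]:
    """Return used_memory / maxmemory, or None when maxmemory is 0 (unlimited)."""
    maxmemory = int(info.get("maxmemory", 0))
    if maxmemory == 0:
        return None
    return int(info["used_memory"]) / maxmemory
```

Alert when the ratio exceeds 0.8, and check maxmemory_policy at the same time. With noeviction, writes start failing with OOM errors once the limit is reached.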

Solutions:

A) Reduce TTL (faster cleanup):

# Edit .env
REDIS_SESSION_TTL=1800  # 30min instead of 1 hour

B) Manual cleanup old sessions:

# Delete ALL session keys, including active ones (TTL expiry normally handles cleanup)
redis-cli --scan --pattern "session:*" | xargs redis-cli DEL
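To remove only sessions idle for more than an hour, use OBJECT IDLETIME, which reports the seconds since a key was last accessed. This sketch assumes a synchronous redis-py client (scan_iter, object, and delete are redis-py methods). The one-hour threshold and batch size are assumptions. OBJECT IDLETIME is not available when maxmemory-policy is an LFU policy.

```python
def delete_idle_sessions(client, max_idle_s: int = 3600,
                         pattern: str = "session:*", batch: int = 500) -> int:
    """Delete keys matching `pattern` idle for more than `max_idle_s` seconds.

    `client` is assumed to be a synchronous redis.Redis instance.
    Returns the number of keys deleted.
    """
    deleted = 0
    stale = []
    for key in client.scan_iter(match=pattern):  # SCAN: incremental, unlike KEYS
        idle = client.object("idletime", key)     # None if the key vanished meanwhile
        if idle is not None and idle > max_idle_s:
            stale.append(key)
        if len(stale) >= batch:
            deleted += client.delete(*stale)
            stale.clear()
    if stale:
        deleted += client.delete(*stale)
    return deleted
```

Each key is checked on its own, so a session's :meta and :events keys can be deleted at different times if their access times differ.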

C) Increase Redis memory:

# Google Cloud Memorystore (--size is in GB)
gcloud redis instances update ifriend-redis \
  --size=2 \
  --region=us-central1


🚨 Emergency Procedures

Critical Issue: Bot Completely Down

# 1. IMMEDIATE: switch to inmemory (fastest recovery);
#    set it on the same line so the variable reaches the bot process
SESSION_BACKEND=inmemory python slack_bot.py

# 2. Investigate Redis
redis-cli ping
docker logs ifriend-redis

# 3. Fall back to CloudSQL if needed
SESSION_BACKEND=cloudsql python slack_bot.py

Production Outage

# 1. IMMEDIATE: update the Cloud Run env var
#    (--update-env-vars keeps the other variables; --set-env-vars would wipe them)
gcloud run services update ifriend-slack-bot \
  --region=us-central1 \
  --update-env-vars=SESSION_BACKEND=cloudsql

# 2. Check Memorystore status
gcloud redis instances list

# 3. Check VPC connector
gcloud compute networks vpc-access connectors list

# 4. Check logs
gcloud run services logs read ifriend-slack-bot --region=us-central1 --limit=100

📊 Health Checks

Pre-Deployment Checklist

# ✅ Redis running
redis-cli ping  # Should return: PONG

# ✅ Connection working
python -c "
import asyncio
from ifriend_agent.config.backends import get_session_service
async def test():
    svc = get_session_service('redis')
    s = await svc.create_session('test', 'u1')
    print(f'✅ {s.id}')
asyncio.run(test())
"

# ✅ Performance good
python benchmark_session_performance.py | grep "Average"
# Should show < 10ms for Redis

# ✅ Bot starts
python slack_bot.py 2>&1 | head -20
# Should show: "Redis: Performance otimizada"

Post-Deployment Monitoring

# Check response times
gcloud run services logs read ifriend-slack-bot --region=us-central1 | grep "⚡ Redis"

# Check error rate
gcloud run services logs read ifriend-slack-bot --region=us-central1 | grep ERROR | wc -l

# Check Memorystore metrics
gcloud redis instances describe ifriend-redis \
  --region=us-central1

🔧 Configuration Validation

Validate .env File

# Check required vars
cat .env | grep -E "SESSION_BACKEND|REDIS_URL|REDIS_SESSION_TTL"

# Should output:
# SESSION_BACKEND=redis
# REDIS_URL=redis://localhost:6379/0
# REDIS_SESSION_TTL=3600

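# Test format (REDIS_* vars are only required when SESSION_BACKEND=redis)
python -c "
import os
from dotenv import load_dotenv
load_dotenv()
backend = os.getenv('SESSION_BACKEND', '')
assert backend in ('redis', 'cloudsql', 'inmemory'), 'bad SESSION_BACKEND: ' + backend
if backend == 'redis':
    url = os.getenv('REDIS_URL', '')
    assert url.startswith(('redis://', 'rediss://')), 'bad REDIS_URL: ' + url
    ttl = os.getenv('REDIS_SESSION_TTL', '')
    assert ttl.isdigit() and int(ttl) > 0, 'bad REDIS_SESSION_TTL: ' + ttl
print('✅ Config valid')
"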

📞 Support Escalation

Level 1: Environment Variable Fallback (< 1 min)

SESSION_BACKEND=inmemory  # Fastest
# or
SESSION_BACKEND=cloudsql  # Persistent

Level 2: Restart Services (< 5 min)

# Local
docker restart ifriend-redis
python slack_bot.py

# Production
gcloud redis instances reboot ifriend-redis --region=us-central1
gcloud run services update ifriend-slack-bot --region=us-central1

Level 3: Full Rollback (< 15 min)

git revert <redis-commit>
gcloud builds submit --config cloudbuild.slack.yaml

📋 Rollback Success Criteria

After rollback, verify:

  • ✅ Bot responds to Slack messages
  • ✅ No errors in logs
  • ✅ Sessions persist across restarts (if using cloudsql)
  • ✅ Response time acceptable (even if slower)

💡 Tips

Quick Backend Switch (No Code Change)

# Development
export SESSION_BACKEND=inmemory && python slack_bot.py

# Production
gcloud run services update ifriend-slack-bot \
  --region=us-central1 \
  --update-env-vars=SESSION_BACKEND=cloudsql

Monitor Real-Time

# Watch Redis activity
redis-cli MONITOR

# Watch bot logs (and keep a copy in bot.log)
python slack_bot.py 2>&1 | tee bot.log

# Watch Cloud Run logs (log streaming is a beta command)
gcloud beta run services logs tail ifriend-slack-bot --region=us-central1

Test Before Full Deploy

# 1. Test locally first
SESSION_BACKEND=redis python slack_bot.py

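# 2. Test with a staging environment
#    (add the --image and --vpc-connector flags your staging service needs)
gcloud run deploy ifriend-slack-bot-staging \
  --region=us-central1 \
  --update-env-vars=SESSION_BACKEND=redis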

# 3. Monitor for 1 hour

# 4. Then deploy to production

Remember:
  • ⚡ Fastest rollback: change the SESSION_BACKEND env var
  • 🔒 Safest rollback: git revert + redeploy
  • 🎯 Best practice: test locally before deploying to production