Message Types
Enum: MessageType
Defined in messaging/base.py:
| Value | Description |
|---|---|
| TEXT | Plain text message |
| IMAGE | Image message |
| FILE | Document/file attachment |
| AUDIO | Audio/voice message |
| VIDEO | Video message |
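As a rough illustration, the table above could correspond to an enum like the following. The member names come from the table; the string values are an assumption, since the actual definition in messaging/base.py is not shown here.

```python
from enum import Enum


class MessageType(Enum):
    """Sketch of the enum; member names match the table, values are assumed."""

    TEXT = "text"    # plain text message
    IMAGE = "image"  # image message
    FILE = "file"    # document/file attachment
    AUDIO = "audio"  # audio/voice message
    VIDEO = "video"  # video message
```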
Processing Rules
AUDIO
- If `MessageType.AUDIO` AND `message.text` is empty → audio is transcribed via Gemini 2.5 Flash (Vertex AI), and the transcript replaces `message.text`
- If `MessageType.AUDIO` AND `message.text` is present → audio is IGNORED, text is processed as-is
- Metadata keys: `audio_data` (bytes), `audio_mime_type` (str)
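The two AUDIO rules above can be sketched as a single function. This is a minimal sketch, not the project's processor: the `Message` dataclass, the `apply_audio_rule` name, and the injected `transcribe` callable are hypothetical stand-ins, and the message type is compared as a plain string to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Message:
    # Hypothetical stand-in for the real message class.
    type: str
    text: str = ""
    metadata: dict = field(default_factory=dict)


def apply_audio_rule(
    message: Message,
    transcribe: Callable[[bytes, str], Optional[str]],
) -> Message:
    """Transcribe audio only when the message carries no text."""
    if message.type != "AUDIO":
        return message
    if message.text:
        # Text is present: the audio payload is ignored, text is used as-is.
        return message
    transcript = transcribe(
        message.metadata["audio_data"],
        message.metadata["audio_mime_type"],
    )
    if transcript is not None:
        # The transcript replaces the (empty) message text.
        message.text = transcript
    return message
```

If `transcribe` returns `None`, this sketch leaves `message.text` empty so the caller can decide how to report the failure.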
FILE
- FILE (document) messages keep their text (if any); the text is treated as a question about the file
- If there is no text, the default prompt is "Analise este arquivo." ("Analyze this file.")
- File bytes are passed to ADK as `Part.from_bytes()`; Gemini 2.5 Flash natively processes PDFs, images, CSVs, DOCX, etc.
- Text is passed as `Part.from_text()`
- Metadata keys: `file_data` (bytes), `file_mime_type` (str), `file_name` (str)
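The prompt fallback in the rules above might look like the sketch below. The `build_file_request` helper is hypothetical, and treating whitespace-only text as empty is an assumption. To stay dependency-free, it returns the raw pieces rather than constructing `Part` objects.

```python
from typing import Optional, Tuple

# Default prompt quoted in the rules above.
DEFAULT_FILE_PROMPT = "Analise este arquivo."


def build_file_request(text: Optional[str], metadata: dict) -> Tuple[str, bytes, str]:
    """Return (prompt, file_bytes, mime_type) for a FILE message."""
    # Assumption: whitespace-only text counts as "no text".
    cleaned = (text or "").strip()
    prompt = cleaned if cleaned else DEFAULT_FILE_PROMPT
    # In the real processor the prompt would be wrapped with Part.from_text()
    # and the bytes with Part.from_bytes(); both are left out of this sketch.
    return prompt, metadata["file_data"], metadata["file_mime_type"]
```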
TEXT
- Processed normally as the user's message
Adapter Implementation
WhatsApp Official (audio)
- Detects `type: "audio"` → calls `_download_media(media_id)` → 2 API calls (GET media info → GET download URL)
- Detects `type: "document"` → same `_download_media()` flow
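The two-call flow described above can be sketched as follows. This is an illustration only: the `download_media` name, the injected `http_get` callable, and the Graph API version in `base_url` are assumptions, and the real `_download_media()` may differ. The sketch relies on the media-info response being JSON with a `url` field that is then fetched with the same bearer token.

```python
import json
from typing import Callable, Dict

# Hypothetical HTTP helper signature: (url, headers) -> response body bytes.
HttpGet = Callable[[str, Dict[str, str]], bytes]


def download_media(
    media_id: str,
    token: str,
    http_get: HttpGet,
    base_url: str = "https://graph.facebook.com/v19.0",  # version is assumed
) -> bytes:
    """Fetch media bytes in two calls: media info first, then the binary."""
    headers = {"Authorization": f"Bearer {token}"}
    # Call 1: look up the media object to obtain its download URL.
    info = json.loads(http_get(f"{base_url}/{media_id}", headers))
    # Call 2: download the binary content from the returned URL.
    return http_get(info["url"], headers)
```

Injecting `http_get` keeps the sketch independent of any particular HTTP client and makes the two calls easy to observe in a test.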
WhatsApp Evolution (audio)
- Detects `messageType: "audioMessage"` or `"ptt"` → calls `_download_audio_media()` via the Evolution `/message/getMedia` endpoint
- Detects `messageType: "documentMessage"` → calls `_download_media_message()`
Telegram (audio)
- Detects `message.voice` or `message.audio` → calls `_download_audio(file_id)` via `bot.get_file()`; only downloads if no text is present
- Detects `message.document` → calls `_download_file(file_id)`; keeps both text and file
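The download decision for Telegram can be illustrated as a small planning function. The `plan_downloads` name and its parameters are hypothetical. The sketch only captures which downloads would be triggered, not the `bot.get_file()` calls themselves.

```python
from typing import List, Optional, Tuple


def plan_downloads(
    text: Optional[str],
    voice_id: Optional[str] = None,
    audio_id: Optional[str] = None,
    document_id: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (kind, file_id) pairs for the downloads the adapter would make."""
    plan: List[Tuple[str, str]] = []
    audio_file_id = voice_id or audio_id
    if audio_file_id and not text:
        # Audio is downloaded only when the message has no text.
        plan.append(("audio", audio_file_id))
    if document_id:
        # Documents are always downloaded; any text is kept alongside.
        plan.append(("file", document_id))
    return plan
```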
WebChat (audio/file)
- Detects `audio_data` (base64) + `audio_mime_type` in JSON body → decodes via `_decode_audio()`
- Detects `file_data` (base64) + `file_mime_type` + `file_name` → decodes via `_decode_file()` (max 20MB)
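A minimal sketch of the file-decoding step, assuming the JSON body has already been parsed into a dict. The `decode_file` name, the choice to return `None` on rejection, and reading "20MB" as 20 × 1024 × 1024 bytes are all assumptions. The actual `_decode_file()` may behave differently.

```python
import base64
import binascii
from typing import Optional, Tuple

# Assumption: "20MB" is interpreted as binary megabytes.
MAX_FILE_BYTES = 20 * 1024 * 1024


def decode_file(
    body: dict, max_bytes: int = MAX_FILE_BYTES
) -> Optional[Tuple[bytes, str, str]]:
    """Return (data, mime_type, file_name), or None if the payload is rejected."""
    try:
        encoded = body["file_data"]
        mime_type = body["file_mime_type"]
        file_name = body["file_name"]
    except KeyError:
        return None  # one of the three required fields is missing
    try:
        # validate=True rejects characters outside the base64 alphabet.
        data = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError, TypeError):
        return None
    if len(data) > max_bytes:
        return None  # decoded payload exceeds the size limit
    return data, mime_type, file_name
```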
Audio Transcriber
File: messaging/audio/transcriber.py
Uses Gemini 2.5 Flash via Vertex AI (not GCP Speech-to-Text):
- Direct API call (not through ADK) to avoid audio token cost in main conversation
- Language hint detection: checks first 200 bytes for Portuguese accent chars (ãáàâãéêíóôõúç) → hints ["pt-BR", "en-US", "es-ES"]
- Max file size: 10MB (configurable via AUDIO_MAX_BYTES)
- Returns None on failure (processor falls back to error message)
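The size limit and the return-`None`-on-failure contract can be sketched as a thin wrapper around the model call. The `transcribe_or_none` name and the injected `call_model` callable are hypothetical, and reading `AUDIO_MAX_BYTES` from an environment variable is an assumption about how the setting is exposed. The Gemini call itself is deliberately left out.

```python
import os
from typing import Callable, Optional

# Assumption: "10MB" is interpreted as binary megabytes.
DEFAULT_AUDIO_MAX_BYTES = 10 * 1024 * 1024


def transcribe_or_none(
    audio: bytes,
    mime_type: str,
    call_model: Callable[[bytes, str], str],
) -> Optional[str]:
    """Return the transcript, or None if the audio is rejected or the call fails."""
    # Assumption: the limit is configured through an environment variable.
    max_bytes = int(os.environ.get("AUDIO_MAX_BYTES", DEFAULT_AUDIO_MAX_BYTES))
    if not audio or len(audio) > max_bytes:
        return None
    try:
        transcript = call_model(audio, mime_type)
    except Exception:
        # Any failure maps to None; the caller falls back to an error message.
        return None
    return transcript.strip() or None
```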