Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.
Method 1: Install from the command line (recommended)
Recommended (no need to install clawhub first):
npx clawhub@latest --dir ~/.claude/skills install whatsapp-voice-chat-integration-open-source
Or use the clawhub CLI (must be installed beforehand):
clawhub --dir ~/.claude/skills install whatsapp-voice-chat-integration-open-source
⚠️ Requires Node.js 18+. No Node? Use Method 2 below to download the ZIP directly.
Method 2: Manual download (no Node required)
Download the ZIP, extract it, place the folder at the path below, then restart your agent:
~/.claude/skills/whatsapp-voice-chat-integration-open-source/
---
name: whatsapp-voice-talk
description: Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.
---
Turn WhatsApp voice messages into real-time conversations. This skill provides a complete pipeline: voice → transcription → intent detection → response generation → text-to-speech.
Perfect for:
Install the Python dependencies:
pip install openai-whisper soundfile numpy
const fs = require('fs');
const { processVoiceNote } = require('./scripts/voice-processor');

// await is not allowed at the top level of a CommonJS module, so wrap the call
async function main() {
  // Read a voice message (OGG, WAV, MP3, etc.)
  const buffer = fs.readFileSync('voice-message.ogg');
  // Process it
  const result = await processVoiceNote(buffer);
  console.log(result);
  // {
  //   status: 'success',
  //   response: "Current weather in Delhi is 19°C, haze. Humidity is 56%.",
  //   transcript: "What's the weather today?",
  //   intent: 'weather',
  //   language: 'en',
  //   timestamp: 1769860205186
  // }
}

main().catch(console.error);
For automatic processing of incoming WhatsApp voice messages:
node scripts/voice-listener-daemon.js
The daemon polls ~/.clawdbot/media/inbound/ every 5 seconds and processes any new voice files.
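The polling behaviour described above can be sketched roughly as follows. This is not the code in voice-listener-daemon.js; `findNewVoiceFiles`, `startPolling`, and the extension list are hypothetical names and values chosen for illustration, and only Node's built-in `fs` and `path` modules are used.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return audio files in `dir` that are not yet in `seen`,
// and record them in `seen` so each file is reported only once.
function findNewVoiceFiles(dir, seen, extensions = ['.ogg', '.wav', '.mp3']) {
  if (!fs.existsSync(dir)) return [];
  const fresh = [];
  for (const name of fs.readdirSync(dir).sort()) {
    const isAudio = extensions.includes(path.extname(name).toLowerCase());
    if (isAudio && !seen.has(name)) {
      seen.add(name);
      fresh.push(path.join(dir, name));
    }
  }
  return fresh;
}

// Hypothetical polling loop: `onFile` is called with the path of each new file.
function startPolling(dir, onFile, intervalMs = 5000) {
  const seen = new Set();
  const timer = setInterval(() => {
    for (const file of findNewVoiceFiles(dir, seen)) onFile(file);
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop polling
}

// Example wiring (left commented out so nothing starts automatically):
// const inbound = path.join(require('os').homedir(), '.clawdbot', 'media', 'inbound');
// const stop = startPolling(inbound, (file) => console.log('new voice file:', file));
```

Tracking seen file names in memory keeps the sketch short, but it forgets everything on restart; a real daemon would likely persist that state or move processed files elsewhere.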
Incoming Voice Message
↓
Transcribe (Whisper)
↓
"What's the weather?"
↓
Detect Language & Intent
↓
Match against INTENTS
↓
Execute Handler
↓
Generate Response
↓
Convert to TTS
↓
Send back via WhatsApp
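The flow above can be read as a chain of small stages. The sketch below is one possible way to wire them together, not the skill's actual implementation: `runPipeline`, the stage names, and the stub stages are all hypothetical, and each stage is injected so that real transcription or TTS could be swapped in later.

```javascript
// Hypothetical pipeline runner: every stage is passed in, so each can be stubbed or replaced.
async function runPipeline(audioBuffer, stages) {
  const { transcribe, detectLanguage, detectIntent, handlers, synthesize } = stages;
  try {
    const transcript = await transcribe(audioBuffer);      // voice -> text
    const language = detectLanguage(transcript);           // e.g. 'en' or 'hi'
    const intent = detectIntent(transcript);               // e.g. 'weather'
    const handler = handlers[intent] || handlers.unknown;  // fall back when nothing matches
    const { response } = await handler(language, transcript);
    // TTS is optional in this sketch; audio stays null when no synthesizer is given.
    const audio = synthesize ? await synthesize(response, language) : null;
    return { status: 'success', transcript, language, intent, response, audio };
  } catch (err) {
    return { status: 'error', error: err.message };
  }
}

// Stub stages standing in for Whisper, intent matching, and handlers.
const stubStages = {
  transcribe: async () => "What's the weather?",
  detectLanguage: () => 'en',
  detectIntent: (text) => (text.toLowerCase().includes('weather') ? 'weather' : 'unknown'),
  handlers: {
    weather: async () => ({ response: 'Weather lookup goes here.' }),
    unknown: async () => ({ response: 'Sorry, I did not understand that.' }),
  },
};

runPipeline(Buffer.from('fake audio'), stubStages).then((result) => console.log(result));
```

Returning an error object instead of throwing mirrors the `status` field shown in the earlier result example, so callers can branch on `result.status` in one place.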
✅ Zero Setup Complexity - No FFmpeg, no complex dependencies. Uses soundfile + Whisper.
✅ Multi-Language - Automatic English/Hindi detection. Extend easily.
✅ Intent-Driven - Define custom intents with keywords and handlers.
✅ Real-Time Processing - 5-10 seconds per message (after first model load).
✅ Customizable - Add weather, status, commands, or anything else.
✅ Production Ready - Built from real usage in Clawdbot.
// User says: "What's the weather in Delhi?"
// Response: "Current weather in Delhi is 19°C..."
// (Built-in intent, just enable it)
// User says: "Turn on the lights"
// Handler: Sends signal to smart home API
// Response: "Lights turned on"
// User says: "Add milk to shopping list"
// Handler: Adds to database
// Response: "Added milk to your list"
// User says: "Is the system running?"
// Handler: Checks system status
// Response: "All systems online"
Edit voice-processor.js:
const INTENTS = {
  'shopping': {
    keywords: ['shopping', 'list', 'buy', 'खरीद'],
    handler: 'handleShopping'
  }
};

const handlers = {
  async handleShopping(language = 'en') {
    return {
      status: 'success',
      response: language === 'en'
        ? "What would you like to add to your shopping list?"
        : "आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?"
    };
  }
};
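The INTENTS table above pairs keywords with a handler name, but the matching step itself is not shown. A minimal sketch of keyword matching might look like this, assuming simple case-insensitive substring checks; `matchIntent`, `SAMPLE_INTENTS`, and the `weather` entry with its `handleWeather` name are hypothetical and may differ from what voice-processor.js does.

```javascript
// Hypothetical intent table in the same shape as INTENTS.
const SAMPLE_INTENTS = {
  shopping: { keywords: ['shopping', 'list', 'buy', 'खरीद'], handler: 'handleShopping' },
  weather: { keywords: ['weather', 'temperature', 'मौसम'], handler: 'handleWeather' },
};

// Return the intent with the most keywords present in the transcript, or null if none match.
function matchIntent(transcript, intents = SAMPLE_INTENTS) {
  const text = transcript.toLowerCase();
  let best = null;
  let bestHits = 0;
  for (const [name, { keywords, handler }] of Object.entries(intents)) {
    const hits = keywords.filter((kw) => text.includes(kw.toLowerCase())).length;
    if (hits > bestHits) {
      best = { name, handler };
      bestHits = hits;
    }
  }
  return best;
}

console.log(matchIntent('Add milk to my shopping list'));
```

The returned `handler` string can then be used to look up the function, for example `handlers[match.handler](language)`. Substring matching is deliberately naive: a keyword such as `list` also matches inside `listen`, so word-boundary checks may be worth adding for English keywords.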
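To support another language (Urdu in this example), extend detectLanguage() with a pattern for your language's Unicode range:
const urduChars = /[\u0600-\u06FF]/g; // Add this
Return a localized response from each handler:
return language === 'ur' ? 'Urdu response' : 'English response';
Set the language in transcribe.py:
result = model.transcribe(data, language="ur")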
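Script-based detection of the kind hinted at above can be sketched by counting characters per Unicode block. This is an illustrative assumption about how such a detector could work, not the body of the skill's detectLanguage(); `detectLanguageByScript` and `SCRIPT_PATTERNS` are hypothetical names.

```javascript
// Hypothetical script table: language code -> pattern for that script's Unicode block.
const SCRIPT_PATTERNS = {
  hi: /[\u0900-\u097F]/g, // Devanagari (Hindi)
  ur: /[\u0600-\u06FF]/g, // Arabic script (Urdu)
};

// Pick the language whose script has the most characters in the text;
// fall back to English when no listed script appears at all.
function detectLanguageByScript(text, fallback = 'en') {
  let best = fallback;
  let bestCount = 0;
  for (const [code, pattern] of Object.entries(SCRIPT_PATTERNS)) {
    const count = (text.match(pattern) || []).length;
    if (count > bestCount) {
      best = code;
      bestCount = count;
    }
  }
  return best;
}

console.log(detectLanguageByScript('आज मौसम कैसा है'));
```

A script check cannot tell apart languages that share a block: the Arabic range also covers Arabic and Persian, and Devanagari also covers Marathi and Nepali. If that distinction matters, the language reported by the transcription step is probably a better signal.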
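In transcribe.py, load one model; larger models are more accurate but slower:
model = whisper.load_model("tiny") # Fastest, 39M parameters (~75MB download)
model = whisper.load_model("base") # Default, 140MB
model = whisper.load_model("small") # Better accuracy, 466MB
model = whisper.load_model("medium") # Most accurate of these, 1.5GB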
Scripts:
transcribe.py - Whisper transcription (Python)
voice-processor.js - Core logic (intent parsing, handlers)
voice-listener-daemon.js - Auto-listener watching for new messages
References:
SETUP.md - Installation and configuration
API.md - Detailed function documentation
If running as a Clawdbot skill, hook into message events:
// In your Clawdbot handler
const { processVoiceNote } = require('skills/whatsapp-voice-talk/scripts/voice-processor');

message.on('voice', async (audioBuffer) => {
  const result = await processVoiceNote(audioBuffer, message.from);
  // Send response back as text
  await message.reply(result.response);
  // Or send as voice instead (requires TTS and your own sendVoiceMessage helper)
  // await sendVoiceMessage(result.response);
});
Supported formats: OGG (Opus), WAV, FLAC, MP3, CAF, AIFF, and more via libsndfile.
WhatsApp sends voice notes as Opus-encoded OGG by default, which works out of the box.
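When a file fails to decode, it can help to confirm what container it actually is before blaming the transcription step. The sketch below checks the leading magic bytes of a buffer for a few of the formats listed above; `sniffAudioFormat` is a hypothetical helper, not part of the skill, and it covers only a subset of what libsndfile reads.

```javascript
// Hypothetical helper: guess the container format from the first bytes of a buffer.
function sniffAudioFormat(buffer) {
  const ascii = (start, end) => buffer.subarray(start, end).toString('latin1');
  if (buffer.length < 12) return 'unknown';
  if (ascii(0, 4) === 'OggS') return 'ogg';                               // Ogg page header
  if (ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'wav';    // RIFF/WAVE
  if (ascii(0, 4) === 'fLaC') return 'flac';                              // FLAC stream marker
  if (ascii(0, 4) === 'FORM' && ascii(8, 12) === 'AIFF') return 'aiff';   // IFF/AIFF
  if (ascii(0, 3) === 'ID3') return 'mp3'; // only MP3 files that begin with an ID3v2 tag
  return 'unknown';
}

console.log(sniffAudioFormat(Buffer.from('OggS\0\0\0\0\0\0\0\0', 'latin1')));
```

An `ogg` result only identifies the container; it does not confirm that the stream inside is Opus rather than Vorbis.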
"No module named 'whisper'"
pip install openai-whisper
"No module named 'soundfile'"
pip install soundfile
Voice messages not processing?
Check clawdbot status (is it running?)
Check ~/.clawdbot/media/inbound/ (files arriving?)
Run node scripts/voice-listener-daemon.js (see logs)
Slow transcription?
Use a smaller model: whisper.load_model("base") or whisper.load_model("tiny")
references/SETUP.md for detailed installation and configuration
references/API.md for function signatures and examples
scripts/ for working code
MIT - Use freely, customize, contribute back!
---
Built for real-world use in Clawdbot. Battle-tested with multiple languages and use cases.
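Place the skill folder in ~/.claude/skills/whatsapp-voice-chat-integration-open-source/ (user level, available to all projects) or .claude/skills/whatsapp-voice-chat-integration-open-source/ (project level). After restarting the AI client, invoke it explicitly with /whatsapp-voice-chat-integration-open-source, or let the AI discover and use it automatically from context.
whatsappVoiceOpenSkill supports Claude, Cursor, and OpenClaw.
whatsappVoiceOpenSkill is free to install and use. See the repository for license information.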