Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...
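Source: ClawHub.
Method 1: Command-line install (recommended)
Recommended (no need to install clawhub beforehand):
npx clawhub@latest --dir ~/.claude/skills install mm-voice-maker
Or use the clawhub CLI (must be installed beforehand):
clawhub --dir ~/.claude/skills install mm-voice-maker
⚠️ Requires Node.js 18+. If you do not have Node, use Method 2 below to download the ZIP directly.
Method 2: Manual download and install (no Node required)
Download the ZIP, unzip it, place the folder at the path below, and restart the Agent for it to take effect:
Install path:
~/.claude/skills/mm-voice-maker/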
---
name: mm-voice-maker
description: Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creating custom voices, or processing/merging audio.
---
Professional text-to-speech skill with emotion detection, voice cloning, and audio processing capabilities powered by MiniMax Voice API and FFmpeg.
| Area | Features |
|------|----------|
| TTS | Sync (HTTP/WebSocket), async (long text), streaming |
| Segment-based | Multi-voice, multi-emotion synthesis from segments.json, auto merge |
| Voice | Cloning (10s–5min), design (text prompt), management |
| Audio | Format conversion, merge, normalize, trim, remove silence (FFmpeg) |
mmVoice_Maker/
├── SKILL.md # This overview
├── mmvoice.py # CLI tool (recommended for Agents)
├── check_environment.py # Environment verification
├── requirements.txt
├── scripts/ # Entry: scripts/__init__.py
│ ├── utils.py # Config, data classes
│ ├── sync_tts.py # HTTP/WebSocket TTS
│ ├── async_tts.py # Long text TTS
│ ├── segment_tts.py # Segment-based TTS (multi-voice, multi-emotion)
│ ├── voice_clone.py # Voice cloning
│ ├── voice_design.py # Voice design
│ ├── voice_management.py # List/delete voices
│ └── audio_processing.py # FFmpeg audio tools
└── reference/ # Load as needed
    ├── cli-guide.md # CLI usage guide
    ├── getting-started.md # Setup and quick test
    ├── tts-guide.md # Sync/async TTS workflows
    ├── voice-guide.md # Clone/design/manage
    ├── audio-guide.md # Audio processing
    ├── script-examples.md # Runnable code snippets
    ├── troubleshooting.md # Common issues
    ├── api_documentation.md # Complete API reference
    └── voice_catalog.md # Voice selection guide
6-step workflow:
[step1]. Verify environment
[step2-preparation]. ⚠️ NOTE: Before processing the text, you must read voice_catalog.md for voice selection.
[step2]. Process text into script → . Note: [step2.4] is especially important; check it twice before sending the script to the user.
[step2.5]. ⚠️ Generate preview for user confirmation (highly recommended for multi-voice content)
[step3]. Present plan to user for confirmation
[step4]. Validate segments.json
[step5]. Generate and merge audio → intermediate files in , final output in
[step6]. ⚠️ CRITICAL: User confirms audio quality FIRST → THEN cleanup temp files (only after user is satisfied)
> is Claude's current working directory (not the skill directory). Audio files are saved relative to where Claude is running commands.
python check_environment.py
Checks:
If the API key is not set, ask the user for it and set it:
export MINIMAX_VOICE_API_KEY="your-api-key-here"
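The kind of check this step performs can be sketched in a few lines of Python. This is only an illustration, not the contents of check_environment.py: the function name `check_voice_environment` and its injectable `env` and `which` parameters are assumptions made here so the logic can be exercised without touching the real environment. Only the variable name `MINIMAX_VOICE_API_KEY` and the FFmpeg dependency come from the text above.

```python
import os
import shutil


def check_voice_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper, not part of the skill's own scripts.
    """
    env = os.environ if env is None else env
    problems = []
    # MINIMAX_VOICE_API_KEY is the variable named in the setup step above.
    if not env.get("MINIMAX_VOICE_API_KEY", "").strip():
        problems.append("MINIMAX_VOICE_API_KEY is not set")
    # FFmpeg is required for the audio post-processing features.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_voice_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

Returning a list of problems instead of raising on the first one lets the Agent report everything that is missing in a single message to the user.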
⚠️ MOST IMPORTANT PRINCIPLE: Gender Matching First
Before selecting voices, you MUST always match gender first. This is non-negotiable.
Golden Rule:
> If a character is male → use male voice
> If a character is female → use female voice
> If a character is neutral/other → choose appropriate neutral voice
Why this matters:
Examples:
| Character | Wrong Voice | Correct Voice |
|-----------|-------------|---------------|
| 唐三藏 (male monk) | female-yujie ❌ | Chinese (Mandarin)_Gentleman ✅ |
| 林黛玉 (female) | male-qn-badao ❌ | female-shaonv ✅ |
| 曹操 (male warlord) | female-chengshu ❌ | Chinese (Mandarin)_Unrestrained_Young_Man ✅ |
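One way to enforce the gender-matching rule mechanically is a small pre-flight check over the planned segments. The sketch below is an assumption-laden illustration: `VOICE_GENDER` and `gender_mismatches` are hypothetical names, and the gender assigned to each voice is read off the example table above rather than taken from the voice catalog, which remains the authoritative source.

```python
# Genders inferred from the example table; in practice they would be
# looked up in the voice catalog rather than hard-coded (assumption).
VOICE_GENDER = {
    "female-yujie": "female",
    "female-shaonv": "female",
    "female-chengshu": "female",
    "male-qn-badao": "male",
    "Chinese (Mandarin)_Gentleman": "male",
    "Chinese (Mandarin)_Unrestrained_Young_Man": "male",
}


def gender_mismatches(segments, role_gender):
    """Return (role, voice_id) pairs whose voice gender differs from the role's."""
    mismatches = []
    for seg in segments:
        expected = role_gender.get(seg["role"])
        actual = VOICE_GENDER.get(seg["voice_id"])
        # Unknown roles or voices are skipped rather than guessed.
        if expected and actual and expected != actual:
            mismatches.append((seg["role"], seg["voice_id"]))
    return mismatches
```

Running such a check before presenting the plan means a mismatch is caught before any audio is generated.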
Decision guide: Evaluate based on:
emotion empty)
Use case scenarios:
| Scenario | Description | Segments | Voice Selection |
|----------|-------------|----------|-----------------|
| Single Voice | User needs one voice for the entire content. Segment only by length (≤1,000,000 chars per segment). | Split by length only | One voice_id for all segments |
| Multi-Voice | Multiple characters/speakers, each with different voice. Segment by speaker/role changes. | Split by logical unit (speaker, dialogue, etc.) | Different voice_id per role |
| Podcast/Interview | Host and guest speakers with distinct voices. | Split by speaker | Voice per host/guest |
| Audiobook/Fiction | Narrator and character voices. | Split by narration vs. dialogue | Voice per narrator/character |
| Documentary | Mostly narration with occasional quotes. | Keep as one segment | Single narrator voice |
| Report/Announcement | Formal content with consistent tone. | Keep as one segment | Professional voice |
Processing Workflow (4 sub-steps):
Step 2.1: Text Segmentation and Role Analysis
First, segment your text into logical units and identify the role/character for each segment.
Key principle (Important!): Split by logical unit, NOT simply by sentence
When to split (Important!):
When NOT to split (Important!):
Decision depends on use case:
| Use case | Example | Split strategy |
|----------|---------|----------------|
| Single Voice | Long article, news piece, announcement | Split by length (≤1,000,000 chars), same voice for all |
| Podcast/Interview | "Host: Welcome to the show. Guest: Thank you for having me." | Split by speaker |
| Documentary narration | "The scientist explained, 'The results are promising.'" | Keep as one segment (narrator voice) |
| Audiobook/Fiction | "'Who's there?' she whispered." | Split: "'Who's there?'" should be in character voice, while "she whispered." should be in narrator's voice |
| Report | "According to the report, the economy is growing." | Keep as one segment |
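For the Podcast/Interview row, splitting by speaker could look roughly like the following. This is a minimal sketch that assumes the transcript marks each turn with a `Name:` label, as in the example above. The function `split_by_speaker` is hypothetical and is not part of the skill's scripts; the segment fields mirror the segments.json entries used in this guide.

```python
import re


def split_by_speaker(transcript, voice_for_role):
    """Split a labelled transcript into one segment per speaker turn.

    voice_for_role maps a label as it appears in the text (e.g. "Host")
    to the voice_id chosen for that speaker.
    """
    labels = "|".join(re.escape(role) for role in voice_for_role)
    pattern = re.compile(rf"\b({labels}):\s*")
    # With one capture group, re.split yields
    # [prefix, label, text, label, text, ...].
    parts = pattern.split(transcript)
    segments = []
    for role, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            segments.append({
                "text": text,
                "role": role,
                "voice_id": voice_for_role[role],
                "emotion": "",
            })
    return segments
```

Free-form dialogue without labels, such as the Audiobook/Fiction row, does not fit this pattern and still needs the role analysis described in this step.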
Example 1: Single Voice (speech-2.8)
For single-voice content (e.g., news, announcements, articles), segment only by length while maintaining the same voice:
[
{"text": "First part of the article (under 1,000,000 chars)...", "role": "narrator", "voice_id": "female-shaonv", "emotion": ""},
{"text": "Second part of the article (under 1,000,000 chars)...", "role": "narrator", "voice_id": "female-shaonv", "emotion": ""},
{"text": "Third part of the article (under 1,000,000 chars)...", "role": "narrator", "voice_id": "female-shaonv", "emotion": ""}
]
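A segment list like the one above could be produced by a length-based splitter along these lines. This is a sketch under stated assumptions: `split_by_length` is a hypothetical helper, the default limit of 1,000,000 characters is taken from the table above, and cutting at the last sentence terminator inside each window is an illustrative choice, not something the skill documents.

```python
import json


def split_by_length(text, voice_id, max_chars=1_000_000, role="narrator"):
    """Cut text into chunks of at most max_chars, preferring sentence ends."""
    segments = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last sentence terminator inside the window, if any.
            cut = max(text.rfind(ch, start, end) for ch in ".!?。!?")
            if cut > start:
                end = cut + 1
        chunk = text[start:end].strip()
        if chunk:
            segments.append({
                "text": chunk,
                "role": role,
                "voice_id": voice_id,
                "emotion": "",
            })
        start = end
    return segments


if __name__ == "__main__":
    demo = split_by_length("One. Two. Three.", "female-shaonv", max_chars=10)
    print(json.dumps(demo, ensure_ascii=False, indent=2))
```

Every segment carries the same `voice_id` and an empty `emotion`, matching the single-voice example.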
...
After installing mmVoiceMaker, you can say the following to the AI to trigger it:
Help me get started with mmVoiceMaker
Explains what mmVoiceMaker does, walks through the setup, and runs a quick demo based on your current project
Use mmVoiceMaker to voice synthesis, voice cloning, voice design, and audio post-proces...
Invokes mmVoiceMaker with the right parameters and returns the result directly in the conversation
What can I do with mmVoiceMaker in my design & creative workflow?
Lists the top use cases for mmVoiceMaker, with example commands for each scenario
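Place the skill folder in ~/.claude/skills/mm-voice-maker/ (personal level, available to all projects) or in .claude/skills/mm-voice-maker/ (project level). After restarting the AI client, invoke it explicitly with /mm-voice-maker, or let the AI discover and use it automatically based on context.
mmVoiceMaker supports Claude, Cursor, and OpenClaw, and integrates seamlessly with these AI platforms to extend their capabilities.
mmVoiceMaker is free to install and use. See the repository for license information.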
mmVoiceMaker belongs to the "Design & Creative" category; skills in this category help AI agents perform specialized tasks in this domain.
Automate my design & creative tasks using mmVoiceMaker
Identifies repetitive steps in your workflow and sets up mmVoiceMaker to handle them automatically