mmVoiceMaker

mm-voice-maker

Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...

Data source: ClawHub. View on ClawSkills.

About mmVoiceMaker

---
name: mm-voice-maker
description: Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creating custom voices, or processing/merging audio.
---

MiniMax Voice Maker

Professional text-to-speech skill with emotion detection, voice cloning, and audio processing capabilities powered by MiniMax Voice API and FFmpeg.

Capabilities

| Area | Features |
|------|----------|
| TTS | Sync (HTTP/WebSocket), async (long text), streaming |
| Segment-based | Multi-voice, multi-emotion synthesis from segments.json, auto merge |
| Voice | Cloning (10s–5min), design (text prompt), management |
| Audio | Format conversion, merge, normalize, trim, remove silence (FFmpeg) |

File structure:

mmVoice_Maker/
├── SKILL.md                       # This overview
├── mmvoice.py                     # CLI tool (recommended for Agents)
├── check_environment.py           # Environment verification
├── requirements.txt
├── scripts/                       # Entry: scripts/__init__.py
│   ├── utils.py                   # Config, data classes
│   ├── sync_tts.py                # HTTP/WebSocket TTS
│   ├── async_tts.py               # Long text TTS
│   ├── segment_tts.py             # Segment-based TTS (multi-voice, multi-emotion)
│   ├── voice_clone.py             # Voice cloning
│   ├── voice_design.py            # Voice design
│   ├── voice_management.py        # List/delete voices
│   └── audio_processing.py        # FFmpeg audio tools
└── reference/                     # Load as needed
    ├── cli-guide.md               # CLI usage guide
    ├── getting-started.md         # Setup and quick test
    ├── tts-guide.md               # Sync/async TTS workflows
    ├── voice-guide.md             # Clone/design/manage
    ├── audio-guide.md             # Audio processing
    ├── script-examples.md         # Runnable code snippets
    ├── troubleshooting.md         # Common issues
    ├── api_documentation.md       # Complete API reference
    └── voice_catalog.md           # Voice selection guide

Main Workflow Guideline (Text to Speech)

6-step workflow:

[step1]. Verify environment

[step2-preparation]. ⚠️ NOTE: Before processing the text, you must read reference/voice_catalog.md to select voices.

[step2]. Process the text into a script → /audio/segments.json. Note: [step2.4] is critical; check it twice before sending the script to the user.

[step2.5]. ⚠️ Generate a preview for user confirmation (highly recommended for multi-voice content)

[step3]. Present the plan to the user for confirmation

[step4]. Validate segments.json

[step5]. Generate and merge audio → intermediate files in /audio/tmp/, final output in /audio/output.mp3

[step6]. ⚠️ CRITICAL: The user confirms audio quality FIRST → THEN clean up temp files (only after the user is satisfied)

> Paths such as /audio/segments.json are rooted at Claude's current working directory (not the skill directory). Audio files are saved relative to where Claude runs commands.
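The file layout and the step 6 rule (clean up only after the user approves the audio) can be sketched roughly as follows. This is an illustration, not the skill's own code: `audio_paths` and `cleanup_tmp` are hypothetical helper names, and the sketch assumes the `audio/` folder sits directly under the current working directory.

```python
from pathlib import Path
import shutil


def audio_paths(cwd: Path) -> dict:
    """Return the three locations named in the workflow, rooted at cwd."""
    audio = cwd / "audio"
    return {
        "segments": audio / "segments.json",  # step 2 output
        "tmp": audio / "tmp",                 # step 5 intermediate files
        "output": audio / "output.mp3",       # step 5 final output
    }


def cleanup_tmp(cwd: Path, user_confirmed: bool) -> bool:
    """Delete audio/tmp only once the user has approved the final audio.

    Returns True if the folder was removed, False otherwise.
    """
    tmp = audio_paths(cwd)["tmp"]
    if not user_confirmed or not tmp.exists():
        return False
    shutil.rmtree(tmp)
    return True
```

Calling `cleanup_tmp(Path.cwd(), user_confirmed=False)` leaves the temp files in place, so a rejected result can still be regenerated from the intermediate segments.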

Step 1: Verify environment

python check_environment.py

Checks:

- Python 3.8+
- Required packages (requests, websockets)
- FFmpeg installation
- MINIMAX_VOICE_API_KEY environment variable

If the API key is not set, ask the user for it and set it:

export MINIMAX_VOICE_API_KEY="your-api-key-here"
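For readers who want to see what such a check might involve, here is a minimal sketch covering the four items listed above. It is not the contents of check_environment.py, which are not shown here; the function name and the result keys are assumptions made for illustration.

```python
import importlib.util
import os
import shutil
import sys


def check_environment(packages=("requests", "websockets")) -> dict:
    """Return a pass/fail flag for each item on the checklist."""
    results = {
        "python>=3.8": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "MINIMAX_VOICE_API_KEY": bool(os.environ.get("MINIMAX_VOICE_API_KEY")),
    }
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        results[f"package:{name}"] = importlib.util.find_spec(name) is not None
    return results
```

Printing the items whose flag is False gives a short list of what to fix before moving on to step 2.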

Step 2: Decision and Pre-processing

⚠️ MOST IMPORTANT PRINCIPLE: Gender Matching First

Before selecting voices, you MUST always match gender first. This is non-negotiable.

Golden Rule:

> If a character is male → use male voice
> If a character is female → use female voice
> If a character is neutral/other → choose appropriate neutral voice

Why this matters:

- Violating gender matching (e.g., male character with female voice) breaks immersion
- Even if personality traits match, gender comes first
- This is especially critical for classic literature, historical content, and professional narration

Examples:

| Character | Wrong Voice | Correct Voice |
|-----------|-------------|---------------|
| 唐三藏 (male monk) | female-yujie ❌ | Chinese (Mandarin)_Gentleman ✅ |
| 林黛玉 (female) | male-qn-badao ❌ | female-shaonv ✅ |
| 曹操 (male warlord) | female-chengshu ❌ | Chinese (Mandarin)_Unrestrained_Young_Man ✅ |
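The gender-first rule can be expressed as a small pre-flight check. The sketch below is an assumption-laden illustration: the `VOICE_GENDER` table is hand-built from the six voices in the examples above (a real lookup would come from the voice catalog), `gender_matches` is a hypothetical helper, and neutral voices are not covered.

```python
# Genders inferred from the examples table; not an official catalog.
VOICE_GENDER = {
    "female-yujie": "female",
    "female-shaonv": "female",
    "female-chengshu": "female",
    "male-qn-badao": "male",
    "Chinese (Mandarin)_Gentleman": "male",
    "Chinese (Mandarin)_Unrestrained_Young_Man": "male",
}


def gender_matches(character_gender: str, voice_id: str) -> bool:
    """Return True when the voice's gender equals the character's gender."""
    voice_gender = VOICE_GENDER.get(voice_id)
    if voice_gender is None:
        # Unknown voices are rejected rather than silently accepted.
        raise KeyError(f"unknown voice_id: {voice_id}")
    return voice_gender == character_gender
```

Running this check over every segment before presenting the plan catches pairings such as 唐三藏 with female-yujie early, before any audio is generated.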

Decision guide: evaluate the following:

- Did the user specify a model? → Use that model; otherwise use the default, "speech-2.8".
- Is multi-voice needed? → Use a different voice_id per speaker/character.
- For speech-2.8: emotion is auto-matched (leave emotion empty).
- For older models: specify emotion tags manually.
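The model and emotion decisions above reduce to two small rules, sketched here. The helper names `choose_model` and `resolve_emotion` are hypothetical, and "older-model" in the usage note is a placeholder string rather than a real model identifier.

```python
DEFAULT_MODEL = "speech-2.8"


def choose_model(user_model=None) -> str:
    """Use the model the user asked for, otherwise fall back to the default."""
    return user_model or DEFAULT_MODEL


def resolve_emotion(model: str, emotion: str = "") -> str:
    """Leave emotion empty for speech-2.8; pass the manual tag through otherwise."""
    if model == DEFAULT_MODEL:
        return ""  # emotion is auto-matched by the model
    return emotion
```

For example, `resolve_emotion(choose_model(), "happy")` yields an empty string because the default model matches emotion on its own, while `resolve_emotion("older-model", "happy")` keeps the manual tag.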

Use case scenarios:

| Scenario | Description | Segments | Voice Selection |
|----------|-------------|----------|-----------------|
| Single Voice | User needs one voice for the entire content. Segment only by length (≤1,000,000 chars per segment). | Split by length only | One voice_id for all segments |
| Multi-Voice | Multiple characters/speakers, each with different voice. Segment by speaker/role changes. | Split by logical unit (speaker, dialogue, etc.) | Different voice_id per role |
| Podcast/Interview | Host and guest speakers with distinct voices. | Split by speaker | Voice per host/guest |
| Audiobook/Fiction | Narrator and character voices. | Split by narration vs. dialogue | Voice per narrator/character |
| Documentary | Mostly narration with occasional quotes. | Keep as one segment | Single narrator voice |
| Report/Announcement | Formal content with consistent tone. | Keep as one segment | Professional voice |

Processing Workflow (4 sub-steps):

Step 2.1: Text Segmentation and Role Analysis

First, segment your text into logical units and identify the role/character for each segment.

Key principle (Important!): Split by logical unit, NOT simply by sentence

When to split (Important!):

- Different speakers are clearly marked
- Narrator vs. character dialogue (in fiction, audiobooks, interviews, etc.)
- In scenarios where the speaker's identity matters (audiobooks, multi-voice fiction, etc.), split when narration and dialogue are mixed in the same sentence.
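Splitting a sentence that mixes narration and dialogue might look like the sketch below. It assumes dialogue is enclosed in straight double quotes and that quotes are balanced; real prose with curly or single quotes would need a more careful pattern. `split_narration_and_dialogue` is a hypothetical helper, not one of the skill's scripts.

```python
import re

# The capturing group keeps the quoted spans in the split result.
QUOTE = re.compile(r'("[^"]*")')


def split_narration_and_dialogue(sentence: str) -> list:
    """Split a sentence into character (quoted) and narrator (unquoted) parts."""
    parts = []
    for piece in QUOTE.split(sentence):
        piece = piece.strip()
        if not piece:
            continue
        role = "character" if piece.startswith('"') else "narrator"
        parts.append({"role": role, "text": piece})
    return parts
```

Applied to `"Who's there?" she whispered.`, this produces a character part for the quoted question followed by a narrator part for `she whispered.`, each of which can then be assigned its own voice_id.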

When NOT to split (Important!):

- Third-person narration like "John said..." or "The reporter noted..."
- Quoted speech within narration (in documentaries, podcasts, reports, etc.) should stay in the narrator's voice
- Keep text in the narrator's voice unless specific characterization is needed

Decision depends on use case:

| Use case | Example | Split strategy |
|----------|---------|----------------|
| Single Voice | Long article, news piece, announcement | Split by length (≤1,000,000 chars), same voice for all |
| Podcast/Interview | "Host: Welcome to the show. Guest: Thank you for having me." | Split by speaker |
| Documentary narration | "The scientist explained, 'The results are promising.'" | Keep as one segment (narrator voice) |
| Audiobook/Fiction | "'Who's there?' she whispered." | Split: "'Who's there?'" goes in the character's voice, while "she whispered." goes in the narrator's voice |
| Report | "According to the report, the economy is growing." | Keep as one segment |
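The Podcast/Interview row can be illustrated with a speaker-label split. This sketch assumes each turn starts with a known label followed by a colon, as in the table's example; the `split_by_speaker` helper and the host/guest voice assignment are illustrative choices, not recommendations from the voice catalog.

```python
import re


def split_by_speaker(transcript: str, voices: dict) -> list:
    """Turn 'Label: text Label: text' into one segment per speaker turn.

    `voices` maps each speaker label to the voice_id to use for it.
    """
    labels = "|".join(re.escape(name) for name in voices)
    # The capturing group keeps the labels in the split result.
    parts = re.split(rf"\b({labels}):\s*", transcript)
    segments = []
    # parts[0] is any text before the first label; then label/text pairs.
    for label, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            segments.append(
                {"text": text, "role": label, "voice_id": voices[label], "emotion": ""}
            )
    return segments
```

With `voices = {"Host": "female-shaonv", "Guest": "male-qn-badao"}`, the table's example line becomes two segments, one per speaker, in the same shape used by segments.json.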

Example 1: Single Voice (speech-2.8)

For single-voice content (e.g., news, announcements, articles), segment only by length while keeping the same voice:

[
  {"text": "First part of the article (under 1,000,000 chars)...", "role": "narrator", "voice_id": "female-shaonv", "emotion": ""},
  {"text": "Second part of the article (under 1,000,000 chars)...", "role": "narrator", "voice_id": "female-shaonv", "emotion": ""},
  {"text": "Third part of the article (under 1,000,000 chars)...", "role": "narrator", "voice_id": "female-shaonv", "emotion": ""}
]
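A list like the one above could be produced and sanity-checked with a sketch along these lines. It is a simplification under stated assumptions: the cut is a plain character slice that may land mid-sentence (a real splitter would back up to a sentence boundary), and `single_voice_segments` and `validate_segments` are hypothetical helpers rather than the skill's own validation in step 4.

```python
MAX_CHARS = 1_000_000
REQUIRED_KEYS = ("text", "role", "voice_id", "emotion")


def single_voice_segments(text: str, voice_id: str, max_chars: int = MAX_CHARS) -> list:
    """Cut text into chunks of at most max_chars, all using one voice."""
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return [
        {"text": chunk, "role": "narrator", "voice_id": voice_id, "emotion": ""}
        for chunk in chunks
    ]


def validate_segments(segments: list, max_chars: int = MAX_CHARS) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for index, segment in enumerate(segments):
        missing = [key for key in REQUIRED_KEYS if key not in segment]
        if missing:
            errors.append(f"segment {index}: missing keys {missing}")
        text = segment.get("text", "")
        if not text.strip():
            errors.append(f"segment {index}: empty text")
        if len(text) > max_chars:
            errors.append(f"segment {index}: text longer than {max_chars} chars")
    return errors
```

The result of `single_voice_segments` can be written out with `json.dump(..., ensure_ascii=False)` so that Chinese text stays readable in segments.json.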

...

Prompt examples

After installing mmVoiceMaker, you can say things like these to the AI to trigger it:

User: Help me get started with mmVoiceMaker

AI: Explains what mmVoiceMaker does, walks through the setup, and runs a quick demo based on your current project

User: Use mmVoiceMaker to voice synthesis, voice cloning, voice design, and audio post-proces...

AI: Invokes mmVoiceMaker with the right parameters and returns the result directly in the conversation

User: What can I do with mmVoiceMaker in my design & creative workflow?

AI: Lists the top use cases for mmVoiceMaker, with example commands for each scenario

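FAQ

How do I install mmVoiceMaker?

Place the skill folder in ~/.claude/skills/mm-voice-maker/ (personal level, available to all projects) or in .claude/skills/mm-voice-maker/ (project level). After restarting the AI client, invoke it explicitly with /mm-voice-maker, or let the AI discover and use it automatically based on context.

Which AI platforms does mmVoiceMaker support?

mmVoiceMaker supports Claude, Cursor, and OpenClaw, and integrates seamlessly with these AI platforms to extend their capabilities.

Is mmVoiceMaker free?

mmVoiceMaker is free to install and use. See the repository for license information.

What does mmVoiceMaker do?

Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...

Which category does mmVoiceMaker belong to?

mmVoiceMaker belongs to the "Design & Creative" category; skills in this category help AI agents carry out specialized tasks in this domain.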

Use cases

- Getting Started with mmVoiceMaker
- Automate Design & Creative Workflows with mmVoiceMaker
- Team Collaboration with mmVoiceMaker
