
Minimax-Multimodal-Toolkit

minimax-multimodal

Use mmx to generate text, images, video, speech, and music via the MiniMax AI platform. Use when the user wants to create media content, chat with MiniMax mo...

Source: ClawHub. View on ClawSkills.

3.2k downloads

16 favorites

18 views

About Minimax-Multimodal-Toolkit

---
name: minimax-multimodal-toolkit
description: MiniMax multimodal model skill - use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
---

MiniMax Multi-Modal Toolkit

Generate voice, music, video, and image content via MiniMax APIs — the unified entry for MiniMax multimodal use cases (audio + music + video + image). Includes voice cloning & voice design for custom voices, image generation with character reference, and FFmpeg-based media tools for audio/video format conversion, concatenation, trimming, and extraction.

Output Directory

All generated files MUST be saved to minimax-output/ under the AGENT'S current working directory (NOT the skill directory). Every script call MUST include an explicit --output / -o argument pointing to this location. Never omit the output argument or rely on script defaults.

Rules:

- Before running any script, ensure minimax-output/ exists in the agent's working directory (create if needed: mkdir -p minimax-output)
- Always use absolute or relative paths from the agent's working directory: --output minimax-output/video.mp4
- Never cd into the skill directory to run scripts; run from the agent's working directory using the full script path
- Intermediate/temp files (segment audio, video segments, extracted frames) are automatically placed in minimax-output/tmp/. They can be cleaned up when no longer needed: rm -rf minimax-output/tmp
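
The rules above can be sketched as a small helper. This is a minimal sketch, not part of the toolkit: the function name prepare_output_dir and the SKILL_DIR variable in the usage comment are assumptions made for illustration.

```shell
# Hypothetical helper: create minimax-output/ under the current working
# directory (not the skill directory) and print its absolute path.
prepare_output_dir() {
  out_dir="$PWD/minimax-output"
  mkdir -p "$out_dir"
  printf '%s\n' "$out_dir"
}

# Usage, run from the agent's working directory.
# SKILL_DIR is an assumed variable holding the skill's install path:
#   out_dir=$(prepare_output_dir)
#   bash "$SKILL_DIR/scripts/tts/generate_voice.sh" tts "Hello world" -o "$out_dir/hello.mp3"
```

Passing the result to -o on every call keeps outputs in one place without ever changing into the skill directory.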

Prerequisites

brew install ffmpeg jq              # macOS (or apt install ffmpeg jq on Linux)
bash scripts/check_environment.sh

No Python or pip required — all scripts are pure bash using curl, ffmpeg, jq, and xxd.
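
As a quick manual check of those four tools, something like the following could be used. It is only a sketch and is not the content of scripts/check_environment.sh; the function name missing_tools is hypothetical.

```shell
# Hypothetical helper: print the name of every listed tool that is
# not found on PATH (prints nothing when all tools are available).
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the four tools the scripts rely on.
missing=$(missing_tools curl ffmpeg jq xxd)
if [ -n "$missing" ]; then
  echo "Missing tools: $(echo $missing)" >&2
fi
```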

API Host Configuration

MiniMax provides two service endpoints for different regions. Set MINIMAX_API_HOST before running any script:

| Region | Platform URL | API Host Value |
|--------|-------------|----------------|
| China Mainland | https://platform.minimaxi.com | https://api.minimaxi.com |
| Global | https://platform.minimax.io | https://api.minimax.io |

# China Mainland
export MINIMAX_API_HOST="https://api.minimaxi.com"

# or Global
export MINIMAX_API_HOST="https://api.minimax.io"

IMPORTANT — When API Host is missing: Before running any script, check if MINIMAX_API_HOST is set in the environment. If it is NOT configured:

1. Ask the user which service endpoint their MiniMax account uses:
   - China Mainland → https://api.minimaxi.com
   - Global → https://api.minimax.io
2. Instruct and help the user to set it via export MINIMAX_API_HOST="https://api.minimaxi.com" (or the global variant) in their terminal, or to add it to their shell profile (~/.zshrc / ~/.bashrc) for persistence
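
The pre-flight check described above might look like the sketch below. The function name check_api_host is hypothetical, and rejecting any value other than the two documented hosts is an assumption of this sketch, not a documented behavior of the scripts.

```shell
# Hypothetical helper: succeed only if MINIMAX_API_HOST is set to one
# of the two hosts listed in the table above.
check_api_host() {
  case "${MINIMAX_API_HOST:-}" in
    https://api.minimaxi.com|https://api.minimax.io)
      return 0 ;;
    "")
      echo "MINIMAX_API_HOST is not set; ask the user which region their account uses." >&2
      return 1 ;;
    *)
      # Assumption: any other value is treated as a misconfiguration.
      echo "Unexpected MINIMAX_API_HOST: $MINIMAX_API_HOST" >&2
      return 1 ;;
  esac
}

# Usage:
#   check_api_host || echo "Configure the API host before running any script."
```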

API Key Configuration

Set the MINIMAX_API_KEY environment variable before running any script:

export MINIMAX_API_KEY="your-api-key-here"

The key starts with sk-api- or sk-cp- and can be obtained from https://platform.minimaxi.com (China) or https://platform.minimax.io (Global).

IMPORTANT — When API Key is missing: Before running any script, check if MINIMAX_API_KEY is set in the environment. If it is NOT configured:

1. Ask the user to provide their MiniMax API key
2. Instruct and help the user to set it via export MINIMAX_API_KEY="sk-..." in their terminal, or to add it to their shell profile (~/.zshrc / ~/.bashrc) for persistence
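
A sketch of the corresponding key check follows. The function name check_api_key is hypothetical, and the prefix test only reflects the two prefixes mentioned above; it does not verify that the key is valid for the account.

```shell
# Hypothetical helper: succeed only if MINIMAX_API_KEY is set and
# starts with one of the documented prefixes (sk-api- or sk-cp-).
check_api_key() {
  case "${MINIMAX_API_KEY:-}" in
    sk-api-*|sk-cp-*)
      return 0 ;;
    "")
      echo "MINIMAX_API_KEY is not set; ask the user for their key." >&2
      return 1 ;;
    *)
      echo "MINIMAX_API_KEY does not start with sk-api- or sk-cp-." >&2
      return 1 ;;
  esac
}

# Usage:
#   check_api_key || echo "Configure the API key before running any script."
```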

Key Capabilities

| Capability | Description | Entry point |
|------------|-------------|-------------|
| TTS | Text-to-speech synthesis with multiple voices and emotions | scripts/tts/generate_voice.sh |
| Voice Cloning | Clone a voice from an audio sample (10s–5min) | scripts/tts/generate_voice.sh clone |
| Voice Design | Create a custom voice from a text description | scripts/tts/generate_voice.sh design |
| Music Generation | Generate songs with lyrics or instrumental tracks | scripts/music/generate_music.sh |
| Image Generation | Text-to-image, image-to-image with character reference | scripts/image/generate_image.sh |
| Video Generation | Text-to-video, image-to-video, subject reference, templates | scripts/video/generate_video.sh |
| Long Video | Multi-scene chained video with crossfade transitions | scripts/video/generate_long_video.sh |
| Media Tools | Audio/video format conversion, concatenation, trimming, extraction | scripts/media_tools.sh |

TTS (Text-to-Speech)

Entry point: scripts/tts/generate_voice.sh

IMPORTANT: Single voice vs Multi-segment — Choose the right approach

| User intent | Approach |
|-------------|----------|
| Single voice / no multi-character need | tts command: generate the entire text in one call |
| Multiple characters / narrator + dialogue | generate command with segments.json |

Default behavior: When the user simply asks to generate speech/voice and does NOT mention multiple voices or characters, use the tts command directly with a single appropriate voice. Do NOT split into segments or use the multi-segment pipeline — just pass the full text to tts in one call.

Only use multi-segment generate when:

- The user explicitly needs multiple voices/characters
- The text requires narrator + character dialogue separation
- The text exceeds 10,000 characters (API limit per request); in this case, split it into segments that all use the same voice
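
The decision rule above can be sketched as a small function. This is an illustration only: choose_tts_mode and the TTS_CHAR_LIMIT override are hypothetical names, and whether the text needs multiple voices is passed in by the caller because it cannot be detected mechanically.

```shell
# Hypothetical helper: print "tts" or "generate" for a given text.
# Usage: choose_tts_mode TEXT [multi]
#   Pass "multi" as the second argument when the user needs several
#   voices or narrator + dialogue separation.
choose_tts_mode() {
  text=$1
  multi=${2:-}
  # 10,000 characters is the per-request limit stated above;
  # TTS_CHAR_LIMIT is an assumed override, useful for testing.
  limit=${TTS_CHAR_LIMIT:-10000}
  # Note: in bash with a UTF-8 locale ${#text} counts characters;
  # other shells or locales may count bytes instead.
  if [ "$multi" = "multi" ] || [ "${#text}" -gt "$limit" ]; then
    echo generate
  else
    echo tts
  fi
}

# Usage:
#   choose_tts_mode "Hello world"          # single voice, short text
#   choose_tts_mode "Hello world" multi    # multiple voices requested
```

Defaulting to tts keeps simple requests to a single API call; the multi-segment pipeline is only selected when one of the listed conditions holds.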

Single-voice generation (DEFAULT)

bash scripts/tts/generate_voice.sh tts "Hello world" -o minimax-output/hello.mp3
bash scripts/tts/generate_voice.sh tts "你好世界" -v female-shaonv -o minimax-output/hello_cn.mp3

Multi-segment generation (multi-voice / audiobook / podcast)

Complete workflow — follow ALL steps in order:

1. Write segments.json: split the text into segments with voice assignments (see format and rules below)
2. Run the generate command: it reads segments.json, generates audio for EACH segment via the TTS API, then merges them into a single output file with crossfade

# Step 1: Write segments.json to minimax-output/
# (use the Write tool to create minimax-output/segments.json)

# Step 2: Generate audio from segments.json — this is the CRITICAL step
# It generates each segment individually and merges them into one file
bash scripts/tts/generate_voice.sh generate minimax-output/segments.json \
  -o minimax-output/output.mp3 --crossfade 200

Do NOT skip Step 2. Writing segments.json alone does nothing — you MUST run the generate command to actually produce audio.

Voice management

# List all available voices
bash scripts/tts/generate_voice.sh list-voices

# Voice cloning (from audio sample, 10s–5min)
bash scripts/tts/generate_voice.sh clone sample.mp3 --voice-id my-voice

# Voice design (from text description)
bash scripts/tts/generate_voice.sh design "A warm female narrator voice" --voice-id narrator

Audio processing

bash scripts/tts/generate_voice.sh merge part1.mp3 part2.mp3 -o minimax-output/combined.mp3
bash scripts/tts/generate_voice.sh convert input.wav -o minimax-output/output.mp3

TTS Models

| Model | Notes |
|-------|-------|
| speech-2.8-hd | Recommended, auto emotion matching |
| speech-2.8-turbo | Faster variant |
| speech-2.6-hd | Previous gen, manual emotion |
| speech-2.6-turbo | Previous gen, faster |

segments.json Format

Default crossfade between segments: 200ms (--crossfade 200).

...

Prompt Examples

After installing Minimax-Multimodal-Toolkit, you can say things like these to the AI to trigger it:

User: Help me get started with Minimax-Multimodal-Toolkit

A: Explains what Minimax-Multimodal-Toolkit does, walks through the setup, and runs a quick demo based on your current project

User: Use Minimax-Multimodal-Toolkit to use mmx to generate text, images, video, speech, and music via the ...

A: Invokes Minimax-Multimodal-Toolkit with the right parameters and returns the result directly in the conversation

User: What can I do with Minimax-Multimodal-Toolkit in my design & creative workflow?

A: Lists the top use cases for Minimax-Multimodal-Toolkit, with example commands for each scenario

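FAQ

How do I install Minimax-Multimodal-Toolkit?

Place the skill folder in ~/.claude/skills/minimax-multimodal/ (personal level, available to all projects) or in .claude/skills/minimax-multimodal/ (project level). After restarting the AI client, invoke it explicitly with /minimax-multimodal, or let the AI discover and use it automatically based on context.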
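
A minimal sketch of that manual install, assuming the skill folder has already been downloaded somewhere locally. The function name install_skill and its arguments are hypothetical; only the two destination paths come from the answer above.

```shell
# Hypothetical helper: copy a downloaded skill folder into a skills directory.
# Usage: install_skill SOURCE_DIR [personal|project]
install_skill() {
  src=$1
  scope=${2:-personal}
  case "$scope" in
    personal) dest="$HOME/.claude/skills/minimax-multimodal" ;;  # all projects
    project)  dest="$PWD/.claude/skills/minimax-multimodal" ;;   # current project only
    *) echo "scope must be 'personal' or 'project'" >&2; return 1 ;;
  esac
  mkdir -p "$dest"
  # Copy the folder's contents (including hidden files) into the destination.
  cp -R "$src/." "$dest/"
  printf '%s\n' "$dest"
}

# Usage (the source path is a placeholder):
#   install_skill /path/to/downloaded/skill-folder personal
```

Restart the AI client afterwards so the skill is picked up.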

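Which AI platforms does Minimax-Multimodal-Toolkit support?

Minimax-Multimodal-Toolkit supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.

Is Minimax-Multimodal-Toolkit free?

Minimax-Multimodal-Toolkit is free to install and use. Check the repository for license information.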

What does Minimax-Multimodal-Toolkit do?

Use mmx to generate text, images, video, speech, and music via the MiniMax AI platform. Use when the user wants to create media content, chat with MiniMax mo...

Which category does Minimax-Multimodal-Toolkit belong to?

Minimax-Multimodal-Toolkit belongs to the "Design & Creative" category; skills in this category help AI agents perform specialized tasks in this domain.

Use Cases

- Getting Started with Minimax-Multimodal-Toolkit
- Automate Design & Creative Workflows with Minimax-Multimodal-Toolkit
- Team Collaboration with Minimax-Multimodal-Toolkit