Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...
数据来源:ClawHub。 在 ClawSkills 查看
选择你使用的 Agent
方法一:命令行安装(推荐)
推荐(无需提前安装 clawhub)
npx clawhub@latest --dir ~/.claude/skills install faster-whisper或使用 clawhub CLI(需提前安装)
clawhub --dir ~/.claude/skills install faster-whisper⚠️ 需要 Node.js 18+,没有 Node?请使用下方方法二直接下载 ZIP。 安装 Node.js →
方法二:手动下载安装(无需 Node)
下载 ZIP,解压后将文件夹放到以下路径,重启 Agent 即可:
安装路径
~/.claude/skills/faster-whisper/💡解压后将文件夹放到上方路径,重启 Agent 即可生效
--- name: faster-whisper description: "Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT/VTT/TTML/CSV subtitles, speaker diarization, URL/YouTube input, batch processing with ETA, transcript search, chapter detection, per-file language map." version: 1.5.1 author: ThePlasmak homepage: https://github.com/ThePlasmak/faster-whisper tags: [ "audio", "transcription", "whisper", "speech-to-text", "ml", "cuda", "gpu", "subtitles", "diarization", "podcast", "chapters", "search", "csv", "ttml", "batch", ] platforms: ["linux", "macos", "wsl2"] metadata: { "openclaw": { "emoji": "🗣️", "requires": { "bins": ["python3"], "optionalBins": ["ffmpeg", "yt-dlp"], "optionalPaths": ["~/.cache/huggingface/token"], }, }, } ---
Local speech-to-text using faster-whisper — a CTranslate2 reimplementation of OpenAI's Whisper that runs 4-6x faster with identical accuracy. With GPU acceleration, expect ~20x realtime transcription (a 10-minute audio file in ~30 seconds).
Use this skill when you need to:
--diarize)--rss fetches and transcribes episodes--translate--language-map assigns a different language per file--multilingual for mixed-language audio--initial-prompt for jargon-heavy content or any other terms to look out for--normalize and --denoise before transcription--stream shows segments as they're transcribed--clip-timestamps to transcribe specific sections--search "term" finds all timestamps where a word/phrase appears--detect-chapters finds section breaks from silence gaps--export-speakers DIR saves each speaker's turns as separate WAV files--format csv produces a properly-quoted CSV with timestampsTrigger phrases: "transcribe this audio", "convert speech to text", "what did they say", "make a transcript", "audio to text", "subtitle this video", "who's speaking", "translate this audio", "translate to English", "find where X is mentioned", "search transcript for", "when did they say", "at what timestamp", "add chapters", "detect chapters", "find breaks in the audio", "table of contents for this recording", "TTML subtitles", "DFXP subtitles", "broadcast format subtitles", "Netflix format", "ASS subtitles", "aegisub format", "advanced substation alpha", "mpv subtitles", "LRC subtitles", "timed lyrics", "karaoke subtitles", "music player lyrics", "HTML transcript", "confidence-colored transcript", "color-coded transcript", "separate audio per speaker", "export speaker audio", "split by speaker", "transcript as CSV", "spreadsheet output", "transcribe podcast", "podcast RSS feed", "different languages in batch", "per-file language", "transcribe in multiple formats", "srt and txt at the same time", "output both srt and text", "remove filler words", "clean up ums and uhs", "strip hesitation sounds", "remove you know and I mean", "transcribe left channel", "transcribe right channel", "stereo channel", "left track only", "wrap subtitle lines", "character limit per line", "max chars per subtitle", "detect paragraphs", "paragraph breaks", "group into paragraphs", "add paragraph spacing"
⚠️ Agent guidance — keep invocations minimal:
_CORE RULE: default command (./scripts/transcribe audio.mp3) is the fastest path — add flags only when the user explicitly asks for that capability._
Transcription:
--diarize if the user asks "who said what" / "identify speakers" / "label speakers"--format srt/vtt/ass/lrc/ttml if the user asks for subtitles/captions in that format--format csv if the user asks for CSV or spreadsheet output--word-timestamps if the user needs word-level timing--initial-prompt if there's domain-specific jargon to prime--translate if the user wants non-English audio translated to English--normalize/--denoise if the user mentions bad audio quality or noise--stream if the user wants live/progressive output for long files--clip-timestamps if the user wants a specific time range--temperature 0.0 if the model is hallucinating on music/silence--vad-threshold if VAD is aggressively cutting speech or including noise--min-speakers/--max-speakers when you know the speaker count--hf-token if the token is not cached at ~/.cache/huggingface/token--max-words-per-line for subtitle readability on long segments--filter-hallucinations if the transcript contains obvious artifacts (music markers, duplicates)--merge-sentences if the user asks for sentence-level subtitle cues--clean-filler if the user asks to remove filler words (um, uh, you know, I mean, hesitation sounds)--channel left|right if the user mentions stereo tracks, dual-channel recordings, or asks for a specific channel--max-chars-per-line N when the user specifies a character limit per subtitle line (e.g., "Netflix format", "42 chars per line"); takes priority over --max-words-per-line--detect-paragraphs if the user asks for paragraph breaks or structured text output; --paragraph-gap (default 3.0s) only if they want a custom gap--speaker-names "Alice,Bob" when the user provides real names to replace SPEAKER_1/2 — always requires --diarize--hotwords WORDS when the user names specific rare terms not well served by --initial-prompt; prefer --initial-prompt for general domain jargon--prefix TEXT when the user knows the exact words the audio starts with--detect-language-only when the user only wants to identify the language, not transcribe--stats-file PATH if the user asks for performance stats, RTF, or benchmark info--parallel N for large CPU batch jobs; GPU handles one file efficiently on its own — don't add for single files or small batches--retries N for unreliable inputs (URLs, network files) where transient failures are expected--burn-in OUTPUT only when user explicitly asks to embed/burn subtitles into the video; requires ffmpeg and a video file input--keep-temp when the user may re-process the same URL to avoid re-downloading--output-template when user specifies a custom naming pattern in batch mode--format srt,text): only when user explicitly wants multiple formats in one pass; always pair with -o --diarize adds ~20-30s on top of thatSearch:
...
安装 Faster Whisper 后,可以对 AI 说这些话来触发它
Help me get started with Faster Whisper
Explains what Faster Whisper does, walks through the setup, and runs a quick demo based on your current project
Use Faster Whisper to local speech-to-text using faster-whisper
Invokes Faster Whisper with the right parameters and returns the result directly in the conversation
What can I do with Faster Whisper in my design & creative workflow?
Lists the top use cases for Faster Whisper, with example commands for each scenario
将技能文件夹放到 ~/.claude/skills/faster-whisper/ 目录(个人级,所有项目可用),或 .claude/skills/faster-whisper/(项目级)。重启 AI 客户端后,用 /faster-whisper 主动调用,或让 AI 根据上下文自动发现并使用。
Faster Whisper 支持 Claude、Cursor、OpenClaw,可与这些 AI 平台无缝集成,扩展其能力。
Faster Whisper 可免费安装使用。请查阅仓库了解许可证信息。
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...
Faster Whisper 属于「Design & Creative」分类,该分类的技能帮助 AI 智能体在此领域执行专业任务。
Automate my design & creative tasks using Faster Whisper
Identifies repetitive steps in your workflow and sets up Faster Whisper to handle them automatically