Video Understanding

video-understanding

Analyze and summarize videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, and answers to questions.

Source: ClawHub. View on ClawSkills

927 downloads

4 favorites

6 views

About Video Understanding

---
name: video-understanding
description: >
  Analyze videos with Google Gemini multimodal AI. Download from any URL
  (Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, 1000+ sites) and get
  transcripts, descriptions, and answers to questions. Use when asked to
  watch, analyze, summarize, or transcribe a video, or answer questions about
  video content. Triggers on video URLs or requests involving video
  understanding.
compatibility: "Requires yt-dlp, ffmpeg, and GEMINI_API_KEY environment variable. Python 3.10+ with uv."
metadata:
  openclaw:
    emoji: "🎬"
    requires:
      bins: ["yt-dlp", "ffmpeg"]
      env: ["GEMINI_API_KEY"]
    primaryEnv: "GEMINI_API_KEY"
    install:
      - id: "yt-dlp-brew"
        kind: "brew"
        formula: "yt-dlp"
        bins: ["yt-dlp"]
        label: "Install yt-dlp (brew)"
      - id: "ffmpeg-brew"
        kind: "brew"
        formula: "ffmpeg"
        bins: ["ffmpeg"]
        label: "Install ffmpeg (brew)"
---

Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

yt-dlp — brew install yt-dlp / pip install yt-dlp
ffmpeg — brew install ffmpeg (for merging video+audio streams)
GEMINI_API_KEY environment variable
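
The requirements above can be checked before running the script. The following is a minimal sketch, not part of the skill itself; the function name `missing_requirements` and its injectable `which` and `environ` parameters are hypothetical and exist only to make the check easy to test.

```python
import os
import shutil

# Names taken from the Requirements list above.
REQUIRED_BINS = ("yt-dlp", "ffmpeg")
REQUIRED_ENV = ("GEMINI_API_KEY",)


def missing_requirements(which=shutil.which, environ=None):
    """Return the names of required binaries and variables that are missing.

    `which` and `environ` are injectable (an assumption of this sketch)
    so the check can be exercised without touching the real system.
    """
    env = os.environ if environ is None else environ
    # A binary counts as missing when it cannot be found on PATH.
    missing = [name for name in REQUIRED_BINS if which(name) is None]
    # An environment variable counts as missing when unset or empty.
    missing += [f"${var}" for var in REQUIRED_ENV if not env.get(var)]
    return missing
```

Calling `missing_requirements()` with no arguments inspects the current `PATH` and environment; an empty list means everything listed above was found.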

Default Output

Returns structured JSON:

transcript — Verbatim transcript with [MM:SS] timestamps
description — Visual description (people, setting, UI, text on screen, flow)
summary — 2-3 sentence summary
duration_seconds — Estimated duration
speakers — Identified speakers
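
As an illustration of consuming this output, the sketch below checks that the five documented fields are present and converts the `[MM:SS]` markers in the transcript to second offsets. It assumes `transcript` is a single string; the helper names `load_result` and `timestamp_offsets` are hypothetical and do not come from the skill.

```python
import json
import re

# Field names as listed under "Default Output".
EXPECTED_FIELDS = {"transcript", "description", "summary", "duration_seconds", "speakers"}

# Matches markers such as [00:00] or [12:34].
TIMESTAMP = re.compile(r"\[(\d{1,3}):(\d{2})\]")


def load_result(text):
    """Parse JSON output and verify that the documented fields exist."""
    result = json.loads(text)
    absent = EXPECTED_FIELDS - set(result)
    if absent:
        raise KeyError(f"missing fields: {sorted(absent)}")
    return result


def timestamp_offsets(transcript):
    """Return every [MM:SS] marker in the transcript as an offset in seconds."""
    return [int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(transcript)]
```

If a question was passed with `-q`, the parsed result would also carry an `answer` field, which `load_result` leaves untouched.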

Usage

Analyze a video (structured JSON output)

uv run {baseDir}/scripts/analyze_video.py "<video-url>"

Ask a question (adds "answer" field)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"

Override prompt entirely

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw

Download only (no analysis)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4

Options

| Flag | Description | Default |
|------|-------------|---------|
| -q / --question | Question to answer (added to default fields) | none |
| -p / --prompt | Override entire prompt (ignores -q) | structured JSON |
| -m / --model | Gemini model | gemini-2.5-flash |
| -o / --output | Save output to file | stdout |
| --keep | Keep downloaded video file | false |
| --download-only | Download only, skip analysis | false |
| --max-size | Max file size in MB | 500 |
| --raw | Raw text output instead of JSON | false |
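
When the script is driven from other Python code, the flags above can be assembled into an argument list. This is a sketch under stated assumptions: `build_command` and `run_analysis` are hypothetical helpers, and `script_path` stands for wherever `{baseDir}/scripts/analyze_video.py` resolves on your machine. Only flags documented in the table are used.

```python
import subprocess


def build_command(script_path, url, question=None, prompt=None, model=None,
                  output=None, keep=False, download_only=False,
                  max_size=None, raw=False):
    """Build the `uv run` argument list for analyze_video.py."""
    cmd = ["uv", "run", script_path, url]
    if prompt is not None:
        # Per the table, -p overrides the whole prompt and ignores -q.
        cmd += ["-p", prompt]
    elif question is not None:
        cmd += ["-q", question]
    if model is not None:
        cmd += ["-m", model]
    if output is not None:
        cmd += ["-o", output]
    if keep:
        cmd.append("--keep")
    if download_only:
        cmd.append("--download-only")
    if max_size is not None:
        cmd += ["--max-size", str(max_size)]
    if raw:
        cmd.append("--raw")
    return cmd


def run_analysis(script_path, url, **options):
    """Run the script and return its stdout (requires uv and the script)."""
    cmd = build_command(script_path, url, **options)
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a question or prompt contains spaces or quotes.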

How It Works

YouTube URLs → Passed directly to Gemini (no download needed)
All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
Gemini analyzes video with structured prompt → returns JSON
Temp files and Gemini uploads cleaned up automatically
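
The routing and polling steps above can be sketched as follows. This is not the script's actual code: the host list, `plan_steps`, and `poll_until` are illustrative assumptions, and the state names `ACTIVE` and `FAILED` are assumed to match what the Gemini File API reports for an uploaded file. The state lookup is passed in as a callable so the loop does not depend on any particular client library.

```python
import time
from urllib.parse import urlparse

# Hosts treated as YouTube in this sketch (an assumption, not the script's list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """Return True when the URL's host is a known YouTube host."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def plan_steps(url):
    """List the processing steps for a URL, following "How It Works"."""
    if is_youtube_url(url):
        return ["pass URL to Gemini", "analyze", "clean up"]
    return ["download with yt-dlp", "upload to Gemini File API",
            "poll until processed", "analyze", "clean up"]


def poll_until(get_state, done_states=("ACTIVE",), failed_states=("FAILED",),
               interval=2.0, timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call get_state() repeatedly until it reports a terminal state."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"processing failed with state {state!r}")
        if clock() >= deadline:
            raise TimeoutError("file was not processed before the timeout")
        sleep(interval)
```

In real use, `get_state` would wrap whatever call your Gemini client exposes for fetching an uploaded file's status.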

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

Tips

Use -q for targeted questions on top of the full analysis
YouTube is fastest (no download step)
Large videos (10min+) work fine — Gemini File API supports up to 2GB (free) / 20GB (paid)
The script auto-installs Python dependencies via uv

Prompt Examples

After installing Video Understanding, you can say things like the following to the AI to trigger it

U

Help me get started with Video Understanding

A

Explains what Video Understanding does, walks through the setup, and runs a quick demo based on your current project

U

Use Video Understanding to analyze and summarize videos from 1000+ sites using Google Gemini A...

A

Invokes Video Understanding with the right parameters and returns the result directly in the conversation

U

What can I do with Video Understanding in my design & creative workflow?

A

Lists the top use cases for Video Understanding, with example commands for each scenario

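FAQ

How do I install Video Understanding?

Place the skill folder in ~/.claude/skills/video-understanding/ (personal level, available to all projects) or in .claude/skills/video-understanding/ (project level). After restarting the AI client, invoke it explicitly with /video-understanding, or let the AI discover and use it automatically based on context.

Which AI platforms does Video Understanding support?

Video Understanding supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.

Is Video Understanding free?

Video Understanding is free to install and use. Check the repository for license information.

What does Video Understanding do?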

Analyze and summarize videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, and answers to questions.

Which category does Video Understanding belong to?

Video Understanding belongs to the "Design & Creative" category. Skills in this category help AI agents perform specialized tasks in this domain.

Use Cases

Getting Started with Video Understanding
Automate Design & Creative Workflows with Video Understanding
Team Collaboration with Video Understanding