
Vision Sandbox

vision-sandbox


Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.

Source: ClawHub. View on ClawSkills.

5.7k downloads

1 favorite

29 views


About Vision Sandbox

---
name: Vision Sandbox
slug: vision-sandbox
version: 1.1.0
description: Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
metadata:
  openclaw:
    emoji: "🔭"
    primaryEnv: "GEMINI_API_KEY"
    requires:
      bins: ["uv"]
      env: ["GEMINI_API_KEY"]
---

Vision Sandbox 🔭

Leverage Gemini's native code execution to analyze images with high precision. The model writes and runs Python code in a Google-hosted sandbox to verify visual data, which suits UI auditing, spatial grounding, and visual reasoning.

Installation

clawhub install vision-sandbox

Usage

uv run vision-sandbox --image "path/to/image.png" --prompt "Identify all buttons and provide [x, y] coordinates."

Pattern Library

📍 Spatial Grounding

Ask the model to find specific items and return coordinates.

Prompt: "Locate the 'Submit' button in this screenshot. Use code execution to verify its center point and return the [x, y] coordinates in a [0, 1000] scale."
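Coordinates on a [0, 1000] scale are relative to the image dimensions, so they need rescaling before they can be used as pixel positions. A minimal sketch of that conversion, assuming the model replies with an [x, y] pair as the prompt requests. The function name `to_pixels` and the 1920x1080 screenshot size are illustrative and not part of the skill.

```python
def to_pixels(point, width, height, scale=1000):
    """Map an [x, y] point on a 0..scale grid to integer pixel coordinates."""
    x, y = point
    if not (0 <= x <= scale and 0 <= y <= scale):
        raise ValueError(f"point {point} is outside the [0, {scale}] range")
    return round(x / scale * width), round(y / scale * height)


# Hypothetical center point for a 'Submit' button on a 1920x1080 screenshot.
print(to_pixels([500, 250], width=1920, height=1080))  # (960, 270)
```

If the model returns the pair in a different order (for example [y, x]), swap the values before calling the helper.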

🧮 Visual Math

Ask the model to count or calculate based on the image.

Prompt: "Count the number of items in the list. Use Python to sum their values if prices are visible."
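The arithmetic the model might delegate to the sandbox for a prompt like this can be sketched as follows. The price strings are made-up sample data standing in for values read from an image, and `sum_prices` is a hypothetical helper; `Decimal` is used here to avoid floating-point rounding on currency.

```python
from decimal import Decimal


def sum_prices(labels):
    """Sum price strings such as '$4.50', ignoring the dollar sign and commas."""
    total = Decimal("0")
    for label in labels:
        total += Decimal(label.replace("$", "").replace(",", "").strip())
    return total


# Hypothetical values as they might be transcribed from a list in an image.
prices = ["$4.50", "$12.00", "$1,250.25"]
print(len(prices), sum_prices(prices))  # 3 1266.75
```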

🖥️ UI Audit

Check layout and readability.

Prompt: "Check if the header text overlaps with any icons. Use the sandbox to calculate the bounding box intersections."
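The overlap check this prompt asks for comes down to intersecting two axis-aligned rectangles. A small sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples; the box values and the name `intersection_area` are illustrative only.

```python
def intersection_area(a, b):
    """Return the overlap area of two boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_w = min(a[2], b[2]) - max(a[0], b[0])
    overlap_h = min(a[3], b[3]) - max(a[1], b[1])
    if overlap_w <= 0 or overlap_h <= 0:
        return 0  # boxes are disjoint or only touch at an edge
    return overlap_w * overlap_h


# Hypothetical bounding boxes for a header text run and a nearby icon.
header_text = (100, 20, 400, 60)
icon = (380, 10, 420, 50)
print(intersection_area(header_text, icon))  # 600
```

A non-zero result means the two elements overlap; the same function works on [0, 1000]-scaled or pixel coordinates as long as both boxes use the same units.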

🖐️ Counting & Logic

Solve visual counting tasks with code verification.

Prompt: "Count the number of fingers on this hand. Use code execution to identify the bounding box for each finger and return the total count."

Integration with OpenCode

This skill is designed to provide Visual Grounding for automated coding agents like OpenCode.

Step 1: Use vision-sandbox to extract UI metadata (coordinates, sizes, colors).
Step 2: Pass the JSON output to OpenCode to generate or fix CSS/HTML.
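Step 1 could be scripted roughly as below. This is a sketch under assumptions: the command line is the one shown under Usage, but the shape of the tool's output is not documented here, so `extract_json` simply looks for the first JSON value in whatever text is printed. All function names are hypothetical.

```python
import json
import subprocess


def build_command(image_path, prompt):
    """Assemble the CLI invocation shown in the Usage section."""
    return ["uv", "run", "vision-sandbox", "--image", image_path, "--prompt", prompt]


def extract_json(text):
    """Return the first JSON object or array found in text, or None.

    Assumption: the reply may wrap JSON in surrounding prose.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char in "{[":
            try:
                value, _ = decoder.raw_decode(text[start:])
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON at this position, keep scanning
    return None


def extract_ui_metadata(image_path, prompt):
    """Run vision-sandbox and parse whatever JSON it prints to stdout."""
    result = subprocess.run(
        build_command(image_path, prompt),
        capture_output=True, text=True, check=True,
    )
    return extract_json(result.stdout)
```

For Step 2, the parsed value can be serialized again with `json.dumps` and handed to OpenCode in whatever way your workflow passes context to it.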

Configuration

GEMINI_API_KEY: Required environment variable.
Model: Defaults to gemini-3-flash-preview.
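A quick preflight check for these requirements might look like the following sketch. It tests for the `GEMINI_API_KEY` variable and for the `uv` binary listed in the skill's front matter; the helper name `missing_requirements` is hypothetical.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """List unmet requirements: the GEMINI_API_KEY variable and the uv binary."""
    env = os.environ if env is None else env
    missing = []
    if not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY environment variable")
    if which("uv") is None:
        missing.append("uv executable on PATH")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("ready" if not problems else "missing: " + ", ".join(problems))
```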

Prompt Examples

After installing Vision Sandbox, you can say things like the following to your AI to trigger it.

User: Help me get started with Vision Sandbox

A: Explains what Vision Sandbox does, walks through the setup, and runs a quick demo based on your current project

User: Use Vision Sandbox to run agentic vision through Gemini's native Code Execution sandbox

A: Invokes Vision Sandbox with the right parameters and returns the result directly in the conversation

User: What can I do with Vision Sandbox in my developer & devops workflow?

A: Lists the top use cases for Vision Sandbox, with example commands for each scenario

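FAQ

How do I install Vision Sandbox?

Place the skill folder in ~/.claude/skills/vision-sandbox/ (personal level, available to all projects) or in .claude/skills/vision-sandbox/ (project level). After restarting the AI client, invoke it explicitly with /vision-sandbox, or let the AI discover and use it automatically based on context.

Which AI platforms does Vision Sandbox support?

Vision Sandbox supports Claude, Cursor, and OpenClaw, and integrates with these platforms to extend their capabilities.

Is Vision Sandbox free?

Vision Sandbox is free to install and use. Check the repository for license information.

What does Vision Sandbox do?

Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.

Which category does Vision Sandbox belong to?

Vision Sandbox belongs to the "Developer & DevOps" category. Skills in this category help AI agents carry out specialized tasks in that domain.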
