
Smart Web Scraper

smart-web-scraper


Source: ClawHub. View on ClawSkills.

2.2k downloads

0 favorites

14 views


About Smart Web Scraper

---
name: smart-web-scraper
description: Extract structured data from any web page. Supports CSS selectors, auto-detection of tables and lists, JSON/CSV output formats. Use when asked to scrape a website, extract data from a page, pull product info, gather contact details, or collect listings from a URL.
---

Smart Web Scraper

Extract structured data from web pages into clean JSON or CSV.

Quick Start

# Scrape a page, extract all text content
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com"

# Extract specific elements with CSS selector
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com/products" -s ".product-card"

# Auto-detect and extract tables
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py tables "https://example.com/pricing"

# Extract all links from a page
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py links "https://example.com"

# Extract structured data (title, meta, headings, links)
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py structure "https://example.com"

# Output as JSON
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s ".item" -f json

# Output as CSV
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s "table tr" -f csv

# Save to file
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s ".product" -f json -o products.json

# Multi-page scrape (follow pagination)
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py crawl "https://example.com/page/1" --pages 5 -s ".article"

Commands

| Command | Args | Description |
|---------|------|-------------|
| extract | [-s selector] [-f format] [-o file] | Extract content, optionally filtered by CSS selector |
| tables | [-f format] [-o file] | Auto-detect and extract all HTML tables |
| links | [--external] [--internal] | Extract all links (href + text) |
| structure | | Extract page structure: title, meta, headings, images, links |
| crawl | --pages N [-s selector] [-f format] [-o file] | Follow pagination links, extract from multiple pages |
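When the scraper is driven from another Python program, it may help to assemble the command line from the documented flags rather than concatenating strings. The sketch below is a hypothetical helper (`build_command` is not part of the skill); it only uses the commands and flags listed in the table, and it assumes `--pages` is mandatory for `crawl` because the table shows it without brackets.

```python
import shlex

# Prefix copied from the Quick Start commands.
BASE = ["uv", "run", "--with", "beautifulsoup4", "--with", "lxml",
        "python", "scripts/scraper.py"]

COMMANDS = {"extract", "tables", "links", "structure", "crawl"}


def build_command(command, url, selector=None, fmt=None, output=None, pages=None):
    """Build the argv list for one scraper invocation (hypothetical helper)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    if command == "crawl" and pages is None:
        # Assumption: --pages is required for crawl.
        raise ValueError("crawl requires pages")
    if command != "crawl" and pages is not None:
        raise ValueError("--pages is only documented for crawl")

    argv = BASE + [command, url]
    if pages is not None:
        argv += ["--pages", str(pages)]
    if selector is not None:
        argv += ["-s", selector]
    if fmt is not None:
        argv += ["-f", fmt]
    if output is not None:
        argv += ["-o", output]
    return argv


if __name__ == "__main__":
    argv = build_command("crawl", "https://example.com/page/1",
                         selector=".article", pages=5)
    # Print a shell-safe rendering; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Returning a list instead of a string lets the result be passed straight to `subprocess.run` without shell quoting concerns.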

Output Formats

| Format | Flag | Description |
|--------|------|-------------|
| Text | -f text | Plain text (default) |
| JSON | -f json | Structured JSON array |
| CSV | -f csv | Comma-separated values |
| Markdown | -f md | Markdown-formatted |
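To make the difference between the four formats concrete, here is a minimal sketch of how a list of extracted records could be serialized to each one using only the standard library. The `render` function and the exact layout of the text and Markdown outputs are assumptions for illustration; the actual `scripts/scraper.py` may format its output differently.

```python
import csv
import io
import json


def render(records, fmt="text"):
    """Serialize a list of flat dicts in one of the four documented formats."""
    if fmt == "json":
        return json.dumps(records, indent=2, ensure_ascii=False)
    if fmt not in ("text", "csv", "md"):
        raise ValueError(f"unsupported format: {fmt}")
    if not records:
        return ""

    # Column order follows the keys of the first record.
    fields = list(records[0])

    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()

    if fmt == "md":
        lines = ["| " + " | ".join(fields) + " |",
                 "| " + " | ".join("---" for _ in fields) + " |"]
        for rec in records:
            cells = (str(rec.get(f, "")) for f in fields)
            lines.append("| " + " | ".join(cells) + " |")
        return "\n".join(lines)

    # Plain text: one line per record, using its "text" field.
    return "\n".join(str(rec.get("text", "")) for rec in records)


if __name__ == "__main__":
    sample = [{"text": "Widget Pro - $29.99", "tag": "div"}]
    for name in ("text", "json", "csv", "md"):
        print(f"--- {name} ---")
        print(render(sample, name))
```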

Examples

Extract product listings

uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://shop.example.com" -s ".product" -f json

Output:

[
  {"text": "Widget Pro - $29.99", "tag": "div", "class": "product"},
  {"text": "Widget Max - $49.99", "tag": "div", "class": "product"}
]
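The JSON output keeps each matched element's text as a single string, so downstream code usually has to split it into fields. The following sketch assumes the output shape shown above and a "name - $price" text pattern; `parse_products` and the regular expression are hypothetical and would need adjusting for other pages.

```python
import json
import re

# Sample copied from the example output above.
SAMPLE = """[
  {"text": "Widget Pro - $29.99", "tag": "div", "class": "product"},
  {"text": "Widget Max - $49.99", "tag": "div", "class": "product"}
]"""

# Assumed pattern: "<name> - $<price>".
PRICE = re.compile(r"^(?P<name>.+?)\s*-\s*\$(?P<price>\d+(?:\.\d{1,2})?)$")


def parse_products(raw):
    """Turn scraper JSON output into name/price records, skipping non-matches."""
    products = []
    for item in json.loads(raw):
        match = PRICE.match(item["text"].strip())
        if match is None:
            continue  # text did not follow the assumed pattern
        products.append({
            "name": match.group("name"),
            "price": float(match.group("price")),
        })
    return products


if __name__ == "__main__":
    for product in parse_products(SAMPLE):
        print(product)
```

In practice the input would come from a file written with `-o products.json` rather than an inline string.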

Extract pricing table

uv run --with beautifulsoup4 --with lxml python scripts/scraper.py tables "https://example.com/pricing" -f csv

Get all external links

uv run --with beautifulsoup4 --with lxml python scripts/scraper.py links "https://example.com" --external
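One plausible way to separate internal from external links is to resolve every `href` against the page URL and compare hostnames. The sketch below illustrates that idea with the standard library's `html.parser` instead of BeautifulSoup, so it runs without dependencies; `LinkCollector` and `split_links` are hypothetical names, and the skill's own classification rule (for example, how it treats subdomains) is not documented here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href and anchor text for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = {"href": href, "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def split_links(html, base_url):
    """Return (internal, external) link lists relative to base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()

    base_host = urlparse(base_url).netloc.lower()
    internal, external = [], []
    for link in collector.links:
        absolute = urljoin(base_url, link["href"])
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        entry = {"href": absolute, "text": link["text"]}
        if parts.netloc.lower() == base_host:
            internal.append(entry)
        else:
            external.append(entry)
    return internal, external
```

This sketch treats any different hostname, including a subdomain, as external.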

Rate Limiting

Default: 1 request per second (respectful crawling)
Override with --delay 0.5 (seconds between requests)
Respects robots.txt by default (override with --ignore-robots)
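The two behaviors above, a fixed delay between requests and a robots.txt check, can be sketched with the standard library as follows. This is an illustration under assumptions, not the skill's implementation: `Throttle` and `allowed` are hypothetical names, and the clock and sleep functions are injectable only so the logic can be exercised without real waiting.

```python
import time
from urllib.robotparser import RobotFileParser


class Throttle:
    """Enforce a minimum delay (in seconds) between consecutive requests."""

    def __init__(self, delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def allowed(robots_txt, url, user_agent="*"):
    """Check a URL against already-downloaded robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    throttle = Throttle(delay=0.5)  # mirrors --delay 0.5
    for url in ("https://example.com/a", "https://example.com/private/b"):
        if not allowed(robots, url):
            print("skipping (disallowed):", url)
            continue
        throttle.wait()
        print("would fetch:", url)
```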

Notes

Requires beautifulsoup4 and lxml (auto-installed by uv run --with)
Uses a standard browser User-Agent to avoid blocks
Handles redirects, encoding detection, and error pages gracefully
No JavaScript rendering (use for static HTML pages)

Prompt Examples

After installing Smart Web Scraper, you can say things like the following to the AI to trigger it.

User: Help me get started with Smart Web Scraper

A: Explains what Smart Web Scraper does, walks through the setup, and runs a quick demo based on your current project

User: Use Smart Web Scraper to extract structured data from any web page

A: Invokes Smart Web Scraper with the right parameters and returns the result directly in the conversation

User: What can I do with Smart Web Scraper in my data & analytics workflow?

A: Lists the top use cases for Smart Web Scraper, with example commands for each scenario

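FAQ

How do I install Smart Web Scraper?

Place the skill folder in ~/.claude/skills/smart-web-scraper/ (personal level, available to all projects) or in .claude/skills/smart-web-scraper/ (project level). After restarting the AI client, invoke it explicitly with /smart-web-scraper, or let the AI discover and use it automatically based on context.

Which AI platforms does Smart Web Scraper support?

Smart Web Scraper supports Claude, Cursor, and OpenClaw, and integrates with these platforms to extend their capabilities.

Is Smart Web Scraper free?

Smart Web Scraper is free to install and use. Check the repository for license information.

What does Smart Web Scraper do?

Extract structured data from any web page. Supports CSS selectors, auto-detection of tables and lists, JSON/CSV output formats. Use when asked to scrape a we...

Which category does Smart Web Scraper belong to?

Smart Web Scraper belongs to the "Data & Analytics" category. Skills in this category help AI agents carry out specialized tasks in that domain.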

Use Cases

Getting Started with Smart Web Scraper
Automate Data & Analytics Workflows with Smart Web Scraper
Team Collaboration with Smart Web Scraper


Related Skills

Weather

Multi Search Engine

Tavily Search

Baidu web search