Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API.
数据来源:ClawHub。 在 ClawSkills 查看
选择你使用的 Agent
方法一:命令行安装(推荐)
推荐(无需提前安装 clawhub)
npx clawhub@latest --dir ~/.claude/skills install anycrawl或使用 clawhub CLI(需提前安装)
clawhub --dir ~/.claude/skills install anycrawl⚠️ 需要 Node.js 18+,没有 Node?请使用下方方法二直接下载 ZIP。 安装 Node.js →
方法二:手动下载安装(无需 Node)
下载 ZIP,解压后将文件夹放到以下路径,重启 Agent 即可:
安装路径
~/.claude/skills/anycrawl/💡解压后将文件夹放到上方路径,重启 Agent 即可生效
AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.
export ANYCRAWL_API_KEY="your-api-key"
Make it permanent by adding to ~/.bashrc or ~/.zshrc:
echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
Get your API key at: https://anycrawl.dev
openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"
Scrape a single URL and convert to LLM-ready structured data.
Parameters:
url (string, required): URL to scrapeengine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]timeout (number, optional): Timeout in milliseconds (default: 30000)wait_for (number, optional): Delay before extraction in ms (browser engines only)wait_for_selector (string/object/array, optional): Wait for CSS selectorsinclude_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])exclude_tags (array, optional): Exclude these HTML tagsproxy (string, optional): Proxy URL (e.g., "http://proxy:port")json_options (object, optional): JSON extraction with schema/promptextract_source (string, optional): "markdown" (default) or "html"Examples:
// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })
// Scrape SPA with Playwright
anycrawl_scrape({
url: "https://spa-example.com",
engine: "playwright",
formats: ["markdown", "screenshot"]
})
// Extract structured JSON
anycrawl_scrape({
url: "https://product-page.com",
engine: "cheerio",
json_options: {
schema: {
type: "object",
properties: {
product_name: { type: "string" },
price: { type: "number" },
description: { type: "string" }
},
required: ["product_name", "price"]
},
user_prompt: "Extract product details from this page"
}
})
Search Google and return structured results.
Parameters:
query (string, required): Search queryengine (string, optional): Search engine - "google" (default)limit (number, optional): Max results per page (default: 10)offset (number, optional): Number of results to skip (default: 0)pages (number, optional): Number of pages to retrieve (default: 1, max: 20)lang (string, optional): Language locale (e.g., "en", "zh", "vi")safe_search (number, optional): 0 (off), 1 (medium), 2 (high)scrape_options (object, optional): Scrape each result URL with these optionsExamples:
// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })
// Multi-page search in Vietnamese
anycrawl_search({
query: "hướng dẫn Node.js",
pages: 3,
lang: "vi"
})
// Search and auto-scrape results
anycrawl_search({
query: "best AI tools 2026",
limit: 5,
scrape_options: {
engine: "cheerio",
formats: ["markdown"]
}
})
Start crawling an entire website (async job).
Parameters:
url (string, required): Seed URL to start crawlingengine (string, optional): "cheerio" (default), "playwright", "puppeteer"strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"max_depth (number, optional): Max depth from seed URL (default: 10)limit (number, optional): Max pages to crawl (default: 100)include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])scrape_paths (array, optional): Only scrape URLs matching these patternsscrape_options (object, optional): Per-page scrape optionsExamples:
// Crawl entire website
anycrawl_crawl_start({
url: "https://docs.example.com",
engine: "cheerio",
max_depth: 5,
limit: 50
})
// Crawl only blog posts
anycrawl_crawl_start({
url: "https://example.com",
strategy: "same-domain",
include_paths: ["/blog/*"],
exclude_paths: ["/blog/tags/*"],
scrape_options: {
formats: ["markdown"]
}
})
// Crawl product pages only
anycrawl_crawl_start({
url: "https://shop.example.com",
strategy: "same-domain",
scrape_paths: ["/products/*"],
limit: 200
})
Check crawl job status.
Parameters:
job_id (string, required): Crawl job IDExample:
anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })
Get crawl results (paginated).
Parameters:
job_id (string, required): Crawl job IDskip (number, optional): Number of results to skip (default: 0)Example:
// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })
// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })
Cancel a running crawl job.
Parameters:
job_id (string, required): Crawl job IDQuick helper: Search Google then scrape top results.
Parameters:
query (string, required): Search querymax_results (number, optional): Max results to scrape (default: 3)scrape_engine (string, optional): Engine for scraping (default: "cheerio")formats (array, optional): Output formats (default: ["markdown"])lang (string, optional): Search languageExample:
anycrawl_search_and_scrape({
query: "latest AI news",
max_results: 5,
formats: ["markdown"]
})
| Engine | Best For | Speed | JS Rendering | |--------|----------|-------|--------------| | cheerio | Static HTML, news, blogs | ⚡ Fastest | ❌ No | | playwright | SPAs, complex web apps | 🐢 Slower | ✅ Yes | | puppeteer | Chrome-specific, metrics | 🐢 Slower | ✅ Yes |
All responses follow this structure:
{
"success": true,
"data": { ... },
"message": "Optional message"
}
Error response:
{
"success": false,
"error": "Error type",
"message": "Human-readable message"
}
400 - Bad Request (validation errors)401 - Unauthorized (invalid API key)402 - Payment Required (insufficient credits)404 - Not Found429 - Rate limit exceeded500 - Internal server error安装 AnyCrawl-API 后,可以对 AI 说这些话来触发它
Help me get started with AnyCrawl-API
Explains what AnyCrawl-API does, walks through the setup, and runs a quick demo based on your current project
Use AnyCrawl-API to perform high-performance web scraping, crawling, and Google search ...
Invokes AnyCrawl-API with the right parameters and returns the result directly in the conversation
What can I do with AnyCrawl-API in my data & analytics workflow?
Lists the top use cases for AnyCrawl-API, with example commands for each scenario
将技能文件夹放到 ~/.claude/skills/anycrawl/ 目录(个人级,所有项目可用),或 .claude/skills/anycrawl/(项目级)。重启 AI 客户端后,用 /anycrawl 主动调用,或让 AI 根据上下文自动发现并使用。
AnyCrawl-API 支持 Claude、Cursor、OpenClaw,可与这些 AI 平台无缝集成,扩展其能力。
AnyCrawl-API 可免费安装使用。请查阅仓库了解许可证信息。
Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API.
AnyCrawl-API 属于「Data & Analytics」分类,该分类的技能帮助 AI 智能体在此领域执行专业任务。
Automate my data & analytics tasks using AnyCrawl-API
Identifies repetitive steps in your workflow and sets up AnyCrawl-API to handle them automatically