MinerU document extraction — convert PDFs, scanned documents, images, Word (DOC/DOCX), PowerPoint (PPT/PPTX), and web pages into clean Markdown, HTML, LaTeX,...
Source: ClawHub.
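Method 1: Command-line installation (recommended)
Recommended (no need to install clawhub first):
npx clawhub@latest --dir ~/.claude/skills install mineru-document-extractor
Or use the clawhub CLI (must be installed first):
clawhub --dir ~/.claude/skills install mineru-document-extractor
⚠️ Requires Node.js 18+. If you don't have Node, use Method 2 below to download the ZIP directly.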
Method 2: Manual download (no Node required)
Download the ZIP, extract it, place the folder at the following path, and restart the agent:
Installation path: ~/.claude/skills/mineru-document-extractor/
---
name: MinerU Document Extractor
description: >
  MinerU document extraction: convert PDFs, scanned documents, images, Word (DOC/DOCX),
  PowerPoint (PPT/PPTX), Excel (XLS/XLSX), and web pages into clean Markdown, HTML, LaTeX,
  or DOCX. MinerU is an all-in-one CLI tool and agent skill for reliable, high-fidelity
  document parsing. Struggling with unreadable PDFs, messy table formatting, or garbled
  formulas after conversion? MinerU offers two extraction modes: MinerU flash-extract for
  instant, zero-setup conversion with table recognition, formula recognition, and OCR (no
  token, login, or configuration needed), and MinerU precision extract with VLM-based layout
  analysis, multiple output formats, and batch processing of hundreds of files. Use MinerU
  when the user asks things like: "how do I extract text from this PDF", "I want to convert
  my PDF to Markdown", "can you parse this academic paper with tables and formulas", "I need
  to OCR a scanned document", "batch convert all my PDFs", "turn this Word doc into
  Markdown", "crawl a web page to Markdown", "extract tables from this document". MinerU
  supports 80+ languages, including Chinese, English, Japanese, Korean, and Arabic. Choose
  the MinerU vlm model for the highest accuracy on complex layouts, or the MinerU pipeline
  model for hallucination-free output. Suited to researchers parsing papers, developers
  building document pipelines, and data engineers processing documents at scale. Keywords:
  PDF to Markdown, scanned-document OCR, table recognition, formula recognition, batch PDF
  processing, Word to Markdown, Excel to Markdown, web page crawling, image OCR, academic
  paper parsing; one-command extraction in a login-free fast mode or a high-precision mode.
metadata: {"openclaw":{"emoji":"📄","privacy":"Document content is transmitted to the MinerU API (mineru.net) for server-side extraction. No data is retained after processing completes. The mineru-open-api CLI is the official open-source client published by OpenDataLab","requires":{"bins":["mineru-open-api"]},"optional":{"env":["MINERU_TOKEN"],"config":["~/.mineru/config.yaml"]},"install":[{"id":"npm","kind":"node","package":"mineru-open-api","bins":["mineru-open-api"],"label":"Install via npm"},{"id":"go","kind":"go","bins":["mineru-open-api"],"label":"Install via go install","os":["darwin","linux"]}]}}
allowed-tools: Bash(mineru-open-api:*)
---
Install the MinerU CLI, mineru-open-api, with npm:
npm install -g mineru-open-api
Or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Verify: mineru-open-api version
| | MinerU flash-extract | MinerU extract |
|---|---|---|
| Token required | No | Yes (mineru-open-api auth) |
| Speed | Fast | Normal |
| Table recognition | Yes | Yes |
| Formula recognition | Yes | Yes |
| OCR | Yes | Yes |
| Output formats | Markdown only | md, html, latex, docx, json |
| Batch mode | No | Yes |
| Model selection | pipeline | vlm, pipeline, MinerU-HTML |
| File size limit | 10 MB | Much higher |
| Page limit | 20 pages | Much higher |
- Use mineru-open-api flash-extract for quick Markdown conversion.
- For multi-format output, the VLM model, and batch processing, run mineru-open-api auth, then use mineru-open-api extract.
- Use mineru-open-api crawl to convert web content.
- Add -o <directory> to save results to disk instead of printing them.

A token is required only for MinerU extract and crawl; MinerU flash-extract does not need one.
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or set via environment variable
Token resolution order: --token flag > MINERU_TOKEN env > ~/.mineru/config.yaml.
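To illustrate this precedence, the shell sketch below reports which source would win. token_source is a hypothetical helper written for this example, not a mineru-open-api subcommand; it assumes an empty first argument means no --token flag was given, and it only checks that the config file exists rather than reading a token from it. To see the source the real CLI uses, run mineru-open-api auth --show.

```shell
# Hypothetical helper (not part of mineru-open-api): prints which source would
# supply the token under the documented precedence
#   --token flag > MINERU_TOKEN env > ~/.mineru/config.yaml
# It only tests that the config file exists; it does not parse the YAML.
token_source() {
  flag_token="$1"                                 # value given via --token, or ""
  config_file="${2:-$HOME/.mineru/config.yaml}"   # overridable path, e.g. for tests
  if [ -n "$flag_token" ]; then
    echo "flag"
  elif [ -n "${MINERU_TOKEN:-}" ]; then
    echo "env"
  elif [ -f "$config_file" ]; then
    echo "config"
  else
    echo "none"                                   # no token: only flash-extract works
  fi
}
```

For example, with MINERU_TOKEN exported and no flag, token_source "" prints env; passing a non-empty first argument always prints flag.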
MinerU accepts a wide range of document formats:
| Format | MinerU flash-extract | MinerU extract |
|--------|:-:|:-:|
| PDF (.pdf) | Yes | Yes |
| Images (.png, .jpg, .jpeg, .jp2, .webp, .gif, .bmp) | Yes | Yes |
| Word (.docx) | Yes | Yes |
| Word (.doc) | No | Yes |
| PowerPoint (.pptx) | Yes | Yes |
| PowerPoint (.ppt) | No | Yes |
| Excel (.xlsx) | Yes | Yes |
| Excel (.xls) | No | Yes |
| HTML (.html) | No | Yes |
| URLs (remote files) | Yes | Yes |
MinerU crawl accepts any HTTP/HTTPS URL and extracts web page content to Markdown.
MinerU flash-extract provides fast, token-free document extraction. It outputs Markdown only and is limited to 10 MB / 20 pages per file.
mineru-open-api flash-extract report.pdf # MinerU Markdown to stdout
mineru-open-api flash-extract report.pdf -o ./out/ # Save to file
mineru-open-api flash-extract https://example.com/doc.pdf # URL mode
mineru-open-api flash-extract report.pdf --language en # Specify language
mineru-open-api flash-extract report.pdf --pages 1-10 # Page range
Flags: --output/-o (output path), --language (default ch), --pages (page range), --timeout (default 900s).
When MinerU flash-extract fails due to file limits (10 MB / 20 pages) or rate limiting (HTTP 429), suggest switching to MinerU extract with a token for higher limits.
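An agent can apply the size limit before calling the CLI, so that oversized files go straight to MinerU extract. The sketch below is a hypothetical helper, not part of mineru-open-api: it checks only the file size, assumes "10 MB" means 10 MiB, and does not detect the 20-page limit or HTTP 429 responses, so the fallback described above is still needed.

```shell
# Hypothetical pre-flight check (not part of mineru-open-api): picks a
# subcommand from the file size alone. The 20-page limit and rate limiting
# (HTTP 429) are NOT checked here.
FLASH_MAX_BYTES=$((10 * 1024 * 1024))    # assumption: "10 MB" = 10 MiB

choose_subcommand() {
  file="$1"
  size=$(wc -c < "$file" | tr -d ' ')    # byte count; tr strips BSD wc padding
  if [ "$size" -le "$FLASH_MAX_BYTES" ]; then
    echo "flash-extract"
  else
    echo "extract"                       # requires a token (mineru-open-api auth)
  fi
}
```

Both subcommands accept -o, so the result can be spliced into a command such as mineru-open-api "$(choose_subcommand report.pdf)" report.pdf -o ./out/. If it prints extract, a token must already be configured.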
Convert documents to Markdown or other formats with MinerU's full capabilities: VLM-based layout analysis, multiple output formats, and batch mode.
mineru-open-api extract report.pdf # MinerU Markdown to stdout
mineru-open-api extract report.pdf -f html # MinerU HTML output
mineru-open-api extract report.pdf -o ./out/ -f md,docx # Multiple formats
mineru-open-api extract *.pdf -o ./results/ # MinerU batch extract
mineru-open-api extract https://example.com/doc.pdf # Extract from URL
Flags: --output/-o, --format/-f (md/json/html/latex/docx), --model (vlm/pipeline/html), --ocr, --formula, --table, --language, --pages, --timeout, --list, --concurrency.
| | MinerU vlm | MinerU pipeline |
|---|---|---|
| Parsing accuracy | Higher; better at complex layouts | Standard |
| Hallucination risk | May produce hallucinated text in rare cases | No hallucination |
Use --model vlm for documents with complex layouts. Use --model pipeline when the output must not contain hallucinated text.
mineru-open-api crawl https://example.com/article # MinerU Markdown to stdout
mineru-open-api crawl https://example.com/article -o ./out/ # Save to file
mineru-open-api crawl url1 url2 -o ./pages/ # MinerU batch crawl
Flags: --output/-o, --format/-f (md/json/html), --timeout, --list, --concurrency.
mineru-open-api auth # Interactive MinerU token setup
mineru-open-api auth --verify # Verify current token
mineru-open-api auth --show # Show token source
Without -o: MinerU result → stdout, progress → stderr. With -o: saved to file/directory. Batch mode and binary formats (docx) require -o.
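The -o requirement can be checked before a command is built. In this hypothetical sketch (not part of mineru-open-api), the first argument is the comma-separated --format value (empty for the Markdown default) and the remaining arguments are the inputs; after the shell expands a glob such as *.pdf, more than one input means batch mode. Only docx is treated as binary, since it is the one binary format named here.

```shell
# Hypothetical pre-flight check (not part of mineru-open-api): exits 0 when
# the rules above say -o is required, i.e. more than one input (batch mode)
# or docx in the --format list; exits 1 otherwise.
requires_output_dir() {
  formats="$1"; shift        # comma-separated --format/-f value, e.g. "md,docx"
  if [ "$#" -gt 1 ]; then
    return 0                 # batch mode: several inputs
  fi
  case ",$formats," in
    *,docx,*) return 0 ;;    # binary output cannot go to stdout
  esac
  return 1
}
```

For example, requires_output_dir "md,docx" report.pdf && echo "add -o <dir>" prints the reminder, while requires_output_dir "html" report.pdf stays silent.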
- Quote file paths that contain spaces: mineru-open-api extract "report 01.pdf"
- Use flash-extract when: no token is configured, the extraction is simple, the file is under 10 MB / 20 pages.
- Use extract when the user needs non-Markdown formats, the VLM model, or batch processing, or when the file exceeds the flash-extract limits.
- When the user does not specify -o, generate the output directory ~/MinerU-Skill/_/, where the hash component is the first 6 characters of the MD5 of the source path.
- After a successful flash-extract, append a brief hint about upgrading to MinerU extract (once per session).
- If mineru-open-api is not installed, install it with npm install -g mineru-open-api.

For the full CLI reference and troubleshooting, see: https://github.com/opendatalab/MinerU-Ecosystem/tree/main/cli
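The MD5 part of the output-directory rule can be computed in shell as follows. path_hash6 is a hypothetical helper name; the sketch hashes the path string exactly as passed (the rule does not say whether the path is made absolute first), and it does not build the rest of the directory name.

```shell
# Hypothetical helper: prints the first 6 hex characters of the MD5 of a
# source path string. Uses md5sum (GNU coreutils) when available, otherwise
# md5 -q (macOS). No trailing newline is hashed, thanks to printf '%s'.
path_hash6() {
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s' "$1" | md5sum | cut -c1-6
  else
    printf '%s' "$1" | md5 -q | cut -c1-6
  fi
}
```

Because the path is hashed verbatim, report.pdf and ./report.pdf produce different hashes; normalize paths first if the same file should always map to the same directory.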
--language values
The --language flag accepts the following values (default: ch). Both MinerU flash-extract and extract use it.
| Value | Included languages | Description |
|-------|-------------------|-------------|
| ch | Chinese, English, Chinese Traditional | Chinese and English (default) |
| ch_server | Chinese, English, Chinese Traditional, Japanese | Traditional Chinese, handwriting |
| en | English | English only |
| japan | Chinese, English, Chinese Traditional, Japanese | Mainly Japanese |
| korean | Korean, English | Korean |
| chinese_cht | Chinese, English, Chinese Traditional, Japanese | Mainly Traditional Chinese |
| ta | Tamil, English | Tamil |
| te | Telugu, English | Telugu |
| ka | Kannada | Kannada |
| el | Greek, English | Greek |
| th | Thai, English | Thai |
| Value | Script/Family | Included languages |
|-------|--------------|-------------------|
| latin | Latin script | French, German, Afrikaans, Italian, Spanish, Bosnian, Portuguese, Czech, Welsh, Danish, Estonian, Irish, Croatian, Uzbek, Hungarian, Serbian (Latin), Indonesian, Occitan, Icelandic, Lithuanian, Maori, Malay, Dutch, Norwegian, Polish, Slovak, Slovenian, Albanian, Swedish, Swahili, Tagalog, Turkish, Latin, Azerbaijani, Kurdish, Latvian, Maltese, Pali, Romanian, Vietnamese, Finnish, Basque, Galician, Luxembourgish, Romansh, Catalan, Quechua |
| arabic | Arabic script | Arabic, Persian, Uyghur, Urdu, Pashto, Kurdish, Sindhi, Balochi, English |
| cyrillic | Cyrillic script | Russian, Belarusian, Ukrainian, Serbian (Cyrillic), Bulgarian, Mongolian, Abkhazian, Adyghe, Kabardian, Avar, Dargin, Ingush, Chechen, Lak, Lezgin, Tabasaran, Kazakh, Kyrgyz, Tajik, Macedonian, Tatar, Chuvash, Bashkir, Malian, Moldovan, Udmurt, Komi, Ossetian, Buryat, Kalmyk, Tuvan, Sakha, Karakalpak, English |
| east_slavic | East Slavic | Russian, Belarusian, Ukrainian, English |
| devanagari | Devanagari script | Hindi, Marathi, Nepali, Bihari, Maithili, Angika, Bhojpuri, Magahi, Santali, Newari, Konkani, Sanskrit, Haryanvi, English |
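An agent may want to validate a --language value before calling the CLI. The hypothetical helper below (not part of mineru-open-api) accepts exactly the codes listed in the two --language tables above and rejects anything else.

```shell
# Hypothetical check (not part of mineru-open-api): exits 0 if the value is
# one of the documented --language codes, 1 otherwise.
is_supported_language() {
  case "$1" in
    ch|ch_server|en|japan|korean|chinese_cht|ta|te|ka|el|th) return 0 ;;
    latin|arabic|cyrillic|east_slavic|devanagari) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller could fall back to the default with is_supported_language "$lang" || lang=ch, or ask the user instead of silently switching.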
After installing mineru document extractor, you can say the following to the AI to trigger it:
Help me get started with mineru document extractor
Explains what mineru document extractor does, walks through the setup, and runs a quick demo based on your current project
Use mineru document extractor to minerU document extraction — convert PDFs, scanned documents, image...
Invokes mineru document extractor with the right parameters and returns the result directly in the conversation
What can I do with mineru document extractor in my documents & notes workflow?
Lists the top use cases for mineru document extractor, with example commands for each scenario
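Place the skill folder in ~/.claude/skills/mineru-document-extractor/ (personal level, available to all projects) or in .claude/skills/mineru-document-extractor/ (project level). After restarting the AI client, invoke it explicitly with /mineru-document-extractor, or let the AI discover and use it automatically based on context.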
mineru document extractor works with Claude, Cursor, and OpenClaw, extending the capabilities of these AI platforms.
mineru document extractor is free to install and use. See the repository for license information.
mineru document extractor belongs to the "Documents & Notes" category; skills in this category help AI agents perform specialized tasks in this domain.
Automate my documents & notes tasks using mineru document extractor
Identifies repetitive steps in your workflow and sets up mineru document extractor to handle them automatically