A PDF document parsing tool built on a local MinerU installation; it converts PDFs to Markdown, JSON, and other machine-readable formats.
Source: ClawHub.
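Method 1: Command-line installation (recommended)
Recommended (clawhub does not need to be installed first):
npx clawhub@latest --dir ~/.claude/skills install pdf-parser-mineru
Or use the clawhub CLI (must be installed first):
clawhub --dir ~/.claude/skills install pdf-parser-mineru
⚠️ Requires Node.js 18+. No Node? Use Method 2 below to download the ZIP directly.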
Method 2: Manual download (no Node required)
Download the ZIP, extract it, place the folder at the following path, and restart the agent:
Installation path: ~/.claude/skills/pdf-parser-mineru/
---
name: pdf-process-mineru
description: PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.
---
pdf_to_markdown: Convert PDF documents to Markdown, preserving document structure, formulas, tables, and images.
Description: Uses MinerU to parse PDF documents and output Markdown, with support for OCR, formula recognition, table extraction, and other features.
Parameters:
- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend; one of hybrid-auto-engine (default), pipeline, vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), or ja (Japanese); defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition; defaults to true
- enable_table (boolean, optional): Whether to enable table extraction; defaults to true
- start_page (integer, optional): First page to parse (0-based); defaults to 0
- end_page (integer, optional): Last page to parse (0-based); defaults to -1, meaning parse all pages
Return Value:
{
"success": true,
"output_path": "/path/to/output",
"markdown_content": "Converted Markdown content...",
"images": ["List of image paths"],
"tables": ["List of table information"],
"formula_count": 10
}
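A minimal Python sketch of consuming this return value, assuming you have already captured the JSON the script emits as a string (how the script reports it, for example on stdout, is not stated here). `summarize_markdown_result` is a hypothetical helper, and the handling of `"success": false` is an assumption, since only the success case is documented.

```python
import json
from typing import Any


def summarize_markdown_result(raw: str) -> dict[str, Any]:
    """Summarize the JSON documented for pdf_to_markdown (hypothetical helper)."""
    result = json.loads(raw)
    if not result.get("success"):
        # Assumption: failures report "success": false; error fields are not documented.
        raise RuntimeError(f"PDF parsing failed: {result}")
    return {
        "output_path": result["output_path"],
        "markdown_chars": len(result.get("markdown_content", "")),
        "image_count": len(result.get("images", [])),
        "table_count": len(result.get("tables", [])),
        "formula_count": result.get("formula_count", 0),
    }


if __name__ == "__main__":
    # Illustrative input shaped like the documented return value.
    sample = json.dumps({
        "success": True,
        "output_path": "/path/to/output",
        "markdown_content": "# Title\n\nBody text",
        "images": ["images/fig1.png"],
        "tables": [],
        "formula_count": 10,
    })
    print(summarize_markdown_result(sample))
```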
Examples:
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'
# Use a specific backend
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'
# Parse specific pages
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
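Writing the JSON argument by hand is error-prone, especially when paths contain quotes or spaces. The sketch below builds the same command from Python with `json.dumps` and checks the parameters against the values documented above. `build_command` is a hypothetical helper, and `SCRIPT` reuses the path from the examples, which may differ from where the skill is actually installed.

```python
import json
import shlex

# Tool names, backends, and optional parameters documented above.
TOOLS = {"pdf_to_markdown", "pdf_to_json"}
BACKENDS = {"hybrid-auto-engine", "pipeline", "vlm-auto-engine"}
OPTIONS = {"backend", "language", "enable_formula", "enable_table", "start_page", "end_page"}
# Script path copied from the examples; adjust it to where the skill is actually installed.
SCRIPT = ".claude/skills/pdf-process/script/pdf_parser.py"


def build_command(tool: str, file_path: str, output_dir: str, **options) -> list[str]:
    """Return an argv list for pdf_parser.py (hypothetical helper, not part of the skill)."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    unknown = set(options) - OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if options.get("backend", "hybrid-auto-engine") not in BACKENDS:
        raise ValueError(f"unknown backend: {options['backend']}")
    start = options.get("start_page", 0)
    end = options.get("end_page", -1)
    # end_page == -1 means all pages; otherwise it must not come before start_page.
    if start < 0 or (end != -1 and end < start):
        raise ValueError(f"invalid page range: {start}..{end}")
    arguments = {"file_path": file_path, "output_dir": output_dir, **options}
    # json.dumps always produces valid JSON, so no hand-written quoting is needed.
    return ["python", SCRIPT, json.dumps({"name": tool, "arguments": arguments})]


if __name__ == "__main__":
    argv = build_command("pdf_to_markdown", "/path/to/document.pdf", "/path/to/output",
                         backend="pipeline", start_page=0, end_page=5)
    # Print a shell-safe version of the command; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Returning an argv list rather than a single string lets `subprocess.run(argv)` skip the shell entirely, so no quoting is involved at all.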
---
pdf_to_json: Convert PDF documents to JSON, including detailed layout and structural information.
Description: Uses MinerU to parse PDF documents and output JSON containing structured information such as text blocks, images, tables, and formulas.
Parameters:
- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend; one of hybrid-auto-engine (default), pipeline, vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), or ja (Japanese); defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition; defaults to true
- enable_table (boolean, optional): Whether to enable table extraction; defaults to true
- start_page (integer, optional): First page to parse (0-based); defaults to 0
- end_page (integer, optional): Last page to parse (0-based); defaults to -1, meaning parse all pages
Return Value:
{
"success": true,
"output_path": "/path/to/output.json",
"pages": [
{
"page_no": 0,
"page_size": [595, 842],
"blocks": [
{
"type": "text",
"text": "Text content",
"bbox": [x0, y0, x1, y1]
}
],
"images": [],
"tables": [],
"formulas": []
}
],
"metadata": {
"total_pages": 10,
"author": "Author",
"title": "Title"
}
}
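A sketch of walking the nested `pages` and `blocks` structure once the result has been loaded with `json.loads`. It assumes `bbox` is ordered as left, top, right, bottom (`[x0, y0, x1, y1]`), which the structure above does not spell out; `page_texts` and `bbox_size` are hypothetical helpers, and the sample coordinates are made up.

```python
import json


def page_texts(result: dict) -> dict[int, str]:
    """Join the "text" blocks of each page, keyed by page_no."""
    texts = {}
    for page in result.get("pages", []):
        # Non-text blocks are skipped; images, tables, and formulas also have their own lists.
        lines = [block["text"] for block in page.get("blocks", []) if block.get("type") == "text"]
        texts[page["page_no"]] = "\n".join(lines)
    return texts


def bbox_size(bbox: list[float]) -> tuple[float, float]:
    """Width and height of a bounding box, assuming [x0, y0, x1, y1] ordering."""
    x0, y0, x1, y1 = bbox
    return x1 - x0, y1 - y0


if __name__ == "__main__":
    result = json.loads(json.dumps({"pages": [{"page_no": 0, "page_size": [595, 842], "blocks": [
        {"type": "text", "text": "Text content", "bbox": [72, 72, 523, 96]}]}]}))
    print(page_texts(result))
    print(bbox_size(result["pages"][0]["blocks"][0]["bbox"]))
```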
Examples:
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'
# Use a specific backend and language
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "hybrid-auto-engine", "language": "ch"}}'
---
# Update pip and install uv
pip install --upgrade pip
pip install uv
# Install MinerU (including all features)
uv pip install -U "mineru[all]"
# Check if MinerU is installed successfully
mineru --version
# Test basic functionality
mineru --help
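To confirm the install from a script rather than by eye, a small Python check can look for the `mineru` executable on `PATH` and run `mineru --version`. This is a sketch: `mineru_version` is a hypothetical helper, and it assumes the version string is printed to stdout or, failing that, stderr.

```python
import shutil
import subprocess
from typing import Optional


def mineru_version(executable: str = "mineru") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Fall back to stderr in case the CLI prints its version there.
    return (completed.stdout or completed.stderr).strip()


if __name__ == "__main__":
    version = mineru_version()
    print(version if version else "mineru not found on PATH; run the install steps above")
```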
Memory requirements:
- pipeline backend: 16 GB minimum, 32 GB+ recommended
- hybrid/vlm backend: 16 GB minimum, 32 GB+ recommended
Hardware requirements:
- pipeline backend: can run on CPU only
- hybrid/vlm backend: requires an NVIDIA GPU (Volta architecture or newer) or Apple Silicon
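These hardware notes translate into a simple backend choice. In this sketch, detecting an NVIDIA GPU is left to the caller, Apple Silicon is detected with the standard `platform` module, and `choose_backend` and `is_apple_silicon` are hypothetical helpers.

```python
import platform


def is_apple_silicon() -> bool:
    """True on macOS arm64; a Python running under Rosetta reports x86_64 instead."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def choose_backend(has_nvidia_volta_or_newer: bool, apple_silicon: bool) -> str:
    """Pick a backend following the hardware notes above (hypothetical helper)."""
    if has_nvidia_volta_or_newer or apple_silicon:
        return "hybrid-auto-engine"
    # pipeline is the backend documented as able to run on CPU only.
    return "pipeline"


if __name__ == "__main__":
    # Pass True for the first argument if you know an NVIDIA Volta+ GPU is present.
    print(choose_backend(has_nvidia_volta_or_newer=False, apple_silicon=is_apple_silicon()))
```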
Python version and dependencies:
- Use Python 3.10-3.13
- On Windows, only Python 3.10-3.12 is supported (ray does not support 3.13)
- Installing with uv pip install resolves most dependency conflicts
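A quick interpreter check against the ranges above, as a sketch; `python_supported` is a hypothetical helper, and it relies on `sys.platform` reporting `win32` on Windows.

```python
import sys


def python_supported(version: tuple[int, int], platform_name: str) -> bool:
    """Check the stated range: 3.10-3.13, or 3.10-3.12 on Windows."""
    upper = (3, 12) if platform_name == "win32" else (3, 13)
    return (3, 10) <= version <= upper


if __name__ == "__main__":
    ok = python_supported(tuple(sys.version_info[:2]), sys.platform)
    print("supported" if ok else f"unsupported Python {sys.version.split()[0]} on {sys.platform}")
```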
If you run out of memory:
- Use the pipeline backend
- Limit the pages parsed with start_page and end_page
- Reduce virtual memory allocation
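To limit memory use, a long document can be parsed in several calls, each covering a slice of pages via `start_page` and `end_page`. This sketch assumes `end_page` is inclusive, which the parameter description does not state; `page_chunks` is a hypothetical helper, and the page count could come from `metadata.total_pages` in the `pdf_to_json` output.

```python
def page_chunks(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split pages 0..total_pages-1 into (start_page, end_page) pairs.

    Assumes end_page is inclusive; if your version treats it as exclusive, add 1.
    """
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [(start, min(start + chunk_size, total_pages) - 1)
            for start in range(0, total_pages, chunk_size)]


if __name__ == "__main__":
    # Each pair becomes the start_page/end_page arguments of one call.
    for start, end in page_chunks(total_pages=23, chunk_size=10):
        print({"start_page": start, "end_page": end})
```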
To speed up parsing:
- Enable GPU acceleration
- Use the hybrid-auto-engine backend
- Disable features you don't need (formula and table recognition)
To improve OCR results:
- Specify the correct document language
- Make sure the backend supports OCR (use pipeline or hybrid-*)
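The speed and OCR tips map onto the tool's optional parameters. A sketch, with `tuned_arguments` as a hypothetical helper; how much disabling formula and table recognition actually saves depends on the document and hardware.

```python
import json
from typing import Optional


def tuned_arguments(file_path: str, output_dir: str, *, fast: bool = False,
                    language: Optional[str] = None) -> dict:
    """Build an "arguments" object applying the tuning tips (hypothetical helper)."""
    # hybrid-auto-engine is both the speed recommendation and an OCR-capable backend.
    args = {"file_path": file_path, "output_dir": output_dir, "backend": "hybrid-auto-engine"}
    if fast:
        # Skip formula and table recognition when the document does not need them.
        args["enable_formula"] = False
        args["enable_table"] = False
    if language:
        # Explicit OCR language code (e.g. "en", "ch", "ja") instead of auto-detection.
        args["language"] = language
    return args


if __name__ == "__main__":
    request = {"name": "pdf_to_markdown",
               "arguments": tuned_arguments("/path/to/scan.pdf", "/path/to/output",
                                            fast=True, language="ch")}
    print(json.dumps(request))
```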
After installing pdf-parser-mineru, you can say the following to the AI to trigger it:
Help me get started with pdf-parser-mineru
Explains what pdf-parser-mineru does, walks through the setup, and runs a quick demo based on your current project
Use pdf-parser-mineru: PDF document parsing tool based on local MinerU, supports convertin...
Invokes pdf-parser-mineru with the right parameters and returns the result directly in the conversation
What can I do with pdf-parser-mineru in my documents & notes workflow?
Lists the top use cases for pdf-parser-mineru, with example commands for each scenario
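Place the skill folder in ~/.claude/skills/pdf-parser-mineru/ (user level, available to all projects) or in .claude/skills/pdf-parser-mineru/ (project level). After restarting the AI client, invoke it explicitly with /pdf-parser-mineru, or let the AI discover and use it automatically based on context.
pdf-parser-mineru supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.
pdf-parser-mineru is free to install and use. See the repository for license information.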
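A quick way to check which of the two locations holds the skill, sketched in Python; `installed_skill_dirs` is a hypothetical helper, and it only checks that the folder exists, not that the client has loaded it.

```python
from pathlib import Path

SKILL = "pdf-parser-mineru"


def installed_skill_dirs(home: Path, project_root: Path) -> list[Path]:
    """Return the user-level and project-level skill folders that exist."""
    candidates = [
        home / ".claude" / "skills" / SKILL,          # user level, all projects
        project_root / ".claude" / "skills" / SKILL,  # project level
    ]
    return [p for p in candidates if p.is_dir()]


if __name__ == "__main__":
    found = installed_skill_dirs(Path.home(), Path.cwd())
    print(found or "skill folder not found in either location")
```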
pdf-parser-mineru belongs to the "Documents & Notes" category, whose skills help AI agents perform specialized tasks in this domain.
Automate my documents & notes tasks using pdf-parser-mineru
Identifies repetitive steps in your workflow and sets up pdf-parser-mineru to handle them automatically