document-parser

Extract structured data from PDFs, images, and Word files with layout analysis, table recognition, OCR, seal detection, and directory extraction.

Source: ClawHub. View on ClawSkills.

About document-parser

A high-precision document parsing skill that extracts structured data from PDFs, images, and Word documents.

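Purpose

Parse PDFs, images (JPG/PNG), and Word documents
Layout analysis and structure extraction
Table recognition (HTML/Markdown output)
OCR text recognition
Seal detection
Directory (table of contents) extraction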

Commands

Parse a document

document-parser parse <file path> [options]

Examples:

document-parser parse C:\docs\report.pdf
document-parser parse C:\docs\scan.jpg --layout --table
document-parser parse C:\docs\contract.docx --output markdown
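
When the CLI is driven from a script, invocations like the ones above can be assembled programmatically. The following is a minimal Python sketch, assuming the `document-parser` executable is available on PATH. The helper names `build_parse_command` and `run_parse` are hypothetical and not part of the skill; only the flags documented on this page are used.

```python
import subprocess
from typing import List, Optional


def build_parse_command(
    file_path: str,
    layout: bool = False,
    table: bool = False,
    seal: bool = False,
    output: Optional[str] = None,
    pages: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `document-parser parse` (hypothetical helper)."""
    # The three output formats are the ones listed in the parameter table.
    if output is not None and output not in ("json", "markdown", "both"):
        raise ValueError(f"unsupported output format: {output}")
    cmd = ["document-parser", "parse", file_path]
    if layout:
        cmd.append("--layout")
    if table:
        cmd.append("--table")
    if seal:
        cmd.append("--seal")
    if output is not None:
        cmd += ["--output", output]
    if pages is not None:
        cmd += ["--pages", pages]
    return cmd


def run_parse(file_path: str, **options) -> subprocess.CompletedProcess:
    """Run the command; assumes the CLI is installed and on PATH."""
    return subprocess.run(
        build_parse_command(file_path, **options),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode/stderr yourself instead of raising
    )
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with Windows paths that contain spaces or backslashes.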

Check task status

document-parser status <task ID>

Parameters

| Parameter | Description | Example |
|------|------|------|
| file path | Path to a PDF, image, or Word file | C:\docs\report.pdf |
| --layout | Enable layout analysis | --layout |
| --table | Enable table recognition | --table |
| --seal | Enable seal detection | --seal |
| --output | Output format (json/markdown/both) | --output markdown |
| --pages | Page range | --pages 1-5,8,10-12 |
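
The `--pages` example combines ranges and single pages. As an illustration of how such a value might be read, here is a small Python sketch. It assumes ranges are inclusive and pages are 1-based, which this page does not state explicitly; `expand_page_range` is a hypothetical helper, not part of the skill.

```python
from typing import List


def expand_page_range(spec: str) -> List[int]:
    """Expand a spec such as "1-5,8,10-12" into a sorted list of page numbers.

    Assumption: ranges are inclusive on both ends.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_page_range("1-5,8,10-12"))
# prints [1, 2, 3, 4, 5, 8, 10, 11, 12]
```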

Configuration

Option 1: Environment variables

DOCUMENT_PARSER_API_KEY=your_api_key
DOCUMENT_PARSER_BASE_URL=http://47.111.146.164:8088/taidp/v1/idp/general_parse

Option 2: Configuration file

Create config.json in the skill directory:

{
  "api_key": "your_api_key",
  "base_url": "http://47.111.146.164:8088/taidp/v1/idp/general_parse"
}
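
The two configuration options can be combined in a single lookup. The sketch below is one possible approach in Python, not the skill's actual loader. It assumes that environment variables take precedence over config.json, which this page does not specify, and `load_config` is a hypothetical name.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def load_config(
    config_path: Path = Path("config.json"),
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Resolve api_key and base_url from the environment, then from config.json.

    Assumption: environment variables win over the file when both are set.
    """
    env = os.environ if env is None else env
    file_values = {}
    if config_path.is_file():
        file_values = json.loads(config_path.read_text(encoding="utf-8"))

    api_key = env.get("DOCUMENT_PARSER_API_KEY") or file_values.get("api_key")
    base_url = env.get("DOCUMENT_PARSER_BASE_URL") or file_values.get("base_url")
    if not api_key or not base_url:
        raise RuntimeError(
            "api_key and base_url must be set via environment variables or config.json"
        )
    return {"api_key": api_key, "base_url": base_url}
```

Keeping the API key in an environment variable rather than in config.json makes it less likely that the key is committed to version control along with the skill folder.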

Output format

The returned structured JSON contains:

pages: array of parsed pages
elements: layout elements (text, tables, images, etc.)
markdown: text in Markdown format
data: summary statistics
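
To show how a caller might consume this result, here is a short Python sketch. It relies only on the four field names listed above and assumes that all four sit at the top level of the JSON and that `elements` is a list; the actual nesting is not documented on this page. `summarize_result` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a parsed result using only the documented field names.

    Assumption: pages, elements, markdown, and data are top-level keys.
    """
    pages = result.get("pages") or []
    elements = result.get("elements") or []
    markdown = result.get("markdown") or ""
    return {
        "page_count": len(pages),
        "element_count": len(elements),
        "markdown_chars": len(markdown),
        "has_summary": bool(result.get("data")),
    }
```

Using `.get()` with a fallback keeps the helper from failing when a field is missing, for example when only Markdown output was requested.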

Dependencies

requests
python-docx (Word support)
Pillow (image processing)

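Error codes

| Error code | Message | Description |
|--------|------|------|
| 10000 | Success | Recognition succeeded |
| 10001 | Missing parameter | A required parameter is missing |
| 10002 | Invalid parameter | A parameter is invalid |
| 10003 | Invalid file | The file format is invalid |
| 10004 | Failed to recognize | Recognition failed |
| 10005 | Internal error | Internal error |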
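
Client code can turn these codes into exceptions. The following Python sketch mirrors the table above; the class and function names (`ParseStatus`, `DocumentParserError`, `check_status`) are hypothetical, and where the code appears in a response is not documented here, so the caller is assumed to have already extracted it as an integer.

```python
from enum import IntEnum


class ParseStatus(IntEnum):
    """Error codes as listed in the table above."""
    SUCCESS = 10000
    MISSING_PARAMETER = 10001
    INVALID_PARAMETER = 10002
    INVALID_FILE = 10003
    FAILED_TO_RECOGNIZE = 10004
    INTERNAL_ERROR = 10005


# Messages copied from the "Message" column of the table.
MESSAGES = {
    ParseStatus.SUCCESS: "Success",
    ParseStatus.MISSING_PARAMETER: "Missing parameter",
    ParseStatus.INVALID_PARAMETER: "Invalid parameter",
    ParseStatus.INVALID_FILE: "Invalid file",
    ParseStatus.FAILED_TO_RECOGNIZE: "Failed to recognize",
    ParseStatus.INTERNAL_ERROR: "Internal error",
}


class DocumentParserError(Exception):
    """Raised for any code other than 10000 (hypothetical wrapper)."""

    def __init__(self, code: int, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def check_status(code: int) -> None:
    """Return silently on success; raise DocumentParserError otherwise."""
    try:
        status = ParseStatus(code)
    except ValueError:
        raise DocumentParserError(code, "Unknown error code") from None
    if status is not ParseStatus.SUCCESS:
        raise DocumentParserError(code, MESSAGES[status])
```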

Prompt examples

After installing document-parser, you can say things like these to the AI to trigger it:

U: Help me get started with document-parser

A: Explains what document-parser does, walks through the setup, and runs a quick demo based on your current project

U: Use document-parser to extract structured data from PDFs, images, and Word files with layo...

A: Invokes document-parser with the right parameters and returns the result directly in the conversation

U: What can I do with document-parser in my documents & notes workflow?

A: Lists the top use cases for document-parser, with example commands for each scenario

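FAQ

How do I install document-parser?

Place the skill folder in ~/.claude/skills/document-parser/ (personal level, available to all projects) or in .claude/skills/document-parser/ (project level). After restarting the AI client, invoke it explicitly with /document-parser, or let the AI discover and use it automatically based on context.

Which AI platforms does document-parser support?

document-parser supports Claude, Cursor, and OpenClaw, and integrates with these platforms to extend their capabilities.

Is document-parser free?

document-parser is free to install and use. See the repository for license information.

What does document-parser do?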

Extract structured data from PDFs, images, and Word files with layout analysis, table recognition, OCR, seal detection, and directory extraction.

Which category does document-parser belong to?

document-parser belongs to the "Documents & Notes" category, whose skills help AI agents perform specialized tasks in this domain.

Use cases

Getting Started with document-parser
Automate Documents & Notes Workflows with document-parser
Team Collaboration with document-parser
