Links to PDFs

links-to-pdfs

Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.

Source: ClawHub.


About Links to PDFs

---
name: scraper
description: Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.
---

docs-scraper

CLI tool that scrapes documents from various sources into local PDF files using browser automation.

Installation

npm install -g docs-scraper

Quick start

Scrape any document URL to PDF:

docs-scraper scrape https://example.com/document

Returns local path: ~/.docs-scraper/output/1706123456-abc123.pdf

Basic scraping

Scrape with daemon (recommended, keeps browser warm):

docs-scraper scrape <url>

Scrape with named profile (for authenticated sites):

docs-scraper scrape <url> -p <profile-name>

Scrape with pre-filled data (e.g., email for DocSend):

docs-scraper scrape <url> -D email=user@example.com

Direct mode (single-shot, no daemon):

docs-scraper scrape <url> --no-daemon

Authentication workflow

When a document requires authentication (login, email verification, passcode):

Initial scrape returns a job ID:

```bash
docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
# Job ID: abc123
```

Retry with data:

```bash
docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234
```
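The two steps above can be chained in a script by pulling the job ID out of the blocked scrape's output. The helper below is a minimal sketch that assumes the output contains a line of the form `Job ID: abc123`, as in the example; the name `extract_job_id` is hypothetical, and whether the CLI prints this to stdout or stderr is not stated in this document.

```bash
# Print the first job ID found on stdin.
# Assumes a line shaped like "Job ID: abc123" (format taken from the example above).
extract_job_id() {
  sed -n 's/.*Job ID:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Possible usage (not run here):
#   job_id=$(docs-scraper scrape "$url" 2>&1 | extract_job_id)
#   [ -n "$job_id" ] && docs-scraper update "$job_id" -D email=user@example.com
```

If no matching line is present, the helper prints nothing, so an empty result can be treated as "not blocked".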

Profile management

Profiles store session cookies for authenticated sites.

docs-scraper profiles list     # List saved profiles
docs-scraper profiles clear    # Clear all profiles
docs-scraper scrape <url> -p myprofile  # Use a profile

Daemon management

The daemon keeps browser instances warm for faster scraping.

docs-scraper daemon status     # Check status
docs-scraper daemon start      # Start manually
docs-scraper daemon stop       # Stop daemon

Note: Daemon auto-starts when running scrape commands.

Cleanup

PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.

Manual cleanup:

docs-scraper cleanup                    # Delete all PDFs
docs-scraper cleanup --older-than 1h    # Delete PDFs older than 1 hour
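
Before deleting anything, it can help to preview which files would count as old. The following is a rough sketch that uses file modification times via `find -mmin` (available in GNU and BSD `find`); the function name `list_stale_pdfs` is hypothetical, and the CLI's own definition of "older than" may differ from this one.

```bash
# List PDFs under a directory that were last modified more than N minutes ago.
# Defaults mirror the documented output directory and the 1 hour threshold.
list_stale_pdfs() {
  dir=${1:-"$HOME/.docs-scraper/output"}
  minutes=${2:-60}
  find "$dir" -type f -name '*.pdf' -mmin "+$minutes"
}

# Possible usage (not run here):
#   list_stale_pdfs                      # preview, then:
#   docs-scraper cleanup --older-than 1h
```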

Job management

docs-scraper jobs list         # List blocked jobs awaiting auth

Supported sources

Direct PDF links - Downloads PDF directly
Notion pages - Exports Notion page to PDF
DocSend documents - Handles DocSend viewer
LLM fallback - Uses Claude API for any other webpage
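
To see which source a given URL is likely to fall under, the documented URL patterns can be approximated with a shell `case` statement. This is only an illustration: the function name `classify_url` is hypothetical, and the tool's real routing logic is not shown in this document and may differ.

```bash
# Guess which scraper handles a URL, based on the patterns listed in this document.
classify_url() {
  url=${1#*://}          # drop the scheme, if any
  url=${url%%[?#]*}      # drop query string and fragment
  case "$url" in
    *.pdf)
      echo "DirectPdfScraper" ;;
    docsend.com/view/*|docsend.com/v/*|*.docsend.com/view/*|*.docsend.com/v/*)
      echo "DocsendScraper" ;;
    notion.so/*|*.notion.site/*)
      echo "NotionScraper" ;;
    *)
      echo "LlmFallbackScraper" ;;
  esac
}
```

For example, `classify_url https://org-a.docsend.com/view/abc` prints `DocsendScraper`, while an arbitrary article URL falls through to `LlmFallbackScraper`, which is the case that needs ANTHROPIC_API_KEY.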

---

Scraper Reference

Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.

DirectPdfScraper

Handles: URLs ending in .pdf

Data fields: None (downloads directly)

Example:

docs-scraper scrape https://example.com/document.pdf

---

DocsendScraper

Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)

URL patterns:

Documents: https://docsend.com/view/{id} or https://docsend.com/v/{id}
Folders: https://docsend.com/view/s/{id}
Subdomains: https://{subdomain}.docsend.com/view/{id}

Data fields:

| Field | Type | Description |
|-------|------|-------------|
| email | email | Email address for document access |
| password | password | Passcode/password for protected documents |
| name | text | Your name (required for NDA-gated documents) |

Examples:

# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com

# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123

# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"

# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123

Notes:

DocSend may require any combination of email, password, and name
Folders are scraped as a table of contents PDF with document links
The scraper auto-checks NDA checkboxes when name is provided
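
Because DocSend may ask for any combination of these fields, a small wrapper can pass only the ones that are set. The sketch below assumes the values live in environment variables named `DOCSEND_EMAIL`, `DOCSEND_PASSWORD`, and `DOCSEND_NAME`; those names, the `scrape_docsend` function, and the `DOCS_SCRAPER_BIN` override are all hypothetical conveniences, not part of docs-scraper.

```bash
# Scrape a DocSend URL, adding a -D flag for each field that has a value.
scrape_docsend() {
  url=$1
  set -- scrape "$url"
  if [ -n "${DOCSEND_EMAIL:-}" ]; then set -- "$@" -D "email=$DOCSEND_EMAIL"; fi
  if [ -n "${DOCSEND_PASSWORD:-}" ]; then set -- "$@" -D "password=$DOCSEND_PASSWORD"; fi
  if [ -n "${DOCSEND_NAME:-}" ]; then set -- "$@" -D "name=$DOCSEND_NAME"; fi
  # DOCS_SCRAPER_BIN lets the command be swapped out, e.g. for a dry run.
  "${DOCS_SCRAPER_BIN:-docs-scraper}" "$@"
}
```

Building the argument list with `set --` keeps values that contain spaces, such as a full name, as single arguments.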

---

NotionScraper

Handles: notion.so/*, *.notion.site/*

Data fields:

| Field | Type | Description |
|-------|------|-------------|
| email | email | Notion account email |
| password | password | Notion account password |

Examples:

# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123

# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
  -D email=user@example.com -D password=mypassword

# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123

Notes:

Public Notion pages don't require authentication
Toggle blocks are automatically expanded before PDF generation
Uses session profiles to persist login across scrapes

---

LlmFallbackScraper

Handles: Any URL not matched by other scrapers (automatic fallback)

Data fields: Dynamic - determined by Claude analyzing the page

The LLM scraper uses Claude to analyze the page HTML and detect:

Login forms (extracts field names dynamically)
Cookie banners (auto-dismisses)
Expandable content (auto-expands)
CAPTCHAs (reports as blocked)
Paywalls (reports as blocked)

Common dynamic fields:

| Field | Type | Description |
|-------|------|-------------|
| email | email | Login email (if detected) |
| password | password | Login password (if detected) |
| username | text | Username (if login uses username) |

Examples:

# Generic webpage (no auth)
docs-scraper scrape https://example.com/article

# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
  -D email=user@example.com -D password=secret

# When blocked, check the job for required fields
docs-scraper jobs list
# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret

Notes:

Requires ANTHROPIC_API_KEY environment variable
Field names are extracted from the page's actual form fields
Limited to 2 login attempts before failing
CAPTCHAs require manual intervention

---

Data field summary

| Scraper | email | password | name | Other |
|---------|-------|----------|------|-------|
| DirectPdf | - | - | - | - |
| DocSend | ✓ | ✓ | ✓ | - |
| Notion | ✓ | ✓ | - | - |
| LLM Fallback | ✓ | ✓ | - | Dynamic* |

*Fields detected dynamically from page analysis

Environment setup (optional)

Only needed for LLM fallback scraper:

export ANTHROPIC_API_KEY=your_key
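
A script that may hit the LLM fallback can check for the key up front instead of failing partway through a run. This is a minimal sketch; the function name `require_api_key` is hypothetical, and how docs-scraper itself reports a missing key is not described here.

```bash
# Return non-zero with a message on stderr if ANTHROPIC_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; the LLM fallback scraper needs it" >&2
    return 1
  fi
}

# Possible usage (not run here):
#   require_api_key && docs-scraper scrape https://example.com/article
```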

Optional browser settings:

export BROWSER_HEADLESS=true   # Set false for debugging

Common patterns

Archive a Notion page:

docs-scraper scrape https://notion.so/My-Page-abc123

Download protected DocSend:

docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=user@example.com -D password=1234

Batch scraping with profiles:

docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysite
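
For longer lists, the same pattern can be driven from a file of URLs. The loop below is a sketch under a few assumptions: the function name `scrape_all` and the `DOCS_SCRAPER_BIN` override are hypothetical, and it assumes the CLI exits with a non-zero status when a scrape fails, which this document does not confirm.

```bash
# Read URLs from stdin (one per line) and scrape each with the given profile.
# Blank lines and lines starting with '#' are skipped.
scrape_all() {
  profile=$1
  while IFS= read -r url; do
    case "$url" in
      ''|'#'*) continue ;;
    esac
    # </dev/null stops the scraper from consuming the URL list on stdin.
    "${DOCS_SCRAPER_BIN:-docs-scraper}" scrape "$url" -p "$profile" </dev/null \
      || echo "failed: $url" >&2
  done
}

# Possible usage (not run here):
#   scrape_all mysite < urls.txt
```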

Output

Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf)
Blocked: Job ID + required credential types

Troubleshooting

Timeout: docs-scraper daemon stop && docs-scraper daemon start
Auth fails: docs-scraper jobs list to check pending jobs
Disk full: docs-scraper cleanup to remove old PDFs
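
The timeout remedy above can be folded into a single retry. The wrapper below is a sketch, not documented behavior: the name `scrape_with_restart` and the `DOCS_SCRAPER_BIN` override are hypothetical, and it assumes a timed-out scrape exits with a non-zero status.

```bash
# Try a scrape; on failure, restart the daemon and try once more.
scrape_with_restart() {
  bin=${DOCS_SCRAPER_BIN:-docs-scraper}
  "$bin" scrape "$@" && return 0
  echo "scrape failed; restarting daemon and retrying once" >&2
  "$bin" daemon stop            # ignore the result: the daemon may already be down
  "$bin" daemon start || return 1
  "$bin" scrape "$@"
}
```

A single retry keeps the wrapper from looping forever on failures that a restart cannot fix, such as a document that is blocked pending authentication.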

Prompt examples

After installing Links to PDFs, you can say things like the following to the AI to trigger it.

User: Help me get started with Links to PDFs

A: Explains what Links to PDFs does, walks through the setup, and runs a quick demo based on your current project

User: Use Links to PDFs to scrape documents from Notion, DocSend, PDFs, and other sources into...

A: Invokes Links to PDFs with the right parameters and returns the result directly in the conversation

User: What can I do with Links to PDFs in my documents & notes workflow?

A: Lists the top use cases for Links to PDFs, with example commands for each scenario

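FAQ

How do I install Links to PDFs?

Place the skill folder in ~/.claude/skills/links-to-pdfs/ (personal level, available to all projects) or in .claude/skills/links-to-pdfs/ (project level). After restarting the AI client, invoke it explicitly with /links-to-pdfs, or let the AI discover and use it automatically based on context.

Which AI platforms does Links to PDFs support?

Links to PDFs supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.

Is Links to PDFs free?

Links to PDFs is free to install and use. Check the repository for license information.

What does Links to PDFs do?

Scrape documents from Notion, DocSend, PDFs, and other sources into local PDF files. Use when the user needs to download, archive, or convert web documents to PDF format. Supports authentication flows for protected documents and session persistence via profiles. Returns local file paths to downloaded PDFs.

Which category does Links to PDFs belong to?

Links to PDFs belongs to the "Documents & Notes" category; skills in this category help AI agents perform specialized tasks in this domain.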

