smart_ocr

Extract text from images and scanned documents using PaddleOCR - supports 100+ languages

Source: ClawHub. View on ClawSkills.

1.2k downloads

0 favorites

13 views

About smart_ocr

---
name: smart-ocr
description: Extract text from images and scanned documents using PaddleOCR - supports 100+ languages
author: claude-office-skills
version: "1.0"
tags: [ocr, paddleocr, text-extraction, multilingual, image]
models: [claude-sonnet-4, claude-opus-4]
tools: [computer, code_execution, file_operations]
library:
  name: PaddleOCR
  url: https://github.com/PaddlePaddle/PaddleOCR
  stars: 69k
---

Smart OCR Skill

Overview

This skill enables intelligent text extraction from images and scanned documents using PaddleOCR - a leading OCR engine supporting 100+ languages. Extract text from photos, screenshots, scanned PDFs, and handwritten documents with high accuracy.

How to Use

Provide the image or scanned document
Optionally specify language(s) to detect
I'll extract text with position and confidence data

Example prompts:

"Extract all text from this screenshot"
"OCR this scanned PDF document"
"Read the text from this business card photo"
"Extract Chinese and English text from this image"

Domain Knowledge

PaddleOCR Fundamentals

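from paddleocr import PaddleOCR

# Initialize the OCR engine (parameter names follow the PaddleOCR 2.x API)
ocr = PaddleOCR(use_angle_cls=True, lang='en')

# Run OCR on an image; cls=True applies the angle classifier
result = ocr.ocr('image.png', cls=True)

# Result structure: one entry per image, each a list of [box, (text, confidence)]
# result[0] may be None when no text is detected, so fall back to an empty list
for line in result[0] or []:
    box = line[0]      # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    text = line[1][0]  # Extracted text
    conf = line[1][1]  # Confidence score
    print(f"{text} ({conf:.2f})")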
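The nested result structure shown above can be explored without running the model. The following is a minimal sketch that uses a hand-made `sample_result` (illustrative values, not real OCR output) in the same shape, plus a hypothetical helper `summarize` that is not part of PaddleOCR. It joins the recognized text and averages the confidence scores for the first image.

```python
# Hand-made sample in the result shape described above: one entry per
# input image, each entry a list of [box, (text, confidence)] items.
# The values are illustrative, not real OCR output.
sample_result = [
    [
        [[[10, 10], [120, 10], [120, 30], [10, 30]], ("Hello", 0.98)],
        [[[10, 40], [140, 40], [140, 60], [10, 60]], ("World", 0.90)],
    ]
]


def summarize(result):
    """Return (joined_text, mean_confidence) for the first image.

    `summarize` is a hypothetical helper, not part of PaddleOCR.
    """
    page = result[0] if result else None
    if not page:  # covers None and an empty list (no text detected)
        return "", 0.0
    texts = [item[1][0] for item in page]
    confs = [item[1][1] for item in page]
    return "\n".join(texts), sum(confs) / len(confs)


text, mean_conf = summarize(sample_result)
print(text)
print(f"mean confidence: {mean_conf:.2f}")
```

Returning an empty string and `0.0` for a page with no detections keeps downstream code free of `None` checks; whether that default suits a given pipeline is a design choice.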

Supported Languages

# Common language codes (check the language list shipped with your
# installed PaddleOCR version before relying on a code)
languages = {
    'en': 'English',
    'ch': 'Chinese (Simplified)',
    'chinese_cht': 'Chinese (Traditional)',
    'japan': 'Japanese',
    'korean': 'Korean',
    'french': 'French',
    'german': 'German',
    'es': 'Spanish',
    'ru': 'Russian',
    'arabic': 'Arabic',
    'hi': 'Hindi',
    'vi': 'Vietnamese',
    'th': 'Thai',  # availability depends on the installed version
    # ... 100+ languages supported
}

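# Use a specific language (one recognition model per engine instance)
ocr = PaddleOCR(lang='ch')     # Chinese; this model also recognizes English
ocr = PaddleOCR(lang='japan')  # Japanese
# Note: 'multilingual' does not appear among the documented language codes,
# and there is no auto-detect mode; create one engine per expected language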

Configuration Options

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    # Detection settings
    det_model_dir=None,         # Custom detection model
    det_limit_side_len=960,     # Max side length for detection
    det_db_thresh=0.3,          # Binarization threshold
    det_db_box_thresh=0.5,      # Box score threshold
    
    # Recognition settings
    rec_model_dir=None,         # Custom recognition model
    rec_char_dict_path=None,    # Custom character dictionary
    
    # Angle classification
    use_angle_cls=True,         # Enable angle classification
    cls_model_dir=None,         # Custom classification model
    
    # Language
    lang='en',                  # Language code
    
    # Performance
    use_gpu=True,               # Use GPU if available
    gpu_mem=500,                # GPU memory limit (MB)
    enable_mkldnn=True,         # CPU optimization
    
    # Output
    show_log=False,             # Suppress logs
)

Processing Different Sources

Image Files

# Single image
result = ocr.ocr('image.png')

# Multiple images
images = ['img1.png', 'img2.png', 'img3.png']
for img in images:
    result = ocr.ocr(img)
    lines = process_ocr_result(result)  # helper defined under "Result Processing"

PDF Files (Scanned)

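import os
import tempfile

from pdf2image import convert_from_path

def ocr_pdf(pdf_path):
    """OCR a scanned PDF and return its text, page by page."""
    # Convert PDF pages to PIL images (pdf2image requires Poppler to be installed)
    images = convert_from_path(pdf_path)

    all_text = []
    # The temporary directory and its page images are removed on exit
    with tempfile.TemporaryDirectory() as tmp_dir:
        for i, img in enumerate(images):
            # Save the page as a temporary image
            temp_path = os.path.join(tmp_dir, f'page_{i}.png')
            img.save(temp_path)

            # OCR the page image
            result = ocr.ocr(temp_path)

            # Extract text; result[0] may be None for a blank page
            page_text = '\n'.join(line[1][0] for line in (result[0] or []))
            all_text.append(f"--- Page {i+1} ---\n{page_text}")

    return '\n\n'.join(all_text)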

URLs and Bytes

import requests

# From URL: download the image and pass the raw bytes
# (ocr() in PaddleOCR 2.x accepts a path, bytes, or a NumPy array, not BytesIO)
response = requests.get('https://example.com/image.png', timeout=30)
response.raise_for_status()
result = ocr.ocr(response.content)

# From bytes
with open('image.png', 'rb') as f:
    img_bytes = f.read()
result = ocr.ocr(img_bytes)

Result Processing

def process_ocr_result(result):
    """Process OCR result into structured data."""
    
    # result[0] may be None when no text is detected
    page = result[0] if result else None
    lines = []
    for line in page or []:
        box = line[0]
        text = line[1][0]
        confidence = line[1][1]
        
        # Calculate bounding box
        x_coords = [p[0] for p in box]
        y_coords = [p[1] for p in box]
        
        lines.append({
            'text': text,
            'confidence': confidence,
            'bbox': {
                'left': min(x_coords),
                'top': min(y_coords),
                'right': max(x_coords),
                'bottom': max(y_coords),
            },
            'raw_box': box
        })
    
    return lines

# Sort by position (top to bottom, left to right)
def sort_by_position(lines):
    return sorted(lines, key=lambda x: (x['bbox']['top'], x['bbox']['left']))

Text Layout Reconstruction

def reconstruct_layout(result, line_threshold=10):
    """Reconstruct text layout from OCR results."""
    
    lines = process_ocr_result(result)
    lines = sort_by_position(lines)
    
    # Group into logical lines
    text_lines = []
    current_line = []
    current_y = None
    
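    for line in lines:
        y = line['bbox']['top']
        
        if current_y is None or abs(y - current_y) < line_threshold:
            current_line.append(line)
            current_y = y
        else:
            # New line: emit the finished one in left-to-right order, since
            # boxes on the same row can differ slightly in their top coordinate
            current_line.sort(key=lambda l: l['bbox']['left'])
            text_lines.append(' '.join([l['text'] for l in current_line]))
            current_line = [line]
            current_y = y
    
    # Add last line
    if current_line:
        current_line.sort(key=lambda l: l['bbox']['left'])
        text_lines.append(' '.join([l['text'] for l in current_line]))
    
    return '\n'.join(text_lines)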

Best Practices

Preprocess Images: Improve quality before OCR
Choose Correct Language: Specify language for better accuracy
Handle Multi-column: Process columns separately
Filter Low Confidence: Skip results below threshold
Batch Processing: Process multiple images efficiently
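Two of the practices above, filtering low-confidence results and handling multi-column pages, can be sketched on the structured lines produced by `process_ocr_result`. This is a minimal sketch under stated assumptions: `filter_by_confidence` and `split_columns` are hypothetical helpers, the `0.6` threshold is an arbitrary starting point, and the column split assumes equal-width columns, which real layouts often violate.

```python
def filter_by_confidence(lines, min_conf=0.6):
    """Keep only lines whose confidence is at least `min_conf`.

    The 0.6 default is an assumed starting point, not a recommended value.
    """
    return [ln for ln in lines if ln["confidence"] >= min_conf]


def split_columns(lines, page_width, n_columns=2):
    """Assign each line to a column by the horizontal center of its box.

    Assumes equal-width columns, which is a simplification.
    """
    columns = [[] for _ in range(n_columns)]
    col_width = page_width / n_columns
    for ln in lines:
        center = (ln["bbox"]["left"] + ln["bbox"]["right"]) / 2
        # Clamp the index so boxes at the page edges stay in range
        idx = max(0, min(int(center // col_width), n_columns - 1))
        columns[idx].append(ln)
    # Read each column top to bottom
    return [sorted(col, key=lambda ln: ln["bbox"]["top"]) for col in columns]


# Illustrative input in the shape produced by process_ocr_result
lines = [
    {"text": "Right top", "confidence": 0.95,
     "bbox": {"left": 520, "top": 10, "right": 900, "bottom": 30}},
    {"text": "Left top", "confidence": 0.97,
     "bbox": {"left": 20, "top": 12, "right": 400, "bottom": 32}},
    {"text": "smudge", "confidence": 0.31,
     "bbox": {"left": 30, "top": 50, "right": 90, "bottom": 70}},
    {"text": "Left bottom", "confidence": 0.88,
     "bbox": {"left": 20, "top": 90, "right": 400, "bottom": 110}},
]

kept = filter_by_confidence(lines, min_conf=0.6)
columns = split_columns(kept, page_width=1000)
for i, col in enumerate(columns, start=1):
    print(f"Column {i}:", [ln["text"] for ln in col])
```

Filtering before splitting keeps stray low-confidence fragments from ending up in a column. For irregular layouts, clustering on the left edge of each box may work better than fixed-width columns.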

Common Patterns

Image Preprocessing

from PIL import Image, ImageEnhance, ImageFilter

def preprocess_image(image_path):
    """Preprocess image for better OCR."""
    img = Image.open(image_path)
    
    # Convert to grayscale
    img = img.convert('L')
    
    # Enhance contrast
    enhancer = ImageEnhance.Contrast(img)
    img = enhancer.enhance(2.0)
    
    # Sharpen
    img = img.filter(ImageFilter.SHARPEN)
    
    # Save preprocessed
    preprocessed_path = 'preprocessed.png'
    img.save(preprocessed_path)
    
    return preprocessed_path

Batch OCR with Progress

from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

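def batch_ocr(image_paths, max_workers=4):
    """OCR multiple images in parallel.

    Caveat (an assumption to verify for your version): a single PaddleOCR
    instance may not be safe to share across threads. If results look
    inconsistent, set max_workers=1 or create one engine per worker.
    """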
    
    results = {}
    
    def process_single(img_path):
        result = ocr.ocr(img_path)
        return img_path, result
    
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(process_single, p) for p in image_paths]
        
        for future in tqdm(futures, desc="Processing OCR"):
            path, result = future.result()
            results[path] = result
    
    return results

Examples

Example 1: Business Card Reader

from paddleocr import PaddleOCR

...

Prompt Examples

After installing smart_ocr, you can say things like the following to the AI to trigger it.

User: Help me get started with smart_ocr

A: Explains what smart_ocr does, walks through the setup, and runs a quick demo based on your current project

User: Use smart_ocr to extract text from images and scanned documents using PaddleOCR - su...

A: Invokes smart_ocr with the right parameters and returns the result directly in the conversation

User: What can I do with smart_ocr in my documents & notes workflow?

A: Lists the top use cases for smart_ocr, with example commands for each scenario

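FAQ

How do I install smart_ocr?

Place the skill folder in ~/.claude/skills/smar/ (personal level, available to all projects) or in .claude/skills/smar/ (project level). After restarting the AI client, invoke it explicitly with /smar, or let the AI discover and use it automatically based on context.

Which AI platforms does smart_ocr support?

smart_ocr supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.

Is smart_ocr free?

smart_ocr is free to install and use. Check the repository for license information.

What does smart_ocr do?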