Senior Data Engineer

senior-data-engineer

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka,...

Source: ClawHub. View on ClawSkills.

2.6k downloads

3 favorites

14 views

About Senior Data Engineer

---
name: "senior-data-engineer"
description: Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
---

Senior Data Engineer

Production-grade data engineering skill for building scalable, reliable data systems.

Trigger Phrases

Activate this skill when you see:

Pipeline Design:

"Design a data pipeline for..."
"Build an ETL/ELT process..."
"How should I ingest data from..."
"Set up data extraction from..."

Architecture:

"Should I use batch or streaming?"
"Lambda vs Kappa architecture"
"How to handle late-arriving data"
"Design a data lakehouse"

Data Modeling:

"Create a dimensional model..."
"Star schema vs snowflake"
"Implement slowly changing dimensions"
"Design a data vault"

Data Quality:

"Add data validation to..."
"Set up data quality checks"
"Monitor data freshness"
"Implement data contracts"

Performance:

"Optimize this Spark job"
"Query is running slow"
"Reduce pipeline execution time"
"Tune Airflow DAG"

---

Quick Start

Core Tools

# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
  --type airflow \
  --source postgres \
  --destination snowflake \
  --schedule "0 5 * * *"

# Validate data quality
python scripts/data_quality_validator.py validate \
  --input data/sales.parquet \
  --schema schemas/sales.json \
  --checks freshness,completeness,uniqueness

# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
  --query queries/daily_aggregation.sql \
  --engine spark \
  --recommend
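
The three check names passed to `--checks` above (freshness, completeness, uniqueness) can be illustrated in plain Python. This is a minimal sketch of what such checks typically compute, not the implementation of `data_quality_validator.py`. The function names, the list-of-dicts row format, and the `max_age` threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_completeness(rows, required_fields):
    """Return the fraction of rows where every required field is present and not None."""
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows
        if all(row.get(field) is not None for field in required_fields)
    )
    return complete / len(rows)


def check_uniqueness(rows, key_fields):
    """Return the set of key tuples that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


def check_freshness(rows, timestamp_field, max_age, now=None):
    """Return True if the newest timestamp is no older than max_age (hypothetical threshold)."""
    now = now or datetime.now(timezone.utc)
    timestamps = [
        row[timestamp_field] for row in rows
        if row.get(timestamp_field) is not None
    ]
    if not timestamps:
        return False  # no timestamps at all is treated as stale
    return now - max(timestamps) <= max_age
```

A pipeline would usually fail or alert when completeness drops below an agreed ratio, when the duplicate set is non-empty, or when the freshness check returns False.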

---

Workflows

→ See references/workflows.md for details

Architecture Decision Framework

Use this framework to choose the right approach for your data pipeline.

Batch vs Streaming

| Criteria | Batch | Streaming |
|----------|-------|-----------|
| Latency requirement | Hours to days | Seconds to minutes |
| Data volume | Large historical datasets | Continuous event streams |
| Processing complexity | Complex transformations, ML | Simple aggregations, filtering |
| Cost sensitivity | More cost-effective | Higher infrastructure cost |
| Error handling | Easier to reprocess | Requires careful design |

Decision Tree:

Is real-time insight required?
├── Yes → Use streaming
│   └── Is exactly-once semantics needed?
│       ├── Yes → Kafka + Flink/Spark Structured Streaming
│       └── No → Kafka + consumer groups
└── No → Use batch
    └── Is data volume > 1TB daily?
        ├── Yes → Spark/Databricks
        └── No → dbt + warehouse compute
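
The decision tree above can be written as a small function, which makes the branch order explicit. This is a sketch only: the function name `recommend_stack` and its parameters are hypothetical, and the returned strings simply repeat the leaves of the tree.

```python
def recommend_stack(needs_real_time, needs_exactly_once=False, daily_volume_tb=0.0):
    """Walk the batch-vs-streaming decision tree and return the suggested stack."""
    if needs_real_time:
        # Streaming branch: the deciding factor is delivery semantics.
        if needs_exactly_once:
            return "Kafka + Flink/Spark Structured Streaming"
        return "Kafka + consumer groups"
    # Batch branch: the deciding factor is daily data volume (threshold of 1 TB).
    if daily_volume_tb > 1:
        return "Spark/Databricks"
    return "dbt + warehouse compute"
```

For example, `recommend_stack(False, daily_volume_tb=0.2)` follows the batch branch and returns `"dbt + warehouse compute"`.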

Lambda vs Kappa Architecture

| Aspect | Lambda | Kappa |
|--------|--------|-------|
| Complexity | Two codebases (batch + stream) | Single codebase |
| Maintenance | Higher (sync batch/stream logic) | Lower |
| Reprocessing | Native batch layer | Replay from source |
| Use case | ML training + real-time serving | Pure event-driven |

When to choose Lambda:

Need to train ML models on historical data
Complex batch transformations not feasible in streaming
Existing batch infrastructure

When to choose Kappa:

Event-sourced architecture
All processing can be expressed as stream operations
Starting fresh without legacy systems
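
The "single codebase" and "replay from source" properties of Kappa can be sketched without any streaming framework. In the example below, an in-memory list stands in for a Kafka topic, and the names `apply_event`, `run_stream`, and `event_log` are hypothetical. The same function serves both the live path and the reprocessing path.

```python
from collections import defaultdict


def apply_event(totals, event):
    """The only processing logic: add an order amount to the customer's running total."""
    totals[event["customer_id"]] += event["amount"]
    return totals


def run_stream(events, totals=None):
    """Fold events into state; used unchanged for live consumption and for replay."""
    totals = defaultdict(float) if totals is None else totals
    for event in events:
        apply_event(totals, event)
    return totals


# An append-only list stands in for a Kafka topic (assumption for illustration).
event_log = [
    {"customer_id": "a", "amount": 10.0},
    {"customer_id": "b", "amount": 4.0},
    {"customer_id": "a", "amount": 2.5},
]

# Live path: events are processed incrementally as they arrive.
live_state = run_stream(event_log[:2])
live_state = run_stream(event_log[2:], live_state)

# Reprocessing path: replay the full log from the start through the same code.
replayed_state = run_stream(event_log)
```

Because both paths share one function, a logic change is deployed once and historical results are rebuilt by replaying the log. In a Lambda design the equivalent change would have to be made in both the batch and the streaming codebase.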

Data Warehouse vs Data Lakehouse

| Feature | Warehouse (Snowflake/BigQuery) | Lakehouse (Delta/Iceberg) |
|---------|-------------------------------|---------------------------|
| Best for | BI, SQL analytics | ML, unstructured data |
| Storage cost | Higher (proprietary format) | Lower (open formats) |
| Flexibility | Schema-on-write | Schema-on-read |
| Performance | Excellent for SQL | Good, improving |
| Ecosystem | Mature BI tools | Growing ML tooling |

---

Tech Stack

| Category | Technologies |
|----------|--------------|
| Languages | Python, SQL, Scala |
| Orchestration | Airflow, Prefect, Dagster |
| Transformation | dbt, Spark, Flink |
| Streaming | Kafka, Kinesis, Pub/Sub |
| Storage | S3, GCS, Delta Lake, Iceberg |
| Warehouses | Snowflake, BigQuery, Redshift, Databricks |
| Quality | Great Expectations, dbt tests, Monte Carlo |
| Monitoring | Prometheus, Grafana, Datadog |

---

Reference Documentation

1. Data Pipeline Architecture

See references/data_pipeline_architecture.md for:

Lambda vs Kappa architecture patterns
Batch processing with Spark and Airflow
Stream processing with Kafka and Flink
Exactly-once semantics implementation
Error handling and dead letter queues
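
As a rough illustration of the last topic in the list above, a dead letter queue can be sketched as follows. This is not taken from `references/data_pipeline_architecture.md`. The names `process_batch` and `parse_order`, the retry count, and the shape of the dead-letter record are assumptions.

```python
import json


def process_batch(raw_messages, handler, max_attempts=3):
    """Apply handler to each message; route messages that keep failing to a dead letter list."""
    processed, dead_letters = [], []
    for raw in raw_messages:
        last_error = None
        for _ in range(max_attempts):
            try:
                processed.append(handler(raw))
                break  # success, stop retrying this message
            except Exception as exc:  # broad on purpose: any failure counts as an attempt
                last_error = exc
        else:
            # The loop finished without a break, so every attempt failed.
            dead_letters.append(
                {"payload": raw, "error": repr(last_error), "attempts": max_attempts}
            )
    return processed, dead_letters


def parse_order(raw):
    """Hypothetical handler: parse a JSON order and reject negative amounts."""
    order = json.loads(raw)
    if order["amount"] < 0:
        raise ValueError("negative amount")
    return order
```

One bad message no longer blocks the rest of the batch, and the failed payload is kept together with its error for later inspection or replay. A production version would normally retry only transient errors and write dead letters to a durable topic or table rather than a list.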

2. Data Modeling Patterns

See references/data_modeling_patterns.md for:

Dimensional modeling (Star/Snowflake)
Slowly Changing Dimensions (SCD Types 1-6)
Data Vault modeling
dbt best practices
Partitioning and clustering
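
To make the slowly changing dimension topic above concrete, here is a minimal sketch of a Type 2 update, which keeps history by closing the current row and appending a new version. The column names (`valid_from`, `valid_to`, `is_current`), the tracked attribute `city`, and the function `apply_scd2` are assumptions for illustration. In a warehouse this would more likely be a `MERGE` statement or a dbt snapshot.

```python
from datetime import date


def apply_scd2(dimension, incoming, effective_date):
    """Apply a Type 2 change in place: close the current row and append a new version.

    dimension: list of dicts with customer_id, city, valid_from, valid_to, is_current
    incoming:  dict with customer_id and the latest city
    """
    current = next(
        (row for row in dimension
         if row["customer_id"] == incoming["customer_id"] and row["is_current"]),
        None,
    )
    if current is not None and current["city"] == incoming["city"]:
        return dimension  # attribute unchanged, nothing to version
    if current is not None:
        # Close the existing version instead of overwriting it.
        current["valid_to"] = effective_date
        current["is_current"] = False
    dimension.append({
        "customer_id": incoming["customer_id"],
        "city": incoming["city"],
        "valid_from": effective_date,
        "valid_to": None,
        "is_current": True,
    })
    return dimension
```

A Type 1 update would overwrite `city` on the existing row and lose the history that the closed row preserves here.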

3. DataOps Best Practices

See references/dataops_best_practices.md for:

Data testing frameworks
Data contracts and schema validation
CI/CD for data pipelines
Observability and lineage
Incident response

---

Troubleshooting

→ See references/troubleshooting.md for details

Prompt Examples

After installing Senior Data Engineer, you can say things like these to the AI to trigger it.

User: Help me get started with Senior Data Engineer

A: Explains what Senior Data Engineer does, walks through the setup, and runs a quick demo based on your current project

User: Use Senior Data Engineer to build scalable data pipelines, ETL/EL...

A: Invokes Senior Data Engineer with the right parameters and returns the result directly in the conversation

User: What can I do with Senior Data Engineer in my developer & devops workflow?

A: Lists the top use cases for Senior Data Engineer, with example commands for each scenario

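FAQ

How do I install Senior Data Engineer?

Place the skill folder in ~/.claude/skills/senior-data-engineer/ (personal level, available in all projects) or in .claude/skills/senior-data-engineer/ (project level). After restarting the AI client, invoke it explicitly with /senior-data-engineer, or let the AI discover and use it automatically based on context.

Which AI platforms does Senior Data Engineer support?

Senior Data Engineer supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.

Is Senior Data Engineer free?

Senior Data Engineer is free to install and use. Check the repository for license information.

What does Senior Data Engineer do?

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka,...

Which category does Senior Data Engineer belong to?

Senior Data Engineer belongs to the "Developer & DevOps" category. Skills in this category help AI agents perform specialized tasks in this domain.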

Use Cases

Getting Started with Senior Data Engineer
Automate Developer & DevOps Workflows with Senior Data Engineer
Team Collaboration with Senior Data Engineer
