Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka,...
Data source: ClawHub.
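Method 1: Command-line install (recommended)

Recommended (no need to install clawhub first):

```bash
npx clawhub@latest --dir ~/.claude/skills install senior-data-engineer
```

Or use the clawhub CLI (must be installed first):

```bash
clawhub --dir ~/.claude/skills install senior-data-engineer
```

⚠️ Requires Node.js 18+. No Node? Use Method 2 below to download the ZIP directly.

Method 2: Manual download (no Node required)

Download the ZIP, extract it, place the folder at the following path, and restart the agent:

Install path: `~/.claude/skills/senior-data-engineer/`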
---
name: "senior-data-engineer"
description: Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
---
Production-grade data engineering skill for building scalable, reliable data systems.
- Building a Batch ETL Pipeline
- Implementing Real-Time Streaming
- Data Quality Framework Setup
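The first workflow listed, a batch ETL pipeline, follows an extract, transform, load pattern. The following is a minimal, assumed illustration that uses Python's built-in `sqlite3` for both source and target. The table names (`orders`, `fact_orders`), the columns, and the watermark-based incremental extract are hypothetical and not taken from the skill's scripts. It shows two properties batch jobs usually need: incremental extraction and idempotent loads.

```python
import sqlite3

# Hypothetical batch ETL sketch: source table `orders`, target table `fact_orders`.
# Timestamps are ISO-8601 strings, so string comparison matches time order.
# Requires SQLite 3.24+ for INSERT ... ON CONFLICT (upsert).


def extract(source: sqlite3.Connection, watermark: str) -> list:
    """Incremental extract: only rows updated after the last successful run."""
    return source.execute(
        "SELECT order_id, amount_cents, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()


def transform(rows: list) -> list:
    """Drop rows with a missing amount and convert cents to dollars."""
    return [(oid, cents / 100, ts) for oid, cents, ts in rows if cents is not None]


def load(target: sqlite3.Connection, rows: list) -> None:
    """Idempotent load: upsert on order_id so re-running a batch never duplicates rows."""
    target.executemany(
        "INSERT INTO fact_orders (order_id, amount, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(order_id) DO UPDATE SET "
        "amount = excluded.amount, updated_at = excluded.updated_at",
        rows,
    )
    target.commit()


def run_batch(source: sqlite3.Connection, target: sqlite3.Connection, watermark: str) -> str:
    """Run one batch and return the new watermark (unchanged if nothing new arrived)."""
    raw = extract(source, watermark)
    load(target, transform(raw))
    # Advance the watermark from the raw extract so rejected rows are not re-read forever.
    return max((ts for _, _, ts in raw), default=watermark)
```

If you call `run_batch` twice with the same watermark, the second call leaves `fact_orders` unchanged. That property is what makes retries after a failure safe.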
---
Activate this skill when you see:
Pipeline Design:
Architecture:
Data Modeling:
Data Quality:
Performance:
---
```bash
# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
  --type airflow \
  --source postgres \
  --destination snowflake \
  --schedule "0 5 * * *"

# Validate data quality
python scripts/data_quality_validator.py validate \
  --input data/sales.parquet \
  --schema schemas/sales.json \
  --checks freshness,completeness,uniqueness

# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
  --query queries/daily_aggregation.sql \
  --engine spark \
  --recommend
```
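The three checks passed to `--checks` above are common data-quality dimensions. The validator script's internals are not described here, so what follows is an independent, assumed sketch of what freshness, completeness, and uniqueness checks typically compute over in-memory records. It is not the script's implementation. The field names (`order_id`, `amount`, `updated_at`) and the 24-hour freshness window are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_freshness(rows: list, ts_field: str, max_age: timedelta, now: datetime) -> bool:
    """Pass if the newest record is no older than `max_age`."""
    if not rows:
        return False
    newest = max(r[ts_field] for r in rows)
    return now - newest <= max_age


def check_completeness(rows: list, required: list, threshold: float = 1.0) -> bool:
    """Pass if the share of rows with every required field non-null meets `threshold`."""
    if not rows:
        return False
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)
    return complete / len(rows) >= threshold


def check_uniqueness(rows: list, key: str) -> bool:
    """Pass if no two rows share the same key value."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))


def validate(rows: list, now: Optional[datetime] = None) -> dict:
    """Run all three checks with hypothetical field names and a 24-hour freshness window."""
    now = now or datetime.now(timezone.utc)
    return {
        "freshness": check_freshness(rows, "updated_at", timedelta(hours=24), now),
        "completeness": check_completeness(rows, ["order_id", "amount"]),
        "uniqueness": check_uniqueness(rows, "order_id"),
    }
```

`check_freshness` subtracts datetimes, so timestamps must be either all timezone-aware or all naive.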
---
→ See references/workflows.md for details
Use this framework to choose the right approach for your data pipeline.
| Criteria | Batch | Streaming |
|----------|-------|-----------|
| Latency requirement | Hours to days | Seconds to minutes |
| Data volume | Large historical datasets | Continuous event streams |
| Processing complexity | Complex transformations, ML | Simple aggregations, filtering |
| Cost sensitivity | More cost-effective | Higher infrastructure cost |
| Error handling | Easier to reprocess | Requires careful design |
Decision Tree:

```text
Is real-time insight required?
├── Yes → Use streaming
│   └── Is exactly-once semantics needed?
│       ├── Yes → Kafka + Flink/Spark Structured Streaming
│       └── No → Kafka + consumer groups
└── No → Use batch
    └── Is data volume > 1TB daily?
        ├── Yes → Spark/Databricks
        └── No → dbt + warehouse compute
```
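The decision tree can be encoded directly as a function. This is useful when a planning script or a review checklist needs to apply the same rule. The function below is a literal transcription of the tree. Its name and parameters are hypothetical, and the 1 TB threshold is the tree's rule of thumb, not a hard limit.

```python
def choose_pipeline(
    real_time_required: bool,
    exactly_once_required: bool = False,
    daily_volume_tb: float = 0.0,
) -> str:
    """Return the stack suggested by the batch-vs-streaming decision tree."""
    if real_time_required:
        # Streaming branch: delivery guarantees decide the processing engine.
        if exactly_once_required:
            return "Kafka + Flink/Spark Structured Streaming"
        return "Kafka + consumer groups"
    # Batch branch: daily volume decides between distributed compute and the warehouse.
    if daily_volume_tb > 1:
        return "Spark/Databricks"
    return "dbt + warehouse compute"
```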
| Aspect | Lambda | Kappa |
|--------|--------|-------|
| Complexity | Two codebases (batch + stream) | Single codebase |
| Maintenance | Higher (sync batch/stream logic) | Lower |
| Reprocessing | Native batch layer | Replay from source |
| Use case | ML training + real-time serving | Pure event-driven |
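In a Kappa architecture, the streaming code path also serves as the reprocessing path. To recompute results after a logic change, you replay the retained log from the beginning through the new code. The following sketch uses an in-memory `EventLog` as a stand-in for a retained Kafka topic. Every name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class EventLog:
    """In-memory stand-in for an append-only, retained log (e.g. a Kafka topic)."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

    def read_from(self, offset: int) -> Iterator[tuple]:
        """Yield (offset, event) pairs starting at `offset`."""
        for i in range(offset, len(self.events)):
            yield i, self.events[i]


def run(log: EventLog, process: Callable, state: Optional[dict] = None, offset: int = 0):
    """Fold events from `offset` onward into `state`; return (state, next_offset)."""
    state = {} if state is None else state
    for i, event in log.read_from(offset):
        state = process(state, event)
        offset = i + 1
    return state, offset


def revenue_v1(state: dict, event: dict) -> dict:
    """Original logic: running revenue per customer."""
    state[event["customer"]] = state.get(event["customer"], 0) + event["amount"]
    return state


def revenue_v2(state: dict, event: dict) -> dict:
    """Fixed logic: same as v1 but ignores test events."""
    if event.get("test"):
        return state
    return revenue_v1(state, event)
```

In normal operation, `run(log, revenue_v1, state, committed_offset)` continues from the last committed offset. After you deploy `revenue_v2`, `run(log, revenue_v2)` replays from offset 0 into fresh state. No separate batch codebase needs to be kept in sync, which is the maintenance advantage the table attributes to Kappa.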
When to choose Lambda:
When to choose Kappa:
| Feature | Warehouse (Snowflake/BigQuery) | Lakehouse (Delta/Iceberg) |
|---------|-------------------------------|---------------------------|
| Best for | BI, SQL analytics | ML, unstructured data |
| Storage cost | Higher (proprietary format) | Lower (open formats) |
| Flexibility | Schema-on-write | Schema-on-read |
| Performance | Excellent for SQL | Good, improving |
| Ecosystem | Mature BI tools | Growing ML tooling |
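The Flexibility row contrasts two general strategies. The sketch below illustrates only the terms themselves, using a list of JSON strings as a stand-in store. It does not model how Snowflake, BigQuery, Delta Lake, or Iceberg behave internally, and the schema and function names are hypothetical.

```python
import json

# Hypothetical target schema: field name -> Python type.
SCHEMA = {"order_id": int, "amount": float}


def write_validated(store: list, record: dict) -> None:
    """Schema-on-write: reject a record that does not match the schema before storing it."""
    for name, typ in SCHEMA.items():
        if not isinstance(record.get(name), typ):
            raise ValueError(f"field {name!r} must be {typ.__name__}")
    store.append(json.dumps(record))


def write_raw(store: list, record: dict) -> None:
    """Schema-on-read: store whatever arrives, without validation."""
    store.append(json.dumps(record))


def read_with_schema(store: list) -> list:
    """Apply the schema at read time, coercing types and skipping rows that cannot be coerced."""
    rows = []
    for line in store:
        raw = json.loads(line)
        try:
            rows.append({name: typ(raw[name]) for name, typ in SCHEMA.items()})
        except (KeyError, TypeError, ValueError):
            continue  # a real system might route these to a quarantine table instead
    return rows
```

The trade-off is where errors surface. `write_validated` fails at ingestion. `write_raw` always succeeds, so bad rows only appear (here, silently dropped) when someone reads the data.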
---
| Category | Technologies |
|----------|--------------|
| Languages | Python, SQL, Scala |
| Orchestration | Airflow, Prefect, Dagster |
| Transformation | dbt, Spark, Flink |
| Streaming | Kafka, Kinesis, Pub/Sub |
| Storage | S3, GCS, Delta Lake, Iceberg |
| Warehouses | Snowflake, BigQuery, Redshift, Databricks |
| Quality | Great Expectations, dbt tests, Monte Carlo |
| Monitoring | Prometheus, Grafana, Datadog |
---
See references/data_pipeline_architecture.md for:
See references/data_modeling_patterns.md for:
See references/dataops_best_practices.md for:
---
→ See references/troubleshooting.md for details
After installing Senior Data Engineer, you can say the following to the AI to trigger it:
Help me get started with Senior Data Engineer
Explains what Senior Data Engineer does, walks through the setup, and runs a quick demo based on your current project
Use Senior Data Engineer to data engineering skill for building scalable data pipelines, ETL/EL...
Invokes Senior Data Engineer with the right parameters and returns the result directly in the conversation
What can I do with Senior Data Engineer in my developer & devops workflow?
Lists the top use cases for Senior Data Engineer, with example commands for each scenario
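Place the skill folder in `~/.claude/skills/senior-data-engineer/` (personal level, available to all projects) or `.claude/skills/senior-data-engineer/` (project level). After restarting the AI client, invoke it explicitly with `/senior-data-engineer`, or let the AI discover and use it automatically based on context.
Senior Data Engineer supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.
Senior Data Engineer is free to install and use. See the repository for license information.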
Senior Data Engineer belongs to the "Developer & DevOps" category, whose skills help AI agents carry out specialized tasks in that domain.
Automate my developer & devops tasks using Senior Data Engineer
Identifies repetitive steps in your workflow and sets up Senior Data Engineer to handle them automatically