Senior Ml Engineer

senior-ml-engineer

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, R...

Data source: ClawHub. View on ClawSkills.

About Senior Ml Engineer

---
name: "senior-ml-engineer"
description: ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.
triggers:
  - MLOps pipeline
  - model deployment
  - feature store
  - model monitoring
  - drift detection
  - RAG system
  - LLM integration
  - model serving
  - A/B testing ML
  - automated retraining
---

Senior ML Engineer

Production ML engineering patterns for model deployment, MLOps infrastructure, and LLM integration.

---

Model Deployment Workflow

Deploy a trained model to production with monitoring:

Export model to standardized format (ONNX, TorchScript, SavedModel)
Package model with dependencies in Docker container
Deploy to staging environment
Run integration tests against staging
Deploy canary (5% traffic) to production
Monitor latency and error rates for 1 hour
Promote to full production if metrics pass
Validation: p95 latency < 100ms, error rate < 0.1%
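The promotion decision in the last two steps can be expressed as a small gate. The following is a minimal sketch, assuming the canary metrics have already been collected for the monitoring window; `CanaryMetrics` and `should_promote` are hypothetical names, and only the two thresholds come from the validation line above.

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    """Metrics observed for the canary during the monitoring window."""
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests, e.g. 0.001 == 0.1%


def should_promote(metrics: CanaryMetrics,
                   max_p95_latency_ms: float = 100.0,
                   max_error_rate: float = 0.001) -> bool:
    """Return True only if both metrics are strictly below their thresholds."""
    return (metrics.p95_latency_ms < max_p95_latency_ms
            and metrics.error_rate < max_error_rate)


print(should_promote(CanaryMetrics(p95_latency_ms=82.0, error_rate=0.0004)))   # True
print(should_promote(CanaryMetrics(p95_latency_ms=130.0, error_rate=0.0004)))  # False
```

Keeping the thresholds as parameters lets staging and production use different limits without changing the gate itself.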

Container Template

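FROM python:3.11-slim

# Run from /app so that "src.server:app" resolves as a module path
WORKDIR /app

# Install dependencies first so this layer stays cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model/ /app/model/
COPY src/ /app/src/

# python:3.11-slim does not ship curl, so probe /health with the standard library
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health', timeout=3)" || exit 1

EXPOSE 8080
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]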

Serving Options

| Option | Latency | Throughput | Use Case |
|--------|---------|------------|----------|
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |

---

MLOps Pipeline Setup

Establish automated training and deployment:

Configure feature store (Feast, Tecton) for training data
Set up experiment tracking (MLflow, Weights & Biases)
Create training pipeline with hyperparameter logging
Register model in model registry with version metadata
Configure staging deployment triggered by registry events
Set up A/B testing infrastructure for model comparison
Enable drift monitoring with alerting
Validation: New models automatically evaluated against baseline
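For the A/B testing step, one common approach is deterministic hash-based assignment, so a given user always sees the same model. This is a minimal sketch under that assumption; `assign_variant`, the experiment name, and the 5% default share are illustrative choices, not part of any specific A/B testing framework.

```python
import hashlib

BUCKETS = 10_000  # resolution of the split: 0.01% per bucket


def assign_variant(user_id: str, treatment_share: float = 0.05,
                   experiment: str = "model-v2-rollout") -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    The same (experiment, user_id) pair always yields the same variant, so a
    user does not flip between models across requests.
    """
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be between 0 and 1")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to an integer bucket in [0, BUCKETS).
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return "treatment" if bucket < round(treatment_share * BUCKETS) else "control"


variants = [assign_variant(f"user-{i}") for i in range(10_000)]
print(variants.count("treatment") / len(variants))  # close to 0.05
```

Including the experiment name in the hash input means a new experiment reshuffles users instead of reusing the same treatment group.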

Feature Store Pattern

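from datetime import timedelta

# Written against the Field/schema API of recent Feast releases (0.20+);
# older releases used Feature(..., dtype=ValueType.X) instead.
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

user = Entity(name="user_id", join_keys=["user_id"])

user_features = FeatureView(
    name="user_features",
    entities=[user],
    ttl=timedelta(days=1),  # online values older than one day are treated as stale
    schema=[
        Field(name="purchase_count_30d", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
    ],
    online=True,
    source=FileSource(
        path="data/user_features.parquet",
        timestamp_field="event_timestamp",  # assumed name of the event-time column
    ),
)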

Retraining Triggers

| Trigger | Detection | Action |
|---------|-----------|--------|
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
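The data drift trigger relies on the Population Stability Index (PSI). A minimal pure-Python sketch of one common formulation follows, assuming equal-width bins derived from the reference sample; the function name, bin count, and `eps` smoothing value are illustrative assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(reference: Sequence[float],
                               current: Sequence[float],
                               n_bins: int = 10,
                               eps: float = 1e-6) -> float:
    """PSI = sum over bins of (cur_i - ref_i) * ln(cur_i / ref_i)."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the reference range into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


baseline = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in baseline]
print(population_stability_index(baseline, baseline))        # 0.0
print(population_stability_index(baseline, shifted) > 0.2)   # True
```

A value above 0.2 would map to the "Evaluate, then retrain" action in the table. Quantile-based bins are a frequent alternative when the reference distribution is heavily skewed.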

---

LLM Integration Workflow

Integrate LLM APIs into production applications:

Create provider abstraction layer for vendor flexibility
Implement retry logic with exponential backoff
Configure fallback to secondary provider
Set up token counting and context truncation
Add response caching for repeated queries
Implement cost tracking per request
Add structured output validation with Pydantic
Validation: Response parses correctly, cost within budget
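The response caching step can be sketched with an in-memory dictionary keyed by a hash of the request. This is a minimal illustration, assuming exact-match caching is acceptable; `ResponseCache` and `fake_llm` are hypothetical names, and a production setup would more likely use a shared store with expiry.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """In-memory cache keyed by a hash of the model name, prompt, and parameters."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str, params: dict) -> str:
        # sort_keys makes the key independent of parameter ordering.
        payload = json.dumps({"model": model, "prompt": prompt, "params": params},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str,
                    call: Callable[[str], str], **params) -> str:
        """Return the cached response, or invoke `call` and cache its result."""
        key = self._key(model, prompt, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(prompt)
        self._store[key] = result
        return result


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider call."""
    return prompt.upper()


cache = ResponseCache()
cache.get_or_call("demo-model", "hello", fake_llm, temperature=0)
cache.get_or_call("demo-model", "hello", fake_llm, temperature=0)
print(cache.hits, cache.misses)  # 1 1
```

Sampling parameters are part of the key because the same prompt at a different temperature is not a repeated query in any useful sense.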

Provider Abstraction

from abc import ABC, abstractmethod
from tenacity import retry, stop_after_attempt, wait_exponential

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        pass

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(provider: LLMProvider, prompt: str) -> str:
    return provider.complete(prompt)
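Fallback to a secondary provider can build on the same abstraction. The sketch below redefines `LLMProvider` so it runs on its own; `FlakyProvider`, `EchoProvider`, and `complete_with_fallback` are hypothetical names used only for illustration, and no real vendor SDK is called.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        ...


class FlakyProvider(LLMProvider):
    """Stand-in for a primary vendor that is currently failing."""

    def complete(self, prompt: str, **kwargs) -> str:
        raise ConnectionError("primary provider unavailable")


class EchoProvider(LLMProvider):
    """Stand-in for a secondary vendor."""

    def complete(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"


def complete_with_fallback(providers: Sequence[LLMProvider],
                           prompt: str, **kwargs) -> str:
    """Try each provider in order and return the first successful completion."""
    errors: List[str] = []
    for provider in providers:
        try:
            return provider.complete(prompt, **kwargs)
        except Exception as exc:  # in practice, catch only the vendor's transient errors
            errors.append(f"{type(provider).__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


print(complete_with_fallback([FlakyProvider(), EchoProvider()], "hello"))  # echo: hello
```

Collecting the per-provider errors keeps the final exception useful for debugging when every provider in the chain fails.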

Cost Management

| Provider | Input Cost (per 1K tokens) | Output Cost (per 1K tokens) |
|----------|------------|-------------|
| GPT-4 | $0.03 | $0.06 |
| GPT-3.5 | $0.0005 | $0.0015 |
| Claude 3 Opus | $0.015 | $0.075 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
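Per-request cost tracking follows directly from these prices. A minimal sketch, assuming token counts are already known for each request; `PRICES_PER_1K` and `request_cost` are hypothetical names, and the figures are copied from the table above.

```python
from typing import Dict, Tuple

# USD per 1K tokens as (input, output), copied from the table above.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "GPT-4": (0.03, 0.06),
    "GPT-3.5": (0.0005, 0.0015),
    "Claude 3 Opus": (0.015, 0.075),
    "Claude 3 Haiku": (0.00025, 0.00125),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, billed separately for input and output tokens."""
    input_price, output_price = PRICES_PER_1K[provider]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


cost = request_cost("GPT-4", input_tokens=1500, output_tokens=500)
print(f"${cost:.4f}")  # $0.0750
```

For this example, 1,500 input tokens cost $0.045 and 500 output tokens cost $0.03, for $0.075 in total.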

---

RAG System Implementation

Build retrieval-augmented generation pipeline:

Choose vector database (Pinecone, Qdrant, Weaviate)
Select embedding model based on quality/cost tradeoff
Implement document chunking strategy
Create ingestion pipeline with metadata extraction
Build retrieval with query embedding
Add reranking for relevance improvement
Format context and send to LLM
Validation: Response cites the retrieved context and makes no claims that the context does not support
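The retrieval and context-formatting steps can be illustrated without a vector database. This is a toy sketch, assuming embeddings are already computed; the three-dimensional vectors are made up for the example, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             index: List[Tuple[str, Sequence[float]]],
             k: int = 2) -> List[Tuple[str, float]]:
    """Return the k passages most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Number the passages so the model can cite them in its answer."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


# Toy index: (passage, made-up embedding).
index = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("Shipping takes 2 days.", [0.1, 0.9, 0.0]),
    ("Support is open on weekdays.", [0.0, 0.2, 0.9]),
]
hits = retrieve([1.0, 0.2, 0.0], index, k=2)
print(build_prompt("How long do refunds take?", [text for text, _ in hits]))
```

A reranking step would slot in between `retrieve` and `build_prompt`, reordering `hits` with a stronger relevance model before the context is formatted.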

Vector Database Selection

| Database | Hosting | Scale | Latency | Best For |
|----------|---------|-------|---------|----------|
| Pinecone | Managed | High | Low | Production, managed |
| Qdrant | Both | High | Very Low | Performance-critical |
| Weaviate | Both | High | Low | Hybrid search |
| Chroma | Self-hosted | Medium | Low | Prototyping |
| pgvector | Self-hosted | Medium | Medium | Existing Postgres |

Chunking Strategies

| Strategy | Chunk Size | Overlap | Best For |
|----------|------------|---------|----------|
| Fixed | 500-1000 tokens | 50-100 tokens | General text |
| Sentence | 3-5 sentences | 1 sentence | Structured text |
| Semantic | Variable | Based on meaning | Research papers |
| Recursive | Hierarchical | Parent-child | Long documents |
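The fixed strategy is the simplest to sketch. The example below assumes the text has already been tokenized and uses whitespace splitting only as a stand-in for a real tokenizer; `chunk_fixed` is a hypothetical helper name.

```python
from typing import List, Sequence


def chunk_fixed(tokens: Sequence[str], chunk_size: int = 500,
                overlap: int = 50) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the input
    return chunks


tokens = "a b c d e f g h i j".split()
print(chunk_fixed(tokens, chunk_size=4, overlap=1))
# [['a', 'b', 'c', 'd'], ['d', 'e', 'f', 'g'], ['g', 'h', 'i', 'j']]
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.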

---

Model Monitoring

Monitor production models for drift and degradation:

Set up latency tracking (p50, p95, p99)
Configure error rate alerting
Implement input data drift detection
Track prediction distribution shifts
Log ground truth when available
Compare model versions with A/B metrics
Set up automated retraining triggers
Validation: Alerts fire before user-visible degradation
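Latency tracking in the first step comes down to computing percentiles over a window of request timings. A minimal sketch using the nearest-rank definition follows; `percentile` and `latency_summary` are hypothetical names, and real deployments usually get these values from a metrics backend rather than from raw lists.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


print(latency_summary(list(range(1, 101))))  # {'p50': 50, 'p95': 95, 'p99': 99}
```

Tail percentiles matter more than the mean here because a small share of slow requests can breach the latency target while the average still looks healthy.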

Drift Detection

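from scipy.stats import ks_2samp

def detect_drift(reference, current, threshold=0.05):
    """Two-sample Kolmogorov-Smirnov test on one numeric feature.

    `threshold` is the significance level: drift is flagged when the p-value
    falls below it, i.e. the samples are unlikely to share a distribution.
    """
    statistic, p_value = ks_2samp(reference, current)
    return {
        "drift_detected": bool(p_value < threshold),
        "ks_statistic": float(statistic),
        "p_value": float(p_value),
    }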

Alert Thresholds

| Metric | Warning | Critical |
|--------|---------|----------|
| p95 latency | > 100ms | > 200ms |
| Error rate | > 0.1% | > 1% |
| PSI (drift) | > 0.1 | > 0.2 |
| Accuracy drop | > 2% | > 5% |
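These thresholds can be turned into a small classifier for a metrics snapshot. The sketch below is one possible encoding; the metric keys and the `classify` and `evaluate` helpers are hypothetical, and it assumes the accuracy drop is measured in percentage points.

```python
from typing import Dict, Tuple

# (warning, critical) thresholds copied from the table above.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "p95_latency_ms": (100.0, 200.0),
    "error_rate_pct": (0.1, 1.0),
    "psi": (0.1, 0.2),
    "accuracy_drop_pct": (2.0, 5.0),  # assumed to be percentage points
}


def classify(metric: str, value: float) -> str:
    """Return 'critical', 'warning', or 'ok' for a single metric value."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def evaluate(snapshot: Dict[str, float]) -> Dict[str, str]:
    return {name: classify(name, value) for name, value in snapshot.items()}


print(evaluate({"p95_latency_ms": 150, "error_rate_pct": 0.05, "psi": 0.25}))
# {'p95_latency_ms': 'warning', 'error_rate_pct': 'ok', 'psi': 'critical'}
```

The comparisons are strict, matching the ">" in the table, so a value exactly on a threshold stays at the lower severity.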

---

Reference Documentation

MLOps Production Patterns

references/mlops_production_patterns.md contains:

Model deployment pipeline with Kubernetes manifests
Feature store architecture with Feast examples
Model monitoring with drift detection code
A/B testing infrastructure with traffic splitting
Automated retraining pipeline with MLflow

LLM Integration Guide

references/llm_integration_guide.md contains:

Provider abstraction layer pattern
Retry and fallback strategies with tenacity
Prompt engineering templates (few-shot, CoT)
Token optimization with tiktoken
Cost calculation and tracking

RAG System Architecture

references/rag_system_architecture.md contains:

...

Prompt Examples

After installing Senior Ml Engineer, you can say things like the following to the AI to trigger it

User: Help me get started with Senior Ml Engineer

A: Explains what Senior Ml Engineer does, walks through the setup, and runs a quick demo based on your current project

User: Use Senior Ml Engineer to mL engineering skill for productionizing models, building MLOps pip...

A: Invokes Senior Ml Engineer with the right parameters and returns the result directly in the conversation

User: What can I do with Senior Ml Engineer in my developer & devops workflow?

A: Lists the top use cases for Senior Ml Engineer, with example commands for each scenario

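FAQ

How do I install Senior Ml Engineer?

Place the skill folder in ~/.claude/skills/senior-ml-engineer/ (personal level, available to all projects) or in .claude/skills/senior-ml-engineer/ (project level). After restarting the AI client, invoke it explicitly with /senior-ml-engineer, or let the AI discover and use it automatically based on context.

Which AI platforms does Senior Ml Engineer support?

Senior Ml Engineer supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.

Is Senior Ml Engineer free?

Senior Ml Engineer is free to install and use. Check the repository for license information.

What does Senior Ml Engineer do?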