Comprehensive time series data science skill covering feature engineering, model training, and competition-winning strategies for forecasting and prediction problems.
Data source: ClawHub. View on ClawSkills.
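Method 1: Install from the command line (recommended)
Recommended (no need to install clawhub beforehand):
npx clawhub@latest --dir ~/.claude/skills install time-series-analysis
Or use the clawhub CLI (must be installed beforehand):
clawhub --dir ~/.claude/skills install time-series-analysis
⚠️ Requires Node.js 18+. If you do not have Node, use Method 2 below to download the ZIP directly.
Method 2: Manual download (no Node required)
Download the ZIP, extract it, place the folder at the path below, and restart the agent for it to take effect.
Install path:
~/.claude/skills/time-series-analysis/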
---
name: time-series-ds
description: Comprehensive time series data science skill covering feature engineering, model training, and competition-winning strategies for forecasting and prediction problems.
---
Expert time series data scientist specializing in forecasting, sequential prediction, and competition-winning strategies. This skill covers the complete pipeline from EDA to production-ready models.
- Focus on the 5-10 most predictive features, not every available column
- Lag, rolling, and EWM features are often more valuable than the raw data
- Interaction features between top predictors can be game-changers

- NEVER use random splits for time series
- Train on the past, validate on the future (e.g., ts_index <= threshold)
- Leakage from future data will destroy real-world performance

- If weights are provided, use them directly in training
- High-weight samples disproportionately affect the score
- Passing sample weights to model.fit() is usually simpler and more reliable than writing a custom loss

- Train the same model with different random seeds
- Averaging the predictions reduces variance
- Common seeds: 42, 2024, or any fixed set
---
GROUP_COLS = ['entity_id', 'category', 'horizon']

# `col` is the name of the numeric column to derive features from.
# Sort df by ts_index within each group first: shift/rolling follow row order.

# Lag features
for lag in [1, 3, 5, 10]:
    df[f'{col}_lag{lag}'] = df.groupby(GROUP_COLS)[col].shift(lag)

# Rolling mean and standard deviation
for window in [5, 10, 20]:
    df[f'{col}_roll_mean{window}'] = df.groupby(GROUP_COLS)[col].transform(
        lambda x: x.rolling(window, min_periods=1).mean()
    )
    df[f'{col}_roll_std{window}'] = df.groupby(GROUP_COLS)[col].transform(
        lambda x: x.rolling(window, min_periods=1).std()
    )

# Exponentially weighted mean
for span in [5, 10]:
    df[f'{col}_ewm{span}'] = df.groupby(GROUP_COLS)[col].transform(
        lambda x: x.ewm(span=span, adjust=False).mean()
    )

# First difference and percent change
df[f'{col}_diff1'] = df.groupby(GROUP_COLS)[col].diff(1)
df[f'{col}_diff_pct'] = df.groupby(GROUP_COLS)[col].pct_change(1)
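To see what the group-wise features above produce, here is a minimal sketch on a hypothetical two-entity toy frame (the column names `value`, `value_lag1`, and so on are made up for illustration). It also shows a shift-then-roll variant, which is an assumption on my part rather than something the skill prescribes: when the source column is the target itself, rolling over shifted values keeps the current row out of its own feature.

```python
import pandas as pd

# Hypothetical toy panel: two entities, five time steps each.
df = pd.DataFrame({
    "entity_id": ["a"] * 5 + ["b"] * 5,
    "ts_index": list(range(5)) * 2,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

# Shifting relies on row order, so sort by time within each entity first.
df = df.sort_values(["entity_id", "ts_index"]).reset_index(drop=True)

# Lag-1: previous observation of the same entity (NaN on each entity's first row).
df["value_lag1"] = df.groupby("entity_id")["value"].shift(1)

# Rolling mean over the current row and the two rows before it.
df["value_roll_mean3"] = df.groupby("entity_id")["value"].transform(
    lambda x: x.rolling(3, min_periods=1).mean()
)

# Variant that excludes the current row: shift first, then roll.
df["value_past_mean3"] = df.groupby("entity_id")["value"].transform(
    lambda x: x.shift(1).rolling(3, min_periods=1).mean()
)

print(df)
```

Note that the lag at the first row of entity `b` is NaN rather than the last value of entity `a`, which is the point of grouping before shifting.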
# Difference between related features
df['feat_diff'] = df['feature_a'] - df['feature_b']
# Ratio between features
df['feat_ratio'] = df['feature_a'] / (df['feature_b'] + 1e-7)
# Product interactions
df['feat_product'] = df['feature_a'] * df['feature_b']
# Compute on training data only (ts_index <= threshold)
train_only = df[df.ts_index <= VAL_THRESHOLD]
enc_stats = {
'category': train_only.groupby('category')['target'].mean().to_dict(),
'global_mean': train_only['target'].mean()
}
# Apply to all data
df['category_enc'] = df['category'].map(enc_stats['category']).fillna(enc_stats['global_mean'])
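The train-only encoding above can be checked on a small example. This is a sketch with invented data and an assumed threshold of 3; it shows that rows after the threshold do not influence the category means, and that a category never seen in training falls back to the global training mean.

```python
import pandas as pd

# Hypothetical toy data; category "c" appears only after the threshold.
df = pd.DataFrame({
    "ts_index": [0, 1, 2, 3, 4, 5],
    "category": ["a", "a", "b", "b", "a", "c"],
    "target": [1.0, 3.0, 10.0, 20.0, 100.0, 200.0],
})
VAL_THRESHOLD = 3  # assumed cutoff for this example

# Statistics come from the training window only.
train_only = df[df["ts_index"] <= VAL_THRESHOLD]
category_means = train_only.groupby("category")["target"].mean()
global_mean = train_only["target"].mean()

# Apply to every row; unseen categories get the global training mean.
df["category_enc"] = df["category"].map(category_means).fillna(global_mean)

print(df[["ts_index", "category", "category_enc"]])
```

The row at `ts_index == 4` has a target of 100.0, yet its encoding stays at the training mean for category `a`, so the validation target never leaks into the feature.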
# Cyclical encoding for periodicity
df['t_cycle'] = np.sin(2 * np.pi * df['ts_index'] / period)
df['t_cycle_cos'] = np.cos(2 * np.pi * df['ts_index'] / period)
# Normalized time position
df['ts_normalized'] = df['ts_index'] / df['ts_index'].max()
# Time bins
df['ts_bin'] = pd.cut(df['ts_index'], bins=10, labels=False)
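The reason for encoding periodicity with a sine and cosine pair, rather than a raw position within the cycle, is that the end of one cycle lands next to the start of the next. A minimal sketch, assuming a period of 7 (the source leaves `period` unspecified) and using the illustrative column names `t_cycle_sin` and `t_cycle_cos`:

```python
import numpy as np
import pandas as pd

period = 7  # assumed cycle length, e.g. a weekly pattern on daily steps
df = pd.DataFrame({"ts_index": np.arange(15)})

angle = 2 * np.pi * df["ts_index"] / period
df["t_cycle_sin"] = np.sin(angle)
df["t_cycle_cos"] = np.cos(angle)


def cyc_dist(i, j):
    """Euclidean distance between two time steps in (sin, cos) space."""
    cols = ["t_cycle_sin", "t_cycle_cos"]
    a = df.loc[i, cols].to_numpy(dtype=float)
    b = df.loc[j, cols].to_numpy(dtype=float)
    return float(np.linalg.norm(a - b))


# Steps one full period apart coincide, and the wrap-around step (6 -> 7)
# is exactly as far as an ordinary step (0 -> 1).
print(cyc_dist(0, 7), cyc_dist(6, 7), cyc_dist(0, 1))
```

Both columns are needed: the sine alone maps two different positions in the cycle to the same value.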
---
lgb_cfg = {
'objective': 'regression',
'metric': 'rmse',
'learning_rate': 0.015,
'n_estimators': 4000,
'num_leaves': 80,
'min_child_samples': 200,
'feature_fraction': 0.6,
'bagging_fraction': 0.7,
'bagging_freq': 5,
'lambda_l1': 0.1,
'lambda_l2': 10.0,
'verbosity': -1
}
import numpy as np
import lightgbm as lgb

SEEDS = [42, 2024]
val_pred = np.zeros(len(y_val))
test_pred = np.zeros(len(X_test))
for seed in SEEDS:
    model = lgb.LGBMRegressor(**lgb_cfg, random_state=seed)
    model.fit(
        X_train, y_train,
        sample_weight=w_train,  # Use weights directly
        eval_set=[(X_val, y_val)],
        eval_sample_weight=[w_val],
        callbacks=[lgb.early_stopping(200, verbose=False)]
    )
    # Average predictions across seeds
    val_pred += model.predict(X_val) / len(SEEDS)
    test_pred += model.predict(X_test) / len(SEEDS)
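The effect of seed averaging can be illustrated without LightGBM. The sketch below swaps in scikit-learn's `RandomForestRegressor` on synthetic data purely as a stand-in; the data, seeds, and model choice are assumptions for illustration, not the skill's own setup. Because RMSE is a norm of the error vector, the RMSE of the averaged prediction can never exceed the mean of the per-seed RMSEs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (hypothetical), ordered as if by time.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Time-ordered split: first 240 rows for training, last 60 for validation.
X_train, y_train = X[:240], y[:240]
X_val, y_val = X[240:], y[240:]

SEEDS = [42, 2024, 7]
preds = []
for seed in SEEDS:
    # Same model and data, only the random seed differs.
    model = RandomForestRegressor(n_estimators=10, random_state=seed)
    model.fit(X_train, y_train)
    preds.append(model.predict(X_val))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


single_rmse = [rmse(y_val, p) for p in preds]
ensemble_rmse = rmse(y_val, np.mean(preds, axis=0))

print("per-seed RMSE:", single_rmse)
print("ensemble RMSE:", ensemble_rmse)
```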
# Train separate model per forecast horizon
predictions = {}
for horizon in [1, 3, 10, 25]:
    train_h = df[df.horizon == horizon]
    test_h = test_df[test_df.horizon == horizon]
    # Build features, train model (train_model is a user-defined helper)
    model = train_model(train_h, test_h)
    predictions[horizon] = model.predict(test_h)
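One practical detail of per-horizon training is putting the predictions back in the original test row order. The following sketch uses a hypothetical `MeanModel` class as a stand-in for a real regressor and invented toy frames; it writes each horizon's predictions through a boolean mask instead of collecting them in a dictionary.

```python
import numpy as np
import pandas as pd


class MeanModel:
    """Stand-in for a real regressor: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Hypothetical toy frames with two forecast horizons.
train = pd.DataFrame({
    "horizon": [1, 1, 3, 3],
    "x": [0.0, 1.0, 2.0, 3.0],
    "target": [1.0, 3.0, 10.0, 30.0],
})
test = pd.DataFrame({"horizon": [3, 1, 3], "x": [4.0, 5.0, 6.0]})

test["prediction"] = np.nan
for horizon, train_h in train.groupby("horizon"):
    test_mask = test["horizon"] == horizon
    model = MeanModel().fit(train_h[["x"]], train_h["target"])
    # Writing through the mask keeps each prediction on its original row.
    test.loc[test_mask, "prediction"] = model.predict(test.loc[test_mask, ["x"]])

print(test)
```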
---
VAL_THRESHOLD = int(df['ts_index'].max() * 0.85)
train_mask = df['ts_index'] <= VAL_THRESHOLD
val_mask = df['ts_index'] > VAL_THRESHOLD
X_train = df.loc[train_mask, feature_cols]
X_val = df.loc[val_mask, feature_cols]
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
# TimeSeriesSplit splits by row position, so df must be sorted by ts_index
for train_idx, val_idx in tscv.split(df):
    # Train on expanding window
    pass
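With panel data, where several entities share each time step, splitting rows directly can place part of a time step in training and the rest in validation. One way around this, sketched here on a hypothetical toy panel, is to run `TimeSeriesSplit` over the unique `ts_index` values and map each fold back to rows. This is an illustrative variant, not necessarily how the skill's own pipeline does it.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Toy panel: 3 entities observed at 12 time steps (hypothetical data).
df = pd.DataFrame({
    "ts_index": np.repeat(np.arange(12), 3),
    "entity_id": np.tile(["a", "b", "c"], 12),
})

# Split the unique time steps, not the rows, so no time step straddles a fold.
unique_ts = np.sort(df["ts_index"].unique())
tscv = TimeSeriesSplit(n_splits=5)

folds = []
for train_pos, val_pos in tscv.split(unique_ts):
    train_mask = df["ts_index"].isin(unique_ts[train_pos]).to_numpy()
    val_mask = df["ts_index"].isin(unique_ts[val_pos]).to_numpy()
    # Store row positions for each fold.
    folds.append((np.flatnonzero(train_mask), np.flatnonzero(val_mask)))
    print(
        f"train ts <= {unique_ts[train_pos].max()}, "
        f"val ts in [{unique_ts[val_pos].min()}, {unique_ts[val_pos].max()}]"
    )
```

Each fold trains on an expanding window of earlier time steps and validates on the block that follows it.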
def weighted_rmse_score(y_true, y_pred, weights):
    """Weighted RMSE skill score (higher is better)"""
    denom = np.sum(weights * y_true**2)
    if denom <= 0:
        return 0.0
    numer = np.sum(weights * (y_true - y_pred)**2)
    ratio = numer / denom
    return float(np.sqrt(1.0 - np.clip(ratio, 0.0, 1.0)))
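A few boundary cases make the behavior of this score easier to read. The sketch below repeats the function so it runs on its own, with `np.asarray` conversions added (my addition, so plain lists work), and evaluates it on small hand-made inputs.

```python
import numpy as np


def weighted_rmse_score(y_true, y_pred, weights):
    """Skill score sqrt(1 - weighted SSE / weighted sum of y_true**2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    denom = np.sum(weights * y_true**2)
    if denom <= 0:
        return 0.0
    numer = np.sum(weights * (y_true - y_pred) ** 2)
    return float(np.sqrt(1.0 - np.clip(numer / denom, 0.0, 1.0)))


y_true = [1.0, 2.0, 3.0]
uniform = [1.0, 1.0, 1.0]

# Perfect predictions give the maximum score of 1.0.
perfect = weighted_rmse_score(y_true, y_true, uniform)
# Predicting all zeros makes the ratio exactly 1, so the score is 0.0.
all_zero = weighted_rmse_score(y_true, [0.0, 0.0, 0.0], uniform)
# Anything worse than all zeros is clipped to 0.0 as well.
worse = weighted_rmse_score(y_true, [-1.0, -2.0, -3.0], uniform)

# One miss on the last sample, scored with uniform and then skewed weights.
one_miss = [1.0, 2.0, 2.0]
score_uniform = weighted_rmse_score(y_true, one_miss, uniform)
score_heavy = weighted_rmse_score(y_true, one_miss, [1.0, 1.0, 10.0])

print(perfect, all_zero, worse, score_uniform, score_heavy)
```

The last comparison shows why high-weight samples matter: the same single error costs more when that sample carries ten times the weight.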
---
- Distribution by time period
- Distribution by category/horizon
- Trend and seasonality detection

- Pattern analysis (random vs systematic)
- Group-based imputation strategy

- Concentration analysis
- Impact on scoring metric

- Correlation with target
- Multicollinearity between features

- Stationarity tests
- Rolling statistics visualization
---
| Pitfall | Solution |
|---------|----------|
| Random train/test split | Use time-based split |
| Using future data for encoding | Compute stats on train only |
| Ignoring sample weights | Use sample_weight in fit() |
| Too many features | Focus on top 5-10 predictors |
| Single model | Multi-seed ensemble |
| Overfitting validation | Large early stopping patience |
---
graph TD
A[Load Data] --> B[Compute Encoding Stats on Train]
B --> C[Build Features]
C --> D[Time-Based Split]
D --> E{For Each Horizon}
E --> F[Train Multi-Seed Ensemble]
F --> G[Validate & Score]
G --> H[Generate Predictions]
H --> I[Aggregate & Submit]
---
# Run complete pipeline
python train_winning.py
# Generate submission
python generate_submission.py
# Validate submission format
python -c "
import pandas as pd
sub = pd.read_csv('submission.csv')
print(f'Rows: {len(sub)}, Cols: {list(sub.columns)}')
print(sub.head())
"
---
/data-analyst for comprehensive EDA
/data-scientist for advanced feature engineering
/fintech-engineer for financial risk analysis
/quant-analyst for portfolio strategies
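Place the skill folder in ~/.claude/skills/time-series-analysis/ (personal level, available to all projects) or in .claude/skills/time-series-analysis/ (project level). After restarting the AI client, invoke it explicitly with /time-series-analysis, or let the AI discover and use it automatically from context.
time-series-analysis supports Claude, Cursor, and OpenClaw, and integrates with these AI platforms to extend their capabilities.
time-series-analysis is free to install and use. See the repository for license information.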
time-series-analysis belongs to the "Data & Analytics" category; skills in this category help AI agents carry out specialized tasks in this domain.