Rate limiting algorithms, implementation strategies, HTTP conventions, tiered limits, distributed patterns, and client-side handling. Use when protecting APIs from abuse, implementing usage tiers, or configuring gateway-level throttling.
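Method 1: Install from the command line (recommended)

Recommended (no need to install clawhub first):

```
npx clawhub@latest --dir ~/.claude/skills install api-rate-limiting
```

Or use the clawhub CLI (must be installed first):

```
clawhub --dir ~/.claude/skills install api-rate-limiting
```

Requires Node.js 18+. Without Node, use Method 2 below and download the ZIP directly.

Method 2: Manual download (no Node required)

Download the ZIP, extract it, place the folder at the path below, and restart the Agent:

```
~/.claude/skills/api-rate-limiting/
```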
---
name: rate-limiting
model: standard
description: Rate limiting algorithms, implementation strategies, HTTP conventions, tiered limits, distributed patterns, and client-side handling. Use when protecting APIs from abuse, implementing usage tiers, or configuring gateway-level throttling.
---
| Algorithm | Accuracy | Burst Handling | Best For |
|-----------|----------|----------------|----------|
| Token Bucket | High | Allows controlled bursts | API rate limiting, traffic shaping |
| Leaky Bucket | High | Smooths bursts entirely | Steady-rate processing, queues |
| Fixed Window | Low | Allows edge bursts (2x) | Simple use cases, prototyping |
| Sliding Window Log | Very High | Precise control | Strict compliance, billing-critical |
| Sliding Window Counter | High | Good approximation | Production APIs (best tradeoff) |
Fixed window problem: with an hourly window, a user can send the full limit at 11:59 and again at 12:01, so the effective rate doubles across the window boundary. Sliding windows fix this.
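The boundary burst can be reproduced with a few lines. This is a minimal sketch, not a production limiter: the `FixedWindowCounter` class and the timestamps are made up for illustration, and time is passed in explicitly so the example is deterministic.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Illustrative fixed-window limiter keyed by window index."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = defaultdict(int)  # window index -> accepted requests

    def allow(self, now: float) -> bool:
        window = int(now // self.window_sec)
        if self.counts[window] >= self.limit:
            return False
        self.counts[window] += 1
        return True


limiter = FixedWindowCounter(limit=100, window_sec=3600)

# 100 requests one minute before the hour boundary, 100 one minute after it.
timestamps = [3600 - 60] * 100 + [3600 + 60] * 100
burst_allowed = sum(1 for t in timestamps if limiter.allow(t))

print(burst_allowed)  # 200: twice the hourly limit, accepted within two minutes
```

Each window on its own stays within the limit, which is why the counter never objects.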
Bucket holds tokens up to capacity. Tokens refill at a fixed rate. Each request consumes one.
```python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        # Top up for the time elapsed since the last call, capped at capacity
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

        # Spend one token per request
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```
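To see the burst-then-refill behaviour without waiting on a real clock, here is a sketch of the same idea with the current time passed in as an argument. The `ClockedTokenBucket` name and the `now` parameter are assumptions made for this example only.

```python
class ClockedTokenBucket:
    """Token bucket variant that takes the clock as an argument (for demos and tests)."""

    def __init__(self, capacity: int, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = ClockedTokenBucket(capacity=5, refill_rate=1.0)

# Seven requests at t=0: the first five drain the bucket, the last two are rejected.
burst = [bucket.allow(0.0) for _ in range(7)]

# Two seconds later two tokens have refilled, so the next request passes.
later = bucket.allow(2.0)
```

The capacity sets how large a burst is tolerated, while the refill rate sets the sustained rate.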
A hybrid of fixed window and sliding window log: it weights the previous window's count by the fraction of that window still covered by the sliding window.
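```python
import time
from collections import defaultdict

# In-memory stand-in for a shared counter store; swap for Redis or similar
_counts: dict[tuple[str, int], int] = defaultdict(int)


def get_count(key: str, window: int) -> int:
    return _counts.get((key, window), 0)


def increment_count(key: str, window: int) -> None:
    _counts[(key, window)] += 1


def sliding_window_allow(key: str, limit: int, window_sec: int) -> bool:
    now = time.time()
    current_window = int(now // window_sec)
    position_in_window = (now % window_sec) / window_sec
    prev_count = get_count(key, current_window - 1)
    curr_count = get_count(key, current_window)
    # Weight the previous window by the share of it still inside the sliding window
    estimated = prev_count * (1 - position_in_window) + curr_count
    if estimated >= limit:
        return False
    increment_count(key, current_window)
    return True
```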
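A worked example of the weighting may help. The numbers are invented: a limit of 100 requests per minute, checked 15 seconds into the current minute, so 75% of the previous window still overlaps the sliding window. `estimated_count` is a hypothetical helper that isolates the formula.

```python
def estimated_count(prev_count: int, curr_count: int, position_in_window: float) -> float:
    """Sliding-window-counter estimate: weighted previous count plus current count."""
    return prev_count * (1 - position_in_window) + curr_count


LIMIT = 100

# 80 requests last minute, 30 so far this minute, 25% of the way through it.
quiet = estimated_count(prev_count=80, curr_count=30, position_in_window=0.25)
quiet_allowed = quiet < LIMIT   # 80 * 0.75 + 30 = 90.0, under the limit

# Same previous minute, but 45 requests already in this one.
busy = estimated_count(prev_count=80, curr_count=45, position_in_window=0.25)
busy_allowed = busy < LIMIT     # 80 * 0.75 + 45 = 105.0, over the limit
```

The estimate assumes requests in the previous window were evenly spread, which is where the approximation comes from.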
---
| Approach | Scope | Best For |
|----------|-------|----------|
| In-memory | Single server | Zero latency, no dependencies |
| Redis (INCR + EXPIRE) | Distributed | Multi-instance deployments |
| API Gateway | Edge | No code, built-in dashboards |
| Middleware | Per-service | Fine-grained per-user/endpoint control |
Use gateway-level limiting as the outer defense and application-level limiting for fine-grained control.
---
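Always return rate limit info, even on successful requests:

```http
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 1625097600
Retry-After: 30
```

RateLimit-Reset is shown here as a Unix timestamp, the form many APIs send. The IETF draft for these headers instead defines it as the number of seconds until the window resets, so document which form your API uses.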
| Header | When to Include |
|--------|-----------------|
| RateLimit-Limit | Every response |
| RateLimit-Remaining | Every response |
| RateLimit-Reset | Every response |
| Retry-After | 429 responses only |
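One way to keep the headers consistent is to build them in a single helper. This is a sketch under assumptions: the `rate_limit_headers` function and its `throttled` flag are hypothetical, and the reset value is emitted as a Unix timestamp to match the example above.

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_at: float,
                       now: float, throttled: bool) -> dict[str, str]:
    """Build rate limit headers; Retry-After is added only for throttled (429) responses."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(int(reset_at)),
    }
    if throttled:
        # Whole seconds until the window resets, never less than 1
        headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
    return headers


ok_headers = rate_limit_headers(1000, 742, reset_at=1625097600, now=1625097570, throttled=False)
limited_headers = rate_limit_headers(1000, 0, reset_at=1625097600, now=1625097570, throttled=True)
```

Attaching `ok_headers` to every successful response lets clients slow down before they ever hit a 429.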
Example 429 response body:

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Maximum 1000 requests per hour.",
    "retry_after": 30,
    "limit": 1000,
    "reset_at": "2025-07-01T12:00:00Z"
  }
}
```
Never return 500 or 503 for rate limiting — 429 is the correct status code.
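A framework-agnostic sketch of producing that response is shown below. The `rate_limited_response` function is hypothetical; it returns a plain (status, headers, body) tuple that would need adapting to whichever web framework is in use.

```python
import json
from datetime import datetime, timezone


def rate_limited_response(limit: int, window: str, retry_after: int,
                          reset_at: datetime) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a throttled request."""
    body = {
        "error": {
            "code": "rate_limit_exceeded",
            "message": f"Rate limit exceeded. Maximum {limit} requests per {window}.",
            "retry_after": retry_after,
            "limit": limit,
            "reset_at": reset_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
    }
    return 429, headers, json.dumps(body)


status, headers, body = rate_limited_response(
    limit=1000,
    window="hour",
    retry_after=30,
    reset_at=datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc),
)
```

Keeping the status code, the Retry-After header, and the JSON body in one place makes it harder for them to drift apart.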
---
Apply limits at multiple granularities:
| Scope | Key | Example Limit | Purpose |
|-------|-----|---------------|---------|
| Per-IP | Client IP | 100 req/min | Abuse prevention |
| Per-User | User ID | 1000 req/hr | Fair usage |
| Per-API-Key | API key | 5000 req/hr | Service-to-service |
| Per-Endpoint | Route + key | 60 req/min on /search | Protect expensive ops |
Tiered pricing:
| Tier | Rate Limit | Burst | Cost |
|------|-----------|-------|------|
| Free | 100 req/hr | 10 | $0 |
| Pro | 5,000 req/hr | 100 | $49/mo |
| Enterprise | 100,000 req/hr | 2,000 | Custom |
Evaluate from most specific to least specific: per-endpoint > per-user > per-IP.
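The evaluation order can be sketched as a loop over scopes, most specific first. Everything here is illustrative: the `first_exceeded_scope` helper, the keys, and the tiny limits are invented, and a plain `Counter` stands in for real per-window counters.

```python
from collections import Counter
from typing import Optional

Scope = tuple[str, str, int]  # (scope name, counter key, limit)


def first_exceeded_scope(counts: Counter, scopes: list[Scope]) -> Optional[str]:
    """Return the first scope over its limit, or None after counting the request."""
    for name, key, limit in scopes:
        if counts[key] >= limit:
            return name
    # Only count the request once every scope has accepted it
    for _, key, _ in scopes:
        counts[key] += 1
    return None


counts: Counter = Counter()
scopes = [
    ("per-endpoint", "search:user42", 2),
    ("per-user", "user42", 5),
    ("per-IP", "203.0.113.7", 10),
]

results = [first_exceeded_scope(counts, scopes) for _ in range(3)]
```

Reporting which scope tripped makes it possible to tell the client whether to back off from one endpoint or from the API as a whole.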
---
Redis-based pattern for consistent limiting across instances:
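```python
import time


def redis_rate_limit(redis, key: str, limit: int, window: int) -> bool:
    now = time.time()
    window_key = f"rl:{key}:{int(now // window)}"

    # redis-py pipelines run as a MULTI/EXEC transaction by default
    pipe = redis.pipeline()
    pipe.incr(window_key)
    pipe.expire(window_key, window * 2)  # keep the key until well past the window end
    results = pipe.execute()

    return results[0] <= limit  # results[0] is the count after INCR
```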
Atomic Lua script (prevents race conditions):
```lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('INCR', key)
if current == 1 then
    -- First request in this window: start the expiry clock
    redis.call('EXPIRE', key, window)
end
return current <= limit and 1 or 0
```
Never do a separate GET followed by a SET: two concurrent requests can read the same count, one increment is lost, and clients slip past the limit.
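The race can be made concrete by writing out one bad interleaving by hand. This is a simulation only: a plain dict stands in for Redis, and the two "requests" are sequential lines arranged in the order a race could produce.

```python
def racy_pair(limit: int) -> tuple[list[bool], int]:
    """Two requests using GET-then-SET, interleaved so both read before either writes."""
    store = {"count": 0}
    seen_a = store["count"]          # request A: GET
    seen_b = store["count"]          # request B: GET (A has not written yet)
    decisions = [seen_a < limit, seen_b < limit]
    store["count"] = seen_a + 1      # request A: SET
    store["count"] = seen_b + 1      # request B: SET overwrites A's increment
    return decisions, store["count"]


def atomic_pair(limit: int) -> tuple[list[bool], int]:
    """Two requests using increment-and-read as a single step, as INCR does."""
    store = {"count": 0}
    decisions = []
    for _ in range(2):
        store["count"] += 1
        decisions.append(store["count"] <= limit)
    return decisions, store["count"]


racy_decisions, racy_count = racy_pair(limit=1)
atomic_decisions, atomic_count = atomic_pair(limit=1)
```

With a limit of 1, the racy version admits both requests and records only one of them, while the atomic version admits exactly one.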
---
NGINX:
```nginx
http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
        }
    }
}
```
Kong:
```yaml
plugins:
  - name: rate-limiting
    config:
      minute: 60
      hour: 1000
      policy: redis
      redis_host: redis.internal
```
---
Clients must handle 429 gracefully:
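```typescript
async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    // Retry-After may be delta-seconds or an HTTP-date; only the numeric form
    // is used here. A missing or non-numeric value falls back to backoff.
    const retryAfter = Number(res.headers.get('Retry-After'));
    const delay = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : Math.min(1000 * 2 ** attempt, 30000); // exponential backoff, capped at 30 s
    await new Promise(r => setTimeout(r, delay));
  }
  throw new Error('Rate limit exceeded after retries');
}
```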
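The delay calculation can also be sketched in Python, this time covering both Retry-After forms (delta-seconds and HTTP-date) and adding random jitter to the fallback so that many clients do not retry in lockstep. The `retry_delay` function and its parameters are assumptions for illustration, not part of any client library.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                now: Optional[datetime] = None, cap: float = 30.0) -> float:
    """Seconds to wait before retrying a 429 response."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delta-seconds form
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable header: exponential backoff with full jitter, capped
    return random.uniform(0, min(cap, 2 ** attempt))


from_seconds = retry_delay("30", attempt=0)
from_date = retry_delay(
    "Wed, 21 Oct 2015 07:28:00 GMT",
    attempt=0,
    now=datetime(2015, 10, 21, 7, 27, 30, tzinfo=timezone.utc),
)
fallback = retry_delay(None, attempt=3)
```

A server-provided value always wins over the computed backoff, since the server knows when the window actually resets.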
Always honor Retry-After when present.
---
Track these metrics:
---
| Anti-Pattern | Fix |
|-------------|-----|
| Application-only limiting | Always combine with infrastructure-level limits |
| No retry guidance | Always include Retry-After header on 429 |
| Inconsistent limits | Same endpoint, same limits across services |
| No burst allowance | Allow controlled bursts for legitimate traffic |
| Silent dropping | Always return 429 so clients can distinguish from errors |
| Global single counter | Per-endpoint counters to protect expensive operations |
| Hard-coded limits | Use configuration, not code constants |
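As one possible take on the "Hard-coded limits" fix, limits can be loaded from configuration instead of living in code. The sketch below is an assumption-heavy illustration: the JSON layout, the `TierLimit` dataclass, and `load_tiers` are invented for this example, and the inline string stands in for a config file or service.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimit:
    requests_per_hour: int
    burst: int


def load_tiers(raw: str) -> dict[str, TierLimit]:
    """Parse a JSON document mapping tier names to their limits."""
    return {name: TierLimit(**spec) for name, spec in json.loads(raw).items()}


# In practice this would come from a file or config service, not a literal.
CONFIG = """
{
  "free":       {"requests_per_hour": 100,    "burst": 10},
  "pro":        {"requests_per_hour": 5000,   "burst": 100},
  "enterprise": {"requests_per_hour": 100000, "burst": 2000}
}
"""

tiers = load_tiers(CONFIG)
pro_limit = tiers["pro"].requests_per_hour
```

Changing a tier then becomes a configuration change rather than a code deployment.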
---
Never return 200 OK when rate limiting: clients must know they were throttled.