Use Case
Agent Evaluation is an AI agent skill for testing and benchmarking LLM agents. It covers behavioral testing, capability assessment, reliability metrics, and production monitoring, an area where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent. This guide walks you through installing Agent Evaluation, configuring it for your setup, and running your first commands.
Install Agent Evaluation: npx clawhub@latest --dir ~/.claude/skills install agent-evaluation
Restart your AI client (Claude Code, Cursor, Gemini CLI, or OpenClaw)
Type a natural language request related to developer and DevOps work to trigger Agent Evaluation
Review the output and refine your prompt for better results
Combine Agent Evaluation with other skills to build multi-step workflows
Copy these prompts and use them with your AI agent after installing Agent Evaluation
Help me get started with Agent Evaluation
What can Agent Evaluation do for my developer & devops workflow?
Show me an example of using Agent Evaluation
Option 1: Install via CLI (recommended)
Recommended (no pre-install needed)
npx clawhub@latest --dir ~/.claude/skills install agent-evaluation
Or via clawhub CLI (if already installed)
clawhub --dir ~/.claude/skills install agent-evaluation
⚠️ Requires Node.js 18+. No Node? Use Option 2 below to download the ZIP instead.
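Since Option 1 depends on Node.js 18+, it can help to check the local Node version before running npx. The snippet below is a minimal sketch of such a check, not part of Agent Evaluation or clawhub; the helper name node_major_ok is hypothetical, and it assumes node --version prints a string of the form v20.11.0.

```shell
# Hypothetical helper: succeeds only if the given version string
# (e.g. "v20.11.0") has a major version of 18 or higher.
node_major_ok() {
  major="${1#v}"        # strip the leading "v"
  major="${major%%.*}"  # keep only the text before the first dot
  [ "$major" -ge 18 ] 2>/dev/null
}

# Pick an install option based on what is available locally.
if command -v node >/dev/null 2>&1 && node_major_ok "$(node --version)"; then
  echo "Node.js 18+ found: Option 1 (CLI install) should work"
else
  echo "Node.js 18+ not found: use Option 2 (manual install)"
fi
```

If the check reports that Node.js is missing or too old, Option 2 avoids the dependency entirely.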
Option 2: Manual install (no Node required)
Download the ZIP, extract it, and place the folder at the path below. Restart your agent to activate.
Install path
~/.claude/skills/agent-evaluation/
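After either install option, a quick way to confirm the folder landed in the right place is to test for the directory. This is a rough sketch; the function name skill_installed is hypothetical, and it only checks that the folder exists, not that its contents are valid.

```shell
# Hypothetical helper: succeeds if an agent-evaluation folder exists
# under the given skills directory (defaults to ~/.claude/skills).
skill_installed() {
  [ -d "${1:-$HOME/.claude/skills}/agent-evaluation" ]
}

if skill_installed; then
  echo "agent-evaluation folder found; restart your agent to activate it"
else
  echo "agent-evaluation folder not found under ~/.claude/skills"
fi
```

Passing a different directory as the first argument lets the same check work if your skills live somewhere other than the default path.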