Use when calling local AI on this Mac — text generation, embeddings, speech-to-text, OCR, or image understanding. LLM/VLM via oMLX gateway at localhost:8000/...
Source: ClawHub.
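Method 1: Command-line install (recommended)
Recommended (no need to install clawhub first):
```bash
npx clawhub@latest --dir ~/.claude/skills install mlx-local-inference
```
Or use the clawhub CLI (must be installed first):
```bash
clawhub --dir ~/.claude/skills install mlx-local-inference
```
⚠️ Requires Node.js 18+. No Node? Use Method 2 below to download the ZIP directly.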
Method 2: Manual download (no Node required)
Download the ZIP, extract it, move the folder to the following path, and restart the agent:
Install path:
~/.claude/skills/mlx-local-inference/
```yaml
---
name: mlx-local-inference
description: >
  Use when calling local AI on this Mac — text generation, embeddings, speech-to-text, OCR, or image understanding. LLM/VLM via oMLX gateway at localhost:8000/v1. Embedding/ASR/OCR via Python libraries (mlx-lm, mlx-vlm, mlx-audio). Works offline. Use instead of cloud APIs for privacy or low latency.
metadata: { "openclaw": { "os": ["darwin"], "requires": { "anyBins": ["uv"] } } }
---
```
Local AI inference on Apple Silicon. oMLX handles LLM/VLM with continuous batching. Python libraries handle Embedding/ASR/OCR directly via uv.
```text
┌─────────────────────────────────────┐
│ oMLX (localhost:8000/v1)            │
│ - LLM (Qwen3.5-35B, etc.)           │
│ - VLM (vision-language models)      │
│ - Continuous batching + SSD cache   │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Python Libraries (via uv run)       │
│ - mlx-lm: Embedding                 │
│ - mlx-vlm: OCR (PaddleOCR-VL)       │
│ - mlx-audio: ASR (Qwen3-ASR)        │
└─────────────────────────────────────┘
```
| Capability | Implementation | Model | Size |
|-----------|---------------|-------|------|
| 💬 LLM | oMLX API | Qwen3.5-35B-A3B-4bit | ~20 GB |
| 👁️ VLM | oMLX API | Any mlx-vlm model | varies |
| 📐 Embed | mlx-lm (uv) | Qwen3-Embedding-0.6B-4bit-DWQ | ~1 GB |
| 🎤 ASR | mlx-audio (uv) | Qwen3-ASR-1.7B-8bit | ~1.5 GB |
| 👁️ OCR | mlx-vlm (uv) | PaddleOCR-VL-1.5-6bit | ~3.3 GB |
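The sizes in the table add up quickly. As a rough worked example, the four fixed models come to about 20 + 1 + 1.5 + 3.3 = 25.8 GB of weights, before counting any VLM you add. The small sketch below does that arithmetic. The figures are the approximate sizes from the table, not measured values, and `total_size_gb` is a hypothetical helper.

```python
# Approximate sizes from the table, in GB (VLM excluded: its size varies by model).
MODEL_SIZES_GB = {
    "Qwen3.5-35B-A3B-4bit": 20.0,
    "Qwen3-Embedding-0.6B-4bit-DWQ": 1.0,
    "Qwen3-ASR-1.7B-8bit": 1.5,
    "PaddleOCR-VL-1.5-6bit": 3.3,
}


def total_size_gb(sizes: dict[str, float]) -> float:
    """Sum model sizes, rounded to one decimal place to hide float noise."""
    return round(sum(sizes.values()), 1)
```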
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

# Text generation
resp = client.chat.completions.create(
    model="Qwen3.5-35B-A3B-4bit",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```
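The table routes VLM requests through the same oMLX API, but no image example is given. Below is a minimal sketch. It assumes oMLX accepts the standard OpenAI vision message format, where an `image_url` content part carries a base64 data URL. `build_vision_messages` is a hypothetical helper, and the model name in the usage comment is a placeholder for whichever mlx-vlm model you have loaded.

```python
import base64
import mimetypes
from pathlib import Path


def build_vision_messages(prompt: str, image_path: str) -> list[dict]:
    """Build an OpenAI-style message list with one text part and one image part.

    Assumption: the oMLX gateway accepts base64 data URLs in `image_url`
    content parts, as the OpenAI chat completions API does.
    """
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            ],
        }
    ]


# Usage (requires the oMLX server to be running):
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")
# resp = client.chat.completions.create(
#     model="YOUR-VLM-MODEL",  # placeholder: pick an id listed by /v1/models
#     messages=build_vision_messages("Describe this image.", "photo.jpg"),
# )
# print(resp.choices[0].message.content)
```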
---
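```bash
uv run --with mlx-lm python -c "
import os
import mlx.core as mx
from mlx_lm import load
model, tokenizer = load(os.path.expanduser('~/models/Qwen3-Embedding-0.6B-4bit-DWQ'))
tokens = mx.array([tokenizer.encode('text to embed')])
# model.model returns final hidden states (batch, seq_len, hidden); model(...) returns logits
hidden = model.model(tokens)
# Last-token pooling plus L2 normalization, as described for Qwen3-Embedding
embedding = hidden[:, -1, :]
embedding = embedding / mx.linalg.norm(embedding, axis=-1, keepdims=True)
print(embedding.shape)
"
```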
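With embeddings in hand, a typical next step is ranking documents by similarity to a query. Below is a minimal sketch in plain Python. It assumes each embedding has been converted to a list of floats, for example with `.tolist()` on a row of the MLX array. `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not part of mlx-lm.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scores = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

If the vectors are already L2-normalized, the dot product alone gives the same ranking. The explicit norms just keep the helper safe for unnormalized input.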
---
> Important: Must run with --python 3.11 to avoid OpenMP threading issues (SIGSEGV).
```bash
uv run --python 3.11 --with mlx-audio python -m mlx_audio.stt.generate \
  --model ~/models/Qwen3-ASR-1.7B-8bit \
  --audio "audio.wav" \
  --output-path /tmp/asr_result \
  --format txt \
  --language zh \
  --verbose
```
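To trigger the same transcription from a Python program, one option is to build the command as an argument list and run it with `subprocess`. The minimal sketch below reuses only the flags shown above and drops `--verbose`. `build_asr_command` and `transcribe` are hypothetical helpers. The exact file name the CLI writes under `--output-path` is not stated here, so check it after a first run.

```python
import os
import subprocess


def build_asr_command(
    audio_path: str,
    model_dir: str = "~/models/Qwen3-ASR-1.7B-8bit",
    output_path: str = "/tmp/asr_result",
    language: str = "zh",
) -> list[str]:
    """Assemble the uv/mlx-audio transcription command as an argv list."""
    return [
        "uv", "run", "--python", "3.11", "--with", "mlx-audio",
        "python", "-m", "mlx_audio.stt.generate",
        "--model", os.path.expanduser(model_dir),  # no shell involved, so expand ~ here
        "--audio", audio_path,
        "--output-path", output_path,
        "--format", "txt",
        "--language", language,
    ]


def transcribe(audio_path: str, **kwargs) -> None:
    """Run the transcription; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_asr_command(audio_path, **kwargs), check=True)
```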
---
> Important: The generate function parameter order must be (model, processor, prompt, image).
```bash
cat << 'PY_EOF' > run_ocr.py
import os
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
model_path = os.path.expanduser("~/models/PaddleOCR-VL-1.5-6bit")
model, processor = load(model_path)
prompt = apply_chat_template(processor, config=model.config, prompt="OCR:", num_images=1)
output = generate(model, processor, prompt, "document.jpg", max_tokens=512, temp=0.0)
print(output.text)
PY_EOF
uv run --python 3.11 --with mlx-vlm python run_ocr.py
```
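Loading the OCR model is the slow step, so for a folder of scanned pages it is cheaper to load once and loop. The minimal sketch below reuses the calls from `run_ocr.py`, with the same `generate` argument order. `find_images` and `ocr_folder` are hypothetical helpers, and the set of image extensions is an assumption. Run it the same way as `run_ocr.py`, with `uv run --python 3.11 --with mlx-vlm`.

```python
import os
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your scans


def find_images(folder: str) -> list[Path]:
    """Return image files directly inside `folder` (non-recursive), sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def ocr_folder(folder: str, model_path: str = "~/models/PaddleOCR-VL-1.5-6bit") -> dict[str, str]:
    """OCR every image in `folder`, loading the model only once."""
    # Imported here so find_images works without mlx-vlm installed.
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template

    model, processor = load(os.path.expanduser(model_path))
    prompt = apply_chat_template(processor, config=model.config, prompt="OCR:", num_images=1)
    results = {}
    for image in find_images(folder):
        # Argument order matters: (model, processor, prompt, image)
        output = generate(model, processor, prompt, str(image), max_tokens=512, temp=0.0)
        results[image.name] = output.text
    return results
```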
---
```bash
# Check running models
curl http://localhost:8000/v1/models
# Restart oMLX
launchctl kickstart -k gui/$(id -u)/com.omlx-server
```
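The `curl` check can also run from Python before sending requests, for example to confirm that the model you plan to call is loaded. The minimal sketch below uses only the standard library. It assumes `/v1/models` returns the OpenAI list format (`{"data": [{"id": ...}]}`). `model_ids` and `fetch_model_ids` are hypothetical helpers.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8000/v1", timeout: float = 5.0) -> list[str]:
    """Query the gateway's /v1/models endpoint; raises URLError if the server is down."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```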
All models are stored in `~/models/` using an oMLX-compatible directory layout:
```text
~/models/
├── Qwen3-Embedding-0.6B-4bit-DWQ/
├── Qwen3-ASR-1.7B-8bit/
├── PaddleOCR-VL-1.5-6bit/
└── Qwen3.5-35B-A3B-4bit/
```
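Every component expects its weights under `~/models/`, so a quick pre-flight check can report missing directories before anything is loaded. Below is a minimal sketch. `EXPECTED_MODELS` mirrors the tree above, and `missing_models` is a hypothetical helper.

```python
from pathlib import Path

EXPECTED_MODELS = (
    "Qwen3-Embedding-0.6B-4bit-DWQ",
    "Qwen3-ASR-1.7B-8bit",
    "PaddleOCR-VL-1.5-6bit",
    "Qwen3.5-35B-A3B-4bit",
)


def missing_models(root: str = "~/models", names: tuple[str, ...] = EXPECTED_MODELS) -> list[str]:
    """Return the expected model directory names that do not exist under `root`."""
    base = Path(root).expanduser()
    return [name for name in names if not (base / name).is_dir()]
```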
Requires uv (install with `curl -LsSf https://astral.sh/uv/install.sh | sh`).
After installing mlx-local-inference, you can say the following to the AI to trigger it:
Help me get started with mlx-local-inference
Explains what mlx-local-inference does, walks through the setup, and runs a quick demo based on your current project
Use mlx-local-inference to call local AI on this Mac (text generation, embeddings...)
Invokes mlx-local-inference with the right parameters and returns the result directly in the conversation
What can I do with mlx-local-inference in my design & creative workflow?
Lists the top use cases for mlx-local-inference, with example commands for each scenario
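Place the skill folder in `~/.claude/skills/mlx-local-inference/` (personal scope, available in all projects) or in `.claude/skills/mlx-local-inference/` (project scope). After restarting the AI client, invoke it explicitly with `/mlx-local-inference`, or let the AI discover and use it automatically based on context.
mlx-local-inference works with Claude, Cursor, and OpenClaw, extending what these AI platforms can do.
mlx-local-inference is free to install and use. See the repository for license details.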
mlx-local-inference belongs to the "Design & Creative" category, whose skills help AI agents perform specialized tasks in that domain.
Automate my design & creative tasks using mlx-local-inference
Identifies repetitive steps in your workflow and sets up mlx-local-inference to handle them automatically