MLX Swift LM - Run LLMs and VLMs on Apple Silicon using MLX. Covers local inference, streaming, tool calling, LoRA fine-tuning, and embeddings.
Source: ClawHub.
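Method 1: Command-line installation (recommended)

Recommended (no need to install clawhub first):

```
npx clawhub@latest --dir ~/.claude/skills install mlx-swift-lm
```

Or use the clawhub CLI (must be installed first):

```
clawhub --dir ~/.claude/skills install mlx-swift-lm
```

⚠️ Requires Node.js 18+. No Node? Use Method 2 below to download the ZIP directly.

Method 2: Manual download (no Node required)

Download the ZIP, extract it, move the folder to the following path, and restart the Agent:

Installation path:

```
~/.claude/skills/mlx-swift-lm/
```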
---
name: swift-mlx-lm
description: MLX Swift LM - Run LLMs and VLMs on Apple Silicon using MLX. Covers local inference, streaming, tool calling, LoRA fine-tuning, and embeddings.
triggers:
  - mlx
  - mlx-swift
  - mlx-lm
  - apple silicon llm
  - local llm swift
  - vision language model swift
  - lora training swift
---
mlx-swift-lm is a Swift package for running Large Language Models (LLMs) and Vision-Language Models (VLMs) on Apple Silicon using MLX. It supports local inference, fine-tuning via LoRA/DoRA, and embeddings generation.
- `MLXLMCommon`: core infrastructure (`ModelContainer`, `ChatSession`, `KVCache`, etc.)
- `MLXLLM`: text-only LLM support (e.g., Llama, Qwen, Gemma, Phi, DeepSeek; not an exhaustive list)
- `MLXVLM`: vision-language models (e.g., Qwen2-VL, PaliGemma, Gemma3; not an exhaustive list)
- `Embedders`: embedding models (BGE, Nomic, MiniLM)
| Purpose | File Path |
|---------|-----------|
| Thread-safe model wrapper | Libraries/MLXLMCommon/ModelContainer.swift |
| Simplified chat API | Libraries/MLXLMCommon/ChatSession.swift |
| Generation & streaming | Libraries/MLXLMCommon/Evaluate.swift |
| KV cache types | Libraries/MLXLMCommon/KVCache.swift |
| Model configuration | Libraries/MLXLMCommon/ModelConfiguration.swift |
| Chat message types | Libraries/MLXLMCommon/Chat.swift |
| Tool call processing | Libraries/MLXLMCommon/Tool/ToolCallFormat.swift |
| Concurrency utilities | Libraries/MLXLMCommon/Utilities/SerialAccessContainer.swift |
| LLM factory & registry | Libraries/MLXLLM/LLMModelFactory.swift |
| VLM factory & registry | Libraries/MLXVLM/VLMModelFactory.swift |
| LoRA configuration | Libraries/MLXLMCommon/Adapters/LoRA/LoRAContainer.swift |
| LoRA training | Libraries/MLXLLM/LoraTrain.swift |
```swift
import MLXLLM
import MLXLMCommon

// Load model (downloads from Hugging Face automatically)
let modelContainer = try await LLMModelFactory.shared.loadContainer(
    configuration: .init(id: "mlx-community/Qwen3-4B-4bit")
)

// Create chat session
let session = ChatSession(modelContainer)

// Single response
let response = try await session.respond(to: "What is Swift?")
print(response)

// Streaming response
for try await chunk in session.streamResponse(to: "Explain concurrency") {
    print(chunk, terminator: "")
}
```
```swift
import MLXVLM
import MLXLMCommon

let modelContainer = try await VLMModelFactory.shared.loadContainer(
    configuration: .init(id: "mlx-community/Qwen2-VL-2B-Instruct-4bit")
)
let session = ChatSession(modelContainer)

// With an image (video is also an optional parameter)
// imageURL: a URL to a local or remote image, defined elsewhere
let image = UserInput.Image.url(imageURL)
let response = try await session.respond(
    to: "Describe this image",
    image: image,
    video: nil  // Optional video parameter
)
```
```swift
import Embedders
import MLX  // MLXArray and eval()

// Note: Embedders uses the loadModelContainer() helper (not a factory pattern)
let container = try await loadModelContainer(
    configuration: ModelConfiguration(id: "mlx-community/bge-small-en-v1.5-mlx")
)

let embeddings = await container.perform { model, tokenizer, pooler in
    // Tokenize and add a batch dimension: shape [1, sequenceLength]
    let tokens = tokenizer.encode(text: "Hello world")
    let input = MLXArray(tokens).expandedDimensions(axis: 0)
    let output = model(input)
    // Pool token outputs into a single normalized sentence vector
    let pooled = pooler(output, normalize: true)
    eval(pooled)  // Force MLX's lazy computation before returning
    return pooled
}
```
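Since the pooler above is called with `normalize: true`, comparing two embeddings should come down to a dot product, assuming that normalization is to unit L2 length. A minimal sketch in plain Swift follows, assuming the pooled vectors have already been copied out of their `MLXArray`s into `[Float]` (that conversion is not shown). `l2Normalize` and `cosineSimilarity` are hypothetical helpers, not part of Embedders, and the 3-dimensional vectors are toy stand-ins for real embeddings.

```swift
// Hypothetical helpers for comparing embedding vectors stored as [Float].
// Not part of the Embedders library.

/// Scales a vector to unit L2 length (returns it unchanged if it is all zeros).
func l2Normalize(_ v: [Float]) -> [Float] {
    let norm = v.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    guard norm > 0 else { return v }
    return v.map { $0 / norm }
}

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// For vectors that are already unit length this equals the dot product.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "Embeddings must have the same dimension")
    let dot = zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

// Toy vectors standing in for real embeddings.
let query = l2Normalize([1, 2, 2])
let doc1 = l2Normalize([2, 4, 4])   // same direction as query
let doc2 = l2Normalize([2, -1, 0])  // orthogonal to query

print(cosineSimilarity(query, doc1))  // approximately 1.0
print(cosineSimilarity(query, doc2))  // approximately 0.0
```

Ranking documents against a query then amounts to sorting them by this score.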
ChatSession manages conversation history and KV cache automatically:
```swift
let session = ChatSession(
    modelContainer,
    instructions: "You are a helpful assistant",  // System prompt
    generateParameters: GenerateParameters(
        maxTokens: 500,
        temperature: 0.7
    )
)

// Multi-turn conversation (history preserved automatically)
let r1 = try await session.respond(to: "What is 2+2?")
let r2 = try await session.respond(to: "And if you multiply that by 3?")

// Clear session to start fresh
await session.clear()
```
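Why keeping the KV cache across turns matters can be seen with simple arithmetic. The sketch below is a hypothetical cost model in plain Swift, not how `ChatSession` is implemented: it counts how many token positions the model must run over during a conversation when the whole history is re-encoded every turn, versus when cached keys and values for earlier turns are reused so each token is processed once. The per-turn token counts are made-up numbers.

```swift
// Hypothetical cost model: tokens added to the context on each turn
// (user prompt + generated reply). The numbers are illustrative only.
let tokensPerTurn = [50, 120, 80, 200]

/// Without a reusable cache, each turn re-processes the full history plus its own tokens.
func tokensProcessedWithoutCache(_ turns: [Int]) -> Int {
    var total = 0
    var contextSoFar = 0
    for t in turns {
        contextSoFar += t
        total += contextSoFar  // whole history re-encoded this turn
    }
    return total
}

/// With a reused cache, each token is processed exactly once.
func tokensProcessedWithCache(_ turns: [Int]) -> Int {
    turns.reduce(0, +)
}

print(tokensProcessedWithoutCache(tokensPerTurn))  // 50 + 170 + 250 + 450 = 920
print(tokensProcessedWithCache(tokensPerTurn))     // 450
```

The gap grows roughly quadratically with the number of turns, which is why reusing the cache pays off most in long conversations.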
For lower-level control, use generate() directly:
```swift
let input = try await modelContainer.prepare(input: UserInput(prompt: .text("Hello")))
let stream = try await modelContainer.generate(input: input, parameters: GenerateParameters())

for await generation in stream {
    switch generation {
    case .chunk(let text):
        print(text, terminator: "")
    case .info(let info):
        print("\n\(info.tokensPerSecond) tok/s")
    case .toolCall(let call):
        // Handle tool call
        break
    }
}
```
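To try this `switch` over generation events without loading a model, the sketch below drives the same pattern from a hand-built `AsyncStream`. `MockGeneration` is a hypothetical stand-in for the library's `Generation` type (its `.info` payload here carries only a token count and elapsed time), and the tokens-per-second figure is computed as tokens divided by seconds, which may differ from how the library measures it. It assumes top-level `await` (Swift 5.7 or later, in a script or `main.swift`).

```swift
// Hypothetical stand-in for the library's generation events.
enum MockGeneration {
    case chunk(String)
    case info(generatedTokens: Int, seconds: Double)
    case toolCall(name: String)
}

// Build a finite stream that yields a few events and then finishes.
func makeMockStream() -> AsyncStream<MockGeneration> {
    AsyncStream { continuation in
        continuation.yield(.chunk("Hello"))
        continuation.yield(.chunk(", world"))
        continuation.yield(.info(generatedTokens: 40, seconds: 0.5))
        continuation.finish()
    }
}

/// Consumes the stream, returning the concatenated text and the throughput.
func consume(_ stream: AsyncStream<MockGeneration>) async -> (text: String, tokensPerSecond: Double?) {
    var text = ""
    var rate: Double?
    for await event in stream {
        switch event {
        case .chunk(let piece):
            text += piece
        case .info(let tokens, let seconds):
            rate = seconds > 0 ? Double(tokens) / seconds : nil
        case .toolCall(let name):
            print("Model requested tool: \(name)")
        }
    }
    return (text, rate)
}

let result = await consume(makeMockStream())
print(result.text)                  // Hello, world
print(result.tokensPerSecond ?? 0)  // 80.0
```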
```swift
import MLXLMCommon

// 1. Define the tool: typed input/output plus a handler
struct WeatherInput: Codable { let location: String }
struct WeatherOutput: Codable { let temperature: Double; let conditions: String }

let weatherTool = Tool<WeatherInput, WeatherOutput>(
    name: "get_weather",
    description: "Get current weather",
    parameters: [.required("location", type: .string, description: "City name")]
) { input in
    // Stub result; a real handler would look up input.location
    WeatherOutput(temperature: 22.0, conditions: "Sunny")
}

// 2. Include the tool schema in the request
let input = UserInput(
    prompt: .text("What's the weather in Tokyo?"),
    tools: [weatherTool.schema]
)

// 3. Prepare the input, then handle tool calls in the generation stream
let prepared = try await modelContainer.prepare(input: input)
let params = GenerateParameters()
for await generation in try await modelContainer.generate(input: prepared, parameters: params) {
    switch generation {
    case .chunk(let text): print(text, terminator: "")
    case .toolCall(let call):
        let result = try await call.execute(with: weatherTool)
        print("Weather: \(result.conditions)")
    case .info: break
    }
}
```
See references/tool-calling.md for multi-turn tool use and for feeding tool results back to the model.
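Conceptually, a tool call pairs a function name with JSON arguments that have to be decoded into the tool's typed input before the handler runs. The plain-Foundation sketch below illustrates that idea with a hypothetical `SketchToolCall` struct and `dispatch` function; it is not the library's `ToolCall` type or its `execute(with:)` implementation. It redeclares the `WeatherInput`/`WeatherOutput` shapes from the example above so it compiles on its own, and its stub handler echoes the location so the decoding step is visible.

```swift
import Foundation

// Redeclared so this sketch compiles on its own.
struct WeatherInput: Codable { let location: String }
struct WeatherOutput: Codable { let temperature: Double; let conditions: String }

// Hypothetical representation of a model-emitted tool call.
struct SketchToolCall {
    let name: String
    let argumentsJSON: String
}

enum DispatchError: Error {
    case unknownTool(String)
}

/// Decodes the JSON arguments for the matching tool, runs its handler,
/// and returns the result encoded as JSON (ready to feed back to the model).
func dispatch(_ call: SketchToolCall) throws -> String {
    switch call.name {
    case "get_weather":
        let input = try JSONDecoder().decode(WeatherInput.self, from: Data(call.argumentsJSON.utf8))
        // Stub handler that echoes the decoded location.
        let output = WeatherOutput(temperature: 22.0, conditions: "Sunny in \(input.location)")
        let encoder = JSONEncoder()
        encoder.outputFormatting = .sortedKeys
        let data = try encoder.encode(output)
        return String(decoding: data, as: UTF8.self)
    default:
        throw DispatchError.unknownTool(call.name)
    }
}

let call = SketchToolCall(name: "get_weather", argumentsJSON: #"{"location": "Tokyo"}"#)
let resultJSON = try dispatch(call)
print(resultJSON)
```

Rejecting unknown tool names explicitly, rather than ignoring them, makes a model that hallucinates a tool easy to detect.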
```swift
let params = GenerateParameters(
    maxTokens: 1000,           // nil = unlimited
    maxKVSize: 4096,           // Sliding window (uses RotatingKVCache)
    kvBits: 4,                 // Quantized cache (4 or 8 bit)
    temperature: 0.7,          // 0 = greedy/argmax
    topP: 0.9,                 // Nucleus sampling
    repetitionPenalty: 1.1,    // Penalize repeats
    repetitionContextSize: 20  // Window for penalty
)
```
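The `temperature` and `topP` fields refer to standard sampling steps. The sketch below shows the textbook versions in plain Swift: logits are divided by the temperature before a softmax, and nucleus (top-p) filtering keeps the smallest set of highest-probability tokens whose cumulative probability reaches `p`, then renormalizes. It illustrates the technique under those assumptions and is not MLX's sampler implementation; `softmax` and `nucleusFilter` are hypothetical helper names.

```swift
import Foundation

/// Softmax with temperature scaling. Subtracting the max logit keeps exp() stable.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    precondition(temperature > 0, "temperature 0 means greedy argmax; handle it separately")
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Nucleus (top-p) filtering: keep the most probable tokens until their
/// cumulative probability reaches p, zero out the rest, and renormalize.
func nucleusFilter(_ probs: [Double], topP p: Double) -> [Double] {
    let order = probs.indices.sorted { probs[$0] > probs[$1] }
    var kept = Set<Int>()
    var cumulative = 0.0
    for i in order {
        kept.insert(i)
        cumulative += probs[i]
        if cumulative >= p { break }
    }
    let filtered = probs.indices.map { kept.contains($0) ? probs[$0] : 0 }
    let total = filtered.reduce(0, +)
    return filtered.map { $0 / total }
}

// Lower temperature sharpens the distribution; higher temperature flattens it.
let logits = [2.0, 1.0, 0.5, -1.0]
print(softmax(logits, temperature: 0.5))
print(softmax(logits, temperature: 1.5))

// Toy probabilities: the top two tokens already cover 0.75 >= 0.7.
let nucleus = nucleusFilter([0.5, 0.25, 0.125, 0.125], topP: 0.7)
print(nucleus)  // first two renormalized to about 0.667 and 0.333; the rest are 0
```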
Restore chat from persisted history:
```swift
let history: [Chat.Message] = [
    .system("You are helpful"),
    .user("Hello"),
    .assistant("Hi there!")
]

let session = ChatSession(
    modelContainer,
    history: history
)
// Continues from this point
```
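The snippet above assumes the history has already been loaded from storage. One way to persist it is to keep a small `Codable` record per message and map it back to `Chat.Message` when restoring. The sketch below uses plain Foundation; `StoredMessage` and its role names are hypothetical, and the final mapping to `.system`/`.user`/`.assistant` is left as a comment because it needs the MLXLMCommon types.

```swift
import Foundation

// Hypothetical on-disk record for one chat message.
struct StoredMessage: Codable, Equatable {
    enum Role: String, Codable { case system, user, assistant }
    let role: Role
    let content: String
}

let transcript = [
    StoredMessage(role: .system, content: "You are helpful"),
    StoredMessage(role: .user, content: "Hello"),
    StoredMessage(role: .assistant, content: "Hi there!"),
]

// Save: encode to JSON (then write `saved` to a file, UserDefaults, etc.).
let saved = try JSONEncoder().encode(transcript)

// Restore: decode the JSON back into records.
let restored = try JSONDecoder().decode([StoredMessage].self, from: saved)

// With MLXLMCommon imported, each record could then be mapped to a Chat.Message:
// let history: [Chat.Message] = restored.map { m in
//     switch m.role {
//     case .system: return .system(m.content)
//     case .user: return .user(m.content)
//     case .assistant: return .assistant(m.content)
//     }
// }
print(restored.count)  // 3
```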
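```swift
// Ways to create an image input: use one (fileURL, ciImage, mlxArray defined elsewhere)
let image = UserInput.Image.url(fileURL)         // From URL (file or remote)
// let image = UserInput.Image.ciImage(ciImage)  // From CIImage
// let image = UserInput.Image.array(mlxArray)   // From MLXArray directly
```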
```swift
// Ways to create a video input: use one (videoURL, avAsset, videoFrames defined elsewhere)
let video = UserInput.Video.url(videoURL)           // From URL (file or remote)
// let video = UserInput.Video.avAsset(avAsset)     // From AVFoundation asset
// let video = UserInput.Video.frames(videoFrames)  // From pre-extracted frames

let response = try await session.respond(
    to: "What happens in this video?",
    video: video
)
```
```swift
let images: [UserInput.Image] = [
    .url(url1),
    .url(url2)
]

let response = try await session.respond(
    to: "Compare these two images",
    images: images,
    videos: []
)
```
```swift
let session = ChatSession(
    modelContainer,
    processing: UserInput.Processing(
        resize: CGSize(width: 512, height: 512)  // Resize images
    )
)
```
...
After installing MLX Swift LM Expert, you can say the following to the AI to trigger it:
Help me get started with MLX Swift LM Expert
Explains what MLX Swift LM Expert does, walks through the setup, and runs a quick demo based on your current project
Use MLX Swift LM Expert to run LLMs and VLMs on Apple Silicon using MLX
Invokes MLX Swift LM Expert with the right parameters and returns the result directly in the conversation
What can I do with MLX Swift LM Expert in my developer & devops workflow?
Lists the top use cases for MLX Swift LM Expert, with example commands for each scenario
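Place the skill folder in ~/.claude/skills/mlx-swift-lm/ (personal scope, available to all projects) or in .claude/skills/mlx-swift-lm/ (project scope). After restarting the AI client, invoke it explicitly with /mlx-swift-lm, or let the AI discover and use it automatically based on context.
MLX Swift LM Expert supports Claude, Cursor, and OpenClaw, integrating with these AI platforms to extend their capabilities.
MLX Swift LM Expert is free to install and use. See the repository for license information.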
MLX Swift LM Expert belongs to the "Developer & DevOps" category; skills in this category help AI agents perform specialized tasks in this domain.
Automate my developer & devops tasks using MLX Swift LM Expert
Identifies repetitive steps in your workflow and sets up MLX Swift LM Expert to handle them automatically