
MLX Swift LM Expert

mlx-swift-lm

MLX Swift LM - Run LLMs and VLMs on Apple Silicon using MLX. Covers local inference, streaming, tool calling, LoRA fine-tuning, and embeddings.

Source: ClawHub (view on ClawSkills)


About MLX Swift LM Expert

---
name: swift-mlx-lm
description: MLX Swift LM - Run LLMs and VLMs on Apple Silicon using MLX. Covers local inference, streaming, tool calling, LoRA fine-tuning, and embeddings.
triggers:
  - mlx
  - mlx-swift
  - mlx-lm
  - apple silicon llm
  - local llm swift
  - vision language model swift
  - lora training swift
---

mlx-swift-lm Skill

1. Overview & Triggers

mlx-swift-lm is a Swift package for running Large Language Models (LLMs) and Vision-Language Models (VLMs) on Apple Silicon using MLX. It supports local inference, fine-tuning via LoRA/DoRA, and embedding generation.

When to Use This Skill

- Running LLM/VLM inference on macOS/iOS with Apple Silicon
- Streaming text generation from local models
- Vision tasks with images/video (VLMs)
- Tool calling / function calling with models
- LoRA adapter training and fine-tuning
- Text embeddings for RAG/semantic search

Architecture Overview

MLXLMCommon     - Core infrastructure (ModelContainer, ChatSession, KVCache, etc.)
MLXLLM          - Text-only LLM support (Llama, Qwen, Gemma, Phi, DeepSeek, etc. - examples, not exhaustive)
MLXVLM          - Vision-Language Models (Qwen2-VL, PaliGemma, Gemma3, etc. - examples, not exhaustive)
Embedders       - Embedding models (BGE, Nomic, MiniLM)

2. Key File Reference

| Purpose | File Path |
|---------|-----------|
| Thread-safe model wrapper | Libraries/MLXLMCommon/ModelContainer.swift |
| Simplified chat API | Libraries/MLXLMCommon/ChatSession.swift |
| Generation & streaming | Libraries/MLXLMCommon/Evaluate.swift |
| KV cache types | Libraries/MLXLMCommon/KVCache.swift |
| Model configuration | Libraries/MLXLMCommon/ModelConfiguration.swift |
| Chat message types | Libraries/MLXLMCommon/Chat.swift |
| Tool call processing | Libraries/MLXLMCommon/Tool/ToolCallFormat.swift |
| Concurrency utilities | Libraries/MLXLMCommon/Utilities/SerialAccessContainer.swift |
| LLM factory & registry | Libraries/MLXLLM/LLMModelFactory.swift |
| VLM factory & registry | Libraries/MLXVLM/VLMModelFactory.swift |
| LoRA configuration | Libraries/MLXLMCommon/Adapters/LoRA/LoRAContainer.swift |
| LoRA training | Libraries/MLXLLM/LoraTrain.swift |

3. Quick Start

LLM Chat (Simplest API)

import MLXLLM
import MLXLMCommon

// Load model (downloads from HuggingFace automatically)
let modelContainer = try await LLMModelFactory.shared.loadContainer(
    configuration: .init(id: "mlx-community/Qwen3-4B-4bit")
)

// Create chat session
let session = ChatSession(modelContainer)

// Single response
let response = try await session.respond(to: "What is Swift?")
print(response)

// Streaming response
for try await chunk in session.streamResponse(to: "Explain concurrency") {
    print(chunk, terminator: "")
}

VLM with Image

import MLXVLM
import MLXLMCommon

let modelContainer = try await VLMModelFactory.shared.loadContainer(
    configuration: .init(id: "mlx-community/Qwen2-VL-2B-Instruct-4bit")
)

let session = ChatSession(modelContainer)

// With image (video is also an optional parameter)
let image = UserInput.Image.url(imageURL)
let response = try await session.respond(
    to: "Describe this image",
    image: image,
    video: nil  // Optional video parameter
)

Embeddings

import Embedders
import MLX  // MLXArray and eval(_:) come from the MLX module

// Note: Embedders uses loadModelContainer() helper (not a factory pattern)
let container = try await loadModelContainer(
    configuration: ModelConfiguration(id: "mlx-community/bge-small-en-v1.5-mlx")
)

let embeddings = await container.perform { model, tokenizer, pooler in
    let tokens = tokenizer.encode(text: "Hello world")
    let input = MLXArray(tokens).expandedDimensions(axis: 0)
    let output = model(input)
    let pooled = pooler(output, normalize: true)
    eval(pooled)
    return pooled
}
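For the RAG/semantic-search use case, the pooled vectors are typically compared by cosine similarity. The sketch below is a minimal, library-independent illustration that operates on plain `[Float]` arrays; it assumes each pooled `MLXArray` has already been converted to `[Float]` (for instance with MLX's `asArray(Float.self)`). The names `cosineSimilarity` and `rank` are hypothetical helpers, not part of mlx-swift-lm.

```swift
/// Cosine similarity between two equal-length vectors.
/// Returns 0 when either vector has zero norm.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

/// Ranks document vectors against a query vector, best match first.
/// (Hypothetical helper for illustration only.)
func rank(query: [Float], documents: [[Float]]) -> [(index: Int, score: Float)] {
    documents.enumerated()
        .map { (index: $0.offset, score: cosineSimilarity(query, $0.element)) }
        .sorted { $0.score > $1.score }
}
```

Because the pooler above is called with `normalize: true`, the vectors should already be unit length, in which case cosine similarity reduces to a plain dot product; the general form is shown so the helper also works on unnormalized vectors.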

4. Primary Workflow: LLM Inference

ChatSession API (Recommended)

ChatSession manages conversation history and KV cache automatically:

let session = ChatSession(
    modelContainer,
    instructions: "You are a helpful assistant",  // System prompt
    generateParameters: GenerateParameters(
        maxTokens: 500,
        temperature: 0.7
    )
)

// Multi-turn conversation (history preserved automatically)
let r1 = try await session.respond(to: "What is 2+2?")
let r2 = try await session.respond(to: "And if you multiply that by 3?")

// Clear session to start fresh
await session.clear()

Streaming with generate()

For lower-level control, use generate() directly:

let input = try await modelContainer.prepare(input: UserInput(prompt: .text("Hello")))
let stream = try await modelContainer.generate(input: input, parameters: GenerateParameters())

for await generation in stream {
    switch generation {
    case .chunk(let text):
        print(text, terminator: "")
    case .info(let info):
        print("\n\(info.tokensPerSecond) tok/s")
    case .toolCall(let call):
        // Handle tool call
        break
    }
}

Tool Calling

// 1. Define tool
struct WeatherInput: Codable { let location: String }
struct WeatherOutput: Codable { let temperature: Double; let conditions: String }

let weatherTool = Tool<WeatherInput, WeatherOutput>(
    name: "get_weather",
    description: "Get current weather",
    parameters: [.required("location", type: .string, description: "City name")]
) { input in
    WeatherOutput(temperature: 22.0, conditions: "Sunny")
}

// 2. Include tool schema in request
let input = UserInput(
    prompt: .text("What's the weather in Tokyo?"),
    tools: [weatherTool.schema]
)

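// 3. Handle tool calls in generation stream
// generate() takes a prepared LMInput, so prepare the UserInput first (as in "Streaming with generate()").
let lmInput = try await modelContainer.prepare(input: input)
let params = GenerateParameters()  // defaults; see the GenerateParameters section
for await generation in try await modelContainer.generate(input: lmInput, parameters: params) {
    switch generation {
    case .chunk(let text):
        print(text, terminator: "")
    case .toolCall(let call):
        // Runs the handler registered on weatherTool with the model's arguments
        let result = try await call.execute(with: weatherTool)
        print("Weather: \(result.conditions)")
    case .info:
        break
    }
}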

See references/tool-calling.md for multi-turn tool use and for feeding tool results back to the model.

GenerateParameters

let params = GenerateParameters(
    maxTokens: 1000,           // nil = unlimited
    maxKVSize: 4096,           // Sliding window (uses RotatingKVCache)
    kvBits: 4,                 // Quantized cache (4 or 8 bit)
    temperature: 0.7,          // 0 = greedy/argmax
    topP: 0.9,                 // Nucleus sampling
    repetitionPenalty: 1.1,    // Penalize repeats
    repetitionContextSize: 20  // Window for penalty
)
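To make the `temperature` and `topP` settings more concrete, here is a conceptual sketch of temperature scaling followed by nucleus (top-p) filtering over a small array of logits. It is written in plain Swift for illustration and is not the library's sampler; `softmax` and `nucleus` are hypothetical names introduced here.

```swift
import Foundation

/// Temperature-scaled softmax over raw logits.
/// Lower temperatures sharpen the distribution; higher ones flatten it.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    precondition(temperature > 0, "temperature 0 means greedy decoding; take the argmax instead")
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0           // subtract the max for numerical stability
    let exps = scaled.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Nucleus (top-p) filtering: keeps the smallest set of highest-probability
/// token indices whose cumulative probability reaches `topP`.
func nucleus(_ probs: [Double], topP: Double) -> [Int] {
    let rankedIndices = probs.indices.sorted { probs[$0] > probs[$1] }
    var kept: [Int] = []
    var cumulative = 0.0
    for index in rankedIndices {
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= topP { break }
    }
    return kept
}
```

A sampler would then draw the next token only from the kept indices, after renormalizing their probabilities. With `topP: 0.9`, low-probability tail tokens are excluded, which is why the setting tends to reduce incoherent output at higher temperatures.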

Prompt Caching / History Re-hydration

Restore chat from persisted history:

let history: [Chat.Message] = [
    .system("You are helpful"),
    .user("Hello"),
    .assistant("Hi there!")
]

let session = ChatSession(
    modelContainer,
    history: history
)
// Continues from this point
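How the history gets persisted is left to the app. One possible approach, sketched below under the assumption that messages are stored as JSON on disk, uses a small app-side record type. `StoredMessage`, `saveHistory`, and `loadHistory` are hypothetical names for illustration and are not part of mlx-swift-lm.

```swift
import Foundation

// Hypothetical app-side record; not a type from mlx-swift-lm.
struct StoredMessage: Codable, Equatable {
    enum Role: String, Codable {
        case system, user, assistant
    }
    let role: Role
    let content: String
}

/// Writes the conversation to disk as JSON (atomic write to avoid partial files).
func saveHistory(_ messages: [StoredMessage], to url: URL) throws {
    let data = try JSONEncoder().encode(messages)
    try data.write(to: url, options: .atomic)
}

/// Reads a previously saved conversation back from disk.
func loadHistory(from url: URL) throws -> [StoredMessage] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([StoredMessage].self, from: data)
}
```

After loading, each record can be mapped to the corresponding `.system(...)`, `.user(...)`, or `.assistant(...)` message, as in the snippet above, and the resulting array passed as `history:`.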

5. Secondary Workflow: VLM Inference

Image Input Types

// From URL (file or remote)
let image = UserInput.Image.url(fileURL)

// From CIImage
let image = UserInput.Image.ciImage(ciImage)

// From MLXArray directly
let image = UserInput.Image.array(mlxArray)

Video Input

// From URL (file or remote)
let video = UserInput.Video.url(videoURL)

// From AVFoundation asset
let video = UserInput.Video.avAsset(avAsset)

// From pre-extracted frames
let video = UserInput.Video.frames(videoFrames)

let response = try await session.respond(
    to: "What happens in this video?",
    video: video
)

Multiple Images

let images: [UserInput.Image] = [
    .url(url1),
    .url(url2)
]

let response = try await session.respond(
    to: "Compare these two images",
    images: images,
    videos: []
)

VLM-Specific Processing

let session = ChatSession(
    modelContainer,
    processing: UserInput.Processing(
        resize: CGSize(width: 512, height: 512)  // Resize images
    )
)

6. Best Practices

DO

...

Prompt Examples

After installing MLX Swift LM Expert, you can say things like the following to the AI to trigger it

User: Help me get started with MLX Swift LM Expert

A: Explains what MLX Swift LM Expert does, walks through the setup, and runs a quick demo based on your current project

User: Use MLX Swift LM Expert to run LLMs and VLMs on Apple Silicon using MLX

A: Invokes MLX Swift LM Expert with the right parameters and returns the result directly in the conversation

User: What can I do with MLX Swift LM Expert in my developer & devops workflow?

A: Lists the top use cases for MLX Swift LM Expert, with example commands for each scenario

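FAQ

How do I install MLX Swift LM Expert?

Place the skill folder in ~/.claude/skills/mlx-swift-lm/ (personal level, available to all projects) or in .claude/skills/mlx-swift-lm/ (project level). After restarting the AI client, invoke it explicitly with /mlx-swift-lm, or let the AI discover and use it automatically based on context.

Which AI platforms does MLX Swift LM Expert support?

MLX Swift LM Expert supports Claude, Cursor, and OpenClaw, and integrates with these platforms to extend their capabilities.

Is MLX Swift LM Expert free?

MLX Swift LM Expert is free to install and use. Check the repository for license information.

What does MLX Swift LM Expert do?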