2.24x decode TPS increase On Qwen 3.6 27B @ temp 0.6 | Native MTP Speculative Decoding On Apple Silicon With No External Drafter.
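Native multi-token speculative decoding for Qwen models on Apple Silicon, boosting decode TPS by 2.24x.
1051 repositories in total
Back up, organize, and rediscover every GitHub repository you have ever starred.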
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
An LLM inference server for Apple Silicon with continuous batching and SSD caching, managed from the macOS menu bar.
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
Diffusion model inference implemented in pure C/C++, supporting SD, Flux, Wan, and other models.
Implementation of Fish Audio S2 Pro model inference in native ggml.
A C++ library that implements Fish Audio S2 Pro model inference natively in ggml.
Standalone C++ inference project for VoxCPM models built on top of ggml.
A standalone C++ inference engine for VoxCPM models, built on ggml.
SGLang is a high-performance serving framework for large language models and multimodal models.
A high-performance serving framework for large language models and multimodal models.
Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
Run GGUF models easily with a KoboldAI interface: a single file, zero installation.