Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, and a WebSocket server/client, with support for 12 programming languages.
Offline speech recognition, speech synthesis, speaker diarization, and VAD based on Kaldi and ONNX Runtime
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
A robust open-source speech recognition model with 1.5 billion parameters
C/C++ port of FunASR's SenseVoice model
A C/C++ port of the FunASR SenseVoice model
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Industrial-grade speech recognition toolkit supporting 50+ languages, streaming, and an OpenAI-compatible API
End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
An end-to-end large speech recognition model supporting 31 languages, dialects, lyrics, hotwords, timestamps, and speaker diarization.
Faster Whisper transcription with CTranslate2
A Whisper speech transcription tool accelerated with CTranslate2
Easy-to-use speech toolkit including self-supervised learning models, SOTA/streaming ASR with punctuation, streaming TTS with a text frontend, a speaker verification system, end-to-end speech translation, and keyword spotting. Won the NAACL 2022 Best Demo Award.
An easy-to-use speech toolkit including speech recognition, speech synthesis, speaker verification, and keyword spotting
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
Multilingual speech understanding supporting ASR, emotion recognition, and audio event detection, 15x faster than Whisper
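🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot/smart speaker project. It supports ChatGPT multi-turn dialogue and may also be the first open-source smart speaker project to support brain-computer interaction.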
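A simple, flexible Chinese voice dialogue robot and smart speaker project supporting ChatGPT multi-turn dialogue and brain-computer interaction.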