Home Projects Zengram ZenVox OpenClaw Memory Toolkit View on GitHub

Voice-to-text shouldn't cost $16/month

Wispr Flow charges you $16/month to wrap Whisper in a pretty UI and clean your text with an API call. Your audio goes to their cloud. You can't choose your own LLM. You can't go offline. And you definitely can't inspect the source code.

ZenVox does the same thing for $0. Your audio never leaves your machine.


Press a hotkey. Talk. It types.

ZenVox sits in your system tray. When you press the hotkey, it starts recording. When you stop talking, it auto-stops via voice activity detection. Whisper transcribes locally, your chosen LLM cleans the output, and the result is pasted wherever your cursor is.

# 1. Press Ctrl+Alt+F12
Recording... listening via VAD

# 2. Speak naturally — ZenVox detects silence and auto-stops
Transcribing... Faster-Whisper (large-v3-turbo)

# 3. AI cleans the output
Cleaning... Gemini Flash Lite (free tier)

# 4. Cleaned text auto-pasted at your cursor
Done! 247 words cleaned and pasted in 3.2s

ZenVox vs the paid alternatives

ZenVox Wispr Flow
Price Free $8-16/month
Audio leaves your machine Never Yes (cloud STT)
Choose your own LLM 5 providers Their API only
Fully offline option Yes (Ollama) No
Cleaning presets 4 1
Bilingual FR/EN Native English-centric
Searchable history SQLite No
Open source Yes No

The $0/month voice-to-text stack

No paid API required. Here's what most people use:

Component What Cost
Faster-Whisper Speech-to-text, runs locally Free
Gemini Flash Lite AI text cleaning (1500 req/day free) Free tier
ZenVox Glues it together Free, forever

Want fully offline? Use Ollama as your cleaning provider. Zero API calls, everything stays on your machine.


Everything you need, nothing you don't

Transcription

  • Faster-Whisper with 4 model sizes (tiny to large-v3-turbo)
  • CUDA GPU acceleration (auto-detected)
  • VAD silence detection — auto-stops when you stop talking
  • Audio feedback beeps for start/stop

AI Cleaning

  • 5 providers: Gemini, OpenAI, Anthropic, Groq, Ollama
  • 4 presets: General, Technical, Minimal, Structured
  • Per-provider API key storage (secure keyring)
  • Fully offline with Ollama — zero external calls

Desktop App

  • System tray with colored status dots
  • Configurable hotkeys (record + re-paste)
  • 3 output modes: auto-paste, clipboard, append to file
  • Searchable SQLite transcription history

Built for Quebec

  • Native bilingual FR/EN — handles Franglais naturally
  • Accent-aware cleaning that preserves Quebec expressions
  • One-command install (Python + GPU + dependencies)
  • MIT licensed — yours to modify, extend, sell

Up and running in one command

git clone https://github.com/ZenSystemAI/ZenVox.git
cd ZenVox
python install.py
# That's it. Installs venv, dependencies, detects GPU, creates shortcuts.
# Launch from Start Menu or run:
python zenvox.py

The installer detects your GPU, sets up a virtual environment, installs all dependencies, and creates desktop shortcuts. First-run experience guides you through provider setup.