The Problem

Voice-to-text shouldn't cost $16/month

Wispr Flow charges you $16/month to wrap Whisper in a pretty UI and clean your text with an API call. Your audio goes to their cloud. You can't choose your own LLM. You can't go offline. And you definitely can't inspect the source code.

ZenVox does the same thing for $0. Your audio never leaves your machine.

How It Works

Press a hotkey. Talk. It types.

ZenVox sits in your system tray. When you press the hotkey, it starts recording. When you stop talking, it auto-stops via voice activity detection. Whisper transcribes locally, your chosen LLM cleans the output, and the result is pasted wherever your cursor is.

# 1. Press Ctrl+Alt+F12
Recording... listening via VAD

# 2. Speak naturally — ZenVox detects silence and auto-stops
Transcribing... Faster-Whisper (large-v3-turbo)

# 3. AI cleans the output
Cleaning... Gemini Flash Lite (free tier)

# 4. Cleaned text auto-pasted at your cursor
Done! 247 words cleaned and pasted in 3.2s

Comparison

ZenVox vs the paid alternatives

	ZenVox	Wispr Flow
Price	Free	$8-16/month
Audio leaves your machine	Never	Yes (cloud STT)
Choose your own LLM	5 providers	Their API only
Fully offline option	Yes (Ollama)	No
Cleaning presets	4	1
Bilingual FR/EN	Native	English-centric
Searchable history	SQLite	No
Open source	Yes	No

The Stack

The $0/month voice-to-text stack

No paid API required. Here's what most people use:

Component	What	Cost
Faster-Whisper	Speech-to-text, runs locally	Free
Gemini Flash Lite	AI text cleaning (1500 req/day free)	Free tier
ZenVox	Glues it together	Free, forever

Want fully offline? Use Ollama as your cleaning provider. Zero API calls, everything stays on your machine.

Features

Everything you need, nothing you don't

Transcription

Faster-Whisper with 4 model sizes (tiny to large-v3-turbo)
CUDA GPU acceleration (auto-detected)
VAD silence detection — auto-stops when you stop talking
Audio feedback beeps for start/stop

AI Cleaning

5 providers: Gemini, OpenAI, Anthropic, Groq, Ollama
4 presets: General, Technical, Minimal, Structured
Per-provider API key storage (secure keyring)
Fully offline with Ollama — zero external calls

Desktop App

System tray with colored status dots
Configurable hotkeys (record + re-paste)
3 output modes: auto-paste, clipboard, append to file
Searchable SQLite transcription history

Built for Quebec

Native bilingual FR/EN — handles Franglais naturally
Accent-aware cleaning that preserves Quebec expressions
One-command install (Python + GPU + dependencies)
MIT licensed — yours to modify, extend, sell

Quick Start

Up and running in one command

git clone https://github.com/ZenSystemAI/ZenVox.git
cd ZenVox
python install.py
# That's it. Installs venv, dependencies, detects GPU, creates shortcuts.
# Launch from Start Menu or run:
python zenvox.py

The installer detects your GPU, sets up a virtual environment, installs all dependencies, and creates desktop shortcuts. First-run experience guides you through provider setup.