Voice (alpha, push-to-talk)
The web UI now has a 🎤 button next to the answer textarea. Hold it, speak your answer, and release: the transcript drops into the textarea so you can edit it before submitting. The default backend is fully local: a whisper.cpp subprocess that Rocky shells out to. Nothing leaves the machine.
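As a rough illustration of what "shelling out" can look like, here is a minimal Rust sketch. The function names `whisper_command` and `transcribe` are hypothetical, and the exact arguments Rocky passes are an assumption. `-m` (model), `-f` (input file) and `-nt` (no timestamps) are whisper.cpp CLI flags.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build the whisper-cli invocation for one recorded clip without running it.
/// Which flags Rocky really passes is an assumption of this sketch.
fn whisper_command(binary: &Path, model: &Path, wav: &Path) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg("-m")
        .arg(model) // ggml model file, e.g. ggml-base.en.bin
        .arg("-f")
        .arg(wav) // WAV clip recorded by the web UI
        .arg("-nt"); // --no-timestamps: print plain text only
    cmd
}

/// Run the subprocess and return the transcript it printed on stdout.
fn transcribe(binary: &Path, model: &Path, wav: &Path) -> io::Result<String> {
    let output = whisper_command(binary, model, wav).output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("whisper-cli exited with {}: {}", output.status, stderr.trim()),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

Building the `Command` separately from running it keeps the argument list easy to inspect in tests, and the audio never has to leave the local process tree.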
Setup: one line
```bash
curl -fsSL https://raw.githubusercontent.com/NVME-git/rocky/main/scripts/install-whisper.sh | sh
```
The installer:
- Detects platform (`linux-x64`, `linux-arm64`, `macos-arm64`, `macos-x64`)
- Installs `whisper-cli`: via `brew install whisper-cpp` on macOS, building from source on Linux (`cmake` + a C++ compiler required, ~2-5 min)
- Pulls `ggml-base.en.bin` (~142 MB) into `~/.rocky/models/`
- Adds a `[voice]` block to `~/.config/rocky/config.toml`
After it finishes, restart `rocky view` and the mic button is live.
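The installer itself is a shell script, so the following is only an illustrative Rust sketch of its platform detection step. It maps Rust's own OS and architecture names onto the four platform tags listed above. `platform_tag` and `current_platform` are hypothetical names, not part of Rocky.

```rust
use std::env::consts::{ARCH, OS};

/// Map an (OS, architecture) pair onto one of the four supported platform tags.
/// Returns None for anything the installer does not list.
fn platform_tag(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("linux-x64"),
        ("linux", "aarch64") => Some("linux-arm64"),
        ("macos", "aarch64") => Some("macos-arm64"),
        ("macos", "x86_64") => Some("macos-x64"),
        _ => None,
    }
}

/// Platform tag for the machine this code was compiled for.
fn current_platform() -> Option<&'static str> {
    platform_tag(OS, ARCH)
}
```

Returning `None` for an unlisted platform lets a caller fail with a clear message instead of attempting an install that cannot work.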
Two backends
| Backend | Latency | Privacy | Setup |
|---|---|---|---|
| `whisper-cpp` (default) | ~1-2 s on CPU | fully local | `install-whisper.sh` |
| `browser` (opt-in) | real-time | Chrome → Google, Safari → Apple | flip a config flag |
Browser mode is forbidden when `privacy.strict = true`: Rocky won't let you accidentally exfiltrate audio. Enable it explicitly with:
```toml
[voice]
provider = "browser"
browser_consent = true
```
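The gating rule can be sketched as a small pure function. This is a minimal Rust sketch under stated assumptions, not Rocky's actual code: the `Provider` enum and `resolve_provider` are hypothetical, the config spelling `"whisper-cpp"` for the default backend is assumed, and so is the rule that `browser_consent = true` is mandatory for browser mode.

```rust
#[derive(Debug, PartialEq)]
enum Provider {
    WhisperCpp,
    Browser,
    Off,
}

/// Decide which backend to use from the `[voice]` settings and `privacy.strict`.
fn resolve_provider(
    provider: &str,
    browser_consent: bool,
    privacy_strict: bool,
) -> Result<Provider, String> {
    match provider {
        // Assumed spelling of the default backend in config.toml.
        "whisper-cpp" => Ok(Provider::WhisperCpp),
        "off" => Ok(Provider::Off),
        // Strict privacy always wins, even if consent was given.
        "browser" if privacy_strict => Err(
            "provider = \"browser\" is not allowed while privacy.strict = true".to_string(),
        ),
        // Assumption: browser mode also needs the explicit consent flag.
        "browser" if !browser_consent => {
            Err("provider = \"browser\" requires browser_consent = true".to_string())
        }
        "browser" => Ok(Provider::Browser),
        other => Err(format!("unknown voice provider: {}", other)),
    }
}
```

Checking `privacy_strict` before consent means a strict setup can never be switched to a cloud backend by flipping a single flag.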
What ships in v0.2
- Web UI mic button (push-to-hold, WAV encoded client-side at 16 kHz mono)
- POST `/api/transcribe` endpoint that calls the configured STT provider
- `whisper.cpp` subprocess invocation with clear errors when the binary or model is missing (the web UI surfaces an inline link to the installer)
- One-line installer (`scripts/install-whisper.sh`)
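To make "WAV encoded at 16 kHz mono" concrete, here is a hedged Rust sketch of the byte layout: a standard 44-byte PCM header followed by little-endian 16-bit samples. Rocky's real encoder runs client-side in the browser. The function name `encode_wav_mono16` is hypothetical, and the 16-bit sample width is an assumption the document does not state.

```rust
/// Encode 16-bit PCM mono samples as an in-memory WAV file
/// (44-byte RIFF/WAVE header followed by the sample data).
fn encode_wav_mono16(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16; // assumed sample width
    let block_align: u16 = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len: u32 = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // bytes after this field
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for sample in samples {
        out.extend_from_slice(&sample.to_le_bytes());
    }
    out
}
```

At 16 kHz mono with 16-bit samples, one second of speech is 32,000 bytes of audio data, so short push-to-talk clips stay small enough to send in a single request.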
What's planned for later
- `rocky quiz --voice`: fully hands-free CLI session. Rocky speaks the question via OS TTS, captures your answer with `cpal` + `webrtc-vad` (700 ms silence ends an utterance), evaluates, repeats
- In-binary `whisper-rs` build (`cargo install --features voice`) for users who want one binary, no PATH dependency
- Real-time streaming transcription (Pattern C in ADR 0006)
There is no wake-word or always-on listening: push-to-talk is the only model, by design.
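The "700 ms of silence ends an utterance" rule planned for `rocky quiz --voice` can be sketched independently of any audio library. In this minimal Rust sketch the per-frame speech/non-speech decisions are passed in as booleans (in the planned design they would come from a VAD such as `webrtc-vad`). The function name `utterance_end` and the frame-based interface are assumptions.

```rust
/// Given per-frame VAD decisions (true = speech), return the index of the frame
/// that completes `silence_ms` of continuous silence after speech has started.
/// Returns None if the utterance has not ended within `frames`.
fn utterance_end(frames: &[bool], frame_ms: u32, silence_ms: u32) -> Option<usize> {
    let mut heard_speech = false;
    let mut silent_ms = 0u32;
    for (i, &is_speech) in frames.iter().enumerate() {
        if is_speech {
            heard_speech = true;
            silent_ms = 0; // any speech resets the silence timer
        } else if heard_speech {
            silent_ms += frame_ms;
            if silent_ms >= silence_ms {
                return Some(i);
            }
        }
        // Silence before the first speech frame is ignored.
    }
    None
}
```

Ignoring silence before the first speech frame means a slow start does not cut the answer off, while a short mid-sentence pause simply resets the timer.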
Disabling voice
```toml
[voice]
provider = "off"
```
Mic button stops working immediately, no rebuild needed.