The full take
Video · audio · synced
A note in your own voice — captured, divided, transcribed.
See and hear yourself exactly as others do.
Vinh Giang's three-step technique: film yourself for five minutes, then study the take three ways — sound only, picture only, and a transcript with every um, ah and repeat marked for elimination. Speakeasy is the ritual, automated. Each recording leaves the four artefacts below behind: the original take and the three reviews — alongside a measure of how your pitch, volume, and pace move while you speak. Press the brass.
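The marking step on the transcript could be sketched roughly as below. This is only an illustration, not Speakeasy's own code: the filler list, the `MarkedWord` type and the `markTranscript` function are all hypothetical names, and a "repeat" is assumed to mean a word said twice in a row.

```typescript
// Hypothetical filler list for illustration; not taken from Speakeasy.
const FILLERS: string[] = ["um", "uh", "ah", "er", "erm"];

type MarkedWord = { text: string; kind: "filler" | "repeat" | "ok" };

// Tag each word of a transcript as a filler, an immediate repeat, or fine.
function markTranscript(transcript: string): MarkedWord[] {
  const words = transcript.split(/\s+/).filter((w) => w.length > 0);
  const marked: MarkedWord[] = [];
  let previous = "";
  for (const text of words) {
    // Compare on a bare lowercase form so "Um," and "um" match.
    const bare = text.toLowerCase().replace(/[^a-z']/g, "");
    let kind: MarkedWord["kind"] = "ok";
    if (FILLERS.indexOf(bare) !== -1) {
      kind = "filler";
    } else if (bare !== "" && bare === previous) {
      kind = "repeat"; // same word twice in a row
    }
    marked.push({ text, kind });
    previous = bare;
  }
  return marked;
}
```

Calling `markTranscript("So, um, I I think ah yes")` tags "um," and "ah" as fillers and the second "I" as a repeat; everything else is left alone.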
Grant camera and microphone access to begin. Nothing leaves this tab.
Open the room
Each recording leaves four pieces behind: the full take, the audio alone, the silent picture, and the words. They appear here when you stop.
Flat or full — the difference is speed, pitch, volume, pause.
Vinh Giang's vocal toolbox. Speed conveys excitement or, slowed, gives weight. Pitch signals emotion. Volume projects confidence (or, when whispered, intimacy). Pause makes the next word matter. The analysis below measures three of those four directly: pitch, volume and pace (your speed). The fourth, pause, shows up as the gaps in those lines. Watch for stretches where all three lines flatten at once: that's where your delivery sat in monotone.
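One way to find those flat stretches automatically is sketched below. It is a guess at the idea, not the page's actual analysis: it assumes each measure arrives as an array of per-frame numbers, cuts them into fixed windows, and calls a window flat when every track's spread (max minus min, relative to its mean) stays under a threshold. `flatStretches`, `relativeSpread`, the window size and the threshold are all hypothetical.

```typescript
type Stretch = { start: number; end: number }; // frame indices, end exclusive

// Spread of a window relative to its mean: (max - min) / |mean|.
function relativeSpread(values: number[]): number {
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
    sum += v;
  }
  const mean = sum / values.length;
  return mean === 0 ? 0 : (max - min) / Math.abs(mean);
}

// Return the windows in which every track (pitch, volume, pace) is flat.
function flatStretches(tracks: number[][], windowSize: number, threshold: number): Stretch[] {
  const out: Stretch[] = [];
  if (tracks.length === 0 || windowSize <= 0) return out;
  let length = Infinity;
  for (const track of tracks) length = Math.min(length, track.length);

  for (let start = 0; start + windowSize <= length; start += windowSize) {
    const end = start + windowSize;
    let allFlat = true;
    for (const track of tracks) {
      if (relativeSpread(track.slice(start, end)) >= threshold) {
        allFlat = false;
        break;
      }
    }
    if (!allFlat) continue;
    const last = out[out.length - 1];
    if (last && last.end === start) {
      last.end = end; // merge with the previous flat window
    } else {
      out.push({ start, end });
    }
  }
  return out;
}
```

A window only counts when all tracks are flat together, which matches the point above: one steady line is normal, three at once is monotone.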
A note on the fourth artefact
Transcription runs entirely in the browser. The model is OpenAI's
Whisper compiled to ONNX and served via @xenova/transformers;
on first use the weights download once and live thereafter in your
browser cache. Desktop loads base.en (≈ 145 MB); phones
load tiny.en (≈ 39 MB) so iOS Safari doesn't run out of
memory. Nothing is sent to any server.
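The desktop/phone split could be decided by something like the sketch below. How the page really detects a phone is not stated, so the user-agent test here is an assumption, as are the names `pickWhisperModel`, `DESKTOP_MODEL` and `PHONE_MODEL`. The model ids are the Xenova ONNX exports of Whisper on the Hugging Face Hub.

```typescript
// Xenova's ONNX exports of Whisper on the Hugging Face Hub.
const DESKTOP_MODEL = "Xenova/whisper-base.en";
const PHONE_MODEL = "Xenova/whisper-tiny.en";

// Pick the smaller model on phones. The user-agent check is an assumed
// heuristic for illustration, not the page's confirmed logic.
function pickWhisperModel(userAgent: string): string {
  const isPhone = /iPhone|iPod|Android.*Mobile/i.test(userAgent);
  return isPhone ? PHONE_MODEL : DESKTOP_MODEL;
}
```

In the browser, the chosen id would then go to `pipeline("automatic-speech-recognition", modelId)` from `@xenova/transformers`, which downloads the weights on first use and reads them from the browser cache afterwards.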