Speakeasy

A note in your own voice — captured, divided, transcribed.

Camera open · ready when you are

Vinh Giang on the communication mirror — watch on YouTube.

See and hear yourself exactly as others do.

Vinh Giang's three-step technique: film yourself for five minutes, then study the take three ways — sound only, picture only, and a transcript with every um, ah and repeat marked for elimination. Speakeasy is the ritual, automated. Each recording leaves the four artefacts below behind: the original take and the three reviews — alongside a measure of how your pitch, volume, and pace move while you speak. Press the brass.
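For the curious, the marking step on the transcript can be sketched in a few lines. This is a minimal illustration, not Speakeasy's actual code: the filler list, the `MarkedWord` type and the `markDisfluencies` function are all hypothetical names, and "repeat" here means only a word said twice in a row.

```typescript
// Hypothetical filler list (an assumption); extend it to match your own habits.
const FILLERS = new Set(["um", "uh", "ah", "er", "erm"]);

type MarkedWord = { word: string; kind: "filler" | "repeat" | "ok" };

// Strip punctuation and case so "Um," and "um" compare equal.
function normalise(word: string): string {
  return word.toLowerCase().replace(/[^a-z']/g, "");
}

// Tag each word of a transcript as a filler, an immediate repeat, or fine.
function markDisfluencies(transcript: string): MarkedWord[] {
  const words = transcript.split(/\s+/).filter((w) => w.length > 0);
  const marked: MarkedWord[] = [];
  let previous = "";
  for (const word of words) {
    const key = normalise(word);
    if (FILLERS.has(key)) {
      // Fillers do not reset `previous`, so "I, um, I" still counts as a repeat.
      marked.push({ word, kind: "filler" });
      continue;
    }
    const isRepeat = key !== "" && key === previous;
    marked.push({ word, kind: isRepeat ? "repeat" : "ok" });
    previous = key;
  }
  return marked;
}

const example = markDisfluencies("Um, I I think, ah, that that is fine");
console.log(example.filter((w) => w.kind !== "ok").map((w) => w.word));
```

A deliberate doubling such as "very, very" would also be flagged by this sketch; deciding which repeats to keep is left to the reviewer.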

Grant the camera and microphone to begin. Nothing leaves this tab.

Open the room

Awaiting consent.

Camera · microphone · this tab only

Four Artefacts

i — iv

⁂

Each recording leaves four pieces behind: the full take, the audio alone, the silent picture, and the words. They appear here when you stop.

Flat or full — the difference is speed, pitch, volume, pause.

Vinh Giang's vocal toolbox. Speed conveys excitement or, when slowed, gives weight. Pitch signals emotion. Volume projects confidence (or, when whispered, intimacy). Pause makes the next word matter. The analysis below measures three of the four directly: pitch, volume and pace (speed). The fourth, pause, shows up as gaps in those lines. Watch for stretches where all three lines flatten at once: that's where your delivery sat in monotone.
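One way to find those flat stretches automatically is a sliding window over the three measurements. The sketch below is an assumption about how it could be done, not a description of Speakeasy's analysis: the `Frame` shape, the window size and the 10% threshold are all illustrative choices.

```typescript
// One measurement per analysis frame; units are whatever the analyser emits.
type Frame = { pitch: number; volume: number; pace: number };
// Frame indices of a flat stretch; `end` is exclusive.
type Stretch = { start: number; end: number };

// Range of the values relative to their mean: 0 means perfectly flat.
function relativeSpread(values: number[]): number {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  if (mean === 0) return 0;
  return (Math.max(...values) - Math.min(...values)) / Math.abs(mean);
}

// Flag windows where pitch, volume and pace all vary by less than `threshold`,
// then merge overlapping or touching windows into longer stretches.
function flatStretches(frames: Frame[], windowSize = 5, threshold = 0.1): Stretch[] {
  const stretches: Stretch[] = [];
  for (let i = 0; i + windowSize <= frames.length; i++) {
    const win = frames.slice(i, i + windowSize);
    const flat =
      relativeSpread(win.map((f) => f.pitch)) < threshold &&
      relativeSpread(win.map((f) => f.volume)) < threshold &&
      relativeSpread(win.map((f) => f.pace)) < threshold;
    if (!flat) continue;
    const last = stretches[stretches.length - 1];
    if (last && i <= last.end) {
      last.end = i + windowSize;
    } else {
      stretches.push({ start: i, end: i + windowSize });
    }
  }
  return stretches;
}
```

Requiring all three measures to be flat at once mirrors the advice above: a steady pitch alone is not monotone if volume or pace is still moving.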

Vinh Giang on vocal variety — watch on YouTube.

A note on the fourth artefact

Transcription runs entirely in the browser. The model is OpenAI's Whisper, exported to ONNX and run with Transformers.js (@xenova/transformers); on first use the weights download once and are kept thereafter in your browser cache. Desktop loads base.en (≈ 145 MB); phones load tiny.en (≈ 39 MB) so iOS Safari doesn't run out of memory. Nothing is sent to any server.
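The desktop-versus-phone choice can be sketched as a small function. The model ids below are the ones published under the Xenova namespace on the Hugging Face hub; that Speakeasy uses exactly these ids, and that it detects phones from the user agent, are both assumptions, and `pickWhisperModel` is a hypothetical name.

```typescript
// Assumed model ids (Xenova namespace on the Hugging Face hub).
const DESKTOP_MODEL = "Xenova/whisper-base.en";
const PHONE_MODEL = "Xenova/whisper-tiny.en";

// Pick the smaller model on phones so the tab stays within memory limits.
// User-agent sniffing is a rough heuristic, used here only for illustration.
function pickWhisperModel(userAgent: string): string {
  const isPhone = /iPhone|iPod|Android.*Mobile/i.test(userAgent);
  return isPhone ? PHONE_MODEL : DESKTOP_MODEL;
}

// In the browser, the chosen id would then be handed to Transformers.js,
// roughly like this (left as a comment because it needs the library and a download):
//
//   const { pipeline } = await import("@xenova/transformers");
//   const transcriber = await pipeline(
//     "automatic-speech-recognition",
//     pickWhisperModel(navigator.userAgent),
//   );
//   const output = await transcriber(audioSamples); // audio as a 16 kHz mono Float32Array
```

Keeping the choice in one pure function makes it easy to swap the heuristic later, for example for a memory-based check where the browser exposes one.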

The full take

Audio alone

Silent footage

The words