Live Caption and Voice Typing both rely on the same offline live transcription setup, but they serve different surfaces: one shows floating captions, while the other sends text into another app.
Best for
- Showing system audio as a floating subtitle window
- Dictating directly into chat apps, documents, or forms
- Users who already understand the main recording workflow and only need one of these secondary features
Before you start
- Finish Getting Started, or configure a working Live Record Model in Settings > Model Settings.
- If you want to send text into another app, the feature still depends on the same offline live transcription stack.
Where Live Caption starts
- Open Live Record.
- Find the Live Caption or System Audio Captions toggle on that page.
- If you only want floating subtitles for system audio, you do not need to start recording first. Turning it on is enough.
- If you later start Live Record, both can run in parallel.
What Settings > Subtitle Settings controls
- It controls the floating caption window behavior: startup behavior, always-on-top, click-through, font size, width, color, and background transparency.
- It does not provide the start toggle. The real entry point stays on the Live Record page.
- If Live Caption is already on but the window still does not appear, continue to FAQ and Troubleshooting.
How Voice Typing starts
- Open Settings > Voice Typing.
- Turn on Voice Typing.
- Assign a global shortcut on that page.
- Choose either Push to Talk (Hold) or Toggle (Press once).
- If it still is not ready, review the readiness and dependency status shown there.
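The readiness status in the last step can be pictured as a simple checklist. The sketch below is illustrative only and is not the app's actual code: `VoiceTypingStatus`, `find_blockers`, and `is_ready` are hypothetical names, and the five fields simply mirror the dependency categories this page names (shortcut, model, VAD, input device, runtime warm-up).

```python
from dataclasses import dataclass


@dataclass
class VoiceTypingStatus:
    """Hypothetical snapshot of the dependencies Voice Typing needs."""
    shortcut_assigned: bool
    model_ready: bool
    vad_ready: bool  # assumption: set to True when no VAD model is required
    input_device_available: bool
    runtime_warmed_up: bool


def find_blockers(status: VoiceTypingStatus) -> list[str]:
    """Return the names of unmet dependencies, in a fixed order."""
    checks = [
        ("shortcut", status.shortcut_assigned),
        ("model", status.model_ready),
        ("VAD", status.vad_ready),
        ("input device", status.input_device_available),
        ("runtime warm-up", status.runtime_warmed_up),
    ]
    return [name for name, ok in checks if not ok]


def is_ready(status: VoiceTypingStatus) -> bool:
    """Voice Typing counts as ready only when nothing is blocking it."""
    return not find_blockers(status)
```

In this model, a status with no shortcut assigned and an unfinished warm-up would report exactly those two blockers, which is the kind of information the settings page surfaces.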
Push to Talk versus Toggle
- Push to Talk (Hold) works better for short bursts because capture only runs while you hold the shortcut.
- Toggle (Press once) works better for longer dictation because one press starts and the next press stops.
- In both modes, Voice Typing still depends on a working Live Record Model, any required VAD model, an available input device, and background warm-up.
- If Voice Typing is not ready yet, the same settings page tells you whether the blocker is the shortcut, model, VAD, input device, or runtime warm-up.
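The difference between the two shortcut modes comes down to how key-down and key-up events map to capture state. The following is a minimal sketch of that behavior, not the app's implementation: `ShortcutMode` and `CaptureController` are hypothetical names, and ignoring key auto-repeat is an assumption about how a held key should be treated.

```python
from enum import Enum


class ShortcutMode(Enum):
    PUSH_TO_TALK = "Push to Talk (Hold)"
    TOGGLE = "Toggle (Press once)"


class CaptureController:
    """Hypothetical model of how the global shortcut drives capture."""

    def __init__(self, mode: ShortcutMode) -> None:
        self.mode = mode
        self.capturing = False
        self._held = False  # tracks whether the shortcut is physically down

    def on_key_down(self) -> None:
        if self._held:
            return  # assumption: ignore auto-repeat while the key stays down
        self._held = True
        if self.mode is ShortcutMode.PUSH_TO_TALK:
            self.capturing = True  # capture runs only while held
        else:
            self.capturing = not self.capturing  # each press flips the state

    def on_key_up(self) -> None:
        self._held = False
        if self.mode is ShortcutMode.PUSH_TO_TALK:
            self.capturing = False  # releasing the key ends capture
        # In Toggle mode, releasing the key changes nothing.
```

With Push to Talk, capture ends the moment the key is released, which suits short bursts. With Toggle, capture survives the release and only a second press stops it, which suits longer dictation.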
When to choose which one
- If your goal is to see system audio as floating text instead of sending it somewhere else, start with Live Caption.
- If your goal is to put spoken text into another app's input field, start with Voice Typing.
- If one of the features is enabled but does not behave the way you expect, go straight to FAQ and Troubleshooting.