Warning: Deprecation Notice
As of , Web Captioner has been sunset. I've left this article up, but unless someone hosts a fork of the original source code, this article won't work anymore. If you find viable alternatives, please let me know on Mastodon!
Introduction
Yesterday, I participated in a Twitter Space with JavaScript Jam to chat about accessibility and web standards. Ordinarily when I do this kind of speaking, I try to make sure there are, at minimum, automatic captions for the live event (yes, despite automatic captions' serious limitations) and polished captions for any recordings afterwards. What I didn't realize, until Nic Steenhout pointed it out, was that Twitter Spaces no longer provides captions. They used to, but Twitter's new management has since disabled that, rolling back significant progress.
Ideally, live chats like this will happen on platforms that provide captions out of the box, built directly into the interface. That ensures the captions are discoverable, and that users won't have to leave the application window to get them. If you're looking to ensure participants can get captions, this is where you should start: by seeking out platforms that explicitly support captions.
If you absolutely can't get in-app captions to work or move to another platform, then what follows is the last-ditch fallback I ended up going with: using Web Captioner to generate a shareable link to automatic captions that you can pass along to participants in your Twitter Space, Discord voice chat, or other uncaptioned live audio.
What is Web Captioner?
Web Captioner is a website that uses Google Chrome's built-in speech recognition to provide a simple speech-to-text display. Currently, that speech recognition functionality is only in Google Chrome, so Chrome is required for whoever is recording the audio.
Web Captioner seems largely designed for local use cases, such as a classroom setting, where you could show the transcription on a big screen. Web Captioner also has options to integrate with software like Zoom, OBS Studio, or vMix to provide true closed captions. Crucially for this jerry-rigged solution, Web Captioner also provides an experimental feature for creating shareable links to your captions.
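To make the "built-in speech recognition" part a little more concrete, here's a minimal TypeScript sketch of browser speech-to-text. It assumes the Web Speech API's SpeechRecognition interface, which Chrome exposes under the prefixed name webkitSpeechRecognition. This is not Web Captioner's actual code: the helper names splitTranscript and startCaptions, the locally declared interfaces, and the en-US language setting are all made up for illustration.

```typescript
// Minimal local shapes for the parts of the Web Speech API used below.
// They are declared here (an assumption for this sketch) because the
// prefixed Chrome constructor isn't part of every TypeScript DOM typing.
interface RecognitionAlternative { transcript: string }
interface RecognitionResult { isFinal: boolean; 0: RecognitionAlternative }
interface RecognitionEvent { results: ArrayLike<RecognitionResult> }
interface Recognizer {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
}

// Split recognition results into finished text and text that may still change.
function splitTranscript(
  results: ArrayLike<RecognitionResult>
): { final: string; interim: string } {
  let final = "";
  let interim = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (!result) continue;
    const text = result[0].transcript; // first (top-ranked) alternative
    if (result.isFinal) final += text;
    else interim += text;
  }
  return { final, interim };
}

// Start continuous recognition on whatever input the browser is set to use.
// Returns null when the browser (or runtime) has no speech recognition.
function startCaptions(
  onUpdate: (final: string, interim: string) => void
): Recognizer | null {
  const scope = globalThis as any;
  const Ctor = scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
  if (!Ctor) return null;

  const recognizer: Recognizer = new Ctor();
  recognizer.continuous = true;      // keep listening across pauses
  recognizer.interimResults = true;  // report words before they are finalized
  recognizer.lang = "en-US";         // hypothetical choice for this sketch
  recognizer.onresult = (event) => {
    const { final, interim } = splitTranscript(event.results);
    onUpdate(final, interim);
  };
  recognizer.start();
  return recognizer;
}
```

In a page, something like startCaptions((final, interim) => { caption.textContent = final + interim; }) would keep a hypothetical caption element updated as words come in.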
Step 1: Set Up Audio Loopback
The specifics of this step depend a lot on your operating system.
Web Captioner depends on Chrome's ability to capture audio. As a result, while Web Captioner can pick up your microphone just fine, it won't be able to transcribe the audio from anyone else on the call out of the box. If you want to transcribe the full call, you'll need to set up Google Chrome to use the system's entire audio output as an input.
To do this, you'll need to install and set up some audio loopback software, since operating systems can get pretty weird about using audio outputs from some applications as inputs in others. Web Captioner themselves have some steps you can walk through for setting up some loopback software:
- Web Captioner's steps for setting up VB-CABLE if you just want to caption another application's audio, and aren't worried about capturing your own microphone. This is useful if you have a second device you want to use just for listening/transcribing.
- Web Captioner's steps for setting up loopback for other applications and your microphone, using Voicemeeter on Windows and Loopback on macOS. This is the way to go if you're chatting and transcribing from the same device.
Audio loopback can get really messy, particularly if you also plan to use your own microphone. If you can get away with it, I'd recommend a two-device setup: one device for your microphone, and one for listening.
If you already have audio loopback software set up to provide device audio output as an audio input source, then you should be able to use that setup just fine. For instance, I use shinywhitebox's SWB Audio App for streams, and I know of other people who use BlackHole.
Step 2: Get Chrome to Use Your Audio Output
Once you've set up your loopback software to provide device audio output as a new audio input source, we need to get Google Chrome to use that device audio in place of your microphone.
To set this, go to Chrome's microphone settings at chrome://settings/, and find the microphone dropdown. Choose the audio source you created during Step 1. On my Mac, it had "(Virtual)" at the end to make it easier to find. I'm not sure whether Windows would do the same thing.
If your device audio isn't available from the dropdown, you might need to restart Chrome.
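If you'd like to double-check that the browser can actually see your loopback device, one option is to list its audio inputs from a page's script or the DevTools console. The sketch below assumes the standard navigator.mediaDevices.enumerateDevices() call. The function names and the "virtual" label hint are made up for illustration, and your loopback device may well be labelled differently. Browsers generally leave device labels blank until the page has been granted microphone permission.

```typescript
// The subset of MediaDeviceInfo this sketch relies on.
interface DeviceInfo { kind: string; label: string; deviceId: string }

// Keep only audio inputs whose label contains the hint (case-insensitive).
function findAudioInputs(
  devices: readonly DeviceInfo[],
  labelHint: string
): DeviceInfo[] {
  const hint = labelHint.toLowerCase();
  return devices.filter(
    (device) =>
      device.kind === "audioinput" &&
      device.label.toLowerCase().includes(hint)
  );
}

// Ask the browser for its devices and filter them.
// "virtual" is only a guess at how a loopback device might be labelled.
async function listLoopbackInputs(labelHint = "virtual"): Promise<DeviceInfo[]> {
  const mediaDevices = (globalThis as any).navigator?.mediaDevices;
  if (!mediaDevices?.enumerateDevices) return []; // not running in a browser
  const devices: DeviceInfo[] = await mediaDevices.enumerateDevices();
  return findAudioInputs(devices, labelHint);
}
```

If the list comes back empty even though the loopback software is running, that's a hint to restart Chrome or revisit Step 1.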
Step 3: Test the Transcription in Web Captioner
Time for the big moment of truth! Let's make sure Web Captioner can accurately pick up your device audio now.
Go to Web Captioner, and click either of the bright blue Start Captioning buttons in the header.
This will take you to the mostly blank, black screen of the transcription interface.
Click the yellow Start Captioning button in the footer. In another tab, window, or application, play some audio with some dialogue. I used a YouTube video for this. Hop back to Web Captioner and confirm that Web Captioner is transcribing the audio. If you're using Web Captioner on the same device you're chatting from, try speaking into the mic to confirm whether your mic audio is also getting transcribed.
Assuming this worked, it's all smooth sailing from here! ⛵
Step 4: Getting the Shareable Link
Next up, we'll have Web Captioner generate a link to our live captions that we can share with participants.
You'll need to be signed in to save settings and to generate the link. First, click the Settings menu icon in the very bottom right corner of the captioner's interface, and then click Sign in. Follow the steps to authenticate into Web Captioner with an account.
Next, we're going to enable the experimental Share feature. This experiment is currently hidden away, so to get to it, you'll need to visit https://webcaptioner.com/ directly. When you do, you'll be greeted with a popup.
Check each checkbox provided, and then click the Add Experiment button to proceed.
Return to the captioner interface. Next to the Start Captioning button, there should be a new button with an icon that looks like a radio tower. Click it to open up a new popup.
Use these settings to configure your share link as needed. I wasn't able to get the custom vanity link to work, but that could have just been a temporary issue. When you're ready, click Get Link.
Step 5: Share the Link and Go Live!
Before the live event starts, promote the captions link, making it as easy as possible to find. After all, these captions are only as useful as they are discoverable. You should also draw attention to the link during the event itself. I set up a memorable, intuitive redirect (benmyers.dev/captions) so I could mention the captions link on air without having to spell out the randomly-generated string of letters. For the purposes of the Twitter Space, we also pinned a tweet to the top of the Space with a link to the captions.
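A memorable redirect can be set up in many ways, and most hosts have their own redirect configuration that will usually be simpler than running code. As a generic illustration only (not how the redirect mentioned above was built), here's a minimal TypeScript sketch of the idea using Node's built-in http module. The /captions path, the example.com target, and the function names are placeholders.

```typescript
import { createServer } from "node:http";

// Map of short, easy-to-say paths to the long generated link.
// The target below is a placeholder, not a real captions link.
const redirects: Record<string, string> = {
  "/captions": "https://example.com/your-generated-captions-link",
};

// Look up a path in the table, ignoring any trailing slashes.
function resolveRedirect(
  path: string,
  table: Record<string, string>
): string | undefined {
  const normalized = path.replace(/\/+$/, "") || "/";
  return Object.prototype.hasOwnProperty.call(table, normalized)
    ? table[normalized]
    : undefined;
}

// Build a server that answers known paths with a temporary (302) redirect.
function createRedirectServer(table: Record<string, string>) {
  return createServer((req, res) => {
    const path = new URL(req.url ?? "/", "http://localhost").pathname;
    const target = resolveRedirect(path, table);
    if (target) {
      res.writeHead(302, { Location: target });
      res.end();
    } else {
      res.writeHead(404);
      res.end("Not found");
    }
  });
}

// To try it locally:
// createRedirectServer(redirects).listen(8080);
// then visit http://localhost:8080/captions
```

A temporary redirect is a reasonable fit here, since the generated captions link may change from one event to the next.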
When the event starts, be sure to click Start Captioning to kick off the transcription.
Step 6: Post-Event Wrap-Up
After the event is done, you'll want to return to Web Captioner and stop the captions. You should probably do this before you or any other hosts say something you don't want broadcast to the world 😅
While in the Web Captioner interface, you can export a transcript! This is especially helpful if you plan to upload a recording of the event. To export your transcript, pop open the Settings menu in the bottom right corner of the captioner interface again, and click the button with the floppy disk Save icon.
From there, you'll be able to export your transcript as both a text file and a Word document.
Using this transcript elsewhere?
If you're planning to use this transcript alongside an upload of the event, please be sure to clean it up and correct it first. The exported transcript will have plenty of mistranscribed words, as well as weird mixes of prematurely cut off sentences alongside run-on sentences. You'll also need to clearly indicate who's speaking, and probably add in any non-dialogue audio cues as well.
Finally, you might want to put your Chrome instance back in its default microphone state, as well as disable any audio loopback software you have running, so that your audio experience for day-to-day app usage is back to normal.
Conclusion
If you can use a platform that supports captioning out of the box, please do so. It'll be far more reliable than running finicky audio loopback software, depending on continued support for a hidden experimental feature inside Web Captioner, and requiring listeners to have a separate window up to follow the conversation. However, if you've exhausted other options, a shareable link like this could work in a pinch.
You may also be interested in /u/mossonrok's Reddit post, where they go into using a similar approach on macOS, with a focus on Discord voice chats.