Have you ever listened to a podcast, webinar, or class lecture and wished you had a written transcript? Microsoft released a new feature for Word for the web that simplifies transcription for English speakers. It’s a great start, but like all audio transcription tools, you still need to do editing.
Word for the Web
If you’re not familiar with Word for the web, it’s the online word processor from Microsoft that works in your web browser. It was previously called “Word Web App”.
It’s like Google Docs except it’s from Microsoft and your documents are saved in OneDrive. Its command structure is similar to the desktop version, but it’s not as feature-rich.
While the online version is free, the Transcribe feature is a premium feature. You’ll see the menu item and notice that you need to upgrade to Microsoft 365. This is the new name for Office 365.
Let’s start with the end in mind. How do you want to use the audio transcript? In my case, I needed a simple transcript of a podcast. Because this is for my personal use, I’m not too concerned with word mistakes, timecodes, styles, and other matters.
If I were the podcast producer and wanted to repurpose the audio transcript for the website, I would have stricter criteria. I would spend the extra time to fix issues. Similarly, if I were transcribing something critical, I might hire a transcriptionist or use oTranscribe and do it myself.
Regardless of which system or service you use for audio transcription, you’ll need to do some work.
Transcribing can be tough for a number of reasons. If you’ve ever played around with dictation programs you know that sometimes you have to tell the software about punctuation. Now, consider how much more difficult the task is without these verbal clues. The bottom line is don’t expect perfection. If this were easy, the closed captions on YouTube videos would be flawless.
To test the transcribe function, I chose 3 files. The first was the Techmeme Ride Home podcast which is produced in a studio. The second was an MP4 file produced with Techsmith’s Camtasia in my home office. The last file was a podcast with multiple speakers.
- proper names
It also appears the Word transcribe service breaks the audio into 20- to 30-second chunks unless it can determine a notable break or a change in speakers. The net effect is that when you add the content to Word, you get some punctuation issues. This is in contrast to Sonix.AI and Trint, which can handle much longer segments. The flip side is that the paid services sometimes made the segments too long, which caused a “wall of text”.
The number of speakers in the file also plays a role. I was dealing with two speakers at most, but I can see how a group discussion would add complexity.
Another item to consider is the subject matter. While I don’t know the mechanics of these systems, I’m guessing the algorithms rely on dictionaries or lexicons. If you’re transcribing files with technical terms or jargon, you might see some misinterpretations. Or, the service may convert a software release reference such as 9.4.0 to 9 dot 4 dot 0.
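If you do end up with spoken-out release numbers, a short script can tidy them after you copy the transcript text out of Word. The following is a minimal sketch, not part of Word or any transcription service; the function name `fix_spoken_versions` is my own, and it assumes the transcript renders the separator as the literal word "dot" between numbers.

```python
import re

# Matches runs such as "9 dot 4 dot 0": numbers separated by the spoken word "dot".
# Assumption: the transcript writes the separator as the literal word "dot".
SPOKEN_VERSION = re.compile(r"\b\d+(?:\s+dot\s+\d+)+\b", re.IGNORECASE)


def fix_spoken_versions(text):
    """Rewrite 'N dot N dot N' sequences as 'N.N.N' and leave other text alone."""
    def join(match):
        # Collapse each " dot " inside the matched run into a period.
        return re.sub(r"\s+dot\s+", ".", match.group(0), flags=re.IGNORECASE)

    return SPOKEN_VERSION.sub(join, text)


if __name__ == "__main__":
    print(fix_spoken_versions("We upgraded to 9 dot 4 dot 0 last week."))
```

Because the pattern requires a number on both sides of "dot", ordinary uses of the word should be left untouched, but it is still worth skimming the result.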
The environment and background noise affect results. The more noise you have in the source audio file, the worse the transcription. If you’re recording lectures, keep in mind noise from classmates or your distance from the speaker. The same goes for webinars or videos that include soundtracks.
If you’ve got a noisy file, you might try Audacity to clean up the file. You can also use that program to convert one audio file type to another format.
Find the Audio Source
For the audio transcription to work, you’ll need a source file. Word for the web allows the following audio file format types to be transcribed:
Some podcasters make it easy to download the file. The embedded web players have a dedicated download button.
Other podcast providers or media players don’t allow the same options. For the Techmeme file, I downloaded the podcast from PocketCasts Plus. This is a premium feature.
Transcribing the Audio File
Applies To : Word for web
This method requires you have a source file in one of the accepted audio file formats and Microsoft 365. The amount of time Microsoft takes to transcribe your audio file is based on the file size.
- Go to www.office.com and log in to your account.
- Click the Word icon.
- From the Dictate menu, select Transcribe.
- In the Transcribe Pane, click the blue Upload audio button.
- Navigate to your audio file and click Open.
- Leave the application open while the file is processing.
After the transcription is done, the Transcribe pane on the right side will show a time-stamped transcript version.
The Transcribe pane includes:
- Audio file name and link
- Time bar and playback controls including variable speed options
- Transcript sections with timestamp, speaker label, and text snippet
The large white area (4) is your editor area where you can add in transcript sections.
Once the transcript shows, you’re ready to make edits. If you want to skip editing, you can simply click the Add all to document button. That will carry over everything except the timestamps. Most likely, you’ll need to edit.
When you hover over a segment, the background color will change to white. This becomes the active transcript section. You can click the timestamp to the left of the speaker label to replay the section’s audio. The top playback controls [A] allow you to adjust the speed and direction. You can either edit the section by clicking the pencil icon [B] or add it to the Word editor by using the plus (+) button [C].
If your audio has multiple speakers, you should identify them first and change their labels. Microsoft allows you to make global changes like turning all “Speaker 1” to “Tony”. This makes it much easier to follow along.
To Edit a Section
- Hover over the section you wish to edit.
- Click the pencil icon. An outline appears around the editable areas.
- Make your changes.
- Click the checkmark in the lower right to save your changes.
One convenience Microsoft offers is that you can selectively add transcript sections. Instead of clicking the Add all to document button, you click within the Word document where you’d like the segment to show and then click the plus (+) button that appears in the top right corner of the section. This is helpful if you wish to omit sections, such as advertisements, or put a section in a different order.
Here are some items I encountered in testing:
Too many speakers – This would happen if a segment was recorded elsewhere and added in such as an intro track. Another cause was the advertisements.
Quotations – The system would not properly identify when someone was quoting. A speaker would use “quote” and “end quote”. The transcript would show those same words minus the quotation marks. This problem is best solved by using the search and replace feature in Word.
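If you have many spoken quotations, the same replacement can be scripted on text copied out of Word. This is a rough sketch under my own assumptions: the function name `fix_spoken_quotes` is hypothetical, and it assumes the speaker's markers appear in the transcript as the literal words "quote" and "end quote".

```python
import re

# Captures the words between a spoken "quote" and the matching "end quote".
# Assumption: both markers appear as literal words in the transcript text.
QUOTED = re.compile(
    r"\bquote\b[,\s]*(.+?)[,\s]*\bend quote\b",
    re.IGNORECASE | re.DOTALL,
)


def fix_spoken_quotes(text):
    """Replace 'quote ... end quote' spans with the same words in quotation marks."""
    return QUOTED.sub(lambda m: '"' + m.group(1).strip() + '"', text)


if __name__ == "__main__":
    print(fix_spoken_quotes("He said quote we ship Friday end quote and left."))
```

Text with no "end quote" marker is left unchanged. The word "quote" used in its ordinary sense shortly before a real spoken quotation would confuse the pattern, so review the output rather than trusting it blindly.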
Strange paragraph breaks – this is an annoyance and I’m not sure there is anything you can do to fix it unless you’re the speaker. I think this is a byproduct of Microsoft breaking the segments into 20-30 second timeframes. Here’s an example.
In contrast, Sonix.AI used a longer time window. Interestingly, they use British English spelling for colour.
Edited text, but original audio – any text edits you make do not alter the underlying audio. If you play back a section you have edited, you’ll still hear the original recording.
After going through this testing, I’m reminded how tough it can be to transcribe an audio file to text. The issues I encountered weren’t unique to Microsoft. Both Trint and Sonix.AI had their issues as well. I will continue to use the service for personal needs. It’s a nice feature that will improve. However, it’s still very new and not why we bought Word.
The paid services have an edge in this area. They were designed from the ground up for audio transcription. Their feature set is geared toward transcribing and allows you to build custom dictionaries or edit the audio file. The good news is there are plenty of options for us. Both paid services offer trials, so you can test their effectiveness.
Shortly after I wrote this article, David Rohrer did a very thorough review of more transcription services. He used his own podcast, The Business of Digital, for the test. It’s a good read and points out other issues to consider.