Have you ever listened to a podcast, webinar, or class lecture and wished you had the written transcript? In this tutorial, I’ll show how to convert an audio file to text using Microsoft’s Word for the web. It’s a great start, but as with all audio transcription tools, you’ll still need to do some editing.
What is Word for the Web?
If you’re not familiar with Word for the web, the online word processor from Microsoft works in your web browser. It was previously called “Word Web App.”
It’s like Google Docs, except it’s from Microsoft, and your documents are saved in Microsoft OneDrive. The command structure is similar to the desktop version but not as feature-rich.
While the online version is free, the Transcribe feature is premium. You’ll see the menu item and notice that you need to upgrade to Microsoft 365. This is the new name for Office 365.
Transcript Purpose
Let’s start with the end in mind. How do you want to use the audio transcript? In my case, I needed a simple podcast transcript. Because this is for my personal use, I’m not too concerned with word mistakes, timecodes, styles, and other matters.
If I were the podcast producer and wanted to repurpose the audio transcript for the website, I would have stricter criteria and would spend the extra time fixing issues. Similarly, if I were transcribing something critical, I might hire a transcriptionist or use oTranscribe and do it myself.
Regardless of which system or service you use for audio transcription, you’ll need to do some work.
Setting Expectations
Transcribing can be tough for many reasons. If you’ve ever played with dictation programs, you know that sometimes you must tell the software about punctuation. Now, consider how much more difficult the task is without these verbal clues. The bottom line is don’t expect perfection. If this were easy, closed captions on YouTube videos would be flawless.
To test the transcribe function, I chose 3 files. The first was the Techmeme Ride Home podcast, which is produced in a studio. The second was an MP4 file produced with Techsmith’s Camtasia in my home office. The last file was a podcast with multiple speakers.
I also used Sonix.AI and Trint, which are paid services, on the same Techmeme podcast file. All services had issues with:
- punctuation
- proper names
- homonyms
- quotations
The Word transcribe service also appears to break the audio into 20-30 second chunks unless it can determine a notable break or a change in speakers. The net effect is that you get some punctuation issues when you add content to Microsoft Word. This is in contrast to Sonix.AI and Trint, which can do much longer segments. The flip side is that sometimes paid services make the segments too long, which causes a “wall of text.”
The number of speakers in the file plays a role. While I was dealing with 2 speakers at most, I can see where a group discussion would add complexity.
Another item to consider is subject matter. While I don’t know these systems’ mechanics, I’m guessing the algorithms rely on dictionaries or lexicons. Also, remember that you can’t take advantage of a Word exclude dictionary since we’re using Word for the web.
You might see some misinterpretations if you’re transcribing files with technical terms or jargon. The service may also convert a software release reference such as 9.4.0 to 9 dot 4 dot 0.
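If spelled-out version numbers show up often in your transcript, you can clean them up after copying the text out of Word. The following is a minimal sketch, not anything built into Word: the function name `restore_version_numbers` is my own, and it assumes the transcript writes the separator as the word "dot" between digits, as in the example above.

```python
import re

# Matches the spoken word "dot" only when it sits between two digits,
# so ordinary uses of the word ("connect the dot") are left alone.
_SPOKEN_DOT = re.compile(r"(?<=\d)\s+dot\s+(?=\d)", re.IGNORECASE)


def restore_version_numbers(text: str) -> str:
    """Turn '9 dot 4 dot 0' back into '9.4.0' (hypothetical helper)."""
    return _SPOKEN_DOT.sub(".", text)


print(restore_version_numbers("We upgraded to 9 dot 4 dot 0 last week."))
```

Because the pattern fires for any digits separated by "dot," review the result before relying on it.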
The environment and background noise impact the results. The more noise you have on the source audio file, the worse the transcription. If you’re recording lectures, keep in mind noise from classmates or your distance from the speaker. The same goes for webinars or videos, which include soundtracks.
If you have a noisy audio file, try Audacity to clean up the audio source file. You can also use that program to convert one audio file type to another format.
Find the Audio Source
For the audio transcription to work, you’ll need a source file. Word for the web can transcribe the following audio file formats:
- .wav
- .mp3
- .mp4
- .m4a
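If you have a folder of recordings, a quick check like the one below can tell you which files are ready to upload and which need converting first. This is only a sketch: the extension list comes from the list above, and `is_supported_audio` is a name I made up for illustration. It looks at the file extension only, not at the actual contents of the file.

```python
from pathlib import Path

# Extensions taken from the list of accepted formats above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".m4a"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is one of the accepted formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["episode-12.MP3", "lecture.flac", "webinar.mp4"]:
    status = "ready to upload" if is_supported_audio(name) else "convert first"
    print(f"{name}: {status}")
```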
Some podcasters make it easy to download the file. The embedded web players have a dedicated download button.
Other podcast providers or media players don’t allow the same options. For the Techmeme file, I downloaded the podcast from PocketCasts Plus. This is a premium feature.
Transcribing the Audio File
This method requires a source file in one of the accepted audio file formats and Microsoft 365. The amount of time Microsoft takes to transcribe your audio file is based on the file size.
- Go to www.office.com and log into your account.
- Click the Word icon from the left pane.
- Click New blank document.
- From the Dictate menu, select Transcribe.
- In the Transcribe Pane, click the blue Upload audio button.
- Navigate to your audio file and click Open.
- Leave the application open while the file is processing.
After the transcription is done, the Transcribe pane on the right side will show a time-stamped transcript version.
The Transcribe pane includes:
- Audio file name and link
- Time bar and playback controls, including variable speed options
- Transcript sections with a timestamp, speaker label, and text snippet
- A large editor area where you can add in transcript sections.
Making Edits
Once the transcript shows up, you’re ready to make edits. If you want to skip editing, you can click the Add all to document button. That will carry over everything except the timestamps. Most likely, though, you’ll need to edit.
When you hover over a segment, the background color will change to white. This becomes the active transcript section. You can click the timestamp to the left of the speaker label to replay the section’s audio. The top playback controls [A] allow you to adjust the speed and direction. You can edit the section by clicking the pencil icon [B] or add it to the Word editor using the ⊕ button [C].
If your audio has multiple speakers, you should identify the speakers first and change their labels. Microsoft allows you to make global changes like turning all “Speaker 1” labels to “Tony.” This makes it much easier to follow along.
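If you’ve already added the transcript to a document and exported it as plain text, you can make the same kind of global label change yourself. Here is a rough sketch. The `rename_speaker` function is hypothetical, and it assumes the labels appear in the text exactly as “Speaker 1,” “Speaker 2,” and so on.

```python
import re


def rename_speaker(text: str, label: str, name: str) -> str:
    """Replace every whole occurrence of a speaker label with a name."""
    # The word boundaries keep "Speaker 1" from matching inside "Speaker 10".
    pattern = re.compile(rf"\b{re.escape(label)}\b")
    # A function replacement inserts the name literally, with no escape handling.
    return pattern.sub(lambda match: name, text)


sample = "Speaker 1: Welcome back.\nSpeaker 10: Thanks for having me."
print(rename_speaker(sample, "Speaker 1", "Tony"))
```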
To Edit a Section
- Hover over the section you wish to edit.
- Click the pencil icon. An outline appears around the editable areas.
- Make your changes.
- Click the checkmark in the lower right to save your changes.
One convenience Microsoft offers is that you can selectively add transcript sections. For example, instead of clicking the Add all to document button, you click within the Word document where you’d like the segment to show and then click the ⊕ button that appears in the top right corner. This is helpful if you wish to omit sections, such as advertisements, or put a section in a different order.
Troubleshooting
Here are some items I encountered in testing:
Too many speakers – This would happen if a segment was recorded elsewhere and added in, such as an intro track. Another cause was the advertisements.
Quotations – The system would not properly identify when someone was quoting. A speaker would say “quote” and “end quote,” and the transcript would show those literal words instead of quotation marks. This problem is best solved by using the search and replace feature in Word.
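If you’d rather fix the quotations outside of Word, the same search and replace idea can be sketched in a few lines of Python on exported text. This is an illustration under assumptions: the function name `restore_quotes` is mine, and it only handles the case where the speaker says “quote” before the passage and “end quote” after it.

```python
import re

# Captures the words spoken between "quote" and the next "end quote".
_SPOKEN_QUOTE = re.compile(r"\bquote\s+(.*?)\s+end quote\b", re.IGNORECASE)


def restore_quotes(text: str) -> str:
    """Swap spoken 'quote ... end quote' markers for curly quotation marks."""
    return _SPOKEN_QUOTE.sub("\u201c\\1\u201d", text)


print(restore_quotes("The CEO said quote we are just getting started end quote on the call."))
```

A speaker who uses the word “quote” in another sense can trip this up, so skim the result afterward.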
Strange paragraph breaks – This is an annoyance, and I’m not sure there is anything you can do to fix it unless you’re the speaker. I think this is a byproduct of Microsoft breaking segments into 20-30 second timeframes. Here’s an example.
In contrast, Sonix.AI used a longer time window. Interestingly, they use British English spelling for colour.
I edited the text, but I still hear the original audio – Any text edits you make do not alter the underlying audio. So if you play back a section of the transcript, you’ll hear the original recording.
Final Thoughts
After going through this testing, I’m reminded of how tough it can be to transcribe an audio file to text. The issues I encountered weren’t unique to Microsoft. Both Trint and Sonix.AI had their issues as well. I will continue to use the service for personal needs. It’s a nice feature that will improve. However, it’s still very new and not why we bought Microsoft Word.
The paid services have an edge in this area. They were designed from the ground up to do audio transcription. Their feature set is geared toward transcribing and allows you to build custom dictionaries or edit the audio file. The good news is there are plenty of options for us. Both paid services offer trials so that you can test their effectiveness.
Shortly after I wrote this article, David Rohrer did a comprehensive review of more transcription services. He used his own podcast, The Business of Digital, for the test. It’s a good read and points out other issues to consider.