Transcribe Audio Files with Word

Have you ever listened to a podcast, webinar, or class lecture and wished you had the written transcript? Microsoft released a new feature for Word for the web that simplifies transcription for English speakers by converting audio to text. It’s a great start, but as with all audio transcription tools, you still need to do some editing.

Word for the Web

If you’re not familiar with Word for the web, it’s the online word processor from Microsoft that works in your web browser. It was previously called “Word Web App”.

It’s like Google Docs, except it’s from Microsoft, and your documents are saved in Microsoft OneDrive. The command structure is similar to the desktop version, but it’s not as feature-rich.

While the online version is free, Transcribe is a premium feature. You’ll see the menu item along with a notice that you need to upgrade to Microsoft 365, which is the new name for Office 365.

Upgrade notice for Transcribe.

Transcript Purpose

Let’s start with the end in mind. How do you want to use the audio transcript? In my case, I needed a simple podcast transcript. Because this is for my personal use, I’m not too concerned with word mistakes, timecodes, styles, and other matters.

If I were the podcast producer and wanted to repurpose the audio transcript on the website, I would have stricter criteria and would spend the extra time to fix issues. Similarly, if I were transcribing something critical, I might hire a transcriptionist or use oTranscribe and do it myself.

Regardless of which system or service you use for audio transcription, you’ll need to do some work.

Setting Expectations

Transcribing can be tough for many reasons. If you’ve ever played around with dictation programs, you know that sometimes you have to tell the software about punctuation. Now, consider how much more difficult the task is without these verbal clues. The bottom line is don’t expect perfection. If this were easy, the closed captions on YouTube videos would be flawless.

To test the Transcribe function, I chose three files. The first was the Techmeme Ride Home podcast, which is produced in a studio. The second was an MP4 file produced with TechSmith’s Camtasia in my home office. The last file was a podcast with multiple speakers.

I also used Sonix.AI and Trint, which are paid services, on the same Techmeme podcast file. All services had issues with:

  • punctuation
  • proper names
  • homonyms
  • quotations

It also appears that the Word transcribe service breaks the audio into 20-30 second chunks unless it can detect a notable break or a change in speakers. The net effect is that when you add the content to Microsoft Word, you get some punctuation issues. This is in contrast to Sonix.AI and Trint, which can handle much longer segments. The flip side is that the paid services sometimes made the segments too long, which produced a “wall of text.”

The number of speakers in the file plays a role. While I was dealing with two speakers at most, I can see how a group discussion would add complexity.

Another item to consider is the subject matter. While I don’t know these systems’ mechanics, I’m guessing the algorithms rely on dictionaries or lexicons. If you’re transcribing files with technical terms or jargon, you might see some misinterpretations. The service may also convert a software release reference such as 9.4.0 to 9 dot 4 dot 0.
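If your transcript ends up with spelled-out release numbers, you can repair them after you copy the text out of Word. The following is a minimal sketch, assuming the service writes the separator as the word “dot” between digits, as in the example above. The function name restore_versions is my own and not part of any Microsoft tool.

```python
import re

# Matches runs such as "9 dot 4 dot 0": digit groups joined by the spoken word "dot".
# Assumption: the transcript spells the separator exactly as "dot".
SPOKEN_VERSION = re.compile(r"\b\d+(?:\s+dot\s+\d+)+\b", re.IGNORECASE)


def restore_versions(text):
    """Rewrite spoken version numbers such as '9 dot 4 dot 0' as '9.4.0'."""
    def join_parts(match):
        # Pull out just the digit groups and rejoin them with periods.
        return ".".join(re.findall(r"\d+", match.group(0)))

    return SPOKEN_VERSION.sub(join_parts, text)


if __name__ == "__main__":
    print(restore_versions("We upgraded to release 9 dot 4 dot 0 last week."))
```

Because the pattern requires digits on both sides of “dot,” ordinary uses of the word are left alone.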

The environment and background noise impact results. The more noise you have in the source audio file, the worse the transcription. If you’re recording lectures, keep in mind noise from classmates and your distance from the speaker. The same goes for webinars or videos that include soundtracks.

If you’ve got a noisy file, you might try Audacity to clean up the audio source file. You can also use that program to convert one audio file type to another format.

Find the Audio Source

For the audio transcription to work, you’ll need a source file. Word for the web can transcribe the following audio file formats:

  • .wav
  • .mp3
  • .mp4
  • .m4a
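If you have a folder of recordings, a quick check of the file extension can save you a failed upload. This is a minimal sketch that tests a file name against the four formats listed above. The helper name is_supported is hypothetical, and the check looks only at the extension, not at the audio actually stored inside the file.

```python
from pathlib import Path

# The four formats listed above for Word for the web's Transcribe feature.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".m4a"}


def is_supported(filename):
    """Return True if the file extension is one of the accepted formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["ride-home.mp3", "screencast.MP4", "lecture.ogg"]:
        status = "ok" if is_supported(name) else "convert first"
        print(f"{name}: {status}")
```

Anything flagged as “convert first” would need to be converted to one of the accepted formats before you upload it.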

Some podcasters make it easy to download the file. Their embedded web players have a dedicated download button.

LibSyn player menu bar with the Download option.

Other podcast providers or media players don’t offer the same option. For the Techmeme file, I downloaded the podcast from Pocket Casts Plus. This is a premium feature.

Transcribing the Audio File

This method requires that you have a source file in one of the accepted audio file formats and a Microsoft 365 subscription. The amount of time Microsoft takes to transcribe your audio file depends on the file size.

  1. Go to www.office.com and log in to your account.
  2. Click the Word icon.
  3. From the Dictate menu, select Transcribe.
  4. In the Transcribe pane, click the blue Upload audio button.
  5. Navigate to your audio file and click Open.
  6. Leave the application open while the file is processing.

After the transcription is done, the Transcribe pane on the right side shows a time-stamped version of the transcript.

The Transcribe pane includes:

  1. Audio file name and link
  2. Time bar and playback controls, including variable speed options
  3. Transcript sections with a timestamp, speaker label, and text snippet
  4. A large editor area where you can add transcript sections

Word for the web Transcribe pane and editor with callout options.

Making Edits

Once the transcript appears, you’re ready to make edits. If you want to skip editing, you can click the Add all to document button. That will carry over everything except the timestamps. Most likely, you’ll need to edit.

When you hover over a segment, the background color changes to white, and it becomes the active transcript section. You can click the timestamp to the left of the speaker label to replay the section’s audio. The top playback controls [A] allow you to adjust the speed and direction. You can either edit the section by clicking the pencil icon [B] or add it to the Word editor by clicking the plus (+) button [C].

Editing a section of the Techmeme podcast transcription.

If your audio has multiple speakers, you should identify them first and change their labels. Microsoft allows you to make global changes, such as changing every “Speaker 1” to “Tony.” This makes the transcript much easier to follow.

To Edit a Section

  1. Hover over the section you wish to edit.
  2. Click the pencil icon. An outline appears around the editable areas.
  3. Make your changes.
  4. Click the checkmark in the lower right to save your changes.

The checkmark confirms your edits to the transcript section.

One convenience Microsoft offers is that you can add transcript sections selectively. Instead of clicking the Add all to document button, you click within the Word document where you’d like the segment to appear and then click the button that appears in the top right corner. This is helpful if you wish to omit sections, such as advertisements, or put a section in a different order.

Troubleshooting

Here are some items I encountered in testing:

Too many speakers – This would happen if a segment was recorded elsewhere and added in, such as an intro track. Another cause was the advertisements.

Quotations – The system would not properly identify when someone was quoting. A speaker would use “quote” and “end quote.” The transcript would show those same words minus the quotation marks. This problem is best solved by using the search and replace feature in Word.
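If you have many spoken quotations, you could also clean them up outside of Word after copying the transcript text. The following is a rough sketch, assuming the speaker consistently says “quote” before and “end quote” after each quotation. The function name restore_quotes is my own, and the pattern will misfire if someone uses the word “quote” in its ordinary sense and an “end quote” follows later, so review the result.

```python
import re

# Assumption: each quotation is spoken as "quote ... end quote".
# The lazy group stops at the first "end quote" that follows.
SPOKEN_QUOTE = re.compile(
    r"\bquote\b[\s,]*(.+?)[\s,]*\bend quote\b",
    re.IGNORECASE | re.DOTALL,
)


def restore_quotes(text):
    """Replace spoken 'quote ... end quote' markers with quotation marks."""
    return SPOKEN_QUOTE.sub(r'"\1"', text)


if __name__ == "__main__":
    sample = "The CEO said quote we are ready to ship end quote and left."
    print(restore_quotes(sample))
```

Text with no matching “end quote” is left unchanged, which makes the sketch reasonably safe to run over a whole transcript.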

Strange paragraph breaks – This is an annoyance, and I’m not sure there is anything you can do to fix it unless you’re the speaker. I think this is a byproduct of Microsoft breaking the segments into 20-30 second timeframes. Here’s an example.

Example of a segment with odd transcript breaks that cause extra paragraphs.

In contrast, Sonix.AI used a longer time window. Interestingly, it used the British English spelling “colour.”

Sonix.AI transcription showing longer segments.

I edited the text, but I still hear the original audio – Any text edits you make do not alter the underlying audio. If you play back the transcript, you’ll hear the original recording.

Final Thoughts

After going through this testing, I’m reminded how tough it can be to transcribe an audio file to text. The issues I encountered weren’t unique to Microsoft. Both Trint and Sonix.AI had their issues as well. I will continue to use the service for personal needs. It’s a nice feature that will improve. However, it’s still very new and not why we bought Microsoft Word.

The paid services have an edge in this area. They were designed from the ground up to do audio transcription. Their feature set is geared toward transcribing and allows you to build custom dictionaries or edit the audio file. The good news is there are plenty of options for us. Both paid services offered trials so that you can test their effectiveness.

Shortly after I wrote this article, David Rohrer did a comprehensive review of more transcription services. He used his own podcast, The Business of Digital, for the test. It’s a good read and points out other issues to consider.