One problem people face is sharing sensitive information properly. The information can be anything from contracts and technical specifications to resumes. A problem arises when you want to show some of the information, but not all of it, for confidentiality or competitive reasons. In other words, you need to hide or remove text from a document. There are right and wrong ways to do this redaction, or “blacklining,” in Microsoft Word. (Updated to reflect plugin changes.)
Individuals vs. Companies
For starters, this tutorial is geared toward individuals. If you’re an enterprise user of Microsoft 365, you have more resources. Specifically, you have access to the Microsoft 365 compliance center. Within the center, there is an eDiscovery feature that allows for redactions.
Redaction may not be a household term, but it is often used in the legal community. It means removing confidential or sensitive data before giving the document to others. This is different from removing hidden metadata, because the reader can see that information has been covered up. With the advent of privacy rules and the rise of file sharing, we should all know how to do proper redactions. Some people also refer to this process as sanitizing.
Common Word Redaction Methods and Issues
A popular way to redact information is to use Microsoft Word’s highlight tool and cover up the confidential text with black. This works fine if you’re printing or faxing the document to someone else. However, it doesn’t work if you email or upload the file, since the recipient can easily undo the highlighting and read the text. So while this is a simple method, it’s not secure and should be avoided.
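To make the weakness concrete, here is a minimal Python sketch. It assumes the file is a .docx, which is a ZIP package whose main text lives in word/document.xml, and that the highlight was applied as a black highlight on the run. The function names and the sample data are my own inventions for illustration. The sketch simply lists the text of every run that carries a black highlight, which is exactly the text the author meant to hide.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def black_highlighted_text(docx_bytes):
    """Return the text of every run that carries a black highlight."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    found = []
    for run in root.iter(f"{{{W}}}r"):
        highlight = run.find("w:rPr/w:highlight", NS)
        if highlight is not None and highlight.get(f"{{{W}}}val") == "black":
            # The "hidden" characters are still stored as ordinary text.
            found.append("".join(t.text or "" for t in run.findall("w:t", NS)))
    return found


def make_sample_docx():
    """Build a tiny in-memory package with one highlighted run (made-up data)."""
    xml = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:r><w:t xml:space="preserve">Salary: </w:t></w:r>'
        '<w:r><w:rPr><w:highlight w:val="black"/></w:rPr>'
        '<w:t>$98,000</w:t></w:r>'
        '</w:p></w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    print(black_highlighted_text(make_sample_docx()))
```

The sample package is stripped down to the one part the sketch reads, so Word itself would not open it. A real document has more parts and may apply the black color in other ways, such as paragraph shading, which this sketch does not look for.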
Next, people converted the same Microsoft Word document to Acrobat PDF. When the recipient opened the file, they would see the black highlighted text, which would print the same. On the surface, this looked like an elegant solution since people couldn’t edit the text. But looks can be misleading.
Many authors didn’t realize that if you can select text in Adobe Reader and copy and paste it back to an editor, you can see the contents underneath the black highlighted text. For you skeptics, I’ve attached a sample PDF file so you can try this experiment on your own.
- You can either download the file or open it in your browser.
- Use the Select tool in Adobe Reader or your browser and select all the text.
- From the Edit menu, select Copy.
- Open an editor such as Notepad or VS Code.
- Paste the text into a new document.
You should be able to see the contents behind the black highlighted text. Don’t worry; I created fictitious data so that you could see the problem.
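The same experiment can be sketched in code. This is a toy Python illustration, not a PDF parser. It assumes the page's text is stored as simple literal strings shown with the Tj operator, inside a content stream that is either uncompressed or Flate-compressed. Real PDFs often use font encodings and other text operators that this sketch ignores, so a proper PDF library is the better choice for real files. The function names and the sample fragment are made up. The point is only that drawing a black rectangle over text adds a shape and leaves the text in the file.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" (the object's content)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)
# A literal string followed by the Tj (show text) operator
TEXT_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def shown_strings(pdf_bytes):
    """Collect literal strings shown with Tj in each content stream."""
    pieces = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            data = zlib.decompress(data)  # Flate-compressed stream
        except zlib.error:
            pass  # assume the stream was stored uncompressed
        for raw in TEXT_RE.findall(data):
            pieces.append(raw.decode("latin-1"))
    return pieces


def make_sample_fragment():
    """Build a made-up PDF object: two strings, then a black box on top."""
    content = (
        b"BT /F1 12 Tf 72 700 Td (Account: ) Tj (1234-5678) Tj ET\n"
        b"0 g 128 696 70 14 re f\n"  # black filled rectangle over the number
    )
    packed = zlib.compress(content)
    header = (
        b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    )
    return header + packed + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    print(shown_strings(make_sample_fragment()))
```

Both strings come back, including the one that sits under the black rectangle.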
Better Solutions to Sanitize PDF Documents
There are several ways to solve this issue, as the legal and intelligence communities know. One solution is to use a 3rd party utility from Appligent that properly handles redaction. The company offers several flavors of the tool, but it may be too expensive if you only have an occasional need, since the starting price is about $250. You also need to convert your Word document to a PDF file first.
Another option is to save your Word document as a PDF file and bring it into an image program like SnagIt. Using SnagIt, I applied a blur effect to the portions I wanted to redact.
This solution isn’t perfect, though. With the sample PDF I referenced earlier, you can grab all the text in your browser. With the SnagIt version, I couldn’t do this. Instead, I had to open the PDF version in Microsoft Word. I could edit the text, but there were some character recognition issues. My blurred text, however, came in as an image object.
As more machine learning systems come online, blurring portions of a document may not be effective. This is particularly true for form fields. LifeHacker had an interesting article on this.
Microsoft Word AddOns & 3rd Party Services
A free, open-source add-in that works with Microsoft Word is a more practical solution for most of us. The redaction program allows you to create a redacted document to send to others. The redacted text stays hidden even if you convert the document to PDF. This makes it easy to send documents to others without the fear that something confidential remains.
For those who remember my original article, Microsoft created and supported the tool. They have since retired the program and made it open source. This means others can improve the code and contribute. However, since it’s no longer an official program, Microsoft does not support it. Instead, it sits in their CodePlex archive. The other big “gotcha” is that it no longer has an installer file, so you must compile your own build.
Please note this program was archived in July 2021.
How to Redact Text
The screenshots are from my original article, which used Microsoft Word 2003. However, I have seen reports that the compiled code in the CodePlex archive works on Word 2016.
The download installs a small floating toolbar to Microsoft Word. To redact text, highlight the words and click the Mark button.
Your redacted text displays in 25% gray shading. This may be an issue if you have other document parts using the same shading and color percentage.
As you might guess from the screen captures, the toolbar is easy to use. There are a few caveats. You can use the tool on most parts of a Word document, but there are some exceptions. Specifically, the Redaction Add-in does not support the redaction of:
- Content in textboxes or frames
Once you’ve marked all the text to redact, click Redact Document. This creates another Word document. You also have the option of protecting the document.
If you were to copy the text or convert the document to an Adobe PDF file and perform the above-mentioned trick, you wouldn’t see your redacted text. Instead, if you look at the contents in a program editor in HEX mode, you’ll see pipe signs in place of the hidden text. And no, there isn’t a 1 to 1 correlation between letters and the pipe sign.
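To illustrate why the lack of a 1 to 1 correlation matters, here is a small Python sketch of the general idea. This is not the add-in's actual code, and I am not claiming the add-in uses a fixed width. The function names, the filler width of six, and the sample sentence are assumptions of mine. Each marked span is removed and replaced with the same number of pipe signs, so the output contains neither the original characters nor a hint of how long they were.

```python
def span_of(text, word):
    """Return the (start, end) position of the first occurrence of word."""
    start = text.index(word)
    return start, start + len(word)


def redact(text, spans, filler="|", width=6):
    """Replace each (start, end) span with a fixed-width run of filler."""
    out = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor or start >= end or end > len(text):
            raise ValueError("spans must be non-empty, in range, and not overlap")
        out.append(text[cursor:start])  # keep the text before the span
        out.append(filler * width)      # same width no matter what was removed
        cursor = end
    out.append(text[cursor:])           # keep whatever follows the last span
    return "".join(out)


if __name__ == "__main__":
    sentence = "Pay Alice $98,000 by Friday."
    marked = [span_of(sentence, "Alice"), span_of(sentence, "$98,000")]
    print(redact(sentence, marked))
```

A five-letter name and a seven-character amount both come out as six pipes, so a reader cannot count the pipes to narrow down what was removed.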
In its safeguards, Microsoft recommends using caution when redacting single words. The concern is based on reports that people can use dictionary-type tools and font knowledge to predict the missing word. John Markoff’s article on this subject, titled Illuminating Blacked-Out Words, appeared in the New York Times in May of 2004.
I found the Microsoft Word add-in very easy to use. The primary limitation is you’ll have to have someone compile the code. I have seen references to PDF programs that can adequately do redaction as well as items in the Microsoft Store. However, I have no experience with these tools, so I can’t offer an opinion.
A reader pointed me to a 3rd party paid solution called Redact Assistant from the Payne Group. From the product spec sheet, it looks to work with both Microsoft Word and Microsoft Excel. I contacted the company to get a trial, but haven’t heard back. I’m also confused about their pricing because, in one section, the text reads, “For less than 20, you can purchase the retail version online.” However, if you were to click the Buy Now button, the price that shows is $120.
This may be one case where I think we’ve taken a step backward when it comes to functionality. It’s almost as if only enterprises and people willing to pay for 3rd party services have access to Word redaction tools. I can’t say I redacted text a lot, but it was nice to have that ability when needed.