One problem people face is the proper sharing of sensitive information. The information can be anything from contracts and technical specifications to resumes. A problem arises when you want to show the information, but not all of it, for confidentiality or competitive reasons. In other words, you need to hide or remove text from a document. There are right and wrong ways to do this redaction or “blacklining” in Microsoft Word. (Updated to reflect plugin changes.)
Redaction may not be a household term, but it is often used in the legal community. It’s the practice of removing confidential or sensitive data before giving the document to others. This is different from removing hidden metadata, since the reader can see that information has been covered up. With the advent of privacy rules and the rise of file sharing, the rest of us should be aware of how to do proper redactions. Some people also refer to this process as sanitizing.
Common Word Redaction Methods and Issues
A popular way that people redact information is to use Microsoft Word’s highlight tool and cover up the confidential text with black. This works fine if you’re printing the document or faxing it to someone else. It doesn’t work well if you email or upload the file, since the recipient can easily undo the highlighting. While this is a simple method, it’s not secure and should be avoided.
Next, people started converting the same Microsoft Word document to an Acrobat PDF. When the recipient opened the file, they would see the black highlighted text, and it would print the same way. On the surface, this looked like an elegant solution since people couldn’t edit the text. But looks can be misleading.
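To see why black highlighting is only cosmetic, it helps to look at how a .docx file stores it. The Python sketch below is illustrative only: it builds a tiny in-memory package containing just word/document.xml (not a complete, openable Word file), with made-up text and one run marked with a black highlight. The function names are my own assumptions, not part of any Word tool. It then reads the text back out the way any script could.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, minimal document.xml: one plain run and one run that is
# "hidden" with black highlighting. The text is fictitious.
DOCUMENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">Salary: </w:t></w:r>
      <w:r>
        <w:rPr><w:highlight w:val="black"/></w:rPr>
        <w:t>98,500</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>"""


def build_package(xml_text: str) -> bytes:
    """Zip the XML under word/document.xml (a stand-in for a real .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml_text)
    return buf.getvalue()


def read_document(package: bytes) -> ET.Element:
    """Open the zip and parse word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return ET.fromstring(zf.read("word/document.xml"))


def extract_text(package: bytes) -> str:
    """Join every text node, ignoring formatting such as highlighting."""
    root = read_document(package)
    return "".join(t.text or "" for t in root.iter(f"{{{W_NS}}}t"))


def highlighted_text(package: bytes) -> list:
    """Return the text of runs whose highlight color is black."""
    root = read_document(package)
    found = []
    for run in root.iter(f"{{{W_NS}}}r"):
        mark = run.find(f"{{{W_NS}}}rPr/{{{W_NS}}}highlight")
        if mark is not None and mark.get(f"{{{W_NS}}}val") == "black":
            found.append("".join(t.text or "" for t in run.iter(f"{{{W_NS}}}t")))
    return found


if __name__ == "__main__":
    pkg = build_package(DOCUMENT_XML)
    print(extract_text(pkg))      # Salary: 98,500
    print(highlighted_text(pkg))  # ['98,500']
```

The highlight is just a formatting property on the run. The characters themselves are still stored in the file, which is why removing the highlight, or reading the file directly, brings them back.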
What many authors didn’t realize is that if you can select text in Adobe Reader and copy and paste it back to an editor, you can see the contents underneath the black highlighted text. For you skeptics, I’ve attached a sample PDF file so you can try this experiment on your own.
- Go to https://www.timeatlas.com/tutorial/RedactWordTest.pdf
- You can either download the file or open it in your browser.
- Use the Select tool in Adobe Reader or your browser and select all the text.
- From the Edit menu, select Copy.
- Open an editor such as Notepad or VS Code.
- Paste the text into a new document.
You should be able to see the contents that were behind the black highlighted text. Don’t worry; I created fictitious data just so you could see the problem.
Better Solutions to Sanitize PDF Documents
There are several ways to solve this issue, as the legal and intelligence communities know. One solution is to use a third-party utility from Appligent that properly handles redaction. The company offers several flavors of the tool, but it may be too expensive if you only have an occasional need, since the starting price is about $250. You also need to convert your Word document to a PDF file.
Another option is to save your Word document as a PDF file and then bring it into an image program like SnagIt. Using SnagIt, I applied a blur effect to the portions I wanted to redact.
This solution isn’t perfect though. For example, in the PDF I referenced in Step 1, you can grab all the text in your browser. With the SnagIt version, I couldn’t do this. Instead, I had to open the PDF version in Microsoft Word. I could then edit the text but there were some character recognition issues. However, my blurred text came in as an image object.
Microsoft Word AddOns & 3rd Party Services
A more practical solution for most of us is a free, open-source add-in that works with Microsoft Word. The redaction program allows you to create a redacted document that you can send to others. The redacted text stays hidden even if you convert it to a PDF file. This makes it easy to send documents to others without the fear and anxiety that something confidential remains.
For those of you who remember my original article, the tool was created and supported by Microsoft. However, they retired the program and made it open source. This means others can improve the code and contribute. However, since it’s no longer an official program, Microsoft does not support it. It sits in their CodePlex archive. The other big “gotcha” is that it no longer has an installer file, so you will have to compile your own build.
How to Redact Text
The screenshots here are from my original article, which used Microsoft Word 2003. I have seen reports that the compiled code in the CodePlex archive works on Word 2016.
The download installs a small floating toolbar to Microsoft Word. To redact text, you simply highlight the words and click the Mark button.
Your redacted text displays in 25% gray shading. This may be an issue if you have other parts of your document using the same shading and color percentage.
As you might guess from the screen captures, the toolbar is very easy to use. There are a few caveats. You can use the tool on most parts of a Word document, but there are some exceptions. Specifically, the Redaction Add-in does not support redaction of:
- Content in textboxes or frames
Once you’ve highlighted all the text to redact, you click Redact Document. This creates another Word document. You also have the option of protecting the document.
Now if you were to copy the text or convert the document to an Adobe PDF file and perform the trick mentioned above, you wouldn’t see your redacted text. If you look at the contents in a hex editor, you’ll see pipe signs in place of the hidden text. And no, there isn’t a one-to-one correlation between letters and pipe signs.
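The idea of swapping the hidden text for filler that doesn’t match the original letter count can be sketched in a few lines of Python. This is not the add-in’s actual code. It is a minimal illustration under my own assumptions: the function name, the five-pipe filler length, and the sample text are all made up.

```python
from typing import List, Tuple

# Fixed-length filler. The length is an arbitrary assumption; the point is
# that it does not depend on how many characters were removed.
FILLER = "|" * 5


def redact(text: str, spans: List[Tuple[int, int]]) -> str:
    """Replace each (start, end) span with FILLER and drop the original characters."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor or end < start or end > len(text):
            raise ValueError("spans must be in range and must not overlap")
        pieces.append(text[cursor:start])  # keep the text before the span
        pieces.append(FILLER)              # the marked text itself is discarded
        cursor = end
    pieces.append(text[cursor:])           # keep whatever follows the last span
    return "".join(pieces)


if __name__ == "__main__":
    sample = "Pay Jane Doe 98,500 today"  # fictitious data
    name_at = sample.index("Jane Doe")
    amount_at = sample.index("98,500")
    marked = [(name_at, name_at + len("Jane Doe")),
              (amount_at, amount_at + len("98,500"))]
    print(redact(sample, marked))  # Pay ||||| ||||| today
```

Because the original characters are removed rather than covered, there is nothing left to copy out, and the filler gives no hint about how long the missing text was.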
One item that Microsoft references in their safeguards is to use caution when redacting single words. The concern is based on reports that people can use dictionary-type tools along with knowledge of your font to predict the missing word. An interesting article on this subject by John Markoff, titled Illuminating Blacked-Out Words, appeared in the New York Times in May of 2004.
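To give a rough feel for how such a guess might work, here is a toy Python sketch. Everything in it is an assumption for illustration: the character widths are invented rather than taken from a real font, and the word list and function names are hypothetical. It is not the method described in the article. It only shows that the width of a blacked-out bar can narrow down which words fit.

```python
from typing import Iterable, List


def char_width(ch: str) -> int:
    """Made-up advance width for a proportional font, in arbitrary units."""
    lower = ch.lower()
    if lower in "ijlt":
        return 3   # narrow letters
    if lower in "mw":
        return 9   # wide letters
    return 6       # everything else


def word_width(word: str) -> int:
    """Total width of a word under the made-up width table."""
    return sum(char_width(ch) for ch in word)


def candidates(words: Iterable[str], bar_width: int, tolerance: int = 1) -> List[str]:
    """Keep the words whose width is within tolerance of the redaction bar."""
    return [w for w in words if abs(word_width(w) - bar_width) <= tolerance]


if __name__ == "__main__":
    surnames = ["Smith", "Jones", "Williams", "Lee", "Miller"]  # hypothetical list
    print(candidates(surnames, bar_width=27))  # ['Smith', 'Jones']
```

Even this crude version cuts five candidates down to two. With context from the surrounding sentence, a single redacted word can be easier to recover than you might expect.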
I found the Microsoft Word add-in very easy to use. The primary limitation is you’ll have to have someone compile the code. I have seen references to PDF programs that can properly do redaction as well as items in the Microsoft Store. However, I have no experience with these tools so can’t offer an opinion.
A reader pointed me to a 3rd party paid solution called Redact Assistant from the Payne Group. From the product spec sheet, it looks to work with both Microsoft Word and Microsoft Excel. I reached out to the company to get a trial, but haven’t heard back. I’m also confused about their pricing because in one section the text reads “For less than 20, you can purchase the retail version online”. However, if you were to click the Buy Now button, the price that shows is $120.