Many Ways to Convert Word to HTML

Many people would agree that Microsoft Word is a versatile program. The problem is that it may not be the best tool for converting Word documents to HTML. One reason is that Microsoft adds extra code that makes it easier to switch from one document format to another. The result is larger files and code that may cause issues. But there are free and paid alternatives.

Like many people, I use Microsoft Word to write content. I sometimes write my articles in Word and then move them into my content management system (CMS). I would prefer not to rewrite the article in another application, nor do I want to write in my CMS editor since it has limited functionality. It can be a trade-off between convenience, functionality, and file size.

Document Complexity

When I first wrote this article in 2008, the conversion options were much different. For example, Gmail had an option to view an attachment as HTML. There were also a couple of commercial vendors who have since disappeared. I’ve updated the article to reflect those changes. The main considerations I see are:

  • How complicated is your Word document?
  • Does your Word document have images?
  • Are you comfortable uploading a file to a third-party service?
  • Where will the HTML document be seen?
  • Will the document be viewed on a mobile phone?
  • Are you willing to pay for conversion?
  • How frequently do you need to convert Word documents?

Microsoft Word – Web Page

You might think the best and most convenient way to get your Word document to HTML is to choose Web Page in the Save as type list. Then you could upload the saved HTML file to a web server. There are two issues you should review with this file type.

The first issue is that the Web Page format adds the information from the File Properties dialog, along with other descriptive information, to the top of the document. These data elements include author, last author, company, document stats, and so on. You can see some of these elements in the image below.

Converted Word doc in HTML editor.
Extra info Word adds to source code

The Web Page version is probably fine for company intranets, where users aren’t as concerned about privacy. After all, a co-worker could see some of this information if you emailed them the Word file. In contrast, I wouldn’t use this format to post your resume on the web, especially if you wrote it on a company PC that shows the organization’s name.
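If you’re curious what a saved Web Page file gives away, you can scan its source yourself. Here is a minimal Python sketch, assuming the properties sit in an <o:DocumentProperties> block with child tags such as <o:Author> and <o:Company>. The function names are my own illustrations, and your version of Word may write the block differently.

```python
import re

# Assumption: Word's "Web Page" output keeps the file properties inside an
# <o:DocumentProperties> element, e.g. <o:Author>...</o:Author>.
PROPS_BLOCK = re.compile(
    r"<o:DocumentProperties>(.*?)</o:DocumentProperties>",
    re.DOTALL | re.IGNORECASE,
)
PROP = re.compile(r"<o:(\w+)>(.*?)</o:\1>", re.DOTALL | re.IGNORECASE)


def list_document_properties(html):
    """Return a dict of property names and values found in the HTML source."""
    found = {}
    for block in PROPS_BLOCK.findall(html):
        for name, value in PROP.findall(block):
            found[name] = value.strip()
    return found


def strip_document_properties(html):
    """Remove every <o:DocumentProperties> block from the HTML source."""
    return PROPS_BLOCK.sub("", html)
```

Running list_document_properties on the file’s text before you publish shows what a visitor could read with View Source. A regular expression is only a rough tool for HTML, so treat the result as a quick check rather than a guarantee.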

The second issue is that this HTML format adds extra tags to the file. One function of these tags is to preserve your Microsoft Word style information. The tags also make it easier to go from one file type to another, for example from .HTML to .RTF or back to .DOCX.

converted Word file with extra tags
HTML file with extra info

This extra code increases the size of your web page. This may not sound like an issue, but it can be, depending on the size of your document. The extra code may also cause rendering issues on mobile devices.

Another drawback of this extra code shows up if you need to edit the HTML file. Most HTML documents rely on a separate CSS file that controls styling. With converted documents, this styling is done inline. Depending on how the original Microsoft Word document was styled, you might have to make changes to every paragraph or span. With a CSS file, you’d probably make one change.
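To give a feel for the inline styling, here is a rough Python sketch that drops Word-specific declarations from style attributes. It assumes those declarations start with "mso-", and the function name is hypothetical. It isn’t how Word or any of the services below work internally.

```python
import re

# Matches a style attribute quoted with either single or double quotes.
STYLE_ATTR = re.compile(r"""\s+style=(["'])(.*?)\1""", re.DOTALL | re.IGNORECASE)


def drop_mso_declarations(html):
    """Remove declarations that start with 'mso-' from inline style attributes."""

    def clean(match):
        quote, body = match.group(1), match.group(2)
        declarations = [d.strip() for d in body.split(";") if d.strip()]
        kept = [d for d in declarations if not d.lower().startswith("mso-")]
        if not kept:
            # Nothing left, so drop the whole attribute.
            return ""
        return " style=" + quote + "; ".join(kept) + quote

    return STYLE_ATTR.sub(clean, html)
```

The sketch splits on semicolons, so a declaration that contains a semicolon inside a quoted value or a url() would trip it up. Even after this cleanup, the remaining styles are still inline, which is why editing stays tedious.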

Microsoft Word – Filtered Web Page

Microsoft has another HTML file format called Web Page, Filtered. This file type strips most of the document information and cuts the number of style codes. Although smaller, the resulting file still contains numerous style references.

In my small test page, the size was cut from 9.82K to 4.03K with this format. Much of the savings in this example was from the removal of the document information. In my first file, the heading tag for Example 1 was on line 175. In the Web Page, filtered format, the same heading is at line 58.

Word HTML filtered doc.
Converted Word doc using Web Page, filtered
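For a sense of scale, the drop from 9.82K to 4.03K in my test page works out to roughly 59 percent. A small Python sketch of that arithmetic (the function name is just illustrative):

```python
def percent_saved(original_kb, filtered_kb):
    """Return how much smaller the filtered file is, as a percentage."""
    return (original_kb - filtered_kb) / original_kb * 100


saving = percent_saved(9.82, 4.03)
print(f"{saving:.0f}% smaller")  # prints "59% smaller"
```

Your own savings will depend on how much styling and document information the original file carries.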

The bottom line is the Microsoft Word conversion options are free and offer you convenience. The downside is you may reveal too much info in your document, and future HTML editing may be tougher with all the extra info.

Content Management Systems (CMS)

Many content management systems promise that it’s easy to create content. Ideally, you write your article in their HTML editor. I’ve yet to find an editor that gives me the functionality or space I need, which is why I sometimes write in Microsoft Word.

Some CMS editors provide a Paste from Word toolbar button.

Paste Word toolbar button.
Example of a CMS with Paste Word

These utilities will remove some tags, but not all. Depending on your document, there might be lots of these tags left. Some systems also offer a Paste as Text button, which removes all formatting. It works well for simple documents, but if you have any formatting for lists, tables, paragraphs, and so on, you might spend more time reapplying the formatting.
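To see why Paste as Text costs you the formatting, here is a minimal Python sketch that mimics the idea with the standard library’s HTMLParser. This is my own illustration with hypothetical names, not the code any CMS actually uses.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def paste_as_text(html):
    """Return the text content of an HTML fragment with every tag removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Feed it a two-item list such as <ul><li>One</li><li>Two</li></ul> and you get back "OneTwo". The list structure is gone, and that is the formatting you would have to reapply by hand.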

Paste & Convert Solutions

Another way to convert Word documents is to use an online service. These are free services that best handle simple text. One advantage is these services are not saving your files, and no uploads are required. The major drawback is images are typically ignored.

TextFixer

One of the simplest converters to use is TextFixer. You copy the Word document and then paste the contents into a textbox. The service does a reasonable job of providing HTML but strips out any images. This also means if you used fancy list bullets or Wingdings, you might see a different character.

If you only have a few images, this service is still a good consideration because it’s pretty easy to add the images back if you know a little HTML.
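Adding an image back comes down to one img tag per picture. As a rough sketch, here is a small Python helper that builds the tag from a URL, a description, and an optional size. The helper name and the sample path are made up for illustration.

```python
from html import escape


def image_tag(url, description, width=None, height=None):
    """Build an <img> tag, escaping the values so they are safe in attributes."""
    attrs = [
        'src="{}"'.format(escape(url, quote=True)),
        'alt="{}"'.format(escape(description, quote=True)),
    ]
    if width is not None:
        attrs.append('width="{}"'.format(int(width)))
    if height is not None:
        attrs.append('height="{}"'.format(int(height)))
    return "<img " + " ".join(attrs) + ">"


print(image_tag("images/chart.png", "Sales chart", 400, 300))
# prints: <img src="images/chart.png" alt="Sales chart" width="400" height="300">
```

You would paste the resulting tag into the converted HTML wherever the picture appeared in the Word document.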

This site also has a number of other free text tools and tutorials that might interest you, such as converting HTML to text or text sorting.

Word2CleanHTML

Word2CleanHTML provides a similar free conversion service. Like TextFixer, you copy and paste your Word document into a textbox. The advantage is they provide several checkboxes for additional filtering, such as removing blank paragraph tags or converting “smart quotes.”

They also provide tabs so you can compare the “Original HTML,” “Clean HTML,” and “Preview.” These tabs were useful as they helped me spot a blank table in my original document. The main drawback is they don’t handle images, but as I said above, those are easy to add back in if you don’t have too many and know a bit of HTML.
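The two filters I mentioned, removing blank paragraph tags and converting smart quotes, are simple enough to sketch in Python. This is only my approximation of what such checkboxes might do. The function and option names are hypothetical and are not taken from Word2CleanHTML.

```python
import re

# Curly quotes mapped to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
})

# A paragraph that holds only whitespace or non-breaking space entities.
EMPTY_PARAGRAPH = re.compile(r"<p(?:\s[^>]*)?>(?:\s|&nbsp;)*</p>", re.IGNORECASE)


def clean_html(html, fix_quotes=True, drop_empty_paragraphs=True):
    """Apply the selected filters to an HTML fragment."""
    if drop_empty_paragraphs:
        html = EMPTY_PARAGRAPH.sub("", html)
    if fix_quotes:
        html = html.translate(SMART_QUOTES)
    return html
```

Each keyword argument plays the role of one checkbox, so you can switch a filter off when it does more harm than good. Note that the quote filter touches every curly quote in the fragment, including any inside attribute values.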

Upload File and Convert Solutions

Another group of services allows you to upload the Word .doc or .docx file for conversion. In some cases, the service also converts file types other than Microsoft Word. Typically, these services do a better job of conversion as they retain images. However, some people don’t like uploading files to another site, especially if it’s confidential information that might go on a corporate intranet.

Online-Convert.com

Online-Convert.com is a service that converts many file types, as the image below shows. It’s similar to a service I reviewed some time back called Zamzar. The process is simple in that you choose the end file type you need, such as HTML. You then upload your Word source document. One advantage is you can upload files from a URL, Google Drive, or Dropbox.

Drag and drop interface for converting docs.
Convert different file forms to HTML

Your file will be converted, including any images. The images are converted to PNGs and are included in the zipped file. The resulting file contains more inline styling than the previous solutions, but not nearly as much as Microsoft Word. There are some additional META tags added in the <head> section. If you prefer, you can also opt to get a download link for the converted file sent to an email address.

Word to HTML

This service is the most versatile of the Word to HTML converters. It has plenty of features in the free version, but it also has a paid Pro version that adds much more. Unlike sites that do lots of file conversions, WordtoHTML specializes in HTML. It also offers the most control over CSS and even JavaScript. The free version doesn’t allow file uploads or embedded images. However, for images, you can provide a URL, and it will add the image tags, size, and description.

Word to HTML interface.
Free version of Word to HTML

You’ll also need to copy the code from the HTML editor as the free version doesn’t allow file downloads.

The WordtoHTML Pro version costs $49 a year and includes additional features. If you routinely convert Microsoft Word files or PDF files to HTML, this is a good option as you can customize and save your settings in template files. For example, you may want to include head or meta-information or prefer a certain type of formatting.

This is one of the few services I’ve seen that has PDF to HTML conversions. It worked pretty well, but like complex Word documents, you may see issues. For example, drop caps don’t always come in. And if people have tweaked the letter kerning to get words to fit, you may see errors. The Find and Replace option is also handy if you need to replace character entities.

Batch and Complex Conversions

The other situation some companies run into is how to convert hundreds of Word documents to HTML files. I doubt anyone would want to do these file conversions one document at a time. Instead, they could use another commercial program called DocConverterPro. The program allows you to batch convert .doc, .docx, .rtf and PDF files to HTML or XHTML. This product is the new and improved version of the WordCleaner program I used back in 2008. It’s been rebranded.
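To show what batch conversion involves, here is a rough Python sketch of the loop. It is not DocConverterPro’s interface, which I’m not describing here. The convert argument is a hypothetical function you would supply yourself, one that takes a file path and returns HTML text.

```python
from pathlib import Path

# File types mentioned above as batch conversion sources.
SOURCE_SUFFIXES = {".doc", ".docx", ".rtf", ".pdf"}


def batch_convert(source_dir, output_dir, convert):
    """Convert every supported file in source_dir and write .html files.

    convert is a caller-supplied (hypothetical) function that takes a Path
    and returns the HTML for that document as a string.
    """
    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for source in sorted(source_dir.iterdir()):
        if source.is_file() and source.suffix.lower() in SOURCE_SUFFIXES:
            target = output_dir / (source.stem + ".html")
            target.write_text(convert(source), encoding="utf-8")
            written.append(target)
    return written
```

One thing to watch in a loop like this is that report.doc and report.docx would both be written to report.html, so the second would overwrite the first.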

DocConverterPro does more than batch HTML conversion. I used this software years ago to convert my Word documents for this site. At the time, I wasn’t familiar enough with HTML. One of the nice features is you can create conversion templates. For example, I might have one template for content on this website where I use CSS files to handle the presentation. On another site, I might choose to embed the style information in the file. It’s very flexible and powerful.

DocConverterPro template
You can edit conversion templates

The service also has a Windows program. That was the method I used many years ago, but I think the online version is easier. Pricing varies between the services and options. For example, the online version is $99.