Cleaning up Word HTML

When posting content from Microsoft Word on the web, there are a few things to remember so that you don’t end up having to go back and re-edit your documents.

First and foremost, Word does funky things to formatting, especially when it comes to quotes and fonts. To make your Word files look good, whether you’re posting to your blog or creating content in Collage or even in Dreamweaver, watch out for quotes. Word swaps straight quotes for curly ones, and while these look “good” in Word, if the page can’t handle those characters they come out like this ?quote? and no one likes that. So what’s the best way to fix this problem? I bring you two solutions: one using Dreamweaver, and another that takes a little more work.
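
If you’d rather fix the quotes with a script before pasting, here is a minimal sketch in Python. It isn’t part of either solution below, just an illustration of the idea: swap Word’s curly punctuation for plain keyboard characters. The function name straighten_punctuation and the list of characters are my own choices, so add to the table if your text uses others.

```python
# Map Word's typographic ("smart") punctuation to plain ASCII equivalents.
# The table is an assumption about which characters matter; extend as needed.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}


def straighten_punctuation(text: str) -> str:
    """Return text with typographic punctuation replaced by ASCII."""
    return text.translate(str.maketrans(SMART_PUNCTUATION))


if __name__ == "__main__":
    print(straighten_punctuation("these look \u201cgood\u201d"))
```

Run it on your text first, then paste the result into your editor.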

The easy way first. In Dreamweaver, make a new HTML page and paste your Word content into Design view. It should look exactly like the text in Word. On to the cleanup: under the Commands menu at the top, select the Clean Up Word HTML option. The defaults will work just fine, so hit OK. Dreamweaver will now run through the code, strip out all the unwanted Word formatting, and leave you with code you can paste into any web-based WYSIWYG editor.
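
If you don’t have Dreamweaver handy, the same kind of cleanup can be roughly approximated with a script. The sketch below is not what Dreamweaver actually runs; it is my own guess at the usual Word leftovers (conditional comments, Office tags like <o:p>, Mso classes, inline styles, and span/font wrappers), removed with regular expressions. The function name clean_word_html is made up for this example, and regexes are a blunt tool for HTML, so check the result by eye.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip common Word-specific markup from an HTML fragment (rough sketch)."""
    flags = re.IGNORECASE | re.DOTALL
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=flags)
    # Office namespace tags such as <o:p> and </o:p>
    html = re.sub(r"</?[ovwm]:[^>]*>", "", html, flags=flags)
    # class="MsoNormal" and similar Word classes
    html = re.sub(r'\s+class\s*=\s*("Mso[^"]*"|\'Mso[^\']*\'|Mso\w*)', "", html, flags=flags)
    # Inline style attributes (this removes all of them, not only Word's)
    html = re.sub(r'\s+style\s*=\s*("[^"]*"|\'[^\']*\')', "", html, flags=flags)
    # Unwrap <span> and <font> tags but keep the text inside them
    html = re.sub(r"</?(span|font)\b[^>]*>", "", html, flags=flags)
    return html


if __name__ == "__main__":
    sample = (
        '<p class="MsoNormal" style="margin:0in">'
        "<span style='font-family:Arial'>Hello<o:p></o:p></span></p>"
    )
    print(clean_word_html(sample))
```

Note that it throws away every inline style, so any formatting you wanted to keep has to be reapplied in your web editor.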

Now for the multi-step way. Open Notepad (the easy way is to hit Start > Run and type notepad in the dialog box), then paste the content from Word into Notepad. Because Notepad can’t understand the Word formatting, it just skips it. You will still need to walk through your text to make sure everything is as you want it, since Notepad doesn’t clean it up the way Dreamweaver does. Once you’ve walked through your text in Notepad, you can copy it and paste it into whatever web editor you wish.
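
If you already have the Word HTML sitting in a file, you can get much the same effect as the Notepad trick with a few lines of Python. This is only a sketch of the idea (throw away all the tags, keep the words), using the standard library’s html.parser. The names TextOnly and to_plain_text, and the set of tags treated as line breaks, are my own choices.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text of a document, ignoring all formatting."""

    # Tags that should start a new line in the plain-text output (an assumption).
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose contents are not readable text.
    SKIP_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skipping = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skipping = True
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skipping = False

    def handle_data(self, data):
        if not self._skipping:
            self.parts.append(data)


def to_plain_text(html: str) -> str:
    """Return the text of html with tags removed and blank lines dropped."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(to_plain_text('<p class="MsoNormal"><b>Hello</b> world</p><p>Second</p>'))
```

Just like with Notepad, you still need to read through the result, because bold, links, and lists are all gone.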

Hope that helps rid you of the many headaches that can arise when pasting from Word. Have other tips on cleaning up Word text? Post a comment.

7 Responses to “Cleaning up Word HTML”

  1. David Delgado Says:

    It is amazing how one program can cause so much pain.

    If you are using Word purely for its spellchecking and thesaurus capabilities, and really don’t care about the look, James Falkofske at Metro State has created a utility called ‘HTML_Cleaner’ which will strip out all Microsoft CSS. Look under ‘Conference Materials’ on his site, http://james.metrorichmedia.com.

    One other note: Desire2Learn, when used in Internet Explorer, will try to automatically clean copy-and-pasted HTML from Microsoft Word.

    Cheers.

  2. James Falkofske Says:

    Actually, there is an updated version of my HTML_Cleaner program posted at my website http://www.PedagogyOnline.com under the tab ONLINE TOOLS. Install both the .NET platform from Microsoft and the unzipped package from my website. There is a set of screen-capture illustrated instructions under the HELP menu.

    James

  3. Doe Says:

    Good site! I’ll stay reading! Keep improving!

  4. George Says:

    Good site! I’ll stay reading! Keep improving!

  5. Andrew Plimmer Says:

    Nice stuff, Joel. It seems you have practical knowledge about cleaning up Word HTML. I have tried it myself and your methods worked perfectly.

  6. Haustiere Says:

    Nice post!! You are an expert!! I wish to learn more about SEO tricks.. I love your site!! Thanks for sharing your thoughts!!

  7. Cleaning Tips Says:

    Nice and useful post, thanks, this is one for my bookmarks!
