The best way to de-bastardize Microsoft HTML (MS-HTML), or any other crappy HTML, is to use the wonderful open-source program HTML Tidy, originally written by Dave Raggett at the W3C.
http://www.w3.org/People/Raggett/tidy/
http://tidy.sourceforge.net/
Tidy can now perform wonders on HTML saved from Microsoft Word 2000! Word bulks out HTML files with extra markup for round-tripping presentation between HTML and Word. If you are more concerned about using the HTML on the Web, check out Tidy's "word-2000" config option! Of course, Tidy does a good job on Word 97 files as well!
To use it from the command line, just add --word-2000 yes to your tidy command.
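As a rough sketch of a full invocation: the file names below are made up for illustration, and the guard around the call is only there so the snippet does nothing harmful if tidy is not installed or the input file is missing. The -o option writes the cleaned markup to a separate file, so the original Word export is left untouched.

```shell
#!/bin/sh
# Hypothetical file names, used only for illustration.
input="word-export.html"
output="clean.html"

# --word-2000 yes asks Tidy to strip the Word-specific markup;
# -o sends the cleaned HTML to a separate file instead of stdout.
tidy_cmd="tidy --word-2000 yes -o $output $input"

if command -v tidy >/dev/null 2>&1 && [ -f "$input" ]; then
    # Tidy exits with status 1 when it only issued warnings and 2 on errors,
    # so a non-zero status does not always mean that no output was written.
    $tidy_cmd
else
    echo "would run: $tidy_cmd"
fi
```

If you would rather overwrite the original file, Tidy's -m option modifies the input in place instead of writing to a separate output file.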