HTML Hell

GradyHendrix

New kid, be gentle!
Super Member
Registered
Joined
Nov 16, 2010
Messages
157
Reaction score
15
I'm not a technically adept person by any means, so this isn't entirely unexpected, but I'm in the middle of HTML hell. I've got an 80,000-word document written in MS Word and I'm trying to convert it to HTML and then to an ebook-friendly format like .epub or what-have-you.

But getting it clean in HTML from Word is awful. I've been trying to use an automatic conversion program like Word 2 Clean but I'm finding so many insane errors with indenting that I'm going nuts.

Any suggestions on a better way to go about this, besides, "Don't write in MS Word?"
 

Deleted member 42

Find a Mac-using friend with Pages.

Scrivener for Mac is a lovely tool for making epubs; it's worth contacting the developer or a Windows user and finding out if that's true for Windows.

WordPerfect for Windows does a pretty clean Save As HTML.
 

GradyHendrix

Thanks for the advice. I'm already using Bean for most writing and jEdit for HTML editing and tweaking. My real crisis right now is that I've done everything I know how to do to strip out the MS Word formatting for tab indents and line spacing. I mean... everything! And yet when I transfer the document to HTML and take a look, I still have the occasional bizarre tab indent and, for some even weirder reason, a line space after every single paragraph. That's really what has me crying blood right now.

How to avoid in the future - check!
How to fix what I've got in hand now - confused and frustrated!
 

zpeteman

Natural born...
Super Member
Registered
Joined
Jun 2, 2007
Messages
306
Reaction score
37
Location
Nashville, TN
Website
thefiddlersgun.com
Why not just do a find/replace for all your tabs? Word's find function is very robust and it's pretty simple to remove just about any formatting.
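If you'd rather script the same tab cleanup outside Word, here is a minimal Python sketch. It assumes the manuscript has already been saved as plain text, and the function name `strip_tab_indents` is made up for illustration; it isn't part of any tool mentioned in this thread. It also catches indents where a stray space sits in front of the tab, which is one possible reason a plain search for a tab would miss some of them.

```python
import re

def strip_tab_indents(text: str) -> str:
    """Remove leading tabs/spaces from every line and flatten stray tabs."""
    # With re.MULTILINE, ^ matches at the start of each line, so this
    # deletes any run of spaces and tabs that opens a line.
    no_indents = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
    # Any tab still left sits in the middle of a line; turn it into a space.
    return no_indents.replace("\t", " ")

sample = "\tFirst paragraph.\n \tSecond paragraph.\nThird\tparagraph."
print(strip_tab_indents(sample))
```

Run it on a copy of the file rather than the original, so you can compare the two if something looks off.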

Then copy it into Pages to make the epub (be sure to download the template from Apple). (If you aren't using a Mac then you probably shouldn't be writing in the first place! :) )
 

Deleted member 42

Can you send me the HTML file? (medievalist AT Mac.com)

I'll send you back a cleaned-up file; let me know what in particular you need done. (I suspect you could do this yourself, but I've been hand-coding HTML for a very long time, and I have a fabulous Mac-only text editor called BBEdit.)
 

GradyHendrix

If that's a serious offer, I'm definitely taking you up on it.

Thanks for the advice about find/replace (and I'm using a Mac, thank goodness - didn't know it was such an advantage but I can't imagine this stuff being harder!). I tried it and it worked on a big chunk of tabs but for some reason the first tab will not match up.
 

Deleted member 42

I'm serious.

You might also take a look at TextWrangler for cleaning up text; it doesn't include the HTML tools of its geekier sibling BBEdit Pro, but it is free and very good at cleaning up via search and replace.

http://www.barebones.com/
 

valeriec80

Got the hang of it, here
Super Member
Registered
Joined
Jun 12, 2009
Messages
388
Reaction score
33
I'd suggest learning how to code HTML if you don't already know it. This is a great site: http://htmldog.com.

Really, for a simple document like a book, you don't need to know that much code. Making a paragraph, centering things, italicizing things and putting line breaks in is about the extent of it, unless you need to get super fancy for some reason.
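To give a feel for how little markup that is, here is a rough Python sketch that turns plain text into paragraph tags. It assumes one paragraph per line and input with no HTML in it yet (any tags you had already typed would get escaped); `paragraphs_to_html` is a name invented for this example.

```python
import html

def paragraphs_to_html(text: str) -> str:
    """Wrap each non-empty line of plain text in <p>...</p>."""
    # strip() drops leading tabs and trailing spaces; blank lines are skipped.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    # html.escape turns &, < and > into entities so they display literally.
    return "\n".join(
        f"<p>{html.escape(p, quote=False)}</p>" for p in paragraphs
    )

print(paragraphs_to_html("Chapter one begins.\n\n\tIt was a dark & stormy night."))
```

Italics and centering would still be added by hand afterwards (or with search and replace) using the tags described on the site above.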
 
Joined
Feb 21, 2011
Messages
1,733
Reaction score
197
Location
Amsterdam, the Netherlands
Website
amsterdamassassin.wordpress.com
I replace all em-dashes, ellipses, and single and double quotes with HTML codes, make sure all italics are wrapped in <i></i>, then copy the whole document and paste it into Notepad or similar. That will strip all formatting: tabs, indents, fonts and font sizes. From Notepad I copy it into jEdit. That way, only the HTML formatting you've done prior to rinsing the document through Notepad will have survived.

It sounds like you copied the content of the .doc straight into jEdit, without rinsing out the formatting through Notepad first.
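The character replacement step can also be scripted. The Python sketch below is only an illustration of the idea, not the exact process described above: it assumes the text is plain Unicode, uses numeric character references (for example `&#8212;` for an em-dash) rather than named codes, and the names `TYPOGRAPHIC_CHARS` and `to_numeric_references` are made up for this example.

```python
# Em-dash, ellipsis, and curly single and double quotes.
TYPOGRAPHIC_CHARS = "\u2014\u2026\u2018\u2019\u201c\u201d"

def to_numeric_references(text: str) -> str:
    """Replace each typographic character with its &#NNNN; reference."""
    # ord() gives the character's code point, which is the number the
    # reference needs; every other character passes through unchanged.
    return "".join(
        f"&#{ord(ch)};" if ch in TYPOGRAPHIC_CHARS else ch for ch in text
    )

print(to_numeric_references("\u201cWait\u2026\u201d she said."))
```

Numeric references are used here because they do not depend on the file declaring any named codes, which seemed the safer choice for ebook formats.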
 

GradyHendrix

Yes, yes, yes! Kathleen! Yes! I have been following that site religiously! I actually find it hugely helpful and I'm glad you're seconding the notion that it's actually useful and good. I'm glad someone else is vouching for it - I was worried I might be using it and someone wiser would come in and say, "You're using THAT site? Ha! It's 5 years out of date!"

Thanks for the pointers and offers of assistance, everyone. I'm digging into this again today and taking all this on board and trying my best. Seriously, for someone who is shy about technical things I feel like I'm hacking my way through an unmapped forest, so getting feedback is keeping me from going crazy.