Kindle MOBI: back-compatibility and future-proofing?



Todsplace
05-20-2011, 02:20 PM
Hi guys,

I was thinking about ebook back-compatibility and future-proofing, and was wondering if anyone knew of the perfect setup for handling this?

I'm guessing that the areas of concern here are doctypes (XHTML, HTML), charsets, and entity usage (named, numerical, or direct character injection into something like UTF-8?). What's the best combo of these things?

Some things I've started to realize (and please correct me if I'm wrong, as I'm a newbie):



Numerical entities are supposed to give old kindles some trouble.
UTF-8, in general on the www, does not need entities, and can support direct injection of the character[1 (http://stackoverflow.com/questions/3922342/chartset-utf8-and-character-entities)][2 (http://stackoverflow.com/questions/520236/should-i-still-use-html-entities-why)], but most people who use UTF-8 in ebooks seem to use named entities as well.
I believe earlier Kindles initially had better support for ISO-8859-1 than for UTF-8, which makes me wonder if UTF-8 HTML docs will be okay on ye olde Kindles.
I've read that at one stage, Kindle DTP required people to use named entities over numerical ones; at least, the accounts from people at the DTP forum seem to suggest this[3 (http://forums.kindledirectpublishing.com/kdpforums/message.jspa?messageID=23586#23586)]. However, the current guide on their site does not mention named entities at all[4 (http://kindlegen.s3.amazonaws.com/AmazonKindlePublishingGuidelines.pdf)].
The release notes inside the zip file for the latest version of KindleGen states that Kindlegen has "changed default encoding of the generated books to utf-8."[Link to download page of latest version. (http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000234621)]
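One thing that can be checked mechanically is whether a manuscript even fits in ISO-8859-1. Below is a minimal sketch in Python (not part of KindleGen; the function names are hypothetical) that reports whether the text needs UTF-8 and which characters push it over. Note that curly quotes and em dashes are not in ISO-8859-1 (they live in Windows-1252), so a document declared as ISO-8859-1 has to carry them as entities. This says nothing about what any particular Kindle model will render.

```python
def smallest_charset(text: str) -> str:
    """Return "ISO-8859-1" if every character fits in Latin-1, otherwise "UTF-8"."""
    try:
        text.encode("iso-8859-1")
        return "ISO-8859-1"
    except UnicodeEncodeError:
        return "UTF-8"


def non_latin1_chars(text: str) -> list[str]:
    """Distinct characters ISO-8859-1 cannot represent, in order of first appearance."""
    found = []
    for ch in text:
        # Latin-1 covers exactly the code points U+0000..U+00FF.
        if ord(ch) > 0xFF and ch not in found:
            found.append(ch)
    return found
```

For example, `smallest_charset("Café au lait")` gives `"ISO-8859-1"`, while text containing typographic quotes gives `"UTF-8"`.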

My head is spinning, lol. I'm not a techhead, and I just need a robust doctype/charset/special character handling suggestion--one that will be back/future compatible. I'm currently using XHTML Transitional with an ISO-8859-1 charset and named entities.

Any help would be great :)
Todsplace

davidw
05-20-2011, 02:50 PM
I think named entities are better than numerical ones. I recently had a customer with some Hebrew characters that were not coming through at all, so we had to choose between rendering them as images and simply avoiding them.
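Characters like the Hebrew ones mentioned here have no named HTML 4 entity at all, so "use a named entity" is not an option for them; they need a numeric reference, raw UTF-8, an image, or rewording. A minimal sketch (hypothetical helper, using the HTML 4.01 entity table that ships with Python's standard library) for spotting those characters in a manuscript before conversion:

```python
from html.entities import codepoint2name  # HTML 4.01 named-entity table


def chars_without_named_entity(text: str) -> list[str]:
    """Non-ASCII characters that have no HTML 4.01 named entity, in order of first appearance."""
    missing = []
    for ch in text:
        cp = ord(ch)
        if cp > 127 and cp not in codepoint2name and ch not in missing:
            missing.append(ch)
    return missing
```

Whether a given Kindle's font can actually display the characters this reports is a separate question that only testing on the device can answer.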

I'd like to see some stats on numbers of different kindles out there, I think that'd be useful in deciding what to support and what not to.

Todsplace
05-20-2011, 03:41 PM
^ Yes, Kindle stats would be great. I'm not quite sure how many Kindle users are using an older model.

If I find stats I'll post them here.

Medievalist
05-20-2011, 08:49 PM
1. If there's an HTML entity, use it. HTML entities use a now very reliable standardized set of characters. The named convention is the standard one since 4.01 transitional. Here's the list, based on the standard. (http://www.w3schools.com/tags/ref_entities.asp)

2. If you need Unicode for non-Roman characters, for instance Japanese, or Persian, or another language, go with UTF-8, and make sure you've declared that in your document's charset declaration. Don't use Unicode if there's an entity.

3. One of the problems with Unicode is that you then need to know that the font the reader will be using has Unicode support, and to what extent; this is what makes it tricky.
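Rule 1 above ("if there's an HTML entity, use it") can be applied mechanically. A minimal sketch, assuming the text is plain prose with markup characters (`<`, `>`, `&`) already handled elsewhere; the function name is hypothetical. Characters with an HTML 4.01 named entity get the name, and anything else falls back to a decimal numeric reference:

```python
from html.entities import codepoint2name  # HTML 4.01 named-entity table


def to_entities(text: str) -> str:
    """Replace non-ASCII characters with named entities, or numeric ones if no name exists.

    ASCII is passed through untouched, so <, > and & are NOT escaped here.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # e.g. é -> &eacute;
        else:
            out.append(f"&#{cp};")                  # e.g. Hebrew alef -> &#1488;
    return "".join(out)
```

With this, `to_entities("café")` returns `"caf&eacute;"`, and the output is pure ASCII, so it reads the same whether the file is declared ISO-8859-1 or UTF-8.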

Todsplace
05-21-2011, 01:06 AM
Hi Medievalist,

Thanks for the feedback. You raise some good points. I hadn't actually thought about Unicode support and fonts!

May I ask: do you know if older Kindles will have an issue with XHTML?

Cheers,
Todsplace

Medievalist
05-21-2011, 01:26 AM
I don't.

I will note that 4.01 transitional is a safe bet; if the device supports XHTML, it'll be fine with 4.01.
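Whichever doctype is chosen, the doctype and the charset declaration should agree with each other and with how the file is actually saved. A minimal sketch (hypothetical helper; the doctype strings are the standard W3C public identifiers) that emits a matching head for HTML 4.01 Transitional or XHTML 1.0 Transitional:

```python
import html

# Standard W3C public identifiers for the two transitional doctypes discussed.
DOCTYPES = {
    "html401": ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
                '"http://www.w3.org/TR/html4/loose.dtd">'),
    "xhtml10": ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
                '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'),
}


def document_head(kind: str, charset: str, title: str) -> str:
    """Return doctype, <html> and <head>, declaring the charset in matching syntax."""
    if kind == "html401":
        root = "<html>"
        end = ">"       # HTML 4: empty elements take no closing slash
    elif kind == "xhtml10":
        root = '<html xmlns="http://www.w3.org/1999/xhtml">'
        end = " />"     # XHTML: empty elements must be self-closed
    else:
        raise ValueError(f"unknown document kind: {kind!r}")
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}"{end}'
    return "\n".join([DOCTYPES[kind], root, "<head>", meta,
                      f"<title>{html.escape(title)}</title>", "</head>"])
```

For example, `print(document_head("html401", "ISO-8859-1", "My Book"))` prints the 4.01 Transitional doctype followed by a head declaring ISO-8859-1. The file must still be saved in that same encoding for the declaration to be true.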

Todsplace
05-21-2011, 02:01 AM
Thanks for the answer. Yes, that does sound like a safer bet. May I ask what's your preference for closing tags?

For example, for one blank line in a page, which would you recommend?


<br><br>
<br></br>
<br/><br/>
<br /><br />

I see Amazon mentions the last one in their guide, but is it really semantically sound? The Jabberwocky example in the KindleGen zip file uses a mixture (<br> for individual lines in poetry and <br/><br/><br/> elsewhere).

Medievalist
05-21-2011, 03:12 AM
<br /><br /> is the XHTML/XML form of <br><br>.

If you're using an XHTML declaration, use <br /><br />; if you're using an HTML declaration up to and including 4.x transitional, use <br>, as a single unpaired tag.
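Since the Jabberwocky sample mixes forms, it can help to normalize every line break to the one form the chosen doctype expects. A minimal regex-based sketch (hypothetical helper) that rewrites `<br>`, `<br/>`, `<br />` and the paired `<br></br>` in either direction; it deliberately ignores `<br>` tags carrying attributes, which a real HTML parser would handle better:

```python
import re

# Matches <br>, <br/>, <br />, any case, plus an optional closing </br> right after it.
_BR = re.compile(r"<br\s*/?>(?:\s*</br>)?", re.IGNORECASE)


def normalize_br(markup: str, xhtml: bool) -> str:
    """Rewrite every bare line-break tag as <br /> (XHTML) or <br> (HTML 4.x)."""
    return _BR.sub("<br />" if xhtml else "<br>", markup)
```

For example, `normalize_br("a<br/>b<br></br>c", xhtml=False)` returns `"a<br>b<br>c"`.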

Todsplace
05-21-2011, 03:49 AM
Thanks Medievalist :)

I take it that you use the ISO-8859-1 charset for your work?

Medievalist
05-21-2011, 03:51 AM
:D

I'm a sort of specialist in terms of character sets, digital text, and standards.

So I do a lot of consulting about how to present text correctly, particularly in the context of scholarly writing.

Todsplace
05-21-2011, 04:06 AM
Ah, so I'm guessing you've explored a lot of the different types? :D

At this point I'm tossing up between ISO-8859-1 and UTF-8. But after thinking about it, and considering some of your points, I'm most likely going with HTML 4 instead of XHTML.

Todsplace
05-21-2011, 04:09 AM
By the way, do you know if <a id=""></a> is okay with older kindles?
I stopped using <a name=""></a> in favour of the id system, which I'm guessing will probably be more future compatible due to the general shift from names to id values.

Medievalist
05-21-2011, 04:13 AM
I'd go with ID, personally, but this is six of one, half a dozen of the other.
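If the switch from `name` to `id` is made, a quick audit of the markup shows which anchors still rely on `name` alone and whether any target is defined twice (which makes links ambiguous). A minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and it checks only the markup, not how any Kindle model resolves the links:

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect link targets declared on <a> tags via id="..." or name="..."."""

    def __init__(self):
        super().__init__()
        self.targets = []     # one entry per <a> that declares a target
        self.name_only = []   # targets still declared with name= but no id=

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # An <a> carrying both id="x" and name="x" defines one target, not two.
        target = attrs.get("id") or attrs.get("name")
        if target:
            self.targets.append(target)
            if not attrs.get("id"):
                self.name_only.append(target)


def audit_anchors(markup: str) -> dict:
    parser = AnchorCollector()
    parser.feed(markup)
    parser.close()
    dupes = sorted(t for t, n in Counter(parser.targets).items() if n > 1)
    return {"name_only": parser.name_only, "duplicates": dupes}
```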

What I would concentrate on is keeping a live version of the document you use to generate the ebook, in whatever format.

You will, for one reason or another, have to re-generate your ebooks unless they completely fail commercial viability.

Todsplace
05-21-2011, 04:35 AM
Could you clarify what you mean by live version? Just a stable version?

And as for re-generation, do you mean newer versions of released titles for Amazon's Kindle DTP as their software changes and as newer Kindles come out?

Medievalist
05-21-2011, 04:52 AM
Sorry; it's a term of art.

It means a golden master, a file with all the editorial corrections, that you maintain in terms of viability in software.

That is, depending on how serious you are about archival strategies and publishing:

1. You have a native word processor file, say in MSWord, that opens and displays and functions perfectly in the current version of the software that you have, and you have a viable version.

2. You have versions of that golden master saved in a non-degrading format, like .pdf or hardcopy or both.

3. You have an easily imported version of that file in an archive-accepted format, like HTML 4.x transitional and/or .rtf.


And as for re-generation, do you mean newer versions of released titles for Amazon's Kindle DTP as their software changes and as newer Kindles come out?

Yes. You need to be able to create a new ebook edition as devices and file formats change--and they will change.

This is not, btw, unique to digital publishing; this is inherent in the technologies of publishing, going back to the 15th century.

Todsplace
05-21-2011, 05:59 AM
Ah, I see. In your experience, does Amazon make it difficult for authors and publishers to replace ebooks with newer versions? As in swapping out a file without changing the metrics, sales page and all that?

I'm going to poke around their help files, but I'm guessing it would be weird to have to delete an out of date format in order to create a page for a newer format of the same product.

Todsplace
05-21-2011, 10:28 AM
Ah, I found the link: https://kdp.amazon.com/self-publishing/help/help?topicId=A2KRM4C8E91086

Yes, it seems you can modify content and (presumably) revise stuff.

Hmm, maybe I'm obsessing over nothing (regarding future-proofing), as files seem to be able to be periodically updated. But thank you for the advice, Medievalist. As always, you've been very helpful! I think you've answered almost all the questions I've ever posted on AW :D

Medievalist
05-21-2011, 07:20 PM
It's possible, though; keep in mind that if you make substantial changes to the text/content, you ought to call it a second edition.

Sargentodiaz
05-22-2011, 07:34 PM
There's a place to indicate revisions when you go through the edit process.
Just remember, when you do, it goes off the shelf until someone at Amazon reviews and releases it.

Todsplace
05-25-2011, 05:55 AM
Thanks for the extra info.

I didn't know Amazon had to review it before it went back online. This is for content that is edited through the edit html option, right?