Problems with text on Kindle - anyone noticed?


Fiona
12-16-2010, 06:16 PM
I have a Kindle, and I have read several books on it so far, but has anyone noticed a large number of mistakes in the text? Mistakes that are absent from the paperback versions? Why is this?

I am reading The Exorcist on Kindle right now and there are some awful mistakes... such as quotation marks being opened and not closed, or the word "both" instead of "bother" or "her" instead of "he" etc. I have noticed this on other Kindle book editions too.

I wonder why this is happening? Are these books copied again? I don't understand why they would appear on Kindle versions and not in paperback versions... but it's frustrating to keep spotting such mistakes.

Ineti
12-16-2010, 07:14 PM
The gist I'm getting from reading various blogs and posts about ebooks, and from hearing from people who are converting their own work to ebooks, is that a large number of mistakes in ebooks can be credited to:

--People doing the conversions who don't have a full grasp of what they're doing
--The software used to automate the conversion not working perfectly for all types of ebooks

I'm sure there are other reasons, but user error and computer error are most likely the big two.

ResearchGuy
12-16-2010, 07:25 PM
I've seen weird errors in Kindle books, including misplaced quotation marks (sometimes inside of words!) and misplaced superscripts for end notes (LOTS of those in The Warmth of Other Suns, for example).

--Ken

Soccer Mom
12-16-2010, 11:42 PM
I'm sure it has something to do with the conversion process. FWIW, Penguin tends to be one of the worst. I buy a lot of Penguin books and it has gotten much better recently. I'm hoping this means they've solved some of their issues.

It hasn't deterred me from buying their books. It's just annoying.

Medievalist
12-17-2010, 01:02 AM
It's not just Kindle books either; I'm seeing it in pretty much every file format, from mainstream publishers.

Some of the errors really do seem to be scanning errors--which makes no sense for a modern digitally written/edited/produced print book.

Some of them -- like the end-of-line characters, and hard-hyphens in places that don't make sense -- are clearly conversion errors.

But it's frustrating in the extreme.
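
For anyone proofing their own conversions, two of the error types mentioned in this thread (stray hard hyphens and quotation marks that open but never close) can be flagged mechanically. This is only a rough sketch: the function names and the `vocabulary` word set are made up for illustration, and dialogue that legitimately runs across several paragraphs will show up as a false positive on the quote check.

```python
import re


def suspect_hyphens(text, vocabulary):
    """Return hyphenated tokens whose halves join into a known word.

    `vocabulary` is an assumed set of lowercase words supplied by the caller.
    """
    hits = []
    for token in re.findall(r"[A-Za-z]+-[A-Za-z]+", text):
        if token.replace("-", "").lower() in vocabulary:
            hits.append(token)
    return hits


def has_unbalanced_quotes(paragraph):
    """True if straight quotes are odd in number or curly quotes don't pair up."""
    straight = paragraph.count('"')
    opening = paragraph.count("\u201c")
    closing = paragraph.count("\u201d")
    return straight % 2 == 1 or opening != closing


if __name__ == "__main__":
    print(suspect_hyphens("A well-known conver-sion error.", {"conversion"}))
    print(has_unbalanced_quotes('"Hello, he said.'))
```

Anything it flags still needs a human to look at it against the print edition.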

Duchessmary
12-17-2010, 02:21 AM
I have the same problem with my Kindle. I got a Harold Robbins book, and the typos were unbelievable!

Medievalist
12-17-2010, 03:16 AM
You don't want to know what's happened to LOTR, in every legal version I've seen.

And there is a "public domain" version of Chaucer's Canterbury Tales where someone has decided to "fix" the Middle English.

So far none of the vendors--including my beloved Apple via iBooks--is properly and effectively handling replacing defective books. It's entirely dependent on who you're talking to.

This is not OK. A production error should mean anyone who buys the book gets a corrected copy at no charge.

KathleenD
12-17-2010, 05:47 AM
Scanning errors are pretty funny, sometimes.

I was halfway through a book where the author kept talking about the political dub he'd started.

Dub? What's a dub? Must be a political term.

Then I realized it was a club in the paper version, and the conversion smashed the C and the L together :)

benbradley
12-17-2010, 06:10 AM
I've not read many ebooks, and none from "major publishers" - just a few things available as PDFs, from technical sites that know what they're doing, where the PDFs are the actual source files of the printed book, such as http://dspguide.com and http://nr.com. In short, I haven't seen this, and was really surprised at nonsensical-seeming statements by Medievalist saying that major publishers OCR their modern books to put them into electronic format...
It's not just Kindle books either; I'm seeing it in pretty much every file format, from mainstream publishers.

Some of the errors really do seem to be scanning errors--which makes no sense for a modern digitally written/edited/produced print book.

Some of them -- like the end-of-line characters, and hard-hyphens in places that don't make sense -- are clearly conversion errors.

But it's frustrating in the extreme.
I recall you saying this earlier - that even recently-published books by major publishers, where the printed text was surely made from a computer file the publisher surely has a copy of and surely has physical and legal control of - have been converted to ebooks by scanning and OCR'ing the physical book. Not only is this a hideous and wasteful practice in a technical sense (it's a waste of time and, due to OCR being less than perfect, adds many errors that weren't in the printed edition), but (due to this OCR'ed text not receiving any editing or correction, but apparently published as-is) it also reflects badly on the publisher and the whole ebook experience. Do publishers not think ebooks are "real" books, and so aren't paying any attention to what's going out the door?

I can see this doing more actual harm to ebooks and e-publishing than the "self-publishing and ebooks lowering the bar" and associating ebooks with self-publishing, as discussed in that other thread. People are buying "real" ebooks from Real sites and in Real stores, and they're getting crap product. Presuming this is "the future of publishing," the publishing industry is going to hell and it's mostly the major publishers' fault.

I don't read Publishers Weekly or any other publishing news site - maybe someone (Medievalist?) could send them a well-written letter to the editor telling of this problem.

benbradley
12-17-2010, 06:16 AM
You don't want to know what's happened to LOTR, in every legal version I've seen.

And there is a "public domain" version of Chaucer's Canterbury Tales where someone has decided to "fix" the Middle English.
I can see the "public domain" files having all sorts of problems, but one would think a "legal version" would have the same text as the highly edited and vetted print version. When it comes from a publisher, it's the publisher's problem.
So far none of the vendors--including my beloved Apple via iBooks--is properly and effectively handling replacing defective books. It's entirely dependent on who you're talking to.
I can sort-of understand the vendors not wanting to get involved - it's a computer file from the publisher, and as long as they're getting the same bits the publisher gave them into the hands of customers, the vendor has done their job.
This is not OK. A production error should mean anyone who buys the book gets a corrected copy at no charge.
And since it's clearly the publisher's fault, I really think the publisher should handle it.

Medievalist
12-17-2010, 07:20 AM
And since it's clearly the publisher's fault, I really think the publisher should handle it.

Bookstores replace books with missing pages, damaged pages (in production or shipping) or damaged spines or misprinted covers.

When it's a vendor site--like Amazon with Kindle, or Apple with iBooks--I see no reason why they shouldn't handle replacing a damaged book. They are selling books from many publishers.

They have a way to contact the publisher and report the problem. They have a way to alert the buyer.

Medievalist
12-17-2010, 07:26 AM
I don't read Publishers Weekly or any other publishing news site - maybe someone (Medievalist?) could send them a well-written letter to the editor telling of this problem.

I'm working with publishers at this point.

The odd thing is that the same book in different formats--LOTR is the one I'm using most--has different errors.

Finding out what the workflow process is for various formats has been more than a little shocking.

They did in fact scan LOTR.

Despite having a digital file that was quite recent, from the major official revised edition.

And much of the work was outsourced; I suspect to China, or India, since they both offer huge scanning and conversion operations.

The publishers are now so large, in terms of the Big Six and their imprints, that the office that handles the conversion etc. often has no contact, at all, with the group that actually did the production of the print book, and has no staff that are familiar with production.

They don't even have a QA procedure, never mind someone who does QA.

And when I ask "why not" they act like I'm speaking in tongues.

But some publishers are actually paying attention. I'm going to see what it's like in late 2011.

Here's a sample from a page of the iBooks LOTR single volume edition.
http://i195.photobucket.com/albums/z291/digital_medievalist/Linked%20iamges/LOTR.jpg

There are a few pages without errors, but very very few.

Granted, the numerous nonstandard names, many with diacritics, make it difficult, but it needn't have been this difficult.

Terie
12-17-2010, 11:30 AM
I downloaded a 'free first book in the series' by a favourite author with the full intent of buying the rest so I could have them all on my Sony e-Reader, but when I started reading it and saw all the errors, I decided not to pay money for the rest of the series. I'll stick with the hardcopy and audiobooks I've already bought, thank you very much. And this, too, was from one of the Big Six publishers.

I love my e-Reader. I've downloaded lots of stuff from Gutenberg.org (and found few errors in those....and any errors are easy to forgive because the files are free and they've been edited by volunteers), free 'first in a series' to sample new-to-me authors, and have my own and friends' manuscripts for critique on it. Also, files for a correspondence course I'm taking. I also bought one e-book pretty much direct from the author, who's a tech geek herself so I had a high degree of confidence that the conversion had been done right.

But I'm not going to spend money on e-books from the Big Guys until the quality of the editing comes up to scratch. I suspect that the reputable e-only publishers are probably doing a way better job in this area than the Big Guys....their business model depends on it!

Soccer Mom
12-17-2010, 06:16 PM
But I'm not going to spend money on e-books from the Big Guys until the quality of the editing comes up to scratch. I suspect that the reputable e-only publishers are probably doing a way better job in this area than the Big Guys....their business model depends on it!

It's funny, but I have actually found this to be true. I don't find my epublisher-only books to have the same formatting errors as the ones from the big guys. I'm betting it's because it was formatted for the various epublishing systems the first time around and not something that went wonky in conversion.

Medievalist
12-17-2010, 09:17 PM
It's funny, but I have actually found this to be true. I don't find my epublisher-only books to have the same formatting errors as the ones from the big guys. I'm betting it's because it was formatted for the various epublishing systems the first time around and not something that went wonky in conversion.

Yep.

There are still often small errors in the typesetting--relying on MSWord to correctly format curly apostrophes and quotation marks can create problems--but you don't see the numerous bizarre errors from scanning or converting.

Torgo
12-17-2010, 10:16 PM
If it seems surprising that books are being OCRed, when publishers are presumed to have the text in an electronic form, it's not actually uncommon for us to have nothing but a flat PDF from which the text can't easily be extracted. This is even for fairly recent titles. We will have versions of the MS in Word, say, but since the last round of edits takes place on the typeset page proofs you are not necessarily talking about the most recent or cleanest version of the text. Sometimes we are OCRing the PDF rather than the book...

Doing a full proofread is relatively expensive and the expectation is often that ebooks are cheap to produce; there is a hypothetical big pot of money realised from production savings that everyone feels entitled to a slice of. Consumers expect cheaper books, authors and agents expect higher royalties, and retailers and publishers expect higher margins. At this point in time, when many houses are engaging in mass conversion efforts, often done in an Indian or Far Eastern conversion house, corners and costs are inevitably being cut.

If you find an unacceptable level of errors in your ebooks, please complain. Often publishers won't be aware of how bad things are because their QA process is pretty limited. I know of places where they check a sample of the book and only get more rigorous if they turn up a certain number of errors.
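
The "check a sample, then get more rigorous" approach can be sketched roughly like this. It is not any publisher's actual procedure: `sample_check`, the `count_errors` callable, and the default sample size and threshold are all assumptions made up for illustration.

```python
import random
from typing import Callable, Sequence


def sample_check(
    pages: Sequence[str],
    count_errors: Callable[[str], int],
    sample_size: int = 5,
    max_errors: int = 3,
    seed: int = 0,
) -> tuple[str, list[int]]:
    """Spot-check a random sample of pages and escalate if too many errors turn up.

    Returns a decision plus the page indices to proofread: the sampled pages
    if the sample passed, or every page if it did not.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    k = min(sample_size, len(pages))
    sampled = sorted(rng.sample(range(len(pages)), k))
    found = sum(count_errors(pages[i]) for i in sampled)
    if found > max_errors:
        return "full_proofread", list(range(len(pages)))
    return "sample_ok", sampled
```

The weakness is plain from the code: a sample that happens to land on clean pages passes the whole book, however bad the rest is.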

Medievalist
12-17-2010, 10:29 PM
If it seems surprising that books are being OCRed, when publishers are presumed to have the text in an electronic form, it's not actually uncommon for us to have nothing but a flat PDF from which the text can't easily be extracted.

That is a little odd; you should be able to extract the PostScript from the .pdf, and then import it to InDesign or Quark quite easily, unless the file is encrypted.

This assumes you have the full postscript version of any fonts used in the book.

You get all the text, and formatting. It's essentially using the same postscript code that generated the .pdf and that would be sent to the printer to print hardcopy.

Not quite as good is to print the .pdf to a PostScript interpreter--this is what is done by printers as a normal part of the printing process, but it will affect layout.

Publishers would actually do better to have two typists rekey the complete ms. from the printed book, then run diff and have someone compare the versions against the printed book, rather than using OCR.

This practice reduces errors and is cost effective.
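
The reason double keying works is that two typists rarely make the same mistake in the same place, so every disagreement between the two files points at a spot to check against the printed page. A minimal sketch of the comparison step, using Python's standard `difflib` in place of the command-line diff (the function name and the sample sentences are invented for illustration):

```python
import difflib


def keying_disagreements(typist_a: str, typist_b: str):
    """Return (tag, words_a, words_b) for each span where the two keyings differ."""
    a = typist_a.split()
    b = typist_b.split()
    # Compare word by word; autojunk is disabled so common words are never skipped.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


if __name__ == "__main__":
    first = "He did not bother to join the political club that year."
    second = "He did not both to join the political dub that year."
    for tag, words_a, words_b in keying_disagreements(first, second):
        print(f"{tag}: {words_a!r} vs {words_b!r}")
```

The diff only says where the two keyings disagree, not which one is right; a person still has to settle each disagreement against the printed book.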

Torgo
12-17-2010, 10:51 PM
Possibly production are lying to me for their own nefarious purposes! (I never have quite been able to get my head round the nitty gritty of what they're doing.) I should say quite a lot of the ones where we can't easily get the text out were scans from film originally, so they're basically image files. Or in one memorable case a PDF where the punctuation appeared to be some kind of weird other font or even bitmap, and I had to rekey every bit of it from the book.

Double-rekeyed is standard with us now for important titles, but the rest is done out of house and I am not sure what methods they are using.

Terie
12-17-2010, 10:55 PM
If it seems surprising that books are being OCRed, when publishers are presumed to have the text in an electronic form, it's not actually uncommon for us to have nothing but a flat PDF from which the text can't easily be extracted.

I'm not even in the book publishing business, and I don't buy this reasoning for one minute. Even without any special tools, text can very easily be selected and copied from a PDF file and pasted back into any desktop publishing or word processing program. I do this myself for my day job as a tech writer on an almost daily basis. Yes, the results will require some fixing up, but the PDF-to-DTP part is dead easy.

And with the tools available now for converting PDF to Word, it's even easier than what I described above.

Sure, for older books where there's no PDF, OCR makes sense (though failing to edit the resulting output and expecting people to pay actual money for it doesn't); but if you have a PDF, there's just no excuse for using OCR.

ETA: We cross-posted, Torgo. If you're interested, I'd be happy to set up a screen-sharing session and conference call with you to show you just how easy it is. After the New Year and I'm back in the office. :) PM me if you're interested.

Medievalist
12-17-2010, 10:59 PM
Possibly production are lying to me for their own nefarious purposes! (I never have quite been able to get my head round the nitty gritty of what they're doing.)

They may be more of the artist type than the geek type.

But there are tools specifically designed for this--and postscript 1 files from the 1990s even can still be utilized.

Should you become A Powerful And Influential Force with the power of life and death or at least the ultimate STET, institute a digital archive.

It will save you and your company, repeatedly.

Torgo
12-17-2010, 11:05 PM
Should you become A Powerful And Influential Force with the power of life and death or at least the ultimate STET, institute a digital archive.

It will save you and your company, repeatedly.

Got one. But it doesn't hold everything yet. Going forward, it will be a lifesaver, yes.

The punctuation horror referred to above involved files bought in from another house, sadly... Can't control everything even with my orbital STET cannon.

Gillhoughly
12-18-2010, 07:22 PM
Complain to the publishers.

Cite a list of errors on the first five pages.

Tell them you're paying good money for their books and whatever the format, you expect to see a professionally finished product.

Demand a refund if that's an option. Tell them you're boycotting their products until they make a public announcement that they're fixing the problems. Blog about it, hell, leave a message with Oprah.

There's no excuse for sloppiness.

I thank you for this thread, too. I'm converting some of my backlist to Kindle and you've confirmed that my plan to proof all the pages for scanning errors is a necessity, however long it takes.

Of course, now I'm worried that the few books I have left with Penguin are also digital disasters and their incompetence is cutting into my sales.

It's just too easy to blame the writer for the publisher's errors.

Complain to the writers, too. Gently. Tell them how hard it is to read their words when the publisher doesn't bother to proof things. Ask them to talk to the publisher about the problem or have their agents do that for them.

This is an easily preventable issue.

Medievalist
12-18-2010, 09:44 PM
This is an easily preventable issue.

This is the part that baffles me.

The process has been completely debugged.

We know how to do this, and do it right.