Should pdf format e-books be searchable/have links?

dpbooks

Registered
Joined
May 22, 2008
Messages
18
Reaction score
0
Location
Paris, France
Website
www.discoverparis.net
Hello everyone,

My company has recently published a POD book called Paris Insights - An Anthology using Blurb.com. I have created a pdf file of this book by scanning the original. I have enabled the appropriate security setting for the file to discourage unauthorized editing and printing. I plan to use this file as an e-book, both for promotion and for sale on the Internet.

I converted a duplicate of this file to be searchable, but many of the words in the book did not convert consistently. Readers will be able to find a given word most, but not all, of the time when they use the search function. They may not be able to find a rarely used word at all if the scan of the text did not convert properly.

Also, the URLs that appear in the e-book do not have hyperlinks.

How important are these issues to the successful marketing of an e-book in pdf format? Should I plan to include a disclaimer with the file, saying that there may be problems with searching and that no direct Internet access to the web addresses in the text is available? Should I use the pdf file that is not searchable and mention this in the disclaimer? Or should I start from the beginning and create an e-book that has these features (something that I would prefer not to do)? Will anybody care one way or the other?

Thanks,
dpbooks
 

veinglory

volitare nequeo
Self-Ban
Registered
Joined
Feb 12, 2005
Messages
28,750
Reaction score
2,934
Location
right here
Website
www.veinglory.com
If there were a URL in a non-fiction ebook, I would expect it to be clickable, and in general I expect pdfs to be searchable. I would suggest matching the settings of mainstream epublishers in terms of what functions are disabled.
 

soma

Super Member
Registered
Joined
Sep 24, 2007
Messages
85
Reaction score
3
Location
Columbus, OH, USA
Website
tinianow.blogspot.com
From a purely monetary standpoint, you most probably have less to lose from unauthorized editing and printing than you do from poor sales because your eBook is lacking in accessibility features. No matter how you word the disclaimer, as far as the reader is concerned it will say "by the way, this is kind of a poorly-made eBook so don't expect too much."

Even if you put DRM on the eBook, all that a serious e-pirate has to do is run it through a program like ABBYY FineReader to make it editable or do whatever else they want with it. The best policy is probably to add those features at the outset and remember that any serious piracy concerns with unauthorized distribution or plagiarism can be caught and dealt with through intelligent use of Google. Remember, rights management is meaningless if you don't have customer satisfaction first.
 

dpbooks

Thanks to both soma and veinglory for responding.

Have either of you used or heard of anyone using Amazon's Kindle software to create a searchable/linkable e-book from a pdf file? Might this solve my problem? Would any other program out there work for me?

dpbooks
 

Mac H.

Board Visitor
Super Member
Registered
Joined
Feb 16, 2005
Messages
2,812
Reaction score
406
Whenever I get a new PDF, the first thing I do is strip out the d!mned protection .. it is only a few seconds' work and makes everything so much easier.

I'm not sure WHY you would be trying to stop people from printing it. If you are worried about copyright, it is zero effort and cost for someone to make an e-copy .. so why try and stop someone from printing it .. which costs them money? If you are worried about illegal copies then you are closing a tiny hole while leaving the floodgates open.

The worst aspect is that by releasing the PDF as something unsearchable, you are preventing piracy detection tools (such as Google Alerts) from notifying you if an illegal copy turns up on a website. So you are damaging yourself more than anyone else.

The way you made it searchable in your duplicate seems particularly odd. You scanned a printout and ran OCR on it? Why not just use the original source of the file, and simply print to PDF using a totally free tool?

At least then the text will be converted properly ...

Mac
 

dpbooks

The problem with the original file is that it has a Blurb watermark that says "for proofreading only" or something to that effect. I have used this function to print out copies for review. But the only way to get a "clean" pdf copy of the book for public distribution is to scan it.

Regarding the security features to prevent printing, I can easily remove these. But this will not help with 100% searchability or the ability to add links, correct?

dpbooks
 

Mac H.

It looks like 'Blurb' is a terrible way to make books.

According to their tech support:

Exporting to PDF is also fine, but as Michal pointed out, a Blurb watermark is added to each page. We’ve invested a great deal into BookSmart, but offer it for free to anyone that wants it. We insert the watermark to show that the book was created and designed in BookSmart.

Obviously, you should feel free to use any of your content that you’ve previously put in a Blurb book with any another printer. But, Brian is correct that you’re probably going to need to rebuild the book from scratch using a page layout or image editing tool.

Basically they have deliberately designed their software to be unusable with any other printer or any other distribution method - including ebooks.

You could send someone knowledgeable a PDF file (with watermark) to try and hack ... sometimes it is as easy as replacing the embedded font that they use for the watermark with a 'fake' font that happens to be entirely clear.

Good luck,

Mac
 

dpbooks

Dear Elodie,

Thanks for informing me of this! I have edited my signature and the link works for me in Mozilla now. Would you please try it again to see if you can now read it with that browser?

dpbooks
 

dpbooks

I've got someone looking at the watermark question for me. But I now know that this is a moot point because I have found that the pdf file from Blurb is not searchable. OCR needs to be applied to this file.

Does anyone know if the application of OCR at the time of the physical scanning of a book works better than applying OCR to a file that has already been created? I already have a pdf file that is pretty well searchable, but that misses some words and shows others as gibberish. (BTW, I have successfully added hyperlinks to this file - there are not very many of them.)

In other words, would it be worth the time and expense to have the hard copy of the book rescanned, applying OCR at the time the scan is made?

Thanks,
dpbooks
 

soma

I can't see how it would help. With scanning, you're always going to end up with a few words not registering correctly. You could try exporting the OCR'd document to an MS Word file and running spellcheck on it, or just manually proofing the pdf file.
 

dpbooks

Hi Soma,

I'm not sure that I understand what this would accomplish. The words are correctly spelled in the document (spell check was completed prior to publication). But when they are searched, words adjacent to the searched term can sometimes contain one or more symbols or an incoherent string of letters. Or, the searched term itself may contain an erroneous letter (for example, a search for the word "boat" will call up "about", with the letters "bout" being highlighted as the searched term). I think that this is because the OCR software cannot distinguish between the letter "a" and the letter "u" 100% of the time.
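A rough Python sketch of why the search behaves this way. It assumes, for illustration only, that the invisible OCR text layer lines up character for character with the printed page, and that the OCR read the "u" in "about" as an "a" and the word "boat" itself as "bout"; the sample sentence and the helper `find_matches` are hypothetical. A pdf viewer's search looks at that hidden text rather than the page image, so the misread word is found in the wrong place and the real word is not found at all.

```python
def find_matches(text_layer: str, query: str) -> list[int]:
    """Return every offset where `query` occurs in `text_layer`,
    mimicking a viewer's plain, case-insensitive substring search."""
    haystack = text_layer.lower()
    needle = query.lower()
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


# What the reader sees on the scanned page (hypothetical sentence).
printed = "We talked about the boat on the Seine."
# Hypothetical OCR layer: "about" misread as "aboat", "boat" misread as "bout".
ocr_layer = "We talked aboat the bout on the Seine."

print(find_matches(printed, "boat"))    # -> [20]  (the real word)
print(find_matches(ocr_layer, "boat"))  # -> [11]  (inside "aboat")

# The viewer highlights the same character positions on the page image.
for i in find_matches(ocr_layer, "boat"):
    print(printed[i:i + len("boat")])   # -> bout
```

Because the search never sees the page itself, no setting in the viewer can fix this; only correcting the hidden text can.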

If this is a limitation of OCR technology, perhaps there is nothing more that I can do. If you were in my place, would you use this file (hyperlinks included) as an e-publication? Would you use it for promotion of the printed book? I would say that the search function works correctly about 95% of the time.

dpbooks
 

veinglory

DP, that's exactly what soma was saying. When you scan it, that effectively adds errors to the underlying text that searching relies on. All you can do is export that text and correct it. Otherwise you will be distributing (selling?) a flawed product.
 

dpbooks

I see.

I am not familiar with this process. There are thousands of words that could be searched, and each search would generate a different set of words and phrases in the search box. How could I ever be assured that I have corrected all the words that appear abnormally?

Can anyone explain to me (a relative novice with manipulating pdf files) how to go about exporting the text found in the box when a word is searched and how to correct it? Or tell me where I can go to find the information?

Thanks,
dpbooks
 

soma

My idea was to export the scanned OCR document to MSWord, then proof the MSWord document and correct any errors. Then format the Word file (or import it into InDesign) as you want the pdf to look and build a new pdf from that file. Since the source is now an actual word processing (or InDesign) file and not scanned pages, you should no longer have trouble with the search tool mixing up "a" and "u" and similar errors.
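On the question of how to be sure every bad word has been caught: rather than searching word by word, one could list every distinct word in the exported OCR text that is not in a dictionary or that contains stray symbols, then fix each one once. This is a minimal Python sketch, not a feature of Word or Acrobat. It assumes the OCR text has already been saved as plain text, and `known_words` stands in for a real word list (a French-aware one would be needed, since Paris place names would otherwise be flagged too); the helper name and sample data are hypothetical.

```python
import re
from collections import Counter


def suspect_words(text: str, known_words: set[str]) -> Counter:
    """Count each distinct word that is missing from `known_words` or that
    contains characters other than letters, apostrophes and hyphens."""
    suspects = Counter()
    for token in text.split():
        # Drop ordinary punctuation stuck to either end of the word.
        word = token.strip(".,;:!?\"'()")
        if not word:
            continue
        has_stray_symbol = re.search(r"[^A-Za-z'\-]", word) is not None
        if has_stray_symbol or word.lower() not in known_words:
            suspects[word] += 1
    return suspects


# Hypothetical sample: a tiny word list and a line of OCR output.
known = {"we", "talked", "about", "the", "boat", "on", "seine", "it"}
ocr_text = "We talked aboat the b0at on the Seine. We talked aboat it."

for word, count in suspect_words(ocr_text, known).most_common():
    print(f"{word}: {count}")
```

Each flagged word can then be corrected in Word with Find and Replace (Replace All), so a misreading that occurs fifty times is fixed in one step.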
 

dpbooks

Hi Soma,

I did what you suggested and exported the OCR document to Word. The result was a document that contains numerous different fonts and symbols instead of letters on each page, with all text rotated 90° on the page. Only the pages with photos and no text have retained the page color (light blue) and the correct orientation. The number of misspellings is enormous. It would be faster for me to start over again.

Last night, I read a chapter on e-publishing in a book by Penny Sansevieri called Red Hot Internet Publicity (Morgan James, 2007). Penny advises that an e-book should be no longer than 100 pages (mine - or rather, my husband's - has 156 pages) and that it should have an "I need it now" element (my husband's book is a collection of essays on culture and contemporary life in Paris - compelling, we like to think, but hardly anything that shouts "urgent"). She suggests breaking the book into chunks and offering them separately to "drive more urgency than the book as a whole". What do you think of this advice?

Penny also says not to overload the book with pictures or graphics. But my husband's book contains 109 color photos - this is a major selling point of the work.

Perhaps we should abandon the concept of creating an exact copy of the printed book as an e-book and do something else altogether.

I welcome your comments and those of others reading this thread.

Thanks, and have a good weekend!
dpbooks
 

veinglory

I would bet Sansevieri is thinking of non-fiction about business, self-help, etc. Selling entertainment-type ebooks follows the same basic rules as selling them in paperback, except that it happens online.