Converting pdf to Word?

Status: Not open for further replies.

aruna

On a wing and a prayer
Super Member, Registered
Joined: May 14, 2005 | Messages: 12,862 | Reaction score: 2,846
Location: A Small Town in Germany
Website: www.sharonmaas.co.uk
I'm trying to convert an entire PDF book file (544 pages) to Word, but it seems the file is too big for the free converter I use.
Is there anyone here who has the software and could do this for me? If not, can you recommend software I can buy -- not too expensive?

Thanks!
 

robjvargas

Rob J. Vargas
Banned
Joined: Dec 9, 2011 | Messages: 6,543 | Reaction score: 511
You're sure it's not locked by DRM (Digital Rights Management)?

If the book isn't public domain, I think altering the format violates copyright.

I think.
 

Cathy C

Ooo! Shiny new cover!
Kind Benefactor, Absolute Sage, Super Member, Registered
Joined: Jun 5, 2005 | Messages: 9,907 | Reaction score: 1,834
Location: Hiding in my writing cave
Website: www.cathyclamp.com
I presume this is for something of yours that's older, rather than something you're pulling from the web. I use Nuance PDF Converter (I have version 5; the current version is 7.0). I haven't tried it on a full book, but I've done several 50+ page documents with no problems. It's not expensive, and it's a great little program.

But remember, there will be a LOT of codes you'll need to remove in any conversion. You're basically taking a photograph and turning it into text. And Word applies that formatting "from here on," so a center command in the converted text will center the next paragraph too, a tab set will become a first-line indent, etc. If that's okay, cool. But I tend to open them in WordPerfect so I can remove the codes first. :)
 

aruna

Yes, it's my own work. And I found I can't even send it by email; I get an error message saying the file exceeds the 25 MB attachment limit. It's 428 MB.

ETA: I don't know how to remove codes. I don't know ANYTHING about codes. Maybe I'll just take it to a local IT place and let them do it.
 

aruna

The free software I tried it on was Wondershare PDF Converter. They let me convert 5 pages, and it worked just fine. Buying it will cost me $59.95. That won't cost the earth, but I need to be sure it can handle the whole document.
 

thothguard51

A Gentleman of a refined age...
Super Member, Registered
Joined: Oct 16, 2009 | Messages: 9,316 | Reaction score: 1,064
Age: 72
Location: Out side the beltway...
Not sure you can do this, but could you break the PDF file into smaller, manageable files and then convert those? Then, once you fix any formatting problems, copy and paste them into a new single Word file.
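For anyone comfortable with a command line, the splitting idea can be scripted. This is a minimal sketch, assuming Ghostscript is installed as `gs` (on Windows the executable is usually `gswin64c`); it uses Ghostscript's `pdfwrite` device with `-dFirstPage`/`-dLastPage` to write each chunk. The 50-page chunk size and the `part_NNN-NNN.pdf` file names are hypothetical choices, and the page count (544) is taken from this thread.

```python
import subprocess
from typing import List, Tuple


def page_chunks(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [(first, min(first + chunk_size - 1, total_pages))
            for first in range(1, total_pages + 1, chunk_size)]


def split_commands(src: str, total_pages: int, chunk_size: int = 50,
                   gs: str = "gs") -> List[List[str]]:
    """Build one Ghostscript pdfwrite command per chunk; nothing runs here."""
    commands = []
    for first, last in page_chunks(total_pages, chunk_size):
        out = f"part_{first:03d}-{last:03d}.pdf"  # hypothetical naming scheme
        commands.append([gs, "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH",
                         "-dSAFER", f"-dFirstPage={first}",
                         f"-dLastPage={last}", f"-sOutputFile={out}", src])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Execute each command in turn (requires Ghostscript to be installed)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands for a 544-page book; call run_all(...) to execute them.
    for cmd in split_commands("book.pdf", total_pages=544):
        print(" ".join(cmd))
```

Each resulting part can then be fed to the converter on its own, which keeps every file well under a free tool's size limit.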
 

aruna

I don't think I would be capable of that, thothguard! But it's OK. I have taken it to a tech guy in my village and hopefully he can do something with it.
 

Jamesaritchie

Super Member, Registered
Joined: Feb 13, 2005 | Messages: 27,863 | Reaction score: 2,311

428 MB? Does it contain a bunch of photos? The largest novel PDF file I have is under one MB.

Anyway, there are several freeware PDF to Word Doc programs out there. I don't know whether such a large file would present a problem, but it costs nothing to try.

Two I have used are here: http://www.freewarefiles.com/Free-PDF-to-Word-Converter_program_50365.html

And here: http://www.boxoft.com/pdf-to-word/
 

alleycat

Still around
Kind Benefactor, Super Member, Registered
Joined: Apr 18, 2005 | Messages: 72,886 | Reaction score: 12,236
Location: Tennessee
428 MB does sound incredibly large, but PDF files can be quite large (and don't compress much either). That's about the size of a short movie.
 

Al Stevens

Super Member, Registered
Joined: Mar 4, 2011 | Messages: 2,537 | Reaction score: 214
Try Calibre. Convert to TXT format. Then you can read it into Word. But that is one really big PDF file.
 

Al Stevens

I assume you can read it with Acrobat Reader. It must be the images. Even though they display at lower resolution and size, they might be stored in their original formats. Lots of embedded fonts can add to file size, too, but not that much.
 

blacbird

Super Member, Registered
Joined: Mar 21, 2005 | Messages: 36,987 | Reaction score: 6,158
Location: The right earlobe of North America
> Still don't get why it's so big. It's just a 540-page novel with a few images.

My suspicion is the images, even if they're only "a few". Depending on a variety of matters, inclusion of images within text documents can vastly increase the file size.

A few years ago I used a freeware program called PDFedit, and it worked decently on pure-text PDFs, but it was much like an OCR scan in that it required subsequent editing and mismanaged the occasional letter. I suspect the font had something to do with it.

caw
 

Sheila Muirenn

Rebuilding My Brain
Super Member, Registered
Joined: Jan 9, 2010 | Messages: 1,906 | Reaction score: 495
Location: Riding my bicycle
Do you have Adobe Acrobat Pro or Adobe Reader?

I assume you have Reader at home. If you have access to Acrobat Pro, you can run OCR Text Recognition to make the document searchable. Do that first. Afterwards, click Edit, Select All, Copy. Then open a Word doc and paste. It should work; not necessarily, but it should.

I have to do this kind of thing at work a lot, and I'm limited in software options because of security. But I don't have Pro at home, and don't remember the exact steps. You just go to one of the drop-down menus, look for OCR Text Recognition, and click it.

I just looked online, and there are several ways to do it with Reader. If you google Adobe Reader OCR Conversion, it will bring up many answers.

Another option: in my Adobe Reader at home, there is a button that looks like a piece of paper with a gold arrow along the bottom. This is the Convert Adobe to Word Online button. If you click it, it will take you to an online service where you can do this. You will have to pay, but it looks worth trying. It's only US$19.99 per year.
 

robjvargas

> Still don't get why it's so big. It's just a 540-page novel with a few images.

500 pages, and that large?

It's not images *in* the pages, IMO.

It's pages *as* images.

I'm curious. Are you able to select text in the PDF document?
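One rough way to answer that question without Acrobat is to count text-drawing operators versus image objects inside the file. This is a stdlib-only heuristic sketch, not a real PDF parser: it assumes streams are either uncompressed or FlateDecode-compressed (the common case) and will miss other filters and encrypted files. The function name `text_layer_report` is hypothetical.

```python
import re
import zlib
from typing import Dict, Iterator

# Raw stream bodies between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text-showing operators (Tj, TJ) used in page content streams.
TEXT_OP_RE = re.compile(rb"\bT[jJ]\b")
# Image XObjects are declared with /Subtype /Image in their dictionary.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def _payloads(data: bytes) -> Iterator[bytes]:
    """Yield each stream body, inflated when it is Flate-compressed."""
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # decompressobj ignores trailing EOL bytes before "endstream"
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            yield body  # not Flate data; scan it as-is


def text_layer_report(data: bytes) -> Dict[str, int]:
    """Count text operators and image objects as a text-vs-picture hint."""
    outside = STREAM_RE.sub(b"stream endstream", data)  # dictionaries only
    inside = b"\n".join(_payloads(data))
    return {
        "text_ops": len(TEXT_OP_RE.findall(inside)),
        "images": len(IMAGE_RE.findall(outside)) + len(IMAGE_RE.findall(inside)),
    }
```

Pass it the bytes from `open("book.pdf", "rb").read()`. Many images with few or no text operators suggests the pages are stored as pictures, in which case only OCR will recover editable text.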

Remember, PDF (or, more accurately, Adobe Acrobat) isn't about word processing, about editing and manipulating text. At least, not *as* text. Acrobat was designed for desktop publishing, for layout and page design and page elements, etc. So it has the ability to treat page elements, even ones as large as the page itself, as a single object. Basically, an image.

If your document is 500+ pages of single-element pages, then it's essentially (short form) 500+ pages of photographs of the pages of a book. Depending on the dpi (image resolution, essentially), that can get huge. Fast.
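A quick back-of-the-envelope check shows how fast "huge" arrives. This sketch computes the uncompressed bitmap size of a page, assuming (hypothetically) a 6 x 9 inch trim size scanned at 300 dpi; only the 544-page count comes from this thread. Real PDFs usually compress their images, so these figures are upper bounds per colour depth, but even one-bit black-and-white pages land in the hundreds of megabytes before compression.

```python
def page_image_bytes(width_in: float, height_in: float, dpi: int,
                     bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one page stored as a bitmap."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # rows pad to whole bytes
    return row_bytes * height_px


PAGES = 544  # page count from this thread
for label, bpp in [("1-bit black and white", 1), ("8-bit greyscale", 8),
                   ("24-bit colour", 24)]:
    per_page = page_image_bytes(6, 9, 300, bpp)  # assumed 6x9 in, 300 dpi
    print(f"{label}: {per_page / 1e6:.2f} MB/page, "
          f"{per_page * PAGES / 1e6:,.0f} MB for {PAGES} pages")
```

With these assumptions the one-bit case works out to about 0.61 MB per page and roughly 330 MB for the book, which puts a 428 MB file squarely in page-image territory.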

A lot of software, and even mid- to large-size office machines, take advantage of this image-like quality to offer options to scan and send documents electronically.
 

aruna


I think you are right. These are the final files of a print-ready book returned to me from HarperCollins (yes, to those who know: they returned those files immediately after I caught them out publishing a book whose rights had reverted to me, but that's another topic).

I did a trial run on Wondershare and that worked all right; an enquiry at Wondershare told me that they could convert the entire doc if I bought the product.

However, I am not able to edit the sample they sent me. I can highlight individual lines on the document, but can't edit them; the most I can do is delete, but then the whole page gets deleted.

I am beginning to suspect converting and editing this is not going to be as easy as I assumed.

If anyone would like to have a look, I can send the sample "Word" doc (five pages) converted by Wondershare; send a PM with your email.
 

Jamesaritchie

Out of curiosity, have you tried opening the file, and then saving it as a text file?

Anyway, something is seriously wrong here. There's no way HarperCollins should have created such a large file. Publisher-created PDFs are usually pretty darned small. I have at least a couple of dozen PDF book files created by publishers, and the largest is under 3 MB.

I also have a fair number of glossy magazines in PDF, and several of these are almost all high-quality images and photos. All are under 30 MB.

Publishers don't like passing large files back and forth any more than they have to, and this file is huge. Larger than any I can remember seeing.
 

Torgo

Formerly Phantom of Krankor.
Kind Benefactor, Super Member, Registered
Joined: Apr 7, 2005 | Messages: 7,632 | Reaction score: 1,204
Location: London, UK
Website: torgoblog.blogspot.com
Yeah, that's enormous. I just had a peek at a print-ready PDF of a hardback around the same extent - less than 2MB.

It's possible the PDF is made up of uncompressed bitmap images of pages?

My tip would be Ghostscript. You should be able to fix just about anything with that. There are tips on the net that will help.
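The usual Ghostscript fix for an oversized PDF is to rewrite it through the `pdfwrite` device, which re-encodes and downsamples the images. A minimal sketch, assuming Ghostscript is on the PATH as `gs` or `gswin64c`; `-dPDFSETTINGS=/ebook` (about 150 dpi images) is one of Ghostscript's documented presets, and the file names are hypothetical. This shrinks the file but does not make image-only pages editable; that still needs OCR.

```python
import shutil
import subprocess
from typing import List


def shrink_command(src: str, dst: str, preset: str = "/ebook",
                   gs: str = "gs") -> List[str]:
    """Ghostscript arguments that rewrite src into a smaller dst."""
    return [
        gs, "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",  # /screen, /ebook, /printer or /prepress
        "-dNOPAUSE", "-dBATCH", "-dQUIET", "-dSAFER",
        f"-sOutputFile={dst}", src,
    ]


def shrink(src: str, dst: str, preset: str = "/ebook") -> None:
    """Run Ghostscript if it is installed; raise otherwise."""
    gs = shutil.which("gs") or shutil.which("gswin64c")
    if gs is None:
        raise FileNotFoundError("Ghostscript (gs) not found on PATH")
    subprocess.run(shrink_command(src, dst, preset, gs), check=True)
```

`/screen` shrinks harder at the cost of legibility, so `/ebook` is a reasonable first try before OCR.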
 

Deleted member 42


Mr. Ritchie, you are again pontificating in a vacuum.

The file is that large because it's a collection of images; it is not text.

The entire book was scanned as a series of images. Each page is a separate image.

Aruna: You need to buy OCR software, though you may already have some. Most scanners come with OCR (Optical Character Recognition) software, which "reads" the images of letters and converts them to text.

This is a laborious process, but you don't have to sit there while it's happening.

Generally you spend an hour or so "training" the software, which "reads" rather like a dyslexic six-year-old; it will misread some letter combinations, usually in a patterned error.

Afterwards, you will need to proof the book against the published version, and format it.
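The render-then-OCR process can also be done with free tools instead of bought software. A minimal sketch, assuming Poppler's `pdftoppm` and the Tesseract OCR engine are installed (both are swapped in here; neither is mentioned in the thread). `pdftoppm -r 300 -png` renders each page to a PNG, and `tesseract image base` writes `base.txt`. The `page` file prefix and the function names are hypothetical, and the raw text still needs the proofreading pass described above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def render_command(pdf: str, out_dir: str, dpi: int = 300) -> List[str]:
    """pdftoppm renders every page to out_dir/page-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, str(Path(out_dir) / "page")]


def ocr_command(image: Path) -> List[str]:
    """tesseract writes the recognised text next to the image as .txt."""
    return ["tesseract", str(image), str(image.with_suffix(""))]


def page_number(image: Path) -> int:
    """Numeric page index from a name like page-012.png."""
    return int(image.stem.rsplit("-", 1)[1])


def ocr_book(pdf: str, out_dir: str) -> str:
    """Render, OCR each page in order, and return the joined raw text."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(render_command(pdf, out_dir), check=True)
    texts = []
    for image in sorted(Path(out_dir).glob("page-*.png"), key=page_number):
        subprocess.run(ocr_command(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\f".join(texts)  # form feed marks page breaks for proofreading
```

Sorting by the numeric page index keeps pages in order whether or not the tool zero-pads the numbers in its file names.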
 

aruna

> Anyway, something is seriously wrong here. There's no way HarperCollins should have created such a large file.

Well, they certainly did!
Priene, who took a look at the excerpt, said this:

Almost every single section of that document is an image - including most of the individual letters in the first part of the document. Most likely they were stored like that inside the original Harper Collins document. The only way to get the original text out would be to save the images from the pdf and then use character recognition software to scan the images for text. Which wouldn't be 100% accurate, but might be better than typing the whole thing in again.

And my tech guy managed to convert it and he agrees. Tech guy says it would be impossible to edit the file; at the very most he could remove sections of it -- single paragraphs etc -- but not change anything within the paragraphs.

As it is, though, the file is useless as it cannot be formatted for Kindle, which was my intent. Either I ask HC if they have a prior format (which I don't want to do as I am truly not happy with them) or else type the whole book out anew. I guess my mother could and would do that; but she's 94 and I don't know if she can! (I'll soon find out, though.)
 

aruna


Oh, sorry - I missed the OCR suggestion when I posted. I'll look into it over the next few days. It certainly beats retyping.
 