Mobileread
How to Do Everything with PDF Files
#11  Flinx 01-04-2009, 06:52 AM
Quote labnol
...wait for google bots to index those PDF.
The linked example shows why this way is essentially useless. The resulting text has line breaks on each line. A good converter for books has to try to set a line break only at the end of a paragraph.

#12  tompe 01-04-2009, 08:59 AM
Quote Flinx
The linked example shows why this way is essentially useless. The resulting text has line breaks on each line. A good converter for books has to try to set a line break only at the end of a paragraph.
Really not true at all. You can also use the convention that two line breaks in a row indicate a new paragraph, like TeX and LaTeX do. It is trivial to convert between the two conventions using some simple program or a one-line script.
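For illustration, here is a minimal sketch of such a conversion in Python. The function names `unwrap_paragraphs` and `rewrap_paragraphs` are made up for this example, and it assumes the input really does mark paragraphs with a blank line, which is exactly the point under dispute for PDF-extracted text.

```python
import re
import textwrap


def unwrap_paragraphs(text: str) -> str:
    """Blank-line convention -> one line per paragraph.

    Paragraphs are assumed to be separated by one or more blank lines;
    single line breaks inside a paragraph are treated as plain spaces.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # " ".join(p.split()) collapses every run of whitespace, including
    # the hard line breaks, into a single space.
    return "\n".join(" ".join(p.split()) for p in paragraphs)


def rewrap_paragraphs(text: str, width: int = 70) -> str:
    """One line per paragraph -> hard-wrapped lines, blank line between."""
    paragraphs = [line for line in text.split("\n") if line.strip()]
    return "\n\n".join(textwrap.fill(p, width) for p in paragraphs)
```

Going from wrapped text to one line per paragraph loses the original line break positions, so the round trip only restores the paragraphs, not the exact layout.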

#13  Flinx 01-04-2009, 01:24 PM
Quote tompe
Really not true at all. You can also use the convention that two line breaks in a row indicate a new paragraph
No, that is not really useful for most standard PDFs. The text object in a PDF file does not contain a real line break. It contains the position on the page where it has to be drawn and a number of characters. The result is a line of text.
The program that does the conversion has to estimate, from the positions of the text objects, the order in which the lines come. Simple converters, like most of those available (including Acrobat), take one text object, convert it to text and set a line break at the end, resulting in one line of the output text. The better converters can try to join the separate text objects if their horizontal start position is identical and the line is long enough. But this is a difficult job, and I have not yet found a program that works well enough for me.
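The joining heuristic described above can be sketched roughly as follows. This is not how any particular converter works. `TextObject`, `join_lines`, `x_tol` and `min_fill` are hypothetical names, and character count stands in for the real line width, which would need font metrics.

```python
from dataclasses import dataclass


@dataclass
class TextObject:
    x: float   # horizontal start position on the page
    y: float   # vertical position; PDF's origin is bottom-left, so larger y is higher
    text: str  # the characters drawn at that position


def join_lines(objects, x_tol=1.0, min_fill=0.8):
    """Group positioned lines into paragraphs.

    A line is appended to the current paragraph when it starts at the same
    horizontal position as the previous line and the previous line was
    "long enough" (at least min_fill of the longest line, in characters).
    """
    lines = sorted(objects, key=lambda o: -o.y)  # top of page first
    if not lines:
        return []
    longest = max(len(o.text) for o in lines)
    paragraphs = [lines[0].text.strip()]
    for prev, cur in zip(lines, lines[1:]):
        same_start = abs(cur.x - prev.x) <= x_tol
        prev_is_full = len(prev.text) >= min_fill * longest
        if same_start and prev_is_full:
            paragraphs[-1] += " " + cur.text.strip()
        else:
            paragraphs.append(cur.text.strip())  # short line or indent: new paragraph
    return paragraphs
```

Even this toy version shows why the job is hard: it ignores multiple columns, hyphenation, headers and footers, and a final paragraph line that happens to run the full width.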

#14  tompe 01-04-2009, 01:51 PM
Quote Flinx
No, that is not really useful for most standard PDFs. The text object in a PDF file does not contain a real line break. It contains the position on the page where it has to be drawn and a number of characters. The result is a line of text.
The program that does the conversion has to estimate, from the positions of the text objects, the order in which the lines come. Simple converters, like most of those available (including Acrobat), take one text object, convert it to text and set a line break at the end, resulting in one line of the output text. The better converters can try to join the separate text objects if their horizontal start position is identical and the line is long enough. But this is a difficult job, and I have not yet found a program that works well enough for me.
That might be the case, but there is no functional difference between encoding paragraphs with two line breaks or one. What you are talking about is how good a converter is at detecting a paragraph break, but that has no necessary connection to how the encoding is done. You can argue that you lose information if you do not keep the line breaks in a paragraph, since they are impossible to recreate, but it is trivial to take a paragraph specified by double line breaks and convert it to one line.

#15  stonehat 01-05-2009, 04:28 AM
From TFA:
"Most mobile phones can read PDF files."

I stopped reading after that.

#16  millerjpmd 01-07-2009, 05:11 PM
Thanks for the find. I started a thread concerning a similar issue with PDFs. This is what I found related to converting from a PDF.

Programs that allow you to manipulate and extract info from PDF:
File Juicer ($17, http://echoone.com/filejuicer/)
deskUNPDF ($100, http://www.docudesk.com/deskUNPDF_product_home.shtml)
PDFpen and PDFpenPro ($50-100, http://www.smileonmymac.com/index.html)

Program that allows you to join multiple PDFs into a single file with a table of contents:
PDF Lab (free, http://www.iconus.ch/fabien/products/pleng/pleng.html)

With regard to just getting the PDF onto a PRS-505, calibre, for the most part, worked as well as any of these programs.

Hope this helps.

jpm

#17  BlackVoid 04-16-2009, 07:11 AM
When converting a PDF with pictures for an ebook device, I found a good method with minimal fuss. It is a bit time-consuming and you need a third-party product.

Use ABBYY FineReader to convert to LIT format, then convert the LIT to the ebook format of your choice. Pictures will be preserved. ABBYY FineReader takes a while to convert into its own format, but it will also handle scanned books. I have not tried two-column PDFs, but an average PDF with pictures is OK.

I then use BookDesigner to convert from LIT to LRF and the result is quite good.

#18  namiamy 05-31-2009, 03:53 AM
good find. thx.
i got more knowledge about adobe...

#19  stranjer 07-25-2009, 03:59 PM
thanks for the trick BlackVoid, I'm gonna try this myself...

#20  sEventoRii 04-08-2010, 06:52 PM
very useful!
thx for sharing.
