Mobileread
Newbie text to epub conversion
#1  michaelbr 11-28-2020, 06:57 AM
I'm new to calibre and to conversion. I'm trying to gather some pages of information from a website and put them together into an "epub" ebook. I tried converting with calibre's default settings, but unfortunately it didn't work. I searched the internet and YouTube for tutorials and found very few on this topic. Can someone please give me a hint about where to start learning?

The trouble I'm having with the conversion is this:
original text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Mauris gravida urna ac vulputate efficitur. Duis ultrices

nisl id tempor ultricies. Ut feugiat metus a ornare aliquam.

Integer vel accumsan elit, in facilisis libero.


Vestibulum mollis justo ut dictum tempor. Maecenas euismod

dui sed feugiat auctor. Aenean at accumsan mauris.

after conversion:
<p class="calibre2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. </p>

<p class="calibre2">Mauris gravida urna ac vulputate efficitur. Duis ultrices </p>

<p class="calibre2">nisl id tempor ultricies. Ut feugiat metus a ornare aliquam. </p>

<p class="calibre2">Integer vel accumsan elit, in facilisis libero.</p>


<p class="calibre2">Vestibulum mollis justo ut dictum tempor. Maecenas euismod </p>

<p class="calibre2">dui sed feugiat auctor. Aenean at accumsan mauris. </p>


It seems that, for some reason, the webpage generated a line break after every line instead of only at the end of each paragraph. It doesn't seem too complicated to solve, but I have no clue how.

#2  itimpi 11-28-2020, 07:39 AM
You probably want to enable the heuristics processing option for conversions that have that problem.

#3  Doitsu 11-28-2020, 07:52 AM
Quote michaelbr
I'm new to calibre and to conversion. I'm trying to gather some pages of information from a website and put them together into an "epub" ebook.
You also might want to check out dotepub.

#4  michaelbr 11-28-2020, 08:21 AM
Quote Doitsu
You also might want to check out dotepub.
Thanks for this tip. I'm not sure dotepub will solve my problem, though: nowadays a lot of sites have embedded frames with ads, so I assume all those frames and ads would be included and converted too?

#5  michaelbr 11-28-2020, 08:24 AM
Quote itimpi
You probably want to enable the heuristics processing option for conversions that have that problem.
Thanks for the tip. Is there a tutorial anywhere that explains conversion tweaks? I've found only one on YouTube, and it's very short, using a single regex to solve one specific problem.

#6  retiredbiker 11-28-2020, 01:08 PM
When you go to convert a text file, a "TXT Input" settings panel becomes available on the left side of the convert screen.

If your input text has indentation at each paragraph, or if there is a blank line between paragraphs, you can use the "Paragraph style" dropdown. "Block" will use blank lines to determine actual paragraphs, and "Print" will use indentations. (Look at the tool tips.)
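The same paragraph-style choice can also be made from the command line. What follows is only a minimal sketch, assuming calibre's `ebook-convert` tool is installed and on the PATH, and that its `--paragraph-type` and `--enable-heuristics` options behave like the GUI settings described in this thread. The helper name `build_convert_command` and the file names are hypothetical.

```python
def build_convert_command(src, dst, paragraph_type="block", heuristics=False):
    """Build an ebook-convert argument list for a TXT-to-EPUB conversion.

    paragraph_type: "block" (blank line between paragraphs) or
                    "print" (indented first line), as in the GUI dropdown.
    heuristics:     also turn on heuristic processing.
    """
    cmd = ["ebook-convert", src, dst, "--paragraph-type", paragraph_type]
    if heuristics:
        cmd.append("--enable-heuristics")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run it.
    print(" ".join(build_convert_command("book.txt", "book.epub")))
```

To actually run the conversion, the returned list could be passed to `subprocess.run(cmd, check=True)`.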

If your input text is just a long list of short lines, with no indication of where actual paragraphs should be, then the heuristic processing is your next best bet. It will use the length of the lines to try to guess the paragraph boundaries. It may give pretty good results, but it will not be perfect.
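To illustrate the idea of guessing paragraph boundaries from line length, here is a rough sketch. It is not calibre's actual heuristic, just one simple rule: a line noticeably shorter than the longest line is assumed to end a paragraph. The function name `unwrap_by_line_length` and the `ratio` cutoff are assumptions made for this example.

```python
def unwrap_by_line_length(text, ratio=0.8):
    """Join hard-wrapped lines into paragraphs.

    A line shorter than ratio * (longest line length) is treated as the
    last line of a paragraph. Blank lines are ignored.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    threshold = ratio * max(len(ln) for ln in lines)
    paragraphs, current = [], []
    for ln in lines:
        current.append(ln)
        if len(ln) < threshold:  # short line: assume the paragraph ends here
            paragraphs.append(" ".join(current))
            current = []
    if current:  # leftover lines with no short closing line
        paragraphs.append(" ".join(current))
    return paragraphs
```

As noted above, a rule like this breaks down when a paragraph happens to end on a long line, which is why the result is rarely perfect.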

How well any of this works depends on the consistency of the input text. If you have inconsistent indentations or blank lines, or if there are, by chance, many paragraphs that end in long lines rather than shorter ones, you should edit the input text first, to get good results.

If you are doing a copy/paste to gather text from a web page, your best bet is to paste it into Word or Writer first, fix it up there, and then convert the word processor doc to epub.

#7  michaelbr 11-29-2020, 03:58 AM
Quote itimpi
You probably want to enable the heuristics processing option for conversions that have that problem.
Thanks, itimpi, for this tip. Unfortunately it did not work; maybe the line break was somehow hardcoded into the page when the webpage was generated.

#8  michaelbr 11-29-2020, 04:02 AM
Quote retiredbiker
When you go to convert a text file, a "TXT Input" settings panel becomes available on the left side of the convert screen.

If your input text has indentation at each paragraph, or if there is a blank line between paragraphs, you can use the "Paragraph style" dropdown. "Block" will use blank lines to determine actual paragraphs, and "Print" will use indentations. (Look at the tool tips.)
Thanks so much for your detailed explanation. It's much appreciated, and I'll give it a try.

#9  deback 11-29-2020, 11:17 AM
Quote michaelbr
Thanks, itimpi, for this tip. Unfortunately it did not work; maybe the line break was somehow hardcoded into the page when the webpage was generated.
Yes, conversion will create a separate line for each line or paragraph in the text file that has a CR at the end (carriage return, same as hitting the Enter key at the end of a sentence or line or paragraph). If you remove those CRs in your text file and prepare the text without the CRs (when they are in the wrong places), the Calibre conversion will turn out the way you want it.

#10  michaelbr 11-30-2020, 03:35 AM
Quote deback
Yes, conversion will create a separate line for each line or paragraph in the text file that has a CR at the end (carriage return, same as hitting the Enter key at the end of a sentence or line or paragraph). If you remove those CRs in your text file and prepare the text without the CRs (when they are in the wrong places), the Calibre conversion will turn out the way you want it.
Thanks, deback, but the trouble is removing the CRs. I don't think they can be removed automatically (there are CRs at the end of each line as well as at the end of each paragraph). Is there any way other than removing them manually?
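One way this could be scripted, sketched under an assumption taken from the sample in post #1: wrapped lines are separated by a single blank line, while real paragraphs are separated by two or more blank lines. If the actual file follows a different pattern, the regular expression would need adjusting. The function name `rejoin_paragraphs` is hypothetical.

```python
import re


def rejoin_paragraphs(text):
    """Rebuild paragraphs from text with a break after every line.

    Assumption: two or more blank lines mark a real paragraph break;
    a single blank line (or none) is just a wrapped line.
    """
    # Normalise Windows/old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split where a newline is followed by at least two blank lines.
    chunks = re.split(r"\n(?:[ \t]*\n){2,}", text.strip())
    paragraphs = []
    for chunk in chunks:
        lines = [ln.strip() for ln in chunk.split("\n") if ln.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs
```

Writing the result back out with `"\n\n".join(paragraphs)` gives a text file with one blank line between paragraphs, which should suit the "Block" paragraph style mentioned earlier in the thread.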
