Mobileread
A Do It Yourself "Read It Later" Service for Koreader
#1  gummihuhn 01-23-2016, 09:07 AM
In addition to reading a lot of books, I read a lot of news. I love KOreader, and like to read using that tool as much as I can. I've experimented with a lot of tools: Pocket, Wallabag, Calibre, Calibre2OPDS, COPS and many more. But none of them provided the simple, seamless integration of my reading list with KOreader that I desired. So I pieced together my own using some really nice open source tools.

The tools:

Syncthing
I use Syncthing to sync my books between my computers and my devices running KOreader. I also use it to sync a lot of other things between devices. Syncthing is open source, peer-to-peer (no server required) sync software available for a wide variety of platforms. Even if you aren't interested in the "Read It Later" solution I describe in this post, you should consider using Syncthing to sync your device(s) running KOreader. There is an Android app, there are instructions for the Kindle Touch, and, just this week, thanks to tshering, there is a simple installer for Kobos running KSM. It should be fairly easy to put Syncthing on other e-reader devices. Even if you can't or don't want to install Syncthing on your device, you can still use Syncthing for a very easy USB sync solution.

Five Filters
Five Filters offers a variety of content-related tools that may be of interest. The one I use most heavily (and the one used in my "Read It Later" solution) is called "Push to Kindle". Don't worry: despite the name, a Kindle is not required. If you submit the URL of a web page to this tool, "Push to Kindle" creates a nicely formatted .epub, .mobi or .pdf, which can be emailed to your Kindle device (hence the name) or downloaded to your computer. (Note that if you prefer to run "Push to Kindle" on your own server, an open source release is coming soon.)
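The script later in this post appends the page address to the Five Filters download URL as-is, so a page URL that carries its own query string (`?`, `&`, `=`) can get mixed up with the request's own parameters. A minimal sketch of percent-encoding the address first, assuming the service decodes a percent-encoded `url` parameter (standard for query strings, but not verified here). `urlencode` and `fivefilters_url` are hypothetical helper names; the endpoint is the one used in the script.

```shell
#!/bin/bash
# Percent-encode a string per RFC 3986: keep unreserved characters
# (A-Z a-z 0-9 - . _ ~) and encode every other byte as %XX.
urlencode() {
  local LC_ALL=C  # work byte by byte, not on multibyte characters
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;  # "'x" gives the character's code
    esac
  done
  printf '%s\n' "$out"
}

# Build the Five Filters download URL (endpoint copied from the script in this thread)
# with the page address percent-encoded.
fivefilters_url() {
  printf 'http://fivefilters.org/kindle-it/send.php?context=download&format=epub&url=%s\n' \
    "$(urlencode "$1")"
}

# usage: fivefilters_url 'example.com/story?id=42&page=2'
```

With this, `example.com/story?id=42&page=2` is sent as `example.com%2Fstory%3Fid%3D42%26page%3D2`, so the page's own `&page=2` can no longer be read as a parameter of the Five Filters request.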

Pandoc (optional)
Pandoc is an open source document converter. It is a very powerful (albeit complicated) tool. I use it as a backup to the Five Filters downloads, since for some unknown reason images are stripped from the Five Filters epubs. For some of the websites I follow, the images are very important (e.g., financial charts), so I use Pandoc to generate epubs for them. The downside is that the resulting epubs are not nearly as pretty as their Five Filters counterparts. I'm sure this could be fixed with stylesheets, etc., but I have not looked into it. Again, using Pandoc is entirely optional. It is available for a wide variety of platforms. If you need to install from source (this won't apply to most people), I recommend creating a "relocatable binary".

A Simple Script
Here is a simple script I wrote to use these tools together:

Code
#!/bin/bash
# a simple script to download an epub version of a given web page from http://fivefilters.org/kindle-it/
# or (optionally) generate an epub version of the given web page using Pandoc (http://pandoc.org/)
# change the next line to the absolute output path where you would like the epub to be saved, including the trailing '/'
savepath="$HOME/Documents/"
# OPTIONAL: the absolute path to the list of domains for which you want epubs with images (less pretty output)
# Use one fully qualified domain name (https://en.wikipedia.org/wiki/Fully_qualified_domain_name) per line.
# Pandoc must be installed to use this feature.
pandoclist="$HOME/.config/pandoclist"
if [ -z "$1" ]; then # exit with a usage message if no URL was given
  echo "usage: $(basename "$0") <URL of web page>" >&2
  exit 1
fi
now=$(date +"%s") # store the current time
url=$1 # store the input URL
furl=${url#*://} # remove the 'http://' or 'https://' from the input URL
domain=${furl%%[/:]*} # get the domain (everything before the first '/' or ':') for checking against the Pandoc list
# the next line contains the options to pass to Five Filters
durl='http://fivefilters.org/kindle-it/send.php?context=download&format=epub&url='
durl+=$furl # construct the full epub request URL
oname=$(basename "${furl%%[?#]*}") # save the last part of the URL (minus any query string), which we will use to name the epub
oname="${oname%.*}" # remove the file extension (e.g. .html)
oname+=-"$now" # add a timestamp to prevent overwriting of files with the same name
oname+='.epub' # add the .epub file extension to the output name
opath=$savepath$oname # define the absolute path to the output file
# use Pandoc only if it is installed, the list exists, and the domain is listed in it
if command -v pandoc >/dev/null && [ -f "$pandoclist" ] && grep -Fxq "$domain" "$pandoclist"
then pandoc -f html "$url" -t epub -o "$opath" # generate the epub and store it in the specified directory
else wget -b -q "$durl" -O "$opath" # download the epub in the background and store it in the specified directory
fi
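Because wget runs in the background with `-q`, a failed Five Filters request (an HTML error page, for example) would still be saved under a .epub name without any warning. One possible check, assuming the file follows the EPUB container (OCF) rule that the ZIP archive starts with an uncompressed `mimetype` entry containing `application/epub+zip`. `is_epub` is a hypothetical helper, not part of the original script.

```shell
#!/bin/bash
# Succeed if FILE looks like an EPUB: a ZIP local-file header ("PK\003\004")
# whose first entry is the stored "mimetype" file containing "application/epub+zip".
# In that layout the 8-byte name starts at offset 30 and the 20-byte content at 38.
is_epub() {
  local file=$1
  [ -f "$file" ] || return 1
  [ "$(head -c 4 "$file")" = $'PK\003\004' ] || return 1
  [ "$(head -c 58 "$file" | tail -c 28)" = 'mimetypeapplication/epub+zip' ]
}

# example (run wget without -b so the download finishes before checking):
# if ! wget -q "$durl" -O "$opath" || ! is_epub "$opath"; then
#   echo "download failed or not an epub: $opath" >&2
# fi
```

Dropping `-b` trades a non-blocking call for the ability to report failures; for a keyboard-shortcut workflow either choice seems reasonable.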
Putting It All Together
  1. Install Syncthing on a computer. Optionally, also install Pandoc on the same computer.
  2. Install Syncthing on your KOreader device(s), or set up your KOreader device(s) for simple USB sync.
  3. Configure the folders to be synced between your computer(s) and your KOreader device(s). See http://docs.syncthing.net/intro/getting-started.html. I use one folder ("Books", with subfolders) for my books, and another folder ("News") for epubs gathered with the above tools.
  4. Put my simple script on your computer and make sure it is executable (chmod +x <name of script>). Edit it to set where the epubs should be saved (this should be one of your synced folders) and, optionally, the location of your list of websites for which Pandoc should be used instead of Five Filters.
  5. Now test it. From the command line, in the folder where your script is:
    ./<name of script> <URL of web page>
  6. Assuming it is working as you like, set up your browser, RSS aggregator, etc. to pass a URL to the script with a simple keyboard shortcut. This is left as an exercise for the reader.
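One way to approach step 6 without tying the script to a particular browser is to have the browser or RSS reader append URLs to a plain text queue file, then hand each line to the script in one go. A minimal sketch; the queue file, the `process_queue` function and the script name `readitlater.sh` are all hypothetical, not part of the original workflow.

```shell
#!/bin/bash
# Run SCRIPT once for every non-empty, non-comment line (one URL per line) in QUEUE,
# then rewrite the queue so it keeps only the URLs that failed, for a later retry.
process_queue() {
  local script=$1 queue=$2 url failed=''
  [ -f "$queue" ] || return 0
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in ''|'#'*) continue ;; esac
    # </dev/null keeps the script from reading the rest of the queue on stdin
    if ! "$script" "$url" < /dev/null; then
      failed+="$url"$'\n'
    fi
  done < "$queue"
  printf '%s' "$failed" > "$queue"
}

# usage: process_queue ./readitlater.sh "$HOME/.config/readitlater-queue"
```

Note that the script as posted returns success as soon as wget starts in the background, so only failures detected before that point would stay in the queue.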

Now, at the press of a couple of buttons on your computer, any URL you desire will be turned into an epub and automatically sent to your KOreader device(s).

Enjoy! Suggested improvements or alternative approaches welcome.

#2  Markismus 01-24-2016, 06:26 AM
I usually use rsync on Linux. Had a look at Pandoc quite some time ago. Is it already a viable solution for LaTeX to epub without loss of formatting?

#3  gummihuhn 01-24-2016, 08:05 AM
Quote Markismus
I usually use rsync on Linux.
rsync is a nice tool, which I use a lot. But once you want to do two-way sync and/or more than two devices are involved, I find Syncthing works really well.

Quote Markismus
Had a look at Pandoc quite some time ago. Is it already a viable solution for LaTeX to epub without loss of formatting?
Sorry to say, I don't know. I've only used Pandoc for converting HTML to epub (I haven't looked into what intermediate formats are used for that), and in that use case there is definitely a loss of formatting. I only use it for a couple of sites I follow where images are important, and for me the output is "good enough", at least until I find a tool that works better for this. Generating PDFs from the HTML is another option, which I may experiment with when I get some free time.

#4  Alan_S 01-25-2016, 04:26 AM
You can check https://dotepub.com/

They also offer easy conversion of web pages into epub with images, but the results aren't great either. I haven't tried Pandoc and don't know how bad its results are, so maybe dotepub gives the same or similar results.

Please check and share how it works for you.

#5  gummihuhn 01-25-2016, 06:56 AM
Quote Alan_S
You can check https://dotepub.com/

They also offer easy conversion of web pages into epub with images, but the results aren't great either. I haven't tried Pandoc and don't know how bad its results are, so maybe dotepub gives the same or similar results.

Please check and share how it works for you.
Thanks for that suggestion.

I did look at dotepub. Using the bookmarklet, I got better results than I've been getting with Pandoc. Unfortunately, it appears that the only way to use dotepub programmatically is through their API, which requires you to parse the HTML yourself. If I have to parse the HTML myself, I've already solved my formatting issues with Pandoc, so dotepub doesn't offer much of an advantage.

If I can figure out how to grab the "Printer Friendly Format" link from sites where images are important and pass that URL to Pandoc, that should solve the problem. This probably requires site-specific configurations or "recipes", which I may play with at some point, but this isn't a huge priority for me at the moment.
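The per-site "recipe" idea could start as nothing more than a directory of stylesheets named after each domain. A minimal sketch, assuming a hypothetical layout of one `<domain>.css` file per site, falling back to the name without a leading `www.`; `site_css` and the directory path are illustrative names, not the author's later scripts. The usage line relies on Pandoc's `--css` option, which embeds the stylesheet in EPUB output in Pandoc 2 and later (older releases used `--epub-stylesheet`).

```shell
#!/bin/bash
# Print the path of the per-site stylesheet for DOMAIN inside DIR, if one exists.
# Tries the exact domain first, then the domain without a leading "www.".
site_css() {
  local domain=$1 dir=$2 candidate
  for candidate in "$domain" "${domain#www.}"; do
    if [ -f "$dir/$candidate.css" ]; then
      printf '%s\n' "$dir/$candidate.css"
      return 0
    fi
  done
  return 1
}

# usage (hypothetical directory):
# css=$(site_css "$domain" "$HOME/.config/readitlater/sites") &&
#   pandoc -f html "$url" -t epub --css="$css" -o "$opath"
```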

#6  gummihuhn 01-27-2016, 05:22 PM
Quote gummihuhn
If I can figure out how to grab the "Printer Friendly Format" link from sites where images are important and pass that URL to Pandoc, that should solve the problem. This probably requires site-specific configurations or "recipes", which I may play with at some point, but this isn't a huge priority for me at the moment.
I've figured out how to make the output of Pandoc much nicer, including easy per-site configuration settings.

An example of the current output is attached. To customize output content for the source website of that epub, here is all I needed:

Code
.date {display:none;}
.tophat {display:none;}
.persistent-header-placeholder {display:none;}
.lede-headline {display:none;}
.social-share {display:none;}
.article-rail {display:none;}
.terminal-tout {display:none;}
.read-this-next {display:none;}
.article-tags__tag {display:none;}
.article-tags__tag-link {display:none;}
.unsupported-browser {display:none;}
.footer {display:none;}
.footer__container {display:none;}
This is representative of the number of lines necessary for most other websites I've set up. Those settings won't change until the website gets redesigned (a fairly rare occurrence), so once a site is set up, it should just work. Getting the relevant CSS classes is pretty easy in any modern browser, even if you don't know CSS. And of course, those per-site settings can be shared between people.
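Since every rule above has the same shape, a per-site file can be generated from a plain list of class names copied out of the browser's inspector. A small sketch; `hide_classes` is a hypothetical helper, and it emits exactly the `.class {display:none;}` form shown above. Whether the rules take effect still depends on the reading software honoring `display:none`.

```shell
#!/bin/bash
# Write one "hide this class" CSS rule per class name given as an argument,
# accepting names with or without the leading dot.
hide_classes() {
  local cls
  for cls in "$@"; do
    printf '.%s {display:none;}\n' "${cls#.}"
  done
}

# usage (hypothetical output path):
# hide_classes date tophat social-share footer > "$HOME/.config/readitlater/sites/example.com.css"
```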

There are still some obvious improvements to be made, but I like the progress. After I've had some time to clean up my script and write up a how-to (probably this weekend), I'll post the updated script with instructions in case anyone is interested in trying it.
Attached: asia-stock-futures-signal-gains-as-apple-oil-hit-u-s-contracts-1453931458.epub (60.4 KB)

#7  gummihuhn 01-30-2016, 10:48 AM
I've reworked my script to make it quite a bit more flexible. As part of that, it has been split into two separate scripts.

I haven't yet had time to write up a how-to for customizing website-specific output from Pandoc. I hope to do that over the next week or so, and to push out a few more website-specific formatting rules.

I'm mainly doing this to meet my own needs (other tools just weren't cutting it for me), but if you do give it a try, feedback and suggestions are welcome.

You can see the latest iteration and follow future developments here: https://github.com/0r0/klemheist

#8  loviedovie 02-10-2016, 07:09 PM
Wallabag works great for me, but epub export is not automatic. I tend to use the wallabag client on Android.
