Mobileread
eBookWise 1150/REB offline reading
#1  PostGrant 03-24-2005, 02:20 PM
Hey guys.

I use SiteScooper to gather all the sites I read each day (BBC, Guardian, The Times, a couple of my favorite blogs). SiteScooper automatically creates an index page for all of these, so if I run SiteScooper on that index it creates one big, fat HTML file that is fully indexed and contains all my daily reading.
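
For anyone who wants the "one big indexed file" result without SiteScooper, here is a minimal sketch in Python of the merging step only. It is not what SiteScooper does internally; the function names, the `siteN` anchor names, and the "Daily reading" heading are all made up for illustration, and it assumes the pages have already been saved as HTML.

```python
import html
import re
from pathlib import Path

# Grabs whatever sits between <body> and </body>, if the page has a body tag.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(page: str) -> str:
    """Return the contents of <body>, or the whole page if there is no body tag."""
    match = BODY_RE.search(page)
    return match.group(1).strip() if match else page.strip()


def merge_pages(pages: list[tuple[str, str]]) -> str:
    """Combine (title, html) pairs into one document with a linked index on top."""
    index = []
    sections = []
    for number, (title, page) in enumerate(pages, start=1):
        anchor = f"site{number}"  # hypothetical anchor naming scheme
        safe_title = html.escape(title)
        index.append(f'<li><a href="#{anchor}">{safe_title}</a></li>')
        sections.append(
            f'<h2><a name="{anchor}"></a>{safe_title}</h2>\n{extract_body(page)}'
        )
    return (
        "<html><body>\n<h1>Daily reading</h1>\n<ul>\n"
        + "\n".join(index)
        + "\n</ul>\n"
        + "\n<hr>\n".join(sections)
        + "\n</body></html>\n"
    )


def merge_files(paths: list[Path], output: Path) -> None:
    """Read each saved HTML file, merge them, and write the combined file."""
    pages = [(p.stem, p.read_text(encoding="utf-8", errors="replace")) for p in paths]
    output.write_text(merge_pages(pages), encoding="utf-8")
```

Calling `merge_files([Path("bbc.html"), Path("guardian.html")], Path("daily.html"))` would leave a single file to transfer; the file names here are placeholders.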

Really handy, so I don't need to transfer 8-9 files. The ugly part is SiteScooper has no interface, so I had to write a batch file for all this. The good part is it's a 1-click operation: I run it before my shower, and by the time I'm out it's waiting for me to plug in my eBook.

Anyway, just thought I'd share my experience of how SiteScooper saved my EB. I have a feeling this software isn't really supported anymore... the last update I saw was in 2001. Ugh.

Either SiteScooper needs to be resurrected, or some of the folks at FictionWise/eBook Technologies need to realize the importance of reading more than just DRM stuff.

#2  CINCNORAD 05-13-2005, 03:05 PM
Any chance you could share a tutorial on how to accomplish this? And if you would like to be my hero -- possibly share any batch files that would help? Been checking out sitescooper, and man that program looks complicated...

#3  technobritt 08-17-2006, 12:51 AM
PostGrant--I second CINCNORAD's request for a tutorial or walkthrough if possible. Reading websites offline would be EXACTLY what I'd use this device for most often.

#4  PostGrant 09-01-2006, 01:24 AM
I hear ya. A few months ago, I talked with the developer of the librarian software, and he was developing a spider for the EB1150. It seemed to work pretty darn well. Maybe he needs more beta testers?

In the meantime, I'll come up with a HOWTO on sitescooper. Give me a few days, I gotta go out of town.

#5  stobs 09-01-2006, 03:55 AM
Please post it to the wiki as well.

-S.

#6  Gatton 02-13-2007, 05:31 AM
I thought I'd bump this, as I am looking for ways to read news offline on my EB1150. Are there any EB1150 users who can share some tips? The sitescooper page appears to be down, and I don't know if it would even work on OS X, which is my only option for now (Windows box blew up). Alternatively, is it possible to use something like wget? Bottom line: it would be nice to download a set of headlines/stories in one big HTML file for easy reading on this device. Any advice is appreciated. Thanks. Oh, and I guess I should say I'm mainly interested in news sites like BBC, Washington Post, etc. Thanks again.
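
On the "headlines/stories in one big HTML file" idea, a rough sketch in Python using only the standard library, assuming the news site publishes an RSS 2.0 feed. This is not wget or SiteScooper; `fetch_feed`, `rss_to_html`, and the `storyN` anchors are names invented for this example, and the feed URL is left to the caller.

```python
import html
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url):
    """Download the raw feed bytes. The URL is supplied by the caller."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def rss_to_html(rss_data, heading="Headlines"):
    """Turn RSS 2.0 data (str or bytes) into one HTML page with a linked index."""
    root = ET.fromstring(rss_data)
    index, stories = [], []
    for number, item in enumerate(root.findall("./channel/item"), start=1):
        title = html.escape(item.findtext("title", default="(untitled)"))
        # Feed descriptions often carry their own HTML, so they are passed through as-is.
        summary = item.findtext("description", default="")
        index.append(f'<li><a href="#story{number}">{title}</a></li>')
        stories.append(
            f'<h2><a name="story{number}"></a>{title}</h2>\n<p>{summary}</p>'
        )
    return (
        f"<html><body>\n<h1>{html.escape(heading)}</h1>\n<ul>\n"
        + "\n".join(index)
        + "\n</ul>\n"
        + "\n<hr>\n".join(stories)
        + "\n</body></html>\n"
    )
```

Something like `open("news.html", "w", encoding="utf-8").write(rss_to_html(fetch_feed(feed_url)))` would then give one file to convert. Note that most feeds only carry summaries, so full stories would still need a separate download step.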

#7  sea2stars 02-13-2007, 01:45 PM
I'm looking into Sitescooper and wget too, since I'm planning to purchase an EB1150 shortly.

Sitescooper should work on a Mac. There's plenty of info on the net about the subject, although there still isn't a GUI; at least I can't find one.

I believe that there are front-ends for wget for the Mac & PC; again, Google is your friend.

#8  TadW 02-14-2007, 06:03 AM
@sea2stars: Sitescooper is console-based only, and development has been stagnant for a long time. It should definitely work on Mac if you have Perl installed.

#9  ashkulz 04-03-2007, 02:03 AM
I customized bloglines2html so that it would work for my REB1100. It does a lot of other things as well: it downloads all the referenced images, blacklists some image domains, removes some unneeded links, and customizes the default templates so they read and navigate properly on the ebook.

You will need to download three files: bloglines2html and its two required libraries, feedparser and BeautifulSoup. Put all of them in a single directory, and install Python if you don't already have it.

Just run the command
Code
python bloglines2html.py -u userid -p password -o <some-dir>
Point your creation utility at index.html in the directory. I typically use
Code
rbmake -bef 1 -o feeds.rb index.html
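
The two commands above could be chained into a one-click script. A minimal sketch in Python, using exactly the flags shown in the post; the function names are invented here, and running rbmake from inside the output directory (so that `index.html` resolves and `feeds.rb` lands there) is an assumption, not something the post states.

```python
import subprocess


def build_commands(userid, password, out_dir, python_cmd="python", rb_name="feeds.rb"):
    """Return the two argument lists from the post: fetch the feeds, then build the .rb."""
    fetch = [python_cmd, "bloglines2html.py", "-u", userid, "-p", password, "-o", out_dir]
    convert = ["rbmake", "-bef", "1", "-o", rb_name, "index.html"]
    return fetch, convert


def run_pipeline(userid, password, out_dir):
    """Run both steps in order, stopping if either one fails."""
    fetch, convert = build_commands(userid, password, out_dir)
    subprocess.run(fetch, check=True)
    # Assumption: rbmake is run inside out_dir, where index.html was written.
    subprocess.run(convert, check=True, cwd=out_dir)
```

`python_cmd` is a parameter because a script from 2007 most likely expects a Python 2 interpreter, which may not be what `python` points to on a current system.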

#10  nrapallo 05-26-2009, 11:32 PM
While the links above are no longer active, I was able to get a copy of the above modified python code and shell script directly from ashkulz a while ago. I attach them here in case you are looking for/need same.

EDIT: provided a revised bloglines2html.py for Windows users (changed three occurrences of 'w' to 'wb' in file operations that work with binary data, i.e. images). See the bloglines2html.py.zip attachment.
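
Why the 'w' to 'wb' change matters: on Windows, a file opened in text mode has its newline bytes translated on write, which corrupts image data, while binary mode writes the bytes untouched. The sketch below illustrates the idea and is not the code from the attachment; `save_binary` and `roundtrip` are made-up names.

```python
import os
import tempfile


def save_binary(path, data: bytes) -> None:
    """Write raw bytes (e.g. a downloaded image) without any newline translation."""
    with open(path, "wb") as handle:  # 'wb', not 'w'
        handle.write(data)


def roundtrip(data: bytes) -> bytes:
    """Write data to a temporary file in binary mode and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "image.gif")
        save_binary(path, data)
        with open(path, "rb") as handle:
            return handle.read()
```

Under Python 3 the mistake is harder to miss, since writing bytes to a file opened with 'w' raises a TypeError instead of silently damaging the file.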

EDIT2: provided some sample .imp conversions, but needed to tweak the resulting .html to split <a name= href= > into <a name= ><a href= > as well as re-save a few images that were in an incompatible format for the python image handler. Oh yeah, created the .opf also. I'll try and automate these (necessary revisions) a bit more, later on.
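
The `<a name= href= >` split could probably be automated with a small regex pass. A minimal sketch in Python, assuming simple, well-formed anchor tags; closing the name-only anchor with `</a>` right away is a choice made here and not something stated in the post, and all the names below are invented for the example.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
A_TAG_RE = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches a name attribute with a double-quoted, single-quoted, or bare value.
NAME_RE = re.compile(r"""\sname\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)
HREF_RE = re.compile(r"\shref\s*=", re.IGNORECASE)


def split_anchor(match):
    """Rewrite one <a> tag if it carries both name and href; otherwise leave it alone."""
    attrs = match.group(1)
    name = NAME_RE.search(attrs)
    if not name or not HREF_RE.search(attrs):
        return match.group(0)
    # Drop the name attribute from the link and emit it as its own empty anchor.
    rest = attrs[: name.start()] + attrs[name.end():]
    return f"<a name={name.group(1)}></a><a{rest}>"


def split_name_href(markup):
    """Split every <a name=... href=...> in the markup into two separate anchors."""
    return A_TAG_RE.sub(split_anchor, markup)
```

For example, `<a name="top" href="#end">go</a>` becomes `<a name="top"></a><a href="#end">go</a>`, and anchors with only one of the two attributes are left unchanged.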

p.s. Added a REB1100 .rb (in bloglines2html - May 26, 2009.rb.zip) created by eBook Publisher. An rbmake version (as ashkulz prepared) may be more compatible with the REB1100.
[zip] bloglines2html.zip (57.7 KB, 956 views)
[zip] bloglines2html.py.zip (6.9 KB, 871 views)
[imp] bloglines2html - May 26, 2009.imp (1.45 MB, 973 views)
[imp] bloglines2html - May 26, 2009_1200.imp (1.44 MB, 982 views)
[zip] bloglines2html - May 26, 2009.rb.zip (192.5 KB, 961 views)
