Hey guys.
I use SiteScooper to gather all the sites I read each day (BBC, Guardian, The Times, a couple of my favorite blogs). SiteScooper automatically creates an index page for all of these, so if I run SiteScooper on that index it creates one big, fat HTML file that is fully indexed and contains all my daily reading.
Really handy so I don't need to transfer 8-9 files - the ugly part is SiteScooper has no interface so I had to write a batch file for all this. The good part is it's a 1-click operation - I run it before my shower, and by the time I'm out it's waiting for me to plug in my eBook.
Anyway, just thought I'd share my experience of how SiteScooper saved my EB. I have a feeling this software isn't really supported anymore... the last update I saw was in 2001. Ugh.
Either SiteScooper needs to be resurrected, or some of the folks at FictionWise/eBook Technologies need to realize the importance of reading more than just DRM stuff.
Any chance you could share a tutorial on how to accomplish this? And if you would like to be my hero -- possibly share any batch files that would help? Been checking out sitescooper, and man that program looks complicated...
PostGrant--I second CINCNORAD's request for a tutorial or walkthrough if possible. Reading websites offline would be EXACTLY what I'd use this device for most often.
I hear ya. A few months ago, I talked with the developer of the librarian software, and he was developing a spider for the EB1150. It seemed to work pretty darn well. Maybe he needs more beta testers?
In the meantime, I'll come up with a HOWTO on sitescooper. Give me a few days, I gotta go out of town.
Please also post it to the wiki. -S.
I thought I'd bump this as I am looking for ways to read news offline on my EB1150. Are there any EB1150 users who can share some tips? The sitescooper page appears to be down and I don't know if it would even work on OSX which is my only option for now (Windows box blew up.) Alternatively is it possible to use something like wget? Bottom line is it would be nice to download a set of headlines/stories in one big html file for easy reading on this device. Any advice is appreciated. Thanks. Oh and I guess I should say I'm mainly interested in news sites like BBC, Washington Post etc. Thanks again.
I'm looking into Sitescooper and wget too, since I'm planning to purchase an EB1150 shortly.
Sitescooper should work on a Mac. There's plenty of info on the net about the subject, although there still isn't a GUI; at least I can't find one.
I believe that there are front-ends for wget for the Mac & PC; again, Google is your friend.
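For anyone who ends up fetching pages with wget rather than Sitescooper, here is a rough Python sketch of the "one big HTML file with an index" idea. It is not how Sitescooper does it; the function names (`body_of`, `merge_pages`) and the `s1`, `s2`, ... anchor names are made up for illustration, and it assumes you already have each page's HTML as a string.

```python
import html
import re


def body_of(page):
    """Return the markup between <body> and </body>, or the whole page if absent."""
    match = re.search(r"<body[^>]*>(.*?)</body>", page, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else page


def merge_pages(pages):
    """Combine a {title: html} dict into one document with a linked index on top."""
    index = []
    sections = []
    for number, (title, page) in enumerate(pages.items(), start=1):
        safe_title = html.escape(title)
        # Each index entry links to a named anchor placed in that page's heading.
        index.append('<li><a href="#s%d">%s</a></li>' % (number, safe_title))
        sections.append(
            '<h2><a name="s%d"></a>%s</h2>\n%s' % (number, safe_title, body_of(page))
        )
    return (
        "<html><body>\n<h1>Daily reading</h1>\n<ul>\n"
        + "\n".join(index)
        + "\n</ul>\n"
        + "\n<hr>\n".join(sections)
        + "\n</body></html>\n"
    )
```

You would read each downloaded file into a string, pass them in as something like `merge_pages({"BBC": bbc_html, "My blog": blog_html})`, and write the result out as a single file. Images and relative links are not handled here.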
@sea2stars: Sitescooper is console-based only, and development has been stagnant for a long time. It should definitely work on Mac if you have Perl installed.
I customized bloglines2html so that it would work for my REB1100. It does a lot of other things too: it downloads all the referenced images, blacklists some image domains, removes some unneeded links, and customizes the default templates so they read and navigate properly on the ebook.
You will need to download these three files: bloglines2html and its two required libraries, feedparser and BeautifulSoup. Put all of them in a single directory, and install Python if you don't have it installed.
Just run the command
Code
python bloglines2html.py -u userid -p password -o <some-dir>
Point your creation utility at index.html in the directory. I typically use
Code
rbmake -bef 1 -o feeds.rb index.html
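If you want the two steps above as a single click, a small wrapper along these lines might do. It is only a sketch: the command lines are copied from the post, but `build_commands` and `run_all` are names invented here, and running rbmake from inside the output directory (so that it finds index.html there) is an assumption. It also assumes `python` on your PATH is a version the script actually runs under.

```python
import subprocess


def build_commands(userid, password, out_dir, rb_name="feeds.rb"):
    """Return the two command lines from the post as argument lists."""
    fetch = ["python", "bloglines2html.py", "-u", userid, "-p", password, "-o", out_dir]
    convert = ["rbmake", "-bef", "1", "-o", rb_name, "index.html"]
    return fetch, convert


def run_all(userid, password, out_dir):
    """Fetch the feeds, then build the .rb file; stop if either step fails."""
    fetch, convert = build_commands(userid, password, out_dir)
    subprocess.run(fetch, check=True)
    # Assumption: rbmake is run from the directory that holds index.html.
    subprocess.run(convert, check=True, cwd=out_dir)
```

Calling `run_all("userid", "password", "feeds")` would then leave feeds.rb inside the `feeds` directory.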
While the links above are no longer active, I was able to get a copy of the modified Python code and shell script directly from ashkulz a while ago. I attach them here in case you are looking for them.
EDIT: provided a revised bloglines2html.py for Windows users (changed three occurrences of 'w' to 'wb' in file operations that work with binary data, i.e. images). See the bloglines2html.py.zip attachment.
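For anyone wondering why that change matters: on Windows, a file opened with 'w' is in text mode, so older Python versions rewrite every newline byte as a carriage return plus newline, which corrupts images. Newer Python versions refuse to write bytes to a text-mode file at all. The sketch below is not taken from bloglines2html.py; `save_binary` and `roundtrip` are illustrative names.

```python
import os
import tempfile


def save_binary(path, data):
    """Write raw bytes unchanged; 'wb' turns off newline translation."""
    with open(path, "wb") as handle:
        handle.write(data)


def roundtrip(data):
    """Write data to a temporary file and read it back, as a quick check."""
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)
    try:
        save_binary(path, data)
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)
```

The first bytes of a PNG file contain both a carriage return and a newline, so `roundtrip(b"\x89PNG\r\n\x1a\n")` is a handy test that nothing gets altered on the way to disk.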
EDIT2: provided some sample .imp conversions, but needed to tweak the resulting .html to split <a name= href= > into <a name= ><a href= > as well as re-save a few images that were in an incompatible format for the python image handler. Oh yeah, created the .opf also. I'll try and automate these (necessary revisions) a bit more, later on.
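The anchor tweak is the kind of thing a short regular expression can automate. This is a minimal sketch, not the script I used: `split_anchors` is a made-up name, it only handles tags where name comes before href, both values are quoted, and there are no other attributes. It also closes the name anchor with an extra </a>, which is my assumption about what keeps the markup balanced.

```python
import re

# Matches <a name="..." href="..."> with either quote style.
ANCHOR = re.compile(
    r'<a\s+name=(?P<name>"[^"]*"|\'[^\']*\')\s+href=(?P<href>"[^"]*"|\'[^\']*\')\s*>',
    re.IGNORECASE,
)


def split_anchors(markup):
    """Turn each combined name/href anchor into a name anchor plus a link."""
    return ANCHOR.sub(r"<a name=\g<name>></a><a href=\g<href>>", markup)
```

For example, `split_anchors('<a name="top" href="#toc">Contents</a>')` gives `<a name="top"></a><a href="#toc">Contents</a>`, and ordinary links with only an href are left alone.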
p.s. Added a REB1100 .rb (in bloglines2html - May 26, 2009.rb.zip) created by eBook Publisher. An rbmake version (as ashkulz prepared) may be more compatible with the REB1100.