Sitescooper: The Basics
#1  ignatz 03-11-2004, 12:48 AM
Herein I will attempt to impart my so far very limited understanding of Sitescooper. The hardest part for me was getting everything working to begin with, so hopefully this will help alleviate that pain for a few others. All my discussion is based on an OS of Windows XP Pro, converting to iSilo and html, and reading on a Palm Tungsten E. But much of it should be readily applicable to other methods. If I make any errors in the following, please do not be shy about correcting me. I've only been at this for a few days myself.

#2  ignatz 03-11-2004, 12:50 AM
Part 1 - Setup

This part can be the hardest and is probably why you see so many posts by folks who just couldn't get it working at all. But follow along with me and you'll probably get by just fine.

First, get Perl. Perl is a scripting language often used on websites for nifty tricks and all sorts of things. I won't pretend to be knowledgeable about it. Sitescooper is written in Perl, so in order for it to run, your OS must have a copy. Most Windows machines will not have it already. (I think that many Linux installs include it by default.) So go here to ActiveState and download the latest build. I download it in the "msi" format, which is a Windows installer file. If that will not run on your machine, you need to update your Windows Installer package. (To be honest, I'm not sure how that's done, though I know I've done it... Perhaps someone can interject on this? You might try Windows Update?) There is also a download option called the "AS Package," which is a zip file. I don't know much about it, but I do see that the sidebar says you cannot uninstall from the AS Package. Anyway, install Perl. Make sure that you answer yes to the question that will put perl in the PATH. You'll see why later. And I think that you need to reboot after this install, though it won't tell you so. Can't hurt.
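If you want to double check that the installer really did put perl in the PATH, here is a minimal sketch in Python. Python is just my pick for the illustration (Sitescooper doesn't need it), and the function name find_perl is made up for this example.

```python
import shutil
import subprocess


def find_perl(name="perl"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_perl()
    if path is None:
        print("perl was not found on the PATH; re-run the installer or fix PATH.")
    else:
        print("perl found at:", path)
        # "perl -v" prints the interpreter's version banner.
        subprocess.run([path, "-v"], check=False)
```

If it reports that perl was not found, opening a new command window (or rebooting, as suggested above) is worth trying before reinstalling.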

Okay, now that's half of it. Now we need the program. Follow along with what Alexander says in the intro post and get the "bleeding edge" version. The download link gives you the latest Windows version in a zip file. Unzip it wherever you like, but put it somewhere that you think it will stay. I put mine in my C:\Program Files\Palm directory.

Do you use iSilo? Make sure that you have a copy of iSiloXC. This is the command line version. You can get it here. Unzip the file and copy the file iSiloXC.exe into the top level of the "sitescooper-3.1.3" folder. This is what Sitescooper will use to convert your scooped files. **EDIT** If this does not work by itself, put a copy of iSiloXC.exe in your "C:\Windows\System32" folder.

So that's pretty much it for setup. But you'll see later that there's lots of tweaking to be done to get things just like you want them.

#3  ignatz 03-11-2004, 12:51 AM
Part 2 - Basic Scooping

So what's in here? Lots of stuff with strange extensions but no .exe files so how the heck do you get it running? Well for those of you who want to leap into the breach, here's the quick way to get started. Write a simple batch file. That's a little text file that runs shell commands. Open your favorite text editor and type this (without the quotes): "perl -misilox". (Note that this will work only for iSilo users. If you want to use Plucker or something else, see below in Part 4.) Save it as "sitescooper.bat" also in the top level of your "sitescooper-3.1.3" folder. The .bat extension is critical, as this is what lets Windows know that there are commands to be run. Now if you double click on the sitescooper.bat, a command window will open and stuff will start happening. A text file will pop up with hundreds and hundreds of sites. Scan through the document and put an "X" in the brackets in front of any site that you want scooped. Don't be in a rush to get a ton of sites right away. It's a huge overwhelming list. Pick one or two that you want to see work and put the X's in. Save the text file and close it. The action will continue in the cmd window. Then it will vanish. Now do a sync. If all has gone well and I haven't made too many errors, your scooped files should be in your RAM and readable from iSilo! Congrats!

So assuming all went well, what now? Well, if you want to add or subtract sites, you can go into your "tmp" directory and directly edit the file there called "site_choices.txt". But there's a better (IMHO) way. If you picked any sites from the site_choices file, edit it and remove them. Create a folder in the top level of the sitescooper directory called "sites". Now browse the folder called site samples. Any site that you want, copy it into the sites folder. Now if nothing is marked in the site_choices file, Sitescooper will read from your sites directory and scoop anything that's there. (Even if something is marked in the site_choices file, I think that Sitescooper will do both those files and the files in the sites directory...) Be aware that a lot of the .site files are outdated and some no longer work. If everything seems right, but it's not working, then try a different site before you give up on it.
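If you end up copying more than a handful of .site files, the copy step can be scripted. This is only a sketch under a few assumptions: the helper name copy_site_files is made up, and the folder and file names in the usage line below are placeholders, so check what the samples folder is actually called in your install.

```python
import shutil
from pathlib import Path


def copy_site_files(samples_dir, sites_dir, names):
    """Copy the named .site files from samples_dir into sites_dir.

    Returns the names that were actually copied; names that are not found
    anywhere under samples_dir are skipped.
    """
    samples = Path(samples_dir)
    sites = Path(sites_dir)
    sites.mkdir(parents=True, exist_ok=True)  # create "sites" if it is missing
    copied = []
    for name in names:
        # Search subfolders too, in case the samples are grouped by category.
        matches = sorted(samples.rglob(name))
        if matches:
            shutil.copy2(matches[0], sites / name)
            copied.append(name)
    return copied
```

Run from the top level of the sitescooper folder, a call might look like copy_site_files("site_samples", "sites", ["example.site"]), where both "site_samples" and "example.site" are hypothetical names.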

#4  ignatz 03-11-2004, 01:06 AM
Part 3 - Basic Troubleshooting

But what if it didn't go well? Well, there's a LOT that can go wrong, considering the many different elements that have to work together to make this happen. Since I'm a newbie myself, I won't try to guess what will go wrong for you. I'll just show you some ways to diagnose what's going on and some places to browse and tweak.

The first thing to do if your sites aren't scooping is to watch the progress of the program. Unfortunately, that's where batch files don't work so well, because as soon as they're done, the window closes and it all happens far too fast to read. But there is another way. From the Start menu, click "Run". Type "cmd". This gives you the same command prompt window that your batch file runs from. Now change to the sitescooper directory. Just type "cd c:\program files\palm\sitescooper-3.1.3" or whatever the actual address is for you. (Note that you can also type "cd c:\prog*\pal*\site*" or something like that. Wildcards rock! But wait until you see the power of regular expressions. Coming soon...) Now type the same command that we put into the batch file: "perl -misilox". Now when it's done, the window stays open and you can scroll back and have a look. This is probably the single easiest way to see where things are going wrong. Most of the time for me, the problem was that nothing was happening here: perl wasn't being found, or the sites were in the wrong place, and so on. Also, if you can't make heads or tails of the output, you can copy it all out and paste it into a help request. Then hopefully someone else can decipher it and let you know what's happening.
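Another option, if you'd like the output saved to a file that you can attach to a help request, is to wrap the command in a small script. The sketch below is only an illustration: run_and_log and scoop_log.txt are names I made up, and the command in the demo is a harmless stand-in that you would replace with your own perl command line from the batch file.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run command (a list of arguments), save its output, return the exit code."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error messages into the same stream
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in your own
    # perl command line from the batch file.
    code = run_and_log([sys.executable, "-c", "print('hello')"], "scoop_log.txt")
    print("exit code:", code)
```

A non-zero exit code or an empty log is itself a clue, since it usually means the command never got as far as scooping anything.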

The next place to check is the documentation. Now I don't want to offend, especially as this whole program impresses the **** out of me, but the docs could be better written. At the least, they could use a better table of contents. But hey, at least you've got some, which is more than you can often say. They are in html form in the "doc" folder. When you open the index file, you will not immediately see any links to the other documents. Never fear, they're way down at the bottom of the page. There are good descriptions of how to install on different systems, which you should double check. Keep in mind that the docs are also dated and don't reflect some of the latest changes. Still, they are how I've learned most of what I've gotten working, so use 'em.

If these don't help, then the next step is to start asking. I'll help if I can, and there are lots of others who know more, I'm sure.

#5  ignatz 03-11-2004, 01:18 AM
Part 4 - Plucker, DOC, and others

Well, as I said, I've only used iSilo and html so far. What I can tell you is where to change things to get your other systems started.

The first step is to edit your "" file. This is the configuration file that Sitescooper defaults to. The key in here is to tell Sitescooper the location of the conversion tool that you are using. You'll want to check the documentation on this, and the config file itself is pretty heavily commented, so it shouldn't be too hard to figure out.

The next thing is to change the command switch. The -misilox switch tells Sitescooper to use iSiloXC.exe to do the conversion. For the other formats substitute the following switches:
-doc for DOC format
-plucker for Plucker
-richreader for Richreader format
-html for html format
Pretty straightforward, right? These are enumerated in the documentation. Of these, I've used -html successfully. The one thing to note for using it is that by default Sitescooper will dump the scoops into the "tmp/txt" subdirectory. If you are having problems with other methods, you can convert to html and then use whatever desktop software you have to convert to your final format.
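To keep the switches straight if you flip between formats, here is a small sketch that builds the command line from a format name. The switches are the ones listed above; everything else is an assumption for illustration, including the names FORMAT_SWITCHES and build_command and the script argument, which stands for whatever script file your own batch command runs.

```python
# Switches exactly as listed in the post above.
FORMAT_SWITCHES = {
    "isilo": "-misilox",
    "doc": "-doc",
    "plucker": "-plucker",
    "richreader": "-richreader",
    "html": "-html",
}


def build_command(script, output_format, extra=()):
    """Return the argument list for running `script` under perl.

    `script` is a placeholder for the script file your batch command runs.
    """
    try:
        switch = FORMAT_SWITCHES[output_format.lower()]
    except KeyError:
        raise ValueError("unknown format: %r" % output_format)
    return ["perl", script, switch, *extra]
```

The list it returns can be joined with spaces for a batch file or handed directly to something that launches programs. Any other switches you use go in extra.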

#6  ignatz 03-11-2004, 01:33 AM
Part 5 - What next?

And where to go from here? This is as far as I can take you this evening. But there's so much to explore. Read the docs on constructing .site files and build or modify some. Actually, just read the docs in general. There's a ton of good stuff in there. For example, the basic html form that the documents are output in can be changed with html templates. You can also tweak the way the files are named, as well as other parameters, with some command line switches. (Personally I don't like the way that Sitescooper defaults to a "Date - Name" convention. If you include the switch -nodates in your command line, the dates will drop out.) There's also the immense power of regular expressions (see the documentation on how to build .site files and also Alexander's post).

There's lots of power here. All we've done in this intro is (hopefully) get the engine started. I've got a little project that I'm tinkering with to warm up and then I hope to tackle some more challenging sites. And I haven't even begun to explore the huge list of sites that have already had .site files written for them...

So please let me know if this document helps you. I hope it answers more questions than it raises. Soon I hope to add a section on changing the output templates, as well as a description of my comics project. I also hope that others will contribute what they've done with Sitescooper and any cool tricks they've found.

#7  TadW 03-11-2004, 08:20 AM
Great tutorial!

I will install Perl/Sitescooper in the next few days and follow your instructions.

#8  ignatz 03-21-2004, 04:02 PM
Anyone have any comments on this intro so far? Has anyone used it successfully? I think Alex followed along here and got his working...

Just a couple of points that I have learned along the way. My work computer worked following only the steps outlined before. However, my home computer gave me a little more trouble. There were three issues and they may help with install problems.

# PilotInstallDir: $HOME/pilot/install
Remove the "#" in front and replace the location with something like the following (note that this line shows where the file is located on my drive; yours may be different):
PilotInstallDir: C:\Program Files\Palm\Ignatz\iSiloI
The "iSiloI" subdirectory inside your Palm install is where iSilo puts files to be installed. It appears to me that Hotsync checks here automatically to see if there is anything to install.
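If you'd rather not hand-edit the file, the same change can be sketched in a few lines of Python. This is just an illustration working on the config text as a string; the function name set_pilot_install_dir is made up, and reading and writing the actual config file is left to you.

```python
import re


def set_pilot_install_dir(config_text, install_dir):
    """Uncomment (or update) the PilotInstallDir line in the config text."""
    pattern = re.compile(r"^[ \t]*#?[ \t]*PilotInstallDir:.*$", re.MULTILINE)
    new_line = "PilotInstallDir: " + install_dir
    # A function is used as the replacement so backslashes in Windows
    # paths are inserted literally instead of being read as regex escapes.
    new_text, count = pattern.subn(lambda match: new_line, config_text)
    if count == 0:
        # No existing line found: append one at the end.
        sep = "" if config_text.endswith("\n") or not config_text else "\n"
        new_text = config_text + sep + new_line + "\n"
    return new_text
```

Make a backup copy of the config file before writing the result back, in case the pattern matches something you didn't expect.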
As I say, I would love to hear feedback or problems to improve this tutorial!

#9  Alexander Turcic 03-24-2004, 11:28 AM
Quote ignatz
As I say, I would love to hear feedback or problems to improve this tutorial!
Ignatz, I think you did a wonderful tutorial. Thank you! It helped even my lazy butt to install Sitescooper and to discover the beauty of it. Here I have two interesting links to support your tutorial:

This is a step-by-step guide to writing a .site file for your favorite site:

This is an explanation of all possible .site parameters:

#10  roychang 10-31-2004, 01:52 PM
Just happened to find this dedicated forum for Sitescooper so decided to contribute something since I simply love Sitescooper.

I've been a Sitescooper user for over a year and have done a couple of guides for the community at SPUG.

Jumpstart Guide to using Sitescooper with Plucker or IsiloX
How to perform Concurrent Scooping with Sitescooper

Hope those interested in trying or finetuning this great tool will find these useful.
