Sunrise XP is a great Windows utility for getting websites, ebooks, and anything else in HTML format onto your PDA for reading in Plucker and Vade Mecum. Laurens Fridael, the creator, has graciously put a lot of time and effort into improving and refining this application, so I thought I'd do my part to show my appreciation to him and the MobileRead community by putting together a "Sunrise XP for Newbies" tutorial, since I've spoken with new users recently who don't know where to start. We had a few forum threads that provided some information each time Laurens created a new alpha or beta version, but there was no single good place to get instructions. A good portion of this text was adapted from Laurens' posts on the MobileRead forums and the program's help documentation, so let's call him a co-author.
Requirements for Sunrise:
- Windows XP / 2000 / 2003
- Pentium II or better
- 128 MB RAM for common websites; 256 MB for larger documents.
- HotSync Manager 4 or 6. (Should also work without HotSync installed, but does not work with older HotSync versions.)
Please forgive the length of this tutorial (when I set out to write it, I never thought it would get this big). I promise you, it will be worth it!
Sunrise is used in conjunction with Plucker, a Palm OS offline reader, and Vade Mecum, a Pocket PC equivalent. These programs are similar to a web browser (like the one in which you are reading this piece) except that they are meant for offline viewing; in other words, you download the entire content first and later read it on your PDA without an internet connection. Those of you who use AvantGo are already familiar with the concept (though I believe AvantGo also lets you surf online). If you don't have either of the readers on your PDA yet, don't bother reading any further; get yourself a copy first:
For Palm folks, all you need from the Plucker website is the Plucker Viewer in whatever language you use. As of this writing, I suggest you use version 1.8, the last stable version. I am going to avoid getting into detail about how Plucker works, since that could be a totally separate piece, but the Plucker website
http://plkr.org has pretty extensive documentation, as does MobileRead itself.
Since I don't have Wi-Fi on my Tungsten E2, Sunrise/Plucker is a great way to provide myself with my favorite web pages, e-books, etc. I run Sunrise nightly with a HotSync, and when I wake up in the morning, I have the day's newspaper, weather, movie listings, favorite blogs, etc. right there for me to view during the day. There are also pre-formatted e-books available in Plucker format. I also keep some reference websites on my PDA permanently for future use. Pretty much anything you can find by surfing online in your browser can be (with a few limitations) parsed with Sunrise XP and viewed later in Plucker.
My experience is exclusively with Plucker, and I have had no exposure at all to Vade Mecum, so this tutorial mostly covers Plucker/Palm use, but I presume that most of this content applies to either application. (I would appreciate it if any Vade Mecum users out there could chime in if there is a notable difference in how it works, or, even better, modify a copy of this tutorial for the benefit of Vade Mecum users.)
Please also note that there were earlier Java-based versions of Sunrise (called Sunrise Desktop). The old Sunrise (and JPluckX, an even earlier creation of Laurens') functions similarly, but for most users, Sunrise XP will be the most user-friendly and versatile one.
Sunrise XP is a desktop PC-based program that downloads web content to your PC when you HotSync, on a schedule you designate, or both. Because all the content is pre-fetched, your viewing experience in the Plucker viewer on your Palm is fast, since there's no active downloading taking place while you are reading. After Sunrise downloads the web content, it strips out all the unnecessary HTML code to make the files smaller and to improve the viewing experience on a small PDA screen. You have various ways to control what downloads you want, when you want them and how you want them formatted. You can also filter out unneeded content (more on that later). Sunrise then takes this modified HTML and puts it into a *.pdb (Palm database) file format.
Once the .pdb files are generated, they wait on your computer to be loaded onto your PDA, either in RAM or on the expansion card. For Palm users, the best way to get them to your PDA is to configure Sunrise to update as part of the HotSync process, through a HotSync conduit. (Don't worry about terminology; this will all be explained later.) In this way, it functions very much like AvantGo, except there are no limitations on the amount of content you can download per day, there are no obtrusive ads, the content is more compact memory-wise and everything's open-source!
Download the most recent (or stable) version of Sunrise XP from
http://www.sunrisexp.com . (As of this writing, April 2006, the most current version is Sunrise XP v. 2.0 beta 7, which is pretty stable.) Save the file sunrisexp-setup.exe (executable file) to your desktop computer (the simplest way is to put it on the desktop in Windows). Click on the executable file to start the installer, and go through all the steps to get Sunrise installed on your PC. At one point you should be asked whether you want the Sunrise XP stub installed on your PDA; say yes. A small program will then be loaded into Palm's Quick Install tool, for installation onto the PDA at the next HotSync. HotSync your PDA to get this stub installed; you will need it for Sunrise to send the files to your PDA. Once everything's fully installed on your PC and Palm, you can delete that .exe installation file if you wish.
Sunrise XP uses files generated by the user with directions on what to download and how. These files are called SXLs, which stands for Sunrise XP List. When you start Sunrise XP for the first time, you'll be presented with a blank SXL. First, though, Figure 1 (see bottom of this section) shows what a sample SXL looks like with all the information already entered, to give you an idea of where we're going. I have also posted this SXL if you would like to work directly with the sample at some point later.
The SXL can have as many or as few documents as you wish; in this example, there are seventeen listed (seventeen rows). Note that by the term "document", I'm referring to a specific group of web pages listed in the SXL; depending on the link depth that is selected (seventh column), each main document could have hundreds of linked web pages below it. As you can see, some of these can get pretty large, a function of downloaded pictures and many, many links! Unless you have lots of room in your RAM, I encourage you to load the Plucker documents on your external memory card (more on that later).
OK, let's go back to your blank SXL. The first thing you'll want to do is edit the default properties for new documents. Select "Edit --> Default Properties" and the Default Document Properties box will come up (Figure 2).
Remember, these are just defaults; ANY of them can be changed later as needed for individual documents on a case-by-case basis. But try to set the defaults to what works best for your needs, as I describe below.
If it's not already displayed, click the first tab, "Main", at the top. Leave all the blank settings under "Document" and "Source" as-is for now; those will change from document to document. Under "Image Settings" you have choices for the quality of images as viewed in Plucker. As you can imagine, the quality, size and color depth of images will influence the size of the document files. To get an idea of how these different image qualities appear, visit
http://plkr.org/gal to see screenshots showing different bits-per-pixel (bpp) color depths. Naturally, your PDA hardware may limit the quality, especially with an older or lo-res unit. You can select no images as a default if you don't foresee yourself displaying them. If you have a recent model PDA with high resolution (320 x 320 or 480 x 320 pixels), I recommend you select "Thousands of Colors (16 bpp)". Otherwise, select what is most appropriate for your hardware. Remember, all these can be changed later.
"Max. Size" refers to the size of the pictures in the web site. For now, let's make the default 300 x 300. Also, let's check the box for "Include Full-Size Alternate Images". This means that if you do have an image that is too large to be displayed full-size on your screen, you can click on that image in Plucker to open the full-size version, which you can then pan left, right, up and down.
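To get a feel for what a 300 x 300 maximum means in practice, here is a minimal Python sketch of fitting an image inside a bounding box. It assumes the aspect ratio is preserved and that images already smaller than the box are left alone; nothing above says exactly how Sunrise XP scales images, and `fit_within` is a hypothetical helper, not part of the program.

```python
def fit_within(width, height, max_width=300, max_height=300):
    """Scale (width, height) down to fit inside the box, keeping the aspect ratio.

    Assumption: images that already fit are not enlarged.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(600, 400))   # a wide image is limited by its width
print(fit_within(400, 1200))  # a tall image is limited by its height
print(fit_within(200, 100))   # a small image is left as it is
```

Under these assumptions, a 600 x 400 picture would come out at 300 x 200, with the original still reachable through the full-size alternate image.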
OK, let's check out the next tab ("Output"). Let's assume that we don't have a default schedule for updates in the first box, since some web sites you might want to download daily, others weekly, others every time, etc. You will want a destination, however, designated in the second box. You'll note that you have a choice to either set up your Sunrise output to be automatically installed at HotSync, or else to simply be put in a folder somewhere on your PC (in which case you would need to manually load the files onto your PDA). Most of us using the Palm OS are going to want to load our documents directly to the PDA/expansion card during HotSync just as AvantGo does, so let's assume the HotSync choice is selected. As Laurens' help files state, Vade Mecum users (Pocket PC) have to output the document to a memory card or use ActiveSync or a third-party tool such as MobSync to transfer the documents to the device.
OK, but now where will we want to put this content? You have to designate a repository for the documents. So under "Destinations", click "New". For Palm users, the drop-down to the right of the word "HotSync" gives you a choice of RAM or expansion card. If you have a card, I encourage you to keep stuff on it unless it's important/private material and warrants backing up. You have other choices if you pick internal RAM: your documents can be launched in the Palm OS launcher independent of Plucker, and you can have the Plucker documents backed up during HotSync. If you click OK, you should now see your destination in the box.
The next tab at the top, "Feed", influences how RSS/Atom feeds are handled. Let's leave this alone for now; you may want to vary this stuff on a case-by-case basis. The last tab, "Advanced", can also stay as-is for now; we'll discuss these features later when we configure documents. If you determine later that particular settings fit your needs better, modify those defaults at that time, and with each new document, your default settings will be automatically entered.
There are also some general program settings on the menu under "View --> Preferences". These settings control how the program interface works and the proxy server settings (which ideally don't need to be messed with). I'll let you figure out the interface settings yourself; they are mostly minor tweaks. The only important setting at this stage is "Maximum Active Updates". You can update from 1 to 5 documents simultaneously, depending upon your processing resources and bandwidth availability. I have an old clunker of a computer, so I only set it for one at a time. YMMV.
OK, now that we have our defaults, we are finally ready to configure individual documents in this SXL. I suggest you do some site reconnaissance first. No, this isn't a military operation; this is web-site reconnaissance. Laurens has set up Sunrise XP to utilize Microsoft Internet Explorer's cookies (and cache too) for various reasons. This is important if you wish to download websites that require registration that is retained in a cookie (in my sample SXL, nytimes.com is one of them). Otherwise, such a website will give your Plucker document a "please login" screen, and nothing else. I'm personally partial to Firefox (isn't everyone?), but use Internet Explorer for site reconnaissance here, to get those cookies registered.
Your starting webpage for each document is called the Source. Remember that you're going to start with the Source and linked webpages in Sunrise XP and parse them down to a form that can be viewed in Plucker. MobileRead has many listings for mobile-optimized sites that will work very well as your Source, along with ways to take advantage of them. The links in the item below are just the tip of the iceberg:
http://www.mobileread.com/forums/sho...?threadid=6227
Another good option for the Source is to use RSS feeds; look for a small orange rectangular logo with the letters RSS or XML. MobileRead provides a lot of information on what RSS is. If you click on a link leading to an RSS feed, you won't actually see a webpage, but rather XML code. But that's OK; the URL for the feed you're looking at is what you want to use in Sunrise.
You'll generally get very good results viewing mobile versions of websites or RSS in Plucker. RSS and (often) mobile versions help you avoid unwanted ads, and file sizes will be much smaller. A third possibility is printer-friendly (often without pictures) or low-bandwidth versions of websites. The downside is that some of these alternatives may be limited in their offerings and may not include graphics. Experiment; you can even view the mobile-optimized sites right in Internet Explorer, though they'll look funny on your PC screen.
Site reconnaissance is important while you're creating the SXL because you need to decide just how much Sunrise XP needs to download to suit your needs. Remember that starting at the Source, Sunrise will download every possible link or graphic that you designate, unless you filter or otherwise restrict what it downloads. For the benefit of both the web host's bandwidth and your computer's / PDA's resources, you want to get what you need, but only what you need, without going overboard. So surf around the website, particularly the Source page. Think about all of the links that are shown on that page, and consider which links are important and what extraneous stuff you shouldn't download. Are graphics needed to enhance your experience, or are they unnecessary? Will you want to select links, then select links within those links? How can we leave out ads? Are there multiple versions of the same thing (one-page version, printer-friendly version, etc.) or other links (FAQ, About Us, Contact Us, etc.) that you really don't care about? Do you want to allow links from outside that website's domain?
OK, so now you know a bit about the Source. We want to create a new document in the SXL. So in the main menu of Sunrise, select Edit --> New Document. The Document Properties box will appear.
(See Figure 3 at bottom of section.) You may already see the URL of the web site you were previously visiting (if you've already copied the URL from the browser to the clipboard). If not, copy and paste the URL for the Source from the browser's address line directly into the line that says "URL / File". You can also have Sunrise XP process a local HTML file on your hard drive for viewing in Plucker by hitting the button with the ellipsis (three dots), which will let you browse for the file.
Give your document a name that simply reflects the Source, like "Bill's Home Page", "digg.com", etc. This name is what will show up in Plucker's Library view. One important feature is the button with the ellipsis to the right of the document name. This gives you the option to put a date stamp in the name of your document, with various formatting choices. If no date stamp is added, every time you update and sync a Plucker document, the new one will have the same name as the old one and overwrite it. (This is fine if you only care to read what's new day to day, and is what I personally do on all my documents.) If you do use the date stamp feature, each time a document is updated, it will have the date appended to the name. If your document name is digg.com, and you update it daily, your documents will have names like digg.com 040906, digg.com 041006, digg.com 041106, etc. This way, you can keep multiple copies on your PDA, though you'll also have to manually delete old copies in Plucker when you no longer want them.
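The date-stamped names in that example follow a month-day-year pattern with two digits each. Here is a small Python sketch that reproduces them; `stamped_name` and the format string are my own illustration of one such formatting choice, not how Sunrise XP implements it.

```python
from datetime import date


def stamped_name(name, day, fmt="%m%d%y"):
    """Append a date stamp to a document name (hypothetical helper)."""
    return f"{name} {day.strftime(fmt)}"


for d in (9, 10, 11):
    print(stamped_name("digg.com", date(2006, 4, d)))
# digg.com 040906
# digg.com 041006
# digg.com 041106
```

Because each day produces a different name, nothing gets overwritten, which is exactly why old copies pile up until you delete them.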
You can categorize your documents if you wish; this simply means that in Plucker, you can view all documents in the main library listing page, or only view one category at a time.
You need to designate a Link Depth, which indicates how many layers down you want to go from the Source; this is probably the most important variable you can select. On color screens, Plucker allows you to see available links (in blue) and unavailable links (in red) so you can see all the links available to you.
If your Source webpage (and the images on it) is all you want and you're not going to want to click on any links, then your Link Depth should be set to 0, and all links on the page will show up as unavailable in Plucker. If you do want to visit links from that Source page and select Link Depth 1, you will be able to click on links within the Source, but once you are one link in, you will not be able to click on any of the links in THAT page (unless those links have already been downloaded as a different link one layer down in the Source). If you select Link Depth 2, you will be able to click on links two layers in, etc. Note that even if you go only two links in, you could get a HUGE amount of content if the Source and other pages have a lot of links, so generally you're not going to want to go beyond Link Depth 2. However, there are several ways to limit downloading a huge number of links. One of the simplest ways is to restrict links by domain, server, or directory. I'm going to use Laurens' example (found in the Sunrise XP help):
In this example, the Source URL is:
http://www.server.com/directory/index.html
- Restrict to domain: Download only links in the "server.com" domain.
- Restrict to server: Download only links on the server "www.server.com".
- Restrict to directory: Download only links in the directory "www.server.com/directory/" or any of its subdirectories.
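For the curious, the three restriction settings can be approximated in a few lines of Python. This is only a sketch of the idea, not Sunrise XP's code: the `allowed` function is hypothetical, the test links are made up, and it assumes the "domain" is simply the last two labels of the host name.

```python
import posixpath
from urllib.parse import urlsplit

SOURCE = "http://www.server.com/directory/index.html"


def allowed(link, mode, source=SOURCE):
    """Return True if `link` passes the given restriction (hypothetical helper)."""
    src, dst = urlsplit(source), urlsplit(link)
    src_host, dst_host = src.hostname or "", dst.hostname or ""
    if mode == "domain":
        # Assumption: the domain is the last two labels, e.g. "server.com".
        domain = ".".join(src_host.split(".")[-2:])
        return dst_host == domain or dst_host.endswith("." + domain)
    if mode == "server":
        return dst_host == src_host
    if mode == "directory":
        # Same server, and the path sits in the Source's directory or below it.
        base = posixpath.dirname(src.path).rstrip("/") + "/"
        return dst_host == src_host and dst.path.startswith(base)
    raise ValueError(f"unknown mode: {mode}")


# Made-up links, for illustration only.
print(allowed("http://news.server.com/today.html", "domain"))            # True
print(allowed("http://news.server.com/today.html", "server"))            # False
print(allowed("http://www.server.com/other/page.html", "directory"))     # False
print(allowed("http://www.server.com/directory/sub/a.html", "directory"))  # True
```

Each setting is stricter than the one before it: anything that passes "directory" also passes "server", and anything that passes "server" also passes "domain".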
Note that images embedded in HTML pages are not subject to the link restriction setting. This behavior is by design, as many sites store their images on another server or domain. Link filters (discussed later) will provide other options to restrict unnecessary content.
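If Link Depth is still fuzzy, it may help to picture it as a breadth-first walk outward from the Source that stops after a set number of hops. The Python sketch below uses a made-up site map (`LINKS`) and a hypothetical `pages_within` function; it illustrates the concept and is not Sunrise XP's actual crawler.

```python
from collections import deque

# Made-up site: each page maps to the pages it links to.
LINKS = {
    "source": ["a", "b"],
    "a": ["c", "b"],
    "b": ["d"],
    "c": ["e"],
    "d": [],
    "e": [],
}


def pages_within(start, max_depth, links=LINKS):
    """Return {page: depth} for every page reachable within max_depth hops."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # links on this page are not followed (red in Plucker)
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


print(pages_within("source", 0))  # {'source': 0}
print(pages_within("source", 1))  # {'source': 0, 'a': 1, 'b': 1}
print(pages_within("source", 2))  # adds 'c' and 'd' at depth 2
```

At depth 1, page "a" links to both "c" and "b". The link to "c" would be unavailable, but the link to "b" works because "b" was already fetched one layer down from the Source, which is the exception described earlier.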
You have a checkbox that gives you the option to only update the document if the Source has changed. This saves processing power and bandwidth. Is there really a point in downloading everything again if the Source hasn't changed? If this setting is checked, Sunrise will make that check and not modify the document if unchanged. That's why there are two columns in the main SXL window labeled "Last Update" and "Last Modified". Last Update is the last time Sunrise checked for changes. Last Modified is the last time there actually was a change made.
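The difference between the two columns can be shown with a small Python sketch. How Sunrise XP actually detects a change is not described here; this version assumes a simple comparison of content fingerprints, and `check_source` and the `record` dictionary are hypothetical.

```python
import hashlib
from datetime import datetime


def check_source(record, content, now):
    """Record a check; return True only if the content actually changed.

    Assumption: a change is detected by comparing SHA-256 digests.
    """
    digest = hashlib.sha256(content).hexdigest()
    record["last_update"] = now          # every check moves Last Update
    if record.get("digest") != digest:
        record["digest"] = digest
        record["last_modified"] = now    # only a real change moves Last Modified
        return True
    return False


record = {}
print(check_source(record, b"front page v1", datetime(2006, 4, 9, 6, 0)))   # True
print(check_source(record, b"front page v1", datetime(2006, 4, 10, 6, 0)))  # False
print(record["last_update"], record["last_modified"])
```

After the second check, Last Update shows April 10 while Last Modified still shows April 9, since nothing new was found.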
The image settings were discussed earlier. Modify as appropriate for your Source page and PDA type. Keep in mind that if you go outside of the Source's domain for links, those pages might be different as far as images, appearance, etc. Unless space or hardware is an issue, I suggest you keep the colors as high as possible.
The second tab, "Output", allows you to designate when and where updates will be generated and saved. The check box that says "disable automatic and scheduled updates" means that you want to initiate them manually, or not at all. Perhaps you follow someone's blog daily and they are going on a three-month absence without postings. You could use this setting to temporarily discontinue the automatic updates, yet keep the document in your SXL for later when the blogger is back. See Figure 4 at the bottom of this post.
Scheduling information is put into the top box by clicking "New" and setting a schedule. This is useful if, say, your source website only changes daily, weekly, or monthly. You can also set updates for smaller time increments by using the hourly settings. I like the New York Times Magazine section, which is only published on Sundays, so my schedule for that is "Every Sunday at 12 AM". It only gets updated the first time each week that Sunrise XP runs after midnight Sunday morning. See Figure 5 at the bottom of this post.
If you leave the schedule box blank, Sunrise XP will always check the Source for updates every time it is run.
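The "Every Sunday at 12 AM" behavior can be sketched in Python as a simple question: has the document been updated since the most recent Sunday midnight? The `weekly_update_due` function is hypothetical and only mirrors the behavior described above; it is not taken from Sunrise XP.

```python
from datetime import datetime, time, timedelta


def weekly_update_due(last_update, now):
    """True if no update has happened since the most recent Sunday 12 AM."""
    days_since_sunday = (now.weekday() + 1) % 7  # Monday is 0, Sunday is 6
    sunday_midnight = datetime.combine(
        now.date() - timedelta(days=days_since_sunday), time.min
    )
    return last_update is None or last_update < sunday_midnight


tuesday = datetime(2006, 4, 11, 8, 0)  # April 9, 2006 was a Sunday
print(weekly_update_due(datetime(2006, 4, 8, 22, 0), tuesday))  # True
print(weekly_update_due(datetime(2006, 4, 9, 1, 0), tuesday))   # False
```

So if Sunrise last ran on Saturday night, the Tuesday run picks up the new issue; if it already ran early Sunday morning, later runs that week skip it.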
The destination information on the Output tab should reflect your default setting from before. Change it if necessary for this particular document. (For example, you usually put your documents on the expansion card, but you wish to store this particular document in the internal RAM.)
The Feeds tab really only applies if you are using RSS/Atom feeds as described previously in Section 5 above; you can skip this tab (and this section of the tutorial) for regular websites, as these settings have no effect on regular HTML websites. The first two items, logo and blurb, are simply the title/logo of the feed and a brief description of the feed. Do what feels good. The next drop-down selector, "Layout", gives you three choices. (See Figure 6 at the bottom of this post.) If you do use RSS feeds, experiment to see which configuration works best for you on each feed:
- Single Page List: This choice basically makes all the separate RSS summary listings into a single Source page when you open the document in Plucker, kind of like a simplified home page without any extraneous links, graphics, etc. You then have access to the links to read the full article if you want. This option works well if there isn't an inordinate number of items in the RSS list and the generated Source page is not unwieldy.
- Single Page List Plus Index: This choice is the same as the Single Page List except that you have an index at the beginning with link names. If you have a lot more RSS items and/or the descriptions are longer, this index at the beginning lets you read the title of the RSS item on the Source page, then jump forward to the full text lower down on the Source page.
- Multiple Page Plus Index: This choice starts you out on a Source page that only contains an index listing each of the items in the feed. From the index, you can navigate to each item description on its own separate page. This choice also adds navigational links at the top and bottom of each page, so you can simply look at the feed items in order and jump to the next one as you wish. This choice is recommended for feeds with full entries that have a lot of content. This is how I read Palm Addict, which often has 60 or 70 items in the feed at any one time, and items are often lengthy.
The other feed settings should be reasonably apparent, and affect how feeds are saved (to keep track of whether content changes).
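To make the layouts a bit more concrete, here is a rough Python sketch of the "Single Page List Plus Index" idea: read the items out of a feed, then put an index of titles ahead of the item summaries. The feed text, `read_items` and `single_page_with_index` are all made up for illustration, and the output is plain text rather than anything Plucker-specific.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed.
FEED = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>First post</title><link>http://example.com/1</link>
        <description>Summary one.</description></item>
  <item><title>Second post</title><link>http://example.com/2</link>
        <description>Summary two.</description></item>
</channel></rss>"""


def read_items(xml_text):
    """Pull title, link and summary out of each <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def single_page_with_index(items):
    """Index of titles first, then every item's summary on the same page."""
    index = [f"{n}. {it['title']}" for n, it in enumerate(items, 1)]
    body = [f"{it['title']}\n{it['summary']}\n{it['link']}" for it in items]
    return "\n".join(index) + "\n\n" + "\n\n".join(body)


print(single_page_with_index(read_items(FEED)))
```

Dropping the index lines would give the plain Single Page List, and writing each entry of `body` to its own page would be closer to Multiple Page Plus Index.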
There are a few important settings on the advanced tab that you should consider (See Figure 7, bottom of this post).
Just like your regular PC browser, Sunrise XP can cache downloaded content, and it's suggested that you select the box to do so if not already checked. The cache is the same one that Internet Explorer uses. By checking the cache, Sunrise will not download content if you already have the identical file cached on your PC. This will make the overall process faster as well as save bandwidth for the web site's host. The size of this cache is controlled through Internet Explorer's menu (or the Control Panel's Internet Options menu).
As discussed earlier, Sunrise XP can use your Internet Explorer cookies, so check this box if the Source's website uses cookies to log in (such as a newspaper, MobileRead, etc.).
Priority is something I don't use, but basically you can prioritize Sunrise's sequence for downloading documents. Unless you prioritize them, Sunrise will update documents in alphabetical order of the document's name (first item in the Main tab).
I always have the "Include URL info" box checked. What this means is that if you are in a Plucker document and you want to know the specific URL of what you're reading for future reference, Plucker will be able to display it. It's also helpful if you want to view the URL for a website that is beyond your link depth, or has Flash or other content that Plucker cannot display. In either case, Plucker can copy the URL to the PDA's Memo Pad, a very useful feature. Laurens' instructions state that "Include URL info" should not be checked if your Source document is a local file on your hard drive (which can also be processed by Sunrise for viewing on your PDA).
The "Don't display unresolved links" checkbox is a matter of personal taste; I never check it. As noted earlier, Plucker can display unreachable links (in red) or accessible links (in blue). If you check this box, the unreachable links will not be visible as links at all, but will just appear like plain text, which might be less distracting for you. I like to know if there's a link I can't reach, because I might want to find out the URL for later viewing.
The "Link Filters" box is a VERY important setting. We've already had the option to filter out content from different domains, etc. Here, you can designate specific URLs that you don't want to download, or provide Sunrise with wildcards that filter URLs that have a specific pattern of characters. If you were good with your earlier site reconnaissance, you'll know which links you DON'T want and which links you DO want. Link filters are processed in the order you present them. The easiest way to illustrate what the filters do is to look at the link filters that I use for all New York Times downloads (again, Figure 7).
Here's what each filter in the image does from top to bottom:
Basically, I want to read the New York Times articles, and nothing else. This first filter limits most of my downloads to actual articles. Any page with a URL that starts with
http://www.nytimes.com/20* will be downloaded. (The asterisk indicates a wildcard; basically, it represents any possible text.) This is how almost all of the NY Times article URLs are set up. For example, the first article on today's page has the URL:
http://www.nytimes.com/2006/04/09/world/asia/09cnd-nepal.html so this filter would allow this link to be downloaded. Any links that don't follow this convention are probably not content I want. The main section pages (National, Washington, Sports, etc.) do NOT follow this URL convention, but they seemed superfluous since I mostly want to see content linked from the Source (front) page, so I intentionally had the filter work as it does. (See Figure 8.)
The next filter is probably not necessary, but I wanted to be sure that all images come through, so by using *com/images, any URLs that end with those characters will be included.
I noticed after using Sunrise for a while that none of the travel articles were ever downloaded. At one time, the New York Times had a different convention for assigning URLs in the travel section, though this no longer seems to be the case. The
http://travel2* wildcard enabled me to get all the travel articles.
Many articles on the New York Times website are stretched over multiple pages, and you are provided a link for a single-page version. You are also often provided a printer-friendly version. Both of these alternate versions are superfluous, since they have the same content as the primary pages, so I didn't want to download them. The URLs of all printer-friendly versions of articles end with the text pagewanted=print, so I had the filter ignore those URLs. Similarly, pagewanted=all gives you the single-page duplicate, so I set up the wildcard to filter out those URLs as well. I'll leave it to you to figure out what the filters */fashion/* and *privacy* filter out.
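If you want to test wildcard patterns like these before committing them to an SXL, Python's fnmatch module uses the same asterisk idea. The sketch below takes the patterns from my New York Times filters, but the way it combines them is an assumption on my part: a URL is kept if it matches no exclude pattern and at least one include pattern. Sunrise XP processes filters in the order listed, and its exact rules may differ from this simplification. The `wanted` function is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns from the filters described above; how they combine is assumed.
INCLUDE = ["http://www.nytimes.com/20*", "*com/images", "http://travel2*"]
EXCLUDE = ["*pagewanted=print", "*pagewanted=all", "*/fashion/*", "*privacy*"]


def wanted(url):
    """Keep a URL if no exclude pattern matches and some include pattern does."""
    if any(fnmatchcase(url, pattern) for pattern in EXCLUDE):
        return False
    return any(fnmatchcase(url, pattern) for pattern in INCLUDE)


article = "http://www.nytimes.com/2006/04/09/world/asia/09cnd-nepal.html"
print(wanted(article))                        # True
print(wanted(article + "?pagewanted=print"))  # False
print(wanted(article + "?pagewanted=all"))    # False
```

Running a handful of real URLs from your site reconnaissance through a check like this is a quick way to catch a pattern that is too greedy or too narrow.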
You'll have to do a little trial-and-error if you have a lot of things you'll want to filter in or filter out, but once you're set up (as long as the web site doesn't change its conventions for URLs), it works great. To create a filter, hit the "New" button and you'll get the link filter box, which gives you various choices. For the pattern, you can enter either a regular expression (a more flexible pattern syntax than wildcards) or a Wildcard, which uses asterisks as I did above to represent anything. I never change the "Filter all Links" drop-down, but you could have it filter specific HTML tags. (Don't worry, I barely know what that means myself.) You then need to decide whether you want to only include or only exclude URLs following your wildcard designation. I've never used the "rewrite links matching this pattern" setting; perhaps Laurens can help explain that one.
Update: DTM has helpfully included an explanation about how link rewriting works; it's in his post further down, below some of the comments here.
At this point, if you have different documents that you still want to add to the SXL, go back to Step 6 and add new documents, filling in all the needed data from the four tabs.
Once you have all your documents entered into the SXL, you're going to want to save the SXL somewhere on your PC. I keep all my SXLs in a folder called
/My Documents/Plucker.