[Cataloguer's Toolbox] URL checking procedures
For maintaining catalogued Internet resources

The procedures that follow can be used to generate complete listings of Internet resources from the Library's Unicorn catalogue for use with an automated link checker such as LinkBot. Although a number of steps are involved, the task is not difficult. Given the current state of the Internet, it should be seen as an essential one, since the average lifespan of an item on the Web in 1997 is estimated at around 45 days! For that reason, it is suggested that this report be run at least every two to three months.

As the number of such resources in the catalogue increases, so too will the amount of time it takes to keep these descriptions current. We can only hope that the institution of URNs or other devices to make addresses server-independent, or a vendor-implemented automatic means of regularly checking the status of MARC field 856, catches on before we reach the point at which we can no longer keep up with this task.


Run Unicorn report

There are two steps to creating the file within Unicorn: 1) create a report to locate any described Internet objects (this can be done once and set to run NEVER so that it can be copied every time you want to run it), and 2) format and run the report to save it to disk.

To set up the initial report:

  1. Enter the reports facility within Unicorn
  2. Select 1 Running reports, then Create
  3. From within the Bib group select bibliography
  4. Choose Review, then 3 Selection
  5. At the "Search string" screen, choose Change and enter http and hit return
  6. Add any other Internet protocols found in 856 (ftp, gopher, etc.) to the search list as appropriate. Hit the return key to move on to the "Catalog selection phase"
  7. At "Include records based on the following catalog characteristic" choose Next
  8. At "Include records based on the following callnum characteristic" choose Next
  9. At "Include records based on the selected item characteristic" choose Next
  10. At "Select the phase to review and/or modify the answers" choose 4 Sorting
  11. At "Select what you want to sort by" the only choice that makes any sense is 3 Title, since this is the only element all records will actually have. Return your way back to the "Select the phase" screen.
  12. Choose 5 Formatting, then Change
  13. At "Select a format for printing items" choose 2 Catalog shelflist
  14. At "Select the catalog level information to add/remove to/from the output" add Tagged catalog record. Remove any other phrases in this section.
  15. At "Select the catalog entry list to use" select 2 Custom list. From the listing offered choose 73 (245) and 218 (856). [Note that the list numbers may change over time as fields are added or removed; the field tags will not.]
  16. At "Select what information about call number to print" choose 4 No call number information
  17. At "Select the item information to add/remove to/from the output" select whatever default text is supplied and remove it. No item information is needed for this report.
  18. At "Would you like to print a record per page?" hit enter
  19. At "Select filtering options for shadowed information" select 1 Public information only
  20. Hit return until you are up to the Select operation screen. Select Schedule. Give your report a name.
  21. At "Select when the report should run" choose 4 Never. This saves the report for use later on.

For subsequent runs of the URL report:

  1. Enter the reports facility within Unicorn
  2. Select 1 Running reports, then Copy
  3. Choose the report you saved above and hit enter.
  4. At "Select operation" choose Schedule
  5. Enter a name for the report and hit enter
  6. Run this report ASAP or schedule it for later in the day using 2 Future.
  7. Use the enter key to bring you back out of reports.

To capture the report:

  1. Enter the reports facility within Unicorn
  2. Select 2 Finished reports, then Download
  3. Choose the report you named under #5 of the second section above
  4. At "Select the format option" choose Change to change the page and line lengths
  5. Change the page length to the maximum (1000) and the line length to 250. This is important, as otherwise some of your URLs might be wrapped by the report format. A space is inserted into any HTML text wrapped onto a new line, thus rendering the URL ineffective.
  6. Use the enter key to bring you forward to the "Please enter a valid PC file name" command. Enter a filename by which you will find the file afterwards in your default Sirsi directory (usually c:/sirsi/chessinf). Hit return and you will see the familiar Unicorn load bar.
  7. When you are brought back to the "Select operation" menu, you are finished with this stage of the operation.

Edit and markup resulting report

Since most automatic link checkers work from HTML documents, the next step is to mark up the Unicorn report in HTML. For this you can use WordPad, WordPerfect, Word, or whatever word processor you have handy that can do "search and replace" operations and output plain (ASCII or DOS) text.
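
Where a scripting language is available, the same markup can be produced without a word processor. The sketch below is only an illustration, not part of the original procedure: it assumes the downloaded report is plain text in which each URL appears unbroken on one line, and it wraps every http, ftp, or gopher address it finds in an HTML anchor. The function name mark_up_report, the default page title, and the regular expression are all assumptions made for the example.

```python
import html
import re

# Assumption: URLs in the report start with one of these schemes and
# contain no whitespace (which is why the report must not wrap them).
URL_PATTERN = re.compile(r"(?:https?|ftp|gopher)://[^\s<>\"]+", re.IGNORECASE)


def mark_up_report(report_text, title="Catalogue URL report"):
    """Return an HTML page in which every URL in the report is a link."""
    body_lines = []
    for line in report_text.splitlines():
        pieces = []
        position = 0
        for match in URL_PATTERN.finditer(line):
            # Drop trailing punctuation that is unlikely to belong to the URL.
            url = match.group(0).rstrip(".,;)")
            url_end = match.start() + len(url)
            # Text before the URL is escaped so that & and < stay literal.
            pieces.append(html.escape(line[position:match.start()]))
            href = html.escape(url, quote=True)
            pieces.append('<a href="{0}">{0}</a>'.format(href))
            position = url_end
        pieces.append(html.escape(line[position:]))
        body_lines.append("".join(pieces))
    # <pre> keeps the report's own line breaks and spacing.
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>\n{}\n</pre></body></html>\n"
    ).format(html.escape(title), "\n".join(body_lines))
```

To use it, read the downloaded report into a string, pass it to mark_up_report, and write the result to a file with an .htm extension. The anchors are the only markup a link checker needs; the rest of each record is kept so that a broken link can be traced back to its title.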


Run LinkBot

LinkBot is a Windows-based link checking tool which can be used against both networked documents, such as Internet-accessible Web pages, and text files stored locally on your PC. It examines the marked-up URLs in these documents for syntax problems, queries the servers housing any linked documents, and returns a set of HTML files which describe what the remote hosts reported back to it. These are neatly categorized for manual cleanup. The procedures below assume that the file being checked is stored on your PC rather than on a publicly accessible Web server. If, for some reason, you have placed the finished file on the Web, simply enter the URL in the Location box rather than the filename.

  1. Start up LinkBot from your Windows workstation
  2. Enter the complete path and filename (or URL if you have placed the resulting file on a Web server) you wish to have checked in the LinkBot location window.
  3. Checking the marked-up Unicorn file will take anywhere from five minutes to an hour, depending on the size of the file, the speed of your Internet connection, traffic on the net, and other factors. Before 10:30am is probably the best time to do this.
  4. The finished report will open in Netscape (or whatever your default Windows browser is), with the results presented in frames.
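
For readers curious about what a link checker does with the marked-up file, the general idea can be sketched in a few lines. This is not LinkBot's own code or method, only a minimal illustration under stated assumptions: links are taken from the href of each anchor tag, a URL with no scheme or host is counted as a syntax problem, and every other URL is requested once, with the results grouped by the status that comes back. The names AnchorCollector, fetch_status, and check_links are invented for the example.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_status(url, timeout=20):
    """Query the remote server and return a short status string."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return "ok ({})".format(getattr(response, "status", None))
    except HTTPError as error:
        # The server answered, but with an error code such as 404.
        return "http error ({})".format(error.code)
    except (URLError, OSError, ValueError) as error:
        # No usable answer: unknown host, timeout, unsupported scheme, etc.
        return "unreachable ({})".format(error)


def check_links(html_text, fetch=fetch_status):
    """Group every linked URL under the status reported for it."""
    collector = AnchorCollector()
    collector.feed(html_text)
    report = {}
    for url in collector.links:
        try:
            parts = urlparse(url)
            well_formed = bool(parts.scheme and parts.netloc)
        except ValueError:
            well_formed = False
        status = fetch(url) if well_formed else "bad syntax"
        report.setdefault(status, []).append(url)
    return report
```

The fetch argument lets the grouping logic be tried out with a stand-in function instead of real network requests. Python's standard library has no gopher support, so gopher addresses would be reported as unreachable by this sketch and would still need checking by other means.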

URL: http://www.mun.ca/library/cat/URLmaint.htm
Last revised: 30 October 1997
Document author: Charley Pennell