![[Cataloguer's Toolbox]](toolbox.gif)
URL checking procedures
For maintaining catalogued Internet resources
The procedures which follow can be used for generating complete listings
of Internet resources from the Library's Unicorn catalogue for use with an automated
link checker such as LinkBot. Although a number of steps are involved, the task is
not difficult. Given the current state of the Internet, however, it should be seen
as essential: 1997 estimates put the average lifespan of an item on the Web at
around 45 days! For that reason, it is suggested that this report be run at least
every two to three months.
As the number of such resources in the catalogue increases, so too will the amount
of time it takes to keep these descriptions current. We can only hope that the
institution of URNs or other devices to make addresses server-independent, or a
vendor-implemented automatic means of regularly checking the status of MARC
field 856, catches on before we reach the point at which we can no longer keep up
with this task.
Run Unicorn report
There are two steps to creating the file within Unicorn: 1) creating a report to
locate any described Internet objects (this can be done once and set to run
NEVER so that it can be copied every time you want to run it), and 2)
formatting and running the report to save it to disk.
To set up the initial report:
- Enter the reports facility within Unicorn
- Select 1 Running reports, then Create
- From within the Bib group select bibliography
- Choose Review, then 3 Selection
- At the "Search string" screen, choose Change and enter http and hit return
- Add any other Internet protocols found in 856 (ftp, gopher, etc.) to the search list as
appropriate. Hit the return key to move on to the "Catalog selection phase"
- At "Include records based on the following catalog characteristic" choose Next
- At "Include records based on the following callnum characteristic" choose Next
- At "Include records based on the selected item characteristic" choose Next
- At "Select the phase to review and/or modify the answers" choose 4 Sorting
- At "Select what you want to sort by", the only choice that makes sense is 3
Title, since this is the only element all records will actually have. Return your way back
to the "Select the phase" screen.
- Choose 5 Formatting, then Change
- At "Select a format for printing items" choose 2 Catalog shelflist
- At "Select the catalog level information to add/remove to/from the output" add
Tagged catalog record. Remove any other phrases in this section.
- At "Select the catalog entry list to use" select 2 Custom list. From the listing
offered choose 73 (245) and 218 (856). [Note that the list numbers
may change over time as fields are added or removed; the field tags will not.]
- At "Select what information about call number to print" choose 4 No call number
information
- At "Select the item information to add/remove to/from the output" select whatever default
text is supplied and remove it. No item information is needed for this report.
- At "Would you like to print a record per page?" hit enter
- At "Select filtering options for shadowed information" select 1 Public information only
- Hit return until you are up to the Select operation screen. Select Schedule. Give
your report a name.
- At "Select when the report should run" choose 4 Never. This saves the report for
use later on.
For subsequent runs of the URL report:
- Enter the reports facility within Unicorn
- Select 1 Running reports, then Copy
- Choose the report you saved above and hit enter.
- At "Select operation" choose Schedule
- Enter a name for the report and hit enter
- Run this report ASAP or schedule it for later in the day using <2> Future.
- Use the enter key to bring you back out of reports.
To capture the report:
- Enter the reports facility within Unicorn
- Select 2 Finished reports, then Download
- Choose the report you named in the fifth step of the second section above
- At "Select the format option" choose Change to change the page and line lengths
- Change the page length to the maximum (1000) and the line length to 250. This
is important, as otherwise some of your URLs might be wrapped by the report format.
A space is inserted into any HTML text wrapped onto a new line, thus rendering the URL
ineffective.
- Use the enter key to bring you forward to the "Please enter a valid PC file name" command.
Enter a filename by which you will find the file afterwards in your default Sirsi directory
(usually c:/sirsi/chessinf). Hit return and you will see the familiar Unicorn load bar.
- When you are brought back to the "Select operation" menu, you are finished with this stage
of the operation.
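As a sanity check on the downloaded file, a short script can flag any URL that was still wrapped by the report format. The sketch below is in Python and is not part of the Unicorn or LinkBot procedure. It assumes that each 856 field appears in the report as |u, then the URL, then |2, which is the pattern the search-and-replace steps in the next section rely on. The function name and the sample text are hypothetical.

```python
import re

# Assumption: a URL starts after "|u" and runs up to the next "|2".
# Only http and ftp addresses are considered here.
URL_SUBFIELD = re.compile(r"\|u((?:http|ftp):[^|]*)\|2")


def find_wrapped_urls(report_text):
    """Return every |u value that contains whitespace (a wrapped URL)."""
    wrapped = []
    for match in URL_SUBFIELD.finditer(report_text):
        url = match.group(1)
        # A line break or inserted space inside the URL means it was wrapped.
        if re.search(r"\s", url):
            wrapped.append(url)
    return wrapped


# Hypothetical report fragment: the second URL was wrapped onto a new line.
sample = (
    "856  |uhttp://example.org/short|2http\n"
    "856  |uhttp://example.org/a/very/long\n"
    " path.html|2http\n"
)

for url in find_wrapped_urls(sample):
    print("Wrapped URL:", repr(url))
```

If anything is reported, downloading the report again with a longer line length is likely simpler than repairing the URLs by hand.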
Edit and mark up the resulting report
Since most automatic link checkers work from HTML documents, the next step is
to mark up the Unicorn report in HTML. For this you can use WordPad,
WordPerfect, Word, or whatever word processor you have handy that can do "search
and replace" operations and output plain (ASCII or DOS) text:
- Retrieve finished URLchecker report from default Unicorn directory on local PC
- Add html header and footer
- Perform the following "search and replace" operations so that each 856 |u becomes a
link the link checker can follow:
replace |uhttp: with <a href="http:
replace |2http with ">URL</a>
replace |uftp: with <a href="ftp:
replace |2ftp with ">URL</a>
- Save the file with an .htm file extension (e.g. "urls9710.htm")
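The same markup can be scripted instead of done in a word processor. The following is a minimal Python sketch, assuming the report contains the |u and |2 subfields exactly as in the replacements listed above. The header and footer text, the function names, and the default encoding are placeholders chosen for illustration; the procedure itself does not specify them.

```python
from pathlib import Path

# The four search-and-replace pairs from the list above, applied in order.
REPLACEMENTS = [
    ("|uhttp:", '<a href="http:'),
    ("|2http", '">URL</a>'),
    ("|uftp:", '<a href="ftp:'),
    ("|2ftp", '">URL</a>'),
]

# Placeholder header and footer (an assumption; any valid HTML wrapper works).
HEADER = "<html>\n<head><title>Catalogued URLs</title></head>\n<body>\n<pre>\n"
FOOTER = "</pre>\n</body>\n</html>\n"


def markup_report(report_text):
    """Turn 856 |u ... |2 subfields into HTML links and add header/footer."""
    for old, new in REPLACEMENTS:
        report_text = report_text.replace(old, new)
    return HEADER + report_text + FOOTER


def convert_file(source, destination, encoding="latin-1"):
    """Read the plain-text report and write the marked-up .htm file.

    The encoding default is an assumption about the downloaded file.
    """
    text = Path(source).read_text(encoding=encoding)
    Path(destination).write_text(markup_report(text), encoding=encoding)


# Hypothetical one-line example of a tagged 856 field.
print(markup_report("856  |uhttp://example.org/guide.html|2http\n"))
```

Calling `convert_file("urlreport.txt", "urls9710.htm")` would write the marked-up file, where the input filename is whatever name was entered when the report was downloaded. The sketch does not escape characters such as `<` or `&` elsewhere in the report, which matches what a plain search and replace would do.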
Run LinkBot
LinkBot is a Windows-based link checking tool which can be used against both networked
documents, such as Internet-accessible Web pages, and locally stored (on your PC) text files.
It examines the marked-up URLs in these documents for syntax problems, goes out and queries
the servers housing any linked documents, and returns a set of HTML files which describe what
the remote hosts reported back to it. These are neatly categorized for manual cleanup. The
procedures below assume that the file being checked is stored on your PC as a file rather than
on a publicly-accessible Web server. If, for some reason, you have placed the finished file on
the Web, simply enter the URL in the Location box rather than the filename.
- Start up LinkBot from your Windows workstation
- Enter the complete path and filename (or URL if you have placed the resulting file on a
Web server) you wish to have checked in the LinkBot location window.
- Checking the marked-up Unicorn file will take anywhere from five minutes to an hour,
depending on the size of the file, the speed of your Internet connection, traffic on
the net, and other factors. Before 10:30am is probably the best time to do this.
- The finished report will open up a Netscape (or whatever your default Windows browser
might be) session with the results presented in frames.
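To illustrate the kind of work a link checker does with the marked-up file, here is a rough Python sketch. It is not how LinkBot is implemented. It collects the href values from the file, applies a simple syntax test, and, for http addresses only, asks the server for a status code. All names are hypothetical, and the syntax test is deliberately crude (known scheme, a host name, no whitespace).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in the marked-up report."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


def has_valid_syntax(url):
    """Crude check: no whitespace, an http or ftp scheme, and a host name."""
    if any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "ftp") and bool(parts.netloc)


def check_http_status(url, timeout=10.0):
    """Ask the server about an http URL; returns a status code or an error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:
        return str(err.code)
    except OSError as err:  # URLError and timeouts are OSError subclasses
        return f"error: {err}"
```

A typical use would be to read the .htm file, call `extract_links` on its contents, set aside any URL that fails `has_valid_syntax`, and pass the remaining http URLs to `check_http_status`. Some servers reject HEAD requests, so a reported error is a prompt for manual checking, not proof that the resource is gone.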