Cataloguing beyond the walls: APLA 1997
The organization of Internet information resources for library use has followed an evolutionary path paralleling the growth of the Internet itself. The earliest interest in this information seems not to have come from library selectors but rather from reference librarians who ventured out onto the net for their own information needs or to package information about library services, such as hours of operation, directories of staff, and finding aids, for the public. This information, previously packaged as Hypercard stacks or Gophers, has now been almost entirely moved over to HTTP.
Among the first external resources described in the early days of library Gopher servers were government and other library, university, or research sites that might prove to be useful in answering general reference questions at the desk or by phone. These were often links to organizations rather than to individuals or texts, as the individuals accessible via gopherspace tended to be information technologists more often than information creators, and plain text on a monochrome computer screen was not considered to be an esthetically rewarding experience.
With the move over to Web servers and an increase in those servers' functionality, library Webmeisters were able to get more creative in presenting information to the public. The value of a creative computer nerd (or at least someone who knew HTML and had some network savvy) to the library naturally shot up overnight. Library Web pages began pointing to electronic journals and texts as well as reference tools such as the Canadian Patent Database or the Department of Finance Canada Home Page. However, the work of listing such resources and maintaining their links was quickly overwhelming sites. The University of North Carolina at Wilmington's Randall Library titles linked to Internet Resources is a case in point. These lists might be expected to work where the number of sites is limited in scope and therefore in number, but given current trends in network development, is this terribly likely?
As the number of these Internet resources has grown, so has the demand for some means of organizing them for use. To this end we have divided our lists into smaller segments by subject, as in Memorial's Databases on the Internet, or by title, as in the ARL directory of electronic journals and newsletters. This band-aid approach to discovery of those objects selected by reference librarians cannot be expected to retain its functionality much longer. Adding local search engines like Glimpse or Excite doesn't help much either, at least not without adding further descriptive information to the list of titles. After all, what does the title of Time Magazine tell you about its contents?
Some have suggested that there is no point in libraries trying to provide access to such a fast-moving target as the Web at all, since objects in that universe disappear or move constantly while new ones of value arise without any approval forms arriving at our door. Many point to the Web-based search engines as the logical successors to the library Web page or OPAC in organizing the Internet. The search engines use robots to query servers and follow links to new resources constantly and will likely always be more up-to-date than understaffed libraries in this regard. With a few exceptions (Magellan rates the usefulness of sites it lists) though, these tools make no qualitative judgment about the resources described, nor do they attempt to provide any but the most general subject structure to the Web. These also suffer from vocabulary problems, unpredictability of results, and a lack of descriptions sufficient for making informed choices about a result's relevance to one's search. Arlene Taylor, in a humorous presentation to the OCLC Internet Cataloging Project Colloquium in 1996 (Taylor and Clemson 1996), noted some of the difficulties she encountered in searching her name on Alta Vista, including this posting:
All My Children Update for Thursday September 7, 1995
Update for AMC Thursday September 7, 1995. By Ashley Lambert-Maberly. Welcome to the Thursday update! SPERM WAILS. In which Taylor wonders whether Dreck's magic sperm did the trick ... in which Arlene wants Alec to give her more of his special stuff ...in...
http:01/30/96/purplenet.com/soaps/amc/stories/sep95/sep07r.html
- size 7K - 13 Sep 95
There are other problems with automatically created indexes to the Web which are more difficult to handle. Resources which may be available only within a restricted community, such as those behind network firewalls and commercially available texts, indexes or ejournals, plus those which are non-textual in format, such as GIS and statistical data, are not picked up by the search engines. These can only be described and accessed by the holding or licensing institution itself, making for multiple finding tools being used for locating materials in the same format.
Organization and retrieval problems with Gophers, CWIS, search engines, Web lists, and the like have caused libraries to increasingly turn to their traditional allies in information organization, metadata within database structures, for help in organizing the Web. Depending on a particular library's outlook on non-traditional forms of information and its available system resources (both staff and infrastructure), its solution could be either to develop a new finding aid or to stick with the one it already has in the OPAC. Until the approval of MARBI (Machine-Readable Bibliographic Information) Proposal 94-3 in 1994 enabled the embedding of Uniform Resource Locators (URLs) in MARC records, there was no useful means of using the library catalogue for this purpose, and other means had to be found instead.
One example of a worthwhile non-traditional finding aid for Internet resources is the University of California at Riverside's INFOMINE. INFOMINE uses locally created metadata contained in a mini SQL database on a Pentium server. Templates are provided for staff to create the metadata for newly discovered resources from outside of the system. While this system works reasonably well, it basically replicates the effort expended in creating a library catalogue, and in the end places the described resources in a finding aid which separates electronic from other resources of scholarly interest.
North Carolina State University in Raleigh has been attempting work-arounds to the problem of OPAC support for electronic resource discovery for several years. Alex was one such attempt to organize remote Gopher documents for library patron use. This was followed by Alcuin : a database of Internet resources, a project which basically used a subset of the MARC record to make up for the shortcomings of NCSU's terminal-based DRA catalogue of the time. While these experiments were useful in discovering many of the problems inherent in dealing with Internet resources, they have been allowed to fall by the wayside as the OPAC finally caught up with emerging technologies.
The Catalogue as Gateway
While the arrival of MARC field 856 provided the tool to enable library catalogues to act as a gateway to networked resources in 1994, it took some time for integrated library system (ILS) vendors to enable its use for the public. Presentation of this information was not the problem. Cataloguers had already been describing electronic works through note fields for some time, but for patrons the enabling technology was yet to come. The terminal-based OPACs most libraries provided were incapable of making the connections needed to make traditional catalogue records useful for electronic documents. Writing down a URL on a piece of paper and carrying it to another computer where it had to be typed in was not useful technology. It was not until the development of client-server OPACs and Web-based catalogues that the promise of the 856 was realized. Products like Sirsi's WebCat and Innovative Interfaces' WebPAC will now take the patron directly from the surrogate record into the information container itself, rendering moot the fact that the two are in separate files halfway across the planet from each other.
But getting the ILS vendors to enable the 856 was not the only hurdle to providing catalogue access to the Internet. Another hurdle was cataloguers themselves. Through an invitation posted on AUTOCAT, USMARC, and various other lists in late 1994, OCLC solicited participants for a project "to test and evaluate the efficacy of using USMARC format bibliographic records, including electronic location and access information (USMARC field 856), to provide description, location, and access information for remotely accessible electronic information objects." (Project overview 1994) While it may not have been designed with that objective, the InterCat Project was seen by many as a challenge to cataloguers to take up the baton for organizing the Internet. At the time, expertise in cataloguing Internet resources was nearly non-existent, so a list was set up so that cataloguers around the world could seek the help and support they needed from each other and Erik Jul, the InterCat guru. This project provided the needed impetus to get the profession moving toward an OPAC solution, and probably gave the ILS vendors the critical mass of users needed to enable their development efforts to be successful.
Besides expertise, documentation for cataloguing these materials was non-existent as well. Cataloguers anxiously awaited the OCLC and LC MARC format documentation for the 856, in the hope that the examples there could be used as models for local efforts. Nancy Olson's Cataloging Internet resources: a manual and practical guide was eagerly printed off the Web and tentatively applied to whatever cataloguers could get hold of to practice their skills in this new environment. Long discussions on whether Web sites should be considered serials or monographs ensued over AUTOCAT and the new INTERCAT list.
By the time ALA Midwinter 1996 rolled around, OCLC was able to declare the Project a success and to announce a new means of dealing with the pesky problem of roving Web hosts, namely the Persistent URL or PURL. This was an intermediary server at OCLC headquarters which would store the true URL while the OCLC database stored a pointer to the PURL server in 856|u. Keeping the true URL in a non-MARC database enabled OCLC to run link checks periodically to make sure the address was correct. When a server failed to respond to the link checker, a message was automatically sent out to the library which contributed the surrogate record to the OCLC database and that institution was expected to investigate the true URL for possible editing. OCLC later placed this software in the public domain so that libraries could mount it locally.
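The indirection behind the PURL can be sketched in a few lines. This is only an illustration of the idea, not OCLC's software: the class name, the PURL paths, and the example.org addresses are hypothetical, and modelling the answer as an HTTP 302 redirect is an assumption about the mechanism.

```python
class PurlResolver:
    """Toy stand-in for a PURL server: maps stable names to current true URLs."""

    def __init__(self):
        self._targets = {}  # PURL path -> true URL (all values here are hypothetical)

    def register(self, purl_path, true_url):
        """Record the true URL behind a new PURL."""
        self._targets[purl_path] = true_url

    def update(self, purl_path, new_url):
        """Point an existing PURL at a new true URL after a resource moves."""
        if purl_path not in self._targets:
            raise KeyError(purl_path)
        self._targets[purl_path] = new_url

    def resolve(self, purl_path):
        """Return (status, location): a redirect if the PURL is known, else 404."""
        target = self._targets.get(purl_path)
        if target is None:
            return (404, None)
        return (302, target)


resolver = PurlResolver()
resolver.register("/example/journal", "http://old-host.example.org/journal/")
# The resource moves to a new host; only the resolver's table is edited.
resolver.update("/example/journal", "http://new-host.example.org/journal/")
print(resolver.resolve("/example/journal"))
```

The point of the design is that the catalogue record's 856|u holds only the PURL, so a change of host means one edit in the resolver rather than an edit in every catalogue that describes the resource.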
Recently much work on cataloguing electronic journals has been done for the CONSER Cataloging Manual, including new modules for Remote access computer file serials and Electronic Newspapers. The electronic versions of this documentation make for particularly effective presentations, with hypertext links to actual documents described and other sources of further information. This type of cataloguing documentation is becoming increasingly prevalent with many library technical processing operations providing marked-up versions of their policies and procedures. The Queen Elizabeth II Library's own Cataloguer's Toolbox was one of the earliest examples of these and has now been active for over two years. Documentation on Cataloguing remote electronic resources on Unicorn is provided within the Toolbox framework. A similar tool is provided by Princeton University Library.
Problems, problems, problems
It has now been only a little over one year since the Queen Elizabeth II Library began cataloguing Internet resources in January of 1996. It all began here late in 1995 with an innocent request for assistance from a branch library which was faced with the discontinuance of a paper subscription in favour of an electronic one. The title was C M, the journal of children's materials for schools and libraries. While the Head of the Cataloguing Division was debating how to handle this problem, the librarian in charge of government publications began noting the presence of URLs in citations within the federal government's "pink list" of current publications and was inquiring about their use in providing early access to these materials. This was sufficient motivation to get Memorial into the InterCat Project and to get us started cataloguing the net. For the past year, most of the requests for access to Internet resources have come from a few sources, principally Information Services librarians, Government Documents, Periodicals, and the newly created Maps, Data and Media Division. There are now about 450 such items described within Memorial's OPAC.
Problems encountered here in cataloguing these materials are no different from those described by other agencies, namely: maintenance of URLs in the catalogue record; removal of records which describe items no longer available through the Web; whether to describe the Internet version as part of the print record or as a separate entity; handling tables of contents, indexes, and other information auxiliary to the main text, as well as documents that are only subsets of their print version; the level in a website's hierarchical structure at which the 856 should point; changes to descriptive details as standards for this material emerge; and the level of description and analysis these items deserve.
The problem of persistence of URLs can be handled in several ways. While the PURL solution effectively moves the URLs offsite for maintenance in a more or less permanent way, many integrated library systems allow for the periodic generation of reports which can be passed to link checkers such as LinkBot to check for inoperative URLs. An even simpler solution may be the embedding of "mailto" links in these records so that the online user can inform the library of problems. Any of these solutions still demands human intervention at some point, whether it be in combing through exception reports or reading email, and the act of editing the 856 or deleting the record completely if the described site no longer exists will still fall to a cataloguer.
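The report-and-check approach might look something like the following sketch. It is not how LinkBot or any particular ILS works; the function names, the record identifiers, and the example.org addresses are hypothetical. The checker takes the function that fetches a status as a parameter, so the demonstration at the end runs against a canned table rather than the live network.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no answer was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # unreachable host, timeout, or malformed URL


def exception_report(records, fetch=http_status):
    """List (record_id, url, status) for each URL with no answer or a status of 400 or more."""
    problems = []
    for record_id, url in records:
        status = fetch(url)
        if status is None or status >= 400:
            problems.append((record_id, url, status))
    return problems


# Demonstration with a canned status table standing in for the network.
sample_report = [
    ("rec-001", "http://example.org/journal/"),
    ("rec-002", "http://example.org/gone/"),
    ("rec-003", "http://unreachable.example.org/"),
]
canned = {"http://example.org/journal/": 200, "http://example.org/gone/": 404}
for line in exception_report(sample_report, fetch=canned.get):
    print(line)
```

The output of such a run is still only an exception list; deciding whether a record should be edited or withdrawn remains a cataloguer's job.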
The decision on whether to add URLs to descriptions of paper versions or create new surrogates is rather a matter of policy formulation. After some consultation between cataloguers and various public service staff, it was decided that the public would be better served by adding the 530 (Additional physical forms available note) and 856 to the print surrogate wherever the versions were either identical or extremely close in content and title. Where the electronic version is considerably different in scope, style or title, we generally opt for cataloguing it separately with a 776 (Additional physical form entry) pointing to the print record. A mirroring 776 is then placed on the print record as well.
Additional documentation, in the form of abstracts, tables of contents and the like, may be described in a 556 (Information about documentation note) with the 856 pointing to the documentation resource itself. Some print journals provide online indexes over the Web. Here is the description for the index to National Geographic:
245 00 National geographic
556 8 Index available online on the Internet via the World Wide Web.
856 7 |3Index available online: |uhttp://207.24.89.145/|2http
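The 856 in this record carries three subfields: |3 (the part of the material the link applies to), |u (the URL), and |2 (the access method, used when the first indicator is 7). A minimal sketch of how an OPAC might pull the URL out of such a field follows. The function name is hypothetical, and the sketch assumes the "|" delimiter never occurs inside a subfield value, which will not hold for every URL.

```python
def parse_subfields(field_data, delimiter="|"):
    """Split MARC field data such as '|3Note |uhttp://...' into (code, value) pairs."""
    subfields = []
    # Anything before the first delimiter is not a subfield and is skipped.
    for chunk in field_data.split(delimiter)[1:]:
        if chunk:
            code, value = chunk[0], chunk[1:].strip()
            subfields.append((code, value))
    return subfields


field_856 = "|3Index available online: |uhttp://207.24.89.145/|2http"
subfields = parse_subfields(field_856)
urls = [value for code, value in subfields if code == "u"]
print(subfields)
print(urls)
```

A Web-based catalogue would then wrap each value found in |u in a hypertext link, using the text of |3 as the label shown to the patron.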
The problem of what level or version to point to when accessing a record is a more troublesome one. Oftentimes, the URL a cataloguer is given points several levels above the actual document being described. With Canadian federal documents the URL may point to a cover page which leads to both the English and French, or Postscript, MSWord, and Adobe Acrobat, versions of a document. Sometimes URLs lead to an index of documents such as this one for Bank of Canada Working Paper 96-12. Due to the stateless nature of the Web, URLs must sometimes include strings of search arguments (known by cataloguers as 'gobbledy-gook') which are used to query a server to find a particular document buried somewhere within its structure.
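To show what such a string of search arguments contains, here is a small sketch using Python's standard urllib.parse module. The URL is invented for illustration and does not correspond to any real government server; the argument names are likewise hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL of the kind described: the document is located by a query,
# not by a fixed path on the server.
url = "http://www.example.org/cgi-bin/search?collection=papers&doc=96-12&lang=e"

parts = urlsplit(url)
arguments = parse_qs(parts.query)  # maps each argument name to a list of values

print(parts.netloc)  # the host that will run the query
print(parts.path)    # the program on that host that performs the search
for name, values in arguments.items():
    print(name, values)
```

Because the server rebuilds the document from these arguments on every request, the whole string has to be recorded in the 856 exactly as given; dropping any argument may lead to a different document or to none at all.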
My personal favorite bugaboo of Internet resources is the Canadian federal government's structure for its Webbed versions of the minutes of proceedings and evidence of standing committees. Rather than provide links between documents of a particular standing committee over time, the parliamentary Web master has chosen to approach all standing committees by session of Parliament. This has meant that in order to provide links to all online issues of, say, the Minutes of proceedings and evidence of the Standing Committee on Government Operations, one has to provide separate links to those of each session.
And in conclusion...
While none of the problems encountered thus far is insurmountable, they do demand persistence, attention to detail, and some knowledge of the structure of Internet information. These are all personality traits known to exist in at least a subset of the cataloguing population. These should be put to use in our libraries as we make the transition into new information technologies. Particularly within the academic setting, I think the digitization and metadata projects sponsored by groups like CETH, the Committee on Institutional Cooperation (CIC) of the Big Ten Athletic Conference, the National Research Council, OCLC, and others show that our parent institutions and the agencies they support assign some value to our endeavors in this regard. We should be showing our own initiative by exploring the issues involved with control of these resources and by seeking support from our selectors, administrators, and funders for bringing control to the chaos in what is evolving into a very valuable source of both primary and secondary information in the eleventh hour of the millennium.
URL: http://www.mun.ca/library/cat/catnet/gateway.htm
Last revised: 24-May-1997 12:57 NST
Document author: Charley Pennell