Thursday, August 07, 2008

Demystifying Batch-Load Analysis: What You Need to Know About Vendor-Supplied Bibliographic Records

When: Sunday, July 13, 2008, 4:15-5:15 PM

*Coordinator: Ellen McGrath, University at Buffalo
*Moderator: Kevin Butterfield, College of William and Mary
*Speaker: Yael Mandelstam, Fordham University

This program was standing room only. Well, actually, a number of people were sitting on the floor, but you get the idea: it was popular!

There are a number of vendor-supplied record sets of interest to law libraries, including: Making of Modern Law (MOML), LLMC-Digital, BNA, CALI, HeinOnline Legal Classics, HeinOnline World Trials, and LexisNexis/Westlaw Cassidy collections.

Yael Mandelstam got right down to the nitty-gritty and showed us how she analyzes batches of vendor-supplied bibliographic records before she loads them into Fordham’s catalog. The importance of the “before” part became evident when Yael described the situation with the original batch of MOML records. Many law libraries loaded them, only to discover that the bibliographic records for the electronic versions overlaid the records for the microfiche versions by mistake. Oops … there were a number of nodding heads in the room, which I took to mean some of those present had been burned in that manner. But never again, as Yael gave us valuable advice about how to keep that from happening.

Before getting down to specifics, Yael cautioned that “this technique is not meant to replace proper authority control, use of URL checkers, etc.” She makes use of two readily available tools in her analysis: MarcEdit (a free editing utility available for download at http://oregonstate.edu/~reeset/marcedit/html/) and Microsoft Excel (spreadsheet software). She emphasized repeatedly how essential it is to save a copy of your original file of records before you start rearranging it, and to save each iteration of a file.

The PowerPoint handout Yael prepared is excellent, so I am not going to spend time here on details you can more easily see there. It is available at: http://tsvbr.pbwiki.com/Batchload+Analysis

The approach to record set analysis was presented in three steps:
* step 1: Examine several individual records
* step 2: Count fields in file
* step 3: View isolated fields

The first step is important and should almost go without saying. Step 2 is a quick way to verify the number of occurrences of certain fields. For example, if you have 100 records in your batch, there must be 100 each of required fields, such as the 245 (title) and 856 (URL). If there are fewer, that is a big red flag! The “What’s wrong with this picture?” examples on the slides are very revealing.
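For anyone who would rather script the step 2 count, here is a minimal Python sketch of the same idea. It is not taken from Yael's slides. It assumes the batch has already been converted to MarcEdit's mnemonic text format, with one field per line starting with `=` and the tag, and a blank line between records. The sample records, URLs, and function names are invented for illustration.

```python
from collections import Counter

# Invented sample in the assumed mnemonic layout: "=TAG  indicators$a..."
# The second record is deliberately missing its 856 field.
SAMPLE_MRK = """\
=LDR  00000nam a2200000 a 4500
=245  10$aFirst treatise on evidence.
=856  40$uhttp://example.org/1

=LDR  00000nam a2200000 a 4500
=245  10$aSecond treatise on contracts.

=LDR  00000nam a2200000 a 4500
=245  10$aThird treatise on torts.
=856  40$uhttp://example.org/3
"""


def split_records(mrk_text):
    """Split mnemonic text into records (lists of field lines) on blank lines."""
    records, current = [], []
    for line in mrk_text.splitlines():
        if line.startswith("="):
            current.append(line)
        elif not line.strip() and current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def count_fields(records):
    """Count how many times each three-character tag occurs in the batch."""
    counts = Counter()
    for record in records:
        for line in record:
            counts[line[1:4]] += 1
    return counts


def records_missing(records, tag):
    """Return 1-based positions of records that lack the given tag."""
    return [
        position
        for position, record in enumerate(records, start=1)
        if not any(line[1:4] == tag for line in record)
    ]


records = split_records(SAMPLE_MRK)
counts = count_fields(records)
print(len(records), "records in batch")
for tag in ("245", "856"):
    print(tag, "occurs", counts[tag], "times; missing from records",
          records_missing(records, tag))
```

Comparing each count against the number of records is the whole check. The list of record positions is only a convenience for finding the culprits once a count comes up short.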

I especially like the subtitle on the slides for step 3: The power of eyeballing. The value of isolating fields for analysis became clear immediately when each individual field was removed from its record and grouped together with its counterparts. When all the same fields are sorted together, the errors and inconsistencies truly do just jump out at you—amazing!
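The isolate-and-sort idea from step 3 can also be sketched in a few lines of Python. This is only an illustration under the same assumptions as before: the file is in MarcEdit's mnemonic text format, every record begins with an `=LDR` line, and the sample data and the `isolate_field` helper are hypothetical. The output is tab-separated so it could be pasted into a spreadsheet for eyeballing.

```python
# Invented sample; the third record has a different second indicator,
# a different host name, and a different $h than the other two.
SAMPLE_MRK = """\
=LDR  00000nam a2200000 a 4500
=245  10$aFirst treatise$h[electronic resource].
=856  40$uhttp://example.org/moml/1

=LDR  00000nam a2200000 a 4500
=245  10$aSecond treatise$h[electronic resource].
=856  40$uhttp://example.org/moml/2

=LDR  00000nam a2200000 a 4500
=245  10$aThird treatise$h[microform].
=856  41$uhttp://example.com/moml/3
"""


def isolate_field(mrk_text, tag):
    """Return sorted (field_content, record_number) pairs for one tag.

    field_content is everything after "=TAG  ", that is, the two
    indicators followed by the subfields.
    """
    rows = []
    record_number = 0
    for line in mrk_text.splitlines():
        if line.startswith("=LDR"):
            record_number += 1  # assumes each record starts with its leader
        elif line.startswith("=" + tag):
            rows.append((line[6:], record_number))
    return sorted(rows)


# Print every 856 in the batch, sorted, with the record it came from.
for content, record_number in isolate_field(SAMPLE_MRK, "856"):
    print(f"{content}\trecord {record_number}")
```

Because the indicators lead each row, sorting groups like with like, and an odd indicator or host name ends up sitting apart from the rest of the column.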

Yael shared helpful tips on how to clean up those errors and inconsistencies using the global update capabilities of MarcEdit. Unfortunately, it is not possible to view the changes in MarcEdit before you apply them, so she recommended doing that in your ILS instead. She concluded by giving a general overview of the work of the TS-SIS Task Group on Vendor-Supplied Bibliographic Records (http://www.aallnet.org/sis/tssis/committees/cataloging/vendorbibrecords/), which has set up a wiki (http://tsvbr.pbwiki.com/) in order to share the results of such batch-load analysis.

There wasn’t much time for questions, but one was whether a batch should be analyzed every time you are ready to load it. The answer: yes. There were also a few comments, one of which was that MarcEdit cannot be used with some ILSs unless the whole database is extracted. The session closed with a comment that these batches are creating many duplicate records for the same content in our catalogs. The aggregator-neutral record approach for e-resources (both serials and monographs) was mentioned, but naturally that raises other complexities for which there is no easy solution at present. Many thanks to OBS and TS for sponsoring this excellent program!

1 comment:

  1. Anonymous 4:56 PM

    Thanks, Ellen, for a great description of the program.

    A word about MarcEdit: during several conversations with people who attended my presentation at AALL, I kept hearing the sentence "I would like to do some of the things you demonstrated in your program, but MarcEdit does not work well with our ILS."

    I therefore thought that some clarification was in order. MarcEdit does not work "with" any ILS. It's a stand-alone program that you can use for viewing records and globally updating them BEFORE you load them into your system.

    You can also output a file of records from your ILS, save it on your desktop, open it in MarcEdit, make your changes, then load the updated file back into your system.

    MarcEdit is free and can be downloaded at http://oregonstate.edu/~reeset/marcedit/html/downloads.html.
