ALA 2012: FRBR Presentation One

“Current Research on and Use of FRBR in Libraries”
8am on Sunday, June 24, 2012
Speakers: Erik Mitchell & Carolyn McCallum, Thomas Hickey, Yin Zhang & Athena Salaba, Jennifer Bowen

This is the first of four presentations given at this session.

“FRBRizing Mark Twain”
Erik Mitchell & Carolyn McCallum

The presentation slides are available on SlideShare.

Erik Mitchell and Carolyn McCallum discussed their project to apply the FRBR model to a set of catalog records relating to Mark Twain. McCallum organized the data manually, while Mitchell wrote a program to do the same automatically; they then compared the results. This presentation covered:

  • Metadata issues that arose from applying FRBR
  • Issues in migration
  • Comparison of the automated technique to an expert’s manual analysis

Carolyn McCallum spoke first about the manual processing portion of the project.

For this project, they focused on the Group 1 entities (work, expression, manifestation, and item). They extracted 848 records for publications either by or about Mark Twain from the Z. Smith Reynolds Library catalog at Wake Forest University. Mark Twain was chosen because his bibliography has enough complexity to surface any problems with the model. The expert cataloger then grouped the records into worksets using titles and the OCLC FRBR key.
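
The OCLC FRBR key mentioned here is, at its core, a normalized author/title string. The talk did not spell out the normalization rules, so the minimal sketch below uses assumed rules (lowercase, strip punctuation, collapse whitespace) purely as an illustration of how variant headings can collapse to one key:

    import re

    def normalize(text):
        # Assumed normalization: lowercase, strip punctuation, collapse whitespace.
        text = re.sub(r"[^\w\s]", "", text.lower())
        return re.sub(r"\s+", " ", text).strip()

    def work_key(author, title):
        # Join the normalized author and title into one comparison key.
        return normalize(author) + "/" + normalize(title)

    # Variant punctuation, casing, and spacing collapse to the same key:
    print(work_key("Twain, Mark, 1835-1910.", "Adventures of Huckleberry Finn"))
    print(work_key("TWAIN, MARK, 1835-1910", "Adventures of  Huckleberry  Finn"))
    # Both print: twain mark 18351910/adventures of huckleberry finn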

In the cataloger’s assessment, 410 of the records grouped into 147 worksets of two or more expressions each; the other 420 records sorted into single-expression worksets. The largest worksets were for Huckleberry Finn (26 records) and Tom Sawyer (14 records). The most useful metadata elements for grouping were title, author, and the combination of title and author.

A couple of problems were identified in the process: whole-to-part and expression-to-manifestation relationships were not expressed consistently across the records, and the boundaries between entities were difficult to determine. The point at which a changed text becomes a new expression, or diverges enough to be a completely different work, can be open to interpretation. McCallum suggested that entity classification should be guided by the needs of the local collection.

Mitchell then spoke about the automated portion of the project.

Comparison keys based on the OCLC FRBR keys (author and title) were again used to cluster records into worksets. The results were not as good as the expert’s manual groupings but were acceptable and comparable to OCLC’s own results. To improve on this, they built a Python script that extracted normalized FRBR keys from the MARC data and compared those keys, which did yield better groupings.
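
The script itself was not shown, so the sketch below is only an illustration of the general approach: read MARC records, build a normalized author/title key from assumed fields (the 1XX and 245 $a), and cluster records that share a key. The pymarc library, the field choices, the normalization rules, and the file name twain.mrc are all assumptions, not the authors’ code:

    import re
    from collections import defaultdict

    from pymarc import MARCReader  # third-party MARC parser

    def normalize(text):
        # Assumed normalization: lowercase, strip punctuation, collapse whitespace.
        text = re.sub(r"[^\w\s]", "", text.lower())
        return re.sub(r"\s+", " ", text).strip()

    def frbr_key(record):
        # Build an author/title key from the first 1XX and 245 $a (assumed fields).
        def first_subfield_a(tags):
            for field in record.get_fields(*tags):
                subfields = field.get_subfields("a")
                if subfields:
                    return subfields[0]
            return ""
        author = first_subfield_a(("100", "110", "111"))
        title = first_subfield_a(("245",))
        return normalize(author) + "/" + normalize(title)

    # Cluster records into candidate worksets by shared key.
    worksets = defaultdict(list)
    with open("twain.mrc", "rb") as fh:  # hypothetical export of the 848 records
        for record in MARCReader(fh):
            worksets[frbr_key(record)].append(record)

    # Report the largest candidate worksets first.
    for key, records in sorted(worksets.items(), key=lambda kv: -len(kv[1])):
        print(len(records), key)

Matching on an exact normalized key is cheap but brittle; as the Q&A below notes, records with multiple authors were a particular weak spot.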

In conclusion, Mitchell noted that metadata quality is less of a problem than the intellectual content: the complex relationships among the various works, expressions, and manifestations are simply not described by the metadata. Both methods, manual and automated, are time- and resource-intensive. Finally, new data models, like Linked Data, “are changing our view of MARC metadata” (slide 21).

Question from the audience about problems [with the modeling process?]
Answer: The process could not deal well with multiple authors.

Other related links:
McCallum’s summary of their presentation (about halfway through the post).
A poster from the ASIS&T Annual Meeting in 2011.