Article citation:
Dunsire, G., Hillman, D., Phipps, J. & Coyle, K. (2011). A reconsideration of mapping in a semantic world. Proc. Int’l Conf. on Dublin Core and Metadata Applications 2011, 26-36.
Summary:
Dunsire et al. discuss earlier methods of cross-walking metadata and compare them to the type of mapping potentially available in an open Semantic Web environment.
Quoted abstract:
For much of the past decade, attempts to corral the explosion of new metadata schemas (or formats) have been notably unsuccessful. Concerns about interoperability in this diverse and rapidly changing environment continue, with strategies based on syntactic crosswalks becoming more sophisticated even as the ground beneath library data shifts further towards the Semantic Web. This paper will review the state of the art of traditional crosswalking strategies, examine lessons learned, and suggest how some changes in approach–from record-based to statement-based, and from syntax-based to semantic-based–can make a significant difference in the outcome. The paper will also describe a semantic mapping service now under development.
Discussion:
Dunsire et al. begin the article by discussing the large amount of legacy data that has been generated in various formats and the need to crosswalk data between those formats. In the traditional cataloging environment, data values have to be translated from one closed system, based on a single metadata standard, to another closed system based on a different standard. The elements have to be mapped between the systems, and then the data values transformed to match the constraints of the elements in each system. Unfortunately, problems occur when the elements in each system do not have exactly the same meaning. There is no way to translate “approximately” when building a traditional crosswalk.
Additionally, the mapping of elements between systems is normally done separately from and outside of the method used to transform the data values.
Maps are developed, ingested and maintained as documents (usually spreadsheets) that are not actionable. Thus a further, but separate, step beyond the intellectual process of creating a map is the creation of programs that implement the mapping and transform data based on the decisions made during the creation of the maps. (p. 27)
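To make that concrete for myself, here is a minimal sketch of what such a program ends up looking like once the spreadsheet decisions get hardcoded. Everything in it is invented for illustration:

```python
# A hypothetical pairwise crosswalk: the element map and the value
# transformations are hardcoded into a program, divorced from the
# spreadsheet that documented the mapping decisions.
# All field names and rules below are invented for illustration.

# Element mapping: MARC field/subfield -> Dublin Core element
ELEMENT_MAP = {
    "245$a": "dc:title",
    "260$c": "dc:date",
    "300$a": "dc:format",  # "extent" squeezed into the closest DC element
}

def transform_value(field: str, value: str) -> str:
    """Reshape a data value to fit the constraints of the target element."""
    if field == "260$c":
        return value.strip("c. ")  # e.g. "c2011." -> "2011"
    return value

def crosswalk(marc_record: dict) -> dict:
    """Translate a MARC-like record into a DC-like record.

    Anything absent from ELEMENT_MAP is silently dropped; there is no
    way to say that a mapping is only approximate.
    """
    return {
        ELEMENT_MAP[f]: transform_value(f, v)
        for f, v in marc_record.items()
        if f in ELEMENT_MAP
    }

print(crosswalk({"245$a": "A reconsideration of mapping", "260$c": "c2011."}))
# -> {'dc:title': 'A reconsideration of mapping', 'dc:date': '2011'}
```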
They propose instead using RDF as a basis for developing maps.
What occurs to me though, is that the legacy data, by definition, cannot exist in the RDF/Linked Data world because it is just strings. Without the linkages, it is invisible. I wonder, does this mean that someone(s) must create metadata elements that correspond to the MARC fields and subfields to allow the relationships (maps) to be built? I think that something like this is happening in the Open Metadata Registry. I also think this is part of what Karen Coyle is always trying to get people to think about, but I am too lazy to go search for references right now.
Dunsire et al. discuss existing crosswalk strategies used for legacy data. Existing strategies are generally top-down and include “pairwise” (or “peer-to-peer”) and “hub-and-spoke” (or “switch”) strategies. Pairwise strategies allow only a one-to-one equivalency between element sets, and this is their great weakness. Switch strategies allow more nuanced linkages, but at the cost of an extremely large central vocabulary.
Most existing crosswalks implement a peer-to-peer strategy. Because there has to be a one-to-one equivalence between the two mapped elements, there is no way to express “almost the same”. Switch strategies get around this by using a central vocabulary that can mediate between any two other sets of elements. Unfortunately, this results in an extremely unwieldy (and ever-expanding) vocabulary to manage all of the possible equivalencies between all of the elements.
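The difference between the two strategies is easy to sketch in a few lines of Python. Here is a toy hub-and-spoke setup, with invented hub terms and element names, where every element set maps only to the central vocabulary:

```python
# Each schema maps only to the hub; any pair of schemas is bridged
# through it. With n schemas this means n maps instead of n(n-1)/2
# pairwise maps, but the hub must cover every distinction made by
# every schema, which is why it keeps growing. All names are invented.
HUB_MAPS = {
    "marc": {"245$a": "hub:Title", "300$a": "hub:Extent"},
    "dc":   {"dc:title": "hub:Title", "dc:format": "hub:Extent"},
    "mods": {"titleInfo/title": "hub:Title",
             "physicalDescription/extent": "hub:Extent"},
}

def translate(element: str, source: str, target: str) -> list:
    """Find target-schema elements equivalent to `element` via the hub."""
    hub_term = HUB_MAPS[source].get(element)
    return [e for e, h in HUB_MAPS[target].items() if h == hub_term]

# MARC 300$a and the MODS extent meet at hub:Extent even though no
# direct MARC-to-MODS map was ever written:
print(translate("300$a", source="marc", target="mods"))
# -> ['physicalDescription/extent']
```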
Dunsire et al. envision a process where elements can be related to each other in an ad hoc way through a series of RDF graphs.
A mapping is made to an appropriate existing RDF graph using OWL equivalence properties, as with a switch vocabulary in the top-down scenario. However, that graph need not, in itself, contain a mapping to the target vocabulary. Instead, the graph is “mapped” or connected to another graph, and so on until a graph containing the target concept is reached. (p. 28)
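As I understand it, the chain would look something like this in rdflib. The schema namespaces are invented; owl:equivalentProperty is the real OWL predicate:

```python
# The schema namespaces A, B, and C are invented; owl:equivalentProperty
# is the real OWL predicate. rdflib does no OWL reasoning on its own,
# so the application walks the chain of equivalences itself.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

A = Namespace("http://example.org/schemaA/")
B = Namespace("http://example.org/schemaB/")
C = Namespace("http://example.org/schemaC/")

g = Graph()
# A maps to B, and B (in some separately published graph) maps to C;
# nobody ever published a direct A-to-C mapping.
g.add((A.extent, OWL.equivalentProperty, B.extent))
g.add((B.extent, OWL.equivalentProperty, C.physicalExtent))

# Follow the chain until the target vocabulary is reached:
for prop in g.transitive_objects(A.extent, OWL.equivalentProperty):
    print(prop)
# -> A.extent, B.extent, and finally C.physicalExtent
```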
But this seems like it could be problematic to me. Is it the friend-of-a-friend problem? When A knows B and B knows C, does C know A? Or is this a problem with my thinking because I am going back to them being equivalent rather than just somehow related? Because I am using the same relationship between A and B and between B and C. So, if A knows B and B has heard of C, how are A and C related? Maybe not at all. Certainly, no assumption can be made about the relationship of A and C based on the relationships between A and B and between B and C in this case. I think that without tight controls on the system, this process could easily go off the rails.
They specifically state that the mapping should be done with “OWL equivalence properties” so maybe I just need to go look at the OWL documentation to better understand how this type of mapping would work. The OWL documentation has been on my to-do list for a while now, along with many, many other things. *sigh*
Also, the relationship between any two elements still has to be pre-defined, so in that sense, it is not really any more ad hoc than the switch vocabulary setup.
In an RDF environment, we can create relationships between the elements and use all elements (regardless of origin) together rather than translating the data value contained in one element to a different value for a different element in a different set.
In an open, non-transformative world, mapping applications can choose to treat all relationships as statements of equivalence or near-equivalence, or they can make use of more complex relationships in their applications. (p. 30)
Dunsire et al. give the example of DCMI’s Simple DC and DC Terms. These are two separate sets of elements that are related by property and subproperty.
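DCMI actually publishes these subproperty declarations, so following them is straightforward. Here is a sketch using rdflib and a SPARQL property path; the book URI is invented:

```python
# dcterms:dateCopyrighted is declared a subproperty of dcterms:date,
# which DCMI in turn declares a subproperty of the original dc:date.
# The book URI is invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

DC = Namespace("http://purl.org/dc/elements/1.1/")
DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.add((DCTERMS.dateCopyrighted, RDFS.subPropertyOf, DCTERMS.date))
g.add((DCTERMS.date, RDFS.subPropertyOf, DC.date))

# A statement using the narrow element; the value is never rewritten:
book = URIRef("http://example.org/book/1")
g.add((book, DCTERMS.dateCopyrighted, Literal("2011")))

# Ask for any dc:date, following subproperty links with a property path:
q = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?value WHERE {
  ?book ?p ?value .
  ?p rdfs:subPropertyOf* dc:date .
}
"""
for row in g.query(q):
    print(row.value)  # -> 2011, found via dcterms:dateCopyrighted
```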
The important thing to note is that the data values can stay the same; they are not required to be transformed into a different form for a different element set. They stay in their original element, and the elements themselves have defined relationships within the system. So we end up with a collection of elements from different sets all intermingled rather than elements from just one set. For example, rather than having a MARC “record” to display, we might have a display assembled from individual elements taken from Dublin Core, RDA, and FOAF, as sketched below. Hmm, it will be interesting to see how data storage in this new environment shakes out.
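Here is what that intermingling might look like in rdflib. The book URI and the RDA-style property are invented stand-ins; dcterms and foaf are real vocabularies:

```python
# A description assembled from statements in several element sets at
# once, rather than a record in a single format.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, FOAF

RDA = Namespace("http://example.org/rda/")  # invented stand-in
book = URIRef("http://example.org/book/1")

g = Graph()
g.add((book, DCTERMS.title, Literal("A reconsideration of mapping")))
g.add((book, RDA.extentOfText, Literal("11 pages")))
g.add((book, FOAF.maker, URIRef("http://example.org/person/dunsire")))

# The "display" is just whatever statements the application pulls out:
for p, o in g.predicate_objects(book):
    print(p, o)
```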
This mapping is done at the RDFS/OWL/SKOS level as part of the software programming. Setting up the semantics would not really be an on-the-fly sort of thing. But at the same time, it means that the whole process is integrated into one package without the need to maintain separate documents in multiple places. I really like the idea that, even if transformations of the data values do take place, the original semantics of the elements/values would not be lost, since it would all be connected within one application.
Dunsire et al. propose mappings be built on the relationship of one element refining another element; in the same sense that dct “dateCopyrighted” refines dc “date”. They don’t have exactly the same meaning, but they are related. How this might work is demonstrated in a figure showing possible relationships between the element “extent” in various element sets. I can see lots more wrangling over semantics in our future.
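Presumably the mappings in that figure would be published as simple refinement statements, something like this sketch (the RDA and MARC URIs are invented stand-ins; dcterms:extent is real):

```python
# Hypothetical refinement mappings for "extent": each narrower element
# is declared a subproperty of a broader one, related but not identical.
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

DCTERMS = Namespace("http://purl.org/dc/terms/")
RDA = Namespace("http://example.org/rda/")    # invented stand-in
MARC = Namespace("http://example.org/marc/")  # invented stand-in

g = Graph()
g.add((RDA.extentOfText, RDFS.subPropertyOf, RDA.extent))
g.add((RDA.extent, RDFS.subPropertyOf, DCTERMS.extent))
g.add((MARC.field300a, RDFS.subPropertyOf, DCTERMS.extent))
```

Nothing in that graph claims the elements mean exactly the same thing; an application decides how far up or down the property hierarchy it is willing to read.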
But I don’t think that a system that does this actually exists anywhere yet. They discuss how a toolkit/application would need to support the process of building the relationships by providing suggestions and constraints for users trying to map various properties. Maybe the eXtensible Catalog does some of this?
Dunsire et al. state that there is no need for authoritative mapping. Any given element can map to differing semantics at the same time. They cite the fact that there are two versions of FRBR defined in RDF. This makes my cataloger’s heart want to curl up and hide. La, la, la, la, I can’t hear you! It sounds like chaos, but in reality, probably not. Conflicting semantics won’t be used in the same application and possibly not even in the same community (though this is where the wrangling over semantics might come in again). Anyone can do anything, but that doesn’t mean that I have to take everyone into account when I do my thing. So the “authoritative mapping” will still exist, but more likely, I’m guessing, as a consensus locally or within a community.
Finally, Dunsire et al. note that while the mapping process in the RDF environment is fundamentally different, it is still not a simple process. Many of the same problems still exist in the new environment.
There are issues of provenance and authorship of the map, version control and change management over time, the editorial and publishing cycle, management of group authorship and roles within the group including discussion and voting, and even evaluation of the validity of individual mapping statements based on the declared domains and ranges of mapping predicates. (p. 33)
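Named graphs seem like the natural mechanism for at least the provenance and versioning parts. Something like this rdflib sketch, with all URIs invented, where the mapping statements live in their own named graph and the statements about the map live in the default graph:

```python
# All URIs here are invented. The mapping statements live in their own
# named graph, so authorship, dates, and versioning can be asserted
# about the map itself in the default graph.
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, OWL

EX = Namespace("http://example.org/")
MAP_URI = URIRef("http://example.org/maps/marc-to-dc/v1")

ds = Dataset()

# The mapping statements themselves:
mapping = ds.graph(MAP_URI)
mapping.add((EX.marcExtent, OWL.equivalentProperty, EX.dcFormat))

# Statements about the map: who made it, when, and what it supersedes.
ds.add((MAP_URI, DCTERMS.creator, Literal("Example Mapping Group")))
ds.add((MAP_URI, DCTERMS.issued, Literal("2011-09-21")))
ds.add((MAP_URI, DCTERMS.replaces,
        URIRef("http://example.org/maps/marc-to-dc/v0")))
```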
The Open Metadata Registry is being developed to support this kind of semantic mapping between elements. Assuming that the eXtensible Catalog has mappings, I wonder if they allow or plan to allow administrative users (not the end-users) to add custom mappings into the system.
Dunsire et al. finish the article with four questions for further discussion:
• What will be the relationship of Application Profiles, specifying how sets of data elements should be assembled into packages or “records” for particular applications, to this ecology of mapping? Will communities wish to designate mappings that reflect their metadata point of view?
• We see the value of mappings as independent ontological statements with visible authority and ownership separate from the originating ontologies. Is there a value too, to formal endorsement in this environment?
• How does the mapping of individuals in value vocabularies fit in? Can these techniques be applied to value vocabularies in a useful way? Is the value different or less?
• What is the value of metadata registries such as the Open Metadata Registry, id.loc.gov, the Dublin Core registry, and vocab.org, in this environment? Can tools based within those registries encourage the growth of this environment? (p. 35)