OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a protocol developed by the Open Archives Initiative. It is used to harvest (or collect) the metadata descriptions of the records in an archive so that services can be built using metadata from many archives.
The protocol is usually just referred to as the OAI Protocol.
History
This summary is largely drawn from Lynch (2001).
In the late 1990s, Herbert Van de Sompel (Ghent University) was working with researchers and librarians at Los Alamos National Laboratory (US) and called a meeting to address interoperability problems among e-print servers and digital repositories. The meeting was held in Santa Fe, New Mexico, in October 1999. A key outcome was the definition of an interface that let e-print servers expose metadata for the papers they held in a structured fashion, so that other repositories could identify and copy papers of interest. This interface/protocol was named the "Santa Fe Convention".
Several workshops were held in 2000 at the ACM Digital Libraries conference and elsewhere to share the ideas from the Santa Fe Convention. It was discovered at the workshops that the problems faced by the e-print community were also shared by libraries, museums, journal publishers, and others who needed to share distributed resources. To address these needs, the Coalition for Networked Information and the Digital Library Federation provided funding to establish an Open Archives Initiative (OAI) secretariat managed by Herbert Van de Sompel and Carl Lagoze. The OAI held a meeting at Cornell University (Ithaca, New York) in September 2000 to improve the interface developed at the Santa Fe Convention. The specifications were refined over e-mail.
OAI-PMH version 1.0 was introduced to the public in January 2001 at a workshop in Washington, D.C., and at another in February in Berlin, Germany. Subsequent changes to the XML Schema standard by the W3C required minor modifications to OAI-PMH, resulting in version 1.1. The current version, 2.0, was released in June 2002. It contained several technical changes and enhancements and is not backward compatible with earlier versions.
OAI registries
The OAI Protocol has been widely adopted by digital libraries, institutional repositories, and digital archives. Registering a repository is not mandatory, but it is encouraged.
There are several large registries of OAI-compliant repositories:
- The Open Archives list of registered OAI repositories
- The OAI registry at University of Illinois at Urbana-Champaign
- The Celestial OAI registry
- Eprint’s Institutional Archives Registry
- Openarchives.eu: the European guide to OAI-PMH compliant repositories in the world
- ScientificCommons.org: a worldwide service and registry
Uses
Commercial search engines have started using OAI-PMH to acquire more resources. Google has started to accept OAI-PMH as part of their Sitemap Protocol, and they are using OAI-PMH to harvest information from the National Library of Australia Digital Object Repository. In 2004, Yahoo! acquired content from OAIster (University of Michigan) that was obtained through metadata harvesting with OAI-PMH.
Software
OAI-PMH is based on a client-server architecture, in which "harvesters" request information on updated records from "repositories". Requests for data can be based on a datestamp range, and can be restricted to named sets defined by the provider. Data providers are required to provide XML metadata in Dublin Core format, and may also provide it in other XML formats.
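As a rough illustration of the request/response cycle described above, the following Python sketch builds a ListRecords request restricted by datestamp range and set, and extracts Dublin Core titles from a response. The verb, the `from`, `until`, `set` and `metadataPrefix` arguments, and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. The base URL, the set name, the sample response, and the function names are hypothetical and exist only for this example. It is a minimal sketch, not a complete harvester: it makes no network call and ignores resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used in OAI-PMH 2.0 responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request URL asking for Dublin Core (oai_dc) metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date    # lower bound of the datestamp range
    if until_date:
        params["until"] = until_date  # upper bound of the datestamp range
    if set_spec:
        params["set"] = set_spec      # restrict to a set defined by the provider
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return identifier, datestamp and titles for each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
        })
    return records


# Hypothetical, hand-written response used in place of a real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-06-14</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical repository base URL and set name.
url = build_list_records_url(
    "https://repository.example.org/oai",
    from_date="2002-01-01",
    until_date="2002-06-30",
    set_spec="physics",
)
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A real harvester would send the URL over HTTP and keep requesting further pages while the response contains a resumption token; that part is left out here to keep the sketch self-contained.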
Archives
Workshops
References
- Lynch, Clifford A. (2001). "Metadata harvesting and the open archives initiative". ARL Bimonthly Report 217.
See also
- Data Format Management
- Digital curation
- Digital preservation
- File format
- Library of Congress Digital Library project
- National Digital Information Infrastructure and Preservation Program
- Web archiving
External links
- Library of Congress, Digital Collections and Programs
- Library of Congress, National Digital Information Infrastructure and Preservation Program
- Library of Congress, Web Capture
- Protocol specification