DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows users to ask expressive queries against Wikipedia and to interlink other datasets on the Web with DBpedia data.
The DBpedia Dataset
Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates, categorisation information, images, geo-coordinates and links to external Web pages. This structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content.
The DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages, 180,000 external links into other RDF datasets, 207,000 Wikipedia categories and 75,000 YAGO categories.
The DBpedia project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. As of September 2007, the DBpedia dataset consists of around 103 million RDF triples, which have been extracted from the English, German, French, Spanish, Italian, Portuguese, Polish, Swedish, Dutch, Japanese, Chinese, Russian, Finnish and Norwegian versions of Wikipedia.
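To make the triple-based data model concrete, here is a minimal sketch of how a statement about a resource can be held as a subject, predicate, object triple and written out in N-Triples syntax. The `Triple` type and the `to_ntriples` helper are hypothetical names, and the URIs are illustrative only; they are not copied from the actual dataset.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type for illustration)."""
    subject: str             # URI of the thing being described
    predicate: str           # URI of the property
    obj: str                 # URI or plain literal value
    obj_is_uri: bool = True  # False when obj is a literal string


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if t.obj_is_uri:
        obj = f"<{t.obj}>"
    else:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Illustrative statements; the class URI under example.org is made up.
triples = [
    Triple(
        "http://dbpedia.org/resource/Leipzig",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "http://example.org/ontology/City",
    ),
    Triple(
        "http://dbpedia.org/resource/Leipzig",
        "http://www.w3.org/2000/01/rdf-schema#label",
        "Leipzig",
        obj_is_uri=False,
    ),
]

for t in triples:
    print(to_ntriples(t))
```

Each printed line is one self-contained statement, which is why a dataset of this kind is usually measured in triples.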
The DBpedia dataset is available under the terms of the GNU Free Documentation License.
The DBpedia dataset is interlinked at the RDF level with various other Open Data datasets on the Web. This enables applications to enrich DBpedia data with data from these datasets. As of June 2007, DBpedia is interlinked with the following datasets: GeoNames, MusicBrainz, CIA World Factbook, DBLP, Project Gutenberg, DBtune Jamendo and Eurostat, as well as US Census data. See the DBpedia website and the W3C SWEO Linking Open Data Community Project for details about the interlinked datasets.
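As a rough illustration of how RDF-level links let an application enrich one dataset with another, the sketch below follows link triples from a DBpedia resource to a resource in a second dataset and merges the facts found there. It assumes the links are expressed with `owl:sameAs`, a common Linked Data convention that the text above does not confirm. The `enrich` function, the `example.org` URIs and all values are made up for the example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed link predicate


def enrich(subject, local, remote):
    """Return (predicate, object) facts about `subject` from `local`,
    plus facts about every resource that `local` links it to in `remote`."""
    facts = [(p, o) for s, p, o in local if s == subject and p != OWL_SAME_AS]
    linked = [o for s, p, o in local if s == subject and p == OWL_SAME_AS]
    for uri in linked:
        facts.extend((p, o) for s, p, o in remote if s == uri)
    return facts


# Made-up triples standing in for DBpedia and for an interlinked dataset.
berlin = "http://dbpedia.org/resource/Berlin"
dbpedia_triples = [
    (berlin, "http://example.org/prop/label", "Berlin"),
    (berlin, OWL_SAME_AS, "http://example.org/geo/123"),
]
other_triples = [
    ("http://example.org/geo/123", "http://example.org/prop/note",
     "value from the linked dataset"),
]

for predicate, value in enrich(berlin, dbpedia_triples, other_triples):
    print(predicate, value)
```

The link triple itself carries no descriptive data; its value lies in telling the application where else to look.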
Accessing the DBpedia Dataset
The DBpedia dataset can be accessed in three ways:
- SPARQL Endpoint. A public SPARQL endpoint lets you query the dataset using the SPARQL query language. The SNORQL query explorer provides a browser interface for running queries against the endpoint (it does not work with Internet Explorer). Several example queries can be found on the DBpedia website.
- Linked Data Interface. DBpedia is also served as Linked Data, meaning that you can use Semantic Web browsers like Tabulator, DISCO or the OpenLink Data Browser to navigate the dataset.
- Downloads. The DBpedia dataset can also be downloaded from the DBpedia website.
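The first two access mechanisms can be sketched as plain HTTP requests. The example below only builds the requests and does not send them. The endpoint address `http://dbpedia.org/sparql`, the resource URI, the chosen `Accept` media types and both helper function names are assumptions made for illustration; check the DBpedia website for the actual endpoint and supported formats.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint address; not stated in the text above.
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"

# Ask for up to ten property/value pairs of one (illustrative) resource.
QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Leipzig> ?property ?value .
} LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request carrying the query
    in the `query` URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def build_linked_data_request(resource_uri: str) -> Request:
    """Build a request that asks for an RDF description of a resource
    instead of an HTML page (HTTP content negotiation)."""
    return Request(resource_uri, headers={"Accept": "application/rdf+xml"})


sparql_req = build_sparql_request(SPARQL_ENDPOINT, QUERY)
ld_req = build_linked_data_request("http://dbpedia.org/resource/Leipzig")

print(sparql_req.full_url)
print(ld_req.full_url, ld_req.get_header("Accept"))
```

To actually run the query, pass either request object to `urllib.request.urlopen`; that step is left out here because it needs network access and depends on the live service.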
Related Projects
- Metaweb's Freebase project
Publications
- Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary Ives: DBpedia: A Nucleus for a Web of Open Data. 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, November 2007.
- Christian Bizer et al.: DBpedia - Querying Wikipedia like a Database. Developers track presentation at WWW2007.
- Sören Auer, Jens Lehmann: What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content. Paper at ESWC 2007.
- Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum: Yago: A Core of Semantic Knowledge - Unifying WordNet and Wikipedia. Paper at WWW2007.
- Christian Bizer et al.: Interlinking Open Data on the Web (Poster). Poster at ESWC 2007.