Between May and August 2014, I worked on a Google Summer of Code project to incorporate metadata from the Wikimedia Commons into the structured content collected by DBpedia. I worked on a GitHub fork and used GitHub issues to track the project. This work has since been published in the proceedings of the 14th International Semantic Web Conference, Bethlehem, PA, USA (doi:10.1007/978-3-319-25010-6_17).

Deliverables

Completed

  • Added support for the Wikimedia Commons to the DBpedia Extraction Framework

Abandoned

  • Examined the file metadata dump to see if there was any interesting metadata there (there wasn't): #21
  • Handle disambiguation pages on the Commons (there are only ~5,000, so we decided not to work on this): #24
  • Extract image captions from the language Wikipedias (ran out of time): #27
  • Write tests for the FileTypeExtractor and LabelExtractor: #8, #25
  • Write template mappings for the ten most-used templates (ran out of time): #7
  • Propose a new scheme for linking objects in DBpedia through URI-based identifiers to the rest of the Web of Things (ran out of time): #15

Other outputs

Google Summer of Code Mentors

Contact
