Commons:Structured data/Archive/2014/Berlin Bootcamp

How can we make multimedia data easier to use on Wikimedia Commons, Wikipedia and sister sites?

Today, information about media files on Wikimedia sites is stored in unstructured formats, which causes a range of issues. For example, file information is hard to search, some of it is only available in English, and it is difficult to edit this information or to re-use files in a way that complies with their license terms.

To address these issues, a week-long bootcamp took place in Berlin from October 5 to 10, 2014. Participants included community volunteers as well as members of the Wikidata and Multimedia teams.

Group photo of participants at the Structured Data Bootcamp in Berlin, October 5-10, 2014

Group photo of participants at the Structured Data Bootcamp in Berlin, October 5-10, 2014. See more photos.

The focus of this event was to investigate how to structure data on Wikimedia Commons by reusing the technology developed for Wikidata. Participants collaborated in small workgroups to explore a range of problems and solutions. They worked in parallel sessions focused on community, design, engineering, licensing and product management challenges.

Structured Data project slides

Each workgroup produced concrete examples of how these ideas could be implemented, including:

  • initial ideas for data models that structure file information, making it both license-compliant and machine-readable
  • initial user interface designs for viewing and editing structured data seamlessly alongside unstructured data
  • a working prototype of a high-level API for reading and updating metadata about media files
  • improvements to a prototype dashboard that identifies files missing machine-readable metadata

These preliminary ideas are now being documented on Commons. We, the Commons community, can then all use them as an initial basis for an informed discussion. As part of that discussion, we may collectively change or rewrite these preliminary requirements, designs and initial code. For a project overview, see the development page and the project slides.

The bootcamp was very productive, but many questions remain unanswered. The current thinking is that the Structured Data project could take several years to complete. A gradual development process seems preferable: it allows time to build the system properly and minimizes disruption. Next steps include community discussions, design work, prototype building and testing, and a series of experiments with structured data formats. Actual development and data migration would start only after these steps.

Everyone is invited to get involved in this important project. The Structured data hub is the best place to get started. Please consider adding it (and related pages) to your watchlist and signing up for the newsletter. Your ideas and comments are very welcome, and developers would love your active participation in defining and guiding this project.

We look forward to working together to better support the needs of our users and to modernize our multimedia infrastructure.

Participants

Participants in the Berlin bootcamp included members of the Wikimedia community (from Commons and other projects), the Multimedia team (from the Wikimedia Foundation) and the Wikidata team (from Wikimedia Deutschland).

Wikimedia community members
  • Multichill — administrator on Commons, Wikidata
  • TheDJ — administrator on Commons
Multimedia team
Wikidata team
Europeana representative
  • Hugo Manguinhas

Notes

Etherpad notes from each day of the meeting:

Photos