Commons:Wikidata for media info

Below is a proposal from the Wikidata software development team for how media meta-data could be maintained using the same technology that powers wikidata.org. Note, however, that the meta-data about files would be maintained on Wikimedia Commons itself, not on wikidata.org.

Please use the talk page for any discussion about maintaining media meta-data using the Wikibase extension.

Proposal

For a long time, the Commons community has been looking for a way to maintain and export meta-data about media files in a machine-readable way. Among other things, this would make it much easier to show the appropriate attribution and license information when re-using a file. Also, associating a file with a geo-location or a Wikidata item allows for new ways of searching and browsing files on Commons.

Since Wikidata now gives us a way to maintain structured data in MediaWiki, we propose using this mechanism to maintain media meta-data on Commons.

In the following, technical terms are written in italics, and have a specific meaning as given by the Wikidata glossary. The following proposal is a bit technically dense, but we hope that we can clarify and explain it based on your feedback.

Wikidata currently has two entity types, items and properties, and a third, queries, is planned. In order to support Commons meta-data, we plan to introduce another entity type, media-info. Each file on Commons can have a media-info entity associated with it, which would reside on a sub-page of the file description page. For example, the meta-data for File:Berlin.jpg would be located at File:Berlin.jpg/info. The meta-data is bound to the file description page and would be moved and deleted along with it. The media-info page does not need to exist, however, and when it does, it has its own history, protection status, etc., like any other wiki page.
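
The naming and move behaviour described above can be illustrated with a minimal Python sketch. It is not part of the proposal or of the Wikibase software: the function names and the in-memory `pages` dictionary are hypothetical, and only the `/info` suffix is taken from the File:Berlin.jpg example.

```python
FILE_PREFIX = "File:"
INFO_SUFFIX = "/info"  # sub-page name taken from the File:Berlin.jpg/info example


def media_info_title(file_title: str) -> str:
    """Return the title of the media-info sub-page for a file description page."""
    if not file_title.startswith(FILE_PREFIX):
        raise ValueError(f"not a file description page: {file_title!r}")
    return file_title + INFO_SUFFIX


def move_file(pages: dict, old_title: str, new_title: str) -> None:
    """Move a file page; its media-info sub-page, if any, moves with it."""
    pages[new_title] = pages.pop(old_title)
    old_info = media_info_title(old_title)
    if old_info in pages:  # the media-info page does not have to exist
        pages[media_info_title(new_title)] = pages.pop(old_info)


# Hypothetical wiki content: one file page and its media-info sub-page.
pages = {"File:Berlin.jpg": "description", "File:Berlin.jpg/info": "meta-data"}
move_file(pages, "File:Berlin.jpg", "File:Berlin at night.jpg")
print(sorted(pages))
```

After the move, both the file page and its sub-page are found under the new title, while a file without a media-info page would simply be moved on its own.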

The media-info entities are similar to data items on Wikidata, but they do not have labels, aliases or descriptions. They can have any number of statements (values of properties along with qualifiers and references). The statements can be used to describe the file's provenance as well as its subject and context.
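
As a rough illustration of this structure, a media-info entity could be modelled as below. This is a sketch under assumptions, not the actual Wikibase data model: the class names, the flat list of references, and the identifiers P1, P2, Q100 and Q200 are placeholders invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class PropertyValue:
    """A property identifier paired with a value."""
    property_id: str
    value: Any


@dataclass
class Statement:
    """A main property value plus optional qualifiers and references."""
    main: PropertyValue
    qualifiers: List[PropertyValue] = field(default_factory=list)
    references: List[PropertyValue] = field(default_factory=list)


@dataclass
class MediaInfo:
    """Meta-data for one file: statements only, no labels, aliases or descriptions."""
    file_title: str
    statements: List[Statement] = field(default_factory=list)

    def values_of(self, property_id: str) -> List[Any]:
        """Return the main values of all statements that use the given property."""
        return [s.main.value for s in self.statements
                if s.main.property_id == property_id]


# P1 ("creator") and P2 ("stated in") are placeholder property identifiers.
info = MediaInfo("File:Berlin.jpg")
info.statements.append(
    Statement(
        main=PropertyValue("P1", "Q100"),
        references=[PropertyValue("P2", "Q200")],
    )
)
print(info.values_of("P1"))
```

The point of the sketch is that everything about a file is expressed as statements; there are no label, alias or description fields on the entity itself.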

Of course, this meta-data would be useless if it could not be accessed. The most obvious place where it is needed is the file description page itself. The mechanism for accessing properties of the meta-data will be the same one that Wikipedia currently uses to access information from Wikidata: a Lua library used to build templates that automatically include and format the relevant information. Eventually, we plan to allow this from other wikis that use Commons media as well, so that the meta-data can be shown automatically alongside the media file. External websites could also access the meta-data through the Commons API.

Internationalisation and Localisation

Internationalisation is built into the Wikibase software. Each property has an identifier (such as P1234) and can have multiple labels, one per language. (The WMF language committee has agreed that Wikidata can be localised in any language, not just in the 280 languages that have Wikipedias.) When you go to CommonsData you will see the file's property statements in your language. If a property label has not been translated into your language, you will see it in a fallback language. You can add the label in your language for that property, and that translation immediately becomes available for every Commons file that has a statement using that property.
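
The label lookup with fallback can be sketched as follows. This only illustrates the idea and is not how Wikibase implements it: the function name, the label strings and the fallback chain are assumptions, and only the identifier P1234 comes from the text above.

```python
from typing import Dict, List

# Labels are stored once per property, keyed by language code.
# P1234 comes from the text; the label strings are invented for illustration.
PROPERTY_LABELS: Dict[str, Dict[str, str]] = {
    "P1234": {"en": "creator", "de": "Urheber"},
}


def property_label(property_id: str, languages: List[str]) -> str:
    """Return the first available label along a chain of preferred languages."""
    labels = PROPERTY_LABELS.get(property_id, {})
    for language in languages:
        if language in labels:
            return labels[language]
    return property_id  # no label at all: show the bare identifier


chain = ["nl", "en"]  # hypothetical fallback chain for a Dutch reader
print(property_label("P1234", chain))     # falls back to the English label
PROPERTY_LABELS["P1234"]["nl"] = "maker"  # someone adds the Dutch label
print(property_label("P1234", chain))     # now the Dutch label is used
```

Because the label belongs to the property rather than to any single file, adding one translation changes what is shown for every file whose statements use that property.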

Commons will be searchable and usable in any language, in a way that it is not at the moment.

Usage examples

The properties that can be used to make statements about media files will be created and maintained by the community, just as is now the case on Wikidata. Here are a few examples of what could be useful:

  • Provenance:
    • Creator (item): who created the file; Creator-name could be used when there is no data item for the creator.
    • License (item): the license that applies to the file
    • Work (item): a work the file shows or reproduces
    • Derivative of (item|media-info): what the work is derived from
  • Context:
    • Created (time): when the file was created
    • Location (geo-coordinate): where the file was created
  • Subject:
    • Description (multi-lang): free-text description of the media file, without wikitext markup. This is used instead of the "editorial" descriptions we have on Wikidata, to allow for different descriptions from different sources. This is especially important when the description was imported from a third party.
    • Topic (item): topic of the file. Can be used like tags.
  • Technical:
    • Aperture, etc. Might be taken from EXIF using a bot.
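
To make the list above more concrete, here is a hedged Python sketch of statements for a single file, together with a check that each value has a type allowed for its property. The property names and value types follow the list; the Q-numbers, dates, coordinates, the file name and the `invalid_statements` helper are made up for illustration, since the real properties would be defined by the community.

```python
# Allowed value types per property, following the list above.
PROPERTY_TYPES = {
    "creator": {"item"},
    "license": {"item"},
    "work": {"item"},
    "derivative of": {"item", "media-info"},
    "created": {"time"},
    "location": {"geo-coordinate"},
    "description": {"multi-lang"},
    "topic": {"item"},
}

# Statements for one file as (property, value type, value) triples.
# The Q-numbers and values are made up for illustration.
statements = [
    ("creator", "item", "Q1001"),
    ("license", "item", "Q1002"),
    ("created", "time", "2013-08-01"),
    ("location", "geo-coordinate", (52.52, 13.40)),
    ("description", "multi-lang", {"en": "Skyline of Berlin", "de": "Skyline von Berlin"}),
    ("topic", "item", "Q1003"),
    ("derivative of", "media-info", "File:Berlin-raw.jpg/info"),
]


def invalid_statements(statements):
    """Return the statements whose property is unknown or whose value type is not allowed."""
    return [s for s in statements if s[1] not in PROPERTY_TYPES.get(s[0], set())]


print(invalid_statements(statements))  # []
```

A statement such as `("created", "item", "Q1")` would be reported as invalid, because the list gives "created" a time value rather than an item.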

Note that, unlike on Wikidata, statements will frequently be "self-sourced": the creator of a photograph might upload the file herself and also provide the location, time, description, etc. In contrast, when meta-data is imported from an archive or similar source along with the image itself, the original source of the statements (description, authorship, etc.) would be set to the respective archive.

Note that, just as with Wikidata, we strive to provide a powerful yet flexible framework that gives the community a platform on which to build structured data. The basic data structure is ignorant of the requirements of media curation; it is left to the community to create the required properties and establish the appropriate procedures and usage practices.

We hope you regard this as an invitation to discuss the proposal and to identify use cases that we do not cover with it, in order to improve the data model.

See also