Commons:Structured data/WMSE white paper on Structured Data on Commons/Wiki Loves Monuments

Case study 1: Wiki Loves Monuments

In a nutshell

  • We present a workflow to add SDC statements to photos from Wiki Loves Monuments competitions, including modelling the competition editions on Wikidata.
  • SDC gives us additional insight into photos from competitions; we can learn more about what the participants choose to photograph.

Why Wiki Loves Monuments?

Wiki Loves Monuments has been running since 2010, resulting in over 2.5 million photos from over 50 countries. The result is a unique set of photographs depicting cultural heritage monuments all around the world that are available for everyone to re-use – not only in Wikipedia articles, but also in books, magazines and websites. With every year bringing thousands of new photographs, organizing them has become an increasingly important topic of discussion among Wikimedians.

The category system on Wikimedia Commons, albeit useful, is a blunt tool. Whereas some of the monuments have their own dedicated categories (e.g. Category:Härnäsets folkets hus), many end up in general categories such as Category:Listed buildings in Mellerud Municipality. It's up to the uploader to place their photos in any relevant categories, and even if community members are eager to help out and organize the photos, a file can be overlooked for a long time, especially if originally placed in a very high-level category. Furthermore, the category names are usually either in English or in the language of the country, making it harder for uploaders, especially new ones, to identify correct ones. A Russian speaker photographing the Gothenburg Cathedral will find it hard to locate the appropriate category, Category:Göteborgs domkyrka, without taking a detour from the Russian Wikipedia article, Кафедральный собор Гётеборга, via Wikidata. What is an obvious workflow for experienced Wikimedians makes it hard for newcomers to share and annotate their material. And if the photo only gets a title and description in Russian, Wikimedia Commons users who don't speak the language will find it hard to find and re-use, or put in a more specific category.

Structured Data on Commons solves this problem by taking advantage of the multilinguality of Wikidata: it becomes easier to describe the content of photos and to find them, which lowers the threshold for editors from diverse language backgrounds. Depicts, one of the many properties available to SDC editors, has been at the centre of this development. Depicts statements are central to the new MediaSearch implementation currently being developed, which has recently also become part of the multimedia search interface in Visual Editor. Every day, Wikimedians add depicts statements to files, making the new search engine more useful.

That's why Wikimedia Sverige decided to include WLM photos in our project to explore, enrich and document Structured Data on Commons. By adding depicts statements to those files we could make a real contribution to the Wikimedia platforms, making it easier for Wikimedians and other users alike to find interesting and relevant photos of cultural heritage. The size of the dataset gave us the opportunity to examine the challenges of editing SDC on a large scale. Another contributing factor was that the information in WLM photos is well suited for conversion to structured data statements. Every photo uploaded as part of the competition is accompanied by templates stating which local edition of WLM it belongs to, as well as which monument it depicts, in the form of a unique code. Sometimes the code is the monument's identifier in an external source: for Swedish archaeological sites, it is the site's number in the Swedish National Heritage Board's database Fornsök; for Israeli monuments, it is the Wiki Loves Monuments ID on Wikidata. Over the years, the international WLM community has built a unique database of cultural heritage monuments and their identifiers, part of which has been migrated to Wikidata. Using this infrastructure to enrich the photos with structured data statements is a logical next step towards making the photos more findable and accessible.

Wiki Loves competitions on Wikidata

To be able to state which competition a photo participated in, it was first necessary to create Wikidata items for all the regional editions, for example Wiki Loves Monuments 2020 in Sweden. The practice for modelling competition participation is to add two statements to every photo: one for the yearly edition (Wiki Loves Monuments 2020) and one for the regional edition (Wiki Loves Monuments 2020 in Sweden). This makes it easier to query and analyze the data.

Our initial survey revealed that very few Wikidata items for the regional competitions existed, so the first step was to create them for the selected countries. Each of them is modelled as follows:

Most of this information could be created semi-automatically, using the category tree of Category:Wiki Loves Monuments in Sweden and creating the statements in OpenRefine. Since regional competitions do not necessarily run from 1 to 30 September, we used the module WL data to retrieve the actual start and end dates. The module contains data about all the Wiki Loves competitions over the years and is used by statistics tools such as http://tools.wmflabs.org/wikiloves/.

Covering all of the Wiki Loves Monuments photos was not our aim. Since our goal in the Tools for Partnerships project was "only" to add 250,000 SDC statements, we could focus on a smaller number of photos while we developed a reusable workflow and identified any country-specific idiosyncrasies.

We decided to focus on the following countries: Sweden, as we are most familiar with it, having organized our country's Wiki Loves competitions for several years; and Israel and Poland, as they have both contributed large numbers of photos, 40,000 and 170,000 respectively. What these countries have in common is good coverage of their monuments on Wikidata, built partly by Wikimedia Sverige as part of our project Connected Open Heritage a couple of years ago, and partly by volunteers. Being able to match the monument IDs in the photos to their corresponding Wikidata items was a prerequisite for automatically creating depicts statements.

WLM as an example of SDC in user-created photos – who does what?

Ever since Structured Data on Commons was implemented, several users have launched bots to retrieve some of the information from the file descriptions and convert it to structured data statements. They have been working on the more machine-readable pieces of information, such as:

Since these bots, such as BotMultichill and SchlurcherBot, are running continuously, we decided to focus on other types of SDC statements that are not included in their workflow. In our project, we added the following types of statements:

The competition information was extracted from the {{Wiki Loves Monuments}} template, which is obligatory for all the photos. For example, {{Wiki Loves Monuments 2020|il}} provides the information that the photo was submitted to Wiki Loves Monuments 2020 in Israel (Q105730444). As stated previously, both Wiki Loves Monuments 2020 in Israel (Q105730444) and Wiki Loves Monuments 2020 (Q66975112) are added as values of participant in (P1344) – see the discussion at the bot request page.
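The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the regular expression, the function names and the two lookup tables are assumptions made for the example, and the tables contain only the two items named in the text.

```python
import re

# Matches the competition template, e.g. {{Wiki Loves Monuments 2020|il}}.
# The exact pattern is an assumption; real file pages may use other variants.
WLM_TEMPLATE = re.compile(
    r"\{\{\s*Wiki Loves Monuments (\d{4})\s*\|\s*([a-z-]+)\s*\}\}",
    re.IGNORECASE,
)

# Hypothetical lookup tables; only the items named in the text are filled in.
YEARLY_EDITIONS = {2020: "Q66975112"}
REGIONAL_EDITIONS = {(2020, "il"): "Q105730444"}


def parse_competition(wikitext):
    """Return (year, country_code) from the first WLM template, or None."""
    match = WLM_TEMPLATE.search(wikitext)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()


def participant_in_values(wikitext):
    """Return the item IDs to add as values of participant in (P1344)."""
    parsed = parse_competition(wikitext)
    if parsed is None:
        return []
    year, country = parsed
    values = []
    # Yearly edition first, then the regional one, when each is known.
    if year in YEARLY_EDITIONS:
        values.append(YEARLY_EDITIONS[year])
    if (year, country) in REGIONAL_EDITIONS:
        values.append(REGIONAL_EDITIONS[(year, country)])
    return values
```

Editions that are missing from the tables are simply skipped, so a file is never given a guessed value.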

The depicts information was extracted from the monument template that each photo must also have in order to participate in the competition. The goal of the competition is to upload photos of identifiable monuments, after all. Each country or region has its own monument template(s); in the case of Israel, {{Heritage site in Israel}} is used. The regions also differ in how well the monuments are covered on Wikidata, which is a prerequisite for matching the photos with the Wikidata items. There are a number of countries with good coverage of WLM-eligible monuments on Wikidata, including Israel (where the Wikidata work was done by the community) and Sweden (where it was done by Wikimedia Sverige a couple of years ago).

Adding location of creation was the next step, and this was only possible where we could add a depicts statement. That's because we used the value of located in the administrative territorial entity (P131) in the depicted monument's Wikidata item to find this information.
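As a rough sketch of this dependency, the function below derives location statements from a file's depicts values. It assumes the P131 values have already been fetched into a plain dictionary; the property ID P1071 for location of creation, the function name and the placeholder item IDs in the usage note are assumptions of this example, not taken from the project's code.

```python
# Assumed property ID for "location of creation"; the text gives only the label.
LOCATION_OF_CREATION = "P1071"


def derive_location_statements(depicted_items, p131_lookup):
    """Build (property, value) pairs for location of creation.

    depicted_items: item IDs the file already depicts.
    p131_lookup: hypothetical pre-fetched mapping from a monument's item ID
        to the list of its P131 values on Wikidata.
    """
    locations = []
    for qid in depicted_items:
        # A monument without a known P131 value contributes nothing.
        for place in p131_lookup.get(qid, []):
            if place not in locations:
                locations.append(place)
    return [(LOCATION_OF_CREATION, place) for place in locations]
```

For a file that depicts a placeholder item "Q-monument" located in "Q-municipality", the function returns a single pair with that municipality as the value; a file without any depicts value yields an empty list, which mirrors the restriction described above.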

The following illustration shows how the templates in the file description pages of Israeli WLM photos were converted to SDC statements.

Results

WLM Sweden photos that have coordinates of point of view, color-coded by competition year. Query.

We added SDC statements to all the WLM photos from Israel and Sweden, as well as to Wiki Loves Earth photos from Sweden. We have also started the process with WLM photos from Poland, and we are currently in contact with the Polish WLM community to investigate how to process the specific templates they use.

Adding information about competition participation gives us more insight into the participating photos and users, and into the monuments depicted. The accompanying map shows all the photos in WLM Sweden that have coordinates of the point of view (P1259), color-coded by the year of the competition. You can also run the live query and adjust it to your needs.
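A query of this kind could be assembled along the following lines. This sketch is not the query linked from the page: it assumes the Wikimedia Commons Query Service with its usual wd: and wdt: prefixes, the map view's convention of coloring points by a ?layer variable, and a caller-supplied mapping from regional edition items to competition years. The function name is hypothetical.

```python
def build_map_query(editions):
    """Return a SPARQL query string for a map colored by competition year.

    editions: mapping from regional edition item ID to competition year,
        supplied by the caller (for instance, all the Swedish editions).
    """
    # Pair each edition with its year, which becomes the map layer label.
    values = " ".join(
        f'(wd:{qid} "{year}")' for qid, year in sorted(editions.items())
    )
    return (
        "#defaultView:Map\n"
        "SELECT ?file ?coords ?layer WHERE {\n"
        f"  VALUES (?edition ?layer) {{ {values} }}\n"
        "  ?file wdt:P1344 ?edition ;\n"  # participant in
        "        wdt:P1259 ?coords .\n"   # coordinates of the point of view
        "}"
    )
```

Calling `build_map_query({"Q105730444": 2020})` produces a query for the one regional edition whose item ID appears in the text, Wiki Loves Monuments 2020 in Israel; the Swedish editions would be passed in the same way once their item IDs are looked up.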

Next steps – covering more Wiki Loves photos

The workflow we have presented and tested for adding depicts and participant in statements can be applied to Wiki Loves Monuments photos from other countries, as well as to photos from other Wiki Loves competitions, such as Wiki Loves Earth or Wiki Loves Africa. To achieve the best results, it should be carried out by someone who is familiar with the coverage of the local monuments on Wikidata and with how the monument templates are structured, for example the organizers or volunteers in the local competitions, possibly with support from Wikimedia Sverige. The more photos are enriched with this sort of data, the more interesting and nuanced the analyses of the photos and the depicted monuments can be. This could increase interest in the photo competitions and help identify new approaches, such as focusing on monuments that have not yet been photographed.

At the same time, there has been marked interest in migrating the WLM database to Wikidata. In some countries, this process has been completed and the Wikidata items are maintained and updated continuously; in others it has yet to be started. As good coverage of the monuments on Wikidata is a prerequisite for adding depicts statements, we imagine that communities interested in "wikidatafying" their WLM competitions might want to kill two birds with one stone and work actively on adding depicts statements to their photos.