Commons:Koninklijke Bibliotheek/Machine access
Machine access to our media files on Commons (API, SPARQL etc.)
This page illustrates some approaches to machine interaction with KB media files and categories, both for
- requesting/querying data from Commons, and
- writing/posting data to Commons.
Requesting/querying data from Commons
We can request data from Commons using the Wikimedia Commons query API, the Commons SPARQL service and readily available tools. The following examples are taken from the article 50 cool new things you can now do with KB’s collection highlights - Part 5, Reuse. The item numbers below refer to the numbering in that article.
Requesting images and URLs from readily available bulk image download tools
(item 42 in the article)
The easiest way to obtain high-resolution image URLs, or the high-resolution images themselves, from a specific Commons category is to use readily available bulk image download tools.
- Using the Wiki Loves Downloads tool you can easily get all the direct download URLs of the high-resolution images of e.g. the Reward letter of King Filip II of Spain to family of Balthasar Gerards, 1590 (but any category will do).
- Because this tool was developed by Wikimedia Deutschland, the default user interface is in German. We can use Google Translate to get an English user interface for international audiences. As stated in the tool, it divides the images of a category from Wikimedia Commons into a desired number of lists and generates these in the form of (zipped) text files with links to the respective images so that they can be downloaded with the help of a download manager (or a script).
- If you prefer the images themselves rather than only the URLs, the Java-based Imker bulk image download tool is the way to go. It downloads all images from a specific category (or page) on Wikimedia Commons (or any other Wikimedia site) to your local machine.
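For the "(or a script)" route mentioned above, the following is a minimal sketch of the first step only: reading a list of links and deriving a local file name for each. It assumes the text files contain one URL per line, which is not confirmed here; the helper names and the sample URL are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit, unquote

def read_url_list(text):
    """Return the non-empty lines of a link list that look like URLs."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line.startswith("http")]

def local_name(url):
    """Derive a local file name from the last path segment of a URL."""
    return unquote(posixpath.basename(urlsplit(url).path))

# Hypothetical link list, standing in for one of the generated text files.
sample_list = """
https://upload.wikimedia.org/wikipedia/commons/0/00/Example%2C_page_1.jpg
https://upload.wikimedia.org/wikipedia/commons/0/01/Example%2C_page_2.jpg
"""
for url in read_url_list(sample_list):
    print(local_name(url), "<-", url)
```

The actual download (for example with `requests.get(url, headers=...)` and writing `response.content` to `local_name(url)`) is left out so that the sketch stays offline.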
Requesting image URLs from the Wikimedia Commons query API
(item 41 in the article)
Let’s look at three ways of requesting various URLs of images in a specific Commons category from the Wikimedia Commons query API, using this documentation.
1) For the images in Category:Armorial de Beyeren we can request a simple list of images
- in JSON: https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmlimit=500&gcmtitle=Category:Armorial%20de%20Beyeren&format=json&gcmnamespace=6
- in XML: https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmlimit=500&gcmtitle=Category:Armorial%20de%20Beyeren&format=xml&gcmnamespace=6
- Here we are filtering on namespace=6 (ns=6), so we are looking for files only (i.e. no categories (ns=14) or galleries (ns=0)).
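The same call can be assembled and its JSON response reduced to a plain list of file names in a few lines of Python. This is a minimal sketch, assuming the response has the usual `query` / `pages` layout of a generator query; the helper names and the trimmed sample response are made up for illustration.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def build_category_url(category, fmt="json", limit=500):
    """Build the query URL for the files (namespace 6) in a category."""
    params = {
        "action": "query",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmnamespace": 6,
        "gcmlimit": limit,
        "format": fmt,
    }
    return API + "?" + urlencode(params)

def file_titles(response):
    """Return the sorted page titles from a parsed JSON response."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(page["title"] for page in pages.values())

# Hypothetical, heavily trimmed response, used only to show the layout.
sample = {"query": {"pages": {
    "2": {"pageid": 2, "ns": 6, "title": "File:B.jpg"},
    "1": {"pageid": 1, "ns": 6, "title": "File:A.jpg"},
}}}
print(build_category_url("Category:Armorial de Beyeren"))
print(file_titles(sample))
```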
2) The downside of the above responses is that they do not contain URLs (starting with https://), but only Wikimedia Commons file names. To request https URLs we need to modify our API call. When we apply the modified call to images of Visboek Coenen,
- we get this XML response
- Please note that this response contains both direct download URLs of the high-resolution images (example), as well as two forms of URLs of the Wikimedia Commons file page (example).
3) Besides calling the API by a URL query string (as done above), it is also possible to do this via a Python script. For instance, we can request the miniatures from Chroniques de Froissart:
#!/usr/bin/python3
import requests, json
S = requests.Session()
URL = "https://commons.wikimedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "gcmtitle":"Category:Chroniques_de_Froissart,_vol_1_-_Den_Haag,_KB_:_72_A_25_(details)",
    "gcmlimit":"500",
    "generator":"categorymembers",
    "format":"json",
    "gcmtype":"file",
    "prop":"imageinfo",
    "iiprop":"url"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
PAGES = DATA.get('query').get('pages')
print(json.dumps(PAGES, indent=2))
- Running this script, for instance in the PyCharm Python IDE, prints the page records of the category members as indented JSON. Each record carries the imageinfo URLs of one file.
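A single request returns at most `gcmlimit` category members, so a larger category needs follow-up requests. The MediaWiki API signals this with a `continue` object in the response, whose values are sent back as extra parameters in the next request. The sketch below shows that loop under the assumption that each batch has the `query` / `pages` layout used above; the name `iter_pages` and the `fetch` callable are illustrative and not part of the script above.

```python
def iter_pages(fetch, params):
    """Yield page records from every batch of a generator query.

    `fetch` is any callable that takes a dict of API parameters and
    returns the parsed JSON response as a dict.
    """
    params = dict(params)  # work on a copy, leave the caller's dict alone
    while True:
        data = fetch(params)
        yield from data.get("query", {}).get("pages", {}).values()
        if "continue" not in data:
            break
        # Feed the continuation values back into the next request.
        params.update(data["continue"])
```

With the session and parameters of the script above, this could be called as `iter_pages(lambda p: S.get(url=URL, params=p).json(), PARAMS)`.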
Generating an off-Wiki HTML image gallery from the Wikimedia Commons API
(item 46 in the article - taken from the tutorial Reusing the album amicorum Jacob Heyblocq - Image gallery of album contributors on GitHub.)
Let's say you want to create an off-Wiki HTML image gallery from the images in a specific Commons category, for instance from Category:Contributors to the album amicorum Jacobus Heyblocq (but any other category will do as well).
You can do this using the Commons API. From the response of the query API and using this Python script we can generate a basic HTML image gallery/facebook of contributors.
The resulting HTML page looks like this (as of 12 May 2021):
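The linked script is not reproduced here. Purely as an illustration of the idea, the sketch below turns the `pages` part of an imageinfo response (requested with `iiprop=url`) into a very basic HTML gallery. It is not the script from the tutorial; the function name, the markup and the sample data are assumptions.

```python
from html import escape

def build_gallery(pages, title="Image gallery"):
    """Return a basic HTML page with one linked image per file."""
    figures = []
    for page in sorted(pages.values(), key=lambda p: p["title"]):
        info = page["imageinfo"][0]
        name = page["title"].split(":", 1)[-1]  # drop the "File:" prefix
        # "thumburl" is only present when the request also set iiurlwidth.
        src = info.get("thumburl", info["url"])
        figures.append(
            '<figure><a href="{0}"><img src="{1}" alt="{2}"></a>'
            "<figcaption>{2}</figcaption></figure>".format(
                escape(info["descriptionurl"]), escape(src), escape(name)
            )
        )
    return (
        "<!DOCTYPE html>\n<html><head><title>{0}</title></head>\n"
        "<body>\n{1}\n</body></html>"
    ).format(escape(title), "\n".join(figures))

# Hypothetical page records with placeholder URLs.
sample_pages = {
    "11": {"title": "File:C.jpg", "imageinfo": [{
        "url": "https://example.org/full/C.jpg",
        "descriptionurl": "https://example.org/wiki/File:C.jpg"}]},
    "10": {"title": "File:A & B.jpg", "imageinfo": [{
        "url": "https://example.org/full/A_%26_B.jpg",
        "descriptionurl": "https://example.org/wiki/File:A_%26_B.jpg"}]},
}
print(build_gallery(sample_pages))
```

Adding `iiurlwidth` to the API request keeps the page light, because the gallery then embeds thumbnails instead of the full-size files.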
Retrieving P180-Depicted entities in structured file data using SPARQL and the Commons API
(item 47 in the article)
For many of the files in Category:Atlas de Wit 1698 one or more Depicts (P180) statements from Wikidata have been added to the structured data. Furthermore, digital representation of (P6243) = Atlas De Wit 1698, collection KB (Q2520345) has been added to the structured data of all files. These two facts allow us to programmatically retrieve the things that are depicted in the atlas. We can do this in four ways:
- the Wikimedia Commons SPARQL query service,
- the Wikimedia Commons API,
- the PetScan tool, or
- the Minefield tool.
1) Wikimedia Commons SPARQL query service
To retrieve the depicted things via the Wikimedia Commons SPARQL query service, we use this query:
#Things depicted in Atlas de Wit 1698
SELECT ?file (GROUP_CONCAT(DISTINCT ?depictionLabel ; separator = " -- ") as ?ThingsDepicted)
WHERE {
  ?file wdt:P6243 wd:Q2520345 . # digital representation of (P6243) = Atlas De Wit 1698, collection KB (Q2520345)
  ?file wdt:P180 ?depiction .
  SERVICE <https://query.wikidata.org/sparql> {
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language "en" .
      ?depiction rdfs:label ?depictionLabel.
      ?file rdfs:label ?fileLabel.
    }
  }
}
GROUP BY ?file
giving this result, which can also be requested as JSON.
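When the result is requested as JSON, it can be unpacked with a short Python function. The sketch below assumes the standard SPARQL JSON results layout (`results` / `bindings`) and the two column names of the query above, `file` and `ThingsDepicted`; the function name and the sample labels are made up for illustration.

```python
def depicted_by_file(sparql_json, separator=" -- "):
    """Map each file URI to the list of labels in ?ThingsDepicted."""
    result = {}
    for row in sparql_json["results"]["bindings"]:
        file_uri = row["file"]["value"]
        labels = row.get("ThingsDepicted", {}).get("value", "")
        result[file_uri] = [label for label in labels.split(separator) if label]
    return result

# Hypothetical single-row result; the labels are placeholders.
sample_result = {
    "head": {"vars": ["file", "ThingsDepicted"]},
    "results": {"bindings": [{
        "file": {"type": "uri",
                 "value": "https://commons.wikimedia.org/entity/M32093127"},
        "ThingsDepicted": {"type": "literal",
                           "value": "example label A -- example label B"},
    }]},
}
for uri, labels in depicted_by_file(sample_result).items():
    print(uri, labels)
```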
2) Wikimedia Commons API
The Wikimedia Commons API allows us to retrieve depicted entities for individual images. Let’s use https://commons.wikimedia.org/wiki/File:Atlas_de_Wit_1698-pl048-Montfoort-KB_PPN_145205088.jpg as an example. As can be seen from the Concept URI link in the Tools navigation on the left, this file can also be requested via the URI https://commons.wikimedia.org/entity/M32093127, where ‘32093127’ is the Page ID that is listed in the Page information, also in the left-hand navigation. This M-number is Wikimedia Commons’ equivalent of the Wikidata Q-number.
From that M-number (M+Page ID) we can request the (Wikidata Q-numbers of the) depicted entities via the API call https://commons.wikimedia.org/w/api.php?action=wbgetentities&format=json&ids=M32093127 as JSON:
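The part of such a response that matters here is the `statements` object of the entity, keyed by property ID. The sketch below pulls the Q-numbers of the P180 statements out of an already parsed response. The function name and the trimmed sample are illustrative; the handling of an empty `statements` value (which the API may return as an empty list rather than an object) and of statements without a concrete value is a defensive assumption.

```python
def depicted_qids(response, mnumber):
    """Return the Q-numbers of the P180 (depicts) statements of one file."""
    entity = response.get("entities", {}).get(mnumber, {})
    statements = entity.get("statements") or {}  # may be [] when empty
    qids = []
    for statement in statements.get("P180", []):
        datavalue = statement["mainsnak"].get("datavalue")
        if datavalue:  # statements without a concrete value are skipped
            qids.append(datavalue["value"]["id"])
    return qids

# Hypothetical, heavily trimmed response showing only the fields used.
sample_entity = {"entities": {"M32093127": {
    "id": "M32093127",
    "statements": {"P180": [
        {"mainsnak": {"snaktype": "value", "property": "P180",
                      "datavalue": {"type": "wikibase-entityid",
                                    "value": {"entity-type": "item",
                                              "id": "Q144"}}}},
        {"mainsnak": {"snaktype": "somevalue", "property": "P180"}},
    ]},
}}}
print(depicted_qids(sample_entity, "M32093127"))
```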
If we want to list all things depicted in all images in Category:Atlas de Wit 1698, we can write a small Python script to iterate over all images in that category, using the API call we saw before to request the pageIDs and titles of the files in that category:
import requests
import json
baseurl = "https://commons.wikimedia.org/w/api.php?action="
cat = "Category:Atlas_de_Wit_1698"
headers = {'Accept' : 'application/json', 'User-Agent': 'User OlafJanssen - Category:Atlas_de_Wit_1698'}
# Request the page IDs and titles of all files (namespace 6) in the category
filesurl= baseurl + "query&generator=categorymembers&gcmlimit=500&gcmtitle=" + cat + "&format=json&gcmnamespace=6"
files = requests.get(filesurl, headers=headers)
filesdata = json.loads(files.text)
pageids=list(filesdata['query']['pages'].keys())
for pageid in pageids:
    # The M-number is "M" followed by the page ID
    mnumber="M"+str(pageid)
    pageurl= baseurl + "wbgetentities&format=json&ids=" + str(mnumber)
    pageresponse = requests.get(pageurl, headers=headers)
    pagedata = json.loads(pageresponse.text)
    pagetitle=pagedata.get('entities').get(mnumber).get('title')
    # 'XX' is a sentinel meaning "this file has no P180 statements"
    p180s = pagedata.get('entities').get(mnumber).get('statements').get('P180', 'XX')
    if str(p180s) != "XX":
        depictslist=[]
        for p in range(0, len(p180s)):
            qnum= p180s[p]['mainsnak']['datavalue']['value']['id']
            # Look up the English label of the depicted item on Wikidata
            depictsurl = "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=" + str(qnum) + "&props=labels&languages=en&format=json"
            depictsresponse = requests.get(depictsurl, headers=headers)
            depictsdata = json.loads(depictsresponse.text)
            depicts = depictsdata.get('entities', 'XX').get(qnum).get('labels', 'XX').get('en', 'XX')
            if str(depicts) != "XX":
                a = str(depicts.get('value')) + " (" + str(qnum) + ")"
                depictslist.append(a)
        # One output line per file: M-number, title and depicted things
        print(str(mnumber) + " || " + str(pagetitle) + " || " + ' -- '.join(depictslist))
This gives the following result:
M32246841 || File:Atlas de Wit 1698-pl017-Leiden-de burcht.jpg || tree (Q10884) -- peafowl (Q201251) -- dog (Q144) -- Burcht van Leiden (Q2345558) -- gate (Q53060)
M32092934 || File:Atlas de Wit 1698-pl017-Leiden-KB PPN 145205088.jpg || Leiden (Q43631) -- Rhine (Q584) -- Oude Rijn (Q2478570) -- Nieuwe Rijn (Q671841) -- Nieuwe Rijn (Q57945772) -- Pieterskerk (Q1537972) -- Hooglandse Kerk (Q1537970) -- Rapenburg (Q2597656) -- Academy Building (Q2515805) -- Hortus Botanicus Leiden (Q2468128) -- Zijlpoort (Q2326072) -- bolwerk (Q891475) -- fortified town (Q677678) -- Morschpoort, Leiden (Q2688448) -- Marepoort (Q1817627) -- Burcht van Leiden (Q2345558)
M32246845 || File:Atlas de Wit 1698-pl017-Leiden-Pieterskerk.jpg || Pieterskerk (Q1537972) -- tree (Q10884) -- dog (Q144) -- weather vane (Q524738)
M32246848 || File:Atlas de Wit 1698-pl017-Leiden-St Pancraskerk.jpg || Hooglandse Kerk (Q1537970) -- cloud (Q8074) -- weathercock (Q2157687) -- leadlight (Q488094) -- door (Q36794) -- woman (Q467) -- child (Q7569) -- dog (Q144) -- hat (Q80151) -- carriage (Q235356) -- walking stick (Q1347864) -- horse (Q726) -- tree (Q10884) -- crow-stepped gable (Q1939660) -- clock (Q376) -- Burcht van Leiden (Q2345558)
M32246852 || File:Atlas de Wit 1698-pl017-Leiden-stadhuis.jpg || Leiden City Hall (Q2191676) -- dog (Q144) -- crow-stepped gable (Q1939660) -- cow (Q11748378)
M32092941 || File:Atlas de Wit 1698-pl017a-Leiden, Stadhuis-KB PPN 145205088.jpg || Leiden City Hall (Q2191676) -- Hooglandse Kerk (Q1537970) -- Leiden (Q43631) -- Burcht van Leiden (Q2345558) -- dog (Q144) -- cow (Q11748378) -- peafowl (Q201251) -- Pieterskerk (Q1537972) -- coach (Q4655519) -- crow-stepped gable (Q1939660) -- gate (Q53060)
....
M32092951 || File:Atlas de Wit 1698-pl018-Amsterdam-KB PPN 145205088.jpg || Amsterdam (Q727) -- Royal Palace of Amsterdam (Q1056152) -- fortified town (Q677678)
M32092959 || File:Atlas de Wit 1698-pl018a-Amsterdam, Dam-KB PPN 145205088.jpg || Dam Square (Q839050) -- Royal Palace of Amsterdam (Q1056152) -- dog (Q144) -- horse (Q726) -- fire extinguisher (Q190672) -- fire department (Q6498663) -- Nieuwe Kerk (Q1419675) -- weigh house (Q1407236) -- Oude Kerk (Q623558) -- pump (Q134574) -- fire hose (Q1410061) -- firewater (Q5452025) -- coat of arms of Amsterdam (Q683829)
M32092960 || File:Atlas de Wit 1698-pl018b-Amsterdam, Stadhuis-KB PPN 145205088.jpg || Royal Palace of Amsterdam (Q1056152)
M32092964 || File:Atlas de Wit 1698-pl018c-Amsterdam, profiel (Joan de Ram)-KB PPN 145205088.jpg || Amsterdam (Q727) -- boat (Q35872) -- river (Q4022)
M32092969 || File:Atlas de Wit 1698-pl018d-Amsterdam, Oude Kerk-KB PPN 145205088.jpg || Royal Palace of Amsterdam (Q1056152) -- weigh house (Q1407236) -- Nieuwe Kerk (Q1419675) -- market (Q37654) -- horse (Q726) -- Euronext Amsterdam (Q478720) -- exchange building (Q10882966) -- Oude Kerk (Q623558) -- péniche (Q7578326) -- porter (Q1509714) -- coat of arms of Amsterdam (Q683829) -- dog (Q144)
.....
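The script above sends one `wbgetentities` request per file. The API also accepts several IDs in one call, separated by `|` (to our knowledge up to 50 per request for regular accounts), which reduces the number of requests considerably. A minimal sketch of the batching step, with illustrative function names:

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_params(pageids, size=50):
    """Build one wbgetentities parameter dict per batch of page IDs."""
    mnumbers = ["M" + str(pageid) for pageid in pageids]
    return [
        {"action": "wbgetentities", "format": "json", "ids": "|".join(batch)}
        for batch in chunked(mnumbers, size)
    ]

# Three page IDs taken from the output above, batched two at a time.
for params in batch_params([32246841, 32092934, 32246845], size=2):
    print(params["ids"])
```

Each parameter dict can be passed to `requests.get("https://commons.wikimedia.org/w/api.php", params=params, headers=headers)`; the `entities` object of the response then contains one entry per M-number in the batch.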
3) PetScan tool
An alternative way of finding the page IDs of the category members is to use the JSON response of the PetScan tool for the given category. We leave it to the reader to implement this approach in the Python script above.
4) Minefield tool
Another way to find the M-numbers of the category members is via the Minefield tool. The input field expects a list of Commons file page titles. The list can be easily extracted from the XML response of the API call https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmlimit=500&gcmtitle=Category:Atlas_de_Wit_1698&format=xml&gcmnamespace=6, where the Title fields contain the relevant file page titles. After cleaning up this XML and copy-pasting the list into the tool, the response looks like this:
"mid","status","title","url"
"M32093222","ok","File:Atlas de Wit 1698-pl070-Brussel-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093222"
"M32093229","ok","File:Atlas de Wit 1698-pl071-Antwerpen-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093229"
"M32093235","ok","File:Atlas de Wit 1698-pl071a-Antwerpen, Kasteel-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093235"
"M32093242","ok","File:Atlas de Wit 1698-pl071b-Antwerpen, Oostershuis-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093242"
"M32093246","ok","File:Atlas de Wit 1698-pl071c-Antwerpen, Stadhuis-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093246"
"M32093250","ok","File:Atlas de Wit 1698-pl072-Mechelen-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093250"
"M32093263","ok","File:Atlas de Wit 1698-pl073-Tienen-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093263"
"M32093268","ok","File:Atlas de Wit 1698-pl074-Lier-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093268"
"M32093271","ok","File:Atlas de Wit 1698-pl075-Leuven-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093271"
"M32093273","ok","File:Atlas de Wit 1698-pl076-'s-Hertogenbosch-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093273"
"M32093277","ok","File:Atlas de Wit 1698-pl077-Breda-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093277"
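Output in this form is ordinary quoted CSV, so it can be read with Python's `csv` module, which also copes with the commas inside some file titles. A minimal sketch, assuming the four columns shown above (`mid`, `status`, `title`, `url`); the function name is illustrative and the sample consists of three lines copied from the response above.

```python
import csv
import io

def parse_minefield_csv(text):
    """Map file page titles to M-numbers for all rows with status 'ok'."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["title"]: row["mid"] for row in reader if row["status"] == "ok"}

sample_csv = '''"mid","status","title","url"
"M32093235","ok","File:Atlas de Wit 1698-pl071a-Antwerpen, Kasteel-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093235"
"M32093277","ok","File:Atlas de Wit 1698-pl077-Breda-KB PPN 145205088.jpg","https://commons.wikimedia.org/wiki/Special:EntityData/M32093277"
'''
for title, mid in parse_minefield_csv(sample_csv).items():
    print(mid, title)
```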
Requesting (meta)data associated with an individual image
(item 50 in the article)
Using the Wikimedia Commons API tool created by Magnus Manske, we can programmatically request (meta)data associated with an individual image in XML, for instance for a map of the Iberian Peninsula from Atlas van der Hagen:
- https://tools.wmflabs.org/magnus-toolserver/commonsapi.php?image=Atlas_Van_der_Hagen-KW1049B12_002-HISPANIAE_ET_PORTUGALIAE_REGNA.jpeg&thumbwidth=234 - returns the URL of a thumbnail of 234px wide
- https://tools.wmflabs.org/magnus-toolserver/commonsapi.php?image=Atlas_Van_der_Hagen-KW1049B12_002-HISPANIAE_ET_PORTUGALIAE_REGNA.jpeg&thumbwidth=234&meta - adds the EXIF data
We can also query the Commons API directly to retrieve information about an individual image. We use these examples and this imageinfo API documentation for inspiration. For example:
- https://commons.wikimedia.org/w/api.php?action=query&titles=Image%3AAtlas_Van_der_Hagen-KW1049B12_002-HISPANIAE_ET_PORTUGALIAE_REGNA.jpeg&prop=imageinfo&iiprop=url&format=json - returns the direct image URL, the regular file page URL and the permanent file page URL.
- https://commons.wikimedia.org/w/api.php?action=query&titles=Image%3AAtlas_Van_der_Hagen-KW1049B12_002-HISPANIAE_ET_PORTUGALIAE_REGNA.jpeg&prop=imageinfo&iiprop=extmetadata&iiextmetadatalanguage=nl&format=json - returns the formatted metadata (i.e. with HTML markup) in Dutch
- https://commons.wikimedia.org/w/api.php?action=query&titles=Image%3AAtlas_Van_der_Hagen-KW1049B12_002-HISPANIAE_ET_PORTUGALIAE_REGNA.jpeg&prop=imageinfo&iiprop=metadata&iimetadataversion=latest&format=json - returns the EXIF data
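Because the `extmetadata` values come with HTML markup, a reuser will often want plain text. The sketch below strips the tags from an already parsed `iiprop=extmetadata` response. It assumes the usual layout in which every metadata field is an object with a `value` key; the function name and the sample values are made up for illustration, and the tag stripping is a simple regular expression rather than a full HTML parser.

```python
import re
from html import unescape

def plain_extmetadata(response):
    """Return the extmetadata of the first page as plain-text values."""
    page = next(iter(response["query"]["pages"].values()))
    extmetadata = page["imageinfo"][0]["extmetadata"]
    cleaned = {}
    for key, field in extmetadata.items():
        text = re.sub(r"<[^>]+>", "", str(field.get("value", "")))
        cleaned[key] = unescape(text).strip()
    return cleaned

# Hypothetical, trimmed response with placeholder values.
sample_meta = {"query": {"pages": {"123": {"imageinfo": [{"extmetadata": {
    "Artist": {"value": '<a href="https://example.org">Example &amp; Co</a>',
               "source": "commons-desc-page"},
    "LicenseShortName": {"value": "Public domain",
                         "source": "commons-desc-page"},
}}]}}}}
print(plain_extmetadata(sample_meta))
```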
Writing/posting data to Commons
Add structured data to files on Commons from an Excel sheet
This Python script writes Property-Qid pairs from an Excel sheet to the Structured Data of files on Wikimedia Commons.
For instance, it can add putto (Q284865) to the depicts (P180) property of File:Atlas Schoemaker-UTRECHT-DEEL1-3120-Utrecht, Utrecht.jpeg from the Excel file P180Inputfile.xlsx.
Although mainly intended to add P180 values in bulk, this script can also add Wikidata Qids to properties other than P180 in the structured data.
For further info and configuration, see https://github.com/KBNLwikimedia/SDoC/tree/main/writeSDoCfromExcel and https://commons.wikimedia.org/wiki/Commons:WriteSDoCfromExcel
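To give an idea of what a single write looks like at API level, the sketch below builds the parameters for one `wbcreateclaim` request that adds an item value to a property of a file's structured data. This is not taken from the linked script, which may work differently; the function name and the placeholder M-number and token are assumptions. A real request must be sent as an authenticated POST with a valid CSRF token.

```python
import json

def claim_params(mnumber, prop, qid, csrf_token):
    """Build the parameters for adding one Property-Qid pair to a file."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("expected a Wikidata Q-number, got %r" % qid)
    value = {"entity-type": "item", "numeric-id": int(qid[1:])}
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": mnumber,        # M-number of the file
        "property": prop,         # e.g. P180 (depicts)
        "snaktype": "value",
        "value": json.dumps(value),
        "token": csrf_token,
    }

# Placeholder M-number and token; Q284865 (putto) is the example from above.
print(claim_params("M123", "P180", "Q284865", "placeholder-token"))
```

Building the parameters in a separate function makes it easy to loop over the rows of a spreadsheet and to check the payloads before anything is written to Commons.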