Open main menu

Commons talk:Structured data

Multilingual captions beta testingEdit

The Structured Data on Commons team has begun beta testing of the first feature, multilingual file captions, and all community members are invited to test it out. Captions is based on designs discussed with the community[1][2] and the team is looking forward to hearing about testing. If all goes well during testing, captions will be turned on for Commons around the second week of January, 2019.

Multilingual captions are plain text fields that provide brief, easily translatable details of a file in a way that is easy to create, edit, and curate. Captions are added during the upload process using the UploadWizard, or they can be added directly on any file page on Commons. Adding captions in multiple languages is a simple process that requires only a few steps.

The details:

  • There is a help page available on how to use multilingual file captions.
  • Testing will take place on Beta Commons. If you don’t yet have an account set up there, you’ll need one.
  • Beta Commons is a testbed, and not configured exactly like the real Commons site, so expect to see some discrepancies with user interface (UI) elements like search.
  • Structured Data introduces the potential for many important page changes to happen at once, which could flood the recent changes list. Because of this, Enhanced Recent Changes is enabled as it currently is at Commons, but with some UI changes.
  • Feedback and commentary on the file caption functionality are welcome and encouraged on the discussion page for this post.
  • Some testing has already taken place and the team are aware of some issues. A list of known issues can be seen below.
  • If you discover a bug/issue that is not covered in the known issues, please file a ticket on Phabricator and tag it with the “Multimedia” tag. Use this link to file a new task already tagged with "Multimedia" and "SDC General".

Known issues:


-- Keegan (WMF) (talk), for the Structured Data on Commons Team 20:40, 17 December 2018 (UTC)

Notability discussion on WikidataEdit

People following the structured data project may be interested in this discussion that has kicked off on Wikidata:


A bot job started by User:Mike Peel has been stopped, that was creating Wikidata items for people with Commons categories that cannot be matched to Wikidata, on the grounds that such people may not necessarily be notable.

SDC of course would require such items to exist, for it to be possible to make statements about them. Items for every person or thing that currently has a Commons category would seem a bare minimum -- some visions for SDC envisage going much further, for example creating an individual Wikidata item for every single separate museum object that we currently have an image of.

Whatever the outcome, this is something we desperately need more clarity on, looking forwards; not least to plan around, in the event that such items on Wikidata would not exist. Do any of the SDC team have any thoughts, eg @SandraF (WMF): ? Jheald (talk) 12:49, 7 January 2019 (UTC)

Jheald thanks for the link. --Jarekt (talk) 15:04, 7 January 2019 (UTC)
The decisionmaking around this topic is fully up to the community... As a staff member, I want to make a point of not wanting to impose an opinion on this at all. With my volunteer hat on, I have no strong opinions either. If we create Wikidata items for everything, we must be able to properly maintain that huge mass of items too... I think less notable heritage objects can be modeled purely based on more generic statements (represents vase, with features blue paint / flowers/fishes... / designed by x / with inventory number nnn / in collection y) on Commons, and we can also decide to model less notable people in a similar, more generic way there. But I will happily follow the broader community's wishes if there is consensus about creating Wikidata items for everything. SandraF (WMF) (talk) 17:37, 7 January 2019 (UTC)
@SandraF (WMF): I think less notable heritage objects can be modeled purely based on more generic statements [i.e. without their own items], and we can also decide to model less notable people in a similar, more generic way
If you do believe this, I would like to see a fully worked-up example, to establish (i) how information about the underlying object, and its nature, creator, copyright status, licensing, history etc, would be kept distinct from information about its depiction/photograph; (ii) how this is possible when it is not possible to have qualifiers on qualifiers -- something the current Commons:Structured_data/Properties_table shows up as a major unresolved difficulty; (iii) how this would play alongside images where description in terms of wikidata items would be possible -- how great would the difficulties be that we would get into, if we would be trying to operate two quite different data models at the same time?
Rather than just you saying that you think this can be done, if having to go down this road is even slightly conceivable as an outcome, I would like to see some hard modelling to show how it definitely can be done; and what the consequences would be. Because to date I'm not sure that the data designs so far presented would cut it. Jheald (talk) 18:23, 7 January 2019 (UTC)
Yup. Whether one or the other solution is satisfactory is up to the community to reach consensus about! Deployment is around the corner, so the community can try this quite soon. Seeing the technology in front of one's eyes will certainly clarify things and cause more people to have strong opinions about this. SandraF (WMF) (talk) 09:03, 8 January 2019 (UTC)

Multilingual file captions coming this weekEdit

Hi all, following up on last month's announcement...

Multilingual file captions will be released this week, on either Wednesday, 9 January or Thursday, 10 January 2019. Captions are a feature to add short, translatable descriptions to files. Here's some links you might want to look follow before the release, if you haven't already:

  1. Read over the help page for using captions - I wrote the page on because captions are available for any MediaWiki user, feel free to host/modify a copy of the page here on Commons.
  2. Test out using captions on Beta Commons.
  3. Leave feedback about the test on the captions test talk page, if you have anything you'd like to say prior to release.

Additionally, there will be an IRC office hour on Thursday, 10 January with the Structured Data team to talk about file captions, as well as anything else the community may be interested in. Date/time conversion, as well as a link to join, are on Meta.

Thanks for your time, I look forward to seeing those who can make it to the IRC office hour on Thursday. I'll add a reminder to this post once I confirm exactly what day captions will be turned on for Commons. Keegan (WMF) (talk) 01:06, 8 January 2019 (UTC)

  • Apart from the (cumbersome and totally useless) language selection drop-down, which seems to have been already in use elsewhere, I cannot see anything new. So, descriptions can and should be added to Commons files — how’s that any different than previous practice? -- Tuválkin 14:54, 8 January 2019 (UTC)
  • Could you please explain more on how you find using the translation feature cumbersome and useless? Do you find it easier or more difficult to add a language to a description template? You are certainly welcome to not use captions if you do not find the feature useful for your work, but if there's a way that it can be improved we'd like to hear about it. Additionally, if you could provide a link to this tool that seems to already exist elsewhere here, I'd appreciate it because I haven't seen it and I'd like to take a look. Keegan (WMF) (talk) 18:16, 8 January 2019 (UTC)
  • what I said is cumbersome and totally useless is the language selection drop-down, which seems to be the same exact element that shows up as a generic language-selection tool when one uses an WMF project while not logged in. I think it is cumbersome because it is made up mostly of empty space and the way languages are sorted (geographically, and ignoring the browser’s options on prefered languages?) makes it hard to find a language, not to mention the unintuitive way with scrolls and gains selection focus. I guess that you coopted this pre-existing element (which is of course good practice), but it is an essential part of the whole caprions feature. For me, the ideal language selector is a single, easily scrollable list of languages, properly sorted (the collation of which would be interesting to discuss, in terms of internationalized user expectations), whence to pick one out (one or several — Ctrl-click does work on some devices). That much for cumbersome. It is also useless because when a user is logged in there’s no need to present a complete list of languages. Even the most formidable polyglot will have to pick from a dozen or two; only in the unlikely situation one would be contributing in a language one’s not versed on such a general selection too wouled be needed — and for that a "more languages" button seem better than what we have now.
  • I certainly do find it simplest to click to edit the file page’s wikicode and add {{ab|Something here.}} (or {{ab|1= Something here.}}, if I’m feeling chatty) next to where it says |Description = }} — way simpler than going through UI hoops, but I understand that’s not what you’re after, especially since what I find simplest is already tried and tested and working for many years. But even if wikitext needs to be not offered, there’s Visual Editor apparently working in many projects; adding captions to be injected in a page seems to be the most basic of its functions.
What I asked is what this new feature amounts to. We’ll have a pencil icon that brings up an already existing language-selection tool and thence we procede to a rich text entering box/screen whose working are new either? Is it the pencil icon that is new?, to be shown in file pages and interspesed in the upload wizzard? -- Tuválkin 22:36, 8 January 2019 (UTC)
From a technical standpoint, captions are like labels on Wikidata. They will be searchable through the API, making it easy to find/filter/pull captions from files as metadata. There are a lot of possibilities of what this can be used for, from filling in infoboxes, building lists for translation of important files needing a caption localized for a project or campaign, searching for and finding captions, etc. So in comparison to description templates, while they potentially contain similar data to a caption, their function and reuse purposes are very different. Keegan (WMF) (talk) 19:57, 9 January 2019 (UTC)
  • @Keegan (WMF): Understood, thanks. Maybe this field can/could be populated with the contents of the |Description = field of {{Information}}, {{Artwork}}, and other such templates. -- Tuválkin 01:43, 10 January 2019 (UTC)
  • If I understood it correctly, Tuvalkin meant an automatic action. The ideas behind structured data, semantic web and LOD are great, changes are good, but when I think of few thousands of my files on commons I would really love some bot. Especially, that descriptions are well structures in templates and mostly in size of captions. Nova (talk) 20:57, 10 January 2019 (UTC)
  • I think a real bot would create to much trash, but a Tool like VisualFileChange would be great. --GPSLeo (talk) 21:03, 10 January 2019 (UTC)

Captions are liveEdit

Captions can now be added to files on Commons. There's a bug with abusefilter sending errors to new accounts adding captions, the bug is being investigated and fixed right now. IRC office hours will be in a little over one hour from now, I look forward to seeing you there if you can attend. Keegan (WMF) (talk) 16:50, 10 January 2019 (UTC)

  • Is there a way to disable the box, or make it much less invasive? It's very annoying that it pushes the actual captions some half page down. Nemo 18:13, 10 January 2019 (UTC)
    • @Nemo bis: Make this edit to your user css and it'll disable the captions. If @Keegan (WMF): or someone could add an ID to the css surround for it then we could attach some extra css tags to it to show/hide it, which would be better. Thanks. Mike Peel (talk) 19:36, 10 January 2019 (UTC)
      • @Mike Peel: I'll make a Phabricator ticket later today to look into that. Keegan (WMF) (talk) 20:00, 10 January 2019 (UTC)
      • Thanks for the css tip, looking forward the show/hide option. Nova (talk) 19:56, 10 January 2019 (UTC)
        • Update: try this to also disable the "structured data" header. Thanks. Mike Peel (talk) 20:28, 10 January 2019 (UTC)
          • Works better now, thanks. Nova (talk) 20:41, 10 January 2019 (UTC)
    • It's easy indeed to hide the entire thing, but I'd just like it to still be there somewhere and not take one third of my screen or so. I suspect someone assumed that people don't care about existing descriptions being pushed out of the screen, or that nobody speaks more than 2 languages. Nemo 16:07, 11 January 2019 (UTC)
  • Hi, @Keegan (WMF): I see that a file ID, called "entity" is added as the first time a caption is created.   Question is the IDs created only when we add a "structured data" for the first time, or will IDs will be created automatically for each existing files? Christian Ferrer (talk) 18:44, 10 January 2019 (UTC)
    @Christian Ferrer: Hey, good spot. This is something we fixed in development yesterday, and will not be displayed from next week (the other issue is that it says "label" rather than "caption", which will also be fixed in the same change). Jdforrester (WMF) (talk) 18:49, 10 January 2019 (UTC)
    There is also some other things that I noticed. There is not anymore rollback, but if I remember well we already talked about that heu.. no, the rollback works well.... The second thing is : if you create a caption, then a ID is created, ok. If you revert then the ID is also removed. It's confirmed by the exact same number of bytes added and removed to the file.
    Now if you delete all the captions without having reverted, then the captions are indeed removed, but the ID stay. It's also confirmed by the number of bytes. This is not really a question, just a thing that I noticed. Regards, Christian Ferrer (talk) 19:05, 10 January 2019 (UTC)
    @Christian Ferrer: Yeah, it's technically the marker for the entity ID. It's not relevant to users (they can't use it and they can't change it; it's just the reference for the database), so we won't be showing it (if you really need to know it, it appears on the action=info page). The "byte size" change is also not very helpful or accurate as that's a measure for the database which depends on the JSON serialisation of the entity model, but removing that from history pages would probably be disruptive for power users so we don't plan to do that right now. Jdforrester (WMF) (talk) 19:18, 10 January 2019 (UTC)
  • Nice feature. It would probably be useful to write local file caption guidelines within Commons, in order to tell people which style is expected in the caption. mw:Help:File captions is a rather technical manual (no markup, how to undo, and so on), but things like capitalization, punctuation, preferable caption length and so on should probably also be advised to users… —MisterSynergy (talk) 19:44, 10 January 2019 (UTC)
  • @Jdforrester (WMF): Note that the "Captions" features are available in the file redirect pages. What will be the impact if captions are added there? Christian Ferrer (talk) 21:04, 10 January 2019 (UTC)
    @Christian Ferrer: Another good spot. Captions on redirect pages aren't useful, and we will disable them, but it won't break anything. We've filed a Phabricator task to do this. Jdforrester (WMF) (talk) 21:07, 10 January 2019 (UTC)
  • Just whoa. You guys let this thing go live like this? With a layout that will immediately antagonize the exact kind of contributers who would be the most enthusiastic and productive about captions? A layout that hoggs whitespace (does it even look tighter in monobook, respecting its default margins and paddings?), a layout that puts this thing above all else on the page (above "Summary", srslsy?), under an H1 heading (whiskey tango foxtrot, aren’t you guys all about structure?!), with some wierd horizontally divided box which will mistify both oldschool HTML 1.0 veterans and swipe-swipe whipperspnappers (click on the pencil to edit the caption text under it across a line?; why not clicking the caption itself?, or put a proper button next to it!)…? After you’ve been working on this captions things since May last year, at least? Good grief, you’re supposed to be the code gods that are going to dig a ditch between yourselves and the computer illiterate masses, burying all the power users in it. Turns out part of that dire prediction isn’t true after all, but sadly that’s the part about code gods — for this gizmo seems utterly ungodly. And therefore I’m gonna sprinkle some CSS holy water on my ”skin” and forget I ever saw this thing live in production looking like this. -- Tuválkin 02:09, 11 January 2019 (UTC)
  • And nobody thought of turning it off for file redirects oscar mike golf. -- Tuválkin 02:12, 11 January 2019 (UTC)
  • This is really bad. Please turn it live only when at least Template:Artwork is correctly handled, either by using the wikidata element or the description field with language template. The feeling now is that all the hard work that was done to describe files is going to be lost. Léna (talk) 13:08, 11 January 2019 (UTC)
  • Agree with some of the comments above. I thought the team had accepted and undertaken that structured data needed to be on a different tab to the regular file information, after this was flagged by multiple respondents in the Statements consultation (September-October 2018)
As a result, in the "What's new" section of the "Statements 2" consultation (November 2018), User:Keegan (WMF) wrote:
The tabs for Wikitext content and metadata (respectively called 'File information' and 'Structured data' for the purposes of this discussion) are now true tabs instead of anchor links, which should reduce/eliminate the occurrence of super long pages.
Such tabbing is necessary, and should be implemented ASAP. The Structured Data is (or, we hope, will be) very important for machines. But it is important it should not get in the way of the templated information for humans. Jheald (talk) 14:27, 11 January 2019 (UTC)
Jheald's last argument hits home. Nova (talk) 16:51, 11 January 2019 (UTC)
In the statements consultation I was referring to the decision to put statements behind tabs. Captions were never planned to be hidden from users, but most of the rest of SDC will be behind a tab. I think the planned new box that gathers "use this file" and attribution generation is probably going in the whitespace that already exists and is unused to the right of files (as seen in the statements mockups), but as far as I know for now that's the only other visible thing. Keegan (WMF) (talk) 18:06, 11 January 2019 (UTC)
I do see the problem in how the mockups are presented, though, by not showing the "File information" tab first. A side-by-side comparison that would have shown captions on the "main" file page, instead of them simply being absent from the statements mockup. I'll make sure to not repeat that mistake in the next feature design consultations. Keegan (WMF) (talk) 18:25, 11 January 2019 (UTC)

Accounts on Beta CommonsEdit

Trying to create an account on Beta Commons: is it possible that the error message "The passwords you entered do not match" arises when the actual problem is something else? I really doubt that four times in a row I couldn't match my password correctly, but four times in a row I got this same error. - Jmabel ! talk 09:59, 9 January 2019 (UTC)

@Jmabel: It's most likely that you're trying to use your SUL account. The Beta Cluster does not operate at the high levels of security that we have for production, hence the message above the login form:
This site (Beta Commons) allows WMF staff and community volunteers to test MediaWiki in a production like environment.
Do NOT use your normal password, or any password you use anywhere else online.
Did you already have a Beta Cluster account? If not, did you create one?
Jdforrester (WMF) (talk) 15:38, 9 January 2019 (UTC)
Again, as I wrote above, I was trying to create an account. It asks me to enter an account name, an email address, and enter a password twice. I got this error message after doing so, four times.
Is it a problem that I used the same name as my account here? It shouldn't know whether the accounts have the same name. - Jmabel ! talk 18:24, 9 January 2019 (UTC)
@Jmabel: Oh, sorry. No, it's not going to know about you having an account on this (other) system. I just created a test account there and it worked fine. Jdforrester (WMF) (talk) 18:43, 9 January 2019 (UTC)
Jmabel, it used to be the case that MediaWiki on some non-production servers (like wikitechwiki) was not able to deal with passwords containing certain Unicode characters. Have you tried using an ASCII password, as silly as this might sound? Nemo 19:15, 10 January 2019 (UTC)
Well, at this point the feature is live, so there's no point to my looking at the Beta. - Jmabel ! talk 00:43, 11 January 2019 (UTC)

Accessing the captions via lua and pywikibotEdit

@Keegan (WMF): (and others): are there ways to access the captions using Lua and pywikibot, or are they human-accessible only at the moment? Thanks. Mike Peel (talk) 16:45, 11 January 2019 (UTC)

Humans-only for now, or read-only via API. This will change, I do not know when at the moment. Keegan (WMF) (talk) 18:43, 11 January 2019 (UTC)
OK, thank you. If you can let me know when it is available then I can see if it can be integrated into the wikidata infobox to supplement the captions from Wikidata (and/or sync those over to here). Thanks. Mike Peel (talk) 19:54, 11 January 2019 (UTC)

Copyright status of structured "items"Edit

Are "Captions" and other SDC "items" released under the CC BY-SA 3.0 like the rest of Wikimedia Commons or under the CC0 license like Wikidata? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 23:42, 11 January 2019 (UTC)

My guess is that free text captions are CC BY-SA 3.0, like the rest of the page. However most of the properties will be de-facto ineligible for copyright so for all practical reasons CC0. --Jarekt (talk) 03:43, 12 January 2019 (UTC)
There's a phabricator ticket that's been asking this question since 2017, but with no meaningful input yet.

With the captions that are live right now, it's something that needs to be clarified urgently. @Keegan (WMF): ?
Since the whole page is licensed CC BY-SA 3.0, and there is nothing to indicate anything different for the captions, I think that means that any caption being added by a user at the moment has to be considered to be CC BY-SA 3.0. I think the contributor would be entitled to assume that that is the license under which they have made the caption string available, given that there is nothing else anywhere indicating anything other than this. This needs to be addressed quickly if the ultimate intention is to release the data CC0, because otherwise there will a considerable set of CC BY-SA 3.0 captions building up, that would have to be cleared to the more permissive release.
A relevant question, if SDC is intended to be CC0 (and at the moment we have had no clear indication either way), is what restrictions this would place on data being harvested from existing CC BY-SA 3.0 file pages. Even if reporting that the creator of a painting was Leonardo da Vinci may be an uncopyrightable fact, extracting such information at scale may fall subject to database rights that might be only available BY-SA. Other information, eg saying that a painting was "probably painted c.1530" or was considered to be by "a follower of Raphael", may reflect real intellectual choices that could attract copyright in their own right, particularly if a substantial number were taken. This is a question that needs clarification, before substantial data transfer starts from Commons templates. Jheald (talk) 11:38, 12 January 2019 (UTC)
Jheald, I disagree with legal theory that metadata, listing basic facts about someone or something can produce its own copyrights. You are not making artistic choices here and merely reporting information you found in the references. Otherwise you are doing something wrong. That is why it is OK to copy such information from Wikipedias or Commons to Wikidata. However, I agree that this should be clarified sooner than later. My vote would be to store most or all of Structured data under CC0 license so it is compatible with Wikidata. @Keegan (WMF):, I think this is important, especially since Commons is a project which is obsessed with getting copyrights right. We might need to discuss it as a project, but I also think WMF lawyers should look into it as well, especially if we start reusing the data and combining it with wikidata. --Jarekt (talk) 04:01, 13 January 2019 (UTC)
It would probably be wise to simply add the text "by publishing this you agree that you release this caption with the CC0 license" but it might confuse people into thinking that this applies to all texts or something. Maybe one of the developers should open a proposal at "Commons:Village pump/Proposals" and ask for community feedback in how to clarify this without being "too intrusive". But a simple indication that all Structured Data on Wikimedia Commons "Items" are CC0 would suffice in the beginning of the process. Let's not forget that this is (legally) important for the re-users outside of Wikimedia websites as they're the people this whole system is built for. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:52, 13 January 2019 (UTC)
I'm seeing where we are on all this. Keegan (WMF) (talk) 19:56, 14 January 2019 (UTC)

@Donald Trung, Jheald, Jarekt: We are planning to ask users to release items on Structured Data on Commons under CC0, and we are working with the legal team to add the right license-related language. Full text pages on Wikimedia Commons may still be under CC BY-SA.

“A relevant question, if SDC is intended to be CC0 (and at the moment we have had no clear indication either way), is what restrictions this would place on data being harvested from existing CC BY-SA 3.0 file pages.”

The text on Commons is CC BY-SA currently, to the extent it's protected by copyright. However, most captions are not likely to be copyrightable, since they are short and factual. Copyright protects creative works of expression, and not the underlying ideas. It's possible that there will be some descriptions that are so idiosyncratic to get Copyright protection, and Commons has a couple of options if that comes up: 1) argue that they are not copyrightable, 2) just remove those captions and write a better, CC0 caption. Either choice would be guided by how the Commons community would like to settle it on Commons:File captions. Keegan (WMF) (talk) 22:08, 15 January 2019 (UTC)

@Keegan (WMF): "Short and factual" doesn't get you a copyright waiver. To be without copyright there has to be essentially no choice in the text that was written. That's simply not true for a caption, and certainly not true for 40,000,000 of them. Jheald (talk) 23:06, 15 January 2019 (UTC)
Hullo @Jheald: the WMF legal team says that Copyright law protects creative expression, and not ideas or concepts. Short phrases that can only be written in a limited number of ways are not protected. The caption field has a few limitations: it can only have 255 characters, it does not allow Wikitext, and it should factually describe the image. The vast majority of captions will not be a sufficiently creative work of authorship to be copyrightable. A few references from the team: Stanford Law School published a guide on how this question has been handled by U.S. Courts, and U.S. Copyright Office Circular 33 explains what level of creativity is required for protection. In cases where a description is so idiosyncratic as to require compliance with CC BY-SA, it should be removed from a CC0 caption field. Copyright law can be unsatisfyingly unclear, so if you need more help clarifying what kind of creativity should be considered, please contact the legal team and they may be able to help by writing guidance in Wikilegal.
— Preceding unsigned comment added by Abittaker (WMF) (talk • contribs) 01:12, 16 January 2019 (UTC)
@Abittaker (WMF): This is Commons. You're not just dealing with U.S. copyright here, you're dealing with copyright for the whole world. And I dispute the claim that just because the caption is limited to 255 characters and needs to factually describe the image, that that means there are only a limited small number of ways it could be written. On the contrary, there may be any number of aspects of the image that the caption-writer may choose to foreground, and any number of ways to present them. The choice of one rather than any of the others is the writer's expression, and that is what copyright protects. It's no good trying to wish this away: there is an issue here which needs to be faced. Jheald (talk) 01:22, 16 January 2019 (UTC)
It may be interesting to note that the Indian Supreme Court recently affirmed that legal headnotes were protected by copyright. [3]. That's despite headnotes, on the face of it, being more formulaic and more derivative than image captions.
Similarly, this 2006 paper [4] suggests that (p.187) "other than the headnotes, private publishers probably do not have copyright in the court decisions they are publishing" (emphasis added).
In Canada the copyright status of headnotes was affirmed in the 2004 case CCH Canadian Ltd v Law Society of Upper Canada. Jheald (talk) 02:08, 16 January 2019 (UTC)

We do not have to figure out legality of copying captions at this point. The CC0 aspect of SDC should be advertised and copyright aspects taken into account before any bot migration of captions or other free text descriptions. However other SDC data should be OK as those are non-copyrightable facts. Also I do not agree that we need to consider laws of all jurisdictions when discussing copyrights of SDC. There are many jurisdictions and some have some odd laws (See here for example). However when disusing laws related to Commons text, than the only law we need to consider is US law as that is where the servers reside. --Jarekt (talk) 03:58, 16 January 2019 (UTC)

  • By the way, the text on the bottom of each Commons page "This text is available under the Creative Commons Attribution-ShareAlike Licence; additional terms may apply" should become something similar to d:Wikidata:Copyright: "All structured data from the [SDC] namespace is available under the Creative Commons CC0 License; text in the other namespaces is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. " --Jarekt (talk) 04:14, 16 January 2019 (UTC)

I would strongly recommend keeping CC-BY-SA as the licence for structured data on Commons. We are talking about the potential of harvesting 50 million sentences' worth of caption from existing files, which are already licensed in CC-BY-SA (or something compatible). Any change in copyright terms that prevent existing image descriptions from being converted into structured data will defeat the point of adopting structured data. Deryck Chan (talk) 17:31, 20 January 2019 (UTC)

Well, that’s debatable :) There was a similar discussion with Lexemes on Wikidata. In the end, they went for CC-Zero, in the full knowledge that this would prevent mass-copy from the Wiktionaries. Jean-Fred (talk) 18:13, 21 January 2019 (UTC)

To add a category just after a captionEdit

This morning each time I tried to add a category after to have added (and after saving) a caption I got this message : File:Screenshot Editing File Ophiozonella nivea (YPM IZ 007648 EC) jpg - Wikimedia Commons.png. Christian Ferrer (talk) 07:49, 12 January 2019 (UTC)

Hi, Yes, I already reported that: phab:T213462. Regards, Yann (talk) 08:49, 12 January 2019 (UTC)

Look and appearance of captionsEdit

Hi, it will be great the the file pages can keep a visual coherence.

1/ The title "Structured data" in a file page should be at the same sizes, and not bigger, than the other headers such as "Summary", Licensing", ect...
2/The size of the caption box on my screen (1920*1200) is a very little smaller than the {{Information}} and than the license template. It would be great that all boxes and templates be at the same size.

Regards, Christian Ferrer (talk) 12:59, 12 January 2019 (UTC)

@Christian Ferrer: Hey there,
The reason the MediaInfo section is under an H1 is because it's its own page component, at the same "level" as the wikitext block, which also has an H1 (the page's title). Right now the design is in flux, and I agree that it's a little confusing. In the future the design is going to change; the most recent design feedback session about this would mean that the H1 wouldn't appear, but instead the parts of the page would be split with tabs. That discussion is now closed, but I'd be interested to hear from you if that proposal would work for you.
You are right, the text in the Information template is 5% smaller than in the rest of the page – it's set to 95% (=13.3pt) of the general page content size (=14pt) in the template by using the class toccolours. I don't know why this was done, but it's been this was for a very long time, so I imagine a community discussion would be needed before changing the template.
Jdforrester (WMF) (talk) 21:26, 12 January 2019 (UTC)
I've no special strong opinion about potential tabs, but I have not really though to that for now. I just think that if headers there are, in a file page (or in a specific tab), then all main headers should be at the same level.
The size of the text was not my concern, I talked about the size (width) of the caption box compared to the width of {{Information}}, the width of caption box seems a little smaller
After a night's sleep I woke up with the certainty that you should limit the display at one langage at one time. Me I have 3 lines, and this is really boring (and some users have more...) although one line is fully accepteble. Furthermore I don't plan to write any caption in Spain langage + I don' want to hide this caption box, and now the result is that it comes to mind to remove es-2 from my babel just to avoid those 3 boring lines....When I looked to a file page when I was not connected, I found the 1 line box much much better...
Now that we have Commons:File captions, and in order to give infos to the visitors and editors, maybe that a link to that page should be given in the caption box, if not in the default display so then in the editing mode.

Christian Ferrer (talk) 05:26, 13 January 2019 (UTC)

How to search SD?Edit

When and how will WP search support the search in SD? My naive approach using incaption:value was not successful. I noticed that the search takes the captions into account and finds them [5], but there should be a way to search specifically for captions.

BTW, searching for the help text "Add a one-line explanation of what this file represents" should not find any matches: [6] --Herzi Pinki (talk) 17:17, 12 January 2019 (UTC)

@Herzi Pinki: That's because you're using the wrong search engine; it's in this one. Jdforrester (WMF) (talk) 21:14, 12 January 2019 (UTC)

filed a bug report for the BTW. --Herzi Pinki (talk) 21:53, 12 January 2019 (UTC)

You did not get me. I did not want to search the code. MediaWiki Search should support searching captions like titles. best --Herzi Pinki (talk) 22:02, 12 January 2019 (UTC)

Search will be supported, it's not turned on yet. Keegan (WMF) (talk) 19:38, 14 January 2019 (UTC)


Just curious, but why can't descriptive information from Wikitext "Descriptions" and "Categories" be harvested to create structured data? Licenses could be harvested right? Then why can't a bot harvest vital user-generated information from both native descriptions and organization? It just seems like a major handicap that existing Wikitext on over 50.000.000 (fifty million) media files can't be utilised.   --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:14, 13 January 2019 (UTC)

Sure, technically, they could be harvested. If we, the Wikimedia Commons community, want to do that, then we could have bots do that. We could hardly do that though before captions were available. (I guess there could have been a soft launch with "captions available but invisible and only editable by bots", but personnally I don’t think this was necessary − Wikidata also launched pretty much empty, and the community went on to make bots that seeded it).
Also, I don’t think that this is the job of WMF to do it − fairly sure that had the WMF done something like that, there would have been outrage that they’re touching the content. Jean-Fred (talk) 12:25, 13 January 2019 (UTC)

Information template problemEdit

Hi, some of my yesterday uploads to which I've added caption in Polish (via the UploadWizard form) seem to have broken Information template appearance, as you can see in this file page. Deleting the captions didn't help. Nova (talk) 10:57, 13 January 2019 (UTC)

Sorry, my mistake, noticed and fixed by Multichill [7], thanks. Nova (talk) 14:14, 13 January 2019 (UTC)

Hiding captionsEdit

Let me get straight into the point:

How do I hide the captions from my view? Any CSS code I can put on my common.css to remove that element?

Thank you. — regards, Revi 10:59, 13 January 2019 (UTC)

Found out the answer myself by looking at the above topic. For those who may be looking for this:
/** Initially posted by Mike Peel */
/** Just hiding captions */
.filepage-mediainfo-entitytermsview { display:none;}
/** Hiding "Structured Data" header */
.mw-slot-header {display:none;}
— regards, Revi 11:04, 13 January 2019 (UTC)
I thought original idea of Structured Data on Commons SDoC hereinafter (or a prototype of it I saw) was putting the SDoC section below the GlobalUsage or something like that. Is it me who has a wrong memory or has something changed thus creating this what the hell design? — regards, Revi 11:12, 13 January 2019 (UTC)
@-revi: There are two gadgets available in your preferences now - one that hides it completely as per the css, and one that collapses it by default but lets you expand it again if you want. Thanks. Mike Peel (talk) 11:48, 13 January 2019 (UTC)
Since I prefer codes on my local page rather than gadgets, thanks for letting me know but I will keep status quo. BTW, if possible, I want to force them to be H2, and moved to elsewhere (let's say, above Metadata section as prototype I recall was). Is it possible with CSS/JS? (Not asking you to do that but just wondering).
I have no interest in using it as currently is (it just sucks) but with the modification to put them away from the top of the page, it would be usable. — regards, Revi 18:20, 13 January 2019 (UTC)
@-revi: You can shrink it using e.g. { font-size:1.5em;}. as far as I can tell, moving it to a different part of the page is something the WMF would have to do. Thanks. Mike Peel (talk) 19:29, 13 January 2019 (UTC)
I think you can move things around using JavaScript (but because personal JS is loaded at the end, it will "jump around") − see for example MediaWiki:Gadget-CategoryAboveAll.js which moves the category box at the top. Jean-Fred (talk) 21:17, 13 January 2019 (UTC)
@-revi: captions certainly has some design issues that showed up in production that were not visible in testing. The team is working to identify the problems and push the fixes, I've made an update section that you can keep an eye on. Keegan (WMF) (talk) 17:57, 14 January 2019 (UTC)
It would be very useful to make a list of what was missed in testing, so that the problems we had here are less likely to be replicated in the future. - Jmabel ! talk 22:32, 14 January 2019 (UTC)
Oh of course, lists are being compiled and things will be shared. A majority of the issues following the release are bugs/design flaws that can only be surfaced from release into production; in most all software development there are limits to testing environments trying to replicate a live environment, with highly customized configurations, skins, javascript, css rules, abuse filters, etc. that come with all of the wikis.
All these excuses aside, a new testing environment that will be better at finding things has been in the works for the past few weeks, and should be up and live in time for depicts and other statements testing in a couple of weeks or so. In theory, it will be very helpful in reducing bugs in production. Keegan (WMF) (talk) 00:01, 15 January 2019 (UTC)

UploadWizard and captionsEdit

Hi, I just wanted to follow the new idea and do my best with the new uploads - there are few issues from this exercise:

  • current explanation in the form for the Caption is not enough - how it is different from Description? What IS most important and for whom? I seek for a link to a page with a good set of examples;
  • the section Caption is translated into Polish as "Podpis", which is a bit confusing, as suggests more a "Signature";
  • repetitiveness - first - Title, than - Caption, than - Description. All of them containing the same subset of information. In my case, with Description containing mostly one or two sentences (close to 255 characters), Caption could be the same (but two separate fields suggest that shouldn't?). Now I have to fill in 5 fields, if, as usual, two languages included; Some kind of auto-generated-pre-filled-suggestion-from-Description would be of much help;
  • Caption is marked as Optional, but put above the required Description;
  • There is no info that wiki markup should not be used in Captions and, as far as I've checked, it is not validated if it has been used, so published with the markup, than uninterpreted on the file page.

It rather stops from providing the Captions with upload at the current state. Nova (talk) 12:35, 13 January 2019 (UTC)

Captions should probably be with the "other information" if anything. Nemo 15:23, 13 January 2019 (UTC)
Thank you for the feedback, it's very helpful as the team looks at design changes. Keegan (WMF) (talk) 17:53, 14 January 2019 (UTC)

Captions updates to comeEdit

The development team is putting together the plan for changes needed for captions, there are a few bugs and some design issues that showed up when captions went live on Commons (and thank you to all who have pointed them out onwiki and/or participated on Phabricator). I'll have a list to post later this week, along with information about how soon we can expect to see the changes. I'll be making the post here, with a note on the Village Pump. Keegan (WMF) (talk) 17:51, 14 January 2019 (UTC)

Can better sitelinks help with structuring data?Edit

I've opened a proposal at Wikidata at "Wikidata:Wikidata:Requests for comment/Proposal to create a separate section for "Commonswiki" links" in relation to linking to galleries and categories on Wikimedia Commons, I also left a field there open for other ways how Wikidata could help Wikimedia Commons with its structure, will the structured data for Wikimedia Commons project be able to utilise such links or does this project exclusively work with the files and not the existing community-made infrastructure? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 15:59, 15 January 2019 (UTC)

Hashtags (#)Edit

A lot of image-sharing websites use hashtags (#) to clarify what is depicted in an image, is the concept of "depicts" going to be like this? Because I think that adding hashtags could very easily be a good search 🔎 tool, let's say you want to find an image with both "#Cats" and "#Birds" then a specific search could look for images where both of these hashtags are used. Sure vandals could wrongfully tag images with "#Hot sex" and "#Nude female human" and actual images that depict hot sex and nudes could be just as well be vandalised with "#Children". But from what I can tell "depicts" will be vulnerable to the same levels of vandalism. Hashtags could be listed below the categories on the bottom of an image, and being placed in a category could automatically add certain hashtags to an image, these "automatic hashtags" are then non-editable in the same way maintenance categories associated with certain license templates are. Is this idea viable or close to how "depicts" will work?

Because this is already how a lot of successful media-sharing websites do it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:26, 15 January 2019 (UTC)

Hashtags are not being implemented. Here's the last round of designs of what depicts will look like; it's very much about structuring data. Can someone add incorrect or malicious entries into structured data? Yes, but in no different way than the rest of the wiki functions. Structured data is fully integrated with recent changes and the revision deletion extension that controls page deletion, revision deletion, and suppression (aka oversight), as well as abuse filter. The development team plans on having depicts statements up for testing in a new test environment by the end of the month, so you'll be able to see it in action. Keegan (WMF) (talk) 22:17, 15 January 2019 (UTC)

Captions Vs. DescriptionsEdit

Alright, I really do not want to sound daft or anything, and I've already heard read an explanation, but I genuinely can't tell the exact advantage of "captions" over "descriptions", I get that "descriptions" are in WikiCode and can't be used with the infrastructure of the new "Structured Data", but other than character limitations I don't exactly understand how "captions" help with structured data, do they allow for data-mining ⛏ to automatically create "depicts" based on them? How exactly do they organise files in ways that "descriptions" don't? When I search for a file the file descriptions are also searched, so how do file captions improve upon this? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:40, 16 January 2019 (UTC)

Also, descriptions can be multilingual as well, the way adding different languages is done using the MediaWiki Upload Wizard is identical, how are the languages of file captions better organised than those of "descriptions"? I've seen the new upcoming designs and they all look great, but I still don't see where the advantages of "Captions" come to play. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:44, 16 January 2019 (UTC)

I think the next set of releases for structured data, depicts support followed by support for other properties, will help illustrate how captions fits in with the coming features, particularly on the back end of the software. Search with captions and statements, for example, will be an entirely different experience and it's very hard to illustrate that right now. The next question might be why wasn't captions released with this other stuff, then? The answer to that is the underlying technology behind captions that will power the rest of structured data - namely Federated Wikibase and multi-content revisions - is brand new and had to be integrated into Commons first, before we can finish development and release of the next feature set. Keegan (WMF) (talk) 22:35, 16 January 2019 (UTC)
Ah, well patience is a virtue then, but seeing how we now have 3 (three) fields for descriptive information ("the title", "Descriptions", and "Captions") I can understand the backlash. But thanks for explaining, although I still think that removing the character limit from "captions" (or at least synchronizing it with "descriptions'" 10.000 character limit) and then having a bot automatically copy all current "descriptions" to captions would've been preferable.
By the way, shall "depicts" and other planned structured data fields also be integrated with the MediaWiki Upload Wizard 🧙‍♂️? Or shall it not pass? And if so, is there concept art of this available? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:42, 18 January 2019 (UTC)
More will be integrated into the UploadWizard that will provide context for captions, though I do not have details or designs to share right now. More will be available in the next couple of weeks as the team moves past captions fixes and into depicts testing. Keegan (WMF) (talk) 21:05, 18 January 2019 (UTC)

Captions updatesEdit

Here's the list of work being done this development cycle to work on captions.


These tasks have been complete and are either live on Commons now, or scheduled for deployment next week.


Works in progress, the fix being live on Commons is yet to be determined.

To do, but it's complicatedEdit

The underlying problem is going to take additional work.

Needs community fixEdit

Captions introduced a bug into a Community-maintained gadget-space, unfortunately. The development team may able to advise volunteers on fixes where appropriate, but the team is unable to fix this themselves.

Needs community discussionEdit

Two tasks can be implemented by the development team, but they require Commons community consensus to implement.

I'll post updates if/when I receive them. Keegan (WMF) (talk) 20:23, 18 January 2019 (UTC)

Can I switch this on and off?Edit

The table with an input possibility for structured data (image one-liners) now appears above the image discription. Is there a way to switch this on and off, or to relocate the table to a lower position on the file page? Elly (talk) 20:00, 19 January 2019 (UTC)

"Depictes" (local Vs. Wikidata-based solutions)Edit

Hello 👋🏻 y'all,

What are "Depicts" going to be? Are they going to be local items or will they be hosted on Wikidata? My idea 💡 would be for them to be locally hosted on Wikimedia Commons in a new namespace such as "Depict:Confucius" which could then link to the Wikidata page about "Confucius", one advantage this could have is that they wouldn't be called into question as much regarding their notability as Wikimedia Commons basically has no notability standards. If they're locally hosted they could be locally linked and connected with Wikidata, at "Wikidata:Wikidata:Requests for comment/Proposal to create a separate section for "Commonswiki" links" I requested Wikimedia Commons categories and galleries being linked at the same time to a Wikidata "item" and "Depicts" could then also be added on to that request, this would also enable translations from other Wikimedia websites to be seamlessly imported and puts minimal effort on the Community to maintain it. Thoughts 💭? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 16:49, 21 January 2019 (UTC)

Image added here for (visual) reference.

What I suggest for Wikidata would be link this (stole the design from Christian Ferrer):

In other projects
Wikimedia Commons
Files depict this topic

And then information could be automatically imported, most English speakers don't know that "Pikachu" is "ピカチュウ" in Japanese, or that "Poké Ball" is "モンスターボール" (Monster Ball) in Japanese, but keeping it local would also allow non-notable (according to Wikidata standards) depicts to be hosted here such as Great Balls (スーパーボール, Super Balls), Ultra Balls (ハイパーボール, Hyper Balls), Master Balls (マスターボール), Safari Balls (サファリボール), Level Balls (レベルボール), Lure Balls (ルアーボール), Moon Balls (ムーンボール), Etc.. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 17:05, 21 January 2019 (UTC)

The basic idea of notability in Wikidata is, that any Wikiproject needs it. So if there is anything with many pictures of it, it is okay to create a Wikidata item for it. If there is only one picture of it there is no need to create an item of grouping the images. --GPSLeo (talk) 17:38, 21 January 2019 (UTC)
Now as an example from the Dutch-language Wikipedia let’s look at “w:nl:Bristol (winkelketen)” (Link 🔗 / Mobile 📱), which is related to the Wikimedia Commons category “Bristol (shop)”, its deletion log reads “Een pagina met deze titel is eerder aangemaakt en daarna verwijderd of hernoemd, zie hieronder: 10 jan 2008 10:41 MoiraMoira (overleg | bijdragen) heeft de pagina Bristol (winkelketen) verwijderd (cyberpestactie van werknemer belastingdienst) (bedanken)” which translates as that it was deleted because of “Cyberbullying by an employee of the Taxation Service”, according to their official website there are 210 Bristol stores in the Benelux which is much more than some other similar stores covered on the Dutch-language Wikipedia (nlwiki), on the English-language we have the “salted” article “w:en:Ulefone”, which is related to the Wikimedia Commons category “Ulefone”, which has these reasons for deletion:
However simple Ecosia searches for “Bristol (winkelketen)” and/or “UleFone” shows a large number of independent sources covering these subjects, notability standards on Wikipedia's are corrupt and can be easily manipulated by any admin by “salting” the pages and any user interested in writing about “a salted” subject will be discouraged from writing about it, furthermore I’ve seen countless of good neutral articles deleted because they were “promotional” or “spam”, but let's not delve too deep into the issues of other wiki’s and how local admins do influence what does and does not get covered as we have a similar situation with the interpretation of “Commons:Scope” here, the issue is that a Wikidatacentric approach will have to work with the standards of other Wikimedia websites, and although all other Wikimedia websites combined cover 54,055,172 content pages (as of 14:28 22 D. 01 M. 2019 A.), but some subjects will simply never be covered on the other Wikimedia websites, this can be because of a lot of factors other than just notability, admin interfere of content creation, overzealous “spam-hunting”, Etc. People could just not have an interest in a subject or all the information is either “burried/hidden away” in old newspaper clippings, just not online, it’s an obscure subject that only existed during the 1970’s in Botswana, Etc. There are many scenarios where something could be the subject of multiple photographs without it ever being covered on Wikidata, I don't believe that Wikidata could ever be the sum of all knowledge (not under Deletionism and Exclusionism, anyway) and sure the current category system solves this, but aren’t structured data “items” supposed to be independent from the MediaWiki category and organisational system? Having a system independent system for organising the content on the project (Wikimedia Commons) that works well together with Wikidata but has the space to be flexible enough to organise things outside of the scope of Wikidata and other Wikimedia websites would be a better option than outsourcing everything to Wikidata, it limits the ability to actually structure the data on Wikimedia Commons.
You simply can't walk through a major Dutch or Belgian city without seeing a Bristol shop, why should this subject not be noted when it's depicted while for example "w:nl:Scapino (winkelketen)" should. I'm not arguing that structured data shouldn't work with Wikidata, the point is that if something gets deleted off of Wikidata that is extensively covered here then we shouldn't let Wikidata "unstructure" the data here. Also Wikidata's notability doesn't cover any Wikimedia project, it specifically excludes Wikimedia Commons because Wikimedia Commons itself has no notability standards. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 14:03, 22 January 2019 (UTC)
Return to the project page "Structured data".