Commons talk:Structured data

Property creation on Wikidata

Hello everyone! Over the past few months, we brainstormed about Wikidata properties that will be needed to describe files on Wikimedia Commons, and those ideas have been summarized with a list of properties. Updates and feedback and further thoughts are still very welcome.

Some of these properties currently exist on Wikidata, but many do not and are in need of creation. Property creation on Wikidata is a community-driven process; the development team will be happy to follow along and to support where possible. As Depicts and other statements will be deployed in the first months of 2019, it is time to start the process of creating new properties now.

Here are some first thoughts from the team.

Please let us know how you can help or what you think! Keegan (WMF) (talk) 16:37, 19 September 2018 (UTC)

I do think we need a separate section dedicated to Commons-related properties, which might or might not be useful on Wikidata. Future copyright-related properties should be discussed in that context. We actually have d:Wikidata:Property_proposal/Sister_projects#Wikimedia_Commons; maybe that is the right place. --Jarekt (talk) 17:18, 19 September 2018 (UTC)
I suggest moving this discussion and the list of properties to Wikidata by creating a page similar to d:Wikidata:Lexicographical data. We have several projects on Wikidata, like d:Wikidata:WikiProject Informatics/Software/Properties, created to get feedback from Wikidata community members when deciding on the use of existing properties or proposing new ones. I agree, however, that this process may sometimes be very slow. The links pointed to by User talk:Jarekt are equally important. John Samuel (talk) 17:30, 20 September 2018 (UTC)
I'm happy to move/copy the table over to Wikidata. Would someone like to set up a page for it to live on, that has Wikidata-relevant project information? I'm not from the Wikidata community myself, so I'm unfamiliar with how that should go. Keegan (WMF) (talk) 16:51, 21 September 2018 (UTC)
+1 on @Jarekt:, I think it'd be a good idea to make a separate page for SD. Should we also create a separate page here? @Jsamwrites: We already have d:Wikidata:WikiProject Commons, but I think we can improve it. --Sannita - not just another sysop 19:43, 21 September 2018 (UTC)
@Sannita:, Thanks. I added myself as a participant. @Keegan (WMF): My personal opinion is that we can create a subpage on d:Wikidata:WikiProject Commons or copy the current discussion to the new page. John Samuel (talk) 10:50, 22 September 2018 (UTC)

Searching Commons - how to structure coverage

RIsler (WMF), the Structured Data product manager, has identified an issue that he'd like to bring to the community's attention, with regards to how search will function:

“After review with many engineering and product folks at WMF, WMDE, and within the Commons community, we've come to understand that the initial implementation of depicts "tags" for Commons media should be more focused on making sure all relevant concepts are identified and tagged, rather than limiting tagging to a few specific terms. Additionally, for now we won't rely much on the Wikidata ontology (the data structure) to find any additional depicts statements automatically.

Here is an example of what we mean. Let's try the hypothetical case of an image of a German Shepherd, and the user uploading it tagged it with only "German Shepherd" (Wikidata item d:Q38280):

  • We may be able to suggest an additional depicts tag (dog, d:Q144) based on the "subclass of" or "instance of" property of German Shepherd (we are still determining if this is possible for the initial version of depicts functionality). These suggested tags could appear during use of the UploadWizard, or on the image's file page, and be available for a human to confirm their accuracy before being added to the file's data.
  • In the first half of 2019, we expect to launch a machine-based image classification feature that may suggest a number of additional depicts tags including "dog" (d:Q144), "pet" (d:Q39201), "canine" (d:Q2474088), etc. These suggested tags could appear on the image's file page and be available for a human to confirm their accuracy before being added to the file's data.
  • Once a suggested tag is confirmed, it is added as a depicts statement and the German Shepherd image will show up as a match for searches for any of those terms.
  • On the file page, users will be free to add additional depicts tags that are accurate for the image (for instance, if it's a young dog, add puppy (d:Q39266) )
This combination of techniques should ultimately result in better searches that can be both very specific (show me German Shepherd puppies) and broad (show me pets).”
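The first bullet above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SUBCLASS_OF` is a hand-written toy table, not real Wikidata data (the Q-ids are the ones quoted above, but the parent links are assumed), and `suggest_broader_tags` and `confirm` are hypothetical helpers, not part of any WMF code.

```python
# Toy "subclass of" table: child Q-id -> parent Q-ids (illustrative only).
SUBCLASS_OF = {
    "Q38280": ["Q144"],   # German Shepherd -> dog (assumed link)
    "Q144": ["Q39201"],   # dog -> pet (assumed link)
}


def suggest_broader_tags(tag, max_depth=3):
    """Return broader tags reachable from `tag`, nearest first."""
    suggestions, frontier, seen = [], [tag], {tag}
    for _ in range(max_depth):
        next_frontier = []
        for item in frontier:
            for parent in SUBCLASS_OF.get(item, []):
                if parent not in seen:
                    seen.add(parent)
                    suggestions.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return suggestions


def confirm(existing_tags, suggestions, accepted):
    """Add only the suggestions that a human has accepted."""
    return existing_tags + [s for s in suggestions if s in accepted]
```

A suggestion only becomes a depicts statement once it passes through something like `confirm`, which mirrors the "available for a human to confirm" step in the quoted text.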

Within the next week or two we will provide information on access to try out a prototype for Search on Commons. The prototype will not be advanced enough to show what we are talking about here, but we will be providing more information about "good coverage" tagging at that time. Keegan (WMF) (talk) 17:04, 21 September 2018 (UTC)

1/ What do you mean by "the user uploading it tagged it with only 'German Shepherd'"? During use of the UploadWizard, will the user have to choose a tag or a category? Or did you mean "the user uploading it with only 'German Shepherd' as category"? Christian Ferrer (talk) 14:28, 22 September 2018 (UTC)
2/ This may be a language misunderstanding on my part, but do you mean that an image tagged only with "German Shepherd" will not appear in "dog" search results, because it is not tagged as "dog"? And that we have to add the "dog" tag manually? Christian Ferrer (talk) 14:37, 22 September 2018 (UTC)
I can appreciate why you're considering this, but (as presented) I think it's a bad idea.
A key principle on both Wikidata and Commons has been to try to make statements as narrow and precise as possible, and to rely on hierarchy rather than permitting redundancy (eg: COM:OVERCAT, here on Commons).
The problem, as many have discovered, is that searching a hierarchy is expensive, far more expensive than a flat tag search. People writing bespoke queries may be prepared to wait 60 seconds for a full hierarchical exploration (and the SPARQL service is able to support this relatively small population of searchers). But 60 seconds is not acceptable for the main search interface, nor would the query engine be likely to scale to support full hierarchical searching for the entire population of searchers.
Also there's the issue that the Wikidata ontology at the moment is simply not in good enough shape -- just not consistent and predictable or reliable enough -- to even specify what those hierarchical searches should be.
So going back towards something that can be implemented as a flat search starts to look like the only solution.
But IMO adding multiple redundant "depicts" tags for the same object in a wider image is to be avoided if at all possible. Keegan, you say that there has been a review of this "within the Commons community". I'm aware of a couple of times the question has been raised, eg here and here, admittedly without much take-up, but with a sense I think that this was not the direction the participants would prefer. It adds redundant clutter to the item. It makes it difficult to know whether there are two objects involved, or just a single one. It reduces the impetus to refine the description and try to describe the things really sharply (in my view the COM:OVERCAT principle strongly contributes to the activity of category refinement for images). It makes it less clear where qualifiers (like "shown with features" or "located within image") should be placed. And it goes directly against the principle used on Wikidata, on a system that's supposed to seamlessly combine with it.
As an alternative, I would suggest treating these additional tags added for search purposes as 'shadow tags', attached closely to specific (conventional) primary tags for items. So if something in the image is tagged "German shepherd", make "dog" an alternate shadow tag attached specifically to that "German shepherd" tag, rather than a free-floating tag in its own right.
That way we can keep things organised, preserve the impetus to try to refine the identification of things, and be clear about how many identified things there are -- that there is only one animal in question, not two. Jheald (talk) 20:32, 22 September 2018 (UTC)
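One possible reading of the 'shadow tag' idea, as a minimal Python sketch. The `DepictsTag` class and both helper functions are hypothetical names invented for illustration; nothing here reflects an actual Wikibase data model.

```python
from dataclasses import dataclass, field


@dataclass
class DepictsTag:
    primary: str                                 # the specific tag, e.g. "German Shepherd"
    shadow: list = field(default_factory=list)   # broader, search-only tags


def search_terms(tags):
    """Flat set of terms that a simple (non-hierarchical) search could match."""
    terms = set()
    for tag in tags:
        terms.add(tag.primary)
        terms.update(tag.shadow)
    return terms


def depicted_object_count(tags):
    """Shadow tags hang off a primary tag, so they never count as extra objects."""
    return len(tags)


image = [DepictsTag("German Shepherd", shadow=["dog", "pet"])]
```

With this shape the flat search still matches "dog", while the count of depicted things stays at one, which is the distinction argued for above.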
A further issue is what will happen when a Commons image "depicts" something with its own Wikidata item. How is it proposed to handle this case? An item on Wikidata will not have redundant depicts values: it will not have an additional "depicts:dog" statement, if it is for a painting of a German shepherd. Jheald (talk)
The "shadow tags" would be a kind of cache and like any cache would easily become out of date if the underlying data is changed on Wikidata. But the alternatives don't seem very pleasant. Queries that take 30 seconds to complete? Tagging every photo of a human with "human", "homo sapiens", "person", "homo", "homininae", "hominidae", "primate", "ape", "animal", "omnivore", "two-legged animal", "organism", "thing",... I know I've missed a lot. --ghouston (talk) 11:12, 23 September 2018 (UTC)
@Ghouston: The team appear to have developed really cold feet about using Wikidata to populate the additional search tags -- see phab:T199119, and in particular the first substantive comment on that ticket, by Cparle on 10 July, for their quick initial read of some of the issues this would face. So I don't think there would be any intention to keep the additional tags sync'd with Wikidata. Instead I think the suggestion is to perhaps try to suggest a few extra tags at upload time, and then from then on let them follow their own separate destiny. (Otherwise your analogy with a cache would be spot on.)
Hence the 'shadow tags' existing in their own right. But I do think there might be mileage in e.g. storing them as a qualifier ("additional search tag" ?) to a depicts statement, rather than as independent depicts statements in their own right. Jheald (talk) 17:13, 23 September 2018 (UTC)
Jheald has accurately described some of the technical issues that prevent us from implementing the preferred approach. The idea of something like an "additional search term" qualifier has some promise, and is an approach we're still considering as a possibility, but we need to game out the consequences involved. There are other logistical issues like how we would display it consistently in the UI, and how we integrate that approach with other platforms/systems (like GLAM databases), and how this would work with search. If that approach turns out to not be feasible, the solution that covers all requirements without extreme workarounds is to simply have a number of depicts tags on the M item. Although some tags might be somewhat redundant to humans (but still useful for search purposes), we can probably mitigate the impact on the UI. We will have the "Make Primary" button/link that will allow users to essentially say "these things are the most important", and those tags would be shown first and be the preferred vehicles for qualifiers. Again, using the German Shepherd example, although the image may be tagged with "dog", "pet", etc., German Shepherd can be the primary tag and house the important qualifiers like "applies to part", "shown with features", etc. while the depicts tag "dog" doesn't need to be primary and can just hang out in the background minding its own business (we're also considering a "cutoff" where, after a certain number of depicts tags, the user will have to expand to see more). We also have other reasons for wanting to separate what we're calling "elemental" depicts tags, including making it easier to import data from sources that already have tags set up that way (like Flickr Commons, GLAM sites, etc). Depicts on Commons will perhaps be the most complex part of the project, and easy answers will be in short supply, but we think the end result will be a dramatic improvement in search and discoverability. 
RIsler (WMF) (talk) 22:35, 24 September 2018 (UTC)
@RIsler (WMF): Thanks for dropping by. It's good to know that something like an "additional search term" qualifier is still in consideration.
Regarding the use of "Make Primary", I am now a bit confused. I had understood, from the Depicts consultation that 'Primary' was to be used on "depicts" to indicate the overall topic of the image -- eg something like nativity scene (Q31732) or Sacra Conversazione (Q370665), rather than being used to prefer Virgin Mary (Q345) over woman (Q467) for one of the elements within the scene. I do think that for the latter a better approach would be to try to tie the two together more concretely, eg by making the one a qualifier value for the other. It would be a much better structure for people writing queries to be able to work out what is going on. The idea of introducing additional ranks beyond the three used on Wikidata is also interesting (but is this possible, technically, without major surgery to the code of wikibase?), eg to hive off secondary tags to a lower rank, so many applications could ignore them. But going down the road, I suspect that tying the secondary tag to the regular tag is probably information that will turn out to be useful. If an additional rank were going to be introduced for anything on CommonsData, I would put one for "inferred by machine; not confirmed" at the head of the queue -- I suspect it is a status we may be going to be seeing a lot -- to rank below a regular statement, but still be eligible to be included as a wdt: statement in the RDF, if there was no regular statement outranking it.
As regards data import, I suspect we're kidding ourselves if we think this is ever going to be easy. I'm working on an image upload project with a major GLAM at the moment, with simultaneous creation of Wikidata items for the underlying objects, and the reconciliation of names for people and places to Wikidata is brutal -- easily the most rate-limiting aspect of the whole process. This is probably as near as one can get at the moment, before CommonsData goes live, to what an upload involving Structured Data will entail. As an example, the current batch of images I've been working on contains 200 creators or contributors, with names that are supposedly normalised to the Library of Congress preferred form, if the LoC has heard of them. An initial match to the LoC and then Wikidata found 90 matches, 10 of which turned out to be wrong. By trying matching via VIAF, and then going through remaining candidates one by one, I've now raised the 'matched' count to 110 of the original 200, but it's taken a day and a half to do. And this batch is just 2% of the overall collection. Perhaps the universe of potential "depicts" tags is a more limited vocabulary, but the matching of a tag vocabulary to Wikidata, and then even more so the verification of that matching, is not a small job. I suspect that against all that, using machine methods to identify when one tag is probably just a less specific intimation of another tag, and should therefore be made subordinate to it, will likely add no more than a drop in the sea.
A further point is that Commons will still be expecting all uploads to be fully categorised, and for those categorisations to obey COM:OVERCAT, ie only categorise with the most specific indications. Structured Data should help a lot with that -- one of the reasons I'm so much trying to go the Wikidata route with my current project is to then be able to read off the appropriate Commons categories -- but to avoid OVERCAT the uploaders will thus need to work out in any case which tags are redundant to which other ones, so the effort of determining this to store them in qualifiers is not really an additional overhead. Jheald (talk) 18:52, 25 September 2018 (UTC)
For "make primary", we're exploring whether it can serve more than one purpose. Yes, its main use would be to identify the main subject of the media. But perhaps this feature (or something similar) could also say, either implicitly or explicitly, that the tag in question should be the one to host relevant qualifiers. Again, this is all still work in progress and we have a lot of different use cases to account for, so we certainly won't have anything solid on this until next month. RIsler (WMF) (talk) 18:03, 26 September 2018 (UTC)
Hope we misunderstood the comment made by Keegan (WMF), otherwise it is likely better to develop the FastCCI tool, and to create a "tag" namespace in Commons that will work in parallel with the category tree but that will not be subject to our over-categorisation rules. Example: if you categorize your file with Category:Dog then Tag:Canis, Tag:Canis lupus, etc., are automatically added to the file by a bot or by software, and when you click on Tag:Canis then you see all the images that have "Canis" as a tag. This would allow us to stop spending a significant part of the $3,015,000 USD of that project. Sorry for that last sarcasm. Christian Ferrer (talk) 12:03, 23 September 2018 (UTC)
  • @Christian Ferrer: 1. Refers to statement tagging, not category tagging. Categories remain an independent process. 2. Correct, the file would have to be tagged with "dog".
I'll work on getting some more specific answers to other concerns and questions. Keegan (WMF) (talk) 19:05, 24 September 2018 (UTC)
ok thanks for the answer. Christian Ferrer (talk) 21:11, 24 September 2018 (UTC)
  • It seems to me that it is a disaster that the system will not automatically be able to make a search based on a hierarchy of tags. Would it be possible to offer both types of search, i.e. a simple tag search which would be fast and a hierarchical search which would be understood to be slow (perhaps limited in the amount of hierarchy which could be searched)? Strobilomyces (talk) 11:52, 25 September 2018 (UTC)
@Strobilomyces: I can't speak for the team, but as I understand it the sheer number of different ways different properties are used in different circumstances, plus the density of very odd glitches in the WD ontology, plus the difficulty of prioritising results to meet general users' expectations of seeing the results that they would actually want to see, have put the team right off offering any deep hierarchical search. (See the assessment by Cparle on the ticket I linked above for just a taster of some of the problems lurking under the surface). Any attempt in this direction would be a major research project, simply not on the agenda for the team trying to ship version 0.1
BUT --- all of CommonsData and all of Wikidata should be accessible from WDQS, so it should be possible to write queries in SPARQL that are as deep and complicated and bespoke and intricate as one could wish. And probably, soon enough, one will find that users who have a particular knowledge and interest in particular areas, understand the twisty details of the Wikidata hierarchy in those particular subject areas, and are prepared to put in the time to extend some of the data that is incomplete and fix some of the statements that are wrong -- those users are quite likely to start producing ready-written query designs for particular subjects and disciplines, that somebody might well graft a user-friendly front-end onto. But nobody should underestimate the amount of data that is going to need to be improved on Wikidata, if those queries are going to produce good and solid results -- just look at all the data that is currently missing from the infoboxes on categories, just for starters, never mind all the data that is still needed to make sure the hierarchies behind those items are solid and robust. Jheald (talk) 17:20, 25 September 2018 (UTC)
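As a rough illustration of the kind of bespoke hierarchical query meant here, the Python sketch below only builds a SPARQL string and sends nothing. P180 ("depicts") and P279 ("subclass of") are existing Wikidata properties, but the assumption that Commons file statements will be exposed to WDQS through the same `wdt:` prefix is mine, and `build_depicts_query` is a hypothetical helper.

```python
def build_depicts_query(class_qid, limit=50):
    """Build a SPARQL query for files depicting `class_qid` or any subclass.

    Assumes depicts statements are reachable as wdt:P180 and that the class
    hierarchy is walked with wdt:P279* (zero or more "subclass of" steps).
    """
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Q-id: {class_qid!r}")
    return (
        "SELECT ?file WHERE {\n"
        "  ?file wdt:P180 ?thing .\n"
        f"  ?thing wdt:P279* wd:{class_qid} .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_depicts_query("Q144")  # Q144 = dog
```

The `wdt:P279*` property path is what makes the query hierarchical, and it is also the part whose cost and reliability depend on the state of the ontology discussed in this thread.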
Thanks for the answer.Strobilomyces (talk) 11:44, 26 September 2018 (UTC)
  • I have some doubts about this. From my experience with the Wikidata ontology I have to admit that it might not be well suited for Commons because it is deeper than what Commons needs, and perhaps not as user-oriented as one would expect. The thing is, there is nothing stopping Commons users from creating their own ontology or hierarchy of depicts items. So why not have a separate collection of depicts items on Commons itself and structure them as wished? Then they can be connected to Wikidata items where appropriate, and use whatever ontology the user wants.--Micru (talk) 07:51, 29 September 2018 (UTC)
@Micru: CommonsData is not currently projected to support generic items, only media-items for particular media files. Generic items are expected to live on Wikidata (per current plans, at least). Jheald (talk) 11:29, 29 September 2018 (UTC)
The question which was not studied is what should be done in the Wikidata ontology to allow a correct search using it. Currently nobody tries to improve the Wikidata ontology because there was no reason to have a strict set of rules. But we can improve the ontology by fixing a set of simple rules, such as "an item should not be an instance and a subclass at the same time" or "no reference cycles". Snipre (talk) 07:19, 2 October 2018 (UTC)
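Both rules suggested above are mechanical enough to check automatically. The following Python sketch assumes the hierarchy has already been extracted into plain dictionaries (item -> list of parents); the function names and that input shape are illustrative assumptions, not an existing Wikidata tool.

```python
def instance_and_subclass(instance_of, subclass_of):
    """Items that carry both an 'instance of' and a 'subclass of' statement."""
    return sorted(k for k in instance_of if instance_of[k] and subclass_of.get(k))


def find_cycle(subclass_of):
    """Return one cycle in the 'subclass of' graph as a list, or None.

    Depth-first search with three colours; recursive, so very deep
    hierarchies would need an iterative rewrite.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node, path):
        state[node] = GREY
        path.append(node)
        for parent in subclass_of.get(node, []):
            if state.get(parent, WHITE) == GREY:
                # Parent is on the current path: we have closed a loop.
                return path[path.index(parent):] + [parent]
            if state.get(parent, WHITE) == WHITE:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in list(subclass_of):
        if state.get(node, WHITE) == WHITE:
            found = visit(node, [])
            if found:
                return found
    return None
```

Reports from checks like these could feed a maintenance list, though deciding which statement in a cycle is the wrong one would still need human judgement.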
@Snipre: The comment by Smalyshev on wikidata-l is also worth reading [1] : The main problem is that there is no standard way (or even defined small number of ways) to get the hierarchy that is relevant for "depicts" from current Wikidata data. It may even be that for a specific type or class the hierarchy is well defined, but the sheer number of different ways it is done in different areas is overwhelming and ill-suited for automatic processing... One way of solving it is to create a special hierarchy for "depicts" purposes that would serve this particular use case. Another way is to amend existing hierarchies and meta-hierarchies so that there would be an algorithmic way of navigating them in a common case. This is something that would be nice to hear about from people that are experienced in ontology creation and maintenance... I think this is very much something that the community can do. Jheald (talk) 08:11, 2 October 2018 (UTC)

@Keegan (WMF): If I understand correctly: The current wikidata ontology is unsuitable for searching (e.g. related discussion) which is a huge problem. I do not think it is a good idea to cover up this mess with hundreds of different tags. Instead the image classification and searching algorithm should be a motivation and help people to fix the ontology. --Debenben (talk) 15:59, 3 October 2018 (UTC)

  • @Keegan (WMF): I fully agree with the above. If "German Shepherd" is currently not linked (in the results of a potential search) with the taxon chain of Canis lupus familiaris, it is because the ontology is not well done. Structured data for Commons may be a good idea only to the extent that the "data" is indeed well structured. In Wikidata German Shepherd should be a "breed" (with "breed" as a property) of Canis lupus familiaris, however it is not. It is currently a breed of dog, which literally is true but ontologically totally wrong: "dog" is not a species but a taxon common name. I wonder how many items are affected by this kind of confusion. Likewise, woman (Q467) is a "female adult human" only in the description, but not in the statements, where you can indeed find "female" and "adult" but not "human"; therefore women will never be highlighted if you search "female mammals". But that's not why I pinged you. Has it been envisaged to have the possibility to add qualifiers to the depicts "tags", as it is shown for the Search prototype? That would be good. Sorry if it is already written somewhere and I missed it. Christian Ferrer (talk) 05:24, 7 October 2018 (UTC)

Necessary changes to how viewing and using old file page revisions functions


Structuring data on Commons with Wikibase changes how content on a file page is stored and served back, known as multi-content revisions (MCR). Instead of a file page revision being a single big chunk of information, data is broken apart into pieces known as “slots.” When you view a file page, its history, or any individual revision, what you are seeing is being assembled from multiple slots.

This makes serving old revisions of a file page complicated, as one slot may have a revision that has been edited while another slot has not been changed. The old version of a file page cannot be returned in the same way that a plain wikitext-based wiki page works, which simply finds the specific past version of the wikitext on the file page – because there is only one – and returns that.
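A toy Python model may help picture why an old revision is now an assembly of slots. This is a simplification under stated assumptions: the slot names `main` and `mediainfo`, the `Revision` class, and both helpers are illustrative only and do not reproduce MediaWiki's actual storage layer.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int
    slots: dict   # slot name -> content as of this revision


def edit_slot(parent, rev_id, slot, new_content):
    """Create a new revision that changes one slot and inherits the others."""
    slots = dict(parent.slots)
    slots[slot] = new_content
    return Revision(rev_id, slots)


def render(revision):
    """Assemble a page view from every slot of a revision."""
    return "\n".join(
        f"[{name}] {revision.slots[name]}" for name in sorted(revision.slots)
    )


r1 = Revision(1, {"main": "== Summary ==", "mediainfo": "{}"})
r2 = edit_slot(r1, 2, "mediainfo", '{"depicts": "Q38280"}')   # data-only edit
r3 = edit_slot(r2, 3, "main", "== Summary ==\nA dog.")        # wikitext-only edit
```

Here revision 2 changes only the structured data and revision 3 only the wikitext, so viewing revision 2 means combining an untouched wikitext slot with an edited data slot.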

In order to make MCR work on old revisions of file pages, the development team is looking at making these old versions of pages match how Wikidata functions. The following things change when looking at an old revision of a file page:

  • The Edit tab at the top right of the page is replaced with Restore
  • The function of the Edit tab, accessing the old version of the entire wikitext of a page in order to restore it, is removed. Instead, a page is shown with the differences between the current and old revision (the one being restored), with an edit summary field.

Let’s say that you want to revert a file page to a specific version from the past. Currently, you’d access the History, click on the revision that you want. From there you would click on the Edit tab, view the old text in the editable text box and fill in an edit summary, and save the page.

The new function has you access the history and click on the revision that you want. From there you would click on the Restore tab (which has replaced the Edit tab). You’d then see a diff between that revision and the current page, and an edit summary field to fill in, with the save button. The editable text field is removed. This replicates how Wikidata handles old revisions.
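The restore flow described above can be approximated with Python's standard `difflib`. This is a sketch only: the `history` list of dictionaries and the two helper functions are hypothetical stand-ins, not how MediaWiki implements Restore.

```python
import difflib


def restore_preview(current_text, old_text):
    """Unified diff showing what restoring `old_text` would change."""
    return list(difflib.unified_diff(
        current_text.splitlines(), old_text.splitlines(),
        fromfile="current", tofile="restored", lineterm=""))


def restore(history, old_index, summary):
    """Append a copy of an old revision as the newest one (no free-text editing)."""
    history.append({"text": history[old_index]["text"], "summary": summary})
    return history[-1]


history = [
    {"text": "{{Information}}\n[[Category:Dogs]]", "summary": "upload"},
    {"text": "{{Information}}", "summary": "remove category"},
]
diff = restore_preview(history[-1]["text"], history[0]["text"])
```

The user sees `diff` and supplies only the summary; the old text itself is never opened in an edit box, which is the behavioural change being announced.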

If you’d like to read through the technical discussion that resulted in this decision, here is the Phabricator ticket where you’d start. There are more links within, including links to gerrit patches.

There are advantages to serving old revisions in this new manner, the main one being simplifying the process of restoring an old revision should that be your goal in editing the page. There are some drawbacks to this decision, however, primarily that the entire wikitext of old page revisions will not be available for copying, if someone is looking to duplicate old text on another page. Individual line changes can be seen and copied from the diff view. As mentioned at the top, this change will only affect the File namespace on Commons. Access to old revisions in the Commons namespace, Template namespace, Main namespace, etc., will remain as it is today. This use of old revisions in the File namespace does not seem to have a large impact on Commons, and the team hopes that any disruption in workflows from the change in how old text is accessed is minimal. The team may try to look into other ways of serving the entirety of old wikitext page revisions, but it will not be possible in the near future.

Are there any questions about this change? Keegan (WMF) (talk) 19:37, 27 September 2018 (UTC)

So it's not going to be possible to undo just a change to the wikitext, without reverting back the structured data -- nor to just revert structured data, without reverting the wikitext?
This might be a problem, if we consider that it might often be two largely distinct communities editing the data (probably heavily mechanised) and editing the wikitext (probably manually), often likely quite largely independently.
If somebody reverts back some edits to the data after a mistake, while in the intervening time an edit has been made to the wikitext, it sounds as if that wikitext edit will be reverted back too, and may be quite hard to reinstate, if it is no longer possible to access the wikitext of a whole page in the form to which it had been updated. This might upset non data-editors quite a lot. Jheald (talk) 20:35, 27 September 2018 (UTC)
Undoing and reverting will work just fine. Here's what you won't be able to do directly anymore on a File page: open the old revision, edit that revision directly as wikitext in the editing box with the big warning that this is an old revision, and save it as the new revision. Keegan (WMF) (talk) 22:17, 27 September 2018 (UTC)
I'd like to point out that this use-case for an old revision of a file page, accessing the old wikitext directly to either copy or manipulate it to save as the current revision, does not seem to be a common workflow for the file namespace here on Commons. It is quite common in discussion spaces, and on other wikis. Please let us know if there is a prevalent use-case for this workflow that we need to figure out a solution to. Keegan (WMF) (talk) 22:21, 27 September 2018 (UTC)
To add on to what Keegan said above - A.) The MCR team is still working on features and, in the future, should have a way to access the Wikitext of old revisions. It's just probably not going to be ready for our v1 launch. B.) As we get closer to launch and start putting things on Beta for testing, we'll explore a few possible temporary workarounds to address some edge cases as Jheald mentioned. RIsler (WMF) (talk) 01:51, 28 September 2018 (UTC)
Will this affect API access to old revisions? Currently, Geograph Update Bot makes some use of the ability to read past revisions (to check if location templates are the same as in the first revision), and I'd like it to make more (to detect location templates added by other bots, and to detect when it's thinking about reverting another user). It would be unfortunate if these became impossible. --bjh21 (talk) 21:20, 27 September 2018 (UTC)
To me, this seems like one more argument for the serialization/deserialization approach I've suggested several times. - Jmabel ! talk 23:18, 27 September 2018 (UTC)
This is just for older versions, right? The wikitext of the current version will still be accessible? (At least, that part of it that isn't the structured data.) BTW, it does seem to be possible to get hold of old versions of Wikidata items using getOldVersion() in pywikibot, but not to format it into an item_dict using get() (you have to manipulate the json - e.g. see [3]), I guess the same might be possible here so that bot-actions (for spotting/reverting vandalism and bot errors) would still work if needed? Thanks. Mike Peel (talk) 00:17, 28 September 2018 (UTC)
Wikitext of the current version will still be available. RIsler (WMF) (talk) 01:52, 28 September 2018 (UTC)
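For bot authors following the API question above, here is a hedged Python sketch of how a request for an old revision's content per slot might be built with the MediaWiki Action API. It only constructs a URL and performs no request. The parameters used (`action=query`, `prop=revisions`, `revids`, `rvprop`, `rvslots`, `format`) exist in the Action API, but whether File pages on Commons will serve old structured-data slots this way after deployment is not confirmed in this thread, and `old_revision_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def old_revision_url(rev_id, slots="*"):
    """Build an Action API URL asking for the content of one revision.

    `slots` is passed as rvslots; "*" asks for every slot of the revision.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": int(rev_id),
        "rvprop": "ids|timestamp|content",
        "rvslots": slots,
        "format": "json",
    }
    return API + "?" + urlencode(params)
```

A bot could compare the `main` slot of the first and latest revisions this way, which is the kind of check described for location templates.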
BTW, I'd send around the alert about this discussion as you did for #Property creation on Wikidata - this is a more important discussion than that one was... Thanks. Mike Peel (talk) 00:18, 28 September 2018 (UTC)
I'm doing so first thing in the morning. It was a very busy day, and experience has taught me to not send a massmessage at the end of a busy day :) Keegan (WMF) (talk) 02:10, 28 September 2018 (UTC)
I did this a little while ago. Keegan (WMF) (talk) 16:07, 28 September 2018 (UTC)
  Question: How will Commons:Rollback work? Christian Ferrer (talk) 16:34, 28 September 2018 (UTC)
  Question: How will file histories be displayed, and how will "Difference between..." be displayed? Christian Ferrer (talk) 17:12, 28 September 2018 (UTC)
None of these things should be affected or change. What is changing is how old revisions work in relation to viewing an old version of a page. Keegan (WMF) (talk) 17:34, 28 September 2018 (UTC)
OK, thank you for the answer. Christian Ferrer (talk) 17:55, 28 September 2018 (UTC)
Losing access to old revision's plain wikitext is plain red flag IMO. It may not be used that often but that's really helpful when you need it. Wikidata - you never see the plain wikitext (and you never need it), and here on Commons, we work with plain wikitext. — regards, Revi 16:43, 28 September 2018 (UTC)
@-revi: How often do you need the bulk plain wikitext from an old revision of a file page? I ask because as @RIsler (WMF): mentions above, the team should have a feature to access the old revision wikitext again in the future and this removal is temporary (unfortunately we do not know how long temporary is). If you find that you do access these old revisions on file pages regularly as part of a workflow, we'd love to hear about it and pursue a workaround. Keegan (WMF) (talk) 17:03, 28 September 2018 (UTC)
it is a commons workflow for me, to edit text to change information template to artwork template (for example) or copy paste LOC metadata into an old LOC flickr upload, i even use visualfilechange to make mass text changes to old files. but i would be happy to edit wikidata instead. a pencil on the template, to go to wikidata would be expected. and on ramp to QuickStatements. Slowking4 § Sander.v.Ginkel's revenge 02:51, 29 September 2018 (UTC)
  • Agreeing with @-revi: This is a disgrace. How often do we use it, @Keegan (WMF):? Often enough for you not to break it, what about that? -- Tuválkin 02:57, 29 September 2018 (UTC)
    • I'm trying to make sure I understand: will it still be possible to edit the latest version in a straight-text manner? That is, when you are talking about "old" revisions not being editable this way, is that just ones that have already been changed, or does that include the latest? Because if it's the latter then, yes, this is going to break a lot of workflows. - Jmabel ! talk 03:34, 29 September 2018 (UTC)
      • I'm still waiting for an answer to this, as conversation has headed off in different directions. - Jmabel ! talk 17:13, 29 September 2018 (UTC)
        • @Jmabel: The latest revision of the page will be editable just as it is today. We are only talking about when you view a historical version of the page. Keegan (WMF) (talk) 18:34, 1 October 2018 (UTC)
I have three concerns:
  1. In the two examples above the one from Commons allows you to preview the page, but the Wikidata version does not. Does that mean we will also lose the ability to see what the page will look like while in the process of performing a restore? Fortunately, I assume as a workaround we could instead take care to always start by clicking on an old revision to see a rendered version of the old version before clicking on the "restore" command.
  2. A common use I have for examining the old wikitext for a page is to figure out what wiki code was used in the past to produce a complex layout of description and licensing templates that have since been changed. Possible workarounds would be to either (1) restore an old revision, copy the old code, restore again to revert that restore, or (2) copy the current wikitext and then manually apply the diffs backwards through each revision to reconstruct the old code. Neither is particularly appealing. I would definitely sorely miss the ability to directly copy the wikitext of old revisions in the File namespace. I could live without it temporarily, but this is not something I do infrequently.
  3. Another case where I use the wikitext of an old revision in the File namespace is when making corrections to fix an edit that has broken the template rendering of a page. Commons File pages make heavy use of templates for their content and are often edited by editors from other wikis who make use of Commons but are not themselves primary contributors to Commons, so they are less familiar with the complex set of templates that Commons has built to produce the majority of the content in the File namespace. It is not uncommon for inexperienced editors to make an edit to a file page that adds useful information but also breaks the page rendering in a significant manner. For these types of corrections I will often click to start editing the revision of the page immediately before the less experienced editor started editing the page, copy out the wikicode for the portions that they inadvertently disrupted, and then use this copied code to make a correction. Alternatively, I might start by editing the revision of the page before the inexperienced editor started, actually make the change the other editor was trying to make and would have made if they were more experienced with Commons, and then submit. Making these types of corrections will be much harder without access to the wikitext of old revisions in the File namespace. As a workaround we will instead have to restore an old revision, copy the code, restore again to revert that restore, start a new revision, paste the copied code, make the necessary corrections, and then submit.
RP88 (talk) 04:06, 29 September 2018 (UTC)
For the second point RP88, if the answer made by Keegan (WMF) to my question above is right, then you should be able to copy a wikitext with Difference between.... Christian Ferrer (talk) 06:04, 29 September 2018 (UTC)
@Christian Ferrer: In your example link try to use your browser to copy to the clipboard just the contents of the older {{Information}} template and its parameters on the left side of the comparison. In the browsers that I've tried (Chrome, Firefox, Safari) the copied text will be intermingled between the old and new {{Information}} template/parameters. Using a diff as a way to retrieve the text of an old revision can be done, and is usually not too onerous for less complex edits, but quickly becomes impractical. —RP88 (talk) 06:20, 29 September 2018 (UTC)
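As an illustration of the second workaround RP88 lists (rebuilding an old revision by applying a diff backwards), here is a minimal sketch using Python's standard `difflib` module. It assumes a line-based delta produced by `difflib.ndiff`, which is not the HTML diff that MediaWiki displays, and the two sample revisions are made up for the example.

```python
import difflib


def make_delta(old_text, new_text):
    """Return a line-based ndiff delta between two revisions."""
    return list(
        difflib.ndiff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
        )
    )


def recover_old(delta):
    """Rebuild the older revision from the delta alone."""
    # which=1 selects the lines that belong to the first (older) sequence.
    return "".join(difflib.restore(delta, 1))


# Hypothetical revisions of a file description page.
old_rev = "{{Information\n|description=Old text\n|date=2010\n}}\n"
new_rev = "{{Information\n|description=New text\n|date=2010-05-01\n}}\n"

delta = make_delta(old_rev, new_rev)
recovered = recover_old(delta)
```

This only works when the delta keeps both sides intact, which is the point RP88 makes about copying from the rendered comparison: once old and new lines are intermingled in the clipboard, the old side has to be separated out again by hand or by a tool.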
  •   Comment I don’t believe I ever needed to edit the wikitext of old revisions of File pages, and can’t think of any use case I would have needed that ability for. :) Jean-Fred (talk) 08:20, 29 September 2018 (UTC)
  •   Comment MediaWiki already supports "view source" of a page, for example as offered when a page is locked. If it is necessary to withdraw the "edit" option link from the page history (and I don't fully understand why that is so necessary), would it not be possible to offer "view/view source" instead ? Jheald (talk) 11:35, 29 September 2018 (UTC)
  • As I understand it, the tricky part with viewing the old source of a page that has MCR is that not all revisions of all things are living in the same place, so assembling that snapshot in the raw is what becomes infeasible and why the view is changed from plain wikitext to the diff view. I think it might be possible in the future to put back together, but for now we need a workaround. Keegan (WMF) (talk) 18:41, 1 October 2018 (UTC)
@Keegan (WMF): That seems rather odd. One would expect at least all the wikitext to be living in the one place. Jheald (talk) 19:15, 1 October 2018 (UTC)
Let me try to clarify what Keegan meant above. It's hard to explain without getting in the weeds about what MCR does, so let me provide a short answer - we might indeed provide a view source button/tab, but it may be easier to simply provide a source view via a modified querystring on the EditPage. The ultimate point is that there *will* be a way to view the Wikitext of a past revision, we just haven't settled on the best way to do that yet. RIsler (WMF) (talk) 21:04, 1 October 2018 (UTC)

Multiple questions regarding the change @RIsler (WMF), Keegan (WMF):

  • Who is going to merge old file description pages in the new system?
  • Why has no community consensus been sought on Commons:Village pump/Proposals?
  • Who is going to fix all the bots which will break once the change is merged?
  • If I remember correctly, staff promised somewhere that file description pages and categories will be kept. Why has this changed?
  • Who is going to fix all the gadgets which will break?

Best. --Steinsplitter (talk) 06:32, 12 October 2018 (UTC)

  • I left a note on COM:AN, so we can get a bit more input here. Best --Steinsplitter (talk) 06:40, 12 October 2018 (UTC)
    • Steinsplitter, why did you leave a note on AN -- this does not require administrator intervention. The Village Pump seems more appropriate (and it doesn't seem like a proposal, more like a FYI). -- Colin (talk) 06:55, 12 October 2018 (UTC)
      • Sounds reasonable, moved it to VP. --Steinsplitter (talk) 06:56, 12 October 2018 (UTC)
        • I left a VP note when this was posted. Keegan (WMF) (talk) 17:13, 12 October 2018 (UTC)
  •   Comment: I don't think this will be any problem for me. I often want to see the wikitext of current revisions, in order to copy/paste to another page (which is what I suspect some of the hasty opposers above are doing). But I've never needed to do that for old revisions. Indeed the only time I've ever needed the old revision of a File page on Commons was to revert to it. As an aside, I wish one could revert to an old version of a file without that appearing in one's upload log -- if the devs know of a ticket for that one, I'd support it. -- Colin (talk) 06:55, 12 October 2018 (UTC)
  • @Steinsplitter: File description pages and categories are being kept. Page history merges will not change. As for why consensus wasn't sought, it's because this isn't an optional feature, it's a required function. I'm not aware of gadgets, bots or tools that this particular change might break (and I had a look). Are there any particular ones that you had in mind? Keegan (WMF) (talk) 17:12, 12 October 2018 (UTC)
  • Required by whom, or by what? -- Tuválkin 18:22, 12 October 2018 (UTC)
  • Multi-content revisions, the software that assembles pages from Commons and Wikidata. Keegan (WMF) (talk) 21:42, 12 October 2018 (UTC)
  • I run User:Geograph Update Bot, which inspects old revisions of pages (and of files). I asked above if API access would be affected, but I haven't yet had a reply. --bjh21 (talk) 19:12, 12 October 2018 (UTC)
  • This particular change we're discussing is more about reverting old revisions via the UI. We have no current plans to change API access to *read* Wikitext for old revisions (new MCR stuff will be backwards compatible). If an issue comes up that requires such a change to be made, we'll be sure to inform everyone before it happens, but as of now the plan is to keep the basic API functionality working as is. More preliminary info here: RIsler (WMF) (talk) 23:49, 12 October 2018 (UTC)
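For bot operators who read old revisions, a request through the MediaWiki Action API might look roughly like the sketch below. The parameter names (`action=query`, `prop=revisions`, `revids`, `rvprop`, `rvslots`, `formatversion`) are standard Action API parameters; the revision ID, the sample response, and the helper names are hypothetical, and the response layout shown is my understanding of the `formatversion=2` output with `rvslots=main`, so treat it as an assumption to verify against the live API.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def old_revision_url(rev_id):
    """Build a query URL asking for the wikitext of one specific revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",  # the wikitext lives in the "main" slot under MCR
        "format": "json",
        "formatversion": "2",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_wikitext(response):
    """Pull the main-slot wikitext out of a decoded JSON response."""
    page = response["query"]["pages"][0]
    revision = page["revisions"][0]
    return revision["slots"]["main"]["content"]


# Hypothetical response, shaped as assumed above (not captured from the API).
sample_response = {
    "query": {
        "pages": [
            {
                "title": "File:Example.jpg",
                "revisions": [
                    {
                        "revid": 12345,
                        "slots": {
                            "main": {
                                "content": "{{Information|description=Example}}"
                            }
                        },
                    }
                ],
            }
        ]
    }
}

url = old_revision_url(12345)
wikitext = extract_wikitext(sample_response)
```

The sketch performs no network access; a bot would fetch `url` with its usual HTTP client and pass the decoded JSON to `extract_wikitext`.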

This discussion is 3,300 words long, having run for just over 2 weeks. This discussion was mentioned on the VP, but there was no other effort to notify users, even those of us that put our names forward to be part of formal consultation. It's only happenstance that I remembered the VP mention, which happened while I was away travelling.

The change is significant in the fundamental way that Wikimedia Commons works and should be run as a proposal or RFC, run for at least 30 days, and benefit by having a FAQ based on the questions raised so far.

It's worth pointing out that I am the most active current uploader of images on Commons, and this change is worrying due to the future potential impact on the way that upload projects will work, how templates can be used, and the running of housekeeping tasks on uploads, which includes automatically reviewing past versions of image page text (many of my bot tasks do this as part of checking past bot actions and ensuring bots do not edit-war with "human" changes). Despite vague assurances that this probably will not require any more volunteer effort, I do not believe that will be the case long term. This change is part of making templates harder to use and image page text becoming harder for newbies to format "correctly", with "correct" being defined by whatever pops out at the end of the WMF's structured data project. The authority for the changes comes from the WMF funded project, not because Wikimedia Commons volunteers have established a consensus for changes. Instead the structured data project has fudged consensus by having consultations, like this one, that procedurally mean very little and where input from volunteers can be cherry picked by the unelected, to demonstrate whichever case benefits the project at that time.

Thanks -- (talk) 10:38, 16 October 2018 (UTC)

@: you've said that there was no effort to notify other users, even those that put names forward to be part of the formal consultation. I notified those people, including yourself.
As to the point about requiring consensus...
As a Wikimedian, I understand your point and the necessity for consensus when proposing to change community process and workflows. However, the fundamental problem with your proposal is that you're relating consensus to necessary software changes. The implication of asking for consensus is that if it's not found, the thing isn't done. This has to be done no matter what, so that completely fails in the process you propose. Instead, what we can do is inform people of incoming changes, look for the workflows that this might break, and help implement new systems or workarounds.
If there are concrete issues that you can identify now with how your bots operate, please let us know and we will work with you. If you are unsure of what might break because you won't know until this happens, please let us know at the time and we will work with you then. There are some aspects to this project that will be necessary to do, and the development team would very much like to make this as painless as possible. Keegan (WMF) (talk) 19:36, 29 October 2018 (UTC)

Lua version of the {{Information}} template

In preparation for Structured data on Commons it might be a good idea to revamp our most used infobox template, {{Information}}, and rewrite it in Lua. I can look into adopting and simplifying some of the code used in the {{Artwork}} template to develop Module:Information, which should be a very simple and lightweight replacement of the current wikicode. Of course once the sandbox version is ready we would notify the community and go through an extensive testing process before any deployment. The Lua code at this phase would simply mimic the output and behavior of the wikitext, but in the future it might be used for merging data stored in the information template with data stored in structured data. --Jarekt (talk) 17:21, 22 October 2018 (UTC)

I'd suggest holding off on doing something like this. I do not think we have the overhead to support this at this time, it might be more useful to wait and see how the function of SDC takes shape. Keegan (WMF) (talk) 19:38, 29 October 2018 (UTC)
@Keegan (WMF): I think Jarekt's idea might be good, so could you explain what you mean or expect? I don't understand. Thanks, Yann (talk) 20:55, 29 October 2018 (UTC)
It will cause server strain in two ways. The first is a one-off individual occurrence, severe load times when editing the template. Maybe not so bad, something people could live with. However, the second way is that it makes re-rendering the page extremely expensive, and will forever slow down loading the File page for everyone. That's not really something we can live with. I'm not an engineer, this is the simplified form of what I've been told, so if we need more specifics I can try to dig them out. If we can figure out a way in the future to do this without the strain, that'd be great. On the other hand, we'll likely have structured data more feature complete by then, and it might be moot. Keegan (WMF) (talk) 21:13, 29 October 2018 (UTC)
The Module:Information is ready for testing, as announced in Template_talk:Information#Rewrite_in_Lua. At this stage I am only testing it and will not deploy without broader support. @Keegan (WMF):, would this change be significantly different from any other change to the {{Information}} template or one of the 15 templates and modules it calls? Edits to those templates are not unusual. Also there are other modules and templates that are used a lot, like Module:Fallback with 44M transclusions, or Module:I18n/date with 52M, and although we are trying to minimize the number of edits, they are being changed all the time. In my experience, the "one-off individual occurrence, severe load times when editing the template" means that the tab becomes unresponsive, then it might take up to 2-3 months until all the files using the templates are updated. I am all for "figure[ing] out a way in the future to do this without the strain", since I expect that once Module:Information is live there will be a lot of ideas on how to improve it. --Jarekt (talk) 13:14, 2 November 2018 (UTC)
@Jdforrester (WMF): thoughts for Jarekt here? Keegan (WMF) (talk) 16:40, 2 November 2018 (UTC)
@Keegan (WMF), Jarekt: Correct, ideally you wouldn't ever touch those modules either, and certainly not change it "all the time". Jdforrester (WMF) (talk) 16:51, 2 November 2018 (UTC)
@Keegan (WMF), Jdforrester (WMF): OK, "all the time" is an exaggeration, but we do have 200+ templates and modules with 1M and more transclusions and 37 of them with over 10M transclusions (see Special:MostTranscludedPages). All those pages are protected (I hope), and we try to limit the edits to them, but there is always something to fix or improve. In the past the general attitude was en:Wikipedia:Don't worry about performance; I do worry about it, but not to the point of postponing improvements. As the number of files is going up, those edits affect more and more pages, so this might become more of an issue. Keegan was hoping we "can figure out a way in the future to do this without the strain", any chance of that? --Jarekt (talk) 02:14, 3 November 2018 (UTC)
@Jarekt: Absolutely, I understand the evolving needs for such tools (new languages, new designs, etc.). However, implementing them in Lua, though indeed it is far better than wikitext, is very slow compared to proper code to do it in PHP/JS. A big part of the benefit from the Structured Data on Commons work is to provide tools for all the media files that could be implemented in Lua, but in a way that isn't a performance disaster. Taking weeks to roll out each typo fix to all files is a terrible experience for Commons users (readers/re-users); via MediaWiki code itself it runs faster and more manageably for the servers, and so accidental breakage doesn't sit on millions of files for "2-3 months", as you put it. Each new feature added via a template or module is at the rough cost of a dozen such done via the main system. (In other words, just because you can, doesn't mean you should. 😉) Jdforrester (WMF) (talk) 18:13, 5 November 2018 (UTC)
@Keegan (WMF), Jdforrester (WMF): I see, so the way to make edits to templates like {{Information}} would be to move them from wikitext or Lua implementations into MediaWiki code itself. I am fine with that. My assumption was that Lua code will be the place where we merge inputs from Structured Data on Commons (SDoC) and the wikitext for files that have some data in one format and some in the other format. For example, for some image we managed to parse the date and we store it as SDoC, but the author field is not comprehensible to algorithms and we do not have anything in the SDoC (until someone inputs it manually), so we display that information the way we always did: from the template wikitext. We do something like this right now with {{Artwork}}, {{Creator}} and other templates. So if Lua is the place to merge those two streams of data, then the first step would be to have Lua code for a single stream (wikitext) that does not break existing file description pages. Later, when we have SDoC, we can extend the Lua code to render information from it and perform merging tasks. So would the plan be to do the merging in MediaWiki code, or something else entirely? --Jarekt (talk) 21:15, 5 November 2018 (UTC)
I strongly suggest putting this project on hold until we have some features out for SDC and we all (the dev team and the community) can see what can be accomplished without having to take this to MW core, or do any sort of complicated work that might have negative impacts on the site before we see what's possible first with SDC. Keegan (WMF) (talk) 21:26, 8 November 2018 (UTC)
OK, let's put it on hold. However, in the past, several incremental changes proved less disruptive than accumulated sweeping changes, as they have a higher chance of not being reverted. --Jarekt (talk) 20:21, 9 November 2018 (UTC)

Mark for translation

Could you please mark the following part for translation? Sorry, I can’t give the source link:

  • Read the log of the last IRC office hour, which took place _date tag_.

--Omotecho (talk) 04:12, 29 October 2018 (UTC)

Thanks for pointing it out, I'm not a translation admin on this wiki so I'm working on getting someone to help take care of it. Should be marked up soon. Keegan (WMF) (talk) 21:53, 29 October 2018 (UTC)

Could Wikidata be used to create non-English category names for existing categories?

Despite the language being selected as "Mandarin Chinese" and both the Wikidata item and the category itself containing Taiwanese alternatives the title of the category is still displayed in the English language.

I wrote the below some time ago, but I don't really know where to propose it. Commons:Village pump/Proposals is exclusively in English, and non-English speakers simply won't be made aware of it. As this doesn't really concern any policy or "community" thing but rather the underlying software of Wikimedia Commons, I thought that it might find a better place here. Note that because English is the de facto language of Wikimedia Commons (and all "multi-project" websites like Wikidata), I don't think that anyone who can't understand English will be able to give their input about this. Really, these people are "voiceless" here: if you can't understand English you're still very likely to run into English everywhere, and someone who only speaks Wolof, say, won't be able to contribute much here unless we allow Wikidata to start adding non-English titles and translations for Wikimedia Commons. Maybe this could also apply to tags and properties.

The original proposal is below:

Sometimes I wander into Wikimedia Commons from a non-English language Wikimedia project such as the Mandarin Chinese Wikipedia or the Vietnamese Wikipedia. However, rather than finding the titles appropriate for the language Wikimedia Commons displays in, it still shows the English language titles such as “Category:Round coins of the State of Qi” even though I’ve added the Taiwanese title “齊國圜錢” which is also used at the Mandarin Chinese Wikipedia. It would make sense for people who can’t speak and understand English to still be able to navigate their way through Wikimedia Commons, right? This could easily be implemented by just changing the display titles of the categories: non-English titles would redirect to their English equivalents but would display themselves in whichever language the viewer has set their preferences to.

Additionally, translations could be imported from Wikidata. Wikidata often features non-English translations of its items, so a bot could simply mass-create redirects that will serve as alternative display titles based on these. In case these alternative titles change on Wikidata they should change on Wikimedia Commons, but I think that the software should be able to prefer Wikipedia article titles over translations, and it should be possible to override them manually and locally with a special template like {{Preferred translation}}. An example of using a more Wikipedia-centric translation method would be using Category:Archaeological Museum of Thessaloniki which exists as the English Wikipedia article “w:en:Archaeological Museum of Thessaloniki” and as the Greek Wikipedia article “w:el:Αρχαιολογικό Μουσείο Θεσσαλονίκης”, then the alternative title for the Wikimedia Commons category for users coming in from that language could be “Category:Αρχαιολογικό Μουσείο Θεσσαλονίκης” or “Κατηγορία:Αρχαιολογικό Μουσείο Θεσσαλονίκης” (namespace changes could be handled outside the redirects and just automatically display the translations in the preferred language). --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 16:24, 31 October 2018 (UTC)

This will be possible with structured data, at least theoretically and probably not quite how you envision. Whether or not the community would like to do so will be up to, well, y'all :)
There are people working to create 1:1 equivalent Wikidata items for Commons categories. Items can hold translations, and with structured data people would be able to add these Wikidata Category statements to files which are then accessible in available languages. Again, it'd be up to the community to figure out this work flow and if you want to do that. It would be slightly redundant in some ways as files will have other statements that expose the same or similar information, but I imagine there's a use-case for maintaining the category tree in structure. Keegan (WMF) (talk) 17:05, 31 October 2018 (UTC)
@Keegan (WMF):, then if category trees will largely be superseded by tags and statements then these should also be made available in a variety of languages. However categories have something that many other statements don't, they're often directly linked to other Wikimedia projects including Wikipedia's in various languages, I'm not sure how other Wikimedia projects will be properly linked through with other statements but in its current form users often come to Wikimedia Commons categories through "equivalent Wikipedia articles". Will this somehow also be superseded in a manner that subject relevant images could be easily linked in other Wikimedia projects? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 18:12, 31 October 2018 (UTC)
I think this is achievable with no software changes and only a small settings change. If $wgRestrictDisplayTitle were disabled on Commons then {{DISPLAYTITLE}} could be used to set the displayed title of a category to something appropriate for the language (using {{LangSwitch}}), with a bot handling creation of redirects and correcting uses of redirected categories. It might be possible to import translations from Wikidata, but there are several reasons why this wouldn't work in general (non-unique labels, categories not notable enough for their own Wikidata item, etc) so many would have to be maintained here. So I think this is a policy and/or community matter and COM:VP or COM:VPP would be a perfectly sensible place for it. --bjh21 (talk) 18:44, 31 October 2018 (UTC)
@Bjh21:, alright then I'll post this to the village pump to propose it after I've re-written it a bit. Would there be a way to mass notify non-English speakers in their languages to ask for feedback? Also most of it would have to be done locally due to the notability "issues" with many categories so I wasn't ever expecting Wikidata to cover the translations 100% (one-hundred percent). --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:00, 31 October 2018 (UTC)
Let me know if it would be useful/possible to add a call to something like DISPLAYTITLE to {{Wikidata Infobox}} - hopefully the titles in the infobox already show the labels that we'd want to use here. Thanks. Mike Peel (talk) 21:12, 31 October 2018 (UTC)
@Donald Trung, Keegan (WMF): There's a huge use case for being able to query the category tree using SPARQL from within the CommonsData Query System (phab:T194401#4661082) -- especially in the early stages when there will be a huge amount of information to sync from categories to structured data and vice versa. But that is best done with something that monitors changes to the category table and reflects them directly to the SPARQL triplestore, not by creating actual redundant statements on CommonsData and trying to keep them up to date.
Nor is anyone going to rewrite the software around the category table in MediaWiki. It will continue to be built around a category page name which is a string, picking up Category:pqrst statements in the wikitext of the page.
Yes, we have an increasing number of categories linked to items on Wikidata, which can have labels translated into multiple languages. But using those to present/interpret aliases for category names -- both at the bottom of file pages, and at the top of category pages -- would be quite tricky. For one thing, Wikidata labels are not disambiguated, whereas category names need to be unique. It's perhaps not completely impossible, but it would be a major major amount of work, and I am dubious that resources would be diverted to make it happen any time soon. Jheald (talk) 00:37, 1 November 2018 (UTC)
To amplify something I wrote in the previous paragraph: it's not enough for the software just to rewrite the display title on the category page. It also needs to similarly translate category names at the bottom of file pages, otherwise these will not match, and things will be very very confusing. And also, the modified software needs to be able to recognise the category-name translations, if these are used e.g. to add a category when a page is edited. This is starting to become quite a patch, and one would also need to make sure key tools like HotCat and Cat-a-lot continued to work, too. Jheald (talk) 00:46, 1 November 2018 (UTC)
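To make the disambiguation point concrete, here is a minimal sketch of how one might check whether translated labels would collide if they were used as category display titles. The category names and labels are invented for illustration and the function name is hypothetical; nothing here reads from Wikidata or Commons.

```python
from collections import defaultdict


def find_label_collisions(labels_by_category):
    """Map each label shared by two or more categories to those categories."""
    by_label = defaultdict(list)
    for category, label in labels_by_category.items():
        by_label[label].append(category)
    # Keep only the labels that are not unique.
    return {
        label: sorted(categories)
        for label, categories in by_label.items()
        if len(categories) > 1
    }


# Made-up example: unique category names whose labels are not unique.
sample = {
    "Category:Mercury (planet)": "Mercury",
    "Category:Mercury (element)": "Mercury",
    "Category:Venus (planet)": "Venus",
}

collisions = find_label_collisions(sample)
```

Any label that shows up in `collisions` could not serve as a unique display title on its own, so it would need a locally maintained, disambiguated translation instead.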

Might be better as something supplemental than as a replacement. - Jmabel ! talk 15:32, 2 November 2018 (UTC)

Copyright and licensing statements

New designs are up for structured copyright and licensing statements, based on feedback from the first round of designs. Please look them over, they are very important to the project and Commons. Thanks! Keegan (WMF) (talk) 16:48, 2 November 2018 (UTC)

One question: What's that "Office" option?

Hi, I'm watching the presentation about structured data on Commons, and hit the test page for search. There "Image", "Videos", etc... and then, "Office". What's that? Also, how things are going to be translated for those queries? Cheers! Tetizeraz. Send me a ✉️ ! 19:17, 3 November 2018 (UTC)

"Office" is a holding name for document type files. It's not the set name that will be used. As for translations, they will be done through, where all MediaWiki system messages go for internationalization. Keegan (WMF) (talk) 21:27, 8 November 2018 (UTC)
Thanks for the clarification! Tetizeraz. Send me a ✉️ ! 16:40, 11 November 2018 (UTC)