Commons:Village pump/Proposals/Archive/2021/12

Tool to help with [Category:Maps of <location> by <year>]

Request
Help with creation of categories in the {{Map of <location> by <year>}} (and/or decade/century) schemes
(Since I don't know which village pump is the best to bother with my request, I first go here. Please feel free to move my request to any place better suited, if I'm too wrong here.)

My current problem is that I wish to categorize maps as precisely as possible. This in turn requires extensive knowledge of existing categories, because you NEVER know which categories already exist. Sometimes, there are already detailed categories in place where you can put an "Old map of Madagascar" right into the Category:16th-century maps of Madagascar; or an "Old map of France" right into the year Category:1892 maps of France, or an "Old map of Russia" into the Category:1760s maps of Russia. More often, however, the intended category doesn't yet exist. That leaves the categorizer two options:

  • Number one, create the category tree yourself on the spot, which means quite a bit of research into how the other categories of that tree are defined, then copying the templates from some other already existing category, getting it wrong three times, and sometimes even adjusting several other categories that missed important parent categories or whatever. This takes quite a lot of time and is quite a punishment for the categorizer.
  • Number two is the lazy route where you don't do anything and drop your still uncategorized map smack-dab into "Category:Maps of the United States" (for example; that one has currently 1266 files of all years, map types, etc.). From there a map eventually gets moved by another more dutiful categorizer into "Category:Unidentified maps of the United States" (currently 1165 files), or directly into the correct state, or year, or both. Most categorizers choose this lazy route, because the more specific subcategory doesn't exist and you want to get some work done. Until I recently created century-subcategories for London, the "Old maps of London" contained about 800 maps from the 15th to the 20th century. That's quite a punishment for anyone who uses the categories to search for a map of some place+time and has to click through several hundred files, and it invalidates the purpose of categories.

So what can we do to create an easy-to-handle system? I have two suggestions:

  • First, to have lots of yet-empty categories created in advance: Category:1712 maps of France and Category:1916 maps of Idaho may still be empty, but you can expect that they will at some point have content, since 1711 and 1713 maps of France, and 1915 and 1917 maps of Idaho, already exist. This would require a clever user to guide a bot on which categories should be pre-created, based on which countries or sub-divisions existed at which point in time. Which means that it makes no sense to create "1692 maps of Uganda" - that's nonsense - but there needs to be provision for categories like the 1692 maps of HRE, Ottoman Empire and Prussia; and then again prevent 1962 maps of these countries. After such an organized creation, we'd have several thousand empty categories, but ready to be filled.
  • Second, have a bot searching for red categories that match the right parameters: IF a category like "Maps of <place>" already exists AND that category already has more than <threshold> subcategories (about 10?), the bot would recognize red categories of "YYYY maps of <place>" with the same <place> keyword and create the corresponding category tree. In practice: if next year someone categorizes a new map as "2022 maps of Nigeria", the category would remain red for a while until the helpful bot creates that category as "{{MapsNigeria|202|2}}", plus the "2020s maps of Nigeria" and the "21st-century maps of Nigeria", as needed. Meanwhile, a "2022 maps of Birmingham" (map-of-category exists but has too few subcategories) would not yet get its own subcategory, and the bot could change this tag automatically into "Maps of Birmingham|2022", which allows for a later subcategorization if really needed.
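To make the second suggestion more concrete, here is a minimal sketch in Python of the decision such a bot would have to make. It is only an illustration under stated assumptions: the function names, the dictionary of existing categories, and the default threshold of 10 are all hypothetical, no wiki API is called, and the century rule (1800 to 1899 counts as the 19th century) is an assumption that would need checking against the existing category templates.

```python
import re

def ordinal(n: int) -> str:
    """Return 1 -> '1st', 2 -> '2nd', 11 -> '11th', 21 -> '21st', ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def category_chain(year: int, place: str) -> list[str]:
    """Year, decade and century category names for one map year."""
    decade = year // 10 * 10
    # Assumption: a century runs from xx00 to xx99 (so 1800 is 19th century).
    century = year // 100 + 1
    return [
        f"{year} maps of {place}",
        f"{decade}s maps of {place}",
        f"{ordinal(century)}-century maps of {place}",
    ]

def plan_for_red_category(title: str, existing: dict[str, int], threshold: int = 10):
    """Decide what to do with a red category named 'YYYY maps of <place>'.

    `existing` is a hypothetical snapshot mapping each existing category
    name to its number of subcategories.
    """
    match = re.fullmatch(r"(\d{4}) maps of (.+)", title)
    if match is None:
        return None  # not part of the by-year scheme
    year, place = int(match.group(1)), match.group(2)
    parent = f"Maps of {place}"
    if parent not in existing:
        return None  # unknown place: leave it to a human
    if existing[parent] > threshold:
        # Parent is already well subdivided: create the missing tree.
        missing = [c for c in category_chain(year, place) if c not in existing]
        return {"action": "create", "categories": missing}
    # Too few subcategories: retag as "Maps of <place>|YYYY" instead.
    return {"action": "retag", "category": parent, "sort_key": str(year)}
```

With a snapshot in which "Maps of Nigeria" has 12 subcategories and "Maps of Birmingham" has 3, the sketch would propose creating the Nigeria tree and retagging the Birmingham map with the sort key 2022. Deciding which places existed in which years (the Uganda and Prussia cases from the first suggestion) is deliberately left out.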

--> As far as I can see, the structure of maps by year/decade/century is standardized and distinct enough to make both outlined approaches somewhat feasible. No concern is given here to the other sorting criteria (projection, topic/theme, language, etc.) as these require different and more elaborate categories. The same could theoretically be done with the "YYYY in <place>" categories (Category:2017 in Bogotá vs. Category:1968 in Kathmandu), but here I see a few more potential implementation problems. All the best, --Enyavar (talk) 18:49, 8 December 2021 (UTC)

Maps of <location> by <year> categories should be burned with fire. They are the best example of overly-specific-categorization on Commons and they typically make it impossible to efficiently browse maps on Commons. For somewhere like New York City, these categories make sense, but for most locations, we end up having 1 map per category, which is pointless. Nosferattus (talk) 00:31, 1 January 2022 (UTC)
The problem is two-fold, yes: The super-categories like "Old maps of France/Germany/England" burst with hundreds of files, because it is so much hassle to create smaller subcategories on your own. ("London" or "Madagascar", for example, had no sub-categories for centuries until I created and filled them this year, just to name examples.) So, because no more specific subcategories are readily available, people pile their stuff into the super-categories, which I find hard to browse. I shudder each time I open Category:Old maps or Category:Maps of the world. These unspecific categories are pretty pointless, too.
However, I agree that there should be a helpful tool that also lets you browse through all content of all tiny subcategories at one glance, for example from Category:1660s maps of Germany. A "show me all" button would be another great help, and probably even more important than a tool to help break down the big super-piles. --Enyavar (talk) 21:51, 1 January 2022 (UTC)
This is a problem with all by-year categories, and other systematic subcategorisation. In categories with a few dozens of files it is usually easier to just browse through them, while a suitable breakup helps when there are hundreds of files (but the same breakup is seldom usable for all categories or uses). Pre-creating categories will result in the frustrating categories with 0–3 files each, one of which may be – but usually isn't – the file you are searching for. A general technical solution is needed. If we have the year and the substance category, a suitable tool should be able to find any relevant file. We need such a tool integrated in the user interface.
Ah! I notice the page banner is asking for ideas for tools. Isn't this one that should be highly prioritised? Does anybody have an outline ready, or the experience to describe the needed tool well?
LPfi (talk) 13:01, 11 January 2022 (UTC)
@LPfi: I created this one, to answer that question. --Enyavar (talk) 10:02, 13 January 2022 (UTC)
@LPfi: The more general solution will be faceted search, through the search function, powered by structured data. If somebody really wants to search for images by year, that should be low-hanging fruit for such a system. But for maps, surely it's more common to want to see a curated view of maps of a particular thing, across quite a range of dates. And that is what suitably-organised categories can lend themselves well to, with their collation of what are the natural "things" that have tended to be repeatedly depicted. Jheald (talk) 18:33, 11 January 2022 (UTC)
  Oppose. 10000% agree with User:Nosferattus. In my view everything finer than "by century" should go. Hard agree that "Maps of <location> by <year> categories should be burned with fire" (And I speak as somebody trying to prep two very large sets of map uploads). It was catastrophic that User:AnRo0002 in particular in 2019 created so many of these "by decade" and "by year" categories.
Categories should bring together things that are useful to browse together, not split them apart.
IMO the two things that are most needed are (i) have categories encompassing a meaningfully broad period, and sort them by date; (ii) split out maps depicting part of the <location> from maps depicting the whole location.
That way one brings together maps of the same thing - eg England as a whole, or London as a whole, or the continent of South America as a whole - and has the chance to see how the depictions of those places evolved. (And, equally, how previous maps got reissued decades or centuries later). And it helps us to see at a glance what is in the category that should be moved to a geographically more precise category, rather than it being lost across a million hard-to-explore different micro sub-categories. Jheald (talk) 18:20, 11 January 2022 (UTC)
Hard disagree on "Maps by century are sufficient". Sure, you can't find many maps from the 6th century, so there's no problem with that. But for some centuries (like "world maps of the 21st century"), we'll get inundated with maps, and I have already started to classify new world maps I encounter by year (like Category:2011 maps of the world). I only just started doing so while working through the unidentified maps, and I expect hundreds more maps incoming, for each year since 2000. Again, I make that statement exclusively for the 21st-century maps of the world.
Some other granularities would be old maps by decade: Here I find it important to distinguish an 1800s map from an 1890s map. Maps, their styles and what they are able to express change significantly over time, and if you are searching for a map of some (larger) place around the 1800s, I find it easier to go for the narrower decade-categories (i.e. search the maps between 1790 and 1809) than the broader century-categories (i.e. search the maps between 1700 and 1899). My two example decade-categories are already decently filled, and we know there are many files that would match the criteria, but are still entirely uncategorized. What you suggest is to sort hundreds of thousands of maps by adding the year like this: [[Cat:<18th-century maps of place>|1724]]. That is just not feasible. Someone tried doing that with the "Old Maps of London" once, but they did not get beyond a mere hundred of the maps of London, while there are thousands. The regular uploader just gives us [[Cat:Maps]], and while we have some volunteers who do little else but sort those maps, my observation is that most do so only very superficially, and don't care about subcategories. It also doesn't help that the map-category tree is never the same. How do you expect that the regular categorizer will follow a guideline to sort (old) maps by year?
While I would love a better way to simply say "it's a <map> of <extent> from <year> about <topic>"... the structured data tool is plainly unusable, requires even more arcane knowledge and also needs a lot more data-input effort than the existing category tools. It'd be great IF we had a "cat-a-lot for structured data", and then some automatism which automatically translates between structured data and categories. But we have neither.
Again, I'm referring you to this submission which should help browsing while still keeping whatever level of granularity any category currently has. You'll note that this cat and its subcats could also benefit from such a thing, which is aside from mere maps. --Enyavar (talk) 10:02, 13 January 2022 (UTC)
Until we have the working tools, the problem is when only one (or a few) of the subcategories have a suitable map. I might prefer a map from 1739, but what I mainly care about is that a certain place is included with suitable context, enough detail, certain aspects (topography/roads/coastline/rivers/whatever I happen to need), and that the map is intelligible at the scale I need. A map from 1830 may be the one that suits my needs the best – as there probably is no suitable map from 1739, and if there is one, it might not be in that category – and having a zillion categories about maps showing roads, topographic maps etc. will not help, unless the specific category includes the map I am searching for. I won't be browsing through thousands of images, so a category where I am more likely to find my map helps, but going to the next page of a category is easier than clicking on a category, getting back to the parent and clicking on the next one, which is frustrating if the subcategories are small or deep. -- Preceding unsigned comment added by LPfi (talk • contribs) 12:28, 13 January 2022 (UTC)
@Enyavar and LPfi: I think the structured data tools perhaps may get there sooner than you think -- especially as structured data / search / querying is the (unique) area of Commons that WMF have any central developers at the moment.
I don't know if you've used search recently, but already a search for say "Oxford" [1] comes back with prompts to refine the results by Licence / File Type / Image size / Community Assessment. Full general faceted search is hard; but adding options to narrow by image type (eg general photo / painting / print / diagram / map / etc) or by date or by maker would IMO be low-hanging fruit (along with some others), and I know that eg User:Multichill has very much been pushing for the search team to take more steps in this direction. Not least because it would start to show some real pay-back for all the investment that's gone in on infrastructure for SDC.
The more SDC starts becoming visibly useful, the more I think people will start building all manner of ways to transfer information into it -- including taggers, and tools to automatically add information of a particular type for all files in a particular set. So I think this will drop, and I think that when it does it will move surprisingly quickly.
On the other hand, what I think may be harder will be to teach search hierarchical information, and what is an appropriate amount of hierarchical generalisation to include (eg should a search for "animals" return insects? Should a search for "Oxford" include returns for every object in an Oxford collection? How should they be prioritised, compared to eg a general skyline view of Oxford, or a random streetview?) For these reasons, I do think categories will be around and continue to be important for the foreseeable future (even if some, eg Multichill, might disagree). Though I suspect we might see tools to filter category contents by particular SDC fields -- eg by date or date range, image type, location proximity, materials, etc. Which might favour larger categories over a larger number of smaller ones.
Even with the tools we have, eg m:PetScan I think it is already easier to find intersections between categories (eg "Old maps of London" and "1724 maps"), rather than to combine together a large number of small categories. PetScan can do the latter (eg return all files in a given category or up to 3 levels of sub-categories below it), but the more steps you go down a category tree, the more chance that it may have taken a turn in a quite unexpected direction. So in general I would say intersections are better.
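The difference between the two approaches can be sketched with plain Python sets. This is not how PetScan is implemented; it is a toy model over in-memory dictionaries, and the function name and data layout are hypothetical. It only shows why an intersection stays predictable while a depth-limited walk pulls in whatever hangs below the start category.

```python
from collections import deque

def files_in_tree(root: str, subcats: dict[str, list[str]],
                  files: dict[str, set[str]], max_depth: int) -> set[str]:
    """Collect files in `root` and in its subcategories down to `max_depth`.

    `subcats` maps a category to its direct subcategories and `files`
    maps a category to the files placed directly in it (toy data only).
    """
    seen = {root}
    queue = deque([(root, 0)])
    found: set[str] = set()
    while queue:
        category, depth = queue.popleft()
        found |= files.get(category, set())
        if depth < max_depth:
            for child in subcats.get(category, []):
                if child not in seen:  # guard against category loops
                    seen.add(child)
                    queue.append((child, depth + 1))
    return found

def intersection(cat_a: str, cat_b: str, files: dict[str, set[str]]) -> set[str]:
    """Files that sit directly in both categories."""
    return files.get(cat_a, set()) & files.get(cat_b, set())
```

Every level added to `max_depth` widens the result by everything any subcategory happens to contain, relevant or not, whereas the intersection can only return files that were explicitly placed in both categories.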
Given that we start where we do, like User:LPfi, I think in general it is much less likely that I would be searching for a specific year, more likely that I would be searching for a particular type of map or subject of map over a range of years that would be acceptable. Having those maps together in a category makes that possible. Spreading them out over a myriad number of micro-categories essentially doesn't.
Possibly interestingly, large libraries seem to think similarly, so that in the systems of subject headings used for searching by eg the Library of Congress or the British Library the standard cutter they use is "Early Maps to 1800", giving rise to subject headings like "Chester -- Maps -- Early Maps to 1800" (their equivalent of categories), grouping together all maps of a place up to 1800, rather than even splitting them by centuries as proposed above.
As to whether date-sorting categories is manageable, I think it is. For one thing I have already got a script (see Commons:Bots/Requests/JhealdBot (7)) that can add |year sort keys for a particular category for a particular list of files, if the data is there in their file description page. And I suspect it was me that did at least some of the previous ordering of "Old maps of London", even if I may not have come back to it very recently. Once a category has been sorted into order, it's not so much work to deal with a few extra files at the end that may have been added since. And when people get the idea that Old maps categories should be sorted into date order, then people will do that, for tidiness.
I don't see a problem with a category having up to two to three hundred images, so long as they are sorted in an understandable way. (And it helps if they are of a similar thing, which is why I think categories like Category:Old maps of whole Wales (alone) or the categories in Category:Old_county_maps_of_England are worth separating out from the general "Old maps of Wales" or "Old maps of County X" categories.)
At least this way one can see relatively quickly what there is; and, also, be able to identify quickly things that should be moved into different categories. Contrast that with eg things like Category:1750s maps of England which doesn't contain a map of England at all, but rather File:Bournemouth area 1759 map.jpg, which has a more precise categorisation, namely Category:Old maps of Dorset.
To me the by-year and by-decade categories are not helpful; and they are getting in the way of accurate classification of what the maps are actually of, by overwhelming us with so many categories that the maps which ought to be bubbled down to more precise geographical categories become no longer collected in one place, and no longer visible. Jheald (talk) 20:01, 17 January 2022 (UTC)
Discussion mentioned on the Wikimaps telegram group. Jheald (talk) 20:13, 17 January 2022 (UTC)
Discussion also noted on the Wikimaps facebook group Jheald (talk) 20:39, 17 January 2022 (UTC)