Commons:Bots/Requests/YaCBot

YaCBot (talk · contribs)

Operator: McZusatz (talk · contributions)

Bot's tasks for which permission is being sought:

  • File-Cleanup:
    1. Internationalization
    2. General cleanup (com:regex: Dates, Format, Junk, Interwikilinks, ...)
    3. com:OVERCAT cleanup (removed per discussion; OverBot will handle it)
    4. Remove duplicate categories (regardless of their sortkey)
    5. Remove {{Uncategorized}} if two or more visible categories are present
    6. Mark as uncategorized if zero visible categories are found (only hidden ones or none at all)
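
Tasks 4 to 6 can be illustrated with a minimal Java sketch. It is not YaCBot's actual code: the class name, the regular expression, and the hiddenCategories parameter are assumptions made for illustration, and whether a category is hidden would in practice have to be looked up on the wiki.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of YaCBot.
final class CategoryCleanupSketch {
    enum Action { REMOVE_TEMPLATE, ADD_TEMPLATE, NONE }

    // Matches [[Category:Name]] and [[Category:Name|sortkey]]; group 1 is the name.
    private static final Pattern CATEGORY_LINK = Pattern.compile(
        "\\[\\[\\s*[Cc]ategory\\s*:\\s*([^\\]|]+?)\\s*(?:\\|[^\\]]*)?\\]\\]");

    /** Underscores become spaces and the first letter is upper-cased. */
    static String normalize(String name) {
        String n = name.replace('_', ' ').trim();
        return n.isEmpty() ? n : Character.toUpperCase(n.charAt(0)) + n.substring(1);
    }

    /** Task 4: keep the first link to each category, whatever its sortkey. */
    static String removeDuplicateCategories(String wikitext) {
        Set<String> seen = new HashSet<>();
        Matcher m = CATEGORY_LINK.matcher(wikitext);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(wikitext, last, m.start());
            last = m.end();
            if (seen.add(normalize(m.group(1)))) {
                out.append(m.group());
            } else if (last < wikitext.length() && wikitext.charAt(last) == '\n') {
                last++; // also drop the line break that followed the duplicate
            }
        }
        out.append(wikitext.substring(last));
        return out.toString();
    }

    /** Counts distinct categories that are not in the (assumed) hidden set. */
    static int countVisible(String wikitext, Set<String> hiddenCategories) {
        Set<String> visible = new HashSet<>();
        Matcher m = CATEGORY_LINK.matcher(wikitext);
        while (m.find()) {
            String name = normalize(m.group(1));
            if (!hiddenCategories.contains(name)) {
                visible.add(name);
            }
        }
        return visible.size();
    }

    /** Tasks 5 and 6: two or more visible categories remove the tag, zero add it. */
    static Action decide(int visibleCount, boolean hasUncategorizedTemplate) {
        if (hasUncategorizedTemplate && visibleCount >= 2) return Action.REMOVE_TEMPLATE;
        if (!hasUncategorizedTemplate && visibleCount == 0) return Action.ADD_TEMPLATE;
        return Action.NONE;
    }
}
```

A file with exactly one visible category is left alone in both directions, which matches the limit of 2 agreed in the discussion.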

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Continuous

Maximum edit rate (e.g. edits per minute): Less than 12 edits per minute. (Not all files need cleanup, so some API requests do not result in an edit. Also, maxlag is set to 3 seconds.)
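
As a rough illustration of the rate limit, the following Java sketch spaces edits at least 5 seconds apart and appends the maxlag parameter to an API query string. The class and method names are hypothetical; only the figures (12 edits per minute, maxlag of 3) come from the request. An interval of exactly 5 seconds allows at most 12 edits per minute, so a real bot aiming to stay strictly below that would use a slightly longer interval.

```java
// Hypothetical helper, not part of YaCBot.
final class EditThrottleSketch {
    static final int MAX_EDITS_PER_MINUTE = 12;
    static final long MIN_INTERVAL_MS = 60_000L / MAX_EDITS_PER_MINUTE; // 5000 ms
    static final int MAXLAG_SECONDS = 3;

    /** Milliseconds still to wait before the next edit may be sent. */
    static long millisToWait(long lastEditMs, long nowMs) {
        long elapsed = nowMs - lastEditMs;
        return Math.max(0L, MIN_INTERVAL_MS - elapsed);
    }

    /** Appends maxlag so the server can refuse the request while replication lag is high. */
    static String withMaxlag(String query) {
        return query + "&maxlag=" + MAXLAG_SECONDS;
    }
}
```

A caller would sleep for millisToWait(...) before each write and back off and retry whenever the API answers with a maxlag error.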

Bot flag requested: (Y/N): Y

Programming language(s): Java

McZusatz (talk) 19:26, 22 December 2013 (UTC)

Discussion

  • With regard to "Remove {{Uncategorized}} if more than zero visible categories are there": this is likely to be problematic because of mass adding of naff categories, both automatically and by hand. There are good reasons to add highly generic categories to images, such as Yellow flowers, and these can be highly useful when using category intersections (something that will become much easier for most users with planned improvements to the search function). In these situations removing the uncategorized template is not a good move, as we still want to encourage better categorization by addressing content, place, time and any other relevant dimensions/qualities. Note that for both DoD and Geograph categories I have Faebot removing this template as part of housekeeping, but this tends to be context sensitive, and I apply a rule of thumb that there have to be a minimum of 2 visible categories rather than just 1 (i.e. not just "George W. Bush" (content) but categories so you can find George on a trip in Afghanistan (place)). Are there ways you plan on compensating for this? --Fæ (talk) 07:47, 1 January 2014 (UTC)
I can either raise the limit (e.g. to 2) or remove this functionality completely. --McZusatz (talk) 12:08, 1 January 2014 (UTC)
  Done. The limit is now 2. --McZusatz (talk) 01:39, 2 January 2014 (UTC)
  • I will start the test run sometime in the next 24 hours...
  • I have updated the task description to match the most recent changes. (Per Fae's comment)
  • Regarding the overcategorisation cleanup, I am not sure what you want to hear, but maybe this FAQ helps:
Does the Bot consider the whole category tree while cleaning up?
No. For several reasons this is not useful as of now, so the bot only looks as far as grandparents. First, some edits may be too hard to follow. (Imagine a great-great-...-grandparent category gets removed: the uploader or editor may not understand the removal and roll it back.) Second, the category tree itself has not been cleaned up with regard to overcat, which should be done first. (I may file another bot request for this task.) Lastly, the category tree contains loops (user:dschwen estimated about 10^3 of them), which have to be cleaned up by hand first. Otherwise the bot will likely die or produce unexpected results, as I am confident that there is no way to tackle the overcat problem universally if there are loops in the tree.
Are hidden categories considered for the overcat cleanup?
No, as of now the cleanup is limited to visible categories. (Otherwise the bot would controversially remove category:Files uploaded by Russavia from all 60,000+ files in category:Files uploaded by Russavia (cleanup).) --McZusatz (talk) 16:51, 2 January 2014 (UTC)
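
The grandparent-limited check described in the FAQ can be sketched in Java as follows. This is an assumed illustration, not the bot's real implementation: the class name and the in-memory parents map are hypothetical, and the caller is expected to pass in visible categories only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of YaCBot or OverBot.
final class OvercatSketch {
    /**
     * Returns the categories on a file that are a parent or grandparent of
     * another category on the same file. The walk stops after two levels,
     * so it terminates even if the category graph contains loops.
     */
    static Set<String> redundantAncestors(Set<String> fileCategories,
                                          Map<String, Set<String>> parents) {
        Set<String> redundant = new TreeSet<>();
        for (String cat : fileCategories) {
            for (String parent : parents.getOrDefault(cat, Set.of())) {
                if (!parent.equals(cat) && fileCategories.contains(parent)) {
                    redundant.add(parent);
                }
                for (String grand : parents.getOrDefault(parent, Set.of())) {
                    if (!grand.equals(cat) && fileCategories.contains(grand)) {
                        redundant.add(grand);
                    }
                }
            }
        }
        return redundant;
    }
}
```

Bounding the depth guarantees termination, but it does not make loops harmless: if two categories on a file are each other's parent, this sketch flags both, which is one way a loop could produce the unexpected results mentioned in the FAQ.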
I am working on a fast category inspection database server and could add overcategorization queries if this is wanted. The server can be queried through XHR and/or WebSockets (intersection example: GFDL licensed QIs) --Dschwen (talk) 17:48, 2 January 2014 (UTC)
I'd like to see a human-readable output of the loops. That way I can bookmark it and clean up some loops whenever I am bored. :) --McZusatz (talk) 12:38, 3 January 2014 (UTC)
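
One possible shape for such a human-readable loop listing is a depth-first search over the category graph that prints each loop as a chain of names. The Java sketch below is only an assumption about how this could look; it works on a hypothetical in-memory map from category to subcategories and says nothing about how Dschwen's server is implemented or queried.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the subcategory map is assumed to fit in memory.
final class CategoryLoopSketch {
    /** Returns one line per loop found, e.g. "A -> B -> C -> A". */
    static List<String> findLoops(Map<String, List<String>> subcategories) {
        List<String> loops = new ArrayList<>();
        Set<String> finished = new HashSet<>();
        // Sorted start order keeps the report stable between runs.
        for (String start : new TreeSet<>(subcategories.keySet())) {
            visit(start, subcategories, new ArrayList<>(), finished, loops);
        }
        return loops;
    }

    private static void visit(String cat, Map<String, List<String>> subcategories,
                              List<String> path, Set<String> finished, List<String> loops) {
        int index = path.indexOf(cat);
        if (index >= 0) {
            // cat is already on the current path, so the path from there closes a loop.
            List<String> loop = new ArrayList<>(path.subList(index, path.size()));
            loop.add(cat);
            loops.add(String.join(" -> ", loop));
            return;
        }
        if (finished.contains(cat)) {
            return; // fully explored earlier
        }
        path.add(cat);
        for (String child : subcategories.getOrDefault(cat, List.of())) {
            visit(child, subcategories, path, finished, loops);
        }
        path.remove(path.size() - 1);
        finished.add(cat);
    }
}
```

The recursion and the linear path lookup keep the sketch short; on a graph the size of the real category tree an iterative traversal would be the safer choice.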
Test-run Status:
105 files crawled in 0 days 0 hours 6 minutes 36 seconds. (3.7714286 seconds per file for read, write and sleep)
Resulting in 42 edits. (Most of them were the annoying internationalisation edits, but you may refer to the last four edits and the first 12 edits to see the interesting ones.) --McZusatz (talk) 12:38, 3 January 2014 (UTC)
Looks OK to me, but it would be a good idea to look through the bot's contributions and hide service categories (example: File:214 Dissociation of Sodium Chloride in Water-01.jpg). --EugeneZelenko (talk) 15:18, 3 January 2014 (UTC)
Sorry, my bot does not detect violations of com:HIDDENCAT. (I have marked Category:CNX missing caption as hidden by hand, but my bot cannot do this.) --McZusatz (talk) 16:00, 3 January 2014 (UTC)
In case the overcategorization is _really_ wanted, one can 'fix' it with cat-a-lot in five clicks anyway and mark the subcategory as hidden. --McZusatz (talk) 20:54, 3 January 2014 (UTC)
Maybe specifying the parent and child categories in the edit summary and bot logs would help humans detect such problems? --EugeneZelenko (talk) 15:35, 4 January 2014 (UTC)
  Done; see the bot's two most recent edits. However, I think a bot log will rapidly grow beyond 10 MiB (200 bytes per entry) and consequently slow down the bot. Removing 'old' items from the log makes it incomplete and useless. The best log is the contribution log, which does not have these problems. (Of course I can enable the log with just one click, if this is the consensus.) --McZusatz (talk) 20:39, 4 January 2014 (UTC)
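
The log-size estimate can be checked with a short calculation, shown here as a Java sketch. It assumes one 200-byte log entry per edit; the class and method names are illustrative only.

```java
// Back-of-the-envelope check of the log growth estimate; names are hypothetical.
final class LogGrowthSketch {
    static final long BYTES_PER_ENTRY = 200L;
    static final long LIMIT_BYTES = 10L * 1024 * 1024; // 10 MiB

    /** Number of whole entries that fit below the limit. */
    static long entriesUntilLimit() {
        return LIMIT_BYTES / BYTES_PER_ENTRY;
    }

    /** Hours until the limit is reached at a constant edit rate. */
    static double hoursUntilLimit(double editsPerMinute) {
        return entriesUntilLimit() / editsPerMinute / 60.0;
    }
}
```

Under these assumptions the log holds 52,428 entries, which the bot would fill in roughly 73 hours at the maximum rate of 12 edits per minute.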
Sure, the contributions page is the best solution, but it would be a good idea to create a dedicated account for overcategorization fixes. --EugeneZelenko (talk) 15:29, 5 January 2014 (UTC)
I can make the bot edit pages only when an overcat cleanup is made, and then create user:YaCBot 2 to clean up files when no overcat cleanup is made, if this is what you meant. --McZusatz (talk) 18:08, 5 January 2014 (UTC)
You are correct. I would only suggest including the task name in the bot name :-) --EugeneZelenko (talk) 15:19, 6 January 2014 (UTC)
  Done. YaCBot will do the file cleanup and OverBot will do the overcat cleanup for files. --McZusatz (talk) 23:31, 6 January 2014 (UTC)

If there are no objections, I think bot status should be granted. --EugeneZelenko (talk) 15:30, 8 January 2014 (UTC)