Commons:Bots/Requests/YiFeiBot (13)

YiFeiBot (talk · contribs) (13)

Operator: Zhuyifei1999 (talk · contributions)

Bot's tasks for which permission is being sought: check every image to see whether it includes {{License template tag}} or {{No license since}}. If it has neither, add the page to Category:Media without a license: needs history check

Automatic or manually assisted: Automatic unsupervised

Edit type (e.g. Continuous, daily, one time run): Daily

Maximum edit rate (e.g. edits per minute): 6 edits per minute

Bot flag requested: (Y/N): N

Programming language(s): python: pywikipedia

Zhuyifei1999 (talk) 06:09, 27 October 2013 (UTC)

Discussion

  • Please make a test run. Is it possible to go through all of a user's uploads after finding one problematic file? This would reduce clutter on user talk pages. --EugeneZelenko (talk) 14:47, 27 October 2013 (UTC)
      On hold: the bot is buggy with uploads from before 2011. I'll do another test run after a large number (maybe 10M) of null edits are done. Also, going through all of a user's uploads after finding one problematic file is hard, but I'll try with some additional SQL queries. --Zhuyifei1999 (talk) 09:25, 28 October 2013 (UTC)
  • I am confused as to why you think you need 10M null edits; I did not notice any issues with older uploads. This edit to {{GNU-Layout}} set things back for a while, so for the next few months we will have to look for files that have neither {{License template tag}} nor {{GNU-Layout}}. As I mentioned before, the manual pipeline we use when dealing with images with no license is:
  1. Check for uploads done within the last few weeks (or since the last run) that do not have a license (lacking {{License template tag}} or {{GNU-Layout}}) or are already tagged with {{No license}}, {{Delete}}, {{Speedydelete}}, {{Remove this line and insert a license instead}} or are in Category:Media without a license: needs history check. Add {{No license}} using VisualFileChange. VisualFileChange sends one user-friendly message per user, which is very important since people often make the same upload mistake on tens or hundreds of files, and we really do not want to add hundreds of templates to their user pages.
  2. All older files should be added to Category:Media without a license: needs history check, since they are most likely files that "lost" their license somehow. Do not tag those with {{No license}}, since many are by users who are no longer active, and it is not their fault that some vandal or inexperienced editor removed the license and nobody noticed. Deleting admins are supposed to check the edit history before deleting, but I suspect that few of them do.
Alternatively, we can look at the number of edits to the file. Files with a single edit, or with edits only by the uploader and by categorization (and other) bots, can be tagged with {{No license}}. --Jarekt (talk) 12:24, 28 October 2013 (UTC)
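The edit-count rule above can be sketched in Python, the bot's language. This is only an illustration of the decision, not the bot's actual code: the `Revision` class, the `choose_action` function, the action strings, and the set of bot names are all hypothetical, and a real run would read the revision history through the wiki API.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical labels for the two outcomes discussed in this thread.
NO_LICENSE = "tag with {{No license}}"
HISTORY_CHECK = "add to Category:Media without a license: needs history check"


@dataclass
class Revision:
    """One entry of a file page's edit history (hypothetical structure)."""
    user: str


def choose_action(revisions: List[Revision], uploader: str,
                  known_bots: Set[str]) -> str:
    """Pick an action for a file that currently has no license template.

    If only the uploader and known bots ever edited the page, the license
    was most likely never there, so the uploader can be notified.
    Otherwise someone else may have removed it and a human should look
    at the history first.
    """
    if not revisions:
        raise ValueError("a file page always has at least one revision")
    editors = {rev.user for rev in revisions}
    if editors <= {uploader} | known_bots:
        return NO_LICENSE
    return HISTORY_CHECK
```

A file with a single edit is covered by the same test, since that edit is the upload itself.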
Changing to files with a single edit. Anyway, the 10M null edits are for files uploaded before {{License template tag}} was created: their entries in the templatelinks table were apparently never updated, which generates false positives in the SQL query (more than 19 out of 20). Sometimes even a null edit just before the tagging does not work, so I have to do the null edits. --Zhuyifei1999 (talk) 12:47, 28 October 2013 (UTC)
{{License template tag}} was created over 2 years ago and is now transcluded on about 18,029,597 pages (compared to 19,089,184 files present). I did not observe any false positives due to a non-current templatelinks table when querying the database over the last year. The only exception was files using {{GNU-Layout}}. --Jarekt (talk) 17:24, 28 October 2013 (UTC)
Oh? Special:Permalink/108061906 and Special:Permalink/108089514 (see also the dates before the edits) happened because of this, with this SQL query:
SELECT page_title
FROM page
WHERE page_namespace = 6
AND page_is_redirect = 0
AND NOT page_id IN (
    SELECT tl_from
    FROM templatelinks
    WHERE tl_title = "License_template_tag"
    OR tl_title = "No_license_since"
)

--Zhuyifei1999 (talk) 09:08, 29 October 2013 (UTC)

By the way, the query I used for a while was:

select /* SLOW_OK */ page_title 
from page 
where 
  page_is_redirect=0 and 
  page_namespace=6 and 
  not exists (
    select * 
    from templatelinks 
    where 
      tl_from=page_id and 
      tl_namespace=10 and 
      tl_title in ("License_template_tag","GNU-Layout","No_license","Delete","Speedydelete","Remove_this_line_and_insert_a_license_instead") 
    limit 1 
  )

I now think that maybe we should start simple: if the bot could just put all the images meeting the above criteria (and not already in this category) into Category:Media without a license: needs history check, or a per-day subcategory, that would be ideal. Sometimes I run into issues where one typo in some book template causes hundreds of images to lose their license. We do not want to send several hundred notifications to one poor uploader who probably did not break the template to begin with. So a multi-step approach with a human in the loop might work better. --Jarekt (talk) 15:57, 10 December 2013 (UTC)
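As a rough illustration of how the NOT EXISTS query above selects files, here is a small Python sketch that runs the same logic against a toy in-memory database. It is an assumption-laden demo, not the bot's code: it uses SQLite instead of the MySQL replica, the helper names `files_without_license` and `build_demo_db` and all sample rows are made up, the column names are taken from the queries in this thread, and an ORDER BY is added only to make the result deterministic.

```python
import sqlite3

# Template titles copied from the query quoted in this thread.
LICENSE_MARKERS = (
    "License_template_tag", "GNU-Layout", "No_license",
    "Delete", "Speedydelete",
    "Remove_this_line_and_insert_a_license_instead",
)


def files_without_license(conn):
    """Return titles of non-redirect File: pages with none of the markers."""
    placeholders = ",".join("?" for _ in LICENSE_MARKERS)
    sql = f"""
        SELECT page_title
        FROM page
        WHERE page_is_redirect = 0
          AND page_namespace = 6
          AND NOT EXISTS (
              SELECT 1
              FROM templatelinks
              WHERE tl_from = page_id
                AND tl_namespace = 10
                AND tl_title IN ({placeholders})
          )
        ORDER BY page_title
    """
    return [row[0] for row in conn.execute(sql, LICENSE_MARKERS)]


def build_demo_db():
    """Create a tiny stand-in for the page and templatelinks tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER,
                           page_is_redirect INTEGER,
                           page_title TEXT);
        CREATE TABLE templatelinks (tl_from INTEGER,
                                    tl_namespace INTEGER,
                                    tl_title TEXT);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 6, 0, "Licensed.jpg"),      # has a license marker
        (2, 6, 0, "Unlicensed.jpg"),    # no templates at all
        (3, 6, 1, "Redirect.jpg"),      # redirect, skipped
        (4, 0, 0, "Gallery_page"),      # not in the File: namespace
        (5, 6, 0, "Only_infobox.jpg"),  # templates, but no license marker
    ])
    conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", [
        (1, 10, "License_template_tag"),
        (5, 10, "Information"),
    ])
    return conn
```

With the sample rows, `files_without_license(build_demo_db())` returns only the two File: pages that carry none of the listed templates. In the human-in-the-loop approach described above, each returned title would then be placed in the maintenance category rather than tagged directly.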

Sorry, I have been busy recently. I'll look into it this Friday. --Zhuyifei1999 (talk) 11:56, 11 December 2013 (UTC)
  Changing to add the category for every match --Zhuyifei1999 (talk) 07:26, 14 December 2013 (UTC)
  Done another test run at [3] with two reruns for improvements. --Zhuyifei1999 (talk) 08:25, 14 December 2013 (UTC)
Is it possible to create different edit summaries depending on history size? It's not clear what the bot did: added a category or a template. --EugeneZelenko (talk) 15:30, 14 December 2013 (UTC)
  Changed summary assuming you mean like that. --Zhuyifei1999 (talk) 06:31, 15 December 2013 (UTC)
Looks much more informative to me. --EugeneZelenko (talk) 15:40, 15 December 2013 (UTC)

If there are no objections, I think the task should be approved. --EugeneZelenko (talk) 15:31, 19 December 2013 (UTC)

  Agree --Jarekt (talk) 15:42, 19 December 2013 (UTC)