Commons:Bots/Work requests

Shortcut: COM:BR · COM:BWR

Bot policy and list · Requests to operate a bot · Requests for work to be done by a bot · Changes to allow localization  · Requests for batch uploads


SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day.

Cleanup Panoramio files needing categories

There are lots of files located in the child categories of Category:Photos from Panoramio needing categories by date. Many of these have been categorized, so they need to be moved out of these categories. Ideally this should be done by the person doing the categorization, but for whatever reason people haven't been doing that step. Thus, it would be nice to have a bot remove the files from these child categories on an ongoing basis if they actually are categorized (criterion: at least one category that isn't hidden). Thanks. howcheng {chat} 18:22, 23 August 2017 (UTC)

I am trying to do the task. --Kanashimi (talk) 15:06, 31 August 2017 (UTC)
I will start the long-term task in a few days. --Kanashimi (talk) 22:36, 29 September 2017 (UTC)

@Kanashimi, Howcheng: The criterion "at least one category that isn't hidden" should be replaced with a more complicated criterion. E.g. Category:Photographs taken on 2017-03-28 is not hidden (and should not be hidden), but an image categorized only in one such category should be treated as an uncategorized image. The bot process needs to be tracked and manually checked. --ŠJů (talk) 20:55, 2 October 2017 (UTC)

Please mark all direct subcategories of Category:Photos from Panoramio needing categories by date as {{Hiddencat}}, just as subcategories of Category:Photos from Panoramio needing categories by ID are marked. --ŠJů (talk) 20:55, 2 October 2017 (UTC)

@ŠJů: You are right. I am now getting the contents and checking hidden categories via expandtemplates; it is more precise. As for marking the subcategories of Category:Photos from Panoramio needing categories by date, I am coding that now. Please wait a few days. --Kanashimi (talk) 00:24, 4 October 2017 (UTC)
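
The stricter criterion discussed above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name, the way the hidden categories are passed in, and the pattern for date-only categories are all assumptions.

```python
import re

# Assumption: date-only categories follow the naming seen in the example
# "Category:Photographs taken on 2017-03-28".
DATE_ONLY_CATEGORY = re.compile(r"^Category:Photographs taken on \d{4}-\d{2}-\d{2}$")


def is_really_categorized(categories, hidden_categories):
    """Return True if a file has at least one category that is neither
    hidden nor a date-only category.

    categories: iterable of category titles found on the file page.
    hidden_categories: set of those titles known to be hidden
    (hypothetical input; a bot would have to look this up itself).
    """
    for title in categories:
        if title in hidden_categories:
            continue  # maintenance categories do not count
        if DATE_ONLY_CATEGORY.match(title):
            continue  # visible, but says nothing about the subject
        return True
    return False
```

Other visible categories that carry no subject information would need to be added to the exclusion rule in the same way, which is why manual checking of the bot's results was requested.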

Reviewer

Hello. Why are there no bots that review files from other websites, as there is for Flickr? I suggest creating similar bots for other websites, for example to review images from Pixabay. This will save humans a lot of time. I think it is possible to apply the FlickreviewR method to most sites. Some images, as well as videos, will remain for humans. Thank you ديفيد عادل وهبة خليل 2 (talk) 19:14, 20 September 2017 (UTC)

HTTPS link upgrade for Geograph Britain and Ireland

About 4% of all images on Commons are from the Geograph Britain and Ireland project. There are therefore a large number of links to geograph.org.uk on image description pages. That website now supports TLS, so links can be upgraded from http:// to https://, surely an ideal task for a bot to work its way through slowly. An example page would be File:Bretforton_post_office_-_geograph.org.uk_-_803570.jpg, which includes a link to both the photo and the author profile page on geograph.org.uk. These URLs fit the form http://www.geograph.org.uk/photo/[0-9]+ and http://www.geograph.org.uk/profile/[0-9]+ respectively and simple addition of an 's' in the protocol part accomplishes the change. Is anyone interested in taking this on? Beorhtwulf (talk) 23:38, 20 September 2017 (UTC)

@Beorhtwulf: About 1,783,308 files currently qualify, can't touch this.   — Jeff G. ツ 15:12, 21 September 2017 (UTC)
Because there are too many? That's interesting because I understand most of those images were uploaded by a bot in the first place - a much bigger job than this text substitution. An edit every five seconds would get the job done in around three months, or cut that time down if it's ok to go for shorter intervals between edits. Is this not feasible? Another thing that may help put a dent in the size of the task: a large number of links to that site are likely to be from the protected template that we are waiting to be updated, dealing with many of them without the need to go through editing each page: {{Geograph/en}}. Beorhtwulf (talk) 00:19, 22 September 2017 (UTC)
The http:// links will work into the future. So isn't this unnecessary and a waste of resources (Every edit is precious)? —Dispenser (talk) 20:09, 22 September 2017 (UTC)
Even if we do need to do this, I think it is better to use a template to replace all the links instead of just adding an 's'. --Kanashimi (talk) 08:57, 2 October 2017 (UTC)
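
For reference, the plain text substitution from the original request, and the "around three months" estimate, can be sketched like this. It is a minimal sketch under the assumptions stated in the request (only the photo and profile URL forms are touched); the function names are made up for illustration and no actual page editing is shown.

```python
import re

# Only the two URL forms named in the request: /photo/<id> and /profile/<id>.
GEOGRAPH_HTTP = re.compile(
    r"http://(www\.geograph\.org\.uk/(?:photo|profile)/[0-9]+)"
)


def upgrade_geograph_links(wikitext):
    """Rewrite matching http:// links to https://, leaving other text alone."""
    return GEOGRAPH_HTTP.sub(r"https://\1", wikitext)


def estimated_days(edit_count, seconds_per_edit):
    """Days needed to make edit_count edits at a fixed interval."""
    return edit_count * seconds_per_edit / 86400
```

With the figures from the thread, `estimated_days(1783308, 5)` comes to roughly 103 days, which matches the "around three months" estimate.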

Coords from exif

Could you get coordinates from the files' EXIF in category:Files by Juandev for ticket 1344 and move them to template:Location dec, please? --Juandev (talk) 20:00, 10 October 2017 (UTC)

Hi Juandev, I think that user:Dschwen will do that sooner or later. --Arnd (talk) 09:26, 11 October 2017 (UTC)

Category:Files with no machine-readable license

It seems that for some days now, every redirect resulting from a rename has been showing up in this category. Currently we have about 9000 in it. I don't know what's wrong or how to fix it. But as a workaround it would be nice if a bot could do a null edit on every redirect within this category, which removes the entry for that redirect. This should be done regularly (once or, better, multiple times a day) until the cause of this fault is fixed. Thx in advance. --JuTa 15:26, 14 October 2017 (UTC)

This is related to phab:T108662. I suggest waiting a few days until the patch is merged. --Steinsplitter (talk) 15:52, 14 October 2017 (UTC)
That's fine, but a single bot run after the patch has been applied would be good. (I don't want to do 9000+ null edits manually.) --JuTa 16:09, 14 October 2017 (UTC)

Rename files with wonky Unicode encoding

I placed this request at User_talk:CommonsDelinker/commands#Other_requests first but I was told I should place it here: The following 342 file names have bad Unicode encoding. For example, instead of containing the character "–" (&#8211;), they contain the text "&-8211;". The same for three other characters:

  • &-8217; = ’
  • &-8220; = “
  • &-8221; = ”

For the moment I will list here only three files, the complete list can be found at User:Fructibus/A.

CommonsDelinker: Replace File:Almost complete "sub-cordate" flint handaxe of Lower Palaeolithic date (500000 BC &-8211; 40000 BC). (FindID 112531).jpg with File:Almost complete "sub-cordate" flint handaxe of Lower Palaeolithic date (500000 BC – 40000 BC). (FindID 112531).jpg across all Wikimedia projects. Reason: Wonky Unicode encoding
CommonsDelinker: Replace File:An incomplete and ornate cast copper alloy double looped asymmetrical buckle with an integral bar and a cast copper alloy hinged plate. Medieval or Early Post Medieval (AD 1300 &-8211; AD 1600). (FindID 97064).jpg with File:An incomplete and ornate cast copper alloy double looped asymmetrical buckle with an integral bar and a cast copper alloy hinged plate. Medieval or Early Post Medieval (AD 1300 – AD 1600). (FindID 97064).jpg across all Wikimedia projects. Reason: Wonky Unicode encoding
CommonsDelinker: Replace File:An incomplete and ornate cast copper alloy double looped asymmetrical buckle with an integral bar and a cast copper alloy hinged plate. Medieval or Early Post Medieval (AD 1300 &-8211; AD 1600). Detai (FindID 97064).jpg with File:An incomplete and ornate cast copper alloy double looped asymmetrical buckle with an integral bar and a cast copper alloy hinged plate. Medieval or Early Post Medieval (AD 1300 – AD 1600). Detai (FindID 97064).jpg across all Wikimedia projects. Reason: Wonky Unicode encoding

I hope this is the right place for such requests. Thanks. Fructibus (talk) 12:31, 15 October 2017 (UTC)

-- ; sql --cluster analytics commonswiki_p
SELECT
  page_title,
  COUNT(*), SUM(page_is_redirect=0) AS Files,
  REGEXP_REPLACE(page_title, ".*?&([^;]{2,16});.*", "&\\1;") AS HTML_Entity
FROM page
WHERE page_namespace=6 /*File:*/
AND page_title LIKE "%&%;%"
GROUP BY HTML_Entity
HAVING Files > 1
ORDER BY Files DESC
LIMIT 80;
Files HTML_Entity Replace Example
314 &-8211; U+2013 (En dash)
67 &-195; ...SAN DIEGO (March 28, 2007) &-195;&-145; Sailors...   Done
27 &-34; U+0022 (Quote Mark)
21 &-x2F; U+002F (Forward slash)
15 &-8220; U+201C (Left curly quote)   Done
15 &-8221; U+201D (Right curly quote)   Done
12 &_-39; U+0027 (Apostrophe)   Done
11 &-226;&-128; ...an electrician&-226;&-128;&-153;s mate…   Done
9 &-x27; U+0027 (Apostrophe)   Done
9 &-8216; U+2018 (Left curly apostrophe)   Done
8 &_amp; U+0026 (Ampersand)   Done
8 &-65533; ...Women&-65533;s Acc...   Done
1 &-194;&-160; ...against Vanderbilt.&-194;&-160;.jpg   Done
7 &-194;&-168;   Done
7 &_-8217; U+2019 (Right curly apostrophe)   Done
5 &_quot; U+0022 (Quote Mark)   Done
4 &_-8216; U+2018 (Left curly apostrophe)   Done
4 &-39; U+0027 (Apostrophe)   Done
4 &-8217; U+2019 (Right curly apostrophe)   Done
4 &_1740; U+06CC (Arabic Dot-less Ya)   Done (good catch!)
2 &-201; ...de l'Arriere - &-201;conomisez le...   Done
2 &-259; U+0103 (a with breve)   Done
1 &-x1F; Fran&-x1F;çois...   Done
Those aren't wonky; it's just that # is invalid in MediaWiki file names, while it is fine in Windows. Dispenser (talk) 17:42, 15 October 2017 (UTC)
There seem to be three types of errors: a space/underscore inserted after &, the HTML entity sent as-is (both in decimal and hexadecimal), and for some reason &-226;&-128;&-153;, which is the UTF-8 sequence e2 80 99 for U+2019 (Right curly apostrophe). Dispenser (talk) 03:06, 16 October 2017 (UTC)
Possibly fixed phab:T67297 (deploy, kinda). Most recent occurrences: 2x Oct 8 and 2x Oct 10 by User:Fæ (direct entity variant), 1x Sept 25 by User:Lz jawa using UploadWizard (space inserting), several times in August by User:*angys* using Flickr2Commons (direct hexadecimal entity). Also, User:Pharos in July using GWToolset (direct hexadecimal entity). Dispenser (talk) 03:43, 16 October 2017 (UTC)
New Report: User:Dispenser/HTML entities Dispenser (talk) 18:46, 16 October 2017 (UTC)
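
A rough sketch of how the simpler variants in the table (dash in place of #, stray space or underscore after &, decimal, hexadecimal and named entities) could be turned back into characters when computing a new file name. This is an illustration only, not the code of any existing renamer; the function names and the regular expression are assumptions, and multi-byte cases such as &-226;&-128;&-153; are not handled here.

```python
import html
import re
from html.entities import html5

# "&", optional space/underscore, optional dash standing in for "#",
# then a hex number, a decimal number, or an entity name, then ";".
BROKEN_ENTITY = re.compile(
    r"&[ _]?-?(x[0-9A-Fa-f]+|[0-9]+|[A-Za-z][A-Za-z0-9]*);"
)


def _repair(match):
    body = match.group(1)
    if body[0].isdigit() or re.fullmatch(r"x[0-9A-Fa-f]+", body):
        # Numeric reference: rebuild "&#...;" and let the stdlib decode it.
        return html.unescape("&#" + body + ";")
    # Named reference: only replace names the stdlib table knows,
    # otherwise leave the original text untouched.
    return html5.get(body + ";", match.group(0))


def repair_entities(title):
    """Replace broken entity spellings in a file name with the character."""
    return BROKEN_ENTITY.sub(_repair, title)
```

For example, `repair_entities("(500000 BC &-8211; 40000 BC)")` gives the name with a real en dash. Any result would still need to be checked against MediaWiki's title restrictions before renaming.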

From what I have read, this is a small number of files. No objections to the obvious fixes, or I can eventually fix by tweaking an existing regex based renamer if nobody gets to it. -- (talk) 04:04, 16 October 2017 (UTC)

I think the multi-part entries require a closer look, as there are file names like File:Alexia Chascsa completes &-195;&-162;&-226;&-130;&-172;a"dunker&-195;&-162;&-226;&-130;&-172;&-194;- training at the Helicopter Overwater Survival Training facility during Aviation Spouses Day 130607-A-SM724-320.jpg or File:U.S. Army Staff Sgt. Matthew Parsons, 2nd Battalion, 309th Regiment, an egress-training instructor, climbs a rope obstacle during day two of the 174th&-195;&-162;&-226;&-130;&-172;&-226;&-132;&-162;s 130313-A-IM587-889.jpg. --Achim (talk) 17:02, 16 October 2017 (UTC)
html.unescape(u'&-195;&-162;&-226;&-130;&-172;&-226;&-132;&-162;'.replace('-', '#')).encode('cp1252').decode('utf-8').encode('cp1252').decode('utf-8') comes out as U+2019 (Right curly apostrophe)

If it didn't work the first time do it again ;-) Can't figure out the first one though. —Dispenser (talk) 02:45, 17 October 2017 (UTC)
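
The one-liner above, restated as a small function for readability. It is a sketch of the same idea (restore the entities, then undo the cp1252/UTF-8 mix-up as many times as needed); the function name and the `rounds` parameter are made up for illustration, and it relies on Python's `html.unescape` mapping references such as `&#128;` to their Windows-1252 characters.

```python
import html


def undo_mojibake(text, rounds=1):
    """Decode '&-NNN;' style entities, then reverse `rounds` layers of
    UTF-8 bytes having been misread as Windows-1252."""
    # '&-195;' -> '&#195;' so that the stdlib can decode it.
    text = html.unescape(text.replace("&-", "&#"))
    for _ in range(rounds):
        # Raises UnicodeError if the text is not this kind of mojibake.
        text = text.encode("cp1252").decode("utf-8")
    return text
```

The three-entity sequence &-226;&-128;&-153; needs one round and the six-entity sequence from the second file name needs two; both come out as U+2019. The first file name is not covered, as noted above.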

There is some mojibake too: "Горный Алтай" becomes File:Đ^ldquo,ĐžŃ^euro,Đ˝Ń^lsaquo,Đš Đ^ĐťŃ^sbquo,Đ°Đš - panoramio - Tanya Dedyukhina.jpg, which also has caret encoding. —Dispenser (talk) 16:46, 17 October 2017 (UTC)

Caret encoding

Main discussion: User talk:Dispenser/HTML entities#Caret encoding
Files | Entity | Char | Info
7116 | ^^39, | ' | U+0027 (Apostrophe)
3584 | ^quot, | " | U+0022 (Quote Mark)
1259 | ^amp, | & | U+0026 (Ampersand)
142 | ^rsquo, | ’ | U+2019 (Right curly apostrophe)
118 | ^gt, | > | U+003E (Greater-than sign)
71 | ^pi, | π | U+03C0 (pi symbol)
47 | ^^093, | ] | U+005D (Right Bracket)
46 | ^^091, | [ | U+005B (Left Bracket)
35 | ^sbquo, | ‚ | U+201A (Opening quote mark)
26 | ^lt, | < | U+003C (Less-than sign)
26 | ^ndash, | – | U+2013 (En dash)

This is where a bot converts certain characters into HTML entities, then swaps & ; for ^ , (sometimes stripping the trailing ,). Most come from User:BotMultichillT (446 caret names, fixed in November 2011?) and User:Panoramio upload bot (10,631 caret names, fixed in December 2016?), with the last occurrence in December 2016. There have apparently been code changes, as a newer version of the Panoramio bot created mojibake while earlier versions handled Thai script just fine. —Dispenser (talk) 21:58, 18 October 2017 (UTC)
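
Reversing the caret encoding described above could look roughly like this. It is a sketch under the assumption that "^^NN," stands for "&#NN;" and "^name," for "&name;", with the trailing comma treated as optional; the function names are hypothetical and unknown names are left untouched.

```python
import html
import re
from html.entities import html5

# "^^39," (numeric) or "^quot," (named); the trailing comma may be missing.
CARET_ENTITY = re.compile(r"\^(\^[0-9]+|[A-Za-z][A-Za-z0-9]*),?")


def _decode(match):
    body = match.group(1)
    if body.startswith("^"):
        # Numeric form: "^^39" corresponds to "&#39;".
        return html.unescape("&#" + body[1:] + ";")
    # Named form: replace only names found in the stdlib entity table.
    return html5.get(body + ";", match.group(0))


def decode_caret(title):
    """Turn caret-encoded entities in a file name back into characters."""
    return CARET_ENTITY.sub(_decode, title)
```

Because the comma is optional, a name whose comma was stripped and that runs straight into following letters will not be recognized, so such cases would still need manual review.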

Medeival against Medieval

Sorry, I'm fairly new at Wikimedia Commons, so I'm not sure if it's desirable to report such file name errors. I've found 324 files containing "Medeival" instead of "Medieval" (I can't find the word "Medeival" in any dictionary, so it's probably a misspelling). I've placed the list here: User:Fructibus/Errors#Medeival. If it's considered useful to rename them, then I can use the {{universal replace}} template or whatever form is indicated to make the list. Fructibus (talk) 19:57, 16 October 2017 (UTC)

248 hits 'Medeival'
49 hits 'medeival'
12 hits 'Medievel'
2 hits 'medievel'
--Achim (talk) 18:05, 17 October 2017 (UTC)
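
If these are renamed, the four spellings counted above can be covered by one case-preserving substitution. A minimal sketch for generating the new names only; the function name is made up, and how the renames are then submitted is left open.

```python
import re

# Covers Medeival / medeival / Medievel / medievel, keeping the first
# letter's case; the correct spelling "Medieval" does not match.
MISSPELLING = re.compile(r"([Mm])ed(?:eival|ievel)")


def fix_medieval(name):
    """Return the file name with the misspelled word corrected."""
    return MISSPELLING.sub(r"\1edieval", name)
```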