Notice: If you want to see the Python source code that supports some of my projects, go to GitHub and help yourself. The code is not written with reuse in mind... -- Fæ (talk) 15:57, 15 May 2018 (UTC)

If you are concerned that a category is being flooded with automated uploads, check that a template like {{Disambig}}, {{Photographs}}, {{Categorise}}, {{CatDiffuse}} or {{CatCat}} has been applied before complaining. In the case of my batch upload projects, any category marked this way will not be added to new photographs. -- Fæ (talk) 16:32, 20 September 2018 (UTC)




File:Noah Silliman 2016-11-02 (Unsplash).jpg


  This thread may be parked for a while, but it is in my backlog. If it gets archived, feel free to un-archive it.

@Fæ: While searching for a photo of Frances Carpenter here on Commons, I came across the images in Category:Chevy Chase Club and noticed the TIF files. I was wondering whether it makes sense for these files to be added to the same categories as, for example, the jpg files. Thank you for your time. Also, this being the last day of 2018, allow me to present my very best wishes for 2019! Lotje (talk) 13:32, 31 December 2018 (UTC)

Request & Invitation for Wiki Loves Love 2019 Jury

Hello @Fæ, hope you are doing well! I am the co-ordinator of Wiki Loves Love 2019, an international Commons contest aimed at documenting love in different cultures; the theme for 2019 is festivals, ceremonies and rituals of love. We would be honored to have you on the jury for the contest, which will run from February 1 to February 28, 2019. Please let us know if this is agreeable; our team would be excited and thrilled to have you on board. The jury timeline would run from after the first week of March to the first couple of weeks of April. Hopefully that would be enough time for the jury, and if it is not, we can always extend. We will do the pre-work, and your work would be to select the winning photographs. Happy New Year to you. Wishing you lots of love and happiness. Wikilover90 (talk) 10:12, 14 January 2019 (UTC)

@Wikilover90: Count me in. Happy to help out with the final selection and comfortable taking part in Skype or similar conferencing as part of that process. -- Fæ (talk) 11:46, 14 January 2019 (UTC)
@Fæ: Thank you!! We appreciate it! Lots of Wiki love! Wikilover90 (talk) 12:38, 14 January 2019 (UTC)

State Library of Victoria

[Acknowledging that you're on a Wiki break. This is not urgent, and can await your return/ later availability; hope all is well with you]

This search (for example) lists pictures in the collection of the State Library of Victoria that are out of copyright. Each is available (via the "Available online" link) as a jpg and as a high-res tif. The resultant "Download" pages list them as "Out of Copyright". Could you automate downloading all their OoC artworks, please? They have an API. Category:Paintings in the State Library of Victoria has very few members. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:04, 16 January 2019 (UTC)

Another bump to prevent this being archived prematurely. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:29, 23 March 2019 (UTC)


Recategorized all images in Category:Silverton to Category:Silverton, New South Wales.

Image Deletion

ESA duplicate pictures

Hi Fæ! Thanks a lot for User:Fæ/Project list/ESA! I'm categorizing many pictures from Category:ESA images (review needed) and found some duplicates of pictures previously imported in Category:ESA_files_uploaded_by_Revent. I didn't check all of them but it's likely all of these 254 photos have been duplicated. What should we do? Keep the oldest ones (those imported by Revent) or yours? vip (talk) 15:39, 25 December 2018 (UTC)

@Don-vip: The decision of which to keep should be based on which image is of better quality (presumably these will all be digitally identical) and which has the better description. There's no issue with deleting my uploads where they are duplicates. I am still puzzled as to exactly how these are created, as I do automatic duplicate checks before upload. -- Fæ (talk) 09:19, 27 February 2019 (UTC)

@Don-vip: I have started an initial test generating a local (i.e. off-wiki) JSON database of image hashes for all ESA uploads. There are 2,689 subcategories, and I do not yet know how many unique files that includes, so it will take many hours or days to generate the database, but once it exists, testing new files will be quick. Depending on its eventual size, I may have to rewrite my code if it becomes impossible to hold the results in an in-memory array. These hashes can then be referenced before upload to see whether a version of the same photograph already exists on Commons. A simple-minded check, so that a version with the same extension that is not of higher resolution will never be uploaded, makes sense for the vast majority of uploads we are interested in hosting from the ESA. This would allow a TIFF to be uploaded which is identical to an existing jpeg, but not another jpeg.
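The upload rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code actually used for the ESA uploads: the record layout (`name`, `ext`, `pixels`), the function names and the example file name are all hypothetical, and extensions are assumed to be normalised already (for example "jpeg" stored as "jpg").

```python
import json

def add_record(database, name, hex_hash, ext, pixels):
    """Store one already-hosted file under its image hash (hypothetical layout)."""
    database.setdefault(hex_hash, []).append(
        {"name": name, "ext": ext, "pixels": pixels}
    )

def should_upload(database, hex_hash, ext, pixels):
    """Skip a candidate only when a version with the same extension and at
    least as many pixels is already recorded under the same image hash."""
    for existing in database.get(hex_hash, []):
        if existing["ext"] == ext and pixels <= existing["pixels"]:
            return False
    return True

database = {}
add_record(database, "Hypothetical ESA image.jpg", "2001191209594c68", "jpg", 1920 * 1080)

# The database is plain dicts and lists, so it survives a JSON round trip.
database = json.loads(json.dumps(database))

print(should_upload(database, "2001191209594c68", "jpg", 1920 * 1080))  # the same jpeg again
print(should_upload(database, "2001191209594c68", "tif", 1920 * 1080))  # a TIFF of the same picture
print(should_upload(database, "2001191209594c68", "jpg", 3840 * 2160))  # a higher-resolution jpeg
```

Keying the dictionary by hash keeps each lookup cheap while everything fits in memory; a database too large for that would need a different store, which this sketch does not attempt.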

Here is a typical example of what the image hashes can discover and could prevent: these two photographs were not previously linked or marked as duplicates. They are identical, with a hex image hash of '2001191209594c68'; the difference is probably due to the fact that one has EXIF data and the other does not. There may be other 'hidden' differences that appear to cause one thumbnail to look blurred in the gallery preview (at least in Chrome):

Here's a counterintuitive gallery created from 4 existing uploads which match the hex image hash "4070f8ecece9f162" and are 'virtually' identical because they were taken by a satellite only seconds apart. In the envisioned new upload process, only the first would get uploaded:

I was tempted by the idea of adding the image hashes to the Commons image pages, either as an infobox parameter or as hidden metadata, but I am not sure that anyone else would ever use them. Even my specific way of generating the thumbnail to be hashed, and then which type of image hash to apply, is not an agreed 'standard'.

Note that the SHA1 hash already available through the Commons API shows which files are 100% digitally identical. The image hash is effectively 'looking' at the image and shows up 'virtually' identical images, even if they have different resolutions or are hosted in different formats like jpeg/TIFF/png. -- Fæ (talk) 09:31, 28 March 2019 (UTC)
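To illustrate the distinction, here is a minimal standard-library sketch. It assumes a difference hash computed over an 8-row by 9-column grayscale grid, with a bit set when the right-hand pixel is brighter than its left neighbour. That convention, the function names and the synthetic pixel values are assumptions for illustration, not the thumbnailing or hashing method actually used here. A uniformly brightened copy stands in for a re-encoded version of the same picture.

```python
import hashlib

def dhash_hex(grid):
    """Difference hash of an 8x9 grayscale grid, returned as 16 hex digits."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return f"{bits:016x}"

def sha1_hex(grid):
    """SHA1 of the raw pixel bytes, standing in for the SHA1 of a whole file."""
    return hashlib.sha1(bytes(v for row in grid for v in row)).hexdigest()

# Synthetic 8x9 "thumbnail" with values between 0 and 100.
grid = [[(x * 37 + y * 11) % 101 for x in range(9)] for y in range(8)]
# The same picture with every pixel brightened by 20: different bytes, same shape.
brighter = [[v + 20 for v in row] for row in grid]

print(sha1_hex(grid) == sha1_hex(brighter))    # byte-level comparison
print(dhash_hex(grid) == dhash_hex(brighter))  # perceptual comparison
```

The SHA1 values differ because a single changed byte changes the digest, while the difference hash only records whether each pixel is brighter than its neighbour and so is unaffected by the uniform shift.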

By the way, these hashes are being generated on my 8-year-old second-hand laptop, which is now my main machine. These types of experiment are at the limit of what I can do at home, and I have to push the job down to the lowest processing priority to avoid overheating. I could do some of this on labs, but it's not such a good environment for playing around with early testing. If it was less of a hassle, I could shape a project like this into a grant request, and maybe upgrade some of my (literally) dying kit. But, it's a hassle, and the sun is shining today. -- Fæ (talk) 11:10, 28 March 2019 (UTC)

@Fæ: Great experiment! It would be useful for all imports, not only ESA. Has the Foundation never considered adding it as an official Commons feature? vip (talk) 22:33, 29 March 2019 (UTC)

You can find more background and experiments at User:Fæ/Imagehash.
There was a discussion on the VP at the time, and there was a Phabricator ticket (linked on the right), but the idea of the WMF picking this up was effectively abandoned.
No doubt I could get some funding and run after this as a potential Commons improvement, especially for tracking copyright violations, but my feelings are about the same as they were in 2017: I'm not sure I want to vanish down this rabbit hole. With a strategic eye, it may be that recent changes in copyright law, putting more onus on 'hosts' to track copyright violations, will force WMF development to spend some time on media file matching solutions at the time of upload, beyond SHA1 matching. Leaving this one on pause for now, apart from the odd interesting experiment like this one... -- Fæ (talk) 11:51, 30 March 2019 (UTC)
Among the new uploads, an animated GIF

After playing around a bit more with this today, I will try a new upload run. Not only are imagehash matches checked, but near matches are checked within a difference of 2 (which means very close matches). An example of a match stopping an upload is quite interesting, as it highlights duplicates at the ESA source, even though they have different ID numbers and descriptions:
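A near-match test of this kind can be sketched as follows, assuming that "a difference of 2" means a Hamming distance of at most 2 bits between two 64-bit hashes written as hex strings. That reading, and the function names, are assumptions rather than a description of the actual upload script.

```python
def hash_distance(a, b):
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")

def is_near_match(a, b, max_distance=2):
    """True when the hashes differ by no more than max_distance bits."""
    return hash_distance(a, b) <= max_distance

known = "4070f8ecece9f162"
print(hash_distance(known, "4070f8ecece9f163"))  # last hex digit differs
print(is_near_match(known, "4070f8ecece9f163"))
print(is_near_match(known, "2001191209594c68"))  # an unrelated hash
```

A linear scan like this is fine for a few tens of thousands of hashes; anything much larger would probably want an index designed for Hamming-distance lookups.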

On the other hand, here is an example of an exact difference hash match which in truth is not an exact match, as the same data is used to generate a different colour image. Potentially I could use a different imagehash to account for colour, but I have no plan to do this at the moment:

Large TIFF duplicates will not be tested, mainly because of the limits of running this on my laptop. However, these seem less likely than png, gif and jpeg duplicates. For reference, the source image is loaded into RAM, resized to 80px wide using 4 different methods, and then 4 hashes are generated, often but not always giving the same hash. This added complexity is an attempt to match how Commons does its thumbnail generation, without exactly understanding the detailed methods. Clearly, as TIFFs might be >200MB, it is unrealistic to handle those the same way.

There is potential to retrospectively go back through the past ~36,000 (unique) uploads from ESA and mark duplicates and matches with different filetypes, but that will be a separate exercise. -- Fæ (talk) 12:11, 3 April 2019 (UTC)

Revisiting the image hash database, I have started to tease out duplicates in more detail and examine 'outliers'. One amusing discovery was to find 30 ESA images with image hashes of zero, definitely a "what?" moment:

Then, perhaps more funky, is the discovery of 29 ESA images with a precise image (difference) hash of f0f0f0f0f0f0f0f0, presumably because the data for most of these was computer enhanced to have all pixels perfectly "balanced".
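Outliers like these two can be picked out of a hash database mechanically. The sketch below assumes each pair of hex digits in a 64-bit hash encodes one row of an 8x8 comparison grid, so a hash made of eight identical bytes means every row produced the same bit pattern. The heuristic, the function names and the example file names are hypothetical, and this is not necessarily how the images above were found.

```python
from collections import defaultdict

def is_degenerate(hex_hash):
    """True when all eight bytes of a 64-bit hex hash are identical."""
    return len(set(bytes.fromhex(hex_hash))) == 1

def group_outliers(records):
    """Group file names by hash, keeping only the degenerate hashes."""
    groups = defaultdict(list)
    for name, hex_hash in records:
        if is_degenerate(hex_hash):
            groups[hex_hash].append(name)
    return dict(groups)

records = [
    ("Hypothetical flat frame.png", "0000000000000000"),
    ("Hypothetical enhanced map.jpg", "f0f0f0f0f0f0f0f0"),
    ("Hypothetical satellite view.jpg", "4070f8ecece9f162"),
]
outliers = group_outliers(records)
for hex_hash, names in outliers.items():
    print(hex_hash, names)
```

Files flagged this way are better reviewed by eye than treated as duplicates, since very different images can share such a hash.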

Eventually, with a bit more debugging on real/not real duplicates, I'll probably be creating a "duplicates" category and adding cross-referencing galleries for (literal) non-duplicates with matching & near match hashes in the other versions parameter.

Before someone else picks this up, some of the images are not ESA. They get included because they are inherited in the automatic searching through all subcategories of the top ESA category. Unfortunately that's down to the not-necessarily logical way that Commons lets human volunteers make subjective choices about category hierarchies. -- Fæ (talk) 12:30, 6 April 2019 (UTC)

File:Internet Archive book plate project, blank example 1.jpg

File:Sullivan, daniel.jpg

File:Apollo 11 Astronaut Michael Collins Prepares for Weightless Conditions DVIDS854571.jpg

Alarming thing from the EU

This certainly bears watching. I side fully with the Internet Archive on this one. Hopefully this madness won't affect Commons in the future. Abzeronow (talk) 20:00, 11 April 2019 (UTC)

Wanted Category with meaningless name

Hello. Please deal with this category. ديفيد عادل وهبة خليل 2 (talk) 12:51, 12 April 2019 (UTC)

File tagging File:Virtual Worlds- Promise and Perils (1746165368).jpg


It is helping young students learn that the cockpit is not really that complicated ~ and all those fancy buttons and such do have a purpose and are very easy to learn ~ Mitchellhobbs (talk) 21:25, 13 April 2019 (UTC)

— Preceding unsigned comment added by Mitchellhobbs (talk • contribs) 21:30, 13 April 2019 (UTC)

Notification about possible deletion

Source of derivative work is not properly indicated: File:George Siemens, University of Manitoba Learning Technologies Center, (1634386213).jpg

File:'I am the power.' Rudi, a regular character on the streets of Harrogate, wanted me to take his picture looking at the sun. 'Can you see the light in my eyes?' His views on existentialism are quite (4002194411).jpg

File:"There is a Song About The Indie Scene - " (25594237500).jpg

File:Benjamin back at school today-1 (24545404072).jpg

Notification about possible deletion
