This thread may be parked for a while, but it is in my backlog. If it gets archived, feel free to un-archive it.
@Fæ: While searching for a photo of Frances Carpenter here on Commons, I came across the images in Category:Chevy Chase Club and noticed the TIF files. I was wondering whether it makes sense to add these files to the same categories as, for example, the jpg files. Thank you for your time. Also, this being the last day of 2018, allow me to present my very best wishes for 2019! Lotje (talk) 13:32, 31 December 2018 (UTC)
Request & Invitation for Wiki Loves Love 2019 Jury
Hello @Fæ:, I hope you are doing well! I am the coordinator of Wiki Loves Love 2019, an international Commons contest aimed at documenting love in different cultures; the theme for 2019 is festivals, ceremonies and rituals of love. We would be honored to have you on the jury for the contest, which will run from 1 to 28 February 2019. Please let us know if this is agreeable; our team would be excited and thrilled to have you on board. The jury period would run from after the first week of March to the first couple of weeks of April. Hopefully that will be enough time for the jury, and if it is not, we can always extend it. We will do the preliminary work, and your task would be to select the winning photographs. Happy New Year to you. Wishing you lots of love and happiness. Wikilover90 (talk) 10:12, 14 January 2019 (UTC)
State Library of Victoria
[Acknowledging that you're on a Wiki break. This is not urgent, and can await your return/ later availability; hope all is well with you]
This search (for example) lists pictures in the collection of State Library of Victoria that are out of copyright. Each is available (via the "Available online" link) as a jpg, and as a high-res tif. The resulting "Download" pages list them as "Out of Copyright". Could you automate downloading all their OoC (out of copyright) artworks, please? They have an API. Category:Paintings in the State Library of Victoria has very few members. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:04, 16 January 2019 (UTC)
Recategorized all images in Category:Silverton to Category:Silverton, New South Wales.
ESA duplicate pictures
Hi Fæ! Thanks a lot for User:Fæ/Project list/ESA! I'm categorizing many pictures from Category:ESA images (review needed) and found some duplicates of pictures previously imported in Category:ESA_files_uploaded_by_Revent. I didn't check all of them, but it's likely all of these 254 photos have been duplicated. What should we do? Keep the oldest ones (those imported by Revent) or yours? vip (talk) 15:39, 25 December 2018 (UTC)
- @Don-vip: The decision of which to keep should be based on which image is of better quality (presumably these will all be digitally identical) and which has the better description. There's no issue with deleting my uploads where they are duplicates. I am still puzzled as to exactly how these were created, as I do automatic duplicate checks before upload. --Fæ (talk) 09:19, 27 February 2019 (UTC)
@Don-vip: I have started an initial test generating a local (i.e. off-wiki) JSON database of image hashes for all ESA uploads. There are 2,689 subcategories, and I do not yet know how many unique files they include, so it will take many hours or days to generate the database, but once it is generated, testing new files will be quick. Depending on its eventual size, I may have to rewrite my code if it becomes impossible to hold the results in a memory array. These hashes can then be referenced to see whether a version of the same photograph already exists on Commons before upload. A simple-minded check, ensuring that a version with the same extension and no higher resolution is never uploaded, makes sense for the vast majority of uploads we are interested in hosting from the ESA. This would still allow a TIFF that is identical to an existing jpeg to be uploaded, but not another jpeg.
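The upload rule above can be sketched as a small Python example. This is a minimal sketch, not the actual upload code: it assumes each stored record holds a filename, a hex image hash, an extension and a pixel width (used as a stand-in for resolution), and that "jpeg"/"jpg" and "tif"/"tiff" count as the same format. `HashRecord`, `should_upload` and the other names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HashRecord:
    """One file, as it might be stored in a local JSON hash database."""
    filename: str
    image_hash: str  # 16-digit hex difference hash, e.g. "2001191209594c68"
    extension: str   # e.g. "jpg", "tiff"
    width: int       # pixel width, used here as a stand-in for resolution


def dump_database(records):
    """Serialise records to JSON text, keyed by hash for quick lookups."""
    by_hash = {}
    for rec in records:
        by_hash.setdefault(rec.image_hash, []).append(asdict(rec))
    return json.dumps(by_hash, indent=1)


def load_database(text):
    """Rebuild the {hash: [HashRecord, ...]} mapping from JSON text."""
    raw = json.loads(text)
    return {h: [HashRecord(**r) for r in recs] for h, recs in raw.items()}


def normalise_ext(ext):
    # Assumption: "jpeg"/"jpg" and "tif"/"tiff" count as the same format.
    ext = ext.lower().lstrip(".")
    return {"jpeg": "jpg", "tif": "tiff"}.get(ext, ext)


def should_upload(candidate, database):
    """Refuse a candidate only if a file with the same hash and the same
    format already exists at equal or higher width; otherwise allow it."""
    for existing in database.get(candidate.image_hash, []):
        same_format = normalise_ext(existing.extension) == normalise_ext(candidate.extension)
        if same_format and candidate.width <= existing.width:
            return False
    return True


# Example with a hypothetical already-hosted JPEG, 1000 px wide.
db = load_database(dump_database([HashRecord("A.jpg", "2001191209594c68", "jpg", 1000)]))
print(should_upload(HashRecord("B.jpeg", "2001191209594c68", "jpeg", 800), db))  # False
print(should_upload(HashRecord("B.tif", "2001191209594c68", "tif", 800), db))    # True
```

The JSON text can simply be written to a local file. If the database outgrows memory, a disk-backed store such as SQLite, indexed on the hash column, would be one option.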
Here is a typical example of what the image hashes can discover, and could prevent; these two photographs were not previously linked or marked as duplicates. They are identical, sharing the hex image hash '2001191209594c68'; the difference between the files probably comes from one having EXIF data and the other not. There may be other 'hidden' differences that cause one thumbnail to look blurred in the gallery preview (at least in Chrome):
Here's a counter-intuitive gallery created from 4 existing uploads that match the hex image hash "4070f8ecece9f162" and are 'virtually' identical because they were taken by a satellite only seconds apart. In the envisioned new upload process, only the first would get uploaded:
I was tempted by the idea of adding the image hashes to the Commons image pages, either as an infobox parameter or as hidden metadata, but I am not sure that anyone else would ever use them. Even my specific way of generating the thumbnail to be hashed, and my choice of which type of image hash to apply, are not an agreed 'standard'.
Note that the SHA1 hash already available through the Commons API shows which files are 100% digitally identical. The image hash, by contrast, effectively 'looks' at the image and shows up 'virtually' identical images, even if they have different resolutions or are hosted in different formats such as jpeg/TIFF/png. --Fæ (talk) 09:31, 28 March 2019 (UTC)
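The contrast between the two kinds of hash can be sketched in Python. This is a minimal sketch, assuming the image has already been decoded and shrunk to an 8-row by 9-column grayscale grid (in practice a library such as Pillow, or the imagehash package's own `dhash`, would do that step); the bit convention mirrors, as far as I understand it, imagehash's `dhash`. The `list=allimages` query with the `aisha1` parameter is a standard MediaWiki API lookup for files with a given SHA1. The byte strings are made up for illustration and are not real image files.

```python
import hashlib
from urllib.parse import urlencode


def dhash_hex(grid):
    """Difference hash of a decoded grayscale grid (8 rows of 9 pixels).

    Each bit records whether a pixel is brighter than its left-hand
    neighbour; bits are read row by row, first bit most significant.
    """
    if len(grid) != 8 or any(len(row) != 9 for row in grid):
        raise ValueError("expected 8 rows of 9 grayscale values")
    value = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | int(right > left)
    return f"{value:016x}"


def sha1_hex(file_bytes):
    """Byte-level fingerprint: any change to the file, even metadata, alters it."""
    return hashlib.sha1(file_bytes).hexdigest()


def commons_sha1_query(sha1):
    """MediaWiki API URL listing Commons files whose content has this SHA1."""
    params = {"action": "query", "list": "allimages", "aisha1": sha1, "format": "json"}
    return "https://commons.wikimedia.org/w/api.php?" + urlencode(params)


# Hypothetical pixels and "files": the second differs only by a fake metadata prefix.
pixels = [[(r * 9 + c) % 7 * 30 for c in range(9)] for r in range(8)]
plain_file = bytes(v for row in pixels for v in row)
tagged_file = b"EXIF" + plain_file
print(sha1_hex(plain_file) == sha1_hex(tagged_file))  # False
# Both "files" decode to the same pixels, so they share one image hash.
print(dhash_hex(pixels))
```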
By the way, these hashes are being generated on my 8-year-old second-hand laptop, which is now my main machine. This type of experiment is at the limit of what I can do at home, and I have to push it down to the lowest processing priority to avoid overheating. I could do some of this on Labs, but it's not such a good environment for playing around with early testing. If it were less of a hassle, I could shape a project like this into a grant request, and maybe upgrade some of my (literally) dying kit. But it's a hassle, and the sun is shining today. --Fæ (talk) 11:10, 28 March 2019 (UTC)
@Fæ: Great experiment! It would be useful for all imports, not only ESA ones. Has the Foundation never considered adding it as an official Commons feature? vip (talk) 22:33, 29 March 2019 (UTC)
- You can find more background and experiments at User:Fæ/Imagehash.
- There was a discussion on the VP at the time, and there was a Phabricator ticket (linked on the right), but the idea of the WMF picking this up was effectively abandoned.
- No doubt I could get some funding and run after this as a potential Commons improvement, especially for tracking copyright violations, but my feelings are about the same as they were in 2017: I'm not sure I want to vanish down this rabbit hole. With a strategic eye, it may be that recent changes in copyright law, putting more onus on 'hosts' to track copyright violations, will force WMF development to spend some time on media file matching solutions at the time of upload, beyond SHA1 matching. Leaving this one on pause for now, apart from the odd interesting experiment like this one... --Fæ (talk) 11:51, 30 March 2019 (UTC)
After playing around a bit more with this today, I will try a new upload run. Not only are exact imagehash matches checked, but so are near matches within a difference of 2 (which means very close matches). An example of a match stopping an upload is quite interesting, as it highlights duplicates at the ESA source, even though they have different ID numbers and descriptions:
- Attempting upload of "Ground tracking stations ESA418974.jpg", Source: https://www.esa.int/spaceinimages/Images/2019/03/7_Ground_tracking_stations
- Near match to existing File:New Norcia station ESA300788.jpg, Source: http://www.esa.int/spaceinimages/Images/2013/12/New_Norcia_station
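Assuming "difference" here means the number of differing bits between two 64-bit hashes (the Hamming distance, which is what subtracting two `ImageHash` objects returns in the imagehash library), the match/near-match decision can be sketched with plain integers. The threshold of 2 comes from the text above; the function names are hypothetical.

```python
def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two equal-length hex hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def classify(candidate_hash, existing_hashes, max_distance=2):
    """Return ("match" | "near match" | "new", closest existing hash or None)."""
    best, best_distance = None, None
    for existing in existing_hashes:
        d = hamming_distance(candidate_hash, existing)
        if best_distance is None or d < best_distance:
            best, best_distance = existing, d
    if best_distance == 0:
        return "match", best
    if best_distance is not None and best_distance <= max_distance:
        return "near match", best
    return "new", None


print(classify("4070f8ecece9f162", ["4070f8ecece9f162"]))  # ('match', '4070f8ecece9f162')
print(classify("0000000000000003", ["0000000000000000"]))  # ('near match', '0000000000000000')
```

A linear scan is fine for a sketch; with tens of thousands of stored hashes, exact matches would more sensibly be found with a dictionary lookup before any near-match scan.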
On the other hand, here is an example of an exact difference hash match that in truth is not an exact match, as the same data has been used to generate a differently coloured image. Potentially I could use a different imagehash to account for colour, but I have no plans to do this at the moment:
- Attempting upload of "Receding waters ESA418751.jpg", Source: http://www.esa.int/var/esa/storage/images/esa_multimedia/images/2019/03/receding_waters/19317074-1-eng-GB/Receding_waters.jpg
- Match to existing File:Cyclone Idai floods near Beira 2019-3-19 by ESA Copernicus Sentinel-1.jpg
- However, the related TIFF does get uploaded: File:Receding waters ESA418751.tiff
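A difference hash is computed on a grayscale version of the image, so two renderings of the same data in different colours can hash identically. One hypothetical way to make the hash colour-aware is to hash each RGB channel separately and concatenate the results; this is only an illustration, not what the upload run does (the imagehash library also offers a separate `colorhash` function). The grayscale step uses the ITU-R 601-2 luma weights that Pillow documents for mode "L" conversion, with integer rounding that may differ slightly from Pillow's.

```python
def dhash_bits(channel):
    """64-bit difference hash of one 8-row x 9-column channel: a bit is set
    when a pixel is brighter than its left-hand neighbour."""
    value = 0
    for row in channel:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | int(right > left)
    return value


def luma(r, g, b):
    # ITU-R 601-2 luma weights (integer arithmetic, so rounding may differ from Pillow).
    return (r * 299 + g * 587 + b * 114) // 1000


def gray_dhash(rgb):
    """Ordinary difference hash: colour is discarded before hashing."""
    gray = [[luma(*px) for px in row] for row in rgb]
    return f"{dhash_bits(gray):016x}"


def colour_dhash(rgb):
    """Hypothetical colour-aware variant: one hash per R, G, B channel,
    concatenated into 48 hex digits."""
    return "".join(
        f"{dhash_bits([[px[i] for px in row] for row in rgb]):016x}" for i in range(3)
    )


# Hypothetical 8x9 images: the same left-to-right ramp rendered in red or in green.
red_ramp = [[(c * 25, 0, 0) for c in range(9)] for _ in range(8)]
green_ramp = [[(0, c * 25, 0) for c in range(9)] for _ in range(8)]
print(gray_dhash(red_ramp) == gray_dhash(green_ramp))      # True: grayscale hides the colour
print(colour_dhash(red_ramp) == colour_dhash(green_ramp))  # False
```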
Large TIFF duplicates will not be tested, mainly because of the limits of running this on my laptop*. However, these seem less likely than png, gif and jpeg duplicates. * The source image is loaded into RAM, resized to 80px wide using 4 different methods, and then 4 hashes are generated, often but not always giving the same hash. This added complexity attempts to match how Commons generates its thumbnails, without my understanding its detailed methods exactly. Clearly, as TIFFs might be >200MB, it is unrealistic to handle those the same way.
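The footnote's idea, resizing with several methods and keeping every resulting hash, can be sketched as follows. This is a simplified, stdlib-only illustration: it shrinks a grayscale grid straight to 9x8 with two hypothetical resampling methods (nearest-neighbour and box averaging) rather than to an 80px thumbnail with four methods, and the 50 MiB TIFF cap is an assumed value, not the limit used in the actual run.

```python
MAX_HASHABLE_TIFF_BYTES = 50 * 1024 * 1024  # assumed cap, not the value used in the real run


def should_hash(extension, size_bytes):
    """Skip large TIFFs, which would have to be decoded fully into RAM."""
    is_tiff = extension.lower().lstrip(".") in ("tif", "tiff")
    return not (is_tiff and size_bytes > MAX_HASHABLE_TIFF_BYTES)


def nearest(grid, out_w, out_h):
    """Nearest-neighbour downsampling of a 2-D grayscale grid."""
    h, w = len(grid), len(grid[0])
    return [[grid[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def box_average(grid, out_w, out_h):
    """Downsample by averaging each source block that maps to an output pixel."""
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        y0 = y * h // out_h
        y1 = max((y + 1) * h // out_h, y0 + 1)
        row = []
        for x in range(out_w):
            x0 = x * w // out_w
            x1 = max((x + 1) * w // out_w, x0 + 1)
            block = [grid[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash_hex(grid):
    """64-bit difference hash of a 9-wide, 8-high grid, as 16 hex digits."""
    value = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | int(right > left)
    return f"{value:016x}"


def candidate_hashes(grid):
    """One hash per resampling method; different methods can disagree."""
    return {dhash_hex(method(grid, 9, 8)) for method in (nearest, box_average)}
```

A new file would then count as a match if any hash in its set matches, or nearly matches, a stored hash.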
There is potential to retrospectively go back through the past ~36,000 (unique) uploads from ESA and mark duplicates and matches with different filetypes, but that will be a separate exercise. --Fæ (talk) 12:11, 3 April 2019 (UTC)
Revisiting the image hash database, I have started to tease out duplicates in more detail and examine 'outliers'. One amusing discovery was to find 30 ESA images with image hashes of zero, definitely a "what?" moment:
Then, perhaps funkier, came the discovery of 29 ESA images with a precise image (difference) hash of f0f0f0f0f0f0f0f0, presumably because the data for most of these was computer enhanced to have all pixels perfectly "balanced".
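Both outliers follow from how a difference hash is built: each bit says whether a pixel is brighter than its left-hand neighbour in the shrunken 9x8 grid. A hash of zero means no pixel is brighter than the one to its left (a blank frame, for example), and f0f0f0f0f0f0f0f0 means every row brightens for four steps and then stops brightening. The grids below are hypothetical illustrations of those two patterns, not the actual ESA images, and the bit order is assumed to follow the imagehash `dhash` convention.

```python
def dhash_hex(grid):
    """Difference hash of an 8-row x 9-column grid: a bit is set when a pixel
    is brighter than its left-hand neighbour, first bit most significant."""
    value = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | int(right > left)
    return f"{value:016x}"


# A blank (uniform) image: no pixel is brighter than its left neighbour.
blank = [[128] * 9 for _ in range(8)]

# Every row rises for four steps, then falls: bits 11110000, i.e. "f0" per row.
peaked_row = [0, 10, 20, 30, 40, 30, 20, 10, 0]
peaked = [list(peaked_row) for _ in range(8)]

print(dhash_hex(blank))   # 0000000000000000
print(dhash_hex(peaked))  # f0f0f0f0f0f0f0f0
```

Any image that only darkens, or stays flat, from left to right in every row would also hash to zero.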
Eventually, with a bit more debugging on real/not real duplicates, I'll probably create a "duplicates" category and add cross-referencing galleries, in the other versions parameter, for (literal) non-duplicates with matching and near-match hashes.
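Building a "duplicates" category or an other-versions gallery amounts to clustering files by hash. A minimal sketch, assuming a mapping of filename to hex hash and the same 2-bit near-match threshold: a union-find merges any pair within the threshold. Two caveats: matches chain transitively (A near B and B near C puts all three together even if A and C are further apart), and the pairwise loop is quadratic, so for ~36,000 files it would probably be better to group exact matches with a dictionary first. Filenames and function names are hypothetical.

```python
from itertools import combinations


def hamming(hash_a, hash_b):
    """Number of differing bits between two hex hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def cluster_files(hashes, max_distance=2):
    """Group filenames whose hashes lie within max_distance bits of each other.

    hashes: dict mapping filename -> hex hash.
    Returns sorted groups with more than one member.
    """
    parent = {name: name for name in hashes}

    def find(name):
        # Follow parent links to the group's representative, halving the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Pairwise comparison: O(n^2), acceptable only for modest numbers of files.
    for a, b in combinations(hashes, 2):
        if hamming(hashes[a], hashes[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical filenames; the first hash is one quoted in the discussion.
EXAMPLE = {
    "a.jpg": "4070f8ecece9f162",
    "b.jpg": "4070f8ecece9f162",
    "c.tiff": "4070f8ecece9f163",  # one bit away from a.jpg and b.jpg
    "d.jpg": "0000000000000000",
    "e.png": "ffffffffffffffff",
}
print(cluster_files(EXAMPLE))                  # [['a.jpg', 'b.jpg', 'c.tiff']]
print(cluster_files(EXAMPLE, max_distance=0))  # [['a.jpg', 'b.jpg']]
```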
Before someone else picks up on this: some of the images are not ESA images. They get included because they are inherited in the automatic search through all subcategories of the top ESA category. Unfortunately, that's down to the not-necessarily-logical way that Commons lets human volunteers make subjective choices about category hierarchies. --Fæ (talk) 12:30, 6 April 2019 (UTC)
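Why non-ESA files creep in can be seen with a small breadth-first walk over a category graph. The graph below is a hypothetical example; Commons category graphs can also contain cycles, hence the visited check. In practice `get_subcategories` might wrap the MediaWiki API's `list=categorymembers` with `cmtype=subcat`. A depth limit is one simple, if blunt, way to keep the search closer to the top category.

```python
from collections import deque


def walk_categories(root, get_subcategories, max_depth=None):
    """Breadth-first walk of a category graph, tolerating cycles.

    Returns {category: depth}, where depth is the shortest path from root.
    Categories at max_depth are recorded but not expanded further.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        if max_depth is not None and depths[cat] >= max_depth:
            continue
        for sub in get_subcategories(cat):
            if sub not in depths:  # skip categories already seen (cycles, multiple parents)
                depths[sub] = depths[cat] + 1
                queue.append(sub)
    return depths


# Hypothetical category graph: "Spacecraft" pulls in a non-ESA branch,
# and also links back to the top category, forming a cycle.
EXAMPLE_TREE = {
    "ESA": ["ESA images", "Spacecraft"],
    "ESA images": ["ESA images (review needed)"],
    "Spacecraft": ["NASA spacecraft", "ESA"],
}

everything = walk_categories("ESA", lambda c: EXAMPLE_TREE.get(c, []))
print(sorted(everything))
# ['ESA', 'ESA images', 'ESA images (review needed)', 'NASA spacecraft', 'Spacecraft']
print(walk_categories("ESA", lambda c: EXAMPLE_TREE.get(c, []), max_depth=1))
# {'ESA': 0, 'ESA images': 1, 'Spacecraft': 1}
```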
Commons:Deletion requests/File:Sullivan, daniel.jpg Taivo (talk) 18:04, 9 April 2019 (UTC)
Alarming thing from the EU
This certainly bears watching. I side fully with the Internet Archive on this one. Hopefully this madness won't affect Commons in the future.
https://boingboing.net/2019/04/11/one-hour-service.html Abzeronow (talk) 20:00, 11 April 2019 (UTC)
Wanted Category with meaningless name
It is helping young students learn that the cockpit is not really that complicated, and that all those fancy buttons and such do have a purpose and are very easy to learn. Mitchellhobbs (talk) 21:25, 13 April 2019 (UTC)
Notification about possible deletion
Source of derivative work is not properly indicated: File:George Siemens, University of Manitoba Learning Technologies Center, (1634386213).jpg
File:'I am the power.' Rudi, a regular character on the streets of Harrogate, wanted me to take his picture looking at the sun. 'Can you see the light in my eyes?' His views on existentialism are quite (4002194411).jpg
Commons:Deletion requests/File:'I am the power.' Rudi, a regular character on the streets of Harrogate, wanted me to take his picture looking at the sun. 'Can you see the light in my eyes?' His views on existentialism are quite (4002194411).jpg E4024 (talk) 22:32, 14 April 2019 (UTC)
Commons:Deletion requests/File:"There is a Song About The Indie Scene - " (25594237500).jpg E4024 (talk) 22:34, 14 April 2019 (UTC)
Commons:Deletion requests/File:Benjamin back at school today-1 (24545404072).jpg Mjrmtg (talk) 11:58, 18 April 2019 (UTC)
Notification about possible deletion