This semi-automated task works through the members of Extracted images (over 140,000 images). Where a crop has a lower resolution than a new crop made from the linked parent image, and it is suitable to do so, the new extracted image is uploaded, overwriting the original crop. Unnecessarily small crops may exist because of imports from Wikipedia, or because the crop tool was used on a version of the parent image before the parent was upscaled, for example by a Flickrbot upload or by volunteers upgrading the file. The task remains a manual choice because of the number of crops that have been significantly edited off-wiki, such as with manual background removal or colour and saturation fixes.
* Improved to use the height and width measured between the mid-points of opposite sides, so that both measurements cut through the quadrilateral's centroid.
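The mid-point measurement can be sketched as follows. This is a minimal illustration only, not the task's actual code: the function names `midpoint` and `quad_size` and the corner ordering (top-left, top-right, bottom-right, bottom-left) are assumptions made for the example.

```python
from math import hypot


def midpoint(p, q):
    """Return the mid-point of the segment joining points p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def quad_size(tl, tr, br, bl):
    """Estimate (width, height) of a quadrilateral from its side mid-points.

    Width is the distance between the mid-points of the left and right
    sides; height is the distance between the mid-points of the top and
    bottom sides. Both segments pass through the centroid of the corners.
    """
    left, right = midpoint(tl, bl), midpoint(tr, br)
    top, bottom = midpoint(tl, tr), midpoint(bl, br)
    width = hypot(right[0] - left[0], right[1] - left[1])
    height = hypot(bottom[0] - top[0], bottom[1] - top[1])
    return width, height
```

For a trapezoid such as `quad_size((1, 0), (3, 0), (4, 2), (0, 2))` this gives a width between the short top edge and the long bottom edge, rather than favouring either one.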
The technology for identifying the crop was tested for LOC#Adding and updating other version galleries. A key improvement is the addition of a perspective transform, since the original crop from the parent image may have been edited by the user to adjust for perspective or another type of irregular transformation. Perspective transforms are highly useful with crops from larger panoramas, such as to change the focus to a single building where, without a transform, the result would be oddly slanted. The automated process attempts to repeat the same mapping to recreate the crop at a larger size.
Image rotations, or mapping an irregular quadrilateral crop to a rectangle using a homography transform, unavoidably introduce flaws in the result, and conservative choices for JPEG compression mean that the end result may be significantly larger in file size than the starting crop, or even the parent image. The code uses a perspective mapping with a bicubic filter, both of which are available in the standard Python imaging library PIL (Image.transform). The crop is identified using OpenCV and its FlannBasedMatcher class, which may take minutes to process a potential upload, depending on the resolution and visual nature of the images.
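As an illustration of the perspective mapping step, the sketch below computes the eight coefficients that PIL's `Image.PERSPECTIVE` method expects and applies them with a bicubic filter. It is a hedged example rather than the task's own code: the helper names `perspective_coeffs` and `recrop`, the corner ordering, and the use of a plain Gauss-Jordan solver are all assumptions, and the four corners are taken as already found by the matching step.

```python
def perspective_coeffs(dst_points, src_points):
    """Solve for (a, b, c, d, e, f, g, h) so that each output point (x, y)
    maps to the source point
        u = (a*x + b*y + c) / (g*x + h*y + 1)
        v = (d*x + e*y + f) / (g*x + h*y + 1)
    which is the output-to-input convention used by Image.PERSPECTIVE.
    Each argument is a list of four (x, y) pairs in matching order.
    """
    m = []
    for (x, y), (u, v) in zip(dst_points, src_points):
        m.append([x, y, 1, 0, 0, 0, -u * x, -u * y, u])
        m.append([0, 0, 0, x, y, 1, -v * x, -v * y, v])
    m = [[float(c) for c in row] for row in m]
    n = 8
    # Gauss-Jordan elimination with partial pivoting on the 8x9 system.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return tuple(m[i][n] / m[i][i] for i in range(n))


def recrop(parent_path, corners, out_size):
    """Map the quadrilateral `corners` (top-left, top-right, bottom-right,
    bottom-left, in parent pixel coordinates) onto a rectangle of out_size.
    `recrop` is a hypothetical helper, not part of PIL.
    """
    from PIL import Image  # imported here so the solver above needs no PIL

    w, h = out_size
    dst = [(0, 0), (w, 0), (w, h), (0, h)]
    coeffs = perspective_coeffs(dst, corners)
    with Image.open(parent_path) as parent:
        return parent.transform((w, h), Image.PERSPECTIVE, coeffs,
                                resample=Image.BICUBIC)
```

When the four corners form an axis-aligned rectangle, the last two coefficients come out as zero and the mapping reduces to a plain scale and offset, which is a convenient sanity check.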
- Crops that are not JPEG, or parents that are not JPEG, TIFF or PNG
- Unexpected colour formats, causing tricky problems with cv2.imdecode or cv2.cvtColor
- Mirrored derivatives
- Rotations applied by EXIF headers, i.e. the crop appears to have the same orientation as the parent but this is not reflected in the raw image data
- Images larger than 64 megapixels or crop originals with an edge over 3,500 pixels, due to unrealistic processing times
- Minor improvements to size, taken as less than 20% increase in resolution
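The size-related exclusions in the list above could be checked along the following lines. This is a sketch under stated assumptions, not the task's code: `skip_reason` is a hypothetical name, a megapixel is taken as one million pixels, the 64 megapixel limit is applied to both the parent and the crop, and "resolution" is interpreted as total pixel count.

```python
MAX_PIXELS = 64_000_000   # 64 megapixels (assumed to mean 64 million pixels)
MAX_CROP_EDGE = 3500      # longest permitted edge of the existing crop
MIN_GAIN = 0.20           # minimum increase in pixel count (an assumption)


def skip_reason(parent_size, crop_size, new_size=None):
    """Return a short reason for skipping the pair, or None to proceed.

    Sizes are (width, height) tuples in pixels. `new_size` is the size of
    the candidate replacement crop, if it has already been estimated.
    """
    pw, ph = parent_size
    cw, ch = crop_size
    if pw * ph > MAX_PIXELS or cw * ch > MAX_PIXELS:
        return "image over 64 megapixels"
    if max(cw, ch) > MAX_CROP_EDGE:
        return "crop edge over 3,500 pixels"
    if new_size is not None:
        nw, nh = new_size
        gain = (nw * nh) / (cw * ch) - 1.0
        if gain < MIN_GAIN:
            return "improvement under 20%"
    return None
```

Returning a reason string rather than a bare boolean makes it easier to log why a given crop was passed over.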
Even with the performance-related exceptions, analysing a pair of images may take minutes, and only a small minority (circa 1%) of pairs provide a suitably larger new crop version. Calculations that would result in a new crop of nearly the same size as the existing one are sped up significantly by "precheck" runs with ½, ¼ or ⅛ size versions of the images, depending on the source image sizes. In the case of Ambrogio Fusinieri (cropped).JPG, which originally took 7.5 minutes to check, comparing at ¼ of the original size instead meant that the precheck took just 0.5 seconds and correctly decided to skip the image, with far more time taken to download the two images over the internet.
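The precheck idea can be sketched as two small steps: pick a reduction factor from the source image size, then scale the size found at reduced resolution back up and compare it with the existing crop. The pixel thresholds, the 20% gain limit applied here, and the names `choose_precheck_divisor` and `precheck_says_skip` are all hypothetical; the source only states that ½, ¼ or ⅛ size versions are used depending on source image sizes.

```python
# (minimum parent pixel count, divisor) pairs; illustrative values only.
PRECHECK_THRESHOLDS = ((32_000_000, 8), (8_000_000, 4), (2_000_000, 2))


def choose_precheck_divisor(parent_size, thresholds=PRECHECK_THRESHOLDS):
    """Return 8, 4, 2 or 1: the factor by which to shrink both images."""
    pixels = parent_size[0] * parent_size[1]
    for limit, divisor in thresholds:
        if pixels >= limit:
            return divisor
    return 1


def precheck_says_skip(small_match_size, divisor, crop_size, min_gain=0.20):
    """Decide from a reduced-size match whether the full run is worthwhile.

    small_match_size is the (width, height) of the matched region found in
    the shrunken parent; multiplying by the divisor estimates its size in
    the full parent. Returns True when the estimated gain in pixel count
    over the existing crop is below min_gain.
    """
    est_w = small_match_size[0] * divisor
    est_h = small_match_size[1] * divisor
    gain = (est_w * est_h) / (crop_size[0] * crop_size[1]) - 1.0
    return gain < min_gain
```

Because matching cost grows with pixel count, a ¼ size precheck works on roughly one sixteenth of the pixels, which is consistent with the large saving reported for the example file.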
As of 31 July 2018, 893 crops have been upscaled using this process.
SQL for count of upscaled images:
SELECT COUNT(*) FROM categorylinks JOIN page ON page_id=cl_from AND cl_to = "Extracted_images" INNER JOIN logging_userindex ON log_title=page_title WHERE log_user = '1086557' AND log_action="overwrite" AND log_comment LIKE "%Fæ/upscale%";
- Selected implemented cases
As an exception, crops found in the following categories were upscaled without manual intervention:
Crops with a deleted parent are added to Category:Image crop missing parent page.