CommCheck is an in-development bot that uses Google's Cloud Vision service to automatically report common copyright violations. It is designed to address some of the needs described in the Community Wishlists.

Why?

Commons is littered with copyright violations, uploaded either cross-wiki or by users who just want to add a "better image" of their favourite whatever. At the moment, review is done mainly by human patrollers.

How does it work?

Google has one of the world's largest indexes of commercially copyrighted images, and many patrollers on Commons use Google's reverse image search to check for potential copyright violations. CommCheck uses Google Cloud Vision to find copies of an uploaded image elsewhere on the internet, which indicates whether the image was likely just taken from an image search.
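A minimal sketch of the lookup step, assuming the official `google-cloud-vision` Python client and its `web_detection` helper on `ImageAnnotatorClient`. The function name `find_web_matches` and the injectable `client` parameter are hypothetical and not taken from CommCheck's code; calling it against the real service requires Google Cloud credentials.

```python
def find_web_matches(image_url, client=None):
    """Ask Cloud Vision web detection for copies of an image found online.

    Returns (full_matches, partial_matches) as lists of URLs.
    Hypothetical helper: `client` can be injected (e.g. for testing);
    otherwise a real ImageAnnotatorClient is created, which needs credentials.
    """
    if client is None:
        # Imported lazily so the function can be exercised without the library installed.
        from google.cloud import vision
        client = vision.ImageAnnotatorClient()

    # Request only the WEB_DETECTION feature for a publicly reachable image URL.
    response = client.web_detection({"source": {"image_uri": image_url}})
    if response.error.message:
        raise RuntimeError(f"Cloud Vision error: {response.error.message}")

    web = response.web_detection
    # Full matches are exact copies; partial matches may be crops or edits.
    full = [img.url for img in web.full_matching_images]
    partial = [img.url for img in web.partial_matching_images]
    return full, partial
```

Keeping the client injectable separates the network call from the decision logic, so the rest of a bot like this can be tested without hitting the API.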

To reduce false positives, only images licensed under CC BY-SA 4.0 are currently checked, and uploads by accounts with more than 200 edits are ignored.
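The two filtering rules above can be sketched as a single predicate. This is an illustration under assumptions, not CommCheck's actual code: the name `should_check`, the constants, and the normalization of license strings (so that "CC BY-SA 4.0", "CC-BY-SA-4.0" and "Cc-by-sa-4.0" compare equal) are all hypothetical.

```python
# Hypothetical constants mirroring the rules described above.
CHECKED_LICENSE = "cc-by-sa-4.0"
MAX_UPLOADER_EDITS = 200  # uploaders with MORE than this many edits are skipped


def normalize_license(name: str) -> str:
    """Map spellings like 'CC BY-SA 4.0' or 'Cc-by-sa-4.0' to 'cc-by-sa-4.0'."""
    return name.strip().lower().replace(" ", "-")


def should_check(license_name: str, uploader_edit_count: int) -> bool:
    """Return True if an upload falls within the (assumed) checking criteria."""
    return (
        normalize_license(license_name) == CHECKED_LICENSE
        and uploader_edit_count <= MAX_UPLOADER_EDITS
    )
```

For example, an upload tagged `Cc-by-sa-4.0` by an account with 10 edits would be checked, while the same upload by an account with 201 edits would be skipped.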

The bot reports a score in "points". Each full result in Google Images (i.e. a page that includes the matching image) counts as two points, and each partial result (which may be a cropped or modified copy) counts as one point, so a higher score means more matching images were found. Because Google limits the number of results it returns, the score has a maximum.
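The scoring rule works out to two points per full result plus one point per partial result. A minimal sketch follows; the function names are hypothetical, and `max_points` assumes the per-category result limits are known, which the description above does not specify.

```python
def commcheck_points(full_matches: int, partial_matches: int) -> int:
    """Score = 2 points per full result + 1 point per partial result."""
    if full_matches < 0 or partial_matches < 0:
        raise ValueError("match counts cannot be negative")
    return 2 * full_matches + partial_matches


def max_points(full_limit: int, partial_limit: int) -> int:
    """Highest possible score when Google caps each result category.

    The limits are assumptions supplied by the caller, not documented values.
    """
    return commcheck_points(full_limit, partial_limit)
```

For instance, 3 full results and 2 partial results give 2 × 3 + 2 = 8 points; if each category were capped at 10 results, the score could not exceed 30.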

Who does it benefit?

The entire community. It helps copyright holders by ensuring that their images aren't uploaded under invalid licenses; it helps Wikimedians by ensuring that images on Commons actually fit the purpose of Wikimedia projects; and it relieves strain on patrollers.

False positives

Not enough tests have been run to give exact statistics on false positives, but CommCheck checks each image only once and ignores all images with ticket approval or any license other than CC BY-SA 4.0.
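The "check each image only once" and "skip ticket-approved files" rules can be sketched with a simple registry. This is a hypothetical illustration: the class name, the `has_ticket` flag, and the in-memory set are assumptions; a real bot would need to persist this state between runs.

```python
class CheckedRegistry:
    """Remembers which files have been checked so each is scanned at most once.

    Hypothetical sketch: state lives in memory only.
    """

    def __init__(self) -> None:
        self._checked: set[str] = set()

    def claim(self, file_title: str, has_ticket: bool) -> bool:
        """Return True if the file should be checked now.

        Returns False for files with ticket approval or files already checked.
        """
        if has_ticket or file_title in self._checked:
            return False
        self._checked.add(file_title)
        return True
```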