User:Mike Peel/PDF redaction

Including copyrighted images that are not freely licensed in a PDF can lead to the file being deleted from Commons, because the PDF as a whole can't then be freely licensed.

In cases where the creator of the PDF prefers that it be available on Commons, it is best to ensure that only freely licensed images (for example, under CC BY or CC BY-SA) are included, and that the required attribution is stated explicitly in the PDF document itself.

If the creator of the PDF has not done that, and does not do so when the issue is pointed out, it may be possible for another Wikimedian to "fix" the PDF so that it can remain on Commons by redacting the copyrighted image. This page sets out the process for doing this, and lists the files to which it has been applied.

Process

  • Open the PDF with GIMP. Open pages as 'images' rather than 'layers'. Check that the image width and height are set to the right number of pixels for the PDF (this may require setting the 'resolution' to '72 pixels/in').
  • Find the images of the affected pages. The other page images can be closed as they won't be needed.
    • Option 1:
      • Select the 'Rectangle Select Tool', and select the area of the problematic image
      • 'Cut' the selection (Cmd-X on a Mac, Ctrl-X on a PC)
      • File -> Export, and save the modified images as PNGs. Once you've saved an image, you can close it (and ignore any warnings about it not being saved).
    • Option 2:
      • File -> Export, and save the unmodified page images as PNGs. Once you've saved an image, you can close it (and ignore any warnings about it not being saved).
      • Open each PNG in Apple Preview, use 'Rectangular Selection' to select the area of the problematic image, and delete it
      • Save the PNG
  • Open the original PDF with Apple Preview. View -> Thumbnails.
  • You should be able to drag and drop the PNGs into the PDF thumbnails list. Then move the images around until they're in the right place, and delete the old pages (select the page's thumbnail and press the backspace key)
  • Save the PDF under a new name
  • You should now have a PDF that has had the problematic image redacted!
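
The pixel check in the first step comes down to simple arithmetic: PDF page sizes are measured in points, and one point is 1/72 inch, so at 72 pixels/in one point maps to one pixel. The following is a minimal sketch of that conversion, not part of the original process. The function name `page_size_in_pixels` and the example page sizes are assumptions made for illustration.

```python
def page_size_in_pixels(width_pt, height_pt, dpi=72):
    """Convert a PDF page size in points (1 pt = 1/72 inch) to pixels.

    Hypothetical helper for illustration only: it reproduces the arithmetic
    behind the width/height/resolution fields, not GIMP's own code.
    """
    width_px = round(width_pt * dpi / 72)
    height_px = round(height_pt * dpi / 72)
    return width_px, height_px


# An A4 page is roughly 595 x 842 points.
print(page_size_in_pixels(595, 842))        # (595, 842) at 72 pixels/in
print(page_size_in_pixels(595, 842, 144))   # (1190, 1684) at 144 pixels/in
```

A higher resolution gives sharper page images but larger PNGs, so the value chosen is a trade-off rather than a fixed rule.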

Disadvantages

This process should be used as a last resort, where the PDF creator can't or won't make the changes in the source file. The reasons are:

  • This process is substantially more time-consuming than fixing the source file, and requires additional software.
  • This process may impact the integrity of the PDF file, rendering it a less accurate portrayal of how the PDF was originally published. For example, the replaced pages become raster images, so their text can no longer be selected or searched.
  • The original creator may be able to address the issue in a more aesthetically pleasing way, for instance replacing a non-free file with a similar free file, rather than leaving a blanked-out image in the PDF.

If the creator can't, or won't, do this for whatever reason, then it's better to follow this process than to delete the file.

Files

Methods that don't work

  • Opening the PDF in Apple Preview, using the annotations tools to put a white box over the problematic image ("blanking"), and then saving the file.
    • Apple Preview saves a layered PDF. It is easy to remove the blanking by opening the PDF again in Preview and deleting the white box.
  • Opening the PDF in Apple Preview, using the annotations tools to put a white box over the problematic image ("blanking"), and then printing the PDF to file (File -> Print -> set custom paper size -> PDF -> Save as PDF).
    • If you open the PDF again in Apple Preview, then you can't remove the blanking any more. However, if you open it in Inkscape then you can easily move the white box off and the blanked image is still there.
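
As a rough aid for spotting the first failure mode above (blanking added as an annotation), the raw bytes of a PDF can be scanned for the `/Annots` key that page dictionaries use to list annotations. This is only a heuristic sketch under stated assumptions: the helper name `has_annotation_markers` is hypothetical, the key can be hidden inside compressed object streams, and a negative result says nothing about the second failure mode, where the box is flattened into the page but the image underneath is still present.

```python
import sys
from pathlib import Path


def has_annotation_markers(pdf_path):
    """Return True if the raw file bytes contain the '/Annots' key.

    Heuristic only (hypothetical helper): True suggests that annotations,
    such as a white box drawn with annotation tools, may still be present
    and removable. False does NOT prove the file is safely redacted.
    """
    data = Path(pdf_path).read_bytes()
    return b"/Annots" in data


if __name__ == "__main__":
    # Usage: python check_annots.py redacted.pdf [more.pdf ...]
    for name in sys.argv[1:]:
        status = "annotations found" if has_annotation_markers(name) else "no /Annots key seen"
        print(f"{name}: {status}")
```

Opening the result in a second tool, as described above with Inkscape, remains the more reliable check.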