User:Fæ/Project list/NewberryLibrary

Newberry Library

 
Gilpin's hydrographic map, 1848.

Several collections have been released by the library under licenses suitable for Wikimedia Commons. The images are available on both the Internet Archive (IA) and the CARLI digital collections. Refer to the blog post.

The priority will be high-resolution public domain images, though the books available in PDF format will be of longer-term interest.

All Newberry collections on CARLI are indexed here.

Technical

File name format will be:

<title> (NBY <image_ID>).(jpg|pdf)

NBY is the abbreviation used on CARLI.
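As an illustration of the naming pattern above, the following is a minimal sketch rather than the actual upload script. The function name `commons_filename`, the set of characters it replaces, and the ID used in the usage note are assumptions made for the example.

```python
import re


def commons_filename(title: str, image_id: str, extension: str = "jpg") -> str:
    """Build '<title> (NBY <image_ID>).<ext>' from catalogue metadata."""
    if extension not in ("jpg", "pdf"):
        raise ValueError(f"unsupported extension: {extension}")
    # Assumption: swap characters that tend to be problematic in wiki file
    # titles for a hyphen, then collapse runs of whitespace.
    clean = re.sub(r"[#<>\[\]|{}:/\\]", "-", title)
    clean = re.sub(r"\s+", " ", clean).strip()
    return f"{clean} (NBY {image_id}).{extension}"
```

For example, `commons_filename("Gilpin's hydrographic map, 1848", "1234")` gives `Gilpin's hydrographic map, 1848 (NBY 1234).jpg`, where the ID 1234 is made up.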

Known errors

10,000 limit

Calling the gallery page from Python appears to render only the first page of results. Requesting "/page/2" onwards just lists the same files. It remains unclear why this happens, especially as the same pages work perfectly well in a browser, though it may be that a persistent cookie is expected and overrides the URL parameters. A consequence is that a maximum of 10,000 results can be tested for upload until this is resolved. As a work-around, searches can be made by subcollection; in the Curt Teich collection this keeps each set of results under 10,000, so each set can be displayed on one gallery page.
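One way to guard against the repeated-page behaviour is to stop paging as soon as a page contributes no new items. This is a sketch only: `fetch_page` is a hypothetical callable supplied by the caller that returns the list of item IDs found on a given gallery page, and it does not reflect how the CARLI site is actually queried.

```python
def collect_unique_ids(fetch_page, max_pages=100):
    """Collect item IDs page by page, stopping when a page adds nothing new."""
    seen = set()
    ordered = []
    for page in range(1, max_pages + 1):
        added = 0
        for item_id in fetch_page(page):
            if item_id not in seen:
                seen.add(item_id)
                ordered.append(item_id)
                added += 1
        # A page with no new IDs means the site is repeating itself
        # (or the results are exhausted), so stop here.
        if added == 0:
            break
    return ordered
```

If every request returns the first page, the loop ends after the second call rather than collecting the same files repeatedly.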

Reports

  • Search showing/counting all uploaded NBY files

Collections

Everett D. Graff Collection of Western Americana (graff)

657 R Everett D. Graff Collection of Western Americana
CARLI IA

The collection includes photographs, letters, notebooks, pamphlets and maps. An upper page limit may be needed for the notebooks, to decide whether they are imported as individual JPEG pages or uploaded as a single PDF. Multiple pages can be cross-linked in a gallery. There are 316 objects returned on CARLI and 893 on IA, though digitizations are still being released, so these numbers may increase.

Testing the overlap:

  • GR_1453, a 45 page document, exists on IA but not on CARLI
  • GR_4403, a map, exists on CARLI but not on IA
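An overlap test like this can be expressed with set operations once the identifiers from both sites are available. The sketch below assumes the IDs have already been harvested into two lists; the function name is hypothetical and the entry `GR_0001` in the usage note is invented to show the "both" case.

```python
def compare_holdings(carli_ids, ia_ids):
    """Split identifiers into those on both sites and those on only one."""
    carli, ia = set(carli_ids), set(ia_ids)
    return {
        "both": sorted(carli & ia),
        "carli_only": sorted(carli - ia),
        "ia_only": sorted(ia - carli),
    }
```

Calling `compare_holdings(["GR_4403", "GR_0001"], ["GR_1453", "GR_0001"])` would place GR_4403 under `carli_only` and GR_1453 under `ia_only`.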

A sample visual scan through both collections indicates that the releases on CARLI are more likely to be image focused, while the IA releases are far more textual, including a number of long documents that would be better loaded in PDF or DjVu format. Consequently the initial upload will be from CARLI.

On CARLI the URL includes either "singleitem" or "compoundobject", immediately separating out those that may need a gallery.

Initial run restricted to single images and documents with a maximum of 12 scanned pages.
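The selection rule for the initial run could be sketched as below. The function names are hypothetical, the page count is assumed to be known from the object's metadata, and the URLs in the usage note are placeholders rather than real CARLI addresses.

```python
MAX_PAGES = 12  # page limit for documents in the initial run


def object_kind(url: str) -> str:
    """Classify a CARLI object URL by the marker it contains."""
    if "compoundobject" in url:
        return "compound"
    if "singleitem" in url:
        return "single"
    return "unknown"


def in_initial_run(url: str, page_count: int = 1) -> bool:
    """Accept single items, and compound objects of at most MAX_PAGES pages."""
    kind = object_kind(url)
    if kind == "single":
        return True
    if kind == "compound":
        return page_count <= MAX_PAGES
    return False
```

With a placeholder such as `https://example.org/compoundobject/id/2`, a 45 page document would be left out of the initial run, while a 12 page one would be included.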

Curt Teich Postcard Archives (teich)

3,737 R Curt Teich Postcard Archives
5,477 R Curt Teich Postcard Archives, Detroit Publishing Company

This is a large postcard collection, of which a maximum of 20,000 objects may be eligible for Wikimedia Commons. The metadata mappings are very different from those of the graff collection, so more customization is needed than expected. A minimum of 500 pixels on the longest side has been set for single images, which means that many early scans will be skipped. One example is 391 x 246 pixels, too small for realistic educational value.
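The size filter amounts to a one-line comparison. The sketch below assumes the pixel dimensions have already been read from the image or its metadata, and the function name is hypothetical.

```python
MIN_LONGEST_SIDE = 500  # pixels


def large_enough(width: int, height: int, minimum: int = MIN_LONGEST_SIDE) -> bool:
    """Return True if the longest side meets the minimum size."""
    return max(width, height) >= minimum
```

The 391 x 246 example fails this test, since its longest side is 391 pixels.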

Some images have no known publication date, but the library has assessed their decade. From this a terminus ante quem is calculated, to judge whether {{PD-US-not renewed}} is the best copyright release.
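A terminus ante quem can be taken as the last year of the assessed decade. The sketch below assumes the decade appears in the metadata as text such as "1930s"; the actual field format on CARLI may differ, and the function name is hypothetical. Deciding which license template follows from the resulting year is left to the caller.

```python
import re


def terminus_ante_quem(decade_label: str):
    """Return the last year of a decade such as '1930s', or None if absent."""
    # Assumption: decades are written as four digits ending in 0, then 's'.
    match = re.search(r"\b(\d{3})0s\b", decade_label)
    if not match:
        return None
    return int(match.group(1)) * 10 + 9
```

For a label of "1930s" this returns 1939, the latest year in which the card could have been published.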

The rights statements vary, so an additional test is made for whether they include "No copyright known". The remaining statements all appear to carry commercial restrictions, so those files should be skipped unless a year has been identified and it is before 1923.
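The rights test described above might look like the following sketch. The function name is hypothetical, and matching the phrase case-insensitively is an assumption, not something confirmed by the catalogue data.

```python
def passes_rights_test(rights_statement: str, year=None) -> bool:
    """Accept 'No copyright known' items, or any item dated before 1923."""
    # Assumption: the phrase may vary in capitalisation between records.
    if "no copyright known" in rights_statement.lower():
        return True
    return year is not None and year < 1923
```

A record with a restrictive statement and no identified year is skipped, while the same statement with a year of 1915 passes.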

Chicago and the Midwest (chicago)

2,964 R Chicago and the Midwest, Newberry Library

Photographs, maps and documents relating to the history of Chicago and the Midwest. Around 3,000 images are returned. Recent images by date appear to be correctly released, but will not be part of the mass upload.

Non-Newberry Collections

Having processed the CARLI site for Newberry Library files, it is possible to use the same script to process other collections. Collections where rights claims are either not applicable or are equivalent to confirming the scans are public domain appear to be in the minority.

Boys in Blue, Abraham Lincoln Presidential Library and Museum

1,899 R Boys in Blue, Abraham Lincoln Presidential Library and Museum

There are 950 scans of cards and photographs from the museum. All date to before 1900. Unfortunately the scans are watermarked and deliberately limited in resolution, significantly reducing their educational value.