Commons:Batch uploading/New Orleans Bee

New Orleans Bee

Old newspapers of the en:New Orleans Bee

All editions are located at http://www.jefferson.lib.la.us/genealogy/NewOrleansBeeMain.htm - All are in the public domain

I crawled the collection and started downloading all files (found 136,667 files; I assume 60-80 GB). --Slick (talk) 12:12, 11 August 2012 (UTC)
The files are PDFs, each containing a single newspaper page. I am not sure how to upload them: as single-page PDFs, or joined into one PDF per complete newspaper (that is a really big job and I am not sure I can do it well)? Or should I upload only JPGs created from the PDFs? Converting would reduce the quality compared with the PDFs, though, so I assume leaving them as PDFs is the best solution. --Slick (talk) 13:34, 11 August 2012 (UTC)
Which licence template is the correct one in this case? I guess PD US? --Slick (talk) 08:50, 13 August 2012 (UTC)
What about missing pages like the one shown on the right? Should I upload these 'missing pages' too, or try to remove them before uploading? (In that case I am not sure I can catch them all, because I would have to check ~137,000 files manually.) --Slick (talk) 09:19, 13 August 2012 (UTC)
All downloads are finished. There is no metadata in the files, so it is not possible to identify front pages or the pages that belong to a single day. Only the year and month are known. So I suggest filenames of the form New Orleans Bee MONTH YYYY (NNN).pdf, e.g. New Orleans Bee - April 1917 (002).pdf or New Orleans Bee - January 1917 (011).pdf. Any suggestions? --Slick (talk) 12:18, 19 August 2012 (UTC)
To categorize the uploaded documents, I suggest adding them to a category of the form New Orleans Bee MONTH YYYY, e.g. New Orleans Bee January 1917 (which is a child of New Orleans Bee YYYY, e.g. New Orleans Bee 1917). Any suggestions? --Slick (talk) 12:18, 19 August 2012 (UTC)
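The naming proposal above can be sketched in a few lines. This is only an illustration, not the script actually used: the function name `proposed_names` and its arguments are hypothetical, and the filename follows the written examples (with the " - " separator), which differ slightly from the stated pattern.

```python
# English month names, spelled out so the result does not depend on the locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def proposed_names(year: int, month: int, page: int) -> dict:
    """Build the proposed filename and category names for one scanned page.

    `page` is a running number within the month, since the PDFs carry no
    metadata that would identify the day or the front page.
    """
    month_name = MONTHS[month - 1]
    return {
        # Assumed form, taken from the examples in the proposal.
        "filename": f"New Orleans Bee - {month_name} {year} ({page:03d}).pdf",
        "category": f"New Orleans Bee {month_name} {year}",
        "parent_category": f"New Orleans Bee {year}",
    }


if __name__ == "__main__":
    names = proposed_names(1917, 4, 2)
    print(names["filename"])
    print(names["category"], "->", names["parent_category"])
```

Zero-padding the running number to three digits keeps the files of one month in page order when they are sorted by name.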

Opinions

Even though some of the scans are lousy, I believe that Commons is the right place to archive them. Who knows how long JPL will keep the archive online? We have to remember that JPL is a public library and is subject to budget cuts. To make the whole thing usable, we should combine the pages into single .pdf/.djvu files by month, or even by week. I don't think it is necessary to group them by date; in the end it doesn't matter. Some scans can also be uploaded as .jpg when there is a need or use for it, like the one that is already online announcing the entry of the US into WW I.

If I can be of any help, drop me a line. --Hedwig in Washington (Woof?) 01:23, 21 August 2012 (UTC)

I tried to join the PDFs of several months into single files. This results in monthly files of more than 100 MB, which cannot be uploaded. Joining them by week is not possible, because there is no metadata or anything similar that a script could use for this job (it would only be possible manually, and I cannot do that). So I believe we have to upload them as single files. --Slick (talk) 16:01, 23 August 2012 (UTC)
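For anyone who still wants to try merged files, the size problem described above could be handled by splitting each month into several parts. The following is a rough sketch under assumptions: `split_into_parts` is a hypothetical helper, the 100 MB limit is taken from the comment above, and the sum of the single-page sizes is only an estimate of the size of a merged PDF.

```python
def split_into_parts(pages, limit_bytes=100 * 1024 * 1024):
    """Group (name, size_in_bytes) pairs, in order, into parts under a limit.

    A new part is started whenever adding the next page would push the
    estimated size over `limit_bytes`. A single page larger than the limit
    still gets a part of its own.
    """
    parts = []
    current = []
    current_size = 0
    for name, size in pages:
        if current and current_size + size > limit_bytes:
            parts.append(current)
            current = []
            current_size = 0
        current.append(name)
        current_size += size
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    # Made-up sizes, only to show the grouping behaviour.
    month = [("page_001.pdf", 60), ("page_002.pdf", 50), ("page_003.pdf", 40)]
    print(split_into_parts(month, limit_bytes=100))
```

The pages stay in their original order, so each part is a contiguous run of pages from the month.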

I'll give it a try in the next couple of days. --Hedwig in Washington (Woof?) 01:05, 24 August 2012 (UTC)
Any new opinions? What do we do now? --Slick (talk) 17:11, 10 September 2012 (UTC)

I am starting the upload process now. The files are named: The New Orleans Bee <year> <month> <number>.pdf (example). They are in categories named: The New Orleans Bee <month> <year> (example). I still need help creating the categories and their parents. The parent is the category by year, The New Orleans Bee <year> (example), and the children are the categories by month. The entire upload will take some time to complete, because there are a lot of files to upload. --Slick (talk) 20:51, 12 October 2012 (UTC)

There is now a category for missing pages (where found). --Slick (talk) 17:12, 13 October 2012 (UTC)
I found a way to create the categories I need with a bot, so it is no longer necessary to create them by hand. Thanks. --Slick (talk) 22:04, 13 October 2012 (UTC)
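How the bot creates the categories is not described here. As a minimal sketch of the category tree discussed above, the page titles and their wikitext could be generated like this. The function `category_pages` is hypothetical, the month-name spelling is an assumption, and the code only builds the text and does not talk to the wiki.

```python
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

# Top-level category, as listed for this batch upload.
ROOT = "The New Orleans Bee by year"


def category_pages(year: int) -> dict:
    """Return {category page title: wikitext} for one year and its months.

    The year category is placed in the root category, and each month
    category is placed in its year category.
    """
    year_cat = f"The New Orleans Bee {year}"
    pages = {f"Category:{year_cat}": f"[[Category:{ROOT}]]"}
    for month_name in MONTHS:
        title = f"Category:The New Orleans Bee {month_name} {year}"
        pages[title] = f"[[Category:{year_cat}]]"
    return pages


if __name__ == "__main__":
    for title, text in category_pages(1917).items():
        print(title, "=>", text)
```

A bot would then save each generated text under its title, skipping categories that already exist.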
Assigned to | Progress | Bot name | Category
Slick | done | Slick-o-bot | Category:The_New_Orleans_Bee_by_year