
Wikimedia Commons β

Commons:Batch uploading


This page has a backlog that requires the attention of experienced editors.
Please remove this notice if it won't be needed in the future.



Commons Batch Uploading is a project to centralize the uploading of collections of files whose authors have released them into the public domain (PD) or under a Commons-compatible license. Each request is assigned to a bot operator, who works out how it can be fulfilled. (To upload batches from Flickr, please make requests on Commons:Flickr batch uploading.)

Before you request a batch upload here, please read the guide to batch uploading first.

See w:Wikipedia:Public domain image resources for potential future batch uploads.



Scripts, Examples and Information

New requests

John F. Kennedy assassination files

  • Source to upload from: [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]
    • Do the media URLs follow a pattern? - I don't know.
    • Does the site have an API? - I don't know.
    • What else could ease uploading? (is the site valid XHTML, do they use a WCM…?) - I don't know.
    • Did you contact the site owner? - No need to; they are works of the U.S. federal government, so they are in the public domain.
  • Describe the works to be uploaded in detail (audio files, images by …): PDF files released by the U.S. federal government into the public domain, pertaining to the John F. Kennedy assassination. They are in container files so they must be extracted first.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created? - No special template needed, just use the default one.
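The request only says the PDFs sit inside "container files". As a minimal sketch, assuming those containers are ZIP archives (the format is not stated; tar or other formats would need `tarfile` or another tool), the PDFs could be pulled out with Python's standard library before uploading:

```python
import zipfile
from pathlib import Path


def extract_pdfs(archive_path: str, out_dir: str) -> list[str]:
    """Extract only the PDF members of a ZIP archive into out_dir.

    Assumes the container is a ZIP file; other container formats are not handled.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for member in zf.infolist():
            # Skip directories and anything that is not a PDF.
            if member.is_dir() or not member.filename.lower().endswith(".pdf"):
                continue
            # Keep only the base name, so an entry like "../x.pdf" cannot escape out_dir.
            target = out / Path(member.filename).name
            with zf.open(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(target.name)
    return sorted(extracted)
```

Flattening the paths keeps everything inside the output folder, but two PDFs with the same name in different subfolders would overwrite each other, so a real run would need to check for that.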

Illegitimate Barrister (talk) 19:25, 6 October 2017 (UTC)


Assigned to Progress Bot name Category

Edo period coin collecting catalogues

  • Source to upload from:

The website of the University of California at Santa Barbara. 🎓

    • Do the media URLs follow a pattern?

Kokin kousei, Shinsen zeni kagami (Corrected Against Past and Current Records, A New Selected Mirror of Cash Coins):

cover, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10.

Shinpan kaisei, Kosen nedantsuke, Narabi ni bantsuki (Improved New Edition: Price List of Old Coins, Together with Rarity Ranking):

Book 📚 cover, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11.
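If the page images do follow a simple "cover, then 1 to N" pattern, the whole set can be enumerated up front. A sketch, where the URL templates are hypothetical placeholders (the actual UCSB addresses are not reproduced here); only the page counts come from the lists above:

```python
# Hypothetical URL templates: the real UCSB page addresses are not reproduced here.
BOOKS = {
    "Kokin kousei, Shinsen zeni kagami": ("https://example.org/book1/{page}.jpg", 10),
    "Shinpan kaisei, Kosen nedantsuke": ("https://example.org/book2/{page}.jpg", 11),
}


def page_urls(template: str, last_page: int) -> list:
    """Cover first, then numbered pages 1..last_page, as in the lists above."""
    pages = ["cover"] + [str(n) for n in range(1, last_page + 1)]
    return [template.format(page=p) for p in pages]


all_urls = [url for template, last in BOOKS.values() for url in page_urls(template, last)]
print(len(all_urls))  # 23: 11 images for the first book, 12 for the second
```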

    • Does the site have an API?

Not that I’m aware of.

    • What else could ease uploading? (is the site valid XHTML, do they use a WCM…?)

Not my subject of expertise.

    • Did you contact the site owner?

No. I attempted to contact him using the contact information provided here, which may be out of date, and then tried here, also without result. These images are in the public domain, so his permission is not needed. 📚

  • Describe the works to be uploaded in detail (audio files, images by …):

The images are of two books, both published in Edo-period Japan. They are “large JPG” files of around 200 KB each (yes, “large”; note 📝 that this website was last updated around 2003), and each generally shows two pages together with a description. The descriptions should not be copied, as they were written by Dr. Luke Roberts himself.

The first book is owned by Dr. Luke Roberts and dates from 1842, while the second was published in Nagoya in 1799 and is (or was?) owned by collector Sam Leung. Both are too old to be under copyright. Also note that the author of the second book is (currently) unknown.

The books are coin-collecting catalogues illustrating various Chinese, Japanese, Korean, Vietnamese (Annamese), and sometimes Muslim coins, with their prices and rarity written next to them. The illustrations generally show only the obverse of each coin, unless the reverse is also notable. The books are old but well preserved, so they are still easy to read, and the scans (though “small” by today's standards) are of high quality.

  • Which license tag(s) should be applied?

Public domain from Japan. 🗾

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?


I also think the page each image was taken from should be properly linked, so that the information about the coins and the translations of the pages are easily accessible.

Sent from my Microsoft Lumia 950 XL with Microsoft Windows 10 Mobile 📱. --Donald Trung (Talk 💬) (Sockuppets 🎭) 09:49, 22 September 2017 (UTC)


Assigned to Progress Bot name Category

Illustrations of Vietnamese cash coins from Ed Toda's "Annam and its minor currency"

I would like to request that a bot fetch all images, and their related text, from Ed Toda's Annam and its minor currency. I would really like to do this myself, but I have very little experience uploading public-domain files, very little free time, and the images number almost 300. 😅 I wouldn't request this here if I thought I had many other options, so here it goes...


I plan to use all of these in the Wikipedia article “Vietnamese cash”, so I have an immediate use for them. I hope these ideas 💡 are plausible. 🙇🏻

  • Source to upload from:

Ed Toda's Annam and its minor currency, hosted on Art-Hanoi, a website operated by Sema (known on Wikipedia as @Pyvanet~commonswiki:).

    • Do the media URLs follow a pattern?

11, 12, 13, 14, 15, 16, 17, 18, 19, 20, and 21 (and, technically, ALL coins here).

Do not copy anything after page 21, as page 22 was created by Sema himself. Sema also created the Wikipedia article I want to add these images to, and he wanted to upload them himself, but he eventually gave up after too many of his files of more recent South Vietnamese banknotes were deleted. I would request those from him separately, but a ticket 🎟 would have to be obtained, and they would need to be screened more carefully, as more recent currencies may still be protected by Vietnamese copyright ©.

    • Does the site have an API?

Not that I’m aware of.

    • What else could ease uploading? (is the site valid XHTML, do they use a WCM…?)
    • Did you contact the site owner?

Yes, I did, though the other images would have to be uploaded by him. These, however, are Ed Toda's images; they are mere scans, so they do not meet the threshold of originality, and they are in the public domain.

Note 📝: the owner really does want his content here, but he quit after some of his images were deleted because of Vietnamese copyright law.

For context: this page describes the scans in detail and is entitled “Read this first”.

  • Describe the works to be uploaded in detail (audio files, images by …):

The files are all images (scans) from Eduardo Toda's 1882 book “Annam and its minor currency”, the authoritative English-language “classic” on the Vietnamese cash coins issued before its publication. It goes into full detail on their history and circumstances, both economic (resource management) and cultural (the various “religious”, read: superstitious, reasons behind the composition of the alloys). The descriptions of the coins should also be uploaded (see further below). All of these files are hand-drawn illustrations of Vietnamese (Annamese/Annamite) cash coins, though they are somewhat inaccurate because they follow only a single Chinese calligraphic style. They are all images from the same book, numbering exactly --- images.

How the files should be organised:

The files should all be named “Toda Nr.[number]”, followed, if possible, by their inscription in Chinese characters. The description below each file could look like this, for example: title = “Toda No. 1. 太平興寶”; description = “Quote: “(Barker: 1.6-1.16) Obverse: 太平興寶 Thai-binh-hung-bao. Reverse: The character 丁 Dinh, the name of the Dynasty.” - Ed Toda, Annam and its minor currency, 1882”, perhaps again with a link to the appropriate page. The description should copy all the text up to the next image, but none of the text before the chapter's first image. Even long passages should be copied, as these illustrations rely heavily on their context.

By “the appropriate page” I mean that if a file was uploaded from “”, then that specific link should be listed as “the source”.

Where white space separates paragraphs, a line break could (or should) be added.

The files should be placed in a new category called “Category:Illustrations from Annam and its minor currency by Ed Toda”, which itself should be placed under “Category:Coins of Vietnam” and “Category:Eduard Toda”.

  • Which license tag(s) should be applied?

The license that should be applied is the one whose notice reads:

“The author died in 1941, so this work is in the public domain in its country of origin and other countries and areas where the copyright term is the author's life plus 75 years or less.

This work is in the public domain in the United States because it was published (or registered with the U.S. Copyright Office) before January 1, 1923.

This file has been identified as being free of known restrictions under copyright law, including all related and neighboring rights.”

This applies because the book was published in Shanghai. (The writer was Spanish, and had the book been published in Spain its copyright would still be valid until 2022, but it falls completely outside copyright under both Chinese and U.S. law.)

The author field of each file should read “Eduardo Toda y Güell (though Dr. R. Allan Barker, “the Qui-Gon Jinn of Vietnamese cash coins”, hypothesises that Toda’s wife drew them)” (I added that part as a joke 🃏), or, more seriously, “Eduardo Toda y Güell (though Dr. R. Allan Barker hypothesises that Toda’s wife drew them)”, since the authorship is not 100% certain. The source field should link to the appropriate page on “Art-Hanoi”, a website operated by Sema (known on Wikipedia as @Pyvanet~commonswiki:), for proper attribution.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?


Donald Trung (talk) 13:17, 24 July 2017 (UTC)
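The naming and description scheme requested above can be sketched as a small Python helper. The function names, the `.jpg` extension, and the example source URL in the usage line are hypothetical; only the “Toda Nr.[number]” pattern, the quote format, and the category name come from the request.

```python
from typing import Optional

CATEGORY = "Category:Illustrations from Annam and its minor currency by Ed Toda"


def toda_filename(number: int, inscription: Optional[str] = None, ext: str = "jpg") -> str:
    """'Toda Nr.[number]', followed by the inscription in Chinese characters if known."""
    name = f"Toda Nr.{number}"
    if inscription:
        name += f" {inscription}"
    return f"{name}.{ext}"


def toda_description(number: int, inscription: str, quoted_text: str, source_url: str) -> str:
    """Title line, the quoted book text with attribution, the source page, and the category."""
    return "\n".join([
        f"Toda No. {number}. {inscription}",
        f'Quote: "{quoted_text}" - Ed Toda, Annam and its minor currency, 1882',
        f"Source: {source_url}",
        f"[[{CATEGORY}]]",
    ])


print(toda_filename(1, "太平興寶"))  # Toda Nr.1 太平興寶.jpg
```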


Assigned to Progress Bot name Category

1886 Bordeaux Montaigne

1886 is the digital library of the university I work for. This library contains 2,523 images by fr:Jean-Auguste Brutails (mainly churches in en:Aquitaine). These pictures are PD, and I have started to upload them. I have uploaded ten of them and am now looking for feedback on anything that could be improved. I can upload more files on request. Symac (talk) 08:04, 18 July 2017 (UTC)

  • Is there a template that could be used on the file description pages? Do you think a special template should be created? I have started by using Template:Artwork.

One week later, another batch of 10; any further advice is welcome:

Ten more before the full import, if no errors are found:


Hey Symac!

This looks good :) couple of thoughts:

  • {{Artwork}} looks fine to me; did you look into {{Photograph}}, which may (or may not ^^) be more applicable?
  • Using a free-form description for the “Accession number” field feels a bit strange. I thought we had meta-templates for such things but cannot find them right now...
  • Please do not use {{Mld}} for descriptions. It behaves weirdly in many regards. For example, on File:Chapelle de Magrigne - J-A Brutails - Université Bordeaux Montaigne - 0001.jpg, the Mérimée template is not displayed to me, and I bet that is because of Mld. The only positive thing about Mld is that it automagically hides descriptions when there are too many, but that behaviour has since been activated in description fields when using « {{fr}}{{en}} ». Some tools for adding quick translations also expect the latter format. In short: no Mld please :)
  • All the text in the Permission field should be put into a template, so that it can be translated further.
  • {{1886 Bordeaux Montaigne}} looks good! I see it’s using old-style Autotranslate; I’ll convert it to use the Translate extension :)
  • Not sure about {{PD-old-auto}}: I think it’s only recommended if you provide the death date. But I’m not quite sure.

Hope that helps! Jean-Fred (talk) 09:16, 19 July 2017 (UTC)

@Jean-Frédéric: thanks for this valuable input. I have switched to {{Photograph}}, which seems better suited, and followed the rest of your advice; it looks better now. If anyone else thinks of something that should be updated, I am listening. Symac (talk) 12:34, 19 July 2017 (UTC)

I have added an extra 10 files with the same script, still looking for input on this. I think I will run the full import at the end of the week if no blocking error is noticed before! Symac (talk) 07:33, 25 July 2017 (UTC)

And 10 more, I will run the full import tomorrow if no error is reported. Symac (talk) 08:31, 27 July 2017 (UTC)
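Following Jean-Fred's advice (use {{Photograph}}, write multilingual text as « {{fr}}{{en}} » rather than {{Mld}}, and keep the permission text in a template), a generated description page might look like the sketch below. The helper names and the permission template are hypothetical, the license line is a placeholder, and the {{Photograph}} field names used here (`photographer`, `title`, `accession number`, `permission`, and so on) should be checked against the template's documentation.

```python
def lang_block(texts: dict) -> str:
    """Render {'fr': ..., 'en': ...} as {{fr|1=...}}{{en|1=...}} instead of {{Mld}}."""
    # "1=" keeps any "=" inside the text from being read as a named parameter.
    return "".join("{{" + code + "|1=" + text + "}}" for code, text in texts.items())


def photograph_page(fields: dict, license_wikitext: str) -> str:
    """Assemble a file description page around {{Photograph}}."""
    lines = ["== {{int:filedesc}} ==", "{{Photograph"]
    for name, value in fields.items():
        lines.append(f" |{name} = {value}")
    lines += ["}}", "", "== {{int:license-header}} ==", license_wikitext]
    return "\n".join(lines)


page = photograph_page(
    {
        "photographer": "Jean-Auguste Brutails",
        "title": lang_block({"fr": "Chapelle de Magrigne", "en": "Magrigne chapel"}),
        "date": "",
        "accession number": "0001",
        "permission": "{{Hypothetical permission template}}",
    },
    "{{PD-old-auto|deathyear=YYYY}}",  # placeholder: fill in the real death year
)
```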
Assigned to Progress Bot name Category

USDA NRCS Plants Database

  • Source to upload from:
    • Do the media URLs follow a pattern? Yes.
    • Does the site have an API? No.
    • What else could ease uploading? (is the site valid XHTML, do they use a WCM…?) valid XHTML
    • Did you contact the site owner? No.
  • Describe the works to be uploaded in detail (audio files, images by …): Public domain: 10771 photos and 7064 line drawings, with species information for categorization. There are other copyrighted images as well, some of which may be freely licensed.
  • Which license tag(s) should be applied?


  • Is there a template that could be used on the file description pages? Do you think a special template should be created?


@Guanaco: There is a lot of copyrighted material among these images, e.g. [21] [22]. (Just because this is a U.S. government website does not mean all the material on it is U.S. government material and therefore freely usable!) In fact, I have not found many images that can really be used (e.g. [23]). You should at least provide a procedure for distinguishing between copyrighted and free material. --Reinhard Kraasch (talk) 11:02, 9 July 2017 (UTC)

@Reinhard Kraasch: The gallery search function [24] has a filter by copyright status. [25]
I've found that the URLs linked by the thumbnails provide species information within <title>:
The search is navigable with &page=2, 3, 4, etc.
I'm actually interested in scripting this myself now, though it would be my first batch upload task. Guanaco (talk) 14:23, 9 July 2017 (UTC)
@Guanaco: Well, just go on... On the other hand it always is a good idea to have a second opinion with such a batch upload - especially for the non-technical aspects. --Reinhard Kraasch (talk) 20:52, 10 July 2017 (UTC)
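Guanaco's two observations (species information in each linked page's <title>, and result pages reachable via &page=N) translate into a short scraping sketch using only Python's standard library. The base search URL is a hypothetical placeholder and fetching is left out; whatever copyright-status filter the gallery search offers would need to be part of that base URL.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split())  # collapse stray whitespace and newlines


def result_page_urls(base_url: str, last_page: int) -> list:
    """Page 1 is the base URL itself; later pages append &page=N."""
    return [base_url] + [f"{base_url}&page={n}" for n in range(2, last_page + 1)]
```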
Assigned to Progress Bot name Category

US National Archives

I am hoping to begin a bulk upload of media from the US National Archives in the next few weeks. This will be a very different approach from the first upload, which was based on uploading files from an offline drive and scraping HTML for the metadata. This time around, NARA has an API for our online catalog, and so I am building a bot, using mwclient, to upload using the live metadata and files from the API. Some details:


The dataset includes all PD materials at (API: I plan to begin with a series of ~100,000 WWI-era photos. Technically, there are over 15 million files (and counting) in this dataset.

File names

The script is currently configured to name files using the following formula. For single-page items:

  • "File:[TITLE] - NARA - [NAID].ext"
    Where "[TITLE]" is the catalog record's title field, and "[NAID]" is the National Archives Identifier. If this is over the character limit, "[TITLE]" is automatically truncated, with "(...)" appended.

For multi-page items (since the above formula would give all files belonging to one catalog record the same title):

  • "File:[TITLE] - NARA - [NAID] (page X).ext"
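The naming formula can be sketched as follows. This is not the NARA bot's actual code: the 240-byte limit is an assumption for illustration (Commons' real limit on file names should be checked), and truncation is done on UTF-8 bytes so that multi-byte titles cannot overflow it.

```python
from typing import Optional

# Assumed limit in UTF-8 bytes for the name after "File:"; check the real Commons limit.
MAX_NAME_BYTES = 240
ELLIPSIS = "(...)"


def nara_filename(title: str, naid: int, ext: str,
                  page: Optional[int] = None, max_bytes: int = MAX_NAME_BYTES) -> str:
    """'File:[TITLE] - NARA - [NAID].ext', or '... (page X).ext' for multi-page items."""
    suffix = f" - NARA - {naid}"
    if page is not None:
        suffix += f" (page {page})"
    suffix += f".{ext}"
    if len((title + suffix).encode("utf-8")) <= max_bytes:
        return f"File:{title}{suffix}"
    # Cut the title to a byte budget, dropping any half-cut multi-byte character.
    budget = max_bytes - len(suffix.encode("utf-8")) - len(ELLIPSIS.encode("utf-8"))
    short = title.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return f"File:{short}{ELLIPSIS}{suffix}"
```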

We are developing a custom metadata mapping, since NARA does not adhere to a metadata standard. You can see the metadata template we use here: {{NARA-image-full}}. Some notes:

While all the records in this catalog come from NARA or partner institutions, there are many different facility locations, and some NARA facilities have their own institutions templates already (e.g. US presidential libraries). Therefore, I am creating institution templates to go along with all NARA locations, and the script will insert the correct institution template based on a mapping.

NARA's authority file is not yet mapped to Wikidata; however, that is definitely something that would be useful in the future. For now, we will upload files with NARA's creator and author names and their NAIDs, and links back to the catalog authority record. Including the NAIDs in a Commons template field means that in the future, Wikidata could be used to make creator templates appear instead. Any help with this would be appreciated.


Because NARA records are nearly all (>99%) derived from the records of US federal agencies, these uploads will use {{PD-USGov}} or its subtemplates. Most NARA records are in one of about 600 record groups based on their creating agency, so I am using a mapping of NARA record groups to Commons PD-USGov templates so that the bot can apply the more specific agency templates in most cases. Help filling out this mapping would be appreciated.
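A sketch of the record-group lookup with a generic fallback. The two entries are hypothetical examples, not the mapping under discussion; any record group missing from the table (or a record with no record group) falls back to plain {{PD-USGov}}.

```python
from typing import Optional

# Hypothetical example entries; the real table would cover ~600 NARA record groups.
RECORD_GROUP_TEMPLATES = {
    111: "PD-USGov-Military-Army",
    80: "PD-USGov-Military-Navy",
}


def license_for(record_group: Optional[int]) -> str:
    """Return the most specific known PD-USGov template, else the generic one."""
    return "{{" + RECORD_GROUP_TEMPLATES.get(record_group, "PD-USGov") + "}}"
```

Keeping the fallback generic means an incomplete mapping never blocks an upload; it only leaves some files with the less specific template.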

Nearly all holdings of the US National Archives are in the public domain as a work of the federal government (or, otherwise, due to age). This is marked in the "use restriction" field in the catalog, with a value of "Unrestricted" indicating public domain determination by the archivists. Therefore, the script will be configured to skip over any records in which the use restriction is anything other than "unrestricted" (even "possibly" ones, which could ultimately be PD, but need a human determination).
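The skip rule is simple to express in code. The record layout here (a `useRestriction` object with a `status` string) and the non-"Unrestricted" status value in the example are assumptions for illustration, not the NARA catalog API's documented response format; only the rule itself comes from the text above.

```python
def is_unrestricted(record: dict) -> bool:
    """True only for an explicit "Unrestricted" determination; anything else is skipped."""
    status = (record.get("useRestriction") or {}).get("status", "")
    return status.strip().lower() == "unrestricted"


records = [
    {"naid": 1, "useRestriction": {"status": "Unrestricted"}},
    {"naid": 2, "useRestriction": {"status": "Possibly restricted"}},  # hypothetical value
    {"naid": 3},  # no use-restriction data at all: skipped, needs a human
]
to_upload = [r["naid"] for r in records if is_unrestricted(r)]  # [1]
```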


All uploads will be automatically categorized by the metadata template into Category:Media contributed by the National Archives and Records Administration and a category for the series they belong to (such as Category:US National Archives series: DOCUMERICA: The Environmental Protection Agency's Program to Photographically Document Subjects of Environmental Concern, compiled 1972 - 1977). Eventually, the script will be designed to create the series category if a file is uploaded for a series which does not yet have one.

When it comes to topical categories, past NARA uploads used the {{uncategorized}} tag to encourage the community to add topical categories. However, since this creates work for the community, this time around I am planning to run uploads in small batches (hundreds to a few thousand files at a time), so that I can upload them with one or more topical categories that apply to all records in the batch, rather than leaving them uncategorized.


You can find the upload bot's code at This project is being developed in public on NARA's official GitHub account. I would welcome collaboration (pull requests or otherwise) there. In addition, the Commons community is welcome to file issue reports on that repo.


The most recent test uploads can be viewed in Category:US National Archives series: American Unofficial Collection of World War I Photographs. I am still polishing the upload script, but these examples essentially represent what should be expected from the bot once it gets started.


The bot account is technically already flagged from the last bulk upload a couple of years ago; however, I would like to submit the current plan for community review before restarting uploads. If there are any opinions on the bot's design, the format of uploads, or other issues, I am happy to hear them. We'd also like to know whether to limit what is uploaded in any way: would Commons actually be interested in 15 million files, or might some of these, like the millions of census cards, not be of interest? Also, if anyone is interested in helping out with the coding or other tasks, please feel free to let me know. This is a big undertaking. Thanks! Dominic (talk) 17:25, 31 May 2017 (UTC)

Assigned to | Progress | Bot name | Category
User:Dominic | Coding | User:US National Archives bot | Category:Media contributed by the National Archives and Records Administration


Noideawhatiamdoing (talk) 02:21, 7 March 2017 (UTC)


Assigned to Progress Bot name Category

Imperial Encyclopaedia

I'm trying to upload with Commonist, but the software often loses its connection to the server, and when I restart it, it re-uploads files that have already been uploaded. I hope someone familiar with upload scripts could download the books and upload them.

  • Describe the works to be uploaded in detail (audio files, images by …):

The en:Gujin Tushu Jicheng (simplified Chinese: 古今图书集成; traditional Chinese: 古今圖書集成; pinyin: Gǔjīn Túshū Jíchéng; Wade–Giles: Ku-chin t'u-shu chi-ch'eng; literally: "Complete Collection of Illustrations and Writings from the Earliest to Current Times"), also known as the Imperial Encyclopaedia, is a vast encyclopaedic work written in China during the reigns of the Qing Dynasty emperors Kangxi and Yongzheng. It was begun in 1700 and completed in 1725. The work was headed initially by scholar Chen Menglei (陳夢雷), and later by Jiang Tingxi. The encyclopaedia contained 10,000 volumes. Sixty-four imprints were made of the first edition, known as the Wu-ying Hall edition. The encyclopaedia consisted of 6 series, 32 divisions, and 6,117 sections. It contained 800,000 pages and over 100 million Chinese characters. Topics covered included natural phenomena, geography, history, literature and government. The work was printed in 1726 using copper movable type printing. It spanned around 10 thousand rolls (卷). To illustrate the huge size of the Gujin Tushu Jicheng, it is estimated to have contained 3 to 4 times the amount of material in the Encyclopædia Britannica Eleventh Edition.

  • Which license tag(s) should be applied?

I have used {{PD-scan}}. Also see Commons:Village_pump/Copyright/Archive/2016/09#Photocopies_of_ancient_books.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

File:Gujin Tushu Jicheng, Volume 1 (1700-1725).djvu for example.

維基小霸王 (talk) 13:28, 8 October 2016 (UTC)
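For the problem described at the top of this request (Commonist re-uploading files after a lost connection), one common remedy is to keep a local log of what has already gone up and skip those files on restart. A minimal sketch, assuming a plain-text log of SHA-1 hashes; the upload call itself is left out and would be whatever tool is in use. SHA-1 is chosen because Commons also records a SHA-1 for every file, so the same hashes could later be checked against the site.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large DjVu files are not read into memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def pending(files: list, log: Path) -> list:
    """Files whose hash is not yet recorded in the log."""
    done = set(log.read_text().split()) if log.exists() else set()
    return [f for f in files if sha1_of(f) not in done]


def mark_done(path: Path, log: Path) -> None:
    """Append a file's hash; call this only after its upload has succeeded."""
    with open(log, "a") as out:
        out.write(sha1_of(path) + "\n")
```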


Assigned to Progress Bot name Category
  • Files copied locally.
  • Uploads started 16 October 2016. Initial uploads taking around 5 minutes / djvu file. Leading zeros are being used in the volume number to make sorting very easy to understand.
  • Upload run completed 18 October. There are known missing files from the original file share that will need to be added as a housekeeping task.
  • Further housekeeping tasks to be raised at Category talk:Gujin Tushu Jicheng.
NA Category:Gujin Tushu Jicheng

Catharijne Convent

  • Source to upload from:
    • Do the media URLs follow a pattern?
    • Does the site have an API? No
    • What else could ease uploading? (is the site valid XHTML, do they use a WCM…?)
    • Did you contact the site owner? Yes; this site was specifically set up for this particular upload, as the museum itself claimed not to be able to host the images online.
  • Describe the works to be uploaded in detail (audio files, images by …): 572 high-res TIFF files of objects held by the Museum Catharijneconvent in Utrecht, The Netherlands. The metadata were provided as XML and are available as well.

  • Which license tag(s) should be applied?

Probably [26].

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

AWossink (talk) 13:17, 14 April 2016 (UTC)


AWossink, I could look into uploading this one as well. The linked source currently has only one file; do you have the XML? Basvb (talk) 13:59, 14 April 2016 (UTC)
Ah, I see I likely misunderstood: you're planning to upload this batch yourself using GWToolset, Arne? See phabricator:T131841 Basvb (talk) 14:16, 14 April 2016 (UTC)
@Basvb: Yes, I was planning to do that in order to get some experience in the GWToolset. However, since this batch has been sitting around for quite a while and uploading with the GWtoolset seems to be a lot slower than your script, it would be great if you are still interested in taking care of this upload as well! Let me know if that suits you - otherwise I am happy to continue as I originally planned. Best, AWossink (talk) 14:26, 14 April 2016 (UTC)
Is it really that slow with GWToolset (more than a day for all of them)? The nice thing about GWToolset is that it can run while your PC is off. The file sizes are quite large compared to the Tropenmuseum upload, so any script will be slower. Basvb (talk) 15:04, 14 April 2016 (UTC)
Assigned to | Progress | Bot name | Category
AWossink | Resolved | Uploaded with Pattypan instead of GWT | Category:Textiles in Museum Catharijneconvent

Tropenmuseum Expeditions

First file uploaded as an example
Second file I uploaded from this set, as an example
  • Source to upload from: This spreadsheet. The metadata has been extensively cleaned and prepared by Spinster (talk). I have the files on my hard drive and can provide them or place them in a downloadable spot by request. Upload tips are in this Google doc.
    • Did you observe a URL pattern?


    • Do you know whether the site has an API?


    • What else can ease uploading (is the site valid XHTML, WCM they use…)?


    • Did you contact the site owner?

The metadata and images were provided to me by Richard van Alphen, coordinator of collection digitization at the Nationaal Museum van Wereldculturen (the new name of the museum into which the Tropenmuseum was merged).

  • Describe the works to be uploaded in detail (audio files, images by …):

720 photographs from the Tropenmuseum collection. All images in this upload are related to expeditions and missions, most notably

  1. the 1903 North New Guinea Expedition
  2. ethnomusicologist Jaap Kunst
  3. the Siboga Expedition
  4. various Christian missions in Africa, Suriname, Indonesia

The upload is part of the project Expedition Wikipedia.

  • Which license tag(s) should be applied?

Variations on {{KIT-license}}. See the spreadsheet: I already provided the correct license template for each image.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Please use {{Photograph}}, this is appropriate and the provided metadata fits nicely. See Sample file 1 and Sample file 2. More information and tips for the upload in this Google doc. Feel free to ask me any questions!

Spinster (talk) 12:26, 25 August 2015 (UTC)


Looks easy enough. I can take care of it, using my battle-tested MassUploadLibrary + GlamWiki Toolset. Jean-Fred (talk) 20:42, 3 September 2015 (UTC)

@Jean-Frédéric: Are you still looking at this? I may pick it up if not. This is already so nicely prepared that we ought to be able to knock it out easily. BMacZero (talk) 22:28, 29 February 2016 (UTC)
@BMacZero: @Spinster: contacted me again about it a couple of weeks ago; I decided not to take this on for lack of time (I should have updated here too, sorry about that) and suggested some alternatives. Spinster answered that she had someone else interested in this. Can you confirm, Spinster? (I certainly agree it’s very nicely prepared ;) Jean-Fred (talk) 15:11, 3 March 2016 (UTC)
@BMacZero: @Jean-Frédéric: Thanks for your offer to help! I know that Wikimedia Nederland is also looking for someone to do this upload, but perhaps if you have a quick routine you might be able to do it more easily than someone who is new to mass uploads. I'm emailing Arne Wossink from WMNL so that he can get in touch with you about this directly. Spinster (talk) 09:34, 6 March 2016 (UTC)
@Spinster: Thanks! Yes, I have a bot that's been doing batch uploads and I should be able to do this one in short order once I have the files. BMacZero (talk) 23:09, 6 March 2016 (UTC)
Arne asked me a few weeks ago to work on this batch upload, and I was planning to do so in the coming weeks. I have to look a bit into the easiest way to upload these. The files are on my PC (a 100 MB folder), not at a web URL (which makes GWToolset not really an option?). Should I proceed as planned, or would you like to work on this, BMacZero? Regards, Basvb (talk) 20:46, 7 March 2016 (UTC)
@Basvb: If it's all the same to you, I can take care of it pretty easily. I just write custom C# scripts for these, and it won't take long to adapt one to this upload. I can have it ready to go in a few days once I have the files. BMacZero (talk) 23:00, 8 March 2016 (UTC)
Ok, I've asked Arne whether he wants to pass it along to you. I'd likely have no problem finding a way of uploading; I just need to find a day to do so somewhere in the next few weeks, so if you're ready to go, that could speed up the process a bit. Basvb (talk) 17:40, 10 March 2016 (UTC)
If I can be of any use for uploading from a URL with GlamWikiToolset, just let me know. Hansmuller (talk) 16:36, 17 March 2016 (UTC)
I discussed it with Arne; he preferred that I do the upload, as that would help with understanding the Dutch. Hans, the images are not at a URL but on my computer, but I don't think I will have any issues uploading; thank you for offering to help. I just need to find a little time somewhere in the next two weeks. Regards, Basvb (talk) 19:21, 17 March 2016 (UTC)
Okay, got it. BMacZero (talk) 00:32, 19 March 2016 (UTC)
@Spinster: I have started working on the upload; I think I'll be ready to upload in a few hours (I've overcome most of the technical issues). I do have two questions/remarks. 1. The zip file I received from Arne contains 858 images, but the metadata document has only 720 rows with files. So I can upload 720 of them, but I'm curious whether the other 138 have also been released, or whether you still want to look into that, for example? 2. About the file names: the museum's name in all caps comes across as rather shouty. Also, I think it is more practical to put the title first and the museum name after it (so that people browsing categories can immediately see what the file depicts). That gives, for example, the following file name: "Children paddling the proa of protestant missionary Van Hasselt, ... - Collectie stichting Nationaal Museum van Wereldculturen - TM-60010069.jpg". Is this deviation from the naming OK, or would you prefer that I stick to the names provided? Perhaps the "collectie stichting" part could even be dropped to make the name a bit shorter and easier to handle? Finally, I think it would be nice to put all the files in a technical (hidden) category, so that you can keep tracking them with various tools. Something like "Images from the Nationaal Museum van Wereldculturen"? Regards, Basvb (talk) 19:00, 5 April 2016 (UTC)

I've uploaded 3 sample images: File:Mission exhibition in the Colonial Institute's museum - Collectie stichting Nationaal Museum van Wereldculturen - TM-10000436.jpg, File:Mission exhibition in the Colonial Institute's museum - Collectie stichting Nationaal Museum van Wereldculturen - TM-10000437.jpg and File:Tentoonstelling over Missie en Zending in het museum van het Kolo... - Collectie stichting Nationaal Museum van Wereldculturen - TM-10000438.jpg. If the files are good and we decide on an upload-specific category (or decide not to use one), I can start the upload, which will take about 2 hours. Basvb (talk) 20:11, 5 April 2016 (UTC)
@Basvb: Uploads look good!
  • One small detail: times end with 00:00:00, perhaps this can be omitted?
  • Only 720 images: that's correct; the other ones are not OK in terms of copyright.
  • OK to do the name of the collection at the end and to drop the ALL CAPS
  • But I would not take away the 'collectie stichting' - I think the museum does want this mentioned
  • And an invisible category is a great idea of course. How about Category:Files from the Nationaal Museum van Wereldculturen? Maybe at some point they'll do sound and video too.
Thanks for your efforts, highly appreciated :-) Spinster (talk) 10:11, 6 April 2016 (UTC)
Thanks for the quick reply. About the times: I'm thinking of fixing those after the upload is done (using VisualFileChange). Parsing all the different times beforehand is possible, but doing it after the upload is a little quicker. I'll add the category and start the upload. Basvb (talk) 12:21, 6 April 2016 (UTC)
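The date clean-up could also be done while preparing the metadata. A minimal Python sketch, assuming the spreadsheet dates look like `1925-03-01 00:00:00` (the actual format is not shown in this thread); values without a midnight time are passed through unchanged:

```python
import re

# Assumed format: an ISO-style date followed by a midnight time.
MIDNIGHT = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T]00:00:00$")


def strip_midnight(value: str) -> str:
    """Drop a trailing 00:00:00 time; leave any other value unchanged."""
    value = value.strip()
    match = MIDNIGHT.match(value)
    return match.group(1) if match else value


print(strip_midnight("1925-03-01 00:00:00"))  # 1925-03-01
print(strip_midnight("1925-03-01 14:30:00"))  # unchanged
```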
@Spinster: The upload is done. For 12 or 13 files there was a naming issue (invalid characters in the title); I'll upload those by hand as soon as possible. The frequently linked creator template for Gijs van der Sande is a red link; can you create it, Spinster? Basvb (talk) 15:35, 6 April 2016 (UTC)
I wrote down the numbers of the failed uploads and have fixed them now. However, it seems I've missed one, as there are currently only 719 files. Basvb (talk) 18:48, 6 April 2016 (UTC)
I searched the Google Doc for other candidates that might be missing (titles with a question mark, which caused the failed uploads because question marks are forbidden in Commons file titles), but there are none. I don't have a good idea how to find the last failed file easily. Basvb (talk) 19:00, 6 April 2016 (UTC)
OK, I've fixed all the dates, and everything is done (except for this one missing file, for which I don't know an efficient solution). Are you planning to post something in the Dutch village pump, Spinster? Anything else we could quickly do to make these files more usable? Basvb (talk) 18:10, 7 April 2016 (UTC)
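One way to find the last missing file is a set difference between the object numbers in the metadata and those parsed from the titles actually on Commons. A minimal sketch, assuming every title ends with an object number such as TM-10000436 (as in the sample titles above) and that the list of uploaded titles has been exported separately, for example from the upload category:

```python
import re

# Assumption: every title ends with an object number such as "TM-10000436".
OBJECT_ID = re.compile(r"([A-Z]+-\d+)\.\w+$")


def find_missing(expected_ids, uploaded_titles):
    """Return the expected object numbers that no uploaded title contains."""
    found = set()
    for title in uploaded_titles:
        match = OBJECT_ID.search(title)
        if match:
            found.add(match.group(1))
    return sorted(set(expected_ids) - found)


titles = [
    "File:Mission exhibition in the Colonial Institute's museum"
    " - Collectie stichting Nationaal Museum van Wereldculturen - TM-10000436.jpg",
    "File:Mission exhibition in the Colonial Institute's museum"
    " - Collectie stichting Nationaal Museum van Wereldculturen - TM-10000437.jpg",
]
print(find_missing(["TM-10000436", "TM-10000437", "TM-10000438"], titles))
# ['TM-10000438']
```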
@Basvb: thank you so much! I made the creator template for Gijsbert van der Sande and am going through the files to see if I can add more info (categories and such). We'll publicize this upload via all Wikimedia Nederland channels and via GLAM mailing lists and the GLAM newsletter, among others. Spinster (talk) 09:15, 11 April 2016 (UTC)
Sounds like a good plan, shall I move this request to the completed requests? Basvb (talk) 14:01, 14 April 2016 (UTC)
Assigned to | Progress | Bot name | Category
User:Basvb | Upload done | User:BasBot | Category:Files from the Nationaal Museum van Wereldculturen


  • Source to upload from:
    • Did you observe an URL pattern:
      Yes. Each photo on this site has an individual number from 1 to 145250 (the latest number as of 11:50, 12 October 2015 (UTC)) and is available at a link ending in <number of photo>. To get the direct URL of the JPG file:
      • left-pad the photo number with zeros to get a seven-digit number
      • remove the last digit
      • split the remaining six digits with "/" into three 2-digit groups
      • insert these groups between the base URL and the photo number without leading zeros, then append .jpg

For example, photo number 4 maps to 00/00/00/4.jpg under the base URL, and photo number 12345 maps to 00/12/34/12345.jpg.
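The URL rules above can be sketched in Python. The base URL is a hypothetical placeholder (`example.invalid`), because the real prefix is not preserved in this request; only the path arithmetic follows the described steps:

```python
# Hypothetical placeholder: the real base URL is not preserved in this request.
BASE_URL = "https://example.invalid/photos"


def trainpix_image_path(photo_id: int) -> str:
    """Relative JPG path for a photo number, following the steps above."""
    if not 1 <= photo_id <= 9_999_999:
        raise ValueError("photo number must fit in seven digits")
    padded = f"{photo_id:07d}"   # 12345 -> "0012345"
    prefix = padded[:-1]         # drop the last digit -> "001234"
    groups = "/".join(prefix[i:i + 2] for i in range(0, 6, 2))  # "00/12/34"
    return f"{groups}/{photo_id}.jpg"


def trainpix_image_url(photo_id: int, base_url: str = BASE_URL) -> str:
    return f"{base_url.rstrip('/')}/{trainpix_image_path(photo_id)}"


print(trainpix_image_path(4))      # 00/00/00/4.jpg
print(trainpix_image_path(12345))  # 00/12/34/12345.jpg
```

Since numbers of deleted photos also exist, a downloader looping over 1..145250 would need to tolerate missing files.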

    • Do you know whether the site has an API — No
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner? — No
  • Describe the works to be uploaded in detail (audio files, images by …):

Trainpix is a Russian photo gallery of railway rolling stock (the default language is Russian; the site is also available in English, Ukrainian and Belarusian). The first line under each photo gives the name of the rolling stock (model and number), which should be used as the file name on Commons (it can also be followed by the photo's number on this site, as with bot-uploaded photos from Flickr). The second line gives the place where the photo was taken. The third line gives the author's name and the date. In the left part of the page, each photo has a license tag (in Russian, Лицензия): copyright, zero, by, by-sa, by-nc, by-nc-sa, by-nd, by-nc-nd or mark. Any registered user of the site can select a license in the upload form (screenshot); the form also includes a "Please read before license selection" hyperlink to a page describing the Creative Commons 3.0 licenses. I do not understand what the mark license means (its description in the upload form is "No author rights") or whether it is similar to public domain, but all photographs with the free licenses zero, by and by-sa should be uploaded to Commons. Note that some numbers belong to deleted photos.

  • Which license tag(s) should be applied?
    • {{cc-zero}} — for images with license tag "zero"
    • {{cc-by-3.0}} — for images with license tag "by"
    • {{cc-by-sa-3.0}} — for images with license tag "by-sa"

  • Is there a template that could be used on the file description pages?


  • Do you think a special template should be created?

Maybe. Each rail vehicle on this site has description table with registration railway, model, current condition, etc.

Xenotron (talk) 15:34, 21 August 2015 (UTC)


Assigned to Progress Bot name Category

40,000 from zeno.org

Any chance of getting a new set of high-resolution images from Directmedia/zeno to replace the Yorck Meisterwerk Project images? Even in 2005 many of them looked completely inadequate because of the very low (40%) JPEG quality (see Commons talk:10,000 paintings from Directmedia/Requests for improvement). zeno.org has much better images available now. There are so far over 2700 images in Category:Images from, but they appear to have been uploaded ad hoc by individuals, not as part of a particular project (or by a particular bot). That compares to 10,374 in Category:PD-Art (Yorck Project), and zeno.org now claims to have 40,000 Meisterwerk images, which appear to be of much higher quality than the original set. No doubt the original 10,000 are included in the current 40,000.

Because the Directmedia/Yorck Meisterwerk Project was the first big batch upload of artwork, the images are heavily used throughout the wikipedia sites. It is precisely for this reason that updating them is important. So: is it possible to replace the original 2005 images on commons? If not then after uploading new images, all of the old images would need to be marked with {{superseded}} and all of the pages that use the old images updated to the new ones (by bot).

Would it be easier to get a new set of images directly from Directmedia as part of GLAM? They might even provide higher resolution images than what is on their own site.

Note that the site is in German, and they have a lot of other things besides paintings. Laura1822 (talk) 10:12, 13 July 2015 (UTC)

  • Source to upload from:   
    • Do the media URLs follow a pattern?   
    • Does the site have an API?   
    • What else could ease uploading? (is the site valid XHTML, do they use a WCM…?)   
    • Did you contact the site owner?   
  • Describe the works to be uploaded in detail (audio files, images by …):   
  • Which license tag(s) should be applied?   
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?   


Assigned to Progress Bot name Category


  • Describe the works to be uploaded in detail (audio files, images by …):
Images of the comet 67P/Churyumov-Gerasimenko taken by the NAVCAM on the Rosetta spacecraft.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Yann (talk) 14:32, 6 June 2015 (UTC)


Assigned to Progress Bot name Category

Archives Nationales (France)

As part of a partnership between the Archives nationales and Wikimédia France, 45 files were uploaded in September; here is another batch of 32.

Metadata output can be viewed at User:Jean-Frédéric/Sandbox2.

Considering the output quality, and unless there are strong oppositions, I’ll most certainly proceed with the upload in a few days from now.

Jean-Fred (talk) 23:43, 31 January 2014 (UTC)

Nice, but I wish we would get the complete documents, and at a higher resolution. ;o) Regards, Yann (talk) 03:44, 1 February 2014 (UTC)
I see :) Well, I’ll let them know :) Jean-Fred (talk) 00:57, 3 February 2014 (UTC)

  Done. Jean-Fred (talk) 00:57, 3 February 2014 (UTC)

Another batch of 69 files. Jean-Fred (talk) 09:58, 7 September 2014 (UTC)

Blast! There are four images whose copyright status is unclear. I had set them aside while the Archives investigate, but at upload time I used the wrong metadata file and uploaded them anyway!
Ah well. I have deleted them for now, and will undelete them if necessary. Jean-Fred (talk) 10:28, 7 September 2014 (UTC)
File:Photographie de l’arrivée du Tour de France sous la présidence de Valéry Giscard d’Estaing 1 - Archives Nationales - AG-5(3)-3493-3103.jpg
File:Photographie de l’arrivée de François Mitterrand à l’Élysée 1 - Archives Nationales - AG-5(4)-SPH-1-4641.jpg
File:Photographie de Levy du Palais du champ de Mars 1 - Archives Nationales - F-12-11911-10.jpg
File:Vœux du général de Gaulle le 31 décembre 1968 1 - Archives Nationales - AG-5(1)-1057-2080.jpg
  Done, 65 files uploaded. Jean-Fred (talk) 10:37, 7 September 2014 (UTC)

Another batch of 43 images. Code is on GitHub, ingestion template is still Commons:Archives_Nationales/Ingestion. Test output at Commons:Batch uploading/Archives Nationales (France)/Fourth batch. Jean-Fred (talk) 22:17, 16 May 2015 (UTC)

  Done --Jean-Fred (talk) 14:00, 19 May 2015 (UTC)
  Comment It would be better to add headers [27] in such uploads. Regards, Yann (talk) 14:50, 6 June 2015 (UTC)
Hi Yann, thanks for your feedback. I added the filedesc header to the template. For the license, I usually think it makes more sense to have it inside the Infobox template, but I have no strong opinion. What do you think? Jean-Fred (talk) 16:47, 6 June 2015 (UTC)
I think it's better to have it separate. Regards, Yann (talk) 17:00, 6 June 2015 (UTC)

Fifth batch. 100 files. Tests at Commons:Batch uploading/Archives Nationales (France)/Fifth batch. Jean-Fred (talk) 00:56, 3 October 2015 (UTC)

  Comment I note you have the letters page by page. You might want to combine them into one multipage document, to make it easier to create a Wikisource document. 13:06, 3 October 2015 (UTC)
That’s a good point. More often than not we only have one page though. :-( Wikisource integration is somewhere on the partnership roadmap, I will keep that in mind :) Jean-Fred (talk) 12:33, 11 November 2015 (UTC)
  Done 99 files uploaded in Category:Media contributed by the Archives Nationales (France)/5. For this one, I just added a link to the Archim database on all file pages, although some objects do not appear to have an entry there: I believe it is easier to test the link and remove it if necessary than the other way around. I might come up with a smarter way for the next batch. Jean-Fred (talk) 12:33, 11 November 2015 (UTC)

Eighth batch of 53 items (I’m a bit behind...). Tests at Commons:Batch uploading/Archives Nationales (France)/Eigth batch. (I’ll probably remove the autolink to the Archim database; it looks wrong too often in that batch.) Jean-Fred (talk) 00:22, 3 February 2016 (UTC)

I agree with removing autolink to Archim, I tested 30 files and only one correct link so far :( Léna (talk) 10:20, 3 February 2016 (UTC)
  Done Jean-Fred (talk) 00:20, 4 February 2016 (UTC)
Part | Assigned to | Progress | Bot name | Category
First batch (Sep 2013) | Jean-Frédéric | Done | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/1
Second batch (Feb 2014) | Jean-Frédéric | Done | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/2
Third batch (Sep 2014) | Jean-Frédéric | Done | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/3
Fourth batch (May 2015) | Jean-Frédéric | Done | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/4
Fifth batch (Sep 2015) | Jean-Frédéric | Done | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/5
Sixth batch (Oct 2015), "Août 14. Tous en guerre !" | Jean-Frédéric | On hold | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/6
Seventh batch (Nov 2015), "Napoléon et l'Empire" | Jean-Frédéric | On hold | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/7
Eighth batch (Dec 2015), "Le secret de l’État" | Jean-Frédéric | Done | ArchivesNationalesBot | Category:Media contributed by the Archives Nationales (France)/8


  • Source to upload from: NASA Earth Observatory
    • Did you observe an URL pattern

Yes and No. See: NASA EOL

    • Do you know whether the site has an API

No API Known.

    • What else can ease uploading (is the site valid XHTML, WCM they use…)?

I have written the uploader code, see Opinions.

    • Did you contact the site owner?

No, but I am eager to contact them if needed. Some files are larger than 60-70 MB.

  • Describe the works to be uploaded in detail (audio files, images by …):

The files are all geotagged images available on the NASA EOL website.

  • Which license tag(s) should be applied?


  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

No special template is needed; the NASA PD template can be used. Pouyana (talk) 22:03, 15 May 2015 (UTC)


The bot that uploads these files is hosted on Labs. It is written in Python and uses pywikibot to upload data. The code is available on GitHub. I have uploaded some files as a test with my own account, but a bot account can be created when needed. To avoid duplicates, I compare the SHA1 hash of each source file with the files on Commons.--Pouyana (talk) 22:03, 15 May 2015 (UTC)
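The SHA1 duplicate check can be illustrated without the bot's own code (which is on GitHub and not reproduced here). This stdlib-only sketch builds a request for the MediaWiki API's `list=allimages` module, which accepts an `aisha1` parameter, and interprets the decoded JSON reply; the network call itself is left out:

```python
import hashlib
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def duplicate_query_url(data: bytes) -> str:
    """API query listing Commons files whose SHA1 equals that of `data`."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex(data),
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def is_duplicate(response: dict) -> bool:
    """True if the decoded JSON response lists at least one matching file."""
    return bool(response.get("query", {}).get("allimages"))


print(duplicate_query_url(b""))
```

Commons stores the SHA1 of every file, so this lookup finds exact byte-for-byte copies only; a re-encoded or resized version of the same image will not match.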

Assigned to Progress Bot name Category


  • IVIA
    • Did you observe an URL pattern No
    • Do you know whether the site has an API
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?
  • Describe the works to be uploaded in detail (audio files, images by …): Videos about agriculture in the Land of Valencia

  • Which license tag(s) should be applied? The one stated in the Vimeo videos

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Coentor (talk) 07:17, 23 April 2015 (UTC)


I see no licensing information at all in the video descriptions. BMacZero (talk) 00:53, 3 June 2015 (UTC)

Found it. {{Cc-by-3.0-es}}. Are all of them the same license? BMacZero (talk) 00:57, 3 June 2015 (UTC)
Assigned to Progress Bot name Category Category:IVIA Videos

ETH Zürich Bildarchiv

As of 23 March 2015, the image archive of the ETH Zurich library is available for download: Whenever possible, the digital images in ETH-Bibliothek's Image Archive can be downloaded for free in web resolution and as high-resolution JPGs and TIFFs. The images can be used freely for scientific, private, non-commercial and commercial purposes. [28]

The images are mostly licensed under Public Domain Mark or CC BY-SA 4.0 (e.g. [29]). For some images, e.g. [30], the license is "unknown, users must clarify the copyright situation". These images are not available for download.

  • Describe the works to be uploaded in detail (audio files, images by …): All available images. License tags need to be checked.
  • Which license tag(s) should be applied? See above.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created? Category:Bildarchiv der ETH-Bibliothek Zürich exists already.

Ktotam (talk) 00:50, 30 March 2015 (UTC)


Assigned to Progress Bot name Category


There are lots of valuable public-domain paintings at the auctioneer Bukowskis. The painter's year of death is given for each painting.

The images can be licensed as PD-Art and categorized with Category:Images from Bukowskis. I look forward to adding further categories.

I have previously uploaded most of the PD paintings from one of their auctions manually.

Jonund (talk) 19:50, 26 February 2015 (UTC)

Maybe Category:Images from Bukowskis should be a hidden cat? It doesn't provide much useful info for most users, and it removes images from Category:Media needing categories, although they may be in need of meaningful categories. --Jonund (talk) 15:03, 1 March 2015 (UTC)


Assigned to Progress Bot name Category

Peter Parker's Lam Qua Paintings Collection

Hi, some of Lam Qua's medical portraits have been uploaded as part of the Wellcome collection, thanks to @:'s diligent work.

And there is more.

Can anyone upload them please?

Thanks, 維基小霸王 (talk) 03:12, 1 February 2015 (UTC)


Assigned to Progress Bot name Category

State Library of North Carolina

  • Description of Content

The State Library of North Carolina has over 10,000 photographs taken during a statewide survey of North Carolina's cultural heritage institutions, including libraries, museums, archives, and historic sites. The survey was conducted as part of the North Carolina Exploring Cultural Heritage Online project (NC ECHO), which surveyed all 100 counties in the state, and ultimately identified and photographed over 950 institutions between 2001 and 2009. The collection includes images of building interiors and exteriors, historical marker signage, displays, and exhibits. The State Library is also the official depository for all state agency publications, and has an extensive collection of digitized historical publications on North Carolina and its people.

The State Library is interested in adding the NC ECHO images to Wikimedia Commons (I, Retrent, work for the State Library), and we are particularly interested in doing so in a way that takes advantage of the recent and forthcoming integration of Commons and Wikidata, and the Structured Data Project.

The NC ECHO images are currently online through the North Carolina Digital Collections, a portal jointly managed by the State Library and State Archives of North Carolina. The images can be found at The images are described, and it is relatively easy to capture metadata and image links in whatever format is required for upload tools. State Library staff are available to facilitate.

Example images:

  • Which license tag(s) should be applied?

The photographs were created by a North Carolina government agency, and as such are public records and the State Library considers them to be in the public domain.

The Template:PD-SLNC license was developed by the State Library and Wikimedia DC as part of the Summer of Monuments Project: It is based on the State Library's official rights statement, and it may be used for these images.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Template:Photograph is likely sufficient. However, we are unsure how to take advantage of the recent integration of Wikidata and Commons, and whether we would need a special template to insert Wikidata links in fields like Depicted place. Once the images are uploaded, we'd like to add authority control links to the buildings/locations depicted in them.

Also, five NC ECHO images were uploaded manually as part of the Summer of Monuments project under the username NCandbeyond. These images were uploaded with Template:Information, and we are unsure of how to change the template so that we can enter additional metadata.

Retrent (talk) 16:56, 11 December 2014 (UTC)


Assigned to Progress Bot name Category

National Museum of Korea

  • Source to upload from: here (375 files, some already uploaded), here (3285 files, some duplicating the folder above), here (7277 files, some duplicating the folders above)
  • Describe the works to be uploaded in detail (audio files, images by …):

Images taken by the National Museum of Korea, depicting all of its heritage objects.

  • Which license tag(s) should be applied?


  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

{{NMoK}} should be used after the source link, and {{Institution:국립중앙박물관}} should be author.  revimsg 04:54, 14 November 2014 (UTC)


Assigned to Progress Bot name Category

The Digital South Asia Library

For Bond photos, the direct URL to the images is
For Keagle, it is
The "NNNN" is a 4 digit number.

For Hensley, it is The "x" is a lowercase letter, and the "NNN" is a 3-digit number.
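Because the actual URL templates are not preserved here, the following sketch only enumerates the identifier patterns described above (NNNN for Bond and Keagle, a lowercase letter plus NNN for Hensley) and plugs them into a hypothetical template; most generated Hensley IDs will probably not exist, so a downloader would have to skip missing files:

```python
import string
from itertools import product

# Hypothetical template: the actual DSAL image URLs are not preserved here.
URL_TEMPLATE = "https://example.invalid/{collection}/{image_id}.jpg"


def four_digit_ids(start: int = 0, stop: int = 10_000) -> list:
    """Zero-padded NNNN identifiers (Bond and Keagle collections)."""
    return [f"{n:04d}" for n in range(start, stop)]


def hensley_ids() -> list:
    """xNNN identifiers: one lowercase letter followed by three digits."""
    return [f"{letter}{n:03d}"
            for letter, n in product(string.ascii_lowercase, range(1000))]


def candidate_urls(collection: str, ids) -> list:
    return [URL_TEMPLATE.format(collection=collection, image_id=i) for i in ids]


print(candidate_urls("bond", four_digit_ids(1, 3)))
```

Given the later comments on rights, only the Bond and Keagle enumeration is likely to matter, and even that depends on the stated usage restrictions.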

    • Do you know whether the site has an API


    • What else can ease uploading (is the site valid XHTML, WCM they use…)?


    • Did you contact the site owner?


  • Describe the works to be uploaded in detail (audio files, images by …):

All the files are images taken of South Asia (mostly India and Burma) during World War II by American servicemen Glenn S. Hensley, Robert Keagle and Frank Bond.

  • Which license tag(s) should be applied?


  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

The regular "Information" template should be fine. Co9man (talk) 08:44, 10 October 2014 (UTC)


I harvested all the Bond [31] and Keagle [32] metadata (about 700 images total). Hensley is difficult because the search seems to be broken and the IDs don't follow the same pattern as the others - I let them know, maybe they will fix it soon. BMacZero (talk) 05:24, 17 May 2015 (UTC)
I've been told that Hensley's photos were not taken as part of his official duties, and therefore aren't public domain. In addition, while the Bond and Keagle images are technically public domain, the text at the bottom of [33] lists some restrictions that would make the images unsuitable for Commons. Not sure if or how to proceed. BMacZero (talk) 16:55, 17 May 2015 (UTC)
Assigned to Progress Bot name Category

Musée des Augustins

Over several months the Musée des Augustins (fine arts museum in Toulouse) uploaded its media collections to Commons (see Category:Media_contributed_by_the_Musée_des_Augustins_de_Toulouse).

I am finishing this work with the items in the reserves: 683 paintings and 921 sculptures.

Alignment of artist names is done at Commons:Musée des Augustins/alignment/ARTIST reserves.

Test of metadata is at Commons:Batch uploading/Musée des Augustins/test.

Looking forward to your opinions, Jean-Fred (talk) 22:10, 26 September 2014 (UTC)


Good work. Thank you. --Slick (talk) 04:59, 23 October 2014 (UTC)
Interesting, but it is a pity that the images are so small. Regards, Yann (talk) 15:25, 23 April 2015 (UTC)

@Jean-Frédéric: Is this complete (i.e. can it be removed from the Batch Uploading list)? BMacZero (talk) 19:29, 29 February 2016 (UTC)

Assigned to Progress Bot name Category

Rubin Kazan - Llevant UD

  • Source to upload from:
    • Did you observe an URL pattern
    • Do you know whether the site has an API
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?
  • Describe the works to be uploaded in detail (audio files, images by …):

Photo gallery of a historic match for Levante UD. Good-quality images of players who may not have any better portraits.

  • Which license tag(s) should be applied?

The usual license.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?


54 images. Images are at where NNNNN is 57099 to 57152. There is no per-image metadata. BMacZero (talk) 15:43, 1 July 2015 (UTC)
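With no per-image metadata, the whole job reduces to walking the numeric range. A minimal sketch; the URL template and the file-naming scheme are hypothetical, since the real gallery address is not preserved here:

```python
# Hypothetical template: the real gallery address is not preserved here.
URL_TEMPLATE = "https://example.invalid/gallery/{number}.jpg"
FIRST, LAST = 57099, 57152  # inclusive range given above


def gallery_urls(first: int = FIRST, last: int = LAST) -> list:
    return [URL_TEMPLATE.format(number=n) for n in range(first, last + 1)]


def file_title(index: int) -> str:
    """Hypothetical naming scheme, since there is no per-image metadata."""
    return f"Rubin Kazan - Llevant UD 15-03-2013 {index:02d}.jpg"


urls = gallery_urls()
print(len(urls))  # 54
```

Every file would then share one description and the match category, with player categories added by hand.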

Assigned to Progress Bot name Category
Category:Rubin Kazan-Llevant 15-03-2013


This upload request was ignored: SVG files from Openclipart

    • Did you observe an URL pattern
    • Do you know whether the site has an API
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?

I contacted Wolfgang Spraul on 22 January 2014 and he told me that the best thing I can do is "to tell all your friends about Openclipart. :-) Tweet, Blog, say thanks from wikitranslate, but most importantly tell them in person."

When I asked him about crediting the project for the work, I was told to add a link to the homepage.

Natkabrown (talk) 14:55, 9 August 2014 (UTC)

  • Describe the works to be uploaded in detail (audio files, images by …):

Public domain clipart files in SVG format.

  • Which license tag(s) should be applied?


  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Rillke(q?) 10:16, 9 August 2014 (UTC)

{{PD-OpenClipart}} as the permission template, which also adds a hidden category. User: Perhelion (Commons: = crap?) 14:30, 9 August 2014 (UTC)
All SVG files from openclipart (I've downloaded the 1.4GB dump for Aug 9th) seem to contain embedded metadata with the Author, a link to the source page for the file, and a bunch of openclipart-specific "tags". I think the key to importing the images is finding a good mapping from the oca tags to commons categories. --Dschwen (talk) 15:15, 9 August 2014 (UTC)
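Reading that embedded metadata could look like the sketch below. It assumes the common Inkscape-style RDF layout (dc:creator/cc:Agent/dc:title, dc:source, dc:subject/rdf:Bag/rdf:li), which has not been checked against every file in the dump; the tag-to-category mapping is a hypothetical placeholder for the real mapping discussed above:

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://creativecommons.org/ns#",
}

# Hypothetical mapping; building the real one is the hard part.
TAG_TO_CATEGORY = {"apple": "Apples", "fruit": "Fruit"}


def read_svg_metadata(svg_text: str) -> dict:
    """Author, source and tags from the RDF block of an SVG file."""
    work = ET.fromstring(svg_text).find(".//cc:Work", NS)
    if work is None:
        return {"author": None, "source": None, "tags": []}
    tags = [li.text.strip()
            for li in work.findall("dc:subject/rdf:Bag/rdf:li", NS)
            if li.text and li.text.strip()]
    return {
        "author": work.findtext("dc:creator/cc:Agent/dc:title", namespaces=NS),
        "source": work.findtext("dc:source", namespaces=NS),
        "tags": tags,
    }


def categories_for(tags) -> list:
    return [TAG_TO_CATEGORY[t] for t in tags if t in TAG_TO_CATEGORY]


SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:cc="http://creativecommons.org/ns#">
  <metadata><rdf:RDF><cc:Work>
    <dc:creator><cc:Agent><dc:title>example-artist</dc:title></cc:Agent></dc:creator>
    <dc:source>https://example.invalid/detail/1</dc:source>
    <dc:subject><rdf:Bag><rdf:li>apple</rdf:li><rdf:li>fruit</rdf:li></rdf:Bag></dc:subject>
  </cc:Work></rdf:RDF></metadata>
</svg>"""

meta = read_svg_metadata(SAMPLE)
print(meta["author"], categories_for(meta["tags"]))
```

Tags without a mapping are simply dropped here; in practice they would be collected and counted to decide which mappings are worth adding.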


Assigned to | Progress | Bot name | Category
Dschwen | Writing the code to analyze the metadata | DschwenBot |

Digitaler Portraitindex

Zoomify images at Example:
    • Do you know whether the site has an API
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?
  • Describe the works to be uploaded in detail (audio files, images by …):
257,000 high-resolution Zoomify images of printed portraits of Early Modern figures. These are very useful for Wikipedia articles.
  • Which license tag(s) should be applied?
{{PD-Art-100}}, I believe all images were created in the Early Modern period.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
I could create one if this batch upload is possible

Jfhutson (talk) 01:41, 22 May 2014 (UTC)


  Comment They use Zoomify, so it should generally be possible to get the highest resolution of the images. --Slick (talk) 09:35, 26 June 2014 (UTC)
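A sketch of how the highest-resolution tiles of a Zoomify image could be located. It assumes the usual Zoomify layout: an ImageProperties.xml with WIDTH, HEIGHT and TILESIZE attributes, tiles named `z-x-y.jpg`, and folders `TileGroupN` holding 256 tiles each, numbered across all zoom levels from the smallest up. Halving the size with floor division between levels is my assumption and should be checked against a real image before relying on it; stitching the tiles (e.g. with Pillow) is not shown:

```python
import math
import xml.etree.ElementTree as ET


def read_image_properties(xml_text: str):
    """WIDTH, HEIGHT and TILESIZE from a Zoomify ImageProperties.xml."""
    props = ET.fromstring(xml_text).attrib
    return int(props["WIDTH"]), int(props["HEIGHT"]), int(props.get("TILESIZE", 256))


def zoomify_levels(width: int, height: int, tile_size: int = 256) -> list:
    """(columns, rows) for each zoom level, smallest level first."""
    levels = []
    w, h = width, height
    while True:
        levels.append((math.ceil(w / tile_size), math.ceil(h / tile_size)))
        if w <= tile_size and h <= tile_size:
            break
        w, h = w // 2, h // 2  # assumption: halve with floor division
    return list(reversed(levels))


def max_level_tile_urls(base_url: str, width: int, height: int,
                        tile_size: int = 256) -> list:
    """URLs of all tiles at the highest zoom level, row by row."""
    levels = zoomify_levels(width, height, tile_size)
    top = len(levels) - 1
    offset = sum(c * r for c, r in levels[:-1])  # tiles in all smaller levels
    cols, rows = levels[-1]
    urls = []
    for y in range(rows):
        for x in range(cols):
            group = (offset + y * cols + x) // 256  # 256 tiles per TileGroup folder
            urls.append(f"{base_url.rstrip('/')}/TileGroup{group}/{top}-{x}-{y}.jpg")
    return urls


width, height, tile = read_image_properties(
    '<IMAGE_PROPERTIES WIDTH="1000" HEIGHT="500" NUMTILES="11" TILESIZE="256" />')
print(max_level_tile_urls("https://example.invalid/portrait", width, height, tile)[:2])
```

The NUMTILES attribute offers a cheap sanity check: the tile counts summed over all computed levels should equal it.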

Assigned to Progress Bot name Category

Geograph Deutschland

All images uploaded to this site are licensed under CC BY-SA 2.0. Some time ago, a (small) batch of images was already uploaded here into Category:Images from the Geograph Deutschland project. Would it be possible to transfer all of the nearly 50,000 images uploaded so far to Commons? Would it also be possible to upload newly added images automatically from time to time? Given the project's goal of "collecting geographically representative photos for every square kilometre of Germany", almost all of the images are relevant to Wikipedia as well.

P170 (talk) 16:11, 21 March 2014 (UTC)


Assigned to Progress Bot name Category

Glitch Artwork

*.zip

(but directory is not traversable)

    • Do you know whether the site has an API


    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?


  • Describe the works to be uploaded in detail (audio files, images by …):

Images and animations in .png, .swf, and .fla format, archived in .zip files

  • Which license tag(s) should be applied?
    This file is made available under the Creative Commons CC0 1.0 Universal Public Domain Dedication.
The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Lepidoptera (talk) 11:21, 13 February 2014 (UTC)


  Comment I reviewed this, and only the first ZIP file ([34], 3557 PNGs, ~1.5 KB each, ~30x30 pixels) contains usable file formats. I don't know whether it is possible (or useful) to convert the FLA or SWF files in the other ZIP files in a batch; I cannot convert them. I am also not sure this is within scope, because these are thousands of small pieces (especially in the first file I reviewed). I'd like to get a second opinion, please. --Slick (talk) 19:29, 24 February 2014 (UTC)

Assigned to Progress Bot name Category


  • Source to upload from:

    • Did you observe an URL pattern

A list of all applicable seals is currently being created, either as plain URLs or as a file (e.g. CSV) with descriptions, tags, etc.

    • Do you know whether the site has an API

The site uses MediaWiki software.

    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?

This is our own site.

  • Describe the works to be uploaded in detail (audio files, images by …):

This site contains a huge collection of historic letter seals from German governmental and administrative authorities.

  • Which license tag(s) should be applied?
  This image is in the public domain according to German copyright law because it is part of a statute, ordinance, official decree or judgment (official work) issued by a German federal or state authority or court (§ 5 Abs.1 UrhG).



  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Veikkos-archiv (talk) 07:43, 16 January 2014 (UTC)

We need help selecting the best upload process. Did we understand correctly that we have to wait for an admin to appear in the "Assigned to" field?


Assigned to Progress Bot name Category

Kurt Rasmussen

  • Source to upload from: Kurt Rasmussen,
    • Did you observe an URL pattern
      • where xxxx is a four-digit number
    • Do you know whether the site has an API
      • I think not.
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
      • Essentially the algorithm that needs to be done is:
        • for each page in the linked search results
          • for each div class="bildvorschau" in the search results
            • download the url given in the first a href=, use this URL as source in the final information template
            • in the now downloaded file, find div class="bildcontainer"
            • in it, from the p class="beschreibung", extract the description to be used in the final information template
            • from the img tag immediately following it, download the url in the src attribute
            • upload it to Commons
  • Describe the works to be uploaded in detail (audio files, images by …):
    • All images by Kurt Rasmussen.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
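The scraping steps listed above could be sketched with Python's standard `html.parser`. The class names (bildvorschau, bildcontainer, beschreibung) come from the request; the page structure is otherwise assumed and untested against the real site, and fetching pages and uploading (e.g. with pywikibot) are deliberately left out:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def _classes(attrs: dict) -> list:
    return (attrs.get("class") or "").split()


class PreviewLinks(HTMLParser):
    """Collects the first <a href> inside each <div class="bildvorschau">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0          # nesting depth inside the current bildvorschau div
        self._have_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "bildvorschau" in _classes(attrs):
                self._depth, self._have_link = 1, False
        elif tag == "a" and self._depth and not self._have_link and attrs.get("href"):
            self.links.append(attrs["href"])
            self._have_link = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


class DetailPage(HTMLParser):
    """Description and image URL from <div class="bildcontainer">."""

    def __init__(self):
        super().__init__()
        self.description = ""
        self.image_src = None
        self._depth = 0
        self._in_desc = False
        self._desc_done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "bildcontainer" in _classes(attrs):
                self._depth = 1
        elif self._depth and tag == "p" and "beschreibung" in _classes(attrs):
            self._in_desc = True
        elif self._depth and tag == "img" and self._desc_done and self.image_src is None:
            self.image_src = attrs.get("src")  # first <img> after the description

    def handle_endtag(self, tag):
        if tag == "p" and self._in_desc:
            self._in_desc, self._desc_done = False, True
        elif tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._in_desc:
            self.description += data


def parse_detail(html: str, page_url: str) -> dict:
    """Fields for the information template; fetching and uploading are omitted."""
    page = DetailPage()
    page.feed(html)
    return {
        "source": page_url,
        "description": " ".join(page.description.split()),
        "image_url": urljoin(page_url, page.image_src) if page.image_src else None,
    }
```

Tracking div depth rather than looking only at the first closing tag keeps nested divs inside a preview box from ending it early.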

In parallel, I am also trying to coordinate a manual upload of this huge collection of extremely valuable photos; for that, see User:Darkweasel94/Rasmussen. darkweasel94 13:42, 13 December 2013 (UTC)


Assigned to | Progress | Bot name | Category
darkweasel94 | finished coding, will upload in the next days | will probably do this from my own user account | Category:Files uploaded by darkweasel94 (cleanup) (also contains other stuff)

Bildarchiv Austria

  • Source to upload from: Bildarchiv Austria
  • Describe the works to be uploaded in detail (audio files, images by …): All photos there that have a date before 1912. I think this should include everything tagged "um (= around) 1900" or earlier, but not "um 1910" per COM:PRP (because this might include stuff that was taken in 1913)
  • Which license tag(s) should be applied? {{PD-Austria-1932}} {{PD-1996}}
  • Is there a template that could be used on the file description pages? Do you think a special template should be created? I don't know if we need anything more specific than {{information}}.

It isn't urgent that this be done because this source isn't likely to disappear, but if somebody wants to hack together something to get these photos with adequate description, I'd be very grateful because there are many useful photos of Austrian history there. darkweasel94 10:04, 3 December 2013 (UTC)

  • Examples: info collection, single image, raw image. The maximum free download size is 800px; larger images have to be paid for. Maybe it is possible to request a higher resolution for Commons? --Slick (talk) 14:46, 3 December 2013 (UTC)
    • It might be, but that would very much harm their business, so I wouldn't find it especially likely that they will do so. The 800px versions would already be very nice to have, but if we can get more, that's of course even better. ;) darkweasel94 15:14, 3 December 2013 (UTC)


Assigned to | Progress | Bot name | Category


As part of a partnership with Wikimédia France, the Musées de la Haute-Saône (codenamed Champlitte) are sharing part of their collection on Wikimedia Commons. Jean-Fred (talk) 23:25, 24 November 2013 (UTC)


The first tests look great. Is there anything that needs to be done for alignment? I don't see any subject in the source records in Joconde that could be used to decide which categories to use, so does everything go into the museum category, and we categorize afterwards? Symac (talk) 09:52, 26 November 2013 (UTC)

Thanks Symac for reviewing this.
The dataset provided by the museum was just a sample, with merely 136 records. Looking at the metadata export, I did not see any obvious candidate for an alignment − but that may be because of the sample size. I did not see any good source for categorisation, unfortunately − but that may be because I get a bit confused with the numerous Joconde fields.
There is some parsing work to do, though. Size should be pretty much done; a reverse look-up table together with a split-match-and-apply approach should work for technique (modulo the metadata confusion); the remaining fields do not seem to be good candidates for that either. Less work to do, it seems :) Jean-Fred (talk) 22:05, 11 December 2013 (UTC)
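The "reverse look-up table plus split-match-and-apply" idea for the technique field could look roughly like the sketch below. The French terms, the `sur` separator and the English {{Technique}} keywords in the table are illustrative assumptions, not taken from the museum's data. The keywords should also be checked against the {{Technique}} documentation.

```python
import re

# Hypothetical reverse look-up table from French technique terms to {{Technique}}
# keywords; the real one would be built from the museum's own vocabulary.
TECHNIQUE_TERMS = {
    "huile": "oil",
    "toile": "canvas",
    "bois": "wood",
    "papier": "paper",
}


def technique_to_wikitext(raw):
    """Split "<medium> sur <support>", match each part, and apply {{Technique}} only if all match."""
    text = raw.strip()
    parts = [p.strip().lower() for p in re.split(r"\s+sur\s+", text, maxsplit=1, flags=re.IGNORECASE)]
    keywords = [TECHNIQUE_TERMS.get(p) for p in parts]
    if all(keywords):
        return "{{Technique|" + "|".join(keywords) + "}}"
    # Any unmatched part: keep the original French text, marked as French, for manual review.
    return "{{fr|1=" + text + "}}"
```

The template is applied only when every part matches, which avoids half-translated output. Anything else is kept as raw French text inside {{fr|1=...}}.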
If after two weeks there are no more concerns than mine (which you answered perfectly), I think it would be a good idea to ask the provider for more data to take this partnership further. Symac (talk) 07:18, 12 December 2013 (UTC)
Okay, we had a phone call with the museums last week; the project is back on track. We will make use of the GLAMwiki Toolset Project. The current target is to proceed with the upload at the end of January. Jean-Fred (talk) 23:22, 21 December 2013 (UTC)
Update: This is still happening. The museum folks are experimenting right now with the GWToolset on Commons Beta. Jean-Fred (talk) 10:20, 5 February 2014 (UTC)


Redux: GWToolset

We are getting very close to pushing files here. User:Tounoki from the museum is managing this.

Test files:


Jean-Fred (talk) 13:34, 11 April 2014 (UTC)

Some quick feedback based on the examples above.
  1. If the descriptions are always in French then a {{fr|1=<description>}} should be wrapped around it.
  2. "lieu de création" (currently part of the object history) should be mapped against the "place of creation" parameter instead
  3. Date should use {{other_date}} if possible. Unsure if GWToolset supports this, but looking at the JSON, the source data might have sufficient structure.
  4. Measures should use {{size}}. Looking at the JSON, the source data should have sufficient structure.
  5. In the Sabot images
    1. The license seems to have disappeared
    2. an empty Creator template is used
    3. the institution template seems to be broken
    4. apostrophes (') seem to have been replaced by "&#39;" (a plain ' is used everywhere else)
/Lokal_Profil 14:55, 11 April 2014 (UTC)
Hi André, thanks a lot for your feedback.
  1.   Done. I used {{Original caption}}, as it labels the text as a description; we need to put other things into this field.
  2.   Done
  3. I gave a try at parsing the date but it does not capture the complexity of the dates (for example « début 14e siècle » is only parsed to 14th century)
  4.   Done Right.
As for the Sabot images I’m not sure what happened there − maybe User:Tounoki would know?
Jean-Fred (talk) 16:07, 14 April 2014 (UTC)
Assigned to | Progress | Bot name | Category

Fonds Eugène Trutat bis

Follow-up of Commons:Batch uploading/Fonds Eugène Trutat. The archives provided the rest of the Fonds Eugène Trutat, and in better resolution. Jean-Fred (talk) 19:48, 19 October 2013 (UTC)



Some of the metadata is processed using manual alignment.

   Done − Let’s say it’s good enough as it is :-þ


All right, I think we are all set. Tests have been updated on /test.

Here is what we are looking at, categorisation-wise. Note that these numbers only account for the categorisation made through the alignment; all files are at least in Category:Fonds Trutat - Archives municipales de Toulouse (plus a bunch of hidden ones).

Per category
  • The program added 912 categories, 92 distinct ones
  • The most used category is on 405 files
  • The least used is on 1 file
  • On average, a category is used 9.9 times (mean)
  • The median is 2.5
Per file
  • The most categorized file has 5 categories
  • The least categorized file has 0 categories
  • We have 10 uncategorized files
  • We have 289 files with two categories or more, which makes 60.2%
  • On average, a file has 1.9 categories (mean)
  • The median is 2.0
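Statistics like the ones above can be computed from a mapping of each file to the categories the alignment assigned to it. This sketch assumes that input shape; the function name and result keys are hypothetical. As a sanity check, 912 categories spread over 92 distinct ones gives the reported mean of about 9.9.

```python
from collections import Counter
from statistics import mean, median


def categorisation_report(file_categories):
    """Summarise alignment-based categorisation, per category and per file.

    file_categories maps each file name to the list of categories the alignment
    added (an empty list for files it did not categorise).
    """
    usage = Counter(cat for cats in file_categories.values() for cat in cats)
    counts = list(usage.values())                        # files per category
    per_file = [len(cats) for cats in file_categories.values()]  # categories per file
    multi = sum(1 for n in per_file if n >= 2)
    return {
        "categories_added": sum(counts),
        "distinct_categories": len(counts),
        "most_used": max(counts),
        "least_used": min(counts),
        "mean_per_category": round(mean(counts), 1),
        "median_per_category": median(counts),
        "max_per_file": max(per_file),
        "min_per_file": min(per_file),
        "uncategorised_files": per_file.count(0),
        "files_with_2_or_more": multi,
        "share_with_2_or_more_pct": round(100 * multi / len(per_file), 1),
        "mean_per_file": round(mean(per_file), 1),
        "median_per_file": median(per_file),
    }
```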

Jean-Fred (talk) 00:00, 18 November 2013 (UTC)

First one uploaded: File:(Avignon (Vaucluse). Remparts) - Fonds Trutat - 51Fi480.jpg. Will proceed with the rest shortly. Jean-Fred (talk) 23:14, 20 November 2013 (UTC)
  • Ten more done; here they are (using the awesome {{MyUploads/grid-photostream}} ;-) Jean-Fred (talk) 00:11, 22 November 2013 (UTC)
  • 40 more done. I’ll probably fire off the rest at the end of the weekend if there are no further remarks. Jean-Fred (talk) 23:58, 22 November 2013 (UTC)
  • All new ones done. Still have the ones uploaded 3 years ago to update. Jean-Fred (talk) 17:39, 3 December 2013 (UTC)
Assigned to | Progress | Bot name | Category
User:Jean-Frédéric | | User:TrutatBot |

Spread the sign

This is what the films look like. This one shows the Swedish sign for apelsin (orange).

  • Describe the works to be uploaded in detail (audio files, images by …):
    • Spreadthesign has around 150,000 films of signs in 16 different languages and is continuing to make more films in new languages. They want to share their films to help raise awareness about sign language and to make better use of the material they have. The films are of high quality, but they have yet to decide what resolution and format they want to upload.

  • Which license tag(s) should be applied?
    • CC-BY-SA

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
    • It would probably be a good idea to have a special template. One thing I'm thinking about is a field that allows connecting films with the same word in different languages for example.

Axel Pettersson (WMSE) (talk) 10:36, 29 August 2013 (UTC)


In a first meeting, Lokal_Profil and I had the following comments on the code:

  • Make the filename $word_$language-spreadthesign.ogv
  • Create categories with $wordclass (or what it might be called to have nouns, verbs and such in separate categories)
  • Create the categories Videos of sign language in $language and $language sign language
  • Make the description more dynamic and at least in English and the language of the film. Help with l10n might be needed here, although they have a network of partners in all the languages they have films in.
That’s a cool project! Awesome :)
Just a few rapid thoughts
  • I gather from the Gist that source links all follow the same pattern; it might be worth creating {{STS link|$vid}} to create the link (whose label could be i18n).
  • Not sure what you need a special template for. Connecting to similar films could be done through the other versions field. Am I missing something?
  • License: File:67329.webm is tagged with CC-BY.
  • Author: do we have better metadata for that? « own work » does not really cut it. Who should be attributed here?
All for now. Hope that helps!
Jean-Fred (talk) 21:06, 29 August 2013 (UTC)
Thanks for your comments, Jean-Fred; we also feel it's an important project.
  • We want to use one correct template to create the Commons pages for all 150,000 uploaded films.
  • Yes, we have better data. « own work » is gone; each file name will be $word_$language-spreadthesign.ogv as suggested above, with a better, dynamic (language-aware) description too.
Spreadthesign 10:21, 30 August 2013 (UTC)
Updated bot code.
  • New version is available here
Spreadthesign 10:21, 30 August 2013 (UTC)
Updated bot code.
  • Changes link source here
  • We will use CC-BY-SA-3.0 for the upload.
Spreadthesign 09:33, 2 September 2013 (UTC)
Updated bot code.
  • Added support for wordclass category [[category:verb]] here
Spreadthesign (talk) 10:45, 3 September 2013 (UTC)
Updated bot code.
Example: Apelsin spreadthesign.ogv
SpreadthesignBot (talk) 08:35, 4 September 2013 (UTC)
I created {{STS-cooperation}}. Please help translate it and put it in the code somewhere. /Axel Pettersson (WMSE) (talk) 10:14, 5 September 2013 (UTC)
A few things: 1) Which language codes are you planning to use? It makes sense to use the sign language code and not the spoken/written language code (so swl for Swedish Sign Language, not sv). 2) In the bot code I don't see any simple way to cancel uploading or to pause between uploads. You need a way to shut it down in case of "emergency", and in the beginning you should start uploading at 1-2 files per minute and gradually increase that. 3) Are the videos already encoded as OGV? If not, you might consider using WebM instead, since that gives better quality, but it's no big deal. Skalman (talk) 08:04, 6 September 2013 (UTC)
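One way to get both an emergency stop and a slow start is sketched below. The sketch rests on two assumptions, neither of which comes from the SpreadthesignBot code. First, `upload` stands for whatever function actually sends a file to Commons (hypothetical here). Second, the stop mechanism is simply the presence of a file named `STOP` in the working directory.

```python
import os
import time


def throttled_upload(items, upload, per_minute=1.0, stop_file="STOP", sleep=time.sleep):
    """Call upload(item) for each item, at most per_minute times per minute.

    Before each upload the loop checks for stop_file; creating that file
    (e.g. `touch STOP`) halts the run. Returns the number of items uploaded.
    """
    interval = 60.0 / per_minute
    done = 0
    for i, item in enumerate(items):
        if i:
            sleep(interval)  # pause between uploads, not before the first one
        if os.path.exists(stop_file):
            break  # emergency stop requested
        upload(item)
        done += 1
    return done
```

A first run could use `per_minute=1`, with the rate raised step by step once the first files have been checked.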
Wordclass (ordklass in Swedish) is called part of speech in English, POS for short. As I understand it, filenames on Commons need to be unique, and the same word can belong to different parts of speech. So if I am not wrong, I suggest that the part of speech should also be part of the filename. It should also be possible to add another optional distinguisher if two or more signs are used for the same word with the same part of speech.
Regarding the order $word-$language, isn't it better to state the $language first, like $language-$word-$pos-$optional_dist-spreadthesign.webm/ogv? Please give me your thoughts.
I don't know if this matters, but working with Wiktionary I am used to the fact that capitalization matters. Is there a reason not to name the files with the word in lowercase, like swl-apelsin-noun-spreadthesign.webm/ogv? ~ Dodde (talk) 17:16, 6 September 2013 (UTC)
Thanks all.
To Skalman.
  • We were planning to use the language code (e.g. sv for Swedish, Svenska) for naming the files simply because we are able to support it; now we may add support for sign language codes.
  • Yes, we thought we could limit the number of files processed by limiting how many rows are fetched from the database in SQL. As you mentioned, we are going to start with a few records and then increase the amount.
  • All videos we are going to upload are in Flash, MP4 and/or OGV format.
Spreadthesign 13:31, 11 September 2013 (UTC)
To Dodde.
  • Today we cannot support using POS for naming the files, I'll create another distinguisher.
  • I can't see any benefit in putting $language before $word or the other way around. If there is any, please tell me.
  • Capitalization matters, I guess it was just a typo or test case.
Spreadthesign (talk) 19:51, 11 September 2013 (UTC)
A matter of organizing the files, I suppose. It's easier to see which language the word belongs to if the language code comes first. In a listing, words of the same language would be grouped together. That is all I can say.
In order to be able to insert entries for the signs, the information regarding part of speech needs to be present. Is this information present somewhere else (in some database), or is it expected the person who runs the bot should manually decide or sometimes guess for each word before creating the entry? ~ Dodde (talk) 22:54, 11 September 2013 (UTC)
For pronunciation files, I believe that the language always comes first, see Category:Pronunciation. It makes sense to use the same system for sign language videos.
If you're not using the actual sign language codes, what do you intend to do with written languages that are covered by multiple sign languages, such as English (US, UK)? Are you using codes such as en-uk? I really think it'd be easier and more correct to just use the sign language code.
Regarding capitalization: The first letter of a file name is always capitalized here (which is another reason to have the lang code first). However, File:Coca-cola spreadthesign.ogv should actually have a capital C in the middle (as well as the lang code).
I feel that it'd be good if somebody who is part of the Commons community commented on this as well (I'm only familiar with the Swedish language Wiktionary). Skalman (talk) 06:26, 12 September 2013 (UTC)
Updated bot code.
  • Added support for sign language codes in file names. File names will now look like swl-apelsin-spreadthesign.ogv (the sign for orange in Swedish Sign Language) or ase-orange-spreadthesign.ogv (the sign for orange in American Sign Language). The code is here.

SpreadthesignBot (talk) 12:16, 12 September 2013 (UTC)

Great work with the upload preparations. A few thoughts though.
  • Are the words always unique? I.e. the Swedish banan (banana/the track) could technically have two signs which would each end up becoming swl-banan-spreadthesign.ogv. A way around this would be to append the internal STS-id. In this case banana would become swl-banan-spreadthesign-98036.ogv. This also solves the issue with e.g. apelsin having three separate videos.
    • As a follow up to the three different apelsin videos. Will the information "vanligast/lika vanlig/används i Norrland" which distinguishes the three be included somehow?
  • The license used should probably be {{cc-by-sa-3.0|[ Spread the Sign]}} instead of {{self|cc-by-sa-3.0}}. What I did was to bake this into the {{STS-cooperation}} template so that this can be used as the permission parameter instead. This template also includes the Media contributed by Spread the sign-category meaning that it should also be removed from the github code.
  • The license should be complemented by an e-mail being sent to stating that STS are the owners of the material uploaded by User:Spreadthesign and releases it under cc-by-sa-3.0. Once this is properly registered the id can be added to {{STS-cooperation}}.
  • As for using a special information template. Since the information is structured (POS, language etc.) and there are so many videos it might actually help localisation to use a purpose created template. What I'm thinking is that if there is e.g. a parameter for POS then the language mapping could be done directly in the template (similar to how {{Technique}} works). Similar thing could be done with the languages etc. Other opinions on this would be welcome though.
/André Costa (WMSE) (talk) 15:12, 16 September 2013 (UTC)
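The naming scheme that emerges in this thread (sign language code first, then the word, then `spreadthesign` and the internal STS id) can be expressed as a small function. The sketch makes two assumptions. The mapping from the site's written-language codes to ISO 639-3 sign language codes is hypothetical and covers only three languages (swl, ase, bfi). A `+` in the source word is treated as a URL-encoded space.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from the site's written-language codes to ISO 639-3
# sign language codes (swl = Swedish SL, ase = American SL, bfi = British SL).
SIGN_LANGUAGE_CODES = {"sv": "swl", "en-us": "ase", "en-gb": "bfi"}


def commons_filename(lang, word, sts_id, ext="ogv"):
    """Build <sign-language>-<word>-spreadthesign-<id>.<ext>."""
    code = SIGN_LANGUAGE_CODES[lang.lower()]
    # '+' in the source data stands for a space (URL form encoding), not part of the word.
    clean_word = " ".join(unquote_plus(word).split())
    name = f"{code}-{clean_word}-spreadthesign-{sts_id}.{ext}"
    # MediaWiki always capitalises the first letter of a page title.
    return name[0].upper() + name[1:]
```

Only the first letter is upper-cased, which matches what MediaWiki does to titles anyway. Mixed-case words such as Coca-Cola therefore keep their internal capitals. Characters that are not allowed in Commons file names are not handled here.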
Using a template to enter POS and any other information would be very helpful when inserting videos+descriptions automatically into Wiktionary entries, as long as it's well-structured. Skalman (talk) 08:48, 17 September 2013 (UTC)
Yeah, Spreadthesign is back. Thank you all for your thoughts, advice and assistance. We are very excited to begin uploading our material very soon.
Now for the questions:
  • All files should be unique; I added a distinguisher to each file name.
  • I'm not quite sure how the license should be if not {{self|cc-by-sa-3.0}}. Please explain!
  • {{STS-cooperation}} was complemented, id release is added.
  • We understand how useful it would be to use the POS, but today this is not possible due to lack of support in the db.
  • Updated bot code is available here

SpreadthesignBot (talk) 13:29, 30 September 2013 (UTC)

In the code you should change permission from {{CC-BY-SA 3.0}} to {{STS-cooperation}}. Then it will be as in Apelsin spreadthesign.ogv with complete cooperation, license and OTRS-templates.
I still think the description should include both English and the language of the film. Probably something like {{Multilingual description|en=$desc $categoryLanguage|(something that finds out language)=($desc in the language of the film $language in the own language)}}, as it would be helpful to non-English communities.
The bot request is here. Please help with the approval there.
/Axel Pettersson (WMSE) (talk) 12:22, 3 October 2013 (UTC)
Please don’t use {{Multilingual description}}, please favor {{sv|...}}{{en|...}}. The behaviour of Mld is automatically triggered when there are more than N languages in the description field (don’t remember right now how much is N). Jean-Fred (talk) 13:12, 3 October 2013 (UTC)
Sorry about that, I didn't know. On the other hand, if the description field only has two languages, English and the language of the film, will it be triggered then? Or maybe it doesn't matter, as there are only two languages there. We still have the problem of inserting the right language code there, or is there an existing solution somewhere? /Axel Pettersson (WMSE) (talk) 13:39, 3 October 2013 (UTC)

Way to go, File:Bho-make+a+reservation-spreadthesign-9982.ogv is up and running. Some thoughts:

  • No need for + in filenames, it should be Bho-make a reservation-spreadthesign-9982.ogv
  • Format the upload like this
  • The description should state that it's British Sign Language. Something like "Book a table in a restaurant at a particular time so that you can eat a meal in British Sign Language."
  • Categories should be on one line each
  • Categories should (probably) be category:British English sign language and category:Videos of sign language in British English for Bho.

/Axel Pettersson (WMSE) (talk) 09:02, 21 October 2013 (UTC)
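Several of the review points can be combined in one page builder. These are separate {{en|...}} and {{sv|...}} language templates instead of {{Multilingual description}}, {{STS-cooperation}} as the permission, and one category per line. This is a sketch: the function name and arguments are hypothetical. It uses the standard {{Information}} template rather than a purpose-built one.

```python
def description_page(descriptions, source_url, author, categories):
    """Assemble wikitext for one sign video's file description page.

    descriptions maps a language code to a description, e.g. {"en": ..., "sv": ...};
    each one is wrapped in its own language template.
    """
    desc = "".join("{{" + lang + "|1=" + text + "}}" for lang, text in descriptions.items())
    lines = [
        "=={{int:filedesc}}==",
        "{{Information",
        "|description=" + desc,
        "|source=" + source_url,
        "|author=" + author,
        "|permission={{STS-cooperation}}",  # carries the license and OTRS information
        "}}",
        "",
    ]
    # One category per line, as requested in the review.
    lines += ["[[Category:" + cat + "]]" for cat in categories]
    return "\n".join(lines)
```

The `1=` form keeps a description that itself contains an `=` sign from being misread as a named template parameter.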

A few more points:
  • The name of the language is British Sign Language; any categories should probably not include the word "English". The categories Axel suggested should probably be category:British Sign Language and category:Videos in British Sign Language (though I don't understand the difference between them; are both needed?)
  • I am confused as to which language this is. "bho" is the language code of Bhojpuri - British Sign Language has the language code "bfi".
  • The link back to should not be a Swedish language link. ->
  • Where does the description/definition "Book a table in a restaurant at a particular time so that you can eat a meal" come from? On I only see the text "make a reservation" (+the translations to other written languages). I believe you can "make a reservation" at a hotel too, so which description is accurate?
Skalman (talk) 12:23, 21 October 2013 (UTC)


  • + in a file name is a bug and it's fixed already.
  • The description will be complemented
  • Categories as well.
  • "bho" comes from I think I got it right.
  • Good point about the link back to will fix it before the next test upload.
  • Sign descriptions come from our colleagues around the world, who get the chance to help since they know best what each sign means in their own language.

SpreadthesignBot (talk) 19:02, 21 October 2013 (UTC)

@SpreadthesignBot: That's an old version of Ethnologue. Starting with Ethnologue 15 it's "bfi". See here for the current version: On sv-wikt we use ISO 639-3 if Wikimedia doesn't have a special code (and I believe that the new version is the same as ISO 639-3). Skalman (talk) 08:40, 22 October 2013 (UTC)
To Skalman

Thanks a lot Skalman, my mistake.

SpreadthesignBot (talk) 12:13, 22 October 2013 (UTC)

Any status update? Skalman (talk) 11:51, 15 November 2013 (UTC)

Reboot in December

I've added some new movies

New description, wordclass in the categories and some more issues solved. Please have a look and comment here. /user:SpreadthesignBot (through Axel Pettersson (WMSE) (talk) 10:15, 6 December 2013 (UTC))

No comments after waiting for a few days. Moving along with some more uploads now, but feel free to interrupt or comment as we move along. /Axel Pettersson (WMSE) (talk) 10:54, 9 December 2013 (UTC)
Hey, I just had a quick look. Looks very good, not much to say; please upload more!
One feature request, just for the pleasure of asking for the impossible ;-). I see descriptions are provided in English and Swedish, good. But I see that translations are available in many more languages on the STS website; for example 81278, if inserted with a /de/, gives /de/81278/ which says “personen”. Any chance to fetch all those and add them to the file description page? :) Jean-Fred (talk) 11:48, 9 December 2013 (UTC)
My suggestions:
  • Put the word in quotes (e.g. "annat" på svenskt teckenspråk)
  • In English, language names use capital letters, so it should be Swedish Sign Language (but "svenskt teckenspråk")
  • I'm wondering about File:Swl-annat-spreadthesign-73566.ogv - "annat" in Swedish does not mean "else" in English (annat=other, annars=else). I hope such mistakes are uncommon, but it would be nice to know what the actual meaning is - should we assume that for Swedish Sign Language videos, the Swedish description is (most likely to be) correct? Is there a good place to report errors?
Nice to see some activity here! Skalman (talk) 16:12, 9 December 2013 (UTC)

How is it going? Do you need help uploading? I might be able to help. Are there other considerations? Skalman (talk) 14:22, 14 January 2014 (UTC)

Files uploaded during test period

These files should be deleted and uploaded again later with the correct name and format.


Assigned to | Progress | Bot name | Category
Axel Pettersson (WMSE), Spreadthesign | coding | SpreadthesignBot | Media contributed by Spread the sign


See Com:DPLA for an overview of the project. The DPLA has metadata for over 2 million records; sadly only a portion of these are PD. User:Bdcousineau is going through collection by collection to reveal PD materials. See Com:DPLA for the list.

  • Source to upload from:
    • Did you observe a URL pattern?
    • Do you know whether the site has an API?
The DPLA has an API that is available for use; however, it is a metadata repository. The source files will be linked from the local website. See Commons:Bots/Requests/Smallbot (10) for sample templating, etc. The bot operator retired before the upload was begun.
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?
Given the mission of the DPLA, there may be no need. The DPLA has representation on the project page. Project coordinator is happy to contact site owners as needed, if needed.
  • Describe the works to be uploaded in detail (audio files, images by …):
JPEG and TIFF files.
  • Which license tag(s) should be applied?
Depending on the collection, either a {{PD-US}} tag, or a {{PD-USGov}}.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created? Depending on the collection {{Artwork}}, {{Photo}}, {{Book}}. We've also created a preliminary institution tag that will be adjusted to reflect the owning institution.

Bdcousineau (talk) 00:59, 5 August 2013 (UTC)
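A minimal sketch of querying the DPLA API for candidate records follows. The `https://api.dp.la/v2/items` endpoint, the `q`, `api_key`, `page` and `page_size` parameters, and the `docs` list in the JSON response are assumptions based on the public DPLA API documentation and should be verified. The key itself has to be requested first. Because the API holds only metadata, each record would still have to be followed to the contributing institution's site for the actual file.

```python
import json
from urllib.parse import urlencode

# Assumed DPLA API v2 search endpoint; verify against the current API documentation.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def dpla_search_url(query, api_key, page=1, page_size=100):
    """Build a search URL for one page of DPLA item records."""
    params = {"q": query, "api_key": api_key, "page": page, "page_size": page_size}
    return DPLA_ITEMS_URL + "?" + urlencode(params)


def records(response_text):
    """Return the item records from a JSON search response (assumed to be under "docs")."""
    return json.loads(response_text).get("docs", [])
```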


I'd be happy to assist with the task. We do, however, need to establish a good way to handle this. Perhaps a specific template should be created that holds all the notes on the linked nom. This way we would have more control over the licensing should we desire to make slight updates. Also, was the code to retrieve the files ever created before? -- とある白い猫 ちぃ? 10:31, 8 September 2013 (UTC)

Hi, thanks! Please know I'm not a techie, so what is "linked nom"? For the initial batch we were working with, we mimicked the templating created by prior uploads (see Commons:Bots/Requests/Smallbot (10); there is a sample of the JSON source on that page as well). I can see, however, that an overarching template will be needed. As far as the licensing goes, all the materials we started to work with are {{PD-USGov}}; the others will be different.
I guess the big question is, where would you like to start? With where we left off, or with a smaller batch? A smaller batch makes the most sense, in that the templating and licensing can be adjusted as you suggest. The Massachusetts Digital Commonwealth has a few smaller collections that are PD, totalling approximately 1500 items.
To be clear, disclaimer: I am a NARA employee - this project has nothing to do with my official duties, nor does it reflect official policy, etc etc. Bdcousineau (talk) 12:27, 8 September 2013 (UTC)
OK so perhaps we should do this like a Q&A to avoid mistakes. I meant the Commons:Bots/Requests/Smallbot (10) when I stated "linked nom".
  1. At this repository do we have a variety of licenses? If so is there a list of it? Can we easily distinguish the license of each file?
  2. Were any files copied to commons with a bot before? Or was that never the case? I'd rather avoid re-engineering code if one already exists. What exactly do you mean by "where we left off"
  3. Does this repository grow in size? If so how often do we need to update?
  4. Do you have a link to the API and example sample images?
-- とある白い猫 ちぃ? 13:04, 8 September 2013 (UTC)
Much easier, thanks!
  1. For this project, only PD materials are appropriate. Some will be {{PD-USGov}}, others {{PD-art/1923}}, and other will be {{PD-1923}}. In general, each collection will have the same license for each of their files - for example, the ARTStor files (10K files) will be {{PD-art/1923}}, and the MassDigCommonwealth will be {{PD-1923}}. Even though the DPLA has huge number of files, only a small percentage are PD. Yes, easily distinguishable. Also, I can generate any list you'll need collection by collection.
  2. No, no files copied to Commons yet. "Where we left off": group consensus asked previous uploader to upload small sample batch for further review. Task not completed. Most likely that previous work is useless, and should be ignored.
  3. Yes, DPLA grows in size, both inside each collection and as new service hubs/partners are added. There is no consistent wording for licensing, either: licensing is developed at the donor level and is wildly various. Last time I checked, searching by licensing field was not an option; PD mapping is done by hand. Since the project has a DPLA contact (user:SJ), it might be possible to get better access to the rate at which material is added to DPLA. Can this be put off for a moment? Also, since the project does have a DPLA connection, it's reasonable that at some point he needs to be drawn into the consensus process around templating, etc., especially if DPLA-specific templates are developed.
  4. You have to get a key for the API here. Sample images: DPLA is broken, can't get any search results. Will try again later today.
New: The DPLA Dev team and others associated with the DPLA were excited when we contacted them about this (April 2013)... so I am assuming we can get some support from them if needed. Bdcousineau (talk) 14:18, 8 September 2013 (UTC)
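Each collection carries a single license, and the PD mapping is done by hand. A per-collection lookup table may therefore be enough to decide which records can go up and which still need review. In this sketch, the collection keys are shorthand labels from this discussion, not official DPLA identifiers.

```python
# License tag per collection, as described above; keys are informal labels.
COLLECTION_LICENSES = {
    "ARTStor": "{{PD-art/1923}}",
    "MassDigCommonwealth": "{{PD-1923}}",
}


def partition(records):
    """Split (collection, record_id) pairs into ready-to-upload and needs-review lists.

    Records from collections that have not been reviewed yet are held back.
    """
    ready, review = [], []
    for collection, record_id in records:
        tag = COLLECTION_LICENSES.get(collection)
        if tag is None:
            review.append(record_id)
        else:
            ready.append((record_id, tag))
    return ready, review
```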
One possibility is them uploading to Flickr and I can use existing code to receive it. They can throttle their internet usage with this way too to prevent outages as the bot would be relentless (since I don't know their upload limits). They can for instance use . For the script to work they must release it with a free license. If they are willing to do this option, I wouldn't need to code. Or they can upload directly to commons of course. I just am curious if they are unwilling to do either. -- とある白い猫 ちぃ? 21:25, 8 September 2013 (UTC)
Hmmm... that level of support is unlikely, it'll be more like a thumbs-up/pat on the back/yes go for it. IMHO I don't think the DPLA is in the business of pushing the files out once the service hubs sign on, they are strictly a repository; while a great angle, this version of the plan is prolly a non-starter. It'll be up to Wikimedians to figure out a way to bring the files to Commons. BTW I really appreciate having this discussion, thanks. Bdcousineau (talk) 23:59, 8 September 2013 (UTC)
Well, I need sample images, urls etc to work with. -- とある白い猫 ちぃ? 20:42, 11 September 2013 (UTC)

Ok, will try by Saturday, surely by Sunday am. Tied up til then. Thanks so much. Bdcousineau (talk) 01:02, 12 September 2013 (UTC)

Please do not hurry, I am rather busy with real world affairs until more or less the end of this month. This is an issue that needs to be handled with time and care anyways. -- とある白い猫 ちぃ? 21:12, 13 September 2013 (UTC)
Great! Here are sample urls of declared PD materials:
All the metadata from the DPLA's API is PD. Let me know if this is useful, and what you needed. Bdcousineau (talk) 22:15, 15 September 2013 (UTC)
Assigned to | Progress | Bot name | Category

Museum of History of Photography

Museum of History of Photography contains an online collection of public domain photographs in two categories: Photos and Equipment. It would be valuable for Commons and Wikipedia projects, as it includes:

  • old cameras and camera equipment,
  • historical context of geographic places,
  • photographs of notable people,
  • photographs by professional photographers; different techniques of photography,
  • etc.
  • Source to upload from:
    • There is an URL pattern.
    • Site does not provide any API.
    • What else can ease uploading?
      • There are also books available at the website (scanned pages as JPG). There are sections which can be downloaded optionally: BOOKS, BULLETINS, POSTERS, CALENDARS, ARCHIVE
  • Which license tag(s) should be applied?
    • Public domain. The website says: "Copyright: Exhibits in the Public Domain MHF allows visitors to make full use of the digitalised images of the museum exhibits in the formats published by the Museum in the public domain on condition that the source and creators are acknowledged. The terms and conditions of the accessibility of superior quality digital images of museum exhibits are stated in The Rules of Accessibility of the MHP Collections and the Price List."
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
    • Author:
    • Subject:
    • Geographical location:
    • Dating:
    • Genus:
    • Technique:
    • Dimensions:
    • Inventory number:

dariusz woźniak (talk) 07:37, 16 July 2013 (UTC)
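The fields listed above could be mapped onto a Commons information template. This sketch assumes {{Artwork}} with the parameters `artist`, `description`, `depicted place`, `date`, `object type`, `medium`, `dimensions` and `accession number`. Both the choice of template and the mapping are assumptions (for example, Genus to `object type`). Parameter names should be checked against the template documentation. Keeping `artist` and `source` filled also addresses the museum's condition that the source and creators be acknowledged.

```python
# Assumed mapping from the museum's metadata labels to {{Artwork}} parameters.
FIELD_MAP = {
    "Author": "artist",
    "Subject": "description",
    "Geographical location": "depicted place",
    "Dating": "date",
    "Genus": "object type",
    "Technique": "medium",
    "Dimensions": "dimensions",
    "Inventory number": "accession number",
}


def artwork_template(record, source_url):
    """Render one museum record (label -> value) as an {{Artwork}} call.

    Labels not in FIELD_MAP, and empty values, are skipped.
    """
    lines = ["{{Artwork"]
    for label, param in FIELD_MAP.items():
        value = record.get(label, "").strip()
        if value:
            lines.append(f"|{param}={value}")
    lines.append(f"|source={source_url}")
    lines.append("}}")
    return "\n".join(lines)
```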


Assigned to | Progress | Bot name | Category

Rubin Kazan - Llevant UD

  • Source to upload from:
    • Did you observe a URL pattern?
    • Do you know whether the site has an API?
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?
  • Describe the works to be uploaded in detail (audio files, images by …):

Photo gallery from of a historic match for Levante UD. Good-quality images of players who may not have any better portraits.

  • Which license tag(s) should be applied?

The usual license.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?


54 images. Images are at where NNNNN is 57099 to 57152. There is no per-image metadata. BMacZero (talk) 15:43, 1 July 2015 (UTC)
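Since the files are numbered consecutively, the 54 download URLs can be generated from the pattern. The actual URL pattern is not reproduced on this page, so this sketch takes it as a parameter, with `NNNNN` as the placeholder; the function name is hypothetical.

```python
def image_urls(url_template, first=57099, last=57152):
    """Expand a URL pattern containing NNNNN into one URL per image id (inclusive range)."""
    return [url_template.replace("NNNNN", str(n)) for n in range(first, last + 1)]
```

With no per-image metadata, every file would share the same description and category.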

Assigned to | Progress | Bot name | Category
| | | Category:Rubin Kazan-Llevant 15-03-2013


The renovated en:Rijksmuseum in Amsterdam has made its digital collection of 111,000+ objects available under a CC0 license. An API key is needed for downloading. According to the museum:

"All object descriptions available via this API are covered by a Creative Commons 0 licence. The images are in the Public Domain, according to which the data and the images are free of rights and may be copied, changed, distributed or exported without the Rijksmuseum’s permission."

Sandstein (talk) 20:40, 7 April 2013 (UTC)

  • Describe the works to be uploaded in detail (audio files, images by …): Presumably the entire collection is of use. According to "The Rijksmuseum API Collection is a set of more than 110,000 descriptions of objects (metadata) and digital images from the Rijksmuseum collection. The works of art and implements in the set date from ancient times through to the late 19th century and provide an excellent overview of the richness, diversity and beauty of the Dutch and international heritage. Unfortunately, copyright restrictions mean that we are not yet able to include any works from the 20th or 21st centuries. The set includes paintings and prints (ranging from the great masters of the Golden Age through to anonymous biblical paintings and other painted objects from the Middle Ages), 19th-century photographs, ceramics, furniture, silverware, doll’s houses, miniatures, etc. Digital photographs were taken of all of the objects in this set."
  • Is there a template that could be used on the file description pages? Do you think a special template should be created? Museum:Rijksmuseum. Also, the following should probably be taken into account even though we are not an app: "In all apps to be built in which images belonging to the Rijksmuseum are used, app designers will credit these as having been built with the API of the Rijksmuseum, including images and documentation. The credit must be placed where it can be seen easily by users. App-builders will credit all images with the words ‘Rijksmuseum collection’."


  • I'm quite aware of this awesome collection. Haven't uploaded it yet because we're planning to use it as pilot for Commons:GLAMwiki Toolset Project. Not sure when this will happen exactly, probably in the next months. Multichill (talk) 10:42, 13 April 2013 (UTC)
Assigned to Progress Bot name Category

Fonds Ancely

This upload is part of a partnership between Wikimédia France and the Library of Toulouse. It consists of 2085 public domain files. You may see general notes and work in progress on User:Jean-Frédéric/Ancely.

The metadata is held in an OAI-PMH repository. The code harvests it and retrieves the records. Where applicable, the various fields are then matched against a manual, community-curated alignment of Commons categories and tags. The result is fed to a data ingestion template, which translates the metadata to {{Artwork}}. The actual upload is made with Pywikipedia-rewrite by User:AncelyBot.
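The category-alignment step described above can be sketched with the standard library alone. The sketch assumes records in the common `oai_dc` (Dublin Core) format and uses the standard OAI-PMH and Dublin Core namespace URIs. The alignment table and the sample record are made up for illustration and do not come from the actual Ancely data.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical excerpt of an alignment table: source keyword -> Commons category.
ALIGNMENT = {
    "Montagnes": "Mountains in art",
    "Hautes-Pyrénées": "Hautes-Pyrénées",
}

# Made-up OAI-PMH record for illustration.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Vue des Pyrénées</dc:title>
      <dc:subject>Montagnes</dc:subject>
      <dc:subject>Hautes-Pyrénées</dc:subject>
      <dc:subject>Sans correspondance</dc:subject>
    </oai_dc:dc>
  </metadata>
</record>"""


def categories_for(record_xml: str) -> list:
    """Return the aligned Commons categories for a record's dc:subject values."""
    record = ET.fromstring(record_xml)
    categories = []
    for subject in record.findall(".//oai_dc:dc/dc:subject", NS):
        category = ALIGNMENT.get((subject.text or "").strip())
        if category and category not in categories:
            categories.append(category)
    return categories


print(categories_for(SAMPLE))
```

Subjects without an entry in the table are silently dropped here. A real run would probably log them so that the alignment can be extended.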

In its current state, the categorisation system with the alignment outputs 31,801 categories (1,694 distinct) − the drawback is that many are high-level categories (“Shawls”, “men”, etc.)

Looking forward to your thoughts, Jean-Fred (talk) 22:49, 6 March 2013 (UTC)


  • Uploaded five more − see Special:ListFiles/AncelyBot Jean-Fred (talk) 01:14, 16 March 2013 (UTC)
  • Uploaded fifteen more − and I will continue uploading files until my demands are met! Jean-Fred (talk) 00:23, 19 March 2013 (UTC)
  •   Support Everything looks fine to me (maybe a bit overcategorized). --PierreSelim (talk) 14:24, 20 March 2013 (UTC)
  • Ok, uploading 100 right now. Jean-Fred (talk) 21:06, 11 April 2013 (UTC)
  • Looks very good. The only thing that worries me a bit is the number of categories per image. That might become a problem. Please upload more! Multichill (talk) 10:39, 13 April 2013 (UTC)
  •   Oppose now, we have forgotten to finish the Creator mapping User:Jean-Frédéric/Ancely/Creator --PierreSelim (talk) 12:02, 25 April 2013 (UTC)
  • Uploaded the first 350. Jean-Fred (talk) 23:08, 7 May 2013 (UTC)
  • Uploaded the first 500. Jean-Fred (talk) 13:04, 8 May 2013 (UTC)
  • Uploaded the first 800. Jean-Fred (talk) 14:05, 10 May 2013 (UTC)
  • Made it 1,000. Jean-Fred (talk) 23:15, 12 May 2013 (UTC)
  •   Done. 2041 files uploaded + 33 dupes + 11 errors = 2085 files, the size of the corpus. Jean-Fred (talk) 14:49, 24 May 2013 (UTC)



The following files were already on Commons − we might want to update their file descriptions (current: 33)


The following files failed to upload (current: 11)

Categorisation statistics
Per category

30,266 category assignments, 1,760 distinct categories. Files per category: mean 17.1965909091, median 2, max 1,045, min 1.

Top 10: Mountains in art (1045), Men in art (992), Women in art (878), Trees in art (780), Houses in art (736), Pyrénées-Atlantiques (693), Hautes-Pyrénées (617), Pyrenees (470), National costumes in art (468), Rivers in art (440)

Bottom 10: Estrades (1), Pierre Bayle (1), Morlaàs (1), Louis-François Couché (1), Jean Racine (1), Faience in France (1), Marmite (1), Corsica (1), Dordogne River (1), Esera River (1)

Per file

Categories per file: mean 14.5160671463, median 13, max 47, min 0.

Top 5: B315556101_A_LEVASSEUR_066 (47), B315556101_A_LEVASSEUR_068 (46), B315556101_A_LEVASSEUR_018 (44), B315556101_A_LEVASSEUR_056 (42), B315556101_A_LEVASSEUR_057 (42)

Bottom 5: B315556101_A_BERTHIER_010 (1), B315556101_A_BERTHIER_024 (0), B315556101_A_BERTHIER_021 (0), B315556101_A_BERTHIER_018 (0), B315556101_A_BERTHIER_013 (0)
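Statistics like these can be recomputed from a mapping of file name to assigned categories. The real mapping is not published here, so the sketch takes it as an assumed input. The two reported means follow directly from the totals. Files per category is 30266 / 1760. Categories per file is 30266 / 2085, which matches the 2,085-file corpus.

```python
from collections import Counter
from statistics import mean, median


def category_stats(file_categories):
    """Summarise how categories are spread over files.

    file_categories maps each file name to the list of categories assigned to it.
    """
    per_category = Counter(cat for cats in file_categories.values() for cat in cats)
    per_file = [len(cats) for cats in file_categories.values()]
    return {
        "assignments": sum(per_category.values()),
        "distinct": len(per_category),
        "files_per_category_mean": mean(per_category.values()),
        "files_per_category_median": median(per_category.values()),
        "top": per_category.most_common(10),
        "categories_per_file_mean": mean(per_file),
        "categories_per_file_median": median(per_file),
        "categories_per_file_max": max(per_file),
        "categories_per_file_min": min(per_file),
    }


# The reported means follow directly from the totals:
print(30266 / 1760)  # files per category
print(30266 / 2085)  # categories per file
```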

Assigned to | Job | Progress
Jean-Frédéric | Metadata pre-processing | Done
Jean-Frédéric, Symac, Léna, PierreSelim | Metadata alignment | Done
User:Jean-Frédéric | Upload | Done
(none) | Dupes and errors processing | To do

South African churches

User af:Gebruiker:Morne has uploaded hundreds of perfect images of buildings in South Africa (mostly churches) to the Afrikaans Wikipedia, all under the same licence: "you are free to use, copy, modify, if you properly credit the author" (see an example). I consider this important, as there are unfortunately relatively few images of South African cities, towns and villages in Wikipedia. --Dmitri Lytov (talk) 03:21, 3 March 2013 (UTC)

  • Describe the works to be uploaded in detail (audio files, images by …):

It's a collection of several hundred images of churches in South Africa.

  • Which license tag(s) should be applied?

"you are free to use, copy, modify, if you properly credit the author" (see an example).

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

Sorry, no idea.


Assigned to Progress Bot name Category

National Gallery of Art

Jean-Fred (talk) 14:19, 18 January 2013 (UTC)

  • Source to upload from: National Gallery of Art online database, per their open access policy
    • Did you observe a URL pattern?
    • Do you know whether the site has an API?
    • What else could ease uploading (is the site valid XHTML, which WCM do they use…)?
    • Did you contact the site owner?
      See here; they welcome the idea.
  • Describe the works to be uploaded in detail (audio files, images by …):

Artwork digitisations

  • Which license tag(s) should be applied?

Existing uploads seem to rely on {{PD-author|National Gallery of Art}} or {{PD-art|PD-old-100}}.

I guess a custom wrapper for {{Licensed-PD-Art|PD-old-whatever|{{PD-author|National Gallery of Art}}}} would do the trick (in the spirit of {{Walters Art Museum license/2D}})

Went ahead and created {{PD-Art-National Gallery of Art}}. Jean-Fred (talk) 14:47, 18 January 2013 (UTC)


Assigned to Progress Bot name Category

11k Aerial Photos

In the course of the aerial photo project of the German Wikimedia, de:Wikipedia:Projekt Fotoflüge, I wrote an article for a pilots' magazine. After that I got in contact with a pilot who wants to share the aerial photo collection he has created over the past 24 years. It seems that all photos are already geo-referenced and classified, both by type (e.g. solar power plant, church) and by region (e.g. Europe, Andalucia, Sanlucar). The classification and the geo-reference are stored in the EXIF data of the images. During a manual upload, Commons recognized the geo-reference correctly. Because of the large number of pictures, it would be good to automate the upload and, if possible, to match the classification of the pictures to Commons categories. I have no idea whether or how this is possible, and it would be great to get some information on this or some help with this request. The classification is sometimes in German and does not match the Commons categories. The pilot has already created a Wikipedia/Commons user account and uploaded one example file, where you can see how the data is stored in the EXIF data.
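Reading the EXIF tags themselves requires an image library such as Pillow. The sketch below leaves that step out. It assumes the GPS values and the classification keywords have already been extracted, and shows only the two conversions that would follow: EXIF-style degrees/minutes/seconds to signed decimal degrees, and a lookup from keywords to categories. The keyword table is hypothetical and would have to be built by hand, since the classification is partly in German. The coordinates in the usage lines are made up.

```python
from fractions import Fraction

# Hypothetical lookup table: the pilot's keywords (partly German) -> Commons categories.
# The target category names are placeholders and would need checking on Commons.
KEYWORD_TO_CATEGORY = {
    "Kirche": "Churches",
    "Solarkraftwerk": "Solar power stations",
    "Andalucia": "Andalusia",
}


def dms_to_decimal(dms, ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus N/S/E/W to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(part) for part in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref.upper() in ("S", "W") else value)


def split_keywords(keywords):
    """Return (categories found in the table, keywords left for manual review)."""
    categories, unmapped = [], []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword in KEYWORD_TO_CATEGORY:
            categories.append(KEYWORD_TO_CATEGORY[keyword])
        else:
            unmapped.append(keyword)
    return categories, unmapped


lat = dms_to_decimal((36, 46, 30), "N")  # 36.775
lon = dms_to_decimal((6, 21, 0), "W")    # -6.35
print(lat, lon)
print(split_keywords(["Kirche", "Sanlucar"]))  # (['Churches'], ['Sanlucar'])
```

Keeping the unmapped keywords separate lets a human extend the table gradually instead of silently losing classifications.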

  • Source to upload from:

The files are on a computer of the pilot / photographer.

    • Did you observe a URL pattern?
    • Do you know whether the site has an API?
    • What else could ease uploading (is the site valid XHTML, which WCM do they use…)?
    • Did you contact the site owner?

Not the site owner but the photographer User:Graf-flugplatz

  • Describe the works to be uploaded in detail (audio files, images by …):

About 11,000 digital aerial photos should be uploaded.

  • Which license tag(s) should be applied?

This has to be clarified with the author, but we expect "CC BY-SA 3.0", as in the example.

Update 18.12.2012: The license "CC BY-SA 3.0" has been approved by the author, User:Graf-flugplatz.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?


Nice sample images. I'm from Germany too and would like to help, but not before the end of April 2013 because I am away and busy. If that is OK, just wait ... --Slick (talk) 17:26, 9 January 2013 (UTC)

OK, how can I get the images to upload? I would like to have them here so I can check their tags and try to find the best categories for them. Possible solutions are that I download them all from a source, or that you send them to me on CD/DVD (I am from Germany too). You can contact me here about this (in German, please). Additionally, I suggest that the pilot (or you) add some minimal content to his user page for others who are interested in the source/creator (i.e. the same information as in this request). --Slick (talk) 08:51, 6 February 2013 (UTC)

Assigned to Progress Bot name Category
Slick Waiting for user response...

Garden of the Victory in Chelyabinsk

  • Source to upload from:

User Ain92 asked me to upload some photos with Panoramio Picker, but I have never done this and found it too complicated to learn in the near future. So I ask that all photos from this page and the 2nd to 9th photos from this page (they are CC BY) be uploaded to category:Garden of the Victory in Chelyabinsk. Анастасия Львоваru (ru-n, en-2) 07:03, 11 December 2012 (UTC)


Assigned to Progress Bot name Category

Gerald R. Ford Presidential Library and Museum

The Ford Presidential Lib/Museum is a federal archive, part of NARA. We'd like to create a partnership with Wikimedia:Commons and get all of our digitized material up. All materials are in the public domain. Agency management is on board, and we have a team already working on this! I've been uploading materials one by one and have gotten about 170 images uploaded (see Commons:Gerald R. Ford Presidential Library and Museum). I figure it should take me until, oh, 2215 to get everything up! We're looking for an administrator to work with and develop a plan. Bdcousineau (talk) 18:50, 5 September 2012 (UTC)

See Commons:Gerald R. Ford Presidential Library and Museum for current progress. Smallman12q (talk) 23:26, 17 September 2012 (UTC)


Assigned to Progress Bot name Category

Rudolf Steiner Gesamtausgabe

The following page offers all works of Rudolf Steiner's complete edition (Gesamtausgabe, public domain) as scans of citable editions. Transferring them to Wikimedia Commons was discussed and requested here.

  • I am downloading the files and preparing them for upload. Which licence template is correct in this case? I guess PD-old. Is that enough, or is a second one needed? --Slick (talk) 21:14, 11 August 2012 (UTC)
  • Downloads finished. --Slick (talk) 08:47, 13 August 2012 (UTC)

A discussion in German about the licence can be found here. It looks like there is a problem with scans from sources newer than 1923. --Slick (talk) 13:35, 15 August 2012 (UTC)

I am withdrawing my support for this batch job and have removed all local work already done, because help/support was missing although I requested it more than once. Reverting the job to the request list. --Slick (talk) 20:30, 23 August 2012 (UTC)


Assigned to Progress Bot name Category

Detroit Publishing Company at LoC

"This collection of photographs from the Detroit Publishing Company Collection includes over 25,000 glass negatives and transparencies as well as about 300 color photolithograph prints, mostly of the eastern United States. The collection includes the work of a number of photographers, one of whom was the well known photographer William Henry Jackson. A small group within the larger collection includes about 900 Mammoth Plate Photographs taken by William Henry Jackson along several railroad lines in the United States and Mexico in the 1880s and 1890s. The group also includes views of California, Wyoming and the Canadian Rockies." Subject index; geographical index. cmadler (talk) 17:17, 20 March 2012 (UTC)

See also w:Category:Detroit Publishing Co. for those which have already been uploaded. --Hhm8 (talk) 05:15, 24 March 2015 (UTC)
The link has moved to [35] BMacZero (talk) 02:51, 16 May 2015 (UTC)


  Support Great collection of historical images. --Junkyardsparkle (talk) 21:48, 26 April 2014 (UTC)

  Support --Hhm8 (talk) 05:15, 24 March 2015 (UTC)

I believe this is now superseded by the TIFF uploads from the NYPL which includes their scans of the "Detroit Publishing Company postcards". Refer to User:Fæ/Project_list/NYPL#Collections_batch_uploaded. -- (talk) 13:01, 6 February 2016 (UTC)

Cesare Brizio

Photographer Cesare Brizio has agreed to donate 1300+ images here. Images may be taken from the web page OR originals can be sent to anyone on a DVD if required. He also suggested some sound files - but they are in the wrong format (mp3).

Data from OTRS ticket 2012021810002796 follows (permission obtained to copy this OTRS message here)
Dear Ron Jones: yes, I confirm that I am actually glad to release all the images located at via the "View Media" link at as "Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)". Furthermore, I can provide upon request higher resolution versions (1024x768 or more) of almost all the same images.

By the way, I would gladly release as "Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)" all the audio samples (recordings of animal sounds) available at the web pages listed here:

best regards,

Cesare Brizio


  Support Sounds good, especially since we don't have pictures for all biology articles. --Sanandros (talk) 14:49, 21 August 2012 (UTC)

Assigned to Progress Bot name Category

Works of Maurice Ravel

All files from,_Maurice can be uploaded to Commons (57 files).

Maurice Ravel's works are in the public domain in France since a decision by the Cour de cassation in 2007 (French Supreme Court). See Wikipedia articles for details. There are about 35 published before 1923, for which there is no URAA issue. Yann (talk) 12:12, 15 September 2012 (UTC)

Category:Compositions by Maurice Ravel
License {{PD-old}}


Assigned to Progress Bot name Category
  • Waiting for the backlog of this page may take longer than manually uploading the 57 images using Special:UploadWizard. Bennylin (yes?) 12:49, 26 February 2012 (UTC)
    • If you had just looked at the page, or had at least a bit of music knowledge...; but today I am generous and will not respond with other unhelpful comments. I just wonder how you could have become a steward with such hasty comments. If you want to help, you could take on upload requests or analyze them carefully. Or are you even paid by the WMF to advertise UpWiz?
  • Please make some suggestions how to get a good descriptions from the page. (Including a custom template, categories, ...)
  • Page structure:
  • {{Not-PD-US-URAA}} is not a valid license template (says that right on it!). Works should be verified as being PD or otherwise free in the US before uploading. Otherwise you're just adding to the Commons:WikiProject Public Domain/URAA review workload. cmadler (talk) 12:17, 27 April 2012 (UTC)
    • Pre-1923 works should be tagged with {{PD-1923}} to cover US copyright status. Post-1923 works are probably still copyrighted in the US, and should not be uploaded without investigation into the status. cmadler (talk) 13:35, 17 September 2012 (UTC)
      • Is this tag necessary even for non-US works? Yann (talk) 15:29, 17 September 2012 (UTC)
        • Yes, because works on Commons must be free in both the country of origin and the US. (Right on {{PD-old}}, it says, "You must also include a United States public domain tag to indicate why this work is in the public domain in the United States.") Alternatively, {{PD-old-70-1923}} is a single template covering both the US and French copyright. cmadler (talk) 12:39, 18 September 2012 (UTC)
  •   Oppose Actually, now that I look at it, I don't think any of his works are in the public domain in France, the country of origin. The Cour de cassation ruling found that the prorogations de guerre (extensions for the two World Wars) were superseded by later copyright laws, but only for non-musical works. Since we're discussing musical works, the prorogations still need to be taken into account. Works published through 1920 get an additional 14 years, 272 days, while works published from 1921 through 1947 (since Ravel died in 1937, this covers all the rest of his works) get an additional 8 years, 120 days. So Ravel's works through 1920 are copyrighted in France until late 2022 (272 days gets you almost to the end of September), while his post-1920 works are copyrighted in France until 2016 (120 days goes to late April). cmadler (talk) 12:49, 18 September 2012 (UTC)
    • The Cour de cassation did not mention the type of works to which its ruling applies. Yann (talk) 13:43, 18 September 2012 (UTC)
      • If I understand correctly, the 2007 Cour de cassation ruling related primarily to the 1997 law, which had extended the normal duration for non-musical works from 50 years to 70 years (but was not cumulative with the war extensions), and dealt specifically with the works of two painters, Monet and Boldini. But musical works had already been extended to 70 years pma in 1985, by the "Lang" law, and in the 2007 ruling, the court found that this law was cumulative with the war extensions ("la loi du 3 juillet 1985 avait porté à 70 ans la durée de protection normale, de sorte que les bénéficiaires des prorogations de guerre applicables à cette date pouvaient prétendre à une durée de protection excédant 70 ans"), but only in the case of composers who had already "acquired" the right (already died, starting the copyright clock) prior to July 1992. Have I misunderstood an aspect of this? cmadler (talk) 16:34, 18 September 2012 (UTC)
      • After its two rulings, the Cour de cassation summarized the situation in its annual report for 2007. It mentions the particular situation of musical works, in the terms quoted above by cmadler. However, as the 2007 rulings were not about music or Ravel, there are apparently still some arguments about how to interpret and apply the principles. These concern how the term of protection should be computed in the specific case of Ravel and, depending on the result, whether his works are still under copyright in France or in the public domain there. This 2008 article concluded that, at that time, the question was still uncertain, but that commentators seemed to lean more toward the theory of the longer term of protection. Anyway, it seems that SACEM still collects royalties for uses of Ravel's works "à l'étranger" (outside France, in some countries where the works are still under copyright).[36] I didn't find anything clearly stating whether they still collected fees for uses of Ravel's works in France after 2008. If the works are still under copyright in France, then given the sums of money that would represent, it is somewhat surprising that no litigation can be found. It does not help clarify the situation that the royalties used to be claimed by a mysterious offshore company, although I suppose that does not affect the term of protection. -- Asclepias (talk) 19:33, 19 September 2012 (UTC)
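The dates in cmadler's comment can be checked with simple date arithmetic. The sketch below follows the assumptions in that comment. The normal term ends 70 years after the end of the year of death, which is 31 December 2007 for Ravel, who died in 1937. The war extension (years, then days) is then simply added to that date. This is not a legal determination, and the thread shows that the underlying interpretation is disputed.

```python
from datetime import date, timedelta


def extended_end(death_year: int, extra_years: int, extra_days: int) -> date:
    """End of protection: 70 years after the end of the year of death, plus a war extension."""
    normal_end = date(death_year + 70, 12, 31)  # 31 December 2007 for a 1937 death
    shifted = normal_end.replace(year=normal_end.year + extra_years)
    return shifted + timedelta(days=extra_days)


RAVEL_DEATH = 1937
print(extended_end(RAVEL_DEATH, 14, 272))  # works published through 1920: 2022-09-29
print(extended_end(RAVEL_DEATH, 8, 120))   # post-1920 works: 2016-04-29
```

Both results agree with the comment: "almost to the end of September" 2022, and "late April" 2016.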

Maritime photo collection

Category:Frederic Logghe Maritime photo collection includes only part of the collection available at the website listed there. The collection itself doesn't seem to have grown recently, and Commons might be a good place to maintain it in the long term. --07:09, 28. Sep. 2011 Docu

Someone should check the licence before a mass import. I am not sure they are all free; I found a lot of pictures with copyright information, e.g.: [37] [38] [39] --Slick (talk) 16:30, 4 August 2012 (UTC)


Assigned to Progress Bot name Category

Images from Caelum Observatory & The Mount Lemmon SkyCenter

Adam Block from The Mount Lemmon SkyCenter has kindly agreed to release a large number of his images under a CC BY-SA 3.0 license. He has done this specifically so they can be used on wiki projects. A .zip containing all of the released images can be found here. I would like to upload them all into a category called 'Images from Caelum Observatory & The Mount Lemmon SkyCenter' or something similar. Many of them will be very useful and have high EV. A link to one of his galleries showing the relevant copyright statements can be found here. As there are 200+ files in the .zip file, uploading them all would be very tedious. I would be very grateful if someone could assist me with this matter. Thanks, Originalwana (talk) 13:16, 10 September 2011 (UTC)

It looks difficult to upload the files from the zip with a batch job because information is missing, e.g. descriptions. IMHO it makes more sense to parse the website for images under CC, because it has very useful descriptions. (Example) --Slick (talk) 11:02, 11 August 2012 (UTC)
That would be great but I have no idea how to go about it, do you know how this could be done? Thanks Originalwana (talk) 10:22, 13 August 2012 (UTC)


Assigned to Progress Bot name Category


The site [40] has a great public domain collection of Norwegian manuscripts, for example the complete manuscript œuvre of Henrik Ibsen (UNESCO heritage).

The objective is to download all pictures, convert them into one DjVu file per book, and upload the DjVu files to Commons.

Is this possible? Thanks! --M0tty (talk) 19:36, 3 October 2010 (UTC)
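The conversion step could be scripted with the DjVuLibre command-line tools. In the sketch below, `c44` encodes each JPEG page and `djvm -c` bundles the pages into one file per book. The sketch assumes the page images have already been downloaded and are named so that a plain lexical sort gives page order, for example zero-padded `p001.jpg`. It only builds the command lines; running them (e.g. with `subprocess.run`) is left out. The book and file names are hypothetical.

```python
from pathlib import Path


def djvu_commands(book: str, pages: list) -> list:
    """Return the c44 and djvm invocations that turn one book's JPEG pages into a bundled DjVu."""
    commands = []
    page_djvus = []
    for page in sorted(pages):  # lexical order; assumes zero-padded page numbers
        out = str(Path(page).with_suffix(".djvu"))
        commands.append(["c44", page, out])
        page_djvus.append(out)
    commands.append(["djvm", "-c", f"{book}.djvu", *page_djvus])
    return commands


for cmd in djvu_commands("ibsen_001", ["p002.jpg", "p001.jpg"]):
    print(" ".join(cmd))
```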


Assigned to Progress Bot name Category


All the images/videos from UMich listed in these two directories [41]. If they could all be added to a single category, I will then integrate them into Wikipedia. --James Heilman, MD (talk) 23:13, 19 July 2011 (UTC)


Assigned to Progress Bot name Category


The owners of ECGpedia have agreed to release their images under a Creative Commons 3.0 license. This applies to all images except some which they are unable to release due to continuing non-commercial requirements. There are about 2000 images in all. A list can be found here. --James Heilman, MD (talk) 18:51, 13 July 2011 (UTC)

All images are licensed as "Creative Commons Attribution Noncommercial Share-Alike". Wikimedia commons does not allow "Noncommercial" licenses, so unless ECGpedia re-license their images we are not going to be able to use them. If they re-license that will need to be marked on the individual images themselves or through OTRS, which will list which images are covered. --Jarekt (talk) 19:36, 13 July 2011 (UTC)
Yes they have agreed to re-release the images under a license that allows commercial use. So the images will need to be marked as such.--James Heilman, MD (talk) 20:23, 13 July 2011 (UTC)
Here is the OTRS ticket: Ticket#2011102310008874. There are about 3000 ECGs and 700 echo images. --James Heilman, MD (talk) 13:55, 23 November 2011 (UTC)


Assigned to Progress Bot name Category
Smallman12q Smallbot

ian.umces.edu

The site offers 3251 free high-resolution images and 2544 free vector symbols licensed under CC BY 3.0. --Leyo 06:59, 21 June 2011 (UTC)

A worthy set of images. I was able to download 2546 SVG files in a single ZIP file, but matching it with metadata is more challenging. --Jarekt (talk) 03:37, 29 June 2011 (UTC)
Are the file names in the ZIP file self-explanatory or rather meaningless? --Leyo 09:07, 29 June 2011 (UTC)
Filenames identify the source and give a few words about the content; see for example here. For the SVG files, I think we need to write some scraping software to create a spreadsheet with:
  • "Author" and "Author Company"
  • Title and description
  • "Date created"
  • URL (to be used to link back to the source image)
  • "Album name" and "Keywords" can be useful for choosing categories
  • "Filename" (to match it with the downloaded file)
I am at the moment rather busy with Commons:Batch uploading/Web Gallery of Art but if someone can gather the metadata I can upload the files. --Jarekt (talk) 15:05, 29 June 2011 (UTC)
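The spreadsheet Jarekt describes could be written with the standard `csv` module. The sketch below assumes the scraper yields one dict per image. The column names mirror the list above, with "Title and description" split into two columns. The helper names and the sample record are hypothetical. A second helper lists downloaded files that have no metadata row, since matching by "Filename" is the weak point mentioned above.

```python
import csv
import io

COLUMNS = ["Author", "Author Company", "Title", "Description",
           "Date created", "URL", "Album name", "Keywords", "Filename"]


def write_sheet(records, stream):
    """Write scraped records as CSV; missing fields become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # raises ValueError on unexpected keys


def unmatched(records, downloaded):
    """Return downloaded file names that have no metadata row."""
    known = {record.get("Filename") for record in records}
    return sorted(name for name in downloaded if name not in known)


buffer = io.StringIO()
rows = [{"Title": "Crab", "Filename": "crab.svg"}]
write_sheet(rows, buffer)
print(buffer.getvalue())
print(unmatched(rows, ["crab.svg", "fish.svg"]))  # ['fish.svg']
```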
As per our discussion here (in German), these files additionally need to be fixed so that numbers which omit the leading zero (like .12345) include it ( ---> 0.12345); otherwise Wikipedia's renderer doesn't parse them correctly. (The substitution can also be of the type -.12345 ---> -0.12345.) This is just in case someone suddenly rushes in to upload these :) Iridos (talk) 23:24, 4 July 2011 (UTC)
All of the SVGs in this library were originally created with Illustrator, although most were run through SCOUR, which I now see strips leading zeros. Does anyone know of any other SVG parsers that have a problem without leading zeros? — Preceding unsigned comment added by Adrianbj (talk • contribs)
All the SVG files already contain DC metadata. There is also an online spreadsheet and excel version of metadata available.
Links to searchable database of all images/symbols and custom download builder for all the symbols in SVG, AI and PNG in a zip archive.
Just read through a translated version of the german discussion. Not sure why that virus didn't rasterize well. The PNG previews and downloadable versions on the IAN website were all created automatically with iMagick and rSVG, although problems like you are seeing did occur with various older versions of iMagick and rSVG. — Preceding unsigned comment added by Adrianbj (talk • contribs)
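The leading-zero substitution discussed above could be scripted roughly as follows. It only touches double-quoted attribute values, and it inserts a `0` before any `.` that is followed by a digit but not preceded by one. So `.12345` becomes `0.12345` and `-.12345` becomes `-0.12345`. It does not split run-together numbers such as `1.5.5`, which SVG path syntax reads as `1.5` followed by `.5`; handling those would need a real path-data parser.

```python
import re

# A "." that starts a number: not preceded by a digit, followed by a digit.
BARE_DECIMAL = re.compile(r"(?<!\d)\.(?=\d)")
ATTRIBUTE_VALUE = re.compile(r'="([^"]*)"')


def add_leading_zeros(svg_text: str) -> str:
    """Rewrite .5 -> 0.5 and -.5 -> -0.5 inside double-quoted attribute values only."""
    def fix(match):
        return '="' + BARE_DECIMAL.sub("0.", match.group(1)) + '"'
    return ATTRIBUTE_VALUE.sub(fix, svg_text)


print(add_leading_zeros('<path d="M.5,-.25 L10.5,3"/>'))
# <path d="M0.5,-0.25 L10.5,3"/>
```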

It seems they changed their licensing terms; the new license doesn't allow redistribution or sales, which makes it unacceptable for Commons. I guess the upload could still happen since CC licenses are irrevocable, but I imagine they wouldn't appreciate it much. The best solution would be for someone to contact them and ask them to change it back to CC BY. InverseHypercube 07:51, 18 February 2012 (UTC)

Sorry about the licensing change - we do rely on this resource to bring traffic to our website, so we would really appreciate honoring of our new license. Thanks. — Preceding unsigned comment added by Adrianbj (talk • contribs)
Thanks for commenting! However, if the images are licensed under CC-BY, we would be required to attribute (and link back to) your website, so no traffic loss would occur. In fact, since CC-BY would allow us to transfer images from your website, having your images on high-traffic sites such as Wikipedia would increase hits to your site, since they would all link to it. InverseHypercube 04:37, 8 March 2012 (UTC)
A custom license tag such as in Category:Custom license tags might be used. It might contain a link to the website and or a direct link to the respective image (example). --Leyo 09:32, 8 March 2012 (UTC)
I'd like to make the preview sized versions of all our images (photos and vector illustrations) available on Commons with a custom license tag (and attribution) and a direct link to the respective image on our site where users can register (free) and download the full resolution / vector (SVG) versions. Almost all the photos (JPG) have metadata embedded. All the SVG files also have metadata, but the preview PNGs do not because of the metadata issues with the PNG format. As I mentioned above, there is an automated spreadsheet available from our site with all the metadata. We are constantly adding images to the library. Is there any possibility to automatically update commons if I create a web service (XML/JSON) of all the images and metadata?
That sounds great, and it can definitely be done. However, as I understand copyright law, by licensing the previews under CC-BY, for example, you would also be licensing the SVG files under the same license, since they do not meet the threshold of originality over the previews. While we might only upload the previews, I don't think you could stop others from distributing the SVG files. InverseHypercube 17:35, 8 March 2012 (UTC)
I guess I was thinking of a custom license, rather than CC-BY, as suggested by Leyo. Would that work?
I still think you would be effectively licensing the SVG files under the same license. InverseHypercube 17:49, 8 March 2012 (UTC)
I understand what you are saying, but if the custom license says that users cannot redistribute or sell and that they must provide attribution even for the preview PNGs, would that work? Maybe this is too problematic for posting to Commons? We are actually also wanting to add the option for users to purchase the right to use our images without attribution, because at the moment, there are many cases when they can't use them due to the attribution requirement. We think this dual licensing model will make them more useful for more people. I'd be curious if anyone has any further suggestions.
No, that wouldn't be allowed on Commons. See Commons:Licensing#Acceptable licenses; non-commercial licenses are not permitted. However, if you licensed the SVGs under a license that required attribution for redistribution, it would apply to the PNGs too. InverseHypercube 22:14, 9 March 2012 (UTC)
Looks like I was wrong about not being able to license the preview images and the vector versions under separate licenses; the community consensus seems to be that you can. See Commons:Village_pump/Copyright/Archive/2012/01#CC_BY-SA_3.0_and_the_original_image_quality. InverseHypercube 18:38, 14 March 2012 (UTC)
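For the automatic-update idea raised in this thread, the core incremental step is a set difference between the items the proposed web service lists and the items already uploaded. The sketch assumes a JSON feed that is a list of objects, each with an `id` field. The feed format is hypothetical, since no such service exists yet. How the uploaded IDs are recorded, for example in a template parameter on each file page, is also left open.

```python
import json


def new_entries(feed_json: str, uploaded_ids: set) -> list:
    """Return feed items whose 'id' is not yet in uploaded_ids, preserving feed order."""
    items = json.loads(feed_json)
    return [item for item in items if str(item["id"]) not in uploaded_ids]


feed = '[{"id": 101, "title": "Seagrass"}, {"id": 102, "title": "Oyster reef"}]'
print(new_entries(feed, {"101"}))
```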


Assigned to Progress Bot name Category


As discussed at the Village Pump and announced here, Yale released 250k images in its database under the {{Cc-by-3.0}} license; see here for details.

We should start looking into moving them here while retaining all available metadata. --Jarekt (talk) 14:43, 3 June 2011 (UTC)


My preliminary evaluation:

  • 47343 images of paintings are available in high resolution at the present time. (Go here, fill in no fields, and click "Find.")
  • Images are made available as TIFF files, max resolution appears to be 2400 x 3000 px, 8-bit color, often smaller (they're crops of a single photo, but not bad). We should upload original TIFFs as well as JPEG versions, and cross-link them.
  • Image downloads via the website are protected by a re-CAPTCHA system. This needs to be either defeated, circumvented, or we need special permission to bypass it.
  • Download speed appears to be throttled to about 80 KB/s. At this rate it will take roughly 93 days just to download them all. This is expected and should not be circumvented, since bandwidth hogging costs money and draws ire.
  • We will require a special license tag for these, because the situation is not simple. Yale has released their digitizations under CC-BY, which will be important in nations where digitizations may be protected by copyright or by a publisher's right, or in case of a hypothetical reversal of Bridgeman v. Corel. On the other hand, PD-Art indicates that attributing the source is not a legal requirement in the United States or other nations where reproductions carry no copyright, and we should not make reusers think that it is required. We need a special tag that combines these, while referring to the original entry in Yale's collection.
  • I don't know if the URL suffix is a stable reference number. We should instead link to a search for the Accession Number, like this.
  • Extracting metadata from HTML should be straightforward. Their metadata fields match our {{Artwork}} template rather well.
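Since the metadata fields are said to match {{Artwork}} well, the conversion step might look like the following minimal Python sketch. The Yale field labels on the left of FIELD_MAP, and the institution and URL in the example, are assumptions for illustration; the right-hand names are intended as {{Artwork}} parameters and should be checked against the template documentation. Fetching and parsing the HTML itself is left out.

```python
from typing import Dict

# Hypothetical mapping from field labels on a Yale object page (left) to
# {{Artwork}} template parameters (right). Verify both sides before use.
FIELD_MAP = {
    "Artist": "artist",
    "Title": "title",
    "Date": "date",
    "Medium": "medium",
    "Dimensions": "dimensions",
    "Accession Number": "accession number",
}


def artwork_wikitext(metadata: Dict[str, str], institution: str, source_url: str) -> str:
    """Render scraped metadata as an {{Artwork}} call, skipping empty fields."""
    lines = ["{{Artwork"]
    for label, param in FIELD_MAP.items():
        value = metadata.get(label, "").strip()
        if value:
            lines.append(f" |{param}={value}")
    lines.append(f" |institution={institution}")
    lines.append(f" |source={source_url}")
    lines.append("}}")
    return "\n".join(lines)


# Example with made-up values:
print(artwork_wikitext(
    {"Title": "Landscape with Cattle", "Artist": "Unknown", "Accession Number": "TEST-001"},
    institution="{{Institution:Example}}",
    source_url="https://example.org/search?accession=TEST-001",
))
```

Linking the source to a search on the accession number, rather than the URL suffix, follows the suggestion in the list above.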

I can write a tool to get started on this, but have other obligations this week. Other opinions are welcome. Dcoetzee (talk) 07:29, 5 June 2011 (UTC)
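The 93-day figure in the evaluation above can be sanity-checked with a little arithmetic. This sketch assumes that 80 KB/s means 80,000 bytes per second and that files are fetched one after another; under those assumptions, 93 days for 47,343 files implies an average file of about 13.6 MB. Treating the maximum 2400 x 3000 size as uncompressed 24-bit RGB (21.6 MB, an assumption about the TIFFs) gives an upper bound of roughly 148 days.

```python
SECONDS_PER_DAY = 86_400


def estimate_download_days(n_files: int, avg_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to fetch n_files one after another at a fixed throttled rate."""
    return n_files * avg_bytes / rate_bytes_per_s / SECONDS_PER_DAY


def implied_average_bytes(n_files: int, days: float, rate_bytes_per_s: float) -> float:
    """Average file size implied by a given total download time."""
    return days * SECONDS_PER_DAY * rate_bytes_per_s / n_files


RATE = 80_000                # 80 KB/s read as decimal kilobytes (assumption)
N_FILES = 47_343
MAX_TIFF = 2400 * 3000 * 3   # uncompressed 24-bit RGB at maximum size (assumption)

print(f"{implied_average_bytes(N_FILES, 93, RATE) / 1e6:.1f} MB per file")        # 13.6 MB per file
print(f"{estimate_download_days(N_FILES, MAX_TIFF, RATE):.0f} days upper bound")  # 148 days upper bound
```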

We already have good contacts at Yale. Meg Bellinger from Yale gave the keynote speech at GLAMcamp_NYC (see notes and slides). We can ask en:User:Witty lama, who I think interacted with them, to find out the way to get the data with the least disruption. We can also check whether and how they would prefer that we link to their system. I can start on the license templates, institution templates, etc. --Jarekt (talk) 20:35, 5 June 2011 (UTC)


I created {{PD-Art-Yale}} for 2D artworks. Please verify & correct/improve. I think we should add an attribution text parameter and possibly put parts of it in an info box with the Yale logo so the credit is not lost in the text.

I am not sure whether the CC license extends to the "digitization" of 3D objects which are otherwise in the PD. --Jarekt (talk) 14:05, 9 June 2011 (UTC)

Looks good so far. I don't know if this collection includes three-dimensional works, or paintings with three-dimensional frames, but if it does it's worth noting that they must be used under the terms of the CC license in all nations (as the photograph would not be a mere copy). Dcoetzee (talk) 23:27, 9 June 2011 (UTC)
Yes, if the CC license extends to photographs of the 3D objects, then we would need separate licenses: Artwork - PD-old, Photography - CC

--Jarekt (talk) 02:09, 10 June 2011 (UTC)

All listed links are currently dead, and [42] indicates that any material that is available is subject to copyright. Should this be delisted? BMacZero (talk) 02:19, 9 November 2015 (UTC)

Assigned to Progress Bot name Category

Geheugen van Nederland

Initial request from Commons:Picture requests/Requests/Europe:

“There is a collection of photographs of historic maps, originating as far as I understand from the "Nederlands Scheepvaartmuseum Amsterdam". I have seen it here [43]. At the moment I am especially interested in [44]. The maps would be interesting for a great number of articles, the latter one for some articles about "Noord-Friesland". Maybe somebody can make it possible to upload the whole collection. Thanks in advance and with best regards -- 03:35, 24 May 2010 (UTC)”

There are several collections that might be of use:

820 files from are already available at Commons. -- Common Good 19:10, 29 April 2011 (UTC)

I am not sure about the licence. I only found this. Are we sure we can import the images? --Slick (talk) 19:50, 13 August 2012 (UTC)


Assigned to Progress Bot name Category

Africa Centre

Africa Centre is a non-profit organisation in Cape Town that supports arts and culture projects across Africa. Since 2007 they have commissioned thousands of arts and culture images related to their projects. The images give an insight into performance art, public art, site-specific art, poetry, visual art, social innovation, architecture, public space, etc. in Africa. They have applied the Creative Commons Attribution-ShareAlike 3.0 license, and have given me permission to upload their files. The photos for each of the Africa Centre projects would be uploaded under the categories Performance Art, Visual Art, Public Art, Poetry, Culture, Arts, and city of origin. Riannedac (talk) 08:54, 15 April 2011 (UTC)

I guess a lot of these photographs are actually derivative works of modern art. Does the Africa Centre own the copyright to the works? Permission should be arranged with Commons:OTRS.
For the actual uploading part we're writing Commons:Guide to batch uploading.
Are you already in touch with Wikimedia South Africa? I'll send them an email about this project. Multichill (talk) 14:30, 16 May 2011 (UTC)

Canada Line

From the English Wikipedia I stumbled upon "Here you will find photography of the Vancouver, B.C. Canada Line, which opened to the public on August 17th, 2009. <...> The photography presented in this blog (780 posts containing 22,000 photos) will be kept online as a historical archive of the construction of the Canada Line." All files are licensed {{cc-by-2.5-ca}} and you have to attribute "Tafyrn & Seamora" and link back to the blog. You can find the actual images in . Based on the page an image is used on, you should be able to decide on a title and give it a category (or Category:Canada Line if you can't find anything else).
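Choosing titles and categories needs a look at each post, but the attribution part is mechanical. A minimal sketch, assuming a hypothetical title scheme (blog post title plus a running number, not something agreed here) and a conservative set of characters to replace in file names; the author credit, license tag and fallback category come from the request, and the post URL in source= provides the link back to the blog.

```python
import re
from typing import Optional

FALLBACK_CATEGORY = "Category:Canada Line"
# Conservative set of characters to keep out of Commons file names (assumption).
UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def canada_line_title(post_title: str, index: int) -> str:
    """Hypothetical naming scheme: blog post title plus a running number."""
    clean = UNSAFE.sub("-", post_title).strip()
    return f"Canada Line construction - {clean} - {index:03d}.jpg"


def canada_line_description(post_url: str, category: Optional[str] = None) -> str:
    """Credit the photographers, link back to the post and add one category."""
    return "\n".join([
        "{{Information",
        f" |source={post_url}",
        " |author=Tafyrn & Seamora",
        "}}",
        "{{cc-by-2.5-ca}}",
        f"[[{category or FALLBACK_CATEGORY}]]",
    ])


print(canada_line_title("Bridge: day 3", 7))
print(canada_line_description("https://example.org/some-post"))
```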

Great collection, but all images are 640×480. Are we sure we want to upload at such a small resolution? Can anybody with better English than me contact the author and request higher-resolution versions for Commons? --Slick (talk) 11:13, 11 August 2012 (UTC)


Assigned to Progress Bot name Category

Codex Gigas

The Swedish National Library has made available the Codex Gigas, a 13th-century Bible manuscript which is also the largest medieval manuscript in existence, in its entirety. It's available in high resolution through FSI Viewer and in medium resolution as JPEGs. The whole file structure is available on the National Library's website here. The JPEGs seem quite simple to download, but it would be even more interesting to extract the high-resolution pictures from the viewer.

As a reproduction of a medieval volume, there are no copyright issues to worry about (except for perhaps the pictures of the highly ornate cover).

Peter Isotalo 13:30, 29 November 2010 (UTC)


I did some research on the software used here. We won't be able to directly download the hi-rez images because they are stored in WEB-INF. We can request chunks of the image from an API, but the max size on those seems to be 2000px (the image I was looking at is 2943x4387 at full rez).

Anyway, for reference, the API call <FILENAME_HERE>&cmd=view&vtl=fsi/info.xml&tmp=fsi gets you an XML document with the full width and height of the image. The filename for the first image in Genesis is images/urn-nbn-se-kb-digark-48462.tif; I don't know if there is a good way to find these. Then you'd make API calls like <FILENAME_HERE>&width=2000&height=2000&left=0&top=0&right=0.679578661230037&bottom=0.455892409391383&tmp=fsi (left/top/right/bottom are 0.0-1.0 coordinates of the view window) to download each 2000x2000 chunk of the image.
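For a 2943 x 4387 page and a 2000 px limit, the chunk windows described above can be enumerated as in this sketch. It only computes the normalized left/top/right/bottom values (the first window reproduces the quoted right and bottom values, which are 2000/2943 and 2000/4387) and a parameter tail in the layout shown; the base URL, which is not given here, and the download and stitching steps are left out. Requesting a width/height equal to the smaller pixel size of the partial edge windows is an assumption about how the viewer API behaves.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    """One view window: normalized edges (0.0-1.0) plus its size in pixels."""
    left: float
    top: float
    right: float
    bottom: float
    px_width: int
    px_height: int


def tile_windows(width: int, height: int, max_px: int = 2000) -> List[Window]:
    """Cover a width x height image with windows at most max_px on each side."""
    windows = []
    for top_px in range(0, height, max_px):
        bottom_px = min(top_px + max_px, height)
        for left_px in range(0, width, max_px):
            right_px = min(left_px + max_px, width)
            windows.append(Window(
                left_px / width, top_px / height,
                right_px / width, bottom_px / height,
                right_px - left_px, bottom_px - top_px,
            ))
    return windows


def query_tail(w: Window) -> str:
    """Parameter tail in the layout quoted above (base URL and filename omitted)."""
    return (f"&width={w.px_width}&height={w.px_height}"
            f"&left={w.left}&top={w.top}&right={w.right}&bottom={w.bottom}&tmp=fsi")


for w in tile_windows(2943, 4387):
    print(query_tail(w))
```

The page splits into 2 columns by 3 rows; the last window is only 943 x 387 px.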

It might be best to see if they'd be willing to provide the hi-rez versions to us. BMacZero (talk) 17:56, 25 May 2015 (UTC)

KB has published high-resolution images now

Ping User:Peter_Isotalo

Codex Gigas at

-- User:Mattiasostmar

Assigned to Progress Bot name Category

Right Livelihood Award

After some discussion with the Right Livelihood Award Foundation I got a clarification on the usage conditions of the photos provided on their website. The details of the discussion can be found in the OTRS, ticket 2010103110002401.

Basically, pictures (mainly portraits of the laureates) from which are marked with a copyright by the Right Livelihood Award Foundation in the respective license files can be used freely upon attribution of the photographer and the Foundation, i.e. Template:Attribution. All pictures with other copyrights are in general incompatible with Wikimedia Commons, since the Foundation does not own all the rights and they are "free to use as long as they are used in the context of the Foundation's and its Laureates' work." In this respect, the information on is not formulated well.

So I wonder whether a batch upload of these pictures makes any sense, or whether it would be faster to sort and upload them manually. --Prolineserver (talk) 22:37, 26 November 2010 (UTC)


Assigned to Progress Bot name Category

Old city maps

Please have a look at this website. It is a digitization project of old maps. It is done by the Hebrew University of Jerusalem and other institutions. They have a sizable database of old maps that are mostly in the public domain. I searched Commons for a sample of their files to find out if they have already been uploaded, and couldn't find any. These are very rare centuries-old maps and they could be invaluable for many Wikimedia projects.

The maps contain copyright watermarks which obviously don't reflect the true copyright status. However, if the university could be contacted and asked to collaborate with us by giving us access to the unwatermarked maps, and in return we offered a customized tag (as has been done for other mass uploads), it would save us an incredible amount of time and effort removing the watermarks at the Graphic lab, especially since we have enough work on our hands (just look at the Category:Images for cleanup backlog).

I hope you can start this upload soon. Regards, -- Orionisttalk 23:41, 5 October 2010 (UTC)

  Support very nice collection --Jarekt (talk) 14:26, 6 October 2010 (UTC)


Assigned to Progress Bot name Category

IUCN red list

As I mentioned on the talk page, we have established a partnership with the International Union for Conservation of Nature (IUCN) to produce range maps for many species of animals. See Commons:IUCN red list for a few more details. The GIS manager at IUCN has kindly produced around 6000 maps (in .gif) for all the amphibian species they currently have data on. They have placed the zip file on a password-protected FTP site (I can send someone the file; it is only a little over 60 megabytes). You can see the samples I have uploaded at Commons talk:IUCN red list#New developments. I also have a .dbf file with information about the source of the spatial data, and I will shortly get a file relating species names to the identification number of each species on the red list website (for example, 56054 is the ID for Acanthixalus sonjae). This can be used to extract the Assessor information required to complete the description file. I know that Polbot's sixth task used information retrieved from the IUCN red list website, so it should be possible to reuse part of its code to retrieve this information. There should be a few other batches later on. GoEThe (talk) 12:41, 1 October 2010 (UTC)
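Once the species-to-ID file arrives, the first step is joining it against the species the maps cover and flagging any that have no red list ID. A minimal sketch, assuming the file is CSV with columns named species and id (the real layout is not known yet); the Assessor lookup itself is not shown.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def load_species_ids(csv_text: str) -> Dict[str, int]:
    """Parse rows with (assumed) 'species' and 'id' columns into a lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["species"].strip(): int(row["id"]) for row in reader}


def match_species(names: Iterable[str], ids: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Split map species into those with a red list ID and those without."""
    matched: Dict[str, int] = {}
    missing: List[str] = []
    for name in names:
        if name in ids:
            matched[name] = ids[name]
        else:
            missing.append(name)
    return matched, missing


ids = load_species_ids("species,id\nAcanthixalus sonjae,56054\n")
matched, missing = match_species(["Acanthixalus sonjae", "Hypothetical species"], ids)
print(matched)   # {'Acanthixalus sonjae': 56054}
print(missing)   # ['Hypothetical species']
```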

Update: The IUCN is going to send me a total of 30,000 images to be uploaded. They would like it to be done in time for their next website update, which will be on November 11th. Can anybody help me with this? GoEThe (talk) 16:06, 21 October 2011 (UTC)


  •   Support I support this, as plenty of articles are missing range maps. It is nice that the IUCN would like to support us by releasing range maps, even if they are in .gif format. --Clarkcj12 (talk) 22:27, 24 January 2012 (UTC)
Maybe a bot could also convert them to PNG, or is that not possible?--Sanandros (talk) 14:31, 21 August 2012 (UTC)
Assigned to Progress Bot name Category
I think I might be able to do this if I get enough information. Werieth (talk) 01:24, 28 January 2013 (UTC)
I would be interested in helping this move forward. -- Daniel Mietchen - WiR/OS (talk) 22:04, 8 February 2013 (UTC)

VOA pronunciation sound files

Voice of America has a great pronunciation guide with sound files for 2200 hard-to-pronounce names, places, etc. The sound files are PD as US govt works. These would be great additions to many Wikipedia articles. The pronunciation guide is here. These would need to be downloaded, converted to OGG, and uploaded. Thoughts? Calliopejen1 (talk) 18:41, 12 September 2010 (UTC)


Assigned to Progress Bot name Category
User:Smallman12q Commons:Bots/Requests/Smallbot 9 User:Smallbot

It looks like there are ~6500 entries under "list lookup". The sounds seem to be good. The conversion could easily be done with ffmpeg. The MP3s don't seem to be more than 15 kB each, so the total upload would be around a hundred MB. In addition to the MP3, there is information on name, country, pronunciation and notes, which could be used to categorize them. Provided the entries are indeed PD, they could easily be batch uploaded. For naming, you could use VOA-name. Smallman12q (talk) 23:44, 27 December 2010 (UTC)
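The size estimate holds up: 6,500 files at up to 15 kB each is at most about 97.5 MB. The per-file work (name the output VOA-name, re-encode the MP3 to Ogg Vorbis with ffmpeg) could be sketched as follows. The libvorbis quality setting and the set of replaced characters are assumptions, not settled choices; the conversion function is shown but needs ffmpeg installed to run.

```python
import re
import subprocess
from typing import List

# Conservative set of characters to keep out of Commons file names (assumption).
UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def voa_filename(name: str) -> str:
    """'VOA-name' title for the converted file, with risky characters replaced."""
    return f"VOA-{UNSAFE.sub('-', name.strip())}.ogg"


def ffmpeg_command(mp3_path: str, ogg_path: str) -> List[str]:
    """Re-encode MP3 audio to Ogg Vorbis; -vn drops any embedded cover image."""
    return ["ffmpeg", "-i", mp3_path, "-vn", "-c:a", "libvorbis", "-q:a", "4", ogg_path]


def convert(mp3_path: str, ogg_path: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_command(mp3_path, ogg_path), check=True)


print(voa_filename("Nguyen Tan Dung"))          # VOA-Nguyen Tan Dung.ogg
print(" ".join(ffmpeg_command("in.mp3", "out.ogg")))
```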

How can you really say they are PD-VOA? Who are the authors? I'd upload them with PD-Threshold of Originality
I can confirm that they are the work of a VOA employee. With the rollout of the improved voa-pronunciation-guide-reimagined-for-2013/, we could confirm with an OTRS email. [45] Use "PD-USGov-VOA" and Category:VOA pronunciation Slowking4†@1₭ 21:40, 30 April 2013 (UTC)
I've filed a BRFA at Commons:Bots/Requests/Smallbot 9. Smallman12q (talk) 20:50, 2 May 2013 (UTC)
Great. A regular refresh is a good idea. I've been in touch with the maintainers of the list; they are indeed PD-VOA. --SJ+ 23:08, 12 May 2013 (UTC)

Population distributions of Japan

I would like to upload images from this category. The images in question are population distributions of various Japanese cities, towns and villages. They are used, for example, in this article or this one. I've uploaded a bunch of samples: 1, 2, 3, the full list. Claymore (talk) 14:50, 6 August 2010 (UTC)


Looks very good! Nothing comes to mind to change here. At jawp however you should add {{NowCommons}} to the images and replace all usage so the admins at jawp can easily delete the files. Multichill (talk) 17:58, 6 August 2010 (UTC)

They depend on a template system that requires file names of the form "Demography(xxxxx).svg". I'll see if I can convince them to move to the template implementation I created for ruwp. Claymore (talk) 07:07, 7 August 2010 (UTC)
A template system which is based on the names of files is sooooo broken. Multichill (talk) 09:03, 7 August 2010 (UTC)

Assigned to Progress Bot name Category
Claymore ClaymoreBot Population distribution of Japan

The Tansey Collection of Miniatures

Hi. The Tansey Collection of Miniatures has a large collection of 17th, 18th and 19th century miniature portrait paintings in high resolution. The paintings are definitely within our scope, and would be a great addition to Commons. I have therefore uploaded some of them here, but since there are so many and the frames need to be cropped to make them eligible for PD-Art, some help would be appreciated. Cheers —P. S. Burton (talk) 17:25, 29 July 2010 (UTC)


  • This sounds like it could end up being a situation similar to that which developed with the UK National Portrait Gallery. Have you, as a courtesy, considered contacting the curators of the collection before doing a systematic process such as this? I also have objections to this based on the cropping, but that is a different issue to that related to batch uploading so I will raise this back at the Village pump (though any need for cropping will make automation difficult or impossible here, especially for the circular and oval miniatures, which is most of them). Carcharoth (Commons) (talk) 06:29, 31 July 2010 (UTC)
Assigned to Progress Bot name Category


We could have a bot upload images from . It is a site like Flickr, but all images are licensed in a way that is OK for Commons.

I created a category for the images and a template to use {{Piqs}}. It needs a better picture but the biggest problem is which images we should upload. We could upload ALL images or have users SELECT images. Perhaps we could make a bot like the one we use to upload images from Flickr. Suggestions? Opinions? --MGA73 (talk) 20:39, 24 July 2010 (UTC)

Nice page. For the initial import, my suggestion is to parse the subpages of the top pictures or here or here or here. If possible, watch for new files there in the future (e.g. via the RSS feeds provided). That way we get only the best and not all the others. But this only makes sense if it is done at intervals, not just once. Another hint: the bot should log in to get the highest resolution (or maybe there is a workaround to download the original?). --Slick (talk) 14:09, 11 August 2012 (UTC)
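Watching the feeds at intervals comes down to remembering which items have already been seen. A minimal sketch using only the standard library, assuming the feeds are plain RSS 2.0 with a link element per item (an Atom feed would need namespaced lookups); fetching the feed and storing the seen set between runs are left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def new_item_links(rss_xml: str, seen: Set[str]) -> List[str]:
    """Return <item><link> values not yet in `seen`, and remember them."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


feed = ("<rss version='2.0'><channel>"
        "<item><link>https://example.org/a</link></item>"
        "<item><link>https://example.org/b</link></item>"
        "</channel></rss>")
seen = {"https://example.org/a"}
print(new_item_links(feed, seen))   # ['https://example.org/b']
print(new_item_links(feed, seen))   # []
```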


Assigned to Progress Bot name Category

Pearson Scott Foresman SVG files

Users at the Open Clip Art Library have created many SVG versions of line drawing files by Pearson Scott Foresman here. They should be uploaded with the DerivativeFX tool, and the raster version tagged with Template:SupersededSVG. File:Catfish (PSF).svg is one file I have uploaded so far; use it as a basis for formatting new SVG upload filepages. --Siddharth Patil (talk) 15:13, 28 June 2010 (UTC)


Assigned to Progress Bot name Category

Old Book Art

This site has tons of old public domain book illustrations. If a bot can upload them, I'll happily categorize them. Rocket000 (talk) 14:31, 21 January 2010 (UTC)

After reading, I think it's best to contact him and see if we can do some sort of partnership. Are you willing to contact him? Multichill (talk) 22:15, 23 May 2010 (UTC)
The images are either released as CC-by-sa or public domain. I think he specifies that he would like as a courtesy a link back to his website, so I think that would suffice for this upload.--Diaa abdelmoneim (talk) 07:22, 14 September 2010 (UTC)
This would be great. Could some bot owner look into this? Yann (talk) 08:12, 14 February 2014 (UTC)
Assigned to Progress Bot name Category

University of Washington Digital Collections

The same algorithm applied to Commons:Batch uploading/Freshwater and Marine Image Bank can be used on several of the UW collections. I'll list some here with the reason why the images would be PD.

There are many more that could be checked.--Diaa abdelmoneim (talk) 17:54, 17 October 2009 (UTC)

Status Collection Size License
Alaska, Western Canada and United States Collection 7242 {{PD-1923}}, verify on a per-image basis
Alaska-Yukon-Pacific Exposition 1425 {{PD-1923}}
American Indians of the Pacific Northwest 2294 inconsistent
Architecture of the Pacific Northwest 1072 unknown
  Done Albert H. Barnes Photographs of Western Washington, ca. 1895-1920 302 {{PD-old-auto-1923|deathyear=1920}}
Boyd and Braas Photographs of Seattle and Washington State, 1888-1893 89 {{PD-1923}}, deathyears indeterminate
Centralia Massacre and the Industrial Workers of the World Collection, 1912-1932 47 {{PD-1923}}, verify per-image
Robert Henry Chandless Photographs of China, 1898-1908 187 {{PD-old-auto-1923|deathyear=1951}}, but check British and Chinese copyrights
Civil War Letters Collection 109 {{PD-1923}}
Civil Works Administration Photographs of King County, 1933-1934 134 might be {{PD-USgov}}
  Done John N. Cobb Photographs of the Fishing Industry, ca. 1897-1917 348 {{PD-old-auto-1923|deathyear=1930}}
Asahel Curtis Photo Company Photographs 1985 partially {{PD-1923}}
Phyllis Dearborn and Robert Massar Photographs of Pacific Northwest Architecture, 1943-1963 1596 likely not PD
Decorated and Decorative Paper Collection 504 {{PD-old-100-1923}}
Lauren R. Donaldson Collection of South Pacific Radiological Surveys, 1946-1964 404 probably {{PD-USgov}}
Early Advertising of the West, 1867-1918 451 {{PD-1923}}
Early Washington Maps 1253 {{PD-1923}}, verify per-image
Everett Massacre of 1916 Collection 110 {{PD-1923}}
Fashion Plate Collection 548 {{PD-1923}}
Federal Emergency Relief Administration (FERA) Photographs of King County, 1933-1935 274 probably {{PD-USgov}}. crop needed
Gairola Indian Art & Architecture Image Collection 2631 not automatically PD
Grand Coulee Dam Construction, 1933-1942 271 maybe {{PD-USgov}}
  Not done Gary Greaves Oral History Digitization Project 123 not PD, audio
Harriman Alaska Expedition of 1899 254 {{PD-1923}}, crop needed
  Done Eric A. Hegg Photographs of Alaska and the Klondike, 1897-1901 817 {{PD-old-auto-1923|deathyear=1948}}
  Done Wilhelm Hester Photographs 937 {{PD-old-auto-1923|deathyear=1947}}
Historical BookArts Collection  ? {{PD-1923}}, very old, different algorithm needed
Historical Children's Literature Collection 1357 some {{PD-1923}}, some {{PD-old-70}}, some not PD
Industries and Occupations Photographs 1355 partially {{PD-1923}}
International Collections Database 5613 partially {{PD-1923}}
  Not done Henry M. Jackson Collection 750 likely not PD
  Not done Jewish Archives Collection 2504 likely not PD
  Done H. Ambrose Kiehl Photograph Collection 298 {{PD-old-auto-1923|deathyear=1942}}
Kinsey Brothers Photographs of the Lumber Industry and the Pacific Northwest, ca. 1890-1945 2795 Some {{PD-old-auto-1923|deathyear=1945}}, some {{PD-old-auto|deathyear=1945}}
  Done Frank La Roche Photographs 337 {{PD-old-auto-1923|deathyear=1936}}
Labor Archives Digital Resources Portal Special algorithm needed
James Patrick Lee Photographs of Seattle, ca. 1904-1940 274 PD questionable (author death post-1957)
Lawrence Denny Lindsley Photographs of Washington State, ca. 1875-1971 472 PD questionable (author death 1975)
McKenney and Hall Indian Tribes of North America
Medieval and Historical Manuscripts 180 very old, but mostly text
William E. Meed Photographs of the Yukon Territory, ca. 1898-1907 235 {{PD-old-1923}}, author deathyear unknown
Menus Collection 80 very mixed, not all PD
  Not done Modern Photographers Collection 1692 Not PD
The Mountaineers Photograph Album Collection 138 mixed
Moving Image Collection 761 mixed, different algo needed for video
Napoleonic Period Collection of Political Caricatures 84 definitely PD, published in Britain and/or France
  Not done R. Nath Mughal Architecture Image Collection 2106 Not PD
Nineteenth (19th) Century Actors and Theater Photographs 843 {{PD-old-1923}}, various authors
  Done Frank H. Nowell Photographs of Alaska, 1901-1909 280 {{PD-old-auto-1923|deathyear=1950}} according to existing category
Oral History Collection 709 audio
Pacific Northwest Historical Documents Database 5275 varied, partially {{PD-old-1923}}
Pamphlet and Textual Ephemera Collection 945 varied, partially {{PD-old-1923}}
Panorama Photographs 156 mostly {{PD-old-1923}}, author varies
  Not done Blanche Payne Regional Costume Photograph and Drawing Collection 1211 not PD (author death 1972, pub >1930)
Theodore E. Peiser Photographs 140 {{PD-old-1923}}, author death unknown but likely shortly after 1907.
Lee Pickett Photographs 1690 Some {{PD-old-1923}}, author death 1959
Portraits Database 1930 mostly {{PD-old-1923}}
Prior and Norris Troupe Photographs 229 likely all {{PD-old-1923}}
Prosch Albums, ca. 1851-1906 169 All PD, different authors
Rainier National Park Mountain-Glacier Wonderland Photograph Album, ca. 1925 51
Salmon in the Pacific Northwest and Alaska Collection, 1890-1961 274 Varies
  Done Henry M. Sarvant Photographs of Washington State and the Yukon, 1892-1912 210 {{PD-old-auto-1923|deathyear=1940}}
J. Willis Sayre Photographs 24293 Some {{PD-1923}}
SeaTac/Seattle Minimum Wage Project 49+56 Not PD
Seattle Photographs 5025 Mostly {{PD-1923}}, no authors listed
Seattle Power and Water Supply Collection 696 Mostly {{PD-1923}}, no authors listed
Society and Culture Collection 9513 Some {{PD-1923}}, some various authors listed
South Asian Oral History Project  ? Not PD, audio, different algorithm needed
Stereocard Collection 325 {{PD-1923}}, check per image, misc. photographers listed
Sundberg Oral History Digitization Project 22 Not automatically PD, audio
Tacoma Narrows Bridge Collection 176 Post-1923, some misc authors, PD unknown
  Done John E. Thwaites Photographs of Alaska, 1905-1920 394 {{PD-old-auto-1923|deathyear=1940}}
Calvin F. Todd Photographs of Seattle, 1905-1930 247 Some {{PD-old-auto-1923|deathyear=1968}}, some post-1923 (not PD)
Tollman and Canaris Photographs of the Salmon Industry in Washington State, 1893-1897 60 {{PD-anon-1923}}? Only Canaris' deathyear is known.
Transportation Photographs 1421 Some {{PD-1923}}, misc authors
University of Washington Campus Photographs 2985 Some {{PD-anon-1923}}, some misc authors
University of Washington Yearbooks and Documents  ? TODO
Oliver S. Van Olinda Photographs of Puget Sound, 1880s-1930s 421 Author death 1954, some {{PD-1923}}
Vietnam War Era Ephemera Collection 303 Likely not PD
Alvin H. Waite Photographs of Tacoma and Washington State, 1892-1907 164 {{PD-old-auto-1923|deathyear=1929}}
War Poster Collection  ? TODO
Arthur Churchill Warner Photographs of Washington State and Alaska, ca. 1884-1945  ? TODO
Washington State Localities Photographs  ? TODO
Dwight Watson Photographs of Washington State, 1933-1943  ? TODO
World and Regional Maps, 16th to 19th Centuries  ? TODO
WTO Seattle Collection  ? TODO
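The license column above follows a fairly mechanical pattern, which a bot could use to pre-sort records. This sketch only reproduces the table's own conventions (published before 1923 with a known author death year gives PD-old-auto-1923, published before 1923 with no known author gives PD-1923, anything else goes to a human). It is not a copyright determination, and it assumes a publication year is available for each record.

```python
from typing import Optional


def suggest_license(pub_year: Optional[int], death_year: Optional[int]) -> Optional[str]:
    """Suggest a tag following the table's conventions; None means manual review."""
    if pub_year is None or pub_year >= 1923:
        return None  # outside the pre-1923 pattern used in the table
    if death_year is not None:
        return "{{PD-old-auto-1923|deathyear=" + str(death_year) + "}}"
    return "{{PD-1923}}"


print(suggest_license(1910, 1920))   # {{PD-old-auto-1923|deathyear=1920}}
print(suggest_license(1905, None))   # {{PD-1923}}
print(suggest_license(1930, 1940))   # None
```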


Assigned to Progress Bot name Category
User:BMacZero   Done Albert Henry Barnes Collection User:BMacZeroBot Category:Images from the Albert Henry Barnes Collection to check
User:BMacZero   Done John N. Cobb Photographs of the Fishing Industry, ca. 1897-1917 User:BMacZeroBot Category:Images from the John N. Cobb Photograph Collection to check
User:BMacZero   Done Eric A. Hegg Photographs of Alaska and the Klondike, 1897-1901 User:BMacZeroBot Category:Images from the Eric A. Hegg Photographs Collection to check
User:BMacZero   Done H. Ambrose Kiehl Photograph Collection User:BMacZeroBot Category:Images from the H. Ambrose Kiehl Photograph Collection to check
User:BMacZero   Done Wilhelm Hester Photographs User:BMacZeroBot Category:Images from the Wilhelm Hester Photographs Collection to check
User:BMacZero   Done Frank La Roche Photographs User:BMacZeroBot Category:Images from the Frank La Roche Photographs Collection to check
User:BMacZero   Done Frank H. Nowell Photographs of Alaska User:BMacZeroBot Category:Images from the Frank H. Nowell Photographs of Alaska Collection to check
User:BMacZero   Done Henry M. Sarvant Photographs User:BMacZeroBot Category:Images from the Henry M. Sarvant Photographs to check
User:BMacZero   Done John E. Thwaites Photographs of Alaska User:BMacZeroBot Category:Images from the John E. Thwaites Photographs of Alaska Collection to check

NOAA Photo Library

The FEMA request got me started. NOAA has a nice set of images at . I'm not sure how many images we're talking about, but at least a couple of thousand. Multichill (talk) 20:09, 14 October 2009 (UTC)

See the catalog of images.


If possible, go ahead with it since there haven't been any objections. –Juliancolton | Talk 16:55, 19 March 2010 (UTC)

It does sound good. -- User:Docu at 19:42, 2 May 2010 (UTC)

Some or all of these images don't have metadata, including the dates of when they were taken. --O (висчвын) 20:04, 07 August 2010 (GMT)

Hmm, are they really free? Some of them have an author who does not work for NOAA directly but for a university that takes part in the project.--Sanandros (talk) 14:44, 21 August 2012 (UTC)

Assigned to Progress Bot name

Images from Beinecke's collections

One more wonderful collection with lots of PD images: 200,000 digitized images of photographs, illuminated manuscripts, maps, works of art, and books from the Beinecke's collections --Butko (talk) 08:50, 16 April 2009 (UTC)

Did you contact them? Did you get a release? Or is this merely a suggestion. That shouldn't go here imho. Nice collection though, we should contact them to get some nice images. Multichill (talk) 14:05, 7 June 2009 (UTC)
I would like to help out on the acquisition of images of this library. I wanted to send an e-mail but thought it would be best if we work together on a draft. --Diaa abdelmoneim (talk) 14:59, 7 June 2009 (UTC)
Ok. As discussed on irc: You'll contact the library. Please keep me posted. Multichill (talk) 15:07, 7 June 2009 (UTC)
Any update on this one? Multichill (talk) 23:14, 4 September 2009 (UTC)
I sent them a mail multiple times but they didn't reply....--Diaa abdelmoneim (talk) 23:18, 4 September 2009 (UTC)
  • User:JovanCormac seems to have started uploading the Detroit Company images. Maybe the batch should be split into many parts then each uploaded on its own.--Diaa abdelmoneim (talk) 17:06, 16 October 2009 (UTC)

Can this be removed from the list (Commons:Batch uploading)? -- RE rillke questions? 18:27, 4 June 2012 (UTC)

Images from World Digital Library

New site with PD images; it contains 1170 items --Butko (talk) 06:52, 22 April 2009 (UTC)

User:Sj has shown interest in working on this upload. Looks like a very nice collection. Some points:
  • The items have an id, so they are easy to loop over
  • The description of the items is available in a lot of languages, we should use that
  • Lots of metadata is available; this should make categorization easier
  • One item can contain multiple files. We should be aware of that
  • Files are available in the tiff file format. We should either have tiff thumbnails or upload tiff and a jpg version (transcoding!)
  • Experience and code gained with the usgov uploads should be (re)used
  • Some items have curator videos, which might be fun to upload too
Multichill (talk) 14:13, 8 November 2009 (UTC)
Aside: There's a lot of interest in using data from how these images are used in encyclopedia articles, and how traffic is driven to the original archives, to inspire more libraries to take part in WDL. +sj + 14:14, 8 November 2009 (UTC)

Any progress? -- RE rillke questions? 18:29, 4 June 2012 (UTC)

Thanks for the reminder. They've done a batch of updates recently; I'll see if I can get a dump next week before finding a suitable scraper. --SJ+ 06:52, 21 June 2012 (UTC)
Hi there. I'm the Wikipedian in Residence at the World Digital Library. Some content is already found here: Category:Images from the World Digital Library. Whatever you decide to do is what you decide to do, but, WDL asked that facilitating mass uploads and encouraging extensive uploading not be a part of my scope this year. This is at the request of the majority of their partners. But, I can't control what others do, of course, if something is in the public domain, and I do upload occasional images. Do note: not all content on WDL is public domain - there is content from post-1923 (per US law) on the site, and much of that content was not created by federal/government entities. So make sure you check each page accordingly. (This includes content from the Florida State Library, for example.) Sarah (talk) 04:06, 13 July 2013 (UTC)

2014 Uploads

@Sj, SarahStierch, Rillke: I am revisiting these based on an (independent) email request earlier this week from 維基小霸王. Progress below. It may take 5 years but we get there eventually.   -- (talk) 11:16, 28 February 2014 (UTC)

Thank you, Fæ.--維基小霸王 (talk) 10:40, 3 April 2014 (UTC)
Technical, comments
File:<title[<=200 characters]> WDL<id>.{png, pdf, jpg}

Here <id> is the WDL database number. The title may be trimmed to below 200 characters by breaking off sentences, as some titles are long due to additional sentences adding details. The title is taken from the English version of the page but may include various non-English characters, such as the under-dot and over-bar in "Ṭahmāsp"; consequently the Commons file name relies on UTF-8 rather than being limited to ASCII.
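A minimal sketch of this naming rule, assuming sentence boundaries can be found with a simple punctuation regex (abbreviations such as "St." will fool it) and a conservative set of characters to replace; neither detail is taken from the actual upload script. Python's len() counts characters, whereas MediaWiki limits page titles by UTF-8 byte length (255 bytes), so a title heavy in non-ASCII characters uses up that budget faster than 200 characters suggests.

```python
import re

MAX_TITLE_CHARS = 200
# Conservative set of characters to keep out of Commons file names (assumption).
UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def trim_title(title: str, limit: int = MAX_TITLE_CHARS) -> str:
    """Drop trailing sentences until the title fits; hard-cut only as a last resort."""
    title = " ".join(title.split())  # collapse runs of whitespace
    sentences = re.split(r"(?<=[.!?])\s+", title)
    while len(sentences) > 1 and len(" ".join(sentences)) > limit:
        sentences.pop()
    trimmed = " ".join(sentences)
    return trimmed if len(trimmed) <= limit else trimmed[:limit].rstrip()


def wdl_filename(title: str, wdl_id: int, ext: str) -> str:
    """Build '<title> WDL<id>.<ext>'; Python strings are Unicode, so 'Ṭahmāsp' is kept as-is."""
    safe = UNSAFE.sub("-", trim_title(title)).rstrip(" .")
    return f"{safe} WDL{wdl_id}.{ext}"


name = wdl_filename("Treatise on Geometry", 7107, "pdf")
print(name)                        # Treatise on Geometry WDL7107.pdf
print(len(name.encode("utf-8")))   # byte length, which is what MediaWiki checks
```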

First (and possibly only) run limited to photographs that are single items unless in PDF format. Other formats such as MP3 files exist on the site and would require pre-processing; whether this is worth the time to batch automate will depend on volumes and interest.

Copyright issues
  1. File:Bombed Copy of “Defensor pacis” WDL11254.png was impossible to filter automatically. There is no date for the photograph, nor any detail about the photographer, only details of the object. The same photograph (in a different tint and size) appears at (The Royal Library of Denmark) and may be CC-BY-NC-ND under the website terms; however, the absence of a copyright status or back-link when released through WDL may indicate that this general term was not intended to apply to this individual photo.
completeness issues
  1. Books like File:On Aristotle’s “On the Heavens” WDL7106.jpg are incomplete: only the first page is uploaded.--維基小霸王 (talk) 04:59, 3 March 2014 (UTC)
    I think the easiest solution is to identify these as a backlog list and then create them by hand using the existing text with the original then being deleted as an inferior duplicate. This example turns into:
    On Aristotle’s “On the Heavens” WDL7106 V1.pdf & On Aristotle’s “On the Heavens” WDL7106 V2.pdf
    Others fixed
    Please add any more to be processed to Images uploaded by Fæ (check needed) as they will then appear in this live list.
Rotation issues

Book scans like File:Treatise on Geometry WDL7107.pdf require rotation.--維基小霸王 (talk) 09:42, 3 March 2014 (UTC)

I have raised this on Commons:Bots/Work_requests#Rotating_books as someone may have created a tool to do this already. -- (talk) 09:52, 3 March 2014 (UTC)
Resolution issues

The resolution of book scans like File:Theater of Instruments and Machines WDL4305.pdf is too low to make out the text. The resolution can be improved by reconstructing the PDFs from the page images, although the result is still not very clear.--維基小霸王 (talk) 01:40, 4 March 2014 (UTC)

I agree that a resolution of 675px is disappointingly small but is just about usable. I am not sure it would be worth the effort of restitching a new book unless the scanned pages were significantly improved and the available WDL png versions in this case are only 836px wide. These may be a case of attempting to find better scans elsewhere and re-uploading with new files rather than bothering with the existing WDL copy.
Checking all the books in Books from the World Digital Library, 93%+ have a width of over 800px, in fact a large proportion are over 2,000px. List for analysis below.
If any are definitely unusable, I encourage raising a deletion request under COM:SCOPE and we can discuss if there are possible replacements or if repairs are worth the effort. -- (talk) 09:48, 4 March 2014 (UTC)
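The width check described above can be sketched as follows. The sketch assumes the widths have already been gathered into a dictionary of file name to pixel width. One plausible source is the MediaWiki API's imageinfo size data, but how the original list was produced is not stated. The 800px threshold comes from the comment above, and the function name is hypothetical.

```python
def width_report(widths: dict[str, int], threshold: int = 800) -> tuple[float, list[str]]:
    """Return (percent of files wider than threshold, sorted names at or below it)."""
    if not widths:
        return 0.0, []
    # Narrow files are the candidates for a COM:SCOPE deletion request or replacement.
    narrow = sorted(name for name, w in widths.items() if w <= threshold)
    percent = 100.0 * (len(widths) - len(narrow)) / len(widths)
    return percent, narrow
```

For example, `width_report({"A.pdf": 675, "B.pdf": 2000, "C.pdf": 836, "D.pdf": 2400})` returns `(75.0, ["A.pdf"])`.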
I ran a little experiment with File:Theater of Instruments and Machines WDL4305.pdf by bot-downloading all the png versions of the pages, converting them to very high quality jpegs and compiling them into a new pdf. The result went from an 8MB file to a 252MB file, so a bit more compression would probably be okay, though a reader can now see the pages at 2,000px across rather than 675px. This is all expensive in volunteer time and a bit too complex to automate, so I would only expect to do this work for a handful of desirable books. -- (talk) 20:12, 9 March 2014 (UTC)
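The "compile the jpegs into a new pdf" step can be done without re-encoding. Each JPEG is embedded unchanged as a DCTDecode image, one image per page. The sketch below is a minimal, standard-library-only PDF writer written for this note. It is not the tool used in the experiment above. It assumes the png-to-jpeg conversion has already happened (that part needs an imaging library), accepts only grayscale or RGB JPEGs, and maps 1 pixel to 1 point.

```python
import struct

# Colour spaces we can label from the JPEG component count (CMYK left out).
COLOR_SPACES = {1: "/DeviceGray", 3: "/DeviceRGB"}


def jpeg_info(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, components) from the first SOFn segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("unexpected byte in JPEG marker stream")
        marker = data[i + 1]
        (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
        # SOF0..SOF15 carry the frame size; C4 (DHT), C8 (JPG) and CC (DAC) do not.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + seg_len  # skip the marker (2 bytes) plus the segment payload
    raise ValueError("no SOF segment found")


def jpegs_to_pdf(pages: list[bytes]) -> bytes:
    """Embed each JPEG unchanged (DCTDecode) as one full-page image, 1 px = 1 pt."""
    body: dict[int, bytes] = {}
    kids = []
    for k, jpg in enumerate(pages):
        w, h, comps = jpeg_info(jpg)
        if comps not in COLOR_SPACES:
            raise ValueError(f"page {k}: unsupported component count {comps}")
        # Objects 1 and 2 are the catalog and page tree; each page uses 3 more.
        page, img, content = 3 + 3 * k, 4 + 3 * k, 5 + 3 * k
        kids.append(f"{page} 0 R")
        body[page] = (
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w} {h}] "
            f"/Resources << /XObject << /Im0 {img} 0 R >> >> /Contents {content} 0 R >>"
        ).encode()
        body[img] = (
            f"<< /Type /XObject /Subtype /Image /Width {w} /Height {h} "
            f"/ColorSpace {COLOR_SPACES[comps]} /BitsPerComponent 8 "
            f"/Filter /DCTDecode /Length {len(jpg)} >>\nstream\n"
        ).encode() + jpg + b"\nendstream"
        draw = f"q {w} 0 0 {h} 0 0 cm /Im0 Do Q".encode()  # scale the image to the page
        body[content] = f"<< /Length {len(draw)} >>\nstream\n".encode() + draw + b"\nendstream"
    body[1] = b"<< /Type /Catalog /Pages 2 0 R >>"
    body[2] = f"<< /Type /Pages /Kids [{' '.join(kids)}] /Count {len(pages)} >>".encode()

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num in range(1, len(body) + 1):
        offsets.append(len(out))  # byte offset of "num 0 obj" for the xref table
        out += f"{num} 0 obj\n".encode() + body[num] + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(body) + 1}\n0000000000 65535 f \n".encode()
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(body) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

Because the JPEG bytes are copied as-is, the output size is roughly the sum of the page JPEGs. The compression trade-off mentioned above is therefore decided entirely at the jpeg conversion step.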
Assigned to, task Progress Category
Fæ, mapping to {{Artwork}} using BeautifulSoup in Python, including capturing all available languages ('en', 'fr', 'ru', 'ar', 'es', 'pt', 'zh'). I was particularly pleased to get both Chinese and Arabic working on the image pages. Status:    Done -
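Only the mapping step of this task is sketched below, using the standard library. The BeautifulSoup scraping of the WDL pages is not reproduced. The sketch assumes the per-language strings have already been extracted into dictionaries keyed by language code. It wraps each translation in the matching Commons language template and fills a few {{Artwork}} parameters (title, description, date, source). The helper names are hypothetical, and the real upload used more fields.

```python
LANGS = ("en", "fr", "ru", "ar", "es", "pt", "zh")  # languages captured from WDL


def multilingual(texts: dict[str, str]) -> str:
    """Wrap each available translation in its Commons language template."""
    parts = []
    for lang in LANGS:
        text = texts.get(lang, "").strip()
        if text:
            # A bare "|" would split the template argument; {{!}} renders as "|".
            parts.append("{{%s|1=%s}}" % (lang, text.replace("|", "{{!}}")))
    return "\n".join(parts)


def artwork(title: dict[str, str], description: dict[str, str], date: str, source: str) -> str:
    """Assemble a minimal {{Artwork}} call from per-language WDL fields."""
    fields = {
        "title": multilingual(title),
        "description": multilingual(description),
        "date": date,
        "source": source,
    }
    lines = ["{{Artwork"] + [f" |{name} = {value}" for name, value in fields.items()]
    return "\n".join(lines + ["}}"])
```

Using the named `1=` parameter keeps text that contains "=" from being misread as a parameter name, which matters for long WDL descriptions.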
Fæ, intelligent categorization filter based on WDL location and keywords. (Considered adding a grandparent/parent/child check on the category list, but this could be a later Faebot project to apply to many batch uploads and need not be a dependency here as the categories appear "reasonable".) Status:    Done -
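One way such a location-and-keyword filter could work is sketched below. The exact rules used for this upload are not documented here. In this sketch, keyword patterns such as "Photographs of {place}" are combined with the WDL location fields, and a candidate is kept only if the category already exists on Commons, so no red-linked categories are created. The pattern table and the `existing` set are hypothetical inputs; in practice the set would have to be fetched from Commons.

```python
BASE_CATEGORY = "Images from the World Digital Library"


def categorize(places: list[str], keywords: list[str],
               patterns: dict[str, str], existing: set[str],
               base: str = BASE_CATEGORY) -> list[str]:
    """Return categories for one file, keeping only ones that already exist."""
    cats = {base}
    for place in places:
        if place in existing:
            cats.add(place)
        for kw in keywords:
            pattern = patterns.get(kw.lower())
            if pattern:
                candidate = pattern.format(place=place)
                if candidate in existing:  # the filter: never create red-linked categories
                    cats.add(candidate)
    return sorted(cats)
```

Example: with `patterns={"photographs": "Photographs of {place}"}` and `existing={"China", "Photographs of China"}`, a photograph located in China gets `["China", "Images from the World Digital Library", "Photographs of China"]`.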
Fæ, intelligent licensing to skip post-1923 works.

May miss some files due to WDL layout inconsistencies. The filter is based on the creator and date fields: a file is skipped if creator['en'] or date matches the pattern "192[3-9]|19[3-9]\d|20[0-1]\d". This will cause some public domain images to be skipped (such as the Orville Wright photo of 1903 below), but the law of diminishing returns on programmer time applies...

Licenses being used are not great, however with the ample metadata being included in the Artwork template, this might be easy to refine if there are suggested improvements.

Status:    Done -
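The date filter described above can be sketched like this. The regular expression is the one quoted in the row above. The use of `re.search` is an assumption, because the original call was lost in the page text. The example creator string is hypothetical, but it shows one plausible way a 1903 photo gets skipped: the pattern is unanchored, so a creator's life dates such as "1871-1948" match even though the photo itself is older.

```python
import re

# Pattern quoted above: any year from 1923 to 2019 appearing anywhere in the field.
POST_1923 = re.compile(r"192[3-9]|19[3-9]\d|20[0-1]\d")


def skip_for_copyright(creator_en: str, date: str) -> bool:
    """True if either field mentions a year that might still be in copyright."""
    return bool(POST_1923.search(creator_en) or POST_1923.search(date))


# Hypothetical creator string with life dates: skipped although dated 1903.
print(skip_for_copyright("Wright, Orville, 1871-1948", "1903"))  # True
```

Checking only the date field would avoid this false positive, but it would let through undated works by living creators. That trade-off is presumably why both fields were used.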
Fæ, upload first batch photographs (~1,500 done)

During the first 400 or so I found a number of bugs/improvements (such as improved categorization). A couple of uploads may end up getting deleted due to uncertain copyright, however the date filter should now be adequate for the rest of the run.

Upload a second batch in parallel: books (max 877)

Status:    In progress Catscan report
{Images from the World Digital Library & Images uploaded by Fæ}
The WDL category started with 250 images already uploaded.

Second category specifically for books (pdf) Books from the World Digital Library

Fæ, decide what to do about uploads over 100MB, for example 10630 which is over 180MB in size.

These have to be handled manually as there is no readily available batch process, list below.

Log of WDL pdf/book files larger than 100MB
  1. The Qur'an in the Earliest Printed Version 210MB
  2. The First Folio of Shakespeare WDL11290.pdf 156MB
  3. Book of the Holy Gospel of Our Lord and God Jesus Christ WDL9917.pdf 103MB
  4. The Lincoln Bible WDL11358.pdf 367MB
    File:The Spiritual Couplets.pdf [46] Manuscript copy appears to have been created in 2006.
  5. File:An Examination of the Talents Required for the Sciences WDL10630.pdf 182MB
  6. El melopeo y maestro- Treatise on the Theory and Practice of Music WDL10633.pdf 358MB
  7. Account of the Composition of the Human Body
    PDF 102.2 MB
  8. Kiev with Its Oldest School, the Academy
    PDF 144.6 MB
  9. Complete Book on the Judgment of the Stars
    PDF 124.3 MB
  10. Commentary on the Chapter Nine of the Book of Medicine Dedicated to Mansur
    PDF 105.2 MB
  11. Compendium of Medical Texts by Mesue, with Additional Writings by Various Authors
    PDF 111.5 MB
  12. The Greater Luminary
    PDF 173.7 MB
  13. The Seven Books on the Therapeutic Method, Which Is the Art of Curing, by John of Damascus from the Decapolis, Major Medical Authority among the Arabs
    PDF 110.4 MB
  14. Rosary and Service Dedicated to the Blessed Virgin Mary and Other Devotions Combined in Honor of the Most Holy Trinity and in Worship of the Most Venerable Queen of the Heavens
    PDF 162.9 MB
  15. City of God
    PDF 128.9 MB
  16. Book of Effects of Drugs
    PDF 112.3 MB
  17. Arabia- The Cradle of Islam
    PDF 102.2 MB
  18. Life in the Desert, or, Recollections of Travel in Asia and Africa
    PDF 138.9 MB
  19. The Penetration of Arabia- A Record of the Development of Western Knowledge Concerning the Arabian Peninsula
    PDF 101.8 MB
  20. Personal Narrative of a Year's Journey through Central and Eastern Arabia (1862-63)
    PDF 125.2 MB
  21. From the Indus to the Tigris
    PDF 126.9 MB
  22. Narrative of a Journey into Khorasan, in the Years 1821 and 1822
    PDF 231.7 MB
  23. Strolls Around Tobol'sk in 1830
    PDF 110.6 MB
  24. Guide to the Great Siberian Railway
    PDF 351.6 MB
  25. The Amazon and Madeira Rivers- Sketches and Descriptions from the Note-Book of an Explorer
    PDF 104.9 MB
  26. Through the Brazilian Wilderness, by Theodore Roosevelt- With Illustrations from Photographs by Kermit Roosevelt and Other Members of the Expedition
    PDF 153.6 MB
  27. A Journal of Captain Cook's Last Voyage to the Pacific Ocean, and in Quest of a North-West Passage Between Asia & America, Performed in the Years 1776, 1777, 1778, and 1779
    PDF 183.8 MB
  28. White Isles of the South Sea- History of the Apostolic Vicariate of the Gilbert and Ellice Archipelagoes, by Father Fernand Hartzer
    PDF 254.6 MB
  29. Funafuti; Or Three Months on a Coral Island- An Unscientific Account of a Scientific Expedition
    PDF 121.6 MB
  30. Nepal and the Himalayan Countries
    PDF 113.9 MB
  31. An Account of the Kingdom of Nepal
    PDF 236.3 MB
  32. Journal of a Tour through Part of the Snowy Range of the Himala Mountains, and to the Sources of the Rivers Jumna and Ganges
    PDF 473.2 MB
  33. The History of Genghizcan the Great, First Emperor of the Antient Moguls and Tartars
    PDF 339.8 MB
  34. Germany and Its Colonies- Travels through the Empire and Its Overseas Possessions, with the Collaboration of Arthur Achleitner, Johannes Biernatzki, et al.
    PDF 542.3 MB
  35. Bhotan and the Story of the Dooar War
    PDF 389.9 MB
  36. A Narrative of the Mission Sent by the Governor-General of India to the Court of Ava in 1855, with Notices of the Country, Government, and People
    PDF 363.1 MB
  37. Portuguese Possessions in Oceania
    PDF 371.5 MB
  38. The History of Persia
    PDF 334.3 MB
  39. Description of Egypt. First Edition. Antiquities, Descriptions, Volume One
    PDF 288.8 MB
  40. Description of Egypt. First Edition. Antiquities, Descriptions, Volume Two
    PDF 332.0 MB
  41. Description of Egypt. First Edition. Antiquities, Essays, Volume One
    PDF 312.4 MB
  42. Description of Egypt. First Edition. Antiquities, Essays, Volume Two
    PDF 119.6 MB
  43. Description of Egypt. First Edition. Modern State, Volume One
    PDF 433.3 MB
  44. Description of Egypt. First Edition. Modern State, Volume Two
    PDF 306.9 MB
  45. Description of Egypt. First Edition. Modern State, Volume Two (Additional)
    PDF 301.7 MB
  46. Description of Egypt. First Edition. Natural History, Volume One
    PDF 356.0 MB
  47. Description of Egypt. First Edition. Natural History, Volume Two
    PDF 324.3 MB
  48. Geographical Description and Governmental Administration and Settlement of the Spanish Colonies in the Gulf of Guinea
    PDF 288.1 MB
  49. Narrative of an Expedition to Explore the River Zaire, Usually Called the Congo, in South Africa, in 1816
    PDF 378.3 MB
  50. The History of the Caribby-Islands
    PDF 243.0 MB
  51. The African West and Catholic Missions, Congo and Oubangi
    PDF 229.4 MB
  52. Explorations in Africa, By Dr. David Livingstone, and Others, Giving a Full Account of the Stanley-Livingstone Expedition of Search, under the Patronage of the New York Herald, as Furnished by Dr. Livingstone and Mr. Stanley
    PDF 300.4 MB
  53. Mister Johann Anderson...Reports on Iceland, Greenland, and the Davis Strait for the Proper Use of the Sciences and Commerce
    PDF 301.4 MB
  54. A Voyage Down the Amoor- With a Land Journey through Siberia, and Incidental Notices of Manchooria, Kamschatka, and Japan
    PDF 216.4 MB
  55. The Constitution of India
    PDF 224.4 MB
  56. A New, Authentic, and Complete Collection of Voyages Round the World- Undertaken and Performed by Royal Authority, Containing a New, Authentic, Entertaining, Instructive, Full, and Complete Historical Account of Captain Cook's First, Second, Third, and Last Voyages, Undertaken by Order of His Present Majesty
    PDF 808.9 MB
  57. The Special Features of French Antarctica, Otherwise Called America, and of Several Lands and Islands Discovered in Our Time
    PDF 178.2 MB
  58. Sunday Book
    PDF 286.7 MB
  59. Aesop's Fables
    PDF 179.0 MB
  60. A Voyage Round the World, Including an Embassy to Muscat and Siam in 1835, 1836, and 1837
    PDF 471.8 MB
  61. Travels in South Africa in the Years 1849 to 1857
    PDF 374.0 MB
  62. Laszlo Magyar's Travels in Southern Africa Between 1849 and 1857
    PDF 495.2 MB
  63. Commentary of Hugo of Sienna on the First (Book) of the Canon of Avicenna Together with His Questions
    PDF 195.7 MB
  64. Early Writings of Carl von Linne
    PDF 161.2 MB
  65. Ibn Battuta's Rihla
    PDF 100.2 MB
Log of WDL pdf files over 99MB
  66. Foreign Relations of the United States, 1894. Appendix 2- Affairs in Hawaii
    PDF 247.4 MB
  67. The Lango- A Nilotic Tribe of Uganda
    PDF 124.2 MB
  68. Through the Dark Continent
    PDF 149.4 MB
  69. A. M. Mackay- Pioneer Missionary of the Church Missionary Society to Uganda
    PDF 114.1 MB
  70. Amadis of Gaul
    PDF 623.7 MB
  71. An Account of a Selection of Plants of America
    PDF 354.1 MB
  72. An Account of a Selection of Plants of America
    PDF 178.8 MB
  73. The Pilgrimage of Alpha (Manuel Ancizar) in the Northern Provinces of New Granada, 1850-51
    PDF 236.7 MB
  74. Saint Ignatius of Loyola, Founder of the Society of Jesus- Heroic Poem
    PDF 351.6 MB
  75. The Gospel of St. Matthew
    PDF 120.1 MB
  76. The Comprehensive Book on Medicine
    PDF 125.7 MB
  77. Earthquakes of India- Volume I
    PDF 157.2 MB
  78. Description of Malta
    PDF 112.6 MB
Status:    Backlog -
Fæ, re-work 'zoomified' objects such as File:From Tobol'sk to Obdorsk WDL181.jpg where there are multiple images. Currently only the first image has been taken; however, these appear to be a small proportion of the total. In this particular example the WDL provided no pdf version, so I assembled one by downloading the 33 png files used by the "zoomify" tool and repackaging them into a pdf. See File:From Tobol'sk to Obdorsk WDL181.pdf.

195 files to be checked have been identified. This is the same issue as raised in the discussion above. Backlog created here.

Status:    1% done -
Fæ, many of the images have links to an original library source (in particular images provided by the US Library of Congress). It should be possible to go through these and check if an original very high resolution TIFF is available and can be uploaded as an alternative version. Based on the previous NARA batch upload, the Commons Community preferred to have both a high resolution jpeg/png which is easily used elsewhere, as well as a larger or sometimes extremely large tiff, with the categorization on the usable file with a pointer in the other_versions parameter to the on-wiki tiff.

Where a tiff is available and does not already appear on Commons, it would be ideal to upload it as an alternative in this way.

Note, in some cases the original link no longer works, this is the case for the Biblioteca Nacional Digital of Brazil and it may be impossible to find an original scan.

    1. File:Chinese Military Officer, in Official Uniform and Summer Straw Hat. China, 1874-75 WDL1905.png, 1,024 × 1,455 px, description in 7 languages
    2. LOC catalogue (A): tiff available at 1,081 × 1,536 px
    3. LOC detailed record (B), linked from A, in English only
    4. Given the LCCN deduced from B, we can pull the XML record -
    1. File:Declaration of Independence. In Congress, July 4, 1776, a Declaration by the Representatives of the United States of America, in General Congress Assembled. WDL109.jpg - see earlier issue of multiple pages, this is only 1 page out of 2 to be uploaded, width 1,024px, 7 languages
    2. LOC display (A): (4435 × 5465px)
    3. LOC catalog, linked from A, English only
    4. XML record -
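Whether a LOC tiff is worth uploading as an alternative version partly depends on how much it adds. A quick check of the pixel counts for the first example above shows the tiff has only about 11% more pixels than the existing Commons png. The `min_gain` threshold below is a hypothetical rule of thumb, not an agreed Commons policy. The Declaration example cannot be checked the same way, because only the width of the Commons file is given.

```python
def pixel_gain(current: tuple[int, int], candidate: tuple[int, int]) -> float:
    """How many times more pixels the candidate has than the current file."""
    (cw, ch), (nw, nh) = current, candidate
    return (nw * nh) / (cw * ch)


def worth_uploading(current: tuple[int, int], candidate: tuple[int, int],
                    min_gain: float = 1.5) -> bool:
    """Hypothetical rule: only fetch a tiff that adds at least min_gain times the pixels."""
    return pixel_gain(current, candidate) >= min_gain


# First example above: Commons png 1,024 x 1,455 px vs LOC tiff 1,081 x 1,536 px.
print(round(pixel_gain((1024, 1455), (1081, 1536)), 2))  # 1.11
```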

I will ponder the path to take here. Where a source exists in higher resolution online, it will invariably be in a library collection, which may itself be worth treating as its own batch upload project rather than handling piecemeal just because some items happen to be published on the WDL.

Status:    On backlog -

Maps from Ryhiner Collection

Available from I've been dealing with this collection for some time (see this file for an example). This collection consists of "over 16000 high resolution images: maps, town plans and topographical views from the 16th to the early 19th century". So, if this statement can be taken at face value, there is no copyright problem, because these maps are already in the public domain and, being 2D works, their digital copies are also PD. If these statements are correct, the whole collection could be uploaded to Commons by a bot. Their maps are available in high resolution using Zoomify (see the example map on their site). Tm (talk) 13:20, 22 April 2009 (UTC)


Looks like a great collection. Is it possible to access the source files? Did you try contacting them? Multichill (talk) 14:03, 7 June 2009 (UTC)

Sorry for the delayed answer. To answer your first question, I don't know if it's possible to get online access to their source files, and I am not very tech-savvy. Also, I didn't try to contact them. What is your opinion on the next steps to take? Tm (talk) 01:25, 15 June 2009 (UTC)

Today I sent an email asking for their permission to make this batch upload. I thought that asking at this stage whether their source files are available online would be too soon. Tm (talk) 15:10, 2 July 2009 (UTC)
Sorry about not responding sooner; it looks like I forgot to watchlist this page. We're in the non-technical phase. Try to contact them and see if they like it. If that turns out all right we can start the actual data retrieval and uploading part. Writing a general story about this is still on my list; I'll see if I can make a first version. Multichill (talk) 16:59, 2 July 2009 (UTC)

Just a quick update: I received an automatic reply saying that the person I emailed is away, and I forwarded my message to the email address given in that reply. When and if I receive an answer, I'll update this page. Tm (talk) 00:48, 3 July 2009 (UTC)

I received an answer and have already replied to it, but I am waiting for permission to republish the email or the contents of the answer I received. Tm (talk) 04:10, 12 July 2009 (UTC)

You can always use OTRS if you want to keep it private. Multichill (talk) 10:56, 12 July 2009 (UTC)

The question isn't exactly about privacy, but more about building trust between the parties after the NPG case (I fully support Dcoetzee), which these people might have heard about and which might have given them a bad impression of Wikimedia Commons and its users. Without breaking the confidentiality of the correspondence, I can say that the answer I received was slightly positive about the possibility of cooperation. However, the person who answered raised some questions, doubts and remarks about this possible cooperation that need to be addressed (I gave my opinion), and asked that their answer be published so that more people can give their input. However, I received an automatic reply to my second email saying that I might not get a response until 10 August. Tm (talk) 07:39, 19 July 2009 (UTC)

Any update on this one? Multichill (talk) 23:13, 4 September 2009 (UTC)

Not much. I received an email on 11 August saying that, because the person I had emailed was on holiday, the answer would be delayed, but I have not received anything since. Tm (talk) 23:43, 4 September 2009 (UTC)

I sent an email today, as I have only received one email, on 15 September, telling me that the person I contacted had contacted the library but was still waiting for an answer. In this email I asked whether there is an answer yet. When I receive an answer I'll update this page. Tm (talk) 04:05, 21 November 2009 (UTC)
Some days ago I received an email from the same person I have been contacting from the beginning, saying that there is still no answer from the library responsible for this collection about the enquiry I made some months ago. Comments? Tm (talk) 13:11, 6 December 2009 (UTC)
I have to report that the library that keeps this collection has, unfortunately, decided to reject the request made some months ago. According to the person I exchanged emails with, this request "lacks a formal application and there is no treatment needed because the maps are already available online for the public." Tm (talk) 23:34, 14 January 2010 (UTC)
Ok. Looks like we're going to scrape their site after all. I'll have a look at it. Multichill (talk) 23:51, 14 January 2010 (UTC)
These images are easily scrapable with a bit of regex and looping. The various galleries are listed here, where each gallery has about 40 images of the same subject, probably from different periods. Next to each gallery the name of the place is listed, so the category could just be something like Category:Scotland maps. We've done uploads from Zoomify before, so the experience is there.--Diaa abdelmoneim (talk) 10:07, 17 October 2009 (UTC)
I had a look at {{PD-art}}. Seems to work in Switzerland so no NPG issues ;-)
The plan:
  • Loop over the galleries at (does that contain all maps?)
  • Loop over all images in a gallery
  • For each image pull the metadata. Several sources. Have to see what information is useful
  • Pull the image with some dezoomify tool
  • Generate filename, description and categories
  • Upload to Commons
What metadata to use exactly is somewhat tricky. The dezoomify step is also a bit of extra work. Multichill (talk) 15:43, 15 January 2010 (UTC)
Multichill, might I volunteer my script? It takes a web page holding a Zoomify Flash object, uses a regex to find the location of the image tiles automatically, and downloads and recomposes the highest zoom level available. Have a look at this page, which has a full code listing. An example of its work can be seen here. I hope it's useful. Inductiveload (talk) 02:58, 19 February 2010 (UTC)
Sure. Looks nice at first glance, but you should split it up into functions and use objects so it can be used in other programs (like pywikipedia). It's probably best to make a library part and a command-line part (which uses the library part). What license is your code under? Do you need some help restructuring it? Did you take a look at this script when you wrote your code? Multichill (talk) 09:19, 19 February 2010 (UTC)
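The dezoomify step depends on knowing where Zoomify keeps its tiles. The sketch below reflects our understanding of the format and mirrors the default tier calculation in OpenLayers' Zoomify source; it is not Inductiveload's script. Each tier is half the previous one, down to a single tile. Tiles are numbered across all tiers, smallest first, and stored 256 per `TileGroupN` folder as `z-x-y.jpg`. The image width and height would normally be read from the `ImageProperties.xml` file the viewer loads. The base URL is a placeholder, and downloading and stitching the tiles (which needs an imaging library) is not shown.

```python
import math

TILES_PER_GROUP = 256  # Zoomify stores at most 256 tiles per TileGroup folder


def tier_grid(width: int, height: int, tile: int = 256) -> list[tuple[int, int]]:
    """(columns, rows) of tiles for each tier, smallest tier (z = 0) first."""
    tiers = []
    span = tile  # image pixels covered by one tile at the current tier
    while width > span or height > span:
        tiers.append((math.ceil(width / span), math.ceil(height / span)))
        span *= 2
    tiers.append((1, 1))
    tiers.reverse()
    return tiers


def tile_urls(base: str, width: int, height: int, tile: int = 256) -> list[str]:
    """URLs of every tile in the full-resolution tier, in row-major order."""
    tiers = tier_grid(width, height, tile)
    z = len(tiers) - 1
    offset = sum(cols * rows for cols, rows in tiers[:z])  # tiles in smaller tiers
    cols, rows = tiers[z]
    urls = []
    for y in range(rows):
        for x in range(cols):
            group = (offset + y * cols + x) // TILES_PER_GROUP
            urls.append(f"{base}/TileGroup{group}/{z}-{x}-{y}.jpg")
    return urls
```

For a 1000 × 600 px image this yields 12 tiles, from `TileGroup0/2-0-0.jpg` to `TileGroup0/2-3-2.jpg`. Large maps spill into higher-numbered groups, which is why a scraper cannot simply assume `TileGroup0`.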

Any progress? -- RE rillke questions? 18:25, 4 June 2012 (UTC)

Assigned to Progress Bot name


Message below was posted on the Commons:Village pump --Jarekt (talk) 19:23, 18 September 2009 (UTC)

Looks like a batch upload could be useful here: Tekstman (talk) 18:03, 18 September 2009 (UTC)

I browsed the site and they seem to have a few hundred images scanned from old books, with clear sources and their own PD justification. Some of those images might be useful, like those. Someone should match them to our PD licenses. --Jarekt (talk) 19:23, 18 September 2009 (UTC)

I am working on uploading some of these directly from Project Gutenberg BMacZero (talk) 15:59, 15 May 2015 (UTC)


Assigned to Progress Bot name

Mollusca by Jan Delsing

Photos of shells of Mollusca (143 bivalves, 1469 gastropods) by Jan Delsing from

The only uploaded example is:

The best names of files would be: BINOMIAL NAME shell.jpg

Example of filenames:

  • File:Pythia scarabaeus shell.jpg
  • File:Pythia scarabaeus shell 2.jpg
  • File:Pythia scarabaeus shell 3.jpg
  • File:Pythia scarabaeus shell 4.jpg
  • and so on.
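The naming scheme above can be sketched as follows. The first photo of a species gets the plain name, and later ones get " 2", " 3" and so on. The sketch assumes the binomial names come from the source in upload order. It does not check whether a name is already taken on Commons, which a real upload would have to do. The function name is hypothetical.

```python
from collections import Counter


def shell_filenames(binomials: list[str]) -> list[str]:
    """'File:<Binomial> shell.jpg', then ' 2', ' 3', ... for repeated species."""
    seen: Counter[str] = Counter()
    names = []
    for raw in binomials:
        name = " ".join(raw.split())  # normalise stray whitespace in the source
        seen[name] += 1
        suffix = "" if seen[name] == 1 else f" {seen[name]}"
        names.append(f"File:{name} shell{suffix}.jpg")
    return names
```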

Thanks. --Snek01 (talk) 18:09, 6 October 2009 (UTC)

If this information could help, then EOL has cooperation with and EOL takes public domain images and Creative Commons images from this source automatically. --Snek01 (talk) 10:18, 12 November 2009 (UTC)


Assigned to Progress Bot name

Virtual Manuscript Library of Switzerland

Scans of manuscripts from the Virtual Manuscript Library of Switzerland. At this date, there are 482 manuscripts from 20 different libraries:

Usual copyfraud restrictions included... :(

  • This would be great. Could some bot owner look into this? Thanks, Yann (talk) 08:03, 14 February 2014 (UTC)

Opinions

is a nice list. If they have some logic on their website, automatic scraping of their site will be noticed. So don't go too fast ;-) Multichill (talk) 21:10, 15 January 2010 (UTC)

Assigned to Progress Bot name Category

Mineral pictures of Leon Hupperichs on mineralienatlas

Hello, I need help with uploading all pictures of Leon Hupperichs on Mineralienatlas:. His user page on mineralienatlas is here (435 pictures on 49 pages).

The picture description page should be the same as in the example File:Ravatite-MA1296598364.jpg. User category is Category:Files by Leon Hupperichs. Greetings -- Ra'ike T C 12:16, 4 March 2011 (UTC) P.S.: The other approved pictures of Leon Hupperichs on mindat will be uploaded by User:Reinhard Kraasch, because he uploaded all the pictures from mindat when he worked on the old request, and he has already been informed about the new one.


Assigned to Progress Bot name Category

West Bengal Public Library Network

  • Source to upload from:
    • Did you observe a URL pattern: DSpace format
    • Do you know whether the site has an API
    • What else can ease uploading (is the site valid XHTML, WCM they use…)?
    • Did you contact the site owner?
  • Describe the works to be uploaded in detail (audio files, images by …):
    • All PDF books are public domain books (mostly in Bengali and English). The big issue is that the books are not available as single PDFs; they are divided by chapter or page range (commonly a 200-page book is split into 4 PDFs).
      • This is a big problem. I don't know any automatic process to collate several PDFs into a single file. To be used in Wikisource, OCR would be needed, but AFAIK there is no OCR working software for Bengali. Yann (talk) 08:00, 14 February 2014 (UTC)
  • Which license tag(s) should be applied?
    • PD
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
    • Not needed
  • Jayantanth (talk) 07:40, 17 October 2013 (UTC)
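The collation step raised above can be scripted with an existing PDF tool. The sketch below assumes the qpdf command-line program is installed. It uses qpdf's page-selection mode (`qpdf --empty --pages in1.pdf in2.pdf -- out.pdf`). The main work is ordering the chapter files correctly: a natural sort puts `part10.pdf` after `part9.pdf`. The file names are hypothetical. The OCR problem for Bengali is not addressed by this.

```python
import re
import subprocess


def natural_key(name: str) -> list:
    """Sort key that compares digit runs numerically ('part10' after 'part9')."""
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


def qpdf_merge_command(parts: list[str], output: str) -> list[str]:
    """Build the qpdf argument list that concatenates the parts in natural order."""
    ordered = sorted(parts, key=natural_key)
    return ["qpdf", "--empty", "--pages", *ordered, "--", output]


def merge(parts: list[str], output: str) -> None:
    """Run qpdf (must be installed); raises CalledProcessError if it fails."""
    subprocess.run(qpdf_merge_command(parts, output), check=True)
```

Building the command separately from running it makes it easy to review the page order before merging hundreds of books.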


Assigned to Progress Bot name Category

Batch uploads in progress

Archives of American Art - Federal Art Project

285 images are being uploaded as part of the Archives of American Art partnership with Wikimedia. All of these images are federal works, created as part of the Federal Art Project. Missvain (talk) 04:25, 26 September 2011 (UTC)

Details about this collection are available at: [47]

One test upload is at File:Archives of American Art - Job Goodman - 2126.jpg and we welcome feedback on how the templates and categories are done. (most of the templates modified from the NARA templates) The upload is being done with a pywikipedia custom script, and is small and slow enough that it probably doesn't need a bot flag at this time. If we do more, larger batches, I (User:Aude) can apply for a bot flag. AAA uploader (talk) 04:35, 26 September 2011 (UTC)


Looks good. Please use {{Size}}, {{Other date}} and {{Technique}}, and wrap the notes in an {{lang|en}} or {{en}}.
Maybe we could create an external link template to build the source link? URLs have a nasty tendency to change over time.
We also need to extend {{Original caption}} (did not know about that one, thanks! :-) to provide the language of the caption.
(I have some Python code lying around for parsing Size and Date that I can send you if you want − I really need to put those on the SVN >_<).
Jean-Fred (talk) 12:04, 26 September 2011 (UTC)
Thank you for these suggestions! I've updated the code to include the size and other date / isodate templates. I need to see what Sarah says about technique. I'm not sure how to classify "photographic print". I've also wrapped things in the {{en}} template.
For external links, these are ugly with no simple id parameter that I know of, but rather include some combination of the title + id. Anyway, I've linked the url to the id, like how NARA does it. Not perfect.
Please let me know if you or anyone has additional feedback. Cheers. Aude (talk) 03:30, 1 October 2011 (UTC)
I'm proceeding with the uploads now. I've tweaked how the templates are done, labelling the photographer as "Photographer" and not "Artist" since the subject of the photographs are artists and it may cause confusion, and adjusted how the description and sources are done, per feedback from Archives of American Art staff. It's a smallish number of images, so if anything needs to be tweaked post-upload, that's definitely doable. Cheers. Aude (talk) 03:27, 6 October 2011 (UTC)
Rather good. The lines beginning with "Identification on verso" could go to the inscriptions fields, since this is what they are. I think it would make the description field easier to read.--Zolo (talk) 17:49, 9 October 2011 (UTC)
Assigned to Progress Bot name Category
Aude trial AAA uploader

US National Archives

I plan to use a bot to upload images from the US National Archives' digital files. I currently have access to a cache of over 120,000 TIFF master files which are ready for upload. The bot is a custom pywikipediabot script written by Multichill (code) and it relies on slakr's toolserver tool to translate NARA metadata into Commons upload code. It will upload images using the custom {{NARA-image-full}} template. Each page will be uploaded with that template filled out with the imported NARA metadata, plus {{Uncategorized-NARA}} to facilitate the categorization of these files. Dominic (talk) 19:15, 20 July 2011 (UTC)


Moved from Commons:Bots/Requests/US National Archives bot: I wrote a bot to do the uploads. I added the link to the source. Multichill (talk) 19:54, 19 July 2011 (UTC)

  Comment: For photographs, like the 3 example uploads, I would suggest looking into a way to add more categories:
  • Author category
  • Date category
  • Subject category
  • Medium category (photographs, paintings, handwritten documents, etc.)
  • etc.
For other types of records other category types might be suitable. It is easier to add some of those categories before the upload. --Jarekt (talk) 01:43, 20 July 2011 (UTC)
I'm not sure how we could do any of these in an automated way. Not all documents have subjects, and the subjects that do exist do not map onto Commons categories anyway. The same is true of the medium and author fields. The dates also seem difficult: some of the dates are ranges, some are exact days, and others are just months or just years. Dates can represent dates of creation, copyright, publication, or broadcast. I am hoping we will be able to organize a major community effort for categorizing these, as it will take humans. The one thing that we can do is categorize them hierarchically according to the National Archives catalog structure. For example, each of the Ansel Adams items would go in a category for the "Ansel Adams Photographs of National Parks and Monuments, compiled 1941 - 1942, documenting the period ca. 1933 - 1942" series. Of course, some of the series are less descriptive than others, but it's a start. Dominic (talk) 03:00, 20 July 2011 (UTC)
I think we should try 2 approaches. First, add categories based on the NARA catalog structure. We could make them hidden categories and encourage people to move images out of them, but this way we can group similar images together. Second, I still think that we should try to match NARA authors with Commons creators and add appropriate categories. In my WGA upload all images have a Creator template and a matching author category. Maybe a way to accomplish that would be to create a translation table where each NARA author is matched with a creator and category. Then your bot would read this table and use it to add proper templates and categories. The table can easily be added to the bot if it is implemented as an external CSV file. We probably do not need to match every NARA author, since some might be quite obscure, but we should at least match all authors that already have a creator template and authors with a large number of records. Dominic, do you think it would be possible to put somewhere a list of all authors of the files you are planning to upload and how many records are associated with them? I can try to see how many I can match. --Jarekt (talk) 16:01, 22 July 2011 (UTC)
This is what I did. I made {{NARA-Author}} for all of the authors. Every author (or person listed as a "contributor", whether it's a photographer, artist, director, etc.) has an ID and a page in the catalog that links to the records they are associated with. That template creates a URL to these author records in the catalog. I am not sure if that helps or hinders the attempt to make categories for them, but maybe we can use the template in some way to add categories based on those unique IDs? I will note, though, that it's actually uncommon for authors to be listed at all. Most documents are created by uncredited federal workers, and others are grouped into series based on the author, but the author field in the record isn't actually used (cf. this series). The full list of author records could actually be extracted from the dataset, if anyone is brave enough to try. Dominic (talk) 16:18, 22 July 2011 (UTC)
I had not noticed {{NARA-Author}} before. If it is added to all the images that have an author, then we can easily add creator templates and categories later. BTW I did not see author records in the NARA dataset or its description. --Jarekt (talk) 17:00, 22 July 2011 (UTC)
I do not know if there are separate XML files for the person authority records, like there are for items. However, if an item has a contributor mentioned in its record, the contributor's ID is also there in a field in the item's data file. This is how I am able to upload the files with that information. Dominic (talk) 19:04, 22 July 2011 (UTC)
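The external CSV translation table suggested above could be read like this. The sketch assumes a hypothetical three-column layout (`nara_id,creator,category`) keyed by the contributor ID that appears in each item's data file. IDs that are not in the table produce nothing, so unmatched authors keep only whatever the upload already writes (for example {{NARA-Author}}). The Creator templates use the Commons `{{Creator:Name}}` form. The column names and helper names are illustrative only.

```python
import csv
import io


def load_author_table(csv_text: str) -> dict[str, tuple[str, str]]:
    """Read rows of nara_id,creator,category into {nara_id: (creator, category)}."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        creator, category = row["creator"].strip(), row["category"].strip()
        if creator or category:  # skip placeholder rows that were never matched
            table[row["nara_id"].strip()] = (creator, category)
    return table


def author_wikitext(nara_ids: list[str], table: dict[str, tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Creator templates and category links for the contributor IDs found in the table."""
    creators, categories = [], []
    for nid in nara_ids:
        creator, category = table.get(nid, ("", ""))
        if creator:
            creators.append("{{Creator:%s}}" % creator)
        if category:
            categories.append("[[Category:%s]]" % category)
    return creators, categories
```

Keeping the table outside the bot means matches can be added by people who do not touch the code, which is the point of Jarekt's suggestion.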

As a test run, I have gone ahead and finished the Ansel Adams batch (220 files). [48] Dominic (talk) 04:45, 20 July 2011 (UTC)

End of move. Multichill (talk) 19:26, 20 July 2011 (UTC) I moved the discussion to here from Commons:Bots/Requests/US National Archives bot. We have two pages:

Why did I make this split? Because bot requests take ages when we start discussing batch requests, and a request gets closed when we actually want to provide more feedback. Can everyone please respect this? Multichill (talk) 19:26, 20 July 2011 (UTC)

ARC number

Another solution for ARC could be to store them all in a separate template page, so that series ARC=408 would give "Record group 79: Records of the National Park Service, 1785 - 2006 (ARC identifier: 408)". This would make page descriptions more concise and would also allow translations of record group and series names to be added by editing only one page.--Zolo (talk) 01:51, 28 July 2011 (UTC)

We could even store more data than that in the template, so that we would only need to provide the document ARC in the file description. This would not be as efficient, but it would minimize duplicate info and would provide cleaner, potentially reusable data. Additionally, this would make file descriptions even simpler by hiding away info that in most cases should not be changed by users. I have created a toy template in {{ARC/sandbox2}}. {{ARC/sandbox2|306514}} gives

  This media is available in the holdings of the National Archives and Records Administration, cataloged under the National Archives Identifier (NAID) 306514.

This tag does not indicate the copyright status of the attached work. A normal copyright tag is still required. See Commons:Licensing for more information.


  • Record group: Committee Papers, compiled 1806 - 2000 (ARC identifier: 306513)
  • Series: 128: Records of Joint Committees of Congress, 1789 - 2004 (ARC identifier: 457)

This means that {{ARC/data}} will need to be quite large. To make it smaller, it could also be used for record groups and series only, and not for individual documents. But it would make it less useful.--Zolo (talk) 03:29, 29 July 2011 (UTC)

ParserFunctions are a bit beyond me, but could that possibly work with tens of thousands of records? Dominic (talk) 12:54, 29 July 2011 (UTC)
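Zolo's single data page could be generated from the dataset rather than written by hand. The sketch below assumes a plain `{{#switch:}}` on the template's first parameter; the dictionary holds only the example quoted above, and a real page would be built from the NARA records. It does not settle whether one `#switch` stays practical at tens of thousands of entries; splitting the data across several pages might be one workaround.

```python
# Hypothetical input: ARC ID -> description, using the example quoted above.
RECORDS = {
    "408": "Record group 79: Records of the National Park Service, 1785 - 2006",
}


def build_switch(records):
    """Build the wikitext body of a {{#switch:}}-based data template keyed by ARC ID."""
    lines = ["{{#switch:{{{1|}}}"]
    for arc_id in sorted(records, key=int):  # numeric order keeps page diffs readable
        lines.append("| %s = %s (ARC identifier: %s)" % (arc_id, records[arc_id], arc_id))
    lines.append("| #default =")
    lines.append("}}")
    return "\n".join(lines)
```

Regenerating the whole page from the data keeps translations and corrections in one place, which is the point of Zolo's proposal.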


  • I think it would be useful for our users to have on each page a link to the relevant "Scope & Content" page of the photographic series the picture belongs to on the ARC website. These "Scope & Content" pages contain valuable information on the origins of the pictures. As they are two clicks away from Wikimedia Commons, among a number of not-so-useful links, I think most users won't find them if we don't provide a direct link. (We might also copy them to Wikisource and link to the corresponding Wikisource pages, or copy them here on Commons if we get community approval for using gallery pages for that purpose.) So, for example, on this file it would be good to have the following: "Series: Signal Corps Photographs of American Military Activity, compiled 1754 - 1954 (Scope & Content)". I think "Scope & Content" is more important, for a first reading, than "Details". The "record ID" and "Source" fields should be merged and called "Source". Teofilo (talk) 23:21, 1 August 2011 (UTC).
    I changed my mind. I would rather remove all the Record group, Series, and NAIL Control Number information. The {{NARA-image}} template with its single arcweb link is enough. Users who want to know more can click on that single link, which is an entrance to all the extra information. The "Record ID" field is not useful beyond feeding the NARA-image template. Teofilo (talk) 08:44, 2 August 2011 (UTC)
    I doubt you'll find anyone agreeing with that point of view. And note that the ARC ID is mostly just the identifier that refers to the catalog record and allows us to make predictable URLs, whereas the series and record group are actually descriptive metadata assigned by the archives that relate to the document creator and/or subject. Dominic (talk) 04:50, 5 August 2011 (UTC)
    I am afraid you have the roles reversed. Until now, hardly any upload from NARA has included such extravagant and noisy data, which are not useful to a majority of users. You will find hardly anyone among those who uploaded content from NARA in the past who agrees with you. For example, File:USS Intrepid (CV-11) - Nov 44 a.jpg. That these extra data are not useful is common sense. Compare how the Bundesarchiv pictures are documented. In the case of File:Bundesarchiv Bild 101I-731-0388-38, Frankreich, nach der Invasion, Infanteristen.jpg, all the extra information such as
  • Inventory: Bild 101 I - Propagandakompanien der Wehrmacht - Heer und Luftwaffe
  • Classification: Sachklassifikation/E {Zweiter Weltkrieg 1939-1945}/Ee {Kriegsschauplätze und Feldzüge}/Ee 300 {Westfeldzug}/Ee 350 / 360 / 370 / 380 {Frankreich*}/Ee 380 {Frankreich nach der Invasion (ab 6.6.1944)}/Ee 381 {Infanterie} Sachklassifikation/E {Zweiter Weltkrieg 1939-1945}/Ed {Truppen- und Formationsgeschichte*}/Ed 100 / 200 {Heer*}/Ed 110 {Infanterie}
was removed. Removing it is the right thing to do. Please note also that the Creator template was made collapsible because a lot of people found it too noisy. There is wide support for the idea of keeping description pages streamlined and simple. Teofilo (talk) 09:24, 5 August 2011 (UTC)
  • Each page contains 2 links to en:U.S. National Archives and Records Administration. I think this is one too many (or two too many if you count commons:National Archives and Records Administration). Couldn't we just get rid of the "Current location" field altogether? Isn't the {{NARA-image}} template sufficient to mean that the pictures are located there ? Teofilo (talk) 23:02, 1 August 2011 (UTC)
    • NARA is a major US government agency with more than two dozen facilities. It's not a location. The location field is the record of where the physical document digitized on Commons is located. That the institution's name is linked more than once is because there are three separate templates used on the pages that are complete; it seems pretty trivial. Dominic (talk) 20:12, 2 August 2011 (UTC)
      Brainwashing the user by repeating the same message three times is an advertising technique, amounting to using Wikimedia for a promotional campaign at the expense of usability. It overcrowds the template and makes the other information, such as the author, date, or description fields, proportionately less visible. The reason the Artwork template contains both a "location" field and a "source" field is that we are dealing with photographs of paintings and photographs of sculptures. The "location" field is for the location of the painting or sculpture, while the "source" field is for the source of the photograph. For this reason, NARA uploads of paintings such as File:"Crocodile and Snake Fighting" - NARA - 558928.tif are wrong. The "location" field should be filled with "unknown", or with the name of the museum or private owner who owns the painting. Writing "National Archives and Records Administration, Still Picture Records Section, Special Media Archives Services Division (NWCS-S)" in the "current location" field of this painting is a mistake (for example, compare with File:Serapis Louvre AO1027 profil.jpg, and count the number of occurrences of the word "Louvre" there). For works that are just photographs, not photographs of paintings or sculptures, the "location" field should be removed. Teofilo (talk) 09:24, 5 August 2011 (UTC)
      These files are the records of a government agency, and the location field is the listing of the repository in which the records are held. That is not extraneous or unusual information. Your accusations of brainwashing and advertising are getting tiresome. The institution you are talking about is a public agency that holds public records; it is graciously making its high-res scans available to Commons with no strings attached. The "advertising" you are talking about is metadata added and maintained by Wikimedians because it is useful. Nothing of the sort has been demanded or even asked by the institution you are maligning. Dominic (talk) 16:13, 11 August 2011 (UTC)

File name maximum length and file name cutting format

The following is copied from Commons:Administrators' noticeboard/Blocks and protections#User:US National Archives bot

I think the bot should be blocked until the file-name issue is solved. See the "File:Combat memorable..." entry in Commons:National Archives and Records Administration/Error reporting, or compare this NARA upload (name cut after "Gene") with the previously uploaded picture with the full name. Look at this list of 50 uploaded files, where most of the file names are cut. It is not realistic to correct all these file name errors afterwards one by one, tagging each picture with {{Rename}}. The upload software bug must be fixed so that files are uploaded with their full names, uncut. Cut names not only give users an impression of poor quality; they also create many potentially wrong keyword matches in search engines. Someone looking for a "gene" (a biological system) should not find the "Alphonse Juin, Commanding Gene" picture in their search results. Teofilo (talk) 22:23, 30 July 2011 (UTC)

Er, you want it blocked? I can just turn it off, you know. I'm not exactly sure what the issue is, though. The titles get cut off when they reach the length limit. "The upload software bug must be solved so that the files are uploaded with the full name, without cut" is an impossible solution. This doesn't seem like a huge problem, certainly not one that's more important than getting the content on Commons. Most end users are going to be viewing the images on the projects, so the idea that these titles somehow negatively affect users because they are stylistically displeasing is a little baffling to me. Dominic (talk) 23:29, 30 July 2011 (UTC)
Oh? How come there was no polite enquiry from Teofilo on either Commons:Batch uploading/US National Archives or User talk:Dominic? Oh wait... Jean-Fred (talk) 23:37, 30 July 2011 (UTC)
For Jean-Frédéric, here is the Commons:National Archives and Records Administration/Error reporting link again, where the problem was debated between Dominic and me below the "File:Combat memorable..." entry. Teofilo (talk) 11:11, 31 July 2011 (UTC)
Actually, I posted a fix a couple of days ago for the problem Teofilo mentions. Oddly it hasn't been applied yet. --  Docu  at 05:36, 31 July 2011 (UTC)
Thank you for doing so. I was not aware that you had prepared a fix. Teofilo (talk) 11:11, 31 July 2011 (UTC)
I thought that was about the dates appended to the end of titles. I don't see the issue Teofilo is concerned about mentioned anywhere on the page. Dominic (talk) 19:10, 31 July 2011 (UTC)
Who is (are) the person(s) in charge of the upload software? According to en:Wikipedia:Naming_conventions_(technical_restrictions)#Title_length, "Titles must be less than 256 bytes long when encoded in UTF-8." Measured with , File:US Navy 050419-N-5313A-049 A U.S. Marine Corps AV-8B Harrier launches from the flight deck of the amphibious assault ship USS Kearsarge (LHD 3) during flight operations in the Mediterranean Sea.jpg is 202 bytes long and File:Combat memorable donne le 22, 7re 1779, entre le Captaine Pearson commandant le Serapis et Paul Jones commandant le Bonh - NARA - 532895.tif is only 145 bytes long. So it looks possible to add 256-145=111 more characters to NARA uploads' file names. Since the full title "Combat memorable donne le 22, 7re 1779, entre le Captaine Pearson commandant le Serapis et Paul Jones commandant le Bonhomme Richard et son escadre, 07/22/1779" is 159 characters long, it should fit. At 249 characters, "Pvt. Jonathan Hoag,...of a chemical battalion, is awarded the Croix de Guerre by General Alphonse Juin, Commanding General of the F.E.C., for courage shown in treatingwounded, even though he, himself, was wounded. Pozzuoli area, Italy.", 03/21/1944" is perhaps only one or two characters longer than the 256 limit after adding "File:" and ".tif". It could also be decided to cut whole words instead of cutting in the middle of words, and to use (…) at the location where the cut is performed, like I did for this upload of mine. Perhaps it would be best to always keep the date at the end of the title, and to cut the words located before the date. Teofilo (talk) 12:40, 1 August 2011 (UTC)
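The byte counts above can be checked directly: the limit applies to the UTF-8 encoding, so an accented character such as "é" costs two bytes while ASCII characters cost one. A minimal sketch; the 255-byte constant restates the "less than 256 bytes" rule quoted above, and the function measures whatever string it is given, so whether to include the "File:" prefix is left to the caller:

```python
MAX_TITLE_BYTES = 255  # "less than 256 bytes long when encoded in UTF-8"


def title_bytes(name):
    """Length of a title in bytes once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits(name, limit=MAX_TITLE_BYTES):
    """True if the title stays within the byte limit."""
    return title_bytes(name) <= limit


# "Frédéric" is 8 characters but 10 bytes, because each "é" takes two bytes.
print(len("Frédéric"), title_bytes("Frédéric"))  # prints: 8 10
```

For purely ASCII titles such as most NARA captions, characters and bytes coincide, which is why the character counts in this thread line up with the byte limit.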
I am running a script that was written by Multichill; he's not in charge of the bot's actions, but I am not a programmer, so I can't easily make changes without him. I was not originally aware that the character limit was that high. I had thought that the limit was being imposed by the upload form, not by the bot's script, which is why I was saying it wasn't fixable. I see now that we can allow even longer titles, but I am not sure if we should. This should be discussed at Commons:Batch uploading/US National Archives, as the names already seem rather long and unwieldy to me. Your suggestion to not have it cut off titles mid-word, though, is a good one, I agree. In any case, I don't think this is a dealbreaker. The full titles are all contained in the template's "title" parameter, so we wouldn't have to go back and rename anything manually anyway, since a bot can extend the names using that data. I think it is more important to get the files actually uploaded at this point. Dominic (talk) 14:27, 1 August 2011 (UTC)

End of copy from Commons:Administrators' noticeboard/Blocks and protections#User:US National Archives bot

Do you have a deadline after which the files won't be available any longer? File renaming is an activity which consumes a lot of resources and which is generally frowned upon unless there is a good reason to do so. I am afraid the massive file renaming operation will be refused. When there is a problem in a car factory, you stop the production line until the problem is solved. You don't sell the cars first and recall them a year later to change the defective part. The latter is more expensive. I think we need more opinions from people with bot-writing experience, and help from people who would be willing to actually modify the script or write the file-renaming bot's script. I am going to copy the present discussion to Commons:Batch uploading/US National Archives. Teofilo (talk) 17:00, 1 August 2011 (UTC)
Well, I am only here for a couple more weeks. The files are not available on the Internet, but on hard drives here in the office. So it wouldn't be wrong to say there is a deadline of sorts. I am not sure the analogy to the factory is appropriate, as we're not recalling anything, just changing a name on a wiki. I'm not even sure if this is important enough that we would want to go back and change past uploads, even if we do change the convention going forward. They are not erroneous, just truncated. Dominic (talk) 17:21, 1 August 2011 (UTC)

For those who don't want to read all that text, the question is whether we want to make use of the full 250 characters we are allowed for the file names, which can be quite long, or whether we want to truncate it at a shorter length. The script is currently truncating at 120 characters, which isn't exactly short either, but does cause a lot of titles to get cut off. Dominic (talk) 17:21, 1 August 2011 (UTC)

I agree the file name issue should be fixed before the next batch of uploads, and I think we should keep titles short. Let's concentrate on how to do it. Dominic, is this still the code you are running? If so, then I assume the issue is with the "if len(titleText)>120: titleText = titleText[0 : 120]" line. Docu, did you say you posted a fix somewhere? If so, where? I think we can solve this issue in a timely manner so as not to slow Dominic down too much. --Jarekt (talk) 17:43, 1 August 2011 (UTC)
Yes, that is the code. It seems easy enough to change, except that this is more a question of style than a bug in the code, so I'm not sure what change, if any, to apply. (I think Docu is referring to the date issue, not this one, but I am not sure.) Dominic (talk) 17:59, 1 August 2011 (UTC)
The date issue appears on the NARA website too. It is not simply an upload bot script problem, although a script could help remove the extra date. I don't think there are many files with the duplicate-date issue, so I guess it won't be so bad if we leave that issue unsolved. Teofilo (talk) 18:19, 1 August 2011 (UTC)
I have inquired, and these are actually not errors so much as limitations in the NARA catalog software. That "coverage dates" field, which is used to refer to the dates depicted in the document's subject rather than the document's creation, can only take ranges. When you put in a single day, it still makes it into a range. This isn't something they are going to fix. Dominic (talk) 18:35, 1 August 2011 (UTC)
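If the extra date shows up as a single-day range, a small script could collapse it, as suggested above. The sketch assumes the range is written as "MM/DD/YYYY - MM/DD/YYYY"; that format is an assumption based on the dates quoted on this page, not on any NARA specification:

```python
import re

# Assumed format: "MM/DD/YYYY - MM/DD/YYYY".
DATE_RANGE = re.compile(r"(\d{2}/\d{2}/\d{4}) - (\d{2}/\d{2}/\d{4})")


def collapse_single_day_ranges(text):
    """Replace a range whose start and end dates are identical with the single date."""
    def repl(match):
        start, end = match.group(1), match.group(2)
        return start if start == end else match.group(0)
    return DATE_RANGE.sub(repl, text)
```

Genuine multi-day ranges are left untouched, so the change only removes information that is redundant.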
A few more ideas:
1) Unwieldy? Of course they are, but we are in a situation where we must choose the less unwieldy of two unwieldy possibilities: extra-long names, or names cut in an automatic fashion, which creates wordings that are at times perfectly meaningless. It should not be forgotten that for a number of users English is a foreign language, and when you don't master the language it is less obvious that a sentence was cut and that you should not even try to read a meaning into it. Also, we should try as much as possible not to misrepresent the quality of NARA's work. NARA's work might have a number of shortcomings, but in any case NARA does not produce botched file names.
2) While the files with a cut name are, in my opinion, a problem, there is no reason to prevent the bot from uploading all the other files with a short name. One possibility would be to quickly modify the bot script so that the files with long names are avoided for the time being, and to upload them later, after we have decided what to do with them.
3) One option would be to decide the new shorter names manually, on a case by case basis. We would have a bot write all the long file names in the left column of a table, and then we would request Wikimedians to write the shorter names with (…) in the right column. Then when all shorter names are available, the upload bot would be able to pick up the shortened files names from the table. Teofilo (talk) 17:48, 1 August 2011 (UTC)
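On the bot side, item 3 (together with the "skip long names for now" idea in item 2) might look like the sketch below. The two-column CSV layout (ARC ID, hand-shortened title) and the function names are hypothetical; 120 is the limit the current script uses:

```python
import csv
import io


def load_overrides(text):
    """Read a two-column CSV (ARC ID, hand-shortened title); skip rows not yet filled in."""
    overrides = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) == 2 and row[1].strip():
            overrides[row[0].strip()] = row[1].strip()
    return overrides


def choose_title(arc_id, full_title, overrides, limit=120):
    """Prefer a human-supplied short title; otherwise upload only titles within the limit."""
    if arc_id in overrides:
        return overrides[arc_id]
    if len(full_title) <= limit:
        return full_title
    return None  # too long and not shortened yet: skip this file for now and log it
```

Titles containing commas need CSV quoting in the right-hand column, which any spreadsheet export adds automatically.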
Ensuring that we don't cut off names mid-word will help, as would adding "..." to the end when a name is cut off. Note that even at 250 characters, some titles will be cut off. I am not sure (especially judging by Jarekt's reply) that there is agreement to do that, though. Dominic (talk) 17:59, 1 August 2011 (UTC)
I see 2 possible solutions:
  • Automatic: if the filename is longer than 120 characters, then look for periods, semicolons or commas and trim there. If the string is still longer than 120, then trim at a word end. Add "..." in the last case, and maybe also when trimming at a comma.
  • Manual: if the filename is longer than 120 characters, then (as Teofilo suggested) skip it for the time being, while writing its ID and title to a log file. Then, from time to time, read the log file in Excel (or some other spreadsheet) and manually trim the titles, or post the file somewhere so others can help (Teofilo?). Then alter your bot to allow upload of those specific files with the provided filenames. I should be able to help with this part, if you need help.
The first solution is much less work, so that would be my preference. --Jarekt (talk) 18:49, 1 August 2011 (UTC)
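A minimal sketch of the automatic option, replacing the hard "titleText[0 : 120]" slice. It is not the code the bot actually runs. It assumes the 120-character budget includes the ellipsis, and it adds a hypothetical floor of half the budget so that an early comma does not throw away most of the title. A period inside an abbreviation such as "Pvt. " still counts as a sentence break, which a real implementation would need to handle:

```python
def shorten(title, limit=120, ellipsis="..."):
    """Trim `title` to at most `limit` characters.

    Order of preference, following the automatic option above:
    1. last ". " or "; " break (no ellipsis; a period is kept),
    2. last ", " break (comma dropped, ellipsis added),
    3. last word boundary (ellipsis added),
    4. hard cut inside a single very long word (ellipsis added).
    """
    if len(title) <= limit:
        return title
    budget = limit - len(ellipsis)  # always leave room for the ellipsis
    floor = budget // 2             # assumption: never keep less than half

    # rfind(..., 0, budget + 1) lets a break sit exactly at the budget edge.
    p = max(title.rfind(". ", 0, budget + 1), title.rfind("; ", 0, budget + 1))
    if p >= floor:
        return title[:p + 1] if title[p] == "." else title[:p]

    p = title.rfind(", ", 0, budget + 1)
    if p >= floor:
        return title[:p] + ellipsis

    p = title.rfind(" ", 0, budget + 1)
    if p >= floor:
        return title[:p].rstrip() + ellipsis

    return title[:budget] + ellipsis
```

Using step 3 alone corresponds to always cutting at a whole word and appending "...".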
1) If you are patient enough to read 120 characters, why aren't you patient enough to read 256? Both the NARA website designers and the Library of Congress website designers have found it normal to require their users to read titles longer than that. For example, the html < title > attribute of is 330 characters long. What is wrong with that? If the Library of Congress asked you for advice, what advice would you give? Also, the fact that a title is displayed on your browser page does not mean you have to read the whole of it. If you are tired of reading, you can stop and look at some other area of the page.
2) I tagged one of the NARA uploads with {{rename}} diff. The file was renamed today. Here is the result, and I think it is much better (although I forgot to include the date). And I don't feel it is too long. If you remove the last part, the dramatic, tragic effect intended by the creator is lost. Sometimes titles are pieces of literature, meant to create emotions. Many of these pictures were used for propaganda. The caption was perhaps as important as the scene represented. Teofilo (talk) 22:18, 1 August 2011 (UTC)
3) For people who are unhappy with file names longer than 140 characters (while shorter than 256 characters), it may be possible to create a JavaScript (or gadget, or full-fledged MediaWiki extension) which automatically cuts the name displayed onscreen (with the possibility to read the longer version on mouseover). Teofilo (talk) 23:41, 1 August 2011 (UTC)

I think you are looking at this entirely the wrong way. Relatively few people are looking at the images on Commons itself, and the ones that are are usually the editors that are maintaining them, not the people using the images. No one is really concerned about a long title looking a little unsightly at the top of a description page. We do, however, have to think about how this is going to be used on the projects, and huge file names make article text hard to read in the edit view and make Wikisource index pages incredibly odd-looking. And for what? You're writing as if the file name, which is clearly marked off with a "File:" and a ".tif" and has other data in it, is the title itself. It may be true that titles are pieces of literature and that they are important, but no one wants to remove the title. There is a title field in the metadata for that, quite apart from the file name. Dominic (talk) 00:15, 2 August 2011 (UTC)

The view that Commons is for Wikipedia is not very popular here. A lot of people insist that Commons should be viewed as a media repository independently of its value for Wikipedia. The file name is also important as the caption you read when your mouse hovers over a file name below a thumbnail on a category page. Teofilo (talk) 00:39, 2 August 2011 (UTC)
4) I have found the following pictures from a batch upload: a (171 B), b (170 B), c (176 B), d (177 B), e (174 B), f (176 B), g (180 B), which probably means the uploader did not find these lengths annoying. Teofilo (talk) 00:39, 2 August 2011 (UTC)

For me, a filename needs to meet two requirements: be meaningful and be unique. The second part (<20 characters) provides uniqueness, and the first part tries to be meaningful; I think 100 characters is plenty to accomplish that. I find long names distracting and awkward, and wikitext using them hard to read. However, raising the maximum length of the filename would be by far the simplest way to "fix" the issue. --Jarekt (talk) 03:38, 2 August 2011 (UTC)

In my view, filenames need to be authentic. If Shakespeare called his play "Romeo & Juliet", you can't rename it "Richard & Julia" because you have a personal liking for these names. If some obscure Office of War Information bureaucrat during World War II decided to call a picture "Members of the 6888th Central Postal Directory Battalion take part in a parade ceremony in honor of Joan d'Arc at the marketplace where she was burned at the stake", you cannot change it. The only alternative would be to use a totally cryptic name, like 43-0194a.gif. I don't think there is an unauthentic middle ground between a totally cryptic name and the full authentic name. The argument that the full name is written in the "title" field of the template anyway fails to convince me, because putting an unauthentic name in a more prominent place than the authentic name remains an offense against authenticity. The choice of a long caption or name for a picture by some administration during World War II is a historical fact. Even if you find that fact distracting or ugly, you can't change it. By the same token, some pictures happen to be ugly, but for authenticity's sake one should not retouch an ugly historical picture to make it look nicer. If a picture has an ugly title, you can't change that either. You can't retouch "Romeo & Juliet". Teofilo (talk) 09:17, 2 August 2011 (UTC)
For this file and this one, key information (location and year) is cut. Teofilo (talk) 16:04, 2 August 2011 (UTC)
It is quite clear by now what your opinion is, Teofilo. What we are looking for is other opinions to see if anyone actually agrees with you. Dominic (talk) 20:17, 2 August 2011 (UTC)
The only absolute criteria for filenames are 1) uniqueness (easily done with the ARC) and 2) length under the technical limit (easily done by truncation). All other considerations are cosmetic, as the full metadata is listed in the info template. The filename is just a key for the file database: it doesn't have to contain a perfect description of the image, and most files at Commons don't. To be honest, we could call all images "NARA image - ARC 123456.tiff" and be done with it. So I don't think it matters where we chop the description. I'd lean towards shorter, as long filenames can be a pain at Wikisource (we have the full name in the Page: namespace, for example), but that is a minor gripe. The metadata will always be in the info area, and only the ARC is required to uniquely identify the image. So, I'd say truncate at whatever is most convenient. Inductiveload (talk) 23:29, 2 August 2011 (UTC)

This file name cut removed the most important part: Captain Harry Truman. Teofilo (talk) 21:40, 3 August 2011 (UTC)

Teofilo, you provided dozens of examples of trimmed filenames. However, to me the only issue with those is that they are too long. I agree with Inductiveload that "filename is just a key for the file database" and that descriptions can be found inside file descriptions. --Jarekt (talk) 02:39, 4 August 2011 (UTC)
You wrote "I agree the file name issue should be fixed" above on this page on 1 August (diff). If you agree with Inductiveload that "filename is just a key for the file database", what is the issue which you want to fix? Or have you changed your mind since 1 August? Teofilo (talk) 12:58, 4 August 2011 (UTC)
Note that truncated names now only terminate at the end of complete words and include a "..." when there is any truncation. Dominic (talk) 04:31, 4 August 2011 (UTC)

File matching tool

I think we need a developer for the development of a file matching tool. That tool would use an interface similar to that of Cat-a-lot, with the possibility to select two files from a gallery page. Then the tool would

  • add the |Other version field in both files
  • pick up the categories from the older file and add them into the newer file (and vice-versa) Teofilo (talk) 12:08, 5 August 2011 (UTC)
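The category-copying half of the proposed tool reduces to a set difference over the two pages' wikitext. A rough sketch, assuming plain [[Category:...]] links in the page text; categories added by templates would not be seen, and the "other versions" step is left out:

```python
import re

# [[Category:Name]] or [[Category:Name|sort key]]
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def categories(wikitext):
    """Category names linked directly in the page text, in order."""
    return [name.strip() for name in CATEGORY_LINK.findall(wikitext)]


def missing_categories(source, target):
    """Categories on `source` that `target` does not have yet, in source order."""
    have = set(categories(target))
    return [c for c in categories(source) if c not in have]


def add_categories(wikitext, names):
    """Append category links at the end of the page text."""
    return wikitext.rstrip("\n") + "".join("\n[[Category:%s]]" % n for n in names)
```

Running it in both directions (older file to newer, then newer to older) gives the "and vice-versa" behaviour described in the proposal.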
This does not make sense to me. What gallery page? How will non-identical versions be detected by a bot? The eventual plan is to add JPG/DjVu versions of all these files by bot, so they will all have linked file in "Other versions" that will be usable on the projects at some point. Dominic (talk) 16:13, 11 August 2011 (UTC)

Author information retrieving bot

We need a bot to systematically explore all pages similar to in order to retrieve author information. At present, such author information is not provided by the upload bot. Perhaps it is simpler to do this separately with another bot. I am personally getting tired of adding this information manually (for example, see this diff). Teofilo (talk) 12:08, 5 August 2011 (UTC)

Those are not structured pages and I see no way for a bot to extract author information from them. There are some tasks that simply require a human. Dominic (talk) 15:43, 5 August 2011 (UTC)
All captions from (example : "Danny Kaye, well known stage and screen star, entertains 4,000 5th Marine Div. occupation troops at Sasebo, Japan. The crude sign across the front of the stage says: `Officers keep out! Enlisted men's country.'" Pfc. H. J. Grimm, October 25, 1945. 127-N-138204) and similar pages should be extracted (by a bot or human) and put into the left column of a table. Then a bot should say if the file was uploaded on Commons or not, and if so, provide a link to the file uploaded on Commons, and say if the |author= is still void. Then humans could pickup the author name from the full caption. This would ensure that this is done in a systematic way, and that no chance was missed to find author names. Teofilo (talk) 15:30, 6 August 2011 (UTC)

Actually, a bot could compare the string of characters in the full caption at and the string of characters in the |title= field on Commons. For example, comparing ["Danny Kaye, well known stage and screen star, entertains 4,000 5th Marine Div. occupation troops at Sasebo, Japan. The crude sign across the front of the stage says: `Officers keep out! Enlisted men's country.'" Pfc. H. J. Grimm, October 25, 1945. 127-N-138204] with [|Title=Danny Kaye, well known stage and screen star, entertains 4,000 5th Marine Division occupation troops at Sasebo, Japan. The crude sign across the front of the stage says: "Officers keep out! Enlisted men's country."] would reveal that "Pfc. H. J. Grimm, October 25, 1945. 127-N-138204" was left out. After all left-out parts are neatly listed in a table by a bot, humans could try to figure out what to do with them. Teofilo (talk) 15:44, 6 August 2011 (UTC)
I think you missed the point. How do you know what to compare? You have a Commons image file, and then you have a string of characters on a random webpage. If a human has to find and point the script to the line on the page that has the information, it kind of defeats the purpose. Dominic (talk) 17:10, 8 August 2011 (UTC)
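Teofilo's comparison itself could use Python's standard `difflib` on word lists, as sketched below. It assumes the catalog caption has already been paired with the right Commons file (for example through the ARC ID), which is exactly the hard part raised in the reply above; the sketch only handles the comparison. The `min_words` threshold is an illustrative choice to ignore one-word differences such as "Div." versus "Division":

```python
from difflib import SequenceMatcher


def left_out(caption, title, min_words=2):
    """Runs of caption words that do not appear in the Commons title."""
    a, b = caption.split(), title.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    runs = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" = only in the caption; "replace" = changed wording.
        if tag in ("delete", "replace") and i2 - i1 >= min_words:
            runs.append(" ".join(a[i1:i2]))
    return runs
```

Comparing words rather than characters keeps small punctuation changes, such as different quotation marks, from producing a flood of tiny fragments.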

Categorizing progress statistics software

[concerning Commons:National Archives and Records Administration/Categorize/Progress ]


Would it be possible for BernsteinBot to compile more data? At present, the "categorized" column on, for example, this page only provides a boolean "categorized" YES/NO parameter. Would it be possible to retrieve the number of added categories and to calculate the percentage of files with 2 or more categories, with 3 or more categories, etc.? Especially if the number of categories is only one, I consider that the job is not finished. In most cases, files should have at least 2 or 3 categories. It would be good to have a way to find the files with only one category, so that people can quickly go to those files to finish the job. Teofilo (talk) 12:58, 1 August 2011 (UTC)

The above is a copy of a message I left on BernsteinBot's owner's talk page. Teofilo (talk) 12:12, 5 August 2011 (UTC)

I think we also need statistics to check whether the |author field has been completed or is still blank. Teofilo (talk) 12:19, 5 August 2011 (UTC)
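Once the number of categories per file is known (for example from a MediaWiki API `prop=categories` query, whose `clshow=!hidden` option can leave out hidden categories), the category figures requested above are simple to compute. The sketch takes a plain mapping of file name to category count; how that mapping is fetched, and whether the automatic per-series categories should be excluded first, are left open:

```python
def share_with_at_least(counts, thresholds=(1, 2, 3)):
    """Percentage of files that have at least k categories, for each k."""
    total = len(counts)
    if total == 0:
        return {k: 0.0 for k in thresholds}
    return {k: 100.0 * sum(1 for c in counts.values() if c >= k) / total for k in thresholds}


def files_with_one_category(counts):
    """Files with exactly one category, i.e. the ones considered unfinished above."""
    return sorted(name for name, c in counts.items() if c == 1)


counts = {"A.tif": 0, "B.tif": 1, "C.tif": 2, "D.tif": 3}
print(share_with_at_least(counts))      # {1: 75.0, 2: 50.0, 3: 25.0}
print(files_with_one_category(counts))  # ['B.tif']
```

The list of one-category files is the worklist that would let editors go straight to the files needing more categories.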

It operates based on normal Commons procedure: files are either uncategorized or they're not. I don't see much evidence for your opinion that files with only one category are "unfinished". It would be nice to collect some of these statistics for measuring outcomes, but I'm not convinced it would be very useful (or very much used) by people categorizing. It's certainly not a pressing need. Dominic (talk) 17:05, 8 August 2011 (UTC)

Using en language templates

Dunno if there'll be any further bot edits to the already uploaded images, but I guess there will be. So if there is a chance, could someone please add {{en|…}} around the descriptions (title and general notes)? I'm a bit surprised that this (apparently) didn't happen already on upload. Using the template would make future translations a bit easier, and is generally recommended here on Commons for internationalization (even if it's only regarded as helpful for users who don't speak English, to allow quick and easy identification of the language used). Many thanks in advance --:bdk: 14:32, 20 August 2011 (UTC)
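A bot pass adding {{en|...}} could be a regular-expression substitution over the relevant template fields. The field names "title" and "general notes" are assumptions based on the request above. The sketch handles only one-line values, skips empty or already-wrapped fields, and uses the `1=` form so that an equals sign inside a description does not break the template (a literal pipe still would):

```python
import re


def wrap_en(wikitext, fields=("title", "general notes")):
    """Wrap one-line values of the given template fields in {{en|1=...}}.

    Empty fields and values that already start with {{en| are left alone.
    """
    names = "|".join(re.escape(f) for f in fields)
    pattern = re.compile(
        r"^(\|\s*(?:%s)\s*=[ \t]*)(\S.*)$" % names,
        re.MULTILINE | re.IGNORECASE,
    )

    def repl(match):
        prefix, value = match.group(1), match.group(2).rstrip()
        if value.startswith("{{en|"):
            return match.group(0)
        return "%s{{en|1=%s}}" % (prefix, value)

    return pattern.sub(repl, wikitext)
```

Because already-wrapped values are skipped, running the pass twice over the same page changes nothing the second time.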


This page is getting very unwieldy. I am going to be marking and collapsing threads that seem to be resolved so that it is easier to navigate the page and see what needs to be addressed. If anyone feels that I have erroneously marked something as resolved, please feel free to uncollapse it and say so. Dominic (talk) 17:45, 11 August 2011 (UTC)

I marked general questions of categorization as resolved, as we have developed a process for assisting editors in categorizing. Every image uploaded is given {{Uncategorized-NARA}}, which places it in Category:Media contributed by the National Archives and Records Administration. Each file is also automatically placed in a category for its NARA series. We have an automatically updated project page at Commons:National Archives and Records Administration/Categorize/Progress where Commons editors can see the progress of per-series categorizing and navigate down to a list of individual images that need categorizing. In this way, hopefully adding topical categories for all files will be manageable. Dominic (talk) 18:43, 11 August 2011 (UTC)
Open issues

I am trying to summarize the issues that are in any way open, so we can bring some closure to this and the uploading can be completely above board.

  1. Can we automatically match NARA author data with Creator: templates and categories on Commons? — I'd like to work on this, but it can be done within the template, so it doesn't need to block uploads.
  2. Do we want to move the "NARA - <ID> - " part of file names to the front? — It will stay as is unless we hear from more people that they want this.
  3. Storing metadata on a separate template. — I wasn't entirely sure how useful or even possible this is, so I have left it alone in case others have thoughts.
  4. Teofilo's requests:

It seems to me that all of these fall into the category of things that can be worked on during/after the actual upload of files, with the possible exception of the file name lengths. However, that and several others either do not seem very well supported or thought out. New comments, even if it's just simple agreement or disagreement, would help clarify the level of support. Dominic (talk) 19:09, 11 August 2011 (UTC)

Uploaded | Progress | Recent uploads | Category
199,833 | 80 % | Gallery | Category:Media contributed by the National Archives and Records Administration

81% completed (estimate)


Assigned to | Progress | Bot name | Category
Dominic | 80 % | US National Archives bot | Category:Media contributed by the National Archives and Records Administration

United States Fish and Wildlife Service

Moved from commons-l to here:

Here's the latest output from my upload script:

This includes all of the changes that I was planning to make. What do you think? Shall I go ahead and upload all of the images that I have downloaded at

FWIW, the code is here: In particular:


I moved it here. Feel free to change this page whatever way you like. I love the fact that you're working on this! Some input from my side:

  • Title: Please don't prepend with "FWS", this messes up the sorting of categories. Please append it (<title> - FWS - <id>.jpg, like that)
  • Date: Please use the original date (not the scan date)
  • Original metadata: Please don't add that. You just have to integrate it with the page. An easy way to do that is to create a template you can substitute at upload, see for example user:Multichill/WGA. If people ever wonder about the original fields, they can just check the upload log. Started one at User:BotPyrak/FWS
  • Categories: This image is uncategorized. The subjects seem to be very suitable for assigning categories. You can either put the images in these categories directly or work with temporary categories so users have to move it.

Multichill (talk) 14:02, 5 June 2011 (UTC)

Ok. I started an upload template at User:BotPyrak/FWS.

  1. At upload do
  2. Loop over the key value pairs and output them like
    (code for that is already in pywikipedia). The individual subject fields have to be available at subject_1, subject_2 etc. I still have to add the code for that to Pywikipedia.
  3. Close with

Multichill (talk) 14:18, 5 June 2011 (UTC)
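The three steps above could look roughly like this in Python. This is a sketch only: the field names, the "subjects" key and the {{!}} escaping of pipes are assumptions, and it just builds the wikitext rather than calling pywikipedia:

```python
def escape(value):
    """A bare | would start a new template parameter, so replace it with {{!}}."""
    return str(value).replace("|", "{{!}}")


def build_fws_description(metadata, template="User:BotPyrak/FWS"):
    """Build a {{subst:...}} call with one named parameter per metadata field.

    Hypothetical layout: a "subjects" list is split into subject_1, subject_2, ...
    """
    lines = ["{{subst:" + template]              # 1. open the template at upload
    for key, value in metadata.items():          # 2. loop over the key/value pairs
        if key == "subjects":
            for i, subject in enumerate(value, start=1):
                lines.append(f"|subject_{i}={escape(subject)}")
        else:
            lines.append(f"|{key}={escape(value)}")
    lines.append("}}")                            # 3. close the template
    return "\n".join(lines)


print(build_fws_description({"title": "Bald eagle", "subjects": ["birds", "eagles"]}))
```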

Assigned to | Progress | Bot name | Category

Geographicus Rare Antique Maps

Geographicus Rare Antique Maps is a specialist dealer in fine and rare antiquarian cartography and historic maps of the 15th through 19th centuries. A large portion of their inventory of authentic antique maps is online at their website. The owner sent an email to OTRS and Pharos put him in touch with me. It's a collection of about 2000 old maps. The owner was perfectly happy with me using dezoomify to get the high-resolution images. Stuff used to get this batch upload going:

I have a comma separated file with all the metadata. I loop over all images:

  • Get the metadata and do a bit of cleaning up
  • Construct the title: <product_name> - Geographicus - <id>.jpg
  • Construct the description: Basically just put all fields in User:Multichill/Geographicus
  • Check if the title doesn't already exist (probably the same file, dupe checking might be a bit problematic, have to find out)
  • Download the high resolution image using dezoomify
  • Do a dupe check
  • Upload the image
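The loop above can be sketched as follows, assuming the comma-separated file has "id" and "product_name" columns (the real column names are not given) and using a set of existing titles as a stand-in for the existence check. Downloading with dezoomify, the dupe check and the upload are left out:

```python
import csv
import io
import re

# Characters that are not allowed in MediaWiki page titles.
FORBIDDEN = re.compile(r"[#<>\[\]|{}]")


def make_title(product_name, geo_id):
    """Build '<product_name> - Geographicus - <id>.jpg'."""
    clean = FORBIDDEN.sub("", product_name)
    clean = re.sub(r"\s+", " ", clean).strip()
    return f"{clean} - Geographicus - {geo_id}.jpg"


def plan_uploads(csv_text, existing_titles):
    """Yield (title, row) for every row whose title is not taken yet."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = {key: (value or "").strip() for key, value in row.items()}  # light clean-up
        title = make_title(row["product_name"], row["id"])
        if title in existing_titles:
            continue  # probably the same file, skip it
        # Next steps (not shown): build the description, download via
        # dezoomify, check for duplicates, upload.
        yield title, row
```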

I could use some help to make User:Multichill/Geographicus/cartographers more complete. When the upload starts, it would of course be nice if people could help categorize the files in Category:Maps from Geographicus. Multichill (talk) 11:42, 12 March 2011 (UTC)

Unknown artist

Very nice batch Multichill.
One image I checked had the artist field linking to − maybe a bot could replace that by {{unknown|author}} ?
Jean-Fred (talk) 18:22, 25 March 2011 (UTC)
Changed. When User:Multichill/Geographicus/cartographers is complete I will do a bot run to replace it. Multichill (talk) 18:43, 25 March 2011 (UTC)

Geographicus link

Nice project and good file descriptions. I am just wondering about one small point: {{Geographicus-link}} does not really provide an accession number, shouldn't it go in the "references" field?--Zolo (talk) 06:59, 25 March 2011 (UTC)

From {{Artwork}}: accession number: Museum's accession number or some other inventory or identification number. Provide also link to museum database if available.
It does provide an accession number. Take for example this image. The id is Hempstead-uscs-1925, this gives Geographicus link: Hempstead-uscs-1925 (a link to the source in the Geographicus database). I don't think this should go to the references section. Multichill (talk) 18:48, 25 March 2011 (UTC)
I think the passage from the artwork documentation you quote made more sense in older versions of the template, when accession number was called ID. It is true that Geographicus link may be somewhat akin to an accession number, but it does not look like a number, so I think it sounds a bit odd. Since "Hempstead-uscs-1925" is a "code" and is "Necessary for phone orders", maybe we could simply change the layout of {{Geographicus-link}} to something like:
Geographicus code: Hempstead-uscs-1925--Zolo (talk) 09:42, 26 March 2011 (UTC)
I have changed {{Geographicus-link}}; revert it if it is not good, but I think it looks more logical this way in an "accession number" field (and the word code is used by Geographicus)--Zolo (talk) 14:30, 1 April 2011 (UTC)

Broken uploads

Just a section to add broken files. Maybe we can get them later on:

  Comment: Also, the low-res file is not the same as the map on the source link. Tm (talk) 04:00, 14 June 2011 (UTC)

Ordnance Survey OpenData

I spotted the OS map File:Whitehaven area 1 in 250 000 scale.png being used on the high traffic w:Cumbria shootings article and thought it was a copyright violation. I then looked closer, and realised that w:Ordnance Survey has released several datasets under a free (and Wikimedia Commons compatible) license, as part of their OpenData Initiative.

This is a fantastic resource, and of great use to almost any UK geography article. This could greatly standardise mapping across the UK, and replace many custom one off maps with the recognisable and accessible OS standard. There have already been a few OS OpenData uploads to Commons, and they can be found at Category:Maps from Ordnance Survey. A list of all OpenData products can be found at, the license can be viewed at

I've not looked into the datasets themselves, but a lot of them are in TIFF format, which would be an issue. There's also the issue of how we split the maps into separate files. Even if the source data is in separate files, they may be arranged in arbitrary grid squares, which may not be helpful if you'd like to show a map of a town. Is there an easy way to display a matrix of multiple maps on Wikipedia?

Still, I don't think these are big issues. These maps are definitely within Commons' scope, and can make an immediate impact across Wikimedia projects. Suitcivil (talk) 22:41, 4 June 2010 (UTC)


The grid

Great sets. License looks OK. I'll have a shot at it. I wanted to look into these datasets anyway for the Commons:Batch uploading/Geograph upload. Multichill (talk) 08:04, 5 June 2010 (UTC)

I've downloaded the tiff files. It's about 20 GB. I'm going to upload the tiff files and jpg versions of the files. I'm going to make a new license template to reflect the OS license. That will probably all be fine; I'm just wondering what the best way is to get these files categorized. All files should end up under Category:Maps of the United Kingdom. Based on the grid square it's probably possible to find the right subcategory.
Each map should link to the other file version (tiff -> jpg, jpg -> tiff) and it would be very nice if every map links to the squares next to it for easy navigation. Multichill (talk) 11:34, 5 June 2010 (UTC)
At I stored the files. I'm currently converting the tif files to jpg files, this will probably take a while to complete. Multichill (talk) 14:04, 5 June 2010 (UTC)
As these are images with a limited colour palette, and significant line detailing, wouldn't PNG be preferred over JPG? Suitcivil (talk) 15:58, 5 June 2010 (UTC)
Just for the record, I spoke to Multichill on IRC earlier today and asked the same. The PNG thumbnailer doesn't render files over 12.5 M pixels. Also, there is some work being done on a TIFF thumbnailer; I don't know what the maximum file size is for that. –Krinkletalk 16:07, 5 June 2010 (UTC)

After a short IRC conversation, I'm currently experimenting to see what the options are regarding SVG. There are vector files available. I'll convert a few to SVG and see how our SVG parser handles them. –Krinkletalk 16:07, 5 June 2010 (UTC)

SVG will probably be quite hard to create. I'm sticking to the tiff and jpeg files. I created Commons:Batch uploading/Ordnance Survey/Template to be substituted, {{Map tile navigation}} to navigate the tiles and {{OS OpenData}} to be used as license tag. Multichill (talk) 21:47, 5 June 2010 (UTC)
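For the tile navigation, each 100 km square needs the letters of its neighbours. A sketch using the standard two-letter OS National Grid lettering (the letter I is skipped), assuming one tile per 100 km square as in the first batch. The direction names are hypothetical and not necessarily the parameters {{Map tile navigation}} actually takes:

```python
def grid_letters(easting, northing):
    """Two-letter OS National Grid 100 km square for a point given in metres."""
    e100k, n100k = easting // 100_000, northing // 100_000
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    # The grid alphabet has 25 letters: I is not used.
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    return chr(ord("A") + l1) + chr(ord("A") + l2)


DIRECTIONS = {
    "northwest": (-1, 1), "north": (0, 1), "northeast": (1, 1),
    "west": (-1, 0), "east": (1, 0),
    "southwest": (-1, -1), "south": (0, -1), "southeast": (1, -1),
}


def neighbours(easting, northing):
    """Adjacent 100 km squares, keeping only those inside the 700 x 1300 km grid."""
    result = {}
    for name, (dx, dy) in DIRECTIONS.items():
        e = easting + dx * 100_000
        n = northing + dy * 100_000
        if 0 <= e < 700_000 and 0 <= n < 1_300_000:
            result[name] = grid_letters(e, n)
    return result
```

For example, neighbours(530000, 180000) for square TQ gives TL to the north, TR to the east, SU to the west and TV to the south.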
I'm about to upload the first batch jpg part & tif part. Multichill (talk) 15:43, 6 June 2010 (UTC)
The first (small) batch is online. Ordnance Survey 1:250 000 Scale Colour Raster map gives a nice overview. In this batch I have 1 map per grid square. I have two more batches, one with 100 images per grid square and one with 400 images per grid square. Feedback would be nice. Multichill (talk) 20:07, 6 June 2010 (UTC)
I should probably add a category to each batch, what about:
Category:Ordnance Survey 1:250 000 Scale Colour Raster maps
Category:Ordnance Survey Street View maps
Category:Ordnance Survey Vector Map District maps
Should I timestamp the files in the filenames? Multichill (talk) 19:00, 8 June 2010 (UTC)
I'll just keep on talking to myself. New batches at and . Probably need some fine tuning. Multichill (talk) 20:10, 8 June 2010 (UTC)
Assigned to | Progress | Bot name | Category
Multichill | Uploaded the first batch | OrdnanceSurveyBot | Category:Maps of the United Kingdom (and subcats)

Boundary-Line data

I've uploaded PNG and SVG versions of the parish_region shapefile. The PNG works OK but lacks detail, whilst the SVG is huge: Inkscape runs out of memory for me when I try to do anything with it, and MediaWiki struggles.

IMO, one significant flaw in the data is that the boundaries for Bristol, Liverpool and Torbay (amongst others) include a significant area of sea, distorting the coastline. The one data file in the set that doesn't have that issue is the high_water file, so combining the two would produce more useful maps. My SVG skills aren't up to that job, and in any case I'd have memory issues.

I'm not totally sure, but I believe the contents of the various shapefiles are:

  • parish_region - Civil parishes (and equivalent, uploaded)
  • district_borough_unitary_ward_region - Electoral wards in England and Scotland; excluding counties with a unitary council (eg Cornwall and Wiltshire)
  • high_water_polyline - Coastline of England, Wales and Scotland (and outlying islands)
  • westminster_const_region - Parliamentary constituencies for all of England, Wales and Scotland
  • district_borough_unitary_region - Admin districts for England, Wales and Scotland
  • county_electoral_division_region - Electoral divisions for certain counties in England (non-Unitary Authorities)
  • unitary_electoral_division_region - Electoral divisions for Wales, and those parts of England excluded in previous file.
  • scotland_and_wales_const_region - Unsure - Welsh assembly and Scottish parliament constituencies I think.
  • scotland_and_wales_region_region - Electoral regions for Scotland and Wales (for the devolved governments)
  • county_region - English counties, which are not Unitary authorities.
  • greater_london_const_region - The London Assembly constituencies.

I'm planning to do location/locator maps for at least:

  1. Districts, parishes, parliamentary constituencies and wards within counties
  2. Wards within Unitary Authorities.
  3. Scottish Parliament, Welsh Assembly and London Assembly constituencies.

I'll do these as png only; mainly as a result of the coastline issue I mentioned.--Nilfanion (talk) 12:47, 8 June 2010 (UTC)

Nice that data is used!....but isn't this a bit out of the scope of this page? Multichill (talk) 19:00, 8 June 2010 (UTC)
Well.. yeah sort of off-topic but if I do make a bunch of location maps (over 10,000 easily) that would be a related batch upload :) Figure this is sensible place to ask for 2nd opinion on that.--Nilfanion (talk) 19:23, 8 June 2010 (UTC)
Fair enough. Did you see the on how to reuse their data? Multichill (talk) 19:43, 8 June 2010 (UTC)

Curious as to what you plan to do with the Vector Map District data. The OS provides raster (tif and jpg) data, but it's primarily a vector product, which unfortunately is in shapefile format only...


The Tropenmuseum donated about 2100 images related to Suriname and will donate a lot more images in the future (see Commons:Tropenmuseum). GerardM did the communication part and Multichill did the uploading/technical part.


The first batch I got was 2100 images related to Suriname and the Maroons. I received a DVD containing the images and a Microsoft Access database containing the metadata. I created a user ODBC connection in Windows and used pyodbc to connect from Python. The code is a combination of custom code, pywikipedia and functions I copied from previous projects (Deutsche Fotothek & WLANL). The file names were already in the right form and contained a unique identifier, so I had my bot loop over the files and, for each file:

  1. Extract the unique id
  2. Using the identifier pull all relevant info from the database
  3. Generate a description
  4. Generate temp categories
  5. Generate a Sha1 hash and check for duplicates
  6. If the file doesn't exist yet, upload the file using KITbot
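The per-file steps above can be sketched like this. sqlite3 stands in for the pyodbc connection to the Access database (both follow the Python DB-API, and both use ? placeholders). The TMnr_<digits> file-name pattern, the objects table and its columns are hypothetical, and known_hashes stands in for a real lookup such as the MediaWiki API's list=allimages with aisha1:

```python
import hashlib
import re
import sqlite3  # stand-in for pyodbc; both implement the DB-API

# Hypothetical: assumes the unique id appears in the file name as "TMnr_<digits>".
ID_RE = re.compile(r"TMnr_(\d+)")


def extract_id(filename):                       # 1. extract the unique id
    match = ID_RE.search(filename)
    return match.group(1) if match else None


def fetch_metadata(conn, object_id):            # 2. pull the info from the database
    cur = conn.cursor()
    cur.execute("SELECT title, description FROM objects WHERE id = ?", (object_id,))
    row = cur.fetchone()
    return {"title": row[0], "description": row[1]} if row else None


def sha1_hex(data):                             # 5. SHA-1 hash for the dupe check
    return hashlib.sha1(data).hexdigest()


def prepare(filename, data, conn, known_hashes):
    """Return what the upload needs, or None if there is no id/record or it's a duplicate."""
    object_id = extract_id(filename)
    meta = fetch_metadata(conn, object_id) if object_id else None
    if meta is None:
        return None
    digest = sha1_hex(data)
    if digest in known_hashes:
        return None  # already on Commons, skip the upload
    # Steps 3-4 (description, temp categories) and 6 (upload) are not shown.
    return {"id": object_id, "sha1": digest, **meta}


# Small in-memory demo database (made-up record).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id TEXT, title TEXT, description TEXT)")
conn.execute("INSERT INTO objects VALUES ('10001234', 'Marktvrouwen', 'Market women')")
result = prepare("Suriname_TMnr_10001234.jpg", b"image bytes", conn, set())
```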

Of course you can find the source in my svn.

The provided metadata was excellent. It contains descriptions in one language (Dutch) or more (also English), and was very useful for generating temp categories. All the images are placed in Category:Images from the Tropenmuseum and a bunch of temp categories. Images have to be copied from these temp categories to real categories. It turned out we don't have a lot of Suriname-related images, so I pretty much had to build a category tree from the ground up. This is a lot of work, but images end up in very good topic categories. It also improves the chance of images ending up in multiple relevant topic categories (in previous batch uploads, images got stuck in only one category). This is a lot of moving around, but that's just a job for a bot. This mapping causes a lot of over-categorized images, but this can easily be fixed with the recategorization bot ( -cat:Images_from_the_Tropenmuseum -onlyfilter). For the next part we have to figure out how to get people to categorize the images, because I don't feel like doing this all alone. Users only have to map temp cats onto topic categories; the actual moving is done by a bot. I'm not sure how to make this easy for other users. Multichill (talk) 11:41, 16 September 2009 (UTC)
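The mapping step users would do can be sketched as a dictionary from temporary category to topic categories, with the bot rewriting the category links. A minimal sketch, not the actual recategorization bot: the category names in any mapping are up to the users, and over-categorization (a file in both a category and its parent) is not handled here:

```python
import re

# [[Category:Name]] or [[Category:Name|sort key]]
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(\|[^\]]*)?\]\]")


def apply_mapping(wikitext, mapping):
    """Replace temporary categories by the topic categories users mapped them to.

    mapping: temporary category name -> list of topic category names.
    Unmapped categories are kept as-is; repeated categories are dropped.
    """
    seen = set()

    def repl(match):
        name = match.group(1).strip()
        if name not in mapping:
            if name in seen:
                return ""
            seen.add(name)
            return match.group(0)
        links = []
        for target in mapping[name]:
            if target not in seen:
                seen.add(target)
                links.append(f"[[Category:{target}]]")
        return "\n".join(links)

    return CATEGORY_RE.sub(repl, wikitext)
```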


Yesterday Gerard and I visited the Tropenmuseum. We got 35,000 images and a database with all the metadata. I slightly modified the program I used for Suriname and fired up the bot. Modifications:

  • Other database name and other table names
  • Changed the regular expression to find the id of the file
  • Fixed some encoding bugs
  • Filtering the temporary categories to get rid of the completely useless categories right away
  • Added <!--{{id|1=To be translated}}--> so Indonesian translations can be added later.
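Two of the modifications, filtering the temporary categories and adding the commented-out Indonesian slot, could look like this. The blocklist entries are hypothetical, since the list of "completely useless" categories is not given:

```python
# Hypothetical blocklist; the actual "completely useless" categories are not listed.
USELESS = {"Diversen", "Onbekend"}


def filter_temp_categories(names, useless=USELESS):
    """Drop blank, blocklisted and repeated temporary category names, keeping order."""
    result = []
    for name in names:
        name = name.strip()
        if name and name not in useless and name not in result:
            result.append(name)
    return result


def description_with_placeholder(english, dutch):
    """Bilingual description plus a commented-out slot for a later Indonesian translation."""
    return ("{{en|1=" + english + "}}\n"
            + "{{nl|1=" + dutch + "}}\n"
            + "<!--{{id|1=To be translated}}-->")
```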

The upload will probably be finished tomorrow. Then comes the hard part: categorization. I added temporary categories again, but this time I got some data from the Tropenmuseum describing the structure of these categories, so I can build a tree. I will first do this for the geography tree. Multichill (talk) 22:19, 26 November 2009 (UTC)


Moved to Commons:Tropenmuseum#Categorization to avoid redundancy

Opinions first part

  • Making categorization easier: How about doing something like what was done with the Fotothek upload? For example, creating the temporary categories with a CommonsDelinker link and a suggested category, waiting for a user to review it. And where are all these categories stored? I mean, where can I find a list of all the temporary categories with how many files they contain, so I could check for a better category name, also for the delinker? Automatic Dutch-to-English translation would also make it a lot easier, instead of going to Google and translating... BTW, the upload is already finished, right?--Diaa abdelmoneim (talk) 00:05, 18 September 2009 (UTC)
Third batch

A third batch is expected somewhere in February 2010, but this might be much later. Until then we have plenty of images to keep us all busy. Multichill (talk) 22:57, 20 December 2009 (UTC)

Objects due to arrive soon

Just got an email. The next batch is in the (snail) mail now: 6000 photos of objects in the collection of the Tropenmuseum. Probably going to upload these objects in the next couple of days. Multichill (talk) 15:58, 16 June 2010 (UTC)

We had some problems, but now I'm uploading new images. Multichill (talk) 19:39, 27 July 2010 (UTC)

Festivals uploads

Since this was suggested here

I sometimes cover festivals (like many other photographers) for the French chapter (Wikimédia France).
This mainly involves two kinds of events:

  • book or comic strip festivals: Comédie du Livre of Montpellier, O Tour de la Bulle of Montpellier, Festival of Sollies Ville, Festival of Roquebrune sur Argens, Festival of La Seyne sur Mer, Festival of Luminy etc.
  • manga, anime and Japanese culture festivals/conventions: Mang'Azur - Japan Matsuri, Japanîmes (next week), JapanExpo (in one month)...

For the first kind, authors are usually photographed, preferably during interviews or autograph sessions.
For the second kind, notable people can be photographed, usually people performing on stage like 'HITT'. There can also be ambiance photographs depicting those festivals.

These photographs are also tagged with {{Supported by Wikimedia France}}, with a catname corresponding to the event name concatenated with wmf (Supported or Covered by), depending on whether an accreditation was needed to take the photographs.

Right now, I am uploading with Commonist a batch of photographs taken last weekend. I intend to categorize each photograph in the category usually named here category:'surname name'. I also intend to fill in the value of the depicted parameter for each photograph of an author, so this can be displayed in the special template corresponding to all those images.

Esby (talk) 22:04, 3 June 2010 (UTC)

Upload done

Upload in progress or to be done

Files that need to be uploaded

  • Category:Toulouse Game Show 2010 - Game - Anime related Convention of Toulouse - about 130 photographs - concert / personality / ambiance, upload in progress.
  • Comic strip festival of Luminy (Photographs taken, but not yet uploaded on Commons).

Event cover planned

Photograph not taken yet.


Hi, just wondering if there should/could be done something about the categorizing in both Category:Comédie du Livre 2010 and Category:Comédie du Livre 2010 - Supported by Wikimédia France. Should the description template categorize the file, and/or should the category name include its supporter? Just wondering. –Krinkletalk 16:55, 5 June 2010 (UTC)

This image was taken by someone external to Wikimedia France, so the file is in the normal category while not being in the one supported by Wikimédia France. This is done in order to limit the files directly in Category:Supported_by_Wikimedia_France. Esby (talk) 20:34, 6 June 2010 (UTC)
Assigned to | Progress | Bot name | Category
Esby (talk) 22:11, 3 June 2010 (UTC) | Upload done | irrelevant (categorisation or edits might still be performed by user:esby-mw-bot) | All images are categorized in Category:Comédie_du_Livre_2010_-_Supported_by_Wikimédia_France

US Air Force

And yet another branch of the military to pick clean. The US Air Force has a set of photos at their site. Not sure how many photos we're talking about. The same logic as for FEMA, the Navy and the Army can be used. Crawl all the galleries and extract the id (simple regex). The ids can be used to get the image and metadata (beautifulsoup). The gallery structure can be used to make a temporary category structure. The names of the files should be like "US Air Force <id> <title>.jpg". Of course, duplicate checking should be enabled like in all the other bots.

The source will be available here. Multichill (talk) 18:11, 23 October 2009 (UTC)
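A rough sketch of the crawling and naming logic, using only the standard library re module. The href pattern is hypothetical because the gallery markup is not described; the title follows the "US Air Force <id> <title>.jpg" form proposed above:

```python
import re

# Hypothetical link pattern: assumes gallery pages link to photos as href="/photos/<id>/".
PHOTO_ID_RE = re.compile(r'href="/photos/(\d+)/"')
# Characters not allowed in titles, plus "/" and ":" to be on the safe side.
UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def extract_ids(gallery_html):
    """Photo ids found on one gallery page, in order of first appearance."""
    seen = []
    for photo_id in PHOTO_ID_RE.findall(gallery_html):
        if photo_id not in seen:
            seen.append(photo_id)
    return seen


def air_force_title(photo_id, title):
    """Build 'US Air Force <id> <title>.jpg'."""
    clean = re.sub(r"\s+", " ", UNSAFE.sub("", title)).strip()
    return f"US Air Force {photo_id} {clean}.jpg"
```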

I felt like building something, so I built two things:

  1. A category generator. I used it to generate a tree under Category:Images from the US Air Force based on the galleries at the website. {{Air Force header}} does all the magic. You can view the full list at Special:PrefixIndex/Category:Images from the US Air Force.
  2. An upload bot

The upload bot can work on these categories and fill them. If subject is set, it will upload to the subject category right away. Some example images can be found in:

Would be nice to set the subject on a lot of these categories before actually uploading the images. What do you think? Multichill (talk) 17:48, 16 January 2010 (UTC)

Assigned to | Progress | Bot name
Multichill | On hold (Commons is short on disk space) | BotMultichillT

US Army

The Fema request got me started. The US Army has a nice set of images at . Judging from the latest id it's around 50.000 images. The bot should probably consist of two parts

  1. Loop over the search pages and find the location of all images like . All pages seem to be in the form
  2. Work on all these images

Shouldn't be too hard with some regular expressions for the first part and screen scraping with beautifulsoup for the second part. Multichill (talk) 22:07, 14 October 2009 (UTC)

I wrote a bot for this (source). It basically works the same as the other USgov bots. The main difference is that I'm unable to extract category information. The title is based on the title field, with the description as a fallback. The first images can be found in Category:Images from the US Army needing categories as of 23 October 2009. Multichill (talk) 14:01, 23 October 2009 (UTC)
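The title rule (title field first, description as fallback) can be sketched as below. The "US Army <id>" prefix is an assumption modelled on the Navy and Air Force names, and the cut-off uses MediaWiki's 255-byte limit on page titles:

```python
import re

# Characters not allowed in titles, plus "/" and ":" to be on the safe side.
UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def army_title(photo_id, title, description, limit=255):
    """Use the title field, fall back to the description, and stay under the byte limit."""
    text = (title or "").strip() or (description or "").strip()
    text = re.sub(r"\s+", " ", UNSAFE.sub("", text)).strip()
    prefix, suffix = f"US Army {photo_id} ", ".jpg"
    budget = limit - len((prefix + suffix).encode("utf-8"))
    # Cut on bytes, then drop a multi-byte character split by the cut.
    text = text.encode("utf-8")[:budget].decode("utf-8", errors="ignore").rstrip()
    return prefix + text + suffix
```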

No response so I slowly fired up the upload. Multichill (talk) 11:31, 25 October 2009 (UTC)


Assigned to | Progress | Bot name
Multichill | On hold (Commons is short on disk space) | BotMultichillT

Navy News Service

The Fema request got me started. The US Navy got about 75.000(!) images available at just waiting to be copied to Commons. I wrote a bot based on the FEMA upload.

  • The bot loops over all the images.
  • From the META fields I get the url, long description and short description
  • A regex extracts the date from the long description
  • A regex extracts the author from the long description
  • A regex extracts the location from the long description
  • The title is constructed based on the url and the short description
  • Image is uploaded and ends up in one of these categories
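A sketch of the date/location and author regexes, assuming captions of the form "LOCATION (DATE) -- ... U.S. Navy photo by AUTHOR (RELEASED)". It also covers the case Diaa describes below, where the date is not in brackets but sits between ")" and "--" or "–" (for instance after a hull number in parentheses). The sample captions are made up for illustration:

```python
import re

# "June 29, 2000", "Jan. 21, 2002"
DATE = r"[A-Z][a-z]{2,}\.? \d{1,2}, \d{4}"
# Location, then the date either in brackets or between ")" and "--"/"–".
LOC_DATE_RE = re.compile(
    rf"^\s*(?P<location>.+?)\s*(?:\((?P<date1>{DATE})\)|(?P<date2>{DATE})\s*(?:--|–))"
)
AUTHOR_RE = re.compile(r"U\.S\. Navy photo by (?P<author>.+?)\s*(?:\(|$)")


def parse_caption(caption):
    """Extract location, date and author from a long description."""
    info = {"location": None, "date": None, "author": None}
    m = LOC_DATE_RE.search(caption)
    if m:
        info["location"] = m.group("location")
        info["date"] = m.group("date1") or m.group("date2")
    a = AUTHOR_RE.search(caption)
    if a:
        info["author"] = a.group("author").rstrip(".")
    return info
```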

This is just a general overview. The source is available here. Multichill (talk) 16:48, 16 October 2009 (UTC)


  1. There is a template for US Navy images, {{ID-USMil}}; you could use it, or create one only for the US Navy and add it in the source.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
    Looks nice. I'll probably use it for the next files. Multichill (talk) 17:30, 19 October 2009 (UTC)
  2. I'm not sure the ID should come first, as in File:000629-N-5686B-001 Sailor Returns Home.jpg. I think "US Navy" should come before the number.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
    Sure, so it would be File:US Navy 000629-N-5686B-001 Sailor Returns Home.jpg in this case. Multichill (talk) 17:30, 19 October 2009 (UTC)
  3. Some images like File:020121-N-5563S-003 .50-Caliber Machine Gun.jpg don't have date and location. This is because the date isn't in brackets. It is however between ")" and "--" or ")" and "–". I also don't know why the location isn't grabbed...--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
    Looks like I have to improve the regex to catch these cases. Both date and location use the same regex for matching. Multichill (talk) 17:30, 19 October 2009 (UTC)
  4. You don't need the ID in the description. Create or use a source template for the upload where the ID is stated and a link to the site is given.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
    I do, to prevent naming collisions. Multichill (talk) 17:30, 19 October 2009 (UTC)

Ok. Bot is changed to include the suggestions. Now it's running again. Multichill (talk) 19:50, 21 October 2009 (UTC)

  1. One small problem: it doesn't seem to like image descriptions with quotation marks in them, and so cuts off partway through, e.g. File:US Navy 071227-N-4014G-037 An MH-60S Seahawk assigned to the .jpg; File:US Navy 071227-N-6125G-184 ailors attached to the Nimitz-class aircraft carrier USS Harry S. Truman (CVN 75) enjoy a USO concert preformed by the band .jpg. Shimgray (talk) 21:18, 23 October 2009 (UTC)
    Ah, an escaping problem. This probably only happens to a couple of images. We can always move them to a better name if the current name is not clear. Multichill (talk) 21:58, 23 October 2009 (UTC)
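One plausible way to avoid both the cut-off names and the leftover ^ldquo fragments is to read the META fields with an HTML parser, which handles quoted attributes and decodes entities such as &ldquo; before the title is built. A sketch with the standard library html.parser; the exact cause in the original bot is not stated, so this is an assumption:

```python
import re
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect <meta name=... content=...> pairs; HTMLParser decodes entities in attributes."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Characters not allowed in titles, plus "/" and ":" to be on the safe side.
UNSAFE = re.compile(r"[#<>\[\]|{}/:]")


def navy_title(photo_id, short_description):
    """Build 'US Navy <id> <short description>.jpg' from the decoded text."""
    clean = re.sub(r"\s+", " ", UNSAFE.sub("", short_description)).strip()
    return f"US Navy {photo_id} {clean}.jpg"
```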

I worked my way through some of the aircraft carrier categories. Interesting! Still a lot of additional categorization to do though.

  1. For one carrier, generally several temporary "aboard USS .." categories could be combined into one.
  2. The ship based temporary categories seem more helpful than the ones for stable locations, e.g. "Arabian Gulf".
  3. For the captions, maybe {{original caption}} could have been used.
  4. Minor point: given the small size of the license tag, it could have been included directly into {{information}}.
  5. It might be worth going through the descriptions by bot to wikify names of units, ships, etc., linking them to the corresponding articles at en.wp

-- User:Docu at 06:43, 25 October 2009 (UTC), edited 06:59, 25 October 2009 (UTC), 08:10, 25 October 2009 (UTC)

  1. You should probably move it to topic categories right away. Maybe you could use a bot.
  2. Stable locations only seem to be useful for photos on land.
  3. That could have been used, but I didn't.
  4. That could have been done.
Nice to see people working on this! Multichill (talk) 11:18, 25 October 2009 (UTC)
I mentioned 3 and 4 mainly for future uploads. BTW I made a bot request at Commons:Bots/Requests/vertrepbot. -- User:Docu at 16:19, 26 October 2009 (UTC)

Hey Multichill, thanks for uploading all those Navy pics, I'm sorting through them now, looking for possible FP candidates.

A few things I found while I was looking through them:

  • 1. Some images seem to have had something go wrong with their title during the upload; For example this one and this one. I'm assuming you meant to have 's around some words ('Sea Sparrow') but something's gone awry. You might want to fix it before you move on to the Army upload.

Hope this helps.

Sarcastic ShockwaveLover (talk) 22:09, 26 October 2009 (UTC)

Hi Sarcastic ShockwaveLover,
  • 1.: "^ldquo" in the title seems to come from "&ldquo;" in the description.
  • 2.: Thanks for noticing. It should be fixed now. It was correct when Multichill uploaded it. ;)
  • 3.: If you look at the file size of this file, you will notice that File:USS Port Royal (CG 73) aground.jpg, isn't a duplicate, but a scaled-down version. File:USS Port Royal (CG 73) aground.jpg should be tagged with {{duplicate}} for deletion. The new file is an improvement over the old one. I found a few ones too and tagged the old ones for deletion.
-- User:Docu at 14:41, 27 October 2009 (UTC)
    • I'd rather you didn't delete this one, I rotated it and cropped it to correct the tilt, I'm planning on nominating it for FP status. Sarcastic ShockwaveLover (talk) 08:57, 28 October 2009 (UTC)
      • 3. Yes, I listed it under "other versions" instead. Looking closer at it, it doesn't appear to be an exact duplicate or scaled-down version. The few images that slip through the bot's check are ones where the file was edited (and not even scaled down), e.g. this one and File:AAV Embarking.jpg. -- User:Docu at 12:33, 28 October 2009 (UTC), edited 18:22, 28 October 2009 (UTC)
  • The maximum length of file names being used seems to be 231 characters. While at some point in the distant future all file names may have to be that long to be unique, I wonder if we couldn't [have] kept them shorter in the meantime. -- User:Docu at 17:34, 28 October 2009 (UTC) (inserted "have" on 04:42, 30 October 2009 (UTC))
    • That would mean a mammoth renaming effort. It's already going to be huge just categorising them. That said, I think files like this one should be renamed. Also, perhaps we could put the categorisation/cleanup effort on the front page, much like that large German upload a few months back? It might help get some more people involved. I've categorised about 100 150 images so far using HotCat (thank God for that tool), but that's just a drop in the proverbial bucket. Sarcastic ShockwaveLover (talk) 11:58, 29 October 2009 (UTC)
  • No, I don't think we should move them. The advantages of the current file names are that they are generally descriptive titles and it's the title the Navy published it with.
  • The ^ldquo,/^rdquo,/^rsquo, could be fairly easy to fix (by an adminbot), there are approx. 3500 (Special:Search/^ldquo, OR ^rdquo, OR ^rsquo, prefix:File:US Navy 0). As we generally don't do cosmetics on file names, we could leave them that way though.
  • The categorization part should be easier once my bot has created additional categories (see here). I probably should get to work on that.
  • Besides these hardware based categories, there is still much to be done to create categories for specific events/operations etc. (e.g. Category:Vertical replenishment). It's fairly easy to build temporary categories from search results. One just needs to go through the category afterwards and remove a few false positives, most categories of FEMA officials were done that way. What generally threw it off were images of "A. on the phone with B." or "A. B. and C. (not pictured) attending Z.)", but they were easy to sort. If you want me to prepare you some temporary categories to review, I'd be glad to do so. -- User:Docu at 04:42, 30 October 2009 (UTC)
      • Please and thank you! Sarcastic ShockwaveLover (talk) 12:10, 30 October 2009 (UTC)
        • I did the test run for the bot. BTW, which searches would you want me to put into temporary categories? E.g. I used something like this to extend that cat. -- User:Docu at 15:29, 30 October 2009 (UTC)
-- User:Docu at 18:43, 31 October 2009 (UTC)
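The ^ldquo,/^rdquo,/^rsquo, cleanup mentioned in this thread could be sketched as follows. This assumes (it is not confirmed here) that each "^name," sequence is a mangled HTML entity such as &ldquo; that should become the matching typographic quote; the function names are hypothetical, and an actual adminbot run would still have to move each file and update its usages.

```python
# Hypothetical mapping: assumes each "^name," sequence is a mangled HTML
# entity ("&name;") carried over from the original Navy titles.
MANGLED_ENTITIES = {
    "^ldquo,": "\u201c",  # left double quotation mark
    "^rdquo,": "\u201d",  # right double quotation mark
    "^rsquo,": "\u2019",  # right single quotation mark / apostrophe
}


def clean_title(title: str) -> str:
    """Return the title with every mangled entity replaced by its character."""
    for mangled, char in MANGLED_ENTITIES.items():
        title = title.replace(mangled, char)
    return title


def needs_rename(title: str) -> bool:
    """True if the title contains at least one mangled entity."""
    return any(mangled in title for mangled in MANGLED_ENTITIES)
```

A bot would first select titles with `needs_rename` (e.g. from the search quoted above) and then propose `clean_title(...)` as the move target.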
  • It might be a bug in preview/thumb; it looks OK in full resolution. -- User:Docu at 18:46, 31 October 2009 (UTC)
  • I can confirm that they really are incomplete (also in full resolution). But it's the same at the source; only the preview on the source is fine. I can't fix it; cropping is the only solution. --Slick (talk) 07:56, 15 September 2012 (UTC)
  • BTW These days, Emijrpbot is fixing the date format on this batch (sample: [50]) -- User:Docu at 14:00, 9 January 2010 (UTC)
  • Can I archive this?--Diaa abdelmoneim (talk) 08:30, 25 April 2010 (UTC)
    • The question is if the categorization has to be cleaned up before archiving or not.
      Both the Starr batch and the 1st Geograph upload still have quite a few things to clean up, but the initial upload is done and further files could be in a new request. The first of these two had been archived, the second one not.
      The Navy news one still has some 4000 location categories, some of which should be merged and others removed (I merged approx. 100 of these into 30 one or two weeks ago). Avron is doing quite a lot of categorization on these, but personally I lost interest sometime last year. -- User:Docu at 08:40, 25 April 2010 (UTC)
      • Looks like most images get categorized by ship. I'm now adding temporary ship categories to help in this process. Multichill (talk) 08:53, 1 May 2010 (UTC)
        • Some of the location categories were already in the form "Images from US Navy, Location Aboard <ship name>", with tons of spelling variations. Many of these were merged into "Aboard <ship name>" categories. Would be great if you'd help with that too. -- User:Docu at 09:14, 1 May 2010 (UTC)
      • I spent some time on categorization. I first added a lot of temp ship categories and then moved images to real ship categories. I have now changed the upload bot to first try to find a real ship category, fall back to a temp ship category, or add a location category if no ship is found. I also nuked a lot of not-so-useful location categories (mainly seas). The aboard categories still have to be done; the same strategy could probably be applied. So the next step in the big categorization effort is to either match a temp category with a real category (if it makes sense) or empty it out and nuke it (if the category doesn't make sense). It probably makes sense to start with the biggest temp categories. Who wants to help? Multichill (talk) 11:36, 23 May 2010 (UTC)
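The fallback order described in the last comment (real ship category, else temporary ship category, else location category) can be sketched like this. The category naming patterns and the `existing_categories` lookup are hypothetical placeholders; the real bot's names and lookups are not shown in this discussion.

```python
from typing import Optional, Set


def pick_category(ship: Optional[str], location: Optional[str],
                  existing_categories: Set[str]) -> Optional[str]:
    """Choose one category for a Navy image using the fallback order above.

    All naming patterns here are hypothetical illustrations.
    """
    if ship:
        real = f"Category:{ship}"  # hypothetical: real ship category named after the ship
        if real in existing_categories:
            return real
        # No real category yet: park the image in a temporary ship category.
        return f"Category:Temporary ship category {ship}"  # hypothetical pattern
    if location:
        return f"Category:Images from US Navy, location {location}"  # hypothetical pattern
    return None  # nothing recognised; leave for manual categorization
```

Putting the real-category check first means images only land in temporary categories when no matching category exists yet, which keeps the later matching/nuking step small.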
Assigned to | Progress | Bot name
Multichill | Finished the initial upload, now resyncing and categorization | BotMultichillT

Metropolitan Museum of Art

This is one I've been working on for a while. The Metropolitan Museum of Art has a large collection of about 60,000 images of works in their online collection database, at a variety of resolutions. These have to be filtered carefully by hand because they have many photographs of 3D works and many non-PD works.

For most images that have a high-res version, it is easy to extract it by simply taking the URL of the thumbnail or regular image and changing "thumb" or "regular" to "zoom". This trick works for all images except those in the "The Libraries" collection (which only contains 50 images). Many images contain a color guide and false copyright statement that will need to be cropped at some point.
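The "thumb"/"regular" to "zoom" trick could be done like this, assuming the size variant appears as its own path segment. The example URL in the usage note is invented; the real MET URL layout isn't shown in this request.

```python
from urllib.parse import urlsplit, urlunsplit

# Path segments that (by assumption) name the smaller image variants.
SIZE_SEGMENTS = {"thumb", "regular"}


def to_zoom_url(url: str) -> str:
    """Replace a 'thumb' or 'regular' path segment with 'zoom'.

    Only whole path segments are swapped, so a host name or a file name
    that merely contains the word 'regular' is left untouched.
    """
    parts = urlsplit(url)
    segments = ["zoom" if s in SIZE_SEGMENTS else s for s in parts.path.split("/")]
    return urlunsplit(parts._replace(path="/".join(segments)))
```

For example, `to_zoom_url("http://images.example.org/images/regular/DT1234.jpg")` yields `http://images.example.org/images/zoom/DT1234.jpg`. Swapping whole segments rather than doing a plain string replace avoids mangling unrelated parts of the URL.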

Did you try to contact them? Maybe they'd like to help.--Diaa abdelmoneim (talk) 08:51, 3 July 2009 (UTC)
User:unforth of Flickr also has an extensive collection (several hundred) of MET pictures like this one. Teofilo (talk) 08:17, 12 September 2009 (UTC)
Any updates? Multichill (talk) 22:55, 20 December 2009 (UTC)
I forgot about this one... for this number of images license sorting by hand is infeasible, so I need to come up with some kind of automated scheme for this. I may also want to repeat the rip from the beginning, since there may be new images since I started this. Dcoetzee (talk) 12:30, 31 January 2010 (UTC)

Okay, on reflection I think the best way to handle this is to avoid trying to handle all the images in the database at one time - instead it's best to start with the "low-hanging fruit" of categories and/or searches that are known to be all PD. I'll take another look at this, and I'll also post here about how to extract high-resolution images efficiently - I don't believe it's necessary to stitch for the MET. Dcoetzee (talk) 19:23, 4 March 2010 (UTC)

Okay, here's the skinny on how to download these images at "zoom" resolution:
  • Visit the objectview page for the image, e.g. [51], from search results or browsing.
  • Save the image URL of the preview thumbnail, for example
  • Click the "zoom" icon to go to the zoom view. If there is no such button, only the preview is available.
  • View source and search for "EQZoom(" in the page text. Take the first parameter to "new EQZoom" and substitute it for the filename at the end of the preview thumbnail URL above (it may be the same). Also, change "regular" to "zoom". This will grab the zoom version of the image, which may be either a TIFF or a JPEG. Zoom resolution varies widely between images; it is high, but not very high. In this example the URL is
Dcoetzee (talk) 20:06, 4 March 2010 (UTC)
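The recipe above could be automated roughly as follows. This is a sketch under assumptions: it assumes the zoom page's source contains a call like `new EQZoom("NAME", ...)` with the file name as a quoted first argument, and that the preview URL ends in a file name under a `/regular/` path segment. Neither the real MET markup nor the example URLs are given in this thread.

```python
import re
from typing import Optional

# Assumed shape of the call in the zoom page source: new EQZoom("NAME", ...)
EQZOOM_RE = re.compile(r"""new\s+EQZoom\(\s*['"]([^'"]+)['"]""")


def zoom_url(preview_url: str, zoom_page_html: str) -> Optional[str]:
    """Build the zoom-resolution URL from a preview URL and the zoom page source."""
    match = EQZOOM_RE.search(zoom_page_html)
    if match is None:
        return None  # no zoom view: only the preview is available
    zoom_name = match.group(1)
    # Drop the preview's own file name, keeping the directory part.
    directory = preview_url.rpartition("/")[0] + "/"
    # Swap the "regular" size segment for "zoom".
    directory = directory.replace("/regular/", "/zoom/")
    return directory + zoom_name
```

The result may point at either a TIFF or a JPEG, as noted above, so a downloader should check the response type rather than trust the extension.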
I would rather do a partnership project with the MET instead of stripping clean their site. I met a lot of MET people (:P) in the US. I'll contact them and see what's possible. Multichill (talk) 16:50, 30 April 2010 (UTC)
I'm a bit busy, and doing this from Europe isn't that practical. Maybe WMNY can step in here? Multichill (talk) 20:11, 5 January 2011 (UTC)
Assigned to | Progress | Bot name
User:Dcoetzee | License sorting | User:Dcoetzee



Batch upload of all suitable images in These images were created for

I'm using BotMultichillT (talk · contributions · deleted user contributions · recent activity · logs · block log · global contribs · SULinfo) for the uploads.

How the bot works

The source. The bot works like this:

  • The bot loops over all the images in the Flickr group
  • The bot checks if a suitable license is on the image
  • The bot checks if an allowed tag is on the image and not a disallowed tag
  • If it's a suitable image the bot will pull the description from Flinfo
  • The description is improved based on the added tags by using a template trick and User:Multichill/WLANL/descriptions
  • Categories are added using the same trick and User:Multichill/WLANL/museums
  • The image is marked as reviewed by Multichill (talk)
  • The categories are filtered using the functions in
  • The filename is derived from the username and the title assigned by the user

For more details see the source code.
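The license/tag checks and the file-name rule in the list above could look roughly like this. The accepted Flickr license IDs (assumed here: 4 = CC BY 2.0, 5 = CC BY-SA 2.0), the tag sets and the `WLANL - <user> - <title>` pattern (inferred from file names such as File:WLANL - wendier - de Jonge.jpg later on this page) are assumptions; the bot's real values are in its source code.

```python
from dataclasses import dataclass, field
from typing import Set

# Assumed Flickr license IDs accepted by the bot: 4 = CC BY 2.0, 5 = CC BY-SA 2.0.
ALLOWED_LICENSES = {"4", "5"}


@dataclass
class FlickrPhoto:
    owner: str
    title: str
    license: str
    tags: Set[str] = field(default_factory=set)


def is_suitable(photo: FlickrPhoto, allowed_tags: Set[str],
                disallowed_tags: Set[str]) -> bool:
    """Suitable license, at least one allowed tag, and no disallowed tag."""
    return (photo.license in ALLOWED_LICENSES
            and bool(photo.tags & allowed_tags)
            and not (photo.tags & disallowed_tags))


def commons_filename(photo: FlickrPhoto) -> str:
    """Derive the file name from the Flickr username and title (pattern assumed)."""
    title = photo.title.strip() or "untitled"
    # Characters that MediaWiki rejects or replaces in file names.
    for bad in "/:\\[]{}|#<>":
        title = title.replace(bad, "-")
    return f"WLANL - {photo.owner} - {title}.jpg"
```

Stripping whitespace from the title would also avoid names like "Zielenprauw .jpg" with a stray space before the extension.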


I received a lot of permissions and used a modified (hacked) version of flickrripper to upload these images. I also received permission from the remaining museums and am uploading those images now. At the end I will do a flickrripper run over the whole pool to catch images that were not tagged correctly. Multichill (talk) 12:48, 29 October 2009 (UTC)


  • SUPER! --MGA73 (talk) 18:25, 19 August 2009 (UTC)
  • Some Suggestions:
  1. Since the description is in Dutch I suggest using {{nl|}} for the descriptions.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
  2. Also {{WLANL}} should have a link to Wiki Loves Art project page or Flickr group.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
  3. I suggest moving {{WLANL}} to the source parameter in the information template and adding |url= as a parameter in {{WLANL}} with the Flickr source link.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
  4. Flinfo doesn't add a description using the image's name, so, for example, it doesn't add "Amandelbloesem, Vincent van Gogh (1890)", which would be a good description.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
  5. Generally, I think uploading to Flickr isn't the best choice, since it reduces image quality for non-Pro members, as here. Maybe offering Flickr Pro accounts to participants would resolve this issue :-)--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
Thanks for your input.
  1. Good point. I will change this
  2. {{WLANL}} sure needs improvement. This is just a quick hacked up version I created because otherwise I would have a red link. It's on my list
  3. I like it at the bottom because it doesn't clutter up {{Information}}
  4. This should be changed in Flinfo. I'll do a request at User talk:Flominator#Flinfo request
  5. Yes. Not having the originals sucks. I'll leave a note asking users to upload the original version if possible. It would be nice if these kind of projects would upload to Commons directly, but with the current tools that's kind of hard.
Multichill (talk) 07:10, 20 August 2009 (UTC)
All images use {{nl}} now. Flominator changed Flinfo to also pull the description from the title.
I fired up the bot to upload. Issues I'm currently aware of:
  1. Some images get tagged as uncategorized, but these images are categorized.
  2. Some names are not that good
  3. I have some name collisions.
These issues are not that serious and can be fixed later on. Multichill (talk) 14:01, 22 August 2009 (UTC)

There seems to be a problem with getting descriptions from titles. It adds the title in the description, but does so twice. This happened to multiple images:

  1. File:WLANL - wendier - de Jonge.jpg
  2. File:WLANL - wendier - Zielenprauw .jpg
  3. File:WLANL - wendier - Verenvitrine Suriname.jpg
  4. File:WLANL - wendier - Danseres Nias.jpg
First part is from User:Multichill/WLANL/descriptions, second part is from the title. Multichill (talk) 16:53, 22 August 2009 (UTC)

Last update?

I received a couple of permissions. I don't expect to receive any more. I just have to upload the images for which I received permission and do one final run to see if I missed anything. When I've done these two things, this batch upload is (finally) finished. Multichill (talk) 22:59, 20 December 2009 (UTC)

Is the upload still in progress?--Diaa abdelmoneim (talk) 12:43, 26 March 2010 (UTC)
I should probably do some last checking. Multichill (talk) 17:40, 26 March 2010 (UTC)
Assigned to | Progress | Bot name | Category
Multichill | Waiting for permission in OTRS | BotMultichillT |

Images from NYPL Digital Gallery

Assigned to | Progress | Bot name
Dcoetzee | Uploading | Dcoetzee

It would be great if we could batch upload PD images from the NYPL Digital Gallery. The NYPL Digital Gallery provides free and open access to over 685,000 images digitized from The New York Public Library's vast collections, including illuminated manuscripts, historical maps, vintage posters, rare prints, photographs and more. --Butko (talk) 14:45, 14 April 2009 (UTC)

  • This collection turned out to be more promising than I supposed. They use LizardTech ContentServer to serve up their images, whose API is described here. Here's how you extract original TIFFs at full size: first use a "browse" query to obtain some XML including the image dimensions, like this one [52]. The folder name and image name can be obtained from URL of the zoom view. Then, use a getimage query like this one [53] to get the full size TIFF, specifying the dimensions from the previous query. Tada. Close examination shows no artifacts in the TIFF - these are original scans (internally, they are SID images). The first one I extracted was 3845 × 4947, about 60 MB as a TIFF, and 27 MB as a PNG (which you can preview here). They throttle you at 80 KB/s per transfer, but they do allow simultaneous transfers; any way you look at it though it would take a long time to fetch all the images we need. In light of the long download time per image, we're going to want to license filter before downloading. Dcoetzee (talk) 07:00, 15 April 2009 (UTC)
  • Update: their complete collection of high-resolution images is browsable here. This can be used to easily obtain a list of folder-name pairs. I'll presently begin downloading. Dcoetzee (talk) 06:23, 16 April 2009 (UTC)
  • Update: a better way to download these is to use the "getfile" function to get the raw .sid files, which are highly compressed (as in [54]) and then use LizardTech's command-line decoder to convert to TIFF ([55]). This is a quicker download and doesn't even require the dimensions. Dcoetzee (talk) 22:12, 16 April 2009 (UTC)
  • I'm still in the middle of grabbing these. Enumerating IDs turned out to be trickier than I thought, because the folders are so large the browse interface times out on them. I ended up enumerating them instead using wildcard searches on single letters. Even just looking at the high res images, it's a lot of data. All told we're talking at least 100 GB in PNGs, and I'm pretty sure all of the high-resolution images are public domain works, although that will require further confirmation. It's an excellent source. Dcoetzee (talk) 06:59, 22 April 2009 (UTC)
  • Update: I've enumerated about 65000 high-res images, and am in the process of downloading and converting them to PNGs, slow enough to not overwhelm their bandwidth. So far I've retrieved about 17250, occupying 323 GB. I'm also in the process of generating image descriptions of them based on NYPL metadata. I've created Category:New York Public Library Digital Gallery and plan to start uploading some of them soon. Dcoetzee (talk) 13:18, 16 May 2009 (UTC)
  • Update: I've had contact from a representative of the NYPL, who has been very helpful in furnishing IDs and sanctioning the sharing of their public domain images. He gave me a list of about 40,000 stereographs which I can begin uploading immediately as soon as I put together a suitable fully-automated upload tool for the task. Dcoetzee (talk) 21:43, 25 June 2009 (UTC)
    • Great work. I think this is good news and I'm very happy that someone over there is nice enough to help out.--Diaa abdelmoneim (talk) 20:46, 27 June 2009 (UTC)
      • I have just begun automated uploading of this collection of 40,000 images, which are being placed along with existing images in Category:Images from the New York Public Library. Each image and its metadata is being downloaded from NYPL on-the-fly. Dcoetzee (talk) 03:11, 28 June 2009 (UTC)
      • Update: I've estimated that at my present rate of upload, the current collection being uploaded (which actually contains 84000 images) will require about 7 weeks to upload, and will occupy about 500 GB. Dcoetzee (talk) 10:38, 28 June 2009 (UTC)
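The enumeration workaround described above (browsing a whole folder timed out, so IDs were collected with one wildcard search per letter) could be sketched like this. `search` is a hypothetical stand-in for whatever query function talks to the NYPL interface, and the `a*` wildcard syntax is an assumption.

```python
import string
import time
from typing import Callable, Iterable, Set


def enumerate_ids(search: Callable[[str], Iterable[str]],
                  letters: str = string.ascii_lowercase,
                  delay_seconds: float = 0.0) -> Set[str]:
    """Collect image IDs by running one wildcard search per letter.

    IDs returned by several queries are counted once, since the results
    are merged into a set.
    """
    ids: Set[str] = set()
    for letter in letters:
        ids.update(search(f"{letter}*"))  # hypothetical wildcard query syntax
        time.sleep(delay_seconds)  # pause between queries to go easy on the server
    return ids
```

Merging into a set matters because a wildcard search on one letter can overlap with another, depending on how the server interprets the pattern.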

Nice upload, but I have a couple of points you should address:

  1. I don't like the two versions (PNG & JPG). Who cares about thumbnail size? Are you sure you want to upload two versions of every image? And why not upload the original TIFFs for our restoration people?
  2. The files are uncategorized, please tag them with {{subst:unc}} right away.
  3. How are you going to get these files categorized? The images should probably all be in a subcategory of Category:Stereo cards and in one or more topic categories.
  4. The other versions field seemed to be broken; that was an easy fix. Multichill (talk) 11:30, 28 June 2009 (UTC)

Multichill (talk) 11:20, 28 June 2009 (UTC)

More to question:

  • By 84000 images, do you mean 42000 PNG and 42000 JPG?
  • Why don't you merge the source template into the source field in the {{NYPL-image-full}} template?
  • Does the bot auto-categorize?
  • What's the license of these images? Why are they PD? I mean, why is the original, before the scan, PD?--Diaa abdelmoneim (talk) 12:08, 28 June 2009 (UTC)
    • They're all PD due to age ({{PD-1923}}), according to the NYPL, although some of them don't list a specific date on their page (for many of them, you have to click through to the original source description to verify the age). There was one date field that I was not grabbing, and I am currently modifying the bot to grab it. The bot does not do autocategories (I don't have that functionality, and I don't trust autocategories anyway), but I am now automatically marking them as uncategorized. Uploading the TIFFs doesn't make any sense, because they are derived from MrSID files and contain exactly the same data as the PNG files (there is no metadata).
    • I also prefer not to have two versions, but thumbnail size is a very real concern, and unfortunately the software does not support JPEG thumbnails for PNG files. For example, a typical image of width 300 would be about 30 KB in size, which is prohibitive for modem users when many such images are used on a page. When the software adds a proper feature for this, they can all be deleted. Oh, and no, I mean 84000 PNG and 84000 JPEG.
    • Should I be putting these all in the root category Category:Stereo cards? Dcoetzee (talk) 17:25, 28 June 2009 (UTC)


  • I'm currently categorizing to the "Category:Robert N. Dennis collection of stereoscopic views"--Diaa abdelmoneim (talk) 17:22, 28 June 2009 (UTC)
    • I can take care of categorizing by source collection automatically if you wish - please don't go to unnecessary manual effort. :-) Dcoetzee (talk) 17:26, 28 June 2009 (UTC)
      • I started a bot that does this for the first 1600 images. It would be good if you did this with all your upcoming uploads. And you said 84000 images as a first batch. How many more batches are there? If it is possible for me to assist in the upload I would be glad to do so. Multichill also has a university connection or a very high-speed connection; I'm sure if we ask him kindly he would help with the upload. If we work together we can upload this in a week. And please don't add the images to the stereo card root category, just to Category:Robert N. Dennis collection of stereoscopic views.--Diaa abdelmoneim (talk) 17:49, 28 June 2009 (UTC)
        • Unfortunately that may not be an option, depending on how fast the NYPL wants their servers hit. I can inquire about it. I can deal at least with the Robert N. Dennis collection right now, but other subcollections will have to wait until I see how many collections there are and how meaningful they are. Dcoetzee (talk) 17:53, 28 June 2009 (UTC)
          • So should I keep categorizing the first 1600 images of the batch? I don't want there to be a double category or something. How many images do you upload daily? And how big a PD collection do they have?--Diaa abdelmoneim (talk) 18:00, 28 June 2009 (UTC)
            • No, I'll go back for them a bit later this week, don't worry. :-) And I'll check for any existing category so double categories will not occur. I upload roughly one image every 50 seconds or 1728 per day (this includes both the PNG and JPEG). I have no idea how large their complete PD collection is, and I don't think they do yet either. Dcoetzee (talk) 18:08, 28 June 2009 (UTC)
  • Could the bot also categorize to location? Like in File:Camping_out,_from_Robert_N._Dennis_collection_of_stereoscopic_views.jpg the location being Michigan? --Diaa abdelmoneim (talk) 18:31, 28 June 2009 (UTC)
  • The past couple of files have been very low res. Is this a mistake by the bot or are these really low res?--Diaa abdelmoneim (talk) 18:34, 28 June 2009 (UTC)
    • Some files do not have SID files available from the NYPL - for these I upload the highest available resolution, which is about 700px wide. And yes, I may be able to extract the rough location from the Original Source field. For now I must go away but back later. :-) Dcoetzee (talk) 18:44, 28 June 2009 (UTC)
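The rates quoted in this thread can be checked with a little arithmetic. At one upload every 50 seconds there are 86,400 / 50 = 1,728 uploads per day, so 84,000 uploads take about 48.6 days (the "about 7 weeks" estimated earlier). A 60 MB TIFF at the 80 KB/s per-transfer throttle takes 60 × 1024 / 80 = 768 s, roughly 13 minutes. The helpers below only do this arithmetic; treating 1 MB as 1024 KB is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def uploads_per_day(seconds_per_upload: float) -> float:
    """How many uploads fit in one day at a fixed pace."""
    return SECONDS_PER_DAY / seconds_per_upload


def days_to_upload(n_files: int, seconds_per_upload: float) -> float:
    """Days needed to upload n_files at a fixed pace."""
    return n_files / uploads_per_day(seconds_per_upload)


def transfer_seconds(size_mb: float, rate_kb_per_s: float) -> float:
    """Seconds to fetch one file at a per-transfer throttle (1 MB = 1024 KB assumed)."""
    return size_mb * 1024 / rate_kb_per_s


def batch_days(n_files: int, size_mb: float, rate_kb_per_s: float,
               parallel: int = 1) -> float:
    """Days to fetch n_files when `parallel` throttled transfers run at once."""
    return n_files * transfer_seconds(size_mb, rate_kb_per_s) / parallel / SECONDS_PER_DAY
```

Because the throttle applies per transfer, simultaneous transfers shorten the total time for a batch but not the time for any single file.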

Looks like all images are now tagged with Category:Robert N. Dennis collection of stereoscopic views and {{Uncategorized}}. This seems like a good starting point to me, but I'd rather have a dedicated uncategorized template, just like with Barch and Fotothek. Could you please tag the images with {{Uncategorized-NYPL}}? I'll create the remaining structure later this week. This will keep your uploads from flooding the regular tree and prevent messages like this one. Multichill (talk) 20:07, 29 June 2009 (UTC)

Ok. The basics are there. If everyone agrees we only need to run a bot to change the old uploads ( -lang:commons -family:commons -transcludes:NYPL-image-full -regex -nocase "\{\{Uncategorized\|" "{{Uncategorized-NYPL|" ). Multichill (talk) 20:21, 29 June 2009 (UTC)
No problem, I'll take care of everything. :-) Dcoetzee (talk) 23:07, 29 June 2009 (UTC)
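The replacement in the bot command quoted above amounts to a case-insensitive regular-expression substitution on each page's wikitext (the `-transcludes:NYPL-image-full` option limits it to pages using that template). A minimal Python equivalent, shown only to illustrate what the pattern matches; the real run would go through the bot framework with the options quoted:

```python
import re

# Same pattern and replacement as in the quoted command; -nocase maps to IGNORECASE.
UNCAT_RE = re.compile(r"\{\{Uncategorized\|", re.IGNORECASE)


def retag(wikitext: str) -> str:
    """Switch {{Uncategorized|...}} tags to {{Uncategorized-NYPL|...}}."""
    return UNCAT_RE.sub("{{Uncategorized-NYPL|", wikitext)
```

Note that the pattern requires a `|` right after the template name, so a bare `{{Uncategorized}}` without parameters is left untouched.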

Subject categories

Could you or Multichill create a bot that automatically adds a temporary subject category to each file, which would then be checked and, if correct, moved into a permanent category, like what has been done with Fotothek or BArchive? I'm not sure we should wait till the first 80,000 images are up and then start categorizing. BTW, the NYPL has started receiving funds again from the city of New York, so they might stop throttling downloads. It would be beneficial if you would inquire about that.--Diaa abdelmoneim (talk) 20:22, 30 June 2009 (UTC)

I'd be happy to do this but haven't seen this type of thing before - is there an example or description of this process somewhere? Many of these can (if nothing else) be automatically categorized into the category for the city where they were taken. Dcoetzee (talk) 22:15, 30 June 2009 (UTC)
Commons:Fotothek has categories assigned to its files based on the description. The subject, or where the photo was taken, is mostly written at the end of "Original source: ". Dividing the images into such categories would make further categorization easier. So, for example, File:Camping_out,_from_Robert_N._Dennis_collection_of_stereoscopic_views.jpg has "Original source: Robert N. Dennis collection of stereoscopic views. / United States. / States / Michigan / Stereoscopic views of Lake Superior Scenery." You could grab "Stereoscopic views of Lake Superior Scenery" from there, because it's after a slash and before a bracket. The category would later be reviewed and approved by a user. The temp category would be "NYPL_Stereoscopic views of Lake Superior Scenery". These would serve as preliminary categories.--Diaa abdelmoneim (talk) 22:23, 30 June 2009 (UTC)
That makes sense - incidentally, is there an easy way to merge a category into a different existing category? Will CommonsDelinker do this? For many of these the corresponding existing category is obvious, and automated merging would be desirable. Dcoetzee (talk) 22:40, 30 June 2009 (UTC)
I'm currently automatically subcategorizing the images and placing the categories in Category:Temporary categories for images from the New York Public Library. I'm also updating the uncategorized tags and Robert N. Dennis category on my initial uploads. Dcoetzee (talk) 01:55, 1 July 2009 (UTC)
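The temporary-category rule proposed above could be sketched as follows. The function takes the text after the last slash of the "Original source" field, cuts it at an opening bracket if there is one, drops the trailing period, and prefixes "NYPL_". The field format is inferred from the single example quoted, so other records may need extra handling.

```python
from typing import Optional


def temp_category(original_source: str, prefix: str = "NYPL_") -> Optional[str]:
    """Derive a temporary category name from the last 'Original source' segment."""
    last = original_source.rsplit("/", 1)[-1]
    last = last.split("(", 1)[0]  # stop before a bracket, if any
    last = last.strip().rstrip(".").strip()
    return f"{prefix}{last}" if last else None
```

Returning `None` for an empty segment lets the caller fall back to {{Uncategorized-NYPL}} instead of creating an empty temp category.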
See User:CommonsDelinker/commands/documentation#Categorize uncategorized images. Multichill (talk) 19:37, 1 July 2009 (UTC)
Is it possible to have a template like the one found on,_location_Dresden ? so that it makes categorizing easier?--Diaa abdelmoneim (talk) 09:37, 2 July 2009 (UTC)
That sounds like a good idea. However, I'd want to be sure first that CommonsDelinker recognizes the new Uncategorized-NYPL... Dcoetzee (talk) 10:55, 2 July 2009 (UTC)
Dcoetzee, you should probably only add Uncategorized-NYPL if you don't have a proper temp category. This way we can just use the normal category move bots to move images from a temp cat to a proper topic category. Multichill (talk) 11:01, 2 July 2009 (UTC)
Dcoetzee, can we delete a temp category once it's cleaned out or do you expect more images to go into these categories? Multichill (talk) 17:03, 2 July 2009 (UTC)
On the first point, already done, on the second - I have no idea. But it'll get recreated as necessary anyway. Dcoetzee (talk) 18:23, 2 July 2009 (UTC)
  • For stereoscopic view #9466, I made a gallery with all 80 versions, i.e. 10 (files) * 2 (file types) * 2 (stereoscopic) * 2 (it's "Mirror Lake"). I'm wondering if I should also put them into a specific category with 9466 in its name. -- User:Docu at 11:12, 25 April 2010 (UTC)
    • Ideally we would also have at least one (non-stereoscopic) image selected from them. -- User:Docu at 11:18, 25 April 2010 (UTC)
  • To make it possible to sort the files into topical categories without overwhelming them, I set the sortkey in the template. They now appear after other images, e.g. in Category:Mirror Lake (California). -- User:Docu at 11:12, 25 April 2010 (UTC)

NYPL and PD-Scan

Dcoetzee I'm a little unhappy with the way our images are tagged as PD-Scan only. Many of the images don't have their original publish date and someone who looks on the picture can't be sure if it's PD as there is no clear sign of it. For example File:Arch_on_St._George_Avenue,_from_Robert_N._Dennis_collection_of_stereoscopic_views.png has only "Digital item published 5-5-2005; updated 2-12-2009." which doesn't assert PD-old. There is an NYPL page about the collection which may hold clues about why the collection is PD. I think after we clear why the collection is PD we should create a template stating why it is PD, which goes along the PD scan. --Diaa abdelmoneim (talk) 10:58, 4 July 2009 (UTC)

  • I agree, the NYPL image metadata does not generally contain sufficient information to clearly establish their copyright status. I have only the word of the NYPL that these are public domain, and they may not be as conservative in evaluating copyright status as we are. I don't really want to filter them before upload, though, because I'm fairly confident most of these actually are PD and are just missing the metadata to prove it. There are two things I can do here: I can fetch the "Imprint" date from the collection, and I can tag any images that do not have a clear indicator of copyright status for human review with Category:PD files for review. This could prove to be rather difficult, though, because dates are specified in a variety of strange formats that are difficult to parse. Dcoetzee (talk) 22:15, 4 July 2009 (UTC)
    • Or just an OTRS confirmation, or a rights information page on their site saying "no known restrictions". Don't tag anything please. I'm sure all images are PD but only need a legal confirmation.--Diaa abdelmoneim (talk) 22:19, 4 July 2009 (UTC)
      • As far as I know OTRS is inappropriate for public domain images; it's for the copyright holder confirming that they've released a work, and NYPL is not the copyright holder. Their copyright status will need to be confirmed based on the available information, and PD review has agreed to help me with this kind of thing in the past. As for "no known restrictions", every one of these image description pages says that in its HTML metadata; their evaluation can't be trusted. Dcoetzee (talk) 23:30, 4 July 2009 (UTC)
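A rough sketch of the date check discussed above: pull four-digit years out of a free-form imprint or date string and only treat the image as {{PD-1923}} when every year found is before 1923. Anything without a recognizable year, or with a later one, goes to human review. The regex is a deliberately crude heuristic and an assumption; it will not understand every "strange format" mentioned.

```python
import re
from typing import List

# Four-digit years from 1000 to 2099; a crude heuristic, not a date parser.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def years_in(text: str) -> List[int]:
    """All four-digit years that appear in the text."""
    return [int(y) for y in YEAR_RE.findall(text)]


def review_decision(imprint: str, cutoff: int = 1923) -> str:
    """'PD-1923' if all years found are before the cutoff, else 'review'."""
    years = years_in(imprint)
    if years and max(years) < cutoff:
        return "PD-1923"
    return "review"
```

Using the maximum year is the conservative choice: a string such as "Digital item published 5-5-2005; updated 2-12-2009." yields only digitization dates and is therefore sent to review rather than tagged PD.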

What's the status of this upload? Multichill (talk) 12:29, 17 September 2009 (UTC)

Sorry for the delay. I'm working on getting a Toolserver account so I can continue the upload with my existing tools and Mono, or with a rewrite of the tools. It should be able to pick up right where I left off. I don't have enough bandwidth at home to do the upload. Dcoetzee (talk) 08:48, 25 September 2009 (UTC)
  • Any update ?--Diaa abdelmoneim (talk) 12:42, 11 December 2009 (UTC)
    • The NYPL upgraded their software and it's no longer possible with the new default settings to download the images in the same manner in which I originally did, so I've been forced to suspend progress on this. I asked Josh from NYPL about this, and on Jan 15 he said: "No progress on that front, but I might actually be able to open another door within the next month or so (might just be able to get you direct access to a batch of jpg full-res derivatives)...will follow up with details soon..." Dcoetzee (talk) 12:32, 31 January 2010 (UTC)
  • I just checked and at some point in the last few months the NYPL listened and re-enabled the SID interface, allowing this upload to continue, so I'm starting it back up. Dcoetzee (talk) 00:02, 16 April 2010 (UTC)
      • Finally!!! Please change the status to uploading when you do. =) Congrats.--Diaa abdelmoneim (talk) 07:54, 16 April 2010 (UTC)
      • Done :-) I can only upload at a fast rate when I'm at school since my upload bandwidth at home sucks - but I'm there pretty often and my updated tool uploads at a rate of about 5-6 image pairs per minute there. Dcoetzee (talk) 10:12, 16 April 2010 (UTC)
      • Another small update on this: it turns out I've only been uploading the fronts of these cards, not the backs. This is probably a good thing, since the backs are usually just blank with a bit of writing, and not nearly as useful for educational purposes. Because of this, there are actually only 42,000 images, not 84,000, in the stereographic collection. Dcoetzee (talk) 00:42, 17 April 2010 (UTC)
  • Still working on this upload. I'm bandwidth-limited at the moment so it's taking quite a while. It's probably more than half done. Dcoetzee (talk) 02:07, 9 November 2010 (UTC)
    • I've now finished all of the stereographic views from the New York Public Library that were supplied by Josh Greenberg. I will contact Josh to see what other images he has to offer. I'm open at this point to feedback about how I can improve the process (besides obviously uploading images more quickly - I think this is a good time to port the tool to Toolserver). I'm also considering uploading only high-quality JPEGs, instead of both a JPEG and a PNG version. Let me know what you think. Dcoetzee (talk) 07:19, 14 November 2010 (UTC)

Hello. What is the status? InverseHypercube (talk) 05:51, 2 April 2011 (UTC)

This project has probably seen its better days, but I found this through searching for NYPL images. I tried all the parameters given in the LizardTech Express 8 manual with this image, but I simply am not able to download it. Even doing what the manual says on downloading the file itself (getitem?cat=*&item=*.sid) or with the parameters of width and height, I only get "Invalid dimensions".
I'm especially interested in all the New York real estate maps, including the famous Sanborn Maps, a collection with unprecedented detail of buildings throughout the years.
The system NYPL uses is almost misanthropic. If the data is free and open, then it shouldn't be behind artificially restrictive systems, especially when they then charge a fee for their own file-acquisition service.
I'd be glad to help, but unfortunately I have no idea how to code a bot, and I have very little coding skill. As to file formats, I'm partial to retaining the highest possible quality. Barring TIFFs, a lossless PNG would be the next choice. ~ Nelg (talk) 23:25, 31 March 2013 (UTC)

Minerals from various sources on mindat.org

Besides Rob Lavinsky, other uploaders to mindat.org have either published their work under a free and usable license or have granted OTRS permission for their images.

This upload will mostly use the same procedures, categories and file names as the upload of Rob Lavinsky's pictures from

Progress of the request (failed, uploading, coding, done)

Assigned to | Progress | Bot name | Category
Reinhard Kraasch | in progress | RKBot | Category:Files by Leon Hupperichs, Category:Files by Christian Rewitzer from mindat

Quantity structure

  • 200 files by Christian Rewitzer
  • 702 files by Leon Hupperichs

(several of them have already been uploaded)


None so far.

Test upload

  • Hi Reinhard, wonderful pictures ;-), but I think it's better to have the OTRS permission in the "Permission" field, and the mineral category (here: Category:Sonoraite) is missing. -- Ra'ike T C 22:31, 26 March 2011 (UTC)
    I used the same script as with the Lavinsky upload, so the description page looks almost the same as with those images, e.g. File:Spodumene-18945.jpg. The missing mineral categories can be fixed by the bot (as with the Lavinsky upload), but I guess it's easier to do it by hand, since there will probably be only very few categories. --Reinhard Kraasch (talk) 20:36, 6 April 2011 (UTC)
    OK, no problem (well, that's just how it is ;-) ). I also think adding the missing mineral categories by hand isn't a big problem (there shouldn't be many), but the locality categories would be better done by the bot. Greetings -- Ra'ike T C


Assigned to | Job | Status | Comments
Reinhard Kraasch | Image (and description) download from | Done 12:24, 15 March 2010 (UTC) | All images have been downloaded
Reinhard Kraasch | Generate image descriptions, autotranslate locality info | Done 19:45, 21 March 2010 (UTC) |
Reinhard Kraasch | Generate and autotranslate category info | Done 19:45, 21 March 2010 (UTC) |
Reinhard Kraasch | Test upload | Done 21:38, 26 March 2011 (UTC) | 2 images
Various | Discussion of test upload | Done 19:43, 10 April 2011 (UTC) |
Reinhard Kraasch | Actual image upload | Done 21:35, 10 April 2011 (UTC) |
Reinhard Kraasch | Identify duplicates | Done 19:39, 12 April 2011 (UTC) |
Reinhard Kraasch | Generate missing categories | Done 21:17, 12 April 2011 (UTC) |



A set of 140,000 Dutch press photos from the period 1959-1989, made available under a CC-BY-SA license by the Nationaal Archief at [56]. The upload contains a (Dutch) description, date and licensing information, and some suggested categories, which are however in Dutch and often not very useful anyway. All images are put in Category:Images from Anefo and Category:Uncategorized images from Anefo.


  • Please read Commons:Guide to batch uploading
  • Change the naming of the files to <title> Anefo <id> and don't make the titles so short. Current names mess up sorting (everything at A)
  • Use {{nl}} (Update: you should use it as {{nl|1=}})
  • Include deep links to the originals (id should resolve to a location)
  • Use the attribution field of {{cc-by-sa-3.0-nl}} to include the correct attribution
  • You should probably make a Partnership template
  • Some statistics on the subjects would be nice for easy mapping to real categories. You should probably not mix subject and coverage. Doing 140,000 files by hand is too much. Instead, make a list of the top subjects and a list of the top coverage values, ask users to map these to real categories (where possible), and use that mapping at upload. That's a lot less work than doing everything by hand.
  • Please apply these corrections to the already uploaded files Multichill (talk) 12:32, 17 June 2012 (UTC)
    • I don't think file naming and sorting is much of an issue. Other batch uploads are done in a similar way.
    • Please use {{nl|1=}} rather than {{nl}}.
    • I mapped some of the suggested topics at Category:Anefo temporary redirects. The redirect bot would eventually move them to the correct categories. Once the upload is finished, these can be deleted. --  Docu  at 13:17, 17 June 2012 (UTC)
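The mapping approach suggested above (list the most frequent subject and coverage terms, let volunteers map them to real categories, then apply the mapping at upload time) could be sketched in Python as follows. The record layout (dicts with "subjects" and "coverage" lists) and the sample terms and category names are hypothetical, not the actual Nationaal Archief export format.

```python
from collections import Counter


def top_terms(records, field, n=20):
    """Return the n most frequent terms in `field` as (term, count) pairs."""
    counts = Counter()
    for record in records:
        counts.update(record.get(field, []))
    return counts.most_common(n)


def categories_for(record, subject_map, coverage_map):
    """Translate a record's subject and coverage terms into Commons categories.

    `subject_map` and `coverage_map` are the volunteer-made mappings from
    archive terms to category names. Unmapped terms are ignored, so such
    files stay in the uncategorized bucket for manual work.
    """
    categories = set()
    for term in record.get("subjects", []):
        if term in subject_map:
            categories.add(subject_map[term])
    for term in record.get("coverage", []):
        if term in coverage_map:
            categories.add(coverage_map[term])
    return sorted(categories)


# Hypothetical sample records (Dutch terms, as in the archive metadata).
records = [
    {"subjects": ["voetbal", "sport"], "coverage": ["Amsterdam"]},
    {"subjects": ["sport"], "coverage": ["Rotterdam"]},
    {"subjects": ["koningshuis"], "coverage": ["Amsterdam"]},
]
print(top_terms(records, "subjects", 2))  # [('sport', 2), ('voetbal', 1)]
```

Keeping subjects and coverage in two separate maps follows the advice not to mix the two; the published top-term lists are what volunteers would fill in with real category names.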
Assigned to | Progress | Bot name | Category
Andre Engels | running, about 1,500 done (17-06-2012) | Robbot | Category:Uncategorized images from Anefo


  Due to Office action by unelected Wikimedia Foundation employees which cannot be appealed, nor the evidence examined by those found guilty in secret of something unspecified using unverifiable information, this highly successful content generating project which exceeded its targets, has now been abandoned. If as an unpaid volunteer, you wish to start a similar project, please create a new batch upload project page. Thank you (talk) 14:24, 11 March 2015 (UTC)
Aim: To upload 100,000 identified aviation photographs by amateur and professional photographers.
Exemplar photo: Italian military MB-339s flying in formation at Brindisi Papola Casale, Italy, by Aldo Bidini


These are amateur collections of photos posted to aviation websites. This fits the aim of Commons to preserve a comprehensive collection of photographs of every in-service aircraft type for the purposes of educational reuse. This is a large project, planned to span many months, mainly throughout 2013, but it is likely to extend into 2014.

  • Which license tag(s) should be applied?

Not all of the photographs are available under a free license; individual OTRS tickets are needed to release these. Those already available were released under {{GFDL}}. Credit templates with the licence are created individually once the OTRS ticket is approved, for example {{AlanBrown}}, which is now used on over 1,000 photographs.

  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

The standard {{Infobox aircraft image}} is applied to these batch uploads and populated as much as possible from the forum description/metadata. This includes location, aircraft id, photo description, photo date, photographer, gallery page, image source page and construction number.

Categories exist for most aircraft ids and are added automatically. Where found to be missing, these are picked up and created by volunteers manually.

  • Other conventions

Multilanguage: To avoid complex issues with filenaming and uploading, the module is used on the filename to find the best meaningful transliteration for accented or other characters to the standard ascii set. For example "Москва" should be transliterated as "Moscow". This only applies to the filename, text in the description fields will be identical to the source. This only applies when the source forum is predominantly in English, for sites such as where a non-English language is consistently used, filenames will be based on the source language to avoid the high probability of unacceptable translation errors.
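The transliteration module used by the project is not named here. As a rough stdlib-only stand-in (a swapped-in technique, not the project's code), Unicode NFKD decomposition can strip accents from Latin letters. It cannot transliterate Cyrillic, and "Москва" to "Moscow" is really a translation, so this sketch uses a small hypothetical override table for such names.

```python
import unicodedata

# Hypothetical override table for names whose customary English form is a
# translation rather than a letter-by-letter transliteration.
NAME_OVERRIDES = {"Москва": "Moscow"}


def ascii_filename_part(text):
    """Return an ASCII-only version of `text` for use in a filename.

    Known names are replaced first; accented Latin letters are then
    decomposed (NFKD) and their combining marks dropped. Characters with no
    ASCII decomposition (e.g. remaining Cyrillic) are removed entirely, so a
    real transliteration module is still needed for those.
    """
    for original, english in NAME_OVERRIDES.items():
        text = text.replace(original, english)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

For example, ascii_filename_part("São Paulo") gives "Sao Paulo". Because unmapped characters are silently dropped, results should be checked for empty or truncated names before upload.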

ID: The filename ends with the photo's unique database identifier, a 7-digit string. For the different source websites, the filename formats are:

  • File:<aircraft>, <airline> AN<id>.jpg
  • File:<aircraft>, <airline> JP<id>.jpg
  • File:<aircraft> <registration>, <location> RP<id>.jpg
  • File:<aircraft> <airline> <registration>, <location> PP<id>.jpg

Note: if <aircraft> looks unrealistically short, the location is used instead; sometimes this is because the photo is of an airport rather than an aircraft. If <airline> is unrealistically short, it is treated as optional and left out.
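The AN/JP pattern and the fallback rules can be sketched as a small helper. This is an illustration only: the 3-character threshold for "unrealistically short" is a hypothetical value, and the RP/PP formats would add registration and location in the same way.

```python
import re

MIN_LENGTH = 3  # hypothetical threshold for an "unrealistically short" value


def usable(value):
    """True if `value` is present and not unrealistically short."""
    return value is not None and len(value.strip()) >= MIN_LENGTH


def forum_filename(prefix, photo_id, aircraft, airline, location):
    """Build 'File:<aircraft>, <airline> <prefix><id>.jpg'.

    Falls back to the location when <aircraft> is too short, and leaves out
    <airline> when it is too short, following the naming note.
    """
    if not re.fullmatch(r"\d{7}", photo_id):
        raise ValueError(f"expected a 7-digit photo id, got {photo_id!r}")
    subject = aircraft.strip() if usable(aircraft) else (location or "").strip()
    parts = [subject]
    if usable(airline):
        parts.append(airline.strip())
    return f"File:{', '.join(parts)} {prefix}{photo_id}.jpg"
```

For instance, forum_filename("AN", "1234567", "?", "Lufthansa", "Frankfurt Airport") falls back to the location and returns "File:Frankfurt Airport, Lufthansa AN1234567.jpg".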

EXIF data: After the second tranche of uploads had started, it was noticed that the Python Imaging Library code that semi-automatically detected and cropped the credit bar from images was dropping the EXIF data in the process. A separate routine uses exiv2 to copy the EXIF data from the original file to the cropped version before upload, and the same routine is being used to improve previously cropped files. Where the EXIF data is less than 3,000 bytes in size, it is skipped as trivial (when under 4,000 bytes, the metadata normally appears to have been introduced by image processing rather than by the original camera); the skipped files could be fixed retrospectively if a rationale for doing so is later put forward. Some of the variance in cropped JPEG file size may be due to embedded thumbnails in the original being lost in the cropped file; embedded thumbnails are an optional feature of JPEG files and make no practical difference to files hosted on Commons.
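The project copies EXIF with exiv2. As a dependency-free illustration of the same idea (a swapped-in technique, not the project's routine), the Exif APP1 segment can be copied at the byte level from the original JPEG into the cropped one, applying the 3,000-byte threshold mentioned above. This sketch assumes well-formed baseline JPEGs and ignores edge cases such as fill bytes between segments.

```python
import struct

SOI = b"\xff\xd8"              # JPEG start-of-image marker
EXIF_HEADER = b"Exif\x00\x00"  # identifies an APP1 segment as Exif
MIN_EXIF_BYTES = 3000          # smaller Exif blocks are treated as trivial


def header_segments(data):
    """Yield (marker, start, end) for each segment before the image data.

    Stops at the Start of Scan (0xDA) marker; everything from there on is
    compressed image data plus the end-of-image marker.
    """
    if not data.startswith(SOI):
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF and data[pos + 1] != 0xDA:
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length  # the length field counts itself, not the marker
        yield data[pos + 1], pos, end
        pos = end


def is_exif(data, start):
    """True if the segment starting at `start` is an APP1 Exif segment."""
    return data[start + 1] == 0xE1 and data[start + 4:start + 10] == EXIF_HEADER


def find_exif(data):
    """Return the whole APP1 Exif segment (marker included), or None."""
    for _, start, end in header_segments(data):
        if is_exif(data, start):
            return data[start:end]
    return None


def copy_exif(original, cropped, min_size=MIN_EXIF_BYTES):
    """Return `cropped` with the Exif segment of `original` inserted.

    Any Exif already in `cropped` is dropped. The new segment goes after an
    APP0 (JFIF) segment if there is one, otherwise straight after SOI. If
    the original's Exif is missing or under `min_size` bytes, `cropped` is
    returned unchanged.
    """
    exif = find_exif(original)
    if exif is None or len(exif) < min_size:
        return cropped
    out = bytearray(SOI)
    inserted = False
    rest_start = 2
    for marker, start, end in header_segments(cropped):
        rest_start = end
        if is_exif(cropped, start):
            continue
        if not inserted and marker != 0xE0:
            out += exif
            inserted = True
        out += cropped[start:end]
    if not inserted:
        out += exif
    out += cropped[rest_start:]  # Start of Scan, image data, end of image
    return bytes(out)
```

Working on raw segments avoids re-encoding the image, so the cropped pixels are untouched; the size check mirrors the rule of skipping trivial EXIF blocks.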


  1. Negotiation with photographer on aircraft forum, resulting in an email to OTRS with a release statement and a credit template on Commons referencing the ticket number. Add to approved list of authors sorted by forum/website.
  2. Set up python code for specific forum to generate text for the image pages, as layouts will vary. This relies on to turn the html into Python friendly arrays.
  3. Upload sets by photographers in the backlog (organized by an off-wiki Google spreadsheet). If a category for the aircraft model exists, it is added at this point, along with any category for the photographer.
  4. Crop any credit bars and remove watermark templates. This runs as a separate process and need not be run at the same time as the batch upload. One benefit of keeping it separate is that the original watermarked version is retained on Commons, should we ever need to redo the crop using better tools or test for duplicates. The cropping needed varies by forum, and the process tests for oddities such as photos with a lot of black pixels or bleed-through. The credit bars are inconsistent: their style appears to change over time and differs between forums. Cropping relies on an installation of the Python Imaging Library and tests for black pixels to double-check that a credit bar is present and to estimate its height. The cropped image is re-saved as JPEG with a quality of 98, which appears to produce a file of similar size to the original (a quality of 99 appears to produce a file up to 50% larger). There is no known way of removing overlay watermarks; images with this issue should be avoided, as negotiation with a forum member might release the originals without the overlays.
  5. Final check of categories is done manually and should remove the backlog category or any category check template.
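The black-pixel test in step 4 can be illustrated with a minimal pure-Python sketch. It assumes the image has already been converted to rows of greyscale values (for example via PIL's convert("L")), and the thresholds dark_level, dark_share and max_share are hypothetical values, not the ones used by the project.

```python
def credit_bar_height(rows, dark_level=40, dark_share=0.8, max_share=0.15):
    """Estimate the height in pixels of a dark credit bar at the image bottom.

    `rows` is a list of rows of greyscale values (0 = black, 255 = white),
    top row first. Rows are scanned from the bottom up while at least
    `dark_share` of their pixels are at or below `dark_level`. Returns 0 if
    no bar is found, or if the dark band is taller than `max_share` of the
    image, which suggests a dark photo rather than a credit bar.
    """
    height = 0
    for row in reversed(rows):
        dark = sum(1 for value in row if value <= dark_level)
        if dark / len(row) < dark_share:
            break
        height += 1
    if height > max_share * len(rows):
        return 0
    return height
```

Given a bar height, the crop box for PIL's Image.crop would be (0, 0, width, height - bar_height), with the result saved as JPEG at quality 98 as described in step 4.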
Categorization conventions
Categorization principles
  1. Where the aircraftid/registration field matches an existing category, this is transcluded via the template.
  2. Where there is a match to the airports mapping, a default category of <year> at <airport> is to be added.
  3. Where there is no year given for the image, a default category of Aircraft at <airport> is to be used.
Categorization by airport at upload

A mapping of name variations and ICAO codes is used on upload to check the imageloc field for a search text match (or match to the airport name) and then chooses a category based on the airport name. A reasonable effort has been made to find existing Airport categories, but a lack of standards (including the majority failing to include any standard airport codes, and there being variations as to whether English or other languages are used) makes this uncertain. Ideally {{Airport codes}} should be applied to all airport categories.

For example, if a photo to upload has a location that includes the exact word " LHR" or the string "London Heathrow Airport" (including the space and capitalization), the lookup returns "London Heathrow Airport". The routine then adds, and if necessary creates, a year-airport category (such as Category:2012 at London Heathrow Airport); if the year is not given, an aircraft-at-airport category is added instead.

The year and aircraft-at-airport category choices do not apply to "non-airports" such as museums listed at the end of the table.
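The lookup and the categorization principles above (year at airport, otherwise aircraft at airport, nothing extra for non-airports) could look like the following. A minimal sketch: AIRPORT_LIST stands in for the real /Airportlist data, the museum entry is hypothetical, and category creation is left out.

```python
# Hypothetical excerpt of the /Airportlist mapping:
# (search text, category name, is a real airport?)
AIRPORT_LIST = [
    (" LHR", "London Heathrow Airport", True),
    ("London Heathrow Airport", "London Heathrow Airport", True),
    ("Example Air Museum", "Example Air Museum", False),
]


def airport_categories(location, year=None):
    """Return the default airport categories for a photo location.

    Matching is case-sensitive, so " LHR" needs its leading space and
    capitals. Non-airports such as museums match but get no year or
    aircraft-at-airport category.
    """
    for search_text, name, is_airport in AIRPORT_LIST:
        if search_text in location or name in location:
            if not is_airport:
                return []
            if year:
                return [f"Category:{year} at {name}"]
            return [f"Category:Aircraft at {name}"]
    return []
```

For example, airport_categories("London LHR", 2012) returns ["Category:2012 at London Heathrow Airport"], while a lower-case "london lhr" matches nothing.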

The category mapping table can be found on its own subpage at /Airporttable and the raw text that generates it (and that Python scripts rely on) can be found at /Airportlist.

There is an issue with encoding Unicode strings in airport categories: the category-check functions cannot currently handle them (this might theoretically be fixable, but there is a law of diminishing returns on programmer time).


Issue of embossed watermarks
Example of a photograph with an embossed watermark saying "AIRLINERS.NET" that will require a later correction.

A small percentage of images uploaded will unfortunately have "embossed" watermarks. These appear to have been placed pseudo-randomly across the image and are in addition to the "credit bar" which can be easily removed. As a practice, this seems to have stopped in recent years at both and Though the expectation is that the number of images watermarked in this way will be below 5%, this still means that a couple of thousand may be needing correction by the end of the batch upload project. There may be ways of automatically un-watermarking images with standard overlays like this, though one has yet to be researched properly. Unfortunately there is no currently known way of automatically detecting these images on upload (for example there is no change in the EXIF information).

The general guidelines for Commons are that watermarked images assessed as in scope, and not overly promotional, are okay to be uploaded to Commons.

Action If you see any photographs with these watermarks, please add them to Category:Images from with watermarks and Category:Images from with watermarks so that the project team can apply a systematic 'best' way of correcting them.


Note - this table is not maintained.

Task | Assigned to | Progress | Where it happens
Ad-hoc verification of OTRS tickets | User:Russavia | Est. 80% | User credit templates are created, referencing the OTRS emails.
Upload image sets against the OTRS ticket for each photographer, coordinated via a Google spreadsheet | User:Fæ | 82.5% completed (estimate) | Photos appear in photos (check needed)
Crop credit bar from photos | User:Fæ | 0 outstanding (ideally zero) | Photos leave photos (credit bar)
Complete categorization and remove from check category (28,629 in backlog) | Multiple, see below | 40.2% completed (estimate) | Photos leave photos (check needed)
Create airport/museum categories where missing, and template them correctly | Help needed | 98 red links left (ideally zero) | /Airporttable
Create Russian parser for and upload for photographers with OTRS permission | Fae | Done | photos (check needed)

2014-03-07: Out of 138,751 files in Aviation photographs by photographer, 82,507 (59%) are from this project.


     As of 2014-03-07 the following users were spotted helping with this project:

Priority lounge

See Commons:Batch uploading/Airliners/Priority for a list of 823 images in photos (check needed) that are used in at least one Wikipedia article in any language. Ideally this table should be empty as any photo in use has already been evaluated by someone, so please consider these a priority for manual checks and removal from the check needed category.

Project members


I'd love for these photos from to be uploaded to Commons so that I can use them on the Dubai International Airport wiki page:

MoHasanie (talk) 06:30, 30 March 2014 (UTC)

Hi, sorry about the delay in getting back to you. The photographers have their streams marked as all rights reserved on, so we need a release of their photostream on record in OTRS to be able to upload the photos. Currently, we have the following photographers from that forum with a release recorded:
  1. Mike Freer - Touchdown-aviation, {{MikeFreer}}
  2. Pedro Aragão, {{PedroAragão}}
  3. Michel Gilliand, {{MichelGilliand}}
  4. Felix Goetting, {{FelixGoetting}}
  5. John Davies, {{JohnDavies}}
  6. Javier Bravo Muñoz, {{JavierBravoMuñoz}}
  7. Alan D R Brown, {{AlanBrown}}
  8. AlainDurand, {{AlainDurand}}
  9. Alex Beltyukov - RuSpotters Team, {{AlexBeltyukov}}
  10. Renato Spilimbergo Carvalho, {{RSC}}
  11. parfaits, {{PavelAdzhigildaev}}
  12. Igor Dvurekov, {{IgorDvurekov}}
  13. Steve Fitzgerald, {{SteveFitzgerald}}
  14. Toshi Aoki, {{ToshiroAoki}}
  15. Oleg V. Belyakov, {{OlegBelyakov}}
  16. Igor Bubin, {{IgorBubin}}
  17. Dmitry Avdeev, {{DmitryAvdeev}}
  18. Chris Finney, {{ChrisFinney}}
  19. Shimin Gu, {{ShiminGu}}
  20. Árpád Gordos, {{ÁrpádGordos}}
  21. André Du-pont, {{AADPR}}
  22. Guido Allieri, {{GuidoAllieri}}
  23. Anton Bannikov, {{AntonBannikov}}
  24. Manfred Groihs, {{ManfredGroihs}}
  25. Peter Bakema, {{PeterBakema}}
  26. Sunil Gupta, {{SunilGupta}}
  27. Leonid Faerberg - Russian AviaPhoto Team, {{LeonidFaerberg}}
  28. Christian Hanuise, {{ChristianHanuise}}
  29. Ward Callens, {{WardCallens}}
  30. Robert Frola, {{RobertFrola}}
  31. Darian Froese, {{DarianFroese}}
  32. Aktug Ates, {{AktugAtes}}
  33. Ian Creek, {{IanCreek}}
  34. Peter Duijnmayer, {{PeterDuijnmayer}}
  35. Danial Haghgoo, {{DanialHaghgoo}}
  36. Paul Davey, {{PaulDavey}}
  37. Martijn Geerlings, {{MartijnGeerlings}}
  38. Eugene Butler, {{EugeneButler}}
  39. Andrew Babin, {{AndreyBabin}}
  40. Dean Constantinidis, {{DeanConstantinidis}}
  41. Mikhail Glazyrin, {{MikhailGlazyrin}}
  42. Vsevolod Aladyshkin - St. Petersburg Spotters, {{VsevolodAladyshkin}}
  43. Grahame Hutchison, {{GrahameHutchison}}
  44. Ercan Karakas, {{ErcanKarakas}}
  45. Alan Lebeda, {{AlanLebeda}}
  46. Eduard Marmet, {{EduardMarmet}}
  47. Roland Nussbaumer, {{RolandNussbaumer}}
  48. Les Rickman, {{LesRickman}}
  49. Tim Rees, {{TimRees}}
  50. Raimund Stehmann, {{RaimundStehmann}}
  51. Luc Willems, {{LucWillems}}
  52. Jeroen Westram, {{JeroenWestram}}
  53. Luc Verkuringen, {{LucVerkuringen}}
  54. Elisabeth Klimesch, {{ElisabethKlimesch}}
  55. Anthony Noble, {{AnthonyNoble}}
  56. Kral Michal, {{MichalKral}}
  57. JetPix, {{TorstenMaiwald}}
  58. Perry Hoppe, {{PerryHoppe}}
  59. Andreas Hoppe, {{AndreasHoppe}}
  60. Andy Kennaugh, {{AndyKennaugh}}
  61. Gleb Osokin - Russian AviaPhoto Team, {{GlebOsokin}}
  62. Ted Quackenbush, {{TedQuackenbush}}
  63. Alain Rioux, {{AlainRioux}}
  64. Andre Wadman, {{AndréWadman}}
  65. Jerome Krier, {{JérômeKrier}}
If you can find photographers that fit your needs in that list, then I can run an update from their stream; otherwise you might want to drop them a note (or ask Russavia to add them to his list) and ask if they would like to contribute to our project by releasing their photos under a CC-BY-SA license so that others can use them for the public benefit. -- (talk) 16:09, 10 April 2014 (UTC)
@MoHasanie: some of the photos you want uploaded are ok. The only ones which aren't are those by Sam Chui -- he will not release under a free licence. If you want to contact other photographers, perhaps you can co-ordinate this with me, so that I can contact them and request permission on behalf of Commons. @: I have now categorised photographers by website. It might be worthwhile getting the 1st class membership for now (it's $55 a year) and doing a complete run of those photographers on Is that all good? Also, you will need to do an update of a lot of streams, such as {{EduardMarmet}} because only a small fraction of images have been uploaded from those streams. russavia (talk) 08:13, 11 April 2014 (UTC)
Hi Russavia, so all the photos except those by Sam Chui are all right? I can try contacting Sam Chui and asking him to change the license. MoHasanie (talk) 06:40, 16 April 2014 (UTC)

Art of Japan in the Rijksmuseum

A consistent, best-practice upload of 1,679 photographs of Japanese artworks from the Rijksmuseum.

Using this as an initial themed upload might enable a far larger upload of images from the collections using the same code and process.


The upload will use the GWToolset.

A credit template of {{GWToolset Rijksmuseum}} is available.

API guide:

Examining an initial upload based on an export from Europeana, here, it seems worth exploring the Rijksmuseum API before proceeding. Reasons include:

  • The RM metadata is available in Dutch and English, both can be reused on the image page, though in practice most of the "English" text may be Dutch anyway.
  • Some additional fields are available in the RM API that do not appear in the Europeana data, such as exhibition history (places, years). Whether this is helpful for Commons would need exploration.
  • The image is identical whether pulled from the RM API or Europeana, including the EXIF data.
  • A persistent identifier via should probably be used, rather than the current URL (though including both might be a good option).
  • Dimensions are nicely broken up in the metadata and so could be displayed with fairly intelligent use of Commons templates for dimensions.
  • In the example, "physicalMedium" is given as porselein (porcelain) in Dutch; when the English metadata is requested, the same field shows plate (dishes), which is actually the "objectType". The same inconsistency has been carried over to the Europeana data. Importing the English version from the API may therefore be less useful than limiting this upload to the Dutch metadata, or treating the English as a suggested translation.
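Because the dimensions arrive as a list of dicts with type, value and unit (see the NL/EN comparison on this page), they can be normalised before being placed in a Commons template. A minimal sketch; the Dutch-to-English type names other than "hoogte" and "diameter" are assumptions about the API's vocabulary.

```python
# Assumed Dutch-to-English names for dimension types. Only "hoogte" (height)
# and "diameter" appear in the sample record on this page.
DUTCH_TYPES = {"hoogte": "height", "breedte": "width",
               "diepte": "depth", "diameter": "diameter"}


def normalise_dimensions(dimensions):
    """Turn Rijksmuseum-style dimension dicts into (type, value, unit) tuples.

    Handles both the English record ('15.9', 'cm') and the Dutch one
    ('15,9', 'CM', 'hoogte').
    """
    result = []
    for dim in dimensions:
        kind = DUTCH_TYPES.get(dim["type"], dim["type"])
        value = float(dim["value"].replace(",", "."))
        unit = dim["unit"].lower()
        result.append((kind, value, unit))
    return result


def describe(dimensions):
    """Render dimensions as e.g. 'diameter 15.9 cm × height 2.9 cm'."""
    return " × ".join(f"{kind} {value:g} {unit}"
                      for kind, value, unit in normalise_dimensions(dimensions))
```

With this, the English and Dutch records for BK-1968-212 normalise to the same values, which supports sticking to a single language for the metadata.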

Analysis comparing differences only between API calls in NL and EN for example image:

 <element> <value> # First in English then Dutch
 objectTypes  [u'plate (dishes)'] 
 objectTypes  [u'bord (vaatwerk)'] 

 objectCollection  [] 
 objectCollection  [u'keramiek'] 

 scLabelLine  anoniem, 1700 - 1725, plate (dishes) 
 scLabelLine  anoniem, 1700 - 1725, porselein 

 id  en-BK-1968-212 
 id  nl-BK-1968-212 

 materials  [] 
 materials  [u'porselein', u'glazuur'] 

 subTitle  d 15.9cm × h 2.9cm 
 subTitle  d 15,9CM × h 2,9CM 

 dimensions  [{u'part': None, u'type': u'diameter', u'value': u'15.9', u'unit': u'cm'}, {u'part': None, u'type': u'height', u'value': u'2.9', u'unit': u'cm'}] 
 dimensions  [{u'part': None, u'type': u'diameter', u'value': u'15,9', u'unit': u'CM'}, {u'part': None, u'type': u'hoogte', u'value': u'2,9', u'unit': u'CM'}] 

 language  en 
 language  nl 

 physicalMedium  plate (dishes) 
 physicalMedium  porselein 

 acquisition  {u'date': u'1968-01-01T00:00:00Z', u'method': None, u'creditLine': u'B. Westendorp-Osieck Bequest, Amsterdam'} 
 acquisition  {u'date': u'1968-01-01T00:00:00Z', u'method': None, u'creditLine': u'Legaat van mevrouw B. Westendorp-Osieck, Amsterdam'} 
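A comparison like the one above can be produced by diffing the two decoded API records field by field. A minimal sketch, assuming each record has already been fetched and parsed into a Python dict (the API call itself is not shown).

```python
def differing_fields(en_record, nl_record):
    """Return {field: (en_value, nl_value)} for every field whose values differ.

    A field present in only one record shows up with None on the other side.
    """
    diffs = {}
    for key in sorted(set(en_record) | set(nl_record)):
        en_value = en_record.get(key)
        nl_value = nl_record.get(key)
        if en_value != nl_value:
            diffs[key] = (en_value, nl_value)
    return diffs


def print_diffs(diffs):
    """Print differences in the '<element> <value>' layout, English first."""
    for key, (en_value, nl_value) in diffs.items():
        print(key, en_value)
        print(key, nl_value)
        print()
```

Fields that are identical in both languages (such as the acquisition date) are left out, so only the inconsistencies need reviewing.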

From this example and the two other images uploaded by DH, the English version of the metadata appears to miss some fields and may be inconsistent. It seems better to stick to the Dutch record only. -- (talk) 14:17, 4 May 2014 (UTC)


Licenses chosen can be based on this statement "All data and all images made available through the API are either in the public domain or are subject to a CC0 license." found here. This should mean that the photographs are themselves released as CC0, with copyright of the art object being a separate issue (sticking to a cut-off of before the 20th century should mean PD can apply).


David Haskiya has started the foundation of this batch project, and after he ran short on time, Fæ has offered to pick up this project.

Action Status Where
DH: Pass on background and current xml file.

Three initial example files uploaded.

["BK-1968-205", "BK-1968-213-A", "BK-1968-212"]
Status:    Done Art of Japan in the Rijksmuseum
Fæ: Initial investigation (May 2014):
  • Review comments on template by Jean-Frédéric [57]
  • Check structure of current xml
  • Review metadata potential from Rijksmuseum API
    • API key   Done
    • Manual experiments   Done
    • Automate API import and map to a suitable GWToolset available template   Done
Status:    Done -
Fæ: Run test upload on beta cluster. (May 2014)
  • Pull xml test sets for 10 to 200 artefacts   Done
  • Explore category checking   Done
  • Ask for feedback[58]   Done
  • Follow up on GWT/Artwork template format bugs (filename generation, wiki-code handling in parameters)   Done
Status:    Done -
Fæ: Do real uploads starting with an initial 'page' full (20-200) for feedback and checking.

Planned cut-off of 1923 to avoid any possible contention on copyright, for the moment at least.

Status:    Done
2,496 images uploaded
Art of Japan in the Rijksmuseum

Library of Congress

This is a project coordination page to explain the process used and to keep track of issues and past uploads.

  • Source to upload from:
    • Library of Congress collections
    • There is an API, though web pages with metadata in MODS format are usable.
    • I have been in correspondence with the library on API access; it is limited to 15 enquiries per minute.
  • Describe the works to be uploaded in detail (audio files, images by …):
    • Suitable collections listed at
      Not all of these are suitable: some collections have few images online, and others are neither government works nor from before 1923.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?

(talk) 11:39, 18 June 2014 (UTC)
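Staying within the 15-enquiries-per-minute limit can be handled with a simple fixed-interval limiter called before each request. A minimal sketch; the clock and sleep parameters are only there so the timing can be tested, and the request function in the usage note is hypothetical.

```python
import time


class RateLimiter:
    """Space out calls so that at most `max_calls` happen per `period` seconds.

    A simple fixed-interval limiter: at 15 calls per 60 seconds it makes
    sure at least 4 seconds pass between consecutive calls.
    """

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self.clock()
        waited = 0.0
        if self.last_call is not None:
            remaining = self.last_call + self.interval - now
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
                now += remaining
        self.last_call = now
        return waited
```

Usage would be along the lines of creating one RateLimiter() and calling limiter.wait() before each fetch_record(lccn) (a hypothetical function). A fixed interval is stricter than a sliding window, but it never bursts above the limit.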


Initial uploads used custom scripts; the most recent ones use Special:GWToolset, which requires an XML file to be generated.

Naming is of the form:

File:<descriptive title> LCCN<lccn>.tiff

For an explanation of the unique lccn identification, refer to

For early GWT uploads naming has been forced to use "-LCCN" rather than " LCCN".
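The naming convention can be expressed as a small helper. A minimal sketch: the set of characters MediaWiki forbids in titles (# < > [ ] | { }) is standard, but the 240-byte length budget is an assumption to check against current Commons rules, and the LCCN values in the examples are made up.

```python
import re

# Characters that MediaWiki does not allow in page titles.
ILLEGAL_TITLE_CHARS = re.compile(r"[#<>\[\]|{}]")
MAX_NAME_BYTES = 240  # assumed limit for the name after "File:"; verify first


def loc_filename(title, lccn, separator=" ", ext="tiff"):
    """Build 'File:<descriptive title> LCCN<lccn>.<ext>'.

    Early GWToolset runs used separator="-" (giving '-LCCN'). Illegal
    characters are removed, whitespace is collapsed, and the title is
    shortened (on a UTF-8 character boundary) so the name fits the limit.
    """
    clean = ILLEGAL_TITLE_CHARS.sub("", title)
    clean = re.sub(r"\s+", " ", clean).strip()
    suffix = f"{separator}LCCN{lccn}.{ext}"
    budget = MAX_NAME_BYTES - len(suffix.encode("utf-8"))
    clean = clean.encode("utf-8")[:budget].decode("utf-8", "ignore").rstrip()
    return f"File:{clean}{suffix}"
```

Truncating the descriptive title rather than the suffix keeps the LCCN, and therefore the link back to the catalogue record, intact.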

Opinions and issues

These include TIFFs, and there have been ongoing issues with Commons' thumbnail generation for very large TIFFs (>50MP) and with whether we should host JPEG files in parallel with the TIFFs for convenience.


Code Assigned to Progress Bot Category
cpbr Status:    Done Custom British Cartoon Prints Collection: 1,648R
  • Uploads use {{Photograph}}
  • There has been significant post-upload "housekeeping" to:
    • Add country sub-categorization.
    • Add the parent category as this got left off after a re-run was needed when GWT was changed.
    • Upgrade jpgs to the same size as tiffs, using the sips command under OSX (so local downloading and uploading is required).
  • Supporting credit to WMUK.
Status:    Done GWT Photochrom prints collection: 18R
Original total c.11,500. Current category total will be less due to volunteer recategorizations.
  • Uploads use {{Photograph}}
  • Initial analysis included an assessment of how many TIFFs would be over 100MB; only a handful out of a couple of thousand images are.
  • A bulk of the collection appears to be pre-1923 (the test sample of 100 had 93 as published in 1923 or earlier).
  • Where they exist, location categories may be applied, for example Los Angeles. This may be slightly controversial, however the alternative is to make country bucket categories which seems a worse option.
  • No credit needed.
Status:    Done GWT Library of Congress panoramic photographs collection: 0R
  • Use {{Artwork}}
  • Images relating to the history of ballooning
Status:    Done GWT Library of Congress Tissandier collection: 232R
item 02121
  • Create special 'page turner' script to find all sub-images within a LoC item
  • Use {{Photograph}}
  • Photographs of the September 11th attack on the World Trade Center - all from the same photographer
Status:    Done GWT Library of Congress images of September 11 attacks: 250R
  • Create special webscraping query to generate xml as the photographs do not have LCCNs
  • Use {{Artwork}} - swapping to {{Photograph}} as only a small proportion of files are scans of building plans
  • Use HABS license template
  • 6,079 images were previously uploaded for HABS
  • Post-upload geolocation templates
  • Post-upload creation of PNG files from 50 MP+ TIFFs (via API, not GWT)
  • (Hard, this requires uploading an array of all existing files in memory) Cross link files from same scanned document using other_versions (example)
  • Ensure post-upload categorization is skipped when other editors have touched the file
  • Set up backlog page for HABS related category creation:
Commons:Batch uploading/Library of Congress/HABS
Status:    In progress GWT Catscan query
Files from the Historic American Buildings Survey: 303516

151.8% completed (estimate)
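The rule that post-upload categorization is skipped for files other editors have touched could be checked from each file's revision history. A minimal sketch, assuming the histories have been fetched as lists of revision dicts with a "user" key, as the MediaWiki API returns for prop=revisions with rvprop=user; the bot account name in the example is hypothetical.

```python
def safe_to_recategorize(revisions, bot_accounts):
    """Return True only if every revision was made by one of `bot_accounts`.

    Revisions whose user name is hidden (no "user" key) count as edits by
    someone else, so the file is left alone.
    """
    return all(rev.get("user") in bot_accounts for rev in revisions)


def files_to_recategorize(histories, bot_accounts):
    """Given {file title: revisions}, list the titles the bot may still edit."""
    return sorted(title for title, revisions in histories.items()
                  if safe_to_recategorize(revisions, bot_accounts))
```

For example, with bot_accounts={"ExampleBot"}, a file whose history also contains a revision by any other user is excluded from the post-upload run.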


Wellcome Images CC-BY

Example archive quality scan of a chromolithograph from the Wellcome Images collections (7,087×5,141 pixels)
  • Source to upload from:
I shall email the Images Team to see if an API is available. The standard web search does not seem to filter by licence.
  • Describe the works to be uploaded in detail (audio files, images by …):
Historic medical related photographs and illustrations.
  • Which license tag(s) should be applied?
CC-BY, possibly PD on a case by case or age basis.
Fae, I've noticed that your ~1300 image test run uses the CC-BY-SA-3.0 license, inconsistent with the CC-BY-2.0 claim that appears on the source pages for these lithographs (or the CC-BY-2.0-UK license mentioned in the announcement). What prompted you to use CC-BY-SA-3.0? —RP88 00:22, 19 February 2014 (UTC)
Oversight rather than design. I'm swapping these to CC-BY-2.0 and if a different interpretation comes out of our discussions later with the Wellcome, I'll apply that decision.
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
We probably should create a credit template in negotiation with Wellcome.
Category:Files from Wellcome Images holds current related uploads.
(talk) 11:46, 21 January 2014 (UTC)
{{Wellcome Images}} is obviously a related template. The Haz talk 05:24, 17 February 2014 (UTC)
I'm thinking that we should use {{Artwork}} instead of {{Information}} as this seems most appropriate. I've created Institution:Wellcome Collection. Considering the template contains probably every field we could desire, it might be the best template to use. The Haz talk 16:36, 18 February 2014 (UTC)
The test run of 1,300 lithographs uses the artwork template. These will probably be overwritten with better information when they go from low-res to high-res images, using information from the full library catalogue. Some of the 100,000 images may not be artworks; that is a bridge to be crossed when we come to it. -- (talk) 20:38, 18 February 2014 (UTC)


CC vs. PD

It's great to have these images available digitally, and I support the proposal to upload them by bot, but Wellcome are claiming copyright over, and to be the original source of, artworks and images from books which are already in the public domain. The assertion of copyright, and the right to attribution, should be rejected. They have added a strapline underneath each image; this will need to be removed. The process of downloading high-resolution versions of these public-domain works is tortuous, with a CAPTCHA, irrelevant terms and conditions, and zipped files. Andy Mabbett (talk) 13:27, 21 January 2014 (UTC)

For the relevant images, the T&Cs are very simple,[59] they just say that CC-BY applies and we must use "Wellcome Library, London" as an attribution, which is the normal sort of credit we would give anyway. If I had to (if it turns out we can get no API access), I could automatically trim the bottom strapline before upload; I already have a handy bit of Python that can do it, and the strapline is not a requirement in the T&Cs. Note that the full high-resolution version does not have a strapline.
(After a bit more testing) The download links are confusing. The first ("Download low-res images") lets you download the "web quality" version on display; the second ("Download hi-res images") leads you through a CAPTCHA process to a download link for a zip file. I would have difficulty automating the CAPTCHA process. The zipped full-quality download is of brilliant archive quality: my test example is >7,000px across and shows beautiful detail of every figure in the painting. We definitely must have them.
I will see if my email gets suitable results before testing much more, or considering how the workflow for batch upload could work.
Note: my first example manual upload (thumbnail above) is a good example of where {{PD-Old-70}} or an equivalent may not apply, and the best licence we could justify may well be the CC-BY one. The painting is catalogued as created in the 1920s, even though it depicts an event in 1911; further, I can find no date of death for the painter, and the question may be complicated by Chinese copyright law, which may apply. It is worth observing that the EXIF data includes their old conditions, giving the licence as "cc-by-nc"; this does not agree with the stated website terms. -- (talk) 15:04, 21 January 2014 (UTC)
Another bit of license confusion is that their announcement identifies the license for these images as CC BY 2.0 UK, while the terms identify it as the unported version of CC BY 2.0. With regard to the PD images, where appropriate, I think something like {{Licensed-PD-Art|PD-old-100|Cc-by-2.0-uk|attribution=Wellcome Library, London}} is a suitable compromise, and I see some uploads are already taking that approach. —RP88 18:01, 21 January 2014 (UTC)
The Ts&Cs don't just impose an attribution on us, but on all re-users. We shouldn't be echoing that. And surely, if the Chinese image is not PD, then WT have no right to apply CC-by, or assert copyright in any other manner? Andy Mabbett (talk) 23:05, 21 January 2014 (UTC)
Andy, you appear to be getting views on this in many channels right now. I would rather wait until I have an email back from the Wellcome to my first question, which may enable batch upload quite nicely, and this might then also give me a suitable single point of contact to discuss how best to interpret copyright licensing. As it happens, I raised the release of these images around 2 years ago with the Wellcome head of publishing, and I am relieved that we have got as far as allowing public reuse of the images, even if an individual assessment of which of the 100,000 historic images are truly PD, and which may raise concerns, has yet to be completed. One of the benefits of a release on Commons is that our community is interested in copyright and will tend to winkle out these issues, even for complex and changeable areas of international IP law. -- (talk) 09:53, 22 January 2014 (UTC)
Went ahead and created {{PD-Art-Wellcome Trust}}. Feel free to amend. Jean-Fred (talk) 09:43, 22 January 2014 (UTC)
Thanks for setting this up. The assumption of PD may not be valid in some cases; until we start some test runs and have a better sense of how much of an issue this is, it is probably not worth engineering the solution much further at this moment. -- (talk) 09:56, 22 January 2014 (UTC)
Sure. What’s nice (and dangerous) with the template is that we can easily tweak the licensing information later based on a finer understanding of their terms (like cc-by vs. cc-by-uk). :-) Jean-Fred (talk) 10:17, 22 January 2014 (UTC)

After handling a couple of these images, I believe that the batch upload project will need the support of a named contact within the Wellcome Library, or regular access for one or more Commons volunteers to research the background of some of the collection. A good example of a file likely to be contested is included on the right as a thumbnail. It may well be that this was donated to the Wellcome Library as part of a set of archives, but there is potential for this to be questioned due to the copyright mark naming Wojnarowicz as a member of Act Up (unfortunately David Wojnarowicz died in 1992; the poster features a photograph of Wojnarowicz as a boy). If the Wellcome Library does have a relevant letter of release or similar from an Act Up representative or agent, then this would be the basis of an OTRS ticket, or a public clarification in the description on Commons. Act Up would have produced this poster as part of their public knowledge mission, and I have no doubt that representatives of the organization would confirm this as public domain if approached. Should this need to be done, volunteers in Wikimedia LGBT can assist.

I am not sure at the moment how many of the 100,000 might be questioned; this would be a nice bit of analysis to do early in the project, so that suitable workflows deal with questions and there is confidence in how the collection is assessed before uploading to Commons. Due to the volume of work, this may even be an area for which we want to propose funding, to ensure it is done consistently and in a timely way. -- (talk) 13:50, 23 January 2014 (UTC)

With regard to ACT UP posters, I have sent off this email request for confirmation. -- (talk) 09:11, 24 February 2014 (UTC)

Excellent, Fae. It would be great if we can get permission to host these ACT UP posters under a CC-BY license. However, to be honest, I'd like to know whether or not ACT UP had already given these posters to Wellcome under terms that permit Wellcome to distribute them with a CC-BY license. I really hope so, since if they haven't, even if they are willing to do so now, this would cast a pall over Wellcome's generous release of their collection (it might indicate that we will have to scrutinize the validity of the CC-BY claim on individual images more closely, if Wellcome hasn't been careful when applying this license). —RP88 10:28, 24 February 2014 (UTC)
NC in the EXIF

More worrying is the EXIF data for the image to the right, which reads "Copyrighted work available under Creative Commons by-nc 2.0 UK" --AdmrBoltz 13:15, 27 January 2014 (UTC)

As noted previously, this appears out of date compared to the terms on the site. Using NC was their *old* policy. During a batch upload we could change the EXIF, however it is better to keep the digital file identical to the original. -- (talk) 13:22, 27 January 2014 (UTC)
Must have missed that above. While I normally would agree that keeping original EXIF data is good, this could lead to confusion if someone were to reuse the content out of Wikimedia. --AdmrBoltz 14:07, 27 January 2014 (UTC)
I have the skills to get Faebot to tweak the EXIFs with any agreed corrections, though I would suggest this only happens after the originals are uploaded so they appear in the file version history. I would look at this as part of the main upload project, once that gets under-way. -- (talk) 14:16, 27 January 2014 (UTC)
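Before any EXIF correction, the files still carrying the old NC wording have to be found. A minimal Python sketch of that check, assuming the EXIF copyright text for each file has already been extracted into a dictionary (reading EXIF from a JPEG needs a separate library, not shown). The function names are hypothetical and this is not Faebot's code.

```python
import re

# Hypothetical pattern: CC licence codes or names with a NonCommercial element,
# e.g. "by-nc 2.0 UK", "BY-NC-SA", "Attribution-NonCommercial".
_NC_PATTERN = re.compile(r"\bby-nc\b|\bnon-?commercial\b", re.IGNORECASE)


def flags_noncommercial(copyright_text: str) -> bool:
    """Return True if an embedded copyright string mentions an NC licence."""
    return bool(_NC_PATTERN.search(copyright_text or ""))


def files_needing_exif_review(copyrights: dict[str, str]) -> list[str]:
    """Given {filename: EXIF copyright text}, list (sorted) the files whose
    embedded licence conflicts with the site's current CC BY terms."""
    return sorted(name for name, text in copyrights.items()
                  if flags_noncommercial(text))
```

Listing the files first, rather than rewriting EXIF during upload, fits the plan above of uploading the originals unchanged so they stay in the file version history.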
Technical stuff

Just noting I've enlarged and transferred their logo to Commons - File:Wellcome Trust logo.svg Nick (talk) 16:09, 21 January 2014 (UTC)

Avoiding over-categorization

I am using the Keywords (when available) on the Wellcome Images library catalogue page as a starting list, then looking on Commons for existing categories of the same name. This does lead to matches with "diffusion" categories such as China or Hospitals. After Roland zh and then Mark Marathon raised this as an issue on my talk page, I created a housekeeping script that sniffs through all uploads, tests whether each category uses the {{Categorise}} template, and trims those categories off. I have not integrated this into the upload itself, because the upload is already 40% done (so I want to avoid monkeying around with it, for consistency), and running the trim a relatively short time after upload (possibly a few days) gives volunteers a moment to spot the images appearing in their watched categories and "diffuse" them by hand. On balance, this feels like a better option than not taking some value from all the available keywords. Note that the Wellcome keywords all stay in the description.

The script seems to process around 25,000 images per day, but it gets stuck and needs a kick, probably due to my home wifi; something like 1% or 2% of images are affected. As this is going to be a one-off fix, it will run only for this upload and once all uploads are complete. There may be some residual issues which I could trap, such as where diffusion categories have been added by volunteers rather than by me on upload; however, as this is a one-off, I'm not currently planning for it to get this smart for the likely small number of images that might be this sort of fringe case. -- (talk) 12:58, 16 October 2014 (UTC)
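The post-upload trim described above could look something like the following minimal Python sketch. It is not the actual housekeeping script: it assumes the file page's wikitext and a precomputed set of diffusion category names (categories whose pages carry {{Categorise}}) are already in hand; fetching both from Commons is not shown, and the function name is hypothetical.

```python
import re

# Matches a category link such as [[Category:Hospitals]] or
# [[Category:China|sort key]], plus one trailing newline; group 1 is the name.
_CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]\n?")


def trim_diffusion_categories(wikitext: str, diffusion: set[str]) -> str:
    """Remove category links whose target is in `diffusion`, keeping all
    other text (including the keyword list in the description) unchanged."""
    def keep_or_drop(match: re.Match) -> str:
        name = match.group(1).replace("_", " ")  # titles treat _ as a space
        return "" if name in diffusion else match.group(0)
    return _CATEGORY_LINK.sub(keep_or_drop, wikitext)
```

Only `[[Category:...]]` links are touched, so the keywords quoted in the description survive, as the comment above requires.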

I have integrated this test into the upload script. From today (i.e. ~25,000 more images) this will ensure that diffusion categories are not automatically applied just because they appear as a keyword on the Wellcome Images catalogue page. -- (talk) 16:26, 30 October 2014 (UTC)
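Once the test sits in the upload script, it amounts to filtering keywords before they become categories. A hypothetical sketch, assuming the set of existing category names and the set of diffusion categories have been looked up in advance; matching is exact, which glosses over the first-letter case insensitivity of Commons titles.

```python
def categories_from_keywords(
    keywords: list[str], existing: set[str], diffusion: set[str]
) -> list[str]:
    """Map catalogue keywords to categories at upload time.

    A keyword becomes a category only if a category of that name exists
    and is not a diffusion category. Order is preserved and duplicates
    are dropped. This only decides which category links are added; the
    keywords themselves remain in the description.
    """
    chosen: list[str] = []
    for keyword in keywords:
        name = keyword.strip()
        if name in existing and name not in diffusion and name not in chosen:
            chosen.append(name)
    return chosen
```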

Metadata and conventions

Metadata structures from Wellcome Images (WI)
name | data structure | conventions and notes
photo number | [A-Z]\d{7} | This number may be found in Wellcome catalogues as "photo no" or "image number", and may appear shorter where leading zeroes have been dropped. The number appears unique to the Wellcome Images collection, but identification numbers from other catalogues, such as the Wellcome Library reference number, may usefully be included as references.
source | "" + <photo number> + ".html" | An alternative of <photo number> will redirect to the same gallery page.
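A minimal Python sketch of these two conventions, assuming that a shortened photo number has only lost leading zeroes from its seven digits. The URL prefix of the source page is not given in this table, so `base_url` is a hypothetical caller-supplied parameter; the function names are illustrative.

```python
import re

# One capital letter followed by up to seven digits (leading zeroes may be lost).
_PHOTO_NUMBER = re.compile(r"([A-Z])(\d{1,7})")


def normalize_photo_number(raw: str) -> str:
    """Restore the full form (a letter plus seven digits) of a Wellcome
    Images photo number by re-padding dropped leading zeroes.

    Raises ValueError if the input cannot be such a number.
    """
    match = _PHOTO_NUMBER.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a Wellcome Images photo number: {raw!r}")
    letter, digits = match.groups()
    return letter + digits.zfill(7)


def source_url(base_url: str, photo_number: str) -> str:
    """Build <base_url> + <photo number> + ".html"; base_url is assumed."""
    return base_url + normalize_photo_number(photo_number) + ".html"
```

For example, a catalogue entry shortened to `L12345` would be normalized to `L0012345` before the source URL is built.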

(Draft!) Mapping these to Commons parameters:

  • filename = <safe version of WI short catalogue description TBD> + "Wellcome " + <WI photo number> + ".jpg"
  • source = <WI source>
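The draft mapping leaves the "safe version" of the description to be decided. One possible reading, sketched in Python under stated assumptions: characters assumed unsafe in Commons file names (MediaWiki's illegal title characters plus `:`, `/` and `\`) become spaces, the description is cut to an arbitrary 120 characters on a word boundary, and a single space separates it from "Wellcome ". All names here are hypothetical.

```python
import re

# Characters assumed unsafe in Commons file names.
_UNSAFE = re.compile(r"[#<>\[\]|{}:/\\]")


def safe_description(description: str, max_chars: int = 120) -> str:
    """Replace unsafe characters with spaces, collapse whitespace, and
    truncate to max_chars (an arbitrary limit) at a word boundary."""
    text = " ".join(_UNSAFE.sub(" ", description).split())
    if len(text) > max_chars:
        text = text[:max_chars].rsplit(" ", 1)[0]
    return text


def commons_filename(description: str, photo_number: str) -> str:
    """<safe description> + " Wellcome " + <photo number> + ".jpg"."""
    return f"{safe_description(description)} Wellcome {photo_number}.jpg"
```

Keeping the photo number in every filename means two images with identical short descriptions still get distinct titles.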


Assigned to, task | Progress | Bot name | Category
Fæ, to email Wellcome for information on the API or licence filtering. | Status: Done (meetings arranged; email sent) | |
Fæ, run single exemplar manually | Status: Done | |
Exemplar 1 - battle at the Ta-ping gate, 1911