Commons:Bots/Requests/Smallbot 5

Smallbot (talk · contribs)

Operator: Smallman12q (talk · contributions)

Bot's tasks for which permission is being sought: To upload files of the w:Gerald R. Ford Presidential Library and Museum as part of Commons:Gerald R. Ford Presidential Library and Museum. Ongoing discussion can be found at User talk:Bdcousineau.

The bot will parse relevant links at http://www.fordlibrarymuseum.gov/ and upload the files. There are thousands of PDFs (an estimated 2 GB so far) and images.
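As an illustration of the link-parsing step only, here is a minimal Python sketch. It is not the bot's VBScript source, which has not been published yet. The names `PdfLinkParser` and `extract_pdf_links` are hypothetical, and the assumption that documents are linked through ordinary `<a href="….pdf">` anchors is a guess about the site's markup. The link text is kept because it is the only description available automatically.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects (absolute_url, link_text) pairs for anchors that point at PDFs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the PDF anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: PDF links can be recognised by their extension.
            if href and href.lower().endswith(".pdf"):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_pdf_links(html, base_url):
    """Return every PDF link found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Fetching the page and uploading each file are left out; the sketch covers only turning a downloaded page into a list of candidate files with their link text.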

Automatic or manually assisted: Automatic...some supervised

Edit type (e.g. Continuous, daily, one time run): One run

Maximum edit rate (eg edits per minute): 9 (however fast it uploads)

Bot flag requested: (Y/N): N

Programming language(s): VBScript (JavaScript, XMLHTTP, MSHTML, XMLDOM, COM). Source will be made available after cleanup, as the upload proceeds.

Smallman12q (talk) 22:14, 6 October 2012 (UTC)

Discussion

I've parsed one of the pages and placed the output here. Automatic descriptions are not available aside from the link text. Authors are also not automatically available. I've uploaded a sample file at File:Vice President and Romanian President Ceausescu (Background material only) (Gerald Ford Library) (002301000) .pdf. Naming, categorization, and template suggestions are welcome. Smallman12q (talk) 22:14, 6 October 2012 (UTC)

It would be a good idea to wrap part of the source field in a language template. Is it possible to extract something more specific than "Background material only", such as diplomatic letters? The space before the file extension should be removed. --EugeneZelenko (talk) 14:46, 7 October 2012 (UTC)
Initially, it was requested that PDFs be uploaded first. You can see some of the sample uploads listed at User:Bdcousineau/Sandbox3. The files use the {{GFPLM-image-full}} template and the {{Gerald R. Ford Presidential Library and Museum-cooperation}} tag. Most of the discussion is held at User talk:Bdcousineau. All comments are welcome. Smallman12q (talk) 23:24, 19 October 2012 (UTC)
A sample category has been completed at Category:Gerald R. Ford Presidential Library and Museum series: Issue Decision Papers for the President. Something similar would follow for the other thousand PDFs. Is this approved? Smallman12q (talk) 22:13, 22 October 2012 (UTC)
That looks good. Any idea what's wrong with this one: File:Gerald Ford Papers- Final Issues for Decision, Army Corps of Engineers- Puerto Rico - Compact of Permanent Union (1)(Gerald Ford Library)(1554455).pdf? --99of9 (talk) 23:09, 22 October 2012 (UTC)
It got corrupted. I've also filed a bug for it: bugzilla:41281. I also need Filemover rights to fix the files in Category:Gerald R. Ford Presidential Library and Museum series: Budget Review Decision Papers, which were affected by bugzilla:41190. Smallman12q (talk) 23:57, 22 October 2012 (UTC)
You already have the filemover right. Or do you mean for the bot? --99of9 (talk) 00:23, 23 October 2012 (UTC)
For the bot. Smallman12q (talk) 23:47, 23 October 2012 (UTC)
Granted filemover right. Feel free to test this out. --99of9 (talk) 23:10, 24 October 2012 (UTC)

Unless there are any further objections, I suggest we approve this for the PDFs. Perhaps it's best to run another trial when it comes to the images? --99of9 (talk) 23:09, 22 October 2012 (UTC)

That's fine. When we get to images, we'll hold a separate discussion. Smallman12q (talk) 23:57, 22 October 2012 (UTC)
Looks like the bot "eats" part of the file name. Maybe a separator other than / should be used for file names? --EugeneZelenko (talk) 14:49, 23 October 2012 (UTC)
Err... it's not the bot that ate part of the filename, it's the wiki (bugzilla:41190) =P. Anyhow, for future uploads with the bot, every "/" is replaced with "-". The bot will also check for other characters that can't be used in titles. Smallman12q (talk) 23:47, 23 October 2012 (UTC)
Is the last number the document ID in the Gerald Ford Library? If so, it would be a good idea to combine the library name and ID in one set of round brackets, something like (Gerald Ford Library, ###). --EugeneZelenko (talk) 14:28, 24 October 2012 (UTC)
The last number is the scan ID of the document. Is there anything wrong with the current separation between the (library) and (scan ID number)? It's how I've done my previous batch uploads without complaint. Smallman12q (talk) 22:09, 24 October 2012 (UTC)
Just too many round brackets at the end of the file names :-) --EugeneZelenko (talk) 13:56, 25 October 2012 (UTC)
I prefer to keep the (library)(id) pattern. So is the trial good to go? Smallman12q (talk) 00:17, 27 October 2012 (UTC)
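For reference, the naming rules discussed in this thread (replace "/" with "-", check for other characters that can't be used in titles, no space before the file extension, and the (library)(id) suffix) could be sketched in Python roughly as follows. This is an illustration, not the bot's VBScript code. The function name `build_file_title` and the exact set of extra characters are assumptions; only "/" is confirmed above.

```python
import re

# "/" comes from the discussion; the remaining characters are an assumed set
# of characters that are problematic in wiki file titles.
ILLEGAL_CHARS = re.compile(r'[/\\:#<>\[\]|{}]')


def build_file_title(link_text, scan_id,
                     library="Gerald Ford Library", extension="pdf"):
    """Build an upload title following the (library)(id) pattern."""
    # Replace every problematic character with "-".
    cleaned = ILLEGAL_CHARS.sub("-", link_text)
    # Collapse repeated whitespace and trim the ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # No space before the extension, as requested earlier on this page.
    return f"{cleaned}({library})({scan_id}).{extension}"


print(build_file_title("Army Corps of Engineers/Puerto Rico (1)", "1554455"))
```

Replacing rather than dropping the characters keeps the title readable and avoids relying on the wiki to handle them during upload.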

If there are no objections, I think the task should be approved. --EugeneZelenko (talk) 14:52, 30 October 2012 (UTC)