User talk:Dominic
Structured Data on Commons Newsletter - Fall 2018 edition
Welcome to the newsletter for Structured Data on Wikimedia Commons! You can update your subscription to the newsletter. Do inform others who you think will want to be involved in the project!
- Community updates
- Multilingual Captions, the first feature release for Structured Data, is coming in January of 2019
- Be on the lookout for the beta testing announcement
- Help using captions has been set up, if you'd like to go ahead and see the workflow
- Two IRC office hours were held since the last newsletter
- Our dedicated IRC channel: wikimedia-commons-sd webchat
- Help determine and propose properties on Wikidata for Commons
- Review designs for structured licensing and copyright
- Join the community focus group!
Since the last newsletter:
- Review a prototype for searching structured Commons (October 2018)
- "Good coverage" for depicts tagging (Sept. 2018)
- Review and discuss mockups for displaying the new metadata section of the file page (18 September - 9 October 2018)
- Depicts statements draft requirements (14 August - 31 August 2018)
- Identify Wikidata properties that Commons will need (26 June - 14 August 2018)
- Presentation by Keegan on the first features to be released for Structured Data, presented at Wikiconference North America, Columbus, Ohio, October 2018.
- Sandra presented a project update at the GLAM-Wiki conference in Tel Aviv, Israel, November 2018, as part of an update and panel discussion.
- Structured Data on Commons was the subject of a keynote presentation by Sandra (see slides) at the Baltic Audiovisual Archives Council conference in Tallinn, Estonia, November 2018.
- Partners and allies
- The info portal on Structured Commons now includes a section on GLAM (Galleries, Libraries, Archives and Museums).
- We are currently planning the first GLAM pilot projects that will use structured data on Wikimedia Commons. One project has already started: the Swedish Heritage Board is researching and developing a prototype tool to provide improved metadata (translations, data additions...) from Wikimedia Commons back to the source institution. Read the project brief.
- The documentation for batch uploads of files to Wikimedia Commons will be improved in 2019, as part of preparing for Structured Data on Wikimedia Commons. To prepare, the GLAM team at the Wikimedia Foundation wants to understand better which types of documentation you already use, and how you like to learn new GLAM-Wiki skills and knowledge. Fill in a short survey to provide input!
- Stay up to date!
- Follow the Structured Data on Commons project on Phabricator: https://phabricator.wikimedia.org/project/profile/34/
- Subscribe to this newsletter to receive it on a talk page of your own choice.
Message sent by MediaWiki message delivery - 17:58, 7 December 2018 (UTC)
Captions in January
My apologies if this is a duplicate message for you; it is being sent to multiple lists which you may be signed up for.
Hi all, following up on last month's announcement...
Multilingual file captions will be released this week, on either Wednesday, 9 January or Thursday, 10 January 2019. Captions are a feature to add short, translatable descriptions to files. Here are some links you might want to follow before the release, if you haven't already:
- Read over the help page for using captions - I wrote the page on mediawiki.org because captions are available for any MediaWiki user. Feel free to host/modify a copy of the page here on Commons.
- Test out using captions on Beta Commons.
- Leave feedback about the test on the captions test talk page, if you have anything you'd like to say prior to release.
Additionally, there will be an IRC office hour on Thursday, 10 January with the Structured Data team to talk about file captions, as well as anything else the community may be interested in. Date/time conversion, as well as a link to join, are on Meta. Thanks for your time, I look forward to seeing those who can make it to the IRC office hour on Thursday. -- Keegan (WMF) (talk) 21:09, 7 January 2019 (UTC)
File:Vietnam. Three Fighter Squadron 161 (VF-161) F-4D Phantom II fighter aircraft from the attack aircraft carrier USS Midway (CVA-41) and three Corsair II attack aircraft from the attack aircraft carrier USS America ((...) - NARA - 558517.gif
Hi, I don't know what your aim with this bot is, but most NARA files have been uploaded already in TIF-quality. I see no reason to upload the same files in low GIF-quality. I would be glad if you could either upload high-res files or at least exclude low quality duplicates of existing files. Thank you. Cobatfor (talk) 17:43, 11 January 2019 (UTC)
- @Cobatfor: I am aware that this is unfortunately an issue. I had access to a small number of high-resolution TIFFs several years ago and uploaded them all. This was maybe 100,000 files. There are currently over 50 million files in the NARA catalog, so it is not the case that most have been uploaded in TIFF already. The current bot is uploading directly from the catalog, unlike the TIFF originals that were stored on a drive. There may be a small number of duplicates resulting from this process, and I would like to clean that up eventually. It is difficult to exclude these beforehand, because Wikimedia Commons does not have structured data (can't easily query on the identifier field to detect if it exists), and it is not really possible to programmatically determine that a version of a file already exists on Commons with the bot we have. I will need to write a different script to flag any items with the same identifier in order to deduplicate. Also, it's hard to tell with your example, but for many of these, the GIF is not just a low-resolution version of the TIFF. The TIFF is the master scan file, while the GIF may have had color levels adjusted, been cropped, or other edits made prior to being made catalog-ready. I have actually had trouble getting them deleted in the past, because Commons admins will not speedy-delete a duplicate if there has been any editing done. I have been required to write a deletion request for each one, which costs me a lot more time, and makes it less of a priority for me. Dominic (talk) 15:24, 30 January 2019 (UTC)
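The deduplication step mentioned above (flagging items that share an identifier) could be sketched roughly as follows. This is only an illustration, not the bot's actual code: it assumes that bot-uploaded titles end in " - NARA - <identifier>.<extension>", as the file title in this thread does, and the function name and the sample titles are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: bot-uploaded titles end in " - NARA - <digits>.<extension>".
NARA_ID = re.compile(r" - NARA - (\d+)\.(\w+)$")


def flag_shared_identifiers(titles):
    """Group file titles by NARA identifier and keep only identifiers
    that appear on more than one file (hypothetical helper)."""
    by_id = defaultdict(list)
    for title in titles:
        match = NARA_ID.search(title)
        if match:
            by_id[match.group(1)].append(title)
    return {nara_id: files for nara_id, files in by_id.items() if len(files) > 1}


# Hypothetical sample titles, for illustration only.
titles = [
    "File:Example scan - NARA - 558517.tif",
    "File:Example scan - NARA - 558517.gif",
    "File:Another record - NARA - 123456.jpg",
]

for nara_id, files in flag_shared_identifiers(titles).items():
    print(nara_id, files)
```

A match on identifier only flags candidates. As noted above, the GIF may be an edited derivative of the TIFF rather than a plain duplicate, so each flagged group would still need human review.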
US National Archives bot down? and request
I notice the US National Archives bot hasn't uploaded anything since October, 2018. Has it been deactivated? Also, are there any plans to upload .jpg versions of the many .tiff master files? I know .tiff files are preferred for file fidelity, but .jpg are more convenient for displaying on Wikimedia sites. Also, may I request a bot-assisted upload of the NARA series Gerald R. Ford White House Photographs, compiled 08/09/1974 - 01/20/1977? The corresponding Commons category only has about 80 images, while the NARA collection appears to have over 1,000. Note that some previously uploaded files uploaded without complete bot-generated metadata, e.g. this one, are more difficult to categorize. Thanks! (Update, I just read your user page, and understand if you can't contribute or respond right now due to the ongoing Federal shutdown. All the best! Take care.)--Animalparty (talk) 00:07, 15 January 2019 (UTC)
- @Animalparty: As you guessed, I have been furloughed and I am catching up on my work inbox and other messages now. Thank you for your patience! The NARA upload bot is not really operating continuously; I operate it when I have the time and resources to do so, and sometimes when I do run it, I run into issues that I need to fix. I am trying to upload on a more regular basis, but it's been a while, between the holidays and the shutdown. I can certainly prioritize that series, and let you know when I get to it. Also, regarding the TIFFs and JPGs, I uploaded a large number of TIFFs early on in our project, because we had them stored on a hard drive in the office. For these, there should usually be a JPG version already, unless there were some that were missed. For all the rest of the uploads, and future ones going forward, all I will be able to upload are the files from the online catalog. There are a very few TIFFs, but most of these are JPGs (or some low-quality GIFs, unfortunately). If you have been working with NARA images, or plan to, I would love to hear more about what you're working on. Thanks! Dominic (talk) 15:00, 30 January 2019 (UTC)