User talk:Danmichaelo

This talk page is automatically archived by ArchiveBot. Any sections older than 60 days are automatically archived. Sections without timestamps are not archived.

Crop tool & multipage files

Thanks, Dan, for your work! I pinged you at it:s:Wikisource:bar but I don't know if you'll get the ping. Here is my first run of CropTool on a djvu file: File:Piccolo Mondo Antico (Fogazzaro) (page 7 crop).jpg; it is a poor, but necessary, image. --Alex_brollo Talk|Contrib 14:40, 11 August 2016 (UTC)

Cool! Btw., is background removal something you think could be automated? Using ImageMagick, I would guess that a white fuzzy opaque fill would be the way to go, perhaps with a contrast-stretch to make the black more black first. But I'm not sure if the level of fuzziness would have to be varied from image to image. What is your current procedure for doing it? – Danmichaelo (δ) 20:25, 11 August 2016 (UTC)
Take a look, if you like, at my account, User:Alex brollo; you'll find some hundreds of uploads of images extracted from books in the last few weeks... I used a very similar convention for naming them... Well, your tool saves lots of time, since the most annoying work is compiling metadata, links, and categories. I usually work with XnView, and the needed steps are:
  1. crop the image (usually a TIFF produced by ddjvu.exe);
  2. rotate it a little bit when needed (quite often);
  3. convert it to grayscale;
  4. adjust its levels ("blanking" pale gray areas);
  5. convert it to .jpg or .png and upload it to Commons.
As you can see from my first uploads, I feel comfortable uploading the raw crop, then downloading it, editing it (with XnView), and re-uploading it. It's fast and effective. Sure, it would be great to work inside the tool on a canvas, but the tool is already great! Would you like some more suggestions coming from the many tests I'm going to perform? The first one is to add an input field to CropTool that accepts the page number. Wikisourcers often review many pages of the same book in the same editing session; the smartest ones will quickly discover that it's very simple to work on a different page of the same book just by editing the page number in the tool URL :-) but an input field will help. About Gadget-CropTool.js, I uploaded a slightly different version to it.source, i.e.:
  1. the script has been customized to work from nsPage only, and it gets the full page name from context;
  2. a target="_blank" attribute has been added to the link code, so that the tool opens in a different tab from the calling page. Here is the code: it.source CropTool.js gadget.
In brief: it.source (and other source projects, I guess) has been waiting for your tool... for years. Thanks again!--Alex_brollo Talk|Contrib 14:30, 12 August 2016 (UTC)
Great to hear! An input field for selecting page (or a dropdown like on Commons) should be really easy to add, and something I also had in mind, so that will definitely come. Feel free to let me know if you have other ideas for improvement. – Danmichaelo (δ) 18:32, 12 August 2016 (UTC)
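The level adjustment discussed in this thread (a contrast stretch to make the black more black, then "blanking" pale gray areas to white) can be sketched in a few lines. This is only an illustration in plain Python on a grid of 8-bit gray values; it is not what ImageMagick or XnView actually do internally. The function name and the default `black_point` and `white_point` values are assumptions, and the right thresholds would probably differ from scan to scan.

```python
def stretch_and_blank(pixels, black_point=40, white_point=200):
    """Linear levels adjustment on rows of 8-bit gray values (0-255).

    Values at or below black_point become 0 (pure black), values at or
    above white_point become 255 (pure white, i.e. "blanked" background),
    and everything in between is stretched linearly.
    The default thresholds are illustrative assumptions, not tuned values.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            if value <= black_point:
                new_row.append(0)
            elif value >= white_point:
                new_row.append(255)
            else:
                # Integer arithmetic keeps the result in the 0-255 range.
                new_row.append((value - black_point) * 255 // span)
        result.append(new_row)
    return result
```

For example, with the default thresholds a dark ink pixel of 30 becomes 0 and a pale paper pixel of 230 becomes 255, while mid-grays are spread over the full range.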

CropTool running at bn.wikisource

Just to let you know that bn.wikisource is interested in CropTool: bn:s:User talk:Alex brollo.

Obviously I can't read any Bengali word or character, but this is not such a hard problem when a user is sufficiently bold :-) --Alex_brollo Talk|Contrib 09:10, 17 August 2016 (UTC)

Cool, had to check the source to locate ক্রপটুল on the gadget page :) – Danmichaelo (δ) 20:27, 17 August 2016 (UTC)

A (challenging) proposal for CropTool enhancing

Do you know how the Internet Archive works with texts, how it is used as one of the best free djvu/pdf sources for Wikisource work, and the details of the rich set of derived files stored in every IA item? In brief, there is one high-definition image for every pdf/djvu page of an IA book, while the images inside both the djvu and the pdf are highly compressed; the high-definition images can be downloaded and cropped manually. But ask me for more only if you are really interested; it's a hard issue. --Alex_brollo Talk|Contrib 10:09, 18 August 2016 (UTC)

Let's discuss how the process could work. Would this be for djvu/pdf files that have already been transferred to Commons (I guess they have a link back to archive.org), or files that only live on archive.org? In the latter case, I guess a user would just paste the URL from archive.org into CropTool and work from there. Do the URLs from archive.org include page numbers? – Danmichaelo (δ) 11:58, 18 August 2016 (UTC)
Things are a little more complex: high-resolution page images are stored as jp2 files inside a zip. A dynamic request to the right IA server builds a jpg image from the jp2, then sends it. Here is an example of an IA URL that returns a high-resolution jpg:
As you can see, the needed URL is a call to a PHP script, not a static URL, and it is wrapped in a very complex, multi-server URL.  :-( --Alex_brollo Talk|Contrib 13:39, 18 August 2016 (UTC)
Given the IA item (LingenosoIdalgoDon_chisciotteDellaManciaVol.2), it seems like you can get the rest (server, dir) from the metadata api: https://archive.org/metadata/LingenosoIdalgoDon_chisciotteDellaManciaVol.2 . What confuses me about this file though is that the 64 MB pdf file is marked as "original", while the 843 MB jp2.zip is "derivative". Any idea why? – Danmichaelo (δ) 18:57, 18 August 2016 (UTC)
Yes. IA items are uploaded by contributors as pdf files or as zipped images; those uploads are the "original" files. Both are normalized and somehow deskewed by IA's powerful servers, producing the "derivative" _jp2 images; these are the source for all subsequent processing (OCR, pdf, djvu...). The djvu is currently still derived but not published; it is used to extract the text and _djvu.xml, the latter containing the "word mapping", i.e. the coordinates of words in the page image. I presume that the IA book viewer uses the _jp2 images plus _djvu.xml to highlight searched words. The _jp2 images are homologous to the images wrapped in the djvu or pdf files, so coordinates in the _jp2 images can be derived exactly from the jpg images coming from IA djvu/pdf files. --Alex_brollo Talk|Contrib 08:48, 19 August 2016 (UTC)
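The metadata lookup mentioned in this thread (getting server and dir for an item, and telling "original" files from "derivative" ones) could be sketched roughly as follows. This is a sketch under assumptions, not how CropTool does it: the top-level `server`, `dir` and `files` keys and the per-file `name` and `source` keys are taken to be present in the JSON returned by the metadata API, and the `sample` dictionary uses made-up placeholder values and hypothetical file names rather than the real response for this item.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def metadata_url(identifier):
    """Build the metadata API URL for an IA item identifier."""
    return "https://archive.org/metadata/" + quote(identifier)


def fetch_metadata(identifier):
    """Download and parse the item metadata (performs a network request)."""
    with urlopen(metadata_url(identifier)) as response:
        return json.load(response)


def summarize_item(meta):
    """Return server, dir, and file names grouped by their 'source' field."""
    files_by_source = {}
    for entry in meta.get("files", []):
        source = entry.get("source", "unknown")
        files_by_source.setdefault(source, []).append(entry["name"])
    return {
        "server": meta.get("server"),
        "dir": meta.get("dir"),
        "files_by_source": files_by_source,
    }


def find_jp2_zip(meta):
    """Return the name of the first file ending in '_jp2.zip', or None."""
    for entry in meta.get("files", []):
        if entry.get("name", "").endswith("_jp2.zip"):
            return entry["name"]
    return None


# Placeholder data for illustration only; not the real API response.
sample = {
    "server": "server.example.org",
    "dir": "/example/items/LingenosoIdalgoDon_chisciotteDellaManciaVol.2",
    "files": [
        {"name": "example_item.pdf", "source": "original"},
        {"name": "example_item_jp2.zip", "source": "derivative"},
    ],
}
```

Calling `summarize_item(fetch_metadata("LingenosoIdalgoDon_chisciotteDellaManciaVol.2"))` would show which files the item really lists as original and which as derivative.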

Another (easy) proposal for CropTool

Most Wikisource users of CropTool (two out of three, I presume, by now... ;-) did you get any more feedback?) feel the need for one more input field to add one or more category names (it would be great to use the Commonist convention, i.e., if I remember right, category names separated by a | character).

Perhaps the best solution would be a preview of the full description text that is going to be uploaded by CropTool, just as the IA upload bot does, allowing users to add or edit whatever they need.

PS: Thanks for the drop-down field for page numbers!--Alex_brollo Talk|Contrib 14:24, 19 August 2016 (UTC)
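The category field proposed above, with names separated by a | character, could be turned into wikitext for the description page along these lines. This is only a sketch: the function name is hypothetical, and stripping a leading "Category:" prefix and dropping duplicates are design assumptions, not behaviour confirmed for CropTool or Commonist.

```python
def categories_to_wikitext(raw):
    """Convert 'Name one|Name two' into one [[Category:...]] link per line.

    Empty entries and repeated names are skipped; an optional leading
    'Category:' prefix typed by the user is tolerated and removed.
    """
    prefix = "category:"
    seen = set()
    lines = []
    for name in raw.split("|"):
        name = name.strip()
        if name.lower().startswith(prefix):
            name = name[len(prefix):].strip()
        if not name or name in seen:
            continue
        seen.add(name)
        lines.append("[[Category:" + name + "]]")
    return "\n".join(lines)
```

The returned text could then be appended to the description shown in an editable preview, so users can still adjust it before uploading.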
