Mail Archive sponsored by Chazzanut Online

jewish-music


Re: Recent News--not exactly music



Joel,

>I think your correspondent is mistaken about the technical aspects of this
>important project. My guess is that they are scanning the books and
>capturing "images" of the individual pages but NOT using character
>recognition to input the contents into a database or machine searchable
>corpus (the latter is a MUCH bigger task.)

The correspondent is correct, although the project is more complex
than described (as figures).

The project is two-fold. One part is to capture the images; at the 
same time, the goal is to use Adobe Acrobat's OCR capabilities
to turn the text into a searchable database, so it is as originally
described. At least, to the best of my memory, that is what the NYBC
director has described to me. I'll know more in a couple of weeks, 
when I next get to visit the center, I hope. (I'm not involved in the
project, just curious as a Hebrew typographer and information systems 
person.)

Note that if using the Acrobat tools, the OCR need not be done immediately,
and could be deferred until appropriate parsing tools were available. 
Hebrew, especially Hebrew printed in Frank Ruehl and the even worse 
faces of the last century (as opposed to a good, beautiful, 
Sephardic-based face such as Ada, scripts such as Rashi, or the 
modern Israeli faces, including Ada but also Friedlander's Hadassah
or Tzvi Narkis' Narkissim), is a nightmare to scan due to the overall
similarities between the letterforms.
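
The idea of capturing images now and deferring the OCR can be illustrated
with a minimal sketch in Python. Everything in it is an assumption made for
illustration: PageRecord, run_deferred_ocr, search, the file names, and the
placeholder recognizer are hypothetical, and none of it describes how Acrobat
or the NYBC project actually works.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PageRecord:
    """One scanned page: the image is kept now, the text may arrive later."""
    image_path: str              # hypothetical path to the stored page image
    text: Optional[str] = None   # stays None until an OCR pass fills it in


def run_deferred_ocr(pages, recognize: Callable[[str], str]) -> int:
    """Fill in text for pages that have none yet; return how many were processed."""
    processed = 0
    for page in pages:
        if page.text is None:
            page.text = recognize(page.image_path)
            processed += 1
    return processed


def search(pages, term):
    """Return image paths of pages whose recognized text contains the term."""
    return [p.image_path for p in pages if p.text is not None and term in p.text]


# Stage 1: capture images only; nothing is searchable yet.
archive = [PageRecord("page-001.tif"), PageRecord("page-002.tif")]


# Stage 2 (possibly much later): plug in whatever recognizer becomes available.
# This stand-in returns a fixed string and is not a real OCR engine.
def placeholder_recognizer(image_path: str) -> str:
    return "text recognized from " + image_path
```

Because the images are stored independently of the text, a better recognizer
for difficult Hebrew faces could be substituted later and run only on the
pages that still lack text.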

ari


Ari Davidow
The klezmer shack: http://www.well.com/user/ari/klez/
owner: jewish-music mailing list
e-mail: ari (at) ivritype(dot)com

