jewish-music
Re: Recent News--not exactly music
- From: Ari Davidow <ari...>
- Subject: Re: Recent News--not exactly music
- Date: Wed 09 Dec 1998 19.10 (GMT)
Joel,
>I think your correspondent is mistaken about the technical aspects of this
>important project. My guess is that they are scanning the books and
>capturing "images" of the individual pages but NOT using character
>recognition to input the contents into a database or machine searchable
>corpus (the latter is a MUCH bigger task.)
The correspondent is correct, although the project is more complex
than described (as one might figure).
The project is twofold. One part is to capture the page images; at the
same time, the goal is to use Adobe Acrobat's OCR capabilities to turn
the text into a searchable database, so the project is as originally
described. At least, to the best of my memory, that is what the NYBC
director has described to me. I'll know more in a couple of weeks,
when I next get to visit the center, I hope. (I'm not involved in the
project, just curious as a Hebrew typographer and information systems
person.)
Note that if the Acrobat tools are used, the OCR need not be done
immediately; it could be deferred until appropriate parsing tools are
available.
Hebrew faces, especially Frank Ruehl and the even worse faces of the
last century, are nightmares to scan because the letterforms are so
similar overall. This is less true of a good, beautiful, Sephardic-based
face such as Ada, of scripts such as Rashi, or of the modern Israeli
faces, which include Ada but also Friedlander's Hadassah and Tzvi
Narkis' Narkissim.
ari
Ari Davidow
The klezmer shack: http://www.well.com/user/ari/klez/
owner: jewish-music mailing list
e-mail: ari (at) ivritype(dot)com