Data-Driven Poe

Mabbott - Pollin - Savoye

The original edition of this material was printed directly from pages and type produced by a computer printer of the era. Each page was divided into two major columns, and the entries for each word were themselves essentially aligned in columns. As a result, it was extremely difficult to scan the text and convert it with optical character recognition (OCR). It was necessary to purchase an inexpensive copy of the book, which was regrettably destroyed in the process of scanning. Each page was cut out and carefully folded down the middle so that each major column could be scanned separately, on both sides of each sheet.

The task “first involved the laborious key-punching of the data on IBM cards, with the consequent handling of these crude and precarious data-records. The development of word-processing instruments using a console screen set before the typewriter-like keyboard, with data files on central and safely stored disks, has greatly quickened and facilitated the collection and correction of considerable quantities of data, especially if one is willing to work almost around the clock. ... But several conditions had to be met: availability of expensive computer machine time, of a programmer, of human ‘inputters’ who could type the data ‘onto’ the disk and, above all, of a publisher willing to issue a large and possibly expensive book. In June 1981 these conditions were miraculously fulfilled ...”

Even with each column scanned separately, the OCR program sometimes still got confused about what text belonged on a single line, and moved entries around in ways that were not always predictable. Consequently, the resulting text required a great deal of manual intervention, although this work was partially reduced by a series of WordPerfect macros that made a first pass at correcting some errors.