Saturday, November 10, 2012

11,000 letters

My current extracurricular work for YB involves a monumental task: I'm retyping, letter by letter, a 15-page answer key that was originally in PDF form and which must now be converted to a manipulable MS Word document. Each page of this answer key is for a different set of books. Book topics range from drawing conclusions to finding the main idea, making comparisons, drawing inferences, etc. All the questions are multiple-choice, so the answer key takes the form of an immense grid: 25 columns across and about 30-40 rows tall.

Thinking I was being clever, I had originally tried to convert the PDF version of the answer key to a Word document through Adobe Acrobat Pro's "export to Word" function, but the result was a horrifyingly illegible mess. I briefly contemplated using Excel for the conversion, but decided that I'd be wasting too much time controlling and adjusting column widths. So-- MS Word it was... and is.

Filling in the grids for a single page (remember, we're talking about 750-1000 letters per page, here) takes the better part of an hour, and is extremely tedious, tiresome, Sisyphean work. I admit I take frequent breaks; it's the sort of data-entry work that I dread and loathe, and if I work in anything longer than one-page bursts, I'll be sure to go insane. Constructing the tables into which I'm keying the multiple-choice answers was a far more interesting task than the key-in itself, and I've still got over half the document to go. All 15 pages together come to about 11,000 letters. Pray for the retention of my sanity.



The Maximum Leader said...

Some sort of OCR conversion was not available?

Kevin Kim said...

Steve asked me the same question via email. I'm not sure that OCR conversion is reliable, and then there's the perilous "convert text to table" step, where everything goes haywire (as happened when I used the "export to Word" command). It's not enough for the software to recognize the letters-- it's also got to stack those letters reliably into tables.

I've got only four pages to go, but if you've got an OCR software recommendation (Adobe Acrobat Pro can convert PDF text to OCR-ready letters), I'm all ears. Not sure what that'll change, though; the PDF text in this document is already selectable (if not editable).
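Since the PDF text is already selectable, one way around the "convert text to table" problem might be to copy a page's letters out as plain text and regroup them by script. What follows is only a rough sketch, not something tested against the actual answer key. It assumes the pasted text contains nothing but answer letters and whitespace (no headings or question numbers) and that every row is 25 letters wide, as in the grid described above. The function names are made up for illustration.

```python
COLUMNS = 25  # grid width mentioned in the post; adjust if a page differs


def letters_to_rows(raw_text, columns=COLUMNS):
    """Group the letters in raw_text into rows of `columns` letters each.

    Assumption: raw_text holds only answer letters plus whitespace,
    so every alphabetic character is treated as one answer.
    """
    letters = [ch.upper() for ch in raw_text if ch.isalpha()]
    return [letters[i:i + columns] for i in range(0, len(letters), columns)]


def to_tab_delimited(rows):
    """Join each row with tabs and rows with newlines."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    # Tiny made-up sample with a 4-column grid, just to show the shape.
    sample = "a b c d\nb c a d\nc a"
    print(to_tab_delimited(letters_to_rows(sample, columns=4)))
```

The tab-delimited output can be pasted into Word and turned into a table with "Convert Text to Table," using tabs as the separator. A short final row is a hint that a letter was dropped or that stray text crept in, so each page would still need a check against the original.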