On Feb 19, 2008, at 6:04 AM, David Clark wrote:
> I'm looking for decent and cheap Concordance software for Mac that
> will create a full concordance of a large PDF. I have numerous
> documents created in a word processing program (probably Word for
> Windoze) and then converted to PDF that are quite large -- 30 or so
> pages -- and I'd like to create a full word list.
Sounds like a job for Perl! But first, you need the text as text.
One thing Adobe Reader has over Preview.app (at least as of 10.4) is
a Save As Text option. Sure, in Preview you can Select All, Copy,
then Paste into a text document, but Save As Text seems easier and I
think the output might be cleaner. If you want a concordance created
from multiple PDFs, you could probably merge them together using a
program like Combine PDFs, then use Save As Text.
http://www.monkeybreadsoftware.de/Freeware/CombinePDFs.shtml
Now for the Perl. Since text manipulation is its forte, I figured
this was a problem someone already solved. I searched for
"concordance" on perlmonks.org and got many hits. This post at the
end of a thread from 2001 seems to work nicely.
http://perlmonks.com/?node_id=104687
I literally selected and copied it from the web page, pasted it into
a text editor, saved it (by convention, Perl scripts end in .pl),
typed 'perl ' in Terminal, dragged the Perl script to the window, then
dragged a PDF's saved text to the window after it and hit Return. Out
came an alphabetically sorted list of words, each followed by the
numbers of the lines on which the word appears.
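For anyone curious what such a script does, here is a minimal sketch of the idea. It is not the PerlMonks script, just my own illustration; the sub names build_concordance and format_concordance are made up, and the rule for what counts as a word (runs of letters, with inner apostrophes allowed) is an assumption. Like the linked script, it splits input on Unix line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a concordance: each word maps to the list of line numbers it appears on.
# Takes a reference to an array of lines; returns a reference to a hash.
sub build_concordance {
    my ($lines) = @_;
    my %concord;
    my $line_no = 0;
    for my $line (@$lines) {
        $line_no++;
        my %seen;    # record each line number only once per word
        # Assumed word rule: letters, optionally joined by apostrophes (e.g. don't)
        for my $word (lc($line) =~ /[a-z]+(?:'[a-z]+)*/g) {
            next if $seen{$word}++;
            push @{ $concord{$word} }, $line_no;
        }
    }
    return \%concord;
}

# Format the concordance as alphabetically sorted "word n, n, n" lines.
sub format_concordance {
    my ($concord) = @_;
    return map { "$_ " . join(', ', @{ $concord->{$_} }) . "\n" }
           sort keys %$concord;
}

# Run only when text files are named on the command line.
if (@ARGV) {
    my @lines = <>;
    print format_concordance(build_concordance(\@lines));
}
```

Words are lowercased before counting, so "The" and "the" end up in the same entry.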
Actually, at first the line numbers didn't work correctly because
Adobe saves the file with Mac line endings (CR) and the script only
recognizes Unix line endings (LF). There are many options, but I opened
the text file in TextWrangler, changed the line endings from Mac
(CR) to Unix (LF) and saved (Windows (CRLF) would also work).
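One of those other options is to let Perl fix the line endings itself. The following is a rough sketch rather than anything from the PerlMonks thread; the sub name normalize_line_endings is my own invention, and it only handles the first file named on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert Mac (CR) and Windows (CRLF) line endings to Unix (LF).
sub normalize_line_endings {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # CRLF or a lone CR becomes a single LF
    return $text;
}

# Run only when a file is named on the command line: slurp, convert, print.
if (@ARGV) {
    local $/;                 # undefined record separator reads the whole file at once
    my $text = <>;
    print normalize_line_endings($text);
}
```

Saved under a name of your choosing, it would be run the same way as the concordance script, with the converted text going to standard output.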
If you didn't want the line numbers, you could change this line:
print ("$key ", join (', ', @{$concord{$key}}), "\n");
to this:
print ("$key\n");
When I wanted the concordance saved to a file, I did the same typing
and dragging, then added " > output.txt", which redirects the script's
output from standard output (the Terminal window) to a file. Unless
you changed directories in the Terminal window, "output.txt" will be
in your OS X account Home directory.