Please help me make many documents much shorter.
July 29, 2009 11:43 AM

How can I access the "AutoSummarize" feature of Microsoft Word programmatically to summarize many documents at once? Alternatively, what other document summarization programs can be driven programmatically? If this isn't the right place to ask, where is?

I am working on an academic research project. I have several collections of plain text documents written in English. For each document, I would like to produce a series of increasingly shorter summaries using the "AutoSummarize" feature of Microsoft Word or some other program with similar functionality. I would like each summary saved under a file name that indicates the base file name and the size of the summary. How would you suggest I go about this?

In essence, here is my desired algorithm:

// Get all files to work over
Foreach file (read all files in directory)
    // Get all the summary sizes as a percent of original size
    Foreach summary_size_percent(95 90 85 ... 25 20)
        // Do the summarization and save the file
        Autosummarize file to summary_size_percent and
        save as file_summary_percent.txt
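
A rough Python sketch of the loop above, in case it helps make the question concrete. It does not call Word. The `summarize` function is a hypothetical stand-in (a naive word-frequency sentence picker) that would be swapped for whatever real summarizer turns out to be scriptable. The function names, the `*.txt` glob, and the separate output directory are all assumptions for illustration.

```python
import re
from collections import Counter
from pathlib import Path


def split_sentences(text):
    """Very rough splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize(text, percent):
    """Stand-in extractive summarizer (NOT Word's AutoSummarize).

    Greedily keeps the highest-scoring sentences that fit in roughly
    `percent` percent of the original character count, then returns
    them in their original order. At least one sentence is always kept.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    budget = len(text) * percent / 100
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + len(sentences[i]) <= budget or not chosen:
            chosen.append(i)
            used += len(sentences[i])
    return " ".join(sentences[i] for i in sorted(chosen))


def summarize_directory(in_dir, out_dir, percents=range(95, 15, -5)):
    """Write <base>_summary_<percent>.txt for every .txt file in in_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(in_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for pct in percents:  # 95, 90, ..., 20
            target = out / f"{path.stem}_summary_{pct}.txt"
            target.write_text(summarize(text, pct), encoding="utf-8")
            written.append(target)
    return written
```

Calling `summarize_directory("corpus", "summaries")` would write files such as `summaries/report_summary_20.txt`. Keeping the output directory separate from the input directory avoids re-summarizing earlier summaries on a second run.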

I have access to any common operating system and multiple versions of Word (Windows and Mac), as well as Pages for the Mac, though Pages seems less able to specify an exact percent for its summary feature. I have some money I can use to buy almost any other document summarization software, or if there is freeware I can give that a swing too. I can buy and install scripting software if needed.

If there turn out to be multiple ways to do this, so much the better, especially if they produce different summaries. I am mostly a Unix programmer, so I am pretty familiar with Perl and Java and C on a Unix console, but I haven't done any Windows or Mac scripting.

If you know of a better place to ask this, where would it be?
posted by procrastination to Computers & Internet (1 answer total)
 
Best answer: There's a lot of research on text summarization methods. I only have a passing interest in them and don't know much, but I looked at Delicious bookmarks for "summarization" and found a few projects that might be helpful: IBM Many Aspects Document Summarization Tool, Open Text Summarizer, MEAD, etc. There's also a summarization service built into Mac OS X.
posted by dreamyshade at 3:34 PM on July 29, 2009

