Please help me make many documents much shorter.
July 29, 2009 11:43 AM
How can I access the "AutoSummarize" feature of Microsoft Word programmatically to summarize many documents at once? Alternatively, what other document-summarization programs can be driven programmatically? If this isn't the right place to ask, where is?
I am working on an academic research project. I have several collections of plain text documents written in English. For each document, I would like to produce a series of progressively shorter summaries using the "AutoSummarize" feature of Microsoft Word or some other program with similar functionality. I would like each summary saved under a file name that indicates the base file name and the size of the summary. How would you suggest I go about this?
In essence, here is my desired algorithm:
// Get all files to work over
Foreach file (read all files in directory)
    // Get all the summary sizes as a percent of original size
    Foreach summary_size_percent (95 90 85 ... 25 20)
        // Do the summarization and save the file
        Autosummarize file to summary_size_percent and
        save as file_summary_percent.txt
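For illustration only, the loop above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not Word's AutoSummarize: the `summarize` function is a hypothetical stand-in that keeps the highest-scoring sentences by simple word frequency, and it could be swapped for a call into Word or any other summarizer. The names `split_sentences`, `summarize`, and `summarize_directory`, the `.txt` extension, and the separate output directory are all assumptions made for the example.

```python
import re
from collections import Counter
from pathlib import Path


def split_sentences(text):
    """Very rough sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, percent):
    """Stand-in extractive summarizer (assumption: NOT Word's AutoSummarize).

    Keeps the highest-scoring sentences, in their original order, while the
    kept sentences fit within `percent` percent of the original length.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average document-wide frequency of the words in this sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    budget = len(text) * percent / 100
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    kept, used = [], 0
    for i in ranked:
        if used + len(sentences[i]) <= budget:
            kept.append(i)
            used += len(sentences[i])
    if not kept:
        kept = [ranked[0]]  # always keep at least one sentence
    return " ".join(sentences[i] for i in sorted(kept))


def summarize_directory(in_dir, out_dir, percents=range(95, 15, -5)):
    """Write <base>_summary_<percent>.txt for every .txt file in in_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(in_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for percent in percents:  # 95, 90, ..., 25, 20
            target = out / f"{path.stem}_summary_{percent}.txt"
            target.write_text(summarize(text, percent), encoding="utf-8")
            written.append(target)
    return written
```

Calling `summarize_directory("docs", "summaries")` would write files such as `report_summary_95.txt` through `report_summary_20.txt` for a hypothetical `docs/report.txt`. Writing to a separate output directory keeps a second run from summarizing the summaries.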
I have access to any common operating system and multiple versions of Word (Windows and Mac), as well as Pages for the Mac, though Pages seems less able to specify an exact percent for its summary feature. I have some money I can use to buy almost any other document summarization software, and if there is freeware I can give that a try too. I can buy and install scripting software if needed.
If there turn out to be multiple ways to do this, so much the better, especially if they produce different summaries. I am mostly a Unix programmer, so I am pretty familiar with Perl and Java and C on a Unix console, but I haven't done any Windows or Mac scripting.
If you know of a better place to ask this, where would it be?