Generating a Tag Cloud From Really Large Text Files
October 5, 2008 2:13 PM
I want to generate tag clouds for really large text files (the largest is 85 MB). No online service that I've found will support a file this big. Any suggestions?
I grabbed a copy of each of the websites of the four major national Canadian political parties. I want to generate tag clouds for each party, showing the terms they use most often on their sites.
I had a sysadmin friend of mine (I'm so not a programmer) use some regular expression magic to merge each site into a single file. Thus, I have four very large text files. I considered things like Wordle or Many Eyes, but they have a much smaller maximum file size than I need. What should I do?
One alternative: get my friend or somebody to produce a list of words for each site and the number of times each appears. Then I create a small file that repeats each of, say, the top 100 words in proportion to its count, and use Wordle on that. Any better suggestions? Or, you know, help?
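For whoever ends up doing the counting, here is a rough Python sketch of that alternative. It is only an illustration under some assumptions: the merged files are plain text (no leftover HTML markup), the function names and the tiny stop-word list are made up for this example, and the cap of 200 repeats is an arbitrary choice to keep the output small. It reads the file one line at a time, so an 85 MB file should not need to fit in memory all at once.

```python
import re
from collections import Counter

# Runs of letters, optionally with one inner apostrophe (e.g. "don't").
# [^\W\d_] matches any Unicode letter, so accented French words are kept.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")

# Hypothetical, deliberately tiny stop-word list; extend it as needed.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "that", "it"}


def count_words(lines, stop_words=STOP_WORDS):
    """Count lowercase words across an iterable of lines, skipping stop words."""
    counts = Counter()
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


def shrink_for_wordle(counts, top_n=100, max_repeats=200):
    """Return text that repeats each top word roughly in proportion to its count.

    The most frequent word is repeated at most max_repeats times, and every
    other word is scaled by the same factor (but appears at least once).
    """
    top = counts.most_common(top_n)
    if not top:
        return ""
    biggest = top[0][1]
    scale = min(1.0, max_repeats / biggest)
    rows = []
    for word, n in top:
        repeats = max(1, round(n * scale))
        rows.append(" ".join([word] * repeats))
    return "\n".join(rows)


def summarize_file(in_path, out_path, top_n=100):
    """Stream in_path line by line and write the shrunken text to out_path."""
    # errors="replace" keeps going if the scraped pages mix encodings.
    with open(in_path, encoding="utf-8", errors="replace") as src:
        counts = count_words(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(shrink_for_wordle(counts, top_n))
    return counts
```

Something like summarize_file("party_site.txt", "party_top100.txt") (both file names are placeholders) would then give a small file that could be pasted into Wordle. The stop-word list matters a lot here: without a fuller one, in both English and French, the cloud will mostly show words like "we" and "our".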
posted by dbarefoot to computers & internet (9 answers total)
posted by jesirose at 2:26 PM on October 5, 2008