Generating a Tag Cloud From Really Large Text Files
October 5, 2008 2:13 PM
I want to generate tag clouds for really large text files (the largest is 85 MB). No online service that I've found will support a file this big. Any suggestions?
posted by dbarefoot to computers & internet (9 answers total)
I grabbed a copy of the website of each of the four major national Canadian political parties. I want to generate a tag cloud for each party, showing the terms they use most often on their sites.
I had a sysadmin friend of mine (I'm so not a programmer) use some regular expression magic to merge each site into a single file, so I now have four very large text files. I considered tools like Wordle or Many Eyes, but their maximum input size is much smaller than I need. What should I do?
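We don't know exactly what the friend's regex did. As a rough sketch, assuming each site was saved as a folder of .htm/.html files, merging one site into a single plain-text file might look like this in Python. The folder layout, the function names `strip_tags` and `merge_site`, and the tag-stripping patterns are all assumptions, and regular expressions are only a crude substitute for a real HTML parser.

```python
import html
import re
from pathlib import Path

# Crude patterns (an assumption, not a real parser): first drop whole
# <script>/<style> blocks, then blank out any remaining tag.
SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def strip_tags(page_html: str) -> str:
    """Return roughly the visible text of an HTML page."""
    text = SCRIPT_STYLE.sub(" ", page_html)
    text = TAG.sub(" ", text)
    # Turn entities such as &amp; back into ordinary characters.
    return html.unescape(text)


def merge_site(site_dir: str, out_path: str) -> int:
    """Strip tags from every .htm/.html file under site_dir and write the
    text of all pages into one file. Returns the number of pages merged."""
    pages = sorted(
        p for p in Path(site_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in (".htm", ".html")
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for page in pages:
            raw = page.read_text(encoding="utf-8", errors="replace")
            out.write(strip_tags(raw))
            out.write("\n")
    return len(pages)
```

For example, `merge_site("ndp_site", "ndp.txt")` (hypothetical folder and file names) would produce one text file for that party's site.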
One alternative: get my friend or somebody else to produce, for each site, a list of words and the number of times each one appears. Then I'd create a small file containing the right number of copies of each of, say, the top 100 words, and run that through Wordle. Any better suggestions? Or, you know, help?
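The word-counting alternative in the last paragraph can be done locally, with no size limit beyond disk space, because the file is read one line at a time and only the distinct words and their counts are kept in memory. Below is a minimal Python sketch of that plan. The stop-word list, the file names, and the helper names `count_words`, `scaled_sample` and `write_sample` are illustrative assumptions, not part of Wordle or any other tool mentioned above.

```python
import re
from collections import Counter

# Lowercase runs of letters and apostrophes count as words (an assumption).
WORD = re.compile(r"[a-z']+")

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {
    "the", "and", "of", "to", "a", "in", "for", "is", "on", "that",
    "with", "be", "are", "as", "by", "it", "this", "we", "our", "will",
}


def count_words(path: str) -> Counter:
    """Count words line by line, so an 85 MB file never has to fit in memory."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for word in WORD.findall(line.lower()):
                word = word.strip("'")
                if word and word not in STOP_WORDS:
                    counts[word] += 1
    return counts


def scaled_sample(counts: Counter, top_n: int = 100, max_repeats: int = 100) -> list:
    """Keep the top_n words and scale their counts so the most common word
    appears max_repeats times; every kept word appears at least once."""
    top = counts.most_common(top_n)
    if not top:
        return []
    biggest = top[0][1]
    words = []
    for word, n in top:
        repeats = max(1, round(n * max_repeats / biggest))
        words.extend([word] * repeats)
    return words


def write_sample(words: list, out_path: str) -> None:
    """Write the repeated words as one space-separated line of text."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(" ".join(words))
```

With hypothetical file names, `write_sample(scaled_sample(count_words("ndp.txt")), "ndp_top100.txt")` would write a short file in which each of the top 100 words appears in proportion to its real frequency (the most common one 100 times), small enough to paste into Wordle. Scaling the counts, rather than copying them raw, is what keeps the file tiny while preserving the relative word sizes in the cloud.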