Please help me create a spreadsheet from a list of links.
November 20, 2012 2:43 PM
I want to create a spreadsheet from the hyperlinks and words in a word list on Wiktionary. Please take me through the steps. Thanks!
The Wiktionary list is the 5,000 most frequent words of a non-English language. Most of the words in the list are hyperlinks, leading to individual pages that contain information about each word (part of speech, etymology, English gloss, etc.). I'd like to scrape the data from each link so that I end up with a CSV file that has columns for all the available information on each of the 5,000 words (and I'll fill in the rest manually). How do I go about this? Thanks again!
posted by iamkimiam to computers & internet (8 answers total) 2 users marked this as a favorite
Seems pretty simple to get the words. Copy and paste the list into a spreadsheet and do text to columns, splitting on spaces.
As far as scraping the dictionary definitions, you're going to want a programmer for that. It's non-trivial unless there's an API, and it doesn't look like there is.
posted by empath at 2:51 PM on November 20, 2012
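For the first half of the job (getting each word and its link into a spreadsheet), here is a rough Python sketch using only the standard library. It assumes you have saved the list page's HTML yourself and that the word links look like `/wiki/word`; the names `WordLinkParser` and `links_to_csv`, the base URL default, and the sample HTML are all made up for illustration. It does not attempt the per-word pages, since their layout varies too much to guess at here.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class WordLinkParser(HTMLParser):
    """Collect (word, url) pairs from <a href="/wiki/..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Assumption: entry links start with /wiki/ and have no colon;
            # links such as /wiki/Special:Search are skipped.
            if href.startswith("/wiki/") and ":" not in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            word = "".join(self._text).strip()
            if word:
                self.rows.append((word, urljoin(self.base_url, self._href)))
            self._href = None


def links_to_csv(html, base_url="https://en.wiktionary.org"):
    """Return CSV text with one 'word,url' row per entry link in the HTML."""
    parser = WordLinkParser(base_url)
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["word", "url"])
    writer.writerows(parser.rows)
    return out.getvalue()


# Made-up sample standing in for the saved list page.
sample = (
    '<ol><li><a href="/wiki/casa">casa</a></li>'
    '<li><a href="/wiki/tiempo">tiempo</a></li>'
    '<li><a href="/wiki/Special:Search">search</a></li></ol>'
)
print(links_to_csv(sample))
```

Save the printed text as a `.csv` file and it opens in any spreadsheet program, with the words in one column and their links in the next. The remaining columns (part of speech, etymology, gloss) would still have to come from the individual pages.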