Extracting Data from Myspace and creating a date sorted list of gigs.
February 17, 2008 10:14 AM
How could I extract and combine the data from about 40 gig pages on Myspace (like this and this) and end up with a date-sorted list of all of the data?
Would it be easy or quick to do this once a week? The more automated this can be, the better. I don't really want an RSS feed, just a list like the one below that can be generated when I need it.
1/01/08: The Beatles: The Venue, London
1/01/08: The Verve: La Venue, Paris
2/01/08: The Beatles: The Venue, Manchester
2/01/08: The Rolling Stones: The Venue, York
2/01/08: The Beatles: The Venue, Skegness
4/01/08: The Kinks: The Venue, York
posted by takeyourmedicine to computers & internet (9 comments total)
import urllib2, re

# bandnames.txt holds one band name per line
bandnames = open("bandnames.txt", "r").readlines()
baseurl = 'http://collect.myspace.com/index.cfm?fuseaction=bandprofile.listAllShows&friendid=18786133&n='
output_file = open('outputdata.txt', 'w')
for bandname in bandnames:
    # drop the trailing newline and join the words of the name with "+"
    urlend = "+".join(bandname.split())
    url = baseurl + urlend
    resp = urllib2.urlopen(url)
    html_code = resp.read()
    ### comment: you would have to design regular expressions (string patterns) to extract the data you are looking for.
    ### Do a Google search for "python regular expressions" and learn how to extract dates and other strings.
    # findall returns every match on the page, so write one per line (nothing is written if there is no match)
    for occurrence in re.findall(r'someregularexpression', html_code):
        output_file.write(occurrence + '\n')
output_file.close()
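To give an idea of what the regular expression step might look like, here is a rough sketch. The HTML in sample_html is made up for illustration (the real Myspace markup will differ, so you would need to view the page source and adjust), and gig_pattern and extract_gigs are hypothetical names, not part of any library.

```python
import re

# Hypothetical markup: the real Myspace page layout is an unknown here,
# so this sample and the pattern below are assumptions for illustration.
sample_html = """
<tr><td class="date">1/01/08</td><td class="venue">The Venue, London</td></tr>
<tr><td class="date">2/01/08</td><td class="venue">The Venue, Manchester</td></tr>
"""

# Two capture groups: the date (d/mm/yy) and the venue text.
gig_pattern = re.compile(
    r'<td class="date">(\d{1,2}/\d{2}/\d{2})</td>'
    r'<td class="venue">([^<]+)</td>'
)

def extract_gigs(html_code, bandname):
    """Return one 'date: band: venue' line per gig found in html_code."""
    # With two groups, findall yields (date, venue) tuples.
    return ["%s: %s: %s" % (date, bandname, venue.strip())
            for date, venue in gig_pattern.findall(html_code)]

gig_lines = extract_gigs(sample_html, "The Beatles")
for line in gig_lines:
    print(line)
```

In the loop above, you could call something like extract_gigs(html_code, bandname.strip()) and write each returned line to output_file in place of the raw matches.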
More documentation at the following links, including password authentication, etc:
http://therning.org/magnus/archives/270
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/391929
Hope that helps. Let me know if you have any questions. If you want to get more elaborate and store the data in XML or SQL, let me know and I can dig up some code that does that.
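For the date-sorted list itself, a minimal sketch, assuming each line looks like "date: band: venue" as in the question and that the dates are day/month/year (the question does not say which, so adjust the format string if they are month/day). gig_date is a hypothetical helper name. In practice the lines would be read from outputdata.txt rather than typed in.

```python
from datetime import datetime

def gig_date(line):
    """Parse the leading d/mm/yy date of a 'date: band: venue' line."""
    date_text = line.split(":", 1)[0].strip()
    # Assumption: day/month/two-digit year; use "%m/%d/%y" for month-first dates.
    return datetime.strptime(date_text, "%d/%m/%y")

# Stand-in for the lines collected from all the gig pages.
lines = [
    "4/01/08: The Kinks: The Venue, York",
    "1/01/08: The Beatles: The Venue, London",
    "2/01/08: The Rolling Stones: The Venue, York",
    "1/01/08: The Verve: La Venue, Paris",
]

# sorted() is stable, so gigs on the same date keep their original order.
sorted_lines = sorted(lines, key=gig_date)
for line in sorted_lines:
    print(line)
```

Sorting on the parsed date rather than the raw text matters because, as plain strings, "10/01/08" would sort before "2/01/08".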
posted by lunchbox at 10:30 AM on February 17, 2008