<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Extracting Data from Myspace and creating a date sorted list of gigs. </title>
	<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs/</link>
	<description>Comments on Ask MetaFilter post Extracting Data from Myspace and creating a date sorted list of gigs.</description>
	<pubDate>Sun, 17 Feb 2008 10:30:16 -0800</pubDate>
	<lastBuildDate>Sun, 17 Feb 2008 10:30:16 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Extracting Data from Myspace and creating a date sorted list of gigs. </title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs</link>	
		<description>How could I extract and combine the data from about 40 gig pages on Myspace (like &lt;a href=&quot;http://collect.myspace.com/index.cfm?fuseaction=bandprofile.listAllShows&amp;friendid=78626127&amp;n=SCREAMING+TEA+PARTY&quot;&gt;this&lt;/a&gt; and &lt;a href=&quot;http://collect.myspace.com/index.cfm?fuseaction=bandprofile.listAllShows&amp;friendid=18786133&amp;n=Sleeping+States&quot;&gt;this&lt;/a&gt;) and end up with a date-sorted list of all of the data? &lt;br /&gt;&lt;br /&gt; Would it be easy or quick to do this once a week? The more automated this can be, the better. I don&apos;t really want an RSS feed, but rather a list like the one below that can be generated when I need it.&lt;br&gt;
&lt;br&gt;
1/01/08: The Beatles: The Venue, London&lt;br&gt;
1/01/08: The Verve: La Venue, Paris&lt;br&gt;
2/01/08: The Beatles:  The Venue, Manchester&lt;br&gt;
2/01/08: The Rolling Stones:  The Venue, York&lt;br&gt;
2/01/08: The Beatles:  The Venue, Skegness&lt;br&gt;
4/01/08: The Kinks:  The Venue, York</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2008:site.83853</guid>
		<pubDate>Sun, 17 Feb 2008 10:14:54 -0800</pubDate>
		<dc:creator>takeyourmedicine</dc:creator>
		
			<category>myspace</category>
		
			<category>gigs</category>
		
	</item> <item>
		<title>By: lunchbox</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241306</link>	
		<description>I have done stuff like this using Python many times. This should be easy, since it looks like you just have to automatically visit a list of easily designed URLs. Here&apos;s a rough outline; I haven&apos;t tested this. In the below code, bandnames.txt should be a file that contains a list of band names.&lt;br&gt;
&lt;br&gt;
import re&lt;br&gt;
import urllib.request&lt;br&gt;
bandnames = open(&quot;bandnames.txt&quot;).read().splitlines()&lt;br&gt;
baseurl = &apos;http://collect.myspace.com/index.cfm?fuseaction=bandprofile.listAllShows&amp;amp;friendid=18786133&amp;amp;n=&apos;&lt;br&gt;
output_file = open(&apos;outputdata.txt&apos;,&apos;w&apos;)&lt;br&gt;
for bandname in bandnames: #note: the following lines should be indented.&lt;br&gt;
    urlend = &quot;+&quot;.join(bandname.split()) # words of the band name joined with plus signs&lt;br&gt;
    url = baseurl + urlend&lt;br&gt;
    resp = urllib.request.urlopen(url)&lt;br&gt;
    html_code = resp.read().decode(&apos;utf-8&apos;, &apos;replace&apos;) # decode the bytes so the regex can search text&lt;br&gt;
    ### comment: you would have to design regular expressions (string patterns) to extract the data you are looking for. Do a Google search for &quot;python regular expressions&quot; and learn how to extract dates and other strings.&lt;br&gt;
    occurrence = re.findall(r&apos;someregularexpression&apos;, html_code)[0]&lt;br&gt;
    output_file.write(occurrence + &apos;\n&apos;)&lt;br&gt;
output_file.close()&lt;br&gt;
&lt;br&gt;
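The two example links in the question each carry their own friendid, while the baseurl above keeps a single fixed one. A minimal sketch of building one URL per band, assuming Python 3 and a hand-made list of (friendid, band name) pairs; shows_url and the bands list are illustrative names, not part of Myspace or the code above:&lt;br&gt;

```python
from urllib.parse import urlencode

BASE_URL = "http://collect.myspace.com/index.cfm"

# (friendid, band name) pairs copied from the two example links in the question
bands = [
    ("78626127", "SCREAMING TEA PARTY"),
    ("18786133", "Sleeping States"),
]


def shows_url(friend_id, band_name):
    """Build the listAllShows URL for one band (hypothetical helper)."""
    query = urlencode([
        ("fuseaction", "bandprofile.listAllShows"),
        ("friendid", friend_id),
        ("n", band_name),  # urlencode turns the spaces into plus signs
    ])
    return BASE_URL + "?" + query


urls = [shows_url(friend_id, name) for friend_id, name in bands]
for url in urls:
    print(url)
```
&lt;br&gt;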
More documentation at the following links, including password authentication, etc:&lt;br&gt;
http://therning.org/magnus/archives/270&lt;br&gt;
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/391929&lt;br&gt;
&lt;br&gt;
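For the date-sorted list the question asks for, here is a rough sketch of the sorting step. It assumes the gigs have already been extracted into lines shaped like the example list in the question, and that the dates are day/month/year; raw_gigs, parse_gig and sorted_gig_list are hypothetical names:&lt;br&gt;

```python
import re
from datetime import datetime

# Hypothetical sample of already-extracted gigs in "date: band: venue" form.
# Dates are assumed to be day/month/year.
raw_gigs = [
    "4/01/08: The Kinks: The Venue, York",
    "1/01/08: The Verve: La Venue, Paris",
    "2/01/08: The Rolling Stones: The Venue, York",
    "1/01/08: The Beatles: The Venue, London",
]

# Date, band and venue are separated by colons; the venue may contain commas.
GIG_PATTERN = re.compile(r"^\s*(\d{1,2}/\d{1,2}/\d{2}):\s*(.+?):\s*(.+?)\s*$")


def parse_gig(line):
    """Split one line into (datetime, band, venue); return None if it does not match."""
    match = GIG_PATTERN.match(line)
    if match is None:
        return None
    date_text, band, venue = match.groups()
    return datetime.strptime(date_text, "%d/%m/%y"), band, venue


def sorted_gig_list(lines):
    """Return the gigs as formatted strings, ordered by date and then band name."""
    gigs = [gig for gig in map(parse_gig, lines) if gig is not None]
    gigs.sort(key=lambda gig: (gig[0], gig[1]))
    return ["%s: %s: %s" % (date.strftime("%d/%m/%y"), band, venue)
            for date, band, venue in gigs]


for entry in sorted_gig_list(raw_gigs):
    print(entry)
```
&lt;br&gt;
Sorting on parsed dates rather than on the raw text keeps the order correct across months and years.&lt;br&gt;
&lt;br&gt;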
Hope that helps. Let me know if you have any questions. If you want to get more elaborate and store the data in XML or SQL, let me know and I can dig up some code that does that.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241306</guid>
		<pubDate>Sun, 17 Feb 2008 10:30:16 -0800</pubDate>
		<dc:creator>lunchbox</dc:creator>
	</item><item>
		<title>By: bertrandom</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241309</link>	
		<description>Play around with &lt;a href=&quot;http://www.dapper.net/&quot;&gt;Dapper&lt;/a&gt; to get the data you want, and if you need to reformat it, you can use the RSS feed from that and work it into &lt;a href=&quot;http://pipes.yahoo.com&quot;&gt;Yahoo Pipes&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
Or, write a script to screen scrape it and parse it using regular expressions.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241309</guid>
		<pubDate>Sun, 17 Feb 2008 10:31:48 -0800</pubDate>
		<dc:creator>bertrandom</dc:creator>
	</item><item>
		<title>By: lunchbox</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241311</link>	
		<description>I immediately noticed a few typos in my code. (e.g. the baseurl should have a quotation mark at the beginning.) But let&apos;s see what solutions other people come up with first.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241311</guid>
		<pubDate>Sun, 17 Feb 2008 10:33:04 -0800</pubDate>
		<dc:creator>lunchbox</dc:creator>
	</item><item>
		<title>By: petethered</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241374</link>	
		<description>&lt;a href=&quot;http://www.rareoak.com/myspace.php.txt&quot;&gt;Here is a piece of PHP code&lt;/a&gt; I found somewhere that I used to grab the dates from a buddy&apos;s Myspace page and embed them in his personal page.&lt;br&gt;
&lt;br&gt;
Fiddle a little with it and you have yourself a solution.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241374</guid>
		<pubDate>Sun, 17 Feb 2008 11:15:06 -0800</pubDate>
		<dc:creator>petethered</dc:creator>
	</item><item>
		<title>By: freshgroundpepper</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241549</link>	
		<description>Here&apos;s a Groovy script that I use to scrape stock quotes off of MoneyCentral. It should be pretty easy to repurpose it to do what you&apos;re looking for if you&apos;ve got a little programming background:&lt;br&gt;
&lt;br&gt;
#!/usr/local/groovy/bin/groovy&lt;br&gt;
// need to have the TagSoup jar in your classpath for this to work, it is better at parsing html that is malformed&lt;br&gt;
// see: http://ccil.org/%7Ecowan/XML/tagsoup/&lt;br&gt;
&lt;br&gt;
def symbols = [&quot;EEM&quot;, &quot;QQQQ&quot;, &quot;SPY&quot;, &quot;VFORX&quot;, &quot;VPU&quot;, &quot;VWO&quot;]&lt;br&gt;
&lt;br&gt;
def getQuotes(findSymbols = [&quot;AAPL&quot;]) {&lt;br&gt;
    def url = new URL(&quot;http://moneycentral.msn.com/detail/market_quote?symbol=${findSymbols.unique().sort().join(&apos;+&apos;)}&quot;)&lt;br&gt;
    def quotes = []&lt;br&gt;
    &lt;br&gt;
    url.withReader { reader -&amp;gt;&lt;br&gt;
        def html = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parse(reader)  &lt;br&gt;
		// crappy html on moneycentral, the data is in the only table of class &quot;t&quot;, and the only rows in that table we care about have 6 cells&lt;br&gt;
        def rows = html.&apos;**&apos;.grep { it.name() == &quot;tr&quot; &amp;amp;&amp;amp; it.children().size() == 6 &amp;amp;&amp;amp; it.parent().name() == &quot;table&quot; &amp;amp;&amp;amp; it.parent().@class == &quot;t&quot; }&lt;br&gt;
        def headers = rows[0].th.collect { it.text() }&lt;br&gt;
        rows[1..rows.size() - 1].each { row -&amp;gt;&lt;br&gt;
            def quote = [:]&lt;br&gt;
            headers.eachWithIndex { header, i -&amp;gt; quote[header] = row.td[i] }&lt;br&gt;
            quotes &amp;lt;&amp;lt; quote&lt;br&gt;
        }&lt;br&gt;
    }&lt;br&gt;
    return quotes&lt;br&gt;
}    &lt;br&gt;
&lt;br&gt;
getQuotes(symbols).each { println &quot;${it[&apos;Symbol&apos;]}\t${it[&apos;Last&apos;]}\t${it[&apos;Change&apos;]}&quot; }</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241549</guid>
		<pubDate>Sun, 17 Feb 2008 14:28:03 -0800</pubDate>
		<dc:creator>freshgroundpepper</dc:creator>
	</item><item>
		<title>By: freshgroundpepper</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241553</link>	
		<description>hmm...it didn&apos;t deal with a double &quot;less than&quot; sign well in the paste.&lt;br&gt;
&lt;br&gt;
Replace DOUBLE_LESS_THAN with the actual append symbol:&lt;br&gt;
&lt;br&gt;
headers.eachWithIndex { header, i -&amp;gt; quote[header] = row.td[i] }&lt;br&gt;
quotes DOUBLE_LESS_THAN quote &lt;br&gt;
}&lt;br&gt;
}</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241553</guid>
		<pubDate>Sun, 17 Feb 2008 14:31:54 -0800</pubDate>
		<dc:creator>freshgroundpepper</dc:creator>
	</item><item>
		<title>By: takeyourmedicine</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241587</link>	
		<description>I have no real programming knowledge, so I can&apos;t really make sense of this stuff, but Petethered&apos;s solution seems to come closest - but I&apos;m not sure how to fiddle with it to make it work - where to input the urls etc. Further or more explicit help on this would be useful!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241587</guid>
		<pubDate>Sun, 17 Feb 2008 14:59:58 -0800</pubDate>
		<dc:creator>takeyourmedicine</dc:creator>
	</item><item>
		<title>By: bprater</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241692</link>	
		<description>If you lack programming chops, you might hit up rentacoder.com and submit a project. Something like this could probably be done for less than $20.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241692</guid>
		<pubDate>Sun, 17 Feb 2008 16:36:34 -0800</pubDate>
		<dc:creator>bprater</dc:creator>
	</item><item>
		<title>By: waxpancake</title>
		<link>http://ask.metafilter.com/83853/Extracting-Data-from-Myspace-and-creating-a-date-sorted-list-of-gigs#1241699</link>	
		<description>Here&apos;s an actively maintained &lt;a href=&quot;http://davecouliershaveshisballs.com/myspace-gigs-parser/&quot;&gt;Myspace gigs parser&lt;/a&gt; written in PHP.  If you have no coding skills, use the web-based version of that with Yahoo! Pipes or Dapper, and you should be good to go.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.83853-1241699</guid>
		<pubDate>Sun, 17 Feb 2008 16:41:36 -0800</pubDate>
		<dc:creator>waxpancake</dc:creator>
	</item>
	</channel>
</rss>
