<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Help me dig into lexical analysis!</title>
	<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis/</link>
	<description>Comments on Ask MetaFilter post Help me dig into lexical analysis!</description>
	<pubDate>Wed, 30 Nov 2005 13:19:17 -0800</pubDate>
	<lastBuildDate>Wed, 30 Nov 2005 13:19:17 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Help me dig into lexical analysis!</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis</link>	
		<description>Lexical analysis!  What are some good resources for a beginner? &lt;br /&gt;&lt;br /&gt; I&apos;m focusing some word-nerdity on a secretive mad-scientist-flavored excursion into natural language analysis, and I &lt;b&gt;know&lt;/b&gt; that I don&apos;t know very much about it.  I&apos;d like to have more than self-invented gut-instinct ideas to work with.  What books, websites, essays, etc. will help me get up to speed on the subject?&lt;br&gt;
&lt;br&gt;
Specific interest in word-frequency analysis, but I&apos;m finding myself increasingly curious about the whole neighborhood of ideas.  I&apos;m frustrated by my inability to cover much ground with Google -- I don&apos;t know the words for what I don&apos;t know about!</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2005:site.28174</guid>
		<pubDate>Wed, 30 Nov 2005 13:10:28 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
		
			<category>language</category>
		
			<category>lexemes</category>
		
			<category>parsing</category>
		
			<category>tokenization</category>
		
			<category>analysis</category>
		
			<category>word-frequency</category>
		
			<category>haxaslegomena</category>
		
			<category>damned-lies</category>
		
			<category>statistics</category>
		
	</item> <item>
		<title>By: andrew cooke</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443321</link>	
		<description>&lt;a href=&quot;http://www.google.com/search?hs=S1D&amp;hl=en&amp;lr=&amp;safe=off&amp;c2coff=1&amp;client=firefox-a&amp;rls=org.mozilla%3Aen-US%3Aofficial&amp;q=%22parsing+is+the+process+of+structuring%22&amp;btnG=Search&quot;&gt;this &lt;/a&gt; is an oldie but goodie computing text that i really like.  they include a bit of background (chomsky hierarchy etc) before getting into the details.&lt;br&gt;
&lt;br&gt;
if you want to write something by hand, look for information on &quot;recursive descent&quot; parsers.  i find them by far the easiest to write and understand.&lt;br&gt;
&lt;br&gt;
this is computing rather than natural-language related, because that&apos;s what i know.  hope it helps.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443321</guid>
		<pubDate>Wed, 30 Nov 2005 13:19:17 -0800</pubDate>
		<dc:creator>andrew cooke</dc:creator>
	</item><item>
		<title>By: andrew cooke</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443324</link>	
		<description>oops.  not all those links find the book (the first one doesn&apos;t).  &lt;a href=&quot;http://www.freeinfosociety.com/pdfs/computers/parsingtechniques.pdf&quot;&gt;this&lt;/a&gt; is the second link, and what i intended.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443324</guid>
		<pubDate>Wed, 30 Nov 2005 13:21:53 -0800</pubDate>
		<dc:creator>andrew cooke</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443339</link>	
		<description>Computational linguistics pointers are definitely helpful -- I am indeed writing my own parser as part of this.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443339</guid>
		<pubDate>Wed, 30 Nov 2005 13:31:35 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: advil</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443344</link>	
		<description>Perhaps you could clarify what you mean by &quot;analysis&quot;?  There are a lot (a LOT) of things that calculations of word frequencies can be used for, and the actual process of calculating them is fairly straightforward, so I suspect you don&apos;t mean that.  Also, the only sense of &quot;lexical analysis&quot; that I&apos;m familiar with is what the &lt;a href=&quot;http://en.wikipedia.org/wiki/Lexical_analysis&quot;&gt;wikipedia article&lt;/a&gt; says, and with respect to natural language (as opposed to a computer language), that&apos;s not a very interesting task, and doesn&apos;t seem to be what you&apos;re after.  &lt;br&gt;
&lt;br&gt;
As far as &quot;natural language analysis&quot; goes, well, I am a linguist, and analyzing natural language is what I do (in the sense of formulating theories that make predictions about how natural languages behave), but it&apos;s not clear if this is what you&apos;re interested in either.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443344</guid>
		<pubDate>Wed, 30 Nov 2005 13:33:57 -0800</pubDate>
		<dc:creator>advil</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443366</link>	
		<description>Attempt at clarification:&lt;br&gt;
&lt;br&gt;
I&apos;m finding myself broadly interested in a subject about which I know very little.  I have specific applications in mind -- I&apos;m working on very first-draft software tools for parsing and analyzing a large body of text, and I&apos;m performing word-frequency counts to help chase a couple of brainstorms and generate statistics about the corpus.  However, I&apos;m getting by on an arm-chair sensibility about all of this, and I would like to have a better understanding of the various issues and ideas tied to the subject.&lt;br&gt;
&lt;br&gt;
The inapt use of &quot;lexical analysis&quot; hopefully underscores my position: I don&apos;t have the functional vocabulary to describe accurately what I&apos;m interested in.  Hence...&lt;br&gt;
&lt;br&gt;
&lt;small&gt;(But lexical analysis as described in &lt;b&gt;advil&lt;/b&gt;&apos;s wikipedia link is &lt;i&gt;one&lt;/i&gt; of the things I&apos;m interested in.  I have a comp-sci background, so it&apos;s a phrase that has stuck in my head from that side of things.)&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443366</guid>
		<pubDate>Wed, 30 Nov 2005 13:47:45 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: scruss</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443392</link>	
		<description>I found the &lt;em&gt;Cambridge Series in Computational Linguistics&lt;/em&gt; very helpful for explaining the basics of language analysis to new staff back in my corpus linguistics days.&lt;br&gt;
I remember &lt;a href=&quot;http://gate.ac.uk/&quot;&gt;GATE&lt;/a&gt; being a fun platform/toolkit, but I can&apos;t exactly remember what I used it &lt;em&gt;for&lt;/em&gt;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443392</guid>
		<pubDate>Wed, 30 Nov 2005 14:01:53 -0800</pubDate>
		<dc:creator>scruss</dc:creator>
	</item><item>
		<title>By: advil</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443419</link>	
		<description>It seems that there are roughly two (closely related) fields you&apos;re interested in:&lt;br&gt;
&lt;ul&gt;&lt;li&gt; &lt;b&gt;Information Retrieval&lt;/b&gt; (IR): more or less, the study of using statistical techniques to get information, in some form or other, out of natural language texts.  It seems that IR may be mainly what you&apos;re interested in.  A few buzzwords that might help with the searches: &quot;topic detection and tracking&quot;, &quot;question answering&quot;, &quot;ontology extraction&quot;, &quot;statistical alignment&quot; (or &quot;text-translation alignment&quot;).  The standard textbook is &quot;Foundations of Statistical Natural Language Processing&quot;, by Manning and Schutze, MIT Press.  If you have a CS background this book may be approachable.  A sample academic research group that does this is &lt;a href=&quot;http://ciir.cs.umass.edu/&quot;&gt;here&lt;/a&gt;.&lt;/li&gt;&lt;br&gt;
&lt;li&gt;&lt;b&gt;Natural Language Processing&lt;/b&gt; (NLP): the study of taking a sentence of natural language and having a computer act more or less as a human does on hearing that sentence.  Most of the work here focuses on parsing sentences into some kind of syntactic representation.  There is some overlap between this and the previous topic, though less than you might think - many IR tasks don&apos;t need good parsing techniques, and any linguist will tell you that the large-scale statistical techniques that IR often uses simply aren&apos;t what humans do.  These days, for CS people, question answering is the main task that needs real NLP techniques.  I liked the textbook &quot;Speech and Language Processing&quot; by Jurafsky and Martin, Prentice Hall, 2000.  It&apos;s somewhat more NLP-oriented than the Manning and Schutze book.  Hopefully these recommendations won&apos;t be too technical, but I just don&apos;t know of any less technical ones.&lt;/li&gt;&lt;/ul&gt;&lt;br&gt;
By the way, I have no idea to what extent any of this is approachable to a non-specialist, but the ACL (Association for Computational Linguistics) has put the past 20 years or so of the journal Computational Linguistics online for free &lt;a href=&quot;http://acl.ldc.upenn.edu/&quot;&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
If you are interested in parsing, you might want to know a bit about linguistics.  Steven Pinker&apos;s &quot;The Language Instinct&quot; is the canonical recommendation here.  It&apos;s very readable, and very interesting.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443419</guid>
		<pubDate>Wed, 30 Nov 2005 14:13:31 -0800</pubDate>
		<dc:creator>advil</dc:creator>
	</item><item>
		<title>By: formless</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443431</link>	
		<description>&lt;a href=&quot;http://nlp.stanford.edu/fsnlp/&quot;&gt;Foundations of Statistical Natural Language Processing&lt;/a&gt; is a good intro to the statistical side of NLP.&lt;br&gt;
&lt;br&gt;
There are some sample chapters online, including one on collocations.&lt;br&gt;
&lt;br&gt;
Searching for courses on NLP that have course notes online will also be helpful.  One topic you might want to search for is n-gram models.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443431</guid>
		<pubDate>Wed, 30 Nov 2005 14:19:34 -0800</pubDate>
		<dc:creator>formless</dc:creator>
	</item><item>
		<title>By: fvw</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443448</link>	
		<description>Get Jurafsky &amp;amp; Martin&apos;s &lt;em&gt;Speech and Language Processing&lt;/em&gt;. It&apos;s an introductory text, which means it has a very broad scope, but it does explain the practical sides too, so you&apos;ll actually be able to use what you read.&lt;br&gt;
&lt;br&gt;
I don&apos;t know if you can download it somewhere, but if you can spare the money to buy it, it&apos;s probably worth it anyway, as it&apos;s not the type of book you want to read on screen.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443448</guid>
		<pubDate>Wed, 30 Nov 2005 14:29:40 -0800</pubDate>
		<dc:creator>fvw</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443514</link>	
		<description>&lt;small&gt;Have I mentioned lately how great AskMe is?&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443514</guid>
		<pubDate>Wed, 30 Nov 2005 15:06:25 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443642</link>	
		<description>formless, your link is coming up &lt;b&gt;403&lt;/b&gt; for me.  Which is funny, since it&apos;s the top hit for that string.  Was it working when you linked it, I presume?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443642</guid>
		<pubDate>Wed, 30 Nov 2005 17:05:56 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443691</link>	
		<description>&lt;small&gt;Something about my &quot;I presume&quot; smells kinda snarky to me, but I really didn&apos;t intend it that way.&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443691</guid>
		<pubDate>Wed, 30 Nov 2005 18:04:55 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: Alison</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443705</link>	
		<description>Here are some basic kinds of parsers off the top of my head:&lt;br&gt;
&lt;br&gt;
&lt;a href=&quot;http://www.comp.leeds.ac.uk/nti-kbs/ai5/Chart/chart.html&quot;&gt;Chart Parsers&lt;/a&gt;&lt;br&gt;
&lt;a href=&quot;http://www.link.cs.cmu.edu/link/&quot;&gt;Link Parsers&lt;/a&gt;&lt;br&gt;
&lt;a href=&quot;http://en.wikipedia.org/wiki/CYK_algorithm&quot;&gt;CYK&lt;/a&gt; &lt;a href=&quot;http://www.csee.umbc.edu/~squire/download/cykp.cpp&quot;&gt;Parser&lt;/a&gt;&lt;br&gt;
&lt;a href=&quot;http://www.icsi.berkeley.edu/~stolcke/papers/cl95/node30.html&quot;&gt;Viterbi Parsers&lt;/a&gt;&lt;br&gt;
&lt;a href=&quot;http://www.cs.usfca.edu/~parrt/course/652/lectures/LR.parsing.html&quot;&gt;LR Parsers&lt;/a&gt; (Also &lt;a href=&quot;http://portal.acm.org/citation.cfm?id=191163&quot;&gt;SLR Parser&lt;/a&gt;)&lt;br&gt;
&lt;a href=&quot;http://portal.acm.org/citation.cfm?id=980564&quot;&gt;Tomita Parser&lt;/a&gt; (An improved LR parser)&lt;br&gt;
&lt;br&gt;
I also second the &lt;a href=&quot;http://www.cs.colorado.edu/~martin/slp.html&quot;&gt;Jurafsky &amp;amp; Martin&lt;/a&gt; book as well as &lt;a href=&quot;http://www.amazon.com/gp/product/0262133601/103-7611664-8952659?v=glance&amp;n=283155&quot;&gt;Manning and Schutze&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
I&apos;m a computational linguist with a pretty good background in NLP and I can certainly answer any questions.  I&apos;ve written an LR parser before and it took a little while.  The toughest part for you will be collecting the parts of speech for each word and their probabilities without statistics.  Also, there is no need to write a parser when they are available for free.  But if you just want the practice, enjoy!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443705</guid>
		<pubDate>Wed, 30 Nov 2005 18:20:58 -0800</pubDate>
		<dc:creator>Alison</dc:creator>
	</item><item>
		<title>By: Alison</title>
		<link>http://ask.metafilter.com/28174/Help-me-dig-into-lexical-analysis#443709</link>	
		<description>The &lt;a href=&quot;ftp://ftp.cs.brown.edu/pub/nlparser/&quot;&gt;Charniak Parser&lt;/a&gt; is free and so is the &lt;a href=&quot;http://www.link.cs.cmu.edu/link/download.html&quot;&gt;link parser&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.28174-443709</guid>
		<pubDate>Wed, 30 Nov 2005 18:23:33 -0800</pubDate>
		<dc:creator>Alison</dc:creator>
	</item>
	</channel>
</rss>
