<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel>
	  <title>Ask MetaFilter questions tagged with datamining</title>
      <link>http://ask.metafilter.com/tags/datamining</link>
      <description>Questions tagged with 'datamining' at Ask MetaFilter.</description>
	  <pubDate>Sat, 17 Oct 2009 20:39:39 -0800</pubDate> <lastBuildDate>Sat, 17 Oct 2009 20:39:39 -0800</lastBuildDate>

      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>	  
	<item>
	<title>What&apos;s the statistical technique for combining several test results into one?</title>
	<link>http://ask.metafilter.com/135754/Whats%2Dthe%2Dstatistical%2Dtechnique%2Dfor%2Dcombining%2Dseveral%2Dtest%2Dresults%2Dinto%2Done</link>	
	<description>I&apos;m a statistics n00b trying to learn how to combine the results of several tests into one. Basically, I&apos;d like to learn how to categorize entities in some experimental data by combining the scores from several domain-specific tests into one unified score.  A practical but hypothetical example would be writing a computer program that, given the sound of a car engine, tries to identify what model of car it came from.  Say that there are 10 possible cars each sound can be matched with, and three independent tests that are applied to each sound, each test producing a number between zero and one for each car model indicating how likely it is that the sound came from that model.  I could combine the test results naively by simply adding them together, but that could produce crappy results if one of the tests is much more accurate than the others, possibly worse results than just using that test by itself. &lt;br&gt;
&lt;br&gt;
There are established ways to do this that I&apos;ve seen used in research before, but I don&apos;t know any of the math and haven&apos;t had any luck Googling for info.  I&apos;m not looking for a detailed explanation, just some pointers to what I should research to teach myself.  Thanks!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.135754</guid>
	<pubDate>Sat, 17 Oct 2009 20:39:39 -0800</pubDate>
	<category>datamining</category>
	<category>experiments</category>
	<category>research</category>
	<category>statistics</category>
	<dc:creator>gsteff</dc:creator>
	</item>
	<item>
	<title>How to resist data mining offline?</title>
	<link>http://ask.metafilter.com/135543/How%2Dto%2Dresist%2Ddata%2Dmining%2Doffline</link>	
	<description>PrivacyFilter: Can you think of examples of offline anti-data mining behavior? We can encrypt our activities, use Tor, etc., online, but how do people try to stymie data mining in daily life? I&apos;m thinking of groups sharing a &quot;loyalty card&quot; (like &lt;a href=&quot;http://cardexchange.org/&quot;&gt;these guys&lt;/a&gt;) to break up the patterns of their shopping habits, and putting incorrect information into surveys and forms to make things harder to correlate (in theory), &lt;a href=&quot;http://web.mit.edu/gtmarx/www/tack.html&quot;&gt;tack-in-the-shoe&lt;/a&gt; style. Other examples? (And, if you know more about data mining, do these methods actually accomplish anything?)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.135543</guid>
	<pubDate>Thu, 15 Oct 2009 08:13:17 -0800</pubDate>
	<category>datamining</category>
	<category>incorrectinformation</category>
	<category>loyaltycards</category>
	<category>privacy</category>
	<category>secrecy</category>
	<dc:creator>finnb</dc:creator>
	</item>
	<item>
	<title>What books would teach me about information and data?</title>
	<link>http://ask.metafilter.com/128430/What%2Dbooks%2Dwould%2Dteach%2Dme%2Dabout%2Dinformation%2Dand%2Ddata</link>	
	<description>Let&apos;s say I wanted to educate myself to sort-of the equivalent of a Bachelor&apos;s Degree in Information and Data (not sure what the real degree would be called, but you get the idea). What should be in my syllabus? Areas of study would include things like informational networks and social networking, tacit and explicit knowledge, parsing and data extraction, data mining, visualization, metadata, information retrieval and storage, plus other things that I&apos;m probably not even aware of. Websites are great, but so are books (maybe excluding $$$ textbooks if possible), podcasts, videos, source code, applications, etc. Assume a relatively high level of technical know-how (including coding skills) but little formal computer science training.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.128430</guid>
	<pubDate>Sun, 26 Jul 2009 13:58:24 -0800</pubDate>
	<category>datamining</category>
	<category>information</category>
	<category>informationscience</category>
	<category>km</category>
	<category>metadata</category>
	<category>resolved</category>
	<category>socialnetworking</category>
	<category>theory</category>
	<dc:creator>Deathalicious</dc:creator>
	</item>
	<item>
	<title>Ideas for a user interface design of a data mining application?</title>
	<link>http://ask.metafilter.com/102098/Ideas%2Dfor%2Da%2Duser%2Dinterface%2Ddesign%2Dof%2Da%2Ddata%2Dmining%2Dapplication</link>	
	<description>I&apos;m trying to write my first Eclipse RCP application. I would welcome some ideas about how to design the UI, preferably using the RCP building blocks: editors, views, preferences, wizards, perspectives... The application is a data mining application for analyzing a set of transactions. Initially the user will load a file of transactions into the application.  The user will then be able to view the transactions, but not modify them. Then the main part of the application is the ability to apply various types of data mining algorithms to the data. The user will be able to select which algorithm they want to use, then set the parameters for the algorithm; some of the parameters will have defaults, and some should be persistent (saveable). They will then run the algorithm, which may take some time and should be cancelable, after which the results will be presented. The presentation might span different views, some of which will be shared between different algorithms, and some of which might be unique. The user should then be able to rerun the same algorithm with different parameter settings and get new results, or run a new algorithm. The system should keep track of the different results, though, and allow the user to refer back to previous runs, and possibly save the results.
&lt;br&gt;&lt;br&gt;
How would you design a UI for this application using Eclipse RCP?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.102098</guid>
	<pubDate>Fri, 19 Sep 2008 06:03:19 -0800</pubDate>
	<category>datamining</category>
	<category>eclipse</category>
	<category>java</category>
	<category>rcp</category>
	<category>UI</category>
	<dc:creator>blueyellow</dc:creator>
	</item>
	<item>
	<title>Where to start identifying relationships in a set of numerical, binary, menu data</title>
	<link>http://ask.metafilter.com/90834/Where%2Dto%2Dstart%2Didentifying%2Drelationships%2Din%2Da%2Dset%2Dof%2Dnumerical%2Dbinary%2Dmenu%2Ddata</link>	
	<description>Ok so I have this huge table of survey data - much of it numerical,  much of it binary, some of it from selections from menus of text items (e.g. blue, green, orange etc). Where do I start to find the most noticeable relationships between variables? I have some familiarity with regression analysis and am equipped with R (free stats package,  but not too familiar with all its functionality). But how do I&lt;br&gt;
a) deal with the binary and menu-based data?&lt;br&gt;
b) start to find the most significant dependencies? Just randomly? (I mean, for example, maybe I will discover that all females between 25 and 30 who like the colour pink tend to eat lots more ice cream on Thursdays.)&lt;br&gt;
&lt;br&gt;
Even a textbook or a tutorial telling me what stats I need to know would be useful.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.90834</guid>
	<pubDate>Wed, 07 May 2008 17:16:38 -0800</pubDate>
	<category>anova</category>
	<category>dataanalysis</category>
	<category>datamining</category>
	<category>regression</category>
	<category>statistics</category>
	<dc:creator>vizsla</dc:creator>
	</item>
	<item>
	<title>Cheap data-fitting software?</title>
	<link>http://ask.metafilter.com/89133/Cheap%2Ddatafitting%2Dsoftware</link>	
	<description>Looking for free or low cost (under $50) multiple linear regression software which ideally works with Microsoft Excel (but not critical). Any recommendations?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.89133</guid>
	<pubDate>Thu, 17 Apr 2008 17:58:49 -0800</pubDate>
	<category>datamining</category>
	<category>linearregression</category>
	<category>multipleregression</category>
	<category>regression</category>
	<category>statistics</category>
	<dc:creator>vizsla</dc:creator>
	</item>
	<item>
	<title>How does Terapeak do it?</title>
	<link>http://ask.metafilter.com/84242/How%2Ddoes%2DTerapeak%2Ddo%2Dit</link>	
	<description>How do sites like Terapeak and Hammertap get their data from eBay? I&apos;m exploring opening an eBay store with my eBay-savvy fiance, and as someone with a systems engineering background, I see the advantage to doing research through tools like Terapeak and Hammertap before we actually decide what to sell and how to sell it. However, I&apos;m also very intrigued by these sites from a technical perspective, and I wish I could figure out how they get their data. &lt;br&gt;
&lt;br&gt;
So my question to all of you DBA junkies is this: How do these sites mine their data? Or do they just have a contract with eBay to get data dumps sent to them and then they parse it and deliver it to the customer? And how would you, as a DBA-savvy person, go about creating your own Terapeak or Hammertap?&lt;br&gt;
&lt;br&gt;
Many thanks in advance!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.84242</guid>
	<pubDate>Thu, 21 Feb 2008 14:03:47 -0800</pubDate>
	<category>data</category>
	<category>datamining</category>
	<category>ebay</category>
	<category>hammertap</category>
	<category>mine</category>
	<category>terapeak</category>
	<dc:creator>omnipotentq</dc:creator>
	</item>
	<item>
	<title>Help me lock them out!!</title>
	<link>http://ask.metafilter.com/80141/Help%2Dme%2Dlock%2Dthem%2Dout</link>	
	<description>How can I protect my privacy online? I have been following the recent stories* about the re-selling of data to third parties, loopholes in confidential data management, etc.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
I am not a terrorist/ do not view dodgy pr0n/hate websites and I have nothing to hide (other than my own data) but I don&apos;t like corporations having access to this information that will steadily build up over the course of my lifetime. I  want to know that any information that I give is stuff that I give willingly and with my full and considered consent. &lt;br&gt;
&lt;br&gt;
So, here is the difficult bit: what hints, tips and advice can you offer to someone to ensure that minimal information is actually kept by these companies in the first place and to enable me to monitor how much I give them in the future? &lt;br&gt;
&lt;br&gt;
I don&apos;t want to live like a virtual hermit, I DO want to make some purchases online and I do want to browse the web without any hassles where possible but not at the cost of having all my data bandied about to all and sundry.&lt;br&gt;
&lt;br&gt;
I would like to build up some good habits (that I can automate if possible) which will enable me to keep an eye on this data. &lt;br&gt;
&lt;br&gt;
I use a Mac at home and XP at work so advice for both is much appreciated. &lt;br&gt;
&lt;br&gt;
Many thanks in advance!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.80141</guid>
	<pubDate>Fri, 04 Jan 2008 06:15:41 -0800</pubDate>
	<category>data</category>
	<category>datamining</category>
	<category>onlineprivacy</category>
	<category>privacy</category>
	<dc:creator>ClanvidHorse</dc:creator>
	</item>
	<item>
	<title>help a layman understand multi-dimensional data modelling</title>
	<link>http://ask.metafilter.com/79482/help%2Da%2Dlayman%2Dunderstand%2Dmultidimensional%2Ddata%2Dmodelling</link>	
	<description>I&apos;d like to know how multi-dimensional data modelling works. My company has Cognos BI and I am working closely with the developers. I would like to understand how a star schema works, fact and dimension tables, etc. I&apos;d like to be able to more clearly envision what&apos;s going on with the data. How is it different from relational databases? Can you recommend a book or some online articles written for the layman? &lt;br&gt;
&lt;br&gt;
I don&apos;t necessarily need to understand how *to* model the data, just how to understand data retrieval *from* a model.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.79482</guid>
	<pubDate>Wed, 26 Dec 2007 12:02:49 -0800</pubDate>
	<category>cognos</category>
	<category>database</category>
	<category>datamart</category>
	<category>datamining</category>
	<category>datamodelling</category>
	<category>datawarehouse</category>
	<category>starschema</category>
	<dc:creator>goethean</dc:creator>
	</item>
	<item>
	<title>Who owns facebook applications &amp;amp; why do they need to access my private data?</title>
	<link>http://ask.metafilter.com/73456/Who%2Downs%2Dfacebook%2Dapplications%2Dand%2Dwhy%2Ddo%2Dthey%2Dneed%2Dto%2Daccess%2Dmy%2Dprivate%2Ddata</link>	
	<description>Who owns facebook applications &amp;amp; why do they need to access my private data? Slightly related to &lt;a href=&quot;http://www.metafilter.com/65406/You-13-cents&quot;&gt;this post&lt;/a&gt; on the blue, I was wondering if anybody has any info on who owns the various facebook applications, and what they do with the data, since you need to consent to them accessing your data whenever you add an application.&lt;br&gt;
&lt;br&gt;
A benign view is that it&apos;s a technical-legal requirement, and that under facebook&apos;s terms of service your data is private, so you need to consent to these third parties accessing it, eg in order to share your movie reviews, quiz results or whatever with others.&lt;br&gt;
&lt;br&gt;
Another distinct possibility is that these are used for datamining purposes, eg for marketing.&lt;br&gt;
&lt;br&gt;
My guess is that the truth might have various facets. Programming a successful facebook app would be a great career advantage for an IT student or budding programmer. On the other hand, with so many people willing to offer up so much personal information, the temptation would be enormous for corporations to get in on the act, especially if they can cross-reference information from various distinct applications. &lt;br&gt;
&lt;br&gt;
Cutting short the rambling speculation, does anybody know of any articles or resources that list, discuss or analyse who develops &amp;amp; owns various facebook applications, and what the data is used for?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.73456</guid>
	<pubDate>Tue, 09 Oct 2007 16:13:09 -0800</pubDate>
	<category>applications</category>
	<category>datamining</category>
	<category>facebook</category>
	<category>privacy</category>
	<dc:creator>UbuRoivas</dc:creator>
	</item>
	<item>
	<title>Datamining the public web</title>
	<link>http://ask.metafilter.com/68120/Datamining%2Dthe%2Dpublic%2Dweb</link>	
	<description>How do I build a data warehouse that scrapes data from public websites for my own use? Tools? Tips? Hi. I would like to track apartments on a classifieds site and use the data for analyzing the impact of different things on price. What I need is a tool or scripting language that would make it easy for me to spider the website and put the data in a database. Preferably this would be an open source solution. &lt;br&gt;
&lt;br&gt;
I am also looking for good tools for extracting information out of longer pieces of text. For example, on the site I want to mine, users can put in comments on every object. I would like to be able to decide if a comment is positive, negative or neither. I have seen this done on one online art site that I can&apos;t remember the name of right now. The artist used blog posts and decided the mood of the writer by what words were used.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.68120</guid>
	<pubDate>Mon, 30 Jul 2007 01:52:51 -0800</pubDate>
	<category>datamining</category>
	<category>datawarehouse</category>
	<category>scraping</category>
	<category>spidering</category>
	<dc:creator>ilike</dc:creator>
	</item>
	<item>
	<title>Recommendations for qualitative data management</title>
	<link>http://ask.metafilter.com/61865/Recomendations%2Dfor%2Dqualitative%2Ddata%2Dmanagement</link>	
	<description>What are your experiences with qualitative data coding software? Generally, and Apple specific. I&apos;m preparing for a year-long fieldwork project in which my data will consist primarily of digitally recorded interviews, written interviews, notes, and digital photographs. I&apos;m trying to figure out the best way to keep track of my data on my G3 iBook. Obviously, the volume of data will be fairly large. &lt;br&gt;
&lt;br&gt;
I&apos;m looking for specific recommendations about programs, but also general experiences with how well these programs work. Are they worth the time it takes to learn them? What features should I be looking for? What features sound great but aren&apos;t that useful? I found &lt;a href=&quot;http://www.transana.org/download/index.htm&quot;&gt;Transana&lt;/a&gt; in my searching, and it looks like it might be a good way to actually transcribe less, by allowing you to code audio clips. Has anyone tried this? Did it work?&lt;br&gt;
&lt;br&gt;
Basically, anything you can tell me about your experiences with these programs other than &quot;this PC-only program is the one and only, best evar!&quot; would be greatly appreciated.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.61865</guid>
	<pubDate>Thu, 03 May 2007 07:00:34 -0800</pubDate>
	<category>apple</category>
	<category>data</category>
	<category>datacoding</category>
	<category>datamining</category>
	<category>fieldwork</category>
	<category>mac</category>
	<category>qualitative</category>
	<category>software</category>
	<dc:creator>carmen</dc:creator>
	</item>
	<item>
	<title>Running a Background Check on Yourself</title>
	<link>http://ask.metafilter.com/58245/Running%2Da%2DBackground%2DCheck%2Don%2DYourself</link>	
	<description>Are there any legit online background check services? A friend of mine is currently a teacher at a charter school. He would like to apply for a job at another charter school, but he is concerned about school #2 running a cheap online background check and negative results not only costing him the new job, but possibly getting back to school #1 (likely--lots of cross-pollination). He has an incident from 2002 that, though it resulted in no conviction (and he has documentation that no conviction or admission of guilt was rendered), could simply look bad (possession). It did cause some problems when he renewed his driver&apos;s license 3 years ago, but he hired a lawyer at that point who apparently straightened the matter out.  But, since these schools aren&apos;t unionized, teachers don&apos;t have tenure, etc., he could potentially lose his job at school #1 for even the slightest suspicious incidents in his history. &lt;br&gt;
&lt;br&gt;
So he is considering running a preemptive background check on himself to see what comes up. That way, he&apos;ll know what could come up and he can make an educated decision as to whether it&apos;s worth it to take the chance and apply to job #2. He is aware that online background checks are not terribly accurate, but he suspects that this is the method the schools would employ (for various reasons). &lt;br&gt;
&lt;br&gt;
He is, however, wary of providing his name and SSN to a company that is, essentially, a data mining company. This wariness is the result of &lt;a href=&quot;http://blog.washingtonpost.com/thecheckout/2007/02/keeping_a_low_virtual_profile.html&quot;&gt;some&lt;/a&gt;  &lt;a href=&quot;http://www.snopes.com/computer/internet/zabasearch.asp&quot;&gt;articles&lt;/a&gt; he&apos;s &lt;a href=&quot;http://www.wired.com/news/privacy/0,1848,67407,00.html&quot;&gt;read&lt;/a&gt;, his libertarian tendencies, and general common sense. Do any of you have any experience (good or bad) with companies such as &lt;a href=&quot;http://www.zabasearch.com/&quot;&gt;Zabasearch&lt;/a&gt;, &lt;a href=&quot;http://www.people-finder.com/&quot;&gt;PeopleFinder&lt;/a&gt;, or &lt;a href=&quot;http://www.intelius.com/&quot;&gt;Intelius&lt;/a&gt;? Or could you recommend another similar service? He&apos;s looking for a (relatively) cheap service that he won&apos;t regret doing business with. Many thanks in advance.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.58245</guid>
	<pubDate>Wed, 07 Mar 2007 06:40:08 -0800</pubDate>
	<category>backgroundchecks</category>
	<category>datamining</category>
	<category>privacy</category>
	<dc:creator>jessicak</dc:creator>
	</item>
	<item>
	<title>Career change after disability leave</title>
	<link>http://ask.metafilter.com/23096/Career%2Dchange%2Dafter%2Ddisability%2Dleave</link>	
	<description>Imagine the worst-case scenario for an unemployed person looking for a new position. OK, my situation is not quite that bad, but almost. I plan to begin interviewing in January 2006 after two years on disability leave. Given the details I describe inside, how do I best use the remainder of 2005 to land the best possible position?
Due to heart disease, I have been on disability leave for the last 18 months. Next week, I begin therapy that has a very good chance to put me back into the ranks of the employed in early 2006. How do I best explain this two-year gap? Do I have any hope of getting full health insurance benefits? How much of a compromise can I make between health insurance coverage and salary?&lt;br&gt;
&lt;br&gt;
Until 2004, my career has been very hands-on in various high-tech fields, however due to the sensitivity of my pacemaker to electrical fields, I cannot return to that path. Nor can I return to my previous employer, because the few positions for which I am qualified do not pay enough and they are dead-end. I have determined that my best career path lies in management, however except for making corporal in the US Marine Corps 20 years ago, none of my previous positions gave me any experience in leadership. How do I best prepare to get an entry-level management position?&lt;br&gt;
&lt;br&gt;
I have moved from Atlanta to Las Vegas, primarily because Las Vegas has shown great resistance to economic lulls. (Ill-advised marketing campaigns targeting families are another story.) I have always had an interest in the gambling/hospitality industry. I am in a new town with no real network of personal acquaintances or business associates, nor do I have any job references of consequence in Atlanta (my only good ones are in jail or working in Canada due to a scandal). Beyond contacting local alumni (which I plan to do), how do I meet people of influence in such a short time?&lt;br&gt;
&lt;br&gt;
Finally, I have a BS in computer science from Georgia Tech, and Las Vegas is significantly below the average for number of residents with a college degree. Datamining is huge in Las Vegas, and that is another area in which I have an intense interest. However, I have no experience in the field. How do I break into the datamining biz? What do I need to know language/application-wise and how do I best obtain that knowledge and experience quickly and cheaply?&lt;br&gt;
&lt;br&gt;
One last question, I need business cards but I have no clue what info to put on them for someone who is unemployed beyond the standard contact information. Other than the resume, how do I market myself?&lt;br&gt;
&lt;br&gt;
I have exhausted searching the web for answers. For the most part, the information online is sketchy. Because I receive disability insurance payments, my cash flow is very limited, and I must rely on public transportation. I need resources for information (beyond what my insurance provider can give) as much as I need answers, because I have many more questions. My situation has many facets and no one person can answer everything, so I will appreciate anything you guys can pitch to me.&lt;br&gt;
&lt;br&gt;
Brainstorm! I don&apos;t care how ridiculous an idea may sound. Thanks!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2005:site.23096</guid>
	<pubDate>Wed, 24 Aug 2005 16:43:52 -0800</pubDate>
	<category>datamining</category>
	<category>disability</category>
	<category>LasVegas</category>
	<category>management</category>
	<category>unemployment</category>
	<dc:creator>mischief</dc:creator>
	</item>
	
	</channel>
</rss>

