I have 600 images tagged with keywords. In addition to nice pictures of tag clouds and frequency calculations, what sort of smart, insightful analysis can I do with these data that could reveal relationships between the tags and other (more formal) data attached to the images? Any advice on software tools (Windows, preferably FOSS) would be appreciated. I have some training in statistics and I have actually done textual statistics before but only briefly so I'm not familiar with the current tools and methods.
posted by elgilito
on Mar 11, 2014 -
I'm currently enrolled in a PhD program for statistics and operations research, and in two more years I can grab that PhD. Alternatively, I can jump ship now with a MS, headed for the (inviting/inevitable) waters of industry.
Knowing that I have no interest in staying in academia, give me some motivation to finish. Or, tell me to quit because actual work experience is more valuable! What roles should I be looking at other than Data Scientist? Would it be feasible to get a position doing private research, and would that be awesome? Bonus points if you can tell me what skills I should be cultivating to be hire-able (software engineering)?
P.S. I will also do some more chatting with profs and former students to figure out how I should be directing myself, but I hope the HiveMind can provide some complementary ideas.
posted by zscore
on Feb 16, 2014 -
An NPR blog cites an NSF study which claims that 26 percent of Americans asked answered that the Sun goes around the Earth, rather than vice versa. Believing that 1 in 4 of my fellow citizens doesn't know that the Earth circles the Sun is hard enough. But thinking about that number, it seems worse than that: if 26% got a 50/50 question wrong, wouldn't another 26% have answered correctly just based on chance rather than knowledge? That would mean that roughly half of Americans didn't know (and then split evenly on their guess). The idea that half of Americans don't know seems intuitively ludicrous to me. Am I missing something in how I think about this? Please help my statistically challenged brain... [more inside]
posted by tyllwin
on Feb 15, 2014 -
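The reasoning in the Sun/Earth question above can be written out as a small calculation. This is only a sketch of the pure-guessing model the poster describes: the function name and the assumption that everyone who does not know guesses uniformly at random (and that nobody who knows answers wrongly) are illustrative, not something the NSF study states.

```python
def fraction_who_know(wrong_rate, n_choices=2):
    """Estimate the share of respondents who actually know the answer.

    Assumption (hypothetical): anyone who does not know guesses uniformly
    at random among n_choices options, and nobody who knows answers wrongly.
    """
    # Probability that a pure guesser picks a wrong option.
    p_wrong_when_guessing = (n_choices - 1) / n_choices
    # Every wrong answer comes from a guesser, so scale up to all guessers.
    guessers = wrong_rate / p_wrong_when_guessing
    return 1.0 - guessers


know = fraction_who_know(0.26)
print(f"know: {know:.2f}, guessing: {1 - know:.2f}")
```

Under that model, a 26% wrong rate implies that about 52% were guessing and about 48% knew. The result is only as good as the guessing assumption; respondents who misread the question or hold a firm wrong belief are not described by it.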
Biologists and Statisticians... what's going on here? There hasn't been a female born in my husband's family in two generations. Help solve the brothers' debate about what's causing this, and what the odds of our pregnancy being male or female are. [more inside]
posted by jrobin276
on Feb 11, 2014 -
There are a lot of statistics online about how many children are sexually abused, but not much about how many adults are doing the abusing. I did read one article that mentioned in passing that about 1 in 20 men and 1 in 3300 women have sexually abused a child, but I think that was referring to only prepubescent children, not minors in general. The article was pretty old too, and there was no source for where they got that information. If anyone knows where I can find a reliable statistic regarding the percent of adults who sexually abuse children then I would appreciate it. [more inside]
posted by sam_harms
on Feb 4, 2014 -
MS Excel's regression tools provide 95% lower/upper confidence results but how does one properly interpret and then express those as a single ± (plus/minus) figure? [more inside]
posted by kartguy
on Jan 26, 2014 -
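For the confidence-interval question above, one common way to get a single ± figure is to take half the width of the interval. The sketch below assumes the interval is symmetric about the estimate, as it is for the t-based coefficient intervals in a standard least-squares regression; the function name and the example bounds are hypothetical.

```python
def plus_minus(lower, upper):
    """Convert a symmetric confidence interval into (estimate, margin)."""
    estimate = (lower + upper) / 2   # midpoint of the interval
    margin = (upper - lower) / 2     # half-width, i.e. the "±" figure
    return estimate, margin


# Hypothetical "Lower 95%" and "Upper 95%" values for one coefficient.
estimate, margin = plus_minus(1.2, 2.0)
print(f"{estimate:.2f} ± {margin:.2f} (95% CI)")
```

Stating the confidence level alongside the ± figure matters, since the same data give a wider margin at 99% than at 95%.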
If I'm going to succeed in a field I am considering (bioinformatics), I'm going to need to learn a good deal of statistics. I have limited experience in the subject and have never found the details of it especially compelling. Can you recommend some books on statistics (not necessarily light on the details) that are well written and interesting? Something designed to get the layman interested in the details could be good, but textbook recommendations would also be good.
posted by garuda
on Dec 20, 2013 -
I'm a daily MATLAB user for data analysis, and fairly fluent with most toolboxes, including Parallel Computing. I know I need to learn something new, though.* (MATLAB is great for prototyping but unwieldy for real data-crunching.)
I'm taking a class (Bayesian stat methods) starting in January based around R. What's the best resource to get started with R for someone like me? [more inside]
posted by supercres
on Dec 12, 2013 -
I'm looking for every box score my favorite NBA team played in for this year and last. Where can I go to download this data? [more inside]
posted by antonymous
on Dec 5, 2013 -
I have 50+ responses from large companies for a survey that I've written which has approximately 100 questions. There is no other data that can be linked to this survey. I need to know what I can do with these results and how to do it. [more inside]
posted by JiffyQ
on Dec 1, 2013 -
I work in a university managing the broad-based direct mail, email and calling programs. I have zero undergrad or graduate experience with math, business or the social sciences. (Aka, I can write a really nice essay...) I would like to chart a path to being recognized as an expert in predictive analytics. [more inside]
posted by meta x zen
on Nov 26, 2013 -
How do I use SPSS to analyze a range of approval ratings which vary by participant and correlate the skew to one demographic variable? [more inside]
posted by lettuchi
on Nov 25, 2013 -
I like math. Programming is OK, but I don't want to make it my thing. What careers should I be looking at? (Special snowflake details inside.) [more inside]
posted by sqrtofpi
on Nov 25, 2013 -
I'm trying to reconcile two numbers from the same national statistical agency. I'm looking for a dummy's guide for what defines the difference between the two numbers and how to use one to estimate the other. [more inside]
posted by MuffinMan
on Oct 22, 2013 -
I'm thinking about setting up a tongue-in-cheek project for Halloween that involves showing numbers that are scary. What numbers (with the context of a description) make you anxious, or at least spook you a bit on first glance? [more inside]
posted by mccarty.tim
on Oct 5, 2013 -
How flexible is a master's degree in biostatistics compared to one in applied statistics? Is this even what I want to do? [more inside]
posted by Comet Bug
on Oct 4, 2013 -
I have a list of 15 people. Each person has between 1 and 3 entries in a lottery, for a total of 35 entries. I need to select 9 people from the list of 15--nobody can win more than once.
What is the most transparent, most random, most low-tech way I can do this? [more inside]
posted by tchemgrrl
on Sep 18, 2013 -
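The lottery described above (15 people, 35 entries, 9 distinct winners) behaves like drawing tickets from a hat and setting aside any ticket whose owner has already won. The following is a minimal sketch of that procedure, not a recommendation over a physical draw; the names and entry counts are made up for illustration.

```python
import random


def draw_winners(entries, n_winners, seed=None):
    """Draw distinct winners, weighting each person by their entry count.

    entries maps a name to the number of tickets that person holds.
    """
    rng = random.Random(seed)
    # One ticket per entry, exactly like slips of paper in a hat.
    tickets = [name for name, count in entries.items() for _ in range(count)]
    rng.shuffle(tickets)
    winners = []
    for name in tickets:
        # A repeat ticket for someone who already won is simply discarded.
        if name not in winners:
            winners.append(name)
        if len(winners) == n_winners:
            break
    return winners


# Hypothetical list: 5 people with 3 entries and 10 with 2, for 35 in total.
entries = {f"person_{i}": (3 if i < 5 else 2) for i in range(15)}
print(draw_winners(entries, n_winners=9))
```

Passing a published `seed` makes the draw reproducible, which is one way to let others verify the result after the fact.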
I post a lot of URLs to social media sites (and since one of those is Twitter I often use url shorteners) that point to my own publishing company's website, and also directly to where I sell my books on Amazon, Barnes & Noble, etc. I know services like ow.ly will track how many clickthroughs a url will get, and I think they can give me multiple shortened urls for the same target url. I'm wondering if any url shortening sites will also let me keep track of all of my shortened urls and give them nicknames or make notes (so I can note where I've used them) and give me a chart or spreadsheet or something that shows me which urls are getting the most traffic. Or if there's an app or separate website where I can enter the info that will then collect the tracking data. I'm trying to avoid having to manually check every url's visitors data.
posted by joannemerriam
on Sep 15, 2013 -
I currently work for a growing company doing various social media marketing for small businesses. I have been finding that I receive a lot of satisfaction doing activities related to what I learned in library school. I enjoy collecting, organizing, and providing data and information for our internal staff and making things approachable. One weakness I see is that we are especially data rich and insight poor with social media. I would like to know if there are any recommended programs for data mining or statistical analysis? [more inside]
posted by andendau
on Sep 8, 2013 -
I'm looking for a word-count tool that will allow me to: set a goal for words written by a specific date, enter in the words I have written each day, see how many words I have remaining toward my goal, and how many words I will need to average each day to reach my goal. [more inside]
posted by Tevin
on Aug 26, 2013 -
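The arithmetic behind the word-count tracker requested above is small enough to sketch directly. This is an illustration of the calculation only, with a hypothetical function name and made-up numbers; it assumes the daily average is spread over the days from today up to the deadline.

```python
import math
from datetime import date


def words_needed(goal, daily_counts, today, deadline):
    """Return (words remaining, words per day needed to hit the goal)."""
    written = sum(daily_counts)
    remaining = max(goal - written, 0)
    days_left = (deadline - today).days
    if days_left <= 0:
        raise ValueError("deadline must be after today")
    # Round up so that meeting the daily figure always reaches the goal.
    return remaining, math.ceil(remaining / days_left)


remaining, per_day = words_needed(
    goal=50_000,
    daily_counts=[1200, 800, 1500],
    today=date(2013, 8, 26),
    deadline=date(2013, 9, 25),
)
print(remaining, per_day)
```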
My office has recently had some funds open up and we are looking into investing in some statistical software to make our lives easier. We do a lot of work with distribution fitting, Monte Carlo analysis, and regression analysis with data sets that may contain left or right censored data. Unfortunately, we only have a few days to identify the best software package for our buck. Alternatively, the idea has been floated to download the free R software and spend the money on some training to get over the steep learning curve. What program or approach would be the best use of our money?
posted by C'est la D.C.
on Aug 12, 2013 -
I'm struggling to understand likelihood ratios (LR) in the context of diagnostic tests, and why a positive LR is influenced by the sensitivity of the test. [more inside]
posted by cacofonie
on Aug 1, 2013 -
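On the likelihood-ratio question above: the positive LR is defined as sensitivity divided by (1 - specificity), so sensitivity is its numerator and affects it directly. The sketch below uses those standard definitions; the sensitivity and specificity values are made up for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a diagnostic test."""
    # LR+ = P(test positive | disease) / P(test positive | no disease)
    lr_positive = sensitivity / (1 - specificity)
    # LR- = P(test negative | disease) / P(test negative | no disease)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Same specificity, two different sensitivities (hypothetical values).
for sens in (0.60, 0.90):
    lr_pos, lr_neg = likelihood_ratios(sens, specificity=0.80)
    print(f"sensitivity={sens:.2f}: LR+={lr_pos:.2f}, LR-={lr_neg:.3f}")
```

With specificity held fixed, raising sensitivity raises LR+ because a positive result becomes more common among people who have the disease while staying equally common among those who do not.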
We have a group of six people with 55 different options. Each member of the party has to vote for each option under 8 different analyses i.e. appearance, distinctness, etc. The options are quality weighted. [more inside]
posted by trashcan
on Jul 10, 2013 -
So I take it that the OkTrends blog was killed off after Match bought OkCupid. Where can I now get my regular fix of really interesting statistics presented at a level that the lay person can understand? (I already know about Nate Silver and xkcd's What If.)
posted by capricorn
on Jul 7, 2013 -
I'm working on a problem for "Inferences about the difference between two population means for independent samples: sigma 1 & 2 unknown and unequal."
The final value of "test statistic t" falls in the rejection region for 95% confidence interval, but falls in the nonrejection region for 99% confidence interval.
Should I perform additional calculations before rejecting my null hypothesis?
posted by iamcharity
on Jul 1, 2013 -
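The setup in the question above (two independent samples, unknown and unequal sigmas) is the usual case for Welch's t-test. The sketch below computes the Welch test statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; the sample data are invented, and critical values or p-values would still have to come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t statistic, approximate degrees of freedom) for Welch's test."""
    n1, n2 = len(sample1), len(sample2)
    # Per-sample variance of the mean, using the unbiased sample variance.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}")
```

A statistic that falls in the rejection region at the 5% level but not at the 1% level corresponds to a two-sided p-value between 0.01 and 0.05, so the conclusion depends on which significance level was fixed before looking at the data rather than on further calculation.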
I have over 10 years of sent email sitting in a folder. Are there any tools (preferably for OS X or *nix, but anything interesting is welcome) that I can use to generate interesting statistics, or draw pretty graphs, word clouds... basically anything interesting that works on a huge number of emails.
posted by Mwongozi
on Jun 28, 2013 -
Help me with statistics and Excel. Especially help me if you know any labor saving methods. I want the median, mean and standard deviation for the average price of all items sold, but my spreadsheet-full-of-data doesn't tell me the price of each sale -- just the average price per store, and the number sold at that store. Something like this: [more inside]
posted by croutonsupafreak
on Jun 21, 2013 -
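For the spreadsheet question above, the overall mean price can be recovered exactly from per-store averages by weighting each average by the number sold. The sketch below shows that calculation with invented store data; the function name is hypothetical.

```python
def overall_mean_price(stores):
    """Mean price per item across all stores.

    stores is a list of (average_price, units_sold) pairs.
    """
    total_revenue = sum(avg * units for avg, units in stores)
    total_units = sum(units for _, units in stores)
    return total_revenue / total_units


# Hypothetical data: (average price at the store, number sold there).
stores = [(10.0, 2), (20.0, 6)]
print(overall_mean_price(stores))
```

The median and standard deviation of individual sale prices cannot be recovered the same way, because the spread of prices within each store is lost once only the store average is kept. At best one can compute the weighted spread of the store averages, which understates the true variation.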
So I'm beginning a statistics PhD program this fall and I'm concerned that my math skills have gotten rusty since I haven't done anything related for the past two years. I've been working as an actuary since I graduated college but I don't do that much math--mostly a lot of programming. Has anybody been in a similar situation to me? How was the adjustment for you? I'm considering retaking advanced calculus and linear algebra during my first year (probably next summer before I take 2nd year advanced courses) just to refresh myself again. I'm aware some people may think this is kind of pathetic but I'd rather be safe than sorry. Besides, it's only my first year. Is this frowned upon? [more inside]
posted by molamola
on Jun 17, 2013 -
I'm running into trouble with my statistics course. I'm just getting up to t statistics for independent measures research design. My problems are:
1. I'm going through a lot of paper
2. I need to keep all my calculations better organized as I do them
3. I'm flipping back and forth between my book, an online version of the book, and another screen so that I can reference as much material as possible at once. I'm thinking some kind of basic statistics calculator spreadsheet (or any other format) would be in order. Can anybody direct me towards one? [more inside]
posted by Che boludo!
on Jun 8, 2013 -
What are the limits to bedbugs? Why isn't every hotel room infested given how tough they are claimed to be? Is there any evidence on the chances of taking bed bugs home from a hotel with you? Will the bedbug infestation rates go ever upwards? Why or why not? Interested in aggregated, rather than anecdotal evidence here. [more inside]
posted by mister_kaupungister
on May 31, 2013 -
I come from an engineering background rather than a research background, and I find myself lacking in vocabulary when it comes to understanding research papers, particularly when they start talking about ANOVA analyses, F(x) effect sizes and p values. I can skim through the results of a study and see that certain numbers are bigger than other numbers, but I don't really know how to tell whether what I'm seeing is significant. I'm guessing that I'm missing basic education in statistics. Can I fix this in a simple way?
posted by sdis
on May 28, 2013 -
I was recently sitting down to tea with a friend of mine (we're both in our early 20's), and the topic wandered over to hard drug use (e.g. stuff like cocaine and crystal meth, not marijuana or alcohol.) When comparing our perceptions of how common hard drug use was, we were completely surprised when our answers were polar opposites: I saw it as an extremely rare thing, but she said it was something virtually everyone did but no one talked about. What's the truth here? How prevalent is hard drug use anyway? And why do our experiences differ so much? [more inside]
posted by Conspire
on May 24, 2013 -
I have a list of paired numbers that span multiple orders of magnitude, and I need to find a method to a) compare within each pair in a way that does not disproportionately bias the comparison at the high or low end of the list, and b) define which pairs are dissimilar enough to be excluded from further analysis. The dataset itself follows a rough sigmoid curve, with a few pairs in the 1000s, more in the 100s, a lot in the upper 10's, some in the low 10's, and a few in the single digits. I have tried a few different comparison methods so far, including percent difference and relative percent difference of both the raw and log-transformed data. [more inside]
posted by nekton
on May 24, 2013 -
I'm about to begin a new project that looks at the outcomes of specific events, and would like to query the hivemind to see what kinds of approaches I can take to it. I'm always impressed with the wide variety of approaches to statistical problems I see on here. [more inside]
posted by Tooty McTootsalot
on May 24, 2013 -
Looking for statistical information on how many students study history at the B.A., M.A. and Ph.D. level in various non-US countries. [more inside]
posted by agent99
on May 9, 2013 -
So I'm going to Kenya in 5 weeks time for some work, and I'm meant to be briefing some colleagues (emphasis on brief) about some aspects of our work tomorrow. Something has leapt out at me, and I don't have the time to research it myself before presenting. [more inside]
posted by smoke
on May 7, 2013 -
In English, scientists customarily use the word "significant" or "statistically significant" to refer to an effect that is distinguished from zero at a p < .05 confidence level. On the other hand, the word "significant" in non-technical English carries a connotation of being meaningful, important, or substantial; this creates confusion when researchers write about "a significant effect," since the effect might be significant in the statistical sense while being so small as to be insignificant in the common-English sense.
In your native language, what word is used for "significance" in the statistical context? Is the same word used outside the technical context, and if so, is it a word whose common meaning is something more like "detectable," more like "important," or something else entirely? In particular, does the confusion that arises in English also take place in your language?
posted by escabeche
on Apr 24, 2013 -
I have five structural equation models that are identical except for the final outcome variable. Should I expect the model fit statistics to vary more than negligibly? [more inside]
posted by aaronetc
on Apr 11, 2013 -
I'm curious what the most frequently purchased colors are for regular, non-jean pants for men. I was discussing this with a colleague today, and guessed black and navy. I'm looking specifically for industry numbers or anecdotal evidence from those working in clothing manufacturing or sales. (I already guessed!) Thanks in advance.
posted by whitebird
on Apr 5, 2013 -
Feeling a little stuck in my current job; unsure if I should go for the PhD or cross it off my list and change jobs. [more inside]
posted by un petit cadeau
on Apr 5, 2013 -
I've been thinking about product ratings online. Product A and B both have an average rating of 4 stars. Product A is universally liked: every reviewer gave it 4 stars. Product B is the Twilight Series: lots of people love it (5 stars), but many 1 star reviews drag down the average to 4 stars.
Other than displaying the rating distribution (# of 1 through 5 star reviews), are there well-known formulas that would give Product A a higher rating?
I think what I'm asking about are weighted means, or some sort of formula that takes into account variance or skew.
But rather than re-invent the statistical wheel, I was hoping some of you may be able to point out well-known examples of good weighted formulas, or research related to this question.
Hope this is clear! Thank you!
posted by User7
on Apr 1, 2013 -
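The Product A versus Product B example above can be made concrete with a score that subtracts a multiple of the standard deviation from the mean. This is just one simple illustration of taking variance into account, not an established rating formula; the penalty weight `k` and the review counts are arbitrary assumptions.

```python
from statistics import mean, pstdev


def penalized_score(ratings, k=1.0):
    """Mean rating minus k population standard deviations."""
    return mean(ratings) - k * pstdev(ratings)


product_a = [4] * 100             # every reviewer gives 4 stars
product_b = [5] * 75 + [1] * 25   # polarized reviews, same 4-star average

for name, ratings in (("A", product_a), ("B", product_b)):
    print(name, mean(ratings), round(penalized_score(ratings), 2))
```

Both products have the same mean, but Product B's spread lowers its penalized score. How large `k` should be is a judgment call about how much disagreement among reviewers ought to count against a product.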
I never use SPSS (I hate it like I hate nothing else, except perhaps Excel) but I must use SPSS for this problem. Normally I'd just use R and be done with it, but SPSS is necessary for this problem. How can I get a simple error bar plot from two proportions? [more inside]
posted by Philosopher Dirtbike
on Mar 18, 2013 -
The fallacy is assuming that statistical information about a thing is more relevant in dealing with a particular instance of that thing than available first-hand data. [more inside]
posted by CustooFintel
on Mar 12, 2013 -