<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel>
	  <title>Ask MetaFilter questions tagged with r</title>
      <link>http://ask.metafilter.com/tags/r</link>
      <description>Questions tagged with 'r' at Ask MetaFilter.</description>
	  <pubDate>Wed, 04 Nov 2009 09:46:36 -0800</pubDate>
	  <lastBuildDate>Wed, 04 Nov 2009 09:46:36 -0800</lastBuildDate>

      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>	  
	<item>
	<title>Rates of success?</title>
	<link>http://ask.metafilter.com/137228/Rates%2Dof%2Dsuccess</link>	
	<description>Statistics question: is it possible to test sets of cumulative data for significant differences in rate? I have three cumulative percentage graphs, measuring the germination rates of three different seed types. Is there a way to compare them and see if there are any statistically significant differences?&lt;br&gt;
&lt;br&gt;
Each seed type was planted in triplicate (three dishes per type, nine dishes overall). Every day for the past few weeks I&apos;ve recorded how many seeds on each dish have begun germinating, so for an individual dish I have &quot;Day 1: 0 ... Day 7: 14 ... Day 14: 29&quot; and so on, with each day&apos;s score a cumulative total. (There are 100 seeds on each dish, so each count is also a percentage.)&lt;br&gt;
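&lt;br&gt;
One possible first step, sketched in Python purely as an illustration: time-to-event methods (for example, survdiff() in R&apos;s survival package) usually expect one row per seed rather than a cumulative curve. The helper name seed_records, and the choice to treat seeds never counted as not germinated (censored) at the last observation day, are assumptions of this sketch, not part of the question.&lt;br&gt;

```python
# Hypothetical helper: expand one dish's cumulative daily counts into one
# (day, germinated) record per seed. Assumptions, not from the question:
# a seed first counted on day d is recorded as germinating on day d, and
# seeds never counted are censored (germinated=False) at the last day.


def seed_records(days, cumulative, seeds_per_dish=100):
    """Return a list of (day, germinated) pairs, one per seed on the dish."""
    records = []
    previous = 0
    for day, total in zip(days, cumulative):
        new_today = total - previous  # seeds first seen germinating on this day
        records.extend([(day, True)] * new_today)
        previous = total
    not_germinated = seeds_per_dish - previous
    records.extend([(days[-1], False)] * not_germinated)
    return records


# Only the three days quoted in the question are used here.
records = seed_records([1, 7, 14], [0, 14, 29])
germinated = sum(1 for _, flag in records if flag)
print(len(records), germinated)
```

With only the three quoted days this yields 100 records, 29 of them germinated; with the full daily counts, each seed&apos;s germination day is pinned down to within one day.&lt;br&gt;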
&lt;br&gt;
In Excel, I&apos;ve graphed the mean cumulative germination of the replicates for each seed type, producing a graph that &lt;a href=&quot;http://trenchfever.files.wordpress.com/2008/03/cumulative-civilian-and-service.jpg&quot;&gt;resembles this one&lt;/a&gt;, with three lines plotted (x-axis = time in days, y-axis = percent germinated).&lt;br&gt;
&lt;br&gt;
So is there a way to compare these different rates statistically? I can use Excel, Minitab, SPSS, and R.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.137228</guid>
	<pubDate>Wed, 04 Nov 2009 09:46:36 -0800</pubDate>
	<category>chart</category>
	<category>cumulative</category>
	<category>data</category>
	<category>excel</category>
	<category>graphs</category>
	<category>mathematics</category>
	<category>minitab</category>
	<category>r</category>
	<category>rates</category>
	<category>science</category>
	<category>spss</category>
	<category>statistics</category>
	<dc:creator>rollick</dc:creator>
	</item>
	<item>
	<title>BookSuggestionFilter: I need to learn R (and about statistics) in a hurry.</title>
	<link>http://ask.metafilter.com/136197/BookSuggestionFilter%2DI%2Dneed%2Dto%2Dlearn%2DR%2Dand%2Dabout%2Dstatistics%2Din%2Da%2Dhurry</link>	
	<description>BookSuggestionFilter: I need to learn about R (and statistical modeling) in a hurry. I need to learn R in a hurry. I&apos;m heading full steam into a new software development project at work where I will be working with some hard-core statistical modelers who are building models in R (as well as using other stuff like SAS).&lt;br&gt;
&lt;br&gt;
I have a degree in engineering, so I am OK with the math, but it has been over a decade since I did anything with statistics (and it was pretty rough even back then).&lt;br&gt;
&lt;br&gt;
I&apos;m looking for some book recommendations for the following:&lt;br&gt;
1) a refresher on general statistics and some introductory material with statistical modeling&lt;br&gt;
2) a book about R for someone who is an experienced software developer</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.136197</guid>
	<pubDate>Thu, 22 Oct 2009 19:17:29 -0800</pubDate>
	<category>books</category>
	<category>R</category>
	<category>statistics</category>
	<dc:creator>kenliu</dc:creator>
	</item>
	<item>
	<title>[R] you experienced with plot() and barplot() ?</title>
	<link>http://ask.metafilter.com/119110/R%2Dyou%2Dexperienced%2Dwith%2Dplot%2Dand%2Dbarplot</link>	
	<description>How do I graph these data using R&apos;s plot() and barplot() commands? I have data that I need to graph using R. Unfortunately, the examples I have found in books and on the web are too simplistic and do not give me much insight into the process. I have never used R before, and have been trying for hours and hours to figure out how to do what I need to do. I have also already read the previous posts about R here on Ask MeFi, but I am still stumped.&lt;br&gt;
&lt;br&gt;
I&apos;ve made a fake set of data that represents the structure of my real data (which you don&apos;t want to see, it&apos;s too confusing).&lt;br&gt;
&lt;br&gt;
&lt;a href=&quot;http://bengarland.com/r/sample_data.txt&quot;&gt;Here is my sample data.&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
Take a quick look at that. &lt;br&gt;
&lt;br&gt;
There is a main factor/category, Factor1, with values &quot;BOB&quot; and &quot;JILL&quot;, and a grouping within it called SubFactor with values &quot;Q&quot; and &quot;P&quot;. I also have blocked reps of everything, labeled A, B, C, D. My measurements are Measurement1, Measurement2, Measurement3, etc., which I will abbreviate as M1, M2, M3, etc.&lt;br&gt;
&lt;br&gt;
To cut to the chase, I made a few hand drawings of the graphs I need. Please note that the graphs are illustrative only, and do not correspond to the data above.&lt;br&gt;
&lt;br&gt;
&lt;strong&gt;Please walk me through this step-by-step and assume I know absolutely nothing about how to get my data into R, how to manipulate it as needed, and finally, how to produce the following graphs.&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;
&lt;strong&gt;ALSO&lt;/strong&gt; if any of you R graphing experts want to take me under your wing and help me learn more than I have outlined here (surely I will have future questions), please send me a MeMail.&lt;br&gt;
&lt;br&gt;
----------&lt;br&gt;
&lt;br&gt;
GRAPH 1...&lt;br&gt;
&lt;br&gt;
In the first, I want to make a bar graph of just &lt;em&gt;the average&lt;/em&gt; of the M1 measurements for each treatment, split into categories &quot;BOB&quot; and &quot;JILL&quot; with the grouped &quot;P&quot; and &quot;Q&quot; subfactors.&lt;br&gt;
&lt;br&gt;
So that we&apos;re on the same page here, the mean of BOB(Q) for M1 = (100+500+200+300)/4 = 275, for example.&lt;br&gt;
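&lt;br&gt;
The averaging behind Graph 1 is a group-by-and-mean step. This Python sketch uses only the four BOB(Q) M1 values quoted above; the row layout and the helper name group_means are assumptions, not the real file format. In R itself, aggregate() or tapply() perform the same grouping.&lt;br&gt;

```python
# Sketch of the Graph 1 aggregation: mean of one measurement for every
# (Factor1, SubFactor) combination. The rows are the four BOB/Q values of
# M1 quoted in the question; the dict-per-row layout is an assumption.
from collections import defaultdict

rows = [
    {"Factor1": "BOB", "SubFactor": "Q", "M1": 100},
    {"Factor1": "BOB", "SubFactor": "Q", "M1": 500},
    {"Factor1": "BOB", "SubFactor": "Q", "M1": 200},
    {"Factor1": "BOB", "SubFactor": "Q", "M1": 300},
]


def group_means(rows, measurement):
    """Return {(Factor1, SubFactor): mean of measurement} over all rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row["Factor1"], row["SubFactor"])
        sums[key] += row[measurement]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


print(group_means(rows, "M1"))
```

Adding the JILL and P rows from the full data set would give one mean per bar.&lt;br&gt;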
&lt;br&gt;
Here is my drawing for this: &lt;a href=&quot;http://bengarland.com/r/graph_sample3.jpg&quot;&gt;Bar Graph 1&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
----------&lt;br&gt;
&lt;br&gt;
GRAPH 2...&lt;br&gt;
&lt;br&gt;
In the second, I want a plot of the measurements for &quot;BOB&quot;, with M1, M2, M3, M4, and M5 on the X-axis. The lines will represent the &quot;P&quot; and &quot;Q&quot; subfactors.&lt;br&gt;
&lt;br&gt;
Here is my drawing for this: &lt;a href=&quot;http://bengarland.com/r/graph_sample1.jpg&quot;&gt;Plot 1&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
----------&lt;br&gt;
&lt;br&gt;
GRAPH 3...&lt;br&gt;
&lt;br&gt;
In another bar graph, I want to have M1, M2, M3, M4, M5 be the bars, and have them grouped by BOB (Q, P) and JILL (Q, P) on the x-axis. The Y-axis would show the means.&lt;br&gt;
&lt;br&gt;
Here is my drawing for this: &lt;a href=&quot;http://bengarland.com/r/graph_sample2.jpg&quot;&gt;Bar Graph 2&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
----------&lt;br&gt;
&lt;br&gt;
GRAPH 4...&lt;br&gt;
&lt;br&gt;
In the final plot, I want to graph only the mean measurements for &quot;BOB&quot; sub-level &quot;P&quot; &lt;em&gt;over time&lt;/em&gt;. Say that M1, M2, M3, M4, M5 were each taken at a distinct time T1, T2, T3, T4, T5, with values such as T1=15, T2=45, T3=55, T4=65, T5=70, and they need to be spaced properly on a timeline (i.e. the distance between T1 and T2 should be 30 units, T2 and T3 10 units, T3 and T4 10 units, T4 and T5 5 units). &lt;em&gt;I spaced them evenly in my drawing, so ignore that.&lt;/em&gt;&lt;br&gt;
&lt;br&gt;
Here is my drawing for this: &lt;a href=&quot;http://bengarland.com/r/graph_sample4.jpg&quot;&gt;Plot 2&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
----------&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
THANKS!!!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.119110</guid>
	<pubDate>Thu, 09 Apr 2009 16:39:39 -0800</pubDate>
	<category>barplot</category>
	<category>graphing</category>
	<category>plot</category>
	<category>r</category>
	<category>resolved</category>
	<category>statistics</category>
	<dc:creator>bengarland</dc:creator>
	</item>
	<item>
	<title>R.rrrrgh</title>
	<link>http://ask.metafilter.com/116762/Rrrrrgh</link>	
	<description>Are There Good Blogs About The Statistical Programming Language, &lt;strong&gt;R&lt;/strong&gt;? I recently found the &lt;a href=&quot;http://www.r-cookbook.com/&quot;&gt;R Cookbook&lt;/a&gt; site and have enjoyed the R code and tips offered. I was wondering if there are other blogs out there that discuss how to use R better.&lt;br&gt;
&lt;br&gt;
P.S. I am asking this on ask.metafilter because Googling &quot;R&quot; is supremely unhelpful.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.116762</guid>
	<pubDate>Sat, 14 Mar 2009 15:06:04 -0800</pubDate>
	<category>blogs</category>
	<category>programming</category>
	<category>R</category>
	<category>statistics</category>
	<dc:creator>chrisalbon</dc:creator>
	</item>
	<item>
	<title>How can I convert a SAS dataset into something readable by R?</title>
	<link>http://ask.metafilter.com/112469/How%2Dcan%2DI%2Dconvert%2Da%2DSAS%2Ddataset%2Dinto%2Dsomething%2Dreadable%2Dby%2DR</link>	
	<description>How to import a SAS dataset into R (with, unfortunately, one extra degree of difficulty...)? I have a sas7bdat file. I&apos;d like to import it into R (OS X). Unfortunately, I don&apos;t have access to SAS -- I&apos;m away from work, and the file&apos;s sender has gone home for the weekend. Any free means of converting the file into a format readable by R? Thanks!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.112469</guid>
	<pubDate>Fri, 23 Jan 2009 17:48:52 -0800</pubDate>
	<category>convert</category>
	<category>data</category>
	<category>r</category>
	<category>sas</category>
	<category>statistics</category>
	<dc:creator>docgonzo</dc:creator>
	</item>
	<item>
	<title>Elegant weapons, for a more civilized age</title>
	<link>http://ask.metafilter.com/105140/Elegant%2Dweapons%2Dfor%2Da%2Dmore%2Dcivilized%2Dage</link>	
	<description>Do Lisp and its dialects make sense for general-purpose scientific computing? I&apos;m thinking of learning some variant of Lisp as my next language, mostly out of masochism. My real-world computing needs are scientific/numerical, i.e. data manipulation, some statistics, lots of curve fitting and the like, with some data acquisition thrown in. So far I have been using C for the heavy stuff, Perl for the quick and dirty, and FORTRAN when I have to (I hang around with engineers).&lt;br&gt;
&lt;br&gt;
Is Lisp any use for such things? All the functional recursiveness seems pretty nifty, but I&apos;d like to pick up a tool I can actually use.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.105140</guid>
	<pubDate>Sat, 25 Oct 2008 02:35:07 -0800</pubDate>
	<category>lisp</category>
	<category>programminglanguage</category>
	<category>R</category>
	<category>scheme</category>
	<category>scientificcomputing</category>
	<dc:creator>ghost of a past number</dc:creator>
	</item>
	<item>
	<title>Visually exploring and representing survey data.</title>
	<link>http://ask.metafilter.com/90784/Visually%2Dexploring%2Dand%2Drepresenting%2Dsurvey%2Ddata</link>	
	<description>Best ways to visually explore a large survey data set? My advisor has advised me to explore my data set visually before diving in statistically. It is a large (N = 180,000+) survey data set comprised of individuals in over 80 countries. Most of the responses are categorical or dichotomous in nature, taking the form &quot;agree/disagree&quot; or &quot;yes/no/maybe.&quot; Some of them are Likert-style scales (1-5, Disagree-Agree). Many of the demographic variables are also categorical (for example, rather than asking income, &quot;income level&quot; is asked) but I do have a few continuous variables such as age. My dependent variable of interest is a scale composed of four survey items indexed to 100 (although the actual number of discrete values taken on the scale is rather low owing to the nature of the questions comprising the scale). &lt;br&gt;
&lt;br&gt;
What would be some interesting ways to visually explore this data? Obviously, scatterplots (even with jittering) are not the way to go because of the highly redundant and categorical nature of the data. I have a few boxplots that I&apos;ve generated (usually separating by gender or region). I am open to abstract suggestions or concrete suggestions using R or Stata.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.90784</guid>
	<pubDate>Wed, 07 May 2008 09:45:08 -0800</pubDate>
	<category>data</category>
	<category>graphs</category>
	<category>plots</category>
	<category>r</category>
	<category>research</category>
	<category>stata</category>
	<category>statistics</category>
	<category>visual</category>
	<dc:creator>proj</dc:creator>
	</item>
	<item>
	<title>RPy problems</title>
	<link>http://ask.metafilter.com/84618/RPy%2Dproblems</link>	
	<description>Stumped with RPy, need help badly! I&apos;d like to use &lt;a href=&quot;http://rpy.sourceforge.net/&quot;&gt;RPy&lt;/a&gt; to manipulate Python 2.3.4 data within R 2.4.1. My data are held in a dictionary of five lists: four of strings (&quot;utility&quot;, &quot;target&quot;, &quot;build&quot; and &quot;timeType&quot;) and one of floats (&quot;time&quot;).&lt;br&gt;
&lt;br&gt;
My problem is that I can&apos;t seem to build a data frame in R. I&apos;d like to, for example, group my data analysis by &apos;utility&apos;, &apos;target&apos; and &apos;build&apos; from calls made within Python.&lt;br&gt;
&lt;br&gt;
When I use &lt;code&gt;r.data_frame()&lt;/code&gt; to create a data frame object, the resulting object is not an R data frame. The following code prints &quot;False&quot; for the &lt;code&gt;r.is_data_frame()&lt;/code&gt; call:&lt;br&gt;
&lt;br&gt;
==========&lt;br&gt;
&lt;pre&gt;&lt;tt&gt;timeDataFrame = { &quot;utility&quot;:[],&lt;br&gt;
                  &quot;target&quot;:[],&lt;br&gt;
                  &quot;build&quot;:[],&lt;br&gt;
                  &quot;timeType&quot;:[],&lt;br&gt;
                  &quot;time&quot;:[] }&lt;br&gt;
&lt;br&gt;
for timeDataListObj in timeDataListArray:&lt;br&gt;
  for timeDataObj in timeDataListObj.timedata:&lt;br&gt;
    for timeDataType in timeDataTypes:&lt;br&gt;
      timeDataFrame[&quot;utility&quot;].append(timeDataListObj.utility)&lt;br&gt;
      timeDataFrame[&quot;target&quot;].append(timeDataListObj.target)&lt;br&gt;
      timeDataFrame[&quot;build&quot;].append(timeDataListObj.build)&lt;br&gt;
      timeDataFrame[&quot;timeType&quot;].append(timeDataType)&lt;br&gt;
      timeDataFrame[&quot;time&quot;].append(float(timeValue))&lt;br&gt;
&lt;br&gt;
df = r.data_frame(timeDataFrame[&quot;utility&quot;], \&lt;br&gt;
             timeDataFrame[&quot;target&quot;], \&lt;br&gt;
             timeDataFrame[&quot;build&quot;], \&lt;br&gt;
             timeDataFrame[&quot;timeType&quot;], \&lt;br&gt;
             timeDataFrame[&quot;time&quot;])&lt;br&gt;
r.print_(r.is_data_frame(df))&lt;/tt&gt;&lt;/pre&gt;&lt;br&gt;
==========&lt;br&gt;
&lt;br&gt;
Another problem I have is with syntax. For example, how can I perform a column reference like &lt;code&gt;df$target&lt;/code&gt; or &lt;code&gt;df$timeType&lt;/code&gt;?&lt;br&gt;
&lt;br&gt;
When I tried to do either:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;r.print_(df$target)&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
or&lt;br&gt;
&lt;br&gt;
&lt;code&gt;r.print_(df+r[&apos;$&apos;]+target)&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
or&lt;br&gt;
&lt;br&gt;
&lt;code&gt;r[&apos;print(df$target)&apos;]&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
I get syntax errors. Same with &lt;code&gt;r.split(df$target, df$build)&lt;/code&gt; and similar.&lt;br&gt;
&lt;br&gt;
The problem seems to come down to how these are interpreted. Either Python misinterprets the &lt;code&gt;r.print_()&lt;/code&gt; calls and complains about the $ reference, or when I use &lt;code&gt;r[&apos;print(df$target)&apos;]&lt;/code&gt;, the R interpreter doesn&apos;t have any knowledge of the variable &lt;code&gt;df&lt;/code&gt; and complains about non-existent variables.&lt;br&gt;
&lt;br&gt;
Any advice from seasoned Python/R/RPy users would be greatly appreciated. Thanks!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.84618</guid>
	<pubDate>Mon, 25 Feb 2008 21:05:44 -0800</pubDate>
	<category>program</category>
	<category>programming</category>
	<category>python</category>
	<category>R</category>
	<category>script</category>
	<category>scripting</category>
	<category>statistics</category>
	<category>syntax</category>
	<dc:creator>Blazecock Pileon</dc:creator>
	</item>
	<item>
	<title>R</title>
	<link>http://ask.metafilter.com/83510/R</link>	
	<description>I am looking for great books / guides for learning applied statistics and &lt;a href=&quot;http://www.r-project.org/&quot;&gt;R&lt;/a&gt; (at the same time). Any hivemind suggestions? I am a graduate student in political science. This summer I have 114 days of no work, no research, and no classes (yeah!). Along with studying for my comprehensive exams, I want to spend this time getting a good handle on applied statistics and &lt;a href=&quot;http://www.r-project.org/&quot;&gt;the program &quot;R&quot;&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
&lt;strong&gt;Little Background:&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;
I have a weak background in mathematics but a strong background (and proficiency) in programming.&lt;br&gt;
&lt;br&gt;
My first statistics classes were heavily &quot;math / theory&quot; oriented (proofs, formulas, etc...). I had trouble grasping it and really did not enjoy myself. &lt;br&gt;
&lt;br&gt;
However, in the more advanced classes we started using R and I loved it. To me, statistics in R is just enough of a programming language to learn and tinker in, something I am very comfortable with. I thoroughly enjoy spending time hacking code, so learning statistics through R seems like an interesting idea.&lt;br&gt;
&lt;br&gt;
This is all just a long-winded way of asking: &lt;strong&gt;Does anyone know of some great introductory, intermediate, or advanced books that teach applied statistics using R?&lt;/strong&gt;</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.83510</guid>
	<pubDate>Tue, 12 Feb 2008 16:58:14 -0800</pubDate>
	<category>R</category>
	<category>statistics</category>
	<dc:creator>chrisalbon</dc:creator>
	</item>
	<item>
	<title>What is this symbol that looks like an O with a small n or r in it?</title>
	<link>http://ask.metafilter.com/75266/What%2Dis%2Dthis%2Dsymbol%2Dthat%2Dlooks%2Dlike%2Dan%2DO%2Dwith%2Da%2Dsmall%2Dn%2Dor%2Dr%2Din%2Dit</link>	
	<description>What is &lt;a href=&quot;http://farm3.static.flickr.com/2392/1828408504_cb6f463079_o.jpg&quot;&gt;this logical/matrix/algorithm symbol&lt;/a&gt;? I&apos;m trying to implement an algorithm for manipulating matrices and I think I understand the equation described, but I don&apos;t know what the symbol that looks like a big O with a small n or r in it is called, or what it really means.&lt;br&gt;
&lt;br&gt;
The actual computation seems to be a calculation of the Minkowski distance with exactly i+1 links, and it relies on knowing the distance with i links. I&apos;ve never seen that symbol before and would like to know what it means.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.75266</guid>
	<pubDate>Fri, 02 Nov 2007 08:54:23 -0800</pubDate>
	<category>algorithm</category>
	<category>logic</category>
	<category>matrix</category>
	<category>n</category>
	<category>O</category>
	<category>r</category>
	<category>resolved</category>
	<category>symbol</category>
	<dc:creator>i love cheese</dc:creator>
	</item>
	<item>
	<title>Optimal output settings when generating images for use in PowerPoint</title>
	<link>http://ask.metafilter.com/73828/Optimal%2Doutput%2Dsettings%2Dwhen%2Dgenerating%2Dimages%2Dfor%2Duse%2Din%2DPowerPoint</link>	
	<description>What are the optimal output settings when generating images for use in PowerPoint? I am producing plots in &lt;a href=&quot;http://www.r-project.org/&quot;&gt;R&lt;/a&gt; for use in PowerPoint. I tried using Windows Metafile output, but it didn&apos;t work when I opened the file on a Mac. It looks like I am stuck with a raster image format, so I&apos;m using PNG. Various FAQ lists I have found suggest using 1024&#xd7;768 resolution.&lt;br&gt;
&lt;br&gt;
R will produce its own titles, so I don&apos;t want to mess with PowerPoint&apos;s slide templates, which also have very generous margins; I want to use as much of the projected area as possible for my plots. Clearly I have to stop somewhere, or I risk the edges of the plots being cut off, and since those edges hold the titles and axis labels, I&apos;d rather avoid that.&lt;br&gt;
&lt;br&gt;
So what&apos;s the minimum margin space I can use without having to worry that somewhere the stuff inside the margins will be clipped? A suggestion measured in pixels would be most helpful.&lt;br&gt;
&lt;br&gt;
I &lt;a href=&quot;http://groups.google.com/group/microsoft.public.powerpoint/browse_frm/thread/52218a4d232651b1&quot;&gt;previously asked&lt;/a&gt; this question on Microsoft&apos;s PowerPoint newsgroup, but no one felt able to give me a minimum number. I&apos;m hoping that, given the wealth of PowerPoint experience at MetaFilter, there will be some suggestions here.&lt;br&gt;
&lt;br&gt;
Please don&apos;t suggest using another software package instead of PowerPoint. I am well aware of the alternatives.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.73828</guid>
	<pubDate>Mon, 15 Oct 2007 05:22:53 -0800</pubDate>
	<category>margins</category>
	<category>powerpoint</category>
	<category>presentations</category>
	<category>R</category>
	<category>slides</category>
	<dc:creator>grouse</dc:creator>
	</item>
	<item>
	<title>GLM Variable Weighting</title>
	<link>http://ask.metafilter.com/58811/GLM%2DVariable%2DWeighting</link>	
	<description>Statistics: generalized linear models. What&apos;s being done with the variable weights in this R script I&apos;ve been handed down? I&apos;ve been handed an ancient and archaic R script that I&apos;m using to do some variable selection with generalized linear models. I&apos;m trying to work out what it&apos;s doing, and whether there&apos;s some &lt;i&gt;name&lt;/i&gt; for the method of variable weighting it&apos;s using.&lt;br&gt;
&lt;br&gt;
Essentially, it runs a GLM on a null model, on a model saturated with all variables, on a series of models that consist of the null model &lt;b&gt;with&lt;/b&gt; each variable (i.e. each variable on its own), and on a series of models that consist of the saturated model &lt;b&gt;without&lt;/b&gt; each variable.&lt;br&gt;
&lt;br&gt;
It calculates the % deviance explained for each of these models.&lt;br&gt;
&lt;br&gt;
It then calculates the &quot;change of deviance&quot; for each model. In the case of the &quot;null + variable&quot; models, this is the additional deviance explained by the model &lt;i&gt;with&lt;/i&gt; the variable, compared to the null model. In the case of the &quot;saturated - variable&quot; models, it&apos;s the decrease in deviance explained by the model without the variable, compared to the saturated model.&lt;br&gt;
&lt;br&gt;
In other words, it&apos;s basically working out how much difference each variable makes to the model. Adding a variable to the null model might produce a large increase in deviance explained, while removing that same variable from the saturated model might decrease the deviance explained only a little, because other variables still in the saturated model may carry the same information. Such a variable therefore has less unique explanatory power than a variable that contains information no other variable has.&lt;br&gt;
&lt;br&gt;
After it&apos;s calculated all these &quot;changes in deviance&quot;, it somehow converts them into a weight for each variable (this is where the R script loses me, and I can&apos;t figure out what it&apos;s doing). In any case, it gives output like this. The &quot;+&quot; and &quot;-&quot; signs indicate whether it&apos;s reporting the null model &lt;b&gt;+&lt;/b&gt; the variable, or the saturated model &lt;b&gt;-&lt;/b&gt; the variable:&lt;br&gt;
&lt;br&gt;
&lt;pre&gt;               pcdev    ch.dev&lt;br&gt;
+DWMeanMin   21.0156   21.0156&lt;br&gt;
+DWMeanTemp   8.7369    8.7369&lt;br&gt;
+DW3pmTemp    3.5637    3.5637&lt;br&gt;
-DWMeanMin   32.5202    2.3141&lt;br&gt;
-DWMeanTemp  32.9797    1.8546&lt;br&gt;
-DWMeanMax   33.0875    1.7468&lt;br&gt;
-DW3pmTemp   33.8545    0.9798&lt;br&gt;
+DWMeanMax    0.4372    0.4372&lt;br&gt;
&lt;br&gt;
Weights:&lt;br&gt;
 DWMeanMin DWMeanTemp  DW3pmTemp  DWMeanMax&lt;br&gt;
    0.5739     0.2606     0.1118     0.0537&lt;br&gt;
&lt;/pre&gt;&lt;br&gt;
This says that the variable &quot;DWMeanMin&quot; is really useful, containing a lot of information the other variables don&apos;t. &quot;DWMeanTemp&quot; might also be useful. &quot;DW3pmTemp&quot; is a long shot, and &quot;DWMeanMax&quot; contributes very little explanatory power.&lt;br&gt;
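&lt;br&gt;
The printed weights can be reproduced from the table alone: each variable&apos;s weight matches its two ch.dev values (added to the null model, dropped from the saturated model) summed, then divided by the total of all eight ch.dev values, e.g. (21.0156 + 2.3141) / 40.6487 = 0.5739. This rule is inferred from the numbers, not read from the R script, so the Python sketch below (the helper name variable_weights is hypothetical) is a hypothesis to check against the script.&lt;br&gt;

```python
# Hypothetical reconstruction: the weighting rule is inferred from the
# printed numbers, not taken from the original R script.

# Change in % deviance explained (ch.dev), copied from the table above.
# "added" = null model + variable, "dropped" = saturated model - variable.
added = {"DWMeanMin": 21.0156, "DWMeanTemp": 8.7369,
         "DW3pmTemp": 3.5637, "DWMeanMax": 0.4372}
dropped = {"DWMeanMin": 2.3141, "DWMeanTemp": 1.8546,
           "DW3pmTemp": 0.9798, "DWMeanMax": 1.7468}


def variable_weights(added, dropped):
    """Sum each variable's two changes in deviance, then normalize to 1."""
    totals = {name: added[name] + dropped[name] for name in added}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


weights = variable_weights(added, dropped)
for name, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {w:.4f}")
```

Run as is, it prints the same four values as the Weights row above (0.5739, 0.2606, 0.1118, 0.0537). If the script does compute this, each weight is simply that variable&apos;s share of the summed changes in deviance.&lt;br&gt;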
&lt;br&gt;
I understand the output. And it&apos;s useful.  I can post excerpts of the R script if required.  But what&apos;s this &lt;b&gt;method&lt;/b&gt; of variable selection called?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.58811</guid>
	<pubDate>Thu, 15 Mar 2007 21:24:04 -0800</pubDate>
	<category>generalized</category>
	<category>glm</category>
	<category>linear</category>
	<category>models</category>
	<category>r</category>
	<category>statistics</category>
	<dc:creator>Jimbob</dc:creator>
	</item>
	<item>
	<title>How can I import high-quality (ie vector) PDFs into Microsoft Word for OS X?</title>
	<link>http://ask.metafilter.com/57549/How%2Dcan%2DI%2Dimport%2Dhighquality%2Die%2Dvector%2DPDFs%2Dinto%2DMicrosoft%2DWord%2Dfor%2DOS%2DX</link>	
	<description>Why does Microsoft Word (OS X v.11) hate PDFs? Or: Please help me make this stats assignment just a little less frustrating. One of the (many!) advantages of R is that it produces crisp and clean plots and can export them as vector (i.e. PDF) files. Great.&lt;br&gt;
&lt;br&gt;
However, when I add them to my DOC file (using &quot;Insert &amp;gt; Picture &amp;gt; From file...&quot;), the resulting object in the text is a low-resolution picture (possibly a rendered GIF or JPG). Ugly (and, in complex plots, nearly unusable).&lt;br&gt;
&lt;br&gt;
How can I get Word to insert the proper high-quality PDF? I&apos;ve hunted around the web and Word and found nothing (neither &quot;Insert &amp;gt; File...&quot; nor &quot;Insert &amp;gt; Object...&quot; works). I suppose I could render the PDFs (e.g. through Photoshop) into something high-resolution, but that seems excessive (and would be a pain); nor do I want to lay out this multi-page assignment in InDesign (or some version of TeX!)&lt;br&gt;
&lt;br&gt;
Thanks!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.57549</guid>
	<pubDate>Fri, 23 Feb 2007 12:36:57 -0800</pubDate>
	<category>import</category>
	<category>MacOSX</category>
	<category>MicrosoftWord</category>
	<category>PDF</category>
	<category>R</category>
	<category>Word</category>
	<dc:creator>docgonzo</dc:creator>
	</item>
	<item>
	<title>Statistical analysis package for OS X needed.</title>
	<link>http://ask.metafilter.com/34541/Statistical%2Danalysis%2Dpackage%2Dfor%2DOS%2DX%2Dneeded</link>	
	<description>SPSS is fine but produces offensively ugly figures. R has nice, clean output but is otherwise inscrutable. Statistica is, alas, Windows-only. Mathematica isn&apos;t really it, either. What statistics package should I use for OS X? I am in the sciences and need to do basic statistical analysis up to and including regressions. Hopefully, I will be doing more and more statistical analysis -- if the gods of grad school align as I hope -- and need a package that is adequately open-ended to include newer techniques like Bayesian methods and MCMC.&lt;br&gt;
&lt;br&gt;
What should I use?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2006:site.34541</guid>
	<pubDate>Fri, 17 Mar 2006 08:18:51 -0800</pubDate>
	<category>biology</category>
	<category>mathematica</category>
	<category>OSX</category>
	<category>R</category>
	<category>SPSS</category>
	<category>statistica</category>
	<category>statistics</category>
	<dc:creator>docgonzo</dc:creator>
	</item>
	
	</channel>
</rss>

