Logarithm Crash Course
October 14, 2005 12:02 PM
I'm working on a data graphic with a linear scale horizontally but a logarithmic scale vertically. The y-axis (count scale) values range from 0 to 200, with the mode average in the lower third of that range. I'd like the graph to take up only a third of the size vertically that it currently does.
I can recognize a logarithmic plot when I see one, and I know what I want generally, but I don't have enough of a handle on the Math to actually calculate and plot such a scale myself... at least accurately.
I'd appreciate any help in understanding the Math. It's one thing to see a formula, and another to know how to use it. I'm also not 100% sure that this is the best way to handle such a graph, so alternate strategies or confirmation are welcome.
I need a better idea of the data...correct me if I'm wrong about this.
First of all, did you make this graph and have the ability to play with the data, or is it just a graphic?
What you have is count data which you have used to make a histogram. The x axis is composed of linear "bins". Because the counts have very high and very low values, you have log-transformed the counts. This means that you (or whoever made the graph) simply took the log of the y values (either ln, which is base e, or log, which is base 10). If it's the natural log (ln), you could take the base-10 log of the data instead and it would end up "shorter" (in fact, about 0.43 times the height of the ln-transformed data, since log10(y) = ln(y) / ln(10)).
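As a quick check of that 0.43 figure, here is a minimal sketch. The sample counts are made up; the only thing it relies on is that `Math.log` is the natural log and `Math.log10` is the base-10 log.

```javascript
// Compare natural-log and base-10-log values for a few made-up counts.
const counts = [2, 20, 200];

// log10(x) = ln(x) / ln(10), so the ratio is the same for every x.
const ratios = counts.map((x) => Math.log10(x) / Math.log(x));

console.log(ratios.map((r) => r.toFixed(4)));
// The constant itself: 1 / ln(10).
console.log((1 / Math.LN10).toFixed(4));
```

Switching bases only rescales the whole graph by a constant, so it changes the overall height but not the shape.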
I hope that helps you.
posted by nekton at 12:13 PM on October 14, 2005
KaleidaGraph will plot in a log scale without transforming the data. I'm not sure what you are using to plot. If you want, email me the data and I'll make a log plot. If you just take the log of the data, you will get the compression, but your y-axis scale will be linearly spaced and in log units.
posted by 445supermag at 12:29 PM on October 14, 2005
If some of the values are zero, you definitely don't want to use a logarithmic scale: zero becomes -∞ on a logarithmic scale.
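A small sketch of what that looks like in practice, assuming the heights are computed in JavaScript (the factor of 20 is just an illustrative scaling value):

```javascript
// The log of zero is negative infinity in every base.
console.log(Math.log(0));    // -Infinity
console.log(Math.log10(0));  // -Infinity
console.log(Math.log(1));    // 0

// A bar height computed from a zero count is not a usable pixel value.
const height = Math.log(0) * 20;
console.log(Number.isFinite(height)); // false
```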
posted by DevilsAdvocate at 12:34 PM on October 14, 2005
I should clarify that the graph I posted doesn't yet have a logarithmic y-scale. Right now, both x and y are linear. I'd like y to be logarithmic, but I'm not sure it's even possible to do accurately at 72DPI.
Also, the data isn't real; it's representative of what it could be. The range of counts in each "bin" could be between 0 and 200. The gray bars would range between 0 and 60, with the white bars extending all the way to 200.
posted by Jeff Howard at 12:42 PM on October 14, 2005
If some of the values are zero, you definitely don't want to use a logarithmic scale
So if after the conversion, 0=0 and 200=60, with most counts in the lower third of that range, what other kinds of transformations are possible?
posted by Jeff Howard at 12:49 PM on October 14, 2005
One way to visualize this is use base-2. Assume 256 is the top of the range (instead of 200); 256 is 2 to the 8th power.
Visualize the y-axis as being divided into 8 parts (into eighths). Then a point with a non-log value of 2 is 1/8th of the way up the scale, when plotted logarithmically (versus 2/256th up a regular y-axis), a non-log value of 8 is 3/8th up (versus 8/256ths), and a non-log value of 128 is 7/8ths up (versus 1/2).
In a log scale, the zero point (on the y-axis) translates into a regular (that is, non-log) value of 1. (Any nonzero quantity raised to the zero power equals one.) So yes, there are issues with values (before transformation) of zero.
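The base-2 picture above can be sketched in a few lines. The function names `logFraction` and `linearFraction` are made up for illustration; 256 is the assumed top of the range from the example.

```javascript
// Fraction of the axis height for a value on a base-2 log scale
// whose top is `max` (256 in the example above).
function logFraction(value, max = 256) {
  return Math.log2(value) / Math.log2(max);
}

// The same value on an ordinary linear axis, for comparison.
function linearFraction(value, max = 256) {
  return value / max;
}

for (const v of [1, 2, 8, 128, 256]) {
  console.log(v, logFraction(v).toFixed(3), linearFraction(v).toFixed(3));
}
```

A value of 1 lands at the bottom of the log axis (fraction 0), which is why there is no place on it for a value of 0.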
posted by WestCoaster at 12:57 PM on October 14, 2005
A logarithmic bar plot doesn't sound like such a good idea. Since it's not common, it could be perceived as misleading. Especially so when the white extends from the grey. The point of that is that grey and white should somehow be compared, right? In a log plot, that comparison is difficult without looking up the numeric values on the axis and comparing those. Also, what DevilsAdvocate said.
posted by springload at 1:07 PM on October 14, 2005
that comparison is difficult without looking up the numeric values
Good point. These graphs are also going to exist only as 72DPI images. I'm starting to realize that working at such a chunky resolution isn't really compatible with the fine degree of resolution a logarithm requires.
posted by Jeff Howard at 2:02 PM on October 14, 2005
If you are doing statistical analysis, be aware there are issues in transforming your data, especially where log-transformation is concerned.
posted by Rothko at 2:16 PM on October 14, 2005
You should be choosing the most appropriate way of presenting the data. By doing that you are maximizing your use of the available resolution as well as page area. Whether it is logarithmic or linear or anything else, you are choosing the scale because you want to present the data in that manner. Log gives greater overall range and good resolution for small values at the cost of resolution for large values.
I'm not sure whether I agree with springload here; for example, I believe volume in a graphic equalizer is often displayed as a log bar graph with left and right channel overlapping. On the other hand, I think it could be very confusing for other types of data...
Anyway, the easiest way I know to do it is:
- Arrange your x and y values as columns in a table.
- Add a third column for log(y).
- Plot your graph with the values in the log(y) column instead of y, treating the log(y) values as if they were linear.
- Choose the heights on the y-axis that you want to mark, take the inverse log of each height, and write that value on the axis.
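The steps above can be sketched as follows. This is only an illustration: the data, the base-10 log, and the `pxPerUnit` scaling are all assumptions, and "plotting" here just means computing pixel heights.

```javascript
// Hypothetical sample data: x is the bin, y is the count (all counts > 0).
const table = [
  { x: 1, y: 3 },
  { x: 2, y: 40 },
  { x: 3, y: 200 },
];

// Steps 1-2: add a third column holding log10(y).
const rows = table.map((row) => ({ ...row, logY: Math.log10(row.y) }));

// Step 3: treat logY as if it were linear. pxPerUnit is an arbitrary choice.
const pxPerUnit = 50;
const bars = rows.map((row) => ({ x: row.x, heightPx: row.logY * pxPerUnit }));

// Step 4: pick tick heights in log units, then label each tick with
// the inverse log (10 ** height) so the axis reads in counts.
const ticks = [0, 1, 2].map((h) => ({ heightPx: h * pxPerUnit, label: 10 ** h }));

console.log(bars);
console.log(ticks);
```

The tick marks end up evenly spaced in pixels while their labels read 1, 10, 100, which is what makes the axis look logarithmic.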
posted by Chuckles at 2:31 PM on October 14, 2005
Okay, I think I've got it. I just take the log of the count and multiply it by a linear scaling factor in order to get my desired height. That second part was what I didn't understand... I thought it should be more complicated somehow.
Here's the code:
Math.ceil(Math.log( i ) * 20);
Where "i" is a number between 1 and 200, and 20 is the scaling factor that gets my target graph to be the desired overall height (in this case 120px).
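For anyone adapting this, here is a small sketch around the same formula. `barHeight` wraps the line above; `factorFor` is a hypothetical helper, not part of the original code. Since `Math.log` is the natural log, a factor of 20 puts a count of 200 at 106px and a count of 1 at 0px, so the graph fits inside 120px rather than filling it exactly.

```javascript
// Bar height in pixels, as in the snippet above (Math.log is base e).
function barHeight(count, factor = 20) {
  return Math.ceil(Math.log(count) * factor);
}

// Hypothetical helper: the factor that would make maxCount come out
// at targetPx before rounding.
function factorFor(maxCount, targetPx) {
  return targetPx / Math.log(maxCount);
}

console.log(barHeight(1));    // 0
console.log(barHeight(200));  // 106
console.log(factorFor(200, 120).toFixed(2));
console.log((Math.log(200) * factorFor(200, 120)).toFixed(1));
```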
Here's a basic implementation:
Demo
posted by Jeff Howard at 6:46 PM on October 14, 2005
A log scale cannot include zero. Assuming these are histogram bins, that is a problem for you; you'll need to change any 0 into something like 0.001 which will come out as -3 on a log scale and therefore off the bottom of the graph.
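One way to sketch that substitution, assuming a base-10 scale and an axis whose bottom sits at a count of 1 (both assumptions; `safeLog10` and `FLOOR` are made-up names):

```javascript
// Smallest value we are willing to plot; 0.001 is the figure suggested above.
const FLOOR = 0.001;

// Replace zero (or anything below the floor) before taking the log.
function safeLog10(count) {
  return Math.log10(Math.max(count, FLOOR));
}

// Hypothetical bin counts, including an empty bin.
const binCounts = [0, 1, 10, 200];
console.log(binCounts.map(safeLog10));

// If the axis starts at log10(1) = 0, anything below that is drawn
// with no height, i.e. it falls off the bottom of the graph.
const visible = binCounts.map((c) => Math.max(safeLog10(c), 0));
console.log(visible);
```

Note that with this axis an empty bin and a bin with a count of 1 both end up with zero height.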
You should probably only use semilog-y if the y scale has some log/multiplicative basis to it. See how you've got the white bars above the grey bars? In linear, the length of the white bar shows the difference (i.e. subtraction) between the white and grey values. In logarithmic, the length of the additional white bar will reflect the ratio (i.e. division) of the white and grey values.
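A minimal sketch of that difference, using two made-up grey/white pairs that have the same ratio but very different differences:

```javascript
// Hypothetical (grey, white) pairs: same 2x ratio, different differences.
const pairs = [
  { grey: 10, white: 20 },
  { grey: 100, white: 200 },
];

const extras = pairs.map(({ grey, white }) => ({
  // Linear axis: the extra white length is the difference.
  linear: white - grey,
  // Log axis: log(white) - log(grey) = log(white / grey).
  log: Math.log10(white) - Math.log10(grey),
}));

console.log(extras);
```

On the linear axis the second white segment is ten times longer than the first; on the log axis the two are the same length.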
Is this what you really want and (just as important), will your audience understand this?
posted by polyglot at 11:32 PM on October 14, 2005
If the problem is that you have 2 in one bin and 200 in another, why not rejigger the bins so that the smallest bin has ~50?
Or -- and you'd probably need to edit in a graphics editor for this -- just create a linear histogram and then edit out an appropriately wide middle section of the graph to bring the height down? This is legitimate as long as you make it VERY VERY VERY SUPER OBVIOUS that you've done it -- a simple way would be to use a shaded background and leave a layer of whitespace at the breakpoint.
posted by ROU_Xenophobe at 9:45 AM on October 15, 2005
This thread is closed to new comments.
posted by Jeff Howard at 12:03 PM on October 14, 2005