Help me understand this Cacti graph re: cooling & thermal sensors
December 8, 2009 8:28 AM

Trying to figure out if I'm cool with my computer's cooling. What does this Cacti graph mean?

OK here's the graph of the thermal sensors on my standard ATX machine on a weekly readout. It has four hard drives in it. Prior to week five, it had a rear outtake fan and where the side grate is, a cone/funnel (like this dude).

Between weeks 5 and 7, the computer had a rear outtake fan plus a side intake fan in place of the funnel. However, the side fan was pretty loud and annoying so last night I took that out, and what you see past week 7 is just the rear outtake fan and an unplugged side fan on the side grate.

How do the numbers look? What's the best way to interpret this Cacti graph? Thanks!
posted by xmutex to Technology (5 answers total)
 
I assume that Temp3 is your CPU. 65C is pretty hot, around the maximum recommended temperature for most Core 2 Duo CPUs. I'd find a way to get a side fan back in place. A quieter fan (I like SilenX and Scythe's quiet fans) along with some rubber fan mounts shouldn't raise your noise levels significantly.
posted by zsazsa at 8:46 AM on December 8, 2009


I'd interpret it as follows:

You started that log with the CPU under heavy load, and it runs rather hotter than I would personally feel comfortable with (though in my experience, you have at least a 50% chance that the numbers reported require some unknown scaling factor to convert to actual temperatures. Case in point: do you really have an ambient of 29C?).

At time 5.2, your CPU load dropped off and everything almost instantly went back to normal - That suggests you don't have a critical problem with your heatsink (such as a huge air-gap in the thermal compound).

The part of it that would worry me happens at 7.25 (not 7.45) - Guessing that temp2 corresponds to your HDD, it really shouldn't go up that fast, nor should the ambient - Did something possibly block the case's vents at that point?

Otherwise, it looks like a fairly normal graph. If that represents your real CPU temp I might worry, but since it holds steady below 70C, I don't think you need to worry about it going up in smoke, at least.
posted by pla at 9:14 AM on December 8, 2009


Response by poster: Thanks! I've ordered a SilenX 80mm fan to replace the whiny one I have.

Is there some way to figure out what parts of the system temp1, temp2, and temp3 correspond to?
posted by xmutex at 5:31 PM on December 8, 2009


Is there some way to figure out what parts of the system temp1, temp2, and temp3 correspond to?

Well, we can say right off the bat that Temp3 corresponds to CPU. To prove this, zoom the graph in as far as you can and start a CPU-intensive app; Temp3 should shoot up to near its max almost instantly (within a few seconds). On stopping that same app, the temperature should drop right back down to near ambient, again within a few seconds.

We can almost safely say that Temp2 gives you the ambient temperature inside the case (sorry, I called temp2 the HDD previously; read that as temp1). To test that, block your case's ventilation (sticking it in a cardboard box only slightly bigger than the case will do well) for 15 minutes, not under load. You should see that line slowly creep up over the whole time (watch it to make sure it doesn't rise too fast, and obviously stop the test if it gets way too high, around 35-40C).

Now the last one, Temp1, gets a bit tricky. It could mean your HDD (but which one?), or it could mean nothing at all. Easiest way to tell which HDD (if any)... Let the system idle so all the graph lines drop basically to ambient. Now defragment a drive. Repeat for all four drives... Did any of them obviously have Temp1 ramping up separate from CPU and ambient? If so, you know your number; if not... Well, like I said, it could literally mean nothing - My own board has three temperature channels, and shows CPU, ambient, and ambient ±2 (it drifts randomly over time, possibly just a second ambient sensor somewhere else on the board).

A possibly easier approach, if you run Windows: grab a copy of HWMonitor. It doesn't do graphs or anything fancy, but it does give a nice clean list of all the temperatures (and a few other things like voltages and fan speeds) it can find in your system. This includes system-board temps, per-core CPU temps, and most usefully, per-HDD temps. It won't really help you profile your system over time, but it gives a great and easily-readable snapshot you can use as a reference.
posted by pla at 7:18 AM on December 9, 2009


Best answer: here's my output for all of lm_sensors:

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp: +22.0°C
Core0 Temp: +23.0°C
Core1 Temp: +25.0°C
Core1 Temp: +25.0°C

it8718-isa-0228
Adapter: ISA adapter
[...snipped...]
temp1: +30.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +23.0°C (low = +127.0°C, high = +127.0°C) sensor = thermal diode
temp3: +63.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
cpu0_vid: +1.550 V

It makes me think that k8temp is a more accurate depiction of my cpu temps, and it8718 isn't... could that be right?


Not so much "more accurate" as simply reading from a different place and requiring some scaling factor - Though as I said before, keep in mind that any of those numbers could mean absolutely nothing but random garbage.

So... Based on your earlier graph, I still suspect that temp3 refers to your CPU - But measured on the motherboard, not inside the CPU. The "Core" temps, when available, tend to give you dead-on accurate numbers. So temp3 needs a scaling factor, and will tend to lag the core readings by a few seconds.

The above info also strongly suggests that neither temp1 nor temp2 refer to your HDDs... Most likely temp2 refers to ambient, and temp1 measures at somewhere likely to get warm (but not CPU-warm), such as your chipset or an integrated GPU (if you have one).
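
If you want to put a number on the gap between the two chips rather than eyeball it, here's a minimal Python sketch that parses `sensors`-style text and compares temp3 against the hottest core. The function names, and the assumption that each chip's block is separated by a blank line, are mine and not anything lm_sensors promises; the sample text is just the output quoted above with the snipped lines left out.

```python
import re

# Sample copied from the `sensors` output quoted earlier in this thread.
SAMPLE = """\
k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp: +22.0°C
Core0 Temp: +23.0°C
Core1 Temp: +25.0°C
Core1 Temp: +25.0°C

it8718-isa-0228
Adapter: ISA adapter
temp1: +30.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +23.0°C (low = +127.0°C, high = +127.0°C) sensor = thermal diode
temp3: +63.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
cpu0_vid: +1.550 V
"""

# "label: +12.3°C ..." -> label and the first temperature on the line.
READING = re.compile(r"^(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)\s*°C")


def parse_sensors(text):
    """Return {chip_name: [(label, temp_c), ...]} from `sensors`-style text."""
    chips = {}
    chip = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            chip = None          # blank line ends the current chip block
            continue
        if chip is None:
            chip = line          # first line of a block is the chip name
            chips[chip] = []
            continue
        match = READING.match(line)
        if match:                # skips "Adapter:" and voltage lines
            chips[chip].append(
                (match.group("label").strip(), float(match.group("temp")))
            )
    return chips


def board_vs_core_offset(chips, core_chip="k8temp-pci-00c3",
                         board_chip="it8718-isa-0228", board_label="temp3"):
    """Degrees by which the board sensor reads above the hottest core."""
    hottest_core = max(temp for _, temp in chips[core_chip])
    board = dict(chips[board_chip])[board_label]
    return board - hottest_core


if __name__ == "__main__":
    print(board_vs_core_offset(parse_sensors(SAMPLE)))
```

One idle reading only gives you a difference, not a scaling factor, so treat the result as a sanity check and repeat it under load before trusting it.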

So then, for your HDDs... For each drive you have, run:
smartctl -a /dev/hda | grep -i "temp"
Substituting the appropriate device name for /dev/hda (you can get these as the first column of the output to "df", if you don't know them offhand).

That will give a line that looks like:
194 Temperature_Celsius 0x0022 029 042 000 Old_age Always - 29 (Lifetime Min/Max 0/19)
From that line, the fourth column (VALUE) gives the current temperature (29C), and the fifth (WORST) gives you the worst recorded value (42C). On this drive the raw value after the dash (29) agrees with the fourth column; if the two ever disagree, the raw value is usually the direct reading. Note that not all drives have a temperature available via SMART reporting, not all drives support SMART at all (though anything less than 5 years old should), and not all drives will give you a number directly interpretable as degrees Celsius (some Samsung drives, for example, report 10x the temperature). But most modern drives will give a useful number.
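
If you have several drives and don't want to read those rows by hand, here's a rough Python sketch that pulls the same columns out of a SMART attribute row. The function names are made up for illustration, and `drive_temperatures` assumes `smartctl` is on your PATH and that you have permission to query the device (usually root); the column positions follow the row shown above.

```python
import subprocess


def parse_smart_temperature(line):
    """Parse one SMART attribute row; return None unless it is a temperature row."""
    fields = line.split()
    # Attribute rows start with a numeric ID and have at least 10 columns:
    # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    if "temp" not in fields[1].lower():   # same idea as grep -i "temp"
        return None
    try:
        return {
            "attribute": fields[1],
            "value": int(fields[3]),      # 4th column
            "worst": int(fields[4]),      # 5th column
            "raw": int(fields[9]),        # first token of the raw value
        }
    except ValueError:
        return None                       # columns were not plain integers


def drive_temperatures(device):
    """Run smartctl on one device and return its temperature rows (may be empty)."""
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True
    )
    rows = (parse_smart_temperature(line) for line in result.stdout.splitlines())
    return [row for row in rows if row is not None]
```

Something like `drive_temperatures("/dev/hda")` for each of your device names would then give you one small dict per temperature attribute the drive reports. The caveats above still apply: a drive that reports 10x the temperature will come through as-is.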

Now, to finally get back to your original concern... Peg the CPU for a few minutes, and dump lm_sensors again. The "Core" values should give you a good idea of whether or not you have a cooling problem, but based on the difference between it8718 (which your graph seems to have used as its data source) and k8temp, I think you'll likely see pretty reasonable temperatures.
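
If you don't have a handy CPU-intensive app, here's one way to sketch that test in Python: take a dump at idle, start one busy-loop process per core, wait, and take a second dump while the load is still running. The function names and defaults are my own assumptions; the only external thing it relies on is the `sensors` command from lm_sensors being installed.

```python
import os
import subprocess
import sys
import time


def read_sensors():
    """Return the raw text printed by the lm_sensors `sensors` command."""
    done = subprocess.run(["sensors"], capture_output=True, text=True, check=True)
    return done.stdout


def dump_under_load(seconds=180, workers=os.cpu_count() or 1, read=read_sensors):
    """Return (idle_dump, loaded_dump); the second is taken while the CPU is pegged."""
    idle = read()
    # Each worker is a separate Python process spinning in an empty loop.
    procs = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(workers)
    ]
    try:
        time.sleep(seconds)
        loaded = read()          # still under load at this point
    finally:
        for proc in procs:       # always stop the busy loops, even on Ctrl-C
            proc.terminate()
        for proc in procs:
            proc.wait()
    return idle, loaded


# Example (takes about three minutes and heats the CPU, so run it deliberately):
#   idle, loaded = dump_under_load()
#   print(idle)
#   print(loaded)
```

Taking the second dump before stopping the load matters, since the core readings fall back within seconds once the CPU goes idle. Keep an eye on the numbers while it runs and kill it if anything looks alarming.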
posted by pla at 7:07 AM on December 10, 2009

