Problem with computer
August 13, 2004 7:07 PM

My computer is driving me nuts! Something about it is extremely broken, because it's wildly unstable.

It's new, assembled by AccessMicro. Athlon XP 3000, lots o' memory,
GeForce FX5200 video card. I'm running Gentoo linux with a 2.6.6
vanilla kernel. It's given (especially when APIC is enabled) to
segfaults when doing processor-intensive things (especially in gcc
when compiling, say, Mozilla or OpenOffice), and when I've left it on
overnight I've woken up to kernel panics (unrecorded in
/var/log/messages, though)--BUT, if I leave it on all day, no
problem! Seriously, it's consistently done this at night and not
during the day. Also, I've noticed that other processes
suffer far more when the CPU is being used heavily than they did
on my old computer--xmms cuts in and out, for example.

But that's not the main problem! The main problem is, I
think, to do with the video card, or its driver, or
something. X (true of both X.org and XFree86) will seemingly randomly
crash, as will X apps (firefox is peculiarly prone to this, for some
reason (someone on the gentoo forums reported problems with firefox
and fluxbox, which I use, so maybe that's an issue, but since I have
broader problems I'm skeptical that it's the sole cause)). Or if X
doesn't crash, it will freeze, and the keyboard and sometimes also the
mouse stop responding. Sometimes when this happens, the screen gets
all messed up too. In these cases, hard reboot; it sucks.

One thing that happened recently when I had to reboot was that the
console came up in reverse video mode with odd colors; I opened the
case, the video card was pretty warm (it has a fan on it), let it sit
still for a bit, and restarted again still with the case open and the
screen was back to normal. Could all those problems be caused by an
overheated video card (and if so, given that there's already a fan on
it, and three other fans to boot, how can I cool it down more, other
than just leaving the case open?).

I've run memtest86 and the memory seems ok. I really hope that this
is, all unbeknownst to me, trivially solved, because I recently had to
wait 10 days for a replacement CPU fan and it was harrowing.
Harrowing, I tell you. Though at least it's still under warranty.
posted by kenko to Computers & Internet (20 answers total)
 
Response by poster: So, my question is, what the shit's wrong and how can I fix it?
posted by kenko at 7:08 PM on August 13, 2004


Install Windows.

Or, more seriously, run a more stable version of Linux than Gentoo. Debian, for example.
posted by reklaw at 7:14 PM on August 13, 2004


What happens when you leave it on overnight only in console mode?
posted by cmonkey at 7:34 PM on August 13, 2004


  • Start with running a stock kernel.org or -ac kernel. Next, record panics (use a serial console or pen and paper), and report them to the Linux kernel mailing list.
  • Try using an unaccelerated video driver.
  • If crashes are really more frequent at specific times, it might be a lack-of-sufficient-power issue. Does your power supply unit have enough capacity?
For future reference, this is AskMeFi, not alt.pieceofsoftware.support. There are better channels for getting support for these kinds of things.
posted by fvw at 7:39 PM on August 13, 2004


For future reference, this is AskMeFi, not alt.pieceofsoftware.support.

I think kenko's is a perfectly good question. There are enough mac/windows people asking "How can I make my iTunes wash my car for me", this adds some spice.
posted by cmonkey at 7:45 PM on August 13, 2004


Can you send the machine back? There are a lot of things it could be (heat, software, dodgy video card, dodgy CPU, etc.), and finding the specific problem yourself will be both frustrating and time consuming. Unless you're looking on this as a personal trial, I'd try to get the vendor to do the work.

If you want to do it yourself, what you need to do is list out the possible problems, and then isolate each one, test, and repeat.

For heat: check your temperatures in the BIOS and make sure nothing is running hotter than it should be. Also, make sure that all your fans are working, especially on your processor and video card.
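
Besides the BIOS screen, temperatures can sometimes be read while the machine is actually under load. What follows is only a minimal sketch, assuming a Linux kernel that exposes thermal zones under /sys/class/thermal (values in thousandths of a degree Celsius); older kernels and many boards need lm-sensors or the BIOS instead. The function name check_temps and the 70-degree limit in the usage line are illustrative assumptions, not vendor figures.

```shell
# check_temps BASE_DIR LIMIT_C
# Hypothetical helper: print each thermal zone's temperature and flag
# any zone at or above LIMIT_C degrees Celsius.
check_temps() {
    base="$1"
    limit_c="$2"
    for f in "$base"/thermal_zone*/temp; do
        # Skip when the glob matched nothing or the file is unreadable
        [ -r "$f" ] || continue
        milli=$(cat "$f")
        # sysfs reports millidegrees Celsius; convert to whole degrees
        celsius=$((milli / 1000))
        if [ "$celsius" -ge "$limit_c" ]; then
            echo "HOT ${celsius}C $f"
        else
            echo "ok ${celsius}C $f"
        fi
    done
}
```

Something like "check_temps /sys/class/thermal 70" run during a big compile would show whether anything climbs as the load goes up. Look up the real limits for your own CPU and board.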

For video: check the software by up- or downgrading drivers. Swap out your video card if possible to check hardware.

For software: run another OS. Knoppix is nice here -- you can run it off the disk, so you don't have to overwrite your current installation.

Sorry I can't be more specific, and best of luck.
posted by amery at 7:48 PM on August 13, 2004


Response by poster: cmonkey: that happens when X is running, but not displayed (startx, blah, followed by ctrl-alt-f2). I'll see what happens when it's not running at all.
posted by kenko at 7:54 PM on August 13, 2004


Actually, amery has a good idea: try booting Knoppix and leaving it overnight to see what happens.
posted by cmonkey at 7:57 PM on August 13, 2004


Isn't this likely related to some cron job that happens at night?
posted by bingo at 8:34 PM on August 13, 2004


Response by poster: That's possible, I guess, but I just ran the only two programs invoked by cron nightly (makewhatis and updatedb), and survived.
posted by kenko at 8:43 PM on August 13, 2004


Sounds like it's connected to heavy system load, then--I dunno about makewhatis, but updatedb is pretty disk-intensive, isn't it?--so as a first guess I'd check for cooling problems.
posted by arto at 9:09 PM on August 13, 2004


Sounds like a hardware problem to me. Possibly the video card. It's probably not temperature related, if it fails at night. Problem happened in console mode, so changing the X driver isn't likely to help. I'd try swapping in a different video card (preferably an expendable one), then the power supply, before fiddling with the software.
posted by sfenders at 9:32 PM on August 13, 2004


... however, I'm the kind of guy who has plenty of spare video cards lying around. If you want to exhaust all possibilities for a software fix, maybe it's related to disk load. updatedb is way more disk- than cpu-intensive. So, try changing whatever BIOS options you have for IDE, such as DMA transfer. Reversed video on the console seems an unlikely failure mode for that, but it's not completely impossible.
posted by sfenders at 10:13 PM on August 13, 2004


If crashes are really more frequent at specific times, it might be a lack-of-sufficient-power issue. Does your power supply unit have enough capacity?

-Seconded.

Before going any further, check your warranty.
posted by Keyser Soze at 10:17 PM on August 13, 2004


Response by poster: Just out of curiosity, how are people making the connection between specific times of crashes and the power supply? I don't doubt that it's relevant, I just never would have arrived there myself.
posted by kenko at 6:49 AM on August 14, 2004


fvw, this is AskMe, not shit-on-me 'filter. I believe when I asked a n00b Linux question last, you suggested I was horribly underqualified for what I was attempting, which may be true, but how else to learn? People come here to learn, so answers should come from people who are actually interested in teaching.
posted by yerfatma at 7:45 AM on August 14, 2004


Mainly, we're all just guessing wildly according to our prejudices. Some guesses are more obviously wrong than others, but really there isn't enough information to go on. Power supplies often fail, and trying a new one is easy, so...

To find out whether it crashes while some cron job is running at night, leave it running "while true; do uptime >> load.log; sleep 60; done" or something.
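
The logging loop above can be stretched into something a bit more defensive. This is a rough sketch, not a tested diagnostic tool: the function names log_load_sample and log_load are made up for illustration, and the sync call is an added assumption, there so that the last sample reaches the disk before a hard freeze (the original post notes that the panics never made it into /var/log/messages).

```shell
# log_load_sample LOGFILE
# Hypothetical helper: append one timestamped load sample to LOGFILE.
log_load_sample() {
    logfile="$1"
    # uptime reports the load averages; date adds an unambiguous timestamp
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(uptime)" >> "$logfile"
    # Flush buffers so the sample survives a crash right afterwards
    sync
}

# log_load LOGFILE INTERVAL_SECONDS COUNT
# Take COUNT samples, INTERVAL_SECONDS apart.
log_load() {
    logfile="$1"
    interval="$2"
    count="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        log_load_sample "$logfile"
        i=$((i + 1))
        # No need to sleep after the final sample
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For an overnight run, "log_load load.log 60 720" would take one sample a minute for twelve hours. After a crash, the last line of load.log gives the time to compare against the cron schedule.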

Here's what I would do if I still did this stuff for a living:

0: make some effort to find one reliable way to cause the problem on demand, say by repeatedly starting X until it crashes, or running x11perf while doing heavy disk IO, or whatever. Maybe that will reveal more information about the problem, but if not...

1: boot from a CD or floppy and see if you can get it to fail in a similar way, with and without using your hard drive.

2: fool around with all the settings in the BIOS, setting them all to the most conservative.

3: Just because it's easy to do, find out if it's temperature related. Not that it will help all that much to know this, if it fails at normal operating temp. To do this, run it in a cool place with the case open and a big fan blowing on it after it's been room-temp for a while. Then, run it in a warm place with a towel over it. In ancient times, if it worked when cold, we'd look for broken traces and such that might fail on heat-expansion. But that isn't so easy these days.

4: swap out the boards in the system one-by-one, starting with the video card. If there are things like a NIC you can just remove, remove them. If you've more than one memory module, and the system can run without one then remove one at a time, just in case.

5: try replacing the power supply, and maybe the hard drive.

6: if none of that worked, repairing it is probably going to be complicated and expensive. If you're completely insane, this is the point at which you would get out your motherboard schematic and scope.
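
Step 0 in the list above (finding a way to cause the problem on demand) could be sketched roughly like this. The helper name run_until_fail is hypothetical, and which command to repeat is up to you. It only catches failures that leave the machine running, such as a segfaulting gcc; a hard freeze will simply stop the loop.

```shell
# run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: repeat COMMAND until it exits non-zero or
# MAX_ATTEMPTS is reached, and report which attempt failed.
run_until_fail() {
    max="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            attempt=$((attempt + 1))
        else
            # In the else branch, $? still holds the command's exit status
            status=$?
            echo "attempt $attempt failed with exit status $status"
            return 1
        fi
    done
    echo "no failure in $max attempts"
    return 0
}
```

For example, inside some source tree, "run_until_fail 20 sh -c 'make clean && make'" repeats a full build up to twenty times. If the failing attempt number keeps changing while the input stays the same, that points more toward flaky hardware than toward a software bug.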
posted by sfenders at 7:49 AM on August 14, 2004


I have no idea why so many of you are going on and on about hardware when it's far more likely to be an operating system/software issue. Try everything there first before you start futzing with the hardware.
posted by reklaw at 7:48 AM on August 15, 2004


"The console came up in reverse video mode with odd colors" at boot. Various problems occur seemingly at random in all kinds of apparently unrelated software. The kernel in use is known to be stable for almost everyone.

I'd say odds are around 95% that it's a hardware problem.
posted by sfenders at 8:41 AM on August 15, 2004


Suspect the dodgy video card. If it swaps OK, suspect the PSU. If things are still going wrong, it's possibly the mobo or CPU.

The chances that it's a Gentoo thing are slim, but non-zero, considerably higher if you've built stuff with -O3 (which, as far as I know, is still broken on x86 on all recent versions). It's also possible the nVidia kernel module is hosing you, since it's largely a proprietary binary scary thing, but zillions of people are using it with relatively few problems, so I wouldn't make it the first candidate for troubleshooting.
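
A quick way to rule the -O3 theory in or out is to look at the compiler flags Portage was told to use. This is a minimal sketch, assuming the flags live in /etc/make.conf (newer Gentoo installs use /etc/portage/make.conf instead); the function name uses_o3 is made up for illustration.

```shell
# uses_o3 CONF_FILE
# Hypothetical helper: print any CFLAGS/CXXFLAGS lines in CONF_FILE that
# contain -O3. Exit status is 0 when a match is found, non-zero otherwise.
uses_o3() {
    grep -E '^[[:space:]]*(CFLAGS|CXXFLAGS)=.*-O3' "$1"
}
```

Running "uses_o3 /etc/make.conf" prints the offending line if there is one. Note that this only shows the current setting: packages already built with -O3 stay that way until they are rebuilt.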
posted by majick at 5:39 PM on August 15, 2004

