Why is my (debian/linux) computer so sluggish?
November 13, 2006 7:47 AM

Why is my (debian/linux) computer so sluggish?

The computer is not new, but it's about a 1.5 GHz machine running Debian Linux. It is not slow per se, but rather sluggish to respond.

If I try to fire up vi, for example, it might come right up, or it might pause for 5 seconds before appearing. Sometimes it takes 10 seconds to log in.

The computer has plenty of RAM and is not under heavy load. When it needs to do CPU-intensive tasks, it does them fine; it just might take a moment to get started.

Also, it does not play DVDs steadily - there is an occasional, nearly undetectable (but real) stutter. This is strange - my previous media-center computer was a 500 MHz Celeron, and it could play full-screen DVDs with no problem.

I'm wondering if there is some common misconfiguration, or conflict, or *something* that might account for this. I'm a computer geek so if it was something really obvious I think I'd have figured it out.

I was suspicious of slow disks for a while (or, that my disks were going into standby and the pause was them spinning up) so I tried: disabling apmd, changing the file system type from ext3 to ext2 (ext3 can take some extra CPU for journalling) and back, making sure all the disks used DMA, etc.

Nothing has really helped. Any thoughts?
posted by RustyBrooks to Computers & Internet (25 answers total) 2 users marked this as a favorite
 
Does this happen all the time, or only at certain parts of the day? The updatedb command can slow things down while it runs in the background. Other than that, the first place I would start is running top with a very short refresh time to see what is using the most CPU time.
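For example, something along these lines (the cron locations vary a bit between Debian releases, so treat the paths as examples):

# refresh top every half second and watch what floats to the top
top -d 0.5
# find out where and when updatedb is scheduled (Debian usually runs it from cron.daily)
grep -rl updatedb /etc/crontab /etc/cron.daily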

By the way, I run Kubuntu on a 1.5 GHz P4 with 2 GB of memory and have no troubles whatsoever. This includes having multiple systems up in VMware Workstation as well as Amarok, Firefox, Kontact, terminals and a host of other things.
posted by chrisroberts at 8:16 AM on November 13, 2006


Response by poster: The CPU is not being used by anything... there is no load on the system (that is, top will report a load of 0.0 and CPU usage is low). Happens at any time of day.

This machine should be totally usable... I used a much worse computer for my desktop for years. And hell, I used a MUCH worse computer before this one for the same tasks with no problem.
posted by RustyBrooks at 8:29 AM on November 13, 2006


Is it swapping? Check vmstat or top to see the swap usage.
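For instance, something like this will show it:

# si/so columns show pages swapped in/out per second; nonzero values mean you're actively swapping
vmstat 1 5
# quick summary of RAM, buffers/cache and swap usage in megabytes
free -m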

Does this happen at the command line or just in X? If it happens in X, what video card/driver are you using?
posted by Xoder at 8:41 AM on November 13, 2006


Okay, I guess there are a couple of things to look at. First, you say that login can take up to 10 seconds. This may be okay, depending on what starts automatically when you log in. It takes about that long for me, since on login Konversation, Kopete (and KWallet), Amarok and an instance of Konqueror all start up. I only log in once in a while as my computers stay on most of the time, so I don't notice this much.

Second, application startup. You use vi as an example. Do you notice the first time you start vi that it takes some time to start, but after that it starts right up? This could just be an issue of caching. After the first startup, you aren't going directly to disk to start the application.
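A crude way to test the cold-cache theory (the path here is just an example):

# the first read has to hit the disk; the second should come straight from the page cache
time cat /usr/bin/vi > /dev/null
time cat /usr/bin/vi > /dev/null

If the second run is instant but the first takes several seconds, the time is going into disk I/O rather than CPU.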

Third, the DVD playback. If your system isn't using much in terms of resources, and the playback is stuttering, this leads me to think that you aren't using the proper video driver and it can't keep up with the DVD. What kind of video card do you have? How is X configured?

I wouldn't think that your filesystem would be using enough CPU time to have any noticeable effect on your system. And I would much rather run ext3 than ext2, especially with the large disks almost everyone has now and the amount of time an fsck takes on ext2.
posted by chrisroberts at 8:44 AM on November 13, 2006


Response by poster: Nope, it's not swapping. There is about 100 MB of RAM free. It's using 900 MB, but 400 MB of that is cache.

This is via the command line, either at the keyboard (rare) or, more likely, over SSH from the local network.

I just used pico to open a 1 KB file, and it took 7 seconds. The second time it's instantaneous, though. Like it had to wake up or something. Except it should be awake, because I've been logged into it and using it for an hour. If I then open the same file in vi, it takes 5 seconds again, but the second try is fine.

There's nothing in the system logs (I checked to make sure I wasn't getting disk timeouts or read errors or whatever).

Very frustrating.
posted by RustyBrooks at 8:47 AM on November 13, 2006


Do you have the same issues from the console? From a lower runlevel? From single user mode?
posted by cmonkey at 8:48 AM on November 13, 2006


There's a known Windows problem where, after multiple read errors (even spread over a long period of time), the hard disk will slip down from DMA mode into PIO mode. This causes things to drag and can cause the kind of stuttering you're describing.

I don't know if this is an issue on Linux or not. I've never experienced it on my Ubuntu boxes, but it may be worth looking into.
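On Linux you can at least check the current mode with hdparm (substitute your real device for /dev/hda):

# "using_dma = 1 (on)" is what you want to see
hdparm -d /dev/hda
# rough read-throughput test; a drive stuck in PIO mode will be painfully slow here
hdparm -tT /dev/hda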
posted by chrisamiller at 8:49 AM on November 13, 2006


Response by poster: The slow login is strange because I have *nothing* in my startup scripts except standard variable settings. I am not talking about logging into X - I'm talking about SSHing into the box. This is instantaneous everywhere else.

The ext3 thing came from observations I made on my older box, where the CPU would peg at 100% and the load would climb to 10 whenever I transferred huge files from one place to another. The box is on all the time and restarts infrequently, so I don't particularly care how long fsck takes. Also, the box is mostly "read only" - I use it to play movies, music, etc. Anyway, I have ext3 back on now; I only changed to ext2 to try it out.

I agree that it may be some kind of caching issue, but it's bizarre that it should take so long the first time.
posted by RustyBrooks at 8:50 AM on November 13, 2006


Have you been r00ted? It's possible that your computer has been compromised and is spending a great many cycles sending out spam, smurf attacks and other nasty stuff. A typical rootkit will replace most of the standard utilities to hide that load. A tool like chkrootkit can help you find out if you've been hit. Also try taking it off the 'net and port-scanning the system from another machine running something like nmap to see if there are any extra ports open. Linux boxes on the open net are heavily targeted.
Also, check which services you have running; for example, if you're running an HTTP or MySQL server, your responsiveness will go down. Apt/dpkg has a way of installing services you don't really need.
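For example (the address is a placeholder for the box's LAN IP):

# run the rootkit checker locally, as root
chkrootkit
# from a *different* machine, scan every TCP port for anything unexpected
nmap -sS -p 1-65535 192.168.1.50
# compare against what the box itself thinks is listening
netstat -tlnp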
posted by leapfrog at 8:51 AM on November 13, 2006


Response by poster: Oh, and regarding X drivers, I have an NVIDIA GeForce4 MX 440 (and that's what X is configured for). I would guess the driver is working fairly well. I have it configured to play movies via the Xv extension, and it works OK - about a billion times better than straight X.

I've never tried it in single user mode. I could give it a shot when I get home. All these issues do happen in the console though - X is not a factor.
posted by RustyBrooks at 8:52 AM on November 13, 2006


Response by poster: leapfrog: that is a good idea and I hadn't thought of that. Would it really hide CPU load though?

I have a very minimal set of services running, basically just what I need. smbd, nfs, cups. httpd, mysql, etc, etc are not running.

Sorry to comment so much in my own thread...
posted by RustyBrooks at 8:55 AM on November 13, 2006


You should also see if SMART can tell you anything (assuming your drives support it).
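With smartmontools installed, something like this (adjust the device name to match your drive):

# overall health verdict plus the full attribute table
smartctl -H /dev/hda
smartctl -a /dev/hda
# kick off a short self-test, then read the results a few minutes later
smartctl -t short /dev/hda
smartctl -l selftest /dev/hda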
posted by cmonkey at 8:55 AM on November 13, 2006


Response by poster: cmonkey: ooooh I think you got it. I did some tests and smartctl lists "prefail" or "old age" for all the parameters of the test. It's not a new hard drive, for sure, maybe time for me to look for another one.
posted by RustyBrooks at 9:07 AM on November 13, 2006


When a rootkit is installed, a large number of low-level system tools are replaced in order to hide it. For instance, ls is replaced with a version that works the same but will never show the directory containing the rootkit files. Process-monitoring utilities such as ps, top, and lsof are replaced with versions that don't show the running rootkit process (usually a remote shell service), and user-monitoring utilities like uptime, finger, and who are replaced with versions that won't show the additional logon processes.

In short, yes, they've thought of that. Rootkits can be quite nasty and very hard to detect. Some of the most difficult to find don't replace the utilities themselves but rather the low-level libraries that feed information to the utilities. I remember reading about one of the earliest, which would insert itself into the C compiler: it would detect when the compiler was being used to compile a new version of itself and add itself to the output, propagating to the new compiler without ever modifying the source code.

The best way to detect a rootkit is to boot the system from bootable rescue media (like Trinity Rescue Kit or Knoppix) and run a rootkit detector like chkrootkit.

Using a network scanner from a different machine will reveal any extra open ports, which can also indicate a compromise.

Having said that, I'm more inclined to think that you have a conflict in your configuration: maybe the wrong motherboard IDE driver loaded in the kernel, or debug options entered at boot time. If your system were loaded with enough malware to cause slowdowns running vi and such, it would likely be digging into swap constantly.
posted by leapfrog at 10:08 AM on November 13, 2006


You could try tuning your swappiness.
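For example (60 is the usual default; lower values make the kernel less eager to push things out to swap):

# see the current setting
cat /proc/sys/vm/swappiness
# try a lower value; add vm.swappiness=10 to /etc/sysctl.conf to make it permanent
sysctl -w vm.swappiness=10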
posted by Zed_Lopez at 10:22 AM on November 13, 2006


"smartctl lists "prefail" or "old age""

Back up your important data. Please. Your sluggish performance can certainly indicate a disk that's about to poop out on you.
posted by drstein at 10:51 AM on November 13, 2006


Run "strace" on programs that stall to see if it's near a system call that looks suspicious. I've seen something like this with DNS problems.
posted by cmiller at 11:24 AM on November 13, 2006


Response by poster: Regarding backup: prudent, but probably not required... the disk showing the errors is just a small disk that the system is installed on. My data is on 2 separate large disks. That's probably why the main disk is old/failing; it was the main disk in one of my computers back when 40 GB meant something.

I'll try out strace and see if anything pops up.
posted by RustyBrooks at 11:52 AM on November 13, 2006


Have you run memtest to make sure all your RAM is good? Also, which window manager? How much cruft is installed? Does the problem happen if you boot off a Knoppix CD, too? (If it does, it isn't a hard drive.)
posted by QIbHom at 12:48 PM on November 13, 2006


smartctl lists "prefail" or "old age" for all the parameters of the test

If they are in the "TYPE" column of the Attributes table, I don't think that means you're having disk problems. It is just telling you which of two types of failure that attribute would be if it did fail.

From http://www.linuxjournal.com/comment/reply/6983:
The TYPE of the Attribute indicates if Attribute failure means the device has reached the end of its design life (Old_age) or it's an impending disk failure (Pre-fail). For example, disk spin-up time (ID #3) is a prefailure Attribute. If this (or any other prefail Attribute) fails, disk failure is predicted in less than 24 hours.
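So if you run the attribute report, every line will say Pre-fail or Old_age in the TYPE column; the thing to look at is the WHEN_FAILED column (and the raw counts of attributes like Reallocated_Sector_Ct):

# every attribute shows Pre-fail or Old_age under TYPE; real trouble shows up under WHEN_FAILED
smartctl -A /dev/hda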
posted by Sirius at 1:09 PM on November 13, 2006


Response by poster: There is no window manager, actually; when you log in via X it runs my special program that handles the media center. But it happens even if you don't log into X - it is not related to or specific to that at all. Even if I don't start xdm at all, I still have this trouble.

memtest reports no problems
posted by RustyBrooks at 1:35 PM on November 13, 2006


The SMART result to care about is the current pending sector count.

When a drive detects an unrecoverable ECC error on a given sector on the first try, it tries again, several more times; if that still doesn't work, it will generally re-seek that track, and try again. It will then mark that sector "pending reallocation". Next time the sector is written to, it will get put somewhere else on the drive; but if it's part of a file that gets read often but never written, the effect is to make all reads of that file very slow.

If you've got some going-bad sectors in programs you're trying to start, they will be very slow to start the first time; on subsequent times, they'll most likely still be in the Linux cache and load quickly.

If you've got going-bad sectors in your swap partition, you'll just get random slowdowns any time the swap partition gets used; the machine will feel like it's thrashing, even when it's a long long way from thrashing.
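To check those counters specifically (substitute your own device):

# nonzero raw values here mean the drive has sectors it can't read back reliably
smartctl -A /dev/hda | grep -i -E 'Pending|Reallocated|Uncorrectable'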

If you don't really care that your drive is on the way out, you just want to drag the last dregs of life out of it, and you've got enough free space available on another drive, you can force the failing drive to reallocate all its pending-reallocation sectors (unless it's also run out of spare sectors):

1. Boot up a live CD so you can run without your failing disk being mounted.

2. Use dd if=/dev/yourfailingdrive of=/path/to/image/file bs=1M to make an image of the failing disk in a file on another drive.

3. Use dd if=/dev/zero of=/dev/yourfailingdrive bs=1M to write zeroes to every sector on your failing disk, forcing it to reallocate any dodgy sectors.

4. Use dd if=/path/to/image/file of=/dev/yourfailingdrive bs=1M to restore the original disk contents.

Depending on the drive, you might get away with skipping step 3; and if you don't care about your data at all, you might even get away with doing a rewrite-in-place using dd if=/dev/yourfailingdrive of=/dev/yourfailingdrive bs=1M. Me, I like having backups :-)
posted by flabdablet at 2:47 PM on November 13, 2006


If you're backing up that drive, you'll get a much smaller image with partimage than with dd. And if the drive is really hosed, you want dd_rescue, which keeps retrying after bad sectors. If dd hits a bad sector, it just bails out.
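For example (the paths are placeholders, and the second command uses GNU ddrescue's syntax; dd_rescue's arguments differ slightly):

# partimage only stores used blocks, so the image is much smaller than a raw dd
partimage save /dev/hda1 /mnt/backup/hda1.partimg.gz
# ddrescue keeps going past bad sectors and tracks its progress in a log file
ddrescue /dev/hda /mnt/backup/hda.img /mnt/backup/hda.log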
posted by leapfrog at 9:32 AM on November 14, 2006


These are both fair points.

On the other hand, if you do have a drive with enough space for a dd image, dd is faster; also, you can stop it bailing on error by adding conv=noerror to the parameter list.
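So a more fault-tolerant version of the imaging step might look like this (device and path are placeholders; the small block size means a bad sector only costs you 4 KB of zeroes, not a whole megabyte):

# noerror keeps dd going past read errors; sync pads the failed block with zeros so offsets stay aligned
dd if=/dev/hda of=/path/to/image/file bs=4k conv=noerror,sync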

The main reason I recommended dd over partimage, though, is that dd applied to a raw drive (like /dev/hda) as opposed to a partition (like /dev/hda1) grabs the partition table and all; you don't have to mess with fdisk before restoring. I understand there's a way to use partimage to do the same thing, but I've never been able to make it work, and dd always has.

Also, if the drive were really hosed, things would actually be failing to work instead of just being slow.

I haven't used dd_rescue. Is it similar to ddrescue?
posted by flabdablet at 7:47 PM on November 14, 2006


You aren't by chance using external drives, are you? I've run into this EXACT same problem - it turned out my USB card wasn't fully supported, which caused my system to grind to a near standstill whenever I accessed any data on that drive. Thankfully, my OS was on an internal drive, which made troubleshooting much easier. One new (fully supported) USB card later, and I'm good to go...

My much more technically-minded friend mentioned to me that it had to do with DMA / I/O issues, fwiw.
posted by rabble at 10:39 AM on November 20, 2006


This thread is closed to new comments.