Advanced Mac troubleshooting Anything else to try from the command line?
October 27, 2017 10:47 AM
I have an old iMac (13, 2 - 2.9 GHz) running El Cap that very suddenly mostly stopped working. I can get into single-user mode just fine but everything else is hanging and freezing. Trying to see if there are things I can try from the command line before I give up on it entirely. Suggestions welcome.
I have backups, and plenty of free HD space FYI. I came home from work yesterday and my machine which had been left on was now off. This sometimes happens w/ power failures so I restarted it and it started loading normally and then I got the dreaded "prohibited" sign. A flurry of troubleshooting followed w/ these results
- results are the same across all three accounts (personal, guest, Apple Official Guest)
- booting in to safe mode seems to usually result in a kernel panic
- verbose mode boot gives no extra details since it zips by and then hangs once it gets to the GUI, have not checked logfiles for it
- I can often get it to boot to the login screen and sometimes login to the desktop but then it's basically hung (i.e. can't run anything, sometimes I get an error that finder.app can't run, a click takes several minutes to register, can't get to terminal or anything)
- reset SMC and NVRAM for good measure, no change
- single user mode boot is ok and I ran fsck -fy which said "HD appears to be OK"
- leaving it overnight to think about what it did did nothing
Things I have not been able to do
- recovery mode goes straight to internet mode and then hangs, says it will take a day to d/l the software (my internet is slow but not that slow) and eventually dies (error 2102F)
- ran Apple Hardware Test which goes along fine and then eventually freezes (at 1:30-ish for simple test and 25:00 for extended test)
- I can't seem to download a copy of El Cap to create a USB startup (I follow those instructions, get a fail notice from app store)
- I do not have a recovery partition on the iMac
Other maybe useful information. I have a Mac household and have experienced this same failure mode from my older iMac and another older laptop over the past eight months (the weird slow hang... can boot but just barely stuff) so I do wonder a bit if there is something in my home software environment that is affecting this (it's not particularly dusty or humid or hot or cold here, no pets, surge protected machines) but I don't want to be superstitious about it, they're all old machines.
So! I have a functioning Mac laptop. I can get to the command line of the iMac. I can (probably) boot from a USB something. Are there steps I haven't taken? Are there things I haven't thought of? I am not sentimental about this machine, but I do have a limited amount of free time to fuck with it, would love to find a way to determine "Yeah it's fucked" versus "No, you can fix it" I have friends who could install a new HD/RAM if needed, but local computer guys are not great with Macs, so they would not be my first choice. Thanks for any and all suggestions.
Response by poster: Definitely my feeling too and I am prepared for the inevitable, just sort of hoping against hope there might be a way to isolate just which hardware might need repairing before I toss the whole thing out. Thank you!
posted by jessamyn at 11:00 AM on October 27, 2017
You may well have tried it already, but it's something I learned from my short time as a Mac IT guy. It's akin to "did you turn it off and on again" in that it solved 90% of the problems I would come across:
Reset the PRAM: Hold command-option-p-r as the machine boots. It will force another boot and clear the PRAM.
posted by soplerfo at 11:21 AM on October 27, 2017
I can (probably) boot from a USB something.
Can you get Knoppix to run on it?
If so, worth using "smartctl -a /dev/sda" in a root terminal and checking the raw counts for unrecoverable read errors, reallocated sectors and sectors pending reallocation. An unreadable sector in the guts of a vital file or directory won't necessarily show up on a fsck but can still ruin an OS's entire day. You could also try "memtest" at the Knoppix boot prompt and let memtest86+ play with your RAM for a while.
But the most likely thing if you're seeing freezes is power supply issues, probably due to capacitors drying out and/or eating themselves.
posted by flabdablet at 11:23 AM on October 27, 2017 [2 favorites]
Either a bad HD or logic board. Both become increasingly common with age. I wouldn't suspect the local environment at all.
posted by humboldt32 at 12:04 PM on October 27, 2017
Random hangs/freezes smell like dodgy RAM to me. You can get bootable images of memtest86 here to test it.
posted by doop at 12:18 PM on October 27, 2017 [1 favorite]
recovery mode goes straight to internet mode and then hangs, says it will take a day to d/l the software (my internet is slow but not that slow) and eventually dies (error 2102F)
Are you connected via Wifi? Possibly the wifi module is going bad on you, and that's a user-serviceable part on most older Macs. It's a shot in the dark, but you could try removing it and see if things improve.
If you have the appropriate Firewire cords, you could also try booting the iMac from your Macbook's hard disk using Target Disk mode. Hard freezes suggest this is a hardware issue though.
posted by neckro23 at 1:17 PM on October 27, 2017 [1 favorite]
Listen to Doop.
memtest86 is much more thorough than Apple's memory tests, and you can leave it running to catch those "intermittent" errors.
posted by the Real Dan at 1:25 PM on October 27, 2017
Response by poster: Memtest currently chewing away at things. Will it issue me a report when it's done?
And yeah I am mostly getting hangs (takes minutes for a click to register, mouse works) and not so much freezes (can't move mouse, no key commands do anything) though I've had a few. Sometimes hard to tell the difference. It's mostly that it's unusably slow (each click takes 5 min to register/process and it eventually hangs completely) and at least some of the time it still boots to the prohibited sign.
posted by jessamyn at 2:47 PM on October 27, 2017
Memtest should just keep running over and over again until it finds an error or your machine crashes. In the past I've just left it running overnight and proceeded on the assumption that if it doesn't find anything in that period then dodgy RAM is probably not the proximate cause of the problem.
Beyond that, if you can get it into single user mode then you could try running:
grep "I/O error" /var/log/system.log
..or otherwise poke around in system.log for evidence of disk or other hardware problems. Or maybe boot to single-user and run something like "find / >/dev/null" to get it doing something and then hope whatever it is goes wrong and you might see the kernel moaning about something onscreen. I can't remember how much single-user mode leaves you with on MacOS, other than it being annoyingly little last time I had to do it.
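A minimal sketch of that log check, assuming a plain POSIX shell like the one single-user mode gives you. The function name count_io_errors is made up for illustration; the log path is the one mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: count the lines mentioning "I/O error" in a log file.
# Prints 0 when the file is missing or unreadable, so it is safe to call blindly.
count_io_errors() {
    log="$1"
    if [ -r "$log" ]; then
        # grep -c prints the number of matching lines (0 if none);
        # "|| true" hides grep's non-zero exit status when nothing matches.
        grep -c "I/O error" "$log" || true
    else
        echo 0
    fi
}

# Usage (not run automatically):
#   count_io_errors /var/log/system.log
#   grep "I/O error" /var/log/system.log | tail -n 5   # most recent matches
```

A count of zero doesn't prove the disk is healthy, since a hang may never get logged, but a long run of I/O errors naming the internal disk points strongly at the drive.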
Beyond that, as a last resort, if you can get the case off, get a can of compressed air and give the insides a really good blast with it - sometimes you get teeny little bits of dust shorting out components or tracks.
Beyond that I'm out of ideas, sorry. I think it would be perfectly reasonable to assume it's a bad logic board if memtest comes back clean and you're low on time to mess around with it.
posted by doop at 3:23 PM on October 27, 2017 [1 favorite]
Response by poster: Memtest wrapped up and found no errors. I rebooted the machine and it booted "normally" (first time since yesterday that I've been able to shut it down) except that the finder was pretty unstable: desktop icons would disappear sometimes and the whole thing felt like it was running with only a few percentages of CPU available. Had to relaunch Finder once. However I was able to get Terminal and Activity Monitor running and it showed that mostly CPU was okay though kernel_task seems to be taking up 40-50% of the CPU and WindowServer is taking up maybe 30%
The good news is that it was on the network and I was able to retrieve some files off of the desktop that I'd like to have (by moving them to another machine on the network) so I'm feeling like I have a lot less of a deadline for GET THIS WORKING. Got AHT running and will leave it overnight.
Will check system.log tomorrow. If there's any definitive way I could isolate "Hard drive versus logic board issue" I'd love to know it. Thanks for everyone's attention.
posted by jessamyn at 5:53 PM on October 27, 2017
If you can get smartmontools working and generate output resembling this,
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   209   207   021    Pre-fail  Always       -       4550
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       93
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       49
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       67
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       41
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       4808
194 Temperature_Celsius     0x0022   125   119   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
then the numbers you want to pay attention to are the RAW_VALUE for the lines I've bolded.
Any Reallocated_Sector_Ct under about 20 is probably OK. Substantially more than that means the drive is rapidly approaching the end of its reliable service life.
A non-zero value for Current_Pending_Sector means that there is at least one disk block that will cause very lengthy re-read attempts every time the OS tries to read it, and this is a very common way for hard disks to cause hangs and unresponsiveness. Such blocks typically don't get reallocated until the OS actually rewrites them, which if they're in the middle of some executable file or system directory it will probably never do.
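The two rules of thumb above can be sketched as a small filter over smartctl's attribute table. This is only an illustration: the function name check_smart is made up, the cutoff of 20 is the rough figure given above rather than any official threshold, and it assumes RAW_VALUE is the tenth column, as in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` style output on stdin and comment on
# Reallocated_Sector_Ct and Current_Pending_Sector. Assumes RAW_VALUE is
# column 10, as in the sample table above.
check_smart() {
    awk '
        $2 == "Reallocated_Sector_Ct" {
            seen += 1
            # "about 20" is a rule of thumb, not a hard limit
            if ($10 + 0 > 20) print "WARN: " $10 " reallocated sectors"
            else              print "ok: " $10 " reallocated sectors"
        }
        $2 == "Current_Pending_Sector" {
            seen += 1
            if ($10 + 0 > 0) print "WARN: " $10 " sectors pending reallocation"
            else             print "ok: no sectors pending reallocation"
        }
        END { if (seen < 2) print "note: expected attributes not found" }
    '
}

# Usage (device name depends on the system, e.g. /dev/sda under Knoppix):
#   smartctl -A /dev/sda | check_smart
```

Some drives report these attributes under slightly different names, in which case the sketch prints the "not found" note instead of a verdict.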
posted by flabdablet at 2:04 AM on October 28, 2017 [2 favorites]
If you can actually get enough of a way inside the thing to try the compressed air blast, have a look for bad capacitors while you're there. A healthy electrolytic capacitor has a completely flat top rather than a domed one, and certainly not a domed one with a little brown spot in the middle that looks like it might have leaked from a very old battery. Healthy caps will also be sitting nice and square on the circuit board rather than leaning over as if they're straining to shit their rubber sealing plugs out the bottom of their cans.
posted by flabdablet at 2:13 AM on October 28, 2017 [2 favorites]
This really sounds like a failing drive to me.
MacOS uses disk-backed VM, so a disk in some failure modes can cause the whole interface to go wonky.
+1 for doop's suggestion of looking for I/O errors in system.log.
posted by tomierna at 8:38 AM on October 28, 2017
The fact that it survived a memtest run suggests that it’s most likely not the logic board or the memory at fault. The bad news is that, if the HD is failing, it’s something of a nightmare to replace yourself.
On the positive side, Macs will happily boot from an external drive so if you feel like keeping the beast going, you can plug one in, install OSX (or Linux for that matter) on it and it’ll probably keep on trucking.
NB. If your iMac has spinning rust for storage, then using an external SSD drive plugged into the USB3 socket will make the machine fly by comparison with the internal drive.
posted by pharm at 12:28 PM on October 28, 2017
Response by poster: Thank you everyone. I am completely OK with HD replacement nightmare if it comes to that. However the smartmontools discussion is really "bla bla Ginger" to me.
Last night I ran AHT all night (looping) and woke up to find that it had basically frozen at 25 minutes again. I was able to boot into the Apple Guest account and poke around some, mainly running Activity Monitor (nothing obvious) and keeping an eye out for weird processes (same). Kept it running a lot of the day, watched various apps go in and out of "not functioning" stage but no actual freezes.
If I am understanding folks here correctly, if it's the hard drive that is the problem I can boot from another drive which runs Mac OS (like a spare laptop, in Target Disk Mode) and just use my big screen with that machine? And yes I do have spinning (noisy) rust, so I'll see what I can do along those lines. That may be my best option for now.
posted by jessamyn at 1:12 PM on October 28, 2017
If I am understanding folks here correctly, if it's the hard drive that is the problem I can boot from another drive which runs Mac OS (like a spare laptop, in Target Disk Mode) and just use my big screen with that machine?
Pretty much. Booting from a laptop in Target Disk Mode will run the entire OS on the iMac hardware. The laptop would just be acting as dumb storage in that scenario; it’s equivalent to using the laptop’s storage as a pluggable HD.
posted by pharm at 2:23 PM on October 28, 2017
I know often when I run into weird issues on the Mac, I have had success running Disk Utility's verify/repair permissions option. If you only have access to command line you can do it using:
diskutil verifyPermissions /
diskutil repairPermissions /
Similarly you can verify/repair volumes/disks using:
diskutil verifyVolume /
diskutil repairVolume /
The "/" just tells it to do this to your "main" hard drive.
posted by kup0 at 3:33 PM on October 28, 2017 [1 favorite]
the smartmontools discussion is really "bla bla Ginger" to me
I don't have a Mac so I can't try this out, but it appears from the screenshots that SMART Utility will show you the same stuff without needing to get down and dirty with Terminal.
posted by flabdablet at 4:15 AM on October 29, 2017
If a hang is caused by read retries on a spinning disk, you'll hear the drive going click... click... click... if you press your ear to the case.
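If listening isn't practical, one rough alternative is to time how long each file takes to read and see whether any stall. This is a sketch under assumptions, not a proper disk test: the function name slow_reads and the threshold are made up for illustration, and files already sitting in the OS cache will read quickly no matter what the disk is doing.

```shell
#!/bin/sh
# Hypothetical helper: read every regular file under a directory and print the
# ones that take longer than a threshold (in whole seconds) to read.
# -xdev keeps find on one filesystem so it stays out of /dev and other mounts.
slow_reads() {
    dir="$1"
    limit="$2"
    find "$dir" -xdev -type f 2>/dev/null | while IFS= read -r f; do
        start=$(date +%s)
        # Read the whole file and discard it; ignore unreadable files.
        cat "$f" > /dev/null 2>&1 || true
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -gt "$limit" ]; then
            echo "$elapsed s: $f"
        fi
    done
}

# Usage (not run automatically): list files that take more than 10 s to read
#   slow_reads /System/Library 10
```

A small file that takes many seconds to read is a reasonable hint that the drive is retrying a bad sector underneath it.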
posted by flabdablet at 5:59 AM on October 29, 2017
Response by poster: I tried a few more things including verifying that the thing could not, in any fashion, get into Target Disk Mode. I also tried all new RAM just in case.
So hey the fucking thing is working again! A friend worked with me on opening it up and cleaning it out (SO much dust) and replaced the hard drive with a 750GB drive and now it's working with no issues. My best guess is that the HD was failing but it had a fusion drive (who knew? certainly not me) and there was at least some good data (including most of the booting OS) on the Flash part of the drive so it was behaving erratically at best. Replacing the drive fixed 100% of everything. Now I have to transfer all my stuff back from my (good) backup but that's pretty minor. All in all I lost a few weeks of computer use and maybe a day or two of data. Thanks again to everyone who chimed in with suggestions.
posted by jessamyn at 6:55 PM on November 15, 2017 [2 favorites]
posted by Alensin at 10:55 AM on October 27, 2017