Strange X restarts...
March 19, 2009 10:45 PM
LinuxFilter: What could be causing these strange, unwanted, and annoying X restarts?
I'm experiencing a very peculiar intermittent problem with X restarts, for which I have no explanation and hence no fix. I'm looking for some insight from one of you linux experts out there.
When I'm not traveling I have my laptop (IBM Thinkpad T43p) sitting in a docking station connected to an external monitor. I use this laptop to ssh into remote machines on which I'm actually doing all my work. I forward X and often start up numerous terminals and application windows remotely, which then get displayed on my monitor. One of the applications I use extensively is IDL (a data analysis and visualization tool), which runs in a terminal but also creates its own windows for displaying data and images.
Sometimes when I end one of my IDL sessions by typing 'exit' at the IDL prompt, this causes my local X session to restart. I get kicked out of gnome and end up at the login screen. This is obviously a big nuisance, since I have to re-login, ssh back into all the external machines, and recreate my whole work environment. Also, unless I happen to be using screen, I've lost what I was doing at the moment. I have to emphasize that this is not exactly reproducible behaviour. It only happens sometimes, seemingly at random. I believe it's somewhat more likely when I haven't been using the particular terminal for a while, but that may just be my imagination.
What I find so strange about this is that issuing a command (typing 'exit') in an IDL session running on a remote terminal is causing an X restart on my local machine. It's as if my typing 'exit' in the remote IDL session is sometimes interpreted by my local X as the equivalent of ctrl+alt+backspace. How could that happen? I'm not even sure what component is at fault here: is this a problem with my X server, with ssh X forwarding, with IDL, with the terminal?
I've posted a bugzilla report about this under xorg-x11 (#483489), but so far nothing has come of it. Like I said, I don't even know if this is an X problem...
Even if you don't know of a solution, I'd just be interested to understand how this could even happen. Thanks.
More information about my setup, in case it's helpful: Fedora 10, kernel 2.6.27.12-170.2.5.fc10.i686, xorg 1.5.3-6. The laptop has an ATI videocard, but I'm not running a proprietary driver. I'm letting my display and videocard be auto-detected (meaning I have no /etc/X11/xorg.conf file).
Best answer: I don't know what's wrong, but this is the first step I'd take to troubleshoot:
First, set your machine to boot to text-only mode, rather than starting X. Once upon a time, you just had to set the default runlevel to 2 or 3 instead of 5, but many distros now make X start at any runlevel, meaning you may have to go into your startup directory and rename or remove the links that start X. If you need help with that, I can be more precise.
Once you've gotten it to a text-mode login prompt, login as your usual user and start your X session with the 'startx' command. You will see a hurricane of text scroll by, and then X should start fairly normally. Use the system like you normally would.
When/if X crashes, it should drop you to text mode, and there should be a nice onscreen log to tell you exactly what just happened. Shift-PGUP will scroll back in screen terminals.
posted by Malor at 11:18 PM on March 19, 2009 [1 favorite]
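A minimal sketch of the capture step described above, for anyone who would rather keep the startx output in a file than scroll back on the console. The helper name run_logged and the log path are illustrative assumptions, not anything from the thread.

```shell
#!/bin/sh
# Run a command, showing its output and also saving stdout+stderr to a log.
# run_logged is a hypothetical helper name, not a standard tool.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# From a text-mode login (assumes startx is installed):
#   run_logged "$HOME/startx.log" startx
```

If X dies, the tail of that log file should hold the same messages that would otherwise only be visible on the text console.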
Best answer: I don't have a solution, but I can clarify about how the problem could have come about.
X forwarding works by having the remote X client (in your case IDL) send X messages to your local X server. Thus my guess is that it's something to do with the messages being sent by IDL to your X server that is making your X server crash. If, for example, there is a bug in the X server's message parsing code, an X client could presumably send a malformed X message which will cause the server to crash. This is analogous to situations where malicious clients can cause buggy web servers or some other servers to crash.
posted by destrius at 11:26 PM on March 19, 2009
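To make the forwarding path a little more concrete: inside an ssh session with X forwarding, the remote shell's DISPLAY normally names a proxy on the remote host (typically something like localhost:10.0), and ssh relays whatever a client writes there back to the X server on the laptop. The sketch below only splits such a value into its parts; split_display is a made-up helper name and it handles just the simple host:display.screen form.

```shell
#!/bin/sh
# Split an X display string of the form host:display.screen into parts.
# split_display is a hypothetical helper; it handles only this simple form.
split_display() {
    d=$1
    host=${d%%:*}        # text before the first colon (empty for a local display)
    rest=${d#*:}         # text after the first colon
    number=${rest%%.*}   # display number, with any .screen suffix dropped
    echo "host=${host:-local} display=$number"
}

# On the remote machine one would run:  split_display "$DISPLAY"
split_display "localhost:10.0"
split_display ":0"
```

A forwarded session should report a host and a display number well above 0, whereas a terminal running directly on the laptop would report the local display.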
Best answer: What Malor said, with addenda:
hit ctl-alt-f2 to access a vt with a text login (this will work with almost any linux distro by default). You can hit ctl-alt-f7 to get back to your default X instance. In the shell you get there, run your normal gdm (may be xdm or kdm, some name like this should show up toward the bottom of the top screen when you run the top command). This will give you a chance to log in to a second X session (if you switch away from it, you can get to it again by hitting ctl-alt-f8). Now do your normal thing from the second login. Any errors for the X server will show up on the vt you access by hitting ctl-alt-f2. You can make note of this error and report it to your distro. X.org only wants to hear about your bug if you found it on their latest release, but your distro will report anything applicable upstream, if it is a bug in your distro's active release.
What is happening, I am willing to bet, is that IDL is using some rare part of the X protocol that most programs do not use, or the network latency is exposing a race condition, or maybe even both. The salient point is that X is supposed to be resilient in the face of broken clients; they should never be able to make it crash.
Extra points bonus nerdy action: run "xinit -- :1 -logfile X-errors-IDL.log" from a vt (see above for how to get a vt). If you do not have the .xinitrc (man xinitrc) file in your home directory, you will get just an xterm, no wm and no desktop environment, but you can run gnome from the xterm by running the gnome-session command, or kde by running startkde. Most wms just run by typing some variant of their name (wmaker for windowmaker is the one exception that comes to mind). Now, all output from the X server will be appended to the file X-errors-IDL.log (feel free to pick a different file name of course). You can forward this file to the Xorg maintainer for your distro.
posted by idiopath at 11:52 PM on March 19, 2009
Forgot to mention, you can add the -logverbose option to the end of the xinit command; the default level is 3, and higher numbers give you more detailed logging of events and conditions the server encounters. See 'man Xorg' for more info; any argument to xinit after -- is actually an argument for Xorg (or whatever other X server you are using, in the unlikely case you are not using Xorg).
posted by idiopath at 11:56 PM on March 19, 2009
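Once a log file such as X-errors-IDL.log exists, a quick filter can pull out the lines most likely to matter. This is only a sketch: x_log_problems is a hypothetical helper name, and it assumes the usual Xorg log markers (EE) for errors and (WW) for warnings, plus any line mentioning a failed assertion.

```shell
#!/bin/sh
# Print, with line numbers, the lines of an X log most likely to matter for
# a crash report: (EE) errors, (WW) warnings, and failed assertions.
# x_log_problems is a hypothetical helper name.
x_log_problems() {
    grep -n -F -e '(EE)' -e '(WW)' -e 'Assertion' "$1"
}

# After a crash in the second server:
#   x_log_problems X-errors-IDL.log
```

The line numbers make it easier to quote the surrounding context when forwarding the log to a maintainer.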
Man I am submitting these things too fast, one more thing just occurred to me that may help you. If you can handle the extra ram usage of having two instances of Xorg, desktop enviro, etc, you could just have a second X session for IDL only, so if it crashes Xorg, it only crashes the one instance it is using, rather than the one that has all your other apps in it. The one drawback other than the extra ram usage would be that the kernel has to work pretty hard when you switch vts, so you would probably notice any music you are playing skip, and all your programs pause for a half second or so after a switch. Otherwise it would save you much hassle until you get a fixed Xorg.
posted by idiopath at 12:07 AM on March 20, 2009
Best answer: The contents of /var/log/Xorg.0.log.old and /var/log/gdm/:0.log.1 (After the login screen has restarted these are the files from the previous login) would be of interest, especially the final five-ten lines of each.
Regardless, this is a crash bug: please do report it at the freedesktop.org bugzilla here: http://bugs.freedesktop.org/. Please create a login & report the bug by clicking on "New" and selecting the "Xorg" component (down the bottom somewhere in the big list).
Unfortunately, since IDL is a commercial application, it may be difficult for the X crowd to debug without quite a lot of information from you. I'd have a look myself, but my department doesn't have an IDL licence.
posted by pharm at 2:58 AM on March 20, 2009
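A small sketch for collecting the final lines of both logs in one go. The helper name log_tails is an assumption for illustration; the two paths are the ones named above, and reading them may require root.

```shell
#!/bin/sh
# Show the last N lines of each log that is readable, with a header per file.
# log_tails is a hypothetical helper name.
log_tails() {
    n=$1
    shift
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "==> $f <=="
            tail -n "$n" "$f"
        else
            echo "==> $f (not readable) <=="
        fi
    done
}

# Typical use after the login screen has come back:
#   log_tails 10 /var/log/Xorg.0.log.old /var/log/gdm/:0.log.1
```

Redirecting the output to a file gives something that can be attached to a bug report as-is.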
Guess... your XSession is using the first terminal as the final executable for your session, IDL is a wrapper script that calls 'exit' inappropriately. You run IDL from your first terminal, IDL script calls exit and X shuts down.
I've had stuff like this happen many, many times. Try typing exit in that first terminal, does X quit? If it does, that's likely your problem. X will quit when the final process in the XSession exits.
The way around this is to always start IDL from a new XTerm. Never use your 'initial' xterm for anything. In fact, you should set up things so that that initial xterm is the console, or that your window manager is the final process in the XSession that stays running all the time. (start an extra 'xterm &' in your session)
eg:
$ cat .xinitrc
xmodmap ~/.xmodmap
feh --bg-center /etc/X11/logo.png
xterm -rv -sl 4096 &
firefox &
sunbird &
pidgin &
exec fluxbox
now you can close that xterm and start another... too many distros use something like:
fluxbox &
exec xterm
so when that xterm dies, X goes *poof*.
I get this all the time at work where people put exit in scripts that are supposed to be sourced and when you hit the exit... *poof* xterm/shell exits.
Use the window manager menu to start a new shell, run IDL from there. Don't run it from the initial xterm you have. See if that helps.
posted by zengargoyle at 3:05 AM on March 20, 2009
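The sourced-script pitfall mentioned above can be reproduced without X at all. The sketch below is an illustration only (the file names bad.sh and good.sh are made up): a sourced file that calls exit takes the calling shell down with it, while one that uses return hands control back.

```shell
#!/bin/sh
# Demonstrate exit vs. return in a sourced ("dot") script.
tmp=$(mktemp -d)
printf 'exit 3\n'   > "$tmp/bad.sh"    # ends whichever shell sources it
printf 'return 3\n' > "$tmp/good.sh"   # only ends the sourced file

# Each test runs in a child shell so this script itself survives.
bad_out=$(sh -c '. "$1"; echo survived' sh "$tmp/bad.sh")
bad_status=$?
good_out=$(sh -c '. "$1"; echo survived' sh "$tmp/good.sh")
good_status=$?

echo "exit:   output='$bad_out' status=$bad_status"
echo "return: output='$good_out' status=$good_status"
rm -rf "$tmp"
```

If the shell that gets killed this way happens to be the final process of the X session, the session ends with it, which is the scenario described above.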
Furthering idiopath's suggestion, I've got another one.
You can run an Xsession in a window on your existing desktop: easiest way is to use Xephyr (you'll need to make sure that xserver-xephyr is installed, or a similar package name).
$ Xephyr :1 -ac -screen 1024x768 &
$ IDL -display :1.0
then when you quit IDL, it should only affect the enclosing Xephyr xserver.
(There's also Xnest, which does much the same thing but doesn't include the more modern X11 protocol extensions, so you won't get anti-aliased text and the like. If you haven't got Xephyr you might have Xnest though.)
posted by pharm at 3:11 AM on March 20, 2009
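Since IDL runs on a remote machine in this setup, one way to aim it at the nested server is to start the ssh session itself with DISPLAY pointing at Xephyr, because ssh forwards X11 to whichever display its own environment names. This is a sketch under that assumption; on_display is a hypothetical helper name and remote.example.org is a placeholder host.

```shell
#!/bin/sh
# Run a command with DISPLAY pointing at a given X server.
# on_display is a hypothetical helper name.
on_display() {
    d=$1
    shift
    env DISPLAY="$d" "$@"
}

# With Xephyr already running on :1 (remote.example.org is a placeholder):
#   on_display :1 ssh -X remote.example.org
# Programs started in that ssh session should then draw inside the Xephyr window.
```

Other ssh sessions started the usual way keep using the main display, so only the IDL session is confined to the nested server.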
zengargoyle: sounds like he's using an out-of-the-box Fedora 10 gnome install, which shouldn't have the problem you describe.
posted by pharm at 3:13 AM on March 20, 2009
NB. If you like the warm fuzzy feeling of not letting anyone else sniff your keystrokes in your X session, then instead of setting the "-ac" option (which turns off access control), use this script:
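#!/bin/sh
# The display (e.g. :1) must be the first argument; it is used as $1 below.
MCOOKIE=$(mcookie)
# Register the same cookie under both names a local client might look up.
xauth add "$(hostname)/unix$1" . "$MCOOKIE"
xauth add "localhost/unix$1" . "$MCOOKIE"
# Runs in the foreground; the cleanup below happens once Xephyr exits.
Xephyr "$@"
xauth remove "$(hostname)/unix$1" "localhost/unix$1"
exit 0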
Save it somewhere, make it executable & run it with the same options as above but without -ac.
posted by pharm at 3:20 AM on March 20, 2009 [1 favorite]
Response by poster: Thanks for the explanations and suggestions so far. I will try Malor's suggestion, and report back if there are any interesting error messages.
@pharm: I took a look at /var/log/Xorg.0.log.old and found the following (also in gdm/:0.log.1):
RADEON DRM CS failure - corruptions/glitches may occur -12
bufmgr: last submission : r:0 vs g:525332480 w:31042560 vs v:35975574
RADEON DRM CS failure - corruptions/glitches may occur -12
bufmgr: last submission : r:0 vs g:525332480 w:30305280 vs v:35975574
RADEON DRM CS failure - corruptions/glitches may occur -12
bufmgr: last submission : r:0 vs g:525332480 w:30305280 vs v:35975574
RADEON DRM CS failure - corruptions/glitches may occur -12
bufmgr: last submission : r:0 vs g:525332480 w:30305280 vs v:35975574
The current Xorg.0.log doesn't have those lines. A diff between the Xorg.0.log and Xorg.0.log.old also reveals:
[root@laptop log]# diff Xorg.0.log Xorg.0.log.old
14c14
<> ---
> (==) Log file: "/var/log/Xorg.0.log", Time: Sun Mar 15 19:46:02 2009
599,600c599,600
<> <> ---
> (II) RADEON(0): [drm] Initialized kernel GART heap manager, 8388608
> adding fb map from c25b2000 for e10000 ret 0 c25b2000
[snip]
The gdm/:0.log.1 file additionally contains this line:
Xorg: radeon_lock.c:100: radeonGetLock: Assertion `drawable != ((void *)0)' failed.
@zengargoyle: pharm is right, it's just an out-of-the-box Fedora 10 gnome install, so I don't think there's anything special about the first terminal. I don't get an X restart when I type 'exit' in any local or remote terminals, only in the IDL application.
posted by mqk at 8:54 AM on March 20, 2009
Response by poster: Sorry, the diff output got mangled. The salient difference was
In Xorg.0.log:
(EE) RADEON(0): [drm] Failed to initialize GART heap manager
adding fb map from c25b2000 for e10000 ret 0 244b6000
In Xorg.0.log.old:
(II) RADEON(0): [drm] Initialized kernel GART heap manager, 8388608
adding fb map from c25b2000 for e10000 ret 0 c25b2000
So the older session (which eventually crashed) correctly initialized the GART heap manager, and the one I'm running right now failed to do so. Hmm...
posted by mqk at 8:57 AM on March 20, 2009
That assertion just screams BUG! A function has been passed null when it was expecting a heap pointer.
(The output in the gdm log is actually the Xorg stdout or stderr IIRC.)
posted by pharm at 2:06 PM on March 20, 2009
NB. Do add that gdm.log output to the redhat bugzilla bug report: they're likely to take an assertion failure more seriously than a non-specific "it crashes sometimes when I do this" report, if only because an assertion failure gives them somewhere to start looking for a problem.
posted by pharm at 2:11 PM on March 20, 2009
This thread is closed to new comments.
posted by grouse at 11:15 PM on March 19, 2009