Diagnose my failing full-screen video (Linux, Debian, Intel)
May 22, 2018 1:27 PM

Very recently out of nowhere my Dell XPS 13 9350 of a couple years has started to have a problem with full-screen video. The video will freeze after a bit of time while the audio continues to play normally.

This only appears to happen with full-screen video. It has never happened in the past, and does not happen when the video is not full-screen. There are no other video issues presenting. Simply toggling from full-screen to normal and back to full-screen again fixes the video playback for a variable amount of time.

Originally this only happened with local media played via `mpv` over NFS so I attributed it a bit to some sort of lag (backups, load on server, etc.) but it does happen with locally stored video, and now on occasion even streaming YouTube video.

I'm sorta leaning towards maybe the GPU and hardware decoding pipeline is getting borked somehow.

So since it appears to be only full-screen video that is wonky... Does anybody have any generic Linux X11 mpv/vlc/etc. tips or tricks or things to try so I can pin down whether my hardware is slowly failing or maybe a new bug (I've run Debian Sid since forever), or previous similar experience...
posted by zengargoyle to Computers & Internet (17 answers total)
 
Is it overheating? Check the running processes while it's lagging for a process called kidle_inject eating up CPU. This is a defensive process that idles your CPU so it has a chance to cool down. If heat is the problem you can try cleaning junk out of the fan's vents.
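
One way to run that check without watching top by hand is a small script that walks /proc. This is only a sketch: the process name kidle_inject is taken from the sentence above, and the assumption that the thread name starts with exactly that string may not hold on every kernel. The function name and the proc_root parameter are made up for illustration.

```python
from pathlib import Path


def find_processes(prefix, proc_root="/proc"):
    """Return sorted (pid, name) pairs for processes whose name starts with prefix."""
    root = Path(proc_root)
    if not root.is_dir():
        return []  # no procfs here (for example, not running on Linux)
    matches = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only the numeric directories describe processes
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited or cannot be read
        if name.startswith(prefix):
            matches.append((int(entry.name), name))
    return sorted(matches)
```

Calling find_processes("kidle_inject") while the video is frozen and getting an empty list back would suggest the idle-injection threads are not running at that moment. It does not measure how much CPU they use.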
posted by qxntpqbbbqxl at 2:14 PM on May 22, 2018


What have you installed or updated lately? Any changes that can affect background system performance?

Try booting from an image and seeing if full screen video can play without problems.
posted by srboisvert at 3:14 PM on May 22, 2018


The times I've had similar issues and had difficulty isolating the cause, it's always been a screensaver kicking in and blanking the screen but failing to draw because it's under the video surface or some power management feature on the bus or in the video card itself turning on after a while of seeing no input.
posted by wierdo at 7:25 PM on May 22, 2018


Best answer: The Intel mode-setting graphics driver that gets baked into the Linux kernel seems fairly susceptible to this kind of regression after updates. I've had to roll back to earlier kernels on two occasions when I really couldn't wait for it to be fixed with a newer update.

You might also have some joy by installing the old-school xserver-xorg-video-intel driver package instead of relying on the kernel driver. Or if that's already installed, you could try uninstalling it and seeing if the kernel driver works better for you.
posted by flabdablet at 4:02 AM on May 23, 2018


Best answer: Something else I've had occasional success with for working around instability issues with Intel video drivers on Linux is reverting them to the older UXA acceleration method instead of SNA, the newer (and quicker) default. You can do this by creating /etc/X11/xorg.conf containing just
Section "Device"
    Identifier  "Device0"
    Driver      "intel"
    Option      "AccelMethod" "uxa"
EndSection
and then restarting the X server (just restart your display manager if you don't want to reboot).
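
If you end up flipping between the two methods a lot, a tiny generator saves retyping. A sketch only: device_section is a hypothetical helper name, and it deliberately accepts just the two AccelMethod values mentioned above (the driver may accept others).

```python
def device_section(accel_method, identifier="Device0", driver="intel"):
    """Render an xorg.conf Device section that selects the given AccelMethod."""
    if accel_method not in ("uxa", "sna"):
        raise ValueError(f"unexpected AccelMethod: {accel_method!r}")
    lines = [
        'Section "Device"',
        f'    Identifier  "{identifier}"',
        f'    Driver      "{driver}"',
        f'    Option      "AccelMethod" "{accel_method}"',
        "EndSection",
    ]
    return "\n".join(lines) + "\n"


# Print the UXA variant; redirect it to a file yourself if it looks right.
print(device_section("uxa"), end="")
```

The sketch only prints text. Writing it to /etc/X11/xorg.conf needs root and is left as a manual step on purpose.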
posted by flabdablet at 4:45 AM on May 23, 2018


Response by poster: overheating and system performance are not an issue in this case. Laptop and desktop are massively overpowered for the workload I place on them and I'll feel my left hand getting warm through the keyboard long before fans even think of coming on.


updates are constant. This is Debian Sid which is a rolling release and I pretty much do a full-upgrade every morning. I'm used to that sort of flakey.
However, this did remind me that for a while I had pinned my window manager (i3) to an earlier version because they messed up window cycling for a release. It hasn't been that long ago that a fixed i3 came out. I've actually found a bug in i3 while trying to track down my fullscreen issue, but I'm not really sure whether the window manager has enough fingers in the pie to really affect the rendering pipeline.

booting from an image is TOO HARD. Because this bug is so intermittent a reboot / reset would likely fix it, and I could never stand a different setup long enough to pin things down.

screensaver no, that is behaving nominally.

kernel also TOO HARD. Basically it just hasn't been obviously borked enough for me to go OMG previous kernel. But this did remind me that there are a set of intel specific tools that I once tried (and failed) to use to get my desktop HDMI talking to my Sony TV (never Sony again) so there's that.

xorg.conf oh yeah. I think I can harass GDM into multiple seats and have :1, :2, :3 X servers running on different VTs with hopefully different configurations (even an older i3). This could allow a greater range of immediate testing that only has a console switch betwixt.

At the moment my plan was to replace (--fullscreen) with (--no-border --geometry 3200x1800+0+0) to narrow down between fullscreen vs window to see if there is a real difference. i3 has a bug where +0+0 is actually +0-y where y is evidently the height of a non-existent title bar. Whereas +0+1 just leaves a single pixel strip at the top. (i.e. +0+0 is just borked). But whether the window manager can mess with rendering enough to be a problem?
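
That comparison could be sketched as a few mpv command lines generated side by side. Assumptions, labeled loudly: sample.mkv is a placeholder file name, the variant labels are made up, and the third variant (--hwdec=no, forcing software decoding) is an extra added here to probe the hardware-decoding suspicion from the original question. The options are written in mpv's --option=value form.

```python
import shlex


def mpv_variants(media, width=3200, height=1800):
    """Return labeled mpv argument lists for A/B testing the freeze."""
    geometry = f"--geometry={width}x{height}+0+0"
    return {
        # The configuration that shows the freeze.
        "fullscreen": ["mpv", "--fullscreen", media],
        # Same pixel area, but as an undecorated ordinary window.
        "borderless-window": ["mpv", "--no-border", geometry, media],
        # Fullscreen again, with hardware decoding switched off.
        "fullscreen-software-decode": ["mpv", "--fullscreen", "--hwdec=no", media],
    }


for label, argv in mpv_variants("sample.mkv").items():
    print(f"{label}: {shlex.join(argv)}")
```

If only the first variant ever freezes, that points at the fullscreen path (window manager, compositor, or driver) rather than at decoding. If the third never freezes but the first does, the decoder becomes the more interesting suspect.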

There is also the intel microcode blob which is another HARD thing. And I've added the compositor into that BIOS-kernel(microcode)-X-Compositor-WM-App chain. I also recently enough flashed the BIOS (with a many months old update).

I had hoped somebody happened to know magic incantations to dynamically tweak parts of the rendering pipeline. But multiple X servers almost fills that slot.

"when I really couldn't wait for it to be fixed with a newer update" yep. It's only slightly annoying in one particular way that will probably be fixed before I can dig down and figure out what it was. Or my CPU/GPU is starting to sing a swan song.
posted by zengargoyle at 6:35 AM on May 24, 2018


kernel also TOO HARD. Basically it just hasn't been obviously borked enough for me to go OMG previous kernel.

Is your apt somehow configured to remove all previous kernels on every update? On my Debian Testing boxes, whose /etc/kernel/postinst.d/apt-auto-removal script I have left strictly alone, all I need to do to roll back to the previous kernel is select it from the GRUB menu at boot time.
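
To see which kernels are still around before rebooting, something like the following sketch lists them. It assumes kernel images sit in /boot as vmlinuz-<version>, and the function name is hypothetical. The sort is a plain string sort, not a proper version comparison.

```python
from pathlib import Path


def installed_kernels(boot_dir="/boot"):
    """Return version strings taken from vmlinuz-<version> files in boot_dir."""
    boot = Path(boot_dir)
    if not boot.is_dir():
        return []
    prefix = "vmlinuz-"
    # Strip the prefix so only the version part remains.
    return sorted(path.name[len(prefix):] for path in boot.glob(prefix + "*"))


for version in installed_kernels():
    print(version)
```

More than one line of output means an older kernel should still be selectable from the GRUB menu.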
posted by flabdablet at 3:23 PM on May 24, 2018


Response by poster: flabdablet: don't think about it too much. TOO HARD is a) a philosophy against rebooting being the answer to any problem (that involves a long story about cross-dressers and transvestites from outer space that would get me into more trouble than it would solve) and b) I have an uptime of around 18 days now and 5 desktops of projects going on so I'm not going to shut things down and reboot trying to chase a tiny issue that is currently unpredictable and unreproducible (and hasn't happened even since the original post and may be fixed already by some update that's already occurred).

<fedora> I already have a LiveCD ISO patched into my GRUB that I *could* boot from. I have a local apt proxy cache with months of packages hanging around for container building. I could netboot my laptop over PXE should I wish. There are no technical difficulties in that TOO HARD. </fedora>

Anyway, I'll *have* to reboot soon as there's a new X in the update pipeline. So in a couple of days I will, and if the issue occurs before I'm all settled in, then I will be checking different kernel, different X configs, etc.

I should have put a Special Snowflake in there that was more about looking for something analogous to 'xrandr - move your screens around', 'xinput - tweak your mouse/keyboard', 'x???? - change rendering pipeline'. Or sysctls, or "echo this > /sys/there" to muck with X in realtime. I was looking for more immediate A/B testing type of things.
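
Not the 'x????' tool being wished for, but as a starting point for the /sys side of it, here is a read-only sketch that dumps the current settings of a kernel module from /sys/module/<name>/parameters. The choice of i915 as the module is an assumption based on this being Intel graphics, the function name is made up, and nothing here says which parameter (if any) relates to the freeze or can be changed at runtime.

```python
from pathlib import Path


def module_parameters(module="i915", sys_root="/sys"):
    """Return {parameter name: current value} for a loaded kernel module."""
    params_dir = Path(sys_root) / "module" / module / "parameters"
    values = {}
    if not params_dir.is_dir():
        return values  # module not loaded, or it exposes no parameters
    for entry in sorted(params_dir.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            values[entry.name] = None  # not readable as this user
    return values


for name, value in module_parameters().items():
    print(f"{name} = {value}")
```

Saving the output before and after a kernel change gives something concrete to diff when the behaviour changes.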

Really, it just hasn't happened enough for me to pin it down enough to even know if it's happening compared to the time and effort involved vs the likelihood that it would be fixed before I could figure it out and that's a ton of wasted effort swatting at a fly that's going to be dead in a day.
posted by zengargoyle at 10:58 AM on May 25, 2018


Response by poster: Oh my GOD, You'll laugh, you'll cry. This morning my touchpad is unresponsive. Keyboard works, touchscreen works, external USB mouse plugs-in and works, friggin' trackpad is frozen. After mad navigation my X is updated and I've rebooted and the trackpad worked for a minute or so before ceasing to respond.

My ass seems closer to going down the failing hardware hole.

GRAAAAAAAAAAAAAAAAAAAAAAAAARRRRRRRRRRRRRRRRrrrrrr.
posted by zengargoyle at 7:59 PM on May 25, 2018


My ass seems closer to going down the failing hardware hole.

Well, that sucks. Sympathy.

On the upside, it's Debian, so just transplanting your existing drive (or an image thereof) into a new computer will be all you need to do to get back to where you were.
posted by flabdablet at 8:04 PM on May 25, 2018


Response by poster: I'm back now on the previous kernel. :P
And it's still working so far... (so there is that hope at the moment that it may actually be down in the kernel drivers). But I've also had in the past the mysterious (needs two reboots for no reason I can explain because it shouldn't work that way). So I'm still unsure in a different way. Ha!

I have rsnapshot incremental backups of important bits happening every 4 hours going back years so no worries except for the actual possible death spiral of physical hardware.

My touchpad is now jittery as f and going in and out, but ATM starts responding again if I jiggle/bash on the trackpad a bit to confuse it enough to snap out of its delirium.

If only my video issue involved video AND sound stopping I could write off my previous troubles as a gone-mad touchpad randomly pausing the video behind my back. But as it is it has to be more like a flood of random events from a failing touchpad... Almost but not quite satisfactory.

But alas, now the touchpad is back to not working at all so at least I have something concrete to work with. Sadly it does remind me of previous laptop death cycles. Still, I'm surprisingly happier to have an honest broken thing to go up against.
posted by zengargoyle at 8:47 PM on May 25, 2018


Best answer: Sounds like that trackpad issue is just a loose connector. It happens sometimes. If your laptop is at all reasonable it should only take removing a few screws to pop off the keyboard and palm rest so you can reseat them. Either that or it needs a good cleaning with alcohol wipes unless there's a known driver issue.
posted by wierdo at 1:42 PM on May 26, 2018


Try booting Ubuntu 18.04 from a USB stick (without installing), and see if you experience the same issues.
posted by Sharcho at 2:17 PM on May 26, 2018


Response by poster: wierdo: good point. I'll have to track down some XPS teardown videos. I used to take my Latitude laptops apart enough, but haven't had occasion to tear into my XPS. Might even be able to source a replacement trackpad. (But from a quick glance... it looks like I'd need itty-bitty torx screwdrivers which I don't have.) :(

I'll probably do something like figure out xev/xinput again and try to peek at just what it is that the touchpad is doing so that if I ever fix it I'll have something to compare against.
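
A rough sketch of the first step, finding the touchpad's device id in the text that `xinput list` prints. It assumes the usual layout of that output (a device name followed by id=N) and that the device name contains the word "touchpad". The function name is hypothetical.

```python
import re

# Leading tree-drawing characters, then the device name, then "id=<number>".
_DEVICE_LINE = re.compile(r"^[^\w]*(.+?)\s+id=(\d+)")


def find_device_ids(xinput_list_output, keyword="touchpad"):
    """Return {device name: id} for devices whose name contains keyword."""
    devices = {}
    for line in xinput_list_output.splitlines():
        match = _DEVICE_LINE.match(line)
        if match and keyword.lower() in match.group(1).lower():
            devices[match.group(1)] = int(match.group(2))
    return devices
```

Feed it the output of `xinput list` (for example via subprocess.run(["xinput", "list"], capture_output=True, text=True).stdout). With the id in hand, `xinput test <id>` should print the raw events the device sends, which is the thing to save for later comparison, and `xinput disable <id>` takes the device out of the picture entirely.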

I'll probably have to order one of those fix-it toolkits with all the little special drivers for modern electronics. All I have is itty-bitty flat/phillips.
posted by zengargoyle at 3:26 PM on May 27, 2018


As much as I despise Walmart, they sell a screwdriver kit with a bunch of bits that include Torx bits all the way down to cell phone size. (T0, T1? I forget)

Very handy in a pinch since it's stocked at most stores.
posted by wierdo at 11:15 PM on May 27, 2018


Response by poster: Surprise! There is no Walmart anywhere near me. I think there's *one* somewhere. But down the road is All Electronics which I avoid because I will come out of there broke with a bunch of shiny things that I have no need for. :P

Torx is just something that is now a "missing tool" that I will acquire even if I don't use it because I flipped over my laptop and came to a tool that I don't have which irks me. </khaaaaann>

for the love of all that is holy don't bring up DAC and software oscilloscopes. unless you happen to have that solved....
posted by zengargoyle at 5:58 PM on May 28, 2018


Response by poster: I'm marking this as resolved. flabdablet and wierdo sorta hit that fine point that ends in either tearing apart my laptop and poking at it or more A/B diagnostics that can narrow things down by tweaking things and observation.

My initial gut feeling about GPU/Render is not quite completely satisfactory with the eventual progression that my trackpad is borking... but I can almost correlate the issues. And removing the trackpad from the equation has made the weird wrong apparently go away.

It's down to limp along with the accommodation of using USB mouse until my laptop dies. Tearing apart my laptop to fix it. Or tossing my laptop out the window and having the same laptop in a newer faster body as soon as shipping allows.
posted by zengargoyle at 6:12 PM on May 28, 2018

