Dell Latitude e7450
Broadwell HD Graphics 5500
mpv -hwdec=vaapi -vo=vaapi video.mkv no longer works with Mesa 17.2. It shows a black window and no video plays, although audio and the app UI work fine. There are no problems with Mesa 17.1.9. It also affects Chromium when it is patched to enable hardware acceleration via VA-API.
There are no errors in dmesg, in the mpv output, or in /sys/class/drm/card0/error.
Bisected Mesa and found that VA-API stopped working with commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b.
If I revert that commit's changes to brw_state_upload.c on top of the 17.2 HEAD, VA-API starts working again.
Just a heads up: the recommended vo= option is vo=opengl as far as I know. Does the problem still occur when using that?
It plays fine with the vo=opengl option.
This bug also affects Chromium compiled with the VA-API patch for hardware video acceleration. So far at least 6 Arch Linux users have confirmed it.
Chromium sometimes just renders videos as a black rectangle, sometimes freezes entirely, and once even crashed X.
Someone reported that reverting the same change @alim mentioned fixes the issue.
Non-exhaustive list of affected Intel CPUs: i7-6500U, i7-7500U, i7-7820HQ.
If this can't / shouldn't be reverted, is there a way to compile Chromium in a different way so that we don't experience this issue?
How we compile Chromium with VA-API today:
History of comments discussing this issue:
I'll ask affected people to CC themselves on this bug, so you can better see how many people are affected.
Oh dear. I was pretty sure that register is supposed to be context saved and restored, so it shouldn't affect anything else. But, I guess it must be...
I'll have to look into this. It's XDC this week, though, so it may take a bit longer...
I tried this on Kabylake tonight, and I get a black window with Mesa master, commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b (the bisected commit), commit 86bd3fd864a8383e1d6823114da422f6a948bf1e (the one before that), and 17.1. So, it doesn't seem like it ever worked.
-vo=gl continues to work fine, just -vo=vaapi doesn't.
I'll have to try on Broadwell...
I'm on Kabylake myself and I can confirm that mpv --vo=vaapi --hwdec=vaapi works with mesa 17.1.8, but not with 17.2.0. On the other hand --vo=opengl --hwdec=vaapi works with both versions.
Now, for Chromium built with vaapi, accelerated video playback (testing with YouTube) works with 17.1.8 but not with 17.2.0. With the latter version, all that's presented is a black video with working audio.
So, trying with Mesa 17.1 on Kabylake...mpv renders black with a theora video, GPU hangs immediately with a VC1 video, and with H264, renders garbage colors and/or black and GPU hangs. The GPU hangs are definitely a vaapi batch, not from Mesa.
I don't think this is a Mesa bug. It seems like there are a lot of bugs in vaapi. And the setting my patch changed is definitely supposed to be per-context, so Mesa changing it shouldn't affect vaapi at all. vaapi probably doesn't use contexts explicitly, but I thought the kernel was supposed to give every process a context by default, to prevent state leaks like this.
Thanks for looking into this, Kenneth!
How do you suggest we proceed from here?
For example, could you file a bug with VA-API and revert your change to Mesa until VA-API is fixed?
There are a bunch of us who are holding off on upgrading to mesa-17.2, and there are probably others who hit this issue without knowing what is causing it.
My GPU is Intel Gen5. The bug causes Chrome tabs to freeze completely.
Steps to reproduce:
1. install mesa-17.2
2. install chromium-vaapi-bin
3. enable vaapi acceleration
4. open a HTML5 video
5. expected: smooth playback; actual: the whole process freezes
(In reply to nanericwang from comment #9)
> My GPU is Intel Gen5. The bug causes the chrome tabs totally freeze.
Just to clarify, by "Gen5" you mean "5th generation Core Processor", right? (Naming is hard, 5th Gen CPUs have Gen8 GPUs...)
(In reply to Kenneth Graunke from comment #10)
> (In reply to nanericwang from comment #9)
> > My GPU is Intel Gen5. The bug causes the chrome tabs totally freeze.
> Just to clarify, by "Gen5" you mean "5th generation Core Processor", right?
> (Naming is hard, 5th Gen CPUs have Gen8 GPUs...)
I mean the 5th Gen GPU, Ironlake.
(In reply to nanericwang from comment #11)
> I mean the 5th Gen GPU, Ironlake.
Okay, the patch in question here only affects Gen8+ (Broadwell and later). So you must be hitting a different bug (with similar symptoms, but almost certainly a different cause). Please file a separate report. Thanks.
(In reply to Kenneth Graunke from comment #12)
> (In reply to nanericwang from comment #11)
> > I mean the 5th Gen GPU, Ironlake.
> > https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Fifth_generation
> Okay, the patch in question here only affects Gen8+ (Broadwell and later).
> So you must be hitting a different bug (with similar symptoms, but almost
> certainly a different cause). Please file a separate report. Thanks.
Report filed at:
Daniel reminded me that X (Glamor) and GNOME Shell use Mesa, so my earlier testing wasn't completely on Mesa 17.1. I can now confirm the report (on Kabylake).
mpv -hwdec=vaapi -vo=vaapi did indeed work with Mesa 17.1. It misrenders or GPU hangs with 17.2 and later (due to "i965: Switch to absolute addressing for constant buffer 0").
I ran a few more experiments:
- Starting X/Glamor with 17.1 without a compositor...
  - mpv -hwdec=vaapi -vo=vaapi appears to work with any Mesa version
  - Running Piglit using Mesa 17.2+ concurrently with a working mpv works fine...both mpv and Piglit work fine.
I think this suggests that CS_DEBUG_MODE2 is indeed saved and restored as part of the context - Glamor and mpv (on 17.1) would expect relative mode, and Piglit (on 17.2+) would expect absolute mode. And both worked. Daniel had suggested there might be a bug where setting the mode made it take effect on all rings, and that seems to not be the case either.
- Starting X/Glamor with 17.2+ without a compositor...
  - mpv -hwdec=vaapi -vo=vaapi misrenders and GPU hangs.
My theory is that new contexts are inheriting the state from X/Glamor's context. I'd say maybe it's because it gets the fd from X via DRI3...but that doesn't make much sense, because libva-intel-driver doesn't use DRI3. Maybe it's because X is the first client on the system? Or it's drm master? Or runs just before mpv?
Note that libva-intel-driver doesn't create its own context. But I think the kernel ought to make one for it, when it opens the fd...
Daniel, Chris, do you have any ideas?
Chris seems to think that we should patch libva (and possibly beignet) to initialize this value.
In the meantime, I've posted a patch for Mesa 17.2.x that reverts the offending commit...
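Chris's suggestion amounts to having libva (and Beignet) program the mode bit itself instead of trusting whatever value its context happened to inherit. A minimal sketch of the arithmetic involved, assuming (as is common for Intel GPU mode registers) that the register uses masked writes, where the upper 16 bits of the written value select which of the lower 16 bits change. The bit position OFFSET_DISABLE_BIT and both helper names are hypothetical; a real driver would emit the value in a batch (for example with MI_LOAD_REGISTER_IMM) using the register offset and bit from the hardware documentation.

```python
# Hypothetical sketch, not libva code: building and applying the immediate for
# a "masked" register write, where bits 31:16 select which of bits 15:0 change.

# Hypothetical bit position (assumed: set = absolute mode), illustration only.
OFFSET_DISABLE_BIT = 1 << 4


def masked_value(set_bits=0, clear_bits=0):
    """Immediate that sets `set_bits` and clears `clear_bits`, nothing else."""
    mask = set_bits | clear_bits
    if mask >= 1 << 16 or set_bits & clear_bits:
        raise ValueError("bits must be distinct and fit in the low 16 bits")
    return (mask << 16) | set_bits


def apply_masked_write(old, value):
    """Register contents after the write: bits outside the mask are preserved."""
    mask = value >> 16
    return (old & ~mask & 0xFFFF) | (value & mask)


# A client that relies on relative (offset) mode explicitly clears the bit,
# regardless of what it inherited from the previously running context.
want_relative = masked_value(clear_bits=OFFSET_DISABLE_BIT)
print(hex(want_relative))                              # 0x100000
print(hex(apply_masked_write(0x0031, want_relative)))  # 0x21
```

Because only the masked bit changes, such a client could force its expected mode without disturbing any other bits in the register.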
I experience the same issue on my Skylake laptop (Dell XPS 15 9550) and also on a Kaby Lake laptop (Dell XPS 15 9560). Works with mesa 17.1.8, not with mesa 17.2.0 and upwards.
Hey Kenneth, has there been any traction on reverting and/or fixing the code during the past 3 weeks? It looks like the thread you created didn't receive any attention whatsoever.
(In reply to Maxim Baz from comment #17)
> Hey Kenneth, has there been any traction on reverting and/or fixing the code
> during the past 3 weeks? Looks like the thread your created  didn't
> receive any attention whatsoever.
True that. Last week I sent another patch to revert it on master, which stirred up a few flames, and Kristian apparently convinced at least -some- kernel people that this needs to be fixed in the kernel. Whether anything will happen on that front, I have no idea.
In the meantime, I've pushed my revert to master, with the appropriate marking for stable branches. Emil / Andres should pick it up for the 17.3 / 17.2 stable branches soon.
Thanks for your patience, and sorry about this mess. :(
Fixed in master with:
Author: Kenneth Graunke <firstname.lastname@example.org>
Date: Thu Oct 19 14:38:30 2017 -0700
i965: Revert absolute mode for constant buffer pointers.
The kernel doesn't initialize the value of the INSTPM or CS_DEBUG_MODE2
registers at context initialization time. Instead, they're inherited
from whatever happened to be running on the GPU prior to first run of a
new context. So, when we started setting these, other contexts in the
system started inheriting our values. Since this controls whether
3DSTATE_CONSTANT_* takes a pointer or an offset, getting the wrong
setting is fatal for almost any process which isn't expecting this.
Unfortunately, VA-API and Beignet don't initialize this (nor does older
Mesa), so they will die horribly if we start doing this. UXA and SNA
don't use any push constants, so they are unaffected.
Until we have some kind of solution to this problem, I'm going to revert
this patch and abandon using the feature for now. It will lead to fewer
pushed UBO ranges on Broadwell+, which may lead to lower performance,
though I don't have any data on the impact.
Cc: "17.3 17.2" <email@example.com>
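The inheritance mechanism described in the commit message, together with the X/Glamor, mpv, and Piglit experiments above, can be illustrated with a toy model. This is a hypothetical Python sketch, not kernel or driver code. ToyGPU, the assumed power-on default of relative mode, and the abstract base address are illustrative assumptions. The model shows that mpv works when Glamor runs Mesa 17.1, breaks when Glamor runs Mesa 17.2+, and keeps working when a Mesa 17.2+ Piglit starts alongside an already-running mpv.

```python
# Toy model, NOT real kernel or driver code: one per-context mode bit that is
# saved/restored on context switch, but whose initial value for a brand-new
# context is simply whatever the hardware currently holds.

RELATIVE = "relative"  # 3DSTATE_CONSTANT_* field is treated as an offset
ABSOLUTE = "absolute"  # 3DSTATE_CONSTANT_* field is treated as a pointer


class ToyGPU:
    def __init__(self, power_on_mode=RELATIVE):
        self.hw_mode = power_on_mode  # assumed power-on default
        self.context_images = {}      # context name -> saved mode

    def run(self, ctx, set_mode=None):
        """Run one batch for `ctx`; optionally program the mode bit."""
        if ctx in self.context_images:
            # Existing context: its saved value is restored.
            self.hw_mode = self.context_images[ctx]
        # New context: nothing is restored, so it inherits self.hw_mode.
        if set_mode is not None:
            self.hw_mode = set_mode
        # Save the value into the context image on switch-out.
        self.context_images[ctx] = self.hw_mode
        return self.hw_mode


def constant_buffer_address(mode, field, base):
    """How the field is interpreted (which base address is left abstract)."""
    return field if mode == ABSOLUTE else base + field


def mpv_works(glamor_mode):
    """X/Glamor runs first; VA-API never programs the bit, expects RELATIVE."""
    gpu = ToyGPU()
    gpu.run("X/Glamor", set_mode=glamor_mode)  # None models Mesa 17.1
    return gpu.run("mpv (VA-API)") == RELATIVE


def piglit_alongside_working_mpv():
    """A Mesa 17.2+ Piglit starts after mpv is already running."""
    gpu = ToyGPU()
    gpu.run("X/Glamor")                   # Mesa 17.1: never sets the bit
    gpu.run("mpv (VA-API)")
    gpu.run("piglit", set_mode=ABSOLUTE)  # Mesa 17.2+ selects absolute mode
    return gpu.run("mpv (VA-API)") == RELATIVE


print(mpv_works(None))                  # True  (Glamor on Mesa 17.1)
print(mpv_works(ABSOLUTE))              # False (Glamor on Mesa 17.2+)
print(piglit_alongside_working_mpv())   # True
```

In this model, a wrong inherited mode means a client like VA-API ends up with a constant buffer address computed the wrong way, which fits the misrendering and GPU hangs reported above.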