Created attachment 141570 [details]
Compressed file with relevant files
When playing Dota 2 with Vulkan, a GPU hang happens frequently and, with a 4.18 or newer kernel, the system crashes a few seconds later. The GPU hang also occurs with 4.17.19, but the system crash does not.
The workaround is to set INTEL_DEBUG=nohiz in the environment, but it degrades performance.
The attached file contains the following:
The GPU hang log (2.6 MB);
The dmesg output, which unfortunately has nothing about the system crash;
One screenshot (both monitors) showing the screen when the GPU hang happened;
One screenshot (single monitor) showing Dota 2's video settings;
One sound file approximating the sound heard when the system crashed. Please note the sound is unpleasant to hear.
The GPU hang seems similar to https://bugs.freedesktop.org/show_bug.cgi?id=107760
Dota 2 is being affected by https://bugs.freedesktop.org/show_bug.cgi?id=107899 too.
Processor: Intel Core i3-6100U;
Video: Intel HD Graphics 520;
Mesa: 18.3.0-devel (git-914bd3014f);
Kernel version: drm-tip (feeccde66999c5e87be3550f2159e5d7eeb61c67)
Distribution: Xubuntu 18.04.1 amd64.
Yeah, we appear to have a HiZ bug that's crept in some time in the not-so-distant past. How reproducible is this? Can I go into a "test out a character" game and get a hang fairly quickly? If you've got a way to reliably reproduce the issue, it'll be much easier to fix.
The system crash is a separate issue and it's a kernel bug. Probably best to file another bug for that one so we can track them independently.
Created attachment 141574 [details]
Logs and screenshot
The hang happens when, for example, I try to see a character description. In the screenshot I sent, I had clicked to see Earthshaker's description. At the moment the 3D model should appear, the GPU hang happens. The steps are:
Open the game, then click to see the heroes, then choose one of them. Not all heroes trigger the GPU hang, but Earthshaker does, for example.
Another possibility is the Learn tab: in tutorial 2, the GPU hang happens when the Dragon Knight should appear.
One thing I noticed is that a few heroes, including the heroine Luna, do not cause the GPU hang. This is fortunate, as she's the first one that appears, so I have enough time to test.
The attached file has data from a hang using the 4.17.19 kernel, as anything newer causes the system to crash. I will report the system crash separately from this issue.
I've been able to successfully reproduce with Earthshaker. Unfortunately, I don't have a second machine with me right now (and won't for a couple weeks) so I can't really debug it as running the game brings down my dev laptop. I'll try to look at it in more detail first week of October. Wanted to give you an update so you don't think I'm ignoring you for three weeks.
On Skylake with kernel 4.15.0-33-generic, the hang bisected to this commit:
commit 79270d2140ec4fe5e4351f35150ed2d14687af07 (HEAD, refs/bisect/bad)
Author: Jason Ekstrand <email@example.com>
Date: Wed Jul 11 16:31:02 2018 -0700
anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV
We've had several broadwell hangs that have come down to this bit just
not working correctly. Most recently, we've had a pile of hangs
reported with apps running under DXVK:
Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.
Reviewed-by: Kenneth Graunke <firstname.lastname@example.org>
(cherry picked from commit abd629eb3d4027b89c13158e90c6732b412e550e)
Bah! Thanks for your bisection! I think we need to only do what that patch does for gen8 and then do the old thing on gen9.
Initial version: https://patchwork.freedesktop.org/patch/250989/
A v2 is also uploaded, but I will check it tomorrow.
After reverting that commit, the GPU hang no longer happens; I can see Earthshaker's description and Dragon Knight's tutorial with no problem. In fact, I could cycle through all heroes and no hang happened.
Removing INTEL_DEBUG=nohiz also increased the framerate significantly, from 28 to 105.
v2 works on Skylake and Kabylake
This should be fixed by the following commit in master:
commit 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5 (public/master)
Author: Sergii Romantsov <email@example.com>
Date: Wed Sep 19 19:21:11 2018 +0300
anv/skylake: disable ForceThreadDispatchEnable
On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.
-v2: enabling of ForceThreadDispatchEnable is only for gen8, for
gen9 and higher reverted enabling of PixelShaderHasUAV.
-v3 (Jason Ekstrand): Rework the comments a bit.
CC: Jason Ekstrand <firstname.lastname@example.org>
Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <email@example.com>
Reviewed-by: Jason Ekstrand <firstname.lastname@example.org>
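For readers following along, the per-generation split described in the v2 note above can be sketched roughly as below. This is only an illustration, not the actual Mesa code: the struct, the function name choose_ps_force_bits, and the want_ps_forced input are hypothetical, and the real driver decides when the pixel shader needs forcing using its own logic.

```c
#include <stdbool.h>

/* Hypothetical container for the two bits discussed in this thread.
 * The field names echo the hardware bits, but this is not a Mesa type. */
struct ps_force_bits {
    bool pixel_shader_has_uav;         /* 3DSTATE_PS_EXTRA::PixelShaderHasUAV */
    bool force_thread_dispatch_enable; /* ForceThreadDispatchEnable */
};

/* Pick which bit keeps the pixel shader running, by hardware generation.
 * gen is 8 for Broadwell and 9 for Skylake and Kaby Lake.
 * want_ps_forced is an assumed input: true when the driver has decided
 * the pixel shader must be dispatched. */
static struct ps_force_bits
choose_ps_force_bits(int gen, bool want_ps_forced)
{
    struct ps_force_bits bits = { false, false };

    if (!want_ps_forced)
        return bits;

    if (gen == 8) {
        /* gen8: PixelShaderHasUAV was tied to hangs, so force dispatch. */
        bits.force_thread_dispatch_enable = true;
    } else if (gen >= 9) {
        /* gen9+: ForceThreadDispatchEnable hangs Skylake, so go back
         * to setting PixelShaderHasUAV. */
        bits.pixel_shader_has_uav = true;
    }
    return bits;
}
```

With this sketch, a gen8 call sets only the force-dispatch bit and a gen9 call sets only PixelShaderHasUAV, which mirrors the behaviour the commit message describes.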
Before we decide that this is completely done and dusted, could you please try the following branch as well? It seems to fix Dota 2 for me.
I checked Dota 2 with your branch (commit 1a9cac2a8ef19be9c796fd78f6ed577086f2172d), and it hangs.
Created attachment 142070 [details]
Created attachment 142071 [details]
When I downloaded that branch, HEAD was 3c08f47027adab569e8f94d4c03c689c8f9cba69 and it had no GPU hangs. Apparently the commit was reverted after I had already started the download.
I'll have to test with 1a9cac2a8ef19be9c796fd78f6ed577086f2172d.