| Summary: | [skl,kbl] [drm] GPU hang in Valve games based on Source 1 | ||
|---|---|---|---|
| Product: | Mesa | Reporter: | Robert <frail.knight> | 
| Component: | Drivers/DRI/i965 | Assignee: | Jason Ekstrand <jason> | 
| Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> | 
| Severity: | normal | ||
| Priority: | medium | CC: | bockor, egorr.berd, fan4326, frail.knight, horst, intel-gfx-bugs, nicolaspok | 
| Version: | 17.2 | Keywords: | bisected, regression | 
| Hardware: | x86-64 (AMD64) | ||
| OS: | Linux (All) | ||
| See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=104163 | ||
| Whiteboard: | |||
| i915 platform: | KBL | i915 features: | GPU hang | 
| Bug Depends on: | |||
| Bug Blocks: | 103491 | ||
| Attachments: | GPU dump file, CSGO dump file and dmesg output; 2nd set - Triplet of Crashes; hack; Error state from sklgt4; crash dump | ||
| I'd also like to mention:

Dell XPS 13 9360 DE
Ubuntu (Xubuntu) 17.10 (in development with current updates)

I will gladly provide more information if there are any other questions regarding packages or version numbers.

Mesa: 17.2.0~rc4-0ubuntu3
xserver-xorg-video-intel: 2:2.99.917+git20170309-0ubuntu1

Possibly related to bug #102226.

Created attachment 133915 [details]
2nd set - Triplet of Crashes

FYI, it seems I've cornered the cause of the crashing. If I set Multisampling Anti-Aliasing Mode to None, every time I join a server and view the first in-game [Continue] banner screen, the game will crash. If I set Multisampling Anti-Aliasing Mode to 2xMSAA, the game will load and play just fine.

Issue still present with 17.2.0-0ubuntu1.

There are reports of possibly the same issue being solved by downgrading from 17.2.0 to 17.1.8-2: https://github.com/ValveSoftware/csgo-osx-linux/issues/1523
Maybe that helps in narrowing down the issue to a change between versions.

Modifying Multisampling Anti-Aliasing Mode from 2xMSAA (which works fine) to None in Team Fortress 2 also crashes that game immediately.

FYI, reproduced on my KBL too. Team Fortress 2 seems to trigger this quite easily.

Confirming what others have said: this issue is only present with Mesa 17.2.0 with MSAA disabled in the game settings. The issue is not present with Mesa 17.1.8 or with MSAA set to "2x MSAA" in the game settings.

Still present in the official release of Ubuntu 17.10 along with Mesa 17.2.2.

Can confirm the exact same issue for Counter-Strike: Source via Steam on Intel Corporation HD Graphics 520 (i915) on Ubuntu 17.10 (kernel 4.13.0-16-generic). Setting Aliasing Mode to 2xMSAA also helped (how did you even find out that it does?).

I was trying various settings to see if anything helped alleviate the crashes, and got lucky rather early with making that single change. :)

Could someone else give a test with latest Mesa master? I played TF2 for a while now without MSAA and did not reproduce the hang.

(In reply to Tapani Pälli from comment #14)
> Could someone else give a test with latest Mesa master? I played TF2 for a
> while now without MSAA and did not reproduce the hang.

Forget about that, just reproduced it again :/ Will attempt a bisect later.
Kisak-valve had performed a bisect here: https://github.com/ValveSoftware/csgo-osx-linux/issues/1509#issuecomment-339126634
Also looks like he contacted a dev?

(In reply to Robert from comment #16)
> Kisak-valve had performed a bisect here:
> https://github.com/ValveSoftware/csgo-osx-linux/issues/1509#issuecomment-339126634
>
> Also looks like he contacted a dev?

The commit he bisected to is:

commit 3e57e9494c2279580ad6a83ab8c065d01e7e634e
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Wed Jun 21 21:35:07 2017 -0700

i965: Enable regular fast-clears (CCS_D) on gen9+

Reassigning.

Created attachment 135206 [details] [review]
hack

This hack applies on top of the bisected commit. With CCS_E commented out, cannot reproduce the hang.

(In reply to Tapani Pälli from comment #18)
> Created attachment 135206 [details] [review] [review]
> hack
>
> This hack applies on top of bisected commit. With CCS_E commented out,
> cannot reproduce the hang.

with the caveat that I was not playing for ~2 hours .. but with Team Fortress 2 this happens typically very fast.

Created attachment 135218 [details]
Error state from sklgt4

I did a little looking at this and can repro with TF2. I pulled two error states and both seem to be the third PIPE_CONTROL after a stream of 3DPRIMITIVE calls, each of which draws a single quad. The 3DPRIMITIVE is writing PS depth count. I have no idea how much of that information is useful yet.

Jason has suggested reverting 3e57e9494c2279580ad6a83ab8c065d01e7e634e for mesa 17.3

This one still needs a partial-revert from Jason

*** Bug 103973 has been marked as a duplicate of this bug. ***

This should be fixed by:

commit ee57b15ec764736e2d5360beaef9fb2045ed0f68
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Wed Nov 29 16:22:42 2017 -0800

i965: Disable regular fast-clears (CCS_D) on gen9+

This partially reverts commit 3e57e9494c2279580ad6a83ab8c065d01e7e634e which caused a bunch of GPU hangs on several Source titles. To date, we have no clue why these hangs are actually happening. This undoes the final effect of 3e57e9494c227 and gets us back to not hanging. Tested with Team Fortress 2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102435
Fixes: 3e57e9494c2279580ad6a83ab8c065d01e7e634e
Cc: mesa-stable@lists.freedesktop.org

If not, please reopen. Thanks for the reports and your patience!

*** Bug 104223 has been marked as a duplicate of this bug. ***

A proper fix for this is now on the mailing list: https://patchwork.freedesktop.org/series/35325/
With that, I can now run TF2 just fine on SKL with CCS_E re-enabled for sRGB.

*** Bug 104324 has been marked as a duplicate of this bug. ***

We also have this one: https://bugs.freedesktop.org/show_bug.cgi?id=103509

Just had a similar crash:

[90130.822757] [drm] GPU HANG: ecode 9:0:0x84dfbffc, in X [10017], reason: Hang on rcs0, action: reset
[90130.822759] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[90130.822760] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[90130.822760] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[90130.822760] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[90130.822761] [drm] GPU crash dump saved to /sys/class/drm/card1/error
[90130.822766] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[90142.849781] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[90156.801766] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[90166.849852] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[90178.817831] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[90179.535886] nouveau 0000:01:00.0: disp: 0x00006820[0]: INIT_GENERIC_CONDITON: unknown 0x07
[90180.623289] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.

Gentoo Linux (Lenovo Thinkpad P51):
mesa-17.3.3
gentoo-sources-4.15.3
xorg-server-1.19.5
Using modesetting driver.

Created attachment 137394 [details]
crash dump

Unless this happened while playing a Valve game, it is likely completely unrelated. Please file a new bug and include enough details to reproduce. | 
Created attachment 133817 [details]
GPU dump file, CSGO dump file and dmesg output

CSGO crashed after playing ~2 hours in and out of matches. The following was reported in dmesg:

[ 7987.649974] [drm] GPU HANG: ecode 9:0:0x86df7cf9, in csgo_linux64 [4947], reason: Hang on rcs, action: reset
[ 7987.649976] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 7987.649978] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 7987.649979] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 7987.649980] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 7987.649981] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 7987.650057] drm/i915: Resetting chip after gpu hang
[ 7987.650622] [drm] RC6 on
[ 8001.652386] drm/i915: Resetting chip after gpu hang
[ 8001.652537] [drm] RC6 on
[ 8013.652392] drm/i915: Resetting chip after gpu hang
[ 8013.652531] [drm] RC6 on
[ 8027.636176] drm/i915: Resetting chip after gpu hang
[ 8027.636314] [drm] RC6 on
[ 8038.644153] drm/i915: Resetting chip after gpu hang
[ 8038.644306] [drm] RC6 on
[ 8038.843763] show_signal_msg: 65 callbacks suppressed
[ 8038.843765] csgo_linux64[5008]: segfault at 1338 ip 00007f04bfe3f2a9 sp 00007f0444182710 error 6 in client_client.so[7f04bf1c6000+17cf000]

I've included this as well as the GPU crash dump in the attachment.
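For anyone comparing their own logs against the hangs reported here, the "GPU HANG" line carries the fields that matter for triage: the error code, the process that was running, and the engine that hung. The following is a rough Python sketch, not part of any Mesa or kernel tooling, that pulls those fields out of a dmesg line. The regular expression is an assumption based only on the two log excerpts in this report, and the names `GpuHang`, `parse_gpu_hang`, and `find_gpu_hangs` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class GpuHang(NamedTuple):
    """Fields extracted from one i915 'GPU HANG' dmesg line."""
    timestamp: float
    ecode: str
    process: str
    pid: int
    engine: str
    action: str


# Pattern inferred from the dmesg excerpts in this bug report only;
# other kernel versions may format the line differently.
_HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: Hang on (?P<engine>\w+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed hang record, or None if the line is not a hang line."""
    m = _HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        timestamp=float(m["ts"]),
        ecode=m["ecode"],
        process=m["proc"],
        pid=int(m["pid"]),
        engine=m["engine"],
        action=m["action"],
    )


def find_gpu_hangs(dmesg_text: str) -> List[GpuHang]:
    """Collect every hang record found in a multi-line dmesg capture."""
    hangs = []
    for line in dmesg_text.splitlines():
        hang = parse_gpu_hang(line)
        if hang is not None:
            hangs.append(hang)
    return hangs
```

Feeding it the output of `dmesg` saved to a file shows quickly whether the hanging process is a Source title such as `csgo_linux64` or something unrelated such as `X`, which is the distinction the maintainers ask about before treating a report as a duplicate of this bug.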