Bug 107269 - [kbl guc] GPU hang after suspend with 4.18.0-041800rc5-generic
Summary: [kbl guc] GPU hang after suspend with 4.18.0-041800rc5-generic
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-18 02:15 UTC by Troels Liebe Bentsen
Modified: 2018-08-30 07:24 UTC
CC List: 2 users

See Also:
i915 platform: KBL
i915 features: GPU hang


Attachments
cat /sys/class/drm/card0/error > card0.dump (66.43 KB, text/plain)
2018-07-18 02:15 UTC, Troels Liebe Bentsen

Description Troels Liebe Bentsen 2018-07-18 02:15:22 UTC
Created attachment 140682
cat /sys/class/drm/card0/error > card0.dump

[  147.838566] [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [1910], reason: hang on rcs0, action: reset
[  147.838568] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  147.838569] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  147.838570] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  147.838570] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  147.838571] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  147.838596] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[  147.849982] i915 0000:00:02.0: Resetting chip for hang on rcs0
[  147.863346] [drm] HuC: Loaded firmware i915/kbl_huc_ver02_00_1810.bin (version 2.0)
[  147.864795] [drm] GuC: Loaded firmware i915/kbl_guc_ver9_39.bin (version 9.39)
[  147.864827] i915 0000:00:02.0: GuC firmware version 9.39
[  147.864828] i915 0000:00:02.0: GuC submission enabled
[  147.864829] i915 0000:00:02.0: HuC enabled
[  155.831526] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[  156.146966] rfkill: input handler enabled
[  160.014511] rfkill: input handler disabled
[  163.827649] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[  171.827675] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[  179.827534] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[  179.885574] rfkill: input handler enabled
[  187.950648] rfkill: input handler disabled
Comment 1 Chris Wilson 2018-07-18 06:47:12 UTC
That's definitely odd. It loaded the context but didn't update the RING_TAIL:

rcs0 command stream:
  IDLE?: no
  START: 0x00191000
  HEAD:  0x000002a0 [0x000002a8]
  TAIL:  0x000002a0 [0x00000308, 0x00000328]
  CTL:   0x00003000
  MODE:  0x00000200
  HWS:   0xfedcf000
  ACTHD: 0x00000000 000002a0
  IPEIR: 0x00000000
  IPEHR: 0x00000000

Oh guc. Don't do that.
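
The observation above can be checked mechanically. Below is a minimal sketch, assuming the first value on the HEAD and TAIL lines is the hardware register and the bracketed values are the positions recorded for the hanging request (an assumption about the dump layout, not something stated in this report). The helper names `parse_ring_state` and `tail_not_advanced` are hypothetical, and ring wrap-around is ignored.

```python
import re

# Excerpt of the rcs0 block quoted in this comment.
SAMPLE = """\
rcs0 command stream:
  START: 0x00191000
  HEAD:  0x000002a0 [0x000002a8]
  TAIL:  0x000002a0 [0x00000308, 0x00000328]
"""

# Matches lines such as "  TAIL:  0x000002a0 [0x00000308, 0x00000328]".
_REG_LINE = re.compile(r"^\s*(HEAD|TAIL):\s+(0x[0-9a-fA-F]+)\s+\[([^\]]*)\]")


def parse_ring_state(dump_text):
    """Return {"HEAD": (register, [bracketed values]), "TAIL": (...)}."""
    state = {}
    for line in dump_text.splitlines():
        m = _REG_LINE.match(line)
        if m:
            name, reg, extras = m.groups()
            values = [int(v.strip(), 16) for v in extras.split(",") if v.strip()]
            state[name] = (int(reg, 16), values)
    return state


def tail_not_advanced(state):
    """True if the ring looks empty to the hardware (HEAD == TAIL) while the
    recorded request positions lie beyond TAIL.

    Assumption: bracketed values are request offsets; wrap-around is ignored.
    """
    head_reg, _ = state["HEAD"]
    tail_reg, request_tails = state["TAIL"]
    return head_reg == tail_reg and any(t > tail_reg for t in request_tails)


if __name__ == "__main__":
    print(tail_not_advanced(parse_ring_state(SAMPLE)))
```

For the values in this dump, HEAD and TAIL are both 0x2a0 while the recorded request positions are 0x308 and 0x328, so the check flags the ring as not having had its tail advanced.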
Comment 2 James Ausmus 2018-07-18 22:52:49 UTC
Please try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot.
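
For reference, drm.debug is a bitmask of DRM debug categories. The sketch below decodes 0x1e, assuming the bit assignments of the DRM_UT_* categories in the kernel's drm_print.h for kernels of roughly this era; later kernels define further bits that are omitted here. The helper name `decode_drm_debug` is hypothetical.

```python
# Bit-to-category mapping, assumed from the DRM_UT_* definitions in
# drm_print.h (kernels around 4.18). Newer kernels add more categories.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
}


def decode_drm_debug(mask):
    """Return the category names enabled by a drm.debug mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x1e))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under that assumption, 0x1e turns on driver, KMS, PRIME and atomic messages while leaving the core and vblank categories off.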
Comment 3 Jani Saarinen 2018-08-13 09:51:57 UTC
Reporter, any luck testing the latest drm-tip (https://cgit.freedesktop.org/drm-tip)? If so, please send the dmesg captured with drm.debug=0x1e log_buf_len=4M.
Comment 4 Troels Liebe Bentsen 2018-08-14 16:29:42 UTC
Sorry, work got in the way.

I have now installed this kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/

Now I am hoping for a hang so that I have something to report back.
Comment 5 Jani Saarinen 2018-08-15 06:27:55 UTC
OK, thanks.
Comment 6 Lakshmi 2018-08-30 07:13:03 UTC
Reporter, do you still see the issue with the latest drm-tip? If not, I can close the bug.
Comment 7 Troels Liebe Bentsen 2018-08-30 07:16:41 UTC
I have not been able to trigger the bug again, so let's do that.
Comment 8 Lakshmi 2018-08-30 07:24:23 UTC
Closing this bug as it works fine with the latest drm-tip. It can be reopened if the issue occurs again.

