kernel-4.9.0-0.rc8.git0.1.fc26.x86_64
libwayland-server-1.12.0-1.fc25.x86_64

Reproducible: non-deterministic, but it is a regression; it never happens with 4.8.x kernels. Uncertain whether Wayland is the trigger, since I'm pretty much only using Wayland.

Summary: Walk away from the laptop for some period; upon return there's a traceback on the screen and the system is unresponsive. I can't get to a VT and I can't log in remotely via ssh. Must be hard reset.
Created attachment 128414 [details] photo of crashed system call trace
Created attachment 128415 [details] journal full sudo journalctl -b -o short-monotonic
Created attachment 128416 [details] journal kernel sudo journalctl -b -o short-monotonic -k
Created attachment 128417 [details] lspci -vvnn

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 520 [8086:1916] (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company Device [103c:81a0]
First instance in the journal where there's a problem:

[31603.149518] f25h kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [7105], reason: Hang on render ring, action: reset

That comes before the cell phone photo, and the journal goes to [31723.152073], at which point there's a total crash and call trace. So it doesn't look like much information was lost in the log itself, but there was no way to get the GPU crash dump saved to /sys/class/drm/card0/error.
(In reply to bugzilla from comment #5)
> First instance in the journal there's a problem...
>
> [31603.149518] f25h kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in
> gnome-shell [7105], reason: Hang on render ring, action: reset
>
> That comes before the cell phone photo; and the journal goes to
> [31723.152073] at which point there's a total crash and call trace. So it
> doesn't look like much information was lost in the log itself; but there was
> no way to get the GPU crash dump saved to /sys/class/drm/card0/error.

bugzilla@colorremedies.com, it would be really interesting to get this error crash dump. It would also be useful if you could enable more verbose drm logging by adding "drm.debug=0x1e log_buf_len=1M" to your boot command line and then attach the kernel log after the issue happens again.

I also recommend using the latest firmware (GuC loading is reported as skipped in your kernel log); you can download it directly from https://01.org/linuxgraphics/intel-linux-graphics-firmwares

You could also try "i915.enable_rc6=0" on your boot command line and see whether the issue still occurs.
[ 1272.199182] [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [1556], reason: Hang on render ring, action: reset
[ 1272.199195] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1272.199201] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1272.199206] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1272.199210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1272.199216] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1272.199377] drm/i915: Resetting chip after gpu hang
[ 1272.201224] [drm] RC6 on
[ 1272.213478] [drm] GuC firmware load skipped
[ 1321.222138] drm/i915: Resetting chip after gpu hang
[ 1321.222547] [drm] RC6 on
[ 1321.239452] [drm] GuC firmware load skipped
[ 1333.254100] drm/i915: Resetting chip after gpu hang
[ 1333.254507] [drm] RC6 on
[ 1333.270956] [drm] GuC firmware load skipped
[ 1343.238096] drm/i915: Resetting chip after gpu hang
[ 1343.238500] [drm] RC6 on
[ 1343.257458] [drm] GuC firmware load skipped
[ 1343.274295] do_trap: 222 callbacks suppressed
[ 1343.274298] traps: gnome-software[1845] trap int3 ip:7fb92e8d7a21 sp:7fffd0941ca0 error:0
[ 1343.274303]  in libglib-2.0.so.0.5000.2[7fb92e888000+110000]
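For anyone triaging logs like the one above, a small script can pull out the hang line and count how many chip resets followed it. This is only a sketch: the regular expression is written against the message format seen in this report, and the function name `summarize_hangs` is made up for illustration. Other kernel versions may word these messages differently.

```python
import re

# Pattern written against the "GPU HANG" line as it appears in this report;
# the exact wording is an assumption for other kernel versions.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)
RESET_MSG = "Resetting chip after gpu hang"


def summarize_hangs(log_text):
    """Return (list of hang records, number of chip resets) for a kernel log."""
    hangs = [m.groupdict() for m in HANG_RE.finditer(log_text)]
    resets = log_text.count(RESET_MSG)
    return hangs, resets


# Two lines copied from the log above, plus one later reset.
sample = """\
[ 1272.199182] [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [1556], reason: Hang on render ring, action: reset
[ 1272.199377] drm/i915: Resetting chip after gpu hang
[ 1321.222138] drm/i915: Resetting chip after gpu hang
"""

hangs, resets = summarize_hangs(sample)
for hang in hangs:
    print(hang["proc"], hang["pid"], hang["ecode"], hang["reason"])
print("resets:", resets)
```

Feeding it the output of `journalctl -b -k` saved to a file would give the same summary for a whole boot.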
Created attachment 128437 [details] drm card error

# cat /sys/class/drm/card0/error

This time there was no hang or freeze. This might be a duplicate of bug 98488.
/lib/firmware/i915/skl_dmc_ver1_26.bin and /lib/firmware/i915/skl_guc_ver6_1.bin are already present on the system. Their sha256sums match those of the two binaries listed for Skylake CPUs at https://01.org/linuxgraphics/intel-linux-graphics-firmwares

So I don't understand why there's a skipped message. Does the firmware need to be in the initramfs? 'sudo lsinitrd /boot/initramfs-4.9.0-0.rc8.git0.1.fc26.x86_64.img skl' returns nothing, so maybe that's the problem.
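The checksum comparison described above can also be scripted. The sketch below uses Python's standard `hashlib`; the helper names `sha256_of` and `matches` are hypothetical, and the expected digest has to be copied by hand from the firmware download page, since this thread does not quote it.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (the digest placeholder must be replaced with the published value):
# matches("/lib/firmware/i915/skl_guc_ver6_1.bin", "<digest from the download page>")
```

A match only shows the file on disk is the published one; it says nothing about whether the driver actually loads it.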
Nope, they are in the initramfs.

[chris@f25h skl_guc_ver6_1]$ sudo lsinitrd /boot/initramfs-4.9.0-0.rc8.git0.1.fc26.x86_64.img | grep skl
-rw-r--r-- 1 root root 8928 Sep 23 05:51 usr/lib/firmware/i915/skl_dmc_ver1_26.bin
-rw-r--r-- 1 root root 129024 Sep 23 05:51 usr/lib/firmware/i915/skl_guc_ver6_1.bin
[chris@f25h i915]$ modinfo i915 | grep guc
firmware: i915/kbl_guc_ver9_14.bin
firmware: i915/bxt_guc_ver8_7.bin
firmware: i915/skl_guc_ver6_1.bin
parm: enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm: enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm: guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)

Looks like it's set to not load this firmware by default.
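The defaults can be read out of the `parm:` descriptions mechanically. The following sketch assumes the two "default" markers seen in the output above (`[default]` and `(default)`) and would miss parameters that describe their default in some other way; `parm_defaults` is a made-up helper name.

```python
import re

PARM_RE = re.compile(r"^parm:\s+(?P<name>\w+):(?P<desc>.*)$")
# Matches e.g. "0=never [default]" or "-1:disabled (default)".
DEFAULT_RE = re.compile(r"(-?\d+)\s*[=:]\s*[^,()\[\]]*?\s*[\[(]default[\])]")


def parm_defaults(modinfo_text):
    """Map each module parameter to the value its description marks as default."""
    defaults = {}
    for line in modinfo_text.splitlines():
        parm = PARM_RE.match(line.strip())
        if not parm:
            continue  # skip "firmware:" and other non-parameter lines
        default = DEFAULT_RE.search(parm.group("desc"))
        if default:
            defaults[parm.group("name")] = int(default.group(1))
    return defaults


# Lines copied from the modinfo output above.
sample = """\
firmware: i915/skl_guc_ver6_1.bin
parm: enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm: enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm: guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
"""

print(parm_defaults(sample))
```

For this kernel build the result agrees with the reading above: GuC loading and submission default to 0 (never), which would explain the "GuC firmware load skipped" messages.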
Freeze/hang happened again just now. Black screen with a mouse arrow that doesn't move; can't get to a VT either. This is all that's in the journal after rebooting.

Dec 13 15:22:11 f25h kernel: [drm] GPU HANG: ecode 9:0:0xfffffffe, in gnome-shell [1584], reason: Hang on render ring, action: reset
Dec 13 15:22:11 f25h kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 13 15:22:11 f25h kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 13 15:22:11 f25h kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 13 15:22:11 f25h kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Dec 13 15:22:11 f25h kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Dec 13 15:22:11 f25h kernel: drm/i915: Resetting chip after gpu hang
Dec 13 15:22:11 f25h kernel: [drm] RC6 on
Dec 13 15:22:11 f25h kernel: [drm] GuC firmware load skipped
Dec 13 15:22:23 f25h kernel: drm/i915: Resetting chip after gpu hang
Dec 13 15:22:23 f25h kernel: [drm] RC6 on
Dec 13 15:22:23 f25h kernel: [drm] GuC firmware load skipped
Dec 13 15:23:00 f25h kernel: traps: gnome-terminal-[1997] trap int3 ip:7fc638631a21 sp:7ffe8ca59e60 error:0
Dec 13 15:23:00 f25h kernel:  in libglib-2.0.so.0.5000.2[7fc6385e2000+110000]
Dec 13 15:23:00 f25h kernel: traps: nautilus[5974] trap int3 ip:7f5beb288a21 sp:7ffdf124aec0 error:0
Dec 13 15:23:00 f25h kernel:  in libglib-2.0.so.0.5000.2[7f5beb239000+110000]
Dec 13 15:23:00 f25h kernel: traps: gnome-software[1862] trap int3 ip:7fe292685a21 sp:7fff3f201fc0 error:0
Dec 13 15:23:00 f25h kernel:  in libglib-2.0.so.0.5000.2[7fe292636000+110000]
Dec 13 15:23:00 f25h kernel: traps: abrt-applet[1868] trap int3 ip:7f5b977e2a21 sp:7fff9c4bcfb0 error:0
Dec 13 15:23:00 f25h kernel:  in libglib-2.0.so.0.5000.2[7f5b97793000+110000]
Dec 13 15:29:48 f25h kernel: intel_powerclamp: Start idle injection to reduce power
Created attachment 128456 [details] dmesg debug

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.9.0-1.fc26.x86_64 root=UUID=c45caf39-a048-4c44-90c9-535dc8003c71 ro rootflags=subvol=root elevator=noop no_console_suspend ignore_loglevel i915.enable_rc6=0 drm.debug=0xe log_buf_len=1M i915.enable_guc_loading=-1 i915.enable_guc_submission=-1 i915.guc_log_level=0

No crash yet, just dmesg following about an hour with the above command line. Both firmwares appear to be loaded now.

If enable_rc6=0 is possibly inhibiting the problem, I'd rather run without it so the problem happens and hopefully the problem gets logged.
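The command line above sets drm.debug=0xe, while the earlier request was for drm.debug=0x1e. A short script makes the difference visible by decoding the mask into debug categories. The bit-to-name table below is an assumption based on the DRM debug categories in the kernel headers (CORE, DRIVER, KMS, PRIME, ATOMIC, VBL) and should be checked against the kernel version in use; the function names are made up for illustration.

```python
# DRM debug category bits; assumed from the kernel's DRM headers and not
# verified against the exact 4.9 source discussed in this report.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def decode_drm_debug(value):
    """Return the category names enabled by a drm.debug mask such as '0xe'."""
    mask = int(value, 0)  # base 0 accepts both hex ("0x1e") and decimal ("30")
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


cmdline = (
    "ro rootflags=subvol=root elevator=noop no_console_suspend ignore_loglevel "
    "i915.enable_rc6=0 drm.debug=0xe log_buf_len=1M i915.enable_guc_loading=-1"
)
params = parse_cmdline(cmdline)
print("in use:   ", decode_drm_debug(params["drm.debug"]))
print("requested:", decode_drm_debug("0x1e"))
```

Under that table, 0xe leaves out the ATOMIC category that 0x1e would have enabled, so the attached dmesg may be slightly less verbose than what was asked for.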
(In reply to bugzilla from comment #13)
> Created attachment 128456 [details]
> dmesg debug
>
> $ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-4.9.0-1.fc26.x86_64
> root=UUID=c45caf39-a048-4c44-90c9-535dc8003c71 ro rootflags=subvol=root
> elevator=noop no_console_suspend ignore_loglevel i915.enable_rc6=0
> drm.debug=0xe log_buf_len=1M i915.enable_guc_loading=-1
> i915.enable_guc_submission=-1 i915.guc_log_level=0
>
> No crash yet, just dmesg following about an hour with the above command
> line. Both firmwares appear to be loaded now.
>
> If enable_rc6=0 is possibly inhibiting the problem, I'd rather run without
> it so the problem happens and hopefully the problem gets logged.

Thanks, bugzilla@colorremedies.com. It looks to me like this may be a dup of bug 95063.

*** This bug has been marked as a duplicate of bug 95063 ***