Bug 104520 - Intermittent X crashes: GPU HANG: ecode 9:0:0x85dffffb, in Xorg [443], reason: Hang on rcs0, action: reset
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: 17.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest major
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-07 08:16 UTC by Amy
Modified: 2019-09-18 19:40 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
GPU error dump. (53.53 KB, text/plain)
2018-01-07 08:16 UTC, Amy
Dmesg log (58.80 KB, text/plain)
2018-01-07 08:17 UTC, Amy
Glxinfo output. (31.33 KB, text/plain)
2018-01-07 08:24 UTC, Amy
GPU crash dump from /sys/class/drm/card0/error (39.76 KB, text/plain)
2018-01-30 18:02 UTC, Michael Weitzel
/sys/class/drm/card0/error file (129.84 KB, text/plain)
2018-01-31 14:06 UTC, Eric Blau
dmesg output (62.15 KB, text/plain)
2018-02-27 08:41 UTC, Emilio J. Padrón
xorg log (43.42 KB, text/plain)
2018-02-27 08:41 UTC, Emilio J. Padrón
/sys/class/drm/card0/error (38.21 KB, text/plain)
2018-02-27 08:51 UTC, Emilio J. Padrón
/sys/class/drm/card0/error (23.44 KB, text/plain)
2018-04-18 10:29 UTC, hfekih
Log dump from /sys/class/drm/card0/error after GPU hang (as printed in dmesg output). (696.49 KB, text/plain)
2018-06-11 21:26 UTC, Alif Wahid
/sys/class/drm/card0/error (48.07 KB, text/plain)
2018-12-06 10:00 UTC, John M.

Description Amy 2018-01-07 08:16:36 UTC
Created attachment 136591
GPU error dump.

1) Run startx.
2) Load i3 and the i3 scripts (which launch an xterm and palemoon); this intermittently crashes.
Result: the GPU hangs, and X eventually crashes with this message in dmesg:

[drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [443], reason: Hang on rcs0, action: reset
[  561.340148] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  561.340148] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  561.340148] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  561.340149] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  561.340149] [drm] GPU crash dump saved to /sys/class/drm/card0/error
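Several reports in this thread are compared by their GPU HANG line: the ecode, the process named in the message, and the reason. As a minimal sketch, assuming the line format shown above, the following Python extracts those fields so that reports can be grouped by ecode. The regex and the parse_gpu_hang name are illustrative only, not part of any i915 or Mesa tooling; the "in PROC [PID]" part is optional because some logs in this thread omit it.

```python
import re
from typing import Optional

# Pattern for the i915 "GPU HANG" dmesg line as it appears in this report.
# The "in PROC [PID]" part is optional because it is missing from some logs.
GPU_HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9]+:[0-9]+:0x[0-9a-fA-F]+)"
    r"(?:, in (?P<process>.+?) \[(?P<pid>\d+)\])?"
    r", reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line: str) -> Optional[dict]:
    """Return ecode, process, pid, reason and action of a GPU HANG line, or None."""
    match = GPU_HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the PID to an int only when the "in PROC [PID]" part was present.
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields
```

For the first line of the log above, this returns ecode "9:0:0x85dffffb", process "Xorg", pid 443, reason "Hang on rcs0" and action "reset".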
Comment 1 Amy 2018-01-07 08:17:14 UTC
Created attachment 136592
Dmesg log
Comment 2 Amy 2018-01-07 08:24:34 UTC
Further info:

Kernel (Arch Linux): 

4.14.12-1-ARCH #1 SMP PREEMPT Fri Jan 5 18:19:34 UTC 2018 x86_64 GNU/Linux

LSPCI info:
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
        Subsystem: Acer Incorporated [ALI] HD Graphics 620
        Kernel driver in use: i915
        Kernel modules: i915
--
01:00.0 3D controller: NVIDIA Corporation Device 179c (rev ff)
        Kernel modules: nouveau, nvidia_drm, nvidia
Comment 3 Amy 2018-01-07 08:24:58 UTC
Created attachment 136593
Glxinfo output.
Comment 4 Michael Weitzel 2018-01-30 18:01:54 UTC
I had the same crash (for the first time), also on Kaby Lake, Arch Linux, kernel 4.14.15-1-ARCH. I'll attach my crash dump.

[drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [636], reason: Hang on rcs0, action: reset
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
i915 0000:00:02.0: Resetting rcs0 after gpu hang
i915 0000:00:02.0: Resetting rcs0 after gpu hang
asynchronous wait on fence i915:kwin_x11[820]/1:23321 timed out
i915 0000:00:02.0: Resetting rcs0 after gpu hang
i915 0000:00:02.0: Resetting rcs0 after gpu hang
i915 0000:00:02.0: Resetting rcs0 after gpu hang
Comment 5 Michael Weitzel 2018-01-30 18:02:52 UTC
Created attachment 137059
GPU crash dump from /sys/class/drm/card0/error
Comment 6 Eric Blau 2018-01-31 14:05:42 UTC
I'm in the same boat. I get frequent hangs as reported:

kernel: [drm] GPU HANG: ecode 8:0:0x2e6b4c79, in Xorg [2680], reason: Hang on rcs0, action: reset
kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
kernel: i915 0000:00:02.0: Resetting chip after gpu hang
kernel: [drm:i915_reset [i915]] *ERROR* GPU recovery failed


Sometimes my laptop stays up and running, but other times it requires a power cycle.
Comment 7 Eric Blau 2018-01-31 14:06:30 UTC
Created attachment 137088
/sys/class/drm/card0/error file
Comment 8 xman 2018-02-01 03:49:44 UTC
I am also hitting the same issue.
[ 1516.880515] [drm] GPU HANG: ecode 9:0:0x85dffffb, reason: Hang on rcs0, action: reset
[ 1516.880517] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1516.880517] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1516.880518] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1516.880518] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1516.880518] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1516.880526] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1531.801142] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1539.837207] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1547.801180] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1555.801205] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 1559.001148] asynchronous wait on fence i915:compiz[1607]/1:243a timed out
[ 1563.801234] i915 0000:00:02.0: Resetting rcs0 after gpu hang
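In the log above, the rcs0 reset message keeps repeating after the initial hang. A small Python sketch, assuming dmesg lines that carry the bracketed seconds.microseconds timestamp (as in this excerpt, but not in every log in this thread), measures the gaps between consecutive reset messages; the reset_intervals name is hypothetical. Applied to this excerpt, it gives a first gap of about 14.9 s followed by gaps of about 8 s.

```python
import re
from typing import List

# Matches a dmesg timestamp prefix such as "[ 1531.801142]" on a line that
# also contains the rcs0 reset message; other lines are ignored.
RESET_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\].*Resetting rcs0 after gpu hang")


def reset_intervals(lines: List[str]) -> List[float]:
    """Return the seconds elapsed between consecutive rcs0 reset messages."""
    stamps = [float(m.group("ts")) for m in map(RESET_RE.match, lines) if m]
    # Pair each timestamp with the next one; round to milliseconds for readability.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]
```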
Comment 9 Amy 2018-02-18 21:39:50 UTC
Still happening on:

Linux Playful-Plankton 4.15.4-1-ARCH #1 SMP PREEMPT Sat Feb 17 16:01:38 UTC 2018 x86_64 GNU/Linux

version number:    11.0
X.Org version: 1.19.6

xf86-video-intel 1:2.99.917+812+g75795523-1
Comment 10 Amy 2018-02-18 21:43:21 UTC
Additionally, the Mesa version is:

OpenGL version string: 3.0 Mesa 17.3.4
Comment 11 Amy 2018-02-25 23:07:53 UTC
Still happening on Mesa 17.3.5
Comment 12 Emilio J. Padrón 2018-02-27 08:34:49 UTC
Same (or similar) problem here!

ThinkPad T470, Kaby Lake (i5-7200U), running an up-to-date Debian GNU/Linux sid with kernel 4.15.

I use awesome 4.2 as my window manager. The GPU hang seems to occur mostly when using Emacs (the GTK-based emacs25).

I am attaching my dmesg output and the error state dumped to /sys/class/drm/card0/error.
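The kernel message asks for the crash dump in /sys/class/drm/card0/error to be attached to every report. A minimal Python sketch for saving a copy of that file under a timestamped name, assuming card0 is the Intel GPU (as in this report) and that the script runs with enough privileges to read the file (it is typically readable only by root). The save_crash_dump helper and the i915-error-*.txt naming are hypothetical, not an existing tool.

```python
import shutil
import time
from pathlib import Path

# Location named in the dmesg message; assumes card0 is the i915 device.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_crash_dump(source: Path = ERROR_STATE, dest_dir: Path = Path(".")) -> Path:
    """Copy the GPU error state to a timestamped file in dest_dir and return its path."""
    target = Path(dest_dir) / time.strftime("i915-error-%Y%m%d-%H%M%S.txt")
    # Stream the file in chunks until EOF instead of trusting the size that
    # sysfs reports for the attribute.
    with open(source, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```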
Comment 13 Emilio J. Padrón 2018-02-27 08:41:10 UTC
Created attachment 137638
dmesg output
Comment 14 Emilio J. Padrón 2018-02-27 08:41:36 UTC
Created attachment 137639
xorg log
Comment 15 Emilio J. Padrón 2018-02-27 08:51:02 UTC
Created attachment 137640
/sys/class/drm/card0/error
Comment 16 Gennady 2018-03-28 22:47:13 UTC
I have the same or a very similar problem on Skylake.
Comment 17 Gennady 2018-03-28 22:57:13 UTC
I cannot attach the error log; please let me know if it is necessary.

The problem is reproducible: if I run a certain Qt5 app, Xorg hangs.

If I wait for a minute, it restarts.
Comment 18 Gennady 2018-03-28 23:00:55 UTC
GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1035], reason: Hang on rcs0, action: reset
Kernel: 4.15.0-1-amd64
Time: 1522274173 s 645209 us
Boottime: 262 s 812221 us
Uptime: 260 s 247539 us
Active process (on ring render): Xorg [1035], score 0
Reset count: 0
Suspend count: 0
Platform: SKYLAKE
PCI ID: 0x191b
PCI Revision: 0x06
PCI Subsystem: 17aa:222e
Comment 19 Gennady 2018-03-28 23:04:24 UTC
Linux p50-debian 4.15.0-1-amd64 #1 SMP Debian 4.15.4-1 (2018-02-18) x86_64 GNU/Linux
ii  libgl1-mesa-dri:amd64                         17.3.7-1                             amd64        free implementation of the OpenGL API -- DRI modules
ii  xserver-xorg-video-intel                      2:2.99.917+git20171229-1             amd64        X.Org X server -- Intel i8xx, i9xx display driver
ii  xserver-xorg                                  1:7.7+19                             amd64        X.Org X server
ii  firmware-misc-nonfree                         20170823-1                           all          Binary firmware for various drivers in the Linux kernel
Comment 20 hfekih 2018-04-18 10:28:55 UTC
Same problem here.
I am using an Intel(R) Celeron(R) CPU N3160.
Reproduced on Linux (kernel) 4.16.2 and 4.14.34 (Mesa 17.3.8).
The problem can be reproduced by starting any application that uses OpenGL ES:

root@ca-linux:/home/cannon$ glmark2-es2-drm 
=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel Open Source Technology Center
    GL_RENDERER:   Mesa DRI Intel(R) HD Graphics 400 (Braswell) 
    GL_VERSION:    OpenGL ES 3.1 Mesa 17.3.8
=======================================================
[build] use-vbo=false:i965: Failed to submit batchbuffer: Input/output error
----------------------------------------------------------------------------
dmesg output:
[   38.859784] [drm] GPU HANG: ecode 8:0:0xe757feff, in glmark2-es2-drm [346], reason: Hang on rcs0, action: reset
[   38.859788] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   38.859789] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   38.859791] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   38.859792] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   38.859794] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   38.859869] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[   41.905511] i915 0000:00:02.0: Resetting chip after gpu hang
[   41.952424] asynchronous wait on fence i915:glmark2-es2-drm[346]/2:2 timed out
[   47.072637] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
[   51.372311] i915 0000:00:02.0: Failed to reset chip
Comment 21 hfekih 2018-04-18 10:29:53 UTC
Created attachment 138905
/sys/class/drm/card0/error
Comment 22 Alif Wahid 2018-06-11 21:26:39 UTC
Created attachment 140127
Log dump from /sys/class/drm/card0/error after GPU hang (as printed in dmesg output).

I see this error intermittently when running the Xilinx Vivado v2016.4 software on my Ubuntu 16.04 LTS desktop (kernel 4.4.0, Intel Core i5-6400 CPU with Intel Skylake GT2 GPU). I have attached the full dump from /sys/class/drm/card0/error, as instructed by the dmesg output below.

[  255.622496] [drm] stuck on render ring
[  255.623115] [drm] GPU HANG: ecode 9:0:0x84dffff8, in Xorg [1006], reason: Engine(s) hung, action: reset
[  255.623120] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  255.623122] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  255.623125] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  255.623127] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  255.623130] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  255.624623] drm/i915: Resetting chip after gpu hang
[  257.622605] [drm] RC6 on
[  311.628103] [drm] stuck on render ring
[  311.628686] [drm] GPU HANG: ecode 9:0:0x84dffff8, in Xorg [1006], reason: Engine(s) hung, action: reset
[  311.630227] drm/i915: Resetting chip after gpu hang
[  313.628765] [drm] RC6 on
Comment 23 John M. 2018-12-06 10:00:32 UTC
Created attachment 142740
/sys/class/drm/card0/error

Hello,

Here's the error message I got:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The crash occurred when I tried to resize a GNOME Terminal window.
Comment 24 Yoshinori Gento 2019-01-28 07:05:28 UTC
I have hit a similar problem twice.

My environment is as follows:
CPU: Skylake (Core i5-6500TE)
Distribution: Debian (customised)
Kernel: 4.14.40
Mesa: 17.3.9
libdrm: 2.4.89

----
[197410.815921] [drm] GPU HANG: ecode 9:0:0x85dffffb, in drawingproc [2559], reason: Hang on rcs0, action: reset
[197410.815927] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[197418.809902] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[197426.809904] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[197434.813910] i915 0000:00:02.0: Resetting rcs0 after gpu hang
i965: Failed to submit batchbuffer: Input/output error
[197442.813908] i915 0000:00:02.0: Resetting rcs0 after gpu hang
----

The message "i965: Failed to submit batchbuffer: Input/output error" was printed to stderr by the Mesa library linked into drawingproc.
Comment 25 GitLab Migration User 2019-09-18 19:40:38 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug on our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/779.