Bug 31021 - [arrandale] gpu hangs whenever running any glx programs e.g. glxgear
Summary: [arrandale] gpu hangs whenever running any glx programs e.g. glxgear
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
Keywords: NEEDINFO
Reported: 2010-10-20 22:14 UTC by Mathieu Zhang
Modified: 2017-07-24 23:06 UTC
CC List: 2 users



Attachments
dmesg (84.20 KB, text/plain)
2010-10-20 23:49 UTC, Mathieu Zhang
error_state (857.22 KB, text/plain)
2010-10-22 14:57 UTC, Mathieu Zhang
i915_error_state when gpu hangs (843.40 KB, text/plain)
2010-11-24 04:55 UTC, Dongxu Li
intel_reg_dump when gpu hangs (10.37 KB, text/plain)
2010-11-24 04:56 UTC, Dongxu Li

Description Mathieu Zhang 2010-10-20 22:14:01 UTC
I have the following hardware (Thinkpad x201):
#lspci -v -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 215a
	Flags: bus master, fast devsel, latency 0, IRQ 47
	Memory at f2000000 (64-bit, non-prefetchable) [size=4M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at 1800 [size=8]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [a4] PCI Advanced Features
	Kernel driver in use: i915
	Kernel modules: i915

Running any GLX program (glxgears, compiz, etc.) results in a GPU hang. dmesg shows:

[  207.092983] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  207.093009] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1586 at 1270)
[  207.176959] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  207.178870] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1587 at 1270)
[  208.394269] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  208.394288] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1591 at 1270)
[  242.986265] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  242.986480] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1593 at 1270)
[  243.062197] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  243.062226] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1595 at 1270)
[  243.140173] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  243.140193] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1597 at 1270)

and Xorg.0.log logs this error:

[   358.379] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   358.796] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   358.898] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   360.072] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   360.360] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   372.084] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   376.080] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   376.080] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[   376.275] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
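The two logs above appear to describe the same failure from both sides: the kernel's wait returns -5, which is -EIO, and "Input/output error" is the standard message for EIO. A minimal Python sketch for pulling the numbers out of the dmesg lines follows. It assumes that "awaiting N at M" means the request was waiting for sequence number N while the GPU had last reported M; that reading is an interpretation of the log text, and the function and field names are made up for illustration.

```python
import errno
import re

# Matches the message part of the i915_do_wait_request error lines quoted above.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)

def parse_wait_errors(log_text):
    """Return one dict per wait failure found in a dmesg excerpt."""
    results = []
    for match in WAIT_RE.finditer(log_text):
        ret, awaiting, at = (int(group) for group in match.groups())
        results.append({
            # Kernel functions return negative errno values, so negate to look up the name.
            "errno_name": errno.errorcode.get(-ret, "unknown"),
            "awaiting": awaiting,
            "at": at,
            # Assumed meaning: how far the awaited request is ahead of the GPU.
            "behind": awaiting - at,
        })
    return results

# Two of the lines from the report, copied verbatim.
SAMPLE = """\
[  207.093009] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1586 at 1270)
[  243.140193] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1597 at 1270)
"""

waits = parse_wait_errors(SAMPLE)
# If the "at" value never changes between failures, the GPU made no progress.
gpu_stalled = len({w["at"] for w in waits}) == 1
```

In the excerpt the "at" value stays at 1270 for about 36 seconds while the awaited number keeps climbing, which is consistent with a GPU that has stopped completing requests.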

My current software stack is latest git on all the following:
xf86-video-intel
libdrm
mesa
xorg-server

and a git build of the kernel (to pick up the latest kernel bug fixes).

I also noticed that the problem goes away if I downgrade xf86-video-intel to <2.11, along with matching versions of the other libraries. I mostly use awesome as my WM and rarely run any 3D programs, so this does not really affect my day-to-day activity. The sole exception is making 3D plots in Mathematica, which causes the same hang.

Let me know if I can provide any more useful information.
Comment 1 Mathieu Zhang 2010-10-20 23:49:17 UTC
Created attachment 39607
dmesg
Comment 2 Chris Wilson 2010-10-22 03:15:43 UTC
The most useful information is in /sys/kernel/debug/dri/0/i915_error_state. Since this seems to correlate with an -intel update, it is probably swapbuffers related (2.11 is notoriously buggy in this area!). Bisecting -intel would be useful, as you observe this on your x201 and I don't...
Comment 3 Mathieu Zhang 2010-10-22 14:57:02 UTC
Created attachment 39638
error_state

I induced a crash and got this error_state.
Comment 4 Mathieu Zhang 2010-10-22 18:54:19 UTC
(In reply to comment #3)
> Created an attachment (id=39638)
> error_state
> 
> I induced a crash and got this error_state.

I just discovered that the crash does not occur even if I upgrade xf86-video-intel to 2.13, as long as I keep <=mesa-7.9. This is the currently working stack:

media-libs/mesa-7.8.2
x11-apps/mesa-progs-7.7
x11-base/xorg-server-1.9.0.902
x11-drivers/xf86-video-intel-2.13.0
x11-libs/libdrm-2.4.22
Comment 5 Chris Wilson 2010-10-23 01:50:47 UTC
Mesa was having a little too much fun:

0x06ca4100:      0x54f00006: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 0, dst tile 0)
0x06ca4104:      0x03cc0080:    format 8888, dst pitch 128, clipping disabled
0x06ca4108:      0x00000000:    dst (0,0)
0x06ca410c:      0x00160016:    dst (22,22)
0x06ca4110:      0x0be8a000:    dst offset 0x0be8a000
0x06ca4114:      0x000803d5:    src (981,8)
0x06ca4118:      0x00000040:    src pitch 64
0x06ca411c:      0x0be89000:    src offset 0x0be89000

The source extents are outside the drawable -> HANG.
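The arithmetic behind that conclusion can be sketched in Python from the dwords in the dump. This is an illustrative reading, not the driver's own check: it assumes that each coordinate dword packs x in the low 16 bits and y in the high 16 bits (which reproduces the decoded values shown), that "format 8888" means 4 bytes per pixel, and that a source row cannot be wider than the source pitch.

```python
def unpack_xy(dword):
    """Split a blitter coordinate dword into (x, y): x in bits 15:0, y in bits 31:16."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF

BYTES_PER_PIXEL = 4  # assumption: "format 8888" is 32 bits per pixel

# Raw dwords copied from the batch dump above.
dst_x1, dst_y1 = unpack_xy(0x00000000)   # dst (0,0)
dst_x2, dst_y2 = unpack_xy(0x00160016)   # dst (22,22)
src_x1, src_y1 = unpack_xy(0x000803D5)   # src (981,8)
src_pitch = 0x00000040                   # 64 bytes per source row

# The copy is as wide as the destination rectangle.
width = dst_x2 - dst_x1

# Byte range the blit reads within each source row.
src_first_byte = src_x1 * BYTES_PER_PIXEL
src_last_byte = (src_x1 + width) * BYTES_PER_PIXEL

# A 64-byte pitch leaves room for at most this many 32-bit pixels per row.
pixels_per_row = src_pitch // BYTES_PER_PIXEL

# The read fits only if it ends within one row's worth of bytes.
source_in_bounds = src_last_byte <= src_pitch

print(f"copy width {width} px, source x starts at {src_x1}")
print(f"source row holds at most {pixels_per_row} px ({src_pitch} bytes)")
print(f"source read spans bytes {src_first_byte}..{src_last_byte}: in bounds = {source_in_bounds}")
```

Under these assumptions the source row holds at most 16 pixels, while the blit starts reading at x = 981, so the read begins far past the end of the row.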
Comment 6 Chris Wilson 2010-10-23 01:54:35 UTC
However, that error state is from a 945GME, not an x201 (which is Arrandale)...
Comment 7 Dongxu Li 2010-11-24 04:55:03 UTC
Created attachment 40537
i915_error_state when gpu hangs

00:02.0 VGA compatible controller: Intel Corporation Arrandale Integrated Graphics Controller (rev 12)
Comment 8 Dongxu Li 2010-11-24 04:56:26 UTC
Created attachment 40538
intel_reg_dump when gpu hangs
Comment 9 Chris Wilson 2010-12-05 04:05:16 UTC
Mathieu, can you please list the broken versions and perhaps attach an i915_error_state captured from your x201?
Comment 10 Mathieu Zhang 2010-12-06 00:04:33 UTC
(In reply to comment #9)
> Mathieu, can you please list the broken versions and perhaps attach an
> i915_error_state captured from your x201?

See comment #3 above for the error_state and comment #4 for the last working stack.
Comment 11 Chris Wilson 2010-12-06 02:15:45 UTC
My fault, I looked at the wrong file.

From your error state:

0x0ce0b1ec:      0x78080003: 3DSTATE_VERTEX_BUFFERS
0x0ce0b1f0:      0x0000000c:    buffer 0: sequential, pitch 12b
0x0ce0b1f4: HEAD 0x00000000:    buffer address
0x0ce0b1f8:      0x00000000:    max index
0x0ce0b1fc:      0x00000000:    mbz

Are you using the gallium i965 driver or the classic?
Comment 12 Mathieu Zhang 2010-12-07 12:38:08 UTC
(In reply to comment #11)
> Are you using the gallium i965 driver or the classic?

Classic.
Comment 13 Chris Wilson 2011-01-19 11:50:18 UTC
I've only seen this widespread failure with the gallium drivers. glxgears failing is a sure sign of the error being within your configuration.

