Bug 38952 - [gm45] garbage in batch buffer -> hang
Summary: [gm45] garbage in batch buffer -> hang
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium critical
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-04 09:45 UTC by Roshni
Modified: 2016-02-26 00:47 UTC

See Also:
i915 platform:
i915 features:


Attachments
Error Log i915 (766.08 KB, text/plain)
2011-07-04 09:45 UTC, Roshni
dmesg output (84.13 KB, application/octet-stream)
2011-07-04 09:47 UTC, Roshni
lspci output (18.67 KB, text/plain)
2011-07-04 09:47 UTC, Roshni
Xorg Log (56.18 KB, application/x-zip-compressed)
2011-07-04 09:48 UTC, Roshni
new i915 error state log (25 bytes, text/plain)
2011-07-08 18:08 UTC, Roshni
new Xorg Log (60.82 KB, text/plain)
2011-07-08 18:09 UTC, Roshni
new dmesg output (123.03 KB, text/plain)
2011-07-08 18:10 UTC, Roshni

Description Roshni 2011-07-04 09:45:44 UTC
Created attachment 48745 [details]
Error Log i915

Machine details:
Lenovo Thinkpad T400
Intel® Core™2 Duo processor P8400
Integrated Intel® GMA 4500M HD
Ubuntu 2.6.38-8-generic-pae (32bit)
uname -m i686

Symptoms when the problem occurs:
The display stops responding. Three types of failure have been observed:
- The display may randomly flicker (not garbage, but it toggles between the desktop and a couple of open windows)
- The mouse moves but everything else is frozen (including the on-screen clock)
- The display is frozen, including the mouse pointer
When the display is frozen, we are still able to ssh into the machine. We are also able to play music etc. using the hot keys.

When the problem has occurred:
We cannot correlate it with any particular program or task. It seems to be random in nature.
- It has happened while actively using the laptop on AC power, without the laptop ever going to sleep or into screen saver mode after power-up (it happened within 10 minutes of power-up). Google Chrome was open along with the Clementine music player at the time of failure.
- It has happened after 4 days of actively using the laptop (during those 4 days the laptop was put to sleep and woken up multiple times).

Error snippet:
[ 1467.864056] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1467.867755] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 788345 at 788325, next 788372)
[ 1468.376145] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 1468.431602] show_signal_msg: 24 callbacks suppressed
[ 1468.431607] compiz[1490]: segfault at 0 ip b6fcd64d sp bff88998 error 6 in libc-2.13.so[b6eb8000+15a000]
[ 1470.269453] compiz[2573]: segfault at 0 ip b63fd2ce sp bff8a980 error 6 in i965_dri.so[b63e5000+a4000]
[ 1472.045789] compiz[2579]: segfault at 0 ip b63fa2ce sp bf8e9010 error 6 in i965_dri.so[b63e2000+a4000]
[ 1473.883547] compiz[2581]: segfault at 0 ip b63a92ce sp bf9f7f20 error 6 in i965_dri.so[b6391000+a4000]
[ 1475.578960] compiz[2583]: segfault at 0 ip b62982ce sp bff012b0 error 6 in i965_dri.so[b6280000+a4000]
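
A note on reading the segfault lines above: the bracketed value after the module name is the mapping's base address and size, so subtracting the base from `ip` gives the faulting offset within that mapping. A minimal Python sketch of that arithmetic follows; the function name `fault_offset` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches kernel lines of the form
# "... segfault at <addr> ip <ip> sp <sp> error <n> in <module>[<base>+<size>]".
SEGFAULT_RE = re.compile(
    r"ip (?P<ip>[0-9a-f]+) .* in (?P<module>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offset(line):
    """Return (module, ip - mapping base) for a segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return m.group("module"), offset

# Lines copied from the error snippet above.
lines = [
    "[ 1470.269453] compiz[2573]: segfault at 0 ip b63fd2ce sp bff8a980 error 6 in i965_dri.so[b63e5000+a4000]",
    "[ 1472.045789] compiz[2579]: segfault at 0 ip b63fa2ce sp bf8e9010 error 6 in i965_dri.so[b63e2000+a4000]",
    "[ 1473.883547] compiz[2581]: segfault at 0 ip b63a92ce sp bf9f7f20 error 6 in i965_dri.so[b6391000+a4000]",
    "[ 1475.578960] compiz[2583]: segfault at 0 ip b62982ce sp bff012b0 error 6 in i965_dri.so[b6280000+a4000]",
]

for line in lines:
    module, offset = fault_offset(line)
    print(module, hex(offset))
```

All four i965_dri.so crashes come out at the same offset, 0x182ce, which suggests each restarted compiz is faulting at the same instruction in the driver after the failed reset.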


modinfo i915
filename:       /lib/modules/2.6.38-8-generic-pae/kernel/drivers/gpu/drm/i915/i915.ko
license:        GPL and additional rights
description:    Intel Graphics
author:         Tungsten Graphics, Inc.
license:        GPL and additional rights
srcversion:     0974A24E53B65781A91250E

modinfo drm
filename:       /lib/modules/2.6.38-8-generic-pae/kernel/drivers/gpu/drm/drm.ko
license:        GPL and additional rights
description:    DRM shared core routines
author:         Gareth Hughes, Leif Delgass, José Fonseca, Jon Smirl
srcversion:     CC0F28C4FDAC8FEAD451313

Please let me know if any additional information is required. I will be happy to help.
Comment 1 Roshni 2011-07-04 09:47:00 UTC
Created attachment 48746 [details]
dmesg output
Comment 2 Roshni 2011-07-04 09:47:59 UTC
Created attachment 48747 [details]
lspci output
Comment 3 Roshni 2011-07-04 09:48:43 UTC
Created attachment 48748 [details]
Xorg Log
Comment 4 Chris Wilson 2011-07-04 10:04:42 UTC
You set the priority to high even though you didn't test with the latest drivers?

The culprit is:

0x0f2b85cc:      0x0beb1c00: MI UNKNOWN
0x0f2b85d0: HEAD 0x00000004: MI_NOOP
0x0f2b85d4:      0x00000000: MI_NOOP
0x0f2b85d8:      0x08000000: MI UNKNOWN
0x0f2b85dc:      0x00000000: MI_NOOP
0x0f2b85e0:      0x08000000: MI UNKNOWN
0x0f2b85e4:      0x0beb1700: MI UNKNOWN
0x0f2b85e8:      0x00000000: MI_NOOP
0x0f2b85ec:      0x00000000: MI_NOOP
0x0f2b85f0:      0x10000010: MI_STORE_DATA_IMM
0x0f2b85f4:      0x0beb1740:    dword 1
0x0f2b85f8:      0x00000004:    dword 2
0x0f2b85fc:      0x00000000:    dword 3

which is actually due to the misplaced write (and relocation!) of 0x0f2b85cc and 0x0f2b85d0 (which are both misplaced by 8 bytes).
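
For readers unfamiliar with the dump format, here is a rough sketch of how a decoder might classify these dwords: for MI commands the top three bits (31:29) are zero and the opcode sits in bits 28:23. The opcode table is deliberately partial, and the names `MI_OPCODES` and `classify` are assumptions for illustration; this is not the actual decoder that produced the dump.

```python
# Partial table of MI opcodes (bits 28:23); only entries needed for this dump.
MI_OPCODES = {
    0x00: "MI_NOOP",
    0x0A: "MI_BATCH_BUFFER_END",
    0x20: "MI_STORE_DATA_IMM",
}

def classify(dword):
    """Name the MI command encoded in a 32-bit command dword."""
    if dword >> 29 != 0:
        # Command type bits are non-zero: not a memory-interface command.
        return "not an MI command"
    opcode = (dword >> 23) & 0x3F
    return MI_OPCODES.get(opcode, "MI UNKNOWN")

# Command dwords taken from the dump above.
for dword in (0x0beb1c00, 0x00000004, 0x08000000, 0x10000010):
    print(f"0x{dword:08x}: {classify(dword)}")
```

Under this sketch 0x0beb1c00 and 0x08000000 decode to opcodes 0x17 and 0x10, which are not in the table, matching the "MI UNKNOWN" entries. The value 0x0beb1c00 also looks like an address in the same range as dword 1 of the MI_STORE_DATA_IMM (0x0beb1740), which would be consistent with a relocation landing in the wrong slot.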
Comment 5 Roshni 2011-07-04 10:43:01 UTC
Thanks a lot for the prompt reply. I apologize for setting the priority to High.
Is there a fix for this issue? If so, please direct me to the git repo and the driver version that has the fix. I will try it on my kernel and let you know as soon as possible.

Thanks,
Roshni

(In reply to comment #4)
> You set the priority to high even though you didn't test with the latest
> drivers?
> 
> The culprit is:
> 
> 0x0f2b85cc:      0x0beb1c00: MI UNKNOWN
> 0x0f2b85d0: HEAD 0x00000004: MI_NOOP
> 0x0f2b85d4:      0x00000000: MI_NOOP
> 0x0f2b85d8:      0x08000000: MI UNKNOWN
> 0x0f2b85dc:      0x00000000: MI_NOOP
> 0x0f2b85e0:      0x08000000: MI UNKNOWN
> 0x0f2b85e4:      0x0beb1700: MI UNKNOWN
> 0x0f2b85e8:      0x00000000: MI_NOOP
> 0x0f2b85ec:      0x00000000: MI_NOOP
> 0x0f2b85f0:      0x10000010: MI_STORE_DATA_IMM
> 0x0f2b85f4:      0x0beb1740:    dword 1
> 0x0f2b85f8:      0x00000004:    dword 2
> 0x0f2b85fc:      0x00000000:    dword 3
> 
> which is actually due to the misplaced write (and relocation!) of 0x0f2b85cc
> and 0x0f2b85d0 (which are both misplaced by 8 bytes).
Comment 6 Roshni 2011-07-08 18:08:56 UTC
Created attachment 48908 [details]
new i915 error state log
Comment 7 Roshni 2011-07-08 18:09:33 UTC
Created attachment 48909 [details]
new Xorg Log
Comment 8 Roshni 2011-07-08 18:10:00 UTC
Created attachment 48910 [details]
new dmesg output
Comment 9 Roshni 2011-07-08 18:12:15 UTC
(In reply to comment #8)
> Created an attachment (id=48910) [details]
> new dmesg output

The symptoms were a bit different this time:
The display was blank (not flickering)

Just thought additional information might help! Please let me know if there is an updated driver that I can try. Really appreciate all the help. 

Thanks,
Roshni
Comment 10 Chris Wilson 2011-07-10 04:46:15 UTC
What exactly is the second bug? The display remains blank when? The only thing of interest in the more recent logs is that the PnP failed to resume, i.e. nothing related to gfx appears to be malfunctioning.
Comment 11 Roshni 2011-07-10 07:40:45 UTC
Hi Chris,

Thanks for your prompt analysis and reply. In the second case, the symptoms were as follows:

Before:
1. Everything was working fine; then, after 15 minutes of inactivity, the display went into its sleep state (automatic display shutoff).
2. The laptop was running on AC power, and no external peripherals were connected.

Symptoms:
1. The display did not resume after mouse / keyboard activity, although the mouse pointer was showing movement.
2. Music could be played/paused with the media buttons on the laptop.
3. I had to reboot the PC to get the display back.

I thought it was the same issue, so I updated the logs. But from your analysis it seems like this is a different issue.

Thanks,
Roshni

(In reply to comment #10)
> What exactly is the second bug? The display remains blank when? The only thing
> of interest in the more recent logs is that the PnP failed to resume, i.e.
> nothing related to gfx appears to malfunctioning.
Comment 12 Florian Mickler 2012-04-05 06:56:27 UTC
A patch referencing this bug report has been merged in Linux v3.4-rc1:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish
Comment 13 Christopher M. Penalver 2016-02-26 00:47:22 UTC
As per https://bugs.freedesktop.org/show_bug.cgi?id=38952#c12 .

