Bug 24578 - [855] wedged GPU and failing intel_gpu_dumper
Status: RESOLVED WONTFIX
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: Other All
Importance: medium critical
Assignee: Carl Worth
QA Contact: Xorg Project Team
Keywords: NEEDINFO
 
Reported: 2009-10-16 13:23 UTC by Bruno
Modified: 2011-02-01 05:41 UTC

Attachments
Kernel trace while attempting to get gpu dump (2.93 KB, text/plain)
2009-10-16 13:23 UTC, Bruno
Complete kernel log (43.05 KB, text/plain)
2009-10-17 02:14 UTC, Bruno
Xorg log (from non-wedged session) (17.62 KB, text/plain)
2009-10-17 02:22 UTC, Bruno
Record batch buffer at time of error (14.77 KB, patch)
2010-02-17 10:06 UTC, Chris Wilson
Kernel log - scheduling while atomic on GPU crash with attachment #33361 applied (52.37 KB, text/plain)
2010-02-17 10:36 UTC, Bruno
Record batch buffer at time of error (14.09 KB, patch)
2010-02-18 02:29 UTC, Chris Wilson
Archive containing kernel and Xorg logs and /sys/kernel/debug/dri/0/* (221.31 KB, application/x-bzip2)
2010-02-18 12:11 UTC, Bruno
Yet another wedged GPU capture (225.58 KB, application/x-bzip2)
2010-02-19 09:38 UTC, Bruno

Description Bruno 2009-10-16 13:23:03 UTC
Created attachment 30490
Kernel trace while attempting to get gpu dump

My environment:
- linux-2.6.32-rc4+ at GIT commit 2caa731819a633bec5a56736e64c562b7e193666:
    Merge branch 'for-linus' of git://git.kernel.org/.../git/jbarnes/pci-2.6
- distro: Gentoo
- xorg-server-1.6.4
- intel-gpu-tools-1.0.1
- xf86-video-intel-9999 (GIT commit 86bc23ab5da34137c82250395c68aa92ecd88a24:
    debug: Enable cache flushing after every operation)
- libdrm-2.4.13
- mesa-7.5.2
- Acer Travelmate 66x laptop, using LVDS
- 00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM
          Integrated Graphics Device [8086:3582] (rev 02)
  00:02.1 Display controller [0380]: Intel Corporation 82852/855GM
          Integrated Graphics Device [8086:3582] (rev 02)

intel_gpu_dump output: impossible to obtain; see the attached kernel trace from my attempt to get it.


Relevant kernel messages from around the moment the GPU became wedged:
[39632.320233] audacious2 used greatest stack depth: 1036 bytes left
[40827.910029] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed...
               GPU hung
[40827.910043] render error detected, EIR: 0x00000000
[40827.910050] i915: Waking up sleeping processes
[40827.910073] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5
               (awaiting 3800895 at 3800884)
[40827.910354] reboot required
[40827.976240] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[40827.992608] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[40827.992666] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
...

Actions that led to the wedged state: unknown, possibly some render-accelerated operation while moving or repainting an area.

Today the GPU became wedged while I was scrolling in vim inside an xterm.
Yesterday I ended up in the same state while switching virtual desktops (using enlightenment as window manager/desktop environment).
So far this happens once a day, each time after about 12 hours of uptime.
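
To pull the key numbers out of kernel messages like the ones above, a small parser can help. The following is a minimal sketch, not part of any i915 tooling: the function names and the `SAMPLE` constant are hypothetical, and the pattern only assumes the message wording shown in this report. The return code -5 corresponds to -EIO in the kernel.

```python
import re

# Matches the i915_wait_request error as worded in the log above.
# Assumption: the wording is stable; other kernel versions may differ.
WAIT_RE = re.compile(
    r"i915_wait_request returns (-?\d+)\s*\(awaiting (\d+) at (\d+)\)"
)


def parse_wait_request(log_text):
    """Return (return_code, awaited_seqno, current_seqno), or None."""
    # Long messages are wrapped across lines in the report, so collapse
    # all whitespace before matching.
    flat = " ".join(log_text.split())
    match = WAIT_RE.search(flat)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def count_wedged_execbufs(log_text):
    """Count 'Execbuf while wedged' errors in the log text."""
    return sum("Execbuf while wedged" in line for line in log_text.splitlines())


# Excerpt copied from the messages quoted above.
SAMPLE = """\
[40827.910029] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed...
               GPU hung
[40827.910073] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5
               (awaiting 3800895 at 3800884)
[40827.976240] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[40827.992608] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[40827.992666] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
"""

if __name__ == "__main__":
    code, awaited, current = parse_wait_request(SAMPLE)
    # The difference is how many requests were still outstanding at the hang.
    print(code, awaited - current, count_wedged_execbufs(SAMPLE))
```

For the excerpt above, the awaited and current sequence numbers differ by 11, meaning the GPU stopped with 11 requests still pending.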
Comment 1 Gordon Jin 2009-10-16 20:19:48 UTC
Do you have KMS enabled? Could you attach the full dmesg output and Xorg.0.log?
Comment 2 Bruno 2009-10-17 02:14:45 UTC
Created attachment 30495
Complete kernel log
Comment 3 Bruno 2009-10-17 02:22:33 UTC
Created attachment 30496
Xorg log (from non-wedged session)

Here is an Xorg log from a normal session without a wedged GPU.
When the GPU gets wedged, nothing additional appears in the Xorg log.

xorg.conf can be found in attachment #28004, with the following line added to the Extensions section:
        Option          "Composite"     "enable"


> Do you have KMS enabled?
Yes, I'm running with KMS (that way I get the full 1400x1050 pixels of my LVDS on the Linux console).
Comment 4 Michael Groh 2009-11-23 02:00:19 UTC
This looks related to the bug I have been hitting since the 2.6.32 release candidates.

I already reported it as a bug on LKML, see: http://thread.gmane.org/gmane.linux.kernel/914118

If you need any information, just ask; I am willing to help :)
Comment 5 Chris Wilson 2010-02-17 10:06:42 UTC
Created attachment 33361
Record batch buffer at time of error

Can you apply this patch and report the /sys/kernel/debug/dri/0/i915_error_state following a hang?
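
Because the error state lives in debugfs, it is worth copying it to a regular file right after a hang. The following is a minimal sketch under stated assumptions: the path is the one named above, the helper name `save_error_state` and the output file naming are hypothetical, and it assumes debugfs is mounted and readable by the caller (usually root).

```python
import time
from pathlib import Path

# Path taken from the request above; requires debugfs to be mounted.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=ERROR_STATE, dest_dir="."):
    """Copy the error state to a timestamped file and return the new path."""
    dest = Path(dest_dir) / time.strftime("i915_error_state-%Y%m%d-%H%M%S.txt")
    # debugfs files report a size of 0, so read the full content explicitly
    # instead of relying on the reported file size.
    data = Path(src).read_bytes()
    dest.write_bytes(data)
    return dest


if __name__ == "__main__":
    print(save_error_state())
```

The resulting file can then be attached to the bug report.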
Comment 6 Bruno 2010-02-17 10:36:10 UTC
Created attachment 33362
Kernel log - scheduling while atomic on GPU crash with attachment #33361 applied

(In reply to comment #5)
> Created an attachment (id=33361)
> Record batch buffer at time of error
> 
> Can you apply this patch and report the
> /sys/kernel/debug/dri/0/i915_error_state following a hang?


I applied that patch from the intel-gfx mailing list yesterday, and half an hour ago it crashed my system when the GPU got wedged: scheduling while atomic...

Note: I currently run libdrm-2.4.17 with three commits from the future libdrm-2.4.18 applied on top of it (the three with "intel" in their subject): 4f0f871730b76730ca58209181d16725b0c40184, 973d8d6bd04230da801a8bc19af41dbc60e1918d, fdcde592c2c48e143251672cf2e82debb07606bd.
Comment 7 Chris Wilson 2010-02-17 10:51:35 UTC
Grr, that is most upsetting. I thought that patch was almost ready to be applied. :( Thanks for testing it.

And it looks like the original bug is still present as well.
Comment 8 Bruno 2010-02-17 11:12:16 UTC
(In reply to comment #7)
> Grr, that is most upsetting. I thought that patch was almost ready to be
> applied. :( Thanks for testing it.

I would have preferred not to test it (well, it was a good reason to read up on restoring my backup), as it corrupted enlightenment's configuration.

> And it looks like the original bug is still present as well.

Yep, as I said yesterday on IRC, the 3 "intel"-labeled patches seem to have fixed my font/display corruption but didn't help with the wedged GPU.
(and I'm wondering if I should give the debugging patch a second chance at capturing data or not...)
Comment 9 Chris Wilson 2010-02-18 02:29:31 UTC
Created attachment 33381
Record batch buffer at time of error

New version, all atomic, all the time.
Comment 10 Bruno 2010-02-18 12:11:29 UTC
Created attachment 33397
Archive containing kernel and Xorg logs and /sys/kernel/debug/dri/0/*

(In reply to comment #9)
> Created an attachment (id=33381)
> Record batch buffer at time of error
> 
> New version, all atomic, all the time.

Here is a dump of the wedged GPU (for the current system setup see comment #6 and bug #26580).

The dump was captured with kernel 2.6.33-rc8 plus my local patch for sd to stop the disk on reboot and the patch from attachment #33381 (I took the v7 version from the intel-gfx mailing list, though they should be identical).

Note: the file 'vma' is not included, as the kernel had trouble allocating a big enough buffer for it (and once it finally worked, the file was quite old and possibly out of date; if it's useful, I still have it around).
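
For anyone collecting a similar archive, the following is a minimal sketch of packing a debugfs directory such as /sys/kernel/debug/dri/0 into a bzip2 tarball. It is not the method used for the attachment above; the function name `archive_debugfs` is hypothetical. It reads each file into memory first because debugfs files report a size of 0, and it skips files that fail to read (as 'vma' did here) instead of aborting.

```python
import io
import tarfile
from pathlib import Path


def archive_debugfs(src_dir, archive_path):
    """Pack every readable regular file in src_dir into a .tar.bz2 archive.

    Returns the names of files that could not be read. The archive should
    be written outside src_dir.
    """
    skipped = []
    with tarfile.open(archive_path, "w:bz2") as tar:
        for path in sorted(Path(src_dir).iterdir()):
            if not path.is_file():
                continue
            try:
                data = path.read_bytes()
            except OSError:
                # For example, the kernel failing to allocate a buffer.
                skipped.append(path.name)
                continue
            # Build the header by hand so the size matches the bytes read,
            # not the 0 that stat() reports for debugfs files.
            info = tarfile.TarInfo(name=path.name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return skipped


if __name__ == "__main__":
    missing = archive_debugfs("/sys/kernel/debug/dri/0", "dri0-dump.tar.bz2")
    print("skipped:", missing)
```

The returned list makes it easy to mention in the report which files are missing from the archive.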
Comment 11 Bruno 2010-02-19 09:38:30 UTC
Created attachment 33428
Yet another wedged GPU capture
Comment 12 Daniel Vetter 2010-03-18 06:35:32 UTC
I've created a preliminary patch that fixes GTT-related cache coherency problems, at least on my i855GM. See here for instructions:

http://bugs.freedesktop.org/show_bug.cgi?id=26345#c61

Comment 13 Chris Wilson 2011-02-01 05:41:31 UTC
intel_gpu_dumper has become obsolete in favour of the in-kernel capture of error state and presentation through /sys/kernel/debug/dri/0/i915_error_state.

The cause of the hang is likely to be i8xx-cache-coherency.

(I am not going to spend time improving intel_gpu_dumper further now that it is retired from public use.)

