Created attachment 30490 [details]
Kernel trace while attempting to get gpu dump

My environment:
- linux-2.6.32-rc4+ at GIT commit 2caa731819a633bec5a56736e64c562b7e193666: Merge branch 'for-linus' of git://git.kernel.org/.../git/jbarnes/pci-2.6
- distro: Gentoo
- xorg-server-1.6.4
- intel-gpu-tools-1.0.1
- xf86-video-intel-9999 (GIT commit 86bc23ab5da34137c82250395c68aa92ecd88a24: debug: Enable cache flushing after every operation)
- libdrm-2.4.13
- mesa-7.5.2
- Acer Travelmate 66x laptop, using LVDS
- 00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)
  00:02.1 Display controller [0380]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)

intel_gpu_dump: impossible to obtain; see the attached kernel trace from the attempt to get it.

Relevant kernel messages around the moment when the GPU became wedged:

[39632.320233] audacious2 used greatest stack depth: 1036 bytes left
[40827.910029] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[40827.910043] render error detected, EIR: 0x00000000
[40827.910050] i915: Waking up sleeping processes
[40827.910073] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 3800895 at 3800884)
[40827.910354] reboot required
[40827.976240] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[40827.992608] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[40827.992666] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
...

Actions that led to the wedged state: unknown, possibly some random render-accelerated operation while moving or repainting some area. Today the GPU became wedged while I was scrolling in vim inside an xterm. Yesterday I ended up in the same state while switching virtual desktops (using enlightenment as window manager/desktop environment).

So far this happens once a day, each time after about 12 hours of uptime.
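For anyone triaging similar logs, the hang messages quoted in the report above can be picked out of a kernel log mechanically. The following is only a minimal sketch: the function name `summarize_hang` and the regular expressions are assumptions written against the exact wording of the messages quoted in this report, and other kernel versions may phrase them differently. Reading the two numbers in "awaiting X at Y" as the awaited and the current request sequence number is likewise an interpretation, not something confirmed in this thread.

```python
import re

# Patterns written against the messages quoted in this report (assumption:
# other kernel versions may word them differently).
HANG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_elapsed\].*GPU hung"
)
WAIT_RE = re.compile(r"\(awaiting (\d+) at (\d+)\)")
WEDGED_RE = re.compile(r"Execbuf while wedged")


def summarize_hang(log_text):
    """Return a dict summarizing i915 hang messages found in log_text."""
    summary = {"hang_times": [], "seqno_gaps": [], "wedged_execbufs": 0}
    for line in log_text.splitlines():
        m = HANG_RE.search(line)
        if m:
            # Kernel timestamp (seconds since boot) of the hangcheck message.
            summary["hang_times"].append(float(m.group(1)))
        m = WAIT_RE.search(line)
        if m:
            # Difference between the two numbers in "awaiting X at Y".
            awaited, current = int(m.group(1)), int(m.group(2))
            summary["seqno_gaps"].append(awaited - current)
        if WEDGED_RE.search(line):
            summary["wedged_execbufs"] += 1
    return summary


# Lines copied from the report above.
SAMPLE = """\
[40827.910029] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[40827.910073] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 3800895 at 3800884)
[40827.976240] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[40827.992608] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
"""

if __name__ == "__main__":
    print(summarize_hang(SAMPLE))
```

The kernel timestamp of the first hang message is also a rough measure of uptime at the time of the hang, which may help when comparing occurrences.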
Do you have KMS enabled? Could you attach the full dmesg output and Xorg.0.log?
Created attachment 30495 [details] Complete kernel log
Created attachment 30496 [details]
Xorg log (from non-wedged session)

Here is an Xorg log from a normal session without a wedged GPU. When the GPU gets wedged, nothing additional appears in the Xorg log.

xorg.conf can be found at attachment #28004 [details], with the following line added to the Extensions section:
Option "Composite" "enable"

> Do you have KMS enabled?

Yes, I'm running with KMS (that way I can benefit from the full 1400x1050 pixels of my LVDS while on the Linux console).
This looks related to the bug I have been hitting since the 2.6.32 rc's. I already reported it as a bug to the LKML, see: http://thread.gmane.org/gmane.linux.kernel/914118

If you need any information just ask, I am willing to help :)
Created attachment 33361 [details] [review]
Record batch buffer at time of error

Can you apply this patch and report the /sys/kernel/debug/dri/0/i915_error_state following a hang?
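Since the file has to be read after the hang and before the reboot, it can help to copy it to persistent storage right away. The sketch below is one possible way to do that, not part of the patch: the helper name `save_error_state` and the output file naming are assumptions, and reading the debugfs path normally requires root and a mounted debugfs.

```python
import shutil
import time
from pathlib import Path

# Path as given in the comment above.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=ERROR_STATE, dest_dir="."):
    """Copy the error state to a timestamped file.

    Returns the destination path, or None if the source file is absent
    (for example, when the kernel does not carry the patch).
    """
    src = Path(src)
    if not src.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915_error_state-{stamp}.txt"
    # Stream until EOF instead of trusting the reported file size,
    # which is typically 0 for debugfs files.
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    saved = save_error_state()
    print(saved if saved else "i915_error_state not available")
```

The resulting file can then be attached to the bug report as-is.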
Created attachment 33362 [details]
Kernel log - scheduling while atomic on GPU crash with attachment #33361 [details] [review] applied

(In reply to comment #5)
> Created an attachment (id=33361) [details]
> Record batch buffer at time of error
>
> Can you apply this patch and report the
> /sys/kernel/debug/dri/0/i915_error_state following a hang?

I applied that patch from the intel-gfx mailing list yesterday, and half an hour ago it crashed my system when the GPU got wedged: scheduling while atomic...

Note: I currently run libdrm-2.4.17 with commits from the future libdrm-2.4.18 applied on top of it: 4f0f871730b76730ca58209181d16725b0c40184, 973d8d6bd04230da801a8bc19af41dbc60e1918d, fdcde592c2c48e143251672cf2e82debb07606bd (the 3 with "intel" in their subject).
Grr, that is most upsetting. I thought that patch was almost ready to be applied. :( Thanks for testing it. And it looks like the original bug is still present as well.
(In reply to comment #7)
> Grr, that is most upsetting. I thought that patch was almost ready to be
> applied. :( Thanks for testing it.

I would have preferred not to test it (well, it was a good reason to look into restoring from my backup), as it corrupted enlightenment's configuration.

> And it looks like the original bug is still present as well.

Yep, as I said yesterday on IRC, the 3 "intel"-labeled patches seem to have fixed my font/display corruption but didn't help with the wedged GPU.

(And I'm wondering whether I should give the debugging patch a second chance at capturing data or not...)
Created attachment 33381 [details] [review] Record batch buffer at time of error New version, all atomic, all the time.
Created attachment 33397 [details]
Archive containing kernel and Xorg logs and /sys/kernel/debug/dri/0/*

(In reply to comment #9)
> Created an attachment (id=33381) [details]
> Record batch buffer at time of error
>
> New version, all atomic, all the time.

Here is a dump of the wedged GPU (for the current system setup see comment #6 and bug #26580).

The dump was captured with kernel 2.6.33-rc8 plus my local patch for sd to stop the disk on reboot and the patch from attachment #33381 [details] [review] (I took the v7 version from the intel-gfx mailing list, though they should be identical).

Note: the file 'vma' is not included, as the kernel had trouble allocating a big enough buffer for it (and by the time it finally worked the file was quite old and possibly out of date; if it's useful I still have it around).
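A capture like the one described above, where one file could not be read, can be collected in a way that skips unreadable entries instead of aborting. This is only a sketch under stated assumptions: the function name `archive_debug_dir` is hypothetical, only regular files directly under the directory are packed, and it is an assumption that a failed read surfaces as an `OSError`.

```python
import io
import tarfile
from pathlib import Path


def archive_debug_dir(src_dir, tar_path):
    """Pack every readable regular file directly under src_dir into a .tar.gz.

    Returns (archived, skipped) lists of file names. Write tar_path
    outside src_dir so the archive does not try to include itself.
    """
    archived, skipped = [], []
    with tarfile.open(tar_path, "w:gz") as tar:
        for entry in sorted(Path(src_dir).iterdir()):
            if not entry.is_file():
                continue
            try:
                # debugfs files typically report size 0, so read the
                # content first and give tarfile the real length.
                data = entry.read_bytes()
            except OSError:
                # For example, the kernel failing to allocate a buffer.
                skipped.append(entry.name)
                continue
            info = tarfile.TarInfo(name=entry.name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            archived.append(entry.name)
    return archived, skipped


if __name__ == "__main__":
    done, failed = archive_debug_dir("/sys/kernel/debug/dri/0", "dri0.tar.gz")
    print("archived:", done)
    print("skipped:", failed)
```

Listing the skipped names alongside the archive makes it clear to the reader of the bug report which files are missing and why.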
Created attachment 33428 [details] Yet another wedged GPU capture
I've created a preliminary patch that fixes GTT-related cache coherency problems, at least on my i855GM. See here for instructions: http://bugs.freedesktop.org/show_bug.cgi?id=26345#c61
intel_gpu_dumper has become obsolete in favour of the in-kernel capture of error state and presentation through /sys/kernel/debug/dri/0/i915_error_state. The cause of the hang is likely to be i8xx-cache-coherency. (I am not going to spend time improving intel_gpu_dumper further now that it is retired from public use.)