Created attachment 36189
Xorg.0.log.old

I get seemingly random lockups of the entire graphics system (not able to switch to other virtual consoles) due to EQ overflows in X when using the Intel driver and compositing under KDE 4.4. I am on Arch Linux.

%lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)

kernel26 2.6.33.4
xorg-server 1.7.6
xf86-video-intel 2.10.0
intel-dri 7.7.1
mesa 7.7.1

I use no xorg.conf file.

Kernel options: enable_mtrr_cleanup nopat i915.powersave=0 (without (some of?) these the display goes black after resume)

Xorg.0.log.old is attached.

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x3b) [0x809f81b]
1: /usr/bin/X (mieqEnqueue+0x1ab) [0x809856b]
2: /usr/bin/X (xf86PostMotionEventP+0xd2) [0x80a3b22]
3: /usr/lib/xorg/modules/input/evdev_drv.so (0xb72a9000+0x4581) [0xb72ad581]
4: /usr/lib/xorg/modules/input/evdev_drv.so (0xb72a9000+0x487e) [0xb72ad87e]
5: /usr/bin/X (0x8048000+0x6663f) [0x80ae63f]
6: /usr/bin/X (0x8048000+0xf95b4) [0x81415b4]
7: (vdso) (__kernel_sigreturn+0x0) [0xb780f400]
8: /usr/lib/libpixman-1.so.0 (0xb773a000+0x59e5a) [0xb7793e5a]
9: /usr/lib/libpixman-1.so.0 (0xb773a000+0x16373) [0xb7750373]
10: /usr/lib/libpixman-1.so.0 (pixman_blt+0x78) [0xb7775b18]
11: /usr/lib/xorg/modules/libfb.so (fbCopyNtoN+0x24d) [0xb7240bdd]
12: /usr/lib/xorg/modules/drivers/intel_drv.so (0xb724c000+0x35aca) [0xb7281aca]
13: /usr/bin/X (miCopyRegion+0x21b) [0x81a5b4b]
14: /usr/bin/X (miDoCopy+0x44d) [0x81a606d]
15: /usr/lib/xorg/modules/drivers/intel_drv.so (0xb724c000+0x35328) [0xb7281328]
16: /usr/bin/X (0x8048000+0xc65c3) [0x810e5c3]
17: /usr/bin/X (0x8048000+0x3e2d5) [0x80862d5]
18: /usr/bin/X (0x8048000+0x40437) [0x8088437]
19: /usr/bin/X (0x8048000+0x1a705) [0x8062705]
20: /lib/libc.so.6 (__libc_start_main+0xe6) [0xb73f3c76]
21: /usr/bin/X (0x8048000+0x1a2f1) [0x80622f1]
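For readers unfamiliar with the "EQ overflow" message: in the backtrace above, frames 0 to 6 run inside a signal handler (frame 7 is `__kernel_sigreturn`) that interrupted `pixman_blt`, so input events were still being queued while the main loop was stuck in a copy and not draining them. The following is only a rough sketch of that behaviour, not the server's actual code. The class name, method names and queue size are hypothetical, and it assumes a fixed-size ring that drops late events once it is full.

```python
class EventQueue:
    """Illustrative fixed-size ring buffer (hypothetical, not the X server's code)."""

    def __init__(self, size=8):
        self.slots = [None] * size
        self.head = 0      # next slot the main loop will read
        self.tail = 0      # next slot the input handler will write
        self.dropped = 0   # events tossed because the ring was full

    def enqueue(self, event):
        """Called from the input (signal) handler; never blocks."""
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            # Ring is full: the consumer has not run for a while.
            # Comparable to the point where X logs "EQ overflowing".
            self.dropped += 1
            return False
        self.slots[self.tail] = event
        self.tail = nxt
        return True

    def drain(self):
        """Called from the main loop when it gets a chance to run."""
        events = []
        while self.head != self.tail:
            events.append(self.slots[self.head])
            self.head = (self.head + 1) % len(self.slots)
        return events


# The main loop is "stuck": five events arrive, but a ring of 4 slots holds only 3.
queue = EventQueue(size=4)
results = [queue.enqueue(n) for n in range(5)]
delivered = queue.drain()
```

In this sketch the overflow is a symptom only. The real problem is whatever keeps the main loop from draining the queue, which is why the frames below the signal handler are the interesting ones.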
Different signal, but this looks remarkably similar to bug 27313 and the patch there should fix the crash. Though I am actually more interested in knowing which path provoked the fallback.
Chris, I could patch the driver and see if the (albeit rare) crashes disappear. However, you mention that you are more interested in what triggers the crash. Could you point me to what I could do to be of most help to you?
This turns out to be another page-fault-of-doom...

08:57 < ohsix> hrm
08:57 < ohsix> 83118.00 - 62.1% : drm_clflush_pages [drm]
08:57 < ohsix> doesn't look right heh; (from perf top)
08:57 < ohsix> 206911.00 - 64.2% : drm_clflush_pages [drm]
08:57 < ohsix> 59009.00 - 18.3% : read_hpet
08:58 < ohsix> rebooted back with the "old" kernel and its doing that cpu burning freeze that it did last night
09:03 < ohsix> [ 31003.366] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
*** Bug 28471 has been marked as a duplicate of this bug. ***
Chris, I tried the patch you suggested in comment #1 and did not experience the lockup afterwards. By now I have moved on to the 1.8 release (1.8.1.902 at the moment) with a 2.6.34 kernel. No lockups since. Should I close this? Unfortunately, I don't know ohsix's setup.
Thanks for testing. I was starting to think this was the page-fault-of-doom and not a simple buffer overrun. We can't close this just yet because the patch hasn't been included upstream: on review, it should have no effect. Yet it obviously does...
Yes! A much better fix is finally on its way upstream!

commit e2bf07fe23fd11a2acba609bf34ccc59c5553389
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Aug 7 11:01:24 2010 +0100

    drm/i915: Implement fair lru eviction across both rings. (v2)

    Based in a large part upon Daniel Vetter's implementation and adapted for handling multiple rings in a single pass.

    This should lead to better gtt usage and fixes the page-fault-of-doom triggered. The fairness is provided by scanning through the GTT space amalgamating space in rendering order. As soon as we have a contiguous space in the GTT large enough for the new object (and its alignment), evict any object which lies within that space. This should keep more objects resident in the GTT.

    Doing throughput testing on a PineView machine with cairo-perf-trace indicates that there is very little difference with the new LRU scan, perhaps a small improvement... Except oddly for the poppler trace.

    Reference:

    Bug 15911 - Intermittent X crash (freeze)
    https://bugzilla.kernel.org/show_bug.cgi?id=15911

    Bug 20152 - cannot view JPG in firefox when running UXA
    https://bugs.freedesktop.org/show_bug.cgi?id=20152

    Bug 24369 - Hang when scrolling firefox page with window in front
    https://bugs.freedesktop.org/show_bug.cgi?id=24369

    Bug 28478 - Intermittent graphics lockups due to overflow/loop
    https://bugs.freedesktop.org/show_bug.cgi?id=28478

    v2: Attempt to clarify the logic and order of eviction through the use of comments and macros.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
    Signed-off-by: Eric Anholt <eric@anholt.net>
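The eviction idea described in the commit message can be illustrated with a small model. This is a sketch under stated assumptions, not the kernel implementation: `Obj`, `gaps` and `pick_victims` are hypothetical names, the GTT is modelled as a flat range of integer offsets, and "rendering order" is reduced to a single `last_used` counter. Objects are marked as candidates from least to most recently used, and as soon as free space plus marked objects form a contiguous, aligned hole big enough for the new object, only the candidates that overlap that hole are evicted.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    start: int      # offset in the (modelled) GTT
    size: int
    last_used: int  # smaller means rendered longer ago


def gaps(gtt_size, resident):
    """Yield (lo, hi) ranges of the GTT not covered by resident objects."""
    pos = 0
    for obj in sorted(resident, key=lambda o: o.start):
        if obj.start > pos:
            yield pos, obj.start
        pos = max(pos, obj.start + obj.size)
    if pos < gtt_size:
        yield pos, gtt_size


def pick_victims(objects, gtt_size, need, align=1):
    """Return the objects to evict for a hole of `need` bytes, or None."""
    by_age = sorted(objects, key=lambda o: o.last_used)
    candidates = []
    # First pass (None) checks whether a hole already exists without eviction.
    for extra in [None] + by_age:
        if extra is not None:
            candidates.append(extra)
        marked = {id(o) for o in candidates}
        resident = [o for o in objects if id(o) not in marked]
        for lo, hi in gaps(gtt_size, resident):
            start = -(-lo // align) * align  # round lo up to the alignment
            end = start + need
            if end <= hi:
                # Evict only the candidates that actually lie within the hole.
                return [o for o in candidates
                        if o.start < end and o.start + o.size > start]
    return None


# A 100-byte GTT with 10 bytes free at the end; a 40-byte object must fit.
objs = [Obj("A", 0, 30, last_used=1),
        Obj("B", 30, 30, last_used=3),
        Obj("C", 60, 30, last_used=2)]
victims = pick_victims(objs, gtt_size=100, need=40)
```

In this example only C is evicted, because it sits next to the existing free space. A, although the least recently used, stays resident since evicting it would not contribute to the hole, which matches the commit's stated aim of keeping more objects in the GTT.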