Forwarding this bug from Ubuntu reporter Liam McDermott:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/718767

[Problem]
GPU freeze during regular system usage. The user reports having seen freezes off and on since upgrading to Ubuntu 11.04 a few weeks ago.

We have been getting a variety of GPU dumps similar to this one, with ESR: 0x00000001 and an IPEHR value that varies from report to report. They generally have dmesg output similar to this one, with no specific error message.

[Original Description]
The notification that this had crashed appeared just after rebooting. The bug reporting tool was also crashing at the same time, so it's hard to say when this happened or what the cause was.

ACTHD: 0xffffffff
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x00007272
IPEIR: 0x00000000
INSTDONE: 0x7fffffc1

[ 13.413946] mtrr: no more MTRRs available
[ 13.413958] [drm] MTRR allocation failed. Graphics performance may suffer.
[ 13.425214] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[ 13.425225] [drm] Driver supports precise vblank timestamp query.
[ 13.507981] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 13.508920] [drm] initialized overlay support
...
[ 1120.564076] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1120.571548] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 123059 at 123042, next 123237)
[ 1120.578527] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 1120.672949] show_signal_msg: 6 callbacks suppressed
[ 1120.672964] compiz[1262]: segfault at 0 ip 003690e0 sp bff00110 error 6 in libc-2.12.2.so[255000+15a000]

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu7
ProcVersionSignature: Ubuntu 2.6.38-3.30-generic 2.6.38-rc4
Uname: Linux 2.6.38-3-generic i686
Architecture: i386
Chipset: i945gme
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1024x600
 edid-base64: AP///////wAGr9IwAAAAAAETAQOAFg14CmaVlllXkSgfUFQAAAABAQEBAQEBAQEBAQEBAQEBsBMAQEFYGSAYiDEA330AAAAYAAAADwAAAAAAAAAAAAAAAAAgAAAA/gBBVU8KICAgICAgICAgAAAA/gBCMTAxQVcwMyBWMCAKAFI=
DRM.card0.VGA.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
Date: Mon Feb 14 08:44:56 2011
DistUpgraded: Yes, recently upgraded Log time: 2011-01-27 10:36:04.407155
DistroCodename: natty
DistroVariant: ubuntu
DumpSignature: 1d5b69ea (ESR: 0x00000001 IPEHR: 0x00007272)
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GraphicsCard:
 Subsystem: QUANTA Computer Inc Device [152d:1777]
 Subsystem: QUANTA Computer Inc Device [152d:1777]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha i386 (20110122)
InterpreterPath: /usr/bin/python2.7
MachineType: Quanta UW1
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-3-generic root=UUID=fd4da622-36c1-4d74-811d-a8a5c90f2738 ro quiet splash vt.handoff=7
ProcKernelCmdLine_: BOOT_IMAGE=/boot/vmlinuz-2.6.38-3-generic root=UUID=fd4da622-36c1-4d74-811d-a8a5c90f2738 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.6~3ubuntu4
 libdrm2 2.4.23-1ubuntu3
 xserver-xorg-video-intel 2:2.14.0-1ubuntu7
SourcePackage: xserver-xorg-video-intel
Title: [i945gme] GPU lockup 1d5b69ea (ESR: 0x00000001 IPEHR: 0x00007272)
UserGroups:
dmi.bios.date: 05/19/2009
dmi.bios.vendor: INSYDE
dmi.bios.version: Q3F21
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: Base Board Product Name
dmi.board.vendor: Quanta
dmi.board.version: 03
dmi.chassis.asset.tag:
dmi.chassis.type: 1
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnINSYDE:bvrQ3F21:bd05/19/2009:svnQuanta:pnUW1:pvr04:rvnQuanta:rnBaseBoardProductName:rvr03:cvnChassisManufacturer:ct1:cvrChassisVersion:
dmi.product.name: UW1
dmi.product.version: 04
dmi.sys.vendor: Quanta
version.compiz: compiz 1:0.9.2.1+glibmainloop4-0ubuntu11
version.libdrm2: libdrm2 2.4.23-1ubuntu3
version.libgl1-mesa-glx: libgl1-mesa-glx 7.10-1ubuntu1
version.xserver-xorg: xserver-xorg 1:7.6~3ubuntu4
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.13.2+git20110124.fadee040-0ubuntu4
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-1ubuntu7
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110107+b795ca6e-0ubuntu4
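For anyone triaging these dumps, a rough sketch of how the error registers in the report can be read. It assumes the gen2/gen3 bit assignments from the kernel's i915_reg.h of this era (I915_ERROR_INSTRUCTION = bit 0, I915_ERROR_MEMORY_REFRESH = bit 1, I915_ERROR_PAGE_TABLE = bit 4); the function name decode_esr is hypothetical, and later hardware generations use a different layout.

```python
# Bit assignments assumed from the gen2/gen3 I915_ERROR_* definitions in
# i915_reg.h of this era; later generations differ. decode_esr() is a
# hypothetical helper, not part of any tool.
ERROR_BITS = {
    0: "instruction error (see IPEIR/IPEHR/INSTDONE)",
    1: "memory refresh error",
    4: "page table error (see PGTBL_ER)",
}


def decode_esr(value):
    """Return a description for each set bit in an ESR/EIR/EMR-style value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(ERROR_BITS.get(bit, "unknown bit %d" % bit))
    return names


if __name__ == "__main__":
    print(decode_esr(0x00000001))  # the ESR value shared by these reports
```

Under the same assumption, EMR 0xffffffed is exactly ~((1 << 4) | (1 << 1)) as a 32-bit value, so only page-table and memory-refresh errors are unmasked. That would be consistent with EIR reading 0 while ESR still shows the instruction-error bit, which is why IPEHR is the interesting register here.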
Created attachment 43393: BootDmesg.txt
Created attachment 43394: CurrentDmesg.txt
Created attachment 43395: XorgLog.txt
Created attachment 43396: i915_error_state.txt
Also: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/718767/+attachment/1849330/+files/IntelGpuDump.txt
Btw, what is 'IPEHR'? Is it significant that two otherwise similar gpu crash reports would have differing values?
IPEHR is the 'instruction pointer error header', i.e. the first dword of the last instruction parsed.

This looks like memory corruption that has nothing to do with i915.ko. Something wrote garbage into the physical memory we are using for the ringbuffer:

0x000078a0: 0x00007272: MI_NOOP
0x000078a4: 0xf1ecfc44: UNKNOWN
0x000078a8: 0xf1ecfc44: UNKNOWN
0x000078ac: 0x00000000: MI_NOOP
0x000078b0: 0x00000000: MI_NOOP
0x000078b4: 0x00000000: MI_NOOP

That doesn't match any pattern used by i915.ko, mesa, or the ddx. It could be a wild write from an unrelocated target surface, but that usually clobbers a whole lot more (and starts from the beginning of the ringbuffer).
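To illustrate why the dump labels those dwords the way it does, here is a minimal classifier. It assumes a simplified gen2/gen3 header layout taken from the kernel's MI_INSTR() macro (bits 31:29 select the command client, 0 = MI; for MI commands bits 28:23 are the opcode, and MI_NOOP is opcode 0). The name classify() is hypothetical, and the real decoder in intel_gpu_dump checks far more than this.

```python
# Simplified header layout, assumed from the kernel's MI_INSTR() macro for
# gen2/gen3: bits 31:29 select the command client (0 = MI), and for MI
# commands bits 28:23 hold the opcode. Real decoders check much more.
CLIENTS = {0: "MI", 2: "2D", 3: "3D"}
MI_OPCODES = {0x00: "MI_NOOP"}  # illustrative subset, not a full table


def classify(dword):
    """Return a rough label for one ringbuffer dword."""
    client = (dword >> 29) & 0x7
    if client not in CLIENTS:
        return "UNKNOWN"  # any other client is treated as invalid in this sketch
    if client == 0:
        opcode = (dword >> 23) & 0x3F
        return MI_OPCODES.get(opcode, "MI opcode 0x%02x" % opcode)
    return CLIENTS[client] + " command"


# The first four dwords from the ringbuffer dump quoted in this comment.
ring = [(0x78A0, 0x00007272), (0x78A4, 0xF1ECFC44),
        (0x78A8, 0xF1ECFC44), (0x78AC, 0x00000000)]

if __name__ == "__main__":
    for offset, dword in ring:
        print("0x%08x: 0x%08x: %s" % (offset, dword, classify(dword)))
```

0x00007272 has its top nine bits clear, so a decoder that only looks at client and opcode reports MI_NOOP even though the value is not something the driver would normally emit, while 0xf1ecfc44 falls in client 7 and cannot be decoded at all. That fits the reading that the ring contents were overwritten rather than mis-emitted.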
Bryce, for all the 915/945 bugs can you please have the reporters test the latest kernel with the enlarged unfenced alignment. That's the most likely cause of random writes, though I don't suspect it in this case.
(In reply to comment #7)
> Bryce, for all the 915/945 bugs can you please have the reporters test the
> latest kernel with the enlarged unfenced alignment. That's the most likely
> cause of random writes, though I don't suspect it in this case.

Alright, doing so for both i915 and i945. I am pointing them at this package repository, which has daily snapshots of the kernel and currently provides linux-image-2.6.38-999-generic_2.6.38-999.201102221357:
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

For reference, what commit(s) provide the enlarged unfenced alignment? I was not able to locate commit messages referring to unfenced alignments in either the current linus tree or in your drm-intel-next tree. If the patches help, I'd like to forward them to our kernel team to look at including.
Created attachment 43683: dmesg
Fwiw, I also got this user to test your debug patch on bug #34014. Attached is his dmesg from after reproducing the lockup.
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/718767/+attachment/1861287/+files/dmesg.txt
(In reply to comment #8)
> For reference, what commit(s) provide the enlarged unfenced alignment? I was
> not able to locate commit messages referring to unfenced alignments in either
> the current linus tree or in your drm-intel-next tree. If the patches help,
> I'd like to forward them to our kernel team to look at including.

Looks like perhaps kernel commit 5e7833?
Chris, I've had multiple i915 and i945 reporters test the current daily kernel. All of them say it makes no difference; they all still see these same freezes. I have also verified that we've had that enlarged unfenced alignment (commit 5e7833) in our kernel for some time.
(In reply to comment #11)
> Chris, I've had multiple i915 and i945 reporters test the current daily kernel.
> Universally all say it makes no difference; they all still these same freezes.
>
> I also have verified we've had that enlarged unfenced alignment (commit 5e7833)
> in our kernel for some time.

That's a relief in one sense. Can you keep the error states coming? Establishing a pattern would be most useful. There's only been one related fix so far:

commit 467cffba85791cdfce38c124d75bd578f4bb8625
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Mar 7 10:42:03 2011 +0000

    drm/i915: Rebind the buffer if its alignment constraints changes with tiling

    Early gen3 and gen2 chipset do not have the relaxed per-surface tiling
    constraints of the later chipsets, so we need to check that the GTT
    alignment is correct for the new tiling. If it is not, we need to
    rebind.

    Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
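A rough model of the constraint that commit describes, for readers unfamiliar with gen2/gen3 fencing. It assumes the rules from i915_gem_get_gtt_size() and i915_gem_get_gtt_alignment() in kernels of this era: a tiled object on gen3 needs a power-of-two fence region starting at 1 MiB (512 KiB on gen2), and its GTT offset must be aligned to that size, while untiled objects and gen4+ only need page alignment. All function names below are hypothetical, and the sketch checks alignment only, not whether the bound space is large enough.

```python
# Simplified model of the gen2/gen3 fence constraints, assumed from
# i915_gem_get_gtt_size()/i915_gem_get_gtt_alignment() in kernels of this
# era. Function names here are hypothetical, not kernel API.
PAGE_SIZE = 4096


def fence_region_size(gen, tiled, obj_size):
    """Tiled objects on gen2/3 need a power-of-two fence region."""
    if gen >= 4 or not tiled:
        return obj_size
    size = 1024 * 1024 if gen == 3 else 512 * 1024
    while size < obj_size:
        size <<= 1
    return size


def required_alignment(gen, tiled, obj_size):
    """GTT alignment: one page, unless gen2/3 tiling forces fence alignment."""
    if gen >= 4 or not tiled:
        return PAGE_SIZE
    return fence_region_size(gen, tiled, obj_size)


def needs_rebind(gtt_offset, gen, new_tiled, obj_size):
    """True if the current binding violates the alignment for the new tiling."""
    return gtt_offset % required_alignment(gen, new_tiled, obj_size) != 0


if __name__ == "__main__":
    obj = 1536 * 1024  # 1.5 MiB object, bound at 1 MiB while untiled
    print(needs_rebind(0x100000, gen=3, new_tiled=True, obj_size=obj))
```

In this example the object was page-aligned enough while untiled, but switching it to tiled on gen3 raises the requirement to 2 MiB, so the existing 1 MiB offset is no longer valid and the object has to be rebound before the fence covers it.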
Can you give drm-intel-staging, and in particular

commit 0faba0d4e49361886b16c703995a3477951b14e5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Mar 17 15:23:22 2011 +0000

    drm/i915: Fix tiling corruption from pipelined fencing

    ... even though it was disabled. A mistake in the handling of fence reuse
    caused us to skip the vital delay of waiting for the object to finish
    rendering before changing the register.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
    Cc: Andy Whitcroft <apw@canonical.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    [Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

a whirl?
Working on the theory that it is one and the same bug:

commit b5b5ac2dec49ea5ae033434efa90863aa5cdfb2c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Mar 17 15:23:22 2011 +0000

    drm/i915: Fix tiling corruption from pipelined fencing

    ... even though it was disabled. A mistake in the handling of fence reuse
    caused us to skip the vital delay of waiting for the object to finish
    rendering before changing the register.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
    Cc: Andy Whitcroft <apw@canonical.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    [Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Dave Airlie <airlied@linux.ie>
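The "vital delay" in that commit message comes down to a sequence-number check: a fence register that is being reused should only be rewritten once the GPU has retired the last request that rendered through it. Below is a minimal sketch of that rule, not the patch itself. seqno_passed() mirrors the kernel's i915_seqno_passed(), which is (int32_t)(seq1 - seq2) >= 0; must_wait_before_fence_change() and its parameters are hypothetical names for illustration.

```python
# Wraparound-safe sequence-number comparison, mirroring the kernel's
# i915_seqno_passed(): (int32_t)(seq1 - seq2) >= 0.
def seqno_passed(seq1, seq2):
    """True if 32-bit seqno seq1 is at or after seq2, allowing for wraparound."""
    return ((seq1 - seq2) & 0xFFFFFFFF) < 0x80000000


def must_wait_before_fence_change(completed_seqno, last_fenced_seqno):
    """Hypothetical helper: a fence register that is being reused may only be
    rewritten once the GPU has retired the last request that used it."""
    return not seqno_passed(completed_seqno, last_fenced_seqno)


if __name__ == "__main__":
    # Example values borrowed from the hangcheck message in the original report.
    print(must_wait_before_fence_change(123042, 123059))  # GPU still behind
```

Skipping that wait, as the commit describes for the fence-reuse path, would let the register change while rendering through the old fence is still in flight, which is the kind of tiling corruption the fix targets. The masked subtraction keeps the comparison correct when the 32-bit seqno wraps around.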