| Summary | GPU lockup - Invalid GTT entry during Display B Fetch or Host Access |
|---|---|
| Product | DRI |
| Component | DRM/Intel |
| Status | CLOSED FIXED |
| Severity | normal |
| Priority | medium |
| Version | unspecified |
| Hardware | x86-64 (AMD64) |
| OS | Linux (All) |
| Reporter | Bryce Harrington <bryce> |
| Assignee | Daniel Vetter <daniel> |
| QA Contact | |
| CC | ben, chris, daniel, jbarnes |
| Keywords | regression |
| Whiteboard | |
| i915 platform | |
| i915 features | |

Description
Bryce Harrington
2010-12-14 10:38:14 UTC
Created attachment 41117 [details]
BootDmesg.txt
Created attachment 41118 [details]
CurrentDmesg.txt
Created attachment 41119 [details]
i915_error_state.txt
Bugzilla won't allow me to attach IntelGpuDump.txt, but here's a link:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/686388/+attachment/1758234/+files/IntelGpuDump.txt

Also, Xorg.0.log if you need it:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/686388/+attachment/1758250/+files/XorgLog.txt

That PGTBL_ER only makes sense in conjunction with a mode change. I can't see the actual crash dmesg to confirm that it was the only error detected along with the crash running firefox.

Same user saw a similar GPU lockup (same PGTBL error code at least) during a resume from suspend.
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/694244

Created attachment 42174 [details]
IntelGpuDump.txt

Here's the GPU dump from the last freeze. This includes several calls to XY_COLOR_BLT() - we've had several other bug reports with this in their batchbuffers, although the GPU dumps differ from bug to bug.

Bugzilla won't allow the raw file to be attached, so I've attached it gzipped and here's a link:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/694244/+attachment/1775717/+files/IntelGpuDump.txt

(In reply to comment #7)
> Here's the GPU dump from the last freeze. This includes several calls to
> XY_COLOR_BLT() - we've had several other bug reports with this in their
> batchbuffers, although the gpu dumps differ from bug to bug.

The contents of the batch buffers are more or less irrelevant to the reported error, since we only manipulate the display surfaces directly from the kernel. The latest i915_error_state is more interesting with regard to capturing these errors, since it includes the display settings and the pinned buffers.

Pinning down the timing to that of a mode change and confirming that on 2.6.38-rc1, which has extra paranoia with regard to the timing of the display surface removal and the improved error state, would be most useful.

Created attachment 42241 [details]
i915_error_state.txt
No chance, kernel team says we have 2.6.38-rc1 but it's too horridly borked to foist on users; the kernel guys say it doesn't even boot.
But here's the i915_error_state.txt you asked for.
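
For anyone comparing several of these dumps, here is a minimal sketch of pulling the error registers out of a saved i915_error_state.txt. It assumes the dump contains lines of the form `PGTBL_ER: 0x...` and `EIR: 0x...`; that layout may differ between kernel versions. The function name `extract_error_registers` is hypothetical and not part of any tool mentioned in this report.

```python
import re

# Assumed line shape: "PGTBL_ER: 0x00000001" (register name, colon, hex value).
# Only EIR and PGTBL_ER are matched; other registers such as IPEIR are ignored.
_REG_LINE = re.compile(
    r"^\s*(EIR|PGTBL_ER):\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE
)


def extract_error_registers(dump_text):
    """Return {register_name: int_value} for the EIR and PGTBL_ER lines found.

    If a register appears more than once, the last occurrence wins.
    """
    return {name: int(value, 16) for name, value in _REG_LINE.findall(dump_text)}
```

Reading the attached file and passing its text to `extract_error_registers` gives a small dict that can be compared across reports, for example to check whether two hangs share the same PGTBL_ER value.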
Created attachment 44991 [details] [review]
Disable outputs before KMS takeover

If the reported hang was occurring early in the boot process, then the attached might be an answer. But afaics, this hang is much later...

Hmm, there is a second patch required to fix an oops. In drm-intel-staging:

commit ea1167d6601f370f5d7e425eb0b3c7577edd02cd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Mar 29 13:19:09 2011 +0100

    drm/i915: Move the irq wait queue initialisation into the ring init

    Required so that we don't obliterate the queue if initialising the rings
    after the global IRQ handler is installed.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

commit f8acdf5aa142926961e1f7ddb9e86490c50f8e6a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Mar 29 10:40:27 2011 +0100

    drm/i915: Disable all outputs early, before KMS takeover

    If the outputs are active and continuing to access the GATT when we
    teardown the PTEs, then there is a potential for us to hang the GPU.
    The hang tends to be a PGTBL_ER with either an invalid host access or
    an invalid display plane fetch.

    Reported-by: Pekka Enberg <penberg@kernel.org>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

*** Bug 35976 has been marked as a duplicate of this bug. ***
*** Bug 35975 has been marked as a duplicate of this bug. ***
*** Bug 35974 has been marked as a duplicate of this bug. ***

Those two patches have been reported by one user to have fixed the issue for him, but I need a few more testers since they seem to foul up a MacBook (but then there is more than one issue at play with MacBooks...)

Thanks Chris, the early output disablement especially sounds promising.

Our kernel team does daily builds of drm-intel-next and drm-next, but not drm-intel-staging, so it may take a while before we can produce something for the reporters to test (and I doubt the reporters would be patching their kernel manually, although who knows).

I know, catch 22. They can't be accepted into -fixes unless we know they fix the bug. And whilst they are in -staging, only the foolhardy will try them.

One of our kernel engineers was kind enough to do a quick package of the patches for user testing. This is a cherrypick of the natty kernel with these two patches, not a package of drm-intel-staging:

Please install the Natty test kernel 2.6.32-32.61~lp719446.1 from https://launchpad.net/~timg-tpi/+archive/ppa

    echo "deb http://ppa.launchpad.net/timg-tpi/ppa/ubuntu natty main" | sudo tee /etc/apt/sources.list.d/timg-ppa.list
    sudo apt-get update
    sudo apt-get -u dist-upgrade

Hopefully one of the reporters of this bug will test the kernel and give feedback.

There has been one tester of this patched kernel so far, Daniel G. Taylor, who writes:

"""
Above commands caused my display to not work and I had to reboot into an older kernel selected in grub to get things working again. It does NOT fix the issue for me. I'm on a ~2007 Macbook.

The display was off and showed no graphics whatsoever. I can't tell if the boot process succeeded or failed and had to do a hard-reboot.

I installed 2.6.38-8-generic_2.6.38-8.42~lp686388 for i386 and the associated linux-image-generic, linux-generic, linux-libc-dev. That's the one that caused the issue. I tried booting both with an external monitor attached and without an external monitor.

Bug LP #749784 is apparently my dupe of this one so you can see my system information in that one. Let me know what else I can do to help.
"""

MacBooks seem to enter the kernel with a PGTBL_ER already pending. You need the v2 patch to survive, but as stated it looks like MacBooks have a separate issue.

commit c7bd4c25650704d4d065eb4ce2a122d2a80ce804
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 16:36:50 2012 +0100

    drm/i915: Remove too early plane enable on pre-PCH hardware

    Enabling the plane before we have assigned valid address means that it
    will access random PTE (often with conflicting memory types) and cause
    GPU lockups.

    However, enabling the plane too early appears to workaround a number of
    bugs in our modesetting code.

    Cc: Franz Melchior <melchior.franz@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=39947
    References: https://bugs.freedesktop.org/show_bug.cgi?id=41091
    References: https://bugs.freedesktop.org/show_bug.cgi?id=49041
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Closing resolved+fixed.

CommitDate: Thu May 3 2012.