| Field | Value |
|---|---|
| Summary | [ilk regression] 3.7.x corrupt console image, hard hang starting X |
| Product | DRI |
| Component | DRM/Intel |
| Status | CLOSED FIXED |
| Severity | blocker |
| Priority | medium |
| Version | DRI git |
| Hardware | x86-64 (AMD64) |
| OS | Linux (All) |
| Reporter | Nathan Myers <ncm> |
| Assignee | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| QA Contact | Intel GFX Bugs mailing list <intel-gfx-bugs> |

Description
Nathan Myers
2013-02-07 02:49:47 UTC
Can you please try bisecting between 3.6.10 and 3.7.4? That would most likely be the quickest method to isolate the cause.

In addition to the bisect, a screenshot (with a camera) would be interesting.

Working...

Created attachment 74664
screen image, blurred but maybe better than nothing
That screen image, attached...
Looking again, it seems more as if the display driver and the renderer disagree on the scan-line stride, but in a way that text lines often get several aligned scan lines for some distance.
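
To make the stride hypothesis above concrete, here is a toy sketch (not i915 code; the buffer layout and the names `render` and `scan_out` are made up for illustration). A vertical bar is written with one stride and read back with another, which shears it into a diagonal. It only illustrates the guess in this comment, not the cause identified later in the thread.

```python
def render(height, stride, column=2):
    """Draw a vertical bar into a flat buffer that uses `stride` cells per row."""
    buf = ["."] * (stride * height)
    for y in range(height):
        buf[y * stride + column] = "#"
    return buf


def scan_out(buf, width, height, stride):
    """Read `height` rows of `width` cells, assuming `stride` cells per row."""
    return ["".join(buf[y * stride : y * stride + width]) for y in range(height)]


frame = render(height=4, stride=8)

# Reader and writer agree on the stride: the bar stays vertical.
for row in scan_out(frame, width=6, height=4, stride=8):
    print(row)

print()

# Reader assumes a stride of 7 instead of 8: each row starts one cell early,
# so the bar drifts one column to the right per scan line.
for row in scan_out(frame, width=6, height=4, stride=7):
    print(row)
```

When the two strides differ by only a little, neighbouring scan lines stay nearly aligned for a while, which would match the "several aligned scan lines for some distance" impression.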
I bisected (first time!) a half-dozen cycles between 3.6.10 and 3.7.4, but it was straying off into 3.6.10-preX. Probably my last "bad" assertion was some other bug. Picking up with a shortened bisect log and a new path...
Sorry, that was "3.6.0-preX".

Can you try another screenshot, sharper? To figure out what's broken I need to do pixel-counting of how exactly things moved around, which isn't possible with yours. Maybe put the camera on a stand to avoid shaking, and if the pixels still aren't sharp enough, maybe also do a shot of the top-left corner only.

Created attachment 74678
git bisect log
I will get another pic. In the meantime, a bisect log. (How could it take so many more than log2(n) steps?!) Tested by booting each build with init=/bin/bash, then "modprobe i915". Verified X starts OK on the last version before the failure.
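
On the log2(n) question: for a strictly linear history, bisection does need only about log2(n) builds. The sketch below simulates that ideal case; `simulate_bisect` and the integer commit numbering are hypothetical, and this is not how git itself is implemented. Real kernel history is a graph with merges, and 3.6.10 and 3.7.4 sit on different stable branches, so as far as I know git first has to test their merge base, which would explain landing in 3.6.0-preX territory. A single wrong "good" or "bad" answer also sends the search down the wrong half.

```python
import math


def expected_steps(n_commits):
    """Upper bound on tests needed to bisect a linear range of n_commits."""
    return math.ceil(math.log2(n_commits)) if n_commits > 1 else 0


def simulate_bisect(n_commits, first_bad):
    """Bisect commits 0..n_commits-1, where commit 0 is known good, the last
    commit is known bad, and every commit >= first_bad is bad.
    Returns (commit_found, number_of_tests)."""
    lo, hi = 0, n_commits - 1  # lo is always good, hi is always bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1  # one build-and-boot cycle
        if mid >= first_bad:
            hi = mid
        else:
            lo = mid
    return hi, steps


found, steps = simulate_bisect(16, first_bad=5)
print(found, steps, expected_steps(16))
```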
Can you revert that patch by applying

    diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
    index 207e5c3..0a1d654 100644
    --- a/drivers/char/agp/intel-gtt.c
    +++ b/drivers/char/agp/intel-gtt.c
    @@ -601,7 +601,7 @@ static int intel_gtt_init(void)
      gtt_map_size = intel_private.gtt_total_entries * 4;
      intel_private.gtt = NULL;
    - if (INTEL_GTT_GEN < 6 && INTEL_GTT_GEN > 2)
    + if (INTEL_GTT_GEN < 6 && INTEL_GTT_GEN > 2 && 0)
      intel_private.gtt = ioremap_wc(intel_private.gtt_bus_addr, gtt_map_size);
      if (intel_private.gtt == NULL)

to your most recent release (or drm-intel-next/-nightly)? Anything unusual about your Ironlake, e.g. VT-d enabled?

Yes, that fixes it. I don't know of anything unusual about this hardware, except I think the 1920x1080 LCD is not very common on this model. In dmesg I see a line "PCI-DMA: Intel(R) Virtualization Technology for Directed I/O". In .config, I seem to have "CONFIG_VIRT_TO_BUS" and "CONFIG_HAVE_KVM" turned on, but CONFIG_VIRTUALIZATION off.

Hmm, definitely entering into Ironlake errata territory. How about "grep IOMMU /boot/config-`uname -r`"? Complete boot dmesg would be useful in general I think.

Created attachment 74713
dmesg output
No IOMMU in uname.
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
# CONFIG_AMD_IOMMU is not set
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_STRESS is not set
dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c9008020e30272 ecap 1000
dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap c0000020230272 ecap 1000
dmar: IOMMU 2: reg_base_addr fed93000 ver 1:0 cap c9008020630272 ecap 1000
IOMMU 1 0xfed91000: using Register based invalidation
IOMMU 0 0xfed90000: using Register based invalidation
IOMMU 2 0xfed93000: using Register based invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:02.0 [0xbdc00000 - 0xbfffffff]
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbb7d7000 - 0xbb7e6fff]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbb7d7000 - 0xbb7e6fff]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
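
For anyone wanting to run the same check on their own machine, here is a minimal sketch that reads kernel config text and reports the IOMMU-related options. The helper names `parse_kconfig` and `intel_iommu_on_by_default` are made up for this example; the sample text is the grep output quoted above.

```python
import re

SAMPLE = """\
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
# CONFIG_AMD_IOMMU is not set
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_STRESS is not set
"""


def parse_kconfig(text):
    """Map option name to its value; options marked 'is not set' map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            opts[unset.group(1)] = "n"
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            opts[assigned.group(1)] = assigned.group(2)
    return opts


def intel_iommu_on_by_default(opts):
    """True if the Intel IOMMU driver is built in and enabled by default."""
    return (opts.get("CONFIG_INTEL_IOMMU") == "y"
            and opts.get("CONFIG_INTEL_IOMMU_DEFAULT_ON") == "y")


opts = parse_kconfig(SAMPLE)
print(sorted(name for name, value in opts.items() if value == "y"))
print(intel_iommu_on_by_default(opts))
```

On a live system the text would come from the file named in the grep command above (the /boot/config file for the running kernel) instead of `SAMPLE`.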
Created attachment 74714
3.7.7 .config
Should I try building with any different config settings?
(In reply to comment #14)
> Created attachment 74714
> 3.7.7 .config
>
> Should I try building with any different config settings?

Yes. Please try disabling CONFIG_IOMMU_SUPPORT. Look for "IOMMU Hardware Support" in config (under Device Drivers in menuconfig).

Is turning off IOMMU just a diagnostic exercise, or is it usually a better choice?

A build of stock 3.7.7 (i.e. w/o the patch from #8) with IOMMU disabled boots and runs X normally.

(In reply to comment #16)
> Is turning off IOMMU just a diagnostic exercise, or is it
> usually a better choice?

Better choice. At least if you are using Intel graphics, since there were a few errata in the silicon that prevent it from functioning properly (and the workarounds we have are to stall the GPU every time we update its page tables), and I think we've found another one.

Created attachment 74737
Disable WC PTE updates for ILK VTd

Can you please test this patch and report yay-or-nay on the mailing list? (I've cc'ed you on that patch)

We need to check one more thing: please test an IOMMU kernel both with and without the patch, with intel_iommu=igfx_off added on the kernel cmdline.

I built stock 3.7.7 with IOMMU turned on, and patched, and booted with and without the suggested option. It booted to full X both times. Without the patch, and built with IOMMU on, it fails reliably. Without the patch, and without IOMMU, it boots and runs X with no apparent problems.

IOMMU turned on means:

# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
# CONFIG_AMD_IOMMU is not set
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_STRESS is not set

This is not to say there are no problems. The 3.6 and 3.7 kernels are prone to freezing, with no mouse pointer motion, no response to keyboard input, and no response to ping, about once a week on this machine, but I don't know how to get any diagnostics out when it happens.
(In reply to comment #21)
> I built stock 3.7.7 with IOMMU turned on, and patched, and
> booted with and without the suggested option. It booted to
> full X both times. Without the patch, and built with IOMMU on,
> it fails reliably. Without the patch, and without IOMMU, it
> boots and runs X with no apparent problems.
>
> IOMMU turned on means:
>
> # CONFIG_CALGARY_IOMMU is not set
> CONFIG_IOMMU_HELPER=y
> CONFIG_IOMMU_API=y
> CONFIG_IOMMU_SUPPORT=y
> # CONFIG_AMD_IOMMU is not set
> CONFIG_INTEL_IOMMU=y
> CONFIG_INTEL_IOMMU_DEFAULT_ON=y
> CONFIG_INTEL_IOMMU_FLOPPY_WA=y
> # CONFIG_IOMMU_STRESS is not set

Have you also tested what happens with an IOMMU-enabled kernel, but adding intel_iommu=igfx_off on the kernel cmdline? That is a slightly different mode of "IOMMU disabled" which we need to test separately.

> This is not to say there are no problems. The 3.6 and 3.7
> kernels are prone to freezing, with no mouse pointer motion,
> no response to keyboard input, and no response to ping, about
> once a week on this machine, but I don't know how to get any
> diagnostics out when it happens.

Should be fixed in latest stable updates, see bug #55984.

Yes, to be precise, I tested

1. a STOCK 3.7.7 kernel with IOMMU configured OFF and booted with NO option intel_iommu=igfx_off (success)
2. a STOCK 3.7.7 kernel with IOMMU configured ON and booted with NO option intel_iommu=igfx_off (FAIL)
3. a PATCHed 3.7.7 kernel with IOMMU configured OFF and booted with NO option intel_iommu=igfx_off (success)
4. a PATCHed 3.7.7 kernel with IOMMU configured ON and booted with NO option intel_iommu=igfx_off (success)
5. a PATCHed 3.7.7 kernel with IOMMU configured ON and booted WITH option intel_iommu=igfx_off (success)

where PATCH refers to attachment 74737, "Disable WC PTE updates for ILK VTd", and IOMMU ON is as defined in #21.

btw: It seems as if I cannot turn off CONFIG_IOMMU_HELPER or CONFIG_SWIOTLB using the regular configuration tools.
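
The five runs listed in this comment can be written down as data to see which combinations fail and which were never tried. This is just a sketch: the tuple layout and the names `RESULTS`, `failing` and `untested` are made up here, and the outcomes are transcribed from the numbered list, nothing more.

```python
from itertools import product

# (kernel, iommu_configured, igfx_off_on_cmdline, boots_to_x)
RESULTS = [
    ("stock",   False, False, True),
    ("stock",   True,  False, False),
    ("patched", False, False, True),
    ("patched", True,  False, True),
    ("patched", True,  True,  True),
]


def failing(results):
    """Configurations that did not boot to X."""
    return [(kernel, iommu, igfx) for kernel, iommu, igfx, ok in results if not ok]


def untested(results):
    """Combinations of the three switches that do not appear in the results."""
    seen = {(kernel, iommu, igfx) for kernel, iommu, igfx, _ in results}
    every = product(("stock", "patched"), (False, True), (False, True))
    return sorted(combo for combo in every if combo not in seen)


print(failing(RESULTS))
print(untested(RESULTS))
```

The only failure is the stock kernel with IOMMU configured on. Of the untested combinations, the interesting one is the stock kernel with IOMMU on and intel_iommu=igfx_off; the other two have IOMMU support configured out, where the option presumably has no effect.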
(off-topic) 5afeb70e606bdce5a76de, referred to in #55984 as mentioned in #22, appears to be ancestral to 3.7.7. If so, it does not fix the occasional hang mentioned at the end of #21, which also occurred when I was running 3.7.7 (with IOMMU configured OFF). But there's no evidence to suggest that the hang has anything to do with [drm], beyond that it usually (but not always) happens when I'm sitting at the machine.

Fix merged to drm-intel-next as

commit b4950816cb3b1e10d8d0db3cd112e432b6c244cf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Feb 13 09:31:53 2013 +0000

    drm/i915: Disable WC PTE updates to w/a buggy IOMMU on ILK

For your ilk woes, the real fix is: https://bugs.freedesktop.org/attachment.cgi?id=73105 Dunno whether that's what you meant, since I couldn't find any patch on comment #22 on that bug. If you still have hangs, please file a new bug report and attach the i915_error_state.

"git blame" indicates attachment #73105 is the commit 5afeb70e I mentioned.

(In reply to comment #26)
> "git blame" indicates attachment #73105 is the commit
> 5afeb70e I mentioned.

Ah, the sha1 you cite is on stable, not the upstream commit - hence I couldn't find it.

Thank you to all.