I am seeing a regression: screen updates cause the GPU to lock up. I can reproduce it 100% of the time, but the kernel logs nothing; the GPU simply wedges and I cannot get into the machine either. I am using the latest libdrm, DDX and Mesa from git master.
I have an Intel GMA 4500HDx (i965).
X.Org X Server 1.7.4 Release Date: 2010-01-08
Which component regressed? Can you bisect?
I switched to Fedora, so here is the new info:
X server: X.Org X Server 1.7.99.901 (1.8.0 RC 1), xorg-x11-server-Xorg-1.7.99.901-6.20100215.fc13.x86_64.rpm
intel DDX: xorg-x11-drv-intel-2.10.0-4.fc13.x86_64
libdrm: libdrm-2.4.18-0.1.fc13.x86_64
Kernel: custom build from early on the morning of February 20th (around 11-12am EST), using anholt's latest patches against 2.6.33-rc8+

If I use 2D acceleration, the GPU locks up. If any audio was playing, it gets stuck replaying one chunk of the buffer over and over, and I cannot ssh into the machine because it is wedged. We can rule out the X server, since two different versions show this. Could it be the drm driver changes?
I can give some more information: there is an error reported just prior to KMS mode setup. This still happens on 2.6.33 + for-linus and intel-drm-next, with the latest libdrm, mesa and DDX. When moving windows, X uses a lot of CPU. With glxgears the FPS is low, X uses 40-50% CPU and glxgears 50% CPU; the gears stall and then everything locks up completely. The kernel spits out:

Mar 3 02:07:18 segfault kernel: DMA-API: preallocated 32768 debug entries
Mar 3 02:07:18 segfault kernel: DMA-API: debugging enabled by kernel config
Mar 3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.02] not found
Mar 3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.02] not found
Mar 3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.03] not found
Mar 3 02:07:18 segfault kernel: DMAR: Device scope device [0000:00:03.03] not found
Mar 3 02:07:18 segfault kernel: IOMMU 0xfeb00000: using Register based invalidation
Mar 3 02:07:18 segfault kernel: IOMMU 0xfeb01000: using Register based invalidation
Mar 3 02:07:18 segfault kernel: IOMMU 0xfeb03000: using Register based invalidation
Mar 3 02:07:18 segfault kernel: IOMMU 0xfeb02000: using Register based invalidation
Mar 3 02:07:18 segfault kernel: IOMMU: Setting RMRR:
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:02.0 [0xbdc00000 - 0xc0000000]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:02.1 [0xbdc00000 - 0xc0000000]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.0 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.1 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.2 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1d.7 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.0 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.1 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.2 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1a.7 [0xfc226c00 - 0xfc227400]
Mar 3 02:07:18 segfault kernel: IOMMU: Prepare 0-16MiB unity mapping for LPC
Mar 3 02:07:18 segfault kernel: IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]
Mar 3 02:07:18 segfault kernel: DRHD: handling fault status reg 3
Mar 3 02:07:18 segfault kernel: DMAR:[DMA Write] Request device [00:02.0] fault addr 200000000
Mar 3 02:07:18 segfault kernel: DMAR:[fault reason 05] PTE Write access is not set

00:02.0 in this case is the Intel integrated graphics.
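For anyone sifting through a longer log for the same symptom, here is a minimal sketch that pulls the DMAR fault entries out of kernel log lines. It assumes the two-line fault format shown in the log above (a "Request device" line followed by a "fault reason" line); the function name parse_dmar_faults and the dictionary keys are made up for illustration and are not part of any kernel tool.

```python
import re

# Patterns for the two-line DMAR fault report format seen in the log above.
# Assumption: other kernel versions may format these lines differently.
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def parse_dmar_faults(lines):
    """Collect DMAR faults from kernel log lines (hypothetical helper).

    Each fault is a dict with the DMA direction, the requesting PCI device,
    the faulting address as an integer, and the (code, text) fault reason
    if a reason line follows the request line.
    """
    faults = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            faults.append({
                "op": match.group("op"),
                "device": match.group("dev"),
                "addr": int(match.group("addr"), 16),
                "reason": None,
            })
            continue
        match = REASON_RE.search(line)
        # Attach the reason to the most recent fault that has none yet.
        if match and faults and faults[-1]["reason"] is None:
            faults[-1]["reason"] = (int(match.group("code")), match.group("text").strip())
    return faults
```

Feeding it the last three lines of the log above should yield a single write fault from device 00:02.0, which is the integrated graphics.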
Note: I cannot view /debug/dri/0/i915_wedged because the GPU is not doing a reset at all. This continues even with 2.6.33 final, and I can also reproduce it with 2.6.32 vanilla, so there might be something broken in the 2.10 DDX. I am not able to git bisect very far back because of ABI changes that break with pre-1.8 Xorg servers.
(In reply to comment #6)
> I also can reproduce this with 2.6.32
> vanilla, so there might be something broken in the DDX with 2.10, I'm not able
> to git bisect too far back because of ABI changes which break with Xorg pre-1.8
> server.

You mean it's a regression in xf86-video-intel 2.10? So what's the previous working version? I don't understand why you can't bisect (with xserver 1.7). I don't see this problem on my G45 (GMA4500HD).
Well, you cannot build 2.9 against X server 1.8 due to API changes. After testing with 2.6.32 + X server 1.8 I encountered lockups. Either the regression is in the drm driver, and I have to go back to .31 to confirm, or it is in the DDX. I can build and try 2.6.31 to verify that it really isn't the drm driver, if you like. It would help if this triggered a GPU reset, but it does not, which makes it very difficult to debug. Even using Intel AMT over LAN for serial, nothing was dumped from the drm saying there were faults. I do not think X server 1.7.x would cause such a tight lockup; only the DDX or drm drivers could do this.
OK, it is not happening in 2.6.31 vanilla, so this is a drm regression. I will begin a bisect from .31. Something in 2.6.32-rcX broke, and we're going to find out what.
I believe it is the use of the new DMA API that broke my GM45 (i965) GMA 4500HD. Given what changed between .31 and .32/.33, can someone else confirm this? Do we need a quirk for the chipset I'm using?
I can confirm that with the kernel parameter intel_iommu=off the GPU works fine, even with 2.6.34-rc1.
Dropping severity: the new code is experimental, and the workaround solves the problem for now.
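When comparing results across machines it helps to confirm that the workaround was actually on the command line of the running kernel. The following is a small sketch that assumes a Linux system exposing the boot command line at /proc/cmdline; the helper names intel_iommu_values and read_cmdline are hypothetical.

```python
from pathlib import Path


def intel_iommu_values(cmdline):
    """Return every value given for intel_iommu= on a kernel command line.

    An empty list means the parameter was not passed at all, so the
    kernel's built-in default applies.
    """
    prefix = "intel_iommu="
    return [tok[len(prefix):] for tok in cmdline.split() if tok.startswith(prefix)]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    return Path(path).read_text()
```

Calling intel_iommu_values(read_cmdline()) on a machine booted with the workaround should return a list containing "off".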
I think Zhenyu wrote this code; could very well have broken in recent kernels possibly due to other IOMMU code changes. Zhenyu, one other thing I notice when using the IOMMU code is that with DMAR debugging enabled, the kernel will eventually give up tracking DMAR regions due to an overload, and at unload time we seem to have some stale mappings. Maybe we're not matching map/unmap somewhere?
I note that DMA-Remapping is now disabled on this chipset due to a few hardware issues...