Summary: | [i855] drm-intel-next Freeze shortly after X startup (i915_error_state)
---|---
Product: | DRI
Reporter: | Geir Ove Myhr <gomyhr>
Component: | DRM/Intel
Assignee: | Chris Wilson <chris>
Status: | CLOSED DUPLICATE
QA Contact: |
Severity: | normal
Priority: | medium
CC: | bill.farrow
Version: | unspecified
Hardware: | x86 (IA32)
OS: | Linux (All)
Whiteboard: |
i915 platform: |
i915 features: |
Attachments: |

Description
Geir Ove Myhr
2010-02-24 23:36:19 UTC
Created attachment 33552 [details]
Batch buffer dump with v8 patch on top of Linus' kernel as of 2010-02-21
Created attachment 33553 [details]
Xorg.0.log
Created attachment 33554 [details]
lspci -vvnn
Assigning to Chris Wilson since I assume he may be interested in looking at the error state captured by his patch "drm/i915: Record batch buffer following GPU error". Hope this is okay.
Created attachment 33561 [details]
crash2: dmesg log
Created attachment 33562 [details]
crash2: Xorg log
Created attachment 33563 [details]
crash2: batch buffer dump
At Geir's suggestion, I have collected dmesg [1], Xorg.0.log [2], and the batch buffer dump [3] from a single boot and crash/freeze instance. This is much better than the previous log files, which came from different runs and maybe even different kernel builds.

The kernel was built from the drm-intel.git repository [4], which includes Chris Wilson's GPU debug code.

kernel = 2.6.33-rc8-v2.6.29-rc1-51333-g9df3079
git describe = v2.6.29-rc1-51333-g9df3079

[1]: attachment 33561 [details] dmesg
[2]: attachment 33562 [details] /var/log/Xorg.0.log
[3]: attachment 33563 [details] cat /sys/kernel/debug/dri/0/i915_error_state
[4]: http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git

Thanks, this is another cache flushing bug. The telltale here is:

IPEHR: 0x40c00000
...
0x02618194: 0x7c09c0cc: 3DSTATE_MAP_COORD_SET_I830
0x02618198: 0x7d020000: 3DSTATE_MAP_COORD_SETBIND_I830
0x0261819c: HEAD 0x00000098: dword 1
0x026181a0: 0x7c291099: 3DSTATE_MAP_TEX_STREAM_I830

i.e. the last instruction header does not match the previous dword of the command stream -- the GPU is seeing a different state of memory wrt the CPU.

Created attachment 33616 [details] [review]
msleep(magic_delay)
This patch has proven vital to work around more obvious cache-flushing bugs. I'd appreciate much wider testing...

(In reply to comment #10)
> Created an attachment (id=33616) [details]
> msleep(magic_delay)
>
> This patch has proven vital to work around more obvious cache-flushing bugs.
> I'd appreciate much wider testing...
>

I've tested this patch for over an hour and my GPU is still up and running. I'm running the latest intel-drm-next kernel from git, libdrm-2.4.18, Xorg 1.7.5, and the latest xf86-video-intel from git. Furthermore, this patch also fixes the render errors reported in bug #26346. That said, rendering (both 2D and 3D) is now quite slow, as expected.

Chris, the msleep patch [1] works; I can log in with gdm and get to the desktop now. Moving and redrawing windows is slow, as expected.

I had one weird freeze when closing Firefox: the mouse pointer still moved, and clicking on panel icons changed the pointer to the spinning circle as if it were launching the application, but then the pointer returned to an arrow and no application was displayed. Unfortunately I did not grab the logs, and I have been unable to reproduce it since.

So how do we clean this up and fix this cache flush problem properly? I'm happy to code if you give me some pointers.

[1]: attachment 33616 [details] [review] msleep(magic_delay)

As it is clearly the CPU/GPU coherency issue, I'm duping this so as to consolidate the reports...

As to how to fix it, I've yet to find a suitable solution. The key is to ensure that the ICH has finished its writes prior to the GPU starting to DMA from memory. Sounds like it should be a fairly trivial, well-documented problem... But I've yet to find this precise scenario mentioned.

*** This bug has been marked as a duplicate of bug 26345 ***

Created attachment 33742 [details]
Batch buffer dump from Crash 3
Crash with msleep() patch applied
Created attachment 33743 [details]
Xorg log from crash 3
Crash with msleep() patch applied
Created attachment 33744 [details]
Xorg log after X restarted but with black screen
Crash with msleep() patch applied
Created attachment 33745 [details]
Batch buffer dump from Crash 4
Crash with msleep() patch applied
Created attachment 33746 [details]
Xorg.0.old log from before the freeze
Crash with msleep() patch applied
Created attachment 33747 [details]
Xorg log from freeze
Crash with msleep() patch applied
Tonight I updated my Ubuntu packages, including xserver-xorg-*, while keeping the kernel with the msleep() patch. I then had an Xorg crash and restart, and on the next boot an Xorg freeze. I have captured the batch buffer dumps and Xorg log files in case they help.
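
For anyone triaging further dumps, the comparison used in the diagnosis earlier in this thread (IPEHR versus the dword just before HEAD) can be sketched in a few lines of Python. This is only a rough illustration, not an existing tool: the function name `check_ipehr`, the `SAMPLE` text and the regular expressions are hypothetical, and the line layout is assumed from the excerpt quoted above. It also simplifies by looking only at the single dword before HEAD, whereas a real instruction can span several dwords.

```python
import re

# Assumed layout, taken from the excerpt quoted in this thread:
#   IPEHR: 0x40c00000
#   0x02618198: 0x7d020000: 3DSTATE_MAP_COORD_SETBIND_I830
#   0x0261819c: HEAD 0x00000098: dword 1
IPEHR_RE = re.compile(r"IPEHR:\s+(0x[0-9a-fA-F]+)")
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]{8}):\s+(HEAD\s+)?(0x[0-9a-fA-F]{8}):")

# Hypothetical sample built from the lines quoted in the diagnosis.
SAMPLE = """\
IPEHR: 0x40c00000
0x02618194: 0x7c09c0cc: 3DSTATE_MAP_COORD_SET_I830
0x02618198: 0x7d020000: 3DSTATE_MAP_COORD_SETBIND_I830
0x0261819c: HEAD 0x00000098: dword 1
0x026181a0: 0x7c291099: 3DSTATE_MAP_TEX_STREAM_I830
"""


def check_ipehr(dump_text):
    """Compare IPEHR with the dword immediately before the HEAD marker.

    Returns None if IPEHR or a usable HEAD line cannot be found.
    """
    ipehr = None
    dwords = []  # (address, value, is_head)
    for line in dump_text.splitlines():
        line = line.strip()
        m = IPEHR_RE.search(line)
        if m:
            ipehr = int(m.group(1), 16)
            continue
        m = LINE_RE.match(line)
        if m:
            dwords.append(
                (int(m.group(1), 16), int(m.group(3), 16), m.group(2) is not None)
            )

    head_idx = next((i for i, d in enumerate(dwords) if d[2]), None)
    if ipehr is None or head_idx is None or head_idx == 0:
        return None

    prev_addr, prev_value, _ = dwords[head_idx - 1]
    return {
        "ipehr": ipehr,
        "prev_addr": prev_addr,
        "prev_value": prev_value,
        # False suggests the GPU executed something other than what the
        # CPU-side dump shows at that address.
        "matches": ipehr == prev_value,
    }


if __name__ == "__main__":
    print(check_ipehr(SAMPLE))
```

A `matches` value of False is only a hint of the coherency problem described above; it does not by itself prove it.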