Created attachment 26894 [details]
intel_gpu_dump.txt.gz

Forwarding this Ubuntu bug:
https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/388357

[Problem]
GPU hang with call trace in dmesg occurs subsequent to full screen activity (video playback); also seen by other users after resuming from screen blanking via DPMS and after resuming from screensaver.

[Call Trace]
[ 6000.528124] INFO: task events/1:10 blocked for more than 120 seconds.
[ 6000.528133] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6000.528140] events/1 D 0000000100151496 0 10 2
[ 6000.528152] ffff8800bded1db0 0000000000000046 ffff8800bded1d30 0000000000013000
[ 6000.528163] ffff8800bdec83a8 0000000000013000 0000000000013000 0000000000013000
[ 6000.528173] 0000000000013000 0000000000013000 ffff8800bdec83a8 0000000000013000
[ 6000.528183] Call Trace:
[ 6000.528203] [<ffffffff806d9467>] __mutex_lock_slowpath+0xd7/0x160
[ 6000.528216] [<ffffffff802436b1>] ? finish_task_switch+0x51/0x110
[ 6000.528225] [<ffffffff806d9186>] mutex_lock+0x26/0x50
[ 6000.528260] [<ffffffffa0251ec8>] i915_gem_retire_work_handler+0x38/0x90 [i915]
[ 6000.528283] [<ffffffffa0251e90>] ? i915_gem_retire_work_handler+0x0/0x90 [i915]
[ 6000.528292] [<ffffffff802643d5>] run_workqueue+0x95/0x170
[ 6000.528300] [<ffffffff80264554>] worker_thread+0xa4/0x120
[ 6000.528310] [<ffffffff80268e90>] ? autoremove_wake_function+0x0/0x40
[ 6000.528318] [<ffffffff802644b0>] ? worker_thread+0x0/0x120
[ 6000.528327] [<ffffffff80268a35>] kthread+0x55/0xa0
[ 6000.528335] [<ffffffff802130ca>] child_rip+0xa/0x20
[ 6000.528344] [<ffffffff802689e0>] ? kthread+0x0/0xa0
[ 6000.528351] [<ffffffff802130c0>] ? child_rip+0x0/0x20

[Original Report]
I had finished watching a video in totem, and had been writing email using mutt and vim in a terminal for some time, when the screen stopped updating. My music was still playing, though; everything seemed to be running except for the X server (symptoms similar to bug 359392).

I was able to ssh in from another system and collect intel_gpu_dump output, which I will attach. /proc/interrupts showed no change in the number of interrupts for i915.

The kernel logged a page allocation failure while intel_gpu_dump was running(!), which will be shown in the attached dmesg.

I've seen it happen twice now (in the span of 2 hours), and both times, dmesg shows the above trace.

ProblemType: Bug
Architecture: amd64
Date: Wed Jun 17 10:20:15 2009
DistroRelease: Ubuntu 9.10
MachineType: LENOVO 6465CTO
Package: xserver-xorg-video-intel 2:2.7.99.1+git20090602.ec2fde7c-0ubuntu2
ProcCmdLine: root=UUID=305dde78-d20a-4248-aaf4-09447b7c5791 ro quiet splash
ProcEnviron:
 LC_COLLATE=C
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
ProcVersionSignature: Ubuntu 2.6.30-9.10-generic
RelatedPackageVersions:
 xserver-xorg 1:7.4~5ubuntu21
 libgl1-mesa-glx 7.4.1-1ubuntu2
 libdrm2 2.4.11-0ubuntu1
 xserver-xorg-video-intel 2:2.7.99.1+git20090602.ec2fde7c-0ubuntu2
 xserver-xorg-video-ati 1:6.12.2-2ubuntu1
SourcePackage: xserver-xorg-video-intel
Uname: Linux 2.6.30-9-generic x86_64
dmi.bios.date: 01/21/2008
dmi.bios.vendor: LENOVO
dmi.bios.version: 7LETB0WW (2.10 )
dmi.board.name: 6465CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7LETB0WW(2.10):bd01/21/2008:svnLENOVO:pn6465CTO:pvrThinkPadT61:rvnLENOVO:rn6465CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 6465CTO
dmi.product.version: ThinkPad T61
dmi.sys.vendor: LENOVO
fglrx: Not loaded
system:
 distro: Ubuntu
 architecture: x86_64
 kernel: 2.6.30-9-generic
Created attachment 26895 [details]
dmesg
Ubuntu bugs with similar backtraces which I suspect are dupes:

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/383973
Freeze when trying to resume from a blanked screen

https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/383822
Freeze when trying to resume from screensaver

https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/384242
Freeze after DPMS has kicked in

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/384865
kernel oops with intel graphics when screensaver turns screen off
*** Bug 22318 has been marked as a duplicate of this bug. ***
That backtrace is a generic "the gpu is hung" backtrace. Don't use it for classifying bugs. The dump in this report is broken because there were too many batchbuffers queued up and seqfile failed thanks to its use of kmalloc (the page allocation failure warning). If you can find a way to reliably reproduce the problem, and any 3D applications are in use, running them with INTEL_DEBUG=sync in the environment may help get successful dumping.
Here is a backtrace of the X server at the time of the hang:

#0 0x00007feedbda0ec7 in ioctl () from /lib/libc.so.6
#1 0x00007feeda9812e3 in drmIoctl () from /usr/lib/libdrm.so.2
#2 0x00007feeda9815e6 in drmCommandNone () from /usr/lib/libdrm.so.2
#3 0x00007feeda50b370 in I830BlockHandler (i=0, blockData=<value optimized out>, pTimeout=0x7fff7b671df8, pReadmask=0x7dff80) at ../../src/i830_driver.c:2281
#4 0x0000000000536885 in AnimCurScreenBlockHandler (screenNum=<value optimized out>, blockData=<value optimized out>, pTimeout=<value optimized out>, pReadmask=<value optimized out>) at ../../render/animcur.c:222
#5 0x0000000000500d86 in compBlockHandler (i=0, blockData=0x0, pTimeout=0x7fff7b671df8, pReadmask=<value optimized out>) at ../../composite/compinit.c:158
#6 0x00000000004520e0 in BlockHandler (pTimeout=0x7fff7b671df8, pReadmask=0x7dff80) at ../../dix/dixutils.c:384
#7 0x00000000004eed31 in WaitForSomething (pClientsReady=<value optimized out>) at ../../os/WaitFor.c:215
#8 0x000000000044dd52 in Dispatch () at ../../dix/dispatch.c:367
#9 0x0000000000433f15 in main (argc=<value optimized out>, argv=0x7fff7b672018, envp=<value optimized out>) at ../../dix/main.c:397

Looks the same as in https://bugs.freedesktop.org/show_bug.cgi?id=20560
I have experienced this problem several times without the use of any 3D applications (I switched from compiz to metacity in hopes of a workaround). I'll attach another GPU dump. This one was taken sooner after the hang, so perhaps it will be more useful. How can I tell a useful dump from a useless one?
Created attachment 26932 [details]
intel_gpu_dump output from a subsequent hang
The dump there looks pretty sane. Do you have a way to reproduce this bug?
"sane" as in "not broken like the previous one" or "sane" as in "contains no indication of any problem"? Has this information provided any clue as to where the problem lies?

I've switched from compiz to metacity to get my life back, but was seeing this recur a couple of times per day while using compiz. I expect I could reproduce it by going back to compiz. Is there something more I can do to help diagnose the problem if indeed I can reproduce it? I am happy to try.

If you want to have a go at reproducing it on your own hardware, I recommend trying with Ubuntu Karmic alpha 2:
http://cdimage.ubuntu.com/releases/karmic/alpha-2/
"contains no indication of any problem"

There are changes in intel-gpu-tools git that improve dump reporting and might have more information, but I don't expect it to help. We just need to figure out how to reliably reproduce the problem in a short period of time, so we can fix it.
*** Bug 22624 has been marked as a duplicate of this bug. ***
I have a 100% reliable way to reproduce this on Ubuntu Karmic x86_64. On any normal system with all defaults, with KMS and compiz enabled, just log in and wait for the screen to blank. That hangs the display right there, and I get these same stacks in dmesg (see my duplicated bug, which actually has two hung stacks, not just the one noted here).
Created attachment 27401 [details]
output of intel_gpu_dump.gz after hang

My GPU dump after hang ... looks similar to previous but I'm not smart enough to tell the difference.
jwbaker, you have a completely different bug. We still need to figure out how to reliably reproduce this one.
What additional information can I provide? The best recipe I have so far is:

- Install a recent Ubuntu snapshot
- Boot the system
- Work normally in X for a while

I've re-enabled compiz to confirm that it still happens with the latest bits. However, unless you can provide instructions for diagnosis, the best I'll be able to do is run intel_gpu_dump and attach another (probably useless) dump.
Created attachment 28262 [details]
intel gpu dump, dmesg and system info

I am not 100% sure whether I have the same problem, but I hope the attached information will help to clarify this.

How I produced the GPU hang (the steps used to produce the gpu_dump):
1. compiz is in use
2. open 25-30 pictures at 1920x1200 with EOG
3. press <Alt>+<Tab>
4. the screen should now be frozen except for the mouse cursor, although the mouse cursor may also be frozen.

I was not able to reproduce this with metacity as the window manager (30 pictures).

Another way to hang the GPU is to change the wallpaper in GNOME while compiz is active (1920x1200). The GPU doesn't always hang immediately.
1. compiz is in use
2. 1-3 workspaces, each with a full-screen window open
3. use the 4th workspace to change the wallpaper
4. the screen should freeze immediately or after some time. If it doesn't hang immediately, keep choosing different wallpapers until the system freezes.

I think the system freezes much more easily if it has been in use for some time before I try to change the wallpaper.

I hope this information is somehow helpful.

Regards
Achim
I can't say for sure in your case: you didn't mention using any other 3D apps, it looks like you've got a screen with an appropriately aligned height, and it looks like compiz doesn't use a depth buffer. It may still be worth trying with this commit series:

xf86-video-intel:

commit e8f0763d405a8152c74c28792c52fe12c1d41dd5
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 7 18:24:44 2009 -0700

    Fix math in the tiling alignment fix.

commit 222b52ef16895823fbf3a0fc0be4eb23b930ed1b
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 7 18:05:29 2009 -0700

    Align tiled pixmap height so we don't address beyond the end of our buffers.

Mesa:

commit ceb8afcca5b0a52b005a782ea54b301beaee1a15
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 7 18:09:31 2009 -0700

    intel: Align region height as required for tiled regions.

    Otherwise, we would address beyond the end of our buffers. Fixes reliable GPU segfault with texture_tiling=true and oglconform shadow.c.

    Bug #22406.
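
For readers following along, the idea behind "align tiled height" can be sketched roughly as below. This is an illustration only, not the code from the commits above: the names `align_up`, `tile_rows`, and `buffer_size` are hypothetical, and the tile heights (8 rows for X tiling, 32 rows for Y tiling) are an assumption about the hardware rather than something stated in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of align.
 * align must be a non-zero power of two. */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical tiling modes, for illustration only. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Assumed tile height in rows for each mode (not taken from this report). */
static uint32_t tile_rows(enum tiling t)
{
    switch (t) {
    case TILING_X: return 8;
    case TILING_Y: return 32;
    default:       return 1;
    }
}

/* Size in bytes of a buffer whose height is padded to a whole number of
 * tile rows, so accesses to the last partial tile row stay in bounds. */
static uint64_t buffer_size(uint32_t pitch, uint32_t height, enum tiling t)
{
    return (uint64_t)pitch * align_up(height, tile_rows(t));
}
```

Under these assumptions, a 1200-row surface needs no padding with 8-row tiles but is padded to 1216 rows with 32-row tiles; allocating only `pitch * height` bytes in the second case would leave the hardware addressing past the end of the buffer.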
Created attachment 29294 [details] [review]
Avoid wrapping mid-instruction.

The first gpu dump shows that we wrapped the ringbuffer mid-instruction, which is invalid according to the docs. I've posted this patch for review.
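
To illustrate what "avoid wrapping mid-instruction" means, here is a minimal sketch, not the attached patch. It assumes a toy 64-byte ring and a no-op encoding of zero; `struct ring`, `ring_begin`, and `ring_emit` are hypothetical names, and free-space checking against the hardware's read pointer is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64u /* bytes; hypothetical tiny ring for illustration */

struct ring {
    uint8_t  buf[RING_SIZE];
    uint32_t tail; /* byte offset of the next write */
};

/* Ensure an instruction of `bytes` bytes will be contiguous.
 * If it does not fit before the end of the ring, fill the remainder
 * with no-ops (assumed here to be zero dwords) and restart at offset 0. */
static void ring_begin(struct ring *r, uint32_t bytes)
{
    assert(bytes % 4 == 0 && bytes <= RING_SIZE);
    uint32_t remaining = RING_SIZE - r->tail;
    if (bytes > remaining) {
        memset(r->buf + r->tail, 0, remaining);
        r->tail = 0;
    }
}

/* Write one instruction of `count` dwords without splitting it
 * across the end of the ring. */
static void ring_emit(struct ring *r, const uint32_t *dwords, uint32_t count)
{
    ring_begin(r, count * 4);
    memcpy(r->buf + r->tail, dwords, count * 4);
    r->tail = (r->tail + count * 4) % RING_SIZE;
}
```

With the tail at byte 56, emitting a 4-dword instruction pads the last 8 bytes with no-ops and writes the instruction at offset 0, instead of splitting it across the end of the ring.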
Decreasing priority so that this does not block the Q3 release, due to lack of response.
Closing this due to lack of response. If the problem continues with the components updated for the other hangs we've fixed, please reopen.