After upgrading my xorg stack today to test some patches, I found that suspend and resume (S3) are now broken. When I attempt to resume the machine after suspending with compiz running, xorg comes back with a black screen and an unresponsive cursor.

mesa: 264cba6f70eacd9e04646104d10ba63c248d7b83
libdrm: b0d93c74d884b40bd94469a5ef75fdb2fef17680
xserver: f841d4e3cccbde02e91c948f5ffb9e32c8c3b3cc
xf86-video-intel: b662ecccb5c036fcc4aa19026642bde0a1ca2ac8
kernel: 2.6.27.7-132.fc10.x86_64
As expected, VT switching is also now broken with compiz. The symptoms seem to reflect bug #18062 pretty closely yet before upgrading to git today, this bug had been fixed ever since keith's vblank counter fix was merged into the fedora kernel. Below is a backtrace from a frozen server.

#0 0x00000033b6addff7 in ioctl () from /lib64/libc.so.6
#1 0x0000003f57c02753 in drmIoctl (fd=12, request=3222823994, arg=0x7fff35c893f0) at xf86drm.c:183
#2 0x0000003f57c02bf0 in drmWaitVBlank (fd=12, vbl=0x7fff35c893f0) at xf86drm.c:1895
#3 0x0000000001063b3e in do_wait (vbl=0x7fff35c893f0, vbl_seq=0x7f921b4f19a0, fd=902337520) at ../common/vblank.c:255
#4 0x0000000001063d53 in driWaitForVBlank (priv=0x7f921b4f1940, missed_deadline=0x7fff35c8947f "") at ../common/vblank.c:406
#5 0x000000000106ba05 in intelSwapBuffers (dPriv=0x7f921b4f1940) at intel_buffers.c:740
#6 0x0000000001063f43 in driSwapBuffers (dPriv=0xc) at ../common/dri_util.c:321
#7 0x0000000000c049bf in __glXDRIdrawableSwapBuffers (basePrivate=0x7f92190abf20) at glxdri.c:251
#8 0x0000000000bf8c46 in __glXDisp_SwapBuffers (cl=<value optimized out>, pc=<value optimized out>) at glxcmds.c:1436
#9 0x0000000000bfbf5f in __glXDispatch (client=0x7f921b8a2620) at glxext.c:523
#10 0x000000000043e1b4 in Dispatch () at dispatch.c:437
#11 0x0000000000423fad in main (argc=8, argv=0x7fff35c89708, envp=<value optimized out>) at main.c:383
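For anyone reading the backtrace: the `request=3222823994` value in frame #1 can be unpacked by hand. Below is a minimal sketch, assuming the usual Linux ioctl number layout on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). `decode_ioctl` and `IoctlFields` are hypothetical names for illustration, not part of libdrm.

```python
from typing import NamedTuple

# Assumed bit layout of an ioctl request number on x86/x86_64 Linux:
# nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31.
_NRBITS, _TYPEBITS, _SIZEBITS = 8, 8, 14
_DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read|write"}


class IoctlFields(NamedTuple):
    direction: str
    type_char: str
    nr: int
    size: int


def decode_ioctl(request: int) -> IoctlFields:
    """Split a raw ioctl request number into its four fields (hypothetical helper)."""
    nr = request & ((1 << _NRBITS) - 1)
    type_code = (request >> _NRBITS) & ((1 << _TYPEBITS) - 1)
    size = (request >> (_NRBITS + _TYPEBITS)) & ((1 << _SIZEBITS) - 1)
    direction = (request >> (_NRBITS + _TYPEBITS + _SIZEBITS)) & 0x3
    return IoctlFields(_DIR_NAMES[direction], chr(type_code), nr, size)


if __name__ == "__main__":
    # Value taken from frame #1 of the backtrace.
    print(hex(3222823994), decode_ioctl(3222823994))
```

The request is 0xc018643a, which decodes to type 'd' (the DRM ioctl base), number 0x3a, and a 24-byte read/write argument. That is consistent with the wait-vblank ioctl that `drmWaitVBlank` issues in frame #2.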
By request of anholt, I've been spending a bit of time attempting to bisect this. I first attempted to roll back xf86-video-intel to b662ecccb5c036fcc4aa19026642bde0a1ca2ac8, but this made no difference. I then went back to master and tried rolling back mesa to MESA_7_2. Unfortunately, this broke glx:

[0331 ben@mercury /opt/exp/xorg/mesa] $ glxinfo
name of display: :1.0
Failed to initialize TTM buffer manager. Falling back to classic.
[intel_init_bufmgr:500] Error initializing buffer manager.
brwCreateContext: failed to init intel context
X Error of failed request: BadAlloc (insufficient resources for operation)
  Major opcode of failed request: 152 (GLX)
  Minor opcode of failed request: 3 (X_GLXCreateContext)
  Serial number of failed request: 24
  Current serial number in output stream: 27

Going to try a mesa commit from Sep 17 next (39d29fe7fec304fa3638db15b868ebbcb8292167).
That mesa commit didn't configure, so I tried 72c914805b8b3b37bf8f44d94bc25ca3d146ac66 from Nov 1. This wouldn't compile due to dri2 changes:

dri2.c: In function ‘DRI2CopyRegion’:
dri2.c:297: error: ‘xDRI2CopyRegionReq’ has no member named ‘bitmask’

Tried 4c167f8fc1e56b6c82d8917c237e70531e3d57b9 from Nov 13. Same issue. This is futile. I'm going to give the hell up and get some sleep.
Mesa 7.2 failed because you presumably have Legacy3D set to FALSE since you'd moved to a GEM environment, but Mesa 7.2 needs the classic memory allocation.
I am able to reproduce this on the following bits:

Mesa: f18880038b46c253d8689c9f6f7b77fca261e702
xf86-video-intel: 8d7cbab267e8fbcb2fcf90b18346b60607277266
libdrm: b0d93c74d884b40bd94469a5ef75fdb2fef17680
xserver: 027ff97a1354ab4c83fecb615f6bc2a6b739b871
kernel: 66647dc60d16fae9f6963fd98b6d9baa1a8dac69

Starting the xserver with compiz, then doing 'chvt 1', waiting a few seconds, and doing 'chvt 7' results in X hanging with an identical stack trace.
This seems to be a true blocker for the Q4 release. Can any of you confirm this still exists?
The issue has been fixed against:

Libdrm: (master) 0243c9f801a35de3465a0321c02f18a4d07ce5b8
Mesa_stable: (intel-2008-q4) f96baeaac3ef41260ac3975750627ece073fdce0
Xserver_stable: (server-1.6-branch) 32e81074b967716865aef08b66ec29caf0fec2c5
Xf86_video_intel_stable: (xf86-video-intel-2.6-branch) 83f3c376b5942e134047a220e6e5f2432ffc492c
GEM_kernel: (for-airlied) 0fbdb7c9455a05eb89f358f0eb66fb8ab094a0c5
*** Bug 19202 has been marked as a duplicate of this bug. ***
It works on gm965, but is broken on q965 with the same code, so I am reopening this bug for q965. xorg comes back with a black screen and an unresponsive cursor. We can access the machine remotely, but applications do not respond when run. The only thing we can do is reboot.
the same issue happens on 945gm and g45.
any chance of a backported patch against 2.5?
(In reply to comment #9)
> it works on gm965 ,but broken on q965 with the same code. So I reopen this bug
> for q965.xorg comes back with a black screen and an unresponsive cursor. we can
> access it by remote but no response when run any applications. the only thing
> we can do is just reboot.
>

Can you grab the log via ssh and attach it here?
Created attachment 21423 [details] xorg.0.log
Created attachment 21424 [details] xorg conf file
(In reply to comment #12)
> (In reply to comment #9)
> > it works on gm965 ,but broken on q965 with the same code. So I reopen this bug
> > for q965.xorg comes back with a black screen and an unresponsive cursor. we can
> > access it by remote but no response when run any applications. the only thing
> > we can do is just reboot.
> >
>
> can you grab the log via ssh and attach here?
>

xorg.0.log is attached.
I can't reproduce this particular problem with the drm-intel-next branch, mesa, xserver and xf86-video-intel from today. However, compiz does give me a black screen when I VT switch back to it; only the mouse cursor is visible. It changes when I move across a window though, so the window manager is running and doesn't appear to be stuck waiting for vblank at least...
Oops, spoke too soon, looks like I am seeing this. So far, I see what look like a couple of problems:
- the vblank refcount is 0 even while X & compiz are running; this shouldn't happen since X is constantly doing vblank sync'd buffer swaps (or at least it appears to be)
- glxgears properly causes the refcount to be increased, but doesn't prevent the problem

It looks like interrupts aren't coming in after the VT switch...
Created attachment 21571 [details] [review]
don't uninstall irq handler

This patch fixes the problem for me. Looks like the server was calling into the DRM's vblank wait routine before the 2D driver had called in to re-enable interrupts. We probably shouldn't be disabling interrupts at all though...
(In reply to comment #18)
> Created an attachment (id=21571) [details]
> don't uninstall irq handler
>
> This patch fixes the problem for me. Looks like the server was calling into
> the DRM's vblank wait routine before the 2D driver had called in to re-enable
> interrupts. We probably shouldn't be disabling interrupts at all though...
>

Hi Jesse, your patch works for me with the drm-intel-next branch, but it causes an oops against the drm-intel-2.6.28 branch and the 2.6.28 release kernel when starting X.
Following is the oops info:

Message from syslogd@x-q965 at Dec 31 14:50:44 ...
kernel: Oops: 0000 [#1] SMP
kernel: last sysfs file: /sys/class/drm/card0/dev
kernel: Process X (pid: 2627, ti=f6110000 task=f60679e0 task.ti=f6110000)
kernel: Stack:
kernel: 00000000 f60679e0 c041f8a0 00000000 00000000 f805da6c f6403e88 40046445
kernel: f6076400 f7e05648 f606bde0 fffffff4 f607642c f8067854 f615b600 bf8b8344
kernel: Call Trace:
kernel: [<c0435763>] add_wait_queue+0x1f/0x2b
kernel: [<f805db34>] i915_irq_wait+0xc8/0x190 [i915]
kernel: [<c041f8a0>] default_wake_function+0x0/0x8
kernel: [<f805da6c>] i915_irq_wait+0x0/0x190 [i915]
kernel: [<f7e05648>] drm_ioctl+0x1a7/0x22f [drm]
kernel: [<c04831bc>] vfs_ioctl+0x47/0x5d
kernel: [<c048366f>] do_vfs_ioctl+0x3c5/0x40f
kernel: [<c0464c45>] handle_mm_fault+0x560/0x5bb
kernel: [<c040440b>] common_interrupt+0x23/0x28
kernel: [<c04836fa>] sys_ioctl+0x41/0x58
kernel: [<c040386d>] sysenter_do_call+0x12/0x21
kernel: [<c0630000>] generic_processor_info+0x83/0x103
kernel: Code: 63 71 c0 e8 b3 24 f3 ff 83 c4 14 8b 13 8b 43 04 89 42 04 89 10 c7 43 04 00 02 20 00 c7 03 00 01 10 00 5b c3 57 89 c7 56 89 d6 53 <8b> 41 04 89 cb 39 d0 74 17 51 50 52 68 7a 63 71 c0 6a 1a 68 2f
kernel: EIP: [<c04f3dc9>] __list_add+0x7/0x52 SS:ESP 0068:f6110ebc

[1]+ Done xinit
Created attachment 21582 [details] dmesg after oops
Created attachment 21601 [details] [review]
don't remove irq handler (take #2)

This one works for me against the 2.6.28 branch.
(In reply to comment #22)
> Created an attachment (id=21601) [details]
> don't remove irq handler (take #2)
>
> This one works for me against the 2.6.28 branch.
>

Thanks, this patch also works for me against the 2.6.28 branch.
Created attachment 21728 [details] [review]
clear vblank enabled on irq uninstall

Can you try this one too? It's best to try VT switching back both before and after 5s or so have elapsed, that way you'll test the disable timer too.
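The test described above can be scripted. This is a minimal sketch, assuming X runs on VT 7 and a text console on VT 1 (the VTs used with `chvt` earlier in this report), and assuming 2 s and 8 s as delays on either side of the roughly 5 s disable timer. `vt_switch_plan`, `run_plan`, and the 3 s settle pause are hypothetical names and values chosen for illustration. `chvt` normally needs root, so the example below only prints the commands.

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

Step = Tuple[List[str], float]  # (command, seconds to sleep afterwards)


def vt_switch_plan(delays_s: Sequence[float], text_vt: int = 1,
                   x_vt: int = 7, settle_s: float = 3.0) -> List[Step]:
    """Build chvt steps: leave X, wait `delay`, switch back, let X settle."""
    plan: List[Step] = []
    for delay in delays_s:
        plan.append((["chvt", str(text_vt)], float(delay)))
        plan.append((["chvt", str(x_vt)], settle_s))
    return plan


def run_plan(plan: Sequence[Step],
             run: Callable[[List[str]], object] = subprocess.check_call,
             sleep: Callable[[float], object] = time.sleep) -> None:
    """Execute each step; `run` and `sleep` are injectable for dry runs."""
    for command, pause in plan:
        run(command)
        sleep(pause)


if __name__ == "__main__":
    # Dry run: print the commands instead of actually switching VTs.
    plan = vt_switch_plan([2, 8])
    run_plan(plan,
             run=lambda cmd: print(" ".join(cmd)),
             sleep=lambda seconds: print(f"sleep {seconds}"))
```

Calling `run_plan(vt_switch_plan([2, 8]))` as root would perform the real switches. Whether X is still responsive after each switch back has to be checked by hand or over ssh.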
(In reply to comment #24)
> Created an attachment (id=21728) [details]
> clear vblank enabled on irq uninstall
>
> Can you try this one too? It's best to try VT switching back both before and
> after 5s or so have elapsed, that way you'll test the disable timer too.
>

Hi Jesse, with your two patches, VT switching is fine but S3 is still broken. Sorry, I forgot to test S3 with your first patch.
*** Bug 18940 has been marked as a duplicate of this bug. ***
Created attachment 21774 [details] [review]
clear vblank enabled on irq uninstall

This one also works for me with kwin & compiz (though with compiz I can't always reproduce the problem now that a separate libdrm fix has been committed). Suspend/resume works as well. Are you sure this patch was causing suspend/resume to fail? There were some problems with upstream kernels there...
(In reply to comment #27)
> Created an attachment (id=21774) [details]
> clear vblank enabled on irq uninstall
>
> This one also works for me with kwin & compiz (though with compiz I can't
> always reproduce the problem now that a separate libdrm fix has been
> committed). Suspend/resume works as well. Are you sure this patch was causing
> suspend/resume to fail? There were some problems with upstream kernels
> there...
>

I found the reason. We recently pinned libdrm to 2.4.3, so we did not have the fix you mentioned above. Now, with the libdrm fix, your patch works for me with both VT switching and S3. Thanks.
Jesse, another success report here: https://bugs.gentoo.org/show_bug.cgi?id=253813 Would be great to see that patch upstream soon. Thanks!
I only tested the take #2 patch... but that didn't work long term... I need to test the new patch, and I need to clarify that in my downstream.
Two new commits in drm-intel-next and drm-intel-2.6.28:

commit e1a6fcee467556a7e955fe1f7ccc134dd2f974e7
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Tue Jan 6 10:21:24 2009 -0800

    drm/i915: set vblank enabled flag correctly across IRQ install/uninstall

commit 9f4f07ceb1716d8796089fcef91621c5f07c872a
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Thu Jan 8 10:42:15 2009 -0800

    drm/i915: don't enable vblanks on disabled pipes

along with libdrm:

commit f4f76a6894b40abd77f0ffbf52972127608b9bca
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Wed Jan 7 10:18:08 2009 -0800

    libdrm: add timeout handling to drmWaitVBlank
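The libdrm commit title above refers to timeout handling for a blocking wait. As a rough illustration of that general pattern, and not the actual libdrm change, the sketch below retries a wait that gets interrupted by a signal, but gives up once a deadline has passed instead of looping forever. `wait_with_deadline`, the `wait_once` callback, the 1 s default, and the choice of EBUSY as the error are all assumptions made for this example.

```python
import errno
import time
from typing import Callable


def wait_with_deadline(wait_once: Callable[[], int],
                       timeout_s: float = 1.0,
                       clock: Callable[[], float] = time.monotonic) -> int:
    """Retry wait_once() while it is interrupted, but only until the deadline.

    wait_once stands in for a single blocking wait call: it returns a result
    on success and raises InterruptedError (EINTR) when a signal interrupts it.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return wait_once()
        except InterruptedError:
            # Interrupted by a signal: retry unless we have run out of time.
            if clock() >= deadline:
                raise OSError(errno.EBUSY, "wait timed out") from None


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_wait() -> int:
        # Simulated wait that is interrupted twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise InterruptedError(errno.EINTR, "interrupted")
        return 0

    print(wait_with_deadline(flaky_wait), attempts["count"])
```

With a bounded wait like this, a caller that never receives the event it is waiting for gets an error back and can carry on, rather than hanging in the ioctl as in the backtrace at the top of this report.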
Trying to test all 3 commits (I was just going to grab them via the web), but I can't seem to find the last patch. The first 2 alone don't cut it.
(In reply to comment #32)
> trying to test the all 3 commits (was just going to grab them via web) but I
> can't seem to find the last patch. first 2 alone don't cut it.

You should be able to find the last patch at http://cgit.freedesktop.org/mesa/drm/commit/?id=f4f76a6894b40abd77f0ffbf52972127608b9bca
Verified against:

xf86_video_intel: xf86-video-intel-2.6-branch, commit 4447973345a2a7af20ba1d6cb18c5f1ed8949d00 (2.5.99.2)
mesa: intel-2008-q4 branch, commit eef0dcc298f65158dc750a09f80317ded1101dc7 (before and close to 7.3)
kernel: drm-intel-2.6.28 branch, commit e1a6fcee467556a7e955fe1f7ccc134dd2f974e7 (2.6.28 + 5 patches)
libdrm: master branch, commit ac8b3308b9432edef5cabe30559004314d42d98c (after 2.4.3)
xserver: server-1.6-branch, commit 8cfb353078d9b5d03a9633304038141a60adc970
are the X11 fixes required for this to work?
No, just kernel and libdrm fixes. For the kernel:
  drm/i915: set vblank enabled flag correctly across install/uninstall
  drm/i915: don't enable vblanks on disabled pipes
and for libdrm:
  libdrm: add timeout handling to drmWaitVBlank
Just making sure I'm not being stupid: libdrm isn't part of the kernel, right? (My comment on X was due to the fact that gentoo has libdrm as an X package.) I couldn't get the libdrm patch to apply to 2.4.3. It's possible I'm doing something wrong, but I want to make sure it should apply.
Oh I'm not sure about applying it to 2.4.3, I was developing against git master. You'd have to take a look at what else went into git master that might affect the context of the patch (or just replace your package tarball with a new tarball of git master to make things easy).