Summary: | VBlank related server hang in indirect rendering mode | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | Mathieu Bérard <mathieu.berard> | ||||||||||||
Component: | Driver/Radeon | Assignee: | xf86-video-ati maintainers <xorg-driver-ati> | ||||||||||||
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> | ||||||||||||
Severity: | normal | ||||||||||||||
Priority: | medium | CC: | ghepeu | ||||||||||||
Version: | git | Keywords: | have-backtrace | ||||||||||||
Hardware: | Other | ||||||||||||||
OS: | Linux (All) | ||||||||||||||
Whiteboard: | |||||||||||||||
i915 platform: | i915 features: | ||||||||||||||
Attachments: |
|
Description
Mathieu Bérard
2007-10-23 15:28:30 UTC
Created attachment 12165 [details]
glxgears running and killing backtrace
Backtrace obtained when launching indirect rendering glxgears and the killing it
Created attachment 12166 [details]
not responding server backtrace
Backtrace of the Xorg process that is not responding anymore after launching an indirect rendering glxgears and hiding its window behind another one
Created attachment 12167 [details] [review] workaround patch dirty hack that sucessfully prevent the X server from hanging Comment on attachment 12167 [details] [review] workaround patch >diff --git a/shared-core/radeon_irq.c b/shared-core/radeon_irq.c >index 1ece639..6875a62 100644 >--- a/shared-core/radeon_irq.c >+++ b/shared-core/radeon_irq.c >@@ -153,12 +153,21 @@ static int radeon_driver_vblank_do_wait(struct drm_device * dev, > unsigned int cur_vblank; > int ret = 0; > int ack = 0; >+ int fixup = 0; >+ int saved_crtc; > atomic_t *counter; > if (!dev_priv) { > DRM_ERROR("%s called with no initialization\n", __FUNCTION__); > return -EINVAL; > } > >+ if (!(radeon_vblank_crtc_get(dev) == crtc)) { >+ saved_crtc = crtc; >+ fixup = 1; >+ radeon_vblank_crtc_set(dev, crtc); >+ DRM_INFO("Fixup mode !\n"); >+ } >+ > if (crtc == DRM_RADEON_VBLANK_CRTC1) { > counter = &dev->vbl_received; > ack |= RADEON_CRTC_VBLANK_STAT; >@@ -182,6 +191,9 @@ static int radeon_driver_vblank_do_wait(struct drm_device * dev, > > *sequence = cur_vblank; > >+ if (fixup == 1) >+ radeon_vblank_crtc_set(dev, saved_crtc); >+ > return ret; > } > Created attachment 12168 [details] [review] fixed patch Previous patch contained a leftover "return 0" from one of my many experiments, this one should be fixed I think we should simply remove the radeon DDX code to disable the vblank interrupts, finish up the drm vblank-rework branch and merge it. Dave, what do you think? (In reply to comment #6) I think that this commit is also very interesting: http://gitweb.freedesktop.org/?p=mesa/drm.git;a=commit;h=c06808fb6521822238bca4574758f30246b71c2d As it enforce a correct timeout when a signal is received for some ioctls. I think this is another aspect of the hang I observed: if the X server continuously receive SIGIO while sleeping in drmWaitVBlank then it may spin in the "while (ret && errno == EINTR)" loop forever. In fact I have tried to hack drmWaitVBlank to incorporate the same kind of fix as for the mentioned patch, and it successfully prevented the X server hang. (I can provide a patch if desired) As a sidenote, that modification add at least 2 calls of gettimeofday for each ioctls, I don't know if those are considered hotpaths but that might be suboptimal. I think that those can be avoided by handling this issue in the kernel: just put the task in a uninterruptible sleep state, for exemple by creating an uninterruptible version of DRM_WAIT_FOR and we can forgot this signal issue. That may not be valid/possile for all the ioctls, but I think that can work at leat for drmWaitVBlank. (In reply to comment #7) > I think this is another aspect of the hang I observed: if the X server > continuously > receive SIGIO while sleeping in drmWaitVBlank then it may spin in the > "while (ret && errno == EINTR)" loop forever. No, I think that should be handled correctly, the vblank ioctl updates the arguments for this before returning to userspace. > In fact I have tried to hack drmWaitVBlank to incorporate the same kind of fix > as for the mentioned patch, and it successfully prevented the X server hang. > (I can provide a patch if desired) Feel free. > I think that those can be avoided by handling this issue in the kernel: just > put the task in a uninterruptible sleep state, for exemple by creating an > uninterruptible version of DRM_WAIT_FOR and we can forgot this signal issue. > That may not be valid/possile for all the ioctls, but I think that can work at > leat for drmWaitVBlank. Handling SIGIO ASAP is critical for input processing, in particular for smooth mouse movement, so this would probably be a bad idea. I think I may be seeing this now - happens sometimes (but not reliably, unfortunately) running compiz and exiting fullscreen games such as xmoto or after running e.g. glxgears when invoking a compiz effect that involves full buffer swaps, such as the alt-tab switcher. I don't understand why DRM_WAIT_ON seems to constantly get interrupted by a signal before it hits the three second timeout. Also, I can't seem to reproduce this using the DRM from the 2.6.24 kernel, so it seems to be a DRM regression that I'm trying to bisect, but it's slow progress. Is this in line with your observations, and/or have you made any new ones? Well, I think that there might be some mismatch between the ddx driver and the drm code when the vblank-rework branch was merged. ddx driver's RADEONDRITransitionTo2d/3d still enable/disable VBlank interrupts right now, If I understood correctly that's not correct anymore with vblank-rework. On the other hand drm from 2.6.24 kernel does not have vblank-rework. Is there any way for the ddx to know if the drm layer has been vblank-reworked ? Maybe some version minor number in the drm sould have been incremented ... I don't know if that's directly related to this particular bug, just some random thoughts... I will try to do some tests to see if I can still reproduce the bug. (In reply to comment #10) > I don't know if that's directly related to this particular bug, [...] Apparently not, as I have already disabled that code in my xf86-video-ati tree. Sorry, I forgot to mention that before. So far it's looking like this is indeed related to the vblank-rework changes though - or did you see the problem without them? - but I still don't understand how any DRM changes could have an impact on the return value of signal_pending()... (In reply to comment #11) > [...] as I have already disabled that code in my xf86-video-ati tree. > Sorry, I forgot to mention that before. > Yes I did that too at one point to do some test after vblank-rework got merged. But I've dropped that after a while. > So far it's looking like this is indeed related to the vblank-rework changes > though - or did you see the problem without them? When I first reported that bug, it was without vblank-rework (and without compiz) so at one point yes. I need to redo some test to be sure how it behave now with and without vblank-rework and/or compiz. > but I still don't > understand how any DRM changes could have an impact on the return value of > signal_pending()... > Neither do I. And I do not understand ethier why drmWaitVBlank clear the DRM_VBLANK_RELATIVE flag when retrying after being interrupted by a signal ? (In reply to comment #12) > Neither do I. And I do not understand ethier why drmWaitVBlank clear the > DRM_VBLANK_RELATIVE flag when retrying after being interrupted by a signal ? It's just a workaround for older DRMs that didn't do it yet. drm_wait_vblank() adds the current sequence number to the relative wait period and treats it the same way as an absolute wait then. So if the relative flag wasn't cleared, interrupted relative waits would add the current sequence number again and again. Now that you mention this though, I notice there does seem to be a bug related to interrupted waits. I'll attach a patch as I can't test it right now, it'll be interesting whether it helps with this problem. Created attachment 14651 [details] [review] Fix for interrupted vblank wait This should be fixed with current Git, please reopen if you're still seeing the same problem. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.