I'm having a very weird bug where FLTK dialogs get corrupted by "old" data in the window at certain times. I've attached screenshots and a movie of the effect.

The bug seems to appear only with compositing, and possibly only with 3D compositing (I've only seen it with gnome shell and compiz). I can reproduce it with both the radeon driver and the nouveau driver. I'm unable to reproduce it using software-only servers (Xephyr and Xvnc). I've not tested any other graphics cards.

Through a whole bunch of tedious work, I've determined that the bug appears when Xft is trying to render a string with an active clip region and the string is completely clipped. At this point the rectangle extents of the string will instead be corrupted. It does not happen in every one of these cases though, and I've yet to figure out what the triggering factor is for these specific widgets. One idea is that it has something to do with how the clip region is positioned relative to the clipped string.

Testing the clipping manually and avoiding the call to XftDrawString32() when the string is fully clipped is sufficient to make the bug go away.

I've not been able to construct a simple test case so far... Ideas? :/

~ [ossman@ossman]$ rpm -q xorg-x11-server-Xorg xorg-x11-drv-ati mesa-libGL libXft libXrender
xorg-x11-server-Xorg-1.11.3-1.fc16.x86_64
xorg-x11-drv-ati-6.14.3-3.20111125git534fb6e41.fc16.x86_64
mesa-libGL-7.11.2-1.fc16.x86_64
mesa-libGL-7.11.2-1.fc16.i686
libXft-2.2.0-2.fc15.x86_64
libXft-2.2.0-2.fc15.i686
libXrender-0.9.6-2.fc15.x86_64
libXrender-0.9.6-2.fc15.i686
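For reference, the manual clip test mentioned in the report could look roughly like the following. This is only a minimal sketch, not the actual FLTK change: `Rect`, `rects_intersect` and `string_is_fully_clipped` are hypothetical names, the clip is assumed to be a single rectangle rather than a full region, and the string's bounding box is assumed to have been computed already (for example from the extents that XftTextExtents32() reports).

```c
#include <stdbool.h>

/* Hypothetical axis-aligned rectangle; (x, y) is the top-left corner. */
typedef struct {
    int x, y;
    int w, h;
} Rect;

/* True if the two rectangles share at least one pixel.
 * Empty rectangles never intersect anything. */
static bool rects_intersect(Rect a, Rect b)
{
    if (a.w <= 0 || a.h <= 0 || b.w <= 0 || b.h <= 0)
        return false;
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* True if no part of the string's bounding box lies inside the clip
 * rectangle, i.e. the draw call would be a no-op and can be skipped. */
static bool string_is_fully_clipped(Rect string_box, Rect clip)
{
    return !rects_intersect(string_box, clip);
}
```

A caller would then only issue XftDrawString32() when `string_is_fully_clipped()` returns false. With a multi-rectangle clip region the same test would have to be repeated for each rectangle of the region.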
Created attachment 56448: Correct output
Created attachment 56449: Incorrect output
Created attachment 56450: Movie of the bug in action
Two more data points:

- Fedora 14, metacity with compositing: Works fine
- Ubuntu 11.04, unity 3d: Works fine
(In reply to comment #4)
> - Fedora 14, metacity with compositing: Works fine
>
> - Ubuntu 11.04, unity 3d: Works fine

Which upstream versions of xserver do those use?

My first guess would be that this is an EXA bug (does Option "EXANoComposite" work around it?), I'd start looking at exaGlyphs().
(In reply to comment #5)
> (In reply to comment #4)
> > - Fedora 14, metacity with compositing: Works fine
> >
> > - Ubuntu 11.04, unity 3d: Works fine
>
> Which upstream versions of xserver do those use?

xorg-x11-server-Xorg-1.9.5-2.fc14.x86_64
xorg-server 2:1.10.1-1ubuntu1.3
(In reply to comment #6)
> xorg-server 2:1.10.1-1ubuntu1.3

Hmm, I can't see any obviously relevant EXA changes between 1.10 and 1.11... At this point I'm afraid the best bet is to bisect.
(In reply to comment #5)
> My first guess would be that this is an EXA bug (does Option "EXANoComposite"
> work around it?), I'd start looking at exaGlyphs().

Yup, EXANoComposite does indeed prevent the bug from happening. I guess I'll try to build my own copy of the X server and see if I can find something.
(In reply to comment #7)
> (In reply to comment #6)
> > xorg-server 2:1.10.1-1ubuntu1.3
>
> Hmm, I can't see any obviously relevant EXA changes between 1.10 and 1.11... At
> this point I'm afraid the best bet is to bisect.

I seem to recall seeing this bug on Fedora 14 with Compiz. So I think the bug is just not showing up on Ubuntu for some reason.
Something very subtle is going on here. What I've determined so far:

- It is indeed a glyph rendering call that is causing the problem. If I force the use of the fallback for Composite when it is called from Glyphs, then the problem goes away.

- Added a debug print and concluded that miComputeCompositeRegion() is correctly determining that there is nothing to do for the suspicious rendering request.

- Forcing the fallback for all requests with the same clip region as the suspicious request does NOT make the problem go away.

I will continue to try to pinpoint this, but it is a bit confusing at this point.
I've managed to pinpoint it as far as I can go in the Xorg code. The triggering element is that the damage code is a bit blunt and assumes that all Glyphs and Composite operations modify the entire destination region (IOW it doesn't bother looking at the clipping at all).

Now this shouldn't really be any problem other than causing some needless churn somewhere else. So the real bug is whatever is handling these damage events.

I guess something is doing double buffering, and when it gets the damage event it assumes the region has been filled with fresh data. Since it hasn't, it is presenting the stale back buffer data instead.

So where do we go from here? Is it mutter that's the next suspect? Or some other Xorg component?
(In reply to comment #11)
> I guess something is doing double buffering, and when it gets the damage event
> it assumes the region has been filled with fresh data. Since it hasn't, it is
> presenting the stale back buffer data instead.

I suspect that's spot on, with 'it' being EXA's migration code. It assumes that the damaged region has become more up to date in one of the pixmap copies (GPU or CPU accessible) and invalidates it in the other copy. But since exaGlyphs ends up not doing any actual rendering, the copies aren't synchronized before this, and it can end up invalidating current bits and keeping stale ones.

I think it would be best to fix this in the damage layer if at all possible. EXA could work around this problem, but it could potentially involve expensive synchronization of pixmap copies for a no-op.
So there are internal listeners to damage events inside the X server?

Still, damage events are defined as being a superset of the actual modified area. And I don't see how it could be any other way. If a diagonal line is drawn, it is very difficult to represent just the modified pixels using rects. So even if we can fix this specific instance of the bug, having code that blindly assumes that damaged areas are completely redrawn seems to be just asking for more issues down the road.

It does seem to me that the back buffer handling is broken somewhere though. If the clipping hadn't been active, the glyphs would have been drawn on a stale old image, not on top of what's currently on the screen. So maybe the bug is somewhere in the double buffering code, rather than in the fact that it is displaying things needlessly?
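To put a rough number on the superset point, here is a small sketch comparing the pixels a diagonal line touches with the area of the single bounding rectangle that a damage report might cover. The names `DamageCost` and `diagonal_damage` are made up for illustration, and the line is assumed to be a simple one-pixel-wide line that steps once per pixel along its longer axis.

```c
/* Hypothetical summary of one drawing operation. */
typedef struct {
    long touched;   /* pixels actually modified by the line */
    long reported;  /* pixels inside the bounding rectangle */
} DamageCost;

/* Line from (0, 0) to (w - 1, h - 1) inside a w x h bounding box. */
static DamageCost diagonal_damage(long w, long h)
{
    DamageCost c = {0, 0};

    if (w <= 0 || h <= 0)
        return c;

    /* One pixel per step along the major axis. */
    c.touched = w > h ? w : h;
    /* A single bounding rectangle covers the whole box. */
    c.reported = w * h;
    return c;
}
```

For a 100 x 100 diagonal this gives 100 touched pixels against 10000 reported ones. Describing only the touched pixels exactly would take on the order of one rectangle per row.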
(In reply to comment #13)
> [...] having code that blindly assumes that damaged areas are completely
> redrawn seems to be just asking for more issues down the road.

It doesn't assume that. It assumes that if there's a non-empty damage region pending when the damage layer calls down into the lower layers, that *something* will end up being drawn. As this is not the case here, exaDoMigration is never called, so the pixmap copies are never synchronized for the pending damage region.

BTW, I wouldn't worry too much about exactly what the incorrect contents look like, as the stale bits from the wrong pixmap copy could be from any previous time.
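The failure mode described in this comment can be illustrated with a toy model. To be clear, this is not EXA code and does not use EXA's data structures: `ToyPixmap`, `toy_init`, `migrate_to_gpu`, `accel_draw` and `visible_pixel` are all invented for the sketch, the pixmap is a one-dimensional row of 8 pixels, and validity is tracked per pixel instead of with regions. It only shows how skipping the synchronization step while still applying the pending damage can leave stale bits marked as current.

```c
#include <stdbool.h>

#define NPIX 8

/* Toy pixmap with a CPU copy and a GPU copy plus per-pixel validity. */
typedef struct {
    int  cpu[NPIX], gpu[NPIX];
    bool cpu_valid[NPIX], gpu_valid[NPIX];
} ToyPixmap;

/* Start with current data in the CPU copy and stale data in the GPU copy. */
static void toy_init(ToyPixmap *p, int stale, int current)
{
    for (int i = 0; i < NPIX; i++) {
        p->cpu[i] = current;  p->cpu_valid[i] = true;
        p->gpu[i] = stale;    p->gpu_valid[i] = false;
    }
}

/* Bring the GPU copy up to date for pixels [lo, hi). */
static void migrate_to_gpu(ToyPixmap *p, int lo, int hi)
{
    for (int i = lo; i < hi; i++) {
        if (!p->gpu_valid[i]) {
            p->gpu[i] = p->cpu[i];
            p->gpu_valid[i] = true;
        }
    }
}

/* Accelerated draw with pending damage [lo, hi), 0 <= lo <= hi <= NPIX.
 * When fully_clipped is set nothing is drawn and the migration step is
 * skipped, but the damage bookkeeping below still runs. */
static void accel_draw(ToyPixmap *p, int lo, int hi, int value,
                       bool fully_clipped)
{
    if (!fully_clipped) {
        migrate_to_gpu(p, lo, hi);
        for (int i = lo; i < hi; i++)
            p->gpu[i] = value;
    }

    /* The damaged range is assumed to be newest in the GPU copy. */
    for (int i = lo; i < hi; i++) {
        p->gpu_valid[i] = true;
        p->cpu_valid[i] = false;
    }
}

/* What a reader of the pixmap would see for pixel i. */
static int visible_pixel(const ToyPixmap *p, int i)
{
    return p->gpu_valid[i] ? p->gpu[i] : p->cpu[i];
}
```

In this model, calling `accel_draw()` with `fully_clipped` set on a freshly initialized pixmap makes `visible_pixel()` return the stale GPU value for the damaged range, while a normal draw migrates first and shows the new value. Either computing the real (empty) composite region before damage is reported or always synchronizing before invalidating would avoid the stale read in the toy model; the second option is the one the comment above calls potentially expensive.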
Hmm... I'm clearly not understanding the complete picture here. Is the behaviour something like this:

[A] ==> [B] ==> [FB]

A: Primary buffer
B: Staging buffer
FB: Frame buffer

The application draws to A. This triggers a copy (migration by EXA?) of most of A (i.e. not just the exact pixels that were modified) to B. The compositor then updates the frame buffer based on what was modified in B.

So the bug would be that we state that a certain area was modified in B, but that area was never copied from A, even though it is assumed it was.

Is this roughly how it works, or am I way off? :)
Created attachment 59414: EXA: Factor in composite region early on

Does this patch fix the problem?
Created attachment 60063: EXA/mixed: Always create damage record for pixmaps

Created attachment 60064: EXA: Factor in composite region early on.

The previous patch was flawed, can you try these?
I'm a bit stressed out at work right now, but I'll make sure to give these patches a test eventually. Just bear with me. :)
I can confirm this bug.

Gentoo Base System release 2.2
tigervnc-1.7.0 + fltk-1.3.3 with xft
xorg-server-1.18.4
xf86-video-nouveau-1.0.12 + Option "AccelMethod" "exa" + Option "Composite" "0"

I also tested with the attached patches (slightly modified for xorg-server-1.18.4): the effect demonstrated in "Movie of the bug in action" is gone, but I see a similar effect on another tab of the vncviewer options dialog, "Compression".

With Option "AccelMethod" "none" it works fine.

If I run "recordmydesktop" to record a video, everything also works fine.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/200.
Comment on attachment 60064: EXA: Factor in composite region early on.

Great