Bug 45500 - composite/xrender bug with Xft and FLTK
Summary: composite/xrender bug with Xft and FLTK
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Ext/Composite
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard: 2012BRB_Reviewed
Keywords: patch
Depends on:
Blocks:
 
Reported: 2012-02-01 06:17 UTC by Pierre Ossman
Modified: 2019-03-22 15:17 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
Correct output (28.84 KB, image/png)
2012-02-01 06:18 UTC, Pierre Ossman
no flags
Incorrect output (27.63 KB, image/png)
2012-02-01 06:18 UTC, Pierre Ossman
no flags
Movie of the bug in action (1.01 MB, video/webm)
2012-02-01 06:19 UTC, Pierre Ossman
no flags
EXA: Factor in composite region early on (2.60 KB, patch)
2012-04-03 04:06 UTC, Michel Dänzer
no flags
EXA/mixed: Always create damage record for pixmaps (2.41 KB, patch)
2012-04-16 06:27 UTC, Michel Dänzer
no flags
EXA: Factor in composite region early on. (3.85 KB, patch)
2012-04-16 06:28 UTC, Michel Dänzer
no flags

Description Pierre Ossman 2012-02-01 06:17:46 UTC
I'm having a very weird bug where FLTK dialogs get corrupted by "old" data in the window at certain times. I've attached screenshots and a movie of the effect.

The bug seems to only appear with compositing, and possibly only with 3D compositing (I've only seen it with gnome shell and compiz).

I can reproduce it with the radeon driver and the nouveau driver. I'm unable to reproduce it using software only servers (Xephyr and Xvnc). I've not tested any other graphics cards.

Through a whole bunch of tedious work, I've determined that the bug appears when Xft tries to render a string with an active clip region and the string is completely clipped. In that case, the area covered by the string's bounding rectangle gets corrupted instead.

It does not happen in every such case, though, and I have yet to figure out what triggers it for these specific widgets. One idea is that it has something to do with where the clip region lies relative to the clipped string.

Testing the clipping manually and avoiding the call to XftDrawString32() when it is fully clipped is sufficient to make the bug go away.
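A minimal sketch of that workaround, assuming the string's bounding box and the clip region have already been reduced to plain rectangles. The `rect_t` type and the `text_fully_clipped()` helper are hypothetical, written only for illustration. In real Xft code the box would typically come from XftTextExtents32() (placed at x - extents.x, y - extents.y, with size extents.width by extents.height), and the test against a clip Region could use XRectInRegion() instead of the loop below.

```c
#include <stdbool.h>

/* Hypothetical axis-aligned rectangle (illustration only). */
typedef struct {
    int x, y;          /* top-left corner */
    int width, height; /* size in pixels */
} rect_t;

/* True if both rectangles are non-empty and share at least one pixel. */
static bool rects_overlap(rect_t a, rect_t b)
{
    if (a.width <= 0 || a.height <= 0 || b.width <= 0 || b.height <= 0)
        return false;
    return a.x < b.x + b.width && b.x < a.x + a.width &&
           a.y < b.y + b.height && b.y < a.y + a.height;
}

/* True if the text box misses every rectangle of the clip, i.e. drawing the
 * string could not change any pixel. nclip == 0 means an empty clip region
 * (nothing visible), not "no clipping". */
bool text_fully_clipped(rect_t text_box, const rect_t *clip, int nclip)
{
    if (text_box.width <= 0 || text_box.height <= 0)
        return true;
    for (int i = 0; i < nclip; i++) {
        if (rects_overlap(text_box, clip[i]))
            return false; /* at least partly visible: must draw */
    }
    return true;
}
```

The caller would then skip XftDrawString32() whenever text_fully_clipped() returns true, so the fully clipped request never reaches the server.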

I've not been able to construct a simple test case so far...

Ideas? :/


~
[ossman@ossman]$ rpm -q xorg-x11-server-Xorg xorg-x11-drv-ati mesa-libGL libXft libXrender
xorg-x11-server-Xorg-1.11.3-1.fc16.x86_64
xorg-x11-drv-ati-6.14.3-3.20111125git534fb6e41.fc16.x86_64
mesa-libGL-7.11.2-1.fc16.x86_64
mesa-libGL-7.11.2-1.fc16.i686
libXft-2.2.0-2.fc15.x86_64
libXft-2.2.0-2.fc15.i686
libXrender-0.9.6-2.fc15.x86_64
libXrender-0.9.6-2.fc15.i686
Comment 1 Pierre Ossman 2012-02-01 06:18:27 UTC
Created attachment 56448
Correct output
Comment 2 Pierre Ossman 2012-02-01 06:18:46 UTC
Created attachment 56449
Incorrect output
Comment 3 Pierre Ossman 2012-02-01 06:19:14 UTC
Created attachment 56450
Movie of the bug in action
Comment 4 Pierre Ossman 2012-02-01 06:53:33 UTC
Two more data points:

 - Fedora 14, metacity with compositing: Works fine

 - Ubuntu 11.04, unity 3d: Works fine
Comment 5 Michel Dänzer 2012-02-01 09:32:42 UTC
(In reply to comment #4)
>  - Fedora 14, metacity with compositing: Works fine
> 
>  - Ubuntu 11.04, unity 3d: Works fine

Which upstream versions of xserver do those use?

My first guess would be an EXA bug (does Option "EXANoComposite" work around it?); I'd start by looking at exaGlyphs().
Comment 6 Pierre Ossman 2012-02-02 02:28:37 UTC
(In reply to comment #5)
> (In reply to comment #4)
> >  - Fedora 14, metacity with compositing: Works fine
> > 
> >  - Ubuntu 11.04, unity 3d: Works fine
> 
> Which upstream versions of xserver do those use?
> 

xorg-x11-server-Xorg-1.9.5-2.fc14.x86_64
xorg-server 2:1.10.1-1ubuntu1.3
Comment 7 Michel Dänzer 2012-02-02 03:09:38 UTC
(In reply to comment #6)
> xorg-server 2:1.10.1-1ubuntu1.3

Hmm, I can't see any obviously relevant EXA changes between 1.10 and 1.11... At this point I'm afraid the best bet is to bisect.
Comment 8 Pierre Ossman 2012-02-02 04:14:21 UTC
(In reply to comment #5)
> 
> My first guess would be that this is an EXA bug (does Option "EXANoComposite"
> work around it?), I'd start looking at exaGlyphs().

Yup, EXANoComposite does indeed prevent the bug from happening. I guess I'll try to build my own copy of the X server and see if I can find something.
Comment 9 Pierre Ossman 2012-02-02 04:15:50 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > xorg-server 2:1.10.1-1ubuntu1.3
> 
> Hmm, I can't see any obviously relevant EXA changes between 1.10 and 1.11... At
> this point I'm afraid the best bet is to bisect.

I seem to recall seeing this bug on Fedora 14 with Compiz. So I think the bug is just not showing up on Ubuntu for some reason.
Comment 10 Pierre Ossman 2012-02-04 08:24:42 UTC
Something very subtle is going on here. What I've determined so far:

 - It is indeed a glyph rendering call that is causing the problem. If I force the use of the fallback for Composite when it is called from Glyphs, then the problem goes away.

 - I added a debug print and confirmed that miComputeCompositeRegion() correctly determines that there is nothing to do for the suspicious rendering request.

 - Forcing the fallback for all requests with the same clip region as the suspicious request does NOT make the problem go away.

I will continue to try to pinpoint this, but it is a bit confusing at this point.
Comment 11 Pierre Ossman 2012-02-06 04:25:18 UTC
I've managed to pinpoint it as far as I can go in the Xorg code. The triggering element is that the damage code is a bit blunt and assumes that all Glyphs and Composite operations modify the entire destination region (IOW it doesn't bother looking at the clipping at all).
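To make the difference concrete, here is a toy comparison of blunt and clip-aware damage reporting for a single operation. It is not the server's damage code (which works on full regions, not single boxes); `box_t`, `damage_blunt()` and `damage_clipped()` are hypothetical names, and the box convention (x2/y2 exclusive) only loosely follows the server's BoxRec.

```c
#include <stdbool.h>

/* Hypothetical box with exclusive x2/y2 edges (illustration only). */
typedef struct { int x1, y1, x2, y2; } box_t;

static bool box_empty(box_t b) { return b.x1 >= b.x2 || b.y1 >= b.y2; }

static box_t box_intersect(box_t a, box_t b)
{
    box_t r = {
        a.x1 > b.x1 ? a.x1 : b.x1,
        a.y1 > b.y1 ? a.y1 : b.y1,
        a.x2 < b.x2 ? a.x2 : b.x2,
        a.y2 < b.y2 ? a.y2 : b.y2,
    };
    return r; /* may be empty; check with box_empty() */
}

/* Blunt: report the operation's whole destination extent, ignoring clip. */
box_t damage_blunt(box_t op_extents, box_t clip)
{
    (void)clip;
    return op_extents;
}

/* Clip-aware: report only the part that can actually change. */
box_t damage_clipped(box_t op_extents, box_t clip)
{
    return box_intersect(op_extents, clip);
}
```

For a fully clipped glyph run, the case miComputeCompositeRegion() had already identified as a no-op, the blunt version still reports a non-empty box, while the clip-aware version reports nothing.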

Now this shouldn't really be any problem other than causing some needless churn somewhere else. So the real bug is whatever is handling these damage events. I guess something is doing double buffering, and when it gets the damage event it assumes the region has been filled with fresh data. Since it hasn't, it is presenting the stale back buffer data instead.

So where do we go from here? Is it mutter that's the next suspect? Or some other Xorg component?
Comment 12 Michel Dänzer 2012-02-08 09:06:38 UTC
(In reply to comment #11)
> I guess something is doing double buffering, and when it gets the damage event
> it assumes the region has been filled with fresh data. Since it hasn't, it is
> presenting the stale back buffer data instead.

I suspect that's spot on, with 'it' being EXA's migration code. It assumes that the damaged region has become more up to date in one of the pixmap copies (GPU or CPU accessible) and invalidates it in the other copy. But since exaGlyphs ends up not doing any actual rendering, the copies aren't synchronized before this, and it can end up invalidating current bits and keeping stale ones.

I think it would be best to fix this in the damage layer if at all possible. EXA could work around this problem, but it could potentially involve expensive synchronization of pixmap copies for a no-op.
Comment 13 Pierre Ossman 2012-02-08 13:10:32 UTC
So there are internal listeners to damage events inside the X server?

Still, damage events are defined as being a superset of the actual modified area, and I don't see how it could be any other way. If a diagonal line is drawn, it is very difficult to represent just the modified pixels using rects.
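As a rough worked example of why rectangle-based damage has to be a superset: a 1-pixel-wide Bresenham-style line touches max(|dx|, |dy|) + 1 pixels, while its bounding rectangle covers (|dx| + 1) by (|dy| + 1) pixels. The helper names below are illustrative only.

```c
#include <stdlib.h>

/* Pixels touched by a 1-pixel-wide Bresenham-style line: one pixel per
 * step along the major axis, i.e. max(|dx|, |dy|) + 1. */
long line_pixels(int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0), dy = abs(y1 - y0);
    return (long)(dx > dy ? dx : dy) + 1;
}

/* Area of the line's bounding rectangle: the tightest single-rect damage. */
long line_bbox_area(int x0, int y0, int x1, int y1)
{
    return (long)(abs(x1 - x0) + 1) * (abs(y1 - y0) + 1);
}
```

For a diagonal from (0,0) to (99,99) that is 100 touched pixels versus a 10000-pixel bounding box; splitting the damage into more rectangles narrows the gap but never removes it.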

So even if we can fix this specific instance of the bug, having code that blindly assumes that damaged areas are completely redrawn seems to be just asking for more issues down the road.


It does seem to me that the back buffer handling is broken somewhere though. If the clipping hadn't been active, the glyphs would have been drawn on a stale old image, not on top of what's currently on the screen. So maybe the bug is somewhere in the double buffering code, rather than the fact that it is displaying things needlessly?
Comment 14 Michel Dänzer 2012-02-08 23:55:55 UTC
(In reply to comment #13)
> [...] having code that blindly assumes that damaged areas are completely
> redrawn seems to be just asking for more issues down the road.

It doesn't assume that. It assumes that if there's a non-empty damage region pending when the damage layer calls down into the lower layers, that *something* will end up being drawn.

As this is not the case here, exaDoMigration is never called, so the pixmap copies are never synchronized for the pending damage region.

BTW, I wouldn't worry too much about exactly what the incorrect contents look like, as the stale bits from the wrong pixmap copy could be from any previous time.
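The failure mode described in comments #12 and #14 can be mimicked with a toy model of a pixmap held in two copies. Everything here (`pixmap_t`, the per-pixel validity flags, the function names) is a hypothetical simplification written for illustration, not EXA's actual data structures or exaDoMigration(): a clipped-away composite reports damage as if it had rendered into the GPU copy, and unless the copies were synchronized first, the CPU copy holding the current bits gets invalidated.

```c
#include <stdbool.h>

#define NPIX 4 /* a tiny 4-pixel "pixmap" */

/* Hypothetical pixmap kept in a GPU-accessible and a CPU-accessible copy,
 * with each pixel marked valid (up to date) per copy. */
typedef struct {
    int gpu[NPIX], cpu[NPIX];
    bool gpu_valid[NPIX], cpu_valid[NPIX];
} pixmap_t;

void pixmap_init(pixmap_t *p, int value)
{
    for (int i = 0; i < NPIX; i++) {
        p->gpu[i] = p->cpu[i] = value;
        p->gpu_valid[i] = p->cpu_valid[i] = true;
    }
}

/* Software drawing: update the CPU copy, leaving the GPU copy stale. */
void cpu_draw(pixmap_t *p, int value)
{
    for (int i = 0; i < NPIX; i++) {
        p->cpu[i] = value;
        p->cpu_valid[i] = true;
        p->gpu_valid[i] = false;
    }
}

/* Synchronize: copy up-to-date CPU pixels into the GPU copy. */
static void migrate_to_gpu(pixmap_t *p, int first, int count)
{
    for (int i = first; i < first + count; i++) {
        if (!p->gpu_valid[i] && p->cpu_valid[i]) {
            p->gpu[i] = p->cpu[i];
            p->gpu_valid[i] = true;
        }
    }
}

/* Damage handling: the damaged range is assumed to have been rendered on
 * the GPU, so it is marked valid there and invalid in the CPU copy. */
static void damage_gpu_render(pixmap_t *p, int first, int count)
{
    for (int i = first; i < first + count; i++) {
        p->gpu_valid[i] = true;
        p->cpu_valid[i] = false;
    }
}

/* A GPU composite that is clipped away entirely and writes no pixels.
 * With migrate == false the copies are never synchronized first. */
void noop_composite(pixmap_t *p, int first, int count, bool migrate)
{
    if (migrate)
        migrate_to_gpu(p, first, count);
    damage_gpu_render(p, first, count);
}

/* Read a pixel from the CPU copy if valid, otherwise from the GPU copy. */
int read_pixel(const pixmap_t *p, int i)
{
    return p->cpu_valid[i] ? p->cpu[i] : p->gpu[i];
}
```

Starting from a pixmap whose current contents live only in the CPU copy, calling noop_composite() with migrate set to false leaves read_pixel() returning the stale GPU value, while migrate set to true returns the current one. The synchronizing variant corresponds to the "expensive synchronization of pixmap copies for a no-op" that comment #12 would rather avoid, which is why it suggests fixing this in the damage layer instead.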
Comment 15 Pierre Ossman 2012-02-09 01:14:45 UTC
Hmm... I'm clearly not understanding the complete picture here. Is the behaviour something like this:

    [A] ==> [B] ==> [FB]

    A: Primary buffer
    B: Staging buffer
    FB: Frame buffer


The application draws to A. This triggers a copy (migration by EXA?) of most of A (i.e. not just the exact pixels that were modified) to B. The compositor then updates the frame buffer based on what was modified in B.

So the bug would be that we state that a certain area was modified in B, but that area was never copied from A, even though it is assumed it was.


Is this roughly how it works, or am I way off? :)
Comment 16 Michel Dänzer 2012-04-03 04:06:17 UTC
Created attachment 59414
EXA: Factor in composite region early on

Does this patch fix the problem?
Comment 17 Michel Dänzer 2012-04-16 06:27:16 UTC
Created attachment 60063
EXA/mixed: Always create damage record for pixmaps
Comment 18 Michel Dänzer 2012-04-16 06:28:23 UTC
Created attachment 60064
EXA: Factor in composite region early on.

The previous patch was flawed, can you try these?
Comment 19 Pierre Ossman 2012-04-16 08:30:59 UTC
I'm a bit stressed out at work right now, but I'll make sure to give these patches a test eventually. Just bear with me. :)
Comment 20 Dmitry Bakshaev 2016-11-07 19:44:55 UTC
I confirm this bug.

Gentoo Base System release 2.2
tigervnc-1.7.0 + fltk-1.3.3 with xft
xorg-server-1.18.4
xf86-video-nouveau-1.0.12 + Option "AccelMethod" "exa" + Option "Composite" "0"

Also tested with the attached patches (slightly modified for xorg-server-1.18.4):
the effect demonstrated in "Movie of the bug in action" is gone,
but I see a similar effect on another vncviewer options tab, "Compression".

Option "AccelMethod" "none": works fine.

When running "recordmydesktop" to record video, everything also works fine.
Comment 21 GitLab Migration User 2018-12-13 18:31:48 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/200.
Comment 22 Jevon 2019-03-22 15:17:46 UTC
Comment on attachment 60064
EXA: Factor in composite region early on.

Great

