Bug 4320

Summary: Over from xrgb8888 pictures not fast-pathed in XAA
Product: xorg
Reporter: Frederic Crozat <fred>
Component: Driver/Radeon
Assignee: Matthias Hopf <mat>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: high
CC: ajax, basic, boiko, bugzilla, carlosgc, cworth, desintegr, federico, ghepeu, jasmin-bugfd, jwalden+bfo, lool, marejde, mat, pat, seb128
Version: git
Keywords: patch
Hardware: x86 (IA32)
OS: Linux (All)
Bug Blocks: 5387
Attachments (description, flags):
  big-fill.c (none)
  Updated big-fill.c (none)
  Proposed patch for improving XAAComposite Fastpath (none)

Description Frederic Crozat 2005-08-31 07:16:59 UTC
Follow up of GNOME bug http://bugzilla.gnome.org/show_bug.cgi?id=314616 and
Mandriva bug http://qa.mandriva.com/show_bug.cgi?id=17723

Radeon acceleration is not used when XRenderComposite renders the pixmap
background in nautilus (with Gtk 2.8 + Cairo, when the cairo workaround for
old broken Render implementations is not enabled).

Acceleration is back to normal when starting glxinfo (seems to reset
acceleration in the driver).
Disabling acceleration in the Radeon driver also fixes the problem.

card used : Radeon 9200SE
Comment 1 Federico Mena-Quintero 2005-09-15 11:58:54 UTC
See also bug #4456
Comment 2 T. Hood 2005-09-26 08:31:58 UTC
This bug can be worked around by running glxinfo (seems to reset Radeon
acceleration code) or adding Option "NoAccel" "true" in xorg.conf.
Comment 3 Federico Mena-Quintero 2006-01-18 05:53:27 UTC
Created attachment 4388
big-fill.c

This test uses the same sequence of xlib/cairo calls that Nautilus uses when
painting its background.  It shows how cairo_fill() of a non-alpha rectangle
takes over 1 second.

Per Vlad's suggestion, I turned on 

  Option "AccelMethod" "EXA"

for my Radeon card.  This makes things fast.

But still, the non-accel compositor should be better, since there is no
compositing to be made (pixmap to pixmap copy).  KeithP and CWorth say there
may be a code path where the RENDER implementation doesn't detect that it can
simply use XCopyArea().

This comment shows where this happens in the server:
http://bugzilla.gnome.org/show_bug.cgi?id=314616#c9
Comment 4 Pat Suwalski 2006-02-01 04:21:53 UTC
I've tried Federico's code. In theory his comment is correct, but in practice,
it does not work.

Without EXA: ~0.72s for fill.
With EXA: ~0.03s for fill.

Nevertheless, dragging windows around in GNOME becomes painfully slow with EXA
on. Composite and Render don't seem to affect these things. However xcompmgr
further slows that down to ~1 frame per second when dragging a relatively small
window.
Comment 5 Frederic Crozat 2006-02-01 04:22:42 UTC
I've enabled EXA on my system with X.org 6.9 final and it is still slow :(
Comment 6 Pat Suwalski 2006-02-01 04:33:33 UTC
I think this email I responded to on the GNOME desktop-devel list narrows things
down. It is perhaps some interaction between Nautilus and the X.org
acceleration/EXA code:

James Livingston wrote:
> One thing I noticed is that the time is greatly affected by whether
> Nautilus is drawing the desktop or not. I normally don't, but when
> turned on the time was up to around a second. Drawing the icons and text
> might take extra time, but is there something Nautilus is doing that
> causes it to go that much slower?

BINGO. This narrows in on the culprit. Disabling "show_desktop" makes the whole
desktop 3-4 times more snappy, especially with EXA. It appears that (at least
with radeon), nautilus' desktop drawing breaks very drastically.

But even with top-of-the-line nVidia (with closed driver), desktop scaling speed
is very much improved without nautilus.

--Pat
Comment 7 Matthias Hopf 2006-02-03 01:09:40 UTC
This is also discussed in SUSE's bugzilla:
https://bugzilla.novell.com/show_bug.cgi?id=117163

What I found out so far:
The test case in attachment #4388 only reveals this problem on configurations
with a framebuffer width of 1400 or higher. In both cases (fast+slow)
fbCopyAreammx (which doesn't use MMX at all BTW) is called.

Running glxinfo could also trigger this by allocating more graphics memory and
leaving no space for the pixmaps any longer. Though this is only a very rough guess.

For 1280x1024 framebuffers the off-screen pixmap seems to be created in regular
memory (src+dest bytes):

fbCopyAreammx: src 0xaf4d3008 0/0 dest 0xaef5f008 0/0 size 1400/1021
fbCopyAreammx: src bytes 0xaf4d3060 stride 15e0 dest bytes 0xaef5f060 stride
15e0 byte_width 15e0
fbCopyAreammx: 0.037907s
aef5f000-afa6f000 rw-p aef5f000 00:00 0 
afa6f000-b7a6f000 rw-s d8000000 03:02 26724      /dev/mem

while for 1400x1050 the pixmap is created in card memory:
fbCopyAreammx: src 0x820fcc0 0/1055 dest 0x820fd20 0/2105 size 1400/1021
fbCopyAreammx: src bytes 0xb0048a00 stride 1600 dest bytes 0xb05ec600 stride
1600 byte_width 15e0
fbCopyAreammx: 0.981807s
081c8000-0838a000 rw-p 081c8000 00:00 0          [heap]
afa9e000-b7a9e000 rw-s d8000000 03:02 26724      /dev/mem

This happens because the requested pixmaps are of size 1400x1021 and 1400x1050.
AFAIR the current memory manager is stride based, that is, it can only allocate
pixmaps in graphics memory up to the framebuffer width.
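
The numbers in the two traces above can be cross-checked with a little arithmetic. The sketch below is only an illustration (the helper names are made up, this is not server code): it tests whether a pixel pointer lies inside a mapping from the maps listing, and computes the byte offset of a scanline from its stride. The 4 bytes per pixel are inferred from byte_width 0x15e0 for a 1400-pixel-wide copy.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: does addr fall inside the mapping [start, end)? */
static bool in_mapping(uint64_t addr, uint64_t start, uint64_t end)
{
    return addr >= start && addr < end;
}

/* Byte offset of scanline y in a surface with the given stride
 * (stride = bytes per scanline). */
static uint64_t line_offset(uint64_t y, uint64_t stride)
{
    return y * stride;
}

/* Pixels per scanline for a stride, assuming 4 bytes per pixel. */
static uint64_t stride_to_pixels(uint64_t stride)
{
    return stride / 4;
}
```

With these, stride 0x15e0 is exactly 1400 pixels while 0x1600 is a 1408-pixel pitch, and 0xafa9e000 + line_offset(1055, 0x1600) gives 0xb0048a00, the "src bytes" pointer of the slow trace, which lies inside the /dev/mem mapping. The pointers of the fast trace fall inside the anonymous rw-p mapping instead.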

Card memory is incredibly slow to read, that's what's hitting us here. So what
actually should happen is that the XAA CopyArea function should have been
called, which does everything in graphics memory with the GPU.

We still have one problem: what shall we do if we have to copy a pixmap from GPU
memory to host memory? This will always be slow.
Comment 8 Federico Mena-Quintero 2006-02-03 03:38:18 UTC
Created attachment 4545
Updated big-fill.c

OK, my example program has hardcoded values for my particular screen resolution
(1400x1050) :)

Here is an updated version which picks up your screen resolution and uses
values based on that.  Hopefully that will be more useful in diagnosing the
problem.
Comment 9 Federico Mena-Quintero 2006-02-03 03:44:47 UTC
(In reply to comment #7)
> Card memory is incredibly slow to read, that's what's hitting us here. So what
> actually should happen is that the XAA CopyArea function should have been
> called, which does everything in graphics memory with the GPU.

This makes a lot of sense.  As Keith and Carl mentioned once, there is code in
the server-side implementation of RENDER that needs to detect that it is copying
a non-alpha pixmap to another pixmap, and it can just use CopyArea instead of
copying the pixels by hand.

We *always* used XCopyArea() from the client in the past, since we didn't use
Cairo.  Even then, however, we had no guarantee that both pixmaps would be in
graphics memory.  Maybe we got lucky in that both pixmaps always lived in the
graphics card.  Or does CopyArea have some magic that makes copies fast from
graphics memory to plain memory?
Comment 10 Matthias Hopf 2006-02-03 04:36:33 UTC
(In reply to comment #8)
> Updated big-fill.c
> 
> OK, my example program has hardcoded values for my particular screen resolution
> (1400x1050) :)

I noticed, and that was actually good, because it moved me into the right
direction (by not exposing the bug to me first ;)

(In reply to comment #9)
> This makes a lot of sense.  As Keith and Carl mentioned once, there is code in
> the server-side implementation of RENDER that needs to detect that it is copying
> a non-alpha pixmap to another pixmap, and it can just use CopyArea instead of
> copying the pixels by hand.

Actually, that is (partially) already done in XComposite, Cairo only hit a
corner case that wasn't accelerated.

I would call this a bug in Cairo as well, 1st) because it uses PictOpOver with a
source without alpha and without a mask (should do PictOpSrc in this case), 2nd)
because it enables repeat even for 1:1 copies.
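
To see why PictOpOver degenerates to PictOpSrc for a source without alpha, here is a small sketch of the per-channel arithmetic. It assumes premultiplied 8-bit channels and a simple rounded division; it is an illustration with made-up helper names, not the fb code. Over computes src + dst * (255 - src_alpha) / 255, and with src_alpha fixed at 255 the destination term vanishes.

```c
#include <stdint.h>

/* One 8-bit channel of the Porter-Duff "over" operator on premultiplied
 * values: result = src + dst * (1 - src_alpha), in rounded integer math. */
static uint8_t over_channel(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    unsigned int t = (unsigned int)dst * (255u - src_alpha);
    return (uint8_t)(src + (t + 127u) / 255u);
}

/* The "src" operator: the destination value is simply replaced. */
static uint8_t src_channel(uint8_t src, uint8_t dst)
{
    (void)dst;
    return src;
}
```

For any opaque source, over_channel(s, 255, d) returns the same value as src_channel(s, d), so the blend can be replaced by a plain copy.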

Anyone willing to report this to Cairo guys? I know David is working on glitz,
but this should be detected and worked around earlier in the library.

Anyway, patch for Xorg pending which makes this path fast and the code ugly
(there is a *long* if() statement).

> Cairo.  Even then, however, we had no guarantee that both pixmaps would be in
> graphics memory.  Maybe we got lucky in that both pixmaps always lived in the
> graphics card.  Or does CopyArea have some magic that makes copies fast from
> graphics memory to plain memory?

As long as the pixmap was not wider than the framebuffer, you would typically
get them in gfx memory, using the fully accelerated CopyArea.
Typically.
Comment 11 Matthias Hopf 2006-02-03 05:07:59 UTC
Created attachment 4547
Proposed patch for improving XAAComposite Fastpath

This patch improves the Fastpath from XAAComposite so that this corner case
(yes, it is a corner case) is accelerated as well.

This patch should be discussed here, as I'm not 100% sure about the meaning of
some elements (pDrawable->x & y). As the slow path is definitely broken WRT
some others (see bug #5796), this shouldn't create any regressions, though.


Results here (Radeon 7500):
Slow path (Framebuffer width < Pixmap width): 37ms
Fast path (Framebuffer width >= Pixmap width):
   without patch:   900-1200ms
   with patch:      1-1.2ms

Acceleration factor: approx. 1000   =-)
Comment 12 Matthias Hopf 2006-02-03 05:08:59 UTC
If no one objects I will submit this to CVS during the next week.
Comment 13 Carl Worth 2006-02-03 05:15:24 UTC
(In reply to comment #10)
> I would call this a bug in Cairo as well, 1st) because it uses PictOpOver with a
> source without alpha and without a mask (should do PictOpSrc in this case), 2nd)
> because it enables repeat even for 1:1 copies.

The first one, yes, cairo could drop to PictOpSrc if the source surface has no
alpha even if the user selects CAIRO_OPERATOR_OVER.

As for repeat, I don't think cairo sets that unless the application asks for it.
I suppose we could do some fairly complicated check to determine that it won't
actually _need_ repeating, but I think this would actually be more robust either
above in the application (eg. nautilus), or below in the X server. In either of
those cases there are fewer coordinate systems involved so it's going to be
easier to get the will-need-repeat analysis correct.
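
When only integer offsets are involved (no transformation), the will-need-repeat analysis reduces to a bounds test. A minimal sketch, mirroring the check visible in the XAAComposite context lines of the diff in comment 27; the function and parameter names are illustrative:

```c
#include <stdbool.h>

/* Returns true if the requested source rectangle lies entirely inside the
 * source drawable, so a repeat flag on the source has no visible effect.
 * Only meaningful for untransformed sources with integer offsets. */
static bool repeat_is_noop(int x_src, int y_src, int width, int height,
                           int src_width, int src_height)
{
    return x_src >= 0 && y_src >= 0 &&
           x_src + width  <= src_width &&
           y_src + height <= src_height;
}
```

A 1:1 copy out of a source at least as large as the copied area passes the test; filling a large area from a 300x300 source does not, and really needs repeating.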
 
> Anyone willing to report this to Cairo guys?

I'm following this bug at least. If you'd like to report anything more focused
and cairo-specific in cairo's bugzilla, then I'd be happy to track them as well.

-Carl
Comment 14 Matthias Hopf 2006-02-03 05:24:34 UTC
(In reply to comment #13)
> As for repeat, I don't think cairo sets that unless the application asks for it.

I didn't do an in-depth analysis of big-fill, but in the Xserver I get a
Composite request with src_repeat set. Maybe Federico can comment on what his
source is doing here.

> I suppose we could do some fairly complicated check to determine that it won't
> actually _need_ repeating, but I think this would actually be more robust either
> above in the application (eg. nautilus), or below in the X server. In either of
> those cases there are fewer coordinate systems involved so it's going to be
> easier to get the will-need-repeat analysis correct.

Right, but the current test in the Xserver will also only work as long as no
transformations are involved. I agree it is the application which should
*not* use repeat if possible at all.

> > Anyone willing to report this to Cairo guys?
> 
> I'm following this bug at least. If you'd like to report anything more focused
> and cairo-specific in cairo's bugzilla, then I'd be happy to track them as well.

I won't have time the next few weeks, so I can only leave this to you folks. It
all depends on how bugzilla-focused your development system is. It's not a
time-critical thing, as we will have a workaround in the Xserver for the most
common case now.
Comment 15 Federico Mena-Quintero 2006-02-03 06:05:44 UTC
That patch is very interesting.  Thanks a lot for cooking it, Matthias :)

Will it also handle the case where we *do* need to repeat the source? You can
test this by making the back_pixmap smaller than temp_pixmap in big-fill.c -
just make it 300x300 or so.

Carl: you asked me to test OPERATOR_SOURCE in big-fill.c, and it made no difference.
Comment 16 Carl Worth 2006-02-03 06:25:04 UTC
(In reply to comment #15)
> That patch is very interesting.  Thanks a lot for cooking it, Matthias :)
> 
> Will it also handle the case where we *do* need to repeat the source? You can
> test this by making the back_pixmap smaller than temp_pixmap in big-fill.c -
> just make it 300x300 or so.

There should be a similar fix possible for the X server to handle the repeat
case (which is to make the server act the same as if doing an XFillRectangle
under the influence of XSetFillStyle(..., FillTiled)).

> Carl: you asked me to test OPERATOR_SOURCE in big-fill.c, and it made no
> difference.

OK.

We already have cairo-side optimizations for both non-repeat and repeating
cases. The test for the non-repeating case looks like this:

    if (!have_mask &&
        is_integer_translation &&
        src_attr->extend == CAIRO_EXTEND_NONE &&
        !needs_alpha_composite &&
        _surfaces_compatible(src, dst))
    {
        return DO_XCOPYAREA;
    }

where needs_alpha_composite is set by:

    if (op == CAIRO_OPERATOR_SOURCE ||
        (!surface_has_alpha &&
         (op == CAIRO_OPERATOR_OVER ||
          op == CAIRO_OPERATOR_ATOP ||
          op == CAIRO_OPERATOR_IN)))
        return FALSE;

and hopefully the surface_has_alpha flag is being set properly, (your change to
use CAIRO_OPERATOR_SOURCE suggests that surface_has_alpha is not the problem).

When the tests above succeed, cairo uses XCopyArea instead of XRenderComposite.

For the repeat case, the test in cairo looks like:
    if (is_integer_translation &&
        src_attr->extend == CAIRO_EXTEND_REPEAT &&
        (src->width != 1 || src->height != 1))
    {
        if (!have_mask &&
            !needs_alpha_composite &&
            _surfaces_compatible (dst, src))
        {
            return DO_XTILE;
        }

        return DO_UNSUPPORTED;
    }

And if this succeeds then instead of XRenderComposite, cairo calls into:

        XSetTSOrigin (dst->dpy, dst->gc,
                      - (itx + src_attr.x_offset), - (ity + src_attr.y_offset));
        XSetTile (dst->dpy, dst->gc, src->drawable);
        XSetFillStyle (dst->dpy, dst->gc, FillTiled);

        XFillRectangle (dst->dpy, dst->drawable, dst->gc,
                        dst_x, dst_y, width, height);

So, if you're getting XRenderComposite called in either of these cases, there is
a bug in some of the logic that feeds the tests I mention above.
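
Putting the two tests together, the decision can be sketched as one self-contained function. This is a paraphrase for illustration only: the struct and enums below are hypothetical stand-ins for cairo's internal state, not its actual types, and the final fallback to Render is an assumption based on the description above.

```c
#include <stdbool.h>

typedef enum { OP_SOURCE, OP_OVER, OP_ATOP, OP_IN, OP_OTHER } op_t;
typedef enum { EXTEND_NONE, EXTEND_REPEAT } extend_t;
typedef enum { DO_RENDER, DO_XCOPYAREA, DO_XTILE, DO_UNSUPPORTED } action_t;

/* Hypothetical bundle of the inputs the two tests look at. */
typedef struct {
    op_t op;
    extend_t extend;
    bool have_mask;
    bool is_integer_translation;
    bool surface_has_alpha;
    bool surfaces_compatible;
    int src_width, src_height;
} request_t;

/* False when the operator cannot change the result versus a plain copy. */
static bool needs_alpha_composite(op_t op, bool surface_has_alpha)
{
    if (op == OP_SOURCE ||
        (!surface_has_alpha &&
         (op == OP_OVER || op == OP_ATOP || op == OP_IN)))
        return false;
    return true;
}

static action_t categorize(const request_t *r)
{
    bool alpha = needs_alpha_composite(r->op, r->surface_has_alpha);

    /* Non-repeating case: a plain copy is enough. */
    if (!r->have_mask && r->is_integer_translation &&
        r->extend == EXTEND_NONE && !alpha && r->surfaces_compatible)
        return DO_XCOPYAREA;

    /* Repeating case: tiled fill, unless the source is a single pixel. */
    if (r->is_integer_translation && r->extend == EXTEND_REPEAT &&
        (r->src_width != 1 || r->src_height != 1)) {
        if (!r->have_mask && !alpha && r->surfaces_compatible)
            return DO_XTILE;
        return DO_UNSUPPORTED;
    }

    /* Assumed fallback: everything else goes through XRenderComposite. */
    return DO_RENDER;
}
```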

Beyond that, even if cairo didn't do any of these optimizations, it would be
good if the server optimized these cases itself. Matthias has proposed a patch
for the XCopyArea case, and something more should be needed for the FillTiled case.

Comment 17 Federico Mena-Quintero 2006-02-03 06:38:42 UTC
If I turn off CAIRO_EXTEND_REPEAT in big-fill.c, it becomes fast.

GTK+ needs REPEAT turned on in the problematic code path in Nautilus for the
following reason.  When it gets an exposure event, GTK+ creates a temporary
pixmap for double-buffering.  The desktop window (almost the size of the root
window) has a background pixmap with a photo of your dog.  GTK+ clears this
temporary double-buffer pixmap by filling it with the background pixmap, using a
Cairo pattern, cairo_rectangle(), cairo_fill().  You can see where this is going.

I'm putting a workaround in GTK+ for this (at least in our Novell package); I'll
attach it in a second.
Comment 18 Carl Worth 2006-02-03 11:37:04 UTC
(In reply to comment #16)
>
> For the repeat case, the test in cairo looks like:
>     if (is_integer_translation &&
>         src_attr->extend == CAIRO_EXTEND_REPEAT &&
>         (src->width != 1 || src->height != 1))
>     {
>         if (!have_mask &&
>             !needs_alpha_composite &&
>             _surfaces_compatible (dst, src))
>         {
>             return DO_XTILE;
>         }
> 
>         return DO_UNSUPPORTED;
>     }

Actually, there's a subtle issue here. If the server is characterized as not
having "buggy_repeat" (that is Xorg > 6.8.2 or XFree86 4.5.0) then after the
XCopyArea optimization check cairo currently decides to go immediately to
XRenderComposite before even checking to see if FillTile with XFillRectangle
might do the trick.

We can certainly change that if we have evidence that it will help.

Oddly enough, I have access to a machine now (with Xorg 7.0.0) for which making
cairo use either of XCopyArea, XFillRectangle is never faster than using
XRenderComposite---always resulting in more than 2 seconds of time for the
operation. This is with the big-fill test case, with CAIRO_EXTEND_REPEAT or
CAIRO_EXTEND_NONE.

I'm not sure why the X server would always be so slow, but I also don't know
what more cairo could do in a situation like this.

-Carl


I'm surprised to see all of those paths

> And if this succeeds then instead of XRenderComposite, cairo calls into:
> 
>         XSetTSOrigin (dst->dpy, dst->gc,
>                       - (itx + src_attr.x_offset), - (ity + src_attr.y_offset));
>         XSetTile (dst->dpy, dst->gc, src->drawable);
>         XSetFillStyle (dst->dpy, dst->gc, FillTiled);
> 
>         XFillRectangle (dst->dpy, dst->drawable, dst->gc,
>                         dst_x, dst_y, width, height);
> 
> So, if you're getting XRenderComposite called in either of these cases, there is
> a bug in some of the logic that feeds the tests I mention above.
> 
> Beyond that, even if cairo didn't do any of these optimizations, it would be
> good if the server optimized these cases itself. Matthias has proposed a patch
> for the XCopyArea case, and something more should be needed for the FillTiled
> case.
> 
> 

Comment 19 Matthias Hopf 2006-02-03 22:25:44 UTC
(In reply to comment #16)
> > Will it also handle the case where we *do* need to repeat the source? You can
> > test this by making the back_pixmap smaller than temp_pixmap in big-fill.c -
> > just make it 300x300 or so.
> 
> There should be a similar fix possible for the X server to handle the repeat
> case (which is to make the server act the same as if doing an XFillRectangle
> under the influence of XSetFillStyle(..., FillTiled)).

I can look into this. However, AFAIR this is often not accelerated. So perhaps
we should check for the size of the source tile and according to some heuristics
just call multiple XCopyArea instead (which are likely to be accelerated).
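
A rough sketch of that idea: cover the destination with repeated copies of the source tile, each of which could be issued as one CopyArea. The callback type, the names, and the zero tile origin are assumptions made for this illustration; nothing here is existing server code.

```c
#include <stddef.h>

/* Called once per blit: copy w x h pixels from the tile's top-left corner
 * to (dst_x, dst_y) in the destination. */
typedef void (*copy_fn)(int dst_x, int dst_y, int w, int h, void *user);

/* Tile a dst_w x dst_h area with a tile_w x tile_h source, clipping the
 * last row and column. Returns the number of copies issued, or -1 if the
 * tile size is not positive. */
static int tile_with_copies(int dst_w, int dst_h, int tile_w, int tile_h,
                            copy_fn copy, void *user)
{
    int count = 0;

    if (tile_w <= 0 || tile_h <= 0)
        return -1;

    for (int y = 0; y < dst_h; y += tile_h) {
        int h = (dst_h - y < tile_h) ? dst_h - y : tile_h;
        for (int x = 0; x < dst_w; x += tile_w) {
            int w = (dst_w - x < tile_w) ? dst_w - x : tile_w;
            if (copy)
                copy(x, y, w, h, user);
            count++;
        }
    }
    return count;
}

/* Example callback: accumulate the covered area into a long. */
static void add_area(int dst_x, int dst_y, int w, int h, void *user)
{
    (void)dst_x;
    (void)dst_y;
    *(long *)user += (long)w * h;
}
```

For a 300x300 tile on a 1400x1050 destination this issues 20 copies. A real heuristic would compare that count (or the tile size) against some threshold before preferring this path over a tiled fill.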

> > Carl: you asked me to test OPERATOR_SOURCE in big-fill.c, and it made no
> > difference.

Sure, as repeat is on and that case was not handled in the PictOpSrc case either.

> We already have cairo-side optimizations for both non-repeat and repeating
> cases. The test for the non-repeating case looks like this:

AFAICS the optimization in X has been exactly the same, so it was actually never
to be triggered :-P

> where needs_alpha_composite is set by:
> 
>     if (op == CAIRO_OPERATOR_SOURCE ||
>         (!surface_has_alpha &&
>          (op == CAIRO_OPERATOR_OVER ||
>           op == CAIRO_OPERATOR_ATOP ||
>           op == CAIRO_OPERATOR_IN)))
>         return FALSE;

You could actually just set op to CAIRO_OPERATOR_SOURCE if this if-statement is
true. That would help for all cases.

> For the repeat case, the test in cairo looks like:
[...]
> And if this succeeds then instead of XRenderComposite, cairo calls into:
> 
>         XSetTSOrigin (dst->dpy, dst->gc,
>                       - (itx + src_attr.x_offset), - (ity + src_attr.y_offset));
>         XSetTile (dst->dpy, dst->gc, src->drawable);
>         XSetFillStyle (dst->dpy, dst->gc, FillTiled);
> 
>         XFillRectangle (dst->dpy, dst->drawable, dst->gc,
>                         dst_x, dst_y, width, height);

This can easily be not accelerated in the driver. Checking for potential
acceleration is not that trivial (at least I don't know how to do it right now).
I'll discuss this at Xorg Developer Conference.

> Beyond that, even if cairo didn't do any of these optimizations, it would be
> good if the server optimized these cases itself. Matthias has proposed a patch
> for the XCopyArea case, and something more should be needed for the FillTiled
> case.

Personally I feel this should always be something for the Xserver, because 1) it
would help non-cairo applications as well, and 2) the Xserver can more easily
decide whether some acceleration is available and either use it or work around it.

(In reply to comment #18)
> Actually, there's a subtle issue here. If the server is characterized as not
> having "buggy_repeat" (that is Xorg > 6.8.2 or XFree86 4.5.0) then after the
> XCopyArea optimization check cairo currently decides to go immediately to
> XRenderComposite before even checking to see if FillTile with XFillRectangle
> might do the trick.
> 
> We can certainly change that if we have evidence that it will help.

I'm not entirely sure. Though it certainly wouldn't do any harm in the current
situation - except for the EXA driver maybe.

> Oddly enough, I have access to a machine now (with Xorg 7.0.0) for which making
> cairo use either of XCopyArea, XFillRectangle is never faster than using
> XRenderComposite---always resulting in more than 2 seconds of time for the
> operation. This is with the big-fill test case, with CAIRO_EXTEND_REPEAT or
> CAIRO_EXTEND_NONE.

Can you check whether that machine uses EXA? AFAIK EXA currently *only*
accelerates Render.

> I'm not sure why the X server would always be so slow, but I also don't know
> what more cairo could do in a situation like this.

I guess the best thing would be to have the decision in the Xserver and cairo
should always do Composite. It should also do some trivial optimizations like
OPERATOR_SOURCE and removing REPEAT if not necessary, but the Xserver should
do these itself as well.

However, we should first get these optimizations in the Xserver, and remove
these acceleration hacks in cairo after the optimizations have been
established. Maybe even with something like another version check.
Comment 20 Carl Worth 2006-02-04 04:20:13 UTC
(In reply to comment #19)
>
> I guess the best thing would be to have the decision in the Xserver and cairo
> should always do Composite.

Historically, that's what cairo has done. The difficulty is when Composite is
actually slower than alternate, existing code paths in deployed servers. That's
the justification for things like cairo's XCopyArea optimization.

Putting stuff like this into cairo is "dangerous" though as it's not future
proof, and may end up doing something slower in the future as new Render
acceleration is added while the support for old core requests stagnates.

So, as you said, the trick is in how one can characterize the performance of Render.

> However, we should first get these optimizations in the Xserver, and remove
> these acceleration hacks in cairo after the optimizations have been
> established. Maybe even with something like another version check.

Makes sense to me. I think it would be legitimate for Render versions to
advertise things like "Any version of Render >= X.Y.Z optimizes any OVER
compositing with an opaque source and an identity transformation as a simple
copy" or whatever. That would at least provide some guidance for making
decisions in things like cairo.

Comment 21 Ray Strode [halfline] 2006-02-04 04:30:43 UTC
Hi,

Carl was playing around remotely on my laptop.  The X server he was playing on
was using XAA in a dual head (merged fb) setup.  

At the time there was another X server on a different VT, that was also using
XAA but had the NoOffscreenPixmaps option enabled.  The X server Carl was
using had NoOffscreenPixmaps disabled (the default).
Comment 22 Adam Jackson 2006-02-04 06:08:41 UTC
(In reply to comment #12)
> If no one objects I will submit this to CVS during the next week.

trivially correct; please commit.
Comment 23 Adam Jackson 2006-02-04 06:10:16 UTC
(In reply to comment #19)
> Can you check that machine whether it uses EXA? AFAIK EXA currently *only*
> accelerates Render.

No, it accelerates solid fills and XCopyArea-style blits as well.

- ajax
Comment 24 Adam Jackson 2006-02-05 05:34:12 UTC
(In reply to comment #20)
> Makes sense to me. I think it would be legitimate for Render versions to
> advertise things like "Any version of Render >= X.Y.Z optimizes any OVER
> compositing with an opaque source and an identity transformation as a simple
> copy" or whatever. That would at least provide some guidance for making
> decisions in things like cairo.

I go back and forth on this issue and I usually land on the side of not
advertising these sorts of details.  There are a few reasons:

- We're not the only Render implementation in the world (XiG for example)
- Acceleration architecture has significant impact on what paths are considered
optimized
- OpenGL doesn't
- It's completely inappropriate to overload the API or protocol version numbers
to indicate performance hints

I might accept advertising something like a build number, that when combined
with the server's existing version information would give a fairly good idea of
what the performance characteristics are.  Although, to get the complete profile
you'd have to have a fairly long tuple: server build, Render build, acceleration
architecture name and build, driver name and build, and hardware subclass. 
There's probably some good justification for exporting this sort of information
from the server but it seems far beyond our scope here.

For this bug in particular I'm inclined to say "don't use a broken X" and rely
on the distro path to get patches backported.  This seems a clear candidate for
the next stable server release, for example.

And for what it's worth I'm working on translating a slightly more general form
of this optimization to EXA, which already has a Composite op reduction stage of
a sort.
Comment 25 Eric Anholt 2006-02-06 04:30:17 UTC
It looks like there's a minor issue in the diff.  Remember that an x8r8g8b8
source must be treated as if alpha was 1.0, so drawing that to an a8r8g8b8 dst
using a straight copy for Over or Src would be wrong, as the dest wouldn't end
up with a correct alpha channel.  Other than that, it looks good to me.
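
The point about the unspecified x bits can be shown on a single 32-bit pixel. In this sketch (helper names are illustrative), copying an x8r8g8b8 pixel into an a8r8g8b8 destination has to force the top byte to 0xff; a straight copy would carry over whatever happens to be in the unused bits.

```c
#include <stdint.h>

/* Convert one x8r8g8b8 pixel to a8r8g8b8: the source is opaque by
 * definition, so alpha is set to 0xff regardless of the x bits. */
static uint32_t xrgb_to_argb(uint32_t xrgb)
{
    return xrgb | 0xff000000u;
}

/* Extract the alpha byte of an a8r8g8b8 pixel. */
static uint8_t argb_alpha(uint32_t argb)
{
    return (uint8_t)(argb >> 24);
}
```

A raw copy of 0x00123456 would leave the destination with alpha 0x00 (fully transparent), whereas xrgb_to_argb() yields 0xff123456.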
Comment 26 Adam Jackson 2006-02-15 10:05:36 UTC
*** Bug 5289 has been marked as a duplicate of this bug. ***
Comment 27 Matthias Hopf 2006-05-11 20:23:04 UTC
Finally updated the patch and committed it.

I also think I can commit the following fastpath corner case improvement (as x
is not specified in xrgb), even though it *will* behave differently compared to
fb (x is initialized with 0xff in fbStore_x8r8g8b8):

RCS file: /cvs/xorg/xserver/xorg/hw/xfree86/xaa/xaaPict.c,v
retrieving revision 1.12
diff -u -p -r1.12 xaaPict.c
--- hw/xfree86/xaa/xaaPict.c    11 May 2006 10:18:08 -0000      1.12
+++ hw/xfree86/xaa/xaaPict.c    11 May 2006 10:21:57 -0000
@@ -516,7 +516,10 @@ XAAComposite (CARD8      op,
        (!pSrc->repeat || (xSrc >= 0 && ySrc >= 0 &&
                          xSrc+width<=pSrc->pDrawable->width &&
                          ySrc+height<=pSrc->pDrawable->height)) &&
-       ((op == PictOpSrc && pSrc->format == pDst->format) ||
+       ((op == PictOpSrc &&
+        ((pSrc->format==pDst->format) ||
+         (pSrc->format==PICT_a8r8g8b8 && pDst->format==PICT_x8r8g8b8) ||
+         (pSrc->format==PICT_a8b8g8r8 && pDst->format==PICT_x8b8g8r8))) ||
        (op == PictOpOver && !pSrc->alphaMap && !pDst->alphaMap &&
         pSrc->format==pDst->format &&
         (pSrc->format==PICT_x8r8g8b8 || pSrc->format==PICT_x8b8g8r8))))
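
Read as a standalone predicate, the operator/format part of the condition after this change looks roughly like the sketch below. The enums are local stand-ins, not the real PictOp*/PICT_* constants, and the repeat and bounds checks from the surrounding if() are left out.

```c
#include <stdbool.h>

typedef enum { OP_SRC, OP_OVER, OP_OTHER } op_t;
typedef enum {
    FMT_A8R8G8B8, FMT_X8R8G8B8, FMT_A8B8G8R8, FMT_X8B8G8R8, FMT_OTHER
} fmt_t;

/* True if the composite can be handled as a plain copy, judging only by
 * operator, picture formats and alpha maps. */
static bool copy_fastpath_ok(op_t op, fmt_t src, fmt_t dst,
                             bool src_alpha_map, bool dst_alpha_map)
{
    if (op == OP_SRC)
        /* Same format, or alpha dropped into an unspecified x channel. */
        return src == dst ||
               (src == FMT_A8R8G8B8 && dst == FMT_X8R8G8B8) ||
               (src == FMT_A8B8G8R8 && dst == FMT_X8B8G8R8);

    if (op == OP_OVER)
        /* Over is a copy only when the source carries no alpha at all. */
        return !src_alpha_map && !dst_alpha_map && src == dst &&
               (src == FMT_X8R8G8B8 || src == FMT_X8B8G8R8);

    return false;
}
```

Note that the reverse direction (x8r8g8b8 source into an a8r8g8b8 destination) is deliberately not accepted, in line with comment 25.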
Comment 28 Matthias Hopf 2006-05-11 20:37:01 UTC
Ajax, shall I submit to 7.1 just now before RC3?
Comment 29 Matthias Hopf 2006-06-21 08:14:11 UTC
Committed to git.

(In reply to comment #16)
> Beyond that, even if cairo didn't do any of these optimizations, it would be
> good if the server optimized these cases itself. Matthias has proposed a patch
> for the XCopyArea case, and something more should be needed for the FillTiled
> case.

Leaving this bug open for the FillTiled case.
Comment 30 Jasmin Buchert 2006-12-05 12:43:35 UTC
What Frederic described sounds related to my infamous "very strange radeon 
behaviour" problem.

If I start Xorg, 2D is slow. Moving the terminals etc. is slow. But now the
strange thing: If I run glxinfo it's fast(er)... for a while..

Some operations (after "speeding it up" with glxinfo) seem to put the r200 
back into the "slow mode". I seem to hit these specific operations at least 
every few minutes or so.
Especially Qt apps like Konqueror exhibit this phenomenon. On some webpages
the scrolling performance is ultra slow, but after running glxinfo while on
the site it's fast again.
The same strange performance bug haunts Composite/xcompmgr. Only there the 
difference is more extreme.

I have the feeling Gnome applications are more immune to this problem but I'm 
not 100% sure.
I reported the problem a while ago but nobody seemed to understand what I
meant (perhaps my bad English).

My card is a Radeon RV280 (rev 01) on AMD64.
I hope this helps..
Comment 31 Jasmin Buchert 2006-12-05 12:47:35 UTC
I forgot to mention that I of course tested the latest GIT version.
Comment 32 Federico Mena-Quintero 2006-12-20 17:35:26 UTC
(In reply to comment #30)
> If I start Xorg, 2D is slow. Moving the terminals etc. is slow. But now the
> strange thing: If I run glxinfo it's fast(er)... for a while..
> 
> Some operations (after "speeding it up" with glxinfo) seem to put the r200 
> back into the "slow mode". I seem to hit these specific operations at least 
> every few minutes or so.

This seems to be related to how full of pixmaps your video memory is.

Here's a cumbersome but more or less reliable way to reproduce the slowdown:

1. Start X.  See that "moving terminals" is fast.
2. Start Firefox.  Load one of those pages with a *ton* of photos.  Wink wink,
you know what kind of page.  Or open many tabs with such pages (different pages,
so that they show different images).  Since Firefox keeps pixmaps for all the
images that are displayed on all its open tabs, it's easy to fill up VRAM like this.
3. See that "moving terminals" is slow again.

I have no idea if running glxinfo will "make it fast" again.
Comment 33 Pat Suwalski 2006-12-21 07:10:19 UTC
I don't think these last few comments are the same issue.
Comment 34 Daniel Stone 2007-02-27 01:27:48 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 35 Nicolò Chieffo 2007-07-04 13:19:35 UTC
Now that evince uses cairo I have a problem very similar to this: when opening a PDF and moving a terminal over it, everything gets stuck and the CPU usage is very high! The same thing happens when scrolling the
I also have the Firefox scrolling problem in pages with big images.
And I also have big performance issues when using compiz.
Can my problems be related to this bug? (Especially the evince one: it is vital to be able to view PDFs!)
Comment 36 Corbin Simpson 2010-03-19 21:12:14 UTC
The original patch appears in git (ea5e0eab), XAA is (hopefully) deprecated for most people, no followup patches have been posted, and "recent" comments appear to be horribly off-track. Closing; if anybody has further patches, send them to the mailing list.
