Summary: | EXA subpixel glyph rendering terribly slow | | |
---|---|---|---|
Product: | xorg | Reporter: | Pierre Ossman <pierre-bugzilla> |
Component: | Server/Acceleration/EXA | Assignee: | Eric Anholt <eric> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | | |
Priority: | high | CC: | eric |
Version: | git | | |
Hardware: | x86 (IA32) | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Bug Depends on: | | | |
Bug Blocks: | 6666 | | |

Description
Pierre Ossman
2005-10-03 01:52:09 UTC

If you edit exapict.c to change the line in exaGlyphs():

    if (!pExaScr->info->accel.PrepareComposite) {

to

    if (!pExaScr->info->accel.PrepareComposite ||
        (maskFormat && NeedsComponent(maskFormat->format))) {

does that make the performance more like you'd expect?

Afraid not. It's still terribly slow.

I committed the patch I suggested, which improved runtimes of ls -lR in my gnome-terminal by 62%. Could you test with CVS and see if you're still having the problem? If you are, could you please tell me what exactly you're running to see this issue, and how it compares to without using EXA? (numbers, please)

Testing this proved difficult since there is no way to get 'time' to catch rendering time in terminals. x11perf fortunately had some undocumented tests, so I ran the following:

    $ x11perf --aa24text --rgb24text
    -> -aa24text    Char in 30-char aa line (Charter 24)
    -> -rgb24text   Char in 30-char rgb line (Charter 24)
    -> -rgb24text   Char in 30-char rgb core line (Charter 24)

Results:

    EXA:
    6400000 trep @ 0.0047 msec (214000.0/sec): Char in 30-char aa line (Charter 24)
      64000 trep @ 0.4997 msec (  2000.0/sec): Char in 30-char rgb line (Charter 24)
     128000 trep @ 0.2240 msec (  4460.0/sec): Char in 30-char rgb core line (Charter 24)

    EXA, no RenderAccel:
     320000 trep @ 0.0756 msec ( 13200.0/sec): Char in 30-char aa line (Charter 24)
     480000 trep @ 0.0641 msec ( 15600.0/sec): Char in 30-char rgb line (Charter 24)
     128000 trep @ 0.2240 msec (  4460.0/sec): Char in 30-char rgb core line (Charter 24)

    XAA:
    8000000 trep @ 0.0034 msec (295000.0/sec): Char in 30-char aa line (Charter 24)
     480000 trep @ 0.0642 msec ( 15600.0/sec): Char in 30-char rgb line (Charter 24)
     128000 trep @ 0.2215 msec (  4520.0/sec): Char in 30-char rgb core line (Charter 24)

    XAA, no RenderAccel:
     480000 trep @ 0.0738 msec ( 13600.0/sec): Char in 30-char aa line (Charter 24)
     480000 trep @ 0.0643 msec ( 15600.0/sec): Char in 30-char rgb line (Charter 24)
     128000 trep @ 0.2222 msec (  4500.0/sec): Char in 30-char rgb core line (Charter 24)

The results here are consistent with the perceived performance from day-to-day usage.

xorg CVS from 2005-10-11.

the way you time ls -lR is by saying

    time ls -lR

since ls will block until output (and therefore rendering) completes this is a pretty effective way of measuring rendering time.

ls will only block until gnome-terminal accepts all data, not until rendering is complete. time reports the same value with the different font settings, but there is a significant difference if you measure it externally. E.g. time claims 0.5s, but measured externally the run easily takes several seconds.

That said, if you give it enough data the buffers do not screw up the results too much. One run gave 6 seconds vs. 42 seconds for grayscale vs. subpixel with EXA on. With software rendering, subpixel requires 8 seconds.

x11perf tests X operations directly, so it is a much better tool here.

(In reply to comment #6)
> ls will only block until gnome-terminal accepts all data, not until rendering is
> complete.

except that exit() calls fflush(stdout), so yes, it does wait.

at any rate exa is experimental in 7.0 anyway, so this is not a 7.0 blocker, though i'll certainly take fixes for it if they arise.

(In reply to comment #7)
> at any rate exa is experimental in 7.0 anyway, so this is not a 7.0 blocker,
> though i'll certainly take fixes for it if they arise.

Utter lack of motion on this since 7.0 means it's either no longer an issue or it's not enough of one to block 7.1. Moving out to 7.2.

Did a test of the current version and got:

    320000 trep @ 0.0829 msec ( 12100.0/sec): Char in 30-char aa line (Charter 24)
    320000 trep @ 0.0823 msec ( 12100.0/sec): Char in 30-char rgb line (Charter 24)
    320000 trep @ 0.0825 msec ( 12100.0/sec): Char in 30-char rgb core line (Charter 24)

Very strange... I can see that it uses grayscale for the first and sub-pixel for the other tests, so it isn't doing the same thing three times.
So sub-pixel rendering has gotten a lot better, but "normal" aa doesn't seem to be accelerated anymore.

Is this on a 'naked' X server? Running xserver HEAD? If so, can you try with Option "MigrationHeuristics" "greedy" and "always"?

This is fedora rawhide, which I believe is at least partly taken from CVS.

    xorg-x11-drv-ati-6.6.0-1
    xorg-x11-server-Xorg-1.0.99.901-5

The driver doesn't seem to contain the strings you mention, so I'll have to build it from CVS. I'll get back to you. :)

(In reply to comment #11)
> The driver doesn't seem to contain the strings you mention, [...]

It's an option in EXA, not the driver.

So I discovered. And it is also included in Red Hat's package. I'll commence the testing then. :)

Ehm... where do I specify this option? If I put it in ServerFlags it doesn't say anything anywhere, and if I put it in Device then it says that it ignores it.

Ah, looks like RC1 didn't support several migration schemes yet.

Meaning I should do what? Compile the server from CVS? And where should the option be in that case?

(In reply to comment #16)
> Meaning I should do what? Compile the server from CVS?

If you want to try different migration schemes, yes; at least EXA.

> And where should the option be in that case?

I have it in the device section.

I tried replacing libexa.so from a current CVS build and added MigrationHeuristics to the device section, but all I got was:

    (WW) RADEON(0): Option "MigrationHeuristics" is not used

I also get some funky glitches here and there so I'm guessing replacing just libexa.so wasn't completely safe. :)

OK, I've committed an update which improves the situation, but doesn't quite fix it. Man, I hate Glyphs.
Here are some comparisons on my laptop between XAA, EXA right before my patch, and EXA right after my patch:

    1: /home/anholt/text-xaa
    2: /home/anholt/text-exa-before
    3: /home/anholt/text-exa-after

           1                 2                 3      Operation
    --------  ----------------- -----------------  -----------------
    178000.0   110000.0 ( 0.62)  183000.0 ( 1.03)  Char in 30-char aa line (Charter 24)
     17200.0     3150.0 ( 0.18)   16200.0 ( 0.94)  Char in 30-char rgb line (Charter 24)

    1: /home/anholt/text-compmgr-xaa
    2: /home/anholt/text-compmgr-exa-before
    3: /home/anholt/text-compmgr-exa-after

           1                 2                 3      Operation
    --------  ----------------- -----------------  -----------------
    167000.0   163000.0 ( 0.98)  103000.0 ( 0.62)  Char in 30-char aa line (Charter 24)
     16100.0    16100.0 ( 1.00)   53700.0 ( 3.34)  Char in 30-char rgb line (Charter 24)

And ls -lR times without a compmgr, with aa and rgb (smaller is better, ouch!):

    x /home/anholt/time-xaa
    + /home/anholt/time-exa-before
    * /home/anholt/time-exa-after
    [ministat distribution plot]
        N     Min     Max  Median     Avg      Stddev
    x   5    6.03    6.61    6.08   6.184  0.24151605
    +   5    6.67    7.04    6.72   6.764  0.15630099
    Difference at 95.0% confidence
            0.58 +/- 0.296677
            9.37904% +/- 4.7975%
            (Student's t, pooled s = 0.203421)
    *   5    6.99    7.38    7.05   7.132  0.16483325
    Difference at 95.0% confidence
            0.948 +/- 0.301549
            15.3299% +/- 4.87627%
            (Student's t, pooled s = 0.206761)

    x /home/anholt/time-xaa-rgb
    + /home/anholt/time-exa-rgb-before
    * /home/anholt/time-exa-rgb-after
    [ministat distribution plot]
        N     Min     Max  Median     Avg      Stddev
    x   5    12.5   12.89   12.58  12.648   0.1512283
    +   5   78.84   80.74   79.33  79.588  0.76809505
    Difference at 95.0% confidence
            66.94 +/- 0.807324
            529.254% +/- 6.38302%
            (Student's t, pooled s = 0.553552)
    *   5    41.9   43.44   42.65   42.55  0.62373873
    Difference at 95.0% confidence
            29.902 +/- 0.661882
            236.417% +/- 5.2331%
            (Student's t, pooled s = 0.453828)

So the current patch makes things better except for aa24text with a compmgr, which I would guess to be because of the additional damage computation. Beating XAA on composited rgb24text makes me pretty happy. But it doesn't catch us up to XAA for gnome-terminal with subpixel, which I suspect is because the intersection test in the patch isn't conservative enough. Leaving this open until I can figure out what's up with gnome-terminal.

I'm not getting those numbers. Here things got worse with the latest CVS (compared to RC1):

    8000000 trep @ 0.0034 msec (296000.0/sec): Char in 30-char aa line (Charter 24)
     128000 trep @ 0.2123 msec (  4710.0/sec): Char in 30-char rgb line (Charter 24)
     128000 trep @ 0.2132 msec (  4690.0/sec): Char in 30-char rgb core line (Charter 24)

X also eats silly amounts of CPU. The machine isn't really usable right now.

With my commit today, the subpixel glyph rendering with Radeon driver CVS and Xorg CVS is now 1/2 the speed of AA text rendering (96000 glyphs/sec for rgb24text). This makes sense, since it's done in two passes. Gnome-terminal is also correspondingly faster. Marking it fixed, even though I hope to merge to stable branch.

Current CVS of ATI driver and server:

    XAA:
    8000000 trep @ 0.0032 msec (313000.0/sec): Char in 30-char aa line (Charter 24)
     320000 trep @ 0.0810 msec ( 12300.0/sec): Char in 30-char rgb line (Charter 24)
     320000 trep @ 0.0810 msec ( 12300.0/sec): Char in 30-char rgb core line (Charter 24)

    EXA:
    8000000 trep @ 0.0033 msec (303000.0/sec): Char in 30-char aa line (Charter 24)
    4800000 trep @ 0.0063 msec (159000.0/sec): Char in 30-char rgb line (Charter 24)
    4800000 trep @ 0.0063 msec (159000.0/sec): Char in 30-char rgb core line (Charter 24)

Anholt, where do I sign up for your fan club?
;) There is something else broken with the current EXA though (compared to RC1). The machine is very sluggish, particularly dragging windows around and scrolling in firefox. I don't know much about EXA operations, but perhaps there is a "move region" operation that has been tweaked? Should I open a new bug for this?

Yes, I've also seen some other performance issues recently, and I'm working on tracking them down. If you wanted to bug-track it, a new bug would absolutely be the place.

Bug 6773 has been opened. Things occasionally tend to drag out a bit so a tracker bug is always nice. :)
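
The x11perf figures pasted throughout this report are easier to compare as ratios against a baseline, much like the three-column comparison tables earlier in the thread. What follows is a minimal Python sketch of that comparison, not the tooling anyone in the thread used: the names `parse_x11perf` and `relative_rates` are made up for illustration, and the regular expression assumes result lines shaped like the ones quoted here. The sample data is copied from the first XAA and EXA results in the report.

```python
import re

# Matches x11perf result lines such as
# "64000 trep @ 0.4997 msec ( 2000.0/sec): Char in 30-char rgb line (Charter 24)"
# (assumed shape, based only on the lines pasted in this report).
RESULT_RE = re.compile(
    r"^\s*(\d+)\s+(?:trep|reps)\s+@\s+([\d.]+)\s+msec\s+"
    r"\(\s*([\d.]+)/sec\):\s+(.+?)\s*$"
)


def parse_x11perf(text):
    """Return {test label: operations per second} for each result line."""
    rates = {}
    for line in text.splitlines():
        m = RESULT_RE.match(line)
        if m:
            rates[m.group(4)] = float(m.group(3))
    return rates


def relative_rates(baseline, other):
    """Ratio other/baseline per test; below 1.0 means slower than baseline."""
    return {
        label: other[label] / baseline[label]
        for label in baseline
        if label in other
    }


# Hypothetical inputs: two lines each from the first XAA and EXA runs above.
XAA = """\
8000000 trep @ 0.0034 msec (295000.0/sec): Char in 30-char aa line (Charter 24)
 480000 trep @ 0.0642 msec ( 15600.0/sec): Char in 30-char rgb line (Charter 24)
"""
EXA = """\
6400000 trep @ 0.0047 msec (214000.0/sec): Char in 30-char aa line (Charter 24)
  64000 trep @ 0.4997 msec (  2000.0/sec): Char in 30-char rgb line (Charter 24)
"""

if __name__ == "__main__":
    ratios = relative_rates(parse_x11perf(XAA), parse_x11perf(EXA))
    for label, ratio in ratios.items():
        print(f"{ratio:5.2f}  {label}")
```

On that sample, EXA comes out at roughly 0.73 of the XAA rate for the aa test and roughly 0.13 for the rgb test, which is the slowdown this bug was filed about.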