Bug 27284 - latest r6xx+ commits kill performance on my system
Summary: latest r6xx+ commits kill performance on my system
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: git
Hardware: Other All
Importance: medium normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
Duplicates: 24003 27283 27296 27678
Reported: 2010-03-24 02:06 UTC by Martin Stolpe
Modified: 2010-06-04 10:56 UTC

Attachments
switching tabs in opera (121.67 KB, application/gzip), 2010-03-24 02:06 UTC, Martin Stolpe
dmesg.log (26.41 KB, application/octet-stream), 2010-03-24 02:07 UTC, Martin Stolpe
Xorg log (69.17 KB, application/octet-stream), 2010-03-24 02:08 UTC, Martin Stolpe
xorg.conf (4.33 KB, application/octet-stream), 2010-03-24 02:09 UTC, Martin Stolpe
allow bo placement in vram or gart (2.73 KB, patch), 2010-03-24 09:08 UTC, Alex Deucher
switching tabs in firefox oprofile (162.55 KB, application/x-gzip), 2010-03-24 15:04 UTC, Martin Stolpe
switching tabs in firefox oprofile (162.55 KB, application/x-gzip), 2010-03-24 15:25 UTC, Martin Stolpe
flush command stream if bo domain changes (2.09 KB, patch), 2010-03-25 08:38 UTC, Alex Deucher
oprofile only second patch applied (194.22 KB, application/x-gzip), 2010-03-25 15:10 UTC, Martin Stolpe
oprofile, both patches applied (182.87 KB, application/x-gzip), 2010-03-25 15:11 UTC, Martin Stolpe
slowdown, both patches applied (116.84 KB, application/x-gzip), 2010-03-26 01:36 UTC, Martin Stolpe
sysprof of glxgears running at 7fps (356.52 KB, text/plain), 2010-03-27 13:51 UTC, Andy Furniss
dmesg output on hang (58.44 KB, patch), 2010-04-14 09:25 UTC, Andreas Juch
Xorg.log after hang (36.59 KB, patch), 2010-04-14 09:31 UTC, Andreas Juch

Description Martin Stolpe 2010-03-24 02:06:43 UTC
Created attachment 34394
switching tabs in opera

Switching tabs in firefox and opera is really slow after the latest commits.
Comment 1 Martin Stolpe 2010-03-24 02:07:47 UTC
Created attachment 34395
dmesg.log
Comment 2 Martin Stolpe 2010-03-24 02:08:42 UTC
Created attachment 34396
Xorg log
Comment 3 Martin Stolpe 2010-03-24 02:09:11 UTC
Created attachment 34397
xorg.conf
Comment 4 Alex Deucher 2010-03-24 09:08:59 UTC
Created attachment 34414
allow bo placement in vram or gart

Does this patch help?
Comment 5 Kevin DeKorte 2010-03-24 09:35:17 UTC
I tried the patch and it did not solve the regression on my machine. I found that I can only reproduce this issue when running a window manager that does not have compositing enabled (i.e. stock metacity). Running with compiz makes the slowness go away.
Comment 6 Martin Stolpe 2010-03-24 14:09:32 UTC
Yes this patch seems to have fixed the problem. Thanks a lot!
Comment 7 Alex Deucher 2010-03-24 14:43:49 UTC
*** Bug 27296 has been marked as a duplicate of this bug. ***
Comment 8 Martin Stolpe 2010-03-24 15:04:48 UTC
Created attachment 34424
switching tabs in firefox oprofile

Sorry, I was celebrating too early. Tab switching in Opera works fine now but it's slow when using firefox. I've created another oprofile for firefox.
Comment 9 Alex Deucher 2010-03-24 15:14:59 UTC
Not yet resolved.
Comment 10 Martin Stolpe 2010-03-24 15:25:31 UTC
Created attachment 34425
switching tabs in firefox oprofile

Sorry, I was celebrating too early. Tab switching in Opera works fine now but it's slow when using firefox. I've created another oprofile for firefox.
Comment 11 Andy Furniss 2010-03-25 05:01:53 UTC
(In reply to comment #7)
> *** Bug 27296 has been marked as a duplicate of this bug. ***
> 

Have done some more testing and it's the first commit - 

dda3f5a99e7a2dc5d57860f4d07df3498e1e21df
r6xx EXA/Xv: track src/dst domains

that introduces the problems for me.
Comment 12 Andy Furniss 2010-03-25 05:18:08 UTC
(In reply to comment #11)

Testing head + the patch does seem to solve the seamonkey perf, but VT1 is still flooded with 

WRITE DOMAIN RELOC FAILURE 0xd 6 2
WRITE DOMAIN RELOC FAILURE 0xd 2 6
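
For anyone reading these messages, a small decoder may help. The sketch below is a reading aid, not libdrm code. It assumes the first value is a buffer object handle and the last two are GEM domain bitmasks; which mask is the previously recorded one and which is the newly requested one is not established in this report. The bit values (CPU = 0x1, GTT = 0x2, VRAM = 0x4) are the RADEON_GEM_DOMAIN_* constants from radeon_drm.h; the function names are hypothetical.

```python
import re

# RADEON_GEM_DOMAIN_* bit values as defined in radeon_drm.h.
GEM_DOMAINS = {0x1: "CPU", 0x2: "GTT", 0x4: "VRAM"}

# Assumed layout: "WRITE DOMAIN RELOC FAILURE <handle> <mask> <mask>".
RELOC_RE = re.compile(
    r"WRITE DOMAIN RELOC FAILURE (0x[0-9a-fA-F]+) (\d+) (\d+)"
)


def decode_domains(mask):
    """Return the names of the domain bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(GEM_DOMAINS.items()) if mask & bit]


def parse_reloc_failure(line):
    """Return (handle, first_domains, second_domains), or None if no match."""
    match = RELOC_RE.search(line)
    if match is None:
        return None
    handle = int(match.group(1), 16)
    return (
        handle,
        decode_domains(int(match.group(2))),
        decode_domains(int(match.group(3))),
    )


if __name__ == "__main__":
    for text in (
        "WRITE DOMAIN RELOC FAILURE 0xd 6 2",
        "WRITE DOMAIN RELOC FAILURE 0xd 2 6",
    ):
        print(parse_reloc_failure(text))
```

Under those assumptions, both lines concern the same buffer (0xd), seen once with GTT and VRAM both set and once with GTT only.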
Comment 13 Alex Deucher 2010-03-25 07:30:35 UTC
*** Bug 27283 has been marked as a duplicate of this bug. ***
Comment 14 Alex Deucher 2010-03-25 08:38:26 UTC
Created attachment 34434
flush command stream if bo domain changes

Can you try this patch both with and without the previous one?
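
The patch itself is in the attachment and is not reproduced here. As a rough illustration of the idea in its title only, the toy model below flushes a pending command stream whenever a buffer object is referenced with a different domain than the one already recorded for it. The class and method names are hypothetical and the real driver logic may differ considerably.

```python
class ToyCommandStream:
    """Toy model of a command stream that records one domain per buffer."""

    def __init__(self):
        self.domains = {}  # buffer handle -> domain mask used in this stream
        self.flushes = 0   # number of times the stream was submitted

    def flush(self):
        """Submit the pending commands and forget the recorded domains."""
        self.domains.clear()
        self.flushes += 1

    def use_buffer(self, handle, domain):
        """Reference a buffer, flushing first if its domain would change."""
        previous = self.domains.get(handle)
        if previous is not None and previous != domain:
            self.flush()
        self.domains[handle] = domain


if __name__ == "__main__":
    cs = ToyCommandStream()
    cs.use_buffer(0xD, 6)  # first use: nothing recorded yet, no flush
    cs.use_buffer(0xD, 6)  # same domain again: no flush
    cs.use_buffer(0xD, 2)  # domain changes: flush before recording it
    print("flushes:", cs.flushes)
```

In this model a domain change never produces a conflicting relocation inside one stream, at the cost of an extra submission each time it happens.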
Comment 15 Andy Furniss 2010-03-25 10:50:32 UTC
(In reply to comment #12)
> (In reply to comment #11)
> 
> Testing head + the patch does seem to solve the seamonkey perf

After about 4 hrs of running this things fell apart -

7fps in glxgears

xv didn't draw.

screen taking 1/2 sec to update itself.

Comment 16 Pauli 2010-03-25 10:54:48 UTC
(In reply to comment #15)
> (In reply to comment #12)
>> (In reply to comment #11)
>>
>> Testing head + the patch does seem to solve the seamonkey perf
>
> After about 4 hrs of running this things fell apart -
>
> 7fps in glxgears
>
> xv didn't draw.
>
> screen taking 1/2 sec to update its self.
>
Are there any errors in xorg.log or dmesg when the slowdown happens?
Comment 17 Andy Furniss 2010-03-25 10:57:19 UTC
(In reply to comment #14)
> Created an attachment (id=34434) [details]
> flush command stream if bo domain changes
> 
> Can you try this patch both with and without the previous one?
> 

Both patches seem to (while it lasts) fix the perf but whether alone or together I still get the RELOC errors to stderr.

Both in any combination seem to fix the dmesg error I get with unpatched head -

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation !
Comment 18 Andy Furniss 2010-03-25 11:08:15 UTC
(In reply to comment #16)

> Is there any errors in xorg.log or dmesg when the slow down happens?

Nothing in dmesg - I briefly saw a different line scroll off when I quit X but my fbcon scroll back seems to be limited to a couple of lines so I can't say what it said. Now I've tried the other patch I'll try and recreate it and redirect to a file this time.

Comment 19 Martin Stolpe 2010-03-25 15:10:29 UTC
Created attachment 34449
oprofile only second patch applied
Comment 20 Martin Stolpe 2010-03-25 15:11:25 UTC
Created attachment 34450
oprofile, both patches applied

My system seems to be running fine with the second patch.
Comment 21 Andy Furniss 2010-03-25 17:47:28 UTC
(In reply to comment #18)
> (In reply to comment #16)
> 
> > Is there any errors in xorg.log or dmesg when the slow down happens?
> 
> Nothing in dmesg - I briefly saw a different line scroll off when I quit X but
> my fbcon scroll back seems to be limited to a couple of lines so I can't say
> what it said. Now I've tried the other patch I'll try and recreate it and
> redirect to a file this time.
> 

After waiting ages and accumulating 22k reloc errors I decided to retrace my steps, and I've now managed to find a way to trigger this - just use flash - something which doesn't usually happen as I run flashblock. Unblocking a flash totally trashes perf, even after seamonkey is closed.

The error when perf is trashed is -

space check failed in flush

oprofile (not that I totally trust it) shows most time in -

1116680  80.3342  libpixman-1.so.0.17.3    libpixman-1.so.0.17.3    pixman_blt_mmx

It happens with unpatched head and the first patch.

Running with the second patch alone or + the first patch fixes it.
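
When comparing several profiles, it can be handy to pull the columns out of lines like the one above. The sketch below assumes the five whitespace-separated columns are samples, percentage, image name, application name and symbol name, and that none of the names contain spaces. The function names are hypothetical.

```python
def parse_opreport_line(line):
    """Split one symbol line into its five whitespace-separated columns."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 columns, got %d" % len(fields))
    samples, percent, image, app, symbol = fields
    return {
        "samples": int(samples),
        "percent": float(percent),
        "image": image,
        "app": app,
        "symbol": symbol,
    }


def hottest_symbol(lines):
    """Return the parsed row with the most samples, skipping blank lines."""
    rows = [parse_opreport_line(line) for line in lines if line.strip()]
    return max(rows, key=lambda row: row["samples"])


if __name__ == "__main__":
    line = (
        "1116680  80.3342  libpixman-1.so.0.17.3    "
        "libpixman-1.so.0.17.3    pixman_blt_mmx"
    )
    print(parse_opreport_line(line))
```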


Comment 22 Andy Furniss 2010-03-25 18:08:40 UTC
(In reply to comment #21)

> I've managed to find a way to trigger this - just use flash -
> something which doesn't usually happen as I run flashblock. Unblocking a flash
> totally trashes perf even after seamonkey is closed

More testing shows that not just any flash will trigger it, but this one does -

http://www.speedtest.bbmax.co.uk/
Comment 23 Martin Stolpe 2010-03-26 01:34:01 UTC
(In reply to comment #21)
> (In reply to comment #18)
> > (In reply to comment #16)
> > 
> > > Is there any errors in xorg.log or dmesg when the slow down happens?
> > 
> > Nothing in dmesg - I briefly saw a different line scroll off when I quit X but
> > my fbcon scroll back seems to be limited to a couple of lines so I can't say
> > what it said. Now I've tried the other patch I'll try and recreate it and
> > redirect to a file this time.
> > 
> 
> After waiting ages and accumulating 22k reloc errors I decided to retrace my
> steps and so now I've managed to find a way to trigger this - just use flash -
> something which doesn't usually happen as I run flashblock. Unblocking a flash
> totally trashes perf even after seamonkey is closed
> 
> The error when perf is trashed is -
> 
> space check failed in flush
> 
> oprofile (not that I totally trust it) shows most time in -
> 
> 1116680  80.3342  libpixman-1.so.0.17.3    libpixman-1.so.0.17.3   
> pixman_blt_mmx
> 
> It happens with unpatched head and the first patch.
> 
> Running with the second patch alone or + the first patch fixes it.
> 

It's much better with the patches (I've tested it only with both patches applied) but it's still possible to provoke the slowdown. I just have to open a few youtube tabs (videos paused). pixman_blt_mmx still seems to be problematic.
Comment 24 Martin Stolpe 2010-03-26 01:36:30 UTC
Created attachment 34470
slowdown, both patches applied

I have no idea if those oprofiles are really needed. Please let me know if you find them useful. I don't want to flood this bug report with oprofiles which no one needs.
Comment 25 Pauli 2010-03-26 01:48:33 UTC
> Created an attachment (id=34470)
>  --> (http://bugs.freedesktop.org/attachment.cgi?id=34470)
> slowdown, both patches applied
>
> I'm have no idea if those oprofiles are really needed. Please let me know if
> you find them useful. I don't want to flood this bug report with oprofiles
> which no one needs.
>

They are good, but some information is still missing.

What is causing the pixman calls? (pixman is a software rasterizer)
Comment 26 Luca Tettamanti 2010-03-26 13:46:48 UTC
(In reply to comment #22)
> (In reply to comment #21)
> 
> > I've managed to find a way to trigger this - just use flash -
> > something which doesn't usually happen as I run flashblock. Unblocking a flash
> > totally trashes perf even after seamonkey is closed
> 
> More testing shows that not just any flash will trigger it, but this one does -
> 
> http://www.speedtest.bbmax.co.uk/
> 

Might be related to bug #15293. In that case the performance hit was caused by flash reading back the video in order to draw stuff (e.g. the controls) over it.
I did see a bit of activity in pixman, though it was nowhere near what you're seeing.
Comment 27 Andy Furniss 2010-03-27 13:08:58 UTC
(In reply to comment #21)

> The error when perf is trashed is -
> 
> space check failed in flush
> 
> oprofile (not that I totally trust it) shows most time in -
> 
> 1116680  80.3342  libpixman-1.so.0.17.3    libpixman-1.so.0.17.3   
> pixman_blt_mmx
> 
> It happens with unpatched head and the first patch.
> 
> Running with the second patch alone or + the first patch fixes it.

I was too hasty in saying the second patch fixes it - I can still trigger it, it just takes a bit longer.

With patch2 I don't see the "space check failed in flush" errors when it happens.

The pixman oprofile above was running glxgears after triggering and closing seamonkey.

If I take a profile while just moving an xterm around (which is only redrawing at 2fps) then libc memcpy is the cpu hog and libpixman barely shows.
Comment 28 Andy Furniss 2010-03-27 13:51:24 UTC
Created attachment 34512
sysprof of glxgears running at 7fps

(In reply to comment #25)

> They are good but there is still some missing information that has to be
> solved.
> 
> What is causing the pixman calls? (pixman is software rasterizer)

In case it shows anything more than oprofile.
Here's a sysprof of glxgears running at 7fps after I've triggered the bug.
Comment 29 Michel Dänzer 2010-03-29 01:08:23 UTC
It does look like the X driver is falling back to software for everything for some reason.
Comment 30 Alex Deucher 2010-03-31 19:34:35 UTC
bc93395b3eb5e3511c1b62af90693269f4fa6e13 should hopefully fix this.
Comment 31 Martin Stolpe 2010-04-01 03:06:50 UTC
(In reply to comment #30)
> bc93395b3eb5e3511c1b62af90693269f4fa6e13 should hopefully fix this.
> 

I'm using git version 6baa96c44ca93b88acf5233335cee233e59d5af4 and wasn't able to trigger the software fallback. Hopefully this bug is really fixed as it was not clear what actually triggered the bug.
Comment 32 Alexandre Derumier 2010-04-01 03:26:41 UTC
I have no problems anymore (bug 27283) with the latest git.

thanks !
Comment 33 Andy Furniss 2010-04-01 06:27:23 UTC
(In reply to comment #30)
> bc93395b3eb5e3511c1b62af90693269f4fa6e13 should hopefully fix this.
> 

Today's head is working OK for me.
Comment 34 Andreas Juch 2010-04-14 09:24:17 UTC
I'm still experiencing similar effects. Firefox triggers the "[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation !" messages when scrolling up/down fast. Sometimes Xorg freezes. Remote ssh is still possible, but I didn't get a shell, just motd. On one occasion it didn't crash completely and I got the following dmesg output (attachment). Xorg.0.log is not reporting anything special. Maybe it's a different bug, but the message is still the same...

Software is:
Kernel:    2.6.33-2-amd64 from Debian experimental
libdrm2:   2.4.18-4
libgl:     7.7.1-1
radeon:    1:6.13.0-1
xorg-core: 2:1.7.6-2

Hardware:
01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3470 (prog-if 00 [VGA controller])
	Subsystem: PC Partner Limited Device e390
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 34
	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at d0100000 (64-bit, non-prefetchable) [size=64K]
	Region 4: I/O ports at 2000 [size=256]
	Expansion ROM at d0120000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: radeon
Comment 35 Andreas Juch 2010-04-14 09:25:58 UTC
Created attachment 35012
dmesg output on hang
Comment 36 Andreas Juch 2010-04-14 09:31:02 UTC
Created attachment 35013
Xorg.log after hang
Comment 37 Alex Deucher 2010-04-16 07:40:42 UTC
*** Bug 27678 has been marked as a duplicate of this bug. ***
Comment 38 Alex Deucher 2010-04-16 07:41:19 UTC
still seems to be problematic.
Comment 39 Bryce Harrington 2010-04-16 10:29:30 UTC
Alex, on bug 27678 Brian has identified that Ubuntu's 1:6.12.192-2ubuntu2 does not show the problem, so it looks like this is a regression between 6.12.192 and 6.13.0 if that helps.
Comment 40 roberth 2010-04-16 10:41:04 UTC
(In reply to comment #39)
> Alex, on bug 27678 Brian has identified that Ubuntu's 1:6.12.192-2ubuntu2 does
> not show the problem, so it looks like this is a regression between 6.12.192
> and 6.13.0 if that helps.

Ubuntu's 6.12.192-2ubuntu2 mentioned above was a git checkout up to commit 5c256808cb5fea955eea96ffe9196473715156aa
"XAA: disable render accel"

after the 6.12.192 tag for future reference.
Comment 41 Andreas Juch 2010-04-16 11:04:10 UTC
The debian version 1:6.12.192-2 also works fine on my system.
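
With a known-good build (6.12.192) and a known-bad one (6.13.0), the change that introduced the problem can in principle be found by bisection, which is what git bisect automates. The sketch below shows only the search itself, under the assumptions that the commits are ordered oldest to newest, the first is good, the last is bad, and the problem never disappears once introduced. The commit labels and the is_bad callback are hypothetical stand-ins for building and testing each revision.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad is true.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]


if __name__ == "__main__":
    # Made-up history: "c0" is the good end, "c7" is the bad end.
    history = ["c%d" % i for i in range(8)]
    print(first_bad(history, lambda commit: int(commit[1:]) >= 5))
```

Each test halves the remaining range, so the number of builds to try grows with the logarithm of the number of commits between the two versions.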
Comment 42 Alex Deucher 2010-04-16 11:24:32 UTC
Can you narrow down which op is causing the problem?  Add:
return FALSE;
to the top of R600PrepareCopy() or R600PrepareSolid() or R600UploadToScreenCS() or R600DownloadFromScreenCS() or R600PrepareComposite() in r600_exa.c and see if any of them prevent the problem.
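
The procedure above can be pictured with a toy model. Returning FALSE from one of these hooks tells EXA not to accelerate that operation, so it is handled in software instead; disabling one hook per run then shows which accelerated path is involved. The Python below is only a model of that test plan, not driver code: dispatch, isolate and the still_fails callback are hypothetical names, and still_fails stands for a manual test run by the reporter.

```python
R600_HOOKS = [
    "R600PrepareCopy",
    "R600PrepareSolid",
    "R600UploadToScreenCS",
    "R600DownloadFromScreenCS",
    "R600PrepareComposite",
]


def dispatch(op, disabled):
    """Return "software" if the hook for op is disabled, else "hardware".

    A disabled hook models `return FALSE;` at the top of the C function.
    """
    return "software" if op in disabled else "hardware"


def isolate(still_fails, hooks=R600_HOOKS):
    """Disable one hook per run; collect hooks whose removal hides the bug.

    still_fails(disabled) reports whether the problem reproduced with
    that set of hooks disabled.
    """
    return [hook for hook in hooks if not still_fails({hook})]


if __name__ == "__main__":
    # Made-up oracle for illustration only; it says nothing about the
    # real bug. Here the problem needs the accelerated copy path.
    def pretend_test(disabled):
        return dispatch("R600PrepareCopy", disabled) == "hardware"

    print(isolate(pretend_test))
```

An empty result would mean that no single hook is responsible on its own, which is one possible reading of a test where several runs still fail.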
Comment 43 Andreas Juch 2010-04-16 13:27:37 UTC
I tried disabling them, one at a time. Results are not very useful, I fear:

1 Disabling R600PrepareCopy resulted in slow FF scrolling

2 Disabling R600PrepareSolid: Xorg freeze

3 Disabling R600UploadToScreenCS: Xorg freeze

4 Disabling R600DownloadFromScreenCS: Xorg freeze

5 Disabling R600PrepareComposite: no freeze, just slow scrolling. I tried it for ~20 minutes, methods 2, 3, 4 crashed after <5 minutes.

The funny thing is that I got no "Failed to parse relocation" dmesg errors. Maybe debian/ubuntu isn't using the git tag xf86-video-ati-6.13.0 as I did in these tests...
Comment 44 Alex Deucher 2010-04-18 10:54:51 UTC
*** Bug 24003 has been marked as a duplicate of this bug. ***
Comment 45 Victor NOEL 2010-04-19 01:09:48 UTC
I don't know if it is helpful, but here I just get slow scrolling in firefox.

A good example of an extremely slow webpage is http://www.ofai.at/research/agents/conf/at2ai7/

And I also get the error:
[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation !

But that's all.
Comment 46 Jerome Glisse 2010-06-04 03:41:14 UTC
I am closing this bug as the original issue is fixed. Please test a kernel which has the e86527533586259875f08fccb173e3347046cc3f commit, and if such a kernel fails, open a new bug and attach full dmesg + full lspci -v output.

You can test :
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=shortlog;h=refs/heads/drm-radeon-testing
Comment 47 A 2010-06-04 10:56:46 UTC
Hi. I have manually applied the patch that the mentioned commit consists of, but the problem only seems mitigated, not fixed.

In dmesg I just found this:
radeon 0000:01:05.0: ffff880109913200 reserve failed for wait

Video device is:
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3200 Graphics (prog-if 00 [VGA controller])

Also, I was wondering whether an issue which was apparently pinpointed as a radeon driver regression could be fixed by a kernel patch alone (of course it's possible, I was just wondering). And then, would it be possible that it's not that commit alone, but a "family" of patches that fixed the issue?

Honestly I don't feel I have the authority to reopen this bug, but I'm not sure it can actually be called "resolved-fixed".

I am of course available to provide detailed information and to help troubleshoot and further isolate the issue, if needed.

