Bug 23187

Summary: cairo's performance drops greatly caused by the kernel
Product: DRI
Reporter: zhao jian <jian.j.zhao>
Component: DRM/Intel
Assignee: Carl Worth <cworth>
Status: CLOSED WORKSFORME
QA Contact:
Severity: normal
Priority: high
Keywords: regression
Version: XOrg git
Hardware: Other
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description     Flags
xorg.0.log      none

Description zhao jian 2009-08-07 00:58:07 UTC
Created attachment 28415
xorg.0.log

System Environment:
----------------------
Platform:       G41
Arch:           x86_64
OSD:            Fedora release 9 (Sulphur)
Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d
Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
Kernel:         (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9

Bug Description:
---------------------
Testing with cairo-perf on G41, I found a regression with
poppler-alt-20090608.trace and poppler-20090608.trace; these may be the
same issue. The performance data from the 20090729 code is 25% slower than the data from our Q2 release code. The regression appears to be caused by the kernel: if I change only the kernel from drm-intel-next to the qa branch (d2a55f12b2425ed1febb9e29e10bb405b91aa8dd), performance is much better.
With the code of 20090729 and the kernel on drm-intel-next (2a2430f4542467502d39660bfd66b0004fd8d6a9):
poppler-alt-20090608.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image         poppler-alt-20090608   47.803   47.920   0.24%    6/6
[  0]     xlib         poppler-alt-20090608  253.262  253.386   0.25%    6/6
poppler-20090608.trace
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image             poppler-20090608   47.770   48.047   0.24%    6/6
[  0]     xlib             poppler-20090608  252.956  255.086   0.46%    6/6
With the same code of 20090729 and only the kernel changed to the qa branch (d2a55f12b2425ed1febb9e29e10bb405b91aa8dd):
poppler-alt-20090608.trace.KMS
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image         poppler-alt-20090608   46.875   47.566   0.95%    6/6
[  0]     xlib         poppler-alt-20090608  196.346  197.292   0.41%    6/6
poppler-20090608.trace.KMS
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image             poppler-20090608   46.587   46.744   0.61%    6/6
[  0]     xlib             poppler-20090608  196.546  197.130   0.29%    6/6
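
To make the size of the gap explicit, here is a minimal sketch that parses the result rows quoted above and compares the median times between the two kernels. The regular expression, the function names and the row layout it assumes are illustrative only, inferred from the output pasted in this report; this is not part of cairo-perf itself.

```python
import re

# Result rows copied from the report above (header rows omitted).
BAD_KERNEL = """\
[  0]    image         poppler-alt-20090608   47.803   47.920   0.24%    6/6
[  0]     xlib         poppler-alt-20090608  253.262  253.386   0.25%    6/6
[  0]    image             poppler-20090608   47.770   48.047   0.24%    6/6
[  0]     xlib             poppler-20090608  252.956  255.086   0.46%    6/6
"""
GOOD_KERNEL = """\
[  0]    image         poppler-alt-20090608   46.875   47.566   0.95%    6/6
[  0]     xlib         poppler-alt-20090608  196.346  197.292   0.41%    6/6
[  0]    image             poppler-20090608   46.587   46.744   0.61%    6/6
[  0]     xlib             poppler-20090608  196.546  197.130   0.29%    6/6
"""

# Assumed layout: [index] backend test min(s) median(s) stddev% done/total
ROW = re.compile(
    r"^\[\s*\d+\]\s+(\S+)\s+(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)%\s+(\d+)/(\d+)\s*$"
)


def parse_medians(text):
    """Return {(backend, test): median_seconds} for every result row."""
    medians = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            backend, test, median = match.group(1, 2, 4)
            medians[(backend, test)] = float(median)
    return medians


def slowdown_percent(bad, good):
    """Percentage by which each median in `bad` exceeds the one in `good`."""
    return {key: (bad[key] / good[key] - 1.0) * 100.0 for key in bad if key in good}


if __name__ == "__main__":
    result = slowdown_percent(parse_medians(BAD_KERNEL), parse_medians(GOOD_KERNEL))
    for (backend, test), pct in sorted(result.items()):
        print(f"{backend:>6} {test:<22} {pct:+.1f}%")
```

On these numbers the xlib backend is roughly 28 to 29% slower in median time on drm-intel-next, while the image backend moves by less than 3%, which is consistent with the slowdown sitting in the kernel/driver path rather than in cairo's own rendering code.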

Reproduce Steps:
---------------------
1. xinit&
2. cairo-perf-trace poppler-alt-20090608.trace (or poppler-20090608.trace)
Comment 1 Carl Worth 2009-09-14 14:45:08 UTC
(In reply to comment #0)
> System Environment:
> ----------------------
> Platform:       G41
> Arch:           x86_64
> OSD:            Fedora release 9 (Sulphur)
> Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
> Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
> Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d
> Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
> Kernel:         (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9

Thanks for the bug report.

The details above showing the versions at which the regression
first appeared are very appreciated. Thanks!

What's missing is the previously tested versions at which things were
last seen to be working. From a separate report, I believe these are
the working versions:

Last known versions without regression
--------------------------------------
Libdrm:         (master)30449829c0347dc7dbe29acb13e49e2f2cb72ae9
Mesa:           (master)506bacb8e40b0a170a4b620113506925d2333735
Xserver:                (master)b1c3dc6ae226db178420e3b5f297b94afc87c94c
Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
Kernel_unstable:    (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9

Let me know if I didn't get those right.

-Carl
Comment 2 zhao jian 2009-09-14 23:23:41 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > System Environment:
> > ----------------------
> > Platform:       G41
> > Arch:           x86_64
> > OSD:            Fedora release 9 (Sulphur)
> > Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
> > Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
> > Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d
> > Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
> > Kernel:         (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9
> Thanks for the bug report.
> The details above showing the versions at which the regression
> first appeared are very appreciated. Thanks!
> What's missing is the previously tested versions at which things were
> last seen to be working. From a separate report, I believe these are
> the working versions:
> Last known versions without regression
> --------------------------------------
> Libdrm:         (master)30449829c0347dc7dbe29acb13e49e2f2cb72ae9
> Mesa:           (master)506bacb8e40b0a170a4b620113506925d2333735
> Xserver:                (master)b1c3dc6ae226db178420e3b5f297b94afc87c94c
> Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
> Kernel_unstable:    (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9
> Let me know if I didn't get those right.
> -Carl

No, Carl, I think there is a misunderstanding.
As I said in the bug description, I first found this regression with the code of 20090729 compared with our Q2 release. I then tested with the code of 20090729, changing only the kernel to qa-branch (d2a55f12b2425ed1febb9e29e10bb405b91aa8dd), and it works well, so I think the regression is caused by the kernel. You can reproduce this with the commits of 20090729 by comparing drm-intel-next against qa-branch.
The code of 20090729 is:
Libdrm:         (master)5a73f066ba149816cc0fc2de4b97ec4714cf8ebc
Mesa:           (master)03607708b0499816291f0fb0d1c331fbf034f0ba
Xserver:        (master)a85523dc50f392a33a1c00302a0946828bc9249d
Xf86_video_intel:         (master)50e2a6734de43a135aa91cd6e6fb5147e15ce315
Kernel:         (drm-intel-next)2a2430f4542467502d39660bfd66b0004fd8d6a9 (bad)
Kernel:         (qa-branch) d2a55f12b2425ed1febb9e29e10bb405b91aa8dd     (good)
Comment 3 Chris Wilson 2010-07-18 02:30:04 UTC
Sigh, it's only 2 million lines of changes to review...

My fear, given the nature of the traces, is that it is some subtle interaction with the memory manager. I will have a look and see if I can reproduce this with the current stack on top of those two kernels on my g45. I am loath to write off a 25% performance hit without understanding it, but we may never find the cause. :(
Comment 4 Chris Wilson 2010-07-19 04:51:24 UTC
Using benchmark/poppler.trace on g45:

[2.6.30.1] d2a55f12b2425ed1febb9e29e10bb405b91aa8dd - 8.640s
[drm-intel-next, 2.6.35-rc4+] d44a78e83f7549b3c4ae611e667a0db265cf2e00 - 8.460s
[fair-eviction, 2.6.35-rc5+] - 9.040s

So I cannot find evidence to support the argument that the mysterious slowdown is still present in current kernels. On this particular setup, though, moving to an LRU fair-eviction scheme to avoid the page-fault-of-doom livelock will hurt.
Comment 5 Chris Wilson 2010-07-19 05:11:36 UTC
Even more bizarrely, I find the behaviour was reversed:

[2.6.31-rc2, bad!] 2a2430f4542467502d39660bfd66b0004fd8d6a - 7.929s

Let's investigate a bit further.
Comment 6 Chris Wilson 2010-07-19 06:52:51 UTC
However, I've hit a kernel panic in the midst of the bisect with only 2268 revisions left to test. Not fun.
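
For a sense of how much work that leaves, a rough sketch of the worst-case number of further kernels to build and test, assuming an idealized binary search in which every remaining revision can actually be tested (a panic like the one above forces skips and adds steps). The function name is made up for illustration.

```python
def bisect_steps_remaining(revisions_left):
    """Worst-case number of further tests for an idealized bisect.

    The first bad commit is either one of the `revisions_left` untested
    revisions or the already known-bad one, so there are
    revisions_left + 1 candidates; halving that set each step takes
    ceil(log2(revisions_left + 1)) tests, which equals the bit length
    of revisions_left for non-negative integers.
    """
    if revisions_left < 0:
        raise ValueError("revisions_left must be non-negative")
    return revisions_left.bit_length()


if __name__ == "__main__":
    print(bisect_steps_remaining(2268))
```

With 2268 revisions left this comes to about a dozen more build, boot and benchmark cycles.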
Comment 7 Chris Wilson 2010-07-19 09:15:31 UTC
So for completeness the cause of the speed-up on my system was:

commit 52dc7d32b88156248167864f77a9026abe27b432
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 6 09:46:01 2009 +0100

    drm/i915: Clear fence register on tiling stride change.
    
    The fence register value also depends upon the stride of the object, so we
    need to clear the fence if that is changed as well.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    [anholt: Added 8xx and 965 paths, and renamed the confusing
    i915_gem_object_tiling_ok function to i915_gem_object_fence_offset_ok]
    Signed-off-by: Eric Anholt <eric@anholt.net>

Quite puzzling. Does anyone volunteer to bisect between 52dc7d and 2.6.35 to find the lost performance?
Comment 8 zhao jian 2010-07-20 02:06:49 UTC
(In reply to comment #7)
> So for completeness the cause of the speed-up on my system was:
> commit 52dc7d32b88156248167864f77a9026abe27b432
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sat Jun 6 09:46:01 2009 +0100
>     drm/i915: Clear fence register on tiling stride change.
>     The fence register value also depends upon the stride of the object, so we
>     need to clear the fence if that is changed as well.
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     [anholt: Added 8xx and 965 paths, and renamed the confusing
>     i915_gem_object_tiling_ok function to i915_gem_object_fence_offset_ok]
>     Signed-off-by: Eric Anholt <eric@anholt.net>
> Quite puzzling. Does anyone volunteer to bisect between 52dc7d and 2.6.35 to
> find the lost performance?

I tested with poppler.trace on G41: with kernel 2.6.34 it takes 9.302s, with 2.6.31 it takes 8.533s and with 2.6.32 it takes 9.732s. So I think it may have regressed between 2.6.31 and 2.6.32. I will do more tests tomorrow.
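
As a quick check of those three timings, a small sketch that expresses each kernel's time relative to 2.6.31. The dictionary and function names are illustrative; the numbers are the ones quoted in this comment.

```python
# poppler.trace timings in seconds on G41, as quoted in the comment above.
TIMINGS = {"2.6.31": 8.533, "2.6.32": 9.732, "2.6.34": 9.302}


def relative_to(baseline, timings):
    """Percentage change of each timing relative to timings[baseline]."""
    base = timings[baseline]
    return {kernel: (secs / base - 1.0) * 100.0 for kernel, secs in timings.items()}


if __name__ == "__main__":
    for kernel, pct in sorted(relative_to("2.6.31", TIMINGS).items()):
        print(f"{kernel}: {pct:+.1f}%")
```

Relative to 2.6.31, 2.6.32 is about 14% slower and 2.6.34 about 9% slower, which supports narrowing the search to the 2.6.31 to 2.6.32 range.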
Comment 9 Jari Tahvanainen 2016-11-03 10:05:27 UTC
Closing verified+fixed. No activity in ~6 years.
