Bug 87131 - [PNV regression] igt/gem_exec_lut_handle takes more than 10 minutes
Summary: [PNV regression] igt/gem_exec_lut_handle takes more than 10 minutes
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: PatchMerged
Keywords: regression
Depends on:
Blocks:
 
Reported: 2014-12-09 03:06 UTC by lu hua
Modified: 2017-07-05 19:32 UTC (History)
2 users (show)

See Also:
i915 platform: ALL
i915 features: GEM/PPGTT


Attachments
dmesg (115.31 KB, text/plain)
2014-12-09 03:06 UTC, lu hua
no flags Details

Description lu hua 2014-12-09 03:06:30 UTC
Created attachment 110601 [details]
dmesg

==System Environment==
--------------------------
Regression: not sure, it has bug 78023

Non-working platforms:PNV

==kernel==
--------------------------
drm-intel-nightly/bfdd01aa1825aa0068f9236b21362b550f6d630f
drm-intel-next-queued/bcfcc8ba29c933ee98c7f498ee7ccfa17be400dd
drm-intel-fixes/b0616c5306b342ceca07044dbc4f917d95c4f825

==Bug detailed description==
-----------------------------
It happens on PNV with drm-intel-nightly, drm-intel-next-queued and drm-intel-fixes kernel. It has bug 78023, I am unable to find out a good commit between latest commit and bug 78023 fixed commit.
output:
[root@x-pnv2 tests]# time ./gem_exec_lut_handle
IGT-Version: 1.8-g819e68f (i686) (Linux: 3.18.0_drm-intel-nightly_34d267_20141209+ i686)
relocation: buffers=   1: old=   8299 + 531.5*reloc, lut=   8555 + 514.9*reloc (ns)
relocation: buffers=   2: old=   9516 + 533.0*reloc, lut=  10054 + 519.0*reloc (ns)
relocation: buffers=   4: old=  10180 + 533.5*reloc, lut=  10201 + 516.7*reloc (ns)
relocation: buffers=   8: old=  12547 + 534.7*reloc, lut=  12365 + 517.0*reloc (ns)
relocation: buffers=  16: old=  17163 + 537.4*reloc, lut=  17001 + 520.6*reloc (ns)
relocation: buffers=  32: old=  27241 + 540.0*reloc, lut=  26238 + 523.0*reloc (ns)
relocation: buffers=  64: old=  55244 + 546.7*reloc, lut=  53265 + 526.8*reloc (ns)
relocation: buffers= 128: old= 112780 + 561.8*reloc, lut= 107378 + 546.3*reloc (ns)
relocation: buffers= 256: old= 229777 + 574.0*reloc, lut= 221390 + 549.4*reloc (ns)
relocation: buffers= 512: old= 460770 + 601.9*reloc, lut= 446514 + 571.0*reloc (ns)
relocation: buffers=1024: old=1440171 + 665.0*reloc, lut=1390482 + 623.0*reloc (ns)
relocation: buffers=2048: old=4711207 + 751.2*reloc, lut=4627285 + 669.5*reloc (ns)
skip-relocs: buffers=   1: old=   7186 + 121.8*reloc, lut=   7089 + 112.3*reloc (ns)
skip-relocs: buffers=   2: old=   7886 + 121.8*reloc, lut=   7798 + 112.3*reloc (ns)
skip-relocs: buffers=   4: old=   9128 + 121.8*reloc, lut=   9022 + 112.4*reloc (ns)
skip-relocs: buffers=   8: old=  11445 + 121.9*reloc, lut=  11192 + 112.5*reloc (ns)
skip-relocs: buffers=  16: old=  15966 + 122.1*reloc, lut=  15602 + 112.6*reloc (ns)
skip-relocs: buffers=  32: old=  26033 + 123.7*reloc, lut=  25124 + 113.1*reloc (ns)
skip-relocs: buffers=  64: old=  54052 + 130.7*reloc, lut=  51915 + 117.4*reloc (ns)
skip-relocs: buffers= 128: old= 112102 + 144.3*reloc, lut= 108181 + 129.6*reloc (ns)
skip-relocs: buffers= 256: old= 225810 + 165.3*reloc, lut= 221114 + 142.2*reloc (ns)
skip-relocs: buffers= 512: old= 450470 + 188.8*reloc, lut= 439379 + 169.8*reloc (ns)
skip-relocs: buffers=1024: old=1420360 + 244.3*reloc, lut=1375216 + 215.9*reloc (ns)
skip-relocs: buffers=2048: old=4695973 + 329.9*reloc, lut=4606190 + 264.9*reloc (ns)
no-relocs: buffers=   1: old=   7214 + 51.5*reloc, lut=   7227 + 44.1*reloc (ns)
no-relocs: buffers=   2: old=   7361 + 53.6*reloc, lut=   7908 + 44.1*reloc (ns)
no-relocs: buffers=   4: old=   8986 + 51.6*reloc, lut=   8899 + 44.6*reloc (ns)
no-relocs: buffers=   8: old=  12873 + 51.0*reloc, lut=  13569 + 43.3*reloc (ns)
no-relocs: buffers=  16: old=  15422 + 51.7*reloc, lut=  15119 + 44.4*reloc (ns)
no-relocs: buffers=  32: old=  24774 + 51.8*reloc, lut=  23764 + 45.1*reloc (ns)
no-relocs: buffers=  64: old=  52149 + 51.7*reloc, lut=  50101 + 44.4*reloc (ns)
no-relocs: buffers= 128: old= 108474 + 52.2*reloc, lut= 104396 + 44.8*reloc (ns)
no-relocs: buffers= 256: old= 219645 + 56.0*reloc, lut= 211694 + 47.6*reloc (ns)
no-relocs: buffers= 512: old= 432975 + 70.8*reloc, lut= 420700 + 60.7*reloc (ns)
no-relocs: buffers=1024: old=1363594 + 93.7*reloc, lut=1310922 + 77.0*reloc (ns)
no-relocs: buffers=2048: old=4591523 + 64.0*reloc, lut=4497553 + 46.4*reloc (ns)

real    11m30.839s
user    0m31.562s
sys     10m57.773s


==Reproduce steps==
---------------------------- 
1. ./gem_exec_lut_handle
Comment 1 Chris Wilson 2014-12-09 07:41:26 UTC
It is worth checking if the reloc time regressed. What is the output on drm-intel-fixes?
Comment 2 lu hua 2014-12-09 08:40:31 UTC
Test on latest drm-intel-fixes kernel, it takes 9+ minutest.
output:
[root@x-pnv2 tests]# time ./gem_exec_lut_handle
IGT-Version: 1.8-g819e68f (i686) (Linux: 3.18.0-rc7_drm-intel-fixes_b0616c_20141209+ i686)
relocation: buffers=   1: old=   8081 + 512.2*reloc, lut=   8002 + 499.5*reloc (ns)
relocation: buffers=   2: old=   5679 + 524.5*reloc, lut=   8733 + 499.8*reloc (ns)
relocation: buffers=   4: old=  10113 + 512.1*reloc, lut=   9883 + 499.5*reloc (ns)
relocation: buffers=   8: old=  11974 + 510.9*reloc, lut=  12842 + 502.0*reloc (ns)
relocation: buffers=  16: old=  16232 + 516.0*reloc, lut=  15976 + 499.5*reloc (ns)
relocation: buffers=  32: old=  25303 + 517.1*reloc, lut=  24376 + 507.0*reloc (ns)
relocation: buffers=  64: old=  50120 + 525.1*reloc, lut=  47786 + 512.3*reloc (ns)
relocation: buffers= 128: old= 104995 + 538.5*reloc, lut= 101171 + 523.2*reloc (ns)
relocation: buffers= 256: old= 220469 + 549.6*reloc, lut= 208159 + 533.4*reloc (ns)
relocation: buffers= 512: old= 430050 + 567.2*reloc, lut= 417591 + 548.2*reloc (ns)
relocation: buffers=1024: old=1143818 + 623.7*reloc, lut=1097252 + 593.9*reloc (ns)
relocation: buffers=2048: old=3667335 + 692.3*reloc, lut=3575493 + 641.8*reloc (ns)
skip-relocs: buffers=   1: old=   7056 + 119.1*reloc, lut=   6876 + 109.7*reloc (ns)
skip-relocs: buffers=   2: old=   7893 + 119.2*reloc, lut=   7745 + 109.7*reloc (ns)
skip-relocs: buffers=   4: old=   8921 + 119.2*reloc, lut=   8727 + 109.7*reloc (ns)
skip-relocs: buffers=   8: old=  10384 + 121.8*reloc, lut=  10706 + 109.8*reloc (ns)
skip-relocs: buffers=  16: old=  15339 + 119.5*reloc, lut=  15719 + 113.2*reloc (ns)
skip-relocs: buffers=  32: old=  24206 + 119.9*reloc, lut=  23253 + 110.3*reloc (ns)
skip-relocs: buffers=  64: old=  49104 + 124.3*reloc, lut=  46761 + 114.9*reloc (ns)
skip-relocs: buffers= 128: old= 103890 + 137.0*reloc, lut=  99817 + 126.9*reloc (ns)
skip-relocs: buffers= 256: old= 213578 + 149.1*reloc, lut= 205201 + 138.5*reloc (ns)
skip-relocs: buffers= 512: old= 424448 + 166.9*reloc, lut= 412640 + 155.1*reloc (ns)
skip-relocs: buffers=1024: old=1131231 + 224.9*reloc, lut=1089359 + 203.7*reloc (ns)
skip-relocs: buffers=2048: old=3661083 + 287.5*reloc, lut=3570173 + 252.8*reloc (ns)
no-relocs: buffers=   1: old=   6874 + 51.5*reloc, lut=   6895 + 44.1*reloc (ns)
no-relocs: buffers=   2: old=   7634 + 51.5*reloc, lut=   7636 + 44.1*reloc (ns)
no-relocs: buffers=   4: old=   8678 + 51.5*reloc, lut=   8646 + 44.2*reloc (ns)
no-relocs: buffers=   8: old=  10647 + 51.6*reloc, lut=  10483 + 44.2*reloc (ns)
no-relocs: buffers=  16: old=  14610 + 51.6*reloc, lut=  14261 + 44.3*reloc (ns)
no-relocs: buffers=  32: old=  23059 + 51.6*reloc, lut=  22257 + 44.4*reloc (ns)
no-relocs: buffers=  64: old=  46940 + 51.5*reloc, lut=  44469 + 44.3*reloc (ns)
no-relocs: buffers= 128: old= 100053 + 51.7*reloc, lut=  96143 + 44.3*reloc (ns)
no-relocs: buffers= 256: old= 204337 + 53.3*reloc, lut= 196056 + 45.1*reloc (ns)
no-relocs: buffers= 512: old= 414929 + 62.0*reloc, lut= 402343 + 70.6*reloc (ns)
no-relocs: buffers=1024: old=1107129 + 23.3*reloc, lut=1046326 + 28.8*reloc (ns)
no-relocs: buffers=2048: old=3537830 + 78.3*reloc, lut=3478675 + 47.5*reloc (ns)

real    9m51.192s
user    0m31.725s
sys     9m4.004s
Comment 3 Chris Wilson 2014-12-09 08:49:39 UTC
So handling of execbuffer buffer list is slower by about 10-20% and the relocation list by about 5-10%.
Comment 4 Chris Wilson 2015-04-08 18:29:03 UTC
A second sore point is the i915_gem_obj_to_vma(). I have a cunning plan, all I need is a turnip.
Comment 5 Chris Wilson 2015-04-08 20:16:21 UTC
On my pnv, I am down from 770s to 419s. So let's look forward to some improvements here.
Comment 6 ye.tian 2015-04-09 06:18:53 UTC
Tested on the latest fixed kernel, it takes 13+ minutes.

output:
-------------------
root@x-pnv1:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_exec_lut_handle
IGT-Version: 1.10-g1f6a64e (x86_64) (Linux: 4.0.0-rc6_drm-intel-fixes_840a1c_20150403+ x86_64)
relocation: buffers=   1: old=  12972 + 215.4*reloc, lut=  12751 + 215.1*reloc (ns)
relocation: buffers=   2: old=  13087 + 217.3*reloc, lut=  13860 + 215.0*reloc (ns)
relocation: buffers=   4: old=  16865 + 216.3*reloc, lut=  17707 + 214.9*reloc (ns)
relocation: buffers=   8: old=  19819 + 217.5*reloc, lut=  19344 + 217.1*reloc (ns)
relocation: buffers=  16: old=  26013 + 225.4*reloc, lut=  26422 + 220.2*reloc (ns)
relocation: buffers=  32: old=  46689 + 237.7*reloc, lut=  45972 + 233.7*reloc (ns)
relocation: buffers=  64: old=  87659 + 257.6*reloc, lut=  84603 + 252.0*reloc (ns)
relocation: buffers= 128: old= 168278 + 276.7*reloc, lut= 163484 + 272.5*reloc (ns)
relocation: buffers= 256: old= 377065 + 316.1*reloc, lut= 366626 + 300.5*reloc (ns)
relocation: buffers= 512: old=1194388 + 137.7*reloc, lut=1164111 + 189.0*reloc (ns)
relocation: buffers=1024: old=2440147 + 375.8*reloc, lut=2417108 + 306.7*reloc (ns)
relocation: buffers=2048: old=5664382 + 614.0*reloc, lut=5624147 + 406.1*reloc (ns)
skip-relocs: buffers=   1: old=   7842 + 49.6*reloc, lut=   7816 + 46.1*reloc (ns)
skip-relocs: buffers=   2: old=   8569 + 49.6*reloc, lut=   8663 + 46.0*reloc (ns)
skip-relocs: buffers=   4: old=   9556 + 49.6*reloc, lut=   9575 + 46.1*reloc (ns)
skip-relocs: buffers=   8: old=  11805 + 49.6*reloc, lut=  11644 + 46.2*reloc (ns)
skip-relocs: buffers=  16: old=  16352 + 49.7*reloc, lut=  16308 + 46.2*reloc (ns)
skip-relocs: buffers=  32: old=  29442 + 52.1*reloc, lut=  28865 + 47.8*reloc (ns)
skip-relocs: buffers=  64: old=  61269 + 65.5*reloc, lut=  60698 + 58.3*reloc (ns)
skip-relocs: buffers= 128: old= 119237 + 80.7*reloc, lut= 117896 + 73.0*reloc (ns)
skip-relocs: buffers= 256: old= 236430 + 98.5*reloc, lut= 233710 + 88.2*reloc (ns)
skip-relocs: buffers= 512: old= 594453 + 144.7*reloc, lut= 586675 + 116.5*reloc (ns)
skip-relocs: buffers=1024: old=2416857 + 256.1*reloc, lut=2387079 + 188.5*reloc (ns)
skip-relocs: buffers=2048: old=5653596 + 466.5*reloc, lut=5610695 + 287.8*reloc (ns)
no-relocs: buffers=   1: old=   8168 + 0.2*reloc, lut=   8378 + 0.1*reloc (ns)
no-relocs: buffers=   2: old=   8252 + 0.2*reloc, lut=   8338 + 0.2*reloc (ns)
no-relocs: buffers=   4: old=   9286 + 0.2*reloc, lut=   9216 + 0.2*reloc (ns)
no-relocs: buffers=   8: old=  11238 + 0.4*reloc, lut=  11268 + 0.1*reloc (ns)
no-relocs: buffers=  16: old=  15609 + 0.1*reloc, lut=  15421 + 0.1*reloc (ns)
no-relocs: buffers=  32: old=  27898 + 0.2*reloc, lut=  27345 + 0.2*reloc (ns)
no-relocs: buffers=  64: old=  58729 + 0.1*reloc, lut=  58143 + 0.1*reloc (ns)
no-relocs: buffers= 128: old= 114477 + 0.1*reloc, lut= 112812 + 0.3*reloc (ns)
no-relocs: buffers= 256: old= 227536 + 0.2*reloc, lut= 224279 + 0.3*reloc (ns)
no-relocs: buffers= 512: old= 569191 + 0.9*reloc, lut= 559735 + 1.2*reloc (ns)
no-relocs: buffers=1024: old=2362485 + 3.7*reloc, lut=2327493 + 4.8*reloc (ns)
no-relocs: buffers=2048: old=5496344 + -2.3*reloc, lut=5443557 + -2.0*reloc (ns)
SUCCESS (801.260s)

real    13m21.287s
user    0m3.221s
sys     13m16.822s
Comment 7 Humberto Israel Perez Rodriguez 2015-09-23 16:35:04 UTC
still happen on SKL-Y with the latest kernel :

Kernel commit log:
commit b4c4542ba1abfb0d3d6913504502573bf2c62b12
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Aug 28 15:51:30 2015 +0200
drm-intel-nightly: 2015y-08m-28d-13h-50m-34s UTC integration manifest
Comment 8 cprigent 2015-10-01 15:48:25 UTC
Bug scrub:
Test is long but result is Pass, priority updated to medium/normal.
Comment 9 maria guadalupe 2016-05-27 16:34:12 UTC
This is also reproduced on KBL with the next configuration

Gfx stack information
===============================================
--> Component : drm 
                 tag : libdrm-2.4.68-4-g7aab852 
                 commit : 7aab852 
 --> Component : mesa 
                 tag : mesa-11.1.2 
                 commit : 7bcd827 
 --> Component : cairo 
                 tag : 1.15.2 
                 commit : db8a7f1 
 --> Component : intel-gpu-tools
                tag : intel-gpu-tools-1.14-346-gcce2ff0
                commit : cce2ff0


Software information
============================

Kernel version                      : 4.6.0-nightly+
Linux distribution                  : Ubuntu 15.10
Architecture                        : 64-bit
Bios revision                       : 28.1
KSC revision                        : 1.15


Hardware information 
=============================

Platform                            : SKL-Y to KBL (RVP3)
Motherboard model                   : Kabylake Client platform
Motherboard type                    : Skylake Y LPDDR3 RVP3 Laptop
Motherboard manufacturer            : Intel Corporation
CPU family                          : Other
CPU information                     : Genuine Intel(R) CPU 0000 @ 0.90GHz
GPU Card                            : Intel Corporation Device 591e (prog-if 00 [VGA controller])


|=== kernel information ===|

commit 2ec823981d62c56d1511bda42b8295e31ece800f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun May 22 18:23:13 2016 +0200


========== test cases failed ===============
igt@gem_exec_lut_handle
Comment 10 yann 2016-08-04 10:22:42 UTC
Is this is still an issue? I am noticing that the original bug was reported against PNV with some regression data. Is it still occurring on PNV?

However on SKL & KBL, we are only considering the overall time and no real regression data. So removing for now SKL & KBL from the current platforms until we have concrete data here.
Comment 11 Chris Wilson 2016-08-04 10:38:26 UTC
Yes, the regression is still present. And that regression impacts all machines, though there is a separate regression for Baytrail that nobody else appears to have noticed.
Comment 12 Chris Wilson 2017-02-10 11:57:50 UTC
No need for bisecting, the original cause was the double lookup of handle to vma. Patches are still pending (though some are already in place).
Comment 13 Jari Tahvanainen 2017-04-25 06:16:46 UTC
Chris - any progress/news on patches still pending?
Comment 14 Ricardo 2017-05-09 17:30:29 UTC
Adding tag into "Whiteboard" field - ReadyForDev
The bug still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 15 Chris Wilson 2017-06-16 16:24:44 UTC
Two major improvements landed:

commit 4ff4b44cbb70c269259958cbcc48d7b8a2cb9ec8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 16 15:05:16 2017 +0100

    drm/i915: Store a direct lookup from object handle to vma

commit 2889caa9232109afc8881f29a2205abeb5709d0c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 16 15:05:19 2017 +0100

    drm/i915: Eliminate lots of iterations over the execobjects array

which used to be enough to not only fix the original regression but make a substantial improvement.

However, since the regression was noticed, I added two more iterations to the test (to exercise different paths through reloc) and we now interact with dma-fence and reservation_objects which are a huge step backwards.

Resolving as the original regression is fixed. I've some patches to improve scalabiltiy of reservation_objects, but the main cost of them are the atomics we so painstakingly avoided previously (because of this bug highlighting how expensive they were along this path).
Comment 16 Elizabeth 2017-06-16 21:34:26 UTC
In order to be able to close this bug, please test and share results. Thank you.
Comment 17 Ricardo 2017-06-30 23:52:32 UTC
Will close this issue as we do not have PNV to verify
Comment 18 Chris Wilson 2017-07-01 07:52:09 UTC
(In reply to Ricardo from comment #17)
> Will close this issue as we do not have PNV to verify

It was never PNV specific.

(In reply to Elizabeth from comment #16)
> In order to be able to close this bug, please test and share results. Thank
> you.

That is your job. I have given you the tools....
Comment 19 Ricardo Madrigal 2017-07-05 19:32:05 UTC
Works fine with the following configuration on BXT, GLK, KBL, SKL, BDW, BSW, HSW IVB and BYT.

This is my Configuration:

======================================
                 Kernel
======================================
commit df0182c2c95385492772c6e4ace76b463298b8ca
Author:     Jani Nikula <jani.nikula@intel.com>
AuthorDate: Mon Jul 3 16:20:53 2017 +0300
Commit:     Jani Nikula <jani.nikula@intel.com>
CommitDate: Mon Jul 3 16:20:53 2017 +0300

drm-tip: 2017y-07m-03d-13h-20m-24s UTC integration manifest


======================================
        Graphic drivers
======================================
Component: drm
    tag: libdrm-2.4.81-24-g3095cc8
    commit: 3095cc8eaba1aa87ad38c04ae2b1eabe30f7e16c

Component: cairo
    tag: 1.15.6-2-g57b4050
    commit: 57b40507dda3f58dfc8635548d606b86dc7bcf51

Component: intel-gpu-tools
    tag: intel-gpu-tools-1.19-57-g6fcc8e8
    commit: 6fcc8e8b247661c7950b998e0b95141ffbd6b833

Component: piglit
    tag: piglit-v1
    commit: c8f4fd9eeb298a2ef0855927f22634f794ef3eff
==========================================


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.