Summary: | Over 15% performance lost on large branching shader | ||
---|---|---|---|
Product: | Mesa | Reporter: | Kevin Rogovin <kevin.rogovin> |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED DUPLICATE | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | ||
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=110344 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | output from Mesa 18.2, output from Mesa Git | ||
Description
Kevin Rogovin
2019-04-12 11:23:10 UTC
Created attachment 143949
output from Mesa Git
Hi Kevin,

I've compiled the test and run it, but I'm not sure how to compare FPS. How did you check them? Did you use special tools for it, or some flag in the test, or something else?

Press the "L" key (at least on US keyboards) to bring up a jazz with FPS and other things. At startup, a list of what all the key presses do is printed to stdout. If you are sufficiently masochistic, you can run the program with the single command line argument "--help" to see all command line options.

Just to make sure all is good, did the demos as-is draw a wall of text to the screen?

-Kevin

Looking at the attached shader assembly...

Mesa 18.2:
SIMD8 shader: 2413 instructions. 11 loops. 131452 cycles. 0:0 spills:fills. Promoted 15 constants. Compacted 38608 to 27856 bytes (28%)

Mesa git:
SIMD8 shader: 2388 instructions. 11 loops. 120307 cycles. 0:0 spills:fills. Promoted 14 constants. Compacted 38208 to 27392 bytes (28%)

=> Both versions reach only SIMD8, and the new version uses fewer instructions.

Loops in the git version are shorter, except for the last two, which are marginally longer:

Mesa 18.2:
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -296 { align1 1Q };
while(8) JIP: -4496 { align1 1Q };
while(8) JIP: -1136 { align1 1Q };
while(8) JIP: -1136 { align1 1Q };

Mesa git:
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -288 { align1 1Q };
while(8) JIP: -4424 { align1 1Q };
while(8) JIP: -1144 { align1 1Q };
while(8) JIP: -1144 { align1 1Q };

At maximum, the old code seems to have 61 live regs, the new one 62.
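To put numbers on the comparison above, here is a minimal sketch that parses the two statistics lines quoted in the comment and prints the relative change in instruction count and estimated cycles. The regular expression and the helper names (`parse_stats`, `percent_change`) are assumptions made for illustration; they are not part of Mesa or of any tool mentioned in this report. The cycle count is the compiler's static estimate, so a lower figure does not by itself predict run-time behaviour in branching code.

```python
import re

# Statistics lines copied verbatim from the comment above.
STATS = {
    "Mesa 18.2": ("SIMD8 shader: 2413 instructions. 11 loops. 131452 cycles. "
                  "0:0 spills:fills. Promoted 15 constants. "
                  "Compacted 38608 to 27856 bytes (28%)"),
    "Mesa git": ("SIMD8 shader: 2388 instructions. 11 loops. 120307 cycles. "
                 "0:0 spills:fills. Promoted 14 constants. "
                 "Compacted 38208 to 27392 bytes (28%)"),
}

# Hypothetical pattern written for this illustration only.
PATTERN = re.compile(
    r"SIMD(?P<simd>\d+) shader: (?P<instructions>\d+) instructions\. "
    r"(?P<loops>\d+) loops\. (?P<cycles>\d+) cycles\. "
    r"(?P<spills>\d+):(?P<fills>\d+) spills:fills\."
)


def parse_stats(line):
    """Return the numeric fields of one statistics line as a dict of ints."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised stats line: {line!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means smaller)."""
    return (new - old) / old * 100.0


old = parse_stats(STATS["Mesa 18.2"])
new = parse_stats(STATS["Mesa git"])

for key in ("instructions", "cycles"):
    change = percent_change(old[key], new[key])
    print(f"{key}: {old[key]} -> {new[key]} ({change:+.2f}%)")
```

Both figures come out lower for the git build, which is why the measured frame rate regression is surprising.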
Both Mesa and my own (crappy) ISA analyzer think that the new version (which has more lrp & mad reg bank conflicts) should use fewer cycles, but in branching code that can't really be predicted, as it depends so much on which branches get selected.

For this test, the branches that get hit are all the same. Did you get the demo to run to verify the performance drop? If you add "painter_use_uber_item_shader false" to the command line, that should make the shader much less uber-ish for analysis (though I confess I have not compared the benchmark numbers for this case yet).

Hi,

Apparently I added a show_framerate option which prints to stdout the average frame time across all frames. To use it, add "show_framerate true" to the command line. If one pulls (i.e. git commit 203b84c336c0c013cae670766182c5ea81cd0711 or newer), there is a "warm-up counter" to avoid including the first N frames in the average.

-Kevin

Hi guys,

Kevin, thanks for the tip - it works. I've bisected Mesa between mesa-18.2.8 (785e09e3b3) and the latest master version of Mesa (04e672257c) on Skylake with Intel® HD Graphics 520. Bisect brought me to the commit

a920979d4f30a48a23f8ff375ce05fa8a947dd96
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Fri Nov 16 10:46:27 2018 -0600

intel/fs: Use split sends for surface writes on gen9+

Surface reads don't need them because they just have the one address payload. With surface writes, on the other hand, we can put the address and the data in the different halves and avoid building the payload all together.

The decrease in register pressure and added freedom in register allocation resulting from this change reduces spilling enough to improve the performance of one customer benchmark by about 2x.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Bad commits had 60 FPS, good commits had 70 FPS on my machine.

Thank you for the work of finding the offending commit! I confess, though, that this leaves even more mysteries, since the commit message states the change is only for surface write messages, and the shaders in the benchmark should have surface writes only at the very end: writing to the render target (dual-src). Hopefully, someone from the Intel Mesa team will pick this up and investigate.

Hi guys,

It looks like this is a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=110344. Jason has described the full scope of the work there. I'm adding that ticket to the 'See Also' section.

*** This bug has been marked as a duplicate of bug 110344 ***
*** This bug has been marked as a duplicate of bug 109507 ***
*** This bug has been marked as a duplicate of bug 109517 ***
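For reference, the bisect figures reported in the thread (70 FPS on good commits, 60 FPS on bad ones) can be expressed both as a frame rate drop and as a frame time increase. This is a small sketch with hypothetical helper names, using only the two numbers from that comment, which were measured on the bisecter's machine. The two percentages differ because frame time is the reciprocal of frame rate, so which one matches the "over 15%" in the summary depends on which quantity was measured.

```python
def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


def fps_drop_percent(good_fps, bad_fps):
    """Frame rate lost going from good to bad, relative to the good rate."""
    return (good_fps - bad_fps) / good_fps * 100.0


def frame_time_increase_percent(good_fps, bad_fps):
    """Frame time gained going from good to bad, relative to the good time."""
    good_ms = frame_time_ms(good_fps)
    bad_ms = frame_time_ms(bad_fps)
    return (bad_ms - good_ms) / good_ms * 100.0


GOOD_FPS = 70.0  # reported for good commits during the bisect
BAD_FPS = 60.0   # reported for bad commits during the bisect

print(f"frame time: {frame_time_ms(GOOD_FPS):.2f} ms -> "
      f"{frame_time_ms(BAD_FPS):.2f} ms")
print(f"FPS drop: {fps_drop_percent(GOOD_FPS, BAD_FPS):.1f}%")
print(f"frame time increase: "
      f"{frame_time_increase_percent(GOOD_FPS, BAD_FPS):.1f}%")
```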