Bug 109517

Summary:	[GEN9+] 14-24% perf drop in SynMark2 v7 CSDof
Product:	Mesa
Component:	Drivers/DRI/i965
Status:	RESOLVED MOVED
Severity:	normal
Priority:	medium
Version:	git
Hardware:	Other
OS:	All
Reporter:	Eero Tamminen <eero.t.tamminen>
Assignee:	Jason Ekstrand <jason>
QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
CC:	fdbugs, kenneth, kevin.rogovin, mattst88
See Also:	https://bugs.freedesktop.org/show_bug.cgi?id=110745
Whiteboard:
i915 platform:
i915 features:

Description Eero Tamminen 2019-01-31 11:34:53 UTC

Mesa performance in the SynMark2 v7 CSDof test dropped 24% on SKL GT4e and 14-20% on other GEN9+ platforms.

Setup:
- Ubuntu 18.04
- Unity / compiz desktop
- Mesa from git
- X server and drm-tip kernel from git within the last 1-2 months, with modifier support enabled
- FullHD monitor

Test-case:
- synmark2 OglCSDof
  (configured to run in fullscreen FullHD resolution.)

The regression happened between the following commits:
* 2019-01-28 17:50:08 41a0acd6a1: Switch imx to kmsro and remove the imx winsys
* 2019-01-30 17:49:45 f4eb746ef7: r600: add -Wstrict-overflow=0 to meson to silence the warning
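
For illustration only: a window like the one above can be narrowed by bisection, testing the midpoint commit and keeping the half that still contains the drop. The sketch below is a minimal Python model of that search, not the tooling actually used for this bug. The commit names, FPS figures and the 7% threshold are all made-up placeholders.

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and the result never flips back
    to good once it has turned bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return commits[bad]


# Hypothetical measurements (frames per second) for placeholder commits.
fps = {"c0": 100, "c1": 100, "c2": 100, "c3": 100,
       "c4": 100, "c5": 80, "c6": 80, "c7": 80}
commits = list(fps)
baseline = fps[commits[0]]


def is_regressed(commit):
    # Hypothetical threshold: treat anything >7% below baseline as bad,
    # so that normal run-to-run variance is not flagged.
    return fps[commit] < 0.93 * baseline


print(first_bad_commit(commits, is_regressed))  # c5
```

For a perf regression the predicate has to be a threshold on a measured score rather than a pass/fail result, so the threshold should sit comfortably outside the benchmark's normal variance.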

The CSDof compute shaders spill registers, so a change in register usage (e.g. SENDS support, which is GEN9+ specific) is a likely cause of the perf regression.

Comment 1 Eero Tamminen 2019-02-01 09:09:37 UTC

FYI: None of the few other compute shader tests for which I have data are impacted, but they don't spill either.

Comment 2 Mark Janes 2019-02-02 01:32:32 UTC

Bisected to the series ending in:

a920979d4f30a48a23f8ff375ce05fa8a947dd96
Author:     Jason Ekstrand <jason@jlekstrand.net>
intel/fs: Use split sends for surface writes on gen9+

Surface reads don't need them because they just have the one address
payload.  With surface writes, on the other hand, we can put the address
and the data in the different halves and avoid building the payload all
together.

The decrease in register pressure and added freedom in register
allocation resulting from this change reduces spilling enough to improve
the performance of one customer benchmark by about 2x.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
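
To illustrate the register-pressure argument in the commit message above, here is a toy Python model (not Mesa code). It assumes the worst case for a single-payload send, where the address and data values stay live while they are copied into one contiguous block; real register coalescing can often avoid that copy. The register counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceWrite:
    addr_regs: int  # registers holding the address payload
    data_regs: int  # registers holding the data to write


def single_send_cost(msg):
    """Return (peak live registers, largest contiguous block needed)."""
    payload = msg.addr_regs + msg.data_regs
    # Worst case: the sources are still live while being copied into
    # one contiguous payload, so both copies exist at the same time.
    return payload + payload, payload


def split_send_cost(msg):
    """Return (peak live registers, largest contiguous block needed)."""
    # Address and data are sent from wherever they already live, so no
    # combined payload is built and each half only has to be contiguous
    # on its own.
    total = msg.addr_regs + msg.data_regs
    return total, max(msg.addr_regs, msg.data_regs)


# Hypothetical write: 2 address registers, 8 data registers.
write = SurfaceWrite(addr_regs=2, data_regs=8)
print(single_send_cost(write))  # (20, 10)
print(split_send_cost(write))   # (10, 8)
```

The model only shows why split sends are expected to lower pressure in principle. It says nothing about how the register allocator or scheduler actually behaves on a given shader.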

Comment 3 Eero Tamminen 2019-02-11 16:59:35 UTC

There was also a very small (<1%) drop in Sascha Willems' Vulkan Raytracing demo.

Comment 4 Jason Ekstrand 2019-02-11 17:30:02 UTC

Ugh...  I'm not really sure what we should do about this one.  Mark's bisect is exactly correct.  I've looked at the shaders, and there seem to be two issues:

 1) There's one SIMD8 shader which schedules massively differently for no apparent reason.

 2) There's a SIMD16 shader which starts spilling way more than it was before.

In both cases, I have no idea why it's happening beyond the fact that our current RA and scheduling have rather random behaviour at times.  Using SENDS should only ever decrease register pressure and increase RA freedom because it no longer has to build the message into a single hunk and can just send the two bits (address and data) separately.

As I said in the commit message, I have another (unfortunately not yet public) customer workload where the opposite happens: using SENDS decreases spilling and improves performance by 2x.

Ken, Matt, any thoughts?

Comment 5 Eero Tamminen 2019-02-12 09:30:00 UTC

(In reply to Jason Ekstrand from comment #4)
> Ugh...  I'm not really sure what we should do about this one.  Mark's bisect
> is exactly correct.  I've looked at the shaders, and there seem to be two
> issues:
> 
>  1) There's one SIMD8 shader which schedules massively differently for no
> apparent reason.
>
>  2) There's a SIMD16 shader which starts spilling way more than it was before

Based on your comment below, I assume that the SIMD8 shader also got worse, but does it also spill?  I.e. is the bad behavior limited to spilling shaders?

 
> In both cases, I have no idea why it's happening beyond the fact that our
> current RA and scheduling have rather random behaviour at times.  Using SENDS
> should only ever decrease register pressure and increase RA freedom because
> it no longer has to build the message into a single hunk and can just send
> the two bits (address and data) separately.
> 
> As I said in the commit message I have another (unfortunately not public
> yet) customer workload where the opposite happens and using SENDS decreases
> spilling and improves performance by 2x.

SENDS support is a needed performance feature.  If the current implementation improves things more than it regresses them, and especially if the improving cases are the more important ones, as here, I think letting the regression in for the release is fine.

There could, though, be a meta-bug for RA / scheduler related issues, which this bug (and e.g. bugs about bad sampler fetch scheduling) would link to.

Comment 6 Jason Ekstrand 2019-02-14 17:20:19 UTC

We had a discussion about this today and determined that we'd leave the bug open but drop it from the 19.0 release tracker.

Comment 7 Eero Tamminen 2019-04-17 12:45:43 UTC

Perf improved 5-10% (depending on platform) between the following commits:
* 2019-02-28 17:30:48 df5cd51259: gitlab-ci: install xmllint to validate 00-mesa-defaults.conf
* 2019-03-01 16:46:32 fc82ea1350: Revert "swr/rast: Archrast codegen updates"

I assume the improvement comes from the nir/copy_prop_vars series.

Comment 8 Jason Ekstrand 2019-04-17 14:35:33 UTC

(In reply to Eero Tamminen from comment #7)
> I assume the improvement comes from the nir/copy_prop_vars series.

Very likely.

Comment 9 Jason Ekstrand 2019-04-17 14:36:08 UTC

*** Bug 110344 has been marked as a duplicate of this bug. ***

Comment 10 Jason Ekstrand 2019-04-17 14:36:59 UTC

*** Bug 110412 has been marked as a duplicate of this bug. ***

Comment 11 GitLab Migration User 2019-09-25 20:29:32 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1786.
