Bug 77957

Summary: Variably-indexed constant arrays result in terrible shader code
Product: Mesa Reporter: Kenneth Graunke <kenneth>
Component: Drivers/DRI/i965Assignee: Kenneth Graunke <kenneth>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: enhancement    
Priority: medium CC: eero.t.tamminen, kenneth
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 77547    
Attachments: First cut at a patch to implement lowering const arrays to uniforms

Description Kenneth Graunke 2014-04-26 06:50:16 UTC
Our vec4 backend generates rubbish for code such as:

   const ivec2 offsets[] =
      ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1), ivec2(0, -1),
              ivec2(0, 0), ivec2(0, 1), ivec2(1, -1), ivec2(1, 0), ivec2(1, 1));

   ivec2 offset = offsets[<non-constant expression>];

The constant array is stored as a temporary value in ordinary VGRFs, and then demoted to scratch due to the variable indexing.

This means:
1. Piles of MOVs from immediates into registers (not to mention wasted registers!)
2. Piles of scratch writes to move those registers into scratch.
3. Finally, a scratch read for the access.

We could instead demote constant arrays to uniforms.  This would eliminate the scratch writes, moves, and wasted registers.  The pull load would be approximately the same cost as the scratch read.

If we see constant arrays in the backend, I believe we can assume that they're variably-indexed...otherwise opt_array_splitting would have handled them.

I'm not sure whether to implement this at the GLSL IR level or in the backend.
Comment 1 Kenneth Graunke 2014-04-26 08:12:30 UTC
Created attachment 98008 [details] [review]
First cut at a patch to implement lowering const arrays to uniforms

Here's a first cut at a patch.  I haven't run Piglit on it yet; there are probably some cases that are broken.

In a terrain rendering microbenchmark, this cuts the number of VS instructions by nearly half, and approximately doubles the performance.
Comment 2 Kenneth Graunke 2014-04-27 06:56:23 UTC
I talked with Ilia Mirkin about this tonight on #dri-devel.  Apparently, this pattern results in some pretty ridiculous TGSI and nv50 assembly as well, so it sounds like applying this transformation would be worthwhile for everybody.  We probably don't need to make it optional.
Comment 3 Kenneth Graunke 2014-10-09 13:56:01 UTC
*** Bug 84758 has been marked as a duplicate of this bug. ***
Comment 4 Kenneth Graunke 2014-10-09 13:57:44 UTC
Jason is planning on implementing indirect addressing support for arrays, which should fix this.
Comment 5 Kenneth Graunke 2014-11-07 07:58:30 UTC
Fixed - with my new lowering pass, it no longer generates terrible code.

commit 4f22db5fbbe59eacb762aa410f18c3078e85c2b7
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Apr 26 00:18:54 2014 -0700

    glsl: Lower constant arrays to uniform arrays.
    
    Consider GLSL code such as:
    
       const ivec2 offsets[] =
          ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
                  ivec2(0, -1),  ivec2(0, 0),  ivec2(0, 1),
                  ivec2(1, -1),  ivec2(1, 0),  ivec2(1, 1));
    
       ivec2 offset = offsets[<non-constant expression>];
    
    Both i965 and nv50 currently handle this very poorly.  On i965, this
    becomes a pile of MOVs to load the immediate constants into registers,
    a pile of scratch writes to move the whole array to memory, and one
    scratch read to actually access the value - effectively the same as if
    it were a non-constant array.
    
    We'd much rather upload large blocks of constant data as uniform data,
    so drivers can simply upload the data via constbufs, and not have to
    populate it via shader instructions.
    
    This is currently non-optional because both i965 and nouveau benefit
    from it, and according to Marek radeonsi would benefit today as well.
    (According to Tom, radeonsi may want to handle this itself in the long
    term, but we can always add a flag when it becomes useful.)
    
    Improves performance in a terrain rendering microbenchmark by about 2x,
    and cuts the number of instructions in about half.  Helps a lot of
    "Natural Selection 2" shaders, as well as one "HOARD" shader.
    
    total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
    instructions in affected programs:     5880 -> 4186 (-28.81%)
    
    v2: Use ir_var_hidden to avoid exposing the new uniform via the GL
        uniform introspection API.
    
    v3: Alphabetize Makefile.sources properly.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.