Bug 23743

Summary:	For loop from 0 to 0 not optimized out
Product:	Mesa	Reporter:	Hans Nieser <hans>
Component:	Mesa core	Assignee:	Ian Romanick <idr>
Status:	RESOLVED FIXED	QA Contact:
Severity:	normal
Priority:	medium
Version:	git
Hardware:	All
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	29044
Attachments:	phong vertex shader phong pixel shader

Description Hans Nieser 2009-09-06 08:12:23 UTC

Created attachment 29268
phong vertex shader

I have a basic Phong GLSL shader (attached) with a for loop in its pixel shader. The loop runs up to MAX_LIGHTS, which in my case on Intel hardware is #defined as 0, so the loop should be skipped entirely. It is not, however: during compilation Mesa prints "Note: 'for (i ... )' body is too large/complex to unroll", and it then seems to run the loop body at least once anyway, since all my lighting is off (which makes sense, as I didn't set any parameters for the OpenGL lights) and performance plummets. If I remove the entire loop (#if 0/#endif), the lighting is fine and the FPS is much higher.

Obviously, it is simple to put #if MAX_LIGHTS > 0 around the loop as a workaround, but this might be a problem for other applications, so I thought I would report it anyway.
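For reference, the preprocessor guard described above could look roughly like the sketch below. It is written in C rather than GLSL, since the #if/#else/#endif syntax is the same in both; the function shade, the parameter light_intensity, and the ambient value are hypothetical and are not taken from the attached shaders.

```c
/* Hypothetical stand-in for the shader's define; 0 mirrors the Intel case. */
#define MAX_LIGHTS 0

/* Sums per-light contributions on top of an arbitrary ambient term.
 * The names and values here are illustrative assumptions only. */
static float shade(const float *light_intensity)
{
    float color = 0.25f; /* ambient term, arbitrary value */
#if MAX_LIGHTS > 0
    /* Only compiled when at least one light is configured. */
    for (int i = 0; i < MAX_LIGHTS; i++)
        color += light_intensity[i];
#else
    (void)light_intensity; /* no lights compiled in, loop removed entirely */
#endif
    return color;
}
```

With MAX_LIGHTS defined as 0, the loop never reaches the compiler, so the result is just the ambient term regardless of what the light array contains.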

I am using Mesa from git (my latest pull may be a few days old at the time of writing) with an Intel® GM45 Express integrated GPU.

Comment 1 Hans Nieser 2009-09-06 08:12:48 UTC

Created attachment 29269
phong pixel shader

Comment 2 Ian Romanick 2010-08-27 10:48:38 UTC

I added the piglit test glsl-fs-loop-zero-iter to reproduce this behavior.  The test produces the correct result, but, as can be seen in the INTEL_DEBUG=wm output below, the loop is not optimized away.

brw_wm_glsl_emit:
pre-fp:
# Fragment Program/Shader 3
  0: MOV OUTPUT[1], CONST[0];
  1: MOV TEMP[0].x, CONST[0].xxxx;
  2: BGNLOOP; # (end at 10)
  3:    SGE.C TEMP[1].x, TEMP[0].xxxx, CONST[0].xxxx;
  4:    IF (NE.xxxx); # (if false, goto 6);
  5:       BRK (TR.xxxx); # (goto 10);
  6:    ENDIF;
  7:    MOV OUTPUT[1], CONST[0].yxzw;
  8:    ADD TEMP[1].x, TEMP[0].xxxx, CONST[0].yyyy;
  9:    MOV TEMP[0].x, TEMP[1].xxxx;
 10: ENDLOOP; # (goto 2)
 11: END

pass_fp:
  0: MOV OUTPUT[1], CONST[0];
  1: MOV TEMP[0].x, CONST[0].xxxx;
  2: BGNLOOP; # (end at 10)
  3: SGE.C TEMP[1].x, TEMP[0].xxxx, CONST[0].xxxx;
  4: IF (NE.xxxx); # (if false, goto 6);
  5: BRK (TR.xxxx); # (goto 10);
  6: ENDIF;
  7: MOV OUTPUT[1], CONST[0].yxzw;
  8: ADD TEMP[1].x, TEMP[0].xxxx, CONST[0].yyyy;
  9: MOV TEMP[0].x, TEMP[1].xxxx;
 10: ENDLOOP; # (goto 2)
 11: FB_WRITE  ???, OUTPUT[1], FILE14[30], OUTPUT[0];

wm-native:
mov(1)          a0<1>UW         0x00e0UW                        { align1 };
mov(8)          g9<1>F          0F                              { align1 };
mov(8)          g10<1>F         1F                              { align1 };
mov(8)          g11<1>F         0F                              { align1 };
mov(8)          g12<1>F         0F                              { align1 };
mov(8)          g13<1>F         0F                              { align1 };
do(8)                                                           { align1 };
cmp.ge(8)       null            g13<8,8,1>F     0F              { align1 };
mov(8)          g14<1>F         0F                              { align1 };
(+f0) mov(8)    g14<1>F         1F                              { align1 };
(+f0) iff(8)    ip              ip              3D              { align1 switch };
break(8)                        ip              9D              { align1 };
endif(8)                        g0<4,4,1>UD     65536D          { align1 switch };
mov(8)          g9<1>F          1F                              { align1 };
mov(8)          g10<1>F         0F                              { align1 };
mov(8)          g11<1>F         0F                              { align1 };
mov(8)          g12<1>F         0F                              { align1 };
add(8)          g14<1>F         g13<8,8,1>F     1F              { align1 };
mov(8)          g13<1>F         g14<8,8,1>F                     { align1 };
while(8)                        ip              65524D          { align1 };
mov(8)          m2<1>F          g9<8,8,1>F                      { align1 };
mov(8)          m3<1>F          g10<8,8,1>F                     { align1 };
mov(8)          m4<1>F          g11<8,8,1>F                     { align1 };
mov(8)          m5<1>F          g12<8,8,1>F                     { align1 };
mov(8)          m1<1>F          g1<8,8,1>F                      { align1 nomask };
send(8) 0       null            g0<8,8,1>UW
                write (0, 12, 4, 0) mlen 6 rlen 0               { align1 EOT };

brw_wm_glsl_emit done:
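
To make the control flow in the pre-fp dump above easier to follow, here is a rough C transcription of instructions 0 through 10. The value of CONST[0] is an assumption inferred from the immediates in the wm-native dump (g9..g12 are loaded with 0, 1, 0, 0); the function name and the body_executions counter are hypothetical additions for illustration.

```c
typedef struct { float x, y, z, w; } vec4;

/* Emulates the pre-fp program; *body_executions counts loop-body runs. */
static vec4 run_fragment_program(int *body_executions)
{
    /* Assumed from the wm-native immediates, not stated in the dump. */
    const vec4 const0 = { 0.0f, 1.0f, 0.0f, 0.0f };

    vec4 output1 = const0;        /* 0: MOV OUTPUT[1], CONST[0] */
    float temp0_x = const0.x;     /* 1: MOV TEMP[0].x, CONST[0].xxxx */
    float temp1_x;

    *body_executions = 0;
    for (;;) {                    /* 2: BGNLOOP */
        /* 3: SGE.C TEMP[1].x, TEMP[0].xxxx, CONST[0].xxxx */
        temp1_x = (temp0_x >= const0.x) ? 1.0f : 0.0f;
        if (temp1_x != 0.0f)      /* 4: IF (NE.xxxx) */
            break;                /* 5: BRK */
        /* 7: MOV OUTPUT[1], CONST[0].yxzw */
        output1 = (vec4){ const0.y, const0.x, const0.z, const0.w };
        temp1_x = temp0_x + const0.y; /* 8: ADD */
        temp0_x = temp1_x;            /* 9: MOV */
        ++*body_executions;
    }                             /* 10: ENDLOOP */
    return output1;
}
```

Under these assumptions the exit test is true on the first pass, so the body never runs and OUTPUT[1] keeps its initial value. The loop structure and its comparison are still emitted, though, which is the missed optimization.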

Comment 3 Ian Romanick 2010-09-03 12:02:10 UTC

commit 7850ce0a9990c7f752e43a1dd88c204a7cf090aa
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Fri Aug 27 11:26:08 2010 -0700

    glsl2: Eliminate zero-iteration loops
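
As a rough illustration of the idea in the commit subject, and not the code from that commit, a loop whose start value, limit, and comparison are all known constants runs zero times exactly when its condition is already false on entry. The enum loop_cmp and the function loop_has_zero_iterations below are hypothetical names for this sketch.

```c
#include <stdbool.h>

/* Comparison used in the loop condition "i <cmp> to". */
enum loop_cmp { CMP_LESS, CMP_LEQUAL, CMP_GREATER, CMP_GEQUAL };

/* Returns true when "for (i = from; i <cmp> to; ...)" never runs its body,
 * i.e. when the condition fails for the initial value of i. */
static bool loop_has_zero_iterations(int from, enum loop_cmp cmp, int to)
{
    switch (cmp) {
    case CMP_LESS:    return !(from <  to);
    case CMP_LEQUAL:  return !(from <= to);
    case CMP_GREATER: return !(from >  to);
    case CMP_GEQUAL:  return !(from >= to);
    }
    return false; /* unknown comparison: conservatively keep the loop */
}
```

For the case reported here, a loop from 0 while i < 0, the check returns true and the whole loop could be dropped.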
