Bug 93418

Summary:	Geometry Shaders output wrong vertices on Sandy Bridge
Product:	Mesa	Reporter:	Link Mauve <bugs>
Component:	Drivers/DRI/i965	Assignee:	Matt Turner <mattst88>
Status:	RESOLVED FIXED	QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity:	normal
Priority:	medium	CC:	imirkin, sheepdestroyer
Version:	git
Hardware:	x86-64 (AMD64)
OS:	Linux (All)
Whiteboard:
i915 platform:		i915 features:
Attachments:	apitrace reproducing this bug (starts at frame 470) Screenshot from trace change copyprop condition

Description Link Mauve 2015-12-17 01:26:45 UTC

Created attachment 120555 [details]
apitrace reproducing this bug (starts at frame 470)

Some vertices seem to be wrongly put on the (0,0) (1,1) diagonal when using a geometry shader on Sandy Bridge.

Screenshots of this bug:
http://i.imgur.com/c8WhpF5.png
http://linkmauve.fr/files/dolphin.png (from the previous trace)

Related Dolphin bug report:
https://bugs.dolphin-emu.org/issues/9166

Comment 1 Ilia Mirkin 2016-01-16 21:11:42 UTC

Created attachment 121075 [details]
Screenshot from trace

What's not obvious in the previous screenshot is that all these lines stop at the diagonal. It looks like there's a fan vs strip type of situation going on here on top of whatever's causing the lines to appear.

The first draw with a GS shows the bad behaviour:

813290 @0 glDrawRangeElementsBaseVertex(mode = GL_POINTS, start = 0, end = 24630, count = 24630, type = GL_UNSIGNED_SHORT, indices = 0x15eb10, basevertex = 413659)

So it takes points in and outputs a triangle_strip. Perhaps that's where things are going wrong?

Comment 2 Ilia Mirkin 2016-01-16 22:06:07 UTC

And with master I'm getting:

brw_vec4.cpp:1797: void brw::vec4_visitor::convert_to_hw_regs(): Assertion `brw_is_single_value_swizzle(inst->src[i].swizzle)' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff607f167 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff607f167 in raise () from /lib64/libc.so.6
#1  0x00007ffff60804ca in abort () from /lib64/libc.so.6
#2  0x00007ffff6078296 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff6078342 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff286d630 in brw::vec4_visitor::convert_to_hw_regs (this=this@entry=0x4b59490) at brw_vec4.cpp:1797
#5  0x00007ffff286d991 in brw::vec4_visitor::run (this=this@entry=0x4b59490) at brw_vec4.cpp:1960
#6  0x00007ffff287b2d4 in brw::brw_compile_gs (compiler=0x9185d0, log_data=log_data@entry=0x7ffff7fd0040, 
    mem_ctx=mem_ctx@entry=0x4bc6510, key=key@entry=0x7fffffffd910, prog_data=prog_data@entry=0x7fffffffd6e0, 
    src_shader=<optimized out>, shader_prog=shader_prog@entry=0x4b4cac0, 
    shader_time_index=shader_time_index@entry=-1, final_assembly_size=final_assembly_size@entry=0x7fffffffd6d4, 
    error_str=error_str@entry=0x7fffffffd6d8) at brw_vec4_gs_visitor.cpp:914
#7  0x00007ffff27a308c in brw_codegen_gs_prog (brw=brw@entry=0x7ffff7fd0040, prog=prog@entry=0x4b4cac0, 
    gp=gp@entry=0x4b1d940, key=key@entry=0x7fffffffd910) at brw_gs.c:160
#8  0x00007ffff27a3604 in brw_gs_precompile (ctx=0x7ffff7fd0040, shader_prog=0x4b4cac0, prog=0x4b1d940)
    at brw_gs.c:288
#9  0x00007ffff27a3dc9 in brw_shader_precompile (sh_prog=0x4b4cac0, ctx=0x7ffff7fd0040) at brw_link.cpp:53
#10 brw_link_shader (ctx=0x7ffff7fd0040, shProg=<optimized out>) at brw_link.cpp:282
#11 0x00007ffff266833e in _mesa_glsl_link_shader (ctx=0x7ffff7fd0040, prog=0x4b4cac0)
    at program/ir_to_mesa.cpp:2962
#12 0x00007ffff256f35a in link_program (ctx=0x7ffff7fd0040, program=<optimized out>) at main/shaderapi.c:1048

The shader in question is, I believe, the same one as the one used in draw call 813290. Here are some debug prints from INTEL_DEBUG=gs (I skipped the GLSL IR bits):

NIR (SSA form) for geometry shader:
shader: MESA_SHADER_GEOMETRY
name: GLSL214
inputs: 0
outputs: 0
uniforms: 0
decl_var uniform INTERP_QUALIFIER_NONE ivec4 ctexoffset (2, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 clinept (1, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 cstereo (0, 0)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] colors_0 (VARYING_SLOT_VAR0, 26)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] pos (VARYING_SLOT_VAR1, 27)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 gl_Position (VARYING_SLOT_POS, 0)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 colors_0@0 (VARYING_SLOT_VAR0, 26)
decl_function main returning void

impl main {
        block block_0:
        /* preds: */
        vec1 ssa_0 = load_const (0x00000000 /* 0.000000 */)
        vec2 ssa_1 = load_const (0xbf800000 /* -1.000000 */, 0x3f800000 /* 1.000000 */)
        vec2 ssa_2 = load_const (0x3f800000 /* 1.000000 */, 0xbf800000 /* -1.000000 */)
        vec1 ssa_3 = load_const (0x00000010 /* 0.000000 */)
        vec4 ssa_4 = intrinsic load_ubo (ssa_0, ssa_3) () ()
        vec1 ssa_5 = frcp ssa_4
        vec1 ssa_6 = fmul ssa_4.w, ssa_5
        vec1 ssa_7 = frcp ssa_4.y
        vec1 ssa_8 = fmul -ssa_4.w, ssa_7
        vec2 ssa_9 = vec2 ssa_6, ssa_8
        vec4 ssa_10 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (27)    /* pos */
        vec2 ssa_11 = fmul ssa_9, ssa_10.ww
        vec4 ssa_12 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (26)    /* colors_0 */
        vec2 ssa_13 = fadd ssa_10, -ssa_11
        vec4 ssa_14 = vec4 ssa_13, ssa_13.y, ssa_10.z, ssa_10.w
        vec2 ssa_15 = ffma ssa_2, ssa_11, ssa_10
        vec4 ssa_16 = vec4 ssa_15, ssa_15.y, ssa_10.z, ssa_10.w
        vec2 ssa_17 = ffma ssa_1, ssa_11, ssa_10
        vec4 ssa_18 = vec4 ssa_17, ssa_17.y, ssa_10.z, ssa_10.w
        vec2 ssa_19 = fadd ssa_10, ssa_11
        vec4 ssa_20 = vec4 ssa_19, ssa_19.y, ssa_10.z, ssa_10.w
        intrinsic store_output (ssa_14, ssa_0) () (0, 15)       /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_0) () (0)
        vec1 ssa_21 = load_const (0x00000001 /* 0.000000 */)
        intrinsic store_output (ssa_16, ssa_0) () (0, 15)       /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_21) () (0)
        vec1 ssa_22 = load_const (0x00000002 /* 0.000000 */)
        intrinsic store_output (ssa_18, ssa_0) () (0, 15)       /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_22) () (0)
        vec1 ssa_23 = load_const (0x00000003 /* 0.000000 */)
        intrinsic store_output (ssa_20, ssa_0) () (0, 15)       /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_23) () (0)
        vec1 ssa_24 = load_const (0x00000004 /* 0.000000 */)
        intrinsic end_primitive_with_counter (ssa_24) () (0)
        intrinsic set_vertex_count (ssa_24) () ()
        /* succs: block_1 */
        block block_1:
}

NIR (final form) for geometry shader:
shader: MESA_SHADER_GEOMETRY
name: GLSL214
inputs: 0
outputs: 0
uniforms: 0
decl_var uniform INTERP_QUALIFIER_NONE ivec4 ctexoffset (2, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 clinept (1, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 cstereo (0, 0)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] colors_0 (VARYING_SLOT_VAR0, 26)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] pos (VARYING_SLOT_VAR1, 27)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 gl_Position (VARYING_SLOT_POS, 0)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 colors_0@0 (VARYING_SLOT_VAR0, 26)
decl_function main returning void

impl main {
        decl_reg vec2 r0
        decl_reg vec4 r1
        decl_reg vec4 r2
        decl_reg vec4 r3
        decl_reg vec4 r4
        block block_0:
        /* preds: */
        vec1 ssa_0 = load_const (0x00000000 /* 0.000000 */)
        vec2 ssa_1 = load_const (0xbf800000 /* -1.000000 */, 0x3f800000 /* 1.000000 */)
        vec2 ssa_2 = load_const (0x3f800000 /* 1.000000 */, 0xbf800000 /* -1.000000 */)
        vec1 ssa_3 = load_const (0x00000010 /* 0.000000 */)
        vec4 ssa_4 = intrinsic load_ubo (ssa_0, ssa_3) () ()
        vec1 ssa_5 = frcp ssa_4
        r0.x = fmul ssa_4.w, ssa_5
        vec1 ssa_7 = frcp ssa_4.y
        r0.y = fmul -ssa_4.w, ssa_7.x
        vec4 ssa_10 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (27)    /* pos */
        vec2 ssa_11 = fmul r0, ssa_10.ww
        vec4 ssa_12 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (26)    /* colors_0 */
        r1.xy = fadd ssa_10, -ssa_11
        r1.zw = imov ssa_10
        r2.xy = ffma ssa_2, ssa_11, ssa_10
        r2.zw = imov r1
        r3.xy = ffma ssa_1, ssa_11, ssa_10
        r3.zw = imov r2
        r4.xy = fadd ssa_10, ssa_11
        r4.zw = imov r3
        intrinsic store_output (r1, ssa_0) () (0, 15)   /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_0) () (0)
        vec1 ssa_21 = load_const (0x00000001 /* 0.000000 */)
        intrinsic store_output (r2, ssa_0) () (0, 15)   /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_21) () (0)
        vec1 ssa_22 = load_const (0x00000002 /* 0.000000 */)
        intrinsic store_output (r3, ssa_0) () (0, 15)   /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_22) () (0)
        vec1 ssa_23 = load_const (0x00000003 /* 0.000000 */)
        intrinsic store_output (r4, ssa_0) () (0, 15)   /* gl_Position */
        intrinsic store_output (ssa_12, ssa_0) () (26, 15)      /* colors_0 */
        intrinsic emit_vertex_with_counter (ssa_23) () (0)
        vec1 ssa_24 = load_const (0x00000004 /* 0.000000 */)
        intrinsic end_primitive_with_counter (ssa_24) () (0)
        intrinsic set_vertex_count (ssa_24) () ()
        /* succs: block_1 */
        block block_1:
}

GS Input VUE map (4 slots, non-SSO)
  [0] VARYING_SLOT_PSIZ
  [1] VARYING_SLOT_POS
  [2] VARYING_SLOT_VAR0
  [3] VARYING_SLOT_VAR1

GS Output VUE map (3 slots, non-SSO)
  [0] VARYING_SLOT_PSIZ
  [1] VARYING_SLOT_POS
  [2] VARYING_SLOT_VAR0

GS estimated execution time: 138 cycles

Comment 3 Ilia Mirkin 2016-01-18 00:47:25 UTC

Created attachment 121097 [details] [review]
change copyprop condition

Well, this patch "fixes" it for me, but I don't know if vstride is available at this point in the compiler flow. This makes the condition match what's in convert_to_hw_values though. The trace runs seemingly correctly with this.

Comment 4 Ilia Mirkin 2016-01-18 01:40:40 UTC

Actually I think this patch is a little better:

http://patchwork.freedesktop.org/patch/70688/

Comment 5 sheepdestroyer 2016-02-04 06:50:12 UTC

Hi,
what is the status on this bug? Has the patch been reviewed?
The intel driver on SNB has been blacklisted in dolphin for this feature. Lift waiting for resolution.
Thanks

Comment 6 Matt Turner 2016-02-04 07:00:15 UTC

I'll handle it. It's on my todo list.

Comment 7 Matt Turner 2016-02-04 18:47:18 UTC

I've sent a patch to the mailing list, in reply to Ilia's initial patch.

Comment 8 sheepdestroyer 2016-02-06 19:54:53 UTC

Thanks, I can confirm the original bug in dolphin has been solved with mesa-master

Comment 9 Matt Turner 2016-02-06 20:09:54 UTC

Thanks for testing!

commit 9f2e22bf343b21d6b44e6a502f00a86d169f5ade
Author: Matt Turner <mattst88@gmail.com>
Date:   Sun Jan 17 20:30:14 2016 -0500

    i965/vec4: don't copy ATTR into 3src instructions with complex swizzles

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.