Bug 94080

Summary: [HSW] intel_do_flush_locked failed: Invalid argument in dEQP-GLES31.functional.compute.indirect_dispatch.upload_buffer.single_invocation
Product: Mesa
Reporter: Ilia Mirkin <imirkin>
Component: Drivers/DRI/i965
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
Status: RESOLVED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:    
Bug Blocks: 94448    
Attachments: INTEL_DEBUG=bat,buf output prior to error

Description Ilia Mirkin 2016-02-10 16:58:13 UTC
Created attachment 121652
INTEL_DEBUG=bat,buf output prior to error

I reliably get the attached (INTEL_DEBUG=bat,buf) error when running:

MESA_GLES_VERSION_OVERRIDE=3.1 ./deqp-gles31 --deqp-visibility=hidden --deqp-case='dEQP-GLES31.functional.compute.indirect_dispatch.upload_buffer.single_invocation'

Note that I have Ken's recent patch series to fix up some compute state tracking: https://patchwork.freedesktop.org/series/3213/ although the error also happens without it.

This dEQP build is from https://android.googlesource.com/platform/external/deqp plus a minor patch to make it actually build (libpng.h -> png.h in CMakeLists.txt). You'll also need Xorg 1.18.1, or the relevant GLX patch, for it to work. Please ask if you have trouble getting it up and running or need any additional debug info; this reproduces 100% of the time for me.
Comment 1 Ilia Mirkin 2016-02-10 18:25:29 UTC
Upgrading from kernel 4.3.0 to 4.4.1 fixed it but... 2 things

(a) You shouldn't be exposing GL_ARB_compute_shader in this case
(b) exit(1) is *really* harsh on intel_do_flush_locked failure
Comment 2 Kenneth Graunke 2016-02-10 21:04:01 UTC
I actually addressed (a) in bd21b5460761560 ("i965: Only turn on ARB_compute_shader if we can write registers."), but only for desktop GL.  Presumably we need something that stops us from advertising ES 3.1 as well.

Regarding (b)...we've always done that.  We don't really know why the kernel returned an error from the execbuf2 ioctl, but several options are: 1) the GPU is toast (can't really continue).  2) the kernel has revoked our rights to talk to the GPU after hosing it repeatedly (shouldn't continue).  3) some out of memory condition (who knows what to do?).  4) the new command parser rejected our batch for doing bogus things (a bug in Mesa, so kind of like an assert).
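As a rough illustration of the four cases above, a driver could sort the execbuf2 return value into coarse buckets before deciding what to do. This is only a sketch: the enum, the function name, and the errno-to-cause mapping are assumptions made for illustration (the thread only confirms that the software command parser rejects with -EINVAL), and the return value is assumed to be 0 or a negative errno.

```c
#include <errno.h>

/* Hypothetical buckets for the causes listed above; not Mesa's real types. */
enum flush_failure {
    FLUSH_OK,
    FLUSH_GPU_LOST,       /* cases 1 and 2: GPU is toast, or access revoked */
    FLUSH_OUT_OF_MEMORY,  /* case 3 */
    FLUSH_BATCH_REJECTED, /* case 4: command parser refused the batch */
    FLUSH_UNKNOWN
};

/* 'ret' is assumed to be 0 on success or a negative errno on failure.
 * The mapping below is an assumption, not documented kernel behavior. */
static enum flush_failure classify_flush_error(int ret)
{
    if (ret == 0)
        return FLUSH_OK;

    switch (-ret) {
    case EIO:
        return FLUSH_GPU_LOST;
    case ENOMEM:
    case ENOSPC:
        return FLUSH_OUT_OF_MEMORY;
    case EINVAL:
        return FLUSH_BATCH_REJECTED;
    default:
        return FLUSH_UNKNOWN;
    }
}
```

Even with such a classification, only the last bucket points clearly at a Mesa bug; the others leave the driver with little it can do beyond reporting the failure.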

The last reason is the sketchiest.  IMHO the command parser is misdesigned - platforms with the hardware checker simply MI_NOOP disallowed things - but the Gen7/7.5-only software checker -EINVALs your program.  I think it should mimic the hardware behavior.  But, others disagree.
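The difference between the two checker policies can be shown with a toy model. This is a sketch under heavy simplification, not the kernel's command parser: every command is treated as a single dword (real commands vary in length), and TOY_FORBIDDEN and all function names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MI_NOOP   0u          /* MI_NOOP encodes as an all-zero dword */
#define TOY_FORBIDDEN 0xdeadbeefu /* hypothetical stand-in for a disallowed command */

static int toy_is_allowed(uint32_t cmd)
{
    return cmd != TOY_FORBIDDEN;
}

/* Hardware-checker style: overwrite disallowed commands with MI_NOOP
 * and let the rest of the batch run. Submission never fails. */
static int toy_check_noop(uint32_t *batch, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_is_allowed(batch[i]))
            batch[i] = TOY_MI_NOOP;
    }
    return 0;
}

/* Software-checker style: refuse the whole batch on the first
 * disallowed command, leaving the batch untouched. */
static int toy_check_reject(const uint32_t *batch, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_is_allowed(batch[i]))
            return -EINVAL;
    }
    return 0;
}
```

With the first policy the application keeps running, possibly with missing rendering or compute results; with the second, userspace sees the execbuf2 failure described in this bug.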

So, that's where we're at.  *shrug*
Comment 3 Ilia Mirkin 2016-02-10 21:08:38 UTC
(In reply to Kenneth Graunke from comment #2)
> I actually addressed (a) in bd21b5460761560 ("i965: Only turn on
> ARB_compute_shader if we can write registers."), but only for desktop GL. 
> Presumably we need something that stops us from advertising ES 3.1 as well.

I was force-enabling GLES 3.1. However GL_ARB_compute_shader was exposed for me in Linux kernel 4.3.0. I guess there's more to it? This specifically had to do with indirect compute dispatch, I believe separately from indirect draws.

> 
> Regarding (b)...we've always done that.  We don't really know why the kernel
> returned an error from the execbuf2 ioctl, but several options are: 1) the
> GPU is toast (can't really continue).  2) the kernel has revoked our rights
> to talk to the GPU after hosing it repeatedly (shouldn't continue).  3) some
> out of memory condition (who knows what to do?).  4) the new command parser
> rejected our batch for doing bogus things (a bug in Mesa, so kind of like an
> assert).
> 
> The last reason is the sketchiest.  IMHO the command parser is misdesigned -
> platforms with the hardware checker simply MI_NOOP disallowed things - but
> the Gen7/7.5-only software checker -EINVALs your program.  I think it should
> mimic the hardware behavior.  But, others disagree.
> 
> So, that's where we're at.  *shrug*

Yeah, dealing with unexpected errors sucks. I think the ultimate move is to tear down the context and start from scratch. And start returning errors if you can't bring things up properly. You're going to have to deal with this for proper robustness support eventually, but I agree this is a giant pain :)
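The "tear down and start over" idea could look roughly like the following. This is a minimal sketch, not i965 code: struct toy_context, toy_flush, and the callback stubs are all hypothetical, and a real driver would also have to report the loss through the GL robustness mechanisms.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver context; not Mesa's real structure. */
struct toy_context {
    bool lost;       /* set once the context could not be rebuilt */
    unsigned resets; /* number of successful rebuilds */
};

typedef int (*toy_submit_fn)(void);    /* returns 0 or a negative errno */
typedef bool (*toy_recreate_fn)(void); /* returns true if the rebuild worked */

/* Instead of exit(1), a failed submission drops the current state and
 * tries to rebuild the context. If the rebuild fails, the context is
 * marked lost and every later flush returns an error immediately. */
static int toy_flush(struct toy_context *ctx,
                     toy_submit_fn submit, toy_recreate_fn recreate)
{
    if (ctx->lost)
        return -EIO;

    int ret = submit();
    if (ret == 0)
        return 0;

    if (recreate())
        ctx->resets++;
    else
        ctx->lost = true;

    return ret; /* the failed batch is still reported to the caller */
}

/* Stand-in callbacks, for illustration only. */
static int toy_submit_ok(void)     { return 0; }
static int toy_submit_fail(void)   { return -EINVAL; }
static bool toy_recreate_ok(void)   { return true; }
static bool toy_recreate_fail(void) { return false; }
```

The failed batch is lost either way; the point of the design is that the process survives and the application gets a chance to react.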
Comment 4 Matt Turner 2016-11-03 01:11:52 UTC
The test passes for me on HSW (now that we expose ES 3.1). The original bug is fixed. RESOLVED.