Bug 108630

Summary: [G965] piglit.spec.!opengl 1_2.tex3d-maxsize spins forever
Product: Mesa Reporter: Mark Janes <mark.a.janes>
Component: Drivers/DRI/i965Assignee: andrii simiklit <andrey.simiklit.1989>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: jljusten, kenneth
Version: gitKeywords: bisected, regression
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Mark Janes 2018-11-01 22:44:04 UTC
bisected to:

a9031bf9b55602d93cccef6c926e2179c23205b4
Author:     Andrii Simiklit <andrii.simiklit@globallogic.com>

i965/batch: avoid reverting batch buffer if saved state is an empty

There's no point reverting to the last saved point if that save point is
the empty batch, we will just repeat ourselves.

CC: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 3faf56ffbdeb "intel: Add an interface for saving/restoring
                     the batchbuffer state."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107626
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>


Previously, this test failed within 3s with:
 i965: Failed to submit batchbuffer: No space left on device
Comment 1 Mark Janes 2018-11-01 22:50:07 UTC
Because the build spun forever, Mesa CI timed the build out.  Unfortunately, this was reported as a success :(

The CI test results for this patch were invalid.
Comment 2 Mark Janes 2018-11-01 23:36:42 UTC
Reverting the patch fixes the test, without any other regressions shown in CI.
Comment 3 Jordan Justen 2018-11-02 02:34:08 UTC
Andrii,

I found the issue. It is not specific to g965. In the blorp code, on a
retry, the check_aperture_failed_once variable will be reset back to
false since we clear things before the retry. This small patch added
to your change fixes it.

Another idea could be to preset check_aperture_failed_once before
the retry label.

Can you re-submit the patch with this, or a similar change?

diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
index fd9ce93c6c..7b0f77af25 100644
--- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
+++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
@@ -309,7 +309,7 @@ retry:
    intel_batchbuffer_require_space(brw, 1400);
    brw_require_statebuffer_space(brw, 600);
    intel_batchbuffer_save_state(brw);
-   check_aperture_failed_once = intel_batchbuffer_saved_state_is_empty(brw);
+   check_aperture_failed_once |= intel_batchbuffer_saved_state_is_empty(brw);
    brw->batch.no_wrap = true;
 
 #if GEN_GEN == 6
Comment 4 Mark Janes 2018-11-02 03:09:30 UTC
I reverted the original patch, fixing the regression.
Comment 5 asimiklit 2018-11-02 08:57:05 UTC
(In reply to Mark Janes from comment #0)
> bisected to:
> 
> a9031bf9b55602d93cccef6c926e2179c23205b4
> Author:     Andrii Simiklit <andrii.simiklit@globallogic.com>
> 
> i965/batch: avoid reverting batch buffer if saved state is an empty
> 
> There's no point reverting to the last saved point if that save point is
> the empty batch, we will just repeat ourselves.
> 
> CC: Chris Wilson <chris@chris-wilson.co.uk>
> Fixes: 3faf56ffbdeb "intel: Add an interface for saving/restoring
>                      the batchbuffer state."
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107626
> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
> 
> 
> Previously, this test failed within 3s with:
>  i965: Failed to submit batchbuffer: No space left on device

Yes you are right. It my mistake :(
Sorry for that.
Thanks a lot for your work. 

(In reply to Jordan Justen from comment #3)
> Andrii,
> 
> I found the issue. It is not specific to g965. In the blorp code, on a
> retry, the check_aperture_failed_once variable will be reset back to
> false since we clear things before the retry. This small patch added
> to your change fixes it.
> 
> Another idea could be to preset check_aperture_failed_once before
> the retry label.
> 
> Can you re-submit the patch with this, or a similar change?
> 
> diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> index fd9ce93c6c..7b0f77af25 100644
> --- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> +++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> @@ -309,7 +309,7 @@ retry:
>     intel_batchbuffer_require_space(brw, 1400);
>     brw_require_statebuffer_space(brw, 600);
>     intel_batchbuffer_save_state(brw);
> -   check_aperture_failed_once = intel_batchbuffer_saved_state_is_empty(brw);
> +   check_aperture_failed_once |=
> intel_batchbuffer_saved_state_is_empty(brw);
>     brw->batch.no_wrap = true;
>  
>  #if GEN_GEN == 6

You are right. Thanks a lot for your fix. 
I will send updated patch shortly.
Comment 6 asimiklit 2018-11-02 10:00:09 UTC
I investigated why I missed this regression on my local machine despite I checked this patch several times under piglit and found out that this test passes on my machine even with my original patch. So I guess it depends on some additional things. 

asimiklit@asimiklit-pc:~/projects/piglit$ bin/tex3d-maxsize -auto 
Probe tolerance: 0.023438, 0.023438, 0.023438, 0.023438
GL_MAX_3D_TEXTURE_SIZE = 2048
Actual max 3D texture size for GL_INTENSITY8: 1024 x 1024 x 1024 (1024 MB)
Testing 1024 x 1024 x 256 GL_INTENSITY8 (256 MB) texture
Testing 1024 x 1024 x 512 GL_INTENSITY8 (512 MB) texture
Testing 1024 x 1024 x 1024 GL_INTENSITY8 (1024 MB) texture
Actual max 3D texture size for GL_RGBA8: 512 x 512 x 1024 (1024 MB)
Testing 512 x 512 x 256 GL_RGBA8 (256 MB) texture
Testing 512 x 512 x 512 GL_RGBA8 (512 MB) texture
Testing 512 x 512 x 1024 GL_RGBA8 (1024 MB) texture
Actual max 3D texture size for GL_RGBA32F: 256 x 512 x 512 (1024 MB)
Testing 256 x 512 x 128 GL_RGBA32F (256 MB) texture
Testing 256 x 512 x 256 GL_RGBA32F (512 MB) texture
Testing 256 x 512 x 512 GL_RGBA32F (1024 MB) texture
PIGLIT: {"result": "pass" }


So I will check my updated patch on Intel CI before sent.
Comment 7 asimiklit 2018-11-02 13:35:30 UTC
(In reply to asimiklit from comment #6)
> I investigated why I missed this regression on my local machine despite I
> checked this patch several times under piglit and found out that this test
> passes on my machine even with my original patch. So I guess it depends on
> some additional things. 
> 
> asimiklit@asimiklit-pc:~/projects/piglit$ bin/tex3d-maxsize -auto 
> Probe tolerance: 0.023438, 0.023438, 0.023438, 0.023438
> GL_MAX_3D_TEXTURE_SIZE = 2048
> Actual max 3D texture size for GL_INTENSITY8: 1024 x 1024 x 1024 (1024 MB)
> Testing 1024 x 1024 x 256 GL_INTENSITY8 (256 MB) texture
> Testing 1024 x 1024 x 512 GL_INTENSITY8 (512 MB) texture
> Testing 1024 x 1024 x 1024 GL_INTENSITY8 (1024 MB) texture
> Actual max 3D texture size for GL_RGBA8: 512 x 512 x 1024 (1024 MB)
> Testing 512 x 512 x 256 GL_RGBA8 (256 MB) texture
> Testing 512 x 512 x 512 GL_RGBA8 (512 MB) texture
> Testing 512 x 512 x 1024 GL_RGBA8 (1024 MB) texture
> Actual max 3D texture size for GL_RGBA32F: 256 x 512 x 512 (1024 MB)
> Testing 256 x 512 x 128 GL_RGBA32F (256 MB) texture
> Testing 256 x 512 x 256 GL_RGBA32F (512 MB) texture
> Testing 256 x 512 x 512 GL_RGBA32F (1024 MB) texture
> PIGLIT: {"result": "pass" }
> 
> 
> So I will check my updated patch on Intel CI before sent.

The updated patch is delayed due to testing problems.
Comment 8 Jordan Justen 2018-11-02 21:19:42 UTC
(In reply to asimiklit from comment #7)
> (In reply to asimiklit from comment #6)
> > I investigated why I missed this regression on my local machine despite I
> > checked this patch several times under piglit and found out that this test
> > passes on my machine even with my original patch. So I guess it depends on
> > some additional things. 
> > 
> 
> The updated patch is delayed due to testing problems.

I hope my comment ("It is not specific to g965") is not delaying
things. What I meant is that "in theory" it could happen on any system.
But, due to other factors, we only currently know of a reproducible
case on g965.

So, I would not hold off on re-posting the patch just because you
cannot reproduce the bug on a newer machine.

-Jordan
Comment 9 asimiklit 2018-11-03 14:55:36 UTC
(In reply to Jordan Justen from comment #8)
> (In reply to asimiklit from comment #7)
> > (In reply to asimiklit from comment #6)
> > > I investigated why I missed this regression on my local machine despite I
> > > checked this patch several times under piglit and found out that this test
> > > passes on my machine even with my original patch. So I guess it depends on
> > > some additional things. 
> > > 
> > 
> > The updated patch is delayed due to testing problems.
> 
> I hope my comment ("It is not specific to g965") is not delaying
> things. What I meant is that "in theory" it could happen on any system.
> But, due to other factors, we only currently know of a reproducible
> case on g965.
> 
> So, I would not hold off on re-posting the patch just because you
> cannot reproduce the bug on a newer machine.
> 
> -Jordan

I just wanted to see "Build # XX - Successful!" from Intel CI
for my updated patch to be 100% sure that there are no regressions. 
But on Friday I was unable to do it due to troubles with a 'mesa-ci.01.org' (I didn't have an ability to see results of Intel CI) and I was unable to get "Success" even for latest mesa master. 
Today I started one more test run: "global_logic - Build # 37 - Building!"  for latest mesa master and when I receive 'Success' I will run it for my updated patch.
Comment 10 Jordan Justen 2018-11-20 15:32:22 UTC
I pushed v4 of Andrii's patch:

commit b787dcf57b7298868ce9b6885a827d57a6127ba1
Author: Andrii Simiklit <andrii.simiklit@globallogic.com>
Date:   Mon Nov 5 09:48:26 2018 +0200

    i965/batch: avoid reverting batch buffer if saved state is an empty

It appears to abort (as expected) with tex3d-maxsize on g965,
rather than hanging the test:

https://mesa-ci.01.org/jljusten/builds/215/results/9151567
Comment 11 asimiklit 2018-11-21 08:09:27 UTC
(In reply to Jordan Justen from comment #10)
> I pushed v4 of Andrii's patch:
> 
> commit b787dcf57b7298868ce9b6885a827d57a6127ba1
> Author: Andrii Simiklit <andrii.simiklit@globallogic.com>
> Date:   Mon Nov 5 09:48:26 2018 +0200
> 
>     i965/batch: avoid reverting batch buffer if saved state is an empty
> 
> It appears to abort (as expected) with tex3d-maxsize on g965,
> rather than hanging the test:
> 
> https://mesa-ci.01.org/jljusten/builds/215/results/9151567

Thanks a lot)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.