Bug 111405

Summary:	Some infinite 'do{}while' loops lead mesa to an infinite compilation
Product:	Mesa	Reporter:	andrii simiklit <andrey.simiklit.1989>
Component:	Drivers/DRI/i965	Assignee:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Status:	RESOLVED FIXED	QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity:	normal
Priority:	medium	CC:	andrey.simiklit.1989
Version:	git	Keywords:	bisected, regression
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	111444
Attachments:	reproducer simple fix

Description andrii simiklit 2019-08-15 10:04:00 UTC

Created attachment 145070 [details]
reproducer

While I was trying to create a shader_test for the issue:
https://bugs.freedesktop.org/show_bug.cgi?id=111069
I created a test which leads the latest mesa master to an infinity compilation.

Like in the mentioned issue there is a loop without any exits.
And compiler stucks in an optimization loop because 'progress' always is true.
One of optimization can't stop some constants generation( here is a part of nir: 

>   vec1 32 ssa_5 = load_const (0x3fc00000 /* 1.500000 */)
>   vec1 32 ssa_6 = load_const (0x40100000 /* 2.250000 */)
>   vec1 32 ssa_7 = load_const (0x40580000 /* 3.375000 */)
>   ...
>   ...
>   vec1 32 ssa_221 = load_const (0x7ef509be /* 162855721648427948063735027106695872512.000000 */)
>   vec1 32 ssa_222 = load_const (0x7f37c74e /* 244283572331437120269767328686418165760.000000 */)
>   vec1 32 ssa_223 = load_const (0x7f800000 /* inf */)


Possible that this and mentioned issue could be fixed using a single solution which was mentioned by Jason Ekstrand in the bug111069:
> 1. Detect when we've deleted the last exit of a loop and simply delete the entire loop immediately in nir_opt_dead_cf.
But it is just an assumption.

I am investigating this issue.

Comment 1 andrii simiklit 2019-08-15 11:27:23 UTC

08bfd710a25c14df5f690cce9604617536d7c560 is the first bad commit
commit 08bfd710a25c14df5f690cce9604617536d7c560
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Feb 13 21:42:39 2019 -0600

    nir/dead_cf: Stop relying on liveness analysis
    
    The liveness analysis pass is fairly expensive because it has to build
    large bit-sets and run a fix-point algorithm on them.  Instead of
    requiring liveness for detecting if values escape a CF node, just take
    advantage of the structured nature of NIR and use block indices instead.
    This only requires the block index metadata which is the fastest we have
    metadata to generate.
    
    No shader-db changes on Kaby Lake
    
    Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>


I continue my investigation)

Comment 2 andrii simiklit 2019-08-15 14:48:31 UTC

Created attachment 145071 [details] [review]
simple fix

Actually, I checked this solution and it helped for this bug but didn't for bug111069 because there the last break is removed in nir_opt_if isn't in nir_opt_dead_cf. So for bug111069 we need another solution or we need to move nir_opt_dead_cf after nir_opt_if. The second option helps for both issues I already checked that but didn't check side effects using Intel CI yet.

Comment 3 andrii simiklit 2019-08-19 09:53:37 UTC

(In reply to andrii simiklit from comment #2)
> Created attachment 145071 [details] [review] [review]
> simple fix
> 
> Actually, I checked this solution and it helped for this bug but didn't for
> bug111069 because there the last break is removed in nir_opt_if isn't in
> nir_opt_dead_cf. So for bug111069 we need another solution or we need to
> move nir_opt_dead_cf after nir_opt_if. The second option helps for both
> issues I already checked that but didn't check side effects using Intel CI
> yet.
Never mind, I found why it didn't work for bug111069, I will fix it a bit later.

Comment 4 andrii simiklit 2019-08-23 11:34:45 UTC

Well, looks like I found a root cause.
There is an infinite loop of the following optimizations:
nir_opt_if->nir_opt_constant_folding->nir_opt_if->nir_opt_constant_folding->...   

For instance the original loop is:
>vec1 32 ssa_1 = load_const (0x40000000 /* 2.000000 */)
>vec1 32 ssa_2 = load_const (0x41200000 /* 10.000000 */)
>...
>loop {
>   vec1 32 ssa_10 = phi block_2: ssa_1, block_3: ssa_12
>   vec1 32 ssa_12 = fmul ssa_10, ssa_2
>}

nir_opt_if optimizes first loop iteration extracting fmul out of the loop:
>vec1 32 ssa_0 = load_const (0x40000000 /* 2.000000 */)
>vec1 32 ssa_1 = load_const (0x41200000 /* 10.000000 */)
>...
>vec1 32 ssa_6 = fmul ssa_0, ssa_1
>loop {
>   vec1 32 ssa_8 = phi block_1: ssa_6, block_2: ssa_10
>   vec1 32 ssa_10 = fmul ssa_8, ssa_1
>}

then nir_opt_constant_folding replaces fmul by constant value:
>vec1 32 ssa_0 = load_const (0x40000000 /* 2.000000 */)
>vec1 32 ssa_1 = load_const (0x41200000 /* 10.000000 */)
>...
>vec1 32 ssa_16 = load_const (0x41a00000 /* 20.000000 */)
>loop {
>    vec1 32 ssa_8 = phi block_1: ssa_16, block_2: ssa_10
>    vec1 32 ssa_10 = fmul ssa_8, ssa_1
>}

and nir_opt_if does it again extracts fmul out of the loop:
>vec1 32 ssa_0 = load_const (0x40000000 /* 2.000000 */)
>vec1 32 ssa_1 = load_const (0x41200000 /* 10.000000 */)
>...
>vec1 32 ssa_16 = load_const (0x41a00000 /* 20.000000 */)
>vec1 32 ssa_17 = fmul ssa_16, ssa_1
>loop {
>    vec1 32 ssa_19 = phi block_1: ssa_17, block_2: ssa_18
>    vec1 32 ssa_18 = fmul ssa_19, ssa_1
>}
again nir_opt_constant_folding replaces fmul by the constant
....

The simple solution is to disallow `opt_split_alu_of_phi` for infinity loops but I can't say for sure that it is the best one at least for now. So I am investigating nir_opt_if

Comment 5 Mark Janes 2019-09-05 16:08:42 UTC

Ardrii: please use the bisected and regression tags when you find a bug like this.  We rely on those tags to block releases for unfixed regressions.

Comment 6 Jason Ekstrand 2019-09-06 23:40:53 UTC

Fixed by the following commit now in master:

commit c832820ce959ae7c2b4971befceae634d800330f (HEAD -> master, origin/master, origin/HEAD)
Author: Jason Ekstrand <jason@jlekstrand.net>
Date:   Fri Aug 30 11:35:26 2019 -0500

    nir/dead_cf: Repair SSA if the pass makes progress
    
    The dead_cf pass calls into the CF manipulation helpers which attempt to
    keep NIR's SSA form sane.  However, when the only break is removed from
    a loop, dominance gets messed up anyway because the CF SSA clean-up code
    only looks at phis and doesn't consider the case of code becoming
    unreachable.  One solution to this would be to put the loop into LCSSA
    form before we modify any of its contents.  Another (and the approach
    taken by this pass) is to just run the repair_ssa pass afterwards
    because the CF manipulation helpers are smart enough to keep all the
    use/def stuff sane; they just don't always preserve dominance
    properties.
    
    While we're here, we clean up some bogus indentation.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111405
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111069
    Cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.