Summary: | [ANV] The Witcher 3 shadows flickering | ||
---|---|---|---|
Product: | Mesa | Reporter: | nagrigoriadis |
Component: | Drivers/Vulkan/intel | Assignee: | Matt Turner <mattst88> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | danylo.piliaiev, jason |
Version: | git | Keywords: | bisected, regression |
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
vulkaninfo
Even with patch applied, glitch still happens, but less so. Witcher 3 save Shader in question, correct and incorrect NIRs Patch to comment out line as per Danylos suggestion |
Description
nagrigoriadis
2019-01-21 12:11:45 UTC
Hi, thanks for the report! Bisected to: commit 3561108de089f383bb7733eaf49e3d517994b51c Author: Timothy Arceri <tarceri@itsqueeze.com> Date: Wed Nov 7 14:29:18 2018 +1100 anv/i965: make use of nir_link_constant_varyings() shader-db results for SLK: total instructions in shared programs: 13106498 -> 13091573 (-0.11%) instructions in affected programs: 1186244 -> 1171319 (-1.26%) helped: 6186 HURT: 0 total cycles in shared programs: 332062633 -> 331961653 (-0.03%) cycles in affected programs: 8537165 -> 8436185 (-1.18%) helped: 5371 HURT: 862 LOST: 6 GAINED: 14 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Which is just enabling of the feature from: commit d40dd05553e548b648c77394424b59cc0abca199 Author: Timothy Arceri <tarceri@itsqueeze.com> Date: Fri Nov 9 09:24:11 2018 +1100 nir: add new linking opt nir_link_constant_varyings() This pass moves constant outputs to the consuming shader stage where possible. Reviewed-by: Eric Anholt <eric@anholt.net> In RenderDoc the issue is manifested as "inverted" normal map of the scene. I tested at the very beginning of the game and it looks like the issue you have encountered. However it does not happen on 18.3.0. Will look into it tomorrow if it won't be fixed until then. Thanks for the quick response Danylo. I re-build an un-patched Mesa 18.3.2 just now, and re-tested (no other change), and you are right. The issue is vastly improved. There is a single flicker I can get on the same scene, but once per session only. It might have manifested itself as me patching Mesa with the transform feedback patches? (Which don't apply themselves cleanly to Mesa git anymore, but that is a different issue) I rebuilt mesa git and did a revert of commit 3561108de089f383bb7733eaf49e3d517994b51c and the issue also seems resolved. Created attachment 143178 [details]
Even with patch applied, glitch still happens, but less so.
Even with the match to reverse the aforementioned reverted patch, I do still get some glitches such as the attached image.
It occurs less frequent, but most often with buildings or in caves.
After playing The Witcher 3 some more, I came to the conclusion that the constant folding in the identified patch really just makes the bug trigger more likely. It still happens quite often, just less as often. Now that I know what to look for. I'll donwngrade to 18.3.2 again to confirm wether this is really a regression in git, or a longer standing bug. Confirmed to still be there with v18.3.2 It looks like some lighting normals get inverted and flicker between dark and light. See this video: https://drive.google.com/file/d/1OxKno4pqjqipgOxYbDOlDC60ogqvk3Ft/view That wooden house should not be so dark? I'll try with 18.2.8 next, but can't do so immediately. Could you attach your game save at the location where the issue is visible? Since without nir_link_constant_varyings I cannot reproduce it in the very beginning of the game. Created attachment 143197 [details]
Witcher 3 save
Hi Danylo
I attached the save file just by the last video I did.
Thanks for the save, unfortunately I don't have expansions it is made with. I'll travel then with debug console. Interesting, I have traveled around with and without the rain and cannot spot any issues. I have tested it on commit right before 3561108de089f383bb7733eaf49e3d517994b51c and on master without this commit changes. Could you tell the exact location it happens, preferably nearby signpost? All settings set to low (no manual changes in config files). I have HD Graphics 620. Its in the town of Crow's Perch in Velen. Go to the Crow's Perch signpost, just go north over the long bridge into the village there. The save was at night (and raining), but I see artefacts all the way going up to the Manor house at the top. All settings at low, except for texture resolution to medium. (but it happens for me in low). No manual changes in config filed. Using a UHD620 as well. I just tested it with Mesa 18.2.8, and had the same issues (Only noticeable difference was that it ran much slower). I then deleted both the mesa shader cache and the witcher3.dxvk-cache in case there was an issue from there, but the issue persists. So I have had this happen with: * Mesa 18.2.8 * Mesa 18.3.2 * Mesa git (worst) * Mesa git + revert of 3561108de089f383bb7733eaf49e3d517994b51c The unpatched Mesa git is by far the worst, but the issues manifest themselves as either the same, or very similar. I wonder if the constant folding/lowering only just made a certain bug trigger much more often? Or the issue could be due to something else altogether on my system. I indeed see the issue at that place even without bisected commit. > I wonder if the constant folding/lowering only just made a certain bug trigger much more often?
Yes, looks like it.
After staring at shader in RenderDoc I found that something wrong with 1-bit boolean optimizations.
Commenting out
(('bcsel', a, -1, 0), ('ineg', ('b2i', 'a@1'))),
in src/compiler/nir/nir_opt_algebraic.py solves the issue, this optimization is part of optimizations for 1-bit booleans introduced in:
6bcd2af0
nir/algebraic: Add some optimizations for D3D-style Booleans
I have some doubts that this exact optimization is a root of the issue but still it's near enough.
So in the shader there is next contruction:
vec4 _893 = shader_in[4];
_893.x = uintBitsToFloat(gl_FrontFacing ? 4294967295u : 0u);
shader_in[4] = _893;
...
_613.x = uintBitsToFloat((0u >= floatBitsToUint(shader_in[4].x)) ? 4294967295u : 0u);
r1 = _613;
if (floatBitsToUint(r1.x) != 0u)
{
// Something that happens when it should not
}
When problematic optimization kicks in - it is reduced to:
vec1 32 ssa_14 = intrinsic load_front_face () ()
vec1 32 ssa_15 = b2i32 ssa_14
vec1 32 ssa_16 = imov -ssa_15
...
vec1 32 ssa_200 = uge32 ssa_1, ssa_16
/* succs: block_1 block_2 */
if ssa_200 {
block block_1:
And without it:
vec1 32 ssa_14 = intrinsic load_front_face () ()
...
vec1 32 ssa_198 = inot ssa_14
/* succs: block_1 block_2 */
if ssa_198 {
block block_1:
The first one is incorrect, I was unable to find how it happens today.
I'll attach shader and the resulted NIR.
Created attachment 143202 [details]
Shader in question, correct and incorrect NIRs
since the bisection was wrong, re-assigning... Created attachment 143218 [details] [review] Patch to comment out line as per Danylos suggestion Thanks Danylo I applied the attached patch (as per four findings), without reversing the original patch, to latest Mesa git. And I have to say, the game renders perfectly now :-) Thanks for your hard work! Well, not perfectly. Now there is some polygons that flicker, but I'm certain it is unrelated to this issue. It also feels slightly slower? (not scientific at all, as I am in a different area of the game, and that area could perform worse :P ) What have I found on Friday: The NIR I have attached is not a problem - it's a correct one as Jason corrected me on IRC. The issue seems to be even further down, in native code such comparison generated: asr.le.f0.0(8) null<1>D -g0<0,1,0>W 15D { align1 1Q }; ... (+f0.0) if(8) JIP: 128 UIP: 128 { align1 1Q }; The 15th bit of that register means: 0: Front Facing 1: Back Facing Then it is negated to have: 0: Back Facing ~0: Front Facing Then the *signed* .le (less or equal) with zero, as I see it - this is the wrong part, ~0 < 0 when comparison is signed however the initial intention was to have "false" in this comparison when triangle is front facing. Manually changing null<1>D to null<1>UD fixes the issue. How such comparison is created? brw_fs_cmod_propagation.cpp, in opt_cmod_propagation_local, line 323 - CONDITION_LE modifier is propagated when condition operates on two REGISTER_TYPE_UD while scan_inst (opcode: asr) has dst: REGISTER_TYPE_D and src: REGISTER_TYPE_W. From my point of view it's not a correct thing to do, but I'm not sure what to do here. (In reply to Danylo from comment #18) > What have I found on Friday: > > The NIR I have attached is not a problem - it's a correct one as Jason > corrected me on IRC. > > The issue seems to be even further down, in native code such comparison > generated: > > asr.le.f0.0(8) null<1>D -g0<0,1,0>W 15D { align1 1Q > }; > > ... > > (+f0.0) if(8) JIP: 128 UIP: 128 { align1 1Q > }; > > The 15th bit of that register means: > > 0: Front Facing > 1: Back Facing > > Then it is negated to have: > > 0: Back Facing > ~0: Front Facing > > Then the *signed* .le (less or equal) with zero, as I see it - this is the > wrong part, ~0 < 0 when comparison is signed however the initial intention > was to have "false" in this comparison when triangle is front facing. > > Manually changing null<1>D to null<1>UD fixes the issue. > > How such comparison is created? > > brw_fs_cmod_propagation.cpp, in opt_cmod_propagation_local, line 323 - > CONDITION_LE modifier is propagated when condition operates on two > REGISTER_TYPE_UD while scan_inst (opcode: asr) has dst: REGISTER_TYPE_D and > src: REGISTER_TYPE_W. > > From my point of view it's not a correct thing to do, but I'm not sure what > to do here. I'm just (In reply to Danylo from comment #18) > What have I found on Friday: > > The NIR I have attached is not a problem - it's a correct one as Jason > corrected me on IRC. > > The issue seems to be even further down, in native code such comparison > generated: > > asr.le.f0.0(8) null<1>D -g0<0,1,0>W 15D { align1 1Q > }; > > ... > > (+f0.0) if(8) JIP: 128 UIP: 128 { align1 1Q > }; > > The 15th bit of that register means: > > 0: Front Facing > 1: Back Facing > > Then it is negated to have: > > 0: Back Facing > ~0: Front Facing > > Then the *signed* .le (less or equal) with zero, as I see it - this is the > wrong part, ~0 < 0 when comparison is signed however the initial intention > was to have "false" in this comparison when triangle is front facing. > > Manually changing null<1>D to null<1>UD fixes the issue. > > How such comparison is created? > > brw_fs_cmod_propagation.cpp, in opt_cmod_propagation_local, line 323 - > CONDITION_LE modifier is propagated when condition operates on two > REGISTER_TYPE_UD while scan_inst (opcode: asr) has dst: REGISTER_TYPE_D and > src: REGISTER_TYPE_W. > > From my point of view it's not a correct thing to do, but I'm not sure what > to do here. Thank you very much for the analysis (and attaching the shader). I'll take a closer look. > Thank you very much for the analysis (and attaching the shader). I'll take a closer look. Thanks and here is a link to RenderDoc trace https://drive.google.com/open?id=19kV_pv63sLrqpeKyx3-UnVzQIKW7d7gG , to see the issue and shader in the wild you should look at call №2428, the shader I have attached is from that call but made compilable. Please test if git://people.freedesktop.org/~mattst88/mesa bug-109404 fixes the misrendering. Hi Matt I built your branch and it does indeed seem to resolve the issue :-) Thank you Yes, it fixes the issue. Fixed by : commit 2dff9a66b629834bffad47e7a9025e0f1de5ffc3 Author: Matt Turner <mattst88@gmail.com> Date: Mon Feb 11 16:02:15 2019 -0800 intel/compiler: Avoid propagating inequality cmods if types are different v2: Fix silly bug in logic. s/||/&&/ All but one of the affected shaders is in an Unreal4 demo. The other is in Tomb Raider. All of the cases that Ian investigated appear to be sequences like the following if (int(uint(some_float)) < 0) /* other relations too */ ... At least in Tomb Raider, it's not obvious that this sequence came from the original shader. In some of the Unreal demos, the shader contains code like if (int(uint(textureLod(...))) > 0) ... which explicitly generates the offending sequence. All Gen6+ platforms had similar results (Skylake shown): total instructions in shared programs: 15437170 -> 15437187 (<.01%) instructions in affected programs: 4492 -> 4509 (0.38%) helped: 0 HURT: 17 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73% 95% mean confidence interval for instructions value: 1.00 1.00 95% mean confidence interval for instructions %-change: 0.57% 0.75% Instructions are HURT. total cycles in shared programs: 383007996 -> 383007992 (<.01%) cycles in affected programs: 20542 -> 20538 (-0.02%) helped: 6 HURT: 7 helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6 helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27% 95% mean confidence interval for cycles value: -3.30 2.69 95% mean confidence interval for cycles %-change: -0.19% 0.19% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: nagrigoriadis@gmail.com Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com> Thanks. I built latest GIT and tested. Seems all good :-) |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.