Bug 104226

Summary: [bisected] Anvil accesses uninitialized memory while compiling shaders
Product: Mesa Reporter: Józef Kucia <joseph.kucia>
Component: Drivers/Vulkan/intelAssignee: Francisco Jerez <currojerez>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: currojerez, jason
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: config.log
Simple Vulkan program to reproduce the problem
libvulkan_intel.so 17.3-branchpoint-1675-g35c3cbad3c30
0001-intel-fs-Initialize-fs_visitor-grf_used-on-construct.patch
Hack
0001-intel-fs-bank_conflicts-Use-posix_memalign-instead-o.patch

Description Józef Kucia 2017-12-12 15:20:11 UTC
This seems to cause a crash in any 32-bit application I've tried. For 64-bit applications there are just uninitialized memory accesses without crashes.

This is a regression:

commit af2c320190f3c73180f1610c8df955a7fa2a4d09
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Thu Jun 15 15:23:57 2017 -0700

    intel/fs: Implement GRF bank conflict mitigation pass.

The patch from bug 104199 doesn't help.

(gdb) bt
#0  0xf79784ba in (anonymous namespace)::weight_vector_type::weight_vector_type (n=136416036, this=<optimized out>)
    at ../../../mesa-git/src/intel/compiler/brw_fs_bank_conflicts.cpp:281
#1  (anonymous namespace)::shader_conflict_weight_matrix (v=0x8218d30, v=0x8218d30, p=<optimized out>)
    at ../../../mesa-git/src/intel/compiler/brw_fs_bank_conflicts.cpp:591
#2  fs_visitor::opt_bank_conflicts (this=0xffff7d44)
    at ../../../mesa-git/src/intel/compiler/brw_fs_bank_conflicts.cpp:878
#3  0xf799c941 in fs_visitor::allocate_registers (this=0xffff7d44, min_dispatch_width=8, allow_spilling=true)
    at ../../../mesa-git/src/intel/compiler/brw_fs.cpp:6120
#4  0xf799e95f in fs_visitor::run_vs (this=0xffff7d44) at ../../../mesa-git/src/intel/compiler/brw_fs.cpp:6191
#5  0xf7a238d2 in brw_compile_vs (compiler=0x808c158, log_data=0x0, mem_ctx=0x8117678, key=0xffffa44c, 
    prog_data=0xffffa508, src_shader=0x8184868, use_legacy_snorm_formula=false, shader_time_index=-1, error_str=0x0)
    at ../../../mesa-git/src/intel/compiler/brw_vec4.cpp:2865
#6  0xf78fa3e6 in anv_pipeline_compile_vs (pipeline=pipeline@entry=0x81836a8, cache=cache@entry=0x0, 
    module=module@entry=0x8182ad0, entrypoint=0x804c6c0 "main", spec_info=0x0, info=0xffffca50)
    at ../../../mesa-git/src/intel/vulkan/anv_pipeline.c:546
#7  0xf78fc1cb in anv_pipeline_init (pipeline=0x81836a8, device=0x82103f8, cache=0x0, pCreateInfo=0xffffca50, 
    alloc=<optimized out>) at ../../../mesa-git/src/intel/vulkan/anv_pipeline.c:1319
#8  0xf793f551 in gen9_graphics_pipeline_create (pPipeline=<optimized out>, pAllocator=<optimized out>, 
    pCreateInfo=<optimized out>, cache=<optimized out>, _device=<optimized out>)
    at ../../../mesa-git/src/intel/vulkan/genX_pipeline.c:1661
#9  gen9_CreateGraphicsPipelines (_device=0x82103f8, pipelineCache=0, count=1, pCreateInfos=0xffffca50, 
    pAllocator=0x0, pPipelines=0xffffc980) at ../../../mesa-git/src/intel/vulkan/genX_pipeline.c:1864
#10 0xf7fa02b8 in vkCreateGraphicsPipelines () from /home/joseph/src/vulkan/vulkan-build32/loader/libvulkan.so.1
#11 0x08049dab in create_graphics_pipeline ()
#12 0x0804b67e in main ()

==19019== Conditional jump or move depends on uninitialised value(s)
==19019==    at 0x6182B28: fs_visitor::bank_conflict_cycles(fs_inst const*) const (brw_fs_bank_conflicts.cpp:906)
==19019==    by 0x620945F: fs_instruction_scheduler::issue_time(backend_instruction*) (brw_schedule_instructions.cpp:1546)
==19019==    by 0x620AF8D: instruction_scheduler::compute_delays() (brw_schedule_instructions.cpp:831)
==19019==    by 0x620F2E1: instruction_scheduler::run(cfg_t*) (brw_schedule_instructions.cpp:1701)
==19019==    by 0x620F4F0: fs_visitor::schedule_instructions(instruction_scheduler_mode) (brw_schedule_instructions.cpp:1730)
==19019==    by 0x61A3D45: fs_visitor::allocate_registers(unsigned int, bool) (brw_fs.cpp:6071)
==19019==    by 0x61A5BC1: fs_visitor::run_vs() (brw_fs.cpp:6191)
==19019==    by 0x6220CD2: brw_compile_vs (brw_vec4.cpp:2865)
==19019==    by 0x5FCB3CC: anv_pipeline_compile_vs.isra.13 (anv_pipeline.c:546)
==19019==    by 0x5FCD194: anv_pipeline_init (anv_pipeline.c:1319)
==19019==    by 0x60FCB27: gen9_graphics_pipeline_create (genX_pipeline.c:1661)
==19019==    by 0x60FCB27: gen9_CreateGraphicsPipelines (genX_pipeline.c:1864)
==19019==    by 0x4E67C2B: vkCreateGraphicsPipelines (trampoline.c:1257)
Comment 1 Józef Kucia 2017-12-12 15:23:06 UTC
Possibly a duplicate of bug 104197
Comment 2 Francisco Jerez 2017-12-12 19:20:29 UTC
Can you share your config.log?  vktrace of a misbehaving program would also be useful.
Comment 3 Józef Kucia 2017-12-15 14:42:28 UTC
Created attachment 136195 [details]
config.log
Comment 4 Józef Kucia 2017-12-15 14:44:11 UTC
Created attachment 136196 [details]
Simple Vulkan program to reproduce the problem
Comment 5 Francisco Jerez 2017-12-15 18:46:49 UTC
Thank you.  Would you mind sending me a copy of your libvulkan_intel.so binary in addition?
Comment 6 Józef Kucia 2017-12-16 10:37:30 UTC
Created attachment 136215 [details]
libvulkan_intel.so 17.3-branchpoint-1675-g35c3cbad3c30
Comment 7 Francisco Jerez 2017-12-17 08:45:56 UTC
Created attachment 136237 [details] [review]
0001-intel-fs-Initialize-fs_visitor-grf_used-on-construct.patch

The attached patch should fix the valgrind error you pointed out.  Strangely enough I have been unable to reproduce the crash, even while running the same binary you compiled yourself.  The conditional jump depending on uninitialized memory doesn't explain the problem, it seems fully harmless.  What platform are you running this on?  And what exactly is the nature of the crash you're seeing?  (segfault?)
Comment 8 Józef Kucia 2017-12-17 10:46:17 UTC
Created attachment 136238 [details] [review]
Hack

(In reply to Francisco Jerez from comment #7)
> The attached patch should fix the valgrind error you pointed out.  Strangely
> enough I have been unable to reproduce the crash, even while running the
> same binary you compiled yourself.  The conditional jump depending on
> uninitialized memory doesn't explain the problem, it seems fully harmless. 
> What platform are you running this on?  And what exactly is the nature of
> the crash you're seeing?  (segfault?)
Yes, the crash is unrelated to the valgrind error. It seems to be SSE alignment issue. The attached hack fixes the problem for me.
Comment 9 Francisco Jerez 2017-12-17 19:31:53 UTC
What platform are you running this on?
Comment 10 Francisco Jerez 2017-12-17 22:16:09 UTC
Created attachment 136240 [details] [review]
0001-intel-fs-bank_conflicts-Use-posix_memalign-instead-o.patch

I think I would prefer to address the problem at the allocation point as in the attached patch.  Can you confirm whether the attachment fixes the crash for you since I'm unable to reproduce it?
Comment 11 Józef Kucia 2017-12-17 23:10:07 UTC
(In reply to Francisco Jerez from comment #10)
> Created attachment 136240 [details] [review] [review]
> 0001-intel-fs-bank_conflicts-Use-posix_memalign-instead-o.patch
> 
> I think I would prefer to address the problem at the allocation point as in
> the attached patch.  Can you confirm whether the attachment fixes the crash
> for you since I'm unable to reproduce it?
The patch fixes the crash. I'm not sure if it's the best solution though, e.g. _mesa_align_malloc() might be preferred instead of posix_memalign().

(In reply to Francisco Jerez from comment #9)
> What platform are you running this on?
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz GenuineIntel GNU/Linux

gcc (Gentoo 5.4.0-r3 p1.4, pie-0.6.5) 5.4.0
Comment 12 Francisco Jerez 2017-12-17 23:33:25 UTC
(In reply to Józef Kucia from comment #11)
> (In reply to Francisco Jerez from comment #10)
> > Created attachment 136240 [details] [review] [review] [review]
> > 0001-intel-fs-bank_conflicts-Use-posix_memalign-instead-o.patch
> > 
> > I think I would prefer to address the problem at the allocation point as in
> > the attached patch.  Can you confirm whether the attachment fixes the crash
> > for you since I'm unable to reproduce it?
> The patch fixes the crash.

Great, I'll send it for review once I get some results from our CI.

> I'm not sure if it's the best solution though, e.g. _mesa_align_malloc() might
> be preferred instead of posix_memalign().

I shortly considered using _mesa_align_malloc(), however I don't think it's available to the vulkan driver.  Not a big deal, I think posix_memalign should be fine, the i965 driver is not expected to run on Windows.
Comment 13 Francisco Jerez 2017-12-22 00:31:55 UTC
Should be fixed in master now.  Closing as resolved.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.