Bug 102571 - vulkaninfo fails with "trap divide error"
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/radeon
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
Keywords: bisected, have-backtrace, regression
Reported: 2017-09-07 04:24 UTC by danielrf12
Modified: 2018-03-10 21:35 UTC

backtrace of vulkaninfo crash (3.46 KB, text/plain)
2017-09-07 04:24 UTC, danielrf12

Description danielrf12 2017-09-07 04:24:48 UTC
Created attachment 134033
backtrace of vulkaninfo crash

While testing Vulkan support for my 390X on git master, I ran across the following issue.

Running vulkaninfo crashes and this line shows up in the journal:
traps: vulkaninfo[30227] trap divide error ip:7f9777251563 sp:7fff7aa100e8 error:0 in libvulkan_radeon.so[7f97771ed000+1a5000]

I also ran git-bisect, which indicates that this issue was introduced by commit 180c1b924e1ed3a2918fad9c5cbb653524de8233.

The attached backtrace shows the crash in radv_pipeline_scratch_init in src/amd/vulkan/radv_pipeline.c. The only division by zero that looks possible in that function is on line 763, if pipeline->shaders[i]->config.num_vgprs is zero.
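
To illustrate the suspected failure mode, here is a minimal C sketch. It is not the actual radv code: the helper name max_waves_for_stage, the constants, and the assumed form of the expression (a register-file size divided by num_vgprs) are hypothetical and chosen only for illustration. On x86-64, an integer division by zero raises SIGFPE, which the kernel logs as "trap divide error", so a guard of this kind would turn the crash into an error the caller can handle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; assumptions, not taken from the driver. */
#define VGPRS_PER_SIMD 256u
#define SIMDS_PER_CU   4u

/*
 * Hypothetical helper: estimate how many waves one shader stage can keep
 * in flight. Returns false instead of dividing when num_vgprs is zero,
 * which would otherwise trap with SIGFPE on x86-64.
 */
bool max_waves_for_stage(uint32_t num_compute_units, uint32_t num_vgprs,
                         uint32_t *out_waves)
{
    if (num_vgprs == 0)
        return false; /* invalid shader config; do not divide */

    *out_waves = SIMDS_PER_CU * num_compute_units *
                 (VGPRS_PER_SIMD / num_vgprs);
    return true;
}
```

A caller would check the return value and treat a false result as a sign that the shader config is invalid, rather than continuing with a bogus wave count.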

Additional information:
Running kernel 4.13.0 on NixOS.
Comment 1 Vinson Lee 2017-09-07 05:12:20 UTC
commit 180c1b924e1ed3a2918fad9c5cbb653524de8233
Author: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Date:   Wed Aug 16 09:09:56 2017 +0200

    ac/nir: Add shader support for multiviews.
    It uses an user SGPR to pass the view index to the shaders, except
    for the fragment shader where we use layer=view (which comes in
    handy when we want to do the NV ext that allows us to execute pre-FS
    stages once instead of per view).
    Reviewed-by: Dave Airlie <airlied@redhat.com>
Comment 2 Bas Nieuwenhuizen 2017-09-09 13:08:39 UTC
I can't reproduce this. Can you do a crashing run with RADV_DEBUG=shaders,shaderstats and then upload the stdout and stderr output?

If num_vgprs is really 0, I'd think something is seriously wrong.
Comment 3 danielrf12 2017-09-09 18:43:20 UTC
Strangely, I can no longer reproduce the original error, even when booting the exact same system configuration (same kernel, Mesa, and all other libraries and executables tracked by NixOS). This is surprising given how consistently the bisect seemed to be working a few days ago. In fact, I'm finally able to run SteamVR with my 390X.

I'll go ahead and close this issue, and I'll reopen it if I am ever able to reproduce this consistently again. Thanks anyway.
Comment 4 Sven Arvidsson 2018-03-10 21:35:56 UTC

I had the same problem on a previously working system, with a very similar-looking backtrace:

Thread 1 "vulkaninfo" received signal SIGFPE, Arithmetic exception.
0x00007ffff62aa2f3 in radv_pipeline_scratch_init (pipeline=pipeline@entry=0x555555a66c50, device=<optimized out>, 
    device=<optimized out>) at ../../../../src/amd/vulkan/radv_pipeline.c:117

I noticed that running vulkaninfo as root or as another user worked.

Just deleting the file ~/.cache/radv_builtin_shaders fixed it here.

I'm not sure whether the file simply got corrupted because I killed a Vulkan app abruptly, or whether it had something to do with switching between 64-bit and 32-bit apps and drivers.

(I'm not entirely sure how to regenerate the file to recreate the bug either; just running vulkan-smoketest doesn't seem to suffice.)
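
For anyone who wants to script the workaround above, here is a minimal C sketch that builds the path reported in this comment and removes the file. The function names builtin_cache_path and remove_builtin_cache are hypothetical, and the sketch assumes the cache lives at $HOME/.cache/radv_builtin_shaders as observed here; whether the driver also honors other locations is not covered.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: write "<home>/.cache/radv_builtin_shaders" into buf.
 * Returns 0 on success, -1 if home is missing or the path does not fit.
 */
int builtin_cache_path(const char *home, char *buf, size_t size)
{
    if (home == NULL || home[0] == '\0')
        return -1;

    int n = snprintf(buf, size, "%s/.cache/radv_builtin_shaders", home);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Hypothetical helper: delete the cache file for the current user.
 * Returns 0 on success, nonzero if the path could not be built or the
 * file could not be removed (for example, because it does not exist).
 */
int remove_builtin_cache(void)
{
    char path[4096];

    if (builtin_cache_path(getenv("HOME"), path, sizeof path) != 0)
        return -1;
    return remove(path);
}
```

Saving a copy of the file before removing it would make it possible to attach the suspect cache to a bug report later.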
