Summary: | The Vega 56 GPU hung while trying to pass the #GraphicsFuzz shader15 test | ||
---|---|---|---|
Product: | Mesa | Reporter: | mikhail.v.gavrilov |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | normal | ||
Priority: | medium | CC: | david.cap, devurandom, freedesktop, ilvipero |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Attachments: | Shader runner link test |
Description
mikhail.v.gavrilov
2018-03-01 18:41:26 UTC
Mikhail one suggestion to consider for the future:

Do mention version numbers (or sha if using a git checkout), for the different components mesa, llvm, kernel.

(In reply to Emil Velikov from comment #1)
> Mikhail one suggestion to consider for the future:
>
> Do mention version numbers (or sha if using a git checkout), for the
> different components mesa, llvm, kernel.

kernel: 4.16.0-rc1-git63e5921e856b
mesa: 18.1.0-0.4.git56dc9f9
llvm: 7.0.0-0.1.r326462

[ 463.172901] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=26958, last emitted seq=26960
[ 463.172985] [drm] No hardware hang detected. Did some blocks stall?
[ 473.357738] sysrq: SysRq : Show Blocked State
[ 473.357758] task PC stack pid father
[ 473.357955] amdgpu_cs:0 D13176 2340 2283 0x00000000
[ 473.357969] Call Trace:
[ 473.357988] ? __schedule+0x2ed/0xba0
[ 473.358005] ? dma_fence_default_wait+0x14f/0x370
[ 473.358013] schedule+0x2f/0x90
[ 473.358021] schedule_timeout+0x23d/0x540
[ 473.358030] ? find_held_lock+0x34/0xa0
[ 473.358044] ? mark_held_locks+0x56/0x80
[ 473.358053] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 473.358065] ? dma_fence_default_wait+0x14f/0x370
[ 473.358072] dma_fence_default_wait+0x23b/0x370
[ 473.358081] ? dma_fence_release+0x170/0x170
[ 473.358094] dma_fence_wait_timeout+0x4f/0x270
[ 473.358176] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[ 473.358237] amdgpu_cs_ioctl+0x99/0x1d60 [amdgpu]
[ 473.358357] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 473.358383] drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 473.358409] drm_ioctl+0x2d5/0x370 [drm]
[ 473.358466] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 473.358479] ? __pm_runtime_resume+0x54/0x90
[ 473.358493] ? trace_hardirqs_on_caller+0xed/0x180
[ 473.358551] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 473.358566] do_vfs_ioctl+0xa5/0x6e0
[ 473.358589] SyS_ioctl+0x74/0x80
[ 473.358603] do_syscall_64+0x79/0x220
[ 473.358612] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 473.358678] RIP: 0033:0x7fa95fa5c0f7
[ 473.358683] RSP: 002b:00007fa957459998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 473.358692] RAX: ffffffffffffffda RBX: 00007fa957459a80 RCX: 00007fa95fa5c0f7
[ 473.358697] RDX: 00007fa957459a00 RSI: 00000000c0186444 RDI: 000000000000000b
[ 473.358701] RBP: 00007fa957459a00 R08: 00007fa957459ab0 R09: 00007fa9574599e0
[ 473.358706] R10: 00007fa957459ab0 R11: 0000000000000246 R12: 00000000c0186444
[ 473.358710] R13: 000000000000000b R14: 0000000002876fe8 R15: 0000000000000002
[ 473.358836] tracker-store D12456 2792 2166 0x00000000
[ 473.358848] Call Trace:
[ 473.358862] ? __schedule+0x2ed/0xba0
[ 473.358882] schedule+0x2f/0x90
[ 473.358889] io_schedule+0x12/0x40
[ 473.358898] generic_file_read_iter+0x39e/0xdb0
[ 473.358922] ? page_cache_tree_insert+0x130/0x130
[ 473.359001] xfs_file_buffered_aio_read+0x65/0x1a0 [xfs]
[ 473.359066] xfs_file_read_iter+0x64/0xc0 [xfs]
[ 473.359077] __vfs_read+0x102/0x170
[ 473.359100] vfs_read+0x9e/0x150
[ 473.359111] SyS_pread64+0x93/0xb0
[ 473.359119] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 473.359132] do_syscall_64+0x79/0x220
[ 473.359142] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 473.359148] RIP: 0033:0x7f7bb7448873
[ 473.359152] RSP: 002b:00007ffc37fd1220 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 473.359161] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f7bb7448873
[ 473.359166] RDX: 0000000000001000 RSI: 0000556e21670258 RDI: 0000000000000008
[ 473.359170] RBP: 0000000000001000 R08: 0000556e21670258 R09: 000000000fef0fff
[ 473.359175] R10: 0000000002761000 R11: 0000000000000293 R12: 0000000000000000
[ 473.359179] R13: 0000556e21670258 R14: 0000000002761000 R15: 0000556e214e9d80
[ 473.359343] kworker/u16:0 D12152 4711 2 0x80000000
[ 473.359370] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 473.359379] Call Trace:
[ 473.359394] ? __schedule+0x2ed/0xba0
[ 473.359410] ? dma_fence_default_wait+0x14f/0x370
[ 473.359418] schedule+0x2f/0x90
[ 473.359425] schedule_timeout+0x23d/0x540
[ 473.359433] ? find_held_lock+0x34/0xa0
[ 473.359448] ? mark_held_locks+0x56/0x80
[ 473.359456] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 473.359469] ? dma_fence_default_wait+0x14f/0x370
[ 473.359476] dma_fence_default_wait+0x23b/0x370
[ 473.359484] ? dma_fence_release+0x170/0x170
[ 473.359498] dma_fence_wait_timeout+0x4f/0x270
[ 473.359509] reservation_object_wait_timeout_rcu+0x193/0x4d0
[ 473.359607] amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[ 473.359761] amdgpu_dm_atomic_commit_tail+0xb66/0xdc0 [amdgpu]
[ 473.359777] ? wait_for_completion_timeout+0x76/0x1b0
[ 473.359826] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 473.359841] process_one_work+0x266/0x6b0
[ 473.359876] worker_thread+0x3a/0x390
[ 473.359883] ? process_one_work+0x6b0/0x6b0
[ 473.359886] kthread+0x121/0x140
[ 473.359890] ? kthread_create_worker_on_cpu+0x70/0x70
[ 473.359896] ret_from_fork+0x3a/0x50

Created attachment 138471 [details]
Shader runner link test
I've distilled one problem into the attached shader runner test. It seems we have another bug somewhere in the GLSL IR loop unrolling pass.
We end up with the following:
FRAG
DCL OUT[0], COLOR
DCL TEMP[0..3], LOCAL
IMM[0] UINT32 {0, 4294967295, 0, 0}
IMM[1] INT32 {0, 1, 0, 0}
IMM[2] FLT32 { 1.0000, 0.0000, 0.0000, 0.0000}
0: MOV TEMP[0].x, IMM[0].xxxx
1: MOV TEMP[1].x, IMM[1].xxxx
2: BGNLOOP
3: USEQ TEMP[2].x, TEMP[1].xxxx, IMM[1].yyyy
4: UIF TEMP[2].xxxx
5: BRK
6: ENDIF
7: MOV TEMP[3], IMM[2].xxxx
8: MOV TEMP[0].x, IMM[0].yyyy
9: BRK
10: UADD TEMP[1].x, TEMP[1].xxxx, IMM[1].yyyy
11: ENDLOOP
12: MOV OUT[0], IMM[2].xxxx
13: END
Terminator found in the middle of a basic block!
label %endif6
LLVM ERROR: Broken function found, compilation aborted!
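One plausible reading of the dump above is that instruction 10 (UADD) sits directly after the unconditional BRK at instruction 9, so it can never execute, and a translator that keeps appending to the same basic block after emitting the branch for BRK would trip the LLVM verifier in the way shown. The following is only a rough sketch for spotting that pattern in a TGSI listing. The helper name `dead_after_break` and the set of block-boundary opcodes are assumptions made for illustration; they are not part of Mesa.

```python
import re

# Instruction part of the TGSI dump quoted above (declarations omitted).
TGSI = """\
0: MOV TEMP[0].x, IMM[0].xxxx
1: MOV TEMP[1].x, IMM[1].xxxx
2: BGNLOOP
3: USEQ TEMP[2].x, TEMP[1].xxxx, IMM[1].yyyy
4: UIF TEMP[2].xxxx
5: BRK
6: ENDIF
7: MOV TEMP[3], IMM[2].xxxx
8: MOV TEMP[0].x, IMM[0].yyyy
9: BRK
10: UADD TEMP[1].x, TEMP[1].xxxx, IMM[1].yyyy
11: ENDLOOP
12: MOV OUT[0], IMM[2].xxxx
13: END
"""

# Assumption: these opcodes close or switch a structured block, so they are
# the only instructions that may legitimately follow a BRK.
BLOCK_BOUNDARIES = {"ENDIF", "ELSE", "ENDLOOP"}


def dead_after_break(listing):
    """Return (index, opcode) for each instruction that directly follows a
    BRK without an intervening block boundary (hypothetical helper)."""
    ops = []
    for line in listing.splitlines():
        match = re.match(r"\s*(\d+):\s+(\w+)", line)
        if match:
            ops.append((int(match.group(1)), match.group(2)))

    dead = []
    # Walk adjacent instruction pairs and flag non-boundary opcodes after BRK.
    for (_, prev_op), (index, op) in zip(ops, ops[1:]):
        if prev_op == "BRK" and op not in BLOCK_BOUNDARIES:
            dead.append((index, op))
    return dead


if __name__ == "__main__":
    for index, op in dead_after_break(TGSI):
        print(f"instruction {index} ({op}) follows an unconditional BRK")
```

The BRK at instruction 5 is followed by ENDIF and is therefore not flagged; only the UADD at instruction 10 is reported.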
*** Bug 104683 has been marked as a duplicate of this bug. ***

Piglit test: https://patchwork.freedesktop.org/patch/214341/
Mesa fix: https://patchwork.freedesktop.org/patch/214346/

Note the WebGL test still froze in my testing, but I think Firefox was continuing to use my system mesa libs for some reason. The mesa patch fixes the hang in the piglit test.

Likely a duplicate of this: https://bugs.freedesktop.org/show_bug.cgi?id=104817

This already landed in Mesa. Can we close this as fixed?

I don't think so, because if it happens again for another reason, the GPU will hang again. I would be happy if, in this case, GPU reset code were present in the driver.

I am also affected by this bug. I filed a bug with openSUSE Tumbleweed, and it was closed earlier this year. However, with the latest mesa updates the issue resurfaced, so I reopened the bug. This is the link: https://bugzilla.opensuse.org/show_bug.cgi?id=1090456

System Info:
OS: OpenSUSE tumbleweed x86_64 updated (2018 08 27)
Kernel: 4.18.0-1-default
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 3.1 Mesa 18.1.6
GPU: AMD Radeon RX Vega 64 8GB

Relevant log lines I found during the freeze:
2018-08-09T23:16:53.103775+08:00 MGDT-Tumbleweed kernel: [ 6305.852703] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1745163, last emitted seq= 1745165
2018-08-09T23:16:53.103795+08:00 MGDT-Tumbleweed kernel: [ 6305.852704] [drm] No hardware hang detected. Did some blocks stall?

Dmesg lines related to amdgpu:
[ 3.130759] [drm] amdgpu kernel modesetting enabled.
[ 3.135770] fb: switching to amdgpudrmfb from EFI VGA
[ 3.136106] amdgpu 0000:03:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[ 3.136171] amdgpu 0000:03:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[ 3.136173] amdgpu 0000:03:00.0: GTT: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
[ 3.136494] [drm] amdgpu: 8176M of VRAM memory ready
[ 3.136495] [drm] amdgpu: 8176M of GTT memory ready.
[ 4.114469] fbcon: amdgpudrmfb (fb0) is primary device
[ 4.141179] amdgpu 0000:03:00.0: fb0: amdgpudrmfb frame buffer device
[ 4.164072] amdgpu 0000:03:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[ 4.164074] amdgpu 0000:03:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[ 4.164075] amdgpu 0000:03:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[ 4.164075] amdgpu 0000:03:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[ 4.164076] amdgpu 0000:03:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[ 4.164077] amdgpu 0000:03:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[ 4.164078] amdgpu 0000:03:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[ 4.164079] amdgpu 0000:03:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[ 4.164079] amdgpu 0000:03:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[ 4.164080] amdgpu 0000:03:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[ 4.164081] amdgpu 0000:03:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[ 4.164082] amdgpu 0000:03:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[ 4.164083] amdgpu 0000:03:00.0: ring 12(uvd) uses VM inv eng 6 on hub 1
[ 4.164084] amdgpu 0000:03:00.0: ring 13(uvd_enc0) uses VM inv eng 7 on hub 1
[ 4.164085] amdgpu 0000:03:00.0: ring 14(uvd_enc1) uses VM inv eng 8 on hub 1
[ 4.164085] amdgpu 0000:03:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
[ 4.164086] amdgpu 0000:03:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
[ 4.164087] amdgpu 0000:03:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
[ 4.164553] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:03:00.0 on minor 0

As a side note, the freeze does not happen on my Kubuntu system. Same hardware, same games.

OS: Kubuntu 18.04 x86_64 updated (2018 08 27)
Kernel: 4.15.0-33-generic
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 3.0 Mesa 18.0.5
GPU: AMD Radeon RX Vega 64 8GB

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1307.
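Both `ring gfx timeout` lines quoted in this report carry a "last signaled" and a "last emitted" sequence number, and the gap between them suggests how many submitted jobs had not completed when the timeout fired. As a small illustration, the sketch below extracts the two values and computes that gap. The regular expression and the helper name `pending_fences` are assumptions written for this example, based only on the log format shown in this report.

```python
import re

# Matches the amdgpu timeout message as it appears in the logs in this
# report; \s* tolerates the stray space in "last emitted seq= 1745165".
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\w+) timeout, "
    r"last signaled seq=\s*(?P<signaled>\d+), "
    r"last emitted seq=\s*(?P<emitted>\d+)"
)


def pending_fences(line):
    """Return (ring name, emitted - signaled), or None if the line does not
    contain a ring timeout message (hypothetical helper)."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    gap = int(match.group("emitted")) - int(match.group("signaled"))
    return match.group("ring"), gap


if __name__ == "__main__":
    samples = [
        "[ 463.172901] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
        "timeout, last signaled seq=26958, last emitted seq=26960",
        "[ 6305.852703] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
        "timeout, last signaled seq=1745163, last emitted seq= 1745165",
    ]
    for sample in samples:
        print(pending_fences(sample))
```

In both reports the gap is 2 on the `gfx` ring, which fits the picture of a small number of submissions never signaling rather than a long backlog.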