105251 – [Vega10] GPU lockup on boot: VMC page fault

Summary: [Vega10] GPU lockup on boot: VMC page fault

Status:	RESOLVED MOVED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/AMDgpu
Version:	DRI git
Hardware:	Other Linux (All)

Importance:	medium blocker
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2018-02-26 09:39 UTC by Adrià Cereto i Massagué
Modified:	2019-11-19 08:31 UTC
CC List:	15 users

See Also:
i915 platform:
i915 features:

Attachments
Complete DMESG from boot to lockup (193.42 KB, application/x-troff-man) 2018-07-15 22:44 UTC, Barry G	no flags
glxinfo dump as requested (143.98 KB, text/plain) 2018-08-21 08:48 UTC, CheatCodesOfLife	no flags
debug files (2.28 MB, application/octet-stream) 2018-08-24 14:12 UTC, CheatCodesOfLife	no flags
logs/trace with amd-drm-next and GALLIUM_DDEBUG=always (13.39 MB, application/octet-stream) 2018-08-25 02:48 UTC, CheatCodesOfLife	no flags
amd3.tar.gz dmesg, trace, ddebug logs (3.64 MB, application/gzip) 2018-08-25 09:09 UTC, CheatCodesOfLife	no flags
ddebug_dumps/Cemu.exe_2244_00000000 dmesg_dump event_dump umr_dump (1.91 MB, application/octet-stream) 2018-08-27 23:14 UTC, CheatCodesOfLife	no flags
patch - fix ddebug BO list reporting (1.06 KB, patch) 2018-08-28 19:10 UTC, Marek Olšák	no flags
logs after building the patched mesa (2.10 MB, application/octet-stream) 2018-08-29 09:30 UTC, CheatCodesOfLife	no flags
amd6.tar.gz and amd7.tar.gz with usual logs, 2 attempts (2.22 MB, application/octet-stream) 2018-08-30 14:33 UTC, CheatCodesOfLife	no flags
logs and trace (2.13 MB, application/octet-stream) 2018-09-03 13:26 UTC, CheatCodesOfLife	no flags
Logs + trace with patched mesa, plus example code which consistently triggers crash. (4.42 MB, application/x-tar) 2018-09-11 08:36 UTC, zzyxpaw	no flags
vega_crasher Logs + trace with patched mesa (112.37 KB, application/octet-stream) 2018-09-11 10:01 UTC, CheatCodesOfLife	no flags
Gallium, UMR, Dmesg Dump Package (652.06 KB, application/x-7z-compressed) 2018-11-18 01:13 UTC, Benjamin Hodgetts	no flags
New Error Since 4.19.X (21.25 KB, text/plain) 2018-12-13 00:46 UTC, Benjamin Hodgetts	no flags
vega_crasher after patch, black central output, on ryzen 2200G with vega 8 graphics (54.24 KB, image/png) 2019-07-22 21:03 UTC, deltasquared	no flags
vega_crasher after patch, colour shaded central output, on ryzen 2200G with vega 8 graphics (55.14 KB, image/png) 2019-07-22 21:06 UTC, deltasquared	no flags

Description Adrià Cereto i Massagué 2018-02-26 09:39:09 UTC

Happens on Linux > 4.16 (also on the amd-staging-4.17-wip branch) but not on 4.15.

Here are the relevant lines from dmesg:

[   33.835186] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835188] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835189] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   33.835193] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835195] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835196] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   33.835200] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835202] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835203] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   33.835207] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835208] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835210] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   33.835214] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835215] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835217] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   33.835220] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835222] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835223] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   33.835227] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835229] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835230] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   33.835234] amdgpu 0000:26:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:1 pas_id:0)
[   33.835235] amdgpu 0000:26:00.0:   at page 0x0000000100000000 from 27
[   33.835237] amdgpu 0000:26:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[   43.998837] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=6, last emitted seq=7
[   43.998848] [drm] No hardware hang detected. Did some blocks stall?
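
Because the same three-line message repeats many times, it can help to collapse the log into a count per faulting address before comparing reports. A minimal Python sketch follows. It assumes the message format quoted above and chronological order. `journalctl -r` output, which is reversed, would need to be un-reversed first. The regular expressions are written only against the lines in this report and may need adjusting for other kernel versions, and the function names are illustrative.

```python
import re
from collections import Counter

# Written against the dmesg lines quoted in this report; later kernels print
# "pasid:" instead of "pas_id:", which the optional underscore covers.
HEADER_RE = re.compile(r"VMC page fault \(src_id:(\d+) ring:(\d+) vmid:(\d+) pas_?id:(\d+)\)")
PAGE_RE = re.compile(r"at page (0x[0-9a-fA-F]+) from (\d+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def summarize_faults(lines):
    """Return (faults, statuses) counters.

    faults maps (ring, vmid, page, client) -> number of occurrences;
    statuses maps each raw status register value -> number of occurrences.
    """
    faults = Counter()
    statuses = Counter()
    header = None  # (ring, vmid) from the most recent fault header line
    for line in lines:
        m = HEADER_RE.search(line)
        if m:
            header = (int(m.group(2)), int(m.group(3)))
            continue
        m = PAGE_RE.search(line)
        if m and header is not None:
            faults[header + (int(m.group(1), 16), int(m.group(2)))] += 1
            continue
        m = STATUS_RE.search(line)
        if m:
            statuses[int(m.group(1), 16)] += 1
    return faults, statuses


def print_summary(faults, statuses):
    """Print the most frequent faulting pages and status values first."""
    for (ring, vmid, page, client), count in faults.most_common():
        print(f"ring {ring} vmid {vmid} page {page:#018x} client {client}: {count}x")
    for status, count in statuses.most_common():
        print(f"status {status:#010x}: {count}x")
```

For example, `print_summary(*summarize_faults(open("dmesg.txt")))` summarizes a saved log. The file name is hypothetical. For the excerpt above, it reports a single page, 0x0000000100000000, repeated on ring 158, vmid 1.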

Comment 1 coolo1mc 2018-04-28 06:10:20 UTC

Getting the exact same issue with my Vega 56: the system hangs when I log in to LightDM, the fans spin up and get louder and louder, and shutting down doesn't work. Reverting to 4.15 didn't seem to fix the issue either, even though it was working fine before I upgraded to 4.16.

Comment 2 Stefan 2018-04-29 13:29:51 UTC

Also happens on Manjaro KDE with kernels 4.16 through the latest 4.17rc:

[ 8164.289086] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289091] amdgpu 0000:38:00.0:   at page 0x000000010d203000 from 27
[ 8164.289093] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[ 8164.289099] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289101] amdgpu 0000:38:00.0:   at page 0x000000010d205000 from 27
[ 8164.289103] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289109] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289110] amdgpu 0000:38:00.0:   at page 0x000000010d20b000 from 27
[ 8164.289112] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289118] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289119] amdgpu 0000:38:00.0:   at page 0x000000010d20d000 from 27
[ 8164.289121] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289126] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289128] amdgpu 0000:38:00.0:   at page 0x000000010d201000 from 27
[ 8164.289129] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289135] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289136] amdgpu 0000:38:00.0:   at page 0x000000010d207000 from 27
[ 8164.289138] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289143] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289145] amdgpu 0000:38:00.0:   at page 0x000000010d209000 from 27
[ 8164.289146] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289152] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289153] amdgpu 0000:38:00.0:   at page 0x000000010d201000 from 27
[ 8164.289154] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289160] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289161] amdgpu 0000:38:00.0:   at page 0x000000010d20e000 from 27
[ 8164.289163] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8164.289168] amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
[ 8164.289170] amdgpu 0000:38:00.0:   at page 0x000000010d212000 from 27
[ 8164.289171] amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 8174.340966] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=401175, last emitted seq=401177
[ 8174.340974] [drm] No hardware hang detected. Did some blocks stall?
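
Only the first status value in each burst is non-zero, so that value is the one worth decoding. The sketch below splits it into named bit fields. The field layout (shifts and masks) is an assumption taken from the upstream gfx9 (Vega10) register headers and should be checked against your kernel's headers before relying on it. Under that layout, 0x00101031 decodes to MORE_FAULTS=1, PERMISSION_FAULTS=3, CID=8 and VMID=1. The VMID matches the `vmid:1` printed in the header line. Mapping CID numbers to hardware clients is not covered here.

```python
# Assumed bit layout of VM_L2_PROTECTION_FAULT_STATUS (gfx9 / Vega10 headers):
# field name -> (shift, mask applied after shifting). Verify before relying on it.
FAULT_STATUS_FIELDS = {
    "MORE_FAULTS": (0, 0x1),
    "WALKER_ERROR": (1, 0x7),
    "PERMISSION_FAULTS": (4, 0xF),
    "MAPPING_ERROR": (8, 0x1),
    "CID": (9, 0x1FF),
    "RW": (18, 0x1),
    "ATOMIC": (19, 0x1),
    "VMID": (20, 0xF),
    "VF": (24, 0x1),
    "VFID": (25, 0xF),
}


def decode_fault_status(value):
    """Split a raw VM_L2_PROTECTION_FAULT_STATUS value into named fields."""
    return {name: (value >> shift) & mask
            for name, (shift, mask) in FAULT_STATUS_FIELDS.items()}


def format_fault_status(value):
    """One-line human-readable rendering of a status value."""
    fields = decode_fault_status(value)
    return f"{value:#010x}: " + ", ".join(f"{k}={v:#x}" for k, v in fields.items())
```

An all-zero status decodes to all-zero fields, so the repeated `0x00000000` lines carry no extra information under this layout.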

Comment 3 Stefan 2018-04-29 13:54:05 UTC

Vega8 / Ryzen 2400G btw.

Comment 4 Levis Raju 2018-05-24 05:24:51 UTC

amdgpu 0000:38:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768)
amdgpu 0000:38:00.0:   at page 0x000000010760d000 from 27
amdgpu 0000:38:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031

Got the issue on kernel 4.17-rc6 with Mesa 18.2 built against LLVM 7.0.
2400G with Vega 11 Graphics.

Comment 5 coolo1mc 2018-05-25 14:22:55 UTC

Is there any additional info we need to get? Anything we can test? My system is unusable until this is fixed. It has been 3 months since this was reported, and we haven't heard anything except more reports.

Comment 6 dxxf 2018-06-01 17:31:30 UTC

It seems I'm now affected by this bug too...

Hardware:
GPU: RX Vega 64 Liquid
CPU: Ryzen R7 1800X

Software:
OS: OpenSUSE Tumbleweed
Kernel: 4.17rc5 (from OpenSUSE Factory repos)
Mesa: 18.1.0 (from OpenSUSE Tumbleweed repos)

Kernel log - "journalctl -b -1 -r | grep amdgpu":
May 31 20:38:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2, last emitted seq=3
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x001013BD
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:35:48 kernel: [drm] Initialized amdgpu 3.25.0 20150101 for 0000:0d:00.0 on minor 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 14(uvd_enc1) uses VM inv eng 8 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 13(uvd_enc0) uses VM inv eng 7 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 12(uvd) uses VM inv eng 6 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: fb0: amdgpudrmfb frame buffer device
May 31 20:35:48 kernel: fbcon: amdgpudrmfb (fb0) is primary device
May 31 20:35:47 kernel: [drm] amdgpu: 8176M of GTT memory ready.
May 31 20:35:47 kernel: [drm] amdgpu: 8176M of VRAM memory ready
May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: GTT: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
May 31 20:35:47 kernel: [drm] add ip block number 6 <gfx_v9_0>
May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: enabling device (0006 -> 0007)
May 31 20:35:47 kernel: fb: switching to amdgpudrmfb from EFI VGA
May 31 20:35:47 kernel: [drm] amdgpu kernel modesetting enabled.

VMC page faults are not always in the log, but the "amdgpu_job_timedout" error is persistent:
May 31 20:38:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2, last emitted seq=3
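
The timeout message carries two fence sequence numbers. One is the last fence the GPU signaled and the other is the last fence the driver emitted. Our reading is that their difference is the number of submissions on that ring still unfinished when the timeout fired. That interpretation is not stated in this report. A minimal parser for the message as quoted here follows. The function name is illustrative.

```python
import re

# Matches the amdgpu_job_timedout message as quoted in this report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return the ring name, both sequence numbers and their gap, or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return {
        "ring": m.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # fences emitted by the driver but not yet signaled by the GPU
        "pending": emitted - signaled,
    }
```

In the logs quoted so far the gap is 1 (seq=6/7, seq=2/3) or 2 (seq=401175/401177).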

Comment 7 coolo1mc 2018-06-04 14:00:17 UTC

I discovered that the cause for me was pywal: when I ran it, my GPU hung, but when I didn't, everything was fine. Another trigger is Cemu through Wine with mesa_mild.

Comment 8 dxxf 2018-06-04 17:52:48 UTC

For me it hangs immediately at boot (as soon as Xorg loads). The only way I was able to boot the machine successfully was by setting "NoAccel" "True" in Xorg.conf.d/10-amdgpu.conf.
In some cases there is nothing in dmesg or Xorg.0.log; the machine just hangs with a "cursor" on the screen.

Comment 9 dxxf 2018-06-04 18:39:55 UTC

So one more update:
My boot issue went away after updating:
- kernel-firmware to 20180525 (as there were some amdgpu firmware updates in 20180518).
- libLLVM6 from 6.0.0rc1 to 6.0.0 (and I strongly suspect this was the cause as I had VM pagefault issues before with libLLVM5 - but only in some OpenGL applications, not at boot).

Comment 10 Christian König 2018-06-05 08:50:22 UTC

Hi everybody,

first of all please add logs as attachments and not inline into the bug report.

Then make sure that the firmware files are up to date. It looks like we accidentally released corrupted firmware files once, but those should already be replaced with working versions.

Comment 11 Barry G 2018-07-15 22:44:34 UTC

Created attachment 140645 [details]
Complete DMESG from boot to lockup

Comment 12 Barry G 2018-07-15 22:56:30 UTC

My system (Threadripper, Vega 64) started exhibiting the same issue on 4.17. It locks hard for me under IO. I have a custom Python script that does NFS IO over my X550 network card and then invokes imagemagick/convert to generate thumbnails. I wasn't experiencing issues on 4.16. 4.17.5 locks up every time I run the script; see the attached ciri example dmesg.

The only other system change I have made recently is the addition of the opencl-amd package, in a failed attempt to make DaVinci Resolve run (https://aur.archlinux.org/packages/opencl-amd/).

My linux-firmware package is linux-firmware-git 20171125.17e6288-1.  There might be a better way to get the firmware version from the card itself, but I don't possess such knowledge (yet).

Barry
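
On the question of reading the firmware versions from the card itself: amdgpu exposes an `amdgpu_firmware_info` file in debugfs. The sketch below reads it. The path layout is an assumption: it expects debugfs mounted at /sys/kernel/debug and the Vega card as DRM minor 0. Reading it usually requires root. The function name is illustrative.

```python
from pathlib import Path


# Assumption: debugfs is mounted at /sys/kernel/debug and the card of interest
# is DRM minor 0; pass a different card_index on multi-GPU systems.
def read_amdgpu_firmware_info(card_index=0, debugfs_root="/sys/kernel/debug"):
    """Return the raw text of the amdgpu firmware info file for one card."""
    path = Path(debugfs_root) / "dri" / str(card_index) / "amdgpu_firmware_info"
    try:
        return path.read_text()
    except PermissionError:
        raise PermissionError(f"{path}: reading debugfs usually requires root") from None
```

Comparing this output before and after a linux-firmware update shows whether the new files were actually loaded.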

Comment 13 Barry G 2018-07-15 23:09:18 UTC

I upgraded my linux-firmware to 20180606.d114732-1 and it had no effect on the issue. It still locks up when running the script, with the same dmesg.

Comment 14 Barry G 2018-07-16 15:13:57 UTC

Did some more testing and found that I can cause this issue to happen repeatably by using Imagemagick convert to attempt to convert and resize a jpg image.

Running the same convert command with the environment variable MAGICK_OCL_DEVICE=OFF set works without a lockup.

Some sort of OpenCL thing?
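
Because the thumbnail script is Python, the MAGICK_OCL_DEVICE=OFF workaround can be applied to the convert child process only, without changing the rest of the environment. The sketch below does that. The function names, the `-resize` geometry and the file names are hypothetical. This only sidesteps the lockup and does not fix it.

```python
import os
import subprocess


def opencl_disabled_env(base=None):
    """Copy of the environment with ImageMagick's OpenCL acceleration turned off."""
    env = dict(os.environ if base is None else base)
    env["MAGICK_OCL_DEVICE"] = "OFF"
    return env


def make_thumbnail(src, dst, geometry="256x256"):
    # Same convert invocation as before, but OpenCL is disabled for this child only.
    subprocess.run(["convert", src, "-resize", geometry, dst],
                   env=opencl_disabled_env(), check=True)
```

Usage: `make_thumbnail("photo.jpg", "photo_thumb.jpg")`.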

Comment 15 dergottdergrunten 2018-08-08 23:40:42 UTC

I think my issue is related. I get a black screen on roughly every second or third boot. This last time, it booted up and kernel panicked, and I could still see the output, so I took some pictures.

https://imgur.com/gallery/T69zIjX

Info:

$ uname -a
Linux itx-dev.local 4.17.12-arch1-1-ARCH #1 SMP PREEMPT Fri Aug 3 07:16:41 UTC 2018 x86_64 GNU/Linux

$ cat /proc/cpuinfo
processor	: 5
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 17
model name	: AMD Ryzen 5 2400G with Radeon Vega Graphics
stepping	: 0

$ pacman -Qs amdgpu
local/xf86-video-amdgpu 18.0.1-2 (xorg-drivers)
    X.org amdgpu video driver

$ pacman -Qs mesa
local/glu 9.0.0-5
    Mesa OpenGL Utility library
local/lib32-libva-mesa-driver 18.1.5-1
    VA-API implementation for gallium (32-bit)
local/lib32-mesa 18.1.5-1
    An open-source implementation of the OpenGL specification (32-bit)
local/lib32-vulkan-radeon 18.1.5-1
    Radeon's Vulkan mesa driver (32-bit)
local/libva-mesa-driver 18.1.5-1
    VA-API implementation for gallium
local/mesa 18.1.5-1
    An open-source implementation of the OpenGL specification
local/mesa-vdpau 18.1.5-1
    Mesa VDPAU drivers
local/vulkan-radeon 18.1.5-1
    Radeon's Vulkan mesa driver

Comment 16 Andrey Grodzovsky 2018-08-14 19:19:50 UTC

Hi everyone, I've tried with latest kernel and latest VEGA10 firmware and wasn't able to reproduce this problem.

From the logs it seems all of you are running a 4.17.x kernel or earlier - try the latest 4.18 and the latest firmware from here -

https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

Comment 17 CheatCodesOfLife 2018-08-20 06:30:14 UTC

(In reply to Andrey Grodzovsky from comment #16)
> Hi everyone, I've tried with latest kernel and latest VEGA10 firmware and
> wasn't able to reproduce this problem.
> 
> From the logs it seems all of you are running 4.17.x kernel or earlier - try
> latest 4.18 and latest firmware form here -
> 
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

Hi,

I can reproduce this every time, on kernel 4.18 with mesa 18.3 and a Vega64.

Simply try to open Mario Kart 8 in Cemu with wine, and the system will crash with the exact same dmesg.

Comment 18 Andrey Grodzovsky 2018-08-20 15:13:30 UTC

(In reply to CheatCodesOfLife from comment #17)
> Hi,
> 
> I can reproduce this every time, on kernel 4.18 with mesa 18.3 and a Vega64.
> 
> Simply try to open Mario Kart 8 in Cemu with wine, and the system will crash
> with the exact same dmesg.

I had Mesa 18.2, so I updated to 18.3 - still nothing. Could you provide a glxinfo dump? What LLVM version are you using? I have 7.

Comment 19 CheatCodesOfLife 2018-08-21 08:46:24 UTC

(In reply to Andrey Grodzovsky from comment #18)
> I had mesa 18.2 so I updated to 18.3 - still nothing. Could you provide
> glxinfo dump ? What LLVM are you using ? I have 7.

I have had this problem with mesa 18.2 and LLVM7.

Currently on mesa 18.3 and LLVM8.

I also had this result with a Vega56, and I know people online who have the same problem. Nobody can open Mario Kart 8 in Cemu with wine if they have a Vega card. 

I've attached my glxinfo > glxinfo.txt

Comment 20 CheatCodesOfLife 2018-08-21 08:48:55 UTC

Created attachment 141210 [details]
glxinfo dump as requested

Comment 21 Andrey Grodzovsky 2018-08-21 14:56:19 UTC

(In reply to CheatCodesOfLife from comment #20)
> Created attachment 141210 [details]
> glxinfo dump as requested

Thanks for the info. Is there any other way you can reproduce it without the Wine platform?

Comment 22 CheatCodesOfLife 2018-08-21 15:12:34 UTC

You're welcome.

Not the exact same problem, no. I can get a hard-lock by trying to use amdvlk to play rpcs3, but it doesn't produce the same error and it's not as consistent (takes up to 15 minutes to crash)

Not sure if it's worth noting but I went back and tried every Cemu version back to 1.5 and a lot of wine versions going back to 2.8. It happens every time as soon as the game loads.

Comment 23 Andrey Grodzovsky 2018-08-22 20:21:43 UTC

(In reply to CheatCodesOfLife from comment #22)
> You're welcome.
> 
> Not the exact same problem, no. I can get a hard-lock by trying to use
> amdvlk to play rpcs3, but it doesn't produce the same error and it's not as
> consistent (takes up to 15 minutes to crash)
> 
> Not sure if it's worth noting but I went back and tried every Cemu version
> back to 1.5 and a lot of wine versions going back to 2.8. It happens every
> time as soon as the game loads.

Let's try to get some debug info for the VMC page fault then:

Clone and build our open-source register analyzer from here - https://cgit.freedesktop.org/amd/umr/
Install the trace-cmd utility.
Load the driver with the kernel command-line parameter amdgpu.vm_fault_stop=2 (set it from GRUB).
P.S. It's best to use the latest kernel from here - https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

After the desktop has loaded, type the following to enable the kernel event tracing log:

sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"

If the game can be launched from a shell, prepend GALLIUM_DDEBUG=always to the command to dump all the Mesa commands into files in ~/ddebug_dumps/.

Start the game. When the problem happens, do the following:

as root, from a writable directory such as your home directory
cat /sys/kernel/debug/tracing/trace > event_dump

as normal user or root
sudo umr -lb > umr_dump
sudo umr -O verbose,use_colour -R gfx[.] >> umr_dump
sudo umr -O halt_waves,use_colour -wa >> umr_dump
dmesg > dmesg_dump

Upload a tar/zip of all those files + all the files from ~/ddebug_dumps/
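
The capture steps can be scripted so that nothing is forgotten once the fault happens, for example when running them from an SSH session. The sketch below assumes umr is installed, tracing was started with the trace-cmd command above, and the whole script runs as root, so no `sudo` is needed. The command lines are the ones given above. `capture`, `pack`, the dry-run mode and the archive layout are hypothetical.

```python
import subprocess
import tarfile
from pathlib import Path

TRACE_FILE = "/sys/kernel/debug/tracing/trace"

# (output file, command, append?) mirroring the steps above. Arguments are
# passed without a shell, so "gfx[.]" reaches umr literally (no globbing).
CAPTURE_STEPS = [
    ("event_dump", ["cat", TRACE_FILE], False),
    ("umr_dump", ["umr", "-lb"], False),
    ("umr_dump", ["umr", "-O", "verbose,use_colour", "-R", "gfx[.]"], True),
    ("umr_dump", ["umr", "-O", "halt_waves,use_colour", "-wa"], True),
    ("dmesg_dump", ["dmesg"], False),
]


def capture(outdir, dry_run=False):
    """Run each step, writing its stdout into outdir; return the files written."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, cmd, append in CAPTURE_STEPS:
        target = outdir / name
        if dry_run:
            print(" ".join(cmd), ">>" if append else ">", target)
        else:
            with open(target, "a" if append else "w") as fh:
                subprocess.run(cmd, stdout=fh, check=True)
        if target not in written:
            written.append(target)
    return written


def pack(outdir, ddebug_dir, archive):
    """Bundle the capture files and (if present) the ddebug dumps into a .tar.gz."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(outdir, arcname="capture")
        ddebug_dir = Path(ddebug_dir).expanduser()
        if ddebug_dir.is_dir():
            tar.add(ddebug_dir, arcname="ddebug_dumps")
```

Running `capture("vm_fault", dry_run=True)` first prints the commands without executing them, which is a cheap way to check the plan. Because the script runs as root, `~` would expand to root's home. Pass the user's dump directory explicitly, for example `pack("vm_fault", "/home/<user>/ddebug_dumps", "vm_fault.tar.gz")`.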

Comment 24 CheatCodesOfLife 2018-08-24 14:11:55 UTC

(In reply to Andrey Grodzovsky from comment #23)
> Let's try to get some debug info for the VMC page fault then -  
> 
> Clone and build our open source register analyzer from here -
> https://cgit.freedesktop.org/amd/umr/ 
> Install trace-cmd utility 
> Load driver with cmd line parameter amdgpu.vm_fault_stop=2 from grub
> P.S Best to use latest kernel from here -
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> 
> After desktop is loaded type 
> 
> sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e
> "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
> to enable kernel event tracing log
> 
> If possible to launch the game from shell then prepend the command with 
> GALLIUM_DDEBUG=always 
> to dump all the MESA commands into files in ~/ddebug_dumps/
> 
> 
> Start the game. When the problem happens do the following - 
> 
> as root 
> cd /sys/kernel/debug/tracing && cat trace > event_dump
> 
> as normal user or root
> sudo umr -lb > umr_dump
> sudo umr -O verbose,use_colour -R gfx[.] >> umr_dump
> sudo umr -O halt_waves,use_colour -wa >> umr_dump
> dmesg > dmesg_dump
> 
> Upload a tar/zip of all those files + all the files from ~/ddebug_dumps/

Thanks for the instructions. I think I've followed them correctly. I didn't build the amd-drm-next kernel as it'll be an overnight job (slow internet speeds) but I did add the grub parameters. I have attached the files.

Comment 25 CheatCodesOfLife 2018-08-24 14:12:59 UTC

Created attachment 141269 [details]
debug files

Comment 26 Andrey Grodzovsky 2018-08-24 14:24:36 UTC

(In reply to CheatCodesOfLife from comment #25)
> Created attachment 141269 [details]
> debug files

Thanks a lot, I will find some time in the next few days to analyze it.

Comment 27 Andrey Grodzovsky 2018-08-24 16:39:13 UTC

(In reply to CheatCodesOfLife from comment #25)
> Created attachment 141269 [details]
> debug files

Since your kernel build doesn't have the latest AMD code, I don't have ALL the trace logs, so I can't be certain. But it does look like the address reported by the GPU fault is a bad address: it's above any VA range seen in the logs.
I would need you to run Cemu.exe with the GALLIUM_DDEBUG=always environment variable and upload the logs from ~/ddebug_dumps/.
From googling, it looks like WINE passes any environment variables set in the shell down to the apps it runs, so this should be easy: just run GALLIUM_DDEBUG=always 'WINE launch commands' from the shell.
Also provide all the other logs like last time.
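
The "above any VA range" conclusion comes from comparing the faulting address with the virtual-address ranges seen in the trace. That comparison can be written as a small helper. This sketch assumes the ranges do not overlap and are expressed in the same unit as the fault address (bytes vs. GPU pages), which should be checked first. The function name and the ranges in the example are made up.

```python
from bisect import bisect_right


def locate(addr, ranges):
    """Classify addr against inclusive (start, end) ranges.

    Returns ("inside", range), ("between", (lower, upper)), ("below", first)
    or ("above", last). Assumes non-overlapping ranges in the same unit as addr.
    """
    if not ranges:
        raise ValueError("no ranges to compare against")
    ranges = sorted(ranges)
    starts = [start for start, _ in ranges]
    i = bisect_right(starts, addr)  # number of ranges starting at or below addr
    if i == 0:
        return ("below", ranges[0])
    start, end = ranges[i - 1]
    if addr <= end:
        return ("inside", (start, end))
    if i == len(ranges):
        return ("above", (start, end))
    return ("between", (ranges[i - 1], ranges[i]))
```

For example, `locate(0x5000000, [(0x100000, 0x1FFFFF), (0x400000, 0x4FFFFF)])` returns `("above", (0x400000, 0x4FFFFF))`. That is the pattern described for the reported fault address.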

Comment 28 Andrey Grodzovsky 2018-08-24 19:35:16 UTC

Also, please verify that your Mesa build includes the following fix -
https://cgit.freedesktop.org/mesa/mesa/commit/id=c5c6e0187fd5d535c304ca3fd62de0f5e636c0c2

I assume you are running WINE with Mesa?

Comment 29 Andrey Grodzovsky 2018-08-24 19:37:24 UTC

Sorry, this is the correct link:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=c5c6e0187fd5d535c304ca3fd62de0f5e636c0c2

Comment 30 CheatCodesOfLife 2018-08-25 02:48:45 UTC

Created attachment 141276 [details]
logs/trace with amd-drm-next and GALLIUM_DDEBUG=always

Comment 31 CheatCodesOfLife 2018-08-25 02:49:50 UTC

(In reply to Andrey Grodzovsky from comment #29)
> Sorry , this link 
> https://cgit.freedesktop.org/mesa/mesa/commit/
> ?id=c5c6e0187fd5d535c304ca3fd62de0f5e636c0c2

Yeah, I am using Mesa. I've set up the amd-drm-next kernel. This is the command I used to launch Cemu:

GALLIUM_HUD="fps" GALLIUM_DDEBUG=always wine64 Cemu.exe

(I switched on the GALLIUM_HUD as well so that I could verify that wine was receiving the ENVs, which it is.)

uname -a
Linux nihonium2 4.18.0-rc1-5024f8dfe478 #1 SMP PREEMPT Sat Aug 25 05:10:49 AEST 2018 x86_64 GNU/Linux

The logs are attached (this time it took 3 tries to actually launch cemu due to an unrelated issue so the archive is 14mb)
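
Another way to confirm that Wine received the variables, besides switching on GALLIUM_HUD, is to read the running process's environment from /proc/<pid>/environ. That file records the environment the process was started with. Later changes the program makes to its own environment do not show up there. A minimal sketch follows. Matching Wine processes by "Cemu.exe" in their command line is an assumption about how Wine presents them, and reading another user's environ needs root. The function names are illustrative.

```python
from pathlib import Path


def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE bytes found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip empty trailing entries
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def pids_matching(fragment):
    """PIDs whose command line contains fragment (e.g. "Cemu.exe")."""
    proc_root = Path("/proc")
    if not proc_root.is_dir():
        return []
    needle = fragment.encode()
    pids = []
    for proc in proc_root.iterdir():
        if not proc.name.isdigit():
            continue
        try:
            if needle in (proc / "cmdline").read_bytes():
                pids.append(int(proc.name))
        except OSError:  # process exited or is not readable
            continue
    return pids


def gallium_vars(pid):
    """GALLIUM_* variables the process was started with."""
    raw = Path(f"/proc/{pid}/environ").read_bytes()
    return {k: v for k, v in parse_environ(raw).items() if k.startswith("GALLIUM_")}
```

Usage: `{pid: gallium_vars(pid) for pid in pids_matching("Cemu.exe")}` should show `GALLIUM_DDEBUG` for the game's process(es).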

Comment 32 Andrey Grodzovsky 2018-08-25 05:59:47 UTC

Looks like dmesg is missing. Can you recover the correct dmesg log for this last reproduction? The bad address is in there.

Comment 33 CheatCodesOfLife 2018-08-25 09:09:18 UTC

Created attachment 141277 [details]
amd3.tar.gz dmesg, trace, ddebug logs

Sorry about that.
I don't have that dmesg any more but I did the whole process again and attached it. This time I have confirmed all the files are in the archive.

Comment 34 Andrey Grodzovsky 2018-08-27 16:12:18 UTC

(In reply to CheatCodesOfLife from comment #33)
> Created attachment 141277 [details]
> amd3.tar.gz dmesg, trace, ddebug logs
> 
> Sorry about that.
> I don't have that dmesg any more but I did the whole process again and
> attached it. This time I have confirmed all the files are in the archive.

But where is the trace file? :) Anyway, I will try to check with what I have.

Comment 35 Andrey Grodzovsky 2018-08-27 17:19:00 UTC

(In reply to Andrey Grodzovsky from comment #34)
> But where is the trace file ? :) Any way I will try to check with what I
> have.

Anyway, it doesn't matter. Could you please redo the capture, this time using GALLIUM_DDEBUG=1000 instead of GALLIUM_DDEBUG=always? That way we get one big dump file with all the info when the VM_FAULT happens.

Comment 36 CheatCodesOfLife 2018-08-27 23:14:21 UTC

Created attachment 141303 [details]
ddebug_dumps/Cemu.exe_2244_00000000 dmesg_dump  event_dump  umr_dump

Hi,

This time I double-checked the tar archive: the trace, dmesg, umr and ddebug files are all there. It's just one ddebug file this time, as you said, but it's only 368 KB.

Command I used was:
GALLIUM_HUD="fps" GALLIUM_DDEBUG=1000 wine64 Cemu.exe

Comment 37 Marek Olšák 2018-08-28 19:10:50 UTC

Created attachment 141323 [details] [review]
patch - fix ddebug BO list reporting

Hi,

Can you please get a new ddebug report with the attached patch? Thanks.

Comment 38 Andrey Grodzovsky 2018-08-28 19:20:24 UTC

(In reply to Marek Olšák from comment #37)
> Created attachment 141323 [details] [review]
> patch - fix ddebug BO list reporting
> 
> Hi,
> 
> Can you please get a new ddebug report with the attached patch? Thanks.

Just to be clear, you need to rebuild your Mesa library with that patch applied on top.

Comment 39 CheatCodesOfLife 2018-08-29 09:30:20 UTC

Created attachment 141342 [details]
logs after building the patched mesa

Hi,

Thanks for the logging patch. 

I applied the patch on top of the latest master branch from the Mesa GitHub page, built it and ran the game again with the new version.

The logs are attached.

Comment 40 Andrey Grodzovsky 2018-08-29 14:32:59 UTC

Marek Olšák, I still don't see the expected debug output; I looked for 'Buffer list'.
CheatCodesOfLife, can you please verify that you are running the patched version of Mesa? We tested the new prints yesterday and they do show up on VM_FAULTs.

Comment 41 CheatCodesOfLife 2018-08-29 22:54:15 UTC

(In reply to Andrey Grodzovsky from comment #40)
> Marek Olšák, I still don't see the expected debug output. I looked for
> 'Buffer list'
> CheatCodesOfLife, can you verify please you are running the patched version
> of MESA ? We tested yesterday the new prints and they do show on VM_FAULTs.

This is most likely my fault as I'm new to most of this sort of thing. This is what I did, maybe you'll see where I went wrong:

- Patch
This is the patched version of src/gallium/drivers/radeonsi/si_gfx_cs.c
http://termbin.com/ypet

- Build
I installed this build of Mesa to a different prefix rather than overwriting my system install (I use this computer for everything, including work).

System install:
glxinfo |grep Mesa\ 18
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.3.0-devel (git-e345247092)
OpenGL version string: 4.4 (Compatibility Profile) Mesa 18.3.0-devel (git-e345247092)
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.3.0-devel (git-e345247092)

New build:

OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.3.0-devel (git-a72dbc461b)
OpenGL version string: 4.4 (Compatibility Profile) Mesa 18.3.0-devel (git-a72dbc461b)
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.3.0-devel (git-a72dbc461b)

- Running:
I then ran Cemu like this:
LD_LIBRARY_PATH=/home/paul/mesa_log/lib/ GALLIUM_HUD="fps" GALLIUM_DDEBUG=1000 wine64 Cemu.exe

I know Wine lets you do this, because this is how we used a Mesa fork called 'mesa_mild' to get the required compatibility profile before Mesa 18.2, which provided the 4.4 compatibility profile.

If installing to a prefix like that isn't adequate for this testing, let me know and I'll re-install the OS on an external drive, do a system-wide install of this patched mesa and try again.

Comment 42 Marek Olšák 2018-08-30 01:37:40 UTC

The file is incomplete, but I don't know why. Can you try it again? Maybe it'll be complete next time. It's better to use the REISUB key sequence to reboot the machine. (put it in google)

Comment 43 CheatCodesOfLife 2018-08-30 02:03:31 UTC

(In reply to Marek Olšák from comment #42)
> The file is incomplete, but I don't know why. Can you try it again? Maybe
> it'll be complete next time. It's better to use the REISUB key sequence to
> reboot the machine. (put it in google)

Hi Marek,

Yep, I'll do this tonight (including the REISUB to reboot).

In which file should I grep for 'Buffer list' to ensure it's worked before posting here?

And is it fine that I've sandboxed the install to /home/paul/mesa_log rather than doing a system install?

Comment 44 Marek Olšák 2018-08-30 02:41:18 UTC

If glxinfo picks up the correct driver, it's fine.

The ddebug file should contain "Buffer list".
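Comment 44 gives the success criterion: a usable ddebug dump contains the string "Buffer list". From a shell, grep -c "Buffer list" on the dump answers that directly. The following is a minimal C sketch of the same check, for cases where dumps are collected by a small tool; file_contains is a hypothetical helper name, not part of Mesa or umr.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `needle` occurs anywhere in the file at
 * `path`, 0 if it does not, and -1 if the file cannot be read. The whole file
 * is loaded into memory, which is fine for dumps of a few megabytes. */
static int file_contains(const char *path, const char *needle)
{
    assert(path != NULL && needle != NULL);
    size_t nlen = strlen(needle);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    /* Grow the buffer until the whole file has been read. */
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                fclose(f);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    int read_error = ferror(f);
    fclose(f);

    /* Byte-wise search, so stray NUL bytes in a dump don't end it early. */
    int found = 0;
    for (size_t i = 0; !read_error && !found && i + nlen <= len; i++)
        found = memcmp(buf + i, needle, nlen) == 0;

    free(buf);
    return read_error ? -1 : found;
}
```

Calling file_contains(path, "Buffer list") on a complete dump returns 1; a dump that stops mid-register, like the one shown in comment 45, returns 0.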

Comment 45 CheatCodesOfLife 2018-08-30 14:27:37 UTC

I've just tried it again a couple of times, and this time I sat there following the ddebug file with tail -f; nothing gets added to it after "GFX_":

 tail -f ~/ddebug_dumps/Cemu.13f.exe_1990_00000000
                            HQD_IB_BUSY = 0
        CP_CPF_STALLED_STAT1 <- RING_FETCHING_DATA = 1
                                INDR1_FETCHING_DATA = 1
                                INDR2_FETCHING_DATA = 0
                                STATE_FETCHING_DATA = 0
                                TCIU_WAITING_ON_FREE = 0
                                TCIU_WAITING_ON_TAGS = 0
                                UTCL2IU_WAITING_ON_FREE = 0
                                UTCL2IU_WAITING_ON_TAGS = 0
                                GFX_

It's been 10 minutes and that's the end of it.
I don't think it's a reboot/flush-to-filesystem issue, since I'm still SSH'd in and following the log file.
There's no "Buffer list" in the file either :(

I also tried building the latest master branch and applying the patch again, same thing.

On the monitor in the terminal where I ran wine, it says "Hang detection timeout is 1000ms." Not sure if that's relevant.

Comment 46 CheatCodesOfLife 2018-08-30 14:33:44 UTC

Created attachment 141377 [details]
amd6.tar.gz and amd7.tar.gz with usual logs, 2 attempts

Comment 47 Marek Olšák 2018-08-31 19:13:26 UTC

The log is truncated for some reason. Can you apply this to make it shorter?

diff --git a/src/gallium/drivers/radeonsi/si_debug.c b/src/gallium/drivers/radeonsi/si_debug.c
index 5e80469cee1..325e1e3ed01 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -101,6 +101,7 @@ static void si_dump_shader(struct si_screen *sscreen,
                           enum pipe_shader_type processor,
                           const struct si_shader *shader, FILE *f)
 {
+       return;
        if (shader->shader_log)
                fwrite(shader->shader_log, shader->shader_log_size, 1, f);
        else

Comment 48 CheatCodesOfLife 2018-09-03 13:26:58 UTC

Created attachment 141425 [details]
logs and trace

Hi,

I have applied the patch, ran through the process and attached the logs. The file doesn't appear to be truncated anymore.

Comment 49 Andrey Grodzovsky 2018-09-04 17:40:53 UTC

(In reply to CheatCodesOfLife from comment #48)
> Created attachment 141425 [details]
> logs and trace
> 
> Hi,
> 
> I have applied the patch, ran through the process and attached the logs. The
> file doesn't appear to be truncated anymore.

Looks like there's still no "Buffer list" in the log...

Comment 50 CheatCodesOfLife 2018-09-06 04:07:20 UTC

(In reply to Andrey Grodzovsky from comment #49)
> (In reply to CheatCodesOfLife from comment #48)
> > Created attachment 141425 [details]
> > logs and trace
> > 
> > Hi,
> > 
> > I have applied the patch, ran through the process and attached the logs. The
> > file doesn't appear to be truncated anymore.
> 
> Looks like still not Buffer list in the log...

Hi Andrey, sorry for the late reply. I applied the patches and built Mesa as you wanted. Could something related to the crash be causing the log file to be incomplete?

I think the system is pretty unstable after the crash. Apart from all input/output on the desktop going away, I also can't 'reboot' or 'shutdown -h now' (I have to do REISUB). Perhaps something there is affecting the logging?

Anything else you could think of that I can try on my end?

Cheers

Comment 51 Michel Dänzer 2018-09-06 09:55:55 UTC

One thing you could try is setting the synchronous attribute on the ddebug dump file before the hang:

 chattr +S ~/ddebug_dump/*

Of course, you'll have to wait for the file to be created before doing this.
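chattr +S marks a file for synchronous updates: changes are written to disk as they happen instead of lingering in the page cache, where a hard lockup would lose them. A program that writes its own crash log can get a similar effect by flushing stdio and calling fsync() after each record. Below is a minimal POSIX C sketch of that idea, assuming Linux/glibc; write_line_durably is a hypothetical name and this is not taken from ddebug.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* fsync (POSIX) */

/* Append one line and push it all the way to the storage device before
 * returning, so a hang right afterwards cannot leave it only in a cache.
 * Returns 0 on success, -1 on failure. */
static int write_line_durably(FILE *f, const char *line)
{
    assert(f != NULL && line != NULL);
    if (fputs(line, f) == EOF || fputc('\n', f) == EOF)
        return -1;
    if (fflush(f) != 0)           /* stdio buffer -> kernel page cache */
        return -1;
    if (fsync(fileno(f)) != 0)    /* page cache -> storage device */
        return -1;
    return 0;
}
```

Durable writes like this are slow, so they suit a log with a handful of records; doing it for every draw call would noticeably change timing.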

Comment 52 zzyxpaw 2018-09-11 08:36:05 UTC

Created attachment 141522 [details]
Logs + trace with patched mesa, plus example code which consistently triggers crash.

I've been experiencing a random crash which seems a lot like this: the image freezes, the keyboard stops working, the mouse can still be moved for a second and then also freezes, and the "GPU usage" LEDs all light up and the fans spin up.

Oddly enough, while working on a toy OpenGL program I seem to have accidentally found a way of triggering it consistently. I've included the sources in the tarball; I didn't try to narrow down the exact cause, so please pardon the extra fluff that is no doubt in there.

I at least captured a trace which contains the string "Buffer list". I also noticed umr was spitting quite a bit of stuff out to stderr which isn't in the dump; if you want that too let me know.

Some version numbers:
Radeon RX Vega 64
Linux amd-staging-drm-next 4.19.0-rc1-d0a96214993c
Mesa 18.3.0-devel (git-133e12fb69) (with the si_debug.c patch applied)

Comment 53 CheatCodesOfLife 2018-09-11 10:01:16 UTC

Created attachment 141524 [details]
vega_crasher Logs + trace with patched mesa

Hi Michel,

Even with the chattr +S command, the buffer list is not present :(

I also ran the vega_crasher from zzyxpaw and am able to reproduce that on my system. I have attached the output and it includes the Buffer Lists.

For some reason, when Cemu + Mario Kart 8 crashes, the file gets truncated, but when the vega_crasher tool crashes, the files are not truncated. This leads me to believe I'm not doing anything wrong? lol.

Other than that, the symptoms are the same. Mouse moves for a little while then it stops.

Comment 54 CheatCodesOfLife 2018-09-12 03:26:58 UTC

Oh and my umr spits out a lot of things to stderr as well, with both this and the MK8 crash. Let me know if you want this.

Comment 55 zzyxpaw 2018-10-24 04:01:36 UTC

Any updates to this? I can still reproduce with the latest amd-drm-next kernel.

Comment 56 CheatCodesOfLife 2018-10-24 08:38:35 UTC

I am still able to reproduce this, as is everybody with a Vega in the #linux channel in the Cemu discord server.

Someone with a Vega 8 has also reproduced it.

Comment 57 Antonio Chirizzi 2018-10-28 12:14:41 UTC

Hello there,
I am seeing the same problems on my Ryzen 2700U which is freezing as well, with the latest Ubuntu kernel: 4.19.0-041900-generic.

Luckily, I am able to SSH back in and try to shut it down, but it won't shut down completely.

This is what I see in kern.log; in my case all graphics output is frozen.
I am running Linux Mint Cinnamon 19 with the latest Ubuntu kernel.



Oct 28 11:30:58 antonioRyzen kernel: [22639.758782] gmc_v9_0_process_interrupt: 10 callbacks suppressed
Oct 28 11:30:58 antonioRyzen kernel: [22639.758789] amdgpu 0000:02:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:7 pasid:32769, for process cinnamon pid 1459 thread amdgpu_cs:0 pid 1463
Oct 28 11:30:58 antonioRyzen kernel: [22639.758789] )
Oct 28 11:30:58 antonioRyzen kernel: [22639.758797] amdgpu 0000:02:00.0:   at address 0x00000001010e1000 from 27
Oct 28 11:30:58 antonioRyzen kernel: [22639.758801] amdgpu 0000:02:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00701031
Oct 28 11:30:58 antonioRyzen kernel: [22639.758818] amdgpu 0000:02:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:7 pasid:32769, for process cinnamon pid 1459 thread amdgpu_cs:0 pid 1463
Oct 28 11:30:58 antonioRyzen kernel: [22639.758818] )
Oct 28 11:30:58 antonioRyzen kernel: [22639.758822] amdgpu 0000:02:00.0:   at address 0x00000001010e0000 from 27
Oct 28 11:30:58 antonioRyzen kernel: [22639.758825] amdgpu 0000:02:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 28 11:30:58 antonioRyzen kernel: [22639.758834] amdgpu 0000:02:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:7 pasid:32769, for process cinnamon pid 1459 thread amdgpu_cs:0 pid 1463
Oct 28 11:30:58 antonioRyzen kernel: [22639.758834] )
Oct 28 11:30:58 antonioRyzen kernel: [22639.758839] amdgpu 0000:02:00.0:   at address 0x00000001010e0000 from 27
Oct 28 11:30:58 antonioRyzen kernel: [22639.758841] amdgpu 0000:02:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 28 11:30:58 antonioRyzen kernel: [22639.758850] amdgpu 0000:02:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:7 pasid:32769, for process cinnamon pid 1459 thread amdgpu_cs:0 pid 1463
Oct 28 11:30:58 antonioRyzen kernel: [22639.758850] )
Oct 28 11:30:58 antonioRyzen kernel: [22639.758853] amdgpu 0000:02:00.0:   at address 0x00000001010e0000 from 27
Oct 28 11:30:58 antonioRyzen kernel: [22639.758855] amdgpu 0000:02:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 28 11:30:58 antonioRyzen kernel: [22639.758863] amdgpu 0000:02:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:7 pasid:32769, for process cinnamon pid 1459 thread amdgpu_cs:0 pid 1463
Oct 28 11:30:58 antonioRyzen kernel: [22639.758863] )
Oct 28 11:30:58 antonioRyzen kernel: [22639.758867] amdgpu 0000:02:00.0:   at address 0x00000001010e0000 from 27
Oct 28 11:30:58 antonioRyzen kernel: [22639.758869] amdgpu 0000:02:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 28 11:30:58 antonioRyzen kernel: [22639.758877] amdgpu 0000:02:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:7 pasid:32769, for process cinnamon pid 1459 thread amdgpu_cs:0 pid 1463
...
Oct 28 11:30:58 antonioRyzen kernel: [22639.758931] )
Oct 28 11:30:58 antonioRyzen kernel: [22639.758935] amdgpu 0000:02:00.0:   at address 0x00000001010e1000 from 27
Oct 28 11:30:58 antonioRyzen kernel: [22639.758937] amdgpu 0000:02:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 28 11:31:08 antonioRyzen kernel: [22649.811916] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1445040, emitted seq=1445042
Oct 28 11:31:08 antonioRyzen kernel: [22649.811925] [drm] GPU recovery disabled.
Oct 28 11:33:58 antonioRyzen kernel: [22820.391906] INFO: task kworker/u32:1:19683 blocked for more than 120 seconds.
Oct 28 11:33:58 antonioRyzen kernel: [22820.391914]       Not tainted 4.19.0-041900-generic #201810221809
Oct 28 11:33:58 antonioRyzen kernel: [22820.391917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 11:33:58 antonioRyzen kernel: [22820.391920] kworker/u32:1   D    0 19683      2 0x80000000
Oct 28 11:33:58 antonioRyzen kernel: [22820.391943] Workqueue: events_unbound commit_work [drm_kms_helper]
Oct 28 11:33:58 antonioRyzen kernel: [22820.391945] Call Trace:
Oct 28 11:33:58 antonioRyzen kernel: [22820.391956]  __schedule+0x29e/0x840
Oct 28 11:33:58 antonioRyzen kernel: [22820.391959]  schedule+0x2c/0x80
Oct 28 11:33:58 antonioRyzen kernel: [22820.391962]  schedule_timeout+0x258/0x360
Oct 28 11:33:58 antonioRyzen kernel: [22820.392050]  ? optc1_get_crtc_scanoutpos+0x69/0xa0 [amdgpu]
Oct 28 11:33:58 antonioRyzen kernel: [22820.392062]  dma_fence_default_wait+0x20a/0x280
Oct 28 11:33:58 antonioRyzen kernel: [22820.392065]  ? dma_fence_release+0xa0/0xa0
Oct 28 11:33:58 antonioRyzen kernel: [22820.392068]  dma_fence_wait_timeout+0xe7/0x110
Oct 28 11:33:58 antonioRyzen kernel: [22820.392071]  reservation_object_wait_timeout_rcu+0x201/0x340
Oct 28 11:33:58 antonioRyzen kernel: [22820.392140]  ? amdgpu_get_vblank_counter_kms+0x111/0x160 [amdgpu]
Oct 28 11:33:58 antonioRyzen kernel: [22820.392222]  amdgpu_dm_do_flip+0x12c/0x370 [amdgpu]
Oct 28 11:33:58 antonioRyzen kernel: [22820.392305]  amdgpu_dm_atomic_commit_tail+0x7ac/0xea0 [amdgpu]

Comment 58 CheatCodesOfLife 2018-10-28 23:19:04 UTC

Hi Adrià,

Are you getting this by trying to run MK8 in Cemu? Or some other way?

Could you try running through Andrey's instructions here:
https://bugs.freedesktop.org/show_bug.cgi?id=105251#c23

That didn't work on my system (log was truncated) so they've kinda stopped looking into it, but if you could get them the complete log maybe they'll find something?

I have the same thing where I can ssh back in, but can't fully `shutdown -h now` as it hangs part of the way through.

You can reboot more gracefully by doing this:
http://blog.kember.net/articles/reisub-the-gentle-linux-restart/

Comment 59 Benjamin Hodgetts 2018-11-17 22:57:00 UTC

It's not just Cemu, it looks like it happens in Yuzu too. If you Google for "VMC page fault" then you'll find people running into that error in various other programs too.

Personally, this is what I got when running MK8 in Cemu:
==============
[Sat Nov 17 22:29:43 2018] amdgpu 0000:03:00.0: [gfxhub] VMC page fault (src_id:0 ring:155 vmid:3 pasid:32769, for process Cemu.exe pid 963 thread Cemu.exe:cs0 pid 1035) at address 0x000080014189a000 from 27 VM_L2_PROTECTION_FAULT_STATUS:0x00301137
(repeated 7 times over the space of a few minutes)
[Sat Nov 17 22:32:44 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=1020, emitted seq=1023
==============

And then this when trying to run Super Mario Odyssey in Yuzu:
==============
[Sat Nov 17 22:47:26 2018] amdgpu 0000:03:00.0: [gfxhub] VMC page fault (src_id:0 ring:156 vmid:3 pasid:32769, for process yuzu pid 960 thread yuzu:cs0 pid 972 at address 0x000080bd27743000 from 27 VM_L2_PROTECTION_FAULT_STATUS:0x00301138
[Sat Nov 17 22:47:36 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=24703, emitted seq=24704
==============

I'll look into getting the info dump that was requested earlier in the thread to see if that helps, but the seemingly abandoned state of this bug is rather concerning.
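When comparing reports like these, it can help to extract the faulting address and the raw VM_L2_PROTECTION_FAULT_STATUS value from dmesg automatically. Below is a minimal C sketch, assuming the single-line message format shown in this comment. parse_vmc_fault and struct vmc_fault are hypothetical names. The sketch deliberately does not decode the status bits, because their layout is hardware-specific and is not covered in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct vmc_fault {
    unsigned long long address;   /* faulting GPU virtual address */
    unsigned int status;          /* raw VM_L2_PROTECTION_FAULT_STATUS value */
};

/* Parse one "[gfxhub] VMC page fault ... at address 0x... from 27
 * VM_L2_PROTECTION_FAULT_STATUS:0x..." line.
 * Returns 1 and fills *out if both fields are found, 0 otherwise. */
static int parse_vmc_fault(const char *line, struct vmc_fault *out)
{
    assert(line != NULL && out != NULL);
    const char *addr = strstr(line, "at address 0x");
    const char *stat = strstr(line, "VM_L2_PROTECTION_FAULT_STATUS:0x");
    if (addr == NULL || stat == NULL)
        return 0;
    if (sscanf(addr, "at address 0x%llx", &out->address) != 1)
        return 0;
    if (sscanf(stat, "VM_L2_PROTECTION_FAULT_STATUS:0x%x", &out->status) != 1)
        return 0;
    return 1;
}
```

Older kernels, as in comment 57, print the address and the status on separate lines, so those lines would need to be joined before being passed in.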

Comment 60 Benjamin Hodgetts 2018-11-18 01:13:11 UTC

Created attachment 142505 [details]
Gallium, UMR, Dmesg Dump Package

Ok, following Andrey Grodzovsky's instructions to get the dumps didn't work for Yuzu but it did for Cemu.

In the case of Yuzu, it looks like it thought the GPU had hung well before it actually did; in fact, almost immediately. It then freaked out and went downhill from there, to the point where the program only appeared very briefly (well before it gets to the point where it hangs).

=================================
Gallium debugger active. Logging all calls.
Hang detection timeout is 1000ms.
GPU hang detected, collecting information...

Draw #   driver  prev BOP  TOP  BOP  dump file
-------------------------------------------------------------
0         NO       NO      NO   NO   /home/arcade/ddebug_dumps/yuzu_894_00000000
Cannot open DRI name under debugfs: Permission denied
Cannot open DRI name under debugfs: Permission denied
Cannot open DRI name under debugfs: Permission denied

Done.
dd: Aborting the process...
Segmentation fault (core dumped)
=================================


Luckily Cemu seemed to be successful:
=================================
Gallium debugger active. Logging all calls.
Hang detection timeout is 1000ms.
GPU hang detected, collecting information...

Draw #   driver  prev BOP  TOP  BOP  dump file
-------------------------------------------------------------
14626     YES      NO      NO   NO   /home/arcade/ddebug_dumps/Cemu.exe_1158_00014629
=================================


All the requested debug files are attached inside the archive for the Cemu attempt.

Comment 61 Benjamin Hodgetts 2018-12-13 00:46:13 UTC

Created attachment 142797 [details]
New Error Since 4.19.X

Ok, so some time between my last report and now (started happening since 4.19.something, I don't know which version specifically) this problem has changed in how it manifests itself. Previously you'd get the "[gfxhub] VMC page fault" messages. Now it manifests itself in considerably more serious looking errors (with none of the "VMC page faults" in sight). Log file attached.

Once something triggers this, the card will become basically unresponsive and anything that tries to use it will start throwing more of the errors seen in the attached log.

It's not random, though. For example, I can run Unigine Valley/Superposition or Elder Scrolls Online (via Wine+DXVK) for as long as I like, stress-testing or benchmarking, and it'll be fine. But as soon as I try one of the problem programs, it basically "breaks" the graphics card until I hard reset.

Comment 62 Michel Dänzer 2018-12-13 09:05:48 UTC

(In reply to Benjamin Hodgetts from comment #61)
> [...] this problem has changed in how it manifests itself. Previously you'd get
> the "[gfxhub] VMC page fault" messages. Now it manifests itself in considerably
> more serious looking errors (with none of the "VMC page faults" in sight).

That might be a different issue, please file a separate report about it.

Comment 63 e88z4 2018-12-18 01:56:01 UTC

Hi 

I came across this bug report that might be related to my bug report #109022

https://bugs.freedesktop.org/show_bug.cgi?id=109022

I got the VMC page fault error as well while playing Yuzu with an RX 580. The bug can easily be reproduced at the same spot each time. The GPU crashes, but the machine can still be accessed through SSH.

If you think my bug is very similar with this bug, maybe I can help debugging.

Comment 64 CheatCodesOfLife 2018-12-18 02:08:16 UTC

It'll be a different bug; this only affects Vega10 cards. Polaris is fine with Cemu and the other guy's Vega test app.

Comment 65 zzyxpaw 2019-02-19 01:47:58 UTC

I've just tested the vega_crasher on the latest kernel from the linux-amd-staging-drm-next-git package (archlinux) and it didn't crash.

% uname -a
Linux erebor 5.0.0-rc1-amd-staging-drm-next-git-b8cd95e15410+ #1 SMP PREEMPT Sat Feb 16 02:30:22 PST 2019 x86_64 GNU/Linux

That said, I'm still experiencing random crashes. I'll try to get a debug dump next time it happens, but it looks a lot like what is described in this thread: https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1049483-amd-devs-error-ring-gfx-timeout

Comment 66 CheatCodesOfLife 2019-05-19 02:38:16 UTC

I jumped ship to nvidia months ago so this doesn't help me, but for you guys following this thread, the Cemu developers managed to fix this issue on their end.

If you install the latest public release of Cemu, all games will work with Vega + mesa under wine.

Since there are non-cemu cases in here, I won't close the issue (someone else can if appropriate).

I'm unsubscribing from this now.

Comment 67 udo 2019-05-21 14:01:41 UTC

I get multiple of these:

[  392.377183] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:5 pasid:32772, for process firefox pid 4467 thread firefox:cs0 pid 4565
               )
[  392.377194] amdgpu 0000:09:00.0:   at address 0x00000001013d4000 from 27
[  392.377200] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00501031

(...)

[  402.621544] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=28019, emitted seq=28022
[  402.621551] [drm] GPU recovery disabled.

Fedora 30 on Gigabyte X470 AORUS ULTRA GAMING w/ AMD Ryzen 5 2400G with Radeon Vega Graphics running git mesa and git xf86-video-amdgpu.

Comment 68 udo 2019-05-21 14:02:39 UTC

I started getting these after/around commit 076159b40b96096ba01413abc011a26c9acf7176

Comment 69 Hleb Valoshka 2019-07-03 11:08:08 UTC

I have this fault with 2400G and mesa 18.3 & 19.1.1 with Linux 4.19 (other versions haven't been tested).

It seems that Vega is unable to handle tiny VBOs correctly. I have an old application that uses a lot of immediate-mode GL functions to create small billboards using GL_QUADS, like the following:

    glTexCoord2f(0, 0);          glVertex(v0 * Size);
    glTexCoord2f(1, 0);          glVertex(v1 * Size);
    glTexCoord2f(1, 1);          glVertex(v2 * Size);
    glTexCoord2f(0, 1);          glVertex(v3 * Size);

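Initially I replaced this code with:

    /* Interleaved quad: x, y, z, s, t per vertex. */
    static GLfloat Vtx[] =
    {
        -1, -1, 0,    0, 0,
         1, -1, 0,    1, 0,
         1,  1, 0,    1, 1,
        -1,  1, 0,    0, 1
    };

    /* Assumes a buffer object is already bound to GL_ARRAY_BUFFER. */
    glBufferData(GL_ARRAY_BUFFER, sizeof(Vtx), Vtx, GL_STATIC_DRAW);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    /* With a bound buffer, the last argument is a byte offset into it. */
    glVertexPointer(3, GL_FLOAT, 5*sizeof(GLfloat), (const GLvoid *)0);
    glTexCoordPointer(2, GL_FLOAT, 5*sizeof(GLfloat), (const GLvoid *)(3*sizeof(GLfloat)));

(I also use a VAO if it's available.)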

As a variant, I used separate arrays for position and texture coordinates, but got the same fault.

In the end, I added the required data to another, related VBO which contains 8192 vertices. Now I no longer get this fault.

I know that OpenGL doesn't like herds of small VBOs, but a GPU hang is not an expected result of using them.
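The workaround described here amounts to sub-allocating small quads inside one larger vertex buffer instead of giving each quad its own tiny VBO. Below is a minimal CPU-side C sketch of that bookkeeping. It assumes the 5-float interleaved layout of Vtx[] and the 8192-vertex buffer size mentioned above; vertex_arena and arena_append are hypothetical names. Once the array has been uploaded (for example with glBufferSubData), the returned index can serve as the first argument of glDrawArrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FLOATS_PER_VERTEX 5   /* x, y, z, s, t as in Vtx[] */
#define ARENA_VERTICES 8192   /* size of the shared VBO mentioned above */

/* Hypothetical CPU-side staging copy of one large shared vertex buffer. */
struct vertex_arena {
    float data[ARENA_VERTICES * FLOATS_PER_VERTEX];
    size_t used;              /* number of vertices already handed out */
};

/* Copy `count` interleaved vertices into the arena and return the index of
 * the first one, or -1 if the arena does not have room for them. */
static long arena_append(struct vertex_arena *a, const float *verts, size_t count)
{
    assert(a != NULL && verts != NULL);
    if (count > ARENA_VERTICES - a->used)
        return -1;
    memcpy(&a->data[a->used * FLOATS_PER_VERTEX], verts,
           count * FLOATS_PER_VERTEX * sizeof(float));
    long first = (long)a->used;
    a->used += count;
    return first;
}
```

This only avoids the trigger on the application side; it says nothing about why the tiny VBOs faulted in the first place.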

Comment 70 Pierre-Eric Pelloux-Prayer 2019-07-04 15:19:10 UTC

(In reply to zzyxpaw from comment #52)
> Created attachment 141522 [details]
> Logs + trace with patched mesa, plus example code which consistently
> triggers crash.
> 


The example code is incorrect. Line 99:
   glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 5*sizeof(float), &vertices[3]);
should be:
   glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 5*sizeof(float), (void *)(3 * sizeof(float)));

(cf. the glVertexAttribPointer documentation: "pointer is treated as a byte offset into the buffer object's data store")

With this change the program runs correctly.

Note that even if the program is invalid it shouldn't hang the GPU. I'm working on a fix for this.
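The fix above hinges on the fact that, while a buffer is bound to GL_ARRAY_BUFFER, the last argument of glVertexAttribPointer is a byte offset rather than a CPU pointer. Deriving both the stride and the offset from a struct keeps them consistent with the data layout. Here is a minimal C sketch, assuming the same 3 + 2 float interleaving; struct vertex and the two helpers are hypothetical and not taken from the vega_crasher sources.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Interleaved layout: 3 position floats followed by 2 texture-coordinate
 * floats per vertex, i.e. a stride of 5 * sizeof(float). */
struct vertex {
    float pos[3];
    float uv[2];
};

/* Byte offset of the texture coordinates within each vertex; this, not
 * &vertices[3], is what the last glVertexAttribPointer argument expects. */
static size_t uv_offset(void)
{
    return offsetof(struct vertex, uv);
}

/* Distance in bytes between consecutive vertices. */
static size_t vertex_stride(void)
{
    return sizeof(struct vertex);
}
```

The call would then read glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, (GLsizei)vertex_stride(), (const void *)uv_offset()).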

Comment 71 deltasquared 2019-07-20 17:05:43 UTC

I would like to chip in on this, as this particular problem seems to have been plaguing me for some months now. I am currently running kernel 5.2.1-arch1-1-ARCH and still occasionally get errors like the following when running minetest (on reading the thread, they seem subtly different from the others here):

[ 5699.136659] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0 ring:155 vmid:5 pasid:32770, for process minetest pid 7127 thread minetest:cs0 pid 7133)
[ 5699.136662] amdgpu 0000:0b:00.0:   in page starting at address 0x000080014034d000 from 27
[ 5699.136664] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00501136
[ 5704.343299] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[ 5709.259775] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=443165, emitted seq=443167
[ 5709.259860] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process minetest pid 7127 thread minetest:cs0 pid 7133
[ 5709.259862] [drm] GPU recovery disabled.
*repeat last four lines endlessly...*

Relevant hardware is a Ryzen 2200G (Vega 8 GPU). The issue has survived swapping almost every component in my system, so I think it is safe to rule out broken hardware, in my case at least. Mercifully, the rest of the system seems to survive this, which is how I was able to capture the dmesg output; but with the GPU hard-locked, the only recourse is to reboot (after gathering some output for a while).

I haven't yet been able to obtain an API trace from minetest when it becomes difficult. Furthermore it doesn't do so reliably - I can often play for hours, but then the crash will strike and then the issue can sometimes persist across a few reboots if I press minetest to try and load a world again fast enough. Heck idk, is it a case of the precise 3D cloud pattern in the menu background at the time? Sounds like it would be useful for me to have apitrace running in the background whenever I run it on the off chance I can catch it in the act.

zzyxpaw's "vega crasher" in message #52 has reliably been able to cause GPU lock-up. Same sort of story: black window will pop up, nothing happens, and either lock-up occurs after a moment, or (interestingly) attempting to move the window in X11 will cause the lock-up immediately.

If there is any more data (such as attempting to get an apitrace) that would be useful I am willing to attempt to gather it, as this issue is the only blemish on an otherwise perfectly stable system.

Comment 72 Pierre-Eric Pelloux-Prayer 2019-07-22 07:40:32 UTC

This MR https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1265 should improve the situation. It was merged last week.

An incorrect program (like "vega_crasher") should hit an assert (if they're enabled in Mesa) or produce an incorrect rendering but shouldn't hang the GPU anymore.
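This thread does not describe what MR 1265 actually changes, so the following is not its code. As a generic illustration of the kind of range check that catches vega_crasher's mistake (a client-memory address passed where a small byte offset was expected), here is a minimal C sketch with a hypothetical attrib_fits function. It tests whether fetching count vertices stays inside the buffer, that is, whether offset + (count - 1) * stride + elem_size <= buf_size, without overflowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validation of one vertex attribute against its buffer: does
 * reading `count` vertices, starting at byte `offset` with the given `stride`
 * and per-vertex `elem_size`, stay inside `buf_size` bytes? */
static bool attrib_fits(uint64_t buf_size, uint64_t offset, uint64_t stride,
                        uint64_t elem_size, uint64_t count)
{
    if (count == 0)
        return true;                      /* nothing is fetched */
    if (offset > buf_size || elem_size > buf_size - offset)
        return false;                     /* even vertex 0 is out of range */
    if (stride == 0)
        return true;                      /* every vertex reads the same bytes */
    /* Need (count - 1) * stride <= room; divide instead of multiplying so
     * that huge values cannot overflow. */
    uint64_t room = buf_size - offset - elem_size;
    return (count - 1) <= room / stride;
}
```

With the 80-byte quad from comment 69, texture coordinates at offset 12 with stride 20 fit for 4 vertices but not for 5, and an offset that is really a CPU address fails immediately.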

Comment 73 Juan A. Suarez 2019-07-22 08:29:21 UTC

(In reply to Pierre-Eric Pelloux-Prayer from comment #72)
> This MR https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1265 should
> improve the situation. It has been merged last week.
> 
> An incorrect program (like "vega_crasher") should hit an assert (if they're
> enabled in Mesa) or produce an incorrect rendering but shouldn't hang the
> GPU anymore.

It would be good if people could report here whether this MR improves things.

Comment 74 deltasquared 2019-07-22 20:15:29 UTC

(In reply to Juan A. Suarez from comment #73)
> It could be good if people could report here if this improved with this MR.

I can utilise the mesa-git package in the arch user repository to compile from latest sources. I will then test both vega_crasher and minetest with that package installed to see what occurs. Stay tuned for updates, though it may take a couple of days while I juggle $dayjob.

Comment 75 deltasquared 2019-07-22 20:56:46 UTC

After compiling mesa-git on commit 0661c357c60 from the AUR pkgbuild, I can now confirm my system seems to have become impervious to the above "vega_crasher" program.

Output from said program after resizing and moving vega_crasher's window a bit, in case it was important:

L CALLBACK:  type = 0x8251, severity = 0x826b, message = LLVM diagnostic (remark): <unknown>:0:0: 9 instructions in function
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 16 VGPRS: 24 Code Size: 52 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = LLVM diagnostic (remark): <unknown>:0:0: 12 instructions in function
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 16 VGPRS: 8 Code Size: 92 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 16 VGPRS: 24 Code Size: 44 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 16 VGPRS: 8 Code Size: 80 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = LLVM diagnostic (remark): <unknown>:0:0: 2 instructions in function
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = LLVM diagnostic (remark): <unknown>:0:0: 3 instructions in function
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = LLVM diagnostic (remark): <unknown>:0:0: 4 instructions in function
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 16 VGPRS: 24 Code Size: 44 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 16 VGPRS: 8 Code Size: 136 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = LLVM diagnostic (remark): <unknown>:0:0: 16 instructions in function
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 24 VGPRS: 24 Code Size: 92 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0
GL CALLBACK:  type = 0x8251, severity = 0x826b, message = Shader Stats: SGPRS: 24 VGPRS: 24 Code Size: 88 LDS: 0 Scratch: 0 Max Waves: 10 Spilled SGPRs: 0 Spilled VGPRs: 0 PrivMem VGPRs: 0

Minetest will take longer to test, as the PKGBUILD doesn't enable asserts, and also because of the aforementioned $dayjob. In that case I guess I'd only know if I saw garbled output; it was never very consistent when it occurred, but it would always be during the loading-bar screen (and when it did happen, some very colourful blocky corruption would result).

Comment 76 deltasquared 2019-07-22 20:57:43 UTC

(In reply to deltasquared from comment #75)

> L CALLBACK:  type = 0x8251, severity = 0x826b, message = LLVM diagnostic
That first line should start with "GL CALLBACK"; terminal copy-paste fail.

Comment 77 deltasquared 2019-07-22 21:03:22 UTC

Created attachment 144845 [details]
vega_crasher after patch, black central output, on ryzen 2200G with vega 8 graphics

Screenshot time (1/2). It seems vega_crasher sometimes renders black. I haven't looked over the code thoroughly, so I'm not sure if this is the aforementioned incorrect result (where an assert would have been hit).

Comment 78 deltasquared 2019-07-22 21:06:45 UTC

Created attachment 144846 [details]
vega_crasher after patch, colour shaded central output, on ryzen 2200G with vega 8 graphics

Screenshot 2/2 of vega_crasher after the patch. It seems to switch nondeterministically between shaded and black central regions; I can only assume this depends on whether or not the offending index ends up out of bounds?

If it helps I can attempt more tests with an asserts-enabled build, though that will take some more time, a resource I am out of for today. (Will have to look at how to do that also - just a question of a debug build or another flag that needs passing?)

Comment 79 deltasquared 2019-07-26 18:52:52 UTC

OK, I have now managed to get a crash when starting minetest even with the mentioned patch applied, so at this point I think that case is unrelated. (It certainly seems more subtle; this MT crash has never been as reliable to trigger as some of the other things in this thread.)

Will endeavour to file a bug separately. Any suggestions on information to kickstart such a related bug would be appreciated, else I will reach out to various channels on freenode first to get that ball rolling.

Comment 80 Martin Peres 2019-11-19 08:31:41 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/311.
