Created attachment 128350 [details]
/sys/class/drm/card0/error

Upon opening any program which uses OpenGL, the GPU hangs.

System environment:
-- chipset: Intel(R) Core(TM) i7-4700MQ CPU
-- system architecture: x86_64
-- xf86-video-intel: 2.99.917.740.g9ac7a33-1
-- xserver: 1.18.4-1
-- mesa: 13.0.2-2
-- libdrm: 2.4.74.r14.g0825792-1
-- kernel: 4.9.0-rc8inteldri+ commit 721d484 of intel-drm branch
-- Linux distribution: Archlinux
-- Machine: MSI GE60 2OE

Reproducing steps:
Run glxspheres64, and the GPU will hang.

Additional info:
I haven't been able to bisect the issue, v3.8 to HEAD exhibit the hang. From v3.8 to v3.12 the GPU doesn't consistently hang, and from v3.13 onward the GPU hangs consistently.

The offending line in glxspheres.c which causes the GPU to hang appears to be:
GLXFBConfig *c=glXChooseFBConfig(dpy, screen, rgbAttribs, &n);
Created attachment 128351 [details] dmesg log from hang
Assigning to Mesa product (please let me know if I am mistaken with this GPU hang).

Kernel: 4.9.0-rc8inteldri+ commit 721d484 of intel-drm branch
Platform: Haswell (pci id: 0x0416, pci revision: 0x06, pci subsystem: 1462:10e0)
Mesa: 13.0.2-2

From this error dump, the hang is happening in the render ring batch with the active head at 0x018542c4, with 0x7a000003 (PIPE_CONTROL) as IPEHR. We can also note ERROR: 0x00000101 [TLB page fault error (GTT entry not valid)] and then, in the render ring, "Unloaded PD Fault (PPGTT)".

Batch extract (around 0x018542c4):
0x01854294: 0x7b000005: 3DPRIMITIVE:
0x01854298: 0x00000006: tri fan sequential
0x0185429c: 0x00000004: vertex count
0x018542a0: 0x00000000: start vertex
0x018542a4: 0x00000001: instance count
0x018542a8: 0x00000000: start instance
0x018542ac: 0x00000000: index bias
0x018542b0: 0x7a000003: PIPE_CONTROL
0x018542b4: 0x00101001: no write, cs stall, render target cache flush, depth cache flush,
0x018542b8: 0x00000000: destination address
0x018542bc: 0x00000000: immediate dword low
0x018542c0: 0x00000000: immediate dword high
0x018542c4: 0x7a000003: PIPE_CONTROL
0x018542c8: 0x00000c10: no write, instruction cache invalidate, texture cache invalidate, vf fetch invalidate,
0x018542cc: 0x00000000: destination address
0x018542d0: 0x00000000: immediate dword low
0x018542d4: 0x00000000: immediate dword high
0x018542d8: 0x780e0000: 3DSTATE_CC_STATE_POINTERS
0x018542dc: 0x00007f01: pointer to COLOR_CALC_STATE at 0x00007f00 (changed)
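For readers following the batch extract, here is a minimal sketch of how the first dword of each command can be split into its fields. The bit positions are an assumption taken from the usual render-engine (GFXPIPE) header layout and are not stated anywhere in this report; the struct and function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a render-engine command header dword.
 * ASSUMPTION: bit positions follow the usual GFXPIPE header layout;
 * they are not taken from this bug report. */
struct cmd_header {
    unsigned type;         /* bits 31:29, 3 means GFXPIPE */
    unsigned subtype;      /* bits 28:27 */
    unsigned opcode;       /* bits 26:24 */
    unsigned subopcode;    /* bits 23:16 */
    unsigned total_dwords; /* bits 7:0 store (total length - 2) */
};

/* Split one header dword into its fields. */
struct cmd_header decode_cmd_header(uint32_t dw)
{
    struct cmd_header h;

    h.type = (dw >> 29) & 0x7;
    h.subtype = (dw >> 27) & 0x3;
    h.opcode = (dw >> 24) & 0x7;
    h.subopcode = (dw >> 16) & 0xff;
    h.total_dwords = (dw & 0xff) + 2;
    return h;
}

/* Print the decoded fields of one header dword on a single line. */
void print_cmd_header(uint32_t dw)
{
    struct cmd_header h = decode_cmd_header(dw);

    printf("0x%08x: type %u, subtype %u, opcode %u, subopcode 0x%02x, %u dwords\n",
           (unsigned)dw, h.type, h.subtype, h.opcode, h.subopcode,
           h.total_dwords);
}
```

Under these assumptions, 0x7a000003 decodes to a five-dword command and 0x7b000005 to a seven-dword command, which agrees with the number of dwords listed under PIPE_CONTROL and 3DPRIMITIVE in the extract.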
This demo works properly with latest mesa, on haswell, using a stable kernel (4.7). Yann, can you reproduce on the specified intel-drm kernel, and bisect the kernel regression?
(In reply to Mark Janes from comment #3)
> This demo works properly with latest mesa, on haswell, using a stable kernel
> (4.7).
>
> Yann, can you reproduce on the specified intel-drm kernel, and bisect the
> kernel regression?

Thanks Mark, I will set up the environment on my side.
In the meantime, webstrand@gmail.com, can you confirm that it works on your side with an earlier kernel? Is this a regression?
Unfortunately, linux 4.7.0-1-ARCH also exhibits the GPU HANG.
Created attachment 128370 [details] /sys/class/drm/card0/error for 4.7
(In reply to webstrand from comment #0)
> Upon opening any program which uses OpenGL, the GPU hangs.

I'm not really sure what to say. A lot of people use Archlinux with Haswell GT2 and OpenGL programs work just fine. I suspect this is something specific to your setup, but I'm not sure what it would be.

It might be worth trying X with the modesetting driver instead of intel/SNA. I don't know that it's the problem, but that would eliminate one of the variables. (Easiest way to accomplish this is to pacman -R xf86-video-intel).
I was unable to get xorg to start without xf86-video-intel. I can invest some more time in figuring out what's wrong, if necessary. However, I've installed a fresh copy of Ubuntu 14.04.4 on which I can also reproduce the GPU hang. I was able to successfully bisect the mainline kernel between v3.12 and v3.13. first bad commit: [b29c19b645287f7062e17d70fa4e9781a01a5d88] drm/i915: Boost RPS frequency for CPU stalls
Wonderful. Thank you for bisecting!
(In reply to webstrand from comment #8)
> I was unable to get xorg to start without xf86-video-intel. I can invest
> some more time in figuring out what's wrong, if necessary.
>
> However, I've installed a fresh copy of Ubuntu 14.04.4 on which I can also
> reproduce the GPU hang. I was able to successfully bisect the mainline
> kernel between v3.12 and v3.13.
>
> first bad commit: [b29c19b645287f7062e17d70fa4e9781a01a5d88] drm/i915:
> Boost RPS frequency for CPU stalls

False bisect result unfortunately.

(In reply to Kenneth Graunke from comment #7)
> (In reply to webstrand from comment #0)
> > Upon opening any program which uses OpenGL, the GPU hangs.
>
> I'm not really sure what to say. A lot of people use Archlinux with Haswell
> GT2 and OpenGL programs work just fine. I suspect this is something
> specific to your setup, but I'm not sure what it would be.
>
> It might be worth trying X with the modesetting driver instead of intel/SNA.
> I don't know that it's the problem, but that would eliminate one of the
> variables. (Easiest way to accomplish this is to pacman -R
> xf86-video-intel).

Completely bogus and unhelpful.
(In reply to Chris Wilson from comment #10)
> False bisect result unfortunately.

I'm guessing you mean that the commit I referenced is another bug which has already been fixed? Would it be worth trying to bisect again? The last time I was able to use opengl on this laptop was with linux v3.12.

(In reply to Kenneth Graunke from comment #7)
> It might be worth trying X with the modesetting driver instead of intel/SNA.

In my xorg config, I've set:
Option "NoAccel" "True"
And I've set LIBGL_ALWAYS_SOFTWARE=1 globally, which I unset temporarily for testing glxspheres64.

In an effort to reproduce the issue without using xorg, I installed weston. Launching weston causes the GPU hang about every 1 in 3 launches.
Created attachment 128394 [details] /sys/class/drm/card0/error for weston
Setting to Highest+Blocker, as this is a regression without a workaround.
Yann, please take five minutes to check whether the new crash log gives any new indication of the reason for the hang.
I also removed the bisected keyword, since presumably the bisect result does not hold.
(In reply to Jari Tahvanainen from comment #14)
> Yann, use five minutes and check out if new crash log give any new
> indication about the reason for hang.
> I also removed bisected keyword since assumable that is not the case.

According to mesa engineers, mesa only emits 3DSTATE_VERTEX_ELEMENTS on demand, right before 3DPRIMITIVE. Chris has changed SNA to emit a dummy primitive between VertexElements in 4acd4a7d3d2f41227022fa7581cfb85a0b124eae in xf86-video-intel (see https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=4acd4a7d3d2f41227022fa7581cfb85a0b124eae), but this is for gen9. Here we are dealing with gen7; do we need such a mechanism in gen7_emit_vertex_elements as well?

An alternate solution is to use Glamor/modesetting.

*Details:
- Kernel: 4.9.0-rc8inteldri+
- Platform: Haswell (PCI ID: 0x0416, PCI Revision: 0x06, PCI Subsystem: 1462:10e0)
- Mesa: 13.0.2-2
- xf86-video-intel: 2.99.917.740.g9ac7a33-1

0x008a0540: 0x78090001: 3DSTATE_VERTEX_ELEMENTS
0x008a0544: 0x02850000: buffer 0: valid, type 0x0085, src offset 0x0000 bytes
0x008a0548: 0x11230000: (X, Y, 0.0, 1.0), dst offset 0x00 bytes
0x008a054c: 0x7b000005: 3DPRIMITIVE:
0x008a0550: 0x00000006: tri fan sequential
0x008a0554: 0x00000004: vertex count
0x008a0558: 0x00000000: start vertex
0x008a055c: 0x00000001: instance count
0x008a0560: 0x00000000: start instance
0x008a0564: 0x00000000: index bias
0x008a0568: 0x7a000003: PIPE_CONTROL
0x008a056c: 0x00101001: no write, cs stall, render target cache flush, depth cache flush,
0x008a0570: 0x00000000: destination address
0x008a0574: 0x00000000: immediate dword low
0x008a0578: 0x00000000: immediate dword high
0x008a057c: 0x7a000003: PIPE_CONTROL
0x008a0580: 0x00000c10: no write, instruction cache invalidate, texture cache invalidate, vf fetch invalidate,
0x008a0584: 0x00000000: destination address
0x008a0588: 0x00000000: immediate dword low
0x008a058c: 0x00000000: immediate dword high
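The two PIPE_CONTROL flag dwords in the dump above (0x00101001 and 0x00000c10) can be cross-checked by hand. The following is only a sketch: the bit assignments are an assumption based on the PIPE_CONTROL flag definitions commonly used by the i915 driver, the labels are copied from the decoder output in the dump, and the table and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One PIPE_CONTROL flag bit and the label used in the dump above. */
struct pc_flag {
    uint32_t mask;
    const char *name;
};

/* ASSUMPTION: bit positions follow the commonly used i915 PIPE_CONTROL
 * definitions; only the flags seen in this report are listed. */
static const struct pc_flag pc_flags[] = {
    { 1u << 20, "cs stall" },
    { 1u << 12, "render target cache flush" },
    { 1u << 11, "instruction cache invalidate" },
    { 1u << 10, "texture cache invalidate" },
    { 1u << 4,  "vf fetch invalidate" },
    { 1u << 0,  "depth cache flush" },
};

/* Append text to a NUL-terminated buffer, truncating if it does not fit. */
static void append(char *out, size_t out_size, const char *text)
{
    size_t used = strlen(out);

    if (used + 1 < out_size)
        snprintf(out + used, out_size - used, "%s", text);
}

/* Write a comma-separated description of the flag dword into out and
 * return how many of the known flag bits are set. */
int describe_pipe_control(uint32_t flags, char *out, size_t out_size)
{
    size_t i;
    int count = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    /* Bits 15:14 select the post-sync operation; 0 means no write. */
    append(out, out_size,
           ((flags >> 14) & 0x3) == 0 ? "no write" : "post-sync write");

    for (i = 0; i < sizeof(pc_flags) / sizeof(pc_flags[0]); i++) {
        if (flags & pc_flags[i].mask) {
            append(out, out_size, ", ");
            append(out, out_size, pc_flags[i].name);
            count++;
        }
    }
    return count;
}
```

Called on 0x00101001 with a buffer of about 128 bytes, this yields the same three flags the dump lists (cs stall, render target cache flush, depth cache flush); 0x00000c10 yields the three invalidate flags.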
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1551.