Summary: | [HSW] GPU HANG: ecode 0:0x87d3bffa - hang loading ctx | ||
---|---|---|---|
Product: | DRI | Reporter: | Byoungchan Lee <byoungchan.lee.public> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED DUPLICATE | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | high | CC: | barry.scott, berndkuhls, chrassig, dasebek, erlend1969, fernetmenta, fritsch, hal.from.2001, hi, intel-gfx-bugs, jesse.osiecki, j.g.villalonga, myfoolishgames, nemesis, nil, redwoz, rr1991b, samdavispan, tournieral, wes |
Version: | XOrg git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Created attachment 101354 [details]
dmesg
Created attachment 101355 [details]
i915_error_state
GPU dump in /sys/class/drm/card0/error
Created attachment 101356 [details]
syslog
syslog in /var/log/syslog
Your driver stack is out of date and misses an important bug fix for a very similar bug in mesa. Please update and report back.

(In reply to comment #4)
> Your driver stack is out of date and misses an important bug fix for a very
> similar bug in mesa. Please update and report back.

So I updated some of the packages by using xorg-edgers/ppa. ( http://launchpad.net/~xorg-edgers/+archive/ppa )
-- xf86-video-intel: 2.99.910-0ubuntu1 -> 2.99.912+git20140618.d49f53cc-0ubuntu0ricotz~trusty
-- mesa: 10.1.3-0ubuntu0.1 -> 10.3.0~git20140618.88b887fa-0ubuntu0ricotz~trusty
-- libdrm: 2.4.52-1 -> 2.4.54+git20140523.8fc62ca8-0ubuntu0ricotz~trusty
I believe that these packages are built from git trunk, but I still get the hang. i915_error_state is re-uploaded.

Created attachment 101366 [details]
i915_error_state_06192240
GPU Dump with updated mesa.
Created attachment 103576 [details]
GPU crash dump saved in /sys/class/drm/card0/error
So, I updated some packages in order to get the latest versions.
The system hang still occurs, and here is the GPU crash dump.
Environments
-- chipset: Intel Pentium 3556U with haswell-based mobile graphics.
-- system architecture: x86_64
-- xf86-video-intel: 2:2.99.914+git20140723.8d95e90b-0ubuntu0sarvatt2~trusty # updated
-- xserver: 1.15.1-0ubuntu2
-- mesa: 10.3.0~git20140723.fb237ba7-0ubuntu0sarvatt~trusty # updated
-- libdrm: 2.4.54+git20140716.c0b34dca-0ubuntu0ricotz~trusty # updated
-- kernel: 3.13.0-32-generic # updated
-- Linux distribution: Ubuntu 14.04
-- Machine or mobo model: Lenovo IdeaPad S310 (LENOVO_MT_20300)
*** Bug 82103 has been marked as a duplicate of this bug. ***

I may be being blunt, but I don't really understand why the assignee is the same person who opened the issue, who doesn't seem to be a kernel developer (call it a hunch, based on the fact that the latest test was done with kernel 3.13 and Xorg 1.15). As a (non-developer) user who has been affected by this issue for more than 2 months (and I'm guessing so has anyone else with a 2955U), I would love to see it resolved, so I hope a developer is assigned to it.
p.s. @Byoungchan Lee thanks for reporting this, I didn't mean any disrespect.

Created attachment 104159 [details]
GPU crash dump saved in /sys/class/drm/card0/error (updated)
Due to the size of the dump, only the first 10,000 lines are uploaded. The full text is uploaded at http://pastebay.net/1476614 .
Created attachment 104160 [details]
Xorg.0.log (updated)
Created attachment 104161 [details]
dmesg (updated)
As dhead666 mentioned, I'm not a developer of the Linux kernel. I have a little experience with Linux systems, but I'd like to help fix this issue (and help the FOSS community). If there is something I can do other than uploading log files and writing comments about the symptoms, I'll do it if I can.

Updated environment (with kernel update; the hang still occurs):
-- chipset: Intel Pentium 3556U with haswell-based mobile graphics.
-- system architecture: x86_64
-- xf86-video-intel: 2.99.914+git20140806.105d478c-0ubuntu0sarvatt~trusty # updated
-- xserver: 1.15.1-0ubuntu2
-- mesa: 10.3.0~git20140805.fc2b2d33-0ubuntu0sarvatt2~trusty # updated
-- libdrm: 2.4.56+git20140801.5d835797-0ubuntu0sarvatt~trusty # updated
-- kernel: 3.16 rc7 # updated
-- Linux distribution: Ubuntu 14.04
-- Machine or mobo model: Lenovo IdeaPad S310 (LENOVO_MT_20300)

*** Bug 82304 has been marked as a duplicate of this bug. ***
*** Bug 82350 has been marked as a duplicate of this bug. ***
*** Bug 82459 has been marked as a duplicate of this bug. ***
*** Bug 82457 has been marked as a duplicate of this bug. ***
*** Bug 82456 has been marked as a duplicate of this bug. ***
*** Bug 82460 has been marked as a duplicate of this bug. ***
*** Bug 82461 has been marked as a duplicate of this bug. ***
*** Bug 82523 has been marked as a duplicate of this bug. ***
*** Bug 82769 has been marked as a duplicate of this bug. ***
*** Bug 83017 has been marked as a duplicate of this bug. ***

Updated GPU crash dump.
https://www.dropbox.com/s/k0ixyism7vx3a8h/gpu_crash_dump.log?dl=0
System: Intel Celeron 2955U
arch x86_64 (Arch Linux)
kernel 3.17rc2
mesa/intel-dri 10.2.6
xf86-video-intel 2.99.914
xorg-server 1.16
libdrm 2.4.56
error message:
[drm] stuck on render ring
[drm] GPU HANG: ecode 0:0x87d3bffa, in chromium [9573], reason: Ring hung, action: reset
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. [drm] GPU crash dump saved to /sys/class/drm/card0/error *** Bug 83157 has been marked as a duplicate of this bug. *** (In reply to comment #25) > *** Bug 83157 has been marked as a duplicate of this bug. *** I meant to just upload a new crash dump, but I wasn't thinking so I submitted a new bug sorry. Is there anything I can do to help with this bug? I would donate money to the project if that would encourage someone to work on it. Thanks. I'm joining @nil suggestion. If an Intel dev need help how to reproduce this issue then the Acer C720 (same device I've got) goes for 180$ at Amazon, I'm willing to donate half of that (yes, I know, not much for the rate of experienced developer time but what can I say, I'm still a student). Another journalctl and /sys/class/drm/card0/error outputs, now with kernel 3.17rc3. The most affected application is Chromium 37 which slows until halt on some sites. https://www.dropbox.com/s/ptt7jxyl7s6kblx/3.17rc3_journal.log?dl=0 https://www.dropbox.com/s/mrixjg7wmdkbrmy/3.17rc3_gpu_crash_dump.log?dl=0 *** Bug 83585 has been marked as a duplicate of this bug. *** *** Bug 82392 has been marked as a duplicate of this bug. 
*** Updated logs: https://www.dropbox.com/s/4jhbxbhovea8y9i/3.17rc6_journal.log?dl=0 https://www.dropbox.com/s/wakq9d8rt0zuqnn/3.17rc6_gpu_crash_dump.log?dl=0 * linux 3.17rc6 * mesa/intel-dri 10.3.0 * xf86-video-intel 2.99.916 * xorg-server 1.16.1 I also saw these errors in the log kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0 kernel: pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID) kernel: pcieport 0000:00:1c.0: device [8086:9c10] error status/mask=00001000/00002000 kernel: pcieport 0000:00:1c.0: [12] Replay Timer Timeout I also see the following error messages: [drm:ivybridge_set_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A [drm:ivb_err_int_handler] *ERROR* Pipe A FIFO underrun I believe they appears when: * Running X (rootless) on tty1 and changing to tty2. * Running X (rootless) on tty1, Wayland on tt2, after running some X apps on the Wayland session it crashes and usually output the error message. All sessions are Gnome desktop. I enabled i915.mmio_debug=1, I don't know if more debug details added to the gpu crash dump but here's it: https://www.dropbox.com/s/47xsk1yllvrczfr/3.17rc6_gpu_crash_dump_fifo_underrun.log?dl=0 For the brave git://people.freedesktop.org/~ickle/linux-2.6 requests may be of interest. *** Bug 84469 has been marked as a duplicate of this bug. *** Do you have a specific patch you want to get tested? Or is this a "catch all tree" with general rework, that could fix this bug by accident? Any specific branch? (In reply to comment #35) > Do you have a specific patch you want to get tested? Or is this a "catch all > tree" with general rework, that could fix this bug by accident? > > Any specific branch? The branch is requests, the specific patch itself is a bit of a shotgun: http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=requests&id=f30eb97f6bde8e207316e014705534ae813f9634 do you have any idea what triggers this issue? 
we may need to find some work around before we release XBMC 14. on my systems this happens when doing texture loading on an extra thread using vaPuSurface and texture from pixmap. do you see any relationship with the linked patch? we also run a thread with an extra gl context on NVIdia and AMD systems where this issue does not show. (In reply to comment #33) > For the brave git://people.freedesktop.org/~ickle/linux-2.6 requests may be > of interest. Freedesktop server is slow to say the least. Couldn't get to shell/prompt, not sure what's going on (darn journald, I don't have syslog server running). Can I apply the patch against rc1 mainline (or even rc7) ? Happened 4 times in 10 minutes: Sep 30 09:31:04 H87 kernel: [ 391.118916] [drm] stuck on render ring Sep 30 09:31:04 H87 kernel: [ 391.119755] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [750], reason: Ring hung, action: reset Sep 30 09:31:04 H87 kernel: [ 391.119757] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. Sep 30 09:31:04 H87 kernel: [ 391.119757] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel Sep 30 09:31:04 H87 kernel: [ 391.119758] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. Sep 30 09:31:04 H87 kernel: [ 391.119759] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. 
Sep 30 09:31:04 H87 kernel: [ 391.119760] [drm] GPU crash dump saved to /sys/class/drm/card0/error Sep 30 09:31:06 H87 kernel: [ 393.120568] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Sep 30 09:32:57 H87 kernel: [ 504.222252] [drm] stuck on render ring Sep 30 09:32:57 H87 kernel: [ 504.223114] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [750], reason: Ring hung, action: reset Sep 30 09:32:59 H87 kernel: [ 506.223906] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Sep 30 09:38:46 H87 kernel: [ 853.504324] [drm] stuck on render ring Sep 30 09:38:46 H87 kernel: [ 853.505173] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [750], reason: Ring hung, action: reset Sep 30 09:38:48 H87 kernel: [ 855.505962] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Sep 30 09:42:35 H87 kernel: [ 1082.689397] [drm] stuck on render ring Sep 30 09:42:35 H87 kernel: [ 1082.690240] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [750], reason: Ring hung, action: reset Sep 30 09:42:37 H87 kernel: [ 1084.691045] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Silence? This bug is more than 3 months old and I can provoke a lot of those on my systems, just happened again. Are you interested in solving this? I have the feeling this is stuck and I don't like this! Could you please reproduce with latest drm-intel-nightly and paste dmesg and gpu error state again? here we go: dmesg: http://paste.ubuntu.com/8477521/ crash dump: https://dl.dropboxusercontent.com/u/47522966/gpu-dump.txt (In reply to Rainer Hochecker from comment #42) > here we go: > > dmesg: http://paste.ubuntu.com/8477521/ > crash dump: https://dl.dropboxusercontent.com/u/47522966/gpu-dump.txt Please always *attach* such information to the bug to keep them in one place and to not lose them. Thanks. 
(In reply to Jani Nikula from comment #43)
> (In reply to Rainer Hochecker from comment #42)
> > here we go:
> >
> > dmesg: http://paste.ubuntu.com/8477521/
> > crash dump: https://dl.dropboxusercontent.com/u/47522966/gpu-dump.txt
>
> Please always *attach* such information to the bug to keep them in one place
> and to not lose them. Thanks.

I would have done so, but this system here denies attachments > 3000k

Created attachment 107231 [details]
crashdump in bzip2 format
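For anyone else hitting the attachment size limit mentioned above, here is a minimal sketch of compressing a saved crash dump with bzip2 before attaching it. It uses Python's standard `bz2` module. The `ATTACHMENT_LIMIT` value is an assumption (the "3000k" from the comment read as 3000 KiB), and the sample text is a hypothetical stand-in for the contents of /sys/class/drm/card0/error.

```python
import bz2

# Assumption: the "3000k" limit mentioned in the thread, read as 3000 KiB.
ATTACHMENT_LIMIT = 3000 * 1024


def compress_dump(text: str) -> bytes:
    """Compress crash dump text with bzip2 at the highest compression level."""
    return bz2.compress(text.encode("utf-8"), compresslevel=9)


def fits_limit(blob: bytes, limit: int = ATTACHMENT_LIMIT) -> bool:
    """Return True if the compressed blob is small enough to attach."""
    return len(blob) <= limit


if __name__ == "__main__":
    # Hypothetical stand-in for a large, repetitive error state dump.
    sample = "IPEHR: 0x780c0000\n" * 200_000
    blob = compress_dump(sample)
    print(len(sample), "->", len(blob), "bytes; fits limit:", fits_limit(blob))
```

Error state dumps are mostly repetitive hex text, so they tend to compress well, which is why the full dump could be attached here instead of only the first 10,000 lines.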
Attached you will find the log files from Rainer above in bzip2 format.

Btw. we are on the way to releasing xbmc 14.0 in the next months. We have completely rewritten VAAPI there and are one of the first at all to support your new - but nowhere else implemented - VPP API. We are thinking of removing VAAPI completely from xbmc, because we don't like to ship broken software we cannot fix ourselves. We have worked together with the VAAPI devs with great success, but this mesa, kernel, whatever bug we cannot fix alone.

Most Intel chips are fast enough to do multi-core decoding. Not sure if the small NUCs can do that over time without overheating. But this is better after all than hangs we cannot fix ourselves.

Created attachment 107232 [details]
dmesg - the usual hang - as reported multiple times
I have built the branch Chris Wilson linked in his kernel tree. I used the default Ubuntu kernel config and generated .deb files. So if someone is running Ubuntu, please give those a test:
Headers: https://dl.dropboxusercontent.com/u/55728161/linux-headers-3.17.0-rc7-ickle%2B_3.17.0-rc7-ickle%2B-10.00.Custom_amd64.deb
Kernel: https://dl.dropboxusercontent.com/u/55728161/linux-image-3.17.0-rc7-ickle%2B_3.17.0-rc7-ickle%2B-10.00.Custom_amd64.deb

Created attachment 107244 [details]
Chris Wilson Kernel test
Sorry - testing with the branch Chris Wilson linked is impossible. The kernel crashes and hard hangs every time xbmc is closed or the xserver is restarted. I am open to other suggestions to test, patch and whatever.

Created attachment 107250 [details]
Crashlog with Kernel 3.17 drm-intel-nightly with drm.debug=0xe
Crashlog Kernel 3.17 drm-intel-nightly with drm.debug=0xe
Created attachment 107251 [details]
Kernel 3.17 drm-intel-nightly dmesg with drm.debug=0xe
Kernel 3.17 drm-intel-nightly dmesg with drm.debug=0xe
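The hang messages quoted throughout this thread follow a regular format, so they can be counted in a saved dmesg or journal. Below is a minimal sketch, assuming the line format seen in the logs pasted here ("GPU HANG: ecode ..., in PROCESS [PID], reason: ..., action: ..."). The regular expression and the helper names `find_hangs` and `count_by_ecode` are illustrative only and are not part of any i915 tooling.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this thread; the
# ", in PROCESS [PID]" part is treated as optional as a precaution.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:0x[0-9a-fA-F]+)"
    r"(?:, in (?P<proc>\S+) \[(?P<pid>\d+)\])?"
    r", reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def find_hangs(log_text):
    """Return one dict per GPU HANG line found in the log text."""
    return [m.groupdict() for m in HANG_RE.finditer(log_text)]


def count_by_ecode(log_text):
    """Count how often each ecode appears in the log text."""
    return Counter(h["ecode"] for h in find_hangs(log_text))


if __name__ == "__main__":
    # Two lines copied from the logs in this thread.
    log = (
        "[  391.119755] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [750], "
        "reason: Ring hung, action: reset\n"
        "[  504.223114] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [750], "
        "reason: Ring hung, action: reset\n"
    )
    for hang in find_hangs(log):
        print(hang["ecode"], hang["proc"], hang["pid"], hang["reason"])
    print(count_by_ecode(log))
```

Grouping by ecode is only a rough way to see whether repeated hangs look like the same failure; the crash dump itself is still what the developers ask for.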
I followed a recommendation at https://johnlewis.ie/tentative-fixwork-around-for-i915-gpu-hangs/ to set some i915 and drm options on the kernel cmdline, and at first glance it seems to help. If I drop the default values from the given list, the options are:
drm.vblankoffdelay=1 i915.semaphores=0 i915.modeset=1 i915.use_mmio_flip=1 i915.enable_ppgtt=1 i915.reset=0 i915.lvds_use_ssc=0

At least half of the options you name are default options, see modinfo i915. Also, disabling the GPU reset is probably not a good idea, since that will freeze the complete system if such a hang occurs. Rainer and I have been testing with i915.enable_rc6=0 for several hours and so far we have not had another hang.

Concerning your parameters:
drm.debug=0
drm.vblankoffdelay=1
i915.semaphores=0 <- use per-chip defaults (-1) is the default
i915.modeset=1 <- forces modesetting
i915.use_mmio_flip=1 <- this is not documented
i915.powersave=1 <- is the default
i915.enable_ips=1 <- is the default
i915.disable_power_well=1 <- is the default
i915.enable_hangcheck=1 <- default
i915.enable_cmd_parser=1 <- default
i915.fastboot=0 <- default
i915.enable_ppgtt=1 <- -1 (auto) is the default
i915.reset=0 <- this is probably dangerous as the gpu won't be reset
i915.lvds_use_ssc=0 <- default is auto
i915.enable_psr=0 <- this is the default
So no idea which one makes a difference. The semaphores most likely? I will also start testing with i915.semaphores=0 now.

Found the mmio thingy. It's new and first available in the 3.17 kernel. I think we should try one after the other to find out which combination really solves it.

Today I tested with rc6 disabled and the problem did not show. But I think this is not the desired solution.

another hang with rc6 off

It would be nice if we could get a comment from Intel on what they think. Then we may have a chance to work around this bug if they are not able to fix it.
p.s. I am an XBMC developer and speak for a large community.
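The parameter comparison in the comments above can be sketched in a few lines. This is only an illustration: the `STATED_DEFAULTS` table copies the defaults exactly as they were stated in this thread (it is not checked against modinfo i915 for any particular kernel), and the helper names `parse_cmdline` and `not_known_default` are hypothetical.

```python
# Defaults as stated in the comment above; not verified against modinfo i915.
STATED_DEFAULTS = {
    "i915.semaphores": "-1",
    "i915.powersave": "1",
    "i915.enable_ips": "1",
    "i915.disable_power_well": "1",
    "i915.enable_hangcheck": "1",
    "i915.enable_cmd_parser": "1",
    "i915.fastboot": "0",
    "i915.enable_ppgtt": "-1",
    "i915.enable_psr": "0",
}


def parse_cmdline(cmdline):
    """Collect module.param=value tokens from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and "." in name:
            params[name] = value
    return params


def not_known_default(params, defaults=STATED_DEFAULTS):
    """Return parameters whose value differs from the stated default.

    Parameters with no stated default are returned as well, since they
    cannot be ruled out.
    """
    return {k: v for k, v in params.items() if defaults.get(k) != v}


if __name__ == "__main__":
    cmdline = (
        "drm.vblankoffdelay=1 i915.semaphores=0 i915.modeset=1 "
        "i915.use_mmio_flip=1 i915.enable_ppgtt=1 i915.reset=0 "
        "i915.lvds_use_ssc=0 i915.powersave=1 i915.fastboot=0"
    )
    for name, value in sorted(not_known_default(parse_cmdline(cmdline)).items()):
        print(f"{name}={value}")
```

The remaining options are the candidates to test one after the other, as suggested above.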
[ 9106.161151] [drm] stuck on render ring [ 9106.161995] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [752], reason: Ring hung, action: reset [ 9106.161997] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. [ 9106.161997] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel [ 9106.161998] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. [ 9106.161999] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. [ 9106.162000] [drm] GPU crash dump saved to /sys/class/drm/card0/error [ 9108.162761] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off [10496.270560] [drm] stuck on render ring [10496.271391] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [752], reason: Ring hung, action: reset [10498.272190] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off [11031.702508] [drm] stuck on render ring [11031.703342] [drm] GPU HANG: ecode 0:0x87d3bffa, in xbmc.bin [752], reason: Ring hung, action: reset [11033.704151] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off I'm about two days testing with i915.semaphores=0 i915.modeset=1 i915.use_mmio_flip=1 i915.enable_ppgtt=1 I didn't had hangs and didn't see the message "stuck on render ring" I do see high memory consumption by Chromium when opening pages with graphics until RAM is almost full (4GB) and then the system slowing down until it kills the gpu process (can't remember the log message) but it doesn't kill chromium (just the gpu process) and I see the error "The GPU process hung". I still have unrecoverable full system freezes, it's hard to reproduce it but it's usually happens with GTK apps (and not for example with Chromium), leave Gnome (3.12/3.14) desktop visible for a hour, two or a day and it will freeze in some point, it's also happens on Epiphany, Gedit and other GTK apps. Frequency can be once a week or 5-6 times a day. 
I encountered countless of system freezes, never happened when Chromium open (a usually it is open constantly) so I'm absolutely sure it got nothing to do with hardware malfunction and I'm guessing GTK/Clutter triggers a GPU related bug. Don't have logs as the system don't even send the error message to syslog-ng and I'm not sure how to debug this further (got a Bus Blaster and Bus Pirate but I'm not sure where is the JTAG on the my Acer C720 and if it would be much of help). *** Bug 84996 has been marked as a duplicate of this bug. *** Does this happen with i915.enable_ppgtt=0 ? Please upload a new fresh error state if so. Created attachment 107878 [details]
gpu crash dump
(In reply to Mika Kuoppala from comment #60)
> Does this happen with i915.enable_ppgtt=0 ?
>
> Please upload a new fresh error state if so.

yes, it does

I can also confirm; except for i915.enable_ppgtt=0, no other i915 or drm module options were set.
3.17_gpu_crash_dump-ppgtt_disable.log at https://www.dropbox.com/s/n8kda72qnmg8kzz/3.17_gpu_crash_dump-ppgtt_disable.log?dl=0

*** This bug has been marked as a duplicate of bug 83677 ***
Created attachment 101353 [details]
Xorg.0.log

Bug description:
Hangs for a while, and then continues with some character or window glitches.

System environment:
-- chipset: Intel Pentium 3556U with haswell-based mobile graphics.
-- system architecture: x86_64
-- xf86-video-intel: 2.99.910-0ubuntu1
-- xserver: 1.15.1-0ubuntu2
-- mesa: 10.1.3-0ubuntu0.1
-- libdrm: 2.4.52-1
-- kernel: 3.13.0-29-generic
-- Linux distribution: Ubuntu 14.04
-- Machine or mobo model: Lenovo IdeaPad S310 (LENOVO_MT_20300)

Reproducing steps:
While using programs like firefox (the hang usually occurs while browsing a complex webpage) or libreoffice, the system hangs for a while (usually 3~10 seconds). After that, it continues with some character or window glitches.

Additional info:
Xorg.0.log, dmesg, i915_error_state, syslog (/var/log/syslog) attached.
render command stream:
IPEHR: 0x780c0000