Summary: | Linux 4.5 regression: FIFO underruns on Skylake | ||
---|---|---|---|
Product: | DRI | Reporter: | Andy Lutomirski <luto> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | ashley, gary.c.wang, giuliani.v, intel-gfx-bugs, manfred.kitzbichler, matthew.d.roper, przanoni, q3aiml |
Version: | unspecified | Keywords: | regression |
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | SKL | i915 features: | display/eDP, display/watermark |
Description
Andy Lutomirski
2016-02-29 16:31:02 UTC
This was not fixed by "drm/i915/skl: Fix power domain suspend sequence". This is also not fixed in drm-intel-nightly 2016y-03m-11d-13h-31m-03s.

There is some highly questionable code in here. In `skl_pipe_wm_get_hw_state`:

```c
temp = hw->plane_trans[pipe][PLANE_CURSOR];
skl_pipe_wm_active_state(temp, active, true, true, i, 0);
```

By `i`, do you mean `PLANE_CURSOR`? This bug probably doesn't matter, because if `is_cursor` is true, then `i` is ignored. But I'm wondering why there's an `is_cursor` parameter at all, given that the code appears to be identical in both cases. If `PLANE_CURSOR` is intended to be just like the other planes, why not either make it plane 0 or have a `for_each_plane` or similar that iterates over all plane indices, including `PLANE_CURSOR`?

Just for completeness: the bug is present in 4.5 final.

I've definitely seen FIFO overruns while the first two WM levels show the cursor being on (`PLANE_WM_EN` set). I've never seen `PLANE_WM_EN` set beyond the second level (LP0 and LP1, if I understand the code correctly). I tend to have a gnome-terminal running, and gnome-terminal loves toggling the cursor state, but I certainly don't need to have gnome-terminal in the foreground to see this issue.

I have not reproduced this problem in 4.6-rc1, so it might be fixed. If I had to guess, I'd say the fix was:

```
commit bf22045250fafbe733277e13300eaa240ba2104d
Author: Matt Roper <matthew.d.roper@intel.com>
Date: Tue Jan 19 11:43:04 2016 -0800

    Revert "drm/i915: Add two-stage ILK-style watermark programming (v10)"
```

which is the fix for bug 93640.

I think there is something wrong with your release process.
v4.5 has:

```
commit e2e407dc093f530b771ee8bf8fe1be41e3cea8b3
Author: Matt Roper <matthew.d.roper@intel.com>
AuthorDate: Mon Feb 8 11:05:28 2016 -0800
Commit: Jani Nikula <jani.nikula@intel.com>
CommitDate: Tue Feb 9 11:24:39 2016 +0200

    drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
```

v4.6-rc1 also has:

```
commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb
Author: Matt Roper <matthew.d.roper@intel.com>
AuthorDate: Tue Feb 2 22:06:51 2016 -0800
Commit: Matt Roper <matthew.d.roper@intel.com>
CommitDate: Wed Feb 3 05:59:03 2016 -0800

    drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
```

What gives? (I haven't confirmed that the latter is the change that fixes this.)

This is not fixed as of Linux 4.6.2-1-ARCH. Sometimes the bug causes my system to completely freeze and I have to reboot with the power button. In the journal I see:

```
Jun 24 10:51:01 miki-laptop kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
```

but the worse one is:

```
Jun 24 10:30:34 miki-laptop kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
Jun 24 10:39:25 miki-laptop kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Jun 24 10:39:25 miki-laptop kernel: IP: [< (null)>] (null)
Jun 24 10:39:25 miki-laptop kernel: PGD 84d3e067 PUD 8523c067 PMD 0
Jun 24 10:39:25 miki-laptop kernel: Oops: 0010 [#1] PREEMPT SMP
Jun 24 10:39:25 miki-laptop kernel: Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg ansi_cprng ctr ccm uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev
Jun 24 10:39:25 miki-laptop kernel: glue_helper ablk_helper snd input_leds cryptd cfg80211 led_class serio_raw pcspkr soundcore i2c_i801 hci_uart shpchp btbcm i2c_hid thermal wmi btqca hid elan_i2c
Jun 24 10:39:25 miki-laptop kernel: drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_agp intel_gtt
Jun 24 10:39:25 miki-laptop kernel: CPU: 0 PID: 765 Comm: Xorg Tainted: G U O 4.6.2-1-ARCH #1
Jun 24 10:39:25 miki-laptop kernel: Hardware name: ASUSTeK COMPUTER INC. UX305UA/UX305UA, BIOS UX305UA.201 10/12/2015
Jun 24 10:39:25 miki-laptop kernel: task: ffff8802692b0f40 ti: ffff880084d34000 task.ti: ffff880084d34000
Jun 24 10:39:25 miki-laptop kernel: RIP: 0010:[<0000000000000000>] [< (null)>] (null)
Jun 24 10:39:25 miki-laptop kernel: RSP: 0018:ffff880084d37af0 EFLAGS: 00010286
Jun 24 10:39:25 miki-laptop kernel: RAX: ffff880084d37bb8 RBX: ffff88026a1b5c00 RCX: 000000000001fd36
Jun 24 10:39:25 miki-laptop kernel: RDX: 000000000001fd36 RSI: ffff8802685220f8 RDI: ffff88026a1b5f00
Jun 24 10:39:25 miki-laptop kernel: RBP: ffff880084d37b78 R08: ffff88026a1b5f00 R09: ffff88026a1b5f00
Jun 24 10:39:25 miki-laptop kernel: R10: ffff88020a43ed00 R11: 0000000000000000 R12: 0000000000000001
Jun 24 10:39:25 miki-laptop kernel: R13: ffff880268523368 R14: ffff8802685220f8 R15: 0000000000000000
Jun 24 10:39:25 miki-laptop kernel: FS: 00007fb7671b8940(0000) GS:ffff880273c00000(0000) knlGS:0000000000000000
Jun 24 10:39:25 miki-laptop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 24 10:39:25 miki-laptop kernel: CR2: 0000000000000000 CR3: 000000007f387000 CR4: 00000000003406f0
Jun 24 10:39:25 miki-laptop kernel: Stack:
Jun 24 10:39:25 miki-laptop kernel: ffffffffa0122da0 ffff880268520000 ffff8802685220f8 0001fd36000400d8
Jun 24 10:39:25 miki-laptop kernel: ffff880084d37bb8 ffff8801cb6cf3c0 ffff88026a1b5c00 ffff880230579cc0
Jun 24 10:39:25 miki-laptop kernel: ffff880084d37b40 ffffffffa0125ffd ffff880084d37b80 00000000afcbbfb4
Jun 24 10:39:25 miki-laptop kernel: Call Trace:
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa0122da0>] ? i915_gem_object_sync+0x1b0/0x340 [i915]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa0125ffd>] ? i915_gem_object_pin+0x2d/0x30 [i915]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa0135abd>] intel_execlists_submission+0x1cd/0x440 [i915]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa0114a20>] i915_gem_do_execbuffer.isra.14+0xaf0/0x1450 [i915]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffff812e6ae9>] ? idr_get_empty_slot+0x189/0x370
Jun 24 10:39:25 miki-laptop kernel: [<ffffffff812e6d53>] ? idr_alloc+0x83/0x100
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa0018079>] ? drm_gem_handle_create_tail+0xc9/0x1a0 [drm]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa01160d4>] i915_gem_execbuffer2+0xd4/0x250 [i915]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa0018aa2>] drm_ioctl+0x152/0x540 [drm]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffffa0116000>] ? i915_gem_execbuffer+0x330/0x330 [i915]
Jun 24 10:39:25 miki-laptop kernel: [<ffffffff81209bc3>] do_vfs_ioctl+0xa3/0x5d0
Jun 24 10:39:25 miki-laptop kernel: [<ffffffff814a1091>] ? __sys_recvmsg+0x51/0x90
Jun 24 10:39:25 miki-laptop kernel: [<ffffffff8120a169>] SyS_ioctl+0x79/0x90
Jun 24 10:39:25 miki-laptop kernel: [<ffffffff815c7272>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Jun 24 10:39:25 miki-laptop kernel: Code: Bad RIP value.
Jun 24 10:39:25 miki-laptop kernel: RIP [< (null)>] (null)
Jun 24 10:39:25 miki-laptop kernel: RSP <ffff880084d37af0>
Jun 24 10:39:25 miki-laptop kernel: CR2: 0000000000000000
Jun 24 10:39:25 miki-laptop kernel: ---[ end trace 946c0a8763286b97 ]---
Jun 24 10:39:25 miki-laptop org.a11y.atspi.Registry[5911]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server :0
Jun 24 10:39:25 miki-laptop org.a11y.atspi.Registry[5911]: after 2113 requests (2113 known processed) with 0 events remaining.
-- Reboot --
```

(In reply to Michele Lacchia from comment #6)
> This is not fixed as of Linux 4.6.2-1-ARCH. Sometimes the bug causes my
> system to completely freeze and I have to reboot with the power button. In
> journal I see:
>
> `Jun 24 10:51:01 miki-laptop kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun`
>
> but the worse one is: ...

Which is a completely separate and much more critical bug than the underrun. Please do file a separate bug report for it.

I'm using an MSI GS60 with the latest mainline kernel and am still hitting this bug:

```
Sep 03 12:06:25 mylap kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
```

Extra info in case you need it:

```
# uname -a
Linux mylap 4.8.0-rc3+ #85 SMP PREEMPT Thu Aug 25 17:20:53 CST 2016 x86_64 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz GenuineIntel GNU/Linux
# lspci
00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA Controller [AHCI mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #1 (rev f1)
00:1c.2 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #3 (rev f1)
00:1c.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #4 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 20)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5249 PCI Express Card Reader (rev 01)
04:00.0 Ethernet controller: Qualcomm Atheros Killer E2400 Gigabit Ethernet Controller (rev 10)
3e:00.0 Non-Volatile memory controller: Toshiba America Info Systems Device 010f (rev 01)
```

Hi,

Over the course of the last month, we submitted a significant number of fixes that could have fixed this bug. Can you please try to reproduce this bug on a recent drm-intel-nightly kernel?

Thanks,
Paulo

I'm having similar issues with a Skylake-based Lenovo T460s. I'm using Fedora 24 with the latest updates:

```
$ uname -a
Linux gaston 4.7.5-200.fc24.x86_64 #1 SMP Mon Sep 26 21:25:47 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
```

I see the comment above about bug fixes over the past month, but this is a fairly recent kernel, so I'm posting in case it is a useful data point. I see many errors like the following:

```
Oct 07 22:35:13 gaston kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
Oct 07 22:35:13 gaston kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
Oct 07 12:25:25 gaston kernel: [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
```

One or the other of my external monitors keeps blacking out for about a second and then coming back. On a few occasions, the system has completely frozen during one of these blackouts. It also frequently freezes when left unattended, and always when the screens are "blanked". I've been able to prevent the system from freezing by changing the Power -> "Blank screen" setting to "Never". When the system "freezes", the only way I've been able to recover is to hold down the power button.
The issues only seem to happen when running with external monitors; I've never had the system freeze without them. For example, I recently ran for a week without external monitors and without any system freezes. I did a fresh install of Fedora about a week ago to see if that would help. I ran for a few days without issues, but then it started again. I have no idea whether these error logs have anything to do with my system freezes.

Please re-test with Paulo's patch to apply memory workarounds for Skylake: https://patchwork.freedesktop.org/series/13548/

FYI, I think this is somehow related to bug 91883. I have been hitting this regularly for weeks: the error from this bug and the one from that bug always appear together, and one of my two external screens flickers. This is on a Skylake-based Lenovo T460s under Fedora 24 with the latest updates, currently 4.8.4-200.fc24.x86_64 (the same setup as Andre Fredette; we're both at Red Hat, where the T460s is the standard, widely deployed model).
```
[60954.177636] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[60961.293531] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
[61713.069574] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=50476 end=50477) time 152 us, min 1073, max 1079, scanline start 1072, end 1083
[62135.779839] CPU2: Core temperature above threshold, cpu clock throttled (total events = 409148)
[62135.779859] CPU2: Package temperature above threshold, cpu clock throttled (total events = 547662)
[62135.779868] mce_notify_irq: 1 callbacks suppressed
[62135.779869] mce: [Hardware Error]: Machine check events logged
[62135.782856] CPU2: Core temperature/speed normal
[64171.939147] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=198008 end=198009) time 160 us, min 1073, max 1079, scanline start 1070, end 1081
[64615.439710] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=224618 end=224619) time 158 us, min 1073, max 1079, scanline start 1072, end 1083
[65694.324329] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=289351 end=289352) time 178 us, min 1073, max 1079, scanline start 1068, end 1080
[66737.025209] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
```

Is this issue still seen with the latest kernel?

I haven't seen it for a while on the latest kernel.

(In reply to Andy Lutomirski from comment #14)
> I haven't seen it for a while on the latest kernel.

Thanks for your feedback, Andy. Closing as fixed, then.
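Andy's question about `skl_pipe_wm_get_hw_state` (why pass an `is_cursor` flag when both code paths look identical) is easier to weigh with a concrete model. The C below is a minimal, hypothetical sketch, not the i915 code: every name in it (`sketch_plane`, `for_each_sketch_plane`, `decode_wm`, `read_pipe_wm`, `enabled_levels`) is made up for illustration, and the register layout (enable in bit 31, lines in bits 18:14, blocks in bits 9:0, eight levels per plane) is an assumption modeled on the SKL `PLANE_WM_*` fields. It gives the cursor an ordinary plane index, so a single loop decodes every plane's watermarks without any cursor special case, and `enabled_levels()` counts how many leading levels have the enable bit set, which is the quantity Andy was watching.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model for illustration only; these are not the
 * real i915 types or macros. The cursor is just the last plane index.
 */
enum sketch_plane {
	SKETCH_PLANE_PRIMARY,
	SKETCH_PLANE_SPRITE0,
	SKETCH_PLANE_CURSOR,
	SKETCH_NUM_PLANES
};

/* Assumed: eight watermark levels per plane (level 0 .. level 7). */
#define SKETCH_NUM_LEVELS 8

/*
 * Assumed register layout, modeled on the SKL PLANE_WM_* fields:
 * enable in bit 31, lines in bits 18:14, blocks in bits 9:0.
 */
#define SKETCH_WM_EN          (1u << 31)
#define SKETCH_WM_LINES_SHIFT 14
#define SKETCH_WM_LINES_MASK  0x1fu
#define SKETCH_WM_BLOCKS_MASK 0x3ffu

struct wm_value {
	bool enabled;
	uint32_t lines;
	uint32_t blocks;
};

struct pipe_wm {
	struct wm_value level[SKETCH_NUM_PLANES][SKETCH_NUM_LEVELS];
	struct wm_value trans[SKETCH_NUM_PLANES];
};

/* One iterator that covers every plane, the cursor included. */
#define for_each_sketch_plane(p) \
	for ((p) = 0; (p) < SKETCH_NUM_PLANES; (p)++)

/* Decodes one raw watermark value; identical for cursor and other planes. */
static struct wm_value decode_wm(uint32_t val)
{
	struct wm_value wm;

	wm.enabled = (val & SKETCH_WM_EN) != 0;
	wm.lines = (val >> SKETCH_WM_LINES_SHIFT) & SKETCH_WM_LINES_MASK;
	wm.blocks = val & SKETCH_WM_BLOCKS_MASK;
	return wm;
}

/*
 * raw[plane][level] and raw_trans[plane] stand in for register reads.
 * The plane index alone selects the data, so no is_cursor flag is needed.
 */
static void read_pipe_wm(struct pipe_wm *out,
			 uint32_t raw[][SKETCH_NUM_LEVELS],
			 const uint32_t raw_trans[])
{
	int plane, level;

	for_each_sketch_plane(plane) {
		for (level = 0; level < SKETCH_NUM_LEVELS; level++)
			out->level[plane][level] = decode_wm(raw[plane][level]);
		out->trans[plane] = decode_wm(raw_trans[plane]);
	}
}

/* Counts consecutive levels, starting at level 0, with the enable bit set. */
static int enabled_levels(const struct pipe_wm *wm, int plane)
{
	int level = 0;

	while (level < SKETCH_NUM_LEVELS && wm->level[plane][level].enabled)
		level++;
	return level;
}
```

Feeding it raw values in which only cursor levels 0 and 1 carry the enable bit makes `enabled_levels()` report 2 for the cursor, the pattern Andy describes. The only design point the sketch makes is that once the cursor has an ordinary plane index, an `is_cursor` flag carries no extra information.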