Bug 93509

Summary: [SNB] [4.4 regression] vblank wait timed out on crtc
Product: DRI Reporter: Tomas Janousek <tomi>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs, main.haarp, nemesis
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: SNB i915 features: display/atomic

Description Tomas Janousek 2015-12-26 10:54:43 UTC
I'm sometimes getting "vblank wait timed out on crtc 0" when switching between X and console on my ThinkPad T420.

Steps to reproduce:
1. get a machine with sandybridge and optimus
2. boot it with some 4.4-rc kernel
3. power off nvidia using bbswitch (couldn't reproduce without this)
4. switch between X and console a few times
   (while :; do chvt 10; sleep 2; chvt 1; sleep 2; done)
5. this happens:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1382 at drivers/gpu/drm/drm_irq.c:1216 drm_wait_one_vblank+0x1b0/0x1c0()
vblank wait timed out on crtc 0
Modules linked in: ccm br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison libcrc32c dm_bufio binfmt_misc tun bnep nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_raw uvcvideo xt_TCPMSS xt_tcpudp iptable_mangle btusb ipt_MASQUERADE nf_nat_masquerade_ipv4 videobuf2_vmalloc btrtl xt_addrtype videobuf2_memops videobuf2_v4l2 btbcm iptable_nat arc4 nf_conntrack_ipv4 btintel videobuf2_core iwldvm bluetooth nf_defrag_ipv4 nf_nat_ipv4 v4l2_common nf_nat videodev mac80211 iptable_filter ip_tables x_tables x86_pkg_temp_thermal kvm_intel kvm iwlwifi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_codec cfg80211 irqbypass crc32_pclmul snd_hwdep crc32c_intel sdhci_pci snd_hda_core sdhci snd_pcm_oss aesni_intel mmc_core aes_x86_64 lrw hdaps(O) thinkpad_ec(O) glue_helper psmouse ablk_helper snd_mixer_oss cryptd snd_pcm serio_raw e1000e thinkpad_acpi ehci_pci snd_timer nvram ehci_hcd wmi usbcore ptp snd pps_core tpm_rng tpm_tis tpm battery bbswitch(O) nf_conntrack_netlink soundcore ac i2c_i801 usb_common nfnetlink evdev nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_h323 nf_conntrack_irc nf_conntrack_ftp nf_conntrack coretemp hwmon ecryptfs loop dm_crypt
CPU: 0 PID: 1382 Comm: Xorg Tainted: G           O    4.4.0-rc2-lis64-bisect+ #13
Hardware name: LENOVO 4178A3G/4178A3G, BIOS 83ET76WW (1.46 ) 07/05/2013
 0000000000000000 0000000061618deb ffff88041fc0b790 ffffffff8f301fcc
 ffff88041fc0b7d8 ffff88041fc0b7c8 ffffffff8f05a062 ffff88042ad01000
 0000000000000000 0000000000000000 0000000000000f58 ffff88042b485a08
Call Trace:
 [<ffffffff8f301fcc>] dump_stack+0x4e/0x82
 [<ffffffff8f05a062>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8f05a0fc>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff8f703d0e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
 [<ffffffff8f095765>] ? finish_wait+0x55/0x70
 [<ffffffff8f427730>] drm_wait_one_vblank+0x1b0/0x1c0
 [<ffffffff8f095bf0>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffff8f4bd302>] intel_atomic_commit+0x732/0x1890
 [<ffffffff8f44017e>] ? drm_atomic_check_only+0x18e/0x590
 [<ffffffff8f43ff27>] ? drm_atomic_add_affected_connectors+0x27/0xf0
 [<ffffffff8f4405b7>] drm_atomic_commit+0x37/0x60
 [<ffffffff8f41d76f>] restore_fbdev_mode+0x22f/0x260
 [<ffffffff8f41fac3>] drm_fb_helper_restore_fbdev_mode_unlocked+0x33/0x80
 [<ffffffff8f41fb3d>] drm_fb_helper_set_par+0x2d/0x50
 [<ffffffff8f4d45ca>] intel_fbdev_set_par+0x1a/0x60
 [<ffffffff8f365ae1>] ? fb_set_var+0x301/0x450
 [<ffffffff8f365a28>] fb_set_var+0x248/0x450
 [<ffffffff8f08822c>] ? update_curr+0x5c/0x120
 [<ffffffff8f0888bd>] ? check_preempt_wakeup+0xdd/0x1c0
 [<ffffffff8f360343>] fbcon_blank+0x363/0x3b0
 [<ffffffff8f3e69e3>] do_unblank_screen+0xc3/0x190
 [<ffffffff8f3dc9d9>] complete_change_console+0x59/0xe0
 [<ffffffff8f3dd170>] vt_ioctl+0x710/0x12e0
 [<ffffffff8f425719>] ? drm_ioctl+0x189/0x540
 [<ffffffff8f429a50>] ? drm_setmaster_ioctl+0xa0/0xa0
 [<ffffffff8f3ddf08>] vt_compat_ioctl+0x188/0x3b0
 [<ffffffff8f0ae650>] ? __call_rcu.constprop.56+0x250/0x250
 [<ffffffff8f3cfc14>] tty_compat_ioctl+0x54/0xc0
 [<ffffffff8f1f56bb>] compat_SyS_ioctl+0xeb/0x11f0
 [<ffffffff8f09a3c7>] ? percpu_up_read+0x17/0x40
 [<ffffffff8f1a4291>] ? __sb_end_write+0x21/0x30
 [<ffffffff8f1a1f35>] ? vfs_write+0x165/0x1a0
 [<ffffffff8f002c9a>] do_fast_syscall_32+0x9a/0x150
 [<ffffffff8f7064d2>] sysenter_flags_fixed+0x8/0x12
---[ end trace 5d6ad91f242dc76e ]---

I couldn't reproduce this on 4.3 so I bisected it and got:

43d59eda1f69631c267e06ab6b94ed3c14f1f6d1 is the first bad commit
commit 43d59eda1f69631c267e06ab6b94ed3c14f1f6d1
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Thu Sep 24 15:53:07 2015 -0700

    drm/i915: Eliminate usage of plane_wm_parameters from ILK-style WM code (v2)
    
    Just pull the info out of the plane state structure rather than staging
    it in an additional structure.
    
    v2: Add 'visible' condition to sprites_scaled so that we don't limit the
        WM level when the sprite isn't enabled.  (Ville)
    
    Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
    Reviewed-by(v1): Ander Conselvan de Oliveira <conselvan2@gmail.com>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 b11db95f066a752dcccb0082490eabf21f62e8db 88bc02171246d693a868cd9ddb3144ee5d9b0ea8 M      drivers

I also tried merging yesterday's drm-intel-next (7447a2b221cd4df3960e82478a4ee29312589611) to yesterday's master (a88164345b81292b55a8d4829fdd35c8d611cd7d) and couldn't reproduce it either. Bisecting gave me:

92826fcdfc147a7d16766e987c12a9dfe1860c3f is the first fix commit
commit 92826fcdfc147a7d16766e987c12a9dfe1860c3f
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Thu Dec 3 13:49:13 2015 +0100

    drm/i915: Calculate watermark related members in the crtc_state, v4.
    
    This removes pre/post_wm_update from intel_crtc->atomic, and
    creates atomic state for it in intel_crtc.
    
    Changes since v1:
    - Rebase on top of wm changes.
    Changes since v2:
    - Split disable_cxsr into a separate patch.
    Changes since v3:
    - Move some of the changes to intel_wm_need_update.
    
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/56603A49.5000507@linux.intel.com
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 b97b4b2338e1da2ee9e93791154b4deb2ecaf8b6 ae91735a668b6785e55122c646dbca93b54eb8c9 M      drivers

I can confirm that cherry-picking just this 92826fcdf on top of master is enough to fix the issue for me and I intend to be running that for a while.

It is worth noting that I've had problems with this "vblank wait timed out on crtc 0" in the past and it's just this switching-between-X-and-console-repeatedly that I can't reproduce in 4.3. I need to keep this in xorg.conf:

        Option      "TearFree" "true"
        Option      "SwapbuffersWait" "false"
        Option      "VSync" "false"

Otherwise tearing and/or stuttering appears within a few days of X running (although I'm not sure if all that is needed with current kernels, I added this 4.1-ish). I do get tolerable visual artifacts regularly when playing youtube videos in chrome (playing them using mpv with hwdec off is fine), but that's a lot better than 5 fps and random X crashes.
Comment 1 Gilles Hamel 2016-03-20 22:07:42 UTC
I confirm this bug. Sometimes, Xserver hangs when switching back VT from text mode to graphical display. When it hangs, suspend2ram and resume fixes the issue. Here is the stack :

------------[ cut here ]------------
WARNING: CPU: 2 PID: 10921 at drivers/gpu/drm/drm_irq.c:1326 drm_wait_one_vblank+0x1b0/0x1c0 [drm]()
vblank wait timed out on crtc 0
Modules linked in: sha256_ssse3 sha256_generic hmac drbg ansi_cprng ctr ccm joydev mousedev iTCO_wdt i
 syscopyarea mei sysfillrect sysimgblt fb_sys_fops i2c_algo_bit snd_timer ptp pps_core nvram snd sound
CPU: 2 PID: 10921 Comm: Xorg Tainted: G     U  W  O    4.4.5-1-ARCH #1
Hardware name: LENOVO 4236JZ5/4236JZ5, BIOS 83ET78WW (1.48 ) 01/21/2016
 0000000000000286 0000000074de7b7f ffff8800c2edf810 ffffffff812cb391
 ffff8800c2edf858 ffffffffa0533272 ffff8800c2edf848 ffffffff810776e2
 ffff880118b99800 0000000000000000 0000000000000000 00000000000ac2c4
Call Trace:
 [<ffffffff812cb391>] dump_stack+0x63/0x82
 [<ffffffff810776e2>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8107777c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff810b8825>] ? finish_wait+0x55/0x70
 [<ffffffff810747eb>] ? __mmdrop+0x7b/0xd0
 [<ffffffffa0506840>] drm_wait_one_vblank+0x1b0/0x1c0 [drm]
 [<ffffffff810b8cc0>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffffa0791685>] intel_atomic_commit+0x475/0x1430 [i915]
 [<ffffffffa051f55e>] ? drm_atomic_check_only+0x18e/0x590 [drm]
 [<ffffffffa051f307>] ? drm_atomic_add_affected_connectors+0x27/0xf0 [drm]
 [<ffffffffa051f997>] drm_atomic_commit+0x37/0x60 [drm]
 [<ffffffffa0572faf>] restore_fbdev_mode+0x22f/0x260 [drm_kms_helper]
 [<ffffffffa0575313>] drm_fb_helper_restore_fbdev_mode_unlocked+0x33/0x80 [drm_kms_helper]
 [<ffffffffa057538d>] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
 [<ffffffffa07a819a>] intel_fbdev_set_par+0x1a/0x60 [i915]
 [<ffffffff812f2db9>] ? __percpu_counter_add+0x59/0x80
 [<ffffffff8133fa46>] fb_set_var+0x236/0x460
 [<ffffffff8117db21>] ? shmem_recalc_inode+0x71/0xa0
 [<ffffffff810a744c>] ? __enqueue_entity+0x6c/0x70
 [<ffffffff810adfeb>] ? enqueue_entity+0x1fb/0xd20
 [<ffffffff813366ef>] fbcon_blank+0x30f/0x350
 [<ffffffff813b2ef3>] do_unblank_screen+0xc3/0x190
 [<ffffffff813a941a>] vt_ioctl+0x50a/0x12e0
 [<ffffffff811f0050>] ? filename_parentat+0xe0/0x150
 [<ffffffff8139d731>] tty_ioctl+0x361/0xc30
 [<ffffffff813e5693>] ? vga_arb_release+0xe3/0x130
 [<ffffffff811c33d0>] ? kfree+0x170/0x180
 [<ffffffff811fff5c>] ? mntput_no_expire+0x2c/0x1b0
 [<ffffffff811f3258>] do_vfs_ioctl+0x298/0x480
 [<ffffffff811fd307>] ? __fget+0x77/0xb0
 [<ffffffff811f34b9>] SyS_ioctl+0x79/0x90
 [<ffffffff815970ee>] entry_SYSCALL_64_fastpath+0x12/0x6d
---[ end trace 5926662876f4c875 ]---
Comment 2 Gilles Hamel 2016-03-24 20:32:31 UTC
Here, upgrading to linux 4.5 have fixed the issue :)
Great !
Comment 3 Tomas Janousek 2016-03-25 10:33:36 UTC
Yeah, that's good news. Not really surprising given "drm/i915: Calculate watermark related members in the crtc_state, v4." is pretty old and was in the queue already when I researched this, but I'm very glad to hear that they haven't managed to break it with something else. :-)
Comment 4 Jani Nikula 2016-03-29 08:26:32 UTC
Thanks for following up.
Comment 5 Erik Quaeghebeur 2016-04-14 20:10:36 UTC
(In reply to Tomas Janousek from comment #3)
> "drm/i915: Calculate
> watermark related members in the crtc_state, v4." is pretty old and was in
> the queue already when I researched this, [...]

I think I'm hit with the same bug. Is there any way to find out which kernel releases have it. Specifically, has it been back-ported to the 4.1 line?

To verify that I am hit with this bug, my trace:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1011 at drivers/gpu/drm/drm_irq.c:1141 drm_wait_one_vblank+0x9d/0x170 [drm]()
vblank wait timed out on crtc 0
Modules linked in: ctr ccm af_packet uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev hid_logitech_hidpp hid_logitech_dj btrfs xor zlib_deflate snd_hda_codec_hdmi snd_hda_codec_generic raid6_pq zlib_inflate arc4 i915 coretemp x86_pkg_temp_thermal intel_powerclamp fbcon bitblit softcursor font aesni_intel cfbfillrect cfbimgblt i2c_algo_bit cfbcopyarea aes_x86_64 glue_helper drm_kms_helper iwldvm lrw gf128mul mac80211 drm sdhci_pci ablk_helper cryptd sdhci microcode psmouse mmc_core snd_hda_intel snd_hda_controller i2c_i801 xhci_pci iwlwifi xhci_hcd snd_hda_codec i2c_core snd_hwdep fb snd_hda_core cfg80211 wmi thinkpad_acpi fbdev e1000e intel_gtt snd_pcm agpgart snd_timer nvram ptp snd pps_core soundcore ehci_pci hwmon ehci_hcd rfkill evdev video ipv6
CPU: 0 PID: 1011 Comm: X Not tainted 4.1.15-gentoo-r1 #3
Hardware name: LENOVO 4290W4H/4290W4H, BIOS 8DET70WW (1.40 ) 05/14/2015
 0000000000000000 ffffffffa039fd90 ffffffff8143e0e9 ffff88020f843888
 ffffffff8104739c 0000000000000000 ffff8800d857a000 0000000000000000
 0000000000000000 00000000000024ef ffffffff81047415 ffffffffa03a1518
Call Trace:
 [<ffffffff8143e0e9>] ? dump_stack+0x40/0x50
 [<ffffffff8104739c>] ? warn_slowpath_common+0x7c/0xb0
 [<ffffffff81047415>] ? warn_slowpath_fmt+0x45/0x50
 [<ffffffff8107c32e>] ? prepare_to_wait_event+0x7e/0xf0
 [<ffffffffa037660d>] ? drm_wait_one_vblank+0x9d/0x170 [drm]
 [<ffffffff8107c220>] ? wait_woken+0x80/0x80
 [<ffffffffa04b35f0>] ? drm_plane_helper_commit+0x250/0x2c0 [drm_kms_helper]
 [<ffffffffa0692587>] ? intel_crtc_set_config+0x817/0xf60 [i915]
 [<ffffffffa038d91a>] ? drm_atomic_state_clear+0x10a/0x180 [drm]
 [<ffffffffa037ed0f>] ? drm_mode_set_config_internal+0x5f/0x100 [drm]
 [<ffffffffa04bb6c9>] ? restore_fbdev_mode+0xb9/0xe0 [drm_kms_helper]
 [<ffffffffa04bd4eb>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x1b/0x60 [drm_kms_helper]
 [<ffffffffa04bd54d>] ? drm_fb_helper_set_par+0x1d/0x40 [drm_kms_helper]
 [<ffffffffa06a0901>] ? intel_fbdev_set_par+0x11/0x60 [i915]
 [<ffffffff8114c935>] ? do_sys_poll+0x105/0x530
 [<ffffffffa023f80c>] ? fb_set_var+0x16c/0x3a0 [fb]
 [<ffffffffa05d7355>] ? fbcon_blank+0x225/0x2e0 [fbcon]
 [<ffffffff812951b2>] ? do_unblank_screen+0xb2/0x1e0
 [<ffffffff8128b86f>] ? complete_change_console+0x4f/0xd0
 [<ffffffff8128c9ae>] ? vt_ioctl+0x10be/0x1340
 [<ffffffffa037450c>] ? drm_ioctl+0x18c/0x580 [drm]
 [<ffffffff8113a008>] ? __sb_end_write+0x28/0x60
 [<ffffffff8128066e>] ? tty_ioctl+0x3ce/0xc00
 [<ffffffff8114ac98>] ? do_vfs_ioctl+0x2e8/0x4f0
 [<ffffffff8113a008>] ? __sb_end_write+0x28/0x60
 [<ffffffff81137fa3>] ? vfs_write+0x183/0x1b0
 [<ffffffff8114aed6>] ? SyS_ioctl+0x36/0x80
 [<ffffffff81443097>] ? system_call_fastpath+0x12/0x6a
---[ end trace 0dabb2916250fc62 ]---
Comment 6 Christian Reis 2016-04-15 17:37:31 UTC
See also bug 93782.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.