Bug 96938 - [HSW modeset regression] i915/drm locks up when quitting X and returning to fbdev console
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2016-07-15 08:57 UTC by nkalkhof
Modified: 2016-12-05 07:53 UTC
CC List: 4 users

See Also:
i915 platform: HSW
i915 features: display/atomic


Attachments
complete dmesg with drm.debug=0xe (155.15 KB, text/plain)
2016-07-15 08:57 UTC, nkalkhof
Kernel Oop logged at the moment of DisplayPort display disconnection (3.76 MB, image/jpeg)
2016-10-28 10:40 UTC, Alejandro Lorenzo

Description nkalkhof 2016-07-15 08:57:39 UTC
Created attachment 125084
complete dmesg with drm.debug=0xe

Hi,

drm modeset fails to switch back to the fbdev console after quitting X. This happens when I boot my laptop with an external screen connected on DP2, switch to the internal eDP screen, disconnect DP2 (by undocking the laptop), and then shut down X: at that point i915/drm locks up. This issue has been around for months now.

[  159.046154] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  159.046159] IP: [<ffffffff8150b1b2>] __ww_mutex_lock_slowpath+0x92/0x1b0
[  159.046160] PGD 40bbd5067 PUD 40b30d067 PMD 0 
[  159.046162] Oops: 0002 [#1] SMP
[  159.046171] Modules linked in: snd_hda_codec_hdmi i915 fbcon bitblit snd_hda_codec_realtek softcursor font snd_hda_codec_generic intel_gtt i2c_algo_bit iwlmvm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm snd_hda_intel snd_hda_codec fb fbdev snd_hwdep snd_hda_core snd_pcm snd_timer iwlwifi thinkpad_acpi
[  159.046173] CPU: 1 PID: 1444 Comm: X Not tainted 4.7.0-rc7-danvet+ #19
[  159.046174] Hardware name: LENOVO qqqqENX407/qqqqENX407, BIOS GLET80WW (2.34 ) 07/23/2015
[  159.046175] task: ffff88040adbb980 ti: ffff8804097dc000 task.ti: ffff8804097dc000
[  159.046176] RIP: 0010:[<ffffffff8150b1b2>]  [<ffffffff8150b1b2>] __ww_mutex_lock_slowpath+0x92/0x1b0
[  159.046177] RSP: 0018:ffff8804097dfa20  EFLAGS: 00010282
[  159.046178] RAX: 0000000000000000 RBX: ffff8804002d0300 RCX: ffff88040afc3a60
[  159.046178] RDX: 0000000000000001 RSI: ffff8804097dfa30 RDI: ffff88040afc3a5c
[  159.046179] RBP: ffff8804097dfa78 R08: 0000000000000000 R09: 0000000000000000
[  159.046179] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88040adbb980
[  159.046180] R13: ffff88040afc3a5c R14: 00000000ffffffff R15: ffff88040afc3a58
[  159.046181] FS:  00007f7d131b18c0(0000) GS:ffff88041e300000(0000) knlGS:0000000000000000
[  159.046182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  159.046182] CR2: 0000000000000000 CR3: 000000040afad000 CR4: 00000000001406a0
[  159.046182] Stack:
[  159.046184]  0000004800000000 ffff88040afc3a60 ffff88040afc3a60 0000000000000000
[  159.046185]  0000043c00000438 0000046500000441 ffff8804002d0300 ffff88040afc5800
[  159.046186]  ffff88040019e000 ffff8804095f7880 0000000000000001 ffff88040afc3a58
[  159.046186] Call Trace:
[  159.046194]  [<ffffffffa0136a3b>] ? drm_modeset_lock+0x2b/0xc0 [drm]
[  159.046199]  [<ffffffffa01376a8>] ? drm_atomic_get_connector_state+0x28/0x160 [drm]
[  159.046202]  [<ffffffffa019756a>] ? __drm_atomic_helper_set_config+0x23a/0x2f0 [drm_kms_helper]
[  159.046205]  [<ffffffffa019a5d1>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x161/0x2d0 [drm_kms_helper]
[  159.046207]  [<ffffffffa019a764>] ? drm_fb_helper_set_par+0x24/0x50 [drm_kms_helper]
[  159.046222]  [<ffffffffa02fee31>] ? intel_fbdev_set_par+0x11/0x60 [i915]
[  159.046224]  [<ffffffffa00be6c8>] ? fb_set_var+0x208/0x3c0 [fb]
[  159.046227]  [<ffffffff8115d2be>] ? ext4_mark_iloc_dirty+0x49e/0x750
[  159.046229]  [<ffffffff8117c851>] ? __ext4_journal_get_write_access+0x31/0x70
[  159.046231]  [<ffffffffa0267cf4>] ? fbcon_blank+0x2b4/0x2f0 [fbcon]
[  159.046233]  [<ffffffff8128e863>] ? do_unblank_screen+0xd3/0x1a0
[  159.046235]  [<ffffffff81285970>] ? vt_ioctl+0x4c0/0x1260
[  159.046237]  [<ffffffff8117ca90>] ? __ext4_handle_dirty_metadata+0x40/0x1c0
[  159.046238]  [<ffffffff8127c0f1>] ? tty_ioctl+0x311/0xbc0
[  159.046240]  [<ffffffff81193723>] ? jbd2_journal_stop+0x143/0x290
[  159.046242]  [<ffffffff81106354>] ? dput+0xb4/0x230
[  159.046244]  [<ffffffff81101cd8>] ? do_vfs_ioctl+0x88/0x590
[  159.046246]  [<ffffffff81055ba5>] ? task_work_run+0x75/0x90
[  159.046247]  [<ffffffff81102216>] ? SyS_ioctl+0x36/0x70
[  159.046249]  [<ffffffff8150d29b>] ? entry_SYSCALL_64_fastpath+0x13/0x8f


See attached dmesg for more info.
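
For anyone comparing their own log against the trace above, here is a minimal sketch that pulls the symbol and module names out of call-trace lines in this format. The regular expression, the helper name parse_frames, and the SAMPLE excerpt (three lines copied from the trace above) are illustrative assumptions, not part of any kernel tooling. Note that a leading '?' marks a frame the kernel could not verify, so such entries may be stale stack contents rather than real callers.

```python
import re

# Matches call-trace lines such as:
# [  159.046194]  [<ffffffffa0136a3b>] ? drm_modeset_lock+0x2b/0xc0 [drm]
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+"                   # dmesg timestamp
    r"\[<([0-9a-f]+)>\]\s+"                 # address found on the stack
    r"(\?\s+)?"                             # '?' marks an unverified frame
    r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"    # symbol+offset/size
    r"(?:\s+\[(\w+)\])?"                    # optional module name
)


def parse_frames(log_text):
    """Return one dict per call-trace frame found in log_text (hypothetical helper)."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "address": m.group(1),
                "symbol": m.group(3),
                "module": m.group(4),          # None when no [module] tag is printed
                "reliable": m.group(2) is None,
            })
    return frames


# Three lines copied from the trace in this report.
SAMPLE = """\
[  159.046194]  [<ffffffffa0136a3b>] ? drm_modeset_lock+0x2b/0xc0 [drm]
[  159.046199]  [<ffffffffa01376a8>] ? drm_atomic_get_connector_state+0x28/0x160 [drm]
[  159.046227]  [<ffffffff8115d2be>] ? ext4_mark_iloc_dirty+0x49e/0x750
"""

frames = parse_frames(SAMPLE)
# Keep only frames from the graphics stack modules named in the trace.
drm_symbols = [f["symbol"] for f in frames
               if f["module"] in {"drm", "drm_kms_helper", "i915"}]
print(drm_symbols)
```

Filtering on the module tag drops frames such as the ext4 entry, which carries no module tag and is unlikely to be part of the modeset path.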
Comment 1 nkalkhof 2016-08-24 15:51:14 UTC
The issue is still present in current drm-intel git.
Comment 2 Maarten Lankhorst 2016-10-06 11:55:20 UTC
I've been able to reproduce it as well, usually when running tests and disconnecting an MST display. You probably have an MST dock or MST display.

This regression is likely introduced by commit 0552f7651bc233e5407ab06ba97a9d7c25e19580 "drm/i915/mst: use reference counted connectors. (v3)"
Comment 3 nkalkhof 2016-10-09 10:06:33 UTC
(In reply to Maarten Lankhorst from comment #2)
> I've been able to reproduce it as well, usually when running tests and
> disconnecting an MST display. You probably have an MST dock or MST display.
> 
> This regression is likely introduced by commit
> 0552f7651bc233e5407ab06ba97a9d7c25e19580 "drm/i915/mst: use reference
> counted connectors. (v3)"


Maarten,

Thanks for confirming this! I have a Lenovo series 3 docking station and an external monitor connected via HDMI.

Could you attach a patch I can try?
Comment 4 Alejandro Lorenzo 2016-10-24 21:21:33 UTC
I think I also hit this one, and I should be able to test patches as well. Just glad to help :D
Comment 5 nkalkhof 2016-10-28 09:26:40 UTC
Still present in current drm-intel-nightly :(
Comment 6 Alejandro Lorenzo 2016-10-28 10:40:34 UTC
Created attachment 127584
Kernel Oop logged at the moment of DisplayPort display disconnection
Comment 7 Alejandro Lorenzo 2016-10-28 10:41:59 UTC
I am not sure whether this is related, but today I was testing 4.8.5 and got a kernel oops when disconnecting the DisplayPort display while at the tty. (Attached as a picture, since I was not logging the console.)
Comment 8 Alejandro Lorenzo 2016-11-29 08:55:47 UTC
Kernel 4.8.11 seems to improve the situation a lot. The computer no longer crashes.

Still, I think something is not quite right in the GPU after the disconnection, because of some odd behavior I saw, but more testing is required.

However, for those of you affected by this, I would give 4.8.11 a spin :D
Comment 9 nkalkhof 2016-12-05 07:45:01 UTC
The issue has vanished with current drm-tip (intel nightly) git. Thanks for fixing this :)

