Summary: | [CFL] black screen with DP MST | ||
---|---|---|---|
Product: | DRI | Reporter: | Delete This Account <joerg425> |
Component: | DRM/Intel | Assignee: | Jose Roberto de Souza <jose.souza> |
Status: | CLOSED WORKSFORME | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | high | CC: | arequipeno, frederik.schwan, h.becker, intel-gfx-bugs, joerg425, pmenzel+bugs.freedesktop.org, samuel |
Version: | DRI git | Keywords: | bisected, regression |
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | CFL | i915 features: | display/DP MST |
Attachments: |
Description
Delete This Account
2017-12-07 08:54:13 UTC
Created attachment 136022 [details]
dmesg on 4.14.3
Created attachment 136023 [details]
Xorg.0.log on 4.14.3
Created attachment 136024 [details]
corruption on URXVT (1)
Created attachment 136025 [details]
corruption on URXVT (2)
Created attachment 136026 [details]
corruption on URXVT (3)
Created attachment 136027 [details]
corruption on Termite (there's a terminal open on the right-hand side)
Created attachment 136028 [details]
corruption on Termite (background and foreground colours wrong)
Created attachment 136029 [details]
corruption on Termite (after forcing an update by highlighting a part)
There's also corruption in roughly every other application (parts of the screen that are not redrawn, corrupted menus). There's no corruption on Wayland or with TearFree, so these seem to be two separate issues.

There's still a black screen directly after boot with 4.14.5.

Still a black screen with 4.15-rc3 (just tested).

Created attachment 136137 [details]
dmesg with 4.15-rc3 and drm.debug=0x1e
I logged in blind at the black screen and saved the dmesg output.
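Most logs in this report are captured with drm.debug=0x1e, and a later comment asks for drm.debug=14. The parameter is a bitmask of debug message categories. The sketch below decodes such a value, assuming the category bits of roughly the v4.14 era (DRM_UT_CORE = 0x01 through DRM_UT_STATE = 0x40); newer kernels define additional bits. decode_drm_debug() is a hypothetical helper for illustration, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* drm.debug category bits (DRM_UT_*), as assumed for kernels around v4.14.
 * Later kernels add further bits that this table does not know about. */
static const struct {
    unsigned int bit;
    const char *name;
} drm_debug_categories[] = {
    { 0x01, "CORE" },   /* DRM_UT_CORE: generic DRM core code */
    { 0x02, "DRIVER" }, /* DRM_UT_DRIVER: driver-specific code (e.g. i915) */
    { 0x04, "KMS" },    /* DRM_UT_KMS: modesetting code */
    { 0x08, "PRIME" },  /* DRM_UT_PRIME: buffer sharing code */
    { 0x10, "ATOMIC" }, /* DRM_UT_ATOMIC: atomic modesetting code */
    { 0x20, "VBL" },    /* DRM_UT_VBL: verbose vblank messages */
    { 0x40, "STATE" },  /* DRM_UT_STATE: verbose atomic state dumps */
};

/* Hypothetical helper: writes the space-separated names of the categories
 * enabled in `mask` into `out` (always NUL-terminated, truncated if short). */
static void decode_drm_debug(unsigned int mask, char *out, size_t len)
{
    size_t n = sizeof(drm_debug_categories) / sizeof(drm_debug_categories[0]);

    assert(out != NULL && len > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(mask & drm_debug_categories[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", len - strlen(out) - 1);
        strncat(out, drm_debug_categories[i].name, len - strlen(out) - 1);
    }
}
```

For example, 0x1e enables DRIVER, KMS, PRIME and ATOMIC, while 14 (0x0e) enables DRIVER, KMS and PRIME.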
Hello Jörg,

Is there any chance that you could bisect the commit that causes this issue? Thank you.

Hi Elizabeth, if it's not too difficult, I can do it, yes.

It seems doable. I'll read up a bit on it and report back with the bisection result.

7042c2b9a19e74972f5783ab3b7ef98ebdee293f is the first bad commit
commit 7042c2b9a19e74972f5783ab3b7ef98ebdee293f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Nov 25 19:41:55 2017 +0000

    drm/i915/fbdev: Serialise early hotplug events with async fbdev config

    commit a45b30a6c5db631e2ba680304bd5edd0cd1f9643 upstream.

    As both the hotplug event and fbdev configuration run asynchronously, it is possible for them to run concurrently. If configuration fails, we were freeing the fbdev causing a use-after-free in the hotplug event.

    <7>[ 3069.935211] [drm:intel_fb_initial_config [i915]] Not using firmware configuration
    <7>[ 3069.935225] [drm:drm_setup_crtcs] looking for cmdline mode on connector 77
    <7>[ 3069.935229] [drm:drm_setup_crtcs] looking for preferred mode on connector 77 0
    <7>[ 3069.935233] [drm:drm_setup_crtcs] found mode 3200x1800
    <7>[ 3069.935236] [drm:drm_setup_crtcs] picking CRTCs for 8192x8192 config
    <7>[ 3069.935253] [drm:drm_setup_crtcs] desired mode 3200x1800 set on crtc 43 (0,0)
    <7>[ 3069.935323] [drm:intelfb_create [i915]] no BIOS fb, allocating a new one
    <4>[ 3069.967737] general protection fault: 0000 [#1] PREEMPT SMP
    <0>[ 3069.977453] ---------------------------------
    <4>[ 3069.977457] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_in
    <4>[ 3069.977492] CPU: 1 PID: 15414 Comm: kworker/1:0 Tainted: G U 4.14.0-CI-CI_DRM_3388+ #1
    <4>[ 3069.977497] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
    <4>[ 3069.977508] Workqueue: events output_poll_execute
    <4>[ 3069.977512] task: ffff880177734e40 task.stack: ffffc90001fe4000
    <4>[ 3069.977519] RIP: 0010:__lock_acquire+0x109/0x1b60
    <4>[ 3069.977523] RSP: 0018:ffffc90001fe7bb0 EFLAGS: 00010002
    <4>[ 3069.977526] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000282 RCX: 0000000000000000
    <4>[ 3069.977530] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880170d4efd0
    <4>[ 3069.977534] RBP: ffffc90001fe7c70 R08: 0000000000000001 R09: 0000000000000000
    <4>[ 3069.977538] R10: 0000000000000000 R11: ffffffff81899609 R12: ffff880170d4efd0
    <4>[ 3069.977542] R13: ffff880177734e40 R14: 0000000000000001 R15: 0000000000000000
    <4>[ 3069.977547] FS: 0000000000000000(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
    <4>[ 3069.977551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    <4>[ 3069.977555] CR2: 00007f7e8b7bcf04 CR3: 0000000003e0f000 CR4: 00000000003406e0
    <4>[ 3069.977559] Call Trace:
    <4>[ 3069.977565] ? mark_held_locks+0x64/0x90
    <4>[ 3069.977571] ? _raw_spin_unlock_irq+0x24/0x50
    <4>[ 3069.977575] ? _raw_spin_unlock_irq+0x24/0x50
    <4>[ 3069.977579] ? trace_hardirqs_on_caller+0xde/0x1c0
    <4>[ 3069.977583] ? _raw_spin_unlock_irq+0x2f/0x50
    <4>[ 3069.977588] ? finish_task_switch+0xa5/0x210
    <4>[ 3069.977592] ? lock_acquire+0xaf/0x200
    <4>[ 3069.977596] lock_acquire+0xaf/0x200
    <4>[ 3069.977600] ? __mutex_lock+0x5e9/0x9b0
    <4>[ 3069.977604] _raw_spin_lock+0x2a/0x40
    <4>[ 3069.977608] ? __mutex_lock+0x5e9/0x9b0
    <4>[ 3069.977612] __mutex_lock+0x5e9/0x9b0
    <4>[ 3069.977616] ? drm_fb_helper_hotplug_event.part.19+0x16/0xa0
    <4>[ 3069.977621] ? drm_fb_helper_hotplug_event.part.19+0x16/0xa0
    <4>[ 3069.977625] drm_fb_helper_hotplug_event.part.19+0x16/0xa0
    <4>[ 3069.977630] output_poll_execute+0x8d/0x180
    <4>[ 3069.977635] process_one_work+0x22e/0x660
    <4>[ 3069.977640] worker_thread+0x48/0x3a0
    <4>[ 3069.977644] ? _raw_spin_unlock_irqrestore+0x4c/0x60
    <4>[ 3069.977649] kthread+0x102/0x140
    <4>[ 3069.977653] ? process_one_work+0x660/0x660
    <4>[ 3069.977657] ? kthread_create_on_node+0x40/0x40
    <4>[ 3069.977662] ret_from_fork+0x27/0x40
    <4>[ 3069.977666] Code: 8d 62 f8 c3 49 81 3c 24 e0 fa 3c 82 41 be 00 00 00 00 45 0f 45 f0 83 fe 01 77 86 89 f0 49 8b 44 c4 08 48 85 c0 0f 84 76 ff ff ff <f0> ff 80 38 01 00 00 8b 1d 62 f9 e8 01 45 8b 85 b8
    <1>[ 3069.977707] RIP: __lock_acquire+0x109/0x1b60 RSP: ffffc90001fe7bb0
    <4>[ 3069.977712] ---[ end trace 4ad012eb3af62df7 ]---

    In order to keep the dev_priv->ifbdev alive after failure, we have to avoid the free and leave it empty until we unload the module (which is less than ideal, but a necessary evil for simplicity). Then we can use intel_fbdev_sync() to serialise the hotplug event with the configuration.

    The serialisation between the two was removed in commit 934458c2c95d ("Revert "drm/i915: Fix races on fbdev""), but the use after free is much older, commit 366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails")

    Fixes: 366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails")
    Fixes: 934458c2c95d ("Revert "drm/i915: Fix races on fbdev"")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Lukas Wunner <lukas@wunner.de>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Lukas Wunner <lukas@wunner.de>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171125194155.355-1-chris@chris-wilson.co.uk
    (cherry picked from commit ad88d7fc6c032ddfb32b8d496a070ab71de3a64f)
    Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 3ac2fa96e69c677b86f1970f1e139fe4115c6894 4c737e5fed0e1d42aa2d6bcce49907a711a9d0c1 M drivers

(In reply to jrg2718 from comment #16)
> 7042c2b9a19e74972f5783ab3b7ef98ebdee293f is the first bad commit
> commit 7042c2b9a19e74972f5783ab3b7ef98ebdee293f
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Sat Nov 25 19:41:55 2017 +0000

Could you take a look at this, Chris?

(In reply to jrg2718 from comment #16)
> 7042c2b9a19e74972f5783ab3b7ef98ebdee293f is the first bad commit
> commit 7042c2b9a19e74972f5783ab3b7ef98ebdee293f
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Sat Nov 25 19:41:55 2017 +0000
>
> drm/i915/fbdev: Serialise early hotplug events with async fbdev config

First bad commit for what? The blank screen during boot is expected behaviour for your setup.

It's expected that the screen stays black and I can't use the computer? Maybe I didn't understand you correctly. How exactly is it expected behaviour that a fairly standard setup, with several configs (including the default config), boots with a working screen on 4.14.3 but boots to a black screen on 4.14.4 (bisected to the cited patch)?

I just tested with 4.14.8, and the screen still stays black.

Hi there,

1. If I understood correctly, this is a desktop with only one DP monitor plugged in, right?
2. Could you please try the latest drm-tip branch from https://cgit.freedesktop.org/drm-tip ? We have recently merged a few fixes for CFL...
3. Also, regarding the bisect, have you tried to git revert this patch to confirm that this commit really caused the issue?
4. Could you please post the logs for 4.13, or whatever the latest working version is, with drm.debug=0x1e? (a working one for comparison)
5. Do you have any kind of xorg.conf that forces the intel X driver? I wonder why this didn't fall back to the modesetting driver.

Thanks,
Rodrigo.

1. Yes, one monitor connected with DP. There was an Nvidia card in there that is supposed to be used later for virtualized compute, but for debugging purposes, I took it out completely.
2. I'll try drm-tip later and post an update.
3. On the bisected git tree, I did "git bisect reset bisect/bad && git checkout HEAD~1". With this code, the screen comes up again.
4. I'll post the dmesg from the working build of 4.14.3+.
5. The machine boots to a login getty. There's no custom xorg.conf for X.

Created attachment 136353 [details]
dmesg with drm.debug=0x1e on good 4.14.3+
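The bisected commit's message describes two changes: a failed asynchronous fbdev configuration no longer frees the fbdev state (it is left empty until module unload), and the hotplug path calls intel_fbdev_sync() to wait for that configuration before going further. The pthread sketch below models only that ordering. struct fake_fbdev, fake_fbdev_sync() and the other names are hypothetical user-space stand-ins, not i915 code, and a single hotplug caller is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the driver's fbdev state. */
struct fake_fbdev {
    pthread_t config_thread; /* models the asynchronously scheduled config */
    bool config_pending;     /* true until the config thread has been joined */
    bool connected;          /* input: was any output usable at config time? */
    bool configured;         /* result of the fake initial configuration */
};

/* Fake initial config. On failure the struct is NOT freed; it is simply
 * left unconfigured, mirroring "leave it empty until we unload". */
static void *fake_initial_config(void *arg)
{
    struct fake_fbdev *f = arg;
    f->configured = f->connected;
    return NULL;
}

/* Kicks off the configuration on its own thread, like an async work item. */
static void fake_fbdev_start(struct fake_fbdev *f, bool connected)
{
    f->connected = connected;
    f->configured = false;
    f->config_pending = true;
    int rc = pthread_create(&f->config_thread, NULL, fake_initial_config, f);
    assert(rc == 0);
    (void)rc;
}

/* Analogue of intel_fbdev_sync(): block until the async config finished.
 * Only one hotplug caller is assumed, so config_pending needs no lock. */
static void fake_fbdev_sync(struct fake_fbdev *f)
{
    if (f->config_pending) {
        int rc = pthread_join(f->config_thread, NULL);
        assert(rc == 0);
        (void)rc;
        f->config_pending = false;
    }
}

/* Hotplug handler: serialise first, so it never runs concurrently with the
 * configuration. Returns whether the fbdev ended up configured. */
static bool fake_hotplug_event(struct fake_fbdev *f)
{
    fake_fbdev_sync(f);
    return f->configured;
}
```

Because pthread_join() establishes the ordering, the hotplug handler always sees the finished result of the configuration and never a half-written or freed object; whether it then acts on an unconfigured fbdev is a separate policy question.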
Created attachment 136354 [details]
dmesg with drm.debug=0x1e on bad 4.14.3+
I just double-checked that 7042c2b9a19e74972f5783ab3b7ef98ebdee293f leads to the black screen. I logged in blind again to capture the dmesg; with this commit as the only difference, the two outputs may be easier to compare.
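git bisect finds the first bad commit by binary search between a known-good and a known-bad revision, so the number of test builds grows roughly with log2 of the number of commits in the range. The simplified model below assumes a linear history with a single good-to-bad transition (real git bisect also handles merges, skipped commits and more); bisect_first_bad() and bad_from_threshold() are illustrative names, not part of git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Predicate: "does the build at this index show the bug?" */
typedef bool (*is_bad_fn)(size_t index, void *ctx);

/* Linear-history model: index 0 is known good, index n-1 is known bad.
 * Returns the index of the first bad commit; *steps counts test builds. */
static size_t bisect_first_bad(size_t n, is_bad_fn is_bad, void *ctx,
                               unsigned *steps)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1; /* invariant: good is good, bad is bad */
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*steps;
        if (is_bad(mid, ctx))
            bad = mid;  /* regression is at mid or earlier */
        else
            good = mid; /* regression is after mid */
    }
    return bad;
}

/* Example predicate: every commit from index *ctx onwards is "bad". */
static bool bad_from_threshold(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

With 100 commits in range and the regression introduced at index 37, this model needs at most 7 test builds. Confirming the result afterwards (building the culprit and its parent, as done here) guards against a flaky test having steered the search.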
Created attachment 136355 [details]
dmesg with drm.debug=0x1e on drm-tip
I tested with drm-tip (from git://anongit.freedesktop.org/drm-tip), and the screen still stays black
I just tested with 4.14.9, and the issue remains (the screen stays black after booting).

Chris, by "expected behaviour for your setup", do you mean the apparent lack of stolen memory?
[drm:i915_gem_object_create_stolen_for_preallocated [i915]] failed to allocate stolen space
Perhaps this is something that needs to be adjusted in the BIOS?

Comparing the good and bad logs, the only difference I see is that drm:drm_setup_crtcs() is no longer called. That is probably because, after this patch, drm_fb_helper_hotplug_event() is no longer called: the call is now guarded by the newly added "if (ifbdev->vma)" check. Do we really need to sync on this vma for this hotplug event?

jrg,
1. Could you please check whether you have a way to increase the stolen memory?
2. If 1 doesn't solve it, could you please check whether removing the "if (ifbdev->vma)" check before the drm_fb_helper_hotplug_event() call gets things working for you again?

1. I checked the GPU-related UEFI settings. Setting the vague "Share memory" option to any number instead of "Auto" does not resolve the issue.
2. Commenting out the "if (ifbdev->vma)" in intel_fbdev_output_poll_changed() does indeed turn the screen back on and fixes the issue on the machine!

Thanks for checking, and please ignore the patch that I sent yesterday. Chris has provided other ideas. Based on his idea, and looking at the code and history, I have an experiment that I'd like to try. Could you please revert d55159918176 ("drm/i915: Remove references to crtc->active from intel_fbdev.c"), let me know whether that alone solves the issue for you, and paste the dmesg?

Thanks,
Rodrigo

I reverted d551599181769571f4f68dd93e5d8b15868889af ("drm/i915: Remove references to crtc->active from intel_fbdev.c") on current drm-tip, but unfortunately that doesn't solve the issue.

Created attachment 136576 [details]
dmesg with drm.debug=0x1e and d55159918176 reverted on drm-tip
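Rodrigo's reading of the logs, together with the result of commenting out the check, can be summarised as a small state model. This is a sketch under loudly stated assumptions: has_fb stands in for ifbdev->vma being set; deferred_setup assumes that the fb helper retries the initial configuration when a hotplug event arrives after an initial configuration that found no usable output; struct fb_model and all function names are hypothetical, and the real driver code is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fbdev state relevant to this bug. */
struct fb_model {
    bool has_fb;         /* stands in for ifbdev->vma != NULL */
    bool deferred_setup; /* assumed: initial config found nothing to light */
    bool screen_lit;     /* what the user observes */
};

/* Initial configuration: if no output is usable yet, no framebuffer is
 * allocated and (by assumption) setup is deferred to a later hotplug. */
static void initial_config(struct fb_model *m, bool any_connected)
{
    m->has_fb = any_connected;
    m->deferred_setup = !any_connected;
    m->screen_lit = any_connected;
}

/* Stand-in for the fb helper's hotplug handling: retry a deferred setup. */
static void fb_helper_hotplug(struct fb_model *m, bool any_connected)
{
    if (m->deferred_setup)
        initial_config(m, any_connected);
}

/* Driver-side poll_changed hook; `guarded` models the added vma check. */
static void output_poll_changed(struct fb_model *m, bool guarded,
                                bool any_connected)
{
    if (guarded && !m->has_fb)
        return; /* event dropped: the deferred setup is never retried */
    fb_helper_hotplug(m, any_connected);
}
```

With the guard, the hotplug event that arrives once the monitor becomes usable is dropped and the screen stays black; without it, the deferred setup runs and the screen lights up, matching what was observed when the check was commented out.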
I just tested with HDMI instead of DP, and that seems to work well.

> I just tested with HDMI instead of DP, and that seems to work well.

Hmm, that is odd. Can you attach a dmesg with drm.debug=0x1e when using HDMI?
Created attachment 136860 [details]
dmesg with HDMI and drm.debug=0x1e on 4.14.4
The main difference between the working (HDMI) and the non-working (DP) case:

Bad (with DP):
[ 2.378275] [drm:drm_setup_crtcs [drm_kms_helper]] No connectors reported connected with modes
[ 2.378276] [drm:drm_setup_crtcs [drm_kms_helper]] connector 58 enabled? no
[ 2.378278] [drm:drm_setup_crtcs [drm_kms_helper]] connector 63 enabled? no
[ 2.378279] [drm:drm_setup_crtcs [drm_kms_helper]] connector 65 enabled? no
[ 2.378280] [drm:drm_setup_crtcs [drm_kms_helper]] connector 69 enabled? no
[ 2.378300] [drm:intel_fb_initial_config [i915]] Not using firmware configuration

Good (with HDMI):
[ 2.394275] [drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:69:HDMI-A-3] disconnected
[ 2.394288] [drm:drm_setup_crtcs [drm_kms_helper]] connector 58 enabled? yes

I was able to reproduce this here; the issue is related to DP MST. Working on a solution. In the meantime, you can use DP without MST by adding i915.enable_dp_mst=0 to the kernel parameters.

*** Bug 104425 has been marked as a duplicate of this bug. ***

jrg, can you still reproduce the issue on a dinq kernel without the additional kernel command-line parameter that Jose suggested? If yes, can you try reverting the patch below and retry?

commit 669c9215afea4e3684ef13e54e6908e9ae34f0ae
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Mon Sep 4 12:48:38 2017 +0200

    drm/atomic: Make async plane update checks work as intended, v2.

    By always keeping track of the last commit in plane_state, we know whether there is an active update on the plane or not. With that information we can reject the fast update, and force the slowpath to be used as was originally intended.

    We cannot use plane_state->crtc->state here, because this only mentions the most recent commit for the crtc, but not the planes that were part of it. We specifically care about what the last commit involving this plane is, which can only be tracked with a pointer in the plane state.

    Changes since v1:
    - Clean up the whole function here, instead of partially earlier.
    - Add mention in the commit message why we need commit in plane_state.
    - Swap plane->state in intel_legacy_cursor_update, instead of reassigning all variables.

    With this commit We know that the cursor is not part of any active commits so this hack can be removed.

    Cc: Gustavo Padovan <gustavo.padovan@collabora.com>
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.com>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1
    Link: https://patchwork.freedesktop.org/patch/msgid/20170904104838.23822-7-maarten.lankhorst@linux.intel.com
    [mlankhorst: Amend commit for merge conflicts with drm-intel]

I have the same issue. It being MST-related would make sense, since the only non-MST desktop I have is not affected, but two MST desktops are. All are on 4.14.17 (it does not work with 4.15.2 either).

In the past I've worked on a very similar issue: after boot, only the cursor and the clock at the bottom were visible. The root cause I found was that user space was sending commits too fast, and applying the patch below fixed it.

    drm/i915: Always wait for flip_done, v2.

    The next commit removes the wait for flip_done in in drm_atomic_helper_commit_cleanup_done, but we need it for the tests to pass. Instead of using complicated vblank tracking which ends up being ignored anyway, call the correct atomic helper. :)

    Changes since v1:
    - Always call drm_atomic_helper_wait_for_flip_done, even for legacy cursor updates. (danvet)

    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    ---
     drivers/gpu/drm/i915/i915_drv.h      |  3 +-
     drivers/gpu/drm/i915/intel_display.c | 84 +++---------------------------
     2 files changed, 8 insertions(+), 79 deletions(-)

Maybe you want to try it?

First things should be first, not at comment #42. We support Coffeelake starting from v4.15. To put it bluntly, we don't care if it's broken in v4.14. We are not spending time on debugging it on v4.14.
Please update to v4.15 or later, add the drm.debug=14 module parameter, reproduce the issue, and attach the full dmesg from boot.

Everyone with the same or a similar issue: check whether you have a Coffeelake machine. If not, this is not your bug. Please file a new one instead.

Okay, found the drm-tip logs too. :) Please update to the current drm-tip and retry.

jrg2718@gmail.com, does i915.enable_dp_mst=0 work around the issue for you?

(In reply to Jani Nikula from comment #42)
> Everyone with the same or similar issue, check if you have a Coffeelake
> machine. If not, this is not your bug. Please file a new one instead.

As requested, I have re-opened https://bugs.freedesktop.org/show_bug.cgi?id=104425

First of all, sorry about the spam. This is a mass update of our bugs. Sorry if you find this annoying, but we are trying to understand whether each bug is still valid. If the investigation of this bug is still in progress, please ignore this, and I apologize! If you think this bug is no longer valid, please comment so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), please do that as well, and if you cannot see the issue there, please comment on the bug.

If this is still reproducible, please try this: https://bugs.freedesktop.org/show_bug.cgi?id=104425#c18

There is a simpler solution; could you test this? https://patchwork.freedesktop.org/patch/217789/

Is this also fixed by the following commit in drm-tip?

commit df9e6521749ab33cde306e8a4350b0ac7889220a
Author: José Roberto de Souza <jose.souza@intel.com>
Date:   Wed Apr 18 16:41:58 2018 -0700

    drm/i915/fbdev: Enable late fbdev initial configuration

Closing; please re-open if it occurs again.