Bug 107951 - Console screen corruption
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high minor
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Whiteboard: Triaged, ReadyForDev
Reported: 2018-09-16 17:18 UTC by Martin Jørgensen
Modified: 2019-04-08 21:36 UTC (History)
i915 platform: KBL
i915 features: display/Other


Attachments
picture of the corruption after switching to console (3.41 MB, image/jpeg)
2018-09-16 17:19 UTC, Martin Jørgensen
another picture of the corruption after switching to console (3.50 MB, image/jpeg)
2018-09-16 17:20 UTC, Martin Jørgensen
full screen image with corruption (3.27 MB, image/jpeg)
2018-09-16 17:20 UTC, Martin Jørgensen
kernel_log_latest_drm_tip_with_flags_Denis (3.37 MB, text/plain)
2018-09-17 11:43 UTC, Denis
Console corruption (regression) with kernel 4.19.28 (156.81 KB, image/jpeg)
2019-04-08 21:33 UTC, Miguel A. Vallejo

Description Martin Jørgensen 2018-09-16 17:18:44 UTC
I'm running the latest Debian sid/buster on a Dell XPS 9370 laptop (Kaby Lake-R graphics).

Sometimes, when switching to a tty console after running an Xorg session, my laptop screen gets corrupted in the top left corner. The corruption disappears when switching back to the Xorg session or when rebooting the machine.
Comment 1 Martin Jørgensen 2018-09-16 17:19:34 UTC
Created attachment 141586
picture of the corruption after switching to console
Comment 2 Martin Jørgensen 2018-09-16 17:20:13 UTC
Created attachment 141587
another picture of the corruption after switching to console
Comment 3 Martin Jørgensen 2018-09-16 17:20:53 UTC
Created attachment 141588
full screen image with corruption
Comment 4 Lakshmi 2018-09-17 08:19:00 UTC
Martin, can you attach the full dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M?

This will help us in investigating the issue.
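
A minimal sketch of how the two parameters above could be appended to an existing kernel command line without duplicating them. The function name `append_debug_params` and the sample command line `quiet` are illustrative assumptions, not something taken from this report.

```shell
#!/bin/sh
# Debug parameters requested in comment 4.
DEBUG_PARAMS="drm.debug=0x1e log_buf_len=4M"

# Print the given kernel command line with any missing debug parameter appended.
append_debug_params() {
    cmdline="$1"
    for p in $DEBUG_PARAMS; do
        case " $cmdline " in
            *" $p "*) ;;                               # already present, keep as-is
            *) cmdline="${cmdline:+$cmdline }$p" ;;    # append, space-separated
        esac
    done
    printf '%s\n' "$cmdline"
}

# Example with a hypothetical existing command line.
append_debug_params "quiet"    # prints: quiet drm.debug=0x1e log_buf_len=4M
```

On Debian the resulting string would typically go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub` and a reboot; `dmesg > dmesg.txt` then captures the log to attach.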
Comment 5 Lakshmi 2018-09-17 08:20:44 UTC
Can you also verify whether the same issue occurs on the latest drm-tip?
https://cgit.freedesktop.org/drm-tip
Comment 6 Denis 2018-09-17 10:12:47 UTC
Hi, I can confirm the same behavior on my machine (KBL based).
As with Martin, the issue was seen on Debian buster (with the latest kernel from the repo).

I will also try the suggested kernel.
Comment 7 Denis 2018-09-17 11:43:08 UTC
Created attachment 141601
kernel_log_latest_drm_tip_with_flags_Denis

Attaching the requested log. I will also try rolling drm-tip back to around kernel version 4.13, in case this is a regression.
Comment 8 Denis 2018-09-18 11:43:32 UTC
Debian is extremely unfriendly for compiling kernels :( at least for me.

So I decided to take a kernel from the stable branch:

Linux debian 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64 GNU/Linux


I can't reproduce the issue on it, so it would be great to bisect it. I will try to do this on intel-drm... on Ubuntu it was quite easy.
Comment 9 Denis 2018-09-18 16:18:21 UTC
Providing my results here; maybe they will help somebody.
I couldn't build and boot any kernel from drm-intel.
Two kernels built successfully:

>drm-intel-fixes-2017-01-19
4.10-rc3
>b21ebf2fb4cde1618915a97cc773e287ff49173e
4.16-rc2
On both I got stuck at the startup screen ("booting kernel" or similar).

All other kernels between 4.10 and 4.16 failed to build with this error:

Unsupported relocation type: R_X86_64_PLT32 (4)
make[2]: *** [arch/x86/boot/compressed/Makefile:122: arch/x86/boot/compressed/vmlinux.relocs] Error 1
make[2]: *** Waiting for unfinished jobs....
  CC      arch/x86/boot/video-bios.o
make[1]: *** [arch/x86/boot/Makefile:112: arch/x86/boot/compressed/vmlinux] Error 2

From what I found, this issue can be solved by downgrading binutils, but that also requires downgrading gcc/g++, and appropriate versions don't exist in the repo (even the lowest, gcc-6, still requires the newest binutils).

Finally, after trying to install all the needed dependencies manually, I got stuck with:

>sudo apt-get -f install
>sudo: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.26' not found (required by sudo)
>sudo: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.26' not found (required by /usr/lib/sudo/libsudo_util.so.0)

Maybe somebody will have better luck((
Comment 10 Hans de Goede 2018-10-09 14:26:49 UTC
Can someone who is seeing this try to revert 011f22eb545a35f972036bb6a245c95c2e7e15a0 (drm/i915: Do NOT skip the first 4k of stolen memory for pre-allocated buffers v2) ?

That will likely fix this. If that indeed fixes it then we should really only use the Video BIOS / GOP driver framebuffer when taking over the initial mode and, *if it starts within the first 4k*, use a new framebuffer for fbdev emulation instead of inheriting the BIOS / GOP driver framebuffer there.

This will allow us to keep the initial framebuffer for flickerfree boot, while selecting another framebuffer which honors the WaSkipStolenMemoryFirstPage:bdw+
workaround for fbcon, which should fix the fbcon corruption.
Comment 11 vadym 2018-10-11 13:27:50 UTC
(In reply to Hans de Goede from comment #10)
> Can someone who is seeing this try to revert
> 011f22eb545a35f972036bb6a245c95c2e7e15a0 (drm/i915: Do NOT skip the first 4k
> of stolen memory for pre-allocated buffers v2) ?
> 
> That will likely fix this. If that indeed fixes it then we should really
> only use the Video BIOS / GOP driver framebuffer when taking over the
> initial mode and, *if it starts within the first 4k*, use a new framebuffer
> for fbdev emulation instead of inheriting the BIOS / GOP driver framebuffer
> there.
> 
> This will allow us to keep the initial framebuffer for flickerfree boot,
> while selecting another framebuffer which honors the
> WaSkipStolenMemoryFirstPage:bdw+
> workaround for fbcon, which should fix the fbcon corruption.

Hi Hans,

I'm able to reproduce this issue. I've just rebuilt the kernel with commit 011f22eb545a35f972036bb6a245c95c2e7e15a0 reverted, but it didn't help. The issue is still reproducible.
Comment 12 Hans de Goede 2018-10-11 13:32:49 UTC
(In reply to vadym from comment #11)
> > This will allow us to keep the initial framebuffer for flickerfree boot,
> > while selecting another framebuffer which honors the
> > WaSkipStolenMemoryFirstPage:bdw+
> > workaround for fbcon, which should fix the fbcon corruption.
> 
> I'm able to reproduce this issue. I've just rebuild kernel with
> 011f22eb545a35f972036bb6a245c95c2e7e15a0 patch reverted but it didn't help.
> Issue is still reproducible.

Thanks. I'm a bit surprised that reverting that commit does not fix things. Did you rebuild your initrd?  I'm happy that I (I wrote that commit) did not cause this breakage, but I'm a bit surprised.
Comment 13 vadym 2018-10-16 16:21:36 UTC
(In reply to Hans de Goede from comment #12)
> 
> Thanks. I'm a bit surprised that reverting that commit does not fix things.
> Did you rebuild your initrd?  I'm happy that I (I wrote that commit) did not
> cause this breakage, but I'm a bit surprised.

Hi Hans,

Yes, I generated a new initrd for my build. If I'm not mistaken, your patch landed in the 4.18 kernel, but I can reproduce this issue with my default Debian 4.17.0-3-amd64 kernel. I'm also unable to reproduce it on the 4.9 kernel, so I think this issue can be bisected between 4.9 and 4.17.
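
Bisecting a range like this means one kernel build and boot per step, so it helps to know roughly how many steps to expect. A minimal sketch, assuming a roughly linear history (the kernel's is not, so treat the result as an estimate). The function name `bisect_steps` and the commit count 100000 are made up for illustration; the real count would come from `git rev-list --count v4.9..v4.17` in a kernel tree.

```shell
#!/bin/sh
# Estimate how many bisect steps (kernel builds) are needed to single out
# one commit among N candidates: the smallest k with 2^k >= N.
bisect_steps() {
    n="$1"
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))      # each step halves the remaining range
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Hypothetical commit count, for illustration only.
bisect_steps 100000    # prints: 17
```

The bisect itself would be started with `git bisect start`, `git bisect bad v4.17` and `git bisect good v4.9`, marking each built kernel with `git bisect good` or `git bisect bad` after testing the console switch.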
Comment 14 vadym 2018-10-22 13:29:04 UTC
The following patch fixes the issue for me: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/877396/

That patch fixes a similar issue on ChromeOS.

I think this can be marked as a duplicate of Bug 106478.
Comment 15 Mathieu 2018-11-16 07:30:48 UTC
Hello,
In case it helps, I seem to have the same problem: screen corruption in the top left corner that looks the same.
The screen looks fine at boot (console); then Kodi starts and the corruption appears after a few seconds. It remains visible even if I go back to the console.

The problems started when I upgraded the box from Ubuntu 18.04 LTS to 18.10 (kernel version is Linux server 4.18.0-10-generic #11-Ubuntu).

Hardware is an Intel(R) Core(TM) i3-7100 using the internal GPU, connected to TV with HDMI cable.

Please tell me if you need traces or more details.

regards
Comment 16 Denis 2018-12-12 10:57:27 UTC
Hey. I found that https://bugs.freedesktop.org/show_bug.cgi?id=108257 was closed as fixed. I think the current ticket is the same issue and should also be closed. Reporter, did you try drm-tip, and how does it work for you?
Comment 17 Martin Jørgensen 2018-12-12 16:19:59 UTC
(In reply to Denis from comment #16)
> hey. As I found out, https://bugs.freedesktop.org/show_bug.cgi?id=108257
> this ticket was closed as fixed. I think that current one is the same and
> also should be closed. Reporter, did you try to check drm-tip, how it works
> for you?

I'm now running the latest Debian testing kernel (4.18.0-3-amd64), and I still see the same corruption issue. On my machine it seems to be triggered after running a game, such as a fullscreen OpenGL game.

I have not tried the drm-tip repo yet. I'll look into how to build and run it.
Comment 18 Martin Jørgensen 2018-12-12 21:29:34 UTC
Well, something is broken. I get a GRUB "error: Out of memory" when loading the 4.20-rc5 kernel built from the drm-tip repo ...
Comment 19 Denis 2018-12-13 11:05:36 UTC
Hm, this definitely isn't related to the current issue, but it still blocks checking :(
By the way, I recall problems with building on Debian using the defconfig command.
Could you try building with "make oldconfig" instead of "make defconfig"?
It should take your current working kernel config for the new kernel.
Comment 20 Denis 2018-12-13 11:06:44 UTC
Update: keep in mind that building with your own config will take longer than with "defconfig" (about 2-4 hours depending on your PC).
Comment 21 Denis 2018-12-13 13:43:24 UTC
Oh, and one last thing: the user in the related ticket mentioned exactly this commit
https://cgit.freedesktop.org/drm-tip/commit/?id=2f99c4889e4124f9cf50b745d037f432318c4bb4
as working for him.

So if you didn't take exactly that one and just built "latest", maybe it is worth building exactly this one.
Comment 22 Martin Jørgensen 2018-12-14 06:22:16 UTC
(In reply to Denis from comment #19)
> hm, this definitely doesn't relate to current issue, but still - blocks from
> checking :(
> btw - I recollecting problems with building on debian using defconfig
> command.
> Could you try build using command "make oldconfig" instead "make defconfig"?
> It should take your current "workable" kernel config for a new kernel.

Getting the latest 4.20-rc6 sources, running "make oldconfig" and answering "n" to all questions produces an installable kernel that gives the same "out of memory" error as before on boot.

I'll try cloning drm-tip, checking out commit "2f99c4889e4124f9cf50b745d037f432318c4bb4" and building that instead.
Comment 23 Martin Jørgensen 2018-12-14 06:47:33 UTC
(In reply to Denis from comment #21)
> oh and the last thing - user in the related ticket mentioned exactly this
> commit
> https://cgit.freedesktop.org/drm-tip/commit/
> ?id=2f99c4889e4124f9cf50b745d037f432318c4bb4
> as workable for him. 
> 
> So if you didn't take exactly it and just built "latest" - maybe it worse it
> to build exactly this one.

hmm, does it exist?

~/dev $ git clone git://anongit.freedesktop.org/drm-tip
Cloning into 'drm-tip'...
remote: Counting objects: 6396733, done.
remote: Compressing objects: 100% (957256/957256), done.
remote: Total 6396733 (delta 5394085), reused 6396675 (delta 5394047)
Receiving objects: 100% (6396733/6396733), 1.16 GiB | 3.95 MiB/s, done.
Resolving deltas: 100% (5394085/5394085), done.
Checking out files: 100% (62551/62551), done.
~/dev $ cd drm-tip/
~/dev/drm-tip $ git checkout 2f99c4889e4124f9cf50b745d037f432318c4bb4
fatal: reference is not a tree: 2f99c4889e4124f9cf50b745d037f432318c4bb4
Comment 24 Denis 2018-12-14 10:02:29 UTC
Hm, I am not very familiar with the kernel fix process, but I think that if that patch changed the "UTC integration manifest", it is something general, which means that from this date ("2018y-11m-30d-21h-47m-58s") onward it still includes the fixes, so "git checkout master" should be OK.

One last thing I forgot to mention when I wrote about "defconfig": you should take your current config and apply it during the build (according to your steps, you answered all the options manually, and disabling everything is certainly not the right way).

1. If I am not mistaken, your current config should be here:
/boot/config-X.X.X (where X.X.X is your current stable kernel version).
2. After typing "make menuconfig", find "Load configuration file" or similar in the interface that opens, and select your old config.
3. Save these changes (note that the config file should be renamed from config-X.X.X to .config).
4. Continue the setup (it shouldn't ask you about enabling or disabling anything; it should take everything from your old .config file).

If this does not help, I will try to install Debian later and compile the kernel as well, because I also reproduced this issue... but it won't be fast :(
Comment 25 Denis 2018-12-14 10:04:37 UTC
Update on "Save these changes (note that config file should be renamed from config-X.X.X to .config)":

To be safe, don't do these operations on the original config file. Copy it somewhere first (it may be obvious, I know... but it should still be mentioned :) ).
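
A minimal sketch of the copy step described above, done non-interactively so that the original file under /boot is only read. The function name `seed_config` is an illustrative assumption, and `make olddefconfig` (which accepts the defaults for any new options without prompting) is not mentioned in this thread; it is offered as one possible way to avoid answering the questions by hand.

```shell
#!/bin/sh
# Copy a distribution kernel config into a kernel source tree as ".config".
# The original file is only read, never modified.
seed_config() {
    src="$1"     # e.g. /boot/config-X.X.X
    tree="$2"    # kernel source directory
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    [ -d "$tree" ] || { echo "no such directory: $tree" >&2; return 1; }
    cp -- "$src" "$tree/.config"
}

# Intended use (not executed here):
#   seed_config "/boot/config-$(uname -r)" "$HOME/dev/drm-tip"
#   make -C "$HOME/dev/drm-tip" olddefconfig
```

With "make oldconfig" instead, the build would prompt for every option that is new relative to the copied config.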
Comment 26 Martin Jørgensen 2018-12-15 15:47:38 UTC
I searched through the .config and found that it was set to build with debug symbols, which made the initrd image unreasonably big and caused the "out of memory" error.
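
The comment does not say which option was involved. Assuming it was `CONFIG_DEBUG_INFO`, a quick check of a config file could look like the sketch below; the function name `has_debug_info` is an illustrative assumption.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given kernel config enables CONFIG_DEBUG_INFO.
has_debug_info() {
    grep -q '^CONFIG_DEBUG_INFO=y$' "$1"
}

# Intended use inside a kernel tree (not executed here):
#   has_debug_info .config && echo "debug symbols are enabled"
```

If the check succeeds, running `scripts/config --disable DEBUG_INFO` in the kernel tree followed by `make olddefconfig` should turn the option off. Alternatively, `make INSTALL_MOD_STRIP=1 modules_install` strips the installed modules, which should also keep the initrd small.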
Comment 27 Denis 2018-12-18 10:34:35 UTC
So? :) Was your attempt successful?
Comment 28 Martin Jørgensen 2018-12-18 15:46:36 UTC
Yes, I managed to build 4.20-rc6 from drm-tip.

I've tried to reproduce the screen corruption with 4.20-rc6 but haven't been able to so far.
Comment 29 Denis 2018-12-18 16:11:13 UTC
Thanks a lot! That confirms that the fix landed within those patches. Closing the ticket. (Series with the fix: https://patchwork.freedesktop.org/series/51878/)
Please reopen if you get new information or reproduce it again.
Comment 30 Miguel A. Vallejo 2019-04-08 21:31:47 UTC
Hello there.

The bug has returned in kernel 4.19.28:

$ uname -a

Linux waterhole 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 GNU/Linux

See attached image.
Comment 31 Miguel A. Vallejo 2019-04-08 21:33:28 UTC
Created attachment 143903
Console corruption (regression) with kernel 4.19.28
Comment 32 Miguel A. Vallejo 2019-04-08 21:36:40 UTC
I'm sorry guys, I just realized I booted the wrong kernel. Sorry for the inconvenience.

