104528 – [SKL] [bisected] [regression] Artifacts when running window manager remotely

Summary: [SKL] [bisected] [regression] Artifacts when running window manager remotely

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:	bisected

Depends on:
Blocks:

Reported:	2018-01-07 16:12 UTC by Mariusz Białończyk
Modified:	2018-10-04 18:18 UTC
CC List:	2 users

See Also:
i915 platform:	SKL
i915 features:	display/Other

Attachments
dmesg (2.36 MB, text/x-log) 2018-03-29 13:20 UTC, Mariusz Białończyk

Description Mariusz Białończyk 2018-01-07 16:12:31 UTC

Hello.
I am running the Enlightenment WM from an LXC container on one of my screens. To start it, I log in via ssh, run 'export DISPLAY=:0', and then run 'enlightenment_start'. Everything worked flawlessly until I upgraded my stock Debian kernel today: I switched from the stable 4.11 kernel to 4.12+ (also tested on 4.13, 4.14, and 4.15.0-rc5).
The problem exists on all kernels above 4.11.
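
For reference, the launch procedure described above can be sketched roughly as follows. This is only an illustration, not the reporter's actual setup: the helper names `build_remote_env` and `start_window_manager` are hypothetical, and the sketch assumes the X server on display :0 accepts connections from the session that runs it.

```python
import os
import subprocess


def build_remote_env(display=":0"):
    """Return a copy of the current environment with DISPLAY set.

    Equivalent in spirit to running `export DISPLAY=:0` in the ssh session.
    """
    env = os.environ.copy()
    env["DISPLAY"] = display
    return env


def start_window_manager(command=("enlightenment_start",), display=":0"):
    """Start the window manager against an already running X server.

    The default command is the one named in the report; the function is
    only defined here, not called, so nothing is launched on import.
    """
    return subprocess.Popen(list(command), env=build_remote_env(display))
```

Calling `start_window_manager()` from the ssh session would then correspond to the two manual steps in the report.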

After running 'enlightenment_start' the screen starts to flicker and I cannot work normally. The flicker occurs mostly when enlightenment redraws screen regions (it has a clock with seconds displayed).
Because a picture is worth a thousand words, I recorded a short video of the artifacts:
http://skyboo.net/temp/drm-bug/VID_20180107_053715.mp4

I spent several hours compiling kernels to track down the problem, and I finally bisected it down to the single commit where the problem first appears:

616d9cee4fdc4a377c03be8fd6efa5df4fcd0d81 is the first bad commit
commit 616d9cee4fdc4a377c03be8fd6efa5df4fcd0d81
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 16 15:05:21 2017 +0100

    drm/i915: First try the previous execbuffer location
    
    When choosing a slot for an execbuffer, we ideally want to use the same
    address as last time (so that we don't have to rebind it) and the same
    address as expected by the user (so that we don't have to fixup any
    relocations pointing to it). If we first try to bind the incoming
    execbuffer->offset from the user, or the currently bound offset that
    should hopefully achieve the goal of avoiding the rebind cost and the
    relocation penalty. However, if the object is not currently bound there
    we don't want to arbitrarily unbind an object in our chosen position and
    so choose to rebind/relocate the incoming object instead. After we
    report the new position back to the user, on the next pass the
    relocations should have settled down.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>

:040000 040000 2cf47689666547b40b9aacee05ee208713f1616b b857932a3e91700e68a9b236e6a5ea475cfa54f6 M	drivers

Distro: debian buster/sid
Screen configuration: separate xorg instance only for integrated intel graphics (two displays: HDMI for kodi, DP for enlightenment)
CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz

Comment 1 Chris Wilson 2018-01-07 16:19:03 UTC

Sadly that is just telling you something else is wrong.

Comment 2 Kenneth C 2018-01-07 18:48:42 UTC

I reported the same bug with the same artifacts and the same kernel commit ... surely this means that the commit has SOMETHING to do with this display corruption, especially since everything works perfectly before it, and also after it when the commit is reverted (for many later kernel versions, incl. Linus' tip)?

Comment 3 Elizabeth 2018-03-28 22:41:17 UTC

Hello, could you attach a full dmesg taken with drm.debug=0x1e, with the corruption captured in the log?

Comment 4 Jani Saarinen 2018-03-29 07:10:50 UTC

First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.

Comment 5 Mariusz Białończyk 2018-03-29 13:19:09 UTC

Hi Elizabeth,
I compiled a fresh 4.16.0-rc7 kernel today.
I am attaching the full dmesg as you requested.

Some notes:
After a clean reboot I started xorg, then exported DISPLAY and ran the display manager. The artifacts started to appear.

Then, after some time, I terminated xorg, and the lines
[drm:gen8_irq_handler] *ERROR* Fault errors on pipe B: 0x00000800
started to flood my log (this is very common when I terminate xorg).

Then I captured the full dmesg for you.
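
To gauge how heavily lines like the one quoted above flood a log, a small script can tally them per pipe and fault mask. This is a minimal sketch, assuming the message format shown in this comment (optionally with a module name such as `[i915]` inside the brackets); the function name `count_pipe_faults` is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "[drm:gen8_irq_handler] *ERROR* Fault errors on pipe B: 0x00000800"
# and the variant "[drm:gen8_irq_handler [i915]] *ERROR* ...".
FAULT_RE = re.compile(
    r"\[drm:(?P<fn>\w+)(?: \[\w+\])?\] \*ERROR\* "
    r"Fault errors on pipe (?P<pipe>\w+): (?P<mask>0x[0-9a-fA-F]+)"
)


def count_pipe_faults(lines):
    """Count fault-error messages, keyed by (pipe, fault mask as int)."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match.group("pipe"), int(match.group("mask"), 16))] += 1
    return counts


sample = [
    "[  101.1] [drm:gen8_irq_handler] *ERROR* Fault errors on pipe B: 0x00000800",
    "[  101.2] [drm:gen8_irq_handler [i915]] *ERROR* Fault errors on pipe B: 0x00000800",
    "[  101.3] some unrelated kernel message",
]
fault_counts = count_pipe_faults(sample)
```

With a saved log, the same function could be fed `open("dmesg.txt")` instead of the sample list (the file name is an assumption).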

Comment 6 Mariusz Białończyk 2018-03-29 13:20:01 UTC

Created attachment 138421
dmesg

Comment 7 Jani Saarinen 2018-04-25 11:03:13 UTC

Chris, any advice here?

Comment 8 Lakshmi 2018-09-11 09:13:39 UTC

Sorry for the delay...

Reporter, do you still have the issue?

Please try to reproduce the issue using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot.
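
As a side note on the requested drm.debug=0x1e value: the parameter is a bitmask of debug categories. The sketch below decodes it, assuming the category bit values as they appear in the kernel's DRM headers around this time (CORE 0x01 through LEASE 0x80); newer kernels may define additional bits, so check the headers of the kernel actually in use.

```python
# Assumed drm.debug category bits; verify against the running kernel's headers.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled in the mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


enabled = decode_drm_debug(0x1E)
```

Under these assumptions, 0x1e enables the DRIVER, KMS, PRIME, and ATOMIC categories and leaves the noisier CORE and VBL messages off.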

Comment 9 Mariusz Białończyk 2018-09-11 20:14:54 UTC

Frankly, I was skeptical about retesting it again...

Fortunately: I DON'T HAVE THIS ISSUE ANYMORE ON LATEST DRM SOURCES (vanilla kernel)! :)
I don't need to patch the i915_gem_execbuffer.c file anymore - it is working out of the box without any patch applied.

I don't know which commit fixed this problem, but it doesn't matter for me, as it seems it is now back working properly. I am closing this bug.

Thank you all involved!

Comment 10 Lakshmi 2018-10-04 18:18:38 UTC

(In reply to Mariusz Białończyk from comment #9)
> Frankly, I was skeptical about retesting it again...
> 
> Fortunately: I DON'T HAVE THIS ISSUE ANYMORE ON LATEST DRM SOURCES (vanilla
> kernel)! :)
> I don't need to patch the i915_gem_execbuffer.c file anymore - it is working
> out of the box without any patch applied.
> 
> I don't know which commit fixed this problem, but it doesn't matter for me,
> as it seems it is now back working properly. I am closing this bug.
> 
> Thank you all involved!

Thanks for the feedback. Closing this bug.
