Bug 102818

Summary: [BSW] image corruption issue after edd849e5448c4f6ddc04a5fa1ac5479176660c27
Product: DRI
Reporter: freedesktop
Component: DRM/Intel
Assignee: JP <jp.guarrera>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: intel-gfx-bugs, jp.guarrera
Version: XOrg git
Keywords: bisected
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard: ReadyForDev
i915 platform: BSW/CHT
i915 features: display/HDMI
Attachments:
xrandr --verbose
git bisect log
single frame from the video exhibiting the issue
dmesg that includes period when image artifacts appeared
Reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d for use with 4.13.3
Steps to confirm that 608b20506941969ea30d8c08dc9ae02bb87dbf7d is BAD
Hold powerwell for vblanks (flags: none)

Description freedesktop 2017-09-17 10:47:30 UTC
A user has reported an intermittent issue since 4.12-rc1 which manifests as a "z" shaped flicker/breakup of the displayed image.

The OS is LibreELEC with latest mainline kernel, running Kodi 18.

Since a picture is worth a thousand words, you can view the issue in two short videos:

Video #1[1]: flicker at 1s, 10s and 12s
Video #2[2]: flicker at 1s and 8s

The user has tested with two different HDMI cables and two different TVs, and the result is the same.

I bisected the kernel with the user (bisect log attached) and the bad commit[3] is:

Merge tag 'drm-misc-next-2017-03-21' of git://anongit.freedesktop.org/git/drm-misc into drm-next

Unfortunately this is a merge commit, but it seems to be in the right area.

The kernel log with "drm.debug=0xe" is attached, as is Xorg.0.log.
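
For reference, the drm.debug value is a bitmask of debug categories. The following is a minimal sketch of decoding it, assuming the low category bits as defined in the kernel's DRM debug macros of this era; the helper name `decode_drm_debug` is hypothetical and the table is not claimed to be complete.

```python
# Low DRM debug category bits (assumed from the kernel's DRM_UT_* definitions;
# higher bits are deliberately omitted from this sketch).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled in a drm.debug bitmask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xE))  # prints ['DRIVER', 'KMS', 'PRIME']
```

Under these assumptions 0xe does not include the vblank category; a mask of 0x2e would add it, at the cost of a much noisier log.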

Unfortunately it hasn't been possible to obtain a vbios dump.

The user has tried on two TVs with two different cables, and the same problem is present with all combinations:

TV #1: Sony KDL 32EX521 SW:PKG3-309-EUA-0104 (1080p, 6 years old)
TV #2: Sony KDL 49X8305C SW:PKG3-473-0107EUB (4K, 1 year old)
Cable #1: Belkin High speed 3m #49749
Cable #2: Startech 1m Highspeed 20276

The PC configuration is:

CPU: Intel Atom x5-E8000 (Braswell) @ 1.04GHz (SolidRun IB8000 SOM[4])
GPU: Intel HD Graphics 400
Driver: intel-vaapi-driver 1.8.3, libva 1.8.3, mesa 17.2

The issue is still present with kernel 4.13.2 and 4.14-rc1.

I'd be happy to create test builds with patches for the user to try out.

Many thanks.

1. http://milhouse.libreelec.tv/other/vsync_issue/IMG_0403.MOV (27MB)
2. http://milhouse.libreelec.tv/other/vsync_issue/IMG_0406.MOV (18MB)
3. https://github.com/torvalds/linux/commit/edd849e5448c4f6ddc04a5fa1ac5479176660c27
4. https://www.solid-run.com/intel-braswell-family/braswell-som-system-on-module/braswell-som-specifications/
Comment 1 freedesktop 2017-09-17 10:48:04 UTC
Created attachment 134287 [details]
Comment 2 freedesktop 2017-09-17 10:48:56 UTC
Created attachment 134288 [details]
Comment 3 freedesktop 2017-09-17 10:49:33 UTC
Created attachment 134289 [details]
xrandr --verbose
Comment 4 freedesktop 2017-09-17 11:08:17 UTC
Created attachment 134290 [details]
git bisect log
Comment 5 Jani Nikula 2017-09-18 08:33:56 UTC
I don't see why git bisect wouldn't bisect into the merge. Care to check the last results again?
Comment 6 freedesktop 2017-09-18 09:22:09 UTC
> I don't see why git bisect wouldn't bisect into the merge.

Sorry Jani, I'm not entirely sure what you mean here - I simply used "git bisect" between 4.11.10 (known good) and 4.12-rc1 (known bad), and based on testing of the resulting kernels git decided that the merge commit was the bad commit.

Can you tell me the best way to bisect the merge commit? I'll give that a go.

My kernel repo is a clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

> Care to check the last results again?

I'll ask the user to check the builds again, there were 14 in total. This may take a few days. Unfortunately I can't reproduce this issue myself.
Comment 7 Jani Nikula 2017-09-21 12:30:29 UTC
Does the dmesg cover the part where the user sees flickering?
Comment 8 Jani Nikula 2017-09-21 12:43:13 UTC
Created attachment 134407 [details]
single frame from the video exhibiting the issue
Comment 9 freedesktop 2017-09-21 13:26:45 UTC
Created attachment 134408 [details]
dmesg that includes period when image artifacts appeared

Quote from user:

"dmesg | pastebinit -> http://sprunge.us/cTFK
& the issues are reproduced ~20 times during 2min !!!"

So it would appear that when these events occur, nothing is being logged...
Comment 10 freedesktop 2017-09-29 08:10:23 UTC
I used git to bisect between v4.12-rc1 (BAD) and v4.11.10 (GOOD).

Unfortunately git identified a merge commit as the BAD commit:

edd849e5448c4f6ddc04a5fa1ac5479176660c27

and would not bisect any further. I decided to "bisect" manually, and having done so the first BAD commit is:

608b20506941969ea30d8c08dc9ae02bb87dbf7d

By reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d from 4.13.3 (see attached patch), the user has confirmed that 4.13.3 is now GOOD, while the normal 4.13.3 (without the revert) is still BAD.

I'll attach the "revert" patch (which is just for test purposes), and also my steps taken to confirm this conclusion (just for completeness)
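
The manual search described in this comment amounts to a binary search over the commits brought in by the merge. The following is a minimal sketch of that idea, assuming a linear list of commits ordered oldest to newest and a tester who can label every build reliably. The commit names and `user_reports_bad` are placeholders, not the real history or the actual steps taken.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, that the last one is bad,
    and that once a commit is bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]


# Hypothetical history: placeholder names, with "c6" introducing the fault.
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
tested = []


def user_reports_bad(commit):
    """Stand-in for building a kernel and asking the user to test it."""
    tested.append(commit)
    return history.index(commit) >= history.index("c6")


print(first_bad(history, user_reports_bad), len(tested))  # prints: c6 3
```

With an intermittent fault such as this one, a single wrong GOOD verdict sends the search down the wrong half, so each build needs to be tested long enough to be confident in the result.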
Comment 11 freedesktop 2017-09-29 08:11:34 UTC
Created attachment 134562 [details] [review]
Reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d for use with 4.13.3
Comment 12 freedesktop 2017-09-29 08:12:11 UTC
Created attachment 134563 [details]
Steps to confirm that 608b20506941969ea30d8c08dc9ae02bb87dbf7d is BAD
Comment 13 JP 2017-10-05 09:19:16 UTC

I'm the originator of this bug. I started my own company, and after some research I decided to base my products on the Intel platform (instead of Amlogic, i.MX, ...).
I am trying to offer end users a complete personal DVB-T/DVB-S streaming solution.
To reach this target, my company is developing the client and server ourselves (at limited cost).
I'm now blocked by the above bug.

May I kindly ask you to review it and let me know if there is a chance of finding a solution?

Otherwise, I'll be forced to switch from Intel to another platform (I know it's technically the best ... but ... project / results first).

Thanks a lot in advance

Project Manager
Comment 14 Elizabeth 2017-10-05 20:24:26 UTC
Reopening since information requested in comment #5 and comment #7 has been provided. 

Good afternoon JP. If your company has an agreement with Intel please use proper internal channels to speed this up, otherwise allow us to review further.
Comment 15 JP 2017-10-06 06:54:28 UTC
Hi Elizabeth
As I wrote, I'm a small company with a limited budget.
Unfortunately I don't have any agreement with Intel.

From now on I can only count on the developers' devotion...

Thanks in advance
Comment 16 Jani Nikula 2017-10-09 07:33:28 UTC
(In reply to freedesktop from comment #10)
> By reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d from 4.13.3 (see
> attached patch), the user has confirmed that 4.13.3 is now GOOD, while the
> normal 4.13.3 (without the revert) is still BAD.

608b20506941 ("drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off)")
Comment 17 Chris Wilson 2017-10-09 08:52:24 UTC
Created attachment 134760 [details] [review]
Hold powerwell for vblanks

Something like this?
Comment 18 freedesktop 2017-10-11 07:35:37 UTC
Hi Chris. Unfortunately the patch hasn't helped - JP has tested a build based on 4.13.5 + your patch from comment #17, and he has the same Z display corruption as before.
Comment 19 JP 2017-10-19 05:50:59 UTC
Any update on the previous topic?
Thanks in advance
Comment 20 Daniel Vetter 2017-11-07 12:20:26 UTC
We have a bunch of bugfixes all over in flight, please retest with latest drm-tip (and quote the full top commit of it, it's a rebasing tree, the sha1 isn't useful). Both with and without Chris' patch.
Comment 21 Daniel Vetter 2017-11-07 12:23:49 UTC
Also note: PSR will break the vblank code, pls make sure you don't have that enabled somewhere in the module options.
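
One way to check the PSR setting on a running system is to read the module parameter from sysfs. The following is a minimal sketch, assuming the parameter is exposed as `/sys/module/i915/parameters/enable_psr`; the helper name `read_module_param` is hypothetical, reading the file may require root, and how to interpret the value depends on the kernel version.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return a module parameter's current value as a string.

    Returns None if the file is missing or cannot be read (for example
    because the module is not loaded or the caller lacks permission).
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


print(read_module_param("i915", "enable_psr"))
```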
Comment 22 freedesktop 2017-11-13 17:41:41 UTC
Hi, apologies for the long delay - a bad case of Man Flu has knocked me out for the past week.

On Sunday I produced three test builds for JP based on drm-tip[1]:

"drm-tip: 2017y-11m-12d-14h-36m-14s UTC integration manifest"

The three builds were:

#1113b: drmtip only
#1113c: #1113b + Chris Wilson patch (comment 17)
#1113d: #1113b + my hack patch (comment 11)

After testing, JP has called all three builds as "GOOD", as none are exhibiting the Z-shaped corruption. Therefore it doesn't look like any additional patches from this bug are required.

However a different build (exact same userland but with drm-tip replaced by the mainline 4.14 kernel released on Sunday, and no patches from this bug) continues to show the Z-shaped corruption, so the corruption issue is still present in the mainline 4.14 kernel.

Maybe this issue will be fixed in 4.15?

1. https://cgit.freedesktop.org/drm-tip/commit/?id=0dc48f1fad834c3ab95f4d178e9e38e6ea39b6cf
Comment 23 freedesktop 2017-11-13 17:42:42 UTC
Oh and there are no PSR (Panel Self Refresh) settings enabled to my knowledge.
Comment 24 JP 2017-12-08 13:08:40 UTC
It appears the bug is solved in 4.15-rc1.
I'll close the bug once the test is also positive for 4.15.0.
Thanks for all.
Comment 25 Jani Saarinen 2018-03-29 07:10:56 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but with this we are trying to understand whether the bug is still valid or not.
If bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you also do that to see if the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 26 Jani Saarinen 2018-04-20 14:26:44 UTC
Closing; please re-open if it still occurs.
