Bug 93640

Summary:

[BAT ILK SNB IVB bisected] Fifo underruns since CI_DRM_952

Product:

DRI

Reporter:

Daniel Vetter <daniel>

Component:

DRM/Intel

Assignee:

Matt Roper <matthew.d.roper>

Status:

CLOSED FIXED

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

highest

CC:

intel-gfx-bugs

Version:

XOrg git

Hardware:

Other

OS:

All

Whiteboard:

i915 platform:

ILK, IVB, SNB

i915 features:

display/watermark

Attachments:

Description	Flags
dmesg with drm.debug=0xe set before starting the test	none
wm after boot, pipe A underrun from boot-up	none
wm after boot, pipe B underrun due to test	none

Description Daniel Vetter 2016-01-08 08:43:33 UTC

Build 951 was somehow fail on the affected snb machine, so unclear where exactly the regression is.

Machine name is snb-dellxps. Affected testecases (from a quick look 100% reproducible):

igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b
igt@kms_pipe_crc_basic@read-crc-pipe-b
igt@kms_flip@basic-plain-flip
igt@kms_flip@basic-flip-vs-wf_vblank
igt@drv_module_reload_basic
	
Assigning to Matt Roper since most likely this is fallout from the wm rework.

Comment 1 Matt Roper 2016-01-09 02:02:44 UTC

Untested, but does the following patch solve the problem?  The immediate bug is that we don't drop locks in the error case.  But I'm unclear why we're hitting the error condition in the first place.  Is there a way to get logs with drm.debug out of the CI system?  Or to submit a patch directly to CI for testing without posting it on the mailing list?


diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 391cc7f..075df0c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15317,7 +15317,7 @@ retry:
 
        state = drm_atomic_helper_duplicate_state(dev, &ctx);
        if (WARN_ON(IS_ERR(state)))
-               return;
+               goto fail;
 
        ret = intel_atomic_check(dev, state);
        if (ret) {
@@ -15345,6 +15345,7 @@ retry:
        }
 
        drm_atomic_state_free(state);
+fail:
        drm_modeset_drop_locks(&ctx);
        drm_modeset_acquire_fini(&ctx);
 }

Comment 2 Daniel Vetter 2016-01-11 10:22:33 UTC

Fixup summary, accidentally typed SKL instead of SNB.

Comment 3 Matt Roper 2016-01-12 19:37:07 UTC

My above patch/comment was actually intended for bug 93666; disregard for this defect since it likely isn't related to underruns.

If I'm reading the CI details correctly, it's not just SNB that gets the underruns, but also ILK and IVB.  However HSW/BDW, which use pretty much the same watermark logic and code, are fine.  I'll take a look to see if I can figure out the source of the problems.  Unfortunately I don't have any of these platforms at the moment so debugging is somewhat challenging.

Comment 4 Daniel Vetter 2016-01-13 10:03:12 UTC

(In reply to Matt Roper from comment #3)
> My above patch/comment was actually intended for bug 93666; disregard for
> this defect since it likely isn't related to underruns.
> 
> If I'm reading the CI details correctly, it's not just SNB that gets the
> underruns, but also ILK and IVB.  However HSW/BDW, which use pretty much the
> same watermark logic and code, are fine.  I'll take a look to see if I can
> figure out the source of the problems.  Unfortunately I don't have any of
> these platforms at the moment so debugging is somewhat challenging.

Yeah, I made the bug report before Tomi restored machines to operability (due to the ww mutex deadlock that killed them).

Comment 5 Daniel Vetter 2016-01-15 16:20:12 UTC

Note: On ivb fifo underrun and crc interrupts are shared, so if a fifo underrun happens that also kills the crc. That explains the failures.

Tried to repro here, somehow couldn't :(

Comment 6 Daniel Vetter 2016-01-15 21:04:18 UTC

Bisected on my snb to

commit 396e33ae204f52abebec9e578f44c749305500f4
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Wed Jan 6 11:34:30 2016 -0800

    drm/i915: Add two-stage ILK-style watermark programming (v10)

Comment 7 Daniel Vetter 2016-01-15 21:29:46 UTC

Created attachment 121068 [details]
dmesg with drm.debug=0xe set before starting the test

Comment 8 Daniel Vetter 2016-01-19 22:03:19 UTC

Interesting new observatin I've just made: I can only repro the fifo underrun once per pipe on my snb: First on boot-up on pipe A, then once more with an igt on pipe B. Reloading the module resets this, and I can repro once more.

I can't really reconcile this with the CI results since I thought CI runtime ordering of testcases isn't that stable ... but assuming it's indeed only the first time we use a pipe: Some bug in the initial setup that breaks the wm code?

Comment 9 Daniel Vetter 2016-01-21 08:58:21 UTC

Created attachment 121170 [details]
wm after boot, pipe A underrun from boot-up

Comment 10 Daniel Vetter 2016-01-21 08:58:42 UTC

Created attachment 121171 [details]
wm after boot, pipe B underrun due to test

Comment 11 Matt Roper 2016-02-25 16:28:51 UTC

This was fixed a while back by revert:
  https://patchwork.freedesktop.org/patch/71064/

A new version of the problematic patch that should solve the issue being reported here has been posted but not yet merged:
  https://patchwork.freedesktop.org/patch/74800/

Comment 12 Jari Tahvanainen 2016-09-30 09:34:19 UTC

https://patchwork.freedesktop.org/patch/74800/ code changes are in drm-intel-nightly as today. Closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.