Bug 93640

Summary: [BAT ILK SNB IVB bisected] Fifo underruns since CI_DRM_952
Product: DRI Reporter: Daniel Vetter <daniel>
Component: DRM/IntelAssignee: Matt Roper <matthew.d.roper>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: highest CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: ILK, IVB, SNB i915 features: display/watermark
Attachments:
Description Flags
dmesg with drm.debug=0xe set before starting the test
none
wm after boot, pipe A underrun from boot-up
none
wm after boot, pipe B underrun due to test none

Description Daniel Vetter 2016-01-08 08:43:33 UTC
Build 951 was somehow fail on the affected snb machine, so unclear where exactly the regression is.

Machine name is snb-dellxps. Affected testecases (from a quick look 100% reproducible):

igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b
igt@kms_pipe_crc_basic@read-crc-pipe-b
igt@kms_flip@basic-plain-flip
igt@kms_flip@basic-flip-vs-wf_vblank
igt@drv_module_reload_basic
	
Assigning to Matt Roper since most likely this is fallout from the wm rework.
Comment 1 Matt Roper 2016-01-09 02:02:44 UTC
Untested, but does the following patch solve the problem?  The immediate bug is that we don't drop locks in the error case.  But I'm unclear why we're hitting the error condition in the first place.  Is there a way to get logs with drm.debug out of the CI system?  Or to submit a patch directly to CI for testing without posting it on the mailing list?


diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 391cc7f..075df0c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15317,7 +15317,7 @@ retry:
 
        state = drm_atomic_helper_duplicate_state(dev, &ctx);
        if (WARN_ON(IS_ERR(state)))
-               return;
+               goto fail;
 
        ret = intel_atomic_check(dev, state);
        if (ret) {
@@ -15345,6 +15345,7 @@ retry:
        }
 
        drm_atomic_state_free(state);
+fail:
        drm_modeset_drop_locks(&ctx);
        drm_modeset_acquire_fini(&ctx);
 }
Comment 2 Daniel Vetter 2016-01-11 10:22:33 UTC
Fixup summary, accidentally typed SKL instead of SNB.
Comment 3 Matt Roper 2016-01-12 19:37:07 UTC
My above patch/comment was actually intended for bug 93666; disregard for this defect since it likely isn't related to underruns.

If I'm reading the CI details correctly, it's not just SNB that gets the underruns, but also ILK and IVB.  However HSW/BDW, which use pretty much the same watermark logic and code, are fine.  I'll take a look to see if I can figure out the source of the problems.  Unfortunately I don't have any of these platforms at the moment so debugging is somewhat challenging.
Comment 4 Daniel Vetter 2016-01-13 10:03:12 UTC
(In reply to Matt Roper from comment #3)
> My above patch/comment was actually intended for bug 93666; disregard for
> this defect since it likely isn't related to underruns.
> 
> If I'm reading the CI details correctly, it's not just SNB that gets the
> underruns, but also ILK and IVB.  However HSW/BDW, which use pretty much the
> same watermark logic and code, are fine.  I'll take a look to see if I can
> figure out the source of the problems.  Unfortunately I don't have any of
> these platforms at the moment so debugging is somewhat challenging.

Yeah, I made the bug report before Tomi restored machines to operability (due to the ww mutex deadlock that killed them).
Comment 5 Daniel Vetter 2016-01-15 16:20:12 UTC
Note: On ivb fifo underrun and crc interrupts are shared, so if a fifo underrun happens that also kills the crc. That explains the failures.

Tried to repro here, somehow couldn't :(
Comment 6 Daniel Vetter 2016-01-15 21:04:18 UTC
Bisected on my snb to

commit 396e33ae204f52abebec9e578f44c749305500f4
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Wed Jan 6 11:34:30 2016 -0800

    drm/i915: Add two-stage ILK-style watermark programming (v10)
Comment 7 Daniel Vetter 2016-01-15 21:29:46 UTC
Created attachment 121068 [details]
dmesg with drm.debug=0xe set before starting the test
Comment 8 Daniel Vetter 2016-01-19 22:03:19 UTC
Interesting new observatin I've just made: I can only repro the fifo underrun once per pipe on my snb: First on boot-up on pipe A, then once more with an igt on pipe B. Reloading the module resets this, and I can repro once more.

I can't really reconcile this with the CI results since I thought CI runtime ordering of testcases isn't that stable ... but assuming it's indeed only the first time we use a pipe: Some bug in the initial setup that breaks the wm code?
Comment 9 Daniel Vetter 2016-01-21 08:58:21 UTC
Created attachment 121170 [details]
wm after boot, pipe A underrun from boot-up
Comment 10 Daniel Vetter 2016-01-21 08:58:42 UTC
Created attachment 121171 [details]
wm after boot, pipe B underrun due to test
Comment 11 Matt Roper 2016-02-25 16:28:51 UTC
This was fixed a while back by revert:
  https://patchwork.freedesktop.org/patch/71064/

A new version of the problematic patch that should solve the issue being reported here has been posted but not yet merged:
  https://patchwork.freedesktop.org/patch/74800/
Comment 12 Jari Tahvanainen 2016-09-30 09:34:19 UTC
https://patchwork.freedesktop.org/patch/74800/ code changes are in drm-intel-nightly as today. Closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.