Summary: | [BAT][CI] igt@* - dmesg-warn|dmesg-fail - *ERROR* Timeout waiting for engines to idle | *ERROR* [CRTC:36:pipe A] flip_done timed out | ||
---|---|---|---|
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | DRM/Intel | Assignee: | Marta Löfstedt <marta.lofstedt> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | highest | CC: | intel-gfx-bugs, jaar, jani.saarinen, ricardo.vega |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | KBL | i915 features: | display/Other |
Bug Depends on: | 103163, 103170 | ||
Bug Blocks: |
Description
Marta Löfstedt
2017-10-09 12:36:35 UTC
Yep, this is not related to LSPCON/link training, but looks like a separate issue.
- Shashank

(In reply to shashank.sharma@intel.com from comment #1)
> Yep, this is not related to LSPCON/link training, but looks like a separate issue.
>
> - Shashank

I still think we should hold off on any further investigation of this until your LSPCON patches have landed.

Shashank was right: CI_DRM_3228 has the LSPCON fixes. So the LSPCON issue seems to be gone, but the issue in this bug remains.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3233/shard-kbl6/igt@gem_pwrite@big-cpu-forwards.html

*** This bug has been marked as a duplicate of bug 103170 ***

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3244/shard-kbl3/dmesg29.log has:

<3>[ 220.135728] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:46:pipe B] flip_done timed out

However, https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3244/shard-kbl3/run29.log shows that the run completed. Hence, the "*ERROR* [CRTC:46:pipe B] flip_done timed out" message can be printed without causing an incomplete. This proves that this bug is NOT a duplicate of bug 103170.

I do, however, agree that many KBL-shard incompletes show one or both of "*ERROR* Timeout waiting for engines to idle" and "*ERROR* [CRTC:36:pipe A] flip_done timed out". Also, since the dmesg-warn symptom obviously happened before the incomplete, I believe it is more logical to deal with the issue in this bug.

Also, in bug 103163 there is a pattern of:

<3>[ 472.866651] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<3>[ 482.784380] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[ 493.024355] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[ 503.264164] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

before the softdog is triggered.
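As a triage aid, the two signature messages of this bug, and the "idle timeout, then flip_done timeouts" ordering reported for bug 103163, can be matched mechanically in a dmesg log. The following is a minimal Python sketch, not part of IGT or the CI tooling; the regexes, `Hit`, `find_signatures` and `idle_timeout_then_flip_done` are hypothetical names, and the patterns only cover the message texts quoted in this bug. It inspects dmesg text only, so it cannot tell whether the softdog fired or the run completed; as the run29.log example shows, that has to be checked separately.

```python
import re
from dataclasses import dataclass

# Hypothetical triage helper; the regexes cover only the messages quoted in this bug.
DMESG_LINE = re.compile(r"^(?:<\d>)?\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")
SIGNATURES = {
    "idle_timeout": re.compile(r"\*ERROR\* Timeout waiting for engines to idle"),
    "flip_done": re.compile(r"\*ERROR\* \[(?:CRTC|PLANE):\d+:[^\]]+\] flip_done timed out"),
}


@dataclass
class Hit:
    timestamp: float  # seconds since boot, as printed by the kernel
    signature: str    # key into SIGNATURES
    message: str      # remainder of the dmesg line


def find_signatures(lines):
    """Return every dmesg line that matches one of the signature messages."""
    hits = []
    for raw in lines:
        m = DMESG_LINE.match(raw.strip())
        if not m:
            continue  # not a "[ seconds] message" line
        for name, pattern in SIGNATURES.items():
            if pattern.search(m.group("msg")):
                hits.append(Hit(float(m.group("ts")), name, m.group("msg")))
    return hits


def idle_timeout_then_flip_done(hits):
    """True if an engine-idle timeout is later followed by a flip_done timeout."""
    idle = [h.timestamp for h in hits if h.signature == "idle_timeout"]
    if not idle:
        return False
    first_idle = min(idle)
    return any(h.signature == "flip_done" and h.timestamp > first_idle for h in hits)
```

Fed the four bug 103163 lines, `find_signatures` returns one `idle_timeout` hit followed by three `flip_done` hits and the ordering check is true; the lone pipe B line from dmesg29.log yields a single `flip_done` hit and the ordering check is false.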
On CI_DRM_3269, BAT machine fi-kbl-7567u, igt@prime_vgem@basic-fence-flip and igt@prime_vgem@basic-fence-mmap:

[ 417.909011] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 420.086987] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3269/fi-kbl-7567u/igt@prime_vgem@basic-fence-flip.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3269/fi-kbl-7567u/igt@prime_vgem@basic-fence-mmap.html

Also on APL shards:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3275/shard-apl1/igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size.html

[ 935.542706] hpet1: lost 7161 rtc interrupts
[ 936.437376] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
[ 936.450968] hpet1: lost 7161 rtc interrupts

*** Bug 103218 has been marked as a duplicate of this bug. ***

Raise priority.

Also filed patchwork runs on fi-bsw-n3050 on this bug:

https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_416/fi-bsw-n3050/igt@gem_flink_basic@bad-open.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6166/fi-bsw-n3050/igt@drv_module_reload@basic-reload.html

(In reply to Marta Löfstedt from comment #14)
> Also, filed patchwork on fi-bsw-n3050
>
> https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_416/fi-bsw-n3050/
> igt@gem_flink_basic@bad-open.html
>
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6166/fi-bsw-n3050/
> igt@drv_module_reload@basic-reload.html
>
> on this bug.
I have unduplicated this to bug 103479.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-kbl7/igt@kms_color@legacy-gamma-reset-pipe0.html

This looks a bit different:

[ 310.895874] i915 0000:00:02.0: Resetting bcs0 after gpu hang
[ 310.896773] i915 0000:00:02.0: Resetting vcs0 after gpu hang
[ 310.897738] i915 0000:00:02.0: Resetting vcs1 after gpu hang
[ 310.898473] i915 0000:00:02.0: Resetting vecs0 after gpu hang
[ 312.881353] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 315.067619] [drm:intel_engines_park [i915]] *ERROR* rcs0 is not idle before parking
[ 315.068296] [drm:intel_engines_park [i915]] *ERROR* bcs0 is not idle before parking
[ 315.069155] [drm:intel_engines_park [i915]] *ERROR* vcs0 is not idle before parking
[ 315.070039] [drm:intel_engines_park [i915]] *ERROR* vecs0 is not idle before parking
[ 326.263182] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-kbl2/igt@gem_userptr_blits@map-fixed-invalidate-overlap.html

This also has "*ERROR* vecs0 is not idle before parking":

[ 276.831015] i915 0000:00:02.0: Resetting chip after gpu hang
[ 279.081273] [drm:intel_engines_park [i915]] *ERROR* rcs0 is not idle before parking
[ 279.081812] [drm:intel_engines_park [i915]] *ERROR* bcs0 is not idle before parking
[ 279.082355] [drm:intel_engines_park [i915]] *ERROR* vcs0 is not idle before parking
[ 279.082846] [drm:intel_engines_park [i915]] *ERROR* vecs0 is not idle before parking
[ 286.883983] i915 0000:00:02.0: Resetting chip after gpu hang
[ 289.172584] [drm:intel_engines_park [i915]] *ERROR* rcs0 is not idle before parking
[ 289.173338] [drm:intel_engines_park [i915]] *ERROR* bcs0 is not idle before parking
[ 289.173842] [drm:intel_engines_park [i915]] *ERROR* vcs1 is not idle before parking
[ 289.174606] [drm:intel_engines_park [i915]] *ERROR* vecs0 is not idle before parking
[ 299.625225] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3305/shard-kbl5/igt@gem_persistent_relocs@forked-faulting-reloc-thrashing.html

[ 904.165570] i915 0000:00:02.0: Resetting vecs0 after gpu hang
[ 906.213198] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 908.399684] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
[ 908.421588] Setting dangerous option prefault_disable - tainting kernel
[ 908.422662] Setting dangerous option prefault_disable - tainting kernel
[ 908.426009] Setting dangerous option prefault_disable - tainting kernel
[ 908.426018] Setting dangerous option prefault_disable - tainting kernel
[ 914.216762] i915 0000:00:02.0: Resetting vcs0 after gpu hang
[ 914.217422] i915 0000:00:02.0: Resetting vecs0 after gpu hang
[ 918.186750] i915 0000:00:02.0: Resetting bcs0 after gpu hang
[ 922.208800] [drm:i915_gem_wait_for_idle [i915]] *ERROR* Failed to idle engines, declaring wedged!
[ 922.220751] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 922.233956] Setting dangerous option prefault_disable - tainting kernel
[ 922.234436] Setting dangerous option prefault_disable - tainting kernel
[ 922.234506] Setting dangerous option prefault_disable - tainting kernel
[ 932.273591] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out

Why are we continuing to test after the GPU has been declared wedged?

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3305/shard-kbl5/igt@gem_wait@write-wait-vebox.html

(gem_wait:2011) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_wait:2011) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest write-wait-vebox failed.
[ 773.090105] i915 0000:00:02.0: Resetting bcs0 after gpu hang
[ 773.090392] i915 0000:00:02.0: Resetting vcs0 after gpu hang
[ 773.090769] i915 0000:00:02.0: Resetting vcs1 after gpu hang
[ 773.091122] i915 0000:00:02.0: Resetting vecs0 after gpu hang
[ 777.124141] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 779.310206] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-kbl7/igt@prime_busy@before-bsd2.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-kbl2/igt@gem_exec_store@basic-all.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3308/shard-kbl5/igt@kms_flip@vblank-vs-dpms-suspend.html

<7>[ 176.769694] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<3>[ 176.972381] [drm:intel_engines_park [i915]] *ERROR* rcs0 is not idle before parking
<7>[ 176.972425] intel_engines_park rcs0
<7>[ 176.972428] intel_engines_park current seqno 1c6cf, last 1c6cf, hangcheck 0 [-123060 ms], inflight 0
<7>[ 176.972430] intel_engines_park Reset count: 1
<7>[ 176.972437] intel_engines_park Requests:
<7>[ 176.972916] intel_engines_park RING_START: 0x0000b000 [0x00000000]
<7>[ 176.972921] intel_engines_park RING_HEAD: 0x00000c10 [0x00000000]
<7>[ 176.972925] intel_engines_park RING_TAIL: 0x00000c10 [0x00000000]
<7>[ 176.972930] intel_engines_park RING_CTL: 0x00003000
<7>[ 176.972935] intel_engines_park RING_MODE: 0x00000200 [idle]
<7>[ 176.972942] intel_engines_park ACTHD: 0x00000000_00000c10
<7>[ 176.972947] intel_engines_park BBADDR: 0x00000000_00000004
<7>[ 176.972952] intel_engines_park Execlist status: 0x00000301 00000000
<7>[ 176.972956] intel_engines_park Execlist CSB read 0 [-1 cached], write 1 [1 from hws], interrupt posted? no
<7>[ 176.972960] intel_engines_park Execlist CSB[1]: 0x00000018 [0x00000018 in hwsp], context: 3 [3 in hwsp]
<7>[ 176.972962] intel_engines_park ELSP[0] count=1,
<7>[ 176.972965] intel_engines_park rq: 1c6cf! [3:10] prio=0 @ 15379ms: kms_flip[2822]/0
<7>[ 176.972968] intel_engines_park ELSP[1] idle
<7>[ 176.972970] intel_engines_park HW active? 0x1
<7>[ 176.972983] intel_engines_park

I have unduplicated fi-cfl-s and the APL shards from this issue. The frequency of this issue on those machines is so much lower that it is not worth the over-broad suppression caused by KBL. Note that the APL-shards bug for this issue is now: https://bugs.freedesktop.org/show_bug.cgi?id=103545

Among the attached logs, I don't see any DP/LSPCON errors or link-training issues. Is there any LSPCON failure sighting here?
- Shashank

(In reply to shashank.sharma@intel.com from comment #24)
> Among the attached logs, I don't see any DP/LSPCON errors or link-training
> issues. Is there any LSPCON failure sighting here?
>
> - Shashank

Shashank, this bug is not about DP/LSPCON issues; see bug 102295 and bug 103558 for those. The issue will hopefully be solved soon with new DMC firmware. There may be issues filed on this bug that are not related to the DMC issue, but my intention is to close this once the new DMC has arrived. The same goes for bug 103163 and bug 103170.
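The hang logs quoted above (for example the CI_DRM_3305 gem_persistent_relocs run) interleave per-engine and full-chip resets, "is not idle before parking" errors, and a "declaring wedged!" point after which more messages, including the flip_done timeout, were still logged. A minimal Python sketch for separating those, assuming dmesg lines in the `[ seconds] message` form shown here; `summarize_hang_log` and its regexes are hypothetical and only match the message texts quoted in this bug.

```python
import re

# Hypothetical helper: summarize GPU-hang recovery messages in a dmesg excerpt.
DMESG_LINE = re.compile(r"^(?:<\d>)?\[\s*(\d+\.\d+)\]\s+(.*)$")
RESET = re.compile(r"Resetting (\w+) after gpu hang")
NOT_IDLE = re.compile(r"\*ERROR\* (\w+) is not idle before parking")
WEDGED = "declaring wedged!"


def summarize_hang_log(lines):
    """Collect reset targets, engines not idle at parking, and what followed wedging."""
    resets, not_idle, after_wedge = [], [], []
    wedged_at = None
    for raw in lines:
        m = DMESG_LINE.match(raw.strip())
        if not m:
            continue  # skip continuation lines and non-dmesg text
        ts, msg = float(m.group(1)), m.group(2)
        if wedged_at is not None:
            after_wedge.append((ts, msg))  # everything logged after the wedge point
        reset = RESET.search(msg)
        if reset:
            resets.append(reset.group(1))  # engine name, or "chip" for a full reset
        parked = NOT_IDLE.search(msg)
        if parked:
            not_idle.append(parked.group(1))
        if wedged_at is None and WEDGED in msg:
            wedged_at = ts  # first wedge message; later lines go to after_wedge
    return {
        "resets": resets,
        "not_idle": not_idle,
        "wedged_at": wedged_at,
        "after_wedge": after_wedge,
    }
```

On the tail of the gem_persistent_relocs log, this reports the wedge at 922.208800 and lists the rcs0 reset and the pipe A flip_done timeout as occurring after it, which is the situation the question above is about.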
Apparently the new DMC hasn't been rolled out yet:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3327/shard-kbl4/igt@kms_flip@absolute-wf_vblank.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3327/shard-kbl4/igt@gem_exec_store@cachelines-bsd.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3327/shard-kbl4/igt@gem_exec_async@concurrent-writes-render.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3327/shard-kbl4/igt@gem_exec_parallel@default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-kbl6/igt@gem_busy@busy-bsd.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-kbl6/igt@gem_gtt_cpu_tlb.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-kbl6/igt@kms_universal_plane@cursor-fb-leak-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-kbl6/igt@kms_addfb_basic@addfb25-bad-modifier.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-kbl6/igt@kms_chv_cursor_fail@pipe-a-64x64-top-edge.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-kbl6/igt@gem_exec_schedule@out-order-vebox.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3335/shard-kbl6/igt@kms_rotation_crc@primary-rotation-90-y-tiled.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3340/shard-kbl3/igt@gem_exec_capture@capture-blt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3340/shard-kbl3/igt@gem_create@create-valid-nonaligned.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3340/shard-kbl3/igt@gem_wait@write-busy-vebox.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3341/shard-kbl5/igt@prime_busy@wait-before-blt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3340/shard-kbl3/igt@drv_suspend@fence-restore-untiled.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3344/shard-kbl5/igt@prime_self_import@basic-with_one_bo_two_files.html

[ 348.611179] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH <0>
  TDT <f>
  next_to_use <f>
  next_to_clean <0>
buffer_info[next_to_clean]:
  time_stamp <fffc10e8>
  next_to_watch <0>
  jiffies <10000be80>
  next_to_watch.status <0>
MAC Status <40000083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[ 349.571421] i915 0000:00:02.0: Resetting chip after gpu hang
[ 350.594120] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3346/shard-kbl3/igt@gem_set_tiling_vs_gtt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3346/shard-kbl3/igt@drm_import_export@prime.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3346/shard-kbl3/igt@gem_ppgtt@flink-and-exit-vma-leak.html

[ 268.828009] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[ 268.882233] [drm:i915_gem_wait_for_idle [i915]] *ERROR* Failed to idle engines, declaring wedged!
[ 279.136624] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
[ 289.381702] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
[ 299.626832] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3352/shard-kbl5/igt@gem_mmap_wc@write.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3349/shard-kbl6/igt@gem_mmap_gtt@basic-wc.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3353/shard-kbl1/igt@prime_busy@wait-after-vebox.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3353/shard-kbl1/igt@gem_exec_params@invalid-bsd2-flag-on-vebox.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3355/shard-kbl4/igt@syncobj_wait@invalid-single-wait-unsubmitted.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3355/shard-kbl4/igt@gem_ctx_bad_exec@default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3355/shard-kbl4/igt@kms_addfb_basic@clobberred-modifier.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3989/shard-kbl6/igt@prime_vgem@fence-wait-vebox.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3989/shard-kbl6/igt@gem_exec_store@cachelines-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3989/shard-kbl6/igt@syncobj_basic@bad-create-flags.html

*** Bug 103039 has been marked as a duplicate of this bug. ***

*** Bug 103049 has been marked as a duplicate of this bug. ***

*** Bug 102586 has been marked as a duplicate of this bug. ***

On KBL we sometimes get the same assertion with igt@gem_exec_suspend@basic-s3:

(gem_exec_suspend:11459) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_suspend:11459) igt-aux-CRITICAL: Failed assertion: !"GPU hung"

Latest commit with the failure:
IGT-Version: 1.20-gf8f6db9 (x86_64) (Linux: 4.14.0-drm-intel-qa-ww47-commit-f710441+ x86_64)

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3370/shard-kbl6/igt@gem_userptr_blits@unsync-overlap.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3997/shard-kbl4/igt@kms_color@legacy-gamma-reset-pipe2.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3997/shard-kbl4/igt@prime_vgem@busy-bsd2.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3997/shard-kbl4/igt@gem_exec_reloc@basic-gtt-cpu-active.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3368/shard-kbl4/igt@vgem_basic@second-client.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4006/shard-kbl2/igt@perf_pmu@invalid-init.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4006/shard-kbl2/igt@perf_pmu@semaphore-wait-vcs0.html

Please note that only CI_DRM_3375 onwards has DMC 1.04. These results are still based on CI_DRM_3373.

Resolving now, as CI_DRM_3375 has DMC 1.04: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3375/commits_short.log
Let's see the results from now on.

CI_DRM_3375 and CI_DRM_3376 show neither "*ERROR* Timeout waiting for engines to idle" nor "*ERROR* [CRTC:36:pipe A] flip_done timed out". I will close this.
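The CI result links collected in this bug all appear to follow the layout `https://intel-gfx-ci.01.org/tree/drm-tip/<build>/<machine>/<test>.html`. When triaging long lists like the ones above, a small script can split such links and count failures per machine (for example, to see how concentrated the hits are on particular KBL shards). This is a minimal Python sketch under that assumed layout, which is inferred only from the links in this bug and not from any documented CI interface; `CiResult`, `parse_ci_url` and `tally_by_machine` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple
from urllib.parse import urlparse


class CiResult(NamedTuple):
    build: str    # e.g. CI_DRM_3338, IGT_3989, Patchwork_6166
    machine: str  # e.g. shard-kbl6, fi-kbl-7567u
    test: str     # e.g. igt@gem_busy@busy-bsd


def parse_ci_url(url: str) -> CiResult:
    """Split a result URL of the assumed form /tree/drm-tip/<build>/<machine>/<test>.html."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 5 or parts[:2] != ["tree", "drm-tip"] or not parts[4].endswith(".html"):
        raise ValueError(f"unexpected CI result URL: {url}")
    return CiResult(parts[2], parts[3], parts[4][: -len(".html")])


def tally_by_machine(urls):
    """Count how many of the linked results landed on each machine."""
    return Counter(parse_ci_url(u).machine for u in urls)
```

Links that do not point at a per-test result page, such as the `dmesg29.log` and `commits_short.log` links, are rejected with `ValueError` rather than being miscounted.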