Bug 110368 - [CI][SHARDS] igt@gem_exec_nop@basic-series - fail - Failed assertion: !"GPU hung"
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Mika Kuoppala
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
 
Reported: 2019-04-09 10:51 UTC by Martin Peres
Modified: 2019-08-13 13:42 UTC
i915 platform: ICL
i915 features: GEM/Other

Description Martin Peres 2019-04-09 10:51:46 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5889/shard-iclb1/igt@gem_exec_nop@basic-series.html

Starting subtest: basic-series
(gem_exec_nop:5054) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:501:
(gem_exec_nop:5054) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-series failed.
Comment 1 CI Bug Log 2019-04-09 10:57:51 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_exec_nop@basic-series - fail - Failed assertion: !"GPU hung"
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5889/shard-iclb1/igt@gem_exec_nop@basic-series.html
Comment 2 Chris Wilson 2019-04-09 10:58:14 UTC
Stuck waiting for a CS interrupt. I don't think this is exactly the same as the garbage we see within the CSB: if we had hit garbage there, I would expect the various asserts to fire. So a CS interrupt not arriving makes more sense?

<7> [1693.074610] hangcheck rcs0
<7> [1693.074625] hangcheck 	Hangcheck d2e90f01:d2e90f01 [6022 ms]
<7> [1693.074633] hangcheck 	Reset count: 5 (global 0)
<7> [1693.074641] hangcheck 	Requests:
<7> [1693.075384] hangcheck 	RING_START: 0x007f2000
<7> [1693.075909] hangcheck 	RING_HEAD:  0x000035f8
<7> [1693.076492] hangcheck 	RING_TAIL:  0x000035f8
<7> [1693.076506] hangcheck 	RING_CTL:   0x00003000
<7> [1693.076521] hangcheck 	RING_MODE:  0x00000200 [idle]
<7> [1693.076533] hangcheck 	RING_IMR: 00000000
<7> [1693.077255] hangcheck 	ACTHD:  0x00000000_024035f8
<7> [1693.079415] hangcheck 	BBADDR: 0x00000000_00000004
<7> [1693.080162] hangcheck 	DMA_FADDR: 0x00000000_00000000
<7> [1693.081461] hangcheck 	IPEIR: 0x00000000
<7> [1693.082112] hangcheck 	IPEHR: 0x00000000
<7> [1693.084094] hangcheck 	Execlist status: 0x00018001 00000000
<7> [1693.085382] hangcheck 	Execlist CSB read 0, write 0 [mmio:0], tasklet queued? no (enabled)
<7> [1693.085504] hangcheck 		ELSP[0] count=1, ring:{start:007f2000, hwsp:fffee200, seqno:000011b8}, rq:  b5f:11b8!  prio=6 @ 8128ms: signaled
<7> [1693.085511] hangcheck 		ELSP[1] idle
<7> [1693.085518] hangcheck 		HW active? 0x1
<7> [1693.085632] hangcheck 		Queue priority hint: 6
<7> [1693.085642] hangcheck 		Q  b5f:11ba  prio=6 @ 8128ms: gem_exec_nop[5054]
<7> [1693.085651] hangcheck 		Q  b5f:11bc  prio=4 @ 8127ms: gem_exec_nop[5054]
<7> [1693.085660] hangcheck 		Q  b5f:11be  prio=4 @ 8127ms: gem_exec_nop[5054]
<7> [1693.085667] hangcheck 		Q  b5f:11c0  prio=4 @ 8127ms: gem_exec_nop[5054]
<7> [1693.085675] hangcheck 		Q  b5f:11c2  prio=4 @ 8127ms: gem_exec_nop[5054]
<7> [1693.085683] hangcheck 		Q  b5f:11c4  prio=4 @ 8127ms: gem_exec_nop[5054]
<7> [1693.085692] hangcheck 		Q  b5f:11c6  prio=4 @ 8127ms: gem_exec_nop[5054]
<7> [1693.085756] hangcheck 		...skipping 82 queued requests...
<7> [1693.085764] hangcheck 		Q  b5f:126c  prio=4 @ 8124ms: gem_exec_nop[5054]
<7> [1693.085997] hangcheck HWSP:
<7> [1693.086004] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [1693.086008] hangcheck *
<7> [1693.086016] hangcheck [0040] 10000018 00000040 10008002 00000040 10000018 00000040 10000001 00000000
<7> [1693.086022] hangcheck [0060] 10000018 00000040 10000001 00000000 00000000 00000000 00000000 00000000
<7> [1693.086028] hangcheck [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [1693.086033] hangcheck *
<7> [1693.086039] hangcheck [00c0] 00000000 00000000 00000000 00000000 d2e90f01 00000000 00000000 00000000
<7> [1693.086045] hangcheck [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [1693.086050] hangcheck *
<7> [1693.086063] hangcheck Idle? no

However, the CSB read and write pointers are both zero, which implies we did consume whatever was most recently signaled.
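
For anyone triaging a similar dump, the two observations above (RING_HEAD equal to RING_TAIL, and the CSB read pointer equal to the write pointer) can be checked mechanically. The following is only a minimal sketch in Python, not part of i915 or IGT: the function name `summarize` and the result keys are hypothetical, and the regular expressions assume the exact line layout seen in this particular log, which may differ between kernel versions.

```python
import re

# Sample lines abridged from the hangcheck dump in this comment.
SAMPLE = """\
<7> [1693.075909] hangcheck  RING_HEAD:  0x000035f8
<7> [1693.076492] hangcheck  RING_TAIL:  0x000035f8
<7> [1693.085382] hangcheck  Execlist CSB read 0, write 0 [mmio:0], tasklet queued? no (enabled)
"""

# Assumed line layouts; adjust if the debug output format differs.
REG_RE = re.compile(r"hangcheck\s+(RING_HEAD|RING_TAIL):\s+0x([0-9a-fA-F]+)")
CSB_RE = re.compile(r"Execlist CSB read (\d+), write (\d+)")


def summarize(log):
    """Extract ring and CSB pointers from a hangcheck dump (hypothetical helper)."""
    regs = {name: int(value, 16) for name, value in REG_RE.findall(log)}
    head = regs.get("RING_HEAD")
    tail = regs.get("RING_TAIL")

    csb = CSB_RE.search(log)
    csb_read = int(csb.group(1)) if csb else None
    csb_write = int(csb.group(2)) if csb else None

    return {
        "ring_head": head,
        "ring_tail": tail,
        # HEAD == TAIL: the ring has no commands left to execute.
        "ring_drained": head is not None and head == tail,
        "csb_read": csb_read,
        "csb_write": csb_write,
        # read == write: every CSB event written so far has been consumed.
        "csb_drained": csb_read is not None and csb_read == csb_write,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

When both `ring_drained` and `csb_drained` are true while requests are still queued, as in this dump, that is consistent with the driver waiting for an interrupt that never arrived rather than with corrupted CSB contents, though the sketch itself cannot prove either.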
Comment 3 Jani Saarinen 2019-04-22 15:43:21 UTC
Seen only once. Can someone comment on the impact for the user?
Comment 4 Jani Saarinen 2019-05-06 08:11:17 UTC
Still seen only once. Is lowering the priority OK?
Comment 5 Francesco Balestrieri 2019-05-13 12:17:18 UTC
Yes, moving to medium.
Comment 6 Francesco Balestrieri 2019-06-03 06:45:44 UTC
And still only once, closing.
Comment 7 Lakshmi 2019-08-13 13:41:53 UTC
(In reply to Francesco Balestrieri from comment #6)
> And still only once, closing.

Still the same. Closing and archiving this issue.
Comment 8 CI Bug Log 2019-08-13 13:42:00 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

