Bug 103163

Summary: [CI] igt@* - Softdog incomplete
Product: DRI Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: high CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: KBL i915 features: display/Other
Bug Depends on: 103170    
Bug Blocks: 103165    

Description Marta Löfstedt 2017-10-09 11:03:59 UTC
On CI_DRM_3190 KBL-shards igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-render caused softdog incomplete:  

Note earlier in dmesg:
<3>[  195.552323] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:56:pipe C] flip_done timed out
<3>[  205.792318] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:56:pipe C] flip_done timed out
<7>[  365.375387] [IGT] kms_flip: exiting, ret=99
<3>[  375.776423] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:56:pipe C] flip_done timed out
<3>[  386.016397] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:56:pipe C] flip_done timed out
<6>[  397.733301] [drm] GPU HANG: ecode 9:1:0xfffffffe, reason: Hang on bcs0, vcs1, vecs0, action: reset
<6>[  397.733355] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[  397.733358] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[  397.733361] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[  397.733363] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[  397.733366] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<7>[  397.733652] [drm:i915_reset_device [i915]] resetting chip
also:
<3>[  429.024393] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

So, the issue is likely to have started in an earlier test. However, from cibuglog perspective the bug must be filed to the correct test. Note, this is specifically unfair for test on the shards.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3190/shard-kbl1/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-render.html
Comment 1 Marta Löfstedt 2017-10-10 10:43:44 UTC
*** Bug 103189 has been marked as a duplicate of this bug. ***
Comment 6 Marta Löfstedt 2017-10-18 07:53:03 UTC
before softdog is triggered:
<3>[  264.865531] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<3>[  274.911357] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  285.151298] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  295.391171] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out


https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3255/shard-kbl3/igt@kms_draw_crc@draw-method-xrgb8888-blt-ytiled.html
Comment 7 Marta Löfstedt 2017-10-18 07:53:43 UTC
Before softdog is triggered:
<3>[  472.866651] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<3>[  482.784380] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  493.024355] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  503.264164] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3255/shard-kbl4/igt@kms_draw_crc@draw-method-xrgb8888-mmap-gtt-ytiled.html
Comment 8 Marta Löfstedt 2017-10-18 07:54:22 UTC
before softdog is triggered:
<3>[  272.865948] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<3>[  283.104358] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  293.344405] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  303.584205] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3254/shard-kbl3/igt@pm_rpm@gem-execbuf.html
Comment 9 Marta Löfstedt 2017-10-18 07:54:43 UTC
Also, see BAT bug: https://bugs.freedesktop.org/show_bug.cgi?id=102846
Comment 10 Marta Löfstedt 2017-10-18 11:56:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3256/shard-kbl2/igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-xtiled.html

<3>[  476.127323] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  486.367350] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  496.607401] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<2>[  825.933377] softdog: Initiating panic
<0>[  825.933401] Kernel panic - not syncing: Software Watchdog Timer expired
Comment 11 Marta Löfstedt 2017-10-25 06:40:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3279/shard-kbl6/igt@kms_plane_multiple@atomic-pipe-A-tiling-x.html

<3>[  425.951254] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  436.191302] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  446.431249] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<12>[  761.457919] owatch: TIMEOUT!
<12>[  761.458582] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
<12>[  761.459856] owatch: timeout for /dev/watchdog0 set to 1 (requested 1)
<2>[  762.459776] softdog: Initiating panic

Also, note the new owatch prints in run.log:
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
owatch: timeout for /dev/watchdog0 set to 1 (requested 1)
FATAL: command execution failed
java.io.EOFException
Comment 12 Marta Löfstedt 2017-10-25 06:43:24 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3279/shard-kbl7/igt@kms_flip_tiling@flip-to-Yf-tiled.html

from dmesg:
<7>[  295.530616] [drm:drm_mode_setcrtc] [CRTC:56:pipe C]
<3>[  305.772464] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  316.017533] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<12>[  597.328922] owatch: TIMEOUT!
<12>[  597.329563] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
<12>[  597.331414] owatch: timeout for /dev/watchdog0 set to 1 (requested 1)
<2>[  598.331231] softdog: Initiating panic

from run.log:
[15/73] skip: 3, pass: 11, dmesg-fail: 1 \   
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
owatch: timeout for /dev/watchdog0 set to 1 (requested 1)
FATAL: command execution failed
java.io.EOFException
Comment 13 Marta Löfstedt 2017-11-02 06:56:06 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-kbl2/igt@gem_pwrite_pread@display-pwrite-blt-gtt_mmap-performance.html

From dmesg:
<3>[  108.553701] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
...
<3>[  171.097364] [drm:intel_engines_park [i915]] *ERROR* rcs0 is not idle before parking
...
So long time before this test start we are already in trouble:
<7>[  334.101061] [IGT] gem_pwrite_pread: starting subtest display-pwrite-blt-gtt_mmap-performance


https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-kbl2/dmesg-1509579602_Panic_2.log
<12>[  680.293023] owatch: TIMEOUT!

run.log:
[27/72] skip: 6, pass: 17, dmesg-warn: 1, fail: 1, dmesg-fail: 2 \   
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
Comment 14 Marta Löfstedt 2017-11-09 07:49:34 UTC
Note, although there are incompletes here I don't believe are related to the DMC issue it is my intention to close this once the new DMC have arrived.
Same goes for bug 103170 and bug 103165.
Comment 15 Marta Löfstedt 2017-11-09 08:50:36 UTC
IGT Patchwork:
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_479/fi-kbl-7560u/igt@gem_linear_blits@basic.html

<3>[  762.052057] [drm:intel_engines_park [i915]] *ERROR* vecs0 is not idle before parking
<7>[  762.052252] intel_engines_park vecs0
<7>[  762.052268] intel_engines_park 	current seqno 30639, last 30639, hangcheck 30639 [2252 ms], inflight 0
<7>[  762.052432] intel_engines_park 	Reset count: 16
<7>[  762.052445] intel_engines_park 	Requests:
<7>[  762.052621] intel_engines_park 	RING_START: 0x0001b000 [0x00000000]
<7>[  762.052639] intel_engines_park 	RING_HEAD:  0x00000278 [0x00000000]
<7>[  762.052654] intel_engines_park 	RING_TAIL:  0x00000278 [0x00000000]
<7>[  762.052672] intel_engines_park 	RING_CTL:   0x00003000
<7>[  762.052691] intel_engines_park 	RING_MODE:  0x00000200 [idle]
<7>[  762.052715] intel_engines_park 	ACTHD:  0x00000000_00000278
<7>[  762.052736] intel_engines_park 	BBADDR: 0x00000000_00000004
<7>[  762.052755] intel_engines_park 	Execlist status: 0x00000301 00000000
<7>[  762.052770] intel_engines_park 	Execlist CSB read 0 [-1 cached], write 1 [1 from hws], interrupt posted? no
<7>[  762.052791] intel_engines_park 	Execlist CSB[1]: 0x00000018 [0x00000018 in hwsp], context: 2 [2 in hwsp]
<7>[  762.052804] intel_engines_park 		ELSP[0] count=1, 
<7>[  762.052822] intel_engines_park rq: 30639! [2:8] prio=0 @ 12226ms: gem_linear_blit[3111]/0
<7>[  762.052840] intel_engines_park 		ELSP[1] idle
<7>[  762.052906] intel_engines_park 		HW active? 0x1
<7>[  762.052925] intel_engines_park Idle? no
<7>[  762.052937] intel_engines_park 
<3>[  772.338318] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  782.573188] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<12>[  788.665090] owatch: TIMEOUT!
<12>[  788.665325] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
<12>[  788.666443] owatch: timeout for /dev/watchdog0 set to 1 (requested 1)
<2>[  789.665379] softdog: Initiating panic
Comment 16 Marta Löfstedt 2017-11-16 08:22:05 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3352/shard-kbl5/igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-xtiled.html

Note: CONFIG_GLOBAL_TRACE_BUF_SIZE=8192

dmesg: 
<5>[   16.509214] owatch: Using watchdog device /dev/watchdog0
<5>[   16.509313] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   16.509837] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<3>[  311.281098] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CONNECTOR:58:DP-1] flip_done timed out
<3>[  321.520935] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  331.761143] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

run.log:
[04/73] skip: 1, pass: 1, dmesg-warn: 2 \
running: igt/kms_draw_crc/draw-method-rgb565-mmap-gtt-xtiled
[04/73] skip: 1, pass: 1, dmesg-warn: 2 |                   
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3352@shard-kbl5 : FAILURE
CI_IGT_test runtime 665 seconds
Rebooting shard-kbl5

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3352/shard-kbl5/dmesg-1510786732_Oops_1.log
<0>[  635.077407]   <idle>-0       0..s1 20614228us : intel_lrc_irq_handler: rcs0 csb[1d]: status=0x00008002:0x00000004
...
<0>[  635.084549] ksoftirq-7       0..s. 288782600us : intel_lrc_irq_handler: rcs0 in[0]:  ctx=3.1, seqno=ea
<0>[  635.084551] ---------------------------------
<4>[  635.084553] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel i2c_hid
<4>[  635.084578] CPU: 3 PID: 1006 Comm: owatch Not tainted 4.14.0-CI-CI_DRM_3352+ #1
<4>[  635.084579] Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
<4>[  635.084582] task: ffff88026f6550c0 task.stack: ffffc90000ac4000
<4>[  635.084585] RIP: 0010:sysrq_handle_crash+0x41/0x80
<4>[  635.084587] RSP: 0018:ffffc90000ac7e10 EFLAGS: 00010286
<4>[  635.084590] RAX: ffff88026f6550c0 RBX: 0000000000000063 RCX: c15960b1a52c1b78
<4>[  635.084592] RDX: 0000000000000001 RSI: 000000005dc120db RDI: ffff88026f6550c0
<4>[  635.084593] RBP: ffffc90000ac7e38 R08: ffff88026f655980 R09: 0000000000000000
<4>[  635.084595] R10: 00000000476afa9d R11: 00000000b7f6c965 R12: 0000000000000004
<4>[  635.084597] R13: ffffffff81ebf240 R14: 0000000000000000 R15: 00007efe79e81400
<4>[  635.084599] FS:  00007efe7a08f700(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000
<4>[  635.084601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  635.084603] CR2: 0000000000000000 CR3: 000000026ecb6005 CR4: 00000000003606e0
<4>[  635.084604] Call Trace:
<4>[  635.084608]  __handle_sysrq+0x128/0x200
<4>[  635.084611]  write_sysrq_trigger+0x48/0x50
<4>[  635.084614]  proc_reg_write+0x3d/0x60
<4>[  635.084618]  __vfs_write+0x23/0x130
<4>[  635.084621]  ? rcu_read_lock_sched_held+0x74/0x80
<4>[  635.084623]  ? rcu_sync_lockdep_assert+0x2a/0x50
<4>[  635.084626]  ? __sb_start_write+0xde/0x1f0
<4>[  635.084628]  ? __sb_start_write+0xf8/0x1f0
<4>[  635.084631]  vfs_write+0xc2/0x1b0
<4>[  635.084634]  SyS_write+0x45/0xb0
<4>[  635.084637]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  635.084639] RIP: 0033:0x7efe79bbb8f0
<4>[  635.084641] RSP: 002b:00007fff3f62a418 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  635.084644] RAX: ffffffffffffffda RBX: ffffc90000ac7ff0 RCX: 00007efe79bbb8f0
<4>[  635.084646] RDX: 0000000000000001 RSI: 000000000136e250 RDI: 0000000000000003
<4>[  635.084648] RBP: 0000000000000001 R08: 00007efe7a08f700 R09: 0000000000000410
<4>[  635.084650] R10: 00007efe79e84b58 R11: 0000000000000246 R12: 0000000000000046
<4>[  635.084651] R13: 00007fff3f62a660 R14: 0000000000000000 R15: 0000000000000000
<4>[  635.084655] Code: 33 e8 d4 c8 b7 ff 48 c7 c2 e7 80 57 81 be 01 00 00 00 48 c7 c7 80 d6 e4 81 e8 5c f9 b5 ff c7 05 02 2d ed 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c3 e8 81 f5 b7 ff 84 c0 75 c4 48 c7 c2 
<1>[  635.084709] RIP: sysrq_handle_crash+0x41/0x80 RSP: ffffc90000ac7e10
<4>[  635.084711] CR2: 0000000000000000
<4>[  635.084714] ---[ end trace 703db5d686bfd1fe ]---
Comment 17 Marta Löfstedt 2017-11-16 08:27:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3349/shard-kbl6/igt@kms_color@ctm-green-to-red-pipe0.html

<5>[   15.010319] owatch: Using watchdog device /dev/watchdog0
<5>[   15.010451] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   15.011354] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<3>[  776.069567] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  786.309387] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  796.549392] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  806.789367] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out

run.log
running: igt/kms_color/ctm-green-to-red-pipe0           
[06/72] skip: 1, pass: 2, dmesg-warn: 1, dmesg-fail: 2 -
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3349@shard-kbl6 : FAILURE
CI_IGT_test runtime 847 seconds
Rebooting shard-kbl6

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3349/shard-kbl6/dmesg-1510758445_Oops_1.log
<0>[  821.830927]   <idle>-0       3..s1 19024456us : intel_lrc_irq_handler: bcs0 cs-irq head=1 [1], tail=3 [3]
...
<0>[  821.838924] ksoftirq-29      3..s. 763681457us : intel_lrc_irq_handler: rcs0 in[0]:  ctx=2.1, seqno=fa
<0>[  821.838931] ---------------------------------
<4>[  821.838936] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec x86_pkg_temp_thermal snd_hwdep intel_powerclamp snd_hda_core coretemp crct10dif_pclmul crc32_pclmul snd_pcm ghash_clmulni_intel e1000e ptp pps_core mei_me mei prime_numbers pinctrl_sunrisepoint i2c_hid pinctrl_intel
<4>[  821.838974] CPU: 1 PID: 1025 Comm: owatch Not tainted 4.14.0-CI-CI_DRM_3349+ #1
<4>[  821.838980] Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
<4>[  821.838988] task: ffff88026d44d0c0 task.stack: ffffc900008b4000
<4>[  821.838994] RIP: 0010:sysrq_handle_crash+0x45/0x80
<4>[  821.838999] RSP: 0018:ffffc900008b7de0 EFLAGS: 00010282
<4>[  821.839004] RAX: ffff88026d44d0c0 RBX: 0000000000000063 RCX: 06bfa1bb72b91bb3
<4>[  821.839010] RDX: 0000000000000001 RSI: 00000000187f961e RDI: ffff88026d44d0c0
<4>[  821.839016] RBP: ffffc900008b7de0 R08: ffff88026d44d980 R09: 0000000000000000
<4>[  821.839021] R10: 00000000a527f8ee R11: 000000005a398595 R12: 0000000000000000
<4>[  821.839027] R13: 0000000000000004 R14: ffffffff81ebf1c0 R15: ffffc900008b7f20
<4>[  821.839033] FS:  00007ff8f3dab700(0000) GS:ffff88027ec80000(0000) knlGS:0000000000000000
<4>[  821.839040] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  821.839044] CR2: 0000000000000000 CR3: 000000026ecbe003 CR4: 00000000003606e0
<4>[  821.839050] Call Trace:
<4>[  821.839054]  __handle_sysrq+0x132/0x210
<4>[  821.839059]  write_sysrq_trigger+0x51/0x60
<4>[  821.839064]  proc_reg_write+0x42/0x70
<4>[  821.839068]  __vfs_write+0x28/0x130
<4>[  821.839073]  ? rcu_read_lock_sched_held+0x7a/0x90
<4>[  821.839077]  ? rcu_sync_lockdep_assert+0x2f/0x60
<4>[  821.839082]  ? __sb_start_write+0x10c/0x200
<4>[  821.839087]  vfs_write+0xc8/0x1c0
<4>[  821.839091]  SyS_write+0x49/0xb0
<4>[  821.839096]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  821.839101] RIP: 0033:0x7ff8f38d78f0
<4>[  821.839104] RSP: 002b:00007fff094efe28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  821.839111] RAX: ffffffffffffffda RBX: ffffffff814920c3 RCX: 00007ff8f38d78f0
<4>[  821.839117] RDX: 0000000000000001 RSI: 000000000195a250 RDI: 0000000000000003
<4>[  821.839123] RBP: ffffc900008b7f88 R08: 00007ff8f3dab700 R09: 0000000000000410
<4>[  821.839128] R10: 00007ff8f3ba0b58 R11: 0000000000000246 R12: 000000000195a020
<4>[  821.839134] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
<4>[  821.839141]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  821.839146] Code: 34 e8 e0 cc b5 ff 48 c7 c2 ab ef 59 81 be 01 00 00 00 48 c7 c7 00 d6 e4 81 e8 08 e8 b3 ff c7 05 3e ee b6 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 6c fb b5 ff 84 c0 75 c3 48 c7 
<1>[  821.839196] RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc900008b7de0
<4>[  821.839201] CR2: 0000000000000000
<4>[  821.839205] ---[ end trace 01bc9cfdf38e8ce4 ]---

As with previous, we don't see anything from owatch in dmesg/oops.log
Comment 18 Marta Löfstedt 2017-11-16 09:37:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-snb4/igt@kms_flip@flip-vs-dpms-off-vs-modeset.html

<3>[  105.994237] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<12>[  113.537128] owatch: TIMEOUT!
<12>[  113.537456] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
<12>[  113.539488] owatch: timeout for /dev/watchdog0 set to 1 (requested 1)
<2>[  114.539750] softdog: Initiating panic
<0>[  114.539818] Kernel panic - not syncing: Software Watchdog Timer expired
<4>[  114.539904] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc7-CI-CI_DRM_3311+ #1
<4>[  114.539995] Hardware name: GIGABYTE GB-BKi7(H)A-7500/MFLP7AP-00, BIOS F7 06/28/2017
<4>[  114.540084] Call Trace:
<4>[  114.540122]  <IRQ>
<4>[  114.540171]  dump_stack+0x68/0x9f
<4>[  114.540223]  panic+0xd4/0x21d
<4>[  114.540275]  ? printk+0x43/0x4b
<4>[  114.540336]  softdog_fire+0x6b/0x70
<4>[  114.540390]  __hrtimer_run_queues+0xed/0x4d0
<4>[  114.540453]  ? softdog_ping+0x50/0x50
<4>[  114.540511]  hrtimer_interrupt+0xb5/0x200
<4>[  114.540577]  smp_apic_timer_interrupt+0x7d/0x2e0
<4>[  114.540641]  apic_timer_interrupt+0x9a/0xa0
<4>[  114.540696]  </IRQ>
<4>[  114.540735] RIP: 0010:cpuidle_enter_state+0x136/0x370
<4>[  114.540799] RSP: 0018:ffffc900000c3e80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
<4>[  114.540897] RAX: ffff880263e22880 RBX: 0000000001de8779 RCX: 0000000000000001
<4>[  114.540981] RDX: 0000000000000000 RSI: ffffffff81d0eb14 RDI: ffffffff81cc26f6
<4>[  114.541064] RBP: ffffc900000c3eb8 R08: 00000000000006a4 R09: 0000000000000006
<4>[  114.541147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
<4>[  114.541230] R13: 0000000000000004 R14: ffffe8ffffd83850 R15: 0000001aa93acdd3
<4>[  114.541340]  cpuidle_enter+0x17/0x20
<4>[  114.541395]  call_cpuidle+0x23/0x40
<4>[  114.541448]  do_idle+0x192/0x1e0
<4>[  114.541504]  cpu_startup_entry+0x1d/0x20
<4>[  114.541561]  start_secondary+0x11c/0x140
<4>[  114.541622]  secondary_startup_64+0xa5/0xa5
<0>[  114.541794] Kernel Offset: disabled
Comment 19 Marta Löfstedt 2017-11-16 13:00:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7130/shard-kbl1/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-render.html

dmesg:
<5>[   15.157717] owatch: Using watchdog device /dev/watchdog0
<5>[   15.157813] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   15.158468] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<3>[  614.367229] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<7>[  614.367509] [drm:verify_connector_state.isra.74 [i915]] [CONNECTOR:58:DP-1]
<7>[  614.367651] [drm:intel_atomic_commit_tail [i915]] [CRTC:36:pipe A]
<7>[  614.367858] [drm:verify_single_dpll_state.isra.75 [i915]] DPLL 1

run.log
running: igt/kms_frontbuffer_tracking/fbc-1p-primscrn-pri-indfb-draw-render
[24/73] skip: 8, pass: 11, dmesg-warn: 3, dmesg-fail: 2 |                  
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
...
Finished: FAILURE
Completed CI_IGT_test Patchwork_7130@shard-kbl1 : FAILURE
CI_IGT_test runtime 972 seconds
Comment 20 Marta Löfstedt 2017-11-17 07:14:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3353/shard-kbl1/igt@kms_3d.html

Note still no header in oops logs, although ftrace buffer was set to 1024.

<0>[  873.100378] ksoftirq-17      1..s. 524757167us : intel_lrc_irq_handler: vecs0 in[0]:  ctx=4.1, seqno=f9
<0>[  873.100398] ksoftirq-17      1..s. 528797006us : intel_lrc_irq_handler: rcs0 in[0]:  ctx=4.1, seqno=1fd
<0>[  873.100405] ---------------------------------
<4>[  873.100409] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel e1000e snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_pcm ptp pps_core mei_me prime_numbers mei pinctrl_sunrisepoint pinctrl_intel i2c_hid
<4>[  873.100447] CPU: 1 PID: 1078 Comm: owatch Tainted: G     U          4.14.0-CI-CI_DRM_3353+ #1
<4>[  873.100453] Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4>[  873.100461] task: ffff88026dfa2880 task.stack: ffffc900009b4000
<4>[  873.100467] RIP: 0010:sysrq_handle_crash+0x41/0x80
<4>[  873.100471] RSP: 0018:ffffc900009b7e10 EFLAGS: 00010286
<4>[  873.100476] RAX: ffff88026dfa2880 RBX: 0000000000000063 RCX: abac48f149fcf827
<4>[  873.100482] RDX: 0000000000000001 RSI: 000000002239a1db RDI: ffff88026dfa2880
<4>[  873.100487] RBP: ffffc900009b7e38 R08: ffff88026dfa3140 R09: 0000000000000000
<4>[  873.100493] R10: 0000000027c3564c R11: 00000000d7992c33 R12: 0000000000000004
<4>[  873.100498] R13: ffffffff81ebf240 R14: 0000000000000000 R15: 00007f6ecef57400
<4>[  873.100504] FS:  00007f6ecf165700(0000) GS:ffff88027ec80000(0000) knlGS:0000000000000000
<4>[  873.100510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  873.100515] CR2: 0000000000000000 CR3: 000000026edff005 CR4: 00000000003606e0
<4>[  873.100521] Call Trace:
<4>[  873.100525]  __handle_sysrq+0x128/0x200
<4>[  873.100530]  write_sysrq_trigger+0x48/0x50
<4>[  873.100534]  proc_reg_write+0x3d/0x60
<4>[  873.100539]  __vfs_write+0x23/0x130
<4>[  873.100544]  ? rcu_read_lock_sched_held+0x74/0x80
<4>[  873.100548]  ? rcu_sync_lockdep_assert+0x2a/0x50
<4>[  873.100553]  ? __sb_start_write+0xde/0x1f0
<4>[  873.100557]  ? __sb_start_write+0xf8/0x1f0
<4>[  873.100561]  vfs_write+0xc2/0x1b0
<4>[  873.100565]  SyS_write+0x45/0xb0
<4>[  873.100570]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  873.100575] RIP: 0033:0x7f6ecec918f0
<4>[  873.100578] RSP: 002b:00007ffd9f063db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  873.100585] RAX: ffffffffffffffda RBX: ffffc900009b7ff0 RCX: 00007f6ecec918f0
<4>[  873.100590] RDX: 0000000000000001 RSI: 0000000000918250 RDI: 0000000000000003
<4>[  873.100596] RBP: 0000000000000001 R08: 00007f6ecf165700 R09: 0000000000000410
<4>[  873.100601] R10: 00007f6ecef5ab58 R11: 0000000000000246 R12: 0000000000000046
<4>[  873.100607] R13: 00007ffd9f064000 R14: 0000000000000000 R15: 0000000000000000
<4>[  873.100614] Code: 33 e8 d4 c8 b7 ff 48 c7 c2 e7 80 57 81 be 01 00 00 00 48 c7 c7 80 d6 e4 81 e8 5c f9 b5 ff c7 05 02 2d ed 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c3 e8 81 f5 b7 ff 84 c0 75 c4 48 c7 c2 
<1>[  873.100662] RIP: sysrq_handle_crash+0x41/0x80 RSP: ffffc900009b7e10
<4>[  873.100667] CR2: 0000000000000000
<4>[  873.100671] ---[ end trace 92e7d24c6d8cd86b ]---
<0>[  874.570964] Kernel panic - not syncing: Fatal exception
<0>[  874.570978] Dumping ftrace buffer:
<0>[  874.571231]    (ftrace buffer empty)
<0>[  874.571236] Kernel Offset: disabled
Comment 21 Marta Löfstedt 2017-11-20 07:31:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3989/shard-kbl6/igt@kms_ccs@pipe-c-crc-primary-basic.html

pstore is broken by ftrace,
However dmesg:
<3>[  320.115596] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

run.log:
[18/73] skip: 7, pass: 7, dmesg-warn: 2, dmesg-fail: 2 -
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
FATAL: command execution failed
Comment 22 Marta Löfstedt 2017-11-20 09:01:19 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3361/shard-kbl6/igt@gem_cs_tlb@basic-default.html

<3>[  171.567651] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

This is still broken:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3361/shard-kbl6/pstore5-1511164114_Oops_1.log

However, run.log
running: igt/gem_cs_tlb/basic-default      

[41/73] skip: 14, pass: 26, dmesg-fail: 1 /
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
FATAL: command execution f
Comment 23 Marta Löfstedt 2017-11-21 14:13:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3995/shard-kbl5/igt@kms_plane@plane-panning-bottom-right-pipe-c-planes.html

dmesg:
<5>[   20.371591] owatch: Using watchdog device /dev/watchdog0
<5>[   20.371851] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   20.372829] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<3>[  196.488869] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:56:pipe C] flip_done timed out
...
<3>[  237.940462] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:56:pipe C] flip_done timed out
...
<7>[  237.941345] [drm:verify_single_dpll_state.isra.75 [i915]] DPLL 1
<12>[  497.502417] owatch: TIMEOUT!
<12>[  497.502681] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)

run.log:
[01/73] dmesg-warn: 1 /                                        
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
FATAL: command execution failed
java.io.EOFException
Comment 24 Marta Löfstedt 2017-11-23 11:22:46 UTC
Solution for DMC ver. 1.04 was added to CI_DRM_3375.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3375/commits_short.log

I am eagerly awaiting shard results. If OK results I will close.
Comment 26 Marta Löfstedt 2017-11-23 13:13:41 UTC
CI_DRM_3375 and CI_DRM_3376 has none of:
 "*ERROR* Timeout waiting for engines to idle" nor "*ERROR* [CRTC:36:pipe A] flip_done timed out.", related softdogs. 

I will close this

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.