One of the subtests in gem_exec_parallel often hangs the host. Below is a dump from an SKL 6700K on a Z170 motherboard, which hung hard on igt@gem_exec_parallel@render-fds after running tests/intel-ci/fast-feedback.testlist. CI_DRM_2352 is drm-tip, today's build. For details, see https://intel-gfx-ci.01.org/CI/

[ 947.215802] general protection fault: 0000 [#1] PREEMPT SMP
[ 947.221439] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei e1000e igb ptp pps_core prime_numbers pinctrl_sunrisepoint pinctrl_intel i2c_hid [last unloaded: i915]
[ 947.254918] CPU: 6 PID: 47 Comm: ksoftirqd/6 Tainted: G U 4.11.0-rc2-CI-CI_DRM_2352+ #1
[ 947.264181] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017
[ 947.273499] task: ffff88042bdaa7c0 task.stack: ffffc900001fc000
[ 947.279489] RIP: 0010:notifier_call_chain+0x59/0xa0
[ 947.284426] RSP: 0018:ffffc900001ffd38 EFLAGS: 00010286
[ 947.289698] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 00000000ffffffff
[ 947.296917] RDX: ffff8803bf65d5c0 RSI: 0000000000000001 RDI: ffff88041d05e4c8
[ 947.304249] RBP: ffffc900001ffd70 R08: 0000000000000000 R09: 643e07b800000000
[ 947.311544] R10: 0000000000000000 R11: ffff88042bdaa7c0 R12: 0000000000000000
[ 947.318833] R13: 0000000000000000 R14: 00000000ffffffff R15: 6b6b6b6b6b6b6b6b
[ 947.326070] FS: 0000000000000000(0000) GS:ffff88043ed80000(0000) knlGS:0000000000000000
[ 947.334408] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 947.340242] CR2: 00007f83d8000010 CR3: 0000000429021000 CR4: 00000000003406e0
[ 947.347512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 947.354801] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 947.362080] Call Trace:
[ 947.364582] __atomic_notifier_call_chain+0x73/0x110
[ 947.369709] ? unregister_die_notifier+0x20/0x20
[ 947.374406] atomic_notifier_call_chain+0x11/0x20
[ 947.379276] intel_lrc_irq_handler+0x191/0x490 [i915]
[ 947.384458] tasklet_hi_action+0xf0/0x110
[ 947.388611] __do_softirq+0x116/0x4c0
[ 947.392321] run_ksoftirqd+0x22/0x50
[ 947.395960] smpboot_thread_fn+0x180/0x280
[ 947.400129] kthread+0x107/0x140
[ 947.403431] ? sort_range+0x20/0x20
[ 947.407002] ? kthread_create_on_node+0x40/0x40
[ 947.411621] ret_from_fork+0x2e/0x40
[ 947.415233] Code: 4c 89 ff 41 ff 17 4d 85 e4 41 89 c5 74 05 41 83 04 24 01 41 f7 c5 00 80 00 00 75 39 83 eb 01 4d 89 f7 4d 85 ff 74 2e 85 db 74 2a <49> 8b 3f 4d 8b 77 08 e8 cb ca ff ff 85 c0 75 bd 48 c7 c2 04 f1
[ 947.434475] RIP: notifier_call_chain+0x59/0xa0 RSP: ffffc900001ffd38
[ 947.440931] ---[ end trace e6564010da93ee3e ]---
[ 947.608936] Kernel panic - not syncing: Fatal exception in interrupt
[ 947.615465] Kernel Offset: disabled
[ 947.791838] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[ 947.799101] ------------[ cut here ]------------
[ 947.803805] WARNING: CPU: 6 PID: 47 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x3a/0x40
[ 947.813181] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei e1000e igb ptp pps_core prime_numbers pinctrl_sunrisepoint pinctrl_intel i2c_hid [last unloaded: i915]
[ 947.846626] CPU: 6 PID: 47 Comm: ksoftirqd/6 Tainted: G UD 4.11.0-rc2-CI-CI_DRM_2352+ #1
[ 947.855898] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017
[ 947.865208] Call Trace:
[ 947.867711] <IRQ>
[ 947.869768] dump_stack+0x67/0x92
[ 947.873129] __warn+0xc6/0xe0
[ 947.876145] warn_slowpath_null+0x18/0x20
[ 947.880218] native_smp_send_reschedule+0x3a/0x40
[ 947.885019] trigger_load_balance+0x2cd/0x580
[ 947.889448] ? trigger_load_balance+0x6f/0x580
[ 947.893956] scheduler_tick+0x97/0xc0
[ 947.897673] ? tick_sched_handle.isra.7+0x40/0x40
[ 947.902458] update_process_times+0x42/0x50
[ 947.906721] tick_sched_handle.isra.7+0x1c/0x40
[ 947.911356] tick_sched_timer+0x3d/0x70
[ 947.915249] __hrtimer_run_queues+0xf3/0x530
[ 947.919590] hrtimer_interrupt+0xb9/0x210
[ 947.923655] local_apic_timer_interrupt+0x31/0x50
[ 947.928449] smp_apic_timer_interrupt+0x33/0x50
[ 947.933084] apic_timer_interrupt+0x90/0xa0
[ 947.937330] RIP: 0010:panic+0x1c7/0x205
[ 947.941231] RSP: 0018:ffffc900001ffb90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 947.948909] RAX: 0000000000000041 RBX: 0000000000000000 RCX: 0000000000000000
[ 947.956191] RDX: 0000000000000101 RSI: ffffffff81c6e65d RDI: ffffffff8117ef23
[ 947.963410] RBP: ffffc900001ffc00 R08: 0000000000000001 R09: 0000000000000000
[ 947.970664] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 947.977909] R13: 0000000000000000 R14: 0000000000000000 R15: 6b6b6b6b6b6b6b6b
[ 947.985170] </IRQ>
[ 947.987320] ? panic+0x1c4/0x205
[ 947.990604] ? kmsg_dump+0x11f/0x1c0
[ 947.994237] oops_end+0x78/0x90
[ 947.997435] die+0x46/0x60
[ 948.000172] do_general_protection+0xe0/0x1a0
[ 948.004610] general_protection+0x22/0x30
[ 948.008683] RIP: 0010:notifier_call_chain+0x59/0xa0
[ 948.013650] RSP: 0018:ffffc900001ffd38 EFLAGS: 00010286
[ 948.018963] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 00000000ffffffff
[ 948.026224] RDX: ffff8803bf65d5c0 RSI: 0000000000000001 RDI: ffff88041d05e4c8
[ 948.033487] RBP: ffffc900001ffd70 R08: 0000000000000000 R09: 643e07b800000000
[ 948.040759] R10: 0000000000000000 R11: ffff88042bdaa7c0 R12: 0000000000000000
[ 948.048020] R13: 0000000000000000 R14: 00000000ffffffff R15: 6b6b6b6b6b6b6b6b
[ 948.055277] __atomic_notifier_call_chain+0x73/0x110
[ 948.060328] ? unregister_die_notifier+0x20/0x20
[ 948.065034] atomic_notifier_call_chain+0x11/0x20
[ 948.069835] intel_lrc_irq_handler+0x191/0x490 [i915]
[ 948.074960] tasklet_hi_action+0xf0/0x110
[ 948.079023] __do_softirq+0x116/0x4c0
[ 948.082766] run_ksoftirqd+0x22/0x50
[ 948.086406] smpboot_thread_fn+0x180/0x280
[ 948.090593] kthread+0x107/0x140
[ 948.093867] ? sort_range+0x20/0x20
[ 948.097421] ? kthread_create_on_node+0x40/0x40
[ 948.102041] ret_from_fork+0x2e/0x40
[ 948.105681] ---[ end trace e6564010da93ee3f ]---
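
One way to read the register dump above: R15 holds 6b6b6b6b6b6b6b6b, which is the kernel's slab free-poison byte (POISON_FREE, 0x6b) repeated eight times, and the faulting instruction bytes `<49> 8b 3f` are a load through R15. On x86-64 a non-canonical address like this raises a general protection fault rather than a page fault, which would be consistent with notifier_call_chain walking a notifier block that has already been freed. This is a reading of the dump, not a confirmed root cause. The small check below is only a sketch: the helper names are hypothetical and 48-bit virtual addressing is assumed.

```python
# Sketch for reading the oops registers; helper names are hypothetical.
# Assumes 48-bit virtual addresses (4-level paging) on x86-64.

POISON_FREE = 0x6B  # slab free-poison byte, as in include/linux/poison.h


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def is_repeated_byte(value: int, byte: int, width: int = 8) -> bool:
    """True if value is `byte` repeated `width` times."""
    return value == int.from_bytes(bytes([byte]) * width, "big")


R15 = 0x6B6B6B6B6B6B6B6B  # from the dump
RDI = 0xFFFF88041D05E4C8  # from the dump, an ordinary kernel pointer

print("R15 canonical:", is_canonical(R15))
print("R15 is free poison:", is_repeated_byte(R15, POISON_FREE))
print("RDI canonical:", is_canonical(RDI))
```

The same check can be applied to any other register value in the trace to separate plausible pointers from poisoned ones.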
Created attachment 130263
dmesg from SKL CI_DRM_2352

Added dmesg from boot.
commit 3fc03069bc6e6c316f19bb526e3c8ce784677477
Author: Changbin Du <changbin.du@intel.com>
Date: Mon Mar 13 10:47:11 2017 +0800

    drm/i915: make context status notifier head be per engine

    GVTg has introduced the context status notifier to schedule the GVTg
    workload. At that time, the notifier is bound to GVTg context only,
    so GVTg is not aware of host workloads.

    Now we are going to improve GVTg's guest workload scheduler policy,
    and add Guc emulation support for new Gen graphics. Both these two
    features require acknowledgment for all contexts running on hardware.
    (But will not alter host workload.) So here try to make some change.

    The change is simple:
    1. Move the context status notifier head from i915_gem_context to
       intel_engine_cs. Which means there is a notifier head per engine
       instead of per context. Execlist driver still call notifier for
       each context sched-in/out events of current engine.
    2. At GVTg side, it binds a notifier_block for each physical engine
       at GVTg initialization period. Then GVTg can hear all context
       status events.

    In this patch, GVTg do nothing for host context event, but later
    will add a function there. But in any case, the notifier callback is
    a noop if this is no active vGPU.

    Since intel_gvt_init() is called at early initialization stage and
    require the status notifier head has been initiated, I initiate it in
    intel_engine_setup().

    v2: remove a redundant newline. (chris)

    Fixes: 3c7ba6359d70 ("drm/i915: Introduce execlist context status change notification")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100232
    Signed-off-by: Changbin Du <changbin.du@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Cc: Zhi Wang <zhi.a.wang@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: http://patchwork.freedesktop.org/patch/msgid/20170313024711.28591-1-changbin.du@intel.com
    Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
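
The shape of the change described in the commit above can be modelled roughly as follows. This is a toy Python model, not i915 or GVTg code: the Engine and Listener classes, the engine names "rcs0" and "bcs0", and the context names are all hypothetical. Only the idea is taken from the commit message: one notifier head per engine, one listener bound to every engine, and a callback that does nothing when no vGPU is active.

```python
# Toy model of a per-engine context status notifier head.
# All class, engine and context names here are illustrative assumptions.
from typing import Callable, List, Tuple

Callback = Callable[[str, str], None]  # (event, context name)


class Engine:
    """Owns one notifier head shared by every context it runs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.context_status_notifier: List[Callback] = []

    def register(self, callback: Callback) -> None:
        self.context_status_notifier.append(callback)

    def run(self, context: str) -> None:
        # The engine reports sched-in/out for each context it runs.
        for event in ("sched-in", "sched-out"):
            for callback in self.context_status_notifier:
                callback(event, context)


class Listener:
    """Stands in for the GVTg side: a no-op unless a vGPU is active."""

    def __init__(self, has_active_vgpu: bool) -> None:
        self.has_active_vgpu = has_active_vgpu
        self.seen: List[Tuple[str, str]] = []

    def __call__(self, event: str, context: str) -> None:
        if not self.has_active_vgpu:
            return
        self.seen.append((event, context))


def demo(has_active_vgpu: bool) -> List[Tuple[str, str]]:
    engines = [Engine("rcs0"), Engine("bcs0")]
    listener = Listener(has_active_vgpu)
    for engine in engines:  # bind once per engine, at init time
        engine.register(listener)
    engines[0].run("host-ctx")
    engines[0].run("guest-ctx")
    return listener.seen


print(demo(has_active_vgpu=True))
```

Because the notifier head lives on the engine and not on a context, its lifetime no longer depends on any individual context being alive when the engine reports a status event.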
*** Bug 100253 has been marked as a duplicate of this bug. ***