Summary: | [SKL][BAT] gem_exec_flush@basic-uc-pro-default incomplete in CI | ||
---|---|---|---|
Product: | DRI | Reporter: | Jani Saarinen <jani.saarinen> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | intel-gfx-bugs, tomeu |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | SKL | i915 features: |
Description
Jani Saarinen
2017-03-08 20:45:47 UTC
Now another test is incomplete as well. Documenting: https://intel-gfx-ci.01.org/CI/CI_DRM_2307/fi-skl-6770hq/igt@gem_exec_flush@basic-batch-kernel-default-uc.html

Last change seen after these changes:
79e440c drm-tip: 2017y-03m-08d-20h-49m-20s UTC integration manifest
5b5554c drm/i915: Check for an invalid seqno before __i915_gem_request_started
f166244 drm/i915: Purge i915_gem_object_is_dead()
03d1cac drm/i915: Avoiding recursing on ww_mutex inside shrinker
6f85859 drm-tip: 2017y-03m-08d-14h-47m-28s UTC integration manifest

But it might not be related to https://patchwork.freedesktop.org/series/20911/, as Chris said: "More random unrelated fails, thanks for the report & review, pushed."

Another incomplete: https://intel-gfx-ci.01.org/CI/CI_DRM_2309/fi-skl-6700hq/igt@gem_exec_flush@basic-uc-ro-default.html

(In reply to Jani Saarinen from comment #3)
> Another incomplete
> https://intel-gfx-ci.01.org/CI/CI_DRM_2309/fi-skl-6700hq/
> igt@gem_exec_flush@basic-uc-ro-default.html

Another one: https://intel-gfx-ci.01.org/CI/CI_DRM_2310/fi-skl-6770hq/igt@gem_exec_flush@basic-uc-prw-default.html

Can we get a trimmed list (no suspend or hibernate) and run it in a loop on a skl (seems to be most susceptible) and see if we can get anything out of netconsole? Or just be able to manually collect information when it freezes?

Chris, it was decided at the CI meeting that I should categorize all bugs on cibuglog. I want to pinpoint all bugs where the run didn't terminate as expected; this would then be input to a task force to get to the bottom of this problem.

Another one: http://intel-gfx-ci.01.org/CI/CI_DRM_2312/fi-skl-6700hq/igt@gem_ctx_create@basic-files.html

These are the only incomplete bugs I have found so far that have a hudson timeout in igt.log and:
[ 53.111596] [ INFO: possible circular locking dependency detected ]
[ 53.111628] 4.11.0-rc1-CI-CI_DRM_2310+ #1 Not tainted
in dmesg_before.txt.

Looks like the deadlock is in pstore.
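The trimmed-list idea suggested above (drop the suspend and hibernate tests, then loop over the rest until a run fails to complete) could be sketched roughly as follows. This is only an illustration: the keyword filter, the `run_one` callback and the test names in the usage note are assumptions, not part of the CI tooling.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Substrings that mark tests to leave out. The exact patterns are an
# assumption; real test names may encode suspend/hibernate differently.
EXCLUDED_KEYWORDS = ("suspend", "hibernate")


def trim_test_list(tests: Iterable[str]) -> List[str]:
    """Return the tests whose names mention none of the excluded keywords."""
    return [
        name
        for name in tests
        if not any(keyword in name.lower() for keyword in EXCLUDED_KEYWORDS)
    ]


def run_in_loop(
    tests: Iterable[str],
    run_one: Callable[[str], bool],
    max_rounds: int,
) -> Optional[Tuple[int, str]]:
    """Run the trimmed list repeatedly and stop at the first incomplete test.

    run_one is a hypothetical callback that runs a single test and returns
    True if it completed. The return value is (round_number, test_name) for
    the first test that did not complete, or None if every round finished.
    """
    trimmed = trim_test_list(tests)
    for round_number in range(1, max_rounds + 1):
        for name in trimmed:
            if not run_one(name):
                return round_number, name
    return None
```

In practice `run_one` would wrap whatever launches a single test on the target machine and decides whether it completed; knowing the round and test name at the point of the freeze is what makes manual collection of information possible.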
We almost caught the ghost, but then pstore messed it up. I mailed Tony Luck about the deadlock; I believe he is the pstore maintainer.

Now similar issues are seen on pw runs:
https://intel-gfx-ci.01.org/CI/Patchwork_4112/fi-skl-6700k/igt@gem_exec_flush@basic-batch-kernel-default-uc.html
https://intel-gfx-ci.01.org/CI/Patchwork_4112/fi-kbl-7500u/igt@gem_exec_flush@basic-uc-rw-default.html

Do we consider this to be in the same bucket even though it ran in patchwork? https://patchwork.freedesktop.org/series/21020/

Chris Wilson thinks that this commit in igt may have fixed the issue (https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/commit/?id=9759df989f18697a817d5de27021bae09bcf344e).

Every run between CI_DRM_2306 and CI_DRM_2315 was showing the issue, but nothing for the past 10 runs.

We'll keep an eye on this for a little longer before closing the bug.

(In reply to Martin Peres from comment #13)
> Chris Wilson thinks that this commit in igt may have fixed the issue
> (https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/commit/
> ?id=9759df989f18697a817d5de27021bae09bcf344e).
>
> Every run in between CI_DRM_2306 and CI_DRM_2315 were showing the issue, but
> nothing for the past 10 runs.
>
> We'll keep an eye on this for a little longer before closing the bug.

From #intel-gfx:
ickle: tomeu: considering they still occur, my optimism that the signal fix was all that was required was wrong

Reproduced using another 6700K/Z170 with CI_DRM_2333

i915_gem_request.h:203 is GEM_BUG_ON(fence && !dma_fence_is_i915(fence)); inside static inline struct drm_i915_gem_request * to_request(struct dma_fence *fence) {}

---
[ 794.038599] [IGT] gem_exec_flush: starting subtest basic-uc-pro-default [ 796.056398] ------------[ cut here ]------------ [ 796.061108] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.h:203!
[ 796.067755] invalid opcode: 0000 [#1] PREEMPT SMP [ 796.072537] Modules linked in: snd_hda_intel i915 vgem x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek] [ 796.102872] CPU: 6 PID: 19066 Comm: gem_exec_flush Tainted: G U 4.11.0-rc1-CI-CI_DRM_2333+ #1 [ 796.112715] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017 [ 796.121998] task: ffff88042902cf40 task.stack: ffffc9000071c000 [ 796.128037] RIP: 0010:notify_ring+0x219/0x220 [i915] [ 796.133108] RSP: 0018:ffff88043ed83c28 EFLAGS: 00010007 [ 796.138431] RAX: 0000000000000001 RBX: ffff8803a1b22158 RCX: 0000000081edfc31 [ 796.145702] RDX: 0000000081edfc30 RSI: 0000000000000000 RDI: ffff8804235aea20 [ 796.152981] RBP: ffff88043ed83c48 R08: 0000000000000001 R09: 0000000000000001 [ 796.160261] R10: 0000000000000000 R11: ffff88042902cf40 R12: ffff8804235aea20 [ 796.167533] R13: ffffc9001143bbf8 R14: ffff8803a1b221a8 R15: ffff8804212e0000 [ 796.174795] FS: 00007ff8623148c0(0000) GS:ffff88043ed80000(0000) knlGS:0000000000000000 [ 796.183037] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 796.188896] CR2: 00007ffd5d3d6940 CR3: 000000037635a000 CR4: 00000000003406e0 [ 796.196150] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 796.203422] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 796.210693] Call Trace: [ 796.213179] <IRQ> [ 796.215245] gen8_gt_irq_handler+0x219/0x290 [i915] [ 796.220236] gen8_irq_handler+0x8e/0x6b0 [i915] [ 796.224855] __handle_irq_event_percpu+0x58/0x370 [ 796.229647] handle_irq_event_percpu+0x1e/0x50 [ 796.234181] handle_irq_event+0x34/0x60 [ 796.238089] handle_edge_irq+0xbe/0x150 [ 796.242008] handle_irq+0x15/0x20 [ 796.245377] do_IRQ+0x63/0x130 [ 796.248482] common_interrupt+0x90/0x90 [ 796.252390] RIP: 0010:_raw_spin_unlock_irqrestore+0x54/0x60 [ 796.258056] RSP: 0018:ffff88043ed83ea0 EFLAGS: 00000292 ORIG_RAX: 
ffffffffffffff18 [ 796.265770] RAX: 0000000000000006 RBX: 0000000000000292 RCX: 0000000000000000 [ 796.273024] RDX: ffffffffa008db2c RSI: 0000000000000001 RDI: ffffffff8187a552 [ 796.280294] RBP: ffff88043ed83eb0 R08: 0000000000000005 R09: 0000000000000000 [ 796.287574] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8804212e79d0 [ 796.294836] R13: ffff8803a1b22450 R14: ffff8803a1b22158 R15: 0000000000000000 [ 796.302070] ? intel_lrc_irq_handler+0x45c/0x490 [i915] [ 796.307395] ? _raw_spin_unlock_irqrestore+0x52/0x60 [ 796.312481] intel_lrc_irq_handler+0x45c/0x490 [i915] [ 796.317649] tasklet_hi_action+0xf0/0x110 [ 796.321738] __do_softirq+0x116/0x4c0 [ 796.325475] irq_exit+0xa9/0xc0 [ 796.328672] do_IRQ+0x6c/0x130 [ 796.331788] ? i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 796.337018] common_interrupt+0x90/0x90 [ 796.340938] RIP: 0010:osq_lock+0x77/0x110 [ 796.345034] RSP: 0018:ffffc9000071fbf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff18 [ 796.352731] RAX: 0000000000000000 RBX: ffff88043ed9ab40 RCX: 0000000000000002 [ 796.359976] RDX: ffff88042902cf40 RSI: ffffffff81c6eedd RDI: ffffffff81c7ce87 [ 796.367255] RBP: ffffc9000071fc08 R08: 0000000000000000 R09: 0000000000000000 [ 796.374553] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043ecdab40 [ 796.381860] R13: ffff8804212e00b0 R14: ffffffffa00732b4 R15: ffff8804212e0070 [ 796.389155] </IRQ> [ 796.391308] ? i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 796.396491] __mutex_lock+0x649/0x990 [ 796.400191] ? __mutex_lock+0xb0/0x990 [ 796.404022] ? i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 796.409282] ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915] [ 796.414533] mutex_lock_interruptible_nested+0x16/0x20 [ 796.419804] i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 796.424795] ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915] [ 796.430056] ? __might_fault+0x87/0x90 [ 796.433904] ? __might_fault+0x3e/0x90 [ 796.437701] drm_ioctl+0x200/0x450 [ 796.441140] ? i915_gem_object_get_page+0x60/0x60 [i915] [ 796.446522] ? 
retint_kernel+0x2d/0x2d [ 796.450335] do_vfs_ioctl+0x90/0x6e0 [ 796.453948] SyS_ioctl+0x3c/0x70 [ 796.457233] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 796.461940] RIP: 0033:0x7ff860d3d357 [ 796.465561] RSP: 002b:00007ffd5d2df588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 796.473233] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff860d3d357 [ 796.480469] RDX: 00007ffd5d2df5c0 RSI: 000000004020645c RDI: 0000000000000003 [ 796.487670] RBP: 00000000000003ee R08: 0000000000000004 R09: 0000000000000000 [ 796.494924] R10: 000000000000003a R11: 0000000000000246 R12: 00007ff8623d9fb8 [ 796.502187] R13: 0000000000000001 R14: 0000000000000fb8 R15: 0000000000000108 [ 796.509408] Code: c0 0f 85 08 ff ff ff 48 c7 c2 70 13 15 a0 be ee 02 00 00 48 c7 c7 a0 13 15 a0 c6 05 6f 61 15 00 01 e8 cc 47 0b e1 e9 e4 fe ff ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 54 53 4 [ 796.528463] RIP: notify_ring+0x219/0x220 [i915] RSP: ffff88043ed83c28 [ 796.535042] ---[ end trace dcc74bec3ebb6986 ]--- [ 798.943346] Kernel panic - not syncing: Fatal exception in interrupt [ 800.026540] Shutting down cpus with NMI [ 800.030440] Kernel Offset: disabled [ 800.182376] ---[ end Kernel panic - not syncing: Fatal exception in interrupt [ 800.189634] ------------[ cut here ]------------ [ 800.194346] WARNING: CPU: 6 PID: 19066 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x3a/0x40 [ 800.203972] Modules linked in: snd_hda_intel i915 vgem x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek] [ 800.234255] CPU: 6 PID: 19066 Comm: gem_exec_flush Tainted: G UD 4.11.0-rc1-CI-CI_DRM_2333+ #1 [ 800.244064] Hardware name: Gigabyte Technology Co., Ltd. 
Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017 [ 800.253356] Call Trace: [ 800.255842] <IRQ> [ 800.257905] dump_stack+0x67/0x92 [ 800.261276] __warn+0xc6/0xe0 [ 800.264300] warn_slowpath_null+0x18/0x20 [ 800.268367] native_smp_send_reschedule+0x3a/0x40 [ 800.273160] trigger_load_balance+0x2cd/0x580 [ 800.277604] ? trigger_load_balance+0x6f/0x580 [ 800.282094] scheduler_tick+0x97/0xc0 [ 800.285848] ? tick_sched_handle.isra.7+0x30/0x30 [ 800.290629] update_process_times+0x42/0x50 [ 800.294876] tick_sched_handle.isra.7+0x29/0x30 [ 800.299479] tick_sched_timer+0x3d/0x70 [ 800.303378] __hrtimer_run_queues+0xf3/0x530 [ 800.307739] hrtimer_interrupt+0xb9/0x210 [ 800.311828] local_apic_timer_interrupt+0x31/0x50 [ 800.316614] smp_apic_timer_interrupt+0x33/0x50 [ 800.321225] apic_timer_interrupt+0x90/0xa0 [ 800.325489] RIP: 0010:panic+0x1c2/0x1fb [ 800.329404] RSP: 0018:ffff88043ed83990 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 [ 800.337127] RAX: 0000000000000041 RBX: 0000000000000000 RCX: 0000000000000000 [ 800.344381] RDX: 0000000000010104 RSI: ffffffff81c6eedd RDI: ffffffff8118352e [ 800.351627] RBP: ffff88043ed83a00 R08: 0000000000000001 R09: 0000000000000000 [ 800.358899] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 800.366177] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 800.373415] ? panic+0x1bf/0x1fb [ 800.376701] ? kmsg_dump+0x11f/0x1c0 [ 800.380340] oops_end+0x78/0x90 [ 800.383529] die+0x46/0x60 [ 800.386302] do_trap+0xae/0x140 [ 800.389508] do_error_trap+0x88/0x120 [ 800.393246] ? notify_ring+0x219/0x220 [i915] [ 800.397690] ? enqueue_task_fair+0xb6/0xe90 [ 800.401946] ? 
trace_hardirqs_off_thunk+0x1a/0x1c [ 800.406729] do_invalid_op+0x1b/0x20 [ 800.410379] invalid_op+0x18/0x20 [ 800.413758] RIP: 0010:notify_ring+0x219/0x220 [i915] [ 800.418811] RSP: 0018:ffff88043ed83c28 EFLAGS: 00010007 [ 800.424132] RAX: 0000000000000001 RBX: ffff8803a1b22158 RCX: 0000000081edfc31 [ 800.431393] RDX: 0000000081edfc30 RSI: 0000000000000000 RDI: ffff8804235aea20 [ 800.438673] RBP: ffff88043ed83c48 R08: 0000000000000001 R09: 0000000000000001 [ 800.445938] R10: 0000000000000000 R11: ffff88042902cf40 R12: ffff8804235aea20 [ 800.453191] R13: ffffc9001143bbf8 R14: ffff8803a1b221a8 R15: ffff8804212e0000 [ 800.460463] ? notify_ring+0x5f/0x220 [i915] [ 800.464837] gen8_gt_irq_handler+0x219/0x290 [i915] [ 800.469813] gen8_irq_handler+0x8e/0x6b0 [i915] [ 800.474424] __handle_irq_event_percpu+0x58/0x370 [ 800.479209] handle_irq_event_percpu+0x1e/0x50 [ 800.483742] handle_irq_event+0x34/0x60 [ 800.487641] handle_edge_irq+0xbe/0x150 [ 800.491541] handle_irq+0x15/0x20 [ 800.494914] do_IRQ+0x63/0x130 [ 800.498025] common_interrupt+0x90/0x90 [ 800.501941] RIP: 0010:_raw_spin_unlock_irqrestore+0x54/0x60 [ 800.507609] RSP: 0018:ffff88043ed83ea0 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff18 [ 800.515330] RAX: 0000000000000006 RBX: 0000000000000292 RCX: 0000000000000000 [ 800.522584] RDX: ffffffffa008db2c RSI: 0000000000000001 RDI: ffffffff8187a552 [ 800.529830] RBP: ffff88043ed83eb0 R08: 0000000000000005 R09: 0000000000000000 [ 800.537083] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8804212e79d0 [ 800.544354] R13: ffff8803a1b22450 R14: ffff8803a1b22158 R15: 0000000000000000 [ 800.551641] ? intel_lrc_irq_handler+0x45c/0x490 [i915] [ 800.556965] ? _raw_spin_unlock_irqrestore+0x52/0x60 [ 800.562047] intel_lrc_irq_handler+0x45c/0x490 [i915] [ 800.567184] tasklet_hi_action+0xf0/0x110 [ 800.571265] __do_softirq+0x116/0x4c0 [ 800.575011] irq_exit+0xa9/0xc0 [ 800.578216] do_IRQ+0x6c/0x130 [ 800.581338] ? 
i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 800.586563] common_interrupt+0x90/0x90 [ 800.590497] RIP: 0010:osq_lock+0x77/0x110 [ 800.594571] RSP: 0018:ffffc9000071fbf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff18 [ 800.602257] RAX: 0000000000000000 RBX: ffff88043ed9ab40 RCX: 0000000000000002 [ 800.609538] RDX: ffff88042902cf40 RSI: ffffffff81c6eedd RDI: ffffffff81c7ce87 [ 800.616809] RBP: ffffc9000071fc08 R08: 0000000000000000 R09: 0000000000000000 [ 800.624063] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043ecdab40 [ 800.631299] R13: ffff8804212e00b0 R14: ffffffffa00732b4 R15: ffff8804212e0070 [ 800.638553] </IRQ> [ 800.640706] ? i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 800.645922] __mutex_lock+0x649/0x990 [ 800.649640] ? __mutex_lock+0xb0/0x990 [ 800.653471] ? i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 800.658696] ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915] [ 800.663931] mutex_lock_interruptible_nested+0x16/0x20 [ 800.669174] i915_gem_pread_ioctl+0x234/0x7f0 [i915] [ 800.674236] ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915] [ 800.679469] ? __might_fault+0x87/0x90 [ 800.683275] ? __might_fault+0x3e/0x90 [ 800.687088] drm_ioctl+0x200/0x450 [ 800.690580] ? i915_gem_object_get_page+0x60/0x60 [i915] [ 800.695981] ? 
retint_kernel+0x2d/0x2d [ 800.699793] do_vfs_ioctl+0x90/0x6e0 [ 800.703423] SyS_ioctl+0x3c/0x70 [ 800.706718] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 800.711415] RIP: 0033:0x7ff860d3d357 [ 800.715063] RSP: 002b:00007ffd5d2df588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 800.722768] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff860d3d357 [ 800.730047] RDX: 00007ffd5d2df5c0 RSI: 000000004020645c RDI: 0000000000000003 [ 800.737311] RBP: 00000000000003ee R08: 0000000000000004 R09: 0000000000000000 [ 800.744548] R10: 000000000000003a R11: 0000000000000246 R12: 00007ff8623d9fb8 [ 800.751819] R13: 0000000000000001 R14: 0000000000000fb8 R15: 0000000000000108 [ 800.759081] ---[ end trace dcc74bec3ebb6987 ]---

*** Bug 100193 has been marked as a duplicate of this bug. ***
*** Bug 100081 has been marked as a duplicate of this bug. ***
*** Bug 100112 has been marked as a duplicate of this bug. ***
*** Bug 100084 has been marked as a duplicate of this bug. ***
*** Bug 100083 has been marked as a duplicate of this bug. ***
*** Bug 100082 has been marked as a duplicate of this bug. ***
*** Bug 99726 has been marked as a duplicate of this bug. ***

I hoped

commit 429732e860fda07fc1bb96fe23c43146c27e08e0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Mar 15 21:07:23 2017 +0000

    drm/i915/breadcrumbs: Update bottom-half before marking as complete

would take care of the trace Tomi reported. Still silent incompletes on farm1 - let's hope they are getting rarer.

*** Bug 99742 has been marked as a duplicate of this bug. ***

Now proclaiming fixed, see comment 23. Tomi found a separate issue with the same symptoms, i.e. nothing recorded by CI (bug 100232), that seems to have accounted for the last of them.

*** Bug 100254 has been marked as a duplicate of this bug. ***
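For triaging captures like the netconsole log in this report, a small scan for the `kernel BUG at <file>:<line>!` and `Kernel panic - not syncing` markers can help bucket incompletes by crash site. The following is a minimal sketch; the function and type names are made up for illustration, and the patterns are derived only from the lines quoted in this report.

```python
import re
from typing import List, NamedTuple, Optional


class BugSite(NamedTuple):
    timestamp: float  # seconds since boot, from the "[ 796.061108]" prefix
    path: str         # source file named in the BUG line
    line: int         # line number named in the BUG line


# Matches e.g. "[ 796.061108] kernel BUG at drivers/.../i915_gem_request.h:203!"
BUG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+kernel BUG at (\S+):(\d+)!")

# Matches the first panic line, not the later "---[ end Kernel panic" marker.
PANIC_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+Kernel panic - not syncing: ")


def find_bug_sites(log: str) -> List[BugSite]:
    """Return every 'kernel BUG at file:line!' site found in the log text."""
    return [
        BugSite(float(ts), path, int(line))
        for ts, path, line in BUG_RE.findall(log)
    ]


def first_panic_time(log: str) -> Optional[float]:
    """Return the timestamp of the first kernel panic line, or None."""
    match = PANIC_RE.search(log)
    return float(match.group(1)) if match else None
```

Because the patterns are searched anywhere in the text rather than matched line by line, the sketch also works on logs whose line breaks were lost, as in the capture above.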