Bug 108490

Summary: [CI][SHARDS] igt@syncobj_wait@wait(-all)?-for-submit-*- incomplete
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: intel-gfx-bugs
Version: XOrg git
Keywords: regression
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: ALL
i915 features: GEM/Other

Description Martin Peres 2018-10-19 08:33:53 UTC
The links may not be reachable right now because of storage issues on 01.org, but here they are for future reference:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5006/shard-skl9/igt@syncobj_wait@wait-all-for-submit-complex.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5006/shard-skl6/igt@syncobj_wait@reset-during-wait-for-submit.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5006/shard-skl5/igt@syncobj_wait@wait-all-for-submit-delayed-submit.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5006/shard-skl5/igt@syncobj_wait@wait-for-submit-complex.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5006/shard-skl4/igt@syncobj_wait@wait-for-submit-delayed-submit.html

<6> [363.585140] Console: switching to colour dummy device 80x25
<6> [363.585915] [IGT] syncobj_wait: executing
<6> [363.646916] [IGT] syncobj_wait: starting subtest wait-for-submit-delayed-submit
<4> [363.752517] 
<4> [363.752542] ============================================
<4> [363.752561] WARNING: possible recursive locking detected
<4> [363.752582] 4.19.0-rc8-CI-CI_DRM_5006+ #1 Tainted: G U 
<4> [363.752602] --------------------------------------------
<4> [363.752621] syncobj_wait/2297 is trying to acquire lock:
<4> [363.752641] 0000000029228054 (&(&syncobj->lock)->rlock){+.+.}, at: drm_syncobj_garbage_collection+0x25/0x140
<4> [363.752699] \x0abut task is already holding lock:
<4> [363.752721] 0000000029228054 (&(&syncobj->lock)->rlock){+.+.}, at: drm_syncobj_replace_fence+0x1b6/0x2c0
<4> [363.752768] \x0aother info that might help us debug this:
<4> [363.752791] Possible unsafe locking scenario:\x0a
<4> [363.752812] CPU0
<4> [363.752824] ----
<4> [363.752835] lock(&(&syncobj->lock)->rlock);
<4> [363.752856] lock(&(&syncobj->lock)->rlock);
<4> [363.752877] \x0a *** DEADLOCK ***\x0a
<4> [363.752905] May be due to missing lock nesting notation\x0a
<4> [363.752932] 1 lock held by syncobj_wait/2297:
<4> [363.752949] #0: 0000000029228054 (&(&syncobj->lock)->rlock){+.+.}, at: drm_syncobj_replace_fence+0x1b6/0x2c0
<4> [363.753001] \x0astack backtrace:
<4> [363.753030] CPU: 2 PID: 2297 Comm: syncobj_wait Tainted: G U 4.19.0-rc8-CI-CI_DRM_5006+ #1
<4> [363.753060] Hardware name: Google Caroline/Caroline, BIOS MrChromebox 08/27/2018
<4> [363.753083] Call Trace:
<4> [363.753109] dump_stack+0x67/0x9b
<4> [363.753136] __lock_acquire+0xc67/0x1b50
<4> [363.753164] ? deactivate_slab.isra.26+0x7a4/0x7e0
<4> [363.753203] ? __lock_acquire+0x3c8/0x1b50
<4> [363.753242] ? __wake_up_common_lock+0x5e/0xb0
<4> [363.753285] ? lock_acquire+0xa6/0x1c0
<4> [363.753306] lock_acquire+0xa6/0x1c0
<4> [363.753334] ? drm_syncobj_garbage_collection+0x25/0x140
<4> [363.753365] _raw_spin_lock+0x2a/0x40
<4> [363.753392] ? drm_syncobj_garbage_collection+0x25/0x140
<4> [363.753419] drm_syncobj_garbage_collection+0x25/0x140
<4> [363.753450] drm_syncobj_search_fence+0x38/0x200
<4> [363.753478] ? drm_syncobj_replace_fence+0x1b6/0x2c0
<4> [363.753509] syncobj_wait_syncobj_func+0x11/0x20
<4> [363.753536] drm_syncobj_replace_fence+0x1f8/0x2c0
<4> [363.753561] ? drm_syncobj_handle_to_fd_ioctl+0x170/0x170
<4> [363.753585] drm_syncobj_fd_to_handle_ioctl+0x120/0x1d0
<4> [363.753609] ? drm_syncobj_handle_to_fd_ioctl+0x170/0x170
<4> [363.753638] drm_ioctl_kernel+0x81/0xf0
<4> [363.753665] drm_ioctl+0x2e6/0x3a0
<4> [363.753686] ? drm_syncobj_handle_to_fd_ioctl+0x170/0x170
<4> [363.753717] ? lock_acquire+0xa6/0x1c0
<4> [363.753743] do_vfs_ioctl+0xa0/0x6d0
<4> [363.753769] ? __fget+0xfc/0x1e0
<4> [363.753792] ksys_ioctl+0x35/0x60
<4> [363.753816] __x64_sys_ioctl+0x11/0x20
<4> [363.753840] do_syscall_64+0x55/0x190
<4> [363.753866] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [363.753890] RIP: 0033:0x7f0d84f9a5d7
<4> [363.753911] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4> [363.753959] RSP: 002b:00007f0d79804ac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [363.753990] RAX: ffffffffffffffda RBX: 00007f0d74000b20 RCX: 00007f0d84f9a5d7
<4> [363.754016] RDX: 00007f0d79804b50 RSI: 00000000c01064c2 RDI: 0000000000000005
<4> [363.754040] RBP: 00007f0d79804b50 R08: 0000000000000000 R09: 000000000000001e
<4> [363.754064] R10: 0000000000000054 R11: 0000000000000246 R12: 00000000c01064c2
<4> [363.754088] R13: 0000000000000005 R14: 00007f0d74000b20 R15: 000055ab44e2c3f8
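
The lockdep report above shows drm_syncobj_replace_fence holding syncobj->lock and then, through syncobj_wait_syncobj_func and drm_syncobj_search_fence, reaching drm_syncobj_garbage_collection, which tries to take the same lock again. The following is a minimal user-space model of that pattern, offered only as a sketch: the Python class and method names are hypothetical stand-ins and not the kernel code, and a non-blocking acquire is an assumption made so the script reports the would-be deadlock instead of hanging.

```python
import threading


class SyncobjModel:
    """Hypothetical stand-in for a syncobj guarded by one non-recursive lock."""

    def __init__(self):
        # threading.Lock is non-recursive, like the spinlock in the report.
        self.lock = threading.Lock()
        self.callbacks = []

    def garbage_collection(self):
        # Models the inner acquisition seen in the trace. A non-blocking
        # acquire returns False instead of hanging when the lock is held.
        if not self.lock.acquire(blocking=False):
            return "would deadlock: lock already held"
        try:
            return "collected"
        finally:
            self.lock.release()

    def search_fence(self):
        # Models the intermediate call that reaches garbage collection.
        return self.garbage_collection()

    def replace_fence(self):
        # Models the outer path: callbacks run while the lock is held.
        with self.lock:
            return [cb() for cb in self.callbacks]


def demo():
    obj = SyncobjModel()
    # search_fence plays the role of the wait callback (assumption).
    obj.callbacks.append(obj.search_fence)
    return obj.replace_fence()


if __name__ == "__main__":
    print(demo())
```

Called on its own, garbage_collection takes the lock normally. Only the nested call made from the callback fails, which mirrors the single-CPU scenario that lockdep prints.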
Comment 1 Martin Peres 2018-10-19 09:54:09 UTC
The revert is already on the way: https://patchwork.freedesktop.org/patch/257591/
Comment 2 Chris Wilson 2018-10-23 17:07:15 UTC
commit 43cf1fc0e27e2f7eeb5d6c15fd023813a5b49987
Author: Chunming Zhou <david1.zhou@amd.com>
Date:   Tue Oct 23 17:37:45 2018 +0800

    drm: fix deadlock of syncobj v6
    
    v2:
    add a mutex between sync_cb execution and free.
    v3:
    clearly separating the roles for pt_lock and cb_mutex (Chris)
    v4:
    the cb_mutex should be taken outside of the pt_lock around
    this if() block. (Chris)
    v5:
    fix a corner case
    v6:
    tidy drm_syncobj_fence_get_or_add_callback up. (Chris)
    
    Tested by syncobj_basic and syncobj_wait of igt.
    
    Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: intel-gfx@lists.freedesktop.org
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Link: https://patchwork.kernel.org/patch/10652893/
Comment 3 Martin Peres 2018-10-30 16:12:28 UTC
(In reply to Chris Wilson from comment #2)

Yep, seems very much fixed. Thanks for the link!
