Bug 102473

Summary: [BAT] igt@kms_frontbuffer_tracking@basic WARN_ON(fbc->active)
Product: DRI Reporter: Jani Saarinen <jani.saarinen>
Component: DRM/IntelAssignee: Marta Löfstedt <marta.lofstedt>
Status: CLOSED WORKSFORME QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: high CC: intel-gfx-bugs, przanoni
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BDW i915 features: display/Other
Attachments:
Description Flags
take mutex in intel_fbc_init none

Description Jani Saarinen 2017-08-30 06:07:55 UTC
On CI igt@kms_frontbuffer_tracking gave following dmesg warning

Out	
IGT-Version: 1.19-gbf45d253 (x86_64) (Linux: 4.13.0-rc7-CI-CI_DRM_3016+ x86_64)
Primary screen: HDMI-A 1920x1080, crtc 0
FBC last action not supported
Can't test PSR: no usable eDP screen.
Sink CRC not supported: primary screen is not eDP
Subtest basic: SUCCESS (3.035s)
Environment	
PIGLIT_SOURCE_DIR="/opt/igt/piglit" PIGLIT_PLATFORM="mixed_glx_egl"
Command	
/opt/igt/tests/kms_frontbuffer_tracking --run-subtest basic
Dmesg	
[  361.782610] WARN_ON(fbc->active)
[  361.782647] ------------[ cut here ]------------
[  361.782713] WARNING: CPU: 1 PID: 4067 at drivers/gpu/drm/i915/intel_fbc.c:1173 __intel_fbc_disable+0xdf/0x110 [i915]
[  361.782719] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek i915 snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_pcm e1000e mei_me ptp lpc_ich prime_numbers mei pps_core i2c_hid
[  361.782829] CPU: 1 PID: 4067 Comm: kms_frontbuffer Tainted: G     U          4.13.0-rc7-CI-CI_DRM_3016+ #1
[  361.782834] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0362.2017.0118.0940 01/18/2017
[  361.782839] task: ffff880244cdcfc0 task.stack: ffffc90002980000
[  361.782899] RIP: 0010:__intel_fbc_disable+0xdf/0x110 [i915]
[  361.782905] RSP: 0018:ffffc900029839a0 EFLAGS: 00010286
[  361.782914] RAX: 0000000000000014 RBX: ffff880244320000 RCX: 0000000000000006
[  361.782919] RDX: 0000000000000006 RSI: ffffffff81cf777c RDI: ffffffff81cae616
[  361.782924] RBP: ffffc900029839b0 R08: ffff880244cdd900 R09: 0000000000000000
[  361.782929] R10: ffffc900029839b0 R11: 0000000000000000 R12: ffff880237d5e678
[  361.782933] R13: ffff8802443247c0 R14: ffff880243b674e8 R15: ffff880244320000
[  361.782939] FS:  00007fa3b01aea40(0000) GS:ffff880256c80000(0000) knlGS:0000000000000000
[  361.782944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  361.782949] CR2: 00007fa3b01ad000 CR3: 00000001e9dc0000 CR4: 00000000003406e0
[  361.782954] Call Trace:
[  361.783013]  intel_fbc_disable+0x61/0x70 [i915]
[  361.783178]  intel_atomic_commit_tail+0x135/0xf00 [i915]
[  361.783254]  ? intel_atomic_commit_ready+0x48/0x5c [i915]
[  361.783317]  ? __i915_sw_fence_complete+0x189/0x230 [i915]
[  361.783409]  intel_atomic_commit+0x1bd/0x280 [i915]
[  361.783432]  drm_atomic_commit+0x4b/0x50
[  361.783504]  hsw_pipe_A_crc_wa+0x62/0x160 [i915]
[  361.783588]  get_new_crc_ctl_reg+0x143/0x330 [i915]
[  361.783665]  intel_crtc_set_crc_source+0x7c/0x1e0 [i915]
[  361.783686]  crtc_crc_open+0xa4/0x2b0
[  361.783698]  ? rcu_read_lock_sched_held+0x7a/0x90
[  361.783710]  ? kmem_cache_alloc_trace+0x261/0x2d0
[  361.783730]  full_proxy_open+0xfd/0x1b0
[  361.783745]  ? u32_array_release+0x20/0x20
[  361.783759]  do_dentry_open.isra.1+0x1d3/0x2e0
[  361.783774]  vfs_open+0x47/0x70
[  361.783788]  path_openat+0x274/0x990
[  361.783799]  ? getname_flags+0x37/0x190
[  361.783822]  do_filp_open+0x8a/0xf0
[  361.783854]  ? _raw_spin_unlock+0x31/0x50
[  361.783866]  ? __alloc_fd+0xf8/0x210
[  361.783885]  do_sys_open+0x12f/0x200
[  361.783897]  ? do_sys_open+0x12f/0x200
[  361.783917]  SyS_openat+0x14/0x20
[  361.783929]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  361.783938] RIP: 0033:0x7fa3ae39f0fa
[  361.783947] RSP: 002b:00007ffefdd115a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  361.783967] RAX: ffffffffffffffda RBX: ffffffff8148a453 RCX: 00007fa3ae39f0fa
[  361.783975] RDX: 0000000000000000 RSI: 00007ffefdd11660 RDI: 0000000000000006
[  361.783984] RBP: ffffc90002983f88 R08: 0000000000000000 R09: 000000000000000f
[  361.783993] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
[  361.784081] R13: 0000564f58ddb9aa R14: 0000000000000001 R15: 0000564f59277450
[  361.784104]  ? __this_cpu_preempt_check+0x13/0x20
[  361.784134] Code: bf d8 27 a0 e8 54 b5 ef e0 0f ff 80 bb 02 49 00 00 00 0f 84 6f ff ff ff 48 c7 c6 d9 d8 27 a0 48 c7 c7 bf d8 27 a0 e8 32 b5 ef e0 <0f> ff 41 80 bc 24 e8 05 00 00 00 0f 84 5a ff ff ff 48 c7 c6 ee 
[  361.784826] ---[ end trace 2e67ae9be4bc8ab5 ]---
[  361.788431] WARN_ON(fbc->active)
[  361.788467] ------------[ cut here ]------------
[  361.788541] WARNING: CPU: 1 PID: 4067 at drivers/gpu/drm/i915/intel_fbc.c:1141 intel_fbc_enable+0x4a3/0x550 [i915]
[  361.788548] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek i915 snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_pcm e1000e mei_me ptp lpc_ich prime_numbers mei pps_core i2c_hid
[  361.788701] CPU: 1 PID: 4067 Comm: kms_frontbuffer Tainted: G     U  W       4.13.0-rc7-CI-CI_DRM_3016+ #1
[  361.788707] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0362.2017.0118.0940 01/18/2017
[  361.788714] task: ffff880244cdcfc0 task.stack: ffffc90002980000
[  361.788783] RIP: 0010:intel_fbc_enable+0x4a3/0x550 [i915]
[  361.788847] RSP: 0018:ffffc90002983920 EFLAGS: 00010282
[  361.788862] RAX: 0000000000000014 RBX: ffff880244320000 RCX: 0000000000000006
[  361.788869] RDX: 0000000000000006 RSI: ffffffff81cf777c RDI: ffffffff81cae616
[  361.788877] RBP: ffffc90002983970 R08: ffff880244cdd900 R09: 0000000000000000
[  361.788883] R10: ffffc90002983910 R11: 0000000000000000 R12: ffff880237d5e678
[  361.788890] R13: ffff880245ca6678 R14: ffff880249366758 R15: ffff880243b674e8
[  361.788897] FS:  00007fa3b01aea40(0000) GS:ffff880256c80000(0000) knlGS:0000000000000000
[  361.788904] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  361.788911] CR2: 00007fa3b01ad000 CR3: 00000001e9dc0000 CR4: 00000000003406e0
[  361.788917] Call Trace:
[  361.788996]  intel_update_crtc+0x7a/0xd0 [i915]
[  361.789164]  intel_update_crtcs+0x6b/0x80 [i915]
[  361.789268]  intel_atomic_commit_tail+0x2bf/0xf00 [i915]
[  361.789366]  ? intel_atomic_commit_ready+0x48/0x5c [i915]
[  361.789450]  ? __i915_sw_fence_complete+0x189/0x230 [i915]
[  361.789571]  intel_atomic_commit+0x1bd/0x280 [i915]
[  361.789597]  drm_atomic_commit+0x4b/0x50
[  361.789683]  hsw_pipe_A_crc_wa+0x62/0x160 [i915]
[  361.789801]  get_new_crc_ctl_reg+0x143/0x330 [i915]
[  361.789891]  intel_crtc_set_crc_source+0x7c/0x1e0 [i915]
[  361.789920]  crtc_crc_open+0xa4/0x2b0
[  361.789935]  ? rcu_read_lock_sched_held+0x7a/0x90
[  361.789952]  ? kmem_cache_alloc_trace+0x261/0x2d0
[  361.789977]  full_proxy_open+0xfd/0x1b0
[  361.789997]  ? u32_array_release+0x20/0x20
[  361.790072]  do_dentry_open.isra.1+0x1d3/0x2e0
[  361.790094]  vfs_open+0x47/0x70
[  361.790114]  path_openat+0x274/0x990
[  361.790131]  ? getname_flags+0x37/0x190
[  361.790166]  do_filp_open+0x8a/0xf0
[  361.790206]  ? _raw_spin_unlock+0x31/0x50
[  361.790222]  ? __alloc_fd+0xf8/0x210
[  361.790249]  do_sys_open+0x12f/0x200
[  361.790265]  ? do_sys_open+0x12f/0x200
[  361.790293]  SyS_openat+0x14/0x20
[  361.790309]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  361.790322] RIP: 0033:0x7fa3ae39f0fa
[  361.790335] RSP: 002b:00007ffefdd115a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  361.790358] RAX: ffffffffffffffda RBX: ffffffff8148a453 RCX: 00007fa3ae39f0fa
[  361.790369] RDX: 0000000000000000 RSI: 00007ffefdd11660 RDI: 0000000000000006
[  361.790380] RBP: ffffc90002983f88 R08: 0000000000000000 R09: 000000000000000f
[  361.790391] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
[  361.790406] R13: 0000564f58ddb9aa R14: 0000000000000001 R15: 0000564f59277450
[  361.790428]  ? __this_cpu_preempt_check+0x13/0x20
[  361.790458] Code: c6 00 95 29 a0 48 c7 c7 bf d8 27 a0 e8 18 9e ef e0 0f ff e9 f6 fb ff ff 48 c7 c6 d9 d8 27 a0 48 c7 c7 bf d8 27 a0 e8 fe 9d ef e0 <0f> ff e9 ce fb ff ff 49 8b 94 24 60 48 00 00 41 03 94 24 b8 5f 
[  361.791061] ---[ end trace 2e67ae9be4bc8ab6 ]---


Link: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3016/fi-bdw-5557u/igt@kms_frontbuffer_tracking@basic.html

Dmesg link: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3016/fi-bdw-5557u/dmesg0.log
Comment 1 Marta Löfstedt 2017-08-30 06:12:23 UTC
This is similar to what is discussed in https://bugs.freedesktop.org/show_bug.cgi?id=101623

However, BUG 101623 was originally
Comment 2 Marta Löfstedt 2017-08-30 06:19:00 UTC
(In reply to Marta Löfstedt from comment #1)
> This is similar to what is discussed in
> https://bugs.freedesktop.org/show_bug.cgi?id=101623
> 
> However, BUG 101623 was originally

OOPS, if you update the title of the bug, bugzilla saves non-finished comment.

However, BUG 101623 was originally spuriously hitting the "FBC enabled" fail. Sometime after 2017-08-25 the "FBC enabled" issue can not be reproduced on BDW. Instead there was the BUG 102410 issues. When BUG 102410 was "fixed" we are now hitting these WARN_ONs.

Since this is a new thing I believe it is better to discuss this in a bug separate from BUG 101623.
Comment 3 Marta Löfstedt 2017-08-30 07:53:21 UTC
Created attachment 133875 [details] [review]
take mutex in intel_fbc_init

I am doing some experiments with attached patch. So, far I have all kms_frontbuffer_tracking 5 times without hitting the WARN_ON.
Comment 4 Marta Löfstedt 2017-08-30 10:19:04 UTC
if I change the patch to grab the mutex just after in was initialized in: 
void intel_fbc_init(struct drm_i915_private *dev_priv)
 I didn't reproduce the issue over 10 loops of all kms_frontbuffer_tracking subtests.
Comment 5 Marta Löfstedt 2017-08-30 11:23:00 UTC
mäh! I managed to still hit the WARN_ON
Comment 6 Paulo Zanoni 2017-09-01 18:42:30 UTC
This is most probably a regression, right?

What does git-bisect tell us?
Comment 7 Marta Löfstedt 2017-09-08 08:50:37 UTC
(In reply to Paulo Zanoni from comment #6)
> This is most probably a regression, right?
> 
> What does git-bisect tell us?

If you are referring to "FBC disabled" -> WARN_ON(fbc->active)

That started with drm-tip@6e644626945c
Comment 9 Jani Saarinen 2017-11-27 08:10:50 UTC
Marta: for 6e644626945c7c1a7f4d4f83b806b898297846d0
From Ville:
"That patch will simply cause more modesets. I think it exposes some pre-existing race conditions in the fbc enable/disable sequences.
Chris had some patches, some of which didn't quite make sense, but the current code doesn't really make sense either."
Comment 10 Marta Löfstedt 2017-11-27 09:08:39 UTC
(In reply to Jani Saarinen from comment #9)
> Marta: for 6e644626945c7c1a7f4d4f83b806b898297846d0
> From Ville:
> "That patch will simply cause more modesets. I think it exposes some
> pre-existing race conditions in the fbc enable/disable sequences.
> Chris had some patches, some of which didn't quite make sense, but the
> current code doesn't really make sense either."

Yes, this "race" thing is what I tried to communicate to przanoni@gmail.com previously, and also why I did some experiments that didn't pan out. Is there some copying going on with the fbc struct somewhere?
Comment 11 Marta Löfstedt 2017-12-14 15:49:01 UTC
After changing display cable on my BDW NUCi5 I can no longer reproduce this issue.

Also, it hasn't been seen on the BAT BDWs for a long time

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.