Bug 89550

Summary: [SNB+ Regression bisected]igt/pm_rpm/universal-planes cause call trace
Product: DRI
Reporter: Ding Heng <hengx.ding>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: intel-gfx-bugs, matthew.d.roper, przanoni, ville.syrjala
Version: DRI git
Keywords: bisect_pending
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description     Flags
dmesg           none
possible fix    none

Description Ding Heng 2015-03-12 03:24:57 UTC
Created attachment 114240
dmesg

==System Environment==
--------------------------
Regression: Yes, bisect later

Non-working platforms: SNB

==kernel==
--------------------------
origin/drm-intel-nightly: f72a97e5af1d406f961958509cee53431fe61a46(2015-03-12)

==Bug detailed description==
-----------------------------
Running igt/pm_rpm/universal-planes causes a call trace in dmesg:

[root@x-hnr9 ~]# dmesg -r|egrep "<[1-4]>"|grep drm
<4>[  126.993092] WARNING: CPU: 0 PID: 4073 at drivers/gpu/drm/drm_irq.c:1133 drm_wait_one_vblank+0x3b/0x16d [drm]()
<4>[  126.993095] Modules linked in: dm_mod iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi ppdev snd_hda_codec_idt snd_hda_codec_generic joydev firewire_ohci pcspkr serio_raw uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev snd_hda_intel lpc_ich snd_hda_controller mfd_core snd_hda_codec snd_hwdep snd_pcm firewire_core snd_timer crc_itu_t snd soundcore wmi battery parport_pc parport tpm_infineon tpm_tis tpm ac acpi_cpufreq i915 button video drm_kms_helper drm
<4>[  126.993117] CPU: 0 PID: 4073 Comm: pm_rpm Not tainted 4.0.0-rc3_drm-intel-nightly_f72a97_20150312+ #120
<4>[  126.993141]  [<ffffffffa0005f43>] ? drm_wait_one_vblank+0x3b/0x16d [drm]
<4>[  126.993150]  [<ffffffffa0005f43>] ? drm_wait_one_vblank+0x3b/0x16d [drm]
<4>[  126.993173]  [<ffffffffa005290d>] ? drm_plane_helper_commit+0x16c/0x224 [drm_kms_helper]
<4>[  126.993208]  [<ffffffffa000c1e7>] ? drm_mode_set_config_internal+0x4e/0xd2 [drm]
<4>[  126.993215]  [<ffffffffa000fe69>] ? drm_mode_setcrtc+0x3fe/0x4f4 [drm]
<4>[  126.993221]  [<ffffffffa00047d4>] ? drm_ioctl+0x344/0x3b3 [drm]
<4>[  126.993230]  [<ffffffffa000fa6b>] ? drm_mode_setplane+0x1dc/0x1dc [drm]
[root@x-hnr9 ~]# dmesg -r|egrep "<[1-4]>"
<4>[    3.334805] ACPI: Deprecated procfs I/F for AC is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
<4>[    3.348697] ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042f conflicts with OpRegion 0x0000000000000400-0x000000000000047f (\PMIO) (20150204/utaddress-258)
<4>[    3.348884] ACPI Warning: SystemIO range 0x0000000000000540-0x000000000000054f conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20150204/utaddress-258)
<4>[    3.349053] ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053f conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20150204/utaddress-258)
<4>[    3.349220] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052f conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20150204/utaddress-258)
<4>[    3.349387] lpc_ich: Resource conflict(s) found affecting gpio_ich
<4>[    3.385912] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
<4>[    3.385995] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
<3>[    5.271944] systemd[1]: Failed to insert module 'ipv6'
<3>[    6.491482] systemd-readahead[2586]: Failed to create fanotify object: Function not implemented
<4>[  126.993078] ------------[ cut here ]------------
<4>[  126.993092] WARNING: CPU: 0 PID: 4073 at drivers/gpu/drm/drm_irq.c:1133 drm_wait_one_vblank+0x3b/0x16d [drm]()
<4>[  126.993094] vblank not available on crtc 0, ret=-22
<4>[  126.993095] Modules linked in: dm_mod iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi ppdev snd_hda_codec_idt snd_hda_codec_generic joydev firewire_ohci pcspkr serio_raw uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev snd_hda_intel lpc_ich snd_hda_controller mfd_core snd_hda_codec snd_hwdep snd_pcm firewire_core snd_timer crc_itu_t snd soundcore wmi battery parport_pc parport tpm_infineon tpm_tis tpm ac acpi_cpufreq i915 button video drm_kms_helper drm
<4>[  126.993117] CPU: 0 PID: 4073 Comm: pm_rpm Not tainted 4.0.0-rc3_drm-intel-nightly_f72a97_20150312+ #120
<4>[  126.993118] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.22 12/22/2011
<4>[  126.993120]  0000000000000000 0000000000000009 ffffffff81794128 ffff8800ae87ba38
<4>[  126.993122]  ffffffff8103bd5a 0000000100000000 ffffffffa0005f43 0000000000000000
<4>[  126.993125]  ffff8800b8186000 0000000000000000 ffff8800b8146800 ffffffffa010c870
<4>[  126.993127] Call Trace:
<4>[  126.993132]  [<ffffffff81794128>] ? dump_stack+0x40/0x50
<4>[  126.993135]  [<ffffffff8103bd5a>] ? warn_slowpath_common+0x98/0xb0
<4>[  126.993141]  [<ffffffffa0005f43>] ? drm_wait_one_vblank+0x3b/0x16d [drm]
<4>[  126.993144]  [<ffffffff8103bdb7>] ? warn_slowpath_fmt+0x45/0x4a
<4>[  126.993150]  [<ffffffffa0005f43>] ? drm_wait_one_vblank+0x3b/0x16d [drm]
<4>[  126.993153]  [<ffffffff813f16f8>] ? __pm_runtime_resume+0x5b/0x6a
<4>[  126.993169]  [<ffffffffa00c3710>] ? intel_finish_crtc_commit+0x47/0x10b [i915]
<4>[  126.993173]  [<ffffffffa005290d>] ? drm_plane_helper_commit+0x16c/0x224 [drm_kms_helper]
<4>[  126.993176]  [<ffffffff810e5018>] ? kmemdup+0x18/0x2c
<4>[  126.993188]  [<ffffffffa00cde12>] ? __intel_set_mode+0x313/0x8bd [i915]
<4>[  126.993200]  [<ffffffffa00d3d6a>] ? intel_crtc_set_config+0x89a/0xbaa [i915]
<4>[  126.993208]  [<ffffffffa000c1e7>] ? drm_mode_set_config_internal+0x4e/0xd2 [drm]
<4>[  126.993215]  [<ffffffffa000fe69>] ? drm_mode_setcrtc+0x3fe/0x4f4 [drm]
<4>[  126.993221]  [<ffffffffa00047d4>] ? drm_ioctl+0x344/0x3b3 [drm]
<4>[  126.993224]  [<ffffffff8133f69f>] ? sprintf+0x46/0x4b
<4>[  126.993230]  [<ffffffffa000fa6b>] ? drm_mode_setplane+0x1dc/0x1dc [drm]
<4>[  126.993233]  [<ffffffff8111fb0e>] ? do_vfs_ioctl+0x360/0x424
<4>[  126.993236]  [<ffffffff810a254c>] ? __audit_syscall_entry+0xb3/0xd3
<4>[  126.993238]  [<ffffffff8100d493>] ? syscall_trace_enter_phase1+0x11a/0x123
<4>[  126.993241]  [<ffffffff8111fc1b>] ? SyS_ioctl+0x49/0x7a
<4>[  126.993243]  [<ffffffff81799b8e>] ? int_check_syscall_exit_work+0x34/0x3d
<4>[  126.993245]  [<ffffffff81799972>] ? system_call_fastpath+0x12/0x17
<4>[  126.993247] ---[ end trace 2fd9622063df10d9 ]---

==Reproduce steps==
---------------------------- 
./pm_rpm --run-subtest universal-planes
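
The dmesg filter used in the description (`dmesg -r|egrep "<[1-4]>"|grep drm`) keeps raw-format lines whose priority is between 1 and 4 and that mention "drm". A minimal Python sketch of the same selection follows, in case the check needs to be scripted. The function name `drm_warnings` is illustrative only and is not part of igt; the sample lines are copied from the log above.

```python
import re

# Raw dmesg output (dmesg -r) prefixes each line with its priority,
# e.g. "<4>[  126.993092] ...". Keep priorities 1 through 4 only.
PRIORITY = re.compile(r"<[1-4]>")

def drm_warnings(lines):
    """Return the lines with priority 1-4 that also mention 'drm'."""
    return [ln for ln in lines if PRIORITY.search(ln) and "drm" in ln]

sample = [
    "<4>[  126.993092] WARNING: CPU: 0 PID: 4073 at drivers/gpu/drm/drm_irq.c:1133 drm_wait_one_vblank+0x3b/0x16d [drm]()",
    "<4>[  126.993094] vblank not available on crtc 0, ret=-22",
    "<3>[    5.271944] systemd[1]: Failed to insert module 'ipv6'",
]

for line in drm_warnings(sample):
    print(line)
```

Note that, like the shell pipeline, this drops the "vblank not available" line because it does not contain the string "drm"; the second command in the description (without `grep drm`) is what shows it.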
Comment 1 Ding Heng 2015-03-12 07:06:51 UTC
Adding SKL to this bug.
Comment 2 Paulo Zanoni 2015-03-12 19:50:07 UTC
I can reproduce a very similar backtrace on BDW with pm_rpm/legacy-planes.

Please bisect this.
Comment 3 Ding Heng 2015-03-13 10:12:20 UTC
(In reply to Paulo Zanoni from comment #2)
> I can reproduce a very similar backtrace on BDW with pm_rpm/legacy-planes.
> 
> Please bisect this.

When I tried to bisect on the next-queued branch I hit a different issue; please refer to Bug 89568.
Comment 4 lu hua 2015-03-18 06:26:10 UTC
It impacts SNB+ platforms.
The following 4 cases have this issue:
igt/pm_rpm/legacy-planes
igt/pm_rpm/legacy-planes-dpms
igt/pm_rpm/universal-planes
igt/pm_rpm/universal-planes-dpms
Comment 5 Paulo Zanoni 2015-03-18 18:14:24 UTC
I did some debugging, here is some relevant information:

At pm_rpm.c:test_one_plane() we set a mode in one monitor, then we set a plane with drmModeSetPlane(), and then we call disable_or_dpms_all_screens_and_wait(), which, as the name says, disables all the modes set. It is this final call that triggers the Kernel WARN.

The problem seems to be that, at __intel_set_mode(), we first call intel_crtc_disable(), which disables everything, calls drm_vblank_off(), and also flips intel_crtc->atomic.wait_vblank to true.

Later at __intel_set_mode(), we indirectly call intel_finish_crtc_commit(), which calls intel_wait_for_vblank() if intel_crtc->atomic.wait_vblank is true. And we get the WARN because we already disabled vblanks in the previous paragraph.

So a way to "hide" the WARN would be to just change {ivb,ilk}_disable_plane(), removing the line that sets intel_crtc->atomic.wait_vblank to true. I didn't check how to do this on SKL.

That said, Matt or Ville, do you have any comments or insights here?
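
The ordering described in this comment can be illustrated with a toy model. This is a sketch only, not i915 code: `ToyCrtc`, `crtc_disable`, `finish_crtc_commit` and `run` are hypothetical names, and the model tracks nothing beyond the two pieces of state mentioned above (whether vblanks are on, and the `wait_vblank` flag).

```python
class ToyCrtc:
    """Toy stand-in for the CRTC state discussed above (not the real struct)."""

    def __init__(self):
        self.vblank_enabled = True
        self.wait_vblank = False  # models intel_crtc->atomic.wait_vblank
        self.warnings = []

def crtc_disable(crtc, set_wait_flag=True):
    # Models the disable step: the plane disable requests a vblank wait,
    # and vblanks are then switched off (as drm_vblank_off() would do).
    if set_wait_flag:
        crtc.wait_vblank = True
    crtc.vblank_enabled = False

def finish_crtc_commit(crtc):
    # Models the later commit step: a vblank wait is attempted only if
    # one was requested, and it warns when vblanks are already off.
    if crtc.wait_vblank and not crtc.vblank_enabled:
        crtc.warnings.append("vblank not available on crtc 0, ret=-22")

def run(set_wait_flag):
    crtc = ToyCrtc()
    crtc_disable(crtc, set_wait_flag)
    finish_crtc_commit(crtc)
    return crtc.warnings

print(run(True))   # flag set during disable: the warning is recorded
print(run(False))  # flag not set (the proposed workaround): no warning
```

In this model, not setting the flag during the disable step is what hides the warning, which matches the workaround suggested above; it says nothing about whether that is the right fix in the driver.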
Comment 6 Paulo Zanoni 2015-03-18 19:13:49 UTC
Created attachment 114448
possible fix

Hi

Can you please confirm this fixes the problem, at least from SNB to BDW? I'm not sure it's going to fix SKL, but testing won't hurt :)

Thanks,
Paulo
Comment 7 Paulo Zanoni 2015-03-18 21:27:52 UTC
(In reply to Ding Heng from comment #3)
> (In reply to Paulo Zanoni from comment #2)
> > I can reproduce a very similar backtrace on BDW with pm_rpm/legacy-planes.
> > 
> > Please bisect this.
> 
> When I tried to bisect on next-queued branch I met a different issue, pls
> refer to Bug 89568

Bug 89568 appears to only happen on SNB. What happens if you bisect on HSW or BDW?
Comment 8 Matt Roper 2015-03-19 00:07:16 UTC
Here's a patch that I think solves the bug a little closer to the source:
  http://patchwork.freedesktop.org/patch/44960/

However if this winds up not working, Paulo's proposed fix looks correct to me and should be mergeable.
Comment 9 Ding Heng 2015-03-19 08:57:47 UTC
2fdd7def16dd7580f297827930126c16b152ec11 is the first bad commit
commit 2fdd7def16dd7580f297827930126c16b152ec11
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Wed Mar 4 10:49:04 2015 -0800

    drm/i915: Don't clobber plane state on internal disables

    We need to disable all sprite planes when disabling the CRTC.  We had
    been using the top-level atomic 'disable' entrypoint to accomplish this,
    which was wrong.  Not only can this lead to various locking issues, it
    also modifies the actual plane state, making it impossible to restore
    the plane properly later.  For example, a DPMS off followed by a DPMS on
    will result in any sprite planes in use not being restored properly.

    The proper solution here is to call directly into our 'commit plane'
    hook with a copy of the plane's current state that has 'visible' set to
    false.  Committing this dummy state will turn off the plane, but will
    not touch the actual plane->state pointer, allowing us to properly
    restore the plane state later.

    Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 fc36308f0bd42314b67ba401f783e47baaa9d06b 437fc3efb5aa5d569bf3bda658284125724b0d24 M      drivers
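
The approach described in the commit message above (commit a copy of the plane state with 'visible' cleared, leaving plane->state untouched) can be sketched as a toy model. This is an illustration under stated assumptions, not the driver's implementation: `PlaneState`, `Plane`, `internal_disable` and `restore` are hypothetical names, and `fb_id` is an invented field used only to show that the saved state survives.

```python
import copy
from dataclasses import dataclass

@dataclass
class PlaneState:
    visible: bool
    fb_id: int  # invented field, stands in for the rest of the state

class Plane:
    def __init__(self, state):
        self.state = state               # models plane->state
        self.hw_visible = state.visible  # what the "hardware" currently shows

    def commit(self, state):
        # Program the "hardware" from the given state without
        # replacing the plane's own state pointer.
        self.hw_visible = state.visible

def internal_disable(plane):
    # Commit a dummy copy with visible=False; plane.state is not modified.
    dummy = copy.copy(plane.state)
    dummy.visible = False
    plane.commit(dummy)

def restore(plane):
    # Re-commit the untouched saved state, e.g. on a later DPMS on.
    plane.commit(plane.state)

plane = Plane(PlaneState(visible=True, fb_id=1))
internal_disable(plane)
print(plane.hw_visible, plane.state.visible)
restore(plane)
print(plane.hw_visible, plane.state.visible)
```

Because only the copy is modified, the saved state still says the plane is visible after the internal disable, so it can be restored later.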
Comment 11 Paulo Zanoni 2015-03-20 19:06:07 UTC
Fixed by "drm/i915: Move vblank wait determination to 'check' phase". If the bug still happens, please reopen.
Comment 12 Ding Heng 2015-03-23 01:38:39 UTC
(In reply to Paulo Zanoni from comment #11)
> Fixed by "drm/i915: Move vblank wait determination to 'check' phase". If the
> bug still happens, please reopen.

Verified on this commit; changing state to VERIFIED.
Comment 13 Elizabeth 2017-10-06 14:31:08 UTC
Closing old verified.
