Bug 86644

Summary: [IVB+ Bisected]3cc134e drm/i915: sanitize rps irq enabling
Product: DRI
Reporter: Guo Jinxian <jinxianx.guo>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: huax.lu, intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:

Description Guo Jinxian 2014-11-24 00:49:14 UTC
==System Environment==
--------------------------
Regression: Yes

Non-working platforms: IVB

==kernel==
--------------------------
origin/drm-intel-nightly: 0f8cb1fb8e01c53f9ad47344e9448d72df49fcf2
    drm-intel-nightly: 2014y-11m-21d-19h-18m-03s UTC integration manifest

==Bug detailed description==
(IVB)igt/gem_bad_reloc/negative-reloc PASS->NSPT
(IVB)igt/gem_reset_stats/ban-render PASS->DMESG_WARN
(IVB)igt/gem_reset_stats/ban-blt PASS->DMESG_WARN


Dmesg:
[ 277.025554] WARNING: CPU: 7 PID: 957 at drivers/gpu/drm/i915/i915_irq.c:275 gen6_enable_rps_interrupts+0x42/0xa6 [i915]()
[ 277.025592] WARN_ON(dev_priv->rps.pm_iir)
[ 277.025607] Modules linked in:
[ 277.025622] ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc ipv6 dm_mod snd_hda_codec_hdmi joydev iTCO_wdt tpm_infineon iTCO_vendor_support ppdev snd_hda_codec_realtek snd_hda_codec_generic serio_raw pcspkr snd_hda_intel i2c_i801 snd_hda_controller snd_hda_codec snd_hwdep snd_pcm firewire_ohci snd_timer firewire_core lpc_ich crc_itu_t snd mfd_core soundcore wmi battery parport_pc parport tpm_tis tpm acpi_cpufreq i915 button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea
[ 277.025921] CPU: 7 PID: 957 Comm: kworker/7:1 Not tainted 3.18.0-rc5_prts_6f2dca_20141121_debug+ #64
[ 277.025955] Hardware name: Hewlett-Packard HP Compaq Elite 8300 CMT/3396, BIOS K01 v02.05 05/07/2012
[ 277.025998] Workqueue: events intel_gen6_powersave_work [i915]
[ 277.026021] 0000000000000009 ffff8800d4c8fc38 ffffffff8182909d 00000000d4dae2e1
[ 277.026055] ffff8800d4c8fc88 ffff8800d4c8fc78 ffffffff8103b9cf 0000000000000000
[ 277.026089] ffffffffa00c4431 ffff88020d650000 ffff88020d658238 0000000000060000
[ 277.026123] Call Trace:
[ 277.026138] [] dump_stack+0x46/0x58
[ 277.026161] [] warn_slowpath_common+0x81/0x9b
[ 277.026195] [] ? gen6_enable_rps_interrupts+0x42/0xa6 [i915]
[ 277.026223] [] warn_slowpath_fmt+0x46/0x48
[ 277.026246] [] ? _raw_spin_lock_irq+0x3f/0x46
[ 277.026278] [] ? gen6_enable_rps_interrupts+0x1f/0xa6 [i915]
[ 277.026314] [] gen6_enable_rps_interrupts+0x42/0xa6 [i915]
[ 277.026350] [] intel_gen6_powersave_work+0xf69/0xf8a [i915]
[ 277.026379] [] ? process_one_work+0x1ac/0x400
[ 277.026402] [] process_one_work+0x228/0x400
[ 277.026425] [] ? process_one_work+0x1ac/0x400
[ 277.026449] [] worker_thread+0x288/0x37c
[ 277.026471] [] ? cancel_delayed_work_sync+0x15/0x15
[ 277.026497] [] kthread+0xf6/0xfe
[ 277.026517] [] ? kthread_create_on_node+0x1ac/0x1ac
[ 277.026541] [] ret_from_fork+0x7c/0xb0
[ 277.026562] [] ? kthread_create_on_node+0x1ac/0x1ac
[ 277.026586] ---[ end trace c6e8c793b9781d0f ]---

==Reproduce steps==
---------------------------- 
1. ./gem_bad_reloc --run-subtest negative-reloc

==Bisect results from PRTS==
----------------------------
Bisect shows: 3cc134e3ee09055d5a87193fc7eb0ecf4a59eaa1 is the first bad commit
Author:     Imre Deak <imre.deak@intel.com>
AuthorDate: Wed Nov 19 15:30:03 2014 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Nov 19 15:03:17 2014 +0100

    drm/i915: sanitize rps irq enabling
    
    Atm we first enable the RPS interrupts then we clear any pending ones.
    By this we could lose an interrupt arriving after we unmasked it. This
    may not be a problem as the caller should handle such a race, but logic
    still calls for the opposite order. Also we can delay enabling the
    interrupts until after all the RPS initialization is ready with the
    following order:
    
    1. disable left-over RPS (earlier via intel_uncore_sanitize)
    2. clear any pending RPS interrupts
    3. initialize RPS
    4. enable RPS interrupts
    
    This also allows us to do the 2. and 4. step the same way for all
    platforms, so let's follow this order to simplifying things.
    
    Also make sure any queued interrupts are also cleared.
    
    v2:
    - rebase on the GEN9 patches where we don't support RPS yet, so we
      musn't enable RPS interrupts on it (Paulo)
    v3:
    - avoid enabling RPS interrupts on GEN>9 too (Paulo)
    - clarify the RPS init sequence in the log message (Chris)
    - add POSTING_READ to gen6_reset_rps_interrupts() (Paulo)
    - WARN if any PM_IIR bits are set in gen6_enable_rps_interrupts()
      (Paulo)
    
    Signed-off-by: Imre Deak <imre.deak@intel.com>
    Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 1 Imre Deak 2014-11-24 07:01:17 UTC
(In reply to Guo Jinxian from comment #0)

> (IVB)igt/gem_bad_reloc/negative-reloc PASS->NSPT

Could you recheck whether the bisect result for the above is correct? That is, run the test on the parent commit of "drm/i915: sanitize rps irq enabling".

> (IVB)igt/gem_reset_stats/ban-render PASS->DMESG_WARN
> (IVB)igt/gem_reset_stats/ban-blt PASS->DMESG_WARN

Could you check whether the following two patches fix the above two problems:

http://lists.freedesktop.org/archives/intel-gfx/2014-November/055970.html
Comment 2 Guo Jinxian 2014-11-25 08:31:06 UTC
(In reply to Imre Deak from comment #1)
> (In reply to Guo Jinxian from comment #0)
> 
> > (IVB)igt/gem_bad_reloc/negative-reloc PASS->NSPT
> 
> Could you recheck if the bisect result for the above is correct? That is run
> the test on the parent commit of "drm/i915: sanitize rps irq enabling".
The igt/gem_bad_reloc/negative-reloc case always skips, so I couldn't find a good commit.
> 
> > (IVB)igt/gem_reset_stats/ban-render PASS->DMESG_WARN
> > (IVB)igt/gem_reset_stats/ban-blt PASS->DMESG_WARN
> 
> Could you check if the following two patches fixes the above two problems:
> 
> http://lists.freedesktop.org/archives/intel-gfx/2014-November/055970.html

The failure can still be reproduced with this patch.

[root@x-ivb9 tests]# ./gem_reset_stats --run-subtest ban-render
IGT-Version: 1.8-gd807891 (x86_64) (Linux: 3.18.0-rc6_glody_d10cf9_20141125+ x86_64)
Subtest ban-render: SUCCESS (12.087s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:881:
Test requirement: gem_has_vebox(fd)
[root@x-ivb9 tests]# echo $?
0
[root@x-ivb9 tests]# dmesg -r|egrep "<[1-4]>"|grep drm
<4>[   74.858431] WARNING: CPU: 3 PID: 2338 at drivers/gpu/drm/i915/i915_irq.c:284 gen6_enable_rps_interrupts+0x34/0x95 [i915]()
<4>[   74.858494]  battery dm_mod tpm_tis tpm acpi_cpufreq i915 button video drm_kms_helper drm
[root@x-ivb9 tests]# ./gem_reset_stats --run-subtest ban-blt
IGT-Version: 1.8-gd807891 (x86_64) (Linux: 3.18.0-rc6_glody_d10cf9_20141125+ x86_64)
Subtest ban-blt: SUCCESS (11.789s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:881:
Test requirement: gem_has_vebox(fd)
[root@x-ivb9 tests]# dmesg -r|egrep "<[1-4]>"|grep drm
<4>[   74.858431] WARNING: CPU: 3 PID: 2338 at drivers/gpu/drm/i915/i915_irq.c:284 gen6_enable_rps_interrupts+0x34/0x95 [i915]()
<4>[   74.858494]  battery dm_mod tpm_tis tpm acpi_cpufreq i915 button video drm_kms_helper drm
<4>[  102.860728] WARNING: CPU: 2 PID: 707 at drivers/gpu/drm/i915/i915_irq.c:284 gen6_enable_rps_interrupts+0x34/0x95 [i915]()
<4>[  102.860791]  battery dm_mod tpm_tis tpm acpi_cpufreq i915 button video drm_kms_helper drm
Comment 3 Imre Deak 2014-11-25 08:56:50 UTC
(In reply to Guo Jinxian from comment #2)
> (In reply to Imre Deak from comment #1)
> > (In reply to Guo Jinxian from comment #0)
> > 
> > > (IVB)igt/gem_bad_reloc/negative-reloc PASS->NSPT
> > 
> > Could you recheck if the bisect result for the above is correct? That is run
> > the test on the parent commit of "drm/i915: sanitize rps irq enabling".
> Case igt/gem_bad_reloc/negative-reloc always skips, I didn't find good
> commit.

OK, so it's an unrelated bug. Could you open a new ticket for it?

> > > (IVB)igt/gem_reset_stats/ban-render PASS->DMESG_WARN
> > > (IVB)igt/gem_reset_stats/ban-blt PASS->DMESG_WARN
> > 
> > Could you check if the following two patches fixes the above two problems:
> > 
> > http://lists.freedesktop.org/archives/intel-gfx/2014-November/055970.html
> 
> The failure still able to reproduce with this patch.

Did you apply both patches? You need patch 1/2 and patch 2/2 at the above link.
Comment 4 Guo Jinxian 2014-11-27 05:55:41 UTC
With patches 055969 and 055970, the tests pass.


[root@x-ivb9 tests]# ./gem_bad_reloc --run-subtest negative-reloc
IGT-Version: 1.8-gb8f193b (x86_64) (Linux: 3.18.0-rc6_kcloud_d692c8_20141126+ x86_64)
Found offset 8192 for 4k batch
Batch is now at offset 266240
Subtest negative-reloc: SUCCESS (0.000s)
[root@x-ivb9 tests]# dmesg -r|egrep "<[1-4]>"|grep drm
Comment 5 lu hua 2014-12-10 01:53:31 UTC
Tested on IVB and BDW with the latest drm-intel-nightly kernel; the problem still exists.
[root@x-ivb9 tests]# ./gem_reset_stats --run-subtest ban-blt
IGT-Version: 1.8-gf333981 (x86_64) (Linux: 3.18.0_drm-intel-nightly_34d267_20141209+ x86_64)
Subtest ban-blt: SUCCESS (11.983s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:881:
Test requirement: gem_has_vebox(fd)

[ 1670.744506] ------------[ cut here ]------------
[ 1670.746330] WARNING: CPU: 3 PID: 20373 at drivers/gpu/drm/i915/i915_irq.c:284 gen6_enable_rps_interrupts+0x34/0x94 [i915]()
[ 1670.748144] WARN_ON(dev_priv->rps.pm_iir)
[ 1670.748152] Modules linked in: snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek snd_hda_codec_generic dm_mod dcdbas serio_raw pcspkr i2c_i801 snd_hda_intel snd_hda_controller snd_hda_codec lpc_ich mfd_core snd_hwdep snd_pcm snd_timer snd soundcore battery tpm_tis tpm acpi_cpufreq i915 button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea
[ 1670.748154] CPU: 3 PID: 20373 Comm: kworker/3:1 Not tainted 3.18.0_drm-intel-nightly_34d267_20141209+ #2416
[ 1670.748154] Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
[ 1670.748160] Workqueue: events intel_gen6_powersave_work [i915]
[ 1670.748161]  0000000000000000 0000000000000009 ffffffff8178d5e2 ffff8800d4a63d58
[ 1670.748162]  ffffffff8103a8cc ffff8800d4a85800 ffffffffa00b07e4 0000000000000297
[ 1670.748163]  ffff880002ff0000 ffff8800da8ba000 ffff880002ff86d8 0000000000060000
[ 1670.748163] Call Trace:
[ 1670.748167]  [<ffffffff8178d5e2>] ? dump_stack+0x41/0x51
[ 1670.748171]  [<ffffffff8103a8cc>] ? warn_slowpath_common+0x78/0x90
[ 1670.748178]  [<ffffffffa00b07e4>] ? gen6_enable_rps_interrupts+0x34/0x94 [i915]
[ 1670.748179]  [<ffffffff8103a97c>] ? warn_slowpath_fmt+0x45/0x4a
[ 1670.748185]  [<ffffffffa008bd8f>] ? __gen6_update_ring_freq+0x133/0x14f [i915]
[ 1670.748190]  [<ffffffffa00b07e4>] ? gen6_enable_rps_interrupts+0x34/0x94 [i915]
[ 1670.748195]  [<ffffffffa008dba0>] ? intel_gen6_powersave_work+0xfa9/0xfca [i915]
[ 1670.748197]  [<ffffffff8104bbed>] ? process_one_work+0x1ae/0x31c
[ 1670.748198]  [<ffffffff8104bfd5>] ? worker_thread+0x255/0x350
[ 1670.748200]  [<ffffffff8104bd80>] ? process_scheduled_works+0x25/0x25
[ 1670.748201]  [<ffffffff8104f81e>] ? kthread+0xc5/0xcd
[ 1670.748203]  [<ffffffff8104f759>] ? kthread_freezable_should_stop+0x40/0x40
[ 1670.748204]  [<ffffffff81792bec>] ? ret_from_fork+0x7c/0xb0
[ 1670.748205]  [<ffffffff8104f759>] ? kthread_freezable_should_stop+0x40/0x40
[ 1670.748206] ---[ end trace b0d8bb257298ce09 ]---
Comment 6 Imre Deak 2014-12-10 16:22:40 UTC
*** Bug 87182 has been marked as a duplicate of this bug. ***
Comment 7 Jani Nikula 2014-12-15 17:17:08 UTC
commit dbea3cea69508e9d548ed4a6be13de35492e5d15
Author: Imre Deak <imre.deak@intel.com>
Date:   Mon Dec 15 18:59:28 2014 +0200

    drm/i915: sanitize RPS resetting during GPU reset

pushed to drm-intel-next-fixes.
Comment 8 Jiang Wang 2014-12-16 01:32:41 UTC
I have tested using the latest commit. The result is PASS.
Comment 9 Elizabeth 2017-10-06 14:33:35 UTC
Closing old verified.
