Bug 88439

Summary: [BDW Bisected] igt/gem_reloc_vs_gpu/forked-faulting-reloc-thrash-inactive-hang doesn't exit testing
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: christophe.prigent, intel-gfx-bugs
Version: unspecified
Keywords: bisect_pending
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform: BDW
i915 features: GEM/Other
Attachments:
Description Flags
dmesg none

Description lu hua 2015-01-15 06:02:18 UTC
Created attachment 112266 [details]
dmesg

==System Environment==
--------------------------
Regression: no, new case, it also has bug 88358

Non-working platforms:  BDW

==kernel==
--------------------------
drm-intel-nightly/95cce4b4c5f3ecaf9c1c01d42f670da2748fcffb
commit 95cce4b4c5f3ecaf9c1c01d42f670da2748fcffb
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jan 14 20:58:31 2015 +0100

    drm-intel-nightly: 2015y-01m-14d-19h-58m-09s UTC integration manifest

==Bug detailed description==
-----------------------------
The test never exits, even after running for more than 10 minutes. Pressing Ctrl+C does not stop it either. I can still connect to the machine via ssh and reboot it.
The following cases also have this issue:
igt/gem_reloc_vs_gpu/forked-faulting-reloc-thrashing-hang
igt/gem_reloc_vs_gpu/forked-interruptible-faulting-reloc-thrash-inactive-hang
igt/gem_reloc_vs_gpu/forked-interruptible-faulting-reloc-thrashing-hang
igt/gem_reloc_vs_gpu/forked-interruptible-thrash-inactive-hang
igt/gem_reloc_vs_gpu/forked-interruptible-thrashing-hang
igt/gem_reloc_vs_gpu/forked-thrash-inactive-hang
igt/gem_reloc_vs_gpu/forked-thrashing-hang


[root@x-bdw01 tests]# time ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrash-inactive-hang
IGT-Version: 1.9-g3214a27 (x86_64) (Linux: 3.19.0-rc4_drm-intel-nightly_95cce4_20150115+ x86_64)
^C^

==Reproduce steps==
---------------------------- 
1. ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrash-inactive-hang
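
For automated runs, a wrapper along the following lines can keep a stuck subtest from blocking the rest of a session. This is only a sketch: `run_limited` is a hypothetical helper, not part of intel-gpu-tools, and the 600 seconds in the usage comment merely mirrors the 10 minutes mentioned above. If the process is stuck in an uninterruptible kernel wait, the signals will not remove it (just as Ctrl+C did not here), but the wrapper still returns so the caller can move on.

```shell
#!/bin/sh
# run_limited SECONDS CMD [ARGS...]   (hypothetical helper)
# Runs CMD in the background and polls it once per second.
# Returns CMD's exit status, or 124 if CMD is still alive after SECONDS.
run_limited() {
    limit=$1; shift
    "$@" &
    pid=$!
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Ask politely first, then force; do not wait for the process,
            # because it may never go away if it is stuck in the kernel.
            kill -TERM "$pid" 2>/dev/null || true
            sleep 1
            kill -KILL "$pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}

# Example (not executed here):
# run_limited 600 ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrash-inactive-hang
```

The return value 124 follows the convention of timeout(1), so callers that already handle that status need no changes.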
Comment 1 Chris Wilson 2015-01-15 07:53:17 UTC
(In reply to lu hua from comment #0)
> Created attachment 112266 [details]
> dmesg
> 
> ==System Environment==
> --------------------------
> Regression: no, new case, it also has bug 88358

That doesn't mean that it is not a regression in the kernel support for BDW...
Comment 2 lu hua 2015-01-16 05:47:37 UTC
The test is skipped on b05ddd4dfb63.
good commit: b05ddd4dfb6303ee9dde359ec913aa7a918fd813 (2014-12-10)
bad commit: cd52471999ef08bca35568525c1d85d883ffedb6 (2015-01-15)
Comment 3 lu hua 2015-01-16 07:50:57 UTC
c9dc0f35986c0e2fc81e0b71ddc7e3adad733829 is the first bad commit.
commit c9dc0f35986c0e2fc81e0b71ddc7e3adad733829
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Wed Dec 24 08:13:40 2014 -0800
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Jan 7 18:19:06 2015 +0100

    drm/i915: Add ioctl to set per-context parameters

    Sometimes we wish to tweak how an individual context behaves. Since we
    always create a context for every filp, this means that individual
    processes can fine tune their behaviour even if they do not explicitly
    create a context.

    The first example parameter here is to enable multi-process GPU testing,
    but the interface should be able to cope with passing arbitrarily complex
    parameters.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Testcase: igt/gem_reset_stats/ban-period-*
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 4 Chris Wilson 2015-01-16 08:25:33 UTC
Sigh, that's an enabling commit for the test, you have to apply that patch to your bisect base.
Comment 5 Yi Sun 2015-01-19 02:41:23 UTC
(In reply to Chris Wilson from comment #4)
> Sigh, that's an enabling commit for the test, you have to apply that patch
> to your bisect base.

Does that mean we got an invalid bisect result?
Hua, could you please look into it, or re-bisect based on your previous 'culprit' commit?
Comment 6 lu hua 2015-01-19 08:35:49 UTC
(In reply to Yi Sun from comment #5)
> (In reply to Chris Wilson from comment #4)
> > Sigh, that's an enabling commit for the test, you have to apply that patch
> > to your bisect base.
> 
> Does that mean we got an invalid bisect result?
> Hua, could you please look into it, or re-bisect based on your previous
> 'culprit' commit?


I think c9dc0f35986c0e2fc81e0b71ddc7e3adad733829 enables this test.
So Chris is suggesting that we apply this patch first, then test and bisect, right?
It's difficult to find a good commit.
Comment 7 Chris Wilson 2015-01-19 11:07:40 UTC
A useful trick when bisecting is to create a new branch, place the enabling commit at its start, and then bisect into the failure.
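
One way to get a similar effect without rewriting history is the temporary-modification pattern from the git-bisect documentation: at every bisect step, apply the enabling commit to the working tree, run the test, then undo the change before marking the commit. This is a variation on the branch-based trick described above, not the same procedure. `with_enabler` is a hypothetical helper name, and the command passed to it (building, booting and testing the kernel) is left to the caller.

```shell
#!/bin/sh
# with_enabler ENABLER CMD [ARGS...]   (hypothetical helper)
# Temporarily applies commit ENABLER on top of the checked-out commit,
# runs CMD, then restores the tree. Returns CMD's exit status, or 125
# if ENABLER does not apply cleanly.
with_enabler() {
    enabler=$1; shift
    # Commits that already contain the enabling commit need no patching.
    if ! git merge-base --is-ancestor "$enabler" HEAD; then
        if ! git cherry-pick --no-commit "$enabler" >/dev/null 2>&1; then
            git reset -q --hard      # drop the half-applied change
            return 125               # untestable commit
        fi
    fi
    status=0
    "$@" || status=$?
    git reset -q --hard              # back to the pristine commit
    return "$status"
}

# Example for this bug (not executed here; ./build-and-test.sh is a
# placeholder for whatever builds the kernel and runs the subtest):
# with_enabler c9dc0f35986c0e2fc81e0b71ddc7e3adad733829 ./build-and-test.sh
```

When such a helper is driven by `git bisect run`, an exit status of 125 tells bisect to skip the commit, which is why a failed cherry-pick maps to it.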
Comment 8 cprigent 2015-11-17 17:27:45 UTC
Bug scrub
Humberto,
Could you check whether this still reproduces?
Thanks
Comment 9 Humberto Israel Perez Rodriguez 2016-01-07 22:15:28 UTC
The following tests keep failing on BDW with the configuration below:

Case list:

time ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrashing-hang = takes more than 10 minutes and never ends
time ./gem_reloc_vs_gpu --run-subtest forked-interruptible-faulting-reloc-thrash-inactive-hang = fail
time ./gem_reloc_vs_gpu --run-subtest forked-interruptible-faulting-reloc-thrashing-hang = takes more than 10 minutes and never ends
time ./gem_reloc_vs_gpu --run-subtest forked-interruptible-thrash-inactive-hang = fail
time ./gem_reloc_vs_gpu --run-subtest forked-interruptible-thrashing-hang = takes more than 10 minutes and never ends
time ./gem_reloc_vs_gpu --run-subtest forked-thrash-inactive-hang = fail
time ./gem_reloc_vs_gpu --run-subtest forked-thrashing-hang = takes more than 10 minutes and never ends


kernel drm-intel-testing:

commit 91587c722c28c4116dedbfbf08aa874377bc76f8
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Dec 4 17:35:54 2015 +0100

    drm-intel-nightly: 2015y-12m-04d-16h-35m-07s UTC integration manifest


kernel version : 4.4.0-rc3
git url        : git://anongit.freedesktop.org/drm-intel
git branch     : drm-intel-testing
git describe   : drm-intel-next-2015-11-20-rebased-13721-g91587c7

kernel drm-intel-nightly

commit 79686f613b3955a4ed09cee936e7f70ec4e61b67
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Wed Dec 30 14:00:24 2015 +0200

    drm-intel-nightly: 2015y-12m-30d-11h-59m-54s UTC integration manifest


kernel version : 4.4.0-rc6
git url        : git://anongit.freedesktop.org/drm-intel
git branch     : drm-intel-nightly
git describe   : drm-intel-next-2015-12-18-1500-g79686f6

igt tools :
branch : intel-gpu-tools-1.13
commit : 2db78a4995a8ee298ae0cd68879baf80407a0e5e


cairo version : 1.15.2 / commit : db8a7f1
drm version : libdrm-2.4.66 / commit : b38a4b2
intel-driver : 1.6.2 / commit : 683edee
libva version : libva-1.6.2 / commit : 304bc13
mesa version : mesa-11.0.8 / commit : 261daab
xf86-video-intel version : 2.99.917 / commit : baec802
xserver version : xorg-server-1.18.0 / commit : 7921764
Comment 10 Elio 2016-03-18 17:01:30 UTC
The problem persists with the following configuration:

 ++ Kernel version                      : 4.4.4-040404-generic
 ++ Linux distribution                  : Ubuntu 15.10
 ++ Architecture                        : 64-bit
 
 ++ xf86-video-intel version            : 2.99.917
 ++ Xorg-Xserver version                : 1.17.2
 ++ DRM version                         : 2.4.64
 ++ VAAPI version                       : Intel i965 driver for Intel(R) Broadwell - 1.6.0
 ++ Cairo version                       : 1.14.2
 ++ Intel GPU Tools version             : Tag [intel-gpu-tools-1.14-74-g431f6c4] / Commit [431f6c4]
 ++ Kernel driver in use                : i915
 ++ Bios revision                       : 5.6


 --- Hardware information ---

 ++ Platform                            :
 ++ Motherboard model                   :
 ++ Motherboard type                    : NUC5i7RYB Desktop
 ++ Motherboard manufacturer            :
 ++ CPU family                          : Core i7
 ++ CPU information                     : Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
 ++ GPU Card                            : Intel Corporation Broadwell-U Integrated Graphics (rev 09) (prog-if 00 [VGA controller])
 ++ Memory ram                          : 8 GB
 ++ Maximum memory ram allowed          : 16 GB
 ++ Display resolution                  :
 ++ CPU's number                        : 4
 ++ Hard drive capacity                 : 120 GB

./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-thrashing-hang = takes more than 10 minutes and never ends
./gem_reloc_vs_gpu --run-subtest forked-interruptible-faulting-reloc-thrash-inactive-hang = fails
./gem_reloc_vs_gpu --run-subtest forked-interruptible-faulting-reloc-thrashing-hang = hangs
./gem_reloc_vs_gpu --run-subtest forked-interruptible-thrash-inactive-hang = fails
./gem_reloc_vs_gpu --run-subtest forked-interruptible-thrashing-hang = hangs (fail)
./gem_reloc_vs_gpu --run-subtest forked-thrash-inactive-hang = fails
Comment 11 Chris Wilson 2016-09-09 17:52:09 UTC
commit 821ed7df6e2a1dbae243caebcfe21a0a4329fca0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 9 14:11:53 2016 +0100

    drm/i915: Update reset path to fix incomplete requests
    
    Update reset path in preparation for engine reset which requires
    identification of incomplete requests and associated context and fixing
    their state so that engine can resume correctly after reset.
    
    The request that caused the hang will be skipped and head is reset to the
    start of breadcrumb. This allows us to resume from where we left-off.
    Since this request didn't complete normally we also need to cleanup elsp
    queue manually. This is vital if we employ nonblocking request
    submission where we may have a web of dependencies upon the hung request
    and so advancing the seqno manually is no longer trivial.
