Created attachment 112584
dmesg

==System Environment==
--------------------------
Regression: yes
good commit: 1d83d957e621f160dfe0f08194e9c2fdd5fa7f3e
bad commit: 93180785d44e3d417099e293b9ff6eeb4fd20aa2

Non-working platforms: BSW

==kernel==
--------------------------
drm-intel-nightly/d6bc7a6a0a7573350e8be8ec54002c20d1dbe1e0
commit d6bc7a6a0a7573350e8be8ec54002c20d1dbe1e0
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jan 20 15:10:59 2015 +0100

    drm-intel-nightly: 2015y-01m-20d-14h-10m-40s UTC integration manifest

==Bug detailed description==
-----------------------------
It causes a system hang on the drm-intel-nightly and drm-intel-next-queued kernels.

output
IGT-Version: 1.9-g032f30c (x86_64) (Linux: 3.19.0-rc4_drm-intel-next-queued_931807_20150121+ x86_64)
Test requirement not met in function intel_require_memory, file intel_os.c:244:
Test requirement: !(total <= required)
Estimated that we need 6442455040 bytes for the test, but only have 1885339648 bytes available (RAM)
Subtest major-hang: SKIP (0.043s)

==Reproduce steps==
----------------------------
1. ./gem_evict_everything --run-subtest major-hang
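For reference, the byte counts in the skip message can be converted to more readable units. The following is only an arithmetic sketch: the helper names `to_mib` and `would_skip` are hypothetical, and `would_skip` merely mirrors the quoted `!(total <= required)` condition rather than the real `intel_require_memory` code.

```python
MIB = 1024 * 1024

def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB

def would_skip(total: int, required: int) -> bool:
    """The quoted requirement is !(total <= required); the test skips when it fails."""
    return total <= required

required = 6442455040   # "we need ... bytes for the test"
available = 1885339648  # "only have ... bytes available (RAM)"

print(f"required:  {to_mib(required):.1f} MiB")
print(f"available: {to_mib(available):.1f} MiB")
print("skip" if would_skip(available, required) else "run")
```

In other words, the subtest asks for about 6 GiB while the machine reports roughly 1.76 GiB of RAM, which is why the pasted output shows SKIP.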
gem_evict_everything/minor-hang also has this issue.
*** Bug 88655 has been marked as a duplicate of this bug. ***
It also happens on BDW.
./gem_evict_everything --run-subtest swapping-hang also causes system hang on SNB.
(In reply to lu hua from comment #4)
> ./gem_evict_everything --run-subtest swapping-hang also causes system hang
> on SNB.

I doubt it is the same bug. Please file it separately and we can dup if it does match.
(In reply to Chris Wilson from comment #5)
> (In reply to lu hua from comment #4)
> > ./gem_evict_everything --run-subtest swapping-hang also causes system hang
> > on SNB.
>
> I doubt it is the same bug. Please file it separately and we can dup if it
> does match.

OK, reported bug 88821 to track it on SNB. Thanks.
Reported bug 89000 to track BDW. Tested on BSW with i915.enable_ppgtt=0; it works well.
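When comparing runs with and without `i915.enable_ppgtt=0`, it can help to confirm that the booted kernel actually received the parameter. The sketch below reads `/proc/cmdline` and looks for it; the function names `parse_cmdline` and `ppgtt_disabled` are hypothetical helpers, and the simple whitespace split does not handle quoted values that contain spaces.

```python
from pathlib import Path

def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params

def ppgtt_disabled(cmdline: str) -> bool:
    """True if i915.enable_ppgtt=0 appears on the given command line."""
    return parse_cmdline(cmdline).get("i915.enable_ppgtt") == "0"

if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():  # only present on Linux
        print("PPGTT disabled on cmdline:", ppgtt_disabled(path.read_text()))
```

This only checks the boot command line; a parameter set through a modprobe configuration file would not show up here.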
Bisect shows: 6d3d8274bc45de4babb62d64562d92af984dd238 is the first bad commit.

commit 6d3d8274bc45de4babb62d64562d92af984dd238
Author:     Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Tue Jan 27 09:50:53 2015 +0100

    drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

    Move all remaining elements that were unique to execlists queue items
    in to the associated request.

    Issue: VIZ-4274

    v2: Rebase. Fixed issue of overzealous freeing of request.
    v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
    v4: Rebase.
    v5: Actual removal of intel_ctx_submit_request. Update both tail and
        postfix pointer in __i915_add_request (found by Thomas Daniel)
    v6: Removed unrelated changes

    Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
    Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
    [danvet: Reformat comment with strange linebreaks.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Please retest with current drm-intel-nightly that has

commit f82107950e9bda3779610e37bdfdccae6fc16f87
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Jan 29 16:55:07 2015 +0000

    drm/i915: Fix a use-after-free in intel_execlists_retire_requests
*** Bug 88790 has been marked as a duplicate of this bug. ***
*** Bug 88845 has been marked as a duplicate of this bug. ***
*** Bug 88688 has been marked as a duplicate of this bug. ***
*** Bug 88840 has been marked as a duplicate of this bug. ***
*** Bug 89000 has been marked as a duplicate of this bug. ***
*** Bug 89005 has been marked as a duplicate of this bug. ***
*** Bug 88817 has been marked as a duplicate of this bug. ***
All the dupes have the same bisected bad commit.
*** Bug 88987 has been marked as a duplicate of this bug. ***
Note, when you test and verify this bug, please have a look at what the steps to reproduce were in the duplicate bugs, and see if they are truly fixed too. Thanks.
Tested on the latest drm-intel-nightly kernel; this issue still exists.

Test commit ad95125eaef18eebb9f47261ce3c99957f5953de
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Feb 9 21:31:04 2015 +0100

    drm-intel-nightly: 2015y-02m-09d-20h-26m-16s UTC integration manifest
Tested on the latest -nightly kernel: gem_concurrent_blit/cpu-bcs-early-read-forked-hang-blt also causes a system hang. With i915.enable_ppgtt=0 added, the hang does not occur.
Tested some gem_concurrent_blit*hang* cases on BDW and BSW; they all cause a system hang and bisect to the same commit.
I've submitted a fix for this issue: https://patchwork.kernel.org/patch/5819071/
Nick, for tracking purposes we keep the bugs open until we've merged the patches upstream. lu hua, please test Nick's patch.
(In reply to Nick Hoath from comment #23)
> I've submitted a fix for this issue:
> https://patchwork.kernel.org/patch/5819071/

Applied this patch on the latest -nightly kernel and tested ./gem_evict_everything --run-subtest major-hang on BDW and BSW; it works well. I will test the duplicate bugs later.
It also impacts SKL.
Nick's latest patch is at http://patchwork.freedesktop.org/patch/42508 Please test this, also against the tests in the duplicates.
(In reply to Jani Nikula from comment #27)
> Nick's latest patch is at

Another update, http://patchwork.freedesktop.org/patch/42729

> Please test this, also against the tests in the duplicates.
Fixed by

commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

in drm-intel-fixes.
(In reply to Jani Nikula from comment #29)
> Fixed by commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a Author: Nick Hoath
> <nicholas.hoath@intel.com> Date: Thu Feb 19 16:30:47 2015 +0000
> drm/i915: Fix a use after free, and unbalanced refcounting in
> drm-intel-fixes.

I tested with kernel commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a.

./gem_evict_everything --run-subtest swapping-hang
still causes a system hang.

./gem_evict_everything --run-subtest major-hang
reports an error claiming that it needs 6G of free memory for this case. The return value is 77.
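A note on the return value: 77 is, as far as IGT's conventions go, the exit status a test uses to report SKIP, which matches the memory-requirement message rather than a failure or hang. The sketch below shows one way a wrapper script might classify it; the mapping covers only the two values relevant here, and `classify_exit` is a hypothetical helper, not part of IGT.

```python
# Exit statuses assumed from IGT's conventions: 0 = success, 77 = skip.
# Any other value is deliberately left unclassified in this sketch.
EXIT_MEANINGS = {0: "success", 77: "skip"}

def classify_exit(code: int) -> str:
    """Map a test's exit status to a coarse result label."""
    return EXIT_MEANINGS.get(code, "other (failure, timeout, or crash)")

print(classify_exit(77))
```

A skipped subtest therefore says nothing about whether the hang is fixed; it only means the test did not run on that machine.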
Hi,
What's the failure rate you're getting for
./gem_evict_everything --run-subtest swapping-hang ?

I've just run it on BDW with the latest nightly, and I'm getting a 1/10 failure due to paging request BUG, with nothing to indicate 6d3d8274bc45de4babb62d64562d92af984dd238 is the cause.

I can't run ./gem_evict_everything --run-subtest major-hang as there isn't enough memory on my system.
(In reply to Nick Hoath from comment #31)
> Hi,
> What's the failure rate you're getting for
> ./gem_evict_everything --run-subtest swapping-hang ?
>
> I've just run it on BDW with the latest nightly, and I'm getting a 1/10
> failure due to paging request BUG, with nothing to indicate
> 6d3d8274bc45de4babb62d64562d92af984dd238 is the cause.
>
> I can't run ./gem_evict_everything --run-subtest major-hang as there
> isn't enough memory on my system.

./gem_evict_everything --run-subtest swapping-hang
I tried it with nightly commit 855932144a48a66081a62288bea6f2bbbf48e2e7 (2015-02-28) on BDW and 0b2a1076c5cb4f383d6a8c940ffab1e27f241097 (2015-02-25) on BSW; it reproduces 100% of the time.

./gem_evict_everything --run-subtest major-hang
This case needs 6G of free memory, and I don't have a machine with that much memory available. However, this case could be run before (refer to comment 25), and the machine I use is the same one Lu Hua used. What's the reason for this increase in the memory requirement?
Hi,
I'm investigating the lockup I see, but please can I have your kernel console output when the hang occurs from:
./gem_evict_everything --run-subtest swapping-hang
to see if it's a different problem.
FWIW the hang I am investigating still occurs without 6d3d8274bc45de4babb62d64562d92af984dd238, so it will need a new bug if the kernel console output matches.
Created attachment 114019 [details] console output---call trace from running case to hang
(In reply to wendy.wang from comment #35)
> Created attachment 114019 [details]
> console output---call trace from running case to hang

This log is from testing the 0b2a1076c5cb4f383d6a8c940ffab1e27f241097 (2015-02-25) drm-intel-nightly kernel.
This latest trace is the same problem I'm investigating. It pre-dates 6d3d8274bc45de4babb62d64562d92af984dd238. The original bug introduced in 6d3d8274bc45de4babb62d64562d92af984dd238 is fixed and the fix upstreamed, and as such I am closing this bug as fixed. I have created bug 89441 to track the newly reported (pre-existing) issue.
Moving old bug from Verified to Closed.