Summary: | [BSW/SKL ppgtt Bisected]igt/gem_evict_everything/major-hang causes system hang
---|---
Product: | DRI
Component: | DRM/Intel
Status: | CLOSED FIXED
Severity: | critical
Priority: | highest
Version: | unspecified
Hardware: | All
OS: | Linux (All)
Reporter: | lu hua <huax.lu>
Assignee: | Nick Hoath <nicholas.hoath>
QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: | eero.t.tamminen, intel-gfx-bugs, james.ausmus, jeff.zheng, valtteri.rantala

Description
lu hua
2015-01-21 06:43:21 UTC
gem_evict_everything/minor-hang also has this issue.

*** Bug 88655 has been marked as a duplicate of this bug. ***

It also happens on BDW.

./gem_evict_everything --run-subtest swapping-hang also causes system hang on SNB.

(In reply to lu hua from comment #4)
> ./gem_evict_everything --run-subtest swapping-hang also causes system hang
> on SNB.

I doubt it is the same bug. Please file it separately and we can dup if it does match.

(In reply to Chris Wilson from comment #5)
> (In reply to lu hua from comment #4)
> > ./gem_evict_everything --run-subtest swapping-hang also causes system hang
> > on SNB.
>
> I doubt it is the same bug. Please file it separately and we can dup if it
> does match.

OK, report bug 88821 to track it on SNB, Thanks.

Report bug 89000 to track BDW.

Test on BSW with i915.enable_ppgtt=0, it works well.

Bisect shows: 6d3d8274bc45de4babb62d64562d92af984dd238 is the first bad commit.

commit 6d3d8274bc45de4babb62d64562d92af984dd238
Author: Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000
Commit: Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Tue Jan 27 09:50:53 2015 +0100

drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

Move all remaining elements that were unique to execlists queue items in to the associated request.

Issue: VIZ-4274

v2: Rebase. Fixed issue of overzealous freeing of request.
v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
v4: Rebase.
v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix pointer in __i915_add_request (found by Thomas Daniel)
v6: Removed unrelated changes

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Reformat comment with strange linebreaks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Please retest with current drm-intel-nightly that has

commit f82107950e9bda3779610e37bdfdccae6fc16f87
Author: Nick Hoath <nicholas.hoath@intel.com>
Date: Thu Jan 29 16:55:07 2015 +0000

drm/i915: Fix a use-after-free in intel_execlists_retire_requests

*** Bug 88790 has been marked as a duplicate of this bug. ***

*** Bug 88845 has been marked as a duplicate of this bug. ***

*** Bug 88688 has been marked as a duplicate of this bug. ***

*** Bug 88840 has been marked as a duplicate of this bug. ***

*** Bug 89000 has been marked as a duplicate of this bug. ***

*** Bug 89005 has been marked as a duplicate of this bug. ***

*** Bug 88817 has been marked as a duplicate of this bug. ***

All the dupes have the same bisected bad commit.

*** Bug 88987 has been marked as a duplicate of this bug. ***

Note, when you test and verify this bug, please have a look at what the steps to reproduce were in the duplicate bugs, and see if they are truly fixed too. Thanks.

Test on the latest drm-intel-nightly kernel, this issue still exists.
Test commit ad95125eaef18eebb9f47261ce3c99957f5953de
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Feb 9 21:31:04 2015 +0100

drm-intel-nightly: 2015y-02m-09d-20h-26m-16s UTC integration manifest

Test on the latest -nightly kernel, gem_concurrent_blit/cpu-bcs-early-read-forked-hang-blt also causes system hang, add i915.enable_ppgtt=0, it doesn't have hang issue.

Test some gem_concurrent_blit*hang* cases on BDW and BSW, they all cause system hang and have the same bisect commit.

I've submitted a fix for this issue: https://patchwork.kernel.org/patch/5819071/

Nick, for tracking purposes we keep the bugs open until we've merged the patches upstream.

lu hua, please test Nick's patch.

(In reply to Nick Hoath from comment #23)
> I've submitted a fix for this issue:
> https://patchwork.kernel.org/patch/5819071/

Apply this patch on the latest -nightly kernel.
Test ./gem_evict_everything --run-subtest major-hang on BDW and BSW, it works well. I will test the duplicate bugs later.

It also impacts SKL.

Nick's latest patch is at http://patchwork.freedesktop.org/patch/42508

Please test this, also against the tests in the duplicates.

(In reply to Jani Nikula from comment #27)
> Nick's latest patch is at

Another update, http://patchwork.freedesktop.org/patch/42729

> Please test this, also against the tests in the duplicates.

Fixed by

commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a
Author: Nick Hoath <nicholas.hoath@intel.com>
Date: Thu Feb 19 16:30:47 2015 +0000

drm/i915: Fix a use after free, and unbalanced refcounting

in drm-intel-fixes.

(In reply to Jani Nikula from comment #29)
> Fixed by commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a Author: Nick Hoath
> <nicholas.hoath@intel.com> Date: Thu Feb 19 16:30:47 2015 +0000
> drm/i915: Fix a use after free, and unbalanced refcounting in
> drm-intel-fixes.

I tested with kernel commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a.
./gem_evict_everything --run-subtest swapping-hang still causes a system hang.
./gem_evict_everything --run-subtest major-hang reports an error, claiming that it needs 6G of free memory for this case. The return code is 77.

Hi,
What's the failure rate you're getting for ./gem_evict_everything --run-subtest swapping-hang ?

I've just run it on BDW with the latest nightly, and I'm getting a 1/10 failure due to paging request BUG, with nothing to indicate 6d3d8274bc45de4babb62d64562d92af984dd238 is the cause.

I can't run ./gem_evict_everything --run-subtest major-hang as there isn't enough memory on my system.

(In reply to Nick Hoath from comment #31)
> Hi,
> What's the failure rate you're getting for
> ./gem_evict_everything --run-subtest swapping-hang ?
>
> I've just run it on BDW with the latest nightly, and I'm getting a 1/10
> failure due to paging request BUG, with nothing to indicate
> 6d3d8274bc45de4babb62d64562d92af984dd238 is the cause.
>
> I can't run ./gem_evict_everything --run-subtest major-hang as there
> isn't enough memory on my system.

./gem_evict_everything --run-subtest swapping-hang
I tried it with nightly commit 855932144a48a66081a62288bea6f2bbbf48e2e7 (2015-02-28) on BDW and 0b2a1076c5cb4f383d6a8c940ffab1e27f241097 (2015-02-25) on BSW; the reproduction rate is 100%.

./gem_evict_everything --run-subtest major-hang
This case needs 6G of free memory, and I don't have a machine with that much memory available. However, this case could be run before (refer to comment 25), and the machine I use is the same one Lu Hua used. What is the reason for this increase in the memory requirement?

Hi,
I'm investigating the lockup I see, but please can I have your kernel console output when the hang occurs from:
./gem_evict_everything --run-subtest swapping-hang
to see if it's a different problem.
FWIW the hang I am investigating still occurs without 6d3d8274bc45de4babb62d64562d92af984dd238, so it will need a new bug if the kernel console output matches.

Created attachment 114019
console output---call trace from running case to hang
(In reply to wendy.wang from comment #35)
> Created attachment 114019
> console output---call trace from running case to hang

This log is from testing the 0b2a1076c5cb4f383d6a8c940ffab1e27f241097 (2015-02-25) drm-intel-nightly kernel.

This latest trace is the same problem I'm investigating. It pre-dates 6d3d8274bc45de4babb62d64562d92af984dd238. The original bug introduced in 6d3d8274bc45de4babb62d64562d92af984dd238 is fixed and the fix upstreamed, and as such I am closing this bug as fixed. I have created bug 89441 to track the newly reported (pre-existing) issue.

Moving old bug from Verified to Closed.
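
A note on the return code 77 reported for major-hang earlier in this thread: igt tests exit with 77 when a subtest is skipped because a requirement is not met (here, the free memory check), so that result is a skip rather than a test failure. A minimal shell sketch of telling the two apart follows. `classify_igt_status` is a hypothetical helper, not part of igt, and treating every other non-zero status as a failure is a simplifying assumption.

```shell
# Hypothetical helper (not part of igt): map a test's exit status to a verdict.
# Assumption: 0 is a pass, 77 is a skip, anything else is lumped in as a failure.
classify_igt_status() {
    case "$1" in
        0)  echo "pass" ;;
        77) echo "skip" ;;
        *)  echo "fail" ;;
    esac
}

# Example use, left commented out because it needs the igt binary and a test machine:
#   ./gem_evict_everything --run-subtest major-hang
#   classify_igt_status "$?"
```

This only covers runs that return to the shell. A system hang, as reported for swapping-hang, never produces an exit status and has to be detected from the console output or a watchdog instead.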