Summary: | [all Bisected]igt/drv_module_reload causes major memory corruption and system hang | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | lu hua <huax.lu> | ||||
Component: | DRM/Intel | Assignee: | Matt Roper <matthew.d.roper> | ||||
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Severity: | critical | ||||||
Priority: | high | CC: | conselvan2, intel-gfx-bugs | ||||
Version: | unspecified | ||||||
Hardware: | All | ||||||
OS: | Linux (All) | ||||||
Whiteboard: | |||||||
i915 platform: | i915 features: | ||||||
Attachments: |
|
Description
lu hua
2015-01-15 02:33:53 UTC
Big-bada-boom. Please, please bisect. ea2c67bb4affa84080c616920f3899f123786e56 is the first bad commit commit ea2c67bb4affa84080c616920f3899f123786e56 Author: Matt Roper <matthew.d.roper@intel.com> Date: Tue Dec 23 10:41:52 2014 -0800 drm/i915: Move to atomic plane helpers (v9) Switch plane handling to use the atomic plane helpers. This means that rather than provide our own implementations of .update_plane() and .disable_plane(), we expose the lower-level check/prepare/commit/cleanup entrypoints and let the DRM core implement update/disable for us using those entrypoints. The other main change that falls out of this patch is that our drm_plane's will now always have a valid plane->state that contains the relevant plane state (initial state is allocated at plane creation). The base drm_plane_state pointed to holds the requested source/dest coordinates, and the subclassed intel_plane_state holds the adjusted values that our driver actually uses. v2: - Renamed file from intel_atomic.c to intel_atomic_plane.c (Daniel) - Fix a copy/paste comment mistake (Bob) v3: - Use prepare/cleanup functions that we've already factored out - Use newly refactored pre_commit/commit/post_commit to avoid sleeping during vblank evasion v4: - Rebase to latest di-nightly requires adding an 'old_state' parameter to atomic_update; v5: - Must have botched a rebase somewhere and lost some work. Restore state 'dirty' flag to let begin/end code know which planes to run the pre_commit/post_commit hooks for. This would have actually shown up as broken in the next commit rather than this one. v6: - Squash kerneldoc patch into this one. - Previous patches have now already taken care of most of the infrastructure that used to be in this patch. All we're adding here now is some thin wrappers. v7: - Check return of intel_plane_duplicate_state() for allocation failures. v8: - Drop unused drm_plane_state -> intel_plane_state cast. (Ander) - Squash in actual transition to plane helpers. Significant refactoring earlier in the patchset has made the combined prep+transition much easier to swallow than it was in earlier iterations. (Ander) v9: - s/track_fbs/disabled_planes/ in the atomic crtc flags. The only fb's we need to update frontbuffer tracking for are those on a plane about to be disabled (since the atomic helpers never call prepare_fb() when disabling a plane), so the new name more accurately describes what we're actually tracking. Testcase: igt/kms_plane Testcase: igt/kms_universal_plane Testcase: igt/kms_cursor_crc Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> :040000 040000 a2d045dfd034bfb9ec7a5ffb799fa148ff4047c3 1f7f88d3e9fe9cbe7b8409c1d868 52c65808d7a3 M Documentation :040000 040000 7660ab19978e64dac501827a5b4b1e050e88b529 68cd800423301596e3245b0410dc 1d1a19fc3b35 M drivers (In reply to Li Xu from comment #2) > ea2c67bb4affa84080c616920f3899f123786e56 is the first bad commit > commit ea2c67bb4affa84080c616920f3899f123786e56 > Author: Matt Roper <matthew.d.roper@intel.com> > Date: Tue Dec 23 10:41:52 2014 -0800 > > drm/i915: Move to atomic plane helpers (v9) > commit ea2c67bb4affa84080c616920f3899f123786e56 Author: Matt Roper <matthew.d.roper@intel.com> AuthorDate: Tue Dec 23 10:41:52 2014 -0800 Commit: Daniel Vetter <daniel.vetter@ffwll.ch> CommitDate: Mon Jan 12 23:59:31 2015 +0100 drm/i915: Move to atomic plane helpers (v9) Please always assign to bisected bad commit author! I'm also CC'ing Ander for being the reviewer. At a very quick glance, only intel_plane_duplicate_state looks suspicious. Is plane->state always valid when non-NULL? Matt, Ander? (In reply to Jani Nikula from comment #4) > Please always assign to bisected bad commit author! I'm also CC'ing Ander > for being the reviewer. > > At a very quick glance, only intel_plane_duplicate_state looks suspicious. > Is plane->state always valid when non-NULL? > > Matt, Ander? It should, but maybe I missed something in the review. But the error checking in intel_plane_duplicate_state() and callers seems to be correct. Perhaps the bug is in the plane helpers. (In reply to Ander Conselvan de Oliveira from comment #5) > (In reply to Jani Nikula from comment #4) > > Please always assign to bisected bad commit author! I'm also CC'ing Ander > > for being the reviewer. > > > > At a very quick glance, only intel_plane_duplicate_state looks suspicious. > > Is plane->state always valid when non-NULL? > > > > Matt, Ander? > > It should, but maybe I missed something in the review. But the error > checking in intel_plane_duplicate_state() and callers seems to be correct. > Perhaps the bug is in the plane helpers. Actually the problem is somewhere with intel_plane_destroy. intel_plane_state_destroy doesn't set the plane to NULL and then something bad happens when drm_plane_cleanup tries to destroy it again. (In reply to Ander Conselvan de Oliveira from comment #6) > (In reply to Ander Conselvan de Oliveira from comment #5) > > (In reply to Jani Nikula from comment #4) > > > Please always assign to bisected bad commit author! I'm also CC'ing Ander > > > for being the reviewer. > > > > > > At a very quick glance, only intel_plane_duplicate_state looks suspicious. > > > Is plane->state always valid when non-NULL? > > > > > > Matt, Ander? > > > > It should, but maybe I missed something in the review. But the error > > checking in intel_plane_duplicate_state() and callers seems to be correct. > > Perhaps the bug is in the plane helpers. > > Actually the problem is somewhere with intel_plane_destroy. > intel_plane_state_destroy doesn't set the plane to NULL and then something > bad happens when drm_plane_cleanup tries to destroy it again. Yep, Ander's analysis looks correct. When I first wrote the patchset, the DRM core wasn't cleaning up plane state, so I had to call the state destruction in intel_plane_destroy() so that we wouldn't leak it on driver unload. But the core got updated to do the destruction before my patches actually landed and I didn't notice before this series got merged, so we're now doing a double kfree() (and possibly a double framebuffer unreference as well). The fix is to just not cleanup the plane state in the driver anymore since the core will handle it for us. I'll send a patch for that shortly. The following patch should solve the problem: [PATCH] drm/i915: Don't cleanup plane state in intel_plane_destroy() (In reply to Matt Roper from comment #8) > The following patch should solve the problem: > > [PATCH] drm/i915: Don't cleanup plane state in intel_plane_destroy() Also available at http://patchwork.freedesktop.org/patch/40566/ - please test. (In reply to Jani Nikula from comment #9) > (In reply to Matt Roper from comment #8) > > The following patch should solve the problem: > > > > [PATCH] drm/i915: Don't cleanup plane state in intel_plane_destroy() > > Also available at http://patchwork.freedesktop.org/patch/40566/ - please > test. Apply this patch on the latest -nightly kernel, the hang goes away. output shows "module successfully unloaded" but not reload, report bug 88573 to track it. I think your patch fixed this bug. commit fd7bf8b626e1f9cfb9cd5c3104dcedae2569899b Author: Matt Roper <matthew.d.roper@intel.com> Date: Fri Jan 16 07:25:24 2015 -0800 drm/i915: Don't cleanup plane state in intel_plane_destroy() Verified.Fixed on the latest -nightly kernel. [root@x-hsw27 tests]# ./drv_module_reload unbinding /sys/class/vtconsole/vtcon0/: (M) frame buffer device module successfully unloaded module successfully loaded again Closing old verified+fixed |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.