Created attachment 118732: journalctl -b-1

HSW 4770: instant freezes for about a week now with drm-intel-nightly, so far during ordinary browsing with Chromium (full HW acceleration). No indication of what specifically causes it. journalctl also shows an earlier WARN_ON_ONCE(!ppgtt), but only since yesterday, so it is probably unrelated.
Also got it on a 4200U without Chromium running. The freeze is as total as it is instant; most of the time journalctl can't write the trace to disk.
Created attachment 118793 (patch): Check for unpin_work under the spinlock

Looks like the relevant information is in drm.debug=1, so try capturing the error dmesg with, say, drm.debug=3.
It would also be great if it was bisectable :)
So shall I check with your patch applied, or as is? I'll do a bisect once this is reproducible, otherwise it'll never finish.
I think the patch should mask the issue - if I have understood the basic mechanics of the oops. If you have the time to bisect (thanks in advance!) do so without the patch as that should make it easier to trigger.
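For readers unfamiliar with the pattern named in the title of attachment 118793 ("Check for unpin_work under the spinlock"), here is a minimal userspace sketch. The patch itself is not reproduced in this thread, so this is not its code: the struct names, the toy spinlock built on C11 atomic_flag, and both helper functions are hypothetical stand-ins. The only point illustrated is that the pointer is read and tested while the lock is held, so the result of the check cannot be invalidated by another path clearing the pointer in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending page-flip work item. */
struct flip_work {
    int id;
};

/* Hypothetical stand-in for per-CRTC state. */
struct crtc_state {
    atomic_flag lock;           /* toy spinlock, NOT the kernel spinlock_t */
    struct flip_work *pending;  /* plays the role of an unpin_work pointer */
};

static void toy_lock(atomic_flag *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void toy_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Queue new work only if nothing is pending; the check is made under the lock. */
static bool queue_work(struct crtc_state *c, struct flip_work *w)
{
    bool queued = false;

    assert(w != NULL);
    toy_lock(&c->lock);
    if (c->pending == NULL) {
        c->pending = w;
        queued = true;
    }
    toy_unlock(&c->lock);
    return queued;
}

/* Detach and return the pending work (or NULL), again under the lock. */
static struct flip_work *take_pending(struct crtc_state *c)
{
    struct flip_work *w;

    toy_lock(&c->lock);
    w = c->pending;
    c->pending = NULL;
    toy_unlock(&c->lock);
    return w;
}
```

An unlocked variant would test c->pending first and take the lock afterwards, which leaves a window in which a completion path could clear or free the work item.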
Created attachment 118795: journalctl with drm.debug=3

Here's one journalctl output with drm.debug=3, kernel as it was before. Not sure if it's any good, as the actual crash didn't make it to disk again. It happened while running intel_gpu_top, with compton in the background, and frantically resizing the window of a 4K (H.264) video in Chromium. Like this: https://i.imgur.com/rKJ4zCj.jpg (The corrupted parts in the screenshot have been there for months. They're visible only when resizing a video, change as the window size changes, and blink about once or twice per second.)
Don't run intel_gpu_top; it will hard hang your machine (eventually).
For what it's worth, I didn't see the telltale I was looking for in the drm.debug=3 dmesg (but I also presume that it was the hard lockup from intel_gpu_top).
Seems like that was intel_gpu_top then; resizing alone doesn't appear to cause a freeze. Back to random chance then.
Tried running four 4K videos in parallel with stress-ng -c 12 loitering in the bg, still wouldn't trigger it. So… until I happen on how to reproduce it, I'll stop running with drm.debug=3. Using that for hours just doesn't sound all that healthy for my SSD.
Ok, run with

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 71d7298648e0..850b11351c03 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11400,7 +11400,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		 * the hardware completed the operation behind our backs.
 		 */
 		if (__intel_pageflip_stall_check(dev, crtc)) {
-			DRM_DEBUG_DRIVER("flip queue: previous flip completed, continuing\n");
+			DRM_ERROR("flip queue: previous flip completed, continuing\n");
 			page_flip_completed(intel_crtc);
 		} else {
 			DRM_DEBUG_DRIVER("flip queue: crtc already busy\n");

and let's see if that crops up just before the fatal oops.
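As a note on why the change to intel_crtc_page_flip helps: DRM_DEBUG_DRIVER output is only logged when the driver bit is set in the drm.debug mask, whereas DRM_ERROR output is logged regardless of that mask, so the message would show up without running with drm.debug=3 for hours. The sketch below models that gating only. The bit values match the DRM_UT_* categories, but the two helper functions are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Category bits selected by the drm.debug module parameter. */
enum {
    DRM_UT_CORE   = 0x01,
    DRM_UT_DRIVER = 0x02,
    DRM_UT_KMS    = 0x04,
    DRM_UT_PRIME  = 0x08,
};

/* Hypothetical helper: is a debug message of this category logged? */
static bool debug_msg_visible(unsigned int drm_debug, unsigned int category)
{
    assert(category != 0);
    return (drm_debug & category) != 0;
}

/* Hypothetical helper: error messages are not gated by drm.debug. */
static bool error_msg_visible(unsigned int drm_debug)
{
    (void)drm_debug;  /* deliberately ignored */
    return true;
}
```

With drm.debug=3 (core plus driver) the DRM_DEBUG_DRIVER line would be logged; with drm.debug=0 or drm.debug=1 it would not, while the DRM_ERROR version is logged in every case.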
Haven't encountered it for about two days now (assuming the last two cases were indeed from intel_gpu_top); maybe it was fixed. I'll look out for it for another week, then I guess this can be closed.
Not encountered anymore, so apparently fixed en passant.
Closing as resolved+worksforme, as set by the reporter, after one year without comments.