Created attachment 126886 [details]
I might have spoken too soon about the memory manager patches; I'm seeing a stack trace just as the machine is about to switch off.
Also it takes about 30 seconds to switch off my laptop now. I think it's amdgpu related: it seems to wait, then fire up the card, then switch off - it could also be hard disk or even systemd related though.
I'm attaching the screenshot, but it looks like an issue with ttm_bo_force_list_clean.
Sorry about the bad quality, but I had to record a video in slow motion to capture it, then screenshot that.
Does cherry-picking this patch over help?
Yes that fixes it
I've been having a more and more difficult time testing things of late: there have been quite a few regressions and I've been carrying more and more patches across various branches - let's hope the next cycle will be better.
What's your handle on IRC?
(In reply to Mike Lothian from comment #2)
> Yes that fixes it
> I've been having a more and more difficult time testing things of late:
> there have been quite a few regressions and I've been carrying more and more
> patches across various branches - let's hope the next cycle will be better.
Well, bug fixes go to -fixes and new features go to -next. If you want everything, you'd need to merge -fixes into -next.
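For anyone following along, here is a minimal sketch of what "merge -fixes into -next" looks like in practice. It runs in a throwaway repository, and the branch names demo-fixes and demo-next are placeholders I made up for illustration, not the real branch names.

```shell
#!/bin/sh
# Toy repository only: demo-fixes and demo-next are placeholder branch names,
# not the real kernel branches.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.com"
git config user.name "Tester"

# Common base commit shared by both branches.
echo base > base.txt
git add base.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# A bug fix lands on the fixes branch.
git checkout -q -b demo-fixes
echo fix > fix.txt
git add fix.txt
git commit -q -m "bug fix"

# A new feature lands on the next branch, started from the same base.
git checkout -q -b demo-next "$base"
echo feature > feature.txt
git add feature.txt
git commit -q -m "new feature"

# Merge fixes into next so the tree under test carries both.
git merge -q --no-edit demo-fixes
```

After the merge, the demo-next checkout contains both the fix and the feature, which is the "everything" tree described above.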
> What's your handle on IRC?
Sorry, I spoke too soon: the issue is still there, it's just more difficult to see as the reboot is so quick now.
Maybe a different issue but I've just started getting shutdown issues with agd5f drm-next-4.9-wip
It seems the monitor blanks early so I don't get to see anything - just with halt it doesn't power off.
On the current kernel, reverting
drm/amdgpu: always apply pci shutdown callbacks (v2)
apparently fixes it, but it's not that simple. I first saw the issue on the 25th, but after the branch's next update it went away, so I thought it was fixed. It re-appeared with more recent updates.
Unfortunately it seems my recent working kernel (26th) has the above commit - so maybe there is some interaction/timing issue with something else.
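A rough sketch of the revert test described above, for anyone wanting to repeat it: look the commit up by its subject line, then revert it on top of the current branch. This runs in a throwaway repository with made-up file contents; only the commit subject is taken from this report.

```shell
#!/bin/sh
# Toy repository only: the files and the other commits are invented for the
# demo; only the commit subject comes from this bug report.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.com"
git config user.name "Tester"

subject="drm/amdgpu: always apply pci shutdown callbacks (v2)"

echo base > driver.txt
git add driver.txt
git commit -q -m "base"

# Stand-in for the suspect commit.
echo "shutdown callback" >> driver.txt
git commit -q -am "$subject"

# A later, unrelated change on top of it.
echo later > other.txt
git add other.txt
git commit -q -m "later unrelated change"

# Find the suspect commit by its subject and revert just that commit.
commit=$(git log --fixed-strings --grep="$subject" --format=%H -n 1)
git revert --no-edit "$commit" >/dev/null
```

The lookup has to happen before the revert, because the revert commit's own message quotes the same subject and would match the search too.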
I'm still seeing this issue on the 4.9-wip branch and that has this patch included:
@@ -1708,11 +1708,11 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
DRM_INFO("amdgpu: finishing device.\n");
adev->shutdown = true;
/* evict vram memory */
r = amdgpu_fini(adev);
Created attachment 127331 [details]
OK I followed the advice you gave in the other bug about compiling amdgpu as a module and got the following dmesg using
modprobe -r amdgpu && dmesg > dmesg && sync
Created attachment 127340 [details]
After I issue the modprobe -r amdgpu command the system entirely freezes up
I took a screenshot of the final messages - could this be TTM related?
Created attachment 127341 [details]
This captures the BUG that freezes up the system
Created attachment 127565 [details]
The first stack trace in the dmesg is the same, the one captured after the system freezes up is slightly different
Created attachment 127566 [details]
I've tested this again with the latest drm-next-4.10-wip branch and I still get the same errors
Created attachment 128355 [details] [review]
Does this patch help?
It helps the original issue where I saw a panic / stack trace on shutdown and shutdown took a while - so that's great news
I've retested compiling amdgpu as a module and modprobe -r(ing) it - this still kills my machine, would you be interested in me taking more diagnostics? Or can that now be considered a separate bug?
(In reply to Mike Lothian from comment #16)
> It helps the original issue where I saw a panic / stack trace on shutdown
> and shutdown took a while - so that's great news
> I've retested compiling amdgpu as a module and modprobe -r(ing) it - this
> still kills my machine, would you be interested in me taking more
> diagnostics? Or can that now be considered a separate bug?
Separate bug. With this patch, the two code paths (module unload and shutdown) are now separate.
*** Bug 98638 has been marked as a duplicate of this bug. ***
Created attachment 128372 [details] [review]
Does this patch also work?
So I removed your previous patch and applied the new one; I get a panic on shutdown again.