Bug 107899

Summary: ANV driver doesn't release memory
Product: DRI Reporter: Yurii Kolesnykov <root>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: fdsfgs, intel-gfx-bugs, jason, leozinho29_eu, mark.a.janes, pmenzel+bugs.freedesktop.org, root, ryao
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=109389
Whiteboard: Triaged
i915 platform: SKL i915 features:
Attachments:
Description Flags
Top 10 stacks with outstanding allocations:
none
Dump of long lived i915_request object
none
Dump of intel_engine_cs to match i915_request dump
none
slabtop, /proc/slabinfo, wc -l proposed and one screenshot
none
DRM syncobj fence leak fix
none
Logs and screenshot none

Description Yurii Kolesnykov 2018-09-11 08:19:25 UTC
It was discoverd that ANV doesn't free up allocated memory, original discussion: https://github.com/doitsujin/dxvk/issues/632

Affected configuration: ANV + DXVK + Wine.
Comment 1 Yurii Kolesnykov 2018-09-11 08:21:14 UTC
Created attachment 141521 [details]
Top 10 stacks with outstanding allocations:
Comment 2 Jason Ekstrand 2018-09-11 14:23:55 UTC
From the github issue, it sounded like these allocations are never freed even after closing the app; is this correct?
Comment 3 leozinho29_eu 2018-09-11 15:00:36 UTC
Yes, this is correct. Even closing the application the memory is not freed. If the application is opened again it does not use that memory it used before, it allocates new memory to itself.

Note: I'm FurretUber on GitHub.
Comment 4 leozinho29_eu 2018-09-14 15:21:10 UTC
Note: Dota 2 is causing GPU hangs, I have to set INTEL_DEBUG=nohiz to make it work.

It seems it's not a DXVK-only issue. Dota 2 has, after 12 hours, allocated 719 MB of i915_request cache too. What I have noticed is that it was not allocating any i915_request cache for 20 minutes, then it started to allocate the memory too but at rates much slower than the DXVK applications.

As a comparison, Forsaken Castle, in the same 20 minutes, allocated 213 MB.

I already tried to debug this (see the GitHub issue) using perf and the only thing that looked strange in the DXVK application was one of the variables having values way off compared to the value of the variable in other applications. Using perf to debug @<i915_gem_do_execbuffer+3647> and looking at the values of in_fence and fences: 

 Forsaken_Castle 16943 [003] 17521.309896: probe:i915_gem_do_execbuffer: (ffffffffc07690af) in_fence=0x0 fences=0xffff952366b7bb00
            Xorg  2215 [000] 17521.309986: probe:i915_gem_do_execbuffer: (ffffffffc07690af) in_fence=0x0 fences=0x2d28
        rolc.exe 15796 [001] 17521.311725: probe:i915_gem_do_execbuffer: (ffffffffc07690af) in_fence=0x0 fences=0x1d8


I can do more tests if needed.
Comment 5 Chris Wilson 2018-09-14 20:39:48 UTC
The accumulation of i915_request implies there is a fence leak. Assuming it is not internal (an unmatched dma_fence_get/dma_fence_put), all userspace owners would be tied to an fd and eventually one would notice the fd exhaustion (after a few million depending on rlimit). But for the fd to stick around requires the process to be kept alive, which would imply the fence fd being passed to a display server. I don't think that is how fences are handled under X, which makes the likelihood of it being a singular userspace fence leak less likely.

'ls -1 /proc/$suspect/fd/ | wc -l' might be interesting to watch.
Comment 6 Jason Ekstrand 2018-09-15 22:06:47 UTC
I suspect this is some sort of fence leak with syncobj which would explain why only anv hits it (the GL driver doesn't use syncobj yet).
Comment 7 Richard Yao 2018-09-16 16:14:47 UTC
> The accumulation of i915_request implies there is a fence leak. Assuming it is not internal (an unmatched dma_fence_get/dma_fence_put), all userspace owners would be tied to an fd and eventually one would notice the fd exhaustion (after a few million depending on rlimit).

It is not tied to a file descriptor because I915_EXEC_FENCE_OUT is not set in args->flags (that was worked out by working backward from a perf trace). This means that DEFINE_DRM_GEM_FOPS->drm_release is never called, and we never get dma_fence_put() from this (hypothetical) stack:

dma_fence_put
drm_syncobj_free
kref_put
drm_syncobj_put
drm_syncobj_release_handle
drm_syncobj_release
drm_release

The trace at github indirectly shows that out_fence_fd == -1:

https://github.com/doitsujin/dxvk/issues/632#issuecomment-420485691

Also, my system is also affected by this. I have a Xeon E3-1276v3. I am running Gentoo with Linux 4.18.0-rc8, Xorg 1.19.5, mesa 18.1.6 and vulkan-loader 1.1.77.0. I have killed the Xorg server and the i915_request objects were not freed from the SLAB cache. This implies that the objects are not tied to a file descriptor.
Comment 8 Richard Yao 2018-09-16 16:25:42 UTC
Disregard that remark. That path involves sync_file_fops() and it has a matching dma_fence_get()/dma_fence_put() in sync_file_create()/sync_file_release().
Comment 9 Richard Yao 2018-09-16 17:46:37 UTC
Created attachment 141589 [details]
Dump of long lived i915_request object

I captured one of these long lived SLAB objects from my system using the crash utility. I am not a graphics developer, but here is what stood out to me:

1. The dma_fence refcount is 1.
2. The dma_fence segno is 82576060.
3. The dma_fence ops is i915_fence_ops.
4. The global_seqno is 82576036.

I'll post my capture of the referenced intel_engine_cs next, although the interesting thing there is that the timeline seqno is 82620551, which implies that this particular object is old.
Comment 10 Richard Yao 2018-09-16 17:47:03 UTC
Created attachment 141590 [details]
Dump of intel_engine_cs to match i915_request dump
Comment 11 leozinho29_eu 2018-09-17 17:09:14 UTC
Created attachment 141603 [details]
slabtop, /proc/slabinfo, wc -l proposed and one screenshot

I'm sorry for not being more active on this report before.

The attached file has three text files and one image.

I tested playing Forsaken Castle while watching ls -1 /proc/$suspect/fd/ | wc -l (on my case, $suspect=24003), the file is called proc24003fdwcl.txt. I create it by using:

while true ; do ls -1 /proc/24003/fd/ | wc -l >> proc24003fdwcl.txt ; sleep 2; done

There is /proc/slabinfo and the output of slabtop -s c too. There is iGVT-g load in the logs because I was using a Virtual Machine when tested this (it's not Forsaken Castle with iGVT-g load, don't worry).

The screenshot is the end of the game demo, with an approximate of the time I played it. Notice that in ~12 minutes (I needed some seconds to set up the logging) it allocated around 110 MB of memory.
Comment 12 Jason Ekstrand 2018-09-18 08:16:01 UTC
Created attachment 141619 [details] [review]
DRM syncobj fence leak fix

Mind trying a kernel patch?  I'm not in a position to experiment with kernel patches at the moment but I think I found the bug.
Comment 13 Richard Yao 2018-09-18 12:41:29 UTC
I am able to patch my kernel and rebuild to test, but I will not have access to the workstation that reproduces the problem for another 12 hours.
Comment 14 leozinho29_eu 2018-09-18 14:40:53 UTC
Created attachment 141641 [details]
Logs and screenshot

Great news! Looks like the memory leak is no longer happening or is very, very slow (see later). I played Forsaken Castle demo for 10 minutes, a game I became skilled thanks to this bug, and the amount of pages started at 100 objects and finished at 100 objects. The attached file has dmesg, /proc/slabinfo and a screenshot.

Comparing to the previous situation, where 12 minutes playing allocated 110 MB, it's perfect. I'm not sure if it's placebo effect, but looked like the game was running faster too, 125-130 FPS instead of 100-120 FPS.

Later I started Dota 2 (INTEL_DEBUG=nohiz used due to the hang and crash bug) and the i915_request object count rose to 650, when I closed it reduced to 350. I restarted X session and there is no sign of i915_request on slabtop, /proc/slabinfo has:

i915_request         350    450    640   25    4 : tunables    0    0    0 : slabdata     18     18      0

The values seem pretty small but it did not reduce restarting X, not sure how relevant this is.

This patch looks great.
Comment 15 Richard Yao 2018-09-18 16:25:54 UTC
We would need to dump the objects from the slab cache with crash to confirm that the objects are long lived and should not be there, but it sounds like there might be a small leak remaining. Unfortunately, I do not have a handy method available for dumping all objects in a slab cache via the crash utility. The best that i have done so far is get a list of addresses of objects and then dumped them each manually.
Comment 16 leozinho29_eu 2018-09-18 18:18:41 UTC
I could test a bit further. I opened another DXVK game (DiRT 3 Complete Edition) using INTEL_DEBUG=nohiz too, let one replay playing for one hour and checked the i915_request again. It rose up to 575 objects and when I closed it it kept that value.

What I noticed is that after I opened a Wine OpenGL game (rolc.exe) the value reduced to 100 again, the same value that was at boot.

Looks like the leak as before no longer exists, as that cache was cleaned by the OpenGL application. Before this patch the cache was never cleaned, requiring a reboot.

The curious thing is that the Vulkan/DXVK applications aren't cleaning part of the i915_request cache after they close, which the OpenGL application is doing for them.
Comment 17 Richard Yao 2018-09-19 17:55:54 UTC
Jason, applying your patch against Linux 4.18.0-rc8 (yes, I know I should upgrade) resolves the issue in Rise of Nations: Extended Edition. In my cursory test, i915_request allocations do not exceed 163 and alt-tabbing back to the KDE 5 desktop drops them to 15. Prior to the patch, it would have thousands of allocations by now. I do not see the issue that leozinho29_eu reported, although I did not try `INTEL_DEBUG=nohiz`.

Your patch looks good to me. Feel free to add my Reviewed-by and my Tested-by. To be clear, I know what saying to add my Reviewed-by means. I followed the "Reviewer’s statement of oversight" before offering it:

https://www.kernel.org/doc/html/v4.18/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
Comment 18 Richard Yao 2018-09-19 19:13:19 UTC
This might just be my lack of familiarity with the codebase, but why is ANV affected by this while RADV is not? I do not have any hardware to use to study how RADV works, but at a glance, it is relatively easy to see the fence bits being handled in the i915 GEM code while I am unable to tell at a glance how the amdgpu GEM code uses them. It seems to be done in a very abstract way in the amdgpu_cs.c file and it is not clear to me how that gets invoked by RADV. I am curious if the difference might point to the possibility that ANV is overly aggressive at fencing. I believe that DXVK should be making the same API calls on both.
Comment 19 Jason Ekstrand 2018-09-26 07:28:53 UTC
I've submitted the kernel patch to the mailing list.  Hopefully, it will land fairly soon and we'll make sure it gets back-ported as far as needed.
Comment 20 Jason Ekstrand 2018-09-27 07:49:06 UTC
The fix has landed in drm-misc-fixes; it will propagate to a kernel release near you shortly.
Comment 21 Lakshmi 2018-10-15 08:28:03 UTC
Yurii, do you still have the issue? Based on your confirmation, I can close the bug.
Comment 22 Yurii Kolesnykov 2018-10-15 09:11:52 UTC
Hi Lakshmi!
I just forwarded issue here, I even don't have Linux desktop now. I had asked on same question also on GH.
Comment 23 Yurii Kolesnykov 2018-10-15 15:49:18 UTC
In the original report users say that this was fixed.
Comment 24 Lakshmi 2018-10-16 06:16:18 UTC
Thanks for your feedback Yurii. I consider this bug has been fixed. Closing this bug.
Comment 25 Mark Janes 2019-01-29 17:02:53 UTC
*** Bug 108456 has been marked as a duplicate of this bug. ***

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.