Summary: | GEM object leaks with fullscreen programs -> swap fills up + OOM kills within few hours
---|---
Product: | xorg
Component: | Driver/modesetting
Status: | VERIFIED FIXED
Severity: | critical
Priority: | high
Version: | git
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Reporter: | Eero Tamminen <eero.t.tamminen>
Assignee: | Louis-Francis Ratté-Boulianne <lfrb>
QA Contact: | Xorg Project Team <xorg-team>
CC: | lfrb
Keywords: | bisected, patch, regression
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=106136 https://bugs.freedesktop.org/show_bug.cgi?id=104760
Description
Eero Tamminen
2018-04-17 15:29:21 UTC
Also look at i915_gem_framebuffer; that will help to indicate which buffers are being leaked.

Created attachment 138901
i915_gem_framebuffer content

The framebuffer data stays stable, so that's not the problem, although the number of huge-page objects increases.

Btw. I forgot to mention that restarting compiz doesn't help either, i.e. the leak is on the server side.
Bisecting is progressing slowly because, to be sure of the leak, I'm running the tests for a few hours each time (naturally always the same set of tests in the same order), in case Mesa bug 105906 causes fluctuations in the results.

What's worse, there's no clear point at which the leak appeared; it has grown over a few weeks...

2018-03-03 git versions of Mesa, X and drm-tip kernel:
--------------------------------------------------------------------
348 objects, 197382144 bytes
109 unbound objects, 37384192 bytes
237 bound objects, 159473664 bytes
13 purgeable objects, 204800 bytes
25 mapped objects, 839680 bytes
50 huge-paged objects (2M, 64K, 4K) 184258560 bytes
29 display objects (globally pinned), 17522688 bytes
4294967296 [0x0000000010000000] gtt total
Supported page sizes: 2M, 64K, 4K

[k]contexts: 14 objects, 385024 bytes (0 active, 385024 inactive, 385024 global, 0 shared, 0 unbound)
X: 84 objects, 100442112 bytes (0 active, 133922816 inactive, 25534464 global, 42409984 shared, 618496 unbound)
X: 253 objects, 113287168 bytes (0 active, 116875264 inactive, 16834560 global, 25632768 shared, 37146624 unbound)
--------------------------------------------------------------------
-> 0.1 GB X context(s)

2018-03-07 git versions:
--------------------------------------------------------------------
458 objects, 1153622016 bytes
153 unbound objects, 440012800 bytes
303 bound objects, 713084928 bytes
12 purgeable objects, 200704 bytes
21 mapped objects, 798720 bytes
78 huge-paged objects (2M, 64K, 4K) 411471872 bytes
25 display objects (globally pinned), 17489920 bytes
4294967296 [0x0000000010000000] gtt total
Supported page sizes: 2M, 64K, 4K

[k]contexts: 12 objects, 368640 bytes (0 active, 368640 inactive, 368640 global, 0 shared, 0 unbound)
X: 202 objects, 1056759808 bytes (0 active, 713052160 inactive, 59387904 global, 998711296 shared, 403271680 unbound)
X: 251 objects, 113258496 bytes (0 active, 114884608 inactive, 16834560 global, 25632768 shared, 37134336 unbound)
--------------------------------------------------------------------
-> 1.0 GB X context (the larger one)

2018-03-12 git versions:
--------------------------------------------------------------------
612 objects, 2487406592 bytes
103 unbound objects, 37261312 bytes
295 bound objects, 838496256 bytes
13 purgeable objects, 204800 bytes
22 mapped objects, 802816 bytes
77 huge-paged objects (2M, 64K, 4K) 474124288 bytes
29 display objects (globally pinned), 17522688 bytes
4294967296 [0x0000000010000000] gtt total
Supported page sizes: 2M, 64K, 4K

[k]contexts: 14 objects, 385024 bytes (0 active, 385024 inactive, 385024 global, 0 shared, 0 unbound)
X: 358 objects, 2390556672 bytes (0 active, 845119488 inactive, 76201984 global, 2324111360 shared, 1611612160 unbound)
X: 243 objects, 113197056 bytes (0 active, 97050624 inactive, 8417280 global, 25632768 shared, 37154816 unbound)
--------------------------------------------------------------------
-> 2.2 GB X context

2018-03-15 git versions:
--------------------------------------------------------------------
894 objects, 4744323072 bytes
129 unbound objects, 272134144 bytes
465 bound objects, 2155888640 bytes
13 purgeable objects, 204800 bytes
22 mapped objects, 802816 bytes
190 huge-paged objects (2M, 64K, 4K) 1436098560 bytes
29 display objects (globally pinned), 17522688 bytes
4294967296 [0x0000000010000000] gtt total
Supported page sizes: 2M, 64K, 4K

[k]contexts: 14 objects, 385024 bytes (0 active, 385024 inactive, 385024 global, 0 shared, 0 unbound)
X: 632 objects, 4647391232 bytes (0 active, 2187685888 inactive, 93077504 global, 4589318144 shared, 2551144448 unbound)
X: 254 objects, 105172992 bytes (0 active, 88899584 inactive, 8417280 global, 17526784 shared, 37146624 unbound)
--------------------------------------------------------------------
-> 4.3 GB X context

(Note: the results aren't all from the same machine; I'm using a couple of them to speed up bisecting.)
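The "-> N GB X context" figures quoted with the dumps above appear to be the byte count of the largest "X:" line divided by 2^30. A minimal sketch of that extraction follows; the function name `largest_context_gib` and the regular expression are illustrative assumptions, not part of any tool mentioned in this report.

```python
import re

# Matches per-client lines such as:
#   X: 632 objects, 4647391232 bytes (0 active, ...)
CLIENT_LINE = re.compile(r"^(?P<name>\S+): (?P<objects>\d+) objects, (?P<bytes>\d+) bytes")


def largest_context_gib(dump: str, client: str = "X") -> float:
    """Return the size in GiB of the largest context line owned by `client`."""
    sizes = []
    for line in dump.splitlines():
        m = CLIENT_LINE.match(line.strip())
        if m and m.group("name") == client:
            sizes.append(int(m.group("bytes")))
    if not sizes:
        raise ValueError(f"no context lines found for client {client!r}")
    return max(sizes) / 2**30


# Context lines copied from the 2018-03-15 dump.
sample = """\
[k]contexts: 14 objects, 385024 bytes (0 active, 385024 inactive, 385024 global, 0 shared, 0 unbound)
X: 632 objects, 4647391232 bytes (0 active, 2187685888 inactive, 93077504 global, 4589318144 shared, 2551144448 unbound)
X: 254 objects, 105172992 bytes (0 active, 88899584 inactive, 8417280 global, 17526784 shared, 37146624 unbound)
"""

print(f"{largest_context_gib(sample):.1f} GB")  # prints "4.3 GB"
```

Feeding it the context lines of the other dumps gives the 0.1, 1.0 and 2.2 GB figures in the same way.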
The git versions from the last few nights result in a 4.4 GB X context, so the leakage hasn't gotten worse since mid-March.

(I would have noticed the leak earlier if the X modesetting format mismatch bug hadn't prevented X from starting, between March 7th & April 3rd, on older GENs on which we have less RAM installed.)

Ok, the leakage started after March 3rd; that version doesn't leak at all. After that the leakage has increased, but the indicated use-case (fullscreen GL) shows the leak right from the beginning.

As expected, there's no leakage with Intel DDX, only with modesetting/glamor.

The leakage is due to the X server alone; the other components (kernel, Mesa, X libs) don't cause it.

The leak started between these X server commits:
2018-03-02 17:05:49 UTC: 43ffd572592d26bb78decfdf55e643bdfb011d3f meson: Make SHM extension optional
2018-03-06 15:53:39 UTC: 43576b901151a1f32209f476249a4de6980b654f glamor: Restore glamor_fd_from_pixmap and glamor_pixmap_from_fd

Unfortunately that's the day when the initial atomic modesetting, modifiers and DRI3 v1.2 support went in, and the last few commits in that range (including DRI3 v1.2 support), up to the last glamor fix, are broken. (It's the same range of commits that broke X for a month on older GENs due to the modesetting format mismatch.)

I'll try to narrow it down further.

The leak size depends on the rest of the 3D stack. I did the rest of the X bisecting using last night's Git versions of everything other than the X server, as the leak was largest with the latest versions.
Last good commit:
e375f2966 modesetting: Create scanout buffers using supported modifiers

First bad commit:
----------------------------------------------------------------
commit 9d147305b4048dcec7ea4eda3eeea83f843f7788
Author: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
AuthorDate: Wed Feb 28 01:19:42 2018 +0000
Commit: Adam Jackson <ajax@redhat.com>
CommitDate: Mon Mar 5 13:27:47 2018 -0500

    modesetting: Check if buffer format is supported when flipping

    Add support for 'check_flip2' so that the present core can know
    why it is impossible to flip in that scenario. The core can then
    let know the client that the buffer format/modifier is suboptimal.

    v2: No longer need to implement 'check_flip'

    Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
    Reviewed-by: Daniel Stone <daniels@collabora.com>
    Acked-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Adam Jackson <ajax@redhat.com>
----------------------------------------------------------------

Looking at that commit:
https://cgit.freedesktop.org/xorg/xserver/commit/?id=9d147305b4048dcec7ea4eda3eeea83f843f7788

It seems that on every call of ms_present_check_flip(), the following gbm bo is leaked when modifier support is enabled:
gbm = glamor_gbm_bo_from_pixmap(screen, pixmap)

Running the example script, which opens & closes a fullscreen GL program 10x in a row, increases the X context size by roughly *200MB* each time:
1. 100MB
2. 250MB
3. 450MB
4. 610MB
5. 810MB
6. 1000MB
7. 1200MB
8. 1420MB
...

I.e. 20MB leak per application window open/close.

> I.e. 20MB leak per application window open/close.
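The per-window figure can be checked against the listed sizes. This is only a sketch of the arithmetic, assuming each numbered entry is the X context size after one more run of the script (10 open/close cycles per run); the variable names are illustrative.

```python
# X context size in MB after each run of the example script,
# copied from the list in the comment above.
sizes_mb = [100, 250, 450, 610, 810, 1000, 1200, 1420]
CYCLES_PER_RUN = 10  # the script opens & closes the program 10x per run

# Growth between consecutive runs.
growth_per_run = [after - before for before, after in zip(sizes_mb, sizes_mb[1:])]
mean_growth = sum(growth_per_run) / len(growth_per_run)
leak_per_cycle = mean_growth / CYCLES_PER_RUN

print(growth_per_run)  # [150, 200, 160, 200, 190, 200, 220]
print(f"{mean_growth:.0f} MB per run, {leak_per_cycle:.1f} MB per open/close")
# prints "189 MB per run, 18.9 MB per open/close"
```

The measured growth varies between 150 and 220 MB per run, which averages out to just under the quoted 20MB per open/close.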
IMHO this regression is a blocker for the X release. Is there some convention on how to mark bugs as release blockers?
(Should be simple to fix as I located what is leaked / where.)
Thanks a lot Eero for hunting that leak! I've sent a patch to the mailing list that should (at least partly) fix it.

Tested the patch on KBL GT2:
https://patchwork.freedesktop.org/patch/218934/

It gets rid of *all* the context leakage. Everything else works also fine (except for Mesa bug 105906).

Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>

commit 6cace4990abc2386b6ea68536b321994d264c295
Author: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Date: Thu Apr 26 11:04:15 2018 -0400

    modesetting: Fix GBM objects leak when checking for flip

    GBM objects were never destroyed after looking for format and modifier
    compatibility when deciding whether flipping or copying a presented
    pixmap.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106106
    Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>

Verified. (After 3 hours of testing on several devices, X again has 2x 0.1GB GEM contexts, instead of one of them being 4GB.)
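The pattern the fix commit describes (an object obtained only to query its format and modifier, then never destroyed) can be modelled in a few lines. This is a toy model in Python, not the X server code: `FakeBo`, `bo_from_pixmap`, `SUPPORTED` and both `check_flip_*` functions are hypothetical stand-ins, and the counter only imitates an explicitly released resource.

```python
class FakeBo:
    """Hypothetical stand-in for a GBM buffer object; counts live instances."""

    live = 0

    def __init__(self, fmt: str, modifier: int) -> None:
        self.fmt = fmt
        self.modifier = modifier
        FakeBo.live += 1

    def destroy(self) -> None:
        FakeBo.live -= 1


def bo_from_pixmap(pixmap: dict) -> FakeBo:
    # Hypothetical model of a call that hands the caller a new object
    # which the caller is then responsible for destroying.
    return FakeBo(pixmap["format"], pixmap["modifier"])


SUPPORTED = {("XR24", 0)}  # assumed format/modifier pairs, for illustration


def check_flip_leaky(pixmap: dict) -> bool:
    bo = bo_from_pixmap(pixmap)
    return (bo.fmt, bo.modifier) in SUPPORTED  # bo is never destroyed


def check_flip_fixed(pixmap: dict) -> bool:
    bo = bo_from_pixmap(pixmap)
    try:
        return (bo.fmt, bo.modifier) in SUPPORTED
    finally:
        bo.destroy()  # released on every path once the query is done


pixmap = {"format": "XR24", "modifier": 0}

for _ in range(10):
    check_flip_leaky(pixmap)
print(FakeBo.live)  # prints 10: one object left behind per check

FakeBo.live = 0
for _ in range(10):
    check_flip_fixed(pixmap)
print(FakeBo.live)  # prints 0
```

Because the check runs on every presented frame of a fullscreen program, even one small unreleased object per call adds up quickly, which matches the growth reported earlier in this thread.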