Bug 105165

Summary: dbus running out of file descriptors
Product: dbus Reporter: Ray Strode <halfline>
Component: coreAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: D-Bus Maintainers <dbus>
Severity: normal    
Priority: medium CC: amigadave
Version: unspecified   
Hardware: Other   
OS: All   
URL: https://github.com/smcv/dbus/commits/rlimit-105165
Whiteboard:
i915 platform: i915 features:
Attachments: raise file descriptors early
improved log message, and removal of duplicated raise_file_descriptor_limit() call
[1/3] bus: raise fd limits before dropping privs
[2/2] Add a unit test for the dbus-daemon resetting its fd limit
[2/2] Add a unit test for the dbus-daemon resetting its fd limit
[2/3] cmake: Check for getrlimit, setrlimit
[3/3] Add a unit test for the dbus-daemon resetting its fd limit

Description Ray Strode 2018-02-19 18:13:58 UTC
Created attachment 137441 [details] [review]
raise file descriptors early

We're hitting a scalability issue with dbus in this downstream report:

https://bugzilla.redhat.com/show_bug.cgi?id=1529044

(which i think it's private unfortunately)

The upshot is dbus-daemon is running out of file descriptors because it's code to bump the file descriptor limit gets run after it drops privileges.

Attached is a patch from David King that the customer confirms addresses the problem.
Comment 1 Simon McVittie 2018-02-19 19:22:31 UTC
Thanks, I'll apply this soon. The patch looks good, but only because raise_file_descriptor_limit() already contains an "only run once" check.

I think this regressed as a result of rearranging the startup order to fix Bug #92832 in 1.10.18 and 1.11.0.
Comment 2 David King 2018-02-20 09:52:15 UTC
Created attachment 137462 [details] [review]
improved log message, and removal of duplicated raise_file_descriptor_limit() call

(In reply to Simon McVittie from comment #1)
> Thanks, I'll apply this soon. The patch looks good, but only because
> raise_file_descriptor_limit() already contains an "only run once" check.

Yep. I was tempted to remove the only other call of raise_file_descriptor_limit(), in process_config_postinit(), as it seemed that it would only be called once during startup, and then never again (thanks to the "run once" check). This patch should do that, IMO, so I attached a new patch that does so.

Is there a good place to add a test for this, so that it does not regress in the future?
Comment 3 Simon McVittie 2018-02-20 11:00:11 UTC
Comment on attachment 137462 [details] [review]
improved log message, and removal of duplicated raise_file_descriptor_limit() call

Review of attachment 137462 [details] [review]:
-----------------------------------------------------------------

I preferred the old patch, actually: with this one, if context->user is NULL, we're not going to raise our fd limit at all, which is a regression. (In practice context->user is non-NULL on the system bus, which is where this really matters, but dbus-daemon is somewhat generic.)

I'll apply the code from your previous patch with a commit message based on this one (thanks for including more details).
Comment 4 Simon McVittie 2018-02-20 11:02:12 UTC
(In reply to David King from comment #2)
> Is there a good place to add a test for this, so that it does not regress in
> the future?

It's going to be a bit tricky, because the tests aren't routinely run as root. I'll think about it.
Comment 5 Simon McVittie 2018-02-20 11:47:59 UTC
(In reply to Simon McVittie from comment #3)
> I'll apply the code from your previous patch with a commit message based on
> this one (thanks for including more details).

Or it might be better to move it out of the conditional, in fact.
Comment 6 Simon McVittie 2018-02-20 12:00:39 UTC
Created attachment 137464 [details] [review]
[1/3] bus: raise fd limits before dropping privs

From: David King

Startup ordering was changed in #92832 to ensure that SELinux audit
messages could be sent. As a side effect, the raising of file descriptor
limits was moved to after the dropping of root privileges, resulting in
the limit change always failing.

Move the raise_file_descriptor_limit() call to ensure that it is called
before dropping root privileges.

Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105165
Bug-RedHat: https://bugzilla.redhat.com/show_bug.cgi?id=1529044
[smcv: Call raise_file_descriptor_limit() even if !context->user]
Reviewed-by: Simon McVittie <smcv@collabora.com>
Comment 7 Simon McVittie 2018-02-20 12:02:48 UTC
Created attachment 137465 [details] [review]
[2/2] Add a unit test for the dbus-daemon resetting its fd  limit

---

This seems to work, and nicely fails on Travis-CI when I omit your patch that fixes the bug. I'm still waiting for the test results when I include your patch (hopefully it will pass).

Please review?
Comment 8 Simon McVittie 2018-02-20 12:09:04 UTC
Created attachment 137466 [details] [review]
[2/2] Add a unit test for the dbus-daemon resetting its fd limit

---

Now with the config file included as intended.
Comment 9 Simon McVittie 2018-02-20 12:51:54 UTC
(In reply to Simon McVittie from comment #8)
> Created attachment 137466 [details] [review]
> [2/2] Add a unit test for the dbus-daemon resetting its fd limit

This doesn't fully work, because we can't even attempt to start a dbus-daemon as non-root with a non-trivial <user> (but it's only fatal in some build configurations). I'm testing a better version, but I won't attach it here until I get it working.
Comment 10 Simon McVittie 2018-02-20 14:38:09 UTC
Created attachment 137468 [details] [review]
[2/3] cmake: Check for getrlimit, setrlimit

This gives us feature parity with the Autotools build system.
Comment 11 Simon McVittie 2018-02-20 14:38:30 UTC
Created attachment 137469 [details] [review]
[3/3] Add a unit test for the dbus-daemon resetting its fd  limit
Comment 12 David King 2018-02-20 16:51:04 UTC
Comment on attachment 137469 [details] [review]
[3/3] Add a unit test for the dbus-daemon resetting its fd  limit

Review of attachment 137469 [details] [review]:
-----------------------------------------------------------------

The test looks good to me - just one typo in a config header, by the look of it.

::: cmake/config.h.cmake
@@ +207,4 @@
>  #cmakedefine HAVE_DIRFD 1
>  #cmakedefine HAVE_INOTIFY_INIT1 1
>  #cmakedefine HAVE_GETRLIMIT 1
> +#cmakedefine HAVE_PRINTF 1

Shouldn't this be HAVE_PRLIMIT?
Comment 13 Simon McVittie 2018-02-20 17:43:32 UTC
(In reply to David King from comment #12)
> Comment on attachment 137469 [details] [review]
> [3/3] Add a unit test for the dbus-daemon resetting its fd  limit
>
> > +#cmakedefine HAVE_PRINTF 1
> 
> Shouldn't this be HAVE_PRLIMIT?

Of course :-)
Comment 14 Simon McVittie 2018-02-20 18:42:14 UTC
Fixed in git for 1.13.2, with the obvious change to [3/3], and a longer commit message for [2/3].

Leaving this open until I backport it to 1.12.x (and if anyone wants to veto the version I merged, there's still time).
Comment 15 Simon McVittie 2018-02-20 19:01:17 UTC
(In reply to Simon McVittie from comment #14)
> Leaving this open until I backport it to 1.12.x

It cherry-picks in the obvious way, FWIW.
Comment 16 Philip Withnall 2018-02-21 13:38:32 UTC
Post-hoc review on all three patches: they all look good to me (taking David’s comment in comment #12 into account).
Comment 17 Simon McVittie 2018-03-02 14:22:39 UTC
Fixes released in 1.13.2, 1.12.6 and 1.10.26. Thanks!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.