If we're going to have one DBusPollFD (or equivalent) per socket (which is a bug in its own right, and also a prerequisite for using epoll), we need to be able to map from a socket fd to its DBusWatch(es). It'd help if we could rely on DBusWatch always having this pattern: - remove the DBusWatch - invalidate the DBusWatch (set its fd to -1) - unref it - *then* close the watched socket This is probably a job for 1.5 rather than 1.4. I'm working on it.
Created attachment 42275 [details] [review] Always remove, invalidate and free watches before closing watched sockets This should mean we don't get invalid fds in the main loop. The BSD (kqueue) and Windows code paths are untested, but follow the same patterns as the tested Linux/generic Unix versions. DBusTransportSocket was already OK (it called free_watches() before _dbus_close_socket, and that did the remove, invalidate, unref dance).
Created attachment 42276 [details] [review] Check that watches are removed, then invalidated, then unreffed
Review of attachment 42275 [details] [review]: This basically looks fine. ::: bus/dir-watch-inotify.c @@ +211,2 @@ _dbus_watch_unref (watch); _dbus_loop_unref (loop); These four calls (or calls very much like them) show up constantly. They seem sane, but I wonder whether it'd be possible to simplify matters. ::: bus/main.c @@ +66,3 @@ { _dbus_warn ("Unable to write to reload pipe.\n"); + close_reload_pipe_write (); I'm prepared to believe that calling close_reload_pipe() here was unsafe; will the remains of the closing turn out to happen later? ::: dbus/dbus-spawn-win.c @@ +167,3 @@ + _dbus_watch_unref (sitter->sitter_watch); + sitter->sitter_watch = NULL; + } Does this mean that the watch was never removed on Windows?
Review of attachment 42276 [details] [review]: This seems reasonable, though I'm a little afraid of asserting here.
(In reply to comment #3) > ::: bus/main.c > @@ +66,3 @@ > { > _dbus_warn ("Unable to write to reload pipe.\n"); > + close_reload_pipe_write (); > > I'm prepared to believe that calling close_reload_pipe() here was unsafe; will > the remains of the closing turn out to happen later? (For the record, the reason close_reload_pipe() is unsafe here is that manipulating our DBusWatch may call functions that aren't async-signal-safe. Hardly anything is guaranteed not to crash in a POSIX signal handler, apart from a few syscalls; in particular, malloc might not work.) We'll close the read end of the pipe shortly afterwards, when reading 1 byte from it in handle_reload_watch() fails, i.e. when the pipe fd reports POLLHUP and we've drained all the bytes in the buffer. In practice the effect of the (existing and new) code here is that if you send 4,096 SIGHUP signals before the main loop has a chance to deal with any of them, the bus daemon will start ignoring SIGHUP. I think. This is completely theoretical, but not really ideal. Perhaps signal_handler() should ignore failed writes completely, on the basis that config will be reloaded (4,096 times!) while draining the queue, which is what you wanted, and then the signal handler will presumably be allowed to write to the pipe again. > ::: dbus/dbus-spawn-win.c > @@ +167,3 @@ > + _dbus_watch_unref (sitter->sitter_watch); > + sitter->sitter_watch = NULL; > + } > > Does this mean that the watch was never removed on Windows? It was probably removed by destruction of the DBusWatchList, which implicitly removes all of its watches (but might do so too late).
I've rebased badfd-33336 onto master: I think it's 1.5.0 material. The two patches are the same as seen here. (In reply to comment #4) > This seems reasonable, though I'm a little afraid of asserting here. Assertions are disabled by default (unless you --enable-maintainer-mode), so packaged versions won't have this check anyway. I think it's OK to add on a development branch? (In reply to comment #5) > In practice the effect of the (existing and new) code here is that if you send > 4,096 SIGHUP signals before the main loop has a chance to deal with any of > them, the bus daemon will start ignoring SIGHUP. I think. This is completely > theoretical, but not really ideal. Perhaps signal_handler() should ignore > failed writes completely, on the basis that config will be reloaded (4,096 > times!) while draining the queue, which is what you wanted (I discovered today that it's actually 65,536 times on modern Linux.) The branch badfd-33336-hup adds two patches (which I'll attach in a moment) to fix this, and to fix use of non-async-signal-safe _dbus_warn in the signal handler. I consider these additional patches to be optional; they don't block any of the bugs blocked by this one.
Created attachment 43752 [details] [review] bus: signal_handler: ignore failure to write, and explain why See the comment in the source code for rationale.
Created attachment 43753 [details] [review] bus signal_handler: don't use _dbus_warn, and don't pretend to be portable _dbus_warn isn't async-signal-safe, so that's out. We can use write() instead; it's POSIX but not ISO C, but then again, so are signals. Accordingly, guard it with DBUS_UNIX. dbus-sysdeps-util-win doesn't actually implement _dbus_set_signal_handler anyway, so not compiling this code on non-Unix seems more honest.
unsuitable for 1.4, IMO; review for 1.5.x would be appreciated
Review of attachment 42275 [details] [review]: I agree with Will: why can't we close the fd when the refcount drops to zero?
Review of attachment 42276 [details] [review]: Looks good, same review.
Review of attachment 43752 [details] [review]: Looks ok.
Review of attachment 43753 [details] [review]: Looks good, just one minor, pedantic comment. ::: bus/main.c @@ +81,3 @@ + * This is necessarily Unix-specific, but so are POSIX signals, + * so... */ + static const char *message = Pedantic: make this static const char message[].
(In reply to comment #10) > Review of attachment 42275 [details] [review]: > > I agree with Will: why can't we close the fd when the refcount drops to zero? I think that's the wrong way round: it's necessary that you stop watching a fd before you close it, but it's not necessarily true that you close a fd when you stop watching it (you could use blocking I/O, or spin on a manual _dbus_poll() - as I think we do in some places). I suppose we *could* have a helper _dbus_tidy_up_socket (DBusLoop *loop, DBusWatch *watch, int *fd) which removes the watch from the loop if both are non-NULL, invalidates the watch, consumes one ref each (?) to the loop and the watch, closes *fd and sets it to -1 - but that seems rather magic, and a bit of a layering violation. It seems better to "say what you mean". (In reply to comment #11) > Review of attachment 42276 [details] [review]: > > Looks good, same review. Do you mean "why can't this be automatic"? The remove part: _dbus_watch_invalidate doesn't take a DBusLoop (and it can't - you can use _dbus_watch_invalidate for a connection watch that's being handled by the GLib or Qt main loop), so invalidating a watch can't automatically remove it from main loop integration first. The assertion is basically there because we can't key the hash table in #23194 by fd if the fd has gone away, so we should check for it. The unref part: modules other than the owner of the fd can also ref it, but the single owner of the fd (who is responsible for closing it) is also responsible for holding a ref that can be used to invalidate it, until after they have done so. Given that, DBusWatch is probably only refcounted so that it can cope with re-entrancy in which the callback for a watch removes that same watch, e.g. when the other end of the socket hangs up. (In reply to comment #13) > ::: bus/main.c > @@ +81,3 @@ > + * This is necessarily Unix-specific, but so are POSIX signals, > + * so... */ > + static const char *message = > > Pedantic: make this static const char message[]. Any particular reason? But, OK.
Fixed in git for 1.5.6.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.