Bug 13194 - When machine-id not found, dbus should not abort
Summary: When machine-id not found, dbus should not abort
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.5
Hardware: All All
: medium normal
Assignee: Simon McVittie
QA Contact: D-Bus Maintainers
URL: https://github.com/smcv/dbus/commits/...
Whiteboard: review+
Keywords: patch
Depends on: 101257
Blocks:
  Show dependency treegraph
 
Reported: 2007-11-12 04:53 UTC by Ghee Teo
Modified: 2017-06-09 12:36 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
[1/3] uuidgen: Remove unimplemented declaration (1.22 KB, patch)
2017-06-07 17:07 UTC, Simon McVittie
Details | Splinter Review
[2/3] Make _dbus_get_local_machine_uuid_encoded() properly failable (6.61 KB, patch)
2017-06-07 17:08 UTC, Simon McVittie
Details | Splinter Review
[3/3] Add dbus_try_get_local_machine_id() (6.18 KB, patch)
2017-06-07 17:10 UTC, Simon McVittie
Details | Splinter Review
[1/3] tests: Don't exercise GetMachineId() or autolaunch if no machine ID (6.24 KB, patch)
2017-06-08 16:36 UTC, Simon McVittie
Details | Splinter Review
dbus-launch: Use dbus_try_get_local_machine_id() (2.43 KB, patch)
2017-06-08 17:44 UTC, Simon McVittie
Details | Splinter Review

Description Ghee Teo 2007-11-12 04:53:03 UTC
When /var/lib/dbus/machine-id is missing, applications that uses dbus basically abort because dbus does so, sample pstack, 

$ pidgin
process 8300: D-Bus library appears to be incorrectly set up; failed to read machine uuid: Failed to open "/var/lib/dbus/machine-id": No such file or directory
See the manual page for dbus-uuidgen to correct this issue.
  D-Bus not built with -rdynamic so unable to print a backtrace
Abort (core dumped)
-bash-3.00$ pstack core
core 'core' of 8300:    pidgin
 fef3e8d7 _lwp_kill (1, 6) + 7
 feef6a52 raise    (6) + 22
 feed4fdd abort    (fb3e461c, 8047444, fb3c8a18, fb3e4f90, fb3e461c, 8047478) + cd
 fb3cc2b9 _dbus_abort (fb3e4f90, fb3e461c, 8047478, 8047478, fb3c8fcf, fb3d37f8) + 61
 fb3c8a18 _dbus_warn_check_failed (fb3d37f8, 8136758) + 68
 fb3c8fcf _dbus_get_local_machine_uuid_encoded (804749c) + 8f
 fb3ce8ca _dbus_get_autolaunch_address (80474fc, 8047540) + 2a
 fb3c57f1 _dbus_transport_open_autolaunch (8133e80, 8047560, 8047540) + 5d
 fb3c5929 _dbus_transport_open (8133e80, 80475b0) + 8d
...

This seems a bit drastic, it gave the impression that applications crashes while in actual fact this is a set up problem for dbus.

google dbus machine-id seems that there have been lots of bugs/problem assocaited with this. 

Can dbus lib uses a less severe form of error handling? like uses
_dbus_warn() which leads eventually to _dbus_abort if it fatal_warning is set.
Hence, allowing stable release to run without unncessary core dump!
Comment 1 Havoc Pennington 2007-11-12 07:46:47 UTC
Having no id file is just the same as installing dbus without its config files, or patching it to have "exit(1)" in it, or otherwise breaking things on purpose. The reason it aborts loudly is because you need to fix your system. The error message says *exactly* how to fix your system and points to the docs, and fixing it takes only a minute or two. Normally your distribution will have done it for you, though.

"make install" should probably run dbus-uuidgen, but if the library is fatally misinstalled, there isn't a way to continue. Aborting is appropriate.

Leaving the bug open since we need to add dbus-uuidgen to make install.
Comment 2 Ghee Teo 2007-11-12 09:04:08 UTC
Thanks Havoc ;)
Comment 3 Simon McVittie 2011-01-07 07:54:05 UTC
(In reply to comment #1)
> "make install" should probably run dbus-uuidgen

I started implementing this, but there are subtleties:

- if cross-compiling, we can't run it

- if installing into a DESTDIR to create packages, we don't want to run it,
  to avoid packagers accidentally setting the same machine ID on every
  machine running their distribution

- if installing into a prefix, ideally the user would have set the machine-ID
  path to the normal system-wide one, to avoid split-brain problems

- distcheck will fail, because we certainly don't want to uninstall the
  machine ID!

so I'm not sure it's such a good idea.

To avoid split-brain problems where two parts of an OS think they have different machine IDs, I think we want to encourage the use of a distro-provided D-Bus package, which will have a single value for the machine ID's path, and on any sane system will arrange for dbus-uuidgen --ensure to be run.

If we accepted the patch from Bug #23679 (or similar), we'd solve this globally for Linux by using the kernel's idea of the boot ID.

Bug #29618 is related.
Comment 4 Simon McVittie 2011-11-03 09:42:03 UTC
Bug #42531 (Android support) has a related patch.
Comment 5 Simon McVittie 2014-09-10 16:05:25 UTC
(In reply to comment #3)
> To avoid split-brain problems where two parts of an OS think they have
> different machine IDs, I think we want to encourage the use of a
> distro-provided D-Bus package, which will have a single value for the
> machine ID's path, and on any sane system will arrange for dbus-uuidgen
> --ensure to be run.

... or systemd (which installs /etc/machine-id, with which dbus is perfectly happy), or any piece of OS infrastructure that installs a systemd-compatible /etc/machine-id.

Having said that, making dbus_connection_open() / dbus_bus_get() fail with a DBusError seems like it would be a nicer way to signal "this system is broken". In practice, things that *need* D-Bus will respond to that by aborting/crashing/warning and exiting anyway, and things that have D-Bus as nonessential functionality (Pidgin) can presumably survive it.
Comment 6 Simon McVittie 2017-06-07 17:05:36 UTC
I've come around to the opinion that we should deal with this much more gracefully than we do, and that Havoc's point of view in Comment #1, while certainly defensible, is unnecessarily dogmatic.

(In reply to Havoc Pennington from comment #1)
> Having no id file is just the same as installing dbus without its config
> files, or patching it to have "exit(1)" in it, or otherwise breaking things
> on purpose.

Yes ish... but in practice, the dbus library actually mostly works without a machine ID (and it turns out Travis-CI has been testing that situation for months).

I agree that the error message produced when failing to do something due to lack of a machine ID should indicate that this is a misconfiguration. However, we have better ways to deal with recoverable errors than just aborting, and in particular applications know better than we do whether using dbus is optional or mandatory for that particular application.

Also, having libdbus installed does not *necessarily* mean you have dbus-daemon and dbus-uuidgen installed - some software links to libdbus at build time but does not hard-require a D-Bus bus at runtime.

(In reply to Havoc Pennington from comment #1)
> "make install" should probably run dbus-uuidgen

It can't, because:

* dbus-uuidgen doesn't understand DESTDIR
* dbus-uuidgen is totally inappropriate to run when building packages for
  something like RPM or dpkg, which are a much more frequent way to obtain
  dbus than building it yourself, and IMO something we should be strongly
  recommending - if dbus is critical OS infrastructure then you should
  normally get it from your OS vendor
* creating the machine ID but not uninstalling it makes distcheck fail
* uninstalling the machine ID is never correct

(In reply to Simon McVittie from comment #5)
> Having said that, making dbus_connection_open() / dbus_bus_get() fail with a
> DBusError seems like it would be a nicer way to signal "this system is
> broken". In practice, things that *need* D-Bus will respond to that by
> aborting/crashing/warning and exiting anyway, and things that have D-Bus as
> nonessential functionality (Pidgin) can presumably survive it.

In fact dbus_connection_open()/dbus_bus_get() work fine without a machine ID if DBUS_SESSION_BUS_ADDRESS is already set or if $XDG_RUNTIME_DIR/bus is available, but then crash out if and only if you try to use X11 autolaunch. Similarly, dbus-daemon actually works fine without a machine ID until you call Peer.GetMachineId() on it, at which point it crashes. This does not seem a consistent position to take. There are two things that would be consistent:

* Work as well as we can without a machine ID, while still indicating in
  error messages for the things that don't work that this situation is
  misconfiguration.

* Check for the machine ID up-front even if we don't need it, and refuse
  to work without it.

We do not do either of those right now, and we never have. I strongly prefer the first, because the second seems like being difficult for the sake of being difficult.
Comment 7 Simon McVittie 2017-06-07 17:06:59 UTC
The patches I'm about to attach depend on those from Bug #101257, but could be easily be rebased the other way round.
Comment 8 Simon McVittie 2017-06-07 17:07:44 UTC
Created attachment 131783 [details] [review]
[1/3] uuidgen: Remove unimplemented declaration

As far as I can tell from git history, this function never existed.
Comment 9 Simon McVittie 2017-06-07 17:08:42 UTC
Created attachment 131784 [details] [review]
[2/3] Make _dbus_get_local_machine_uuid_encoded() properly  failable

This function already raised an error, and all callers handled that
error as gracefully as they could (because _dbus_generate_uuid() is
failable, since 2015). Given that, it seems unnecessarily hostile
to do a _dbus_warn_check_failed() unless we have no better alternative:
yes, it indicates that dbus has not been installed correctly, but
during build-time tests it's entirely reasonable that dbus has not
yet been installed.

Callers are:

* DBusConnection, to implement Peer.GetMachineId()
* The bus driver, to implement Peer.GetMachineId()
* X11 autolaunching
* dbus_get_local_machine_id()

Of those, only the last one is not in a position to return an error
gracefully, so move the _dbus_warn_check_failed() to there.

Migrate the text about the D-Bus library being incorrectly set up
into the error emitted by the Unix implementation, and to make it
less misleading, include separate error messages for both the
files we try to read:

$ bwrap --ro-bind / / --dev /dev --tmpfs /etc --tmpfs /var \
  ./tools/dbus-uuidgen --get
D-Bus library appears to be incorrectly set up: see the manual
page for dbus-uuidgen to correct this issue. (Failed to open
"/var/lib/dbus/machine-id": No such file or directory; Failed to open
"/etc/machine-id": No such file or directory)
Comment 10 Simon McVittie 2017-06-07 17:10:52 UTC
Created attachment 131785 [details] [review]
[3/3] Add dbus_try_get_local_machine_id()

---

dbus_get_local_machine_id() continues to issue a warning that defaults to fatal. We could demote it to _dbus_warn() so that it defaults to non-fatal if people would prefer that, but in any case the long-term solution is to use a function that has proper error handling.
Comment 11 Simon McVittie 2017-06-07 17:20:24 UTC
(In reply to Simon McVittie from comment #9)
> $ bwrap --ro-bind / / --dev /dev --tmpfs /etc --tmpfs /var \
>   ./tools/dbus-uuidgen --get
> D-Bus library appears to be incorrectly set up: see the manual
> page for dbus-uuidgen to correct this issue. (Failed to open
> "/var/lib/dbus/machine-id": No such file or directory; Failed to open
> "/etc/machine-id": No such file or directory)

Also:

$ bwrap --ro-bind / / --dev /dev --tmpfs /etc --tmpfs /var \
  ./tools/dbus-launch --exit-with-x11
dbus[12857]: D-Bus library appears to be incorrectly set up: see the manual page for dbus-uuidgen to correct this issue. (Failed to open "/var/lib/dbus/machine-id": No such file or directory; Failed to open "/etc/machine-id": No such file or directory)

So that seems to be behaving as desired: you get a big fat warning, an unsuccessful exit, and no crash.
Comment 12 Philip Withnall 2017-06-07 22:59:45 UTC
Comment on attachment 131783 [details] [review]
[1/3] uuidgen: Remove unimplemented declaration

Review of attachment 131783 [details] [review]:
-----------------------------------------------------------------

Trivially yes.
Comment 13 Philip Withnall 2017-06-07 23:06:52 UTC
Comment on attachment 131784 [details] [review]
[2/3] Make _dbus_get_local_machine_uuid_encoded() properly  failable

Review of attachment 131784 [details] [review]:
-----------------------------------------------------------------

::: dbus/dbus-internals.c
@@ -912,5 @@
> -
> -      if (!_dbus_read_local_machine_uuid (&machine_uuid, FALSE,
> -                                          &local_error))
> -        {          
> -#ifndef DBUS_ENABLE_EMBEDDED_TESTS

Is the conditional compilation on DBUS_ENABLE_EMBEDDED_TESTS no longer necessary? It’s been omitted from where the _dbus_warn_check_failed() has been moved to.
Comment 14 Philip Withnall 2017-06-07 23:13:41 UTC
Comment on attachment 131785 [details] [review]
[3/3] Add dbus_try_get_local_machine_id()

Review of attachment 131785 [details] [review]:
-----------------------------------------------------------------

This looks good. There are a couple of references to dbus_get_local_machine_id() in documentation comments elsewhere in the code (dbus/dbus-bus.c, dbus/dbus-connection.c) — do you want to update them to mention dbus_try_get_local_machine_id() instead, so people are consistently pointed in the right direction? Could be a follow-up patch if you prefer.
Comment 15 Simon McVittie 2017-06-08 11:36:06 UTC
(In reply to Philip Withnall from comment #13)
> Is the conditional compilation on DBUS_ENABLE_EMBEDDED_TESTS no longer
> necessary? It’s been omitted from where the _dbus_warn_check_failed() has
> been moved to.

I think removing that conditional is fine. It was always rather a hack: either something is defined behaviour or it isn't, and I'm very wary of code that changes observable behaviour depending on whether we compiled in tests.

Previously, the behaviour was this, if we assume fatal warnings:

    if no machine ID
        if embedded tests compiled in
            silently make up a new random UUID
        else if warnings are fatal
            warn
            crash
        else
            warn
            make up a new random UUID

Now, the callers that can cope with errors get an error that contains the old warning, the caller that can't cope spams stderr with that warning, and we never just make up a new random UUID.

Travis-CI's build workers do not have a machine ID (demonstrated by one of my new tests for o.fd.DBus.Peer failing until I fixed it), and the "embedded tests" pass, so we can infer that the "embedded tests" never exercised dbus_get_local_machine_id() anyway.
Comment 16 Simon McVittie 2017-06-08 11:37:02 UTC
(In reply to Philip Withnall from comment #14)
> There are a couple of references to
> dbus_get_local_machine_id() in documentation comments elsewhere in the code
> (dbus/dbus-bus.c, dbus/dbus-connection.c) — do you want to update them to
> mention dbus_try_get_local_machine_id() instead, so people are consistently
> pointed in the right direction? Could be a follow-up patch if you prefer.

Good idea.
Comment 17 Simon McVittie 2017-06-08 14:51:07 UTC
(In reply to Simon McVittie from comment #15)
> Travis-CI's build workers do not have a machine ID (demonstrated by one of
> my new tests for o.fd.DBus.Peer failing until I fixed it), and the "embedded
> tests" pass, so we can infer that the "embedded tests" never exercised
> dbus_get_local_machine_id() anyway.

Sigh, no, this is not actually the case - the embedded tests rely on GetMachineId() making up a new UUID on the spot.
Comment 18 Philip Withnall 2017-06-08 15:00:04 UTC
(In reply to Simon McVittie from comment #17)
> (In reply to Simon McVittie from comment #15)
> > Travis-CI's build workers do not have a machine ID (demonstrated by one of
> > my new tests for o.fd.DBus.Peer failing until I fixed it), and the "embedded
> > tests" pass, so we can infer that the "embedded tests" never exercised
> > dbus_get_local_machine_id() anyway.
> 
> Sigh, no, this is not actually the case - the embedded tests rely on
> GetMachineId() making up a new UUID on the spot.

How so? If it’s easy to do so, it might be better to fix the tests rather than keep the DBUS_ENABLE_EMBEDDED_TESTS conditional here, for all the reasons you gave in comment #15. Probably best if we keep the conditional in this patchset, though, and fix the tests and remove the conditional in a follow-up patch (in this bug or another).
Comment 19 Simon McVittie 2017-06-08 15:06:56 UTC
(In reply to Philip Withnall from comment #18)
> How so? If it’s easy to do so, it might be better to fix the tests rather
> than keep the DBUS_ENABLE_EMBEDDED_TESTS conditional here

Yes. I think I can see how to fix the tests without it being horrible, but I'm waiting for re-testing of Bug #101257 to finish first.
Comment 20 Simon McVittie 2017-06-08 16:29:28 UTC
This failure mode can be exercised on a less deficient machine with:

bwrap --bind / / \
   --dev /dev \
   --tmpfs /etc \
   --bind /etc/passwd /etc/passwd \
   --bind /etc/group /etc/group \
   --bind /etc/nsswitch.conf /etc/nsswitch.conf \
   --tmpfs /var \
   make check

(It turns out those are the only things in /etc that the dbus tests need.)
Comment 21 Simon McVittie 2017-06-08 16:36:47 UTC
Created attachment 131809 [details] [review]
[1/3] tests: Don't exercise GetMachineId() or autolaunch if no  machine ID

At the moment there is a hack in the implementation of GetMachineId()
to stop tests from failing during "make check" on a system where
dbus has never been installed, by silently generating a new unique
fake "machine ID" for each process. I'm about to change that
behaviour to report errors properly; skip affected test-cases if we
can't read the real machine ID.

The shell scripts to test dbus-launch are run both as "make check"
tests (for which it is valid for dbus to be not correctly installed)
and as installed-tests (for which that is not valid), so make them
pass during "make check" but fail during installed testing.
The tests in bus/ and test/name-test/ are only run during "make check"
so they only have the code path where they are skipped.
Comment 22 Simon McVittie 2017-06-08 16:37:11 UTC
Comment on attachment 131783 [details] [review]
[1/3] uuidgen: Remove unimplemented declaration

Applied, thanks
Comment 23 Simon McVittie 2017-06-08 16:39:46 UTC
If https://travis-ci.org/smcv/dbus/builds/240863222 succeeds, then this is working fine. Attachment #131809 [details] is intended to fix the previously-failing "debug" and "cmake" builds.

(If the build in a debian stretch container fails, that is probably not this bug - check the log. The apt mirror that it uses has been producing intermittent 503 errors recently.)
Comment 24 Philip Withnall 2017-06-08 17:03:09 UTC
Comment on attachment 131809 [details] [review]
[1/3] tests: Don't exercise GetMachineId() or autolaunch if no  machine ID

Review of attachment 131809 [details] [review]:
-----------------------------------------------------------------

++, with one nit.

::: test/name-test/test-autolaunch.c
@@ +24,5 @@
>  
> +  if (!_dbus_read_local_machine_uuid (&uuid, FALSE, &error))
> +    {
> +      /* We can't expect autolaunching to work in this situation */
> +      fprintf (stderr, "%s\n", error.message);

Do you want to prefix this message with `*** ` like the others? It’s not TAP, but let’s be consistent.
Comment 25 Simon McVittie 2017-06-08 17:44:18 UTC
(In reply to Simon McVittie from comment #16)
> (In reply to Philip Withnall from comment #14)
> > There are a couple of references to
> > dbus_get_local_machine_id() in documentation comments elsewhere in the code
> > (dbus/dbus-bus.c, dbus/dbus-connection.c) — do you want to update them to
> > mention dbus_try_get_local_machine_id() instead, so people are consistently
> > pointed in the right direction? Could be a follow-up patch if you prefer.
> 
> Good idea.

I pushed the obvious change unreviewed.

(In reply to Philip Withnall from comment #24)
> Do you want to prefix this message with `*** ` like the others? It’s not
> TAP, but let’s be consistent.

Done.

One more patch: dbus-launch still used dbus_get_local_machine_id().
Comment 26 Simon McVittie 2017-06-08 17:44:49 UTC
Created attachment 131813 [details] [review]
dbus-launch: Use dbus_try_get_local_machine_id()
Comment 27 Philip Withnall 2017-06-09 08:25:58 UTC
Comment on attachment 131813 [details] [review]
dbus-launch: Use dbus_try_get_local_machine_id()

Review of attachment 131813 [details] [review]:
-----------------------------------------------------------------

Yes.
Comment 28 Simon McVittie 2017-06-09 12:36:20 UTC
Thanks, pushed.

Fixed in git for 1.11.14.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.