Bug 43303 - dbus-python test suite failures
Summary: dbus-python test suite failures
Status: RESOLVED NOTOURBUG
Alias: None
Product: dbus
Classification: Unclassified
Component: python (show other bugs)
Version: unspecified
Hardware: Other Linux (All)
: medium normal
Assignee: Simon McVittie
QA Contact: John (J5) Palmieri
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-28 06:20 UTC by Barry Warsaw
Modified: 2012-05-09 08:09 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Barry Warsaw 2011-11-28 06:20:25 UTC
Using current git head on Ubuntu 11.10 and Python 2.7, none of test-client.py actually passes when you do a 'make check'.  They all produce errors like this:

======================================================================
ERROR: testAsyncCalls (__main__.TestDBusBindings)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/barry/projects/ubuntu/pydbus/upstream/test/test-client.py", line 73, in setUp
    self.remote_object = self.bus.get_object(NAME, OBJECT)
  File "/home/barry/projects/ubuntu/pydbus/upstream/dbus/bus.py", line 244, in get_object
    follow_name_owner_changes=follow_name_owner_changes)
  File "/home/barry/projects/ubuntu/pydbus/upstream/dbus/proxies.py", line 246, in __init__
    self._named_service = conn.activate_name_owner(bus_name)
  File "/home/barry/projects/ubuntu/pydbus/upstream/dbus/bus.py", line 183, in activate_name_owner
    self.start_service_by_name(bus_name)
  File "/home/barry/projects/ubuntu/pydbus/upstream/dbus/bus.py", line 281, in start_service_by_name
    'su', (bus_name, flags)))
  File "/home/barry/projects/ubuntu/pydbus/upstream/dbus/connection.py", line 630, in call_blocking
    message, timeout)
DBusException: org.freedesktop.DBus.Error.Spawn.ChildSignaled: Process /bin/bash received signal 6

I'm still investigating, but I wonder if 'make check' succeeds for you, or whether it's even supposed to!
Comment 1 Barry Warsaw 2011-12-01 11:35:37 UTC
Looks like the same failures occur on Debian Squeeze.  It also doesn't appear to make any difference what I put on the Exec line in TestSuitePythonService.service.  Everything, including a simple

/bin/bash -c "echo 'hello world'"

has the exact same SIGABRT failure.  I have no clue how to continue to debug this.
Comment 2 Simon McVittie 2011-12-13 04:11:49 UTC
(In reply to comment #0)
> Using current git head on Ubuntu 11.10 and Python 2.7, none of test-client.py
> actually passes when you do a 'make check'.

Weirdly, this works fine for me: Debian sid, Python "2.7.2+" (python2.7 2.7.2-8).
Comment 3 Barry Warsaw 2011-12-13 07:15:01 UTC
On Dec 13, 2011, at 12:11 PM, bugzilla-daemon@freedesktop.org wrote:

>Weirdly, this works fine for me: Debian sid, Python "2.7.2+" (python2.7
>2.7.2-8).

I just tried it in a sid vm.  I get some test failures, but nothing like the
cascade of start-up problems I see on Ubuntu. :(
Comment 4 Barry Warsaw 2011-12-16 14:19:26 UTC
Okay, I understand better what's happening here.  I'm no pygi expert but the switch to pygi caused the code in gobject_service.py to fail due.  Also, that code is not compatible with Python 3 due to different metaclass definition syntax.  I've rewritten the code in gobject_service.py to use pygi and be compatible across Python versions.  I've just committed the changes to my Py3 branch, but you might be able to tease out the differences if you want to apply this separately from that branch:

https://github.com/warsaw/dbus-python3/commit/f8dab5af0bef5d26a51df41a564a5285c16a9cb5
Comment 5 Barry Warsaw 2011-12-16 14:33:25 UTC
Ah.  That fix is incomplete.  It works when running the tests manually, but something's still wrong with running it from `make check`.
Comment 6 Jason Conti 2011-12-21 12:58:54 UTC
Barry, comment out the DBUS_FATAL_WARNINGS=1 line in test/run-test.sh. It seems like the tests complete successfully. I think the aborts on ubuntu may be outside the scope of dbus-python, because I can reproduce it using just the TestSuitePythonService.service (with the python code replaced with echo hello as above), tmp-session-bus.conf, the following GDBus code:

#include <gtk/gtk.h>

int main()
{
  gtk_init(NULL, NULL);
  GError *error = NULL;
  GDBusProxy *proxy = g_dbus_proxy_new_for_bus_sync(G_BUS_TYPE_SESSION,
      G_DBUS_PROXY_FLAGS_NONE,
      NULL,
      "org.freedesktop.DBus.TestSuitePythonService",
      "/org/freedesktop/DBus/TestSuitePythonObject",
      "org.freedesktop.DBus.TestSuiteInterface",
      NULL,
      &error);

  if (error != NULL) {
    g_warning("Failed to get proxy: %s", error->message);
    g_error_free(error);
  }

  return 0;
}

and the following bash script:

#!/bin/bash
export DBUS_FATAL_WARNINGS=1

eval $(dbus-launch --sh-syntax --config-file=tmp-session-bus.conf)

echo "Started session bus at $DBUS_SESSION_BUS_PID"

./test-dbus

echo "Killing session bus at $DBUS_SESSION_BUS_PID"
kill $DBUS_SESSION_BUS_PID

Commenting out the export DBUS_FATAL_WARNINGS=1 line allows the code in test-dbus to run until it times out, with the line it aborts immediately just like in the tests.
Comment 7 Barry Warsaw 2012-01-05 12:17:44 UTC
Hi Jason.

Based on your suggestion of DBUS_FATAL_WARNINGS, and some more testing
on Ubuntu 12.04, I'm starting to think this is a dbus-launch related bug.

Start by instrumenting run-test.sh to print out $PYTHONPATH, then export that
into your shell, and also export DBUS_TOP_BUILDDIR and DBUS_TOP_SRCDIR as
required by test-service.py.  Now, in that shell do this from
$DBUS_TOP_SRCDIR:

DBUS_FATAL_WARNINGS=1 dbus-daemon --config-file test/tmp-session-bus.conf --print-address

In a different shell, start up Python with DBUS_SESSION_BUS_ADDRESS exported
as appropriate, then in Python do this:

>>> import dbus
>>> sb = dbus.SessionBus()
>>> p = sb.get_object('org.freedesktop.DBus.TestSuitePythonService', '/org/freedesktop/DBus/TestSuitePythonObject')

you'll get a proxy object which you can poke at, and d-feet shows that your
service is all up and running just fine.  It does not matter whether
DBUS_FATAL_WARNINGS is set for this process.  IOW, with the service started
with dbus-daemon directly, everything is fine.

Now, kill dbus-daemon and start the service instead with dbus-launch:

DBUS_FATAL_WARNINGS=1 dbus-launch --config-file test/tmp-session-bus.conf --sh-syntax

In the other shell, fire up Python with the new DBUS_SESSION_BUS_ADDRESS and
you'll see that as soon as you try to get the proxy object, you'll receive the
SIGABRT:

>>> p = sb.get_object('org.freedesktop.DBus.TestSuitePythonService', '/org/freedesktop/DBus/TestSuitePythonObject')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "dbus/bus.py", line 244, in get_object
    follow_name_owner_changes=follow_name_owner_changes)
  File "dbus/proxies.py", line 246, in __init__
    self._named_service = conn.activate_name_owner(bus_name)
  File "dbus/bus.py", line 183, in activate_name_owner
    self.start_service_by_name(bus_name)
  File "dbus/bus.py", line 281, in start_service_by_name
    'su', (bus_name, flags)))
  File "dbus/connection.py", line 630, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.Spawn.ChildSignaled: Process /bin/bash received signal 6

The process described by DBUS_SESSION_BUS_PID is still running, so kill it.
Unfortunately, once dbus-launch exits, I don't get any verbose output, so I
have no idea why the service is getting aborted.  It doesn't appear that
DBUS_VERBOSE is getting exported to dbus-launch's child, or maybe it is and
verbose output is getting thrown away.

As you might guess, starting dbus-launch with DBUS_FATAL_WARNINGS=0 lets
everything work.  I need to try to coax more information out of dbus-launch to
continue.

On Dec 21, 2011, at 12:58 PM, bugzilla-daemon@freedesktop.org wrote:

>https://bugs.freedesktop.org/show_bug.cgi?id=43303
>
>--- Comment #6 from Jason Conti <jason.conti@gmail.com> 2011-12-21 12:58:54 UTC ---
>Barry, comment out the DBUS_FATAL_WARNINGS=1 line in test/run-test.sh.
Comment 8 Jason Conti 2012-01-05 20:49:16 UTC
Think I found the offending code in dbus/dbus-spawn.c:do_exec

 953 #ifdef DBUS_BUILD_TESTS
 954   max_open = sysconf (_SC_OPEN_MAX);
 955   
 956   for (i = 3; i < max_open; i++)
 957     {
 958       int retval;
 959 
 960       if (i == child_err_report_fd)
 961         continue;
 962       
 963       retval = fcntl (i, F_GETFD);
 964 
 965       if (retval != -1 && !(retval & FD_CLOEXEC))
 966         _dbus_warn ("Fd %d did not have the close-on-exec flag set!\n", i);
 967     }
 968 #endif

It doesn't seem like this code should get executed since it seems to be disabled in the non-debug package build, but commenting it out and rebuilding the package on precise and I can't reproduce the aborts.

It also seems like quite a few processes are leaking file descriptors. Running your first dbus-daemon example in the last comment under gdb will also cause the abort unless I manually set FD_CLOEXEC on fd 3-6 before attempting to launch the service. Likewise, running dbus-daemon with --fork will trigger the aborts.
Comment 9 Simon McVittie 2012-01-06 02:54:08 UTC
(In reply to comment #8)
> It doesn't seem like this code should get executed since it seems to be
> disabled in the non-debug package build, but commenting it out and rebuilding
> the package on precise and I can't reproduce the aborts.

Does rebuilding the same source with your current toolchain also fix it?

> It also seems like quite a few processes are leaking file descriptors. Running
> your first dbus-daemon example in the last comment under gdb will also cause
> the abort unless I manually set FD_CLOEXEC on fd 3-6 before attempting to
> launch the service.

Those fds probably come from gdb itself, similar to Bug #35173, which was fixed in 1.4.10.
Comment 10 Barry Warsaw 2012-01-06 08:04:08 UTC
On Jan 06, 2012, at 04:49 AM, bugzilla-daemon@freedesktop.org wrote:

>It doesn't seem like this code should get executed since it seems to be
>disabled in the non-debug package build, but commenting it out and rebuilding
>the package on precise and I can't reproduce the aborts.
>
>It also seems like quite a few processes are leaking file descriptors. Running
>your first dbus-daemon example in the last comment under gdb will also cause
>the abort unless I manually set FD_CLOEXEC on fd 3-6 before attempting to
>launch the service. Likewise, running dbus-daemon with --fork will trigger the
>aborts.

I'm still confused.  I think we've established that the child is SIGABRTing
when DBUS_FATAL_WARNINGS=1 because some warning occurs when dbus-launch execs
the child.  But I'm still at a loss to know exactly *which* warning is
occurring as there is no output that I ever see, even with DBUS_VERBOSE=1.
Nothing on stdout/stderr and nothing in any log file.  So perhaps the former
are getting closed before the warnings are occurring?  Have you actually seen
any warnings during normal operation when reproducing this problem?

I'm also trying to understand the difference between Ubuntu and Debian here,
as I'm fairly sure Ubuntu doesn't carry any relevant dbus delta from Debian,
so perhaps there's some other environmental interaction, that's only shown up
since Ubuntu 11.04 (IIRC).
Comment 11 Jason Conti 2012-01-06 09:36:59 UTC
(In reply to comment #9)
> Does rebuilding the same source with your current toolchain also fix it?

Do you mean rebuild dbus in oneiric with the passage above commented out, or 
just a no-change rebuild in precise without the passage commented to see if
it was a problem with the build?

I did the latter earlier in the day yesterday (but added nostrip and noopt) 
and was still getting the aborts.

(In reply to comment #10)
> I'm still confused.  I think we've established that the child is SIGABRTing
> when DBUS_FATAL_WARNINGS=1 because some warning occurs when dbus-launch execs
> the child.

Sorry, I kind of tacked it on as an afterthought at the end, what I meant was that it is not just with dbus-launch. I checked the code and the only major difference I saw was that dbus-launch execs dbus-daemon with --fork, and running:

DBUS_FATAL_WARNINGS=1 dbus-daemon --config-file test/tmp-session-bus.conf
--print-address --fork

and then your python example will abort just the same as the dbus-launch one. I
also get the aborts when I run:

DBUS_FATAL_WARNINGS=1 gdb --args dbus-daemon --config-file test/tmp-session-bus.conf --print-address

Which also prints the following:

Activating service name='org.freedesktop.DBus.TestSuitePythonService'
Fd 3 did not have the close-on-exec flag set!
  /bin/dbus-daemon(_dbus_print_backtrace+0x29) [0x8008f9c9]
  /bin/dbus-daemon(_dbus_abort+0x14) [0x800931a4]
  /bin/dbus-daemon(+0x75c82) [0x80075c82]
  /bin/dbus-daemon(_dbus_spawn_async_with_babysitter+0x490) [0x800995e0]
  /bin/dbus-daemon(bus_activation_activate_service+0x113e) [0x800180ce]
  /bin/dbus-daemon(+0x30dc0) [0x80030dc0]
  /bin/dbus-daemon(bus_driver_handle_message+0x6e8) [0x80033038]
  /bin/dbus-daemon(+0x2dbe9) [0x8002dbe9]
  /bin/dbus-daemon(dbus_connection_dispatch+0x8c3) [0x8004a7f3]
  /bin/dbus-daemon(_dbus_loop_dispatch+0x40) [0x80094a60]
  /bin/dbus-daemon(_dbus_loop_iterate+0x45f) [0x80094eff]
  /bin/dbus-daemon(_dbus_loop_run+0xa0) [0x80095230]
  /bin/dbus-daemon(main+0x740) [0x80013970]
  /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb7dfa113]
  /bin/dbus-daemon(+0x13dc9) [0x80013dc9]
Activated service 'org.freedesktop.DBus.TestSuitePythonService' failed: Process /bin/bash received signal 6

I initially thought this was just gdb making debugging difficult by leaking
the fd as Scott mentioned, and it is difficult to tell since I haven't been
able to get any output in the dbus-launch or dbus-daemon --fork cases.

I was curious if they were related though, so I commented out the above passage
in dbus-spawn.c, rebuilt and wasn't able to reproduce the aborts with gdb,
dbus-launch or dbus-daemon --fork, which is why I figured they were connected.

As it happens, the build with that code commented also allows make check to run
without issues even with DBUS_FATAL_WARNINGS=1 in the dbus-python code.
Comment 12 Jason Conti 2012-01-07 11:46:40 UTC
Decided to build a debian sid system for testing, and sure enough no aborts
as Simon mentions above (sorry for calling you Scott in the last comment).
But looking at the dbus-daemon binary, it is half the size on debian as in
ubuntu.

So I installed dbus-1-dbg on debian, and sure enough, using the debug binary
from that package (a binary which is not installed on ubuntu), and I can get
the aborts in debian as well.

It turns out in ubuntu we are (hopefully unintentionally) installing the
debug binary of dbus-daemon instead of the production version. This can be
seen in, for instance, the build log of the i386 version in precise:
https://launchpadlibrarian.net/88969187/buildlog_ubuntu-precise-i386.dbus_1.4.16-1ubuntu2_BUILDING.txt.gz

It installs the binary from build/bus to debian/tmp//bin and then proceeds
a bit later to overwrite it with the binary from build-debug/bus.

The ubuntu package adds --exec-prefix=/ to common_configure_flags in debian/rules
so that the binary ends up in /bin and somehow this is having the side effect
of causing the debug binary to overwrite it. I move the flag to the flags
for the production build and added back the debug binaries to
debian/dbus-1-dbg.install and everything seems to be working again.
Comment 13 Barry Warsaw 2012-01-10 07:28:11 UTC
Hi Jason, thanks for looking into this in such great detail.

On Jan 07, 2012, at 07:46 PM, bugzilla-daemon@freedesktop.org wrote:

>It turns out in ubuntu we are (hopefully unintentionally) installing the
>debug binary of dbus-daemon instead of the production version. This can be
>seen in, for instance, the build log of the i386 version in precise:
>https://launchpadlibrarian.net/88969187/buildlog_ubuntu-precise-i386.dbus_1.4.16-1ubuntu2_BUILDING.txt.gz

I talked to Martin Pitt about this and he said there's no good reason why
Ubuntu installs the debug version of dbus-daemon over top the production
version.  I think this was just a buggy solution for moving the binaries from
/usr/bin into /bin.

>It installs the binary from build/bus to debian/tmp//bin and then proceeds
>a bit later to overwrite it with the binary from build-debug/bus.
>
>The ubuntu package adds --exec-prefix=/ to common_configure_flags in
>debian/rules
>so that the binary ends up in /bin and somehow this is having the side effect
>of causing the debug binary to overwrite it. I move the flag to the flags
>for the production build and added back the debug binaries to
>debian/dbus-1-dbg.install and everything seems to be working again.

I've confirmed your change fixes the aborts in the test suite.  I double
checked the change with pitti and we agreed that it looks good.  The one
problem is that the debug binaries get installed in bogus paths, but the good
news is that it doesn't really affect anything :).  So we agreed not to worry
about that for now.

I'll upload this to Precise, but since we have no open bugs on it (other than
the one you opened :), I don't see any reason to back port it.

Thanks again!

(The freedesktop.org bug can be closed now.)
Comment 14 Simon McVittie 2012-05-09 08:09:31 UTC
It seems the bug was in Ubuntu's dbus packaging, so, closing.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.