Bug 15578

Summary: _dbus_watch_invalidate segfaults
Product: dbus
Reporter: toady
Component: core
Assignee: Havoc Pennington <hp>
Status: RESOLVED WORKSFORME
QA Contact: John (J5) Palmieri <johnp>
Severity: critical
Priority: medium
CC: hp, jw+debian, smcv
Version: 1.2.x
Keywords: NEEDINFO
Hardware: Other
OS: All
Whiteboard: NB#247014
Attachments:
- Patch (maybe) resolving this
- regression test which doesn't reproduce this bug
- make modular tests depend on GLib 2.22, for GSocket

Description toady 2008-04-18 02:21:19 UTC
Created attachment 15998
Patch (maybe) resolving this

Since I cannot enter the exact dbus version in Bugzilla: it is dbus-1.2.1.

Under intensive use of libnotify, I get a segfault in D-Bus:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40800950 (LWP 4172)]
0x00002ac6615bf436 in _dbus_watch_invalidate (watch=0x0) at dbus-watch.c:147
147	  watch->fd = -1;
(gdb) bt
#0  0x00002ac6615bf436 in _dbus_watch_invalidate (watch=0x0) at dbus-watch.c:147
#1  0x00002ac6615bd985 in free_watches (transport=0x6bb1e0) at dbus-transport-socket.c:82
#2  0x00002ac6615be707 in socket_disconnect (transport=0x6bb1e0) at dbus-transport-socket.c:908
#3  0x00002ac6615bcc0d in _dbus_transport_disconnect (transport=0x6bb1e0) at dbus-transport.c:494
#4  0x00002ac6615bd5ef in _dbus_transport_queue_messages (transport=0x6bb1e0) at dbus-transport.c:1137
#5  0x00002ac6615a4aa8 in _dbus_connection_get_dispatch_status_unlocked (connection=0x6bb750) at dbus-connection.c:3962
#6  0x00002ac6615a27fc in check_for_reply_and_update_dispatch_unlocked (connection=0x6bb750, pending=0x2aaaac002090) at dbus-connection.c:2223
#7  0x00002ac6615a29df in _dbus_connection_block_pending_call (pending=0x2aaaac002090) at dbus-connection.c:2325
#8  0x00002ac6615b6cf4 in dbus_pending_call_block (pending=0x2aaaac002090) at dbus-pending-call.c:707
#9  0x00002ac661381e58 in dbus_g_proxy_end_call_internal (proxy=0x6a8980, call_id=20, error=0x407ffe80, first_arg_type=28, args=0x407ffc50) at dbus-gproxy.c:2221
#10 0x00002ac661383867 in dbus_g_proxy_call (proxy=0x6a8980, method=0x2ac66116d0d9 "Notify", error=0x407ffe80, first_arg_type=28) at dbus-gproxy.c:2531
#11 0x00002ac66116bfcb in notify_notification_show (notification=<value optimized out>, error=0x0) at notification.c:768

The attached patch *may* fix this. What I've seen is watch=0x0, because the pointer is cleared in free_watches: _dbus_connection_remove_watch_unlocked clears the pointer, and this pointer is later used by _dbus_watch_invalidate and _dbus_watch_unref.

I am not sure about my patch, because I don't know whether we should clean up if we do not have transport->connection. But it seems to fix my problem.
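
To make the failure mode concrete, here is a minimal, self-contained sketch of the defensive pattern: skip a watch slot that has already been cleared instead of dereferencing it. This is not the attached patch and not libdbus code; `Watch`, `Transport`, `free_one_watch` and `transport_free_watches` are hypothetical stand-ins invented for illustration. A guard like this only hides the crash; it does not explain why the pointer was NULL in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT the libdbus types. */
typedef struct {
    int fd;
    int refcount;
} Watch;

typedef struct {
    Watch *read_watch;   /* may already be NULL if it was cleared elsewhere */
    Watch *write_watch;
} Transport;

/* Mark a watch as unusable. Dereferences the pointer, so NULL would crash. */
static void watch_invalidate(Watch *watch)
{
    assert(watch != NULL);
    watch->fd = -1;
}

/* Drop one reference; the caller must own it. */
static void watch_unref(Watch *watch)
{
    assert(watch != NULL && watch->refcount > 0);
    watch->refcount--;
}

/* Release one watch slot, tolerating a slot that was already cleared. */
static void free_one_watch(Watch **slot)
{
    if (*slot == NULL)
        return;              /* nothing to release; avoids the NULL dereference */
    watch_invalidate(*slot);
    watch_unref(*slot);
    *slot = NULL;            /* makes a second call a harmless no-op */
}

/* Safe to call repeatedly, and safe when either slot is already NULL. */
static void transport_free_watches(Transport *transport)
{
    free_one_watch(&transport->read_watch);
    free_one_watch(&transport->write_watch);
}
```

Calling `transport_free_watches` twice on the same `Transport` is harmless in this sketch, which is the property the crashing code path lacked.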

I don't know whether it is a consequence of my patch, but I get another segfault afterwards:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40800950 (LWP 19001)]
0x00002b7e7c99d1ae in ?? () from /lib/libc.so.6
(gdb) bt
#0  0x00002b7e7c99d1ae in ?? () from /lib/libc.so.6
#1  0x00002b7e7c99c2c0 in memmove () from /lib/libc.so.6
#2  0x00002b7e7a201384 in delete (real=0x680248, start=0, len=72) at dbus-string.c:1418
#3  0x00002b7e7a2013d9 in _dbus_string_delete (str=0x680248, start=0, len=72) at dbus-string.c:1443
#4  0x00002b7e7a1f0828 in load_message (loader=0x680240, message=0x682340, byte_order=108, fields_array_len=53, header_len=72, body_len=0) at dbus-message.c:3546
#5  0x00002b7e7a1f092b in _dbus_message_loader_queue_messages (loader=0x680240) at dbus-message.c:3620
#6  0x00002b7e7a1f9524 in _dbus_transport_get_dispatch_status (transport=0x680070) at dbus-transport.c:1080
#7  0x00002b7e7a1f95cc in _dbus_transport_queue_messages (transport=0x680070) at dbus-transport.c:1107
#8  0x00002b7e7a1e0aa8 in _dbus_connection_get_dispatch_status_unlocked (connection=0x6806e0) at dbus-connection.c:3962
#9  0x00002b7e7a1dfd9b in dbus_connection_send_with_reply (connection=0x6806e0, message=0x680d80, pending_return=0x407ff7e8, timeout_milliseconds=-1) at dbus-connection.c:3230
#10 0x00002b7e79fbdcc2 in dbus_g_proxy_begin_call_internal (proxy=0x6761e0, method=0x2b7e79fc8f2d "GetNameOwner", notify=0x2b7e79fbac31 <got_name_owner_cb>, user_data=0x676180, 
    destroy=0, args=0x2aaaac0dc560, timeout=-1) at dbus-gproxy.c:2164
#11 0x00002b7e79fbd3e4 in manager_begin_bus_call (manager=0x684850, method=0x2b7e79fc8f2d "GetNameOwner", notify=0x2b7e79fbac31 <got_name_owner_cb>, user_data=0x676180, 
    destroy=0, first_arg_type=64) at dbus-gproxy.c:1791
#12 0x00002b7e79fbb2cf in dbus_g_proxy_manager_register (manager=0x684850, proxy=0x676180) at dbus-gproxy.c:963
#13 0x00002b7e79fbc0e4 in dbus_g_proxy_constructor (type=6814112, n_construct_properties=4, construct_properties=0x67f6f0) at dbus-gproxy.c:1349
#14 0x00002b7e7ba074d0 in g_object_newv () from /usr/lib/libgobject-2.0.so.0
#15 0x00002b7e7ba07ed6 in g_object_new_valist () from /usr/lib/libgobject-2.0.so.0
#16 0x00002b7e7ba08101 in g_object_new () from /usr/lib/libgobject-2.0.so.0
#17 0x00002b7e79fbd4dd in dbus_g_proxy_new (connection=0x6806e8, name=0x2b7e79da8e57 "org.freedesktop.Notifications", path_name=0x2b7e79da8e78 "/org/freedesktop/Notifications", 
    interface_name=0x2b7e79da8e57 "org.freedesktop.Notifications") at dbus-gproxy.c:1859
#18 0x00002b7e79fbd5bb in dbus_g_proxy_new_for_name (connection=0x6806e8, name=0x2b7e79da8e57 "org.freedesktop.Notifications", 
    path_name=0x2b7e79da8e78 "/org/freedesktop/Notifications", interface_name=0x2b7e79da8e57 "org.freedesktop.Notifications") at dbus-gproxy.c:1907
Comment 1 John (J5) Palmieri 2008-04-18 07:02:31 UTC
No, there is a missing ref on the watch here. We don't clear the watch. Either the problem is in dbus-glib (for example, they are unreffing a watch they do not own, or are passing off ownership of it), or we are not checking whether the watch still exists before we invalidate it.
Comment 2 John (J5) Palmieri 2008-04-18 08:05:32 UTC
Looking further, this happens when you get a corrupt message. I still can't figure this out. We ref in dbus-transport-socket.c:_dbus_transport_new_for_socket for the object reference, and in dbus-watch.c:_dbus_watch_list_add_watch for the list reference. We then unref in _dbus_watch_list_remove_watch, which should leave one more ref.

The only thing I can see being an issue is the virtual watch_list->remove_watch_function call.

Can you do me a favor: go into gdb, break at around line 395 of dbus-watch.c (inside _dbus_watch_list_remove_watch), and see whether the function it calls has an unref in it? Thanks.
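
The reference counting described in this comment can be modelled in a few lines. The sketch below is a simplified model, not the libdbus implementation: `Watch`, `WatchList`, `watch_list_add`, `watch_list_remove` and the two callbacks are hypothetical names. It assumes one reference held by the creator and one by the list, and shows how a remove callback that unrefs a watch it does not own would consume the creator's reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the ref counting discussed above; not the libdbus API. */
typedef struct {
    int refcount;
} Watch;

typedef void (*RemoveWatchFn)(Watch *watch);

typedef struct {
    Watch *watch;             /* single-slot "list" keeps the sketch short */
    RemoveWatchFn remove_fn;  /* application-supplied callback, may be NULL */
} WatchList;

static void watch_ref(Watch *w)
{
    w->refcount++;
}

static void watch_unref(Watch *w)
{
    assert(w->refcount > 0);
    w->refcount--;
}

/* Adding takes a reference on behalf of the list. */
static void watch_list_add(WatchList *list, Watch *w)
{
    watch_ref(w);
    list->watch = w;
}

/* Removing notifies the application, then drops only the list's reference. */
static void watch_list_remove(WatchList *list)
{
    Watch *w = list->watch;
    if (w == NULL)
        return;
    if (list->remove_fn != NULL)
        list->remove_fn(w);
    list->watch = NULL;
    watch_unref(w);
}

/* Well-behaved callback: does not touch the reference count. */
static void good_remove_cb(Watch *w)
{
    (void) w;
}

/* Buggy callback: releases a reference it does not own. */
static void buggy_remove_cb(Watch *w)
{
    watch_unref(w);
}

/* Returns how many references remain after one add followed by one remove. */
static int refs_after_add_remove(RemoveWatchFn cb)
{
    Watch w = { 1 };                 /* the creator's own reference */
    WatchList list = { NULL, cb };
    watch_list_add(&list, &w);       /* list takes a second reference */
    watch_list_remove(&list);
    return w.refcount;
}
```

With the well-behaved callback one reference remains, as the comment expects. With the buggy callback the count reaches zero while the creator still believes it holds a reference, which is the kind of imbalance the gdb check above is meant to rule in or out.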
Comment 3 Simon McVittie 2011-01-19 07:15:34 UTC
J5 asked for some more information a while ago.

Is there some code we can run (a particular program with a particular libnotify version, perhaps) to reproduce this?
Comment 4 Simon McVittie 2011-04-14 04:36:52 UTC
*** Bug 24412 has been marked as a duplicate of this bug. ***
Comment 5 Simon McVittie 2011-04-15 04:25:59 UTC
Taking this: I've seen a remarkably similar crash in another project with a patched version of dbus-1.4.6.
Comment 6 Simon McVittie 2011-04-15 05:35:46 UTC
Created attachment 45662
regression test which doesn't reproduce this bug

I was hoping this test would reproduce this bug, but apparently it's not this simple. I think it's still worth committing...
Comment 7 Simon McVittie 2011-04-15 05:36:51 UTC
Created attachment 45663
make modular tests depend on GLib 2.22, for GSocket

Attachment #45662 requires this patch, and the infrastructure from Bug #34570.
Comment 8 Simon McVittie 2011-04-18 07:25:16 UTC
(In reply to comment #5)
> I've seen a remarkably similar crash in another project

The other project seems to be invoking dbus_connection_send from a non-main thread without initializing libdbus thread-locking, which seems likely to be what broke it. Could that be the cause here too?
Comment 9 Simon McVittie 2012-02-08 06:23:02 UTC
The patches here have been applied. Nobody is working on this, so back to NEW.

(In reply to comment #8)
> (In reply to comment #5)
> > I've seen a remarkably similar crash in another project
> 
> The other project seems to be invoking dbus_connection_send from a non-main
> thread without initializing libdbus thread-locking, which seems likely to be
> what broke it. Could that be the cause here too?

Was that the problem here? If so, please resolve as INVALID.
Comment 10 Simon McVittie 2014-09-10 15:25:47 UTC
(In reply to comment #9)
> > The other project seems to be invoking dbus_connection_send from a non-main
> > thread without initializing libdbus thread-locking, which seems likely to be
> > what broke it. Could that be the cause here too?
> 
> Was that the problem here? If so, please resolve as INVALID.

I'm going to assume that that was the case here too.

(In reply to comment #6)
> Created attachment 45662
> regression test which doesn't reproduce this bug

Applied in 2011.

(In reply to comment #7)
> Created attachment 45663
> make modular tests depend on GLib 2.22, for GSocket
> 
> Attachment #45662 requires this patch, and the infrastructure from
> Bug #34570.

Applied in 2011.
