Summary: | colord-sane: segfault with error 6 in libdbus-1.so.3.7.0 | ||
---|---|---|---|
Product: | dbus | Reporter: | Paul Menzel <paulepanter> |
Component: | core | Assignee: | Havoc Pennington <hp> |
Status: | RESOLVED NOTOURBUG | QA Contact: | John (J5) Palmieri <johnp> |
Severity: | normal | ||
Priority: | medium | CC: | paulepanter, richard |
Version: | 1.5 | Keywords: | have-backtrace |
Hardware: | All | ||
OS: | All | ||
URL: | http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=668225 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | logs for shutdown hanging and the start afterwards |
Description
Paul Menzel
2012-05-15 13:48:55 UTC
(In reply to comment #0) > I was told my D-Bus setup is messed up. But I was not told > what might be wrong and still programs should not crash because of that. Yes, they should. Frameworks that map core libraries like libdbus-1.so.3.7.0 in process can't do *anything* if something inside the linked library crashes. It's just not possible to recover on a UNIX-like shared library model. > I am very interested how my D-Bus setup is wrong and how I can remedy > that. DBus has to work correctly, or a lot of things are going to crash. > In the meanwhile it would be great if colord could be fixed. I really don't think this is a colord problem. It's segfaulting inside libdbus-1.so.3.7.0... (In reply to comment #1) > (In reply to comment #0) > > I was told my D-Bus setup is messed up. But I was not told > > what might be wrong and still programs should not crash because of that. > > Yes, they should. Frameworks that map core libraries like libdbus-1.so.3.7.0 in > process can't do *anything* if something inside the linked library crashes. > It's just not possible to recover on a UNIX-like shared library model. You are right. I mixed that up. > > I am very interested how my D-Bus setup is wrong and how I can remedy > > that. > > DBus has to work correctly, or a lot of things are going to crash. > > > In the meanwhile it would be great if colord could be fixed. > > I really don't think this is a colord problem. It's segfaulting inside > libdbus-1.so.3.7.0... True again. I therefore reopen this ticket and change the product to D-Bus. Here is some version information. $ dpkg -l libdbus-1-3 […] ii libdbus-1-3:i386 1.5.12-1 simple interprocess messaging system (library) $ dpkg -L libdbus-1-3 /. /lib /lib/i386-linux-gnu /lib/i386-linux-gnu/libdbus-1.so.3.7.0 […] Please tell me what other information I can provide. Ok, in IRC channel #dbus (IRC) I was told to capture a backtrace. Since the segfaults mostly happen during start up I will have to figure out a way to start colord-sane automatically from GDB. Or a way to get a coredump I guess. On June 6th DBus was upgraded to 1.6.0-1. libdbus-1-3:i386 1.5.12-1 -> 1.6.0-1 The crash in libdbus-1.so.3.7.1 was still there. /var/log/syslog.6.gz:Jun 27 [ 7984.923591] colord-sane[3507]: segfault at 19 ip b55de34a sp b55b1fb8 error 6 in libdbus-1.so.3.7.1[b55b3000+49000] /var/log/syslog.5.gz:Jun 27 [ 1837.738688] colord-sane[3510]: segfault at 19 ip b56b834a sp b568bfb8 error 6 in libdbus-1.so.3.7.1[b568d000+49000] /var/log/syslog.5.gz:Jun 27 [ 965.881639] colord-sane[3518]: segfault at 19 ip b560734a sp b7167fb8 error 6 in libdbus-1.so.3.7.1[b55dc000+49000] /var/log/syslog.5.gz:Jun 27 [ 3660.161146] colord-sane[3491]: segfault at 20474e4d ip b565c342 sp b562ffb8 error 6 in libdbus-1.so.3.7.1[b5631000+49000] /var/log/syslog.5.gz:Jun 27 [ 1545.183620] colord-sane[3499]: segfault at 19 ip b568c34a sp b565ffb8 error 6 in libdbus-1.so.3.7.1[b5661000+49000] /var/log/syslog.4.gz:Jun 29 [11734.265144] colord-sane[3502]: segfault at 4 ip b55db342 sp b713bfb8 error 6 in libdbus-1.so.3.7.1[b55b0000+49000] After this `libdbus-1-3` was upgraded to 1.6.2-1 and I have not seen the crash since. There is a new crash though now. /var/log/syslog.1.gz:Jul 3 00:48:31 mattotaupa kernel: [19959.302068] colord-sane[3119]: segfault at 0 ip b736a935 sp bfee3c8c error 4 in libc-2.13.so[b72d4000+156000] So it looks like this has been fixed in DBus. (In reply to comment #4) > Ok, in IRC channel #dbus (IRC) I was told to capture a backtrace. Since the > segfaults mostly happen during start up I will have to figure out a way to > start colord-sane automatically from GDB. Or a way to get a coredump I guess. Yes, please. There isn't enough information here to locate the bug. Richard: perhaps you can suggest how to obtain a stack trace from colord-sane? (In reply to comment #6) > (In reply to comment #4) > > Ok, in IRC channel #dbus (IRC) I was told to capture a backtrace. Since the > > segfaults mostly happen during start up I will have to figure out a way to > > start colord-sane automatically from GDB. Or a way to get a coredump I guess. > > Yes, please. There isn't enough information here to locate the bug. > > Richard: perhaps you can suggest how to obtain a stack trace from colord-sane? 1. I guess this is due to something, that D-Bus is not started fully in the beginning, because it is only happens during start up. 2. I tried to set up my system to do core dumps. $ more /etc/limits.conf * C2048000 $ grep unlimited /etc/security/limits.conf * soft core unlimited But that does not seem to work since there is no message »Core dumped«. colord-sane[3657]: segfault at 4 ip b5614362 sp b7171fb8 error 6 in libdbus-1.so.3.7.2[b55e9000+49000] At least I do not know where to look for the dumps if they are created. 3. Simon, is there any way to get to the debugging symbols without building the D-Bus package myself? If you have them on your system, you should be able to do $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 or the other hex number to find out where it segfaults. (In reply to comment #7) > 3. Simon, is there any way to get to the debugging symbols without building the > D-Bus package myself? apt-get install dbus-1-dbg > If you have them on your system, you should be able to do > > $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 > > or the other hex number to find out where it segfaults. A full backtrace would be much more useful than the segfaulting line, but yes, that's a good idea. I'll try that when I have some time to look into this bug. (Anyone else interested in this bug: any extra information would be welcome). (In reply to comment #8) > (In reply to comment #7) > > 3. Simon, is there any way to get to the debugging symbols without building the > > D-Bus package myself? > > apt-get install dbus-1-dbg > > > If you have them on your system, you should be able to do > > > > $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 > > > > or the other hex number to find out where it segfaults. > > A full backtrace would be much more useful than the segfaulting line, Alright. Next thing I try and I hope will hopefully get a backtrace is changing in /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service the `Exec` call to the following. Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane Hopefully that will get us a core dump. > but yes, that's a good idea. I'll try that when I have some time to look into this bug. > (Anyone else interested in this bug: any extra information would be welcome). Since I have no clue about all this I found the following two Web pages quite useful. Although both state that such things happening in shared libraries makes it more difficult to debug. But the first one offers a tutorial for that anyhow. [1] http://coredump.io/blog/2012/02/23/debugging-segfaults-from-logs-to-gdb/ [2] http://stackoverflow.com/questions/2549214/interpreting-segfault-messages (In reply to comment #9) > (In reply to comment #8) […] > > A full backtrace would be much more useful than the segfaulting line, > > Alright. Next thing I try and I hope will hopefully get a backtrace is changing > in > > /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service > > the `Exec` call to the following. > > Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane > > Hopefully that will get us a core dump. Unfortunately that does not work. dbus[1807]: [system] Activating service name='org.freedesktop.colord-sane' (using servicehelper) dbus[1807]: [system] Activated service 'org.freedesktop.colord-sane' failed: Cannot launch daemon, file not found or permissions invalid Any hints on how to start /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service manually with some `dbus` foo? […] As an update, on another system the crash happens when the system is shutting down. colord-sane[2629]: segfault at 8 ip 00007f410489f66c sp 00007f4104875b78 error 6 in libdbus-1.so.3.7.2[7f4104877000+44000] For those not knowing already, the difference of +49000 and + 44000 seems to origin in the used architecture (i386 vs. amd64). (In reply to comment #9) > (In reply to comment #8) > > (In reply to comment #7) > > > 3. Simon, is there any way to get to the debugging symbols without building the > > > D-Bus package myself? > > > > apt-get install dbus-1-dbg > > > > > If you have them on your system, you should be able to do > > > > > > $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 > > > > > > or the other hex number to find out where it segfaults. > > > > A full backtrace would be much more useful than the segfaulting line, [ Unsuccessful in getting a backtrace. ] > > but yes, that's a good idea. I'll try that when I have some time to look into this bug. > > (Anyone else interested in this bug: any extra information would be welcome). > > Since I have no clue about all this I found the following two Web pages quite > useful. Although both state that such things happening in shared libraries > makes it more difficult to debug. But the first one offers a tutorial for that > anyhow. With the help of thiago in #dbus on irc.freenode.net, I found out the following. 1. thiago looking into the Linux source code printk("%s%s[%lx+%lx]", prefix, p, vma->vm_start, vma->vm_end - vma->vm_start); he found out that in the line colord-sane[3657]: segfault at 4 ip b5614362 sp b7171fb8 error 6 in libdbus-1.so.3.7.2[b55e9000+49000] we are interested in the address after `ip`. To get the pointer, we calculate the difference b5614362 - b55e9000, which is 0x2b362. `addr2line` still does not give us any code line. $ echo 0x2b362 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 ??:0 $ echo 2b362 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 ??:0 But using GDB we get the following. $ gdb -q /lib/i386-linux-gnu/libdbus-1.so.3.7.2 Reading symbols from /lib/i386-linux-gnu/libdbus-1.so.3.7.2...Reading symbols from /usr/lib/debug/.build-id/f0/8f17cd1f791e9fad9f8528f7146a467c3736b5.debug...done. done. (gdb) p/a 0x2B362 $1 = 0x2b362 <_dbus_list_unlink+18> (gdb) l *0x2B362 0x2b362 is in _dbus_list_unlink (../../dbus/dbus-list.c:498). 493 ../../dbus/dbus-list.c: Datei oder Verzeichnis nicht gefunden. > [1] http://coredump.io/blog/2012/02/23/debugging-segfaults-from-logs-to-gdb/ > [2] http://stackoverflow.com/questions/2549214/interpreting-segfault-messages Calculating all the locations, it looks like it happens in two different places. libdbus-1.so.3.7.0 (i386 userspace with amd64 Linux kernel) 00000000f571d482-f56f2000 2B482 00000000f562748a-f55fc000 2B48A 00000000f564a482-f561f000 2B482 libdbus-1.so.3.7.1 (i386 userspace and Linux kernel) b55de34a-b55b3000 2B34A b56b834a-b568d000 2B34A b560734a-b55dc000 2B34A b565c342-b5631000 2B342 b568c34a-b5661000 2B34A b55db342-b55b0000 2B342 libdbus-1.so.3.7.2 (i386 userspace and Linux kernel) b5614362-b55e9000 2B362 libdbus-1.so.3.7.2 (amd64 userspace and Linux kernel) 00007f410489f66c-7f4104877000 2866C Probably a script could be written to do that automatically using bc. Please note that bc needs all letters for hex numbers in upper case [1]. See `man bash` for that. $ bc bc 1.06.95 Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc. This is free software with ABSOLUTELY NO WARRANTY. For details type `warranty'. ibase=16 F+F 30 obase=F F+F 20 obase=10 F+F 1E 00000000f564a482-f561f000 (standard_in) 7: syntax error 00000000F564A482-F561F000 2B482 [1] http://doc-tcpip.org/Allgemein/bc.convert.html It also fails with the latest packages of version 1.6.8-1 in Debian Sid/unstable [1]. [ 2599.845031] colord-sane[2680]: segfault at 41 ip 00007f13d9b347d7 sp 00007f13cc9ceb78 error 6 in libdbus-1.so.3.7.2[7f13d9b0c000+44000] [1] http://packages.debian.org/changelogs/pool/main/d/dbus/dbus_1.6.8-1/changelog Recent colord uploads to Debian: http://packages.qa.debian.org/c/colord.html suggest that this might have been a colord bug all along: * debian/patches/fix_colord-sane_dbus_threading.patch: - Initialise libdbus threadsafety in colord-sane. (Probably) fixes multiple crashes from colord-sane in libdbus (LP: #844286). Plausibly fixes other bad colord-sane behaviour, such as 100% CPU usage (Closes: 668325) libdbus isn't thread-safe unless you tell it to be. Unfortunately, we can't make it thread-safe-by-default[1], because it supports running on systems where it might run out of memory and see malloc() returning NULL, and becoming thread-safe requires allocating memory (so can fail). [1] or at least, not in a nice way: some entry points have no way to report an error, so if allocating memory for mutexes failed at one of those entry points, it'd have to either block until more memory was available, or abort() (In reply to comment #16) > Recent colord uploads to Debian: > > http://packages.qa.debian.org/c/colord.html > > suggest that this might have been a colord bug all along Forgot to say: so please try with colord (>= 0.1.21-3) and see if that fixes it. (In reply to comment #17) > (In reply to comment #16) > > Recent colord uploads to Debian: > > > > http://packages.qa.debian.org/c/colord.html > > > > suggest that this might have been a colord bug all along > > Forgot to say: so please try with colord (>= 0.1.21-3) and see if that fixes > it. I forgot to say, that I had these updated packages already installed. The result is that the segmentation fault (as pasted above) now happens during shutdown and not during the start. For best results, have debugging symbols for at least libdbus and colord (install dbus-1-dbg, and rebuild colord with debug symbols as per <http://wiki.debian.org/HowToGetABacktrace> since it doesn't seem to have a corresponding -dbg package). (In reply to comment #9) > Alright. Next thing I try and I hope will hopefully get a backtrace is > changing in > > /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service > > the `Exec` call to the following. > > Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane I think D-Bus requires the Exec in the .service file to be a (full path to a) command + arguments, so you could use: Exec=/bin/sh -c "ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane" Having obtained a core, you can turn that into a backtrace with: gdb /usr/lib/*/colord/colord-sane core (at the gdb prompt) thread apply all bt full (In reply to comment #14) > libdbus-1.so.3.7.0 (i386 userspace with amd64 Linux kernel) > 00000000f571d482-f56f2000 2B482 I'm afraid this is meaningless to us; even if we knew which function this was in (which in principle we could work out with objdump or something), we also need to know which function it was called from, which function called *that*, and so on. That's the information a full backtrace would give us. (In reply to comment #19) > For best results, have debugging symbols for at least libdbus and colord > (install dbus-1-dbg, and rebuild colord with debug symbols as per > <http://wiki.debian.org/HowToGetABacktrace> since it doesn't seem to have a > corresponding -dbg package). Sure. > (In reply to comment #9) > > Alright. Next thing I try and I hope will hopefully get a backtrace is > > changing in > > > > /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service > > > > the `Exec` call to the following. > > > > Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane > > I think D-Bus requires the Exec in the .service file to be a (full path to > a) command + arguments, so you could use: > > Exec=/bin/sh -c "ulimit -c unlimited && > /usr/lib/i386-linux-gnu/colord/colord-sane" Thank you very much. On one system this seems to work. On a different one I got the following. dbus[1980]: [system] Activating service name='org.freedesktop.ColorManager' (using servicehelper) dbus[1980]: [system] Successfully activated service 'org.freedesktop.ColorManager' dbus[1980]: [system] Activating service name='org.freedesktop.colord-sane' (using servicehelper) dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed: Launch helper exited with unknown return code 127 This is on a SSD system. I probably need to pass some debug options to get more information. > Having obtained a core, you can turn that into a backtrace with: > > gdb /usr/lib/*/colord/colord-sane core > (at the gdb prompt) thread apply all bt full > > (In reply to comment #14) > > libdbus-1.so.3.7.0 (i386 userspace with amd64 Linux kernel) > > 00000000f571d482-f56f2000 2B482 > > I'm afraid this is meaningless to us; even if we knew which function this > was in (which in principle we could work out with objdump or something), we > also need to know which function it was called from, which function called > *that*, and so on. That's the information a full backtrace would give us. Sure. Unfortunately nothing could be captured until now. :/ Thanks for the help and hopefully I will succeed in getting a backtrace. (In reply to comment #20) > dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed: > Launch helper exited with unknown return code 127 Shells exit with 127 when they can't find an executable. Is this other system perhaps running amd64 userland rather than i386, so it has /usr/lib/x86_64-linux-gnu/colord/colord-sane instead? (In reply to comment #21) > (In reply to comment #20) > > dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed: > > Launch helper exited with unknown return code 127 > > Shells exit with 127 when they can't find an executable. Is this other > system perhaps running amd64 userland rather than i386, so it has > /usr/lib/x86_64-linux-gnu/colord/colord-sane instead? Spot on! I will make that change and tomorrow we will see if it will work. Thanks! (In reply to comment #22) > (In reply to comment #21) > > (In reply to comment #20) > > > dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed: > > > Launch helper exited with unknown return code 127 > > > > Shells exit with 127 when they can't find an executable. Is this other > > system perhaps running amd64 userland rather than i386, so it has > > /usr/lib/x86_64-linux-gnu/colord/colord-sane instead? > > Spot on! I will make that change and tomorrow we will see if it will work. > Thanks! Alright. First I could not find any core files. Probably that was due to that `/usr/lib/x86_64-linux-gnu/colord/` is only writable by root. `man core` gives some hints how to change the location and luckily on this system `/tmp` is on the disk and not in RAM, so echo "/tmp/core.%p.%E" | sudo tee /proc/sys/kernel/core_pattern is one solution to get the core dump file saved. Thread 3 (Thread 0x7f70ea836700 (LWP 2247)): #0 0x00007f70ec47fac3 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87 _a3 = -1 _a1 = 140122838279216 resultvar = <optimized out> _a2 = 3 oldtype = 0 result = <optimized out> #1 0x00007f70ecbe5624 in g_main_context_poll (n_fds=3, fds=0x7f70e4002430, timeout=-1, context=0x15ff640, priority=<optimized out>) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3440 poll_func = 0x7f70ecbf3410 <g_poll> #2 g_main_context_iterate (context=0x15ff640, block=block@entry=1, dispatch=dispatch@entry=1, self=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3141 max_priority = 2147483647 timeout = -1 some_ready = <optimized out> nfds = 3 allocated_nfds = <optimized out> fds = 0x7f70e4002430 #3 0x00007f70ecbe5a82 in g_main_loop_run (loop=0x15ff5f0) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3340 __PRETTY_FUNCTION__ = "g_main_loop_run" #4 0x00007f70ed5bba86 in gdbus_shared_thread_func (user_data=0x15ff610) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./gio/gdbusprivate.c:277 data = 0x15ff610 #5 0x00007f70ecc07f45 in g_thread_proxy (data=0x15f8370) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gthread.c:801 thread = 0x15f8370 #6 0x00007f70ec73fb50 in start_thread (arg=<optimized out>) at pthread_create.c:304 __res = <optimized out> pd = 0x7f70ea836700 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140122947544832, -509398390679298109, 140122955934736, 140122947545536, 140123008782400, 3, 436732454272212931, 436735068657655747}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> freesize = <optimized out> __PRETTY_FUNCTION__ = "start_thread" #7 0x00007f70ec48a70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 No locals. #8 0x0000000000000000 in ?? () No symbol table info available. Thread 2 (Thread 0x7f70ee2797a0 (LWP 2233)): #0 0x00007f70ec47fac3 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87 _a3 = -1 _a1 = 23022384 resultvar = <optimized out> _a2 = 2 oldtype = 0 result = <optimized out> #1 0x00007f70ecbe5624 in g_main_context_poll (n_fds=2, fds=0x15f4b30, timeout=-1, context=0x15e6530, priority=<optimized out>) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3440 poll_func = 0x7f70ecbf3410 <g_poll> #2 g_main_context_iterate (context=0x15e6530, block=block@entry=1, dispatch=dispatch@entry=1, self=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3141 max_priority = 2147483647 timeout = -1 some_ready = <optimized out> nfds = 2 allocated_nfds = <optimized out> fds = 0x15f4b30 #3 0x00007f70ecbe5a82 in g_main_loop_run (loop=0x15e64b0) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3340 __PRETTY_FUNCTION__ = "g_main_loop_run" #4 0x00000000004021fa in main () No symbol table info available. Thread 1 (Thread 0x7f70db77b700 (LWP 2665)): #0 0x00007f70ec97d7cc in _dbus_list_unlink (list=list@entry=0x7f70db77aba8, link=link@entry=0x16b8180) at ../../dbus/dbus-list.c:498 No locals. #1 0x00007f70ec97d9f9 in _dbus_list_pop_first_link (list=list@entry=0x7f70db77aba8) at ../../dbus/dbus-list.c:625 link = 0x16b8180 #2 0x00007f70ec961e5b in _dbus_connection_unlock (connection=connection@entry=0x7f70ec734eb8) at ../../dbus/dbus-connection.c:424 expired_messages = 0x16b8180 iter = <optimized out> #3 0x00007f70ec962929 in dbus_connection_get_is_connected (connection=0x7f70ec734eb8) at ../../dbus/dbus-connection.c:2936 res = 1 __FUNCTION__ = "dbus_connection_get_is_connected" #4 0x00007f70db781096 in avahi_client_is_connected (client=0x16991e0) at client.c:901 No locals. #5 0x00007f70db785134 in avahi_service_browser_free (b=0x169bee0) at browser.c:698 client = 0x16991e0 r = 0 __PRETTY_FUNCTION__ = "avahi_service_browser_free" #6 0x00007f70dbba2668 in net_avahi_callback (c=0x16991e0, state=<optimized out>, userdata=<optimized out>) at net.c:854 proto = <optimized out> error = -23 #7 0x00007f70db7814ae in filter_func (bus=<optimized out>, message=0x169ca40, userdata=0x16991e0) at client.c:300 client = 0x16991e0 error = {name = 0x0, message = 0x0, dummy1 = 1, dummy2 = 0, dummy3 = 0, dummy4 = 1, dummy5 = 1, padding1 = 0x7f70db77acf8} __PRETTY_FUNCTION__ = "filter_func" #8 0x00007f70ec96553e in dbus_connection_dispatch (connection=0x1699810) at ../../dbus/dbus-connection.c:4631 filter = <optimized out> next = 0x0 message = 0x169ca40 link = <optimized out> filter_list_copy = 0x1654640 message_link = 0x1654778 result = DBUS_HANDLER_RESULT_NOT_YET_HANDLED pending = <optimized out> reply_serial = <optimized out> status = <optimized out> found_object = <optimized out> __FUNCTION__ = "dbus_connection_dispatch" #9 0x00007f70db7888b6 in dispatch_timeout_callback (t=<optimized out>, userdata=<optimized out>) at ../avahi-common/dbus-watch-glue.c:105 d = 0x164f1d0 #10 0x00007f70db9963f0 in avahi_simple_poll_dispatch (s=0x1699130) at simple-watch.c:570 next_timeout = 0x1699280 w = <optimized out> __PRETTY_FUNCTION__ = "avahi_simple_poll_dispatch" #11 0x00007f70db9966dd in avahi_simple_poll_loop (s=0x1699130) at simple-watch.c:646 r = <optimized out> __PRETTY_FUNCTION__ = "avahi_simple_poll_loop" #12 0x00007f70db99675c in thread (userdata=0x16990e0) at thread-watch.c:71 p = 0x16990e0 mask = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}} #13 0x00007f70ec73fb50 in start_thread (arg=<optimized out>) at pthread_create.c:304 __res = <optimized out> pd = 0x7f70db77b700 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140122695120640, -509398390679298109, 140122980131232, 140122695121344, 140123008782400, 3, 436838390177439683, 436735068657655747}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> freesize = <optimized out> __PRETTY_FUNCTION__ = "start_thread" #14 0x00007f70ec48a70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 No locals. #15 0x0000000000000000 in ?? () No symbol table info available. quit So looking at the trace (Why did Bugzilla not collapse it?), it looks like Avahi and libsane are involved. (In reply to comment #24) > Thread 1 (Thread 0x7f70db77b700 (LWP 2665)): This is presumably the thread that caught the segfault, since the others are blocking in poll() which isn't in libdbus. > #0 0x00007f70ec97d7cc in _dbus_list_unlink (list=list@entry=0x7f70db77aba8, > link=link@entry=0x16b8180) at ../../dbus/dbus-list.c:498 > No locals. This line is: link->prev->next = link->next; which can only fail if the linked list is somehow corrupt (libdbus uses circular linked lists, so prev and next are always non-NULL). > #2 0x00007f70ec961e5b in _dbus_connection_unlock > (connection=connection@entry=0x7f70ec734eb8) at > ../../dbus/dbus-connection.c:424 > expired_messages = 0x16b8180 > iter = <optimized out> Context: we're going through the list of "expired messages" (messages that can't safely be freed until the connection is unlocked) and freeing them all. We do this by moving the connection's list to a temporary variable while under the lock, setting the connection's list to NULL, unlocking, and then freeing the temporary list. Other than that, the list is only changed by _dbus_connection_message_sent_unlocked, _dbus_connection_peer_filter_unlocked_no_update and dbus_connection_dispatch which are all under the lock. Ideas, anyone? (In reply to comment #25) > So looking at the trace (Why did Bugzilla not collapse it?), it looks like > Avahi and libsane are involved. AIUI, Bugzilla didn't collapse it because that isn't a standard feature, and freedesktop.org doesn't have all the same plugins as gnome.org. Thread 2 in your backtrace is the main thread. Thread 3 is the GDBus worker thread. Thread 1 is libavahi-client (which uses libdbus) running in another worker thread. This appears to come from backend/net.c in sane-backends? Looking at backend/net.c, I am very suspicious about the thread-safety of its Avahi interactions. It has a linked list of devices starting at static Net_Device *first_device; which is written by add_device(), which appears to happen in the Avahi thread. It's also read by sane_exit(), sane_get_devices() and sane_open() with no apparent locking. This is basically nothing to do with libdbus, but because the threads share a process-space, writing through a bad pointer anywhere in the process could have action-at-a-distance effects - so it seems worth looking into. It might be worth running colord-sane under one of Valgrind's thread-debugging tools (Helgrind or DRD), which are designed to detect whether multi-threaded memory accesses are deterministic. (In reply to comment #27) > Looking at backend/net.c, I am very suspicious about the thread-safety of > its Avahi interactions. Forwarded to: https://alioth.debian.org/tracker/index.php?func=detail&aid=313921&group_id=30186&atid=410366 Hopefully we'll get some feedback from there? Created attachment 77563 [details]
logs for shutdown hanging and the start afterwards
Comment on attachment 77563 [details] logs for shutdown hanging and the start afterwards Hi, Any progress in fixing this? I am using linux mint debian edition cinnamon 64bit 201303 with dbus 1.6.8-1 and colord 0.1.21-1 installed and almost every day, the shutdown/logout/reboot hangs my laptop! https://bugs.launchpad.net/linuxmint/+bug/1163484 Each time when it hangs, I can see in the system log: [ 9018.025165] colord-sane[3735]: segfault at 58 ip 00007fd6206507cc sp 00007fd61062fb78 error 6 in libdbus-1.so.3.7.2[7fd620628000+44000] See attached syslog_for_shutdown_and_start_afterwards.log. (In reply to comment #30) > Any progress in fixing this? 0.1.21 is really old. 0.1.32 was released recently, can you try with a newer version please. (In reply to comment #30) > I am using linux mint debian edition cinnamon 64bit 201303 with dbus 1.6.8-1 > and colord 0.1.21-1 installed and almost every day, the > shutdown/logout/reboot hangs my laptop! I can't see how a crash in colord-sane would hang your laptop. Try changing "UseSANE=true" to "UseSANE=false" in /etc/colord.conf, which should stop colord-sane from being run (so it can't crash). If you still get the hangs, then you have some other unrelated bug. After installing simple-scan from Debian Wheezy, I see the following at boot: colord-sane [3105] segfault at 15e ip b54f9f06 sp b54e504c error 4 in libdbus-1.so.3.7.2[b54e6000+49000] In spite of the error message, simple-scan still works, both for a black and white page and a colored image. I am not enlightened about the analysis of this error and just reporting my experience to be a good citizen. If there is any additional information I can provide, I will be happy to attempt to do so. Steve Rinsler This seems to have been caused by SANE not being thread-safe. colord 0.1.31 includes a rewrite of colord-sane which is believed to not have this crash - hopefully it doesn't multi-thread its interactions with SANE, Avahi and/or libdbus. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.