Summary: | text rendering lacking locking in multithreaded apps [PATCH] | ||
---|---|---|---|
Product: | cairo | Reporter: | Monty Montgomery <xiphmont> |
Component: | general | Assignee: | Carl Worth <cworth> |
Status: | RESOLVED FIXED | QA Contact: | cairo-bugs mailing list <cairo-bugs> |
Severity: | normal | ||
Priority: | high | CC: | jslupski |
Version: | 1.2.5 | Keywords: | patch |
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
add locking to surface/pattern/cache refcounting
One file hadn't been saved out of the emacs buffer in the previous patch incarnation (dang you emacs and your random safety interlocks!) Compined Monty's patches + fix for typo (see comment #9) + win32 complation fix Lock more cairo code with mutex to prevent remaining crashes |
Description
Monty Montgomery
2006-10-27 17:45:01 UTC
Created attachment 7562 [details] [review] add locking to surface/pattern/cache refcounting Created attachment 7563 [details] [review] One file hadn't been saved out of the emacs buffer in the previous patch incarnation (dang you emacs and your random safety interlocks!) Monty, can you attach a backtrace? I want to see if this is the same bug as: http://bugzilla.gnome.org/show_bug.cgi?id=360691 (In reply to comment #3) > Monty, can you attach a backtrace? > I want to see if this is the same bug as: > http://bugzilla.gnome.org/show_bug.cgi?id=360691 The exact point of the 'crash' was different each time because the problem was unlocked concurrency leaking refcounts, then later throwing an assert error when the refcounts dipped under zero. The 'crash' happened long after the original cause. It's not the backtrace you really wanted from the user-- you wanted the assert abort message. Regardless, I'll get you a backtrace in a few moments. Monty (In reply to comment #3) > Monty, can you attach a backtrace? > I want to see if this is the same bug as: > http://bugzilla.gnome.org/show_bug.cgi?id=360691 Here are the relevant asserts and backtraces. Specifically, I saw two different possible assert failures, both are detailed below. First assert: sushivision: cairo-cache.c:213: _cairo_cache_thaw: Assertion `cache->freeze_count > 0' failed. Program received signal SIGABRT, Aborted. 0x00002b06e3ac307b in raise () from /lib/libc.so.6 (gdb) bt #0 0x00002b06e3ac307b in raise () from /lib/libc.so.6 #1 0x00002b06e3ac484e in abort () from /lib/libc.so.6 #2 0x00002b06e3abcaf4 in __assert_fail () from /lib/libc.so.6 #3 0x00002b06e39334da in _cairo_cache_thaw (cache=<value optimized out>) at cairo-cache.c:213 #4 0x00002b06e393eb74 in _cairo_scaled_font_show_glyphs ( scaled_font=0x6d1f00, op=CAIRO_OPERATOR_OVER, pattern=0x4084eb80, surface=0x2aaaab7011a0, source_x=781, source_y=171, dest_x=781, dest_y=171, width=7, height=26, glyphs=0x2aaaab701c50, num_glyphs=5) at cairo-scaled-font.c:1008 #5 0x00002b06e3942b7f in _cairo_surface_old_show_glyphs_draw_func ( closure=0x4084eae0, op=CAIRO_OPERATOR_OVER, src=0x4084eb80, dst=0x2aaaab7011a0, dst_x=0, dst_y=0, extents=0x4084eb10) at cairo-surface-fallback.c:890 #6 0x00002b06e394268f in _clip_and_composite (clip=0x0, op=CAIRO_OPERATOR_OVER, src=0x4084eb80, draw_func=0x2b06e3942a10 <_cairo_surface_old_show_glyphs_draw_func>, draw_closure=0x4084eae0, dst=0x2aaaab7011a0, extents=0x4084eb10) at cairo-surface-fallback.c:391 #7 0x00002b06e3942a05 in _cairo_surface_fallback_show_glyphs ( surface=0x2aaaab7011a0, op=CAIRO_OPERATOR_OVER, source=0x4084eb80, glyphs=0x2aaaab701c50, num_glyphs=5, scaled_font=0x6d1f00) at cairo-surface-fallback.c:941 #8 0x00002b06e3941494 in _cairo_surface_show_glyphs (surface=0x2aaaab7011a0, op=CAIRO_OPERATOR_OVER, source=<value optimized out>, glyphs=0x2aaaab701c50, num_glyphs=5, scaled_font=0x6d1f00) at cairo-surface.c:1847 #9 0x00002b06e3935fd7 in _cairo_gstate_show_glyphs (gstate=0x2aaaab7013c0, glyphs=0x2aaaab701900, num_glyphs=5) at cairo-gstate.c:1503 #10 0x00002b06e3931c06 in cairo_show_text (cr=0x2aaaab7017c0, utf8=0x4084ee10 "0.230") at cairo.c:2681 #11 0x000000000040490b in draw_scales_work (s=0x2aaaab7011a0, xs= {lo = 0.099285714285714297, hi = 0.255, first_val = 100, first_pixel = 4, step_val = 5, step_pixel = 30, decimal_exponent = -3, m = 0.001, init = 1, pixels = 946}, ys= {lo = -96, hi = 96, first_val = -8, first_pixel = 17, step_val = 2, step_pixel = 21, decimal_exponent = 1, m = 10, init = 1, pixels = 200}) at plot.c:129 #12 0x0000000000404a17 in plot_draw_scales (p=0x566650) at plot.c:149 #13 0x000000000040c6dd in _sushiv_panel_cooperative_compute_2d (p=0x5431d0) at panel-2d.c:709 #14 0x00000000004096d7 in _sushiv_panel_cooperative_compute (p=0x5431d0) at panel.c:110 #15 0x0000000000403630 in worker_thread (dummy=0x0) at main.c:106 #16 0x00002b06e1e9cf1a in start_thread () from /lib/libpthread.so.0 #17 0x00002b06e3b5d5d2 in clone () from /lib/libc.so.6 #18 0x0000000000000000 in ?? () (gdb) other threads: (gdb) thread 1 [Switching to thread 1 (Thread 47308603799168 (LWP 11843))]#0 0x00002b06e3b54cc6 in poll () from /lib/libc.so.6 (gdb) bt #0 0x00002b06e3b54cc6 in poll () from /lib/libc.so.6 #1 0x00002b06e37b967e in g_main_context_check () from /usr/lib/libglib-2.0.so.0 #2 0x00002b06e37b9b16 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0 #3 0x00002b06e20d77e2 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0 #4 0x000000000040381f in main (argc=1, argv=0x7fffc8d287e8) at main.c:179 (gdb) thread 3 [Switching to thread 3 (Thread 1091180896 (LWP 11845))]#0 0x00002b06e3b079e0 in mempcpy () from /lib/libc.so.6 (gdb) bt #0 0x00002b06e3b079e0 in mempcpy () from /lib/libc.so.6 #1 0x00002b06e3ac5043 in bsearch () from /lib/libc.so.6 #2 0x00002b06e3ac4f93 in bsearch () from /lib/libc.so.6 #3 0x00002b06e3ac5230 in qsort () from /lib/libc.so.6 #4 0x00002b06e3943e54 in _cairo_traps_tessellate_polygon (traps=0x4109fb10, poly=<value optimized out>, fill_rule=CAIRO_FILL_RULE_WINDING) at cairo-traps.c:742 #5 0x00002b06e393ce34 in _cairo_pen_stroke_spline (pen=0x4109f8e0, spline=0x4109f820, tolerance=0.10000000000000001, traps=0x4109fb10) at cairo-pen.c:451 #6 0x00002b06e393c116 in _cairo_stroker_curve_to (closure=0x4109f9f0, b=<value optimized out>, c=<value optimized out>, d=0x4109f990) at cairo-path-stroke.c:860 #7 0x00002b06e3939d99 in _cairo_path_fixed_interpret ( path=<value optimized out>, dir=<value optimized out>, move_to=0x2b06e393b820 <_cairo_stroker_move_to>, line_to=0x2b06e393c5f0 <_cairo_stroker_line_to>, curve_to=0x2b06e393be80 <_cairo_stroker_curve_to>, close_path=0x2b06e393c780 <_cairo_stroker_close_path>, closure=0x4109f9f0) at cairo-path.c:564 #8 0x00002b06e393b9d3 in _cairo_path_fixed_stroke_to_traps ( path=0x2aaaab601198, stroke_style=<value optimized out>, ctm=<value optimized out>, ctm_inverse=<value optimized out>, tolerance=<value optimized out>, traps=<value optimized out>) at cairo-path-stroke.c:995 #9 0x00002b06e3943353 in _cairo_surface_fallback_stroke ( surface=0x2aaaab601460, op=CAIRO_OPERATOR_OVER, source=0x4109fba0, path=0x2aaaab601198, stroke_style=0x2aaaab600b08, ctm=0x4109fc80, ctm_inverse=0x4109fc50, tolerance=0.10000000000000001, antialias=CAIRO_ANTIALIAS_DEFAULT) at cairo-surface-fallback.c:783 #10 0x00002b06e39410ef in _cairo_surface_stroke (surface=0x2aaaab601460, op=CAIRO_OPERATOR_OVER, source=<value optimized out>, path=0x2aaaab601198, stroke_style=0x2aaaab600b08, ctm=<value optimized out>, ctm_inverse=0x2aaaab600c08, tolerance=0.10000000000000001, antialias=CAIRO_ANTIALIAS_DEFAULT) at cairo-surface.c:1375 #11 0x00002b06e39355d0 in _cairo_gstate_stroke (gstate=0x2aaaab600af0, path=0x2aaaab601198) at cairo-gstate.c:931 #12 0x00002b06e393123d in *INT_cairo_stroke_preserve (cr=0x2aaaab601190) at cairo.c:1920 #13 0x00002b06e3931259 in cairo_stroke (cr=0x4109f5ed) at cairo.c:1896 #14 0x000000000040486e in draw_scales_work (s=0x2aaaab601460, xs= {lo = 0.098571428571428588, hi = 0.255, first_val = 100, first_pixel = 9, step_val = 5, step_pixel = 30, decimal_exponent = -3, m = 0.001, init = 1, pixels = 946}, ys= {lo = -96, hi = 96, first_val = -8, first_pixel = 17, step_val = 2, step_pixel = 21, decimal_exponent = 1, m = 10, init = 1, pixels = 200}) at plot.c:125 #15 0x0000000000404a17 in plot_draw_scales (p=0x566650) at plot.c:149 #16 0x000000000040c6dd in _sushiv_panel_cooperative_compute_2d (p=0x5431d0) at panel-2d.c:709 #17 0x00000000004096d7 in _sushiv_panel_cooperative_compute (p=0x5431d0) at panel.c:110 #18 0x0000000000403630 in worker_thread (dummy=0x0) at main.c:106 #19 0x00002b06e1e9cf1a in start_thread () from /lib/libpthread.so.0 #20 0x00002b06e3b5d5d2 in clone () from /lib/libc.so.6 #21 0x0000000000000000 in ?? () Second assert: sushivision: cairo-surface.c:441: cairo_surface_destroy: Assertion `surface->ref_count > 0' failed. Program received signal SIGABRT, Aborted. [Switching to Thread 1082460512 (LWP 11860)] 0x00002ba8bc5f107b in raise () from /lib/libc.so.6 (gdb) bt #0 0x00002ba8bc5f107b in raise () from /lib/libc.so.6 #1 0x00002ba8bc5f284e in abort () from /lib/libc.so.6 #2 0x00002ba8bc5eaaf4 in __assert_fail () from /lib/libc.so.6 #3 0x00002ba8bc46f93d in *INT_cairo_surface_destroy (surface=0x6cf520) at cairo-surface.c:441 #4 0x00002ba8bc46ca08 in _cairo_scaled_font_show_glyphs ( scaled_font=0x6d1f00, op=CAIRO_OPERATOR_OVER, pattern=0x4084dc50, surface=0x2aaaab816830, source_x=54, source_y=171, dest_x=54, dest_y=171, width=7, height=26, glyphs=0x756070, num_glyphs=5) at cairo-scaled-font.c:987 #5 0x00002ba8bc470b7f in _cairo_surface_old_show_glyphs_draw_func ( closure=0x4084dbb0, op=CAIRO_OPERATOR_OVER, src=0x4084dc50, dst=0x2aaaab816830, dst_x=0, dst_y=0, extents=0x4084dbe0) at cairo-surface-fallback.c:890 #6 0x00002ba8bc47068f in _clip_and_composite (clip=0x0, op=CAIRO_OPERATOR_OVER, src=0x4084dc50, draw_func=0x2ba8bc470a10 <_cairo_surface_old_show_glyphs_draw_func>, draw_closure=0x4084dbb0, dst=0x2aaaab816830, extents=0x4084dbe0) at cairo-surface-fallback.c:391 #7 0x00002ba8bc470a05 in _cairo_surface_fallback_show_glyphs ( surface=0x2aaaab816830, op=CAIRO_OPERATOR_OVER, source=0x4084dc50, glyphs=0x756070, num_glyphs=5, scaled_font=0x6d1f00) at cairo-surface-fallback.c:941 #8 0x00002ba8bc46f494 in _cairo_surface_show_glyphs (surface=0x2aaaab816830, op=CAIRO_OPERATOR_OVER, source=<value optimized out>, glyphs=0x756070, num_glyphs=5, scaled_font=0x6d1f00) at cairo-surface.c:1847 #9 0x00002ba8bc463fd7 in _cairo_gstate_show_glyphs (gstate=0x2aaaab816940, glyphs=0x754ea0, num_glyphs=5) at cairo-gstate.c:1503 #10 0x00002ba8bc45fc06 in cairo_show_text (cr=0x2aaaab801970, utf8=0x4084dee0 "0.160") at cairo.c:2681 #11 0x000000000040490b in draw_scales_work (s=0x2aaaab816830, xs= {lo = 0.14940119760479043, hi = 0.5, first_val = 150, first_pixel = 3, step_val = 5, step_pixel = 27, decimal_exponent = -3, m = 0.001, init = 1, pixels = 1917}, ys= {lo = -96, hi = 96, first_val = -8, first_pixel = 17, step_val = 2, step_pixel = 21, decimal_exponent = 1, m = 10, init = 1, pixels = 200}) at plot.c:129 #12 0x0000000000404a17 in plot_draw_scales (p=0x566650) at plot.c:149 #13 0x000000000040c6dd in _sushiv_panel_cooperative_compute_2d (p=0x5431d0) at panel-2d.c:709 #14 0x00000000004096d7 in _sushiv_panel_cooperative_compute (p=0x5431d0) at panel.c:110 #15 0x0000000000403630 in worker_thread (dummy=0x0) at main.c:106 #16 0x00002ba8ba9caf1a in start_thread () from /lib/libpthread.so.0 #17 0x00002ba8bc68b5d2 in clone () from /lib/libc.so.6 #18 0x0000000000000000 in ?? () other threads: (gdb) thread 1 [Switching to thread 1 (Thread 48003729135232 (LWP 11859))]#0 0x00002ba8bc4a4ad8 in _cairo_pixman_render_edge_init (e=0x7ffff01f88c0, n=8, y_start=67720, x_top=172998, y_top=66009, x_bot=156614, y_bot=70399) at renderedge.c:152 (gdb) bt #0 0x00002ba8bc4a4ad8 in _cairo_pixman_render_edge_init (e=0x7ffff01f88c0, n=8, y_start=67720, x_top=172998, y_top=66009, x_bot=156614, y_bot=70399) at renderedge.c:152 #1 0x00002ba8bc4a4b31 in _cairo_pixman_render_line_fixed_edge_init ( e=0x1126, n=8, y=3214, line=<value optimized out>, x_off=<value optimized out>, y_off=<value optimized out>) at renderedge.c:189 #2 0x00002ba8bc49ada0 in fbRasterizeTrapezoid ( pPicture=<value optimized out>, trap=0x735698, x_off=-761, y_off=0) at fbtrap.c:135 #3 0x00002ba8bc4992fe in _cairo_pixman_add_trapezoids (dst=0x7532b0, x_off=-761, y_off=0, traps=0x735698, ntraps=36) at ictrap.c:204 #4 0x00002ba8bc465a2d in _cairo_image_surface_composite_trapezoids ( op=CAIRO_OPERATOR_OVER, pattern=0x7ffff01f9060, abstract_dst=0x6f2e00, antialias=<value optimized out>, src_x=761, src_y=0, dst_x=761, dst_y=0, width=15, height=14, traps=0x734f90, num_traps=81) at cairo-image-surface.c:1013 #5 0x00002ba8bc46e3a1 in _cairo_surface_composite_trapezoids (op=4390, pattern=0x7ffff01f9060, dst=0x6f2e00, antialias=CAIRO_ANTIALIAS_DEFAULT, src_x=761, src_y=0, dst_x=761, dst_y=0, width=15, height=14, traps=0x734f90, num_traps=81) at cairo-surface.c:1459 #6 0x00002ba8bc471280 in _composite_traps_draw_func (closure=0x7ffff01f8f50, op=CAIRO_OPERATOR_OVER, src=0x7ffff01f9060, dst=0x6f2e00, dst_x=0, dst_y=0, extents=0x7ffff01f8f60) at cairo-surface-fallback.c:492 #7 0x00002ba8bc47068f in _clip_and_composite (clip=0x0, op=CAIRO_OPERATOR_OVER, src=0x7ffff01f9060, draw_func=0x2ba8bc4711c0 <_composite_traps_draw_func>, draw_closure=0x7ffff01f8f50, dst=0x6f2e00, extents=0x7ffff01f8f60) at cairo-surface-fallback.c:391 #8 0x00002ba8bc470fb4 in _clip_and_composite_trapezoids (src=0x7ffff01f8eb0, op=32767, dst=0x6f2e00, traps=0x7ffff01f8fd0, clip=0x0, antialias=CAIRO_ANTIALIAS_DEFAULT) at cairo-surface-fallback.c:636 #9 0x00002ba8bc47137c in _cairo_surface_fallback_stroke (surface=0x6f2e00, op=CAIRO_OPERATOR_OVER, source=0x7ffff01f9060, path=0x69b628, stroke_style=0x754218, ctm=0x7ffff01f9140, ctm_inverse=0x7ffff01f9110, tolerance=<value optimized out>, antialias=CAIRO_ANTIALIAS_DEFAULT) at cairo-surface-fallback.c:793 #10 0x00002ba8bc46f0ef in _cairo_surface_stroke (surface=0x6f2e00, op=CAIRO_OPERATOR_OVER, source=<value optimized out>, path=0x69b628, stroke_style=0x754218, ctm=<value optimized out>, ctm_inverse=0x754318, tolerance=0.10000000000000001, antialias=CAIRO_ANTIALIAS_DEFAULT) at cairo-surface.c:1375 #11 0x00002ba8bc4635d0 in _cairo_gstate_stroke (gstate=0x754200, path=0x69b628) at cairo-gstate.c:931 #12 0x00002ba8bc45f23d in *INT_cairo_stroke_preserve (cr=0x69b620) at cairo.c:1920 #13 0x0000000000406fb7 in slider_draw (s=0x69fea0) at slider.c:320 #14 0x0000000000408999 in slider_motion (s=0x69fea0, slicenum=0, x=767, y=18) at slider.c:680 #15 0x0000000000408df5 in slice_motion (widget=0x69cd00, event=0x6cc670) at slice.c:66 #16 0x00002ba8bac0a77d in _gtk_marshal_BOOLEAN__BOXED () from /usr/lib/libgtk-x11-2.0.so.0 #17 0x00002ba8bb527589 in g_closure_invoke () from /usr/lib/libgobject-2.0.so.0 #18 0x00002ba8bb536d8f in g_signal_chain_from_overridden () from /usr/lib/libgobject-2.0.so.0 #19 0x00002ba8bb537c6e in g_signal_emit_valist () from /usr/lib/libgobject-2.0.so.0 #20 0x00002ba8bb538083 in g_signal_emit () from /usr/lib/libgobject-2.0.so.0 #21 0x00002ba8bace0e8e in gtk_widget_get_default_style () from /usr/lib/libgtk-x11-2.0.so.0 #22 0x00002ba8bac043de in gtk_propagate_event () from /usr/lib/libgtk-x11-2.0.so.0 #23 0x00002ba8bac05487 in gtk_main_do_event () from /usr/lib/libgtk-x11-2.0.so.0 #24 0x00002ba8baf4d4dc in _gdk_events_init () from /usr/lib/libgdk-x11-2.0.so.0 #25 0x00002ba8bc2e49e3 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0 #26 0x00002ba8bc2e782d in g_main_context_check () from /usr/lib/libglib-2.0.so.0 #27 0x00002ba8bc2e7b16 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0 #28 0x00002ba8bac057e2 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0 #29 0x000000000040381f in main (argc=1, argv=0x7ffff01f9cb8) at main.c:179 (gdb) thread 3 [Switching to thread 3 (Thread 1091180896 (LWP 11861))]#0 0x00002ba8ba9cfeab in __lll_mutex_lock_wait () from /lib/libpthread.so.0 (gdb) bt #0 0x00002ba8ba9cfeab in __lll_mutex_lock_wait () from /lib/libpthread.so.0 #1 0x0000000000000000 in ?? () ...so the backtraces are not similar to the referenced bugs-- but that does not rule out the cause being the same. The crashes above are not coming from a standard Gtk+ widget, but rather a custom widget implemented by Sushivision. Monty Thanks. (In reply to comment #5) I also should add that although I only saw two assert failures, I added or adjusted locking in five places that came up in code review: Surface refcounts, Pattern refcounts, font cache 'freeze' counts, generic cache refcounts, and lastly fixing a race in glyph rendering where it would first check for glyphs, then freeze the cache instead of the other way around. So it's possible the above patch fixes more concurrency clobbers than just the two assert failures I saw. Monty This bug suggests that the same is happening in cairo_ft_scaled_font_lock/unlock: http://bugzilla.gnome.org/show_bug.cgi?id=357939 We should fix that one too. Looking at the bottom of the 2nd patch: @@ -347,6 +351,7 @@ static void _cairo_cache_remove (cairo_cache_t *cache, cairo_cache_entry_t *entry) { + CAIRO_MUTEX_LOCK (cairo_cache_refcount_mutex); cache->size -= entry->size; _cairo_hash_table_remove (cache->hash_table, @@ -354,6 +359,7 @@ _cairo_cache_remove (cairo_cache_t *cac if (cache->entry_destroy) cache->entry_destroy (entry); + CAIRO_MUTEX_LOCK (cairo_cache_refcount_mutex); } Shouldn't the second mutex operation be UNLOCK? (In reply to comment #9) > Shouldn't the second mutex operation be UNLOCK? Definitely. Thanks for the catch. Could anyone testing this stuff please post a single, good patch containing everything of interest? I seem to have broken something in attempting to generate one. Actually, what would be even better would be multiple patches for the independent pieces, (for example, separating the missing freeze/thaw from the addition of locking around all reference counting). And while I'm making a wishlist, I'd love to have the stuff available as a git patchset with commit messages, etc. And I'd really love to see some new cairo_atomic_increment/decrement inline functions used for the reference counting, (even if initially implemented with just a mutex---and likely just one mutex for all reference count locking). But please don't let my long list of wishes keep anyone from doing one piece just because they don't plan to do all of them. Thanks, -Carl What is it that you broke Carl? I don't mind making this mostly my problem as I seem to have the most immediate need for the fix. Monty (In reply to comment #11) > What is it that you broke Carl? I don't mind making this mostly my problem as I > seem to have the most immediate need for the fix. I seemed to see tests hanging when I ran "make test". Could you give that a try? The other things that would help would be a single correct patch known to apply to the latest cairo code. Thanks, -Carl Created attachment 8470 [details] [review] Compined Monty's patches + fix for typo (see comment #9) + win32 complation fix Patch attached is combined patch 7562 and 7673. On top of it CAIRO_MUTEX_LOCK is changed to CAIRO_MUTEX_UNLOCK in cairo-cache.c, that is obvious typo (see comment #9). Then added declaration and initialization of mutex used in this patch in win32 code. (Are there any other platforms that needs that too?) Created attachment 8471 [details] [review] Lock more cairo code with mutex to prevent remaining crashes Monty's patches helped a lot, but not all cases were solved. Examples of crashes: threads: cairo-ft-font.c:562: _cairo_ft_unscaled_font_unlock_face: Assertion `unscaled->lock > 0' failed. threads: cairo-hash.c:196: _cairo_hash_table_destroy: Assertion `hash_table->live_entries == 0' failed. It seems that these crashes are easiest reproducible when threads tie while executing some instructions. I.e. I'm running two threads application on a CoreDuo processor. Each thread runs the same code in a loop, so (I guess) they are often executing the same code block in the same moment. As a solution I have added 5 more mutex in different parts of code After that all problems seems to be resolved, although it is quite possible that I've added more locks than really needed, and that the code is sub-optimal. I've added: - cairo_refcount_mutex for cairo_reference/cairo_destroy - cairo_clip_path_refcount_mutex for _cairo_clip_path_reference/_cairo_clip_path_destroy - cairo_unscaled_font_refcount_mutex for _cairo_unscaled_font_reference/_cairo_unscaled_font_destroy - cairo_ft_font_unscaled_lock_mutex for _cairo_ft_unscaled_font_lock_face/_cairo_ft_unscaled_font_unlock_face - cairo_font_face_refcount_mutex for cairo_font_face_reference/cairo_font_face_destroy The last one is the most tricky - especially unlocking/locking around "font_face->backend->destroy (font_face);". But without this I was getting deadlock. (In reply to comment #13) > Created an attachment (id=8470) [details] > Compined Monty's patches + fix for typo (see comment #9) + win32 complation fix This still isn't applying cleanly to the latest cairo. Oh, looks from the patch that it was generated against 1.2.6. It would be much more helpful to have patches generated from the latest cairo, (as made available through git). And if you are in a position to feed me patches generated from git that could help a lot too. > On top of it CAIRO_MUTEX_LOCK is changed to CAIRO_MUTEX_UNLOCK in > cairo-cache.c, that is obvious typo (see comment #9). Thanks. > Then added declaration and initialization of mutex used in this patch in win32 > code. (Are there any other platforms that needs that too?) Also, thanks. There might be another platform that needs it, but we won't hear about it here in bugzilla-land where we get basically just a pair- wise conversation. Please (In reply to comment #16) > This still isn't applying cleanly to the latest cairo. Beyond that, I'm still getting deadlock with "make test" after applying (and massaging) the patch. Something's very wrong. I'll start by breaking the patch up into minimal pieces, but I'll need some help from someone who can easily replicate the original bugs to make sure my new stuff still catches them all. I'll post a git branch with the new work soon, (I refuse to post patches as bugzilla attachments---yuck!). -Carl So here's a web-view of a git branch I made, (8801-attachments) with the two most recent attachments as git commits: http://gitweb.freedesktop.org/?p=users/cworth/cairo;a=shortlog;h=8801-attachments One should be able to explore that by doing the following for a new clone: git clone git://people.freedesktop.org/~cworth/cairo cd cairo git checkout -b tmp 8801-attachments or from an existing clone of cairo's repository: git fetch git://people.freedesktop.org/~cworth/cairo 8801-attachments:8801-attachments git checkout -b tmp 8801-attachments If you could test to see if you con't also get deadlock with "make test" that would be helpful. Then, I'll also follow up with a new branch that breaks up each minimal piece of new functionality. Thanks, -Carl (In reply to comment #18) > Then, I'll also follow up with a new branch that breaks up each minimal piece > of new functionality. The first thing I tried separating out were the changes to cairo-scaled-font.c, (moving the scaled_font_freeze up before the call to backend->show_glyphs). That's definitely not what we want. It's even been proposed before and rejected, see: https://bugs.freedesktop.org/show_bug.cgi?id=6955#c12 So you'll see that the freeze is happening as the first substantive thing in _cairo_xlib_surface_show_glyphs. So now I'm really curious as to what prompted this change in the patch. I'd really like to see a test case that fails without this part of the patch and passes with it. The ref_count stuff is a bit more straightforward, but I still want to change it in a couple of ways: 1. I think I might unify all of the reference counting throughout cairo 2. I'd still like to see the protection for this implemented with atomic instructions. More to come on those fronts... -Carl (In reply to comment #19) > The first thing I tried separating out were the changes to cairo-scaled-font.c, > (moving the scaled_font_freeze up before the call to backend->show_glyphs). > > That's definitely not what we want. It's even been proposed before and > rejected, see: > > https://bugs.freedesktop.org/show_bug.cgi?id=6955#c12 Looking closer, what was rejected was doing generic freeze/thaw to fix the problem of glyphs disappearing from the X server glyph cache due to cache pressure from other glyphs in the same call to _cairo_surface_show_glyphs. But the question of how to ensure that the results of _cairo_scaled_glyph_lookup remain valid until the caller is done looking at them has not been properly addressed. I think we're missing freeze/thaw not only here, but around every call into _cairo_scaled_glyph_lookup. And _cairo_scaled_glyph_lookup is also currently abusing the font_map_mutex when the scaled_font->glyphs cache really should have its own mutex. So, basically, it looks like the locking situtation inside cairo-scaled-font.c is quite miserable currently, and I'm not surprised that you've both had to do a fair amount of whack-a-mole fixing on it. I'd like to attempt a more complete fix addressing the issues described here, but I'm definitely too tired to finish that up tonight. More tomorrow, (and probably moving over to the cairo mailing list at this point). -Carl See my recent mail to the cairo list for an attempted patch to address this bug: http://article.gmane.org/gmane.comp.lib.cairo/9360 Thanks, -Carl As described in the mail linked to above, the patch that Keith and I put together solves the bug without introducing any locking overhead for reference counting. Monty confirmed that the patch fixes his bug. And the patch also doesn't cause any deadlocks in "make test". Finally, the new locking that the patch adds to the text rendering code could be made more narrow for better performance, but that's obviously a different bug. Anyway, the patch is committed now, (during 1.3.13 as available through git), and will appear in the next (1.3.14 snapshot). So that's a version number you can key off of Monty. Jan, if you could retest with the latest code that would be helpful. And if you still see problems you may be seeing an independent bug---so let us know. -Carl Ok, this one isn't quite done yet. The pthread-show-text test case is being useful for exposing the remaining problems, (particularly with a multi-core or multi-processor machine). More later, (sleep first). -Carl The scaled_font locking added previously helped a lot, but as Jan noted, it wasn't complete. Since then, with Jan's help we tracked down the need to add locking to the unscaled_font object as well, and triggered some interesting discussions about the way this locking should interact with cairo_ft_scaled_font_lock_face, (which is a separate bug---how to ensure cairo and cairo-using applications don't call into the non-thread-safe freetype library at the same time). Anyway, the new locking (and subsequent fixes for the several things I broke while adding it), are available in commits 7e1301ffb066b04d95dc71144d86e660f0155bba through 9966551dc7640ae7901ffed0e15f0fbf7e603d5d . Hopefully, we can keep this bug closed this time. Again, thanks for all your help with this. -Carl |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.