Bug 30032

Summary: Xorg process stackfault - infinite recursion FlushCallback vs WriteToClient
Product: xorg
Component: Server/General
Status: RESOLVED FIXED
Severity: major
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Sebastian Glita <gseba>
Assignee: Xorg Project Team <xorg-team>
QA Contact: Xorg Project Team <xorg-team>
CC: chain, OmegaPhil, persson, runge, szotsaki, xarax-fd
Keywords: patch
Bug Blocks: 27592, 31018
Attachments: "git diff" patch against master (flags: none)

Description Sebastian Glita 2010-09-05 08:45:01 UTC
Created attachment 38453
"git diff" patch against master

Hi,

Xorg crashed and restarted in every session after 20-30 minutes of activity.

Here is a simple fix in "dixutils.c:_CallCallbacks":

- it moves the recursion count check *before* the list iteration and the actual invocation of the callbacks;

- I cannot say for sure, though, whether the counter (member `int inCallback' of the "include/dixstruct.h:171:CallbackListRec" structure) was supposed to grow greater than 1 before iteration,

- so I cannot ascertain whether the new behavior is what is required.


Here is the tail of the gdb backtrace (the outermost frames):

#104774  RecordFlushAllContexts (pcbl=0x878748, nulldata=0x0, calldata=0x0) at record.c:867
#104775  0x000000000043a9d2 in _CallCallbacks (pcbl=0x878748, call_data=0x0) at dixutils.c:745
#104776  0x000000000043adb6 in CallCallbacks (pcbl=0x878748, call_data=0x0) at dixutils.c:877
#104777  0x000000000047e5fd in FlushAllOutput () at io.c:614
#104778  0x000000000042c219 in Dispatch () at dispatch.c:453
#104779  0x000000000042461e in main (argc=18, argv=0x7fff83be4828, envp=0x7fff83be48c0) at main.c:291


There were about 104780 stack frames, cycling through these lines of code:

a. dix/main.c:main:291 calls Dispatch();

b. dix/dispatch.c:Dispatch:453 calls FlushAllOutput();

c. os/io.c:FlushAllOutput:614 triggers CallCallbacks(&FlushCallback, NULL);

d. dix/dixutils.c:CallCallbacks:877 falls through _CallCallbacks(pcbl, call_data);

e. dix/dixutils.c:_CallCallbacks:743 dispatches (*(cbr->proc)) (pcbl, cbr->data, call_data);

f. record/record.c:RecordFlushAllContexts:867 entails RecordFlushReplyBuffer(ppAllContexts[eci], NULL, 0, NULL, 0);

g. record/record.c:RecordFlushReplyBuffer:251 enters WriteToClient;

h. os/io.c:WriteToClient:824: triggers, as in c, again CallCallbacks(&FlushCallback, NULL);

i. dix/dixutils.c:CallCallbacks:877 back (as in d) to _CallCallbacks(pcbl, call_data);
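
The cycle in steps c through i can be modeled with a minimal sketch. Every name below is an illustrative stand-in and not the real X server code, and the depth cap exists only so that the demo terminates instead of overflowing the stack:

```c
#include <assert.h>

/* Simplified model of the cycle in steps c-i. All names are hypothetical
 * stand-ins for the X server functions named in the backtrace. */

#define MAX_DEPTH 1000        /* artificial cap so the demo terminates */

static int depth = 0;         /* current nesting of call_callbacks() */
static int max_seen = 0;      /* deepest nesting reached so far */

static void call_callbacks(void);

/* Stands in for WriteToClient: writing triggers the flush callbacks. */
static void write_to_client(void) { call_callbacks(); }

/* Stands in for RecordFlushReplyBuffer. */
static void record_flush_reply_buffer(void) { write_to_client(); }

/* Stands in for RecordFlushAllContexts, the registered flush callback. */
static void record_flush_all_contexts(void) { record_flush_reply_buffer(); }

/* Stands in for CallCallbacks/_CallCallbacks with no reentrancy guard. */
static void call_callbacks(void)
{
    if (depth >= MAX_DEPTH)
        return;               /* only the demo's cap stops the recursion */
    ++depth;
    if (depth > max_seen)
        max_seen = depth;
    record_flush_all_contexts();
    --depth;
}

/* Stands in for FlushAllOutput; returns the deepest nesting reached. */
int flush_all_output_model(void)
{
    depth = 0;
    max_seen = 0;
    call_callbacks();
    assert(depth == 0);       /* every nested call has unwound */
    return max_seen;
}
```

In this model a single call to flush_all_output_model() nests until it hits MAX_DEPTH. Without the cap, nothing in the chain would ever stop it.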


So I changed "dix/dixutils.c" lines 732-747

- from:

<<<
static void 
_CallCallbacks(
    CallbackListPtr    *pcbl,
    pointer	    call_data)
{
    CallbackListPtr cbl = *pcbl;
    CallbackPtr     cbr, pcbr;

    ++(cbl->inCallback);
    for (cbr = cbl->list; cbr != NULL; cbr = cbr->next)
    {
	(*(cbr->proc)) (pcbl, cbr->data, call_data);
    }
    --(cbl->inCallback);

    if (cbl->inCallback) return;
>>>

- into:

<<<
static void 
_CallCallbacks(
    CallbackListPtr    *pcbl,
    pointer	    call_data)
{
    CallbackListPtr cbl = *pcbl;
    CallbackPtr     cbr, pcbr;

    if (cbl->inCallback) return;

    ++(cbl->inCallback);
    for (cbr = cbl->list; cbr != NULL; cbr = cbr->next)
    {
	(*(cbr->proc)) (pcbl, cbr->data, call_data);
    }
    --(cbl->inCallback);
>>>
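
The effect of checking the counter first can be illustrated with a self-contained sketch. The types and function names here are hypothetical, pared-down stand-ins for CallbackListRec and _CallCallbacks, not the server's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the server's callback types. */
typedef struct callback_list callback_list;
typedef void (*callback_proc)(callback_list *list, void *data);

typedef struct callback {
    callback_proc proc;
    void *data;
    struct callback *next;
} callback;

struct callback_list {
    int in_callback;          /* plays the role of inCallback */
    callback *head;
};

/* Guarded dispatch: a nested call on the same list is a no-op. */
static void call_callbacks_guarded(callback_list *list)
{
    if (list->in_callback)
        return;               /* already running: do not re-enter */

    ++list->in_callback;
    for (callback *cb = list->head; cb != NULL; cb = cb->next)
        cb->proc(list, cb->data);
    --list->in_callback;
}

/* A callback that re-enters the dispatcher, like the record flush path. */
static void reentrant_callback(callback_list *list, void *data)
{
    int *invocations = data;
    ++*invocations;
    call_callbacks_guarded(list);   /* returns at once thanks to the guard */
}

/* Returns how often the callback body ran for one top-level dispatch. */
int run_guarded_demo(void)
{
    int invocations = 0;
    callback cb = { reentrant_callback, &invocations, NULL };
    callback_list list = { 0, &cb };

    call_callbacks_guarded(&list);
    assert(list.in_callback == 0);  /* counter is balanced afterwards */
    return invocations;
}
```

With the guard in place the callback body runs once per top-level dispatch. The trade-off is that the nested request is dropped, not deferred, so whether that is acceptable depends on whether any caller relies on nested invocation.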


So the mutually recursive chain consists of 5 functions:

1. dix/dixutils.c:CallCallbacks:877
2. dix/dixutils.c:_CallCallbacks:743
3. record/record.c:RecordFlushAllContexts:867
4. record/record.c:RecordFlushReplyBuffer:251
5. os/io.c:WriteToClient:824


In "_CallCallbacks", the value of the `cbl->inCallback' variable was about 20000 at the bottom of the stack, which is consistent with the stack depth: 104780 frames / 5 frames per cycle ~= 21000.


An earlier commit, dated 2010-08-29, might have triggered it:
http://cgit.freedesktop.org/xorg/xserver/commit/?id=c65f610e12f9df168d5639534ed3c2bd40afffc8


It seems to occur not with xf86-video-nouveau, besides xf86-video-intel.


Thanks,
s.
Comment 1 Xavier Aragon 2010-10-06 03:41:28 UTC
I can confirm that exactly the same problem occurs when I use x11vnc with Xorg server 1.9 (on Ubuntu 10.10). It seems that x11vnc uses the XRecord extension by default, and server 1.9 can fall into infinite call recursion when XRecord is used, as explained in the bug description above. Currently I can work around the problem by specifying -noxrecord in the x11vnc options.
Comment 2 Julien Cristau 2010-10-06 04:18:10 UTC
Thanks for the report. Would you mind sending your patch for review to xorg-devel@lists.x.org, per http://www.x.org/wiki/Development/Documentation/SubmittingPatches ?
Comment 3 Sebastian Glita 2010-10-06 13:07:49 UTC
I posted the patch there.
Thanks for confirmation.
Comment 4 darwin_te 2010-12-29 15:32:26 UTC
Confirmed the bug on my system:

- Kubuntu 10.10,
- Nvidia binary driver 260.19.29
- GeForce 8400 GS with dual monitor output
- kernel 2.6.35-24-generic-pae
- x11vnc version 0.9.10 lastmod: 2010-04-28
- X.Org X Server 1.9.0

A temporary fix is to pass the -noxrecord option to x11vnc.

Thread where other users are commenting:

http://ubuntuforums.org/showthread.php?p=10293954#post10293954
Comment 5 Sebastian Glita 2010-12-30 06:24:37 UTC
So x11vnc without -noxrecord causes this bug; I was running it that way too, launched automatically.
Comment 6 Chris Wilson 2011-01-24 04:40:37 UTC
*** Bug 33384 has been marked as a duplicate of this bug. ***
Comment 7 Stephen White 2011-02-17 02:54:09 UTC
We're seeing crashes in Xvfb when using it for running automated UI tests with Eclipse & SWTBot.  The stack-trace shows similar infinite recursion in the same code as this bug.  The patch in attachment 38453 (applied to Xorg 1.9.3 as shipped with Fedora 14) seems to resolve these issues for us.
Comment 8 Julien Cristau 2011-02-17 09:27:56 UTC
See also http://patchwork.freedesktop.org/patch/4042/
Comment 9 Keith Packard 2011-02-24 19:03:10 UTC
Patch merged in 0801afbd7c2c644c672b37f8463f1a0cbadebd2e
Comment 10 Jeremy Huddleston Sequoia 2011-02-24 20:08:08 UTC
Cherry-picked into 1.9 as well: 8369467c20746ee91ac8be78a43dc1990b01e056
Comment 11 Julien Cristau 2011-03-21 09:53:12 UTC
*** Bug 35477 has been marked as a duplicate of this bug. ***
Comment 12 Julien Cristau 2011-04-27 01:37:42 UTC
*** Bug 36544 has been marked as a duplicate of this bug. ***
