Created attachment 18583 [details] gdb backtrace
From time to time (around once a week, maybe more) the server just stops responding when a window is closed. gdb from a remote session shows that the server is spinning forever in damageDestroyPixmap:
    while ((pDamage = *pPrev)) {
        damageRemoveDamage (pPrev, pDamage);
        if (!pDamage->isWindow)
            DamageDestroy (pDamage);
    }
The graphics stack is a git master checkout from 26 August 2008 using the radeon driver and compiz. A gdb "bt full" backtrace is attached.
Does this happen with the 1.5 server?
(In reply to comment #1) > Does this happen with the 1.5 server? > nope, git master
(In reply to comment #2) > (In reply to comment #1) > > Does this happen with the 1.5 server? > > > > nope, git master > To be more correct: I haven't tested server 1.5; the bug is observed on git master.
X server 1.5.2 (2:1.5.2-2ubuntu3) suffers from the same problem. The gdb backtrace is identical to the one reported by Mathieu. The graphic stack comes from a freshly installed Ubuntu 8.10 running KDE4. Graphic hardware is a 01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE] using the radeon driver.
Can you trace the control flow with gdb to see why it doesn't terminate?
pDamage->pNext points to pDamage, leading to an infinite loop. (gdb) print *pPrev $3 = (DamagePtr) 0x81dc968 (gdb) print *pDamage $4 = {pNext = 0x81dc968, pNextWin = 0x81dc96c, damage = {extents = {x1 = 0, y1 = 0, x2 = 0, y2 = 0}, data = 0x0}, damageLevel = DamageReportRawRegion, isInternal = 135749696, closure = 0x8175fb0, isWindow = 135749408, pDrawable = 0x8175ea0, damageReport = 0x8175e00 <damageChangeClip>, damageDestroy = 0x8175d80 <damageDestroyClip>, reportAfter = 135748848, pendingDamage = {extents = {x1 = 0, y1 = 0, x2 = -26848, y2 = 2071}, data = 0x81794c0}}
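For readers following along, the effect of a self-referencing link can be reproduced outside the server. The sketch below is not the X server code: `Node`, `remove_node`, and `drain_with_limit` are hypothetical stand-ins, and the iteration limit exists only so that the demonstration terminates. It assumes the removal step splices a node out by writing `node->next` into the slot that pointed at it, which is the usual pattern for this kind of list.

```c
#include <stddef.h>

/* Hypothetical stand-in for a damage record; only the link field matters. */
typedef struct Node {
    struct Node *next;
} Node;

/* Unlink `node` from the list starting at *prev: walk until the slot that
 * points at `node` is found, then overwrite that slot with node->next. */
void remove_node(Node **prev, Node *node)
{
    while (*prev) {
        if (*prev == node) {
            *prev = node->next;   /* changes nothing when node->next == node */
            return;
        }
        prev = &(*prev)->next;
    }
}

/* Drain the list the way the loop in the report does, but give up after
 * `limit` iterations. Returns the number of iterations performed. */
int drain_with_limit(Node **head, int limit)
{
    int iterations = 0;
    Node *node;

    while ((node = *head) != NULL && iterations < limit) {
        remove_node(head, node);
        iterations++;
    }
    return iterations;
}

/* Build a healthy two-node list (a -> b -> NULL) and drain it. */
int drain_healthy_list(int limit)
{
    Node b = { NULL };
    Node a = { &b };
    Node *head = &a;

    return drain_with_limit(&head, limit);
}

/* Build a one-node list whose link points back at itself and drain it. */
int drain_self_linked_list(int limit)
{
    Node a;
    Node *head = &a;

    a.next = &a;
    return drain_with_limit(&head, limit);
}
```

With the healthy list the drain ends on its own once the head becomes NULL. With the self-linked node the head never changes, so the loop only stops because of the artificial limit, which matches the spin seen in the backtrace.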
(In reply to comment #6) > pDamage->pNext points to pDamage, leading to an infinite loop. Hmm, does rebuilding miext/damage/damage.c with DAMAGE_VALIDATE_ENABLE defined to 1 give a clue as to how this comes to be?
Unfortunately DAMAGE_VALIDATE_ENABLE doesn't help. No error message is logged, and the abort() calls are not taken. Should I try to set DAMAGE_DEBUG_ENABLE to 1, or will it generate far too many log messages (my X server usually runs for about an hour before it freezes)?
*** Bug 18906 has been marked as a duplicate of this bug. ***
*** Bug 19560 has been marked as a duplicate of this bug. ***
Created attachment 22405 [details] [review] damage debugging patch Could anyone that can reproduce the bug try running with this patch? I note that 3/3 reports so far are radeon, but I couldn't identify anything clearly bad in radeon. Is anybody doing this rotating with randr?
(In reply to comment #11) > Created an attachment (id=22405) [details] > damage debugging patch > > Could anyone that can reproduce the bug try running with this patch? I note > that 3/3 reports so far are radeon, but I couldn't identify anything clearly > bad in radeon. Is anybody doing this rotating with randr? > Sorry, but my radeon-powered laptop died a month ago, and my current system, on which I use nouveau, doesn't have this bug. For the record, I never used randr rotation.
(In reply to comment #11) > Created an attachment (id=22405) [details] > damage debugging patch > > Could anyone that can reproduce the bug try running with this patch? Sorry for the late reply, I had to reproduce the issue. The FatalError call isn't reached. > I note that 3/3 reports so far are radeon, but I couldn't identify anything > clearly bad in radeon. Is anybody doing this rotating with randr? I'm not.
(In reply to comment #6)
> pDamage->pNext points to pDamage, leading to an infinite loop.
>
> (gdb) print *pPrev
> $3 = (DamagePtr) 0x81dc968
> (gdb) print *pDamage
> $4 = {pNext = 0x81dc968, pNextWin = 0x81dc96c, damage = {extents = {x1 = 0, y1
> = 0, x2 = 0, y2 = 0}, data = 0x0}, damageLevel = DamageReportRawRegion,
> isInternal = 135749696, closure = 0x8175fb0, isWindow = 135749408, pDrawable =
> 0x8175ea0, damageReport = 0x8175e00 <damageChangeClip>, damageDestroy =
> 0x8175d80 <damageDestroyClip>, reportAfter = 135748848, pendingDamage =
> {extents = {x1 = 0, y1 = 0, x2 = -26848, y2 = 2071}, data = 0x81794c0}}
>
Looking at the inlined code might help a little (see attachment). The gdb output shows that *pPrev and pDamage match, as they should by statement (2). Then damageRemoveDamage gets called, but the abort() call (enabled if DAMAGE_VALIDATE_ENABLE is defined) is never hit, because the assignment pDamage = *pPrev is executed *before* calling damageRemoveDamage; thus statement (4) is *always true* without needing to call damageRemoveDamage to check.
Apart from this, damageDestroyPixmap would continue to work (avoiding the infinite loop caused by assignment (2)) if the only conditional, statement (5), were executed. In this case that never happens, since (5) is executed only if pDamage->isWindow == 0, while gdb reports pDamage->isWindow = 135749408 (which is a corrupted value, since isWindow is supposed to be a bool type).
Chances are that:
a) statement (1) gets executed and somehow isWindow gets corrupted when executing this statement on some particular configurations (most likely)
b) random corruption of values stored in pPixmap in statement (1) on occasional basis (leading to corrupted isWindow) due to "damage external software"/hardware faults (other xserver components, radeon driver, kernel, ram) (unlikely)
c) buggy code produced by the compiler (likely)
Personally, I've been running radeon, radeonhd, on xserver>1.5 for the last five months without ever hitting this bug.
Created attachment 22903 [details] inlined damageDestroyPixmap function Shows execution flow through damageDestroyPixmap
(In reply to comment #14) > a) statement (1) gets executed and somehow isWindow gets corrupted when > executing this statement on some particular configurations (most likely) Actually that seems quite unlikely to me, as getPixmapDamageRef just boils down to dixLookupPrivateAddr. > b) random corruption of values stored in pPixmap in statement (1) on occasional > basis (leading to corrupted isWindow) due to "damage external > software"/hardware faults (other xserver components, radeon driver, kernel, > ram) (unlikely) > c) buggy code produced by the compiler (likely) Looking at the printed *pDamage again, actually it doesn't look like a DamageRec at all but like the static GCFuncs damageGCFuncs from line 438 of damage.c. So it does look like some kind of memory corruption, but I'd tend to consider b) more likely than c). If someone could reproduce the problem with the X server running in valgrind or at least gdb with something like Electric Fence, that might give a hint. > Personally, I've been running radeon, radeonhd, on xserver>1.5 for the last > five months without ever hitting this bug. Same here, but for even longer.
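The two symptoms discussed in this thread (flag fields holding pointer-sized values, and a link that points back at its own record) are cheap to test for. The following is only a sketch of such a plausibility check, not something taken from damage.c: `Record` and `record_is_plausible` are hypothetical names, and the struct is trimmed down to the fields the check inspects. It assumes the flags are int-typed booleans that should only ever hold 0 or 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down record: just the fields the check looks at. */
typedef struct Record {
    struct Record *next;
    int is_internal;   /* expected to be 0 or 1 */
    int is_window;     /* expected to be 0 or 1 */
} Record;

/* Return false when the record shows one of the corruption symptoms
 * described above: a self-referencing link or an out-of-range flag. */
bool record_is_plausible(const Record *rec)
{
    if (rec == NULL)
        return false;
    if (rec->next == rec)                  /* link points back at itself */
        return false;
    if (rec->is_internal != 0 && rec->is_internal != 1)
        return false;
    if (rec->is_window != 0 && rec->is_window != 1)
        return false;
    return true;
}
```

A check like this can only flag a record that already looks wrong; it says nothing about what overwrote it, so it complements rather than replaces a run under valgrind or Electric Fence.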
A fix for a memory use-after-free in the r300 driver just went into Mesa Git master and mesa_7_4_branch. Would be great if you could try if it helps for this problem as well.
I haven't been able to reproduce the problem for quite some time now. It must have been fixed by an Ubuntu upgrade (probably Xorg or KDE).
I second Laurent, on Ubuntu Jaunty, I have never been able to reproduce the problem. It cannot be a change in KDE though, since I am a Gnome only user.
Assuming it's been fixed, reopen if you can still reproduce with current bits.