Bug 19740 - [GM965][GM45] GPU hang with compiz reflection plugin
Summary: [GM965][GM45] GPU hang with compiz reflection plugin
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium critical
Assignee: Carl Worth
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 23753
Depends on:
Blocks:
 
Reported: 2009-01-26 01:02 UTC by Tom Jaeger
Modified: 2010-06-29 13:18 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
gpu dump (152.92 KB, application/x-gzip)
2009-08-07 21:04 UTC, Tom Jaeger

Description Tom Jaeger 2009-01-26 01:02:39 UTC
This is fairly easy for me to reproduce with certain compiz plugins (alt+tab, scale, shift switcher).  The Intel driver is current master (but the problem has been happening for a while, possibly since commit 750bd0bde09adf956c17bbb49c5a6020f12e60a4).  This is on an i965 in an X61t:

00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)


#0  0xb7f1e430 in __kernel_vsyscall ()
#1  0xb7bb2ce9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2  0xb79b7add in drmIoctl () from /usr/lib/libdrm.so.2
#3  0xb79b7ee2 in drmCommandNone () from /usr/lib/libdrm.so.2
#4  0xb794da3f in I830BlockHandler (i=0, blockData=0x0, pTimeout=0xbfe3a298, pReadmask=0x81f52c0)
    at ../../src/i830_driver.c:2716
#5  0x0817ad7b in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0xbfe3a298, pReadmask=0x81f52c0)
    at ../../render/animcur.c:222
#6  0x08143e98 in compBlockHandler (i=0, blockData=0x0, pTimeout=0xbfe3a298, pReadmask=0x81f52c0)
    at ../../composite/compinit.c:158
#7  0x08090a48 in BlockHandler (pTimeout=0xbfe3a298, pReadmask=0x81f52c0) at ../../dix/dixutils.c:384
#8  0x08130884 in WaitForSomething (pClientsReady=0xb932d50) at ../../os/WaitFor.c:215
#9  0x0808cb4e in Dispatch () at ../../dix/dispatch.c:367
#10 0x08071b7d in main (argc=10, argv=0xbfe3a3e4, envp=Cannot access memory at address 0x6460
) at ../../dix/main.c:383
Comment 1 Tom Jaeger 2009-01-26 01:13:06 UTC
Sorry, I spoke too soon.  This is also happening with that commit reverted:

#0  0xb7f7b430 in __kernel_vsyscall ()
#1  0xb7c0fce9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7a14add in drmIoctl () from /usr/lib/libdrm.so.2
#3  0xb7a14e2b in drmCommandWrite () from /usr/lib/libdrm.so.2
#4  0xb799e7e5 in I830Sync (pScrn=0x9d1f408) at ../../src/i830_accel.c:214
#5  0xb79cf46a in I830EXASync (pScreen=0x9d38340, marker=0) at ../../src/i830_exa.c:169
#6  0xb797a185 in exaWaitSync (pScreen=0x9d38340) at ../../exa/exa.c:1065
#7  0xb797b3fe in ExaDoPrepareAccess (pDrawable=0xe5b7db0, index=0) at ../../exa/exa.c:509
#8  0xb7980512 in exaCopyDirty (migrate=0xbfc963ec, pValidDst=0xfe845e4, pValidSrc=0xfe845d8, transfer=0, 
    fallback_src=0xe5b7de0 "", fallback_dst=0xb4375140 '\200' <repeats 68 times>, fallback_srcpitch=40, 
    fallback_dstpitch=64, fallback_index=0, sync=0xb797a190 <exaMarkSync>) at ../../exa/exa_migration.c:218
#9  0xb7980a39 in exaDoMoveInPixmap (migrate=0xbfc963ec) at ../../exa/exa_migration.c:274
#10 0xb79811fa in exaDoMigration (pixmaps=0xbfc963dc, npixmaps=2, can_accel=1) at ../../exa/exa_migration.c:683
#11 0xb797e131 in exaCopyNtoN (pSrcDrawable=0xe5b7db0, pDstDrawable=0xa4a64008, pGC=0x0, pbox=0xbfc96538, nbox=1, 
    dx=-592, dy=-48, reverse=0, upsidedown=0, bitplane=0, closure=0x0) at ../../exa/exa_accel.c:479
#12 0xb79832fe in exaComposite (op=1 '\001', pSrc=0xfe846f8, pMask=0x0, pDst=0xc8687e0, xSrc=0, ySrc=0, 
    xMask=<value optimized out>, yMask=<value optimized out>, xDst=592, yDst=48, width=10, height=12)
    at ../../exa/exa_render.c:865
#13 0x0817deba in damageComposite (op=252 '\374', pSrc=0xfe846f8, pMask=0x0, pDst=0xc8687e0, xSrc=<value optimized out>, 
    ySrc=<value optimized out>, xMask=<value optimized out>, yMask=<value optimized out>, xDst=<value optimized out>, 
    yDst=<value optimized out>, width=<value optimized out>, height=<value optimized out>)
    at ../../../miext/damage/damage.c:643
#14 0x081700fa in CompositePicture (op=1 '\001', pSrc=0xfe846f8, pMask=0x0, pDst=0xc8687e0, xSrc=0, ySrc=0, 
    xMask=<value optimized out>, yMask=<value optimized out>, xDst=<value optimized out>, yDst=<value optimized out>, 
    width=10, height=12) at ../../render/picture.c:1675
#15 0xb797ed80 in exaBufferGlyph (pScreen=0x9d38340, buffer=0xbfc96834, pGlyph=0xfe84578, xGlyph=39, yGlyph=12)
    at ../../exa/exa_glyphs.c:481
#16 0xb797f7f2 in exaGlyphs (op=3 '\003', pSrc=0xe1479e8, pDst=0xfdf9848, maskFormat=0x9d3ac38, xSrc=941, ySrc=18, 
    nlist=0, list=0xbfc979a8, glyphs=0xbfc975bc) at ../../exa/exa_glyphs.c:856
#17 0x0817e185 in damageGlyphs (op=252 '\374', pSrc=0xe1479e8, pDst=0xfdf9848, maskFormat=0x9d3ac38, 
    xSrc=<value optimized out>, ySrc=<value optimized out>, nlist=1, list=0xbfc979a8, glyphs=0xbfc975a8)
    at ../../../miext/damage/damage.c:721
#18 0x0816c732 in CompositeGlyphs (op=<value optimized out>, pSrc=0xe1479e8, pDst=0xfdf9848, maskFormat=0x9d3ac38, 
    xSrc=941, ySrc=18, nlist=1, lists=0xbfc979a8, glyphs=0xbfc975a8) at ../../render/glyph.c:632
#19 0x0817987e in ProcRenderCompositeGlyphs (client=0x102d9ad0) at ../../render/render.c:1459
#20 0x08172cc5 in ProcRenderDispatch (client=0x40046445) at ../../render/render.c:2086
#21 0x0808ce0f in Dispatch () at ../../dix/dispatch.c:437
#22 0x08071b7d in main (argc=10, argv=0xbfc97e04, envp=Cannot access memory at address 0x4004644d
) at ../../dix/main.c:383
Comment 2 Eric Anholt 2009-01-27 17:30:57 UTC
Does this also occur with UXA and DRI2?  
Comment 3 Tom Jaeger 2009-01-27 22:43:47 UTC
(In reply to comment #2)
> Does this also occur with UXA and DRI2?  
> 

Sorry, I haven't had much luck with UXA so far.  Right now, I'm getting less than 1 fps in compiz (I tried both 2.6.1 and current master, and updated to mesa's 7.4 branch).  For what it's worth, I haven't been able to reproduce the problem under these circumstances.
Comment 4 Tom Jaeger 2009-01-28 13:41:18 UTC
(In reply to comment #2)
> Does this also occur with UXA and DRI2?  

I've got compiz working now (I had to disable sync to vblank in the compiz options).  The problem is still there, and it's happening spontaneously now, usually a few minutes into the session.  (Backtrace is identical to the one in comment #1).
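
One way to check that two traces are "identical" even though the addresses differ between runs is to compare only the function names. The following is a minimal Python sketch, not an existing tool: `FRAME_RE` and `signature` are hypothetical names, and the sample input is just the top three frames of the traces in the description and in comment #1.

```python
import re

# Matches gdb frame lines such as
#   "#4  0xb794da3f in I830BlockHandler (i=0, ...)"
# and also frames that gdb prints without an address.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def signature(backtrace):
    """Return the ordered list of function names in a gdb backtrace."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


run_a = """\
#0  0xb7f7b430 in __kernel_vsyscall ()
#1  0xb7c0fce9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7a14add in drmIoctl () from /usr/lib/libdrm.so.2
"""
run_b = """\
#0  0xb7f1e430 in __kernel_vsyscall ()
#1  0xb7bb2ce9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2  0xb79b7add in drmIoctl () from /usr/lib/libdrm.so.2
"""
print(signature(run_a) == signature(run_b))
```

Continuation lines of a frame (argument lists, `at file.c:NNN`) do not start with `#` and are skipped, so only the call chain is compared.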
Comment 5 Tom Jaeger 2009-03-23 19:05:00 UTC
Is there anything I can do to help debug the issue?  It's affecting both EXA and UXA (making UXA basically unusable) and it's really easy for me to reproduce.
Comment 6 Arkadiusz Miskiewicz 2009-04-02 03:44:30 UTC
I just hit this, too.

xserver 1.6
mesa 7.4
libdrm from git master
intel driver from git master
kernel from git master as of today
x86_64 on thinkpad t400 with GM45

KMS enabled, UXA enabled, nopat

X frozen, even mouse cursor doesn't work

gdb says:

Program received signal SIGINT, Interrupt.
0x00007faa46728327 in ioctl () from /lib64/libc.so.6
(gdb) bt
#0  0x00007faa46728327 in ioctl () from /lib64/libc.so.6
#1  0x00007faa44f041c3 in drmIoctl (fd=7, request=25688, arg=0x0) at xf86drm.c:187
#2  0x00007faa44f044c6 in drmCommandNone (fd=7, drmCommandIndex=<value optimized out>) at xf86drm.c:2313
#3  0x00007faa44a7c838 in I830BlockHandler (i=<value optimized out>, blockData=0x0, pTimeout=0x7fff506dc118, pReadmask=0x7d1ea0) at i830_driver.c:2655
#4  0x000000000052d4b8 in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0x7fff506dc118, pReadmask=0x7d1ea0) at animcur.c:222
#5  0x00000000004f93fe in compBlockHandler (i=0, blockData=0x0, pTimeout=0x7fff506dc118, pReadmask=0x7d1ea0) at compinit.c:158
#6  0x000000000044b170 in BlockHandler (pTimeout=0x7fff506dc118, pReadmask=0x7d1ea0) at dixutils.c:384
#7  0x00000000004e7661 in WaitForSomething (pClientsReady=0x3890f90) at WaitFor.c:215
#8  0x00000000004474f0 in Dispatch () at dispatch.c:367
#9  0x000000000042d63d in main (argc=7, argv=0x7fff506dc2f8, envp=<value optimized out>) at main.c:397

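The `request=25688` value in frame #1 can be decoded by hand. This is a minimal Python sketch, assuming the standard Linux `_IOC` bit layout and the constants `DRM_COMMAND_BASE = 0x40` and `DRM_I915_GEM_THROTTLE = 0x18`; both constants are quoted from memory of the DRM headers and should be checked against the headers actually in use. `decode_ioctl` is a hypothetical helper name.

```python
# Decode a Linux ioctl request number into its _IOC fields.
# Assumed layout (asm-generic/ioctl.h): nr = bits 0-7, type = bits 8-15,
# size = bits 16-29, dir = bits 30-31.

DRM_COMMAND_BASE = 0x40        # assumption: first driver-private DRM ioctl
DRM_I915_GEM_THROTTLE = 0x18   # assumption: value from i915_drm.h


def decode_ioctl(request):
    """Split an ioctl request number into its nr, type, size and dir fields."""
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,
    }


fields = decode_ioctl(25688)  # value from frame #1 of the backtrace above
driver_cmd = fields["nr"] - DRM_COMMAND_BASE
print(hex(25688), fields, hex(driver_cmd))
```

Under those assumptions the request is of type 'd' (DRM), carries no argument, and maps to driver command 0x18, which would be the GEM throttle call. That is consistent with `drmCommandNone` in frame #2, but it does not by itself explain why the call never returns.
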

Comment 7 Arkadiusz Miskiewicz 2009-04-02 04:27:21 UTC
Isn't this the same as #19911 anyway?
Comment 8 Eric Anholt 2009-04-08 14:47:19 UTC
Arkadiusz: Don't lump bugs together because symptoms are the same.  Lump them together when the things you do to cause the bug plus the symptoms are the same.

Of course, the missing part in this bug is how to reliably produce the hang.
Comment 9 Tom Jaeger 2009-04-08 21:10:56 UTC
> Of course, the missing part in this bug is how to reliably produce the hang.

Well it's pretty obvious that the bug doesn't show up on any of the driver developers' machines, or else it would be fixed by now.  It's very easy for me to reproduce:

EXA: Run compiz, open a large number of windows, minimize some of them and press Alt+Tab.  It might not hang on the first try, but it usually takes less than a minute to reproduce.
UXA: Run compiz and start firefox.  That's it.
Comment 10 Arkadiusz Miskiewicz 2009-04-08 23:58:12 UTC
Eric: ok, hopefully the test program at #19911 triggers reliably for me.

Tom: it looks like my issue is indeed different; I wasn't able to trigger it by running compiz and firefox with UXA.
Comment 11 Tom Jaeger 2009-07-06 14:04:08 UTC
I didn't realize this before, but in order to reproduce the issue, I need to enable compiz's reflection plugin.  For what it's worth, this might not be true for the EXA hang: I'm pretty sure I've seen that one without the reflection plugin enabled.
Comment 12 Eric Anholt 2009-08-07 19:15:32 UTC
Could you attach the output of intel_gpu_dump when you reproduce the hang?  I've now run with the reflection plugin enabled, and don't see any problems.  Also, please be sure you have the following new commit series:

xf86-video-intel:
commit e8f0763d405a8152c74c28792c52fe12c1d41dd5
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Aug 7 18:24:44 2009 -0700

    Fix math in the tiling alignment fix.

commit 222b52ef16895823fbf3a0fc0be4eb23b930ed1b
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Aug 7 18:05:29 2009 -0700

    Align tiled pixmap height so we don't address beyond the end of our buffers.

Mesa:
commit ceb8afcca5b0a52b005a782ea54b301beaee1a15
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Aug 7 18:09:31 2009 -0700

    intel: Align region height as required for tiled regions.

    Otherwise, we would address beyond the end of our buffers.  Fixes reliable
    GPU segfault with texture_tiling=true and oglconform shadow.c.

    Bug #22406.

I can't say for sure whether this will fix your problem, as it depends on the size of your screen and whether you had any other 3D apps besides compiz running.
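
The height alignment those commits describe can be sketched as follows. This is an illustration in Python, not the driver code: the tile heights (8 rows for X tiling, 32 rows for Y tiling) are assumed from the usual 4 KiB Intel tile layout, and `TILE_ROWS`, `align_up` and `buffer_rows` are hypothetical names.

```python
# Round a pixmap height up to a whole number of tile rows so that the
# last partial row of tiles is still backed by allocated memory.

TILE_ROWS = {"none": 1, "x": 8, "y": 32}  # assumed rows per tile


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def buffer_rows(height, tiling):
    """Rows that must be allocated for a surface of the given height."""
    return align_up(height, TILE_ROWS[tiling])


for tiling in ("none", "x", "y"):
    print(tiling, buffer_rows(1050, tiling))
```

For a 1050-row surface (for example a 1400x1050 screen), an allocation sized from the unaligned height would be six rows short of the 1056 rows that tiled addressing can touch, under the assumed tile heights.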
Comment 13 Tom Jaeger 2009-08-07 21:03:16 UTC
The bug is still present in the latest intel driver + mesa from git.

(In reply to comment #12)
> Could you attach the output of intel_gpu_dump when you reproduce the hang? 
> I've now run with the reflection plugin enabled, and don't see any problems. 
> Also, please be sure you have the following new commit series:
I'll attach the output of intel_gpu_dump.  After executing intel_gpu_dump, the system locks up pretty badly (I can still log in via ssh, but I can't do anything in the shell for some reason), but it seems like the dump worked.

> I can't say for sure whether this will fix your problem, as it depends on the
> size of your screen and whether you had any other 3D apps besides compiz
> running.
> 

Screen size is 1400x1050.  The only apps that are running are gnome, compiz and firefox.
Comment 14 Tom Jaeger 2009-08-07 21:04:28 UTC
Created attachment 28435 [details]
gpu dump
Comment 15 zOOm_ER 2009-08-10 15:30:03 UTC
It seems to me that there are already dozens of bug reports describing random X freezes on GM965.  Compiz (especially the reflection plugin) just makes it hang faster.
Comment 16 Eric Anholt 2009-10-09 12:36:50 UTC
This is reproducible on my GM965: Enable the reflection plugin, open 2 gnome-terminals, drag one around.  The hangcheck timer repeatedly triggers and resets the GPU.
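
For readers unfamiliar with the mechanism, a hang check of this kind can be modelled roughly as below. This is a simplified Python sketch, not the kernel's implementation: sampling a progress counter on each timer tick and declaring a hang after repeated identical samples is an assumption about the general approach, and `HangCheck`, `tick` and the threshold of 2 are hypothetical.

```python
class HangCheck:
    """Declare a hang when a progress counter stalls while work is pending."""

    def __init__(self, threshold=2):
        self.threshold = threshold   # stalled ticks tolerated before a reset
        self.last_progress = None
        self.stalled_ticks = 0
        self.resets = 0

    def tick(self, progress, work_pending):
        """Called from a periodic timer; returns True if a reset is issued."""
        if not work_pending or progress != self.last_progress:
            # Idle, or the GPU advanced since the previous sample.
            self.last_progress = progress
            self.stalled_ticks = 0
            return False
        self.stalled_ticks += 1
        if self.stalled_ticks >= self.threshold:
            self.resets += 1
            self.stalled_ticks = 0
            return True
        return False


check = HangCheck()
samples = [100, 140, 140, 140, 140, 140]  # counter stops advancing at 140
results = [check.tick(p, work_pending=True) for p in samples]
print(results, check.resets)
```

In this model a reset that does not clear the underlying stall simply leads to another reset a few ticks later, which is the "repeatedly triggers" pattern described above.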
Comment 17 Eric Anholt 2009-10-19 10:44:58 UTC
*** Bug 23753 has been marked as a duplicate of this bug. ***
Comment 18 maximlevitsky 2010-06-29 13:18:25 UTC
Glad to see that this bug is fixed now.

(I had to jump through a few hoops to make compiz work with mesa master, though.)

More precisely, I reverted commit

73e24cd5a7a0760726a681dda5b88805ddcf1555 is first bad commit
commit 73e24cd5a7a0760726a681dda5b88805ddcf1555
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Mon Feb 8 10:34:52 2010 -0800

    intel: Stop exposing useless 24 depth/0 stencil configs
    
    Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
    Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>


I also removed one assert, and now compiz works.
Otherwise compiz was unhappy about missing visuals
(it wants 24-bit depth / 0 stencil?)

