I've found a repeatable method to hang the GPU on my 945GME, using the latest git head of xf86-video-intel, mesa, libdrm, and xorg-server, plus the current git versions of the dependencies needed to compile them (libXi, for instance, which I believe is irrelevant to this report).

In the stock xorg-server, using Xv to convert YUV into RGB onto an X11 off-screen Pixmap is an invalid operation, so I also applied the patch from Bug Report #21143 to allow this operation to be attempted.

What I do is set up an XvImage object to contain the YUV image, and then allocate an off-screen Pixmap object to receive it. Prior to the transfer, I use the GLX_EXT_texture_from_pixmap extension to build a GLXPixmap object that refers to this off-screen pixmap. I then bind this to a texture and run a loop where I blit the texture to the screen using GL_QUADS, modify the XvImage, rerun the XvPutImage, and then attempt to re-blit.

The Xorg server hangs (inside the kernel ioctl), presumably waiting for the card to complete an action. The hang occurs at the point where the application calls glXSwapBuffers(). I am using libSDL-1.2.13, which wraps this inside of SDL_GL_SwapBuffers().
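For context on the Xv step above, the YUV-to-RGB conversion amounts to per-pixel arithmetic along these lines. This is only a sketch, assuming video-range BT.601 coefficients in fixed point; the function names `yuv_to_rgb` and `clamp_u8` are made up for illustration, and the actual hardware path on the 945GME may use different coefficients or precision.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value into the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Convert one video-range YUV sample to 8-bit RGB.
 * Assumption: BT.601 coefficients scaled by 256 (hypothetical helper,
 * not taken from the driver or the test program). */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(r && g && b);

    int c = (int)y - 16;   /* luma with the video-range offset removed */
    int d = (int)u - 128;  /* blue-difference chroma, centered on zero */
    int e = (int)v - 128;  /* red-difference chroma, centered on zero */

    /* Division (rather than a right shift) keeps negative intermediates
     * well defined; they are clamped to 0 afterwards in any case. */
    *r = clamp_u8((298 * c + 409 * e + 128) / 256);
    *g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    *b = clamp_u8((298 * c + 516 * d + 128) / 256);
}
```

Calling `yuv_to_rgb` once per pixel of a planar or packed YUV frame gives the RGB data that would otherwise land in the off-screen Pixmap via XvPutImage.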
Here's the X server's stack:

(gdb) bt
#0 0xb7dab9e0 in ioctl () from /lib/libc.so.0
#1 0xb7d056b1 in drmIoctl (fd=8, request=25688, arg=0x0) at xf86drm.c:187
#2 0xb7d0948a in drmCommandNone (fd=8, drmCommandIndex=24) at xf86drm.c:2313
#3 0xb7c92321 in I830BlockHandler (i=0, blockData=0x0, pTimeout=0xbf9c36f4, pReadmask=0x823a800) at i830_driver.c:2232
#4 0x081f796d in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0xbf9c36f4, pReadmask=0x823a800) at animcur.c:222
#5 0x080fe220 in compBlockHandler (i=0, blockData=0x0, pTimeout=0xbf9c36f4, pReadmask=0x823a800) at compinit.c:159
#6 0x08078071 in BlockHandler (pTimeout=0xbf9c36f4, pReadmask=0x823a800) at dixutils.c:379
#7 0x080b02b8 in WaitForSomething (pClientsReady=0x834f648) at WaitFor.c:215
#8 0x0806a485 in Dispatch () at dispatch.c:362
#9 0x08063bb1 in main (argc=4, argv=0xbf9c38e4, envp=0xbf9c38f8) at main.c:283

Here's the associated kernel stack:

cat 1218/stack
[<c030f1e1>] i915_wait_request+0x121/0x180
[<c030f275>] i915_gem_throttle_ioctl+0x35/0x50
[<c02fa495>] drm_ioctl+0x125/0x360
[<c018c5eb>] vfs_ioctl+0x6b/0x80
[<c018c8c8>] do_vfs_ioctl+0x2c8/0x4f0
[<c018cb29>] sys_ioctl+0x39/0x60
[<c0102ce1>] syscall_call+0x7/0xb
[<ffffffff>] 0xffffffff

Here's the stack trace from my program:

#0 0xb769fd62 in select () from /lib/libc.so.0
#1 0xb7e766c5 in _xcb_conn_wait (c=0x805cf48, cond=0xbf8fa2a8, vector=0x0, count=0x0) at xcb_conn.c:283
#2 0xb7e77fbd in xcb_wait_for_reply (c=0x805cf48, request=52, e=0xbf8fa308) at xcb_in.c:376
#3 0xb7edc8dd in _XReply (dpy=0x80550a8, rep=0xbf8fa334, extra=0, discard=1) at xcb_io.c:454
#4 0xb7687d05 in XFixesExtAddDisplay (extinfo=0xb768970c, dpy=0x80550a8, ext_name=0xb76896e8 "XFIXES") at Xfixes.c:85
#5 0xb7688092 in XFixesFindDisplay (dpy=0x80550a8) at Xfixes.c:201
#6 0xb7685d71 in XFixesCreateRegion (dpy=0x80550a8, rectangles=0xbf8fa3f4, nrectangles=1) at Region.c:33
#7 0xb797125e in dri2CopySubBuffer (pdraw=0x84039b0, x=0, y=0, width=1280, height=800) at dri2_glx.c:213
#8 0xb79712f2 in dri2SwapBuffers (pdraw=0x84039b0) at dri2_glx.c:224
#9 0xb7937d61 in glXSwapBuffers (dpy=0x80550a8, drawable=6291471) at glxcmds.c:890
#10 0xb7b44245 in X11_GL_SwapBuffers (this=0x80584d0) at ./src/video/x11/SDL_x11gl.c:428
#11 0xb7b2cd7c in SDL_GL_SwapBuffers () at ./src/video/SDL_video.c:1515
#12 0x0804a748 in main (argc=2, argv=0xbf8fa6e4) at GLExtTest.cpp:347

I did get an intel_gpu_dump of the GPU, which I am attaching to this issue. The email on intel-gfx below alludes to "virtual beverages" for such output:
* http://lists.freedesktop.org/archives/intel-gfx/2009-April/002104.html
The intel_gpu_dump output is larger than the maximum upload limit for attachments here, so it is located at:
* http://www.colemankane.org/dump-i945gme-Xv-with-GLX_EXT_texture_from_pixmap.txt
Created attachment 25865: My xorg.conf file
Attaching my xorg.conf file.
Created attachment 25866: My Xorg log file
Attaching my Xorg log file.
Created attachment 25867: My dmesg file
I am attaching my dmesg output as well.
I am able to reproduce the same behavior using the non-Shm as well as the Shm versions of the Xv calls. The same code is used to perform the work without OpenGL and using the Window as the Drawable, and that configuration results in a successful operation. The same code is also used to perform direct RGB render-to-texture using XImage objects and succeeds in both the Shm and Non-Shm versions of that code. The code is a demonstration program that accepts a series of arguments via the command-line for turning on/off OpenGL, texture-from-pixmap, Xv, and Shm.
Created attachment 25868: My demo code that reproduces the hang
This is the C/C++ program that reproduces the "GPU hang" bug when run with the texture-from-pixmap, Xv, and OpenGL paths enabled. Both the non-Shm and Shm variants of the Xv code cause the program to hang at SDL_GL_SwapBuffers(), which hangs when it calls glXSwapBuffers(). I will modify the program so that it doesn't use IUI, SDLmm, and SDL, and post a successor to this attachment. That will take a little time, so I am attaching this code now to show you what I am doing so far.
I initially encountered this bug on the recent release versions of all software modules:
* mesa 7.4
* xorg-server 1.6.1 (with Xv Pixmaps patch from #21143)
* libdrm 2.4.9
* xf86-video-intel 2.7.0
* linux 2.6.29.2
I am currently running Linux kernel 2.6.30rc4, and I see the same behavior on both kernel versions.
I've built an entire Linux system from scratch and from source code for this, so there's no particular "distro". Mostly, I am using the latest vanilla code from the projects that I need. I just have a bundle of Makefiles that build everything from source and deploy it to a staging directory. Linux just boots to an init script, which starts up the necessary daemons (like udevd) and sshd (for remote access) and then execs /bin/tcsh for a shell.
Created attachment 25875: Rewrite of test case so that it only depends on X11 and Mesa (OpenGL)
I replaced all of the external libraries with X11 calls and OpenGL calls to perform the same Window/Context setup. This new test case reproduces the bugs, and it should be able to stand on its own (e.g. be built by you).
Created attachment 25876: GZipped version of the externally-linked output of intel_gpu_dump
I'm uploading a gzipped version of the intel_gpu_dump output so that it is still available even if the externally-linked text vanishes.
I just sync'd with the git master of libdrm, mesa, and xf86-video-intel and now the procedure appears to be working! I suspect the "can't figure out which pipe to sync on" fix probably did it. I'll retest with release versions of xorg-server, mesa, and libdrm and get back to you.
Since it gets further now, I ran into another crash. This time, it only happens after I run the program and exit it. The X server crashes at:

[New process 1713]
#0 0xb7d96a97 in kill () from /lib/libc.so.0
(gdb) bt
#0 0xb7d96a97 in kill () from /lib/libc.so.0
#1 0xb7dba5bd in raise () from /lib/libc.so.0
#2 0xb7dbc15b in abort () from /lib/libc.so.0
#3 0x080c8081 in ddxGiveUp () at xf86Init.c:1403
#4 0x080c817a in AbortDDX () at xf86Init.c:1448
#5 0x080bf40f in AbortServer () at log.c:404
#6 0x080bf7e9 in FatalError (f=0x820e124 "Caught signal %d (%s). Server aborting\n") at log.c:529
#7 0x080b71f6 in OsSigHandler (signo=11, sip=0xbf9ad76c, unused=0xbf9ad7ec) at osinit.c:152
#8 <signal handler called>
#9 0x0808f5d1 in privateExists (privates=0x83643a8, key=0x8239218) at privates.c:79
#10 0x0808f55d in dixLookupPrivate (privates=0x83643a8, key=0x8239218) at privates.c:162
#11 0x081bfb0b in xf86XVRemovePortFromWindow (pWin=0x8364390, portPriv=0x8261100) at xf86xv.c:985
#12 0x081c154b in xf86XVPutImage (client=0x8253060, pDraw=0x8252840, pPort=0x8262368, pGC=0x8364670, src_x=0, src_y=0, src_w=1920, src_h=1080, drw_x=0, drw_y=0, drw_w=1920, drw_h=1080, format=0x8260f78, data=0xb62c6034 "", sync=0, width=1920, height=1080) at xf86xv.c:1723
#13 0xb7d79c25 in XvdiPutImage (client=0x8253060, pDraw=0x8252840, pPort=0x8262368, pGC=0x8364670, src_x=0, src_y=0, src_w=1920, src_h=1080, drw_x=0, drw_y=0, drw_w=1920, drw_h=1080, image=0x8260f78, data=0xb62c6034 "", sync=0, width=1920, height=1080) at xvmain.c:713
#14 0xb7d7e366 in ProcXvPutImage (client=0x8253060) at xvdisp.c:1010
#15 0xb7d7ee8a in ProcXvDispatch (client=0x8253060) at xvdisp.c:1276
#16 0x0806a627 in Dispatch () at dispatch.c:432
#17 0x08063bb1 in main (argc=4, argv=0xbf9adf94, envp=0xbf9adfa8) at main.c:283

I looked at the code in frame #12, above, and I suspect that for some reason, when the old app exits, the Xv port is still attached to a destroyed Window handle. The server then attempts to call xf86XVRemovePortFromWindow on that destroyed Window, which causes a segfault. Do you want me to open a new issue for this problem?
Seems that xf86XVStopVideo and I830StopVideo never get called, even if I try to call XvStopVideo from my application.
Hah, saw your dump without seeing your comment that it's working now. I've posted a patch that should fix things if your pixmap was bigger than the screen. http://lists.freedesktop.org/archives/intel-gfx/2009-May/002453.html
Thanks, I seem to get the crash whether or not I am using a pixmap larger than the physical screen size. I've tried with a pixmap of 1920x1080 and 1280x720 on a 1600x1200 display.
I verified that the patch doesn't fix the new bug.
Created attachment 26005: Band-aid to stop Xv[Shm]PutImage from crashing the Xserver
This attachment is a diff that allows the Xserver to avoid the crash. However, it probably means that if an Xv app is running and a new one is started up on the same port, the xf86XVRemovePortFromWindow function won't be called to "detach" the window prior to the next Xv[Shm]PutImage call. I don't know why, but it seems that the memory the portPriv pointer refers to is either used uninitialized after malloc, or else used after being freed, on the second or a later Xv attempt on the same port. Perhaps this has to do with code elsewhere assuming that the target of an XvPutImage will always be a Window drawable? The xf86XVRemovePortFromWindow function seems to assume this.
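To make the "always a Window" hypothesis concrete, here is a small stand-alone model of a port that is detached through a window-only path only when its previous target really was a window. This is a sketch under stated assumptions, not the attached diff and not the server's code: `struct drawable`, `struct port_priv`, `remove_port_from_window`, and `attach_port` are hypothetical stand-ins, and the model does not cover the case where the previous drawable has already been destroyed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the server's drawable and port-private types. */
enum drawable_type { DRAWABLE_WINDOW, DRAWABLE_PIXMAP };

struct drawable {
    enum drawable_type type;
};

struct port_priv {
    struct drawable *draw;  /* drawable the port was last attached to */
    int detach_calls;       /* counts window detaches, for illustration only */
};

/* Stand-in for a detach routine that is only valid for windows. */
void remove_port_from_window(struct drawable *win, struct port_priv *port)
{
    assert(win != NULL && win->type == DRAWABLE_WINDOW);
    port->detach_calls++;
    port->draw = NULL;
}

/* Attach the port to a new target. The window-only detach path is taken
 * only when the previous target is a window; a previous pixmap target is
 * simply forgotten. */
void attach_port(struct port_priv *port, struct drawable *target)
{
    if (port->draw != NULL && port->draw != target) {
        if (port->draw->type == DRAWABLE_WINDOW)
            remove_port_from_window(port->draw, port);
        else
            port->draw = NULL;
    }
    port->draw = target;
}
```

In this model, moving a port from a pixmap to a window never enters the window-only path, which is the behavior the hypothesis above suggests the server would need when XvPutImage targets are allowed to be pixmaps.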
The original bug is fixed, so I'm closing this one. Please open one against the server if you want to get your server XV stuff handled.