Using the latest Intel driver (with i965 support) with Mesa CVS and X7.1, SPECViewperf81 crashed the X server after running for several minutes.

kernel: 2.6.17.7
OS: FC5/ia32e
platform: G965 C0

How to reproduce:
1. ./src/Configure
2. ./Run_All.csh

After running for a short time, the X server aborted with this error:

Error in I830WaitLpRing(), now is 972543175, start is 972541174
pgetbl_ctl: 0x3ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 20a8 head: 1a0c8 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: dfff ier: 0 imr: dfff iir: a0
space: 98328 wanted 131064

Fatal server error:
lockup

Error in I830WaitLpRing(), now is 972545195, start is 972543194
pgetbl_ctl: 0x3ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 20b0 head: 1a0c8 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: dfff ier: 0 imr: dfff iir: a0
space: 98320 wanted 131064

FatalError re-entered, aborting
lockup
This crash appears on both 32-bit and 64-bit machines.
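The "space" values in the dump above can be reproduced from the head and tail offsets, which helps when reading these lockup reports. The sketch below is an illustration only, not the driver's code: ring_space and check_ring_dump are hypothetical names, and the 0x20000-byte ring size is an assumption inferred from "wanted 131064" (0x20000 minus 8). Since head stays at 1a0c8 in both dumps while tail advances, the free space can never reach the requested amount and the wait times out.

```c
#include <assert.h>

/*
 * Free bytes in a circular command ring, keeping an 8-byte gap so that
 * tail never catches up with head. All arguments are byte offsets/sizes.
 * ring_size is an assumed value (see the note above).
 */
long ring_space(long head, long tail, long ring_size)
{
    long space;

    assert(ring_size > 0);
    space = head - (tail + 8);
    if (space < 0)              /* tail is ahead of head: wrap around */
        space += ring_size;
    return space;
}

/* Cross-check against the head/tail/space triples printed in the log. */
void check_ring_dump(void)
{
    assert(ring_space(0x1a0c8, 0x20a8, 0x20000) == 98328);
    assert(ring_space(0x1a0c8, 0x20b0, 0x20000) == 98320);
}
```

The same formula also matches a later dump in this thread where tail is ahead of head (tail 1cb00, head 12b18, space 90128), which exercises the wrap-around branch.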
Created attachment 6654 [details] Xorg.log
Created attachment 6655 [details] xorg.con
Fixed in latest Mesa CVS.
We're still seeing the same error. Please let me know what info you want.
(In reply to comment #5)
> We're still seeing the same error.
> Please let me know what info you want.

What output do you get from viewperf? Specifically, what test is running during the crash? Is it always the same?

Have you got more recent hardware than C0? I only have a C1 here and there were definitely some differences. Can you see whether the crash happens for you on C1?
Created attachment 6854 [details] our test result; the test always crashes while 3dsmax is running
Created attachment 6855 [details] output of viewperf, we are using C1
BTW, when we compiled the source code, we got an error:

clock.c: In function ‘stopclock’:
clock.c:85: error: “CLK_TCK” undeclared (first use in this function)
clock.c:85: error: (Each undeclared identifier is reported only once
clock.c:85: error: for each function it appears in.)
make: *** [Release/Linux32/clock.o] Error 1

Our solution: add this definition to clock.c:

#define CLK_TCK ((__clock_t) __sysconf (2)) /* 2 is _SC_CLK_TCK */

At the end of clock.c there are some lines like this:

#ifdef WIN32
    period = (float) (GetTickCount() - gtime) / 1000.0F;
#else
    period = (float) (times(&tbuf) - gtime) / (float) CLK_TCK;
#endif
    return (period);

I tried changing (float) CLK_TCK to 1000.0F, as in the WIN32 branch, but the same crash appeared.
This issue goes away when testing on a production G965 system.
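A more portable alternative to defining CLK_TCK by hand is to query the tick rate at run time with sysconf(_SC_CLK_TCK), which POSIX provides for converting times() results to seconds. The following is a minimal sketch under that assumption, not viewperf's code; ticks_to_seconds and elapsed_since are hypothetical helper names.

```c
#include <assert.h>
#include <sys/times.h>
#include <unistd.h>

/* Convert a difference between two times() readings into seconds. */
double ticks_to_seconds(clock_t start, clock_t end, long ticks_per_sec)
{
    assert(ticks_per_sec > 0);  /* sysconf() returns -1 on failure */
    return (double)(end - start) / (double)ticks_per_sec;
}

/* Seconds elapsed since an earlier times() reading `start`. */
double elapsed_since(clock_t start)
{
    struct tms tbuf;
    clock_t now = times(&tbuf);

    return ticks_to_seconds(start, now, sysconf(_SC_CLK_TCK));
}
```

Note that times() counts clock ticks rather than milliseconds, so dividing by 1000.0F on Linux would only skew the reported timings; it would not be expected to change whether the lockup occurs.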
I see a very similar bug on my Intel DG965WH motherboard, with SPECviewperf 9.0.3. I bought this board retail from tigerdirect.ca, so it better be production hardware. :) Tell me how to find out the hardware version and I'll post it. I'm not keen on taking the heatsink off the northbridge to read the stamp, but other than that...

I'm using AMD64 Ubuntu Edgy (xorg 7.1). (I changed the "Hardware" field for this bug, since it was closed for ia32 hardware. I hope that's ok.) I have drm (including kernel-side), mesa, and xf86-video-intel compiled from git sources (updated yesterday, March 1). X is generally working quite stably for playing games, e.g. vegastrike. 2GB of dual channel DDR2-6400 and a core2duo 2.4GHz kick butt. :)

BTW, I bought Intel graphics hardware specifically because the drivers were Free and well supported. I want to be able to run Xen, and I just plain like Free software.

Err, back to the bug. In my quest to find new and exciting ways to crash X in a way that would force me to reboot (before I start using this machine as my server for everything at home as well as a desktop) I tried SPECviewperf. The 3dsmax test doesn't crash the server, but it does segfault.

Run All Summary

3dsmax-04 Weighted Geometric Mean = 0.00000
catia-02 Weighted Geometric Mean = 0.00000
ensight-03 Weighted Geometric Mean = 1.335
light-08 Weighted Geometric Mean = 2.672
maya-02 Weighted Geometric Mean = 7.840
proe-04 Weighted Geometric Mean = 1.219
sw-01 Weighted Geometric Mean = 1.818
ugnx-01 Weighted Geometric Mean = 0.00000
tcvis-01 Weighted Geometric Mean = 0.00000

I guess the server crashed during the last test, ugs, since sum_results/ugs doesn't have a summary.txt.

The relevant messages in my kernel log are:

Mar 1 20:56:51 tesla kernel: [95331.474024] viewperf[21069]: segfault at 00002afda95bfe68 rip 00002afca7bbc105 rsp 00007fff04532a38 error 4
Mar 1 20:57:08 tesla kernel: [95348.330566] viewperf[21090]: segfault at 00002ba004981e78 rip 00002b9f0377e82c rsp 00007fffa8974f88 error 4
Mar 1 20:57:47 tesla kernel: [95387.069937] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3012069 emitted: 3013566
Mar 1 20:57:52 tesla kernel: [95392.140823] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3015814 emitted: 3017333
Mar 1 20:57:57 tesla kernel: [95397.259699] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3019581 emitted: 3021098
Mar 1 20:58:02 tesla kernel: [95402.382584] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3023326 emitted: 3024865
Mar 1 20:58:08 tesla kernel: [95407.585200] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3027057 emitted: 3028632
Mar 1 20:58:13 tesla kernel: [95412.920028] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3030859 emitted: 3032395
Mar 1 20:58:18 tesla kernel: [95418.134891] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3034042 emitted: 3036163
Mar 1 20:58:21 tesla kernel: [95421.134232] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3035667 emitted: 3036163
Mar 1 20:58:28 tesla kernel: [95427.832940] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3038184 emitted: 3038455
Mar 1 20:58:33 tesla kernel: [95433.499696] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 3040150 emitted: 3040738
Mar 1 21:09:34 tesla kernel: [96093.646095] viewperf[21217]: segfault at fffffffff1218980 rip fffffffff1218980 rsp 00007fffb9e6e0c8 error 14
Mar 1 21:09:39 tesla kernel: [96099.111426] viewperf[21218]: segfault at ffffffffa9b4a980 rip ffffffffa9b4a980 rsp 00007fff0153b7a8 error 14
Mar 1 21:15:26 tesla kernel: [96446.134405] viewperf[21230]: segfault at 00000000787dc980 rip 00000000787dc980 rsp 00007fff328a7ae8 error 14
Mar 1 21:19:45 tesla kernel: [96705.174709] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 63645579 emitted: 63865738

My Xorg.0.log ends with:

Error in I830WaitLpRing(), now is 272316156, start is 272314155
pgetbl_ctl: 0x7ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 1cb00 head: 12b18 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: cffe ier: 82 imr: 0 iir: 70
space: 90128 wanted 131064
(II) I810(0): [drm] removed 1 reserved context for kernel
(II) I810(0): [drm] unmapping 8192 bytes of SAREA 0x2efff000 at 0x2b12a3439000

Fatal server error:
lockup

Error in I830WaitLpRing(), now is 272318181, start is 272316180
pgetbl_ctl: 0x7ff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 0
LP ring tail: 1cb08 head: 12b18 len: 1f001 start 0
eir: 0 esr: 1 emr: ffdf
instdone: 0 instpm: 0
memmode: 0 instps: 0
hwstam: dfff ier: 0 imr: dfff iir: 70
space: 90120 wanted 131064

FatalError re-entered, aborting
lockup

I can gather more info if you can't reproduce this. I'll post my kernel log and xorg log.

BTW, clock.c compiles ok if you use -D_XOPEN_SOURCE, or some other feature-test macro that results in time.h defining CLK_TCK (which isn't in C99, or something, so glibc doesn't define it normally). viewperf 9.0.3 compiles fine for me out of the box, even though it still uses CLK_TCK.
Created attachment 8933 [details] peter's kernel log
Created attachment 8934 [details] peter's X log
Created attachment 8935 [details] peter's x config

I ran X with -layout simple, which just uses the i965 head. I wasn't doing any -sharevts multiseat stuff either; I have a separate xorg.conf for that. :)
I think I just spammed you guys with a bunch of emails while I tweaked the MIME type on an attachment. Sorry, I didn't realize an email was sent every time I did something minor, or I would have left it alone :(
I reproduced this again. This time it wasn't preceded by other segfaulting graphics programs that might have caused problems. I did run 32bit glxgears and googleearth to test the 32bit dri libs I compiled, but they worked fine.

This time I noticed that viewperf always does

viewperf: main/framebuffer.c:219: _mesa_free_framebuffer_data: Assertion `fb->RefCount == 0' failed.
Aborted

instead of exiting. I think this is at the end of a test, when it would have exited anyway.

The kernel log messages:

Mar 2 20:17:26 tesla kernel: [61215.978734] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 164764684 emitted: 164766183

all come from the ensight test. The three viewperf segfaults are all from the ugnx test.

The test that crashes X is:

$ ./Run_Viewset.csh tcvis-01 tcvis results
Running: tcvis-01.csh
Writing PNG file '../results/tcvis/tcvis01.png'...done.
Writing PNG file '../results/tcvis/tcvis01-depth.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full-depth.png'...done.
DRM_I830_BATCHBUFFER: -13
viewperf: intel_context.c:694: UNLOCK_HARDWARE: Assertion `intel->batch->ptr == intel->batch->map + intel->batch->offset' failed.
Aborted

The errors at the end of Xorg.0.log are the same as before, just with a few different numbers. Also, it takes less than 15 seconds for tcvis to crash X. I'm running viewperf at 1024x768, in case that matters. My X resolution was 1280x1024 this time, but I think last time I was at 1024x768. (My CRT only does 60Hz at 1280x1024, so I usually use xrandr to bring it down.)

Trying to start X again fails, with essentially the same error. And suspend/resume doesn't seem to work anymore (it used to re-post the video, now it oopses the kernel with the video still off). It would be really nice if crashing X didn't mean I had to reboot before I could use X again on that head. I'm typing now on X running on the PCI r128 in this machine.
I think I forgot to mention that this didn't lock the machine at all. Only the video hardware is out to lunch.
Tested again, this time fresh from a reboot:

booted up (to a text console)
sudo X -config ... -layout ...
DISPLAY=:0 fluxbox
in X, open a terminal
crash X within 10 seconds of starting tcvis (configured to run at 640x480)

peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ ./Run_Viewset.csh tcvis-01 tcvis results
Running: tcvis-01.csh
Writing PNG file '../results/tcvis/tcvis01.png'...done.
Writing PNG file '../results/tcvis/tcvis01-depth.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full-depth.png'...done.
DRM_I830_BATCHBUFFER: -13
viewperf: intel_context.c:694: UNLOCK_HARDWARE: Assertion `intel->batch->ptr == intel->batch->map + intel->batch->offset' failed.
Aborted

So this is a very reproducible bug on my system, and now I'm sure that nothing I did before running viewperf could have confused the driver.
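The bare number in "DRM_I830_BATCHBUFFER: -13" is easier to read once translated to an error name. The sketch below assumes the value is a negated Linux errno, which seems likely given that the kernel side logs EBUSY by name for the IRQ wait, but that is an assumption rather than something stated in this thread. describe_neg_errno and check_linux_errno_values are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Return the system's message for a negated-errno style return code. */
const char *describe_neg_errno(int rc)
{
    assert(rc <= 0);            /* expects 0 or a negated errno */
    return strerror(-rc);
}

/* On Linux, 13 is EACCES and 16 is EBUSY; other systems may differ. */
void check_linux_errno_values(void)
{
    assert(EACCES == 13);
    assert(EBUSY == 16);
}
```

Under that assumption, -13 would mean the batchbuffer ioctl was refused with EACCES (permission denied), while the i915_wait_irq messages in the kernel log correspond to errno 16.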
Created attachment 11381 [details] Xorg log error with SPECViewperf test

I'm having errors with the SPECViewperf (9.0.3) benchmark too. I'm using Ubuntu 7.10, which uses Mesa 7.0.x. When I run the test under a window manager (Gnome), it fails on the light-08 test. If I run a clean X, without GDM and Gnome (command: 'xinit -e xterm'), the light-08 test passes, but tcvis-01 fails.
I forgot to say in my previous comment: I opened a bug (12235) about errors in some GL applications (e.g. the Torcs game). The error is very similar to the one in this bug. I think they may be related.
Peter, are you still seeing the X server crash with the latest git driver? SpecViewPerf 9.0.3 works fine on my i965, on both 32-bit and 64-bit systems.
Peter, I forgot to mention we added "#define GLX_GLXEXT_PROTOTYPES" at the head of the file viewperf.c to avoid the segfault in ugnx-01 on x86-64. Please check whether that makes a difference.
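One plausible reason a missing prototype matters only on x86-64 (an inference, not something confirmed in this thread): in pre-C99 C, a function called without a visible declaration is assumed to return int, so a 64-bit pointer returned by it is cut to 32 bits and then sign-extended. That would fit the kernel log earlier in this thread, where viewperf jumps to addresses such as fffffffff1218980 and 00000000787dc980. The sketch below only models that arithmetic; through_implicit_int and check_truncation_examples are hypothetical names, and the upper address halves are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model a 64-bit pointer value passing through an `int` return type:
 * keep the low 32 bits, then sign-extend back to 64 bits.
 * Converting an out-of-range value to int32_t is implementation-defined;
 * GCC and Clang wrap modulo 2^32, which is what this relies on.
 */
uint64_t through_implicit_int(uint64_t addr)
{
    int32_t low = (int32_t)(uint32_t)addr;

    return (uint64_t)(int64_t)low;
}

/* Low halves are taken from the kernel log; the 0x00002b9f prefix is invented. */
void check_truncation_examples(void)
{
    assert(through_implicit_int(0x00002b9ff1218980ULL) == 0xfffffffff1218980ULL);
    assert(through_implicit_int(0x00002b9f787dc980ULL) == 0x00000000787dc980ULL);
}
```

On 32-bit builds a pointer and an int have the same width, so the same mistake would go unnoticed there.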
I finally got around to trying this again. I can still reproduce this, now on AMD64 Ubuntu Gutsy. :(

> I forgot to mention we added "#define GLX_GLXEXT_PROTOTYPES" at the head of the
> file viewperf.c to avoid segfault error in ugnx-01 on x86-64.

Thanks, that does make viewperf run cleanly except when it locks up X. I used ./Configure to compile a 64bit viewperf (after wasting several hours without realizing I was running 32bit viewperf with non-updated 32bit mesa).

My libdrm, kernel-side drm, and mesa are from git as of Nov 11th. My X server, kernel, and xorg intel driver are from Ubuntu Gutsy.

I'm not that familiar with git, so I did a fresh checkout of the mesa git tree. (diff showed it was in fact identical to my git tree, except for configs/linux-dri-x86_64, which I'd changed.) That also made sure I was compiling with the standard gcc flags, instead of my usual -Os -march=nocona -mtune=generic (for core2duo). Anyway, that's what the LD_LIBRARY_PATH= and LIBGL_DRIVERS_PATH= is about.

peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ libgl=/usr/local/src/g965/mesa.fresh/lib
peter@tesla:/usr/local/src/opengl/SPEC/SPECViewperf9.0$ LD_LIBRARY_PATH="$libgl" LIBGL_DRIVERS_PATH="$libgl" LIBGL_DEBUG=verbose MESA_DEBUG=1 /usr/bin/time ./Run_Viewset.csh tcvis-01 tcvis results
Running: tcvis-01.csh
libGL: XF86DRIGetClientDriverName: 1.8.0 i965 (screen 0)
libGL: OpenDriver: trying /usr/local/src/g965/mesa.fresh/lib/i965_dri.so
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 6, (OK)
drmOpenByBusid: Searching for BusID pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 6, (OK)
drmOpenByBusid: drmOpenMinor returns 6
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
libGL error:
Can't open configuration file /etc/drirc: No such file or directory.
Writing PNG file '../results/tcvis/tcvis01.png'...done.
Writing PNG file '../results/tcvis/tcvis01-depth.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full.png'...done.
Writing PNG file '../results/tcvis/tcvis01Full-depth.png'...done.
intelWaitIrq: drmI830IrqWait: -16
19.91user 9.75system 0:34.27elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (43major+73954minor)pagefaults 0swaps

kernel:

[ 49.112607] [drm] Initialized i915 1.11.0 20070209 on minor 0
...
[ 772.653501] [drm:i915_wait_irq] *ERROR* i915_wait_irq: EBUSY -- rec: 5505330 emitted: 5545013

Xorg.0.log excerpt:

Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0xcff80001 pgetbl_err: 0x0
ipeir: 0 iphdr: 60020100
LP ring tail: fae0 head: 7150 len: 1f001 start 0
Err ID (eir): 0 Err Status (esr): 1 Err Mask (emr): ffffffdf
instdone: 6fe5fafd instdone_1: ffff0
instpm: 0
memmode: 0 instps: 409f02e
HW Status mask (hwstam): fffecffe
IRQ enable (ier): 2 imr: fffe0000 iir: 10c0
acthd: 5ff1430 dma_fadd_p: 5ff1430
ecoskpd: 307 excc: 0
cache_mode: 6800/180
mi_arb_state: 44
IA_VERTICES_COUNT_QW 0/0
IA_PRIMITIVES_COUNT_QW 0/0
VS_INVOCATION_COUNT_QW 0/0
GS_INVOCATION_COUNT_QW 0/0
GS_PRIMITIVES_COUNT_QW 0/0
CL_INVOCATION_COUNT_QW 0/0
CL_PRIMITIVES_COUNT_QW 0/0
PS_INVOCATION_COUNT_QW 0/0
PS_DEPTH_COUNT_QW 0/0
WIZ_CTL 0
TS_CTL 0 TS_DEBUG_DATA b1618b5b
TD_CTL 0 / 0
space: 95848 wanted 131064
(II) intel(0): [drm] removed 1 reserved context for kernel
(II) intel(0): [drm] unmapping 8192 bytes of SAREA 0x2efff000 at 0x2b9056a43000

Fatal server error:
lockup

(then the above repeats)

I don't use a multiseat setup any more, so my xorg.conf looks like:

Section "ServerFlags"
    Option "DefaultServerLayout" "simple"
    Option "AllowMouseOpenFail" "true"
    Option "AIGLX" "false"
EndSection

Section "Module"
    Load "i2c"
    Load "bitmap"
    Load "ddc"
    Load "dri"
    Load "extmod"
    Load "freetype"
    Load "glx"
    Load "int10"
    Load "type1"
    Load "vbe"
EndSection

Section "ServerLayout"
    Identifier "simple"
    Screen "intel Screen"
    InputDevice "Generic Keyboard"
    InputDevice "Configured Mouse"
EndSection

Section "Device"
    Identifier "intel"
    Driver "intel"
    BusID "PCI:0:2:0"
EndSection

Section "Screen"
    Identifier "intel Screen"
    Device "intel"
    Monitor "auto" # Monitor section with no options set.
    DefaultDepth 24
    SubSection "Display"
        Depth 24
        Modes "1680x1050" ...
    EndSubSection
EndSection

(Normally I use this Module section, because if I want to use a multiseat serverlayout, I need e.g. int10 and vbe commented out, and dga crashes X.)

Section "Module"
#   Load "i2c"
    Load "bitmap"
#   Load "ddc"
    Load "dri"
    SubSection "extmod"
        Option "omit xfree86-dga"
    EndSubSection
#   Load "extmod" # subsection does this
    Load "freetype"
    Load "glx"
#   Load "int10"
    Load "type1"
#   Load "vbe"
EndSection
Created attachment 12489 [details] peter's Xorg.log, Nov 2k7
BTW, this is 100% reproducible for me, at about 20 seconds in.

Other things I forgot to mention:

Another reason I did a fresh git checkout was that this symlink looked messed up:

peter@tesla:/usr/local/src/g965/mesa$ ll src/mesa/drivers/dri/i965/server/
total 0
lrwxrwxrwx 1 peter src 27 2007-11-11 20:24 intel_dri.c -> ../intel/server/intel_dri.c

It's a broken symlink, which I'm guessing should be a symlink to ../../intel/... (This is obviously a separate bug, but it's easy to fix, so I just meant to mention it here.)

Also, Unreal Tournament 2004 causes lockups that look the same as the viewperf tcvis ones. The only difference is that the X server doesn't log anything or exit until you do a killall ut2004-bin. This was reported on an Ubuntu bug report that was originally about Ubuntu's mesa being very prone to lockups on g965...
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/104673
A few other users found that bug report and reported their problems on it. I was able to reproduce the lockups with the ut2004 demo. The demo is freely available, with Linux binaries for AMD64 and x86. Try http://treefort.icculus.org/ut2004/
bf9f483902c6006b94c327fb7b585086 UT2004-LNX-Demo3334.run

As I said on the Ubuntu bug report, the lockups are reproduced most quickly on the "bombing run" game type.

The error messages (including the Xorg.log) look quite similar to the viewperf lockups.

peter@tesla:~$ libgl=/usr/local/src/g965/mesa.fresh/lib; LD_LIBRARY_PATH="$libgl" LIBGL_DRIVERS_PATH="$libgl" LIBGL_DEBUG=verbose MESA_DEBUG=1 /usr/bin/time ut2004
WARNING: ALC_EXT_capture is subject to change!
libGL: XF86DRIGetClientDriverName: 1.8.0 i965 (screen 0)
libGL: OpenDriver: trying /usr/local/src/g965/mesa.fresh/lib/i965_dri.so
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 153, (OK)
drmOpenByBusid: Searching for BusID pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 153, (OK)
drmOpenByBusid: drmOpenMinor returns 153
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
libGL error:
Can't open configuration file /etc/drirc: No such file or directory.
intelWaitIrq: drmI830IrqWait: -16
Signal: SIGTERM [terminate]
Requesting Exit.
Signal: SIGQUIT [quit]
Aborting.
Crash information will be saved to your logfile.
Command exited with non-zero status 1
174.79user 4.44system 6:34.00elapsed 45%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (68major+117331minor)pagefaults 0swaps

(Lockups on bombing run usually happen much faster than in the "assault" game I played.)
Peter, We have fixed some critical crash bugs in our 3D driver. Could you check if the issue is still valid with the latest mesa and drm driver? Thanks
(In reply to comment #25)
> Peter,
> We have fixed some critical crash bugs in our 3D driver.
> Could you check if the issue is still valid with the latest mesa and drm
> driver?

SPECviewperf 9.0.3 runs fine now. :) Nice work, guys.

mesa updated 2k8/3/9, drm kernel and user updated 2k8/2/29. (I didn't want to reboot again to update drm.)

I tried once on a fairly fresh X server and left it alone. I ran it again and put the window behind other stuff I was doing (firefox, vnc, mplayer). Nothing I did caused a lockup that made the X server exit.

There's a bad interaction between mplayer -vo gl:yuv=2:swapinterval=1:lscale=1:cscale=0 and SPECviewperf. While viewperf was running, I tried to run mplayer, but the graphics froze after the mplayer window opened. I thought I'd managed to crash X like usual, but sshing in and killing mplayer unfroze the desktop with no ill effects. (I think -QUIT worked sometimes, but other times -KILL was needed.) mplayer -vo xv didn't cause any problems.

Unreal Tournament doesn't crash X anymore either, but it doesn't work. It segfaults while trying to load a game (after all the menus, while the level is loading).

These last two are obviously separate bugs from the dri lockups this bug report was about, so it can finally get closed. And they're _much_ less serious, since they don't need a reset of the computer to get the video hardware back to a useable state. Needing to reboot after an X lockup was always the most annoying thing.

I guess I should file bug reports for those two things I mentioned above. I'll probably do that tomorrow.
Thanks Peter, I am closing the bug. Please open separate bug reports for the issues you mentioned. BTW: ut2004 runs well here on i965 machines. My favorite method for debugging UT is to modify the ut2004 script, replacing exec "./ut2004-bin" with exec gdb "./ut2004-bin", so you can see a backtrace when ut crashes.
> Please open separate bug reports for the issues you mentioned. I was going to, but they went away when I upgraded to Ubuntu Hardy, with its newer X server and intel driver. I'm only bothering to say this because I got an email about a tag being added to this bug, so I thought other people might be looking at it and wondering what happened. Happy hacking.
Mass version move, cvs -> git