Bug 17460

Summary: [G45 EXA] X crash
Product: xorg Reporter: Sven Arvidsson <sa>
Component: Driver/intel Assignee: Wang Zhenyu <zhenyu.z.wang>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: high CC: andre.bugs2, andre.bugs, azawacki, cworth, d13f00l, ranjo.jjxl, refux
Version: 7.4 (2008.09)   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 16926    
Attachments:
Description                                                      Flags
configuration                                                    none
Crash with 2.4.2                                                 none
hang with driver from git                                        none
backtrace from core dump                                         none
Backtrace after increasing IGD Graphics Mode to 128MB            none
Log of crash with intel driver version 2.4.1                     none
Backtrace of crash using EXANoComposite                          none
Xorg log of crash with EXANoComposite                            none
test patch (clocking gate)                                       none
Backtrace from test patch (19182)                                none
Xorg log from test patch (19182)                                 none
xorg output                                                      none
xorg output                                                      none
Xorg log from crash with i830-clock-gate.patch applied to 2.4.2  none

Description Sven Arvidsson 2008-09-06 11:46:04 UTC
I have a G45 based motherboard (Asus P5Q-EM) where X crashes a few seconds after I log in using gdm. It seems to be doing fine just sitting at the gdm login screen.

The system is still reachable over the network after the crash, but the screen stays black. I'm guessing this is normal, as I can't seem to switch between X and text mode before the crash either.

I'm using version 2.4.2 of the intel driver, with xserver 1.5.0 and version 7.1 of mesa.



If I use xf86-video-intel from git (b9ef0ed7d7b96eca6394cd0d367369ec511d1bcd) the behaviour is a bit different. At the point where it usually crashes, the screen stops redrawing, but the cursor is still movable. The logs are filled with:

 [mi] EQ overflowing. The server is probably stuck in an infinite loop.
 [mi] mieqEnequeue: out-of-order valuator event; dropping.

I'm attaching my xorg.conf, Xorg.0-2.4.2.log (log of the crash with 2.4.2) and Xorg.0-git.log (log of driver from git). 

I'm also attaching gdb.txt, a post mortem backtrace from a core dump.
Comment 1 Sven Arvidsson 2008-09-06 11:46:32 UTC
Created attachment 18701
configuration
Comment 2 Sven Arvidsson 2008-09-06 11:47:02 UTC
Created attachment 18702
Crash with 2.4.2
Comment 3 Sven Arvidsson 2008-09-06 11:47:28 UTC
Created attachment 18703
hang with driver from git
Comment 4 Sven Arvidsson 2008-09-06 11:47:52 UTC
Created attachment 18704
backtrace from core dump
Comment 5 Sven Arvidsson 2008-09-07 14:37:21 UTC
A little bit of success... using XAA instead of EXA does work.
Comment 6 Sven Arvidsson 2008-09-10 05:46:06 UTC
Created attachment 18805
Backtrace after increasing IGD Graphics Mode to 128MB

I wonder if this could be a BIOS problem. 

I tried increasing "IGD Graphics Mode Select" from the default 32MB to 128MB, and I get a very different backtrace. Curiously enough, it doesn't seem to involve EXA at all, but rather XKB?
Comment 7 André 2008-09-14 11:19:13 UTC
Created attachment 18863
Log of crash with intel driver version 2.4.1

I have the same motherboard (Asus P5Q-EM), and I'm seeing the same behaviour. Attached is the Xorg log from the crash I get, using version 2.4.1 of the intel driver. My crash looks very similar to the reporter's; it appears to occur in I830WaitLpRing().

Let me know if I can do something else to help debug this error.
Comment 8 André Dahlqvist 2008-09-14 11:53:41 UTC
I think this is a duplicate of bug 17567. Sven, do you agree?
Comment 9 Sven Arvidsson 2008-09-14 13:57:53 UTC
Yes, or possibly 17567 is a dupe of this, doesn't matter to me :)
Comment 10 Dylan 2008-09-14 16:57:21 UTC
Actually, I have that same motherboard too, and I get the same results from git. I'm on Gentoo; I posted that other bug. How funny :)

I also just posted another one for git...
https://bugs.freedesktop.org/show_bug.cgi?id=17576
Comment 11 Gordon Jin 2008-09-15 02:31:29 UTC
*** Bug 17567 has been marked as a duplicate of this bug. ***
Comment 12 Gordon Jin 2008-09-15 02:32:48 UTC
I'm not sure if cworth has G45, so I tend to assign all G45 bugs to zhenyu.
Comment 13 Wang Zhenyu 2008-09-15 23:54:40 UTC
First, try the "NoAccel" option to see if this is a display problem. If that works, try the "EXANoComposite" option to see if this is a render code issue.
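A minimal xorg.conf Device section for these two tests might look like the sketch below. The Identifier string is an arbitrary placeholder; enable one test at a time and restart X between runs. "AccelMethod" "EXA" is listed only to make the EXA path explicit for the second test.

   Section "Device"
       Identifier "G45 test"            # arbitrary placeholder name
       Driver     "intel"
       Option     "NoAccel"             # test 1: disable 2D acceleration entirely
       # Test 2: remove NoAccel above and uncomment these two lines instead
       # Option   "AccelMethod" "EXA"
       # Option   "EXANoComposite"      # EXA without accelerated Render compositing
   EndSection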
Comment 14 Sven Arvidsson 2008-09-16 07:48:01 UTC
Created attachment 18924
Backtrace of crash using EXANoComposite

It works fine with NoAccel. If I use EXANoComposite it crashes again.

I'm including the log file and backtrace, but the backtrace seems to be identical to the previous one.

I'm using git version 86f82c429f5d7067c52d3b783988917869e13d1d of the 2.4 branch.
Comment 15 Sven Arvidsson 2008-09-16 07:48:33 UTC
Created attachment 18925
Xorg log of crash with EXANoComposite
Comment 16 André Dahlqvist 2008-09-16 11:43:06 UTC
I too tested with NoAccel and EXANoComposite, and my results were a little different from Sven's. I use version 2.4.1 of the driver, so Sven's results are perhaps more relevant. I have also turned off desktop effects in Ubuntu, which may be another difference.

With NoAccel there is no crash, but the screen is completely white. After logging in through GDM, the mouse cursor indicates that GNOME starts and finishes loading, but the screen stays white. When I kill X with Ctrl-Alt-Backspace, I see the GNOME desktop appear for a brief moment before X shuts down.

If I use EXANoComposite, I actually see the desktop (but only the background, no menu or icons), and then the mouse cursor freezes. After that it seems like X has crashed.
Comment 17 Dylan 2008-09-17 15:57:52 UTC
I'm able to run with NoAccel; trying EXANoComposite next.
Comment 18 Dylan 2008-09-17 16:37:08 UTC
I'm able to boot X with EXANoComposite after removing libdrm_intel.so from my machine and rebuilding the latest stable intel driver. For some reason, my driver was linked against it. Regular EXA won't work, though.
Comment 19 Wang Zhenyu 2008-09-18 00:24:21 UTC
Could you grab the video BIOS version?

Try to get the VBIOS with the src/bios_reader/bios_dumper tool, or do the following (as root):

cd /sys/devices/pci0000\:00/0000\:00\:02.0/
echo 1 > rom
cat rom > /tmp/vbios

then run

strings /tmp/vbios | grep -i "build number"
Comment 20 Sven Arvidsson 2008-09-18 03:13:46 UTC
Build Number: 1666 PC 14.34  07/07/2008  02:09:41
Comment 21 André Dahlqvist 2008-09-18 12:57:23 UTC
I have the exact same version as Sven.
Comment 22 Dylan 2008-09-18 14:56:29 UTC
(In reply to comment #21)
> I have the exact same version as Sven.
> 
Same.
Comment 23 RefuX Zanzeebarr 2008-09-19 18:29:51 UTC
I am running Ubuntu Intrepid Ibex Alpha 6 and have this same issue.
HOWEVER I have a GIGABYTE GA-EG45M-DS2 motherboard, so I guess that makes me special :)

I tried the: 
   Option "NoAccel"

and that worked for me (yay!)

I then took out the NoAccel option and added:
   Option "EXANoComposite" 

Same crash as before. :(

I tried this:
cd /sys/devices/pci0000\:00/0000\:00\:02.0/
echo 1 > rom

(also with sudo) but got a permission denied error (yeah I'm a n00b, what can I say).
Comment 24 André Dahlqvist 2008-09-19 18:47:58 UTC
RefuX: I had the same problem. It will work if you do "sudo -i", enter your password and then do:

cd /sys/devices/pci0000\:00/0000\:00\:02.0/
echo 1 > rom
Comment 25 Anthony Zawacki 2008-09-19 19:39:56 UTC
I have this same issue with the DG45FC motherboard. I'm in the same situation: with no configuration, X crashes on login, and things seem to work when using NoAccel and AccelMethod XAA (though I can get it to crash by running xine with various files recorded from an HD-PVR-1212).

The video bios on my motherboard is:

Build Number: 1659 PC 14.34  06/25/2008  20:55:25

Comment 26 RefuX Zanzeebarr 2008-09-19 19:45:29 UTC
(In reply to comment #24)
> RefuX: I had the same problem. It will work if you do "sudo -i", enter your
> password and then do:
> 
> cd /sys/devices/pci0000\:00/0000\:00\:02.0/
> echo 1 > rom
> 

Thanks much :)

Build Number: 1666 PC 14.34  07/07/2008  02:09:41
Comment 27 Wang Zhenyu 2008-09-25 02:18:26 UTC
Created attachment 19182
test patch (clocking gate)

Please try whether this test patch helps.
Comment 28 Sven Arvidsson 2008-09-25 05:17:51 UTC
(In reply to comment #27)
> Pleas try if this test patch helps.

Unfortunately it still crashes at the same point. I'm attaching a log and a backtrace.
Comment 29 Sven Arvidsson 2008-09-25 05:19:08 UTC
Created attachment 19190
Backtrace from test patch (19182)
Comment 30 Sven Arvidsson 2008-09-25 05:20:04 UTC
Created attachment 19191
Xorg log from test patch (19182)
Comment 31 Dylan 2008-09-25 14:35:36 UTC
Created attachment 19208
xorg output

[drm] Initialized i915 1.6.0 20080730 on minor 0
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12
[drm:i915_gem_idle] *ERROR* hardware wedged

2.6.27-rc4-01377-gf2b39cd
Gem Enabled Kernel

On 2.6.26.5 I get a total hard lock; I can't even ping, so no logs :\
Comment 32 Dylan 2008-09-25 14:35:49 UTC
Created attachment 19209
xorg output

[drm] Initialized i915 1.6.0 20080730 on minor 0
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12
[drm:i915_gem_idle] *ERROR* hardware wedged

2.6.27-rc4-01377-gf2b39cd
Gem Enabled Kernel

On 2.6.26.5 I get a total hard lock; I can't even ping, so no logs :\
This is with that patch.
Comment 33 Dylan 2008-09-25 14:49:30 UTC
OK: 2.6.25.5, Xorg 1.5, dunno what versions the rest of the Xorg libs are at, and I'M IN X WITH EXA WORKING!!!!!! This is a first!!!

With a git X build, I get this in dmesg:
[drm:i915_initialize] *ERROR* can not ioremap virtual address for ring buffer

and a failure to initialize DMA in the Xorg log.

Gem enabled kernel also does not work.

Comment 34 Wang Zhenyu 2008-09-25 18:14:37 UTC
Dylan, with the 2.6.26 kernel, does your success have anything to do with my test patch? Is it a must-have?
Comment 35 Dylan 2008-09-25 18:53:16 UTC
No success with 2.6.26, but with 2.6.25 I'm able to get into X with EXA enabled...
I tried with intel drivers from GIT + your patches, also with 2.4.2.  

I dropped my memory speed to 800 MHz in the BIOS, and suddenly X is working fine, even with the latest stable Intel driver on 2.6.25.

If I have 1 GB of RAM in my system, EXA is super fast, visibly as fast as XAA if not faster.

If I put in 6 GB, EXA is super slow, and this also pops up in my dmesg:

mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
set status page addr 0x08000000

With just 1 GB of RAM in the system, I don't get that message.


I can run memtest tonight at 1066 MHz... the memory is rated for that speed, and I haven't had problems with other PCs...

Looks like there are a lot of variables here, and issues crossing several bug reports...
Comment 36 Dylan 2008-09-25 19:19:48 UTC
Yeah, I fail memtest at 1066 MHz. 800 MHz works fine and passes.

http://www.memtest.org/

Anyone experiencing this might want to try running it. You can burn the bootable ISO; some distros let you install it, and some, like Ubuntu, ship it preinstalled.

How stupid is this, so much time wasted.   Boo at hardware that won't run at the spec stated.
Comment 37 Dylan 2008-09-25 20:16:11 UTC
Sorry, I meant to say no luck with the gem patched kernel 2.6.27.
2.6.26 is fine. I'm incredibly dazed and confused from juggling all these versions and playing with BIOS settings ;)

I'm running 2 GB now at 800 MHz, memtest passes, EXA is working and incredibly fast, and there is no MTRR message with just 2 GB of RAM. I filed a separate bug.
Comment 38 Sven Arvidsson 2008-09-26 04:38:38 UTC
(In reply to comment #36)
> http://www.memtest.org/
> 
> Maybe anyone who is experiencing this, try running this.  You can burn the
> bootable ISO, some distros you can install it, or it comes preinstalled, like
> ubuntu.

Thanks for the tip. I ran memtest for about an hour with no errors. I have 2x2GB of RAM in a dual channel config, running at 800MHz.

As for kernel versions, I get the crash (with and without the patch) in 2.6.26.4. With 2.6.25.11 I can't start X at all; it complains about a missing /dev/agpgart.
Comment 39 Dylan 2008-09-26 05:27:51 UTC
I just checked your Xorg log. Any time you see an "existing errors" message appear, you need to reboot; your hardware is in a bad state. Try testing after rebooting and repost the log.

You also need the intel_agp module in the newer kernel; that's probably why /dev/agpgart is missing.
Comment 40 RefuX Zanzeebarr 2008-09-27 15:40:58 UTC
With my GIGABYTE GA-EG45M-DS2 motherboard I have 800 MHz RAM; I ran memtest and it's just fine.

I plan to run the patch once I figure out how to apply it :)
Comment 41 Hong Yuan 2008-09-27 17:20:28 UTC
On my mobo, a GIGABYTE GA-EG43M-S2H (800 MHz DDR2), I saw exactly the same problem and had to use NoAccel to get Ubuntu working.

I have installed the latest Ubuntu Intrepid Alpha 6 with kernel 2.6.27, and I compiled Intel driver 2.4.2 against xorg 1.5.1.

My BIOS build number is:

Build Number: 1666 PC 14.34  07/07/2008  02:09:41

So the bug is not confined to the G45; it also affects the G43 chipset. I hope this can be fixed soon.

Comment 42 Hong Yuan 2008-09-27 17:54:13 UTC
By the way, no luck with the patch. Same error, though I am using the kernel version 2.6.27.
Comment 43 Wang Zhenyu 2008-09-27 19:03:29 UTC
As I said on another bug, I have an identical Gigabyte EG43M-S2H board here, but I can't reproduce this problem. I'm using all master tips (kernel, xserver, xf86-video-intel), which work fine for me. I have 2G of memory. So please try current master; all my G45 patches have been pushed.
Comment 44 Sven Arvidsson 2008-09-28 06:08:10 UTC
Created attachment 19271
Xorg log from crash with i830-clock-gate.patch applied to 2.4.2

(In reply to comment #39)
> I just checked your xorg log, any time you see an, "existing errors" message
> appear, you need to reboot.  Your hardware is in a bad state.  Try testing
> after rebooting and repost the log.

Good catch, thanks! 

I actually do reboot after each test, but I forgot that gdm restarts X after each failed start, so I have been posting the wrong log file. I guess I'll stick to using startx from now on :)

I have attached a new log file from the crash.
Comment 45 Sven Arvidsson 2008-09-30 05:16:15 UTC
I have finally been able to use X with EXA. The problem was that I had the compositing manager in Metacity enabled; with it disabled, EXA works.

I'm using the driver from git, 11d304e99c0e11c28901ec28115d9c8b81a2b9cc, with xserver 1.5.1 and kernel 2.6.26.

If I have the compositing manager enabled, X doesn't crash, but the screen stops updating after a little while. If I log in remotely, I can see that all the applications set to launch on login are okay and running. 

If I try launching a new app, it seems to hang somewhere in libxcb, but I guess this is normal if the X server doesn't respond?

--

Once I had EXA working, I made an attempt to play video using textured video, but it seems to result in the same "hang" of the X server as enabling the compositing manager.

I will go on testing and see if I can get a backtrace when the X server stops responding.

--

I don't know if this is related but I get this in the logs after starting X (doesn't matter if I use EXA or XAA):

[ 1778.876115] [drm] Initialized i915 1.6.0 20060119 on minor 0
[ 1778.876115] [drm:i915_getparam] *ERROR* Unknown parameter 5
[ 1778.946646] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[ 1779.364315] set status page addr 0x02f50000


I also noticed this, but once again, not sure if it has anything to do with the crash:

[   15.164688] i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[   15.164750] ACPI: I/O resource 0000:00:1f.3 [0x400-0x41f] conflicts with ACPI region SMRG [0x400-0x40f]
[   15.164811] ACPI: Device needs an ACPI driver
Comment 46 Wang Zhenyu 2008-09-30 05:23:59 UTC
[ 1778.876115] [drm:i915_getparam] *ERROR* Unknown parameter 5

This is harmless: it's just a new parameter used by kernels with GEM support. Without GEM support, the driver will fall back to fake mode.

So it looks like we can close this bug; other issues need new bug reports.
Thanks all for testing!
Comment 47 Sven Arvidsson 2008-09-30 13:07:42 UTC
(In reply to comment #46)
> So it looks we can close this bug, and for other issues we need new bug tracks.
I have filed bug 17851 for the problems I ran into using a compositing manager.

> Thanks all for testing!
Thank you so much for fixing the bug!
