Bug 18319 - [830M EXA] render accel broken on 830
Summary: [830M EXA] render accel broken on 830
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.4 (2008.09)
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Carl Worth
QA Contact: Xorg Project Team
 
Reported: 2008-10-30 22:26 UTC by Tony Murray
Modified: 2009-11-10 10:02 UTC

Attachments
Xorg log (19.24 KB, text/plain)
2008-10-30 22:26 UTC, Tony Murray
xorg.conf (560 bytes, text/plain)
2008-10-30 22:39 UTC, Tony Murray
lspci output (1.33 KB, text/plain)
2008-11-28 05:35 UTC, Simon Berger
xorg log (38.28 KB, text/x-log)
2008-11-28 05:37 UTC, Simon Berger
xorg conf (1.92 KB, application/octet-stream)
2008-11-28 05:37 UTC, Simon Berger
2.6.1 xorg.log (19.37 KB, text/plain)
2009-02-06 15:38 UTC, Tony Murray
2.6.3 Xorg.log (43.50 KB, text/plain)
2009-03-23 11:12 UTC, Gordon Schumacher

Description Tony Murray 2008-10-30 22:26:01 UTC
Created attachment 19972 [details]
Xorg log

This is on a Gentoo x86 Dell X200 laptop running xorg-server 1.5.2, linux 2.6.27, and xf86-video-intel 2.5.0.

With previous versions there had been minor corruption on some parts of the screen temporarily and very occasionally it would lock up.

With 2.5.0 the entire screen is corrupted and looks like a very colorful static.  You can make out some things on the screen faintly.  The xserver hard locks after trying to type some.

Kernel log output:
[drm] Initialized drm 1.1.0 20060810
pci 0000:00:02.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
pci 0000:00:02.0: setting latency timer to 64
[drm] Initialized i915 1.6.0 20060119 on minor 0
pci 0000:00:02.1: setting latency timer to 64
[drm] Initialized i915 1.6.0 20060119 on minor 1
[drm:i915_getparam] *ERROR* Unknown parameter 5

I've attached the Xorg log as well.
Comment 1 Tony Murray 2008-10-30 22:39:01 UTC
Created attachment 19973 [details]
xorg.conf
Comment 2 Simon Berger 2008-11-28 05:34:20 UTC
I've got the same symptoms ('colorful static' + lockup) with a different error message using version 2.5.0 and 2.5.1 on my thinkpad x30 with xorg 1.5.3 and kernel 2.6.27.5 x86 (fedora 10). So I think it is ok to attach it to this bug (!?)

The easiest way to reproduce this bug is to boot from any fedora 10 (or opensuse 11.1) live cd, which (sadly) comes with this version of the driver.
Downgrading to driver version 2.3.2 partly resolves the problem, but then I get minor corruptions like the original author said (single characters of a specific font randomly become unreadable in my case).

The most recent setup that works completely flawlessly (no lockups or corruption) is driver 2.1.1 and xorg 1.3.0 (current fedora 8 defaults). That version has proven to be extremely stable on this system, so it is strange to see newer versions of the driver become increasingly unusable. I have not tested any other versions though. Also, the 2.1.1 version feels much smoother performance-wise (e.g. scrolling in firefox) than any newer version, but that might be subjective.
After the lockup the system is still reachable on the network.

The interesting bits from the kernel log:

>>>>>>>> Kernel log output
[drm:i915_wait_irq] *ERROR* EBUSY -- rec: 1 emitted: 4
[drm:i915_wait_irq] *ERROR* EBUSY -- rec: 1 emitted: 4
[drm:i915_wait_irq] *ERROR* EBUSY -- rec: 1 emitted: 4
[drm:i915_wait_irq] *ERROR* EBUSY -- rec: 1 emitted: 4
[drm:i915_wait_irq] *ERROR* EBUSY -- rec: 1 emitted: 4
BUG: unable to handle kernel NULL pointer dereference at 00000060
IP: [<f03d706c>] :i915:i915_driver_irq_handler+0x1d/0x17d
*pde = 1292b067 *pte = 00000000 
Oops: 0000 [#1] SMP 
Modules linked in: fuse i915 drm bridge stp bnep sco l2cap bluetooth sunrpc ipv6 dm_multipath uinput snd_intel8x0 snd_intel8x0m snd_ac97_codec ppdev ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm iTCO_wdt parport_pc ipw2200 firewire_ohci ieee80211 pcspkr ieee80211_crypt e100 firewire_core yenta_socket parport snd_timer iTCO_vendor_support rsrc_nonstatic mii nsc_ircc snd crc_itu_t irda crc_ccitt video output i2c_i801 soundcore i2c_core snd_page_alloc ata_generic pata_acpi [last unloaded: microcode]

Pid: 11077, comm: X Not tainted (2.6.27.5-117.fc10.i686 #1)
EIP: 0060:[<f03d706c>] EFLAGS: 00013082 CPU: 0
EIP is at i915_driver_irq_handler+0x1d/0x17d [i915]
EAX: 00000000 EBX: ec1a1000 ECX: 00000006 EDX: edcdd800
ESI: ec1574e0 EDI: edcdd800 EBP: ec0f9d34 ESP: ec0f9d10
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process X (pid: 11077, ti=ec0f9000 task=ede44010 task.ti=ec0f9000)
Stack: ec0f9d20 ec0f9d20 00003246 00000000 ec0f9d34 c046332b 00003282 ec1574e0 
       0000000b ec0f9d54 c0463438 edcdd800 c0803330 00003212 edcdd80c 00000001 
       edcdd800 ec0f9d68 f034b6ad ec1a1000 edcdd800 ec129f60 ec0f9d78 f03d5216 
Call Trace:
 [<c046332b>] ? synchronize_irq+0x38/0x42
 [<c0463438>] ? free_irq+0xd9/0x130
 [<f034b6ad>] ? drm_irq_uninstall+0x81/0x99 [drm]
 [<f03d5216>] ? i915_dma_cleanup+0x1b/0x84 [i915]
 [<f03d56cc>] ? i915_driver_lastclose+0x31/0x35 [i915]
 [<f0349765>] ? drm_lastclose+0x36/0x22d [drm]
 [<f0349e4e>] ? drm_release+0x32d/0x3a1 [drm]
 [<f0349ea9>] ? drm_release+0x388/0x3a1 [drm]
 [<c04911a1>] ? __fput+0xad/0x13d
 [<c0491248>] ? fput+0x17/0x19
 [<c048eaf3>] ? filp_close+0x50/0x5a
 [<c042d436>] ? put_files_struct+0x68/0xaa
 [<c042d4b0>] ? exit_files+0x38/0x3d
 [<c042eb1c>] ? do_exit+0x1e5/0x744
 [<c0434ff5>] ? __sigqueue_free+0x2d/0x30
 [<c0434ff5>] ? __sigqueue_free+0x2d/0x30
 [<c04353d6>] ? __dequeue_signal+0xf4/0x11c
 [<c042f0eb>] ? do_group_exit+0x70/0x97
 [<c043746f>] ? get_signal_to_deliver+0x2af/0x2d6
 [<c0402efc>] ? do_notify_resume+0x71/0x679
 [<c0435efb>] ? send_signal+0x1fb/0x211
 [<c0435fe9>] ? do_tkill+0xd8/0xea
 [<c06a7873>] ? unlock_kernel+0x29/0x2c
 [<c048fff4>] ? fsnotify_modify+0x4f/0x5a
 [<c045fb9a>] ? audit_syscall_entry+0xf9/0x123
 [<c045fa8c>] ? audit_syscall_exit+0xb2/0xc7
 [<c0403d61>] ? work_notifysig+0x13/0x1a
 =======================
Code: ff ff 59 5b 8d 65 f4 89 d0 5b 5e 5f 5d c3 55 89 e5 57 89 d7 56 53 83 ec 18
 8b 82 a4 02 00 00 8b 9a 80 02 00 00 8b 80 48 01 00 00 <8b> 40 60 89 45 e4 90 ff
 83 90 00 00 00 8b 43 04 05 a4 20 00 00 
EIP: [<f03d706c>] i915_driver_irq_handler+0x1d/0x17d [i915] SS:ESP 0068:ec0f9d10
---[ end trace 4af61112de2d2058 ]---
Fixing recursive fault but reboot is needed!
<<<<<<<< Kernel log output

I'll attach the Xorg log + conf and lspci output (is it normal that the display controller shows up twice in the device list?).


Comment 3 Simon Berger 2008-11-28 05:35:34 UTC
Created attachment 20658 [details]
lspci output
Comment 4 Simon Berger 2008-11-28 05:37:16 UTC
Created attachment 20659 [details]
xorg log
Comment 5 Simon Berger 2008-11-28 05:37:39 UTC
Created attachment 20660 [details]
xorg conf
Comment 6 Simon Berger 2008-12-09 02:28:09 UTC
At least my version of this bug seems to be caused by the EXA acceleration method. Adding 'Option "AccelMethod" "XAA"' to the Device section completely solves the problem.

Please make this the default option at least for this chip, because this way it is basically impossible to even boot any current installation/live cds (currently even the vesa fallback drivers do not work on my laptop). 
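
For reference, a minimal sketch of where the option from this comment would go, assuming a Device section that loads the intel driver. The Identifier string is a placeholder and is not taken from any of the attached xorg.conf files:

```
Section "Device"
    # Placeholder name; use the Identifier from your own configuration.
    Identifier  "Intel 830M"
    Driver      "intel"
    # Switch from EXA to the older XAA acceleration architecture.
    Option      "AccelMethod" "XAA"
EndSection
```

X has to be restarted for the change to take effect.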
Comment 7 Jesse Barnes 2008-12-18 14:36:58 UTC
Ah that kernel message indicates a bug that should be fixed now.  Also, does setting ExaNoComposite to true in your xorg.conf intel driver section make the corruption and lockup go away?
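
A sketch of the test requested here, assuming EXA stays selected and only composite acceleration is turned off. The Identifier is again a placeholder rather than a value from the attachments:

```
Section "Device"
    # Placeholder name; use the Identifier from your own configuration.
    Identifier  "Intel 830M"
    Driver      "intel"
    # Keep EXA, but stop it from accelerating Render composite operations.
    Option      "AccelMethod"    "EXA"
    Option      "ExaNoComposite" "true"
EndSection
```

If corruption and lockups stop with this setting but return without it, that points at the render (composite) acceleration path rather than at EXA as a whole.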
Comment 8 Simon Berger 2008-12-20 06:32:33 UTC
Okay, ExaNoComposite also seems to solve the problem with driver version 2.5.0. Font drawing (measured with 'x11perf -aa10text') is still about 4 times slower with EXA compared to XAA (regarding my comment on the felt slowdown with the new driver).
Comment 9 Tony Murray 2008-12-21 18:38:43 UTC
I think this was fixed in 2.5.1; it seems to be working for me.  Still some minor corruption as with older drivers, but at least it is fully usable.
Comment 10 Tony Murray 2008-12-27 00:17:39 UTC
OK, this issue still occurs on xf86-video-intel 2.5.1, kernel 2.6.28, and libdrm 2.4.1.

Setting ExaNoComposite does make it go away, I haven't had a chance to try UXA yet...
Comment 11 Jesse Barnes 2009-01-06 13:01:31 UTC
Ok, thanks for confirming.  Looks like our render accel code needs some work on 830.
Comment 12 Jesse Barnes 2009-01-28 13:21:48 UTC
Reassigning to our render accel guru.
Comment 13 Tony Murray 2009-02-06 15:38:33 UTC
Created attachment 22662 [details]
2.6.1 xorg.log

After commenting out my

Option "ExaNoComposite" "1"

line, X looks better after startup, but still locks up.  I saw this in my log:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Versions:
linux: 2.6.28
xf86-video-intel: 2.6.1
xorg-server: 1.5.3
mesa: 7.3

I've attached full xorg.log
Comment 14 Gordon Schumacher 2009-03-23 11:12:27 UTC
Created attachment 24169 [details]
2.6.3 Xorg.log

I am seeing very similar behaviour on a Thinkpad X30, which has an 82830 chipset/video, as well as bug #16928.  This machine is running the Kubuntu Jaunty Jackalope alpha, and until I added the XAA option as noted below, it was running with no xorg.conf file.

Versions:
Kernel		- Ubuntu 2.6.28-11-generic
xorg-intel	- 2.6.3-0ubuntu2
xorg-server	- 1.6.0-0ubuntu3
mesa		- 7.3-1ubuntu3

Setting 'Option "AccelMethod" "XAA"' does indeed appear to eliminate the crashes, as well as the corrupted bitmaps... but now, scrolled contents (such as listboxes) don't get redrawn properly.  If a box is scrolled, the few lines closest to the "incoming" contents get redrawn, and the rest of the area is unchanged.  I've also seen cases where window backgrounds aren't completely filled in so that I can see "leftovers" of other bits of the screen.

I've not managed to capture any information after a lockup so far; whatever happened didn't make it into the syslog, and I'm perhaps being incompetent with my attempts at using magic sysrq.

Let me know what else I can do to help.
Comment 15 Gordon Schumacher 2009-03-23 11:17:17 UTC
Oh, one additional bit of information that might help...

I also seem to remember that long ago when I installed SuSE 10.something on
this same machine, I had to manually force it to use the "i810" driver instead
of the "intel" driver, otherwise Something Bad Happened (but it's been long
enough that I don't recall what).

Also interestingly, if I specify "noacpi" to the kernel command line args, the frequency of crashes drops to *nearly* zero (but they still happen).
Comment 16 Gordon Schumacher 2009-03-23 12:03:56 UTC
Of course, it's done this just to spite me...

It locked up again, literally moments after I pushed "send".  I'm trying again with "NoDRI", since I've seen reports that this can be an issue; I'll let you know.
Comment 17 Gordon Schumacher 2009-03-24 12:39:14 UTC
(In reply to comment #16)
> Of course, it's done this just to spite me...
> 
> It locked up again, literally moments after I pushed "send".  I'm trying again
> with "NoDRI", since I've seen reports that this can be an issue; I'll let you
> know.

NoDRI appears to have fixed the issue, but (of course) it runs very, very slowly now.

So my question is... what additional information can I provide to try and track this down?
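
The exact xorg.conf line used for the "NoDRI" test is not given in this comment. One way it could be written, assuming the intel driver's boolean "DRI" option (the placeholder Identifier is not from the attachments):

```
Section "Device"
    # Placeholder name; use the Identifier from your own configuration.
    Identifier  "Intel 830M"
    Driver      "intel"
    # Disable direct rendering. X also accepts the negated boolean form
    # Option "NoDRI", which is assumed to be what the comment refers to.
    Option      "DRI" "false"
EndSection
```

With DRI off, 3D falls back to software rendering, which fits the slowdown reported above.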
Comment 18 Eric Anholt 2009-07-15 15:53:04 UTC
Could you try with master of the 2D driver?  This may help with 8xx render corruption issues:

commit a1e6abb5ca89d699144d10fdc4309b3b78f2f7a9
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Jul 15 14:15:10 2009 -0700

    Use batch_start_atomic to fix batchbuffer wrapping problems with 8xx render.
    
    Bug #22483.
Comment 19 Carl Worth 2009-09-17 11:29:17 UTC
I'm closing this bug due to lack of input after Eric's proposed fix.

Tony, if you're able to test a newer version of the driver, I'm hopeful
you'll find it fixes your problem. If you find it doesn't then please
feel free to reopen this bug report.

-Carl
Comment 20 Tony Murray 2009-11-09 20:13:26 UTC
I haven't had any such issues as described in this bug for a long time. It can be fully closed.  Thanks!
Comment 21 Carl Worth 2009-11-10 10:02:19 UTC
(In reply to comment #20)
> I haven't had any such issues as described in this bug for a long time. It can
> be fully closed.  Thanks!

Tony,

Thanks for the confirmation. Have fun out there!

-Carl

