Bug 20739

Summary: [i945] X crashes in fbBlt() when using Sun Java Plugin 6 + firefox3.0 on Asus EEEPC 1000
Product: xorg
Reporter: Bryce Harrington <bryce>
Component: Driver/intel
Assignee: Jesse Barnes <jbarnes>
Status: RESOLVED INVALID
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: high
CC: yan.i.li
Version: 7.3 (2007.09)
Keywords: NEEDINFO
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                          Flags
Xorg.0.log                           none
Manage pixmaps in the driver w/EXA   none

Description Bryce Harrington 2009-03-18 16:05:53 UTC
Created attachment 24018 [details]
Xorg.0.log

Forwarding this report from Ubuntu:
https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/337608

[Problem]
Sun's Java test web page causes X to crash in fbBlt, apparently due to an invalid memcpy().

[Original Report]
Ubuntu Jaunty Alpha 5 on Asus EEEPC 1000, Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03).

I installed Sun Java (6) update 12 through the "install missing plugin" on Firefox. I restarted Firefox and tried to test the plugin by going to the http://java.com/en/download/help/testvm.xml website. The page loaded OK and the applet seemed to run fine, but when I moved the scroll bar in Firefox (the scroll bar on the LHS of Firefox), this killed X. I looked in the /var/log/ log files but did not see any relevant information there. I can update the bug with more info if someone shows me where to get it from.

Apport did not report that the Xserver was terminated abruptly.

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile 945GME Express Memory Controller Hub [8086:27ac] (rev 03)
 Subsystem: ASUSTeK Computer Inc. Device [1043:830f]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GME Express Integrated Graphics Controller [8086:27ae] (rev 03)
 Subsystem: ASUSTeK Computer Inc. Device [1043:830f]

[backtrace]
#0 memcpy () at ../sysdeps/i386/i686/memcpy.S:75
No locals.
#1 0xa2ab12e0 in ?? ()
No symbol table info available.
#2 0xb7870583 in fbBltStip (src=0xa2a9c038, srcStride=1024, srcX=5440,
    dst=0x9404258, dstStride=85, dstX=0, width=2720, height=46, alu=3,
    pm=4294967295, bpp=32) at ../../fb/fbblt.c:944
No locals.
#3 0xb7875868 in fbGetImage (pDrawable=0x9102088, x=170, y=554, w=85, h=46,
    format=2, planeMask=4294967295, d=0x9404258 "") at ../../fb/fbimage.c:332
 pm = 4294967295
 src = (FbBits *) 0xa28a3038
 srcStride = 1024
 srcBpp = 155213372
 srcXoff = 0
 srcYoff = -49
 dstStride = -1565846816
#4 0xb78557f2 in exaGetImage (pDrawable=0x9102088, x=<value optimized out>,
    y=<value optimized out>, w=<value optimized out>, h=<value optimized out>,
    format=2, planeMask=4294967295, d=0x9404258 "")
    at ../../exa/exa_accel.c:1228
 pixmaps = {{as_dst = 0, as_src = 1, pPix = 0xa28a3008,
    pReg = 0xbf9e5318}}
 Reg = {extents = {x1 = 422, y1 = 505, x2 = 507, y2 = 551}, data = 0x0}
 pPix = (PixmapPtr) 0x0
 xoff = <value optimized out>
 yoff = <value optimized out>
 ok = <value optimized out>
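
The arguments shown in frames #2 and #3 can be cross-checked against each other with a little arithmetic. The following is only a sketch: every constant is copied from the backtrace above, the 4-byte FbBits word size is an assumption based on the IA32 hardware field, and the helper names are made up for illustration.

```python
# Cross-check of the arguments in frames #2 (fbBltStip) and #3 (fbGetImage).
# All constants are copied from the backtrace; helper names are hypothetical.

FB_BITS_BYTES = 4  # ASSUMPTION: sizeof(FbBits) == 4 on this 32-bit build

# Frame #3: fbGetImage(x=170, y=554, w=85, h=46) and its locals
x, y, w, h = 170, 554, 85, 46
src_stride_words = 1024
src_yoff = -49
pixmap_src = 0xA28A3038

# Frame #2: fbBltStip(src=0xa2a9c038, srcX=5440, width=2720, ...)
blt_src = 0xA2A9C038
blt_src_x_bits = 5440
blt_width_bits = 2720


def implied_bpp(bits, pixels):
    """Bits per pixel implied by a bit-unit argument and its pixel-unit origin."""
    return bits // pixels


def row_offset(base, advanced, stride_words, word_bytes=FB_BITS_BYTES):
    """Whole rows between two pointers into the same buffer."""
    return (advanced - base) // (stride_words * word_bytes)


print(implied_bpp(blt_width_bits, w))                      # 32
print(implied_bpp(blt_src_x_bits, x))                      # 32
print(row_offset(pixmap_src, blt_src, src_stride_words))   # 505
print(y + src_yoff)                                        # 505
```

Under these assumptions the call arguments agree with each other: both bit-unit arguments imply 32 bpp, and the source pointer is advanced by 505 rows, which equals y + srcYoff and matches Reg.extents.y1 = 505. The frame #3 locals srcBpp = 155213372 and dstStride = -1565846816 do not fit that picture, so they may simply be stale or misreported by the debugger rather than real inputs to the blit.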
Comment 1 Jesse Barnes 2009-04-06 13:39:16 UTC
I can't reproduce this with xf86-video-intel git from today and a fairly recent 2.6.29 version of drm-intel; I'll try the 2.6.1 driver next.  This could have been fixed by one of the many fence-register-related fixes that went in post 2.6.1.
Comment 2 Jesse Barnes 2009-04-06 13:56:21 UTC
Can't reproduce on 2.6.29 w/2.6.1 either, so it must have been a GEM fix between Jaunty's 2.6.28 and the current code...  Several fencing-related fixes from Chris Wilson went in after 2.6.28 came out; it would be worth trying those.  But there have been a ton of changes (mostly fixes), and unless Jaunty's 2.6.28 includes GTT mapping support, the fencing fixes aren't likely to help.

Bryce, can you get Manoj to try with a newer 2D driver and/or newer kernel?  I'll try grabbing Jaunty's kernel in the meantime.
Comment 3 Jesse Barnes 2009-04-06 15:23:11 UTC
Ok I finally got the Jaunty bits and saw the crash with 2.6.28-11-generic and the 2.6.1 driver, but with the latest Jaunty 2.6.3 driver things seem stable.  Can you confirm that?
Comment 4 Jesse Barnes 2009-04-08 16:51:40 UTC
Created attachment 24678 [details] [review]
Manage pixmaps in the driver w/EXA

Here's a crazy patch to fix this bug.  The real bug is in the server somewhere (EXA pixmap migration appears to be broken, judging by the corruption shown in the text case vs. UXA and EXA with this patch).  But rather than deal with the server's EXA migration code (scary) why not just make our driver do pixmap management itself?  It should avoid migration altogether but may affect performance or have other bugs...  Please test.
Comment 5 Bryce Harrington 2009-04-08 17:26:37 UTC
Manoj, I built a package with this patch and stuck it in my ppa, if you want to use a deb for this.
(I had to modify the patch slightly to apply to Ubuntu).

Comment 6 Michel Dänzer 2009-04-09 02:07:57 UTC
(In reply to comment #4)
> Here's a crazy patch to fix this bug.  The real bug is in the server somewhere
> (EXA pixmap migration appears to be broken, judging by the corruption shown in
> the text case vs. UXA and EXA with this patch).  But rather than deal with the
> server's EXA migration code (scary) why not just make our driver do pixmap
> management itself?

The reported crash is not likely to be directly related to EXA migration, as that doesn't have any impact on the size of memory mappings.

So while I very much like this patch (I wish this approach had been taken in the first place rather than the whole UXA silliness...), I'm afraid it can only solve the problem indirectly. (Not to mention it won't help people without a kernel memory manager)
Comment 7 Jesse Barnes 2009-04-09 09:45:28 UTC
Yeah, I think the corruption I saw is due to migration code (or more specifically the dirty update stuff), but the crash must be due to either an unmap happening at the wrong time or the mapping size changing like you say.  If you have a driver that doesn't manage pixmaps it's pretty easy to see the corruption with the default migration scheme (always I think?) at the Java test page referenced in this bug.
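
To make the mapping-size hypothesis concrete, here is a rough sketch that computes the byte span the fbBltStip call from the backtrace would read and compares it with a mapped size. The blit parameters come from the backtrace; the mapped sizes (600 and 512 rows) are purely hypothetical, since the report does not say how large the mapping was at crash time, and the function names are invented for illustration.

```python
# Byte span read by the blit in frame #2, relative to the pixmap base.
# Blit parameters are from the backtrace; mapped sizes are HYPOTHETICAL.

STRIDE_BYTES = 1024 * 4  # srcStride=1024 words, ASSUMING 4-byte FbBits


def blit_read_span(first_row, rows, stride_bytes, x_bits, width_bits):
    """Return (start, end) byte offsets of the region the blit reads."""
    start = first_row * stride_bytes + x_bits // 8
    last_row_start = (first_row + rows - 1) * stride_bytes
    end = last_row_start + (x_bits + width_bits + 7) // 8
    return start, end


def fits(span, mapped_bytes):
    """True if the whole span lies inside a mapping of mapped_bytes bytes."""
    start, end = span
    return 0 <= start and end <= mapped_bytes


# Row 505 = y + srcYoff, 46 rows, srcX=5440 bits, width=2720 bits.
span = blit_read_span(505, 46, STRIDE_BYTES, 5440, 2720)
print(span)                              # (2069160, 2253820)
print(fits(span, 600 * STRIDE_BYTES))    # True  (hypothetical 600-row mapping)
print(fits(span, 512 * STRIDE_BYTES))    # False (hypothetical smaller mapping)
```

This does not show what actually happened. It only illustrates that the same blit is harmless against a sufficiently large mapping and reads past the end if the mapping is smaller (or gone) by the time memcpy() runs.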

I'd like to solve the problem properly as well, but I'll have to dig through the EXA code a lot more (I haven't looked at the migration or sys vs offscreen mapping code much at all).
Comment 8 Michel Dänzer 2009-04-15 09:59:50 UTC
(In reply to comment #7)
> Yeah, I think the corruption I saw is due to migration code (or more
> specifically the dirty update stuff), but the crash must be due to either an
> unmap happening at the wrong time or the mapping size changing like you say.  If
> you have a driver that doesn't manage pixmaps it's pretty easy to see the
> corruption with the default migration scheme (always I think?) at the Java test
> page referenced in this bug.

Does

    Option "EXAOptimizeMigration" "off"

work around the corruption? Also, I'm having a hard time reproducing reports of problems with that option enabled with xserver master; can you reproduce it with that?
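
For anyone testing this, a minimal sketch of where the option would go in xorg.conf, assuming the intel driver is running with EXA acceleration. The Identifier string is a placeholder and should match the existing Device section on the system being tested.

```
Section "Device"
    # "Intel Graphics" is a placeholder identifier, not taken from the report
    Identifier "Intel Graphics"
    Driver     "intel"
    # Option suggested above; only meaningful when EXA is the acceleration method
    Option     "EXAOptimizeMigration" "off"
EndSection
```

The X server has to be restarted for the change to take effect, and Xorg.0.log can be checked afterwards to see whether the option was picked up.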


> I'd like to solve the problem properly as well, but I'll have to dig through
> the EXA code a lot more (I haven't looked at the migration or sys vs offscreen
> mapping code much at all).

It's not clear to me at this point it's an EXA bug at all... some values in the backtrace look way off, but the trace may just be inaccurate, or even if not the question is which layer would be responsible for sanitizing them.
Comment 9 Jesse Barnes 2009-05-04 09:53:23 UTC
Haven't heard from the reporter in a while, but:
  - there's a patch available that "fixes" this for me, distros can pick it up if 
    desired
  - EXA is no longer in the driver and UXA doesn't have this bug afaict
so I'm going to mark this invalid.
