Bug 21790

Summary: xf86-video-intel: pixmap corruption in the font glyph cache
Product: xorg Reporter: Vytautas <vytautas1987>
Component: Driver/intel Assignee: Carl Worth <cworth>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: high CC: borych, bryce, bugzilla, byron, eric, hcamp, hub, kjb, maxi, mefoster, me, rasasi78, remi, vytautas1987
Version: 7.2 (2007.02)   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description | Flags
Screenshot showing corruption in Mozilla Firefox | none
Severe font corruption. | none
same bug or other here? | none
Example of font corruption | none

Description Vytautas 2009-05-18 05:15:36 UTC
Screenshots and logs and Xorg.conf files and more here:
http://bugs.gentoo.org/show_bug.cgi?id=270031
(maybe there is no need to reupload them here?)

Distribution: Gentoo x86 2.6.28-r5 
Almost everything is Gentoo stable, except the Intel driver, which is the latest 2.7.1, and libdrm-2.4.9 (?). It may be a libdrm bug, because I also reproduced it once with the Gentoo stable Intel driver, although some time ago this bug did not occur.

It may be related to OOo Calc 3.1, because that is the ONLY application in which I can trigger the bug initially, but after some time it appears everywhere.

Feel free to ask for more info; this is my first bug here.
Comment 1 Rémi Cardona 2009-05-18 06:02:09 UTC
Just to clarify the bug report a little, this bug is not specific to OOo. I had it in gitk and firefox. As for pointing at the glyph cache, it's because in all the reports, it seems that text pixmaps are impacted first.

But on my own laptop, I've sometimes seen corruption of small pixmaps such as thumbnails in firefox.

In any case, the corruption seems to happen when the system memory is under heavy load.

FWIW, here's a fedora bug report that looks identical : https://bugzilla.redhat.com/show_bug.cgi?id=495323

Thanks
Comment 2 Hubert Figuiere 2009-05-18 09:58:09 UTC
As I was mentioning on the RedHat bug report, I was hit by this bug faster when I only had 768MB.
Comment 3 Jesus Rodriguez 2009-05-18 23:11:52 UTC
Same thing here. Ubuntu Jaunty. Didn't happen with Ubuntu stock drivers+kernel, but started happening on some apps (mostly, but not only, with fonts) after upgrading kernel to 2.6.29-02062903-generic and drivers to 2.7.1-0ubuntu1~xup~1.

Affected apps include Firefox, OOo, Lotus Notes 8.5 and gnome-terminal.

Section "Device"
        Identifier      "Configured Video Device"
        Option          "AccelMethod"                   "uxa"
        Option          "EXAOptimizeMigration"          "true"
        Option          "MigrationHeuristic"            "greedy"
        Option          "Tiling"                        "false"
EndSection
Comment 4 Jesus Rodriguez 2009-05-18 23:24:11 UTC
(In reply to comment #3)
Edit: When I experienced the issue, the original xorg.conf had Tiling=true. I've changed it to false to see whether that is a valid workaround. It hasn't happened (yet) with Tiling=false, but it may happen anyway. It takes some time.
Comment 5 Jesus Rodriguez 2009-05-19 00:14:01 UTC
(In reply to comment #4)
Edit2: The bug is reproducible with Tiling=false, too
Comment 6 Jesse Barnes 2009-05-19 18:19:55 UTC
Can anyone reproduce the problem after disabling swapping (doing swapoff on their swap partitions/files)?
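
For anyone trying this, a quick way to confirm that no swap area is left active after swapoff is to look at /proc/swaps. A minimal Python sketch, assuming the usual Linux layout of that file (one header line, then one line per active swap area); the function name is made up for illustration:

```python
def active_swap_areas(swaps_text):
    """Return the names of the swap areas listed in /proc/swaps-style text."""
    lines = swaps_text.strip().splitlines()
    # The first line is the column header; every later line is one active area.
    return [line.split()[0] for line in lines[1:] if line.strip()]
```

Pass it the contents of the file, for example active_swap_areas(open("/proc/swaps").read()). An empty list means swap is fully off.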
Comment 7 Vytautas 2009-05-20 07:04:57 UTC
After swapoff -a I still reproduced the white-stripes version of the bug instantly with the horizontal scrollbar. It may be even easier to reproduce now.
Comment 8 Klaasjan Brand 2009-05-21 03:39:10 UTC
Vytautas: this bug is about font glyph rendering errors and not about scrollbars. I suppose you're looking for an answer to a different bug.

I've turned off swap and have not seen the font problem for about a day now (most of the time, I noticed some odd glyphs within a few hours). It will take a few days before I can be really sure, but it's looking good right now.
Of course, I'd like the option to swap back ;)

Jesse: I'm very curious about the relation between the glyph cache and whether or not swap is enabled.
Comment 9 Dark Shadow 2009-05-21 05:30:51 UTC
Created attachment 26061
Screenshot showing corruption in Mozilla Firefox

Hi, I guess I have the same problem (VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)). It occurs in Firefox and Emacs-23 after some time. Nothing in dmesg, apart from this everything works fine.

Using current git versions of drm, mesa, xf86-video-intel and linux-2.6.29 (patched with tuxonice).

I will check if it happens without swap too.
Comment 10 Vytautas 2009-05-22 05:42:29 UTC
Without swap I no longer get those crazy letters and numbers.
Looks like a good workaround.

BUT I still have white stripes and colorful stripes. Should I file a separate bug?
Check my images.
Comment 11 Raúl 2009-05-23 02:19:22 UTC
Created attachment 26139
Severe font corruption.

Hello all:

This is a screenshot of what I found after having my laptop unattended all night. This is a severe case, but I usually had minor issues on certain glyphs, similar to the other screenshot in the bug.

GM965GM, intel driver 2.7.99.1, linux 2.6.29.3 + TuxOnIce, no KMS, libdrm 2.4.11, mesa 7.4.1

If you need xorg conf or log, please let me know.

I have had this since 2.7.0, already using UXA. When I upgraded to 2.7.99.1 things improved a little, but the problem is still there. I noticed then that it seemed to be related to memory management, and I went to the IRC channel with that suspicion, but I did not get much information there. Under high memory usage the problem got worse, and memory rotation, i.e. reusing an application that had been idle for a while, affected the font rendering.

After reading this bug I ran swapoff -a and things did improve. I rarely see any of this corruption, but I can still notice some glitches; for instance, the '[]' chars in this form are rendered as noise rather than as brackets.

I'm also very curious how swapping affects font rendering, so I'd appreciate some note about it.

HTH,
Comment 12 Vytas 2009-05-24 09:56:21 UTC
If I disable swap, I can't reproduce this issue, but the system grinds to a halt instead. The X server (VIRT) memory usage climbs slowly but steadily to something like 700M, and then (since I have 1G RAM) either the system becomes unresponsive (without swap), or some memory is swapped to disk and the glyphs begin to deform.
I understand that the virtual memory of the process may include some mmap-ed stuff etc., but growing to 700M+ still seems weird. Are there any (video?) memory leaks in the pixmap management of the new intel drivers?
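
To put numbers on the growth rather than eyeballing top, the virtual and resident size of the X server can be read from /proc/<pid>/status. A small Python sketch, assuming the standard VmSize and VmRSS fields reported in kB; the helper name is made up here, and the pid of the X server has to be looked up separately:

```python
def vm_sizes_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status-style text."""
    sizes = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  716800 kB"; keep only the number.
            sizes[key] = int(rest.split()[0])
    return sizes
```

Sampling this every few minutes and logging the result would show whether VmSize really climbs steadily or only jumps under load.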
Comment 13 Dark Shadow 2009-05-24 14:21:49 UTC
As Vytas posted in comment #12, I also notice an improvement when deactivating swap, but the system becomes slower and slower to respond, and I can see heavy disk activity, especially when compiling things. Keyboard input and responses are delayed by about half a minute (getting worse over time).
Comment 14 Hubert Figuiere 2009-05-24 15:24:58 UTC
As I said on the RedHat bug report, it happened faster when I only had 768MB than 1.5GB, still with the same amount of swap on the same hardware.

And since I disabled KMS at boot, it no longer happens.
Comment 15 Vytautas 2009-05-25 01:40:36 UTC
I reproduced the bug at full effect without swap, under heavy load, when compiling things and working with OOo at the same time.
Comment 16 Vytautas 2009-05-26 01:07:49 UTC
Created attachment 26213
same bug or other here?

I just selected many cells many times, and here are 100% reproducible colorful stripes (blue ones).
Comment 17 Eric Anholt 2009-05-27 13:45:57 UTC
Vytautas: Does the following patch queued up to for-linus in the kernel help you?

commit 07f4f3e8a24138ca2f3650723d670df25687cd05
Author: Kristian Høgsberg <krh@redhat.com>
Date:   Wed May 27 14:37:28 2009 -0400

    i915: Set object to gtt domain when faulting it back in
    
    When a GEM object is evicted from the GTT we set it to the CPU domain,
    as it might get swapped in and out or ever mmapped regularly.  If the
    object is mmapped through the GTT it can still get evicted in this way
    by other objects requiring GTT space.  When the GTT mapping is touched
    again we fault it back into the GTT, but fail to set it back to the
    GTT domain.  This means we fail to flush any cached CPU writes to the
    pages backing the object which will then happen "eventually", typically
    after we write to the page through the uncached GTT mapping.
    
    [anholt: Note that userland does do a set_domain(GTT, GTT) when starting
    to access the GTT mapping.  That covers getting the existing mapping of the
    object synchronized if it's bound to the GTT.  But set_domain(GTT, GTT)
    doesn't do anything if the object is currently unbound.  This fix covers the
    transition to being bound for GTT mapping.]
    
    Fixes glyph and other pixmap corruption during swapping.  fd.o bug #21790
    
    Signed-off-by: Kristian Høgsberg <krh@redhat.com>
    Signed-off-by: Eric Anholt <eric@anholt.net>

(Swapping isn't the only case that this patch can fix, but it's the most common one, as the CPU cache of the object will be hot with writes at exactly the time we don't want it to be.)
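
To make the ordering problem in the commit message concrete, here is a toy Python model. It is only a sketch: the class, its method names and the dict-based "pages" are all made up for illustration and are not i915 code. It assumes the stale data comes from cached CPU writes that are still pending when the object is faulted back into the GTT.

```python
class ToyGemObject:
    """Hypothetical stand-in for a GEM object; not real driver code."""

    def __init__(self):
        self.pages = {}      # backing pages as seen through the uncached GTT mapping
        self.cpu_cache = {}  # dirty CPU cache lines not yet written back
        self.domain = "GTT"
        self.bound = True

    def evict(self):
        # Eviction unbinds the object and puts it in the CPU domain.
        self.bound = False
        self.domain = "CPU"

    def cpu_write(self, offset, value):
        # A cached CPU write sits in the cache until it is flushed.
        self.cpu_cache[offset] = value

    def flush_cpu_cache(self):
        # Write-back: pending cached data overwrites whatever is in the pages.
        self.pages.update(self.cpu_cache)
        self.cpu_cache.clear()

    def fault_into_gtt(self, fixed):
        self.bound = True
        if fixed:
            # Patched behaviour in this model: flush, then move to the GTT domain.
            self.flush_cpu_cache()
            self.domain = "GTT"
        # Unpatched: the object is bound again but stays in the CPU domain.

    def gtt_write(self, offset, value):
        # Writes through the GTT mapping bypass the CPU cache.
        self.pages[offset] = value


def run_scenario(fixed):
    obj = ToyGemObject()
    obj.evict()
    obj.cpu_write(0, "stale")   # e.g. pages touched while being swapped back in
    obj.fault_into_gtt(fixed)
    obj.gtt_write(0, "glyph")   # a glyph is drawn through the GTT mapping
    obj.flush_cpu_cache()       # the "eventual" write-back of the CPU cache
    return obj.pages[0], obj.domain


print(run_scenario(fixed=False))
print(run_scenario(fixed=True))
```

In this model the unpatched run ends with the stale value in the page, because the late cache write-back lands on top of the glyph, while the patched run flushes first and keeps the glyph.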
Comment 18 Vytautas 2009-05-28 06:44:44 UTC
Sorry, I do not know how to test it. If you give detailed instructions I will test it in about a week's time. I do know how to compile a kernel, though.
Comment 19 Raúl 2009-05-28 07:22:08 UTC
Vytautas:

You'd need to clone latest linus tree[0] once the commit is applied, build the kernel and try.

Or alternatively try the drm-intel[1] kernel branch where I see it applied.

Tree should be [0] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=summary

[1] http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=summary
Comment 20 Vytautas 2009-05-28 13:20:17 UTC
Can I use Gentoo git-sources? (http://gentoo-portage.com/sys-kernel/git-sources). 
Can you post the rc number here once the patch has been applied?
Comment 21 Rémi Cardona 2009-05-28 13:46:25 UTC
(In reply to comment #20)
> Can I use Gentoo git-sources?
> (http://gentoo-portage.com/sys-kernel/git-sources). 

Not yet. But you can just "git clone" Eric's repo from /usr/src to try it out and then remove it when you're done. You can even use "kernel-config" to make it the default kernel source directory.

(In reply to comment #17)
> Vytautas: Does the following patch queued up to for-linus in the kernel help
> you?

Eric, this patch works for me, I've tried thrashing my laptop's memory and I couldn't reproduce the bug. Looks really good.

Thanks for solving this
Comment 22 Dark Shadow 2009-05-29 03:51:55 UTC
The patch solved it for me too. Thanks!
Comment 23 Dark Shadow 2009-05-29 07:48:28 UTC
While the above patch indeed fixed the fonts problem,
my system also seems to suffer from the problem described
in bug #20766. Just in case anyone else has similar
issues...
Comment 24 Robert Huitl 2009-05-30 06:41:48 UTC
Eric, your patch seems to fix this problem for me as well. Thanks a lot!

Dark Shadow, I also had the memory leak problem with 2.6.29. I got the impression that it's much better with 2.6.30-rc7. The number of objects (/proc/dri/0/gem_objects) is still high, but the "object bytes" aren't as high.
Comment 25 Raúl 2009-05-30 07:47:09 UTC
I managed to apply the patch on 2.6.29.4, and it solves the problem there too. I hope it doesn't have any side effects.

Thanks.
Comment 26 Eric Piel 2009-06-08 01:23:01 UTC
Created attachment 26526
Example of font corruption

Strangely, I'm still seeing this bug, although I'm using kernel 2.6.30-rc8 (which contains commit 07f4f3e8a24138ca2f3650723d670df25687cd05). Similarly, doing a "swapoff -a" fixes the problem.

It's with the intel driver 2.7.1, and a chipset "965GM", using KMS. Is there something else that I should update to fix the bug?
Comment 27 Carl Worth 2009-06-08 15:26:59 UTC
*** Bug 22111 has been marked as a duplicate of this bug. ***
Comment 28 Carl Worth 2009-06-12 09:22:35 UTC
*** Bug 22118 has been marked as a duplicate of this bug. ***
Comment 29 Byron Clark 2009-06-21 10:18:38 UTC
I'm still seeing this bug with linux 2.6.30 and intel driver 2.7.1.  It does seem harder to trigger, but it still happens.
Comment 30 Byron Clark 2009-06-21 10:21:35 UTC
(In reply to comment #29)
> I'm still seeing this bug with linux 2.6.30 and intel driver 2.7.1.  It does
> seem harder to trigger, but it still happens.
> 

I'm only seeing the corruption in firefox, but it appears that focusing a different window and then returning the focus to firefox corrects the corrupted glyphs.
Comment 31 Rémi Cardona 2009-06-21 23:32:11 UTC
(In reply to comment #30)
> I'm only seeing the corruption in firefox, but it appears that focusing a
> different window and then returning the focus to firefox corrects the corrupted
> glyphs.

Looks like a different bug, please file a new one so your issue gets looked at.

Thanks
