Bug 12865 - [965] intermittent display corruption with framebuffer compression
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: high normal
Assignee: Wang Zhenyu
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks: 16467
Reported: 2007-10-19 17:02 UTC by Jesse Barnes
Modified: 2008-07-07 00:58 UTC

Attachments
X log diff between working and bad case (4.50 KB, text/plain)
2007-11-14 00:27 UTC, Wang Zhenyu
Work around clock gating problems (840 bytes, text/x-diff)
2007-11-14 08:32 UTC, Jesse Barnes
Updated clock gating workarounds (1.38 KB, text/x-diff)
2007-11-14 11:49 UTC, Jesse Barnes
first run, screen corrupted (69.95 KB, text/plain)
2007-11-14 12:35 UTC, Tomas Carnecky
second run, now with fbc disabled, screen ok (69.24 KB, text/plain)
2007-11-14 12:36 UTC, Tomas Carnecky
third run, fbc enabled again, screen ok (69.86 KB, text/plain)
2007-11-14 12:36 UTC, Tomas Carnecky
Early clock gating init (1.79 KB, text/x-diff)
2007-11-14 12:57 UTC, Jesse Barnes
early clock gating log (69.84 KB, text/plain)
2007-11-14 13:20 UTC, Tomas Carnecky
FBC register debug patch (2.88 KB, text/x-diff)
2007-11-14 13:40 UTC, Jesse Barnes
Disable FBC on 965GM (474 bytes, text/x-diff)
2007-11-14 16:07 UTC, Jesse Barnes
(2.3-branch) gen4 state align patch (1.65 KB, patch)
2008-05-19 20:31 UTC, Wang Zhenyu
(2.3-branch) more alignment change (2.71 KB, patch)
2008-05-20 01:41 UTC, Wang Zhenyu
My xorg.conf (2.89 KB, text/plain)
2008-05-24 11:06 UTC, Lukas Hejtmanek
Fake screenshot (164.48 KB, image/png)
2008-05-24 11:15 UTC, Lukas Hejtmanek

Description Jesse Barnes 2007-10-19 17:02:27 UTC
Every now and then on my 965GM machine (a T61) I see a weird rendering problem that seems composite hook related.

The failure mode is one of two things:
  - all the stuff that's drawn using the render extension is missing from the screen (i.e. no fonts, no icons)
  - all the stuff that's drawn using render is just shown as colored rectangles (same color for each rectangle, different sizes for each font glyph)

I usually see this when I have framebuffer compression enabled, but I'm also seeing it sometimes with the DRM-based suspend/resume patch, even when FBC is disabled.

It may be due to the fence registers not being set up correctly or some bug in the composite hook itself... any ideas?

Thanks,
Jesse
Comment 1 Wang Zhenyu 2007-10-24 01:16:06 UTC
With an old drm (drmMinor < 7), I do see render corruption with fbc enabled, and disabling fbc gives correct rendering. With the current drm (drmMinor == 11), I can't see the render failure with or without fbc. I don't know what's going on.
Comment 2 Jesse Barnes 2007-10-24 10:38:28 UTC
Maybe e04333a6352040bc883655d606923c912d005981 fixed the problem?  It seems like it's probably a tiled rendering or composite hook problem somehow...

Since I sometimes see dark rectangles instead of the rendered text, maybe it's a caching problem somewhere.

Jesse
Comment 3 Jesse Barnes 2007-11-09 16:51:42 UTC
I was able to reproduce this, but only briefly.  Looks like lots of changes have gone into the DRM since driver minor 7, which was 3620a3ec85033d3d1d1a44ec32492fb2ef20fd8a.  Once I have it failing again, I can try a bisect on the DRM tree to see if one of the commits fixed the issue somehow.
Comment 4 Tomas Carnecky 2007-11-12 16:12:41 UTC
I can reliably (100%) reproduce it, if you want me to test patches, I'm all yours :)

I will look into how to bisect dri (if you think there's the problem).. my drm is compiled into the kernel and I don't want to mess up my system too much (well, a little bit is ok ;) ). If you have suggestions, please share..
Comment 5 Jesse Barnes 2007-11-12 16:18:16 UTC
Thanks Tomas.

It's probably easiest to add another kernel to your system for testing that supports DRM kernel modules instead.  Then you can bisect the Mesa DRM tree (git://anongit.freedesktop.org/git/mesa/drm) to find out where things get better for you (if at all).

Is that something you could do?  I'd really like to fix this one, but it's hard since I can't seem to reproduce it reliably.

Thanks,
Jesse
Comment 6 Tomas Carnecky 2007-11-12 16:25:03 UTC
(In reply to comment #5)
> It's probably easiest to add another kernel to your system for testing that
> supports DRM kernel modules instead.  Then you can bisect the Mesa DRM tree

Just one quick question, do I have to compile CONFIG_DRM into the kernel or is that functionality provided by mesa? (of course the actual i915 module will be provided by mesa..)

> (git://anongit.freedesktop.org/git/mesa/drm) to find out where things get
> better for you (if at all).
> 
> Is that something you could do?  I'd really like to fix this one, but it's hard
> since I can't seem to reproduce it reliably.

Absolutely, I have plenty of time in school :)



Comment 7 Jesse Barnes 2007-11-12 16:30:14 UTC
I usually build with a kernel configuration similar to my distribution's defaults, then remove the included DRM drivers (drm.ko and i915.ko) so that modutils will load the ones from the Mesa tree.

So I think you'll want CONFIG_DRM=m (though that may not be strictly necessary), then just remove the installed drm.ko once you're done.

Thanks,
Jesse
Comment 8 Tomas Carnecky 2007-11-12 16:38:29 UTC
drm has a whole lot of tags, where should I start bisecting?
Comment 9 Jesse Barnes 2007-11-12 17:06:49 UTC
Hopefully you can find a "bad" commit at or before 3620a3ec85033d3d1d1a44ec32492fb2ef20fd8a, and the latest commit should be "good" (I hope).  From there the bisect should go quickly.

Jesse
Comment 10 Tomas Carnecky 2007-11-12 18:54:43 UTC
I can't test earlier than ab75d50d6ca72615259e4fa857effeb6192c28a9 because of undefined symbols. But since that commit the screen is corrupted. :-/
Comment 11 Jesse Barnes 2007-11-12 19:27:00 UTC
Can you try changing DRIVER_MINOR to 5 (or something less than 7 I think) in i915_drv.h too?  That will exercise different code paths in the X driver, so if that makes it work again I'll know it's an X driver problem and not a DRM problem...
Comment 12 Tomas Carnecky 2007-11-13 00:38:59 UTC
(In reply to comment #11)
> Can you try changing DRIVER_MINOR to 5 (or something less than 7 I think) in
> i915_drv.h too?  That will exercise different code paths in the X driver, so if
> that makes it work again I'll know it's an X driver problem and not a DRM
> problem...
> 


ab75d50d6ca72615259e4fa857effeb6192c28a9 (the earliest version that compiles for me) + DRIVER_MINOR=5 makes it work again.
Comment 13 Tomas Carnecky 2007-11-13 05:48:47 UTC
(In reply to comment #12)
> ab75d50d6ca72615259e4fa857effeb6192c28a9 (the earliest version that compiles
> for me) + DRIVER_MINOR=5 makes it work again.

I'm sorry, it doesn't. I have to restart the computer after I find a combination that doesn't corrupt the screen, because once it works (=by disabling fb compression or starting X without any drm modules loaded, or due to any other reason) it keeps on working even if I load any drm modules or enable fb compression again. So I have to keep track of what exactly I did since the last reboot and check whether that may influence the results. To be extra sure I shut down the computer after every test (compile/load drm modules, start X, see if it works, bisect). Since the X driver v2.1.0 worked, shouldn't I rather bisect that?
Comment 14 Tomas Carnecky 2007-11-13 05:53:39 UTC
Btw, compiling a version earlier than March 12, 2007 (ab75d50d6ca72615259e4fa857effeb6192c28a9) doesn't work due to SA_SHIRQ undefined. Are there any workarounds that I could use to test versions earlier than that date?
Comment 15 Tomas Carnecky 2007-11-13 06:31:24 UTC
Before I started bisecting I rebooted the computer, no drm modules were loaded during startup (because the drm modules were not installed so the kernel couldn't even find those if it wanted). I did nothing but log into the console and execute 'startx'. I had no corruption. But when I tried later (after I was done bisecting) I couldn't reproduce it.
I think it was because the first time I did a warm restart (sudo reboot) and the graphics chip somehow remembered just like it remembers when I disable fbc, start X, kill X, enable fbc, start X and then I don't have any corruption. But when I was bisecting I always did a cold restart (sudo init 0) - to make absolutely sure that no setup information is remembered across the restart.

Sorry for the noise, but I'm very confused right now. I can reproduce it reliably, but the circumstances under which it works are not very clear. I know that disabling fbc fixes it, but I had the impression that something else made it work, too. But I can't seem to find out what.
Comment 16 Jesse Barnes 2007-11-13 08:32:43 UTC
Yeah, this bug is frustrating, isn't it?  I *think* that if you do a cold reboot and use a recent version of the DRM modules (with a minor > 7) it should work from the start with fbc enabled.  But I'm not sure, it may break there too.
Comment 17 Wang Zhenyu 2007-11-13 19:48:21 UTC
From my investigation, the only case that triggers this bug is when fbc is really enabled on a tiled front buffer. The other 'working' cases are all ones that didn't really enable fbc (like linear front buffer alloc, etc.). You can check the X log for the "fbc enable on plane X" message to see whether fbc is really enabled or not.
Comment 18 Wang Zhenyu 2007-11-13 23:07:06 UTC
Could you try current git?
Comment 19 Wang Zhenyu 2007-11-13 23:32:49 UTC
oh, sorry I didn't mean current git fixed this one.
Comment 20 Wang Zhenyu 2007-11-14 00:27:22 UTC
It seems this problem cannot be reproduced on every boot, but it does happen sometimes. Attached is the X log diff, with ModeDebug, between the working and bad cases.
It seems that RENCLK_GATE_D1 is related here: according to the spec, bit 16 is the fbcunit clock disable function. I don't understand what it does, but it looks relevant here.
Comment 21 Wang Zhenyu 2007-11-14 00:27:54 UTC
Created attachment 12540
X log diff between working and bad case
Comment 22 Jesse Barnes 2007-11-14 08:13:55 UTC
Ah, good catch, Zhenyu!  The other differences are just with VGA registers, so they should be harmless.  I'll put together a patch to adjust the clock gating for FBC and see if that works for people.

Thanks,
Jesse
Comment 23 Jesse Barnes 2007-11-14 08:32:51 UTC
Created attachment 12546
Work around clock gating problems

This patch disables various clock gating features (unifying those that look like they need to be disabled across all the 965 chipsets).  Hopefully it'll fix the weird FBC problems people have been seeing.  Please test.
Comment 24 Tomas Carnecky 2007-11-14 11:29:00 UTC
(In reply to comment #23)
> Created an attachment (id=12546) [details]
> Work around clock gating problems

This patch did not fix the problem on my computer.

Comment 25 Jesse Barnes 2007-11-14 11:49:51 UTC
Created attachment 12552
Updated clock gating workarounds

This patch also adds some debug messages.  Tomas, can you see if it works for you?  If not, can you attach both working and non-working Xorg.0.log files?

Thanks,
Jesse
Comment 26 Tomas Carnecky 2007-11-14 12:35:20 UTC
Created attachment 12553
first run, screen corrupted
Comment 27 Tomas Carnecky 2007-11-14 12:36:03 UTC
Created attachment 12554
second run, now with fbc disabled, screen ok
Comment 28 Tomas Carnecky 2007-11-14 12:36:54 UTC
Created attachment 12555
third run, fbc enabled again, screen ok
Comment 29 Tomas Carnecky 2007-11-14 12:40:24 UTC
$ diff -u xorg-intel-bad.log xorg-intel-good.log | grep tilin
    -> no difference

$ diff -u xorg-intel-bad.log xorg-intel-fbc-disabled.log | grep tiling
-(II) intel(0): i830_set_tiling(): 0x00700000, 5632, 7700 kByte
-(II) intel(0): i830_set_tiling(): 0x02514000, 5632, 7744 kByte
-(II) intel(0): i830_set_tiling(): 0x02ca4000, 5632, 7744 kByte
+(II) intel(0): i830_set_tiling(): 0x00100000, 5632, 7700 kByte
+(II) intel(0): i830_set_tiling(): 0x01f14000, 5632, 7744 kByte
+(II) intel(0): i830_set_tiling(): 0x026a4000, 5632, 7744 kByte

$ diff -u xorg-intel-bad.log xorg-intel-good.log | grep RENCLK
-(II) intel(0):       RENCLK_GATE_D1: 0x00000000
+(II) intel(0):       RENCLK_GATE_D1: 0x70000000
-(WW) intel(0): Register 0x6204 (RENCLK_GATE_D1) changed from 0x00000000 to 0x70000000
Comment 30 Jesse Barnes 2007-11-14 12:57:18 UTC
Hm, maybe it's not a clock gating issue, since the state at EnterVT looks correct:
  RENCLK_GATE_D1: 0x70810000
unless the clock gating regs aren't being set early enough...  If so, the attached patch might help, but it sounds like there's something else going on.
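For readers following the register values, here is a small Python sketch that checks bit 16 of RENCLK_GATE_D1 in the three values seen in this thread. It assumes, per comment 20, that bit 16 is the fbcunit clock disable bit; that reading has not been checked against the spec here, and the helper name is hypothetical.

```python
# Bit position taken from comment 20 ("bit 16 is fbcunit clock disable");
# treat it as an assumption rather than a verified spec reference.
FBC_UNIT_BIT = 16


def fbc_unit_bit_set(renclk_gate_d1):
    """Return True if bit 16 is set in a RENCLK_GATE_D1 value."""
    return bool(renclk_gate_d1 & (1 << FBC_UNIT_BIT))


# Values that appear in the logs quoted in this bug.
for value in (0x00000000, 0x70000000, 0x70810000):
    print(f"0x{value:08x}: bit {FBC_UNIT_BIT} set = {fbc_unit_bit_set(value)}")
```

Of the three values, only 0x70810000 (the state at EnterVT) has bit 16 set, which is consistent with the clock gating workaround having been applied by that point.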
Comment 31 Jesse Barnes 2007-11-14 12:57:46 UTC
Created attachment 12556
Early clock gating init
Comment 32 Tomas Carnecky 2007-11-14 13:20:02 UTC
Created attachment 12557
early clock gating log

You were right, it didn't help, but now 
$ diff -u xorg-intel-early-clock-gating.log xorg-intel-good.log | grep RENCLK
-(II) intel(0):       RENCLK_GATE_D1: 0x00000000
+(II) intel(0):       RENCLK_GATE_D1: 0x70000000
-(WW) intel(0): Register 0x6204 (RENCLK_GATE_D1) changed from 0x00000000 to 0x70810000
-(WW) intel(0): Register 0x6204 (RENCLK_GATE_D1) changed from 0x00000000 to 0x70000000

btw, 'Register 0x6204 (RENCLK_GATE_D1) changed from ...' doesn't appear in -good.log at all, but the register changes throughout the log. And in -bad.log the message appears only once even though the register changes from 0x0 to something else twice. And in this log the message appears at every change from 0x0 to something else (it changes twice). But it never appears between a change from !0x0 to 0x0.
Comment 33 Jesse Barnes 2007-11-14 13:40:45 UTC
Created attachment 12558
FBC register debug patch

This one adds FBC registers to the dumps.  I should have done this a long time ago, though I doubt it will give us any clues (I'd love to be proven wrong though!).
Comment 34 Tomas Carnecky 2007-11-14 14:08:08 UTC
(In reply to comment #33)
> Created an attachment (id=12558) [details]
> FBC register debug patch
> 

no difference in FBC regs between -bad and -good. Of course -fbc-disabled sets some to zero.

Now the diff between -bad and -good is (besides the RENCLK_GATE_D1 changes) AR10 and ARX:

-(II) intel(0):                  ARX: 0x20
+(II) intel(0):                  ARX: 0x30

-(II) intel(0):                 AR10: 0x0c
+(II) intel(0):                 AR10: 0x20

-(II) intel(0):                 AR10: 0x00
+(II) intel(0):                 AR10: 0x20

-(II) intel(0):                 AR10: 0x00
+(II) intel(0):                 AR10: 0x20

are those relevant? almost the same changes are between -bad and -fbc-disabled, but between -fbc-disabled and -good only ARX changes.
Comment 35 Tomas Carnecky 2007-11-14 14:19:37 UTC
This is how the regs change:

                   bad -> fbc-disabled -> good

(II) intel(0):     ARX: 0x20 -> 0x30 -> 0x30

(II) intel(0):    AR10: 0x0c -> 0x00 -> 0x20

(II) intel(0):    AR10: 0x00 -> 0x20 -> 0x20

(II) intel(0):    AR10: 0x00 -> 0x20 -> 0x20
Comment 36 Jesse Barnes 2007-11-14 14:47:16 UTC
I'm pretty sure the AR* regs aren't relevant to this problem.  I think you'd
see the same progression of values after several server starts regardless of
whether FBC is working or not.  And they should only have an effect in VGA mode anyway... so if they're the problem I'd be really surprised.
Comment 37 Jesse Barnes 2007-11-14 16:07:30 UTC
Created attachment 12560
Disable FBC on 965GM

I'd like to release 2.2 soon, but I don't want people seeing this bug.  Tomas, can you confirm that this patch disables FBC by default?  (You can still enable it for testing by using an explicit "FramebufferCompression" "true" in your xorg.conf.)

Thanks,
Jesse
Comment 38 Tomas Carnecky 2007-11-14 16:14:29 UTC
(In reply to comment #36)
> I'm pretty sure the AR* regs aren't relevant to this problem.  I think you'd
> see the same progression of values after several server starts regardless of
> whether FBC is working or not.  And they should only have an effect in VGA mode
> anyway... so if they're the problem I'd be really surprised.

You are right, these changes have probably nothing to do with the bug. I was able to reproduce the same diff by starting X with the same configuration three times in a row and then comparing the logs.

Even RENCLK_GATE_D1 changes in the same way, so there's no difference visible in the logs between a working and broken configuration :(
Comment 39 Tomas Carnecky 2007-11-14 16:22:26 UTC
(In reply to comment #37)
> Created an attachment (id=12560) [details]
> Disable FBC on 965GM
> 

Yes, this disables FBC on my 965GM chipset.

Comment 40 Jesse Barnes 2007-11-14 16:23:30 UTC
Great, thanks a ton for your testing help, Tomas.  I think I'll talk to our chipset guys to see if they can give me any clues about why this might not be working for some people...
Comment 41 Jesse Barnes 2007-12-11 17:55:22 UTC
Tomas, assuming you can still reproduce this, can you try running with framebuffer compression enabled but with the "ExaNoComposite" option set to "true"?  Also, if you could post your steps for reproducing this that might help too...

Thanks,
Jesse
Comment 42 Tomas Carnecky 2007-12-12 23:42:08 UTC
With these two options (both in the "Device" section)
Option      "FramebufferCompression" "true"
Option      "ExaNoComposite" "true"
no corruption is visible.

There is nothing unusual needed to reproduce it. I just start X with FBC enabled and the corruption is visible - I run a partial gnome desktop so I see the corruption in gtk apps such as gnome-terminal, gedit, firefox or metacity (the gnome window manager).
Comment 43 Jesse Barnes 2007-12-13 10:53:03 UTC
Ok, good data point.  So the corruption is definitely related to the EXA composite hook.

As for reproducing, I thought you had to do a careful poweroff, reboot dance to see the corruption reliably?
Comment 44 Tomas Carnecky 2007-12-14 01:20:00 UTC
(In reply to comment #43)
> As for reproducing, I thought you had do a careful poweroff, reboot dance to
> see the corruption reliably?

I completely forgot, sorry!

I just need to start X once with a combination of options that do _not_ show the corruption, and it starts working for any combination of options.
One explanation would be that if FBC/EXA are enabled, the driver doesn't initialize some one-time chip registers, static variables etc. If I start without these options these regs/variables are initialized and it works fine after that with any options.
To break it again, I need to do a restart (will check cold/warm restarts and whether that makes a difference). That indicates that it might be a missing hardware initialization in the FBC/EXA path.
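The "sticky state" behaviour described above means that many test results in this thread are not conclusive. A minimal Python sketch of that reasoning follows; the `Run` record and the rule it encodes are assumptions drawn from the comments (a working start hides the bug until the next cold boot), not anything the driver exposes.

```python
from dataclasses import dataclass


@dataclass
class Run:
    cold_boot_before: bool  # machine was fully powered off before this X start
    fbc_enabled: bool
    corrupted: bool


def conclusive_runs(runs):
    """Yield only the runs whose result can be trusted.

    Hypothesis from this thread: once any X start since the last cold boot
    shows no corruption, later starts keep working regardless of options,
    so their results say nothing about the configuration under test.
    """
    tainted = False
    for run in runs:
        if run.cold_boot_before:
            tainted = False
        if not tainted:
            yield run
        if not run.corrupted:
            tainted = True


# The three runs attached in comments 26 to 28, assuming the first one
# followed a cold boot (the thread does not state this explicitly).
runs = [
    Run(cold_boot_before=True, fbc_enabled=True, corrupted=True),
    Run(cold_boot_before=False, fbc_enabled=False, corrupted=False),
    Run(cold_boot_before=False, fbc_enabled=True, corrupted=False),
]
for run in conclusive_runs(runs):
    print(run)
```

Under this model the third run ("fbc enabled again, screen ok") is discarded, which matches the later observation that a good result only counts when it follows a full power-off.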
Comment 45 Michael Fu 2008-01-09 17:07:26 UTC
reassign to zhenyu for EXA fix.
Comment 46 Wang Zhenyu 2008-01-09 17:24:48 UTC
(In reply to comment #44)
> 
> I just need to start X once with a combination of options that do _not_ show
> the corruption, and it starts working for any combination of options.
> One explanation would be that if FBC/EXA are enabled, the driver doesn't
> initialize some one-time chip registers, static variables etc. If I start
> without these options these regs/variables are initialized and if works fine
> after that with any options.
> To break it again, I need to do a restart (will check cold/warm restarts and
> whether that makes a difference). That indicates that it might be a missing
> hardware initialization in the FBC/EXA path.
> 

yeah, this is exactly what I also saw: any options after a working one always work. So there might be some initialization that should be done for FBC/EXA, as it seems that when it is not set up, exa rendering results can't be updated through the FBC path. Not sure if there are any cache problems, going back to the spec now...

Comment 47 Wang Zhenyu 2008-01-09 22:20:43 UTC
From the spec, I just became aware that fbc discards the alpha value of each pixel.

The current finding is that when I disable the texture formats PICT_a8r8g8b8 and PICT_a8b8g8r8, fbc/exa seems fine on 965GM. But X render should provide pre-multiplied pixel values, no? I have no idea what is really going on in this case...

Comment 48 Wang Zhenyu 2008-01-09 22:56:32 UTC
In 16bpp mode, all seems ok. So the breakage is in 32bpp mode.
Comment 49 Wang Zhenyu 2008-01-14 18:32:01 UTC
Removed the misleading "32bpp failure" wording from the topic, as it still seems to fail and can be reproduced every time I enable fbc...
Comment 50 Tomas Carnecky 2008-01-18 01:40:28 UTC
Today I saw http://lists.freedesktop.org/archives/xorg/2008-January/031967.html and it reminded me of this bug. I saw that the bug he is referring to was resolved as fixed with a patch committed to the master branch. The patch description (Subject: [PATCH] more i830 3d state initialization) sounded like something that may be related to this bug, or even fix it! So I updated to the latest git version, checked once again (git-log) that the patch was indeed in the master branch, compiled and installed the driver. And to my surprise no more corruption with EXA and framebuffer compression :) To verify that EXA and FBC are enabled I checked the xorg log:

(==) intel(0): Depth 24, (==) framebuffer bpp 32
(==) intel(0): RGB weight 888
(==) intel(0): Default visual is TrueColor
(**) intel(0): Option "AccelMethod" "EXA"
(**) intel(0): Option "ModeDebug" "true"
(**) intel(0): Option "FramebufferCompression" "true"
(II) intel(0): Integrated Graphics Chipset: Intel(R) 965GM
(--) intel(0): Chipset: "965GM"

[...]

(==) intel(0): VideoRam: 262144 KB
(**) intel(0): Framebuffer compression enabled
(**) intel(0): Tiling enabled
(II) intel(0): Attempting memory allocation with tiled buffers.
(II) intel(0): Success.

[...]

(II) EXA(0): Offscreen pixmap area of 23654400 bytes
(II) EXA(0): Driver registered support for the following operations:
(II)         Solid
(II)         Copy
(II)         Composite (RENDER acceleration)

[...]

(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0x0001ffff: ring buffer (128 kB)
(II) intel(0): 0x00020000-0x0061ffff: compressed frame buffer (6144 kB, 0x000000007d820000 physical)
(II) intel(0): 0x00620000-0x00620fff: compressed ll buffer (4 kB, 0x000000007de20000 physical)
(II) intel(0): 0x00621000-0x0062afff: HW cursors (40 kB)
(II) intel(0): 0x0062b000-0x00632fff: logical 3D context (32 kB)
(II) intel(0): 0x00633000-0x00642fff: exa G965 state buffer (64 kB)
(II) intel(0): 0x00700000-0x00e84fff: front buffer (7700 kB) X tiled
(II) intel(0): 0x0077f000:            end of stolen memory
(II) intel(0): 0x00e85000-0x02513fff: exa offscreen (23100 kB)
(II) intel(0): 0x02514000-0x02ca3fff: back buffer (7744 kB) X tiled
(II) intel(0): 0x02ca4000-0x03433fff: depth buffer (7744 kB) Y tiled
(II) intel(0): 0x03434000-0x05433fff: classic textures (32768 kB)
(II) intel(0): 0x10000000:            end of aperture

[...]

For me this bug is solved :) If you want the full log, just give me a shout.

thanks
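The log excerpt above can be checked mechanically for the two conditions comment 17 names: FBC requested and a tiled front buffer. This is a rough Python sketch, assuming the "Fixed memory allocation layout" lines keep the format shown in this comment; the function names are hypothetical, and the result is only a heuristic, since the driver may still decline to enable FBC for other reasons.

```python
import re

# Matches layout lines such as
#   (II) intel(0): 0x00700000-0x00e84fff: front buffer (7700 kB) X tiled
LAYOUT_LINE = re.compile(
    r"^\(II\) intel\(0\): (0x[0-9a-f]+)-(0x[0-9a-f]+): (.+?) \((\d+) kB"
)


def parse_layout(log_text):
    """Return one dict per memory region found in the log text."""
    regions = []
    for line in log_text.splitlines():
        match = LAYOUT_LINE.match(line)
        if match:
            start, end = int(match.group(1), 16), int(match.group(2), 16)
            regions.append({
                "name": match.group(3),
                "start": start,
                "size_kb": (end - start + 1) // 1024,  # computed from the range
                "reported_kb": int(match.group(4)),    # as printed by the driver
                "tiled": line.rstrip().endswith("tiled"),
            })
    return regions


def fbc_really_active(log_text):
    """Heuristic from comment 17: FBC option on and front buffer tiled."""
    option_on = "Framebuffer compression enabled" in log_text
    front_tiled = any(
        region["name"] == "front buffer" and region["tiled"]
        for region in parse_layout(log_text)
    )
    return option_on and front_tiled


# Lines copied from the log quoted in this comment.
SAMPLE_LOG = """(**) intel(0): Framebuffer compression enabled
(II) intel(0): 0x00000000-0x0001ffff: ring buffer (128 kB)
(II) intel(0): 0x00020000-0x0061ffff: compressed frame buffer (6144 kB, 0x000000007d820000 physical)
(II) intel(0): 0x00700000-0x00e84fff: front buffer (7700 kB) X tiled
(II) intel(0): 0x0077f000:            end of stolen memory
(II) intel(0): 0x00e85000-0x02513fff: exa offscreen (23100 kB)
"""

for region in parse_layout(SAMPLE_LOG):
    print(region["name"], hex(region["start"]), region["size_kb"], "kB",
          "tiled" if region["tiled"] else "linear")
print("FBC likely active:", fbc_really_active(SAMPLE_LOG))
```

Computing the size from the address range and comparing it with the size the driver prints is a cheap sanity check that the regular expression picked up the right fields.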
Comment 51 Tomas Carnecky 2008-01-18 03:04:05 UTC
I'm terribly, terribly sorry. I fell into my own trap. The corruption is still there. But because I had started X without FBC before, it worked. Just now I rebooted the laptop and saw the corruption in X. So it's still not fixed. I'm sorry for the noise, shame on me :(
Comment 52 Erik Andren 2008-01-19 09:10:50 UTC
I can confirm this bug and also that the ExaNoComposite resolves the problem.
This is with a current xserver and intel driver
Comment 53 Michael Fu 2008-01-21 23:34:18 UTC
(In reply to comment #52)
> I can confirm this bug and also that the ExaNoComposite resolves the problem.
> This is with a current xserver and intel driver
> 

Erik, would you please confirm turning FBC off can resolve your problem too?
Comment 54 Erik Andren 2008-01-21 23:39:34 UTC
(In reply to comment #53)
> (In reply to comment #52)
> > I can confirm this bug and also that the ExaNoComposite resolves the problem.
> > This is with a current xserver and intel driver
> > 
> 
> Erik, would you please confirm turning FBC off can resolve your problem too?
> 

Yes, I only experience this problem when enabling FBC
Comment 55 Jesse Barnes 2008-01-22 08:36:59 UTC
Michael, I still think this is an EXA related bug, rather than an FBC bug, so Zhenyu is probably the best one to handle it.
Comment 56 Erik Andren 2008-02-21 03:32:24 UTC
This bug is resolved for me with git current.
Comment 57 Tomas Carnecky 2008-02-21 04:12:31 UTC
(In reply to comment #56)
> This bug is resolved for me with git current.
> 

Not for me, with xf86-video-intel/master 28049540d8a9f79401fcfeb90784f5a528e7b34f I still see the bug.
Comment 58 Erik Andren 2008-02-21 05:05:58 UTC
(In reply to comment #56)
> This bug is resolved for me with git current.
> 

Seems like I spoke too soon.
The bug is still present.
Apparently, if I only reboot the computer, some graphics state is retained.
I need to halt the system for the corruption to appear.
Comment 59 Jesse Barnes 2008-02-22 16:00:21 UTC
Erik or Tomas, can you get register dumps again with the latest tree, both before suspend and after resume (both from the console)?  There's a bit in the CACHE_MODE_0 register that may explain this behavior, I'm curious to see if it changed.
Comment 60 Michael Fu 2008-02-29 15:33:00 UTC
Erik or Tomas, any response?
Comment 61 Michael Fu 2008-03-12 00:23:37 UTC
(In reply to comment #60)
> Erik or Thomas, any response?
> 

ping for response, Erik and Tomas.
Comment 62 Tomas Carnecky 2008-03-12 02:07:53 UTC
(In reply to comment #59)
> Erik or Thomas, can you get register dumps again with the latest tree, both
> before suspend and after resume (both from the console)?  There's a bit in the
> CACHE_MODE_0 register that may explain this behavior, I'm curious to see if it
> changed.

Sorry for the late response. Two questions. How do I get the register dumps? By enabling "ModeDebug" in xorg.conf? And second question, why suspend/resume? Does it need to be that or is dumping the registers in 'working' and 'screen corrupted' states enough?
Comment 63 Jesse Barnes 2008-03-13 09:34:51 UTC
Sorry, you're right suspend/resume isn't the issue here; I'd like to see what CACHE_MODE_0 is set to on your machines in both the working and non-working states...
Comment 64 Tomas Carnecky 2008-03-13 15:22:10 UTC
No matter what I do (suspend/resume/restart the laptop, enabling/disabling FBC, screen corrupted or not) the cache register always stays 0x6800.
Comment 65 Jesse Barnes 2008-03-18 18:28:26 UTC
Ok, thanks for checking, no luck there.
Comment 66 Wang Zhenyu 2008-05-19 20:30:35 UTC
Per Jesse's request I tested the current master branch with fbc on 965GM, and it seems this bug is gone. I also tried the 2.3-branch, but there the render corruption appears again. After tracking the changes between them, I found there might be a gen4 state alignment issue in the original i965 render code that lives on the 2.3-branch. So please retest with the current driver.

To retest, enable the "FramebufferCompression" option; a cold boot may be needed.

You may first try the current master branch. For testing the 2.3-branch, please apply the following patch to the 2.3-branch or the 2.3.1 release source.

Comment 67 Wang Zhenyu 2008-05-19 20:31:35 UTC
Created attachment 16637
(2.3-branch) gen4 state align patch
Comment 68 Tomas Carnecky 2008-05-20 01:14:34 UTC
Bug does not show up with current master branch. However both the 2.3 branch as well as the 2.3 branch with your patch applied show corruption. So the bug was not in the alignment, but somewhere else. The bug is fixed for me :) If you want me to do git-bisect to find the cause of it, I'll do that.
Comment 69 Wang Zhenyu 2008-05-20 01:40:20 UTC
Yeah, a bisect may be helpful. It is really weird: the last alignment patch ran fine for me, but failed again after a power cycle. How about the following patch, which makes more alignment changes for 2.3?
Comment 70 Wang Zhenyu 2008-05-20 01:41:21 UTC
Created attachment 16644
(2.3-branch) more alignment change
Comment 71 Tomas Carnecky 2008-05-20 01:54:03 UTC
This new patch fixed the bug in the 2.3 branch.

Is the 64 bit alignment some kind of undocumented hardware requirement that nobody knows about? And how did you come up with the idea that it causes the corruption? Just curious :)

Now that the bug is fixed, you could revert this commit, at least in master:

commit 53e3693ef13f31f3fc33bcff7286ab2b03b2d430
Author: Jesse Barnes <jbarnes@hobbes.virtuousgeek.org>
Date:   Wed Nov 14 16:24:56 2007 -0800

    Disable FBC by default on 965GM
    
    Several people have reported that they see frequent FBC related display
    corruption on 965GM, so disable it for now.  Users wanting to enable it can use
    the driver option "Framebuffercompression" to override the default.

Comment 72 Wang Zhenyu 2008-05-20 02:09:39 UTC
Good to know! As Jesse reminded me that this issue might be gone, I compared master with the 2.3-branch: master has Eric/Carl's work on reducing the i965 state buffer setup time. It seems to involve no functional change, only changes to the layout and alignment of some state memory, which led me to try this.

However I still don't know why it's fixed, and I'd like to do more tests to show there are no new issues. Thanks for testing! ;)
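To make "state layout and alignment" concrete, here is a generic Python sketch of packing state blocks into a buffer at aligned offsets. It is not taken from the attached patches, which are not reproduced in this thread; the 64-byte alignment and the block sizes are illustrative assumptions only, and the thread does not establish whether the "64" mentioned in comment 71 refers to bits or bytes.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


def pack_states(sizes, alignment=64):
    """Assign each state block an aligned offset; return the list of offsets."""
    offsets = []
    cursor = 0
    for size in sizes:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets


# Hypothetical block sizes in bytes, chosen only to show the padding.
print(pack_states([20, 100, 8]))
```

Changing the alignment argument changes where every later block lands, which is one way a change with no functional intent can still alter what the hardware reads.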
 
Comment 73 Jesse Barnes 2008-05-21 11:53:37 UTC
Ok, I pushed the revert for both 965GM and future chips.
Comment 74 Erik Andren 2008-05-22 02:33:50 UTC
Hi, 
using the alignment patch helped a great deal for me using a gm965 chipset and Ubuntu Hardy Heron 64 bit.
Everything works except that the GDM login screen doesn't display any characters when written into. Once logged in, everything works as expected except when running my webcamera using any program. Every once in a while a frame sticks. I then need to move the window to get it going again.
Comment 75 Lukas Hejtmanek 2008-05-24 11:06:57 UTC
Created attachment 16720
My xorg.conf

I still get some image corruption using the latest git head of the Intel driver and the Xserver. Once the corruption occurs, it gets worse and worse. I can see it only with fonts. Non-antialiased fonts are more prone to corruption. However, as I'm typing this message in firefox, the bottom line of the current text line is not displayed at all (all the previous lines are OK, and if I change focus to another window, all the text is OK again). Because of this focus behavior, I'm unable to obtain a screenshot of the corruption.
Comment 76 Lukas Hejtmanek 2008-05-24 11:15:28 UTC
Created attachment 16721
Fake screenshot

As I stated, I'm unable to provide a real screenshot. So I took an ordinary screenshot and used gimp to reproduce what the corruption typically looks like. The comment box is corrupted at the last text line.
Comment 77 Michael Fu 2008-06-19 00:54:00 UTC
(In reply to comment #75)
> Created an attachment (id=16720) [details]
> My xorg.conf
> 
> I still get some image corruption using the latest git head of the Intel driver
> and the Xserver. Once the corruption occurs, it gets worse and worse. I can see
> it only with fonts. Non-antialiased fonts are more prone to corruption.
> However, as I'm typing this message in firefox, the bottom line of actual text
> line is not displayed at all (all the previous lines are OK and if I change
> focus to another windows, all the text is OK again.). Because of the notice
> about focus, I'm unable to obtain screenshot of the corruption. 
> 

Lukas, to verify that you are running into the same bug, you need to confirm that enabling FBC causes this issue and that disabling FBC stops triggering the corruption; otherwise you may be running into a different issue and need to open a new bug.

From comment #74, it sounds to me that this bug has been fixed...
Comment 78 Michael Fu 2008-07-07 00:58:52 UTC
As it's gone in the master branch, and zhenyu has a workaround in the 2.3 branch anyway, I'm closing this bug. Please open a new bug if it appears again. Thanks.

