Bug 17401

Summary: [GM965] Ghosting and display problems (randomly)
Product: xorg
Reporter: Logan Lewis <lfreedesktop>
Component: Driver/intel
Assignee: Jesse Barnes <jbarnes>
Status: RESOLVED NOTABUG
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
Keywords: NEEDINFO
Version: 7.3 (2007.09)
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description / Flags
Xorg log 1 / none
Xorg log 2 / none
Camera photo of ghosting problem / none
Xorg log 3: ghosting after resume from suspend / none

Description Logan Lewis 2008-09-01 22:04:59 UTC
Every few days on my week-old Thinkpad T61 with integrated Intel graphics (GM965/x3100) my built-in LCD goes a little strange in a way that's a bit difficult to describe.  The titlebar (usually a fairly solid color in my KDE theme) gets vertical stripes, and the LCD experiences ghost images: when I move a window away, I see a remnant of that image remain.  It slowly fades.  I tried to take a screenshot, but the screenshot appears normal.  Attached is a poor picture from a camera, but it clearly shows the ghosting after I moved the window down a few inches.

This problem does not occur with the external DVI-attached LCD.  The problem persists until I cycle the monitor either with xrandr --output LVDS --off or by closing the lid.  This problem has occurred with both XAA and EXA rendering.  Unfortunately, I have no reliable way of reproducing this issue on demand; it seems to occur randomly.
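
The workaround of cycling the internal panel could be scripted. This is only a minimal sketch, assuming the panel is named LVDS as in the command above (the actual name is listed by `xrandr -q`) and that `--auto` is an acceptable way to switch it back on. The function names and the pause length are illustrative and do not come from the report.

```python
import subprocess
import time


def lvds_cycle_commands(output="LVDS"):
    """Return the two xrandr invocations that power-cycle one output."""
    return [
        # Same command as quoted in the report.
        ["xrandr", "--output", output, "--off"],
        # Assumption: re-enable the output at its preferred mode.
        ["xrandr", "--output", output, "--auto"],
    ]


def cycle_output(output="LVDS", delay_s=2.0):
    """Turn the output off, wait briefly, then turn it back on."""
    off_cmd, on_cmd = lvds_cycle_commands(output)
    subprocess.run(off_cmd, check=True)
    time.sleep(delay_s)  # hypothetical pause; the report gives no timing
    subprocess.run(on_cmd, check=True)
```

Calling `cycle_output()` from a running X session would perform the off/on cycle; it only clears the symptom and does not say anything about the cause.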

System info: Thinkpad T61 14.1" w/ 4 GB of RAM (this has been associated with other problems given /proc/mtrr issues)
Distribution: (K)ubuntu 8.04 x86_64
Intel driver versions: both 2.2.1-1ubuntu12 and self-compiled 2.4.2
lspci: 
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)

I will attach X log files from two of the occurrences.  One log has at the end:

(II) intel(0): [drm] removed 1 reserved context for kernel
(II) intel(0): [drm] unmapping 8192 bytes of SAREA 0x1efff000 at 0x7fda996b8000
(II) intel(0): [drm] Closed DRM master.

the other has: 
(EE) intel(0): First SDVOB output reported failure to sync
(EE) intel(0): underrun on pipe B!

I have one other saved which looks much like the second.
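
To compare logs from several occurrences, the error lines can be pulled out mechanically. The following is a rough sketch, assuming each log line starts with a severity marker such as `(EE)` or `(II)` as in the excerpts above; the helper name `error_lines` is made up for illustration, and the sample text is just the lines quoted in this report.

```python
def error_lines(log_text):
    """Return the text following the (EE) marker for every error line."""
    errors = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("(EE)"):
            # Drop the marker and keep "driver(screen): message".
            errors.append(line[len("(EE)"):].strip())
    return errors


# Sample built only from lines quoted in the report.
SAMPLE = """\
(II) intel(0): [drm] Closed DRM master.
(EE) intel(0): First SDVOB output reported failure to sync
(EE) intel(0): underrun on pipe B!
"""

for message in error_lines(SAMPLE):
    print(message)
```

Running the same function over a log from a session without ghosting would show whether these errors are specific to the problem or appear every time.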

I don't know with certainty whether this is a software or a hardware problem (especially after I took a screenshot this last time).  If anyone else has experienced this I would appreciate a comment.
Comment 1 Logan Lewis 2008-09-01 22:06:34 UTC
Created attachment 18624
Xorg log 1
Comment 2 Logan Lewis 2008-09-01 22:06:59 UTC
Created attachment 18625
Xorg log 2
Comment 3 Logan Lewis 2008-09-01 22:10:13 UTC
Created attachment 18626
Camera photo of ghosting problem
Comment 4 Jesse Barnes 2008-09-22 17:52:52 UTC
Ooh weird.  My first thought is that this is a hardware problem, maybe your LVDS ribbon cable is loose or something.  But if you can't reproduce it with the vesa driver after a while, then it could be an Intel driver problem.  Can you try that out?
Comment 5 Logan Lewis 2008-09-22 18:03:56 UTC
Jesse, thanks for responding.

I've spent the last few weeks trying to narrow down the problem.  One of the first things I did was physically adjust the laptop to try to determine if it was a loose connection.  I'm reasonably certain that's not the cause.

I decided to put back in the 1 GB chip the laptop shipped with.  I went two weeks without reproducing the problem.  I've since tried putting in 1 of the 2 GB chips and will let that run for two weeks; before, the problem happened about once every other day.  At this point I think the problem is either bad RAM (possible, but I let memtest run overnight) or some problem with 4 GB of RAM.  The latter seems likely, as it's triggered bugs with /proc/mtrr being messed up.  It would also help explain why the problem isn't common: I was running an x86_64 distro with 4 GB of RAM (maxing out the laptop), and I'm not sure this is a common configuration.

After I test each chip individually (another 3.5 weeks or so), I'll put both chips in for a few days to see if I trigger the problem again. 
Comment 6 Logan Lewis 2008-09-22 18:05:20 UTC
One more thing - having looked at "good" xorg logs now, I'm reasonably certain that nothing particularly special or distinctive shows up in them when the problem does occur.  
Comment 7 Jesse Barnes 2008-09-22 18:10:17 UTC
Ah hm, yeah RAM could be the issue I guess, though it seems like you'd catch that with memtest pretty quickly.  Testing with the vesa driver would still be a good idea though if you can't narrow it down further.
Comment 8 Jesse Barnes 2008-09-22 18:14:10 UTC
Oh I missed the second part of your message... I suppose a 4G configuration could also be an issue, if the BIOS didn't map the memory correctly.  Either way I'll await your test results.

Thanks,
Jesse
Comment 9 Logan Lewis 2008-09-26 18:38:53 UTC
With one of the 2GB chips installed (for a week), the ghosting problem recurred.  At the end of the log was:

(II) intel(0): [drm] removed 1 reserved context for kernel
(II) intel(0): [drm] unmapping 8192 bytes of SAREA 0x2efff000 at 0x7fc442fc4000
(II) intel(0): [drm] Closed DRM master.

This didn't show up every time in the past, but I don't recall seeing it in logs where I didn't have the ghosting problem.  Some searching turns up these messages relating to other bugs.  Does it possibly have any significance? 

So there still may be a RAM-related problem (I went 2 weeks with a 1 GB chip with no problems), but it doesn't appear to be a strictly 4GB problem.  I'm trying the other 2GB chip in case the first I tried was faulty in some way.
Comment 10 Jesse Barnes 2008-09-29 10:30:39 UTC
Those messages should be a normal part of the server exit process when DRM is enabled (which should be most of the time).  RAM seems like the most likely problem here, but if you can narrow it down to a DRM enabled vs. disabled configuration there may be something else going on.
Comment 11 Logan Lewis 2008-10-13 15:13:17 UTC
Though it took another 2 weeks, the ghosting problem recurred while resuming from suspend.  When the problem happened more frequently (with 4 GB of RAM, about every 2 days), it would often happen randomly rather than at startup or resume.

Nothing jumps out at me in the xorg log, but I'll post it anyway.  As before, closing the lid and opening it eliminates the ghosting problem; it's difficult to tell, but it's almost like there's a slight flicker in the display after "fixing" the ghosting problem in this way.  It goes away upon reboot.

At this point my biggest concern is to distinguish whether I have some faulty hardware (display, video chip, RAM, etc) or this is a strange driver bug I'm encountering.  It's strange that the problem was so much more frequent than it is now.  

I'd appreciate any suggestions.
Comment 12 Logan Lewis 2008-10-13 15:14:31 UTC
Created attachment 19637
Xorg log 3: ghosting after resume from suspend
Comment 13 Logan Lewis 2008-10-13 16:38:33 UTC
I know you suggested trying the VESA driver for a while to see if the problem recurs while using it.  I spent some time trying to get the native resolution of my internal LCD (1440x900) working with no luck (adding the modeline by hand didn't seem to work for me); am I wrong in thinking that the VESA driver does not support this?  I understand I'd lose my second monitor as well, of course. 
Comment 14 Michael Fu 2008-11-11 17:39:41 UTC
Logan, I think you haven't exhausted your RAM combinations yet. It sounds to me like:

1) 1GB RAM works
2) 2GB RAM doesn't work.

Have you tried each of the two 1GB modules that make up the 2GB configuration on its own, to see whether it works alone in 1GB mode?

Also, please do test the VESA driver, even though you'll lose the other monitor. It'll help us narrow down the problem.

I have never seen another similar report. I think it's very likely a HW issue.
Comment 15 Logan Lewis 2008-11-11 17:55:19 UTC
Michael,

Thanks for responding.  I decided to try replacing the 4GB of RAM with another set from a different manufacturer (Crucial instead of Kingston).  So far, so good (about 2.5 weeks).  I'm hoping at this point that this was the cause.  Given the long periods between triggering the problem, however, I'm not yet completely convinced I've found the cause.  

That said, I'll resolve this bug for now and keep my fingers crossed that the problem doesn't recur.

Thanks again.

