Bug 20520

Summary: [945GM] display freezes a few minutes after resuming
Product: xorg
Reporter: Martin Pitt <martin.pitt>
Component: Driver/intel
Assignee: Jesse Barnes <jbarnes>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: high
CC: brian, clotho67, cwillu, eric, kui.zheng, lool, lubos.kolouch, nalimilan, neitzke, peng.li, tmezzadra, zack.evans
Version: git
Keywords: NEEDINFO, regression
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Attachments:

Description	Flags
dmesg	none
registers after clean boot	none
registers after hibernate	none
registers after screen freeze	none
Xorg.0.log	none
stracing X after freeze	none
lspci -vvnn	none
GPU dump with 2.6.30rc2	none
Dump with 2.6.30-rc8-git6	none
KMS/composite freeze logs from Martin Pitt	none
script to do s3 automatically	none

Description Martin Pitt 2009-03-07 00:41:29 UTC

A few minutes after resuming from suspend or hibernate, the display suddenly freezes. This is not triggered by anything obvious (such as starting a particular program), just randomly after some key presses or mouse movements.

After the display freezes, I can still ssh into the box. The entire user session continues to run, I can start programs, etc.

I didn't see anything interesting in dmesg or Xorg.0.log, and the gdb stack trace is totally useless. Stracing the X server shows that it's usually waiting in an ioctl() and still receives keyboard and mouse events. I'll attach detailed logs in a minute.

The single change that I spotted was in the registers:

--- regs.afterhibernate.txt	2009-03-07 08:24:27.000000000 +0100
+++ regs.freeze.txt	2009-03-07 08:35:18.000000000 +0100
@@ -140 +140 @@
-(II):              MI_MODE: 0x00000200
+(II):              MI_MODE: 0x00000000

This consistently changes like this after such a freeze happens. Whenever I look at MI_MODE in a working system, it is always 0x00000200. No other registers change after the freeze.
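
A comparison like the one above can be automated. The following is a minimal sketch, assuming the register dumps use the `(II): NAME: 0xVALUE` line format shown in the diff; the function names are made up for illustration and are not part of any Intel tool.

```python
import re

# Matches lines such as "(II):              MI_MODE: 0x00000200".
# Lines without a "NAME: 0x..." pair (e.g. the "pipe B dot ..." lines) are skipped.
REG_LINE = re.compile(r"^\(II\):\s*(\w+):\s*(0x[0-9a-fA-F]+)")

def parse_regs(text):
    """Return {register name: integer value} for every matching line."""
    regs = {}
    for line in text.splitlines():
        m = REG_LINE.match(line)
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs

def changed_regs(before, after):
    """Return {name: (old, new)} for registers present in both dumps that differ."""
    old, new = parse_regs(before), parse_regs(after)
    return {name: (old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]}

# Two-line excerpts standing in for regs.afterhibernate.txt and regs.freeze.txt.
before = ("(II):              MI_MODE: 0x00000200\n"
          "(II):         PFIT_CONTROL: 0x00002668\n")
after = ("(II):              MI_MODE: 0x00000000\n"
         "(II):         PFIT_CONTROL: 0x00002668\n")

for name, (a, b) in changed_regs(before, after).items():
    print(f"{name}: {a:#010x} -> {b:#010x}")
```

Run against the full dumps, this should report only MI_MODE if the observation above holds.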

This started happening two or three weeks ago. I am on Ubuntu jaunty (development release), which closely tracks X.org upstream releases. It never happened until then.

It definitely did not happen with -intel 2.4.1/X.org 1.5.2/Linux 2.6.27. Now I have -intel 2.6.1/X.org 1.6.0/Linux 2.6.28.7.

My hardware:
 Intel mobile 945GM
 Intel Core 2 Duo 1.2
 1 GB RAM
 Internal 1280x800 LVDS (switched off)
 External 1280x1024 TFT

Comment 1 Martin Pitt 2009-03-07 00:42:37 UTC

Created attachment 23611 [details]
dmesg

dmesg output (nothing interesting after the freeze). This is a clean boot, hibernate, and resume.

Comment 2 Martin Pitt 2009-03-07 00:43:03 UTC

Created attachment 23612 [details]
registers after clean boot

Comment 3 Martin Pitt 2009-03-07 00:43:51 UTC

Created attachment 23613 [details]
registers after hibernate

Probably not too interesting, since right after hibernate, everything works fine, but for completeness:

$ diff -U0 regs.cleanboot.txt regs.afterhibernate.txt 
--- regs.cleanboot.txt	2009-03-06 18:49:36.000000000 +0100
+++ regs.afterhibernate.txt	2009-03-07 08:24:27.000000000 +0100
@@ -34 +34 @@
-(II):                 LVDS: 0xc0308300 (enabled, pipe B, 18 bit, 1 channel)
+(II):                 LVDS: 0x40300300 (disabled, pipe B, 18 bit, 1 channel)
@@ -46 +46 @@
-(II):         PFIT_CONTROL: 0x00000000
+(II):         PFIT_CONTROL: 0x00002668
@@ -166 +166 @@
-(II): pipe B dot 77142 n 2 m1 14 m2 8 p1 2 p2 14
+(II): pipe B dot 108000 n 2 m1 14 m2 8 p1 2 p2 10
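
As a consistency check on the "pipe B dot" lines, both dot clocks can be reproduced from the listed divisors. This is a sketch only: it assumes the usual i9xx PLL formula and a 96 MHz reference clock, neither of which is stated in this report, and the function name is hypothetical.

```python
def i9xx_dot_clock_khz(n, m1, m2, p1, p2, refclk_khz=96000):
    """Dot clock in kHz from PLL divisors (assumed i9xx formula, assumed 96 MHz refclk)."""
    m = 5 * (m1 + 2) + (m2 + 2)          # combined feedback divider
    vco = refclk_khz * m // (n + 2)      # VCO frequency in kHz
    return vco // (p1 * p2)              # post dividers give the dot clock

# Divisors copied from the two register dumps above.
print(i9xx_dot_clock_khz(n=2, m1=14, m2=8, p1=2, p2=14))  # clean boot
print(i9xx_dot_clock_khz(n=2, m1=14, m2=8, p1=2, p2=10))  # after hibernate
```

Under those assumptions the only difference between the two dumps is the p2 post divider (14 versus 10).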

Comment 4 Martin Pitt 2009-03-07 00:44:09 UTC

Created attachment 23614 [details]
registers after screen freeze

Comment 5 Martin Pitt 2009-03-07 00:44:29 UTC

Created attachment 23615 [details]
Xorg.0.log

Comment 6 Martin Pitt 2009-03-07 00:47:21 UTC

Created attachment 23617 [details]
stracing X after freeze

This is from ssh'ing into the frozen box and attaching strace to X. I see

ioctl(11, 0x6458, 0)                    =

Then I walked over, wiggled the mouse a bit, and pressed two keys. The strace shows that those events were apparently still received, and that the server didn't get stuck in a tight infinite loop or something like that. Thus I think that by and large the server still worked.

However, it should be noted that I tried to press "q" to quit the mutt I was working on when the freeze started. Going back to the ssh session mutt was still running, so I don't think that the "q" keypress actually made it all the way through to mutt. So maybe it's not just a screen freeze, but a little harder than that.
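
The request number in the strace line can be split into its fields to see which ioctl the server is blocked in. The sketch below uses the generic Linux `_IOC` bit layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The value of `DRM_COMMAND_BASE` (0x40) comes from the DRM headers; reading the resulting driver number 0x18 as the i915 GEM throttle ioctl is an assumption that should be checked against `i915_drm.h` for the kernel in use.

```python
def decode_ioctl(request):
    """Split a Linux ioctl request number into its _IOC fields."""
    return {
        "nr": request & 0xFF,                 # command number
        "type": chr((request >> 8) & 0xFF),   # subsystem "magic" character
        "size": (request >> 16) & 0x3FFF,     # argument size in bytes
        "dir": (request >> 30) & 0x3,         # 0 = no data transfer
    }

DRM_COMMAND_BASE = 0x40  # first driver-specific DRM ioctl number

fields = decode_ioctl(0x6458)
driver_nr = fields["nr"] - DRM_COMMAND_BASE

print(fields)          # type 'd' is the DRM ioctl family
print(hex(driver_nr))  # driver-specific command number
```

If the 0x18 mapping is right, X is waiting for the GPU to catch up with earlier rendering, which would fit a hung chip rather than a hung server.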

Comment 7 Martin Pitt 2009-03-07 00:49:51 UTC

Trying to attach gdb wasn't very successful unfortunately. I do have the debug symbols of X.org, libx11, libc6, etc. installed, but still the stack trace is totally useless. Perhaps the "Cannot access memory at address 0xffe85fec" has something to do with it, but I don't know why it's doing that.

$ ps aux|grep X
root      3470  0.0  5.2 115892 53076 tty7     Ss+  Mar06   0:45 /usr/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
martin    6497  0.0  0.0   3348   816 pts/0    S+   08:39   0:00 grep X
0 martin@tick:~/xdebug
$ sudo gdb /usr/bin/X 
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(no debugging symbols found)
(gdb) attach 3470
Attaching to program: /usr/bin/X, process 3470
Cannot access memory at address 0xffe85fec
(gdb) bt
#0  0xb7f2b430 in ?? ()
#1  0xb783fee2 in ?? ()
#2  0xb77d60ff in ?? ()
#3  0x0817c0eb in ?? ()
#4  0x08145088 in ?? ()
#5  0x080910c8 in ?? ()
#6  0x081319a4 in ?? ()
#7  0x0808d1ce in ?? ()
#8  0x080721fd in ?? ()
#9  0xb7af5775 in ?? ()
#10 0x080716b1 in ?? ()
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y 
Detaching from program: /usr/bin/X, process 3470

XrandR information: (LVDS off, external TFT on, laptop is docked and closed):

$ xrandr 
Screen 0: minimum 320 x 200, current 1280 x 1024, maximum 1280 x 1280
VGA disconnected (normal left inverted right x axis y axis)
LVDS connected (normal left inverted right x axis y axis)
   1280x800       59.8 +
   1024x768       85.0     75.0     70.1     60.0  
   832x624        74.6  
   800x600        85.1     72.2     75.0     60.3     56.2  
   640x480        85.0     72.8     75.0     59.9  
   720x400        85.0  
   640x400        85.1  
   640x350        85.1  
TMDS-1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 340mm x 270mm
   1280x1024      75.0*+   60.0  
   1280x960       60.0  
   1152x864       75.0  
   1024x768       85.0     75.0     70.1     60.0  
   832x624        74.6  
   800x600        85.1     72.2     75.0     60.3     56.2  
   640x480        85.0     75.0     72.8     66.7     59.9  
   720x400        70.1  
TV disconnected (normal left inverted right x axis y axis)

Comment 8 Martin Pitt 2009-03-07 00:50:11 UTC

Created attachment 23618 [details]
lspci -vvnn

Comment 9 Eric Anholt 2009-03-07 00:50:30 UTC

that bit just says that the ring is busy -- it's probably just a side effect of the chip being hung.
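
The changed value is a single bit: 0x00000200 is bit 9 of MI_MODE. A small sketch of the check follows; the constant name `MODE_IDLE` and the "set means idle" reading are assumptions inferred from this comment and from the fact that the bit is set on a working system and clear after the freeze.

```python
MODE_IDLE = 1 << 9  # assumed name for bit 9 (0x200) of MI_MODE

def ring_is_idle(mi_mode):
    """True if the (assumed) ring-idle bit is set in an MI_MODE value."""
    return bool(mi_mode & MODE_IDLE)

print(ring_is_idle(0x00000200))  # value seen on a working system
print(ring_is_idle(0x00000000))  # value seen after the freeze
```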

Comment 10 Martin Pitt 2009-03-07 00:53:57 UTC

Finally, my xorg.conf:

$ cat /etc/X11/xorg.conf 
Section "Device"
        Identifier      "Configured Video Device"
        Option          "FramebufferCompression" "off"
EndSection

I need to set this option because of bug 19304.

Comment 11 Martin Pitt 2009-03-07 02:26:31 UTC

I confirm that this also happens if I use the laptop undocked, with just the internal LVDS:

$ xrandr 
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 1280 x 1280
VGA disconnected (normal left inverted right x axis y axis)
LVDS connected 1280x800+0+0 (normal left inverted right x axis y axis) 261mm x 163mm
   1280x800       59.8*+
   1024x768       85.0     75.0     70.1     60.0  
   832x624        74.6  
   800x600        85.1     72.2     75.0     60.3     56.2  
   640x480        85.0     72.8     75.0     59.9  
   720x400        85.0  
   640x400        85.1  
   640x350        85.1  
TMDS-1 disconnected (normal left inverted right x axis y axis)
TV disconnected (normal left inverted right x axis y axis)

Comment 12 Lubos Kolouch 2009-03-09 11:00:37 UTC

I can confirm this issue, it happens in Gentoo and Arch for me... it is very annoying

Comment 13 Martin Pitt 2009-03-20 02:02:00 UTC

I have now upgraded to Linux 2.6.28.8 and -intel 2.6.3, and suspend/hibernate now works fine again, no hangs any more. Thus I tentatively close this now.

Lubos, if it still happens for you with the latest version, please reopen.

Comment 14 Martin Pitt 2009-03-23 01:05:12 UTC

Sorry, just got it again. It seems to happen a lot less often now, but it is still there.

Comment 15 Lubos Kolouch 2009-03-23 01:09:27 UTC

It *just* happened to me as well.
I suspended and resumed several times during the weekend, all OK,
but now X stopped responding.

Gentoo kernel 2.6.28-r4, xf86-video-intel-2.6.3-r1

Comment 16 Lubos Kolouch 2009-03-26 00:43:48 UTC

After latest upgrade it happens again 100% of the time... 
work->hibernate->resume->wait->freeze->reboot

gentoo-sources-2.6.29
xf86-video-intel-2.6.3-r1
mesa-7.3-r1

Comment 17 Martin Pitt 2009-03-31 01:48:37 UTC

Confirmed that this still happens with the latest (v 5) patch in bug 18651, so this is apparently not related to pipe underruns.

Comment 18 Lubos Kolouch 2009-03-31 01:58:02 UTC

I wonder if it is not related to

http://bugzilla.kernel.org/show_bug.cgi?id=12778

Comment 19 Martin Pitt 2009-03-31 02:42:58 UTC

Indeed, I also get this message when it happens:

Mar 29 23:32:54 tick kernel: [14858.069290] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1
Mar 29 23:32:54 tick kernel: [14858.074255] mtrr: no MTRR for d0000000,10000000 found

Comment 20 Martin Pitt 2009-03-31 03:26:42 UTC

I confirm that running X with Option "DRI" "off", and rmmod'ing i915 and drm, suspend works fine. This might indicate that http://bugzilla.kernel.org/show_bug.cgi?id=12778 is indeed the cause of this.

Comment 21 Jesse Barnes 2009-04-06 14:13:04 UTC

Can you confirm that you're not running 'vbetool post' or with any of the ACPI S3 reposting stuff?  That's caused problems for us in the past...

Comment 22 Martin Pitt 2009-04-06 15:15:52 UTC

I just thoroughly tested the pm-utils scripts and quirks, and confirm that /usr/lib/pm-utils/sleep.d/98smart-kernel-video still does the right thing, i.e. it filters out all quirks for intel on >= 2.6.26 and thus does not run any of them (so no VBE post/S3 stuff).

Comment 23 Jesse Barnes 2009-04-06 16:04:14 UTC

So far it looks ok on my 945 with the latest Jaunty bits (so 2.6.28-11-generic and xf86-video-intel 2.6.3), but I've only been waiting a few minutes (while moving windows around and browsing the web).  Can you reproduce it with the 2.6.3 driver?  It has quite a few fixes that might be relevant.

Comment 24 Martin Pitt 2009-04-06 17:42:46 UTC

When I tested the suspend quirks, I was running with DRI enabled again (on current Jaunty, i. e. with 2.6.3). It indeed survived for about 10 minutes, then it froze. This also happened to a colleague of mine here at the CELF/LF summit, who also has a 945.

As I said, it is totally erratic. I had it survive for as much as 2 hours, then only for 1 minute; in most cases it's about 5 minutes. I couldn't see a pattern in when it happens with respect to the actions performed. In many cases I was just reading something and didn't even move the mouse.

Comment 25 Martin Pitt 2009-04-08 09:46:59 UTC

We finally found the reason for this. Our kernel had the patch from http://bugzilla.kernel.org/show_bug.cgi?id=12950 applied, to improve performance for netbooks. This patch was now identified as causing this regression, and we reverted it.

Thus I close this bug report now. Lubos, if you want to "take over" this bug, please reopen; perhaps you could check if above patch is in Gentoo as well?

Comment 26 Jesse Barnes 2009-04-08 09:59:17 UTC

Thanks for the update Martin... It's strange that the MCHBAR patch would cause problems with suspend/resume though.  I'll look through the patch again but if you get a chance could you try running with the patch but with tiling disabled in your xorg.conf (option "tiling" "false")?

Comment 27 Lubos Kolouch 2009-04-08 10:13:30 UTC

Martin, it happens to me also with vanilla kernel.

Comment 28 Martin Pitt 2009-04-08 23:42:40 UTC

I booted the previous kernel with the MCHBAR patch, disabled tiling, suspended, and it hung again after about an hour.

I have run with the updated kernel (with the MCHBAR patch reverted) all day, and at the conference I'm using suspend/resume a lot. No hang here. However, I haven't looked at what that MCHBAR patch was about. I cannot assert whether reverting it really fixed the suspend hang to 100%, or whether it was just sheer luck that it survived a day. Before that, I got the hang pretty reliably within an hour, though.

Comment 29 Martin Pitt 2009-04-09 07:21:39 UTC

It seems I just was lucky yesterday, it survived the entire day without freezing. But sure enough, when I kept my laptop suspended over night and resumed this morning, it froze after a couple of minutes.

So it was unrelated to the MCHBAR patch after all. Darn! :-/

Comment 30 Jesse Barnes 2009-04-09 09:42:55 UTC

Thanks for testing Martin, I'll see if I can reproduce locally (again, I guess I'm in for lots of waiting).  If you could capture a backtrace via gdb of the hung server that might help a lot.

Comment 31 Martin Pitt 2009-04-09 09:56:10 UTC

I think I did already, and it delivered nothing but ??. Also, I don't think it's actually hung, since I can still strace it and see mouse/keyboard activity. But I'll try harder to gdb it once I'm back home next week (with just a single laptop at the conference I don't have a place to ssh into the box).

Comment 32 Jesse Barnes 2009-04-09 10:07:46 UTC

Oh yeah you did, forgot about that.  I'm not sure why gdb wasn't able to attach properly but hopefully you can figure that out and get a useful trace.  I usually just su and do it as root rather than using sudo (not sure how that affects uid and effective uid etc).

Comment 33 Martin Pitt 2009-04-20 04:30:57 UTC

Created attachment 24962 [details]
GPU dump with 2.6.30rc2

I tried to reproduce this with linux 2.6.30RC2 and libdrm 2.4.9, so that I could use intel_gpu_dump (standard Jaunty, where I encountered the hang before, has 2.6.28.8 and libdrm 2.4.5). However, the symptoms are now slightly different, so I'm not sure whether this is useful at all:

 - I get hangs without any special VT switches/suspend/etc after a few hours.
 - After suspend, the first hang again occurs after a few minutes
 - Unlike with standard jaunty, I can recover from the hang with a VT switch, but then it happens again after a few minutes. GPU dump attached (compressed, sorry, raw file was too big for bugzilla)
 - This also happens without compositing (whereas disabling compiz was a good workaround for the original bug here).

For each hang that happens, I get

  [  204.095061] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1

in dmesg.

Comment 34 Lubos Kolouch 2009-04-20 22:35:38 UTC

I attached similar dump of frozen GPU to #20560 ... seems like we are tracing the same issue in two bugs...

Comment 35 Martin Pitt 2009-04-21 00:31:56 UTC

Lubos, thanks. However, please note that the GPU dump is for the hangs which happen on 2.6.30RC2, which behave very differently from the hangs I get on 2.6.28.8. I just can't use intel_gpu_dump on the latter, so this was my (vain) attempt to provide info for the original hang.

Comment 36 Lubos Kolouch 2009-04-21 00:44:03 UTC

Martin, my dump is also from 2.6.30RC2 and it behaves exactly the same for me as in 2.6.28 and 2.6.29 ! I can't get away from it just by changing the VT!

Comment 37 Martin Pitt 2009-04-28 14:02:02 UTC

For the record, I now updated to linux 2.6.30rc3, -intel 2.7.0, libdrm 2.4.9, and turned on UXA. Things are running smoothly now, and I suspended about 5 times during the afternoon/evening without any problem.

Comment 38 Jesse Barnes 2009-05-04 09:53:46 UTC

Ok, marking fixed.  Thanks Martin.

Comment 39 Lubos Kolouch 2009-06-10 10:26:16 UTC

As mentioned in #20560 , this is far from fixed...

Comment 40 Lubos Kolouch 2009-06-10 10:27:01 UTC

Created attachment 26643 [details]
Dump with 2.6.30-rc8-git6

Comment 41 Martin Pitt 2009-06-14 13:11:03 UTC

Created attachment 26783 [details]
KMS/composite freeze logs from Martin Pitt

It had worked fine for some weeks (KMS+compiz) on my i945, but now it's back. I'm following Ubuntu's "xorg-edgers" archive which has very current snapshots of upstream. Unlike most regressions that I see, this one isn't just a temporary glitch, it's been broken for over a week now. It now freezes about two seconds after resuming, not several minutes, but otherwise the symptoms are very similar. Should I open a new bug about this, or is it the same? Logs attached (dmesg, gpu, registers, Xorg.log). My current versions:

  Linux 2.6.30 final, with git pull from anholt/drm-intel.git (commit 03d606991)
  libdrm from 2009-06-06 (3d4bfe8c)
  mesa from 2009-06-13 (18af7c38)
  intel from 2009-06-11 (6d062e9e)

I tried the following combinations:

 - KMS, X.org session with compiz: usually freezes; occasionally it survives the first suspend and freezes on the second
 - no KMS, X.org session with compiz: ok
 - KMS, VT only: ok
 - KMS, gdm only (no composite): ok
 - KMS, X.org session with metacity (no composite): ok
 - KMS, X.org with compiz, switch to VT1 before suspend: ok on resume, often freezes as soon as switching back to X.org

Comment 42 Yifei Chen 2009-06-15 19:34:07 UTC

We tested this bug on 945GM with the master branch. The display freezes right after the system wakes from S4 if we are running GNOME, with or without compiz. If we run raw X, most of the time the system wakes from S4 correctly, but one time it crashed the whole system. S3 works fine.

Comment 43 Jesse Barnes 2009-06-16 15:21:31 UTC

*** Bug 22039 has been marked as a duplicate of this bug. ***

Comment 44 Jesse Barnes 2009-06-16 15:22:44 UTC

Ug, ok sounds like there are real issues with KMS resume.  Let's keep S3 and S4
separate though; can someone seeing an issue with hibernate file a separate
bug?

Comment 45 Jesse Barnes 2009-06-16 15:24:09 UTC

*** Bug 22010 has been marked as a duplicate of this bug. ***

Comment 46 Li Peng 2009-06-17 02:16:13 UTC

Created attachment 26881 [details]
script to do s3  automatically

This is a script to do S3 suspend/resume automatically, which should help to reproduce this issue.

Comment 47 Li Peng 2009-06-17 02:21:37 UTC

I hit the same problem in Moblin: after three S3 resumes, the screen goes blank. The register dump diff between a good and a bad S3 resume is the same as above:

-(II):              MI_MODE: 0x00000200
+(II):              MI_MODE: 0x00000000

Comment 48 Gordon Jin 2009-06-17 18:27:06 UTC

(In reply to comment #44)
> Ug, ok sounds like there are real issues with KMS resume.  Let's keep S3 and S4
> separate though; can someone seeing an issue with hibernate file a separate
> bug?
> 

Is bug#22263 the hibernation bug?

Comment 49 Li Peng 2009-06-17 20:29:17 UTC

(In reply to comment #46)
> Created an attachment (id=26881) [details]
> script to do s3  automatically
> 
> This is a script to do S3 resume automatically, should be help to reproduce
> this issue 
> 

Maybe 10 sec is not enough. I changed the sleep and wake-up time to 15 sec and tested 20 suspend/resume cycles; it works well.
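
The attached script is not reproduced in this thread. A comparable loop could look like the sketch below, which assumes the `rtcwake` tool from util-linux is installed and that the machine supports suspend-to-RAM with an RTC wake alarm; the function names and the dry-run default are illustrative choices, not taken from the attachment. Actually suspending requires root.

```python
import subprocess
import time

def rtcwake_command(sleep_seconds):
    """Build the rtcwake call that suspends to RAM and wakes after sleep_seconds."""
    return ["rtcwake", "-m", "mem", "-s", str(sleep_seconds)]

def s3_cycles(count, sleep_seconds=15, awake_seconds=15, dry_run=True):
    """Run `count` suspend/resume cycles; with dry_run=True only collect the commands."""
    commands = []
    for _ in range(count):
        cmd = rtcwake_command(sleep_seconds)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # returns once the machine has resumed
            time.sleep(awake_seconds)        # stay awake before the next cycle
    return commands

# Show what 20 cycles with the 15 second timings would execute.
for cmd in s3_cycles(20):
    print(" ".join(cmd))
```

Calling `s3_cycles(20, dry_run=False)` as root would perform the cycles for real.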

Comment 50 Milan Bouchet-Valat 2009-06-19 05:53:11 UTC

Gordon:  bug#22263 is not the hibernation problem I'm seeing, and doesn't seem to be Martin's either (comment #41). I don't get any screen corruption. See bug 22366.

Comment 51 Jie Luo 2009-06-21 01:32:30 UTC

(In reply to comment #45)
> *** Bug 22010 has been marked as a duplicate of this bug. ***
> 

I'm not sure whether this is a duplicate of this bug. I have done some tests. I'm sure kernels 2.6.29.4 and 2.6.30-rc5 are good. The screen corruption and X hang only occur on kernels from 2.6.30-rc6 onward. I'll try to bisect to see which commit is suspicious.

Comment 52 Jie Luo 2009-06-21 07:15:49 UTC

Well, git bisect shows that reverting

commit: 79f11c19a396e8cea7dad322dcfb46c0a8517fe6
drm/i915: save/restore fence registers across suspend/resume

makes resume work again on kernel 2.6.30. Kernel 2.6.30-rc5 plus the above commit doesn't cause this hang, so it could be some conflict between this commit and other commits in kernel 2.6.30-rc6.

Here is some additional info.

i915_gem_fence_regs before suspend:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000002 0 (name: 1)
Fenced object[ 4] = f6901f00:   02000000 00400000 00001000 X 00000002 00000002 0 (name: 2)
Fenced object[ 5] = f6901f60:   02400000 00400000 00001000 X 00000002 00000002 0 (name: 3)
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused

i915_gem_fence_regs after resume:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f6042780: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused

Comment 53 Jie Luo 2009-06-21 07:18:07 UTC

(In reply to comment #52)

Sorry, this is the one after resume.

i915_gem_fence_regs after resume:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused
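
To compare such dumps without reading them line by line, the fence slots in use can be extracted and diffed. This is a minimal sketch that assumes the `Fenced object[ N] = ...` line format shown above; the helper names are made up, and the sample strings are shortened excerpts of the two dumps.

```python
import re

# Matches "Fenced object[ 3] = f676c360: P 00c00000 ..." and "Fenced object[10] = unused".
FENCE_LINE = re.compile(r"^Fenced object\[\s*(\d+)\] = (.+)$")

def used_fences(dump):
    """Return {slot index: description} for every fence slot that is in use."""
    used = {}
    for line in dump.splitlines():
        m = FENCE_LINE.match(line.strip())
        if m and m.group(2) != "unused":
            used[int(m.group(1))] = m.group(2)
    return used

def lost_fences(before, after):
    """Slots that were in use before suspend but are unused after resume."""
    return sorted(set(used_fences(before)) - set(used_fences(after)))

before = """Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000002 0 (name: 1)
Fenced object[ 4] = f6901f00:   02000000 00400000 00001000 X 00000002 00000002 0 (name: 2)
Fenced object[ 5] = f6901f60:   02400000 00400000 00001000 X 00000002 00000002 0 (name: 3)"""
after = """Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused"""

print(lost_fences(before, after))
```

For the dumps in comments 52 and 53 this points at slots 4 and 5, the two objects that no longer hold a fence after resume.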

Comment 54 Jesse Barnes 2009-06-22 12:03:26 UTC

If fence register save/restore really is the issue, this patch should help.

Current code saves the fence registers before rendering has completed, which can affect fence register allocation.  If we save before rendering completes, and restore again at resume time, we may end up causing trouble with whatever objects land in the fenced space after resume.

Saving register state (including fences) *after* we've idled the memory manager should help with that.

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 98560e1..e3cb402 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -67,8 +67,6 @@ static int i915_suspend(struct drm_device *dev, pm_message_t s
 
        pci_save_state(dev->pdev);
 
-       i915_save_state(dev);
-
        /* If KMS is active, we do the leavevt stuff here */
        if (drm_core_check_feature(dev, DRIVER_MODESET)) {
                if (i915_gem_idle(dev))
@@ -77,6 +75,8 @@ static int i915_suspend(struct drm_device *dev, pm_message_t s
                drm_irq_uninstall(dev);
        }
 
+       i915_save_state(dev);
+
        intel_opregion_free(dev, 1);
 
        if (state.event == PM_EVENT_SUSPEND) {
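
The effect of the reordering can be illustrated with a toy model. This is not driver code: the class, its methods, and the assumption that idling releases fences (modelled loosely on the dumps in comments 52 and 53) are all hypothetical and only show why the order of "save" and "idle" matters.

```python
class ToyGpu:
    """Toy stand-in for the driver: fence slots mapped to buffer names."""

    def __init__(self):
        self.fences = {3: "front", 4: "back", 5: "depth"}  # driver bookkeeping
        self.saved = None                                   # register snapshot

    def idle(self):
        # Assumption: finishing outstanding rendering releases some fences.
        self.fences = {3: "front"}

    def save_state(self):
        self.saved = dict(self.fences)

    def resume(self):
        # The hardware registers come back from the snapshot.
        return dict(self.saved)

def suspend_resume(save_first):
    """Return True if the restored registers match the driver's bookkeeping."""
    gpu = ToyGpu()
    if save_first:
        gpu.save_state()   # old order: snapshot taken before idling
        gpu.idle()
    else:
        gpu.idle()         # patched order: idle first, then snapshot
        gpu.save_state()
    return gpu.resume() == gpu.fences

print("save before idle consistent:", suspend_resume(save_first=True))
print("save after idle consistent: ", suspend_resume(save_first=False))
```

In the model, saving first restores fences the driver no longer tracks, which is the kind of mismatch the patch is meant to avoid.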

Comment 55 Jie Luo 2009-06-22 18:34:41 UTC

(In reply to comment #54)
> If fence register save/restore really is the issue, this patch should help.
> 

Yes, it does fix my problem. The system can resume correctly again. I haven't seen a hang so far.

Comment 56 Martin Pitt 2009-06-23 01:28:32 UTC

I tested the patch in comment 54 and also confirm that it fixes suspend/resume with the internal laptop monitor. Thanks!

It still fails with the external one, but that's a different problem, and I'm going to report it separately.

Comment 57 Tomas M. 2009-06-23 04:30:45 UTC

(In reply to comment #54)
> If fence register save/restore really is the issue, this patch should help.

applied the patch here and it appears to have fixed it for me..

intel gma950 laptop.

Comment 58 Jesse Barnes 2009-06-23 10:19:21 UTC

Great, thanks for testing.  Fix has been pushed into the kernel:

commit	9e06dd39f2b6d7e35981e0d7aded618686b32ccb
drm/i915: correct suspend/resume ordering

Comment 59 Gordon Jin 2009-06-23 18:39:34 UTC

(In reply to comment #58)
> Great, thanks for testing.  Fix has been pushed into the kernel:
> 
> commit  9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> drm/i915: correct suspend/resume ordering

The fix is in drm-intel-next branch.

Eric, please cherry-pick it into qa-branch so it'll be in Q2 package.

Comment 60 Jie Luo 2009-06-23 20:16:32 UTC

(In reply to comment #58)
> Great, thanks for testing.  Fix has been pushed into the kernel:
> 
> commit  9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> drm/i915: correct suspend/resume ordering
> 

Maybe this fix should also be sent to the 2.6.30.x stable branch, since it's a regression introduced during the 2.6.30 rc process. It would make users of the stable kernel happy. Thanks.

Comment 61 Jesse Barnes 2009-06-24 10:49:36 UTC

On Tue, 23 Jun 2009 20:16:32 -0700 (PDT)
> --- Comment #60 from Jie Luo <clotho67@gmail.com>  2009-06-23
> 20:16:32 PST --- (In reply to comment #58)
> > Great, thanks for testing.  Fix has been pushed into the kernel:
> > 
> > commit  9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> > drm/i915: correct suspend/resume ordering
> > 
> 
> Maybe this fix should also be send to 2.6.30.x stable branch, since
> it's a regression during the 2.6.30 rc process. And it will make user
> of the stable kernel happy. Thanks.

Good point, want to send a note to stable@kernel.org with the commit
info, proposing the patch for inclusion?

Thanks,
