Bug 22426 - [GM965 GEM KMS] GPU wedge during resume
Summary: [GM965 GEM KMS] GPU wedge during resume
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high critical
Assignee: Jesse Barnes
QA Contact:
Depends on:
Reported: 2009-06-22 20:55 UTC by Ben Gamari
Modified: 2017-07-24 23:09 UTC (History)

See Also:
i915 platform:
i915 features:

Post-mortem data (287.83 KB, application/octet-stream)
2009-06-22 20:55 UTC, Ben Gamari
Post-mortem data (20090720) (189.07 KB, application/octet-stream)
2009-07-20 07:40 UTC, Pantelis Koukousoulas
dmesg with the reset patches from bgamari applied (93.56 KB, text/plain)
2009-07-21 00:31 UTC, Pantelis Koukousoulas

Description Ben Gamari 2009-06-22 20:55:16 UTC
Created attachment 27022 [details]
Post-mortem data

When resuming from S3 today while running compiz, I encountered a GPU lockup. Judging by the ring buffer dump, the 3D unit appears to have been the cause of the crash:

0x096133e8: HEAD 0x7b001c04: 3DPRIMITIVE: quad list sequential
0x096133ec: HEAD 0x00000004:    vertex count
0x096133f0: HEAD 0x00000000:    start vertex
0x096133f4: HEAD 0x00000001:    instance count
0x096133f8: HEAD 0x00000000:    start instance
0x096133fc: HEAD 0x00000000:    index bias

This was with the following components:

Xorg components as of Mon Jun 22 23:53:55 EDT 2009
drm: 	69fc600a9d34e9c2f01d5afc8977496edec80aeb
xf86-video-intel: 	d9e133e6874584e2a0d8ddbeba618cf8d5f1344e
mesa: 	c80ce5ac90b1e0ac7a72cd41c314aa2000bfecf5
xserver: 	0f441ed27c547c94c59547b313c40557773dddf1
kernel: Based on Linus's master (7e0338c0de18c50f09aea1fbef45110cf7d64a3c)
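
For anyone triaging a similar dump, here is a minimal sketch of pulling out the entries the decoder marked as HEAD, which is how the 3DPRIMITIVE above was spotted. It assumes the line layout shown in this report ("<offset>: HEAD <dword>: <text>"); the function names ring_head_lines and ring_head_command are hypothetical helpers, not part of any Intel tool.

```shell
#!/bin/sh
# Sketch only: assumes decoded ring buffer lines look like
#   0x096133e8: HEAD 0x7b001c04: 3DPRIMITIVE: quad list sequential
# so the second whitespace-separated field is the literal word HEAD.

# Print every line of the decoded dump that is marked HEAD.
# Reads the named files, or standard input when none are given.
ring_head_lines() {
    awk '$2 == "HEAD" { print }' "$@"
}

# Print the name of the first command marked HEAD, without its
# trailing colon (for the dump in this report: 3DPRIMITIVE).
ring_head_command() {
    awk '$2 == "HEAD" { sub(/:$/, "", $4); print $4; exit }' "$@"
}
```

Running ring_head_command on a decoded ring buffer file gives a quick hint about which unit the chip was executing on when it stopped. It does not prove the cause of the hang.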
Comment 1 Jesse Barnes 2009-06-23 18:32:37 UTC
Just to confirm, this isn't fixed by the recent suspend/resume ordering fix in the drm-intel-next kernel?
Comment 2 Ben Gamari 2009-06-23 21:00:00 UTC
Unfortunately it doesn't seem to be.
Comment 3 Ben Gamari 2009-06-23 23:49:39 UTC
(In reply to comment #2)
> Unfortunately it doesn't seem to be.

It is rather interesting, though, that the GPU only freezes when the 3D unit is used (e.g. task switching in compiz). When the machine comes up from suspend, I can interact with the gnome-screensaver unlock window effectively indefinitely. However, when I unlock the session and attempt to interact with compiz, the chip quickly locks up. Additionally, for the short time that compiz does draw (the first few seconds after unlocking), alpha blending appears to be broken: all alpha-blended regions are drawn as fully transparent.
Comment 4 Pantelis Koukousoulas 2009-06-30 20:00:59 UTC
I think I'm seeing this bug too (on G45). It doesn't seem to matter whether KMS or UMS is used (it happens in both cases).

The symptom is that several desktop (KDE 4.3) components become invisible,
including, e.g., the panels and window decorations (I think these are
all part of the plasma process nowadays).

I will follow up with more information after I've done a little more testing.
Comment 5 Jason Smith 2009-07-10 08:24:55 UTC
I experience this issue too. Upon resume, all ARGB windows become fully transparent. Restarting my window manager fixes the issue, however.
Comment 6 Ben Gamari 2009-07-14 10:58:55 UTC
(In reply to comment #5)
> I experience this issue too. Upon resume all argb windows become fully
> transparent. Restarting my window manager fixes the issue however.
I can also reproduce this simply by restarting compiz (without suspending/resuming). Running compiz --replace a second time during a session causes alpha blended windows to be rendered transparently.
Comment 7 Pantelis Koukousoulas 2009-07-20 07:39:05 UTC
So I devoted a little time to this problem today.

The kernel is the latest drm-intel; the other components are the latest versions available (updated this morning) in the kubuntu karmic / xorg-edgers repositories (see the file versions.txt in the attached archive, gpu-suspend-lockup-20090720.tar.bz2). I'm using KMS, but the last time I tried with UMS it had the same problem.

If I activate KDE4's compositing / 3D effects support and perform a suspend to RAM / resume cycle, the GPU locks up. Unfortunately, logging in over ssh doesn't work either
(it looks like the driver takes some lock with it to its doom; sshd asks for the password but hangs where it should spawn a shell, see hang_ssh.txt).

If I suspend from the console by echo mem > /sys/power/state rather than from inside X, the machine resumes ok, but locks up when I switch to X.org.

Therefore, I took gpu/reg dumps from before the suspend, after resume from console and then also used a sleep 10; intel_reg_dumper > dump.txt 
trick to try and get output from the wedged state (see test.sh).
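
The capture sequence described above could look roughly like the following. This is a hypothetical reconstruction, not the attached test.sh (whose contents are not shown in this report); the function name, the output file names, and the DUMPER, STATE_FILE and DELAY variables are assumptions added so the flow can be dry-run.

```shell
#!/bin/sh
# Sketch only: a guess at the capture flow, not the attached test.sh.
# Overridable settings (assumed names) so the sequence can be dry-run:
DUMPER=${DUMPER:-intel_reg_dumper}          # register dump tool named above
STATE_FILE=${STATE_FILE:-/sys/power/state}  # writing "mem" here suspends to RAM
DELAY=${DELAY:-10}                          # seconds to wait before the last dump

capture_suspend_dumps() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # 1. Baseline register dump before suspending.
    "$DUMPER" > "$outdir/before-suspend.txt"

    # 2. Suspend to RAM; on real hardware this write returns after resume.
    echo mem > "$STATE_FILE" || return 1

    # 3. Register dump right after resume, still on the text console.
    "$DUMPER" > "$outdir/after-resume.txt"

    # 4. Delayed dump in the background, leaving time to switch to X so
    #    that the dump is taken once the GPU is (hopefully) wedged.
    ( sleep "$DELAY"; "$DUMPER" > "$outdir/wedged.txt" ) &
    delayed_pid=$!
}
```

It would be run as root from a text console, e.g. capture_suspend_dumps ./dumps, followed by switching to the X virtual terminal before the delay expires. Whether the delayed dump completes at all depends on how badly the machine is wedged.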

Please ask if you would like any additional info.
Comment 8 Pantelis Koukousoulas 2009-07-20 07:40:03 UTC
Created attachment 27847 [details]
Post-mortem data (20090720)
Comment 9 Pantelis Koukousoulas 2009-07-21 00:31:29 UTC
Created attachment 27868 [details]
dmesg with the reset patches from bgamari applied

I tried the GPU reset patches. With those applied, the driver detects the lockup and tries to reset the chip, but the reset fails. Perhaps this is because the reset algorithm implemented does not work for my chip (G45).
Comment 10 Jesse Barnes 2009-08-31 09:47:34 UTC
There were some hang fixes that went in recently; do today's git bits still have this issue?
Comment 11 Ben Gamari 2009-09-01 20:14:25 UTC
(In reply to comment #10)
> There were some hang fixes that went in recently; do today's git bits still
> have this issue?

Things have definitely improved. The chip is far more stable and coming back from suspend hasn't yet resulted in a wedged GPU. Unfortunately, the problem still isn't completely resolved.

The alpha blending issue has recurred in one of the two times I've suspended so far. While immediately restarting compiz does not fix the issue, running metacity and then starting compiz seems to bring the chip back to a sane state.
Comment 12 Jesse Barnes 2009-09-02 09:56:03 UTC
Hm weird... sounds like we're missing some 3D state setup possibly, or there's some other display bit that controls rendering to alpha channels.  I'll look for it.
Comment 13 Ben Gamari 2009-09-16 06:21:40 UTC
This actually seems to have recently disappeared. Perhaps it was Ickle's relocation address verification patch. Who knows...
