27883 – [page flipping] hangs on 965 and 945

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	high critical
Assignee:	Jesse Barnes
QA Contact:

URL:
Whiteboard:
Keywords:	NEEDINFO

Depends on:
Blocks:

Reported:	2010-04-29 01:12 UTC by Harald Judt
Modified:	2017-07-24 23:08 UTC
CC List:	3 users

See Also:
i915 platform:
i915 features:

Attachments
GPU dump frozen X (BZip2) (82.61 KB, application/octet-stream) 2010-04-29 01:14 UTC, Harald Judt
GPU dump with mesa-7.8.1 (97.62 KB, application/octet-stream) 2010-04-29 06:35 UTC, Harald Judt
disable page flipping but leave events (1007 bytes, patch) 2010-05-01 10:46 UTC, Jesse Barnes
i915 error state (108.49 KB, application/x-bzip) 2010-05-12 08:14 UTC, Tomas M.
intel gpu dump (83.44 KB, application/x-bzip) 2010-05-12 08:15 UTC, Tomas M.
screen corruption with patches (634.73 KB, image/png) 2010-05-14 18:09 UTC, Tomas M.
the rotated cube carries the broken background (285.86 KB, image/png) 2010-05-14 18:11 UTC, Tomas M.
hydroxigen drum machine broke terribly (245.64 KB, image/png) 2010-05-14 18:12 UTC, Tomas M.
GPU dump (98.34 KB, application/octet-stream) 2010-06-17 05:08 UTC, Harald Judt
disable-page-flipping-leave-events for current xf86-video-intel git version. (977 bytes, patch) 2010-08-03 09:06 UTC, Harald Judt
Updated patch to disable page flipping. (1.09 KB, patch) 2010-08-24 04:46 UTC, Harald Judt
Contents of /sys/kernel/debug/dri/0/ when frozen (317.14 KB, application/octet-stream) 2010-09-05 01:14 UTC, Tassilo Horn

Description Harald Judt 2010-04-29 01:12:07 UTC

When using compiz window manager (0.8.4), the display freezes and I have to kill X and restart the machine. While it is possible to start X again, there will be graphical corruption when rotating the cube etc., making the system unusable. Compiz 0.8.6 shows render corruption right after start, so I can't even use it, but that's another problem which probably does not belong here.

I can still move the mouse, and VT switching and SSH still work after the freeze, so I made a GPU dump while X was still running; maybe it helps. There's nothing helpful in dmesg or Xorg.log.

Hardware: Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c) on a Lenovo Thinkpad T61.

I'll try mesa-7.8.1 instead of the up-to-date git version and report here whether it works; I believe it does, but I have to verify.

Comment 1 Harald Judt 2010-04-29 01:14:31 UTC

Created attachment 35328
GPU dump frozen X (BZip2)

Comment 2 Harald Judt 2010-04-29 06:35:12 UTC

Created attachment 35334
GPU dump with mesa-7.8.1

This time, with mesa-7.8.1, everything froze during cube rotation. SSH was still functional; GPU dump attached.

Next step: I'm trying x11-drivers/xf86-video-intel-2.10.0 now. If this doesn't work, I'll go back to mesa-7.7, which definitely worked.

Comment 3 Jesse Barnes 2010-05-01 10:46:59 UTC

Created attachment 35367
disable page flipping but leave events

Does this patch work around the hang?

Comment 4 Harald Judt 2010-05-03 13:31:43 UTC

Yes, the patch seems to help (xf86-video-intel-2.11.0, mesa-7.8.1), no crashes so far.

Comment 5 Tomas M. 2010-05-08 08:16:43 UTC

I am suffering from a similar bug, better described here: https://bugs.freedesktop.org/show_bug.cgi?id=27420

The fix proposed there didn't help me.

Disabling page flipping seems to fix it.

mesa 7.8.1
xorg 1.8
xf86-video-intel 2.11
compiz enabled

Comment 6 Tomas M. 2010-05-08 10:22:46 UTC

and of course, i forgot:

intel GMA945 hardware.

Comment 7 Jesse Barnes 2010-05-10 10:24:56 UTC

There are a couple of other patches that might help.  Can you re-enable page flipping in your DDX driver and try these kernel patches?

https://patchwork.kernel.org/patch/90682/
https://patchwork.kernel.org/patch/90683/

Comment 8 Tomas M. 2010-05-10 12:50:40 UTC

(In reply to comment #7)
> There are a couple of other patches that might help.  Can you re-enable page
> flipping in your DDX driver and try these kernel patches?
> 
> https://patchwork.kernel.org/patch/90682/
> https://patchwork.kernel.org/patch/90683/

The first patch fails to apply to the 2.6.33 kernel series.

I'm trying the second patch (will report back when done).

Against which kernel should I apply the first patch?

Comment 9 Jesse Barnes 2010-05-10 13:10:29 UTC

The first patch just needs a little massaging; it should mostly apply to current kernels, but things have moved around a little.  I'm not sure how important it is in isolation though...

Comment 10 Tomas M. 2010-05-10 13:40:39 UTC

(In reply to comment #9)
> The first patch just needs a little massaging; it should mostly apply to
> current kernels, but things have moved around a little.  I'm not sure how
> important it is in isolation though...

The second patch broke the kernel on 2.6.33: it printed a stack trace before it could modeset.

I'm going to try applying the first patch.

Comment 11 Tomas M. 2010-05-10 14:26:50 UTC

The first patch fails to build: IS_GEN3 isn't defined.

drivers/gpu/drm/i915/i915_dma.c: In function ‘i915_load_modeset_init’:
drivers/gpu/drm/i915/i915_dma.c:1263:9: error: implicit declaration of function ‘IS_GEN3’
make[4]: *** [drivers/gpu/drm/i915/i915_dma.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make: *** [drivers] Error 2

Comment 12 Tomas M. 2010-05-11 15:42:59 UTC

After reading a bit, I realized those macros were included in 2.6.34 (soon to be backported to 2.6.33.4).

I got the first patch in, and it appears to fix the redrawing problems in the terminal. But while typing this, I noticed that some characters still fail to render (even though they are there).

glxgears now runs close to the 60-ish frame rate I should expect.

I have another kernel waiting which includes both patches. Should I test that too?

Comment 13 Jesse Barnes 2010-05-11 15:48:41 UTC

Yes, I'm curious if both patches help even more

Comment 14 Tomas M. 2010-05-11 18:17:17 UTC

(In reply to comment #13)
> Yes, I'm curious if both patches help even more

Applying both patches doesn't help any more than the first patch alone.

1st patch: (90682) helps somewhat.
1st patch + 2nd patch: == 1st patch only
2nd patch only: broken kernel (??)

Comment 15 Tomas M. 2010-05-12 04:57:09 UTC

Extra info:

Over time, the redrawing issues seem to get worse, until the GPU hangs.

I'm mounting debugfs, will collect the error state, and will add a GPU dump when I get one. I'm not sure whether this should be filed as a different bug; just let me know.

Comment 16 Tomas M. 2010-05-12 08:14:20 UTC

Created attachment 35595
i915 error state

Comment 17 Tomas M. 2010-05-12 08:15:01 UTC

Created attachment 35596
intel gpu dump

Comment 18 Tomas M. 2010-05-13 04:09:26 UTC

It seems I've been suffering from multiple-bug syndrome.

Both patches applied, plus the patch in comment 12 from here: https://bugs.freedesktop.org/show_bug.cgi?id=27883, seem to fix all the rendering corruption I've been having.

I had already tested that patch before without your patches, and the issues still remained.

I'm not sure whether your second patch is of any use; I will most likely test this during the weekend.

The hang issue might still be there. I will report back if I suffer another one.

Comment 19 Tomas M. 2010-05-14 18:09:41 UTC

Created attachment 35661
screen corruption with patches

Corrupted screen after several hours of usage plus suspend/resume cycles; first patch + mesa patch, kernel 2.6.33.4.

Comment 20 Tomas M. 2010-05-14 18:11:31 UTC

Created attachment 35662
the rotated cube carries the broken background

the corrupted background gets moved with the cube

Comment 21 Tomas M. 2010-05-14 18:12:58 UTC

Created attachment 35663
hydroxigen drum machine broke terribly

title says it all.

Comment 22 Ivan Bulatovic 2010-05-15 03:39:10 UTC

(In reply to comment #13)
> Yes, I'm curious if both patches help even more

Applied both patches on 2.6.34-rc6 and the problem remains (without mesa fix btw).

uname -a
Linux vostro 2.6.34-rc6-RC #2 SMP PREEMPT Sat May 15 12:12:50 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T5670 @ 1.80GHz GenuineIntel GNU/Linux

cat /var/log/Xorg.0.log | grep flip
[    30.960] (II) intel(0): Kernel page flipping support detected, enabling
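
The log check above can be wrapped in a small helper so that testers report the same thing. This is only a sketch: `flip_status` is a hypothetical name, the default log path is an assumption, and the match string is taken from the log line quoted in this comment, so other driver versions may word it differently.

```shell
#!/bin/sh
# Hypothetical helper: report whether the intel DDX logged that it enabled
# page flipping. The default path is an assumption; pass another log file
# as the first argument if your X server logs elsewhere.
flip_status() {
    log="${1:-/var/log/Xorg.0.log}"
    if [ ! -r "$log" ]; then
        echo "unknown"          # log missing or unreadable
        return 1
    fi
    # Message text copied from the Xorg.0.log line quoted in this bug.
    if grep -q "page flipping support detected, enabling" "$log"; then
        echo "enabled"
    else
        echo "not-reported"     # message absent; flipping may be disabled
    fi
}
```

Calling `flip_status` with no argument reads the default log; `flip_status /path/to/Xorg.0.log` checks a saved copy.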

I have the same setup as Tomas regarding packages, 965GM chipset...

Comment 23 Ivan Bulatovic 2010-05-15 04:17:23 UTC

Selecting Indirect Rendering in Fusion Icon (or with --indirect-rendering switch) everything works great. This applies for both clean 2.6.34-rc7 and patched 2.6.34-rc6 with both of the above patches.

Comment 24 Tomas M. 2010-05-15 16:20:30 UTC

(In reply to comment #23)
> Selecting Indirect Rendering in Fusion Icon (or with --indirect-rendering
> switch) everything works great. This applies for both clean 2.6.34-rc7 and
> patched 2.6.34-rc6 with both of the above patches.

Yes, indirect rendering seems to fix the redrawing issues.

Anyway, I'm being hit by so many bugs in this release that I don't know how to debug them all :(

Now, when using a dual-screen setup (same system) with an external monitor plugged in, even if the screens are set up correctly (one display below the other), the laptop LVDS display starts at 0x0 instead of 0x1050. The mouse cursor can move from one screen to the next, but what is shown on the LVDS display is the same as what appears on the VGA screen.

Comment 25 Chris Wilson 2010-05-29 03:00:52 UTC

Can you please try this:

commit 44d45d3fa56f121ce89ffe5b28beb48be01a95df
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat May 29 10:39:28 2010 +0100

    dri: Use size from backing pixmap when creating buffers.
    
    This avoid using the garbage values stored in the Screen drawable,
    instead of the true values which are only maintained in its backing
    pixmap. The consequence of using the wrong size was to hand a 1x1
    pixmap to metacity/mutter and have it believe it was a full screen
    drawable; GPU hangs ensued if using page flipping.

With this fix, I no longer get an immediate hang...

Comment 26 Tomas M. 2010-05-29 06:26:40 UTC

(In reply to comment #25)
> Can you please try this:
> 
> commit 44d45d3fa56f121ce89ffe5b28beb48be01a95df
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sat May 29 10:39:28 2010 +0100
> 
>     dri: Use size from backing pixmap when creating buffers.
> 
>     This avoid using the garbage values stored in the Screen drawable,
>     instead of the true values which are only maintained in its backing
>     pixmap. The consequence of using the wrong size was to hand a 1x1
>     pixmap to metacity/mutter and have it believe it was a full screen
>     drawable; GPU hangs ensued if using page flipping.
> 
> With this fix, I no longer get an immediate hang...

I tested it and don't see any improvement here.

xf86-video-intel 2.11 + commit only.
mesa 7.8.1
xorg 1.8

To sum up:

*  --indirect-rendering fixes the redrawing issues; otherwise the screen fails to update correctly.

*  Sporadic GPU hangs (I have several dumps if you need them).

A sure way to find trouble is to test the app "hydrogen", which either hangs the GPU or draws garbled output on the screen in 100% of tests.

Comment 27 Chris Wilson 2010-05-29 06:37:26 UTC

(In reply to comment #26)
> xf86-video-intel 2.11 + commit only.

2.11 has quite a few page-flipping bugs ;-) But this patch is incorrect and I've had to back it out of the tree.

> a sure way to find trouble is to test the app "hydrogen" which can either hang
> the gpu, or draw grable on the screen. 100% of tests.

One bug at a time -- it is quite likely that hydrogen is triggering many other bugs unrelated to page-flipping, as well as perhaps tripping up on page-flipping. :(

Comment 28 Jesse Barnes 2010-06-01 13:51:40 UTC

Fix committed.  Along with the original e2615cdeef078dbd2e834b68c437f098a92b941d this should fix the hangs.

commit f2272402035574c206a0e3383c55373c440fd928
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Jun 1 13:46:15 2010 -0700

    DRI2: fix new buffer exchange check

Comment 29 Harald Judt 2010-06-17 05:08:24 UTC

Created attachment 36327
GPU dump

While it is MUCH better now, hangs still occur from time to time. I've attached a GPU dump generated when the screen is frozen, maybe it helps. One can still move around the mouse and switch away to a vt and kill X from there.

Using current libdrm, mesa and xf86-video-intel.

Comment 30 Jesse Barnes 2010-06-17 09:26:43 UTC

With the latest bits do you still see hangs when you disable page flipping (see "disable page flipping but leave events" attachment)?

If so, sounds like you're seeing a different GPU hang.  If not, are you using the latest kernel from drm-intel-next?  It has a few fixes for flipping as well...

Comment 31 Harald Judt 2010-08-03 09:03:55 UTC

With the latest kernel (2.6.35) and up-to-date packages, X still hangs after some time when rotating the cube.

With page flipping disabled, there is quite a noticeable performance drop. Therefore, I am reopening this bug.

Comment 32 Harald Judt 2010-08-03 09:06:14 UTC

Created attachment 37546
disable-page-flipping-leave-events for current xf86-video-intel git version.

I have modified the patch from comment #3 so that it applies on current xf86-video-intel git. It makes X stable again, though it is slower.

Any ideas?

Comment 33 Chris Wilson 2010-08-08 06:48:57 UTC

Let's have a fresh set of debug info to double check that we are seeing the same hang. cat /sys/kernel/debug/dri/0/(i915_error_state|i915_gem_interrupts) and dmesg.

Comment 34 Jesse Barnes 2010-08-10 14:22:00 UTC

Harald, any chance you can get the dump Chris requested so we can confirm this one?

Comment 35 Harald Judt 2010-08-11 07:57:06 UTC

Sorry for the late response, I got some work to do.

cat i915_error_state:
no error state collected

cat 0/i915_gem_interrupt:
Interrupt enable:    00028c53
Interrupt identity:  00000000
Interrupt mask:      fffc53ae
Pipe A stat:         00440303
Pipe B stat:         00400000
Interrupts received: 1358161
Current sequence:    3465346
Waiter sequence:     0
IRQ sequence:        0

I've collected all files in the /sys/kernel/debug/dri/0 and /sys/kernel/debug/dri/64 directory at the time of the hang, so if you need something else from there...

cat /sys/kernel/debug/dri/64/vma resulted in "Cannot allocate memory."

Comment 36 Chris Wilson 2010-08-11 14:45:38 UTC

(In reply to comment #35)
> cat i915_error_state:
> no error state collected
> 
> cat 0/i915_gem_interrupt:
> Waiter sequence:     0
> IRQ sequence:        0

This doesn't match my expectations for a classic invalid GPU batchbuffer, nor for a missed user interrupt. So it's not quite the same as the last hang that bit you, but it does still smell page-flip related.

[Too tired at the moment to think what the best debugging approach would be to confirm that.]

> cat /sys/kernel/debug/dri/64/vma resulted in "Cannot allocate memory."

This is a whole new level of worry. It introduces the spectre that you actually have a buffer leak (or a texture leak among your applications) and that we are hitting an allocation failure within any of the many paths that could be related to the hang.

In short, thanks for the info; it raises more questions than it answers.

Comment 37 Harald Judt 2010-08-24 04:46:30 UTC

Created attachment 38123
Updated patch to disable page flipping.

Maybe. I've tried again using current git, here's dmesg though I don't expect it is very useful.

If it was not related to page flipping, then I would have the same issues when turning it off, right? But it never hangs when it is deactivated.

Maybe the allocation problem is related to compiz leaking textures. I'm using compiz-0.8.4 with emerald and sometimes, most probably after hibernation/resume, some windows' decorations are completely white. I then restart emerald / compiz, and everything's ok again.

cat: page allocation failure. order:8, mode:0x40d0
Pid: 9197, comm: cat Not tainted 2.6.35-tuxonice #6
Call Trace:
 [<ffffffff81088da0>] __alloc_pages_nodemask+0x695/0x6df
 [<ffffffff81088dfc>] __get_free_pages+0x12/0x4f
 [<ffffffff810af026>] __kmalloc+0x38/0xb0
 [<ffffffff810cbc6a>] seq_read+0x1d1/0x32b
 [<ffffffff8102d1ff>] ? get_parent_ip+0x11/0x41
 [<ffffffff810cba99>] ? seq_read+0x0/0x32b
 [<ffffffff810f2f37>] proc_reg_read+0x8d/0xac
 [<ffffffff810b432c>] vfs_read+0xa2/0xdf
 [<ffffffff810b441f>] sys_read+0x45/0x69
 [<ffffffff810029ab>] system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
active_anon:92421 inactive_anon:37235 isolated_anon:0
 active_file:88548 inactive_file:147201 isolated_file:0
 unevictable:34401 dirty:3 writeback:4 unstable:0
 free:61057 slab_reclaimable:30181 slab_unreclaimable:4542
 mapped:59462 shmem:18097 pagetables:7027 bounce:0
DMA free:7972kB min:44kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:148kB inactive_file:7316kB unevictable:108kB isolated(anon):0kB isolated(file):0kB present:15760kB mlocked:108kB dirty:0kB writeback:0kB mapped:132kB shmem:0kB slab_reclaimable:296kB slab_unreclaimable:112kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1979 1979 1979
DMA32 free:236256kB min:5668kB low:7084kB high:8500kB active_anon:369684kB inactive_anon:148940kB active_file:354044kB inactive_file:581488kB unevictable:137496kB isolated(anon):0kB isolated(file):0kB present:2026752kB mlocked:137496kB dirty:12kB writeback:16kB mapped:237716kB shmem:72388kB slab_reclaimable:120428kB slab_unreclaimable:18056kB kernel_stack:2480kB pagetables:28108kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 7*8kB 6*16kB 4*32kB 6*64kB 3*128kB 5*256kB 5*512kB 1*1024kB 1*2048kB 0*4096kB = 7972kB
DMA32: 3816*4kB 4240*8kB 2266*16kB 1511*32kB 809*64kB 286*128kB 49*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 236256kB
291312 total pagecache pages
3085 pages in swap cache
Swap cache stats: add 96371, delete 93286, find 537480/543604
Free swap  = 2082012kB
Total swap = 2096444kB
517807 pages RAM
9658 pages reserved
340038 pages shared
182906 pages non-shared
cat: page allocation failure. order:8, mode:0x40d0
Pid: 9667, comm: cat Not tainted 2.6.35-tuxonice #6
Call Trace:
 [<ffffffff81088da0>] __alloc_pages_nodemask+0x695/0x6df
 [<ffffffff81088dfc>] __get_free_pages+0x12/0x4f
 [<ffffffff810af026>] __kmalloc+0x38/0xb0
 [<ffffffff810cbc6a>] seq_read+0x1d1/0x32b
 [<ffffffff8102d1ff>] ? get_parent_ip+0x11/0x41
 [<ffffffff810cba99>] ? seq_read+0x0/0x32b
 [<ffffffff810f2f37>] proc_reg_read+0x8d/0xac
 [<ffffffff810b432c>] vfs_read+0xa2/0xdf
 [<ffffffff810b441f>] sys_read+0x45/0x69
 [<ffffffff810029ab>] system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
active_anon:78623 inactive_anon:50745 isolated_anon:77
 active_file:71169 inactive_file:123443 isolated_file:157
 unevictable:34401 dirty:9 writeback:17499 unstable:0
 free:106379 slab_reclaimable:25112 slab_unreclaimable:5342
 mapped:55168 shmem:17728 pagetables:7159 bounce:0
DMA free:7972kB min:44kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:148kB inactive_file:7316kB unevictable:108kB isolated(anon):0kB isolated(file):0kB present:15760kB mlocked:108kB dirty:0kB writeback:0kB mapped:132kB shmem:0kB slab_reclaimable:296kB slab_unreclaimable:112kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1979 1979 1979
DMA32 free:417544kB min:5668kB low:7084kB high:8500kB active_anon:314492kB inactive_anon:202980kB active_file:284528kB inactive_file:486456kB unevictable:137496kB isolated(anon):308kB isolated(file):628kB present:2026752kB mlocked:137496kB dirty:36kB writeback:69996kB mapped:220540kB shmem:70912kB slab_reclaimable:100152kB slab_unreclaimable:21256kB kernel_stack:2464kB pagetables:28636kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 7*8kB 6*16kB 4*32kB 6*64kB 3*128kB 5*256kB 5*512kB 1*1024kB 1*2048kB 0*4096kB = 7972kB
DMA32: 11267*4kB 9600*8kB 5924*16kB 2422*32kB 1054*64kB 318*128kB 55*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 417420kB
268363 total pagecache pages
21500 pages in swap cache
Swap cache stats: add 115198, delete 93698, find 538928/545059
Free swap  = 2007544kB
Total swap = 2096444kB
517807 pages RAM
9658 pages reserved
292761 pages shared
173951 pages non-shared

Comment 38 Harald Judt 2010-08-24 08:23:00 UTC

grep $(pgrep X) /proc/dri/0/vma | wc -l
36203
...
37988

The number never decreases.
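
To show the growth over time rather than two isolated readings, the same count can be sampled at intervals. This is a sketch under assumptions: `count_vma_entries` and `sample_vma` are hypothetical names, and the default path is simply the one used in the command above.

```shell
#!/bin/sh
# count_vma_entries PID [FILE]: number of lines in FILE that mention PID,
# equivalent to "grep PID FILE | wc -l" from the command above.
count_vma_entries() {
    grep -c -- "$1" "${2:-/proc/dri/0/vma}"
}

# sample_vma PID FILE [SAMPLES] [INTERVAL_SECONDS]: print one count per sample.
sample_vma() {
    samples="${3:-5}"
    interval="${4:-60}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        count_vma_entries "$1" "$2"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_vma "$(pgrep X)" /proc/dri/0/vma 10 60` would print ten counts one minute apart; a steadily rising series would support the leak theory.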

Comment 39 Tassilo Horn 2010-09-05 01:10:25 UTC

Hi!  I own the same laptop as Harald with the same video card: VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)

I also face random display freezes that I can only "fix" by killing and restarting X.  I use KDE 4.5, and if desktop effects are enabled, I can be sure to reproduce a freeze in less than 3 minutes.

I run a 2.6.35.5 kernel and the git master HEADs of libdrm, mesa, xf86-video-intel, and xorg-server, and I'm happy to try out anything that might help to get that bug fixed.  

(Since the release of the intel driver version 2.11.0 my system is unstable.  2.1{1,2}.0 give me hard lockups of the whole system, while the git versions only produce display freezes.  Kinda sad that I recognize that as a huge improvement...)

OK, enough whining.  Now, in desperation, I enabled desktop effects to get another freeze, and when that happened, dmesg printed these error messages:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17559 at 17552)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17567 at 17562)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17574 at 17569)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17581 at 17576)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
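
The sequence numbers in these messages can be pulled out mechanically when comparing several hangs. The sketch below assumes the "(awaiting N at M)" wording shown above; `seqno_gaps` is a hypothetical name, and reading the difference as "how far the GPU was behind the awaited request" is an interpretation, not something the log states. The -5 return value corresponds to EIO on Linux.

```shell
#!/bin/sh
# seqno_gaps: read dmesg text on stdin and print "awaited current gap"
# for every i915_do_wait_request line of the form "(awaiting N at M)".
seqno_gaps() {
    sed -n 's/.*awaiting \([0-9][0-9]*\) at \([0-9][0-9]*\)).*/\1 \2/p' |
    while read -r want have; do
        echo "$want $have $((want - have))"
    done
}
```

Typical use would be `dmesg | seqno_gaps`; lines that do not match the pattern are ignored.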

I attach a tarball with the contents of /sys/kernel/debug/dri/0/.  The i915_error_state contains information, and maybe the other files are useful, too.

(BTW: I have ../dri/0/ and ../dri/64/.  Diffing the dirs shows only a difference in the name file where the 0 one has "i915 0000:00:02.0 pci:0000:00:02.0" and the 64 one is empty.  So why are there two directories?)

Comment 40 Tassilo Horn 2010-09-05 01:14:12 UTC

Created attachment 38447
Contents of /sys/kernel/debug/dri/0/ when frozen

Comment 41 Chris Wilson 2010-09-06 10:38:09 UTC

In http://cgit.freedesktop.org/~ickle/drm-intel/log/ drm-intel-fixes is a patch that should detect (and fixup) one observed instance of missed interrupt -> hang:

http://cgit.freedesktop.org/~ickle/drm-intel/commit/?id=11bb865c00bf90fa10e5fde58841c37b79683881

and drm-intel-next adds a check to hangcheck that should also break any hangs (but probably won't prevent the stutter).

Comment 42 Tassilo Horn 2010-09-06 13:15:21 UTC

(In reply to comment #41)
> In http://cgit.freedesktop.org/~ickle/drm-intel/log/ drm-intel-fixes is a patch
> that should detect (and fixup) one observed instance of missed interrupt ->
> hang:
> 
> http://cgit.freedesktop.org/~ickle/drm-intel/commit/?id=11bb865c00bf90fa10e5fde58841c37b79683881
> 
> and drm-intel-next adds a check to hangcheck that should also break any hangs
> (but probably wont prevent the stutter).

The patch didn't apply completely against my 2.6.35.5 kernel, but I managed to manually perform the changes that failed (hopefully).  It compiled, and since I started running the patched kernel I have had no more freezes.  But I only did some brief testing.  I'll report back once I have had it running for some time...

Comment 43 Tassilo Horn 2010-09-07 11:37:48 UTC

(In reply to comment #41)
> In http://cgit.freedesktop.org/~ickle/drm-intel/log/ drm-intel-fixes is a patch
> that should detect (and fixup) one observed instance of missed interrupt ->
> hang:

Now I'm using the drm-intel-fixes branch of your kernel, and since then I have not had any freezes.  I do have some other issues now, like artifacts, invisible context menus and borked thumbnails with kwin, but those may also be caused by my downgrade of all X stuff to the latest releases shipped by Gentoo.

Comment 44 Chris Wilson 2010-09-07 11:42:48 UTC

Thanks for testing! The kwin artifacts do sound more like issues in our GL/X drivers. One step at a time...

Comment 45 Tassilo Horn 2010-09-07 23:25:03 UTC

(In reply to comment #44)
> Thanks for testing!

I'm happy to do so.  In the meantime, I had some 1 or 2 second hangs, but still no freeze.  Into which official kernel will these fixes be introduced?

> The kwin artifacts do sound more like issues in our GL/X
> drivers. One step at a time...

Sure.

Comment 46 Chris Wilson 2010-09-08 00:45:51 UTC

I didn't mark the page-flip freeze detect patch as for-stable (as I didn't think it would apply to earlier kernels without porting) so that will be first available in 2.6.36. The rest of the patches were candidates for-stable and have included several page-flipping hang fixes in the past.

Can you file a fresh bug for the kwin stutters and attach Xorg.0.log and dmesg [drm.debug=0xe] for that period? I suspect they may be related to bug 30073, but it is too early to tell.

Comment 47 Harald Judt 2010-09-08 02:12:41 UTC

I also applied the patch to 2.6.36-rc3 (where parts of it have already been applied?) and the issue did not occur any more. I still had memory leaks, though.

I'm afraid I cannot follow this issue any more, because I got a new laptop now which doesn't have an intel gpu. I was quite amazed about the performance and functionality improvements of the intel driver, though.

Thanks for your help.

Comment 48 Jesse Barnes 2010-09-10 12:57:15 UTC

I think the hang itself is fixed.
