Summary: | [page flipping] hangs on 965 and 945 | ||
---|---|---|---|
Product: | DRI | Reporter: | Harald Judt <h.judt> |
Component: | DRM/Intel | Assignee: | Jesse Barnes <jbarnes> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | critical | ||
Priority: | high | CC: | chris, mrgrim, tmezzadra |
Version: | XOrg git | Keywords: | NEEDINFO |
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Description
Harald Judt
2010-04-29 01:12:07 UTC
Created attachment 35328 [details]
GPU dump frozen X (BZip2)
Created attachment 35334 [details]
GPU dump with mesa-7.8.1
This time, with mesa-7.8.1 everything froze during cube rotation. SSH still functional, GPU dump attached.
Next step: I'm trying x11-drivers/xf86-video-intel-2.10.0 now, if this doesn't work I'll go back to mesa-7.7, which definitely worked.
Created attachment 35367 [details] [review]
disable page flipping but leave events

Does this patch work around the hang?

Yes, the patch seems to help (xf86-video-intel-2.11.0, mesa-7.8.1), no crashes so far.

i am suffering from a similar bug, better described here:
https://bugs.freedesktop.org/show_bug.cgi?id=27420
the fix proposed there didnt help me. disabling page flipping seems to fix it
mesa 7.8.1
xorg 1.8
xf86-video-intel 2.11
compiz enabled

and of course, i forgot: intel GMA945 hardware.

There are a couple of other patches that might help. Can you re-enable page flipping in your DDX driver and try these kernel patches?
https://patchwork.kernel.org/patch/90682/
https://patchwork.kernel.org/patch/90683/

(In reply to comment #7)
> There are a couple of other patches that might help. Can you re-enable page
> flipping in your DDX driver and try these kernel patches?
>
> https://patchwork.kernel.org/patch/90682/
> https://patchwork.kernel.org/patch/90683/

the first patch fails to apply to the 2.6.33 kernel series. trying the second patch (will report back when done). against what kernel should i apply the first patch?

The first patch just needs a little massaging; it should mostly apply to current kernels, but things have moved around a little. I'm not sure how important it is in isolation though...

(In reply to comment #9)
> The first patch just needs a little massaging; it should mostly apply to
> current kernels, but things have moved around a little. I'm not sure how
> important it is in isolation though...

the second patch broke the kernel (didnt allow it to modeset before it stack traced on me), kernel 2.6.33
gonna try and apply the first patch

the first patch fails to build. IS_GEN3 isnt defined.
drivers/gpu/drm/i915/i915_dma.c: In function ‘i915_load_modeset_init’:
drivers/gpu/drm/i915/i915_dma.c:1263:9: error: implicit declaration of function ‘IS_GEN3’
make[4]: *** [drivers/gpu/drm/i915/i915_dma.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make: *** [drivers] Error 2
after reading a bit, realized those macros were included in 2.6.34 (soon to be backported to 2.6.33.4)

got the first patch in, it appears to fix the redrawing problems concerning the terminal. but while typing this, i realized some characters still fail to render (even if they are there). glxgears gets close to the 60ish framerate i should expect. i got another kernel waiting which includes both patches, should i test that too?

Yes, I'm curious if both patches help even more

(In reply to comment #13)
> Yes, I'm curious if both patches help even more

both patches applied dont help more.
1st patch: (90682) helps somewhat.
1st patch + 2nd patch: == 1st patch only
2nd patch only: broken kernel (??)
extra info: with time, the redrawing issues seem to get worse, until a gpu hang. im mounting the debug fs, will collect the error state and will add a gpu dump when i get one. im not sure if this should be filed as a different bug. just let me know.

Created attachment 35595 [details]
i915 error state
Created attachment 35596 [details]
intel gpu dump
it seems ive been suffering from a multiple bug syndrome. both patches applied + patch on comment 12 from here:
https://bugs.freedesktop.org/show_bug.cgi?id=27883
seems to fix all the rendering corruptions ive been having. i had already tested that patch before without your patches and the issues still remained. im not sure if your second patch is of some use, will most likely test this during the weekend. the hang issue might still be there, will report back if i suffer another one.

Created attachment 35661 [details]
screen corruption with patches
corrupted screen after several hours of usage + suspend/resume cycles, first patch + mesa patch, kernel 2.6.33.4
Created attachment 35662 [details]
the rotated cube carries the broken background
the corrupted background gets moved with the cube
Created attachment 35663 [details]
hydrogen drum machine broke terribly
title says it all.
(In reply to comment #13)
> Yes, I'm curious if both patches help even more

Applied both patches on 2.6.34-rc6 and the problem remains (without mesa fix btw).

uname -a
Linux vostro 2.6.34-rc6-RC #2 SMP PREEMPT Sat May 15 12:12:50 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T5670 @ 1.80GHz GenuineIntel GNU/Linux

cat /var/log/Xorg.0.log | grep flip
[ 30.960] (II) intel(0): Kernel page flipping support detected, enabling

I have the same setup as Tomas regarding packages, 965GM chipset...

Selecting Indirect Rendering in Fusion Icon (or with --indirect-rendering switch) everything works great. This applies for both clean 2.6.34-rc7 and patched 2.6.34-rc6 with both of the above patches.

(In reply to comment #23)
> Selecting Indirect Rendering in Fusion Icon (or with --indirect-rendering
> switch) everything works great. This applies for both clean 2.6.34-rc7 and
> patched 2.6.34-rc6 with both of the above patches.

yes, indirect rendering seems to fix redrawing issues.

anyway, im being hit by so many bugs in this release i dont know how to debug them all :(

now when using a dual screen setup (same system) with an external monitor plugged in, even if the screen is set up correctly (one display below the other one), the laptop LVDS display appears from 0x0 instead of 0x1050. (mouse cursor can move from one screen to the next, but whats shown in the LVDS display is the same as what appears in the VGA screen.)

Can you please try this:

commit 44d45d3fa56f121ce89ffe5b28beb48be01a95df
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat May 29 10:39:28 2010 +0100

dri: Use size from backing pixmap when creating buffers.

This avoid using the garbage values stored in the Screen drawable, instead of the true values which are only maintained in its backing pixmap. The consequence of using the wrong size was to hand a 1x1 pixmap to metacity/mutter and have it believe it was a full screen drawable; GPU hangs ensued if using page flipping.

With this fix, I no longer get an immediate hang...

(In reply to comment #25)
> Can you please try this:
>
> commit 44d45d3fa56f121ce89ffe5b28beb48be01a95df
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Sat May 29 10:39:28 2010 +0100
>
> dri: Use size from backing pixmap when creating buffers.
>
> This avoid using the garbage values stored in the Screen drawable,
> instead of the true values which are only maintained in its backing
> pixmap. The consequence of using the wrong size was to hand a 1x1
> pixmap to metacity/mutter and have it believe it was a full screen
> drawable; GPU hangs ensued if using page flipping.
>
> With this fix, I no longer get an immediate hang...

tested it out, i dont see any improvements here.
xf86-video-intel 2.11 + commit only.
mesa 7.8.1
xorg 1.8

to sum up:
* --indirect-rendering fixes redrawing issues, otherwise the screen fails to update correctly.
* sporadic gpu hangs (got several dumps if you need them)

a sure way to find trouble is to test the app "hydrogen" which can either hang the gpu, or draw garble on the screen. 100% of tests.

(In reply to comment #26)
> xf86-video-intel 2.11 + commit only.

2.11 has quite a few page-flipping bugs ;-) But this patch is incorrect and I've had to back it out of the tree.

> a sure way to find trouble is to test the app "hydrogen" which can either hang
> the gpu, or draw garble on the screen. 100% of tests.

One bug at a time -- it is quite likely that hydrogen is triggering many other bugs unrelated to page-flipping, as well as perhaps tripping up on page-flipping. :(

Fix committed. Along with the original e2615cdeef078dbd2e834b68c437f098a92b941d this should fix the hangs.

commit f2272402035574c206a0e3383c55373c440fd928
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Tue Jun 1 13:46:15 2010 -0700

DRI2: fix new buffer exchange check

Created attachment 36327 [details]
GPU dump
While it is MUCH better now, hangs still occur from time to time. I've attached a GPU dump generated while the screen was frozen; maybe it helps. One can still move the mouse around, switch away to a VT and kill X from there.
Using current libdrm, mesa and xf86-video-intel.
With the latest bits do you still see hangs when you disable page flipping (see "disable page flipping but leave events" attachment)? If so, sounds like you're seeing a different GPU hang. If not, are you using the latest kernel from drm-intel-next? It has a few fixes for flipping as well...

With the latest kernel 2.6.35 and up-to-date packages X still hangs after some time when rotating the cube. With page flipping disabled, there is quite a noticeable performance drop. Therefore, I reopen this bug.

Created attachment 37546 [details] [review]
disable-page-flipping-leave-events for current xf86-video-intel git version.

I have modified the patch from comment #3 so that it applies on current xf86-video-intel git. It makes X stable again, though it is slower. Any ideas?

Let's have a fresh set of debug info to double check that we are seeing the same hang. cat /sys/kernel/debug/dri/0/(i915_error_state|i915_gem_interrupts) and dmesg.

Harald, any chance you can get the dump Chris requested so we can confirm this one?

Sorry for the late response, I got some work to do.

cat i915_error_state:
no error state collected

cat 0/i915_gem_interrupt:
Interrupt enable: 00028c53
Interrupt identity: 00000000
Interrupt mask: fffc53ae
Pipe A stat: 00440303
Pipe B stat: 00400000
Interrupts received: 1358161
Current sequence: 3465346
Waiter sequence: 0
IRQ sequence: 0

I've collected all files in the /sys/kernel/debug/dri/0 and /sys/kernel/debug/dri/64 directory at the time of the hang, so if you need something else from there...

cat /sys/kernel/debug/dri/64/vma resulted in "Cannot allocate memory."

(In reply to comment #35)
> cat i915_error_state:
> no error state collected
>
> cat 0/i915_gem_interrupt:
> Waiter sequence: 0
> IRQ sequence: 0

This doesn't match my expectations for a classic invalid GPU batchbuffer nor for a missed user interrupt. So it's not quite the same as the last hang that bit you, but it does still smell page-flip related.
[Too tired at the moment to think what the best debugging approach would be to confirm that.]

> cat /sys/kernel/debug/dri/64/vma resulted in "Cannot allocate memory."

This is a whole new level of worry. And introduces the spectre that you actually have a buffer leak (or texture leak among your applications) and we are hitting an allocation failure within any of the many paths that could be related to the hang.

In short, thanks for info, it raises more questions than it answers.

Created attachment 38123 [details] [review]
Updated patch to disable page flipping.

Maybe. I've tried again using current git, here's dmesg though I don't expect it is very useful. If it was not related to page flipping, then I would have the same issues when turning it off, right? But it never hangs when it is deactivated.

Maybe the allocation problem is related to compiz leaking textures. I'm using compiz-0.8.4 with emerald and sometimes, most probably after hibernation/resume, some windows' decorations are completely white. I then restart emerald / compiz, and everything's ok again.

cat: page allocation failure. order:8, mode:0x40d0
Pid: 9197, comm: cat Not tainted 2.6.35-tuxonice #6
Call Trace:
[<ffffffff81088da0>] __alloc_pages_nodemask+0x695/0x6df
[<ffffffff81088dfc>] __get_free_pages+0x12/0x4f
[<ffffffff810af026>] __kmalloc+0x38/0xb0
[<ffffffff810cbc6a>] seq_read+0x1d1/0x32b
[<ffffffff8102d1ff>] ? get_parent_ip+0x11/0x41
[<ffffffff810cba99>] ? seq_read+0x0/0x32b
[<ffffffff810f2f37>] proc_reg_read+0x8d/0xac
[<ffffffff810b432c>] vfs_read+0xa2/0xdf
[<ffffffff810b441f>] sys_read+0x45/0x69
[<ffffffff810029ab>] system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
CPU 1: hi: 186, btch: 31 usd: 0
active_anon:92421 inactive_anon:37235 isolated_anon:0
active_file:88548 inactive_file:147201 isolated_file:0
unevictable:34401 dirty:3 writeback:4 unstable:0
free:61057 slab_reclaimable:30181 slab_unreclaimable:4542
mapped:59462 shmem:18097 pagetables:7027 bounce:0
DMA free:7972kB min:44kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:148kB inactive_file:7316kB unevictable:108kB isolated(anon):0kB isolated(file):0kB present:15760kB mlocked:108kB dirty:0kB writeback:0kB mapped:132kB shmem:0kB slab_reclaimable:296kB slab_unreclaimable:112kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1979 1979 1979
DMA32 free:236256kB min:5668kB low:7084kB high:8500kB active_anon:369684kB inactive_anon:148940kB active_file:354044kB inactive_file:581488kB unevictable:137496kB isolated(anon):0kB isolated(file):0kB present:2026752kB mlocked:137496kB dirty:12kB writeback:16kB mapped:237716kB shmem:72388kB slab_reclaimable:120428kB slab_unreclaimable:18056kB kernel_stack:2480kB pagetables:28108kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 7*8kB 6*16kB 4*32kB 6*64kB 3*128kB 5*256kB 5*512kB 1*1024kB 1*2048kB 0*4096kB = 7972kB
DMA32: 3816*4kB 4240*8kB 2266*16kB 1511*32kB 809*64kB 286*128kB 49*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 236256kB
291312 total pagecache pages
3085 pages in swap cache
Swap cache stats: add 96371, delete 93286, find 537480/543604
Free swap = 2082012kB
Total swap = 2096444kB
517807 pages RAM
9658 pages reserved
340038 pages shared
182906 pages non-shared

cat: page allocation failure. order:8, mode:0x40d0
Pid: 9667, comm: cat Not tainted 2.6.35-tuxonice #6
Call Trace:
[<ffffffff81088da0>] __alloc_pages_nodemask+0x695/0x6df
[<ffffffff81088dfc>] __get_free_pages+0x12/0x4f
[<ffffffff810af026>] __kmalloc+0x38/0xb0
[<ffffffff810cbc6a>] seq_read+0x1d1/0x32b
[<ffffffff8102d1ff>] ? get_parent_ip+0x11/0x41
[<ffffffff810cba99>] ? seq_read+0x0/0x32b
[<ffffffff810f2f37>] proc_reg_read+0x8d/0xac
[<ffffffff810b432c>] vfs_read+0xa2/0xdf
[<ffffffff810b441f>] sys_read+0x45/0x69
[<ffffffff810029ab>] system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
CPU 1: hi: 186, btch: 31 usd: 0
active_anon:78623 inactive_anon:50745 isolated_anon:77
active_file:71169 inactive_file:123443 isolated_file:157
unevictable:34401 dirty:9 writeback:17499 unstable:0
free:106379 slab_reclaimable:25112 slab_unreclaimable:5342
mapped:55168 shmem:17728 pagetables:7159 bounce:0
DMA free:7972kB min:44kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:148kB inactive_file:7316kB unevictable:108kB isolated(anon):0kB isolated(file):0kB present:15760kB mlocked:108kB dirty:0kB writeback:0kB mapped:132kB shmem:0kB slab_reclaimable:296kB slab_unreclaimable:112kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1979 1979 1979
DMA32 free:417544kB min:5668kB low:7084kB high:8500kB active_anon:314492kB inactive_anon:202980kB active_file:284528kB inactive_file:486456kB unevictable:137496kB isolated(anon):308kB isolated(file):628kB present:2026752kB mlocked:137496kB dirty:36kB writeback:69996kB mapped:220540kB shmem:70912kB slab_reclaimable:100152kB slab_unreclaimable:21256kB kernel_stack:2464kB pagetables:28636kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 7*8kB 6*16kB 4*32kB 6*64kB 3*128kB 5*256kB 5*512kB 1*1024kB 1*2048kB 0*4096kB = 7972kB
DMA32: 11267*4kB 9600*8kB 5924*16kB 2422*32kB 1054*64kB 318*128kB 55*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 417420kB
268363 total pagecache pages
21500 pages in swap cache
Swap cache stats: add 115198, delete 93698, find 538928/545059
Free swap = 2007544kB
Total swap = 2096444kB
517807 pages RAM
9658 pages reserved
292761 pages shared
173951 pages non-shared

grep $(pgrep X) /proc/dri/0/vma | wc -l
36203
...
37988

The number never decreases.

Hi!

I own the same laptop as Harald with the same video card:

VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)

I also face random display freezes that I can only "fix" by killing and restarting X. I use KDE 4.5, and if desktop effects are enabled, I can be sure to reproduce a freeze in less than 3 minutes.

I run a 2.6.35.5 kernel and the git master HEADs of libdrm, mesa, xf86-video-intel, and xorg-server, and I'm happy to try out anything that might help to get that bug fixed.

(Since the release of the intel driver version 2.11.0 my system is unstable. 2.1{1,2}.0 give me hard lockups of the whole system, while the git versions only produce display freezes. Kinda sad that I recognize that as a huge improvement...)

Ok, enough whining.

Now I desperately enabled desktop effects to get another freeze, and when that happened, dmesg printed these error messages:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17559 at 17552)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17567 at 17562)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17574 at 17569)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17581 at 17576)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

I attach a tarball with the contents of /sys/kernel/debug/dri/0/. The i915_error_state contains information, and maybe the other files are useful, too.

(BTW: I have ../dri/0/ and ../dri/64/. Diffing the dirs shows only a difference in the name file where the 0 one has "i915 0000:00:02.0 pci:0000:00:02.0" and the 64 one is empty. So why are there two directories?)

Created attachment 38447 [details]
Contents of /sys/kernel/debug/dri/0/ when frozen
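
For readers comparing their own dmesg output against the hangcheck messages quoted in this report, the following is a minimal sketch that extracts the numbers from the `i915_do_wait_request` lines. It assumes the message layout is "(awaiting <requested sequence number> at <sequence number reached>)", which is an interpretation of the log text rather than something confirmed in this thread; the helper names `parse_wait_failures` and `count_hangchecks` are made up for illustration. The return value -5 corresponds to -EIO on Linux.

```python
import re

# Sample lines copied from the dmesg output quoted in this report.
LOG = """\
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17559 at 17552)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17567 at 17562)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17574 at 17569)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 17581 at 17576)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
"""

# Assumed layout: "returns <errno> (awaiting <wanted> at <reached>)".
WAIT_RE = re.compile(r"returns (-?\d+) \(awaiting (\d+) at (\d+)\)")


def parse_wait_failures(text):
    """Return (errno, awaited, reached, gap) for every failed wait in text."""
    rows = []
    for match in WAIT_RE.finditer(text):
        ret, awaited, reached = (int(group) for group in match.groups())
        rows.append((ret, awaited, reached, awaited - reached))
    return rows


def count_hangchecks(text):
    """Count how many times the hangcheck timer reported a hung GPU."""
    return text.count("Hangcheck timer elapsed")


if __name__ == "__main__":
    for ret, awaited, reached, gap in parse_wait_failures(LOG):
        print(f"ret={ret} awaited={awaited} reached={reached} gap={gap}")
    print("hangcheck reports:", count_hangchecks(LOG))
```

Under that reading, the reached number advances between reports (17552, 17562, 17569, 17576), so requests appear to complete between hang reports while each wait still fails a few requests short of its target.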
In http://cgit.freedesktop.org/~ickle/drm-intel/log/ drm-intel-fixes is a patch that should detect (and fixup) one observed instance of missed interrupt -> hang:

http://cgit.freedesktop.org/~ickle/drm-intel/commit/?id=11bb865c00bf90fa10e5fde58841c37b79683881

and drm-intel-next adds a check to hangcheck that should also break any hangs (but probably won't prevent the stutter).

(In reply to comment #41)
> In http://cgit.freedesktop.org/~ickle/drm-intel/log/ drm-intel-fixes is a patch
> that should detect (and fixup) one observed instance of missed interrupt ->
> hang:
>
> http://cgit.freedesktop.org/~ickle/drm-intel/commit/?id=11bb865c00bf90fa10e5fde58841c37b79683881
>
> and drm-intel-next adds a check to hangcheck that should also break any hangs
> (but probably won't prevent the stutter).

The patch didn't apply completely against my 2.6.35.5 kernel, but I managed to manually perform the changes that failed (hopefully). It compiled, and since I've been running the patched kernel I have had no more freezes. But I only did some brief testing. I'll report back when I've had it running for some time...

(In reply to comment #41)
> In http://cgit.freedesktop.org/~ickle/drm-intel/log/ drm-intel-fixes is a patch
> that should detect (and fixup) one observed instance of missed interrupt ->
> hang:

Now I'm using the drm-intel-fixes branch of your kernel, and since then I have not had any freezes. I do have some other issues now like artifacts, invisible context menus and borked thumbnails with kwin, but that may also be caused by my downgrade of all X stuff to the latest releases shipped by gentoo.

Thanks for testing! The kwin artifacts do sound more like issues in our GL/X drivers. One step at a time...

(In reply to comment #44)
> Thanks for testing!

I'm happy to do so. In the meantime, I had some 1 or 2 second hangs, but still no freeze. Into which official kernel will these fixes be introduced?

> The kwin artifacts do sound more like issues in our GL/X
> drivers. One step at a time...

Sure.

I didn't mark the page-flip freeze detect patch as for-stable (as I didn't think it would apply to earlier kernels without porting) so that will be first available in 2.6.36. The rest of the patches were candidates for-stable and have included several page-flipping hang fixes in the past.

Can you file a fresh bug for the kwin stutters and attach Xorg.0.log and dmesg [drm.debug=0xe] for that period? I suspect they may be related to bug 30073 but it is too early to tell.

I also applied the patch to 2.6.36-rc3 (where parts of it have already been applied?) and the issue did not occur any more. I still had memory leaks, though.

I'm afraid I cannot follow this issue any more, because I got a new laptop now which doesn't have an intel gpu. I was quite amazed about the performance and functionality improvements of the intel driver, though. Thanks for your help.

I think the hang itself is fixed.
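
As a reading aid for the "page allocation failure. order:8" dumps quoted earlier in this report, here is a small sketch of the arithmetic behind them. It assumes 4 kB pages (the usual size on x86-64) and that an order-n request asks for 2^n physically contiguous pages; the function names are hypothetical and the sample line is copied from the first dump. It only summarises the per-zone free list and does not model other factors, such as zone watermarks, that can make an allocation fail even when a large enough block is listed.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as is usual on x86-64

# DMA32 free-list line copied from the first allocation-failure dump.
DMA32_LINE = (
    "DMA32: 3816*4kB 4240*8kB 2266*16kB 1511*32kB 809*64kB 286*128kB "
    "49*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 236256kB"
)


def order_to_kb(order, page_kb=PAGE_KB):
    """Size in kB of one contiguous block of the given allocation order."""
    return (1 << order) * page_kb


def parse_free_list(line):
    """Parse '<count>*<size>kB' entries into a {block_size_kb: count} dict."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def total_kb(blocks):
    """Total free memory described by the free list, in kB."""
    return sum(size * count for size, count in blocks.items())


def blocks_at_least(blocks, need_kb):
    """Number of free blocks large enough to hold need_kb contiguously."""
    return sum(count for size, count in blocks.items() if size >= need_kb)


if __name__ == "__main__":
    need = order_to_kb(8)
    blocks = parse_free_list(DMA32_LINE)
    print("order-8 request size:", need, "kB")
    print("free memory in zone:", total_kb(blocks), "kB")
    print("blocks that could satisfy it:", blocks_at_least(blocks, need))
```

The point of the sketch is that an order-8 request needs one contiguous 1024 kB block, and almost all of the free memory listed for this zone sits in much smaller blocks.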