Summary: | [SNA HD3000] mplayer -vo gl -fs nukes vblank | |
---|---|---|---
Product: | xorg | Reporter: | Sami Farin <hvtaifwkbgefbaei>
Component: | Driver/intel | Assignee: | Chris Wilson <chris>
Status: | RESOLVED DUPLICATE | QA Contact: | Xorg Project Team <xorg-team>
Severity: | major | |
Priority: | medium | |
Version: | git | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Attachments: |
|
Created attachment 54744 [details]
Xorg.0.log
2.17.0-216-g56e9727, what is that? Can you confirm using upstream whilst I try to reproduce?

I was in different branch, but it was tracking master and I had no modifications except disabling legacy i810. latest commit was 84d97bdba02b909369b54de21425ffc9f6ad581a

Also xorg-x11-server-Xorg-1.11.99.1-8 is extremely buggy. Is there an update in fedora yet?

It is unfortunate it is extremely buggy still after 25 years, but here the results for xserver git e7df42ab68e30588a5e32ed543b0711821daf009: with -vo gl, screen does not get corrupted anymore. however, when I was just using gnome-terminal, I got this:

[ 1039.951478] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1039.951545] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1039.953924] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 41006 at 41004, next 41007)

also, screen got corrupted while I was only typing in gnome-terminal. (jee.png)

I now had wmcpu running, maybe it freaked out the driver(s). I killed it, hoping I don't need to reboot so often.

Created attachment 54747 [details]
intel_reg_dumper after gpu hang
Created attachment 54748 [details]
i915_error_state after gpu hang
Created attachment 54749 [details]
screen corruption
If you attach the /sys/kernel/debug/dri/0/i915_error_state I can tell you which bug you hit next...

2011-12-23 16:24:57.731957973 <3>[ 3316.523105] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
2011-12-23 16:24:57.731959168 <3>[ 3316.523123] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 172391 at 172387, next 172392)
2011-12-23 16:26:26.411963994 <3>[ 3405.141176] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
2011-12-23 16:26:26.411965043 <3>[ 3405.141198] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 177735 at 177731, next 177736)

intel_reg_dumper and error state output is the same as in the attached files

Sorry missed the error state you posted. That looks like the new code to produce more compact batch buffers is at fault.

This should fix that last GPU hang:

commit 98f15fc61361b7f1e01969f8d4237c13e93e3fb0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Dec 23 17:25:42 2011 +0000

sna: Don't align pwrite to cachelines for doing discontiguous copies

The batch compaction breaks the 1:1 mapping between the cpu buffer and the bo, so we can no longer safely align the transfer to whole cachelines.

References: https://bugs.freedesktop.org/show_bug.cgi?id=44091
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

That was some debugging from a bunch of hex numbers ;) That fix definitely made a stability improvement, thank you. There's a new drawing bug, though, please see sna-drawing-chrome.png (xrefresh does not fix it)

Created attachment 54753 [details]
sna-drawing-chrome.png
with sna , xf86-video-intel 98f15fc61361b7f1e01969f8d4237c13e93e3fb0
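
The commit message for 98f15fc6 above says that aligning a pwrite to whole cachelines is only safe while the CPU buffer and the bo map 1:1. A minimal sketch of what such alignment does to a byte range may make that clearer. This is not the driver's code: the 64-byte cacheline size and the function name `align_to_cachelines` are assumptions made for illustration.

```python
CACHELINE = 64  # assumed cacheline size in bytes, for illustration only


def align_to_cachelines(offset, length, cacheline=CACHELINE):
    """Widen the byte range [offset, offset + length) to whole cachelines.

    Returns the widened (offset, length). The widened range covers bytes
    the caller never asked to write.
    """
    start = offset - (offset % cacheline)          # round start down
    end = -(-(offset + length) // cacheline) * cacheline  # round end up
    return start, end - start


# A 10-byte write at offset 100 turns into a 64-byte write at offset 64.
start, size = align_to_cachelines(100, 10)
print(start, size)
```

The extra bytes on either side are harmless only if the source buffer holds the right data for those offsets too. Once batches are compacted, the same offsets in the CPU buffer and the bo no longer describe the same data, which is presumably why the commit stops widening the transfer.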
this with efc8d04fc114e9966e5ca00600f9663ecf03a5ca

Backtrace:
0: X (xorg_backtrace+0x26) [0x472d26]
1: X (0x400000+0x78229) [0x478229]
2: /lib64/libpthread.so.0 (0x300c800000+0xf4f0) [0x300c80f4f0]
3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fe110268000+0x50a57) [0x7fe1102b8a57]
4: X (miCopyRegion+0x18a) [0x58339a]
5: X (miDoCopy+0x392) [0x583892]
6: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fe110268000+0x50312) [0x7fe1102b8312]
7: X (0x400000+0x12923f) [0x52923f]
8: X (0x400000+0x31923) [0x431923]
9: X (0x400000+0x358a1) [0x4358a1]
10: X (0x400000+0x22eca) [0x422eca]
11: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3004c2169d]
12: X (0x400000+0x2322d) [0x42322d]
Segmentation fault at address 0x20
Fatal server error: Caught signal 11 (Segmentation fault). Server aborting

hmm not very useful because I had optimizations enabled

$ addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so.2.17.0-efc8d04fc114e9966e5ca00600f9663ecf03a5ca 0x50312
sna_accel.c:0

Ok, that should be fixed with

commit b86e4f59299f935d5a0ea8375da97e6fc57571f9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Dec 24 11:45:27 2011 +0000

sna: Check that the copy dst is attached before replacing damage

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Created attachment 54848 [details]
sna-git-655a96cd5f-xpdf.png
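
The addr2line call in the crash report above is given the module-relative offset (0x50312), not the absolute address from the backtrace. The following is a rough sketch of pulling those offsets out of an Xorg backtrace. The regular expression and the helper name `module_offsets` are illustrative assumptions based only on the frame format seen in this report.

```python
import re

# Matches frames such as:
#   /path/intel_drv.so (0x7fe110268000+0x50a57) [0x7fe1102b8a57]
# Frames given as symbol+offset, e.g. "X (miCopyRegion+0x18a)", do not match.
FRAME = re.compile(
    r"(?P<module>\S+) \((?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\)"
    r" \[(?P<addr>0x[0-9a-f]+)\]"
)


def module_offsets(backtrace, module_suffix):
    """Return module-relative offsets (hex strings) for frames in one module."""
    offsets = []
    for m in FRAME.finditer(backtrace):
        if not m.group("module").endswith(module_suffix):
            continue
        base = int(m.group("base"), 16)
        offset = int(m.group("offset"), 16)
        addr = int(m.group("addr"), 16)
        # Sanity check: load base plus offset must equal the absolute address.
        if base + offset == addr:
            offsets.append(hex(offset))
    return offsets


trace = (
    "3: /usr/lib64/xorg/modules/drivers/intel_drv.so "
    "(0x7fe110268000+0x50a57) [0x7fe1102b8a57] "
    "4: X (miCopyRegion+0x18a) [0x58339a] "
    "6: /usr/lib64/xorg/modules/drivers/intel_drv.so "
    "(0x7fe110268000+0x50312) [0x7fe1102b8312]"
)
print(" ".join(module_offsets(trace, "intel_drv.so")))
```

The printed offsets can be passed to `addr2line -e <module>` as in the report. As the reporter notes, the result is only as good as the debug information in the binary; an optimized build may resolve to something like `sna_accel.c:0`.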
SNA git-655a96cd5f seems pretty stable now, but xpdf has some oddities (and space is missing between the words in the menus) and e.g. xscreensaver has also spaces missing (it says "Pleaseenteryourpassword."). I did not try whether UXA works.

Oops, discarding spaces is not a good idea after all.

commit 8fc21328a0bdf87fde35d68d2b27834011acde7b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Dec 27 15:26:46 2011 +0000

sna: Don't discard empty glyphs, just skip them

A space is encoded as a 1x1 blank glyph, but we still need to advance by its character width and so we cannot simply discard the glyph.

References: https://bugs.freedesktop.org/show_bug.cgi?id=44091
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

8fc21328a0bdf87fde35d68d2b27834011acde7b fixed xpdf drawing issues, except that the spaces are still missing (also for xscreensaver).

Also, when I do "mplayer -vo gl -fs *" , and play each video 1s, after a little time every gnome-terminal goes (almost totally) black, and dockapps all go black. when I move another app over the gnome-terminals, underneath is uncovered correctly rendered text output.... but it goes almost black again when I type some commands. xterm works ok, however. I can't fix this with xrefresh, also, new gnome-terminals are also screwed.

Note that this (gnome-terminal getting screwed) happened also with older SNA version from last Saturday.

regs are the same as in attachment 3 [details] [review], except
- DSPASURF: 0x04ea7000
+ DSPASURF: 0x04396000

Created attachment 54862 [details]
sna-gnome-terminal-ls.png SNA 8fc21328a0
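
The commit message for 8fc21328a0 above distinguishes between discarding an empty glyph and skipping it. A toy layout loop shows why that matters for output such as "Pleaseenteryourpassword.". This is only a sketch: the glyph tuples, the fixed advance of 8 and the function `layout` are made up for illustration and do not reflect the SNA glyph code.

```python
def layout(glyphs, discard_empty):
    """Place glyphs left to right and return [(char, x)] for drawn glyphs.

    glyphs is a list of (char, advance, is_blank). With discard_empty=True
    a blank glyph is dropped entirely, so the pen never moves past it.
    With discard_empty=False the blank glyph is not drawn but the pen
    still advances by its width.
    """
    x = 0
    drawn = []
    for char, advance, is_blank in glyphs:
        if is_blank:
            if not discard_empty:
                x += advance  # skip drawing, keep the spacing
            continue
        drawn.append((char, x))
        x += advance
    return drawn


text = [("a", 8, False), (" ", 8, True), ("b", 8, False)]
print(layout(text, discard_empty=True))   # "b" lands directly after "a"
print(layout(text, discard_empty=False))  # "b" lands one advance further right
```

In the first call the second letter is placed at the position the space should have occupied, which matches the symptom of words running together.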
ignore the auto-generated link to attachment 3 [details] [review]... I meant attachment number three in this bug report. (attachment 54747 [details])

I tried mplayer again with 337635ab97bbfc9b4455eadb63214783bb90bb2b
It showed some movies OK, but then froze in black screen, sysrq did not work and I had to power cycle computer. after reboot Xorg.0.log was 0 bytes long.

BTW, when I exited Xorg when using intel git-8fc21328a0, it whined

X: free(): invalid pointer
...
X [0x441c07] [0x422f41]

(gdb) disass 0x441c07
Dump of assembler code for function CloseDownEvents:
0x0000000000441bf0 <+0>: push %rbx
0x0000000000441bf1 <+1>: callq 0x44ea80 <GetMaximumEventsNum>
0x0000000000441bf6 <+6>: mov 0x3b8fcb(%rip),%rbx # 0x7fabc8
0x0000000000441bfd <+13>: mov %eax,%esi
0x0000000000441bff <+15>: mov (%rbx),%rdi
0x0000000000441c02 <+18>: callq 0x44f070 <FreeEventList>
0x0000000000441c07 <+23>: movq $0x0,(%rbx)
0x0000000000441c0e <+30>: pop %rbx
0x0000000000441c0f <+31>: retq
End of assembler dump.
...
0x0000000000422f23 <+995>: add $0x8,%rax
0x0000000000422f27 <+999>: cmp %rcx,%rax
0x0000000000422f2a <+1002>: movq $0x0,0xb0(%rdx)
0x0000000000422f35 <+1013>: jne 0x422f20 <main+992>
0x0000000000422f37 <+1015>: callq 0x42ac20 <CloseDownDevices>
0x0000000000422f3c <+1020>: callq 0x441bf0 <CloseDownEvents>
0x0000000000422f41 <+1025>: mov 0x2c(%rbx),%r13d
0x0000000000422f45 <+1029>: sub $0x1,%r13d
0x0000000000422f49 <+1033>: js 0x422fae <main+1134>
0x0000000000422f4b <+1035>: mov %r14d,%r15d
0x0000000000422f4e <+1038>: xchg %ax,%ax
0x0000000000422f50 <+1040>: mov %r13d,%edi
0x0000000000422f53 <+1043>: movslq %r13d,%r14
0x0000000000422f56 <+1046>: callq 0x454dd0 <FreeScratchPixmapsForScreen>
0x0000000000422f5b <+1051>: mov %r13d,%edi

337635ab97:

[ 24361.644] (WW) intel(0): sna_dri_get_msc:1633 get vblank counter failed: Invalid argument
[ 24368.038] (II) AIGLX: Suspending AIGLX clients for VT switch
[ 24368.100] sna_page_flip: failed to add fb: 1920x1200 depth=24, bpp=32, pitch=7680
[ 24374.665] (II) AIGLX: Resuming AIGLX clients after VT switch
[ 24381.244] (WW) intel(0): sna_dri_get_msc:1633 get vblank counter failed: Invalid argument

[24344.710131] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[24344.710189] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[24344.712681] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1324359 at 1324358, next 1324364)
[24351.033712] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[24351.033746] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1324497 at 1324365, next 1324498)
[24351.395505] usb 2-1.4: unlink qh1-0e01/ffff8803b8ab5180 start 0 [1/3 us]
[24364.304438] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[24364.304457] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1324802 at 1324800, next 1324804)
[24370.608033] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[24370.608050] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1324929 at 1324800, next 1324930)
[24377.059537] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[24377.059572] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1325056 at 1324800, next 1325057)

Created attachment 54909 [details]
i915_error_state.337635ab97.txt
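
The repeated i915_wait_request lines in the dmesg output above carry three numbers each ("awaiting N at M, next K"). A small sketch for extracting them and checking whether the "at" value keeps moving between hang reports follows. It assumes that "at" is the last sequence number the GPU reported as completed; that reading is an interpretation of the log format and is not stated in this report. The helper names are hypothetical.

```python
import re

WAIT_RE = re.compile(r"awaiting (\d+) at (\d+), next (\d+)")


def parse_waits(log_text):
    """Return a list of (awaiting, at, next) integer tuples found in the log."""
    return [tuple(int(g) for g in m.groups()) for m in WAIT_RE.finditer(log_text)]


def completed_seqno_stalled(waits):
    """True if there are several reports and the 'at' value never changes."""
    return len(waits) > 1 and len({at for _, at, _ in waits}) == 1


# The last three i915_wait_request lines from the dmesg excerpt above.
log = """
i915_wait_request returns -11 (awaiting 1324802 at 1324800, next 1324804)
i915_wait_request returns -11 (awaiting 1324929 at 1324800, next 1324930)
i915_wait_request returns -11 (awaiting 1325056 at 1324800, next 1325057)
"""

waits = parse_waits(log)
print(completed_seqno_stalled(waits))
print([awaiting - at for awaiting, at, _ in waits])
```

Under that assumption, a constant "at" with a growing gap to "awaiting" would suggest the GPU made no progress between the hangcheck reports, whereas a moving "at" would suggest it recovered and hung again.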
after comment #25 I tried to play a movie, but X crashed and gpu wedged.

# addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so.337635ab97 0xdd289 0xdd2bd 0xde459
/usr/include/xorg/list.h:183
/usr/include/xorg/list.h:205
/wrk/safari/cvs/xf86-video-intel/src/sna/sna_dri.c:608

line 608: info->drawable_id = None;

(WW) intel(0): flip queue failed: Input/output error
AUDIT: Wed Dec 28 20:19:13 2011: 5671: client 31 disconnected

Backtrace:
0: X (xorg_backtrace+0x26) [0x472d26]
1: X (0x400000+0x78229) [0x478229]
2: /lib64/libpthread.so.0 (0x300c800000+0xf4f0) [0x300c80f4f0]
3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f75c1788000+0xdd289) [0x7f75c1865289]
4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f75c1788000+0xdd2bd) [0x7f75c18652bd]
5: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f75c1788000+0xde459) [0x7f75c1866459]
6: X (FreeClientResources+0xf5) [0x45cc45]
7: X (CloseDownClient+0x5a) [0x434d2a]
8: X (0x400000+0x358d8) [0x4358d8]
9: X (0x400000+0x22eca) [0x422eca]
10: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3004c2169d]
11: X (0x400000+0x2322d) [0x42322d]
Segmentation fault at address (nil)
Fatal server error: Caught signal 11 (Segmentation fault). Server aborting

.............. then when I tried to restart X, it segfaulted on startup. had to powercycle again.

(WW) intel(0): cannot enable XVideo whilst the GPU is wedged
(II) intel(0): Overlay video not supported on this hardware
(WW) intel(0): Disabling Xv because no adaptors could be initialized.
(WW) intel(0): cannot enable DRI2 whilst the GPU is wedged
(==) intel(0): hotplug detection: "enabled"
(--) RandR disabled
(II) SELinux: Disabled on system
(II) AIGLX: Screen 0 is not DRI2 capable
(II) AIGLX: Screen 0 is not DRI capable
(II) AIGLX: Loaded and initialized swrast
(II) GLX: Initialized DRISWRAST GL provider for screen 0

Backtrace:
0: X (xorg_backtrace+0x26) [0x472d26]
1: X (0x400000+0x78229) [0x478229]
2: /lib64/libpthread.so.0 (0x300c800000+0xf4f0) [0x300c80f4f0]
3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f506ef1b000+0x814ab) [0x7f506ef9c4ab]
4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f506ef1b000+0x839e9) [0x7f506ef9e9e9]
5: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f506ef1b000+0x6eab9) [0x7f506ef89ab9]
6: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f506ef1b000+0x7eba1) [0x7f506ef99ba1]
7: X (0x400000+0xc7b47) [0x4c7b47]
8: X (0x400000+0x22d64) [0x422d64]
9: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3004c2169d]
10: X (0x400000+0x2322d) [0x42322d]
Segmentation fault at address 0x52
Fatal server error: Caught signal 11 (Segmentation fault). Server aborting

# addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so.337635ab97 0x814ab 0x839e9 0x6eab9 0x7eba1
/wrk/safari/cvs/xf86-video-intel/src/sna/sna_glyphs.c:176
/wrk/safari/cvs/xf86-video-intel/src/sna/sna_glyphs.c:926
/wrk/safari/cvs/xf86-video-intel/src/sna/sna_accel.c:9392
/wrk/safari/cvs/xf86-video-intel/src/sna/sna_driver.c:212

line 176: sna_pixmap(pixmap)->pinned = true;

The hangs are nothing to do with the ddx, but mesa.

Great.. so do I change Product to "Mesa"?

what does this crash have to do with? xserver, libdrm, kernel, pixman, mesa, xf86-video-intel?

intel git-c9f7f10b

[mi] EQ overflow continuing. 200 events have been dropped.
Backtrace:
0: X (xorg_backtrace+0x26) [0x472d26]
1: X (QueuePointerEvents+0x5b) [0x44f7bb]
2: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7ff24b792000+0x5881) [0x7ff24b797881]
3: X (0x400000+0x91957) [0x491957]
4: X (0x400000+0xb975e) [0x4b975e]
5: /lib64/libpthread.so.0 (0x300c800000+0xf4f0) [0x300c80f4f0]
6: /lib64/libc.so.6 (ioctl+0x7) [0x3004ce7b67]
7: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7ff24cf70398]
8: /usr/lib64/libdrm.so.2 (drmCommandNone+0x16) [0x7ff24cf72bd6]
9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7ff24ca44000+0x4bbbf) [0x7ff24ca8fbbf]
this is in kgem_throttle: kgem->wedged |= drmCommandNone(kgem->fd, DRM_I915_GEM_THROTTLE) == -EIO;
10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7ff24ca44000+0x6ec71) [0x7ff24cab2c71]
11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7ff24ca44000+0x6ecb4) [0x7ff24cab2cb4]
12: X (BlockHandler+0x4a) [0x439aaa]
13: X (WaitForSomething+0x269) [0x47b899]
14: X (0x400000+0x355b2) [0x4355b2]
15: X (0x400000+0x22eca) [0x422eca]
16: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3004c2169d]
17: X (0x400000+0x2322d) [0x42322d]
(EE) intel(0): Detected a hung GPU, disabling acceleration.
(EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
[mi] Increasing EQ size to 512 to prevent dropped events.
[mi] EQ processing has resumed after 281 dropped events.
[mi] This may be caused my a misbehaving driver monopolizing the server's resources.
AUDIT: Fri Dec 30 00:25:54 2011: 15911: client 33 disconnected
AUDIT: Fri Dec 30 00:25:54 2011: 15911: client 33 connected from local host ( uid=500 gid=509 pid=22546 )
AUDIT: Fri Dec 30 00:27:17 2011: 15911: client 33 disconnected
AUDIT: Fri Dec 30 00:27:17 2011: 15911: client 33 connected from local host ( uid=500 gid=509 pid=22601 )

Backtrace:
0: X (xorg_backtrace+0x26) [0x472d26]
1: X (0x400000+0x78229) [0x478229]
2: /lib64/libpthread.so.0 (0x300c800000+0xf4f0) [0x300c80f4f0]
3: /usr/lib64/libpixman-1.so.0 (0x3025000000+0x5f668) [0x302505f668]
4: /usr/lib64/libpixman-1.so.0 (0x3025000000+0x7d032) [0x302507d032]
5: /usr/lib64/libpixman-1.so.0 (pixman_fill+0x31) [0x302500a081]
6: /usr/lib64/xorg/modules/libfb.so (fbFill+0x30d) [0x7ff24c01458d]
7: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7ff24ca44000+0x697c4) [0x7ff24caad7c4]
this is src/sna/sna_accel.c:7628 } while (--n); (after if (region.data == NULL) { )
8: X (0x400000+0x1273d0) [0x5273d0]
9: X (miPaintWindow+0x1b0) [0x585ee0]
10: X (miWindowExposures+0xc2) [0x586092]
11: X (0x400000+0xa86f0) [0x4a86f0]
12: X (miHandleValidateExposures+0x68) [0x59ea78]
13: X (MapWindow+0x2b7) [0x46c857]
14: X (0x400000+0x30160) [0x430160]
15: X (0x400000+0x358a1) [0x4358a1]
16: X (0x400000+0x22eca) [0x422eca]
17: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3004c2169d]
18: X (0x400000+0x2322d) [0x42322d]
Segmentation fault at address (nil)
Fatal server error: Caught signal 11 (Segmentation fault). Server aborting

........ I now am running latest mesa git, my goal is to get 24h uptime!

Created attachment 54961 [details]
i915_error_state.c9f7f10b.txt
<3>[68076.585833] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
<6>[68076.585899] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
<3>[68076.588216] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 2700529 at 2700527, next 2700531)

Created attachment 54979 [details]
i915_error_state.0dc5c0651cb69.txt
Closing as a dupe of the residual bug. *** This bug has been marked as a duplicate of bug 44364 *** |
Created attachment 54743 [details]
intel_reg_dumper.txt

after ten seconds of using X, intel driver crashes:
(WW) intel(0): sna_dri_get_msc:1633 get vblank counter failed: Invalid argument

after that, with glxgears:
9357 frames in 5.0 seconds = 1871.355 FPS

System environment:
-- chipset: Intel(R) Sandybridge Desktop (GT2)
-- system architecture: 64-bit
-- xf86-video-intel: 2.17.0-216-g56e9727
-- xserver: xorg-x11-server-Xorg-1.11.99.1-8.20111109.fc17.x86_64
-- mesa: 7.12.0.3.fc17
-- libdrm: 2.4.29-3-gef20301
-- kernel: 3.0.14
-- Linux distribution: Fedora
-- Machine or mobo model: Asus P8Z68-V PRO GEN3, Intel Core i5-2500K
-- Display connector: DVI

Reproducing steps:
mplayer -vo gl -fs some1080pvideo.mkv

Additional info:
screen corruption (fixable with xrefresh) when switching away from fullscreen mode with -vo gl, but not with -vo xv

with SNA, "wmcpu -t 50000" triggers double free or memory corruption bug in intel_drv.so or Xorg, so I have to run X without wmcpu