Summary: | [snb] hangcheck reports render ring IRQ miss | ||
---|---|---|---|
Product: | DRI | Reporter: | Sami Farin <hvtaifwkbgefbaei> |
Component: | DRM/Intel | Assignee: | Daniel Vetter <daniel> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | major | ||
Priority: | medium | CC: | ben, chris, daniel, jbarnes |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Sami Farin
2012-01-28 03:55:23 UTC
xorg-x11-server-Xorg-1.11.99.901-3.20120124.fc17.x86_64 stalled, too, but I could not log in at tty1 for some reason.

<3>[180030.123002] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... render ring idle [waiting on 9664668, at 9664667], missed IRQ?

I got these every 1.5s. now I am running 1.11.99.902 without fedora patches, fingers crossed

Daniel, another render ring IRQ miss surfaces! I suspect we have a further issue, but we can at least fix the render ring IRQ miss and see what remains...

Please test the for-nkalkhof branch available in the kernel git repo at: http://cgit.freedesktop.org/~danvet/drm/

git clone git://people.freedesktop.org/~danvet/drm
Cloning into 'drm'...
remote: Counting objects: 2301783, done.
remote: Compressing objects: 100% (356562/356562), done.
remote: Total 2301783 (delta 1929730), reused 2295032 (delta 1923035)
Receiving objects: 100% (2301783/2301783), 549.20 MiB | 1.50 MiB/s, done.
Resolving deltas: 100% (1929730/1929730), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.

On Sat, Jan 28, 2012 at 21:05, <bugzilla-daemon@freedesktop.org> wrote:
> warning: remote HEAD refers to nonexistent ref, unable to checkout.

That's expected, you need to check out my branch:

git checkout origin/for-nkalkhof

So it was the kernel, first I thought I was cloning libdrm, no wonder it was so humongous ;)

Isn't there a spinoff of the missed-irqs patch for 3.0.x kernel? I don't really have time to start tinkering with this nkalkhof kernel.. maybe I just wait till the fixes land into 3.2.stable..

Currently that patch will land earliest in 3.4, so trying out that git kernel is highly advised if you don't want to wait a few months for a possible fix ...

OK, thanks, I guess I have to start using it if I start getting serious hangs (requiring reboot) more regularly (>2 times a week).

ok I had to sysrq-k three times inside 24h, so I compiled -nkalkhof kernel. it is not very stable, either. it crashed the instant I ran mplayer -fs -vo gl some.mkv.

[ 3380.615530] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3380.615595] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 3380.618070] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 177997 at 177995, next 178001)
[ 3386.979091] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3386.979105] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 178004 at 178003, next 178011)
[ 3393.286694] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3393.286711] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 178012 at 178011, next 178017)
[ 3399.646258] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3399.646273] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 178030 at 178023, next 178031)

Created attachment 56339 [details]
i915_error_state.mesa_eaf360e5bffc56307-intel_ed1c1a7468.txt
Created attachment 56340 [details]
intel_reg_dumper.mesa_eaf360e5bffc56307-intel_ed1c1a7468.txt
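The hangcheck and i915_wait_request messages quoted above each report the sequence number being waited for and the one the ring has actually reached. A minimal Python sketch for pulling those pairs out of a dmesg capture follows; the function name `seqno_gaps` and the regular expression are hypothetical helpers written for this report, not part of any i915 or intel-gpu-tools utility, and they assume the message wording stays exactly as quoted here.

```python
import re

# Matches both message shapes quoted in this report:
#   "... [waiting on 9664668, at 9664667], missed IRQ?"
#   "... (awaiting 177997 at 177995, next 178001)"
# Hypothetical pattern: it assumes the wording shown in this thread.
SEQNO_RE = re.compile(r"(?:waiting on|awaiting) (\d+),? at (\d+)")

def seqno_gaps(log_text):
    """Return (wanted, current, wanted - current) for each matching message, in order."""
    gaps = []
    for match in SEQNO_RE.finditer(log_text):
        wanted, current = int(match.group(1)), int(match.group(2))
        gaps.append((wanted, current, wanted - current))
    return gaps

# Two lines copied from the comments above.
sample = (
    "<3>[180030.123002] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer "
    "elapsed... render ring idle [waiting on 9664668, at 9664667], missed IRQ?\n"
    "[ 3380.618070] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 "
    "(awaiting 177997 at 177995, next 178001)\n"
)

for wanted, current, gap in seqno_gaps(sample):
    print(f"waiting for {wanted}, ring at {current}, gap {gap}")
```

Feeding a full dmesg dump through the same function gives a quick view of how far behind the ring was at each report.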
(WW) intel(0): sna_dri_get_msc:1650 get vblank counter failed: Invalid argument
(WW) intel(0): flip queue failed: Device or resource busy

The residual bug is bug 44364, so retitling as appropriate to separate the issues.

now with 7½ days uptime it has had only one more GPU hang, not bad. but I am wondering how often do you merge the linus stable to for-nkalkhof..?

this with mesa df5963c25641a7c3a4bbfcb81cc3dc771581590e and nkalkhof kernel

<3>[599034.091545] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
<6>[599034.091620] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
<3>[599034.094118] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 35040628 at 35040627, next 35040632)
<3>[599040.403130] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
<3>[599040.403144] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 35040635 at 35040634, next 35040639)
<3>[599046.698729] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
<3>[599046.698744] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 35040641 at 35040640, next 35040646)
<3>[599053.006316] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
<3>[599053.006331] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 35040649 at 35040647, next 35040652)
<0>[599053.006346] ------------[ cut here ]------------
<2>[599053.006358] kernel BUG at drivers/gpu/drm/drm_irq.c:929!
<0>[599053.006366] invalid opcode: 0000 [#1] SMP
<4>[599053.006375] CPU 3
<4>[599053.006379] Modules linked in: nfnetlink_log nls_iso8859_1 nls_cp437 xt_NOTRACK xt_CLASSIFY ipt_ECN xt_connmark xt_length xt_connlimit xt_set xt_multiport ip_set_bitmap_port ip_set_hash_net sch_sfb nf_conntrack_netlink snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device arptable_filter arp_tables ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT ip6t_LOG xt_limit ipt_LOG xt_hashlimit xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_tcpudp ip6table_filter ip6table_mangle ip6_tables iptable_filter iptable_mangle iptable_raw ip_tables bridge stp llc snd_hda_intel ip_set nfnetlink sch_netem sch_hfsc sch_htb sch_sfq cls_fw cls_u32 cls_route sch_ingress sch_red sch_tbf sch_teql sch_prio sch_gred cls_rsvp cls_rsvp6 cls_tcindex sch_cbq sch_dsmark at24 at25 uvcvideo videodev v4l2_compat_ioctl32 pps_ldisc pps_core pl2303 af_key xfrm4_tunnel esp4 ah4 ipcomp xfrm_ipcomp ipip tun udf vfat fat oprofile nf_conntrack x_tables dccp_diag dccp cmtp kernelcapi loop ppdev parport_pc lp parport i2c_dev ppp_async ppp_generic slhc ftdi_sio usbserial rfcomm lockd sunrpc bnep w83627ehf hwmon_vid coretemp hwmon snd_hda_codec_hdmi snd_hda_codec_realtek btusb bluetooth i2c_i801 rfkill snd_hda_codec snd_pcm e1000e iTCO_wdt iTCO_vendor_support snd_timer snd soundcore snd_page_alloc mxm_wmi wmi rtc_cmos kvm binfmt_misc tcp_cubic autofs4 firewire_ohci firewire_core i915 drm_kms_helper button video [last unloaded: pcspkr]
<4>[599053.006669]
<4>[599053.006673] Pid: 20231, comm: X Not tainted 3.2.0-nkalkhof-44e435f49f5b+ #1 System manufacturer System Product Name/P8Z68-V PRO GEN3
<4>[599053.006693] RIP: 0010:[<ffffffff813c937b>] [<ffffffff813c937b>] drm_vblank_put+0x5e/0x60
<4>[599053.006710] RSP: 0018:ffff880330917ce0 EFLAGS: 00010246
<4>[599053.006719] RAX: ffff88041d3d1200 RBX: ffff88041c8b5000 RCX: 00000000000048c0
<4>[599053.006730] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88041c8b5000
<4>[599053.006741] RBP: ffff880330917d48 R08: 0000000000000000 R09: 0000000000000000
<4>[599053.006752] R10: ffff88036a4ec9d0 R11: 0000000000000246 R12: ffff88006aa84b80
<4>[599053.006763] R13: ffff88041c8b5578 R14: ffff880417b9c000 R15: 00000000fffffff5
<4>[599053.006775] FS: 00007f5f2787e8c0(0000) GS:ffff88041f200000(0000) knlGS:0000000000000000
<4>[599053.006787] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[599053.006796] CR2: 00007f4f51e8b000 CR3: 000000036eb56000 CR4: 00000000000406e0
<4>[599053.006807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[599053.006818] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[599053.006829] Process X (pid: 20231, threadinfo ffff880330916000, task ffff88036a4ec340)
<0>[599053.006840] Stack:
<4>[599053.006844] ffffffffa003d208 ffff880330917d28 0000000000000001 ffff88004f06a400
<4>[599053.006859] ffff88041c8b5068 ffff880417e40000 ffff88006aa84480 ffff880093a70000
<4>[599053.006873] ffff880330917df8 ffff88041c8b5000 ffff88041c8b5720 ffff880093a70000
<0>[599053.006887] Call Trace:
<4>[599053.006902] [<ffffffffa003d208>] ? intel_crtc_page_flip+0x20c/0x338 [i915]
<4>[599053.006915] [<ffffffff813d509b>] drm_mode_page_flip_ioctl+0x174/0x1ec
<4>[599053.006926] [<ffffffff813c5bcc>] drm_ioctl+0x3c0/0x483
<4>[599053.006936] [<ffffffff813d4f27>] ? drm_mode_gamma_get_ioctl+0x106/0x106
<4>[599053.006948] [<ffffffff81121937>] do_vfs_ioctl+0x8a/0x4f9
<4>[599053.006958] [<ffffffff81121de0>] sys_ioctl+0x3a/0x7a
<4>[599053.006967] [<ffffffff816963e2>] system_call_fastpath+0x16/0x1b
<0>[599053.006976] Code: 00 8d 04 80 8d 04 80 8d 14 80 c1 e2 03 be d3 4d 62 10 89 d0 f7 e6 c1 ea 06 48 8d 34 0a 48 81 c7 e0 04 00 00 e8 68 66 c8 ff 5d c3 <0f> 0b 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 41 89 f5 4c 63
<1>[599053.007049] RIP [<ffffffff813c937b>] drm_vblank_put+0x5e/0x60
<4>[599053.007060] RSP <ffff880330917ce0>

because nkalkhof seems not to be maintained anymore, I tried torvalds-1f033c1a6ec1a6815e9c, mplayer -fs -vo gl was too much for it:

[ 1826.552898] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1826.552965] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1826.555467] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 154707 at 154706, next 154709)
[ 1832.876460] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1832.876478] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 154711 at 154710, next 154717)

Created attachment 57925 [details]
3.3.0-rc5-1f033c1a6e-error_state.txt
Created attachment 57926 [details]
3.3.0-rc5-1f033c1a6e-regs.txt
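In the oops quoted earlier in this thread, the "Code:" line marks the byte at the faulting RIP with angle brackets. There the marked byte and the one after it are 0f 0b, which is the x86 UD2 instruction that BUG() compiles to, consistent with the "kernel BUG at drivers/gpu/drm/drm_irq.c:929!" and "invalid opcode" lines. A small Python sketch of that check follows; `trapping_bytes` and `is_ud2` are hypothetical helper names, and the input is only the tail of the Code: line from this report.

```python
import re

def trapping_bytes(code_line):
    """Return the byte marked with <..> in an oops 'Code:' line and the byte after it."""
    match = re.search(r"<([0-9a-f]{2})>\s+([0-9a-f]{2})", code_line)
    if match is None:
        return None
    return match.group(1), match.group(2)

def is_ud2(code_line):
    """True if the faulting instruction starts with 0f 0b (x86 UD2, as emitted by BUG())."""
    return trapping_bytes(code_line) == ("0f", "0b")

# Tail of the Code: line from the oops in this report (earlier bytes left out).
code_line = "Code: 48 81 c7 e0 04 00 00 e8 68 66 c8 ff 5d c3 <0f> 0b 55 48 89 e5"

print(trapping_bytes(code_line), is_ud2(code_line))
```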
(In reply to comment #17)
> because nkalkhof seems not to be maintained anymore, I tried
> torvalds-1f033c1a6ec1a6815e9c, mplayer -fs -vo gl was too much for it:

Which is still your *mesa* bug 44364, this bug is tracking the occurrence of the IRQ miss on SNB which does have a patch going in 3.4.

A patch referencing this bug report has been merged in Linux v3.4-rc1:

commit 99ffa1629d737295e569267cf5940758139f9ddb
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jan 25 14:04:00 2012 +0100

    drm/i915: enable forcewake voodoo also for gen6

Patch has landed, closing this bug. The residual gpu hang looks like a separate mesa issue, missed IRQs seem to be gone for good. If they pop up again, please reopen.