Bug 45273

Summary: [IVB] gpu hang on an apparently idle machine
Product: DRI
Reporter: Eugeni Dodonov <eugeni>
Component: DRM/Intel
Assignee: Daniel Vetter <daniel>
Status: CLOSED DUPLICATE
QA Contact:
Severity: normal
Priority: medium
CC: ben, chris, daniel, jbarnes
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
i915_error_state, none
dmesg, none
Xorg.0.log, none

Description Eugeni Dodonov 2012-01-26 07:15:30 UTC
Created attachment 56185
i915_error_state

Kernel: drm-intel-next-2012-01-20 (3d29b842e58fbca2c13a9f458fddbaa535c6e578)
xf86-video-intel: bbd6c8123635899e89911104bf84e1b7d11d66a1 (using UXA)
libdrm: 66518ab5653cfdc840cd69e7b653ec05df060584


The machine was mostly idle, with no specific workload running.

Noticeable messages from the logs (attached):
Xorg.0.log:
[160645.780] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Device or resource busy.

dmesg:
[158263.159917] [drm:pch_irq_handler] *ERROR* PCH poison interrupt
[158267.410012] [drm:pch_irq_handler] *ERROR* PCH poison interrupt
[160410.018307] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[160410.018325] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[160430.920839] cat: page allocation failure: order:10, mode:0x40d0
[160430.920844] Pid: 26275, comm: cat Tainted: G         C   3.2.0-rc7-eugeni-00003-ge27bde8 #46
[160430.920846] Call Trace:
[160430.920854]  [<ffffffff8110a9c6>] warn_alloc_failed+0xf6/0x150
[160430.920858]  [<ffffffff8110d6f0>] ? drain_pages+0xa0/0xa0
[160430.920862]  [<ffffffff8110d706>] ? drain_local_pages+0x16/0x20
[160430.920865]  [<ffffffff8110df00>] __alloc_pages_nodemask+0x600/0x800
[160430.920870]  [<ffffffff811450c3>] alloc_pages_current+0xa3/0x110
[160430.920873]  [<ffffffff8110a08e>] __get_free_pages+0xe/0x40
[160430.920876]  [<ffffffff8114d2ef>] kmalloc_order_trace+0x3f/0xd0
[160430.920880]  [<ffffffff8118478e>] ? seq_read+0x15e/0x3d0
[160430.920883]  [<ffffffff8114f868>] __kmalloc+0x158/0x160
[160430.920887]  [<ffffffff81112556>] ? put_page+0x36/0x40
[160430.920889]  [<ffffffff811847a5>] seq_read+0x175/0x3d0
[160430.920893]  [<ffffffff81163d7c>] vfs_read+0xac/0x180
[160430.920897]  [<ffffffff81163e9a>] sys_read+0x4a/0x90
[160430.920901]  [<ffffffff814c2ec2>] system_call_fastpath+0x16/0x1b

Also, i915_error_state seems to contain a huge number of active objects (3672).
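
For reference, the "order:10" in the page allocation failure above is the buddy-allocator order, so the request was for 2^10 physically contiguous pages. A minimal sketch of the arithmetic, assuming 4 KiB pages (usual on x86-64, but not stated in the log); the helper name is made up for illustration:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not confirmed by the log


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of a buddy-allocator request of the given order."""
    # An order-N allocation asks for 2**N physically contiguous pages.
    return page_size << order


size = order_to_bytes(10)
print(f"order:10 -> {size} bytes ({size // (1024 * 1024)} MiB)")
```

Under that assumption, the failed read of i915_error_state needed a single contiguous 4 MiB buffer, which is hard to get on a machine with long uptime and fragmented memory.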
Comment 1 Eugeni Dodonov 2012-01-26 07:19:16 UTC
Created attachment 56186
dmesg
Comment 2 Eugeni Dodonov 2012-01-26 07:20:10 UTC
Created attachment 56187
Xorg.0.log
Comment 3 Chris Wilson 2012-01-26 07:36:34 UTC
Smells like the semaphore issue. Both render/blt are waiting on the semaphore... Danvet, what was that branch again with the silly name?
Comment 5 Daniel Vetter 2012-01-26 07:45:41 UTC
And by the looks of it we need to convert the error_state over to the iterative seqfile interface ...
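
A rough model of why the iterative interface would help, written in Python rather than kernel C. It assumes the single-shot seq_file path starts with a one-page buffer and doubles it until the whole output fits, while an iterative path only ever needs room for one record. The function names and the 700-bytes-per-object figure are invented for illustration and are not taken from the error state:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def single_shot_buffer(output_len: int, page_size: int = PAGE_SIZE) -> int:
    """Buffer size reached when doubling until the whole output fits."""
    size = page_size
    # Simplified model: a completely full buffer is treated as overflow,
    # so the buffer must end up strictly larger than the output.
    while size <= output_len:
        size *= 2
    return size


def iterative_peak(record_lens: list[int], page_size: int = PAGE_SIZE) -> int:
    """Peak buffer when records are emitted one at a time."""
    # Only the largest single record has to fit in the buffer.
    return single_shot_buffer(max(record_lens), page_size)


# Hypothetical workload: 3672 active objects at a made-up 700 bytes each.
records = [700] * 3672
print(single_shot_buffer(sum(records)))  # one large contiguous allocation
print(iterative_peak(records))           # stays at a single page
```

With these made-up sizes the single-shot model ends up at the same 4 MiB (order 10) request seen in the dmesg trace, whereas the iterative model never grows beyond one page.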
Comment 6 Chris Wilson 2012-02-01 14:14:37 UTC
Judging by the error state this is the same as bug 45492.
Comment 7 Eugeni Dodonov 2012-02-02 08:27:13 UTC
Indeed it is. I'll mark it as a dupe so the crime scene investigation can continue on bug #45492.

*** This bug has been marked as a duplicate of bug 45492 ***
