Bug 38567

Summary: [SNB rc6/vt-d] hang unless i915_enable_rc6=0
Product: DRI
Reporter: Ted Phelps <phelps>
Component: DRM/Intel
Assignee: Eugeni Dodonov <eugeni>
Status: CLOSED FIXED
Severity: normal
Priority: medium
CC: daniel, eugeni, florian, jbarnes, xhejtman
Version: DRI git
Keywords: NEEDINFO
Hardware: Other
OS: All
Attachments (no flags set on any):
- i915_error_state
- Another GPU hung
- Error state corresponding to comment #12.
- First warning from __gen6_gt_wait_for_fifo
- last batch of kernel warnings for comment #12
- Patch to i915_drv.c described in comment #12
- Hung GPU following "Try enabling RC6 by default (again)"
- dmesg output after reboot with virtualization disabled.
- lspci -vv output
- dmidecode output
- Render glitch with rc6=1
- True rc6 render glitch
- rc6 render glitch without SNA

Description Ted Phelps 2011-06-22 05:37:42 UTC
I'm seeing my X session hang occasionally (ye olde *Error* Hangcheck timer elapsed).  I haven't managed to correlate the hangs with any specific activity.  The most recent hang (i915_error_state to be attached) occurred while running a 3-D game engine (darkplaces), though I've seen it occur without any serious 3-D activity too.

Interestingly, I've found that setting i915_enable_rc6=0 prevents these hangs from occurring -- I recently managed to run for around 30 days without a hang (with 2.6.39-rc6 git/keithp 8eb5729).  With rc6 enabled, I can typically trigger a hang within a few hours.

== SYSTEM ENVIRONMENT ==

Chipset: SNB 2600K
System architecture: x86_64
xf86-video-intel-2.15.0, xorg-server-1.10.2, mesa-7.10.3, libdrm-2.4.26
Linux version 3.0.0-rc4 with gen6_enable_rps call removed from intel_modeset_init (as suggested by Jesse Barnes on 2011-06-21)
Linux distro: LFS
Motherboard: DH67CL
Display connector: HDMI

== TO REPRODUCE ==
Unknown

== End of Xorg.log ==
[  2010.757] (II) intel(0): Allocated new frame buffer 1920x1200 stride 7680, tiled
[  9967.720] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[  9967.721] 
Backtrace:
[  9967.730] 0: X (xorg_backtrace+0x28) [0x49fff8]
[  9967.730] 1: X (mieqEnqueue+0x1f4) [0x49f514]
[  9967.730] 2: X (xf86PostMotionEventM+0x97) [0x47c907]
[  9967.730] 3: X (xf86PostMotionEventP+0x3c) [0x47c9fc]
[  9967.730] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f9955f6f000+0x46da) [0x7f9955f736da]
[  9967.730] 5: X (0x400000+0x6a287) [0x46a287]
[  9967.730] 6: X (0x400000+0x1198e3) [0x5198e3]
[  9967.730] 7: /lib/libpthread.so.0 (0x7f995a229000+0xe4e0) [0x7f995a2374e0]
[  9967.730] 8: /lib/libc.so.6 (ioctl+0x7) [0x7f9958db5867]
[  9967.730] 9: /usr/lib/libdrm.so.2 (drmIoctl+0x28) [0x7f9957e50638]
[  9967.730] 10: /usr/lib/libdrm_intel.so.1 (drm_intel_gem_bo_map_gtt+0x7e) [0x7f99577f099e]
[  9967.730] 11: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f99579f5000+0x10e99) [0x7f9957a05e99]
[  9967.730] 12: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f99579f5000+0x32af6) [0x7f9957a27af6]
[  9967.730] 13: X (miPolyText16+0x99) [0x552ba9]
[  9967.730] 14: X (0x400000+0xdbd16) [0x4dbd16]
[  9967.730] 15: X (doPolyText+0x19e) [0x43005e]
[  9967.730] 16: X (PolyText+0x49) [0x431209]
[  9967.730] 17: X (0x400000+0x2b3f9) [0x42b3f9]
[  9967.730] 18: X (0x400000+0x2e011) [0x42e011]
[  9967.730] 19: X (0x400000+0x2204e) [0x42204e]
[  9967.730] 20: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f9958d0c75d]
[  9967.730] 21: X (0x400000+0x21bf9) [0x421bf9]
[  9972.566] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[  9972.566] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.

== END OF dmesg (drm.debug not set, alas) ==
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1133092 at 1133090, next 1133096)


Please let me know if there's any additional information I can provide or any settings you'd like me to test.  I'm stable with rc6 disabled, but would prefer to save those watts if at all possible.

Thanks,
-Ted
Comment 1 Ted Phelps 2011-06-22 05:59:06 UTC
Created attachment 48282 [details]
i915_error_state

The plain text version is too large, so I've bzipped it.
Comment 2 Chris Wilson 2011-06-22 06:14:43 UTC
Right, there doesn't appear to be anything unusual in the error state. The only indication is that it is an older Mesa, and you may find solace in some of the recent bug fixes.

In particular, 

commit f6e5230b2614cc91e4c849c07781b2230878d274
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Jun 17 18:44:26 2011 -0700

    i965/gen6: Apply documented workaround for nonpipelined state packets.
    
    Fixes a 100% reproducible GPU hang in topogun-1.06-orc-84k.trace.
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

sounds like it could cause quite a few random crashes.

First let's rule out the known bugs and failing that, we have an unfortunate side-effect of rc6 that we need to pin down.
Comment 3 Ted Phelps 2011-06-24 17:17:36 UTC
I'm having a great deal of difficulty getting a more recent Mesa to function.  When I fire up the gears demo, I see some new textual output that looks vaguely like source code, and then X11 hangs for a few seconds.  A window appears with no content and the machine hangs for a few more seconds.  I can eventually recover by killing the gears demo.  I've tried recompiling X, xf86-video-intel and the gears demo, but the problem persists.

So I'm afraid that this issue is going to have to go on hold until that's sorted out.

-Ted
Comment 4 PAB 2011-06-25 10:40:05 UTC
I'm facing the same problem:
[  377.856455] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  377.866509] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 13484 at 13483, next 13485)

I made the following observation:
I am running an MSI H67MA-ED55, i7 2600K, 2.6.38. The first time I had this "GPU hung" issue was after updating the BIOS from 1.4 to 1.5. With 1.4, Ubuntu ran without any problems. Testing 2.6.39 and 3.0-rc4 made no difference.

I imagine this could help narrow down the cause if the delta between 1.4 and 1.5 is not too big.

Good luck, Peter
Comment 5 Ted Phelps 2011-06-26 17:28:24 UTC
Re: comment #3, I had configured mesa to use the i965 gallium driver, and that appears to have been the source of that issue.

I ran with i915_enable_rc6=1 overnight and something went wrong, but I'm not sure what.  My keyboard was being ignored (even Num Lock didn't light its LED) and neither it nor my mouse could get me out of DPMS mode.  The network was functional; I was able to ssh to the machine and there was no sign of trouble in dmesg or Xorg.0.log.  My attempt to wake the machine from DPMS via xset just hung.  An strace of X itself showed an endless series of restarted system calls:

--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(10, 0x40406469, 0x7fffb8965040)   = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(10, 0x40406469, 0x7fffb8965040)   = ? ERESTARTSYS (To be restarted)
...

I've since rebooted the machine with i915_enable_rc6=0 to see if that has any effect on this new issue.


In short, I'm still having problems, but they appear to be different problems.  I was unable to provoke the hangcheck timer by running 3D applications.  I'll leave this bug open a few more days in case I do manage to reproduce the old symptoms.

Thanks,
-Ted
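The strace pattern above (SIGALRM arrives, the ioctl is interrupted, then reissued) can be modeled with a toy retry loop. The call is reissued after each signal, whether by the kernel's syscall restart or by libdrm's drmIoctl(), which retries ioctl() while it fails with EINTR or EAGAIN. Everything else here is an assumption for illustration: fake_ioctl() and retry_ioctl() are hypothetical, "ticks" stand in for time, and each interruption is assumed to discard all partial progress. Under that assumption, a call that needs longer than the signal period never completes, which is one way a restart loop can turn into a livelock.

```c
#include <errno.h>

/* Hypothetical stand-in for an interruptible ioctl: it needs work_ticks
 * uninterrupted ticks to finish. A signal lands every signal_period ticks
 * of *clock; if it arrives before the work is done, the call fails with
 * EINTR and (by assumption) all partial progress is lost. */
static int fake_ioctl(int work_ticks, int signal_period, int *clock)
{
    for (int done = 1; done <= work_ticks; done++) {
        (*clock)++;
        if (*clock % signal_period == 0 && done < work_ticks) {
            errno = EINTR;
            return -1;
        }
    }
    return 0;
}

/* Retry in the style of libdrm's drmIoctl(): reissue the call while it
 * fails with EINTR or EAGAIN. Bounded by max_attempts so the demo ends;
 * returns the attempt number that succeeded, or -1 if none did. */
static int retry_ioctl(int work_ticks, int signal_period, int max_attempts)
{
    int clock = 0;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (fake_ioctl(work_ticks, signal_period, &clock) == 0)
            return attempt;
        if (errno != EINTR && errno != EAGAIN)
            return -1;
    }
    return -1;
}
```

With a signal every 10 ticks, a 3-tick call succeeds on the first attempt, while a 12-tick call is still failing after any number of retries. Whether the real kernel wait loses progress this way is not established in this report.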
Comment 6 Chris Wilson 2011-06-27 00:11:40 UTC
(In reply to comment #5)
> My attempt to wake the machine from DPMS via xset just hung. 
> strace of X itself showed an endless series of restarted system calls:
> 
> --- SIGALRM (Alarm clock) @ 0 (0) ---
> rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
> ioctl(10, 0x40406469, 0x7fffb8965040)   = ? ERESTARTSYS (To be restarted)

Interesting. A mutex livelock, perhaps? It would be useful to know the stack trace (so that I don't have to work out which ioctl 0x40406469 is ;) and to get the kernel stacks for anything else that may still be inside i915.ko.

Ideally that should never have happened, as the hangcheck is supposed to kick in, force whatever is holding the lock to return, reset the device, then everything can continue merrily on as if nothing went wrong. (I did say ideally.)
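For reference, the request number in the strace can be decoded by hand. The sketch below assumes the generic Linux ioctl encoding used on x86 (bits 0-7 command number, 8-15 type, 16-29 argument size, 30-31 direction; some architectures lay this out differently). decode_ioctl() and DRM_COMMAND_BASE_NR are hypothetical names for this example, not part of libdrm.

```c
#include <stdint.h>

/* Generic Linux ioctl field layout (as on x86; some architectures differ). */
#define IOC_NRBITS    8
#define IOC_TYPEBITS  8
#define IOC_SIZEBITS  14
#define IOC_NRSHIFT   0
#define IOC_TYPESHIFT (IOC_NRSHIFT + IOC_NRBITS)     /* 8 */
#define IOC_SIZESHIFT (IOC_TYPESHIFT + IOC_TYPEBITS) /* 16 */
#define IOC_DIRSHIFT  (IOC_SIZESHIFT + IOC_SIZEBITS) /* 30 */

#define DRM_COMMAND_BASE_NR 0x40u /* first driver-private DRM command */

struct ioctl_fields {
    unsigned dir;  /* 1 = write (_IOW), 2 = read (_IOR), 3 = read/write */
    unsigned size; /* size of the argument struct in bytes */
    unsigned type; /* 'd' (0x64) for DRM */
    unsigned nr;   /* command number within the type */
};

/* Hypothetical helper: split a request number into its four fields. */
static struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f;
    f.nr   = (req >> IOC_NRSHIFT) & ((1u << IOC_NRBITS) - 1);
    f.type = (req >> IOC_TYPESHIFT) & ((1u << IOC_TYPEBITS) - 1);
    f.size = (req >> IOC_SIZESHIFT) & ((1u << IOC_SIZEBITS) - 1);
    f.dir  = (req >> IOC_DIRSHIFT) & 0x3u;
    return f;
}
```

For 0x40406469 this yields direction 1 (write), a 64-byte argument, type 'd' (the DRM ioctl base) and number 0x69. Driver-private DRM commands start at 0x40, so this is i915 command 0x29, which, going by i915_drm.h, is DRM_I915_GEM_EXECBUFFER2 (struct drm_i915_gem_execbuffer2 is 64 bytes). If that reading is right, X was repeatedly retrying a batch submission.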
Comment 7 Chris Wilson 2011-07-10 03:43:44 UTC
Ted, did you reproduce the busy-spin? I'm curious as to what the timings were. Do you still have the strace handy?
Comment 8 Ted Phelps 2011-07-10 20:47:25 UTC
Sorry for the silence.  I had a hardware failure on another machine and my test machine was pressed into service in a role where I couldn't easily reboot it.

I tried the test again with a newer Linux kernel (git/keithp 902daf6), Mesa snapshot (git/master 576f489) and xorg-server (1.10.3) over the weekend.  The machine locked up solid on me -- nothing from netconsole -- after about 2 hours.  I rebooted and it has been running without issues for 2.5 days.  Both times were with i915_enable_rc6=1.

So something's still wrong, but I don't have anything useful to add.

-Ted
Comment 9 Ted Phelps 2011-07-12 05:25:50 UTC
Created attachment 49000 [details]
Another GPU hung

Jul 12 06:35:29 orpheus -- MARK --
Jul 12 06:48:24 orpheus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 12 06:48:24 orpheus kernel: [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 38663880 at 38663877, next 38663881)
Jul 12 06:48:24 orpheus kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
Jul 12 06:48:25 orpheus kernel: [drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
Jul 12 06:48:25 orpheus kernel: [drm:init_ring_common] *ERROR* blt ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
Jul 12 06:48:32 orpheus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 12 06:48:32 orpheus kernel: [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 38663887 at 38663880, next 38663888)
Jul 12 06:55:29 orpheus -- MARK --
Jul 12 07:15:29 orpheus -- MARK --
Comment 10 Ted Phelps 2011-07-12 05:30:16 UTC
Sorry for the incoherent comment on the attachment.  I seem to be especially inept with the keyboard today.

I just wanted to mention that this latest hang happened whilst I was asleep, so presumably not much was happening at the time.  There was another hang a few hours later when I was at work:

    Jul 12 09:45:58 orpheus kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
    Jul 12 09:45:58 orpheus kernel: [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 40661957 at 40661954, next 40661958)

Same Xorg/mesa as in comment #8.

-Ted
Comment 11 Chris Wilson 2011-07-12 05:45:57 UTC
The batch looks pretty innocuous. Yet the GPU is very much upset: the ring and error registers returning 0 is a very bad sign. And rc6 still appears to be the only difference between stable and unstable systems.
Comment 12 Ted Phelps 2011-07-16 17:33:25 UTC
On a whim, I tweaked the latest drm-intel-next (git-6e96e77) so that __gen6_gt_wait_for_fifo waits for up to 5000 iterations rather than 500, and warns if the loop counter is less than 4500 when the loop exits, in addition to the existing warning when the counter drops below zero.  I'll attach this trivial patch in a moment.

Over the following day, I observed the new warning twice without the original warning: it took between 500 and 5000 iterations to get an acceptable value from GT_FIFO_FREE_ENTRIES.  The third time, both warnings were triggered, indicating that over 5000 iterations passed.  Immediately after that, the first warning was hit again but without the second.  About 16 hours later, there was another batch of warnings indicating that 10,500 iterations were required before the FIFO had a sufficient number of free entries.  Three hours later, 100,000 iterations of waiting ensued, the hangcheck timer expired and i915_wait_request returned -EAGAIN.  I'll attach the stack traces and i915_error_state for this last hang.  No 3-D applications were running on the machine at the time.

This is all with i915_enable_rc6=1, mesa git-450f486, xf86-video-intel-2.15.0.

-Ted
Comment 13 Ted Phelps 2011-07-16 17:34:47 UTC
Created attachment 49193 [details]
Error state corresponding to comment #12.
Comment 14 Ted Phelps 2011-07-16 17:36:14 UTC
Created attachment 49194 [details]
First warning from __gen6_gt_wait_for_fifo
Comment 15 Ted Phelps 2011-07-16 17:38:15 UTC
Created attachment 49195 [details]
last batch of kernel warnings for comment #12
Comment 16 Ted Phelps 2011-07-16 17:40:13 UTC
Created attachment 49196 [details] [review]
Patch to i915_drv.c described in comment #12

This patch doesn't fix anything; it simply highlights how long we can end up waiting for GT_FIFO_FREE_ENTRIES to reach GT_FIFO_NUM_RESERVED_ENTRIES.
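The instrumentation described in comment #12 can be sketched outside the kernel. This is a user-space approximation, not the attached patch: the register read is replaced by a callback, FIFO_NUM_RESERVED_ENTRIES is a stand-in for GT_FIFO_NUM_RESERVED_ENTRIES with an assumed value, the per-poll delay is omitted, and wait_for_fifo(), fake_fifo_read() and struct fake_fifo are hypothetical names. It shows how the two warnings partition the wait time: up to 500 extra polls trigger neither, 501 to 5000 trigger only the new one, and running out of budget triggers both.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for GT_FIFO_NUM_RESERVED_ENTRIES (value illustrative). */
#define FIFO_NUM_RESERVED_ENTRIES 20u

typedef uint32_t (*fifo_reader)(void *ctx);

struct fifo_wait_result {
    uint32_t fifo;     /* last free-entry count read */
    int loop;          /* remaining poll budget when the loop exited */
    bool warn_slow;    /* added check: more than 500 polls were needed */
    bool warn_timeout; /* existing check: the budget ran out */
};

/* Poll until more than the reserved number of FIFO entries are free,
 * with the budget raised from 500 to 5000 as in comment #12. */
static struct fifo_wait_result wait_for_fifo(fifo_reader read, void *ctx)
{
    struct fifo_wait_result r;
    r.loop = 5000;
    r.fifo = read(ctx);
    while (r.fifo <= FIFO_NUM_RESERVED_ENTRIES && r.loop--) {
        /* the driver waits briefly between polls; omitted here */
        r.fifo = read(ctx);
    }
    r.warn_timeout = r.loop < 0; /* post-decrement leaves -1 on timeout */
    r.warn_slow = r.loop < 4500;
    return r;
}

/* Hypothetical simulated register: reports a full FIFO (0 free entries)
 * for the first busy_reads reads, then plenty of free entries. */
struct fake_fifo {
    int reads;
    int busy_reads;
};

static uint32_t fake_fifo_read(void *ctx)
{
    struct fake_fifo *s = ctx;
    return s->reads++ < s->busy_reads ? 0u : 100u;
}
```

For example, a simulated FIFO that stays full for 1000 reads leaves loop at 4000, so only the new warning fires; one that stays full for 10,000 reads exhausts the budget and fires both.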
Comment 17 Ted Phelps 2011-08-04 07:10:11 UTC
Created attachment 49915 [details]
Hung GPU following "Try enabling RC6 by default (again)"

Yet another i915_error_state.  This one with:

- Linux version 3.0.0-00175-g07b7ddd (on keithp/drm-intel-next)
- Mesa-7.11rc4
- xf86-video-intel-2.15.0

Cheers,
-Ted
Comment 18 Florian Mickler 2011-08-08 01:47:49 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit 4e20fa65a3ea789510eed1a15deb9e8aab2b8202
Author: Keith Packard <keithp@keithp.com>
Date:   Wed Aug 3 10:52:24 2011 -0700

    drm/i915: Try enabling RC6 by default (again)
Comment 19 Florian Mickler 2011-08-08 02:24:48 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.1-rc1:

commit 39060a07781b4930656752943cf5d66376d0533c
Author: Dave Airlie <airlied@redhat.com>
Date:   Fri Aug 5 10:56:29 2011 +0100

    Revert "drm/i915: Try enabling RC6 by default (again)"
Comment 20 Daniel Vetter 2011-09-05 02:36:56 UTC
It looks like you're booting with i915.use_semaphores=1. Can you please disable semaphores, enable rc6, and see what happens? This is just to check whether the problem lies only in the combination of the two.
Comment 21 Ted Phelps 2011-09-18 11:32:05 UTC
Sorry for the late reply.

I've seen this behavior both with and without semaphores enabled.

-Ted
Comment 22 Daniel Vetter 2011-09-30 12:49:36 UTC
Can you please check whether you're using VT-d/DMAR? If so, please try disabling that in the bios. Also please attach your full dmesg after boot.

Thanks, Daniel
Comment 23 Eugeni Dodonov 2011-10-07 09:32:33 UTC
Hi,

could you also post the output of dmidecode and lspci -vv for the machine where the issue happens, please?
Comment 24 Ted Phelps 2011-10-08 17:58:18 UTC
Apologies again for the delay.  My motherboard seems to have died; I've replaced that with a borrowed Intel DZ68DB motherboard and verified that I still see the hangcheck timer get wedged.

I'm now running with virtualization disabled in the BIOS; I'll post again if I see it hang in this configuration.

-Ted
Comment 25 Ted Phelps 2011-10-08 18:00:05 UTC
Created attachment 52131 [details]
dmesg output after reboot with virtualization disabled.
Comment 26 Ted Phelps 2011-10-08 18:00:40 UTC
Created attachment 52132 [details]
lspci -vv output
Comment 27 Ted Phelps 2011-10-08 20:23:21 UTC
Created attachment 52133 [details]
dmidecode output
Comment 28 Ted Phelps 2011-10-10 03:59:47 UTC
Ok, I think Daniel is on to something.  I've been running for 38 hours with i915_enable_rc6=1 and no GPU hangs, which is about 6 times longer than it typically takes to hang.  I'll keep an eye on it for the rest of the week, but I think we have a winner.

So, now that we think you're a genius, would you like to tell us why you thought VT-d/DMAR might be relevant?

-Ted
Comment 29 Daniel Vetter 2011-10-10 12:33:30 UTC
Ok, hopefully we'll have an angle on this now. The next thing to try is to re-enable DMAR in the BIOS and disable it on the kernel command line with

intel_iommu=off

This /should/ give the same results, but there are some slight variations possible. So please test carefully (i.e. if you can, let it run an entire week with this).

Oh, and the genius thing is a bit too much - we've simply tracked down another seemingly obscure bug to a bad interaction with VT-d, and I thought a shot in the dark rarely hurts ;-) And the proof is still out there; I won't (yet) call this "tracked down".
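When running a week-long test like this, it can be worth confirming that the kernel actually booted with the intended parameter. A minimal sketch, assuming the command line has already been read into a string (for example from /proc/cmdline); cmdline_has_token() and is_sep() are hypothetical helpers, and they only match an exact whitespace-separated token such as intel_iommu=off without interpreting its value.

```c
#include <stdbool.h>
#include <string.h>

/* True if c separates tokens on a kernel command line. */
static bool is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: does `token` appear as a whole word in `cmdline`?
 * /proc/cmdline ends with a newline, which is treated as a separator. */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
    size_t len = strlen(token);
    if (len == 0)
        return false;
    for (const char *p = cmdline; (p = strstr(p, token)) != NULL; p++) {
        bool starts = (p == cmdline) || is_sep(p[-1]);
        bool ends = p[len] == '\0' || is_sep(p[len]);
        if (starts && ends)
            return true;
    }
    return false;
}
```

Matching whole tokens rather than substrings avoids false positives from a parameter that merely contains the text, such as a hypothetical intel_iommu=offx.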
Comment 30 Lukas Hejtmanek 2011-10-13 03:20:58 UTC
(In reply to comment #29)
> Ok, hopefully we'll have an angle on this now. Next thing to try is to reenable
> dmar in the bios and disable it on the kernel cmdline with
> 
> intel_iommu=off
> 
> This /should/ give the same results, but there are some slight variations
> possible. So please test carefully (i.e. if you can, let it run an entire week
> with this).
> 
> Oh, and the genius thing is a bit too much - we've simply tracked down another
> seemingly obscure bug to bad interaction with VT-d and I thought a shot in the
> dark rarely hurts ;-) And the proof is still out there, I won't (yet) call this
> "tracked down".

I disabled VT-d in the BIOS, but I get screen corruption when rc6 is enabled. Is it related?

-- 
Lukas Hejtmanek
Comment 31 Daniel Vetter 2011-10-13 05:51:23 UTC
> --- Comment #30 from Lukas Hejtmanek <xhejtman@fi.muni.cz> 2011-10-13
> I disabled VT-d in BIOS but I got screen corruption if rc6 enabled. Is it
> related?

Maybe. We have another report (#41682) implicating rc6 in render
glitches. Can you post a screenshot/screencast of it happening?
Comment 32 Lukas Hejtmanek 2011-10-13 06:20:53 UTC
(In reply to comment #31)
> > --- Comment #30 from Lukas Hejtmanek <xhejtman@fi.muni.cz> 2011-10-13
> > I disabled VT-d in BIOS but I got screen corruption if rc6 enabled. Is it
> > related?
> 
> Maybe. We have another report (#41682) implicating rc6 in render
> glitches. Can you post a screenshoot/screencast of it happening?

OK, I will follow #41682 and provide a screenshot there. Btw, corruption seems to be related only to glyphs.
Comment 33 Daniel Vetter 2011-10-13 06:28:47 UTC
> --- Comment #32 from Lukas Hejtmanek <xhejtman@fi.muni.cz> 2011-10-13 06:20:53 PDT ---
> OK, I will follow #41682 and provide a screenshot there. Btw, corruption seems
> to be related only to glyphs.

Please post your screenshot here on this bug. With such
hard-to-track-down issues like this it's usually better to keep
reports separate till there's proof in form of a fix that they're
indeed the same issue. Too much risk of (needless) confusion.
Comment 34 Lukas Hejtmanek 2011-10-13 06:40:21 UTC
Created attachment 52292 [details]
Render glitch with rc6=1
Comment 35 Lukas Hejtmanek 2011-10-13 06:40:41 UTC
(In reply to comment #33)
> > --- Comment #32 from Lukas Hejtmanek <xhejtman@fi.muni.cz> 2011-10-13 06:20:53 PDT ---
> > OK, I will follow #41682 and provide a screenshot there. Btw, corruption seems
> > to be related only to glyphs.
> 
> Please post your screenshot here on this bug. With such
> hard-to-track-down issues like this it's usually better to keep
> reports separate till there's proof in form of a fix that they're
> indeed the same issue. Too much risk of (needless) confusion.

Attached.
Comment 36 Lukas Hejtmanek 2011-10-13 06:52:17 UTC
(In reply to comment #35)
> (In reply to comment #33)
> > > --- Comment #32 from Lukas Hejtmanek <xhejtman@fi.muni.cz> 2011-10-13 06:20:53 PDT ---
> > > OK, I will follow #41682 and provide a screenshot there. Btw, corruption seems
> > > to be related only to glyphs.
> > 
> > Please post your screenshot here on this bug. With such
> > hard-to-track-down issues like this it's usually better to keep
> > reports separate till there's proof in form of a fix that they're
> > indeed the same issue. Too much risk of (needless) confusion.
> 
> Attached..

Hmm, it looks like this is not *that* issue; this screenshot looks like glyphs rendered to a completely wrong surface. I'll try to catch another screenshot that is directly related to the rc6 issue.

I think so because running forcewaked does not prevent the issue in the attachment.
Comment 37 Lukas Hejtmanek 2011-10-13 06:54:40 UTC
Created attachment 52293 [details]
True rc6 render glitch
Comment 38 Daniel Vetter 2011-10-13 06:57:41 UTC
On Thu, Oct 13, 2011 at 15:52,  <bugzilla-daemon@freedesktop.org> wrote:
> hmm, looks like this is not *that* issue, this screenshot looks like glyphs
> rendered to completely wrong surface. I try to catch another screenshot that is
> directly related to rc6 issue.
>
> I guess so as running forcewaked does not prevent the issue in the attachement.

Can you elaborate a bit on what exactly you mean here? I.e. is the
"true" rc6 glitch prevented by running forcewaked, whereas the previous
screenshot is only prevented by disabling rc6 on the kernel command line?
Comment 39 Lukas Hejtmanek 2011-10-13 07:11:32 UTC
(In reply to comment #38)
> On Thu, Oct 13, 2011 at 15:52,  <bugzilla-daemon@freedesktop.org> wrote:
> > hmm, looks like this is not *that* issue, this screenshot looks like glyphs
> > rendered to completely wrong surface. I try to catch another screenshot that is
> > directly related to rc6 issue.
> >
> > I guess so as running forcewaked does not prevent the issue in the attachement.
> 
> Can you elaborate a bit on what you exactly mean here? I.e. is the
> "true" rc6 glich prevented by running forcewaked, whereas the previous
> screenshoot is only prevented by disabling rc6 on the kernel cmdline?

It looks like this:
attachment id=52293 happens when rc6=1 AND forcewaked is not running.

attachment id=52292 happens independently of rc6 (whether disabled on the kernel command line or via forcewaked); I just checked this and I am able to reproduce it.
Comment 40 Daniel Vetter 2011-10-13 07:15:00 UTC
For the render corruptions, can you please try the latest git version of xf86-video-intel, specifically

commit d0184b59095d5b8fab1a65ceba075d29189130d4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun Oct 9 18:43:14 2011 +0200

    snb: implement PIPE_CONTROL workaround
Comment 41 Lukas Hejtmanek 2011-10-13 07:38:13 UTC
(In reply to comment #40)
> For the render corruptions, can you please try the latest git version of
> xf86-video-intel, specifically
> 
> commit d0184b59095d5b8fab1a65ceba075d29189130d4
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Sun Oct 9 18:43:14 2011 +0200
> 
>     snb: implement PIPE_CONTROL workaround

I am running:
commit 823a4272c50247482428a16cb08741bf87a302ea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 11 13:51:41 2011 +0100

    sna/gen3: Avoid RENDER/BLT context switch for fill boxes
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

and it is still bad.

The first bad commit is:
commit c5414ec992d935e10156a2b513d5ec2dded2f689
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Oct 2 12:02:41 2011 +0100

    sna: Use BLT operations to avoid fallbacks in core glyph rendering
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 42 Chris Wilson 2011-10-13 07:42:27 UTC
(In reply to comment #41) 
> The first bad commit is:
> commit c5414ec992d935e10156a2b513d5ec2dded2f689
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sun Oct 2 12:02:41 2011 +0100
> 
>     sna: Use BLT operations to avoid fallbacks in core glyph rendering
> 
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Lukas, are you using SNA? If so, can you file a separate bug report, as the original report predates SNA and in particular that bisection? Thanks.
Comment 43 Daniel Vetter 2011-10-13 07:43:53 UTC
On Thu, Oct 13, 2011 at 16:42,  <bugzilla-daemon@freedesktop.org> wrote:
> Lukas, are you SNA? If so can you file a separate bug report as the original
> predates SNA and in particular that bisection. Thanks.

Also please recheck the rc6 related issues reported here with sna disabled.
Comment 44 Lukas Hejtmanek 2011-10-13 07:48:44 UTC
(In reply to comment #42)
> (In reply to comment #41) 
> > The first bad commit is:
> > commit c5414ec992d935e10156a2b513d5ec2dded2f689
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Sun Oct 2 12:02:41 2011 +0100
> > 
> >     sna: Use BLT operations to avoid fallbacks in core glyph rendering
> > 
> >     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Lukas, are you SNA? If so can you file a separate bug report as the original
> predates SNA and in particular that bisection. Thanks.

Yes, I use SNA. I added comments to #41718; is that OK?
Comment 45 Lukas Hejtmanek 2011-10-13 07:57:23 UTC
(In reply to comment #43)
> On Thu, Oct 13, 2011 at 16:42,  <bugzilla-daemon@freedesktop.org> wrote:
> > Lukas, are you SNA? If so can you file a separate bug report as the original
> > predates SNA and in particular that bisection. Thanks.
> 
> Also please recheck the rc6 related issues reported here with sna disabled.

It seems that it is SNA-related. I don't see the issue without SNA. I will do more testing.
Comment 46 Lukas Hejtmanek 2011-10-13 08:03:14 UTC
(In reply to comment #45)
> (In reply to comment #43)
> > On Thu, Oct 13, 2011 at 16:42,  <bugzilla-daemon@freedesktop.org> wrote:
> > > Lukas, are you SNA? If so can you file a separate bug report as the original
> > > predates SNA and in particular that bisection. Thanks.
> > 
> > Also please recheck the rc6 related issues reported here with sna disabled.
> 
> it seems that it is SNA related. I don't see the issue without SNA. I will do
> more testing..

Well, it doesn't. See the attachment: this is a screenshot of corruption *without* SNA, *with* rc6=1, and *without* forcewaked running. I'll try to reproduce it with forcewaked.
Comment 47 Lukas Hejtmanek 2011-10-13 08:04:20 UTC
Created attachment 52299 [details]
rc6 render glitch without SNA
Comment 48 Ted Phelps 2011-10-22 18:53:56 UTC
Re: comment #29, I've re-enabled virtualization and disabled the IOMMU (intel_iommu=off) and haven't seen a GPU hang after 10 days.

Please let me know if there's anything further I can test for you.

-Ted
Comment 49 Florian Mickler 2012-01-21 08:52:37 UTC
A patch referencing this bug report has been merged in Linux v3.2-rc6:

commit c0f372b3746d4ede07b2ace2beabd38d9c045b25
Author: Keith Packard <keithp@keithp.com>
Date:   Wed Nov 16 22:24:52 2011 -0800

    drm/i915: By default, enable RC6 on IVB and SNB when reasonable
Comment 50 Eugeni Dodonov 2012-03-26 19:30:05 UTC
I am pretty sure this should be resolved with the 3.3 kernel, where we disabled RC6p on Sandy Bridge. But if it is still an issue, please reopen it so we can investigate once again.
Comment 51 Florian Mickler 2012-04-16 14:33:18 UTC
A patch referencing this bug report has been merged in Linux v3.4-rc2:

commit aa46419186992e6b8b8010319f0ca7f40a0d13f5
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date:   Fri Mar 23 11:57:19 2012 -0300

    drm/i915: enable plain RC6 on Sandy Bridge by default
