Bug 93844

Summary: RetroArch refuses to go back into windowed mode after fullscreen with Intel i915 graphics
Product: DRI
Reporter: Diego Viola <diego.viola>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: anarsoul, intel-gfx-bugs
Version: unspecified
Keywords: regression
Hardware: x86-64 (AMD64)
OS: Linux (All)
URL: Force AttendClient after DRI2DrawableGone
Whiteboard:
i915 platform: G45
i915 features:
Bug Depends on:
Bug Blocks: 95381
Attachments (description; flags: none on all):
Xorg.0.log
dmidecode
lspci
dmesg after X crashing
glxinfo
Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1
Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1
Xorg.0.log with latest crash after trying the patch
Xorg.0.log backtrace with debug enabled
Xorg.0.log after trying xf86-video-intel 1:2.99.917+519+g8229390-1
Xorg.0.log with debug=full
Xorg.0.log.xz
RetroArch Input Hotkey Binds
RetroArch backtrace after it hangs
Restore signalling on Window destruction
Xorg.0.log with patch applied and debug=full
Restore signalling on Window destruction
Force AttendClient after DRI2DrawableGone
Xorg.0.log with latest driver and debug=full
Xorg.0.log with debug=full and kernel 4.5.0
Arch Linux kernel config
x86_64_defconfig
git bisect log
localmodconfig config
git bisect log 2
git bisect v4.3 and v4.4

Description Diego Viola 2016-01-24 21:56:48 UTC
Created attachment 121248 [details]
Xorg.0.log
Comment 1 Diego Viola 2016-01-24 21:57:40 UTC
My system:

Arch Linux x86-64

$ uname -a
Linux myhost 4.3.3-3-ARCH #1 SMP PREEMPT Wed Jan 20 08:12:23 CET 2016 x86_64 GNU/Linux
Comment 2 Diego Viola 2016-01-24 21:58:48 UTC
Created attachment 121249 [details]
dmidecode
Comment 3 Diego Viola 2016-01-24 22:04:33 UTC
Created attachment 121250 [details]
lspci
Comment 4 Diego Viola 2016-01-24 22:05:22 UTC
RetroArch was at version 1.3.1 when I tested this. I can reproduce the problem: I just have to toggle fullscreen and then press f again to switch to windowed mode, and X crashes.
Comment 5 Diego Viola 2016-01-24 22:05:57 UTC
I use i3wm as my window manager.

$ i3 --version
i3 version 4.11 (2015-09-30, branch "4.11") © 2009 Michael Stapelberg and contributors
Comment 6 Diego Viola 2016-01-24 22:10:56 UTC
Created attachment 121251 [details]
dmesg after X crashing
Comment 7 Diego Viola 2016-01-24 22:11:54 UTC
Created attachment 121252 [details]
glxinfo
Comment 8 Chris Wilson 2016-01-25 09:11:55 UTC
The quickest way, with a fairly reproducible test case like that, is to build http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ with --enable-debug and retest.
Comment 9 Diego Viola 2016-01-25 19:11:34 UTC
Created attachment 121276 [details]
Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1
Comment 10 Diego Viola 2016-01-25 19:12:14 UTC
(In reply to Diego Viola from comment #9)
> Created attachment 121276 [details]
> Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1

Wrong attachment.
Comment 11 Diego Viola 2016-01-25 19:13:26 UTC
Created attachment 121277 [details]
Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1
Comment 12 Diego Viola 2016-01-25 19:15:55 UTC
Just posted the right attachment (logs) with the backtrace. Please let me know if you need more info.
Comment 13 Chris Wilson 2016-01-25 21:45:05 UTC
Can you please try:

commit 8ab71cd3293ad420b0cdf487e8d5c66170ddc13c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 25 21:41:57 2016 +0000

    sna/dri2: Guard signalling swap completion after a FLIP
    
    Before sending the frame swap complete signal after a FLIP, make sure
    the client didn't die in the meantime.
    
    Reported-by: Diego Viola <diego.viola@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=93844
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 14 Diego Viola 2016-01-25 22:37:34 UTC
(In reply to Chris Wilson from comment #13)
> Can you please try:
> 
> commit 8ab71cd3293ad420b0cdf487e8d5c66170ddc13c
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Jan 25 21:41:57 2016 +0000
> 
>     sna/dri2: Guard signalling swap completion after a FLIP
>     
>     Before sending the frame swap complete signal after a FLIP, make sure
>     the client didn't die in the meantime.
>     
>     Reported-by: Diego Viola <diego.viola@gmail.com>
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=93844
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I tried it, but it doesn't resolve the problem; X still crashes.
Comment 15 Diego Viola 2016-01-25 22:38:17 UTC
Created attachment 121286 [details]
Xorg.0.log with latest crash after trying the patch
Comment 16 Chris Wilson 2016-01-25 22:51:34 UTC
Can you please recompile with ./configure --enable-debug ? It should throw an assertion much earlier, or at least complain about the error.
Comment 17 Diego Viola 2016-01-25 23:08:19 UTC
Created attachment 121288 [details]
Xorg.0.log backtrace with debug enabled
Comment 18 Diego Viola 2016-01-25 23:08:30 UTC
(In reply to Chris Wilson from comment #16)
> Can you please recompile with ./configure --enable-debug ? It should throw
> an assertion much earlier, or at least complain about the error.

Done.
Comment 19 Diego Viola 2016-01-26 17:50:44 UTC
Any ideas about the issue?
Comment 20 Chris Wilson 2016-01-26 20:43:44 UTC
I've gone over the places where info->signal can be set to true while info->draw == NULL, and I think I have found the last missing piece:

diff --git a/src/sna/sna_dri2.c b/src/sna/sna_dri2.c
index f2f4908..045b12d 100644
--- a/src/sna/sna_dri2.c
+++ b/src/sna/sna_dri2.c
@@ -2787,6 +2787,9 @@ sna_dri2_flip_continue(struct sna_dri2_event *info)
        info->type = info->flip_continue;
        info->flip_continue = 0;
 
+       if (info->draw == NULL)
+               return false;
+
        if (info->sna->mode.front_active == 0)
                return false;
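To make the ordering in that diff concrete, here is a minimal C sketch. It is a hypothetical model, not driver code: `struct flip_event` only mimics the `draw`, `type` and `flip_continue` fields visible in the diff, and `front_active` stands in for `sna->mode.front_active`. The point it illustrates is that the `draw == NULL` check has to come before anything that would use the Drawable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an X Drawable; only its existence matters here. */
struct drawable {
    int id;
};

/* Simplified model of a pending flip event, NOT the real struct sna_dri2_event.
 * Only the fields touched by the diff are kept. */
struct flip_event {
    struct drawable *draw;  /* set to NULL once the Drawable is destroyed */
    int type;               /* current event type */
    int flip_continue;      /* next event type to chain, 0 if none */
    bool front_active;      /* models sna->mode.front_active != 0 */
};

/* Returns true if the flip chain may continue, false if it must stop. */
static bool flip_continue(struct flip_event *info)
{
    info->type = info->flip_continue;
    info->flip_continue = 0;

    /* The Drawable is gone: there is nothing left to flip or to signal. */
    if (info->draw == NULL)
        return false;

    /* No active front buffer: flipping is impossible. */
    if (!info->front_active)
        return false;

    return true;
}
```

Even when `draw` is NULL the event still consumes its `flip_continue` value, so the chain is terminated cleanly instead of being left half-pending.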
Comment 21 Diego Viola 2016-01-26 21:45:22 UTC
Should I try the code above or will you provide a patch?
Comment 22 Chris Wilson 2016-01-27 10:26:33 UTC
(In reply to Diego Viola from comment #21)
> Should I try the code above or will you provide a patch?

I was hoping you could quickly test the above. :)
Comment 23 Chris Wilson 2016-01-27 10:56:56 UTC
commit 7817949314a21293c8bc34dec214b42932b19aaf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 27 10:54:46 2016 +0000

    sna/dri2: Avoiding marking a pending-signal on a dead Drawable
    
    If the Drawable is gone, we cannot send it a frame-complete signal, and
    in particular we cannot continue the pending flip-chain.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=93844
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Hopefully that's the last of this bug.
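The commit message above describes a second, related guard: never request a swap-complete signal for a Drawable that no longer exists. A minimal model of that rule follows, assuming a single boolean `signal` flag per pending swap; the names `pending_swap`, `mark_pending_signal`, `complete_swap` and `drawable_gone` are illustrative only, not the driver's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an X Drawable. */
struct drawable {
    int id;
};

/* Hypothetical pending swap: which Drawable it targets and whether a
 * swap-complete event should be delivered when it finishes. */
struct pending_swap {
    struct drawable *draw;  /* NULL once the Drawable is destroyed */
    bool signal;            /* true => send swap-complete to the client */
};

/* Request a completion signal only while the Drawable is alive. */
static void mark_pending_signal(struct pending_swap *info)
{
    info->signal = info->draw != NULL;
}

/* Drawable destroyed: forget it and drop any signal we owed it. */
static void drawable_gone(struct pending_swap *info)
{
    info->draw = NULL;
    info->signal = false;
}

/* Swap finished: returns the number of swap-complete events sent (0 or 1). */
static int complete_swap(struct pending_swap *info)
{
    if (!info->signal)
        return 0;
    info->signal = false;
    return 1;
}
```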
Comment 24 Diego Viola 2016-01-27 11:23:14 UTC
I'll give this a try today and report back; I'm not at my computer right now.

Thanks a lot for your help.
Comment 25 Diego Viola 2016-01-27 22:02:11 UTC
Created attachment 121333 [details]
Xorg.0.log after trying xf86-video-intel 1:2.99.917+519+g8229390-1

The problem still persists after trying the latest changes from git.
Comment 26 Diego Viola 2016-01-27 22:16:54 UTC
Sorry, I had installed the latest build incorrectly. With the latest changes from git the crash is actually fixed, and I can't get it to crash anymore.
Comment 27 Diego Viola 2016-01-27 22:21:46 UTC
The crash is now fixed with the latest code from HEAD. However, there's one more problem: after pressing f to go into fullscreen mode, I can't press f to go back; the image freezes instead of returning to windowed mode.

I can press f again and the program keeps responding, but I can't find a way to go back to windowed mode.

Any ideas?
Comment 28 Diego Viola 2016-01-27 22:23:22 UTC
OK, so after pressing f a few times the program stops responding altogether.
Comment 29 Chris Wilson 2016-01-27 23:01:05 UTC
Hmm, at that point, can you send me a ./configure --enable-debug=full Xorg.log? Sounds like we are eating a signal too many.
Comment 30 Diego Viola 2016-01-27 23:08:59 UTC
(In reply to Chris Wilson from comment #29)
> Hmm, at that point, can you send me a ./configure --enable-debug=full
> Xorg.log? Sounds like we are eating a signal too many.

Sure, 1 min.

Thanks.
Comment 31 Diego Viola 2016-01-27 23:25:22 UTC
Created attachment 121336 [details]
Xorg.0.log with debug=full

Attached the Xorg.0.log with debug=full and compressed it with xz since uncompressed it's about 64MB.
Comment 32 Diego Viola 2016-01-27 23:32:32 UTC
(In reply to Diego Viola from comment #31)
> Created attachment 121336 [details]
> Xorg.0.log with debug=full
> 
> Attached the Xorg.0.log with debug=full and compressed it with xz since
> uncompressed it's about 64MB.

I also forgot to say that I hit f (fullscreen) a few times when I captured that log in RetroArch.
Comment 33 Diego Viola 2016-01-27 23:47:41 UTC
One thing I've noticed after trying your changes is that the Refresh Rate in RetroArch stays at 60 Hz, so the FPS is much better now.

Before that, I had an issue where the refresh rate in RetroArch would be too high, so emulation ran too fast; to fix it I had to switch into fullscreen mode and back to windowed mode.

It was extremely annoying. I had reported this issue to the RetroArch developers, but they said the driver or my refresh rate was the cause.
Comment 34 Chris Wilson 2016-01-28 09:42:32 UTC
(In reply to Diego Viola from comment #32)
> (In reply to Diego Viola from comment #31)
> > Created attachment 121336 [details]
> > Xorg.0.log with debug=full
> > 
> > Attached the Xorg.0.log with debug=full and compressed it with xz since
> > uncompressed it's about 64MB.
> 
> I also forgot to say that I hit f (fullscreen) a few times when I captured
> that log in RetroArch.

It starts as a window, swapping every vblank. It destroys that window, creates a fullscreen one, and swaps every vblank. It then destroys that window and goes quiet. The bug I thought I might see would be where we have a listener but no vblank (and rendering/program updates would cease); that doesn't appear to be the case here, it looks fairly regular.
Comment 35 Diego Viola 2016-01-28 16:16:27 UTC
About the FPS problem I mentioned earlier, I tested xf86-video-intel-git again and tried to reproduce the issue.

Here's how it went:

1) Started a game with RetroArch in tiling mode (I use i3), with the game on the right and the terminal on the left (I use termite).
2) Pressed Shift+Super+Space to make it go into float mode.

Then the FPS would go as high as ~400 FPS.

Should I open a separate bug report for this? It appears the crash is fixed.
Comment 36 Chris Wilson 2016-01-28 16:27:01 UTC
Higher than vblank refresh rate implies that the swap buffers target is off-screen. It should be picking the primary CRTC's vrefresh if available. Might as well see if you can capture that in a full-debug and attach it. Just need the tail really to see where the drawable is when swap buffers is called, and why it is not choosing to use sync-to-some-vblank.
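One way to picture the CRTC selection described above is the sketch below. It is an assumption-laden illustration, not the driver's algorithm: it picks the active CRTC that overlaps the drawable the most and, if the drawable is entirely off-screen, falls back to the primary CRTC. Only when neither exists is there no vblank to sync to, which is the case that would let a client swap faster than the refresh rate. All struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Half-open rectangle: [x1, x2) x [y1, y2). */
struct rect {
    int x1, y1, x2, y2;
};

/* Hypothetical CRTC description. */
struct crtc {
    struct rect area;
    bool active;
    int vrefresh_hz;
};

/* Area of the intersection of two rectangles, 0 if they do not overlap. */
static long overlap_area(struct rect a, struct rect b)
{
    int x1 = a.x1 > b.x1 ? a.x1 : b.x1;
    int y1 = a.y1 > b.y1 ? a.y1 : b.y1;
    int x2 = a.x2 < b.x2 ? a.x2 : b.x2;
    int y2 = a.y2 < b.y2 ? a.y2 : b.y2;
    if (x2 <= x1 || y2 <= y1)
        return 0;
    return (long)(x2 - x1) * (y2 - y1);
}

/* Returns the index of the CRTC to sync swaps to, or -1 if none is usable. */
static int pick_vblank_crtc(const struct crtc *crtcs, int n, int primary,
                            struct rect drawable)
{
    int best = -1;
    long best_area = 0;

    for (int i = 0; i < n; i++) {
        if (!crtcs[i].active)
            continue;
        long a = overlap_area(crtcs[i].area, drawable);
        if (a > best_area) {
            best_area = a;
            best = i;
        }
    }

    /* Off-screen drawable: fall back to the primary CRTC if it is active. */
    if (best < 0 && primary >= 0 && primary < n && crtcs[primary].active)
        best = primary;
    return best;
}
```

A caller would then throttle to `crtcs[idx].vrefresh_hz`; a result of -1 is the "no vblank available" case.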
Comment 37 Diego Viola 2016-01-28 16:53:44 UTC
Created attachment 121355 [details]
Xorg.0.log.xz

Attaching the full Xorg.0.log in compressed/xz format because I don't know how big the tail should be.
Comment 38 Chris Wilson 2016-01-28 17:07:43 UTC
Hmm. It is hitting DRI2CopyRegion instead. It may be doing a DRI2SwapBuffers that the xserver/dri2 core converts to CopyRegion if vblank_mode=0, or the client may be doing GLX_MESA_copy_sub_buffer (which calls DRI2CopyRegion directly). CopyRegion is not rate-limited.
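The difference between a vblank-synchronised swap and an unthrottled CopyRegion can be seen with a toy timing model. The 2.5 ms render time and 60 Hz refresh used below are illustrative assumptions chosen to match the ~400 FPS reported in comment 35; `frames_per_second` is hypothetical and simply counts the frames a client completes within one simulated second.

```c
#include <stdbool.h>

/* Frames completed in one second when each frame takes render_us
 * microseconds to draw and is either presented at the next vblank of a
 * refresh_hz display (vblank_sync) or presented immediately (no throttle).
 * Returns -1 for invalid inputs. */
static int frames_per_second(int render_us, int refresh_hz, bool vblank_sync)
{
    const long second_us = 1000000;
    long t = 0;
    int frames = 0;

    if (render_us <= 0 || refresh_hz <= 0)
        return -1;

    for (;;) {
        t += render_us;                              /* draw the frame */
        if (vblank_sync) {
            long period = second_us / refresh_hz;
            t = ((t + period - 1) / period) * period; /* wait for next vblank */
        }
        if (t > second_us)
            break;
        frames++;
    }
    return frames;
}
```

With these inputs the synchronised path settles at 60 frames per second and the unsynchronised path at 400. A renderer slower than one refresh period (for example 20 ms per frame) drops to 30, because every frame waits for the next vblank after it finishes.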
Comment 39 Diego Viola 2016-01-28 17:15:27 UTC
(In reply to Chris Wilson from comment #38)
> Hmm. It is hitting DRI2CopyRegion instead. It may be doing a DRI2SwapBuffers
> that the xserver/dri2 core converts to CopyRegion is vblank_mode=0, or the
> client may be doing GLX_MESA_copy_sub_buffers (which calls DRI2CopyRegion
> directly). CopyRegion is not rate-limited.

OK, found the problem: it's not a bug.

I found that pressing space enables/disables frame throttling in RetroArch.

I wish I knew that before. :-(
Comment 40 Diego Viola 2016-01-28 17:16:05 UTC
Pressing Space in RetroArch makes the FPS go high; pressing it again brings the FPS back to normal.
Comment 41 Diego Viola 2016-01-28 17:19:06 UTC
OK, so my only remaining issue is the fullscreen-and-back-to-windowed-mode bug.
Comment 42 Diego Viola 2016-01-28 17:29:58 UTC
Created attachment 121356 [details]
RetroArch Input Hotkey Binds
Comment 43 Diego Viola 2016-01-29 17:54:41 UTC
So any ideas about my issue with fullscreen? How can we debug it further?
Comment 44 Diego Viola 2016-01-31 00:44:08 UTC
So I just switched to UXA with the latest xf86-video-intel from git and RetroArch doesn't hang anymore when I switch between fullscreen and windowed mode.
Comment 45 Diego Viola 2016-01-31 02:05:23 UTC
Created attachment 121419 [details]
RetroArch backtrace after it hangs

Attaching this backtrace in case it helps.
Comment 46 Chris Wilson 2016-01-31 09:10:10 UTC
Created attachment 121422 [details] [review]
Restore signalling on Window destruction

I dropped this code recently because I thought it was the source of the DRI2 BadDrawable errors. Hopefully it is not, as we need to undo DRI2's IgnoreClient ourselves: DRI2 itself does not.
Comment 47 Chris Wilson 2016-01-31 11:13:19 UTC
I think the bug is really:

diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
index 5d54ab9..af7d136 100644
--- a/hw/xfree86/dri2/dri2.c
+++ b/hw/xfree86/dri2/dri2.c
@@ -420,6 +420,9 @@ DRI2DrawableGone(void *p, XID id)
         (*pDraw->pScreen->DestroyPixmap)(pPriv->redirectpixmap);
     }
 
+    if (pPriv->blockedClient)
+        AttendClient(pPriv->blockedClient);
+
     free(pPriv);
 
     return Success;
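The effect of the missing `AttendClient` can be modelled in a few lines. This is a simplified sketch under stated assumptions: the X server tracks ignored clients with a counter roughly along these lines, but `struct client`, `ignore_client`, `attend_client`, `struct dri2_drawable_priv` and `drawable_gone` below are hypothetical stand-ins, not the xserver's code. It shows that freeing the DRI2 private while its client is still ignored leaves that client permanently unserviced unless the destructor attends it first, which is what the diff adds.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a client's "ignored" state as a nesting counter. */
struct client {
    int ignore_count;
};

static void ignore_client(struct client *c)
{
    c->ignore_count++;
}

static void attend_client(struct client *c)
{
    if (c->ignore_count > 0)
        c->ignore_count--;
}

/* The server only reads requests from clients that are not ignored. */
static bool client_is_serviced(const struct client *c)
{
    return c->ignore_count == 0;
}

/* Hypothetical per-Drawable DRI2 private, reduced to the field in the diff. */
struct dri2_drawable_priv {
    struct client *blocked_client;
};

/* Model of DRI2DrawableGone; with_fix applies the AttendClient from the diff. */
static void drawable_gone(struct dri2_drawable_priv *priv, bool with_fix)
{
    if (with_fix && priv->blocked_client)
        attend_client(priv->blocked_client);
    free(priv);  /* after this, nobody knows the client is still blocked */
}
```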
Comment 48 Diego Viola 2016-01-31 18:54:00 UTC
(In reply to Chris Wilson from comment #47)
> I think the bug is really:
> 
> diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
> index 5d54ab9..af7d136 100644
> --- a/hw/xfree86/dri2/dri2.c
> +++ b/hw/xfree86/dri2/dri2.c
> @@ -420,6 +420,9 @@ DRI2DrawableGone(void *p, XID id)
>          (*pDraw->pScreen->DestroyPixmap)(pPriv->redirectpixmap);
>      }
>  
> +    if (pPriv->blockedClient)
> +        AttendClient(pPriv->blockedClient);
> +
>      free(pPriv);
>  
>      return Success;

Thanks, should I try this patch or the previous one too?

Sorry, I'm a little confused.
Comment 49 Chris Wilson 2016-02-01 08:06:40 UTC
(In reply to Diego Viola from comment #48)
> (In reply to Chris Wilson from comment #47)
> > I think the bug is really:
> > 
> > diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
> > index 5d54ab9..af7d136 100644
> > --- a/hw/xfree86/dri2/dri2.c
> > +++ b/hw/xfree86/dri2/dri2.c
> > @@ -420,6 +420,9 @@ DRI2DrawableGone(void *p, XID id)
> >          (*pDraw->pScreen->DestroyPixmap)(pPriv->redirectpixmap);
> >      }
> >  
> > +    if (pPriv->blockedClient)
> > +        AttendClient(pPriv->blockedClient);
> > +
> >      free(pPriv);
> >  
> >      return Success;
> 
> Thanks, should I try this patch or the previous one too?

The first one, because I need to work around the issue.
Comment 50 Diego Viola 2016-02-01 14:11:27 UTC
I applied the patch from #46 and I still can reproduce the issue.
Comment 51 Chris Wilson 2016-02-01 16:04:03 UTC
(In reply to Diego Viola from comment #50)
> I applied the patch from #46 and I still can reproduce the issue.

Do you mind running with --enable-debug=full again? I want to see if we are sending that signal when the window is destroyed.
Comment 52 Diego Viola 2016-02-01 16:05:34 UTC
(In reply to Chris Wilson from comment #51)
> (In reply to Diego Viola from comment #50)
> > I applied the patch from #46 and I still can reproduce the issue.
> 
> Do you mind running with -enable-debug=full again? I want to see if we are
> sending that signal when the window is destroyed.

Should I do that with your patch applied?
Comment 53 Diego Viola 2016-02-01 16:45:27 UTC
(In reply to Diego Viola from comment #52)
> (In reply to Chris Wilson from comment #51)
> > (In reply to Diego Viola from comment #50)
> > > I applied the patch from #46 and I still can reproduce the issue.
> > 
> > Do you mind running with -enable-debug=full again? I want to see if we are
> > sending that signal when the window is destroyed.
> 
> Should I do that with your patch applied?

Ping?
Comment 54 Diego Viola 2016-02-02 01:49:07 UTC
Created attachment 121449 [details]
Xorg.0.log with patch applied and debug=full

Applied patch from #46 and enabled debug=full, please see the log attached.
Comment 55 Chris Wilson 2016-02-02 09:20:45 UTC
Created attachment 121453 [details] [review]
Restore signalling on Window destruction

That had the expected error of DRI2 complaining it had freed its private data before we could destroy the window, i.e. too late to fix the AttendClient.

This second attempt uses a resource type instead of chaining up the Window destruction.
Comment 56 Chris Wilson 2016-02-02 15:51:53 UTC
I've managed to reproduce the same issue by destroying the window from a second connection (i.e. the window manager closing the window instead of the client). I haven't yet found a way to avoid the bug in DRI2DrawableGone: the workaround depends upon our destructor running first, and nothing guarantees that it will (DRI2DrawableGone even runs before the Window is unrealized).
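The ordering problem described above can be illustrated with a toy resource table. This is a hypothetical model: it runs the destructors registered for one XID in insertion order, whereas the real server gives the driver no ordering guarantee, which is exactly the difficulty. If the driver's cleanup happens to run before the DRI2 private is freed, the client is unblocked; if it runs after, the information it needs is already gone. All names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DESTRUCTORS 4

/* Shared state standing in for the DRI2 per-Drawable private. */
struct drawable_state {
    bool priv_alive;      /* DRI2 private still allocated */
    bool client_blocked;  /* client still ignored by the server */
};

typedef void (*destructor_fn)(struct drawable_state *);

/* Hypothetical resource list: destructors for one XID, run in insertion order. */
struct resource_list {
    destructor_fn fns[MAX_DESTRUCTORS];
    int count;
};

static bool add_resource(struct resource_list *r, destructor_fn fn)
{
    if (r->count == MAX_DESTRUCTORS)
        return false;
    r->fns[r->count++] = fn;
    return true;
}

static void free_resource(struct resource_list *r, struct drawable_state *s)
{
    for (int i = 0; i < r->count; i++)
        r->fns[i](s);
    r->count = 0;
}

/* Models DRI2DrawableGone without the xserver fix: frees the private only. */
static void dri2_gone(struct drawable_state *s)
{
    s->priv_alive = false;
}

/* Models a driver-side cleanup that needs the private to unblock the client. */
static void driver_gone(struct drawable_state *s)
{
    if (s->priv_alive)
        s->client_blocked = false;
}
```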
Comment 57 Chris Wilson 2016-02-02 17:28:43 UTC
Knowing the trigger, it is reproducible on UXA as well - just harder as UXA doesn't do triple buffering so has a smaller race window.
Comment 58 Diego Viola 2016-02-02 17:48:54 UTC
Glad you were able to reproduce this. Should I try the patch from #55, or is there no need?

Thanks.
Comment 59 Chris Wilson 2016-02-02 18:32:01 UTC
(In reply to Diego Viola from comment #58)
> Glad you were able to reproduce this, should I try the patch from #55 or no
> need?

Indeed, no need. Nothing short of fixing it in the xserver works for me, so far.
Comment 60 Diego Viola 2016-02-02 21:10:47 UTC
Should we open a separate bug for the X server?
Comment 61 Chris Wilson 2016-02-03 16:29:30 UTC
Created attachment 121495 [details] [review]
Force AttendClient after DRI2DrawableGone

Best, most horrible, attempt yet. It works, but may cause false positives.
Comment 62 Chris Wilson 2016-02-03 16:30:21 UTC
(In reply to Diego Viola from comment #60)
> Should we open a separate bug for the X server?

http://patchwork.freedesktop.org/patch/72436/
Comment 63 Diego Viola 2016-02-03 17:11:25 UTC
(In reply to Chris Wilson from comment #62)
> (In reply to Diego Viola from comment #60)
> > Should we open a separate bug for the X server?
> 
> http://patchwork.freedesktop.org/patch/72436/

Thank you so much, I'll just wait for the proper fix to be implemented. :-)
Comment 64 Diego Viola 2016-03-07 20:45:53 UTC
Just upgraded to Linux 4.5 and the problem is fixed now. Thank you.

Closing.
Comment 65 Diego Viola 2016-03-07 20:47:02 UTC
4.5.0-rc7
Comment 66 Diego Viola 2016-03-25 18:16:42 UTC
Tested with xorg-server 1.18.2-4 and Linux 4.4.5-1-ARCH as well and it's fixed.

Thanks.
Comment 67 Diego Viola 2016-03-29 03:25:11 UTC
I've just tried the fix with xorg-server 1.18.2 and RetroArch no longer hangs after I launch it (main menu) and when I toggle it between fullscreen and windowed mode.

However, when opening a core and a game I can't go from fullscreen to windowed mode (it hangs like before).

Reopening this bug report.

Linux 4.4.5-1-ARCH
xorg-server 1.18.2-4
RetroArch v1.3.3 -- 577f210
Comment 68 Chris Wilson 2016-03-29 07:22:18 UTC
Same drill as last time. Please reproduce with --enable-debug=full, and let's see if that first produces an assertion failure or, failing that, gives us a chance to painstakingly go through the log file and try to spot what is missing. Thanks.
Comment 69 Diego Viola 2016-03-29 20:40:40 UTC
Created attachment 122621 [details]
Xorg.0.log with latest driver and debug=full
Comment 70 Diego Viola 2016-03-29 20:52:55 UTC
Attached. That log was captured after starting RetroArch and loading a core with a game.

This happens regardless of which core and game are used. Also, I noticed earlier that upgrading to Linux 4.5 sort of helped with this issue, but I'm not sure whether it fixed the inability to go from fullscreen back to windowed mode after a core/game is loaded.

That said, Arch Linux is still using the 4.4.5 kernel.

Please let me know if you need anything else, thanks.
Comment 71 Diego Viola 2016-03-29 20:54:30 UTC
Also, when I captured that log I opened the core/game, went fullscreen, and pressed f again to go back to windowed mode. It hung most of the times I tried, but it succeeded about 2 times (fullscreen->windowed worked twice).
Comment 72 Diego Viola 2016-03-30 00:07:48 UTC
Interestingly, I was able to reproduce this problem on my ThinkPad T450 as well (I run Arch Linux there too).

I've also noticed I can always press F1 while in game to pop up the menu and press F there and it will go back to windowed mode, but I can't do that after I start a game.
Comment 73 Diego Viola 2016-03-30 08:18:14 UTC
I've compiled Linux 4.6.0-rc1+ (git commit 1993b17) and tried switching between fullscreen and windowed mode on a game and it works fine.

After switching between fullscreen and windowed mode I get a pause of about 2-3 seconds, so I have to wait that long before I can press F again.
Comment 74 Diego Viola 2016-03-30 08:18:47 UTC
A pause in the video.
Comment 75 Diego Viola 2016-03-30 09:20:03 UTC
I don't know what the latest kernels are doing that helps with this issue. Does X still need fixing like you did last time?

The 2-3 second video pause is probably something RetroArch-related.

Thanks.
Comment 76 Chris Wilson 2016-03-30 09:30:54 UTC
I've gone through most of the window deletion events (which seem to denote the change between window sizes) and can see both the Xorg fix take effect and the ddx seems to be cancelling pending events correctly. So far seeing what I'm expecting to see and haven't identified the hang yet (or even noticed where the hang is).

The kernel's vblank code has been changing recently, so could well have an impact.
Comment 77 Diego Viola 2016-03-30 10:23:06 UTC
(In reply to Chris Wilson from comment #76)
> I've gone through most of the window deletion events (which seem to denote
> the change between window sizes) and can see both the Xorg fix take effect
> and the ddx seems to be cancelling pending events correctly. So far seeing
> what I'm expecting to see and haven't identified the hang yet (or even
> noticed where the hang is).
> 
> The kernel's vblank code has been changing recently, so could well have an
> impact.

I see, well, for me the problem occurs only with the 4.4 kernel right now, and yes, the hang happens when I toggle between fullscreen and windowed mode (while in a game).

I remember the 4.5 kernel fixing this before the patch was included in the xorg-server.
Comment 78 Chris Wilson 2016-03-30 10:31:49 UTC
(In reply to Diego Viola from comment #77)
> I see, well, for me the problem occurs only with the 4.4 kernel right now,
> and yes, the hang happens when I toggle between fullscreen and windowed mode
> (while in a game).
> 
> I remember the 4.5 kernel fixing this before the patch was included in the
> xorg-server.

That's strange. We definitely found a real bug in xorg-server that just depended upon there being an incomplete DRI2BlockClient (from too many swaps pending) at the time the window was destroyed. We could only prevent that bug by not having any swaps pending, which to me implies they were completing instantaneously. Bizarre.

Do you have time for a full-debug from 4.5 so that I can compare?
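The condition described in this comment (a client blocked by DRI2 because too many swaps are pending when its window is destroyed) can be sketched as a small queue model. The queue depth, the function names and the exact blocking rule are assumptions for illustration only; the sketch just shows why swaps that complete immediately never leave the client blocked, while a backlog at destruction time does.

```c
#include <stdbool.h>

/* Hypothetical per-Drawable swap throttling: a client may have up to
 * max_pending swaps outstanding; one more blocks it until a swap completes. */
struct swap_queue {
    int pending;
    int max_pending;
    bool client_blocked;
};

static void submit_swap(struct swap_queue *q)
{
    q->pending++;
    if (q->pending > q->max_pending)
        q->client_blocked = true;   /* analogous to DRI2BlockClient */
}

static void complete_swap(struct swap_queue *q)
{
    if (q->pending > 0)
        q->pending--;
    if (q->pending <= q->max_pending)
        q->client_blocked = false;  /* woken by the swap-complete path */
}

/* Window destroyed: pending swaps are discarded without completing.
 * Returns true if the client is left blocked with nothing to wake it. */
static bool destroy_window(struct swap_queue *q)
{
    q->pending = 0;
    return q->client_blocked;
}
```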
Comment 79 Diego Viola 2016-03-30 10:33:01 UTC
(In reply to Chris Wilson from comment #78)
> (In reply to Diego Viola from comment #77)
> > I see, well, for me the problem occurs only with the 4.4 kernel right now,
> > and yes, the hang happens when I toggle between fullscreen and windowed mode
> > (while in a game).
> > 
> > I remember the 4.5 kernel fixing this before the patch was included in the
> > xorg-server.
> 
> That's strange. We definitely found a real bug in xorg-server that just
> depended upon there being incomplete DRI2BlockClient (from too many swaps
> pending) at the time the window was destroyed. We could only prevent that
> bug by not having any swaps pending, which to me implies they were
> completing instaneously. Bizarre.
> 
> Do you have time for a full-debug from 4.5 so that I can compare?

Sure. What should I do?
Comment 80 Diego Viola 2016-03-30 10:38:54 UTC
Should I just get a Xorg.0.log with debug=full using the 4.5 kernel?
Comment 81 Diego Viola 2016-03-30 11:13:32 UTC
I'm getting a lot of warnings such as these when compiling xf86-video-intel.

Any ideas why that is?

/usr/include/features.h:331:4: warning: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Wcpp]
 #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
    ^
  CC       sna_display.lo
In file included from /usr/include/stdint.h:25:0,
                 from /usr/lib/gcc/x86_64-unknown-linux-gnu/5.3.0/include/stdint.h:9,
                 from sna_display.c:33:
Comment 82 Diego Viola 2016-03-30 11:23:26 UTC
Created attachment 122638 [details]
Xorg.0.log with debug=full and kernel 4.5.0

This time it didn't hang when switching between fullscreen and windowed mode in RetroArch; the video just paused for 2-3 seconds between mode changes, as before.
Comment 83 Chris Wilson 2016-03-30 11:27:08 UTC
(In reply to Diego Viola from comment #81)
> I'm getting a lot of warnings such as these when compiling xf86-video-intel.
> 
> Any ideas why that is?

For debugging purposes full-debug sets -O0 to disable all optimisation and get accurate source tracking and backtraces. That seems to be causing a silly conflict and pointless spam.
Comment 84 Diego Viola 2016-03-30 11:29:59 UTC
(In reply to Chris Wilson from comment #83)
> (In reply to Diego Viola from comment #81)
> > I'm getting a lot of warnings such as these when compiling xf86-video-intel.
> > 
> > Any ideas why that is?
> 
> For debugging purposes full-debug sets -O0 to disable all optimisation and
> get accurate source tracking and backtraces. That seems to be causing a
> silly conflict and pointless spam.

I see, thanks.

Attached the log you requested.
Comment 85 Chris Wilson 2016-03-30 12:02:26 UTC
(In reply to Diego Viola from comment #82)
> Created attachment 122638 [details]
> Xorg.0.log with debug=full and kernel 4.5.0
> 
> This time didn't hang when switching between fullscreen/windowed in
> RetroArch, the video just also paused 2-3 seconds between changing modes.

At first glance, it looks the same. :|

In xf86-video-intel/test/dri2-race I have tests for everything we have encountered so far. Modulo a bug in the testcase itself (or at least so it looks), so far it hasn't found a fresh hang. I might try that with an older kernel just in case, but such a bug would also be likely to be hardware/setup dependent.
Comment 86 Diego Viola 2016-03-30 15:03:09 UTC
(In reply to Chris Wilson from comment #85)
> (In reply to Diego Viola from comment #82)
> > Created attachment 122638 [details]
> > Xorg.0.log with debug=full and kernel 4.5.0
> > 
> > This time didn't hang when switching between fullscreen/windowed in
> > RetroArch, the video just also paused 2-3 seconds between changing modes.
> 
> At first glance, it looks the same. :|
> 
> In xf86-video-intel/test/dri2-race I have tests for everything we have
> encountered so far. Modulo a bug in the testcase itself (or at least so it
> looks like), so far it hasn't found a fresh hang. I might try that with an
> older kernel just in case, but such a bug would also likely to be
> hardware/setup dependent.

I'm not sure how it can be hardware/setup dependent, since I can reproduce this on my ThinkPad as well as on my desktop, and they have different hardware.

I also run the same Arch Linux setup on both.

This is also an application hang (RetroArch hangs), not an X hang; it is the same problem you said you were able to reproduce.

RetroArch hangs and I can't do anything until I press F in the app again; then I can move between menus or quit the app.
Comment 87 Diego Viola 2016-03-30 15:08:52 UTC
I tried dri2-race, but I didn't know how to stop it. How should I use it?
Comment 88 Chris Wilson 2016-03-30 15:13:12 UTC
At this present time, I do not see any evidence of a missed swap completion event in the logs and the test cases written to exercise the earlier hangs don't show the error. I still need to find something in the log that corresponds with the hang :|

My comment about it maybe being hardware dependent was only because kernel bugs tend to be; I meant nothing more.
Comment 89 Diego Viola 2016-03-30 19:20:06 UTC
(In reply to Chris Wilson from comment #88)
> At this present time, I do not see any evidence of a missed swap completion
> event in the logs and the test cases written to exercise the earlier hangs
> don't show the error. I still need to find something in the log that
> corresponds with the hang :|
> 
> The comment about maybe being hardware dependent is that kernel bugs have a
> tendency of being so, I meant nothing more.

I see, thanks for the clarification.

Arch Linux is going to get Linux 4.5 soon in [core], so hopefully this bug won't affect me anymore.

Thanks for all of your help. :)
Comment 90 Diego Viola 2016-04-14 06:56:39 UTC
It's fixed with the latest kernel update (4.5.0-1-ARCH), closing.
Comment 91 Diego Viola 2016-04-14 07:32:25 UTC
Not sure if the latest update of xf86-video-intel had anything to do with this, but the pause between fullscreen and windowed mode also disappeared while in the game. Thanks.
Comment 92 ian.bradley 2016-05-02 17:44:21 UTC
For those of us using non-rolling releases, is there a preferred way for us to get these fixes? Any chance of them being backported to older kernels, say the kernel in Ubuntu's latest LTS?
Comment 93 Diego Viola 2016-05-24 17:59:44 UTC
It looks like the drivers have regressed again: I've installed RetroArch 1.3.4 from the Arch repos and now I can't go back from fullscreen while playing a game.
Comment 94 Diego Viola 2016-05-24 18:06:36 UTC
(In reply to Diego Viola from comment #93)
> It looks like the drivers have regressed again, I've installed retroarch
> 1.3.4 from the Arch repos and I can't go back from fullscreen now while
> playing a game.

Any ideas about this? I'm currently on Linux 4.5.4-1-ARCH, it was fine on 4.5.0.
Comment 95 Diego Viola 2016-05-24 18:24:11 UTC
Never mind, I will do some git bisecting and report back.
Comment 96 Diego Viola 2016-05-24 20:41:58 UTC
Works fine on 4.5.5 and 4.6 so I guess no bisect needed.

Although there is some lag (2-3 seconds) between the fullscreen/windowed transition.
Comment 97 Diego Viola 2016-06-08 19:57:30 UTC
Still can't switch to windowed mode from fullscreen with Linux 4.6.1.

This bug is really annoying.
Comment 98 Jani Nikula 2016-06-09 08:07:16 UTC
(In reply to Diego Viola from comment #96)
> Works fine on 4.5.5 and 4.6 so I guess no bisect needed.

(In reply to Diego Viola from comment #97)
> Still can't switch to windowed mode from fullscreen with Linux 4.6.1.

Are you saying that there's a difference between kernels v4.6 and v4.6.1?
Comment 99 Diego Viola 2016-06-09 13:15:35 UTC
(In reply to Jani Nikula from comment #98)
> (In reply to Diego Viola from comment #96)
> > Works fine on 4.5.5 and 4.6 so I guess no bisect needed.
> 
> (In reply to Diego Viola from comment #97)
> > Still can't switch to windowed mode from fullscreen with Linux 4.6.1.
> 
> Are you saying that there's a difference between kernels v4.6 and v4.6.1?

I really don't know. I'm on 4.6.1-2-ARCH right now, but I also tried 4.6-1-ARCH today, and the problem is there in both of them.

When I tried 4.6 before, I think I compiled it myself, so could the problem be in the kernel configuration? E.g. the standard defconfig vs Arch's .config.

I'm getting a bit confused here.
Comment 100 Diego Viola 2016-06-09 14:10:32 UTC
I've compiled Linux 4.6.1 now and I can fullscreen/toggle-windowed-mode just fine, but the problem appears again with the 4.6.1-2-ARCH kernel.

Any ideas?
Comment 101 Jani Nikula 2016-06-09 17:51:53 UTC
(In reply to Diego Viola from comment #100)
> I've compiled Linux 4.6.1 now and I can fullscreen/toggle-windowed-mode just
> fine, but the problem appears again with the 4.6.1-2-ARCH kernel.
> 
> Any ideas?

Usually the distro config is stored under /boot. It takes forever to build a kernel using that, but it might be interesting to see if that makes a difference for your self-built kernel. If the upstream kernel works with the distro config, it's down to any Arch-specific patches in their kernel.

Or you could pick up the arch kernel and build that using your config.
Comment 102 Diego Viola 2016-06-09 19:13:42 UTC
(In reply to Jani Nikula from comment #101)
> (In reply to Diego Viola from comment #100)
> > I've compiled Linux 4.6.1 now and I can fullscreen/toggle-windowed-mode just
> > fine, but the problem appears again with the 4.6.1-2-ARCH kernel.
> > 
> > Any ideas?
> 
> Usually the distro config is stored under /boot. It takes forever to build a
> kernel using that, but might be interesting to see if that makes a
> difference for your self built kernel. If the upstream kernel works with the
> distro config, it's down to any arch specific patches in their kernel.
> 
> Or you could pick up the arch kernel and build that using your config.

4.6.2-1-ARCH just landed on Arch and that's also broken; I'm building 4.6.2 with Arch's /proc/config.gz.

I will let you know after I test it. Thanks.
Comment 103 Diego Viola 2016-06-09 20:52:17 UTC
I've compiled 4.6.2 using Arch's /proc/config.gz and I was able to reproduce the hang.

So this is an Arch config problem?
Comment 104 Jani Nikula 2016-06-10 07:23:57 UTC
(In reply to Diego Viola from comment #103)
> I've compiled 4.6.2 using Arch's /proc/config.gz and I was able to reproduce
> the hang.
> 
> So this is an Arch config problem?

Maybe. Please attach both the working and failing configs. Comparing them is painful, but let's see if something stands out.

The kernel source tree has a scripts/diffconfig tool that you can try yourself for comparing two configs, but usually that ends up in tons of noise when comparing such different configs.
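For a rough idea of what such a comparison involves, here is a minimal Python sketch. It is not scripts/diffconfig itself (whose output format differs); it parses two .config texts, reports only the symbols whose values differ, and can narrow the result to a prefix to cut the noise. The function names and sample fragments are illustrative assumptions, and treating "# CONFIG_FOO is not set" as the value n is a simplification.

```python
import re

# "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict:
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(old_text: str, new_text: str) -> dict:
    """Return {symbol: (old, new)} for every symbol whose value differs.

    A symbol absent from one side is reported as None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        sym: (old.get(sym), new.get(sym))
        for sym in sorted(old.keys() | new.keys())
        if old.get(sym) != new.get(sym)
    }


def only_prefix(diff: dict, prefix: str) -> dict:
    """Keep only symbols starting with prefix, e.g. "CONFIG_DRM"."""
    return {sym: vals for sym, vals in diff.items() if sym.startswith(prefix)}


# Illustrative fragments echoing the CONFIG_HZ difference mentioned in this thread.
arch = "CONFIG_HZ_300=y\n# CONFIG_HZ_1000 is not set\nCONFIG_HZ=300\n"
defconfig = "# CONFIG_HZ_300 is not set\nCONFIG_HZ_1000=y\nCONFIG_HZ=1000\n"
for sym, (a, b) in diff_configs(arch, defconfig).items():
    print(f"{sym}: {a} -> {b}")
```

Filtering by prefix (for example only_prefix(diff, "CONFIG_DRM")) trades completeness for readability: the real trigger may well sit outside the DRM options.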
Comment 105 Diego Viola 2016-06-10 14:35:26 UTC
Created attachment 124450 [details]
Arch Linux kernel config
Comment 106 Diego Viola 2016-06-10 14:41:41 UTC
Created attachment 124451 [details]
x86_64_defconfig

This is the default configuration based on x86_64_defconfig, this one is working.
Comment 107 Diego Viola 2016-06-10 14:48:30 UTC
The thing that's confusing me is that it was also working with the Arch kernel a few days/weeks ago, see comment #90.
Comment 108 Diego Viola 2016-06-10 14:51:16 UTC
(In reply to Jani Nikula from comment #104)
> (In reply to Diego Viola from comment #103)
> > I've compiled 4.6.2 using Arch's /proc/config.gz and I was able to reproduce
> > the hang.
> > 
> > So this is an Arch config problem?
> 
> Maybe. Please attach both the working and failing configs. Comparing them is
> painful, but let's see if something stands out.
> 
> The kernel source tree has a scripts/diffconfig tool that you can try
> yourself for comparing two configs, but usually that ends up in tons of
> noise when comparing such different configs.

Thanks, I'll try that. The diffing is the easy part; the hard part for me is knowing what is causing the problem.

For instance, Arch kernel is using CONFIG_HZ_300=y and the default config uses CONFIG_HZ_1000=y, would that have anything to do with it?
Comment 109 Diego Viola 2016-06-11 19:28:00 UTC
Tried building with CONFIG_HZ=1000, that didn't change anything.
Comment 110 Diego Viola 2016-06-11 20:47:51 UTC
OK, I found that removing xf86-video-intel solves the problem, I can now switch between fullscreen and windowed mode just fine in RetroArch.
Comment 111 Diego Viola 2016-06-11 20:52:11 UTC
Solved by removing xf86-video-intel.
Comment 112 Chris Wilson 2016-06-11 22:34:43 UTC
So the first bug was in Xorg, the second in the kernel, and you think removing -intel is the solution.
Comment 113 Diego Viola 2016-06-11 22:53:28 UTC
(In reply to Chris Wilson from comment #112)
> So the first bug was in Xorg, the second in the kernel, and you think
> removing -intel is the solution.

Well, sorry if that was rude, I'm just saying what works for me.

I'll reopen this if something else can be done.
Comment 114 Diego Viola 2016-06-11 23:01:23 UTC
I also have to say that after removing xf86-video-intel, the switch between fullscreen/windowed is instant. With the DDX driver I was always getting a delay of at least 1-2 seconds after toggling fullscreen/windowed mode.
Comment 115 Diego Viola 2016-06-11 23:09:26 UTC
Should I try Linux from git or xf86-video-intel from git or something else? Please let me know.
Comment 116 Diego Viola 2016-06-12 00:15:43 UTC
(In reply to Chris Wilson from comment #112)
> So the first bug was in Xorg, the second in the kernel, and you think
> removing -intel is the solution.

If the problem I'm having is in fact a bug in the kernel, wouldn't glamor also be failing in that case?

I'm just curious here. Thanks.
Comment 117 Diego Viola 2016-06-12 00:52:45 UTC
I'm back to xf86-video-intel, building Linux 4.7-rc2 with stock Arch config as we speak.
Comment 118 Diego Viola 2016-06-12 15:43:10 UTC
OK so I've compiled 4.7.0-rc2-ARCH and that changes nothing, the bug is still there.
Comment 119 Diego Viola 2016-06-12 15:46:50 UTC
I've also noticed that with fast forward toggled on (Space) in RetroArch, I can switch between fullscreen and windowed just fine.

But as soon as the speed is back to normal, I can no longer go back to windowed.

This is with xf86-video-intel.
Comment 120 Diego Viola 2016-06-13 14:57:35 UTC
Any ideas?
Comment 121 Diego Viola 2016-06-14 21:06:34 UTC
Got an xf86-video-intel update today and that doesn't resolve the issue either.

xf86-video-intel 2.99.917+662+gb617f80-1

Disappointing.
Comment 122 Diego Viola 2016-06-18 13:39:05 UTC
I've reproduced this on 2 of my computers: the one I originally reported it on, and my T450. I ran the same test on both (going back to windowed from fullscreen in RetroArch) and it failed on both using xf86-video-intel, but worked fine with glamor.

How can you say it's a bug in the kernel for sure? And will it be fixed for Linux 4.7?

Thanks.
Comment 123 Diego Viola 2016-06-20 14:47:09 UTC
Changed the title to reflect the current problem.
Comment 124 Chris Wilson 2016-06-21 09:00:47 UTC
You reported that a change in kernel configuration is enough to change the symptoms at least. That should be enough to work out the trigger.

Reporting that glamor works is the same as saying that if you don't use the same kernel functions, it works.
Comment 125 Diego Viola 2016-06-21 13:25:18 UTC
(In reply to Chris Wilson from comment #124)
> You reported that a change in kernel configuration is enough to change the
> symptoms at least. That should be enough to work out the trigger.
> 
> Reporting that glamor works is the same as saying that if you don't use the
> same kernel functions, it works.

Yes, this is why I'm getting confused, too many variables.

I mean, it worked fine during the 4.5 kernels, then it regressed in 4.6, but if I compile 4.6 myself it works, it doesn't with the -ARCH kernel though.

I try xf86-video-intel with the stock Arch kernel and it fails; I reboot with modesetting/glamor and it works.

Worked fine with UXA back then but not SNA, etc.

I don't know what else to try, I apologize if I said something stupid. I know you do great work on this driver.

That said, I want to help but I don't know what else to do anymore. Any suggestions?

I'm thinking about trying Linux 4.7-rc4 but I don't know whether that will help or not.
Comment 126 Diego Viola 2016-06-21 19:09:18 UTC
Just re-edited the title, sorry for any misunderstandings.
Comment 127 Diego Viola 2016-06-24 16:30:01 UTC
Git bisect coming.
Comment 128 Diego Viola 2016-06-25 15:05:21 UTC
So I just did a git clone of Linus' git tree, checked out the v4.5.0 tag, copied /proc/config.gz to .config and built the kernel. After rebooting, that kernel is fine: RetroArch switches from fullscreen to windowed and back without problems.

I'll bisect between v4.5.0 and 4.6.2, which is currently failing.
Comment 129 Diego Viola 2016-06-25 18:27:27 UTC
So I'm currently testing this commit:

266c73b77706f2d05b4a3e70a5bb702ed35431d6

It mostly doesn't hang, but it did hang once. Not sure whether I should mark it as good in the bisect process.
Comment 130 Diego Viola 2016-06-25 18:44:29 UTC
(In reply to Diego Viola from comment #129)
> So I'm currently testing this commit:
> 
> 266c73b77706f2d05b4a3e70a5bb702ed35431d6
> 
> Which is not hanging but hanged once. Not sure if I should mark this part as
> yes/good in the bisect process.

I ended up marking it good because, although it hung once, it worked fine 99% of the time.
Comment 131 Diego Viola 2016-06-25 19:00:24 UTC
This commit made it only hang once for me:

266c73b77706f2d05b4a3e70a5bb702ed35431d6

But I think it's the commit where this bug started showing up.

Git bisect is still going.
Comment 132 Diego Viola 2016-06-25 19:06:21 UTC
I think I'll go back and re-mark this one as bad, because I was able to reproduce it there:

266c73b77706f2d05b4a3e70a5bb702ed35431d6
Comment 133 Diego Viola 2016-06-27 14:01:06 UTC
git-bisect is over and this is the result:

266c73b77706f2d05b4a3e70a5bb702ed35431d6 is the first bad commit
Comment 134 Diego Viola 2016-06-27 14:04:16 UTC
Created attachment 124742 [details]
git bisect log

git bisect log of Linus' git tree (bisected v4.5 (good) and v4.6 (bad))
Comment 135 Diego Viola 2016-06-27 14:20:49 UTC
Just went back to the Arch Linux kernel 4.6.3-1-ARCH and the problem is easier to reproduce there as it happens 99% of the time.
Comment 136 Diego Viola 2016-06-27 14:43:32 UTC
Please note that during the bisect I've marked 4.5 as good and 4.6 as bad, but I have no idea if 4.6 is actually bad here.

The bisect was using the Arch Linux .config from /proc/config.gz.

I'm currently building v4.6 tag from Linus' git tree using the archlinux .config to confirm this one is bad too.

There are no minor 4.6.x in the git tree so this is why I tested between 4.5 and 4.6.

Thanks.
Comment 137 Diego Viola 2016-06-27 14:48:06 UTC
I'm sure it's unrelated but Firefox is also feeling a lot more sluggish/slow/unusable with recent kernels.
Comment 138 Diego Viola 2016-06-27 15:50:20 UTC
OK, tested v4.6 tag from Linus' git tree with archlinux .config and that is broken as well.
Comment 139 Diego Viola 2016-06-27 17:49:57 UTC
Still broken in 4.7.0-rc5-ARCH.
Comment 140 Diego Viola 2016-06-27 19:00:31 UTC
Anyone please?

Any ideas?
Comment 141 Diego Viola 2016-06-27 20:19:34 UTC
Disabling VSYNC on RetroArch also makes it possible to go back into windowed mode from fullscreen, but the the video is also slower.
Comment 142 Diego Viola 2016-06-27 20:20:03 UTC
(In reply to Diego Viola from comment #141)
> Disabling VSYNC on RetroArch also makes it possible to go back into windowed
> mode from fullscreen, but the the video is also slower.

then the*
Comment 143 Diego Viola 2016-06-27 23:39:37 UTC
Using the following config helps as well:

Section "Device"
        Identifier      "Intel Graphics"
        Driver          "intel"
        Option          "TearFree" "true"
EndSection

RetroArch no longer hangs when switching to windowed mode.
Comment 144 Diego Viola 2016-06-28 00:42:52 UTC
Setting DRI to 3 in /etc/X11/xorg.conf.d/20-intel.conf also helps:

Section "Device"
   Identifier  "Intel Graphics"
   Driver      "intel"
   Option      "DRI"    "3"
EndSection
Comment 145 Chris Wilson 2016-06-28 13:40:25 UTC
So far pointing towards i915.enable_fbc=0 at a guess. (Bisecting to a merge is not a good sign)
Comment 146 Diego Viola 2016-06-28 13:50:40 UTC
(In reply to Chris Wilson from comment #145)
> So far pointing towards i915.enable_fbc=0 at a guess. (Bisecting to a merge
> is not a good sign)

Yes, although I already experienced this issue in Linux 4.4, so I'm not exactly sure that commit is the problem.

Although while I was on that commit, I only got a single hang in 40 attempts to go fullscreen and back to windowed mode.

Strange.
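With an intermittent hang, a "good" verdict during a bisect needs enough attempts behind it. A rough way to size that, assuming each attempt hangs independently with a fixed probability p (here estimated from the single hang in 40 attempts above, which is only a rough figure): the chance of seeing no hang in n attempts is (1 - p)^n, so for a miss probability alpha you need n >= ln(alpha) / ln(1 - p). The helper names below are hypothetical.

```python
import math


def trials_needed(p_hang: float, miss_prob: float = 0.05) -> int:
    """Smallest n with (1 - p_hang) ** n <= miss_prob.

    Assumes each attempt hangs independently with probability p_hang.
    """
    if not 0 < p_hang < 1 or not 0 < miss_prob < 1:
        raise ValueError("probabilities must be strictly between 0 and 1")
    return math.ceil(math.log(miss_prob) / math.log1p(-p_hang))


def verdict(hangs: int, attempts: int, p_hang: float, miss_prob: float = 0.05) -> str:
    """Classify a commit while bisecting an intermittent hang."""
    if hangs > 0:
        return "bad"        # one reproduction is enough, assuming it is the same bug
    if attempts >= trials_needed(p_hang, miss_prob):
        return "good"       # no hang in enough attempts to be fairly confident
    return "undecided"      # keep testing before marking it


print(trials_needed(1 / 40))           # 119
print(verdict(1, 40, 1 / 40))          # bad
print(verdict(0, 40, 1 / 40))          # undecided
```

Under these assumptions, 40 clean attempts still leave roughly a one-in-three chance of missing a 1-in-40 hang, and a commit that hangs even once belongs on the bad side.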
Comment 147 Diego Viola 2016-06-28 14:34:04 UTC
(In reply to Chris Wilson from comment #145)
> So far pointing towards i915.enable_fbc=0 at a guess. (Bisecting to a merge
> is not a good sign)

I already tried booting with i915.enable_fbc=0 but it didn't change anything, the problem is still there.
Comment 148 Diego Viola 2016-06-28 15:34:37 UTC
Will I have to bisect earlier versions of the kernel?

Let me know if you want me to try anything. Thanks. :)
Comment 149 Chris Wilson 2016-06-28 15:44:44 UTC
First double check the bisect result by testing 266c73b77706f2d05b4a3e70a5bb702ed35431d6 and 266c73b77706f2d05b4a3e70a5bb702ed35431d6^ (i.e. the merge and linus's parent of the merge).

If the bug is clearly reproducible on the merge and not before the merge, restart the bisect with those two end-points and hopefully bisect will descend into the merge itself.
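git bisect walks the commit graph, which is how it can end up pointing at a merge commit; over a purely linear history the idea reduces to a binary search. The simplified Python sketch below does not model merges. It assumes a linear list of hypothetical commit names, with the first known good and the last known bad, and shows why a single wrong answer matters: reporting the real first bad commit as good shifts the blame to a later one.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search over a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once the bug appears it stays (every later commit is bad too).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history c0..c9 in which c6 introduced the bug.
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) >= 6))                # c6
# A test run that misses the (intermittent) hang on c6 reports c7 instead.
print(first_bad(history, lambda c: int(c[1:]) >= 6 and c != "c6"))  # c7
```

This is why confirming the endpoints (the suspect commit and its parent) before restarting the bisect is worthwhile.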
Comment 150 Diego Viola 2016-06-28 15:52:28 UTC
(In reply to Chris Wilson from comment #149)
> First double check the bisect result by testing
> 266c73b77706f2d05b4a3e70a5bb702ed35431d6 and
> 266c73b77706f2d05b4a3e70a5bb702ed35431d6^ (i.e. the merge and linus's parent
> of the merge).
> 
> If the bug is clearly reproducible on the merge and not before the merge,
> restart the bisect with those two end-points and hopefully bisect will
> descend into the merge itself.

OK building 266c73b77706f2d05b4a3e70a5bb702ed35431d6 as we speak.

Thanks.
Comment 151 Diego Viola 2016-06-28 17:35:25 UTC
I've built the kernel at 266c73b77706f2d05b4a3e70a5bb702ed35431d6 and I can't make RetroArch hang.
Comment 152 Diego Viola 2016-06-28 17:45:59 UTC
266c73b77706f2d05b4a3e70a5bb702ed35431d6 is fine.

Should I bisect this with the parent now?
Comment 153 Diego Viola 2016-06-28 19:06:42 UTC
I'm also thinking of restarting a bisect between v4.5 and v4.6 and marking commit 266c73b77706f2d05b4a3e70a5bb702ed35431d6 as good.

I wonder if that will shed some light?
Comment 154 Diego Viola 2016-06-28 19:09:08 UTC
v4.6 with the archlinux .config is definitely broken, so before I do that let me know if you can suggest something.
Comment 155 Diego Viola 2016-06-28 20:35:07 UTC
Making more progress here.

I was able to reproduce the issue with a custom kernel using localmodconfig, so now I'll be able to bisect faster instead of waiting 1 hour for each kernel build.
Comment 156 Diego Viola 2016-06-28 20:47:09 UTC
Since I can build faster now with localmodconfig and I can reproduce the bug, I'm redoing the bisect between v4.5 and v4.6 and marking 266c73b77706f2d05b4a3e70a5bb702ed35431d6 as good.
Comment 157 Diego Viola 2016-06-28 20:49:33 UTC
Created attachment 124767 [details]
localmodconfig config

Here is the new config I'm using, just for the record.
Comment 158 Diego Viola 2016-06-28 21:39:07 UTC
I wish I had used localmodconfig before; all the time I could have saved.
Comment 159 Diego Viola 2016-06-28 22:49:23 UTC
Will attach a new bisect log soon, I found another commit where I can reproduce this.

I think we can safely ignore the previous bisect log. :D
Comment 160 Diego Viola 2016-06-28 23:18:37 UTC
This time I can also reproduce the error more reliably: it hangs the first time I try to switch to windowed mode from fullscreen.
Comment 161 Diego Viola 2016-06-28 23:54:25 UTC
[diego@myhost ~]$ cd linux
[diego@myhost linux]$ git bisect bad
7ac7d19f808697abe6658c64c96868f728273f9c is the first bad commit
commit 7ac7d19f808697abe6658c64c96868f728273f9c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Apr 17 20:42:46 2016 +0100

    drm/i915: Avoid stalling on pending flips for legacy cursor updates
    
    The legacy cursor ioctl expects to be asynchronous with respect to other
    screen updates, in particular page flips. As X updates the cursor from a
    signal context, if the cursor blocks then it will stall both the input
    and output chains causing bad stuttering and horrible UX.
    
    Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94980
    Fixes: 5008e874edd34 ("drm/i915: Make wait_for_flips interruptible.")
    Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Jani Nikula <jani.nikula@intel.com>
    Cc: stable@vger.kernel.org
    Link: http://patchwork.freedesktop.org/patch/msgid/1460922166-20292-1-git-send-email-chris@chris-wilson.co.uk
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    (cherry picked from commit acf4e84d6167317ff21be5c03e1ea76ea5783701)
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>

:040000 040000 ffd5371b8faffb065a2cd8c5624127ce2a03284c a19fe78a340ba89e0c206e96b65d5c426eb0e150 M	drivers
[diego@myhost linux]$
Comment 162 Diego Viola 2016-06-28 23:56:43 UTC
Created attachment 124772 [details]
git bisect log 2

git-bisect log, second try.

This is using localmodconfig and I think this time it's more accurate as I'm able to reproduce the bug better.
Comment 163 Diego Viola 2016-06-29 01:20:44 UTC
Doing a revert makes the problem go away, e.g.

git revert -n 7ac7d19f808697abe6658c64c96868f728273f9c
Comment 164 Diego Viola 2016-06-29 01:22:34 UTC
(In reply to Diego Viola from comment #163)
> Doing a revert makes the problem go away, e.g.
> 
> git revert -n 7ac7d19f808697abe6658c64c96868f728273f9c

I tried this revert after switching to master BTW.
Comment 165 Chris Wilson 2016-06-29 08:19:25 UTC
(In reply to Diego Viola from comment #163)
> Doing a revert makes the problem go away, e.g.
> 
> git revert -n 7ac7d19f808697abe6658c64c96868f728273f9c

Yeah, a commit touching atomic modesetting is a believable result.
Comment 166 Diego Viola 2016-06-29 13:24:05 UTC
Just for the record: I was testing this on my ThinkPad T450, I will test if reverting the commit 7ac7d19f808697abe6658c64c96868f728273f9c helps the issue go away on the older computer where I originally reported this problem.
Comment 167 Diego Viola 2016-06-29 14:47:53 UTC
Building linux.git (master) as of commit 02184c60eba8491ea574cd17b8ba766c86d468f2 to see if I can still reproduce this on the original machine, without reverting commit 7ac7d19f808697abe6658c64c96868f728273f9c.

Will report back when it's done.
Comment 168 Diego Viola 2016-06-29 15:09:51 UTC
OK, yes, I'm still able to reproduce it with current master.

I will try reverting 7ac7d19f808697abe6658c64c96868f728273f9c on the original machine and see if that fixes it there as well.
Comment 169 Diego Viola 2016-06-29 15:42:00 UTC
OK, I'm happy to report that reverting commit 7ac7d19f808697abe6658c64c96868f728273f9c solves the problem on the original machine as well.
Comment 170 Diego Viola 2016-06-29 15:55:47 UTC
I got a hang once or twice on the older/original machine with the commit reverted, while pressing F too many times. Maybe it's something else?

The hang doesn't happen most of the time anymore, though: fullscreen/windowed works as expected 99.99% of the time and the hang is not as easy to reproduce.

Could it just be that my machine is too old? It's 6-7 years old.
Comment 171 Diego Viola 2016-06-29 16:54:28 UTC
Tried to do a git-bisect of v4.4 and v4.5.

However, git said this:

Some good revs are not ancestor of the bad rev.
git bisect cannot work properly in this case.
Maybe you mistook good and bad revs?

I know v4.4 is bad and v4.5 is good.

Currently building v4.3 so I can bisect against v4.4.
Comment 172 Diego Viola 2016-06-29 17:11:45 UTC
Just started a bisect of v4.3 (good) and v4.4 (bad).
Comment 173 Diego Viola 2016-06-29 20:02:26 UTC
git bisect of 4.3 and 4.4 is done, see the next post for the resulting bad commit.
Comment 174 Diego Viola 2016-06-29 20:02:44 UTC
[diego@myhost ~]$ cd linux
[diego@myhost linux]$ git bisect good
f4502c25ebd04691f284fdafff4a5613299c36dc is the first bad commit
commit f4502c25ebd04691f284fdafff4a5613299c36dc
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Thu Aug 27 15:44:04 2015 +0200

    drm/i915: Always try to inherit the initial fb.
    
    The initial state is read out correctly and the state is atomic,
    so it's safe to preserve the fb without any hacks if it's suitable.
    
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 4aa2f488c319e68a1a17baac4bd4269dbc34daf9 67bdae945d662fd1846371b469b402f070203b45 M	drivers
[diego@myhost linux]$
Comment 175 Diego Viola 2016-06-29 20:04:47 UTC
Created attachment 124785 [details]
git bisect v4.3 and v4.4
Comment 176 Diego Viola 2016-06-29 20:33:05 UTC
Just did a checkout of f4502c25ebd04691f284fdafff4a5613299c36dc and indeed it hangs.
Comment 177 Diego Viola 2016-06-29 20:37:28 UTC
d551599181769571f4f68dd93e5d8b15868889af is fine, it doesn't hang.
Comment 178 Diego Viola 2016-06-30 13:18:45 UTC
Just built Linux 4.2 and it also works fine with this version.
Comment 179 Diego Viola 2016-06-30 13:37:23 UTC
Built Linux 4.1 (4.1.0-ARCH-dirty) as well and the problem is also not there.
Comment 180 Diego Viola 2016-06-30 14:08:10 UTC
I tried building v4.0 but my T450 didn't even boot with that kernel.
Comment 181 Diego Viola 2016-06-30 16:49:49 UTC
I've tried building v4.0 on my older machine, but it failed to boot there as well.

Maybe the config was messed up in some way; I recall booting older kernels on this machine in the past.

Not that it matters at the moment.
Comment 182 Diego Viola 2016-07-01 15:57:03 UTC
I tried these two commits on the original/older machine, but all I get is "Input Not Support" after i915 is loaded.

f4502c25ebd04691f284fdafff4a5613299c36dc
d551599181769571f4f68dd93e5d8b15868889af
Comment 183 Diego Viola 2016-07-01 16:31:41 UTC
v4.3 is also hanging on the original/older machine, god damn.
Comment 184 Diego Viola 2016-07-01 16:57:24 UTC
v4.2 also results in retroarch hanging on my older machine.
Comment 185 Diego Viola 2016-07-01 17:22:23 UTC
4.1 is also broken on the original/older machine.
Comment 186 Diego Viola 2016-07-01 18:36:14 UTC
OS: Arch Linux x86-64
RetroArch 1.3.4

RetroArch fails to switch back into windowed mode from fullscreen when pressing F twice; the video freezes instead.
Comment 187 Diego Viola 2016-07-01 18:45:05 UTC

*** This bug has been marked as a duplicate of bug 96767 ***
Comment 189 Diego Viola 2016-07-01 20:26:37 UTC

*** This bug has been marked as a duplicate of bug 96769 ***
Comment 190 Chris Wilson 2016-11-03 10:25:44 UTC
commit 40e3be34367141c952678f456f0e0d4632b6c266
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 3 10:18:32 2016 +0000

    sna/dri2: Complete the final flip in a chain after the window is destroyed
    
    When the pending flip is queued, we update all the Windows to use the
    next bo as their rendering target. However, that bo is not yet the
    scanout until the future flip is performed. If the current fullscreen
    Window is destroyed, we still must allow that flip to proceed or else
    the old bo is left on the scanout.
    
    And yes, this is indeed a fix to one of the debug patches that intended
    to detect the error causing #93844. Irony.
    
    Fixes: 7817949314a2 ("sna/dri2: Avoiding marking a pending-signal on a dead Drawable")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93844
    Reported-by: Diego Viola <diego.viola@gmail.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
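The commit message above describes a small piece of state. As a toy model only (this is not the driver's code; ToyCrtc and its methods are hypothetical names), the sketch below tracks the buffer on scanout and the buffer queued for the next flip, and contrasts dropping the queued flip when the fullscreen Window is destroyed with letting it proceed, as the fix describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCrtc:
    """Toy model of one display pipe; hypothetical, not sna/dri2 code."""
    scanout: str                    # buffer (bo) currently being displayed
    pending: Optional[str] = None   # bo queued for the next flip, not yet displayed

    def queue_flip(self, bo: str) -> None:
        # Windows start rendering into `bo` now, but it only becomes the
        # scanout once the flip completes.
        self.pending = bo

    def vblank(self) -> None:
        # The queued flip (if any) completes at the next vertical blank.
        if self.pending is not None:
            self.scanout, self.pending = self.pending, None

    def window_destroyed(self, allow_pending_flip: bool) -> None:
        if not allow_pending_flip:
            # Buggy variant: the queued flip is dropped, so the old bo
            # stays on the scanout.
            self.pending = None
        # Fixed variant: leave `pending` alone so the flip still proceeds.


fixed = ToyCrtc(scanout="old_bo")
fixed.queue_flip("next_bo")
fixed.window_destroyed(allow_pending_flip=True)
fixed.vblank()
print(fixed.scanout)  # next_bo
```

With allow_pending_flip=False the same sequence leaves "old_bo" on the scanout, which matches the frozen picture seen when leaving fullscreen.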
Comment 191 Diego Viola 2016-11-03 11:13:29 UTC
(In reply to Chris Wilson from comment #190)
> commit 40e3be34367141c952678f456f0e0d4632b6c266
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Nov 3 10:18:32 2016 +0000
> 
>     sna/dri2: Complete the final flip in a chain after the window is destroyed
>     
>     When the pending flip is queued, we update all the Windows to use the
>     next bo as their rendering target. However, that bo is not yet the
>     scanout until the future flip is performed. If the current fullscreen
>     Window is destroyed, we still must allow that flip to proceed or else
>     the old bo is left on the scanout.
>     
>     And yes, this is indeed a fix to one of the debug patches that intended
>     to detect the error causing #93844. Irony.
>     
>     Fixes: 7817949314a2 ("sna/dri2: Avoiding marking a pending-signal on a dead Drawable")
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93844
>     Reported-by: Diego Viola <diego.viola@gmail.com>
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Tested this and your latest commit after that, the X server is still crashing.
Comment 192 Diego Viola 2016-11-03 12:27:34 UTC
Tested your latest commit and it's indeed fixed. Thanks a bunch. :)

https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=bf7316a4539afdf7a742d2b2ccbbaa5f27918255
Comment 193 Jari Tahvanainen 2016-12-13 09:14:46 UTC
Closing resolved+fixed. Fix introduced in xf86-video-intel and verified by reporter.
