My system: Arch Linux x86-64

$ uname -a
Linux myhost 4.3.3-3-ARCH #1 SMP PREEMPT Wed Jan 20 08:12:23 CET 2016 x86_64 GNU/Linux

Created attachment 121249 [details]
dmidecode
Created attachment 121250 [details]
lspci
RetroArch was at version 1.3.1 at the time of testing. I can reproduce the problem: I just have to toggle fullscreen and then press f again to switch to windowed mode, and X crashes. I use i3wm as my window manager.

$ i3 --version
i3 version 4.11 (2015-09-30, branch "4.11") © 2009 Michael Stapelberg and contributors

Created attachment 121251 [details]
dmesg after X crashing
Created attachment 121252 [details]
glxinfo
Quickest way with a fairly reproducible test case like that is to build http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ with --enable-debug and retest.

Created attachment 121276 [details]
Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1
(In reply to Diego Viola from comment #9)
> Created attachment 121276 [details]
> Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1

Wrong attachment.

Created attachment 121277 [details]
Xorg.0.log backtrace with xf86-video-intel-git 2.99.917.535.g7bebe12-1
Just posted the right attachment (logs) with the backtrace. Please let me know if you need more info.

Can you please try:

commit 8ab71cd3293ad420b0cdf487e8d5c66170ddc13c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 25 21:41:57 2016 +0000

    sna/dri2: Guard signalling swap completion after a FLIP

    Before sending the frame swap complete signal after a FLIP, make sure
    the client didn't die in the meantime.

    Reported-by: Diego Viola <diego.viola@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=93844
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

(In reply to Chris Wilson from comment #13)
> Can you please try:
>
> commit 8ab71cd3293ad420b0cdf487e8d5c66170ddc13c
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Jan 25 21:41:57 2016 +0000
>
> sna/dri2: Guard signalling swap completion after a FLIP
>
> Before sending the frame swap complete signal after a FLIP, make sure
> the client didn't die in the meantime.
>
> Reported-by: Diego Viola <diego.viola@gmail.com>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=93844
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I tried but it doesn't resolve the problem, still crashes.

Created attachment 121286 [details]
Xorg.0.log with latest crash after trying the patch
Can you please recompile with ./configure --enable-debug ? It should throw an assertion much earlier, or at least complain about the error.

Created attachment 121288 [details]
Xorg.0.log backtrace with debug enabled
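
For anyone following along, the debug rebuild requested above could be scripted roughly as below. This is only a sketch under assumptions: the clone address is the cgit URL quoted earlier in this report and may no longer be cloneable, autogen.sh is assumed to pass its arguments on to ./configure, the /usr prefix is a guess for an Arch install, and build_intel_ddx is a hypothetical helper name, not something shipped with the driver.

```shell
# Hypothetical helper: fetch and build xf86-video-intel with debugging enabled.
# Usage: build_intel_ddx [yes|full]
# Set DRY_RUN=1 to print the commands instead of running them.
# Assumption: autogen.sh forwards its arguments to ./configure.
build_intel_ddx() {
    level=${1:-yes}
    run=${DRY_RUN:+echo}    # "echo" in dry-run mode, empty otherwise
    $run git clone http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ &&
    $run cd xf86-video-intel &&
    $run ./autogen.sh --prefix=/usr --enable-debug="$level" &&
    $run make &&
    $run sudo make install
}
```

Running `DRY_RUN=1 build_intel_ddx full` first shows what would be executed; the "full" level corresponds to the --enable-debug=full builds asked for later in this report.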
(In reply to Chris Wilson from comment #16)
> Can you please recompile with ./configure --enable-debug ? It should throw
> an assertion much earlier, or at least complain about the error.

Done. Any ideas about the issue?

I've gone over the places where info->signal can be set to true while info->draw == NULL, and I think I have found the last missing piece:

diff --git a/src/sna/sna_dri2.c b/src/sna/sna_dri2.c
index f2f4908..045b12d 100644
--- a/src/sna/sna_dri2.c
+++ b/src/sna/sna_dri2.c
@@ -2787,6 +2787,9 @@ sna_dri2_flip_continue(struct sna_dri2_event *info)
 	info->type = info->flip_continue;
 	info->flip_continue = 0;
 
+	if (info->draw == NULL)
+		return false;
+
 	if (info->sna->mode.front_active == 0)
 		return false;

Should I try the code above or will you provide a patch?

(In reply to Diego Viola from comment #21)
> Should I try the code above or will you provide a patch?

I was hoping you could quickly test the above. :)

commit 7817949314a21293c8bc34dec214b42932b19aaf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 27 10:54:46 2016 +0000

    sna/dri2: Avoiding marking a pending-signal on a dead Drawable

    If the Drawable is gone, we cannot send it a frame-complete signal, and
    in particular we cannot continue the pending flip-chain.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=93844
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Hopefully that's the last of this bug.

I'll give this a try today and report back, I'm currently not on my computer. Thanks a lot for your help.

Created attachment 121333 [details]
Xorg.0.log after trying xf86-video-intel 1:2.99.917+519+g8229390-1
The problem still persists after trying the latest changes from git.

Sorry, I installed the latest build incorrectly. After trying the latest changes from git properly, the crash is actually fixed and I can't get it to crash anymore.

The crash is now fixed with the latest code from HEAD. However, there's one more problem: after pressing f to go into fullscreen mode, I can't press f to go back. The image freezes rather than going back into windowed mode. I can press f again and the program keeps responding, but I can't find a way to go back to windowed mode. Any ideas?

OK, so after pressing f a few times the program doesn't even respond anymore.

Hmm, at that point, can you send me a ./configure --enable-debug=full Xorg.log? Sounds like we are eating a signal too many.

(In reply to Chris Wilson from comment #29)
> Hmm, at that point, can you send me a ./configure --enable-debug=full
> Xorg.log? Sounds like we are eating a signal too many.

Sure, 1 min. Thanks.

Created attachment 121336 [details]
Xorg.0.log with debug=full
Attached the Xorg.0.log with debug=full and compressed it with xz since uncompressed it's about 64MB.
(In reply to Diego Viola from comment #31)
> Created attachment 121336 [details]
> Xorg.0.log with debug=full
>
> Attached the Xorg.0.log with debug=full and compressed it with xz since
> uncompressed it's about 64MB.

I also forgot to say that I hit f (fullscreen) a few times when I captured that log in RetroArch.

One thing I've noticed after trying your changes is that the Refresh Rate in RetroArch stays at 60 Hz, so the FPS is much better now. Before that, I had an issue where the Refresh Rate would be too high in RetroArch, so emulation would be too fast; to fix that I had to go into fullscreen mode and back to windowed mode. It was extremely annoying. I had reported this issue to the RetroArch developers, but they said the driver or my Refresh Rate was the issue.

(In reply to Diego Viola from comment #32)
> (In reply to Diego Viola from comment #31)
> > Created attachment 121336 [details]
> > Xorg.0.log with debug=full
> >
> > Attached the Xorg.0.log with debug=full and compressed it with xz since
> > uncompressed it's about 64MB.
>
> I also forgot to say that I hit f (fullscreen) a few times when I captured
> that log in RetroArch.

It starts as a window, swapping every vblank. It destroys that window and creates a fullscreen, and swaps every vblank. It then destroys that window and goes quiet. The bug I thought I might see would be where we have a listener but no vblank (and rendering/program updates would cease) - that doesn't appear to be the case here, it looks fairly regular.

About the FPS problem I mentioned earlier, I tested xf86-video-intel-git again and tried to reproduce the issue. Here's how it went:

1) Started a game with retroarch in tiling mode (I use i3), so the game was at the right and the terminal at the left (I use termite).
2) Pressed Shift+Super+Space to make it go into float mode. Then the FPS would go as high as ~400 FPS.

Should I open a separate bug report for this? It appears the crash is fixed.
Higher than vblank refresh rate implies that the swap buffers target is off-screen. It should be picking the primary CRTC's vrefresh if available. Might as well see if you can capture that in a full-debug and attach it. Just need the tail really to see where the drawable is when swap buffers is called, and why it is not choosing to use sync-to-some-vblank.

Created attachment 121355 [details]
Xorg.0.log.xz
Attaching the full Xorg.0.log in compressed/xz format because I don't know how big the tail should be.
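
Since only the tail of the log was asked for, something like the following could be used to cut the attachment down. It is a sketch only: log_tail_xz is a hypothetical helper, and the default of 20000 lines is an arbitrary guess, since nobody in this report says how much of the tail is actually needed.

```shell
# Hypothetical helper: keep only the last N lines of a log and xz-compress them.
# Usage: log_tail_xz LOGFILE [LINES] > Xorg.0.log.tail.xz
# The default line count is an arbitrary assumption, not a recommendation.
log_tail_xz() {
    log=$1
    lines=${2:-20000}
    tail -n "$lines" "$log" | xz --stdout
}
```

For example, `log_tail_xz /var/log/Xorg.0.log 50000 > Xorg.0.log.tail.xz` would keep the last 50000 lines, assuming the log lives at that path on the system in question.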
Hmm. It is hitting DRI2CopyRegion instead. It may be doing a DRI2SwapBuffers that the xserver/dri2 core converts to CopyRegion if vblank_mode=0, or the client may be doing GLX_MESA_copy_sub_buffers (which calls DRI2CopyRegion directly). CopyRegion is not rate-limited.

(In reply to Chris Wilson from comment #38)
> Hmm. It is hitting DRI2CopyRegion instead. It may be doing a DRI2SwapBuffers
> that the xserver/dri2 core converts to CopyRegion if vblank_mode=0, or the
> client may be doing GLX_MESA_copy_sub_buffers (which calls DRI2CopyRegion
> directly). CopyRegion is not rate-limited.

OK found the problem, it's not a bug. I found that pressing space enables/disables frame throttling in RetroArch. I wish I knew that before. :-(

Pressing Space in RetroArch makes the FPS go high, pressing it again normalizes the FPS again.

OK so my last issue is that fullscreen and back to windowed mode bug.

Created attachment 121356 [details]
RetroArch Input Hotkey Binds
So any ideas about my issue with fullscreen? How can we debug it further?

So I just switched to UXA with the latest xf86-video-intel from git and RetroArch doesn't hang anymore when I switch between fullscreen and windowed mode.

Created attachment 121419 [details]
RetroArch backtrace after it hangs
Attaching this backtrace in case it helps.
Created attachment 121422 [details] [review]
Restore signalling on Window destruction

I dropped this code recently as I thought it was the source of the DRI2 BadDrawable errors - hopefully it is not as we need to undo the IgnoreClient from DRI2 as DRI2 itself does not.

I think the bug is really:

diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
index 5d54ab9..af7d136 100644
--- a/hw/xfree86/dri2/dri2.c
+++ b/hw/xfree86/dri2/dri2.c
@@ -420,6 +420,9 @@ DRI2DrawableGone(void *p, XID id)
         (*pDraw->pScreen->DestroyPixmap)(pPriv->redirectpixmap);
     }
 
+    if (pPriv->blockedClient)
+        AttendClient(pPriv->blockedClient);
+
     free(pPriv);
 
     return Success;

(In reply to Chris Wilson from comment #47)
> I think the bug is really:
>
> diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
> index 5d54ab9..af7d136 100644
> --- a/hw/xfree86/dri2/dri2.c
> +++ b/hw/xfree86/dri2/dri2.c
> @@ -420,6 +420,9 @@ DRI2DrawableGone(void *p, XID id)
> (*pDraw->pScreen->DestroyPixmap)(pPriv->redirectpixmap);
> }
>
> + if (pPriv->blockedClient)
> + AttendClient(pPriv->blockedClient);
> +
> free(pPriv);
>
> return Success;

Thanks, should I try this patch or the previous one too? Sorry, I'm a little confused.

(In reply to Diego Viola from comment #48)
> (In reply to Chris Wilson from comment #47)
> > I think the bug is really:
> >
> > diff --git a/hw/xfree86/dri2/dri2.c b/hw/xfree86/dri2/dri2.c
> > index 5d54ab9..af7d136 100644
> > --- a/hw/xfree86/dri2/dri2.c
> > +++ b/hw/xfree86/dri2/dri2.c
> > @@ -420,6 +420,9 @@ DRI2DrawableGone(void *p, XID id)
> > (*pDraw->pScreen->DestroyPixmap)(pPriv->redirectpixmap);
> > }
> >
> > + if (pPriv->blockedClient)
> > + AttendClient(pPriv->blockedClient);
> > +
> > free(pPriv);
> >
> > return Success;
>
> Thanks, should I try this patch or the previous one too?

The first because I need to work around the issue.

I applied the patch from #46 and I still can reproduce the issue.
(In reply to Diego Viola from comment #50)
> I applied the patch from #46 and I still can reproduce the issue.

Do you mind running with --enable-debug=full again? I want to see if we are sending that signal when the window is destroyed.

(In reply to Chris Wilson from comment #51)
> (In reply to Diego Viola from comment #50)
> > I applied the patch from #46 and I still can reproduce the issue.
>
> Do you mind running with --enable-debug=full again? I want to see if we are
> sending that signal when the window is destroyed.

Should I do that with your patch applied?

(In reply to Diego Viola from comment #52)
> (In reply to Chris Wilson from comment #51)
> > (In reply to Diego Viola from comment #50)
> > > I applied the patch from #46 and I still can reproduce the issue.
> >
> > Do you mind running with --enable-debug=full again? I want to see if we are
> > sending that signal when the window is destroyed.
>
> Should I do that with your patch applied?

Ping?

Created attachment 121449 [details]
Xorg.0.log with patch applied and debug=full
Applied patch from #46 and enabled debug=full, please see the log attached.
Created attachment 121453 [details] [review]
Restore signalling on Window destruction

That had the expected error of DRI2 complaining it had freed its private data before we could destroy the window, i.e. too late to fix the AttendClient. This second attempt uses a resource type instead of chaining up the Window destruction.

I've managed to reproduce the same issue by destroying the window from a second connection (i.e. the window manager closing the window instead of the client). I haven't yet found a way to avoid the bug in DRI2DrawableGone, it depends upon having our destructor run first and there is no guaranteed way that will happen (DRI2DrawableGone is even run before the Window is unrealized).

Knowing the trigger, it is reproducible on UXA as well - just harder as UXA doesn't do triple buffering so has a smaller race window.

Glad you were able to reproduce this, should I try the patch from #55 or no need? Thanks.

(In reply to Diego Viola from comment #58)
> Glad you were able to reproduce this, should I try the patch from #55 or no
> need?

Indeed, no need. Nothing short of fixing it in the xserver works for me, so far.

Should we open a separate bug for the X server?

Created attachment 121495 [details] [review]
Force AttendClient after DRI2DrawableGone

Best, most horrible, attempt yet. It works, but may cause false positives.

(In reply to Diego Viola from comment #60)
> Should we open a separate bug for the X server?

http://patchwork.freedesktop.org/patch/72436/

(In reply to Chris Wilson from comment #62)
> (In reply to Diego Viola from comment #60)
> > Should we open a separate bug for the X server?
>
> http://patchwork.freedesktop.org/patch/72436/

Thank you so much, I'll just wait for the proper fix to be implemented. :-)

Just upgraded to Linux 4.5 and the problem is fixed now. Thank you. Closing.

4.5.0-rc7

Tested with xorg-server 1.18.2-4 and Linux 4.4.5-1-ARCH as well and it's fixed. Thanks.
I've just tried the fix with xorg-server 1.18.2 and RetroArch no longer hangs after I launch it (main menu) and when I toggle it between fullscreen and windowed mode. However, when opening a core and a game I can't go from fullscreen to windowed mode (it hangs like before).

Reopening this bug report.

Linux 4.4.5-1-ARCH
xorg-server 1.18.2-4
RetroArch v1.3.3 -- 577f210

Same drill as last time. Please reproduce with --enable-debug=full and let's see if that first produces an assertion failure or, failing that, a chance to painstakingly go through the log file and try and spot what is missing. Thanks.

Created attachment 122621 [details]
Xorg.0.log with latest driver and debug=full
Attached it; that log was taken after starting RetroArch and loading a core with a game. This happens independently of which core and game are used.

Also, I've noticed that upgrading to Linux 4.5 sort of helped with this issue before, but I'm not sure whether it fixed the problem of not being able to go from fullscreen to windowed mode after a core/game is loaded. That said, Arch Linux is still using the 4.4.5 kernel.

Please let me know if you need anything else, thanks.

Also, when I captured that log I tried opening the core/game, hitting fullscreen, and pressing f again to go into windowed mode. It hung most of the times I tried this, but it succeeded 2 times or so (fullscreen->windowed worked 2 times).

Interestingly, I was able to reproduce this problem on my ThinkPad T450 as well (I run Arch Linux there as well).

I've also noticed I can always press F1 while in game to pop up the menu and press F there and it will go back to windowed mode, but I can't do that after I start a game.

I've compiled Linux 4.6.0-rc1+ (git commit 1993b17) and tried switching between fullscreen and windowed mode on a game and it works fine. After switching between fullscreen/windowed mode I experience a pause of something like 2-3 seconds, so I have to wait 2 or 3 seconds before I can press F again. A pause in the video.

I don't know what the latest kernels are doing to help fix this issue. Does X still need fixing like you did last time?

The 2-3 seconds video pause is probably some RetroArch related thing. Thanks.

I've gone through most of the window deletion events (which seem to denote the change between window sizes) and can see both the Xorg fix take effect and the ddx seems to be cancelling pending events correctly. So far seeing what I'm expecting to see and haven't identified the hang yet (or even noticed where the hang is).

The kernel's vblank code has been changing recently, so could well have an impact.
(In reply to Chris Wilson from comment #76)
> I've gone through most of the window deletion events (which seem to denote
> the change between window sizes) and can see both the Xorg fix take effect
> and the ddx seems to be cancelling pending events correctly. So far seeing
> what I'm expecting to see and haven't identified the hang yet (or even
> noticed where the hang is).
>
> The kernel's vblank code has been changing recently, so could well have an
> impact.

I see, well, for me the problem occurs only with the 4.4 kernel right now, and yes, the hang happens when I toggle between fullscreen and windowed mode (while in a game).

I remember the 4.5 kernel fixing this before the patch was included in the xorg-server.

(In reply to Diego Viola from comment #77)
> I see, well, for me the problem occurs only with the 4.4 kernel right now,
> and yes, the hang happens when I toggle between fullscreen and windowed mode
> (while in a game).
>
> I remember the 4.5 kernel fixing this before the patch was included in the
> xorg-server.

That's strange. We definitely found a real bug in xorg-server that just depended upon there being incomplete DRI2BlockClient (from too many swaps pending) at the time the window was destroyed. We could only prevent that bug by not having any swaps pending, which to me implies they were completing instantaneously. Bizarre.

Do you have time for a full-debug from 4.5 so that I can compare?

(In reply to Chris Wilson from comment #78)
> (In reply to Diego Viola from comment #77)
> > I see, well, for me the problem occurs only with the 4.4 kernel right now,
> > and yes, the hang happens when I toggle between fullscreen and windowed mode
> > (while in a game).
> >
> > I remember the 4.5 kernel fixing this before the patch was included in the
> > xorg-server.
>
> That's strange. We definitely found a real bug in xorg-server that just
> depended upon there being incomplete DRI2BlockClient (from too many swaps
> pending) at the time the window was destroyed. We could only prevent that
> bug by not having any swaps pending, which to me implies they were
> completing instantaneously. Bizarre.
>
> Do you have time for a full-debug from 4.5 so that I can compare?

Sure. What should I do? Should I just get a Xorg.0.log with debug=full using the 4.5 kernel?

I'm getting a lot of warnings such as these when compiling xf86-video-intel.

Any ideas why that is?

/usr/include/features.h:331:4: warning: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Wcpp]
 # warning _FORTIFY_SOURCE requires compiling with optimization (-O)
   ^
  CC       sna_display.lo
In file included from /usr/include/stdint.h:25:0,
                 from /usr/lib/gcc/x86_64-unknown-linux-gnu/5.3.0/include/stdint.h:9,
                 from sna_display.c:33:

Created attachment 122638 [details]
Xorg.0.log with debug=full and kernel 4.5.0
This time didn't hang when switching between fullscreen/windowed in RetroArch, the video just also paused 2-3 seconds between changing modes.
(In reply to Diego Viola from comment #81)
> I'm getting a lot of warnings such as these when compiling xf86-video-intel.
>
> Any ideas why that is?

For debugging purposes full-debug sets -O0 to disable all optimisation and get accurate source tracking and backtraces. That seems to be causing a silly conflict and pointless spam.

(In reply to Chris Wilson from comment #83)
> (In reply to Diego Viola from comment #81)
> > I'm getting a lot of warnings such as these when compiling xf86-video-intel.
> >
> > Any ideas why that is?
>
> For debugging purposes full-debug sets -O0 to disable all optimisation and
> get accurate source tracking and backtraces. That seems to be causing a
> silly conflict and pointless spam.

I see, thanks. Attached the log you requested.

(In reply to Diego Viola from comment #82)
> Created attachment 122638 [details]
> Xorg.0.log with debug=full and kernel 4.5.0
>
> This time didn't hang when switching between fullscreen/windowed in
> RetroArch, the video just also paused 2-3 seconds between changing modes.

At first glance, it looks the same. :|

In xf86-video-intel/test/dri2-race I have tests for everything we have encountered so far. Modulo a bug in the testcase itself (or at least so it looks like), so far it hasn't found a fresh hang. I might try that with an older kernel just in case, but such a bug would also likely be hardware/setup dependent.

(In reply to Chris Wilson from comment #85)
> (In reply to Diego Viola from comment #82)
> > Created attachment 122638 [details]
> > Xorg.0.log with debug=full and kernel 4.5.0
> >
> > This time didn't hang when switching between fullscreen/windowed in
> > RetroArch, the video just also paused 2-3 seconds between changing modes.
>
> At first glance, it looks the same. :|
>
> In xf86-video-intel/test/dri2-race I have tests for everything we have
> encountered so far. Modulo a bug in the testcase itself (or at least so it
> looks like), so far it hasn't found a fresh hang. I might try that with an
> older kernel just in case, but such a bug would also likely be
> hardware/setup dependent.

I'm not sure how it can be hardware/setup dependent since I was able to reproduce this on my ThinkPad as well as on my desktop, and they both have different hardware. I also run the identical Arch Linux setup on both.

This is also an app hang (RetroArch hang) and not an X hang; it is the same problem you said you were able to reproduce. It's a case of RetroArch hanging and not being able to do anything until I press F on the app again and then I can move between menus or quit the app.

I tried dri2-race but I didn't know how to stop it, how should I use it?

At this present time, I do not see any evidence of a missed swap completion event in the logs and the test cases written to exercise the earlier hangs don't show the error. I still need to find something in the log that corresponds with the hang :|

The comment about maybe being hardware dependent is that kernel bugs have a tendency of being so, I meant nothing more.

(In reply to Chris Wilson from comment #88)
> At this present time, I do not see any evidence of a missed swap completion
> event in the logs and the test cases written to exercise the earlier hangs
> don't show the error. I still need to find something in the log that
> corresponds with the hang :|
>
> The comment about maybe being hardware dependent is that kernel bugs have a
> tendency of being so, I meant nothing more.

I see, thanks for the clarification.

Arch Linux is going to get Linux 4.5 soon in [core], so this bug won't affect me anymore hopefully.

Thanks for all of your help. :)

It's fixed with the latest kernel update (4.5.0-1-ARCH), closing.

Not sure if the latest update of xf86-video-intel had anything to do with this, but the pause between fullscreen and windowed mode also disappeared while in the game. Thanks.

For those of us using non-rolling releases, is there a preferred way for us to get these fixes?
Any chance of them being backported to older kernels, say the kernel in Ubuntu's latest LTS?

It looks like the drivers have regressed again, I've installed retroarch 1.3.4 from the Arch repos and I can't go back from fullscreen now while playing a game.

(In reply to Diego Viola from comment #93)
> It looks like the drivers have regressed again, I've installed retroarch
> 1.3.4 from the Arch repos and I can't go back from fullscreen now while
> playing a game.

Any ideas about this? I'm currently on Linux 4.5.4-1-ARCH, it was fine on 4.5.0.

Nevermind, I will do some git bisecting and report back.

Works fine on 4.5.5 and 4.6 so I guess no bisect needed. Although there is some lag (2-3 seconds) between the fullscreen/windowed transition.

Still can't switch to windowed mode from fullscreen with Linux 4.6.1. This bug is really annoying.

(In reply to Diego Viola from comment #96)
> Works fine on 4.5.5 and 4.6 so I guess no bisect needed.

(In reply to Diego Viola from comment #97)
> Still can't switch to windowed mode from fullscreen with Linux 4.6.1.

Are you saying that there's a difference between kernels v4.6 and v4.6.1?

(In reply to Jani Nikula from comment #98)
> (In reply to Diego Viola from comment #96)
> > Works fine on 4.5.5 and 4.6 so I guess no bisect needed.
>
> (In reply to Diego Viola from comment #97)
> > Still can't switch to windowed mode from fullscreen with Linux 4.6.1.
>
> Are you saying that there's a difference between kernels v4.6 and v4.6.1?

I really don't know, I'm on 4.6.1-2-ARCH right now, but I also tried 4.6-1-ARCH today, the problem is there in both of them.

When I tried 4.6 before I think I compiled it myself, so could the problem be in the kernel configuration? i.e. standard defconfig vs Arch .config.

I'm getting a bit confused here.

I've compiled Linux 4.6.1 now and I can fullscreen/toggle-windowed-mode just fine, but the problem appears again with the 4.6.1-2-ARCH kernel.

Any ideas?
(In reply to Diego Viola from comment #100)
> I've compiled Linux 4.6.1 now and I can fullscreen/toggle-windowed-mode just
> fine, but the problem appears again with the 4.6.1-2-ARCH kernel.
>
> Any ideas?

Usually the distro config is stored under /boot. It takes forever to build a kernel using that, but might be interesting to see if that makes a difference for your self built kernel. If the upstream kernel works with the distro config, it's down to any arch specific patches in their kernel.

Or you could pick up the arch kernel and build that using your config.

(In reply to Jani Nikula from comment #101)
> (In reply to Diego Viola from comment #100)
> > I've compiled Linux 4.6.1 now and I can fullscreen/toggle-windowed-mode just
> > fine, but the problem appears again with the 4.6.1-2-ARCH kernel.
> >
> > Any ideas?
>
> Usually the distro config is stored under /boot. It takes forever to build a
> kernel using that, but might be interesting to see if that makes a
> difference for your self built kernel. If the upstream kernel works with the
> distro config, it's down to any arch specific patches in their kernel.
>
> Or you could pick up the arch kernel and build that using your config.

4.6.2-1-ARCH just landed on Arch and that's also broken, I'm building 4.6.2 with Arch's /proc/config.gz.

I will let you know after I test it. Thanks.

I've compiled 4.6.2 using Arch's /proc/config.gz and I was able to reproduce the hang.

So this is an Arch config problem?

(In reply to Diego Viola from comment #103)
> I've compiled 4.6.2 using Arch's /proc/config.gz and I was able to reproduce
> the hang.
>
> So this is an Arch config problem?

Maybe. Please attach both the working and failing configs. Comparing them is painful, but let's see if something stands out.

The kernel source tree has a scripts/diffconfig tool that you can try yourself for comparing two configs, but usually that ends up in tons of noise when comparing such different configs.

Created attachment 124450 [details]
Arch Linux kernel config
Created attachment 124451 [details]
x86_64_defconfig
This is the default configuration based on x86_64_defconfig, this one is working.
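
As a rough alternative to scripts/diffconfig, the two attached configs could be compared with plain sort and diff. The helper below is a hypothetical sketch (config_delta is not an existing tool), and it makes no attempt to filter the noise warned about above; it just lists every option line that appears in only one of the two files.

```shell
# Hypothetical helper: list kernel config options that differ between two files.
# Usage: config_delta FAILING_CONFIG WORKING_CONFIG
# Lines marked "<" come from the first file, ">" from the second.
config_delta() {
    a=$(mktemp)
    b=$(mktemp)
    # Keep set options and "# CONFIG_FOO is not set" lines; sort so diff
    # compares option by option rather than by position in the file.
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | sort > "$a"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | sort > "$b"
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}
```

With the running Arch kernel, the first file could come from `zcat /proc/config.gz > arch.config`, and the second would be the .config of the self-built kernel that works.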
The thing that's confusing me is that it was working with the Arch kernel as well a few days/weeks ago, see #comment 90

(In reply to Jani Nikula from comment #104)
> (In reply to Diego Viola from comment #103)
> > I've compiled 4.6.2 using Arch's /proc/config.gz and I was able to reproduce
> > the hang.
> >
> > So this is an Arch config problem?
>
> Maybe. Please attach both the working and failing configs. Comparing them is
> painful, but let's see if something stands out.
>
> The kernel source tree has a scripts/diffconfig tool that you can try
> yourself for comparing two configs, but usually that ends up in tons of
> noise when comparing such different configs.

Thanks, I'll try that, the diffing part is the easiest part, the hardest part for me is knowing what is causing the problem.

For instance, Arch kernel is using CONFIG_HZ_300=y and the default config uses CONFIG_HZ_1000=y, would that have anything to do with it?

Tried building with CONFIG_HZ=1000, that didn't change anything.

OK, I found that removing xf86-video-intel solves the problem, I can now switch between fullscreen and windowed mode just fine in RetroArch.

Solved by removing xf86-video-intel.

So the first bug was in Xorg, the second in the kernel, and you think removing -intel is the solution.

(In reply to Chris Wilson from comment #112)
> So the first bug was in Xorg, the second in the kernel, and you think
> removing -intel is the solution.

Well, sorry if that was rude, I'm just saying what works for me. I'll reopen this if something else can be done.

I also have to say that after removing xf86-video-intel, the switch between fullscreen/windowed is instant. I was always getting at least 1-2 seconds delay after toggling fullscreen/windowed mode with the DDX driver.

Should I try Linux from git or xf86-video-intel from git or something else? Please let me know.
(In reply to Chris Wilson from comment #112)
> So the first bug was in Xorg, the second in the kernel, and you think
> removing -intel is the solution.

If the problem I'm having is in fact a bug in the kernel, wouldn't glamor also be failing in that case? I'm being just curious here. Thanks.

I'm back to xf86-video-intel, building Linux 4.7-rc2 with stock Arch config as we speak.

OK so I've compiled 4.7.0-rc2-ARCH and that changes nothing, the bug is still there.

One thing I've noticed is also when toggling fast forward (Space) in RetroArch, I can fullscreen/switch to windowed just fine. But as soon as speed is normal, I cannot go back to windowed anymore. This with xf86-video-intel.

Any ideas?

Got a xf86-video-intel update today and that doesn't resolve the issue either.

xf86-video-intel 2.99.917+662+gb617f80-1

Disappointing.

I've reproduced this on 2 of my computers, the one I've originally reported on, and on my T450, ran the same tests on the different hardware (fall back to windowed from fullscreen on retroarch) and it failed on both computers using xf86-video-intel, but worked fine with glamor.

How can you say it's a bug in the kernel for sure? And will it be fixed for Linux 4.7?

Thanks.

Changed the title to reflect the current problem.

You reported that a change in kernel configuration is enough to change the symptoms at least. That should be enough to work out the trigger.

Reporting that glamor works is the same as saying that if you don't use the same kernel functions, it works.

(In reply to Chris Wilson from comment #124)
> You reported that a change in kernel configuration is enough to change the
> symptoms at least. That should be enough to work out the trigger.
>
> Reporting that glamor works is the same as saying that if you don't use the
> same kernel functions, it works.

Yes, this is why I'm getting confused, too many variables.
I mean, it worked fine during the 4.5 kernels, then it regressed in 4.6, but if I compile 4.6 myself it works, it doesn't with the -ARCH kernel though. I try xf86-video-intel with the stock arch kernel, and it fails, I reboot with modesetting/glamor and it works. Worked fine with UXA back then but not SNA, etc.

I don't know what else to try, I apologize if I said something stupid. I know you do great work on this driver. That said, I want to help but I don't know what else to do anymore.

Any suggestions? I'm thinking about trying Linux 4.7-rc4 but I don't know whether that will help or not.

Just re-edited the title, sorry for any misunderstandings.

Git bisect coming.

So I've just done a git clone of Linus' git tree, checked out the v4.5.0 tag and copied /proc/config.gz to .config, built the kernel, rebooted and that kernel is fine, RetroArch works great for doing a fullscreen to windowed and so on.

I'll bisect v4.5.0 and 4.6.2 which is currently failing.

So I'm currently testing this commit:

266c73b77706f2d05b4a3e70a5bb702ed35431d6

Which is not hanging but hung once. Not sure if I should mark this part as yes/good in the bisect process.

(In reply to Diego Viola from comment #129)
> So I'm currently testing this commit:
>
> 266c73b77706f2d05b4a3e70a5bb702ed35431d6
>
> Which is not hanging but hung once. Not sure if I should mark this part as
> yes/good in the bisect process.

I ended up saying Good because it hung once, but worked fine 99% of the time.

This commit made it only hang once for me:

266c73b77706f2d05b4a3e70a5bb702ed35431d6

But I think it's the commit where this bug started showing up. Git bisect is still going.

I think I'll go back and re-mark this as bad because I was able to reproduce it there:

266c73b77706f2d05b4a3e70a5bb702ed35431d6

git-bisect is over and this is the result:

266c73b77706f2d05b4a3e70a5bb702ed35431d6 is the first bad commit

Created attachment 124742 [details]
git bisect log
git bisect log of Linus' git tree (bisected v4.5 (good) and v4.6 (bad))
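The uncertainty in this bisect (a commit that hung once but worked 99% of the time) is typical of intermittent bugs: a single clean run does not prove a commit is good. One possible way to make the good/bad call less arbitrary is to repeat the reproduction a fixed number of times and call the commit bad on the first failure. The sketch below is only an illustration. `repro_verdict` is a hypothetical helper, not something used in this report, and the command passed to it is assumed to exit non-zero when the hang is observed.

```shell
#!/bin/sh
# Hypothetical helper: run a reproduction command several times and turn the
# outcome into an exit status that `git bisect run` understands
# (0 = good, 1 = bad).
#
# usage: repro_verdict ATTEMPTS COMMAND [ARGS...]
# COMMAND is assumed to exit non-zero when the hang is observed.
repro_verdict() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if ! "$@"; then
            echo "bad: reproduced on attempt $i of $attempts"
            return 1
        fi
        i=$((i + 1))
    done
    echo "good: no failure in $attempts attempts"
    return 0
}
```

With a test that needs a person at the keyboard (as with toggling fullscreen in RetroArch), the command could simply be a small script that asks whether the hang occurred. The number of attempts is a judgment call: the rarer the hang, the more attempts are needed before "good" means much.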
Just went back to the Arch Linux kernel 4.6.3-1-ARCH and the problem is easier to reproduce there as it happens 99% of the time.

Please note that during the bisect I've marked 4.5 as good and 4.6 as bad, but I have no idea if 4.6 is actually bad here. The bisect was using the Arch Linux .config from /proc/config.gz. I'm currently building v4.6 tag from Linus' git tree using the archlinux .config to confirm this one is bad too. There are no minor 4.6.x in the git tree so this is why I tested between 4.5 and 4.6. Thanks.

I'm sure it's unrelated but Firefox is also feeling a lot more sluggish/slow/unusable with recent kernels.

OK, tested v4.6 tag from Linus' git tree with archlinux .config and that is broken as well.

Still broken in 4.7.0-rc5-ARCH.

Anyone please? Any ideas?

Disabling VSYNC on RetroArch also makes it possible to go back into windowed mode from fullscreen, but the the video is also slower.

(In reply to Diego Viola from comment #141)
> Disabling VSYNC on RetroArch also makes it possible to go back into windowed
> mode from fullscreen, but the the video is also slower.

then the*

Using the config below helps as well:

Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "TearFree" "true"
EndSection

RetroArch no longer hangs when switching to windowed mode.

Setting DRI to 3 also helps in /etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "DRI" "3"
EndSection

So far pointing towards i915.enable_fbc=0 at a guess. (Bisecting to a merge is not a good sign)

(In reply to Chris Wilson from comment #145)
> So far pointing towards i915.enable_fbc=0 at a guess. (Bisecting to a merge
> is not a good sign)

Yes, although I already experienced this issue in Linux 4.4, so I'm not exactly sure that commit is the problem. Although while being on that commit I only experienced a single hang out of 40 times I tried to fullscreen and switch to windowed mode. Strange.
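For anyone wanting to try the two xorg.conf workarounds mentioned above (TearFree and DRI 3), a small sketch of how the drop-in file could be generated follows. The section contents are taken from the comments above. The `write_intel_conf` function name is made up for this example, and it writes a single Option per file, which is an assumption about how one would want to test the workarounds one at a time.

```shell
#!/bin/sh
# Sketch: write an xorg.conf.d drop-in for the intel driver with one Option.
# write_intel_conf is a hypothetical helper name.
#
# usage: write_intel_conf FILE OPTION VALUE
write_intel_conf() {
    cat > "$1" <<EOF
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "$2" "$3"
EndSection
EOF
}

# Examples (writing to the real path needs root, and X must be restarted):
#   write_intel_conf /etc/X11/xorg.conf.d/20-intel.conf TearFree true
#   write_intel_conf /etc/X11/xorg.conf.d/20-intel.conf DRI 3
```

Testing one option at a time keeps it clear which setting actually changes the behaviour.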
(In reply to Chris Wilson from comment #145)
> So far pointing towards i915.enable_fbc=0 at a guess. (Bisecting to a merge
> is not a good sign)

I already tried booting with i915.enable_fbc=0 but it didn't change anything, the problem is still there.

Will I have to bisect earlier versions of the kernel? Let me know if you want me to try anything. Thanks. :)

First double check the bisect result by testing 266c73b77706f2d05b4a3e70a5bb702ed35431d6 and 266c73b77706f2d05b4a3e70a5bb702ed35431d6^ (i.e. the merge and linus's parent of the merge).

If the bug is clearly reproducible on the merge and not before the merge, restart the bisect with those two end-points and hopefully bisect will descend into the merge itself.

(In reply to Chris Wilson from comment #149)
> First double check the bisect result by testing
> 266c73b77706f2d05b4a3e70a5bb702ed35431d6 and
> 266c73b77706f2d05b4a3e70a5bb702ed35431d6^ (i.e. the merge and linus's parent
> of the merge).
>
> If the bug is clearly reproducible on the merge and not before the merge,
> restart the bisect with those two end-points and hopefully bisect will
> descend into the merge itself.

OK building 266c73b77706f2d05b4a3e70a5bb702ed35431d6 as we speak. Thanks.

I've built the kernel at 266c73b77706f2d05b4a3e70a5bb702ed35431d6 and I can't make RetroArch hang.

266c73b77706f2d05b4a3e70a5bb702ed35431d6 is fine. Should I bisect this with the parent now?

I'm also thinking to restart a bisect between v4.5 and v4.6 and mark the commit 266c73b77706f2d05b4a3e70a5bb702ed35431d6 as good. I wonder if that will shed some light? v4.6 with the archlinux .config is definitely broken, so before I do that let me know if you can suggest something.

Making more progress here. I was able to reproduce the issue with a custom kernel using localmodconfig, so now I'll be able to bisect faster and not wait 1 hour for each kernel build.
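The double check Chris Wilson describes above relies on `<merge>^` naming the first parent of a merge commit, that is, the mainline state just before the branch was merged in. The sketch below illustrates this on a throwaway repository. The repository, branch and file names are all made up for the demonstration and nothing in it touches a kernel tree.

```shell
#!/bin/sh
# Throwaway demo repository showing what <merge>^ refers to.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file
git add file
git commit -q -m "mainline: base"
mainline=$(git rev-parse HEAD)
trunk=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on git config

# Two commits on a side branch, standing in for the merged subsystem tree.
git checkout -q -b topic
echo one > topic1
git add topic1
git commit -q -m "topic: first change"
echo two > topic2
git add topic2
git commit -q -m "topic: second change"

# Merge the side branch back with an explicit merge commit.
git checkout -q "$trunk"
git merge -q --no-ff -m "Merge branch 'topic'" topic
merge=$(git rev-parse HEAD)

# <merge>^ is the first parent: the mainline tip before the merge.
first_parent=$(git rev-parse "$merge^")
# Commits a bisect between the first parent (good) and the merge (bad)
# would have to consider: the merge itself plus the side-branch commits.
inside=$(git rev-list --count "$merge^..$merge")
echo "first parent: $first_parent"
echo "commits to search: $inside"
```

On the kernel tree the restarted bisect would then take the form `git bisect start <merge> <merge>^` (bad revision first, good revision second), which limits the search to the commits the merge brought in. As the following comments show, the merge commit turned out not to reproduce the hang, so this restart was not needed here.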
Since I can build faster now with localmodconfig and I can reproduce the bug, I'm redoing the bisect between v4.5 and v4.6 and marking 266c73b77706f2d05b4a3e70a5bb702ed35431d6 as good. Created attachment 124767 [details]
localmodconfig config
Here is the new config I'm using, just for the record.
I wish I had used localmodconfig before, all the time I could have saved.

Will attach a new bisect log soon, I found another commit where I can reproduce this. I think we can safely ignore the previous bisect log. :D

This time I can also reproduce the error more reliably, it hangs the first time I try to switch to windowed mode from fullscreen.

[diego@myhost ~]$ cd linux
[diego@myhost linux]$ git bisect bad
7ac7d19f808697abe6658c64c96868f728273f9c is the first bad commit
commit 7ac7d19f808697abe6658c64c96868f728273f9c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Apr 17 20:42:46 2016 +0100

    drm/i915: Avoid stalling on pending flips for legacy cursor updates

    The legacy cursor ioctl expects to be asynchronous with respect to other
    screen updates, in particular page flips. As X updates the cursor from a
    signal context, if the cursor blocks then it will stall both the input
    and output chains causing bad stuttering and horrible UX.

    Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94980
    Fixes: 5008e874edd34 ("drm/i915: Make wait_for_flips interruptible.")
    Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Jani Nikula <jani.nikula@intel.com>
    Cc: stable@vger.kernel.org
    Link: http://patchwork.freedesktop.org/patch/msgid/1460922166-20292-1-git-send-email-chris@chris-wilson.co.uk
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    (cherry picked from commit acf4e84d6167317ff21be5c03e1ea76ea5783701)
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>

:040000 040000 ffd5371b8faffb065a2cd8c5624127ce2a03284c a19fe78a340ba89e0c206e96b65d5c426eb0e150 M drivers
[diego@myhost linux]$

Created attachment 124772 [details]
git bisect log 2
git-bisect log, second try.
This is using localmodconfig and I think this time it's more accurate as I'm able to reproduce the bug better.
Doing a revert makes the problem go away, e.g.

git revert -n 7ac7d19f808697abe6658c64c96868f728273f9c

(In reply to Diego Viola from comment #163)
> Doing a revert makes the problem go away, e.g.
>
> git revert -n 7ac7d19f808697abe6658c64c96868f728273f9c

I tried this revert after switching to master BTW.

(In reply to Diego Viola from comment #163)
> Doing a revert makes the problem go away, e.g.
>
> git revert -n 7ac7d19f808697abe6658c64c96868f728273f9c

Yeah, a commit touching atomic modesetting is a believable result.

Just for the record: I was testing this on my ThinkPad T450. I will test whether reverting commit 7ac7d19f808697abe6658c64c96868f728273f9c also makes the issue go away on the older computer where I originally reported this problem.

Building linux.git (master) as of commit 02184c60eba8491ea574cd17b8ba766c86d468f2 to see if I can still reproduce this on the original machine, without reverting commit 7ac7d19f808697abe6658c64c96868f728273f9c. Will report back when it's done.

OK, yes, I'm still able to reproduce it with current master. I will try reverting 7ac7d19f808697abe6658c64c96868f728273f9c on the original machine and see if that fixes it there as well.

OK, I'm happy to report that reverting commit 7ac7d19f808697abe6658c64c96868f728273f9c solves the problem on the original machine as well.

Got a hang once or twice on the older/original machine while pressing F too many times with the commit reverted, maybe it's something else? The hang doesn't happen most of the time though: fullscreen/windowed works as expected 99.99% of the time and the hang is not as easy to reproduce. Could it be my machine just being too old? It's a 6-7 year old machine.

Tried to do a git-bisect of v4.4 and v4.5. However, git said this:

Some good revs are not ancestor of the bad rev.
git bisect cannot work properly in this case.
Maybe you mistook good and bad revs?

I know v4.4 is bad and v4.5 is good.

Currently building v4.3 so I can bisect against v4.4.
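The git error quoted above appears because `git bisect` expects the good revision to be an ancestor of the bad one, and here the older tag (v4.4) is the bad one. Git 2.7 and later allow alternate terms for this situation, so that a bisect can search for the commit that fixed a problem rather than the one that introduced it. The sketch below shows the idea on a throwaway repository. The five-commit history and the `state` file are invented for the demonstration, and this is not what was done in this report, where v4.3..v4.4 was bisected instead.

```shell
#!/bin/sh
# Throwaway demo: find the commit that FIXED a bug using custom bisect terms.
# Requires git 2.7 or later for --term-old / --term-new.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Five commits; the pretend bug is present until commit 4 fixes it.
for n in 1 2 3 4 5; do
    if [ "$n" -ge 4 ]; then echo ok > state; else echo hang > state; fi
    echo "$n" > counter
    git add state counter
    git commit -q -m "commit $n"
    if [ "$n" -eq 1 ]; then oldest=$(git rev-parse HEAD); fi
    if [ "$n" -eq 4 ]; then fix=$(git rev-parse HEAD); fi
done

# old state = broken, new state = fixed, so "older is bad, newer is good"
# becomes a valid direction for the bisect.
git bisect start --term-old broken --term-new fixed >/dev/null
git bisect fixed HEAD >/dev/null
git bisect broken "$oldest" >/dev/null

# For `git bisect run`, exit 0 marks the old term (broken) and exit 1 marks
# the new term (fixed). The test passes (exit 0) while "ok" is absent.
git bisect run sh -c '! grep -q ok state' >/dev/null

# refs/bisect/<new term> ends up at the first commit in the new state.
found=$(git rev-parse refs/bisect/fixed)
echo "first fixed commit: $found"
git bisect reset >/dev/null 2>&1
```

On the kernel tree the equivalent would start with `git bisect start --term-old broken --term-new fixed`, followed by `git bisect fixed v4.5` and `git bisect broken v4.4`, with each step tested by hand.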
Just started a bisect of v4.3 (good) and v4.4 (bad).

git bisect of 4.3 and 4.4 is done, see the next post for the resulting bad commit.

[diego@myhost ~]$ cd linux
[diego@myhost linux]$ git bisect good
f4502c25ebd04691f284fdafff4a5613299c36dc is the first bad commit
commit f4502c25ebd04691f284fdafff4a5613299c36dc
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Thu Aug 27 15:44:04 2015 +0200

    drm/i915: Always try to inherit the initial fb.

    The initial state is read out correctly and the state is atomic,
    so it's safe to preserve the fb without any hacks if it's suitable.

    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 4aa2f488c319e68a1a17baac4bd4269dbc34daf9 67bdae945d662fd1846371b469b402f070203b45 M drivers
[diego@myhost linux]$

Created attachment 124785 [details]
git bisect v4.3 and v4.4
Just did a checkout of f4502c25ebd04691f284fdafff4a5613299c36dc and indeed it hangs.

d551599181769571f4f68dd93e5d8b15868889af is fine, it doesn't hang.

Just built Linux 4.2 and it works fine also with this version.

Built Linux 4.1 (4.1.0-ARCH-dirty) as well and the problem is also not there.

I tried building v4.0 but my T450 didn't even boot with that kernel.

I've tried building v4.0 on my older machine, but it failed to boot there as well. Maybe the config was messed up in some way, I recall booting older kernels on this machine in the past. Not that it matters at the moment.

I tried these two commits on the original/older machine, but all I get is "Input Not Support" after i915 is loaded.

f4502c25ebd04691f284fdafff4a5613299c36dc
d551599181769571f4f68dd93e5d8b15868889af

v4.3 is also hanging on the original/older machine, god damn.

v4.2 also results in retroarch hanging on my older machine.

4.1 is also broken on the original/older machine.

OS: Arch Linux x86-64
RetroArch 1.3.4

RetroArch fails to switch back into windowed mode from fullscreen while pressing F twice, video freezes instead.

*** This bug has been marked as a duplicate of bug 96767 ***

Moved here: https://bugs.freedesktop.org/show_bug.cgi?id=96769

And here: https://bugs.freedesktop.org/show_bug.cgi?id=96767

*** This bug has been marked as a duplicate of bug 96769 ***

commit 40e3be34367141c952678f456f0e0d4632b6c266
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 3 10:18:32 2016 +0000

    sna/dri2: Complete the final flip in a chain after the window is destroyed

    When the pending flip is queued, we update all the Windows to use the
    next bo as their rendering target. However, that bo is not yet the
    scanout until the future flip is performed. If the current fullscreen
    Window is destroyed, we still must allow that flip to proceed or else
    the old bo is left on the scanout.

    And yes, this is indeed a fix to one of the debug patches that intended
    to detect the error causing #93844. Irony.

    Fixes: 7817949314a2 ("sna/dri2: Avoiding marking a pending-signal on a dead Drawable")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93844
    Reported-by: Diego Viola <diego.viola@gmail.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

(In reply to Chris Wilson from comment #190)
> commit 40e3be34367141c952678f456f0e0d4632b6c266
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Nov 3 10:18:32 2016 +0000
>
> sna/dri2: Complete the final flip in a chain after the window is
> destroyed
>
> When the pending flip is queued, we update all the Windows to use the
> next bo as their rendering target. However, that bo is not yet the
> scanout until the future flip is performed. If the current fullscreen
> Window is destroyed, we still must allow that flip to proceed or else
> the old bo is left on the scanout.
>
> And yes, this is indeed a fix to one of the debug patches that intended
> to detect the error causing #93844. Irony.
>
> Fixes: 7817949314a2 ("sna/dri2: Avoiding marking a pending-signal on a
> dead Drawable")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93844
> Reported-by: Diego Viola <diego.viola@gmail.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Tested this and your latest commit after that, the X server is still crashing.

Tested your latest commit and it's indeed fixed. Thanks a bunch. :)

https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=bf7316a4539afdf7a742d2b2ccbbaa5f27918255

Closing resolved+fixed. Fix introduced in xf86-video-intel and verified by reporter.
Created attachment 121248 [details] Xorg.0.log