Description
Martin Pitt
2008-12-28 02:41:30 UTC
Created attachment 21511 [details]
intel_reg_dump when working
Created attachment 21512 [details]
intel_reg_dump after going black
Created attachment 21513 [details]
Xorg.0.log
X.org log which shows the plethora of pipe-A underruns.
When this happens, I get the following kernel messages:
Dec 28 10:25:07 tick kernel: [ 5559.025081] mtrr: no MTRR for d0000000,10000000 found
Dec 28 10:25:08 tick kernel: [ 5560.478087] apm: BIOS not found.
$ diff -U 0 intel_regs.works.txt intel_regs.black.txt
--- intel_regs.works.txt 2008-12-28 10:33:22.000000000 +0100
+++ intel_regs.black.txt 2008-12-28 10:24:50.000000000 +0100
@@ -57 +57 @@
-(II): PIPEASTAT: 0x00000203 (status: VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS)
+(II): PIPEASTAT: 0x80000000 (status: FIFO_UNDERRUN)
@@ -132 +132 @@
-(II): FBC_CONTROL: 0x43e847e2
+(II): FBC_CONTROL: 0xc3e847e2
@@ -134 +134 @@
-(II): FBC_STATUS: 0x20000000
+(II): FBC_STATUS: 0x60000000
@@ -138 +138 @@
-(II): MI_MODE: 0x00000200
+(II): MI_MODE: 0x00000000

I think this might be a DUP, can you try the patch in 18651?

*** This bug has been marked as a duplicate of bug 18651 ***

I'm building/installing -intel with this patch applied. I'll report back in a day or two, since the underruns only start to happen after a couple of hours (presumably when I'm doing particular things with my computer, but I'm not able to pinpoint what triggers it).
Thanks!

I found out that starting kvm and doing some other window juggling triggers the quick underrun (i.e. the flickering, not the total blackout) pretty reliably. With the proposed patch applied, I still get underruns, though. I'll let it run for a couple of days to see whether I still get any black screen.

Looks separate from 18651 unfortunately.

I have used the suggestion in https://bugs.launchpad.net/bugs/311895 since yesterday (Option "FramebufferCompression" "off"), and that *seems* to do the trick. I want to test it a little longer before fully confirming, especially since the most recent X.org stopped logging the underruns in Xorg.0.log, and I got too used to the occasional screen flicker, so I might well have ignored them. But my screen went black (or brown, or white) irrecoverably after a day or two without that option.
If that doesn't happen any more either, I'll report back here.

Two days have passed with Option "FramebufferCompression" "off", and I didn't notice a single flicker, nor encounter another black screen. Thus I'm fairly sure that this is at least a very good (if not perfect) workaround for the problem, and might also point to the root cause.
Just reiterating that I never ever observed those problems with the internal LVDS (1280x800), just with the external TFT (1280x1024).
</facts>
<wild and unqualified speculations>
Might it be possible that compressing the framebuffer just occasionally takes too long, once it gets bigger than a critical threshold (which lies somewhere in between 1280x800 and 1280x1024 pixels)? Any idea why it would sometimes not recover from this at all any more, perhaps if it takes too long, and it cannot 'catch up' any more?
Thanks!

That mirrors my experience, too. I'm on a Mac Mini with a GM945 video... using the TV-out at 1024x768 for several months I never had any issues, and when I changed to using DVI->HDMI output on it at 1280x720, I started getting the solid color screen really frequently. Disabling the FramebufferCompression about three weeks ago did make the machine usable again. I've run the thing for 5 or 6 hours per day on a daily basis (I have it hooked up to a TV using MythTV on it), and although I have still gotten that solid color screen since then, it's only happened once in all that time (as opposed to every 5 or 10 minutes before). I was getting that periodic flicker before, too, and that's infrequent enough that I don't notice it anymore if it's still happening at all.

In #18491 there's a patch (https://bugs.freedesktop.org/attachment.cgi?id=22319) to mess with the FIFO watermark values that might help. But more than that, it also makes the intel_reg_dumper tool dump the FIFO watermark regs. Can someone apply it and capture a reg dump both before and after starting X on their machine with the patch applied?
The spontaneous black screen is almost surely caused by a series of pipe underruns. That generally happens if our memory arbitration settings are off (so a given pipe can't get its pixels due to some other pipe hogging the memory interface) or the FIFO watermark regs being incorrect (we fetch a new chunk of pixels too late and end up missing our window of time to feed them to the pipe).
The framebuffer compression hardware periodically compresses the framebuffer into a private section of memory (the compressed buffer), temporarily increasing memory activity; it could be that we're not accounting for that in the FIFO settings, so the screen goes black after the first compression pass (which is usually after about 15s iirc).

I enabled FB compression again and applied the patch in bug 18491. It had quite a dramatic regressive effect: the screen now flickers at each hard disk access, mouse movement, or key press, and only stands still if absolutely nothing happens.
I captured the registers right after boot, then after X and gdm started, and finally after GNOME was fully running.
Created attachment 22944 [details]
regs with patch from #18491: right after boot
Created attachment 22945 [details]
regs with patch from #18491: after X and gdm start
Created attachment 22946 [details]
regs with patch from #18491: GNOME fully running
That's the watermark change you asked for:
--- boot-nox.regs 2009-02-14 15:49:55.000000000 +0100
+++ boot-gdm.regs 2009-02-14 15:50:15.000000000 +0100
@@ -31,2 +31,2 @@
-(II): FWATER_BLC: 0x03060106
-(II): FWATER_BLC2: 0x00000306
+(II): FWATER_BLC: 0x033f033f
+(II): FWATER_BLC2: 0x0000033f
It doesn't change any further after starting GNOME (which does xrandr stuff, etc.) Other registers do change during GNOME startup, though.
Heh, I think I had the watermark regs backwards... I'll have to spin a new patch, but you could try changing the watermark value in the patch in the meantime:
watermark = (3 << 8) | 0x3f
should instead be something like
watermark = (3 << 8) | 1

I did that change, much better. :-) It doesn't flicker so badly any more, and the watermark reg diff is now
$ diff -U 0 boot-nox.regs boot-gnome.regs |grep WATER
-(II): FWATER_BLC: 0x03060106
-(II): FWATER_BLC2: 0x00000306
+(II): FWATER_BLC: 0x03010301
+(II): FWATER_BLC2: 0x00000301
I have to run now, so I can't do the full test which triggers the original underrun; will report back tomorrow or Monday. Thank you so far and have a nice weekend!
Created attachment 22956 [details]
regs with fixed patch from #18491: right after boot
Created attachment 22957 [details]
regs with fixed patch from #18491: GNOME fully running
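For readers following the register values, here is a minimal sketch of how the dumped FWATER_BLC and FWATER_BLC2 values relate to the `watermark` expression quoted above. The packing (the same 16-bit value in both halves of FWATER_BLC, and once in FWATER_BLC2) is an assumption inferred from the dumps in this thread, not taken from the patch source, and `pack_fwater` is a hypothetical helper name.

```python
from typing import Tuple


def pack_fwater(watermark: int) -> Tuple[int, int]:
    """Return (FWATER_BLC, FWATER_BLC2) for a 16-bit watermark value.

    Assumption: the layout is inferred from the register dumps in this
    thread, where the same 16-bit value shows up in both halves of
    FWATER_BLC and in the low half of FWATER_BLC2.
    """
    watermark &= 0xFFFF
    fwater_blc = (watermark << 16) | watermark
    fwater_blc2 = watermark
    return fwater_blc, fwater_blc2


if __name__ == "__main__":
    # 0x3f is the value from the first test patch, 1 is the suggested fix.
    for low_bits in (0x3F, 1):
        value = (3 << 8) | low_bits
        blc, blc2 = pack_fwater(value)
        print(f"watermark={value:#06x} FWATER_BLC={blc:#010x} FWATER_BLC2={blc2:#010x}")
```

With the two values discussed above this reproduces the dumped registers (0x033f033f / 0x0000033f and 0x03010301 / 0x00000301), which suggests the assumed packing is at least consistent with the logs.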
OK, I now threw kvm, glxinfo, and totem at it, all running at the same time, and not a single flicker. No watermark difference in the registers. Great work, thanks!

Great, thanks a lot for testing. Dave, does this change also help your situation?

Thanks for the new patch. Martin, do you have a pointer to how to build the new driver with the patch for 8.10 (Intrepid)? Or if someone could post a link to the binary driver, that would be great! I can just replace the original intel driver with this.

Raghu, I ported the patch to the intrepid version (2.4.1), will attach in a bit. To make testing easier for everybody, I also uploaded it to my personal package archive, so that you can grab the ready-built .deb from there, or just add the new apt source:
https://launchpad.net/~pitti/+archive/ppa
Be warned, though, I didn't test it. In the (unlikely) event that it totally screws up your system, please boot with the "text" kernel command line option in grub, log in at Ctrl+Alt+F1, and do
sudo apt-get install xserver-xorg-video-intel/intrepid-updates

Created attachment 22959 [details] [review]
patch ported to 2.4.1

Thanks Martin, for making it really easy to install! I am currently running xserver-xorg-video-intel-2.4.1-1ubuntu10.4~test1 from your repository. So far so good. I'm using the original xorg.conf, which does not have any options set for the device.

The deb will make it easy for me to test, too, thanks! It'll probably be tomorrow before I can get to it though.
Created attachment 23007 [details]
regs with fixed patch from #18491: after hibernate
Ugh, after a hibernate/resume cycle the flickering is back. I have never seen it any more when not using hibernate (didn't test suspend, it's currently broken).
The watermark registers did not change, though.
Hm, so the regs look ok after resume but you see flickering? That sounds bad; it means there may be another reg we've got to write to get things working again.

(In reply to comment #25)
> or just add the new apt source:
>
> https://launchpad.net/~pitti/+archive/ppa
How do I add this source? If I go to that URL and follow the directions given on that page, I should add this to my sources.list:
deb http://ppa.launchpad.net/pitti/ppa/ubuntu intrepid main
But after doing that, I get a 404 error trying to retrieve Packages.gz.
I just downloaded the package by hand for now, will let you know how it goes.

@Dave: Weird, that should be the correct URL. I just tried it here, and it works.

Dave, just in case: make sure you don't have 's' after 'http'. 'deb http://ppa.launchpad.net/pitti/ppa/ubuntu intrepid main' worked for me as well. The Synaptic Package Manager complains about either a lack or a mismatch of signatures, but the repo works.

Huh. I tried again just now and it worked. Maybe I just caught it at a bad time during a repo refresh or something before.
Anyhow, I installed the deb manually yesterday, and I've been running the thing most of the day, with the workaround hacks removed from xorg.conf (so it just has the default "detect everything" settings again). No screen blankouts yet. I did just get a flicker, though, just before typing this (I quit out of MythTV so I could run Synaptic and try the repo source add again, the flicker happened right after MythTV quit). The flicker did have the corresponding:
(EE) intel(0): underrun on pipe A!
in Xorg.0.log. It's the one and only occurrence of that error in the log since Xorg was restarted this morning.
I'm not sure how to check the registers that were mentioned.

If it was just one flicker when you exited MythTV that might be normal, if a mode set or pipe on/off sequence occurred. Anyway, sounds like we have at least this part of the problem narrowed down; I'll put together a patch for the 2.7 release.
Been running this for a few days now, and no further issues so far. Looks like it fixes it for me.

Oh, and I haven't tested Martin's situation from comment 29... I've never had reason to suspend or hibernate this thing.

Jesse, do you think that http://bugs.freedesktop.org/attachment.cgi?id=22319 plus the "0x3f -> 1" fix is good for uploading? I'd like to get it some more testing exposure, but I'm not sure whether this was just a test patch and needs to be redone for public consumption?
Thank you!

Hm, for a few days now I get the screen flicker immediately, even after a clean boot and no suspend, etc. Odd, I was running Jaunty with this patch for over a week without a single glitch; apparently something else changed in the system now (newer X.org, kernel updates, etc.)
$ sudo intel_reg_dumper |grep WATER
(II): FWATER_BLC: 0x03010301
(II): FWATER_BLC2: 0x00000301

There are patches in 18651 that might also help (they're more proper at least, not like the hack I posted here). Can you give them a try?

I applied the latest patch (http://bugs.freedesktop.org/attachment.cgi?id=24375) to 2.6.3 (what we have in Ubuntu Jaunty). I threw everything at it which I could find: running glxgears under a load of 6.6 (having an rsync and jigdo in the background), playing a fullscreen video while booting a live system under kvm, suspend/resume, everything works. I haven't seen a single flicker so far. This even fixes the flickering of glxgears when running under EXA (we didn't switch to UXA in Ubuntu yet, since it still causes too many crashes and problems).
I will run with this patch for a while, to see the long-term behaviour. Before, I got the flickering/hang after running for some hours, or some time after suspend (see bug 20520). Perhaps bug 20520 is even just another consequence of this one, although it happened even with FramebufferCompression off. I'll report back in a couple of days with the long-term results.
Kudos, Jesse! You made my day!
Just got the hang after suspend again (bug 20520), so that is independent after all.

(In reply to comment #42)
> Just got the hang after suspend again (bug 20520), so that is independent after
> all.
Hm that could be one of the other suspend/resume related bugs we have open at the moment. It could also be due to some missing bits I posted a patch for in 18702. Care to try that out?

A first quick shot at trying that patch left me with a ton of rejections (tried to apply against linux 2.6.28.8, with some ubuntu modifications). I'll try again later, but this might take a while.
As for this bug, I have used this latest patch for several hours now, with no problem whatsoever. Thanks muchly!

Great, thanks a lot for testing, Martin. I'll push as soon as I get some review on intel-gfx.

*** Bug 18651 has been marked as a duplicate of this bug. ***

Created attachment 24431 [details]
registers with latest version 5 patch
Argh, this is haunting me. With the latest patch applied, it was working perfectly yesterday, but just now the flickering is back. No suspend involved.
I attach the current registers, do you see anything wrong there?
Thanks!
Created attachment 24438 [details]
Xorg.log with patch 5 and flicker
I'm attaching current Xorg log as well, since it has a couple of messages like
(II) intel(0): Setting FIFO watermarks - A: 1, B: 37, C: 2, SR 127
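When comparing several Xorg logs, it can help to extract these watermark lines mechanically. The following is a small sketch that parses lines of the form quoted above and flags non-positive values; the regular expression is based only on the log lines quoted in this thread (one later log shows a negative A value), and the function names are hypothetical.

```python
import re

# Pattern derived from the log lines quoted in this thread; other driver
# versions may format the message differently (assumption).
WM_LINE = re.compile(
    r"Setting FIFO watermarks - A: (-?\d+), B: (-?\d+), C: (-?\d+), SR (-?\d+)"
)


def parse_watermarks(line):
    """Return the watermark values from one log line, or None if no match."""
    match = WM_LINE.search(line)
    if match is None:
        return None
    a, b, c, sr = (int(group) for group in match.groups())
    return {"A": a, "B": b, "C": c, "SR": sr}


def suspicious(watermarks):
    """Return the names of watermarks that are zero or negative."""
    return [name for name, value in watermarks.items() if value <= 0]


if __name__ == "__main__":
    line = "(II) intel(0): Setting FIFO watermarks - A: 1, B: 37, C: 2, SR 127"
    values = parse_watermarks(line)
    print(values, suspicious(values))
```

Feeding it each line of Xorg.0.log and printing only the lines where `suspicious` returns a non-empty list is one way to spot a bad calculation quickly.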
Just as an early warning, this patch (same .deb that I am running) completely broke matters for a colleague of mine, also on 945GM/GMS. I asked for registers and Xorg.log, will forward as soon as I get it.
Created attachment 24465 [details]
debug logs for monitor/VT state changes
Ah, I know what changed. After a clean boot, with the latest (version 5) patch applied, everything works perfectly for me, the trouble starts when I switch off my monitor, and switch it on again (as I usually do during lunch break).
So I looked at dmesg, Xorg log, and registers in three states.
1. After clean boot, and GNOME login. See boot.* files.
2. Switch to VT1 and back. dmesg says
[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1
Registers are wildly different, see diff -U0 boot.registers.txt vtswitch.registers.txt. After waiting for one minute, the registers change further to
-(II): FBC_STATUS: 0x20000000
+(II): FBC_STATUS: 0x60000000
Xorg log gets some 30 info lines, and some interesting warnings:
$ diff -u boot.Xorg.log vtswitch.Xorg.log | grep -v '(II)'
+(WW) intel(0): ESR is 0x00000001, instruction error
+(WW) intel(0): Existing errors found in hardware state.
+(WW) intel(0): plane B needs more FIFO entries
3. Switch off monitor, and turn it on again. Now I get the occasional flickering.
dmesg gets some USB disconnect/connect messages (monitor has USB hub with some stuff), nothing X related.
registers do not change at all.
No new Xorg log entries.
Now, having typed this, it seems to me that switching off the monitor doesn't change much, and that most likely the VT switch is to blame; I will do another test to affirm that I get flickering after VT switch already (I'll report back if that is not the case). Your patch seems to work by and large, but seems to not take VT switches into account correctly.
Ah I can believe that VT switches might cause trouble... The diff actually doesn't look too interesting though, mainly LVDS is off. However this part definitely does look weird:
(II) intel(0): FIFO entries - A: 25, B: 0
(II) intel(0): FIFO size - A: 28, B: 59
(WW) intel(0): plane B needs more FIFO entries
That FIFO entries line indicates that pipe B is off. Maybe I don't handle that case correctly...

Created attachment 24471 [details] [review]
add save/restore of watermark regs across VT switch
Not restoring these across VT switch might be bad... This one leaves the programming alone but takes care to save/restore the regs across VT switch.

Created attachment 24493 [details]
debug logs for monitor/VT state changes for patch v6
Similar debug logs for patch v6 (https://bugs.freedesktop.org/attachment.cgi?id=24471). Unfortunately the flickering still happens. :-(
Yesterday night I conducted another experiment. I just switched the monitor off and on, without any VT switch. After that I already got the flickering. However, there was no change at all in registers, Xorg.log, or dmesg. (This was with the previous patch version 5, though).
Many thanks for your efforts,
Martin
P.S. If ssh to my machine helps you in any way, I can provide that. I'll just be away next week at the LinuxFoundation collaboration summit in San Francisco, and can spend little to no time testing this.

Only the first time after I installed xserver-xorg-video-intel 2:2.6.3-0ubuntu4pitti1 (AMD64) from Martin Pitt's PPA and restarted GDM was I met with a blank screen, though it was clear GDM was waiting for input.
Hardware is GM45 rev 7, dual-channel mem with only pipe B connected to internal LVDS to a 1440x900 6-bit TN (aargh) panel.
Perhaps this relates to when the pipe watermarks are reprogrammed and thus data is discarded; we see pipe B's LBLC_EVENT_STATUS flag set and EDID detection was not performed.
There were no kernel/syslog messages, and the Xorg.log difference against normal operation is:
$ diff -u /var/log/Xorg.0.log.working /var/log/Xorg.0.log.blank
@@ -197,10 +197,7 @@
 (WW) intel(0): Register 0x61200 (PP_STATUS) changed from 0xc0000008 to 0xd000000a
 (WW) intel(0): PP_STATUS before: on, ready, sequencing idle
 (WW) intel(0): PP_STATUS after: on, ready, sequencing on
-(WW) intel(0): Register 0x71024 (PIPEBSTAT) changed from 0x80000206 to 0x80000246
-(WW) intel(0): PIPEBSTAT before: status: FIFO_UNDERRUN VSYNC_INT_STATUS SVBLANK_INT_STATUS VBLANK_INT_STATUS
-(WW) intel(0): PIPEBSTAT after: status: FIFO_UNDERRUN VSYNC_INT_STATUS LBLC_EVENT_STATUS SVBLANK_INT_STATUS VBLANK_INT_STATUS
-(WW) intel(0): Register 0x321b (FBC_FENCE_OFF) changed from 0x59018500 to 0x2a03a200
+(WW) intel(0): Register 0x321b (FBC_FENCE_OFF) changed from 0x59008500 to 0x2a03a200
 (==) Depth 24 pixmap format is 32 bpp
 (II) do I need RAC? No, I don't.
 (II) resource ranges after preInit:
@@ -432,93 +429,17 @@
 (II) AT Translated Set 2 keyboard: Device reopened after 1 attempts.
 (II) Video Bus: Device reopened after 1 attempts.
 (II) Macintosh mouse button emulation: Device reopened after 1 attempts.
-(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using 
vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -exaCopyDirty: Pending damage region empty! -(II) PM Event received: Capability Changed -I830PMEvent: Capability change -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) PM Event received: Capability Changed -I830PMEvent: Capability change -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) PM Event received: Capability Changed -I830PMEvent: Capability change -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID 
vendor "LEN", prod id 16435 -(II) PM Event received: Capability Changed -I830PMEvent: Capability change -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) PM Event received: Capability Changed -I830PMEvent: Capability change -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) PM Event received: Capability Changed -I830PMEvent: Capability change -(II) intel(0): EDID vendor "LEN", prod id 16435 -(II) intel(0): Using hsync ranges from config file -(II) intel(0): Using vrefresh ranges from config file -(II) intel(0): Printing DDC gathered Modelines: -(II) intel(0): Modeline "1440x900"x0.0 101.60 1440 1488 1520 1792 900 903 909 945 -hsync -vsync (56.7 kHz) -(II) intel(0): Modeline "1440x900"x0.0 81.49 1440 1488 1520 1760 900 903 909 926 -hsync -vsync (46.3 kHz) -(II) intel(0): EDID vendor "LEN", prod id 16435 Daniel, please note that 4pitti1 has the "v 5" patch. I just uploaded my current test package with the latest "v6" patch (http://bugs.freedesktop.org/attachment.cgi?id=24471) to my PPA, as 4pitti2. Daniel, looks like you hit the LVDS detect bug with the version Martin packaged. 
Martin, the fact that you see flickering after just a monitor power cycle is strange. If the FIFO regs weren't changed the flicker you see shouldn't be caused by underruns... I'm putting together another patch which will report that so we can check.

Created attachment 24651 [details] [review]
Add underrun debugging
This one should log any underruns that occur so we can figure out if the flicker you're seeing is some other problem.

Thanks, Jesse. I applied the patch to the current Ubuntu 9.04 package and uploaded it to my personal package archive again, so that people on 9.04 can test it.
I can't test it myself until next Tuesday, since this week I'm in San Francisco at the LF summit. I never got any flickering with the internal LVDS, and I don't have an external screen here.

Rebuilding the xserver-xorg-video-intel package with the updated patch, I was unable to trigger underruns with my GM45 rev 7 hardware, rebooting a few times for initial state, separately restarting GDM in a loop ~50 times, and switching VTs, testing both EXA and UXA paths.
Since the runtime overhead is minimal, I'd say it's worth carrying this patch forward to help understand the failure mechanism later.
Daniel

The X-server was still solid after ~10 suspend-resume cycles (running in EXA) also, though I do see the Error Status Register getting bit 0 set - presumably expected. See attached Xorg.0.log.
Created attachment 24653 [details]
GM45 (rev7) patched intel-2.6.3 log on Thinkpad T400, showing ESR:0x1
Daniel, glad to hear things are stable for you. But my patch shouldn't affect your configuration (GM45 has automatic FIFO sizing & pipe arbitration). Looks like your LVDS detection bug is fixed though, which is good.
Created attachment 24816 [details]
debug logs for patch v8
I applied the latest patch (v8) to my PPA against the current Jaunty package (2:2.6.3-0ubuntu9pitti1). Again I captured logs right after a clean X.org startup (startup.*), right after a monitor off/on cycle (not included, since no change), and a while after a VT switch.
I didn't see any underruns happen after switching off the monitor. Perhaps the effect during lunch break is that the monitor gets disabled by the screensaver (DPMS off), which acts more like a VT switch?
The underruns started some minutes after a real VT switch, and due to the new patch I get them logged now:
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
The attached logs have just one instance of those, but the underruns become more frequent now. After the first underrun happened, I got this change:
-(II): FBC_STATUS: 0x20000000
+(II): FBC_STATUS: 0x60000000
(vtswitch2.regs)
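To see exactly which bits differ between two dumped register values such as the FBC_STATUS pair above, a tiny helper like the following can be used. It is only an illustration: it reports bit positions and makes no claim about what the individual bits mean, which would need the hardware documentation. `changed_bits` is a hypothetical name.

```python
def changed_bits(before, after):
    """Return (bit, new_value) pairs for every bit that differs.

    Works on 32-bit register values as printed by a register dump.
    """
    diff = before ^ after
    return [
        (bit, (after >> bit) & 1)
        for bit in range(32)
        if (diff >> bit) & 1
    ]


if __name__ == "__main__":
    # FBC_STATUS values quoted in this thread.
    for bit, value in changed_bits(0x20000000, 0x60000000):
        print(f"bit {bit} is now {value}")
```

For the FBC_STATUS change this reports a single bit (bit 30) going from 0 to 1; the same helper applied to the PIPEASTAT or MI_MODE values from the original report shows which bits were set or cleared there.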
The pipe underruns also start to happen massively after I used kvm (even after kvm was stopped long ago).

Perhaps this is a symptom of high (IRQ-safe) spinlock hold-times, preventing the pipe being reset/refilled within the needed time window? (unless I'm misunderstanding the mechanism)
This may be key to reproducing the issue, and may be worse on kernels without preemption and lock-break points (i.e. server/throughput/compute optimised kernels).
Using latencytop or kernel ftrace to see what magnitude of lock hold time is needed to cause the pipe underruns may be useful to developers trying to reproduce this later...

No, the pipe is filled automatically by hardware (the GPU just does fetches from RAM based on the FIFO watermark values), so either the watermarks are incorrect or the FIFO sizes are wrong or both.

Oh wow I definitely see this problem now on my 945 test machine with the patch applied... Ah looks like my latency constant wasn't so pessimistic after all. This one works for me though; hope it fixes your problem too (though I'm not sure why a VT switch would trigger it).

Created attachment 25075 [details] [review]
Increase latency constant
Made the latency 5us instead of 3us, which seems to be closer to the truth on my Acer platform at least.

Created attachment 25080 [details]
debug logs for patch v9
I tried the v9 patch (also uploaded to PPA again). Unfortunately this is now worse.
At gdm, when both the internal LVDS and the external TFT are active @1024x768 (no xrandr in gdm yet), I get a constant flickering about twice per second. This cannot even be worked around any more with disabling fb compression.
After logging in, when the internal LVDS switches off, behaviour is identical to the v8 patch: occasional flickering starts after a vt switch (or some hours of usage).
I attached the logs again, after a clean boot (start.*), a vt switch (vtswitch.*), after the first overflow a few minutes later (overflow.*), and after several more overflows occurred (overflow-more.*).
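As a rough illustration of why the latency constant matters, here is a sketch of one way a FIFO watermark could be derived from pixel clock and memory latency. Everything numeric here is an assumption: the 135 MHz pixel clock, 4 bytes per pixel, 64-byte FIFO entries, and the guard of 2 entries are not stated anywhere in this thread; they were picked because they happen to reproduce figures logged in this thread (25 and 42 FIFO entries for pipe A, a FIFO size of 28). The function names and the fallback value are hypothetical, and this is not presented as the patch's actual code.

```python
def fifo_entries(clock_khz, bytes_per_pixel, latency_ns, entry_bytes=64):
    """Entries the pipe drains while one memory fetch is outstanding."""
    bytes_during_latency = clock_khz * bytes_per_pixel * latency_ns // 1_000_000
    return bytes_during_latency // entry_bytes


def raw_watermark(fifo_size, entries, guard=2):
    """FIFO level at which a new fetch must start (may come out negative)."""
    return fifo_size - (entries + guard)


def sane_watermark(fifo_size, entries, guard=2, fallback=1):
    """Check the watermark itself, not the entries, against <= 0.

    The fallback of 1 is a hypothetical choice for this sketch.
    """
    wm = raw_watermark(fifo_size, entries, guard)
    return wm if wm > 0 else fallback


if __name__ == "__main__":
    for latency_ns in (3000, 5000):  # the 3us and 5us constants
        entries = fifo_entries(135_000, 4, latency_ns)
        print(latency_ns, entries, raw_watermark(28, entries), sane_watermark(28, entries))
```

Under these assumptions, the longer latency pushes the required entries above the FIFO size, so the raw watermark goes negative. This is the kind of value a sanity check on the watermark would need to catch.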
Created attachment 25116 [details] [review]
Fix watermark sanity check
Arg, maybe I'll get this right one day:
(II) intel(0): FIFO entries - A: 42, B: 0
(II) intel(0): FIFO size - A: 28, B: 59
(II) intel(0): Setting FIFO watermarks - A: -16, B: 1, C: 2, SR 5
That negative A value would certainly cause trouble. Looks like my sanity check was looking at the wrong variable; I should have been checking the watermark value against <= 0, not the entries value (that should always be positive).
Interestingly, the new calculation indicates that you're driving pipe A pretty hard relative to its FIFO RAM allocation, but with just a single pipe enabled it should be safe. If not, we could modify both DSPARB and the FIFO watermarks to increase the chances of a given config working, or enable pixel clock doubling perhaps.

Sigh, looking again at your older logs I doubt that last patch will fix the issue:
(II) intel(0): FIFO size - A: 28, B: 59
(II) intel(0): Setting FIFO watermarks - A: 1, B: 1, C: 2, SR 22
So we're already setting the watermark as aggressively as possible, so the pipe should be continuously fetching data for display. In your config that's still not enough though, since we drain it faster than we fill it.
Another thing that might help is to reduce the pixel clock on the mode you're sending to your external monitor; you can use the cvt or gtf tools to create a mode with reduced blanking or a lower refresh.
I think I'll need to cook one up to modify DSPARB as well (like we do in the current driver).

Ah, so you are saying that something after a VT switch or after putting a high load on the graphics card introduces a fill/drain backlog which the card can't ever catch up with any more? So the disabling of the fb compression helps because dropping that extra work causes the GPU to have enough time again to re-fill the pipes?
NB that I have used that very same laptop to drive a 1920x1200 external screen without problems, but then again I hadn't done it for very long (just about an hour for testing the new monitor for my wife's computer).
So if this is principally not fixable due to hw speed limitations, maybe it would be possible to automatically disable fb compression once the chip hits pipe underruns?
Thanks for your efforts!
Martin

Yeah, avoiding compression when the FIFO watermark is low is probably a good idea. But we may also be able to increase the amount of FIFO RAM allocated to the large display.

Btw, we're carrying an old patch from this bug in the Ubuntu release, one from Feb 2009, patches/109_i830-fifo-watermark-conservative.patch. It sounds like that patch has grown obsolete, or at least doesn't solve this bug 100%; however, I'm going to leave it in place when we move to 2.7.0. If we should be doing something differently, please ping me so we can get a better fix in.

Bryce, I think you should drop the patch. It's insufficient, might cause regressions on other platforms, and doesn't help at all any more at least on my computer.

Thanks Martin, I've removed the patch from Karmic.

Jesse, as we discussed last week in Barcelona, I have now tried -intel git head, mesa git head, 2.6.30rc7 on my home system with the external monitor again, now with the extra 1 GB of RAM that I plugged in last week.
As you suspected, the underruns are now gone, apparently having a second RAM bar now provides enough bandwidth for the graphics card to avoid underruns.
I'm happy to test further patches, I can easily remove the extra GB of RAM again. The very same symptom happens on the Samsung NC10 of a friend of mine, I can test stuff on his machine as well (with some delay).
My impression is that with FB compression my machine is simply not fast enough, regardless of the watermark settings (given that all of above patches failed consistently).
Would it be possible for the driver to disable FB compression dynamically if it encounters pipe underruns, such as "twice in five minutes"? I wonder why this problem didn't occur at all with earlier driver versions (2.4). Didn't that use FB compression yet? Thanks!

On Wed, 3 Jun 2009 00:06:45 -0700 (PDT) bugzilla-daemon@freedesktop.org wrote:
> as we discussed last week in Barcelona, I have now tried -intel git head, mesa git head, 2.6.30rc7 on my home system with the external monitor again, now with the extra 1 GB of RAM that I plugged in last week.
>
> As you suspected, the underruns are now gone, apparently having a second RAM bar now provides enough bandwidth for the graphics card to avoid underruns.
>
> I'm happy to test further patches, I can easily remove the extra GB of RAM again. The very same symptom happens on the Samsung NC10 of a friend of mine, I can test stuff on his machine as well (with some delay).
>
> My impression is that with FB compression my machine is simply not fast enough, regardless of the watermark settings (given that all of above patches failed consistently). Would it be possible for the driver to disable FB compression dynamically if it encounters pipe underruns, such as "twice in five minutes"?
>
> I wonder why this problem didn't occur at all with earlier driver versions (2.4). Didn't that use FB compression yet?
Great, thanks for the update. Yes, we should detect either memory configuration or underruns and take appropriate action. Previous drivers didn't modify the FIFO or DSPARB settings, so the defaults may have been working on your platform, or something else changed to affect the way we access memory (it's also possible that FBC was disabled on older releases in your config for some reason). Jesse

Created attachment 26930 [details] [review]
most recent, KMS version of the patch

This patch applies to the kernel.
It still doesn't contain checks against available bandwidth & latency to reject modes we can't support, but it should behave a bit better than the current 2D driver.

I applied the patch to 2.6.31rc1 and first tested it with 2 GB of RAM. No noticeable difference, everything continued to work smoothly. Now I ripped out the second GB RAM bar again, and did some stress testing: kvm -m512 (booting another Ubuntu desktop live system), running glxgears, and doing some compiz juggling and VT switches. In previous versions this was a reliable way of triggering underruns quickly (which otherwise just occur after a couple of hours). I had a load of 4.3, and glxgears/compiz froze for some fractions of a second due to the high load, but I didn't get any pipe underrun. I'll now continue to use the system for a couple of hours to see the longer-term effects. What I haven't done yet is run the same stress test on 2.6.31 without this patch. Do you need this?

Only if you're feeling thorough. :) Thanks for the updated report though. I fixed a few bugs in the calculations in the KMS patch, so maybe one of those fixed your issues. I'm really looking forward to closing this one; I'll ping Eric about including the patch.

Yay, fix pushed!
commit 7662c8bd6545c12ac7b2b39e4554c3ba34789c50
Author: Shaohua Li <shaohua.li@intel.com>
Date: Fri Jun 26 11:23:55 2009 +0800
drm/i915: add FIFO watermark support

Oops, I am terribly sorry. We currently put i915 into the initramfs, and it gets loaded from there. When I built the module with the patch, I forgot to update the initramfs, so all these successful tests were actually done with the original i915 from 2.6.31rc1. Later this afternoon some other package updated the initramfs, and now the screen goes entirely and irrecoverably black when booting, both when docked (external DVI) and when undocked (internal LVDS). So, perhaps you should revert this from your tree until this is investigated further?
So far, I don't seem to have this underrun problem at all with 2.6.31rc1, thus I leave the bug as "resolved".

Uh-oh, ok thanks for the heads-up. I'll look at this. Can you modprobe your drm with debug=1 so we can see what the watermark values end up being on your machine? It would help if you could confirm that this particular patch caused the problem too; was that the only change or was there another kernel update as well?

It wasn't the only patch, I also applied the tiny patch from bug 20520 (register restoring ordering fix for resuming). However, I tested that patch in isolation before, and it worked fine. Also, I don't think that code path is active on boot. There was no other kernel update. I'll send detailed debugging information tomorrow (I hope I can still ssh into the machine, or it gets logged far enough), bed time for today. I just wanted to give you an early warning to perhaps defer propagation of the patch (or just revert it for now, since it just works without it).

Created attachment 27329 [details]
logs for early/late i915 loading with drm debugging
So, first I turned on DRM debugging and dmesg capturing:
$ cat /etc/rcS.d/S80dmesg
#!/bin/sh
dmesg > /var/log/dmesg-`date +%T`
$ cat /etc/modprobe.d/drmdebug.conf
options drm debug=1
In the attached logs I renamed the dmesg files from timestamps to situation descriptions, such as "dmesg-31rc1-vanilla-early-2GB-ok.txt"
Then I tested all possible combinations of 2.6.31rc1 with/without this patch, with 1GB or 2 GB RAM, and with "early" or "late" loading of i915/drm.
early: modules are contained and loaded by initramfs, i. e. pretty much as one of the first things after the kernel starts to boot
late: I booted without an initramfs, thus init starts readahead, sets the hostname and keyboard layout, and then starts udev which does an "udev trigger" and causes modules such as drm and i915 to be loaded, which in turn does KMS.
In earlier Karmic (2.6.30 release candidates), we didn't put i915/drm into the initramfs, and it worked fine (just looked a bit ugly since mode got switched halfway through boot). Now I noticed that this late loading does not work any more for some reason, not with 2.6.30 final, not with 31rc1, or with 31rc1+your patch. That is a bug in itself, and sounds pretty unrelated to this pipe underrun issue, so perhaps I should report it separately?
Results from this testing:
* late loading never works, I always get LVDS and DVI turned off
* early loading works with .30 final and .31rc1 vanilla
* with this patch applied, it never works, and worse, I don't even get a dmesg captured; this means that the boot doesn't even get to rcS/70. Sounds like it wedges display and causes a kernel panic? Anything I can do to debug this?
* 1 GB/2 GB does not make any difference in any test case
(In reply to comment #87)
> Then I tested all possible combinations of 2.6.31rc1 with/without this patch, with 1GB or 2 GB RAM, and with "early" or "late" loading of i915/drm.
>
> early: modules are contained and loaded by initramfs, i. e. pretty much as one of the first things after the kernel starts to boot
>
> late: I booted without an initramfs, thus init starts readahead, sets the hostname and keyboard layout, and then starts udev which does an "udev trigger" and causes modules such as drm and i915 to be loaded, which in turn does KMS.
Sounds like a good set of combinations, thanks for testing.
> In earlier Karmic (2.6.30 release candidates), we didn't put i915/drm into the initramfs, and it worked fine (just looked a bit ugly since mode got switched halfway through boot). Now I noticed that this late loading does not work any more for some reason, not with 2.6.30 final, not with 31rc1, or with 31rc1+your patch. That is a bug in itself, and sounds pretty unrelated to this pipe underrun issue, so perhaps I should report it separately?
One thing jumped out between the early (working) and late (broken) logs: in the broken ones there's no line for the fbcon loading & initializing. Which would leave your display blank if/until X starts. Maybe that's missing from the load in the late case?
> Results from this testing:
> * late loading never works, I always get LVDS and DVI turned off
> * early loading works with .30 final and .31rc1 vanilla
> * with this patch applied, it never works, and worse, I don't even get a dmesg captured; this means that the boot doesn't even get to rcS/70. Sounds like it wedges display and causes a kernel panic? Anything I can do to debug this?
> * 1 GB/2 GB does not make any difference in any test case
Ugh, ok so it's probably not a pipe underrun then if it kills the whole machine (at least I hope not); could be a kernel panic.
You could try netconsole (modprobe netconsole netconsole=<params> and then use nc on another machine, the kernel Documentation/ directory has some info on that); it might capture a panic if you load the module by hand with the netconsole running. > One thing jumped out between the early (working) and late (broken) logs: in the > broken ones there's no line for the fbcon loading & initializing. Which would > leave your display blank if/until X starts. Maybe that's missing from the load > in the late case? Indeed, I discussed that with our initramfs/boot guru. So that's not a concern here. > Ugh, ok so it's probably not a pipe underrun then if it kills the whole machine (at least I hope not); could be a kernel panic. You could try netconsole Thanks for the netconsole hint, that worked beautifully. Indeed it catches a nice trace in the watermark updating: [ 489.298734] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 [ 489.298908] IP: [<ffffffffa030f1af>] intel_update_watermarks+0xcf/0xd40 [i915] [ 489.299056] PGD 0 [ 489.299152] Oops: 0000 [#1] SMP [ 489.299289] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/drm/card0/dev [ 489.299384] CPU 0 [ 489.299481] Modules linked in: i915(+) drm netconsole i2c_algo_bit configfs snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm arc4 joydev ecb snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer iwl3945 iwlcore iTCO_wdt iTCO_vendor_support snd_seq_device mac80211 led_class snd psmouse dell_wmi dell_laptop cfg80211 soundcore snd_page_alloc usb_storage usbhid serio_raw dcdbas video output tg3 fbcon tileblit font bitblit softcursor intel_agp [last unloaded: drm] [ 489.300005] Pid: 2208, comm: work_for_cpu Not tainted 2.6.31-1-generic #14-Ubuntu Latitude D430 [ 489.300005] RIP: 0010:[<ffffffffa030f1af>] [<ffffffffa030f1af>] intel_update_watermarks+0xcf/0xd40 [i915] [ 489.300005] RSP: 0018:ffff8800229e98b0 EFLAGS: 00010202 [ 489.300005] 
RAX: 0000000000000000 RBX: ffff880022966800 RCX: ffffffffa03244fb [ 489.300005] RDX: ffffffffa0321a20 RSI: ffffffffa0324518 RDI: 0000000000000001 [ 489.300005] RBP: ffff8800229e9930 R08: 0000000000000000 R09: 000000000001a400 [ 489.300005] R10: 0000000000000500 R11: 0000000000000000 R12: ffff880022967000 [ 489.300005] R13: 000000000001a400 R14: ffff8800229674a0 R15: 0000000000000001 [ 489.300005] FS: 0000000000000000(0000) GS:ffff8800019b4000(0000) knlGS:0000000000000000 [ 489.300005] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 489.300005] CR2: 0000000000000038 CR3: 0000000001001000 CR4: 00000000000006b0 [ 489.300005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 489.300005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 489.300005] Process work_for_cpu (pid: 2208, threadinfo ffff8800229e8000, task ffff88003d5416b0) [ 489.300005] Stack: [ 489.300005] ffff8800229e9910 ffffffffa0317a5a ffff000100000038 ffff8800229e98f0 [ 489.300005] <0> ffff000100010038 ffff8800229e98e0 0000000000000001 0000000000000002 [ 489.300005] <0> ffff8800229e0009 0000000000000000 ffff8800229e9920 ffff880022f3b000 [ 489.300005] Call Trace: [ 489.300005] [<ffffffffa0317a5a>] ? intel_sdvo_read_byte+0x6a/0xc0 [i915] [ 489.300005] [<ffffffffa031161c>] intel_crtc_dpms+0xb0c/0xef0 [i915] [ 489.300005] [<ffffffffa0317cff>] ? intel_sdvo_set_active_outputs+0x2f/0x40 [i915] [ 489.300005] [<ffffffffa031baab>] ? 
intel_tv_mode_find+0x2b/0x50 [i915] [ 489.300005] [<ffffffffa030ee52>] intel_crtc_prepare+0x12/0x20 [i915] [ 489.300005] [<ffffffffa02dbff2>] drm_crtc_helper_set_mode+0x272/0x3d0 [drm] [ 489.300005] [<ffffffffa03138c6>] intel_get_load_detect_pipe+0x116/0x160 [i915] [ 489.300005] [<ffffffffa031cede>] intel_tv_detect+0x7e/0x100 [i915] [ 489.300005] [<ffffffffa02dc273>] drm_helper_probe_single_connector_modes+0x93/0x2b0 [drm] [ 489.300005] [<ffffffffa02dc4d6>] drm_helper_probe_connector_modes+0x46/0x80 [drm] [ 489.300005] [<ffffffffa02dd2f8>] drm_helper_initial_config+0x28/0xc0 [drm] [ 489.300005] [<ffffffffa0301b78>] i915_driver_load+0xc68/0xd70 [i915] [ 489.300005] [<ffffffffa02d38b7>] drm_get_dev+0x147/0x2a0 [drm] [ 489.300005] [<ffffffff8106c2d0>] ? do_work_for_cpu+0x0/0x30 [ 489.300005] [<ffffffffa0320bfa>] i915_pci_probe+0x10/0xd0 [i915] [ 489.300005] [<ffffffff81280882>] local_pci_probe+0x12/0x20 [ 489.300005] [<ffffffff8106c2e3>] do_work_for_cpu+0x13/0x30 [ 489.300005] [<ffffffff81070a26>] kthread+0x96/0xa0 [ 489.300005] [<ffffffff8101308a>] child_rip+0xa/0x20 [ 489.300005] [<ffffffff81070990>] ? kthread+0x0/0xa0 [ 489.300005] [<ffffffff81013080>] ? child_rip+0x0/0x20 [ 489.300005] Code: c2 20 1a 32 a0 48 c7 c6 18 45 32 a0 bf 01 00 00 00 31 c0 e8 c4 40 fc ff 4c 63 6b 74 4d 89 e9 48 8b 43 20 41 83 c7 01 44 8b 53 78 <8b> 40 38 8d 48 07 85 c0 0f 49 c8 48 8b 43 08 c1 f9 03 48 8d 58 [ 489.300005] RIP [<ffffffffa030f1af>] intel_update_watermarks+0xcf/0xd40 [i915] [ 489.300005] RSP <ffff8800229e98b0> [ 489.300005] CR2: 0000000000000038 [ 489.308329] ---[ end trace 26bde7aeab46e24b ]--- jbarnes| pitti: just wondering if you can gdb your i915.o and do a "list *intel_update_watermarks+0xcf" Seems I need to build the module with debugging or so: (gdb) list *intel_update_watermarks+0xcf No symbol table is loaded. Use the "file" command. 
Sorry, this kernel debugging is all new to me :/ I now built the module with "CONFIG_DEBUG_INFO=1 make -C /usr/src/linux-headers-2.6.31-1-generic/ M=`pwd` modules", so they have debug info now and gdb works. But I guess due to the rebuild the offsets were all scrambled, so I need to get the backtrace again. Stay tuned.. So apparently the offset is even stable across rebuilds. I captured the trace again, and it looks exactly like the previous trace, so I'm not copying that again. (gdb) list *intel_update_watermarks+0xcf 0x101af is in intel_update_watermarks (/home/martin/ubuntu/kernel/linux-2.6.31/drivers/gpu/drm/i915/intel_display.c:1918). 1913 intel_crtc->pipe, crtc->mode.clock); 1914 planeb_clock = crtc->mode.clock; 1915 } 1916 sr_hdisplay = crtc->mode.hdisplay; 1917 sr_clock = crtc->mode.clock; 1918 pixel_size = crtc->fb->bits_per_pixel / 8; 1919 } 1920 } 1921 1922 /* Single pipe configs can enable self refresh */ So I guess it crashes because crtc->fb is NULL, since fbcon is not loaded yet? BTW, this happens whether or not 'fbcon' gets loaded before. Also confirmed when applying the patch to 2.6.31rc2. On Mon, 6 Jul 2009 23:03:19 -0700 (PDT) bugzilla-daemon@freedesktop.org wrote: > --- Comment #91 from Martin Pitt <martin.pitt@ubuntu.com> 2009-07-06 > 23:03:18 PST --- So apparently the offset is even stable across > rebuilds. I captured the trace again, and it looks exactly like the > previous trace, so I'm not copying that again. > > (gdb) list *intel_update_watermarks+0xcf > 0x101af is in intel_update_watermarks > (/home/martin/ubuntu/kernel/linux-2.6.31/drivers/gpu/drm/i915/intel_display.c:1918). 
> 1913 intel_crtc->pipe, > crtc->mode.clock); > 1914 planeb_clock = > crtc->mode.clock; 1915 } > 1916 sr_hdisplay = crtc->mode.hdisplay; > 1917 sr_clock = crtc->mode.clock; > 1918 pixel_size = > crtc->fb->bits_per_pixel / 8; 1919 } > 1920 } > 1921 > 1922 /* Single pipe configs can enable self refresh */ > > So I guess it crashes because crtc->fb is NULL, since fbcon is not > loaded yet? Ah yes, that helps a lot, thanks. I'll fix that up. Created attachment 27540 [details] [review] fix up FIFO programming The stuff that went upstream falls into the "how did that ever work" category. We were just getting lucky that the calculations always resulted in the most aggressive FIFO programming. This corrects that and should also fix your hang. Re-opening this as the FIFO master bug. *** Bug 18702 has been marked as a duplicate of this bug. *** *** Bug 18491 has been marked as a duplicate of this bug. *** Does that patch go on top of the "most recent, KMS version of the patch" (https://bugs.freedesktop.org/attachment.cgi?id=26930) or does it replace it? I suppose the latter, since the new one doesn't touch crtc->fb at all, but it looks very different from the older one. Thanks! Martin It sits on top of current drm-intel-next bits. Created attachment 27575 [details] [review] more fixes for FIFO programming I tested on my 855 machine and found some bugs in that configuration. So I cleaned up the code a little more and fixed things up. This one applies on top of the drm-intel-next branch. For the record, I get a warning after applying the patch to drm-intel-next: /home/martin/ubuntu/kernel/drm-intel-next/i915/intel_display.c: In function ‘intel_find_pll_g4x_dp’: /home/martin/ubuntu/kernel/drm-intel-next/i915/intel_display.c:834: warning: ‘clock.vco’ is used uninitialized in this function Will test now. Applied on top of current intel-drm-next, so far no noticeable difference (in other words, everything still works just fine). 
I'll use that driver for a few days now, will report back if anything regresses. Hey Jesse, sorry I haven't been able to try the patch that you sent me yet. I did real quick install the newest version of the video-intel driver, which on Arch is 2.7.99.901-3. This is on the 2.6.30 kernel (i686). It still exhibits the same behavior (flickering after resume from suspend to ram), but the frequency of the flicker is substantially reduced....it's actually usable now, with just the occasional flicker. Better performance than the vesa driver!! I'll still attempt the patch at some point when I get a chance. Send me a new one if this info changes anything. Scott Sorry guys....I have to retract my previous post after using intel-video-newest for a couple of hours. Worked fine with normal browsing, and program open/closing, but as soon as a non-flash video (avi) played, the flicker went back to making it unusable (well, highly unpleasant at least) for the duration of the movie. Flash video doesn't seem to trigger the flicker, except periodically. Scott The last patch I attached here is a kernel patch; it should make things better for you if you've got a KMS enabled configuration. Is there any way for you to try that, Scott? (In reply to comment #105) > The last patch I attached here is a kernel patch; it should make things better > for you if you've got a KMS enabled configuration. Is there any way for you to > try that, Scott? > Tried with the kernel source from the Arch repos and got: patching file drivers/gpu/drm/i915/i915_reg.h Hunk #1 FAILED at 1618. 1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_reg.h.rej patching file drivers/gpu/drm/i915/intel_display.c Hunk #1 FAILED at 1623. Hunk #2 FAILED at 1822. Hunk #3 FAILED at 1869. Hunk #4 FAILED at 2022. 4 out of 4 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_display.c.rej I did make sure this patch was applied before the standard arch patches. 
Can you send the link for the other kernel source you had me use last time? Thanks! Scott Fix has been pushed to drm-intel-next, that's probably the easiest way to get it now: author Jesse Barnes <jbarnes@virtuousgeek.org> commit dff33cfcefa31c30b72c57f44586754ea9e8f3e2 drm/i915: FIFO watermark calculation fixes Ok, got and compiled the kernel from git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git. uname -a = 2.6.31-rc2-drm-intel-26127-gdff33cf #1 SMP PREEMPT Thu Jul 16 20:23:01 PDT 2009 i686 Intel(R) Pentium(R) 4 CPU 1.80GHz GenuineIntel GNU/Linux xf86-video-intel-newest 2.7.99.902-1 : X.org Intel i810\/i830\/i915\/945G\/G965+ video drivers (2.8.0 RC2). Enabled KMS. Same flicker behavior following suspend to RAM, possibly even worse than with the stock kernel and no KMS. Darn it, I was hoping we had this solved! Well, let me know what other information you need from me. I can't remember where to find the source for the intel_reg_dump program you had me use several months ago, if you need that. Thanks! Scott The bug that keeps on giving. Please check this one out; Eric found the same thing for his high res configs: http://lists.freedesktop.org/archives/intel-gfx/2009-July/003471.html Jesse, no change with that patch. Still horrible flickering of the whole screen after resuming from suspend to RAM. What's next? :) Scott Can you attach your kernel log after you've loaded drm with debug=1? (Note, I'm assuming you're using KMS here.) Boot was at 09:18, I suspended and resumed a few minutes later. Debug sure fills up the log quick! Sorry its so big....it was too big to post here so here's the link. Booted with drm.debug=1 and i915.modeset=1. Definitely have KMS working, because the switching between virtual terminals is so fast. Cool! http://scottandchrystie.homeip.net/kernel.log.gz Scott Ah thanks, that helps a lot. What chipset do you have? I should be able to give you a fix pretty quickly... 
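For anyone reading the watermark values that drm debug=1 prints, the sanity check discussed earlier in this bug (check the computed watermark against <= 0, not the entries value) can be sketched roughly as below. This is a minimal standalone illustration, not the driver code: the function names are hypothetical, and the guard of 2 entries is an assumption, picked only because a FIFO size of 28 minus 42 entries minus 2 matches the -16 logged earlier.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a driver function): watermark before any
 * sanity check. The guard value is an assumption of this sketch.
 */
static int sketch_raw_watermark(int fifo_size, int entries_required, int guard)
{
	assert(fifo_size >= 0 && entries_required >= 0 && guard >= 0);
	return fifo_size - (entries_required + guard);
}

/*
 * Same calculation with the check applied to the result rather than to
 * the entries value: anything <= 0 falls back to 1, the most aggressive
 * watermark seen in the logs in this bug.
 */
static int sketch_clamped_watermark(int fifo_size, int entries_required, int guard)
{
	int wm = sketch_raw_watermark(fifo_size, entries_required, guard);

	if (wm <= 0)
		wm = 1;
	return wm;
}
```

With the logged pipe A values, sketch_raw_watermark(28, 42, 2) yields -16 and sketch_clamped_watermark(28, 42, 2) yields 1. The sketch is not claimed to reproduce the B, C or SR values from the logs.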
Jesse: Graphics device from lspci --
00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
Scott

Hm, I was hoping it was something simple like I'd just read the 845 docs incorrectly, but afaict things are actually correct for that case. But the plane A FIFO allocation does look suspiciously high; this patch assumes 845G actually measures FIFO entries in DSPARB as 16 byte values rather than 64, so it might help. I'll have to check some more docs before I know for sure though.
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1844,6 +1844,9 @@ static int intel_get_fifo_size(struct drm_device *dev, int
 			size = ((dsparb >> DSPARB_BEND_SHIFT) & 0x1ff) -
 				(dsparb & 0x1ff);
 		size >>= 1; /* Convert to cachelines */
+	} else if (IS_845G(dev)){
+		size = dsparb & 0x7f;
+		size >>= 2; /* Convert to cachelines */
 	} else {
 		size = dsparb & 0x7f;
 		size >>= 1; /* Convert to cachelines */

That didn't work, Jesse. I just get a black screen when it switches to the framebuffer on boot. The machine is still functioning because I can ssh in, but no display. Let me know if you need the logs for this. Scott

Ah I was looking at the wrong code path. In the 830/845 case I think I might be clobbering some important bits, this should preserve them and hopefully set the right values.
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1943,14 +1943,16 @@ static void i830_update_wm(struct drm_device *dev, int planea_clock,
 			   int pixel_size)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	uint32_t fwater_lo = I915_READ(FW_BLC) & MM_FIFO_WATERMARK;
+	uint32_t fwater_lo = I915_READ(FW_BLC) & ~0xfff;
 	int planea_wm;

 	i830_wm_info.fifo_size = intel_get_fifo_size(dev, 0);

 	planea_wm = intel_calculate_wm(planea_clock, &i830_wm_info,
 				       pixel_size, latency_ns);
-	fwater_lo = fwater_lo | planea_wm;
+	fwater_lo |= (3<<8) | planea_wm;
+
+	DRM_DEBUG("Setting FIFO watermarks - A: %d\n", planea_wm);

 	I915_WRITE(FW_BLC, fwater_lo);
 }

Jesse, I think this is an improvement. Still get occasional flickers with normal browsing and window movements following suspend. DVD and other movie playback still triggers strong flickering, although it seems somewhat better than the last patch. Flash doesn't seem to trigger the flicker, even running full screen. Here's the link for the kernel log (drm.debug=1). http://scottandchrystie.homeip.net/kernel.log.gz

Just so you know, the last two patches you posted have been "malformed patches" right around line 4. I've had to manually patch to get it working :) Not sure if it's a cut and paste artifact, but the other ones you posted worked fine as a patch file. Thanks! Scott

OK, so we're slowly improving. :) What if you apply both patches? I still can't find docs for the 845G FIFO and cache line sizes, so that could still be an issue.

Awesome! That did it!! Not a flicker to be seen so far! Nice work :) Let me know if you need anything else and when the patches actually make it into the kernel. Thanks very much! Scott
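As a reading aid for the two patches that fixed the 845G case, the bit arithmetic they perform can be tried outside the kernel with the sketch below. It only mirrors the expressions visible in the diffs (`& ~0xfff`, `(3<<8) | planea_wm`, `dsparb & 0x7f` shifted by 1 or 2). The function names are hypothetical, plain integers stand in for the I915_READ/I915_WRITE register accessors, and the assertion that the watermark stays below bit 8 is an assumption of this sketch, not something the thread or the hardware documentation confirms.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the FW_BLC read-modify-write in the second
 * patch: keep every bit above the low 12, then set (3<<8) and the
 * plane A watermark.
 */
static uint32_t sketch_fw_blc_update(uint32_t old_reg, uint32_t planea_wm)
{
	uint32_t reg;

	/* Assumption: the watermark must not overlap the (3<<8) bits. */
	assert(planea_wm < (1u << 8));

	reg = old_reg & ~0xfffu;
	reg |= (3u << 8) | planea_wm;
	return reg;
}

/*
 * Hypothetical stand-in for the FIFO size conversion in the first patch:
 * shift 1 is the existing default path, shift 2 is the 845G experiment.
 */
static int sketch_fifo_cachelines(uint32_t dsparb, int shift)
{
	return (int)(dsparb & 0x7f) >> shift;
}
```

For example, sketch_fw_blc_update(0xABCDE123, 1) returns 0xABCDE301: the upper 20 bits survive and the low 12 are rewritten. For a DSPARB low field of 52 entries, the default conversion gives 26 cachelines and the 845G variant gives 13.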