mplayer and ogle show badly broken video on mpeg2, mpeg4 (avi), realmedia at least, when xvideo is used. Some chunks of the video are flickering, and the video freezes for a few seconds from time to time. This is very specific to this Radeon model. This has been shown to be reproducible by both Bernd Ahlers (bernd@openbsd.org) and Robert Nagy (robert@openbsd.org) on three distinct laptops: two thinkpads, and a Dell model. This is a regression from the pre-6.9.0 radeon driver. Replacing the driver modules with the old driver, compiled with the newer XOrg framework works just fine, without changing anything else. You can ask Matthieu Herb (matthieu@openbsd.org) for specific details about X integration in OpenBSD.
this is probably related to the radeon gatos code that got merged.
can you try with cvs HEAD or the ati-1.0-branch? I think Roland fixed this.
(In reply to comment #2) > can you try with cvs HEAD or the ati-1.0-branch? I think Roland fixed this. Tried it with matthieu's help, no change. Still broken in the exact same way.
Can you try isolating the change that caused the regression with git bisect?
Did the bisect generate any results?
I finally got access to Marc's hardware, and after some manual CVS dissect, it's this commit which broke things: 5be4bf9000bdf58584a10a6b8e285d0f173304fa
I also tried current drivers from git both master and agd5f-superpatch still mis-behave on this hardware (Mobility M6).
Created attachment 7109 [details] [review] crude patch disabling the erratas all together The attached crude patch effectively fixes the problem. So now, the question is, how to disable those errata only on Mobility M6 chipsets ?
Could this be fixed before 7.2?
(In reply to comment #8) > Created an attachment (id=7109) [edit] > crude patch disabling the erratas all together > > The attached crude patch effectively fixes the problem. So now, the question > is, how to disable those errata only on Mobility M6 chipsets ? The problem is that the errata are need on M6 chips to avoid lockups or clock problems IIRC. Can you try adding a usleep of various periods after the OUTPLL calls in radeon_video.c? If that doesn't help you might try forcing the overlay clock on as we do in for the IGP chips in RADEONAllocAdaptor().
marc/matthieu: ping?
*** Bug 8938 has been marked as a duplicate of this bug. ***
So, is the slow video only caused by that OUTPLL in radeonDisplayVideo? Could you try that out by not reverting the patch but just commenting it out there (the other OUTPLL calls in radeon_video.c shouldn't be problematic)? If so, there isn't really a reason we do this every video frame. Could just recalc the ecp div and only submit to hw if it actually changes - just need to be careful the value still gets fixed up after console switching etc.
Also, what framerate do you get with the test program (from #8938) with that OUTPLL removed (or with the patch disabling the pll patches, if it makes any difference)? I'm just asking because I'm wondering what's up with that 50fps limitation - the driver just waits 5ms on a OUTPLL, which should still be enough for 200fps...
(In reply to comment #13) > So, is the slow video only caused by that OUTPLL in radeonDisplayVideo? Could > you try that out by not reverting the patch but just commenting it out there > (the other OUTPLL calls in radeon_video.c shouldn't be problematic)? If so, > there isn't really a reason we do this every video frame. Could just recalc the > ecp div and only submit to hw if it actually changes - just need to be careful > the value still gets fixed up after console switching etc. > I don't think it's the OUTPLL itself per se, but the additional stuff introduced by the OUTPLL errata hander.
I'll do your test later tonight, but until then some other insight: The test program now gives around 300 fps (previously 50). Also, the bug isn't gone now, just harder to trigger. So I suspect that the real bug is that double buffering is slightly broken, and these bugs are just instances of running into that problem. I don't know how double buffering on the radeon works, but I assume that the hw has two buffers (A and B), and switches between them on vsync blank. But what makes sure that if A is displayed and we're writing into B, that the switch doesn't occur until we've written an entire frame to B?
(In reply to comment #16) > > I don't know how double buffering on the radeon works, but I assume that the hw > has two buffers (A and B), and switches between them on vsync blank. But what > makes sure that if A is displayed and we're writing into B, that the switch > doesn't occur until we've written an entire frame to B? The driver only initiates the flip after the upload of the new frame is complete (with DMAForXv/DRI disabled, anyway...). One thing it doesn't do yet though is wait for the previous flip to complete before uploading a new frame, which could result in the upload overwriting the frame that's being displayed. Patch #7940 from bug 8938 was an attempt at fixing that, does it make any difference (for better or worse) now?
It is my intention to test that patch, but I haven't had the time yet. Hopefully this evening.
Unfortunately, the patch had no real effect this time either... :/
Removing just that call to OUTPLL is insufficient, so there is something more to it. I've been trying to get my test program to provoke the bug, but no luck so far. It seems the movies that trigger the problem are also those heavily compressed, and hence CPU hungry. So perhaps there is some race in the driver where you check some register, do something else, then act on that register. And in the cases where X doesn't get enough CPU time, that action is now wrong because the register has changed. I can't really decrypt the register jumble going on in these routines so I'm having trouble checking this myself. :)
I think I've managed to prove that this is a not a problem with the hardware flipping buffers at least (provided my test is correct). What I've done is add a large usleep() (or several in a for loop since something seems to abort the sleeps) in RADEONPutImage() and/or RADEONDisplayVideo(). The behaviour I get is consistent with the initial problem, but I can now produce it on both cards. My big discovery here is that I can see that the tear stays present for an entire frame. So it's not a matter of sync problem between RAMDAC and buffer shuffling, but that the buffer is indeed incorrect for some reason.
This gives me almost identical behaviour on both cards (including the intermittent "hang"): diff --git a/src/radeon_driver.c b/src/radeon_driver.c diff --git a/src/radeon_video.c b/src/radeon_video.c index 5138ce1..ffc6429 100644 --- a/src/radeon_video.c +++ b/src/radeon_video.c @@ -2810,6 +2810,7 @@ RADEONPutImage( int top, left, npixels, nlines, bpp; BoxRec dstBox; CARD32 tmp; + int i; /* * s2offset, s3offset - byte offsets into U and V plane of the @@ -2956,6 +2957,9 @@ RADEONPutImage( break; } + for (i = 0;i < 2;i++) + usleep(10000); + /* update cliplist */ if(!REGION_EQUAL(pScrn->pScreen, &pPriv->clip, clipBoxes)) {
(In reply to comment #20) > Removing just that call to OUTPLL is insufficient, so there is something more to it. Are you sure you got the right OUTPLL? I've verified that no other PLL reads or writes occur other than there when playing a video. I've also tried enabling the chip errata on my chip rv250. Guess what happens - nothing. Well actually something does happen, cpu time will be LOWER. The likely reason being that X won't consume cpu cycles during the usleep, otherwise it probably would have had to wait for RADEONWaitForFIFO anyway since the gpu is still busy uploading the data (this effect probably could only be seen with dma for xv)... And I was wrong assuming you could still get 200fps even with those pll erratas. INPLL causes this wait just as well, so the max would be 100fps.
I commented out this thing: OUTPLL(pScrn, RADEON_VCLK_ECP_CNTL, (INPLL(pScrn, RADEON_VCLK_ECP_CNTL) & 0xfffffCff) | (ecp_div << 8));
(In reply to comment #24) > I commented out this thing: > > OUTPLL(pScrn, RADEON_VCLK_ECP_CNTL, > (INPLL(pScrn, RADEON_VCLK_ECP_CNTL) & 0xfffffCff) | (ecp_div << 8)); Well ok then (I just asked because it's easy to get the wrong one as the same code can be found in another function...). I've no explanation why it wouldn't work even without that OUTPLL/INPLL pair.
How about the delay patch? Does it seem reasonable? Since it is right after the data copy, I assume it should be something the driver can handle (but it isn't).
I actually think now the behaviour is correct. At least the "tearing" I got when making the driver wait wasn't tearing, but simply the result of the asynchronous X protocol. What happens if the driver is too slow is that somewhere those XvPutImage requests get queued up. Since mplayer only uses 2 different buffers, and the driver just does all requests in-order, it will end up copying buffers which no longer exist (i.e. have been replaced with newer data), if it gets enough cpu time (that is, if it had 20 requests queued up, and would now get enough cpu time it would copy the newest 2 buffers for 10 times till the queue is empty), which will obviously look broken as the image gets back in time...
So mplayer assumes that by the time it wants to fill that buffer again, X has finished copying it. That sounds very broken, and not really consistent with observed behaviour. If mplayer didn't pay attention to what X is doing with the buffer, then it wouldn't be able to detect and whine about "your machine is too slow to play this". Right, in libvo/vo_xv.c we can find this: static void flip_page(void) { put_xvimage( xvimage[current_buf] ); /* remember the currently visible buffer */ visible_buf = current_buf; if (num_buffers > 1) { current_buf = vo_directrendering ? 0 : ((current_buf + 1) % num_buffers); XFlush(mDisplay); } else XSync(mDisplay, False); return; }
(In reply to comment #28) > So mplayer assumes that by the time it wants to fill that buffer again, X has > finished copying it. That sounds very broken, and not really consistent with > observed behaviour. > > If mplayer didn't pay attention to what X is doing with the buffer, then it > wouldn't be able to detect and whine about "your machine is too slow to play this". > > Right, in libvo/vo_xv.c we can find this: > > static void flip_page(void) > { > put_xvimage( xvimage[current_buf] ); > > /* remember the currently visible buffer */ > visible_buf = current_buf; > > if (num_buffers > 1) > { > current_buf = > vo_directrendering ? 0 : ((current_buf + 1) % num_buffers); > XFlush(mDisplay); > } else > XSync(mDisplay, False); > return; > } So I can't see where it would wait or notice X hasn't finished the last request. XFlush does nothing, and XSync isn't called. It will never show the "too slow" message unless mplayer itself does not get enough cpu time to decode a picture or the event queue would be full (I think - not sure there what happens in that case). Yes this looks a bit broken, but under the assumption that the decoding time of mplayer is larger than what the driver needs, and the schedular is at least somewhat fair, it's only going to be a problem if the box is too slow anyway, so who cares.
(In reply to comment #23) > The likely reason being that X won't consume cpu cycles during the usleep, > otherwise it probably would have had to wait for RADEONWaitForFIFO anyway > since the gpu is still busy uploading the data BTW, most if not all of these RADEONWaitForFIFO() calls are probably bogus. I doubt the overlay registers go through the FIFO, but even if they do, RADEONWaitForFIFO() is useless while the CP is busy, as it would likely fill up any available FIFO slots anyway. > And I was wrong assuming you could still get 200fps even with those pll erratas. > INPLL causes this wait just as well, so the max would be 100fps. That's assuming the delays are accurate... which I suspect is rather unlikely, although in both directions due to scheduling latency and signals.
(In reply to comment #29) > So I can't see where it would wait or notice X hasn't finished the last request. > XFlush does nothing, and XSync isn't called. It will never show the "too slow" > message unless mplayer itself does not get enough cpu time to decode a picture > or the event queue would be full (I think - not sure there what happens in that > case). Actually, XFlush is only for double buffering, which is disabled by default. So during my tests it has been calling XSync.
(In reply to comment #31) > Actually, XFlush is only for double buffering, which is disabled by default. So > during my tests it has been calling XSync. Umm, mplayer certainly does double buffering by default. The manpage explicitly mentions -nodouble is only meant for debugging...
Seems you're right. I did a grep for vo_doublebuffering and got "int vo_doublevuffering=0;". What I failed to notice was that it was in the file mencoder.c. In libvo/video_out.c we can find "int vo_doublebuffering = 1;". So if we assume the problem is entirely mplayer at this point, running mplayer with -nodouble should result in the tearing going away, right? The driver should still be doing its double buffering (based on XV_DOUBLE_BUFFER).
(In reply to comment #33) > So if we assume the problem is entirely mplayer at this point, running mplayer > with -nodouble should result in the tearing going away, right? The driver should > still be doing its double buffering (based on XV_DOUBLE_BUFFER). should work, if mplayer leaves the XV_DOUBLE_BUFFER alone.
Ok, I have now confirmed that there is no problem with tearing if you run mplayer with -nodouble. That seems to confirm that there is no inherent tearing problem with the radeon driver's double buffering. So I guess this bug is really that the errata fix slows down video to the point that many movies fail to render at a reasonable speed. But I don't think we have a fix that everyone is satisfied with? Out of curiosity, is there a way to do mplayer's double buffering properly? I.e., can you check if the pixmap you're about to overwrite has been copied yet?
(In reply to comment #35) > So I guess this bug is really that the errata fix slows down video to the point > that many movies fail to render at a reasonable speed. Indeed. > But I don't think we have > a fix that everyone is satisfied with? No, as backing out the erratas is not an option, and since apparently backing out only the outpll/inpll in radeondisplayvideo doesn't do anything neither (I think it would be a good idea to do that anyway though, of course with making sure we still get the correct ecp div) I'm at a loss what else should be done (or even what's causing the slowdown). > Out of curiosity, is there a way to do mplayer's double buffering properly? > I.e., can you check if the pixmap you're about to overwrite has been copied yet? It should be possible to do this with XSync, just don't call it right after the XVPutImage (pointless doing double buffering...), and probably not right before uploading neither (due to the possible high latency it might have for instance over network) but somewhere in the middle of decoding a new frame (but do it from a different thread so you don't need to wait in the decoding thread?), so that in the normal case X should already be finished and no unneeded waiting happens. Or maybe use XPending or friends. Dunno.
I've spent the night fiddling back and forth with the driver, and I have some good news. Removing the OUTPLL has the same results as removing the entire errata. My earlier report was based on too few samples, so it wasn't the complete truth. I've been running both versions during several tests here, and the results are the same when you even things out. The difference is just a bit of dumb luck whether X or mplayer is the one getting shafted CPU wise. As for a solution, as long as I run X at nice -20, the tearing goes away. So if you can kill of that OUTPLL then I'll be a happy camper. :)
That makes sense... look at the implementation of OUTPLL.. It has specific workarounds for chip errata, which on the M6, afaik, ends up doing a usleep(5000); on every access. That might definitely cause performance problems for Xv.
I've commited a fix to avoid the INPLL/OUTPLL pair causing a 10ms delay, though I'm not sure it fixes all problems. After all, for 24fps video this would "only" have caused a 240ms additional max delay (in case of dri/dmaforxv less because the chip is busy anyway there).
Great, I'll give it a try. As for the delay, it actually seems reasonable. At 300 fps (OUTPLL removed), it takes 3 ms to push an image to the card. At 50 fps (with OUTPLL), it is 20 ms per frame, or about 8 ms extra delay per sleep. Considering that we tell the kernel to sleep for _at least_ 5 ms, and the kernel has a HZ of 250, it must sleep for at least two ticks, which is precisely 8 ms.
Terribly sorry, I completely forgot about this issue. I've tested the patch and I can verify that it works nicely. Feel free to close the bug.
Reported to be fixed, closing.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.