mplayer and ogle show badly broken video on mpeg2, mpeg4 (avi), realmedia
at least, when xvideo is used.
Some chunks of the video are flickering, and the video freezes for a few
seconds from time to time.
This is very specific to this Radeon model. This has been shown to be
reproducible by both Bernd Ahlers (email@example.com) and Robert Nagy
(firstname.lastname@example.org) on three distinct laptops: two thinkpads, and a
This is a regression from the pre-6.9.0 radeon driver. Replacing the driver
modules with the old driver, compiled with the newer XOrg framework works just
fine, without changing anything else.
You can ask Matthieu Herb (email@example.com) for specific details about X
integration in OpenBSD.
this is probably related to the radeon gatos code that got merged.
can you try with cvs HEAD or the ati-1.0-branch? I think Roland fixed this.
(In reply to comment #2)
> can you try with cvs HEAD or the ati-1.0-branch? I think Roland fixed this.
Tried it with matthieu's help, no change. Still broken in the exact same way.
Can you try isolating the change that caused the regression with git bisect?
Did the bisect generate any results?
I finally got access to Marc's hardware, and after some manual CVS dissect,
it's this commit which broke things: 5be4bf9000bdf58584a10a6b8e285d0f173304fa
I also tried current drivers from git both master and agd5f-superpatch still
mis-behave on this hardware (Mobility M6).
Created attachment 7109 [details] [review]
crude patch disabling the erratas all together
The attached crude patch effectively fixes the problem. So now, the question
is, how to disable those errata only on Mobility M6 chipsets ?
Could this be fixed before 7.2?
(In reply to comment #8)
> Created an attachment (id=7109) 
> crude patch disabling the erratas all together
> The attached crude patch effectively fixes the problem. So now, the question
> is, how to disable those errata only on Mobility M6 chipsets ?
The problem is that the errata are need on M6 chips to avoid lockups or clock
problems IIRC. Can you try adding a usleep of various periods after the OUTPLL
calls in radeon_video.c? If that doesn't help you might try forcing the overlay
clock on as we do in for the IGP chips in RADEONAllocAdaptor().
*** Bug 8938 has been marked as a duplicate of this bug. ***
So, is the slow video only caused by that OUTPLL in radeonDisplayVideo? Could
you try that out by not reverting the patch but just commenting it out there
(the other OUTPLL calls in radeon_video.c shouldn't be problematic)? If so,
there isn't really a reason we do this every video frame. Could just recalc the
ecp div and only submit to hw if it actually changes - just need to be careful
the value still gets fixed up after console switching etc.
Also, what framerate do you get with the test program (from #8938) with that
OUTPLL removed (or with the patch disabling the pll patches, if it makes any
difference)? I'm just asking because I'm wondering what's up with that 50fps
limitation - the driver just waits 5ms on a OUTPLL, which should still be enough
(In reply to comment #13)
> So, is the slow video only caused by that OUTPLL in radeonDisplayVideo? Could
> you try that out by not reverting the patch but just commenting it out there
> (the other OUTPLL calls in radeon_video.c shouldn't be problematic)? If so,
> there isn't really a reason we do this every video frame. Could just recalc the
> ecp div and only submit to hw if it actually changes - just need to be careful
> the value still gets fixed up after console switching etc.
I don't think it's the OUTPLL itself per se, but the additional stuff introduced
by the OUTPLL errata hander.
I'll do your test later tonight, but until then some other insight:
The test program now gives around 300 fps (previously 50).
Also, the bug isn't gone now, just harder to trigger. So I suspect that the real
bug is that double buffering is slightly broken, and these bugs are just
instances of running into that problem.
I don't know how double buffering on the radeon works, but I assume that the hw
has two buffers (A and B), and switches between them on vsync blank. But what
makes sure that if A is displayed and we're writing into B, that the switch
doesn't occur until we've written an entire frame to B?
(In reply to comment #16)
> I don't know how double buffering on the radeon works, but I assume that the hw
> has two buffers (A and B), and switches between them on vsync blank. But what
> makes sure that if A is displayed and we're writing into B, that the switch
> doesn't occur until we've written an entire frame to B?
The driver only initiates the flip after the upload of the new frame is complete
(with DMAForXv/DRI disabled, anyway...). One thing it doesn't do yet though is
wait for the previous flip to complete before uploading a new frame, which could
result in the upload overwriting the frame that's being displayed. Patch #7940
from bug 8938 was an attempt at fixing that, does it make any difference (for
better or worse) now?
It is my intention to test that patch, but I haven't had the time yet. Hopefully
Unfortunately, the patch had no real effect this time either... :/
Removing just that call to OUTPLL is insufficient, so there is something more to it.
I've been trying to get my test program to provoke the bug, but no luck so far.
It seems the movies that trigger the problem are also those heavily compressed,
and hence CPU hungry. So perhaps there is some race in the driver where you
check some register, do something else, then act on that register. And in the
cases where X doesn't get enough CPU time, that action is now wrong because the
register has changed.
I can't really decrypt the register jumble going on in these routines so I'm
having trouble checking this myself. :)
I think I've managed to prove that this is a not a problem with the hardware
flipping buffers at least (provided my test is correct).
What I've done is add a large usleep() (or several in a for loop since something
seems to abort the sleeps) in RADEONPutImage() and/or RADEONDisplayVideo(). The
behaviour I get is consistent with the initial problem, but I can now produce it
on both cards.
My big discovery here is that I can see that the tear stays present for an
entire frame. So it's not a matter of sync problem between RAMDAC and buffer
shuffling, but that the buffer is indeed incorrect for some reason.
This gives me almost identical behaviour on both cards (including the
diff --git a/src/radeon_driver.c b/src/radeon_driver.c
diff --git a/src/radeon_video.c b/src/radeon_video.c
index 5138ce1..ffc6429 100644
@@ -2810,6 +2810,7 @@ RADEONPutImage(
int top, left, npixels, nlines, bpp;
+ int i;
* s2offset, s3offset - byte offsets into U and V plane of the
@@ -2956,6 +2957,9 @@ RADEONPutImage(
+ for (i = 0;i < 2;i++)
/* update cliplist */
if(!REGION_EQUAL(pScrn->pScreen, &pPriv->clip, clipBoxes))
(In reply to comment #20)
> Removing just that call to OUTPLL is insufficient, so there is something more
Are you sure you got the right OUTPLL? I've verified that no other PLL reads or
writes occur other than there when playing a video.
I've also tried enabling the chip errata on my chip rv250. Guess what happens -
nothing. Well actually something does happen, cpu time will be LOWER. The likely
reason being that X won't consume cpu cycles during the usleep, otherwise it
probably would have had to wait for RADEONWaitForFIFO anyway since the gpu is
still busy uploading the data (this effect probably could only be seen with dma
And I was wrong assuming you could still get 200fps even with those pll erratas.
INPLL causes this wait just as well, so the max would be 100fps.
I commented out this thing:
(INPLL(pScrn, RADEON_VCLK_ECP_CNTL) & 0xfffffCff) | (ecp_div << 8));
(In reply to comment #24)
> I commented out this thing:
> OUTPLL(pScrn, RADEON_VCLK_ECP_CNTL,
> (INPLL(pScrn, RADEON_VCLK_ECP_CNTL) & 0xfffffCff) | (ecp_div << 8));
Well ok then (I just asked because it's easy to get the wrong one as the same
code can be found in another function...). I've no explanation why it wouldn't
work even without that OUTPLL/INPLL pair.
How about the delay patch? Does it seem reasonable? Since it is right after the
data copy, I assume it should be something the driver can handle (but it isn't).
I actually think now the behaviour is correct. At least the "tearing" I got when
making the driver wait wasn't tearing, but simply the result of the asynchronous
X protocol. What happens if the driver is too slow is that somewhere those
XvPutImage requests get queued up. Since mplayer only uses 2 different buffers,
and the driver just does all requests in-order, it will end up copying buffers
which no longer exist (i.e. have been replaced with newer data), if it gets
enough cpu time (that is, if it had 20 requests queued up, and would now get
enough cpu time it would copy the newest 2 buffers for 10 times till the queue
is empty), which will obviously look broken as the image gets back in time...
So mplayer assumes that by the time it wants to fill that buffer again, X has
finished copying it. That sounds very broken, and not really consistent with
If mplayer didn't pay attention to what X is doing with the buffer, then it
wouldn't be able to detect and whine about "your machine is too slow to play this".
Right, in libvo/vo_xv.c we can find this:
static void flip_page(void)
put_xvimage( xvimage[current_buf] );
/* remember the currently visible buffer */
visible_buf = current_buf;
if (num_buffers > 1)
vo_directrendering ? 0 : ((current_buf + 1) % num_buffers);
(In reply to comment #28)
> So mplayer assumes that by the time it wants to fill that buffer again, X has
> finished copying it. That sounds very broken, and not really consistent with
> observed behaviour.
> If mplayer didn't pay attention to what X is doing with the buffer, then it
> wouldn't be able to detect and whine about "your machine is too slow to play
> Right, in libvo/vo_xv.c we can find this:
> static void flip_page(void)
> put_xvimage( xvimage[current_buf] );
> /* remember the currently visible buffer */
> visible_buf = current_buf;
> if (num_buffers > 1)
> current_buf =
> vo_directrendering ? 0 : ((current_buf + 1) % num_buffers);
> } else
> XSync(mDisplay, False);
So I can't see where it would wait or notice X hasn't finished the last request.
XFlush does nothing, and XSync isn't called. It will never show the "too slow"
message unless mplayer itself does not get enough cpu time to decode a picture
or the event queue would be full (I think - not sure there what happens in that
case). Yes this looks a bit broken, but under the assumption that the decoding
time of mplayer is larger than what the driver needs, and the schedular is at
least somewhat fair, it's only going to be a problem if the box is too slow
anyway, so who cares.
(In reply to comment #23)
> The likely reason being that X won't consume cpu cycles during the usleep,
> otherwise it probably would have had to wait for RADEONWaitForFIFO anyway
> since the gpu is still busy uploading the data
BTW, most if not all of these RADEONWaitForFIFO() calls are probably bogus. I
doubt the overlay registers go through the FIFO, but even if they do,
RADEONWaitForFIFO() is useless while the CP is busy, as it would likely fill up
any available FIFO slots anyway.
> And I was wrong assuming you could still get 200fps even with those pll erratas.
> INPLL causes this wait just as well, so the max would be 100fps.
That's assuming the delays are accurate... which I suspect is rather unlikely,
although in both directions due to scheduling latency and signals.
(In reply to comment #29)
> So I can't see where it would wait or notice X hasn't finished the last request.
> XFlush does nothing, and XSync isn't called. It will never show the "too slow"
> message unless mplayer itself does not get enough cpu time to decode a picture
> or the event queue would be full (I think - not sure there what happens in that
Actually, XFlush is only for double buffering, which is disabled by default. So
during my tests it has been calling XSync.
(In reply to comment #31)
> Actually, XFlush is only for double buffering, which is disabled by default. So
> during my tests it has been calling XSync.
Umm, mplayer certainly does double buffering by default. The manpage explicitly
mentions -nodouble is only meant for debugging...
Seems you're right. I did a grep for vo_doublebuffering and got "int
vo_doublevuffering=0;". What I failed to notice was that it was in the file
mencoder.c. In libvo/video_out.c we can find "int vo_doublebuffering = 1;".
So if we assume the problem is entirely mplayer at this point, running mplayer
with -nodouble should result in the tearing going away, right? The driver should
still be doing its double buffering (based on XV_DOUBLE_BUFFER).
(In reply to comment #33)
> So if we assume the problem is entirely mplayer at this point, running mplayer
> with -nodouble should result in the tearing going away, right? The driver should
> still be doing its double buffering (based on XV_DOUBLE_BUFFER).
should work, if mplayer leaves the XV_DOUBLE_BUFFER alone.
Ok, I have now confirmed that there is no problem with tearing if you run
mplayer with -nodouble. That seems to confirm that there is no inherent tearing
problem with the radeon driver's double buffering.
So I guess this bug is really that the errata fix slows down video to the point
that many movies fail to render at a reasonable speed. But I don't think we have
a fix that everyone is satisfied with?
Out of curiosity, is there a way to do mplayer's double buffering properly?
I.e., can you check if the pixmap you're about to overwrite has been copied yet?
(In reply to comment #35)
> So I guess this bug is really that the errata fix slows down video to the point
> that many movies fail to render at a reasonable speed.
> But I don't think we have
> a fix that everyone is satisfied with?
No, as backing out the erratas is not an option, and since apparently backing
out only the outpll/inpll in radeondisplayvideo doesn't do anything neither (I
think it would be a good idea to do that anyway though, of course with making
sure we still get the correct ecp div) I'm at a loss what else should be done
(or even what's causing the slowdown).
> Out of curiosity, is there a way to do mplayer's double buffering properly?
> I.e., can you check if the pixmap you're about to overwrite has been copied yet?
It should be possible to do this with XSync, just don't call it right after the
XVPutImage (pointless doing double buffering...), and probably not right before
uploading neither (due to the possible high latency it might have for instance
over network) but somewhere in the middle of decoding a new frame (but do it
from a different thread so you don't need to wait in the decoding thread?), so
that in the normal case X should already be finished and no unneeded waiting
happens. Or maybe use XPending or friends. Dunno.
I've spent the night fiddling back and forth with the driver, and I have some
good news. Removing the OUTPLL has the same results as removing the entire errata.
My earlier report was based on too few samples, so it wasn't the complete truth.
I've been running both versions during several tests here, and the results are
the same when you even things out. The difference is just a bit of dumb luck
whether X or mplayer is the one getting shafted CPU wise.
As for a solution, as long as I run X at nice -20, the tearing goes away. So if
you can kill of that OUTPLL then I'll be a happy camper. :)
That makes sense... look at the implementation of OUTPLL.. It has specific
workarounds for chip errata, which on the M6, afaik, ends up doing a
usleep(5000); on every access. That might definitely cause performance problems
I've commited a fix to avoid the INPLL/OUTPLL pair causing a 10ms delay, though
I'm not sure it fixes all problems. After all, for 24fps video this would "only"
have caused a 240ms additional max delay (in case of dri/dmaforxv less because
the chip is busy anyway there).
Great, I'll give it a try.
As for the delay, it actually seems reasonable. At 300 fps (OUTPLL removed), it
takes 3 ms to push an image to the card. At 50 fps (with OUTPLL), it is 20 ms
per frame, or about 8 ms extra delay per sleep.
Considering that we tell the kernel to sleep for _at least_ 5 ms, and the kernel
has a HZ of 250, it must sleep for at least two ticks, which is precisely 8 ms.
Terribly sorry, I completely forgot about this issue.
I've tested the patch and I can verify that it works nicely. Feel free to close
Reported to be fixed, closing.