Summary: | Incorrect Pixmap pointer causing prepareaccess / finishaccess inbalance | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | Thomas Hellström <thomas> | ||||||||||||
Component: | Server/Acceleration/EXA | Assignee: | Xorg Project Team <xorg-team> | ||||||||||||
Status: | RESOLVED WORKSFORME | QA Contact: | Xorg Project Team <xorg-team> | ||||||||||||
Severity: | blocker | ||||||||||||||
Priority: | medium | CC: | madman2003, thomas | ||||||||||||
Version: | git | Keywords: | patch | ||||||||||||
Hardware: | Other | ||||||||||||||
OS: | All | ||||||||||||||
Whiteboard: | |||||||||||||||
i915 platform: | i915 features: | ||||||||||||||
Bug Depends on: | |||||||||||||||
Bug Blocks: | 20314 | ||||||||||||||
Attachments: |
|
Description
Thomas Hellström
2009-02-19 01:36:56 UTC
Created attachment 23112 [details] [review] Make exaPixmapIsOffscreen fix up the pixmap ptr This patch makes exaPixmapIsOffscreen fix up the pixmap pointer regardless whether it's NULL or not. This Fixes the problem for me, but I'm not sure this is the correct fix. > Apparently under some circumstances pPixmap->devPrivate.ptr is set to a
> non-NULL non-fb address
That sounds like the real bug, which should be tracked down.
I'm not sure we can use your patch even as a workaround, I think nouveau relies on its own ptr value being preserved.
Is this not broken yet in 1.5?
(In reply to comment #2) > > Apparently under some circumstances pPixmap->devPrivate.ptr is set to a > > non-NULL non-fb address > > That sounds like the real bug, which should be tracked down. > > I'm not sure we can use your patch even as a workaround, I think nouveau relies > on its own ptr value being preserved. Understood. What are the rules for the pPixmap->devPrivate.ptr value? When is it supposed to be NULL, and when is it supposed to be populated? I have some concerns, though, about ExaDoPrepareAccess being called in the migration code, as if the corresponding driver call fails, it will try to move out the pixmap recursively? > > Is this not broken yet in 1.5? > It's possible. We've had very limited testing on that version. Although it's possible to work around this issue in the driver it's pretty important that this get fixed before the 1.6 release. (In reply to comment #3) > What are the rules for the pPixmap->devPrivate.ptr value? When is it supposed > to be NULL, and when is it supposed to be populated? The basic idea is for it to be non-NULL only between PrepareAccess and FinishAccess, to catch paths which try to access the data behind our back. > I have some concerns, though, about ExaDoPrepareAccess being called in the > migration code, as if the corresponding driver call fails, it will try to move > out the pixmap recursively? The driver PrepareAccess hook may only fail if there's a DownloadFromScreen hook which can't fail. > > Is this not broken yet in 1.5? > > It's possible. We've had very limited testing on that version. It would be good to verify that, and if 1.5 isn't broken maybe try to bisect what broke it. (In reply to comment #4) > (In reply to comment #3) > > What are the rules for the pPixmap->devPrivate.ptr value? When is it supposed > > to be NULL, and when is it supposed to be populated? > > The basic idea is for it to be non-NULL only between PrepareAccess and > FinishAccess, to catch paths which try to access the data behind our back. > So In the Nouveau case, where the driver populates this pointer, when we're migrating a pixmap from system to VRAM, and the do_migrate code is relying on exaPixmapIsOffscreen() to return true, How is the driver tricked into returning true? (In reply to comment #5) > So In the Nouveau case, where the driver populates this pointer, when we're > migrating a pixmap from system to VRAM, and the do_migrate code is relying on > exaPixmapIsOffscreen() to return true, How is the driver tricked into returning > true? Actually, while I'm not sure what Maarten's change (2056c10d12f6b4facf628b861230f9b0a13f3a73 on server-1.6-branch) was needed for or if it couldn't be achieved in a better way, it looks like it only affects drivers which provide a CreatePixmap hook anyway, so it probably isn't relevant for openchrome. (In reply to comment #6) > (In reply to comment #5) > > So In the Nouveau case, where the driver populates this pointer, when we're > > migrating a pixmap from system to VRAM, and the do_migrate code is relying on > > exaPixmapIsOffscreen() to return true, How is the driver tricked into returning > > true? > > Actually, while I'm not sure what Maarten's change > (2056c10d12f6b4facf628b861230f9b0a13f3a73 on server-1.6-branch) was needed for > or if it couldn't be achieved in a better way, it looks like it only affects > drivers which provide a CreatePixmap hook anyway, so it probably isn't relevant > for openchrome. > So this means that the patch would work OK? This bug more importantly impacts GMA500 as well. /Thomas (In reply to comment #7) > So this means that the patch would work OK? Not sure, but possibly not, as exaPixmapIsOffscreen() will be called for nouveau as well. Maarten? > This bug more importantly impacts GMA500 as well. Better find the root cause then. :) 2056c10d12f6b4facf628b861230f9b0a13f3a73 only affects driver allocated pixmap drivers, the change was made to allow non-offscreen pixmaps, which are guaranteed to have a valid pointer at all times. So the pointer has to be NULL'ed before the driver gets to it. It's completely irrelevant to "classic" exa. The patch should cause no issues for nouveau, there is still the issue who or what is messing with devPrivate.ptr, or forgetting to init it. Created attachment 23201 [details] [review] Possible fix I suspect devPrivate.ptr is non-NULL due to ModifyPixmapHeader(). Thomas, does this patch fix the problem? Maarten, would this work for nouveau as well? This patch has a problem, in the sense that it now forbids non-offscreen pixmaps. So at the very least add a if (exaPixmapIsOffscreen(pPixmap)) before NULL'ing devprivate-ptr. Created attachment 23205 [details] [review] Possible fix, take two Better patch which makes sure the ret variable isn't used uninitialized. (In reply to comment #11) > This patch has a problem, in the sense that it now forbids non-offscreen > pixmaps. What exactly do you mean with 'non-offscreen' pixmaps? I don't see how this can break so long as exaPrepareAccess is always called before dereferencing devPrivate.ptr. Check ExaDoPrepareAccess() please and see what happens for non-offscreen pixmaps. (In reply to comment #14) > Check ExaDoPrepareAccess() please and see what happens for non-offscreen > pixmaps. Still not sure what you mean... if you mean that the driver PrepareAccess hook doesn't get called, then that is by design, you'll have to make sure exaPixmapIsOffscreen() returns TRUE. Otherwise please be more specific. The driver has to manage all pixmaps (in driver_alloc case), including those that make no sense whatsoever in an acceleration path. So you create non-offscreen pixmaps, which guarantees you it never ever touches acceleration code. But at the same time you need to have valid devPrivate.ptr. That is all i'm trying to do. (In reply to comment #16) > But at the same time you need to have valid devPrivate.ptr. ExaDoPrepareAccess gives you that. Except when the pixmap isn't considered offscreen. Because you NULL'ed it after ModifyPixmapHeader. Created attachment 23206 [details] [review] Possible fix, take three Okay, I think I see the problem now - why didn't you just mention the !(pExaScr->info->flags & EXA_HANDLES_PIXMAPS) test rather than having me second-guess? So how about this? How about also NULL'ing it again on FinishAccess? Perhaps only when exaPixmapIsOffscreen fails. Created attachment 23207 [details] [review] Possible fix, take four > How about also NULL'ing it again on FinishAccess? Perhaps only when > exaPixmapIsOffscreen fails. No, the whole idea is to always keep it NULL except between PrepareAcess and FinishAccess. This patch takes care of exaFinishAccess as well as exaPixmapIsOffscreen. This comment could be removed (imo): /* Rehide pixmap pointer if we're doing that. */ Also, maybe add a note to exa.h that under no circumstance the driver can rely on the contents of devPrivate.ptr when FinishAccess is called. While i don't expect anyone to do this, it was still possible in the past (for EXA_HANDLES_PIXMAPS). Codewise the patch looks ok. Unfortunately it doesn't fix the problem. I still get a couple of these pixmap. Typically when I start mozilla. /Thomas (In reply to comment #23) > Unfortunately it doesn't fix the problem. I still get a couple of these pixmap. Bummer. Were you able to find out anything about how the problem comes to be? Maybe a Prepare/FinishAccess asymmetry somewhere? > Typically when I start mozilla. As in seamonkey? If so, I wonder if bug 16416 could be related... (In reply to comment #24) > (In reply to comment #23) > > Unfortunately it doesn't fix the problem. I still get a couple of these pixmap. > > Bummer. Were you able to find out anything about how the problem comes to be? > Maybe a Prepare/FinishAccess asymmetry somewhere? > No, I don't think, so, I've debugged the prepareaccess read / write refcount and it seems to be OK, except the read refcount drops below zero when this bug is hit. The driver should probably hang on an inbalance in the other direction since it uses TTM syncing in the prepare / finishaccess hooks. It's a bit odd, though, that it's the read refcount indicating that we want to read from offscreen memory, given that the stale pointer value points to a system area. This would perhaps indicate that the pointer is not nulled after migration from system to fb. Anyway what I've found out so far, is that this seems to be ARGB 8888 pixmaps used for "over" compositing of glyphs without a mask pixmap. > > Typically when I start mozilla. > > As in seamonkey? If so, I wonder if bug 16416 could be related... > The outcome looks very similar to what I'm seeing. This is because prepareAccess is used for synchronization instead of the normal exa Sync methods. When the prepareAccess refcount is nonzero for an offscreen buffer backing a pixmap, no synchronization takes place, and the client will write into the pixmap before the GPU is done with it. So the problem becomes much more obvious if the GPU fifo is populated (for example by running gears) when the prepareaccess refcount is incorrect. If the normal exa sync methods are used in the ati driver, the bugs are probably not related. (In reply to comment #25) > It's a bit odd, though, that it's the read refcount indicating that we want to > read from offscreen memory, given that the stale pointer value points to a > system area. This would perhaps indicate that the pointer is not nulled after > migration from system to fb. Hmm, I don't see how that could happen... > Anyway what I've found out so far, is that this seems to be ARGB 8888 pixmaps > used for "over" compositing of glyphs without a mask pixmap. From exaGlyphsToDst() then? If so, I wonder if http://lists.x.org/archives/xorg-devel/2009-February/000276.html could indirectly avoid the problem... Is it a glyph cache pixmap (1024xyyy) or an actual glyph pixmap? > If the normal exa sync methods are used in the ati driver, the bugs are > probably not related. They are. BTW, are you getting per-pixmap driver private data via a CreatePixmap hook / exaGetPixmapDriverPrivate or via some other means? Thomas: give xserver git a try, if it works without spewing errors, then i suggest you bisect over the last series of exa patches. Sorry for being unresponsive. I'll hopfully get some time early next week for this. /Thomas (In reply to comment #28) > Sorry for being unresponsive. > I'll hopfully get some time early next week for this. How time flys. :) Where do we stand on this? Is this still a bug? No response in over 2 years. Closing. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.