Attached is a patch that does the following to the fbmmx implementation: 1. It replaces gcc-specific __builtin_ia32_* functions with the standard MMX and SSE intrinsic types and functions from mmintrin.h and xmmintrin.h. This makes the code easier to understand and allows for the possible use of MMX other x86/x86-64 compilers that don't support gcc's builtin functions. 2. It replaces all inline assembly (except the cpuid use in fbHaveMMX) with intrinsics. This is easier to understand, compatible with compilers other than gcc, and (in some cases) results in better code generation. (In other cases, specifically the use of _mm_cvtsi64_si32 and _mm_cvtsi32_si64, extra instructions are generated. I expect this to improve with future gcc releases.) 3. Support for fbmmx on AMD64 systems is added. fbHaveMMX() is defined to TRUE on AMD64 systems. SSE support (in the form of the MMX pshufw instruction) is also unconditionally enabled on AMD64. SSE remains disabled on i386. 4. The USE_GCC34_MMX macro is renamed to the more general USE_MMX, under the assumption compilers other than gcc may want to use MMX. A USE_SSE macro is introduced, which unconditionally enables (i.e. no cpuid test) the above mentiond pshufw instruction. 5. Access to constants in the MMX data structure are now done using the MC() macro, which hides away those messy casts. All references to the zero constant are replaced with _mm_setzero_si64(), which generates "pxor %mm, %mm" and is recommended instead of constant loads in the AMD optimization manual. 6. The shift function is replaced by open-coded instances of _mm_srli_pi16 and _mm_slli_pi16. 7. Some comments were added, updated, or reformatted for 80 columns. This patch has been tested on AMD64 against mharris's xorg-x11-6.7.99.1 packages. rendercheck passes without error and the benchmark from bug #839 reports a 1.2 second overall improvement compared to 6.7.0.
Created attachment 624 [details] [review] patch
Just skimming over the patch it looks very good to me. Two comments though: - GCC 3.3 is unusably buggy wrt. to intrinsics. GCC 3.4 is buggy, but you can workaround. We need to make sure the code continues to work with GCC 3.4. - This is not material for the upcoming release, but we should looks at it for the next one.
(In reply to comment #2) > - GCC 3.3 is unusably buggy wrt. to intrinsics. GCC 3.4 is buggy, but you > can workaround. We need to make sure the code continues to work > with GCC 3.4. It's inclusion is still keyed off of the use of GCC 3.4, and will remain that way until somebody adds support for other compilers. I only built and tested it with gcc 3.4 (and am using it right now, without any problem).
> 6. The shift function is replaced by open-coded instances of _mm_srli_pi16 and > _mm_slli_pi16. I'd really like to keep the shift function. To me, shift (blah, -48) is much clearer than _mm_slli_si64 (blah, 48), and since the function is inlined and optimized, gcc doesn't generate any worse code for it when the argument is a compile time constant. Other than that, the patch looks fine to me.
(In reply to comment #4) > I'd really like to keep the shift function. To me, > > shift (blah, -48) > > is much clearer than _mm_slli_si64 (blah, 48), and since the function is > inlined and optimized, gcc doesn't generate any worse code for it when the > argument is a compile time constant. Eh, I found "shift" with no indication of direction other than a sign confusing, but I'm not particularly attached to it. I'll change it and re-diff soon.
Created attachment 1591 [details] [review] new patch This also fixes the non-SSE expand_alpha, which I messed up the first time around.
Committed, except that I put #include <xmmintrin.h> inside an #ifdef USE_SSE, because xmmintrin.h contains an #error that triggers if -msse is not used Mon Jan 3 12:45:10 2005 Søren Sandmann <sandmann@redhat.com> Clean-ups and support for AMD64. Bug 1067. Patch by Nicholas Miell (nmiell@comcast.net) * programs/Xserver/fb/Imakefile: Add support for AMD64 * programs/Xserver/fb/{fbmmx.[ch],fbpict.c}: Many cleanups using <mmintrin.h> instead of __builin_ia32_*, and intrinsics instead of inline assembly. Also unconditionally use pshufw on AMD64. * programs/Xserver/fb/*: s/USE_GCC34_MMX/USE_MMX/g
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.