* [PATCH 1/2] xv: fix last pixel for big-endian machines in YV12 -> NV12 conversion
@ 2013-07-29 6:40 Ilia Mirkin
[not found] ` <1375080039-22607-1-git-send-email-imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Ilia Mirkin @ 2013-07-29 6:40 UTC (permalink / raw)
To: Ben Skeggs; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Signed-off-by: Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
---
src/nouveau_xv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/nouveau_xv.c b/src/nouveau_xv.c
index 8eafcf0..567e30c 100644
--- a/src/nouveau_xv.c
+++ b/src/nouveau_xv.c
@@ -552,8 +552,11 @@ NVCopyNV12ColorPlanes(unsigned char *src1, unsigned char *src2,
if (e) {
unsigned short *vud = (unsigned short *) vuvud;
-
+#if X_BYTE_ORDER == X_BIG_ENDIAN
+ *vud = us[0] | (vs[0]<<8);
+#else
*vud = vs[0] | (us[0]<<8);
+#endif
}
dst += dstPitch;
--
1.8.1.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available
[not found] ` <1375080039-22607-1-git-send-email-imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
@ 2013-07-29 6:40 ` Ilia Mirkin
[not found] ` <1375080039-22607-2-git-send-email-imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Ilia Mirkin @ 2013-07-29 6:40 UTC (permalink / raw)
To: Ben Skeggs; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
memcpy() goes from taking 45% to 66% of total function time, which
translates to a 30% decrease in NVPutImage runtime.
Signed-off-by: Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
---
src/nouveau_xv.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/src/nouveau_xv.c b/src/nouveau_xv.c
index 567e30c..5569b7c 100644
--- a/src/nouveau_xv.c
+++ b/src/nouveau_xv.c
@@ -25,6 +25,8 @@
#include "config.h"
#endif
+#include <immintrin.h>
+
#include "xf86xv.h"
#include <X11/extensions/Xv.h>
#include "exa.h"
@@ -532,30 +534,47 @@ NVCopyNV12ColorPlanes(unsigned char *src1, unsigned char *src2,
w >>= 1;
h >>= 1;
+#ifdef __SSE2__
+ l = w >> 3;
+ e = w & 7;
+#else
l = w >> 1;
e = w & 1;
+#endif
for (j = 0; j < h; j++) {
unsigned char *us = src1;
unsigned char *vs = src2;
unsigned int *vuvud = (unsigned int *) dst;
+ unsigned short *vud;
for (i = 0; i < l; i++) {
-#if X_BYTE_ORDER == X_BIG_ENDIAN
+#ifdef __SSE2__
+ _mm_storeu_si128(
+ (void*)vuvud,
+ _mm_unpacklo_epi8(
+ _mm_loadl_epi64((void*)vs),
+ _mm_loadl_epi64((void*)us)));
+ vuvud+=4;
+ us+=8;
+ vs+=8;
+#else /* __SSE2__ */
+# if X_BYTE_ORDER == X_BIG_ENDIAN
*vuvud++ = (vs[0]<<24) | (us[0]<<16) | (vs[1]<<8) | us[1];
-#else
+# else
*vuvud++ = vs[0] | (us[0]<<8) | (vs[1]<<16) | (us[1]<<24);
-#endif
+# endif
us+=2;
vs+=2;
+#endif /* __SSE2__ */
}
- if (e) {
- unsigned short *vud = (unsigned short *) vuvud;
+ vud = (unsigned short *)vuvud;
+ for (i = 0; i < e; i++) {
#if X_BYTE_ORDER == X_BIG_ENDIAN
- *vud = us[0] | (vs[0]<<8);
+ vud[i] = us[i] | (vs[i]<<8);
#else
- *vud = vs[0] | (us[0]<<8);
+ vud[i] = vs[i] | (us[i]<<8);
#endif
}
--
1.8.1.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available
[not found] ` <1375080039-22607-2-git-send-email-imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
@ 2013-07-31 17:16 ` Sven Joachim
[not found] ` <CAKb7Uvi-nVtxhENZnwm6kbRaOW-a_Gv5D_v6Ex1CuyJWTE5aUw@mail.gmail.com>
0 siblings, 1 reply; 4+ messages in thread
From: Sven Joachim @ 2013-07-31 17:16 UTC (permalink / raw)
To: Ilia Mirkin
Cc: Ben Skeggs,
public-nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW-wOFGN7rlS/M9smdsby/KFg
On 2013-07-29 08:40 +0200, Ilia Mirkin wrote:
> memcpy() goes from taking 45% to 66% of total function time, which
> translates to a 30% decrease in NVPutImage runtime.
>
> Signed-off-by: Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
> ---
> src/nouveau_xv.c | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/src/nouveau_xv.c b/src/nouveau_xv.c
> index 567e30c..5569b7c 100644
> --- a/src/nouveau_xv.c
> +++ b/src/nouveau_xv.c
> @@ -25,6 +25,8 @@
> #include "config.h"
> #endif
>
> +#include <immintrin.h>
> +
Unfortunately, immintrin.h is not available on most architectures,
leading to build failures as can be seen on
https://buildd.debian.org/status/package.php?p=xserver-xorg-video-nouveau.
Any ideas?
> #include "xf86xv.h"
> #include <X11/extensions/Xv.h>
> #include "exa.h"
> @@ -532,30 +534,47 @@ NVCopyNV12ColorPlanes(unsigned char *src1, unsigned char *src2,
>
> w >>= 1;
> h >>= 1;
> +#ifdef __SSE2__
> + l = w >> 3;
> + e = w & 7;
> +#else
> l = w >> 1;
> e = w & 1;
> +#endif
>
> for (j = 0; j < h; j++) {
> unsigned char *us = src1;
> unsigned char *vs = src2;
> unsigned int *vuvud = (unsigned int *) dst;
> + unsigned short *vud;
>
> for (i = 0; i < l; i++) {
> -#if X_BYTE_ORDER == X_BIG_ENDIAN
> +#ifdef __SSE2__
> + _mm_storeu_si128(
> + (void*)vuvud,
> + _mm_unpacklo_epi8(
> + _mm_loadl_epi64((void*)vs),
> + _mm_loadl_epi64((void*)us)));
> + vuvud+=4;
> + us+=8;
> + vs+=8;
> +#else /* __SSE2__ */
> +# if X_BYTE_ORDER == X_BIG_ENDIAN
> *vuvud++ = (vs[0]<<24) | (us[0]<<16) | (vs[1]<<8) | us[1];
> -#else
> +# else
> *vuvud++ = vs[0] | (us[0]<<8) | (vs[1]<<16) | (us[1]<<24);
> -#endif
> +# endif
> us+=2;
> vs+=2;
> +#endif /* __SSE2__ */
> }
>
> - if (e) {
> - unsigned short *vud = (unsigned short *) vuvud;
> + vud = (unsigned short *)vuvud;
> + for (i = 0; i < e; i++) {
> #if X_BYTE_ORDER == X_BIG_ENDIAN
> - *vud = us[0] | (vs[0]<<8);
> + vud[i] = us[i] | (vs[i]<<8);
> #else
> - *vud = vs[0] | (us[0]<<8);
> + vud[i] = vs[i] | (us[i]<<8);
> #endif
> }
>
> --
> 1.8.1.5
Cheers,
Sven
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available
[not found] ` <CAKb7Uvi-nVtxhENZnwm6kbRaOW-a_Gv5D_v6Ex1CuyJWTE5aUw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-31 17:28 ` Sven Joachim
0 siblings, 0 replies; 4+ messages in thread
From: Sven Joachim @ 2013-07-31 17:28 UTC (permalink / raw)
To: Ilia Mirkin
Cc: Ben Skeggs,
public-nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW-wOFGN7rlS/M9smdsby/KFg,
Ilia Mirkin
On 2013-07-31 19:18 +0200, Ilia Mirkin wrote:
> On Wed, Jul 31, 2013 at 1:16 PM, Sven Joachim <svenjoac-Mmb7MZpHnFY@public.gmane.org> wrote:
>>
>> Unfortunately, immintrin.h is not available on most architectures,
>> leading to build failures as can be seen on
>> https://buildd.debian.org/status/package.php?p=xserver-xorg-video-nouveau.
>
> Sorry :( I thought that immintrin.h would be available everywhere and
> just end up empty since none of the __SSE*__ would be defined. I was
> wrong.
>
>>
>> Any ideas?
>
> A fix is checked into master already:
> http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=1df177f35a05db505577cdc929e63fde906a704b
Ah, good to hear and sorry that I didn't check myself. I'll merge that
commit into the Debian package.
Cheers,
Sven
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-07-31 17:28 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-29 6:40 [PATCH 1/2] xv: fix last pixel for big-endian machines in YV12 -> NV12 conversion Ilia Mirkin
[not found] ` <1375080039-22607-1-git-send-email-imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
2013-07-29 6:40 ` [PATCH 2/2] xv: speed up YV12 -> NV12 conversion using SSE2 if available Ilia Mirkin
[not found] ` <1375080039-22607-2-git-send-email-imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
2013-07-31 17:16 ` Sven Joachim
[not found] ` <CAKb7Uvi-nVtxhENZnwm6kbRaOW-a_Gv5D_v6Ex1CuyJWTE5aUw@mail.gmail.com>
[not found] ` <CAKb7Uvi-nVtxhENZnwm6kbRaOW-a_Gv5D_v6Ex1CuyJWTE5aUw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-31 17:28 ` Sven Joachim
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.