From: malc <av1474@comtv.ru>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org, Aurelien Jarno <aurelien@aurel32.net>
Subject: Re: [Qemu-devel] [PATCH 05/21] tcg-i386: Tidy bswap operations.
Date: Mon, 19 Apr 2010 20:05:53 +0400 (MSD)
Message-ID: <alpine.LNX.2.00.1004192004080.1477@linmac>
In-Reply-To: <4BCC611C.3020202@twiddle.net>
On Mon, 19 Apr 2010, Richard Henderson wrote:
> On 04/18/2010 05:13 PM, Aurelien Jarno wrote:
> > On Tue, Apr 13, 2010 at 04:33:59PM -0700, Richard Henderson wrote:
> >> Define OPC_BSWAP. Factor opcode emission to separate functions.
> >> Use bswap+shift to implement 16-bit swap instead of a rolw; this
> >> gets the proper zero-extension required by INDEX_op_bswap16_i32.
> >
> > This is not required by INDEX_op_bswap16_i32. What is needed is that the
> > value in the input register has the 16 upper bits set to 0.
>
> Ah.
Apparently I'm not the only one who misinterpreted this bit of the bswap
documentation. How about:
diff --git a/tcg/README b/tcg/README
index 68d27ff..5b39a38 100644
--- a/tcg/README
+++ b/tcg/README
@@ -269,7 +269,7 @@ ext32u_i64 t0, t1
 * bswap16_i32/i64 t0, t1
 16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
-bytes are set to zero.
+bytes of t1 are set to zero.
 * bswap32_i32/i64 t0, t1
>
> > Considering
> > that, the rolw instruction is faster than bswap + shift.
>
> Well, no, it isn't.
>
> static inline int test_rolw(unsigned short *s)
> {
>     int i, start, end;
>     asm volatile("rdtsc\n\t"
>                  "movl %%eax, %1\n\t"
>                  "movzwl %3,%2\n\t"
>                  "rolw $8, %w2\n\t"
>                  "addl $1,%2\n\t"
>                  "rdtsc"
>                  : "=&a"(end), "=r"(start), "=r"(i) : "m"(*s) : "edx");
>     return end - start;
> }
>
> static inline int test_bswap(unsigned short *s)
> {
>     int i, start, end;
>     asm volatile("rdtsc\n\t"
>                  "movl %%eax, %1\n\t"
>                  "movzwl %3,%2\n\t"
>                  "bswap %2\n\t"
>                  "shr $16,%2\n\t"
>                  "addl $1,%2\n\t"
>                  "rdtsc"
>                  : "=&a"(end), "=r"(start), "=r"(i) : "m"(*s) : "edx");
>     return end - start;
> }
>
>
> model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
> rolw 60 60 72 60 60 72 60 60 72 60
> bswap 60 60 60 60 60 60 60 60 60 60
>
> model name : Dual-Core AMD Opteron(tm) Processor 1210
> rolw 9 10 9 9 8 8 8 8 8 8
> bswap 9 9 8 8 8 8 8 8 8 8
>
> The rolw sequence isn't ever faster, and it's more unstable,
> likely due to the partial register stall I mentioned.
>
> I will grant that the rolw sequence is smaller, and I can
> adjust this patch to use that sequence if you wish.
>
>
> r~
>
>
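
To make the difference between the two sequences concrete, here is a
minimal C model of what each one computes on a 32-bit register. This is
only a sketch: the function names swap16_rolw and swap16_bswap_shr are
made up for illustration and do not appear in the patch, and the second
one assumes the shift in the bswap sequence is a logical right shift by 16.

```c
#include <stdint.h>

/* Illustrative model of "rolw $8, %w0": only the low 16 bits are
 * rotated; the upper 16 bits of the register are left untouched. */
static uint32_t swap16_rolw(uint32_t x)
{
    uint32_t lo = x & 0xffffu;
    uint32_t rot = ((lo << 8) | (lo >> 8)) & 0xffffu;
    return (x & 0xffff0000u) | rot;
}

/* Illustrative model of "bswap %0; shr $16, %0": a full 32-bit byte
 * swap followed by a logical right shift, so the result is always
 * zero-extended whatever the upper 16 bits of the input were. */
static uint32_t swap16_bswap_shr(uint32_t x)
{
    uint32_t b = (x >> 24) | ((x >> 8) & 0xff00u)
               | ((x << 8) & 0xff0000u) | (x << 24);
    return b >> 16;
}
```

When the upper 16 bits of the input are zero, as the README requires,
both models return the same value, so either sequence satisfies the op.
They differ only when that precondition is violated: the rolw model keeps
whatever was in the upper half, while the bswap model discards it.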
--
mailto:av1474@comtv.ru