* [Qemu-devel] [PATCH for-2.4] tcg/i386: Implement trunc_shr_i32
From: Richard Henderson @ 2015-07-18 7:58 UTC (permalink / raw)
To: qemu-devel; +Cc: pbonzini, leon.alrae, aurelien
Enforce the invariant that 32-bit quantities are zero extended
in the register. This avoids having to re-zero-extend at memory
accesses for 32-bit guests.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
Here's an alternative to the other things we've been considering.
We could even make this conditional on USER_ONLY if you like.
This does in fact fix the mips test case. Consider the fact that
memory operations are probably more common than truncations, and
it would seem that we have a net size win by forcing the truncate
over adding a byte for the ADDR32 (or 2 bytes for a zero-extend).
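(Rough numbers, for illustration only: the zero-extend is a 2-byte
mov %esi,%esi at every memory access and ADDR32 is a 1-byte 0x67
prefix, while the truncate costs at most a 4-byte shr plus a 2-byte
mov, paid once per truncation.)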
Indeed, for 2.5, we could look at dropping the existing zero-extend
from the softmmu path. Also for 2.5, split trunc_shr into two parts,
trunc_lo and trunc_hi. No one really uses any other value for the
shift, and x86 could really use different constraints for the two.
Thoughts?
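For reference, a minimal C model of what the new op computes (an
illustration of the semantics only, not the TCG code itself), which
also shows why the two conditionals in the patch suffice:

    #include <stdint.h>

    /* dest = (uint32_t)(src >> shift), computed in place.
     * shift == 0 needs no SHR, and shift == 32 needs no zero-extend,
     * because a 64-bit right shift by 32 already clears the high half. */
    static uint64_t trunc_shr_i32_model(uint64_t src, unsigned shift)
    {
        if (shift > 0) {
            src >>= shift;          /* the SHR with REX.W */
        }
        if (shift < 32) {
            src = (uint32_t)src;    /* the zero-extending 32-bit MOV */
        }
        return src;
    }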
r~
---
tcg/i386/tcg-target.c | 9 +++++++++
tcg/i386/tcg-target.h | 2 +-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index ff4d9cf..9f01369 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -2040,6 +2040,14 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext32s_i64:
         tcg_out_ext32s(s, args[0], args[1]);
         break;
+    case INDEX_op_trunc_shr_i32:
+        if (args[2] > 0) {
+            tcg_out_shifti(s, SHIFT_SHR + P_REXW, args[0], args[2]);
+        }
+        if (args[2] < 32) {
+            tcg_out_ext32u(s, args[0], args[0]);
+        }
+        break;
 #endif

     OP_32_64(deposit):
@@ -2170,6 +2178,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_ext8u_i64, { "r", "r" } },
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
+    { INDEX_op_trunc_shr_i32, { "r", "0" } },

     { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
     { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 25b5133..8b87050 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0

 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i32    1
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
--
2.4.3
* Re: [Qemu-devel] [PATCH for-2.4] tcg/i386: Implement trunc_shr_i32
From: Aurelien Jarno @ 2015-07-18 21:18 UTC (permalink / raw)
To: Richard Henderson; +Cc: pbonzini, leon.alrae, qemu-devel
On 2015-07-18 08:58, Richard Henderson wrote:
> Enforce the invariant that 32-bit quantities are zero extended
> in the register. This avoids having to re-zero-extend at memory
> accesses for 32-bit guests.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> Here's an alternative to the other things we've been considering.
> We could even make this conditional on USER_ONLY if you like.
>
> This does in fact fix the mips test case. Consider the fact that
> memory operations are probably more common than truncations, and
> it would seem that we have a net size win by forcing the truncate
> over adding a byte for the ADDR32 (or 2 bytes for a zero-extend).
I think we should go with your previous patch for 2.4, and think calmly
about how to do that better for 2.5. It slightly increases the generated
code, but only in bytes, not in number of instructions, so I don't think
the performance impact is huge.
> Indeed, for 2.5, we could look at dropping the existing zero-extend
> from the softmmu path. Also for 2.5, split trunc_shr into two parts,
From a quick look, we need to move the address to new registers anyway,
so not zero-extending will mean adding the REXW prefix.
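(For instance, mov %ebp,%edi is 2 bytes and zero-extends as a side
effect, while mov %rbp,%rdi needs 3 bytes with the REX.W prefix.)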
> trunc_lo and trunc_hi. No one really uses any other value for the
> shift, and x86 could really use different constraints for the two.
That sounds like a good idea.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH for-2.4] tcg/i386: Implement trunc_shr_i32
From: Aurelien Jarno @ 2015-07-19 11:26 UTC (permalink / raw)
To: Richard Henderson; +Cc: pbonzini, leon.alrae, qemu-devel
On 2015-07-18 23:18, Aurelien Jarno wrote:
> On 2015-07-18 08:58, Richard Henderson wrote:
> > Enforce the invariant that 32-bit quantities are zero extended
> > in the register. This avoids having to re-zero-extend at memory
> > accesses for 32-bit guests.
> >
> > Signed-off-by: Richard Henderson <rth@twiddle.net>
> > ---
> > Here's an alternative to the other things we've been considering.
> > We could even make this conditional on USER_ONLY if you like.
> >
> > This does in fact fix the mips test case. Consider the fact that
> > memory operations are probably more common than truncations, and
> > it would seem that we have a net size win by forcing the truncate
> > over adding a byte for the ADDR32 (or 2 bytes for a zero-extend).
>
> I think we should go with your previous patch for 2.4, and think calmly
> about how to do that better for 2.5. It slightly increases the generated
> code, but only in bytes, not in number of instructions, so I don't think
> the performance impact is huge.
>
> > Indeed, for 2.5, we could look at dropping the existing zero-extend
> > from the softmmu path. Also for 2.5, split trunc_shr into two parts,
>
> From a quick look, we need to move the address to new registers anyway,
> so not zero-extending will mean adding the REXW prefix.
Well, looking in more detail, we can move one instruction from the
fast path to the slow path. Here is typical TLB code for a store:
fast-path:
    mov %rbp,%rdi
    mov %rbp,%rsi
    shr $0x7,%rdi
    and $0xfffffffffffff003,%rsi
    and $0x1fe0,%edi
    lea 0x4e68(%r14,%rdi,1),%rdi
    cmp (%rdi),%rsi
    mov %rbp,%rsi
    jne 0x7f45b8bcc800
    add 0x10(%rdi),%rsi
    mov %ebx,(%rsi)
slow-path:
    mov %r14,%rdi
    mov %ebx,%edx
    mov $0x22,%ecx
    lea -0x156(%rip),%r8
    push %r8
    jmpq 0x7f45cb337010
If we know that %rbp is properly zero-extended when needed, we can
change the end of the fast path into:
    cmp (%rdi),%rsi
    jne 0x7f45b8bcc800
    mov 0x10(%rdi),%rsi
    mov %ebx,(%rsi,%rbp,1)
However, that means %rsi no longer holds the address, so we have to
load it in the slow path. In the end it means moving one instruction
from the fast path to the slow path.
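To make that concrete, here is a rough C model of the fast-path check
above (the struct and constants only loosely mirror QEMU's softmmu
TLB, and slow_path_store is a hypothetical stub):

    #include <stdint.h>

    #define PAGE_BITS 12
    #define PAGE_MASK (~(uint64_t)((1 << PAGE_BITS) - 1))
    #define TLB_SIZE  256                  /* entries per MMU index */

    typedef struct {
        uint64_t addr_write;  /* tag: page address, low bits force misses */
        uint64_t addend;      /* guest address + addend = host address */
    } TLBEntry;

    void slow_path_store(uint64_t addr, uint32_t val);  /* hypothetical */

    static void store32(TLBEntry *tlb, uint64_t addr, uint32_t val)
    {
        TLBEntry *e = &tlb[(addr >> PAGE_BITS) & (TLB_SIZE - 1)];

        /* Keeping the low two bits in the compare sends any unaligned
         * store to the slow path, like the 0x...f003 mask above. */
        if ((addr & (PAGE_MASK | 3)) != e->addr_write) {
            slow_path_store(addr, val);
            return;
        }
        *(uint32_t *)(uintptr_t)(addr + e->addend) = val;
    }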
Now, I have no idea what would actually improve performance. A smaller
fast path, so there are fewer instructions to execute? Or smaller code
in general, so that the caches are better used?
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net