[Qemu-devel] [PATCH v3 0/2] Optimize movcond

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [PATCH v3 0/2] Optimize movcond
From: Richard Henderson @ 2012-09-24 20:44 UTC
  To: qemu-devel; +Cc: aurelien

Changes v2->v3:
  Rebase with the first 5 patches committed.
  Fix 32/64-bit compile problems.  Oops.


r~


Richard Henderson (2):
  tcg: Streamline movcond_i64 using 32-bit arithmetic
  tcg: Streamline movcond_i64 using movcond_i32

 tcg/tcg-op.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

-- 
1.7.11.4


* [Qemu-devel] [PATCH 1/2] tcg: Streamline movcond_i64 using 32-bit arithmetic
From: Richard Henderson @ 2012-09-24 20:44 UTC
  To: qemu-devel; +Cc: aurelien

Avoiding 64-bit arithmetic (outside of the compare) reduces the
generated op count from 15 to 12, and the generated code size on
i686 from 105 to 88 bytes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg-op.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 6d28f82..c32646e 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -2141,6 +2141,25 @@ static inline void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret,
                                        TCGv_i64 c1, TCGv_i64 c2,
                                        TCGv_i64 v1, TCGv_i64 v2)
 {
+#if TCG_TARGET_REG_BITS == 32
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    tcg_gen_op6i_i32(INDEX_op_setcond2_i32, t0,
+                     TCGV_LOW(c1), TCGV_HIGH(c1),
+                     TCGV_LOW(c2), TCGV_HIGH(c2), cond);
+    tcg_gen_neg_i32(t0, t0);
+
+    tcg_gen_and_i32(t1, TCGV_LOW(v1), t0);
+    tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0);
+    tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1);
+
+    tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0);
+    tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0);
+    tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1);
+
+    tcg_temp_free_i32(t0);
+    tcg_temp_free_i32(t1);
+#else
     if (TCG_TARGET_HAS_movcond_i64) {
         tcg_gen_op6i_i64(INDEX_op_movcond_i64, ret, c1, c2, v1, v2, cond);
     } else {
@@ -2154,6 +2173,7 @@ static inline void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret,
         tcg_temp_free_i64(t0);
         tcg_temp_free_i64(t1);
     }
+#endif
 }
 
 /***************************************/
-- 
1.7.11.4
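
The masking idiom used in the 32-bit branch of this patch can be modelled in plain C. What follows is only an illustrative sketch under stated assumptions, not QEMU code: the `u64_halves` struct and the `select64_by_mask` function are hypothetical names, and `flag` stands in for the 0/1 value that the setcond2_i32 op leaves in `t0`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit value held as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

/* Return v1 when flag is 1 and v2 when flag is 0, using 32-bit ops only. */
static u64_halves select64_by_mask(uint32_t flag, u64_halves v1, u64_halves v2)
{
    u64_halves ret;
    uint32_t mask;

    assert(flag <= 1);      /* a setcond-style result is 0 or 1 */
    mask = 0u - flag;       /* neg: 0 -> 0x00000000, 1 -> 0xffffffff */

    /* and / andc / or on the low half */
    ret.lo = (v1.lo & mask) | (v2.lo & ~mask);
    /* and / andc / or on the high half */
    ret.hi = (v1.hi & mask) | (v2.hi & ~mask);
    return ret;
}
```

Because the mask is either all zeros or all ones, the same 32-bit mask serves both halves, which is why no 64-bit arithmetic is needed after the compare.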


* [Qemu-devel] [PATCH 2/2] tcg: Streamline movcond_i64 using movcond_i32
From: Richard Henderson @ 2012-09-24 20:45 UTC
  To: qemu-devel; +Cc: aurelien

When movcond_i32 is available we can further reduce the generated
op count from 12 to 6, and the generated code size on i686 from
88 to 74 bytes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg-op.h | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index c32646e..9ee16e7 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -2147,16 +2147,24 @@ static inline void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret,
     tcg_gen_op6i_i32(INDEX_op_setcond2_i32, t0,
                      TCGV_LOW(c1), TCGV_HIGH(c1),
                      TCGV_LOW(c2), TCGV_HIGH(c2), cond);
-    tcg_gen_neg_i32(t0, t0);
 
-    tcg_gen_and_i32(t1, TCGV_LOW(v1), t0);
-    tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0);
-    tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1);
+    if (TCG_TARGET_HAS_movcond_i32) {
+        tcg_gen_movi_i32(t1, 0);
+        tcg_gen_movcond_i32(TCG_COND_NE, TCGV_LOW(ret), t0, t1,
+                            TCGV_LOW(v1), TCGV_LOW(v2));
+        tcg_gen_movcond_i32(TCG_COND_NE, TCGV_HIGH(ret), t0, t1,
+                            TCGV_HIGH(v1), TCGV_HIGH(v2));
+    } else {
+        tcg_gen_neg_i32(t0, t0);
 
-    tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0);
-    tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0);
-    tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1);
+        tcg_gen_and_i32(t1, TCGV_LOW(v1), t0);
+        tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0);
+        tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1);
 
+        tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0);
+        tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0);
+        tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1);
+    }
     tcg_temp_free_i32(t0);
     tcg_temp_free_i32(t1);
 #else
-- 
1.7.11.4
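
The movcond_i32 path in this patch can likewise be modelled in plain C. This is a hedged sketch rather than QEMU code: `movcond32_ne` and `movcond64_via_halves` are hypothetical names, `flag` again stands in for the 0/1 setcond2_i32 result in `t0`, and `zero` plays the role of `t1` after the movi.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit conditional move with a "not equal"
   test: returns v1 when c1 != c2, otherwise v2. */
static uint32_t movcond32_ne(uint32_t c1, uint32_t c2,
                             uint32_t v1, uint32_t v2)
{
    return c1 != c2 ? v1 : v2;
}

/* 64-bit select built from two 32-bit conditional moves that share
   one comparison flag. */
static uint64_t movcond64_via_halves(uint32_t flag, uint64_t v1, uint64_t v2)
{
    const uint32_t zero = 0;    /* plays the role of t1 after movi */
    uint32_t lo, hi;

    assert(flag <= 1);          /* a setcond-style result is 0 or 1 */

    /* One conditional move per half, both testing flag != 0. */
    lo = movcond32_ne(flag, zero, (uint32_t)v1, (uint32_t)v2);
    hi = movcond32_ne(flag, zero,
                      (uint32_t)(v1 >> 32), (uint32_t)(v2 >> 32));

    return ((uint64_t)hi << 32) | lo;
}
```

Compared with the mask idiom, each half here needs a single conditional move instead of an and/andc/or triple, and the negation of the flag is no longer required.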


* Re: [Qemu-devel] [PATCH v3 0/2] Optimize movcond
From: Aurelien Jarno @ 2012-09-25 22:47 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 24, 2012 at 01:44:58PM -0700, Richard Henderson wrote:
> Changes v2->v3:
>   Rebase with the first 5 patches committed.
>   Fix 32/64-bit compile problems.  Oops.
> 
> 
> r~
> 
> 
> Richard Henderson (2):
>   tcg: Streamline movcond_i64 using 32-bit arithmetic
>   tcg: Streamline movcond_i64 using movcond_i32
> 
>  tcg/tcg-op.h | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 

Thanks, both applied.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

