qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] tcg-arm improvements
@ 2013-03-05 15:56 Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7 Richard Henderson
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
  To: qemu-devel

Here's a few things I've noticed while looking at debugging dumps.
Tested on an A15, compiled for armv7 and armv6.


r~


Richard Henderson (4):
  tcg-arm: Implement deposit for armv7
  tcg-arm: Use bic to implement and with constant
  tcg-arm: Handle negated constant arguments to and/sub
  tcg-arm: Improve constant generation

 tcg/arm/tcg-target.c | 170 ++++++++++++++++++++++++++++++++++++++++-----------
 tcg/arm/tcg-target.h |   7 ++-
 2 files changed, 138 insertions(+), 39 deletions(-)

-- 
1.8.1.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7
  2013-03-05 15:56 [Qemu-devel] [PATCH 0/4] tcg-arm improvements Richard Henderson
@ 2013-03-05 15:56 ` Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 2/4] tcg-arm: Use bic to implement and with constant Richard Henderson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
  To: qemu-devel

We have BFI and BFC available for implementing it.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
 tcg/arm/tcg-target.h |  5 ++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 94c6ca4..3422bd7 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -644,6 +644,35 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
     }
 }
 
+bool tcg_target_deposit_i32_value(int ofs, int len)
+{
+    /* ??? Without bfi, we could improve over generic code by combining
+       the right-shift from a non-zero ofs with the orr.  We do run into
+       problems when rd == rs, and the mask generated from ofs+len don't
+       fit into an immediate.  We would have to be careful not to pessimize
+       wrt the optimizations performed on the expanded code.  */
+    return use_armv7_instructions;
+}
+
+static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
+                                   TCGArg a1, int ofs, int len, bool const_a1)
+{
+    if (const_a1) {
+        uint32_t mask = (2u << (len - 1)) - 1;
+        a1 &= mask;
+        if (a1 == 0) {
+            /* bfi becomes bfc with rn == 15.  */
+            a1 = 15;
+        } else {
+            tcg_out_movi32(s, cond, TCG_REG_R8, a1);
+            a1 = TCG_REG_R8;
+        }
+    }
+    /* bfi/bfc */
+    tcg_out32(s, 0x07c00010 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((ofs + len - 1) << 16));
+}
+
 static inline void tcg_out_ld32_12(TCGContext *s, int cond,
                 int rd, int rn, tcg_target_long im)
 {
@@ -1773,6 +1802,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_ext16u(s, COND_AL, args[0], args[1]);
         break;
 
+    case INDEX_op_deposit_i32:
+        tcg_out_deposit(s, COND_AL, args[0], args[2],
+                        args[3], args[4], const_args[2]);
+        break;
+
     default:
         tcg_abort();
     }
@@ -1858,6 +1892,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_ext16s_i32, { "r", "r" } },
     { INDEX_op_ext16u_i32, { "r", "r" } },
 
+    { INDEX_op_deposit_i32, { "r", "0", "ri" } },
+
     { -1 },
 };
 
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index b6eed1f..cb89419 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -73,10 +73,13 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      0
+#define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_muls2_i32        1
 
+extern bool tcg_target_deposit_i32_value(int ofs, int len);
+#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_i32_value
+
 enum {
     TCG_AREG0 = TCG_REG_R6,
 };
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 2/4] tcg-arm: Use bic to implement and with constant
  2013-03-05 15:56 [Qemu-devel] [PATCH 0/4] tcg-arm improvements Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7 Richard Henderson
@ 2013-03-05 15:56 ` Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 3/4] tcg-arm: Handle negated constant arguments to and/sub Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 4/4] tcg-arm: Improve constant generation Richard Henderson
  3 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
  To: qemu-devel

This greatly improves the code we can produce for deposit
without armv7 support.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 38 +++++++++++++++++++++++++++++---------
 tcg/arm/tcg-target.h |  2 --
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 3422bd7..6618571 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -145,6 +145,9 @@ static void patch_reloc(uint8_t *code_ptr, int type,
     }
 }
 
+#define TCG_CT_CONST_ARM 0x100
+#define TCG_CT_CONST_INV 0x200
+
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
@@ -155,6 +158,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'I':
          ct->ct |= TCG_CT_CONST_ARM;
          break;
+    case 'K':
+         ct->ct |= TCG_CT_CONST_INV;
+         break;
 
     case 'r':
         ct->ct |= TCG_CT_REG;
@@ -275,16 +281,19 @@ static inline int check_fit_imm(uint32_t imm)
  * add, sub, eor...: ditto
  */
 static inline int tcg_target_const_match(tcg_target_long val,
-                const TCGArgConstraint *arg_ct)
+                                         const TCGArgConstraint *arg_ct)
 {
     int ct;
     ct = arg_ct->ct;
-    if (ct & TCG_CT_CONST)
+    if (ct & TCG_CT_CONST) {
         return 1;
-    else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val))
+    } else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val)) {
         return 1;
-    else
+    } else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
+        return 1;
+    } else {
         return 0;
+    }
 }
 
 enum arm_data_opc_e {
@@ -1535,6 +1544,7 @@ static uint8_t *tb_ret_addr;
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                 const TCGArg *args, const int *const_args)
 {
+    TCGArg a0, a1, a2;
     int c;
 
     switch (opc) {
@@ -1639,11 +1649,19 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         c = ARITH_SUB;
         goto gen_arith;
     case INDEX_op_and_i32:
+        a0 = args[0], a1 = args[1], a2 = args[2];
         c = ARITH_AND;
-        goto gen_arith;
+        if (const_args[2] && check_fit_imm(~a2)) {
+            c = ARITH_BIC, a2 = ~a2;
+        }
+        goto gen_arith2;
     case INDEX_op_andc_i32:
+        a0 = args[0], a1 = args[1], a2 = args[2];
         c = ARITH_BIC;
-        goto gen_arith;
+        if (const_args[2] && check_fit_imm(~a2)) {
+            c = ARITH_AND, a2 = ~a2;
+        }
+        goto gen_arith2;
     case INDEX_op_or_i32:
         c = ARITH_ORR;
         goto gen_arith;
@@ -1651,7 +1669,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         c = ARITH_EOR;
         /* Fall through.  */
     gen_arith:
-        tcg_out_dat_rI(s, COND_AL, c, args[0], args[1], args[2], const_args[2]);
+        a0 = args[0], a1 = args[1], a2 = args[2];
+    gen_arith2:
+        tcg_out_dat_rI(s, COND_AL, c, a0, a1, a2, const_args[2]);
         break;
     case INDEX_op_add2_i32:
         tcg_out_dat_reg2(s, COND_AL, ARITH_ADD, ARITH_ADC,
@@ -1836,8 +1856,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
     { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
-    { INDEX_op_and_i32, { "r", "r", "rI" } },
-    { INDEX_op_andc_i32, { "r", "r", "rI" } },
+    { INDEX_op_and_i32, { "r", "r", "rIK" } },
+    { INDEX_op_andc_i32, { "r", "r", "rIK" } },
     { INDEX_op_or_i32, { "r", "r", "rI" } },
     { INDEX_op_xor_i32, { "r", "r", "rI" } },
     { INDEX_op_neg_i32, { "r", "r" } },
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index cb89419..c4970d6 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -49,8 +49,6 @@ typedef enum {
 
 #define TCG_TARGET_NB_REGS 16
 
-#define TCG_CT_CONST_ARM 0x100
-
 /* used for function call generation */
 #define TCG_REG_CALL_STACK		TCG_REG_R13
 #define TCG_TARGET_STACK_ALIGN		8
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 3/4] tcg-arm: Handle negated constant arguments to and/sub
  2013-03-05 15:56 [Qemu-devel] [PATCH 0/4] tcg-arm improvements Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7 Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 2/4] tcg-arm: Use bic to implement and with constant Richard Henderson
@ 2013-03-05 15:56 ` Richard Henderson
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 4/4] tcg-arm: Improve constant generation Richard Henderson
  3 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
  To: qemu-devel

This greatly improves code generation for addition of small
negative constants.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 6618571..25d7f5c 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -147,6 +147,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 
 #define TCG_CT_CONST_ARM 0x100
 #define TCG_CT_CONST_INV 0x200
+#define TCG_CT_CONST_NEG 0x400
 
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
@@ -161,7 +162,10 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'K':
          ct->ct |= TCG_CT_CONST_INV;
          break;
-
+    case 'N': /* The gcc constraint letter is L, already used here.  */
+         ct->ct |= TCG_CT_CONST_NEG;
+         break;
+         
     case 'r':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
@@ -291,6 +295,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
         return 1;
     } else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
         return 1;
+    } else if ((ct & TCG_CT_CONST_NEG) && check_fit_imm(-val)) {
+        return 1;
     } else {
         return 0;
     }
@@ -1643,11 +1649,26 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        ARITH_MOV, args[0], 0, args[3], const_args[3]);
         break;
     case INDEX_op_add_i32:
+        a0 = args[0], a1 = args[1], a2 = args[2];
         c = ARITH_ADD;
-        goto gen_arith;
+        if (const_args[2] && check_fit_imm(-a2)) {
+            c = ARITH_SUB, a2 = -a2;
+        }
+        goto gen_arith2;
     case INDEX_op_sub_i32:
+        a0 = args[0], a1 = args[1], a2 = args[2];
         c = ARITH_SUB;
-        goto gen_arith;
+        if (const_args[1]) {
+            if (const_args[2]) {
+                tcg_out_movi32(s, COND_AL, a0, a1 - a2);
+            } else {
+                tcg_out_dat_rI(s, COND_AL, ARITH_RSB, a0, a2, a1, 1);
+            }
+            break;
+        } else if (const_args[2] && check_fit_imm(-a2)) {
+            c = ARITH_ADD, a2 = -a2;
+        }
+        goto gen_arith2;
     case INDEX_op_and_i32:
         a0 = args[0], a1 = args[1], a2 = args[2];
         c = ARITH_AND;
@@ -1851,8 +1872,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_st_i32, { "r", "r" } },
 
     /* TODO: "r", "r", "ri" */
-    { INDEX_op_add_i32, { "r", "r", "rI" } },
-    { INDEX_op_sub_i32, { "r", "r", "rI" } },
+    { INDEX_op_add_i32, { "r", "r", "rIN" } },
+    { INDEX_op_sub_i32, { "r", "rI", "rIN" } },
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
     { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 4/4] tcg-arm: Improve constant generation
  2013-03-05 15:56 [Qemu-devel] [PATCH 0/4] tcg-arm improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2013-03-05 15:56 ` [Qemu-devel] [PATCH 3/4] tcg-arm: Handle negated constant arguments to and/sub Richard Henderson
@ 2013-03-05 15:56 ` Richard Henderson
  3 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
  To: qemu-devel

Try fully rotated arguments to mov and mvn before trying movt
or full decomposition.  Begin decomposition with mvn when it
looks like it'll help.  Examples include

-:        mov   r9, #0x00000fa0
-:        orr   r9, r9, #0x000ee000
-:        orr   r9, r9, #0x0ff00000
-:        orr   r9, r9, #0xf0000000
+:        mvn   r9, #0x0000005f
+:        eor   r9, r9, #0x00011000

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 67 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 23 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 25d7f5c..59084a3 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -447,15 +447,31 @@ static inline void tcg_out_dat_imm(TCGContext *s,
                     (rn << 16) | (rd << 12) | im);
 }
 
-static inline void tcg_out_movi32(TCGContext *s,
-                int cond, int rd, uint32_t arg)
-{
-    /* TODO: This is very suboptimal, we can easily have a constant
-     * pool somewhere after all the instructions.  */
-    if ((int)arg < 0 && (int)arg >= -0x100) {
-        tcg_out_dat_imm(s, cond, ARITH_MVN, rd, 0, (~arg) & 0xff);
-    } else if (use_armv7_instructions) {
-        /* use movw/movt */
+static void tcg_out_movi32(TCGContext *s, int cond, int rd, uint32_t arg)
+{
+    int rot, opc, rn;
+
+    /* For armv7, make sure not to use movw+movt when mov/mvn would do.
+       Speed things up by only checking when movt would be required.
+       Prior to armv7, have one go at fully rotated immediates before
+       doing the decomposition thing below.  */
+    if (!use_armv7_instructions || (arg & 0xffff0000)) {
+        rot = encode_imm(arg);
+        if (rot >= 0) {
+            tcg_out_dat_imm(s, cond, ARITH_MOV, rd, 0,
+                            rotl(arg, rot) | (rot << 7));
+            return;
+        }
+        rot = encode_imm(~arg);
+        if (rot >= 0) {
+            tcg_out_dat_imm(s, cond, ARITH_MVN, rd, 0,
+                            rotl(~arg, rot) | (rot << 7));
+            return;
+        }
+    }
+
+    /* Use movw + movt.  */
+    if (use_armv7_instructions) {
         /* movw */
         tcg_out32(s, (cond << 28) | 0x03000000 | (rd << 12)
                   | ((arg << 4) & 0x000f0000) | (arg & 0xfff));
@@ -464,22 +480,27 @@ static inline void tcg_out_movi32(TCGContext *s,
             tcg_out32(s, (cond << 28) | 0x03400000 | (rd << 12)
                       | ((arg >> 12) & 0x000f0000) | ((arg >> 16) & 0xfff));
         }
-    } else {
-        int opc = ARITH_MOV;
-        int rn = 0;
-
-        do {
-            int i, rot;
-
-            i = ctz32(arg) & ~1;
-            rot = ((32 - i) << 7) & 0xf00;
-            tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
-            arg &= ~(0xff << i);
+        return;
+    }
 
-            opc = ARITH_ORR;
-            rn = rd;
-        } while (arg);
+    /* TODO: This is very suboptimal, we can easily have a constant
+       pool somewhere after all the instructions.  */
+    opc = ARITH_MOV;
+    rn = 0;
+    /* If we have lots of leading 1's, we can shorten the sequence by
+       beginning with mvn and then clearing higher bits with eor.  */
+    if (clz32(~arg) > clz32(arg)) {
+        opc = ARITH_MVN, arg = ~arg;
     }
+    do {
+        int i = ctz32(arg) & ~1;
+        rot = ((32 - i) << 7) & 0xf00;
+        tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
+        arg &= ~(0xff << i);
+
+        opc = ARITH_EOR;
+        rn = rd;
+    } while (arg);
 }
 
 static inline void tcg_out_dat_rI(TCGContext *s, int cond, int opc, TCGArg dst,
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7
@ 2013-03-06 10:03 Jay Foad
  0 siblings, 0 replies; 6+ messages in thread
From: Jay Foad @ 2013-03-06 10:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index b6eed1f..cb89419 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -73,10 +73,13 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> -#define TCG_TARGET_HAS_deposit_i32      0
> +#define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_muls2_i32        1
>
> +extern bool tcg_target_deposit_i32_value(int ofs, int len);
> +#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_i32_value

s/_value/_valid/g ?

Jay.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-03-06 10:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-05 15:56 [Qemu-devel] [PATCH 0/4] tcg-arm improvements Richard Henderson
2013-03-05 15:56 ` [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7 Richard Henderson
2013-03-05 15:56 ` [Qemu-devel] [PATCH 2/4] tcg-arm: Use bic to implement and with constant Richard Henderson
2013-03-05 15:56 ` [Qemu-devel] [PATCH 3/4] tcg-arm: Handle negated constant arguments to and/sub Richard Henderson
2013-03-05 15:56 ` [Qemu-devel] [PATCH 4/4] tcg-arm: Improve constant generation Richard Henderson
  -- strict thread matches above, loose matches on Subject: below --
2013-03-06 10:03 [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7 Jay Foad

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).