* [Qemu-devel] [PATCH 0/4] tcg-arm improvements
@ 2013-03-05 15:56 Richard Henderson
2013-03-05 15:56 ` [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7 Richard Henderson
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
To: qemu-devel
Here are a few things I noticed while looking at debugging dumps.
Tested on an A15, compiled for armv7 and armv6.
r~
Richard Henderson (4):
tcg-arm: Implement deposit for armv7
tcg-arm: Use bic to implement and with constant
tcg-arm: Handle negated constant arguments to and/sub
tcg-arm: Improve constant generation
tcg/arm/tcg-target.c | 170 ++++++++++++++++++++++++++++++++++++++++-----------
tcg/arm/tcg-target.h | 7 ++-
2 files changed, 138 insertions(+), 39 deletions(-)
--
1.8.1.2
* [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
To: qemu-devel
We have BFI and BFC available for implementing it.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
tcg/arm/tcg-target.h | 5 ++++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 94c6ca4..3422bd7 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -644,6 +644,35 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
}
}
+bool tcg_target_deposit_i32_value(int ofs, int len)
+{
+ /* ??? Without bfi, we could improve over generic code by combining
+ the right-shift from a non-zero ofs with the orr. We do run into
+ problems when rd == rs, and the mask generated from ofs+len doesn't
+ fit into an immediate. We would have to be careful not to pessimize
+ wrt the optimizations performed on the expanded code. */
+ return use_armv7_instructions;
+}
+
+static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
+ TCGArg a1, int ofs, int len, bool const_a1)
+{
+ if (const_a1) {
+ uint32_t mask = (2u << (len - 1)) - 1;
+ a1 &= mask;
+ if (a1 == 0) {
+ /* bfi becomes bfc with rn == 15. */
+ a1 = 15;
+ } else {
+ tcg_out_movi32(s, cond, TCG_REG_R8, a1);
+ a1 = TCG_REG_R8;
+ }
+ }
+ /* bfi/bfc */
+ tcg_out32(s, 0x07c00010 | (cond << 28) | (rd << 12) | a1
+ | (ofs << 7) | ((ofs + len - 1) << 16));
+}
+
static inline void tcg_out_ld32_12(TCGContext *s, int cond,
int rd, int rn, tcg_target_long im)
{
@@ -1773,6 +1802,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
tcg_out_ext16u(s, COND_AL, args[0], args[1]);
break;
+ case INDEX_op_deposit_i32:
+ tcg_out_deposit(s, COND_AL, args[0], args[2],
+ args[3], args[4], const_args[2]);
+ break;
+
default:
tcg_abort();
}
@@ -1858,6 +1892,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
{ INDEX_op_ext16s_i32, { "r", "r" } },
{ INDEX_op_ext16u_i32, { "r", "r" } },
+ { INDEX_op_deposit_i32, { "r", "0", "ri" } },
+
{ -1 },
};
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index b6eed1f..cb89419 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -73,10 +73,13 @@ typedef enum {
#define TCG_TARGET_HAS_eqv_i32 0
#define TCG_TARGET_HAS_nand_i32 0
#define TCG_TARGET_HAS_nor_i32 0
-#define TCG_TARGET_HAS_deposit_i32 0
+#define TCG_TARGET_HAS_deposit_i32 1
#define TCG_TARGET_HAS_movcond_i32 1
#define TCG_TARGET_HAS_muls2_i32 1
+extern bool tcg_target_deposit_i32_value(int ofs, int len);
+#define TCG_TARGET_deposit_i32_valid tcg_target_deposit_i32_value
+
enum {
TCG_AREG0 = TCG_REG_R6,
};
--
1.8.1.2
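For readers following the encoding in tcg_out_deposit(), here is a minimal standalone sketch of what a 32-bit deposit computes and how the BFI/BFC word is assembled. The helper names deposit32() and encode_bfi() are hypothetical and are not part of the patch; the field layout is copied from the tcg_out32() call above, and the asserts on ofs/len are an assumption about valid input rather than something the patch checks.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of a 32-bit deposit: replace bits
   [ofs, ofs+len) of base with the low len bits of val. */
static uint32_t deposit32(uint32_t base, int ofs, int len, uint32_t val)
{
    assert(len >= 1 && ofs >= 0 && ofs + len <= 32);
    /* Same mask idiom as the patch; well defined even for len == 32. */
    uint32_t mask = (2u << (len - 1)) - 1;
    return (base & ~(mask << ofs)) | ((val & mask) << ofs);
}

/* A32 word for "bfi rd, rn, #ofs, #len", using the field layout from the
   patch: lsb in bits 7..11, msb in bits 16..20.  rn == 15 selects bfc. */
static uint32_t encode_bfi(uint32_t cond, uint32_t rd, uint32_t rn,
                           int ofs, int len)
{
    assert(len >= 1 && ofs >= 0 && ofs + len <= 32);
    return 0x07c00010u | (cond << 28) | (rd << 12) | rn
         | ((uint32_t)ofs << 7) | ((uint32_t)(ofs + len - 1) << 16);
}
```

With cond = 0xe (always), encode_bfi(0xe, 0, 1, 8, 8) corresponds to "bfi r0, r1, #8, #8", and passing rn = 15 gives the bfc form that the patch uses when the constant argument masks to zero.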
* [Qemu-devel] [PATCH 2/4] tcg-arm: Use bic to implement and with constant
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
To: qemu-devel
This greatly improves the code we can produce for deposit
without armv7 support.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/arm/tcg-target.c | 38 +++++++++++++++++++++++++++++---------
tcg/arm/tcg-target.h | 2 --
2 files changed, 29 insertions(+), 11 deletions(-)
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 3422bd7..6618571 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -145,6 +145,9 @@ static void patch_reloc(uint8_t *code_ptr, int type,
}
}
+#define TCG_CT_CONST_ARM 0x100
+#define TCG_CT_CONST_INV 0x200
+
/* parse target specific constraints */
static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
{
@@ -155,6 +158,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
case 'I':
ct->ct |= TCG_CT_CONST_ARM;
break;
+ case 'K':
+ ct->ct |= TCG_CT_CONST_INV;
+ break;
case 'r':
ct->ct |= TCG_CT_REG;
@@ -275,16 +281,19 @@ static inline int check_fit_imm(uint32_t imm)
* add, sub, eor...: ditto
*/
static inline int tcg_target_const_match(tcg_target_long val,
- const TCGArgConstraint *arg_ct)
+ const TCGArgConstraint *arg_ct)
{
int ct;
ct = arg_ct->ct;
- if (ct & TCG_CT_CONST)
+ if (ct & TCG_CT_CONST) {
return 1;
- else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val))
+ } else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val)) {
return 1;
- else
+ } else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
+ return 1;
+ } else {
return 0;
+ }
}
enum arm_data_opc_e {
@@ -1535,6 +1544,7 @@ static uint8_t *tb_ret_addr;
static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
const TCGArg *args, const int *const_args)
{
+ TCGArg a0, a1, a2;
int c;
switch (opc) {
@@ -1639,11 +1649,19 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
c = ARITH_SUB;
goto gen_arith;
case INDEX_op_and_i32:
+ a0 = args[0], a1 = args[1], a2 = args[2];
c = ARITH_AND;
- goto gen_arith;
+ if (const_args[2] && check_fit_imm(~a2)) {
+ c = ARITH_BIC, a2 = ~a2;
+ }
+ goto gen_arith2;
case INDEX_op_andc_i32:
+ a0 = args[0], a1 = args[1], a2 = args[2];
c = ARITH_BIC;
- goto gen_arith;
+ if (const_args[2] && check_fit_imm(~a2)) {
+ c = ARITH_AND, a2 = ~a2;
+ }
+ goto gen_arith2;
case INDEX_op_or_i32:
c = ARITH_ORR;
goto gen_arith;
@@ -1651,7 +1669,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
c = ARITH_EOR;
/* Fall through. */
gen_arith:
- tcg_out_dat_rI(s, COND_AL, c, args[0], args[1], args[2], const_args[2]);
+ a0 = args[0], a1 = args[1], a2 = args[2];
+ gen_arith2:
+ tcg_out_dat_rI(s, COND_AL, c, a0, a1, a2, const_args[2]);
break;
case INDEX_op_add2_i32:
tcg_out_dat_reg2(s, COND_AL, ARITH_ADD, ARITH_ADC,
@@ -1836,8 +1856,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
{ INDEX_op_mul_i32, { "r", "r", "r" } },
{ INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
{ INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
- { INDEX_op_and_i32, { "r", "r", "rI" } },
- { INDEX_op_andc_i32, { "r", "r", "rI" } },
+ { INDEX_op_and_i32, { "r", "r", "rIK" } },
+ { INDEX_op_andc_i32, { "r", "r", "rIK" } },
{ INDEX_op_or_i32, { "r", "r", "rI" } },
{ INDEX_op_xor_i32, { "r", "r", "rI" } },
{ INDEX_op_neg_i32, { "r", "r" } },
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index cb89419..c4970d6 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -49,8 +49,6 @@ typedef enum {
#define TCG_TARGET_NB_REGS 16
-#define TCG_CT_CONST_ARM 0x100
-
/* used for function call generation */
#define TCG_REG_CALL_STACK TCG_REG_R13
#define TCG_TARGET_STACK_ALIGN 8
--
1.8.1.2
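The 'K' constraint relies on the ARM data-processing immediate format: an 8-bit value rotated right by an even amount. The sketch below illustrates the idea of trying the constant first and its complement second. fits_arm_imm() is a hypothetical stand-in for check_fit_imm(), whose body is not shown in this patch, and pick_and_form() is an illustrative helper, not code from tcg-target.c.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return (v << n) | (v >> ((32 - n) & 31));
}

/* True if v is an 8-bit value rotated by an even amount. */
static bool fits_arm_imm(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if ((rotl32(v, rot) & ~0xffu) == 0) {
            return true;
        }
    }
    return false;
}

enum and_form { AND_IMM, BIC_IMM, AND_REG };

/* Choose how to emit "and rd, rn, #c".  On return *imm holds the
   immediate to encode (c for and, ~c for bic). */
static enum and_form pick_and_form(uint32_t c, uint32_t *imm)
{
    if (fits_arm_imm(c)) {
        *imm = c;
        return AND_IMM;
    }
    if (fits_arm_imm(~c)) {
        *imm = ~c;
        return BIC_IMM;          /* and with c == bic with ~c */
    }
    *imm = c;
    return AND_REG;              /* constant must go through a register */
}
```

For example, a mask such as 0xffffff00 is not encodable directly, but its complement 0xff is, so the and becomes "bic rd, rn, #0xff". The andc case in the patch is the mirror image: bic with an unencodable constant becomes and with its complement.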
* [Qemu-devel] [PATCH 3/4] tcg-arm: Handle negated constant arguments to and/sub
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
To: qemu-devel
This greatly improves code generation for addition of small
negative constants.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/arm/tcg-target.c | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 6618571..25d7f5c 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -147,6 +147,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
#define TCG_CT_CONST_ARM 0x100
#define TCG_CT_CONST_INV 0x200
+#define TCG_CT_CONST_NEG 0x400
/* parse target specific constraints */
static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
@@ -161,7 +162,10 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
case 'K':
ct->ct |= TCG_CT_CONST_INV;
break;
-
+ case 'N': /* The gcc constraint letter is L, already used here. */
+ ct->ct |= TCG_CT_CONST_NEG;
+ break;
+
case 'r':
ct->ct |= TCG_CT_REG;
tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
@@ -291,6 +295,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
return 1;
} else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
return 1;
+ } else if ((ct & TCG_CT_CONST_NEG) && check_fit_imm(-val)) {
+ return 1;
} else {
return 0;
}
@@ -1643,11 +1649,26 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
ARITH_MOV, args[0], 0, args[3], const_args[3]);
break;
case INDEX_op_add_i32:
+ a0 = args[0], a1 = args[1], a2 = args[2];
c = ARITH_ADD;
- goto gen_arith;
+ if (const_args[2] && check_fit_imm(-a2)) {
+ c = ARITH_SUB, a2 = -a2;
+ }
+ goto gen_arith2;
case INDEX_op_sub_i32:
+ a0 = args[0], a1 = args[1], a2 = args[2];
c = ARITH_SUB;
- goto gen_arith;
+ if (const_args[1]) {
+ if (const_args[2]) {
+ tcg_out_movi32(s, COND_AL, a0, a1 - a2);
+ } else {
+ tcg_out_dat_rI(s, COND_AL, ARITH_RSB, a0, a2, a1, 1);
+ }
+ break;
+ } else if (const_args[2] && check_fit_imm(-a2)) {
+ c = ARITH_ADD, a2 = -a2;
+ }
+ goto gen_arith2;
case INDEX_op_and_i32:
a0 = args[0], a1 = args[1], a2 = args[2];
c = ARITH_AND;
@@ -1851,8 +1872,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
{ INDEX_op_st_i32, { "r", "r" } },
/* TODO: "r", "r", "ri" */
- { INDEX_op_add_i32, { "r", "r", "rI" } },
- { INDEX_op_sub_i32, { "r", "r", "rI" } },
+ { INDEX_op_add_i32, { "r", "r", "rIN" } },
+ { INDEX_op_sub_i32, { "r", "rI", "rIN" } },
{ INDEX_op_mul_i32, { "r", "r", "r" } },
{ INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
{ INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
--
1.8.1.2
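The same trick applies to add and sub with the 'N' constraint: when the constant itself is not an encodable immediate but its negation is, the opposite operation can be emitted. A rough sketch under the same assumptions as before, with fits_arm_imm() as a hypothetical stand-in for check_fit_imm() and pick_add_form() as an illustrative helper only:

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return (v << n) | (v >> ((32 - n) & 31));
}

/* True if v is an 8-bit value rotated by an even amount. */
static bool fits_arm_imm(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if ((rotl32(v, rot) & ~0xffu) == 0) {
            return true;
        }
    }
    return false;
}

enum arith_op { OP_ADD, OP_SUB, OP_VIA_REG };

/* Choose how to emit "add rd, rn, #c".  On return *imm holds the
   immediate to encode (c for add, -c for sub). */
static enum arith_op pick_add_form(uint32_t c, uint32_t *imm)
{
    uint32_t neg = 0u - c;       /* unsigned negate, no overflow issues */

    if (fits_arm_imm(c)) {
        *imm = c;
        return OP_ADD;
    }
    if (fits_arm_imm(neg)) {
        *imm = neg;
        return OP_SUB;           /* rn + c == rn - (-c) */
    }
    *imm = c;
    return OP_VIA_REG;
}
```

So an add of -1 (0xffffffff, not encodable) turns into "sub rd, rn, #1". The sub_i32 case in the patch additionally accepts a constant first operand and uses rsb for "constant minus register", which this sketch does not model.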
* [Qemu-devel] [PATCH 4/4] tcg-arm: Improve constant generation
From: Richard Henderson @ 2013-03-05 15:56 UTC (permalink / raw)
To: qemu-devel
Try fully rotated arguments to mov and mvn before trying movt
or full decomposition. Begin decomposition with mvn when it
looks like it'll help. Examples include
-: mov r9, #0x00000fa0
-: orr r9, r9, #0x000ee000
-: orr r9, r9, #0x0ff00000
-: orr r9, r9, #0xf0000000
+: mvn r9, #0x0000005f
+: eor r9, r9, #0x00011000
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/arm/tcg-target.c | 67 ++++++++++++++++++++++++++++++++++------------------
1 file changed, 44 insertions(+), 23 deletions(-)
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 25d7f5c..59084a3 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -447,15 +447,31 @@ static inline void tcg_out_dat_imm(TCGContext *s,
(rn << 16) | (rd << 12) | im);
}
-static inline void tcg_out_movi32(TCGContext *s,
- int cond, int rd, uint32_t arg)
-{
- /* TODO: This is very suboptimal, we can easily have a constant
- * pool somewhere after all the instructions. */
- if ((int)arg < 0 && (int)arg >= -0x100) {
- tcg_out_dat_imm(s, cond, ARITH_MVN, rd, 0, (~arg) & 0xff);
- } else if (use_armv7_instructions) {
- /* use movw/movt */
+static void tcg_out_movi32(TCGContext *s, int cond, int rd, uint32_t arg)
+{
+ int rot, opc, rn;
+
+ /* For armv7, make sure not to use movw+movt when mov/mvn would do.
+ Speed things up by only checking when movt would be required.
+ Prior to armv7, have one go at fully rotated immediates before
+ doing the decomposition thing below. */
+ if (!use_armv7_instructions || (arg & 0xffff0000)) {
+ rot = encode_imm(arg);
+ if (rot >= 0) {
+ tcg_out_dat_imm(s, cond, ARITH_MOV, rd, 0,
+ rotl(arg, rot) | (rot << 7));
+ return;
+ }
+ rot = encode_imm(~arg);
+ if (rot >= 0) {
+ tcg_out_dat_imm(s, cond, ARITH_MVN, rd, 0,
+ rotl(~arg, rot) | (rot << 7));
+ return;
+ }
+ }
+
+ /* Use movw + movt. */
+ if (use_armv7_instructions) {
/* movw */
tcg_out32(s, (cond << 28) | 0x03000000 | (rd << 12)
| ((arg << 4) & 0x000f0000) | (arg & 0xfff));
@@ -464,22 +480,27 @@ static inline void tcg_out_movi32(TCGContext *s,
tcg_out32(s, (cond << 28) | 0x03400000 | (rd << 12)
| ((arg >> 12) & 0x000f0000) | ((arg >> 16) & 0xfff));
}
- } else {
- int opc = ARITH_MOV;
- int rn = 0;
-
- do {
- int i, rot;
-
- i = ctz32(arg) & ~1;
- rot = ((32 - i) << 7) & 0xf00;
- tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
- arg &= ~(0xff << i);
+ return;
+ }
- opc = ARITH_ORR;
- rn = rd;
- } while (arg);
+ /* TODO: This is very suboptimal, we can easily have a constant
+ pool somewhere after all the instructions. */
+ opc = ARITH_MOV;
+ rn = 0;
+ /* If we have lots of leading 1's, we can shorten the sequence by
+ beginning with mvn and then clearing higher bits with eor. */
+ if (clz32(~arg) > clz32(arg)) {
+ opc = ARITH_MVN, arg = ~arg;
}
+ do {
+ int i = ctz32(arg) & ~1;
+ rot = ((32 - i) << 7) & 0xf00;
+ tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
+ arg &= ~(0xff << i);
+
+ opc = ARITH_EOR;
+ rn = rd;
+ } while (arg);
}
static inline void tcg_out_dat_rI(TCGContext *s, int cond, int opc, TCGArg dst,
--
1.8.1.2
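To see why starting with mvn shortens the sequence, the decomposition loop can be modelled outside of TCG. The following is a simplified sketch, not the patch's code: struct insn, decompose() and run() are hypothetical names, immediates are kept as plain 32-bit values instead of the rotated 8-bit encoding, and an extra guard handles arg == 0 (which the real function catches earlier via encode_imm()).

```c
#include <stdint.h>

enum opc { OPC_MOV, OPC_MVN, OPC_EOR };
struct insn { enum opc opc; uint32_t imm; };

static int clz32(uint32_t v)
{
    int n = 0;
    while (n < 32 && !(v & 0x80000000u)) { v <<= 1; n++; }
    return n;
}

static int ctz32(uint32_t v)
{
    int n = 0;
    while (n < 32 && !(v & 1u)) { v >>= 1; n++; }
    return n;
}

/* Split arg into at most four 8-bit chunks at even bit positions.
   Returns the number of instructions written to out[]. */
static int decompose(uint32_t arg, struct insn out[4])
{
    enum opc opc = OPC_MOV;
    int n = 0;

    /* Lots of leading ones: build the complement and start with mvn. */
    if (clz32(~arg) > clz32(arg)) {
        opc = OPC_MVN;
        arg = ~arg;
    }
    do {
        int i = arg ? (ctz32(arg) & ~1) : 0;   /* guard for arg == 0 */
        uint32_t chunk = arg & (0xffu << i);
        out[n].opc = opc;
        out[n].imm = chunk;
        n++;
        arg &= ~chunk;
        opc = OPC_EOR;       /* chunks are disjoint, so eor sets or clears */
    } while (arg);
    return n;
}

/* Evaluate a sequence the way the CPU would, on a single register. */
static uint32_t run(const struct insn *seq, int n)
{
    uint32_t r = 0;
    for (int k = 0; k < n; k++) {
        switch (seq[k].opc) {
        case OPC_MOV: r = seq[k].imm; break;
        case OPC_MVN: r = ~seq[k].imm; break;
        case OPC_EOR: r ^= seq[k].imm; break;
        }
    }
    return r;
}
```

Feeding in 0xfffeefa0, the constant from the commit message, yields mvn #0x5f followed by eor #0x11000. Because the chunks never overlap, eor behaves like orr after a mov and like bic after an mvn, which is why a single follow-up opcode covers both cases.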
* Re: [Qemu-devel] [PATCH 1/4] tcg-arm: Implement deposit for armv7
From: Jay Foad @ 2013-03-06 10:03 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index b6eed1f..cb89419 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -73,10 +73,13 @@ typedef enum {
> #define TCG_TARGET_HAS_eqv_i32 0
> #define TCG_TARGET_HAS_nand_i32 0
> #define TCG_TARGET_HAS_nor_i32 0
> -#define TCG_TARGET_HAS_deposit_i32 0
> +#define TCG_TARGET_HAS_deposit_i32 1
> #define TCG_TARGET_HAS_movcond_i32 1
> #define TCG_TARGET_HAS_muls2_i32 1
>
> +extern bool tcg_target_deposit_i32_value(int ofs, int len);
> +#define TCG_TARGET_deposit_i32_valid tcg_target_deposit_i32_value
Should the function be named tcg_target_deposit_i32_valid, to match the
macro? I.e. s/_value/_valid/g ?
Jay.