qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes
@ 2009-10-24 12:18 juha.riihimaki
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 01/10] target-arm: fix neon vshrn/vrshrn ops juha.riihimaki
                   ` (10 more replies)
  0 siblings, 11 replies; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:18 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

This is the revised set of patches for the ARM translator and it includes a number of smaller fixes and improvements. The series should be applied in sequence as the modifications are mostly related to the same file, target-arm/translate.c. The whole series should apply cleanly against latest git.

Note that the three patches have been removed from this series compared to the previous version and one new one added.


Juha Riihimäki (10):
  target-arm: fix neon vshrn/vrshrn ops
  target-arm: add support for neon vld1.64/vst1.64 instructions
  target-arm: allow modifying vfp fpexc en bit only
  target-arm: optimize vfp load/store multiple ops
  target-arm: optimize arm load/store multiple ops
  target-arm: fix neon vsri, vshl and vsli ops
  target-arm: optimize thumb2 load/store multiple ops
  target-arm: optimize thumb push/pop ops
  target-arm: optimize neon vld/vst ops
  target-arm: fix neon shift helper functions

 target-arm/neon_helper.c |   12 ++--
 target-arm/translate.c   |  205 +++++++++++++++++++++++++++-------------------
 2 files changed, 128 insertions(+), 89 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 01/10] target-arm: fix neon vshrn/vrshrn ops
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-25 10:58   ` Laurent Desnogues
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 02/10] target-arm: add support for neon vld1.64/vst1.64 instructions juha.riihimaki
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

In the existing code shift value is clobbered during the pass loop.
This patch changes the code so that it stores the intermediate
result in the target neon register directly and eliminates the need
to use a temporary to hold the intermediate value thus leaving the
shift value in the temporary variable intact. This is a new patch
in this version of the patch series.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |   12 +++---------
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 9d13d42..8a85db6 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4680,18 +4680,12 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                         else
                             gen_neon_narrow_satu(size - 1, tmp, cpu_V0);
                     }
-                    if (pass == 0) {
-                        if (size != 3) {
-                            dead_tmp(tmp2);
-                        }
-                        tmp2 = tmp;
-                    } else {
-                        neon_store_reg(rd, 0, tmp2);
-                        neon_store_reg(rd, 1, tmp);
-                    }
+                    neon_store_reg(rd, pass, tmp);
                 } /* for pass */
                 if (size == 3) {
                     tcg_temp_free_i64(tmp64);
+                } else {
+                    dead_tmp(tmp2);
                 }
             } else if (op == 10) {
                 /* VSHLL */
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 02/10] target-arm: add support for neon vld1.64/vst1.64 instructions
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 01/10] target-arm: fix neon vshrn/vrshrn ops juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-25 11:11   ` Laurent Desnogues
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only juha.riihimaki
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Add support for NEON vld1.64 and vst1.64 instructions. This patch is
revised to follow more closely the specification and raises
undefined exception if 64bit element size is used for vld2/vst2 or
vld4/vst4 instructions.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |  133 +++++++++++++++++++++++++++++-------------------
 1 files changed, 81 insertions(+), 52 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 8a85db6..09c996d 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -795,6 +795,12 @@ static inline TCGv gen_ld32(TCGv addr, int index)
     tcg_gen_qemu_ld32u(tmp, addr, index);
     return tmp;
 }
+static inline TCGv_i64 gen_ld64(TCGv addr, int index)
+{
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    tcg_gen_qemu_ld64(tmp, addr, index);
+    return tmp;
+}
 static inline void gen_st8(TCGv val, TCGv addr, int index)
 {
     tcg_gen_qemu_st8(val, addr, index);
@@ -810,6 +816,11 @@ static inline void gen_st32(TCGv val, TCGv addr, int index)
     tcg_gen_qemu_st32(val, addr, index);
     dead_tmp(val);
 }
+static inline void gen_st64(TCGv_i64 val, TCGv addr, int index)
+{
+    tcg_gen_qemu_st64(val, addr, index);
+    tcg_temp_free_i64(val);
+}
 
 static inline void gen_set_pc_im(uint32_t val)
 {
@@ -3680,6 +3691,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
     int op;
     int nregs;
     int interleave;
+    int spacing;
     int stride;
     int size;
     int reg;
@@ -3690,6 +3702,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
     TCGv addr;
     TCGv tmp;
     TCGv tmp2;
+    TCGv_i64 tmp64;
 
     if (!vfp_enabled(env))
       return 1;
@@ -3702,10 +3715,13 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
         /* Load store all elements.  */
         op = (insn >> 8) & 0xf;
         size = (insn >> 6) & 3;
-        if (op > 10 || size == 3)
+        if (op > 10)
             return 1;
         nregs = neon_ls_element_type[op].nregs;
         interleave = neon_ls_element_type[op].interleave;
+        spacing = neon_ls_element_type[op].spacing;
+        if (size == 3 && (interleave | spacing) != 1)
+            return 1;
         load_reg_var(s, addr, rn);
         stride = (1 << size) * interleave;
         for (reg = 0; reg < nregs; reg++) {
@@ -3716,65 +3732,78 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 load_reg_var(s, addr, rn);
                 tcg_gen_addi_i32(addr, addr, 1 << size);
             }
-            for (pass = 0; pass < 2; pass++) {
-                if (size == 2) {
-                    if (load) {
-                        tmp = gen_ld32(addr, IS_USER(s));
-                        neon_store_reg(rd, pass, tmp);
-                    } else {
-                        tmp = neon_load_reg(rd, pass);
-                        gen_st32(tmp, addr, IS_USER(s));
-                    }
-                    tcg_gen_addi_i32(addr, addr, stride);
-                } else if (size == 1) {
-                    if (load) {
-                        tmp = gen_ld16u(addr, IS_USER(s));
-                        tcg_gen_addi_i32(addr, addr, stride);
-                        tmp2 = gen_ld16u(addr, IS_USER(s));
-                        tcg_gen_addi_i32(addr, addr, stride);
-                        gen_bfi(tmp, tmp, tmp2, 16, 0xffff);
-                        dead_tmp(tmp2);
-                        neon_store_reg(rd, pass, tmp);
-                    } else {
-                        tmp = neon_load_reg(rd, pass);
-                        tmp2 = new_tmp();
-                        tcg_gen_shri_i32(tmp2, tmp, 16);
-                        gen_st16(tmp, addr, IS_USER(s));
-                        tcg_gen_addi_i32(addr, addr, stride);
-                        gen_st16(tmp2, addr, IS_USER(s));
+            if (size == 3) {
+                if (load) {
+                    tmp64 = gen_ld64(addr, IS_USER(s));
+                    neon_store_reg64(tmp64, rd);
+                    tcg_temp_free_i64(tmp64);
+                } else {
+                    tmp64 = tcg_temp_new_i64();
+                    neon_load_reg64(tmp64, rd);
+                    gen_st64(tmp64, addr, IS_USER(s));
+                }
+                tcg_gen_addi_i32(addr, addr, stride);
+            } else {
+                for (pass = 0; pass < 2; pass++) {
+                    if (size == 2) {
+                        if (load) {
+                            tmp = gen_ld32(addr, IS_USER(s));
+                            neon_store_reg(rd, pass, tmp);
+                        } else {
+                            tmp = neon_load_reg(rd, pass);
+                            gen_st32(tmp, addr, IS_USER(s));
+                        }
                         tcg_gen_addi_i32(addr, addr, stride);
-                    }
-                } else /* size == 0 */ {
-                    if (load) {
-                        TCGV_UNUSED(tmp2);
-                        for (n = 0; n < 4; n++) {
-                            tmp = gen_ld8u(addr, IS_USER(s));
+                    } else if (size == 1) {
+                        if (load) {
+                            tmp = gen_ld16u(addr, IS_USER(s));
+                            tcg_gen_addi_i32(addr, addr, stride);
+                            tmp2 = gen_ld16u(addr, IS_USER(s));
+                            tcg_gen_addi_i32(addr, addr, stride);
+                            gen_bfi(tmp, tmp, tmp2, 16, 0xffff);
+                            dead_tmp(tmp2);
+                            neon_store_reg(rd, pass, tmp);
+                        } else {
+                            tmp = neon_load_reg(rd, pass);
+                            tmp2 = new_tmp();
+                            tcg_gen_shri_i32(tmp2, tmp, 16);
+                            gen_st16(tmp, addr, IS_USER(s));
+                            tcg_gen_addi_i32(addr, addr, stride);
+                            gen_st16(tmp2, addr, IS_USER(s));
                             tcg_gen_addi_i32(addr, addr, stride);
-                            if (n == 0) {
-                                tmp2 = tmp;
-                            } else {
-                                gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff);
-                                dead_tmp(tmp);
-                            }
                         }
-                        neon_store_reg(rd, pass, tmp2);
-                    } else {
-                        tmp2 = neon_load_reg(rd, pass);
-                        for (n = 0; n < 4; n++) {
-                            tmp = new_tmp();
-                            if (n == 0) {
-                                tcg_gen_mov_i32(tmp, tmp2);
-                            } else {
-                                tcg_gen_shri_i32(tmp, tmp2, n * 8);
+                    } else /* size == 0 */ {
+                        if (load) {
+                            TCGV_UNUSED(tmp2);
+                            for (n = 0; n < 4; n++) {
+                                tmp = gen_ld8u(addr, IS_USER(s));
+                                tcg_gen_addi_i32(addr, addr, stride);
+                                if (n == 0) {
+                                    tmp2 = tmp;
+                                } else {
+                                    gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff);
+                                    dead_tmp(tmp);
+                                }
                             }
-                            gen_st8(tmp, addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
+                            neon_store_reg(rd, pass, tmp2);
+                        } else {
+                            tmp2 = neon_load_reg(rd, pass);
+                            for (n = 0; n < 4; n++) {
+                                tmp = new_tmp();
+                                if (n == 0) {
+                                    tcg_gen_mov_i32(tmp, tmp2);
+                                } else {
+                                    tcg_gen_shri_i32(tmp, tmp2, n * 8);
+                                }
+                                gen_st8(tmp, addr, IS_USER(s));
+                                tcg_gen_addi_i32(addr, addr, stride);
+                            }
+                            dead_tmp(tmp2);
                         }
-                        dead_tmp(tmp2);
                     }
                 }
             }
-            rd += neon_ls_element_type[op].spacing;
+            rd += spacing;
         }
         stride = nregs * 8;
     } else {
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 01/10] target-arm: fix neon vshrn/vrshrn ops juha.riihimaki
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 02/10] target-arm: add support for neon vld1.64/vst1.64 instructions juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-25 12:23   ` Laurent Desnogues
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops juha.riihimaki
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

All other bits except for the EN in the VFP FPEXC register are defined
as subarchitecture specific and real functionality for any of the
other bits has not been implemented in QEMU. However, current code
allows modifying all bits in the VFP FPEXC register leading to
problems when guest code is writing 1's to the subarchitecture
specific bits and checking whether the bits stay up to verify the
existence of functionality which in fact does not exist in QEMU.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 09c996d..8cb1c0f 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -2788,6 +2788,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
                         case ARM_VFP_FPEXC:
                             if (IS_USER(s))
                                 return 1;
+                            /* TODO: VFP subarchitecture support.
+                             * For now, keep the EN bit only */
+                            tcg_gen_andi_i32(tmp, tmp, 1 << 30);
                             store_cpu_field(tmp, vfp.xregs[rn]);
                             gen_lookup_tb(s);
                             break;
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (2 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-24 17:36   ` Laurent Desnogues
  2009-10-27  8:42   ` Aurelien Jarno
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm " juha.riihimaki
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

VFP load/store multiple instructions can be slightly optimized by
loading the register offset constant into a variable outside the
register loop and using the preloaded variable inside the loop instead
of reloading the offset value to a temporary variable on each loop
iteration. This causes less TCG ops to be generated for a VFP load/
store multiple instruction if there are more than one register
accessed, otherwise the amount of generated TCG ops is the same.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 8cb1c0f..38fb833 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3222,6 +3222,7 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     offset = 8;
                 else
                     offset = 4;
+                tmp = tcg_const_i32(offset);
                 for (i = 0; i < n; i++) {
                     if (insn & ARM_CP_RW_BIT) {
                         /* load */
@@ -3232,8 +3233,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
                         gen_mov_F0_vreg(dp, rd + i);
                         gen_vfp_st(s, dp, addr);
                     }
-                    tcg_gen_addi_i32(addr, addr, offset);
+                    tcg_gen_add_i32(addr, addr, tmp);
                 }
+                tcg_temp_free_i32(tmp);
                 if (insn & (1 << 21)) {
                     /* writeback */
                     if (insn & (1 << 24))
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm load/store multiple ops
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (3 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-27  8:39   ` Aurelien Jarno
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 06/10] target-arm: fix neon vsri, vshl and vsli ops juha.riihimaki
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

RM load/store multiple instructions can be slightly optimized by
loading the register offset constant into a variable outside the
register loop and using the preloaded variable inside the loop instead
of reloading the offset value to a temporary variable on each loop
iteration. This causes less TCG ops to be generated for a ARM load/
store multiple instruction if there are more than one register
accessed, otherwise the number of generated TCG ops is the same.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>
---
 target-arm/translate.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 38fb833..d1e2ed2 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -6852,6 +6852,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                 }
                 rn = (insn >> 16) & 0xf;
                 addr = load_reg(s, rn);
+                tmp2 = tcg_const_i32(4);
 
                 /* compute total size */
                 loaded_base = 0;
@@ -6865,7 +6866,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                 if (insn & (1 << 23)) {
                     if (insn & (1 << 24)) {
                         /* pre increment */
-                        tcg_gen_addi_i32(addr, addr, 4);
+                        tcg_gen_add_i32(addr, addr, tmp2);
                     } else {
                         /* post increment */
                     }
@@ -6918,7 +6919,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                         j++;
                         /* no need to add after the last transfer */
                         if (j != n)
-                            tcg_gen_addi_i32(addr, addr, 4);
+                            tcg_gen_add_i32(addr, addr, tmp2);
                     }
                 }
                 if (insn & (1 << 21)) {
@@ -6928,7 +6929,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                             /* pre increment */
                         } else {
                             /* post increment */
-                            tcg_gen_addi_i32(addr, addr, 4);
+                            tcg_gen_add_i32(addr, addr, tmp2);
                         }
                     } else {
                         if (insn & (1 << 24)) {
@@ -6944,6 +6945,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                 } else {
                     dead_tmp(addr);
                 }
+                tcg_temp_free_i32(tmp2);
                 if (loaded_base) {
                     store_reg(s, rn, loaded_var);
                 }
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 06/10] target-arm: fix neon vsri, vshl and vsli ops
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (4 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm " juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-24 17:44   ` Laurent Desnogues
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 07/10] target-arm: optimize thumb2 load/store multiple ops juha.riihimaki
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Shift by immediate value is incorrectly overwritten by a temporary
variable in the processing of NEON vsri, vshl and vsli instructions.
This patch has been revised to also include a fix for the special
case where the code would previously try to shift an integer value
over 31 bits left/right.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |   32 ++++++++++++++++++--------------
 1 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index d1e2ed2..9e924d4 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4098,7 +4098,7 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
     int pairwise;
     int u;
     int n;
-    uint32_t imm;
+    uint32_t imm, mask;
     TCGv tmp, tmp2, tmp3, tmp4, tmp5;
     TCGv_i64 tmp64;
 
@@ -4626,31 +4626,35 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                             switch (size) {
                             case 0:
                                 if (op == 4)
-                                    imm = 0xff >> -shift;
+                                    mask = 0xff >> -shift;
                                 else
-                                    imm = (uint8_t)(0xff << shift);
-                                imm |= imm << 8;
-                                imm |= imm << 16;
+                                    mask = (uint8_t)(0xff << shift);
+                                mask |= mask << 8;
+                                mask |= mask << 16;
                                 break;
                             case 1:
                                 if (op == 4)
-                                    imm = 0xffff >> -shift;
+                                    mask = 0xffff >> -shift;
                                 else
-                                    imm = (uint16_t)(0xffff << shift);
-                                imm |= imm << 16;
+                                    mask = (uint16_t)(0xffff << shift);
+                                mask |= mask << 16;
                                 break;
                             case 2:
-                                if (op == 4)
-                                    imm = 0xffffffffu >> -shift;
-                                else
-                                    imm = 0xffffffffu << shift;
+                                if (shift < -31 || shift > 31) {
+                                    mask = 0;
+                                } else {
+                                    if (op == 4)
+                                        mask = 0xffffffffu >> -shift;
+                                    else
+                                        mask = 0xffffffffu << shift;
+                                }
                                 break;
                             default:
                                 abort();
                             }
                             tmp2 = neon_load_reg(rd, pass);
-                            tcg_gen_andi_i32(tmp, tmp, imm);
-                            tcg_gen_andi_i32(tmp2, tmp2, ~imm);
+                            tcg_gen_andi_i32(tmp, tmp, mask);
+                            tcg_gen_andi_i32(tmp2, tmp2, ~mask);
                             tcg_gen_or_i32(tmp, tmp, tmp2);
                             dead_tmp(tmp2);
                         }
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 07/10] target-arm: optimize thumb2 load/store multiple ops
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (5 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 06/10] target-arm: fix neon vsri, vshl and vsli ops juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-24 17:32   ` Laurent Desnogues
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 08/10] target-arm: optimize thumb push/pop ops juha.riihimaki
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Thumb2 load/store multiple instructions can be slightly optimized by
loading the register offset constant into a variable outside the
register loop and using the preloaded variable inside the loop instead
of reloading the offset value to a temporary variable on each loop
iteration. This causes less TCG ops to be generated for a Thumb2 load/
store multiple instruction if there are more than one register
accessed, otherwise the amount of generated TCG ops will be the same.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 9e924d4..353f638 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7374,6 +7374,7 @@ static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1)
                     tcg_gen_addi_i32(addr, addr, -offset);
                 }
 
+                tmp2 = tcg_const_i32(4);
                 for (i = 0; i < 16; i++) {
                     if ((insn & (1 << i)) == 0)
                         continue;
@@ -7390,8 +7391,9 @@ static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1)
                         tmp = load_reg(s, i);
                         gen_st32(tmp, addr, IS_USER(s));
                     }
-                    tcg_gen_addi_i32(addr, addr, 4);
+                    tcg_gen_add_i32(addr, addr, tmp2);
                 }
+                tcg_temp_free_i32(tmp2);
                 if (insn & (1 << 21)) {
                     /* Base register writeback.  */
                     if (insn & (1 << 24)) {
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 08/10] target-arm: optimize thumb push/pop ops
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (6 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 07/10] target-arm: optimize thumb2 load/store multiple ops juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-24 17:34   ` Laurent Desnogues
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops juha.riihimaki
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Thumb push/pop instructions can be slightly optimized by loading the
register offset constant into a variable outside the register loop and
using the preloaded variable inside the loop instead of reloading the
offset value to a temporary variable on each loop iteration. This
causes less TCG ops to be generated for a Thumb push/pop instruction
if there are more than one register accessed, otherwise the amount of
generated TCG ops is the same.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 353f638..f262758 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8596,6 +8596,7 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s)
             if ((insn & (1 << 11)) == 0) {
                 tcg_gen_addi_i32(addr, addr, -offset);
             }
+            tmp2 = tcg_const_i32(4);
             for (i = 0; i < 8; i++) {
                 if (insn & (1 << i)) {
                     if (insn & (1 << 11)) {
@@ -8608,7 +8609,7 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s)
                         gen_st32(tmp, addr, IS_USER(s));
                     }
                     /* advance to the next address.  */
-                    tcg_gen_addi_i32(addr, addr, 4);
+                    tcg_gen_add_i32(addr, addr, tmp2);
                 }
             }
             TCGV_UNUSED(tmp);
@@ -8623,8 +8624,9 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s)
                     tmp = load_reg(s, 14);
                     gen_st32(tmp, addr, IS_USER(s));
                 }
-                tcg_gen_addi_i32(addr, addr, 4);
+                tcg_gen_add_i32(addr, addr, tmp2);
             }
+            tcg_temp_free_i32(tmp2);
             if ((insn & (1 << 11)) == 0) {
                 tcg_gen_addi_i32(addr, addr, -offset);
             }
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (7 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 08/10] target-arm: optimize thumb push/pop ops juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-25 14:01   ` Laurent Desnogues
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 10/10] target-arm: fix neon shift helper functions juha.riihimaki
  2009-10-25 19:17 ` [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes Aurelien Jarno
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Reduce the amount of TCG ops generated from NEON vld/vst instructions
by simplifying the code generation.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/translate.c |   67 ++++++++++++++++++++++++-----------------------
 1 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index f262758..55d6377 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3708,6 +3708,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
     TCGv tmp;
     TCGv tmp2;
     TCGv_i64 tmp64;
+    TCGv stride_var;
 
     if (!vfp_enabled(env))
       return 1;
@@ -3729,6 +3730,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
             return 1;
         load_reg_var(s, addr, rn);
         stride = (1 << size) * interleave;
+        stride_var = tcg_const_i32(stride);
         for (reg = 0; reg < nregs; reg++) {
             if (interleave > 2 || (interleave == 2 && nregs == 2)) {
                 load_reg_var(s, addr, rn);
@@ -3747,7 +3749,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     neon_load_reg64(tmp64, rd);
                     gen_st64(tmp64, addr, IS_USER(s));
                 }
-                tcg_gen_addi_i32(addr, addr, stride);
+                tcg_gen_add_i32(addr, addr, stride_var);
             } else {
                 for (pass = 0; pass < 2; pass++) {
                     if (size == 2) {
@@ -3758,58 +3760,57 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                             tmp = neon_load_reg(rd, pass);
                             gen_st32(tmp, addr, IS_USER(s));
                         }
-                        tcg_gen_addi_i32(addr, addr, stride);
+                        tcg_gen_add_i32(addr, addr, stride_var);
                     } else if (size == 1) {
                         if (load) {
                             tmp = gen_ld16u(addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
+                            tcg_gen_add_i32(addr, addr, stride_var);
                             tmp2 = gen_ld16u(addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
-                            gen_bfi(tmp, tmp, tmp2, 16, 0xffff);
+                            tcg_gen_add_i32(addr, addr, stride_var);
+                            tcg_gen_shli_i32(tmp2, tmp2, 16);
+                            tcg_gen_or_i32(tmp, tmp, tmp2);
                             dead_tmp(tmp2);
                             neon_store_reg(rd, pass, tmp);
                         } else {
                             tmp = neon_load_reg(rd, pass);
-                            tmp2 = new_tmp();
-                            tcg_gen_shri_i32(tmp2, tmp, 16);
-                            gen_st16(tmp, addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
-                            gen_st16(tmp2, addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
+                            tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_var);
+                            tcg_gen_shri_i32(tmp, tmp, 16);
+                            tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_var);
+                            dead_tmp(tmp);
                         }
                     } else /* size == 0 */ {
                         if (load) {
-                            TCGV_UNUSED(tmp2);
-                            for (n = 0; n < 4; n++) {
-                                tmp = gen_ld8u(addr, IS_USER(s));
-                                tcg_gen_addi_i32(addr, addr, stride);
-                                if (n == 0) {
-                                    tmp2 = tmp;
-                                } else {
-                                    gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff);
-                                    dead_tmp(tmp);
-                                }
+                            tmp = gen_ld8u(addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_var);
+                            for (n = 1; n < 4; n++) {
+                                tmp2 = gen_ld8u(addr, IS_USER(s));
+                                tcg_gen_add_i32(addr, addr, stride_var);
+                                tcg_gen_shli_i32(tmp2, tmp2, n * 8);
+                                tcg_gen_or_i32(tmp, tmp, tmp2);
+                                dead_tmp(tmp2);
                             }
-                            neon_store_reg(rd, pass, tmp2);
+                            neon_store_reg(rd, pass, tmp);
                         } else {
-                            tmp2 = neon_load_reg(rd, pass);
-                            for (n = 0; n < 4; n++) {
-                                tmp = new_tmp();
-                                if (n == 0) {
-                                    tcg_gen_mov_i32(tmp, tmp2);
-                                } else {
-                                    tcg_gen_shri_i32(tmp, tmp2, n * 8);
-                                }
-                                gen_st8(tmp, addr, IS_USER(s));
-                                tcg_gen_addi_i32(addr, addr, stride);
+                            tmp2 = tcg_const_i32(8);
+                            tmp = neon_load_reg(rd, pass);
+                            for (n = 0; n < 3; n++) {
+                                tcg_gen_qemu_st8(tmp, addr, IS_USER(s));
+                                tcg_gen_add_i32(addr, addr, stride_var);
+                                tcg_gen_shr_i32(tmp, tmp, tmp2);
                             }
-                            dead_tmp(tmp2);
+                            tcg_gen_qemu_st8(tmp, addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_var);
+                            dead_tmp(tmp);
+                            tcg_temp_free_i32(tmp2);
                         }
                     }
                 }
             }
             rd += spacing;
         }
+        tcg_temp_free_i32(stride_var);
         stride = nregs * 8;
     } else {
         size = (insn >> 10) & 3;
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH v2 10/10] target-arm: fix neon shift helper functions
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (8 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops juha.riihimaki
@ 2009-10-24 12:19 ` juha.riihimaki
  2009-10-25 12:16   ` Laurent Desnogues
  2009-10-25 19:17 ` [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes Aurelien Jarno
  10 siblings, 1 reply; 33+ messages in thread
From: juha.riihimaki @ 2009-10-24 12:19 UTC (permalink / raw)
  To: qemu-devel

From: Juha Riihimäki <juha.riihimaki@nokia.com>

Current code is broken at least on gcc 4.2, the result of a comparison
"-1 >= sizeof(type) * 8" results true and causes wrong code path to
be taken. The fix has been revised to use a type cast instead of
abs() function and extra checks.

Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
---
 target-arm/neon_helper.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index f32ecd6..440b2ec 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -392,7 +392,7 @@ NEON_VOP(abd_u32, neon_u32, 1)
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
-    if (tmp >= sizeof(src1) * 8 || tmp <= -sizeof(src1) * 8) { \
+    if (tmp >= (int)sizeof(src1) * 8 || tmp <= -sizeof(src1) * 8) { \
         dest = 0; \
     } else if (tmp < 0) { \
         dest = src1 >> -tmp; \
@@ -420,7 +420,7 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
-    if (tmp >= sizeof(src1) * 8) { \
+    if (tmp >= (int)sizeof(src1) * 8) { \
         dest = 0; \
     } else if (tmp <= -sizeof(src1) * 8) { \
         dest = src1 >> (sizeof(src1) * 8 - 1); \
@@ -453,7 +453,7 @@ uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
-    if (tmp >= sizeof(src1) * 8) { \
+    if (tmp >= (int)sizeof(src1) * 8) { \
         dest = 0; \
     } else if (tmp < -sizeof(src1) * 8) { \
         dest = src1 >> (sizeof(src1) * 8 - 1); \
@@ -494,7 +494,7 @@ uint64_t HELPER(neon_rshl_s64)(uint64_t valop, uint64_t shiftop)
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
-    if (tmp >= sizeof(src1) * 8 || tmp < -sizeof(src1) * 8) { \
+    if (tmp >= (int)sizeof(src1) * 8 || tmp < -sizeof(src1) * 8) { \
         dest = 0; \
     } else if (tmp == -sizeof(src1) * 8) { \
         dest = src1 >> (tmp - 1); \
@@ -528,7 +528,7 @@ uint64_t HELPER(neon_rshl_u64)(uint64_t val, uint64_t shiftop)
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
-    if (tmp >= sizeof(src1) * 8) { \
+    if (tmp >= (int)sizeof(src1) * 8) { \
         if (src1) { \
             SET_QC(); \
             dest = ~0; \
@@ -579,7 +579,7 @@ uint64_t HELPER(neon_qshl_u64)(CPUState *env, uint64_t val, uint64_t shiftop)
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
-    if (tmp >= sizeof(src1) * 8) { \
+    if (tmp >= (int)sizeof(src1) * 8) { \
         if (src1) \
             SET_QC(); \
         dest = src1 >> 31; \
-- 
1.6.5

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/10] target-arm: optimize thumb2 load/store multiple ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 07/10] target-arm: optimize thumb2 load/store multiple ops juha.riihimaki
@ 2009-10-24 17:32   ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-24 17:32 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 2:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> Thumb2 load/store multiple instructions can be slightly optimized by
> loading the register offset constant into a variable outside the
> register loop and using the preloaded variable inside the loop instead
> of reloading the offset value to a temporary variable on each loop
> iteration. This causes less TCG ops to be generated for a Thumb2 load/
> store multiple instruction if there are more than one register
> accessed, otherwise the amount of generated TCG ops will be the same.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Laurent

> ---
>  target-arm/translate.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 9e924d4..353f638 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -7374,6 +7374,7 @@ static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1)
>                     tcg_gen_addi_i32(addr, addr, -offset);
>                 }
>
> +                tmp2 = tcg_const_i32(4);
>                 for (i = 0; i < 16; i++) {
>                     if ((insn & (1 << i)) == 0)
>                         continue;
> @@ -7390,8 +7391,9 @@ static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1)
>                         tmp = load_reg(s, i);
>                         gen_st32(tmp, addr, IS_USER(s));
>                     }
> -                    tcg_gen_addi_i32(addr, addr, 4);
> +                    tcg_gen_add_i32(addr, addr, tmp2);
>                 }
> +                tcg_temp_free_i32(tmp2);
>                 if (insn & (1 << 21)) {
>                     /* Base register writeback.  */
>                     if (insn & (1 << 24)) {
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/10] target-arm: optimize thumb push/pop ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 08/10] target-arm: optimize thumb push/pop ops juha.riihimaki
@ 2009-10-24 17:34   ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-24 17:34 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 2:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> Thumb push/pop instructions can be slightly optimized by loading the
> register offset constant into a variable outside the register loop and
> using the preloaded variable inside the loop instead of reloading the
> offset value to a temporary variable on each loop iteration. This
> causes less TCG ops to be generated for a Thumb push/pop instruction
> if there are more than one register accessed, otherwise the amount of
> generated TCG ops is the same.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Laurent

> ---
>  target-arm/translate.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 353f638..f262758 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -8596,6 +8596,7 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s)
>             if ((insn & (1 << 11)) == 0) {
>                 tcg_gen_addi_i32(addr, addr, -offset);
>             }
> +            tmp2 = tcg_const_i32(4);
>             for (i = 0; i < 8; i++) {
>                 if (insn & (1 << i)) {
>                     if (insn & (1 << 11)) {
> @@ -8608,7 +8609,7 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s)
>                         gen_st32(tmp, addr, IS_USER(s));
>                     }
>                     /* advance to the next address.  */
> -                    tcg_gen_addi_i32(addr, addr, 4);
> +                    tcg_gen_add_i32(addr, addr, tmp2);
>                 }
>             }
>             TCGV_UNUSED(tmp);
> @@ -8623,8 +8624,9 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s)
>                     tmp = load_reg(s, 14);
>                     gen_st32(tmp, addr, IS_USER(s));
>                 }
> -                tcg_gen_addi_i32(addr, addr, 4);
> +                tcg_gen_add_i32(addr, addr, tmp2);
>             }
> +            tcg_temp_free_i32(tmp2);
>             if ((insn & (1 << 11)) == 0) {
>                 tcg_gen_addi_i32(addr, addr, -offset);
>             }
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops juha.riihimaki
@ 2009-10-24 17:36   ` Laurent Desnogues
  2009-10-27  8:42   ` Aurelien Jarno
  1 sibling, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-24 17:36 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 2:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> VFP load/store multiple instructions can be slightly optimized by
> loading the register offset constant into a variable outside the
> register loop and using the preloaded variable inside the loop instead
> of reloading the offset value to a temporary variable on each loop
> iteration. This causes less TCG ops to be generated for a VFP load/
> store multiple instruction if there are more than one register
> accessed, otherwise the amount of generated TCG ops is the same.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Laurent

> ---
>  target-arm/translate.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 8cb1c0f..38fb833 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -3222,6 +3222,7 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                     offset = 8;
>                 else
>                     offset = 4;
> +                tmp = tcg_const_i32(offset);
>                 for (i = 0; i < n; i++) {
>                     if (insn & ARM_CP_RW_BIT) {
>                         /* load */
> @@ -3232,8 +3233,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                         gen_mov_F0_vreg(dp, rd + i);
>                         gen_vfp_st(s, dp, addr);
>                     }
> -                    tcg_gen_addi_i32(addr, addr, offset);
> +                    tcg_gen_add_i32(addr, addr, tmp);
>                 }
> +                tcg_temp_free_i32(tmp);
>                 if (insn & (1 << 21)) {
>                     /* writeback */
>                     if (insn & (1 << 24))
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/10] target-arm: fix neon vsri, vshl and vsli ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 06/10] target-arm: fix neon vsri, vshl and vsli ops juha.riihimaki
@ 2009-10-24 17:44   ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-24 17:44 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 2:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> Shift by immediate value is incorrectly overwritten by a temporary
> variable in the processing of NEON vsri, vshl and vsli instructions.
> This patch has been revised to also include a fix for the special
> case where the code would previously try to shift an integer value
> over 31 bits left/right.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Laurent

> ---
>  target-arm/translate.c |   32 ++++++++++++++++++--------------
>  1 files changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index d1e2ed2..9e924d4 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -4098,7 +4098,7 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
>     int pairwise;
>     int u;
>     int n;
> -    uint32_t imm;
> +    uint32_t imm, mask;
>     TCGv tmp, tmp2, tmp3, tmp4, tmp5;
>     TCGv_i64 tmp64;
>
> @@ -4626,31 +4626,35 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                             switch (size) {
>                             case 0:
>                                 if (op == 4)
> -                                    imm = 0xff >> -shift;
> +                                    mask = 0xff >> -shift;
>                                 else
> -                                    imm = (uint8_t)(0xff << shift);
> -                                imm |= imm << 8;
> -                                imm |= imm << 16;
> +                                    mask = (uint8_t)(0xff << shift);
> +                                mask |= mask << 8;
> +                                mask |= mask << 16;
>                                 break;
>                             case 1:
>                                 if (op == 4)
> -                                    imm = 0xffff >> -shift;
> +                                    mask = 0xffff >> -shift;
>                                 else
> -                                    imm = (uint16_t)(0xffff << shift);
> -                                imm |= imm << 16;
> +                                    mask = (uint16_t)(0xffff << shift);
> +                                mask |= mask << 16;
>                                 break;
>                             case 2:
> -                                if (op == 4)
> -                                    imm = 0xffffffffu >> -shift;
> -                                else
> -                                    imm = 0xffffffffu << shift;
> +                                if (shift < -31 || shift > 31) {
> +                                    mask = 0;
> +                                } else {
> +                                    if (op == 4)
> +                                        mask = 0xffffffffu >> -shift;
> +                                    else
> +                                        mask = 0xffffffffu << shift;
> +                                }
>                                 break;
>                             default:
>                                 abort();
>                             }
>                             tmp2 = neon_load_reg(rd, pass);
> -                            tcg_gen_andi_i32(tmp, tmp, imm);
> -                            tcg_gen_andi_i32(tmp2, tmp2, ~imm);
> +                            tcg_gen_andi_i32(tmp, tmp, mask);
> +                            tcg_gen_andi_i32(tmp2, tmp2, ~mask);
>                             tcg_gen_or_i32(tmp, tmp, tmp2);
>                             dead_tmp(tmp2);
>                         }
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/10] target-arm: fix neon vshrn/vrshrn ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 01/10] target-arm: fix neon vshrn/vrshrn ops juha.riihimaki
@ 2009-10-25 10:58   ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-25 10:58 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 1:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> In the existing code shift value is clobbered during the pass loop.
> This patch changes the code so that it stores the intermediate
> result in the target neon register directly and eliminates the need
> to use a temporary to hold the intermediate value thus leaving the
> shift value in the temporary variable intact. This is a new patch
> in this version of the patch series.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Laurent

> ---
>  target-arm/translate.c |   12 +++---------
>  1 files changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 9d13d42..8a85db6 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -4680,18 +4680,12 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                         else
>                             gen_neon_narrow_satu(size - 1, tmp, cpu_V0);
>                     }
> -                    if (pass == 0) {
> -                        if (size != 3) {
> -                            dead_tmp(tmp2);
> -                        }
> -                        tmp2 = tmp;
> -                    } else {
> -                        neon_store_reg(rd, 0, tmp2);
> -                        neon_store_reg(rd, 1, tmp);
> -                    }
> +                    neon_store_reg(rd, pass, tmp);
>                 } /* for pass */
>                 if (size == 3) {
>                     tcg_temp_free_i64(tmp64);
> +                } else {
> +                    dead_tmp(tmp2);
>                 }
>             } else if (op == 10) {
>                 /* VSHLL */
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/10] target-arm: add support for neon vld1.64/vst1.64 instructions
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 02/10] target-arm: add support for neon vld1.64/vst1.64 instructions juha.riihimaki
@ 2009-10-25 11:11   ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-25 11:11 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 1:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> Add support for NEON vld1.64 and vst1.64 instructions. This patch is
> revised to follow more closely the specification and raises
> undefined exception if 64bit element size is used for vld2/vst2 or
> vld4/vst4 instructions.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Laurent

> ---
>  target-arm/translate.c |  133 +++++++++++++++++++++++++++++-------------------
>  1 files changed, 81 insertions(+), 52 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 8a85db6..09c996d 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -795,6 +795,12 @@ static inline TCGv gen_ld32(TCGv addr, int index)
>     tcg_gen_qemu_ld32u(tmp, addr, index);
>     return tmp;
>  }
> +static inline TCGv_i64 gen_ld64(TCGv addr, int index)
> +{
> +    TCGv_i64 tmp = tcg_temp_new_i64();
> +    tcg_gen_qemu_ld64(tmp, addr, index);
> +    return tmp;
> +}
>  static inline void gen_st8(TCGv val, TCGv addr, int index)
>  {
>     tcg_gen_qemu_st8(val, addr, index);
> @@ -810,6 +816,11 @@ static inline void gen_st32(TCGv val, TCGv addr, int index)
>     tcg_gen_qemu_st32(val, addr, index);
>     dead_tmp(val);
>  }
> +static inline void gen_st64(TCGv_i64 val, TCGv addr, int index)
> +{
> +    tcg_gen_qemu_st64(val, addr, index);
> +    tcg_temp_free_i64(val);
> +}
>
>  static inline void gen_set_pc_im(uint32_t val)
>  {
> @@ -3680,6 +3691,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>     int op;
>     int nregs;
>     int interleave;
> +    int spacing;
>     int stride;
>     int size;
>     int reg;
> @@ -3690,6 +3702,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>     TCGv addr;
>     TCGv tmp;
>     TCGv tmp2;
> +    TCGv_i64 tmp64;
>
>     if (!vfp_enabled(env))
>       return 1;
> @@ -3702,10 +3715,13 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>         /* Load store all elements.  */
>         op = (insn >> 8) & 0xf;
>         size = (insn >> 6) & 3;
> -        if (op > 10 || size == 3)
> +        if (op > 10)
>             return 1;
>         nregs = neon_ls_element_type[op].nregs;
>         interleave = neon_ls_element_type[op].interleave;
> +        spacing = neon_ls_element_type[op].spacing;
> +        if (size == 3 && (interleave | spacing) != 1)
> +            return 1;
>         load_reg_var(s, addr, rn);
>         stride = (1 << size) * interleave;
>         for (reg = 0; reg < nregs; reg++) {
> @@ -3716,65 +3732,78 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                 load_reg_var(s, addr, rn);
>                 tcg_gen_addi_i32(addr, addr, 1 << size);
>             }
> -            for (pass = 0; pass < 2; pass++) {
> -                if (size == 2) {
> -                    if (load) {
> -                        tmp = gen_ld32(addr, IS_USER(s));
> -                        neon_store_reg(rd, pass, tmp);
> -                    } else {
> -                        tmp = neon_load_reg(rd, pass);
> -                        gen_st32(tmp, addr, IS_USER(s));
> -                    }
> -                    tcg_gen_addi_i32(addr, addr, stride);
> -                } else if (size == 1) {
> -                    if (load) {
> -                        tmp = gen_ld16u(addr, IS_USER(s));
> -                        tcg_gen_addi_i32(addr, addr, stride);
> -                        tmp2 = gen_ld16u(addr, IS_USER(s));
> -                        tcg_gen_addi_i32(addr, addr, stride);
> -                        gen_bfi(tmp, tmp, tmp2, 16, 0xffff);
> -                        dead_tmp(tmp2);
> -                        neon_store_reg(rd, pass, tmp);
> -                    } else {
> -                        tmp = neon_load_reg(rd, pass);
> -                        tmp2 = new_tmp();
> -                        tcg_gen_shri_i32(tmp2, tmp, 16);
> -                        gen_st16(tmp, addr, IS_USER(s));
> -                        tcg_gen_addi_i32(addr, addr, stride);
> -                        gen_st16(tmp2, addr, IS_USER(s));
> +            if (size == 3) {
> +                if (load) {
> +                    tmp64 = gen_ld64(addr, IS_USER(s));
> +                    neon_store_reg64(tmp64, rd);
> +                    tcg_temp_free_i64(tmp64);
> +                } else {
> +                    tmp64 = tcg_temp_new_i64();
> +                    neon_load_reg64(tmp64, rd);
> +                    gen_st64(tmp64, addr, IS_USER(s));
> +                }
> +                tcg_gen_addi_i32(addr, addr, stride);
> +            } else {
> +                for (pass = 0; pass < 2; pass++) {
> +                    if (size == 2) {
> +                        if (load) {
> +                            tmp = gen_ld32(addr, IS_USER(s));
> +                            neon_store_reg(rd, pass, tmp);
> +                        } else {
> +                            tmp = neon_load_reg(rd, pass);
> +                            gen_st32(tmp, addr, IS_USER(s));
> +                        }
>                         tcg_gen_addi_i32(addr, addr, stride);
> -                    }
> -                } else /* size == 0 */ {
> -                    if (load) {
> -                        TCGV_UNUSED(tmp2);
> -                        for (n = 0; n < 4; n++) {
> -                            tmp = gen_ld8u(addr, IS_USER(s));
> +                    } else if (size == 1) {
> +                        if (load) {
> +                            tmp = gen_ld16u(addr, IS_USER(s));
> +                            tcg_gen_addi_i32(addr, addr, stride);
> +                            tmp2 = gen_ld16u(addr, IS_USER(s));
> +                            tcg_gen_addi_i32(addr, addr, stride);
> +                            gen_bfi(tmp, tmp, tmp2, 16, 0xffff);
> +                            dead_tmp(tmp2);
> +                            neon_store_reg(rd, pass, tmp);
> +                        } else {
> +                            tmp = neon_load_reg(rd, pass);
> +                            tmp2 = new_tmp();
> +                            tcg_gen_shri_i32(tmp2, tmp, 16);
> +                            gen_st16(tmp, addr, IS_USER(s));
> +                            tcg_gen_addi_i32(addr, addr, stride);
> +                            gen_st16(tmp2, addr, IS_USER(s));
>                             tcg_gen_addi_i32(addr, addr, stride);
> -                            if (n == 0) {
> -                                tmp2 = tmp;
> -                            } else {
> -                                gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff);
> -                                dead_tmp(tmp);
> -                            }
>                         }
> -                        neon_store_reg(rd, pass, tmp2);
> -                    } else {
> -                        tmp2 = neon_load_reg(rd, pass);
> -                        for (n = 0; n < 4; n++) {
> -                            tmp = new_tmp();
> -                            if (n == 0) {
> -                                tcg_gen_mov_i32(tmp, tmp2);
> -                            } else {
> -                                tcg_gen_shri_i32(tmp, tmp2, n * 8);
> +                    } else /* size == 0 */ {
> +                        if (load) {
> +                            TCGV_UNUSED(tmp2);
> +                            for (n = 0; n < 4; n++) {
> +                                tmp = gen_ld8u(addr, IS_USER(s));
> +                                tcg_gen_addi_i32(addr, addr, stride);
> +                                if (n == 0) {
> +                                    tmp2 = tmp;
> +                                } else {
> +                                    gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff);
> +                                    dead_tmp(tmp);
> +                                }
>                             }
> -                            gen_st8(tmp, addr, IS_USER(s));
> -                            tcg_gen_addi_i32(addr, addr, stride);
> +                            neon_store_reg(rd, pass, tmp2);
> +                        } else {
> +                            tmp2 = neon_load_reg(rd, pass);
> +                            for (n = 0; n < 4; n++) {
> +                                tmp = new_tmp();
> +                                if (n == 0) {
> +                                    tcg_gen_mov_i32(tmp, tmp2);
> +                                } else {
> +                                    tcg_gen_shri_i32(tmp, tmp2, n * 8);
> +                                }
> +                                gen_st8(tmp, addr, IS_USER(s));
> +                                tcg_gen_addi_i32(addr, addr, stride);
> +                            }
> +                            dead_tmp(tmp2);
>                         }
> -                        dead_tmp(tmp2);
>                     }
>                 }
>             }
> -            rd += neon_ls_element_type[op].spacing;
> +            rd += spacing;
>         }
>         stride = nregs * 8;
>     } else {
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/10] target-arm: fix neon shift helper functions
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 10/10] target-arm: fix neon shift helper functions juha.riihimaki
@ 2009-10-25 12:16   ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-25 12:16 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 1:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> Current code is broken at least on gcc 4.2, the result of a comparison
> "-1 >= sizeof(type) * 8" results true and causes wrong code path to
> be taken. The fix has been revised to use a type cast instead of
> abs() function and extra checks.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
> ---
>  target-arm/neon_helper.c |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
> index f32ecd6..440b2ec 100644
> --- a/target-arm/neon_helper.c
> +++ b/target-arm/neon_helper.c
> @@ -392,7 +392,7 @@ NEON_VOP(abd_u32, neon_u32, 1)
>  #define NEON_FN(dest, src1, src2) do { \
>     int8_t tmp; \
>     tmp = (int8_t)src2; \
> -    if (tmp >= sizeof(src1) * 8 || tmp <= -sizeof(src1) * 8) { \
> +    if (tmp >= (int)sizeof(src1) * 8 || tmp <= -sizeof(src1) * 8) { \

You should also cast the sizeof in the second comparison.
With gcc 4.4.1, "1 <= -sizeof(uint32_t) * 8" is true.


Laurent

>         dest = 0; \
>     } else if (tmp < 0) { \
>         dest = src1 >> -tmp; \
> @@ -420,7 +420,7 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
>  #define NEON_FN(dest, src1, src2) do { \
>     int8_t tmp; \
>     tmp = (int8_t)src2; \
> -    if (tmp >= sizeof(src1) * 8) { \
> +    if (tmp >= (int)sizeof(src1) * 8) { \
>         dest = 0; \
>     } else if (tmp <= -sizeof(src1) * 8) { \
>         dest = src1 >> (sizeof(src1) * 8 - 1); \
> @@ -453,7 +453,7 @@ uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
>  #define NEON_FN(dest, src1, src2) do { \
>     int8_t tmp; \
>     tmp = (int8_t)src2; \
> -    if (tmp >= sizeof(src1) * 8) { \
> +    if (tmp >= (int)sizeof(src1) * 8) { \
>         dest = 0; \
>     } else if (tmp < -sizeof(src1) * 8) { \
>         dest = src1 >> (sizeof(src1) * 8 - 1); \
> @@ -494,7 +494,7 @@ uint64_t HELPER(neon_rshl_s64)(uint64_t valop, uint64_t shiftop)
>  #define NEON_FN(dest, src1, src2) do { \
>     int8_t tmp; \
>     tmp = (int8_t)src2; \
> -    if (tmp >= sizeof(src1) * 8 || tmp < -sizeof(src1) * 8) { \
> +    if (tmp >= (int)sizeof(src1) * 8 || tmp < -sizeof(src1) * 8) { \
>         dest = 0; \
>     } else if (tmp == -sizeof(src1) * 8) { \
>         dest = src1 >> (tmp - 1); \
> @@ -528,7 +528,7 @@ uint64_t HELPER(neon_rshl_u64)(uint64_t val, uint64_t shiftop)
>  #define NEON_FN(dest, src1, src2) do { \
>     int8_t tmp; \
>     tmp = (int8_t)src2; \
> -    if (tmp >= sizeof(src1) * 8) { \
> +    if (tmp >= (int)sizeof(src1) * 8) { \
>         if (src1) { \
>             SET_QC(); \
>             dest = ~0; \
> @@ -579,7 +579,7 @@ uint64_t HELPER(neon_qshl_u64)(CPUState *env, uint64_t val, uint64_t shiftop)
>  #define NEON_FN(dest, src1, src2) do { \
>     int8_t tmp; \
>     tmp = (int8_t)src2; \
> -    if (tmp >= sizeof(src1) * 8) { \
> +    if (tmp >= (int)sizeof(src1) * 8) { \
>         if (src1) \
>             SET_QC(); \
>         dest = src1 >> 31; \
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only juha.riihimaki
@ 2009-10-25 12:23   ` Laurent Desnogues
  2009-10-26  7:32     ` Juha.Riihimaki
  0 siblings, 1 reply; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-25 12:23 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 1:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> All other bits except for the EN in the VFP FPEXC register are defined
> as subarchitecture specific and real functionality for any of the
> other bits has not been implemented in QEMU. However, current code
> allows modifying all bits in the VFP FPEXC register leading to
> problems when guest code is writing 1's to the subarchitecture
> specific bits and checking whether the bits stay up to verify the
> existence of functionality which in fact does not exist in QEMU.

Shouldn't writes to FPEXC from gdb be protected in the same
way?  Except for that I agree with your patch.


Laurent

> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
> ---
>  target-arm/translate.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 09c996d..8cb1c0f 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -2788,6 +2788,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                         case ARM_VFP_FPEXC:
>                             if (IS_USER(s))
>                                 return 1;
> +                            /* TODO: VFP subarchitecture support.
> +                             * For now, keep the EN bit only */
> +                            tcg_gen_andi_i32(tmp, tmp, 1 << 30);
>                             store_cpu_field(tmp, vfp.xregs[rn]);
>                             gen_lookup_tb(s);
>                             break;
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops juha.riihimaki
@ 2009-10-25 14:01   ` Laurent Desnogues
  2009-10-26  7:46     ` Juha.Riihimaki
  0 siblings, 1 reply; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-25 14:01 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 1:19 PM,  <juha.riihimaki@nokia.com> wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>
> Reduce the amount of TCG ops generated from NEON vld/vst instructions
> by simplifying the code generation.
>
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
> ---
>  target-arm/translate.c |   67 ++++++++++++++++++++++++-----------------------
>  1 files changed, 34 insertions(+), 33 deletions(-)
>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index f262758..55d6377 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -3708,6 +3708,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>     TCGv tmp;
>     TCGv tmp2;
>     TCGv_i64 tmp64;
> +    TCGv stride_var;
>
>     if (!vfp_enabled(env))
>       return 1;
> @@ -3729,6 +3730,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>             return 1;
>         load_reg_var(s, addr, rn);
>         stride = (1 << size) * interleave;
> +        stride_var = tcg_const_i32(stride);
>         for (reg = 0; reg < nregs; reg++) {
>             if (interleave > 2 || (interleave == 2 && nregs == 2)) {
>                 load_reg_var(s, addr, rn);
> @@ -3747,7 +3749,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                     neon_load_reg64(tmp64, rd);
>                     gen_st64(tmp64, addr, IS_USER(s));
>                 }
> -                tcg_gen_addi_i32(addr, addr, stride);
> +                tcg_gen_add_i32(addr, addr, stride_var);
>             } else {
>                 for (pass = 0; pass < 2; pass++) {
>                     if (size == 2) {
> @@ -3758,58 +3760,57 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                             tmp = neon_load_reg(rd, pass);
>                             gen_st32(tmp, addr, IS_USER(s));
>                         }
> -                        tcg_gen_addi_i32(addr, addr, stride);
> +                        tcg_gen_add_i32(addr, addr, stride_var);
>                     } else if (size == 1) {
>                         if (load) {
>                             tmp = gen_ld16u(addr, IS_USER(s));
> -                            tcg_gen_addi_i32(addr, addr, stride);
> +                            tcg_gen_add_i32(addr, addr, stride_var);
>                             tmp2 = gen_ld16u(addr, IS_USER(s));
> -                            tcg_gen_addi_i32(addr, addr, stride);
> -                            gen_bfi(tmp, tmp, tmp2, 16, 0xffff);
> +                            tcg_gen_add_i32(addr, addr, stride_var);
> +                            tcg_gen_shli_i32(tmp2, tmp2, 16);
> +                            tcg_gen_or_i32(tmp, tmp, tmp2);
>                             dead_tmp(tmp2);
>                             neon_store_reg(rd, pass, tmp);
>                         } else {
>                             tmp = neon_load_reg(rd, pass);
> -                            tmp2 = new_tmp();
> -                            tcg_gen_shri_i32(tmp2, tmp, 16);
> -                            gen_st16(tmp, addr, IS_USER(s));
> -                            tcg_gen_addi_i32(addr, addr, stride);
> -                            gen_st16(tmp2, addr, IS_USER(s));
> -                            tcg_gen_addi_i32(addr, addr, stride);
> +                            tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
> +                            tcg_gen_add_i32(addr, addr, stride_var);
> +                            tcg_gen_shri_i32(tmp, tmp, 16);
> +                            tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
> +                            tcg_gen_add_i32(addr, addr, stride_var);
> +                            dead_tmp(tmp);

I don't really like the idea of having tcg_qemu_ld/st not factored
in some place, as it makes memory access tracing extensions
more intrusive.

This brings us back to the problem having function freeing tmps.
In that case, you could perhaps create a gen_st16_dont_free
function as a temporary workaround?

>                         }
>                     } else /* size == 0 */ {
>                         if (load) {
> -                            TCGV_UNUSED(tmp2);
> -                            for (n = 0; n < 4; n++) {
> -                                tmp = gen_ld8u(addr, IS_USER(s));
> -                                tcg_gen_addi_i32(addr, addr, stride);
> -                                if (n == 0) {
> -                                    tmp2 = tmp;
> -                                } else {
> -                                    gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff);
> -                                    dead_tmp(tmp);
> -                                }
> +                            tmp = gen_ld8u(addr, IS_USER(s));
> +                            tcg_gen_add_i32(addr, addr, stride_var);
> +                            for (n = 1; n < 4; n++) {
> +                                tmp2 = gen_ld8u(addr, IS_USER(s));
> +                                tcg_gen_add_i32(addr, addr, stride_var);
> +                                tcg_gen_shli_i32(tmp2, tmp2, n * 8);
> +                                tcg_gen_or_i32(tmp, tmp, tmp2);
> +                                dead_tmp(tmp2);
>                             }
> -                            neon_store_reg(rd, pass, tmp2);
> +                            neon_store_reg(rd, pass, tmp);
>                         } else {
> -                            tmp2 = neon_load_reg(rd, pass);
> -                            for (n = 0; n < 4; n++) {
> -                                tmp = new_tmp();
> -                                if (n == 0) {
> -                                    tcg_gen_mov_i32(tmp, tmp2);
> -                                } else {
> -                                    tcg_gen_shri_i32(tmp, tmp2, n * 8);
> -                                }
> -                                gen_st8(tmp, addr, IS_USER(s));
> -                                tcg_gen_addi_i32(addr, addr, stride);
> +                            tmp2 = tcg_const_i32(8);
> +                            tmp = neon_load_reg(rd, pass);
> +                            for (n = 0; n < 3; n++) {
> +                                tcg_gen_qemu_st8(tmp, addr, IS_USER(s));
> +                                tcg_gen_add_i32(addr, addr, stride_var);
> +                                tcg_gen_shr_i32(tmp, tmp, tmp2);
>                             }
> -                            dead_tmp(tmp2);
> +                            tcg_gen_qemu_st8(tmp, addr, IS_USER(s));
> +                            tcg_gen_add_i32(addr, addr, stride_var);
> +                            dead_tmp(tmp);
> +                            tcg_temp_free_i32(tmp2);

Same comment as above.


Laurent

>                         }
>                     }
>                 }
>             }
>             rd += spacing;
>         }
> +        tcg_temp_free_i32(stride_var);
>         stride = nregs * 8;
>     } else {
>         size = (insn >> 10) & 3;
> --
> 1.6.5
>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes
  2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
                   ` (9 preceding siblings ...)
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 10/10] target-arm: fix neon shift helper functions juha.riihimaki
@ 2009-10-25 19:17 ` Aurelien Jarno
  10 siblings, 0 replies; 33+ messages in thread
From: Aurelien Jarno @ 2009-10-25 19:17 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 03:18:59PM +0300, juha.riihimaki@nokia.com wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
> 
> This is the revised set of patches for the ARM translator and it includes a number of smaller fixes and improvements. The series should be applied in sequence as the modifications are mostly related to the same file, target-arm/translate.c. The whole series should apply cleanly against latest git.

Thanks for this new version. I have all the patches acked by Laurent
Desnogues in my local tree. I'll push them if in a few days if no
other comments are received.

> Note that the three patches have been removed from this series compared to the previous version and one new one added.
> 
> 
> Juha Riihimäki (10):
>   target-arm: fix neon vshrn/vrshrn ops
>   target-arm: add support for neon vld1.64/vst1.64 instructions
>   target-arm: allow modifying vfp fpexc en bit only
>   target-arm: optimize vfp load/store multiple ops
>   target-arm: optimize arm load/store multiple ops
>   target-arm: fix neon vsri, vshl and vsli ops
>   target-arm: optimize thumb2 load/store multiple ops
>   target-arm: optimize thumb push/pop ops
>   target-arm: optimize neon vld/vst ops
>   target-arm: fix neon shift helper functions
> 
>  target-arm/neon_helper.c |   12 ++--
>  target-arm/translate.c   |  205 +++++++++++++++++++++++++++-------------------
>  2 files changed, 128 insertions(+), 89 deletions(-)
> 
> 
> 
> 

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only
  2009-10-25 12:23   ` Laurent Desnogues
@ 2009-10-26  7:32     ` Juha.Riihimaki
  2009-10-26  9:08       ` Laurent Desnogues
  0 siblings, 1 reply; 33+ messages in thread
From: Juha.Riihimaki @ 2009-10-26  7:32 UTC (permalink / raw)
  To: laurent.desnogues; +Cc: qemu-devel

On Oct 25, 2009, at 14:23, ext Laurent Desnogues wrote:

> On Sat, Oct 24, 2009 at 1:19 PM,  <juha.riihimaki@nokia.com> wrote:
>> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>>
>> All other bits except for the EN in the VFP FPEXC register are  
>> defined
>> as subarchitecture specific and real functionality for any of the
>> other bits has not been implemented in QEMU. However, current code
>> allows modifying all bits in the VFP FPEXC register leading to
>> problems when guest code is writing 1's to the subarchitecture
>> specific bits and checking whether the bits stay up to verify the
>> existence of functionality which in fact does not exist in QEMU.
>
> Shouldn't writes to FPEXC from gdb be protected in the same
> way?  Except for that I agree with your patch.

Please correct me if I'm wrong but it seems to me that the code in  
gdbstub.c never writes anything to the VFP registers, at least it  
seems to me the code in cpu_gdb_write_register and  
cpu_gdb_read_register ignore floating point registers.


Regards,
Juha

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
  2009-10-25 14:01   ` Laurent Desnogues
@ 2009-10-26  7:46     ` Juha.Riihimaki
  2009-10-26  9:11       ` Laurent Desnogues
  0 siblings, 1 reply; 33+ messages in thread
From: Juha.Riihimaki @ 2009-10-26  7:46 UTC (permalink / raw)
  To: laurent.desnogues; +Cc: qemu-devel


On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote:

> I don't really like the idea of having tcg_qemu_ld/st not factored
> in some place, as it makes memory access tracing extensions
> more intrusive.
>
> This brings us back to the problem having function freeing tmps.
> In that case, you could perhaps create a gen_st16_dont_free
> function as a temporary workaround?

I could, but it is getting ugly =/ How about if I create another patch  
that moves the temporary variable (de)allocation outside the gen_ldx/ 
gen_stx functions and then refactor this patch on top of that?


Regards,
Juha

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only
  2009-10-26  7:32     ` Juha.Riihimaki
@ 2009-10-26  9:08       ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-26  9:08 UTC (permalink / raw)
  To: Juha.Riihimaki; +Cc: qemu-devel

On Mon, Oct 26, 2009 at 8:32 AM,  <Juha.Riihimaki@nokia.com> wrote:
[...]
>>
>> Shouldn't writes to FPEXC from gdb be protected in the same
>> way?  Except for that I agree with your patch.
>
> Please correct me if I'm wrong but it seems to me that the code in
> gdbstub.c never writes anything to the VFP registers, at least it
> seems to me the code in cpu_gdb_write_register and
> cpu_gdb_read_register ignore floating point registers.

There's code in helper.c that implements writes to coprocessor
registers.  Look for vfp_gdb_set_reg which is registered by
cpu_arm_init.


Laurent

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
  2009-10-26  7:46     ` Juha.Riihimaki
@ 2009-10-26  9:11       ` Laurent Desnogues
  2009-10-26 21:05         ` Aurelien Jarno
  0 siblings, 1 reply; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-26  9:11 UTC (permalink / raw)
  To: Juha.Riihimaki; +Cc: qemu-devel

On Mon, Oct 26, 2009 at 8:46 AM,  <Juha.Riihimaki@nokia.com> wrote:
>
> On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote:
>
>> I don't really like the idea of having tcg_qemu_ld/st not factored
>> in some place, as it makes memory access tracing extensions
>> more intrusive.
>>
>> This brings us back to the problem having function freeing tmps.
>> In that case, you could perhaps create a gen_st16_dont_free
>> function as a temporary workaround?
>
> I could, but it is getting ugly =/ How about if I create another patch
> that moves the temporary variable (de)allocation outside the gen_ldx/
> gen_stx functions and then refactor this patch on top of that?

I'd personally like this, but I guess you'd better wait for some inputs
from other QEMU dev's before doing it.


Laurent

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
  2009-10-26  9:11       ` Laurent Desnogues
@ 2009-10-26 21:05         ` Aurelien Jarno
  2009-10-29 13:45           ` Juha.Riihimaki
  0 siblings, 1 reply; 33+ messages in thread
From: Aurelien Jarno @ 2009-10-26 21:05 UTC (permalink / raw)
  To: Laurent Desnogues; +Cc: Juha.Riihimaki, qemu-devel

On Mon, Oct 26, 2009 at 10:11:07AM +0100, Laurent Desnogues wrote:
> On Mon, Oct 26, 2009 at 8:46 AM,  <Juha.Riihimaki@nokia.com> wrote:
> >
> > On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote:
> >
> >> I don't really like the idea of having tcg_qemu_ld/st not factored
> >> in some place, as it makes memory access tracing extensions
> >> more intrusive.
> >>
> >> This brings us back to the problem having function freeing tmps.
> >> In that case, you could perhaps create a gen_st16_dont_free
> >> function as a temporary workaround?
> >
> > I could, but it is getting ugly =/ How about if I create another patch
> > that moves the temporary variable (de)allocation outside the gen_ldx/
> > gen_stx functions and then refactor this patch on top of that?
> 
> I'd personally like this, but I guess you'd better wait for some inputs
> from other QEMU dev's before doing it.

Looks like the best option to me.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm load/store multiple ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm " juha.riihimaki
@ 2009-10-27  8:39   ` Aurelien Jarno
  2009-10-27  8:48     ` Juha.Riihimaki
  0 siblings, 1 reply; 33+ messages in thread
From: Aurelien Jarno @ 2009-10-27  8:39 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 03:19:04PM +0300, juha.riihimaki@nokia.com wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
> 
> RM load/store multiple instructions can be slightly optimized by
> loading the register offset constant into a variable outside the
> register loop and using the preloaded variable inside the loop instead
> of reloading the offset value to a temporary variable on each loop
> iteration. This causes less TCG ops to be generated for a ARM load/
> store multiple instruction if there are more than one register
> accessed, otherwise the number of generated TCG ops is the same.
> 
> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
> Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>

This patch breaks, the boot of an arm kernel, as tmp2 is used elsewhere
within this code path.

OTOH, while it reduce the number of TCG ops, that should not impact the
generated host asm code, as most (all ?) targets are able to add a
small constant value to a register in one instruction.

> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 38fb833..d1e2ed2 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -6852,6 +6852,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
>                  }
>                  rn = (insn >> 16) & 0xf;
>                  addr = load_reg(s, rn);
> +                tmp2 = tcg_const_i32(4);
>  
>                  /* compute total size */
>                  loaded_base = 0;
> @@ -6865,7 +6866,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
>                  if (insn & (1 << 23)) {
>                      if (insn & (1 << 24)) {
>                          /* pre increment */
> -                        tcg_gen_addi_i32(addr, addr, 4);
> +                        tcg_gen_add_i32(addr, addr, tmp2);
>                      } else {
>                          /* post increment */
>                      }
> @@ -6918,7 +6919,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
>                          j++;
>                          /* no need to add after the last transfer */
>                          if (j != n)
> -                            tcg_gen_addi_i32(addr, addr, 4);
> +                            tcg_gen_add_i32(addr, addr, tmp2);
>                      }
>                  }
>                  if (insn & (1 << 21)) {
> @@ -6928,7 +6929,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
>                              /* pre increment */
>                          } else {
>                              /* post increment */
> -                            tcg_gen_addi_i32(addr, addr, 4);
> +                            tcg_gen_add_i32(addr, addr, tmp2);
>                          }
>                      } else {
>                          if (insn & (1 << 24)) {
> @@ -6944,6 +6945,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
>                  } else {
>                      dead_tmp(addr);
>                  }
> +                tcg_temp_free_i32(tmp2);
>                  if (loaded_base) {
>                      store_reg(s, rn, loaded_var);
>                  }
> -- 
> 1.6.5
> 
> 
> 

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops
  2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops juha.riihimaki
  2009-10-24 17:36   ` Laurent Desnogues
@ 2009-10-27  8:42   ` Aurelien Jarno
  1 sibling, 0 replies; 33+ messages in thread
From: Aurelien Jarno @ 2009-10-27  8:42 UTC (permalink / raw)
  To: juha.riihimaki; +Cc: qemu-devel

On Sat, Oct 24, 2009 at 03:19:03PM +0300, juha.riihimaki@nokia.com wrote:
> From: Juha Riihimäki <juha.riihimaki@nokia.com>
> 
> VFP load/store multiple instructions can be slightly optimized by
> loading the register offset constant into a variable outside the
> register loop and using the preloaded variable inside the loop instead
> of reloading the offset value to a temporary variable on each loop
> iteration. This causes less TCG ops to be generated for a VFP load/
> store multiple instruction if there are more than one register
> accessed, otherwise the amount of generated TCG ops is the same.

Same for this patch, it should not change the generated host code, so I
am not sure it really worth it.

> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
> ---
>  target-arm/translate.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 8cb1c0f..38fb833 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -3222,6 +3222,7 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                      offset = 8;
>                  else
>                      offset = 4;
> +                tmp = tcg_const_i32(offset);
>                  for (i = 0; i < n; i++) {
>                      if (insn & ARM_CP_RW_BIT) {
>                          /* load */
> @@ -3232,8 +3233,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
>                          gen_mov_F0_vreg(dp, rd + i);
>                          gen_vfp_st(s, dp, addr);
>                      }
> -                    tcg_gen_addi_i32(addr, addr, offset);
> +                    tcg_gen_add_i32(addr, addr, tmp);
>                  }
> +                tcg_temp_free_i32(tmp);
>                  if (insn & (1 << 21)) {
>                      /* writeback */
>                      if (insn & (1 << 24))
> -- 
> 1.6.5
> 
> 
> 

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm load/store multiple ops
  2009-10-27  8:39   ` Aurelien Jarno
@ 2009-10-27  8:48     ` Juha.Riihimaki
  2009-10-27  9:00       ` Aurelien Jarno
  0 siblings, 1 reply; 33+ messages in thread
From: Juha.Riihimaki @ 2009-10-27  8:48 UTC (permalink / raw)
  To: aurelien; +Cc: qemu-devel

On Oct 27, 2009, at 10:39, ext Aurelien Jarno wrote:

> On Sat, Oct 24, 2009 at 03:19:04PM +0300, juha.riihimaki@nokia.com  
> wrote:
>> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>>
>> RM load/store multiple instructions can be slightly optimized by
>> loading the register offset constant into a variable outside the
>> register loop and using the preloaded variable inside the loop  
>> instead
>> of reloading the offset value to a temporary variable on each loop
>> iteration. This causes less TCG ops to be generated for a ARM load/
>> store multiple instruction if there are more than one register
>> accessed, otherwise the number of generated TCG ops is the same.
>>
>> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
>> Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>
>
> This patch breaks, the boot of an arm kernel, as tmp2 is used  
> elsewhere
> within this code path.

True, I just noticed that as well. This is because the resource leak  
patch
was refactored to utilize tmp2 inside the loop as well. I just sent a  
new
revision of this patch that uses tmp3 for th constant value.

> OTOH, while it reduce the number of TCG ops, that should not impact  
> the
> generated host asm code, as most (all ?) targets are able to add a
> small constant value to a register in one instruction.

This is true, but I still think it provides a small speed gain as  
there are
less TCG ops to be processed when generating host code...?


Cheers,
Juha

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm load/store multiple ops
  2009-10-27  8:48     ` Juha.Riihimaki
@ 2009-10-27  9:00       ` Aurelien Jarno
  2009-10-27  9:05         ` Juha.Riihimaki
  0 siblings, 1 reply; 33+ messages in thread
From: Aurelien Jarno @ 2009-10-27  9:00 UTC (permalink / raw)
  To: Juha.Riihimaki; +Cc: qemu-devel

Juha.Riihimaki@nokia.com a écrit :
> On Oct 27, 2009, at 10:39, ext Aurelien Jarno wrote:
> 
>> On Sat, Oct 24, 2009 at 03:19:04PM +0300, juha.riihimaki@nokia.com  
>> wrote:
>>> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>>>
>>> RM load/store multiple instructions can be slightly optimized by
>>> loading the register offset constant into a variable outside the
>>> register loop and using the preloaded variable inside the loop  
>>> instead
>>> of reloading the offset value to a temporary variable on each loop
>>> iteration. This causes less TCG ops to be generated for a ARM load/
>>> store multiple instruction if there are more than one register
>>> accessed, otherwise the number of generated TCG ops is the same.
>>>
>>> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
>>> Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>
>> This patch breaks, the boot of an arm kernel, as tmp2 is used  
>> elsewhere
>> within this code path.
> 
> True, I just noticed that as well. This is because the resource leak  
> patch
> was refactored to utilize tmp2 inside the loop as well. I just sent a  
> new
> revision of this patch that uses tmp3 for th constant value.
> 
>> OTOH, while it reduce the number of TCG ops, that should not impact  
>> the
>> generated host asm code, as most (all ?) targets are able to add a
>> small constant value to a register in one instruction.
> 
> This is true, but I still think it provides a small speed gain as  
> there are
> less TCG ops to be processed when generating host code...?

It means less TCG ops, but one more temp variable, therefore if there is
a gain, I don't think it is something even measurable.

OTOH it makes the code a bit more complex to read. I am not really
opposed to this patch (and the other patches of the same kind), but I
will need some more arguments to convince me.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm load/store multiple ops
  2009-10-27  9:00       ` Aurelien Jarno
@ 2009-10-27  9:05         ` Juha.Riihimaki
  0 siblings, 0 replies; 33+ messages in thread
From: Juha.Riihimaki @ 2009-10-27  9:05 UTC (permalink / raw)
  To: aurelien; +Cc: qemu-devel

On Oct 27, 2009, at 11:00, ext Aurelien Jarno wrote:

> Juha.Riihimaki@nokia.com a écrit :
>> On Oct 27, 2009, at 10:39, ext Aurelien Jarno wrote:
>>
>>> On Sat, Oct 24, 2009 at 03:19:04PM +0300, juha.riihimaki@nokia.com
>>> wrote:
>>>> From: Juha Riihimäki <juha.riihimaki@nokia.com>
>>>>
>>>> RM load/store multiple instructions can be slightly optimized by
>>>> loading the register offset constant into a variable outside the
>>>> register loop and using the preloaded variable inside the loop
>>>> instead
>>>> of reloading the offset value to a temporary variable on each loop
>>>> iteration. This causes less TCG ops to be generated for a ARM load/
>>>> store multiple instruction if there are more than one register
>>>> accessed, otherwise the number of generated TCG ops is the same.
>>>>
>>>> Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
>>>> Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>
>>> This patch breaks, the boot of an arm kernel, as tmp2 is used
>>> elsewhere
>>> within this code path.
>>
>> True, I just noticed that as well. This is because the resource leak
>> patch
>> was refactored to utilize tmp2 inside the loop as well. I just sent a
>> new
>> revision of this patch that uses tmp3 for th constant value.
>>
>>> OTOH, while it reduce the number of TCG ops, that should not impact
>>> the
>>> generated host asm code, as most (all ?) targets are able to add a
>>> small constant value to a register in one instruction.
>>
>> This is true, but I still think it provides a small speed gain as
>> there are
>> less TCG ops to be processed when generating host code...?
>
> It means less TCG ops, but one more temp variable, therefore if  
> there is
> a gain, I don't think it is something even measurable.
>
> OTOH it makes the code a bit more complex to read. I am not really
> opposed to this patch (and the other patches of the same kind), but I
> will need some more arguments to convince me.

Shouldn't the amount of temp variables stay the same? tcg_gen_addi_i32  
will internally allocate a temporary variable to hold the integer  
parameter. The only difference is that the temporary stays alive  
during the loop instead of being allocated/deallocated during each  
iteration. But I agree, the performance gain is probably quite small.


Regards,
Juha

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
  2009-10-26 21:05         ` Aurelien Jarno
@ 2009-10-29 13:45           ` Juha.Riihimaki
  2009-10-29 13:52             ` Laurent Desnogues
  0 siblings, 1 reply; 33+ messages in thread
From: Juha.Riihimaki @ 2009-10-29 13:45 UTC (permalink / raw)
  To: aurelien; +Cc: laurent.desnogues, qemu-devel

On Oct 26, 2009, at 23:05, ext Aurelien Jarno wrote:

> On Mon, Oct 26, 2009 at 10:11:07AM +0100, Laurent Desnogues wrote:
>> On Mon, Oct 26, 2009 at 8:46 AM,  <Juha.Riihimaki@nokia.com> wrote:
>>>
>>> On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote:
>>>
>>>> I don't really like the idea of having tcg_qemu_ld/st not factored
>>>> in some place, as it makes memory access tracing extensions
>>>> more intrusive.
>>>>
>>>> This brings us back to the problem having function freeing tmps.
>>>> In that case, you could perhaps create a gen_st16_dont_free
>>>> function as a temporary workaround?
>>>
>>> I could, but it is getting ugly =/ How about if I create another  
>>> patch
>>> that moves the temporary variable (de)allocation outside the  
>>> gen_ldx/
>>> gen_stx functions and then refactor this patch on top of that?
>>
>> I'd personally like this, but I guess you'd better wait for some  
>> inputs
>> from other QEMU dev's before doing it.
>
> Looks like the best option to me.

Alrighty then, I did the patch against the latest git and it's rather  
large... but seems to have broken nothing at least in my testing. The  
patch will remove all implicit tcg temp variable allocation and  
deallocation in target-arm/translate.c and make it the responsibility  
of the calling function. At the same time I also removed the new_tmp  
and dead_tmp functions completely because I see no point in only  
tracking some of the 32bit temp variables instead of everything.  
Personally I think the patch makes reading and understanding (and why  
not also writing) the file much easier. I do also have a version that  
has a compile time option of substituting the tcg temp variable alloc/ 
dealloc function calls with inline functions that track the usage but  
this is not included with the patch.

I'll send the patch shortly, should be nice bed-time reading I  
guess ;) Please comment when you have free time to read it through...


Cheers,
Juha

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
  2009-10-29 13:45           ` Juha.Riihimaki
@ 2009-10-29 13:52             ` Laurent Desnogues
  0 siblings, 0 replies; 33+ messages in thread
From: Laurent Desnogues @ 2009-10-29 13:52 UTC (permalink / raw)
  To: Juha.Riihimaki; +Cc: qemu-devel, aurelien

On Thu, Oct 29, 2009 at 2:45 PM,  <Juha.Riihimaki@nokia.com> wrote:
[...]
>
> Alrighty then, I did the patch against the latest git and it's rather
> large... but seems to have broken nothing at least in my testing. The
> patch will remove all implicit tcg temp variable allocation and
> deallocation in target-arm/translate.c and make it the responsibility
> of the calling function. At the same time I also removed the new_tmp
> and dead_tmp functions completely because I see no point in only
> tracking some of the 32bit temp variables instead of everything.
> Personally I think the patch makes reading and understanding (and why
> not also writing) the file much easier. I do also have a version that
> has a compile time option of substituting the tcg temp variable alloc/
> dealloc function calls with inline functions that track the usage but
> this is not included with the patch.

I was thinking about alloc/dealloc problems and I think a good way
of detecting issues is to exercise all target instructions.  It can be
done easily with some bit trickery for targets with regular encodings.
I have already done so for a new target and this helps a lot;  it only
required a few dozens lines of code, but it's hacky :-)

> I'll send the patch shortly, should be nice bed-time reading I
> guess ;) Please comment when you have free time to read it through...

Nice!


Laurent

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2009-10-29 13:52 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-24 12:18 [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes juha.riihimaki
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 01/10] target-arm: fix neon vshrn/vrshrn ops juha.riihimaki
2009-10-25 10:58   ` Laurent Desnogues
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 02/10] target-arm: add support for neon vld1.64/vst1.64 instructions juha.riihimaki
2009-10-25 11:11   ` Laurent Desnogues
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 03/10] target-arm: allow modifying vfp fpexc en bit only juha.riihimaki
2009-10-25 12:23   ` Laurent Desnogues
2009-10-26  7:32     ` Juha.Riihimaki
2009-10-26  9:08       ` Laurent Desnogues
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 04/10] target-arm: optimize vfp load/store multiple ops juha.riihimaki
2009-10-24 17:36   ` Laurent Desnogues
2009-10-27  8:42   ` Aurelien Jarno
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm " juha.riihimaki
2009-10-27  8:39   ` Aurelien Jarno
2009-10-27  8:48     ` Juha.Riihimaki
2009-10-27  9:00       ` Aurelien Jarno
2009-10-27  9:05         ` Juha.Riihimaki
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 06/10] target-arm: fix neon vsri, vshl and vsli ops juha.riihimaki
2009-10-24 17:44   ` Laurent Desnogues
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 07/10] target-arm: optimize thumb2 load/store multiple ops juha.riihimaki
2009-10-24 17:32   ` Laurent Desnogues
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 08/10] target-arm: optimize thumb push/pop ops juha.riihimaki
2009-10-24 17:34   ` Laurent Desnogues
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops juha.riihimaki
2009-10-25 14:01   ` Laurent Desnogues
2009-10-26  7:46     ` Juha.Riihimaki
2009-10-26  9:11       ` Laurent Desnogues
2009-10-26 21:05         ` Aurelien Jarno
2009-10-29 13:45           ` Juha.Riihimaki
2009-10-29 13:52             ` Laurent Desnogues
2009-10-24 12:19 ` [Qemu-devel] [PATCH v2 10/10] target-arm: fix neon shift helper functions juha.riihimaki
2009-10-25 12:16   ` Laurent Desnogues
2009-10-25 19:17 ` [Qemu-devel] [PATCH v2 00/10] target-arm: miscellaneous fixes Aurelien Jarno

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).