[Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes
@ 2011-03-15 16:26 Peter Maydell
  2011-03-15 16:26 ` [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single " Peter Maydell
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Peter Maydell @ 2011-03-15 16:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

This patchset fixes various bugs in the implementation of the Neon
instructions loading a single element to all lanes.

The primary bug is that VLD1's "single element to all lanes" form differs
from those for VLD2, VLD3 and VLD4 in that bit 5 indicates whether the
loaded element should be written to one or two Dregs, rather than being a
register stride. (This is the issue addressed by Meego commit 6f3b4ee,
although my choice of fix is different.)

We were also incorrectly UNDEFfing VLD4.32 with 16 byte alignment
specifier, failing to UNDEF for invalid size and alignment combinations,
and leaking a TCG temporary on the UNDEF code paths.

Tested via the usual random instruction set testing, and also with the
valgrind 'neon64' test case.

Peter Maydell (2):
  target-arm: Fix VLD of single element to all lanes
  target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space

 target-arm/translate.c |   92 ++++++++++++++++++++++++++++++++++--------------
 1 files changed, 65 insertions(+), 27 deletions(-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single element to all lanes
  2011-03-15 16:26 [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Peter Maydell
@ 2011-03-15 16:26 ` Peter Maydell
  2011-03-15 16:26 ` [Qemu-devel] [PATCH 2/2] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
  2011-04-01 20:34 ` [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Aurelien Jarno
  2 siblings, 0 replies; 4+ messages in thread
From: Peter Maydell @ 2011-03-15 16:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

Fix several bugs in VLD of single element to all lanes:

The "single element to all lanes" form of VLD1 differs from those for
VLD2, VLD3 and VLD4 in that bit 5 indicates whether the loaded element
should be written to one or two Dregs (rather than being a register
stride). Handle this by special-casing VLD1 rather than trying to
have one loop which deals with both VLD1 and 2/3/4.

Handle VLD4.32 with 16 byte alignment specified, rather than UNDEFfing.

UNDEF for the invalid size and alignment combinations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate.c |   84 +++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 062de5e..bef807d 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -2648,6 +2648,28 @@ static void gen_neon_dup_high16(TCGv var)
     tcg_temp_free_i32(tmp);
 }
 
+static TCGv gen_load_and_replicate(DisasContext *s, TCGv addr, int size)
+{
+    /* Load a single Neon element and replicate into a 32 bit TCG reg */
+    TCGv tmp;
+    switch (size) {
+    case 0:
+        tmp = gen_ld8u(addr, IS_USER(s));
+        gen_neon_dup_u8(tmp, 0);
+        break;
+    case 1:
+        tmp = gen_ld16u(addr, IS_USER(s));
+        gen_neon_dup_low16(tmp);
+        break;
+    case 2:
+        tmp = gen_ld32(addr, IS_USER(s));
+        break;
+    default: /* Avoid compiler warnings.  */
+        abort();
+    }
+    return tmp;
+}
+
 /* Disassemble a VFP instruction.  Returns nonzero if an error occured
    (ie. an undefined instruction).  */
 static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
@@ -3890,36 +3912,48 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
         size = (insn >> 10) & 3;
         if (size == 3) {
             /* Load single element to all lanes.  */
-            if (!load)
+            int a = (insn >> 4) & 1;
+            if (!load) {
                 return 1;
+            }
             size = (insn >> 6) & 3;
             nregs = ((insn >> 8) & 3) + 1;
-            stride = (insn & (1 << 5)) ? 2 : 1;
-            load_reg_var(s, addr, rn);
-            for (reg = 0; reg < nregs; reg++) {
-                switch (size) {
-                case 0:
-                    tmp = gen_ld8u(addr, IS_USER(s));
-                    gen_neon_dup_u8(tmp, 0);
-                    break;
-                case 1:
-                    tmp = gen_ld16u(addr, IS_USER(s));
-                    gen_neon_dup_low16(tmp);
-                    break;
-                case 2:
-                    tmp = gen_ld32(addr, IS_USER(s));
-                    break;
-                case 3:
+
+            if (size == 3) {
+                if (nregs != 4 || a == 0) {
                     return 1;
-                default: /* Avoid compiler warnings.  */
-                    abort();
                 }
-                tcg_gen_addi_i32(addr, addr, 1 << size);
-                tmp2 = tcg_temp_new_i32();
-                tcg_gen_mov_i32(tmp2, tmp);
-                neon_store_reg(rd, 0, tmp2);
-                neon_store_reg(rd, 1, tmp);
-                rd += stride;
+                /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
+                size = 2;
+            }
+            if (nregs == 1 && a == 1 && size == 0) {
+                return 1;
+            }
+            if (nregs == 3 && a == 1) {
+                return 1;
+            }
+            load_reg_var(s, addr, rn);
+            if (nregs == 1) {
+                /* VLD1 to all lanes: bit 5 indicates how many Dregs to write */
+                tmp = gen_load_and_replicate(s, addr, size);
+                tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0));
+                tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1));
+                if (insn & (1 << 5)) {
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 0));
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 1));
+                }
+                tcg_temp_free_i32(tmp);
+            } else {
+                /* VLD2/3/4 to all lanes: bit 5 indicates register stride */
+                stride = (insn & (1 << 5)) ? 2 : 1;
+                for (reg = 0; reg < nregs; reg++) {
+                    tmp = gen_load_and_replicate(s, addr, size);
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0));
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1));
+                    tcg_temp_free_i32(tmp);
+                    tcg_gen_addi_i32(addr, addr, 1 << size);
+                    rd += stride;
+                }
             }
             stride = (1 << size) * nregs;
         } else {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Qemu-devel] [PATCH 2/2] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space
  2011-03-15 16:26 [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Peter Maydell
  2011-03-15 16:26 ` [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single " Peter Maydell
@ 2011-03-15 16:26 ` Peter Maydell
  2011-04-01 20:34 ` [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Aurelien Jarno
  2 siblings, 0 replies; 4+ messages in thread
From: Peter Maydell @ 2011-03-15 16:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

Move the allocation and freeing of the TCG temp used for the address for
Neon load/store instructions so that we don't allocate the temporary
until we've done enough decoding to know that the instruction is not
an UNDEF pattern; this avoids leaking the TCG temp in these cases.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index bef807d..5e87740 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3810,7 +3810,6 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
     rn = (insn >> 16) & 0xf;
     rm = insn & 0xf;
     load = (insn & (1 << 21)) != 0;
-    addr = tcg_temp_new_i32();
     if ((insn & (1 << 23)) == 0) {
         /* Load store all elements.  */
         op = (insn >> 8) & 0xf;
@@ -3822,6 +3821,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
         spacing = neon_ls_element_type[op].spacing;
         if (size == 3 && (interleave | spacing) != 1)
             return 1;
+        addr = tcg_temp_new_i32();
         load_reg_var(s, addr, rn);
         stride = (1 << size) * interleave;
         for (reg = 0; reg < nregs; reg++) {
@@ -3907,6 +3907,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
             }
             rd += spacing;
         }
+        tcg_temp_free_i32(addr);
         stride = nregs * 8;
     } else {
         size = (insn >> 10) & 3;
@@ -3932,6 +3933,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
             if (nregs == 3 && a == 1) {
                 return 1;
             }
+            addr = tcg_temp_new_i32();
             load_reg_var(s, addr, rn);
             if (nregs == 1) {
                 /* VLD1 to all lanes: bit 5 indicates how many Dregs to write */
@@ -3955,6 +3957,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     rd += stride;
                 }
             }
+            tcg_temp_free_i32(addr);
             stride = (1 << size) * nregs;
         } else {
             /* Single element.  */
@@ -3976,6 +3979,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 abort();
             }
             nregs = ((insn >> 8) & 3) + 1;
+            addr = tcg_temp_new_i32();
             load_reg_var(s, addr, rn);
             for (reg = 0; reg < nregs; reg++) {
                 if (load) {
@@ -4017,10 +4021,10 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 rd += stride;
                 tcg_gen_addi_i32(addr, addr, 1 << size);
             }
+            tcg_temp_free_i32(addr);
             stride = nregs * (1 << size);
         }
     }
-    tcg_temp_free_i32(addr);
     if (rm != 15) {
         TCGv base;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes
  2011-03-15 16:26 [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Peter Maydell
  2011-03-15 16:26 ` [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single " Peter Maydell
  2011-03-15 16:26 ` [Qemu-devel] [PATCH 2/2] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
@ 2011-04-01 20:34 ` Aurelien Jarno
  2 siblings, 0 replies; 4+ messages in thread
From: Aurelien Jarno @ 2011-04-01 20:34 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, patches

On Tue, Mar 15, 2011 at 04:26:50PM +0000, Peter Maydell wrote:
> This patchset fixes various bugs in the implementation of the Neon
> instructions loading a single element to all lanes.
> 
> The primary bug is that VLD1's "single element to all lanes" form differs
> from those for VLD2, VLD3 and VLD4 in that bit 5 indicates whether the
> loaded element should be written to one or two Dregs, rather than being a
> register stride. (This is the issue addressed by Meego commit 6f3b4ee,
> although my choice of fix is different.)
> 
> We were also incorrectly UNDEFfing VLD4.32 with 16 byte alignment
> specifier, failing to UNDEF for invalid size and alignment combinations,
> and leaking a TCG temporary on the UNDEF code paths.
> 
> Tested via the usual random instruction set testing, and also with the
> valgrind 'neon64' test case.
> 
> Peter Maydell (2):
>   target-arm: Fix VLD of single element to all lanes
>   target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space
> 
>  target-arm/translate.c |   92 ++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 65 insertions(+), 27 deletions(-)
> 

Thanks, both applied.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-04-01 20:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-15 16:26 [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Peter Maydell
2011-03-15 16:26 ` [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single " Peter Maydell
2011-03-15 16:26 ` [Qemu-devel] [PATCH 2/2] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
2011-04-01 20:34 ` [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Aurelien Jarno

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).