* [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single element to all lanes
2011-03-15 16:26 [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Peter Maydell
@ 2011-03-15 16:26 ` Peter Maydell
2011-03-15 16:26 ` [Qemu-devel] [PATCH 2/2] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
2011-04-01 20:34 ` [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Aurelien Jarno
2 siblings, 0 replies; 4+ messages in thread
From: Peter Maydell @ 2011-03-15 16:26 UTC (permalink / raw)
To: qemu-devel; +Cc: patches
Fix several bugs in VLD of single element to all lanes:
The "single element to all lanes" form of VLD1 differs from those for
VLD2, VLD3 and VLD4 in that bit 5 indicates whether the loaded element
should be written to one or two Dregs (rather than being a register
stride). Handle this by special-casing VLD1 rather than trying to
have one loop which deals with both VLD1 and 2/3/4.
Handle VLD4.32 with 16 byte alignment specified, rather than UNDEFfing.
UNDEF for the invalid size and alignment combinations.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate.c | 84 +++++++++++++++++++++++++++++++++--------------
1 files changed, 59 insertions(+), 25 deletions(-)
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 062de5e..bef807d 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -2648,6 +2648,28 @@ static void gen_neon_dup_high16(TCGv var)
tcg_temp_free_i32(tmp);
}
+static TCGv gen_load_and_replicate(DisasContext *s, TCGv addr, int size)
+{
+ /* Load a single Neon element and replicate into a 32 bit TCG reg */
+ TCGv tmp;
+ switch (size) {
+ case 0:
+ tmp = gen_ld8u(addr, IS_USER(s));
+ gen_neon_dup_u8(tmp, 0);
+ break;
+ case 1:
+ tmp = gen_ld16u(addr, IS_USER(s));
+ gen_neon_dup_low16(tmp);
+ break;
+ case 2:
+ tmp = gen_ld32(addr, IS_USER(s));
+ break;
+ default: /* Avoid compiler warnings. */
+ abort();
+ }
+ return tmp;
+}
+
/* Disassemble a VFP instruction. Returns nonzero if an error occured
(ie. an undefined instruction). */
static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
@@ -3890,36 +3912,48 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
size = (insn >> 10) & 3;
if (size == 3) {
/* Load single element to all lanes. */
- if (!load)
+ int a = (insn >> 4) & 1;
+ if (!load) {
return 1;
+ }
size = (insn >> 6) & 3;
nregs = ((insn >> 8) & 3) + 1;
- stride = (insn & (1 << 5)) ? 2 : 1;
- load_reg_var(s, addr, rn);
- for (reg = 0; reg < nregs; reg++) {
- switch (size) {
- case 0:
- tmp = gen_ld8u(addr, IS_USER(s));
- gen_neon_dup_u8(tmp, 0);
- break;
- case 1:
- tmp = gen_ld16u(addr, IS_USER(s));
- gen_neon_dup_low16(tmp);
- break;
- case 2:
- tmp = gen_ld32(addr, IS_USER(s));
- break;
- case 3:
+
+ if (size == 3) {
+ if (nregs != 4 || a == 0) {
return 1;
- default: /* Avoid compiler warnings. */
- abort();
}
- tcg_gen_addi_i32(addr, addr, 1 << size);
- tmp2 = tcg_temp_new_i32();
- tcg_gen_mov_i32(tmp2, tmp);
- neon_store_reg(rd, 0, tmp2);
- neon_store_reg(rd, 1, tmp);
- rd += stride;
+ /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
+ size = 2;
+ }
+ if (nregs == 1 && a == 1 && size == 0) {
+ return 1;
+ }
+ if (nregs == 3 && a == 1) {
+ return 1;
+ }
+ load_reg_var(s, addr, rn);
+ if (nregs == 1) {
+ /* VLD1 to all lanes: bit 5 indicates how many Dregs to write */
+ tmp = gen_load_and_replicate(s, addr, size);
+ tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0));
+ tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1));
+ if (insn & (1 << 5)) {
+ tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 0));
+ tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 1));
+ }
+ tcg_temp_free_i32(tmp);
+ } else {
+ /* VLD2/3/4 to all lanes: bit 5 indicates register stride */
+ stride = (insn & (1 << 5)) ? 2 : 1;
+ for (reg = 0; reg < nregs; reg++) {
+ tmp = gen_load_and_replicate(s, addr, size);
+ tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0));
+ tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1));
+ tcg_temp_free_i32(tmp);
+ tcg_gen_addi_i32(addr, addr, 1 << size);
+ rd += stride;
+ }
}
stride = (1 << size) * nregs;
} else {
--
1.7.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [Qemu-devel] [PATCH 2/2] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space
2011-03-15 16:26 [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Peter Maydell
2011-03-15 16:26 ` [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single " Peter Maydell
@ 2011-03-15 16:26 ` Peter Maydell
2011-04-01 20:34 ` [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Aurelien Jarno
2 siblings, 0 replies; 4+ messages in thread
From: Peter Maydell @ 2011-03-15 16:26 UTC (permalink / raw)
To: qemu-devel; +Cc: patches
Move the allocation and freeing of the TCG temp used for the address for
Neon load/store instructions so that we don't allocate the temporary
until we've done enough decoding to know that the instruction is not
an UNDEF pattern; this avoids leaking the TCG temp in these cases.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/target-arm/translate.c b/target-arm/translate.c
index bef807d..5e87740 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3810,7 +3810,6 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
rn = (insn >> 16) & 0xf;
rm = insn & 0xf;
load = (insn & (1 << 21)) != 0;
- addr = tcg_temp_new_i32();
if ((insn & (1 << 23)) == 0) {
/* Load store all elements. */
op = (insn >> 8) & 0xf;
@@ -3822,6 +3821,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
spacing = neon_ls_element_type[op].spacing;
if (size == 3 && (interleave | spacing) != 1)
return 1;
+ addr = tcg_temp_new_i32();
load_reg_var(s, addr, rn);
stride = (1 << size) * interleave;
for (reg = 0; reg < nregs; reg++) {
@@ -3907,6 +3907,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
}
rd += spacing;
}
+ tcg_temp_free_i32(addr);
stride = nregs * 8;
} else {
size = (insn >> 10) & 3;
@@ -3932,6 +3933,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
if (nregs == 3 && a == 1) {
return 1;
}
+ addr = tcg_temp_new_i32();
load_reg_var(s, addr, rn);
if (nregs == 1) {
/* VLD1 to all lanes: bit 5 indicates how many Dregs to write */
@@ -3955,6 +3957,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
rd += stride;
}
}
+ tcg_temp_free_i32(addr);
stride = (1 << size) * nregs;
} else {
/* Single element. */
@@ -3976,6 +3979,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
abort();
}
nregs = ((insn >> 8) & 3) + 1;
+ addr = tcg_temp_new_i32();
load_reg_var(s, addr, rn);
for (reg = 0; reg < nregs; reg++) {
if (load) {
@@ -4017,10 +4021,10 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
rd += stride;
tcg_gen_addi_i32(addr, addr, 1 << size);
}
+ tcg_temp_free_i32(addr);
stride = nregs * (1 << size);
}
}
- tcg_temp_free_i32(addr);
if (rm != 15) {
TCGv base;
--
1.7.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes
2011-03-15 16:26 [Qemu-devel] [PATCH 0/2] ARM: fix VLD of one element to all lanes Peter Maydell
2011-03-15 16:26 ` [Qemu-devel] [PATCH 1/2] target-arm: Fix VLD of single " Peter Maydell
2011-03-15 16:26 ` [Qemu-devel] [PATCH 2/2] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
@ 2011-04-01 20:34 ` Aurelien Jarno
2 siblings, 0 replies; 4+ messages in thread
From: Aurelien Jarno @ 2011-04-01 20:34 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, patches
On Tue, Mar 15, 2011 at 04:26:50PM +0000, Peter Maydell wrote:
> This patchset fixes various bugs in the implementation of the Neon
> instructions loading a single element to all lanes.
>
> The primary bug is that VLD1's "single element to all lanes" form differs
> from those for VLD2, VLD3 and VLD4 in that bit 5 indicates whether the
> loaded element should be written to one or two Dregs, rather than being a
> register stride. (This is the issue addressed by Meego commit 6f3b4ee,
> although my choice of fix is different.)
>
> We were also incorrectly UNDEFfing VLD4.32 with 16 byte alignment
> specifier, failing to UNDEF for invalid size and alignment combinations,
> and leaking a TCG temporary on the UNDEF code paths.
>
> Tested via the usual random instruction set testing, and also with the
> valgrind 'neon64' test case.
>
> Peter Maydell (2):
> target-arm: Fix VLD of single element to all lanes
> target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space
>
> target-arm/translate.c | 92 ++++++++++++++++++++++++++++++++++--------------
> 1 files changed, 65 insertions(+), 27 deletions(-)
>
Thanks, both applied.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 4+ messages in thread