* [PATCH 0/3] target/arm: Fix LDRD, STRD atomicity, fault behaviour
@ 2025-02-27 14:27 Peter Maydell
2025-02-27 14:27 ` [PATCH 1/3] target/arm: Correct LDRD atomicity and " Peter Maydell
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Peter Maydell @ 2025-02-27 14:27 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Our LDRD and STRD implementations have a couple of bugs:
* if the LDRD address is 4-aligned, the load crosses a page boundary,
the second load faults, and the first load was to the base register
(as in cases like "ldrd r2, r3, [r2]"), then we must not update the
base register before taking the fault. Our current implementation
does a sequence of "32-bit load; write to Rt; 32-bit load; write to
Rt2", so it mishandles this kind of insn (see the sketch below).
* if the address is 8-aligned the access must be a 64-bit
single-copy atomic access, not two 32-bit accesses.
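For illustration, here is a sketch of that first failure mode (the
page size and addresses are invented for the example):

  /*
   * Assume 4K pages, r2 == 0x0ffc, and the page at 0x1000 unmapped:
   *
   *     ldrd r2, r3, [r2]
   *
   * The old "32-bit load; write Rt; 32-bit load; write Rt2" sequence
   * writes the first word into r2 and then faults on the second load
   * at 0x1000. When the guest maps the page and restarts the insn,
   * r2 no longer holds the original address, so the retried LDRD
   * loads from the wrong place.
   */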
This patch series fixes both of these bugs, and then drops an
argument to some utility functions that is no longer needed after
the first two changes.
Note for reviewers: please check that I got the MemOp right:
I believe that MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN expresses
"8-aligned addresses should be 64-bit atomic, 4-aligned addresses
should be 32-bit atomic, less-aligned addresses fault" but I'm
not sure if I've correctly understood MO_ATOM_SUBALIGN.
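To spell out the reading I'm assuming, as a sketch ("addr" being the
guest address of the access):

  MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN;
  /*
   * addr % 8 == 0: one 64-bit single-copy atomic access
   * addr % 4 == 0: two 32-bit single-copy atomic accesses
   * otherwise:     alignment fault, from MO_ALIGN_4
   */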
Thanks to Stu Grossman for reporting the page-boundary-crossing
fault bug, which prompted me to look a bit closer at the code and
notice that we weren't doing the atomicity right either.
thanks
-- PMM
Peter Maydell (3):
target/arm: Correct LDRD atomicity and fault behaviour
target/arm: Correct STRD atomicity
target/arm: Drop unused address_offset from op_addr_{rr,ri}_post()
target/arm/tcg/translate.c | 137 +++++++++++++++++++++++--------------
1 file changed, 84 insertions(+), 53 deletions(-)
--
2.43.0
* [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour
2025-02-27 14:27 [PATCH 0/3] target/arm: Fix LDRD, STRD atomicity, fault behaviour Peter Maydell
@ 2025-02-27 14:27 ` Peter Maydell
2025-02-27 17:40 ` Richard Henderson
2025-02-27 14:27 ` [PATCH 2/3] target/arm: Correct STRD atomicity Peter Maydell
2025-02-27 14:27 ` [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post() Peter Maydell
2 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2025-02-27 14:27 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Our LDRD implementation is wrong in two respects:
* if the address is 4-aligned and the load crosses a page boundary
and the second load faults and the first load was to the
base register (as in cases like "ldrd r2, r3, [r2]"), then we
must not update the base register before taking the fault
* if the address is 8-aligned the access must be a 64-bit
single-copy atomic access, not two 32-bit accesses
Rewrite the handling of the loads in LDRD to use a single
tcg_gen_qemu_ld_i64() and split the result into the destination
registers. This allows us to get the atomicity requirements
right, and also implicitly means that we won't update the
base register too early for the page-crossing case.
Note that because we no longer increment 'addr' by 4 in the course of
performing the LDRD we must change the adjustment value we pass to
op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
subtract 4 to get the correct value to use if doing base register
writeback.
STRD has the same problem with not getting the atomicity right;
we will deal with that in the following commit.
Cc: qemu-stable@nongnu.org
Reported-by: Stu Grossman <stu.grossman@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/tcg/translate.c | 64 ++++++++++++++++++++++++--------------
1 file changed, 40 insertions(+), 24 deletions(-)
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index d8225b77c8c..e10a1240c17 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -5003,10 +5003,43 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
return true;
}
+static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
+{
+ /*
+ * LDRD is required to be an atomic 64-bit access if the
+ * address is 8-aligned, two atomic 32-bit accesses if
+ * it's only 4-aligned, and to give an alignment fault
+ * if it's not 4-aligned.
+ * Rt is always the word from the lower address, and Rt2 the
+ * data from the higher address, regardless of endianness.
+ * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
+ * so we don't get its SCTLR_B check, and instead do a 64-bit access
+ * using MO_BE if appropriate and then split the two halves.
+ *
+ * This also gives us the correct behaviour of not updating
+ * rt if the load of rt2 faults; this is required for cases
+ * like "ldrd r2, r3, [r2]" where rt is also the base register.
+ */
+ int mem_idx = get_mem_index(s);
+ MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
+ TCGv taddr = gen_aa32_addr(s, addr, opc);
+ TCGv_i64 t64 = tcg_temp_new_i64();
+ TCGv_i32 tmp = tcg_temp_new_i32();
+ TCGv_i32 tmp2 = tcg_temp_new_i32();
+
+ tcg_gen_qemu_ld_i64(t64, taddr, mem_idx, opc);
+ if (s->be_data == MO_BE) {
+ tcg_gen_extr_i64_i32(tmp2, tmp, t64);
+ } else {
+ tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+ }
+ store_reg(s, rt, tmp);
+ store_reg(s, rt2, tmp2);
+}
+
static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
{
- int mem_idx = get_mem_index(s);
- TCGv_i32 addr, tmp;
+ TCGv_i32 addr;
if (!ENABLE_ARCH_5TE) {
return false;
@@ -5017,18 +5050,10 @@ static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
}
addr = op_addr_rr_pre(s, a);
- tmp = tcg_temp_new_i32();
- gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
- store_reg(s, a->rt, tmp);
-
- tcg_gen_addi_i32(addr, addr, 4);
-
- tmp = tcg_temp_new_i32();
- gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
- store_reg(s, a->rt + 1, tmp);
+ do_ldrd_load(s, addr, a->rt, a->rt + 1);
/* LDRD w/ base writeback is undefined if the registers overlap. */
- op_addr_rr_post(s, a, addr, -4);
+ op_addr_rr_post(s, a, addr, 0);
return true;
}
@@ -5152,23 +5177,14 @@ static bool op_store_ri(DisasContext *s, arg_ldst_ri *a,
static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
{
- int mem_idx = get_mem_index(s);
- TCGv_i32 addr, tmp;
+ TCGv_i32 addr;
addr = op_addr_ri_pre(s, a);
- tmp = tcg_temp_new_i32();
- gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
- store_reg(s, a->rt, tmp);
-
- tcg_gen_addi_i32(addr, addr, 4);
-
- tmp = tcg_temp_new_i32();
- gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
- store_reg(s, rt2, tmp);
+ do_ldrd_load(s, addr, a->rt, rt2);
/* LDRD w/ base writeback is undefined if the registers overlap. */
- op_addr_ri_post(s, a, addr, -4);
+ op_addr_ri_post(s, a, addr, 0);
return true;
}
--
2.43.0
* [PATCH 2/3] target/arm: Correct STRD atomicity
2025-02-27 14:27 [PATCH 0/3] target/arm: Fix LDRD, STRD atomicity, fault behaviour Peter Maydell
2025-02-27 14:27 ` [PATCH 1/3] target/arm: Correct LDRD atomicity and " Peter Maydell
@ 2025-02-27 14:27 ` Peter Maydell
2025-02-27 17:42 ` Richard Henderson
2025-02-27 14:27 ` [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post() Peter Maydell
2 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2025-02-27 14:27 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Our STRD implementation doesn't correctly implement the requirement:
* if the address is 8-aligned the access must be a 64-bit
single-copy atomic access, not two 32-bit accesses
Rewrite the handling of STRD to use a single tcg_gen_qemu_st_i64()
of a value produced by concatenating the two 32-bit source registers.
This allows us to get the atomicity right.
As with the LDRD change, now that we don't update 'addr' in the
course of performing the store we need to adjust the offset
we pass to op_addr_ri_post() and op_addr_rr_post().
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/tcg/translate.c | 55 ++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 20 deletions(-)
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index e10a1240c17..2020d18f019 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -5057,10 +5057,38 @@ static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
return true;
}
+static void do_strd_store(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
+{
+ /*
+ * STRD is required to be an atomic 64-bit access if the
+ * address is 8-aligned, two atomic 32-bit accesses if
+ * it's only 4-aligned, and to give an alignment fault
+ * if it's not 4-aligned.
+ * Rt is always the word for the lower address, and Rt2 the
+ * data for the higher address, regardless of endianness.
+ * So (like gen_store_exclusive) we avoid gen_aa32_st_i64()
+ * so we don't get its SCTLR_B check, and instead do a 64-bit access
+ * using MO_BE if appropriate, using a value constructed
+ * by putting the two halves together in the right order.
+ */
+ int mem_idx = get_mem_index(s);
+ MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
+ TCGv taddr = gen_aa32_addr(s, addr, opc);
+ TCGv_i32 t1 = load_reg(s, rt);
+ TCGv_i32 t2 = load_reg(s, rt2);
+ TCGv_i64 t64 = tcg_temp_new_i64();
+
+ if (s->be_data == MO_BE) {
+ tcg_gen_concat_i32_i64(t64, t2, t1);
+ } else {
+ tcg_gen_concat_i32_i64(t64, t1, t2);
+ }
+ tcg_gen_qemu_st_i64(t64, taddr, mem_idx, opc);
+}
+
static bool trans_STRD_rr(DisasContext *s, arg_ldst_rr *a)
{
- int mem_idx = get_mem_index(s);
- TCGv_i32 addr, tmp;
+ TCGv_i32 addr;
if (!ENABLE_ARCH_5TE) {
return false;
@@ -5071,15 +5099,9 @@ static bool trans_STRD_rr(DisasContext *s, arg_ldst_rr *a)
}
addr = op_addr_rr_pre(s, a);
- tmp = load_reg(s, a->rt);
- gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
+ do_strd_store(s, addr, a->rt, a->rt + 1);
- tcg_gen_addi_i32(addr, addr, 4);
-
- tmp = load_reg(s, a->rt + 1);
- gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-
- op_addr_rr_post(s, a, addr, -4);
+ op_addr_rr_post(s, a, addr, 0);
return true;
}
@@ -5207,20 +5229,13 @@ static bool trans_LDRD_ri_t32(DisasContext *s, arg_ldst_ri2 *a)
static bool op_strd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
{
- int mem_idx = get_mem_index(s);
- TCGv_i32 addr, tmp;
+ TCGv_i32 addr;
addr = op_addr_ri_pre(s, a);
- tmp = load_reg(s, a->rt);
- gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
+ do_strd_store(s, addr, a->rt, rt2);
- tcg_gen_addi_i32(addr, addr, 4);
-
- tmp = load_reg(s, rt2);
- gen_aa32_st_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN);
-
- op_addr_ri_post(s, a, addr, -4);
+ op_addr_ri_post(s, a, addr, 0);
return true;
}
--
2.43.0
* [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post()
2025-02-27 14:27 [PATCH 0/3] target/arm: Fix LDRD, STRD atomicity, fault behaviour Peter Maydell
2025-02-27 14:27 ` [PATCH 1/3] target/arm: Correct LDRD atomicity and " Peter Maydell
2025-02-27 14:27 ` [PATCH 2/3] target/arm: Correct STRD atomicity Peter Maydell
@ 2025-02-27 14:27 ` Peter Maydell
2025-02-27 17:43 ` Richard Henderson
2025-02-27 22:23 ` Philippe Mathieu-Daudé
2 siblings, 2 replies; 11+ messages in thread
From: Peter Maydell @ 2025-02-27 14:27 UTC (permalink / raw)
To: qemu-arm, qemu-devel
All the callers of op_addr_rr_post() and op_addr_ri_post() now pass in
zero for the address_offset, so we can remove that argument.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/tcg/translate.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index 2020d18f019..bd3838d68e3 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -4941,7 +4941,7 @@ static TCGv_i32 op_addr_rr_pre(DisasContext *s, arg_ldst_rr *a)
}
static void op_addr_rr_post(DisasContext *s, arg_ldst_rr *a,
- TCGv_i32 addr, int address_offset)
+ TCGv_i32 addr)
{
if (!a->p) {
TCGv_i32 ofs = load_reg(s, a->rm);
@@ -4954,7 +4954,6 @@ static void op_addr_rr_post(DisasContext *s, arg_ldst_rr *a,
} else if (!a->w) {
return;
}
- tcg_gen_addi_i32(addr, addr, address_offset);
store_reg(s, a->rn, addr);
}
@@ -4974,7 +4973,7 @@ static bool op_load_rr(DisasContext *s, arg_ldst_rr *a,
* Perform base writeback before the loaded value to
* ensure correct behavior with overlapping index registers.
*/
- op_addr_rr_post(s, a, addr, 0);
+ op_addr_rr_post(s, a, addr);
store_reg_from_load(s, a->rt, tmp);
return true;
}
@@ -4999,7 +4998,7 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
gen_aa32_st_i32(s, tmp, addr, mem_idx, mop);
disas_set_da_iss(s, mop, issinfo);
- op_addr_rr_post(s, a, addr, 0);
+ op_addr_rr_post(s, a, addr);
return true;
}
@@ -5053,7 +5052,7 @@ static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a)
do_ldrd_load(s, addr, a->rt, a->rt + 1);
/* LDRD w/ base writeback is undefined if the registers overlap. */
- op_addr_rr_post(s, a, addr, 0);
+ op_addr_rr_post(s, a, addr);
return true;
}
@@ -5101,7 +5100,7 @@ static bool trans_STRD_rr(DisasContext *s, arg_ldst_rr *a)
do_strd_store(s, addr, a->rt, a->rt + 1);
- op_addr_rr_post(s, a, addr, 0);
+ op_addr_rr_post(s, a, addr);
return true;
}
@@ -5137,13 +5136,14 @@ static TCGv_i32 op_addr_ri_pre(DisasContext *s, arg_ldst_ri *a)
}
static void op_addr_ri_post(DisasContext *s, arg_ldst_ri *a,
- TCGv_i32 addr, int address_offset)
+ TCGv_i32 addr)
{
+ int address_offset = 0;
if (!a->p) {
if (a->u) {
- address_offset += a->imm;
+ address_offset = a->imm;
} else {
- address_offset -= a->imm;
+ address_offset = -a->imm;
}
} else if (!a->w) {
return;
@@ -5168,7 +5168,7 @@ static bool op_load_ri(DisasContext *s, arg_ldst_ri *a,
* Perform base writeback before the loaded value to
* ensure correct behavior with overlapping index registers.
*/
- op_addr_ri_post(s, a, addr, 0);
+ op_addr_ri_post(s, a, addr);
store_reg_from_load(s, a->rt, tmp);
return true;
}
@@ -5193,7 +5193,7 @@ static bool op_store_ri(DisasContext *s, arg_ldst_ri *a,
gen_aa32_st_i32(s, tmp, addr, mem_idx, mop);
disas_set_da_iss(s, mop, issinfo);
- op_addr_ri_post(s, a, addr, 0);
+ op_addr_ri_post(s, a, addr);
return true;
}
@@ -5206,7 +5206,7 @@ static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
do_ldrd_load(s, addr, a->rt, rt2);
/* LDRD w/ base writeback is undefined if the registers overlap. */
- op_addr_ri_post(s, a, addr, 0);
+ op_addr_ri_post(s, a, addr);
return true;
}
@@ -5235,7 +5235,7 @@ static bool op_strd_ri(DisasContext *s, arg_ldst_ri *a, int rt2)
do_strd_store(s, addr, a->rt, rt2);
- op_addr_ri_post(s, a, addr, 0);
+ op_addr_ri_post(s, a, addr);
return true;
}
--
2.43.0
* Re: [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour
2025-02-27 14:27 ` [PATCH 1/3] target/arm: Correct LDRD atomicity and " Peter Maydell
@ 2025-02-27 17:40 ` Richard Henderson
2025-02-27 17:58 ` Peter Maydell
0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2025-02-27 17:40 UTC (permalink / raw)
To: qemu-devel
On 2/27/25 06:27, Peter Maydell wrote:
> Our LDRD implementation is wrong in two respects:
>
> * if the address is 4-aligned and the load crosses a page boundary
> and the second load faults and the first load was to the
> base register (as in cases like "ldrd r2, r3, [r2]", then we
> must not update the base register before taking the fault
> * if the address is 8-aligned the access must be a 64-bit
> single-copy atomic access, not two 32-bit accesses
>
> Rewrite the handling of the loads in LDRD to use a single
> tcg_gen_qemu_ld_i64() and split the result into the destination
> registers. This allows us to get the atomicity requirements
> right, and also implicitly means that we won't update the
> base register too early for the page-crossing case.
>
> Note that because we no longer increment 'addr' by 4 in the course of
> performing the LDRD we must change the adjustment value we pass to
> op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
> subtract 4 to get the correct value to use if doing base register
> writeback.
>
> STRD has the same problem with not getting the atomicity right;
> we will deal with that in the following commit.
>
> Cc: qemu-stable@nongnu.org
> Reported-by: Stu Grossman <stu.grossman@gmail.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> target/arm/tcg/translate.c | 64 ++++++++++++++++++++++++--------------
> 1 file changed, 40 insertions(+), 24 deletions(-)
>
> diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
> index d8225b77c8c..e10a1240c17 100644
> --- a/target/arm/tcg/translate.c
> +++ b/target/arm/tcg/translate.c
> @@ -5003,10 +5003,43 @@ static bool op_store_rr(DisasContext *s, arg_ldst_rr *a,
> return true;
> }
>
> +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> +{
> + /*
> + * LDRD is required to be an atomic 64-bit access if the
> + * address is 8-aligned, two atomic 32-bit accesses if
> + * it's only 4-aligned, and to give an alignment fault
> + * if it's not 4-aligned.
> + * Rt is always the word from the lower address, and Rt2 the
> + * data from the higher address, regardless of endianness.
> + * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> + * so we don't get its SCTLR_B check, and instead do a 64-bit access
> + * using MO_BE if appropriate and then split the two halves.
> + *
> + * This also gives us the correct behaviour of not updating
> + * rt if the load of rt2 faults; this is required for cases
> + * like "ldrd r2, r3, [r2]" where rt is also the base register.
> + */
> + int mem_idx = get_mem_index(s);
> + MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
The 64-bit atomicity begins with armv7 + LPAE, and is not present for any M-profile.
Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?
Getting 2 x 4-byte atomicity, but not requiring 8-byte atomicity, would use
MO_ATOM_IFALIGN_PAIR.
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 2/3] target/arm: Correct STRD atomicity
2025-02-27 14:27 ` [PATCH 2/3] target/arm: Correct STRD atomicity Peter Maydell
@ 2025-02-27 17:42 ` Richard Henderson
0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2025-02-27 17:42 UTC (permalink / raw)
To: qemu-devel
On 2/27/25 06:27, Peter Maydell wrote:
> Our STRD implementation doesn't correctly implement the requirement:
> * if the address is 8-aligned the access must be a 64-bit
> single-copy atomic access, not two 32-bit accesses
>
> Rewrite the handling of STRD to use a single tcg_gen_qemu_st_i64()
> of a value produced by concatenating the two 32 bit source registers.
> This allows us to get the atomicity right.
>
> As with the LDRD change, now that we don't update 'addr' in the
> course of performing the store we need to adjust the offset
> we pass to op_addr_ri_post() and op_addr_rr_post().
>
> Cc:qemu-stable@nongnu.org
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/tcg/translate.c | 55 ++++++++++++++++++++++++--------------
> 1 file changed, 35 insertions(+), 20 deletions(-)
Modulo the LPAE comment vs patch 1,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post()
2025-02-27 14:27 ` [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post() Peter Maydell
@ 2025-02-27 17:43 ` Richard Henderson
2025-02-27 22:23 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2025-02-27 17:43 UTC (permalink / raw)
To: qemu-devel
On 2/27/25 06:27, Peter Maydell wrote:
> All the callers of op_addr_rr_post() and op_addr_ri_post() now pass in
> zero for the address_offset, so we can remove that argument.
>
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
> target/arm/tcg/translate.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour
2025-02-27 17:40 ` Richard Henderson
@ 2025-02-27 17:58 ` Peter Maydell
2025-02-28 0:18 ` Richard Henderson
0 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2025-02-27 17:58 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Thu, 27 Feb 2025 at 17:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/27/25 06:27, Peter Maydell wrote:
> > +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> > +{
> > + /*
> > + * LDRD is required to be an atomic 64-bit access if the
> > + * address is 8-aligned, two atomic 32-bit accesses if
> > > + * it's only 4-aligned, and to give an alignment fault
> > + * if it's not 4-aligned.
> > + * Rt is always the word from the lower address, and Rt2 the
> > + * data from the higher address, regardless of endianness.
> > + * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> > + * so we don't get its SCTLR_B check, and instead do a 64-bit access
> > + * using MO_BE if appropriate and then split the two halves.
> > + *
> > + * This also gives us the correct behaviour of not updating
> > + * rt if the load of rt2 faults; this is required for cases
> > + * like "ldrd r2, r3, [r2]" where rt is also the base register.
> > + */
> > + int mem_idx = get_mem_index(s);
> > + MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
>
> The 64-bit atomicity begins with armv7 + LPAE, and is not present for any M-profile.
> Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?
>
> Getting 2 x 4-byte atomicity, but not requiring 8-byte atomicity, would use
> MO_ATOM_IFALIGN_PAIR.
Definitely worth a comment at minimum. Do we generate better
code for MO_ATOM_IFALIGN_PAIR? (If not, then providing higher
atomicity than the architecture mandates seems harmless.)
For the comment in memop.h that currently reads
* MO_ATOM_SUBALIGN: the operation is single-copy atomic by parts
* by the alignment. E.g. if the address is 0 mod 4, then each
* 4-byte subobject is single-copy atomic.
* This is the atomicity e.g. of IBM Power.
maybe we could expand the e.g:
E.g. if an 8-byte value is accessed at an address which is 0 mod 8,
then the whole 8-byte access is single-copy atomic; otherwise,
if it is accessed at 0 mod 4 then each 4-byte subobject is
single-copy atomic; otherwise if it is accessed at 0 mod 2
then the four 2-byte subobjects are single-copy atomic.
? I wasn't sure when reading what we currently have whether
it provided the 8-byte-aligned guarantee, rather than merely
the 4-byte-aligned one.
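Spelled out as a sketch for an 8-byte access at address "a" (this is
my reading, not necessarily what the code currently guarantees):

  if (a % 8 == 0) {
      /* one 8-byte single-copy atomic access */
  } else if (a % 4 == 0) {
      /* two 4-byte single-copy atomic accesses */
  } else if (a % 2 == 0) {
      /* four 2-byte single-copy atomic accesses */
  } else {
      /* byte accesses only; no multi-byte atomicity */
  }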
thanks
-- PMM
* Re: [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post()
2025-02-27 14:27 ` [PATCH 3/3] target/arm: Drop unused address_offset from op_addr_{rr, ri}_post() Peter Maydell
2025-02-27 17:43 ` Richard Henderson
@ 2025-02-27 22:23 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 11+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-02-27 22:23 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 27/2/25 15:27, Peter Maydell wrote:
> All the callers of op_addr_rr_post() and op_addr_ri_post() now pass in
> zero for the address_offset, so we can remove that argument.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> target/arm/tcg/translate.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour
2025-02-27 17:58 ` Peter Maydell
@ 2025-02-28 0:18 ` Richard Henderson
2025-02-28 9:37 ` Peter Maydell
0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2025-02-28 0:18 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel
On 2/27/25 09:58, Peter Maydell wrote:
> On Thu, 27 Feb 2025 at 17:41, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 2/27/25 06:27, Peter Maydell wrote:
>>> +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
>>> +{
>>> + /*
>>> + * LDRD is required to be an atomic 64-bit access if the
>>> + * address is 8-aligned, two atomic 32-bit accesses if
>>> + * it's only 4-aligned, and to give an alignment fault
>>> + * if it's not 4-aligned.
>>> + * Rt is always the word from the lower address, and Rt2 the
>>> + * data from the higher address, regardless of endianness.
>>> + * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
>>> + * so we don't get its SCTLR_B check, and instead do a 64-bit access
>>> + * using MO_BE if appropriate and then split the two halves.
>>> + *
>>> + * This also gives us the correct behaviour of not updating
>>> + * rt if the load of rt2 faults; this is required for cases
>>> + * like "ldrd r2, r3, [r2]" where rt is also the base register.
>>> + */
>>> + int mem_idx = get_mem_index(s);
>>> + MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
>>
>> The 64-bit atomicity begins with armv7 + LPAE, and is not present for any M-profile.
>> Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?
>>
>> Getting 2 x 4-byte atomicity, but not requiring 8-byte atomicity, would use
>> MO_ATOM_IFALIGN_PAIR.
>
> Definitely worth a comment at minimum. Do we generate better
> code for MO_ATOM_IFALIGN_PAIR? (If not, then providing higher
> atomicity than the architecture mandates seems harmless.)
We could, but currently do not, generate better code for IFALIGN_PAIR for MO_64.
Currently the only place we take special care is for MO_128.
> For the comment in memop.h that currently reads
> * MO_ATOM_SUBALIGN: the operation is single-copy atomic by parts
> * by the alignment. E.g. if the address is 0 mod 4, then each
> * 4-byte subobject is single-copy atomic.
> * This is the atomicity e.g. of IBM Power.
>
> maybe we could expand the e.g:
>
> E.g. if an 8-byte value is accessed at an address which is 0 mod 8,
> then the whole 8-byte access is single-copy atomic; otherwise,
> if it is accessed at 0 mod 4 then each 4-byte subobject is
> single-copy atomic; otherwise if it is accessed at 0 mod 2
> then the four 2-byte subobjects are single-copy atomic.
>
> ?
Yes, that's correct.
> I wasn't sure when reading what we currently have whether
> it provided the 8-byte-aligned guarantee, rather than merely
> the 4-byte-aligned one.
I was trying to highlight the difference between SUBALIGN and IFALIGN, and perhaps didn't
do an adequate job of it.
r~
* Re: [PATCH 1/3] target/arm: Correct LDRD atomicity and fault behaviour
2025-02-28 0:18 ` Richard Henderson
@ 2025-02-28 9:37 ` Peter Maydell
0 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2025-02-28 9:37 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Fri, 28 Feb 2025 at 00:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/27/25 09:58, Peter Maydell wrote:
> > On Thu, 27 Feb 2025 at 17:41, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> On 2/27/25 06:27, Peter Maydell wrote:
> >>> +static void do_ldrd_load(DisasContext *s, TCGv_i32 addr, int rt, int rt2)
> >>> +{
> >>> + /*
> >>> + * LDRD is required to be an atomic 64-bit access if the
> >>> + * address is 8-aligned, two atomic 32-bit accesses if
> >>> + * it's only 4-aligned, and to give an alignment fault
> >>> + * if it's not 4-aligned.
> >>> + * Rt is always the word from the lower address, and Rt2 the
> >>> + * data from the higher address, regardless of endianness.
> >>> + * So (like gen_load_exclusive) we avoid gen_aa32_ld_i64()
> >>> + * so we don't get its SCTLR_B check, and instead do a 64-bit access
> >>> + * using MO_BE if appropriate and then split the two halves.
> >>> + *
> >>> + * This also gives us the correct behaviour of not updating
> >>> + * rt if the load of rt2 faults; this is required for cases
> >>> + * like "ldrd r2, r3, [r2]" where rt is also the base register.
> >>> + */
> >>> + int mem_idx = get_mem_index(s);
> >>> + MemOp opc = MO_64 | MO_ALIGN_4 | MO_ATOM_SUBALIGN | s->be_data;
> >>
> >> The 64-bit atomicity begins with armv7 + LPAE, and is not present for any M-profile.
> >> Worth checking ARM_FEATURE_LPAE, or at least adding to the comment?
> >>
> >> Getting 2 x 4-byte atomicity, but not requiring 8-byte atomicity, would use
> >> MO_ATOM_IFALIGN_PAIR.
> >
> > Definitely worth a comment at minimum. Do we generate better
> > code for MO_ATOM_IFALIGN_PAIR ? (If not, then providing higher
> > atomicity than the architecture mandates seems harmless.)
>
> We could, but currently do not, generate better code for IFALIGN_PAIR for MO_64.
> Currently the only place we take special care is for MO_128.
OK, in that case I'll just add to the comment:
* For M-profile, and for A-profile before LPAE, the 64-bit
* atomicity is not required. We could model that using
* the looser MO_ATOM_IFALIGN_PAIR, but providing a higher
* level of atomicity than required is harmless (we would not
* currently generate better code for IFALIGN_PAIR here).
> > For the comment in memop.h that currently reads
> > * MO_ATOM_SUBALIGN: the operation is single-copy atomic by parts
> > * by the alignment. E.g. if the address is 0 mod 4, then each
> > * 4-byte subobject is single-copy atomic.
> > * This is the atomicity e.g. of IBM Power.
> >
> > maybe we could expand the e.g:
> >
> > E.g. if an 8-byte value is accessed at an address which is 0 mod 8,
> > then the whole 8-byte access is single-copy atomic; otherwise,
> > if it is accessed at 0 mod 4 then each 4-byte subobject is
> > single-copy atomic; otherwise if it is accessed at 0 mod 2
> > then the four 2-byte subobjects are single-copy atomic.
> >
> > ?
>
> Yes, that's correct.
>
> > I wasn't sure when reading what we currently have whether
> > it provided the 8-byte-aligned guarantee, rather than merely
> > the 4-byte-aligned one.
>
> I was trying to highlight the difference between SUBALIGN and IFALIGN, and perhaps didn't
> do an adequate job of it.
I reviewed the comment patch, and presumably wasn't confused
at the time because I'd just read through the corresponding
code changes :-)
I'll send in a patch adjusting the comment to expand the example.
thanks
-- PMM