* [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
@ 2016-12-02 17:34 Alex Bennée
2016-12-05 11:09 ` Alex Bennée
0 siblings, 1 reply; 8+ messages in thread
From: Alex Bennée @ 2016-12-02 17:34 UTC (permalink / raw)
To: stefanha, peter.maydell; +Cc: qemu-devel, cota, Alex Bennée, open list:ARM
While testing rth's latest TCG patches with risu I found ldaxp was
broken. Investigating further I found it was broken by 1dd089d0 when
the cmpxchg atomic work was merged. As part of that change the code
attempted to be clever by doing a single 64 bit load and then shuffling
the data around to set the two 32 bit registers.
As I couldn't quite follow the endian magic I've simply partially
reverted the change, restoring the original gen_load_exclusive code.
This doesn't affect the cmpxchg functionality, as that is all done in
the gen_store_exclusive path, which is untouched.
I've also restored the comment that was removed (with a slight tweak
to mention cmpxchg).
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
target-arm/translate-a64.c | 42 +++++++++++++++++++-----------------------
1 file changed, 19 insertions(+), 23 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index de48747..6dc27a6 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1839,41 +1839,37 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
}
}
+/*
+ * Load/Store exclusive instructions are implemented by remembering
+ * the value/address loaded, and seeing if these are the same
+ * when the store is performed. This is not actually the architecturally
+ * mandated semantics, but it works for typical guest code sequences
+ * and avoids having to monitor regular stores.
+ *
+ * The store exclusive uses the atomic cmpxchg primitives to avoid
+ * races in multi-threaded linux-user and when MTTCG softmmu is
+ * enabled.
+ */
static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
TCGv_i64 addr, int size, bool is_pair)
{
TCGv_i64 tmp = tcg_temp_new_i64();
- TCGMemOp be = s->be_data;
+ TCGMemOp memop = s->be_data + size;
g_assert(size <= 3);
+ tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), memop);
+
if (is_pair) {
+ TCGv_i64 addr2 = tcg_temp_new_i64();
TCGv_i64 hitmp = tcg_temp_new_i64();
- if (size == 3) {
- TCGv_i64 addr2 = tcg_temp_new_i64();
-
- tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
- MO_64 | MO_ALIGN_16 | be);
- tcg_gen_addi_i64(addr2, addr, 8);
- tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s),
- MO_64 | MO_ALIGN | be);
- tcg_temp_free_i64(addr2);
- } else {
- g_assert(size == 2);
- tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s),
- MO_64 | MO_ALIGN | be);
- if (be == MO_LE) {
- tcg_gen_extr32_i64(tmp, hitmp, tmp);
- } else {
- tcg_gen_extr32_i64(hitmp, tmp, tmp);
- }
- }
-
+ g_assert(size >= 2);
+ tcg_gen_addi_i64(addr2, addr, 1 << size);
+ tcg_gen_qemu_ld_i64(hitmp, addr2, get_mem_index(s), memop);
+ tcg_temp_free_i64(addr2);
tcg_gen_mov_i64(cpu_exclusive_high, hitmp);
tcg_gen_mov_i64(cpu_reg(s, rt2), hitmp);
tcg_temp_free_i64(hitmp);
- } else {
- tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), size | MO_ALIGN | be);
}
tcg_gen_mov_i64(cpu_exclusive_val, tmp);
--
2.10.2
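The scheme the restored comment describes, remembering the loaded address and value and letting the store-exclusive succeed only if a cmpxchg still sees that value, can be sketched outside of TCG as plain C. All of the names below are illustrative, not QEMU's:

```c
#include <assert.h>
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical sketch of the load/store-exclusive scheme: the load
 * records the address and value; the matching store-exclusive only
 * succeeds if a cmpxchg at that address still sees the recorded
 * value. This is not the architecturally mandated semantics, but it
 * works for typical guest code sequences. */

static uint64_t exclusive_addr;
static uint64_t exclusive_val;

static uint64_t load_exclusive(_Atomic uint64_t *mem, uint64_t addr)
{
    exclusive_addr = addr;
    exclusive_val = atomic_load(mem);
    return exclusive_val;
}

/* Returns 0 on success, 1 on failure (matching AArch64 STXR). */
static int store_exclusive(_Atomic uint64_t *mem, uint64_t addr,
                           uint64_t newval)
{
    uint64_t expected = exclusive_val;
    if (addr != exclusive_addr) {
        return 1;
    }
    return atomic_compare_exchange_strong(mem, &expected, newval) ? 0 : 1;
}
```

Note this only changes the store side; the load side, which this patch reverts, is an ordinary load plus the bookkeeping.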
* Re: [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
2016-12-02 17:34 [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive Alex Bennée
@ 2016-12-05 11:09 ` Alex Bennée
2016-12-05 15:42 ` Richard Henderson
2016-12-05 17:22 ` Richard Henderson
0 siblings, 2 replies; 8+ messages in thread
From: Alex Bennée @ 2016-12-05 11:09 UTC (permalink / raw)
To: stefanha, peter.maydell
Cc: qemu-devel, cota, open list:ARM, Paolo Bonzini, Richard Henderson
Alex Bennée <alex.bennee@linaro.org> writes:
> While testing rth's latest TCG patches with risu I found ldaxp was
> broken. Investigating further I found it was broken by 1dd089d0 when
> the cmpxchg atomic work was merged.
CC'ing Paolo/Richard
Do you guys have any final TCG fixes planned for 2.8 that can take this
fix for the ldaxp regression?
> As part of that change the code
> attempted to be clever by doing a single 64 bit load and then shuffling
> the data around to set the two 32 bit registers.
>
> As I couldn't quite follow the endian magic I've simply partially
> reverted the change, restoring the original gen_load_exclusive code.
> This doesn't affect the cmpxchg functionality, as that is all done in
> the gen_store_exclusive path, which is untouched.
>
> I've also restored the comment that was removed (with a slight tweak
> to mention cmpxchg).
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> [...]
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
2016-12-05 11:09 ` Alex Bennée
@ 2016-12-05 15:42 ` Richard Henderson
2016-12-05 15:55 ` Peter Maydell
2016-12-05 17:04 ` Alex Bennée
2016-12-05 17:22 ` Richard Henderson
1 sibling, 2 replies; 8+ messages in thread
From: Richard Henderson @ 2016-12-05 15:42 UTC (permalink / raw)
To: Alex Bennée, stefanha, peter.maydell
Cc: qemu-devel, cota, open list:ARM, Paolo Bonzini
On 12/05/2016 03:09 AM, Alex Bennée wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> While testing rth's latest TCG patches with risu I found ldaxp was
>> broken. Investigating further I found it was broken by 1dd089d0 when
>> the cmpxchg atomic work was merged.
>
> CC'ing Paolo/Richard
>
> Do you guys have any final TCG fixes planned for 2.8 that can take this
> fix for the ldaxp regression?
I don't have any pending patches for 2.8, no.
>> As part of that change the code
>> attempted to be clever by doing a single 64 bit load and then shuffle
>> the data around to set the two 32 bit registers.
It's not trying to be "clever"; it's trying to be correct, giving an atomic
64-bit load.
The fact that we didn't attempt an atomic 128-bit load here for the 64-bit pair
is a bug. We have support for that in helper_atomic_ldo_le_mmu; I'm not sure
why I didn't use that here.
>> As I couldn't quite follow the endian magic I've simply partially
>> reverted the change to the original code gen_load_exclusive code. This
>> doesn't affect the cmpxchg functionality as that is all done on in
>> gen_store_exclusive part which is untouched.
We'll have to fix it properly eventually. But perhaps 2.9 is soon enough,
since that's when mttcg will go in.
r~
* Re: [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
2016-12-05 15:42 ` Richard Henderson
@ 2016-12-05 15:55 ` Peter Maydell
2016-12-05 17:04 ` Alex Bennée
1 sibling, 0 replies; 8+ messages in thread
From: Peter Maydell @ 2016-12-05 15:55 UTC (permalink / raw)
To: Richard Henderson
Cc: Alex Bennée, Stefan Hajnoczi, QEMU Developers,
Emilio G. Cota, open list:ARM, Paolo Bonzini
On 5 December 2016 at 15:42, Richard Henderson <rth@twiddle.net> wrote:
> We'll have to fix it properly eventually. But perhaps 2.9 is soon enough,
> since that's when mttcg will go in.
Could you provide a reviewed-by (or an explicit nak) for Alex's
patch, please? I didn't look at this area of the code when it
went in, so I'm not really in a position to review it even if it
goes via my tree.
thanks
-- PMM
* Re: [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
2016-12-05 15:42 ` Richard Henderson
2016-12-05 15:55 ` Peter Maydell
@ 2016-12-05 17:04 ` Alex Bennée
2016-12-05 17:15 ` Richard Henderson
1 sibling, 1 reply; 8+ messages in thread
From: Alex Bennée @ 2016-12-05 17:04 UTC (permalink / raw)
To: Richard Henderson
Cc: stefanha, peter.maydell, qemu-devel, cota, open list:ARM,
Paolo Bonzini
Richard Henderson <rth@twiddle.net> writes:
> On 12/05/2016 03:09 AM, Alex Bennée wrote:
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> While testing rth's latest TCG patches with risu I found ldaxp was
>>> broken. Investigating further I found it was broken by 1dd089d0 when
>>> the cmpxchg atomic work was merged.
>>
>> CC'ing Paolo/Richard
>>
>> Do you guys have any final TCG fixes planned for 2.8 that can take this
>> fix for the ldaxp regression?
>
> I don't have any pending patches for 2.8, no.
>
>>> As part of that change the code
>>> attempted to be clever by doing a single 64 bit load and then shuffling
>>> the data around to set the two 32 bit registers.
>
> It's not trying to be "clever", it's trying to be correct, giving an atomic
> 64-bit load.
Ahh right, I see. What happens if the backend is 32-bit? Will it issue
two loads anyway?
>
> The fact that we didn't attempt an atomic 128-bit load here for the 64-bit pair
> is a bug. We have support for that in helper_atomic_ldo_le_mmu; I'm not sure
> why I didn't use that here.
>
>>> As I couldn't quite follow the endian magic I've simply partially
>>> reverted the change, restoring the original gen_load_exclusive code.
>>> This doesn't affect the cmpxchg functionality, as that is all done in
>>> the gen_store_exclusive path, which is untouched.
>
> We'll have to fix it properly eventually. But perhaps 2.9 is soon enough,
> since that's when mttcg will go in.
I guess the easiest way would be to punt these paired loads to a helper?
However, regardless of the atomicity, the actual values loaded into
the registers are wrong, so that behaviour is a regression vs 2.7.
I don't know how often these load-exclusive paired operations are used
in real code, but I think we should at least fix it so the values are
loaded properly for 2.8 and do the proper atomic fix for 2.9.
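For reference, the endian shuffle at issue can be shown in plain C: after a single 64-bit load of the pair, which 32-bit half lands in which register depends on the guest's endianness, which is why the extr32 operands in 1dd089d0 were swapped for MO_LE. A hypothetical sketch, not QEMU code:

```c
#include <assert.h>
#include <stdint.h>

/* Split one 64-bit load of a 32-bit pair into (rt, rt2), where rt is
 * loaded from the lower address. On a little-endian guest the
 * lower-addressed word is the low half of the 64-bit value; on a
 * big-endian guest it is the high half. */
static void split_pair(uint64_t v, int little_endian,
                       uint32_t *rt, uint32_t *rt2)
{
    if (little_endian) {
        *rt  = (uint32_t)v;          /* low half sits at the lower address */
        *rt2 = (uint32_t)(v >> 32);
    } else {
        *rt  = (uint32_t)(v >> 32);  /* high half sits at the lower address */
        *rt2 = (uint32_t)v;
    }
}
```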
>
>
> r~
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
2016-12-05 17:04 ` Alex Bennée
@ 2016-12-05 17:15 ` Richard Henderson
0 siblings, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2016-12-05 17:15 UTC (permalink / raw)
To: Alex Bennée
Cc: stefanha, peter.maydell, qemu-devel, cota, open list:ARM,
Paolo Bonzini
On 12/05/2016 09:04 AM, Alex Bennée wrote:
>> It's not trying to be "clever", it's trying to be correct, giving an atomic
>> 64-bit load.
>
> Ahh right I see. What happens if the backend is 32bit, will it issue two
> loads anyway?
Yes. I did bring this up when the atomics patch set went in.
In principle there's no reason we can't handle this like the 128-bit path, with
a special helper for a 32-bit host using __atomic_load_64 if available, which
it would be for i686 (via fpu) and armv7 (via ldrexd), or
gen_helper_exit_atomic if not.
But it's nasty and slow, and work that's unnecessary if we simply decide that
32-bit hosts cannot run mttcg.
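A helper along those lines could look roughly like the following. __atomic_load_n is the real GCC/Clang builtin; the function name and the overall shape are illustrative only:

```c
#include <stdint.h>

/* Sketch: a 64-bit load that stays atomic even on 32-bit hosts, where
 * a plain read would otherwise be split into two 32-bit loads. On
 * i686 the compiler can emit this via the FPU, on ARMv7 via ldrexd;
 * elsewhere it may fall back to a lock, which real code would avoid
 * by exiting to a helper instead. */
static uint64_t atomic_ld64(const uint64_t *p)
{
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}
```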
> I don't know how often these load-exclusive paired operations are used
> in real code but I think we should at least fix it so the values are
> loaded properly for 2.8 and do the proper atomic fix for 2.9.
Yep.
r~
* Re: [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
2016-12-05 11:09 ` Alex Bennée
2016-12-05 15:42 ` Richard Henderson
@ 2016-12-05 17:22 ` Richard Henderson
2016-12-05 18:00 ` Peter Maydell
1 sibling, 1 reply; 8+ messages in thread
From: Richard Henderson @ 2016-12-05 17:22 UTC (permalink / raw)
To: Alex Bennée, stefanha, peter.maydell
Cc: Paolo Bonzini, cota, qemu-devel, open list:ARM
On 12/05/2016 03:09 AM, Alex Bennée wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> While testing rth's latest TCG patches with risu I found ldaxp was
>> broken. Investigating further I found it was broken by 1dd089d0 when
>> the cmpxchg atomic work was merged.
>
> CC'ing Paolo/Richard
>
> Do you guys have any final TCG fixes planned for 2.8 that can take this
> fix for the ldaxp regression?
>
>> [...]
Acked-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH for-2.8] target-arm/translate-a64: fix gen_load_exclusive
2016-12-05 17:22 ` Richard Henderson
@ 2016-12-05 18:00 ` Peter Maydell
0 siblings, 0 replies; 8+ messages in thread
From: Peter Maydell @ 2016-12-05 18:00 UTC (permalink / raw)
To: Richard Henderson
Cc: Alex Bennée, Stefan Hajnoczi, Paolo Bonzini, Emilio G. Cota,
QEMU Developers, open list:ARM
On 5 December 2016 at 17:22, Richard Henderson <rth@twiddle.net> wrote:
> Acked-by: Richard Henderson <rth@twiddle.net>
Thanks. I've put this into a one-patch pull request which I've just
sent out and which will hopefully make -rc3.
thanks
-- PMM