* [PATCH v1 00/19] target/arm: Implement FEAT_LSE2
@ 2023-02-16 3:08 Richard Henderson
2023-02-16 3:08 ` [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits Richard Henderson
` (18 more replies)
0 siblings, 19 replies; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Based-on: 20230216025739.1211680-1-richard.henderson@linaro.org
("[PATCH v2 00/30] tcg: Improve atomicity support")
Testing has not been extensive, but it does boot and run stuff.
Suggestions for actually testing atomicity solicited.
I would imagine it would have to involve -semihosting...
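A minimal sketch of the sort of user-mode test that might do it
(purely illustrative, not part of this series; file name and structure
are invented): one thread keeps rewriting an aligned 16-byte buffer
with STP while another re-reads it with LDP and fails on a torn value,
which FEAT_LSE2 forbids for a pair contained in a 16-byte granule.

    /* pair-tear.c: illustrative LSE2 atomicity check, not from this series. */
    #include <inttypes.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    static _Alignas(16) uint64_t buf[2];

    static void *writer(void *arg)
    {
        (void)arg;
        for (;;) {
            /* Alternate two recognizable patterns, one STP each. */
            asm volatile("stp %0, %0, [%1]"
                         : : "r"(0x1111111111111111ull), "r"(buf) : "memory");
            asm volatile("stp %0, %0, [%1]"
                         : : "r"(0x2222222222222222ull), "r"(buf) : "memory");
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t thr;
        pthread_create(&thr, NULL, writer, NULL);

        for (long i = 0; i < 100000000; i++) {
            uint64_t lo, hi;
            asm volatile("ldp %0, %1, [%2]"
                         : "=r"(lo), "=r"(hi) : "r"(buf) : "memory");
            if (lo != hi) {
                printf("torn pair: %016" PRIx64 " %016" PRIx64 "\n", lo, hi);
                return 1;
            }
        }
        printf("no tearing observed\n");
        return 0;
    }

Built with something like aarch64-linux-gnu-gcc -O2 -pthread and run
under qemu-aarch64 -cpu max, a "torn pair" report would point at the
LDP/STP path not honouring the 16-byte atomicity.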
r~
Richard Henderson (19):
target/arm: Make cpu_exclusive_high hold the high bits
target/arm: Use tcg_gen_qemu_ld_i128 for LDXP
target/arm: Use tcg_gen_qemu_{st,ld}_i128 for do_fp_{st,ld}
target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G
target/arm: Use tcg_gen_qemu_{ld,st}_i128 in gen_sve_{ld,st}r
target/arm: Sink gen_mte_check1 into load/store_exclusive
target/arm: Add feature test for FEAT_LSE2
target/arm: Add atom_data to DisasContext
target/arm: Load/store integer pair with one tcg operation
target/arm: Hoist finalize_memop out of do_gpr_{ld,st}
target/arm: Hoist finalize_memop out of do_fp_{ld,st}
target/arm: Pass memop to gen_mte_check1*
target/arm: Pass single_memop to gen_mte_checkN
target/arm: Check alignment in helper_mte_check
target/arm: Add SCTLR.nAA to TBFLAG_A64
target/arm: Relax ordered/atomic alignment checks for LSE2
target/arm: Move mte check for store-exclusive
test/tcg/multiarch: Adjust sigbus.c
target/arm: Enable FEAT_LSE2 for -cpu max
docs/system/arm/emulation.rst | 1 +
target/arm/cpu.h | 8 +-
target/arm/helper-a64.h | 3 +
target/arm/internals.h | 3 +-
target/arm/translate-a64.h | 4 +-
target/arm/translate.h | 16 +-
target/arm/cpu64.c | 1 +
target/arm/helper-a64.c | 7 +
target/arm/helper.c | 6 +
target/arm/mte_helper.c | 18 ++
target/arm/translate-a64.c | 521 ++++++++++++++++++++++------------
target/arm/translate-sve.c | 118 ++++++--
target/arm/translate.c | 1 +
tests/tcg/aarch64/mte-7.c | 3 +-
tests/tcg/multiarch/sigbus.c | 13 +-
15 files changed, 498 insertions(+), 225 deletions(-)
--
2.34.1
* [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:14 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 02/19] target/arm: Use tcg_gen_qemu_ld_i128 for LDXP Richard Henderson
` (17 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
We currently treat cpu_exclusive_high as containing the second
doubleword of LDXP, even though that doubleword is not the "high"
half of the 128-bit value in big-endian mode.  Swap things around
so that it always is.
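In other words, as a minimal summary of the new invariant (my own
illustration, not text from the patch):

    /*
     * For LDXP Xt, Xt2, [A]:
     *   cpu_exclusive_val  = bits [63:0]   of the 128-bit value at [A]
     *   cpu_exclusive_high = bits [127:64] of the 128-bit value at [A]
     * little-endian: val comes from [A],   high from [A+8];
     * big-endian:    val comes from [A+8], high from [A].
     */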
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 54 ++++++++++++++++++++------------------
1 file changed, 29 insertions(+), 25 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index da9f877476..78a2141224 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2547,17 +2547,28 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
} else {
/* The pair must be single-copy atomic for *each* doubleword, not
the entire quadword, however it must be quadword aligned. */
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+
memop |= MO_64;
- tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx,
- memop | MO_ALIGN_16);
+ tcg_gen_qemu_ld_i64(t0, addr, idx, memop | MO_ALIGN_16);
- TCGv_i64 addr2 = tcg_temp_new_i64();
- tcg_gen_addi_i64(addr2, addr, 8);
- tcg_gen_qemu_ld_i64(cpu_exclusive_high, addr2, idx, memop);
- tcg_temp_free_i64(addr2);
+ tcg_gen_addi_i64(t1, addr, 8);
+ tcg_gen_qemu_ld_i64(t1, t1, idx, memop);
- tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val);
- tcg_gen_mov_i64(cpu_reg(s, rt2), cpu_exclusive_high);
+ if (s->be_data == MO_LE) {
+ tcg_gen_mov_i64(cpu_exclusive_val, t0);
+ tcg_gen_mov_i64(cpu_exclusive_high, t1);
+ } else {
+ tcg_gen_mov_i64(cpu_exclusive_high, t0);
+ tcg_gen_mov_i64(cpu_exclusive_val, t1);
+ }
+
+ tcg_gen_mov_i64(cpu_reg(s, rt), t0);
+ tcg_gen_mov_i64(cpu_reg(s, rt2), t1);
+
+ tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
}
} else {
memop |= size | MO_ALIGN;
@@ -2604,36 +2615,29 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
} else {
TCGv_i128 t16 = tcg_temp_new_i128();
TCGv_i128 c16 = tcg_temp_new_i128();
- TCGv_i64 a, b;
+ TCGv_i64 lo, hi;
if (s->be_data == MO_LE) {
tcg_gen_concat_i64_i128(t16, cpu_reg(s, rt), cpu_reg(s, rt2));
- tcg_gen_concat_i64_i128(c16, cpu_exclusive_val,
- cpu_exclusive_high);
} else {
tcg_gen_concat_i64_i128(t16, cpu_reg(s, rt2), cpu_reg(s, rt));
- tcg_gen_concat_i64_i128(c16, cpu_exclusive_high,
- cpu_exclusive_val);
}
+ tcg_gen_concat_i64_i128(c16, cpu_exclusive_val, cpu_exclusive_high);
tcg_gen_atomic_cmpxchg_i128(t16, cpu_exclusive_addr, c16, t16,
get_mem_index(s),
MO_128 | MO_ALIGN | s->be_data);
tcg_temp_free_i128(c16);
- a = tcg_temp_new_i64();
- b = tcg_temp_new_i64();
- if (s->be_data == MO_LE) {
- tcg_gen_extr_i128_i64(a, b, t16);
- } else {
- tcg_gen_extr_i128_i64(b, a, t16);
- }
+ lo = tcg_temp_new_i64();
+ hi = tcg_temp_new_i64();
+ tcg_gen_extr_i128_i64(lo, hi, t16);
- tcg_gen_xor_i64(a, a, cpu_exclusive_val);
- tcg_gen_xor_i64(b, b, cpu_exclusive_high);
- tcg_gen_or_i64(tmp, a, b);
- tcg_temp_free_i64(a);
- tcg_temp_free_i64(b);
+ tcg_gen_xor_i64(lo, lo, cpu_exclusive_val);
+ tcg_gen_xor_i64(hi, hi, cpu_exclusive_high);
+ tcg_gen_or_i64(tmp, lo, hi);
+ tcg_temp_free_i64(lo);
+ tcg_temp_free_i64(hi);
tcg_temp_free_i128(t16);
tcg_gen_setcondi_i64(TCG_COND_NE, tmp, tmp, 0);
--
2.34.1
* [PATCH v1 02/19] target/arm: Use tcg_gen_qemu_ld_i128 for LDXP
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
2023-02-16 3:08 ` [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-16 3:08 ` [PATCH v1 03/19] target/arm: Use tcg_gen_qemu_{st, ld}_i128 for do_fp_{st, ld} Richard Henderson
` (16 subsequent siblings)
18 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
While we don't require 16-byte atomicity here, using
a single larger load simplifies the code, and makes it
a closer match to STXP.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 78a2141224..d7d4b68328 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2545,30 +2545,27 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
tcg_gen_extract_i64(cpu_reg(s, rt2), cpu_exclusive_val, 0, 32);
}
} else {
- /* The pair must be single-copy atomic for *each* doubleword, not
- the entire quadword, however it must be quadword aligned. */
- TCGv_i64 t0 = tcg_temp_new_i64();
- TCGv_i64 t1 = tcg_temp_new_i64();
+ /*
+ * The pair must be single-copy atomic for *each* doubleword, not
+ * the entire quadword, however it must be quadword aligned.
+ * Expose the complete load to tcg, for ease of tlb lookup,
+ * but indicate that only 8-byte atomicity is required.
+ */
+ TCGv_i128 t16 = tcg_temp_new_i128();
- memop |= MO_64;
- tcg_gen_qemu_ld_i64(t0, addr, idx, memop | MO_ALIGN_16);
+ memop |= MO_128 | MO_ALIGN_16 | MO_ATMAX_8;
+ tcg_gen_qemu_ld_i128(t16, addr, idx, memop);
- tcg_gen_addi_i64(t1, addr, 8);
- tcg_gen_qemu_ld_i64(t1, t1, idx, memop);
+ tcg_gen_extr_i128_i64(cpu_exclusive_val, cpu_exclusive_high, t16);
+ tcg_temp_free_i128(t16);
if (s->be_data == MO_LE) {
- tcg_gen_mov_i64(cpu_exclusive_val, t0);
- tcg_gen_mov_i64(cpu_exclusive_high, t1);
+ tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val);
+ tcg_gen_mov_i64(cpu_reg(s, rt2), cpu_exclusive_high);
} else {
- tcg_gen_mov_i64(cpu_exclusive_high, t0);
- tcg_gen_mov_i64(cpu_exclusive_val, t1);
+ tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_high);
+ tcg_gen_mov_i64(cpu_reg(s, rt2), cpu_exclusive_val);
}
-
- tcg_gen_mov_i64(cpu_reg(s, rt), t0);
- tcg_gen_mov_i64(cpu_reg(s, rt2), t1);
-
- tcg_temp_free_i64(t0);
- tcg_temp_free_i64(t1);
}
} else {
memop |= size | MO_ALIGN;
--
2.34.1
* [PATCH v1 03/19] target/arm: Use tcg_gen_qemu_{st, ld}_i128 for do_fp_{st, ld}
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
2023-02-16 3:08 ` [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits Richard Henderson
2023-02-16 3:08 ` [PATCH v1 02/19] target/arm: Use tcg_gen_qemu_ld_i128 for LDXP Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:23 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G Richard Henderson
` (15 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
While we don't require 16-byte atomicity here, using a single
larger operation simplifies the code.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 38 ++++++++++++++------------------------
1 file changed, 14 insertions(+), 24 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d7d4b68328..edf92a728f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -965,25 +965,20 @@ static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, int size)
tcg_gen_ld_i64(tmplo, cpu_env, fp_reg_offset(s, srcidx, MO_64));
- if (size < 4) {
+ if (size < MO_128) {
mop = finalize_memop(s, size);
tcg_gen_qemu_st_i64(tmplo, tcg_addr, get_mem_index(s), mop);
} else {
- bool be = s->be_data == MO_BE;
- TCGv_i64 tcg_hiaddr = tcg_temp_new_i64();
TCGv_i64 tmphi = tcg_temp_new_i64();
+ TCGv_i128 t16 = tcg_temp_new_i128();
tcg_gen_ld_i64(tmphi, cpu_env, fp_reg_hi_offset(s, srcidx));
-
- mop = s->be_data | MO_UQ;
- tcg_gen_qemu_st_i64(be ? tmphi : tmplo, tcg_addr, get_mem_index(s),
- mop | (s->align_mem ? MO_ALIGN_16 : 0));
- tcg_gen_addi_i64(tcg_hiaddr, tcg_addr, 8);
- tcg_gen_qemu_st_i64(be ? tmplo : tmphi, tcg_hiaddr,
- get_mem_index(s), mop);
-
- tcg_temp_free_i64(tcg_hiaddr);
+ tcg_gen_concat_i64_i128(t16, tmplo, tmphi);
tcg_temp_free_i64(tmphi);
+
+ mop = finalize_memop(s, size);
+ tcg_gen_qemu_st_i128(t16, tcg_addr, get_mem_index(s), mop);
+ tcg_temp_free_i128(t16);
}
tcg_temp_free_i64(tmplo);
@@ -999,23 +994,18 @@ static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size)
TCGv_i64 tmphi = NULL;
MemOp mop;
- if (size < 4) {
+ if (size < MO_128) {
mop = finalize_memop(s, size);
tcg_gen_qemu_ld_i64(tmplo, tcg_addr, get_mem_index(s), mop);
} else {
- bool be = s->be_data == MO_BE;
- TCGv_i64 tcg_hiaddr;
+ TCGv_i128 t16 = tcg_temp_new_i128();
+
+ mop = finalize_memop(s, size);
+ tcg_gen_qemu_ld_i128(t16, tcg_addr, get_mem_index(s), mop);
tmphi = tcg_temp_new_i64();
- tcg_hiaddr = tcg_temp_new_i64();
-
- mop = s->be_data | MO_UQ;
- tcg_gen_qemu_ld_i64(be ? tmphi : tmplo, tcg_addr, get_mem_index(s),
- mop | (s->align_mem ? MO_ALIGN_16 : 0));
- tcg_gen_addi_i64(tcg_hiaddr, tcg_addr, 8);
- tcg_gen_qemu_ld_i64(be ? tmplo : tmphi, tcg_hiaddr,
- get_mem_index(s), mop);
- tcg_temp_free_i64(tcg_hiaddr);
+ tcg_gen_extr_i128_i64(tmplo, tmphi, t16);
+ tcg_temp_free_i128(t16);
}
tcg_gen_st_i64(tmplo, cpu_env, fp_reg_offset(s, destidx, MO_64));
--
2.34.1
* [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (2 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 03/19] target/arm: Use tcg_gen_qemu_{st, ld}_i128 for do_fp_{st, ld} Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:24 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 05/19] target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r Richard Henderson
` (14 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
This fixes a bug: these two insns should have been using atomic
16-byte stores, since MTE is ARMv8.5 and LSE2 is mandatory from ARMv8.4.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 18 +++++++++++-------
tests/tcg/aarch64/mte-7.c | 3 +--
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index edf92a728f..b42e5848cc 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4207,16 +4207,20 @@ static void disas_ldst_tag(DisasContext *s, uint32_t insn)
if (is_zero) {
TCGv_i64 clean_addr = clean_data_tbi(s, addr);
- TCGv_i64 tcg_zero = tcg_constant_i64(0);
+ TCGv_i64 zero64 = tcg_constant_i64(0);
+ TCGv_i128 zero128 = tcg_temp_new_i128();
int mem_index = get_mem_index(s);
- int i, n = (1 + is_pair) << LOG2_TAG_GRANULE;
+ MemOp mop = MO_128 | MO_ALIGN;
- tcg_gen_qemu_st_i64(tcg_zero, clean_addr, mem_index,
- MO_UQ | MO_ALIGN_16);
- for (i = 8; i < n; i += 8) {
- tcg_gen_addi_i64(clean_addr, clean_addr, 8);
- tcg_gen_qemu_st_i64(tcg_zero, clean_addr, mem_index, MO_UQ);
+ tcg_gen_concat_i64_i128(zero128, zero64, zero64);
+
+ /* This is 1 or 2 atomic 16-byte operations. */
+ tcg_gen_qemu_st_i128(zero128, clean_addr, mem_index, mop);
+ if (is_pair) {
+ tcg_gen_addi_i64(clean_addr, clean_addr, 16);
+ tcg_gen_qemu_st_i128(zero128, clean_addr, mem_index, mop);
}
+ tcg_temp_free_i128(zero128);
}
if (index != 0) {
diff --git a/tests/tcg/aarch64/mte-7.c b/tests/tcg/aarch64/mte-7.c
index a981de62d4..04974f9ebb 100644
--- a/tests/tcg/aarch64/mte-7.c
+++ b/tests/tcg/aarch64/mte-7.c
@@ -19,8 +19,7 @@ int main(int ac, char **av)
p = (void *)((unsigned long)p | (1ul << 56));
/* Store tag in sequential granules. */
- asm("stg %0, [%0]" : : "r"(p + 0x0ff0));
- asm("stg %0, [%0]" : : "r"(p + 0x1000));
+ asm("stz2g %0, [%0]" : : "r"(p + 0x0ff0));
/*
* Perform an unaligned store with tag 1 crossing the pages.
--
2.34.1
* [PATCH v1 05/19] target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (3 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:36 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 06/19] target/arm: Sink gen_mte_check1 into load/store_exclusive Richard Henderson
` (13 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Round len_align down to a multiple of 16 instead of 8, handling an odd
trailing 8 bytes as part of the tail.  Use MO_ATOM_NONE to indicate
that all of these memory ops have only byte atomicity.
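As a worked case (my own numbers, not from the patch): a predicate
load of len = 42 bytes now gives len_align = 32 and len_remain = 10,
so the unrolled path issues two 16-byte MO_ATOM_NONE loads, the new
len_remain >= 8 branch issues one 8-byte load, and the existing
switch finishes the last 2 bytes.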
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-sve.c | 107 ++++++++++++++++++++++++++++---------
1 file changed, 81 insertions(+), 26 deletions(-)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 621a2abb22..f3d5e79dd2 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4312,11 +4312,12 @@ TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
int len, int rn, int imm)
{
- int len_align = QEMU_ALIGN_DOWN(len, 8);
- int len_remain = len % 8;
- int nparts = len / 8 + ctpop8(len_remain);
+ int len_align = QEMU_ALIGN_DOWN(len, 16);
+ int len_remain = len % 16;
+ int nparts = len / 16 + ctpop8(len_remain);
int midx = get_mem_index(s);
TCGv_i64 dirty_addr, clean_addr, t0, t1;
+ TCGv_i128 t16;
dirty_addr = tcg_temp_new_i64();
tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm);
@@ -4334,12 +4335,20 @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
int i;
t0 = tcg_temp_new_i64();
- for (i = 0; i < len_align; i += 8) {
- tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ);
+ t1 = tcg_temp_new_i64();
+ t16 = tcg_temp_new_i128();
+
+ for (i = 0; i < len_align; i += 16) {
+ tcg_gen_qemu_ld_i128(t16, clean_addr, midx,
+ MO_LE | MO_128 | MO_ATOM_NONE);
+ tcg_gen_extr_i128_i64(t0, t1, t16);
tcg_gen_st_i64(t0, base, vofs + i);
- tcg_gen_addi_i64(clean_addr, clean_addr, 8);
+ tcg_gen_st_i64(t1, base, vofs + i + 8);
+ tcg_gen_addi_i64(clean_addr, clean_addr, 16);
}
tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+ tcg_temp_free_i128(t16);
} else {
TCGLabel *loop = gen_new_label();
TCGv_ptr tp, i = tcg_const_local_ptr(0);
@@ -4357,16 +4366,25 @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
gen_set_label(loop);
- t0 = tcg_temp_new_i64();
- tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ);
- tcg_gen_addi_i64(clean_addr, clean_addr, 8);
+ t16 = tcg_temp_new_i128();
+ tcg_gen_qemu_ld_i128(t16, clean_addr, midx,
+ MO_LE | MO_128 | MO_ATOM_NONE);
+ tcg_gen_addi_i64(clean_addr, clean_addr, 16);
tp = tcg_temp_new_ptr();
tcg_gen_add_ptr(tp, base, i);
- tcg_gen_addi_ptr(i, i, 8);
+ tcg_gen_addi_ptr(i, i, 16);
+
+ t0 = tcg_temp_new_i64();
+ t1 = tcg_temp_new_i64();
+ tcg_gen_extr_i128_i64(t0, t1, t16);
+ tcg_temp_free_i128(t16);
+
tcg_gen_st_i64(t0, tp, vofs);
- tcg_temp_free_ptr(tp);
+ tcg_gen_st_i64(t1, tp, vofs + 8);
tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+ tcg_temp_free_ptr(tp);
tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop);
tcg_temp_free_ptr(i);
@@ -4381,6 +4399,17 @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
* Predicate register loads can be any multiple of 2.
* Note that we still store the entire 64-bit unit into cpu_env.
*/
+ if (len_remain >= 8) {
+ t0 = tcg_temp_new_i64();
+ tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE);
+ tcg_gen_st_i64(t0, base, vofs + len_align);
+ len_remain -= 8;
+ len_align += 8;
+ if (len_remain) {
+ tcg_gen_addi_i64(clean_addr, clean_addr, 8);
+ }
+ tcg_temp_free_i64(t0);
+ }
if (len_remain) {
t0 = tcg_temp_new_i64();
switch (len_remain) {
@@ -4388,14 +4417,14 @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
case 4:
case 8:
tcg_gen_qemu_ld_i64(t0, clean_addr, midx,
- MO_LE | ctz32(len_remain));
+ MO_LE | ctz32(len_remain) | MO_ATOM_NONE);
break;
case 6:
t1 = tcg_temp_new_i64();
- tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUL);
+ tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE);
tcg_gen_addi_i64(clean_addr, clean_addr, 4);
- tcg_gen_qemu_ld_i64(t1, clean_addr, midx, MO_LEUW);
+ tcg_gen_qemu_ld_i64(t1, clean_addr, midx, MO_LEUW | MO_ATOM_NONE);
tcg_gen_deposit_i64(t0, t0, t1, 32, 32);
tcg_temp_free_i64(t1);
break;
@@ -4412,11 +4441,12 @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
int len, int rn, int imm)
{
- int len_align = QEMU_ALIGN_DOWN(len, 8);
- int len_remain = len % 8;
- int nparts = len / 8 + ctpop8(len_remain);
+ int len_align = QEMU_ALIGN_DOWN(len, 16);
+ int len_remain = len % 16;
+ int nparts = len / 16 + ctpop8(len_remain);
int midx = get_mem_index(s);
- TCGv_i64 dirty_addr, clean_addr, t0;
+ TCGv_i64 dirty_addr, clean_addr, t0, t1;
+ TCGv_i128 t16;
dirty_addr = tcg_temp_new_i64();
tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm);
@@ -4435,12 +4465,19 @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
int i;
t0 = tcg_temp_new_i64();
+ t1 = tcg_temp_new_i64();
+ t16 = tcg_temp_new_i128();
- for (i = 0; i < len_align; i += 8) {
+ for (i = 0; i < len_align; i += 16) {
tcg_gen_ld_i64(t0, base, vofs + i);
- tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ);
- tcg_gen_addi_i64(clean_addr, clean_addr, 8);
+ tcg_gen_ld_i64(t1, base, vofs + i + 8);
+ tcg_gen_concat_i64_i128(t16, t0, t1);
+ tcg_gen_qemu_st_i128(t16, clean_addr, midx,
+ MO_LE | MO_128 | MO_ATOM_NONE);
+ tcg_gen_addi_i64(clean_addr, clean_addr, 16);
}
tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+ tcg_temp_free_i128(t16);
} else {
TCGLabel *loop = gen_new_label();
TCGv_ptr tp, i = tcg_const_local_ptr(0);
@@ -4459,15 +4496,22 @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
gen_set_label(loop);
t0 = tcg_temp_new_i64();
+ t1 = tcg_temp_new_i64();
tp = tcg_temp_new_ptr();
tcg_gen_add_ptr(tp, base, i);
tcg_gen_ld_i64(t0, tp, vofs);
- tcg_gen_addi_ptr(i, i, 8);
+ tcg_gen_ld_i64(t1, tp, vofs + 8);
+ tcg_gen_addi_ptr(i, i, 16);
tcg_temp_free_ptr(tp);
- tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ);
- tcg_gen_addi_i64(clean_addr, clean_addr, 8);
+ t16 = tcg_temp_new_i128();
+ tcg_gen_concat_i64_i128(t16, t0, t1);
tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+
+ tcg_gen_qemu_st_i128(t16, clean_addr, midx,
+ MO_LE | MO_128 | MO_ATOM_NONE);
+ tcg_temp_free_i128(t16);
+ tcg_gen_addi_i64(clean_addr, clean_addr, 16);
tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop);
tcg_temp_free_ptr(i);
@@ -4479,6 +4523,17 @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
}
/* Predicate register stores can be any multiple of 2. */
+ if (len_remain >= 8) {
+ t0 = tcg_temp_new_i64();
+ tcg_gen_ld_i64(t0, base, vofs + len_align);
+ tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE);
+ len_remain -= 8;
+ len_align += 8;
+ if (len_remain) {
+ tcg_gen_addi_i64(clean_addr, clean_addr, 8);
+ }
+ tcg_temp_free_i64(t0);
+ }
if (len_remain) {
t0 = tcg_temp_new_i64();
tcg_gen_ld_i64(t0, base, vofs + len_align);
@@ -4488,14 +4543,14 @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
case 4:
case 8:
tcg_gen_qemu_st_i64(t0, clean_addr, midx,
- MO_LE | ctz32(len_remain));
+ MO_LE | ctz32(len_remain) | MO_ATOM_NONE);
break;
case 6:
- tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUL);
+ tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE);
tcg_gen_addi_i64(clean_addr, clean_addr, 4);
tcg_gen_shri_i64(t0, t0, 32);
- tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUW);
+ tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUW | MO_ATOM_NONE);
break;
default:
--
2.34.1
* [PATCH v1 06/19] target/arm: Sink gen_mte_check1 into load/store_exclusive
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (4 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 05/19] target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:40 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 07/19] target/arm: Add feature test for FEAT_LSE2 Richard Henderson
` (12 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
No need to duplicate this check across multiple call sites.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 44 ++++++++++++++++++--------------------
1 file changed, 21 insertions(+), 23 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b42e5848cc..cd86597172 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2514,11 +2514,16 @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn)
* races in multi-threaded linux-user and when MTTCG softmmu is
* enabled.
*/
-static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
- TCGv_i64 addr, int size, bool is_pair)
+static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn,
+ int size, bool is_pair)
{
int idx = get_mem_index(s);
MemOp memop = s->be_data;
+ TCGv_i64 dirty_addr, clean_addr;
+
+ s->is_ldex = true;
+ dirty_addr = cpu_reg_sp(s, rn);
+ clean_addr = gen_mte_check1(s, dirty_addr, false, rn != 31, size);
g_assert(size <= 3);
if (is_pair) {
@@ -2526,7 +2531,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
if (size == 2) {
/* The pair must be single-copy atomic for the doubleword. */
memop |= MO_64 | MO_ALIGN;
- tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx, memop);
+ tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop);
if (s->be_data == MO_LE) {
tcg_gen_extract_i64(cpu_reg(s, rt), cpu_exclusive_val, 0, 32);
tcg_gen_extract_i64(cpu_reg(s, rt2), cpu_exclusive_val, 32, 32);
@@ -2544,7 +2549,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
TCGv_i128 t16 = tcg_temp_new_i128();
memop |= MO_128 | MO_ALIGN_16 | MO_ATMAX_8;
- tcg_gen_qemu_ld_i128(t16, addr, idx, memop);
+ tcg_gen_qemu_ld_i128(t16, clean_addr, idx, memop);
tcg_gen_extr_i128_i64(cpu_exclusive_val, cpu_exclusive_high, t16);
tcg_temp_free_i128(t16);
@@ -2559,14 +2564,14 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
}
} else {
memop |= size | MO_ALIGN;
- tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx, memop);
+ tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop);
tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val);
}
- tcg_gen_mov_i64(cpu_exclusive_addr, addr);
+ tcg_gen_mov_i64(cpu_exclusive_addr, clean_addr);
}
static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
- TCGv_i64 addr, int size, int is_pair)
+ int rn, int size, int is_pair)
{
/* if (env->exclusive_addr == addr && env->exclusive_val == [addr]
* && (!is_pair || env->exclusive_high == [addr + datasize])) {
@@ -2582,9 +2587,12 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
*/
TCGLabel *fail_label = gen_new_label();
TCGLabel *done_label = gen_new_label();
- TCGv_i64 tmp;
+ TCGv_i64 tmp, dirty_addr, clean_addr;
- tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
+ dirty_addr = cpu_reg_sp(s, rn);
+ clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, size);
+
+ tcg_gen_brcond_i64(TCG_COND_NE, clean_addr, cpu_exclusive_addr, fail_label);
tmp = tcg_temp_new_i64();
if (is_pair) {
@@ -2774,9 +2782,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
if (is_lasr) {
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
}
- clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
- true, rn != 31, size);
- gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, false);
+ gen_store_exclusive(s, rs, rt, rt2, rn, size, false);
return;
case 0x4: /* LDXR */
@@ -2784,10 +2790,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
if (rn == 31) {
gen_check_sp_alignment(s);
}
- clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
- false, rn != 31, size);
- s->is_ldex = true;
- gen_load_exclusive(s, rt, rt2, clean_addr, size, false);
+ gen_load_exclusive(s, rt, rt2, rn, size, false);
if (is_lasr) {
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
}
@@ -2839,9 +2842,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
if (is_lasr) {
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
}
- clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
- true, rn != 31, size);
- gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, true);
+ gen_store_exclusive(s, rs, rt, rt2, rn, size, true);
return;
}
if (rt2 == 31
@@ -2858,10 +2859,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
if (rn == 31) {
gen_check_sp_alignment(s);
}
- clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
- false, rn != 31, size);
- s->is_ldex = true;
- gen_load_exclusive(s, rt, rt2, clean_addr, size, true);
+ gen_load_exclusive(s, rt, rt2, rn, size, true);
if (is_lasr) {
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
}
--
2.34.1
* [PATCH v1 07/19] target/arm: Add feature test for FEAT_LSE2
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (5 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 06/19] target/arm: Sink gen_mte_check1 into load/store_exclusive Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:43 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 08/19] target/arm: Add atom_data to DisasContext Richard Henderson
` (11 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7bc97fece9..2108caf753 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -4049,6 +4049,11 @@ static inline bool isar_feature_aa64_st(const ARMISARegisters *id)
return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, ST) != 0;
}
+static inline bool isar_feature_aa64_lse2(const ARMISARegisters *id)
+{
+ return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, AT) != 0;
+}
+
static inline bool isar_feature_aa64_fwb(const ARMISARegisters *id)
{
return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, FWB) != 0;
--
2.34.1
* [PATCH v1 08/19] target/arm: Add atom_data to DisasContext
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (6 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 07/19] target/arm: Add feature test for FEAT_LSE2 Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:47 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 09/19] target/arm: Load/store integer pair with one tcg operation Richard Henderson
` (10 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Use this to record the default atomicity of memory operations.
Set it to MO_ATOM_WITHIN16 if FEAT_LSE2 applies.
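As a minimal sketch of how a call site picks this up (my own
condensation of the patterns used elsewhere in this series; the
variable names are placeholders):

    /*
     * No MO_ATOM_* or MO_ATMAX_* bits supplied, so finalize_memop()
     * ORs in s->atom_data: MO_ATOM_WITHIN16 when FEAT_LSE2 is present,
     * MO_ATOM_IFALIGN otherwise, plus endianness and, if the SCTLR
     * alignment bits demand it, MO_ALIGN.
     */
    MemOp mop = finalize_memop(s, MO_64);
    tcg_gen_qemu_ld_i64(dest, clean_addr, get_mem_index(s), mop);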
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.h | 14 +++++++++++---
target/arm/translate-a64.c | 4 ++++
target/arm/translate.c | 1 +
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 3717824b75..809479f9b7 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -54,6 +54,7 @@ typedef struct DisasContext {
bool eci_handled;
int sctlr_b;
MemOp be_data;
+ MemOp atom_data;
#if !defined(CONFIG_USER_ONLY)
int user;
#endif
@@ -556,10 +557,10 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
/**
* finalize_memop:
* @s: DisasContext
- * @opc: size+sign+align of the memory operation
+ * @opc: size+sign+align+atomicity of the memory operation
*
- * Build the complete MemOp for a memory operation, including alignment
- * and endianness.
+ * Build the complete MemOp for a memory operation, including alignment,
+ * endianness, and atomicity.
*
* If (op & MO_AMASK) then the operation already contains the required
* alignment, e.g. for AccType_ATOMIC. Otherwise, this an optionally
@@ -568,12 +569,19 @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour)
* In the latter case, there are configuration bits that require alignment,
* and this is applied here. Note that there is no way to indicate that
* no alignment should ever be enforced; this must be handled manually.
+ *
+ * If (op & MO_ATOM_MASK) or (op & MO_ATMAX_MASK) then the operation already
+ * contains the required atomicity, e.g. for AccType_VEC. Otherwise, apply
+ * atomicity for AccType_NORMAL.
*/
static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
{
if (s->align_mem && !(opc & MO_AMASK)) {
opc |= MO_ALIGN;
}
+ if (!(opc & (MO_ATOM_MASK | MO_ATMAX_MASK))) {
+ opc |= s->atom_data;
+ }
return opc | s->be_data;
}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index cd86597172..fa793485c3 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14762,6 +14762,10 @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
tcg_debug_assert(dc->tbid & 1);
#endif
+ /* Record the atomicity of a single AccType_NORMAL memory access. */
+ dc->atom_data = (dc_isar_feature(aa64_lse2, dc)
+ ? MO_ATOM_WITHIN16 : MO_ATOM_IFALIGN);
+
/* Single step state. The code-generation logic here is:
* SS_ACTIVE == 0:
* generate code with no special handling for single-stepping (except
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c23a3462bf..552c376050 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -9449,6 +9449,7 @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
dc->sme_trap_nonstreaming =
EX_TBFLAG_A32(tb_flags, SME_TRAP_NONSTREAMING);
}
+ dc->atom_data = MO_ATOM_IFALIGN;
dc->cp_regs = cpu->cp_regs;
dc->features = env->features;
--
2.34.1
* [PATCH v1 09/19] target/arm: Load/store integer pair with one tcg operation
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (7 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 08/19] target/arm: Add atom_data to DisasContext Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 15:57 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 10/19] target/arm: Hoist finalize_memop out of do_gpr_{ld, st} Richard Henderson
` (9 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
This is required for LSE2, where the pair must be treated
atomically if it does not cross a 16-byte boundary.  But it
simplifies the code to do this always, simply using the
unpaired atomicity when LSE2 is not enabled.
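As a worked example of the memop composed below (my own illustration,
assuming LSE2 and SCTLR-driven alignment checks are both enabled), an
LDP of two X registers (size == 3) ends up with:

    MemOp mop = MO_128            /* one 16-byte access for the pair */
              | s->be_data
              | MO_ATMAX_8        /* each 8-byte element is atomic */
              | MO_ATOM_WITHIN16  /* s->atom_data: the whole pair is atomic
                                     if it does not cross a 16-byte boundary */
              | MO_ALIGN_8;       /* alignment only required to 8 bytes */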
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 77 ++++++++++++++++++++++++++++++--------
1 file changed, 61 insertions(+), 16 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index fa793485c3..c0d55c9204 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -3089,27 +3089,72 @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
} else {
TCGv_i64 tcg_rt = cpu_reg(s, rt);
TCGv_i64 tcg_rt2 = cpu_reg(s, rt2);
+ MemOp mop = (size + 1) | s->be_data;
+
+ /*
+ * With LSE2, non-sign-extending pairs are treated atomically if
+ * aligned, and if unaligned one of the pair will be completely
+ * within a 16-byte block and that element will be atomic.
+ * Otherwise each element is separately atomic.
+ * In all cases, issue one operation with the correct atomicity.
+ *
+ * This treats sign-extending loads like zero-extending loads,
+ * since that reuses the most code below.
+ */
+ mop |= size << MO_ATMAX_SHIFT;
+ mop |= s->atom_data;
+ if (s->align_mem) {
+ mop |= (size == 2 ? MO_ALIGN_4 : MO_ALIGN_8);
+ }
if (is_load) {
- TCGv_i64 tmp = tcg_temp_new_i64();
+ if (size == 2) {
+ TCGv_i64 tmp = tcg_temp_new_i64();
- /* Do not modify tcg_rt before recognizing any exception
- * from the second load.
- */
- do_gpr_ld(s, tmp, clean_addr, size + is_signed * MO_SIGN,
- false, false, 0, false, false);
- tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
- do_gpr_ld(s, tcg_rt2, clean_addr, size + is_signed * MO_SIGN,
- false, false, 0, false, false);
+ tcg_gen_qemu_ld_i64(tmp, clean_addr, get_mem_index(s), mop);
+ if (s->be_data == MO_LE) {
+ tcg_gen_extr32_i64(tcg_rt, tcg_rt2, tmp);
+ } else {
+ tcg_gen_extr32_i64(tcg_rt2, tcg_rt, tmp);
+ }
+ if (is_signed) {
+ tcg_gen_ext32s_i64(tcg_rt, tcg_rt);
+ tcg_gen_ext32s_i64(tcg_rt2, tcg_rt2);
+ }
+ tcg_temp_free_i64(tmp);
+ } else {
+ TCGv_i128 tmp = tcg_temp_new_i128();
- tcg_gen_mov_i64(tcg_rt, tmp);
- tcg_temp_free_i64(tmp);
+ tcg_gen_qemu_ld_i128(tmp, clean_addr, get_mem_index(s), mop);
+ if (s->be_data == MO_LE) {
+ tcg_gen_extr_i128_i64(tcg_rt, tcg_rt2, tmp);
+ } else {
+ tcg_gen_extr_i128_i64(tcg_rt2, tcg_rt, tmp);
+ }
+ tcg_temp_free_i128(tmp);
+ }
} else {
- do_gpr_st(s, tcg_rt, clean_addr, size,
- false, 0, false, false);
- tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
- do_gpr_st(s, tcg_rt2, clean_addr, size,
- false, 0, false, false);
+ if (size == 2) {
+ TCGv_i64 tmp = tcg_temp_new_i64();
+
+ if (s->be_data == MO_LE) {
+ tcg_gen_concat32_i64(tmp, tcg_rt, tcg_rt2);
+ } else {
+ tcg_gen_concat32_i64(tmp, tcg_rt2, tcg_rt);
+ }
+ tcg_gen_qemu_st_i64(tmp, clean_addr, get_mem_index(s), mop);
+ tcg_temp_free_i64(tmp);
+ } else {
+ TCGv_i128 tmp = tcg_temp_new_i128();
+
+ if (s->be_data == MO_LE) {
+ tcg_gen_concat_i64_i128(tmp, tcg_rt, tcg_rt2);
+ } else {
+ tcg_gen_concat_i64_i128(tmp, tcg_rt2, tcg_rt);
+ }
+ tcg_gen_qemu_st_i128(tmp, clean_addr, get_mem_index(s), mop);
+ tcg_temp_free_i128(tmp);
+ }
}
}
--
2.34.1
* [PATCH v1 10/19] target/arm: Hoist finalize_memop out of do_gpr_{ld, st}
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (8 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 09/19] target/arm: Load/store integer pair with one tcg operation Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:03 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 11/19] target/arm: Hoist finalize_memop out of do_fp_{ld, st} Richard Henderson
` (8 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
We are going to need the complete memop beforehand,
so let's not compute it twice.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 61 ++++++++++++++++++++++----------------
1 file changed, 35 insertions(+), 26 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c0d55c9204..fd499a208d 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -888,7 +888,6 @@ static void do_gpr_st_memidx(DisasContext *s, TCGv_i64 source,
unsigned int iss_srt,
bool iss_sf, bool iss_ar)
{
- memop = finalize_memop(s, memop);
tcg_gen_qemu_st_i64(source, tcg_addr, memidx, memop);
if (iss_valid) {
@@ -923,7 +922,6 @@ static void do_gpr_ld_memidx(DisasContext *s, TCGv_i64 dest, TCGv_i64 tcg_addr,
bool iss_valid, unsigned int iss_srt,
bool iss_sf, bool iss_ar)
{
- memop = finalize_memop(s, memop);
tcg_gen_qemu_ld_i64(dest, tcg_addr, memidx, memop);
if (extend && (memop & MO_SIGN)) {
@@ -2772,6 +2770,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
int o2_L_o1_o0 = extract32(insn, 21, 3) * 2 | is_lasr;
int size = extract32(insn, 30, 2);
TCGv_i64 clean_addr;
+ MemOp memop;
switch (o2_L_o1_o0) {
case 0x0: /* STXR */
@@ -2808,10 +2807,11 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
gen_check_sp_alignment(s);
}
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
+ /* TODO: ARMv8.4-LSE SCTLR.nAA */
+ memop = finalize_memop(s, size | MO_ALIGN);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
true, rn != 31, size);
- /* TODO: ARMv8.4-LSE SCTLR.nAA */
- do_gpr_st(s, cpu_reg(s, rt), clean_addr, size | MO_ALIGN, true, rt,
+ do_gpr_st(s, cpu_reg(s, rt), clean_addr, memop, true, rt,
disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
return;
@@ -2826,10 +2826,11 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
if (rn == 31) {
gen_check_sp_alignment(s);
}
+ /* TODO: ARMv8.4-LSE SCTLR.nAA */
+ memop = finalize_memop(s, size | MO_ALIGN);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
false, rn != 31, size);
- /* TODO: ARMv8.4-LSE SCTLR.nAA */
- do_gpr_ld(s, cpu_reg(s, rt), clean_addr, size | MO_ALIGN, false, true,
+ do_gpr_ld(s, cpu_reg(s, rt), clean_addr, memop, false, true,
rt, disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
return;
@@ -2937,9 +2938,9 @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
} else {
/* Only unsigned 32bit loads target 32bit registers. */
bool iss_sf = opc != 0;
+ MemOp memop = finalize_memop(s, size + is_signed * MO_SIGN);
- do_gpr_ld(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN,
- false, true, rt, iss_sf, false);
+ do_gpr_ld(s, tcg_rt, clean_addr, memop, false, true, rt, iss_sf, false);
}
}
@@ -3199,7 +3200,7 @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
bool post_index;
bool writeback;
int memidx;
-
+ MemOp memop;
TCGv_i64 clean_addr, dirty_addr;
if (is_vector) {
@@ -3226,7 +3227,7 @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
return;
}
is_store = (opc == 0);
- is_signed = extract32(opc, 1, 1);
+ is_signed = !is_store && extract32(opc, 1, 1);
is_extended = (size < 3) && extract32(opc, 0, 1);
}
@@ -3260,6 +3261,8 @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
}
memidx = is_unpriv ? get_a64_user_mem_index(s) : get_mem_index(s);
+ memop = finalize_memop(s, size + is_signed * MO_SIGN);
+
clean_addr = gen_mte_check1_mmuidx(s, dirty_addr, is_store,
writeback || rn != 31,
size, is_unpriv, memidx);
@@ -3275,10 +3278,10 @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
if (is_store) {
- do_gpr_st_memidx(s, tcg_rt, clean_addr, size, memidx,
+ do_gpr_st_memidx(s, tcg_rt, clean_addr, memop, memidx,
iss_valid, rt, iss_sf, false);
} else {
- do_gpr_ld_memidx(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN,
+ do_gpr_ld_memidx(s, tcg_rt, clean_addr, memop,
is_extended, memidx,
iss_valid, rt, iss_sf, false);
}
@@ -3327,8 +3330,8 @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
bool is_signed = false;
bool is_store = false;
bool is_extended = false;
-
TCGv_i64 tcg_rm, clean_addr, dirty_addr;
+ MemOp memop;
if (extract32(opt, 1, 1) == 0) {
unallocated_encoding(s);
@@ -3355,7 +3358,7 @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
return;
}
is_store = (opc == 0);
- is_signed = extract32(opc, 1, 1);
+ is_signed = !is_store && extract32(opc, 1, 1);
is_extended = (size < 3) && extract32(opc, 0, 1);
}
@@ -3368,6 +3371,8 @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
ext_and_shift_reg(tcg_rm, tcg_rm, opt, shift ? size : 0);
tcg_gen_add_i64(dirty_addr, dirty_addr, tcg_rm);
+
+ memop = finalize_memop(s, size + is_signed * MO_SIGN);
clean_addr = gen_mte_check1(s, dirty_addr, is_store, true, size);
if (is_vector) {
@@ -3379,11 +3384,12 @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
} else {
TCGv_i64 tcg_rt = cpu_reg(s, rt);
bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
+
if (is_store) {
- do_gpr_st(s, tcg_rt, clean_addr, size,
+ do_gpr_st(s, tcg_rt, clean_addr, memop,
true, rt, iss_sf, false);
} else {
- do_gpr_ld(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN,
+ do_gpr_ld(s, tcg_rt, clean_addr, memop,
is_extended, true, rt, iss_sf, false);
}
}
@@ -3415,12 +3421,11 @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
int rn = extract32(insn, 5, 5);
unsigned int imm12 = extract32(insn, 10, 12);
unsigned int offset;
-
TCGv_i64 clean_addr, dirty_addr;
-
bool is_store;
bool is_signed = false;
bool is_extended = false;
+ MemOp memop;
if (is_vector) {
size |= (opc & 2) << 1;
@@ -3442,7 +3447,7 @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
return;
}
is_store = (opc == 0);
- is_signed = extract32(opc, 1, 1);
+ is_signed = !is_store && extract32(opc, 1, 1);
is_extended = (size < 3) && extract32(opc, 0, 1);
}
@@ -3452,6 +3457,8 @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
dirty_addr = read_cpu_reg_sp(s, rn, 1);
offset = imm12 << size;
tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
+
+ memop = finalize_memop(s, size + is_signed * MO_SIGN);
clean_addr = gen_mte_check1(s, dirty_addr, is_store, rn != 31, size);
if (is_vector) {
@@ -3464,10 +3471,9 @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
TCGv_i64 tcg_rt = cpu_reg(s, rt);
bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc);
if (is_store) {
- do_gpr_st(s, tcg_rt, clean_addr, size,
- true, rt, iss_sf, false);
+ do_gpr_st(s, tcg_rt, clean_addr, memop, true, rt, iss_sf, false);
} else {
- do_gpr_ld(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN,
+ do_gpr_ld(s, tcg_rt, clean_addr, memop,
is_extended, true, rt, iss_sf, false);
}
}
@@ -3497,7 +3503,7 @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
bool a = extract32(insn, 23, 1);
TCGv_i64 tcg_rs, tcg_rt, clean_addr;
AtomicThreeOpFn *fn = NULL;
- MemOp mop = s->be_data | size | MO_ALIGN;
+ MemOp mop = finalize_memop(s, size | MO_ALIGN);
if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
unallocated_encoding(s);
@@ -3558,7 +3564,7 @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
* full load-acquire (we only need "load-acquire processor consistent"),
* but we choose to implement them as full LDAQ.
*/
- do_gpr_ld(s, cpu_reg(s, rt), clean_addr, size, false,
+ do_gpr_ld(s, cpu_reg(s, rt), clean_addr, mop, false,
true, rt, disas_ldst_compute_iss_sf(size, false, 0), true);
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
return;
@@ -3604,6 +3610,7 @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn,
bool use_key_a = !extract32(insn, 23, 1);
int offset;
TCGv_i64 clean_addr, dirty_addr, tcg_rt;
+ MemOp memop;
if (size != 3 || is_vector || !dc_isar_feature(aa64_pauth, s)) {
unallocated_encoding(s);
@@ -3630,12 +3637,14 @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn,
offset = sextract32(offset << size, 0, 10 + size);
tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
+ memop = finalize_memop(s, size);
+
/* Note that "clean" and "dirty" here refer to TBI not PAC. */
clean_addr = gen_mte_check1(s, dirty_addr, false,
is_wback || rn != 31, size);
tcg_rt = cpu_reg(s, rt);
- do_gpr_ld(s, tcg_rt, clean_addr, size,
+ do_gpr_ld(s, tcg_rt, clean_addr, memop,
/* extend */ false, /* iss_valid */ !is_wback,
/* iss_srt */ rt, /* iss_sf */ true, /* iss_ar */ false);
@@ -3677,7 +3686,7 @@ static void disas_ldst_ldapr_stlr(DisasContext *s, uint32_t insn)
}
/* TODO: ARMv8.4-LSE SCTLR.nAA */
- mop = size | MO_ALIGN;
+ mop = finalize_memop(s, size | MO_ALIGN);
switch (opc) {
case 0: /* STLURB */
--
2.34.1
* [PATCH v1 11/19] target/arm: Hoist finalize_memop out of do_fp_{ld, st}
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (9 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 10/19] target/arm: Hoist finalize_memop out of do_gpr_{ld, st} Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:04 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 12/19] target/arm: Pass memop to gen_mte_check1* Richard Henderson
` (7 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
We are going to need the complete memop beforehand,
so let's not compute it twice.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 42 +++++++++++++++++---------------------
1 file changed, 19 insertions(+), 23 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index fd499a208d..cc857d60d7 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -955,16 +955,14 @@ static void do_gpr_ld(DisasContext *s, TCGv_i64 dest, TCGv_i64 tcg_addr,
/*
* Store from FP register to memory
*/
-static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, int size)
+static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, MemOp mop)
{
/* This writes the bottom N bits of a 128 bit wide vector to memory */
TCGv_i64 tmplo = tcg_temp_new_i64();
- MemOp mop;
tcg_gen_ld_i64(tmplo, cpu_env, fp_reg_offset(s, srcidx, MO_64));
- if (size < MO_128) {
- mop = finalize_memop(s, size);
+ if ((mop & MO_SIZE) < MO_128) {
tcg_gen_qemu_st_i64(tmplo, tcg_addr, get_mem_index(s), mop);
} else {
TCGv_i64 tmphi = tcg_temp_new_i64();
@@ -974,7 +972,6 @@ static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, int size)
tcg_gen_concat_i64_i128(t16, tmplo, tmphi);
tcg_temp_free_i64(tmphi);
- mop = finalize_memop(s, size);
tcg_gen_qemu_st_i128(t16, tcg_addr, get_mem_index(s), mop);
tcg_temp_free_i128(t16);
}
@@ -985,20 +982,17 @@ static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, int size)
/*
* Load from memory to FP register
*/
-static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size)
+static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, MemOp mop)
{
/* This always zero-extends and writes to a full 128 bit wide vector */
TCGv_i64 tmplo = tcg_temp_new_i64();
TCGv_i64 tmphi = NULL;
- MemOp mop;
- if (size < MO_128) {
- mop = finalize_memop(s, size);
+ if ((mop & MO_SIZE) < MO_128) {
tcg_gen_qemu_ld_i64(tmplo, tcg_addr, get_mem_index(s), mop);
} else {
TCGv_i128 t16 = tcg_temp_new_i128();
- mop = finalize_memop(s, size);
tcg_gen_qemu_ld_i128(t16, tcg_addr, get_mem_index(s), mop);
tmphi = tcg_temp_new_i64();
@@ -2910,6 +2904,7 @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
bool is_signed = false;
int size = 2;
TCGv_i64 tcg_rt, clean_addr;
+ MemOp mop;
if (is_vector) {
if (opc == 3) {
@@ -2933,13 +2928,13 @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
clean_addr = new_tmp_a64(s);
gen_pc_plus_diff(s, clean_addr, imm);
+
+ mop = finalize_memop(s, size + is_signed * MO_SIGN);
if (is_vector) {
- do_fp_ld(s, rt, clean_addr, size);
+ do_fp_ld(s, rt, clean_addr, mop);
} else {
/* Only unsigned 32bit loads target 32bit registers. */
bool iss_sf = opc != 0;
- MemOp memop = finalize_memop(s, size + is_signed * MO_SIGN);
-
- do_gpr_ld(s, tcg_rt, clean_addr, memop, false, true, rt, iss_sf, false);
+ do_gpr_ld(s, tcg_rt, clean_addr, mop, false, true, rt, iss_sf, false);
}
}
@@ -3076,16 +3071,17 @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
(wback || rn != 31) && !set_tag, 2 << size);
if (is_vector) {
+ MemOp mop = finalize_memop(s, size);
if (is_load) {
- do_fp_ld(s, rt, clean_addr, size);
+ do_fp_ld(s, rt, clean_addr, mop);
} else {
- do_fp_st(s, rt, clean_addr, size);
+ do_fp_st(s, rt, clean_addr, mop);
}
tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size);
if (is_load) {
- do_fp_ld(s, rt2, clean_addr, size);
+ do_fp_ld(s, rt2, clean_addr, mop);
} else {
- do_fp_st(s, rt2, clean_addr, size);
+ do_fp_st(s, rt2, clean_addr, mop);
}
} else {
TCGv_i64 tcg_rt = cpu_reg(s, rt);
@@ -3269,9 +3265,9 @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
if (is_vector) {
if (is_store) {
- do_fp_st(s, rt, clean_addr, size);
+ do_fp_st(s, rt, clean_addr, memop);
} else {
- do_fp_ld(s, rt, clean_addr, size);
+ do_fp_ld(s, rt, clean_addr, memop);
}
} else {
TCGv_i64 tcg_rt = cpu_reg(s, rt);
@@ -3377,9 +3373,9 @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
if (is_vector) {
if (is_store) {
- do_fp_st(s, rt, clean_addr, size);
+ do_fp_st(s, rt, clean_addr, memop);
} else {
- do_fp_ld(s, rt, clean_addr, size);
+ do_fp_ld(s, rt, clean_addr, memop);
}
} else {
TCGv_i64 tcg_rt = cpu_reg(s, rt);
@@ -3463,9 +3459,9 @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
if (is_vector) {
if (is_store) {
- do_fp_st(s, rt, clean_addr, size);
+ do_fp_st(s, rt, clean_addr, memop);
} else {
- do_fp_ld(s, rt, clean_addr, size);
+ do_fp_ld(s, rt, clean_addr, memop);
}
} else {
TCGv_i64 tcg_rt = cpu_reg(s, rt);
--
2.34.1
* [PATCH v1 12/19] target/arm: Pass memop to gen_mte_check1*
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (10 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 11/19] target/arm: Hoist finalize_memop out of do_fp_{ld, st} Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:08 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 13/19] target/arm: Pass single_memop to gen_mte_checkN Richard Henderson
` (6 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Pass the completed memop to gen_mte_check1_mmuidx.
For the moment, do nothing more than extract the size.
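For illustration (my own example, not text from the patch):
memop_size() reduces the full MemOp back to its byte length, so a
plain MO_64 access still yields MTEDESC.SIZEM1 = 7 as before, while
the 16-byte exclusive-pair memop built below now yields SIZEM1 = 15.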
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.h | 2 +-
target/arm/translate-a64.c | 80 ++++++++++++++++++++------------------
target/arm/translate-sve.c | 7 ++--
3 files changed, 48 insertions(+), 41 deletions(-)
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index ad3762d1ac..3fc39763d0 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -52,7 +52,7 @@ static inline bool sme_smza_enabled_check(DisasContext *s)
TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr);
TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
- bool tag_checked, int log2_size);
+ bool tag_checked, MemOp memop);
TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
bool tag_checked, int size);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index cc857d60d7..e02bdd3e7c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -256,7 +256,7 @@ static void gen_probe_access(DisasContext *s, TCGv_i64 ptr,
*/
static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr,
bool is_write, bool tag_checked,
- int log2_size, bool is_unpriv,
+ MemOp memop, bool is_unpriv,
int core_idx)
{
if (tag_checked && s->mte_active[is_unpriv]) {
@@ -267,7 +267,7 @@ static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr,
desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
- desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << log2_size) - 1);
+ desc = FIELD_DP32(desc, MTEDESC, SIZEM1, memop_size(memop) - 1);
ret = new_tmp_a64(s);
gen_helper_mte_check(ret, cpu_env, tcg_constant_i32(desc), addr);
@@ -278,9 +278,9 @@ static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr,
}
TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
- bool tag_checked, int log2_size)
+ bool tag_checked, MemOp memop)
{
- return gen_mte_check1_mmuidx(s, addr, is_write, tag_checked, log2_size,
+ return gen_mte_check1_mmuidx(s, addr, is_write, tag_checked, memop,
false, get_mem_index(s));
}
@@ -2510,19 +2510,30 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn,
int size, bool is_pair)
{
int idx = get_mem_index(s);
- MemOp memop = s->be_data;
TCGv_i64 dirty_addr, clean_addr;
+ MemOp memop;
+
+ /*
+ * For pairs:
+ * if size == 2, the operation is single-copy atomic for the doubleword.
+ * if size == 3, the operation is single-copy atomic for *each* doubleword,
+ * not the entire quadword, however it must be quadword aligned.
+ */
+ memop = size + is_pair;
+ if (memop == MO_128) {
+ memop |= MO_ATMAX_8;
+ }
+ memop |= MO_ALIGN;
+ memop = finalize_memop(s, memop);
s->is_ldex = true;
dirty_addr = cpu_reg_sp(s, rn);
- clean_addr = gen_mte_check1(s, dirty_addr, false, rn != 31, size);
+ clean_addr = gen_mte_check1(s, dirty_addr, false, rn != 31, memop);
g_assert(size <= 3);
if (is_pair) {
g_assert(size >= 2);
if (size == 2) {
- /* The pair must be single-copy atomic for the doubleword. */
- memop |= MO_64 | MO_ALIGN;
tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop);
if (s->be_data == MO_LE) {
tcg_gen_extract_i64(cpu_reg(s, rt), cpu_exclusive_val, 0, 32);
@@ -2532,15 +2543,8 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn,
tcg_gen_extract_i64(cpu_reg(s, rt2), cpu_exclusive_val, 0, 32);
}
} else {
- /*
- * The pair must be single-copy atomic for *each* doubleword, not
- * the entire quadword, however it must be quadword aligned.
- * Expose the complete load to tcg, for ease of tlb lookup,
- * but indicate that only 8-byte atomicity is required.
- */
TCGv_i128 t16 = tcg_temp_new_i128();
- memop |= MO_128 | MO_ALIGN_16 | MO_ATMAX_8;
tcg_gen_qemu_ld_i128(t16, clean_addr, idx, memop);
tcg_gen_extr_i128_i64(cpu_exclusive_val, cpu_exclusive_high, t16);
@@ -2555,7 +2559,6 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn,
}
}
} else {
- memop |= size | MO_ALIGN;
tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop);
tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val);
}
@@ -2580,9 +2583,13 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
TCGLabel *fail_label = gen_new_label();
TCGLabel *done_label = gen_new_label();
TCGv_i64 tmp, dirty_addr, clean_addr;
+ MemOp memop;
+
+ memop = (size + is_pair) | MO_ALIGN;
+ memop = finalize_memop(s, memop);
dirty_addr = cpu_reg_sp(s, rn);
- clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, size);
+ clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, memop);
tcg_gen_brcond_i64(TCG_COND_NE, clean_addr, cpu_exclusive_addr, fail_label);
@@ -2596,8 +2603,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
}
tcg_gen_atomic_cmpxchg_i64(tmp, cpu_exclusive_addr,
cpu_exclusive_val, tmp,
- get_mem_index(s),
- MO_64 | MO_ALIGN | s->be_data);
+ get_mem_index(s), memop);
tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val);
} else {
TCGv_i128 t16 = tcg_temp_new_i128();
@@ -2612,8 +2618,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
tcg_gen_concat_i64_i128(c16, cpu_exclusive_val, cpu_exclusive_high);
tcg_gen_atomic_cmpxchg_i128(t16, cpu_exclusive_addr, c16, t16,
- get_mem_index(s),
- MO_128 | MO_ALIGN | s->be_data);
+ get_mem_index(s), memop);
tcg_temp_free_i128(c16);
lo = tcg_temp_new_i64();
@@ -2631,8 +2636,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
}
} else {
tcg_gen_atomic_cmpxchg_i64(tmp, cpu_exclusive_addr, cpu_exclusive_val,
- cpu_reg(s, rt), get_mem_index(s),
- size | MO_ALIGN | s->be_data);
+ cpu_reg(s, rt), get_mem_index(s), memop);
tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val);
}
tcg_gen_mov_i64(cpu_reg(s, rd), tmp);
@@ -2652,13 +2656,15 @@ static void gen_compare_and_swap(DisasContext *s, int rs, int rt,
TCGv_i64 tcg_rt = cpu_reg(s, rt);
int memidx = get_mem_index(s);
TCGv_i64 clean_addr;
+ MemOp memop;
if (rn == 31) {
gen_check_sp_alignment(s);
}
- clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, size);
- tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt, memidx,
- size | MO_ALIGN | s->be_data);
+ memop = finalize_memop(s, size | MO_ALIGN);
+ clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop);
+ tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt,
+ memidx, memop);
}
static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
@@ -2670,13 +2676,15 @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
TCGv_i64 t2 = cpu_reg(s, rt + 1);
TCGv_i64 clean_addr;
int memidx = get_mem_index(s);
+ MemOp memop;
if (rn == 31) {
gen_check_sp_alignment(s);
}
/* This is a single atomic access, despite the "pair". */
- clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, size + 1);
+ memop = finalize_memop(s, (size + 1) | MO_ALIGN);
+ clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop);
if (size == 2) {
TCGv_i64 cmp = tcg_temp_new_i64();
@@ -2690,8 +2698,7 @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
tcg_gen_concat32_i64(cmp, s2, s1);
}
- tcg_gen_atomic_cmpxchg_i64(cmp, clean_addr, cmp, val, memidx,
- MO_64 | MO_ALIGN | s->be_data);
+ tcg_gen_atomic_cmpxchg_i64(cmp, clean_addr, cmp, val, memidx, memop);
tcg_temp_free_i64(val);
if (s->be_data == MO_LE) {
@@ -2712,8 +2719,7 @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
tcg_gen_concat_i64_i128(cmp, s2, s1);
}
- tcg_gen_atomic_cmpxchg_i128(cmp, clean_addr, cmp, val, memidx,
- MO_128 | MO_ALIGN | s->be_data);
+ tcg_gen_atomic_cmpxchg_i128(cmp, clean_addr, cmp, val, memidx, memop);
tcg_temp_free_i128(val);
if (s->be_data == MO_LE) {
@@ -2804,7 +2810,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
/* TODO: ARMv8.4-LSE SCTLR.nAA */
memop = finalize_memop(s, size | MO_ALIGN);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
- true, rn != 31, size);
+ true, rn != 31, memop);
do_gpr_st(s, cpu_reg(s, rt), clean_addr, memop, true, rt,
disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
return;
@@ -2823,7 +2829,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
/* TODO: ARMv8.4-LSE SCTLR.nAA */
memop = finalize_memop(s, size | MO_ALIGN);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
- false, rn != 31, size);
+ false, rn != 31, memop);
do_gpr_ld(s, cpu_reg(s, rt), clean_addr, memop, false, true,
rt, disas_ldst_compute_iss_sf(size, false, 0), is_lasr);
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
@@ -3369,7 +3375,7 @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
tcg_gen_add_i64(dirty_addr, dirty_addr, tcg_rm);
memop = finalize_memop(s, size + is_signed * MO_SIGN);
- clean_addr = gen_mte_check1(s, dirty_addr, is_store, true, size);
+ clean_addr = gen_mte_check1(s, dirty_addr, is_store, true, memop);
if (is_vector) {
if (is_store) {
@@ -3455,7 +3461,7 @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
memop = finalize_memop(s, size + is_signed * MO_SIGN);
- clean_addr = gen_mte_check1(s, dirty_addr, is_store, rn != 31, size);
+ clean_addr = gen_mte_check1(s, dirty_addr, is_store, rn != 31, memop);
if (is_vector) {
if (is_store) {
@@ -3550,7 +3556,7 @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
if (rn == 31) {
gen_check_sp_alignment(s);
}
- clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, size);
+ clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, mop);
if (o3_opc == 014) {
/*
@@ -3637,7 +3643,7 @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn,
/* Note that "clean" and "dirty" here refer to TBI not PAC. */
clean_addr = gen_mte_check1(s, dirty_addr, false,
- is_wback || rn != 31, size);
+ is_wback || rn != 31, memop);
tcg_rt = cpu_reg(s, rt);
do_gpr_ld(s, tcg_rt, clean_addr, memop,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index f3d5e79dd2..f283322cda 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5215,6 +5215,7 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a)
unsigned msz = dtype_msz(a->dtype);
TCGLabel *over;
TCGv_i64 temp, clean_addr;
+ MemOp memop;
if (!dc_isar_feature(aa64_sve, s)) {
return false;
@@ -5246,10 +5247,10 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a)
/* Load the data. */
temp = tcg_temp_new_i64();
tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << msz);
- clean_addr = gen_mte_check1(s, temp, false, true, msz);
- tcg_gen_qemu_ld_i64(temp, clean_addr, get_mem_index(s),
- finalize_memop(s, dtype_mop[a->dtype]));
+ memop = finalize_memop(s, dtype_mop[a->dtype]);
+ clean_addr = gen_mte_check1(s, temp, false, true, memop);
+ tcg_gen_qemu_ld_i64(temp, clean_addr, get_mem_index(s), memop);
/* Broadcast to *all* elements. */
tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd),
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v1 13/19] target/arm: Pass single_memop to gen_mte_checkN
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (11 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 12/19] target/arm: Pass memop to gen_mte_check1* Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:10 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check Richard Henderson
` (5 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Pass the individual memop to gen_mte_checkN.
For the moment, do nothing with it.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.h | 2 +-
target/arm/translate-a64.c | 26 +++++++++++++++-----------
target/arm/translate-sve.c | 4 ++--
3 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 3fc39763d0..b7518f9d34 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -54,7 +54,7 @@ TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr);
TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
bool tag_checked, MemOp memop);
TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
- bool tag_checked, int size);
+ bool tag_checked, int size, MemOp memop);
/* We should have at some point before trying to access an FP register
* done the necessary access check, so assert that
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e02bdd3e7c..1117a1cc41 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -288,7 +288,7 @@ TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
* For MTE, check multiple logical sequential accesses.
*/
TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
- bool tag_checked, int size)
+ bool tag_checked, int total_size, MemOp single_mop)
{
if (tag_checked && s->mte_active[0]) {
TCGv_i64 ret;
@@ -298,7 +298,7 @@ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
- desc = FIELD_DP32(desc, MTEDESC, SIZEM1, size - 1);
+ desc = FIELD_DP32(desc, MTEDESC, SIZEM1, total_size - 1);
ret = new_tmp_a64(s);
gen_helper_mte_check(ret, cpu_env, tcg_constant_i32(desc), addr);
@@ -2983,14 +2983,12 @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
bool is_vector = extract32(insn, 26, 1);
bool is_load = extract32(insn, 22, 1);
int opc = extract32(insn, 30, 2);
-
bool is_signed = false;
bool postindex = false;
bool wback = false;
bool set_tag = false;
-
TCGv_i64 clean_addr, dirty_addr;
-
+ MemOp mop;
int size;
if (opc == 3) {
@@ -3073,11 +3071,13 @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
}
}
+ mop = finalize_memop(s, size);
clean_addr = gen_mte_checkN(s, dirty_addr, !is_load,
- (wback || rn != 31) && !set_tag, 2 << size);
+ (wback || rn != 31) && !set_tag,
+ 2 << size, mop);
if (is_vector) {
- MemOp mop = finalize_memop(s, size);
+ /* LSE2 does not merge FP pairs; leave these as separate operations. */
if (is_load) {
do_fp_ld(s, rt, clean_addr, mop);
} else {
@@ -3092,9 +3092,11 @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
} else {
TCGv_i64 tcg_rt = cpu_reg(s, rt);
TCGv_i64 tcg_rt2 = cpu_reg(s, rt2);
- MemOp mop = (size + 1) | s->be_data;
/*
+ * We built mop above for the single logical access -- rebuild it
+ * now for the paired operation.
+ *
* With LSE2, non-sign-extending pairs are treated atomically if
* aligned, and if unaligned one of the pair will be completely
* within a 16-byte block and that element will be atomic.
@@ -3104,6 +3106,7 @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn)
* This treats sign-extending loads like zero-extending loads,
* since that reuses the most code below.
*/
+ mop = (size + 1) | s->be_data;
mop |= size << MO_ATMAX_SHIFT;
mop |= s->atom_data;
if (s->align_mem) {
@@ -3887,7 +3890,7 @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn)
* promote consecutive little-endian elements below.
*/
clean_addr = gen_mte_checkN(s, tcg_rn, is_store, is_postidx || rn != 31,
- total);
+ total, finalize_memop(s, size));
/*
* Consecutive little-endian elements from a single register
@@ -4045,10 +4048,11 @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn)
total = selem << scale;
tcg_rn = cpu_reg_sp(s, rn);
- clean_addr = gen_mte_checkN(s, tcg_rn, !is_load, is_postidx || rn != 31,
- total);
mop = finalize_memop(s, scale);
+ clean_addr = gen_mte_checkN(s, tcg_rn, !is_load, is_postidx || rn != 31,
+ total, mop);
+
tcg_ebytes = tcg_constant_i64(1 << scale);
for (xs = 0; xs < selem; xs++) {
if (replicate) {
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index f283322cda..6a89126fc5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4321,7 +4321,7 @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
dirty_addr = tcg_temp_new_i64();
tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm);
- clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len);
+ clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len, MO_8);
tcg_temp_free_i64(dirty_addr);
/*
@@ -4450,7 +4450,7 @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
dirty_addr = tcg_temp_new_i64();
tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm);
- clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len);
+ clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len, MO_8);
tcg_temp_free_i64(dirty_addr);
/* Note that unpredicated load/store of vector/predicate registers
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (12 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 13/19] target/arm: Pass single_memop to gen_mte_checkN Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:28 ` Peter Maydell
2023-02-23 16:54 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 15/19] target/arm: Add SCTLR.nAA to TBFLAG_A64 Richard Henderson
` (4 subsequent siblings)
18 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Fixes a bug in that with SCTLR.A set, we should raise any
alignment fault before raising any MTE check fault.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/internals.h | 3 ++-
target/arm/mte_helper.c | 18 ++++++++++++++++++
target/arm/translate-a64.c | 2 ++
3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/target/arm/internals.h b/target/arm/internals.h
index e1e018da46..fa264e368c 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1222,7 +1222,8 @@ FIELD(MTEDESC, MIDX, 0, 4)
FIELD(MTEDESC, TBI, 4, 2)
FIELD(MTEDESC, TCMA, 6, 2)
FIELD(MTEDESC, WRITE, 8, 1)
-FIELD(MTEDESC, SIZEM1, 9, SIMD_DATA_BITS - 9) /* size - 1 */
+FIELD(MTEDESC, ALIGN, 9, 3)
+FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12) /* size - 1 */
bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 98bcf59c22..e50bb4ea13 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -784,6 +784,24 @@ uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra)
uint64_t HELPER(mte_check)(CPUARMState *env, uint32_t desc, uint64_t ptr)
{
+ /*
+ * In the Arm ARM pseudocode, the alignment check happens at the top
+ * of Mem[], while the MTE check happens later in AArch64.MemSingle[].
+ * Thus the alignment check has priority.
+ * When the mte check is disabled, tcg performs the alignment check
+ * during the code generated for the memory access.
+ */
+ unsigned align = FIELD_EX32(desc, MTEDESC, ALIGN);
+ if (unlikely(align)) {
+ align = (1u << align) - 1;
+ if (unlikely(ptr & align)) {
+ int idx = FIELD_EX32(desc, MTEDESC, MIDX);
+ bool w = FIELD_EX32(desc, MTEDESC, WRITE);
+ MMUAccessType type = w ? MMU_DATA_STORE : MMU_DATA_LOAD;
+ arm_cpu_do_unaligned_access(env_cpu(env), ptr, type, idx, GETPC());
+ }
+ }
+
return mte_check(env, desc, ptr, GETPC());
}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 1117a1cc41..caeb91efa5 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -267,6 +267,7 @@ static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr,
desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
+ desc = FIELD_DP32(desc, MTEDESC, ALIGN, get_alignment_bits(memop));
desc = FIELD_DP32(desc, MTEDESC, SIZEM1, memop_size(memop) - 1);
ret = new_tmp_a64(s);
@@ -298,6 +299,7 @@ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
+ desc = FIELD_DP32(desc, MTEDESC, ALIGN, get_alignment_bits(single_mop));
desc = FIELD_DP32(desc, MTEDESC, SIZEM1, total_size - 1);
ret = new_tmp_a64(s);
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v1 15/19] target/arm: Add SCTLR.nAA to TBFLAG_A64
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (13 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:32 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2 Richard Henderson
` (3 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 3 ++-
target/arm/translate.h | 2 ++
target/arm/helper.c | 6 ++++++
target/arm/translate-a64.c | 1 +
4 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2108caf753..b814c52469 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1243,7 +1243,7 @@ void pmu_init(ARMCPU *cpu);
#define SCTLR_D (1U << 5) /* up to v5; RAO in v6 */
#define SCTLR_CP15BEN (1U << 5) /* v7 onward */
#define SCTLR_L (1U << 6) /* up to v5; RAO in v6 and v7; RAZ in v8 */
-#define SCTLR_nAA (1U << 6) /* when v8.4-LSE is implemented */
+#define SCTLR_nAA (1U << 6) /* when FEAT_LSE2 is implemented */
#define SCTLR_B (1U << 7) /* up to v6; RAZ in v7 */
#define SCTLR_ITD (1U << 7) /* v8 onward */
#define SCTLR_S (1U << 8) /* up to v6; RAZ in v7 */
@@ -3247,6 +3247,7 @@ FIELD(TBFLAG_A64, SVL, 24, 4)
/* Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not. */
FIELD(TBFLAG_A64, SME_TRAP_NONSTREAMING, 28, 1)
FIELD(TBFLAG_A64, FGT_ERET, 29, 1)
+FIELD(TBFLAG_A64, NAA, 30, 1)
/*
* Helpers for using the above.
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 809479f9b7..46a60f8987 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -137,6 +137,8 @@ typedef struct DisasContext {
bool fgt_eret;
/* True if fine-grained trap on SVC is enabled */
bool fgt_svc;
+ /* True if FEAT_LSE2 SCTLR_ELx.nAA is set */
+ bool naa;
/*
* >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
* < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index c62ed05c12..d1683155a1 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12053,6 +12053,12 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
}
}
+ if (cpu_isar_feature(aa64_lse2, env_archcpu(env))) {
+ if (sctlr & SCTLR_nAA) {
+ DP_TBFLAG_A64(flags, NAA, 1);
+ }
+ }
+
/* Compute the condition for using AccType_UNPRIV for LDTR et al. */
if (!(env->pstate & PSTATE_UAO)) {
switch (mmu_idx) {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index caeb91efa5..56c9d63664 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14813,6 +14813,7 @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
dc->pstate_sm = EX_TBFLAG_A64(tb_flags, PSTATE_SM);
dc->pstate_za = EX_TBFLAG_A64(tb_flags, PSTATE_ZA);
dc->sme_trap_nonstreaming = EX_TBFLAG_A64(tb_flags, SME_TRAP_NONSTREAMING);
+ dc->naa = EX_TBFLAG_A64(tb_flags, NAA);
dc->vec_len = 0;
dc->vec_stride = 0;
dc->cp_regs = arm_cpu->cp_regs;
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (14 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 15/19] target/arm: Add SCTLR.nAA to TBFLAG_A64 Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:49 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 17/19] target/arm: Move mte check for store-exclusive Richard Henderson
` (2 subsequent siblings)
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
FEAT_LSE2 only requires that atomic operations not cross a
16-byte boundary. Ordered operations may be completely
unaligned if SCTLR.nAA is set.
Because this alignment check is so special, do it by hand.
Make sure not to keep TCG temps live across the branch.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
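(Illustration only, not part of the patch: the boundary rule that
check_lse2_align() encodes, written as a tiny standalone C program.
It uses the same arithmetic as the extract/add/and/add sequence the
translator emits, with the immediate already folded into the address.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the FEAT_LSE2 rule: an AccType_ATOMIC or AccType_ORDERED
 * access of 'size' bytes at 'addr' need not be naturally aligned, but
 * must not cross a 16-byte boundary.
 */
static bool lse2_alignment_ok(uint64_t addr, unsigned size)
{
    return (addr & 15) + size <= 16;
}

int main(void)
{
    printf("%d\n", lse2_alignment_ok(0x1006, 8)); /* 1: stays in [0x1000, 0x1010) */
    printf("%d\n", lse2_alignment_ok(0x100a, 8)); /* 0: crosses 0x1010 */
    return 0;
}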
target/arm/helper-a64.h | 3 ++
target/arm/helper-a64.c | 7 +++
target/arm/translate-a64.c | 95 ++++++++++++++++++++++++++++++++------
3 files changed, 92 insertions(+), 13 deletions(-)
diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index ff56807247..3d5957c11f 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -110,3 +110,6 @@ DEF_HELPER_FLAGS_2(st2g_stub, TCG_CALL_NO_WG, void, env, i64)
DEF_HELPER_FLAGS_2(ldgm, TCG_CALL_NO_WG, i64, env, i64)
DEF_HELPER_FLAGS_3(stgm, TCG_CALL_NO_WG, void, env, i64, i64)
DEF_HELPER_FLAGS_3(stzgm_tags, TCG_CALL_NO_WG, void, env, i64, i64)
+
+DEF_HELPER_FLAGS_4(unaligned_access, TCG_CALL_NO_WG,
+ noreturn, env, i64, i32, i32)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 0972a4bdd0..abbe3f7077 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -952,3 +952,10 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
memset(mem, 0, blocklen);
}
+
+void HELPER(unaligned_access)(CPUARMState *env, uint64_t addr,
+ uint32_t access_type, uint32_t mmu_idx)
+{
+ arm_cpu_do_unaligned_access(env_cpu(env), addr, access_type,
+ mmu_idx, GETPC());
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 56c9d63664..78103f723d 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -310,6 +310,77 @@ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
return clean_data_tbi(s, addr);
}
+/*
+ * Generate the special alignment check that applies to AccType_ATOMIC
+ * and AccType_ORDERED insns under FEAT_LSE2: the access need not be
+ * naturally aligned, but it must not cross a 16-byte boundary.
+ * See AArch64.CheckAlignment().
+ */
+static void check_lse2_align(DisasContext *s, int rn, int imm,
+ bool is_write, MemOp mop)
+{
+ TCGv_i32 tmp;
+ TCGv_i64 addr;
+ TCGLabel *over_label;
+ MMUAccessType type;
+ int mmu_idx;
+
+ tmp = tcg_temp_new_i32();
+ tcg_gen_extrl_i64_i32(tmp, cpu_reg_sp(s, rn));
+ tcg_gen_addi_i32(tmp, tmp, imm & 15);
+ tcg_gen_andi_i32(tmp, tmp, 15);
+ tcg_gen_addi_i32(tmp, tmp, memop_size(mop));
+
+ over_label = gen_new_label();
+ tcg_gen_brcond_i32(TCG_COND_LEU, tmp, tcg_constant_i32(16), over_label);
+ tcg_temp_free_i32(tmp);
+
+ addr = tcg_temp_new_i64();
+ tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm);
+
+ type = is_write ? MMU_DATA_STORE : MMU_DATA_LOAD;
+ mmu_idx = get_mem_index(s);
+ gen_helper_unaligned_access(cpu_env, addr, tcg_constant_i32(type),
+ tcg_constant_i32(mmu_idx));
+ tcg_temp_free_i64(addr);
+
+ gen_set_label(over_label);
+
+}
+
+/* Handle the alignment check for AccType_ATOMIC instructions. */
+static MemOp check_atomic_align(DisasContext *s, int rn, MemOp mop)
+{
+ MemOp size = mop & MO_SIZE;
+
+ if (size == MO_8) {
+ return mop;
+ }
+ if (size >= MO_128 || !dc_isar_feature(aa64_lse2, s)) {
+ return mop | MO_ALIGN | s->be_data;
+ }
+ check_lse2_align(s, rn, 0, true, mop);
+ return mop | s->be_data;
+}
+
+/* Handle the alignment check for AccType_ORDERED instructions. */
+static MemOp check_ordered_align(DisasContext *s, int rn, int imm,
+ bool is_write, MemOp mop)
+{
+ MemOp size = mop & MO_SIZE;
+
+ if (size == MO_8) {
+ return mop;
+ }
+ if (size >= MO_128 || !dc_isar_feature(aa64_lse2, s)) {
+ return mop | MO_ALIGN | s->be_data;
+ }
+ if (!s->naa) {
+ check_lse2_align(s, rn, imm, is_write, mop);
+ }
+ return mop | s->be_data;
+}
+
typedef struct DisasCompare64 {
TCGCond cond;
TCGv_i64 value;
@@ -2525,8 +2596,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn,
if (memop == MO_128) {
memop |= MO_ATMAX_8;
}
- memop |= MO_ALIGN;
- memop = finalize_memop(s, memop);
+ memop = check_atomic_align(s, rn, memop);
s->is_ldex = true;
dirty_addr = cpu_reg_sp(s, rn);
@@ -2663,7 +2733,7 @@ static void gen_compare_and_swap(DisasContext *s, int rs, int rt,
if (rn == 31) {
gen_check_sp_alignment(s);
}
- memop = finalize_memop(s, size | MO_ALIGN);
+ memop = check_atomic_align(s, rn, size);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop);
tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt,
memidx, memop);
@@ -2685,7 +2755,7 @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
}
/* This is a single atomic access, despite the "pair". */
- memop = finalize_memop(s, (size + 1) | MO_ALIGN);
+ memop = check_atomic_align(s, rn, size + 1);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop);
if (size == 2) {
@@ -2809,8 +2879,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
gen_check_sp_alignment(s);
}
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
- /* TODO: ARMv8.4-LSE SCTLR.nAA */
- memop = finalize_memop(s, size | MO_ALIGN);
+ memop = check_ordered_align(s, rn, 0, true, size);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
true, rn != 31, memop);
do_gpr_st(s, cpu_reg(s, rt), clean_addr, memop, true, rt,
@@ -2828,8 +2897,7 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
if (rn == 31) {
gen_check_sp_alignment(s);
}
- /* TODO: ARMv8.4-LSE SCTLR.nAA */
- memop = finalize_memop(s, size | MO_ALIGN);
+ memop = check_ordered_align(s, rn, 0, false, size);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn),
false, rn != 31, memop);
do_gpr_ld(s, cpu_reg(s, rt), clean_addr, memop, false, true,
@@ -3510,7 +3578,7 @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
bool a = extract32(insn, 23, 1);
TCGv_i64 tcg_rs, tcg_rt, clean_addr;
AtomicThreeOpFn *fn = NULL;
- MemOp mop = finalize_memop(s, size | MO_ALIGN);
+ MemOp mop = size;
if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
unallocated_encoding(s);
@@ -3561,6 +3629,8 @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
if (rn == 31) {
gen_check_sp_alignment(s);
}
+
+ mop = check_atomic_align(s, rn, mop);
clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, mop);
if (o3_opc == 014) {
@@ -3685,16 +3755,13 @@ static void disas_ldst_ldapr_stlr(DisasContext *s, uint32_t insn)
bool is_store = false;
bool extend = false;
bool iss_sf;
- MemOp mop;
+ MemOp mop = size;
if (!dc_isar_feature(aa64_rcpc_8_4, s)) {
unallocated_encoding(s);
return;
}
- /* TODO: ARMv8.4-LSE SCTLR.nAA */
- mop = finalize_memop(s, size | MO_ALIGN);
-
switch (opc) {
case 0: /* STLURB */
is_store = true;
@@ -3726,6 +3793,8 @@ static void disas_ldst_ldapr_stlr(DisasContext *s, uint32_t insn)
gen_check_sp_alignment(s);
}
+ mop = check_ordered_align(s, rn, offset, is_store, mop);
+
dirty_addr = read_cpu_reg_sp(s, rn, 1);
tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
clean_addr = clean_data_tbi(s, dirty_addr);
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v1 17/19] target/arm: Move mte check for store-exclusive
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (15 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2 Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:36 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 18/19] test/tcg/multiarch: Adjust sigbus.c Richard Henderson
2023-02-16 3:08 ` [PATCH v1 19/19] target/arm: Enable FEAT_LSE2 for -cpu max Richard Henderson
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Push the mte check behind the exclusive_addr check.
Document the several ways that we are still out of spec
with this implementation.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 42 ++++++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 78103f723d..f9fc8ed215 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2654,17 +2654,47 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
*/
TCGLabel *fail_label = gen_new_label();
TCGLabel *done_label = gen_new_label();
- TCGv_i64 tmp, dirty_addr, clean_addr;
+ TCGv_i64 tmp, clean_addr;
MemOp memop;
- memop = (size + is_pair) | MO_ALIGN;
- memop = finalize_memop(s, memop);
-
- dirty_addr = cpu_reg_sp(s, rn);
- clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, memop);
+ /*
+ * FIXME: We are out of spec here. We have recorded only the address
+ * from load_exclusive, not the entire range, and we assume that the
+ * size of the access on both sides match. The architecture allows the
+ * store to be smaller than the load, so long as the stored bytes are
+ * within the range recorded by the load.
+ */
+ /* See AArch64.ExclusiveMonitorsPass() and AArch64.IsExclusiveVA(). */
+ clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn));
tcg_gen_brcond_i64(TCG_COND_NE, clean_addr, cpu_exclusive_addr, fail_label);
+ /*
+ * The write, and any associated faults, only happen if the virtual
+ * and physical addresses pass the exclusive monitor check. These
+ * faults are exceedingly unlikely, because normally the guest uses
+ * the exact same address register for the load_exclusive, and we
+ * would have recognized these faults there.
+ *
+ * It is possible to trigger an alignment fault pre-LSE2, e.g. with an
+ * unaligned 4-byte write within the range of an aligned 8-byte load.
+ * With LSE2, the store would need to cross a 16-byte boundary when the
+ * load did not, which would mean the store is outside the range
+ * recorded for the monitor, which would have failed a corrected monitor
+ * check above. For now, we assume no size change and retain the
+ * MO_ALIGN to let tcg know what we checked in the load_exclusive.
+ *
+ * It is possible to trigger an MTE fault, by performing the load with
+ * a virtual address with a valid tag and performing the store with the
+ * same virtual address and a different invalid tag.
+ */
+ memop = size + is_pair;
+ if (memop == MO_128 || !dc_isar_feature(aa64_lse2, s)) {
+ memop |= MO_ALIGN;
+ }
+ memop = finalize_memop(s, memop);
+ gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop);
+
tmp = tcg_temp_new_i64();
if (is_pair) {
if (size == 2) {
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v1 18/19] test/tcg/multiarch: Adjust sigbus.c
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (16 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 17/19] target/arm: Move mte check for store-exclusive Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:36 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 19/19] target/arm: Enable FEAT_LSE2 for -cpu max Richard Henderson
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
With -cpu max and FEAT_LSE2, the __aarch64__ section will only raise
an alignment exception when the load crosses a 16-byte boundary.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
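(Not part of the test: a quick host-side check of the new assertion
constants, assuming the same 4-byte load the test performs.  p now
points 15 bytes into the 16-byte-aligned array, so the load still
crosses a 16-byte boundary, and when it is allowed to complete it
reads bytes 0x10..0x13.)

#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    char x[32];
    uint32_t tmp;

    for (int i = 0; i < 32; i++) {
        x[i] = i + 1;                   /* 0x01 .. 0x20, as in the test */
    }
    memcpy(&tmp, &x[15], sizeof(tmp));  /* bytes 0x10 0x11 0x12 0x13 */

    if (BYTE_ORDER == LITTLE_ENDIAN) {
        assert(tmp == 0x13121110);
    } else {
        assert(tmp == 0x10111213);
    }
    return 0;
}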
tests/tcg/multiarch/sigbus.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/tests/tcg/multiarch/sigbus.c b/tests/tcg/multiarch/sigbus.c
index 8134c5fd56..f47c7390e7 100644
--- a/tests/tcg/multiarch/sigbus.c
+++ b/tests/tcg/multiarch/sigbus.c
@@ -6,8 +6,13 @@
#include <endian.h>
-unsigned long long x = 0x8877665544332211ull;
-void * volatile p = (void *)&x + 1;
+char x[32] __attribute__((aligned(16))) = {
+ 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
+ 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10,
+ 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18,
+ 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20,
+};
+void * volatile p = (void *)&x + 15;
void sigbus(int sig, siginfo_t *info, void *uc)
{
@@ -60,9 +65,9 @@ int main()
* We might as well validate the unaligned load worked.
*/
if (BYTE_ORDER == LITTLE_ENDIAN) {
- assert(tmp == 0x55443322);
+ assert(tmp == 0x13121110);
} else {
- assert(tmp == 0x77665544);
+ assert(tmp == 0x10111213);
}
return EXIT_SUCCESS;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [PATCH v1 19/19] target/arm: Enable FEAT_LSE2 for -cpu max
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
` (17 preceding siblings ...)
2023-02-16 3:08 ` [PATCH v1 18/19] test/tcg/multiarch: Adjust sigbus.c Richard Henderson
@ 2023-02-16 3:08 ` Richard Henderson
2023-02-23 16:37 ` Peter Maydell
18 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-16 3:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
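(Not part of the patch: once ID_AA64MMFR2.AT is advertised, an aarch64
Linux guest booted with -cpu max should report the feature as the
"uscat" hwcap, which the kernel derives from that ID field.  A minimal
check, assuming an aarch64 Linux guest:)

#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>   /* HWCAP_USCAT, aarch64 only */

int main(void)
{
    unsigned long hwcaps = getauxval(AT_HWCAP);

    printf("FEAT_LSE2: %s\n",
           (hwcaps & HWCAP_USCAT) ? "present" : "absent");
    return 0;
}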
docs/system/arm/emulation.rst | 1 +
target/arm/cpu64.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 2062d71261..a97e05e746 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -50,6 +50,7 @@ the following architecture extensions:
- FEAT_LRCPC (Load-acquire RCpc instructions)
- FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
- FEAT_LSE (Large System Extensions)
+- FEAT_LSE2 (Large System Extensions v2)
- FEAT_LVA (Large Virtual Address space)
- FEAT_MTE (Memory Tagging Extension)
- FEAT_MTE2 (Memory Tagging Extension)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 4066950da1..ab91b579bc 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -1245,6 +1245,7 @@ static void aarch64_max_initfn(Object *obj)
t = FIELD_DP64(t, ID_AA64MMFR2, IESB, 1); /* FEAT_IESB */
t = FIELD_DP64(t, ID_AA64MMFR2, VARANGE, 1); /* FEAT_LVA */
t = FIELD_DP64(t, ID_AA64MMFR2, ST, 1); /* FEAT_TTST */
+ t = FIELD_DP64(t, ID_AA64MMFR2, AT, 1); /* FEAT_LSE2 */
t = FIELD_DP64(t, ID_AA64MMFR2, IDS, 1); /* FEAT_IDST */
t = FIELD_DP64(t, ID_AA64MMFR2, FWB, 1); /* FEAT_S2FWB */
t = FIELD_DP64(t, ID_AA64MMFR2, TTL, 1); /* FEAT_TTL */
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits
2023-02-16 3:08 ` [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits Richard Henderson
@ 2023-02-23 15:14 ` Peter Maydell
2023-02-23 16:12 ` Richard Henderson
0 siblings, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We currently treat cpu_exclusive_high as containing the
> second word of LDXP, even though that word is not "high"
> in big-endian mode. Swap things around so that it is.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate-a64.c | 54 ++++++++++++++++++++------------------
> 1 file changed, 29 insertions(+), 25 deletions(-)
This code change looks OK as far as it goes, but the bad
news is that we migrate the env.exclusive_val and
env.exclusive_high values in the machine state. So a
migration from a QEMU before this change to a QEMU with
this change on a BE host will get confused...
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 03/19] target/arm: Use tcg_gen_qemu_{st, ld}_i128 for do_fp_{st, ld}
2023-02-16 3:08 ` [PATCH v1 03/19] target/arm: Use tcg_gen_qemu_{st, ld}_i128 for do_fp_{st, ld} Richard Henderson
@ 2023-02-23 15:23 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:23 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> While we don't require 16-byte atomicity here, using a single
> larger operation simplifies the code.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate-a64.c | 38 ++++++++++++++------------------------
> 1 file changed, 14 insertions(+), 24 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G
2023-02-16 3:08 ` [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G Richard Henderson
@ 2023-02-23 15:24 ` Peter Maydell
2023-02-23 16:20 ` Richard Henderson
0 siblings, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:24 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This fixes a bug in that these two insns should have been using atomic
> 16-byte stores, since MTE is ARMv8.5 and LSE2 is mandatory from ARMv8.4.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> diff --git a/tests/tcg/aarch64/mte-7.c b/tests/tcg/aarch64/mte-7.c
> index a981de62d4..04974f9ebb 100644
> --- a/tests/tcg/aarch64/mte-7.c
> +++ b/tests/tcg/aarch64/mte-7.c
> @@ -19,8 +19,7 @@ int main(int ac, char **av)
> p = (void *)((unsigned long)p | (1ul << 56));
>
> /* Store tag in sequential granules. */
> - asm("stg %0, [%0]" : : "r"(p + 0x0ff0));
> - asm("stg %0, [%0]" : : "r"(p + 0x1000));
> + asm("stz2g %0, [%0]" : : "r"(p + 0x0ff0));
Why did we need to change the test program ?
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 05/19] target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r
2023-02-16 3:08 ` [PATCH v1 05/19] target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r Richard Henderson
@ 2023-02-23 15:36 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:36 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Round len_align to 16 instead of 8, handling an odd 8-byte as part
> of the tail. Use MO_ATOM_NONE to indicate that all of these memory
> ops have only byte atomicity.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 06/19] target/arm: Sink gen_mte_check1 into load/store_exclusive
2023-02-16 3:08 ` [PATCH v1 06/19] target/arm: Sink gen_mte_check1 into load/store_exclusive Richard Henderson
@ 2023-02-23 15:40 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:40 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> No need to duplicate this check across multiple call sites.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 07/19] target/arm: Add feature test for FEAT_LSE2
2023-02-16 3:08 ` [PATCH v1 07/19] target/arm: Add feature test for FEAT_LSE2 Richard Henderson
@ 2023-02-23 15:43 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:43 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 7bc97fece9..2108caf753 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -4049,6 +4049,11 @@ static inline bool isar_feature_aa64_st(const ARMISARegisters *id)
> return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, ST) != 0;
> }
>
> +static inline bool isar_feature_aa64_lse2(const ARMISARegisters *id)
> +{
> + return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, AT) != 0;
> +}
> +
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 08/19] target/arm: Add atom_data to DisasContext
2023-02-16 3:08 ` [PATCH v1 08/19] target/arm: Add atom_data to DisasContext Richard Henderson
@ 2023-02-23 15:47 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:47 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Use this to record the default atomicity of memory operations.
> Set it to MO_ATOM_WITHIN16 if FEAT_LSE2 applies.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.h | 14 +++++++++++---
> target/arm/translate-a64.c | 4 ++++
> target/arm/translate.c | 1 +
> 3 files changed, 16 insertions(+), 3 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 09/19] target/arm: Load/store integer pair with one tcg operation
2023-02-16 3:08 ` [PATCH v1 09/19] target/arm: Load/store integer pair with one tcg operation Richard Henderson
@ 2023-02-23 15:57 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 15:57 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This is required for LSE2, where the pair must be treated
> atomically if it does not cross a 16-byte boundary. But
> it simplifies the code to do this always, just use the
> unpaired atomicity without LSE2.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 10/19] target/arm: Hoist finalize_memop out of do_gpr_{ld, st}
2023-02-16 3:08 ` [PATCH v1 10/19] target/arm: Hoist finalize_memop out of do_gpr_{ld, st} Richard Henderson
@ 2023-02-23 16:03 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:03 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We are going to need the complete memop beforehand,
> so let's not compute it twice.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 11/19] target/arm: Hoist finalize_memop out of do_fp_{ld, st}
2023-02-16 3:08 ` [PATCH v1 11/19] target/arm: Hoist finalize_memop out of do_fp_{ld, st} Richard Henderson
@ 2023-02-23 16:04 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:04 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We are going to need the complete memop beforehand,
> so let's not compute it twice.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate-a64.c | 42 +++++++++++++++++---------------------
> 1 file changed, 19 insertions(+), 23 deletions(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 12/19] target/arm: Pass memop to gen_mte_check1*
2023-02-16 3:08 ` [PATCH v1 12/19] target/arm: Pass memop to gen_mte_check1* Richard Henderson
@ 2023-02-23 16:08 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:08 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Pass the completed memop to gen_mte_check1_mmuidx.
> For the moment, do nothing more than extract the size.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 13/19] target/arm: Pass single_memop to gen_mte_checkN
2023-02-16 3:08 ` [PATCH v1 13/19] target/arm: Pass single_memop to gen_mte_checkN Richard Henderson
@ 2023-02-23 16:10 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:10 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Pass the individual memop to gen_mte_checkN.
> For the moment, do nothing with it.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate-a64.h | 2 +-
> target/arm/translate-a64.c | 26 +++++++++++++++-----------
> target/arm/translate-sve.c | 4 ++--
> 3 files changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
> index 3fc39763d0..b7518f9d34 100644
> --- a/target/arm/translate-a64.h
> +++ b/target/arm/translate-a64.h
> @@ -54,7 +54,7 @@ TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr);
> TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
> bool tag_checked, MemOp memop);
> TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
> - bool tag_checked, int size);
> + bool tag_checked, int size, MemOp memop);
>
> /* We should have at some point before trying to access an FP register
> * done the necessary access check, so assert that
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index e02bdd3e7c..1117a1cc41 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -288,7 +288,7 @@ TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
> * For MTE, check multiple logical sequential accesses.
> */
> TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write,
> - bool tag_checked, int size)
> + bool tag_checked, int total_size, MemOp single_mop)
Argument name in function definition should match the one in the
prototype.
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits
2023-02-23 15:14 ` Peter Maydell
@ 2023-02-23 16:12 ` Richard Henderson
2023-02-23 16:51 ` Peter Maydell
0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-23 16:12 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 2/23/23 05:14, Peter Maydell wrote:
> On Thu, 16 Feb 2023 at 03:09, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> We currently treat cpu_exclusive_high as containing the
>> second word of LDXP, even though that word is not "high"
>> in big-endian mode. Swap things around so that it is.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/translate-a64.c | 54 ++++++++++++++++++++------------------
>> 1 file changed, 29 insertions(+), 25 deletions(-)
>
> This code change looks OK as far as it goes, but the bad
> news is that we migrate the env.exclusive_val and
> env.exclusive_high values in the machine state. So a
> migration from a QEMU before this change to a QEMU with
> this change on a BE host will get confused...
Oof. Ok, I didn't *really* need this, it just seemed to make sense. I'll add some
commentary about "high" only meaning "high" for little-endian...
r~
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G
2023-02-23 15:24 ` Peter Maydell
@ 2023-02-23 16:20 ` Richard Henderson
2023-02-23 16:53 ` Peter Maydell
0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-23 16:20 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 2/23/23 05:24, Peter Maydell wrote:
> On Thu, 16 Feb 2023 at 03:10, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> This fixes a bug in that these two insns should have been using atomic
>> 16-byte stores, since MTE is ARMv8.5 and LSE2 is mandatory from ARMv8.4.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
>> diff --git a/tests/tcg/aarch64/mte-7.c b/tests/tcg/aarch64/mte-7.c
>> index a981de62d4..04974f9ebb 100644
>> --- a/tests/tcg/aarch64/mte-7.c
>> +++ b/tests/tcg/aarch64/mte-7.c
>> @@ -19,8 +19,7 @@ int main(int ac, char **av)
>> p = (void *)((unsigned long)p | (1ul << 56));
>>
>> /* Store tag in sequential granules. */
>> - asm("stg %0, [%0]" : : "r"(p + 0x0ff0));
>> - asm("stg %0, [%0]" : : "r"(p + 0x1000));
>> + asm("stz2g %0, [%0]" : : "r"(p + 0x0ff0));
>
> Why did we need to change the test program ?
It gives us a test of the stz2g insn. We still have other instances of stg in the testsuite.
r~
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check
2023-02-16 3:08 ` [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check Richard Henderson
@ 2023-02-23 16:28 ` Peter Maydell
2023-02-23 16:38 ` Richard Henderson
2023-02-23 16:54 ` Peter Maydell
1 sibling, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:28 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Fixes a bug in that with SCTLR.A set, we should raise any
> alignment fault before raising any MTE check fault.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/internals.h | 3 ++-
> target/arm/mte_helper.c | 18 ++++++++++++++++++
> target/arm/translate-a64.c | 2 ++
> 3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index e1e018da46..fa264e368c 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -1222,7 +1222,8 @@ FIELD(MTEDESC, MIDX, 0, 4)
> FIELD(MTEDESC, TBI, 4, 2)
> FIELD(MTEDESC, TCMA, 6, 2)
> FIELD(MTEDESC, WRITE, 8, 1)
> -FIELD(MTEDESC, SIZEM1, 9, SIMD_DATA_BITS - 9) /* size - 1 */
> +FIELD(MTEDESC, ALIGN, 9, 3)
> +FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12) /* size - 1 */
>
> bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
> uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
> diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
> index 98bcf59c22..e50bb4ea13 100644
> --- a/target/arm/mte_helper.c
> +++ b/target/arm/mte_helper.c
> @@ -784,6 +784,24 @@ uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra)
>
> uint64_t HELPER(mte_check)(CPUARMState *env, uint32_t desc, uint64_t ptr)
> {
> + /*
> + * In the Arm ARM pseudocode, the alignment check happens at the top
> + * of Mem[], while the MTE check happens later in AArch64.MemSingle[].
> + * Thus the alignment check has priority.
> + * When the mte check is disabled, tcg performs the alignment check
> + * during the code generated for the memory access.
> + */
Also described in the text: the I_ZFGJP priority table lists
MTE faults at priority 33, basically lower than anything else
except an external abort.
Looking at the code, is this really the only case here where
we were mis-prioritizing tag check faults? Have we already
checked things like "no page table entry" and all the other
cases that can cause data aborts at this point?
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 15/19] target/arm: Add SCTLR.nAA to TBFLAG_A64
2023-02-16 3:08 ` [PATCH v1 15/19] target/arm: Add SCTLR.nAA to TBFLAG_A64 Richard Henderson
@ 2023-02-23 16:32 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:32 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 3 ++-
> target/arm/translate.h | 2 ++
> target/arm/helper.c | 6 ++++++
> target/arm/translate-a64.c | 1 +
> 4 files changed, 11 insertions(+), 1 deletion(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 17/19] target/arm: Move mte check for store-exclusive
2023-02-16 3:08 ` [PATCH v1 17/19] target/arm: Move mte check for store-exclusive Richard Henderson
@ 2023-02-23 16:36 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:36 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Push the mte check behind the exclusive_addr check.
> Document the several ways that we are still out of spec
> with this implementation.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 18/19] test/tcg/multiarch: Adjust sigbus.c
2023-02-16 3:08 ` [PATCH v1 18/19] test/tcg/multiarch: Adjust sigbus.c Richard Henderson
@ 2023-02-23 16:36 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:36 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> With -cpu max and FEAT_LSE2, the __aarch64__ section will only raise
> an alignment exception when the load crosses a 16-byte boundary.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 19/19] target/arm: Enable FEAT_LSE2 for -cpu max
2023-02-16 3:08 ` [PATCH v1 19/19] target/arm: Enable FEAT_LSE2 for -cpu max Richard Henderson
@ 2023-02-23 16:37 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:37 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> docs/system/arm/emulation.rst | 1 +
> target/arm/cpu64.c | 1 +
> 2 files changed, 2 insertions(+)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check
2023-02-23 16:28 ` Peter Maydell
@ 2023-02-23 16:38 ` Richard Henderson
0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-02-23 16:38 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 2/23/23 06:28, Peter Maydell wrote:
> Also described in the text: the I_ZFGJP priority table lists
> MTE faults at priority 33, basically lower than anything else
> except an external abort.
>
> Looking at the code, is this really the only case here where
> we were mis-prioritizing tag check faults? Have we already
> checked things like "no page table entry" and all the other
> cases that can cause data aborts at this point?
Well, yes, we do the page table lookup in allocation_tag_mem(), while fetching the tag
memory against which the check is performed.
If there's another mis-prioritization, I don't know about it.
r~
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2
2023-02-16 3:08 ` [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2 Richard Henderson
@ 2023-02-23 16:49 ` Peter Maydell
2023-02-23 17:16 ` Richard Henderson
0 siblings, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:49 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> FEAT_LSE2 only requires that atomic operations not cross a
> 16-byte boundary. Ordered operations may be completely
> unaligned if SCTLR.nAA is set.
>
> Because this alignment check is so special, do it by hand.
> Make sure not to keep TCG temps live across the branch.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> +static void check_lse2_align(DisasContext *s, int rn, int imm,
> + bool is_write, MemOp mop)
> +{
> + TCGv_i32 tmp;
> + TCGv_i64 addr;
> + TCGLabel *over_label;
> + MMUAccessType type;
> + int mmu_idx;
> +
> + tmp = tcg_temp_new_i32();
> + tcg_gen_extrl_i64_i32(tmp, cpu_reg_sp(s, rn));
> + tcg_gen_addi_i32(tmp, tmp, imm & 15);
> + tcg_gen_andi_i32(tmp, tmp, 15);
> + tcg_gen_addi_i32(tmp, tmp, memop_size(mop));
> +
> + over_label = gen_new_label();
> + tcg_gen_brcond_i32(TCG_COND_LEU, tmp, tcg_constant_i32(16), over_label);
This brcond ends the basic block and destroys the content
of TCG temporaries, which is bad because some of the
callsites have set some of those up before calling this
function (eg gen_compare_and_swap() has called cpu_reg()
which might have created and initialized a temporary
for xZR).
Using a brcond anywhere other than directly in a top level
function where you can see it and work around this awkward
property seems rather fragile.
(Ideally there would be a variant of brcond that didn't
trash temporaries, because almost all the time that is
an annoying hazard rather than a useful property.)
> + tcg_temp_free_i32(tmp);
> +
> + addr = tcg_temp_new_i64();
> + tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm);
> +
> + type = is_write ? MMU_DATA_STORE : MMU_DATA_LOAD;
> + mmu_idx = get_mem_index(s);
> + gen_helper_unaligned_access(cpu_env, addr, tcg_constant_i32(type),
> + tcg_constant_i32(mmu_idx));
> + tcg_temp_free_i64(addr);
> +
> + gen_set_label(over_label);
> +
> +}
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
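For readers following the generated sequence, here is a plain-C restatement of the arithmetic it performs; this is a sketch of the condition only, not TCG code.

#include <stdbool.h>
#include <stdint.h>

/* True if an access of 'size' bytes at 'addr + imm' stays inside a single
 * 16-byte granule.  This mirrors the 32-bit arithmetic generated above:
 * reduce to the offset within the granule, add the access size, and
 * compare against 16 (the TCG_COND_LEU branch skips the fault helper). */
static bool within_16byte_granule(uint64_t addr, int imm, unsigned size)
{
    uint32_t low = (uint32_t)addr;      /* tcg_gen_extrl_i64_i32   */
    low = (low + (imm & 15)) & 15;      /* offset within granule   */
    return low + size <= 16;            /* no boundary crossed     */
}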
* Re: [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits
2023-02-23 16:12 ` Richard Henderson
@ 2023-02-23 16:51 ` Peter Maydell
2023-02-23 17:08 ` Peter Maydell
0 siblings, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:51 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 23 Feb 2023 at 16:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/23/23 05:14, Peter Maydell wrote:
> > On Thu, 16 Feb 2023 at 03:09, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> We currently treat cpu_exclusive_high as containing the
> >> second word of LDXP, even though that word is not "high"
> >> in big-endian mode. Swap things around so that it is.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >> target/arm/translate-a64.c | 54 ++++++++++++++++++++------------------
> >> 1 file changed, 29 insertions(+), 25 deletions(-)
> >
> > This code change looks OK as far as it goes, but the bad
> > news is that we migrate the env.exclusive_val and
> > env.exclusive_high values in the machine state. So a
> > migration from a QEMU before this change to a QEMU with
> > this change on a BE host will get confused...
>
> Oof. Ok, I didn't *really* need this, it just seemed to make sense. I'll add some
> commentary about "high" only meaning "high" for little-endian...
The current state of affairs is arguably broken, because it
means you can't migrate a guest from a BE host to an LE host,
because the migration stream contains host-endian-dependent
data. But if it's easy to leave this sleeping dog alone
then it will save us figuring out what we would need to do
in a post-load function :-)
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G
2023-02-23 16:20 ` Richard Henderson
@ 2023-02-23 16:53 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:53 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 23 Feb 2023 at 16:20, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/23/23 05:24, Peter Maydell wrote:
> > On Thu, 16 Feb 2023 at 03:10, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> This fixes a bug in that these two insns should have been using atomic
> >> 16-byte stores, since MTE is ARMv8.5 and LSE2 is mandatory from ARMv8.4.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >
> >> diff --git a/tests/tcg/aarch64/mte-7.c b/tests/tcg/aarch64/mte-7.c
> >> index a981de62d4..04974f9ebb 100644
> >> --- a/tests/tcg/aarch64/mte-7.c
> >> +++ b/tests/tcg/aarch64/mte-7.c
> >> @@ -19,8 +19,7 @@ int main(int ac, char **av)
> >> p = (void *)((unsigned long)p | (1ul << 56));
> >>
> >> /* Store tag in sequential granules. */
> >> - asm("stg %0, [%0]" : : "r"(p + 0x0ff0));
> >> - asm("stg %0, [%0]" : : "r"(p + 0x1000));
> >> + asm("stz2g %0, [%0]" : : "r"(p + 0x0ff0));
> >
> > Why did we need to change the test program ?
>
> It gives us a test of the stz2g insn. We still have other instances of stg in the testsuite.
Ah, right. That should probably be a separate commit.
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
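For readers unfamiliar with the tag-store variants, a hedged sketch of why one stz2g can stand in for the two stg instructions in the hunk above. This is an illustrative fragment, not the test itself: it assumes the same PROT_MTE mapping and tagged-pointer setup that mte-7.c already performs, and it needs -march=armv8.5-a+memtag to assemble.

/* 'p' is assumed to be a tagged pointer into a PROT_MTE mapping. */
static void tag_two_granules(char *p)
{
    /* STG writes the allocation tag for a single 16-byte granule and
     * leaves the data untouched, so covering the granules at 0x0ff0 and
     * 0x1000 takes two instructions: */
    asm volatile("stg %0, [%0]" : : "r"(p + 0x0ff0) : "memory");
    asm volatile("stg %0, [%0]" : : "r"(p + 0x1000) : "memory");
}

static void tag_two_granules_and_zero(char *p)
{
    /* STZ2G writes the tag for two consecutive granules and zeroes their
     * 32 bytes of data, so one instruction at 0x0ff0 covers both granules;
     * the data zeroing is the part that should use atomic 16-byte stores,
     * which is the behaviour the patch fixes in QEMU. */
    asm volatile("stz2g %0, [%0]" : : "r"(p + 0x0ff0) : "memory");
}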
* Re: [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check
2023-02-16 3:08 ` [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check Richard Henderson
2023-02-23 16:28 ` Peter Maydell
@ 2023-02-23 16:54 ` Peter Maydell
1 sibling, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 16:54 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 16 Feb 2023 at 03:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Fixes a bug in that with SCTLR.A set, we should raise any
> alignment fault before raising any MTE check fault.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits
2023-02-23 16:51 ` Peter Maydell
@ 2023-02-23 17:08 ` Peter Maydell
0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 17:08 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 23 Feb 2023 at 16:51, Peter Maydell <peter.maydell@linaro.org> wrote:
> The current state of affairs is arguably broken, because it
> means you can't migrate a guest from a BE host to an LE host,
> because the migration stream contains host-endian-dependent
> data.
...thinking it through, this isn't right. The current code
has cpu_exclusive_high always be the 64-bit word from
the higher of the two addresses. The change proposes
making it be the high part of the guest 128-bit value
(which is different if the guest is in BE mode).
Neither of those definitions depend on the host endianness:
they're just different and we would have to figure out
how to convert between them on migration.
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
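To make the distinction concrete, a standalone sketch of the two possible definitions of the "high" half and why they differ only for a big-endian guest. The names and structure here are illustrative, not QEMU's.

#include <stdint.h>

/* The 16 bytes at the exclusive address, plus the guest data endianness. */
struct ldxp_bytes {
    uint8_t mem[16];
    int guest_be;       /* nonzero if the guest load is big-endian */
};

static uint64_t load64(const uint8_t *p, int be)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (be ? (56 - 8 * i) : (8 * i));
    }
    return v;
}

/* Old convention: the "high" slot is simply the doubleword at the
 * higher address, regardless of guest endianness. */
static void split_by_address(const struct ldxp_bytes *b,
                             uint64_t *lo_addr, uint64_t *hi_addr)
{
    *lo_addr = load64(b->mem, b->guest_be);
    *hi_addr = load64(b->mem + 8, b->guest_be);
}

/* New convention: the "high" slot is the most significant half of the
 * 128-bit value the guest observes; for a big-endian guest that is the
 * doubleword at the *lower* address. */
static void split_by_significance(const struct ldxp_bytes *b,
                                  uint64_t *val_lo, uint64_t *val_hi)
{
    uint64_t a = load64(b->mem, b->guest_be);
    uint64_t c = load64(b->mem + 8, b->guest_be);
    if (b->guest_be) {
        *val_hi = a;
        *val_lo = c;
    } else {
        *val_hi = c;
        *val_lo = a;
    }
}

Neither definition depends on the host byte order; converting a migration stream from one to the other would amount to swapping the two slots exactly when the guest access was big-endian.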
* Re: [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2
2023-02-23 16:49 ` Peter Maydell
@ 2023-02-23 17:16 ` Richard Henderson
2023-02-23 17:22 ` Peter Maydell
0 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-02-23 17:16 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 2/23/23 06:49, Peter Maydell wrote:
> On Thu, 16 Feb 2023 at 03:09, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> FEAT_LSE2 only requires that atomic operations not cross a
>> 16-byte boundary. Ordered operations may be completely
>> unaligned if SCTLR.nAA is set.
>>
>> Because this alignment check is so special, do it by hand.
>> Make sure not to keep TCG temps live across the branch.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
>
>> +static void check_lse2_align(DisasContext *s, int rn, int imm,
>> + bool is_write, MemOp mop)
>> +{
>> + TCGv_i32 tmp;
>> + TCGv_i64 addr;
>> + TCGLabel *over_label;
>> + MMUAccessType type;
>> + int mmu_idx;
>> +
>> + tmp = tcg_temp_new_i32();
>> + tcg_gen_extrl_i64_i32(tmp, cpu_reg_sp(s, rn));
>> + tcg_gen_addi_i32(tmp, tmp, imm & 15);
>> + tcg_gen_andi_i32(tmp, tmp, 15);
>> + tcg_gen_addi_i32(tmp, tmp, memop_size(mop));
>> +
>> + over_label = gen_new_label();
>> + tcg_gen_brcond_i32(TCG_COND_LEU, tmp, tcg_constant_i32(16), over_label);
>
> This brcond ends the basic block and destroys the content
> of TCG temporaries, which is bad because some of the
> callsites have set some of those up before calling this
> function (eg gen_compare_and_swap() has called cpu_reg()
> which might have created and initialized a temporary
> for xZR).
xzr uses tcg_constant_i64(), which has no lifetime issues.
>
> Using a brcond anywhere other than directly in a top level
> function where you can see it and work around this awkward
> property seems rather fragile.
>
> (Ideally there would be a variant of brcond that didn't
> trash temporaries, because almost all the time that is
> an annoying hazard rather than a useful property.)
I've cc'd you on a patch set that fixes all the temporary lifetime stuff.
v1: https://patchew.org/QEMU/20230130205935.1157347-1-richard.henderson@linaro.org/
v2: https://patchew.org/QEMU/20230222232715.15034-1-richard.henderson@linaro.org/
r~
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2
2023-02-23 17:16 ` Richard Henderson
@ 2023-02-23 17:22 ` Peter Maydell
2023-02-23 17:27 ` Richard Henderson
0 siblings, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2023-02-23 17:22 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 23 Feb 2023 at 17:16, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/23/23 06:49, Peter Maydell wrote:
> > On Thu, 16 Feb 2023 at 03:09, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> FEAT_LSE2 only requires that atomic operations not cross a
> >> 16-byte boundary. Ordered operations may be completely
> >> unaligned if SCTLR.nAA is set.
> >>
> >> Because this alignment check is so special, do it by hand.
> >> Make sure not to keep TCG temps live across the branch.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >
> >
> >> +static void check_lse2_align(DisasContext *s, int rn, int imm,
> >> + bool is_write, MemOp mop)
> >> +{
> >> + TCGv_i32 tmp;
> >> + TCGv_i64 addr;
> >> + TCGLabel *over_label;
> >> + MMUAccessType type;
> >> + int mmu_idx;
> >> +
> >> + tmp = tcg_temp_new_i32();
> >> + tcg_gen_extrl_i64_i32(tmp, cpu_reg_sp(s, rn));
> >> + tcg_gen_addi_i32(tmp, tmp, imm & 15);
> >> + tcg_gen_andi_i32(tmp, tmp, 15);
> >> + tcg_gen_addi_i32(tmp, tmp, memop_size(mop));
> >> +
> >> + over_label = gen_new_label();
> >> + tcg_gen_brcond_i32(TCG_COND_LEU, tmp, tcg_constant_i32(16), over_label);
> >
> > This brcond ends the basic block and destroys the content
> > of TCG temporaries, which is bad because some of the
> > callsites have set some of those up before calling this
> > function (eg gen_compare_and_swap() has called cpu_reg()
> > which might have created and initialized a temporary
> > for xZR).
>
> xzr uses tcg_constant_i64(), which has no lifetime issues.
Hmm? cpu_reg() calls new_tmp_a64_zero() calls new_tmp_a64()
calls tcg_temp_new_i64(). What am I missing ?
> I've cc'd you on a patch set that fixes all the temporary lifetime stuff.
>
> v1: https://patchew.org/QEMU/20230130205935.1157347-1-richard.henderson@linaro.org/
> v2: https://patchew.org/QEMU/20230222232715.15034-1-richard.henderson@linaro.org/
Cool!
thanks
-- PMM
^ permalink raw reply [flat|nested] 48+ messages in thread
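As a sketch of the hazard being described, assuming the pre-rework rule that an ordinary temporary does not survive a brcond; this is not actual translate-a64.c code, and do_something_with() is a hypothetical consumer.

static void caller_sketch(DisasContext *s, int rt, int rn, int imm, MemOp mop)
{
    /* For rt == 31 (XZR) this goes through new_tmp_a64_zero(), i.e. a
     * plain tcg_temp_new_i64() initialised to zero. */
    TCGv_i64 val = cpu_reg(s, rt);

    /* check_lse2_align() emits a tcg_gen_brcond_i32(), which ends the
     * basic block; under the old lifetime rules 'val' is dead after it. */
    check_lse2_align(s, rn, imm, true, mop);

    /* Any use of 'val' here may therefore observe an undefined value. */
    do_something_with(s, val);          /* hypothetical consumer */
}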
* Re: [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2
2023-02-23 17:22 ` Peter Maydell
@ 2023-02-23 17:27 ` Richard Henderson
0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-02-23 17:27 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 2/23/23 07:22, Peter Maydell wrote:
> On Thu, 23 Feb 2023 at 17:16, Richard Henderson
>> xzr uses tcg_constant_i64(), which has no lifetime issues.
>
> Hmm? cpu_reg() calls new_tmp_a64_zero() calls new_tmp_a64()
> calls tcg_temp_new_i64(). What am I missing ?
Whoops -- I was misremembering other target/foo/ code; aarch64 doesn't have separate
get-register-for-source and get-register-for-destination accessors.
r~
^ permalink raw reply [flat|nested] 48+ messages in thread
end of thread
Thread overview: 48+ messages
2023-02-16 3:08 [PATCH v1 00/19] target/arm: Implement FEAT_LSE2 Richard Henderson
2023-02-16 3:08 ` [PATCH v1 01/19] target/arm: Make cpu_exclusive_high hold the high bits Richard Henderson
2023-02-23 15:14 ` Peter Maydell
2023-02-23 16:12 ` Richard Henderson
2023-02-23 16:51 ` Peter Maydell
2023-02-23 17:08 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 02/19] target/arm: Use tcg_gen_qemu_ld_i128 for LDXP Richard Henderson
2023-02-16 3:08 ` [PATCH v1 03/19] target/arm: Use tcg_gen_qemu_{st,ld}_i128 for do_fp_{st,ld} Richard Henderson
2023-02-23 15:23 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 04/19] target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G Richard Henderson
2023-02-23 15:24 ` Peter Maydell
2023-02-23 16:20 ` Richard Henderson
2023-02-23 16:53 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 05/19] target/arm: Use tcg_gen_qemu_{ld,st}_i128 in gen_sve_{ld,st}r Richard Henderson
2023-02-23 15:36 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 06/19] target/arm: Sink gen_mte_check1 into load/store_exclusive Richard Henderson
2023-02-23 15:40 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 07/19] target/arm: Add feature test for FEAT_LSE2 Richard Henderson
2023-02-23 15:43 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 08/19] target/arm: Add atom_data to DisasContext Richard Henderson
2023-02-23 15:47 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 09/19] target/arm: Load/store integer pair with one tcg operation Richard Henderson
2023-02-23 15:57 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 10/19] target/arm: Hoist finalize_memop out of do_gpr_{ld,st} Richard Henderson
2023-02-23 16:03 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 11/19] target/arm: Hoist finalize_memop out of do_fp_{ld,st} Richard Henderson
2023-02-23 16:04 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 12/19] target/arm: Pass memop to gen_mte_check1* Richard Henderson
2023-02-23 16:08 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 13/19] target/arm: Pass single_memop to gen_mte_checkN Richard Henderson
2023-02-23 16:10 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 14/19] target/arm: Check alignment in helper_mte_check Richard Henderson
2023-02-23 16:28 ` Peter Maydell
2023-02-23 16:38 ` Richard Henderson
2023-02-23 16:54 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 15/19] target/arm: Add SCTLR.nAA to TBFLAG_A64 Richard Henderson
2023-02-23 16:32 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 16/19] target/arm: Relax ordered/atomic alignment checks for LSE2 Richard Henderson
2023-02-23 16:49 ` Peter Maydell
2023-02-23 17:16 ` Richard Henderson
2023-02-23 17:22 ` Peter Maydell
2023-02-23 17:27 ` Richard Henderson
2023-02-16 3:08 ` [PATCH v1 17/19] target/arm: Move mte check for store-exclusive Richard Henderson
2023-02-23 16:36 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 18/19] test/tcg/multiarch: Adjust sigbus.c Richard Henderson
2023-02-23 16:36 ` Peter Maydell
2023-02-16 3:08 ` [PATCH v1 19/19] target/arm: Enable FEAT_LSE2 for -cpu max Richard Henderson
2023-02-23 16:37 ` Peter Maydell