qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] real ll/sc emulation
@ 2024-02-20  4:19 Nicholas Piggin
  2024-02-20  4:19 ` [RFC PATCH 1/3] accel/tcg: Fix TCG TLB coherency race with physical dirty bit clearing Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nicholas Piggin @ 2024-02-20  4:19 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

I've been toying with how we might do a more faithful ll/sc emulation.
Our cmpxchg based one actually had problems on some firmware code we're
testing.

The using the dirty memory bitmap to detect stores coming from other
CPUs and invalidating active protection / reservations seems to be a
possibility. This passes some basic atomic and locking stress tests
with mttcg, and boots Linux.

Excuse some of the hacky / ugly / unfinished bits of code, I missed
getting into details of vfio, migration, and making it clean. Just
wanted to hear thoughts on the general idea at the moment.

The code doesn't seem to be _terribly_ tricky, but there are some
tricks around the store-conditional side of it where we have to
take a mutex, do the tlb lookup with possible recursion into the
code protected by that mutex, verify the protection is still active,
and then modify memory.

There is only a single lock now, but if that beomes a problem we
*might* be able to split it via physical address hash. But that
doesn't help uncontended performance or contention on the same
address, which are probably the two most important cases.

(I will submit the TCG TLB coherency fix patch separately, difficulty
at the moment is creating a test case for it that does not require
subsequent patches!)

Thanks,
Nick

Nicholas Piggin (3):
  accel/tcg: Fix TCG TLB coherency race with physical dirty bit clearing
  tcg: add a ll/sc protection facility
  target/ppc: Implement reservation protection for larx/stcx

 include/exec/cputlb.h    |   7 ++
 include/exec/exec-all.h  |   1 -
 include/exec/ram_addr.h  |  42 ++++++-
 include/exec/ramlist.h   |  10 ++
 include/hw/core/cpu.h    |   5 +
 target/ppc/cpu-param.h   |   4 +
 target/ppc/helper.h      |   2 +
 accel/stubs/tcg-stub.c   |   4 -
 accel/tcg/cputlb.c       | 235 ++++++++++++++++++++++++++++++++++++---
 hw/core/cpu-common.c     |   5 +
 hw/vfio/common.c         |   2 +-
 hw/virtio/vhost.c        |   1 +
 system/memory.c          |   3 +
 system/physmem.c         |   7 ++
 target/ppc/cpu_init.c    |   4 +
 target/ppc/mem_helper.c  | 132 ++++++++++++++++++++++
 target/ppc/translate.c   | 128 +++++----------------
 system/memory_ldst.c.inc |   3 +-
 18 files changed, 467 insertions(+), 128 deletions(-)

-- 
2.42.0



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH 1/3] accel/tcg: Fix TCG TLB coherency race with physical dirty bit clearing
  2024-02-20  4:19 [RFC PATCH 0/3] real ll/sc emulation Nicholas Piggin
@ 2024-02-20  4:19 ` Nicholas Piggin
  2024-02-20  4:19 ` [RFC PATCH 2/3] tcg: add a ll/sc protection facility Nicholas Piggin
  2024-02-20  4:19 ` [RFC PATCH 3/3] target/ppc: Implement reservation protection for larx/stcx Nicholas Piggin
  2 siblings, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2024-02-20  4:19 UTC (permalink / raw)
  To: qemu-ppc, Thomas Huth; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

The TCG TLB keeps a TLB_NOTDIRTY bit that must be kept coherent
with the cpu physical dirty memory bitmaps. If any bits are clear,
TLB_NOTDIRTY *must* be set in all CPUs so that the nodirty_write()
slowpath is guaranteed to be called on a write access.

TLB_NOTDIRTY may be set if none of the bits are clear, it just
results in a superfluous nodirty_write() call (which should remove
the bit).

There is a race with clearing physical dirty bits and setting
TLB_NOTDIRTY vs setting or creating TLB entries without the
TLB_NOTDIRTY bit, that can cause the bit to get lost and
notdirty_write() updates to be missed.

nodirty_write()
1. cpu_physical_memory_set_dirty_range()
2. if (!cpu_physical_memory_is_clean())
3.    tlb_set_dirty()

vs

cpu_physical_memory_test_and_clear_dirty()
A. dirty = bitmap_test_and_clear_atomic()
   if (dirty)
B.     tlb_reset_dirty_range_all()

1 executes, then 2 finds no bits clean, then A clears the dirty bit
and runs B to set TLB_NOTDIRTY on the TLB entries, then 3 clears the
TLB_NOTDIRTY bits from the TLB entries.

This results in the physical dirty bitmap having a clear bit with a
TLB entry pointing to it without TLB_NOTDIRTY, which means stores
through that TLB are performed without the notdirty_write() call to
set dirty bits (or invalidate tb).

Fix this by checking the physical dirty bitmap while holding the TLB
lock that also covers TLB entry insertion / TLB_NOTDIRTY clearing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Hi Thomas,

This is the other dirty bitmap race I saw, if you are hunting
migration bugs...

Thanks,
Nick

 include/exec/exec-all.h |  1 -
 accel/stubs/tcg-stub.c  |  4 ----
 accel/tcg/cputlb.c      | 47 +++++++++++++++++++++++++++++------------
 3 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ce36bb10d4..ade81b25f0 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -654,7 +654,6 @@ static inline void mmap_unlock(void) {}
 #define WITH_MMAP_LOCK_GUARD()
 
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length);
-void tlb_set_dirty(CPUState *cpu, vaddr addr);
 
 MemoryRegionSection *
 address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c
index 8a496a2a6f..dd890d6cf6 100644
--- a/accel/stubs/tcg-stub.c
+++ b/accel/stubs/tcg-stub.c
@@ -18,10 +18,6 @@ void tb_flush(CPUState *cpu)
 {
 }
 
-void tlb_set_dirty(CPUState *cpu, vaddr vaddr)
-{
-}
-
 int probe_access_flags(CPUArchState *env, vaddr addr, int size,
                        MMUAccessType access_type, int mmu_idx,
                        bool nonfault, void **phost, uintptr_t retaddr)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 047cd2cc0a..078f4ef166 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1001,10 +1001,17 @@ static inline void copy_tlb_helper_locked(CPUTLBEntry *d, const CPUTLBEntry *s)
     *d = *s;
 }
 
-/* This is a cross vCPU call (i.e. another vCPU resetting the flags of
+/*
+ * This is a cross vCPU call (i.e. another vCPU resetting the flags of
  * the target vCPU).
  * We must take tlb_c.lock to avoid racing with another vCPU update. The only
  * thing actually updated is the target TLB entry ->addr_write flags.
+ *
+ * This can race between the physical memory dirty bits becoming all set and
+ * tlb_set_page_full or tlb_try_set_dirty creating dirty entries: it is
+ * harmless to set TLB_NOTDIRTY on a !clean physical page, it might just
+ * cause a notdirty_write() slowpath on the next write which clears out the
+ * spurious TLB_NOTDIRTY.
  */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
@@ -1037,9 +1044,11 @@ static inline void tlb_set_dirty1_locked(CPUTLBEntry *tlb_entry,
     }
 }
 
-/* update the TLB corresponding to virtual page vaddr
-   so that it is no longer dirty */
-void tlb_set_dirty(CPUState *cpu, vaddr addr)
+/*
+ * Update the TLB corresponding to virtual page vaddr if it no longer
+ * requires the TLB_NOTDIRTY bit.
+ */
+static void tlb_try_set_dirty(CPUState *cpu, vaddr addr, ram_addr_t ram_addr)
 {
     int mmu_idx;
 
@@ -1047,6 +1056,12 @@ void tlb_set_dirty(CPUState *cpu, vaddr addr)
 
     addr &= TARGET_PAGE_MASK;
     qemu_spin_lock(&cpu->neg.tlb.c.lock);
+    if (cpu_physical_memory_is_clean(ram_addr)) {
+        /* Must be checked under lock, like tlb_set_page_full */
+        qemu_spin_unlock(&cpu->neg.tlb.c.lock);
+        return;
+    }
+
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         tlb_set_dirty1_locked(tlb_entry(cpu, mmu_idx, addr), addr);
     }
@@ -1165,6 +1180,19 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
         addend = 0;
     }
 
+    /*
+     * Hold the TLB lock for the rest of the function. We could acquire/release
+     * the lock several times in the function, but it is faster to amortize the
+     * acquisition cost by acquiring it just once. Note that this leads to
+     * a longer critical section, but this is not a concern since the TLB lock
+     * is unlikely to be contended.
+     *
+     * The TLB lock must be held over checking cpu_physical_memory_is_clean
+     * and inserting the entry into the TLB, so clearing phsyical memory
+     * dirty status can find and set TLB_NOTDIRTY on the entries.
+     */
+    qemu_spin_lock(&tlb->c.lock);
+
     write_flags = read_flags;
     if (is_ram) {
         iotlb = memory_region_get_ram_addr(section->mr) + xlat;
@@ -1200,15 +1228,6 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
     index = tlb_index(cpu, mmu_idx, addr_page);
     te = tlb_entry(cpu, mmu_idx, addr_page);
 
-    /*
-     * Hold the TLB lock for the rest of the function. We could acquire/release
-     * the lock several times in the function, but it is faster to amortize the
-     * acquisition cost by acquiring it just once. Note that this leads to
-     * a longer critical section, but this is not a concern since the TLB lock
-     * is unlikely to be contended.
-     */
-    qemu_spin_lock(&tlb->c.lock);
-
     /* Note that the tlb is no longer clean.  */
     tlb->c.dirty |= 1 << mmu_idx;
 
@@ -1409,7 +1428,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
     /* We remove the notdirty callback only if the code has been flushed. */
     if (!cpu_physical_memory_is_clean(ram_addr)) {
         trace_memory_notdirty_set_dirty(mem_vaddr);
-        tlb_set_dirty(cpu, mem_vaddr);
+        tlb_try_set_dirty(cpu, mem_vaddr, ram_addr);
     }
 }
 
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 2/3] tcg: add a ll/sc protection facility
  2024-02-20  4:19 [RFC PATCH 0/3] real ll/sc emulation Nicholas Piggin
  2024-02-20  4:19 ` [RFC PATCH 1/3] accel/tcg: Fix TCG TLB coherency race with physical dirty bit clearing Nicholas Piggin
@ 2024-02-20  4:19 ` Nicholas Piggin
  2024-02-20 18:53   ` Richard Henderson
  2024-02-20  4:19 ` [RFC PATCH 3/3] target/ppc: Implement reservation protection for larx/stcx Nicholas Piggin
  2 siblings, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2024-02-20  4:19 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

This uses the cpu physical memory dirty mechanism to provide a LL/SC
protection system, so that a CPU can set llsc protection on a block of
memory, and it can check whether any other CPUs have stored to that
memory in a way that can be done race-free to perform a store
conditional on that check.

QEMU TCG implements LL/SC conditional store with an atomic cmpxchg that
only checks that the value of memory is the same as the value returned
by the LL. This works well in practice, but it does not faithfully
emulate LL/SC. Among the issues are:

1. cmpxchg incorrectly succeeds in the case that stores were performed
   by other CPUs, but the observed value is the same. Also known as the
   ABA problem.

2. Some implementations (e.g., IBM POWER) provide a protection block
   (reservation granule) size that is larger than the load. A store
   from another CPU to anywhere in the granule should cause a loss of
   protection. A conditional store from this CPU to anywhere in the
   granule could also succeed. Some IBM firmware relies on this
   implementation detail.

3. Some (e.g., ppc, arm) implementations of LL/SC work on physical
   memory. It could succeed if the store is performed at a different
   virtual address than the load, if they alias to the same physical
   memory.

4. Some implementations allow the CPU that performed the load to store
   a different value at that memory location with a simple store, and
   have the conditional-store succeed.

These can cause varying degrees of problems. 2. was observed to cause
a real problem. Although most open source code doesn't seem to have
a problem. And I think the more restrictie pseudo-LLSC encourages
software to be clean :)

The ABA problem *almost* has an interesting memory ordering problem
vs Linux spinlocks:

lock:
   larx
   cmp
   bne   busy
   stcx.
   bne-  lock
   lwsync

The lwsync barrier permits loads to be performed ahead of earleir stores
so a load in the critical section can see a result before the stcx. store
to set the lock is visible to other CPUs. This works because lwsync does
prevent those loads being performed before the earlier larx, so they must
be performed with a valid reservation that results in the lock being taken.

If another CPU could come between the larx and stcx. and take the lock,
perform a critical section, and then release the lock, then if the stcx.
succeeds, the lwsync would not prevent loads in the critical section
being performed before it. The saving grace here is that a cmpxchg-based
stcx. itself has to be implemented with a larx on the host, so it does
end up ordering correctly. Maybe that is inherent in any similar kind of
issue and so it's not really very interesting, but I thought it showed
possible subtle issues (one of our firmware engineers spotted it).
---
 include/exec/cputlb.h    |   7 ++
 include/exec/ram_addr.h  |  42 ++++++++-
 include/exec/ramlist.h   |  10 +++
 include/hw/core/cpu.h    |   5 ++
 accel/tcg/cputlb.c       | 188 ++++++++++++++++++++++++++++++++++++++-
 hw/core/cpu-common.c     |   5 ++
 hw/vfio/common.c         |   2 +-
 hw/virtio/vhost.c        |   1 +
 system/memory.c          |   3 +
 system/physmem.c         |   7 ++
 system/memory_ldst.c.inc |   3 +-
 11 files changed, 266 insertions(+), 7 deletions(-)

diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
index 6da1462c4f..8c590b34a7 100644
--- a/include/exec/cputlb.h
+++ b/include/exec/cputlb.h
@@ -27,4 +27,11 @@
 void tlb_protect_code(ram_addr_t ram_addr);
 void tlb_unprotect_code(ram_addr_t ram_addr);
 #endif
+
+void cpu_set_llsc_prot(CPUState *cpu, ram_addr_t addr);
+bool cpu_resolve_llsc_begin(CPUState *cpu);
+void cpu_resolve_llsc_abort(CPUState *cpu);
+bool cpu_resolve_llsc_check(CPUState *cpu, ram_addr_t addr);
+void cpu_resolve_llsc_success(CPUState *cpu);
+
 #endif
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index dadb2deb11..4776921c49 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -147,7 +147,13 @@ static inline void qemu_ram_block_writeback(RAMBlock *block)
 }
 
 #define DIRTY_CLIENTS_ALL     ((1 << DIRTY_MEMORY_NUM) - 1)
-#define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
+
+/*
+ * Migration and VGA just want to see the dirty bit set, hence "oneshot".
+ * As opposed to code and llsc_prot which want to be called on every store.
+ */
+#define DIRTY_CLIENTS_ONESHOT ((1 << DIRTY_MEMORY_VGA) |        \
+                               (1 << DIRTY_MEMORY_MIGRATION))
 
 static inline bool cpu_physical_memory_get_dirty(ram_addr_t start,
                                                  ram_addr_t length,
@@ -240,7 +246,13 @@ static inline bool cpu_physical_memory_is_clean(ram_addr_t addr)
     bool code = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_CODE);
     bool migration =
         cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_MIGRATION);
-    return !(vga && code && migration);
+    bool llsc_prot =
+#ifdef TARGET_HAS_LLSC_PROT
+        cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_LLSC_PROT);
+#else
+        false;
+#endif
+    return !(vga && code && migration && llsc_prot);
 }
 
 static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start,
@@ -261,6 +273,12 @@ static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start,
         !cpu_physical_memory_all_dirty(start, length, DIRTY_MEMORY_MIGRATION)) {
         ret |= (1 << DIRTY_MEMORY_MIGRATION);
     }
+#ifdef TARGET_HAS_LLSC_PROT
+    if (mask & (1 << DIRTY_MEMORY_LLSC_PROT) &&
+        !cpu_physical_memory_all_dirty(start, length, DIRTY_MEMORY_LLSC_PROT)) {
+        ret |= (1 << DIRTY_MEMORY_LLSC_PROT);
+    }
+#endif
     return ret;
 }
 
@@ -322,6 +340,12 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
                 bitmap_set_atomic(blocks[DIRTY_MEMORY_CODE]->blocks[idx],
                                   offset, next - page);
             }
+#ifdef TARGET_HAS_LLSC_PROT
+            if (unlikely(mask & (1 << DIRTY_MEMORY_LLSC_PROT))) {
+                bitmap_set_atomic(blocks[DIRTY_MEMORY_LLSC_PROT]->blocks[idx],
+                                  offset, next - page);
+            }
+#endif
 
             page = next;
             idx++;
@@ -355,6 +379,8 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
     unsigned long hpratio = qemu_real_host_page_size() / TARGET_PAGE_SIZE;
     unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
 
+    assert(0);
+
     /* start address is aligned at the start of a word? */
     if ((((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) &&
         (hpratio == 1)) {
@@ -396,6 +422,12 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
                     if (tcg_enabled()) {
                         qatomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset],
                                    temp);
+#ifdef TARGET_HAS_LLSC_PROT
+			/* XXX? */
+                        qatomic_or(&blocks[DIRTY_MEMORY_LLSC_PROT][idx][offset],
+                                   temp);
+#endif
+			assert(0);
                     }
                 }
 
@@ -408,7 +440,8 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 
         xen_hvm_modified_memory(start, pages << TARGET_PAGE_BITS);
     } else {
-        uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE;
+        uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL :
+                                          DIRTY_CLIENTS_ONESHOT;
 
         if (!global_dirty_tracking) {
             clients &= ~(1 << DIRTY_MEMORY_MIGRATION);
@@ -470,6 +503,9 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
     cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_MIGRATION);
     cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_VGA);
     cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_CODE);
+#ifdef TARGET_HAS_LLSC_PROT
+    cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_LLSC_PROT);
+#endif
 }
 
 
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 2ad2a81acc..e6523bf25c 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -8,10 +8,20 @@
 
 typedef struct RAMBlockNotifier RAMBlockNotifier;
 
+/* XXX: hack, need to work out how to do this */
+#ifndef TARGET_HAS_LLSC_PROT
+#define TARGET_HAS_LLSC_PROT
+#endif
+
 #define DIRTY_MEMORY_VGA       0
 #define DIRTY_MEMORY_CODE      1
 #define DIRTY_MEMORY_MIGRATION 2
+#ifdef TARGET_HAS_LLSC_PROT
+#define DIRTY_MEMORY_LLSC_PROT 3
+#define DIRTY_MEMORY_NUM       4        /* num of dirty bits */
+#else
 #define DIRTY_MEMORY_NUM       3        /* num of dirty bits */
+#endif
 
 /* The dirty memory bitmap is split into fixed-size blocks to allow growth
  * under RCU.  The bitmap for a block can be accessed as follows:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 4385ce54c9..f1da94771e 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -489,6 +489,11 @@ struct CPUState {
     uint64_t random_seed;
     sigjmp_buf jmp_env;
 
+    int llsc_prot_block_size;
+    bool llsc_prot_active;
+    bool llsc_resolving;
+    hwaddr llsc_prot_address; /* ram_addr_t physical address of reservation */
+
     QemuMutex work_mutex;
     QSIMPLEQ_HEAD(, qemu_work_item) work_list;
 
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 078f4ef166..d27053cadc 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -274,11 +274,18 @@ static inline void tlb_n_used_entries_dec(CPUState *cpu, uintptr_t mmu_idx)
     cpu->neg.tlb.d[mmu_idx].n_used_entries--;
 }
 
+#ifdef TARGET_HAS_LLSC_PROT
+static QemuMutex llsc_prot_mutex;
+#endif
+
 void tlb_init(CPUState *cpu)
 {
     int64_t now = get_clock_realtime();
     int i;
 
+#ifdef TARGET_HAS_LLSC_PROT
+    qemu_mutex_init(&llsc_prot_mutex);
+#endif
     qemu_spin_init(&cpu->neg.tlb.c.lock);
 
     /* All tlbs are initialized flushed. */
@@ -1408,6 +1415,176 @@ static bool victim_tlb_hit(CPUState *cpu, size_t mmu_idx, size_t index,
     return false;
 }
 
+#ifdef TARGET_HAS_LLSC_PROT
+/*
+ * Remove a particular address from being watched by LLSC if it is
+ * no longer actively protected by any CPU.
+ */
+static void cpu_teardown_llsc_prot(ram_addr_t addr)
+{
+    CPUState *cpu;
+
+    addr &= TARGET_PAGE_MASK;
+
+    CPU_FOREACH(cpu) {
+        if ((cpu->llsc_prot_address & TARGET_PAGE_MASK) == addr) {
+            if (cpu->llsc_prot_active) {
+                return;
+            }
+            cpu->llsc_prot_address = -1ULL;
+        }
+    }
+    cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_LLSC_PROT);
+}
+
+/*
+ * Start protection of an address. cpu_resolve_llsc_begin() and
+ * cpu_resolve_llsc_check() will return true so long as another
+ * actor has not modified the line.
+ */
+void cpu_set_llsc_prot(CPUState *cpu, ram_addr_t addr)
+{
+    ram_addr_t old;
+
+    addr &= ~(cpu->llsc_prot_block_size - 1);
+
+    qemu_mutex_lock(&llsc_prot_mutex);
+    old = cpu->llsc_prot_address;
+    if ((addr & TARGET_PAGE_MASK) == (old & TARGET_PAGE_MASK)) {
+        old = -1ULL;
+        goto out;
+    }
+    cpu_physical_memory_test_and_clear_dirty(addr & TARGET_PAGE_MASK,
+                                             TARGET_PAGE_SIZE,
+                                             DIRTY_MEMORY_LLSC_PROT);
+out:
+    cpu->llsc_prot_address = addr;
+    cpu->llsc_prot_active = true;
+    if (old != -1ULL) {
+        cpu_teardown_llsc_prot(old);
+    }
+
+    qemu_mutex_unlock(&llsc_prot_mutex);
+}
+
+/*
+ * The store for store-conditional must be performed under the llsc_prot_mutex,
+ * but it must not go ahead if protection has been lost. The point of resolving
+ * conflicts happens at TLB probe time, so this can be achieved by taking the
+ * lock here, then checking our protection has not been invalidated and probing
+ * the TLB, then performing the store.
+ *
+ * The TLB probe must be non-faulting (to avoid problems with lock recursion).
+ * If the non-faulting probe fails then cpu_resolve_llsc_abort() must be called
+ * (and either perform a full probe and then retry, or perhaps could just fail
+ * the store-conditional).
+ *
+ * The TLB probe while holding the mutex may have to check and invalidate the
+ * protection on other CPUs and therefore it must hold the lock. If
+ * llsc_resolving is true then the lock is held and not acquired again.
+ */
+bool cpu_resolve_llsc_begin(CPUState *cpu)
+{
+    if (!cpu->llsc_prot_active) {
+        return false;
+    }
+    qemu_mutex_lock(&llsc_prot_mutex);
+    if (!cpu->llsc_prot_active) {
+        qemu_mutex_unlock(&llsc_prot_mutex);
+        return false;
+    }
+
+    g_assert(!cpu->llsc_resolving);
+    cpu->llsc_resolving = true;
+
+    return true;
+}
+
+void cpu_resolve_llsc_abort(CPUState *cpu)
+{
+    cpu->llsc_resolving = false;
+    qemu_mutex_unlock(&llsc_prot_mutex);
+}
+
+bool cpu_resolve_llsc_check(CPUState *cpu, ram_addr_t addr)
+{
+    g_assert(cpu->llsc_resolving);
+    g_assert(cpu->llsc_prot_active);
+
+    addr &= ~(cpu->llsc_prot_block_size - 1);
+    if (cpu->llsc_prot_address != addr) {
+        cpu->llsc_prot_active = false;
+        cpu->llsc_resolving = false;
+        qemu_mutex_unlock(&llsc_prot_mutex);
+        return false;
+    }
+
+    return true;
+}
+
+void cpu_resolve_llsc_success(CPUState *cpu)
+{
+    ram_addr_t addr;
+
+    addr = cpu->llsc_prot_address;
+    g_assert(addr != -1ULL);
+    assert(cpu->llsc_prot_active);
+    cpu->llsc_prot_active = false;
+
+    /* Leave the llsc_prot_address under active watch, for performance */
+//    cpu->llsc_prot_address = -1ULL;
+//    cpu_teardown_llsc_prot(addr);
+    g_assert(cpu->llsc_resolving);
+    cpu->llsc_resolving = false;
+
+    qemu_mutex_unlock(&llsc_prot_mutex);
+}
+
+static void other_cpus_clear_llsc_prot(CPUState *cpu, ram_addr_t addr,
+                                              unsigned size)
+{
+    CPUState *c;
+    ram_addr_t end;
+    bool teardown = false;
+
+    end = (addr + size - 1) & ~(cpu->llsc_prot_block_size - 1);
+    addr &= ~(cpu->llsc_prot_block_size - 1);
+
+    if (!cpu->llsc_resolving) {
+        qemu_mutex_lock(&llsc_prot_mutex);
+    }
+
+    CPU_FOREACH(c) {
+        ram_addr_t a = c->llsc_prot_address;
+
+        if (c == cpu) {
+            continue;
+        }
+        if (a == -1ULL) {
+            assert(!c->llsc_prot_active);
+            continue;
+        }
+	if (a == addr || a == end) {
+            if (c->llsc_prot_active) {
+                c->llsc_prot_active = false;
+            } else {
+                teardown = true;
+            }
+        }
+    }
+    if (teardown) {
+        cpu_teardown_llsc_prot(addr);
+        if (end != addr) {
+            cpu_teardown_llsc_prot(end);
+        }
+    }
+
+    if (!cpu->llsc_resolving) {
+        qemu_mutex_unlock(&llsc_prot_mutex);
+    }
+}
+#endif
+
 static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
                            CPUTLBEntryFull *full, uintptr_t retaddr)
 {
@@ -1419,11 +1596,18 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
         tb_invalidate_phys_range_fast(ram_addr, size, retaddr);
     }
 
+#ifdef TARGET_HAS_LLSC_PROT
+    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_LLSC_PROT)) {
+        other_cpus_clear_llsc_prot(cpu, ram_addr, size);
+    }
+#endif
+
     /*
      * Set both VGA and migration bits for simplicity and to remove
-     * the notdirty callback faster.
+     * the notdirty callback faster. Code and llsc_prot don't get set
+     * because we always want callbacks for them.
      */
-    cpu_physical_memory_set_dirty_range(ram_addr, size, DIRTY_CLIENTS_NOCODE);
+    cpu_physical_memory_set_dirty_range(ram_addr, size, DIRTY_CLIENTS_ONESHOT);
 
     /* We remove the notdirty callback only if the code has been flushed. */
     if (!cpu_physical_memory_is_clean(ram_addr)) {
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 67db07741d..edfbffb705 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -250,6 +250,11 @@ static void cpu_common_initfn(Object *obj)
     cpu->nr_cores = 1;
     cpu->nr_threads = 1;
     cpu->cflags_next_tb = -1;
+#ifdef TARGET_HAS_LLSC_PROT
+    cpu->llsc_prot_active = false;
+    cpu->llsc_prot_address = -1ULL;
+    cpu->llsc_resolving = false;
+#endif
 
     qemu_mutex_init(&cpu->work_mutex);
     qemu_lockcnt_init(&cpu->in_ioctl_lock);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 059bfdc07a..f50ff712cf 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1174,7 +1174,7 @@ int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
     if (!bcontainer->dirty_pages_supported && !all_device_dirty_tracking) {
         cpu_physical_memory_set_dirty_range(ram_addr, size,
                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
-                                            DIRTY_CLIENTS_NOCODE);
+                                            DIRTY_CLIENTS_ONESHOT);
         return 0;
     }
 
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2c9ac79468..a57b153505 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -520,6 +520,7 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
          */
         handled_dirty = (1 << DIRTY_MEMORY_MIGRATION) |
             (1 << DIRTY_MEMORY_CODE);
+        /* XXX: llsc? */
 
         if (dirty_mask & ~handled_dirty) {
             trace_vhost_reject_section(mr->name, 1);
diff --git a/system/memory.c b/system/memory.c
index a229a79988..b2bac2d3b5 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1863,6 +1863,9 @@ uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
     if (tcg_enabled() && rb) {
         /* TCG only cares about dirty memory logging for RAM, not IOMMU.  */
         mask |= (1 << DIRTY_MEMORY_CODE);
+#ifdef TARGET_HAS_LLSC_PROT
+        mask |= (1 << DIRTY_MEMORY_LLSC_PROT);
+#endif
     }
     return mask;
 }
diff --git a/system/physmem.c b/system/physmem.c
index dc0d8b16aa..aac227f0d2 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2591,6 +2591,13 @@ static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr,
         dirty_log_mask =
             cpu_physical_memory_range_includes_clean(addr, length, dirty_log_mask);
     }
+#ifdef TARGET_HAS_LLSC_PROT
+    if (dirty_log_mask & (1 << DIRTY_MEMORY_LLSC_PROT)) {
+        assert(tcg_enabled());
+        /* XXX should invalidate CPU's llsc_prot protections here? */
+        dirty_log_mask &= ~(1 << DIRTY_MEMORY_LLSC_PROT);
+    }
+#endif
     if (dirty_log_mask & (1 << DIRTY_MEMORY_CODE)) {
         assert(tcg_enabled());
         tb_invalidate_phys_range(addr, addr + length - 1);
diff --git a/system/memory_ldst.c.inc b/system/memory_ldst.c.inc
index 0e6f3940a9..c9730de37b 100644
--- a/system/memory_ldst.c.inc
+++ b/system/memory_ldst.c.inc
@@ -286,7 +286,8 @@ void glue(address_space_stl_notdirty, SUFFIX)(ARG1_DECL,
         stl_p(ptr, val);
 
         dirty_log_mask = memory_region_get_dirty_log_mask(mr);
-        dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
+//        dirty_log_mask &= ~((1 << DIRTY_MEMORY_CODE) | (1 << DIRTY_MEMORY_LLSC_PROT));
+        dirty_log_mask &= DIRTY_CLIENTS_ONESHOT;
         cpu_physical_memory_set_dirty_range(memory_region_get_ram_addr(mr) + addr,
                                             4, dirty_log_mask);
         r = MEMTX_OK;
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 3/3] target/ppc: Implement reservation protection for larx/stcx
  2024-02-20  4:19 [RFC PATCH 0/3] real ll/sc emulation Nicholas Piggin
  2024-02-20  4:19 ` [RFC PATCH 1/3] accel/tcg: Fix TCG TLB coherency race with physical dirty bit clearing Nicholas Piggin
  2024-02-20  4:19 ` [RFC PATCH 2/3] tcg: add a ll/sc protection facility Nicholas Piggin
@ 2024-02-20  4:19 ` Nicholas Piggin
  2024-02-20 19:03   ` Richard Henderson
  2 siblings, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2024-02-20  4:19 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

Use the generic llsc protection feature to implement real reservation
protection for larx/stcx.

This is more complicated and quite a bit slower than the cmpxchg
pseudo-reservation, so it's questionable whether it should be merged
or ever made the default.

It could possibly be sped up more by tuning heuristics for managing
the dirty bitmap -- being a bit more lazy about that could reduce
the heavyweight bitmap cleaning and dirtying operations.

Not sure if there are other better approaches to this. One thing I
have tried to implement the reservation-granule (i.e., stcx. to
succeed for an address that differs from the one that established
the reservation) is to first fail the stcx. if it detects an
address mismatch, but then set a flag to say try the mismatched
mode next time. Next larx encountered will load and save memory
contents for the entire reservation granule, then stcx. verifies
the granule. This works to some degree with better performance
on code that doesn't behave this way, but it still misses things,
not only ABA but also stores that modify values not subject to the
stcx. on the first pass in case that stcx. address *does* match
larx. So I consider that even more of a hack and not really suitable
for upstream.

At the very least, this seems to have flushed out another bug in
the cpu memory dirty bitmap code, since it hammers it much harder
than typical users and simple test cases can be constructed that
make failures obvious (lost updates, lost mutual exclusion, etc).
so it hasn't been all for nothing even if it is a bad design :)
---
 target/ppc/cpu-param.h  |   4 ++
 target/ppc/helper.h     |   2 +
 target/ppc/cpu_init.c   |   4 ++
 target/ppc/mem_helper.c | 132 ++++++++++++++++++++++++++++++++++++++++
 target/ppc/translate.c  | 128 ++++++++------------------------------
 5 files changed, 168 insertions(+), 102 deletions(-)

diff --git a/target/ppc/cpu-param.h b/target/ppc/cpu-param.h
index 0a0416e0a8..355b4e2fdd 100644
--- a/target/ppc/cpu-param.h
+++ b/target/ppc/cpu-param.h
@@ -8,6 +8,10 @@
 #ifndef PPC_CPU_PARAM_H
 #define PPC_CPU_PARAM_H
 
+#ifndef TARGET_HAS_LLSC_PROT /* XXX: hack */
+#define TARGET_HAS_LLSC_PROT 1
+#endif
+
 #ifdef TARGET_PPC64
 # define TARGET_LONG_BITS 64
 /*
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 86f97ee1e7..e996b9f530 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -818,4 +818,6 @@ DEF_HELPER_4(DSCLI, void, env, fprp, fprp, i32)
 DEF_HELPER_4(DSCLIQ, void, env, fprp, fprp, i32)
 
 DEF_HELPER_1(tbegin, void, env)
+DEF_HELPER_4(larx, void, env, tl, tl, tl)
+DEF_HELPER_4(stcx, void, env, tl, tl, tl)
 DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 9931372a08..5fc14830ab 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6831,6 +6831,10 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
         goto unrealize;
     }
     init_ppc_proc(cpu);
+    if (tcg_enabled()) {
+        cs->llsc_prot_block_size = env->dcache_line_size;
+        printf("Reservation granule size %d\n", cs->llsc_prot_block_size);
+    }
 
     ppc_gdb_init(cs, pcc);
     qemu_init_vcpu(cs);
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index ea7e8443a8..fe2c7576c8 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -24,11 +24,21 @@
 #include "exec/helper-proto.h"
 #include "helper_regs.h"
 #include "exec/cpu_ldst.h"
+#include "exec/cputlb.h"
 #include "internal.h"
 #include "qemu/atomic128.h"
 
 /* #define DEBUG_OP */
 
+static inline int DEF_MEMOP(const CPUPPCState *env, int op)
+{
+    if (FIELD_EX64(env->msr, MSR, LE)) {
+        return op | MO_LE;
+    } else {
+        return op | MO_BE;
+    }
+}
+
 static inline bool needs_byteswap(const CPUPPCState *env)
 {
 #if TARGET_BIG_ENDIAN
@@ -528,3 +538,125 @@ void helper_tbegin(CPUPPCState *env)
     env->spr[SPR_TFHAR] = env->nip + 4;
     env->crf[0] = 0xB; /* 0b1010 = transaction failure */
 }
+
+void helper_larx(CPUPPCState *env, target_ulong addr, target_ulong size,
+                 target_ulong reg)
+{
+    CPUState *cs = env_cpu(env);
+    uintptr_t raddr = GETPC();
+    int mmu_idx = cpu_mmu_index(cs, false);
+    uint64_t val;
+    void *host;
+
+    if (addr & (size - 1)) {
+	ppc_cpu_do_unaligned_access(cs, addr, MMU_DATA_LOAD, mmu_idx, raddr);
+    }
+
+    env->access_type = ACCESS_RES;
+    host = probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, raddr);
+    if (host) {
+        cpu_set_llsc_prot(cs, qemu_ram_addr_from_host_nofail(host));
+    } else {
+        /* XXX: fault? */
+        g_assert_not_reached();
+    }
+
+    if (unlikely(size == 16)) {
+        Int128 val16;
+        val16 = cpu_ld16_mmu(env, addr,
+                     make_memop_idx(DEF_MEMOP(env, MO_128 | MO_ALIGN), mmu_idx),
+                     raddr);
+        env->gpr[reg] = int128_gethi(val16);
+        env->gpr[reg + 1] = int128_getlo(val16);
+        return;
+    }
+
+    switch (size) {
+    case 1:
+        val = ldub_p(host);
+        break;
+    case 2:
+        val = FIELD_EX64(env->msr, MSR, LE) ? lduw_le_p(host) : lduw_be_p(host);
+        break;
+    case 4:
+        val = FIELD_EX64(env->msr, MSR, LE) ? ldl_le_p(host) : ldl_be_p(host);
+        break;
+    case 8:
+        val = FIELD_EX64(env->msr, MSR, LE) ? ldq_le_p(host) : ldq_be_p(host);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    env->gpr[reg] = val;
+}
+
+void helper_stcx(CPUPPCState *env, target_ulong addr, target_ulong size,
+                 target_ulong reg)
+{
+    CPUState *cs = env_cpu(env);
+    uintptr_t raddr = GETPC();
+    int mmu_idx = cpu_mmu_index(cs, false);
+    uint64_t val;
+    void *host;
+    CPUTLBEntryFull *full;
+    int flags;
+
+    if (addr & (size - 1)) {
+	ppc_cpu_do_unaligned_access(cs, addr, MMU_DATA_STORE, mmu_idx, raddr);
+    }
+
+    env->access_type = ACCESS_RES;
+
+again:
+    if (!cpu_resolve_llsc_begin(cs)) {
+        goto fail;
+    }
+
+    flags = probe_access_full(env, addr, size, MMU_DATA_STORE,
+                              mmu_idx, true, &host, &full, raddr);
+    if (unlikely(flags & TLB_INVALID_MASK)) {
+        cpu_resolve_llsc_abort(cs);
+        host = probe_access(env, addr, size, MMU_DATA_STORE, mmu_idx, raddr);
+        g_assert(host);
+        goto again;
+    }
+
+    if (!cpu_resolve_llsc_check(cs, qemu_ram_addr_from_host_nofail(host))) {
+        goto fail;
+    }
+
+    if (unlikely(size == 16)) {
+        Int128 val16 = int128_make128(env->gpr[reg + 1], env->gpr[reg]);
+        cpu_st16_mmu(env, addr, val16,
+                    make_memop_idx(DEF_MEMOP(env, MO_128 | MO_ALIGN), mmu_idx),
+                    raddr);
+	goto success;
+    }
+
+    val = env->gpr[reg];
+    switch (size) {
+    case 1:
+        stb_p(host, val);
+        break;
+    case 2:
+        FIELD_EX64(env->msr, MSR, LE) ? stw_le_p(host, val) : stw_be_p(host, val);
+        break;
+    case 4:
+        FIELD_EX64(env->msr, MSR, LE) ? stl_le_p(host, val) : stl_be_p(host, val);
+        break;
+    case 8:
+        FIELD_EX64(env->msr, MSR, LE) ? stq_le_p(host, val) : stq_be_p(host, val);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+success:
+    cpu_resolve_llsc_success(cs);
+
+    env->crf[0] = xer_so | CRF_EQ; /* stcx pass */
+    return;
+
+fail:
+    env->crf[0] = xer_so; /* stcx fail */
+}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 049f636927..a2d002eb89 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3570,31 +3570,27 @@ static void gen_isync(DisasContext *ctx)
     ctx->base.is_jmp = DISAS_EXIT_UPDATE;
 }
 
-#define MEMOP_GET_SIZE(x)  (1 << ((x) & MO_SIZE))
-
-static void gen_load_locked(DisasContext *ctx, MemOp memop)
+static void gen_load_locked(DisasContext *ctx, int size)
 {
-    TCGv gpr = cpu_gpr[rD(ctx->opcode)];
-    TCGv t0 = tcg_temp_new();
+    int rd = rD(ctx->opcode);
+    TCGv EA = tcg_temp_new();
 
-    gen_set_access_type(ctx, ACCESS_RES);
-    gen_addr_reg_index(ctx, t0);
-    tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop | MO_ALIGN);
-    tcg_gen_mov_tl(cpu_reserve, t0);
-    tcg_gen_movi_tl(cpu_reserve_length, memop_size(memop));
-    tcg_gen_mov_tl(cpu_reserve_val, gpr);
+    gen_addr_reg_index(ctx, EA);
+    gen_helper_larx(tcg_env, EA, tcg_constant_tl(size), tcg_constant_tl(rd));
 }
 
-#define LARX(name, memop)                  \
+#define LARX(name, size)                   \
 static void gen_##name(DisasContext *ctx)  \
 {                                          \
-    gen_load_locked(ctx, memop);           \
+    gen_load_locked(ctx, size);            \
 }
 
 /* lwarx */
-LARX(lbarx, DEF_MEMOP(MO_UB))
-LARX(lharx, DEF_MEMOP(MO_UW))
-LARX(lwarx, DEF_MEMOP(MO_UL))
+LARX(lbarx, 1)
+LARX(lharx, 2)
+LARX(lwarx, 4)
+
+#define MEMOP_GET_SIZE(x)  (1 << ((x) & MO_SIZE))
 
 static void gen_fetch_inc_conditional(DisasContext *ctx, MemOp memop,
                                       TCGv EA, TCGCond cond, int addend)
@@ -3802,59 +3798,36 @@ static void gen_stdat(DisasContext *ctx)
 }
 #endif
 
-static void gen_conditional_store(DisasContext *ctx, MemOp memop)
+static void gen_conditional_store(DisasContext *ctx, int size)
 {
-    TCGLabel *lfail;
-    TCGv EA;
-    TCGv cr0;
-    TCGv t0;
     int rs = rS(ctx->opcode);
+    TCGv EA = tcg_temp_new();
 
-    lfail = gen_new_label();
-    EA = tcg_temp_new();
-    cr0 = tcg_temp_new();
-    t0 = tcg_temp_new();
-
-    tcg_gen_mov_tl(cr0, cpu_so);
-    gen_set_access_type(ctx, ACCESS_RES);
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, lfail);
-    tcg_gen_brcondi_tl(TCG_COND_NE, cpu_reserve_length, memop_size(memop), lfail);
-
-    tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val,
-                              cpu_gpr[rs], ctx->mem_idx,
-                              DEF_MEMOP(memop) | MO_ALIGN);
-    tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
-    tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT);
-    tcg_gen_or_tl(cr0, cr0, t0);
-
-    gen_set_label(lfail);
-    tcg_gen_trunc_tl_i32(cpu_crf[0], cr0);
-    tcg_gen_movi_tl(cpu_reserve, -1);
+    gen_helper_stcx(tcg_env, EA, tcg_constant_tl(size), tcg_constant_tl(rs));
 }
 
-#define STCX(name, memop)                  \
+#define STCX(name, size)                   \
 static void gen_##name(DisasContext *ctx)  \
 {                                          \
-    gen_conditional_store(ctx, memop);     \
+    gen_conditional_store(ctx, size);      \
 }
 
-STCX(stbcx_, DEF_MEMOP(MO_UB))
-STCX(sthcx_, DEF_MEMOP(MO_UW))
-STCX(stwcx_, DEF_MEMOP(MO_UL))
+STCX(stbcx_, 1)
+STCX(sthcx_, 2)
+STCX(stwcx_, 4)
 
 #if defined(TARGET_PPC64)
 /* ldarx */
-LARX(ldarx, DEF_MEMOP(MO_UQ))
+LARX(ldarx, 8)
 /* stdcx. */
-STCX(stdcx_, DEF_MEMOP(MO_UQ))
+STCX(stdcx_, 8)
 
 /* lqarx */
 static void gen_lqarx(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
-    TCGv EA, hi, lo;
-    TCGv_i128 t16;
+    TCGv EA;
 
     if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) ||
                  (rd == rB(ctx->opcode)))) {
@@ -3862,74 +3835,25 @@ static void gen_lqarx(DisasContext *ctx)
         return;
     }
 
-    gen_set_access_type(ctx, ACCESS_RES);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-
-    /* Note that the low part is always in RD+1, even in LE mode.  */
-    lo = cpu_gpr[rd + 1];
-    hi = cpu_gpr[rd];
-
-    t16 = tcg_temp_new_i128();
-    tcg_gen_qemu_ld_i128(t16, EA, ctx->mem_idx, DEF_MEMOP(MO_128 | MO_ALIGN));
-    tcg_gen_extr_i128_i64(lo, hi, t16);
-
-    tcg_gen_mov_tl(cpu_reserve, EA);
-    tcg_gen_movi_tl(cpu_reserve_length, 16);
-    tcg_gen_st_tl(hi, tcg_env, offsetof(CPUPPCState, reserve_val));
-    tcg_gen_st_tl(lo, tcg_env, offsetof(CPUPPCState, reserve_val2));
+    gen_helper_larx(tcg_env, EA, tcg_constant_tl(16), tcg_constant_tl(rd));
 }
 
 /* stqcx. */
 static void gen_stqcx_(DisasContext *ctx)
 {
-    TCGLabel *lfail;
-    TCGv EA, t0, t1;
-    TCGv cr0;
-    TCGv_i128 cmp, val;
     int rs = rS(ctx->opcode);
+    TCGv EA;
 
     if (unlikely(rs & 1)) {
         gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
         return;
     }
 
-    lfail = gen_new_label();
     EA = tcg_temp_new();
-    cr0 = tcg_temp_new();
-
-    tcg_gen_mov_tl(cr0, cpu_so);
-    gen_set_access_type(ctx, ACCESS_RES);
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, lfail);
-    tcg_gen_brcondi_tl(TCG_COND_NE, cpu_reserve_length, 16, lfail);
-
-    cmp = tcg_temp_new_i128();
-    val = tcg_temp_new_i128();
-
-    tcg_gen_concat_i64_i128(cmp, cpu_reserve_val2, cpu_reserve_val);
-
-    /* Note that the low part is always in RS+1, even in LE mode.  */
-    tcg_gen_concat_i64_i128(val, cpu_gpr[rs + 1], cpu_gpr[rs]);
-
-    tcg_gen_atomic_cmpxchg_i128(val, cpu_reserve, cmp, val, ctx->mem_idx,
-                                DEF_MEMOP(MO_128 | MO_ALIGN));
-
-    t0 = tcg_temp_new();
-    t1 = tcg_temp_new();
-    tcg_gen_extr_i128_i64(t1, t0, val);
-
-    tcg_gen_xor_tl(t1, t1, cpu_reserve_val2);
-    tcg_gen_xor_tl(t0, t0, cpu_reserve_val);
-    tcg_gen_or_tl(t0, t0, t1);
-
-    tcg_gen_setcondi_tl(TCG_COND_EQ, t0, t0, 0);
-    tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT);
-    tcg_gen_or_tl(cr0, cr0, t0);
-
-    gen_set_label(lfail);
-    tcg_gen_trunc_tl_i32(cpu_crf[0], cr0);
-    tcg_gen_movi_tl(cpu_reserve, -1);
+    gen_helper_stcx(tcg_env, EA, tcg_constant_tl(16), tcg_constant_tl(rs));
 }
 #endif /* defined(TARGET_PPC64) */
 
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 2/3] tcg: add a ll/sc protection facility
  2024-02-20  4:19 ` [RFC PATCH 2/3] tcg: add a ll/sc protection facility Nicholas Piggin
@ 2024-02-20 18:53   ` Richard Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2024-02-20 18:53 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 2/19/24 18:19, Nicholas Piggin wrote:
> +    bool llsc_prot =
> +#ifdef TARGET_HAS_LLSC_PROT
> +        cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_LLSC_PROT);
> +#else
> +        false;
> +#endif

We're trying to get rid of all target-specific adjustments to TCG.
We are not keen to introduce another.
Just drop the ifdefs.

> @@ -355,6 +379,8 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
>       unsigned long hpratio = qemu_real_host_page_size() / TARGET_PAGE_SIZE;
>       unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
>   
> +    assert(0);

Left over debugging?

> +
>       /* start address is aligned at the start of a word? */
>       if ((((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) &&
>           (hpratio == 1)) {
> @@ -396,6 +422,12 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
>                       if (tcg_enabled()) {
>                           qatomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset],
>                                      temp);
> +#ifdef TARGET_HAS_LLSC_PROT
> +			/* XXX? */
> +                        qatomic_or(&blocks[DIRTY_MEMORY_LLSC_PROT][idx][offset],
> +                                   temp);
> +#endif
> +			assert(0);

Again.

> +++ b/include/hw/core/cpu.h
> @@ -489,6 +489,11 @@ struct CPUState {
>       uint64_t random_seed;
>       sigjmp_buf jmp_env;
>   
> +    int llsc_prot_block_size;
> +    bool llsc_prot_active;

Is active identical with block_size != 0.

> +    bool llsc_resolving;

I'm not following the logic around resolving and locks...


r~


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 3/3] target/ppc: Implement reservation protection for larx/stcx
  2024-02-20  4:19 ` [RFC PATCH 3/3] target/ppc: Implement reservation protection for larx/stcx Nicholas Piggin
@ 2024-02-20 19:03   ` Richard Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2024-02-20 19:03 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 2/19/24 18:19, Nicholas Piggin wrote:
> +    env->access_type = ACCESS_RES;
> +    host = probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, raddr);
> +    if (host) {
> +        cpu_set_llsc_prot(cs, qemu_ram_addr_from_host_nofail(host));
> +    } else {
> +        /* XXX: fault? */
> +        g_assert_not_reached();
> +    }

probe_access will not return a host address for lots of reasons, including watchpoints, 
plugins, etc.

> +
> +    if (unlikely(size == 16)) {
> +        Int128 val16;
> +        val16 = cpu_ld16_mmu(env, addr,
> +                     make_memop_idx(DEF_MEMOP(env, MO_128 | MO_ALIGN), mmu_idx),
> +                     raddr);
> +        env->gpr[reg] = int128_gethi(val16);
> +        env->gpr[reg + 1] = int128_getlo(val16);
> +        return;
> +    }
> +
> +    switch (size) {
> +    case 1:
> +        val = ldub_p(host);
> +        break;
> +    case 2:
> +        val = FIELD_EX64(env->msr, MSR, LE) ? lduw_le_p(host) : lduw_be_p(host);
> +        break;
> +    case 4:
> +        val = FIELD_EX64(env->msr, MSR, LE) ? ldl_le_p(host) : ldl_be_p(host);
> +        break;
> +    case 8:
> +        val = FIELD_EX64(env->msr, MSR, LE) ? ldq_le_p(host) : ldq_be_p(host);
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +    env->gpr[reg] = val;

Passing in size and performing a switch on it is not the best organization.
You should use multiple helper functions with a common subroutine to handle the llsc 
setup.  You should pass in the whole MemOpIdx.

> +#define MEMOP_GET_SIZE(x)  (1 << ((x) & MO_SIZE))

This is memop_size().


r~


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-02-20 19:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-20  4:19 [RFC PATCH 0/3] real ll/sc emulation Nicholas Piggin
2024-02-20  4:19 ` [RFC PATCH 1/3] accel/tcg: Fix TCG TLB coherency race with physical dirty bit clearing Nicholas Piggin
2024-02-20  4:19 ` [RFC PATCH 2/3] tcg: add a ll/sc protection facility Nicholas Piggin
2024-02-20 18:53   ` Richard Henderson
2024-02-20  4:19 ` [RFC PATCH 3/3] target/ppc: Implement reservation protection for larx/stcx Nicholas Piggin
2024-02-20 19:03   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).