qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH-for-9.1 v2 0/3] target/ppc: fix tlb flushing race (plus
@ 2024-04-05 12:53 Nicholas Piggin
  2024-04-05 12:53 ` [PATCH v2 1/3] target/ppc: Fix broadcast tlbie synchronisation Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nicholas Piggin @ 2024-04-05 12:53 UTC (permalink / raw)
  To: qemu-ppc
  Cc: Nicholas Piggin, Philippe Mathieu-Daudé, Richard Henderson,
	Paolo Bonzini, Daniel Henrique Barboza, qemu-devel, Peter Maydell,
	qemu-arm, qemu-riscv, qemu-s390x

ppc's broadcast tlb flushe must ensure all vCPUs have invalidated
their TLBs when the tlbie sequence completes. This is not true in
the current TCG implementation, due to async_run_on_cpu() returning
before the remote CPU runs the job.

Fixing ppc requires moving to async_safe_run_on_cpu(), however this
API does not guarantee that either, and actually changing to the
tlb_flush_*_all_cpus_synced() variants introduces another race
which is that the flush is not even guaraneed to complete on the
local CPU. To ensure that it is, the tlbie has to be made the last
instruction in the TB.

Fixing ppc removes the last caller of the non-synced TLB flush
variants, we can remove some dead code too.

For others - at least arm, riscv, and s390x all use the
tlb_flush_*_all_cpus_synced() calls AFAIKS without ending the TB. But
it's possible I've missed where they do, or they have other issues that
mean it's not required for the flush to take effect until some later
operation which does end the TB. Maybe there is no problem, but it might
be worth looking at.

To reproduce, I have a kvm-unit-tests case for ppc but should be quite
easy to port to other archs. You just need to be careful for the local
CPU test case that your tlbi instruction is in the same TB as the
subsequent load/store that is to incorrectly use a stale translation.

https://gitlab.com/npiggin/kvm-unit-tests/-/tree/powerpc

The test 'powerpc/mmu.elf tlbie:tlbi-other-cpu' breaks with upstream
QEMU. If we use tlb_flush_*_all_cpus_synced() like other arch, then
'powerpc/mmu.elf tlbie:tlbi-this-cpu' also breaks.

Since v1 I understood the full problem and fix, and fixed the fix.

Thanks,
Nick

Nicholas Piggin (3):
  target/ppc: Fix broadcast tlbie synchronisation
  tcg/cputlb: Remove non-synced variants of global TLB flushes
  tcg/cputlb: remove other-cpu capability from TLB flushing

 docs/devel/multi-thread-tcg.rst              |  13 +-
 include/exec/exec-all.h                      |  97 ++-----------
 accel/tcg/cputlb.c                           | 145 ++-----------------
 target/ppc/helper_regs.c                     |   2 +-
 target/ppc/mmu_helper.c                      |   2 +-
 target/ppc/translate.c                       |   7 +
 target/ppc/translate/storage-ctrl-impl.c.inc |   7 +
 7 files changed, 44 insertions(+), 229 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-05 17:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-05 12:53 [PATCH-for-9.1 v2 0/3] target/ppc: fix tlb flushing race (plus Nicholas Piggin
2024-04-05 12:53 ` [PATCH v2 1/3] target/ppc: Fix broadcast tlbie synchronisation Nicholas Piggin
2024-04-05 12:53 ` [PATCH v2 2/3] tcg/cputlb: Remove non-synced variants of global TLB flushes Nicholas Piggin
2024-04-05 17:08   ` Richard Henderson
2024-04-05 12:53 ` [PATCH v2 3/3] tcg/cputlb: remove other-cpu capability from TLB flushing Nicholas Piggin
2024-04-05 17:09   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).