* [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO
@ 2025-08-08 12:01 Igor Mammedov
2025-08-08 12:01 ` [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
` (10 more replies)
0 siblings, 11 replies; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
v3:
* hpet: replace explicit atomics with the seqlock API (PeterX)
* introduce cpu_test_interrupt() (Paolo)
and use it tree-wide for checking interrupts
* don't take BQL for setting exit_request, use qatomic_set() instead (Paolo)
* after the above change, replace conditional BQL locking with unconditional locking
to simplify things a bit (Paolo)
* drop unneeded barriers (Paolo)
* minor tcg: cpu_handle_interrupt() cleanup
v2:
* Make both read and write paths BQL-less (Gerd)
* Refactor HPET to handle lock-less access correctly
when stopping/starting counter in parallel. (Peter Maydell)
* Publish kvm-unit-tests HPET bench/torture test [1] to verify
HPET lock-less handling
When booting WS2025 with the following CLI
1) -M q35,hpet=off -cpu host -enable-kvm -smp 240,sockets=4
the guest boots very slowly and is sluggish after boot,
or (most of the time) it gets stuck at the spinning circle during boot.
perf shows that the VM is experiencing heavy BQL contention on the IO path,
which happens to be ACPI PM timer read access. A variation with
HPET enabled moves the contention to HPET timer read access.
And it only gets worse as the number of vCPUs grows.
The series prevents vCPUs of large VMs from contending on the BQL due to
PM/HPET timer access and lets Windows move on with the boot process.
Testing lock-less IO with the HPET micro benchmark [2] shows approx 80%
better performance than the current BQL-locked path.
[chart https://ibb.co/MJY9999 shows much better scaling of lock-less
IO compared to the BQL-locked one.]
In my tests with CLI 1), a WS2025 guest wasn't able to boot within 30 min
on either host:
* 32 cores, 2 NUMA nodes
* 448 cores, 8 NUMA nodes
With the ACPI PM timer in BQL-free read mode, the guest boots within approx:
* 2 min
* 1 min
respectively.
With HPET enabled, boot time shrinks ~2x:
* 4m13 -> 2m21
* 2m19 -> 1m15
respectively.
2) "[kvm-unit-tests PATCH v4 0/5] x86: add HPET counter tests"
https://lore.kernel.org/kvm/20250725095429.1691734-1-imammedo@redhat.com/T/#t
PS:
Using the hv-time=on CPU option helps a lot (when it works) and
lets the guest from 1) boot fine in ~1-2 min. The series doesn't make
a significant impact in this case.
PS2:
Tested the series with a bunch of different guests:
RHEL-[6..10]x64, WS2012R2, WS2016, WS2022, WS2025
PS3:
dropped mention of https://bugzilla.redhat.com/show_bug.cgi?id=1322713
as it's not reproducible with the current software stack, or even with
the same qemu/seabios as reported (the kernel versions mentioned in
the report were interim ones and are no longer available,
so I've used the nearest releases from that time for testing)
Igor Mammedov (10):
memory: reintroduce BQL-free fine-grained PIO/MMIO
acpi: mark PMTIMER as unlocked
hpet: switch to fine-grained device locking
hpet: move out main counter read into a separate block
hpet: make main counter read lock-less
introduce cpu_test_interrupt() that will replace open coded checks
x86: kvm: use cpu_test_interrupt() instead of open coding checks
kvm: i386: irqchip: take BQL only if there is an interrupt
use cpu_test_interrupt() instead of open coding checks tree wide
tcg: move interrupt caching and single step masking closer to user
include/hw/core/cpu.h | 12 ++++++++
include/system/memory.h | 10 +++++++
accel/tcg/cpu-exec.c | 25 +++++++---------
accel/tcg/tcg-accel-ops.c | 3 +-
hw/acpi/core.c | 1 +
hw/timer/hpet.c | 38 +++++++++++++++++++-----
system/cpus.c | 3 +-
system/memory.c | 15 ++++++++++
system/physmem.c | 2 +-
target/alpha/cpu.c | 8 ++---
target/arm/cpu.c | 20 ++++++-------
target/arm/helper.c | 16 +++++-----
target/arm/hvf/hvf.c | 6 ++--
target/avr/cpu.c | 2 +-
target/hppa/cpu.c | 2 +-
target/i386/hvf/hvf.c | 4 +--
target/i386/hvf/x86hvf.c | 21 +++++++------
target/i386/kvm/kvm.c | 46 ++++++++++++++---------------
target/i386/nvmm/nvmm-all.c | 24 +++++++--------
target/i386/tcg/system/seg_helper.c | 2 +-
target/i386/whpx/whpx-all.c | 34 ++++++++++-----------
target/loongarch/cpu.c | 2 +-
target/m68k/cpu.c | 2 +-
target/microblaze/cpu.c | 2 +-
target/mips/cpu.c | 6 ++--
target/mips/kvm.c | 2 +-
target/openrisc/cpu.c | 3 +-
target/ppc/cpu_init.c | 2 +-
target/ppc/kvm.c | 2 +-
target/rx/cpu.c | 3 +-
target/rx/helper.c | 2 +-
target/s390x/cpu-system.c | 2 +-
target/sh4/cpu.c | 2 +-
target/sh4/helper.c | 2 +-
target/sparc/cpu.c | 2 +-
target/sparc/int64_helper.c | 4 +--
36 files changed, 193 insertions(+), 139 deletions(-)
--
2.47.1
* [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-08 12:12 ` David Hildenbrand
2025-08-11 15:54 ` Peter Xu
2025-08-08 12:01 ` [PATCH v3 02/10] acpi: mark PMTIMER as unlocked Igor Mammedov
` (9 subsequent siblings)
10 siblings, 2 replies; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
This patch brings back Jan's idea [1] of BQL-free IO access.
It will let us make access to the ACPI PM/HPET timers cheaper,
and prevent BQL contention for workloads that heavily
use the timers with a lot of vCPUs.
1) 196ea13104f (memory: Add global-locking property to memory regions)
... de7ea885c539 (kvm: Switch to unlocked MMIO)
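For illustration, a device with its own fine-grained lock opts in roughly
like this (a minimal sketch; the FooState device, FOO() cast macro and
foo_read() below are hypothetical, the real users are the ACPI PM timer
and HPET patches later in the series):

/* needs "qemu/lockable.h" for QEMU_LOCK_GUARD() */
typedef struct FooState {
    SysBusDevice parent_obj;
    QemuMutex lock;              /* fine-grained, per-device lock */
    MemoryRegion iomem;
    uint64_t counter;
} FooState;

static uint64_t foo_read(void *opaque, hwaddr addr, unsigned size)
{
    FooState *s = opaque;

    /* May now be called without the BQL held. */
    QEMU_LOCK_GUARD(&s->lock);
    return s->counter;
}

static const MemoryRegionOps foo_ops = {
    .read = foo_read,
    .endianness = DEVICE_LITTLE_ENDIAN,
};

static void foo_init(Object *obj)
{
    FooState *s = FOO(obj);      /* hypothetical QOM cast macro */

    qemu_mutex_init(&s->lock);
    memory_region_init_io(&s->iomem, obj, &foo_ops, s, "foo", 0x1000);
    /* tell prepare_mmio_access() not to take the BQL for this region */
    memory_region_enable_lockless_io(&s->iomem);
}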
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
add comment for 'mr->disable_reentrancy_guard = true'
Peter Xu <peterx@redhat.com>
---
include/system/memory.h | 10 ++++++++++
system/memory.c | 15 +++++++++++++++
system/physmem.c | 2 +-
3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/include/system/memory.h b/include/system/memory.h
index e2cd6ed126..d04366c994 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -833,6 +833,7 @@ struct MemoryRegion {
bool nonvolatile;
bool rom_device;
bool flush_coalesced_mmio;
+ bool lockless_io;
bool unmergeable;
uint8_t dirty_log_mask;
bool is_iommu;
@@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
*/
void memory_region_clear_flush_coalesced(MemoryRegion *mr);
+/**
+ * memory_region_enable_lockless_io: Enable lockless (BQL-free) access.
+ *
+ * Enable BQL-free access for devices with fine-grained locking.
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_enable_lockless_io(MemoryRegion *mr);
+
/**
* memory_region_add_eventfd: Request an eventfd to be triggered when a word
* is written to a location.
diff --git a/system/memory.c b/system/memory.c
index 5646547940..44701c465c 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2546,6 +2546,21 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
}
}
+void memory_region_enable_lockless_io(MemoryRegion *mr)
+{
+ mr->lockless_io = true;
+ /*
+ * reentrancy_guard has per-device scope; when enabled it
+ * effectively prevents concurrent access to the device's IO
+ * MemoryRegion(s) by not calling the accessor callback.
+ *
+ * Turn it off for lock-less IO enabled devices, to allow
+ * concurrent IO.
+ * TODO: remove this when reentrancy_guard becomes per transaction.
+ */
+ mr->disable_reentrancy_guard = true;
+}
+
void memory_region_add_eventfd(MemoryRegion *mr,
hwaddr addr,
unsigned size,
diff --git a/system/physmem.c b/system/physmem.c
index e5dd760e0b..f498572fc8 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2900,7 +2900,7 @@ bool prepare_mmio_access(MemoryRegion *mr)
{
bool release_lock = false;
- if (!bql_locked()) {
+ if (!bql_locked() && !mr->lockless_io) {
bql_lock();
release_lock = true;
}
--
2.47.1
* [PATCH v3 02/10] acpi: mark PMTIMER as unlocked
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-08-08 12:01 ` [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-11 15:55 ` Peter Xu
2025-08-08 12:01 ` [PATCH v3 03/10] hpet: switch to fine-grained device locking Igor Mammedov
` (8 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
Reading QEMU_CLOCK_VIRTUAL is thread-safe, and write access is a NOP.
This makes it possible to boot Windows with a large vCPU count when
hv-time is not used.
Reproducer:
-M q35,hpet=off -cpu host -enable-kvm -smp 240,sockets=4 -m 8G WS2025.img
fails to boot within 30 min.
With this fix it boots within 1-2 min.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
hw/acpi/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 58f8964e13..ff16582803 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -547,6 +547,7 @@ void acpi_pm_tmr_init(ACPIREGS *ar, acpi_update_sci_fn update_sci,
ar->tmr.timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, acpi_pm_tmr_timer, ar);
memory_region_init_io(&ar->tmr.io, memory_region_owner(parent),
&acpi_pm_tmr_ops, ar, "acpi-tmr", 4);
+ memory_region_enable_lockless_io(&ar->tmr.io);
memory_region_add_subregion(parent, 8, &ar->tmr.io);
}
--
2.47.1
* [PATCH v3 03/10] hpet: switch to fine-grained device locking
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-08-08 12:01 ` [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-08-08 12:01 ` [PATCH v3 02/10] acpi: mark PMTIMER as unlocked Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-11 15:56 ` Peter Xu
2025-08-08 12:01 ` [PATCH v3 04/10] hpet: move out main counter read into a separate block Igor Mammedov
` (7 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
As a step towards a lock-less HPET counter read,
use per-device locking instead of the BQL.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
hw/timer/hpet.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index cb48cc151f..ab5aa59ae4 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -38,6 +38,7 @@
#include "hw/timer/i8254.h"
#include "system/address-spaces.h"
#include "qom/object.h"
+#include "qemu/lockable.h"
#include "trace.h"
struct hpet_fw_config hpet_fw_cfg = {.count = UINT8_MAX};
@@ -69,6 +70,7 @@ struct HPETState {
SysBusDevice parent_obj;
/*< public >*/
+ QemuMutex lock;
MemoryRegion iomem;
uint64_t hpet_offset;
bool hpet_offset_saved;
@@ -428,6 +430,7 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
trace_hpet_ram_read(addr);
addr &= ~4;
+ QEMU_LOCK_GUARD(&s->lock);
/*address range of all global regs*/
if (addr <= 0xff) {
switch (addr) {
@@ -482,6 +485,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
int len = MIN(size * 8, 64 - shift);
uint64_t old_val, new_val, cleared;
+ QEMU_LOCK_GUARD(&s->lock);
trace_hpet_ram_write(addr, value);
addr &= ~4;
@@ -679,8 +683,10 @@ static void hpet_init(Object *obj)
SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
HPETState *s = HPET(obj);
+ qemu_mutex_init(&s->lock);
/* HPET Area */
memory_region_init_io(&s->iomem, obj, &hpet_ram_ops, s, "hpet", HPET_LEN);
+ memory_region_enable_lockless_io(&s->iomem);
sysbus_init_mmio(sbd, &s->iomem);
}
--
2.47.1
* [PATCH v3 04/10] hpet: move out main counter read into a separate block
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (2 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 03/10] hpet: switch to fine-grained device locking Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-11 15:56 ` Peter Xu
2025-08-08 12:01 ` [PATCH v3 05/10] hpet: make main counter read lock-less Igor Mammedov
` (6 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
A follow-up patch will switch the main counter read to
lock-less mode. As preparation for that, move the relevant
branch into a separate top-level block to make the follow-up
patch cleaner/simpler by reducing contextual noise
when the lock-less read is introduced.
No functional changes.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
* drop 'addr <= 0xff' as addr == HPET_COUNTER is sufficient
Peter Xu <peterx@redhat.com>
---
hw/timer/hpet.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index ab5aa59ae4..c776afc0f2 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -431,6 +431,16 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
addr &= ~4;
QEMU_LOCK_GUARD(&s->lock);
+ if (addr == HPET_COUNTER) {
+ if (hpet_enabled(s)) {
+ cur_tick = hpet_get_ticks(s);
+ } else {
+ cur_tick = s->hpet_counter;
+ }
+ trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
+ return cur_tick >> shift;
+ }
+
/*address range of all global regs*/
if (addr <= 0xff) {
switch (addr) {
@@ -438,14 +448,6 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
return s->capability >> shift;
case HPET_CFG:
return s->config >> shift;
- case HPET_COUNTER:
- if (hpet_enabled(s)) {
- cur_tick = hpet_get_ticks(s);
- } else {
- cur_tick = s->hpet_counter;
- }
- trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
- return cur_tick >> shift;
case HPET_STATUS:
return s->isr >> shift;
default:
--
2.47.1
* [PATCH v3 05/10] hpet: make main counter read lock-less
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (3 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 04/10] hpet: move out main counter read into a separate block Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-11 15:58 ` Peter Xu
2025-08-08 12:01 ` [PATCH v3 06/10] introduce cpu_test_interrupt() that will replace open coded checks Igor Mammedov
` (5 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
Make access to the main HPET counter lock-less.
In the unlikely event of an update in progress, readers will busy-wait
until the update is finished.
As a result, a micro benchmark of concurrent HPET counter reads
with a large number of vCPUs shows over 80% better (lower) latency.
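For clarity, the read/write pairing used below is the standard seqlock
pattern; a condensed sketch (not the literal hpet.c hunks):

/* writer side, rare, runs under s->lock: */
seqlock_write_begin(&s->state_version);
s->config = new_val;                 /* state the reader depends on */
/* ... start/stop the main counter ... */
seqlock_write_end(&s->state_version);

/* reader side, hot path, no lock taken: */
unsigned version;
uint64_t cur_tick;
do {
    version = seqlock_read_begin(&s->state_version);
    cur_tick = hpet_enabled(s) ? hpet_get_ticks(s) : s->hpet_counter;
} while (seqlock_read_retry(&s->state_version, version));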
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
* make reader busy wait during update and reuse existing seqlock API
Peter Xu <peterx@redhat.com>
---
hw/timer/hpet.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index c776afc0f2..789a31d0a0 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -39,6 +39,7 @@
#include "system/address-spaces.h"
#include "qom/object.h"
#include "qemu/lockable.h"
+#include "qemu/seqlock.h"
#include "trace.h"
struct hpet_fw_config hpet_fw_cfg = {.count = UINT8_MAX};
@@ -74,6 +75,7 @@ struct HPETState {
MemoryRegion iomem;
uint64_t hpet_offset;
bool hpet_offset_saved;
+ QemuSeqLock state_version;
qemu_irq irqs[HPET_NUM_IRQ_ROUTES];
uint32_t flags;
uint8_t rtc_irq_level;
@@ -430,17 +432,25 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
trace_hpet_ram_read(addr);
addr &= ~4;
- QEMU_LOCK_GUARD(&s->lock);
if (addr == HPET_COUNTER) {
- if (hpet_enabled(s)) {
- cur_tick = hpet_get_ticks(s);
- } else {
- cur_tick = s->hpet_counter;
- }
+ unsigned version;
+
+ /*
+ * Write updates are rare, so busy-waiting here is unlikely to happen
+ */
+ do {
+ version = seqlock_read_begin(&s->state_version);
+ if (unlikely(!hpet_enabled(s))) {
+ cur_tick = s->hpet_counter;
+ } else {
+ cur_tick = hpet_get_ticks(s);
+ }
+ } while (seqlock_read_retry(&s->state_version, version));
trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
return cur_tick >> shift;
}
+ QEMU_LOCK_GUARD(&s->lock);
/*address range of all global regs*/
if (addr <= 0xff) {
switch (addr) {
@@ -500,6 +510,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
old_val = s->config;
new_val = deposit64(old_val, shift, len, value);
new_val = hpet_fixup_reg(new_val, old_val, HPET_CFG_WRITE_MASK);
+ seqlock_write_begin(&s->state_version);
s->config = new_val;
if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
/* Enable main counter and interrupt generation. */
@@ -518,6 +529,8 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
hpet_del_timer(&s->timer[i]);
}
}
+ seqlock_write_end(&s->state_version);
+
/* i8254 and RTC output pins are disabled
* when HPET is in legacy mode */
if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
@@ -686,6 +699,7 @@ static void hpet_init(Object *obj)
HPETState *s = HPET(obj);
qemu_mutex_init(&s->lock);
+ seqlock_init(&s->state_version);
/* HPET Area */
memory_region_init_io(&s->iomem, obj, &hpet_ram_ops, s, "hpet", HPET_LEN);
memory_region_enable_lockless_io(&s->iomem);
--
2.47.1
* [PATCH v3 06/10] introduce cpu_test_interrupt() that will replace open coded checks
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (4 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 05/10] hpet: make main counter read lock-less Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-11 16:31 ` Peter Xu
2025-08-08 12:01 ` [PATCH v3 07/10] x86: kvm: use cpu_test_interrupt() instead of open coding checks Igor Mammedov
` (4 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
The helper forms a load-acquire/store-release pair with
tcg_handle_interrupt()/generic_handle_interrupt() and can be used
for checking interrupts outside of the BQL.
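Condensed, the intended pairing looks like this (a sketch of the hunks below):

/* producer side (tcg_handle_interrupt()/generic_handle_interrupt()),
   possibly running in a thread other than the vCPU's: */
qatomic_store_release(&cpu->interrupt_request,
                      cpu->interrupt_request | mask);

/* consumer side, possibly running without the BQL: */
if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
    /* the acquire load inside cpu_test_interrupt() pairs with the
       release store above, so whatever was written before the
       interrupt was raised is visible here */
    /* ... handle/inject the interrupt ... */
}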
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
include/hw/core/cpu.h | 12 ++++++++++++
accel/tcg/tcg-accel-ops.c | 3 ++-
system/cpus.c | 3 ++-
3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 5eaf41a566..d0460c01cf 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -942,6 +942,18 @@ CPUState *cpu_by_arch_id(int64_t id);
void cpu_interrupt(CPUState *cpu, int mask);
+/**
+ * cpu_test_interrupt:
+ * @cpu: The CPU to check interrupt(s) on.
+ * @mask: The interrupts to check.
+ *
+ * Checks if any of interrupts in @mask are pending on @cpu.
+ */
+static inline bool cpu_test_interrupt(CPUState *cpu, int mask)
+{
+ return qatomic_load_acquire(&cpu->interrupt_request) & mask;
+}
+
/**
* cpu_set_pc:
* @cpu: The CPU to set the program counter for.
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 3b0d7d298e..02c7600bb7 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -97,7 +97,8 @@ static void tcg_cpu_reset_hold(CPUState *cpu)
/* mask must never be zero, except for A20 change call */
void tcg_handle_interrupt(CPUState *cpu, int mask)
{
- cpu->interrupt_request |= mask;
+ qatomic_store_release(&cpu->interrupt_request,
+ cpu->interrupt_request | mask);
/*
* If called from iothread context, wake the target cpu in
diff --git a/system/cpus.c b/system/cpus.c
index 256723558d..8e543488c3 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -256,7 +256,8 @@ int64_t cpus_get_elapsed_ticks(void)
void generic_handle_interrupt(CPUState *cpu, int mask)
{
- cpu->interrupt_request |= mask;
+ qatomic_store_release(&cpu->interrupt_request,
+ cpu->interrupt_request | mask);
if (!qemu_cpu_is_self(cpu)) {
qemu_cpu_kick(cpu);
--
2.47.1
* [PATCH v3 07/10] x86: kvm: use cpu_test_interrupt() instead of open coding checks
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (5 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 06/10] introduce cpu_test_interrupt() that will replace open coded checks Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-08 12:01 ` [PATCH v3 08/10] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
` (3 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
Use cpu_test_interrupt() instead of open-coded checks.
On top of that, cpu_test_interrupt() uses a barrier to ensure proper ordering
when interrupts are set from outside of the vCPU thread.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
target/i386/kvm/kvm.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 369626f8c8..a7b5c8f81b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5453,8 +5453,8 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
int ret;
/* Inject NMI */
- if (cpu->interrupt_request & (CPU_INTERRUPT_NMI | CPU_INTERRUPT_SMI)) {
- if (cpu->interrupt_request & CPU_INTERRUPT_NMI) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI | CPU_INTERRUPT_SMI)) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI)) {
bql_lock();
cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
bql_unlock();
@@ -5465,7 +5465,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
strerror(-ret));
}
}
- if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_SMI)) {
bql_lock();
cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
bql_unlock();
@@ -5486,12 +5486,12 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
* or (for userspace APIC, but it is cheap to combine the checks here)
* pending TPR access reports.
*/
- if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
- if ((cpu->interrupt_request & CPU_INTERRUPT_INIT) &&
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
!(env->hflags & HF_SMM_MASK)) {
cpu->exit_request = 1;
}
- if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
cpu->exit_request = 1;
}
}
@@ -5499,7 +5499,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
if (!kvm_pic_in_kernel()) {
/* Try to inject an interrupt if the guest can accept it */
if (run->ready_for_interrupt_injection &&
- (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+ cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK)) {
int irq;
@@ -5523,7 +5523,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
* interrupt, request an interrupt window exit. This will
* cause a return to userspace as soon as the guest is ready to
* receive interrupts. */
- if ((cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
run->request_interrupt_window = 1;
} else {
run->request_interrupt_window = 0;
@@ -5595,7 +5595,7 @@ int kvm_arch_process_async_events(CPUState *cs)
X86CPU *cpu = X86_CPU(cs);
CPUX86State *env = &cpu->env;
- if (cs->interrupt_request & CPU_INTERRUPT_MCE) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_MCE)) {
/* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
assert(env->mcg_cap);
@@ -5618,7 +5618,7 @@ int kvm_arch_process_async_events(CPUState *cs)
}
}
- if ((cs->interrupt_request & CPU_INTERRUPT_INIT) &&
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_INIT) &&
!(env->hflags & HF_SMM_MASK)) {
kvm_cpu_synchronize_state(cs);
do_cpu_init(cpu);
@@ -5628,20 +5628,20 @@ int kvm_arch_process_async_events(CPUState *cs)
return 0;
}
- if (cs->interrupt_request & CPU_INTERRUPT_POLL) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_POLL)) {
cs->interrupt_request &= ~CPU_INTERRUPT_POLL;
apic_poll_irq(cpu->apic_state);
}
- if (((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if ((cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK)) ||
- (cs->interrupt_request & CPU_INTERRUPT_NMI)) {
+ cpu_test_interrupt(cs, CPU_INTERRUPT_NMI)) {
cs->halted = 0;
}
- if (cs->interrupt_request & CPU_INTERRUPT_SIPI) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_SIPI)) {
kvm_cpu_synchronize_state(cs);
do_cpu_sipi(cpu);
}
- if (cs->interrupt_request & CPU_INTERRUPT_TPR) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_TPR)) {
cs->interrupt_request &= ~CPU_INTERRUPT_TPR;
kvm_cpu_synchronize_state(cs);
apic_handle_tpr_access_report(cpu->apic_state, env->eip,
@@ -5656,9 +5656,9 @@ static int kvm_handle_halt(X86CPU *cpu)
CPUState *cs = CPU(cpu);
CPUX86State *env = &cpu->env;
- if (!((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (!(cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK)) &&
- !(cs->interrupt_request & CPU_INTERRUPT_NMI)) {
+ !cpu_test_interrupt(cs, CPU_INTERRUPT_NMI)) {
cs->halted = 1;
return EXCP_HLT;
}
--
2.47.1
* [PATCH v3 08/10] kvm: i386: irqchip: take BQL only if there is an interrupt
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (6 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 07/10] x86: kvm: use cpu_test_interrupt() instead of open coding checks Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-11 16:22 ` Peter Xu
2025-08-08 12:01 ` [PATCH v3 09/10] use cpu_test_interrupt() instead of open coding checks tree wide Igor Mammedov
` (2 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
When kernel-irqchip=split is used, QEMU still hits a BQL
contention issue when reading the ACPI PM/HPET timers
(despite the timer access being lock-less).
So Windows with more than 255 vCPUs is still not able to
boot (since that requires an IOMMU -> split irqchip).
The problematic path is in kvm_arch_pre_run(), where the BQL is taken
unconditionally when the split irqchip is in use.
There are a few parts that the BQL protects there:
1. interrupt check and injection
however, we do not take the BQL when checking for a pending
interrupt (even within the same function), so the patch
takes the same approach for the cpu->interrupt_request checks
and takes the BQL only if there is a job to do.
2. request_interrupt_window access
CPUState::kvm_run::request_interrupt_window doesn't need the BQL
as it's accessed only by its own vCPU thread.
3. cr8/cpu_get_apic_tpr access
the same as #2 applies to CPUState::kvm_run::cr8,
and APIC registers are also cached/synced (get/put) within
the vCPU thread they belong to.
Taking the BQL only when necessary eliminates the BQL bottleneck on
the IO/MMIO-only exit path, improving latency by 80% in the HPET micro
benchmark.
This lets Windows boot successfully (in case hv-time isn't used)
when more than 255 vCPUs are in use.
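After this patch, the split-irqchip part of kvm_arch_pre_run() is roughly
shaped like this (an abbreviated sketch, not the literal code):

if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
    qatomic_set(&cpu->exit_request, 1);        /* no BQL needed */
}

if (!kvm_pic_in_kernel()) {
    if (run->ready_for_interrupt_injection &&
        cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
        (env->eflags & IF_MASK)) {
        bql_lock();                            /* only taken to inject */
        cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
        /* ... cpu_get_pic_interrupt() + KVM_INTERRUPT ioctl ... */
        bql_unlock();
    }

    /* request_interrupt_window and cr8 are touched only by this
       vCPU thread, so no BQL for them either */
    run->request_interrupt_window =
        cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD);
    run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
}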
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
* drop a not-needed pair of () in the cpu->interrupt_request & CPU_INTERRUPT_HARD
check
* Paolo Bonzini <pbonzini@redhat.com>
* don't take BQL when setting exit_request, use qatomic_set() instead
* after the above simplification, take/release BQL unconditionally
* drop smp_mb() after run->cr8/run->request_interrupt_window update
---
target/i386/kvm/kvm.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a7b5c8f81b..306430a052 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5478,9 +5478,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
}
}
- if (!kvm_pic_in_kernel()) {
- bql_lock();
- }
/* Force the VCPU out of its inner loop to process any INIT requests
* or (for userspace APIC, but it is cheap to combine the checks here)
@@ -5489,10 +5486,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
!(env->hflags & HF_SMM_MASK)) {
- cpu->exit_request = 1;
+ qatomic_set(&cpu->exit_request, 1);
}
if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
- cpu->exit_request = 1;
+ qatomic_set(&cpu->exit_request, 1);
}
}
@@ -5503,6 +5500,8 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
(env->eflags & IF_MASK)) {
int irq;
+ bql_lock();
+
cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
irq = cpu_get_pic_interrupt(env);
if (irq >= 0) {
@@ -5517,6 +5516,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
strerror(-ret));
}
}
+ bql_unlock();
}
/* If we have an interrupt but the guest is not ready to receive an
@@ -5531,8 +5531,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
DPRINTF("setting tpr\n");
run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
-
- bql_unlock();
}
}
--
2.47.1
* [PATCH v3 09/10] use cpu_test_interrupt() instead of open coding checks tree wide
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (7 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 08/10] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-08 12:01 ` [PATCH v3 10/10] tcg: move interrupt caching and single step masking closer to user Igor Mammedov
2025-08-11 5:36 ` [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Michael S. Tsirkin
10 siblings, 0 replies; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
Replace open-coded checks for whether an interrupt is set with the
cpu_test_interrupt() helper. It ensures that proper barriers are in place
in case checks happen outside of the BQL, and also makes the checks more
consistent and easier to read/find.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
accel/tcg/cpu-exec.c | 10 ++++-----
target/alpha/cpu.c | 8 +++----
target/arm/cpu.c | 20 ++++++++---------
target/arm/helper.c | 16 +++++++-------
target/arm/hvf/hvf.c | 6 ++---
target/avr/cpu.c | 2 +-
target/hppa/cpu.c | 2 +-
target/i386/hvf/hvf.c | 4 ++--
target/i386/hvf/x86hvf.c | 21 +++++++++---------
target/i386/nvmm/nvmm-all.c | 24 ++++++++++----------
target/i386/tcg/system/seg_helper.c | 2 +-
target/i386/whpx/whpx-all.c | 34 ++++++++++++++---------------
target/loongarch/cpu.c | 2 +-
target/m68k/cpu.c | 2 +-
target/microblaze/cpu.c | 2 +-
target/mips/cpu.c | 6 ++---
target/mips/kvm.c | 2 +-
target/openrisc/cpu.c | 3 +--
target/ppc/cpu_init.c | 2 +-
target/ppc/kvm.c | 2 +-
target/rx/cpu.c | 3 +--
target/rx/helper.c | 2 +-
target/s390x/cpu-system.c | 2 +-
target/sh4/cpu.c | 2 +-
target/sh4/helper.c | 2 +-
target/sparc/cpu.c | 2 +-
target/sparc/int64_helper.c | 4 ++--
27 files changed, 92 insertions(+), 95 deletions(-)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 713bdb2056..1269c2c6ba 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -778,7 +778,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
*/
qatomic_set_mb(&cpu->neg.icount_decr.u16.high, 0);
- if (unlikely(qatomic_read(&cpu->interrupt_request))) {
+ if (unlikely(cpu_test_interrupt(cpu, ~0))) {
int interrupt_request;
bql_lock();
interrupt_request = cpu->interrupt_request;
@@ -786,7 +786,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
/* Mask out external interrupts for this step. */
interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
}
- if (interrupt_request & CPU_INTERRUPT_DEBUG) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_DEBUG)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
cpu->exception_index = EXCP_DEBUG;
bql_unlock();
@@ -795,7 +795,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
#if !defined(CONFIG_USER_ONLY)
if (replay_mode == REPLAY_MODE_PLAY && !replay_has_interrupt()) {
/* Do nothing */
- } else if (interrupt_request & CPU_INTERRUPT_HALT) {
+ } else if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HALT)) {
replay_interrupt();
cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
cpu->halted = 1;
@@ -805,7 +805,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
} else {
const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
- if (interrupt_request & CPU_INTERRUPT_RESET) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_RESET)) {
replay_interrupt();
tcg_ops->cpu_exec_reset(cpu);
bql_unlock();
@@ -841,7 +841,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
interrupt_request = cpu->interrupt_request;
}
#endif /* !CONFIG_USER_ONLY */
- if (interrupt_request & CPU_INTERRUPT_EXITTB) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_EXITTB)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
/* ensure that no TB jump will be modified as
the program flow was changed */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index bf1787a69d..932cddac05 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -86,10 +86,10 @@ static bool alpha_cpu_has_work(CPUState *cs)
assume that if a CPU really wants to stay asleep, it will mask
interrupts at the chipset level, which will prevent these bits
from being set in the first place. */
- return cs->interrupt_request & (CPU_INTERRUPT_HARD
- | CPU_INTERRUPT_TIMER
- | CPU_INTERRUPT_SMP
- | CPU_INTERRUPT_MCHK);
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD
+ | CPU_INTERRUPT_TIMER
+ | CPU_INTERRUPT_SMP
+ | CPU_INTERRUPT_MCHK);
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index e2b2337399..a29c3facbf 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -142,11 +142,11 @@ static bool arm_cpu_has_work(CPUState *cs)
ARMCPU *cpu = ARM_CPU(cs);
return (cpu->power_state != PSCI_OFF)
- && cs->interrupt_request &
- (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
- | CPU_INTERRUPT_NMI | CPU_INTERRUPT_VINMI | CPU_INTERRUPT_VFNMI
- | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ | CPU_INTERRUPT_VSERR
- | CPU_INTERRUPT_EXITTB);
+ && cpu_test_interrupt(cs,
+ CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
+ | CPU_INTERRUPT_NMI | CPU_INTERRUPT_VINMI | CPU_INTERRUPT_VFNMI
+ | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ | CPU_INTERRUPT_VSERR
+ | CPU_INTERRUPT_EXITTB);
}
#endif /* !CONFIG_USER_ONLY */
@@ -958,7 +958,7 @@ void arm_cpu_update_virq(ARMCPU *cpu)
!(arm_hcrx_el2_eff(env) & HCRX_VINMI)) ||
(env->irq_line_state & CPU_INTERRUPT_VIRQ);
- if (new_state != ((cs->interrupt_request & CPU_INTERRUPT_VIRQ) != 0)) {
+ if (new_state != cpu_test_interrupt(cs, CPU_INTERRUPT_VIRQ)) {
if (new_state) {
cpu_interrupt(cs, CPU_INTERRUPT_VIRQ);
} else {
@@ -980,7 +980,7 @@ void arm_cpu_update_vfiq(ARMCPU *cpu)
!(arm_hcrx_el2_eff(env) & HCRX_VFNMI)) ||
(env->irq_line_state & CPU_INTERRUPT_VFIQ);
- if (new_state != ((cs->interrupt_request & CPU_INTERRUPT_VFIQ) != 0)) {
+ if (new_state != cpu_test_interrupt(cs, CPU_INTERRUPT_VFIQ)) {
if (new_state) {
cpu_interrupt(cs, CPU_INTERRUPT_VFIQ);
} else {
@@ -1002,7 +1002,7 @@ void arm_cpu_update_vinmi(ARMCPU *cpu)
(arm_hcrx_el2_eff(env) & HCRX_VINMI)) ||
(env->irq_line_state & CPU_INTERRUPT_VINMI);
- if (new_state != ((cs->interrupt_request & CPU_INTERRUPT_VINMI) != 0)) {
+ if (new_state != cpu_test_interrupt(cs, CPU_INTERRUPT_VINMI)) {
if (new_state) {
cpu_interrupt(cs, CPU_INTERRUPT_VINMI);
} else {
@@ -1022,7 +1022,7 @@ void arm_cpu_update_vfnmi(ARMCPU *cpu)
bool new_state = (arm_hcr_el2_eff(env) & HCR_VF) &&
(arm_hcrx_el2_eff(env) & HCRX_VFNMI);
- if (new_state != ((cs->interrupt_request & CPU_INTERRUPT_VFNMI) != 0)) {
+ if (new_state != cpu_test_interrupt(cs, CPU_INTERRUPT_VFNMI)) {
if (new_state) {
cpu_interrupt(cs, CPU_INTERRUPT_VFNMI);
} else {
@@ -1041,7 +1041,7 @@ void arm_cpu_update_vserr(ARMCPU *cpu)
bool new_state = env->cp15.hcr_el2 & HCR_VSE;
- if (new_state != ((cs->interrupt_request & CPU_INTERRUPT_VSERR) != 0)) {
+ if (new_state != cpu_test_interrupt(cs, CPU_INTERRUPT_VSERR)) {
if (new_state) {
cpu_interrupt(cs, CPU_INTERRUPT_VSERR);
} else {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0c1299ff84..03988876e9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -833,40 +833,40 @@ static uint64_t isr_read(CPUARMState *env, const ARMCPRegInfo *ri)
uint64_t ret = 0;
if (hcr_el2 & HCR_IMO) {
- if (cs->interrupt_request & CPU_INTERRUPT_VIRQ) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_VIRQ)) {
ret |= CPSR_I;
}
- if (cs->interrupt_request & CPU_INTERRUPT_VINMI) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_VINMI)) {
ret |= ISR_IS;
ret |= CPSR_I;
}
} else {
- if (cs->interrupt_request & CPU_INTERRUPT_HARD) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_HARD)) {
ret |= CPSR_I;
}
- if (cs->interrupt_request & CPU_INTERRUPT_NMI) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_NMI)) {
ret |= ISR_IS;
ret |= CPSR_I;
}
}
if (hcr_el2 & HCR_FMO) {
- if (cs->interrupt_request & CPU_INTERRUPT_VFIQ) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_VFIQ)) {
ret |= CPSR_F;
}
- if (cs->interrupt_request & CPU_INTERRUPT_VFNMI) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_VFNMI)) {
ret |= ISR_FS;
ret |= CPSR_F;
}
} else {
- if (cs->interrupt_request & CPU_INTERRUPT_FIQ) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_FIQ)) {
ret |= CPSR_F;
}
}
if (hcr_el2 & HCR_AMO) {
- if (cs->interrupt_request & CPU_INTERRUPT_VSERR) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_VSERR)) {
ret |= CPSR_A;
}
}
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 47b0cd3a35..b77db99079 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -1782,13 +1782,13 @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
static int hvf_inject_interrupts(CPUState *cpu)
{
- if (cpu->interrupt_request & CPU_INTERRUPT_FIQ) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_FIQ)) {
trace_hvf_inject_fiq();
hv_vcpu_set_pending_interrupt(cpu->accel->fd, HV_INTERRUPT_TYPE_FIQ,
true);
}
- if (cpu->interrupt_request & CPU_INTERRUPT_HARD) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
trace_hvf_inject_irq();
hv_vcpu_set_pending_interrupt(cpu->accel->fd, HV_INTERRUPT_TYPE_IRQ,
true);
@@ -1840,7 +1840,7 @@ static void hvf_wfi(CPUState *cpu)
uint64_t nanos;
uint32_t cntfrq;
- if (cpu->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ)) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ)) {
/* Interrupt pending, no need to wait */
return;
}
diff --git a/target/avr/cpu.c b/target/avr/cpu.c
index 6995de6a12..a6df71d020 100644
--- a/target/avr/cpu.c
+++ b/target/avr/cpu.c
@@ -45,7 +45,7 @@ static vaddr avr_cpu_get_pc(CPUState *cs)
static bool avr_cpu_has_work(CPUState *cs)
{
- return (cs->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_RESET))
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD | CPU_INTERRUPT_RESET)
&& cpu_interrupts_enabled(cpu_env(cs));
}
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 24777727e6..0ca79ee5e2 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -135,7 +135,7 @@ static void hppa_restore_state_to_opc(CPUState *cs,
#ifndef CONFIG_USER_ONLY
static bool hppa_cpu_has_work(CPUState *cs)
{
- return cs->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI);
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI);
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index 818b50419f..8445cadece 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -773,9 +773,9 @@ int hvf_vcpu_exec(CPUState *cpu)
switch (exit_reason) {
case EXIT_REASON_HLT: {
macvm_set_rip(cpu, rip + ins_len);
- if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (!(cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK))
- && !(cpu->interrupt_request & CPU_INTERRUPT_NMI) &&
+ && !cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI) &&
!(idtvec_info & VMCS_IDT_VEC_VALID)) {
cpu->halted = 1;
ret = EXCP_HLT;
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index 17fce1d3cd..9e05e0e576 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -395,7 +395,7 @@ bool hvf_inject_interrupts(CPUState *cs)
};
}
- if (cs->interrupt_request & CPU_INTERRUPT_NMI) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_NMI)) {
if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) {
cs->interrupt_request &= ~CPU_INTERRUPT_NMI;
info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI;
@@ -406,7 +406,7 @@ bool hvf_inject_interrupts(CPUState *cs)
}
if (!(env->hflags & HF_INHIBIT_IRQ_MASK) &&
- (cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK) && !(info & VMCS_INTR_VALID)) {
int line = cpu_get_pic_interrupt(env);
cs->interrupt_request &= ~CPU_INTERRUPT_HARD;
@@ -415,11 +415,10 @@ bool hvf_inject_interrupts(CPUState *cs)
VMCS_INTR_VALID | VMCS_INTR_T_HWINTR);
}
}
- if (cs->interrupt_request & CPU_INTERRUPT_HARD) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_HARD)) {
vmx_set_int_window_exiting(cs);
}
- return (cs->interrupt_request
- & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR));
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR);
}
int hvf_process_events(CPUState *cs)
@@ -432,25 +431,25 @@ int hvf_process_events(CPUState *cs)
env->eflags = rreg(cs->accel->fd, HV_X86_RFLAGS);
}
- if (cs->interrupt_request & CPU_INTERRUPT_INIT) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_INIT)) {
cpu_synchronize_state(cs);
do_cpu_init(cpu);
}
- if (cs->interrupt_request & CPU_INTERRUPT_POLL) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_POLL)) {
cs->interrupt_request &= ~CPU_INTERRUPT_POLL;
apic_poll_irq(cpu->apic_state);
}
- if (((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if ((cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK)) ||
- (cs->interrupt_request & CPU_INTERRUPT_NMI)) {
+ cpu_test_interrupt(cs, CPU_INTERRUPT_NMI)) {
cs->halted = 0;
}
- if (cs->interrupt_request & CPU_INTERRUPT_SIPI) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_SIPI)) {
cpu_synchronize_state(cs);
do_cpu_sipi(cpu);
}
- if (cs->interrupt_request & CPU_INTERRUPT_TPR) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_TPR)) {
cs->interrupt_request &= ~CPU_INTERRUPT_TPR;
cpu_synchronize_state(cs);
apic_handle_tpr_access_report(cpu->apic_state, env->eip,
diff --git a/target/i386/nvmm/nvmm-all.c b/target/i386/nvmm/nvmm-all.c
index 92e3b8b2f4..c1ac74c4f0 100644
--- a/target/i386/nvmm/nvmm-all.c
+++ b/target/i386/nvmm/nvmm-all.c
@@ -413,11 +413,11 @@ nvmm_vcpu_pre_run(CPUState *cpu)
* Force the VCPU out of its inner loop to process any INIT requests
* or commit pending TPR access.
*/
- if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
cpu->exit_request = 1;
}
- if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+ if (!has_event && cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI)) {
if (nvmm_can_take_nmi(cpu)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
event->type = NVMM_VCPU_EVENT_INTR;
@@ -426,7 +426,7 @@ nvmm_vcpu_pre_run(CPUState *cpu)
}
}
- if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+ if (!has_event && cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
if (nvmm_can_take_int(cpu)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
event->type = NVMM_VCPU_EVENT_INTR;
@@ -436,7 +436,7 @@ nvmm_vcpu_pre_run(CPUState *cpu)
}
/* Don't want SMIs. */
- if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_SMI)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
}
@@ -651,9 +651,9 @@ nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
bql_lock();
- if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (!(cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
(cpu_env(cpu)->eflags & IF_MASK)) &&
- !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+ !cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI)) {
cpu->exception_index = EXCP_HLT;
cpu->halted = true;
ret = 1;
@@ -691,25 +691,25 @@ nvmm_vcpu_loop(CPUState *cpu)
* Some asynchronous events must be handled outside of the inner
* VCPU loop. They are handled here.
*/
- if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT)) {
nvmm_cpu_synchronize_state(cpu);
do_cpu_init(x86_cpu);
/* set int/nmi windows back to the reset state */
}
- if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_POLL)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
apic_poll_irq(x86_cpu->apic_state);
}
- if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if ((cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK)) ||
- (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+ cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI)) {
cpu->halted = false;
}
- if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_SIPI)) {
nvmm_cpu_synchronize_state(cpu);
do_cpu_sipi(x86_cpu);
}
- if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
nvmm_cpu_synchronize_state(cpu);
apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
diff --git a/target/i386/tcg/system/seg_helper.c b/target/i386/tcg/system/seg_helper.c
index d4ea890c12..794a23ddfc 100644
--- a/target/i386/tcg/system/seg_helper.c
+++ b/target/i386/tcg/system/seg_helper.c
@@ -133,7 +133,7 @@ bool x86_cpu_exec_halt(CPUState *cpu)
X86CPU *x86_cpu = X86_CPU(cpu);
CPUX86State *env = &x86_cpu->env;
- if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_POLL)) {
bql_lock();
apic_poll_irq(x86_cpu->apic_state);
cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index b72dcff3c8..878cdd1668 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -1436,9 +1436,9 @@ static int whpx_handle_halt(CPUState *cpu)
int ret = 0;
bql_lock();
- if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (!(cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
(cpu_env(cpu)->eflags & IF_MASK)) &&
- !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+ !cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI)) {
cpu->exception_index = EXCP_HLT;
cpu->halted = true;
ret = 1;
@@ -1469,15 +1469,15 @@ static void whpx_vcpu_pre_run(CPUState *cpu)
/* Inject NMI */
if (!vcpu->interruption_pending &&
- cpu->interrupt_request & (CPU_INTERRUPT_NMI | CPU_INTERRUPT_SMI)) {
- if (cpu->interrupt_request & CPU_INTERRUPT_NMI) {
+ cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI | CPU_INTERRUPT_SMI)) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
vcpu->interruptable = false;
new_int.InterruptionType = WHvX64PendingNmi;
new_int.InterruptionPending = 1;
new_int.InterruptionVector = 2;
}
- if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_SMI)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
}
}
@@ -1486,12 +1486,12 @@ static void whpx_vcpu_pre_run(CPUState *cpu)
* Force the VCPU out of its inner loop to process any INIT requests or
* commit pending TPR access.
*/
- if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
- if ((cpu->interrupt_request & CPU_INTERRUPT_INIT) &&
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
!(env->hflags & HF_SMM_MASK)) {
cpu->exit_request = 1;
}
- if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
cpu->exit_request = 1;
}
}
@@ -1501,7 +1501,7 @@ static void whpx_vcpu_pre_run(CPUState *cpu)
if (!vcpu->interruption_pending &&
vcpu->interruptable && (env->eflags & IF_MASK)) {
assert(!new_int.InterruptionPending);
- if (cpu->interrupt_request & CPU_INTERRUPT_HARD) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
irq = cpu_get_pic_interrupt(env);
if (irq >= 0) {
@@ -1519,7 +1519,7 @@ static void whpx_vcpu_pre_run(CPUState *cpu)
reg_count += 1;
}
} else if (vcpu->ready_for_pic_interrupt &&
- (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+ cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
irq = cpu_get_pic_interrupt(env);
if (irq >= 0) {
@@ -1546,7 +1546,7 @@ static void whpx_vcpu_pre_run(CPUState *cpu)
/* Update the state of the interrupt delivery notification */
if (!vcpu->window_registered &&
- cpu->interrupt_request & CPU_INTERRUPT_HARD) {
+ cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
reg_values[reg_count].DeliverabilityNotifications =
(WHV_X64_DELIVERABILITY_NOTIFICATIONS_REGISTER) {
.InterruptNotification = 1
@@ -1599,30 +1599,30 @@ static void whpx_vcpu_process_async_events(CPUState *cpu)
CPUX86State *env = &x86_cpu->env;
AccelCPUState *vcpu = cpu->accel;
- if ((cpu->interrupt_request & CPU_INTERRUPT_INIT) &&
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
!(env->hflags & HF_SMM_MASK)) {
whpx_cpu_synchronize_state(cpu);
do_cpu_init(x86_cpu);
vcpu->interruptable = true;
}
- if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_POLL)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
apic_poll_irq(x86_cpu->apic_state);
}
- if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if ((cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD) &&
(env->eflags & IF_MASK)) ||
- (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+ cpu_test_interrupt(cpu, CPU_INTERRUPT_NMI)) {
cpu->halted = false;
}
- if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_SIPI)) {
whpx_cpu_synchronize_state(cpu);
do_cpu_sipi(x86_cpu);
}
- if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+ if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
whpx_cpu_synchronize_state(cpu);
apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index abad84c054..3a7621c0ea 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -376,7 +376,7 @@ static bool loongarch_cpu_has_work(CPUState *cs)
{
bool has_work = false;
- if ((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
cpu_loongarch_hw_interrupts_pending(cpu_env(cs))) {
has_work = true;
}
diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index 6a09db3a6f..f1b673119d 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -74,7 +74,7 @@ static void m68k_restore_state_to_opc(CPUState *cs,
#ifndef CONFIG_USER_ONLY
static bool m68k_cpu_has_work(CPUState *cs)
{
- return cs->interrupt_request & CPU_INTERRUPT_HARD;
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD);
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index ee0a869a94..22231f09e6 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -129,7 +129,7 @@ static void mb_restore_state_to_opc(CPUState *cs,
#ifndef CONFIG_USER_ONLY
static bool mb_cpu_has_work(CPUState *cs)
{
- return cs->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI);
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI);
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index 1f6c41fd34..5989c3ba17 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -145,7 +145,7 @@ static bool mips_cpu_has_work(CPUState *cs)
* check for interrupts that can be taken. For pre-release 6 CPUs,
* check for CP0 Config7 'Wait IE ignore' bit.
*/
- if ((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
cpu_mips_hw_interrupts_pending(env)) {
if (cpu_mips_hw_interrupts_enabled(env) ||
(env->CP0_Config7 & (1 << CP0C7_WII)) ||
@@ -160,7 +160,7 @@ static bool mips_cpu_has_work(CPUState *cs)
* The QEMU model will issue an _WAKE request whenever the CPUs
* should be woken up.
*/
- if (cs->interrupt_request & CPU_INTERRUPT_WAKE) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_WAKE)) {
has_work = true;
}
@@ -170,7 +170,7 @@ static bool mips_cpu_has_work(CPUState *cs)
}
/* MIPS Release 6 has the ability to halt the CPU. */
if (env->CP0_Config5 & (1 << CP0C5_VP)) {
- if (cs->interrupt_request & CPU_INTERRUPT_WAKE) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_WAKE)) {
has_work = true;
}
if (!mips_vp_active(env)) {
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index ec53acb51a..450947c3fa 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -144,7 +144,7 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
bql_lock();
- if ((cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
cpu_mips_io_interrupts_pending(cpu)) {
intr.cpu = -1;
intr.irq = 2;
diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index dfbb2df643..9bbfe22ed3 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -78,8 +78,7 @@ static void openrisc_restore_state_to_opc(CPUState *cs,
#ifndef CONFIG_USER_ONLY
static bool openrisc_cpu_has_work(CPUState *cs)
{
- return cs->interrupt_request & (CPU_INTERRUPT_HARD |
- CPU_INTERRUPT_TIMER);
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD | CPU_INTERRUPT_TIMER);
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index a0e77f2673..db841f1260 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7225,7 +7225,7 @@ static int ppc_cpu_mmu_index(CPUState *cs, bool ifetch)
#ifndef CONFIG_USER_ONLY
static bool ppc_cpu_has_work(CPUState *cs)
{
- return cs->interrupt_request & CPU_INTERRUPT_HARD;
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD);
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 015658049e..d145774b09 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1354,7 +1354,7 @@ static int kvmppc_handle_halt(PowerPCCPU *cpu)
CPUState *cs = CPU(cpu);
CPUPPCState *env = &cpu->env;
- if (!(cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ if (!cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
FIELD_EX64(env->msr, MSR, EE)) {
cs->halted = 1;
cs->exception_index = EXCP_HLT;
diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index c6dd5d6f83..da02ae7bf8 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -75,8 +75,7 @@ static void rx_restore_state_to_opc(CPUState *cs,
static bool rx_cpu_has_work(CPUState *cs)
{
- return cs->interrupt_request &
- (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIR);
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIR);
}
static int rx_cpu_mmu_index(CPUState *cs, bool ifunc)
diff --git a/target/rx/helper.c b/target/rx/helper.c
index 0640ab322b..ce003af421 100644
--- a/target/rx/helper.c
+++ b/target/rx/helper.c
@@ -44,7 +44,7 @@ void rx_cpu_unpack_psw(CPURXState *env, uint32_t psw, int rte)
void rx_cpu_do_interrupt(CPUState *cs)
{
CPURXState *env = cpu_env(cs);
- int do_irq = cs->interrupt_request & INT_FLAGS;
+ int do_irq = cpu_test_interrupt(cs, INT_FLAGS);
uint32_t save_psw;
env->in_sleep = 0;
diff --git a/target/s390x/cpu-system.c b/target/s390x/cpu-system.c
index 709ccd5299..f3a9ffb2a2 100644
--- a/target/s390x/cpu-system.c
+++ b/target/s390x/cpu-system.c
@@ -49,7 +49,7 @@ bool s390_cpu_has_work(CPUState *cs)
return false;
}
- if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
+ if (!cpu_test_interrupt(cs, CPU_INTERRUPT_HARD)) {
return false;
}
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index 4f561e8c91..21ccb86df4 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -108,7 +108,7 @@ static bool superh_io_recompile_replay_branch(CPUState *cs,
static bool superh_cpu_has_work(CPUState *cs)
{
- return cs->interrupt_request & CPU_INTERRUPT_HARD;
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD);
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index fb7642bda1..1744ef0e6d 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -58,7 +58,7 @@ int cpu_sh4_is_cached(CPUSH4State *env, target_ulong addr)
void superh_cpu_do_interrupt(CPUState *cs)
{
CPUSH4State *env = cpu_env(cs);
- int do_irq = cs->interrupt_request & CPU_INTERRUPT_HARD;
+ int do_irq = cpu_test_interrupt(cs, CPU_INTERRUPT_HARD);
int do_exp, irq_vector = cs->exception_index;
/* prioritize exceptions over interrupts */
diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index 245caf2de0..c9773f1540 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -783,7 +783,7 @@ static void sparc_restore_state_to_opc(CPUState *cs,
#ifndef CONFIG_USER_ONLY
static bool sparc_cpu_has_work(CPUState *cs)
{
- return (cs->interrupt_request & CPU_INTERRUPT_HARD) &&
+ return cpu_test_interrupt(cs, CPU_INTERRUPT_HARD) &&
cpu_interrupts_enabled(cpu_env(cs));
}
#endif /* !CONFIG_USER_ONLY */
diff --git a/target/sparc/int64_helper.c b/target/sparc/int64_helper.c
index bd14c7a0db..49e4e51c6d 100644
--- a/target/sparc/int64_helper.c
+++ b/target/sparc/int64_helper.c
@@ -89,7 +89,7 @@ void cpu_check_irqs(CPUSPARCState *env)
* the next bit is (2 << psrpil).
*/
if (pil < (2 << env->psrpil)) {
- if (cs->interrupt_request & CPU_INTERRUPT_HARD) {
+ if (cpu_test_interrupt(cs, CPU_INTERRUPT_HARD)) {
trace_sparc64_cpu_check_irqs_reset_irq(env->interrupt_index);
env->interrupt_index = 0;
cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
@@ -120,7 +120,7 @@ void cpu_check_irqs(CPUSPARCState *env)
break;
}
}
- } else if (cs->interrupt_request & CPU_INTERRUPT_HARD) {
+ } else if (cpu_test_interrupt(cs, CPU_INTERRUPT_HARD)) {
trace_sparc64_cpu_check_irqs_disabled(pil, env->pil_in, env->softint,
env->interrupt_index);
env->interrupt_index = 0;
--
2.47.1
* [PATCH v3 10/10] tcg: move interrupt caching and single step masking closer to user
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (8 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 09/10] use cpu_test_interrupt() instead of oppen coding checks tree wide Igor Mammedov
@ 2025-08-08 12:01 ` Igor Mammedov
2025-08-11 5:36 ` [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Michael S. Tsirkin
10 siblings, 0 replies; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 12:01 UTC (permalink / raw)
To: qemu-devel; +Cc: mst, pbonzini, peterx, david, philmd, mtosatti
In cpu_handle_interrupt() the only place where the cached interrupt_request
still has an effect is when CPU_INTERRUPT_SSTEP_MASK is applied and the
cached interrupt_request is handed over to cpu_exec_interrupt() and
need_replay_interrupt().
Simplify the code by moving the interrupt_request caching and
CPU_INTERRUPT_SSTEP_MASK masking into the block where they actually matter,
and drop reloading the cached value from CPUState::interrupt_request, since
the rest of the code uses CPUState::interrupt_request directly.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
accel/tcg/cpu-exec.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 1269c2c6ba..82867f456c 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -779,13 +779,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
qatomic_set_mb(&cpu->neg.icount_decr.u16.high, 0);
if (unlikely(cpu_test_interrupt(cpu, ~0))) {
- int interrupt_request;
bql_lock();
- interrupt_request = cpu->interrupt_request;
- if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
- /* Mask out external interrupts for this step. */
- interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
- }
if (cpu_test_interrupt(cpu, CPU_INTERRUPT_DEBUG)) {
cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
cpu->exception_index = EXCP_DEBUG;
@@ -804,6 +798,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
return true;
} else {
const TCGCPUOps *tcg_ops = cpu->cc->tcg_ops;
+ int interrupt_request = cpu->interrupt_request;
if (cpu_test_interrupt(cpu, CPU_INTERRUPT_RESET)) {
replay_interrupt();
@@ -812,6 +807,11 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
return true;
}
+ if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
+ /* Mask out external interrupts for this step. */
+ interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
+ }
+
/*
* The target hook has 3 exit conditions:
* False when the interrupt isn't processed,
@@ -836,9 +836,6 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
cpu->exception_index = -1;
*last_tb = NULL;
}
- /* The target hook may have updated the 'cpu->interrupt_request';
- * reload the 'interrupt_request' value */
- interrupt_request = cpu->interrupt_request;
}
#endif /* !CONFIG_USER_ONLY */
if (cpu_test_interrupt(cpu, CPU_INTERRUPT_EXITTB)) {
--
2.47.1
* Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
2025-08-08 12:01 ` [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
@ 2025-08-08 12:12 ` David Hildenbrand
2025-08-08 14:36 ` Igor Mammedov
2025-08-11 15:54 ` Peter Xu
1 sibling, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2025-08-08 12:12 UTC (permalink / raw)
To: Igor Mammedov, qemu-devel; +Cc: mst, pbonzini, peterx, philmd, mtosatti
On 08.08.25 14:01, Igor Mammedov wrote:
> This patch brings back Jan's idea [1] of BQL-free IO access
>
> This will let us make access to ACPI PM/HPET timers cheaper,
> and prevent BQL contention in case of workload that heavily
> uses the timers with a lot of vCPUs.
>
> 1) 196ea13104f (memory: Add global-locking property to memory regions)
> ... de7ea885c539 (kvm: Switch to unlocked MMIO)
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v3:
> add comment for 'mr->disable_reentrancy_guard = true'
> Peter Xu <peterx@redhat.com>
> ---
> include/system/memory.h | 10 ++++++++++
> system/memory.c | 15 +++++++++++++++
> system/physmem.c | 2 +-
> 3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/system/memory.h b/include/system/memory.h
> index e2cd6ed126..d04366c994 100644
> --- a/include/system/memory.h
> +++ b/include/system/memory.h
> @@ -833,6 +833,7 @@ struct MemoryRegion {
> bool nonvolatile;
> bool rom_device;
> bool flush_coalesced_mmio;
> + bool lockless_io;
> bool unmergeable;
> uint8_t dirty_log_mask;
> bool is_iommu;
> @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
> */
> void memory_region_clear_flush_coalesced(MemoryRegion *mr);
>
> +/**
> + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
> + *
> + * Enable BQL-free access for devices with fine-grained locking.
> + *
> + * @mr: the memory region to be updated.
> + */
> +void memory_region_enable_lockless_io(MemoryRegion *mr);
Is this safe to use on any IO region, or could actually something break
when mis-used? In case it's the latter, I assume we would want to
carefully document under which scenarios this is safe to use.
--
Cheers,
David / dhildenb
* Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
2025-08-08 12:12 ` David Hildenbrand
@ 2025-08-08 14:36 ` Igor Mammedov
2025-08-08 15:24 ` David Hildenbrand
0 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-08 14:36 UTC (permalink / raw)
To: David Hildenbrand; +Cc: qemu-devel, mst, pbonzini, peterx, philmd, mtosatti
On Fri, 8 Aug 2025 14:12:54 +0200
David Hildenbrand <david@redhat.com> wrote:
> On 08.08.25 14:01, Igor Mammedov wrote:
> > This patch brings back Jan's idea [1] of BQL-free IO access
> >
> > This will let us make access to ACPI PM/HPET timers cheaper,
> > and prevent BQL contention in case of workload that heavily
> > uses the timers with a lot of vCPUs.
> >
> > 1) 196ea13104f (memory: Add global-locking property to memory regions)
> > ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v3:
> > add comment for 'mr->disable_reentrancy_guard = true'
> > Peter Xu <peterx@redhat.com>
> > ---
> > include/system/memory.h | 10 ++++++++++
> > system/memory.c | 15 +++++++++++++++
> > system/physmem.c | 2 +-
> > 3 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/system/memory.h b/include/system/memory.h
> > index e2cd6ed126..d04366c994 100644
> > --- a/include/system/memory.h
> > +++ b/include/system/memory.h
> > @@ -833,6 +833,7 @@ struct MemoryRegion {
> > bool nonvolatile;
> > bool rom_device;
> > bool flush_coalesced_mmio;
> > + bool lockless_io;
> > bool unmergeable;
> > uint8_t dirty_log_mask;
> > bool is_iommu;
> > @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
> > */
> > void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >
> > +/**
> > + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
> > + *
> > + * Enable BQL-free access for devices with fine-grained locking.
> > + *
> > + * @mr: the memory region to be updated.
> > + */
> > +void memory_region_enable_lockless_io(MemoryRegion *mr);
>
> Is this safe to use on any IO region, or could actually something break
> when mis-used? In case it's the latter, I assume we would want to
> carefully document under which scenarios this is safe to use.
"for devices with fine-grained locking" is defining scope of where it's
applicable, in another words devices enabling this need to take care
of any locking if necessary.
in this series PM timer didn't need any, while HPET required
some refactoring to make it lock-less on main timer reads,
while taking per device lock for everything else.
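For illustration, a minimal sketch of what a device opting in could look
like (hypothetical device and field names; it only assumes the
memory_region_enable_lockless_io() helper added by this patch plus QEMU's
existing QemuMutex and MemoryRegionOps APIs):

#include "qemu/osdep.h"
#include "qemu/thread.h"
#include "hw/sysbus.h"
#include "system/memory.h"

typedef struct MyDevState {
    SysBusDevice parent_obj;
    QemuMutex lock;     /* fine-grained device lock, used instead of the BQL */
    MemoryRegion iomem;
    uint32_t regs[64];
} MyDevState;

static uint64_t mydev_read(void *opaque, hwaddr addr, unsigned size)
{
    MyDevState *s = opaque;
    uint64_t val;

    /* May be called without the BQL: the device's own lock protects state. */
    qemu_mutex_lock(&s->lock);
    val = s->regs[addr >> 2];
    qemu_mutex_unlock(&s->lock);
    return val;
}

static void mydev_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
{
    MyDevState *s = opaque;

    qemu_mutex_lock(&s->lock);
    s->regs[addr >> 2] = val;
    qemu_mutex_unlock(&s->lock);
}

static const MemoryRegionOps mydev_ops = {
    .read = mydev_read,
    .write = mydev_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
};

static void mydev_init_mmio(MyDevState *s)
{
    qemu_mutex_init(&s->lock);
    memory_region_init_io(&s->iomem, OBJECT(s), &mydev_ops, s, "mydev", 0x100);
    /* opt in to BQL-free dispatch; all locking is handled by the ops above */
    memory_region_enable_lockless_io(&s->iomem);
}

i.e. the ops never rely on the BQL; whatever state they touch is covered by
the device's own lock (or, as with the HPET main counter, by a lock-free
scheme).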
* Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
2025-08-08 14:36 ` Igor Mammedov
@ 2025-08-08 15:24 ` David Hildenbrand
2025-08-11 12:08 ` Igor Mammedov
0 siblings, 1 reply; 25+ messages in thread
From: David Hildenbrand @ 2025-08-08 15:24 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, peterx, philmd, mtosatti
On 08.08.25 16:36, Igor Mammedov wrote:
> On Fri, 8 Aug 2025 14:12:54 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
>> On 08.08.25 14:01, Igor Mammedov wrote:
>>> This patch brings back Jan's idea [1] of BQL-free IO access
>>>
>>> This will let us make access to ACPI PM/HPET timers cheaper,
>>> and prevent BQL contention in case of workload that heavily
>>> uses the timers with a lot of vCPUs.
>>>
>>> 1) 196ea13104f (memory: Add global-locking property to memory regions)
>>> ... de7ea885c539 (kvm: Switch to unlocked MMIO)
>>>
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>> v3:
>>> add comment for 'mr->disable_reentrancy_guard = true'
>>> Peter Xu <peterx@redhat.com>
>>> ---
>>> include/system/memory.h | 10 ++++++++++
>>> system/memory.c | 15 +++++++++++++++
>>> system/physmem.c | 2 +-
>>> 3 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/system/memory.h b/include/system/memory.h
>>> index e2cd6ed126..d04366c994 100644
>>> --- a/include/system/memory.h
>>> +++ b/include/system/memory.h
>>> @@ -833,6 +833,7 @@ struct MemoryRegion {
>>> bool nonvolatile;
>>> bool rom_device;
>>> bool flush_coalesced_mmio;
>>> + bool lockless_io;
>>> bool unmergeable;
>>> uint8_t dirty_log_mask;
>>> bool is_iommu;
>>> @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
>>> */
>>> void memory_region_clear_flush_coalesced(MemoryRegion *mr);
>>>
>>> +/**
>>> + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
>>> + *
>>> + * Enable BQL-free access for devices with fine-grained locking.
>>> + *
>>> + * @mr: the memory region to be updated.
>>> + */
>>> +void memory_region_enable_lockless_io(MemoryRegion *mr);
>>
>> Is this safe to use on any IO region, or could actually something break
>> when mis-used? In case it's the latter, I assume we would want to
>> carefully document under which scenarios this is safe to use.
>
> "for devices with fine-grained locking" is defining scope of where it's
> applicable, in another words devices enabling this need to take care
> of any locking if necessary.
Okay, I would have stressed that a bit more, something like
"Enable BQL-free access for devices that are well prepared to handle
locking during I/O themselves: either by doing fine grained locking or
by providing lock-free I/O schemes."
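(For reference, with that wording applied, the kernel-doc from the patch
would then read roughly as follows; only the comment text changes, the
declaration stays the same:)

/**
 * memory_region_enable_lockless_io: Enable lockless (BQL-free) access.
 *
 * Enable BQL-free access for devices that are well prepared to handle
 * locking during I/O themselves: either by doing fine grained locking or
 * by providing lock-free I/O schemes.
 *
 * @mr: the memory region to be updated.
 */
void memory_region_enable_lockless_io(MemoryRegion *mr);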
Nothing else jumped out at me.
--
Cheers,
David / dhildenb
* Re: [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO
2025-08-08 12:01 [PATCH v3 00/10] Reinvent BQL-free PIO/MMIO Igor Mammedov
` (9 preceding siblings ...)
2025-08-08 12:01 ` [PATCH v3 10/10] tcg: move interrupt caching and single step masking closer to user Igor Mammedov
@ 2025-08-11 5:36 ` Michael S. Tsirkin
10 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2025-08-11 5:36 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, pbonzini, peterx, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:27PM +0200, Igor Mammedov wrote:
> v3:
> * hpet: replace explicit atomics with use seqlock API (PeterX)
> * introduce cpu_test_interrupt() (Paolo)
> and use it tree wide for checking interrupts
> * don't take BQL for setting exit_request, use qatomic_set() instead. (Paolo)
> * after above change, relace conditional BQL with unconditional
> to simlify things a bit (Paolo)
> * drop not needed barriers (Paolo)
> * minor tcg:cpu_handle_interrupt() cleanup
>
> v2:
> * Make both read and write pathes BQL-less (Gerd)
> * Refactor HPET to handle lock-less access correctly
> when stopping/starting counter in parallel. (Peter Maydell)
> * Publish kvm-unit-tests HPET bench/torture test [1] to verify
> HPET lock-less handling
nice
acpi things:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> When booting WS2025 with following CLI
> 1) -M q35,hpet=off -cpu host -enable-kvm -smp 240,sockets=4
> the guest boots very slow and is sluggish after boot
> or it's stuck on boot at spinning circle (most of the time).
>
> pref shows that VM is experiencing heavy BQL contention on IO path
> which happens to be ACPI PM timer read access. A variation with
> HPET enabled moves contention to HPET timer read access.
> And it only gets worse with increasing number of VCPUs.
>
> Series prevents large VM vCPUs contending on BQL due to PM|HPET timer
> access and lets Windows to move on with boot process.
>
> Testing lock-less IO with HPET micro benchmark [2] shows approx 80%
> better performance than the current BLQ locked path.
> [chart https://ibb.co/MJY9999 shows much better scaling of lock-less
> IO compared to BQL one.]
>
> In my tests, with CLI WS2025 guest wasn't able to boot within 30min
> on both hosts
> * 32 core 2NUMA nodes
> * 448 cores 8NUMA nodes
> With ACPI PM timer in BQL-free read mode, guest boots within approx:
> * 2min
> * 1min
> respectively.
>
> With HPET enabled boot time shrinks ~2x
> * 4m13 -> 2m21
> * 2m19 -> 1m15
> respectively.
>
> 2) "[kvm-unit-tests PATCH v4 0/5] x86: add HPET counter tests"
> https://lore.kernel.org/kvm/20250725095429.1691734-1-imammedo@redhat.com/T/#t
> PS:
> Using hv-time=on cpu option helps a lot (when it works) and
> lets [1] guest boot fine in ~1-2min. Series doesn't make
> a significant impact in this case.
>
> PS2:
> Tested series with a bunch of different guests:
> RHEL-[6..10]x64, WS2012R2, WS2016, WS2022, WS2025
>
> PS3:
> dropped mention of https://bugzilla.redhat.com/show_bug.cgi?id=1322713
> as it's not reproducible with current software stack or even with
> the same qemu/seabios as reported (kernel versions mentioned in
> the report were interim ones and no longer available,
> so I've used nearest released at the time for testing)
>
> Igor Mammedov (10):
> memory: reintroduce BQL-free fine-grained PIO/MMIO
> acpi: mark PMTIMER as unlocked
> hpet: switch to fain-grained device locking
> hpet: move out main counter read into a separate block
> hpet: make main counter read lock-less
> introduce cpu_test_interrupt() that will replace open coded checks
> x86: kvm: use cpu_test_interrupt() instead of oppen coding checks
> kvm: i386: irqchip: take BQL only if there is an interrupt
> use cpu_test_interrupt() instead of oppen coding checks tree wide
> tcg: move interrupt caching and single step masking closer to user
>
> include/hw/core/cpu.h | 12 ++++++++
> include/system/memory.h | 10 +++++++
> accel/tcg/cpu-exec.c | 25 +++++++---------
> accel/tcg/tcg-accel-ops.c | 3 +-
> hw/acpi/core.c | 1 +
> hw/timer/hpet.c | 38 +++++++++++++++++++-----
> system/cpus.c | 3 +-
> system/memory.c | 15 ++++++++++
> system/physmem.c | 2 +-
> target/alpha/cpu.c | 8 ++---
> target/arm/cpu.c | 20 ++++++-------
> target/arm/helper.c | 16 +++++-----
> target/arm/hvf/hvf.c | 6 ++--
> target/avr/cpu.c | 2 +-
> target/hppa/cpu.c | 2 +-
> target/i386/hvf/hvf.c | 4 +--
> target/i386/hvf/x86hvf.c | 21 +++++++------
> target/i386/kvm/kvm.c | 46 ++++++++++++++---------------
> target/i386/nvmm/nvmm-all.c | 24 +++++++--------
> target/i386/tcg/system/seg_helper.c | 2 +-
> target/i386/whpx/whpx-all.c | 34 ++++++++++-----------
> target/loongarch/cpu.c | 2 +-
> target/m68k/cpu.c | 2 +-
> target/microblaze/cpu.c | 2 +-
> target/mips/cpu.c | 6 ++--
> target/mips/kvm.c | 2 +-
> target/openrisc/cpu.c | 3 +-
> target/ppc/cpu_init.c | 2 +-
> target/ppc/kvm.c | 2 +-
> target/rx/cpu.c | 3 +-
> target/rx/helper.c | 2 +-
> target/s390x/cpu-system.c | 2 +-
> target/sh4/cpu.c | 2 +-
> target/sh4/helper.c | 2 +-
> target/sparc/cpu.c | 2 +-
> target/sparc/int64_helper.c | 4 +--
> 36 files changed, 193 insertions(+), 139 deletions(-)
>
> --
> 2.47.1
* Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
2025-08-08 15:24 ` David Hildenbrand
@ 2025-08-11 12:08 ` Igor Mammedov
0 siblings, 0 replies; 25+ messages in thread
From: Igor Mammedov @ 2025-08-11 12:08 UTC (permalink / raw)
To: David Hildenbrand; +Cc: qemu-devel, mst, pbonzini, peterx, philmd, mtosatti
On Fri, 8 Aug 2025 17:24:54 +0200
David Hildenbrand <david@redhat.com> wrote:
> On 08.08.25 16:36, Igor Mammedov wrote:
> > On Fri, 8 Aug 2025 14:12:54 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >
> >> On 08.08.25 14:01, Igor Mammedov wrote:
> >>> This patch brings back Jan's idea [1] of BQL-free IO access
> >>>
> >>> This will let us make access to ACPI PM/HPET timers cheaper,
> >>> and prevent BQL contention in case of workload that heavily
> >>> uses the timers with a lot of vCPUs.
> >>>
> >>> 1) 196ea13104f (memory: Add global-locking property to memory regions)
> >>> ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> >>>
> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>> ---
> >>> v3:
> >>> add comment for 'mr->disable_reentrancy_guard = true'
> >>> Peter Xu <peterx@redhat.com>
> >>> ---
> >>> include/system/memory.h | 10 ++++++++++
> >>> system/memory.c | 15 +++++++++++++++
> >>> system/physmem.c | 2 +-
> >>> 3 files changed, 26 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/system/memory.h b/include/system/memory.h
> >>> index e2cd6ed126..d04366c994 100644
> >>> --- a/include/system/memory.h
> >>> +++ b/include/system/memory.h
> >>> @@ -833,6 +833,7 @@ struct MemoryRegion {
> >>> bool nonvolatile;
> >>> bool rom_device;
> >>> bool flush_coalesced_mmio;
> >>> + bool lockless_io;
> >>> bool unmergeable;
> >>> uint8_t dirty_log_mask;
> >>> bool is_iommu;
> >>> @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
> >>> */
> >>> void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >>>
> >>> +/**
> >>> + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
> >>> + *
> >>> + * Enable BQL-free access for devices with fine-grained locking.
> >>> + *
> >>> + * @mr: the memory region to be updated.
> >>> + */
> >>> +void memory_region_enable_lockless_io(MemoryRegion *mr);
> >>
> >> Is this safe to use on any IO region, or could actually something break
> >> when mis-used? In case it's the latter, I assume we would want to
> >> carefully document under which scenarios this is safe to use.
> >
> > "for devices with fine-grained locking" is defining scope of where it's
> > applicable, in another words devices enabling this need to take care
> > of any locking if necessary.
>
> Okay, I would have stressed that a bit more, something like
>
> "Enable BQL-free access for devices that are well prepared to handle
> locking during I/O themselves: either by doing fine grained locking or
> by providing lock-free I/O schemes."
Thank,
I'll fix it up on respin
>
> Nothing else jumped at me.
>
* Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
2025-08-08 12:01 ` [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-08-08 12:12 ` David Hildenbrand
@ 2025-08-11 15:54 ` Peter Xu
1 sibling, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-08-11 15:54 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:28PM +0200, Igor Mammedov wrote:
> This patch brings back Jan's idea [1] of BQL-free IO access
>
> This will let us make access to ACPI PM/HPET timers cheaper,
> and prevent BQL contention in case of workload that heavily
> uses the timers with a lot of vCPUs.
>
> 1) 196ea13104f (memory: Add global-locking property to memory regions)
> ... de7ea885c539 (kvm: Switch to unlocked MMIO)
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v3 02/10] acpi: mark PMTIMER as unlocked
2025-08-08 12:01 ` [PATCH v3 02/10] acpi: mark PMTIMER as unlocked Igor Mammedov
@ 2025-08-11 15:55 ` Peter Xu
0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-08-11 15:55 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:29PM +0200, Igor Mammedov wrote:
> Reading QEMU_CLOCK_VIRTUAL is thread-safe, write access is NOP.
>
> This makes possible to boot Windows with large vCPUs count when
> hv-time is not used.
>
> Reproducer:
> -M q35,hpet=off -cpu host -enable-kvm -smp 240,sockets=4 -m 8G WS2025.img
> fails to boot within 30min.
>
> With this fix it boots within 2-1min.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v3 03/10] hpet: switch to fain-grained device locking
2025-08-08 12:01 ` [PATCH v3 03/10] hpet: switch to fain-grained device locking Igor Mammedov
@ 2025-08-11 15:56 ` Peter Xu
0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-08-11 15:56 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:30PM +0200, Igor Mammedov wrote:
> as a step towards lock-less HPET counter read,
> use per device locking instead of BQL.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v3 04/10] hpet: move out main counter read into a separate block
2025-08-08 12:01 ` [PATCH v3 04/10] hpet: move out main counter read into a separate block Igor Mammedov
@ 2025-08-11 15:56 ` Peter Xu
0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-08-11 15:56 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:31PM +0200, Igor Mammedov wrote:
> Follow up patche will switch main counter read to
> lock-less mode. As preparation for that move relevant
> branch into a separate top level block to make followup
> patch cleaner/simplier by reducing contextual noise
> when lock-less read is introduced.
>
> no functional changes.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v3 05/10] hpet: make main counter read lock-less
2025-08-08 12:01 ` [PATCH v3 05/10] hpet: make main counter read lock-less Igor Mammedov
@ 2025-08-11 15:58 ` Peter Xu
0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-08-11 15:58 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:32PM +0200, Igor Mammedov wrote:
> Make access to main HPET counter lock-less.
>
> In unlikely event of an update in progress, readers will busy wait
> untill update is finished.
>
> As result micro benchmark of concurrent reading of HPET counter
> with large number of vCPU shows over 80% better (less) latency.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v3 08/10] kvm: i386: irqchip: take BQL only if there is an interrupt
2025-08-08 12:01 ` [PATCH v3 08/10] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
@ 2025-08-11 16:22 ` Peter Xu
0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-08-11 16:22 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:35PM +0200, Igor Mammedov wrote:
> when kernel-irqchip=split is used, QEMU still hits BQL
> contention issue when reading ACPI PM/HPET timers
> (despite of timer[s] access being lock-less).
>
> So Windows with more than 255 cpus is still not able to
> boot (since it requires iommu -> split irqchip).
>
> Problematic path is in kvm_arch_pre_run() where BQL is taken
> unconditionally when split irqchip is in use.
>
> There are a few parts that BQL protects there:
> 1. interrupt check and injecting
>
> however we do not take BQL when checking for pending
> interrupt (even within the same function), so the patch
> takes the same approach for cpu->interrupt_request checks
> and takes BQL only if there is a job to do.
>
> 2. request_interrupt_window access
> CPUState::kvm_run::request_interrupt_window doesn't need BQL
> as it's accessed by its own vCPU thread.
>
> 3. cr8/cpu_get_apic_tpr access
> the same (as #2) applies to CPUState::kvm_run::cr8,
> and APIC registers are also cached/synced (get/put) within
> the vCPU thread it belongs to.
>
> Taking BQL only when is necessary, eleminates BQL bottleneck on
> IO/MMIO only exit path, improoving latency by 80% on HPET micro
> benchmark.
>
> This lets Windows to boot succesfully (in case hv-time isn't used)
> when more than 255 vCPUs are in use.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v3:
> * drop net needed pair of () in cpu->interrupt_request & CPU_INTERRUPT_HARD
> check
> * Paolo Bonzini <pbonzini@redhat.com>
> * don't take BQL when setting exit_request, use qatomic_set() instead
> * after above simplification take/release BQL unconditionally
> * drop smp_mb() after run->cr8/run->request_interrupt_window update
> ---
> target/i386/kvm/kvm.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index a7b5c8f81b..306430a052 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5478,9 +5478,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> }
> }
>
> - if (!kvm_pic_in_kernel()) {
> - bql_lock();
> - }
>
> /* Force the VCPU out of its inner loop to process any INIT requests
> * or (for userspace APIC, but it is cheap to combine the checks here)
> @@ -5489,10 +5486,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
> !(env->hflags & HF_SMM_MASK)) {
> - cpu->exit_request = 1;
> + qatomic_set(&cpu->exit_request, 1);
> }
> if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
> - cpu->exit_request = 1;
> + qatomic_set(&cpu->exit_request, 1);
I still see plenty of cases where exit_request is updated without
qatomic_set().. there's even one in kvm.c (kvm_arch_process_async_events),
i.e. in the same file.
Maybe it would make more sense to split these two changes out, e.g. into a
single patch converting exit_request setters to qatomic_set() tree-wide?
Not that it's a huge deal, though..
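(For illustration only, a sketch of what such a tree-wide conversion would
look like; kvm_arch_process_async_events() is just the example named above,
and the setter pairs with a qatomic_read() on the vCPU-thread reader side:)

    /* writer side, e.g. in kvm_arch_process_async_events() (sketch) */
    qatomic_set(&cpu->exit_request, 1);

    /* reader side, vCPU thread, before re-entering the guest (sketch) */
    if (qatomic_read(&cpu->exit_request)) {
        /* bail out to the main loop */
    }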
Reviewed-by: Peter Xu <peterx@redhat.com>
> }
> }
>
> @@ -5503,6 +5500,8 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> (env->eflags & IF_MASK)) {
> int irq;
>
> + bql_lock();
> +
> cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> irq = cpu_get_pic_interrupt(env);
> if (irq >= 0) {
> @@ -5517,6 +5516,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> strerror(-ret));
> }
> }
> + bql_unlock();
> }
>
> /* If we have an interrupt but the guest is not ready to receive an
> @@ -5531,8 +5531,6 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>
> DPRINTF("setting tpr\n");
> run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
> -
> - bql_unlock();
> }
> }
>
> --
> 2.47.1
>
--
Peter Xu
* Re: [PATCH v3 06/10] introduce cpu_test_interrupt() that will replace open coded checks
2025-08-08 12:01 ` [PATCH v3 06/10] introduce cpu_test_interrupt() that will replace open coded checks Igor Mammedov
@ 2025-08-11 16:31 ` Peter Xu
2025-08-12 15:00 ` Igor Mammedov
0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2025-08-11 16:31 UTC (permalink / raw)
To: Igor Mammedov; +Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti
On Fri, Aug 08, 2025 at 02:01:33PM +0200, Igor Mammedov wrote:
> the helper forms load-acquire/store-release pair with
> tcg_handle_interrupt/generic_handle_interrupt and can be used
> for checking interrupts outside of BQL
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> include/hw/core/cpu.h | 12 ++++++++++++
> accel/tcg/tcg-accel-ops.c | 3 ++-
> system/cpus.c | 3 ++-
> 3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 5eaf41a566..d0460c01cf 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -942,6 +942,18 @@ CPUState *cpu_by_arch_id(int64_t id);
>
> void cpu_interrupt(CPUState *cpu, int mask);
>
> +/**
> + * cpu_test_interrupt:
> + * @cpu: The CPU to check interrupt(s) on.
> + * @mask: The interrupts to check.
> + *
> + * Checks if any of interrupts in @mask are pending on @cpu.
> + */
> +static inline bool cpu_test_interrupt(CPUState *cpu, int mask)
> +{
> + return qatomic_load_acquire(&cpu->interrupt_request) & mask;
> +}
> +
> /**
> * cpu_set_pc:
> * @cpu: The CPU to set the program counter for.
> diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
> index 3b0d7d298e..02c7600bb7 100644
> --- a/accel/tcg/tcg-accel-ops.c
> +++ b/accel/tcg/tcg-accel-ops.c
> @@ -97,7 +97,8 @@ static void tcg_cpu_reset_hold(CPUState *cpu)
> /* mask must never be zero, except for A20 change call */
> void tcg_handle_interrupt(CPUState *cpu, int mask)
> {
> - cpu->interrupt_request |= mask;
> + qatomic_store_release(&cpu->interrupt_request,
> + cpu->interrupt_request | mask);
>
> /*
> * If called from iothread context, wake the target cpu in
> diff --git a/system/cpus.c b/system/cpus.c
> index 256723558d..8e543488c3 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -256,7 +256,8 @@ int64_t cpus_get_elapsed_ticks(void)
>
> void generic_handle_interrupt(CPUState *cpu, int mask)
> {
> - cpu->interrupt_request |= mask;
> + qatomic_store_release(&cpu->interrupt_request,
> + cpu->interrupt_request | mask);
>
> if (!qemu_cpu_is_self(cpu)) {
> qemu_cpu_kick(cpu);
> --
> 2.47.1
>
Besides the two writers mentioned above, I still see some more:
*** accel/tcg/user-exec.c:
cpu_interrupt[52] cpu->interrupt_request |= mask;
*** hw/intc/s390_flic.c:
qemu_s390_flic_notify[193] cs->interrupt_request |= CPU_INTERRUPT_HARD;
*** hw/openrisc/cputimer.c:
openrisc_timer_cb[108] cs->interrupt_request |= CPU_INTERRUPT_TIMER;
*** target/arm/helper.c:
arm_cpu_do_interrupt[9150] cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
*** target/i386/tcg/system/svm_helper.c:
helper_vmrun[406] cs->interrupt_request |= CPU_INTERRUPT_VIRQ;
Should they be converted to use store_release as well?
The other nitpick is that this patch introduces the reader helper without
using it.
It could be squashed into the other patch where the reader helper is applied
tree-wide. IMHO that would better explain the helper on its own.
Thanks,
--
Peter Xu
* Re: [PATCH v3 06/10] introduce cpu_test_interrupt() that will replace open coded checks
2025-08-11 16:31 ` Peter Xu
@ 2025-08-12 15:00 ` Igor Mammedov
2025-08-12 16:10 ` Peter Xu
0 siblings, 1 reply; 25+ messages in thread
From: Igor Mammedov @ 2025-08-12 15:00 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti,
David Hildenbrand, Stafford Horne, richard.henderson,
Philippe Mathieu-Daudé, Alex Bennée, Peter Maydell
On Mon, 11 Aug 2025 12:31:27 -0400
Peter Xu <peterx@redhat.com> wrote:
> On Fri, Aug 08, 2025 at 02:01:33PM +0200, Igor Mammedov wrote:
> > the helper forms load-acquire/store-release pair with
> > tcg_handle_interrupt/generic_handle_interrupt and can be used
> > for checking interrupts outside of BQL
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > include/hw/core/cpu.h | 12 ++++++++++++
> > accel/tcg/tcg-accel-ops.c | 3 ++-
> > system/cpus.c | 3 ++-
> > 3 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > index 5eaf41a566..d0460c01cf 100644
> > --- a/include/hw/core/cpu.h
> > +++ b/include/hw/core/cpu.h
> > @@ -942,6 +942,18 @@ CPUState *cpu_by_arch_id(int64_t id);
> >
> > void cpu_interrupt(CPUState *cpu, int mask);
> >
> > +/**
> > + * cpu_test_interrupt:
> > + * @cpu: The CPU to check interrupt(s) on.
> > + * @mask: The interrupts to check.
> > + *
> > + * Checks if any of interrupts in @mask are pending on @cpu.
> > + */
> > +static inline bool cpu_test_interrupt(CPUState *cpu, int mask)
> > +{
> > + return qatomic_load_acquire(&cpu->interrupt_request) & mask;
> > +}
> > +
> > /**
> > * cpu_set_pc:
> > * @cpu: The CPU to set the program counter for.
> > diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
> > index 3b0d7d298e..02c7600bb7 100644
> > --- a/accel/tcg/tcg-accel-ops.c
> > +++ b/accel/tcg/tcg-accel-ops.c
> > @@ -97,7 +97,8 @@ static void tcg_cpu_reset_hold(CPUState *cpu)
> > /* mask must never be zero, except for A20 change call */
> > void tcg_handle_interrupt(CPUState *cpu, int mask)
> > {
> > - cpu->interrupt_request |= mask;
> > + qatomic_store_release(&cpu->interrupt_request,
> > + cpu->interrupt_request | mask);
> >
> > /*
> > * If called from iothread context, wake the target cpu in
> > diff --git a/system/cpus.c b/system/cpus.c
> > index 256723558d..8e543488c3 100644
> > --- a/system/cpus.c
> > +++ b/system/cpus.c
> > @@ -256,7 +256,8 @@ int64_t cpus_get_elapsed_ticks(void)
> >
> > void generic_handle_interrupt(CPUState *cpu, int mask)
> > {
> > - cpu->interrupt_request |= mask;
> > + qatomic_store_release(&cpu->interrupt_request,
> > + cpu->interrupt_request | mask);
> >
> > if (!qemu_cpu_is_self(cpu)) {
> > qemu_cpu_kick(cpu);
> > --
> > 2.47.1
> >
>
> Besides the two writters mentioned above, I still see some more:
>
> *** accel/tcg/user-exec.c:
> cpu_interrupt[52] cpu->interrupt_request |= mask;
I don't know whether external interrupts can happen in user mode;
for same-thread (self) exceptions we don't really need it.
> *** hw/intc/s390_flic.c:
> qemu_s390_flic_notify[193] cs->interrupt_request |= CPU_INTERRUPT_HARD;
Later in the function, for CPUs it deems not to ignore, it explicitly
calls cpu_interrupt(), which will do the store_release.
I'd tentatively say we don't need an explicit store_release here.
Anyway, CCing David; perhaps he recalls why it sets the interrupt for all
CPUs but skips actually raising it for some.
> *** hw/openrisc/cputimer.c:
> openrisc_timer_cb[108] cs->interrupt_request |= CPU_INTERRUPT_TIMER;
This one seems to be similar to generic_handle_interrupt(): an interrupt
request from an external thread, so I'd add store_release here.
Arguably, it should be calling cpu_interrupt() instead of open-coding it.
(The questionable part is that it sets the interrupt conditionally but
always kicks the vCPU - is this really necessary?)
> *** target/arm/helper.c:
> arm_cpu_do_interrupt[9150] cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
On KVM it can be reached via the main thread or the vCPU (self) thread,
so tentatively we need it here.
(PS: kvm_arch_on_sigbus_vcpu()->arm_cpu_do_interrupt() is called under BQL
while kvm_on_sigbus() is called without it from the signal handler, which
theoretically might trample on the thread running the 1st vCPU.)
> *** target/i386/tcg/system/svm_helper.c:
> helper_vmrun[406] cs->interrupt_request |= CPU_INTERRUPT_VIRQ;
This one seems to be self-targeted (i.e. same thread); perhaps TCG experts
can comment on it.
CCing TCG folks, as the series touches a few TCG bits anyway.
> Do they better as well be converted to use store_release too?
Alternatively, for consistency's sake we can add a helper to set interrupts
with store_release included and do a blanket tree-wide change like with
cpu_test_interrupt(), along the lines of the sketch below.
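Something like this (just a sketch to illustrate the idea; the helper does
not exist in this series, but the store_release is the same one the series
already uses in tcg_handle_interrupt()/generic_handle_interrupt() and pairs
with the load_acquire in cpu_test_interrupt()):

/**
 * cpu_set_interrupt:
 * @cpu: The CPU to set the interrupt(s) on.
 * @mask: The interrupts to set.
 *
 * Sets @mask in @cpu's interrupt_request with store_release semantics,
 * pairing with the load_acquire in cpu_test_interrupt().
 */
static inline void cpu_set_interrupt(CPUState *cpu, int mask)
{
    qatomic_store_release(&cpu->interrupt_request,
                          cpu->interrupt_request | mask);
}

Open-coded setters like the ones listed above would then become
cpu_set_interrupt(cs, CPU_INTERRUPT_HARD) and so on.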
> The other nitpick is this patch introduces the reader helper but without
> using it.
>
> It can be squashed into the other patch where the reader helper will be
> applied tree-wide. IMHO that would be better explaining the helper on its
> own.
I can merge it with 7/10, which adds the 1st user of the helper in the
kvm/i386 code. That has less chance of the store/acquire change being
drowned in the tree-wide patch (9/10).
>
> Thanks,
>
* Re: [PATCH v3 06/10] introduce cpu_test_interrupt() that will replace open coded checks
2025-08-12 15:00 ` Igor Mammedov
@ 2025-08-12 16:10 ` Peter Xu
0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-08-12 16:10 UTC (permalink / raw)
To: Igor Mammedov
Cc: qemu-devel, mst, pbonzini, david, philmd, mtosatti,
Stafford Horne, richard.henderson, Alex Bennée,
Peter Maydell
On Tue, Aug 12, 2025 at 05:00:35PM +0200, Igor Mammedov wrote:
> > Do they better as well be converted to use store_release too?
>
> alternatively, for consistency sake we can add a helper to set interrupt
> with store_release included and do a blanket tree wide change like with
> cpu_test_interrupt().
Yep, that sounds like the simplest. Unless there's a concern about possible
performance regressions due to the remaining conversions to store_release;
those can be special-cased with open code, tagged with the reasoning in a
comment.
>
> > The other nitpick is this patch introduces the reader helper but without
> > using it.
> >
> > It can be squashed into the other patch where the reader helper will be
> > applied tree-wide. IMHO that would be better explaining the helper on its
> > own.
>
> I can merge it with 7/10 that adds 1st user for the helper in kvm/i386 code.
> That has less chances for the store/acquire change to be drowned in
> tree wide patch (9/10)
For memory barrier changes, IMHO the two sides are better off in one patch,
hence the two helpers should appear in one patch to show their pairing.
That's what this patch does.
Then, from any helper's POV, it's better to introduce the helper in the
patch where it will be used, to justify it with the applicable context.
From that POV, the cleaner way, IMHO, would be one prior patch introducing
the reader/writer helpers and using them globally for existing users. Then
the kvm patch can use the new helpers.
No strong opinion here, though..
Thanks,
--
Peter Xu