From: Igor Mammedov <imammedo@redhat.com>
To: "Philippe Mathieu-Daudé" <philmd@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
qemu-ppc <qemu-ppc@nongnu.org>, Peter Xu <peterx@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
mtosatti@redhat.com, qemu-devel@nongnu.org,
Peter Collingbourne <pcc@google.com>,
Alexander Graf <agraf@csgraf.de>,
Roman Bolshakov <r.bolshakov@yadro.com>,
Sergio Lopez <slp@redhat.com>, qemu-arm <qemu-arm@nongnu.org>,
Mohamed Mediouni <mohamed@unpredictable.fr>
Subject: Re: [PATCH v4 7/8] kvm: i386: irqchip: take BQL only if there is an interrupt
Date: Wed, 27 Aug 2025 10:40:52 +0200 [thread overview]
Message-ID: <20250827104052.655134e9@fedora> (raw)
In-Reply-To: <c16db8b9-9b73-49c4-8233-370c1b46a382@linaro.org>
On Mon, 25 Aug 2025 12:46:07 +0200
Philippe Mathieu-Daudé <philmd@linaro.org> wrote:
> On 14/8/25 18:05, Igor Mammedov wrote:
> > When kernel-irqchip=split is used, QEMU still hits a BQL
> > contention issue when reading the ACPI PM/HPET timers
> > (despite timer access being lock-less).
> >
> > So Windows with more than 255 CPUs is still not able to
> > boot (since it requires iommu -> split irqchip).
> >
> > The problematic path is in kvm_arch_pre_run(), where the BQL is
> > taken unconditionally when a split irqchip is in use.
> >
> > There are a few parts that BQL protects there:
> > 1. interrupt check and injecting
> >
> > however, we do not take the BQL when checking for a pending
> > interrupt (even within the same function), so the patch
> > takes the same approach for cpu->interrupt_request checks
> > and takes the BQL only if there is work to do.
> >
> > 2. request_interrupt_window access
> > CPUState::kvm_run::request_interrupt_window doesn't need the BQL
> > as it's only accessed by its own vCPU thread.
> >
> > 3. cr8/cpu_get_apic_tpr access
> > the same (as #2) applies to CPUState::kvm_run::cr8,
> > and APIC registers are also cached/synced (get/put) within
> > the vCPU thread they belong to.
> >
> > Taking the BQL only when necessary eliminates the BQL bottleneck
> > on the IO/MMIO-only exit path, improving latency by 80% on an
> > HPET micro benchmark.
> >
> > This lets Windows boot successfully (in case hv-time isn't used)
> > when more than 255 vCPUs are in use.
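A minimal sketch of the pattern described above (not the actual patch; it
assumes the cpu_test_interrupt() helper added earlier in this series and
the usual bql_lock()/bql_unlock() helpers):

  void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
  {
      X86CPU *x86_cpu = X86_CPU(cpu);

      /* #1: lock-less check first; take the BQL only when there really
       *     is an interrupt to inject */
      if (cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD)) {
          bql_lock();
          /* pick the vector and inject it via KVM here */
          bql_unlock();
      }

      /* #2/#3: request_interrupt_window and cr8 live in kvm_run and are
       *        only touched by this vCPU's own thread, so no BQL needed */
      run->request_interrupt_window =
          cpu_test_interrupt(cpu, CPU_INTERRUPT_HARD);
      run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
  }

The real function keeps its existing injection logic; only the locking
around it changes.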
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > ---
> > v3:
> > * drop not needed pair of () in cpu->interrupt_request & CPU_INTERRUPT_HARD
> > check
> > * Paolo Bonzini <pbonzini@redhat.com>
> > * don't take BQL when setting exit_request, use qatomic_set() instead
> > * after above simplification take/release BQL unconditionally
> > * drop smp_mb() after run->cr8/run->request_interrupt_window update
> > ---
> > target/i386/kvm/kvm.c | 12 +++++-------
> > 1 file changed, 5 insertions(+), 7 deletions(-)
>
>
> > /* Force the VCPU out of its inner loop to process any INIT requests
> > * or (for userspace APIC, but it is cheap to combine the checks here)
> > @@ -5489,10 +5486,10 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> > if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> > if (cpu_test_interrupt(cpu, CPU_INTERRUPT_INIT) &&
> > !(env->hflags & HF_SMM_MASK)) {
> > - cpu->exit_request = 1;
> > + qatomic_set(&cpu->exit_request, 1);
> > }
> > if (cpu_test_interrupt(cpu, CPU_INTERRUPT_TPR)) {
> > - cpu->exit_request = 1;
> > + qatomic_set(&cpu->exit_request, 1);
> > }
> > }
>
> Interesting. IMHO to avoid future similar problems, we should declare
> CPUState::exit_request a "private" field and access it via a helper,
> where atomicity is enforced.
I did only the bare minimum here, while Paolo took over
sanitizing/fixing the exit_request part, see
https://patchew.org/QEMU/20250808185905.62776-1-pbonzini@redhat.com/
so this suggestion fits better there.
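For the record, the kind of accessor I read your suggestion as (names made
up, nothing like this exists today) would be along the lines of:

  /* hypothetical helpers; exit_request would no longer be touched directly */
  static inline void cpu_set_exit_request(CPUState *cpu)
  {
      qatomic_set(&cpu->exit_request, 1);
  }

  static inline bool cpu_exit_requested(CPUState *cpu)
  {
      return qatomic_read(&cpu->exit_request);
  }

with direct assignments, like the sPAPR ones you list below, converted to
the setter.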
> The only non-accelerator use is in PPC sPAPR:
>
> hw/ppc/spapr_hcall.c:512: cs->exit_request = 1;
> hw/ppc/spapr_hcall.c:534: cs->exit_request = 1;
> hw/ppc/spapr_hcall.c:627: cs->exit_request = 1;
>
> FYI, last week we noticed a similar problem with CPUState::thread_kicked
> when using HVF, and I'll post a series containing this change in a
> few days:
>
> -- >8 --
> diff --git a/system/cpus.c b/system/cpus.c
> index 26fb3bd69c3..39daf85bae7 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -464,10 +464,10 @@ void qemu_wait_io_event(CPUState *cpu)
>
> void cpus_kick_thread(CPUState *cpu)
> {
> - if (cpu->thread_kicked) {
> + if (qatomic_read(&cpu->thread_kicked)) {
> return;
> }
> - cpu->thread_kicked = true;
> + qatomic_set(&cpu->thread_kicked, true);
>
> ---
>
> It only affects HVF because this is the only access to thread_kicked
> outside of accelerator code:
>
> $ git grep -w thread_kicked
> include/hw/core/cpu.h:484: bool thread_kicked;
> system/cpus.c:449: qatomic_set_mb(&cpu->thread_kicked, false);
> system/cpus.c:476: if (cpu->thread_kicked) {
> system/cpus.c:479: cpu->thread_kicked = true;
> target/arm/hvf/hvf.c:1825: qatomic_set_mb(&cpu->thread_kicked, false);
>
> (Call introduced in commit 219c101fa7f "arm/hvf: Add a WFI handler").
>
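Just to spell out the pairing that change touches (illustrative sketch,
based on the hunk and grep output above, not the exact QEMU code):

  /* kicker side, cpus_kick_thread(), may run in another thread */
  if (qatomic_read(&cpu->thread_kicked)) {
      return;                                  /* a kick is already in flight */
  }
  qatomic_set(&cpu->thread_kicked, true);
  /* deliver the kick (signal) to the vCPU thread */

  /* vCPU thread side, qemu_wait_io_event() / HVF WFI handler */
  qatomic_set_mb(&cpu->thread_kicked, false);  /* re-arm for the next kick */

With HVF the reset also happens in the WFI handler on the vCPU thread, so
the previously plain read/write in cpus_kick_thread() could race with it.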
Thread overview: 34+ messages
2025-08-14 16:05 [PATCH v4 0/8] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-08-14 16:05 ` [PATCH v4 1/8] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-08-25 10:55 ` Philippe Mathieu-Daudé
2025-08-14 16:05 ` [PATCH v4 2/8] acpi: mark PMTIMER as unlocked Igor Mammedov
2025-08-14 16:05 ` [PATCH v4 3/8] hpet: switch to fain-grained device locking Igor Mammedov
2025-08-25 14:43 ` Zhao Liu
2025-08-14 16:05 ` [PATCH v4 4/8] hpet: move out main counter read into a separate block Igor Mammedov
2025-08-25 14:44 ` Zhao Liu
2025-08-14 16:05 ` [PATCH v4 5/8] hpet: make main counter read lock-less Igor Mammedov
2025-08-25 14:55 ` Zhao Liu
2025-08-25 15:10 ` Igor Mammedov
2025-08-14 16:05 ` [PATCH v4 6/8] add cpu_test_interrupt()/cpu_set_interrupt() helpers and use them tree wide Igor Mammedov
2025-08-14 19:05 ` Peter Xu
2025-08-20 15:01 ` Jason J. Herne
2025-08-21 15:57 ` Igor Mammedov
2025-08-21 15:56 ` [PATCH v5 " Igor Mammedov
2025-08-25 8:16 ` Harsh Prateek Bora
2025-08-25 15:06 ` Igor Mammedov
2025-08-25 10:35 ` Philippe Mathieu-Daudé
2025-08-25 15:02 ` Igor Mammedov
2025-08-25 15:28 ` Zhao Liu
2025-08-25 15:19 ` Igor Mammedov
2025-08-26 7:45 ` Zhao Liu
2025-08-26 8:47 ` Igor Mammedov
2025-08-26 9:27 ` Zhao Liu
2025-08-29 8:18 ` Paolo Bonzini
2025-08-29 12:33 ` Paolo Bonzini
2025-09-01 12:05 ` Igor Mammedov
2025-09-01 12:06 ` Paolo Bonzini
2025-08-14 16:05 ` [PATCH v4 7/8] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
2025-08-25 10:46 ` Philippe Mathieu-Daudé
2025-08-27 8:40 ` Igor Mammedov [this message]
2025-08-14 16:06 ` [PATCH v4 8/8] tcg: move interrupt caching and single step masking closer to user Igor Mammedov
2025-08-29 8:19 ` [PATCH v4 0/8] Reinvent BQL-free PIO/MMIO Paolo Bonzini