From: Igor Mammedov <imammedo@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, mst@redhat.com,
mtosatti@redhat.com, kraxel@redhat.com, peter.maydell@linaro.org
Subject: Re: [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt
Date: Fri, 1 Aug 2025 10:42:15 +0200 [thread overview]
Message-ID: <20250801104215.2ceaa19a@fedora> (raw)
In-Reply-To: <aIvDC4nv1mUNLeMI@x1.local>
On Thu, 31 Jul 2025 15:24:59 -0400
Peter Xu <peterx@redhat.com> wrote:
> On Wed, Jul 30, 2025 at 02:39:34PM +0200, Igor Mammedov wrote:
> > when kernel-irqchip=split is used, QEMU still hits BQL
> > contention issue when reading ACPI PM/HPET timers
> > (despite of timer[s] access being lock-less).
> >
> > So Windows with more than 255 cpus is still not able to
> > boot (since it requires iommu -> split irqchip).
> >
> > Problematic path is in kvm_arch_pre_run() where BQL is taken
> > unconditionally when split irqchip is in use.
> >
> > There are a few parts tha BQL protects there:
> > 1. interrupt check and injecting
> >
> > however we do not take BQL when checking for pending
> > interrupt (even within the same function), so the patch
> > takes the same approach for cpu->interrupt_request checks
> > and takes BQL only if there is a job to do.
> >
> > 2. request_interrupt_window access
> > CPUState::kvm_run::request_interrupt_window doesn't need BQL
> > as it's accessed on side QEMU only by its own vCPU thread.
> > The only thing that BQL provides there is implict barrier.
> > Which can be done by using cheaper explicit barrier there.
> >
> > 3. cr8/cpu_get_apic_tpr access
> > the same (as #2) applies to CPUState::kvm_run::cr8 write,
> > and APIC registers are also cached/synced (get/put) within
> > the vCPU thread it belongs to.
> >
> > Taking BQL only when is necessary, eleminates BQL bottleneck on
> > IO/MMIO only exit path, improoving latency by 80% on HPET micro
> > benchmark.
> >
> > This lets Windows to boot succesfully (in case hv-time isn't used)
> > when more than 255 vCPUs are in use.
>
> Not familiar with this path, but the change looks reasonable, a few pure
> questions inline.
perhaps Paolo can give an expert opinion.
>
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > target/i386/kvm/kvm.c | 58 +++++++++++++++++++++++++++----------------
> > 1 file changed, 37 insertions(+), 21 deletions(-)
> >
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 369626f8c8..32024d50f5 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -5450,6 +5450,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> > {
> > X86CPU *x86_cpu = X86_CPU(cpu);
> > CPUX86State *env = &x86_cpu->env;
> > + bool release_bql = 0;
> > int ret;
> >
> > /* Inject NMI */
> > @@ -5478,15 +5479,16 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> > }
> > }
> >
> > - if (!kvm_pic_in_kernel()) {
> > - bql_lock();
> > - }
> >
> > /* Force the VCPU out of its inner loop to process any INIT requests
> > * or (for userspace APIC, but it is cheap to combine the checks here)
> > * pending TPR access reports.
> > */
> > if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> > + if (!kvm_pic_in_kernel()) {
> > + bql_lock();
> > + release_bql = true;
> > + }
>
> Does updating exit_request need bql at all? I saw the pattern is this:
>
> kvm_arch_pre_run(cpu, run);
> if (qatomic_read(&cpu->exit_request)) {
> trace_kvm_interrupt_exit_request();
> /*
> * KVM requires us to reenter the kernel after IO exits to complete
> * instruction emulation. This self-signal will ensure that we
> * leave ASAP again.
> */
> kvm_cpu_kick_self();
> }
>
> So setting exit_request=1 here will likely be read very soon later, in this
> case it seems the lock isn't needed.
Given I'm not familiar with the code, I'm changing locking logic only in
places I have to and preserving current current behavior elsewhere.
looking at the code, it seems we are always hold BQL when setting
exit_request.
>
> > if ((cpu->interrupt_request & CPU_INTERRUPT_INIT) &&
> > !(env->hflags & HF_SMM_MASK)) {
> > cpu->exit_request = 1;
> > @@ -5497,24 +5499,31 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> > }
> >
> > if (!kvm_pic_in_kernel()) {
> > - /* Try to inject an interrupt if the guest can accept it */
> > - if (run->ready_for_interrupt_injection &&
> > - (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> > - (env->eflags & IF_MASK)) {
> > - int irq;
> > -
> > - cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> > - irq = cpu_get_pic_interrupt(env);
> > - if (irq >= 0) {
> > - struct kvm_interrupt intr;
> > -
> > - intr.irq = irq;
> > - DPRINTF("injected interrupt %d\n", irq);
> > - ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> > - if (ret < 0) {
> > - fprintf(stderr,
> > - "KVM: injection failed, interrupt lost (%s)\n",
> > - strerror(-ret));
> > + if (cpu->interrupt_request & CPU_INTERRUPT_HARD) {
> > + if (!release_bql) {
> > + bql_lock();
> > + release_bql = true;
> > + }
> > +
> > + /* Try to inject an interrupt if the guest can accept it */
> > + if (run->ready_for_interrupt_injection &&
> > + (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> > + (env->eflags & IF_MASK)) {
> > + int irq;
> > +
> > + cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> > + irq = cpu_get_pic_interrupt(env);
> > + if (irq >= 0) {
> > + struct kvm_interrupt intr;
> > +
> > + intr.irq = irq;
> > + DPRINTF("injected interrupt %d\n", irq);
> > + ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> > + if (ret < 0) {
> > + fprintf(stderr,
> > + "KVM: injection failed, interrupt lost (%s)\n",
> > + strerror(-ret));
> > + }
> > }
> > }
> > }
> > @@ -5531,7 +5540,14 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> >
> > DPRINTF("setting tpr\n");
> > run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
> > + /*
> > + * make sure that request_interrupt_window/cr8 are set
> > + * before KVM_RUN might read them
> > + */
> > + smp_mb();
>
> Is this mb() needed if KVM_RUN will always happen in the same thread anyway?
it matches with similar pattern:
/* Read cpu->exit_request before KVM_RUN reads run->immediate_exit.
* Matching barrier in kvm_eat_signals.
*/
smp_rmb();
run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
to be on the safe side, this is preserving barrier that BQL has provided before
I can drop it if it's not really needed.
>
> Thanks,
>
> > + }
> >
> > + if (release_bql) {
> > bql_unlock();
> > }
> > }
> > --
> > 2.47.1
> >
>
next prev parent reply other threads:[~2025-08-01 8:43 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-30 12:39 [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 1/6] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-07-30 21:47 ` Peter Xu
2025-07-31 8:15 ` Igor Mammedov
2025-08-01 12:42 ` Igor Mammedov
2025-08-01 13:19 ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 2/6] acpi: mark PMTIMER as unlocked Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 3/6] hpet: switch to fain-grained device locking Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 4/6] hpet: move out main counter read into a separate block Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 5/6] hpet: make main counter read lock-less Igor Mammedov
2025-07-30 22:15 ` Peter Xu
2025-07-31 8:32 ` Igor Mammedov
2025-07-31 14:02 ` Peter Xu
2025-08-01 8:06 ` Igor Mammedov
2025-08-01 13:32 ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
2025-07-31 19:24 ` Peter Xu
2025-08-01 8:42 ` Igor Mammedov [this message]
2025-08-01 13:08 ` Paolo Bonzini
2025-08-01 10:26 ` Paolo Bonzini
2025-08-01 12:47 ` Igor Mammedov
2025-07-31 21:15 ` [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Philippe Mathieu-Daudé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250801104215.2ceaa19a@fedora \
--to=imammedo@redhat.com \
--cc=kraxel@redhat.com \
--cc=mst@redhat.com \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).