Re: [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Igor Mammedov <imammedo@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, mst@redhat.com,
	mtosatti@redhat.com, kraxel@redhat.com, peter.maydell@linaro.org
Subject: Re: [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt
Date: Fri, 1 Aug 2025 10:42:15 +0200	[thread overview]
Message-ID: <20250801104215.2ceaa19a@fedora> (raw)
In-Reply-To: <aIvDC4nv1mUNLeMI@x1.local>

On Thu, 31 Jul 2025 15:24:59 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Jul 30, 2025 at 02:39:34PM +0200, Igor Mammedov wrote:
> > when kernel-irqchip=split is used, QEMU still hits BQL
> > contention issue when reading ACPI PM/HPET timers
> > (despite of timer[s] access being lock-less).
> > 
> > So Windows with more than 255 cpus is still not able to
> > boot (since it requires iommu -> split irqchip).
> > 
> > Problematic path is in kvm_arch_pre_run() where BQL is taken
> > unconditionally when split irqchip is in use.
> > 
> > There are a few parts tha BQL protects there:
> >   1. interrupt check and injecting
> > 
> >     however we do not take BQL when checking for pending
> >     interrupt (even within the same function), so the patch
> >     takes the same approach for cpu->interrupt_request checks
> >     and takes BQL only if there is a job to do.
> > 
> >   2. request_interrupt_window access
> >       CPUState::kvm_run::request_interrupt_window doesn't need BQL
> >       as it's accessed on side QEMU only by its own vCPU thread.
> >       The only thing that BQL provides there is implict barrier.
> >       Which can be done by using cheaper explicit barrier there.
> > 
> >   3. cr8/cpu_get_apic_tpr access
> >       the same (as #2) applies to CPUState::kvm_run::cr8 write,
> >       and APIC registers are also cached/synced (get/put) within
> >       the vCPU thread it belongs to.
> > 
> > Taking BQL only when is necessary, eleminates BQL bottleneck on
> > IO/MMIO only exit path, improoving latency by 80% on HPET micro
> > benchmark.
> > 
> > This lets Windows to boot succesfully (in case hv-time isn't used)
> > when more than 255 vCPUs are in use.  
> 
> Not familiar with this path, but the change looks reasonable, a few pure
> questions inline.

perhaps Paolo can give an expert opinion.

> 
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  target/i386/kvm/kvm.c | 58 +++++++++++++++++++++++++++----------------
> >  1 file changed, 37 insertions(+), 21 deletions(-)
> > 
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 369626f8c8..32024d50f5 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -5450,6 +5450,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> >  {
> >      X86CPU *x86_cpu = X86_CPU(cpu);
> >      CPUX86State *env = &x86_cpu->env;
> > +    bool release_bql = 0;
> >      int ret;
> >  
> >      /* Inject NMI */
> > @@ -5478,15 +5479,16 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> >          }
> >      }
> >  
> > -    if (!kvm_pic_in_kernel()) {
> > -        bql_lock();
> > -    }
> >  
> >      /* Force the VCPU out of its inner loop to process any INIT requests
> >       * or (for userspace APIC, but it is cheap to combine the checks here)
> >       * pending TPR access reports.
> >       */
> >      if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> > +        if (!kvm_pic_in_kernel()) {
> > +            bql_lock();
> > +            release_bql = true;
> > +        }  
> 
> Does updating exit_request need bql at all?  I saw the pattern is this:
> 
>         kvm_arch_pre_run(cpu, run);
>         if (qatomic_read(&cpu->exit_request)) {
>             trace_kvm_interrupt_exit_request();
>             /*
>              * KVM requires us to reenter the kernel after IO exits to complete
>              * instruction emulation. This self-signal will ensure that we
>              * leave ASAP again.
>              */
>             kvm_cpu_kick_self();
>         }
> 
> So setting exit_request=1 here will likely be read very soon later, in this
> case it seems the lock isn't needed.

Given I'm not familiar with the code, I'm changing locking logic only in
places I have to and preserving current current behavior elsewhere.

looking at the code, it seems we are always hold BQL when setting
exit_request.

> 
> >          if ((cpu->interrupt_request & CPU_INTERRUPT_INIT) &&
> >              !(env->hflags & HF_SMM_MASK)) {
> >              cpu->exit_request = 1;
> > @@ -5497,24 +5499,31 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> >      }
> >  
> >      if (!kvm_pic_in_kernel()) {
> > -        /* Try to inject an interrupt if the guest can accept it */
> > -        if (run->ready_for_interrupt_injection &&
> > -            (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> > -            (env->eflags & IF_MASK)) {
> > -            int irq;
> > -
> > -            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> > -            irq = cpu_get_pic_interrupt(env);
> > -            if (irq >= 0) {
> > -                struct kvm_interrupt intr;
> > -
> > -                intr.irq = irq;
> > -                DPRINTF("injected interrupt %d\n", irq);
> > -                ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> > -                if (ret < 0) {
> > -                    fprintf(stderr,
> > -                            "KVM: injection failed, interrupt lost (%s)\n",
> > -                            strerror(-ret));
> > +        if (cpu->interrupt_request & CPU_INTERRUPT_HARD) {
> > +            if (!release_bql) {
> > +                bql_lock();
> > +                release_bql = true;
> > +            }
> > +
> > +            /* Try to inject an interrupt if the guest can accept it */
> > +            if (run->ready_for_interrupt_injection &&
> > +                (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> > +                (env->eflags & IF_MASK)) {
> > +                int irq;
> > +
> > +                cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> > +                irq = cpu_get_pic_interrupt(env);
> > +                if (irq >= 0) {
> > +                    struct kvm_interrupt intr;
> > +
> > +                    intr.irq = irq;
> > +                    DPRINTF("injected interrupt %d\n", irq);
> > +                    ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> > +                    if (ret < 0) {
> > +                        fprintf(stderr,
> > +                                "KVM: injection failed, interrupt lost (%s)\n",
> > +                                strerror(-ret));
> > +                    }
> >                  }
> >              }
> >          }
> > @@ -5531,7 +5540,14 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> >  
> >          DPRINTF("setting tpr\n");
> >          run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
> > +        /*
> > +         * make sure that request_interrupt_window/cr8 are set
> > +         * before KVM_RUN might read them
> > +         */
> > +        smp_mb();  
> 
> Is this mb() needed if KVM_RUN will always happen in the same thread anyway?


it matches with similar pattern:

        /* Read cpu->exit_request before KVM_RUN reads run->immediate_exit.      
         * Matching barrier in kvm_eat_signals.                                  
         */                                                                                                     
        smp_rmb();                                                               
                                                                                 
        run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0); 

to be on the safe side, this is preserving barrier that BQL has provided before
I can drop it if it's not really needed.

> 
> Thanks,
> 
> > +    }
> >  
> > +    if (release_bql) {
> >          bql_unlock();
> >      }
> >  }
> > -- 
> > 2.47.1
> >   
>

next prev parent reply	other threads:[~2025-08-01  8:43 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-30 12:39 [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 1/6] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-07-30 21:47   ` Peter Xu
2025-07-31  8:15     ` Igor Mammedov
2025-08-01 12:42     ` Igor Mammedov
2025-08-01 13:19       ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 2/6] acpi: mark PMTIMER as unlocked Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 3/6] hpet: switch to fain-grained device locking Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 4/6] hpet: move out main counter read into a separate block Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 5/6] hpet: make main counter read lock-less Igor Mammedov
2025-07-30 22:15   ` Peter Xu
2025-07-31  8:32     ` Igor Mammedov
2025-07-31 14:02       ` Peter Xu
2025-08-01  8:06         ` Igor Mammedov
2025-08-01 13:32           ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
2025-07-31 19:24   ` Peter Xu
2025-08-01  8:42     ` Igor Mammedov [this message]
2025-08-01 13:08       ` Paolo Bonzini
2025-08-01 10:26   ` Paolo Bonzini
2025-08-01 12:47     ` Igor Mammedov
2025-07-31 21:15 ` [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Philippe Mathieu-Daudé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250801104215.2ceaa19a@fedora \
    --to=imammedo@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).