public inbox for linux-arm-kernel@lists.infradead.org
From: Nikita Kalyazin <kalyazin@amazon.com>
To: Keir Fraser <keirf@google.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>
Cc: Sean Christopherson <seanjc@google.com>,
	Eric Auger <eric.auger@redhat.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Li RongQing <lirongqing@baidu.com>
Subject: Re: [PATCH v4 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()
Date: Fri, 13 Feb 2026 15:42:16 +0000
Message-ID: <a84ddba8-12da-489a-9dd1-ccdf7451a1ba@amazon.com>
In-Reply-To: <20250909100007.3136249-5-keirf@google.com>



On 09/09/2025 11:00, Keir Fraser wrote:
> Device MMIO registration may happen quite frequently during VM boot,
> and the SRCU synchronization each time has a measurable effect
> on VM startup time. In our experiments it can account for around 25%
> of a VM's startup time.
> 
> Replace the synchronization with a deferred free of the old kvm_io_bus
> structure.


Hi,

We noticed that this change introduced a regression of ~20 ms to the 
first KVM_CREATE_VCPU call of a VM, which is significant for our use case.

Before the patch:
45726 14:45:32.914330 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.000137>
45726 14:45:32.914533 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000046>

After the patch:
30295 14:47:08.057412 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.025182>
30295 14:47:08.082663 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000031>

As I understand it, the reason is that the call_srcu() calls made from 
kvm_io_bus_register_dev() queue callbacks to be invoked after a 
normal GP, which is 10 ms with HZ=100.  The subsequent 
synchronize_srcu_expedited() called from kvm_swap_active_memslots() 
(from KVM_CREATE_VCPU) has to wait for the normal GP to complete before 
making progress.  I don't fully understand why the delay is consistently 
greater than 1 GP, but that's what we see across our testing scenarios.

I verified that the problem is mitigated if the GP is shortened by 
configuring HZ=1000.  In that case, the regression is on the order of 1 ms.

It looks like in our case we don't benefit much from the intended 
optimisation, as the number of device MMIO registrations is limited and 
they don't cost us much (each takes at most 19 us in the trace below, 
but most commonly ~6 us):

      firecracker 68452 [054]  3053.183991: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.184007: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03aa190)
      firecracker 68452 [054]  3053.184007: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.184014: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03aa1b9)
      firecracker 68452 [054]  3053.184015: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.184021: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03aa1db)
      firecracker 68452 [054]  3053.184028: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.184034: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03ac957)
      firecracker 68452 [054]  3053.184093: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.184099: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03ab51a)
      firecracker 68452 [054]  3053.184100: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.184106: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc03ab549)
      firecracker 68452 [054]  3053.193145: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.193164: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc0348c9f)
      firecracker 68452 [054]  3053.193165: kprobes:kvm_io_bus_register_dev: (ffffffffc0348390)
      firecracker 68452 [054]  3053.193171: kprobes:kvm_io_bus_register_dev__return: (ffffffffc0348390 <- ffffffffc0348c9f)

Our env:
  - Kernel: 6.18
  - Arch: the analysis above is from x86, but ARM regressed very similarly
  - CONFIG_HZ=100
  - VMM: Firecracker (https://github.com/firecracker-microvm/firecracker)


I am not aware of a way to make it fast for both use cases and would be 
more than happy to hear about possible solutions.


Thanks,
Nikita

> 
> Tested-by: Li RongQing <lirongqing@baidu.com>
> Signed-off-by: Keir Fraser <keirf@google.com>
> ---
>   include/linux/kvm_host.h |  1 +
>   virt/kvm/kvm_main.c      | 11 +++++++++--
>   2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e7d6111cf254..103be35caf0d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -206,6 +206,7 @@ struct kvm_io_range {
>   struct kvm_io_bus {
>   	int dev_count;
>   	int ioeventfd_count;
> +	struct rcu_head rcu;
>   	struct kvm_io_range range[];
>   };
>   
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 870ad8ea93a7..bcef324ccbf2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1320,6 +1320,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>   		kvm_free_memslots(kvm, &kvm->__memslots[i][1]);
>   	}
>   	cleanup_srcu_struct(&kvm->irq_srcu);
> +	srcu_barrier(&kvm->srcu);
>   	cleanup_srcu_struct(&kvm->srcu);
>   #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>   	xa_destroy(&kvm->mem_attr_array);
> @@ -5952,6 +5953,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
>   }
>   EXPORT_SYMBOL_GPL(kvm_io_bus_read);
>   
> +static void __free_bus(struct rcu_head *rcu)
> +{
> +	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
> +
> +	kfree(bus);
> +}
> +
>   int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>   			    int len, struct kvm_io_device *dev)
>   {
> @@ -5990,8 +5998,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>   	memcpy(new_bus->range + i + 1, bus->range + i,
>   		(bus->dev_count - i) * sizeof(struct kvm_io_range));
>   	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> -	synchronize_srcu_expedited(&kvm->srcu);
> -	kfree(bus);
> +	call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
>   
>   	return 0;
>   }




Thread overview: 15+ messages
2025-09-09 10:00 [PATCH v4 0/4] KVM: Speed up MMIO registrations Keir Fraser
2025-09-09 10:00 ` [PATCH v4 1/4] KVM: arm64: vgic-init: Remove vgic_ready() macro Keir Fraser
2025-09-09 10:00 ` [PATCH v4 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering Keir Fraser
2025-09-09 10:00 ` [PATCH v4 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths Keir Fraser
2025-09-09 10:00 ` [PATCH v4 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev() Keir Fraser
2026-02-13 15:42   ` Nikita Kalyazin [this message]
2026-02-13 23:20     ` Sean Christopherson
2026-02-16 17:53       ` Nikita Kalyazin
2026-02-17 19:07         ` Sean Christopherson
2026-02-18 12:55           ` Nikita Kalyazin
2026-02-18 16:02             ` Keir Fraser
2026-02-18 16:15               ` Nikita Kalyazin
2026-02-19  7:50                 ` Keir Fraser
2026-02-19 11:02                   ` Nikita Kalyazin
2025-09-15  9:59 ` [PATCH v4 0/4] KVM: Speed up MMIO registrations Marc Zyngier
