[PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev

From: lirongqing @ 2025-04-23 9:25 UTC
To: pbonzini, kvm, linux-kernel; +Cc: Li RongQing, lizhaoxin

From: Li RongQing <lirongqing@baidu.com>

Use call_rcu() instead of the costly synchronize_srcu_expedited(); this
can reduce VM boot-up time and VM migration downtime.

Signed-off-by: lizhaoxin <lizhaoxin04@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 291d49b..e772704 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -203,6 +203,7 @@ struct kvm_io_range {
 #define NR_IOBUS_DEVS 1000
 
 struct kvm_io_bus {
+	struct rcu_head rcu;
 	int dev_count;
 	int ioeventfd_count;
 	struct kvm_io_range range[];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2e591cc..af730a5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5865,6 +5865,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
 	return r < 0 ? r : 0;
 }
 
+static void free_kvm_io_bus(struct rcu_head *rcu)
+{
+	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
+
+	kfree(bus);
+}
+
 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 			    int len, struct kvm_io_device *dev)
 {
@@ -5903,8 +5910,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 	memcpy(new_bus->range + i + 1, bus->range + i,
 	       (bus->dev_count - i) * sizeof(struct kvm_io_range));
 	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
-	synchronize_srcu_expedited(&kvm->srcu);
-	kfree(bus);
+
+	call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
 	return 0;
 }
-- 
2.9.4
Re: [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev

From: Sean Christopherson @ 2025-04-23 14:59 UTC
To: lirongqing; +Cc: pbonzini, kvm, linux-kernel, lizhaoxin

On Wed, Apr 23, 2025, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> Use call_rcu() instead of costly synchronize_srcu_expedited(), this
> can reduce the VM bootup time, and reduce VM migration downtime
>
> Signed-off-by: lizhaoxin <lizhaoxin04@baidu.com>

If lizhaoxin is a co-author, then this needs:

  Co-developed-by: lizhaoxin <lizhaoxin04@baidu.com>

If they are _the_ author, then the From: above is wrong.

> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  include/linux/kvm_host.h |  1 +
>  virt/kvm/kvm_main.c      | 11 +++++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 291d49b..e772704 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -203,6 +203,7 @@ struct kvm_io_range {
>  #define NR_IOBUS_DEVS 1000
>
>  struct kvm_io_bus {
> +	struct rcu_head rcu;
>  	int dev_count;
>  	int ioeventfd_count;
>  	struct kvm_io_range range[];
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2e591cc..af730a5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5865,6 +5865,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
>  	return r < 0 ? r : 0;
>  }
>
> +static void free_kvm_io_bus(struct rcu_head *rcu)
> +{
> +	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
> +
> +	kfree(bus);
> +}
> +
>  int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>  			    int len, struct kvm_io_device *dev)
>  {
> @@ -5903,8 +5910,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>  	memcpy(new_bus->range + i + 1, bus->range + i,
>  	       (bus->dev_count - i) * sizeof(struct kvm_io_range));
>  	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> -	synchronize_srcu_expedited(&kvm->srcu);
> -	kfree(bus);
> +
> +	call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);

I don't think this is safe from a functional correctness perspective, as
KVM must guarantee all readers see the new device before KVM returns
control to userspace.  E.g. I'm pretty sure KVM_REGISTER_COALESCED_MMIO
is used while vCPUs are active.

However, I'm pretty sure the only readers that actually rely on SRCU are
vCPUs, so I _think_ the synchronize_srcu_expedited() is necessary if and
only if vCPUs have been created.

That could race with concurrent vCPU creation in a few flows that don't
take kvm->lock, but that should be ok from an ABI perspective.  False
positives (vCPU creation fails) are benign, and false negatives (vCPU
created after the check) are inherently racy, i.e. userspace can't
guarantee the vCPU sees any particular ordering.

So this?

	if (READ_ONCE(kvm->created_vcpus)) {
		synchronize_srcu_expedited(&kvm->srcu);
		kfree(bus);
	} else {
		call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
	}
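[Editorial aside: the free_kvm_io_bus() callback in the patch relies on the classic "embed a callback head in the object, recover the object with container_of()" pattern. The userspace C sketch below illustrates only that pointer-recovery mechanism; the struct names, the cb_head type, and the direct callback invocation are all invented for illustration and stand in for the kernel's rcu_head/kfree machinery.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head: just a function pointer that the
 * (simulated) grace-period machinery will invoke later. */
struct cb_head {
	void (*func)(struct cb_head *head);
};

/* Same offset arithmetic as the kernel's container_of(). */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct io_bus {
	struct cb_head cb;   /* mirrors "struct rcu_head rcu" in kvm_io_bus */
	int dev_count;
};

static int last_freed_dev_count = -1;

static void free_io_bus(struct cb_head *head)
{
	/* Recover the enclosing object from the embedded head, exactly
	 * as free_kvm_io_bus() does with container_of(). */
	struct io_bus *bus = my_container_of(head, struct io_bus, cb);

	last_freed_dev_count = bus->dev_count;  /* prove we got the right object */
	free(bus);
}

int deferred_free_demo(void)
{
	struct io_bus *bus = calloc(1, sizeof(*bus));

	if (!bus)
		return -1;
	bus->dev_count = 7;
	bus->cb.func = free_io_bus;
	/* Real RCU would invoke this only after all pre-existing readers
	 * are done; here we invoke it directly to show the mechanism. */
	bus->cb.func(&bus->cb);
	return last_freed_dev_count;
}
```

The deferral is what makes the pattern cheap for the writer: the updater publishes the new bus and returns immediately, and the stale object is reclaimed out of line. Sean's objection is not to this mechanism but to losing the "all readers have switched over" guarantee before returning to userspace.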
Re: [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev

From: Li,Rongqing @ 2025-04-24 2:56 UTC
To: Sean Christopherson
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Li,Zhaoxin(ACG CCN)

> On Wed, Apr 23, 2025, lirongqing wrote:
> > From: Li RongQing <lirongqing@baidu.com>
> >
> > Use call_rcu() instead of costly synchronize_srcu_expedited(), this
> > can reduce the VM bootup time, and reduce VM migration downtime
> >
> > Signed-off-by: lizhaoxin <lizhaoxin04@baidu.com>
>
> If lizhaoxin is a co-author, then this needs:
>
>   Co-developed-by: lizhaoxin <lizhaoxin04@baidu.com>

Thanks, I will add it.

> If they are _the_ author, then the From: above is wrong.
>
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> >  include/linux/kvm_host.h |  1 +
> >  virt/kvm/kvm_main.c      | 11 +++++++++--
> >  2 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 291d49b..e772704 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -203,6 +203,7 @@ struct kvm_io_range {
> >  #define NR_IOBUS_DEVS 1000
> >
> >  struct kvm_io_bus {
> > +	struct rcu_head rcu;
> >  	int dev_count;
> >  	int ioeventfd_count;
> >  	struct kvm_io_range range[];
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 2e591cc..af730a5 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -5865,6 +5865,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> >  	return r < 0 ? r : 0;
> >  }
> >
> > +static void free_kvm_io_bus(struct rcu_head *rcu)
> > +{
> > +	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
> > +
> > +	kfree(bus);
> > +}
> > +
> >  int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> >  			    int len, struct kvm_io_device *dev)
> >  {
> > @@ -5903,8 +5910,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> >  	memcpy(new_bus->range + i + 1, bus->range + i,
> >  	       (bus->dev_count - i) * sizeof(struct kvm_io_range));
> >  	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> > -	synchronize_srcu_expedited(&kvm->srcu);
> > -	kfree(bus);
> > +
> > +	call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
>
> I don't think this is safe from a functional correctness perspective, as
> KVM must guarantee all readers see the new device before KVM returns
> control to userspace.  E.g. I'm pretty sure KVM_REGISTER_COALESCED_MMIO
> is used while vCPUs are active.
>
> However, I'm pretty sure the only readers that actually rely on SRCU are
> vCPUs, so I _think_ the synchronize_srcu_expedited() is necessary if and
> only if vCPUs have been created.
>
> That could race with concurrent vCPU creation in a few flows that don't
> take kvm->lock, but that should be ok from an ABI perspective.  False
> positives (vCPU creation fails) are benign, and false negatives (vCPU
> created after the check) are inherently racy, i.e. userspace can't
> guarantee the vCPU sees any particular ordering.
>
> So this?
>
> 	if (READ_ONCE(kvm->created_vcpus)) {
> 		synchronize_srcu_expedited(&kvm->srcu);
> 		kfree(bus);
> 	} else {
> 		call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> 	}

If call_srcu() can only be used before vCPU creation, the above will have
little effect, since most devices are created after the vCPUs are.

We want to optimize ioeventfd creation, since a VM creates lots of
ioeventfds. Can ioeventfd use call_srcu()?
Like:

--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -853,8 +853,8 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
 
 	kvm_iodevice_init(&p->dev, &ioeventfd_ops);
 
-	ret = kvm_io_bus_register_dev(kvm, bus_idx, p->addr, p->length,
-				      &p->dev);
+	ret = __kvm_io_bus_register_dev(kvm, bus_idx, p->addr, p->length,
+					&p->dev, false);
 	if (ret < 0)
 		goto unlock_fail;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2e591cc..ff294b0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5865,8 +5865,15 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
 	return r < 0 ? r : 0;
 }
 
-int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
-			    int len, struct kvm_io_device *dev)
+static void free_kvm_io_bus(struct rcu_head *rcu)
+{
+	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
+
+	kfree(bus);
+}
+
+int __kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
+			      int len, struct kvm_io_device *dev, bool sync)
 {
 	int i;
 	struct kvm_io_bus *new_bus, *bus;
@@ -5903,12 +5910,22 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 	memcpy(new_bus->range + i + 1, bus->range + i,
 	       (bus->dev_count - i) * sizeof(struct kvm_io_range));
 	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
-	synchronize_srcu_expedited(&kvm->srcu);
-	kfree(bus);
+
+	if (sync) {
+		synchronize_srcu_expedited(&kvm->srcu);
+		kfree(bus);
+	} else {
+		call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
+	}
 	return 0;
 }

Thanks
-Li
Re: [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev

From: Sean Christopherson @ 2025-05-07 14:32 UTC
To: Li Rongqing
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Li Zhaoxin

On Thu, Apr 24, 2025, Li,Rongqing wrote:
> > On Wed, Apr 23, 2025, lirongqing wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 2e591cc..af730a5 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -5865,6 +5865,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > >  	return r < 0 ? r : 0;
> > >  }
> > >
> > > +static void free_kvm_io_bus(struct rcu_head *rcu)
> > > +{
> > > +	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
> > > +
> > > +	kfree(bus);
> > > +}
> > > +
> > >  int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> > >  			    int len, struct kvm_io_device *dev)
> > >  {
> > > @@ -5903,8 +5910,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> > >  	memcpy(new_bus->range + i + 1, bus->range + i,
> > >  	       (bus->dev_count - i) * sizeof(struct kvm_io_range));
> > >  	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> > > -	synchronize_srcu_expedited(&kvm->srcu);
> > > -	kfree(bus);
> > > +
> > > +	call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> >
> > I don't think this is safe from a functional correctness perspective, as
> > KVM must guarantee all readers see the new device before KVM returns
> > control to userspace.  E.g. I'm pretty sure KVM_REGISTER_COALESCED_MMIO
> > is used while vCPUs are active.
> >
> > However, I'm pretty sure the only readers that actually rely on SRCU are
> > vCPUs, so I _think_ the synchronize_srcu_expedited() is necessary if and
> > only if vCPUs have been created.
> >
> > That could race with concurrent vCPU creation in a few flows that don't
> > take kvm->lock, but that should be ok from an ABI perspective.  False
> > positives (vCPU creation fails) are benign, and false negatives (vCPU
> > created after the check) are inherently racy, i.e. userspace can't
> > guarantee the vCPU sees any particular ordering.
> >
> > So this?
> >
> > 	if (READ_ONCE(kvm->created_vcpus)) {
> > 		synchronize_srcu_expedited(&kvm->srcu);
> > 		kfree(bus);
> > 	} else {
> > 		call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> > 	}
>
> If call_srcu is able to used only before creating vCPU, the upper will have
> little effect, since most device are created after creating vCPU

Is that something that can be "fixed" in userspace?  I.e. why are devices
being created after vCPUs?

> We want to optimize the ioeventfd creation, since a VM will create lots of
> ioeventfd,

Ah, so this isn't about device creation from userspace, rather it's about
reacting to the guest's configuration of a device, e.g. to register
doorbells when the guest instantiates queues for a device?

> can ioeventfd uses call_srcu?

No, because that has the same problem of KVM not ensuring vCPUs will
observe the change before returning to userspace.

Unfortunately, I don't see an easy solution.  At a glance, every
architecture except arm64 could switch to protecting kvm->buses with a
rwlock, but arm64 uses the MMIO bus for the vGIC's ITS, and I don't
think it's feasible to make the ITS stuff play nice with a rwlock.
E.g. vgic_its.its_lock and vgic_its.cmd_lock are mutexes, and there are
multiple ITS paths that access guest memory, i.e. might sleep due to
faulting.

Even if we did something x86-centric, e.g. further special-case
KVM_FAST_MMIO_BUS with a rwlock, I worry that using a rwlock would
degrade steady-state performance, e.g. due to cross-CPU atomic accesses.

Does using a dedicated SRCU structure resolve the issue?  E.g. add and
use kvm->buses_srcu instead of kvm->srcu?  x86's usage of the MMIO/PIO
buses is limited to kvm_io_bus_{read,write}(), so it should be easy
enough to do a super quick and dirty PoC.
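[Editorial aside: for concreteness, a rough pseudocode sketch of what the dedicated-SRCU idea could look like. Everything here is an assumption for illustration — the field name buses_srcu, where the read-side lock would go, and whether a plain synchronous wait would suffice are exactly the questions a real PoC would have to answer; this is not a buildable patch.]

```
/* Pseudocode sketch only -- not a tested or compilable patch. */

struct kvm {
	...
	struct srcu_struct srcu;        /* memslots etc., as today */
	struct srcu_struct buses_srcu;  /* hypothetical: guards kvm->buses[] only */
	...
};

int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, ...)
{
	int idx = srcu_read_lock(&vcpu->kvm->buses_srcu);
	struct kvm_io_bus *bus =
		srcu_dereference(vcpu->kvm->buses[bus_idx],
				 &vcpu->kvm->buses_srcu);
	... /* do the lookup and dispatch */
	srcu_read_unlock(&vcpu->kvm->buses_srcu, idx);
}

/* In kvm_io_bus_register_dev(): the grace period now only waits for
 * short bus lookups, not for arbitrarily long vCPU-in-guest read-side
 * sections of kvm->srcu, so even a synchronous wait should be cheap. */
rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
synchronize_srcu(&kvm->buses_srcu);
kfree(bus);
```

The point of the split is that the cost of synchronize_srcu() on kvm->srcu comes from vCPUs holding read-side critical sections across guest entry; a bus-only srcu_struct would have no such long readers, while still guaranteeing all readers see the new device before the ioctl returns.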
Re: [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev

From: Li,Rongqing @ 2025-05-12 5:03 UTC
To: Sean Christopherson
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Li,Zhaoxin(ACG CCN)

> Ah, so this isn't about device creation from userspace, rather it's about
> reacting to the guest's configuration of a device, e.g. to register
> doorbells when the guest instantiates queues for a device?

Yes, the ioeventfds are registered when the guest instantiates queues.

> > can ioeventfd uses call_srcu?
>
> No, because that has the same problem of KVM not ensuring vCPUs will
> observe the change before returning to userspace.
>
> Unfortunately, I don't see an easy solution.  At a glance, every
> architecture except arm64 could switch to protecting kvm->buses with a
> rwlock, but arm64 uses the MMIO bus for the vGIC's ITS, and I don't
> think it's feasible to make the ITS stuff play nice with a rwlock.
> E.g. vgic_its.its_lock and vgic_its.cmd_lock are mutexes, and there are
> multiple ITS paths that access guest memory, i.e. might sleep due to
> faulting.
>
> Even if we did something x86-centric, e.g. further special-case
> KVM_FAST_MMIO_BUS with a rwlock, I worry that using a rwlock would
> degrade steady-state performance, e.g. due to cross-CPU atomic accesses.
>
> Does using a dedicated SRCU structure resolve the issue?  E.g. add and
> use kvm->buses_srcu instead of kvm->srcu?  x86's usage of the MMIO/PIO
> buses is limited to kvm_io_bus_{read,write}(), so it should be easy
> enough to do a super quick and dirty PoC.

Could you write a patch? We can test it.

Thanks
-Li
Re: [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev

From: Li,Rongqing @ 2025-04-29 2:13 UTC
To: Sean Christopherson
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Li,Zhaoxin(ACG CCN)

> >  	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> > -	synchronize_srcu_expedited(&kvm->srcu);
> > -	kfree(bus);
> > +
> > +	call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
>
> I don't think this is safe from a functional correctness perspective, as
> KVM must guarantee all readers see the new device before KVM returns
> control to userspace.  E.g. I'm pretty sure KVM_REGISTER_COALESCED_MMIO
> is used while vCPUs are active.
>
> However, I'm pretty sure the only readers that actually rely on SRCU are
> vCPUs, so I _think_ the synchronize_srcu_expedited() is necessary if and
> only if vCPUs have been created.

This patch does not change rcu_assign_pointer(), and srcu_dereference()
is still used when a vCPU reads the buses, so I think
synchronize_srcu_expedited() is not a must?

> That could race with concurrent vCPU creation in a few flows that don't
> take kvm->lock, but that should be ok from an ABI perspective.  False
> positives (vCPU creation fails) are benign, and false negatives (vCPU
> created after the check) are inherently racy, i.e. userspace can't
> guarantee the vCPU sees any particular ordering.
>
> So this?
>
> 	if (READ_ONCE(kvm->created_vcpus)) {
> 		synchronize_srcu_expedited(&kvm->srcu);
> 		kfree(bus);
> 	} else {
> 		call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> 	}