All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Li Rongqing <lirongqing@baidu.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Li Zhaoxin <lizhaoxin04@baidu.com>
Subject: Re: 答复: [????] Re: [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev
Date: Wed, 7 May 2025 07:32:43 -0700	[thread overview]
Message-ID: <aBtgTnQU0JlNq2Y3@google.com> (raw)
In-Reply-To: <4bfe7a8f5020448e903e6335173afc75@baidu.com>

On Thu, Apr 24, 2025, Li,Rongqing wrote:
> > On Wed, Apr 23, 2025, lirongqing wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > > 2e591cc..af730a5 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -5865,6 +5865,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu,
> > enum kvm_bus bus_idx, gpa_t addr,
> > >  	return r < 0 ? r : 0;
> > >  }
> > >
> > > +static void free_kvm_io_bus(struct rcu_head *rcu) {
> > > +	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
> > > +
> > > +	kfree(bus);
> > > +}
> > > +
> > >  int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> > gpa_t addr,
> > >  			    int len, struct kvm_io_device *dev)  { @@ -5903,8 +5910,8
> > @@
> > > int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t
> > addr,
> > >  	memcpy(new_bus->range + i + 1, bus->range + i,
> > >  		(bus->dev_count - i) * sizeof(struct kvm_io_range));
> > >  	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> > > -	synchronize_srcu_expedited(&kvm->srcu);
> > > -	kfree(bus);
> > > +
> > > +	call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> > 
> > I don't think this is safe from a functional correctness perspective, as
> > KVM must guarantee all readers see the new device before KVM returns
> > control to userspace.  E.g. I'm pretty sure KVM_REGISTER_COALESCED_MMIO is
> > used while vCPUs are active.
> > 
> > However, I'm pretty sure the only readers that actually rely on SRCU are vCPUs,
> > so I _think_ the synchronize_srcu_expedited() is necessary if and only if vCPUs
> > have been created.
> > 
> > That could race with concurrent vCPU creation in a few flows that don't
> > take kvm->lock, but that should be ok from an ABI perspective.  False
> > kvm->positives (vCPU creation fails) are benign, and false negatives (vCPU
> > created after the check) are inherently racy, i.e. userspace can't
> > guarantee the vCPU sees any particular ordering.
> > 
> > So this?
> > 
> > 	if (READ_ONCE(kvm->created_vcpus)) {
> > 		synchronize_srcu_expedited(&kvm->srcu);
> > 		kfree(bus);
> > 	} else {
> > 		call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> > 	}
> 
> 
> If call_srcu is able to used only before creating vCPU, the upper will have
> little effect, since most device are created after creating vCPU

Is that something that can be "fixed" in userspace?  I.e. why are devices being
created after vCPUs?

> We want to optimize the ioeventfd creation, since a VM will create lots of
> ioeventfd, 

Ah, so this isn't about device creation from userspace, rather it's about reacting
to the guest's configuration of a device, e.g. to register doorbells when the guest
instantiates queues for a device?

> can ioeventfd uses call_srcu?

No, because that has the same problem of KVM not ensuring vCPUs will observe the
the change before returning to userspace.

Unfortunately, I don't see an easy solution.  At a glance, every architecture
except arm64 could switch to protect kvm->buses with a rwlock, but arm64 uses
the MMIO bus for the vGIC's ITS, and I don't think it's feasible to make the ITS
stuff play nice with a rwlock.  E.g. vgic_its.its_lock and vgic_its.cmd_lock are
mutexes, and there are multiple ITS paths that access guest memory, i.e. might
sleep due to faulting.

Even if we did something x86-centric, e.g. futher special case KVM_FAST_MMIO_BUS
with a rwlock, I worry that using a rwlock would degrade steady state performance,
e.g. due to cross-CPU atomic accesses.

Does using a dedicated SRCU structure resolve the issue?  E.g. add and use
kvm->buses_srcu instead of kvm->srcu?  x86's usage of the MMIO/PIO buses is
limited to kvm_io_bus_{read,write}(), so it should be easy enough to do a super
quick and dirty PoC.

  reply	other threads:[~2025-05-07 14:32 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-23  9:25 [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev lirongqing
2025-04-23 14:59 ` Sean Christopherson
2025-04-24  2:56   ` 答复: [????] " Li,Rongqing
2025-05-07 14:32     ` Sean Christopherson [this message]
2025-05-12  5:03       ` 答复: [????] Re: ??: " Li,Rongqing
2025-04-29  2:13   ` 答复: " Li,Rongqing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBtgTnQU0JlNq2Y3@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lirongqing@baidu.com \
    --cc=lizhaoxin04@baidu.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.