public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Li Rongqing <lirongqing@baidu.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Li Zhaoxin <lizhaoxin04@baidu.com>
Subject: Re: Reply: [????] Re: [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev
Date: Wed, 7 May 2025 07:32:43 -0700	[thread overview]
Message-ID: <aBtgTnQU0JlNq2Y3@google.com> (raw)
In-Reply-To: <4bfe7a8f5020448e903e6335173afc75@baidu.com>

On Thu, Apr 24, 2025, Li,Rongqing wrote:
> > On Wed, Apr 23, 2025, lirongqing wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 2e591cc..af730a5 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -5865,6 +5865,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > >  	return r < 0 ? r : 0;
> > >  }
> > >
> > > +static void free_kvm_io_bus(struct rcu_head *rcu)
> > > +{
> > > +	struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
> > > +
> > > +	kfree(bus);
> > > +}
> > > +
> > >  int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> > >  			    int len, struct kvm_io_device *dev)
> > >  {
> > > @@ -5903,8 +5910,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> > >  	memcpy(new_bus->range + i + 1, bus->range + i,
> > >  		(bus->dev_count - i) * sizeof(struct kvm_io_range));
> > >  	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> > > -	synchronize_srcu_expedited(&kvm->srcu);
> > > -	kfree(bus);
> > > +
> > > +	call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> > 
> > I don't think this is safe from a functional correctness perspective, as
> > KVM must guarantee all readers see the new device before KVM returns
> > control to userspace.  E.g. I'm pretty sure KVM_REGISTER_COALESCED_MMIO is
> > used while vCPUs are active.
> > 
> > However, I'm pretty sure the only readers that actually rely on SRCU are vCPUs,
> > so I _think_ the synchronize_srcu_expedited() is necessary if and only if vCPUs
> > have been created.
> > 
> > That could race with concurrent vCPU creation in a few flows that don't
> > take kvm->lock, but that should be ok from an ABI perspective.  False
> > positives (vCPU creation fails) are benign, and false negatives (vCPU
> > created after the check) are inherently racy, i.e. userspace can't
> > guarantee the vCPU sees any particular ordering.
> > 
> > So this?
> > 
> > 	if (READ_ONCE(kvm->created_vcpus)) {
> > 		synchronize_srcu_expedited(&kvm->srcu);
> > 		kfree(bus);
> > 	} else {
> > 		call_srcu(&kvm->srcu, &bus->rcu, free_kvm_io_bus);
> > 	}
> 
> 
> If call_srcu() can only be used before vCPUs are created, the above will
> have little effect, since most devices are created after the vCPUs.

Is that something that can be "fixed" in userspace?  I.e. why are devices being
created after vCPUs?

> We want to optimize ioeventfd creation, since a VM will create lots of
> ioeventfds,

Ah, so this isn't about device creation from userspace, rather it's about reacting
to the guest's configuration of a device, e.g. to register doorbells when the guest
instantiates queues for a device?

> can ioeventfd use call_srcu()?

No, because that has the same problem of KVM not ensuring vCPUs will observe the
change before returning to userspace.

Unfortunately, I don't see an easy solution.  At a glance, every architecture
except arm64 could switch to protect kvm->buses with a rwlock, but arm64 uses
the MMIO bus for the vGIC's ITS, and I don't think it's feasible to make the ITS
stuff play nice with a rwlock.  E.g. vgic_its.its_lock and vgic_its.cmd_lock are
mutexes, and there are multiple ITS paths that access guest memory, i.e. might
sleep due to faulting.

Even if we did something x86-centric, e.g. further special case KVM_FAST_MMIO_BUS
with a rwlock, I worry that using a rwlock would degrade steady state performance,
e.g. due to cross-CPU atomic accesses.

Does using a dedicated SRCU structure resolve the issue?  E.g. add and use
kvm->buses_srcu instead of kvm->srcu?  x86's usage of the MMIO/PIO buses is
limited to kvm_io_bus_{read,write}(), so it should be easy enough to do a super
quick and dirty PoC.
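
Concretely, the PoC I have in mind is roughly this (untested sketch; "buses_srcu"
is a strawman name, and the srcu init/cleanup in kvm_create_vm() and
kvm_destroy_vm() is omitted):

	struct kvm {
		...
		struct srcu_struct buses_srcu;	/* covers kvm->buses[] only */
		...
	};

	/* kvm_io_bus_register_dev(): buses_srcu readers are short-lived, so
	 * the expedited sync should be cheap even with vCPUs running. */
	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
	synchronize_srcu_expedited(&kvm->buses_srcu);
	kfree(bus);

	/* kvm_io_bus_{read,write}(): take buses_srcu instead of kvm->srcu. */
	idx = srcu_read_lock(&kvm->buses_srcu);
	r = __kvm_io_bus_read(vcpu, bus, &range, val);
	srcu_read_unlock(&kvm->buses_srcu, idx);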

Thread overview: 6+ messages
2025-04-23  9:25 [PATCH] KVM: Use call_rcu() in kvm_io_bus_register_dev lirongqing
2025-04-23 14:59 ` Sean Christopherson
2025-04-24  2:56   ` Reply: [????] " Li,Rongqing
2025-05-07 14:32     ` Sean Christopherson [this message]
2025-05-12  5:03       ` Reply: [????] Re: ??: " Li,Rongqing
2025-04-29  2:13   ` Reply: " Li,Rongqing
