Re: [PATCH] kvm: x86: move srcu lock out of kvm_vcpu_check_block

From: Sean Christopherson <seanjc@google.com>
To: Jon Kohler <jon@nutanix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Bijan Mottahedeh <bijan.mottahedeh@nutanix.com>,
	Raphael Norwitz <raphael.norwitz@nutanix.com>,
	Junaid Shahid <junaids@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"x86@kernel.org" <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] kvm: x86: move srcu lock out of kvm_vcpu_check_block
Date: Wed, 19 May 2021 21:53:24 +0000
Message-ID: <YKWI1GPdNc4shaCt@google.com>
In-Reply-To: <70B34A15-C4A1-4227-B037-7B26B40EDBFE@nutanix.com>

On Wed, May 05, 2021, Jon Kohler wrote:
> 
> > On May 1, 2021, at 9:05 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> > On 30/04/21 22:45, Sean Christopherson wrote:
> >> On Wed, Apr 28, 2021, Jon Kohler wrote:
> >>> To improve performance, this moves kvm->srcu lock logic from
> >>> kvm_vcpu_check_block to kvm_vcpu_running and wraps directly around
> >>> check_events. Also adds a hint for callers to tell
> >>> kvm_vcpu_running whether or not to acquire srcu, which is useful in
> >>> situations where the lock may already be held. With this in place, we
> >>> see roughly 5% improvement in an internal benchmark [3] and no more
> >>> impact from this lock on non-nested workloads.
> >> ...
> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>> index efc7a82ab140..354f690cc982 100644
> >>> --- a/arch/x86/kvm/x86.c
> >>> +++ b/arch/x86/kvm/x86.c
> >>> @@ -9273,10 +9273,24 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
> >>>  	return 1;
> >>>  }
> >>> 
> >>> -static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
> >>> +static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu, bool acquire_srcu)
> >>>  {
> >>> -	if (is_guest_mode(vcpu))
> >>> -		kvm_x86_ops.nested_ops->check_events(vcpu);
> >>> +	if (is_guest_mode(vcpu)) {
> >>> +		if (acquire_srcu) {
> >>> +			/*
> >>> +			 * We need to lock because check_events could call
> >>> +			 * nested_vmx_vmexit() which might need to resolve a
> >>> +			 * valid memslot. We will have this lock only when
> >>> +			 * called from vcpu_run but not when called from
> >>> +			 * kvm_vcpu_check_block > kvm_arch_vcpu_runnable.
> >>> +			 */
> >>> +			int idx = srcu_read_lock(&vcpu->kvm->srcu);
> >>> +			kvm_x86_ops.nested_ops->check_events(vcpu);
> >>> +			srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >>> +		} else {
> >>> +			kvm_x86_ops.nested_ops->check_events(vcpu);
> >>> +		}
> >>> +	}
> >> Obviously not your fault, but I absolutely detest calling check_events() from
> >> kvm_vcpu_running.  I would much prefer to make baby steps toward cleaning up the
> >> existing mess instead of piling more weirdness on top.
> >>
> >> Ideally, APICv support would be fixed to not require a deep probe into nested
> >> events just to see if a vCPU can run.  But, that's probably more than we want to
> >> bite off at this time.
> >>
> >> What if we add another nested_ops API to check if the vCPU has an event, but not
> >> actually process the event?  I think that would allow eliminating the SRCU lock,
> >> and would get rid of the most egregious behavior of triggering a nested VM-Exit
> >> in a seemingly innocuous helper.
> >>
> >> If this works, we could even explore moving the call to nested_ops->has_events()
> >> out of kvm_vcpu_running() and into kvm_vcpu_has_events(); I can't tell if the
> >> side effects in vcpu_block() would get messed up with that change :-/
> >> Incomplete patch...
> > 
> > I think it doesn't even have to be *nested* events.  Most events are the
> > same inside or outside guest mode, as they already special case guest mode
> > inside the kvm_x86_ops callbacks (e.g. kvm_arch_interrupt_allowed is
> > already called by kvm_vcpu_has_events).
> > 
> > I think we only need to extend kvm_x86_ops.nested_ops->hv_timer_pending to
> > cover MTF, plus double check that INIT and SIPI are handled correctly, and
> > then the call to check_nested_events can go away.
> 
> Thanks, Paolo, Sean. I appreciate the prompt response. Sorry for the slow
> reply, I was out with a hand injury for a few days and am caught up now.
> 
> Just to confirm - in the spirit of baby steps as Sean mentioned, I’m happy to
> take up the idea that Sean mentioned, that makes sense to me. Perhaps we can
> do that small patch and leave a TODO to do a tuneup for hv_timer_pending and the
> other double checks Paolo mentioned.

Paolo was pointing out that kvm_vcpu_has_events() already checks hv_timer_pending,
and that we could add the few missing nested event cases to kvm_vcpu_has_events()
instead of wholesale checking everything that's in check_nested_events().

I believe that would work, as I suspect the underlying bug that was alluded to
by commit 0ad3bed6c5ec ("kvm: nVMX: move nested events check to kvm_vcpu_running")
has since been fixed.  But, I'm not sure it makes much difference in practice
since we'll likely end up with nested_ops->has_events() either way.

Staring a bit more, I'm pretty sure hv_timer_pending() can be made obsolete and
dropped.  Unless Paolo objects, I still like my original proposal.

I think the safest approach from a bisection standpoint would be to do this in
3-4 stages:

  1. Refactor check_nested_events() to split out a has_events() helper.
  2. Move the has_events() call from kvm_vcpu_running() into kvm_vcpu_has_events().
  3. Drop the explicit hv_timer_pending() in inject_pending_event().  It should
     be dead code since it's just a pointer to nested_vmx_preemption_timer_pending(),
     which is handled by vmx_check_nested_events() and called earlier.
  4. Drop the explicit hv_timer_pending() in kvm_vcpu_has_events() for the same
     reasons as (3).  This can also drop hv_timer_pending() entirely.
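
For illustration only, here is a minimal userspace sketch of the split described
in stages 1 and 2: a side-effect-free query that the "can this vCPU run?" path
may call, kept separate from the path that actually processes the event. This
is not kernel code. Every name below (struct toy_vcpu and the toy_*() helpers)
is hypothetical, and the single nested_event_pending flag stands in for
whatever conditions the real check_nested_events() implementations look at.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the small part of vCPU state modeled here. */
struct toy_vcpu {
	bool guest_mode;           /* models is_guest_mode(vcpu) */
	bool nested_event_pending; /* e.g. an expired preemption timer */
	int nested_exits;          /* counts side effects, for the demo only */
};

/*
 * Stage 1: pure query.  It only reads state, so in this toy model a caller
 * can use it without worrying about triggering a nested VM-Exit.
 */
static bool toy_nested_has_events(const struct toy_vcpu *vcpu)
{
	return vcpu->guest_mode && vcpu->nested_event_pending;
}

/*
 * Processing path: consumes the event and "exits" to L1.  This is the part
 * with side effects, so it stays out of the runnable check.
 */
static void toy_nested_check_events(struct toy_vcpu *vcpu)
{
	if (!toy_nested_has_events(vcpu))
		return;

	vcpu->nested_event_pending = false;
	vcpu->guest_mode = false;
	vcpu->nested_exits++;
}

/* Stage 2: the runnable check consults only the query helper. */
static bool toy_vcpu_has_events(const struct toy_vcpu *vcpu)
{
	return toy_nested_has_events(vcpu);
}

/* Small self-check: querying must not change state, processing must. */
static int toy_demo(void)
{
	struct toy_vcpu vcpu = {
		.guest_mode = true,
		.nested_event_pending = true,
	};

	assert(toy_vcpu_has_events(&vcpu));
	assert(vcpu.nested_exits == 0);   /* the query had no side effects */

	toy_nested_check_events(&vcpu);
	assert(!toy_vcpu_has_events(&vcpu));

	return vcpu.nested_exits;         /* number of modeled nested exits */
}
```

Calling toy_demo() from a main() exercises both paths. The point of the sketch
is only that the query can be called repeatedly without changing state, while
event processing stays in a separate function.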

> Or would you rather skip that approach and go right to making
> check_nested_events go-away first? Guessing that might be a larger effort
> with more nuances though?

Thread overview: 7+ messages
2021-04-28 17:38 [PATCH] kvm: x86: move srcu lock out of kvm_vcpu_check_block Jon Kohler
2021-04-30 20:45 ` Sean Christopherson
2021-05-01 13:05   ` Paolo Bonzini
2021-05-05 15:46     ` Jon Kohler
2021-05-19 21:53       ` Sean Christopherson [this message]
2021-05-20 12:31         ` Paolo Bonzini
     [not found] <<20210428173820.13051-1-jon@nutanix.com>
2021-04-30  1:54 ` Prashanth Sreenivasa
