From: Sean Christopherson <seanjc@google.com>
To: Jon Kohler <jon@nutanix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Bijan Mottahedeh <bijan.mottahedeh@nutanix.com>,
	Raphael Norwitz <raphael.norwitz@nutanix.com>,
	Junaid Shahid <junaids@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"x86@kernel.org" <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] kvm: x86: move srcu lock out of kvm_vcpu_check_block
Date: Wed, 19 May 2021 21:53:24 +0000
Message-ID: <YKWI1GPdNc4shaCt@google.com>
In-Reply-To: <70B34A15-C4A1-4227-B037-7B26B40EDBFE@nutanix.com>

On Wed, May 05, 2021, Jon Kohler wrote:
> 
> > On May 1, 2021, at 9:05 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> > On 30/04/21 22:45, Sean Christopherson wrote:
> >> On Wed, Apr 28, 2021, Jon Kohler wrote:
> >>> To improve performance, this moves kvm->srcu lock logic from
> >>> kvm_vcpu_check_block to kvm_vcpu_running and wraps directly around
> >>> check_events. Also adds a hint for callers to tell
> >>> kvm_vcpu_running whether or not to acquire srcu, which is useful in
> >>> situations where the lock may already be held. With this in place, we
> >>> see roughly 5% improvement in an internal benchmark [3] and no more
> >>> impact from this lock on non-nested workloads.
> >> ...
> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>> index efc7a82ab140..354f690cc982 100644
> >>> --- a/arch/x86/kvm/x86.c
> >>> +++ b/arch/x86/kvm/x86.c
> >>> @@ -9273,10 +9273,24 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
> >>>  	return 1;
> >>>  }
> >>> 
> >>> -static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
> >>> +static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu, bool acquire_srcu)
> >>>  {
> >>> -	if (is_guest_mode(vcpu))
> >>> -		kvm_x86_ops.nested_ops->check_events(vcpu);
> >>> +	if (is_guest_mode(vcpu)) {
> >>> +		if (acquire_srcu) {
> >>> +			/*
> >>> +			 * We need to lock because check_events could call
> >>> +			 * nested_vmx_vmexit() which might need to resolve a
> >>> +			 * valid memslot. We will have this lock only when
> >>> +			 * called from vcpu_run but not when called from
> >>> +			 * kvm_vcpu_check_block > kvm_arch_vcpu_runnable.
> >>> +			 */
> >>> +			int idx = srcu_read_lock(&vcpu->kvm->srcu);
> >>> +			kvm_x86_ops.nested_ops->check_events(vcpu);
> >>> +			srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >>> +		} else {
> >>> +			kvm_x86_ops.nested_ops->check_events(vcpu);
> >>> +		}
> >>> +	}
> >> Obviously not your fault, but I absolutely detest calling check_events() from
> >> kvm_vcpu_running.  I would much prefer to make baby steps toward cleaning up the
> >> existing mess instead of piling more weirdness on top.
> >>
> >> Ideally, APICv support would be fixed to not require a deep probe into nested
> >> events just to see if a vCPU can run.  But, that's probably more than we want to
> >> bite off at this time.
> >>
> >> What if we add another nested_ops API to check if the vCPU has an event, but not
> >> actually process the event?  I think that would allow eliminating the SRCU lock,
> >> and would get rid of the most egregious behavior of triggering a nested VM-Exit
> >> in a seemingly innocuous helper.
> >>
> >> If this works, we could even explore moving the call to nested_ops->has_events()
> >> out of kvm_vcpu_running() and into kvm_vcpu_has_events(); I can't tell if the
> >> side effects in vcpu_block() would get messed up with that change :-/
> >> Incomplete patch...
> > 
> > I think it doesn't even have to be *nested* events.  Most events are the
> > same inside or outside guest mode, as they already special case guest mode
> > inside the kvm_x86_ops callbacks (e.g. kvm_arch_interrupt_allowed is
> > already called by kvm_vcpu_has_events).
> > 
> > I think we only need to extend kvm_x86_ops.nested_ops->hv_timer_pending to
> > cover MTF, plus double check that INIT and SIPI are handled correctly, and
> > then the call to check_nested_events can go away.
> 
> Thanks, Paolo, Sean. I appreciate the prompt response. Sorry for the slow
> reply, I was out with a hand injury for a few days and am caught up now.
> 
> Just to confirm: in the spirit of baby steps, I’m happy to take up the idea
> that Sean mentioned; that makes sense to me. Perhaps we can do that small
> patch and leave a TODO to do a tuneup for hv_timer_pending and the other
> double checks Paolo mentioned.

Paolo was pointing out that kvm_vcpu_has_events() already checks hv_timer_pending,
and that we could add the few missing nested event cases to kvm_vcpu_has_events()
instead of wholesale checking everything that's in check_nested_events().

I believe that would work, as I suspect the underlying bug that was alluded to
by commit 0ad3bed6c5ec ("kvm: nVMX: move nested events check to kvm_vcpu_running")
has since been fixed.  But, I'm not sure it makes much difference in practice
since we'll likely end up with nested_ops->has_events() either way.

Staring a bit more, I'm pretty sure hv_timer_pending() can be made obsolete and
dropped.  Unless Paolo objects, I still like my original proposal.

I think the safest approach from a bisection standpoint would be to do this in
3-4 stages:

  1. Refactor check_nested_events() to split out a has_events() helper.
  2. Move the has_events() call from kvm_vcpu_running() into kvm_vcpu_has_events().
  3. Drop the explicit hv_timer_pending() in inject_pending_event().  It should
     be dead code since it's just a pointer to nested_vmx_preemption_timer_pending(),
     which is handled by vmx_check_nested_events() and called earlier.
  4. Drop the explicit hv_timer_pending() in kvm_vcpu_has_events() for the same
     reasons as (3).  This can also drop hv_timer_pending() entirely.
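
To make stages 1 and 2 concrete, here is a rough, standalone sketch of the
split.  It is a toy model, not kernel code: struct toy_vcpu and every toy_*
function are hypothetical names invented for illustration, and the real
has_events() would have to look at actual nested state (preemption timer, MTF,
INIT/SIPI, etc.) rather than a single flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for vCPU state; NOT the kernel's struct kvm_vcpu. */
struct toy_vcpu {
	bool guest_mode;           /* models is_guest_mode() */
	bool nested_event_pending; /* models e.g. an expired preemption timer */
	bool other_event_pending;  /* models the existing non-nested checks */
	int nested_exits;          /* counts simulated nested VM-Exits */
};

/*
 * Stage 1: a pure query split out of the event processing.  Taking a
 * const pointer lets the compiler enforce that nothing is modified here.
 */
static bool toy_nested_has_events(const struct toy_vcpu *vcpu)
{
	return vcpu->guest_mode && vcpu->nested_event_pending;
}

/*
 * The processing path keeps the side effects: it consumes the event and
 * simulates a nested VM-Exit by leaving guest mode.
 */
static void toy_nested_check_events(struct toy_vcpu *vcpu)
{
	if (!toy_nested_has_events(vcpu))
		return;

	vcpu->nested_event_pending = false;
	vcpu->guest_mode = false;
	vcpu->nested_exits++;
}

/*
 * Stage 2: the "can this vCPU run?" check consults only the query, so
 * asking the question never triggers a nested VM-Exit.
 */
static bool toy_vcpu_has_events(const struct toy_vcpu *vcpu)
{
	return vcpu->other_event_pending || toy_nested_has_events(vcpu);
}
```

In this model, calling toy_vcpu_has_events() any number of times leaves the
vCPU untouched; only toy_nested_check_events() changes state.  Whether the
real query can avoid touching guest memory, and with it the SRCU lock, is the
open question from earlier in the thread, which the sketch does not answer.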

> Or would you rather skip that approach and go right to making
> check_nested_events go-away first? Guessing that might be a larger effort
> with more nuances though?
