All of lore.kernel.org
 help / color / mirror / Atom feed
From: yu.c.zhang@linux.intel.com
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jason Gunthorpe <jgg@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH 1/3] KVM: VMX: Retry APIC-access page reload if invalidation is in-progress
Date: Wed, 7 Jun 2023 15:40:37 +0800	[thread overview]
Message-ID: <20230607073728.vggwcoylibj3cp6s@linux.intel.com> (raw)
In-Reply-To: <20230602011518.787006-2-seanjc@google.com>

On Thu, Jun 01, 2023 at 06:15:16PM -0700, Sean Christopherson wrote:
> Re-request an APIC-access page reload if there is a relevant mmu_notifier
> invalidation in-progress when KVM retrieves the backing pfn, i.e. stall
> vCPUs until the backing pfn for the APIC-access page is "officially"
> stable.  Relying on the primary MMU to not make changes after invoking
> ->invalidate_range() works, e.g. any additional changes to a PRESENT PTE
> would also trigger an ->invalidate_range(), but using ->invalidate_range()
> to fudge around KVM not honoring past and in-progress invalidations is a
> bit hacky.
> 
> Honoring invalidations will allow using KVM's standard mmu_notifier hooks
> to detect APIC-access page reloads, which will in turn allow removing
> KVM's implementation of ->invalidate_range() (the APIC-access page case is
> a true one-off).
> 
> Opportunistically add a comment to explain why doing nothing if a memslot
> isn't found is functionally correct.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 50 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 44fb619803b8..59195f0dc7a5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6708,7 +6708,12 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
>  
>  static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  {
> -	struct page *page;
> +	const gfn_t gfn = APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	struct kvm_memory_slot *slot;
> +	unsigned long mmu_seq;
> +	kvm_pfn_t pfn;
>  
>  	/* Defer reload until vmcs01 is the current VMCS. */
>  	if (is_guest_mode(vcpu)) {
> @@ -6720,18 +6725,53 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  	    SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
>  		return;
>  
> -	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
> -	if (is_error_page(page))
> +	/*
> +	 * Grab the memslot so that the hva lookup for the mmu_notifier retry
> +	 * is guaranteed to use the same memslot as the pfn lookup, i.e. rely
> +	 * on the pfn lookup's validation of the memslot to ensure a valid hva
> +	 * is used for the retry check.
> +	 */
> +	slot = id_to_memslot(slots, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT);
> +	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
>  		return;
>  
> -	vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
> +	/*
> +	 * Ensure that the mmu_notifier sequence count is read before KVM
> +	 * retrieves the pfn from the primary MMU.  Note, the memslot is
> +	 * protected by SRCU, not the mmu_notifier.  Pairs with the smp_wmb()
> +	 * in kvm_mmu_invalidate_end().
> +	 */
> +	mmu_seq = kvm->mmu_invalidate_seq;
> +	smp_rmb();
> +
> +	/*
> +	 * No need to retry if the memslot does not exist or is invalid.  KVM
> +	 * controls the APIC-access page memslot, and only deletes the memslot
> +	 * if APICv is permanently inhibited, i.e. the memslot won't reappear.
> +	 */
> +	pfn = gfn_to_pfn_memslot(slot, gfn);
> +	if (is_error_noslot_pfn(pfn))
> +		return;
> +
> +	read_lock(&vcpu->kvm->mmu_lock);
> +	if (mmu_invalidate_retry_hva(kvm, mmu_seq,
> +				     gfn_to_hva_memslot(slot, gfn))) {
> +		kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);

Are the mmu_invalidate_retry_hva() and the following request meant to stall
the vCPU if there's on going invalidation? 

If yes, may I ask how would the vCPU be stalled?

Below are my understandings and confusions about this process. I must have
missed something, waiting to be educated... :) 

When the backing page of APIC access page is to be reclaimed:
1> kvm_mmu_notifier_invalidate_range_start() -> __kvm_handle_hva_range() will
increase the kvm->mmu_invalidate_in_progress and account the start/end of this
page in kvm_mmu_invalidate_begin().
2> And then kvm_unmap_gfn_range() will zap the TDP, and send the request,
KVM_REQ_APIC_PAGE_RELOAD, to all vCPUs.
3> While a vCPU tries to reload the APIC access page before entering the guest,
kvm->mmu_invalidate_in_progress will be checked in mmu_invalidate_retry_hva(),
but it is possible(or is it?) that the kvm->mmu_invalidate_in_progess is still
positive, so KVM_REQ_APIC_PAGE_RELOAD is set again. No reload, and no TLB flush.
4> So what if the vCPU resumes with KVM_REQ_APIC_PAGE_RELOAD & KVM_REQ_TLB_FLUSH
flags being set, yet APIC access page is not reloaded and TLB is not flushed? Or,
will this happen?

One more dumb question - why does KVM not just pin the APIC access page? 

> +		read_unlock(&vcpu->kvm->mmu_lock);
> +		goto out;
> +	}
> +
> +	vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
> +	read_unlock(&vcpu->kvm->mmu_lock);
> +
>  	vmx_flush_tlb_current(vcpu);
>  
> +out:
>  	/*
>  	 * Do not pin apic access page in memory, the MMU notifier
>  	 * will call us again if it is migrated or swapped out.
>  	 */
> -	put_page(page);
> +	kvm_release_pfn_clean(pfn);
>  }
>  
>  static void vmx_hwapic_isr_update(int max_isr)
> -- 
> 2.41.0.rc2.161.g9c6817b8e7-goog
> 

B.R.
Yu

  parent reply	other threads:[~2023-06-07  7:44 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-02  1:15 [PATCH 0/3] KVM: x86: Use "standard" mmu_notifier hook for APIC page Sean Christopherson
2023-06-02  1:15 ` [PATCH 1/3] KVM: VMX: Retry APIC-access page reload if invalidation is in-progress Sean Christopherson
2023-06-06  2:11   ` Alistair Popple
2023-06-06 17:22     ` Sean Christopherson
2023-06-06 17:39       ` Jason Gunthorpe
2023-06-07  7:40   ` yu.c.zhang [this message]
2023-06-07 14:30     ` Sean Christopherson
2023-06-07 17:23       ` Yu Zhang
2023-06-07 17:53         ` Sean Christopherson
2023-06-08  7:00           ` Yu Zhang
2023-06-13 19:07             ` Sean Christopherson
2023-06-17  3:45               ` Yu Zhang
2023-06-22 23:02                 ` Sean Christopherson
2023-06-02  1:15 ` [PATCH 2/3] KVM: x86: Use standard mmu_notifier invalidate hooks for APIC access page Sean Christopherson
2023-06-06  2:20   ` Alistair Popple
2023-06-02  1:15 ` [PATCH 3/3] KVM: x86/mmu: Trigger APIC-access page reload iff vendor code cares Sean Christopherson
2023-06-05 10:15 ` [PATCH 0/3] KVM: x86: Use "standard" mmu_notifier hook for APIC page Paolo Bonzini
2023-06-06 16:43 ` Jason Gunthorpe
2023-06-07  0:55 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230607073728.vggwcoylibj3cp6s@linux.intel.com \
    --to=yu.c.zhang@linux.intel.com \
    --cc=apopple@nvidia.com \
    --cc=jgg@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=robin.murphy@arm.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.