From: Sean Christopherson <seanjc@google.com>
To: yu.c.zhang@linux.intel.com
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Jason Gunthorpe <jgg@nvidia.com>,
Alistair Popple <apopple@nvidia.com>,
Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH 1/3] KVM: VMX: Retry APIC-access page reload if invalidation is in-progress
Date: Wed, 7 Jun 2023 07:30:04 -0700 [thread overview]
Message-ID: <ZICUbIF2+Cvbb9GM@google.com> (raw)
In-Reply-To: <20230607073728.vggwcoylibj3cp6s@linux.intel.com>
On Wed, Jun 07, 2023, yu.c.zhang@linux.intel.com wrote:
> On Thu, Jun 01, 2023 at 06:15:16PM -0700, Sean Christopherson wrote:
> > + pfn = gfn_to_pfn_memslot(slot, gfn);
> > + if (is_error_noslot_pfn(pfn))
> > + return;
> > +
> > + read_lock(&vcpu->kvm->mmu_lock);
> > + if (mmu_invalidate_retry_hva(kvm, mmu_seq,
> > + gfn_to_hva_memslot(slot, gfn))) {
> > + kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
>
> Are the mmu_invalidate_retry_hva() and the following request meant to stall
> the vCPU if there's on going invalidation?
Yep.
> If yes, may I ask how would the vCPU be stalled?
>
> Below are my understandings and confusions about this process. I must have
> missed something, waiting to be educated... :)
>
> When the backing page of APIC access page is to be reclaimed:
> 1> kvm_mmu_notifier_invalidate_range_start() -> __kvm_handle_hva_range() will
> increase the kvm->mmu_invalidate_in_progress and account the start/end of this
> page in kvm_mmu_invalidate_begin().
> 2> And then kvm_unmap_gfn_range() will zap the TDP, and send the request,
> KVM_REQ_APIC_PAGE_RELOAD, to all vCPUs.
> 3> While a vCPU tries to reload the APIC access page before entering the guest,
> kvm->mmu_invalidate_in_progress will be checked in mmu_invalidate_retry_hva(),
> but it is possible(or is it?) that the kvm->mmu_invalidate_in_progess is still
> positive, so KVM_REQ_APIC_PAGE_RELOAD is set again. No reload, and no TLB flush.
> 4> So what if the vCPU resumes with KVM_REQ_APIC_PAGE_RELOAD & KVM_REQ_TLB_FLUSH
> flags being set, yet APIC access page is not reloaded and TLB is not flushed? Or,
> will this happen?
Pending requests block KVM from actually entering the guest. If a request comes
in after vcpu_enter_guest()'s initial handling of requests, KVM will bail before
VM-Enter and go back through the entire "outer" run loop.
This isn't necessarily the most efficient way to handle the stall, e.g. KVM does
a fair bit of prep for VM-Enter before detecting the pending request. The
alternative would be to have kvm_vcpu_reload_apic_access_page() return value
instructing vcpu_enter_guest() whether to bail immediately or continue on. I
elected for the re-request approach because (a) it didn't require redefining the
kvm_x86_ops vendor hook, (b) this should be a rare situation and not performance
critical overall, and (c) there's no guarantee that bailing immediately would
actually yield better performance from the guest's perspective, e.g. if there are
other pending requests/work, then the KVM can handle those items while the vCPU
is stalled instead of waiting until the invalidation completes to proceed.
> One more dumb question - why does KVM not just pin the APIC access page?
Definitely not a dumb question, I asked myself the same thing multiple times when
looking at this :-) Pinning the page would be easier, and KVM actually did that
in the original implementation. The issue is in how KVM allocates the backing
page. It's not a traditional kernel allocation, but is instead anonymous memory
allocated by way of vm_mmap(), i.e. for all intents and purposes it's a user
allocation. That means the kernel expects it to be a regular movable page, e.g.
it's entirely possible the page (if it were pinned) could be the only page in a
2MiB chunk preventing the kernel from migrating/compacting and creating a hugepage.
In hindsight, I'm not entirely convinced that unpinning the page was the right
choice, as it resulted in a handful of nasty bugs. But, now that we've fixed all
those bugs (knock wood), there's no good argument for undoing all of that work.
Because while the code is subtle and requires hooks in a few paths, it's not *that*
complex and for the most part doesn't require active maintenance.
static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
{
if (kvm_request_pending(vcpu)) { <= check if any request is pending
if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
kvm_vcpu_reload_apic_access_page(vcpu); <= re-requests APIC_PAGE_RELOAD
...
}
...
preempt_disable();
static_call(kvm_x86_prepare_switch_to_guest)(vcpu);
<host => guest bookkeeping>
if (kvm_vcpu_exit_request(vcpu)) { <= detects the new pending request
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
local_irq_enable();
preempt_enable();
kvm_vcpu_srcu_read_lock(vcpu);
r = 1;
goto cancel_injection; <= bails from actually entering the guest
}
if (req_immediate_exit) {
kvm_make_request(KVM_REQ_EVENT, vcpu);
static_call(kvm_x86_request_immediate_exit)(vcpu);
}
for (;;) {
<inner run / VM-Enter loop>
}
<VM-Exit path>
r = static_call(kvm_x86_handle_exit)(vcpu, exit_fastpath);
return r;
cancel_injection:
if (req_immediate_exit)
kvm_make_request(KVM_REQ_EVENT, vcpu);
static_call(kvm_x86_cancel_injection)(vcpu);
if (unlikely(vcpu->arch.apic_attention))
kvm_lapic_sync_from_vapic(vcpu);
out:
return r;
}
next prev parent reply other threads:[~2023-06-07 14:30 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-02 1:15 [PATCH 0/3] KVM: x86: Use "standard" mmu_notifier hook for APIC page Sean Christopherson
2023-06-02 1:15 ` [PATCH 1/3] KVM: VMX: Retry APIC-access page reload if invalidation is in-progress Sean Christopherson
2023-06-06 2:11 ` Alistair Popple
2023-06-06 17:22 ` Sean Christopherson
2023-06-06 17:39 ` Jason Gunthorpe
2023-06-07 7:40 ` yu.c.zhang
2023-06-07 14:30 ` Sean Christopherson [this message]
2023-06-07 17:23 ` Yu Zhang
2023-06-07 17:53 ` Sean Christopherson
2023-06-08 7:00 ` Yu Zhang
2023-06-13 19:07 ` Sean Christopherson
2023-06-17 3:45 ` Yu Zhang
2023-06-22 23:02 ` Sean Christopherson
2023-06-02 1:15 ` [PATCH 2/3] KVM: x86: Use standard mmu_notifier invalidate hooks for APIC access page Sean Christopherson
2023-06-06 2:20 ` Alistair Popple
2023-06-02 1:15 ` [PATCH 3/3] KVM: x86/mmu: Trigger APIC-access page reload iff vendor code cares Sean Christopherson
2023-06-05 10:15 ` [PATCH 0/3] KVM: x86: Use "standard" mmu_notifier hook for APIC page Paolo Bonzini
2023-06-06 16:43 ` Jason Gunthorpe
2023-06-07 0:55 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZICUbIF2+Cvbb9GM@google.com \
--to=seanjc@google.com \
--cc=apopple@nvidia.com \
--cc=jgg@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=robin.murphy@arm.com \
--cc=yu.c.zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox