From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Paolo Bonzini <pbonzini@redhat.com>,
x86@kernel.org, Borislav Petkov <bp@alien8.de>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86: Fix a semi theoretical bug in kvm_arch_async_page_present_queued
Date: Mon, 18 Aug 2025 11:08:38 -0700 [thread overview]
Message-ID: <aKNsJoslekEMI-FT@google.com> (raw)
In-Reply-To: <20250813192313.132431-3-mlevitsk@redhat.com>
On Wed, Aug 13, 2025, Maxim Levitsky wrote:
> Fix a semi theoretical race condition in reading of page_ready_pending
> in kvm_arch_async_page_present_queued.
This needs to explain what can actually go wrong if the race is "hit". After
staring at all of this for far, far too long, I'm 99.9% confident the race is
benign.
If the worker "incorrectly" sees pageready_pending as %false, then the result
is simply a spurious kick, and that spurious kick is all but guaranteed to be a
nop since if kvm_arch_async_page_present() is setting the flag, then (a) the
vCPU isn't blocking and (b) isn't IN_GUEST_MODE and thus won't be IPI'd.
If the worker incorrectly sees pageready_pending as %true, then the vCPU has
*just* written MSR_KVM_ASYNC_PF_ACK, and is guaranteed to observe and process
KVM_REQ_APF_READY before re-entering the guest, and the sole purpose of the kick
is to ensure the request is processed.
> Only trust the value of page_ready_pending if the guest is about to
> enter guest mode (vcpu->mode).
This is misleading, e.g. IN_GUEST_MODE can be true if the vCPU just *exited*.
All IN_GUEST_MODE says is that the vCPU task is somewhere in KVM's inner run loop.
> To achieve this, read the vcpu->mode using smp_load_acquire which is
> paired with smp_store_release in vcpu_enter_guest.
>
> Then only if vcpu_mode is IN_GUEST_MODE, trust the value of the
> page_ready_pending because it was written before and therefore its correct
> value is visible.
>
> Also if the above mentioned check is true, avoid raising the request
> on the target vCPU.
Why? At worst, a dangling KVM_REQ_APF_READY will cause KVM to bail from the
fastpath when it's not strictly necessary to do so. On the other hand, a missing
request could hang the guest. So I don't see any reason to try and be super
precise when setting KVM_REQ_APF_READY.
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> arch/x86/kvm/x86.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9018d56b4b0a..3d45a4cd08a4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13459,9 +13459,14 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>
> void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu)
> {
> - kvm_make_request(KVM_REQ_APF_READY, vcpu);
> - if (!vcpu->arch.apf.pageready_pending)
> + /* Pairs with smp_store_release in vcpu_enter_guest. */
> + bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
In terms of arch.apf.pageready_pending being modified, it's not IN_GUEST_MODE
that's special, it's OUTSIDE_GUEST_MODE that's special, because that's the only
time the task that hold vcpu->mutex can clear pageready_pending.
> + bool page_ready_pending = READ_ONCE(vcpu->arch.apf.pageready_pending);
This should be paired with WRITE_ONCE() on the vCPU.
> +
> + if (!in_guest_mode || !page_ready_pending) {
> + kvm_make_request(KVM_REQ_APF_READY, vcpu);
> kvm_vcpu_kick(vcpu);
> + }
Given that the race is guaranteed to be bening (assuming my analysis is correct),
I definitely think there should be a comment here explaining that pageready_pending
is "technically unstable". Otherwise, it takes a lot of staring to understand
what this code is actually doing.
I also think it makes sense to do the bare minimum for OUTSIDE_GUEST_MODE, which
is to wake the vCPU. Because calling kvm_vcpu_kick() when the vCPU is known to
not be IN_GUEST_MODE is weird.
For the code+comment, how about this?
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bdf7ef0b535..d721fab3418d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4000,7 +4000,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
return 1;
if (data & 0x1) {
- vcpu->arch.apf.pageready_pending = false;
+ WRITE_ONCE(vcpu->arch.apf.pageready_pending, false);
kvm_check_async_pf_completion(vcpu);
}
break;
@@ -13457,7 +13457,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
if ((work->wakeup_all || work->notpresent_injected) &&
kvm_pv_async_pf_enabled(vcpu) &&
!apf_put_user_ready(vcpu, work->arch.token)) {
- vcpu->arch.apf.pageready_pending = true;
+ WRITE_ONCE(vcpu->arch.apf.pageready_pending, true);
kvm_apic_set_irq(vcpu, &irq, NULL);
}
@@ -13468,7 +13468,20 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu)
{
kvm_make_request(KVM_REQ_APF_READY, vcpu);
- if (!vcpu->arch.apf.pageready_pending)
+
+ /*
+ * Don't kick the vCPU if it has an outstanding "page ready" event as
+ * KVM won't be able to deliver the next "page ready" token until the
+ * outstanding one is handled. Ignore pageready_pending if the vCPU is
+ * outside "guest mode", i.e. if KVM might be sending "page ready" or
+ * servicing a MSR_KVM_ASYNC_PF_ACK write, as the flag is technically
+ * unstable. However, in that case, there's obviously no need to kick
+ * the vCPU out of the guest, so just ensure the vCPU is awakened if
+ * it's blocking.
+ */
+ if (smp_load_acquire(vcpu->mode) == OUTSIDE_GUEST_MODE)
+ kvm_vcpu_wake_up(vcpu);
+ else if (!READ_ONCE(vcpu->arch.apf.pageready_pending))
kvm_vcpu_kick(vcpu);
}
next prev parent reply other threads:[~2025-08-18 18:08 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-13 19:23 [PATCH 0/3] Fix a lost async pagefault notification when the guest is using SMM Maxim Levitsky
2025-08-13 19:23 ` [PATCH 1/3] KVM: x86: Warn if KVM tries to deliver an #APF completion when APF is not enabled Maxim Levitsky
2025-08-13 19:23 ` [PATCH 2/3] KVM: x86: Fix a semi theoretical bug in kvm_arch_async_page_present_queued Maxim Levitsky
2025-08-18 18:08 ` Sean Christopherson [this message]
2025-09-23 15:58 ` Paolo Bonzini
2025-09-23 16:23 ` Paolo Bonzini
2025-09-23 18:55 ` Sean Christopherson
2025-09-23 19:28 ` Paolo Bonzini
2025-09-23 23:14 ` Sean Christopherson
2025-10-27 15:00 ` Sean Christopherson
2025-10-30 19:57 ` mlevitsk
2025-10-30 20:28 ` Sean Christopherson
2025-08-13 19:23 ` [PATCH 3/3] KVM: x86: Fix the interaction between SMM and the asynchronous pagefault Maxim Levitsky
2025-08-18 18:20 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aKNsJoslekEMI-FT@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.