public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
	 "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	x86@kernel.org, Borislav Petkov <bp@alien8.de>,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] KVM: x86: Fix the interaction between SMM and the asynchronous pagefault
Date: Mon, 18 Aug 2025 11:20:26 -0700	[thread overview]
Message-ID: <aKNu6gYNO1j_Wpdj@google.com> (raw)
In-Reply-To: <20250813192313.132431-4-mlevitsk@redhat.com>

On Wed, Aug 13, 2025, Maxim Levitsky wrote:
> Currently a #SMI can cause KVM to drop an #APF ready event and
> subsequently causes the guest to never resume the task that is waiting
> for it.
> This can result in tasks becoming permanently stuck within the guest.
> 
> This happens because KVM flushes the APF queue without notifying the guest
> of completed APF requests when the guest exits to real mode.
> 
> And the SMM exit code calls kvm_set_cr0 with CR.PE == 0, which triggers
> this code.
> 
> It must be noted that while this flush is reasonable to do for the actual
> real mode entry, it is actually achieves nothing because it is too late to
> flush this queue on SMM exit.
> 
> To fix this, avoid doing this flush altogether, and handle the real
> mode entry/exits in the same way KVM already handles the APIC
> enable/disable events:
> 
> APF completion events are not injected while APIC is disabled,
> and once APIC is re-enabled, KVM raises the KVM_REQ_APF_READY request
> which causes the first pending #APF ready event to be injected prior
> to entry to the guest mode.
> 
> This change also has the side benefit of preserving #APF events if the
> guest temporarily enters real mode - for example, to call firmware -
> although such usage should be extermery rare in modern operating systems.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 11 +++++++----
>  arch/x86/kvm/x86.h |  1 +
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3d45a4cd08a4..5dfe166025bf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1118,15 +1118,18 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>  	}
>  
>  	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
> -		kvm_clear_async_pf_completion_queue(vcpu);
> -		kvm_async_pf_hash_reset(vcpu);
> -
>  		/*
>  		 * Clearing CR0.PG is defined to flush the TLB from the guest's
>  		 * perspective.
>  		 */
>  		if (!(cr0 & X86_CR0_PG))
>  			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +
> +		/*
> +		 * Re-check APF completion events, when the guest re-enables paging.
> +		 */
> +		if ((cr0 & X86_CR0_PG) && kvm_pv_async_pf_enabled(vcpu))

I'm tempted to make this an elif, i.e.

		if (!(cr0 & X86_CR0_PG))
			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
		else if (kvm_pv_async_pf_enabled(vcpu))
			kvm_make_request(KVM_REQ_APF_READY, vcpu);

In theory, that could set us up to fail if another CR0.PG=1 case is added, but I
like to think future us will be smart enough to turn it into:

		if (!(cr0 & X86_CR0_PG)) {
			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
		} else {
			if (kvm_pv_async_pf_enabled(vcpu))
				kvm_make_request(KVM_REQ_APF_READY, vcpu);

			if (<other thing>)
				...
		}


> +			kvm_make_request(KVM_REQ_APF_READY, vcpu);
>  	}
>  
>  	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
> @@ -3547,7 +3550,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	return 0;
>  }
>  
> -static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
> +bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)

This is in the same file, there's no reason/need to expose this via x86.h.  The
overall diff is small enough that I'm comfortable hoisting this "up" as part of
the fix, especially since this needs to go to stable@.

If we use an elif, this?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bdf7ef0b535..2bc41e562314 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1030,6 +1030,13 @@ bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr)
 }
 EXPORT_SYMBOL_GPL(kvm_require_dr);
 
+static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
+{
+       u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
+
+       return (vcpu->arch.apf.msr_en_val & mask) == mask;
+}
+
 static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
 {
        return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2);
@@ -1122,15 +1129,15 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
        }
 
        if ((cr0 ^ old_cr0) & X86_CR0_PG) {
-               kvm_clear_async_pf_completion_queue(vcpu);
-               kvm_async_pf_hash_reset(vcpu);
-
                /*
                 * Clearing CR0.PG is defined to flush the TLB from the guest's
-                * perspective.
+                * perspective.  If the guest is (re)enabling, check for async
+                * #PFs that were completed while paging was disabled.
                 */
                if (!(cr0 & X86_CR0_PG))
                        kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+               else if (kvm_pv_async_pf_enabled(vcpu))
+                       kvm_make_request(KVM_REQ_APF_READY, vcpu);
        }
 
        if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
@@ -3524,13 +3531,6 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
        return 0;
 }
 
-static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
-{
-       u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
-
-       return (vcpu->arch.apf.msr_en_val & mask) == mask;
-}
-
 static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
        gpa_t gpa = data & ~0x3f;

      reply	other threads:[~2025-08-18 18:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-13 19:23 [PATCH 0/3] Fix a lost async pagefault notification when the guest is using SMM Maxim Levitsky
2025-08-13 19:23 ` [PATCH 1/3] KVM: x86: Warn if KVM tries to deliver an #APF completion when APF is not enabled Maxim Levitsky
2025-08-13 19:23 ` [PATCH 2/3] KVM: x86: Fix a semi theoretical bug in kvm_arch_async_page_present_queued Maxim Levitsky
2025-08-18 18:08   ` Sean Christopherson
2025-09-23 15:58   ` Paolo Bonzini
2025-09-23 16:23     ` Paolo Bonzini
2025-09-23 18:55     ` Sean Christopherson
2025-09-23 19:28       ` Paolo Bonzini
2025-09-23 23:14         ` Sean Christopherson
2025-10-27 15:00           ` Sean Christopherson
2025-10-30 19:57             ` mlevitsk
2025-10-30 20:28               ` Sean Christopherson
2025-08-13 19:23 ` [PATCH 3/3] KVM: x86: Fix the interaction between SMM and the asynchronous pagefault Maxim Levitsky
2025-08-18 18:20   ` Sean Christopherson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aKNu6gYNO1j_Wpdj@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox