All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Kevin Cheng <chengkev@google.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  yosry.ahmed@linux.dev
Subject: Re: [PATCH V2 2/4] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
Date: Tue, 24 Feb 2026 08:42:50 -0800	[thread overview]
Message-ID: <aZ3VCq4s7l9f4JTw@google.com> (raw)
In-Reply-To: <20260224071822.369326-3-chengkev@google.com>

On Tue, Feb 24, 2026, Kevin Cheng wrote:
> When KVM emulates an instruction for L2 and encounters a nested page
> fault (e.g., during string I/O emulation), nested_svm_inject_npf_exit()
> injects an NPF to L1. However, the code incorrectly hardcodes
> (1ULL << 32) for exit_info_1's upper bits when the original exit was
> not an NPF. This always sets PFERR_GUEST_FINAL_MASK even when the fault
> occurred on a page table page, preventing L1 from correctly identifying
> the cause of the fault.
> 
> Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
> occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
> the fault occurs on the final GPA-to-HPA translation.
> 
> Widen error_code in struct x86_exception from u16 to u64 to accommodate
> the PFERR_GUEST_* bits (bits 32 and 33).

Stale comment as this was moved to a separate patch.

> Update nested_svm_inject_npf_exit() to use fault->error_code directly
> instead of hardcoding the upper bits. Also add a WARN_ON_ONCE if neither
> PFERR_GUEST_FINAL_MASK nor PFERR_GUEST_PAGE_MASK is set, as this would
> indicate a bug in the page fault handling code.
> 
> Signed-off-by: Kevin Cheng <chengkev@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/mmu/paging_tmpl.h  | 22 ++++++++++------------
>  arch/x86/kvm/svm/nested.c       | 19 +++++++++++++------
>  3 files changed, 25 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ff07c45e3c731..454f84660edfc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -280,6 +280,8 @@ enum x86_intercept_stage;
>  #define PFERR_GUEST_RMP_MASK	BIT_ULL(31)
>  #define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
>  #define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)
> +#define PFERR_GUEST_FAULT_STAGE_MASK \
> +	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)
>  #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
>  #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
>  #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 37eba7dafd14f..f148c92b606ba 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -385,18 +385,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  		real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn),
>  					     nested_access, &walker->fault);
>  
> -		/*
> -		 * FIXME: This can happen if emulation (for of an INS/OUTS
> -		 * instruction) triggers a nested page fault.  The exit
> -		 * qualification / exit info field will incorrectly have
> -		 * "guest page access" as the nested page fault's cause,
> -		 * instead of "guest page structure access".  To fix this,
> -		 * the x86_exception struct should be augmented with enough
> -		 * information to fix the exit_qualification or exit_info_1
> -		 * fields.
> -		 */
> -		if (unlikely(real_gpa == INVALID_GPA))
> +		if (unlikely(real_gpa == INVALID_GPA)) {
> +#if PTTYPE != PTTYPE_EPT

I would rather swap the order of patches two and three, so that we end up with
a "positive" if-statement.  I.e. add EPT first so that we get (spoiler alert):

#if PTTYPE == PTTYPE_EPT
			walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
#else
			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
#endif

> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de90b104a0dd5..1013e814168b5 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -40,18 +40,25 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
>  	struct vmcb *vmcb = svm->vmcb;
>  
>  	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> -		/*
> -		 * TODO: track the cause of the nested page fault, and
> -		 * correctly fill in the high bits of exit_info_1.
> -		 */
> -		vmcb->control.exit_code = SVM_EXIT_NPF;
> -		vmcb->control.exit_info_1 = (1ULL << 32);
> +		vmcb->control.exit_info_1 = fault->error_code;
>  		vmcb->control.exit_info_2 = fault->address;
>  	}
>  
> +	vmcb->control.exit_code = SVM_EXIT_NPF;
>  	vmcb->control.exit_info_1 &= ~0xffffffffULL;
>  	vmcb->control.exit_info_1 |= fault->error_code;
>  
> +	/*
> +	 * All nested page faults should be annotated as occurring on the
> +	 * final translation *or* the page walk. Arbitrarily choose "final"
> +	 * if KVM is buggy and enumerated both or neither.
> +	 */
> +	if (WARN_ON_ONCE(hweight64(vmcb->control.exit_info_1 &
> +				   PFERR_GUEST_FAULT_STAGE_MASK) != 1)) {
> +		vmcb->control.exit_info_1 &= ~PFERR_GUEST_FAULT_STAGE_MASK;
> +		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
> +	}

This is all kinds of messy.  KVM _appears_ to still rely on the hardware-reported
address + error_code

	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
		vmcb->control.exit_info_1 = fault->error_code;
		vmcb->control.exit_info_2 = fault->address;
	}

But then drops bits 31:0 in favor of the fault error code.  Then even more
bizarrely, bitwise-ORs bits 63:32 and WARNs if multiple bits in
PFERR_GUEST_FAULT_STAGE_MASK are set.  In practice, the bitwise-OR of 63:32 is
_only_ going to affect PFERR_GUEST_FAULT_STAGE_MASK, because the other defined
bits are all specific to SNP, and KVM doesn't support nested virtualization for
SEV+.

So I don't understand why this isn't simply:

	vmcb->control.exit_code = SVM_EXIT_NPF;
	vmcb->control.exit_info_1 = fault->error_code;

	/*
	 * All nested page faults should be annotated as occurring on the
	 * final translation *or* the page walk. Arbitrarily choose "final"
	 * if KVM is buggy and enumerated both or neither.
	 */
	if (WARN_ON_ONCE(hweight64(vmcb->control.exit_info_1 &
				   PFERR_GUEST_FAULT_STAGE_MASK) != 1)) {
		vmcb->control.exit_info_1 &= ~PFERR_GUEST_FAULT_STAGE_MASK;
		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
	}

Which would more or less align with the proposed nEPT handling.

> +
>  	nested_svm_vmexit(svm);
>  }
>  
> -- 
> 2.53.0.414.gf7e9f6c205-goog
> 

  reply	other threads:[~2026-02-24 16:42 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24  7:18 [PATCH V2 0/4] KVM: X86: Correctly populate nested page fault Kevin Cheng
2026-02-24  7:18 ` [PATCH V2 1/4] KVM: x86: Widen x86_exception's error_code to 64 bits Kevin Cheng
2026-02-24  7:18 ` [PATCH V2 2/4] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK Kevin Cheng
2026-02-24 16:42   ` Sean Christopherson [this message]
2026-02-24 16:53     ` Sean Christopherson
2026-03-05  3:50     ` Kevin Cheng
2026-03-05 19:46       ` Sean Christopherson
2026-03-13  4:50     ` Kevin Cheng
2026-03-13  5:36       ` Kevin Cheng
2026-02-24  7:18 ` [PATCH V2 3/4] KVM: VMX: Don't consult original exit qualification for nested EPT violation injection Kevin Cheng
2026-02-24 17:31   ` Sean Christopherson
2026-02-24 19:00     ` Yosry Ahmed
2026-02-24 19:37       ` Sean Christopherson
2026-02-24 19:42         ` Yosry Ahmed
2026-02-24 20:28           ` Sean Christopherson
2026-02-24  7:18 ` [PATCH V2 4/4] KVM: selftests: Add nested page fault injection test Kevin Cheng
2026-02-24 17:37   ` Sean Christopherson
2026-03-05  3:54     ` Kevin Cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZ3VCq4s7l9f4JTw@google.com \
    --to=seanjc@google.com \
    --cc=chengkev@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.