public inbox for linux-coco@lists.linux.dev
 help / color / mirror / Atom feed
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: "Huang, Kai" <kai.huang@intel.com>,
	"seanjc@google.com" <seanjc@google.com>
Cc: "Bae, Chang Seok" <chang.seok.bae@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"kas@kernel.org" <kas@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v2 5/6] KVM: x86: Track available/dirty register masks as "unsigned long" values
Date: Tue, 14 Apr 2026 10:12:37 +0800	[thread overview]
Message-ID: <95a931f8-42cc-4834-953c-30c9167bfdc1@intel.com> (raw)
In-Reply-To: <d6f05e5fa781ccb465d27a4fb1c7c1ac1e9e95ff.camel@intel.com>

On 4/14/2026 7:03 AM, Huang, Kai wrote:
>> Because VMX and SVM make all GRPs available immediately, except
>> for RSP, KVM ignores avail/dirty for GPRs.  I.e. "fixing" TDX will just shift the
>> "bugs" elsewhere.
> Just want to understand:
> 
> I thought the fix could be we simply remove the wrong GPRs from the list.
> Not sure how fixing TDX will shift bugs elsewhere?

I'm curious too.

>> More importantly, because the TDX-Module*requires* RCX (the GPR that holds the
>> mask of registers to expose to the VMM) to be hidden on TDVMCALL, KVM*can't*
>> do any kind of meaningful "available" tracking.
>>
> Hmm I think RCX conveys the shared GPRs and VMM can read.  Per "Table 5.323:
> TDH.VP.ENTER Output Operands Format #5 Definition: On TDCALL(TDG.VP.VMCALL)
> Following a TD Entry":
> 
>    RCX   ...
> 	Bit(s) Name         Description
> 
> 	31:0   PARAMS_MASK  Value as passed into TDCALL(TDG.VP.VMCALL) by
> 			    the guest TD: indicates which part of the guest
> 			    TD GPR and XMM state is passed as-is to the
> VMM
> 			    and back. For details, see the description of
> 			    TDG.VP.VMCALL in 5.5.26.
> 
> I think the problem is, as said previously, currently KVM TDX code uses
> KVM's existing infrastructure to emulate MSR, KVM hypercall etc,  but
> TDVMCALL has a different ABI, thus there's a mismatch here.

I once had patch for it internally.

It adds back the available check for GPRs when accessing instead of 
assuming they are always available. For normal VMX and SVM, all the GPRs 
are still always available. But for TDX, only EXIT_INFO_1 and 
EXIT_INFO_2 are always marked available, while others need to be 
explicitly set case by case.

The good thing is it makes TDX safer that KVM won't consume invalid data 
silently for TDX. But it adds additional overhead of checking the 
unnecessary register availability for VMX and SVM case.

-----------------------------&<-------------------------------------
From: Xiaoyao Li <xiaoyao.li@intel.com>
Date: Tue, 11 Mar 2025 07:13:29 -0400
Subject: [PATCH] KVM: x86: Add available check for GPRs

Since commit de3cd117ed2f ("KVM: x86: Omit caching logic for
always-available GPRs"), KVM doesn't check the availability of GPRs
except RSP and RIP when accessing them, because they are always
available.

However, it's not true when it comes to TDX. The GPRs are not available
after TD vcpu exits actually. And it relies on KVM manually sets the
GPRs value when needed, e.g.

  - setting rax, rbx, rcx, rdx, rsi, for hypercall emulation in
    tdx_emulate_tdvmall();

  - setting rax, rcx and rdx before MSR write emulation;

Add the available check of GPRs read, and WARN_ON_ONCE() when unavailable.
It can help capture the cases of undesired GPRs consumption by TDX.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
  arch/x86/kvm/kvm_cache_regs.h | 60 +++++++++++++++++++----------------
  arch/x86/kvm/vmx/tdx.c        | 25 +++------------
  2 files changed, 37 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 8ddb01191d6f..b2fa01ee2b4b 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -16,34 +16,6 @@

  static_assert(!(KVM_POSSIBLE_CR0_GUEST_BITS & X86_CR0_PDPTR_BITS));

-#define BUILD_KVM_GPR_ACCESSORS(lname, uname)				      \
-static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu 
*vcpu)\
-{									      \
-	return vcpu->arch.regs[VCPU_REGS_##uname];			      \
-}									      \
-static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu,	 
      \
-						unsigned long val)	      \
-{									      \
-	vcpu->arch.regs[VCPU_REGS_##uname] = val;			      \
-}
-BUILD_KVM_GPR_ACCESSORS(rax, RAX)
-BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
-BUILD_KVM_GPR_ACCESSORS(rcx, RCX)
-BUILD_KVM_GPR_ACCESSORS(rdx, RDX)
-BUILD_KVM_GPR_ACCESSORS(rbp, RBP)
-BUILD_KVM_GPR_ACCESSORS(rsi, RSI)
-BUILD_KVM_GPR_ACCESSORS(rdi, RDI)
-#ifdef CONFIG_X86_64
-BUILD_KVM_GPR_ACCESSORS(r8,  R8)
-BUILD_KVM_GPR_ACCESSORS(r9,  R9)
-BUILD_KVM_GPR_ACCESSORS(r10, R10)
-BUILD_KVM_GPR_ACCESSORS(r11, R11)
-BUILD_KVM_GPR_ACCESSORS(r12, R12)
-BUILD_KVM_GPR_ACCESSORS(r13, R13)
-BUILD_KVM_GPR_ACCESSORS(r14, R14)
-BUILD_KVM_GPR_ACCESSORS(r15, R15)
-#endif
-
  /*
   * Using the register cache from interrupt context is generally not 
allowed, as
   * caching a register and marking it available/dirty can't be done 
atomically,
@@ -92,6 +64,38 @@ static inline void kvm_register_mark_dirty(struct 
kvm_vcpu *vcpu,
  	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
  }

+#define BUILD_KVM_GPR_ACCESSORS(lname, uname)				      \
+static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu 
*vcpu)\
+{									      \
+	if (WARN_ON_ONCE(!kvm_register_is_available(vcpu, VCPU_REGS_##uname)))\
+		return 0;						      \
+									      \
+	return vcpu->arch.regs[VCPU_REGS_##uname];			      \
+}									      \
+static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu,	 
      \
+						unsigned long val)	      \
+{									      \
+	vcpu->arch.regs[VCPU_REGS_##uname] = val;			      \
+	kvm_register_mark_available(vcpu, VCPU_REGS_##uname);	      	      \
+}
+BUILD_KVM_GPR_ACCESSORS(rax, RAX)
+BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
+BUILD_KVM_GPR_ACCESSORS(rcx, RCX)
+BUILD_KVM_GPR_ACCESSORS(rdx, RDX)
+BUILD_KVM_GPR_ACCESSORS(rbp, RBP)
+BUILD_KVM_GPR_ACCESSORS(rsi, RSI)
+BUILD_KVM_GPR_ACCESSORS(rdi, RDI)
+#ifdef CONFIG_X86_64
+BUILD_KVM_GPR_ACCESSORS(r8,  R8)
+BUILD_KVM_GPR_ACCESSORS(r9,  R9)
+BUILD_KVM_GPR_ACCESSORS(r10, R10)
+BUILD_KVM_GPR_ACCESSORS(r11, R11)
+BUILD_KVM_GPR_ACCESSORS(r12, R12)
+BUILD_KVM_GPR_ACCESSORS(r13, R13)
+BUILD_KVM_GPR_ACCESSORS(r14, R14)
+BUILD_KVM_GPR_ACCESSORS(r15, R15)
+#endif
+
  /*
   * kvm_register_test_and_mark_available() is a special snowflake that 
uses an
   * arch bitop directly to avoid the explicit instrumentation that 
comes with
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index cefe6cdd60a9..2b90a60e6f64 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -969,6 +969,9 @@ static __always_inline u32 
tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
  	return exit_reason;
  }

+#define TDX_REGS_AVAIL_SET	(BIT_ULL(VCPU_EXREG_EXIT_INFO_1) | \
+				 BIT_ULL(VCPU_EXREG_EXIT_INFO_2))
+
  static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu)
  {
  	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -985,6 +988,8 @@ static noinstr void tdx_vcpu_enter_exit(struct 
kvm_vcpu *vcpu)
  	tdx->exit_gpa = tdx->vp_enter_args.r8;
  	vt->exit_intr_info = tdx->vp_enter_args.r9;

+	vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
+
  	vmx_handle_nmi(vcpu);

  	guest_state_exit_irqoff();
@@ -1017,24 +1022,6 @@ static fastpath_t 
tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
  	return EXIT_FASTPATH_NONE;
  }

-#define TDX_REGS_AVAIL_SET	(BIT_ULL(VCPU_EXREG_EXIT_INFO_1) | \
-				 BIT_ULL(VCPU_EXREG_EXIT_INFO_2) | \
-				 BIT_ULL(VCPU_REGS_RAX) | \
-				 BIT_ULL(VCPU_REGS_RBX) | \
-				 BIT_ULL(VCPU_REGS_RCX) | \
-				 BIT_ULL(VCPU_REGS_RDX) | \
-				 BIT_ULL(VCPU_REGS_RBP) | \
-				 BIT_ULL(VCPU_REGS_RSI) | \
-				 BIT_ULL(VCPU_REGS_RDI) | \
-				 BIT_ULL(VCPU_REGS_R8) | \
-				 BIT_ULL(VCPU_REGS_R9) | \
-				 BIT_ULL(VCPU_REGS_R10) | \
-				 BIT_ULL(VCPU_REGS_R11) | \
-				 BIT_ULL(VCPU_REGS_R12) | \
-				 BIT_ULL(VCPU_REGS_R13) | \
-				 BIT_ULL(VCPU_REGS_R14) | \
-				 BIT_ULL(VCPU_REGS_R15))
-
  static void tdx_load_host_xsave_state(struct kvm_vcpu *vcpu)
  {
  	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
@@ -1108,8 +1095,6 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 
run_flags)

  	tdx_load_host_xsave_state(vcpu);

-	vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
-
  	if (unlikely(tdx->vp_enter_ret == EXIT_REASON_EPT_MISCONFIG))
  		return EXIT_FASTPATH_NONE;

-- 
2.43.0


  reply	other threads:[~2026-04-14  2:12 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-09 22:42 [PATCH v2 0/6] KVM: x86: Reg cleanups / prep work for APX Sean Christopherson
2026-04-09 22:42 ` [PATCH v2 1/6] KVM: x86: Add dedicated storage for guest RIP Sean Christopherson
2026-04-10 18:43   ` Chang S. Bae
2026-04-14 12:31   ` Xiaoyao Li
2026-04-14 13:59     ` Chang S. Bae
2026-04-14 15:37       ` Sean Christopherson
2026-04-09 22:42 ` [PATCH v2 2/6] KVM: x86: Drop the "EX" part of "EXREG" to avoid collision with APX Sean Christopherson
2026-04-13 11:23   ` Huang, Kai
2026-04-09 22:42 ` [PATCH v2 3/6] KVM: nVMX: Do a bitwise-AND of regs_avail when switching active VMCS Sean Christopherson
2026-04-09 22:42 ` [PATCH v2 4/6] KVM: x86: Add wrapper APIs to reset dirty/available register masks Sean Christopherson
2026-04-09 22:42 ` [PATCH v2 5/6] KVM: x86: Track available/dirty register masks as "unsigned long" values Sean Christopherson
2026-04-13 11:24   ` Huang, Kai
2026-04-13 11:28   ` Huang, Kai
2026-04-13 14:54     ` Sean Christopherson
2026-04-13 23:03       ` Huang, Kai
2026-04-14  2:12         ` Xiaoyao Li [this message]
2026-04-14 14:04           ` Sean Christopherson
2026-04-14 15:48         ` Sean Christopherson
2026-04-14 22:21           ` Huang, Kai
2026-04-09 22:42 ` [PATCH v2 6/6] KVM: x86: Use a proper bitmap for tracking available/dirty registers Sean Christopherson
2026-04-13 11:31 ` [PATCH v2 0/6] KVM: x86: Reg cleanups / prep work for APX Huang, Kai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=95a931f8-42cc-4834-953c-30c9167bfdc1@intel.com \
    --to=xiaoyao.li@intel.com \
    --cc=chang.seok.bae@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kas@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox