* [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace
@ 2024-12-01 3:53 Binbin Wu
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
Hi,
While executing in a TD, the guest can exit to the host VMM (KVM) for many
reasons. These reasons are analogous to the exit reasons for VMX. Some of
the exits will be handled within KVM in later changes. This series handles
just the TDX exits that may be passed to userspace for handling, which all
arrive via the TDCALL exit code. Although these userspace exits share the
same TDX exit code, they result in several different types of exits to
userspace.
This patch set is one of several patch sets that are all needed to provide
the ability to run a functioning TD VM. The goal of this patch set is to
facilitate the review and finalization of the uAPI between KVM and userspace
VMMs (e.g., QEMU). We would like to get maintainer/reviewer feedback on the
implementation (i.e., the design of individual KVM exits instead of one raw
TDX exit), but we don't think it is ready for kvm-coco-queue yet.
Base of this series
===================
This series is based off of a kvm-coco-queue commit and some pre-req series:
1. commit ee69eb746754 ("KVM: x86/mmu: Prevent aliased memslot GFNs") (in
kvm-coco-queue).
2. v7 of "TDX host: metadata reading tweaks and bug fixes and info dump" [0]
3. v1 of "KVM: VMX: Initialize TDX when loading KVM module" [1]
4. v2 of “TDX vCPU/VM creation” [2]
5. v2 of "TDX KVM MMU part 2" [3]
6. v1 of "TDX vcpu enter/exit" [4], with a few fixups based on review feedback.
7. v4 of "KVM: x86: Prep KVM hypercall handling for TDX" [5]
TDX hypercalls
==============
The TDX module specification defines the TDG.VP.VMCALL API (TDVMCALL) for
guest TDs to make hypercalls to the VMM. When a guest TD issues a TDVMCALL,
it exits to the VMM with a new exit reason. The arguments from the guest TD
and return values from the VMM are passed through the guest registers. The
ABI details for the guest TD hypercalls are specified in the TDX Guest-Host
Communication Interface (GHCI) specification [6].
There are two types of hypercalls defined in the GHCI specification:
- Standard TDVMCALLs: When the guest TD sets R10 to 0, the TDVMCALL
sub-function number in R11 is one defined in the GHCI specification.
- Vendor-specific TDVMCALLs: When the guest TD sets R10 to a non-zero
value, the TDVMCALL is vendor-specific. For KVM hypercalls from guest
TDs, KVM uses R10 as the KVM hypercall number and R11-R14 as four
arguments, with the error code returned in R10.
KVM hypercalls
--------------
This series supports KVM hypercalls from guest TDs following the
vendor-specific interface described above. The KVM_HC_MAP_GPA_RANGE
hypercall will need to exit to userspace for handling.
Standard TDVMCALLs exiting to userspace
---------------------------------------
To support basic functionality of TDX, this series includes the following
standard TDVMCALL sub-functions, which reuse existing exit reasons when they
need to exit to userspace for handling:
- TDG.VP.VMCALL<MapGPA> reuses exit reason KVM_EXIT_HYPERCALL with the
hypercall number KVM_HC_MAP_GPA_RANGE.
- TDG.VP.VMCALL<ReportFatalError> reuses exit reason KVM_EXIT_SYSTEM_EVENT
with a new event type KVM_SYSTEM_EVENT_TDX_FATAL.
- TDG.VP.VMCALL<Instruction.IO> reuses exit reason KVM_EXIT_IO.
- TDG.VP.VMCALL<#VE.RequestMMIO> reuses exit reason KVM_EXIT_MMIO.
Support for TD attestation is currently excluded from the TDX basic enabling,
so handling for TDG.VP.VMCALL<SetupEventNotifyInterrupt> and
TDG.VP.VMCALL<GetQuote> is not included in this patch series.
Notable changes since v19
=========================
There are several minor changes across the patches. A few more structural
changes are highlighted below.
Move out NMI, exception and external interrupt handling code
------------------------------------------------------------
Since this series focuses on the hypercalls that may exit to userspace, the
code for handling NMIs, exceptions, and external interrupts has been moved
out of "KVM: TDX: Add a place holder to handle TDX VM exit" to a future
series focusing on interrupts.
Remove the use of union tdx_exit_reason
---------------------------------------
As suggested by Sean, the use of union tdx_exit_reason has been removed.
The VMX exit reason is used only when the return value of TDH.VP.ENTER has
a valid VMX exit reason.
https://lore.kernel.org/kvm/ZfSExlemFMKjBtZb@google.com/
Remove the KVM exit reason KVM_EXIT_TDX
---------------------------------------
Each TDX hypercall needing to exit to userspace now has a dedicated exit
reason. No new exit reasons are added in this series; existing exit reasons
are reused instead.
Repos
=====
The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2024-11-20-exits-userspace
A matching QEMU is here:
https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-upstream-v6.1
Testing
=======
The series has been tested as part of the development branch for the TDX
base series. The testing consisted of TDX kvm-unit-tests, booting a Linux
TD, and TDX-enhanced KVM selftests.
[0] https://lore.kernel.org/kvm/cover.1731318868.git.kai.huang@intel.com
[1] https://lore.kernel.org/kvm/cover.1730120881.git.kai.huang@intel.com
[2] https://lore.kernel.org/kvm/20241030190039.77971-1-rick.p.edgecombe@intel.com
[3] https://lore.kernel.org/kvm/20241112073327.21979-1-yan.y.zhao@intel.com
[4] https://lore.kernel.org/kvm/20241121201448.36170-1-adrian.hunter@intel.com
[5] https://lore.kernel.org/kvm/20241128004344.4072099-1-seanjc@google.com
[6] https://cdrdv2.intel.com/v1/dl/getContent/726792
Binbin Wu (2):
KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
Isaku Yamahata (4):
KVM: TDX: Add a place holder to handle TDX VM exit
KVM: TDX: Add a place holder for handler of TDX hypercalls
(TDG.VP.VMCALL)
KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL
KVM: TDX: Handle TDX PV port I/O hypercall
Sean Christopherson (1):
KVM: TDX: Handle TDX PV MMIO hypercall
Documentation/virt/kvm/api.rst | 8 +
arch/x86/include/asm/shared/tdx.h | 1 +
arch/x86/include/asm/tdx.h | 1 +
arch/x86/include/uapi/asm/vmx.h | 4 +-
arch/x86/kvm/vmx/main.c | 25 +-
arch/x86/kvm/vmx/tdx.c | 578 +++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/tdx.h | 3 +
arch/x86/kvm/vmx/tdx_errno.h | 3 +
arch/x86/kvm/vmx/x86_ops.h | 8 +
arch/x86/kvm/x86.c | 1 +
include/uapi/linux/kvm.h | 1 +
virt/kvm/kvm_main.c | 1 +
12 files changed, 630 insertions(+), 4 deletions(-)
--
2.46.0
* [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
From: Isaku Yamahata <isaku.yamahata@intel.com>
Introduce the wiring for handling TDX VM exits by implementing the
callbacks .get_exit_info() and .handle_exit(). Additionally, add
error handling during the TDX VM exit flow, and add a placeholder
to handle various exit reasons. Add helper functions to retrieve
exit information, exit qualifications, and more.
Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.*
operations for the secure EPT or the TD EPOCH. If contention occurs, the
return value will have TDX_OPERAND_BUSY set along with the operand type,
prompting the vCPU to attempt re-entry into the guest via the fast path.
Error Handling: The following scenarios will return to userspace with
KVM_EXIT_INTERNAL_ERROR.
- TDX_SW_ERROR: This includes a #UD caused by the SEAMCALL instruction if
the CPU isn't in VMX operation, a #GP caused by the SEAMCALL instruction
when TDX isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when
the SEAM firmware is not loaded or is disabled.
- TDX_ERROR: This indicates some check failed in the TDX module, preventing
the vCPU from running.
- TDX_NON_RECOVERABLE: Set by the TDX module when the error is
non-recoverable, indicating that the TDX guest is dead or the vCPU is
disabled. This also covers the failed_vmentry case, which must have
TDX_NON_RECOVERABLE set since the off-TD debug feature has not been
enabled. An exception is the triple fault, which also sets
TDX_NON_RECOVERABLE but exits to userspace with KVM_EXIT_SHUTDOWN,
aligning with the VMX case.
- Any unhandled VM exit reason will also return to userspace with
KVM_EXIT_INTERNAL_ERROR.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
Hypercalls exit to userspace breakout:
- Dropped Paolo's Reviewed-by since the change is not subtle.
- Mention addition of .get_exit_info() handler in changelog. (Binbin)
- tdh_sept_seamcall() -> tdx_seamcall_sept() in comments. (Binbin)
- Do not open code TDX_ERROR_SEPT_BUSY. (Binbin)
- "TDH.VP.ENTRY" -> "TDH.VP.ENTER". (Binbin)
- Remove the use of union tdx_exit_reason. (Sean)
https://lore.kernel.org/kvm/ZfSExlemFMKjBtZb@google.com/
- Add tdx_check_exit_reason() to check a VMX exit reason against the
status code of TDH.VP.ENTER.
- Move the handling of TDX_ERROR_SEPT_BUSY and (TDX_OPERAND_BUSY |
TDX_OPERAND_ID_TD_EPOCH) into fast path, and add a helper function
tdx_exit_handlers_fastpath().
- Remove the warning on TDX_SW_ERROR in fastpath, but return without
further handling.
- Call kvm_machine_check() for EXIT_REASON_MCE_DURING_VMENTRY, align
with VMX case.
- On failed_vmentry in fast path, return without further handling.
- Exit to userspace for #UD and #GP.
- Fix whitespace in tdx_get_exit_info()
- Add a comment in tdx_handle_exit() to describe failed_vmentry case
is handled by TDX_NON_RECOVERABLE handling.
- Move the code of handling NMI, exception and external interrupts out
of the patch, i.e., the NMI handling in tdx_vcpu_enter_exit() and the
wiring of .handle_exit_irqoff() are removed.
- Drop the check for VCPU_TD_STATE_INITIALIZED in tdx_handle_exit()
because it has been checked in tdx_vcpu_pre_run().
- Update changelog.
---
arch/x86/include/asm/tdx.h | 1 +
arch/x86/kvm/vmx/main.c | 25 +++++-
arch/x86/kvm/vmx/tdx.c | 164 ++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/tdx_errno.h | 3 +
arch/x86/kvm/vmx/x86_ops.h | 8 ++
5 files changed, 198 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 77477b905dca..01409a59224d 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -18,6 +18,7 @@
* TDX module.
*/
#define TDX_ERROR _BITUL(63)
+#define TDX_NON_RECOVERABLE _BITUL(62)
#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index f8acb1dc7c10..4f6faeb6e8e5 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -165,6 +165,15 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
return vmx_vcpu_run(vcpu, force_immediate_exit);
}
+static int vt_handle_exit(struct kvm_vcpu *vcpu,
+ enum exit_fastpath_completion fastpath)
+{
+ if (is_td_vcpu(vcpu))
+ return tdx_handle_exit(vcpu, fastpath);
+
+ return vmx_handle_exit(vcpu, fastpath);
+}
+
static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
{
if (is_td_vcpu(vcpu)) {
@@ -212,6 +221,18 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
}
+static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+ u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
+{
+ if (is_td_vcpu(vcpu)) {
+ tdx_get_exit_info(vcpu, reason, info1, info2, intr_info,
+ error_code);
+ return;
+ }
+
+ vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
+}
+
static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
if (!is_td(kvm))
@@ -305,7 +326,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_pre_run = vt_vcpu_pre_run,
.vcpu_run = vt_vcpu_run,
- .handle_exit = vmx_handle_exit,
+ .handle_exit = vt_handle_exit,
.skip_emulated_instruction = vmx_skip_emulated_instruction,
.update_emulated_instruction = vmx_update_emulated_instruction,
.set_interrupt_shadow = vmx_set_interrupt_shadow,
@@ -340,7 +361,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.set_identity_map_addr = vmx_set_identity_map_addr,
.get_mt_mask = vmx_get_mt_mask,
- .get_exit_info = vmx_get_exit_info,
+ .get_exit_info = vt_get_exit_info,
.vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f975bb323f60..3dcbdb5a7bf8 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -186,6 +186,54 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
}
+static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu)
+{
+ return (union vmx_exit_reason)(u32)(to_tdx(vcpu)->vp_enter_ret);
+}
+
+/*
+ * There is no simple way to check some bit(s) to decide whether the return
+ * value of TDH.VP.ENTER has a VMX exit reason or not. E.g.,
+ * TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE has exit reason but with error bit
+ * (bit 63) set, TDX_NON_RECOVERABLE_TD_CORRUPTED_MD has no exit reason but with
+ * error bit cleared.
+ */
+static __always_inline bool tdx_has_exit_reason(struct kvm_vcpu *vcpu)
+{
+ u64 status = to_tdx(vcpu)->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK;
+
+ return status == TDX_SUCCESS || status == TDX_NON_RECOVERABLE_VCPU ||
+ status == TDX_NON_RECOVERABLE_TD ||
+ status == TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE ||
+ status == TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE;
+}
+
+static __always_inline bool tdx_check_exit_reason(struct kvm_vcpu *vcpu, u16 reason)
+{
+ return tdx_has_exit_reason(vcpu) &&
+ (u16)tdexit_exit_reason(vcpu).basic == reason;
+}
+
+static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcpu)
+{
+ return kvm_rcx_read(vcpu);
+}
+
+static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu *vcpu)
+{
+ return kvm_rdx_read(vcpu);
+}
+
+static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu)
+{
+ return kvm_r8_read(vcpu);
+}
+
+static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu)
+{
+ return kvm_r9_read(vcpu);
+}
+
static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
{
tdx_guest_keyid_free(kvm_tdx->hkid);
@@ -824,6 +872,21 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu)
guest_state_exit_irqoff();
}
+static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
+{
+ u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret;
+
+ /* See the comment of tdx_seamcall_sept(). */
+ if (unlikely(vp_enter_ret == TDX_ERROR_SEPT_BUSY))
+ return EXIT_FASTPATH_REENTER_GUEST;
+
+ /* TDH.VP.ENTER checks TD EPOCH which can contend with TDH.MEM.TRACK. */
+ if (unlikely(vp_enter_ret == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_TD_EPOCH)))
+ return EXIT_FASTPATH_REENTER_GUEST;
+
+ return EXIT_FASTPATH_NONE;
+}
+
fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
{
struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED;
vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+
+ if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
+ return EXIT_FASTPATH_NONE;
+
+ if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY)))
+ kvm_machine_check();
+
trace_kvm_exit(vcpu, KVM_ISA_VMX);
- return EXIT_FASTPATH_NONE;
+ if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry))
+ return EXIT_FASTPATH_NONE;
+
+ return tdx_exit_handlers_fastpath(vcpu);
+}
+
+static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
+{
+ vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
+ vcpu->mmio_needed = 0;
+ return 0;
}
void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
@@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
}
+int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
+{
+ struct vcpu_tdx *tdx = to_tdx(vcpu);
+ u64 vp_enter_ret = tdx->vp_enter_ret;
+ union vmx_exit_reason exit_reason;
+
+ if (fastpath != EXIT_FASTPATH_NONE)
+ return 1;
+
+ /*
+ * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
+ * TDX_SEAMCALL_VMFAILINVALID.
+ */
+ if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
+ KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
+ goto unhandled_exit;
+ }
+
+ /*
+ * Without off-TD debug enabled, failed_vmentry case must have
+ * TDX_NON_RECOVERABLE set.
+ */
+ if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) {
+ /* Triple fault is non-recoverable. */
+ if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT)))
+ return tdx_handle_triple_fault(vcpu);
+
+ kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n",
+ vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid,
+ set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
+ goto unhandled_exit;
+ }
+
+ /* From now, the seamcall status should be TDX_SUCCESS. */
+ WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
+ exit_reason = tdexit_exit_reason(vcpu);
+
+ switch (exit_reason.basic) {
+ default:
+ break;
+ }
+
+unhandled_exit:
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
+ vcpu->run->internal.ndata = 2;
+ vcpu->run->internal.data[0] = vp_enter_ret;
+ vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
+ return 0;
+}
+
+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+ u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
+{
+ struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+ if (tdx_has_exit_reason(vcpu)) {
+ /*
+ * Encode some useful info from the 64 bit return code
+ * into the 32 bit exit 'reason'. If the VMX exit reason is
+ * valid, just set it to those bits.
+ */
+ *reason = (u32)tdx->vp_enter_ret;
+ *info1 = tdexit_exit_qual(vcpu);
+ *info2 = tdexit_ext_exit_qual(vcpu);
+ } else {
+ /*
+ * When the VMX exit reason in vp_enter_ret is not valid,
+ * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to
+ * mean the vmexit code is not valid. Set the other bits to
+ * try to avoid picking a value that may someday be a valid
+ * VMX exit code.
+ */
+ *reason = 0xFFFFFFFF;
+ *info1 = 0;
+ *info2 = 0;
+ }
+
+ *intr_info = tdexit_intr_info(vcpu);
+ *error_code = 0;
+}
+
static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
{
const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
index f9dbb3a065cc..6ff4672c4181 100644
--- a/arch/x86/kvm/vmx/tdx_errno.h
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -10,6 +10,9 @@
* TDX SEAMCALL Status Codes (returned in RAX)
*/
#define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL
+#define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL
+#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL
+#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE 0x6000000700000000ULL
#define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL
#define TDX_OPERAND_INVALID 0xC000010000000000ULL
#define TDX_OPERAND_BUSY 0x8000020000000000ULL
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 02b33390e1bf..1c18943e0e1d 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -133,6 +133,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
void tdx_vcpu_put(struct kvm_vcpu *vcpu);
+int tdx_handle_exit(struct kvm_vcpu *vcpu,
+ enum exit_fastpath_completion fastpath);
+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+ u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -167,6 +171,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat
}
static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
+static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
+ enum exit_fastpath_completion fastpath) { return 0; }
+static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
+ u64 *info2, u32 *intr_info, u32 *error_code) {}
static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.46.0
* [PATCH 2/7] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
From: Isaku Yamahata <isaku.yamahata@intel.com>
Add a placeholder and related helper functions in preparation for
TDG.VP.VMCALL handling.
The TDX module specification defines the TDG.VP.VMCALL API (TDVMCALL for
short) for the guest TD to issue hypercalls to the VMM. When the guest TD
issues a TDVMCALL, the guest TD exits to the VMM with a new exit reason.
The arguments from the guest TD and return values from the VMM are passed
in the guest registers. The guest RCX register indicates which registers
are used. Define helper functions to access those registers.
A new VMX exit reason, TDCALL, is added to indicate the exit is due to a
TDVMCALL from the guest TD. Define the TDCALL exit reason and add a
placeholder to handle such exits.
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
Hypercalls exit to userspace breakout:
- Update changelog.
- Drop the unused tdx->tdvmcall. (Chao)
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
---
arch/x86/include/uapi/asm/vmx.h | 4 ++-
arch/x86/kvm/vmx/tdx.c | 48 +++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index a5faf6d88f1b..6a9f268a2d2c 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -92,6 +92,7 @@
#define EXIT_REASON_TPAUSE 68
#define EXIT_REASON_BUS_LOCK 74
#define EXIT_REASON_NOTIFY 75
+#define EXIT_REASON_TDCALL 77
#define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
@@ -155,7 +156,8 @@
{ EXIT_REASON_UMWAIT, "UMWAIT" }, \
{ EXIT_REASON_TPAUSE, "TPAUSE" }, \
{ EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \
- { EXIT_REASON_NOTIFY, "NOTIFY" }
+ { EXIT_REASON_NOTIFY, "NOTIFY" }, \
+ { EXIT_REASON_TDCALL, "TDCALL" }
#define VMX_EXIT_REASON_FLAGS \
{ VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3dcbdb5a7bf8..19fd8a5dabd0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -234,6 +234,41 @@ static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu)
return kvm_r9_read(vcpu);
}
+#define BUILD_TDVMCALL_ACCESSORS(param, gpr) \
+static __always_inline \
+unsigned long tdvmcall_##param##_read(struct kvm_vcpu *vcpu) \
+{ \
+ return kvm_##gpr##_read(vcpu); \
+} \
+static __always_inline void tdvmcall_##param##_write(struct kvm_vcpu *vcpu, \
+ unsigned long val) \
+{ \
+ kvm_##gpr##_write(vcpu, val); \
+}
+BUILD_TDVMCALL_ACCESSORS(a0, r12);
+BUILD_TDVMCALL_ACCESSORS(a1, r13);
+BUILD_TDVMCALL_ACCESSORS(a2, r14);
+BUILD_TDVMCALL_ACCESSORS(a3, r15);
+
+static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *vcpu)
+{
+ return kvm_r10_read(vcpu);
+}
+static __always_inline unsigned long tdvmcall_leaf(struct kvm_vcpu *vcpu)
+{
+ return kvm_r11_read(vcpu);
+}
+static __always_inline void tdvmcall_set_return_code(struct kvm_vcpu *vcpu,
+ long val)
+{
+ kvm_r10_write(vcpu, val);
+}
+static __always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu,
+ unsigned long val)
+{
+ kvm_r11_write(vcpu, val);
+}
+
static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
{
tdx_guest_keyid_free(kvm_tdx->hkid);
@@ -922,6 +957,17 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
return 0;
}
+static int handle_tdvmcall(struct kvm_vcpu *vcpu)
+{
+ switch (tdvmcall_leaf(vcpu)) {
+ default:
+ break;
+ }
+
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+ return 1;
+}
+
void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
{
u64 shared_bit = (pgd_level == 5) ? TDX_SHARED_BIT_PWL_5 :
@@ -1253,6 +1299,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
exit_reason = tdexit_exit_reason(vcpu);
switch (exit_reason.basic) {
+ case EXIT_REASON_TDCALL:
+ return handle_tdvmcall(vcpu);
default:
break;
}
--
2.46.0
* [PATCH 3/7] KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
From: Isaku Yamahata <isaku.yamahata@intel.com>
Handle KVM hypercalls for TDX according to the TDX Guest-Host Communication
Interface (GHCI) specification.
The TDX GHCI specification defines the ABI for the guest TD to issue
hypercalls. When R10 is non-zero, it indicates the TDG.VP.VMCALL is
vendor-specific. KVM uses R10 as the KVM hypercall number and R11-R14
as four arguments, while the error code is returned in R10. Follow this
ABI and handle the KVM hypercall for TDX.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
Hypercalls exit to userspace breakout:
- Renamed from "KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL" to
"KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL".
- Update the change log.
- Rebased on Sean's "Prep KVM hypercall handling for TDX" patch set.
https://lore.kernel.org/kvm/20241128004344.4072099-1-seanjc@google.com
- Use the right register (i.e. R10) to set the return code after returning
back from userspace.
---
arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 19fd8a5dabd0..4cc55b120ab0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -957,8 +957,39 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
return 0;
}
+
+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
+{
+ kvm_r10_write(vcpu, vcpu->run->hypercall.ret);
+ return 1;
+}
+
+static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
+{
+ int r;
+
+ /*
+ * ABI for KVM tdvmcall argument:
+ * In the Guest-Host Communication Interface (GHCI) specification, a
+ * non-zero leaf number (R10 != 0) is defined to indicate a
+ * vendor-specific TDVMCALL. KVM uses this for KVM hypercalls. NOTE: KVM
+ * hypercall numbers start from one; zero isn't used for a KVM hypercall
+ * number.
+ *
+ * R10: KVM hypercall number
+ * arguments: R11, R12, R13, R14.
+ */
+ r = __kvm_emulate_hypercall(vcpu, r10, r11, r12, r13, r14, true, 0,
+ complete_hypercall_exit);
+
+ return r > 0;
+}
+
static int handle_tdvmcall(struct kvm_vcpu *vcpu)
{
+ if (tdvmcall_exit_type(vcpu))
+ return tdx_emulate_vmcall(vcpu);
+
switch (tdvmcall_leaf(vcpu)) {
default:
break;
--
2.46.0
* [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with
KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling.
MapGPA is used by a TDX guest to request mapping a GPA range as private
or shared memory. It needs to exit to userspace for handling. KVM has
already implemented a similar hypercall, KVM_HC_MAP_GPA_RANGE, which will
exit to userspace with exit reason KVM_EXIT_HYPERCALL. Do sanity checks,
convert TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE and forward the request
to userspace.
To prevent a TDG.VP.VMCALL<MapGPA> call from taking too long, the MapGPA
range is split into 2MB chunks, and pending interrupts are checked between
chunks. This allows for timely injection of interrupts and prevents issues
with guest lockup detection. The TDX guest should retry the operation for
the GPA starting at the address specified in R11 when the TDVMCALL returns
TDVMCALL_STATUS_RETRY as the status code.
Note that userspace needs to enable KVM_CAP_EXIT_HYPERCALL with the
KVM_HC_MAP_GPA_RANGE bit set for TD VMs.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
Hypercalls exit to userspace breakout:
- New added.
Implement one of the hypercalls need to exit to userspace for handling after
dropping "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to resolve
Sean's comment.
https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Check interrupt pending between chunks suggested by Sean.
https://lore.kernel.org/kvm/ZleJvmCawKqmpFIa@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Use vt_is_tdx_private_gpa()
---
arch/x86/include/asm/shared/tdx.h | 1 +
arch/x86/kvm/vmx/tdx.c | 110 ++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/tdx.h | 3 +
3 files changed, 114 insertions(+)
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 620327f0161f..a602d081cf1c 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -32,6 +32,7 @@
#define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
#define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
+#define TDVMCALL_STATUS_ALIGN_ERROR 0x8000000000000002ULL
/*
* Bitmasks of exposed registers (with VMM).
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4cc55b120ab0..553f4cbe0693 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -985,12 +985,122 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
return r > 0;
}
+/*
+ * Split into chunks and check for pending interrupts between chunks. This
+ * allows for timely injection of interrupts and prevents issues with guest
+ * lockup detection.
+ */
+#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024)
+static void __tdx_map_gpa(struct vcpu_tdx *tdx);
+
+static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+ if (vcpu->run->hypercall.ret) {
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+ kvm_r11_write(vcpu, tdx->map_gpa_next);
+ return 1;
+ }
+
+ tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
+ if (tdx->map_gpa_next >= tdx->map_gpa_end) {
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+ return 1;
+ }
+
+ /*
+ * Stop processing the remaining part if there is a pending interrupt.
+ * Skip checking pending virtual interrupts (reflected by the
+ * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a SEAMCALL,
+ * because if the guest has disabled interrupts it's OK not to return
+ * to the guest due to a non-NMI interrupt. Also it's rare to issue
+ * TDVMCALL_MAP_GPA immediately after STI or MOV/POP SS.
+ */
+ if (pi_has_pending_interrupt(vcpu) ||
+ kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
+ kvm_r11_write(vcpu, tdx->map_gpa_next);
+ return 1;
+ }
+
+ __tdx_map_gpa(tdx);
+ /* Forward request to userspace. */
+ return 0;
+}
+
+static void __tdx_map_gpa(struct vcpu_tdx *tdx)
+{
+ u64 gpa = tdx->map_gpa_next;
+ u64 size = tdx->map_gpa_end - tdx->map_gpa_next;
+
+ if (size > TDX_MAP_GPA_MAX_LEN)
+ size = TDX_MAP_GPA_MAX_LEN;
+
+ tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
+ tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+ tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
+ tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
+ tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ?
+ KVM_MAP_GPA_RANGE_ENCRYPTED :
+ KVM_MAP_GPA_RANGE_DECRYPTED;
+ tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
+
+ tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa;
+}
+
+static int tdx_map_gpa(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_tdx *tdx = to_tdx(vcpu);
+ u64 gpa = tdvmcall_a0_read(vcpu);
+ u64 size = tdvmcall_a1_read(vcpu);
+ u64 ret;
+
+ /*
+ * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
+ * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
+ * bit set. If not, the error code is not defined in GHCI for TDX, use
+ * TDVMCALL_STATUS_INVALID_OPERAND for this case.
+ */
+ if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
+ ret = TDVMCALL_STATUS_INVALID_OPERAND;
+ goto error;
+ }
+
+ if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
+ !kvm_vcpu_is_legal_gpa(vcpu, gpa + size - 1) ||
+ (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
+ vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))) {
+ ret = TDVMCALL_STATUS_INVALID_OPERAND;
+ goto error;
+ }
+
+ if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
+ ret = TDVMCALL_STATUS_ALIGN_ERROR;
+ goto error;
+ }
+
+ tdx->map_gpa_end = gpa + size;
+ tdx->map_gpa_next = gpa;
+
+ __tdx_map_gpa(tdx);
+ /* Forward request to userspace. */
+ return 0;
+
+error:
+ tdvmcall_set_return_code(vcpu, ret);
+ kvm_r11_write(vcpu, gpa);
+ return 1;
+}
+
static int handle_tdvmcall(struct kvm_vcpu *vcpu)
{
if (tdvmcall_exit_type(vcpu))
return tdx_emulate_vmcall(vcpu);
switch (tdvmcall_leaf(vcpu)) {
+ case TDVMCALL_MAP_GPA:
+ return tdx_map_gpa(vcpu);
default:
break;
}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 1abc94b046a0..bfae70887695 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -71,6 +71,9 @@ struct vcpu_tdx {
enum tdx_prepare_switch_state prep_switch_state;
u64 msr_host_kernel_gs_base;
+
+ u64 map_gpa_next;
+ u64 map_gpa_end;
};
void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
--
2.46.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
2024-12-01 3:53 [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Binbin Wu
` (3 preceding siblings ...)
2024-12-01 3:53 ` [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA> Binbin Wu
@ 2024-12-01 3:53 ` Binbin Wu
2024-12-06 9:31 ` Xu Yilun
` (2 more replies)
2024-12-01 3:53 ` [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall Binbin Wu
` (2 subsequent siblings)
7 siblings, 3 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with
a new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for
handling.
A TD guest can use TDG.VP.VMCALL<ReportFatalError> to report a fatal
error it has experienced. This hypercall is special because the guest
is requesting termination along with the error information; KVM needs to
forward the hypercall to userspace regardless, so KVM skips sanity checks
and lets userspace decide what to do.
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
Hypercalls exit to userspace breakout:
- New added.
Implement one of the hypercalls that need to exit to userspace for handling,
after reverting "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to
resolve Sean's comment.
https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
---
Documentation/virt/kvm/api.rst | 8 ++++++
arch/x86/kvm/vmx/tdx.c | 50 ++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
3 files changed, 59 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index edc070c6e19b..bb39da72c647 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6815,6 +6815,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
#define KVM_SYSTEM_EVENT_WAKEUP 4
#define KVM_SYSTEM_EVENT_SUSPEND 5
#define KVM_SYSTEM_EVENT_SEV_TERM 6
+ #define KVM_SYSTEM_EVENT_TDX_FATAL 7
__u32 type;
__u32 ndata;
__u64 data[16];
@@ -6841,6 +6842,13 @@ Valid values for 'type' are:
reset/shutdown of the VM.
- KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
The guest physical address of the guest's GHCB is stored in `data[0]`.
+ - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest requested termination.
+ The error code reported by the guest, as defined in the GHCI
+ specification, is stored in `data[0]`. If bit 63 of `data[0]` is set,
+ the guest provided additional information in a page of shared memory;
+ the guest physical address of that page is stored in `data[1]`.
+ An optional error message is carried in `data[2]` ~ `data[9]` as a
+ byte sequence, LSB filled first; typically ASCII (0x20-0x7e) is used.
- KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
KVM has recognized a wakeup event. Userspace may honor this event by
marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 553f4cbe0693..a79f9ca962d1 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1093,6 +1093,54 @@ static int tdx_map_gpa(struct kvm_vcpu *vcpu)
return 1;
}
+static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
+{
+ u64 reg_mask = kvm_rcx_read(vcpu);
+ u64 *opt_regs;
+
+ /*
+ * Skip sanity checks and let userspace decide what to do if sanity
+ * checks fail.
+ */
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
+ vcpu->run->system_event.ndata = 10;
+ /* Error codes. */
+ vcpu->run->system_event.data[0] = tdvmcall_a0_read(vcpu);
+ /* GPA of additional information page. */
+ vcpu->run->system_event.data[1] = tdvmcall_a1_read(vcpu);
+ /* Information passed via registers (up to 64 bytes). */
+ opt_regs = &vcpu->run->system_event.data[2];
+
+#define COPY_REG(REG, MASK) \
+ do { \
+ if (reg_mask & MASK) \
+ *opt_regs = kvm_ ## REG ## _read(vcpu); \
+ else \
+ *opt_regs = 0; \
+ opt_regs++; \
+ } while (0)
+
+ /* The order is defined in GHCI. */
+ COPY_REG(r14, BIT_ULL(14));
+ COPY_REG(r15, BIT_ULL(15));
+ COPY_REG(rbx, BIT_ULL(3));
+ COPY_REG(rdi, BIT_ULL(7));
+ COPY_REG(rsi, BIT_ULL(6));
+ COPY_REG(r8, BIT_ULL(8));
+ COPY_REG(r9, BIT_ULL(9));
+ COPY_REG(rdx, BIT_ULL(2));
+
+ /*
+ * Set the status code according to the GHCI spec, although the vCPU
+ * may not return to the guest.
+ */
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+
+ /* Forward request to userspace. */
+ return 0;
+}
+
static int handle_tdvmcall(struct kvm_vcpu *vcpu)
{
if (tdvmcall_exit_type(vcpu))
@@ -1101,6 +1149,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
switch (tdvmcall_leaf(vcpu)) {
case TDVMCALL_MAP_GPA:
return tdx_map_gpa(vcpu);
+ case TDVMCALL_REPORT_FATAL_ERROR:
+ return tdx_report_fatal_error(vcpu);
default:
break;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 637efc055145..c173c8dfcf83 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -375,6 +375,7 @@ struct kvm_run {
#define KVM_SYSTEM_EVENT_WAKEUP 4
#define KVM_SYSTEM_EVENT_SUSPEND 5
#define KVM_SYSTEM_EVENT_SEV_TERM 6
+#define KVM_SYSTEM_EVENT_TDX_FATAL 7
__u32 type;
__u32 ndata;
union {
--
2.46.0
* [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall
2024-12-01 3:53 [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Binbin Wu
` (4 preceding siblings ...)
2024-12-01 3:53 ` [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError> Binbin Wu
@ 2024-12-01 3:53 ` Binbin Wu
2024-12-10 9:42 ` Chao Gao
2024-12-01 3:53 ` [PATCH 7/7] KVM: TDX: Handle TDX PV MMIO hypercall Binbin Wu
2024-12-10 18:24 ` [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Paolo Bonzini
7 siblings, 1 reply; 37+ messages in thread
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
From: Isaku Yamahata <isaku.yamahata@intel.com>
Emulate port I/O requested by the TDX guest via TDVMCALL with leaf
Instruction.IO (same value as EXIT_REASON_IO_INSTRUCTION), according to
the TDX Guest-Host Communication Interface (GHCI).
All port I/O instructions inside the TDX guest trigger a #VE exception.
On a #VE triggered by an I/O instruction, the TDX guest can issue a
TDVMCALL with leaf Instruction.IO to request the VMM to emulate it.
Similar to normal port I/O emulation, try to handle the port I/O in the
kernel first; if the kernel can't handle it, forward the request to
userspace. Note that string I/O operations are not supported in TDX; the
guest should unroll them before issuing the TDVMCALL.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
Hypercalls exit to userspace breakout:
- Renamed from "KVM: TDX: Handle TDX PV port io hypercall" to
"KVM: TDX: Handle TDX PV port I/O hypercall".
- Update changelog.
- Add missing curly brackets.
- Move reset of pio.count to tdx_complete_pio_out() and remove the stale
comment. (binbin)
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Set status code to TDVMCALL_STATUS_SUCCESS when PIO is handled in kernel.
- Don't write to R11 when it is a write operation for output.
v18:
- Fix out case to set R10 and R11 correctly when user space handled port
out.
---
arch/x86/kvm/vmx/tdx.c | 66 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a79f9ca962d1..495991407a95 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1141,6 +1141,70 @@ static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
return 0;
}
+static int tdx_complete_pio_out(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.pio.count = 0;
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+ return 1;
+}
+
+static int tdx_complete_pio_in(struct kvm_vcpu *vcpu)
+{
+ struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+ unsigned long val = 0;
+ int ret;
+
+ ret = ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size,
+ vcpu->arch.pio.port, &val, 1);
+
+ WARN_ON_ONCE(!ret);
+
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+ tdvmcall_set_return_val(vcpu, val);
+
+ return 1;
+}
+
+static int tdx_emulate_io(struct kvm_vcpu *vcpu)
+{
+ struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+ unsigned long val = 0;
+ unsigned int port;
+ int size, ret;
+ bool write;
+
+ ++vcpu->stat.io_exits;
+
+ size = tdvmcall_a0_read(vcpu);
+ write = tdvmcall_a1_read(vcpu);
+ port = tdvmcall_a2_read(vcpu);
+
+ if (size != 1 && size != 2 && size != 4) {
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+ return 1;
+ }
+
+ if (write) {
+ val = tdvmcall_a3_read(vcpu);
+ ret = ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1);
+ } else {
+ ret = ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1);
+ }
+
+ if (ret) {
+ if (!write)
+ tdvmcall_set_return_val(vcpu, val);
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+ } else {
+ if (write)
+ vcpu->arch.complete_userspace_io = tdx_complete_pio_out;
+ else
+ vcpu->arch.complete_userspace_io = tdx_complete_pio_in;
+ }
+
+ return ret;
+}
+
static int handle_tdvmcall(struct kvm_vcpu *vcpu)
{
if (tdvmcall_exit_type(vcpu))
@@ -1151,6 +1215,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
return tdx_map_gpa(vcpu);
case TDVMCALL_REPORT_FATAL_ERROR:
return tdx_report_fatal_error(vcpu);
+ case EXIT_REASON_IO_INSTRUCTION:
+ return tdx_emulate_io(vcpu);
default:
break;
}
--
2.46.0
* [PATCH 7/7] KVM: TDX: Handle TDX PV MMIO hypercall
2024-12-01 3:53 [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Binbin Wu
` (5 preceding siblings ...)
2024-12-01 3:53 ` [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall Binbin Wu
@ 2024-12-01 3:53 ` Binbin Wu
2024-12-10 18:24 ` [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Paolo Bonzini
7 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-01 3:53 UTC (permalink / raw)
To: pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
xiaoyao.li, tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao,
michael.roth, linux-kernel, binbin.wu
From: Sean Christopherson <sean.j.christopherson@intel.com>
Handle TDX PV MMIO hypercall when TDX guest calls TDVMCALL with the
leaf #VE.RequestMMIO (same value as EXIT_REASON_EPT_VIOLATION) according
to TDX Guest Host Communication Interface (GHCI) spec.
For TDX, the VMM is not allowed to access vCPU registers or the TD's
private memory, where all executed instructions are. So the MMIO emulation
implemented for non-TDX VMs is not possible for TDX guests.
In TDX, MMIO regions are instead configured by the VMM to trigger a #VE
exception in the guest. The #VE handler is supposed to emulate the MMIO
instruction inside the guest and convert it into a TDVMCALL with the
leaf #VE.RequestMMIO, which equals EXIT_REASON_EPT_VIOLATION.
The requested MMIO address must be in shared GPA space. The shared bit
is stripped after the check because the existing MMIO emulation code is
not aware of the shared bit.
The MMIO GPA shouldn't have a valid memslot, and the attribute of the GPA
should be shared. KVM could do these checks before exiting to userspace;
however, even if KVM did, there would still be race conditions between the
check in KVM and the emulation of the MMIO access in userspace, due to
memslot hotplug or a memory attribute conversion. If userspace doesn't
check the attribute of the GPA and the attribute happens to be private,
it won't pose a security risk or cause an MCE, but it can lead to another
issue. E.g., in QEMU, treating a GPA with the private attribute as shared
when it falls within RAM's range can result in extra memory consumption
during emulation of the access to the HVA of the GPA. There are two
options: 1) do the check both in KVM and userspace, or 2) do the check
only in userspace. This patch chooses option 2, i.e. KVM omits the
memslot and attribute checks and expects userspace to do them.
Similar to normal MMIO emulation, try to handle the MMIO in the kernel
first; if the kernel can't handle it, forward the request to userspace.
Export the symbols needed for MMIO handling.
Fragment handling is not needed for TDX PV MMIO because the GPA is
provided directly; if an MMIO access crosses a page boundary, it is
contiguous in GPA. Also, the size is limited to 1, 2, 4, or 8 bytes, so
no further splitting is needed. Cross-page accesses are allowed because
no extra handling is needed once both the start and end GPAs are checked
to be shared.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
Hypercalls exit to userspace breakout:
- Update the changelog.
- Remove the check of memslot for GPA.
- Allow MMIO access crossing page boundary.
- Move the tracepoint for KVM_TRACE_MMIO_WRITE earlier so the tracepoint handles
the cases both for kernel and userspace. (Isaku)
- Set TDVMCALL return code when back from userspace, which is missing in v19.
- Move fast MMIO write into tdx_mmio_write()
- Check GPA is shared GPA. (Binbin)
- Remove extra check for size > 8u. (Binbin)
- Removed KVM_BUG_ON() in tdx_complete_mmio() and tdx_emulate_mmio()
- Removed vcpu->mmio_needed code since it's not used after removing
KVM_BUG_ON().
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Use vt_is_tdx_private_gpa()
---
arch/x86/kvm/vmx/tdx.c | 109 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 1 +
virt/kvm/kvm_main.c | 1 +
3 files changed, 111 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 495991407a95..50cfc795f01f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1205,6 +1205,113 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
return ret;
}
+static int tdx_complete_mmio(struct kvm_vcpu *vcpu)
+{
+ unsigned long val = 0;
+ gpa_t gpa;
+ int size;
+
+ if (!vcpu->mmio_is_write) {
+ gpa = vcpu->mmio_fragments[0].gpa;
+ size = vcpu->mmio_fragments[0].len;
+
+ memcpy(&val, vcpu->run->mmio.data, size);
+ tdvmcall_set_return_val(vcpu, val);
+ trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+ }
+
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+ return 1;
+}
+
+static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size,
+ unsigned long val)
+{
+ if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+ trace_kvm_fast_mmio(gpa);
+ return 0;
+ }
+
+ trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val);
+ if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+ kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size)
+{
+ unsigned long val;
+
+ if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+ kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+ return -EOPNOTSUPP;
+
+ tdvmcall_set_return_val(vcpu, val);
+ trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+ return 0;
+}
+
+static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
+{
+ int size, write, r;
+ unsigned long val;
+ gpa_t gpa;
+
+ size = tdvmcall_a0_read(vcpu);
+ write = tdvmcall_a1_read(vcpu);
+ gpa = tdvmcall_a2_read(vcpu);
+ val = write ? tdvmcall_a3_read(vcpu) : 0;
+
+ if (size != 1 && size != 2 && size != 4 && size != 8)
+ goto error;
+ if (write != 0 && write != 1)
+ goto error;
+
+ /*
+ * TDG.VP.VMCALL<MMIO> allows only shared GPA, it makes no sense to
+ * do MMIO emulation for private GPA.
+ */
+ if (vt_is_tdx_private_gpa(vcpu->kvm, gpa) ||
+ vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))
+ goto error;
+
+ gpa = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
+
+ if (write)
+ r = tdx_mmio_write(vcpu, gpa, size, val);
+ else
+ r = tdx_mmio_read(vcpu, gpa, size);
+ if (!r) {
+ /* Kernel completed device emulation. */
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
+ return 1;
+ }
+
+ /* Request the device emulation to userspace device model. */
+ vcpu->mmio_is_write = write;
+ vcpu->arch.complete_userspace_io = tdx_complete_mmio;
+
+ vcpu->run->mmio.phys_addr = gpa;
+ vcpu->run->mmio.len = size;
+ vcpu->run->mmio.is_write = write;
+ vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+ if (write) {
+ memcpy(vcpu->run->mmio.data, &val, size);
+ } else {
+ vcpu->mmio_fragments[0].gpa = gpa;
+ vcpu->mmio_fragments[0].len = size;
+ trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL);
+ }
+ return 0;
+
+error:
+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+ return 1;
+}
+
static int handle_tdvmcall(struct kvm_vcpu *vcpu)
{
if (tdvmcall_exit_type(vcpu))
@@ -1217,6 +1324,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
return tdx_report_fatal_error(vcpu);
case EXIT_REASON_IO_INSTRUCTION:
return tdx_emulate_io(vcpu);
+ case EXIT_REASON_EPT_VIOLATION:
+ return tdx_emulate_mmio(vcpu);
default:
break;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2eb660fed754..e155ae90e9fa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13987,6 +13987,7 @@ EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5901d03e372c..dc735d7b511b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5803,6 +5803,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
r = __kvm_io_bus_read(vcpu, bus, &range, val);
return r < 0 ? r : 0;
}
+EXPORT_SYMBOL_GPL(kvm_io_bus_read);
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, struct kvm_io_device *dev)
--
2.46.0
* Re: [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
2024-12-01 3:53 ` [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError> Binbin Wu
@ 2024-12-06 9:31 ` Xu Yilun
2024-12-06 9:37 ` Binbin Wu
2024-12-10 9:05 ` Chao Gao
2024-12-13 9:40 ` Xiaoyao Li
2 siblings, 1 reply; 37+ messages in thread
From: Xu Yilun @ 2024-12-06 9:31 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, chao.gao, michael.roth, linux-kernel
> +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
> +{
> + u64 reg_mask = kvm_rcx_read(vcpu);
> + u64* opt_regs;
> +
> + /*
> + * Skip sanity checks and let userspace decide what to do if sanity
> + * checks fail.
> + */
> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
> + vcpu->run->system_event.ndata = 10;
> + /* Error codes. */
> + vcpu->run->system_event.data[0] = tdvmcall_a0_read(vcpu);
> + /* GPA of additional information page. */
> + vcpu->run->system_event.data[1] = tdvmcall_a1_read(vcpu);
> + /* Information passed via registers (up to 64 bytes). */
> + opt_regs = &vcpu->run->system_event.data[2];
> +
> +#define COPY_REG(REG, MASK) \
> + do { \
> + if (reg_mask & MASK) \
> + *opt_regs = kvm_ ## REG ## _read(vcpu); \
> + else \
> + *opt_regs = 0; \
> + opt_regs++; \
> + } while (0)
> +
> + /* The order is defined in GHCI. */
> + COPY_REG(r14, BIT_ULL(14));
> + COPY_REG(r15, BIT_ULL(15));
> + COPY_REG(rbx, BIT_ULL(3));
> + COPY_REG(rdi, BIT_ULL(7));
> + COPY_REG(rsi, BIT_ULL(6));
> + COPY_REG(r8, BIT_ULL(8));
> + COPY_REG(r9, BIT_ULL(9));
> + COPY_REG(rdx, BIT_ULL(2));
Nit:
#undef COPY_REG
Thanks,
Yilun
> +
> + /*
> + * Set the status code according to GHCI spec, although the vCPU may
> + * not return back to guest.
> + */
> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
> +
> + /* Forward request to userspace. */
> + return 0;
> +}
* Re: [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
2024-12-06 9:31 ` Xu Yilun
@ 2024-12-06 9:37 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-06 9:37 UTC (permalink / raw)
To: Xu Yilun
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, chao.gao, michael.roth, linux-kernel
On 12/6/2024 5:31 PM, Xu Yilun wrote:
>> +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
>> +{
>> + u64 reg_mask = kvm_rcx_read(vcpu);
>> + u64* opt_regs;
>> +
>> + /*
>> + * Skip sanity checks and let userspace decide what to do if sanity
>> + * checks fail.
>> + */
>> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
>> + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
>> + vcpu->run->system_event.ndata = 10;
>> + /* Error codes. */
>> + vcpu->run->system_event.data[0] = tdvmcall_a0_read(vcpu);
>> + /* GPA of additional information page. */
>> + vcpu->run->system_event.data[1] = tdvmcall_a1_read(vcpu);
>> + /* Information passed via registers (up to 64 bytes). */
>> + opt_regs = &vcpu->run->system_event.data[2];
>> +
>> +#define COPY_REG(REG, MASK) \
>> + do { \
>> + if (reg_mask & MASK) \
>> + *opt_regs = kvm_ ## REG ## _read(vcpu); \
>> + else \
>> + *opt_regs = 0; \
>> + opt_regs++; \
>> + } while (0)
>> +
>> + /* The order is defined in GHCI. */
>> + COPY_REG(r14, BIT_ULL(14));
>> + COPY_REG(r15, BIT_ULL(15));
>> + COPY_REG(rbx, BIT_ULL(3));
>> + COPY_REG(rdi, BIT_ULL(7));
>> + COPY_REG(rsi, BIT_ULL(6));
>> + COPY_REG(r8, BIT_ULL(8));
>> + COPY_REG(r9, BIT_ULL(9));
>> + COPY_REG(rdx, BIT_ULL(2));
> Nit:
>
> #undef COPY_REG
Thanks for catching it!
>
> Thanks,
> Yilun
>
>> +
>> + /*
>> + * Set the status code according to GHCI spec, although the vCPU may
>> + * not return back to guest.
>> + */
>> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
>> +
>> + /* Forward request to userspace. */
>> + return 0;
>> +}
* Re: [PATCH 3/7] KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL
2024-12-01 3:53 ` [PATCH 3/7] KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL Binbin Wu
@ 2024-12-09 2:58 ` Chao Gao
2024-12-09 3:08 ` Binbin Wu
0 siblings, 1 reply; 37+ messages in thread
From: Chao Gao @ 2024-12-09 2:58 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On Sun, Dec 01, 2024 at 11:53:52AM +0800, Binbin Wu wrote:
>From: Isaku Yamahata <isaku.yamahata@intel.com>
>
>Handle KVM hypercall for TDX according to TDX Guest-Host Communication
>Interface (GHCI) specification.
>
>The TDX GHCI specification defines the ABI for the guest TD to issue
>hypercalls. When R10 is non-zero, it indicates the TDG.VP.VMCALL is
>vendor-specific. KVM uses R10 as KVM hypercall number and R11-R14
>as 4 arguments, while the error code is returned in R10. Follow the
>ABI and handle the KVM hypercall for TDX.
>
>Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>---
>Hypercalls exit to userspace breakout:
>- Renamed from "KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL" to
> "KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL".
>- Update the change log.
>- Rebased on Sean's "Prep KVM hypercall handling for TDX" patch set.
> https://lore.kernel.org/kvm/20241128004344.4072099-1-seanjc@google.com
>- Use the right register (i.e. R10) to set the return code after returning
> back from userspace.
>---
> arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
>diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>index 19fd8a5dabd0..4cc55b120ab0 100644
>--- a/arch/x86/kvm/vmx/tdx.c
>+++ b/arch/x86/kvm/vmx/tdx.c
>@@ -957,8 +957,39 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
> return 0;
> }
>
>+
>+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
>+{
>+ kvm_r10_write(vcpu, vcpu->run->hypercall.ret);
Use tdvmcall_set_return_code() here? it would be more self-explanatory.
>+ return 1;
>+}
>+
>+static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
>+{
>+ int r;
>+
>+ /*
>+ * ABI for KVM tdvmcall argument:
>+ * In Guest-Hypervisor Communication Interface(GHCI) specification,
>+ * Non-zero leaf number (R10 != 0) is defined to indicate
>+ * vendor-specific. KVM uses this for KVM hypercall. NOTE: KVM
>+ * hypercall number starts from one. Zero isn't used for KVM hypercall
>+ * number.
>+ *
>+ * R10: KVM hypercall number
>+ * arguments: R11, R12, R13, R14.
>+ */
>+ r = __kvm_emulate_hypercall(vcpu, r10, r11, r12, r13, r14, true, 0,
note r10-14 are not declared in this function.
>+ complete_hypercall_exit);
>+
>+ return r > 0;
>+}
>+
> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
> {
>+ if (tdvmcall_exit_type(vcpu))
>+ return tdx_emulate_vmcall(vcpu);
>+
> switch (tdvmcall_leaf(vcpu)) {
> default:
> break;
>--
>2.46.0
>
* Re: [PATCH 3/7] KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL
2024-12-09 2:58 ` Chao Gao
@ 2024-12-09 3:08 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-09 3:08 UTC (permalink / raw)
To: Chao Gao
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On 12/9/2024 10:58 AM, Chao Gao wrote:
> On Sun, Dec 01, 2024 at 11:53:52AM +0800, Binbin Wu wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Handle KVM hypercall for TDX according to TDX Guest-Host Communication
>> Interface (GHCI) specification.
>>
>> The TDX GHCI specification defines the ABI for the guest TD to issue
>> hypercalls. When R10 is non-zero, it indicates the TDG.VP.VMCALL is
>> vendor-specific. KVM uses R10 as KVM hypercall number and R11-R14
>> as 4 arguments, while the error code is returned in R10. Follow the
>> ABI and handle the KVM hypercall for TDX.
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>> Hypercalls exit to userspace breakout:
>> - Renamed from "KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL" to
>> "KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL".
>> - Update the change log.
>> - Rebased on Sean's "Prep KVM hypercall handling for TDX" patch set.
>> https://lore.kernel.org/kvm/20241128004344.4072099-1-seanjc@google.com
>> - Use the right register (i.e. R10) to set the return code after returning
>> back from userspace.
>> ---
>> arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++++++
>> 1 file changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>> index 19fd8a5dabd0..4cc55b120ab0 100644
>> --- a/arch/x86/kvm/vmx/tdx.c
>> +++ b/arch/x86/kvm/vmx/tdx.c
>> @@ -957,8 +957,39 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
>> return 0;
>> }
>>
>> +
>> +static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
>> +{
>> + kvm_r10_write(vcpu, vcpu->run->hypercall.ret);
> Use tdvmcall_set_return_code() here? it would be more self-explanatory.
Yes, it's better.
Thanks!
>
>> + return 1;
>> +}
>> +
>> +static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
>> +{
>> + int r;
>> +
>> + /*
>> + * ABI for KVM tdvmcall argument:
>> + * In the Guest-Hypervisor Communication Interface (GHCI) specification,
>> + * a non-zero leaf number (R10 != 0) is defined to indicate a
>> + * vendor-specific TDVMCALL. KVM uses this for the KVM hypercall. NOTE: KVM
>> + * hypercall number starts from one. Zero isn't used for KVM hypercall
>> + * number.
>> + *
>> + * R10: KVM hypercall number
>> + * arguments: R11, R12, R13, R14.
>> + */
>> + r = __kvm_emulate_hypercall(vcpu, r10, r11, r12, r13, r14, true, 0,
> note r10-14 are not declared in this function.
__kvm_emulate_hypercall() is a macro, so these will be replaced by
kvm_{r10, r11, r12, r13, r14}_read().
>
>> + complete_hypercall_exit);
>> +
>> + return r > 0;
>> +}
>> +
>> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>> {
>> + if (tdvmcall_exit_type(vcpu))
>> + return tdx_emulate_vmcall(vcpu);
>> +
>> switch (tdvmcall_leaf(vcpu)) {
>> default:
>> break;
>> --
>> 2.46.0
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
2024-12-01 3:53 ` [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit Binbin Wu
@ 2024-12-09 11:21 ` Chao Gao
2024-12-10 2:14 ` Binbin Wu
2024-12-13 8:57 ` Xiaoyao Li
2025-01-22 12:50 ` Paolo Bonzini
2 siblings, 1 reply; 37+ messages in thread
From: Chao Gao @ 2024-12-09 11:21 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On Sun, Dec 01, 2024 at 11:53:50AM +0800, Binbin Wu wrote:
>From: Isaku Yamahata <isaku.yamahata@intel.com>
>
>Introduce the wiring for handling TDX VM exits by implementing the
>callbacks .get_exit_info(), and .handle_exit(). Additionally, add
>error handling during the TDX VM exit flow, and add a place holder
>to handle various exit reasons. Add helper functions to retrieve
>exit information, exit qualifications, and more.
>
>Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.*
>operations for secure EPT or TD EPOCH. If contention occurs, the return
>value will have TDX_OPERAND_BUSY set with operand type, prompting the vCPU
>to attempt re-entry into the guest via the fast path.
>
>Error Handling: The following scenarios will return to userspace with
>KVM_EXIT_INTERNAL_ERROR.
>- TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the
> CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX
> isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM
> firmware is not loaded or disabled.
>- TDX_ERROR: This indicates some check failed in the TDX module, preventing
> the vCPU from running.
>- TDX_NON_RECOVERABLE: Set by the TDX module when the error is
> non-recoverable, indicating that the TDX guest is dead or the vCPU is
> disabled. This also covers failed_vmentry case, which must have
> TDX_NON_RECOVERABLE set since off-TD debug feature has not been enabled.
> An exception is the triple fault, which also sets TDX_NON_RECOVERABLE
> but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX
> case.
>- Any unhandled VM exit reason will also return to userspace with
> KVM_EXIT_INTERNAL_ERROR.
>
>Suggested-by: Sean Christopherson <seanjc@google.com>
>Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
[..]
> fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> {
> struct vcpu_tdx *tdx = to_tdx(vcpu);
>@@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED;
>
> vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
>+
>+ if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
>+ return EXIT_FASTPATH_NONE;
>+
>+ if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY)))
>+ kvm_machine_check();
I was wondering if EXIT_REASON_MCE_DURING_VMENTRY should be handled in the
switch-case in tdx_handle_exit() because I saw there is a dedicated handler
for VMX. But it looks like EXIT_REASON_MCE_DURING_VMENTRY is a kind of
VM-entry failure, so it won't reach that switch-case. And VMX's handler for
EXIT_REASON_MCE_DURING_VMENTRY is actually dead code and can be removed.
>+
> trace_kvm_exit(vcpu, KVM_ISA_VMX);
>
>- return EXIT_FASTPATH_NONE;
>+ if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry))
>+ return EXIT_FASTPATH_NONE;
>+
>+ return tdx_exit_handlers_fastpath(vcpu);
>+}
>+
>+static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
>+{
>+ vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
>+ vcpu->mmio_needed = 0;
>+ return 0;
> }
>
> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
>@@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
> }
>
>+int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
>+{
>+ struct vcpu_tdx *tdx = to_tdx(vcpu);
>+ u64 vp_enter_ret = tdx->vp_enter_ret;
>+ union vmx_exit_reason exit_reason;
>+
>+ if (fastpath != EXIT_FASTPATH_NONE)
>+ return 1;
>+
>+ /*
>+ * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
>+ * TDX_SEAMCALL_VMFAILINVALID.
>+ */
>+ if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
>+ KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
>+ goto unhandled_exit;
>+ }
>+
>+ /*
>+ * Without off-TD debug enabled, failed_vmentry case must have
>+ * TDX_NON_RECOVERABLE set.
>+ */
>+ if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) {
>+ /* Triple fault is non-recoverable. */
>+ if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT)))
>+ return tdx_handle_triple_fault(vcpu);
>+
>+ kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n",
>+ vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid,
>+ set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
>+ goto unhandled_exit;
>+ }
>+
>+ /* From now, the seamcall status should be TDX_SUCCESS. */
>+ WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
>+ exit_reason = tdexit_exit_reason(vcpu);
>+
>+ switch (exit_reason.basic) {
>+ default:
>+ break;
>+ }
>+
>+unhandled_exit:
>+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
>+ vcpu->run->internal.ndata = 2;
>+ vcpu->run->internal.data[0] = vp_enter_ret;
>+ vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
>+ return 0;
>+}
>+
>+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
>+ u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
>+{
>+ struct vcpu_tdx *tdx = to_tdx(vcpu);
>+
>+ if (tdx_has_exit_reason(vcpu)) {
>+ /*
>+ * Encode some useful info from the 64-bit return code
>+ * into the 32-bit exit 'reason'. If the VMX exit reason is
>+ * valid, just set it to those bits.
>+ */
>+ *reason = (u32)tdx->vp_enter_ret;
>+ *info1 = tdexit_exit_qual(vcpu);
>+ *info2 = tdexit_ext_exit_qual(vcpu);
>+ } else {
>+ /*
>+ * When the VMX exit reason in vp_enter_ret is not valid,
>+ * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to
>+ * mean the vmexit code is not valid. Set the other bits to
>+ * try to avoid picking a value that may someday be a valid
>+ * VMX exit code.
>+ */
>+ *reason = 0xFFFFFFFF;
>+ *info1 = 0;
>+ *info2 = 0;
>+ }
>+
>+ *intr_info = tdexit_intr_info(vcpu);
If there is no valid exit reason, shouldn't intr_info be set to 0?
>+ *error_code = 0;
>+}
>+
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/7] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)
2024-12-01 3:53 ` [PATCH 2/7] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL) Binbin Wu
@ 2024-12-09 11:28 ` Chao Gao
2024-12-10 2:34 ` Binbin Wu
0 siblings, 1 reply; 37+ messages in thread
From: Chao Gao @ 2024-12-09 11:28 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On Sun, Dec 01, 2024 at 11:53:51AM +0800, Binbin Wu wrote:
>From: Isaku Yamahata <isaku.yamahata@intel.com>
>
>Add a place holder and related helper functions for preparation of
>TDG.VP.VMCALL handling.
>
>The TDX module specification defines TDG.VP.VMCALL API (TDVMCALL for short)
>for the guest TD to issue hypercalls to the VMM. When the guest TD issues a
>TDVMCALL, the guest TD exits to VMM with a new exit reason. The arguments
>from the guest TD and returned values from the VMM are passed in the guest
>registers. The guest RCX register indicates which registers are used.
>Define helper functions to access those registers.
>
>A new VMX exit reason TDCALL is added to indicate the exit is due to TDVMCALL
>from the guest TD. Define the TDCALL exit reason and add a place holder to
>handle such exit.
>
>Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
>Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
>---
>Hypercalls exit to userspace breakout:
>- Update changelog.
>- Drop the unused tdx->tdvmcall. (Chao)
>- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
>---
> arch/x86/include/uapi/asm/vmx.h | 4 ++-
> arch/x86/kvm/vmx/tdx.c | 48 +++++++++++++++++++++++++++++++++
> 2 files changed, 51 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
>index a5faf6d88f1b..6a9f268a2d2c 100644
>--- a/arch/x86/include/uapi/asm/vmx.h
>+++ b/arch/x86/include/uapi/asm/vmx.h
>@@ -92,6 +92,7 @@
> #define EXIT_REASON_TPAUSE 68
> #define EXIT_REASON_BUS_LOCK 74
> #define EXIT_REASON_NOTIFY 75
>+#define EXIT_REASON_TDCALL 77
>
> #define VMX_EXIT_REASONS \
> { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
>@@ -155,7 +156,8 @@
> { EXIT_REASON_UMWAIT, "UMWAIT" }, \
> { EXIT_REASON_TPAUSE, "TPAUSE" }, \
> { EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \
>- { EXIT_REASON_NOTIFY, "NOTIFY" }
>+ { EXIT_REASON_NOTIFY, "NOTIFY" }, \
>+ { EXIT_REASON_TDCALL, "TDCALL" }
Side topic:
Strictly speaking, TDCALL VM-exit handling can happen for normal VMs. So, KVM may
need to handle it by injecting #UD. Of course, it is not necessary for this series.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-01 3:53 ` [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA> Binbin Wu
@ 2024-12-09 12:45 ` Chao Gao
2024-12-10 2:51 ` Binbin Wu
2024-12-13 9:32 ` Xiaoyao Li
1 sibling, 1 reply; 37+ messages in thread
From: Chao Gao @ 2024-12-09 12:45 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
>+/*
>+ * Split into chunks and check interrupt pending between chunks. This allows
>+ * for timely injection of interrupts to prevent issues with guest lockup
>+ * detection.
Would it cause any problems if an (intra-host or inter-host) migration happens
between chunks?
My understanding is that KVM would lose track of the progress if
map_gpa_next/end are not migrated. I'm not sure if KVM should expose the
state or prevent migration in the middle. Or, we can let the userspace VMM
cut the range into chunks, making it the userspace VMM's responsibility to
ensure necessary state is migrated.
I am not asking to fix this issue right now. I just want to ensure this issue
can be solved in a clean way when we start to support migration.
>+ */
>+#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024)
>+static void __tdx_map_gpa(struct vcpu_tdx *tdx);
>+
>+static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
>+{
>+ struct vcpu_tdx *tdx = to_tdx(vcpu);
>+
>+ if (vcpu->run->hypercall.ret) {
>+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
>+ kvm_r11_write(vcpu, tdx->map_gpa_next);
s/kvm_r11_write/tdvmcall_set_return_val
please fix other call sites in this patch.
>+ return 1;
>+ }
>+
>+ tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
>+ if (tdx->map_gpa_next >= tdx->map_gpa_end) {
>+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
>+ return 1;
>+ }
>+
>+ /*
>+ * Stop processing the remaining part if there is pending interrupt.
>+ * Skip checking pending virtual interrupt (reflected by
>+ * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a seamcall because
>+ * if the guest disabled interrupts, it's OK not to return to the guest
>+ * due to a non-NMI interrupt. Also it's rare to issue TDVMCALL_MAP_GPA
>+ * immediately after STI or MOV/POP SS.
>+ */
>+ if (pi_has_pending_interrupt(vcpu) ||
>+ kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
>+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
>+ kvm_r11_write(vcpu, tdx->map_gpa_next);
>+ return 1;
>+ }
>+
>+ __tdx_map_gpa(tdx);
>+ /* Forward request to userspace. */
>+ return 0;
>+}
>+
>+static void __tdx_map_gpa(struct vcpu_tdx *tdx)
>+{
>+ u64 gpa = tdx->map_gpa_next;
>+ u64 size = tdx->map_gpa_end - tdx->map_gpa_next;
>+
>+ if (size > TDX_MAP_GPA_MAX_LEN)
>+ size = TDX_MAP_GPA_MAX_LEN;
>+
>+ tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
>+ tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>+ tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
>+ tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
>+ tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ?
>+ KVM_MAP_GPA_RANGE_ENCRYPTED :
>+ KVM_MAP_GPA_RANGE_DECRYPTED;
>+ tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>+
>+ tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa;
>+}
>+
>+static int tdx_map_gpa(struct kvm_vcpu *vcpu)
>+{
>+ struct vcpu_tdx *tdx = to_tdx(vcpu);
>+ u64 gpa = tdvmcall_a0_read(vcpu);
>+ u64 size = tdvmcall_a1_read(vcpu);
>+ u64 ret;
>+
>+ /*
>+ * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
>+ * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
>+ * bit set. If not, the error code is not defined in GHCI for TDX, use
>+ * TDVMCALL_STATUS_INVALID_OPERAND for this case.
>+ */
>+ if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
>+ ret = TDVMCALL_STATUS_INVALID_OPERAND;
>+ goto error;
>+ }
>+
>+ if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
>+ !kvm_vcpu_is_legal_gpa(vcpu, gpa + size - 1) ||
>+ (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
>+ vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))) {
>+ ret = TDVMCALL_STATUS_INVALID_OPERAND;
>+ goto error;
>+ }
>+
>+ if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
>+ ret = TDVMCALL_STATUS_ALIGN_ERROR;
>+ goto error;
>+ }
>+
>+ tdx->map_gpa_end = gpa + size;
>+ tdx->map_gpa_next = gpa;
>+
>+ __tdx_map_gpa(tdx);
>+ /* Forward request to userspace. */
>+ return 0;
>+
>+error:
>+ tdvmcall_set_return_code(vcpu, ret);
>+ kvm_r11_write(vcpu, gpa);
>+ return 1;
>+}
>+
> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
> {
> if (tdvmcall_exit_type(vcpu))
> return tdx_emulate_vmcall(vcpu);
>
> switch (tdvmcall_leaf(vcpu)) {
>+ case TDVMCALL_MAP_GPA:
>+ return tdx_map_gpa(vcpu);
> default:
> break;
> }
>diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
>index 1abc94b046a0..bfae70887695 100644
>--- a/arch/x86/kvm/vmx/tdx.h
>+++ b/arch/x86/kvm/vmx/tdx.h
>@@ -71,6 +71,9 @@ struct vcpu_tdx {
>
> enum tdx_prepare_switch_state prep_switch_state;
> u64 msr_host_kernel_gs_base;
>+
>+ u64 map_gpa_next;
>+ u64 map_gpa_end;
> };
>
> void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
>--
>2.46.0
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
2024-12-09 11:21 ` Chao Gao
@ 2024-12-10 2:14 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-10 2:14 UTC (permalink / raw)
To: Chao Gao
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On 12/9/2024 7:21 PM, Chao Gao wrote:
> On Sun, Dec 01, 2024 at 11:53:50AM +0800, Binbin Wu wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Introduce the wiring for handling TDX VM exits by implementing the
>> callbacks .get_exit_info(), and .handle_exit(). Additionally, add
>> error handling during the TDX VM exit flow, and add a place holder
>> to handle various exit reasons. Add helper functions to retrieve
>> exit information, exit qualifications, and more.
>>
>> Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.*
>> operations for secure EPT or TD EPOCH. If contention occurs, the return
>> value will have TDX_OPERAND_BUSY set with operand type, prompting the vCPU
>> to attempt re-entry into the guest via the fast path.
>>
>> Error Handling: The following scenarios will return to userspace with
>> KVM_EXIT_INTERNAL_ERROR.
>> - TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the
>> CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX
>> isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM
>> firmware is not loaded or disabled.
>> - TDX_ERROR: This indicates some check failed in the TDX module, preventing
>> the vCPU from running.
>> - TDX_NON_RECOVERABLE: Set by the TDX module when the error is
>> non-recoverable, indicating that the TDX guest is dead or the vCPU is
>> disabled. This also covers failed_vmentry case, which must have
>> TDX_NON_RECOVERABLE set since off-TD debug feature has not been enabled.
>> An exception is the triple fault, which also sets TDX_NON_RECOVERABLE
>> but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX
>> case.
>> - Any unhandled VM exit reason will also return to userspace with
>> KVM_EXIT_INTERNAL_ERROR.
>>
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
>
> [..]
>
>> fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
>> {
>> struct vcpu_tdx *tdx = to_tdx(vcpu);
>> @@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
>> tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED;
>>
>> vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
>> +
>> + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
>> + return EXIT_FASTPATH_NONE;
>> +
>> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY)))
>> + kvm_machine_check();
> I was wondering if EXIT_REASON_MCE_DURING_VMENTRY should be handled in the
> switch-case in tdx_handle_exit() because I saw there is a dedicated handler
> for VMX. But it looks like EXIT_REASON_MCE_DURING_VMENTRY is a kind of
> VM-entry failure, so it won't reach that switch-case.
Yes, similar to VMX, for TDX, EXIT_REASON_MCE_DURING_VMENTRY will be captured
by failed_vmentry, which will have TDX_NON_RECOVERABLE set and is handled before
the switch-case.
> And VMX's handler for
> EXIT_REASON_MCE_DURING_VMENTRY is actually dead code and can be removed.
I think so. For VMX, the exit_reason.failed_vmentry case will return to
userspace; it won't reach the handler for EXIT_REASON_MCE_DURING_VMENTRY.
>
>> +
>> trace_kvm_exit(vcpu, KVM_ISA_VMX);
>>
>> - return EXIT_FASTPATH_NONE;
>> + if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry))
>> + return EXIT_FASTPATH_NONE;
>> +
>> + return tdx_exit_handlers_fastpath(vcpu);
>> +}
>> +
>> +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
>> +{
>> + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
>> + vcpu->mmio_needed = 0;
>> + return 0;
>> }
>>
>> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
>> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
>> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
>> }
>>
>> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
>> +{
>> + struct vcpu_tdx *tdx = to_tdx(vcpu);
>> + u64 vp_enter_ret = tdx->vp_enter_ret;
>> + union vmx_exit_reason exit_reason;
>> +
>> + if (fastpath != EXIT_FASTPATH_NONE)
>> + return 1;
>> +
>> + /*
>> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
>> + * TDX_SEAMCALL_VMFAILINVALID.
>> + */
>> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
>> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
>> + goto unhandled_exit;
>> + }
>> +
>> + /*
>> + * Without off-TD debug enabled, failed_vmentry case must have
>> + * TDX_NON_RECOVERABLE set.
>> + */
>> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) {
>> + /* Triple fault is non-recoverable. */
>> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT)))
>> + return tdx_handle_triple_fault(vcpu);
>> +
>> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n",
>> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid,
>> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
>> + goto unhandled_exit;
>> + }
>> +
>> + /* From now, the seamcall status should be TDX_SUCCESS. */
>> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
>> + exit_reason = tdexit_exit_reason(vcpu);
>> +
>> + switch (exit_reason.basic) {
>> + default:
>> + break;
>> + }
>> +
>> +unhandled_exit:
>> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
>> + vcpu->run->internal.ndata = 2;
>> + vcpu->run->internal.data[0] = vp_enter_ret;
>> + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
>> + return 0;
>> +}
>> +
>> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
>> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
>> +{
>> + struct vcpu_tdx *tdx = to_tdx(vcpu);
>> +
>> + if (tdx_has_exit_reason(vcpu)) {
>> + /*
>> + * Encode some useful info from the 64-bit return code
>> + * into the 32-bit exit 'reason'. If the VMX exit reason is
>> + * valid, just set it to those bits.
>> + */
>> + *reason = (u32)tdx->vp_enter_ret;
>> + *info1 = tdexit_exit_qual(vcpu);
>> + *info2 = tdexit_ext_exit_qual(vcpu);
>> + } else {
>> + /*
>> + * When the VMX exit reason in vp_enter_ret is not valid,
>> + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to
>> + * mean the vmexit code is not valid. Set the other bits to
>> + * try to avoid picking a value that may someday be a valid
>> + * VMX exit code.
>> + */
>> + *reason = 0xFFFFFFFF;
>> + *info1 = 0;
>> + *info2 = 0;
>> + }
>> +
>> + *intr_info = tdexit_intr_info(vcpu);
> If there is no valid exit reason, shouldn't intr_info be set to 0?
Yes. Will fix it.
Thanks!
>
>> + *error_code = 0;
>> +}
>> +
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/7] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)
2024-12-09 11:28 ` Chao Gao
@ 2024-12-10 2:34 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-10 2:34 UTC (permalink / raw)
To: Chao Gao
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On 12/9/2024 7:28 PM, Chao Gao wrote:
[...]
>>
>> #define VMX_EXIT_REASONS \
>> { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
>> @@ -155,7 +156,8 @@
>> { EXIT_REASON_UMWAIT, "UMWAIT" }, \
>> { EXIT_REASON_TPAUSE, "TPAUSE" }, \
>> { EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \
>> - { EXIT_REASON_NOTIFY, "NOTIFY" }
>> + { EXIT_REASON_NOTIFY, "NOTIFY" }, \
>> + { EXIT_REASON_TDCALL, "TDCALL" }
> Side topic:
> Strictly speaking, TDCALL VM-exit handling can happen for normal VMs.
Oh, yes. Per the TDX CPU architectural specification, TDCALL in VMX non-root
mode causes a VM exit with exit reason TDCALL. So, a normal VM could exit with
TDCALL.
> So, KVM may
> need to handle it by injecting #UD. Of course, it is not necessary for this series.
Currently, the handling of TDCALL for VMX VMs will return to userspace
with KVM_EXIT_INTERNAL_ERROR + KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON.
Since it's not an expected VM exit reason for normal VMs, maybe it isn't
worth a dedicated handler?
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-09 12:45 ` Chao Gao
@ 2024-12-10 2:51 ` Binbin Wu
2024-12-10 9:10 ` Chao Gao
0 siblings, 1 reply; 37+ messages in thread
From: Binbin Wu @ 2024-12-10 2:51 UTC (permalink / raw)
To: Chao Gao
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On 12/9/2024 8:45 PM, Chao Gao wrote:
>> +/*
>> + * Split into chunks and check interrupt pending between chunks. This allows
>> + * for timely injection of interrupts to prevent issues with guest lockup
>> + * detection.
> Would it cause any problems if an (intra-host or inter-host) migration happens
> between chunks?
>
> My understanding is that KVM would lose track of the progress if
> map_gpa_next/end are not migrated. I'm not sure if KVM should expose the
> state or prevent migration in the middle. Or, we can let the userspace VMM
> cut the range into chunks, making it the userspace VMM's responsibility to
> ensure necessary state is migrated.
>
> I am not asking to fix this issue right now. I just want to ensure this issue
> can be solved in a clean way when we start to support migration.
How about:
Before exiting to userspace, KVM always sets the start GPA in R11 and sets the
return code to TDVMCALL_STATUS_RETRY.
- If userspace finishes the part, the complete_userspace_io() callback will
be called and the return code will be set to TDVMCALL_STATUS_SUCCESS.
- If live migration interrupts the MapGPA in userspace and
complete_userspace_io() is not called, when the vCPU resumes after migration,
the TDX guest will see the TDVMCALL_STATUS_RETRY return code with the failed
GPA, and it can retry the MapGPA with that GPA.
>
>> + */
>> +#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024)
>> +static void __tdx_map_gpa(struct vcpu_tdx * tdx);
>> +
>> +static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
>> +{
>> + struct vcpu_tdx * tdx = to_tdx(vcpu);
>> +
>> + if(vcpu->run->hypercall.ret) {
>> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
>> + kvm_r11_write(vcpu, tdx->map_gpa_next);
> s/kvm_r11_write/tdvmcall_set_return_val
>
> please fix other call sites in this patch.
I didn't use tdvmcall_set_return_val() because it's used for TDVMCALLs
like MMIO, PIO, etc., which have a return value.
For MapGPA, it's part of the error information returned to the TDX guest.
So, I used the raw register name version.
Let's see if others also think tdvmcall_set_return_val() is clearer
for this case.
>
>> + return 1;
>> + }
>> +
>> + tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
>> + if (tdx->map_gpa_next >= tdx->map_gpa_end) {
>> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
>> + return 1;
>> + }
>> +
>> + /*
>> + * Stop processing the remaining part if there is pending interrupt.
>> + * Skip checking pending virtual interrupt (reflected by
>> + * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a seamcall because
>> + * if the guest disabled interrupts, it's OK not to return to the guest
>> + * due to a non-NMI interrupt. Also it's rare to issue TDVMCALL_MAP_GPA
>> + * immediately after STI or MOV/POP SS.
>> + */
>> + if (pi_has_pending_interrupt(vcpu) ||
>> + kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
>> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
>> + kvm_r11_write(vcpu, tdx->map_gpa_next);
>> + return 1;
>> + }
>> +
>> + __tdx_map_gpa(tdx);
>> + /* Forward request to userspace. */
>> + return 0;
>> +}
>> +
>> +static void __tdx_map_gpa(struct vcpu_tdx *tdx)
>> +{
>> + u64 gpa = tdx->map_gpa_next;
>> + u64 size = tdx->map_gpa_end - tdx->map_gpa_next;
>> +
>> + if (size > TDX_MAP_GPA_MAX_LEN)
>> + size = TDX_MAP_GPA_MAX_LEN;
>> +
>> + tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
>> + tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>> + tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
>> + tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
>> + tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ?
>> + KVM_MAP_GPA_RANGE_ENCRYPTED :
>> + KVM_MAP_GPA_RANGE_DECRYPTED;
>> + tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>> +
>> + tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa;
>> +}
>> +
>> +static int tdx_map_gpa(struct kvm_vcpu *vcpu)
>> +{
>> + struct vcpu_tdx *tdx = to_tdx(vcpu);
>> + u64 gpa = tdvmcall_a0_read(vcpu);
>> + u64 size = tdvmcall_a1_read(vcpu);
>> + u64 ret;
>> +
>> + /*
>> + * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
>> + * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
>> + * bit set. If not, the error code is not defined in GHCI for TDX, use
>> + * TDVMCALL_STATUS_INVALID_OPERAND for this case.
>> + */
>> + if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
>> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
>> + goto error;
>> + }
>> +
>> + if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
>> + !kvm_vcpu_is_legal_gpa(vcpu, gpa + size - 1) ||
>> + (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
>> + vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))) {
>> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
>> + goto error;
>> + }
>> +
>> + if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
>> + ret = TDVMCALL_STATUS_ALIGN_ERROR;
>> + goto error;
>> + }
>> +
>> + tdx->map_gpa_end = gpa + size;
>> + tdx->map_gpa_next = gpa;
>> +
>> + __tdx_map_gpa(tdx);
>> + /* Forward request to userspace. */
>> + return 0;
>> +
>> +error:
>> + tdvmcall_set_return_code(vcpu, ret);
>> + kvm_r11_write(vcpu, gpa);
>> + return 1;
>> +}
>> +
>> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>> {
>> if (tdvmcall_exit_type(vcpu))
>> return tdx_emulate_vmcall(vcpu);
>>
>> switch (tdvmcall_leaf(vcpu)) {
>> + case TDVMCALL_MAP_GPA:
>> + return tdx_map_gpa(vcpu);
>> default:
>> break;
>> }
>> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
>> index 1abc94b046a0..bfae70887695 100644
>> --- a/arch/x86/kvm/vmx/tdx.h
>> +++ b/arch/x86/kvm/vmx/tdx.h
>> @@ -71,6 +71,9 @@ struct vcpu_tdx {
>>
>> enum tdx_prepare_switch_state prep_switch_state;
>> u64 msr_host_kernel_gs_base;
>> +
>> + u64 map_gpa_next;
>> + u64 map_gpa_end;
>> };
>>
>> void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
>> --
>> 2.46.0
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
2024-12-01 3:53 ` [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError> Binbin Wu
2024-12-06 9:31 ` Xu Yilun
@ 2024-12-10 9:05 ` Chao Gao
2024-12-10 9:43 ` Binbin Wu
2024-12-13 9:40 ` Xiaoyao Li
2 siblings, 1 reply; 37+ messages in thread
From: Chao Gao @ 2024-12-10 9:05 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
>diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>index edc070c6e19b..bb39da72c647 100644
>--- a/Documentation/virt/kvm/api.rst
>+++ b/Documentation/virt/kvm/api.rst
>@@ -6815,6 +6815,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
> #define KVM_SYSTEM_EVENT_WAKEUP 4
> #define KVM_SYSTEM_EVENT_SUSPEND 5
> #define KVM_SYSTEM_EVENT_SEV_TERM 6
>+ #define KVM_SYSTEM_EVENT_TDX_FATAL 7
> __u32 type;
> __u32 ndata;
> __u64 data[16];
>@@ -6841,6 +6842,13 @@ Valid values for 'type' are:
> reset/shutdown of the VM.
> - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
> The guest physical address of the guest's GHCB is stored in `data[0]`.
>+ - KVM_SYSTEM_EVENT_TDX_FATAL -- an TDX guest requested termination.
Not sure termination is an accurate interpretation of fatal errors. Maybe
just say: a fatal error reported by a TDX guest.
>+ The error codes of the guest's GHCI is stored in `data[0]`.
what do you mean by "guest's GHCI"?
>+ If the bit 63 of `data[0]` is set, it indicates there is TD specified
>+ additional information provided in a page, which is shared memory. The
>+ guest physical address of the information page is stored in `data[1]`.
>+ An optional error message is provided by `data[2]` ~ `data[9]`, which is
>+ byte sequence, LSB filled first. Typically, ASCII code(0x20-0x7e) is filled.
> - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
> KVM has recognized a wakeup event. Userspace may honor this event by
> marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-10 2:51 ` Binbin Wu
@ 2024-12-10 9:10 ` Chao Gao
2024-12-10 9:27 ` Tony Lindgren
0 siblings, 1 reply; 37+ messages in thread
From: Chao Gao @ 2024-12-10 9:10 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On Tue, Dec 10, 2024 at 10:51:56AM +0800, Binbin Wu wrote:
>
>
>
>On 12/9/2024 8:45 PM, Chao Gao wrote:
>> > +/*
>> > + * Split into chunks and check interrupt pending between chunks. This allows
>> > + * for timely injection of interrupts to prevent issues with guest lockup
>> > + * detection.
>> Would it cause any problems if an (intra-host or inter-host) migration happens
>> between chunks?
>>
>> My understanding is that KVM would lose track of the progress if
>> map_gpa_next/end are not migrated. I'm not sure if KVM should expose the
>> state or prevent migration in the middle. Or, we can let the userspace VMM
>> cut the range into chunks, making it the userspace VMM's responsibility to
>> ensure necessary state is migrated.
>>
>> I am not asking to fix this issue right now. I just want to ensure this issue
>> can be solved in a clean way when we start to support migration.
>How about:
>Before exiting to userspace, KVM always sets the start GPA in r11 and sets the
>return code to TDVMCALL_STATUS_RETRY.
>- If userspace finishes the part, the complete_userspace_io() callback will
> be called and the return code will be set to TDVMCALL_STATUS_SUCCESS.
>- If live migration interrupts the MapGPA in userspace and
> complete_userspace_io() is not called, then when the vCPU resumes from
> migration, the TDX guest will see the return code TDVMCALL_STATUS_RETRY with
> the failed GPA, and it can retry the MapGPA from that GPA.
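My understanding of the proposed protocol, as a self-contained untested
sketch (the struct and helper names are illustrative, not the actual KVM
code):

```c
#include <assert.h>
#include <stdint.h>

#define TDVMCALL_STATUS_SUCCESS 0x0ULL
#define TDVMCALL_STATUS_RETRY   0x1ULL

/* Guest-visible TDVMCALL result plus KVM-internal MapGPA progress. */
struct map_gpa_state {
	uint64_t r11;     /* GPA reported back to the guest */
	uint64_t status;  /* TDVMCALL return code */
	uint64_t next;    /* next GPA to convert */
};

/*
 * Arm the RETRY result before every exit to userspace, so that if
 * complete_userspace_io() never runs (e.g. the vCPU is migrated in the
 * middle), the guest sees RETRY with the in-progress GPA and can
 * resubmit the MapGPA from there.
 */
static void map_gpa_prepare_exit(struct map_gpa_state *s)
{
	s->r11 = s->next;
	s->status = TDVMCALL_STATUS_RETRY;
}

/* complete_userspace_io() path: userspace finished this chunk. */
static void map_gpa_complete(struct map_gpa_state *s, uint64_t chunk)
{
	s->next += chunk;
	s->status = TDVMCALL_STATUS_SUCCESS;
}
```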
Sounds good
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-10 9:10 ` Chao Gao
@ 2024-12-10 9:27 ` Tony Lindgren
0 siblings, 0 replies; 37+ messages in thread
From: Tony Lindgren @ 2024-12-10 9:27 UTC (permalink / raw)
To: Chao Gao
Cc: Binbin Wu, pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang,
adrian.hunter, reinette.chatre, xiaoyao.li, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On Tue, Dec 10, 2024 at 05:10:29PM +0800, Chao Gao wrote:
> On Tue, Dec 10, 2024 at 10:51:56AM +0800, Binbin Wu wrote:
> >On 12/9/2024 8:45 PM, Chao Gao wrote:
> >> > +/*
> >> > + * Split into chunks and check interrupt pending between chunks. This allows
> >> > + * for timely injection of interrupts to prevent issues with guest lockup
> >> > + * detection.
> >> Would it cause any problems if an (intra-host or inter-host) migration happens
> >> between chunks?
> >>
> >> My understanding is that KVM would lose track of the progress if
> >> map_gpa_next/end are not migrated. I'm not sure if KVM should expose the
> >> state or prevent migration in the middle. Or, we can let the userspace VMM
> >> cut the range into chunks, making it the userspace VMM's responsibility to
> >> ensure necessary state is migrated.
> >>
> >> I am not asking to fix this issue right now. I just want to ensure this issue
> >> can be solved in a clean way when we start to support migration.
> >How about:
> >Before exiting to userspace, KVM always sets the start GPA in r11 and sets the
> >return code to TDVMCALL_STATUS_RETRY.
> >- If userspace finishes the part, the complete_userspace_io() callback will
> > be called and the return code will be set to TDVMCALL_STATUS_SUCCESS.
> >- If live migration interrupts the MapGPA in userspace and
> > complete_userspace_io() is not called, then when the vCPU resumes from
> > migration, the TDX guest will see the return code TDVMCALL_STATUS_RETRY with
> > the failed GPA, and it can retry the MapGPA from that GPA.
>
> Sounds good
Makes sense to me too.
Regards,
Tony
* Re: [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall
2024-12-01 3:53 ` [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall Binbin Wu
@ 2024-12-10 9:42 ` Chao Gao
2024-12-10 9:50 ` Binbin Wu
0 siblings, 1 reply; 37+ messages in thread
From: Chao Gao @ 2024-12-10 9:42 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
>+static int tdx_emulate_io(struct kvm_vcpu *vcpu)
>+{
>+ struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>+ unsigned long val = 0;
>+ unsigned int port;
>+ int size, ret;
>+ bool write;
..
>+
>+ ++vcpu->stat.io_exits;
>+
>+ size = tdvmcall_a0_read(vcpu);
>+ write = tdvmcall_a1_read(vcpu);
a1 (i.e., R13) should be either 0 or 1. Other values are reserved according to
the GHCI spec. It is not appropriate to cast it to a boolean. For example, if
R13=2, KVM shouldn't treat it as a write request; instead, this request should
be rejected.
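Something along these lines (untested sketch; the helper name is
illustrative, not the actual KVM code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TDVMCALL_STATUS_SUCCESS         0x0ULL
#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL

/*
 * Per the GHCI, the direction operand (R13) must be exactly 0 (read)
 * or 1 (write); all other values are reserved and should be rejected
 * rather than silently truncated to a bool.
 */
static uint64_t tdx_check_io_direction(uint64_t a1, bool *write)
{
	if (a1 > 1)
		return TDVMCALL_STATUS_INVALID_OPERAND;

	*write = (a1 == 1);
	return TDVMCALL_STATUS_SUCCESS;
}
```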
>+ port = tdvmcall_a2_read(vcpu);
>+
>+ if (size != 1 && size != 2 && size != 4) {
>+ tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
>+ return 1;
>+ }
* Re: [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
2024-12-10 9:05 ` Chao Gao
@ 2024-12-10 9:43 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-10 9:43 UTC (permalink / raw)
To: Chao Gao
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On 12/10/2024 5:05 PM, Chao Gao wrote:
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index edc070c6e19b..bb39da72c647 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6815,6 +6815,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
>> #define KVM_SYSTEM_EVENT_WAKEUP 4
>> #define KVM_SYSTEM_EVENT_SUSPEND 5
>> #define KVM_SYSTEM_EVENT_SEV_TERM 6
>> + #define KVM_SYSTEM_EVENT_TDX_FATAL 7
>> __u32 type;
>> __u32 ndata;
>> __u64 data[16];
>> @@ -6841,6 +6842,13 @@ Valid values for 'type' are:
>> reset/shutdown of the VM.
>> - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
>> The guest physical address of the guest's GHCB is stored in `data[0]`.
>> + - KVM_SYSTEM_EVENT_TDX_FATAL -- an TDX guest requested termination.
> Not sure termination is an accurate interpretation of fatal errors. Maybe
> just say: a fatal error reported by a TDX guest.
OK, will update it as:
"a TDX guest reported a fatal error state."
>
>> + The error codes of the guest's GHCI is stored in `data[0]`.
> what do you mean by "guest's GHCI"?
I don't know what I was thinking about.
Will update it as:
The error code reported by the TDX guest is stored in `data[0]`; the error
code format is defined in the TDX GHCI specification.
>
>> + If the bit 63 of `data[0]` is set, it indicates there is TD specified
>> + additional information provided in a page, which is shared memory. The
>> + guest physical address of the information page is stored in `data[1]`.
>> + An optional error message is provided by `data[2]` ~ `data[9]`, which is
>> + byte sequence, LSB filled first. Typically, ASCII code(0x20-0x7e) is filled.
>> - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
>> KVM has recognized a wakeup event. Userspace may honor this event by
>> marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
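For reference, userspace could recover the optional message roughly like
this (untested sketch; the function name is illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the optional ASCII message the guest packs into
 * data[2]..data[9]: a byte sequence with the least-significant byte of
 * each u64 filled first, terminated early by a NUL byte.
 */
static size_t tdx_fatal_msg_decode(const uint64_t data[16],
				   char *buf, size_t buflen)
{
	size_t n = 0;

	for (int i = 2; i <= 9; i++) {
		for (int b = 0; b < 8; b++) {
			char c = (char)(data[i] >> (8 * b));

			if (!c || n + 1 >= buflen)
				goto out;
			buf[n++] = c;
		}
	}
out:
	buf[n] = '\0';
	return n;
}
```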
* Re: [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall
2024-12-10 9:42 ` Chao Gao
@ 2024-12-10 9:50 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-10 9:50 UTC (permalink / raw)
To: Chao Gao
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, michael.roth, linux-kernel
On 12/10/2024 5:42 PM, Chao Gao wrote:
>> +static int tdx_emulate_io(struct kvm_vcpu *vcpu)
>> +{
>> + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>> + unsigned long val = 0;
>> + unsigned int port;
>> + int size, ret;
>> + bool write;
> ..
>
>> +
>> + ++vcpu->stat.io_exits;
>> +
>> + size = tdvmcall_a0_read(vcpu);
>> + write = tdvmcall_a1_read(vcpu);
> a1 (i.e., R13) should be either 0 or 1. Other values are reserved according to
> the GHCI spec. It is not appropriate to cast it to a boolean. For example, if
> R13=2, KVM shouldn't treat it as a write request; instead, this request should
> be rejected.
Right, will fix it.
Thanks!
>> + port = tdvmcall_a2_read(vcpu);
>> +
>> + if (size != 1 && size != 2 && size != 4) {
>> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
>> + return 1;
>> + }
* Re: [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace
2024-12-01 3:53 [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Binbin Wu
` (6 preceding siblings ...)
2024-12-01 3:53 ` [PATCH 7/7] KVM: TDX: Handle TDX PV MMIO hypercall Binbin Wu
@ 2024-12-10 18:24 ` Paolo Bonzini
7 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2024-12-10 18:24 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, chao.gao, michael.roth, linux-kernel
Applied to kvm-coco-queue, thanks.
Paolo
* Re: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
2024-12-01 3:53 ` [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit Binbin Wu
2024-12-09 11:21 ` Chao Gao
@ 2024-12-13 8:57 ` Xiaoyao Li
2024-12-16 0:54 ` Binbin Wu
2025-01-22 12:50 ` Paolo Bonzini
2 siblings, 1 reply; 37+ messages in thread
From: Xiaoyao Li @ 2024-12-13 8:57 UTC (permalink / raw)
To: Binbin Wu, pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao, michael.roth,
linux-kernel
On 12/1/2024 11:53 AM, Binbin Wu wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Introduce the wiring for handling TDX VM exits by implementing the
> callbacks .get_exit_info(), and .handle_exit(). Additionally, add
> error handling during the TDX VM exit flow, and add a place holder
> to handle various exit reasons. Add helper functions to retrieve
> exit information, exit qualifications, and more.
>
> Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.*
> operations for secure EPT or TD EPOCH. If contention occurs, the return
> value will have TDX_OPERAND_BUSY set with operand type, prompting the vCPU
> to attempt re-entry into the guest via the fast path.
>
> Error Handling: The following scenarios will return to userspace with
> KVM_EXIT_INTERNAL_ERROR.
> - TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the
> CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX
> isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM
> firmware is not loaded or disabled.
> - TDX_ERROR: This indicates some check failed in the TDX module, preventing
> the vCPU from running.
> - TDX_NON_RECOVERABLE: Set by the TDX module when the error is
> non-recoverable, indicating that the TDX guest is dead or the vCPU is
> disabled. This also covers failed_vmentry case, which must have
> TDX_NON_RECOVERABLE set since off-TD debug feature has not been enabled.
> An exception is the triple fault, which also sets TDX_NON_RECOVERABLE
> but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX
> case.
> - Any unhandled VM exit reason will also return to userspace with
> KVM_EXIT_INTERNAL_ERROR.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> Hypercalls exit to userspace breakout:
> - Dropped Paolo's Reviewed-by since the change is not subtle.
> - Mention addition of .get_exit_info() handler in changelog. (Binbin)
> - tdh_sept_seamcall() -> tdx_seamcall_sept() in comments. (Binbin)
> - Do not open code TDX_ERROR_SEPT_BUSY. (Binbin)
> - "TDH.VP.ENTRY" -> "TDH.VP.ENTER". (Binbin)
> - Remove the use of union tdx_exit_reason. (Sean)
> https://lore.kernel.org/kvm/ZfSExlemFMKjBtZb@google.com/
> - Add tdx_check_exit_reason() to check a VMX exit reason against the
> status code of TDH.VP.ENTER.
> - Move the handling of TDX_ERROR_SEPT_BUSY and (TDX_OPERAND_BUSY |
> TDX_OPERAND_ID_TD_EPOCH) into fast path, and add a helper function
> tdx_exit_handlers_fastpath().
> - Remove the warning on TDX_SW_ERROR in fastpath, but return without
> further handling.
> - Call kvm_machine_check() for EXIT_REASON_MCE_DURING_VMENTRY, align
> with VMX case.
> - On failed_vmentry in fast path, return without further handling.
> - Exit to userspace for #UD and #GP.
> - Fix whitespace in tdx_get_exit_info()
> - Add a comment in tdx_handle_exit() to describe failed_vmentry case
> is handled by TDX_NON_RECOVERABLE handling.
> - Move the code of handling NMI, exception and external interrupts out
> of the patch, i.e., the NMI handling in tdx_vcpu_enter_exit() and the
> wiring of .handle_exit_irqoff() are removed.
> - Drop the check for VCPU_TD_STATE_INITIALIZED in tdx_handle_exit()
> because it has been checked in tdx_vcpu_pre_run().
> - Update changelog.
> ---
> arch/x86/include/asm/tdx.h | 1 +
> arch/x86/kvm/vmx/main.c | 25 +++++-
> arch/x86/kvm/vmx/tdx.c | 164 ++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/vmx/tdx_errno.h | 3 +
> arch/x86/kvm/vmx/x86_ops.h | 8 ++
> 5 files changed, 198 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 77477b905dca..01409a59224d 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -18,6 +18,7 @@
> * TDX module.
> */
> #define TDX_ERROR _BITUL(63)
> +#define TDX_NON_RECOVERABLE _BITUL(62)
> #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
> #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index f8acb1dc7c10..4f6faeb6e8e5 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -165,6 +165,15 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> return vmx_vcpu_run(vcpu, force_immediate_exit);
> }
>
> +static int vt_handle_exit(struct kvm_vcpu *vcpu,
> + enum exit_fastpath_completion fastpath)
> +{
> + if (is_td_vcpu(vcpu))
> + return tdx_handle_exit(vcpu, fastpath);
> +
> + return vmx_handle_exit(vcpu, fastpath);
> +}
> +
> static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
> {
> if (is_td_vcpu(vcpu)) {
> @@ -212,6 +221,18 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
> }
>
> +static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
> +{
> + if (is_td_vcpu(vcpu)) {
> + tdx_get_exit_info(vcpu, reason, info1, info2, intr_info,
> + error_code);
> + return;
> + }
> +
> + vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
> +}
> +
> static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
> {
> if (!is_td(kvm))
> @@ -305,7 +326,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>
> .vcpu_pre_run = vt_vcpu_pre_run,
> .vcpu_run = vt_vcpu_run,
> - .handle_exit = vmx_handle_exit,
> + .handle_exit = vt_handle_exit,
> .skip_emulated_instruction = vmx_skip_emulated_instruction,
> .update_emulated_instruction = vmx_update_emulated_instruction,
> .set_interrupt_shadow = vmx_set_interrupt_shadow,
> @@ -340,7 +361,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .set_identity_map_addr = vmx_set_identity_map_addr,
> .get_mt_mask = vmx_get_mt_mask,
>
> - .get_exit_info = vmx_get_exit_info,
> + .get_exit_info = vt_get_exit_info,
>
> .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index f975bb323f60..3dcbdb5a7bf8 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -186,6 +186,54 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
> return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
> }
>
> +static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu)
> +{
> + return (union vmx_exit_reason)(u32)(to_tdx(vcpu)->vp_enter_ret);
> +}
> +
> +/*
> + * There is no simple way to check some bit(s) to decide whether the return
> + * value of TDH.VP.ENTER has a VMX exit reason or not. E.g.,
> + * TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE has exit reason but with error bit
> + * (bit 63) set, TDX_NON_RECOVERABLE_TD_CORRUPTED_MD has no exit reason but with
> + * error bit cleared.
> + */
> +static __always_inline bool tdx_has_exit_reason(struct kvm_vcpu *vcpu)
> +{
> + u64 status = to_tdx(vcpu)->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK;
> +
> + return status == TDX_SUCCESS || status == TDX_NON_RECOVERABLE_VCPU ||
> + status == TDX_NON_RECOVERABLE_TD ||
> + status == TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE ||
> + status == TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE;
> +}
> +
> +static __always_inline bool tdx_check_exit_reason(struct kvm_vcpu *vcpu, u16 reason)
> +{
> + return tdx_has_exit_reason(vcpu) &&
> + (u16)tdexit_exit_reason(vcpu).basic == reason;
> +}
> +
> +static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcpu)
> +{
> + return kvm_rcx_read(vcpu);
> +}
> +
> +static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu *vcpu)
> +{
> + return kvm_rdx_read(vcpu);
> +}
> +
> +static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu)
> +{
> + return kvm_r8_read(vcpu);
> +}
> +
> +static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu)
> +{
> + return kvm_r9_read(vcpu);
> +}
> +
> static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
> {
> tdx_guest_keyid_free(kvm_tdx->hkid);
> @@ -824,6 +872,21 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu)
> guest_state_exit_irqoff();
> }
>
> +static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
> +{
> + u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret;
> +
> + /* See the comment of tdx_seamcall_sept(). */
> + if (unlikely(vp_enter_ret == TDX_ERROR_SEPT_BUSY))
> + return EXIT_FASTPATH_REENTER_GUEST;
> +
> + /* TDH.VP.ENTER checks TD EPOCH which can contend with TDH.MEM.TRACK. */
> + if (unlikely(vp_enter_ret == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_TD_EPOCH)))
> + return EXIT_FASTPATH_REENTER_GUEST;
> +
> + return EXIT_FASTPATH_NONE;
> +}
> +
> fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> {
> struct vcpu_tdx *tdx = to_tdx(vcpu);
> @@ -837,9 +900,26 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
> tdx->prep_switch_state = TDX_PREP_SW_STATE_UNRESTORED;
>
> vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
> +
> + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
> + return EXIT_FASTPATH_NONE;
> +
> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_MCE_DURING_VMENTRY)))
> + kvm_machine_check();
> +
> trace_kvm_exit(vcpu, KVM_ISA_VMX);
>
> - return EXIT_FASTPATH_NONE;
> + if (unlikely(tdx_has_exit_reason(vcpu) && tdexit_exit_reason(vcpu).failed_vmentry))
> + return EXIT_FASTPATH_NONE;
> +
> + return tdx_exit_handlers_fastpath(vcpu);
> +}
> +
> +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
> +{
> + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
> + vcpu->mmio_needed = 0;
> + return 0;
This function is just the same as handle_triple_fault() in vmx.c, so why not
use that instead?
> }
>
> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
> }
>
> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
> +{
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> + u64 vp_enter_ret = tdx->vp_enter_ret;
> + union vmx_exit_reason exit_reason;
> +
> + if (fastpath != EXIT_FASTPATH_NONE)
> + return 1;
> +
> + /*
> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
> + * TDX_SEAMCALL_VMFAILINVALID.
> + */
> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
> + goto unhandled_exit;
> + }
> +
> + /*
> + * Without off-TD debug enabled, failed_vmentry case must have
> + * TDX_NON_RECOVERABLE set.
> + */
This comment is confusing, and I'm not sure why it is put here. The code below
does nothing with exit_reason.failed_vmentry.
> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) {
> + /* Triple fault is non-recoverable. */
> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT)))
> + return tdx_handle_triple_fault(vcpu);
> +
> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n",
> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid,
> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
It indeed needs clarification why "hkid" and "hkid pa" are printed.
Especially the "hkid pa", which is the result of applying the HKID of the
current TD to physical address 0. I cannot think of any reason why we
would need such info.
> + goto unhandled_exit;
> + }
> +
> + /* From now, the seamcall status should be TDX_SUCCESS. */
> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
Is there any case where TDX_SUCCESS comes with additional non-zero information
in the lower 32 bits? I thought TDX_SUCCESS was a whole 64-bit status code.
> + exit_reason = tdexit_exit_reason(vcpu);
> +
> + switch (exit_reason.basic) {
> + default:
> + break;
> + }
> +
> +unhandled_exit:
> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
> + vcpu->run->internal.ndata = 2;
> + vcpu->run->internal.data[0] = vp_enter_ret;
> + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
> + return 0;
> +}
> +
> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
> +{
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> +
> + if (tdx_has_exit_reason(vcpu)) {
> + /*
> + * Encode some useful info from the the 64 bit return code
> + * into the 32 bit exit 'reason'. If the VMX exit reason is
> + * valid, just set it to those bits.
> + */
> + *reason = (u32)tdx->vp_enter_ret;
> + *info1 = tdexit_exit_qual(vcpu);
> + *info2 = tdexit_ext_exit_qual(vcpu);
> + } else {
> + /*
> + * When the VMX exit reason in vp_enter_ret is not valid,
> + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to
> + * mean the vmexit code is not valid. Set the other bits to
> + * try to avoid picking a value that may someday be a valid
> + * VMX exit code.
> + */
> + *reason = 0xFFFFFFFF;
> + *info1 = 0;
> + *info2 = 0;
> + }
> +
> + *intr_info = tdexit_intr_info(vcpu);
> + *error_code = 0;
> +}
> +
> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
> {
> const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;
> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
> index f9dbb3a065cc..6ff4672c4181 100644
> --- a/arch/x86/kvm/vmx/tdx_errno.h
> +++ b/arch/x86/kvm/vmx/tdx_errno.h
> @@ -10,6 +10,9 @@
> * TDX SEAMCALL Status Codes (returned in RAX)
> */
> #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL
> +#define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL
> +#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL
> +#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE 0x6000000700000000ULL
Not the fault of this patch.
There are other status codes defined in arch/x86/include/asm/tdx.h:
/*
* TDX module SEAMCALL leaf function error codes
*/
#define TDX_SUCCESS 0ULL
#define TDX_RND_NO_ENTROPY 0x8000020300000000ULL
It would be better to put them all in a single place.
> #define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL
> #define TDX_OPERAND_INVALID 0xC000010000000000ULL
> #define TDX_OPERAND_BUSY 0x8000020000000000ULL
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index 02b33390e1bf..1c18943e0e1d 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -133,6 +133,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
> fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
> void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
> void tdx_vcpu_put(struct kvm_vcpu *vcpu);
> +int tdx_handle_exit(struct kvm_vcpu *vcpu,
> + enum exit_fastpath_completion fastpath);
> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
>
> int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
>
> @@ -167,6 +171,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat
> }
> static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
> static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
> +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
> + enum exit_fastpath_completion fastpath) { return 0; }
> +static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
> + u64 *info2, u32 *intr_info, u32 *error_code) {}
>
> static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
>
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-01 3:53 ` [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA> Binbin Wu
2024-12-09 12:45 ` Chao Gao
@ 2024-12-13 9:32 ` Xiaoyao Li
2024-12-16 1:08 ` Binbin Wu
1 sibling, 1 reply; 37+ messages in thread
From: Xiaoyao Li @ 2024-12-13 9:32 UTC (permalink / raw)
To: Binbin Wu, pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao, michael.roth,
linux-kernel
On 12/1/2024 11:53 AM, Binbin Wu wrote:
> Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with
> KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling.
>
> MapGPA is used by TDX guest to request to map a GPA range as private
> or shared memory. It needs to exit to userspace for handling. KVM has
> already implemented a similar hypercall KVM_HC_MAP_GPA_RANGE, which will
> exit to userspace with exit reason KVM_EXIT_HYPERCALL. Do sanity checks,
> convert TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE and forward the request
> to userspace.
>
> To prevent a TDG.VP.VMCALL<MapGPA> call from taking too long, the MapGPA
> range is split into 2MB chunks and check interrupt pending between chunks.
> This allows for timely injection of interrupts and prevents issues with
> guest lockup detection. TDX guest should retry the operation for the
> GPA starting at the address specified in R11 when the TDVMCALL return
> TDVMCALL_RETRY as status code.
>
> Note userspace needs to enable KVM_CAP_EXIT_HYPERCALL with
> KVM_HC_MAP_GPA_RANGE bit set for TD VM.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> Hypercalls exit to userspace breakout:
> - New added.
> Implement one of the hypercalls need to exit to userspace for handling after
> dropping "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to resolve
> Sean's comment.
> https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
> - Check interrupt pending between chunks suggested by Sean.
> https://lore.kernel.org/kvm/ZleJvmCawKqmpFIa@google.com/
> - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
> - Use vt_is_tdx_private_gpa()
> ---
> arch/x86/include/asm/shared/tdx.h | 1 +
> arch/x86/kvm/vmx/tdx.c | 110 ++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/tdx.h | 3 +
> 3 files changed, 114 insertions(+)
>
> diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> index 620327f0161f..a602d081cf1c 100644
> --- a/arch/x86/include/asm/shared/tdx.h
> +++ b/arch/x86/include/asm/shared/tdx.h
> @@ -32,6 +32,7 @@
> #define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
> #define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
> #define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
> +#define TDVMCALL_STATUS_ALIGN_ERROR 0x8000000000000002ULL
>
> /*
> * Bitmasks of exposed registers (with VMM).
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 4cc55b120ab0..553f4cbe0693 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -985,12 +985,122 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
> return r > 0;
> }
>
> +/*
> + * Split into chunks and check interrupt pending between chunks. This allows
> + * for timely injection of interrupts to prevent issues with guest lockup
> + * detection.
> + */
> +#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024)
> +static void __tdx_map_gpa(struct vcpu_tdx * tdx);
> +
> +static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_tdx * tdx = to_tdx(vcpu);
> +
> + if(vcpu->run->hypercall.ret) {
> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
> + kvm_r11_write(vcpu, tdx->map_gpa_next);
> + return 1;
> + }
> +
> + tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
> + if (tdx->map_gpa_next >= tdx->map_gpa_end) {
> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
> + return 1;
> + }
> +
> + /*
> + * Stop processing the remaining part if there is pending interrupt.
> + * Skip checking pending virtual interrupt (reflected by
> + * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a seamcall because
> + * if guest disabled interrupt, it's OK not returning back to guest
> + * due to non-NMI interrupt. Also it's rare to TDVMCALL_MAP_GPA
> + * immediately after STI or MOV/POP SS.
> + */
> + if (pi_has_pending_interrupt(vcpu) ||
> + kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
> + kvm_r11_write(vcpu, tdx->map_gpa_next);
> + return 1;
> + }
> +
> + __tdx_map_gpa(tdx);
> + /* Forward request to userspace. */
> + return 0;
> +}
> +
> +static void __tdx_map_gpa(struct vcpu_tdx * tdx)
> +{
> + u64 gpa = tdx->map_gpa_next;
> + u64 size = tdx->map_gpa_end - tdx->map_gpa_next;
> +
> + if(size > TDX_MAP_GPA_MAX_LEN)
> + size = TDX_MAP_GPA_MAX_LEN;
> +
> + tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
> + tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
> + tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
> + tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
> + tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ?
> + KVM_MAP_GPA_RANGE_ENCRYPTED :
> + KVM_MAP_GPA_RANGE_DECRYPTED;
> + tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
> +
> + tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa;
> +}
> +
> +static int tdx_map_gpa(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_tdx * tdx = to_tdx(vcpu);
> + u64 gpa = tdvmcall_a0_read(vcpu);
We can use kvm_r12_read() directly, which is more intuitive. And we can
drop the MACRO for a0/a1/a2/a3 accessors in patch 022.
> + u64 size = tdvmcall_a1_read(vcpu);
> + u64 ret;
> +
> + /*
> + * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
> + * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
> + * bit set. If not, the error code is not defined in GHCI for TDX, use
> + * TDVMCALL_STATUS_INVALID_OPERAND for this case.
> + */
> + if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
> + goto error;
> + }
> +
> + if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
> + !kvm_vcpu_is_legal_gpa(vcpu, gpa + size -1) ||
> + (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
> + vt_is_tdx_private_gpa(vcpu->kvm, gpa + size -1))) {
> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
> + goto error;
> + }
> +
> + if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
> + ret = TDVMCALL_STATUS_ALIGN_ERROR;
> + goto error;
> + }
> +
> + tdx->map_gpa_end = gpa + size;
> + tdx->map_gpa_next = gpa;
> +
> + __tdx_map_gpa(tdx);
> + /* Forward request to userspace. */
> + return 0;
Maybe let __tdx_map_gpa() return an int to decide whether it needs to
return to userspace and
return __tdx_map_gpa(tdx);
?
> +
> +error:
> + tdvmcall_set_return_code(vcpu, ret);
> + kvm_r11_write(vcpu, gpa);
> + return 1;
> +}
> +
> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
> {
> if (tdvmcall_exit_type(vcpu))
> return tdx_emulate_vmcall(vcpu);
>
> switch (tdvmcall_leaf(vcpu)) {
> + case TDVMCALL_MAP_GPA:
> + return tdx_map_gpa(vcpu);
> default:
> break;
> }
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 1abc94b046a0..bfae70887695 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -71,6 +71,9 @@ struct vcpu_tdx {
>
> enum tdx_prepare_switch_state prep_switch_state;
> u64 msr_host_kernel_gs_base;
> +
> + u64 map_gpa_next;
> + u64 map_gpa_end;
> };
>
> void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
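The 2 MiB chunking that tdx_map_gpa()/tdx_complete_vmcall_map_gpa() implement
above can be illustrated standalone. The sketch below only mirrors the
chunk-advance arithmetic for the no-pending-interrupt case (the
pending-interrupt path would instead return TDVMCALL_STATUS_RETRY to the
guest part way through); the KVM specifics are stubbed out and the helper
name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors TDX_MAP_GPA_MAX_LEN from the patch. */
#define TDX_MAP_GPA_MAX_LEN (2ULL * 1024 * 1024)

/*
 * Count the KVM_EXIT_HYPERCALL round trips a [gpa, gpa + size) MapGPA
 * request costs when it is processed in TDX_MAP_GPA_MAX_LEN chunks and
 * no interrupt is pending between chunks.
 */
static unsigned int map_gpa_chunk_count(uint64_t gpa, uint64_t size)
{
	uint64_t next = gpa, end = gpa + size;
	unsigned int exits = 0;

	while (next < end) {
		uint64_t chunk = end - next;

		if (chunk > TDX_MAP_GPA_MAX_LEN)
			chunk = TDX_MAP_GPA_MAX_LEN;
		exits++;		/* one exit to userspace per chunk */
		next += chunk;
	}
	return exits;
}
```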
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
2024-12-01 3:53 ` [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError> Binbin Wu
2024-12-06 9:31 ` Xu Yilun
2024-12-10 9:05 ` Chao Gao
@ 2024-12-13 9:40 ` Xiaoyao Li
2024-12-16 1:14 ` Binbin Wu
2 siblings, 1 reply; 37+ messages in thread
From: Xiaoyao Li @ 2024-12-13 9:40 UTC (permalink / raw)
To: Binbin Wu, pbonzini, seanjc, kvm
Cc: rick.p.edgecombe, kai.huang, adrian.hunter, reinette.chatre,
tony.lindgren, isaku.yamahata, yan.y.zhao, chao.gao, michael.roth,
linux-kernel
On 12/1/2024 11:53 AM, Binbin Wu wrote:
> Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with
> a new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for
> handling.
>
> TD guest can use TDG.VP.VMCALL<ReportFatalError> to report the fatal
> error it has experienced. This hypercall is special because TD guest
> is requesting termination with the error information; KVM needs to
> forward the hypercall to userspace anyway, so KVM doesn't do sanity checks
> and lets userspace decide what to do.
>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> Hypercalls exit to userspace breakout:
> - New added.
> Implement one of the hypercalls need to exit to userspace for handling after
> reverting "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to resolve
> Sean's comment.
> https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
> - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
> ---
> Documentation/virt/kvm/api.rst | 8 ++++++
> arch/x86/kvm/vmx/tdx.c | 50 ++++++++++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 59 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index edc070c6e19b..bb39da72c647 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6815,6 +6815,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
> #define KVM_SYSTEM_EVENT_WAKEUP 4
> #define KVM_SYSTEM_EVENT_SUSPEND 5
> #define KVM_SYSTEM_EVENT_SEV_TERM 6
> + #define KVM_SYSTEM_EVENT_TDX_FATAL 7
> __u32 type;
> __u32 ndata;
> __u64 data[16];
> @@ -6841,6 +6842,13 @@ Valid values for 'type' are:
> reset/shutdown of the VM.
> - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
> The guest physical address of the guest's GHCB is stored in `data[0]`.
> + - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest requested termination.
> + The error code of the guest's GHCI is stored in `data[0]`.
> + If bit 63 of `data[0]` is set, it indicates there is TD-specified
> + additional information provided in a page of shared memory. The
> + guest physical address of the information page is stored in `data[1]`.
> + An optional error message is provided in `data[2]` ~ `data[9]` as a
> + byte sequence, LSB filled first. Typically, ASCII codes (0x20-0x7e) are used.
> - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
> KVM has recognized a wakeup event. Userspace may honor this event by
> marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 553f4cbe0693..a79f9ca962d1 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1093,6 +1093,54 @@ static int tdx_map_gpa(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
> +{
> + u64 reg_mask = kvm_rcx_read(vcpu);
> + u64* opt_regs;
> +
> + /*
> + * Skip sanity checks and let userspace decide what to do if sanity
> + * checks fail.
> + */
> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
> + vcpu->run->system_event.ndata = 10;
> + /* Error codes. */
> + vcpu->run->system_event.data[0] = tdvmcall_a0_read(vcpu);
> + /* GPA of additional information page. */
> + vcpu->run->system_event.data[1] = tdvmcall_a1_read(vcpu);
> + /* Information passed via registers (up to 64 bytes). */
> + opt_regs = &vcpu->run->system_event.data[2];
> +
> +#define COPY_REG(REG, MASK) \
> + do { \
> + if (reg_mask & MASK) \
> + *opt_regs = kvm_ ## REG ## _read(vcpu); \
> + else \
> + *opt_regs = 0; \
> + opt_regs++; \
I'm not sure if we should skip the "opt_regs++" when the corresponding
register is not set valid in reg_mask, and maintain ndata as the actual
valid count instead of hardcoding it to 10.
> + } while (0)
> +
> + /* The order is defined in GHCI. */
> + COPY_REG(r14, BIT_ULL(14));
> + COPY_REG(r15, BIT_ULL(15));
> + COPY_REG(rbx, BIT_ULL(3));
> + COPY_REG(rdi, BIT_ULL(7));
> + COPY_REG(rsi, BIT_ULL(6));
> + COPY_REG(r8, BIT_ULL(8));
> + COPY_REG(r9, BIT_ULL(9));
> + COPY_REG(rdx, BIT_ULL(2));
> +
> + /*
> + * Set the status code according to GHCI spec, although the vCPU may
> + * not return back to guest.
> + */
> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
> +
> + /* Forward request to userspace. */
> + return 0;
> +}
> +
> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
> {
> if (tdvmcall_exit_type(vcpu))
> @@ -1101,6 +1149,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
> switch (tdvmcall_leaf(vcpu)) {
> case TDVMCALL_MAP_GPA:
> return tdx_map_gpa(vcpu);
> + case TDVMCALL_REPORT_FATAL_ERROR:
> + return tdx_report_fatal_error(vcpu);
> default:
> break;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 637efc055145..c173c8dfcf83 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -375,6 +375,7 @@ struct kvm_run {
> #define KVM_SYSTEM_EVENT_WAKEUP 4
> #define KVM_SYSTEM_EVENT_SUSPEND 5
> #define KVM_SYSTEM_EVENT_SEV_TERM 6
> +#define KVM_SYSTEM_EVENT_TDX_FATAL 7
> __u32 type;
> __u32 ndata;
> union {
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
2024-12-13 8:57 ` Xiaoyao Li
@ 2024-12-16 0:54 ` Binbin Wu
2024-12-16 4:37 ` Xiaoyao Li
0 siblings, 1 reply; 37+ messages in thread
From: Binbin Wu @ 2024-12-16 0:54 UTC (permalink / raw)
To: Xiaoyao Li
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/13/2024 4:57 PM, Xiaoyao Li wrote:
> On 12/1/2024 11:53 AM, Binbin Wu wrote:
>>
[...]
>> +
>> +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
>> +{
>> + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
>> + vcpu->mmio_needed = 0;
>> + return 0;
>
> This function is just same as handle_triple_fault() in vmx.c, why not use it instead?
Yes, handle_triple_fault() could be moved to vmx.h, and then it can be used
by TDX code.
Will do so.
>
>> }
>> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
>> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
>> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
>> }
>> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
>> +{
>> + struct vcpu_tdx *tdx = to_tdx(vcpu);
>> + u64 vp_enter_ret = tdx->vp_enter_ret;
>> + union vmx_exit_reason exit_reason;
>> +
>> + if (fastpath != EXIT_FASTPATH_NONE)
>> + return 1;
>> +
>> + /*
>> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
>> + * TDX_SEAMCALL_VMFAILINVALID.
>> + */
>> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
>> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
>> + goto unhandled_exit;
>> + }
>> +
>> + /*
>> + * Without off-TD debug enabled, failed_vmentry case must have
>> + * TDX_NON_RECOVERABLE set.
>> + */
>
> This comment is confusing. I'm not sure why it is put here. Below code does nothing with exit_reason.failed_vmentry.
Because when failed_vmentry occurs, vp_enter_ret will have
TDX_NON_RECOVERABLE set, so it will be handled below.
>
>> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) {
>> + /* Triple fault is non-recoverable. */
>> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT)))
>> + return tdx_handle_triple_fault(vcpu);
>> +
>> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n",
>> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid,
>> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
>
> It indeed needs clarification for the need of "hkid" and "hkid pa". Especially the "hkdi pa", which is the result of applying HKID of the current TD to a physical address 0. I cannot think of any reason why we need such info.
Yes, set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid) should be removed.
I didn't notice it.
Thanks!
>
>> + goto unhandled_exit;
>> + }
>> +
>> + /* From now, the seamcall status should be TDX_SUCCESS. */
>> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
>
> Is there any case that TDX_SUCCESS with additional non-zero information in the lower 32-bits? I thought TDX_SUCCESS is a whole 64-bit status code.
The TDX status code uses the upper 32 bits.
When the status code is TDX_SUCCESS and there is a valid VMX exit reason, the
lower 32 bits are the VMX exit reason.
You can refer to the TDX module ABI spec or interface_function_completion_status.json
from the intel-tdx-module-1.5-abi-table for details.
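The split described here can be sketched as plain bit manipulation. The mask
and value below follow the convention just described (status in the upper
32 bits), but treat the exact constants as assumptions; the authoritative
values live in the TDX module ABI spec:

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits carry the TDX module completion status (assumed mask). */
#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL
#define TDX_SUCCESS              0x0000000000000000ULL

static inline int tdx_status_is_success(uint64_t vp_enter_ret)
{
	return (vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) == TDX_SUCCESS;
}

/* Only meaningful when the status is TDX_SUCCESS: the lower 32 bits
 * then hold the VMX exit reason. */
static inline uint32_t tdx_basic_exit_reason(uint64_t vp_enter_ret)
{
	return (uint32_t)vp_enter_ret;
}
```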
>
>> + exit_reason = tdexit_exit_reason(vcpu);
>> +
>> + switch (exit_reason.basic) {
>> + default:
>> + break;
>> + }
>> +
>> +unhandled_exit:
>> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
>> + vcpu->run->internal.ndata = 2;
>> + vcpu->run->internal.data[0] = vp_enter_ret;
>> + vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
>> + return 0;
>> +}
>> +
>> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
>> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
>> +{
>> + struct vcpu_tdx *tdx = to_tdx(vcpu);
>> +
>> + if (tdx_has_exit_reason(vcpu)) {
>> + /*
>> + * Encode some useful info from the the 64 bit return code
>> + * into the 32 bit exit 'reason'. If the VMX exit reason is
>> + * valid, just set it to those bits.
>> + */
>> + *reason = (u32)tdx->vp_enter_ret;
>> + *info1 = tdexit_exit_qual(vcpu);
>> + *info2 = tdexit_ext_exit_qual(vcpu);
>> + } else {
>> + /*
>> + * When the VMX exit reason in vp_enter_ret is not valid,
>> + * overload the VMX_EXIT_REASONS_FAILED_VMENTRY bit (31) to
>> + * mean the vmexit code is not valid. Set the other bits to
>> + * try to avoid picking a value that may someday be a valid
>> + * VMX exit code.
>> + */
>> + *reason = 0xFFFFFFFF;
>> + *info1 = 0;
>> + *info2 = 0;
>> + }
>> +
>> + *intr_info = tdexit_intr_info(vcpu);
>> + *error_code = 0;
>> +}
>> +
>> static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
>> {
>> const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;
>> diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
>> index f9dbb3a065cc..6ff4672c4181 100644
>> --- a/arch/x86/kvm/vmx/tdx_errno.h
>> +++ b/arch/x86/kvm/vmx/tdx_errno.h
>> @@ -10,6 +10,9 @@
>> * TDX SEAMCALL Status Codes (returned in RAX)
>> */
>> #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL
>> +#define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL
>> +#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL
>> +#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE 0x6000000700000000ULL
>
> Not the fault of this patch.
>
> There are other Status code defined in arch/x86/include/asm/tdx.h
>
> /*
> * TDX module SEAMCALL leaf function error codes
> */
> #define TDX_SUCCESS 0ULL
> #define TDX_RND_NO_ENTROPY 0x8000020300000000ULL
>
> It's better to put them in one single place.
Agree.
Thanks!
>
>> #define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL
>> #define TDX_OPERAND_INVALID 0xC000010000000000ULL
>> #define TDX_OPERAND_BUSY 0x8000020000000000ULL
>> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
>> index 02b33390e1bf..1c18943e0e1d 100644
>> --- a/arch/x86/kvm/vmx/x86_ops.h
>> +++ b/arch/x86/kvm/vmx/x86_ops.h
>> @@ -133,6 +133,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
>> fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
>> void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
>> void tdx_vcpu_put(struct kvm_vcpu *vcpu);
>> +int tdx_handle_exit(struct kvm_vcpu *vcpu,
>> + enum exit_fastpath_completion fastpath);
>> +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
>> + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
>> int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
>> @@ -167,6 +171,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat
>> }
>> static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
>> static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
>> +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
>> + enum exit_fastpath_completion fastpath) { return 0; }
>> +static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
>> + u64 *info2, u32 *intr_info, u32 *error_code) {}
>> static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-13 9:32 ` Xiaoyao Li
@ 2024-12-16 1:08 ` Binbin Wu
2024-12-16 6:03 ` Xiaoyao Li
0 siblings, 1 reply; 37+ messages in thread
From: Binbin Wu @ 2024-12-16 1:08 UTC (permalink / raw)
To: Xiaoyao Li
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/13/2024 5:32 PM, Xiaoyao Li wrote:
> On 12/1/2024 11:53 AM, Binbin Wu wrote:
>
[...]
>> +
>> +static int tdx_map_gpa(struct kvm_vcpu *vcpu)
>> +{
>> + struct vcpu_tdx * tdx = to_tdx(vcpu);
>> + u64 gpa = tdvmcall_a0_read(vcpu);
>
> We can use kvm_r12_read() directly, which is more intuitive. And we can drop the MACRO for a0/a1/a2/a3 accessors in patch 022.
I am neutral about it.
>
>> + u64 size = tdvmcall_a1_read(vcpu);
>> + u64 ret;
>> +
>> + /*
>> + * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
>> + * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
>> + * bit set. If not, the error code is not defined in GHCI for TDX, use
>> + * TDVMCALL_STATUS_INVALID_OPERAND for this case.
>> + */
>> + if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
>> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
>> + goto error;
>> + }
>> +
>> + if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
>> + !kvm_vcpu_is_legal_gpa(vcpu, gpa + size -1) ||
>> + (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
>> + vt_is_tdx_private_gpa(vcpu->kvm, gpa + size -1))) {
>> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
>> + goto error;
>> + }
>> +
>> + if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
>> + ret = TDVMCALL_STATUS_ALIGN_ERROR;
>> + goto error;
>> + }
>> +
>> + tdx->map_gpa_end = gpa + size;
>> + tdx->map_gpa_next = gpa;
>> +
>> + __tdx_map_gpa(tdx);
>> + /* Forward request to userspace. */
>> + return 0;
>
> Maybe let __tdx_map_gpa() return an int to decide whether it needs to return to userspace and
>
> return __tdx_map_gpa(tdx);
>
> ?
To save one line of code and the comment?
Because MapGPA always goes to userspace, I don't want to make a function
return an int that is a fixed value.
And if the multiple comments bother you, I think the comments can be removed.
>
>
>> +
>> +error:
>> + tdvmcall_set_return_code(vcpu, ret);
>> + kvm_r11_write(vcpu, gpa);
>> + return 1;
>> +}
>> +
>> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>> {
>> if (tdvmcall_exit_type(vcpu))
>> return tdx_emulate_vmcall(vcpu);
>> switch (tdvmcall_leaf(vcpu)) {
>> + case TDVMCALL_MAP_GPA:
>> + return tdx_map_gpa(vcpu);
>> default:
>> break;
>> }
>> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
>> index 1abc94b046a0..bfae70887695 100644
>> --- a/arch/x86/kvm/vmx/tdx.h
>> +++ b/arch/x86/kvm/vmx/tdx.h
>> @@ -71,6 +71,9 @@ struct vcpu_tdx {
>> enum tdx_prepare_switch_state prep_switch_state;
>> u64 msr_host_kernel_gs_base;
>> +
>> + u64 map_gpa_next;
>> + u64 map_gpa_end;
>> };
>> void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
2024-12-13 9:40 ` Xiaoyao Li
@ 2024-12-16 1:14 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-16 1:14 UTC (permalink / raw)
To: Xiaoyao Li
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/13/2024 5:40 PM, Xiaoyao Li wrote:
> On 12/1/2024 11:53 AM, Binbin Wu wrote:
>> Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with
>> a new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for
>> handling.
>>
>> TD guest can use TDG.VP.VMCALL<ReportFatalError> to report the fatal
>> error it has experienced. This hypercall is special because TD guest
>> is requesting termination with the error information; KVM needs to
>> forward the hypercall to userspace anyway, so KVM doesn't do sanity checks
>> and lets userspace decide what to do.
>>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>> Hypercalls exit to userspace breakout:
>> - New added.
>> Implement one of the hypercalls need to exit to userspace for handling after
>> reverting "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to resolve
>> Sean's comment.
>> https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
>> - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
>> ---
>> Documentation/virt/kvm/api.rst | 8 ++++++
>> arch/x86/kvm/vmx/tdx.c | 50 ++++++++++++++++++++++++++++++++++
>> include/uapi/linux/kvm.h | 1 +
>> 3 files changed, 59 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index edc070c6e19b..bb39da72c647 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6815,6 +6815,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
>> #define KVM_SYSTEM_EVENT_WAKEUP 4
>> #define KVM_SYSTEM_EVENT_SUSPEND 5
>> #define KVM_SYSTEM_EVENT_SEV_TERM 6
>> + #define KVM_SYSTEM_EVENT_TDX_FATAL 7
>> __u32 type;
>> __u32 ndata;
>> __u64 data[16];
>> @@ -6841,6 +6842,13 @@ Valid values for 'type' are:
>> reset/shutdown of the VM.
>> - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
>> The guest physical address of the guest's GHCB is stored in `data[0]`.
>> + - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest requested termination.
>> + The error code of the guest's GHCI is stored in `data[0]`.
>> + If bit 63 of `data[0]` is set, it indicates there is TD-specified
>> + additional information provided in a page of shared memory. The
>> + guest physical address of the information page is stored in `data[1]`.
>> + An optional error message is provided in `data[2]` ~ `data[9]` as a
>> + byte sequence, LSB filled first. Typically, ASCII codes (0x20-0x7e) are used.
>> - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
>> KVM has recognized a wakeup event. Userspace may honor this event by
>> marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
>> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>> index 553f4cbe0693..a79f9ca962d1 100644
>> --- a/arch/x86/kvm/vmx/tdx.c
>> +++ b/arch/x86/kvm/vmx/tdx.c
>> @@ -1093,6 +1093,54 @@ static int tdx_map_gpa(struct kvm_vcpu *vcpu)
>> return 1;
>> }
>> +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
>> +{
>> + u64 reg_mask = kvm_rcx_read(vcpu);
>> + u64* opt_regs;
>> +
>> + /*
>> + * Skip sanity checks and let userspace decide what to do if sanity
>> + * checks fail.
>> + */
>> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
>> + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
>> + vcpu->run->system_event.ndata = 10;
>> + /* Error codes. */
>> + vcpu->run->system_event.data[0] = tdvmcall_a0_read(vcpu);
>> + /* GPA of additional information page. */
>> + vcpu->run->system_event.data[1] = tdvmcall_a1_read(vcpu);
>> + /* Information passed via registers (up to 64 bytes). */
>> + opt_regs = &vcpu->run->system_event.data[2];
>> +
>> +#define COPY_REG(REG, MASK) \
>> + do { \
>> + if (reg_mask & MASK) \
>> + *opt_regs = kvm_ ## REG ## _read(vcpu); \
>> + else \
>> + *opt_regs = 0; \
>> + opt_regs++; \
>
> I'm not sure if we should skip the "opt_regs++" when the corresponding register is not set valid in reg_mask, and maintain ndata as the actual valid count instead of hardcoding it to 10.
Yes, it's better.
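The change being agreed on here can be sketched outside KVM as follows:
only copy registers whose bit is set in reg_mask, advancing the output
slot as we go, so ndata can be the actual count instead of a hardcoded 10.
Register reads are stubbed (the stub just returns the register number);
in KVM they would be the kvm_<reg>_read(vcpu) accessors:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Stub: stands in for kvm_<reg>_read(vcpu). */
static uint64_t stub_reg_read(int nr)
{
	return (uint64_t)nr;
}

/* Pack only the registers marked valid in reg_mask, in the GHCI-defined
 * order used by the patch, and return how many slots were filled. */
static uint32_t pack_opt_regs(uint64_t reg_mask, uint64_t *out)
{
	static const int order[] = { 14, 15, 3, 7, 6, 8, 9, 2 };
	uint32_t n = 0;
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (reg_mask & (1ULL << order[i]))
			out[n++] = stub_reg_read(order[i]);
	}
	return n;
}
```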
>
>> + } while (0)
>> +
>> + /* The order is defined in GHCI. */
>> + COPY_REG(r14, BIT_ULL(14));
>> + COPY_REG(r15, BIT_ULL(15));
>> + COPY_REG(rbx, BIT_ULL(3));
>> + COPY_REG(rdi, BIT_ULL(7));
>> + COPY_REG(rsi, BIT_ULL(6));
>> + COPY_REG(r8, BIT_ULL(8));
>> + COPY_REG(r9, BIT_ULL(9));
>> + COPY_REG(rdx, BIT_ULL(2));
>> +
>> + /*
>> + * Set the status code according to GHCI spec, although the vCPU may
>> + * not return back to guest.
>> + */
>> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS);
>> +
>> + /* Forward request to userspace. */
>> + return 0;
>> +}
>> +
>> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>> {
>> if (tdvmcall_exit_type(vcpu))
>> @@ -1101,6 +1149,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>> switch (tdvmcall_leaf(vcpu)) {
>> case TDVMCALL_MAP_GPA:
>> return tdx_map_gpa(vcpu);
>> + case TDVMCALL_REPORT_FATAL_ERROR:
>> + return tdx_report_fatal_error(vcpu);
>> default:
>> break;
>> }
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 637efc055145..c173c8dfcf83 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -375,6 +375,7 @@ struct kvm_run {
>> #define KVM_SYSTEM_EVENT_WAKEUP 4
>> #define KVM_SYSTEM_EVENT_SUSPEND 5
>> #define KVM_SYSTEM_EVENT_SEV_TERM 6
>> +#define KVM_SYSTEM_EVENT_TDX_FATAL 7
>> __u32 type;
>> __u32 ndata;
>> union {
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
2024-12-16 0:54 ` Binbin Wu
@ 2024-12-16 4:37 ` Xiaoyao Li
2024-12-18 1:33 ` Binbin Wu
0 siblings, 1 reply; 37+ messages in thread
From: Xiaoyao Li @ 2024-12-16 4:37 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/16/2024 8:54 AM, Binbin Wu wrote:
>
>
>
> On 12/13/2024 4:57 PM, Xiaoyao Li wrote:
>> On 12/1/2024 11:53 AM, Binbin Wu wrote:
...
>>
>>> }
>>> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int
>>> pgd_level)
>>> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm
>>> *kvm, gfn_t gfn,
>>> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
>>> }
>>> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
>>> +{
>>> + struct vcpu_tdx *tdx = to_tdx(vcpu);
>>> + u64 vp_enter_ret = tdx->vp_enter_ret;
>>> + union vmx_exit_reason exit_reason;
>>> +
>>> + if (fastpath != EXIT_FASTPATH_NONE)
>>> + return 1;
>>> +
>>> + /*
>>> + * Handle TDX SW errors, including TDX_SEAMCALL_UD,
>>> TDX_SEAMCALL_GP and
>>> + * TDX_SEAMCALL_VMFAILINVALID.
>>> + */
>>> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
>>> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
>>> + goto unhandled_exit;
>>> + }
>>> +
>>> + /*
>>> + * Without off-TD debug enabled, failed_vmentry case must have
>>> + * TDX_NON_RECOVERABLE set.
>>> + */
>>
>> This comment is confusing. I'm not sure why it is put here. Below code
>> does nothing with exit_reason.failed_vmentry.
>
> Because when failed_vmentry occurs, vp_enter_ret will have
> TDX_NON_RECOVERABLE set, so it will be handled below.
The wording is somewhat confusing; to me it implies something like:
WARN_ON(!debug_td() && exit_reason.failed_vmentry &&
!(vp_enter_ret & TDX_NON_RECOVERABLE))
Besides, VMX returns KVM_EXIT_FAIL_ENTRY for VM-entry failure. So the
question is why TDX cannot do it the same way?
>>
>>> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) {
>>> + /* Triple fault is non-recoverable. */
>>> + if (unlikely(tdx_check_exit_reason(vcpu,
>>> EXIT_REASON_TRIPLE_FAULT)))
>>> + return tdx_handle_triple_fault(vcpu);
>>> +
>>> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa
>>> 0x%llx\n",
>>> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid,
>>> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
>>
>> It indeed needs clarification for the need of "hkid" and "hkid pa".
>> Especially the "hkdi pa", which is the result of applying HKID of the
>> current TD to a physical address 0. I cannot think of any reason why
>> we need such info.
> Yes, set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid) should be removed.
> I didn't notice it.
don't forget to justify why HKID is useful here. To me, HKID can be
dropped as well.
> Thanks!
>
>
>>
>>> + goto unhandled_exit;
>>> + }
>>> +
>>> + /* From now, the seamcall status should be TDX_SUCCESS. */
>>> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) !=
>>> TDX_SUCCESS);
>>
>> Is there any case that TDX_SUCCESS with additional non-zero
>> information in the lower 32-bits? I thought TDX_SUCCESS is a whole 64-
>> bit status code.
> TDX status code uses the upper 32-bits.
>
> When the status code is TDX_SUCCESS and has a valid VMX exit reason, the
> lower
> 32-bit is the VMX exit reason.
>
> You can refer to the TDX module ABI spec or
> interface_function_completion_status.json
> from the intel-tdx-module-1.5-abi-table for details.
>
I see. (I asked a silly question; I even missed the normal exit case.)
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-16 1:08 ` Binbin Wu
@ 2024-12-16 6:03 ` Xiaoyao Li
2024-12-18 1:38 ` Binbin Wu
0 siblings, 1 reply; 37+ messages in thread
From: Xiaoyao Li @ 2024-12-16 6:03 UTC (permalink / raw)
To: Binbin Wu
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/16/2024 9:08 AM, Binbin Wu wrote:
>
>
>
> On 12/13/2024 5:32 PM, Xiaoyao Li wrote:
>> On 12/1/2024 11:53 AM, Binbin Wu wrote:
>>
> [...]
>>> +
>>> +static int tdx_map_gpa(struct kvm_vcpu *vcpu)
>>> +{
>>> + struct vcpu_tdx * tdx = to_tdx(vcpu);
>>> + u64 gpa = tdvmcall_a0_read(vcpu);
>>
>> We can use kvm_r12_read() directly, which is more intuitive. And we
>> can drop the MACRO for a0/a1/a2/a3 accessors in patch 022.
> I am neutral about it.
>
a0, a1, a2, a3 are the naming convention for KVM's hypercalls. They make
sense when serving as the parameters to __kvm_emulate_hypercall().
However, __kvm_emulate_hypercall() has now been made into a macro, and we
don't need temp variables like a0 = kvm_xx_read().
TDVMCALL leafs other than the normal KVM hypercalls are all TDX specific and
defined in the TDX GHCI spec, so a0/a1/a2/a3 make no sense for them.
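For context, the aliasing under discussion can be sketched as below: in the
TDX patches tdvmcall_a0_read() .. tdvmcall_a3_read() simply read R12 .. R15,
the GHCI TDVMCALL argument registers (the vcpu and register file are stubbed
here for illustration):

```c
#include <assert.h>
#include <stdint.h>

struct stub_vcpu {
	uint64_t gprs[16];	/* indexed by architectural GPR number */
};

/* a0..a3 alias R12..R15, matching the accessors in the TDX patches. */
static uint64_t tdvmcall_a0_read(struct stub_vcpu *v) { return v->gprs[12]; }
static uint64_t tdvmcall_a1_read(struct stub_vcpu *v) { return v->gprs[13]; }
static uint64_t tdvmcall_a2_read(struct stub_vcpu *v) { return v->gprs[14]; }
static uint64_t tdvmcall_a3_read(struct stub_vcpu *v) { return v->gprs[15]; }
```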
>
>>
>>> + u64 size = tdvmcall_a1_read(vcpu);
>>> + u64 ret;
>>> +
>>> + /*
>>> + * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
>>> + * userspace to enable KVM_CAP_EXIT_HYPERCALL with
>>> KVM_HC_MAP_GPA_RANGE
>>> + * bit set. If not, the error code is not defined in GHCI for
>>> TDX, use
>>> + * TDVMCALL_STATUS_INVALID_OPERAND for this case.
>>> + */
>>> + if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
>>> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
>>> + goto error;
>>> + }
>>> +
>>> + if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
>>> + !kvm_vcpu_is_legal_gpa(vcpu, gpa + size -1) ||
>>> + (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
>>> + vt_is_tdx_private_gpa(vcpu->kvm, gpa + size -1))) {
>>> + ret = TDVMCALL_STATUS_INVALID_OPERAND;
>>> + goto error;
>>> + }
>>> +
>>> + if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
>>> + ret = TDVMCALL_STATUS_ALIGN_ERROR;
>>> + goto error;
>>> + }
>>> +
>>> + tdx->map_gpa_end = gpa + size;
>>> + tdx->map_gpa_next = gpa;
>>> +
>>> + __tdx_map_gpa(tdx);
>>> + /* Forward request to userspace. */
>>> + return 0;
>>
>> Maybe let __tdx_map_gpa() return an int to decide whether it needs to
>> return to userspace and
>>
>> return __tdx_map_gpa(tdx);
>>
>> ?
>
> To save one line of code and the comment?
No. It's just that I found most of the cases that need to exit to userspace
come with "return 0" after setting up run->exit_reason and the run->(fields).
> Because MapGPA always goes to userspace, I don't want to make a function
> return
> a int that is a fixed value.
> And if the multiple comments bother you, I think the comments can be
> removed.
>
>>
>>
>>> +
>>> +error:
>>> + tdvmcall_set_return_code(vcpu, ret);
>>> + kvm_r11_write(vcpu, gpa);
>>> + return 1;
>>> +}
>>> +
>>> static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>>> {
>>> if (tdvmcall_exit_type(vcpu))
>>> return tdx_emulate_vmcall(vcpu);
>>> switch (tdvmcall_leaf(vcpu)) {
>>> + case TDVMCALL_MAP_GPA:
>>> + return tdx_map_gpa(vcpu);
>>> default:
>>> break;
>>> }
>>> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
>>> index 1abc94b046a0..bfae70887695 100644
>>> --- a/arch/x86/kvm/vmx/tdx.h
>>> +++ b/arch/x86/kvm/vmx/tdx.h
>>> @@ -71,6 +71,9 @@ struct vcpu_tdx {
>>> enum tdx_prepare_switch_state prep_switch_state;
>>> u64 msr_host_kernel_gs_base;
>>> +
>>> + u64 map_gpa_next;
>>> + u64 map_gpa_end;
>>> };
>>> void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32
>>> field, u64 err);
>>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
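[Editor's note: the range checks debated above can be condensed into a
standalone sketch. This is not the kernel code; PAGE_SIZE, is_legal_gpa() and
the shared_bit parameter are stand-ins for kvm_vcpu_is_legal_gpa() and
vt_is_tdx_private_gpa() in arch/x86/kvm/vmx/tdx.c.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Stand-in for kvm_vcpu_is_legal_gpa(): GPA fits in the guest MAXPHYADDR. */
static bool is_legal_gpa(uint64_t gpa, int maxphyaddr)
{
	return gpa < (1ULL << maxphyaddr);
}

/*
 * Mirrors the checks in tdx_map_gpa(): the range must be non-empty and
 * must not wrap, both ends must be legal GPAs, the range must not
 * straddle the shared/private boundary (the shared bit must be equal at
 * both ends), and gpa/size must be page aligned.
 */
static bool map_gpa_range_valid(uint64_t gpa, uint64_t size,
				int maxphyaddr, uint64_t shared_bit)
{
	if (gpa + size <= gpa)			/* empty or wrapping range */
		return false;
	if (!is_legal_gpa(gpa, maxphyaddr) ||
	    !is_legal_gpa(gpa + size - 1, maxphyaddr))
		return false;			/* TDVMCALL_STATUS_INVALID_OPERAND */
	if ((gpa & shared_bit) != ((gpa + size - 1) & shared_bit))
		return false;			/* straddles shared/private */
	if ((gpa % PAGE_SIZE) || (size % PAGE_SIZE))
		return false;			/* TDVMCALL_STATUS_ALIGN_ERROR */
	return true;
}
```

Note how `gpa + size <= gpa` rejects both a zero-length request and an
overflowing one in a single comparison, which is why the patch does not need a
separate `size == 0` check.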
* Re: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
2024-12-16 4:37 ` Xiaoyao Li
@ 2024-12-18 1:33 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-18 1:33 UTC (permalink / raw)
To: Xiaoyao Li
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/16/2024 12:37 PM, Xiaoyao Li wrote:
> On 12/16/2024 8:54 AM, Binbin Wu wrote:
>>
>>
>>
>> On 12/13/2024 4:57 PM, Xiaoyao Li wrote:
>>> On 12/1/2024 11:53 AM, Binbin Wu wrote:
> ...
>>>
>>>> }
>>>> void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
>>>> @@ -1135,6 +1215,88 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
>>>> return tdx_sept_drop_private_spte(kvm, gfn, level, pfn);
>>>> }
>>>> +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
>>>> +{
>>>> + struct vcpu_tdx *tdx = to_tdx(vcpu);
>>>> + u64 vp_enter_ret = tdx->vp_enter_ret;
>>>> + union vmx_exit_reason exit_reason;
>>>> +
>>>> + if (fastpath != EXIT_FASTPATH_NONE)
>>>> + return 1;
>>>> +
>>>> + /*
>>>> + * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
>>>> + * TDX_SEAMCALL_VMFAILINVALID.
>>>> + */
>>>> + if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
>>>> + KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
>>>> + goto unhandled_exit;
>>>> + }
>>>> +
>>>> + /*
>>>> + * Without off-TD debug enabled, failed_vmentry case must have
>>>> + * TDX_NON_RECOVERABLE set.
>>>> + */
>>>
>>> This comment is confusing. I'm not sure why it is put here. The code below does nothing with exit_reason.failed_vmentry.
>>
>> Because when failed_vmentry occurs, vp_enter_ret will have
>> TDX_NON_RECOVERABLE set, so it will be handled below.
>
> The wording is somewhat confusing; to me it implies something like:
>
> WARN_ON(!debug_td() && exit_reason.failed_vmentry &&
> !(vp_enter_ret & TDX_NON_RECOVERABLE))
The comment tried to say that the failed_vmentry case is also covered by
TDX_NON_RECOVERABLE, since off-TD debug is not supported yet.
Since it's causing confusion, I'd like to add code to handle
failed_vmentry separately, aligned with VMX, by using KVM_EXIT_FAIL_ENTRY
as suggested below.
>
> Besides, VMX returns KVM_EXIT_FAIL_ENTRY for vm-entry failure. So the question is why TDX cannot do it same way?
>
>>>
>>>> + if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE))) {
>>>> + /* Triple fault is non-recoverable. */
>>>> + if (unlikely(tdx_check_exit_reason(vcpu, EXIT_REASON_TRIPLE_FAULT)))
>>>> + return tdx_handle_triple_fault(vcpu);
>>>> +
>>>> + kvm_pr_unimpl("TD vp_enter_ret 0x%llx, hkid 0x%x hkid pa 0x%llx\n",
>>>> + vp_enter_ret, to_kvm_tdx(vcpu->kvm)->hkid,
>>>> + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
>>>
>>> It indeed needs clarification why "hkid" and "hkid pa" are needed. Especially the "hkid pa", which is the result of applying the HKID of the current TD to physical address 0. I cannot think of any reason why we need such info.
>> Yes, set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid) should be removed.
>> I didn't notice it.
>
> don't forget to justify why HKID is useful here. To me, HKID can be dropped as well.
OK.
>
>> Thanks!
>>
>>
>>>
>>>> + goto unhandled_exit;
>>>> + }
>>>> +
>>>> + /* From now, the seamcall status should be TDX_SUCCESS. */
>>>> + WARN_ON_ONCE((vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
>>>
>>> Is there any case that TDX_SUCCESS with additional non-zero information in the lower 32-bits? I thought TDX_SUCCESS is a whole 64- bit status code.
>> The TDX status code uses the upper 32 bits.
>>
>> When the status code is TDX_SUCCESS and there is a valid VMX exit reason, the
>> lower 32 bits hold the VMX exit reason.
>>
>> You can refer to the TDX module ABI spec or interface_function_completion_status.json
>> from the intel-tdx-module-1.5-abi-table for details.
>>
>
> I see. (I asked a silly question; I even missed the normal exit case.)
>
>
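[Editor's note: the status-word layout explained above can be sketched as
follows. The mask and status values are illustrative only; the authoritative
definitions live in the TDX module ABI tables and arch/x86/include/asm/tdx.h.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; the real ones come from the TDX module ABI. */
#define TDX_SUCCESS		 0x0000000000000000ULL
#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL

/* The completion status lives in the upper 32 bits of the return value. */
static int tdx_status_is_success(uint64_t vp_enter_ret)
{
	return (vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) == TDX_SUCCESS;
}

/*
 * On a successful TDH.VP.ENTER with a valid exit, the lower 32 bits
 * carry the VMX exit reason, so it can be extracted by truncation.
 */
static uint32_t tdx_exit_reason(uint64_t vp_enter_ret)
{
	return (uint32_t)vp_enter_ret;
}
```

This is why the patch's WARN_ON_ONCE() masks with TDX_SEAMCALL_STATUS_MASK
before comparing against TDX_SUCCESS instead of comparing the whole 64-bit
value: a successful entry still has non-zero low bits.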
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-16 6:03 ` Xiaoyao Li
@ 2024-12-18 1:38 ` Binbin Wu
2024-12-18 5:09 ` Binbin Wu
0 siblings, 1 reply; 37+ messages in thread
From: Binbin Wu @ 2024-12-18 1:38 UTC (permalink / raw)
To: Xiaoyao Li
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/16/2024 2:03 PM, Xiaoyao Li wrote:
> On 12/16/2024 9:08 AM, Binbin Wu wrote:
>>
>>
>>
>> On 12/13/2024 5:32 PM, Xiaoyao Li wrote:
>>> On 12/1/2024 11:53 AM, Binbin Wu wrote:
>>>
>> [...]
>>>> +
>>>> +static int tdx_map_gpa(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + struct vcpu_tdx * tdx = to_tdx(vcpu);
>>>> + u64 gpa = tdvmcall_a0_read(vcpu);
>>>
>>> We can use kvm_r12_read() directly, which is more intuitive. And we can drop the MACRO for a0/a1/a2/a3 accessors in patch 022.
>> I am neutral about it.
>>
>
> a0, a1, a2, a3 are the naming convention for KVM's hypercalls. It makes sense when they serve as the parameters to __kvm_emulate_hypercall().
>
> However, now that __kvm_emulate_hypercall() has been made into a macro, we don't need temp variables like a0 = kvm_xx_read().
>
> The TDVMCALL leaves other than normal KVM hypercalls are all TDX specific and defined in the TDX GHCI spec; a0/a1/a2/a3 makes no sense for them.
OK, makes sense.
Thanks!
* Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
2024-12-18 1:38 ` Binbin Wu
@ 2024-12-18 5:09 ` Binbin Wu
0 siblings, 0 replies; 37+ messages in thread
From: Binbin Wu @ 2024-12-18 5:09 UTC (permalink / raw)
To: Xiaoyao Li
Cc: pbonzini, seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, tony.lindgren, isaku.yamahata, yan.y.zhao,
chao.gao, michael.roth, linux-kernel
On 12/18/2024 9:38 AM, Binbin Wu wrote:
>
>
>
> On 12/16/2024 2:03 PM, Xiaoyao Li wrote:
>> On 12/16/2024 9:08 AM, Binbin Wu wrote:
>>>
>>>
>>>
>>> On 12/13/2024 5:32 PM, Xiaoyao Li wrote:
>>>> On 12/1/2024 11:53 AM, Binbin Wu wrote:
>>>>
>>> [...]
>>>>> +
>>>>> +static int tdx_map_gpa(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> + struct vcpu_tdx * tdx = to_tdx(vcpu);
>>>>> + u64 gpa = tdvmcall_a0_read(vcpu);
>>>>
>>>> We can use kvm_r12_read() directly, which is more intuitive. And we can drop the MACRO for a0/a1/a2/a3 accessors in patch 022.
>>> I am neutral about it.
>>>
>>
>> a0, a1, a2, a3 are the naming convention for KVM's hypercalls. It makes sense when they serve as the parameters to __kvm_emulate_hypercall().
>>
>> However, now that __kvm_emulate_hypercall() has been made into a macro, we don't need temp variables like a0 = kvm_xx_read().
>>
>> The TDVMCALL leaves other than normal KVM hypercalls are all TDX specific and defined in the TDX GHCI spec; a0/a1/a2/a3 makes no sense for them.
> OK, makes sense.
>
> Thanks!
>
>
I will leave it as it is for now.
There is an RFC proposal from Hunter about using descriptively named parameters:
https://lore.kernel.org/all/fa817f29-e3ba-4c54-8600-e28cf6ab1953@intel.com/
This can wait until there is a final conclusion.
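[Editor's note: for readers following the a0/a1 naming debate, the GHCI
TDVMCALL register convention being referred to can be sketched like this.
The struct and accessor names are hypothetical; in KVM the values are read
with kvm_r11_read()/kvm_r12_read() etc., or the tdvmcall_aN_read() wrappers
the patch defines.]

```c
#include <assert.h>
#include <stdint.h>

/* Minimal guest-GPR view, just enough to show the two naming schemes. */
struct gprs {
	uint64_t r10, r11, r12, r13;
};

/*
 * GHCI convention for standard TDVMCALL leaves: R10 carries the call
 * type on entry and the return code on exit, R11 the leaf number, and
 * R12 onward the leaf-specific arguments.  For MapGPA (leaf 0x10001),
 * R12 is the start GPA and R13 the size, which is why names like
 * "a0"/"a1" convey less than the raw register names here.
 */
static uint64_t tdvmcall_leaf(const struct gprs *g) { return g->r11; }
static uint64_t map_gpa_start(const struct gprs *g) { return g->r12; }
static uint64_t map_gpa_size(const struct gprs *g)  { return g->r13; }
```

On a MapGPA failure, the patch writes the failing GPA back to R11
(kvm_r11_write()), matching the GHCI's use of R11 as the "GPA where
processing stopped" output.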
* Re: [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit
2024-12-01 3:53 ` [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit Binbin Wu
2024-12-09 11:21 ` Chao Gao
2024-12-13 8:57 ` Xiaoyao Li
@ 2025-01-22 12:50 ` Paolo Bonzini
2 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2025-01-22 12:50 UTC (permalink / raw)
To: Binbin Wu
Cc: seanjc, kvm, rick.p.edgecombe, kai.huang, adrian.hunter,
reinette.chatre, xiaoyao.li, tony.lindgren, isaku.yamahata,
yan.y.zhao, chao.gao, michael.roth, linux-kernel, binbin.wu
On Sun, Dec 1, 2024 at 4:52 AM Binbin Wu <binbin.wu@linux.intel.com> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Introduce the wiring for handling TDX VM exits by implementing the
> callbacks .get_exit_info(), and .handle_exit().
Linux 6.14 adds get_entry_info. For lack of a better place, I added it
in this patch when rebasing kvm-coco-queue:
static void vt_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info, u32 *error_code)
{
*intr_info = 0;
*error_code = 0;
if (is_td_vcpu(vcpu))
return;
vmx_get_entry_info(vcpu, intr_info, error_code);
}
Thanks,
Paolo
end of thread, other threads:[~2025-01-22 12:50 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-01 3:53 [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Binbin Wu
2024-12-01 3:53 ` [PATCH 1/7] KVM: TDX: Add a place holder to handle TDX VM exit Binbin Wu
2024-12-09 11:21 ` Chao Gao
2024-12-10 2:14 ` Binbin Wu
2024-12-13 8:57 ` Xiaoyao Li
2024-12-16 0:54 ` Binbin Wu
2024-12-16 4:37 ` Xiaoyao Li
2024-12-18 1:33 ` Binbin Wu
2025-01-22 12:50 ` Paolo Bonzini
2024-12-01 3:53 ` [PATCH 2/7] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL) Binbin Wu
2024-12-09 11:28 ` Chao Gao
2024-12-10 2:34 ` Binbin Wu
2024-12-01 3:53 ` [PATCH 3/7] KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL Binbin Wu
2024-12-09 2:58 ` Chao Gao
2024-12-09 3:08 ` Binbin Wu
2024-12-01 3:53 ` [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA> Binbin Wu
2024-12-09 12:45 ` Chao Gao
2024-12-10 2:51 ` Binbin Wu
2024-12-10 9:10 ` Chao Gao
2024-12-10 9:27 ` Tony Lindgren
2024-12-13 9:32 ` Xiaoyao Li
2024-12-16 1:08 ` Binbin Wu
2024-12-16 6:03 ` Xiaoyao Li
2024-12-18 1:38 ` Binbin Wu
2024-12-18 5:09 ` Binbin Wu
2024-12-01 3:53 ` [PATCH 5/7] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError> Binbin Wu
2024-12-06 9:31 ` Xu Yilun
2024-12-06 9:37 ` Binbin Wu
2024-12-10 9:05 ` Chao Gao
2024-12-10 9:43 ` Binbin Wu
2024-12-13 9:40 ` Xiaoyao Li
2024-12-16 1:14 ` Binbin Wu
2024-12-01 3:53 ` [PATCH 6/7] KVM: TDX: Handle TDX PV port I/O hypercall Binbin Wu
2024-12-10 9:42 ` Chao Gao
2024-12-10 9:50 ` Binbin Wu
2024-12-01 3:53 ` [PATCH 7/7] KVM: TDX: Handle TDX PV MMIO hypercall Binbin Wu
2024-12-10 18:24 ` [PATCH 0/7] KVM: TDX: TDX hypercalls may exit to userspace Paolo Bonzini