From: Binbin Wu <binbin.wu@linux.intel.com>
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com,
	adrian.hunter@intel.com, reinette.chatre@intel.com,
	xiaoyao.li@intel.com, tony.lindgren@intel.com,
	isaku.yamahata@intel.com, yan.y.zhao@intel.com,
	chao.gao@intel.com, linux-kernel@vger.kernel.org,
	binbin.wu@linux.intel.com
Subject: [PATCH v2 5/8] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
Date: Tue, 11 Feb 2025 10:54:39 +0800
Message-ID: <20250211025442.3071607-6-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with
KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling.

A TDX guest uses MapGPA to request that a GPA range be mapped as private
or shared memory.  The request has to exit to userspace for handling.  KVM
already implements a similar hypercall, KVM_HC_MAP_GPA_RANGE, which exits
to userspace with exit reason KVM_EXIT_HYPERCALL.  Do sanity checks,
convert TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE and forward the request
to userspace.
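
As an illustration only, the sanity checks mentioned above can be modelled
in plain userspace C.  This is a sketch under assumptions, not the KVM code:
gpa_is_legal() and gpa_is_private() are hypothetical stand-ins for
kvm_vcpu_is_legal_gpa() and vt_is_tdx_private_gpa(), the GPA width and
shared-bit mask are passed in by the caller, and the STATUS_* values mirror
the TDVMCALL_STATUS_* constants in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the TDVMCALL_STATUS_* constants in this patch. */
#define STATUS_SUCCESS		0x0000000000000000ULL
#define STATUS_INVALID_OPERAND	0x8000000000000000ULL
#define STATUS_ALIGN_ERROR	0x8000000000000002ULL
#define PAGE_SIZE_4K		4096ULL

/* Hypothetical model: a GPA is legal if it fits in gpa_bits bits. */
static bool gpa_is_legal(uint64_t gpa, unsigned int gpa_bits)
{
	assert(gpa_bits > 0 && gpa_bits < 64);
	return (gpa >> gpa_bits) == 0;
}

/* Hypothetical model: a GPA is private if the shared bit is clear. */
static bool gpa_is_private(uint64_t gpa, uint64_t shared_mask)
{
	return (gpa & shared_mask) == 0;
}

/* Same order as the patch: operand checks first, then alignment. */
static uint64_t check_map_gpa(uint64_t gpa, uint64_t size,
			      unsigned int gpa_bits, uint64_t shared_mask)
{
	uint64_t last = gpa + size - 1;

	/* Rejects size == 0 and ranges that wrap around 2^64. */
	if (gpa + size <= gpa)
		return STATUS_INVALID_OPERAND;

	if (!gpa_is_legal(gpa, gpa_bits) || !gpa_is_legal(last, gpa_bits))
		return STATUS_INVALID_OPERAND;

	/* The whole range must be either private or shared. */
	if (gpa_is_private(gpa, shared_mask) !=
	    gpa_is_private(last, shared_mask))
		return STATUS_INVALID_OPERAND;

	if ((gpa | size) & (PAGE_SIZE_4K - 1))
		return STATUS_ALIGN_ERROR;

	return STATUS_SUCCESS;
}
```

Note that a misaligned request only gets the alignment error if it passes
the operand checks first, which matches the ordering in tdx_map_gpa().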

To prevent a TDG.VP.VMCALL<MapGPA> call from taking too long, split the
MapGPA range into 2MB chunks and check for pending interrupts between
chunks.  This allows for timely injection of interrupts and prevents issues
with guest lockup detection.  The TDX guest should retry the operation for
the GPA starting at the address specified in R11 when the TDVMCALL returns
TDVMCALL_STATUS_RETRY as the status code.
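
The chunk-and-retry flow described above can be sketched as a small
userspace model.  It is a simplification under assumptions, not the KVM
implementation: map_gpa_chunked(), struct map_state and the
pending_on_nth_check() hook are hypothetical names, one loop iteration
stands in for one round trip to userspace, and the cursor advances by the
clamped chunk size rather than by a fixed 2MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_GPA_MAX_LEN (2ULL * 1024 * 1024)	/* 2MB chunk, as in the patch */

enum map_status { MAP_SUCCESS, MAP_RETRY };

struct map_state {
	uint64_t next;	/* start of the next chunk to convert */
	uint64_t end;	/* exclusive end of the requested range */
};

/* Hypothetical callback standing in for the pending-interrupt check. */
typedef bool (*intr_pending_fn)(void *opaque);

/* Hypothetical hook: reports pending on the Nth check, never if N is 0. */
static bool pending_on_nth_check(void *opaque)
{
	unsigned int *remaining = opaque;

	if (*remaining == 0)
		return false;
	return --(*remaining) == 0;
}

/*
 * Process the range chunk by chunk.  On MAP_RETRY, *resume holds the
 * address the guest should retry from (R11 in the real interface).
 */
static enum map_status map_gpa_chunked(struct map_state *s,
				       intr_pending_fn pending, void *opaque,
				       uint64_t *resume, unsigned int *chunks)
{
	assert(s->next < s->end);

	for (;;) {
		uint64_t size = s->end - s->next;

		if (size > MAP_GPA_MAX_LEN)
			size = MAP_GPA_MAX_LEN;

		/* A real VMM would convert [s->next, s->next + size) here. */
		(*chunks)++;
		s->next += size;

		if (s->next >= s->end)
			return MAP_SUCCESS;

		/* Give up early so the interrupt can be delivered. */
		if (pending(opaque)) {
			*resume = s->next;
			return MAP_RETRY;
		}
	}
}
```

A guest-side caller would treat MAP_RETRY as "reissue the request starting
at *resume", so the range still gets fully converted across several calls.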

Note that userspace needs to enable KVM_CAP_EXIT_HYPERCALL with the
KVM_HC_MAP_GPA_RANGE bit set for a TD VM.
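
For reference, a VMM would typically opt in with KVM_ENABLE_CAP on the VM
file descriptor.  The following is a minimal sketch, assuming UAPI headers
recent enough to define KVM_CAP_EXIT_HYPERCALL and KVM_HC_MAP_GPA_RANGE;
the helper names fill_exit_hypercall_cap() and enable_map_gpa_exit() are
made up for this example and are not part of any existing VMM.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/kvm_para.h>

/* Build a KVM_ENABLE_CAP request that forwards KVM_HC_MAP_GPA_RANGE. */
static void fill_exit_hypercall_cap(struct kvm_enable_cap *cap)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_EXIT_HYPERCALL;
	/* args[0] is a bitmask indexed by hypercall number. */
	cap->args[0] = 1ULL << KVM_HC_MAP_GPA_RANGE;
}

/*
 * vm_fd is assumed to come from KVM_CREATE_VM.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_map_gpa_exit(int vm_fd)
{
	struct kvm_enable_cap cap;

	assert(vm_fd >= 0);
	fill_exit_hypercall_cap(&cap);
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

Without this opt-in, the handler in this patch fails the TDVMCALL with
TDVMCALL_STATUS_INVALID_OPERAND instead of exiting to userspace.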

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
Hypercalls exit to userspace v2:
- Skip setting of return code as TDVMCALL_STATUS_SUCCESS.
- Use vp_enter_args instead of x86 registers.
- Remove unnecessary comments.
- Zero run->hypercall.ret in __tdx_map_gpa() following the pattern of Paolo's
  patch, the feedback of adding a helper is still pending. (Rick)
  https://lore.kernel.org/kvm/20241213194137.315304-1-pbonzini@redhat.com

Hypercalls exit to userspace v1:
- Newly added.
  Implement one of the hypercalls that need to exit to userspace for handling
  after dropping "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to
  resolve Sean's comment.
  https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Check for pending interrupts between chunks, as suggested by Sean.
  https://lore.kernel.org/kvm/ZleJvmCawKqmpFIa@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Use vt_is_tdx_private_gpa()
---
 arch/x86/include/asm/shared/tdx.h |   1 +
 arch/x86/kvm/vmx/tdx.c            | 113 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h            |   3 +
 3 files changed, 117 insertions(+)

diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 4aedab1f2a1a..f23657350d28 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -77,6 +77,7 @@
 #define TDVMCALL_STATUS_SUCCESS		0x0000000000000000ULL
 #define TDVMCALL_STATUS_RETRY		0x0000000000000001ULL
 #define TDVMCALL_STATUS_INVALID_OPERAND	0x8000000000000000ULL
+#define TDVMCALL_STATUS_ALIGN_ERROR	0x8000000000000002ULL
 
 /*
  * Bitmasks of exposed registers (with VMM).
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index daa49f2ee2b3..8b51b4c937e9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -981,9 +981,122 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
 	return __kvm_emulate_hypercall(vcpu, 0, complete_hypercall_exit);
 }
 
+/*
+ * Split into chunks and check interrupt pending between chunks.  This allows
+ * for timely injection of interrupts to prevent issues with guest lockup
+ * detection.
+ */
+#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024)
+static void __tdx_map_gpa(struct vcpu_tdx *tdx);
+
+static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->run->hypercall.ret) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+		tdx->vp_enter_args.r11 = tdx->map_gpa_next;
+		return 1;
+	}
+
+	tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
+	if (tdx->map_gpa_next >= tdx->map_gpa_end)
+		return 1;
+
+	/*
+	 * Stop processing the remaining part if there is a pending interrupt.
+	 * Skip checking pending virtual interrupts (reflected by the
+	 * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a SEAMCALL: if the
+	 * guest has disabled interrupts, it's OK not to return to the guest
+	 * for a non-NMI interrupt.  It's also rare to issue TDVMCALL_MAP_GPA
+	 * immediately after STI or MOV/POP SS.
+	 */
+	if (pi_has_pending_interrupt(vcpu) ||
+	    kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
+		tdx->vp_enter_args.r11 = tdx->map_gpa_next;
+		return 1;
+	}
+
+	__tdx_map_gpa(tdx);
+	return 0;
+}
+
+static void __tdx_map_gpa(struct vcpu_tdx *tdx)
+{
+	u64 gpa = tdx->map_gpa_next;
+	u64 size = tdx->map_gpa_end - tdx->map_gpa_next;
+
+	if (size > TDX_MAP_GPA_MAX_LEN)
+		size = TDX_MAP_GPA_MAX_LEN;
+
+	tdx->vcpu.run->exit_reason       = KVM_EXIT_HYPERCALL;
+	tdx->vcpu.run->hypercall.nr      = KVM_HC_MAP_GPA_RANGE;
+	/*
+	 * In principle this should have been -KVM_ENOSYS, but userspace (QEMU <=9.2)
+	 * assumed that vcpu->run->hypercall.ret is never changed by KVM and thus that
+	 * it was always zero on KVM_EXIT_HYPERCALL.  Since KVM is now overwriting
+	 * vcpu->run->hypercall.ret, ensuring that it is zero to not break QEMU.
+	 */
+	tdx->vcpu.run->hypercall.ret = 0;
+	tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
+	tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
+	tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ?
+					   KVM_MAP_GPA_RANGE_ENCRYPTED :
+					   KVM_MAP_GPA_RANGE_DECRYPTED;
+	tdx->vcpu.run->hypercall.flags   = KVM_EXIT_HYPERCALL_LONG_MODE;
+
+	tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa;
+}
+
+static int tdx_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u64 gpa = tdx->vp_enter_args.r12;
+	u64 size = tdx->vp_enter_args.r13;
+	u64 ret;
+
+	/*
+	 * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
+	 * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
+	 * bit set.  GHCI for TDX defines no error code for the case where it
+	 * is not enabled, so use TDVMCALL_STATUS_INVALID_OPERAND.
+	 */
+	if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
+		ret = TDVMCALL_STATUS_INVALID_OPERAND;
+		goto error;
+	}
+
+	if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
+	    !kvm_vcpu_is_legal_gpa(vcpu, gpa + size - 1) ||
+	    (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
+	     vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))) {
+		ret = TDVMCALL_STATUS_INVALID_OPERAND;
+		goto error;
+	}
+
+	if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
+		ret = TDVMCALL_STATUS_ALIGN_ERROR;
+		goto error;
+	}
+
+	tdx->map_gpa_end = gpa + size;
+	tdx->map_gpa_next = gpa;
+
+	__tdx_map_gpa(tdx);
+	return 0;
+
+error:
+	tdvmcall_set_return_code(vcpu, ret);
+	tdx->vp_enter_args.r11 = gpa;
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	switch (tdvmcall_leaf(vcpu)) {
+	case TDVMCALL_MAP_GPA:
+		return tdx_map_gpa(vcpu);
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 0e3522e423cc..45c1d064b6b7 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -57,6 +57,9 @@ struct vcpu_tdx {
 	u64 vp_enter_ret;
 
 	enum vcpu_tdx_state state;
+
+	u64 map_gpa_next;
+	u64 map_gpa_end;
 };
 
 void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
-- 
2.46.0