linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part)
@ 2023-08-20 20:27 Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 1/9] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

Hyper-V provides two modes for running Intel TDX VMs:

1) TD Partitioning mode with a paravisor (see [1]).
2) In "fully enlightened" mode with normal TDX shared bit control
   over page encryption, and no paravisor

The first mode is similar to AMD SEV-SNP's vTOM mode (see [2]).
The second is similar to AMD SEV-SNP's C-bit mode(see [2]).

For #2, the v6 patchset was [3], which is later split into 2 parts:
the generic TDX part (see [4][5]), and the Hyper-V specific part, i.e.
the first 5 patches of this patchset. For the second part, I rebased
the patches to Tianyu's fully enlighted SNP v7 patchset [2]. Since this
is a straightforward rebasing, I keep the existing Acked-by and
Reviewed-by in the v6 patchset [3].

The next 3 patches of this patchset add the support for #1.

The last patch (the 9th patch) just makes some cleanup.

Please review all the 9 patches, which are also on my github
branch [6].

I tested the patches for a regular VM, a VBS VM, a SNP VM
with the paravisor, and a TDX VM with the paravisor and a TDX
VM without the paravisor, and an ARM64 VM. All the VMs worked
as expected.

If the patches all look good, I expect that Tianyu's patches [2]
are merged into the Hyper-V tree's hyperv-next branch first, then
these 9 patches can be merged afterwards.

Compared with v1, the difference is trivial: the v2 is mainly a rebase
to Tianyu's v7.

Thanks,
Dexuan

References:
[1] Intel TDX Module v1.5 TD Partitioning Architecture Specification
[2] https://lwn.net/ml/linux-kernel/20230818102919.1318039-1-ltykernel@gmail.com/
[3] https://lwn.net/ml/linux-kernel/20230504225351.10765-1-decui@microsoft.com/
[4] https://lwn.net/ml/linux-kernel/20230811214826.9609-1-decui%40microsoft.com/
[5] https://github.com/dcui/tdx/commits/decui/mainline/x86/tdx/v10
[6] https://github.com/dcui/tdx/commits/decui/mainline/x86/hyperv/tdx-v2 

Dexuan Cui (9):
  x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  x86/hyperv: Support hypercalls for fully enlightened TDX guests
  Drivers: hv: vmbus: Support fully enlightened TDX guests
  x86/hyperv: Fix serial console interrupts for fully enlightened TDX
    guests
  Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
  x86/hyperv: Introduce a global variable hyperv_paravisor_present
  Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the
    paravisor
  x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the
    paravisor
  x86/hyperv: Remove hv_isolation_type_en_snp

 arch/x86/hyperv/hv_apic.c          |  15 +++-
 arch/x86/hyperv/hv_init.c          |  59 +++++++++++---
 arch/x86/hyperv/ivm.c              | 120 ++++++++++++++++++++++++++---
 arch/x86/include/asm/hyperv-tlfs.h |   3 +-
 arch/x86/include/asm/mshyperv.h    |  42 +++++++---
 arch/x86/kernel/cpu/mshyperv.c     |  75 +++++++++++++++---
 drivers/hv/connection.c            |   2 +-
 drivers/hv/hv.c                    |  88 +++++++++++++++++----
 drivers/hv/hv_common.c             |  46 ++++++++---
 drivers/hv/hyperv_vmbus.h          |  11 +++
 include/asm-generic/mshyperv.h     |   6 +-
 11 files changed, 391 insertions(+), 76 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/9] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 2/9] x86/hyperv: Support hypercalls for fully enlightened " Dexuan Cui
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

No logic change to SNP/VBS guests.

hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to
do some TDX-specific operations, e.g. for a fully enlightened TDX guest
(i.e. without the paravisor), hv_do_hypercall() should use
__tdx_hypercall() and such a guest on Hyper-V should handle the Hyper-V
Event/Message/Monitor pages specially.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2:
  Added Tianyu's Reviewed-by.

 arch/x86/hyperv/ivm.c              | 9 +++++++++
 arch/x86/include/asm/hyperv-tlfs.h | 3 ++-
 arch/x86/include/asm/mshyperv.h    | 3 +++
 arch/x86/kernel/cpu/mshyperv.c     | 2 ++
 drivers/hv/hv_common.c             | 6 ++++++
 include/asm-generic/mshyperv.h     | 1 +
 6 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index cbbd3af4c3daf..afdae1a8a1177 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -562,3 +562,12 @@ bool hv_isolation_type_en_snp(void)
 	return static_branch_unlikely(&isolation_type_en_snp);
 }
 
+DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
+/*
+ * hv_isolation_type_tdx - Check if the system runs in an Intel TDX based
+ * isolated VM.
+ */
+bool hv_isolation_type_tdx(void)
+{
+	return static_branch_unlikely(&isolation_type_tdx);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 4bf0b315b0ce9..2ff26f53cd624 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -169,7 +169,8 @@
 enum hv_isolation_type {
 	HV_ISOLATION_TYPE_NONE	= 0,
 	HV_ISOLATION_TYPE_VBS	= 1,
-	HV_ISOLATION_TYPE_SNP	= 2
+	HV_ISOLATION_TYPE_SNP	= 2,
+	HV_ISOLATION_TYPE_TDX	= 3
 };
 
 /* Hyper-V specific model specific registers (MSRs) */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6bd9ae04d9c36..e18c6c8f4fba8 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -27,6 +27,7 @@ union hv_ghcb;
 
 DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
 DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);
 
 typedef int (*hyperv_fill_flush_list_func)(
 		struct hv_guest_mapping_flush_list *flush,
@@ -49,6 +50,8 @@ extern u64 hv_current_partition_id;
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
 extern bool hv_isolation_type_en_snp(void);
+bool hv_isolation_type_tdx(void);
+
 /*
  * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
  * to start AP in enlightened SEV guest.
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 26b9fcabd7d95..e0640fe6f6b82 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -417,6 +417,8 @@ static void __init ms_hyperv_init_platform(void)
 			static_branch_enable(&isolation_type_en_snp);
 		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
+		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
+			static_branch_enable(&isolation_type_tdx);
 		}
 	}
 
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 2d43ba2bc925d..da3307533f4d7 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -521,6 +521,12 @@ bool __weak hv_isolation_type_en_snp(void)
 }
 EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
 
+bool __weak hv_isolation_type_tdx(void)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
+
 void __weak hv_setup_vmbus_handler(void (*handler)(void))
 {
 }
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index efd0d2aedad39..82eba2d5fc4cd 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -66,6 +66,7 @@ extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
 extern bool hv_isolation_type_snp(void);
 extern bool hv_isolation_type_en_snp(void);
+bool hv_isolation_type_tdx(void);
 
 /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
 static inline int hv_result(u64 status)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/9] x86/hyperv: Support hypercalls for fully enlightened TDX guests
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 1/9] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 3/9] Drivers: hv: vmbus: Support " Dexuan Cui
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

A fully enlightened TDX guest on Hyper-V (i.e. without the paravisor) only
uses the GHCI call rather than hv_hypercall_pg.

In hv_do_hypercall(), Hyper-V requires that the input/output addresses
must have the cc_mask.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2:
  Included asm/coco.h in arch/x86/include/asm/mshyperv.h to avoid a
gcc warning: "implicit declaration of cc_mkdec"

 arch/x86/hyperv/hv_init.c       |  8 ++++++++
 arch/x86/hyperv/ivm.c           | 17 +++++++++++++++++
 arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
 drivers/hv/hv_common.c          | 10 ++++++++--
 include/asm-generic/mshyperv.h  |  1 +
 5 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index bcfbcda8b050c..255e02ec467eb 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -476,6 +476,10 @@ void __init hyperv_init(void)
 	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
 	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
+	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+	if (hv_isolation_type_tdx())
+		goto skip_hypercall_pg_init;
+
 	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
 			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
@@ -515,6 +519,7 @@ void __init hyperv_init(void)
 		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	}
 
+skip_hypercall_pg_init:
 	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
 	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -642,6 +647,9 @@ bool hv_is_hyperv_initialized(void)
 	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
 		return false;
 
+	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+	if (hv_isolation_type_tdx())
+		return true;
 	/*
 	 * Verify that earlier initialization succeeded by checking
 	 * that the hypercall page is setup
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index afdae1a8a1177..6c7598d9e68a3 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -571,3 +571,20 @@ bool hv_isolation_type_tdx(void)
 {
 	return static_branch_unlikely(&isolation_type_tdx);
 }
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
+{
+	struct tdx_hypercall_args args = { };
+
+	args.r10 = control;
+	args.rdx = param1;
+	args.r8  = param2;
+
+	(void)__tdx_hypercall_ret(&args);
+
+	return args.r11;
+}
+
+#endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index e18c6c8f4fba8..24d7f662a8beb 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <linux/nmi.h>
 #include <linux/msi.h>
+#include <asm/coco.h>
 #include <asm/io.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/nospec-branch.h>
@@ -51,6 +52,7 @@ extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
 extern bool hv_isolation_type_en_snp(void);
 bool hv_isolation_type_tdx(void);
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
 
 /*
  * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
@@ -63,6 +65,10 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
 
+/*
+ * If the hypercall involves no input or output parameters, the hypervisor
+ * ignores the corresponding GPA pointer.
+ */
 static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 {
 	u64 input_address = input ? virt_to_phys(input) : 0;
@@ -70,6 +76,11 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control,
+					cc_mkdec(input_address),
+					cc_mkdec(output_address));
+
 	if (hv_isolation_type_en_snp()) {
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     "vmmcall"
@@ -123,6 +134,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control, input1, 0);
+
 	if (hv_isolation_type_en_snp()) {
 		__asm__ __volatile__(
 				"vmmcall"
@@ -174,6 +188,9 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control, input1, input2);
+
 	if (hv_isolation_type_en_snp()) {
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     "vmmcall"
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index da3307533f4d7..897bbb96f4118 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -381,10 +381,10 @@ int hv_common_cpu_init(unsigned int cpu)
 			*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
 		}
 
-		if (hv_isolation_type_en_snp()) {
+		if (hv_isolation_type_en_snp() || hv_isolation_type_tdx()) {
 			ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
 			if (ret) {
-				kfree(*inputarg);
+				/* It may be unsafe to free *inputarg */
 				*inputarg = NULL;
 				return ret;
 			}
@@ -567,3 +567,9 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s
 	return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+u64 __weak hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
+{
+	return HV_STATUS_INVALID_PARAMETER;
+}
+EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 82eba2d5fc4cd..f577eff58ea0b 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -283,6 +283,7 @@ enum hv_isolation_type hv_get_isolation_type(void);
 bool hv_is_isolation_supported(void);
 bool hv_isolation_type_snp(void);
 u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
 void hyperv_cleanup(void);
 bool hv_query_ext_cap(u64 cap_query);
 void hv_setup_dma_ops(struct device *dev, bool coherent);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/9] Drivers: hv: vmbus: Support fully enlightened TDX guests
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 1/9] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 2/9] x86/hyperv: Support hypercalls for fully enlightened " Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 4/9] x86/hyperv: Fix serial console interrupts for " Dexuan Cui
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

Add Hyper-V specific code so that a fully enlightened TDX guest (i.e.
without the paravisor) can run on Hyper-V:
  Don't use hv_vp_assist_page. Use GHCI instead.
  Don't try to use the unsupported HV_REGISTER_CRASH_CTL.
  Don't trust (use) Hyper-V's TLB-flushing hypercalls.
  Don't use lazy EOI.
  Share the SynIC Event/Message pages with the hypervisor.
  Don't use the Hyper-V TSC page for now, because non-trivial work is
    required to share the page with the hypervisor.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2: None

 arch/x86/hyperv/hv_apic.c      | 15 ++++++++++++---
 arch/x86/hyperv/hv_init.c      | 19 +++++++++++++++----
 arch/x86/kernel/cpu/mshyperv.c | 23 +++++++++++++++++++++++
 drivers/hv/hv.c                | 17 +++++++++++++++--
 4 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 1fbda2f94184e..cb7429046d18d 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -177,8 +177,11 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
 	    (exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
 		return true;
 
-	if (!hv_hypercall_pg)
-		return false;
+	/* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
+	if (!hv_hypercall_pg) {
+		if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
+			return false;
+	}
 
 	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
 		return false;
@@ -231,9 +234,15 @@ static bool __send_ipi_one(int cpu, int vector)
 
 	trace_hyperv_send_ipi_one(cpu, vector);
 
-	if (!hv_hypercall_pg || (vp == VP_INVAL))
+	if (vp == VP_INVAL)
 		return false;
 
+	/* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
+	if (!hv_hypercall_pg) {
+		if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
+			return false;
+	}
+
 	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
 		return false;
 
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 255e02ec467eb..c1c1b4e1502f0 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -80,7 +80,7 @@ static int hyperv_init_ghcb(void)
 static int hv_cpu_init(unsigned int cpu)
 {
 	union hv_vp_assist_msr_contents msr = { 0 };
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
+	struct hv_vp_assist_page **hvp;
 	int ret;
 
 	ret = hv_common_cpu_init(cpu);
@@ -90,6 +90,7 @@ static int hv_cpu_init(unsigned int cpu)
 	if (!hv_vp_assist_page)
 		return 0;
 
+	hvp = &hv_vp_assist_page[cpu];
 	if (hv_root_partition) {
 		/*
 		 * For root partition we get the hypervisor provided VP assist
@@ -442,11 +443,21 @@ void __init hyperv_init(void)
 	if (hv_common_init())
 		return;
 
-	hv_vp_assist_page = kcalloc(num_possible_cpus(),
-				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
+	/*
+	 * The VP assist page is useless to a TDX guest: the only use we
+	 * would have for it is lazy EOI, which can not be used with TDX.
+	 */
+	if (hv_isolation_type_tdx())
+		hv_vp_assist_page = NULL;
+	else
+		hv_vp_assist_page = kcalloc(num_possible_cpus(),
+					    sizeof(*hv_vp_assist_page),
+					    GFP_KERNEL);
 	if (!hv_vp_assist_page) {
 		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
-		goto common_free;
+
+		if (!hv_isolation_type_tdx())
+			goto common_free;
 	}
 
 	if (hv_isolation_type_snp()) {
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0640fe6f6b82..2282f91716100 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -419,6 +419,29 @@ static void __init ms_hyperv_init_platform(void)
 			static_branch_enable(&isolation_type_snp);
 		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
 			static_branch_enable(&isolation_type_tdx);
+
+			/* A TDX VM must use x2APIC and doesn't use lazy EOI. */
+			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+
+			if (!ms_hyperv.paravisor_present) {
+				/*
+				 * The ms_hyperv.shared_gpa_boundary_active in
+				 * a fully enlightened TDX VM is 0, but the GPAs
+				 * of the SynIC Event/Message pages and VMBus
+				 * Moniter pages in such a VM still need to be
+				 * added by this offset.
+				 */
+				ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
+
+				/* To be supported: more work is required.  */
+				ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+
+				/* HV_REGISTER_CRASH_CTL is unsupported. */
+				ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+
+				/* Don't trust Hyper-V's TLB-flushing hypercalls. */
+				ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+			}
 		}
 	}
 
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index ec6e35a0d9bf6..28bbb354324d6 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -121,11 +121,15 @@ int hv_synic_alloc(void)
 				(void *)get_zeroed_page(GFP_ATOMIC);
 			if (hv_cpu->synic_event_page == NULL) {
 				pr_err("Unable to allocate SYNIC event page\n");
+
+				free_page((unsigned long)hv_cpu->synic_message_page);
+				hv_cpu->synic_message_page = NULL;
 				goto err;
 			}
 		}
 
-		if (hv_isolation_type_en_snp()) {
+		if (!ms_hyperv.paravisor_present &&
+		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)
 				hv_cpu->synic_message_page, 1);
 			if (ret) {
@@ -174,7 +178,8 @@ void hv_synic_free(void)
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		/* It's better to leak the page if the encryption fails. */
-		if (hv_isolation_type_en_snp()) {
+		if (!ms_hyperv.paravisor_present &&
+		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
 			if (hv_cpu->synic_message_page) {
 				ret = set_memory_encrypted((unsigned long)
 					hv_cpu->synic_message_page, 1);
@@ -232,6 +237,10 @@ void hv_synic_enable_regs(unsigned int cpu)
 	} else {
 		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
 			>> HV_HYP_PAGE_SHIFT;
+
+		if (hv_isolation_type_tdx())
+			simp.base_simp_gpa |= ms_hyperv.shared_gpa_boundary
+				>> HV_HYP_PAGE_SHIFT;
 	}
 
 	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
@@ -251,6 +260,10 @@ void hv_synic_enable_regs(unsigned int cpu)
 	} else {
 		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
 			>> HV_HYP_PAGE_SHIFT;
+
+		if (hv_isolation_type_tdx())
+			siefp.base_siefp_gpa |= ms_hyperv.shared_gpa_boundary
+				>> HV_HYP_PAGE_SHIFT;
 	}
 
 	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/9] x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
                   ` (2 preceding siblings ...)
  2023-08-20 20:27 ` [PATCH v2 3/9] Drivers: hv: vmbus: Support " Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM Dexuan Cui
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

When a fully enlightened TDX guest runs on Hyper-V, the UEFI firmware sets
the HW_REDUCED flag and consequently ttyS0 interrupts can't work. Fix the
issue by overriding x86_init.acpi.reduced_hw_early_init().

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2: None

 arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 2282f91716100..14baa288b1935 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -323,6 +323,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
 }
 #endif
 
+/*
+ * When a fully enlightened TDX VM runs on Hyper-V, the firmware sets the
+ * HW_REDUCED flag: refer to acpi_tb_create_local_fadt(). Consequently ttyS0
+ * interrupts can't work because request_irq() -> ... -> irq_to_desc() returns
+ * NULL for ttyS0. This happens because mp_config_acpi_legacy_irqs() sees a
+ * nr_legacy_irqs() of 0, so it doesn't initialize the array 'mp_irqs[]', and
+ * later setup_IO_APIC_irqs() -> find_irq_entry() fails to find the legacy irqs
+ * from the array and hence doesn't create the necessary irq description info.
+ *
+ * Clone arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() here,
+ * except don't change 'legacy_pic', which keeps its default value
+ * 'default_legacy_pic'. This way, mp_config_acpi_legacy_irqs() sees a non-zero
+ * nr_legacy_irqs() and eventually serial console interrupts works properly.
+ */
+static void __init reduced_hw_init(void)
+{
+	x86_init.timers.timer_init	= x86_init_noop;
+	x86_init.irqs.pre_vector_init	= x86_init_noop;
+}
+
 static void __init ms_hyperv_init_platform(void)
 {
 	int hv_max_functions_eax;
@@ -441,6 +461,8 @@ static void __init ms_hyperv_init_platform(void)
 
 				/* Don't trust Hyper-V's TLB-flushing hypercalls. */
 				ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+
+				x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
 			}
 		}
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
                   ` (3 preceding siblings ...)
  2023-08-20 20:27 ` [PATCH v2 4/9] x86/hyperv: Fix serial console interrupts for " Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-21 14:29   ` Michael Kelley (LINUX)
  2023-08-20 20:27 ` [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present Dexuan Cui
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

Don't set *this_cpu_ptr(hyperv_pcpu_input_arg) before the function
set_memory_decrypted() returns, otherwise we run into this ticky issue:

For a fully enlightened TDX/SNP VM, in hv_common_cpu_init(),
*this_cpu_ptr(hyperv_pcpu_input_arg) is an encrypted page before
the set_memory_decrypted() returns.

When such a VM has more than 64 VPs, if the hyperv_pcpu_input_arg is not
NULL, hv_common_cpu_init() -> set_memory_decrypted() -> ... ->
cpa_flush() -> on_each_cpu() -> ... -> hv_send_ipi_mask() -> ... ->
__send_ipi_mask_ex() tries to call hv_do_fast_hypercall16() with the
hyperv_pcpu_input_arg as the hypercall input page, which must be a
decrypted page in such a VM, but the page is still encrypted at this
point, and a fatal fault is triggered.

Fix the issue by setting *this_cpu_ptr(hyperv_pcpu_input_arg) after
set_memory_decrypted(): if the hyperv_pcpu_input_arg is NULL,
__send_ipi_mask_ex() returns HV_STATUS_INVALID_PARAMETER immediately,
and hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
which can use x2apic_send_IPI_all(), which may be slightly slower than
the hypercall but still works correctly in such a VM.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2: None

 drivers/hv/hv_common.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 897bbb96f4118..4c858e1636da7 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -360,6 +360,7 @@ int hv_common_cpu_init(unsigned int cpu)
 	u64 msr_vp_index;
 	gfp_t flags;
 	int pgcount = hv_root_partition ? 2 : 1;
+	void *mem;
 	int ret;
 
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
@@ -372,25 +373,40 @@ int hv_common_cpu_init(unsigned int cpu)
 	 * allocated if this CPU was previously online and then taken offline
 	 */
 	if (!*inputarg) {
-		*inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
-		if (!(*inputarg))
+		mem = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
+		if (!mem)
 			return -ENOMEM;
 
 		if (hv_root_partition) {
 			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
-			*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
+			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
 		}
 
 		if (hv_isolation_type_en_snp() || hv_isolation_type_tdx()) {
-			ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+			ret = set_memory_decrypted((unsigned long)mem, pgcount);
 			if (ret) {
-				/* It may be unsafe to free *inputarg */
-				*inputarg = NULL;
+				/* It may be unsafe to free 'mem' */
 				return ret;
 			}
 
-			memset(*inputarg, 0x00, pgcount * PAGE_SIZE);
+			memset(mem, 0x00, pgcount * HV_HYP_PAGE_SIZE);
 		}
+
+		/*
+		 * In a fully enlightened TDX/SNP VM with more than 64 VPs, if
+		 * hyperv_pcpu_input_arg is not NULL, set_memory_decrypted() ->
+		 * ... -> cpa_flush()-> ... -> __send_ipi_mask_ex() tries to
+		 * use hyperv_pcpu_input_arg as the hypercall input page, which
+		 * must be a decrypted page in such a VM, but the page is still
+		 * encrypted before set_memory_decrypted() returns. Fix this by
+		 * setting *inputarg after the above set_memory_decrypted(): if
+		 * hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns
+		 * HV_STATUS_INVALID_PARAMETER immediately, and the function
+		 * hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
+		 * which may be slightly slower than the hypercall, but still
+		 * works correctly in such a VM.
+		 */
+		*inputarg = mem;
 	}
 
 	msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
                   ` (4 preceding siblings ...)
  2023-08-20 20:27 ` [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-21 19:33   ` Michael Kelley (LINUX)
  2023-08-20 20:27 ` [PATCH v2 7/9] Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor Dexuan Cui
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

The new variable hyperv_paravisor_present is set only when the VM
is a SNP/TDX with the paravisor running: see ms_hyperv_init_platform().

In many places, hyperv_paravisor_present can replace
ms_hyperv.paravisor_present, and it's also used to replace
hv_isolation_type_snp() in drivers/hv/hv.c.

Call hv_vtom_init() when it's a VBS VM or when hyperv_paravisor_present
is true (i.e. the VM is a SNP/TDX VM with the paravisor).

Enhance hv_vtom_init() for a TDX VM with the paravisor.

The biggest motive to introduce hyperv_paravisor_present is that we
can not use ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h:
that would introduce a complicated header file dependency issue.

In arch/x86/include/asm/mshyperv.h, _hv_do_fast_hypercall8()
is changed to specially handle HVCALL_SIGNAL_EVENT for a TDX VM with the
paravisor, because such a VM must use TDX GHCI (see hv_tdx_hypercall())
for this hypercall. See vmbus_set_event() -> hv_do_fast_hypercall8() ->
_hv_do_fast_hypercall8() -> hv_tdx_hypercall().

In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg
for a TDX VM with the paravisor, just like we don't decrypt the page
for a SNP VM with the paravisor.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2: None

 arch/x86/hyperv/hv_apic.c       |  4 ++--
 arch/x86/hyperv/hv_init.c       |  4 ++--
 arch/x86/hyperv/ivm.c           | 20 ++++++++++++++++++--
 arch/x86/include/asm/mshyperv.h |  9 ++++++---
 arch/x86/kernel/cpu/mshyperv.c  | 13 ++++++++++---
 drivers/hv/connection.c         |  2 +-
 drivers/hv/hv.c                 | 12 ++++++------
 drivers/hv/hv_common.c          |  6 +++++-
 include/asm-generic/mshyperv.h  |  1 +
 9 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index cb7429046d18d..8958836500d01 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -179,7 +179,7 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
 
 	/* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
 	if (!hv_hypercall_pg) {
-		if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
+		if (hyperv_paravisor_present || !hv_isolation_type_tdx())
 			return false;
 	}
 
@@ -239,7 +239,7 @@ static bool __send_ipi_one(int cpu, int vector)
 
 	/* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
 	if (!hv_hypercall_pg) {
-		if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
+		if (hyperv_paravisor_present || !hv_isolation_type_tdx())
 			return false;
 	}
 
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index c1c1b4e1502f0..933a53ef81197 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -658,8 +658,8 @@ bool hv_is_hyperv_initialized(void)
 	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
 		return false;
 
-	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
-	if (hv_isolation_type_tdx())
+	/* A TDX VM with no paravisor uses TDX GHCI call rather than hv_hypercall_pg */
+	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
 		return true;
 	/*
 	 * Verify that earlier initialization succeeded by checking
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 6c7598d9e68a3..920ecb85802b8 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -497,13 +497,29 @@ int hv_snp_boot_ap(int cpu, unsigned long start_ip)
 
 void __init hv_vtom_init(void)
 {
+	enum hv_isolation_type type = hv_get_isolation_type();
 	/*
 	 * By design, a VM using vTOM doesn't see the SEV setting,
 	 * so SEV initialization is bypassed and sev_status isn't set.
 	 * Set it here to indicate a vTOM VM.
 	 */
-	sev_status = MSR_AMD64_SNP_VTOM;
-	cc_vendor = CC_VENDOR_AMD;
+	switch (type) {
+	case HV_ISOLATION_TYPE_VBS:
+		fallthrough;
+
+	case HV_ISOLATION_TYPE_SNP:
+		sev_status = MSR_AMD64_SNP_VTOM;
+		cc_vendor = CC_VENDOR_AMD;
+		break;
+
+	case HV_ISOLATION_TYPE_TDX:
+		cc_vendor = CC_VENDOR_INTEL;
+		break;
+
+	default:
+		panic("hv_vtom_init: unsupported isolation type %d\n", type);
+	}
+
 	cc_set_mask(ms_hyperv.shared_gpa_boundary);
 	physical_mask &= ms_hyperv.shared_gpa_boundary - 1;
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 24d7f662a8beb..2a4c7dcf64038 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -43,6 +43,7 @@ static inline unsigned char hv_get_nmi_reason(void)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 extern int hyperv_init_cpuhp;
+extern bool hyperv_paravisor_present;
 
 extern void *hv_hypercall_pg;
 
@@ -76,7 +77,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	if (hv_isolation_type_tdx())
+	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
 		return hv_tdx_hypercall(control,
 					cc_mkdec(input_address),
 					cc_mkdec(output_address));
@@ -134,7 +135,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	if (hv_isolation_type_tdx())
+	if (hv_isolation_type_tdx() &&
+		(!hyperv_paravisor_present ||
+		 control == (HVCALL_SIGNAL_EVENT | HV_HYPERCALL_FAST_BIT)))
 		return hv_tdx_hypercall(control, input1, 0);
 
 	if (hv_isolation_type_en_snp()) {
@@ -188,7 +191,7 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	if (hv_isolation_type_tdx())
+	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
 		return hv_tdx_hypercall(control, input1, input2);
 
 	if (hv_isolation_type_en_snp()) {
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 14baa288b1935..3dff2ef43bc73 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -40,6 +40,12 @@ bool hv_root_partition;
 bool hv_nested;
 struct ms_hyperv_info ms_hyperv;
 
+/*
+ * Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h.
+ * Exported in drivers/hv/hv_common.c to not break the ARM64 build.
+ */
+bool hyperv_paravisor_present __ro_after_init;
+
 #if IS_ENABLED(CONFIG_HYPERV)
 static inline unsigned int hv_get_nested_reg(unsigned int reg)
 {
@@ -429,6 +435,8 @@ static void __init ms_hyperv_init_platform(void)
 			ms_hyperv.shared_gpa_boundary =
 				BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
 
+		hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
+
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
 
@@ -443,7 +451,7 @@ static void __init ms_hyperv_init_platform(void)
 			/* A TDX VM must use x2APIC and doesn't use lazy EOI. */
 			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
 
-			if (!ms_hyperv.paravisor_present) {
+			if (!hyperv_paravisor_present) {
 				/*
 				 * The ms_hyperv.shared_gpa_boundary_active in
 				 * a fully enlightened TDX VM is 0, but the GPAs
@@ -534,8 +542,7 @@ static void __init ms_hyperv_init_platform(void)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
-	    ((hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) &&
-	    ms_hyperv.paravisor_present))
+	    hyperv_paravisor_present)
 		hv_vtom_init();
 	/*
 	 * Setup the hook to get control post apic initialization.
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 02b54f85dc607..7f64fc942323b 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -484,7 +484,7 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
 	++channel->sig_events;
 
-	if (hv_isolation_type_snp())
+	if (hv_isolation_type_snp() && hyperv_paravisor_present)
 		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
 				NULL, sizeof(channel->sig_event));
 	else
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 28bbb354324d6..20bc44923e4f0 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -109,7 +109,7 @@ int hv_synic_alloc(void)
 		 * Synic message and event pages are allocated by paravisor.
 		 * Skip these pages allocation here.
 		 */
-		if (!hv_isolation_type_snp() && !hv_root_partition) {
+		if (!hyperv_paravisor_present && !hv_root_partition) {
 			hv_cpu->synic_message_page =
 				(void *)get_zeroed_page(GFP_ATOMIC);
 			if (hv_cpu->synic_message_page == NULL) {
@@ -128,7 +128,7 @@ int hv_synic_alloc(void)
 			}
 		}
 
-		if (!ms_hyperv.paravisor_present &&
+		if (!hyperv_paravisor_present &&
 		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)
 				hv_cpu->synic_message_page, 1);
@@ -226,7 +226,7 @@ void hv_synic_enable_regs(unsigned int cpu)
 	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
 	simp.simp_enabled = 1;
 
-	if (hv_isolation_type_snp() || hv_root_partition) {
+	if (hyperv_paravisor_present || hv_root_partition) {
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
@@ -249,7 +249,7 @@ void hv_synic_enable_regs(unsigned int cpu)
 	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
 	siefp.siefp_enabled = 1;
 
-	if (hv_isolation_type_snp() || hv_root_partition) {
+	if (hyperv_paravisor_present || hv_root_partition) {
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
@@ -336,7 +336,7 @@ void hv_synic_disable_regs(unsigned int cpu)
 	 * addresses.
 	 */
 	simp.simp_enabled = 0;
-	if (hv_isolation_type_snp() || hv_root_partition) {
+	if (hyperv_paravisor_present || hv_root_partition) {
 		iounmap(hv_cpu->synic_message_page);
 		hv_cpu->synic_message_page = NULL;
 	} else {
@@ -348,7 +348,7 @@ void hv_synic_disable_regs(unsigned int cpu)
 	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
 	siefp.siefp_enabled = 0;
 
-	if (hv_isolation_type_snp() || hv_root_partition) {
+	if (hyperv_paravisor_present || hv_root_partition) {
 		iounmap(hv_cpu->synic_event_page);
 		hv_cpu->synic_event_page = NULL;
 	} else {
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 4c858e1636da7..c0b0ac44ffa3c 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -40,6 +40,9 @@
 bool __weak hv_root_partition;
 EXPORT_SYMBOL_GPL(hv_root_partition);
 
+bool __weak hyperv_paravisor_present;
+EXPORT_SYMBOL_GPL(hyperv_paravisor_present);
+
 bool __weak hv_nested;
 EXPORT_SYMBOL_GPL(hv_nested);
 
@@ -382,7 +385,8 @@ int hv_common_cpu_init(unsigned int cpu)
 			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
 		}
 
-		if (hv_isolation_type_en_snp() || hv_isolation_type_tdx()) {
+		if (!hyperv_paravisor_present &&
+		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)mem, pgcount);
 			if (ret) {
 				/* It may be unsafe to free 'mem' */
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index f577eff58ea0b..94f87a0344590 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -176,6 +176,7 @@ extern int vmbus_interrupt;
 extern int vmbus_irq;
 
 extern bool hv_root_partition;
+extern bool hyperv_paravisor_present;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 7/9] Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
                   ` (5 preceding siblings ...)
  2023-08-20 20:27 ` [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 8/9] x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM " Dexuan Cui
  2023-08-20 20:27 ` [PATCH v2 9/9] x86/hyperv: Remove hv_isolation_type_en_snp Dexuan Cui
  8 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

The post_msg_page was removed in
commit 9a6b1a170ca8 ("Drivers: hv: vmbus: Remove the per-CPU post_msg_page")

However, it turns out that we need to bring it back, but only for a TDX VM
with the paravisor: in such a VM, the hyperv_pcpu_input_arg is not decrypted,
but the HVCALL_POST_MESSAGE in such a VM needs a decrypted page as the
hypercall input page: see the comments in hyperv_init() for a detailed
explanation.

Except for HVCALL_POST_MESSAGE and HVCALL_SIGNAL_EVENT, the other hypercalls
in a TDX VM with the paravisor still use hv_hypercall_pg and must use the
hyperv_pcpu_input_arg (which is encrypted in such a VM), when a hypercall
input page is used.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2: None

 arch/x86/hyperv/hv_init.c | 20 +++++++++++--
 drivers/hv/hv.c           | 63 ++++++++++++++++++++++++++++++++++-----
 drivers/hv/hyperv_vmbus.h | 11 +++++++
 3 files changed, 85 insertions(+), 9 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 933a53ef81197..892e52afa37cd 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -480,6 +480,22 @@ void __init hyperv_init(void)
 	 * Setup the hypercall page and enable hypercalls.
 	 * 1. Register the guest ID
 	 * 2. Enable the hypercall and register the hypercall page
+	 *
+	 * A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
+	 * when the hypercall input is a page, such a VM must pass a decrypted
+	 * page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
+	 * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
+	 *
+	 * A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
+	 * which are handled by the paravisor and the VM must use an encrypted
+	 * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
+	 * used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
+	 * hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
+	 * 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
+	 * 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
+	 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
+	 * instead, hv_post_message() uses the post_msg_page, which is decrypted
+	 * in such a VM and is only used in such a VM.
 	 */
 	guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
@@ -487,8 +503,8 @@ void __init hyperv_init(void)
 	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
 	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
-	if (hv_isolation_type_tdx())
+	/* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg */
+	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
 		goto skip_hypercall_pg_init;
 
 	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 20bc44923e4f0..6b5f1805d4749 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -57,20 +57,39 @@ int hv_post_message(union hv_connection_id connection_id,
 
 	local_irq_save(flags);
 
-	aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	/*
+	 * A TDX VM with the paravisor must use the decrypted post_msg_page: see
+	 * the comment in struct hv_per_cpu_context. A SNP VM with the paravisor
+	 * can use the encrypted hyperv_pcpu_input_arg because it copies the
+	 * input into the GHCB page, which has been decrypted by the paravisor.
+	 */
+	if (hv_isolation_type_tdx() && hyperv_paravisor_present)
+		aligned_msg = this_cpu_ptr(hv_context.cpu_context)->post_msg_page;
+	else
+		aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
 	aligned_msg->connectionid = connection_id;
 	aligned_msg->reserved = 0;
 	aligned_msg->message_type = message_type;
 	aligned_msg->payload_size = payload_size;
 	memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-	if (hv_isolation_type_snp())
-		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
-				(void *)aligned_msg, NULL,
-				sizeof(*aligned_msg));
-	else
+	if (hyperv_paravisor_present) {
+		if (hv_isolation_type_tdx()) {
+			u64 gpa_boundary = ms_hyperv.shared_gpa_boundary;
+			u64 in = virt_to_phys(aligned_msg) | gpa_boundary;
+
+			status = hv_tdx_hypercall(HVCALL_POST_MESSAGE, in, 0);
+		} else if (hv_isolation_type_snp())
+			status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+						   aligned_msg, NULL,
+						   sizeof(*aligned_msg));
+		else
+			status = HV_STATUS_INVALID_PARAMETER;
+	} else {
 		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
 				aligned_msg, NULL);
+	}
 
 	local_irq_restore(flags);
 
@@ -105,6 +124,24 @@ int hv_synic_alloc(void)
 		tasklet_init(&hv_cpu->msg_dpc,
 			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
 
+		if (hyperv_paravisor_present && hv_isolation_type_tdx()) {
+			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
+			if (hv_cpu->post_msg_page == NULL) {
+				pr_err("Unable to allocate post msg page\n");
+				goto err;
+			}
+
+			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt post msg page: %d\n", ret);
+				/* Just leak the page, as it's unsafe to free the page. */
+				hv_cpu->post_msg_page = NULL;
+				goto err;
+			}
+
+			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
+		}
+
 		/*
 		 * Synic message and event pages are allocated by paravisor.
 		 * Skip these pages allocation here.
@@ -178,7 +215,18 @@ void hv_synic_free(void)
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		/* It's better to leak the page if the encryption fails. */
-		if (!ms_hyperv.paravisor_present &&
+		if (hyperv_paravisor_present && hv_isolation_type_tdx()) {
+			if (hv_cpu->post_msg_page) {
+				ret = set_memory_encrypted((unsigned long)
+					hv_cpu->post_msg_page, 1);
+				if (ret) {
+					pr_err("Failed to encrypt post msg page: %d\n", ret);
+					hv_cpu->post_msg_page = NULL;
+				}
+			}
+		}
+
+		if (!hyperv_paravisor_present &&
 		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
 			if (hv_cpu->synic_message_page) {
 				ret = set_memory_encrypted((unsigned long)
@@ -199,6 +247,7 @@ void hv_synic_free(void)
 			}
 		}
 
+		free_page((unsigned long)hv_cpu->post_msg_page);
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
 	}
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 55f2086841ae4..f6b1e710f8055 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -123,6 +123,17 @@ struct hv_per_cpu_context {
 	void *synic_message_page;
 	void *synic_event_page;
 
+	/*
+	 * The page is only used in hv_post_message() for a TDX VM (with the
+	 * paravisor) to post a messages to Hyper-V: when such a VM calls
+	 * HVCALL_POST_MESSAGE, it can't use the hyperv_pcpu_input_arg (which
+	 * is encrypted in such a VM) as the hypercall input page, because
+	 * the input page for HVCALL_POST_MESSAGE must be decrypted in such a
+	 * VM, so post_msg_page (which is decrypted in hv_synic_alloc()) is
+	 * introduced for this purpose. See hyperv_init() for more comments.
+	 */
+	void *post_msg_page;
+
 	/*
 	 * Starting with win8, we can take channel interrupts on any CPU;
 	 * we will manage the tasklet that handles events messages on a per CPU
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 8/9] x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
                   ` (6 preceding siblings ...)
  2023-08-20 20:27 ` [PATCH v2 7/9] Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  2023-08-21 19:33   ` Michael Kelley (LINUX)
  2023-08-20 20:27 ` [PATCH v2 9/9] x86/hyperv: Remove hv_isolation_type_en_snp Dexuan Cui
  8 siblings, 1 reply; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

When the paravisor is present, a SNP VM must use GHCB to access some
special MSRs, including HV_X64_MSR_GUEST_OS_ID and some SynIC MSRs.

Similarly, when the paravisor is present, a TDX VM must use TDX GHCI
to access the same MSRs.

Implement hv_tdx_read_msr() and hv_tdx_write_msr(), and use the helper
functions hv_ivm_msr_read() and hv_ivm_msr_write() to access the MSRs
in a unified way for SNP/TDX VMs with the paravisor.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2: None

 arch/x86/hyperv/hv_init.c       |  8 ++--
 arch/x86/hyperv/ivm.c           | 72 +++++++++++++++++++++++++++++++--
 arch/x86/include/asm/mshyperv.h |  8 ++--
 arch/x86/kernel/cpu/mshyperv.c  |  8 ++--
 4 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 892e52afa37cd..18afbb10edc64 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -500,8 +500,8 @@ void __init hyperv_init(void)
 	guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
-	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
+	/* With the paravisor, the VM must also write the ID via GHCB/GHCI */
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
 	/* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg */
 	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
@@ -590,7 +590,7 @@ void __init hyperv_init(void)
 
 clean_guest_os_id:
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 	cpuhp_remove_state(cpuhp);
 free_ghcb_page:
 	free_percpu(hv_ghcb_pg);
@@ -611,7 +611,7 @@ void hyperv_cleanup(void)
 
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
+	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
 	/*
 	 * Reset hypercall page reference before reset the page,
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 920ecb85802b8..93d54d8ef12e1 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -186,7 +186,49 @@ bool hv_ghcb_negotiate_protocol(void)
 	return true;
 }
 
-void hv_ghcb_msr_write(u64 msr, u64 value)
+#define EXIT_REASON_MSR_READ		31
+#define EXIT_REASON_MSR_WRITE		32
+
+static void hv_tdx_read_msr(u64 msr, u64 *val)
+{
+	struct tdx_hypercall_args args = {
+		.r10 = TDX_HYPERCALL_STANDARD,
+		.r11 = EXIT_REASON_MSR_READ,
+		.r12 = msr,
+	};
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+	u64 ret = __tdx_hypercall_ret(&args);
+#else
+	u64 ret = HV_STATUS_INVALID_PARAMETER;
+#endif
+
+	if (WARN_ONCE(ret, "Failed to emulate MSR read: %lld\n", ret))
+		*val = 0;
+	else
+		*val = args.r11;
+}
+
+static void hv_tdx_write_msr(u64 msr, u64 val)
+{
+	struct tdx_hypercall_args args = {
+		.r10 = TDX_HYPERCALL_STANDARD,
+		.r11 = EXIT_REASON_MSR_WRITE,
+		.r12 = msr,
+		.r13 = val,
+	};
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+	u64 ret = __tdx_hypercall(&args);
+#else
+	u64 ret = HV_STATUS_INVALID_PARAMETER;
+	(void)args;
+#endif
+
+	WARN_ONCE(ret, "Failed to emulate MSR write: %lld\n", ret);
+}
+
+static void hv_ghcb_msr_write(u64 msr, u64 value)
 {
 	union hv_ghcb *hv_ghcb;
 	void **ghcb_base;
@@ -214,9 +256,20 @@ void hv_ghcb_msr_write(u64 msr, u64 value)
 
 	local_irq_restore(flags);
 }
-EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
 
-void hv_ghcb_msr_read(u64 msr, u64 *value)
+void hv_ivm_msr_write(u64 msr, u64 value)
+{
+	if (!hyperv_paravisor_present)
+		return;
+
+	if (hv_isolation_type_tdx())
+		hv_tdx_write_msr(msr, value);
+	else if (hv_isolation_type_snp())
+		hv_ghcb_msr_write(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_ivm_msr_write);
+
+static void hv_ghcb_msr_read(u64 msr, u64 *value)
 {
 	union hv_ghcb *hv_ghcb;
 	void **ghcb_base;
@@ -246,7 +299,18 @@ void hv_ghcb_msr_read(u64 msr, u64 *value)
 			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
 	local_irq_restore(flags);
 }
-EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
+
+void hv_ivm_msr_read(u64 msr, u64 *value)
+{
+	if (!hyperv_paravisor_present)
+		return;
+
+	if (hv_isolation_type_tdx())
+		hv_tdx_read_msr(msr, value);
+	else if (hv_isolation_type_snp())
+		hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_ivm_msr_read);
 
 /*
  * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 2a4c7dcf64038..18f569c44c39d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -280,15 +280,15 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
-void hv_ghcb_msr_write(u64 msr, u64 value);
-void hv_ghcb_msr_read(u64 msr, u64 *value);
+void hv_ivm_msr_write(u64 msr, u64 value);
+void hv_ivm_msr_read(u64 msr, u64 *value);
 bool hv_ghcb_negotiate_protocol(void);
 void __noreturn hv_ghcb_terminate(unsigned int set, unsigned int reason);
 void hv_vtom_init(void);
 int hv_snp_boot_ap(int cpu, unsigned long start_ip);
 #else
-static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
-static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
+static inline void hv_ivm_msr_write(u64 msr, u64 value) {}
+static inline void hv_ivm_msr_read(u64 msr, u64 *value) {}
 static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
 static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
 static inline void hv_vtom_init(void) {}
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 3dff2ef43bc73..a196760afa7a1 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -72,8 +72,8 @@ u64 hv_get_non_nested_register(unsigned int reg)
 {
 	u64 value;
 
-	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
-		hv_ghcb_msr_read(reg, &value);
+	if (hv_is_synic_reg(reg) && hyperv_paravisor_present)
+		hv_ivm_msr_read(reg, &value);
 	else
 		rdmsrl(reg, value);
 	return value;
@@ -82,8 +82,8 @@ EXPORT_SYMBOL_GPL(hv_get_non_nested_register);
 
 void hv_set_non_nested_register(unsigned int reg, u64 value)
 {
-	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
-		hv_ghcb_msr_write(reg, value);
+	if (hv_is_synic_reg(reg) && hyperv_paravisor_present) {
+		hv_ivm_msr_write(reg, value);
 
 		/* Write proxy bit via wrmsl instruction */
 		if (hv_is_sint_reg(reg))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 9/9] x86/hyperv: Remove hv_isolation_type_en_snp
  2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
                   ` (7 preceding siblings ...)
  2023-08-20 20:27 ` [PATCH v2 8/9] x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM " Dexuan Cui
@ 2023-08-20 20:27 ` Dexuan Cui
  8 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-20 20:27 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mikelley
  Cc: x86, linux-kernel, linux-arch, Tianyu.Lan, rick.p.edgecombe,
	andavis, mheslin, vkuznets, xiaoyao.li, Dexuan Cui

In ms_hyperv_init_platform(), do not distinguish between a SNP VM with
the paravisor and a SNP VM without the paravisor.

Replace hv_isolation_type_en_snp() with
!hyperv_paravisor_present && hv_isolation_type_snp().

The hv_isolation_type_en_snp() in drivers/hv/hv.c and
drivers/hv/hv_common.c can be changed to hv_isolation_type_snp() since
we know !hyperv_paravisor_present is true there.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2:
  Rebased to Tianyu's v7 SNP patchset: the changes are small.
    In hyperv_init_ghcb() and hyperv_init(), added the test of
hyperv_paravisor_present, which was missed in v1.
    Updated the test before the call of get_vtl().
    Updated the test in hv_do_hypercall() and friends.
    Updated the test for hv_smp_prepare_cpus().

 arch/x86/hyperv/hv_init.c       |  8 ++++----
 arch/x86/hyperv/ivm.c           | 12 +-----------
 arch/x86/include/asm/mshyperv.h | 11 ++++-------
 arch/x86/kernel/cpu/mshyperv.c  |  9 ++++-----
 drivers/hv/hv.c                 |  4 ++--
 drivers/hv/hv_common.c          |  8 +-------
 include/asm-generic/mshyperv.h  |  3 +--
 7 files changed, 17 insertions(+), 38 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 18afbb10edc64..fd79e632023e2 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -52,7 +52,7 @@ static int hyperv_init_ghcb(void)
 	void *ghcb_va;
 	void **ghcb_base;
 
-	if (!hv_isolation_type_snp())
+	if (!hyperv_paravisor_present || !hv_isolation_type_snp())
 		return 0;
 
 	if (!hv_ghcb_pg)
@@ -117,7 +117,7 @@ static int hv_cpu_init(unsigned int cpu)
 			 * is blocked to run in Confidential VM. So only decrypt assist
 			 * page in non-root partition here.
 			 */
-			if (*hvp && hv_isolation_type_en_snp()) {
+			if (*hvp && !hyperv_paravisor_present && hv_isolation_type_snp()) {
 				WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
 				memset(*hvp, 0, PAGE_SIZE);
 			}
@@ -460,7 +460,7 @@ void __init hyperv_init(void)
 			goto common_free;
 	}
 
-	if (hv_isolation_type_snp()) {
+	if (hyperv_paravisor_present && hv_isolation_type_snp()) {
 		/* Negotiate GHCB Version. */
 		if (!hv_ghcb_negotiate_protocol())
 			hv_ghcb_terminate(SEV_TERM_SET_GEN,
@@ -583,7 +583,7 @@ void __init hyperv_init(void)
 	hv_query_ext_cap(0);
 
 	/* Find the VTL */
-	if (hv_isolation_type_en_snp())
+	if (!hyperv_paravisor_present && hv_isolation_type_snp())
 		ms_hyperv.vtl = get_vtl();
 
 	return;
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 93d54d8ef12e1..7d1f553ec0017 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -624,7 +624,7 @@ bool hv_is_isolation_supported(void)
 DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
 
 /*
- * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
+ * hv_isolation_type_snp - Check if the system runs in an AMD SEV-SNP based
  * isolation VM.
  */
 bool hv_isolation_type_snp(void)
@@ -632,16 +632,6 @@ bool hv_isolation_type_snp(void)
 	return static_branch_unlikely(&isolation_type_snp);
 }
 
-DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
-/*
- * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
- * isolation enlightened VM.
- */
-bool hv_isolation_type_en_snp(void)
-{
-	return static_branch_unlikely(&isolation_type_en_snp);
-}
-
 DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
 /*
  * hv_isolation_type_tdx - Check if the system runs in an Intel TDX based
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 18f569c44c39d..f0b3782884d22 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -27,7 +27,6 @@
 union hv_ghcb;
 
 DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
-DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
 DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);
 
 typedef int (*hyperv_fill_flush_list_func)(
@@ -51,7 +50,7 @@ extern u64 hv_current_partition_id;
 
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
-extern bool hv_isolation_type_en_snp(void);
+bool hv_isolation_type_snp(void);
 bool hv_isolation_type_tdx(void);
 u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
 
@@ -82,7 +81,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 					cc_mkdec(input_address),
 					cc_mkdec(output_address));
 
-	if (hv_isolation_type_en_snp()) {
+	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     "vmmcall"
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -140,7 +139,7 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 		 control == (HVCALL_SIGNAL_EVENT | HV_HYPERCALL_FAST_BIT)))
 		return hv_tdx_hypercall(control, input1, 0);
 
-	if (hv_isolation_type_en_snp()) {
+	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
 		__asm__ __volatile__(
 				"vmmcall"
 				: "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -194,7 +193,7 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
 		return hv_tdx_hypercall(control, input1, input2);
 
-	if (hv_isolation_type_en_snp()) {
+	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     "vmmcall"
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -295,8 +294,6 @@ static inline void hv_vtom_init(void) {}
 static int hv_snp_boot_ap(int cpu, unsigned long start_ip) { return 0; }
 #endif
 
-extern bool hv_isolation_type_snp(void);
-
 static inline bool hv_is_synic_reg(unsigned int reg)
 {
 	return (reg >= HV_REGISTER_SCONTROL) &&
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index a196760afa7a1..c98a75ae948e4 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -306,7 +306,7 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
 	 *  Override wakeup_secondary_cpu_64 callback for SEV-SNP
 	 *  enlightened guest.
 	 */
-	if (hv_isolation_type_en_snp()) {
+	if (!hyperv_paravisor_present && hv_isolation_type_snp()) {
 		apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
 		return;
 	}
@@ -441,9 +441,7 @@ static void __init ms_hyperv_init_platform(void)
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
 
 
-		if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
-			static_branch_enable(&isolation_type_en_snp);
-		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
+		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
 		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
 			static_branch_enable(&isolation_type_tdx);
@@ -566,7 +564,8 @@ static void __init ms_hyperv_init_platform(void)
 
 # ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
-	if (hv_root_partition || hv_isolation_type_en_snp())
+	if (hv_root_partition ||
+	    (!hyperv_paravisor_present && hv_isolation_type_snp()))
 		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
 # endif
 
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 6b5f1805d4749..932b8bc239acb 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -166,7 +166,7 @@ int hv_synic_alloc(void)
 		}
 
 		if (!hyperv_paravisor_present &&
-		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
+		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)
 				hv_cpu->synic_message_page, 1);
 			if (ret) {
@@ -227,7 +227,7 @@ void hv_synic_free(void)
 		}
 
 		if (!hyperv_paravisor_present &&
-		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
+		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			if (hv_cpu->synic_message_page) {
 				ret = set_memory_encrypted((unsigned long)
 					hv_cpu->synic_message_page, 1);
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index c0b0ac44ffa3c..d3f95a1be1e99 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -386,7 +386,7 @@ int hv_common_cpu_init(unsigned int cpu)
 		}
 
 		if (!hyperv_paravisor_present &&
-		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
+		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)mem, pgcount);
 			if (ret) {
 				/* It may be unsafe to free 'mem' */
@@ -535,12 +535,6 @@ bool __weak hv_isolation_type_snp(void)
 }
 EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
 
-bool __weak hv_isolation_type_en_snp(void)
-{
-	return false;
-}
-EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
-
 bool __weak hv_isolation_type_tdx(void)
 {
 	return false;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 94f87a0344590..ac271f3b4091c 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -64,8 +64,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
 
 extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
-extern bool hv_isolation_type_snp(void);
-extern bool hv_isolation_type_en_snp(void);
+bool hv_isolation_type_snp(void);
 bool hv_isolation_type_tdx(void);
 
 /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
  2023-08-20 20:27 ` [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM Dexuan Cui
@ 2023-08-21 14:29   ` Michael Kelley (LINUX)
  2023-08-21 18:17     ` Dexuan Cui
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-21 14:29 UTC (permalink / raw)
  To: Dexuan Cui, ak@linux.intel.com, arnd@arndb.de, bp@alien8.de,
	brijesh.singh@amd.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, jane.chu@oracle.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan,
	linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
	tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
	jason, nik.borisov@suse.com
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, Tianyu Lan,
	rick.p.edgecombe@intel.com, Anthony Davis, Mark Heslin, vkuznets,
	xiaoyao.li@intel.com

From: Dexuan Cui <decui@microsoft.com> Sent: Sunday, August 20, 2023 1:27 PM
> 
> Don't set *this_cpu_ptr(hyperv_pcpu_input_arg) before the function
> set_memory_decrypted() returns, otherwise we run into this ticky issue:
> 
> For a fully enlightened TDX/SNP VM, in hv_common_cpu_init(),
> *this_cpu_ptr(hyperv_pcpu_input_arg) is an encrypted page before
> the set_memory_decrypted() returns.
> 
> When such a VM has more than 64 VPs, if the hyperv_pcpu_input_arg is not
> NULL, hv_common_cpu_init() -> set_memory_decrypted() -> ... ->
> cpa_flush() -> on_each_cpu() -> ... -> hv_send_ipi_mask() -> ... ->
> __send_ipi_mask_ex() tries to call hv_do_fast_hypercall16() with the

Actually, it tries to call hv_do_rep_hypercall().  Using the memory-based
hyperv_pcpu_input_arg with a "fast" hypercall doesn't make sense.

> hyperv_pcpu_input_arg as the hypercall input page, which must be a
> decrypted page in such a VM, but the page is still encrypted at this
> point, and a fatal fault is triggered.
> 
> Fix the issue by setting *this_cpu_ptr(hyperv_pcpu_input_arg) after
> set_memory_decrypted(): if the hyperv_pcpu_input_arg is NULL,
> __send_ipi_mask_ex() returns HV_STATUS_INVALID_PARAMETER immediately,
> and hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
> which can use x2apic_send_IPI_all(), which may be slightly slower than
> the hypercall but still works correctly in such a VM.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes in v2: None
> 
>  drivers/hv/hv_common.c | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 897bbb96f4118..4c858e1636da7 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -360,6 +360,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  	u64 msr_vp_index;
>  	gfp_t flags;
>  	int pgcount = hv_root_partition ? 2 : 1;
> +	void *mem;
>  	int ret;
> 
>  	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> @@ -372,25 +373,40 @@ int hv_common_cpu_init(unsigned int cpu)
>  	 * allocated if this CPU was previously online and then taken offline
>  	 */
>  	if (!*inputarg) {
> -		*inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> -		if (!(*inputarg))
> +		mem = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> +		if (!mem)
>  			return -ENOMEM;
> 
>  		if (hv_root_partition) {
>  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> -			*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> +			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>  		}
> 
>  		if (hv_isolation_type_en_snp() || hv_isolation_type_tdx()) {
> -			ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
> +			ret = set_memory_decrypted((unsigned long)mem, pgcount);
>  			if (ret) {
> -				/* It may be unsafe to free *inputarg */
> -				*inputarg = NULL;
> +				/* It may be unsafe to free 'mem' */
>  				return ret;
>  			}
> 
> -			memset(*inputarg, 0x00, pgcount * PAGE_SIZE);
> +			memset(mem, 0x00, pgcount * HV_HYP_PAGE_SIZE);
>  		}
> +
> +		/*
> +		 * In a fully enlightened TDX/SNP VM with more than 64 VPs, if
> +		 * hyperv_pcpu_input_arg is not NULL, set_memory_decrypted() ->
> +		 * ... -> cpa_flush()-> ... -> __send_ipi_mask_ex() tries to
> +		 * use hyperv_pcpu_input_arg as the hypercall input page, which
> +		 * must be a decrypted page in such a VM, but the page is still
> +		 * encrypted before set_memory_decrypted() returns. Fix this by
> +		 * setting *inputarg after the above set_memory_decrypted(): if
> +		 * hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns
> +		 * HV_STATUS_INVALID_PARAMETER immediately, and the function
> +		 * hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
> +		 * which may be slightly slower than the hypercall, but still
> +		 * works correctly in such a VM.
> +		 */
> +		*inputarg = mem;
>  	}
> 
>  	msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX);
> --
> 2.25.1

Modulo the minor correction in the commit message,

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
  2023-08-21 14:29   ` Michael Kelley (LINUX)
@ 2023-08-21 18:17     ` Dexuan Cui
  0 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-21 18:17 UTC (permalink / raw)
  To: Michael Kelley (LINUX), ak@linux.intel.com, arnd@arndb.de,
	bp@alien8.de, brijesh.singh@amd.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, jane.chu@oracle.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan,
	linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
	tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
	jason, nik.borisov@suse.com
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, Tianyu Lan,
	rick.p.edgecombe@intel.com, Anthony Davis, Mark Heslin, vkuznets,
	xiaoyao.li@intel.com

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Monday, August 21, 2023 7:30 AM
> [...]
> From: Dexuan Cui <decui@microsoft.com> Sent: Sunday, August 20, 2023
> 1:27 PM
> > [...]
> > When such a VM has more than 64 VPs, if the hyperv_pcpu_input_arg is
> > not NULL, hv_common_cpu_init() -> set_memory_decrypted() -> ... ->
> > cpa_flush() -> on_each_cpu() -> ... -> hv_send_ipi_mask() -> ... ->
> > __send_ipi_mask_ex() tries to call hv_do_fast_hypercall16() with the

s/hv_do_rep_hypercall/hv_do_rep_hypercall

> Actually, it tries to call hv_do_rep_hypercall().  Using the memory-based
> hyperv_pcpu_input_arg with a "fast" hypercall doesn't make sense.

Thanks for spotting the mistake!

> > hyperv_pcpu_input_arg as the hypercall input page, which must be a
> > decrypted page in such a VM, but the page is still encrypted at this
> > point, and a fatal fault is triggered.
> >
> Modulo the minor correction in the commit message,
> 
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present
  2023-08-20 20:27 ` [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present Dexuan Cui
@ 2023-08-21 19:33   ` Michael Kelley (LINUX)
  2023-08-23  4:23     ` Dexuan Cui
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-21 19:33 UTC (permalink / raw)
  To: Dexuan Cui, ak@linux.intel.com, arnd@arndb.de, bp@alien8.de,
	brijesh.singh@amd.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, jane.chu@oracle.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan,
	linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
	tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
	jason, nik.borisov@suse.com
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, Tianyu Lan,
	rick.p.edgecombe@intel.com, Anthony Davis, Mark Heslin, vkuznets,
	xiaoyao.li@intel.com

From: Dexuan Cui <decui@microsoft.com> Sent: Sunday, August 20, 2023 1:27 PM
> 
> The new variable hyperv_paravisor_present is set only when the VM
> is a SNP/TDX with the paravisor running: see ms_hyperv_init_platform().
> 
> In many places, hyperv_paravisor_present can replace

You said "In many places".  Are there places where it can't replace
ms_hyperv.paravisor_present?  It looks like all the uses are gone
after this patch.

> ms_hyperv.paravisor_present, and it's also used to replace
> hv_isolation_type_snp() in drivers/hv/hv.c.
> 
> Call hv_vtom_init() when it's a VBS VM or when hyperv_paravisor_present
> is true (i.e. the VM is a SNP/TDX VM with the paravisor).
> 
> Enhance hv_vtom_init() for a TDX VM with the paravisor.
> 
> The biggest motive to introduce hyperv_paravisor_present is that we
> can not use ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h:
> that would introduce a complicated header file dependency issue.

The discussion in this commit messages about hyperv_paravisor_present
is a bit scattered and confusing.  I think you are introducing the global variable
to solve the header file dependency issue.  Otherwise, the ms_hyperv field
would be equivalent.  Then you are using hyperv_paravisor_present for
several purposes, including to decide whether to call hv_vtom_init() and
to simplify the logic in drivers/hv/hv.c.  Maybe you could reorganize
the commit message a bit to be more direct regarding the purpose.

> 
> In arch/x86/include/asm/mshyperv.h, _hv_do_fast_hypercall8()
> is changed to specially handle HVCALL_SIGNAL_EVENT for a TDX VM with the
> paravisor, because such a VM must use TDX GHCI (see hv_tdx_hypercall())
> for this hypercall. See vmbus_set_event() -> hv_do_fast_hypercall8() ->
> _hv_do_fast_hypercall8() -> hv_tdx_hypercall().

Embedding the special case for HVCALL_SIGNAL_EVENT within
hv_do_fast_hypercall8() is not consistent with how this special case
is handled for SNP.  For SNP, the special case is coded directly into
vmbus_set_event().  Any reason not to do the same for TDX + paravisor?

> 
> In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg
> for a TDX VM with the paravisor, just like we don't decrypt the page
> for a SNP VM with the paravisor.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes in v2: None
> 
>  arch/x86/hyperv/hv_apic.c       |  4 ++--
>  arch/x86/hyperv/hv_init.c       |  4 ++--
>  arch/x86/hyperv/ivm.c           | 20 ++++++++++++++++++--
>  arch/x86/include/asm/mshyperv.h |  9 ++++++---
>  arch/x86/kernel/cpu/mshyperv.c  | 13 ++++++++++---
>  drivers/hv/connection.c         |  2 +-
>  drivers/hv/hv.c                 | 12 ++++++------
>  drivers/hv/hv_common.c          |  6 +++++-
>  include/asm-generic/mshyperv.h  |  1 +
>  9 files changed, 51 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index cb7429046d18d..8958836500d01 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -179,7 +179,7 @@ static bool __send_ipi_mask(const struct cpumask *mask, int
> vector,
> 
>  	/* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
>  	if (!hv_hypercall_pg) {
> -		if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
> +		if (hyperv_paravisor_present || !hv_isolation_type_tdx())
>  			return false;
>  	}
> 
> @@ -239,7 +239,7 @@ static bool __send_ipi_one(int cpu, int vector)
> 
>  	/* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
>  	if (!hv_hypercall_pg) {
> -		if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
> +		if (hyperv_paravisor_present || !hv_isolation_type_tdx())
>  			return false;
>  	}
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index c1c1b4e1502f0..933a53ef81197 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -658,8 +658,8 @@ bool hv_is_hyperv_initialized(void)
>  	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>  		return false;
> 
> -	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> -	if (hv_isolation_type_tdx())
> +	/* A TDX VM with no paravisor uses TDX GHCI call rather than hv_hypercall_pg */
> +	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
>  		return true;
>  	/*
>  	 * Verify that earlier initialization succeeded by checking
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 6c7598d9e68a3..920ecb85802b8 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -497,13 +497,29 @@ int hv_snp_boot_ap(int cpu, unsigned long start_ip)
> 
>  void __init hv_vtom_init(void)
>  {
> +	enum hv_isolation_type type = hv_get_isolation_type();
>  	/*
>  	 * By design, a VM using vTOM doesn't see the SEV setting,
>  	 * so SEV initialization is bypassed and sev_status isn't set.
>  	 * Set it here to indicate a vTOM VM.
>  	 */

The above comment applies just to the case HV_ISOLATION_TYPE_SNP,
not to the entire switch statement, so it should be moved under the
case.

> -	sev_status = MSR_AMD64_SNP_VTOM;
> -	cc_vendor = CC_VENDOR_AMD;
> +	switch (type) {
> +	case HV_ISOLATION_TYPE_VBS:
> +		fallthrough;
> +
> +	case HV_ISOLATION_TYPE_SNP:
> +		sev_status = MSR_AMD64_SNP_VTOM;
> +		cc_vendor = CC_VENDOR_AMD;
> +		break;
> +
> +	case HV_ISOLATION_TYPE_TDX:
> +		cc_vendor = CC_VENDOR_INTEL;
> +		break;
> +
> +	default:
> +		panic("hv_vtom_init: unsupported isolation type %d\n", type);
> +	}
> +
>  	cc_set_mask(ms_hyperv.shared_gpa_boundary);
>  	physical_mask &= ms_hyperv.shared_gpa_boundary - 1;
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 24d7f662a8beb..2a4c7dcf64038 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -43,6 +43,7 @@ static inline unsigned char hv_get_nmi_reason(void)
> 
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern int hyperv_init_cpuhp;
> +extern bool hyperv_paravisor_present;
> 
>  extern void *hv_hypercall_pg;
> 
> @@ -76,7 +77,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> -	if (hv_isolation_type_tdx())
> +	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
>  		return hv_tdx_hypercall(control,
>  					cc_mkdec(input_address),
>  					cc_mkdec(output_address));
> @@ -134,7 +135,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
>  	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> -	if (hv_isolation_type_tdx())
> +	if (hv_isolation_type_tdx() &&
> +		(!hyperv_paravisor_present ||
> +		 control == (HVCALL_SIGNAL_EVENT | HV_HYPERCALL_FAST_BIT)))

See comment above.  This would be more consistent with SNP if it were
handled directly in vmbus_set_event().

>  		return hv_tdx_hypercall(control, input1, 0);
> 
>  	if (hv_isolation_type_en_snp()) {
> @@ -188,7 +191,7 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
>  	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> -	if (hv_isolation_type_tdx())
> +	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
>  		return hv_tdx_hypercall(control, input1, input2);
> 
>  	if (hv_isolation_type_en_snp()) {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 14baa288b1935..3dff2ef43bc73 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -40,6 +40,12 @@ bool hv_root_partition;
>  bool hv_nested;
>  struct ms_hyperv_info ms_hyperv;
> 
> +/*
> + * Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h.
> + * Exported in drivers/hv/hv_common.c to not break the ARM64 build.
> + */
> +bool hyperv_paravisor_present __ro_after_init;
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
>  static inline unsigned int hv_get_nested_reg(unsigned int reg)
>  {
> @@ -429,6 +435,8 @@ static void __init ms_hyperv_init_platform(void)
>  			ms_hyperv.shared_gpa_boundary =
>  				BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> 
> +		hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
> +
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> 
> @@ -443,7 +451,7 @@ static void __init ms_hyperv_init_platform(void)
>  			/* A TDX VM must use x2APIC and doesn't use lazy EOI. */
>  			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> 
> -			if (!ms_hyperv.paravisor_present) {
> +			if (!hyperv_paravisor_present) {
>  				/*
>  				 * The ms_hyperv.shared_gpa_boundary_active in
>  				 * a fully enlightened TDX VM is 0, but the GPAs
> @@ -534,8 +542,7 @@ static void __init ms_hyperv_init_platform(void)
> 
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
> -	    ((hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) &&
> -	    ms_hyperv.paravisor_present))
> +	    hyperv_paravisor_present)
>  		hv_vtom_init();
>  	/*
>  	 * Setup the hook to get control post apic initialization.
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 02b54f85dc607..7f64fc942323b 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -484,7 +484,7 @@ void vmbus_set_event(struct vmbus_channel *channel)
> 
>  	++channel->sig_events;
> 
> -	if (hv_isolation_type_snp())
> +	if (hv_isolation_type_snp() && hyperv_paravisor_present)

This code change seems to be more properly part of Patch 9 in the
series when hv_isolation_type_en_snp() goes away.

>  		hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
>  				NULL, sizeof(channel->sig_event));
>  	else
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 28bbb354324d6..20bc44923e4f0 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -109,7 +109,7 @@ int hv_synic_alloc(void)
>  		 * Synic message and event pages are allocated by paravisor.
>  		 * Skip these pages allocation here.
>  		 */
> -		if (!hv_isolation_type_snp() && !hv_root_partition) {
> +		if (!hyperv_paravisor_present && !hv_root_partition) {
>  			hv_cpu->synic_message_page =
>  				(void *)get_zeroed_page(GFP_ATOMIC);
>  			if (hv_cpu->synic_message_page == NULL) {
> @@ -128,7 +128,7 @@ int hv_synic_alloc(void)
>  			}
>  		}
> 
> -		if (!ms_hyperv.paravisor_present &&
> +		if (!hyperv_paravisor_present &&
>  		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
>  			ret = set_memory_decrypted((unsigned long)
>  				hv_cpu->synic_message_page, 1);
> @@ -226,7 +226,7 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
>  	simp.simp_enabled = 1;
> 
> -	if (hv_isolation_type_snp() || hv_root_partition) {
> +	if (hyperv_paravisor_present || hv_root_partition) {
>  		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
>  		u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
>  				~ms_hyperv.shared_gpa_boundary;
> @@ -249,7 +249,7 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
>  	siefp.siefp_enabled = 1;
> 
> -	if (hv_isolation_type_snp() || hv_root_partition) {
> +	if (hyperv_paravisor_present || hv_root_partition) {
>  		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
>  		u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
>  				~ms_hyperv.shared_gpa_boundary;
> @@ -336,7 +336,7 @@ void hv_synic_disable_regs(unsigned int cpu)
>  	 * addresses.
>  	 */
>  	simp.simp_enabled = 0;
> -	if (hv_isolation_type_snp() || hv_root_partition) {
> +	if (hyperv_paravisor_present || hv_root_partition) {
>  		iounmap(hv_cpu->synic_message_page);
>  		hv_cpu->synic_message_page = NULL;
>  	} else {
> @@ -348,7 +348,7 @@ void hv_synic_disable_regs(unsigned int cpu)
>  	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
>  	siefp.siefp_enabled = 0;
> 
> -	if (hv_isolation_type_snp() || hv_root_partition) {
> +	if (hyperv_paravisor_present || hv_root_partition) {
>  		iounmap(hv_cpu->synic_event_page);
>  		hv_cpu->synic_event_page = NULL;
>  	} else {
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 4c858e1636da7..c0b0ac44ffa3c 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -40,6 +40,9 @@
>  bool __weak hv_root_partition;
>  EXPORT_SYMBOL_GPL(hv_root_partition);
> 
> +bool __weak hyperv_paravisor_present;
> +EXPORT_SYMBOL_GPL(hyperv_paravisor_present);
> +
>  bool __weak hv_nested;
>  EXPORT_SYMBOL_GPL(hv_nested);
> 
> @@ -382,7 +385,8 @@ int hv_common_cpu_init(unsigned int cpu)
>  			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>  		}
> 
> -		if (hv_isolation_type_en_snp() || hv_isolation_type_tdx()) {
> +		if (!hyperv_paravisor_present &&
> +		    (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
>  			ret = set_memory_decrypted((unsigned long)mem, pgcount);
>  			if (ret) {
>  				/* It may be unsafe to free 'mem' */
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index f577eff58ea0b..94f87a0344590 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -176,6 +176,7 @@ extern int vmbus_interrupt;
>  extern int vmbus_irq;
> 
>  extern bool hv_root_partition;
> +extern bool hyperv_paravisor_present;
> 
>  #if IS_ENABLED(CONFIG_HYPERV)
>  /*
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 8/9] x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
  2023-08-20 20:27 ` [PATCH v2 8/9] x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM " Dexuan Cui
@ 2023-08-21 19:33   ` Michael Kelley (LINUX)
  2023-08-23  4:30     ` Dexuan Cui
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-21 19:33 UTC (permalink / raw)
  To: Dexuan Cui, ak@linux.intel.com, arnd@arndb.de, bp@alien8.de,
	brijesh.singh@amd.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, jane.chu@oracle.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan,
	linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
	tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
	jason, nik.borisov@suse.com
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, Tianyu Lan,
	rick.p.edgecombe@intel.com, Anthony Davis, Mark Heslin, vkuznets,
	xiaoyao.li@intel.com

From: Dexuan Cui <decui@microsoft.com> Sent: Sunday, August 20, 2023 1:27 PM
> 
> When the paravisor is present, a SNP VM must use GHCB to access some
> special MSRs, including HV_X64_MSR_GUEST_OS_ID and some SynIC MSRs.
> 
> Similarly, when the paravisor is present, a TDX VM must use TDX GHCI
> to access the same MSRs.
> 
> Implement hv_tdx_read_msr() and hv_tdx_write_msr(), and use the helper
> functions hv_ivm_msr_read() and hv_ivm_msr_write() to access the MSRs
> in a unified way for SNP/TDX VMs with the paravisor.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes in v2: None
> 
>  arch/x86/hyperv/hv_init.c       |  8 ++--
>  arch/x86/hyperv/ivm.c           | 72 +++++++++++++++++++++++++++++++--
>  arch/x86/include/asm/mshyperv.h |  8 ++--
>  arch/x86/kernel/cpu/mshyperv.c  |  8 ++--
>  4 files changed, 80 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 892e52afa37cd..18afbb10edc64 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -500,8 +500,8 @@ void __init hyperv_init(void)
>  	guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> -	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> -	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> +	/* With the paravisor, the VM must also write the ID via GHCB/GHCI */
> +	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
>  	/* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg
> */
>  	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
> @@ -590,7 +590,7 @@ void __init hyperv_init(void)
> 
>  clean_guest_os_id:
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> -	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> +	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>  	cpuhp_remove_state(cpuhp);
>  free_ghcb_page:
>  	free_percpu(hv_ghcb_pg);
> @@ -611,7 +611,7 @@ void hyperv_cleanup(void)
> 
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> -	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> +	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>  	/*
>  	 * Reset hypercall page reference before reset the page,
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 920ecb85802b8..93d54d8ef12e1 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -186,7 +186,49 @@ bool hv_ghcb_negotiate_protocol(void)
>  	return true;
>  }
> 
> -void hv_ghcb_msr_write(u64 msr, u64 value)
> +#define EXIT_REASON_MSR_READ		31
> +#define EXIT_REASON_MSR_WRITE		32

These exit reasons are defined in arch/x86/include/uapi/asm/vmx.h.
Are they conceptually the same thing and should be reused?

> +
> +static void hv_tdx_read_msr(u64 msr, u64 *val)

Could you make the function name be
hv_tdx_msr_read() so it matches hv_ghcb_msr_read()
and hv_ivm_msr_read()?  :-)

> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_MSR_READ,
> +		.r12 = msr,
> +	};
> +
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +	u64 ret = __tdx_hypercall_ret(&args);
> +#else
> +	u64 ret = HV_STATUS_INVALID_PARAMETER;
> +#endif
> +
> +	if (WARN_ONCE(ret, "Failed to emulate MSR read: %lld\n", ret))
> +		*val = 0;
> +	else
> +		*val = args.r11;
> +}
> +
> +static void hv_tdx_write_msr(u64 msr, u64 val)

Same here on the function name.

> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_MSR_WRITE,
> +		.r12 = msr,
> +		.r13 = val,
> +	};
> +
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +	u64 ret = __tdx_hypercall(&args);
> +#else
> +	u64 ret = HV_STATUS_INVALID_PARAMETER;
> +	(void)args;
> +#endif
> +
> +	WARN_ONCE(ret, "Failed to emulate MSR write: %lld\n", ret);
> +}
> +
> +static void hv_ghcb_msr_write(u64 msr, u64 value)
>  {
>  	union hv_ghcb *hv_ghcb;
>  	void **ghcb_base;
> @@ -214,9 +256,20 @@ void hv_ghcb_msr_write(u64 msr, u64 value)
> 
>  	local_irq_restore(flags);
>  }
> -EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
> 
> -void hv_ghcb_msr_read(u64 msr, u64 *value)
> +void hv_ivm_msr_write(u64 msr, u64 value)
> +{
> +	if (!hyperv_paravisor_present)
> +		return;
> +
> +	if (hv_isolation_type_tdx())
> +		hv_tdx_write_msr(msr, value);
> +	else if (hv_isolation_type_snp())
> +		hv_ghcb_msr_write(msr, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_ivm_msr_write);
> +
> +static void hv_ghcb_msr_read(u64 msr, u64 *value)
>  {
>  	union hv_ghcb *hv_ghcb;
>  	void **ghcb_base;
> @@ -246,7 +299,18 @@ void hv_ghcb_msr_read(u64 msr, u64 *value)
>  			| ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
>  	local_irq_restore(flags);
>  }
> -EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
> +
> +void hv_ivm_msr_read(u64 msr, u64 *value)
> +{
> +	if (!hyperv_paravisor_present)
> +		return;
> +
> +	if (hv_isolation_type_tdx())
> +		hv_tdx_read_msr(msr, value);
> +	else if (hv_isolation_type_snp())
> +		hv_ghcb_msr_read(msr, value);
> +}
> +EXPORT_SYMBOL_GPL(hv_ivm_msr_read);
> 
>  /*
>   * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 2a4c7dcf64038..18f569c44c39d 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -280,15 +280,15 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int
> vcpu, int vector,
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> 
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
> -void hv_ghcb_msr_write(u64 msr, u64 value);
> -void hv_ghcb_msr_read(u64 msr, u64 *value);
> +void hv_ivm_msr_write(u64 msr, u64 value);
> +void hv_ivm_msr_read(u64 msr, u64 *value);

These declarations are under CONFIG_AMD_MEM_ENCRYPT, which
is problematic for TDX if the kernel is built with CONFIG_INTEL_TDX_GUEST
but not CONFIG_AMD_MEM_ENCRYPT.  Presumably we want to make
sure that combination builds and works correctly.

I think there's a bigger problem in that arch/x86/hyperv/ivm.c has
a big #ifdef CONFIG_AMD_MEM_ENCRYPT in it, and TDX with paravisor
wants to use the "vtom" functions that are under that #ifdef.

>  bool hv_ghcb_negotiate_protocol(void);
>  void __noreturn hv_ghcb_terminate(unsigned int set, unsigned int reason);
>  void hv_vtom_init(void);
>  int hv_snp_boot_ap(int cpu, unsigned long start_ip);
>  #else
> -static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
> -static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> +static inline void hv_ivm_msr_write(u64 msr, u64 value) {}
> +static inline void hv_ivm_msr_read(u64 msr, u64 *value) {}
>  static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
>  static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
>  static inline void hv_vtom_init(void) {}
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 3dff2ef43bc73..a196760afa7a1 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -72,8 +72,8 @@ u64 hv_get_non_nested_register(unsigned int reg)
>  {
>  	u64 value;
> 
> -	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
> -		hv_ghcb_msr_read(reg, &value);
> +	if (hv_is_synic_reg(reg) && hyperv_paravisor_present)
> +		hv_ivm_msr_read(reg, &value);
>  	else
>  		rdmsrl(reg, value);
>  	return value;
> @@ -82,8 +82,8 @@ EXPORT_SYMBOL_GPL(hv_get_non_nested_register);
> 
>  void hv_set_non_nested_register(unsigned int reg, u64 value)
>  {
> -	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
> -		hv_ghcb_msr_write(reg, value);
> +	if (hv_is_synic_reg(reg) && hyperv_paravisor_present) {
> +		hv_ivm_msr_write(reg, value);
> 
>  		/* Write proxy bit via wrmsl instruction */
>  		if (hv_is_sint_reg(reg))
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present
  2023-08-21 19:33   ` Michael Kelley (LINUX)
@ 2023-08-23  4:23     ` Dexuan Cui
  2023-08-23  4:28       ` Dexuan Cui
  0 siblings, 1 reply; 17+ messages in thread
From: Dexuan Cui @ 2023-08-23  4:23 UTC (permalink / raw)
  To: Michael Kelley (LINUX), ak@linux.intel.com, arnd@arndb.de,
	bp@alien8.de, brijesh.singh@amd.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, jane.chu@oracle.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan,
	linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
	tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
	jason, nik.borisov@suse.com
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, Tianyu Lan,
	rick.p.edgecombe@intel.com, Anthony Davis, Mark Heslin, vkuznets,
	xiaoyao.li@intel.com

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Monday, August 21, 2023 12:33 PM
>  [...]
> From: Dexuan Cui <decui@microsoft.com> Sent: Sunday, August 20, 2023
> >
> > The new variable hyperv_paravisor_present is set only when the VM
> > is a SNP/TDX with the paravisor running: see ms_hyperv_init_platform().
> >
> > In many places, hyperv_paravisor_present can replace
> 
> You said "In many places".  Are there places where it can't replace
> ms_hyperv.paravisor_present?  It looks like all the uses are gone
> after this patch.

Sorry for the inaccuracy. I meant to say "everywhere" rather than
"In many places".  I think hyperv_paravisor_present and
ms_hyperv.paravisor_present can be used interchangeably except that
I need to use hyperv_paravisor_present in arch/x86/include/asm/mshyperv.h
to avoid the header file dependency issue.

As we discussed offline, we'll add some hypercall function structure in future
for different VM types, and then hyperv_paravisor_present and
ms_hyperv.paravisor_present would go away when we make hypercalls.

So let me use hyperv_paravisor_present only in
arch/x86/include/asm/mshyperv.h to make the patch smaller.

> > ms_hyperv.paravisor_present, and it's also used to replace
> > hv_isolation_type_snp() in drivers/hv/hv.c.
> >
> > Call hv_vtom_init() when it's a VBS VM or when hyperv_paravisor_present
> > is true (i.e. the VM is a SNP/TDX VM with the paravisor).
> >
> > Enhance hv_vtom_init() for a TDX VM with the paravisor.
> >
> > The biggest motive to introduce hyperv_paravisor_present is that we
> > can not use ms_hyperv.paravisor_present in
> arch/x86/include/asm/mshyperv.h:
> > that would introduce a complicated header file dependency issue.
> 
> The discussion in this commit messages about hyperv_paravisor_present
> is a bit scattered and confusing.  I think you are introducing the global
> variable
> to solve the header file dependency issue.  Otherwise, the ms_hyperv field
> would be equivalent.  Then you are using hyperv_paravisor_present for
> several purposes, including to decide whether to call hv_vtom_init() and
> to simplify the logic in drivers/hv/hv.c.  Maybe you could reorganize
> the commit message a bit to be more direct regarding the purpose.

The new changelog will be:

    x86/hyperv: Introduce a global variable hyperv_paravisor_present

    The new variable hyperv_paravisor_present is set only when the VM
    is a SNP/TDX VM with the paravisor running: see ms_hyperv_init_platform().

    We introduce hyperv_paravisor_present because we can not use
    ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h:

    struct ms_hyperv_info is defined in include/asm-generic/mshyperv.h, which
    is included at the end of arch/x86/include/asm/mshyperv.h, but at the
    beginning of arch/x86/include/asm/mshyperv.h, we would already need to use
    struct ms_hyperv_info in hv_do_hypercall().

    We use hyperv_paravisor_present only in include/asm-generic/mshyperv.h,
    and use ms_hyperv.paravisor_present elsewhere. In the future, we'll
    introduce a hypercall function structure for different VM types, and
    at boot time, the right function pointers would be written into the
    structure so that runtime testing of TDX vs. SNP vs. normal will be
    avoided and hyperv_paravisor_present will no longer be needed.

    Call hv_vtom_init() when it's a VBS VM or when ms_hyperv.paravisor_present
    is true, i.e. the VM is a SNP VM or TDX VM with the paravisor.

    Enhance hv_vtom_init() for a TDX VM with the paravisor.

    In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg
    for a TDX VM with the paravisor, just like we don't decrypt the page
    for a SNP VM with the paravisor.

BTW, please refer to the link for the v3 version of this patch (WIP):

> > In arch/x86/include/asm/mshyperv.h, _hv_do_fast_hypercall8()
> > is changed to specially handle HVCALL_SIGNAL_EVENT for a TDX VM with
> the
> > paravisor, because such a VM must use TDX GHCI (see hv_tdx_hypercall())
> > for this hypercall. See vmbus_set_event() -> hv_do_fast_hypercall8() ->
> > _hv_do_fast_hypercall8() -> hv_tdx_hypercall().
> 
> Embedding the special case for HVCALL_SIGNAL_EVENT within
> hv_do_fast_hypercall8() is not consistent with how this special case
> is handled for SNP.  For SNP, the special case is coded directly into
> vmbus_set_event().  Any reason not to do the same for TDX + paravisor?

Ok, will handle it directly in vmbus_set_event().

> > @@ -497,13 +497,29 @@ int hv_snp_boot_ap(int cpu, unsigned long
> start_ip)
> >
> >  void __init hv_vtom_init(void)
> >  {
> > +	enum hv_isolation_type type = hv_get_isolation_type();
> >  	/*
> >  	 * By design, a VM using vTOM doesn't see the SEV setting,
> >  	 * so SEV initialization is bypassed and sev_status isn't set.
> >  	 * Set it here to indicate a vTOM VM.
> >  	 */
> 
> The above comment applies just to the case HV_ISOLATION_TYPE_SNP,
> not to the entire switch statement, so it should be moved under the
> case.

Will fix.

> > [...]
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
[...]
> > @@ -134,7 +135,9 @@ static inline u64 _hv_do_fast_hypercall8(u64
> control, u64 input1)
> >  	u64 hv_status;
> >
> >  #ifdef CONFIG_X86_64
> > -	if (hv_isolation_type_tdx())
> > +	if (hv_isolation_type_tdx() &&
> > +		(!hyperv_paravisor_present ||
> > +		 control == (HVCALL_SIGNAL_EVENT |
> HV_HYPERCALL_FAST_BIT)))
> 
> See comment above.  This would be more consistent with SNP if it were
> handled directly in vmbus_set_event().

Will fix.

> > --- a/drivers/hv/connection.c
> > +++ b/drivers/hv/connection.c
> > @@ -484,7 +484,7 @@ void vmbus_set_event(struct vmbus_channel
> *channel)
> >
> >  	++channel->sig_events;
> >
> > -	if (hv_isolation_type_snp())
> > +	if (hv_isolation_type_snp() && hyperv_paravisor_present)
> 
> This code change seems to be more properly part of Patch 9 in the
> series when hv_isolation_type_en_snp() goes away.

The change here is part of the efforts to correctly support hypercalls for
a TDX VM with the paravisor. Patch 9 is just a clean-up patch, which is
not really required for a TDX VM (with or with the paravisor) to run on
Hyper-V, so IMO it's better to keep the change here in this patch.

BTW, please refer to the link for the v3 version of this patch (WIP):



^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present
  2023-08-23  4:23     ` Dexuan Cui
@ 2023-08-23  4:28       ` Dexuan Cui
  0 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-23  4:28 UTC (permalink / raw)
  To: Michael Kelley (LINUX), ak@linux.intel.com, arnd@arndb.de,
	bp@alien8.de, brijesh.singh@amd.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, jane.chu@oracle.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan,
	linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
	tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
	jason, nik.borisov@suse.com
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, Tianyu Lan,
	rick.p.edgecombe@intel.com, Anthony Davis, Mark Heslin, vkuznets,
	xiaoyao.li@intel.com

> From: Dexuan Cui
> Sent: Tuesday, August 22, 2023 9:24 PM
> To: Michael Kelley (LINUX) <mikelley@microsoft.com>; ak@linux.intel.com;
>  [...]
> BTW, please refer to the link for the v3 version of this patch (WIP):

I forgot to share the link in my last email... Here it is:
https://github.com/dcui/tdx/commit/3e2a28d5cf1031679bf2e2ac37eb1fd02afc8d44

> [...]
> > > --- a/drivers/hv/connection.c
> > > +++ b/drivers/hv/connection.c
> > > @@ -484,7 +484,7 @@ void vmbus_set_event(struct vmbus_channel
> > *channel)
> > >
> > >  	++channel->sig_events;
> > >
> > > -	if (hv_isolation_type_snp())
> > > +	if (hv_isolation_type_snp() && hyperv_paravisor_present)
> >
> > This code change seems to be more properly part of Patch 9 in the
> > series when hv_isolation_type_en_snp() goes away.
> 
> The change here is part of the efforts to correctly support hypercalls for
> a TDX VM with the paravisor. Patch 9 is just a clean-up patch, which is
> not really required for a TDX VM (with or with the paravisor) to run on
> Hyper-V, so IMO it's better to keep the change here in this patch.
> 
> BTW, please refer to the link for the v3 version of this patch (WIP):

https://github.com/dcui/tdx/commit/3e2a28d5cf1031679bf2e2ac37eb1fd02afc8d44
(the same link as the above one) 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 8/9] x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
  2023-08-21 19:33   ` Michael Kelley (LINUX)
@ 2023-08-23  4:30     ` Dexuan Cui
  0 siblings, 0 replies; 17+ messages in thread
From: Dexuan Cui @ 2023-08-23  4:30 UTC (permalink / raw)
  To: Michael Kelley (LINUX), ak@linux.intel.com, arnd@arndb.de,
	bp@alien8.de, brijesh.singh@amd.com, dan.j.williams@intel.com,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, jane.chu@oracle.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan,
	linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
	tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
	jason, nik.borisov@suse.com
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, Tianyu Lan,
	rick.p.edgecombe@intel.com, Anthony Davis, Mark Heslin, vkuznets,
	xiaoyao.li@intel.com

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Monday, August 21, 2023 12:33 PM
> > [...]
> > @@ -186,7 +186,49 @@ bool hv_ghcb_negotiate_protocol(void)
> >  	return true;
> >  }
> >
> > -void hv_ghcb_msr_write(u64 msr, u64 value)
> > +#define EXIT_REASON_MSR_READ		31
> > +#define EXIT_REASON_MSR_WRITE		32
> 
> These exit reasons are defined in arch/x86/include/uapi/asm/vmx.h.
> Are they conceptually the same thing and should be reused?

There is no VM Exit here, but I think we can use the header file.

I'll add
#include <uapi/asm/vmx.h>
and remove the 2 defines I added in this file.

> > +static void hv_tdx_read_msr(u64 msr, u64 *val)
> 
> Could you make the function name be
> hv_tdx_msr_read() so it matches hv_ghcb_msr_read()
> and hv_ivm_msr_read()?  :-)

Will do.
I'll also move the new functions around so that the new functions
won't get interleaved among the hv_ghcb_* functions.

I'll remove
EXPORT_SYMBOL_GPL(hv_ivm_msr_write);
EXPORT_SYMBOL_GPL(hv_ivm_msr_read);
because we never really used hv_ghcb_msr_write() and
hv_ghcb_msr_read() in any module.

The changelog is updated accordingly.

> > +{
> > +	struct tdx_hypercall_args args = {
> > +		.r10 = TDX_HYPERCALL_STANDARD,
> > +		.r11 = EXIT_REASON_MSR_READ,
> > +		.r12 = msr,
> > +	};
> > +
> > +#ifdef CONFIG_INTEL_TDX_GUEST
> > +	u64 ret = __tdx_hypercall_ret(&args);
> > +#else
> > +	u64 ret = HV_STATUS_INVALID_PARAMETER;
> > +#endif
> > +
> > +	if (WARN_ONCE(ret, "Failed to emulate MSR read: %lld\n", ret))
> > +		*val = 0;
> > +	else
> > +		*val = args.r11;
> > +}
> > +
> > +static void hv_tdx_write_msr(u64 msr, u64 val)
> 
> Same here on the function name.

Will fix.

> >  #ifdef CONFIG_AMD_MEM_ENCRYPT
> > -void hv_ghcb_msr_write(u64 msr, u64 value);
> > -void hv_ghcb_msr_read(u64 msr, u64 *value);
> > +void hv_ivm_msr_write(u64 msr, u64 value);
> > +void hv_ivm_msr_read(u64 msr, u64 *value);
> 
> These declarations are under CONFIG_AMD_MEM_ENCRYPT, which
> is problematic for TDX if the kernel is built with CONFIG_INTEL_TDX_GUEST
> but not CONFIG_AMD_MEM_ENCRYPT.  Presumably we want to make
> sure that combination builds and works correctly.
> 
> I think there's a bigger problem in that arch/x86/hyperv/ivm.c has
> a big #ifdef CONFIG_AMD_MEM_ENCRYPT in it, and TDX with paravisor
> wants to use the "vtom" functions that are under that #ifdef.

I worked out a new version of this patch:
https://github.com/dcui/tdx/commit/17b065e175082497907563297083b13dc86d84a9

I updated arch/x86/include/asm/mshyperv.h and arch/x86/hyperv/ivm.c
so that the kernel can still buid if CONFIG_AMD_MEM_ENCRYPT or
CONFIG_INTEL_TDX_GUEST is not set, or neither is set.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-08-23  4:31 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-20 20:27 [PATCH v2 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part) Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 1/9] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 2/9] x86/hyperv: Support hypercalls for fully enlightened " Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 3/9] Drivers: hv: vmbus: Support " Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 4/9] x86/hyperv: Fix serial console interrupts for " Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM Dexuan Cui
2023-08-21 14:29   ` Michael Kelley (LINUX)
2023-08-21 18:17     ` Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 6/9] x86/hyperv: Introduce a global variable hyperv_paravisor_present Dexuan Cui
2023-08-21 19:33   ` Michael Kelley (LINUX)
2023-08-23  4:23     ` Dexuan Cui
2023-08-23  4:28       ` Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 7/9] Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 8/9] x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM " Dexuan Cui
2023-08-21 19:33   ` Michael Kelley (LINUX)
2023-08-23  4:30     ` Dexuan Cui
2023-08-20 20:27 ` [PATCH v2 9/9] x86/hyperv: Remove hv_isolation_type_en_snp Dexuan Cui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).