* [PATCH hyperv-next 0/6] Confidential VMBus
@ 2025-04-09  0:08 Roman Kisel
  2025-04-09  0:08 ` [PATCH hyperv-next 1/6] Documentation: hyperv: " Roman Kisel
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09  0:08 UTC (permalink / raw)
  To: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut

Logically, there are two parts to this patch series:

1. Add support for the confidential VMBus protocol (patches 1-4).
2. Avoid bounce-buffering when the pages aren't shared with the host
   (patches 5-6).

Let us discuss the motivation and present the value proposition.

Guests running on Hyper-V can be confidential: their memory and register
contents are encrypted, provided that the hardware supports that (currently
AMD SEV-SNP and Intel TDX) and the guest is capable of using these features.
Neither the host nor the hypervisor can introspect a confidential guest
unless the guest shares memory contents, upon which the shared memory is
decrypted.

In confidential guests, neither the host nor the hypervisor needs to be
trusted, and guests processing sensitive data can take advantage of that.

Not trusting the host and the hypervisor (removing them from the Trusted
Computing Base, aka TCB) necessitates changing the method of communication
between the host and the guest. Below is a breakdown of the options used in
both cases (in the diagrams, the server is marked as S and the client as C):

1. Without the paravisor, the devices are connected to the host, and the
host provides the device emulation or translation to the guest:

+---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
|               |       |                 |        |                |
|               |       |                 |        |                |
|               |       |                 ==========                |
|               |       |                 |        |                |
|               |       |                 |        |                |
|               |       |                 |        |                |
+----- C -------+       +-----------------+        +------- S ------+
       ||                                                   ||
       ||                                                   ||
+------||------------------ VMBus --------------------------||------+
|                     Interrupts, MMIO                              |
+-------------------------------------------------------------------+

2. With the paravisor, the devices are connected to the paravisor, and
the paravisor provides the device emulation or translation to the guest.
The guest doesn't communicate with the host directly; it communicates
with the paravisor via the VMBus. In this model the host is not trusted,
and the paravisor is trusted:

+---- GUEST ------+                                   +-- DEVICE --+
|                 |                                   |            |
| +- PARAVISOR -+ |                                   |            |
| |             ==+====================================            |
| |   OpenHCL   | |                                   |            |
| |             | C=====================              |            |
+-+---- C - S --+-+                   ||              +------------+
        ||  ||                        ||
        ||  ||      +-- VMBus Relay --||--+           +--- HOST ---+
        ||  ||=======   Interrupts, MMIO  |           |            |
        ||          +---------------------+           +---- S -----+
        ||                                                  ||
+-------||----------------- VMBus --------------------------||------+
|                     Interrupts, MMIO                              |
+-------------------------------------------------------------------+

Note that in the second case the guest doesn't need to share memory with
the host, as it communicates only with the paravisor within their common
partition boundary. That is precisely the raison d'etre and the value
proposition of this patch series: equip the confidential guest to use
private (encrypted) memory and to rely on the paravisor, where available,
for security.
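
To make the value proposition concrete, here is a minimal, hypothetical
sketch (not part of the series) of the check a driver could make once the
confidential control plane is up; it relies only on the per-channel flags
that patch 2 introduces:

/*
 * Hypothetical illustration only: with the confidential VMBus, channel
 * memory may stay private (encrypted), so there is no need to share it
 * with the host or to bounce-buffer through decrypted pages.
 */
static bool channel_needs_shared_memory(const struct vmbus_channel *channel)
{
	return !(channel->confidential_ring_buffer &&
		 channel->confidential_external_memory);
}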

I'd like to thank the following people for their help with this
patch series:

- Dexuan for help with patches 4-6, validation, and the fruitful
  discussions,
- Easwar for reviewing the refactoring of the page allocation and
  freeing in `hv.c`,
- John and Sven for the design,
- Mike for helping to avoid pitfalls when dealing with the GFP flags,
- Sven for blazing the trail and implementing the design in a few
  codebases.

Roman Kisel (6):
  Documentation: hyperv: Confidential VMBus
  drivers: hyperv: VMBus protocol version 6.0
  arch: hyperv: Get/set SynIC synth.registers via paravisor
  arch: x86, drivers: hyperv: Enable confidential VMBus
  arch, drivers: Add device struct bitfield to not bounce-buffer
  drivers: SCSI: Do not bounce-buffer for the confidential VMBus

 Documentation/virt/hyperv/vmbus.rst |  41 +++
 arch/arm64/hyperv/mshyperv.c        |  19 ++
 arch/arm64/include/asm/mshyperv.h   |   3 +
 arch/x86/include/asm/mshyperv.h     |   3 +
 arch/x86/kernel/cpu/mshyperv.c      |  51 ++-
 arch/x86/mm/mem_encrypt.c           |   3 +
 drivers/hv/channel.c                |  36 ++-
 drivers/hv/channel_mgmt.c           |  29 +-
 drivers/hv/connection.c             |  10 +-
 drivers/hv/hv.c                     | 485 ++++++++++++++++++++--------
 drivers/hv/hyperv_vmbus.h           |   9 +-
 drivers/hv/ring_buffer.c            |   5 +-
 drivers/hv/vmbus_drv.c              | 152 +++++----
 drivers/scsi/storvsc_drv.c          |   2 +
 include/asm-generic/mshyperv.h      |   1 +
 include/linux/device.h              |   8 +
 include/linux/dma-direct.h          |   3 +
 include/linux/hyperv.h              |  71 ++--
 include/linux/swiotlb.h             |   3 +
 19 files changed, 696 insertions(+), 238 deletions(-)


base-commit: 628cc040b3a2980df6032766e8ef0688e981ab95
-- 
2.43.0



* [PATCH hyperv-next 1/6] Documentation: hyperv: Confidential VMBus
  2025-04-09  0:08 [PATCH hyperv-next 0/6] Confidential VMBus Roman Kisel
@ 2025-04-09  0:08 ` Roman Kisel
  2025-04-10 16:54   ` ALOK TIWARI
  2025-04-25  6:31   ` Wei Liu
  2025-04-09  0:08 ` [PATCH hyperv-next 2/6] drivers: hyperv: VMBus protocol version 6.0 Roman Kisel
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09  0:08 UTC (permalink / raw)
  To: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut

Define what the confidential VMBus is, and describe what advantages
it offers on capable hardware.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 Documentation/virt/hyperv/vmbus.rst | 41 +++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/virt/hyperv/vmbus.rst b/Documentation/virt/hyperv/vmbus.rst
index 1dcef6a7fda3..f600e3d09800 100644
--- a/Documentation/virt/hyperv/vmbus.rst
+++ b/Documentation/virt/hyperv/vmbus.rst
@@ -324,3 +324,44 @@ rescinded, neither Hyper-V nor Linux retains any state about
 its previous existence. Such a device might be re-added later,
 in which case it is treated as an entirely new device. See
 vmbus_onoffer_rescind().
+
+Confidential VMBus
+------------------
+
+The confidential VMBus provides control and data planes where the guest
+doesn't talk to either the hypervisor or the host. Instead, it relies
+on the trusted paravisor. The hardware (SNP or TDX) encrypts the guest
+memory and the register state, and also measures the paravisor image
+using the platform security processor, to ensure trusted and
+confidential computing.
+
+To support confidential communication with the paravisor, a VMBus client
+will first attempt to use regular, non-isolated mechanisms for communication.
+To do this, it must:
+
+* Configure the paravisor SIMP with an encrypted page. The paravisor SIMP is
+  configured by setting the relevant MSR directly, without using GHCB or tdcall.
+
+* Enable SINT 2 on both the paravisor and hypervisor, without setting the proxy
+  flag on the paravisor SINT. Enable interrupts on the paravisor SynIC.
+
+* Configure both the paravisor and hypervisor event flags page.
+  Both pages will need to be scanned when VMBus receives a channel interrupt.
+
+* Send messages to the paravisor by calling HvPostMessage directly, without using
+  GHCB or tdcall.
+
+* Set the EOM MSR directly in the paravisor, without using GHCB or tdcall.
+
+If sending the InitiateContact message using non-isolated HvPostMessage fails,
+the client must fall back to using the hypervisor SynIC, via GHCB/tdcall
+as appropriate.
+
+To fall back, the client will have to reconfigure the following:
+
+* Configure the hypervisor SIMP with a host-visible page.
+  Since the hypervisor SIMP is not used when in confidential mode,
+  this can be done up front, or only when needed, whichever makes sense for
+  the particular implementation.
+
+* Set the proxy flag on SINT 2 for the paravisor.
-- 
2.43.0
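
For illustration, the negotiation and fallback sequence described in the
documentation above could be sketched as below. The helpers named here
(pv_post_message(), hyp_post_message(), pv_set_sint_proxy()) are
hypothetical stand-ins for the paravisor/hypervisor plumbing, not APIs
from this series:

/*
 * A rough sketch of the connect-then-fall-back flow, assuming the
 * hypothetical helpers named above.
 */
static int vmbus_try_confidential_connect(void)
{
	int ret;

	/* Paravisor first: plain HvPostMessage, no GHCB or tdcall. */
	ret = pv_post_message(CHANNELMSG_INITIATE_CONTACT);
	if (ret == 0)
		return 0; /* The confidential control plane is established. */

	/* Fall back to the hypervisor SynIC, via GHCB/tdcall as needed. */
	pv_set_sint_proxy(VMBUS_MESSAGE_SINT, true);
	return hyp_post_message(CHANNELMSG_INITIATE_CONTACT);
}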



* [PATCH hyperv-next 2/6] drivers: hyperv: VMBus protocol version 6.0
  2025-04-09  0:08 [PATCH hyperv-next 0/6] Confidential VMBus Roman Kisel
  2025-04-09  0:08 ` [PATCH hyperv-next 1/6] Documentation: hyperv: " Roman Kisel
@ 2025-04-09  0:08 ` Roman Kisel
  2025-04-10 17:03   ` ALOK TIWARI
  2025-04-09  0:08 ` [PATCH hyperv-next 3/6] arch: hyperv: Get/set SynIC synth.registers via paravisor Roman Kisel
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Roman Kisel @ 2025-04-09  0:08 UTC (permalink / raw)
  To: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut

The confidential VMBus is supported starting with protocol version 6.0.

Update the relevant definitions, and provide a function that returns
whether the VMBus is confidential or not.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/vmbus_drv.c         | 12 ++++++
 include/asm-generic/mshyperv.h |  1 +
 include/linux/hyperv.h         | 71 +++++++++++++++++++++++++---------
 3 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 22afebfc28ff..fa3ad6fe0bec 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -55,6 +55,18 @@ static long __percpu *vmbus_evt;
 int vmbus_irq;
 int vmbus_interrupt;
 
+/*
+ * If the Confidential VMBus is used, the data on the "wire" is not
+ * visible to either the host or the hypervisor.
+ */
+static bool is_confidential;
+
+bool vmbus_is_confidential(void)
+{
+	return is_confidential;
+}
+EXPORT_SYMBOL_GPL(vmbus_is_confidential);
+
 /*
  * The panic notifier below is responsible solely for unloading the
  * vmbus connection, which is necessary in a panic event.
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index ccccb1cbf7df..23f707b5aeeb 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -377,5 +377,6 @@ static inline int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u3
 	return -EOPNOTSUPP;
 }
 #endif /* CONFIG_MSHV_ROOT */
+bool vmbus_is_confidential(void);
 
 #endif
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 675959fb97ba..e66fd980789a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -265,16 +265,19 @@ static inline u32 hv_get_avail_to_write_percent(
  * Linux kernel.
  */
 
-#define VERSION_WS2008  ((0 << 16) | (13))
-#define VERSION_WIN7    ((1 << 16) | (1))
-#define VERSION_WIN8    ((2 << 16) | (4))
-#define VERSION_WIN8_1    ((3 << 16) | (0))
-#define VERSION_WIN10 ((4 << 16) | (0))
-#define VERSION_WIN10_V4_1 ((4 << 16) | (1))
-#define VERSION_WIN10_V5 ((5 << 16) | (0))
-#define VERSION_WIN10_V5_1 ((5 << 16) | (1))
-#define VERSION_WIN10_V5_2 ((5 << 16) | (2))
-#define VERSION_WIN10_V5_3 ((5 << 16) | (3))
+#define VMBUS_MAKE_VERSION(MAJ, MIN)	((((u32)(MAJ)) << 16) | (MIN))
+#define VERSION_WS2008					VMBUS_MAKE_VERSION(0, 13)
+#define VERSION_WIN7					VMBUS_MAKE_VERSION(1, 1)
+#define VERSION_WIN8					VMBUS_MAKE_VERSION(2, 4)
+#define VERSION_WIN8_1					VMBUS_MAKE_VERSION(3, 0)
+#define VERSION_WIN10					VMBUS_MAKE_VERSION(4, 0)
+#define VERSION_WIN10_V4_1				VMBUS_MAKE_VERSION(4, 1)
+#define VERSION_WIN10_V5				VMBUS_MAKE_VERSION(5, 0)
+#define VERSION_WIN10_V5_1				VMBUS_MAKE_VERSION(5, 1)
+#define VERSION_WIN10_V5_2				VMBUS_MAKE_VERSION(5, 2)
+#define VERSION_WIN10_V5_3				VMBUS_MAKE_VERSION(5, 3)
+#define VERSION_WIN_IRON				VERSION_WIN10_V5_3
+#define VERSION_WIN_COPPER				VMBUS_MAKE_VERSION(6, 0)
 
 /* Make maximum size of pipe payload of 16K */
 #define MAX_PIPE_DATA_PAYLOAD		(sizeof(u8) * 16384)
@@ -335,14 +338,22 @@ struct vmbus_channel_offer {
 } __packed;
 
 /* Server Flags */
-#define VMBUS_CHANNEL_ENUMERATE_DEVICE_INTERFACE	1
-#define VMBUS_CHANNEL_SERVER_SUPPORTS_TRANSFER_PAGES	2
-#define VMBUS_CHANNEL_SERVER_SUPPORTS_GPADLS		4
-#define VMBUS_CHANNEL_NAMED_PIPE_MODE			0x10
-#define VMBUS_CHANNEL_LOOPBACK_OFFER			0x100
-#define VMBUS_CHANNEL_PARENT_OFFER			0x200
-#define VMBUS_CHANNEL_REQUEST_MONITORED_NOTIFICATION	0x400
-#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER		0x2000
+#define VMBUS_CHANNEL_ENUMERATE_DEVICE_INTERFACE		0x0001
+/*
+ * This flag indicates that the channel is offered by the paravisor, and must
+ * use encrypted memory for the channel ring buffer.
+ */
+#define VMBUS_CHANNEL_CONFIDENTIAL_RING_BUFFER			0x0002
+/*
+ * This flag indicates that the channel is offered by the paravisor, and must
+ * use encrypted memory for GPA direct packets and additional GPADLs.
+ */
+#define VMBUS_CHANNEL_CONFIDENTIAL_EXTERNAL_MEMORY		0x0004
+#define VMBUS_CHANNEL_NAMED_PIPE_MODE					0x0010
+#define VMBUS_CHANNEL_LOOPBACK_OFFER					0x0100
+#define VMBUS_CHANNEL_PARENT_OFFER						0x0200
+#define VMBUS_CHANNEL_REQUEST_MONITORED_NOTIFICATION	0x0400
+#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER				0x2000
 
 struct vmpacket_descriptor {
 	u16 type;
@@ -621,6 +632,12 @@ struct vmbus_channel_relid_released {
 	u32 child_relid;
 } __packed;
 
+/*
+ * Used by the paravisor only; indicates that encrypted ring buffers and
+ * encrypted external memory are supported.
+ */
+#define VMBUS_FEATURE_FLAG_CONFIDENTIAL_CHANNELS	0x10
+
 struct vmbus_channel_initiate_contact {
 	struct vmbus_channel_message_header header;
 	u32 vmbus_version_requested;
@@ -630,7 +647,8 @@ struct vmbus_channel_initiate_contact {
 		struct {
 			u8	msg_sint;
 			u8	msg_vtl;
-			u8	reserved[6];
+			u8	reserved[2];
+			u32 feature_flags; /* VMBus version 6.0 */
 		};
 	};
 	u64 monitor_page1;
@@ -1002,6 +1020,11 @@ struct vmbus_channel {
 
 	/* The max size of a packet on this channel */
 	u32 max_pkt_size;
+
+	/* The ring buffer is encrypted */
+	bool confidential_ring_buffer;
+	/* The external memory is encrypted */
+	bool confidential_external_memory;
 };
 
 #define lock_requestor(channel, flags)					\
@@ -1026,6 +1049,16 @@ u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
 			     u64 rqst_addr);
 u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id);
 
+static inline bool is_confidential_ring_buffer(const struct vmbus_channel_offer_channel *o)
+{
+	return !!(o->offer.chn_flags & VMBUS_CHANNEL_CONFIDENTIAL_RING_BUFFER);
+}
+
+static inline bool is_confidential_external_memory(const struct vmbus_channel_offer_channel *o)
+{
+	return !!(o->offer.chn_flags & VMBUS_CHANNEL_CONFIDENTIAL_EXTERNAL_MEMORY);
+}
+
 static inline bool is_hvsock_offer(const struct vmbus_channel_offer_channel *o)
 {
 	return !!(o->offer.chn_flags & VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
-- 
2.43.0
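
As a usage illustration for the helpers and the version macro added above,
here is a sketch of how an offer might be vetted. It mirrors the checks
that patch 4 adds to vmbus_onoffer() and is an example, not the patch
itself:

/* Accept confidential offers only on VMBus >= 6.0 in a confidential VM. */
static bool vmbus_offer_allowed(const struct vmbus_channel_offer_channel *offer)
{
	if (!is_confidential_ring_buffer(offer) &&
	    !is_confidential_external_memory(offer))
		return true;

	return vmbus_proto_version >= VERSION_WIN_COPPER &&
	       vmbus_is_confidential();
}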



* [PATCH hyperv-next 3/6] arch: hyperv: Get/set SynIC synth.registers via paravisor
  2025-04-09  0:08 [PATCH hyperv-next 0/6] Confidential VMBus Roman Kisel
  2025-04-09  0:08 ` [PATCH hyperv-next 1/6] Documentation: hyperv: " Roman Kisel
  2025-04-09  0:08 ` [PATCH hyperv-next 2/6] drivers: hyperv: VMBus protocol version 6.0 Roman Kisel
@ 2025-04-09  0:08 ` Roman Kisel
  2025-04-09  0:08 ` [PATCH hyperv-next 4/6] arch: x86, drivers: hyperv: Enable confidential VMBus Roman Kisel
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09  0:08 UTC (permalink / raw)
  To: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut

The confidential VMBus is built on the guest talking only to the
paravisor.

Provide functions that allow manipulating the SynIC registers via the
paravisor.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/arm64/hyperv/mshyperv.c      | 19 +++++++++++++++++++
 arch/arm64/include/asm/mshyperv.h |  3 +++
 arch/x86/include/asm/mshyperv.h   |  3 +++
 arch/x86/kernel/cpu/mshyperv.c    | 28 ++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+)

diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
index 4e27cc29c79e..1647a57a3379 100644
--- a/arch/arm64/hyperv/mshyperv.c
+++ b/arch/arm64/hyperv/mshyperv.c
@@ -91,3 +91,22 @@ bool hv_is_hyperv_initialized(void)
 	return hyperv_initialized;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
+
+/*
+ * Not supported yet.
+ */
+u64 hv_pv_get_synic_register(unsigned int reg, int *err)
+{
+	*err = -ENODEV;
+	return ~0ULL;
+}
+EXPORT_SYMBOL_GPL(hv_pv_get_synic_register);
+
+/*
+ * Not supported yet.
+ */
+int hv_pv_set_synic_register(unsigned int reg, u64 val)
+{
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(hv_pv_set_synic_register);
diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index b721d3134ab6..bce37a58dff0 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -53,6 +53,9 @@ static inline u64 hv_get_non_nested_msr(unsigned int reg)
 	return hv_get_msr(reg);
 }
 
+u64 hv_pv_get_synic_register(unsigned int reg, int *err);
+int hv_pv_set_synic_register(unsigned int reg, u64 val);
+
 /* SMCCC hypercall parameters */
 #define HV_SMCCC_FUNC_NUMBER	1
 #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 07aadf0e839f..5650a16cc5e8 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -307,6 +307,9 @@ static __always_inline u64 hv_raw_get_msr(unsigned int reg)
 	return __rdmsr(reg);
 }
 
+u64 hv_pv_get_synic_register(unsigned int reg, int *err);
+int hv_pv_set_synic_register(unsigned int reg, u64 val);
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 3e2533954675..4f6e3d02f730 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -89,6 +89,34 @@ void hv_set_non_nested_msr(unsigned int reg, u64 value)
 }
 EXPORT_SYMBOL_GPL(hv_set_non_nested_msr);
 
+/*
+ * Not every paravisor supports getting SynIC registers, and
+ * this function may fail. The caller has to make sure that this function
+ * runs on the CPU of interest.
+ */
+u64 hv_pv_get_synic_register(unsigned int reg, int *err)
+{
+	if (!hv_is_synic_msr(reg)) {
+		*err = -ENODEV;
+		return ~0ULL;
+	}
+	return native_read_msr_safe(reg, err);
+}
+EXPORT_SYMBOL_GPL(hv_pv_get_synic_register);
+
+/*
+ * Not every paravisor supports setting SynIC registers, and
+ * this function may fail. The caller has to make sure that this function
+ * runs on the CPU of interest.
+ */
+int hv_pv_set_synic_register(unsigned int reg, u64 val)
+{
+	if (!hv_is_synic_msr(reg))
+		return -ENODEV;
+	return wrmsrl_safe(reg, val);
+}
+EXPORT_SYMBOL_GPL(hv_pv_set_synic_register);
+
 u64 hv_get_msr(unsigned int reg)
 {
 	if (hv_nested)
-- 
2.43.0
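
For illustration, a read-modify-write of a SynIC register through these
helpers might look like the sketch below; masking a SINT is just an
example operation, and the error propagation matters because not every
paravisor supports these accesses:

/* Sketch: mask a SynIC SINT via the paravisor, propagating errors. */
static int hv_pv_mask_sint(unsigned int sint_reg)
{
	union hv_synic_sint sint;
	int err;

	sint.as_uint64 = hv_pv_get_synic_register(sint_reg, &err);
	if (err)
		return err;

	sint.masked = true;
	return hv_pv_set_synic_register(sint_reg, sint.as_uint64);
}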



* [PATCH hyperv-next 4/6] arch: x86, drivers: hyperv: Enable confidential VMBus
  2025-04-09  0:08 [PATCH hyperv-next 0/6] Confidential VMBus Roman Kisel
                   ` (2 preceding siblings ...)
  2025-04-09  0:08 ` [PATCH hyperv-next 3/6] arch: hyperv: Get/set SynIC synth.registers via paravisor Roman Kisel
@ 2025-04-09  0:08 ` Roman Kisel
  2025-04-09  0:08 ` [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer Roman Kisel
  2025-04-09  0:08 ` [PATCH hyperv-next 6/6] drivers: SCSI: Do not bounce-buffer for the confidential VMBus Roman Kisel
  5 siblings, 0 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09  0:08 UTC (permalink / raw)
  To: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut

Confidential VMBus employs the paravisor SynIC pages to implement
the control plane of the protocol, and the data plane may use
encrypted pages.

Implement scanning of the additional pages in the control plane,
and update the logic to not unconditionally decrypt the ring buffers
and GPADLs (GPA descriptor lists).

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/x86/kernel/cpu/mshyperv.c |  23 +-
 drivers/hv/channel.c           |  36 +--
 drivers/hv/channel_mgmt.c      |  29 +-
 drivers/hv/connection.c        |  10 +-
 drivers/hv/hv.c                | 485 ++++++++++++++++++++++++---------
 drivers/hv/hyperv_vmbus.h      |   9 +-
 drivers/hv/ring_buffer.c       |   5 +-
 drivers/hv/vmbus_drv.c         | 140 +++++-----
 8 files changed, 518 insertions(+), 219 deletions(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4f6e3d02f730..4163bc24269e 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -28,6 +28,7 @@
 #include <asm/apic.h>
 #include <asm/timer.h>
 #include <asm/reboot.h>
+#include <asm/msr.h>
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/numa.h>
@@ -77,14 +78,28 @@ EXPORT_SYMBOL_GPL(hv_get_non_nested_msr);
 
 void hv_set_non_nested_msr(unsigned int reg, u64 value)
 {
+	if (reg == HV_X64_MSR_EOM && vmbus_is_confidential()) {
+		/* Reach out to the paravisor. */
+		native_wrmsrl(reg, value);
+		return;
+	}
+
 	if (hv_is_synic_msr(reg) && ms_hyperv.paravisor_present) {
+		/* The hypervisor will get the intercept. */
 		hv_ivm_msr_write(reg, value);
 
-		/* Write proxy bit via wrmsl instruction */
-		if (hv_is_sint_msr(reg))
-			wrmsrl(reg, value | 1 << 20);
+		if (hv_is_sint_msr(reg)) {
+			/*
+			 * Write the proxy bit in the case of a non-confidential VMBus
+			 * control plane. Use the wrmsrl instruction so that the
+			 * following write goes to the paravisor.
+			 */
+			u32 proxy = 1 & !vmbus_is_confidential();
+
+			value |= (proxy << 20);
+			native_wrmsrl(reg, value);
+		}
 	} else {
-		wrmsrl(reg, value);
+		native_wrmsrl(reg, value);
 	}
 }
 EXPORT_SYMBOL_GPL(hv_set_non_nested_msr);
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index fb8cd8469328..ef540b72f6ea 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -443,20 +443,23 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 		return ret;
 	}
 
-	/*
-	 * Set the "decrypted" flag to true for the set_memory_decrypted()
-	 * success case. In the failure case, the encryption state of the
-	 * memory is unknown. Leave "decrypted" as true to ensure the
-	 * memory will be leaked instead of going back on the free list.
-	 */
-	gpadl->decrypted = true;
-	ret = set_memory_decrypted((unsigned long)kbuffer,
-				   PFN_UP(size));
-	if (ret) {
-		dev_warn(&channel->device_obj->device,
-			 "Failed to set host visibility for new GPADL %d.\n",
-			 ret);
-		return ret;
+	if ((!channel->confidential_external_memory && type == HV_GPADL_BUFFER) ||
+		(!channel->confidential_ring_buffer && type == HV_GPADL_RING)) {
+		/*
+		 * Set the "decrypted" flag to true for the set_memory_decrypted()
+		 * success case. In the failure case, the encryption state of the
+		 * memory is unknown. Leave "decrypted" as true to ensure the
+		 * memory will be leaked instead of going back on the free list.
+		 */
+		gpadl->decrypted = true;
+		ret = set_memory_decrypted((unsigned long)kbuffer,
+					PFN_UP(size));
+		if (ret) {
+			dev_warn(&channel->device_obj->device,
+				"Failed to set host visibility for new GPADL %d.\n",
+				ret);
+			return ret;
+		}
 	}
 
 	init_completion(&msginfo->waitevent);
@@ -676,12 +679,13 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 		goto error_clean_ring;
 
 	err = hv_ringbuffer_init(&newchannel->outbound,
-				 page, send_pages, 0);
+				 page, send_pages, 0, newchannel->confidential_ring_buffer);
 	if (err)
 		goto error_free_gpadl;
 
 	err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
-				 recv_pages, newchannel->max_pkt_size);
+				 recv_pages, newchannel->max_pkt_size,
+				 newchannel->confidential_ring_buffer);
 	if (err)
 		goto error_free_gpadl;
 
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 6e084c207414..39c8b80d967f 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -843,14 +843,14 @@ static void vmbus_wait_for_unload(void)
 				= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 			/*
-			 * In a CoCo VM the synic_message_page is not allocated
+			 * In a CoCo VM the hv_synic_message_page is not allocated
 			 * in hv_synic_alloc(). Instead it is set/cleared in
 			 * hv_synic_enable_regs() and hv_synic_disable_regs()
 			 * such that it is set only when the CPU is online. If
 			 * not all present CPUs are online, the message page
 			 * might be NULL, so skip such CPUs.
 			 */
-			page_addr = hv_cpu->synic_message_page;
+			page_addr = hv_cpu->hv_synic_message_page;
 			if (!page_addr)
 				continue;
 
@@ -891,7 +891,7 @@ static void vmbus_wait_for_unload(void)
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
-		page_addr = hv_cpu->synic_message_page;
+		page_addr = hv_cpu->hv_synic_message_page;
 		if (!page_addr)
 			continue;
 
@@ -1021,6 +1021,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 	struct vmbus_channel_offer_channel *offer;
 	struct vmbus_channel *oldchannel, *newchannel;
 	size_t offer_sz;
+	bool confidential_ring_buffer, confidential_external_memory;
 
 	offer = (struct vmbus_channel_offer_channel *)hdr;
 
@@ -1033,6 +1034,18 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 		return;
 	}
 
+	confidential_ring_buffer = is_confidential_ring_buffer(offer);
+	if (confidential_ring_buffer) {
+		if (vmbus_proto_version < VERSION_WIN_COPPER || !vmbus_is_confidential())
+			return;
+	}
+
+	confidential_external_memory = is_confidential_external_memory(offer);
+	if (confidential_external_memory) {
+		if (vmbus_proto_version < VERSION_WIN_COPPER || !vmbus_is_confidential())
+			return;
+	}
+
 	oldchannel = find_primary_channel_by_offer(offer);
 
 	if (oldchannel != NULL) {
@@ -1069,6 +1082,14 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 
 		atomic_dec(&vmbus_connection.offer_in_progress);
 
+		if ((oldchannel->confidential_ring_buffer && !confidential_ring_buffer) ||
+				(oldchannel->confidential_external_memory &&
+				!confidential_external_memory)) {
+			pr_err_ratelimited("Offer %d changes the confidential state\n",
+				offer->child_relid);
+			return;
+		}
+
 		WARN_ON(oldchannel->offermsg.child_relid != INVALID_RELID);
 		/* Fix up the relid. */
 		oldchannel->offermsg.child_relid = offer->child_relid;
@@ -1111,6 +1132,8 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 		pr_err("Unable to allocate channel object\n");
 		return;
 	}
+	newchannel->confidential_ring_buffer = confidential_ring_buffer;
+	newchannel->confidential_external_memory = confidential_external_memory;
 
 	vmbus_setup_channel_state(newchannel, offer);
 
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 8351360bba16..268b7d58b45b 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -51,7 +51,8 @@ EXPORT_SYMBOL_GPL(vmbus_proto_version);
  * Linux guests and are not listed.
  */
 static __u32 vmbus_versions[] = {
-	VERSION_WIN10_V5_3,
+	VERSION_WIN_COPPER,
+	VERSION_WIN_IRON,
 	VERSION_WIN10_V5_2,
 	VERSION_WIN10_V5_1,
 	VERSION_WIN10_V5,
@@ -65,7 +66,7 @@ static __u32 vmbus_versions[] = {
  * Maximal VMBus protocol version guests can negotiate.  Useful to cap the
  * VMBus version for testing and debugging purpose.
  */
-static uint max_version = VERSION_WIN10_V5_3;
+static uint max_version = VERSION_WIN_COPPER;
 
 module_param(max_version, uint, S_IRUGO);
 MODULE_PARM_DESC(max_version,
@@ -105,6 +106,11 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 		vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
 	}
 
+	if (vmbus_is_confidential() && version >= VERSION_WIN_COPPER)
+		msg->feature_flags = VMBUS_FEATURE_FLAG_CONFIDENTIAL_CHANNELS;
+	else
+		msg->feature_flags = 0;
+
 	/*
 	 * shared_gpa_boundary is zero in non-SNP VMs, so it's safe to always
 	 * bitwise OR it
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 308c8f279df8..2f84e94635e3 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -74,7 +74,7 @@ int hv_post_message(union hv_connection_id connection_id,
 	aligned_msg->payload_size = payload_size;
 	memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-	if (ms_hyperv.paravisor_present) {
+	if (ms_hyperv.paravisor_present && !vmbus_is_confidential()) {
 		if (hv_isolation_type_tdx())
 			status = hv_tdx_hypercall(HVCALL_POST_MESSAGE,
 						  virt_to_phys(aligned_msg), 0);
@@ -94,11 +94,135 @@ int hv_post_message(union hv_connection_id connection_id,
 	return hv_result(status);
 }
 
+enum hv_page_encryption_action {
+	HV_PAGE_ENC_DEFAULT,
+	HV_PAGE_ENC_ENCRYPT,
+	HV_PAGE_ENC_DECRYPT
+};
+
+static int hv_alloc_page(unsigned int cpu, void **page, enum hv_page_encryption_action enc_action,
+	const char *note)
+{
+	int ret = 0;
+
+	pr_debug("allocating %s\n", note);
+
+	/*
+	 * After the page changes its encryption status, its contents will
+	 * appear scrambled. Thus `get_zeroed_page` would zero the page out
+	 * in vain; we do that ourselves exactly once instead.
+	 *
+	 * The function might be called from contexts where sleeping is very
+	 * bad (like hotplug callbacks) or not possible (interrupt handling),
+	 * thus we request `GFP_ATOMIC`.
+	 *
+	 * The page order is 0 as we need 1 page and log_2 (1) = 0.
+	 */
+	*page = (void *)__get_free_pages(GFP_ATOMIC, 0);
+	if (!*page)
+		return -ENOMEM;
+
+	pr_debug("allocated %s\n", note);
+
+	switch (enc_action) {
+	case HV_PAGE_ENC_ENCRYPT:
+		ret = set_memory_encrypted((unsigned long)*page, 1);
+		break;
+	case HV_PAGE_ENC_DECRYPT:
+		ret = set_memory_decrypted((unsigned long)*page, 1);
+		break;
+	case HV_PAGE_ENC_DEFAULT:
+		break;
+	default:
+		pr_warn("unknown page encryption action %d for %s\n", enc_action, note);
+		break;
+	}
+
+	if (ret)
+		goto failed;
+
+	memset(*page, 0, PAGE_SIZE);
+	return 0;
+
+failed:
+
+	pr_err("page encryption action %d failed for %s, error %d when allocating the page\n",
+		enc_action, note, ret);
+	free_page((unsigned long)*page);
+	*page = NULL;
+	return ret;
+}
+
+static int hv_free_page(void **page, enum hv_page_encryption_action enc_action,
+	const char *note)
+{
+	int ret = 0;
+
+	pr_debug("freeing %s\n", note);
+
+	if (!page)
+		return 0;
+	if (!*page)
+		return 0;
+
+	switch (enc_action) {
+	case HV_PAGE_ENC_ENCRYPT:
+		ret = set_memory_encrypted((unsigned long)*page, 1);
+		break;
+	case HV_PAGE_ENC_DECRYPT:
+		ret = set_memory_decrypted((unsigned long)*page, 1);
+		break;
+	case HV_PAGE_ENC_DEFAULT:
+		break;
+	default:
+		pr_warn("unknown page encryption action %d for %s page\n",
+			enc_action, note);
+		break;
+	}
+
+	/*
+	 * If the action fails, the page is leaked: something is wrong,
+	 * and we prefer to lose the page and stay afloat.
+	 */
+	if (ret) {
+		pr_err("page encryption action %d failed for %s, error %d when freeing\n",
+			enc_action, note, ret);
+	} else {
+		pr_debug("freed %s\n", note);
+		free_page((unsigned long)*page);
+	}
+
+	*page = NULL;
+
+	return ret;
+}
+
+static bool hv_should_allocate_post_msg_page(void)
+{
+	return ms_hyperv.paravisor_present && hv_isolation_type_tdx();
+}
+
+static bool hv_should_allocate_synic_pages(void)
+{
+	return !ms_hyperv.paravisor_present && !hv_root_partition();
+}
+
+static bool hv_should_allocate_pv_synic_pages(void)
+{
+	return vmbus_is_confidential();
+}
+
 int hv_synic_alloc(void)
 {
 	int cpu, ret = -ENOMEM;
 	struct hv_per_cpu_context *hv_cpu;
 
+	const bool allocate_post_msg_page = hv_should_allocate_post_msg_page();
+	const bool allocate_synic_pages = hv_should_allocate_synic_pages();
+	const bool allocate_pv_synic_pages = hv_should_allocate_pv_synic_pages();
+	const enum hv_page_encryption_action enc_action =
+		(!vmbus_is_confidential()) ? HV_PAGE_ENC_DECRYPT : HV_PAGE_ENC_DEFAULT;
+
 	/*
 	 * First, zero all per-cpu memory areas so hv_synic_free() can
 	 * detect what memory has been allocated and cleanup properly
@@ -122,74 +246,38 @@ int hv_synic_alloc(void)
 		tasklet_init(&hv_cpu->msg_dpc,
 			     vmbus_on_msg_dpc, (unsigned long)hv_cpu);
 
-		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
-			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->post_msg_page) {
-				pr_err("Unable to allocate post msg page\n");
+		if (allocate_post_msg_page) {
+			ret = hv_alloc_page(cpu, &hv_cpu->post_msg_page,
+				enc_action, "post msg page");
+			if (ret)
 				goto err;
-			}
-
-			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
-			if (ret) {
-				pr_err("Failed to decrypt post msg page: %d\n", ret);
-				/* Just leak the page, as it's unsafe to free the page. */
-				hv_cpu->post_msg_page = NULL;
-				goto err;
-			}
-
-			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
 		}
 
 		/*
-		 * Synic message and event pages are allocated by paravisor.
-		 * Skip these pages allocation here.
+		 * If these SynIC pages are not allocated, SIEF and SIM pages
+		 * are configured using what the root partition or the paravisor
+		 * provides upon reading the SIEFP and SIMP registers.
 		 */
-		if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
-			hv_cpu->synic_message_page =
-				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->synic_message_page) {
-				pr_err("Unable to allocate SYNIC message page\n");
+		if (allocate_synic_pages) {
+			ret = hv_alloc_page(cpu, &hv_cpu->hv_synic_message_page,
+				enc_action, "SynIC msg page");
+			if (ret)
 				goto err;
-			}
-
-			hv_cpu->synic_event_page =
-				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->synic_event_page) {
-				pr_err("Unable to allocate SYNIC event page\n");
-
-				free_page((unsigned long)hv_cpu->synic_message_page);
-				hv_cpu->synic_message_page = NULL;
+			ret = hv_alloc_page(cpu, &hv_cpu->hv_synic_event_page,
+				enc_action, "SynIC event page");
+			if (ret)
 				goto err;
-			}
 		}
 
-		if (!ms_hyperv.paravisor_present &&
-		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
-			ret = set_memory_decrypted((unsigned long)
-				hv_cpu->synic_message_page, 1);
-			if (ret) {
-				pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
-				hv_cpu->synic_message_page = NULL;
-
-				/*
-				 * Free the event page here so that hv_synic_free()
-				 * won't later try to re-encrypt it.
-				 */
-				free_page((unsigned long)hv_cpu->synic_event_page);
-				hv_cpu->synic_event_page = NULL;
+		if (allocate_pv_synic_pages) {
+			ret = hv_alloc_page(cpu, &hv_cpu->pv_synic_message_page,
+				HV_PAGE_ENC_DEAFULT, "pv SynIC msg page");
+			if (ret)
 				goto err;
-			}
-
-			ret = set_memory_decrypted((unsigned long)
-				hv_cpu->synic_event_page, 1);
-			if (ret) {
-				pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
-				hv_cpu->synic_event_page = NULL;
+			ret = hv_alloc_page(cpu, &hv_cpu->pv_synic_event_page,
+				HV_PAGE_ENC_DEAFULT, "pv SynIC event page");
+			if (ret)
 				goto err;
-			}
-
-			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
-			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
 		}
 	}
 
@@ -205,55 +293,38 @@ int hv_synic_alloc(void)
 
 void hv_synic_free(void)
 {
-	int cpu, ret;
+	int cpu;
+
+	const bool free_post_msg_page = hv_should_allocate_post_msg_page();
+	const bool free_synic_pages = hv_should_allocate_synic_pages();
+	const bool free_pv_synic_pages = hv_should_allocate_pv_synic_pages();
 
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu =
 			per_cpu_ptr(hv_context.cpu_context, cpu);
 
-		/* It's better to leak the page if the encryption fails. */
-		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
-			if (hv_cpu->post_msg_page) {
-				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->post_msg_page, 1);
-				if (ret) {
-					pr_err("Failed to encrypt post msg page: %d\n", ret);
-					hv_cpu->post_msg_page = NULL;
-				}
-			}
+		if (free_post_msg_page)
+			hv_free_page(&hv_cpu->post_msg_page,
+				HV_PAGE_ENC_ENCRYPT, "post msg page");
+		if (free_synic_pages) {
+			hv_free_page(&hv_cpu->hv_synic_event_page,
+				HV_PAGE_ENC_ENCRYPT, "SynIC event page");
+			hv_free_page(&hv_cpu->hv_synic_message_page,
+				HV_PAGE_ENC_ENCRYPT, "SynIC msg page");
 		}
-
-		if (!ms_hyperv.paravisor_present &&
-		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
-			if (hv_cpu->synic_message_page) {
-				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->synic_message_page, 1);
-				if (ret) {
-					pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
-					hv_cpu->synic_message_page = NULL;
-				}
-			}
-
-			if (hv_cpu->synic_event_page) {
-				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->synic_event_page, 1);
-				if (ret) {
-					pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
-					hv_cpu->synic_event_page = NULL;
-				}
-			}
+		if (free_pv_synic_pages) {
+			hv_free_page(&hv_cpu->pv_synic_event_page,
+				HV_PAGE_ENC_DEAFULT, "pv SynIC event page");
+			hv_free_page(&hv_cpu->pv_synic_message_page,
+				HV_PAGE_ENC_DEAFULT, "pv SynIC msg page");
 		}
-
-		free_page((unsigned long)hv_cpu->post_msg_page);
-		free_page((unsigned long)hv_cpu->synic_event_page);
-		free_page((unsigned long)hv_cpu->synic_message_page);
 	}
 
 	kfree(hv_context.hv_numa_map);
 }
 
 /*
- * hv_synic_init - Initialize the Synthetic Interrupt Controller.
+ * hv_synic_enable_regs - Initialize the Synthetic Interrupt Controller.
  *
  * If it is already initialized by another entity (ie x2v shim), we need to
  * retrieve the initialized message and event pages.  Otherwise, we create and
@@ -266,7 +337,6 @@ void hv_synic_enable_regs(unsigned int cpu)
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
 	union hv_synic_sint shared_sint;
-	union hv_synic_scontrol sctrl;
 
 	/* Setup the Synic's message page */
 	simp.as_uint64 = hv_get_msr(HV_MSR_SIMP);
@@ -276,18 +346,18 @@ void hv_synic_enable_regs(unsigned int cpu)
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
-		hv_cpu->synic_message_page =
-			(void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
-		if (!hv_cpu->synic_message_page)
+		hv_cpu->hv_synic_message_page
+			= (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
+		if (!hv_cpu->hv_synic_message_page)
 			pr_err("Fail to map synic message page.\n");
 	} else {
-		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
+		simp.base_simp_gpa = virt_to_phys(hv_cpu->hv_synic_message_page)
 			>> HV_HYP_PAGE_SHIFT;
 	}
 
 	hv_set_msr(HV_MSR_SIMP, simp.as_uint64);
 
-	/* Setup the Synic's event page */
+	/* Setup the Synic's event page with the hypervisor. */
 	siefp.as_uint64 = hv_get_msr(HV_MSR_SIEFP);
 	siefp.siefp_enabled = 1;
 
@@ -295,12 +365,12 @@ void hv_synic_enable_regs(unsigned int cpu)
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
-		hv_cpu->synic_event_page =
-			(void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
-		if (!hv_cpu->synic_event_page)
+		hv_cpu->hv_synic_event_page
+			= (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
+		if (!hv_cpu->hv_synic_event_page)
 			pr_err("Fail to map synic event page.\n");
 	} else {
-		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
+		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->hv_synic_event_page)
 			>> HV_HYP_PAGE_SHIFT;
 	}
 
@@ -313,8 +383,24 @@ void hv_synic_enable_regs(unsigned int cpu)
 
 	shared_sint.vector = vmbus_interrupt;
 	shared_sint.masked = false;
-	shared_sint.auto_eoi = hv_recommend_using_aeoi();
-	hv_set_msr(HV_MSR_SINT0 + VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
+	/*
+	 * On architectures where Hyper-V doesn't support AEOI (e.g., ARM64),
+	 * it doesn't provide a recommendation flag and AEOI must be disabled.
+	 */
+#ifdef HV_DEPRECATING_AEOI_RECOMMENDED
+	shared_sint.auto_eoi =
+			!(ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED);
+#else
+	shared_sint.auto_eoi = 0;
+#endif
+	hv_set_msr(HV_MSR_SINT0 + VMBUS_MESSAGE_SINT,
+				shared_sint.as_uint64);
+}
+
+static void hv_synic_enable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
 
 	/* Enable the global synic bit */
 	sctrl.as_uint64 = hv_get_msr(HV_MSR_SCONTROL);
@@ -323,13 +409,78 @@ void hv_synic_enable_regs(unsigned int cpu)
 	hv_set_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
 }
 
+/*
+ * The paravisor might not support proxying SynIC, and this
+ * function may fail.
+ */
+static int hv_pv_synic_enable_regs(unsigned int cpu)
+{
+	union hv_synic_simp simp;
+	union hv_synic_siefp siefp;
+
+	int err;
+	struct hv_per_cpu_context *hv_cpu
+		= per_cpu_ptr(hv_context.cpu_context, cpu);
+
+	/* Setup the Synic's message page with the paravisor. */
+	simp.as_uint64 = hv_pv_get_synic_register(HV_MSR_SIMP, &err);
+	if (err)
+		return err;
+	simp.simp_enabled = 1;
+	simp.base_simp_gpa = virt_to_phys(hv_cpu->pv_synic_message_page)
+			>> HV_HYP_PAGE_SHIFT;
+	err = hv_pv_set_synic_register(HV_MSR_SIMP, simp.as_uint64);
+	if (err)
+		return err;
+
+	/* Setup the Synic's event page with the paravisor. */
+	siefp.as_uint64 = hv_pv_get_synic_register(HV_MSR_SIEFP, &err);
+	if (err)
+		return err;
+	siefp.siefp_enabled = 1;
+	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->pv_synic_event_page)
+			>> HV_HYP_PAGE_SHIFT;
+	return hv_pv_set_synic_register(HV_MSR_SIEFP, siefp.as_uint64);
+}
+
+static int hv_pv_synic_enable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
+	int err;
+
+	/* Enable the global synic bit */
+	sctrl.as_uint64 = hv_pv_get_synic_register(HV_MSR_SCONTROL, &err);
+	if (err)
+		return err;
+	sctrl.enable = 1;
+
+	return hv_pv_set_synic_register(HV_MSR_SCONTROL, sctrl.as_uint64);
+}
+
 int hv_synic_init(unsigned int cpu)
 {
+	int err = 0;
+
+	/*
+	 * The paravisor may not support the confidential VMBus;
+	 * check on that first.
+	 */
+	if (vmbus_is_confidential())
+		err = hv_pv_synic_enable_regs(cpu);
+	if (err)
+		return err;
+
 	hv_synic_enable_regs(cpu);
+	if (!vmbus_is_confidential())
+		hv_synic_enable_interrupts();
+	else
+		err = hv_pv_synic_enable_interrupts();
+	if (err)
+		return err;
 
 	hv_stimer_legacy_init(cpu, VMBUS_MESSAGE_SINT);
 
-	return 0;
+	return err;
 }
 
 void hv_synic_disable_regs(unsigned int cpu)
@@ -339,7 +490,6 @@ void hv_synic_disable_regs(unsigned int cpu)
 	union hv_synic_sint shared_sint;
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
-	union hv_synic_scontrol sctrl;
 
 	shared_sint.as_uint64 = hv_get_msr(HV_MSR_SINT0 + VMBUS_MESSAGE_SINT);
 
@@ -358,8 +508,8 @@ void hv_synic_disable_regs(unsigned int cpu)
 	 */
 	simp.simp_enabled = 0;
 	if (ms_hyperv.paravisor_present || hv_root_partition()) {
-		iounmap(hv_cpu->synic_message_page);
-		hv_cpu->synic_message_page = NULL;
+		iounmap(hv_cpu->hv_synic_message_page);
+		hv_cpu->hv_synic_message_page = NULL;
 	} else {
 		simp.base_simp_gpa = 0;
 	}
@@ -370,43 +520,97 @@ void hv_synic_disable_regs(unsigned int cpu)
 	siefp.siefp_enabled = 0;
 
 	if (ms_hyperv.paravisor_present || hv_root_partition()) {
-		iounmap(hv_cpu->synic_event_page);
-		hv_cpu->synic_event_page = NULL;
+		iounmap(hv_cpu->hv_synic_event_page);
+		hv_cpu->hv_synic_event_page = NULL;
 	} else {
 		siefp.base_siefp_gpa = 0;
 	}
 
 	hv_set_msr(HV_MSR_SIEFP, siefp.as_uint64);
+}
+
+static void hv_synic_disable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
 
 	/* Disable the global synic bit */
 	sctrl.as_uint64 = hv_get_msr(HV_MSR_SCONTROL);
 	sctrl.enable = 0;
 	hv_set_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
+}
 
+static void hv_vmbus_disable_percpu_interrupts(void)
+{
 	if (vmbus_irq != -1)
 		disable_percpu_irq(vmbus_irq);
 }
 
+static void hv_pv_synic_disable_regs(unsigned int cpu)
+{
+	/*
+	 * The get/set register errors are deliberately ignored in
+	 * the cleanup path as they are inconsequential here.
+	 */
+	int err;
+	union hv_synic_simp simp;
+	union hv_synic_siefp siefp;
+
+	struct hv_per_cpu_context *hv_cpu
+		= per_cpu_ptr(hv_context.cpu_context, cpu);
+
+	/* Disable SynIC's message page in the paravisor. */
+	simp.as_uint64 = hv_pv_get_synic_register(HV_MSR_SIMP, &err);
+	if (err)
+		return;
+	simp.simp_enabled = 0;
+
+	iounmap(hv_cpu->pv_synic_message_page);
+	hv_cpu->pv_synic_message_page = NULL;
+
+	err = hv_pv_set_synic_register(HV_MSR_SIMP, simp.as_uint64);
+	if (err)
+		return;
+
+	/* Disable SynIC's event page in the paravisor. */
+	siefp.as_uint64 = hv_pv_get_synic_register(HV_MSR_SIEFP, &err);
+	if (err)
+		return;
+	siefp.siefp_enabled = 0;
+
+	iounmap(hv_cpu->pv_synic_event_page);
+	hv_cpu->pv_synic_event_page = NULL;
+
+	hv_pv_set_synic_register(HV_MSR_SIEFP, siefp.as_uint64);
+}
+
+static void hv_pv_synic_disable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
+	int err;
+
+	/* Disable the global synic bit */
+	sctrl.as_uint64 = hv_pv_get_synic_register(HV_MSR_SCONTROL, &err);
+	if (err)
+		return;
+	sctrl.enable = 0;
+	hv_pv_set_synic_register(HV_MSR_SCONTROL, sctrl.as_uint64);
+}
+
 #define HV_MAX_TRIES 3
-/*
- * Scan the event flags page of 'this' CPU looking for any bit that is set.  If we find one
- * bit set, then wait for a few milliseconds.  Repeat these steps for a maximum of 3 times.
- * Return 'true', if there is still any set bit after this operation; 'false', otherwise.
- *
- * If a bit is set, that means there is a pending channel interrupt.  The expectation is
- * that the normal interrupt handling mechanism will find and process the channel interrupt
- * "very soon", and in the process clear the bit.
- */
-static bool hv_synic_event_pending(void)
+
+static bool hv_synic_event_pending_for(union hv_synic_event_flags *event, int sint)
 {
-	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
-	union hv_synic_event_flags *event =
-		(union hv_synic_event_flags *)hv_cpu->synic_event_page + VMBUS_MESSAGE_SINT;
-	unsigned long *recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
+	unsigned long *recv_int_page;
 	bool pending;
 	u32 relid;
-	int tries = 0;
+	int tries;
+
+	if (!event)
+		return false;
 
+	tries = 0;
+	event += sint;
+	recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
 retry:
 	pending = false;
 	for_each_set_bit(relid, recv_int_page, HV_EVENT_FLAGS_COUNT) {
@@ -460,6 +664,26 @@ static int hv_pick_new_cpu(struct vmbus_channel *channel)
 /*
  * hv_synic_cleanup - Cleanup routine for hv_synic_init().
  */
+/*
+ * Scan the event flags page of 'this' CPU looking for any bit that is set.  If we find one
+ * bit set, then wait for a few milliseconds.  Repeat these steps for a maximum of 3 times.
+ * Return 'true', if there is still any set bit after this operation; 'false', otherwise.
+ *
+ * If a bit is set, that means there is a pending channel interrupt.  The expectation is
+ * that the normal interrupt handling mechanism will find and process the channel interrupt
+ * "very soon", and in the process clear the bit.
+ */
+static bool hv_synic_event_pending(void)
+{
+	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
+	union hv_synic_event_flags *hv_synic_event_page = hv_cpu->hv_synic_event_page;
+	union hv_synic_event_flags *pv_synic_event_page = hv_cpu->pv_synic_event_page;
+
+	return
+		hv_synic_event_pending_for(hv_synic_event_page, VMBUS_MESSAGE_SINT) ||
+		hv_synic_event_pending_for(pv_synic_event_page, VMBUS_MESSAGE_SINT);
+}
+
 int hv_synic_cleanup(unsigned int cpu)
 {
 	struct vmbus_channel *channel, *sc;
@@ -516,6 +740,13 @@ int hv_synic_cleanup(unsigned int cpu)
 	hv_stimer_legacy_cleanup(cpu);
 
 	hv_synic_disable_regs(cpu);
+	if (vmbus_is_confidential())
+		hv_pv_synic_disable_regs(cpu);
+	if (!vmbus_is_confidential())
+		hv_synic_disable_interrupts();
+	else
+		hv_pv_synic_disable_interrupts();
+	hv_vmbus_disable_percpu_interrupts();
 
 	return ret;
 }
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 29780f3a7478..9337e0afa3ce 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -120,8 +120,10 @@ enum {
  * Per cpu state for channel handling
  */
 struct hv_per_cpu_context {
-	void *synic_message_page;
-	void *synic_event_page;
+	void *hv_synic_message_page;
+	void *hv_synic_event_page;
+	void *pv_synic_message_page;
+	void *pv_synic_event_page;
 
 	/*
 	 * The page is only used in hv_post_message() for a TDX VM (with the
@@ -182,7 +184,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-		       struct page *pages, u32 pagecnt, u32 max_pkt_size);
+		       struct page *pages, u32 pagecnt, u32 max_pkt_size,
+			   bool confidential);
 
 void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
 
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 3c9b02471760..05c2cd42fc75 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -183,7 +183,8 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
 
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
+		       struct page *pages, u32 page_cnt, u32 max_pkt_size,
+			   bool confidential)
 {
 	struct page **pages_wraparound;
 	int i;
@@ -207,7 +208,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 
 	ring_info->ring_buffer = (struct hv_ring_buffer *)
 		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
-			pgprot_decrypted(PAGE_KERNEL));
+			confidential ? PAGE_KERNEL : pgprot_decrypted(PAGE_KERNEL));
 
 	kfree(pages_wraparound);
 	if (!ring_info->ring_buffer)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index fa3ad6fe0bec..2557ec849fa6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1027,12 +1027,9 @@ static void vmbus_onmessage_work(struct work_struct *work)
 	kfree(ctx);
 }
 
-void vmbus_on_msg_dpc(unsigned long data)
+static void vmbus_on_msg_dpc_for(void *message_page_addr)
 {
-	struct hv_per_cpu_context *hv_cpu = (void *)data;
-	void *page_addr = hv_cpu->synic_message_page;
-	struct hv_message msg_copy, *msg = (struct hv_message *)page_addr +
-				  VMBUS_MESSAGE_SINT;
+	struct hv_message msg_copy, *msg;
 	struct vmbus_channel_message_header *hdr;
 	enum vmbus_channel_message_type msgtype;
 	const struct vmbus_channel_message_table_entry *entry;
@@ -1040,6 +1037,10 @@ void vmbus_on_msg_dpc(unsigned long data)
 	__u8 payload_size;
 	u32 message_type;
 
+	if (!message_page_addr)
+		return;
+	msg = (struct hv_message *)message_page_addr + VMBUS_MESSAGE_SINT;
+
 	/*
 	 * 'enum vmbus_channel_message_type' is supposed to always be 'u32' as
 	 * it is being used in 'struct vmbus_channel_message_header' definition
@@ -1165,6 +1166,14 @@ void vmbus_on_msg_dpc(unsigned long data)
 	vmbus_signal_eom(msg, message_type);
 }
 
+void vmbus_on_msg_dpc(unsigned long data)
+{
+	struct hv_per_cpu_context *hv_cpu = (void *)data;
+
+	vmbus_on_msg_dpc_for(hv_cpu->hv_synic_message_page);
+	vmbus_on_msg_dpc_for(hv_cpu->pv_synic_message_page);
+}
+
 #ifdef CONFIG_PM_SLEEP
 /*
  * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
@@ -1203,21 +1212,19 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 #endif /* CONFIG_PM_SLEEP */
 
 /*
- * Schedule all channels with events pending
+ * Schedule all channels with events pending.
+ * The event page can be directly checked to get the id of
+ * the channel that has the interrupt pending.
  */
-static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
+static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu, void *event_page_addr)
 {
 	unsigned long *recv_int_page;
 	u32 maxbits, relid;
+	union hv_synic_event_flags *event;
 
-	/*
-	 * The event page can be directly checked to get the id of
-	 * the channel that has the interrupt pending.
-	 */
-	void *page_addr = hv_cpu->synic_event_page;
-	union hv_synic_event_flags *event
-		= (union hv_synic_event_flags *)page_addr +
-					 VMBUS_MESSAGE_SINT;
+	if (!event_page_addr)
+		return;
+	event = (union hv_synic_event_flags *)event_page_addr + VMBUS_MESSAGE_SINT;
 
 	maxbits = HV_EVENT_FLAGS_COUNT;
 	recv_int_page = event->flags;
@@ -1288,26 +1295,35 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 	}
 }
 
-static void vmbus_isr(void)
+static void vmbus_message_sched(struct hv_per_cpu_context *hv_cpu, void *message_page_addr)
 {
-	struct hv_per_cpu_context *hv_cpu
-		= this_cpu_ptr(hv_context.cpu_context);
-	void *page_addr;
 	struct hv_message *msg;
 
-	vmbus_chan_sched(hv_cpu);
-
-	page_addr = hv_cpu->synic_message_page;
-	msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
+	if (!message_page_addr)
+		return;
+	msg = (struct hv_message *)message_page_addr + VMBUS_MESSAGE_SINT;
 
 	/* Check if there are actual msgs to be processed */
 	if (msg->header.message_type != HVMSG_NONE) {
 		if (msg->header.message_type == HVMSG_TIMER_EXPIRED) {
 			hv_stimer0_isr();
 			vmbus_signal_eom(msg, HVMSG_TIMER_EXPIRED);
-		} else
+		} else {
 			tasklet_schedule(&hv_cpu->msg_dpc);
+		}
 	}
+}
+
+static void vmbus_isr(void)
+{
+	struct hv_per_cpu_context *hv_cpu
+		= this_cpu_ptr(hv_context.cpu_context);
+
+	vmbus_chan_sched(hv_cpu, hv_cpu->hv_synic_event_page);
+	vmbus_chan_sched(hv_cpu, hv_cpu->pv_synic_event_page);
+
+	vmbus_message_sched(hv_cpu, hv_cpu->hv_synic_message_page);
+	vmbus_message_sched(hv_cpu, hv_cpu->pv_synic_message_page);
 
 	add_interrupt_randomness(vmbus_interrupt);
 }
@@ -1318,11 +1334,35 @@ static irqreturn_t vmbus_percpu_isr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-static void vmbus_percpu_work(struct work_struct *work)
+static int vmbus_setup_control_plane(void)
 {
-	unsigned int cpu = smp_processor_id();
+	int ret;
+	int hyperv_cpuhp_online;
+
+	ret = hv_synic_alloc();
+	if (ret < 0)
+		goto err_alloc;
 
-	hv_synic_init(cpu);
+	/*
+	 * Initialize the per-cpu interrupt state and stimer state.
+	 * Then connect to the host.
+	 */
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
+				hv_synic_init, hv_synic_cleanup);
+	if (ret < 0)
+		goto err_alloc;
+	hyperv_cpuhp_online = ret;
+	ret = vmbus_connect();
+	if (ret)
+		goto err_connect;
+	return 0;
+
+err_connect:
+	cpuhp_remove_state(hyperv_cpuhp_online);
+	return -ENODEV;
+err_alloc:
+	hv_synic_free();
+	return -ENOMEM;
 }
 
 /*
@@ -1335,8 +1375,7 @@ static void vmbus_percpu_work(struct work_struct *work)
  */
 static int vmbus_bus_init(void)
 {
-	int ret, cpu;
-	struct work_struct __percpu *works;
+	int ret;
 
 	ret = hv_init();
 	if (ret != 0) {
@@ -1371,41 +1410,21 @@ static int vmbus_bus_init(void)
 		}
 	}
 
-	ret = hv_synic_alloc();
-	if (ret)
-		goto err_alloc;
-
-	works = alloc_percpu(struct work_struct);
-	if (!works) {
-		ret = -ENOMEM;
-		goto err_alloc;
-	}
-
 	/*
-	 * Initialize the per-cpu interrupt state and stimer state.
-	 * Then connect to the host.
+	 * Attempt to establish the confidential control plane first if this VM is
+	 * a hardware confidential VM, and the paravisor is present.
 	 */
-	cpus_read_lock();
-	for_each_online_cpu(cpu) {
-		struct work_struct *work = per_cpu_ptr(works, cpu);
+	ret = -ENODEV;
+	if (ms_hyperv.paravisor_present && (hv_isolation_type_tdx() || hv_isolation_type_snp())) {
+		is_confidential = true;
+		ret = vmbus_setup_control_plane();
+		is_confidential = ret == 0;
 
-		INIT_WORK(work, vmbus_percpu_work);
-		schedule_work_on(cpu, work);
+		pr_info("VMBus control plane is confidential: %d\n", is_confidential);
 	}
 
-	for_each_online_cpu(cpu)
-		flush_work(per_cpu_ptr(works, cpu));
-
-	/* Register the callbacks for possible CPU online/offline'ing */
-	ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
-						   hv_synic_init, hv_synic_cleanup);
-	cpus_read_unlock();
-	free_percpu(works);
-	if (ret < 0)
-		goto err_alloc;
-	hyperv_cpuhp_online = ret;
-
-	ret = vmbus_connect();
+	if (!is_confidential)
+		ret = vmbus_setup_control_plane();
 	if (ret)
 		goto err_connect;
 
@@ -1421,9 +1440,6 @@ static int vmbus_bus_init(void)
 	return 0;
 
 err_connect:
-	cpuhp_remove_state(hyperv_cpuhp_online);
-err_alloc:
-	hv_synic_free();
 	if (vmbus_irq == -1) {
 		hv_remove_vmbus_handler();
 	} else {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09  0:08 [PATCH hyperv-next 0/6] Confidential VMBus Roman Kisel
                   ` (3 preceding siblings ...)
  2025-04-09  0:08 ` [PATCH hyperv-next 4/6] arch: x86, drivers: hyperv: Enable confidential VMBus Roman Kisel
@ 2025-04-09  0:08 ` Roman Kisel
  2025-04-09 10:52   ` Christoph Hellwig
  2025-04-09 16:03   ` Robin Murphy
  2025-04-09  0:08 ` [PATCH hyperv-next 6/6] drivers: SCSI: Do not bounce-buffer for the confidential VMBus Roman Kisel
  5 siblings, 2 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09  0:08 UTC (permalink / raw)
  To: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut

Bounce-buffering makes the system spend more time copying
I/O data. When an I/O transaction takes place between
a confidential and a non-confidential endpoint, there is
no way around it.

Introduce a device bitfield to indicate that the device
doesn't need to perform bounce buffering. The capable
device may employ it to save on copying data around.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/x86/mm/mem_encrypt.c  | 3 +++
 include/linux/device.h     | 8 ++++++++
 include/linux/dma-direct.h | 3 +++
 include/linux/swiotlb.h    | 3 +++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 95bae74fdab2..6349a02a1da3 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -19,6 +19,9 @@
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
+	if (dev->use_priv_pages_for_io)
+		return false;
+
 	/*
 	 * For SEV, all DMA must be to unencrypted addresses.
 	 */
diff --git a/include/linux/device.h b/include/linux/device.h
index 80a5b3268986..4aa4a6fd9580 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -725,6 +725,8 @@ struct device_physical_location {
  * @dma_skip_sync: DMA sync operations can be skipped for coherent buffers.
  * @dma_iommu: Device is using default IOMMU implementation for DMA and
  *		doesn't rely on dma_ops structure.
+ * @use_priv_pages_for_io: Device is using private pages for I/O, no need to
+ *		bounce-buffer.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -843,6 +845,7 @@ struct device {
 #ifdef CONFIG_IOMMU_DMA
 	bool			dma_iommu:1;
 #endif
+	bool			use_priv_pages_for_io:1;
 };
 
 /**
@@ -1079,6 +1082,11 @@ static inline bool dev_removable_is_valid(struct device *dev)
 	return dev->removable != DEVICE_REMOVABLE_NOT_SUPPORTED;
 }
 
+static inline bool dev_priv_pages_for_io(struct device *dev)
+{
+	return dev->use_priv_pages_for_io;
+}
+
 /*
  * High level routines for use by the bus drivers
  */
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index d7e30d4f7503..b096369f847e 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -94,6 +94,9 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
  */
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
+	if (dev_priv_pages_for_io(dev))
+		return phys_to_dma_unencrypted(dev, paddr);
+
 	return __sme_set(phys_to_dma_unencrypted(dev, paddr));
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 3dae0f592063..35ee10641b42 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -173,6 +173,9 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
+	if (dev_priv_pages_for_io(dev))
+		return false;
+
 	return mem && mem->force_bounce;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH hyperv-next 6/6] drivers: SCSI: Do not bounce-buffer for the confidential VMBus
  2025-04-09  0:08 [PATCH hyperv-next 0/6] Confidential VMBus Roman Kisel
                   ` (4 preceding siblings ...)
  2025-04-09  0:08 ` [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer Roman Kisel
@ 2025-04-09  0:08 ` Roman Kisel
  2025-04-09 10:53   ` Christoph Hellwig
  5 siblings, 1 reply; 25+ messages in thread
From: Roman Kisel @ 2025-04-09  0:08 UTC (permalink / raw)
  To: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut

The device bit that indicates that the device is capable of I/O
with private pages lets the Hyper-V SCSI driver avoid excessive
copying.

Set that bit from the channel's confidential external memory flag
to avoid bounce-buffering.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index a8614e54544e..f647f8fc2f8f 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -686,6 +686,8 @@ static void handle_sc_creation(struct vmbus_channel *new_sc)
 	struct vmstorage_channel_properties props;
 	int ret;
 
+	dev->use_priv_pages_for_io = new_sc->confidential_external_memory;
+
 	stor_device = get_out_stor_device(device);
 	if (!stor_device)
 		return;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09  0:08 ` [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer Roman Kisel
@ 2025-04-09 10:52   ` Christoph Hellwig
  2025-04-09 15:27     ` Roman Kisel
  2025-04-09 16:03   ` Robin Murphy
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2025-04-09 10:52 UTC (permalink / raw)
  To: Roman Kisel
  Cc: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut

On Tue, Apr 08, 2025 at 05:08:34PM -0700, Roman Kisel wrote:
> Bounce-buffering makes the system spend more time copying
> I/O data. When an I/O transaction takes place between
> a confidential and a non-confidential endpoint, there is
> no way around it.
> 
> Introduce a device bitfield to indicate that the device
> doesn't need to perform bounce buffering. The capable
> device may employ it to save on copying data around.

I have no idea what this is supposed to mean, you need to explain it
much better.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 6/6] drivers: SCSI: Do not bounce-buffer for the confidential VMBus
  2025-04-09  0:08 ` [PATCH hyperv-next 6/6] drivers: SCSI: Do not bounce-buffer for the confidential VMBus Roman Kisel
@ 2025-04-09 10:53   ` Christoph Hellwig
  2025-04-09 15:36     ` Roman Kisel
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2025-04-09 10:53 UTC (permalink / raw)
  To: Roman Kisel
  Cc: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut

On Tue, Apr 08, 2025 at 05:08:35PM -0700, Roman Kisel wrote:
> The device bit that indicates that the device is capable of I/O
> with private pages lets the Hyper-V SCSI driver avoid excessive
> copying.
> 
> Set that bit from the channel's confidential external memory flag
> to avoid bounce-buffering.

Drivers have absolutely no business telling this.  The need for bounce
buffering or not is a platform/IOMMU decision and not one specific to
a certain device or driver.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09 10:52   ` Christoph Hellwig
@ 2025-04-09 15:27     ` Roman Kisel
  0 siblings, 0 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09 15:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut



On 4/9/2025 3:52 AM, Christoph Hellwig wrote:
> On Tue, Apr 08, 2025 at 05:08:34PM -0700, Roman Kisel wrote:
>> Bounce-buffering makes the system spend more time copying
>> I/O data. When an I/O transaction takes place between
>> a confidential and a non-confidential endpoint, there is
>> no way around it.
>>
>> Introduce a device bitfield to indicate that the device
>> doesn't need to perform bounce buffering. The capable
>> device may employ it to save on copying data around.
> 
> I have no idea what this is supposed to mean, you need to explain it
> much better.

Thanks for reviewing! I'll fix the description.

> 

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 6/6] drivers: SCSI: Do not bounce-buffer for the confidential VMBus
  2025-04-09 10:53   ` Christoph Hellwig
@ 2025-04-09 15:36     ` Roman Kisel
  0 siblings, 0 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09 15:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut



On 4/9/2025 3:53 AM, Christoph Hellwig wrote:
> On Tue, Apr 08, 2025 at 05:08:35PM -0700, Roman Kisel wrote:
>> The device bit that indicates that the device is capable of I/O
>> with private pages lets the Hyper-V SCSI driver avoid excessive
>> copying.
>>
>> Set that bit from the channel's confidential external memory flag
>> to avoid bounce-buffering.
> 
> Drivers have absolutely no business telling this.  The need for bounce
> buffering or not is a platform/IOMMU decision and not one specific to
> a certain device or driver.

It seemed to work, although I cannot claim that nothing is ever going
to break. It did appear from the code that one could have this
per-device bit.

As I understand, you're saying this is architecturally broken. Do you
think a broader set of changes would improve the implementation?

> 

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09  0:08 ` [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer Roman Kisel
  2025-04-09 10:52   ` Christoph Hellwig
@ 2025-04-09 16:03   ` Robin Murphy
  2025-04-09 16:44     ` Roman Kisel
  1 sibling, 1 reply; 25+ messages in thread
From: Robin Murphy @ 2025-04-09 16:03 UTC (permalink / raw)
  To: Roman Kisel, aleksander.lobakin, andriy.shevchenko, arnd, bp,
	catalin.marinas, corbet, dakr, dan.j.williams, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut, Suzuki K Poulose

On 2025-04-09 1:08 am, Roman Kisel wrote:
> Bounce-buffering makes the system spend more time copying
> I/O data. When an I/O transaction takes place between
> a confidential and a non-confidential endpoint, there is
> no way around it.
> 
> Introduce a device bitfield to indicate that the device
> doesn't need to perform bounce buffering. The capable
> device may employ it to save on copying data around.

It's not so much about bounce buffering, it's more fundamentally about 
whether the device is trusted and able to access private memory at all 
or not. And performance is hardly the biggest concern either - if you do 
trust a device to operate on confidential data in private memory, then 
surely it is crucial to actively *prevent* that data ever getting into 
shared SWIOTLB pages where anyone else could also get at it. At worst 
that means CoCo VMs might need an *additional* non-shared SWIOTLB to 
support trusted devices with addressing limitations (and/or 
"swiotlb=force" debugging, potentially).

Also whatever we do for this really wants to tie in with the nascent 
TDISP stuff as well, since we definitely don't want to end up with more 
than one notion of whether a device is in a trusted/locked/private/etc. 
vs. unlocked/shared/etc. state with respect to DMA (or indeed anything 
else if we can avoid it).

Thanks,
Robin.

> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>   arch/x86/mm/mem_encrypt.c  | 3 +++
>   include/linux/device.h     | 8 ++++++++
>   include/linux/dma-direct.h | 3 +++
>   include/linux/swiotlb.h    | 3 +++
>   4 files changed, 17 insertions(+)
> 
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 95bae74fdab2..6349a02a1da3 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -19,6 +19,9 @@
>   /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>   bool force_dma_unencrypted(struct device *dev)
>   {
> +	if (dev->use_priv_pages_for_io)
> +		return false;
> +
>   	/*
>   	 * For SEV, all DMA must be to unencrypted addresses.
>   	 */
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 80a5b3268986..4aa4a6fd9580 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -725,6 +725,8 @@ struct device_physical_location {
>    * @dma_skip_sync: DMA sync operations can be skipped for coherent buffers.
>    * @dma_iommu: Device is using default IOMMU implementation for DMA and
>    *		doesn't rely on dma_ops structure.
> + * @use_priv_pages_for_io: Device is using private pages for I/O, no need to
> + *		bounce-buffer.
>    *
>    * At the lowest level, every device in a Linux system is represented by an
>    * instance of struct device. The device structure contains the information
> @@ -843,6 +845,7 @@ struct device {
>   #ifdef CONFIG_IOMMU_DMA
>   	bool			dma_iommu:1;
>   #endif
> +	bool			use_priv_pages_for_io:1;
>   };
>   
>   /**
> @@ -1079,6 +1082,11 @@ static inline bool dev_removable_is_valid(struct device *dev)
>   	return dev->removable != DEVICE_REMOVABLE_NOT_SUPPORTED;
>   }
>   
> +static inline bool dev_priv_pages_for_io(struct device *dev)
> +{
> +	return dev->use_priv_pages_for_io;
> +}
> +
>   /*
>    * High level routines for use by the bus drivers
>    */
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index d7e30d4f7503..b096369f847e 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -94,6 +94,9 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
>    */
>   static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
>   {
> +	if (dev_priv_pages_for_io(dev))
> +		return phys_to_dma_unencrypted(dev, paddr);
> +
>   	return __sme_set(phys_to_dma_unencrypted(dev, paddr));
>   }
>   
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 3dae0f592063..35ee10641b42 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -173,6 +173,9 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
>   {
>   	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   
> +	if (dev_priv_pages_for_io(dev))
> +		return false;
> +
>   	return mem && mem->force_bounce;
>   }
>   


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09 16:03   ` Robin Murphy
@ 2025-04-09 16:44     ` Roman Kisel
  2025-04-09 23:30       ` Dan Williams
  2025-04-10  7:21       ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-09 16:44 UTC (permalink / raw)
  To: Robin Murphy, aleksander.lobakin, andriy.shevchenko, arnd, bp,
	catalin.marinas, corbet, dakr, dan.j.williams, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut, Suzuki K Poulose



On 4/9/2025 9:03 AM, Robin Murphy wrote:
> On 2025-04-09 1:08 am, Roman Kisel wrote:
>> Bounce-buffering makes the system spend more time copying
>> I/O data. When an I/O transaction takes place between
>> a confidential and a non-confidential endpoint, there is
>> no way around it.
>>
>> Introduce a device bitfield to indicate that the device
>> doesn't need to perform bounce buffering. The capable
>> device may employ it to save on copying data around.
> 
> It's not so much about bounce buffering, it's more fundamentally about 
> whether the device is trusted and able to access private memory at all 
> or not. And performance is hardly the biggest concern either - if you do 
> trust a device to operate on confidential data in private memory, then 
> surely it is crucial to actively *prevent* that data ever getting into 
> shared SWIOTLB pages where anyone else could also get at it. At worst 
> that means CoCo VMs might need an *additional* non-shared SWIOTLB to 
> support trusted devices with addressing limitations (and/or 
> "swiotlb=force" debugging, potentially).

Thanks, I should've highlighted that facet most certainly!

> 
> Also whatever we do for this really wants to tie in with the nascent 
> TDISP stuff as well, since we definitely don't want to end up with more 
> than one notion of whether a device is in a trusted/locked/private/etc. 
> vs. unlocked/shared/etc. state with respect to DMA (or indeed anything 
> else if we can avoid it).

Wouldn't TDISP be per-device as well? In which case, a flag would be
needed just as being added in this patch.

Although, there must be a difference between a device with TDISP where
the flag would be the indication of the feature, and this code where the
driver may flip that back and forth...

Do you feel this is shoehorned in `struct device`? I couldn't find an
appropriate private (== opaque pointer) part in the structure to store
that bit (`struct device_private` wouldn't fit the bill), and it looked like
adding it to the struct itself would do no harm. However, my read of the
room is that folks see that as dubious :)

What would be your opinion on where to store that flag to tie together
its usage in the Hyper-V SCSI and not bounce-buffering?

> 
> Thanks,
> Robin.
> 
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>   arch/x86/mm/mem_encrypt.c  | 3 +++
>>   include/linux/device.h     | 8 ++++++++
>>   include/linux/dma-direct.h | 3 +++
>>   include/linux/swiotlb.h    | 3 +++
>>   4 files changed, 17 insertions(+)
>>
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 95bae74fdab2..6349a02a1da3 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -19,6 +19,9 @@
>>   /* Override for DMA direct allocation check - 
>> ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>>   bool force_dma_unencrypted(struct device *dev)
>>   {
>> +    if (dev->use_priv_pages_for_io)
>> +        return false;
>> +
>>       /*
>>        * For SEV, all DMA must be to unencrypted addresses.
>>        */
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 80a5b3268986..4aa4a6fd9580 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -725,6 +725,8 @@ struct device_physical_location {
>>    * @dma_skip_sync: DMA sync operations can be skipped for coherent 
>> buffers.
>>    * @dma_iommu: Device is using default IOMMU implementation for DMA and
>>    *        doesn't rely on dma_ops structure.
>> + * @use_priv_pages_for_io: Device is using private pages for I/O, no 
>> need to
>> + *        bounce-buffer.
>>    *
>>    * At the lowest level, every device in a Linux system is 
>> represented by an
>>    * instance of struct device. The device structure contains the 
>> information
>> @@ -843,6 +845,7 @@ struct device {
>>   #ifdef CONFIG_IOMMU_DMA
>>       bool            dma_iommu:1;
>>   #endif
>> +    bool            use_priv_pages_for_io:1;
>>   };
>>   /**
>> @@ -1079,6 +1082,11 @@ static inline bool 
>> dev_removable_is_valid(struct device *dev)
>>       return dev->removable != DEVICE_REMOVABLE_NOT_SUPPORTED;
>>   }
>> +static inline bool dev_priv_pages_for_io(struct device *dev)
>> +{
>> +    return dev->use_priv_pages_for_io;
>> +}
>> +
>>   /*
>>    * High level routines for use by the bus drivers
>>    */
>> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
>> index d7e30d4f7503..b096369f847e 100644
>> --- a/include/linux/dma-direct.h
>> +++ b/include/linux/dma-direct.h
>> @@ -94,6 +94,9 @@ static inline dma_addr_t 
>> phys_to_dma_unencrypted(struct device *dev,
>>    */
>>   static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t 
>> paddr)
>>   {
>> +    if (dev_priv_pages_for_io(dev))
>> +        return phys_to_dma_unencrypted(dev, paddr);
>> +
>>       return __sme_set(phys_to_dma_unencrypted(dev, paddr));
>>   }
>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>> index 3dae0f592063..35ee10641b42 100644
>> --- a/include/linux/swiotlb.h
>> +++ b/include/linux/swiotlb.h
>> @@ -173,6 +173,9 @@ static inline bool is_swiotlb_force_bounce(struct 
>> device *dev)
>>   {
>>       struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>> +    if (dev_priv_pages_for_io(dev))
>> +        return false;
>> +
>>       return mem && mem->force_bounce;
>>   }
> 

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09 16:44     ` Roman Kisel
@ 2025-04-09 23:30       ` Dan Williams
  2025-04-10  1:16         ` Michael Kelley
                           ` (2 more replies)
  2025-04-10  7:21       ` Christoph Hellwig
  1 sibling, 3 replies; 25+ messages in thread
From: Dan Williams @ 2025-04-09 23:30 UTC (permalink / raw)
  To: Roman Kisel, Robin Murphy, aleksander.lobakin, andriy.shevchenko,
	arnd, bp, catalin.marinas, corbet, dakr, dan.j.williams,
	dave.hansen, decui, gregkh, haiyangz, hch, hpa, James.Bottomley,
	Jonathan.Cameron, kys, leon, lukas, luto, m.szyprowski,
	martin.petersen, mingo, peterz, quic_zijuhu, tglx, wei.liu, will,
	iommu, linux-arch, linux-arm-kernel, linux-doc, linux-hyperv,
	linux-kernel, linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut, Suzuki K Poulose, linux-coco

[ add linux-coco ]

Roman Kisel wrote:
> 
> 
> On 4/9/2025 9:03 AM, Robin Murphy wrote:
> > On 2025-04-09 1:08 am, Roman Kisel wrote:
> >> Bounce-buffering makes the system spend more time copying
> >> I/O data. When an I/O transaction takes place between
> >> a confidential and a non-confidential endpoint, there is
> >> no way around it.
> >>
> >> Introduce a device bitfield to indicate that the device
> >> doesn't need to perform bounce buffering. The capable
> >> device may employ it to save on copying data around.
> > 
> > It's not so much about bounce buffering, it's more fundamentally about 
> > whether the device is trusted and able to access private memory at all 
> > or not. And performance is hardly the biggest concern either - if you do 
> > trust a device to operate on confidential data in private memory, then 
> > surely it is crucial to actively *prevent* that data ever getting into 
> > shared SWIOTLB pages where anyone else could also get at it. At worst 
> > that means CoCo VMs might need an *additional* non-shared SWIOTLB to 
> > support trusted devices with addressing limitations (and/or 
> > "swiotlb=force" debugging, potentially).
> 
> Thanks, I should've highlighted that facet most certainly!

One would hope that no one is building a modern device with trusted I/O
capability, *and* with a swiotlb addressing dependency. However, I agree
that a non-shared swiotlb would be needed in such a scenario.

Otherwise the policy around "a device should not even be allowed to
bounce buffer any private page" is a userspace responsibility to either
not load the driver, not release secrets to this CVM, or otherwise make
sure the device is only ever bounce buffering private memory that does
not contain secrets.

> > Also whatever we do for this really wants to tie in with the nascent 
> > TDISP stuff as well, since we definitely don't want to end up with more 
> > than one notion of whether a device is in a trusted/locked/private/etc. 
> > vs. unlocked/shared/etc. state with respect to DMA (or indeed anything 
> > else if we can avoid it).
> 
> Wouldn't TDISP be per-device as well? In which case, a flag would be
> needed just as being added in this patch.
> 
> Although, there must be a difference between a device with TDISP where
> the flag would be the indication of the feature, and this code where the
> driver may flip that back and forth...
> 
> Do you feel this is shoehorned in `struct device`? I couldn't find an
> appropriate private (== opaque pointer) part in the structure to store
> that bit (`struct device_private` wouldn't fit the bill), and it looked like
> adding it to the struct itself would do no harm. However, my read of the
> room is that folks see that as dubious :)
> 
> What would be your opinion on where to store that flag to tie together
> its usage in the Hyper-V SCSI and not bounce-buffering?

The name and location of a flag bit is not the issue, it is the common
expectation of how and when that flag is set.

tl;dr Linux likely needs a "private_accepted" flag for devices

Like Christoph said, a driver really has no business opting itself into
different DMA addressing domains. For TDISP we are also being careful to
make sure that flipping a device from shared to private is a suitably
violent event. This is because the Linux DMA layer does not have a
concept of allowing a device to have mappings from two different
addressing domains simultaneously.

In the current TDISP proposal, a device starts in shared mode and only
after validating all of the launch state of the CVM, device
measurements, and a device interface report is it granted access to
private memory. Without dumping a bunch of golden measurement data into
the kernel that validation can really only be performed by userspace.

Enter this vmbus proposal that wants to emulate devices with a paravisor
that is presumably within the TCB at launch, but the kernel can not
really trust that until a "launch state of the CVM + paravisor"
attestation event.

Like PCIe TDISP the capability of this device to access private memory
is a property of the bus and the iommu. However, acceptance of the
device into private operation is a willful policy action. It needs to
validate not only the device provenance and state, but also the Linux
DMA layer requirements of not holding shared or swiotlb mappings over
the "entry into private mode operation" event.

All that said, I would advocate to require a userspace driven "device
accept" event for all devices, not just TDISP, that want to enter
private operation. Maybe later circle back to figure out if there is a
lingering need for accepting devices via golden measurement, or other
means, to skip the userspace round-trip dependency.

A "private_capable" flag might also make sense, but that is really a
property of a bus that need not be carried necessarily in 'struct
device'.

So for this confidential vmbus SCSI device to mesh with the mechanisms
needed for TDISP, I would expect it to continue launching in swiotlb
mode by default. Export an attribute via hv_bus->dev_groups to indicate
that the device is "private_capable" and then require userspace to
twiddle a private_accepted flag with some safety for in-flight DMA.
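
A minimal sketch of what I have in mind is below. Everything in it is
an assumption for illustration (the attribute names, and reading the
channel's confidential_external_memory flag that this series
introduces), not a worked-out interface:

#include <linux/device.h>
#include <linux/hyperv.h>
#include <linux/sysfs.h>

/* Expose "private_capable" read-only on the vmbus device. */
static ssize_t private_capable_show(struct device *dev,
				    struct device_attribute *attr, char *buf)
{
	struct hv_device *hv_dev = device_to_hv_device(dev);

	return sysfs_emit(buf, "%d\n",
			  hv_dev->channel->confidential_external_memory);
}
static DEVICE_ATTR_RO(private_capable);

static struct attribute *vmbus_private_attrs[] = {
	&dev_attr_private_capable.attr,
	NULL,
};
ATTRIBUTE_GROUPS(vmbus_private);

/*
 * Pointing hv_bus.dev_groups at vmbus_private_groups publishes the
 * attribute; a writable "private_accepted" attribute would then be the
 * userspace-driven accept event, and its store routine is the natural
 * place for the in-flight DMA safety checks.
 */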

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09 23:30       ` Dan Williams
@ 2025-04-10  1:16         ` Michael Kelley
  2025-04-11  0:03           ` Dan Williams
  2025-04-10  7:23         ` Christoph Hellwig
  2025-04-10 23:50         ` Jason Gunthorpe
  2 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2025-04-10  1:16 UTC (permalink / raw)
  To: Dan Williams, Roman Kisel, Robin Murphy,
	aleksander.lobakin@intel.com, andriy.shevchenko@linux.intel.com,
	arnd@arndb.de, bp@alien8.de, catalin.marinas@arm.com,
	corbet@lwn.net, dakr@kernel.org, dave.hansen@linux.intel.com,
	decui@microsoft.com, gregkh@linuxfoundation.org,
	haiyangz@microsoft.com, hch@lst.de, hpa@zytor.com,
	James.Bottomley@hansenpartnership.com,
	Jonathan.Cameron@huawei.com, kys@microsoft.com, leon@kernel.org,
	lukas@wunner.de, luto@kernel.org, m.szyprowski@samsung.com,
	martin.petersen@oracle.com, mingo@redhat.com,
	peterz@infradead.org, quic_zijuhu@quicinc.com, tglx@linutronix.de,
	wei.liu@kernel.org, will@kernel.org, iommu@lists.linux.dev,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com, Suzuki K Poulose,
	linux-coco@lists.linux.dev

From: Dan Williams <dan.j.williams@intel.com> Sent: Wednesday, April 9, 2025 4:30 PM
> 
> [ add linux-coco ]
> 
> Roman Kisel wrote:
> >
> >
> > On 4/9/2025 9:03 AM, Robin Murphy wrote:
> > > On 2025-04-09 1:08 am, Roman Kisel wrote:
> > >> Bounce-buffering makes the system spend more time copying
> > >> I/O data. When an I/O transaction takes place between
> > >> a confidential and a non-confidential endpoint, there is
> > >> no way around it.
> > >>
> > >> Introduce a device bitfield to indicate that the device
> > >> doesn't need to perform bounce buffering. The capable
> > >> device may employ it to save on copying data around.
> > >
> > > It's not so much about bounce buffering, it's more fundamentally about
> > > whether the device is trusted and able to access private memory at all
> > > or not. And performance is hardly the biggest concern either - if you do
> > > trust a device to operate on confidential data in private memory, then
> > > surely it is crucial to actively *prevent* that data ever getting into
> > > shared SWIOTLB pages where anyone else could also get at it. At worst
> > > that means CoCo VMs might need an *additional* non-shared SWIOTLB to
> > > support trusted devices with addressing limitations (and/or
> > > "swiotlb=force" debugging, potentially).
> >
> > Thanks, I should've highlighted that facet most certainly!
> 
> One would hope that no one is building a modern device with trusted I/O
> capability, *and* with a swiotlb addressing dependency. However, I agree
> that a non-shared swiotlb would be needed in such a scenario.
> 
> Otherwise the policy around "a device should not even be allowed to
> bounce buffer any private page" is a userspace responsibility to either
> not load the driver, not release secrets to this CVM, or otherwise make
> sure the device is only ever bounce buffering private memory that does
> not contain secrets.
> 
> > > Also whatever we do for this really wants to tie in with the nascent
> > > TDISP stuff as well, since we definitely don't want to end up with more
> > > than one notion of whether a device is in a trusted/locked/private/etc.
> > > vs. unlocked/shared/etc. state with respect to DMA (or indeed anything
> > > else if we can avoid it).
> >
> > Wouldn't TDISP be per-device as well? In which case, a flag would be
> > needed just as being added in this patch.
> >
> > Although, there must be a difference between a device with TDISP where
> > the flag would be the indication of the feature, and this code where the
> > driver may flip that back and forth...
> >
> > Do you feel this is shoehorned in `struct device`? I couldn't find an
> > appropriate private (== opaque pointer) part in the structure to store
> > that bit (`struct device_private` wouldn't fit the bill), and it looked like
> > adding it to the struct itself would do no harm. However, my read of the
> > room is that folks see that as dubious :)
> >
> > What would be your opinion on where to store that flag to tie together
> > its usage in the Hyper-V SCSI and not bounce-buffering?
> 
> The name and location of a flag bit is not the issue, it is the common
> expectation of how and when that flag is set.
> 
> tl;dr Linux likely needs a "private_accepted" flag for devices
> 
> Like Christoph said, a driver really has no business opting itself into
> different DMA addressing domains. For TDISP we are also being careful to
> make sure that flipping a device from shared to private is a suitably
> violent event. This is because the Linux DMA layer does not have a
> concept of allowing a device to have mappings from two different
> addressing domains simultaneously.
> 
> In the current TDISP proposal, a device starts in shared mode and only
> after validating all of the launch state of the CVM, device
> measurements, and a device interface report is it granted access to
> private memory. Without dumping a bunch of golden measurement data into
> the kernel that validation can really only be performed by userspace.
> 
> Enter this vmbus proposal that wants to emulate devices with a paravisor
> that is presumably within the TCB at launch, but the kernel can not
> really trust that until a "launch state of the CVM + paravisor"
> attestation event.
> 
> Like PCIe TDISP the capability of this device to access private memory
> is a property of the bus and the iommu. However, acceptance of the
> device into private operation is a willful policy action. It needs to
> validate not only the device provenance and state, but also the Linux
> DMA layer requirements of not holding shared or swiotlb mappings over
> the "entry into private mode operation" event.

To flesh out the swiotlb aspect a bit, once a TDISP device has
gone private, how does it prevent the DMA layer from ever doing
bounce buffering through the swiotlb? My understanding is that
the DMA layer doesn't make any promises to not do bounce buffering.
Given the vagaries of memory alignment, perhaps with a virtual
IOMMU added in, etc., it seems like a device driver can't
necessarily predict
what DMA operations might result in bounce buffering. Does TDISP
anticipate needing a formal way to tell the DMA layer "don't bounce
buffer"? (and return an error instead?) Or would there be a separate
swiotlb memory pool that is private memory so that bounce buffering
could be done when necessary and still maintain confidentiality?

Just wondering if there's any thinking on this topic ...

Thanks,

Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09 16:44     ` Roman Kisel
  2025-04-09 23:30       ` Dan Williams
@ 2025-04-10  7:21       ` Christoph Hellwig
  2025-04-10 15:16         ` Roman Kisel
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2025-04-10  7:21 UTC (permalink / raw)
  To: Roman Kisel
  Cc: Robin Murphy, aleksander.lobakin, andriy.shevchenko, arnd, bp,
	catalin.marinas, corbet, dakr, dan.j.williams, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut,
	Suzuki K Poulose

On Wed, Apr 09, 2025 at 09:44:03AM -0700, Roman Kisel wrote:
> Do you feel this is shoehorned in `struct device`? I couldn't find an
> appropriate private (== opaque pointer) part in the structure to store
> that bit (`struct device_private` wouldn't fit the bill), and it looked like
> adding it to the struct itself would do no harm. However, my read of the
> room is that folks see that as dubious :)

We'll need per-device information.  But it is much higher level than a
need bounce buffer flag.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09 23:30       ` Dan Williams
  2025-04-10  1:16         ` Michael Kelley
@ 2025-04-10  7:23         ` Christoph Hellwig
  2025-04-10 23:44           ` Jason Gunthorpe
  2025-04-10 23:50         ` Jason Gunthorpe
  2 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2025-04-10  7:23 UTC (permalink / raw)
  To: Dan Williams
  Cc: Roman Kisel, Robin Murphy, aleksander.lobakin, andriy.shevchenko,
	arnd, bp, catalin.marinas, corbet, dakr, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut,
	Suzuki K Poulose, linux-coco

On Wed, Apr 09, 2025 at 04:30:17PM -0700, Dan Williams wrote:
> > Thanks, I should've highlighted that facet most certainly!
> 
> One would hope that no one is building a modern device with trusted I/O
> capability, *and* with a swiotlb addressing dependency. However, I agree
> that a non-shared swiotlb would be needed in such a scenario.

Hope is never a good idea when dealing with hardware :(  PCIe already
requires no addressing limitations, and programming interface specs
like NVMe double down on that.  But at least one big hyperscaler still
managed to build such a device.

Also even if the periphal device is not addressing limited, the root
port or interconnect might still be, we've seen quite a lot of that.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-10  7:21       ` Christoph Hellwig
@ 2025-04-10 15:16         ` Roman Kisel
  0 siblings, 0 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-10 15:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Robin Murphy, aleksander.lobakin, andriy.shevchenko, arnd, bp,
	catalin.marinas, corbet, dakr, dan.j.williams, dave.hansen, decui,
	gregkh, haiyangz, hpa, James.Bottomley, Jonathan.Cameron, kys,
	leon, lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut,
	Suzuki K Poulose



On 4/10/2025 12:21 AM, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 09:44:03AM -0700, Roman Kisel wrote:
>> Do you feel this is shoehorned in `struct device`? I couldn't find an
>> appropriate private (== opaque pointer) part in the structure to store
>> that bit (`struct device_private` wouldn't fit the bill), and it looked like
>> adding it to the struct itself would do no harm. However, my read of the
>> room is that folks see that as dubious :)
> 
> We'll need per-device information.  But it is much higher level than a
> need bounce buffer flag.
> 

I see, thanks for the explanation!

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 1/6] Documentation: hyperv: Confidential VMBus
  2025-04-09  0:08 ` [PATCH hyperv-next 1/6] Documentation: hyperv: " Roman Kisel
@ 2025-04-10 16:54   ` ALOK TIWARI
  2025-04-10 19:10     ` Roman Kisel
  2025-04-25  6:31   ` Wei Liu
  1 sibling, 1 reply; 25+ messages in thread
From: ALOK TIWARI @ 2025-04-10 16:54 UTC (permalink / raw)
  To: Roman Kisel, aleksander.lobakin, andriy.shevchenko, arnd, bp,
	catalin.marinas, corbet, dakr, dan.j.williams, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu,
	linux-arch, linux-arm-kernel, linux-doc, linux-hyperv,
	linux-kernel, linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut



On 09-04-2025 05:38, Roman Kisel wrote:
>   in which case it is treated as an entirely new device. See
>   vmbus_onoffer_rescind().
> +
> +Confidential VMBus
> +------------------
> +
> +The confidential VMBus provides the control and data planes where
> +the guest doesn't talk to either the hypervisor or the host. Instead,
> +it relies on the trusted paravisor. The hardware (SNP or TDX) encrypts
> +the guest memory and the register state also measuring the paravisor
> +image via using the platform security processor to ensure trsuted and
> +confidential computing.
> +

typo trsuted  -> trusted

> +To support confidential communication with the paravisor, a VmBus client
> +will first attempt to use regular, non-isolated mechanisms for communication.
> +To do this, it must:


Thanks,
Alok

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 2/6] drivers: hyperv: VMBus protocol version 6.0
  2025-04-09  0:08 ` [PATCH hyperv-next 2/6] drivers: hyperv: VMBus protocol version 6.0 Roman Kisel
@ 2025-04-10 17:03   ` ALOK TIWARI
  0 siblings, 0 replies; 25+ messages in thread
From: ALOK TIWARI @ 2025-04-10 17:03 UTC (permalink / raw)
  To: Roman Kisel, aleksander.lobakin, andriy.shevchenko, arnd, bp,
	catalin.marinas, corbet, dakr, dan.j.williams, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu,
	linux-arch, linux-arm-kernel, linux-doc, linux-hyperv,
	linux-kernel, linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut



On 09-04-2025 05:38, Roman Kisel wrote:
> The confidential VMBus is supported starting from the protocol
> version 6.0 onwards.
> 
> Update the relevant definitions, provide a function that returns
> whether VMBus is condifential or not.
> 

typo condifential  -> confidential

> Signed-off-by: Roman Kisel<romank@linux.microsoft.com>
> ---
>   drivers/hv/vmbus_drv.c         | 12 ++++++
>   include/asm-generic/mshyperv.h |  1 +
>   include/linux/hyperv.h         | 71 +++++++++++++++++++++++++---------
>   3 files changed, 65 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 22afebfc28ff..fa3ad6fe0bec 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c


Thanks,
Alok

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 1/6] Documentation: hyperv: Confidential VMBus
  2025-04-10 16:54   ` ALOK TIWARI
@ 2025-04-10 19:10     ` Roman Kisel
  0 siblings, 0 replies; 25+ messages in thread
From: Roman Kisel @ 2025-04-10 19:10 UTC (permalink / raw)
  To: ALOK TIWARI, aleksander.lobakin, andriy.shevchenko, arnd, bp,
	catalin.marinas, corbet, dakr, dan.j.williams, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu,
	linux-arch, linux-arm-kernel, linux-doc, linux-hyperv,
	linux-kernel, linux-scsi, x86
  Cc: apais, benhill, bperkins, sunilmut



On 4/10/2025 9:54 AM, ALOK TIWARI wrote:
> 
> 

[...]

> 
> typo trsuted  -> trusted
> 
>> +To support confidential communication with the paravisor, a VmBus client
>> +will first attempt to use regular, non-isolated mechanisms for 
>> communication.
>> +To do this, it must:
> 

Thanks for your help with this patch, and the other ones!! I'll make
sure to use the spellchecker on the patches before sending another
version out.

> 
> Thanks,
> Alok

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-10  7:23         ` Christoph Hellwig
@ 2025-04-10 23:44           ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2025-04-10 23:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Roman Kisel, Robin Murphy, aleksander.lobakin,
	andriy.shevchenko, arnd, bp, catalin.marinas, corbet, dakr,
	dave.hansen, decui, gregkh, haiyangz, hpa, James.Bottomley,
	Jonathan.Cameron, kys, leon, lukas, luto, m.szyprowski,
	martin.petersen, mingo, peterz, quic_zijuhu, tglx, wei.liu, will,
	iommu, linux-arch, linux-arm-kernel, linux-doc, linux-hyperv,
	linux-kernel, linux-scsi, x86, apais, benhill, bperkins, sunilmut,
	Suzuki K Poulose, linux-coco

On Thu, Apr 10, 2025 at 09:23:54AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 04:30:17PM -0700, Dan Williams wrote:
> > > Thanks, I should've highlighted that facet most certainly!
> > 
> > One would hope that no one is building a modern device with trusted I/O
> > capability, *and* with a swiotlb addressing dependency. However, I agree
> > that a non-shared swiotlb would be needed in such a scenario.
> 
> Hope is never a good idea when dealing with hardware :(  PCIe already
> requires no addressing limitations, and programming interface specs
> like NVMe double down on that.  But at least one big hyperscaler still
> managed to build such a device.
> 
Also even if the peripheral device is not addressing limited, the root
> port or interconnect might still be, we've seen quite a lot of that.

Still it would be very obnoxious for someone to build a CC VM platform
where CC DMA devices can't access all the guest physical memory in the
CC address map :\

Keeping in mind that the CC address map is being created using the
CPU MMU and the CPU IOMMU, it is entirely virtual and can be
configured to match most problems the devices might have.

Too much memory would be the main issue.

IMHO I wouldn't implement secure SWIOTLB until someone does such a
foolish thing, and I'd make them do the work as some kind of penance
:P

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-09 23:30       ` Dan Williams
  2025-04-10  1:16         ` Michael Kelley
  2025-04-10  7:23         ` Christoph Hellwig
@ 2025-04-10 23:50         ` Jason Gunthorpe
  2 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2025-04-10 23:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: Roman Kisel, Robin Murphy, aleksander.lobakin, andriy.shevchenko,
	arnd, bp, catalin.marinas, corbet, dakr, dave.hansen, decui,
	gregkh, haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron,
	kys, leon, lukas, luto, m.szyprowski, martin.petersen, mingo,
	peterz, quic_zijuhu, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut,
	Suzuki K Poulose, linux-coco

On Wed, Apr 09, 2025 at 04:30:17PM -0700, Dan Williams wrote:

> Like Christoph said, a driver really has no business opting itself into
> different DMA addressing domains. For TDISP we are also being careful to
> make sure that flipping a device from shared to private is a suitably
> violent event. This is because the Linux DMA layer does not have a
> concept of allowing a device to have mappings from two different
> addressing domains simultaneously.

And this is a very important point, several of the architectures have
two completely independent iommu tables, and maybe even completely
different IOMMU instances for trusted and non-trusted DMA traffic.

I expect configurations where trusted traffic is translated through
the vIOMMU while non-trusted traffic is locked to an identity
translation.

There are more issues here than just swiotlb :\

> A "private_capable" flag might also make sense, but that is really a
> property of a bus that need not be carried necessarily in 'struct
> device'.

However it works, it should be done before the driver is probed and
remain stable for the duration of the driver attachment. From the
iommu side the correct iommu domain, on the correct IOMMU instance to
handle the expected traffic should be setup as the DMA API's iommu
domain.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH hyperv-next 5/6] arch, drivers: Add device struct bitfield to not bounce-buffer
  2025-04-10  1:16         ` Michael Kelley
@ 2025-04-11  0:03           ` Dan Williams
  0 siblings, 0 replies; 25+ messages in thread
From: Dan Williams @ 2025-04-11  0:03 UTC (permalink / raw)
  To: Michael Kelley, Dan Williams, Roman Kisel, Robin Murphy,
	aleksander.lobakin@intel.com, andriy.shevchenko@linux.intel.com,
	arnd@arndb.de, bp@alien8.de, catalin.marinas@arm.com,
	corbet@lwn.net, dakr@kernel.org, dave.hansen@linux.intel.com,
	decui@microsoft.com, gregkh@linuxfoundation.org,
	haiyangz@microsoft.com, hch@lst.de, hpa@zytor.com,
	James.Bottomley@hansenpartnership.com,
	Jonathan.Cameron@huawei.com, kys@microsoft.com, leon@kernel.org,
	lukas@wunner.de, luto@kernel.org, m.szyprowski@samsung.com,
	martin.petersen@oracle.com, mingo@redhat.com,
	peterz@infradead.org, quic_zijuhu@quicinc.com, tglx@linutronix.de,
	wei.liu@kernel.org, will@kernel.org, iommu@lists.linux.dev,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com, Suzuki K Poulose,
	linux-coco@lists.linux.dev

Michael Kelley wrote:
> From: Dan Williams <dan.j.williams@intel.com> Sent: Wednesday, April 9, 2025 4:30 PM
[..]
> > Like PCIe TDISP the capability of this device to access private memory
> > is a property of the bus and the iommu. However, acceptance of the
> > device into private operation is a willful policy action. It needs to
> > validate not only the device provenance and state, but also the Linux
> > DMA layer requirements of not holding shared or swiotlb mappings over
> > the "entry into private mode operation" event.
> 
> To flesh this out the swiotlb aspect a bit, once a TDISP device has
> gone private, how does it prevent the DMA layer from ever doing
> bounce buffering through the swiotlb? My understanding is that
> the DMA layer doesn't make any promises to not do bounce buffering.
> Given the vagaries of memory alignment, perhaps add in a virtual
> IOMMU, etc., it seems like a device driver can't necessarily predict
> what DMA operations might result in bounce buffering. Does TDISP
> anticipate needing a formal way to tell the DMA layer "don't bounce
> buffer"? (and return an error instead?) Or would there be a separate
> swiotlb memory pool that is private memory so that bounce buffer
> could be done when necessary and still maintain confidentiality?

I expect step 1 is just to add some rude errors / safety for attempting
to convert the mode of a device while it has any DMA mappings
established, and explicit failures for attempts to fall back to swiotlb
for 'private_accepted' devices.

The easiest way to enforce that a device does not cross the
shared/private boundary while DMA mappings are live is to simply not
allow that transition while a driver is bound (i.e. "dev->driver" is
non-NULL).
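
In code, that guard could be as small as the sketch below; the
private_accepted field is a placeholder for whatever per-device state
we end up with:

#include <linux/device.h>
#include <linux/errno.h>

/* Refuse a shared<->private flip while a driver is bound. */
static int dev_set_private_accepted(struct device *dev, bool accepted)
{
	int ret = 0;

	device_lock(dev);	/* serializes against driver probe/remove */
	if (dev->driver)	/* a bound driver may hold live DMA mappings */
		ret = -EBUSY;
	else
		dev->private_accepted = accepted;	/* placeholder flag */
	device_unlock(dev);

	return ret;
}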

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH hyperv-next 1/6] Documentation: hyperv: Confidential VMBus
  2025-04-09  0:08 ` [PATCH hyperv-next 1/6] Documentation: hyperv: " Roman Kisel
  2025-04-10 16:54   ` ALOK TIWARI
@ 2025-04-25  6:31   ` Wei Liu
  1 sibling, 0 replies; 25+ messages in thread
From: Wei Liu @ 2025-04-25  6:31 UTC (permalink / raw)
  To: Roman Kisel
  Cc: aleksander.lobakin, andriy.shevchenko, arnd, bp, catalin.marinas,
	corbet, dakr, dan.j.williams, dave.hansen, decui, gregkh,
	haiyangz, hch, hpa, James.Bottomley, Jonathan.Cameron, kys, leon,
	lukas, luto, m.szyprowski, martin.petersen, mingo, peterz,
	quic_zijuhu, robin.murphy, tglx, wei.liu, will, iommu, linux-arch,
	linux-arm-kernel, linux-doc, linux-hyperv, linux-kernel,
	linux-scsi, x86, apais, benhill, bperkins, sunilmut

On Tue, Apr 08, 2025 at 05:08:30PM -0700, Roman Kisel wrote:
> Define what the confidential VMBus is and describe what advantages
> it offers on the capable hardware.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  Documentation/virt/hyperv/vmbus.rst | 41 +++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/Documentation/virt/hyperv/vmbus.rst b/Documentation/virt/hyperv/vmbus.rst
> index 1dcef6a7fda3..f600e3d09800 100644
> --- a/Documentation/virt/hyperv/vmbus.rst
> +++ b/Documentation/virt/hyperv/vmbus.rst
> @@ -324,3 +324,44 @@ rescinded, neither Hyper-V nor Linux retains any state about
>  its previous existence. Such a device might be re-added later,
>  in which case it is treated as an entirely new device. See
>  vmbus_onoffer_rescind().
> +
> +Confidential VMBus
> +------------------
> +
> +The confidential VMBus provides the control and data planes where
> +the guest doesn't talk to either the hypervisor or the host. Instead,
> +it relies on the trusted paravisor. The hardware (SNP or TDX) encrypts
> +the guest memory and the register state also measuring the paravisor
> +image via using the platform security processor to ensure trsuted and
> +confidential computing.
> +
> +To support confidential communication with the paravisor, a VmBus client

Please be consistent. In this document I see VMBus and VmBus. We should
stick with only one form.

Wei.

^ permalink raw reply	[flat|nested] 25+ messages in thread
