linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH hyperv-next v3 00/15] Confidential VMBus
@ 2025-06-04  0:43 Roman Kisel
  2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
                   ` (15 more replies)
  0 siblings, 16 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The guests running on Hyper-V can be confidential where the memory and the
register content are encrypted, provided that the hardware supports that
(currently support AMD SEV-SNP and Intel TDX is implemented) and the guest
is capable of using these features. The confidential guests cannot be
introspected by the host nor the hypervisor without the guest sharing the
memory contents upon doing which the memory is decrypted.

In the confidential guests, neither the host nor the hypervisor need to be
trusted, and the guests processing sensitive data can take advantage of that.

Not trusting the host and the hypervisor (removing them from the Trusted
Computing Base aka TCB) ncessitates that the method of communication
between the host and the guest be changed. Below there is the breakdown of
the options used in the both cases (in the diagrams below the server is
marked as S, the client is marked as C):

1. Without the paravisoor the devices are connected to the host, and the
host provides the device emulation or translation to the guest:

+---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
|               |       |                 |        |                |
|               |       |                 |        |                |
|               |       |                 ==========                |
|               |       |                 |        |                |
|               |       |                 |        |                |
|               |       |                 |        |                |
+----- C -------+       +-----------------+        +------- S ------+
       ||                                                   ||
       ||                                                   ||
+------||------------------ VMBus --------------------------||------+
|                     Interrupts, MMIO                              |
+-------------------------------------------------------------------+

2. With the paravisor, the devices are connected to the paravisor, and
the paravisor provides the device emulation or translation to the guest.
The guest doesn't communicate with the host directly, and the guest
communicates with the paravisor via the VMBus. The host is not trusted
in this model, and the paravisor is trusted:

+---- GUEST --------------- VTL0 ------+               +-- DEVICE --+
|                                      |               |            |
| +- PARAVISOR --------- VTL2 -----+   |               |            |
| |     +-- VMBus Relay ------+    ====+================            |
| |     |   Interrupts, MMIO  |    |   |               |            |
| |     +-------- S ----------+    |   |               +------------+
| |               ||               |   |
| +---------+     ||               |   |
| |  Linux  |     ||    OpenHCL    |   |
| |  kernel |     ||               |   |
| +---- C --+-----||---------------+   |
|       ||        ||                   |
+-------++------- C -------------------+               +------------+
        ||                                             |    HOST    |
        ||                                             +---- S -----+
+-------||----------------- VMBus ---------------------------||-----+
|                     Interrupts, MMIO                              |
+-------------------------------------------------------------------+

Note that in the second case the guest doesn't need to share the memory
with the host as it communicates only with the paravisor within their
partition boundary. That is precisely the raison d'etre and the value
proposition of this patch series: equip the confidential guest to use
private (encrypted) memory and rely on the paravisor when this is
available to be more secure.

An implementation of the VMBus relay that offers the Confidential VMBus channels
is available in the OpenVMM project as a part of the OpenHCL paravisor. Please
refer to https://openvmm.dev/ and https://github.com/microsoft/openvmm for more
information about the OpenHCL paravisor.

I'd like to thank the following people for their help with this
patch series:

* Dexuan for help with validation and the fruitful discussions,
* Easwar for reviewing the refactoring of the page allocating and
  freeing in `hv.c`,
* John and Sven for the design,
* Mike for helping to avoid pitfalls when dealing with the GFP flags,
* Sven for blazing the trail and implementing the design in few
  codebases.

I made sure to validate the patch series on

    {TrustedLaunch(x86_64), OpenHCL} x
    {SNP(x86_64), TDX(x86_64), No hardware isolation, No paravisor} x
    {VMBus 5.0, VMBus 6.0} x
    {arm64, x86_64}.

[V3]
    - The patch series is rebased on top of the latest hyperv-next branch.
    - Reworked the "wiring" diagram in the cover letter, added links to the
      OpenVMM project and the OpenHCL paravisor.

    - More precise wording in the comments and clearer code.
    **Thank you, Alok!**

    - Reworked the documentation patch.
    - Split the patchset into much more granular patches.
    - Various fixes and improvements throughout the patch series.
    **Thank you, Michael!**

[V2] https://lore.kernel.org/linux-hyperv/20250511230758.160674-1-romank@linux.microsoft.com/
    - The patch series is rebased on top of the latest hyperv-next branch.
  
    - Better wording in the commit messages and the Documentation.
    **Thank you, Alok and Wei!**

    - Removed the patches 5 and 6 concerning turning bounce buffering off from
      the previous version of the patch series as they were found to be
      architecturally unsound. The value proposition of the patch series is not
      diminished by this removal: these patches were an optimization and only for
      the storage (for the simplicity sake) but not for the network. These changes
      might be proposed in the future again after revolving the issues.
    ** Thanks you, Christoph, Dexuan, Dan, Michael, James, Robin! **

[V1] https://lore.kernel.org/linux-hyperv/20250409000835.285105-1-romank@linux.microsoft.com/

Roman Kisel (15):
  Documentation: hyperv: Confidential VMBus
  drivers: hv: VMBus protocol version 6.0
  arch: hyperv: Get/set SynIC synth.registers via paravisor
  arch/x86: mshyperv: Trap on access for some synthetic MSRs
  Drivers: hv: Rename fields for SynIC message and event pages
  Drivers: hv: Allocate the paravisor SynIC pages when required
  Drivers: hv: Post messages via the confidential VMBus if available
  Drivers: hv: remove stale comment
  Drivers: hv: Use memunmap() to check if the address is in IO map
  Drivers: hv: Rename the SynIC enable and disable routines
  Drivers: hv: Functions for setting up and tearing down the paravisor
    SynIC
  Drivers: hv: Allocate encrypted buffers when requested
  Drivers: hv: Support confidential VMBus channels
  Drivers: hv: Support establishing the confidential VMBus connection
  Drivers: hv: Set the default VMBus version to 6.0

 Documentation/virt/hyperv/coco.rst | 125 ++++++++-
 arch/x86/kernel/cpu/mshyperv.c     |  67 ++++-
 drivers/hv/channel.c               |  43 ++--
 drivers/hv/channel_mgmt.c          |  27 +-
 drivers/hv/connection.c            |   6 +-
 drivers/hv/hv.c                    | 399 ++++++++++++++++++++---------
 drivers/hv/hv_common.c             |  13 +
 drivers/hv/hyperv_vmbus.h          |  28 +-
 drivers/hv/mshv_root.h             |   2 +-
 drivers/hv/mshv_synic.c            |   6 +-
 drivers/hv/ring_buffer.c           |   5 +-
 drivers/hv/vmbus_drv.c             | 187 +++++++++-----
 include/asm-generic/mshyperv.h     |   3 +
 include/linux/hyperv.h             |  69 +++--
 14 files changed, 740 insertions(+), 240 deletions(-)


base-commit: 96959283a58d91ae20d025546f00e16f0a555208
-- 
2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 01/15] Documentation: hyperv: Confidential VMBus
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-04  2:56   ` Randy Dunlap
                     ` (3 more replies)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 02/15] drivers: hv: VMBus protocol version 6.0 Roman Kisel
                   ` (14 subsequent siblings)
  15 siblings, 4 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

Define what the confidential VMBus is and describe what advantages
it offers on the capable hardware.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
---
 Documentation/virt/hyperv/coco.rst | 125 ++++++++++++++++++++++++++++-
 1 file changed, 124 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/hyperv/coco.rst b/Documentation/virt/hyperv/coco.rst
index c15d6fe34b4e..b4904b64219d 100644
--- a/Documentation/virt/hyperv/coco.rst
+++ b/Documentation/virt/hyperv/coco.rst
@@ -178,7 +178,7 @@ These Hyper-V and VMBus memory pages are marked as decrypted:
 
 * VMBus monitor pages
 
-* Synthetic interrupt controller (synic) related pages (unless supplied by
+* Synthetic interrupt controller (SynIC) related pages (unless supplied by
   the paravisor)
 
 * Per-cpu hypercall input and output pages (unless running with a paravisor)
@@ -258,3 +258,126 @@ normal page fault is generated instead of #VC or #VE, and the page-fault-
 based handlers for load_unaligned_zeropad() fixup the reference. When the
 encrypted/decrypted transition is complete, the pages are marked as "present"
 again. See hv_vtom_clear_present() and hv_vtom_set_host_visibility().
+
+Confidential VMBus
+------------------
+
+The confidential VMBus enables the confidential guest not to interact with the
+untrusted host partition and the untrusted hypervisor. Instead, the guest relies
+on the trusted paravisor to communicate with the devices processing sensitive
+data. The hardware (SNP or TDX) encrypts the guest memory and the register state
+while measuring the paravisor image using the platform security processor to
+ensure trusted and confidential computing.
+
+Confidential VMBus provides a secure communication channel between the guest and
+the paravisor, ensuring that sensitive data is protected from  hypervisor-level
+access through memory encryption and register state isolation.
+
+The unencrypted data never leaves the VM so neither the host partition nor the
+hypervisor can access it at all. In addition to that, the guest only needs to
+establish a VMBus connection with the paravisor for the channels that process
+sensitive data, and the paravisor abstracts the details of communicating with
+the specific devices away.
+
+Confidential VMBus is an extension of Confidential Computing (CoCo) VMs
+(a.k.a. "Isolated" VMs in Hyper-V terminology). Without Confidential VMBus,
+guest VMBus device drivers (the "VSC"s in VMBus terminology) communicate
+with VMBus servers (the VSPs) running on the Hyper-V host. The
+communication must be through memory that has been decrypted so the
+host can access it. With Confidential VMBus, one or more of the VSPs reside
+in the trusted paravisor layer in the guest VM. Since the paravisor layer also
+operates in encrypted memory, the memory used for communication with
+such VSPs does not need to be decrypted and thereby exposed to the
+Hyper-V host. The paravisor is responsible for communicating securely
+with the Hyper-V host as necessary. In some cases (e.g. time synchonization,
+key-value pairs exchange) the unencrypted data doesn't need to be communicated
+with the host at all, and a conventional VMBus connection suffices.
+
+Here is the data flow for a conventional VMBus connection and the Confidential
+VMBus connection (C stands for the client or VSC, S for the server or VSP):
+
++---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
+|               |       |                 |        |                |
+|               |       |                 |        |                |
+|               |       |                 ==========                |
+|               |       |                 |        |                |
+|               |       |                 |        |                |
+|               |       |                 |        |                |
++----- C -------+       +-----------------+        +------- S ------+
+       ||                                                   ||
+       ||                                                   ||
++------||------------------ VMBus --------------------------||------+
+|                     Interrupts, MMIO                              |
++-------------------------------------------------------------------+
+
++---- GUEST --------------- VTL0 ------+               +-- DEVICE --+
+|                                      |               |            |
+| +- PARAVISOR --------- VTL2 -----+   |               |            |
+| |     +-- VMBus Relay ------+    ====+================            |
+| |     |   Interrupts, MMIO  |    |   |               |            |
+| |     +-------- S ----------+    |   |               +------------+
+| |               ||               |   |
+| +---------+     ||               |   |
+| |  Linux  |     ||    OpenHCL    |   |
+| |  kernel |     ||               |   |
+| +---- C --+-----||---------------+   |
+|       ||        ||                   |
++-------++------- C -------------------+               +------------+
+        ||                                             |    HOST    |
+        ||                                             +---- S -----+
++-------||----------------- VMBus ---------------------------||-----+
+|                     Interrupts, MMIO                              |
++-------------------------------------------------------------------+
+
+An implementation of the VMBus relay that offers the Confidential VMBus channels
+is available in the OpenVMM project as a part of the OpenHCL paravisor. Please
+refer to https://openvmm.dev/ and https://github.com/microsoft/openvmm for more
+information about the OpenHCL paravisor.
+
+A guest that is running with a paravisor must determine at runtime if
+Confidential VMBus is supported by the current paravisor. It does so by first
+trying to establish a Confidential VMBus connection with the paravisor using
+standard mechanisms where the memory remains encrypted. If this succeeds,
+then the guest can proceed to use Confidential VMBus. If it fails, then the
+guest must fallback to establishing a non-Confidential VMBus connection with
+the Hyper-V host.
+
+Confidential VMBus is a characteristic of the VMBus connection as a whole,
+and of each VMBus channel that is created. When a Confidential VMBus
+connection is established, the paravisor provides the guest the message-passing
+path that is used for VMBus device creation and deletion, and it provides a
+per-CPU synthetic interrupt controller (SynIC) just like the SynIC that is
+offered by the Hyper-V host. Each VMBus device that is offered to the guest
+indicates the degree to which it participates in Confidential VMBus. The offer
+indicates if the device uses encrypted ring buffers, and if the device uses
+encrypted memory for DMA that is done outside the ring buffer. These settings
+may be different for different devices using the same Confidential VMBus
+connection.
+
+Although these settings are separate, in practice it'll always be encrypted
+ring buffer only or both encrypted ring buffer and external data. If a channel
+is offered by the paravisor with confidential VMBus, the ring buffer can always
+be encrypted since it's strictly for communication between the VTL2 paravisor
+and the VTL0 guest. However, other memory regions are often used for e.g. DMA,
+so they need to be accessible by the underlying hardware, and must be unencrypted
+(unless the device supports encrypted memory). Currently, there are no any VSPs
+in OpenHCL that support encrypted external memory, but we will use it in the
+future.
+
+Because some devices on a Confidential VMBus may require decrypted ring buffers
+and DMA transfers, the guest must interact with two SynICs -- the one provided
+by the paravisor and the one provided by the Hyper-V host when Confidential
+VMBus is not offered. Interrupts are always signaled by the paravisor SynIC, but
+the guest must check for messages and for channel interrupts on both SynICs.
+
+In the case of a confidential VM, regular SynIC access by the guest is
+intercepted by the paravisor (this includes various MSRs such as the SIMP and
+SIEFP, as well as hypercalls like HvPostMessage and HvSignalEvent). If the guest
+actually wants to communicate with the hypervisor, it has to use special mechanisms
+(GHCB page on SNP, or tdcall on TDX). Messages will always be one or the other:
+with confidential VMBus, all messages use the paravisor SynIC, otherwise they all
+use the hypervisor SynIC. For interrupt signaling, though, some channels may be
+running on the host (non-confidential, using the VMBus relay) and use the hypervisor
+SynIC, and some on the paravisor and use its SynIC. The RelIDs are coordinated by
+the OpenHCL VMBus server and are guaranteed to be unique regardless of whether
+the channel originated on the host or the paravisor.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 02/15] drivers: hv: VMBus protocol version 6.0
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
  2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-04  0:43 ` [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor Roman Kisel
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The confidential VMBus is supported starting from the protocol
version 6.0 onwards.

Update the relevant definitions, and provide a function that returns
whether VMBus is confidential or not. No functional changes.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
---
 drivers/hv/vmbus_drv.c         | 12 ++++++
 include/asm-generic/mshyperv.h |  1 +
 include/linux/hyperv.h         | 69 ++++++++++++++++++++++++----------
 3 files changed, 63 insertions(+), 19 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index c236081d0a87..678ad59deb2d 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -56,6 +56,18 @@ static long __percpu *vmbus_evt;
 int vmbus_irq;
 int vmbus_interrupt;
 
+/*
+ * If the Confidential VMBus is used, the data on the "wire" is not
+ * visible to either the host or the hypervisor.
+ */
+static bool is_confidential;
+
+bool vmbus_is_confidential(void)
+{
+	return is_confidential;
+}
+EXPORT_SYMBOL_GPL(vmbus_is_confidential);
+
 /*
  * The panic notifier below is responsible solely for unloading the
  * vmbus connection, which is necessary in a panic event.
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index a729b77983fa..9722934a8332 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -373,6 +373,7 @@ static inline int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u3
 	return -EOPNOTSUPP;
 }
 #endif /* CONFIG_MSHV_ROOT */
+bool vmbus_is_confidential(void);
 
 #if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
 u8 __init get_vtl(void);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 23b3c3a2ed8c..f87f3f2f8e89 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -265,16 +265,18 @@ static inline u32 hv_get_avail_to_write_percent(
  * Linux kernel.
  */
 
-#define VERSION_WS2008  ((0 << 16) | (13))
-#define VERSION_WIN7    ((1 << 16) | (1))
-#define VERSION_WIN8    ((2 << 16) | (4))
-#define VERSION_WIN8_1    ((3 << 16) | (0))
-#define VERSION_WIN10 ((4 << 16) | (0))
-#define VERSION_WIN10_V4_1 ((4 << 16) | (1))
-#define VERSION_WIN10_V5 ((5 << 16) | (0))
-#define VERSION_WIN10_V5_1 ((5 << 16) | (1))
-#define VERSION_WIN10_V5_2 ((5 << 16) | (2))
-#define VERSION_WIN10_V5_3 ((5 << 16) | (3))
+#define VMBUS_MAKE_VERSION(MAJ, MIN)	((((u32)MAJ) << 16) | (MIN))
+#define VERSION_WS2008					VMBUS_MAKE_VERSION(0, 13)
+#define VERSION_WIN7					VMBUS_MAKE_VERSION(1, 1)
+#define VERSION_WIN8					VMBUS_MAKE_VERSION(2, 4)
+#define VERSION_WIN8_1					VMBUS_MAKE_VERSION(3, 0)
+#define VERSION_WIN10					VMBUS_MAKE_VERSION(4, 0)
+#define VERSION_WIN10_V4_1				VMBUS_MAKE_VERSION(4, 1)
+#define VERSION_WIN10_V5				VMBUS_MAKE_VERSION(5, 0)
+#define VERSION_WIN10_V5_1				VMBUS_MAKE_VERSION(5, 1)
+#define VERSION_WIN10_V5_2				VMBUS_MAKE_VERSION(5, 2)
+#define VERSION_WIN10_V5_3				VMBUS_MAKE_VERSION(5, 3)
+#define VERSION_WIN10_V6_0				VMBUS_MAKE_VERSION(6, 0)
 
 /* Make maximum size of pipe payload of 16K */
 #define MAX_PIPE_DATA_PAYLOAD		(sizeof(u8) * 16384)
@@ -335,14 +337,22 @@ struct vmbus_channel_offer {
 } __packed;
 
 /* Server Flags */
-#define VMBUS_CHANNEL_ENUMERATE_DEVICE_INTERFACE	1
-#define VMBUS_CHANNEL_SERVER_SUPPORTS_TRANSFER_PAGES	2
-#define VMBUS_CHANNEL_SERVER_SUPPORTS_GPADLS		4
-#define VMBUS_CHANNEL_NAMED_PIPE_MODE			0x10
-#define VMBUS_CHANNEL_LOOPBACK_OFFER			0x100
-#define VMBUS_CHANNEL_PARENT_OFFER			0x200
-#define VMBUS_CHANNEL_REQUEST_MONITORED_NOTIFICATION	0x400
-#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER		0x2000
+#define VMBUS_CHANNEL_ENUMERATE_DEVICE_INTERFACE		0x0001
+/*
+ * This flag indicates that the channel is offered by the paravisor, and must
+ * use encrypted memory for the channel ring buffer.
+ */
+#define VMBUS_CHANNEL_CONFIDENTIAL_RING_BUFFER			0x0002
+/*
+ * This flag indicates that the channel is offered by the paravisor, and must
+ * use encrypted memory for GPA direct packets and additional GPADLs.
+ */
+#define VMBUS_CHANNEL_CONFIDENTIAL_EXTERNAL_MEMORY		0x0004
+#define VMBUS_CHANNEL_NAMED_PIPE_MODE					0x0010
+#define VMBUS_CHANNEL_LOOPBACK_OFFER					0x0100
+#define VMBUS_CHANNEL_PARENT_OFFER						0x0200
+#define VMBUS_CHANNEL_REQUEST_MONITORED_NOTIFICATION	0x0400
+#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER				0x2000
 
 struct vmpacket_descriptor {
 	u16 type;
@@ -621,6 +631,12 @@ struct vmbus_channel_relid_released {
 	u32 child_relid;
 } __packed;
 
+/*
+ * Used by the paravisor only, means that the encrypted ring buffers and
+ * the encrypted external memory are supported
+ */
+#define VMBUS_FEATURE_FLAG_CONFIDENTIAL_CHANNELS	0x10
+
 struct vmbus_channel_initiate_contact {
 	struct vmbus_channel_message_header header;
 	u32 vmbus_version_requested;
@@ -630,7 +646,8 @@ struct vmbus_channel_initiate_contact {
 		struct {
 			u8	msg_sint;
 			u8	msg_vtl;
-			u8	reserved[6];
+			u8	reserved[2];
+			u32 feature_flags; /* VMBus version 6.0 */
 		};
 	};
 	u64 monitor_page1;
@@ -1008,6 +1025,10 @@ struct vmbus_channel {
 
 	/* boolean to control visibility of sysfs for ring buffer */
 	bool ring_sysfs_visible;
+	/* The ring buffer is encrypted */
+	bool co_ring_buffer;
+	/* The external memory is encrypted */
+	bool co_external_memory;
 };
 
 #define lock_requestor(channel, flags)					\
@@ -1032,6 +1053,16 @@ u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
 			     u64 rqst_addr);
 u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id);
 
+static inline bool is_co_ring_buffer(const struct vmbus_channel_offer_channel *o)
+{
+	return !!(o->offer.chn_flags & VMBUS_CHANNEL_CONFIDENTIAL_RING_BUFFER);
+}
+
+static inline bool is_co_external_memory(const struct vmbus_channel_offer_channel *o)
+{
+	return !!(o->offer.chn_flags & VMBUS_CHANNEL_CONFIDENTIAL_EXTERNAL_MEMORY);
+}
+
 static inline bool is_hvsock_offer(const struct vmbus_channel_offer_channel *o)
 {
 	return !!(o->offer.chn_flags & VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
  2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
  2025-06-04  0:43 ` [PATCH hyperv-next v3 02/15] drivers: hv: VMBus protocol version 6.0 Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-04 13:27   ` ALOK TIWARI
  2025-06-18 16:17   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs Roman Kisel
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The confidential VMBus is built on the guest talking to the
paravisor only.

Provide functions that allow manipulating the SynIC registers
via paravisor. No functional changes.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
---
 arch/x86/kernel/cpu/mshyperv.c | 44 ++++++++++++++++++++++++++++++++++
 drivers/hv/hv_common.c         | 13 ++++++++++
 include/asm-generic/mshyperv.h |  2 ++
 3 files changed, 59 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 3e2533954675..83a85d94bcb3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -89,6 +89,50 @@ void hv_set_non_nested_msr(unsigned int reg, u64 value)
 }
 EXPORT_SYMBOL_GPL(hv_set_non_nested_msr);
 
+/*
+ * Attempt to get the SynIC register value from the paravisor.
+ *
+ * Not all paravisors support reading SynIC registers, so this function
+ * may fail. The register for the SynIC of the running CPU is accessed.
+ *
+ * Writes the SynIC register value into the memory pointed by val,
+ * and ~0ULL is on failure.
+ *
+ * Returns -ENODEV if the MSR is not a SynIC register, or another error
+ * code if getting the MSR fails (meaning the paravisor doesn't support
+ * relaying VMBus comminucations).
+ */
+int hv_para_get_synic_register(unsigned int reg, u64 *val)
+{
+	u64 reg_val = ~0ULL;
+	int err = -ENODEV;
+
+	if (hv_is_synic_msr(reg))
+		reg_val = native_read_msr_safe(reg, &err);
+	*val = reg_val;
+
+	return err;
+}
+
+/*
+ * Attempt to set the SynIC register value with the paravisor.
+ *
+ * Not all paravisors support reading SynIC registers, so this function
+ * may fail. The register for the SynIC of the running CPU is accessed.
+ *
+ * Sets the register to the value supplied.
+ *
+ * Returns: -ENODEV if the MSR is not a SynIC register, or another error
+ * code if writing to the MSR fails (meaning the paravisor doesn't support
+ * relaying VMBus comminucations).
+ */
+int hv_para_set_synic_register(unsigned int reg, u64 val)
+{
+	if (!hv_is_synic_msr(reg))
+		return -ENODEV;
+	return wrmsrl_safe(reg, val);
+}
+
 u64 hv_get_msr(unsigned int reg)
 {
 	if (hv_nested)
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 49898d10faff..a179ea482cb1 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -716,6 +716,19 @@ u64 __weak hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
 }
 EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
 
+int __weak hv_para_get_synic_register(unsigned int reg, u64 *val)
+{
+	*val = ~0ULL;
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(hv_para_get_synic_register);
+
+int __weak hv_para_set_synic_register(unsigned int reg, u64 val)
+{
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(hv_para_set_synic_register);
+
 void hv_identify_partition_type(void)
 {
 	/* Assume guest role */
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 9722934a8332..f239b102fc53 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -333,6 +333,8 @@ bool hv_is_isolation_supported(void);
 bool hv_isolation_type_snp(void);
 u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+int hv_para_get_synic_register(unsigned int reg, u64 *val);
+int hv_para_set_synic_register(unsigned int reg, u64 val);
 void hyperv_cleanup(void);
 bool hv_query_ext_cap(u64 cap_query);
 void hv_setup_dma_ops(struct device *dev, bool coherent);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (2 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-04 13:38   ` ALOK TIWARI
  2025-06-18 16:18   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 05/15] Drivers: hv: Rename fields for SynIC message and event pages Roman Kisel
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

To set up and run the confidential VMBus, the guest needs the paravisor
to intercept access to some synthetic MSRs. In the non-confidential case,
the guest continues using the vendor-specific guest-host communication
protocol.

Update the hv_set_non_nested_msr() function to trap access to some
synthetic MSRs.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
---
 arch/x86/kernel/cpu/mshyperv.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 83a85d94bcb3..db6f3e3db012 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -28,6 +28,7 @@
 #include <asm/apic.h>
 #include <asm/timer.h>
 #include <asm/reboot.h>
+#include <asm/msr.h>
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/numa.h>
@@ -77,14 +78,28 @@ EXPORT_SYMBOL_GPL(hv_get_non_nested_msr);
 
 void hv_set_non_nested_msr(unsigned int reg, u64 value)
 {
+	if (reg == HV_X64_MSR_EOM && vmbus_is_confidential()) {
+		/* Reach out to the paravisor. */
+		native_wrmsrl(reg, value);
+		return;
+	}
+
 	if (hv_is_synic_msr(reg) && ms_hyperv.paravisor_present) {
+		/* The hypervisor will get the intercept. */
 		hv_ivm_msr_write(reg, value);
 
-		/* Write proxy bit via wrmsl instruction */
-		if (hv_is_sint_msr(reg))
-			wrmsrl(reg, value | 1 << 20);
+		if (hv_is_sint_msr(reg)) {
+			/*
+			 * Write proxy bit in the case of non-confidential VMBus.
+			 * Using wrmsl instruction so the following goes to the paravisor.
+			 */
+			u32 proxy = vmbus_is_confidential() ? 0 : 1;
+
+			value |= (proxy << 20);
+			native_wrmsrl(reg, value);
+		}
 	} else {
-		wrmsrl(reg, value);
+		native_wrmsrl(reg, value);
 	}
 }
 EXPORT_SYMBOL_GPL(hv_set_non_nested_msr);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 05/15] Drivers: hv: Rename fields for SynIC message and event pages
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (3 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:18   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 06/15] Drivers: hv: Allocate the paravisor SynIC pages when required Roman Kisel
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

Support for the confidential VMBus requires using SynIC message
and event pages shared with the host and separate ones accessible
only to the paravisor.

Rename the host-accessible SynIC message and event pages to be
able to add the paravisor ones. No functional changes. The field
name is also changed in mshv_root.* for consistency.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/channel_mgmt.c |  6 ++--
 drivers/hv/hv.c           | 66 +++++++++++++++++++--------------------
 drivers/hv/hyperv_vmbus.h |  4 +--
 drivers/hv/mshv_root.h    |  2 +-
 drivers/hv/mshv_synic.c   |  6 ++--
 drivers/hv/vmbus_drv.c    |  6 ++--
 6 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 6e084c207414..6f87220e2ca3 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -843,14 +843,14 @@ static void vmbus_wait_for_unload(void)
 				= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 			/*
-			 * In a CoCo VM the synic_message_page is not allocated
+			 * In a CoCo VM the hyp_synic_message_page is not allocated
 			 * in hv_synic_alloc(). Instead it is set/cleared in
 			 * hv_synic_enable_regs() and hv_synic_disable_regs()
 			 * such that it is set only when the CPU is online. If
 			 * not all present CPUs are online, the message page
 			 * might be NULL, so skip such CPUs.
 			 */
-			page_addr = hv_cpu->synic_message_page;
+			page_addr = hv_cpu->hyp_synic_message_page;
 			if (!page_addr)
 				continue;
 
@@ -891,7 +891,7 @@ static void vmbus_wait_for_unload(void)
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
-		page_addr = hv_cpu->synic_message_page;
+		page_addr = hv_cpu->hyp_synic_message_page;
 		if (!page_addr)
 			continue;
 
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 308c8f279df8..964b9102477d 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -145,20 +145,20 @@ int hv_synic_alloc(void)
 		 * Skip these pages allocation here.
 		 */
 		if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
-			hv_cpu->synic_message_page =
+			hv_cpu->hyp_synic_message_page =
 				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->synic_message_page) {
+			if (!hv_cpu->hyp_synic_message_page) {
 				pr_err("Unable to allocate SYNIC message page\n");
 				goto err;
 			}
 
-			hv_cpu->synic_event_page =
+			hv_cpu->hyp_synic_event_page =
 				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->synic_event_page) {
+			if (!hv_cpu->hyp_synic_event_page) {
 				pr_err("Unable to allocate SYNIC event page\n");
 
-				free_page((unsigned long)hv_cpu->synic_message_page);
-				hv_cpu->synic_message_page = NULL;
+				free_page((unsigned long)hv_cpu->hyp_synic_message_page);
+				hv_cpu->hyp_synic_message_page = NULL;
 				goto err;
 			}
 		}
@@ -166,30 +166,30 @@ int hv_synic_alloc(void)
 		if (!ms_hyperv.paravisor_present &&
 		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)
-				hv_cpu->synic_message_page, 1);
+				hv_cpu->hyp_synic_message_page, 1);
 			if (ret) {
 				pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
-				hv_cpu->synic_message_page = NULL;
+				hv_cpu->hyp_synic_message_page = NULL;
 
 				/*
 				 * Free the event page here so that hv_synic_free()
 				 * won't later try to re-encrypt it.
 				 */
-				free_page((unsigned long)hv_cpu->synic_event_page);
-				hv_cpu->synic_event_page = NULL;
+				free_page((unsigned long)hv_cpu->hyp_synic_event_page);
+				hv_cpu->hyp_synic_event_page = NULL;
 				goto err;
 			}
 
 			ret = set_memory_decrypted((unsigned long)
-				hv_cpu->synic_event_page, 1);
+				hv_cpu->hyp_synic_event_page, 1);
 			if (ret) {
 				pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
-				hv_cpu->synic_event_page = NULL;
+				hv_cpu->hyp_synic_event_page = NULL;
 				goto err;
 			}
 
-			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
-			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+			memset(hv_cpu->hyp_synic_message_page, 0, PAGE_SIZE);
+			memset(hv_cpu->hyp_synic_event_page, 0, PAGE_SIZE);
 		}
 	}
 
@@ -225,28 +225,28 @@ void hv_synic_free(void)
 
 		if (!ms_hyperv.paravisor_present &&
 		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
-			if (hv_cpu->synic_message_page) {
+			if (hv_cpu->hyp_synic_message_page) {
 				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->synic_message_page, 1);
+					hv_cpu->hyp_synic_message_page, 1);
 				if (ret) {
 					pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
-					hv_cpu->synic_message_page = NULL;
+					hv_cpu->hyp_synic_message_page = NULL;
 				}
 			}
 
-			if (hv_cpu->synic_event_page) {
+			if (hv_cpu->hyp_synic_event_page) {
 				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->synic_event_page, 1);
+					hv_cpu->hyp_synic_event_page, 1);
 				if (ret) {
 					pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
-					hv_cpu->synic_event_page = NULL;
+					hv_cpu->hyp_synic_event_page = NULL;
 				}
 			}
 		}
 
 		free_page((unsigned long)hv_cpu->post_msg_page);
-		free_page((unsigned long)hv_cpu->synic_event_page);
-		free_page((unsigned long)hv_cpu->synic_message_page);
+		free_page((unsigned long)hv_cpu->hyp_synic_event_page);
+		free_page((unsigned long)hv_cpu->hyp_synic_message_page);
 	}
 
 	kfree(hv_context.hv_numa_map);
@@ -276,12 +276,12 @@ void hv_synic_enable_regs(unsigned int cpu)
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
-		hv_cpu->synic_message_page =
+		hv_cpu->hyp_synic_message_page =
 			(void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
-		if (!hv_cpu->synic_message_page)
+		if (!hv_cpu->hyp_synic_message_page)
 			pr_err("Fail to map synic message page.\n");
 	} else {
-		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
+		simp.base_simp_gpa = virt_to_phys(hv_cpu->hyp_synic_message_page)
 			>> HV_HYP_PAGE_SHIFT;
 	}
 
@@ -295,12 +295,12 @@ void hv_synic_enable_regs(unsigned int cpu)
 		/* Mask out vTOM bit. ioremap_cache() maps decrypted */
 		u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
 				~ms_hyperv.shared_gpa_boundary;
-		hv_cpu->synic_event_page =
+		hv_cpu->hyp_synic_event_page =
 			(void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
-		if (!hv_cpu->synic_event_page)
+		if (!hv_cpu->hyp_synic_event_page)
 			pr_err("Fail to map synic event page.\n");
 	} else {
-		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
+		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->hyp_synic_event_page)
 			>> HV_HYP_PAGE_SHIFT;
 	}
 
@@ -358,8 +358,8 @@ void hv_synic_disable_regs(unsigned int cpu)
 	 */
 	simp.simp_enabled = 0;
 	if (ms_hyperv.paravisor_present || hv_root_partition()) {
-		iounmap(hv_cpu->synic_message_page);
-		hv_cpu->synic_message_page = NULL;
+		iounmap(hv_cpu->hyp_synic_message_page);
+		hv_cpu->hyp_synic_message_page = NULL;
 	} else {
 		simp.base_simp_gpa = 0;
 	}
@@ -370,8 +370,8 @@ void hv_synic_disable_regs(unsigned int cpu)
 	siefp.siefp_enabled = 0;
 
 	if (ms_hyperv.paravisor_present || hv_root_partition()) {
-		iounmap(hv_cpu->synic_event_page);
-		hv_cpu->synic_event_page = NULL;
+		iounmap(hv_cpu->hyp_synic_event_page);
+		hv_cpu->hyp_synic_event_page = NULL;
 	} else {
 		siefp.base_siefp_gpa = 0;
 	}
@@ -401,7 +401,7 @@ static bool hv_synic_event_pending(void)
 {
 	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
 	union hv_synic_event_flags *event =
-		(union hv_synic_event_flags *)hv_cpu->synic_event_page + VMBUS_MESSAGE_SINT;
+		(union hv_synic_event_flags *)hv_cpu->hyp_synic_event_page + VMBUS_MESSAGE_SINT;
 	unsigned long *recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
 	bool pending;
 	u32 relid;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 0b450e53161e..fc3cdb26ff1a 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -120,8 +120,8 @@ enum {
  * Per cpu state for channel handling
  */
 struct hv_per_cpu_context {
-	void *synic_message_page;
-	void *synic_event_page;
+	void *hyp_synic_message_page;
+	void *hyp_synic_event_page;
 
 	/*
 	 * The page is only used in hv_post_message() for a TDX VM (with the
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index e3931b0f1269..db6b42db2fdc 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -169,7 +169,7 @@ struct mshv_girq_routing_table {
 };
 
 struct hv_synic_pages {
-	struct hv_message_page *synic_message_page;
+	struct hv_message_page *hyp_synic_message_page;
 	struct hv_synic_event_flags_page *synic_event_flags_page;
 	struct hv_synic_event_ring_page *synic_event_ring_page;
 };
diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index e6b6381b7c36..f8b0337cdc82 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -394,7 +394,7 @@ mshv_intercept_isr(struct hv_message *msg)
 void mshv_isr(void)
 {
 	struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
-	struct hv_message_page **msg_page = &spages->synic_message_page;
+	struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
 	struct hv_message *msg;
 	bool handled;
 
@@ -456,7 +456,7 @@ int mshv_synic_init(unsigned int cpu)
 #endif
 	union hv_synic_scontrol sctrl;
 	struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
-	struct hv_message_page **msg_page = &spages->synic_message_page;
+	struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
 	struct hv_synic_event_flags_page **event_flags_page =
 			&spages->synic_event_flags_page;
 	struct hv_synic_event_ring_page **event_ring_page =
@@ -550,7 +550,7 @@ int mshv_synic_cleanup(unsigned int cpu)
 	union hv_synic_sirbp sirbp;
 	union hv_synic_scontrol sctrl;
 	struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
-	struct hv_message_page **msg_page = &spages->synic_message_page;
+	struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
 	struct hv_synic_event_flags_page **event_flags_page =
 		&spages->synic_event_flags_page;
 	struct hv_synic_event_ring_page **event_ring_page =
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 678ad59deb2d..695b1ba7113c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1060,7 +1060,7 @@ static void vmbus_onmessage_work(struct work_struct *work)
 void vmbus_on_msg_dpc(unsigned long data)
 {
 	struct hv_per_cpu_context *hv_cpu = (void *)data;
-	void *page_addr = hv_cpu->synic_message_page;
+	void *page_addr = hv_cpu->hyp_synic_message_page;
 	struct hv_message msg_copy, *msg = (struct hv_message *)page_addr +
 				  VMBUS_MESSAGE_SINT;
 	struct vmbus_channel_message_header *hdr;
@@ -1244,7 +1244,7 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 	 * The event page can be directly checked to get the id of
 	 * the channel that has the interrupt pending.
 	 */
-	void *page_addr = hv_cpu->synic_event_page;
+	void *page_addr = hv_cpu->hyp_synic_event_page;
 	union hv_synic_event_flags *event
 		= (union hv_synic_event_flags *)page_addr +
 					 VMBUS_MESSAGE_SINT;
@@ -1327,7 +1327,7 @@ static void vmbus_isr(void)
 
 	vmbus_chan_sched(hv_cpu);
 
-	page_addr = hv_cpu->synic_message_page;
+	page_addr = hv_cpu->hyp_synic_message_page;
 	msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
 
 	/* Check if there are actual msgs to be processed */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 06/15] Drivers: hv: Allocate the paravisor SynIC pages when required
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (4 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 05/15] Drivers: hv: Rename fields for SynIC message and event pages Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:18   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available Roman Kisel
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The paravisor needs the SynIC pages to communicate with the guest
via the confidential VMBus.

Refactor and extaned the exisitng code to account for that.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/hv.c           | 184 +++++++++++++++++++-------------------
 drivers/hv/hyperv_vmbus.h |  17 ++++
 2 files changed, 111 insertions(+), 90 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 964b9102477d..e25c91eb6af5 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -94,10 +94,70 @@ int hv_post_message(union hv_connection_id connection_id,
 	return hv_result(status);
 }
 
+static int hv_alloc_page(unsigned int cpu, void **page, bool decrypt,
+	const char *note)
+{
+	int ret = 0;
+
+	/*
+	 * After the page changes its encryption status, its contents might
+	 * appear scrambled on some hardware. Thus `get_zeroed_page` would
+	 * zero the page out in vain, we do that ourselves exactly once.
+	 *
+	 * By default, the page is allocated encrypted provided the system
+	 * supports that.
+	 */
+	*page = (void *)__get_free_page(GFP_KERNEL);
+	if (!*page)
+		return -ENOMEM;
+
+	if (decrypt)
+		ret = set_memory_decrypted((unsigned long)*page, 1);
+	if (ret)
+		goto failed;
+
+	memset(*page, 0, PAGE_SIZE);
+	return 0;
+
+failed:
+
+	pr_err("allocation failed for %s page, error %d when allocating the page, decrypted %d\n",
+		note, ret, decrypt);
+	free_page((unsigned long)*page);
+	*page = NULL;
+	return ret;
+}
+
+static int hv_free_page(void **page, bool encrypt, const char *note)
+{
+	int ret = 0;
+
+	if (!*page)
+		return 0;
+
+	if (encrypt)
+		ret = set_memory_encrypted((unsigned long)*page, 1);
+
+	/*
+	 * In the case of the action failure, the page is leaked.
+	 * Something is wrong, prefer to lose the page and stay afloat.
+	 */
+	if (ret) {
+		pr_err("deallocation failed for %s page, error %d, encrypt %d\n",
+			note, ret, encrypt);
+	} else
+		free_page((unsigned long)*page);
+
+	*page = NULL;
+
+	return ret;
+}
+
 int hv_synic_alloc(void)
 {
 	int cpu, ret = -ENOMEM;
 	struct hv_per_cpu_context *hv_cpu;
+	const bool decrypt = !vmbus_is_confidential();
 
 	/*
 	 * First, zero all per-cpu memory areas so hv_synic_free() can
@@ -123,73 +183,37 @@ int hv_synic_alloc(void)
 			     vmbus_on_msg_dpc, (unsigned long)hv_cpu);
 
 		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
-			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->post_msg_page) {
-				pr_err("Unable to allocate post msg page\n");
-				goto err;
-			}
-
-			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
-			if (ret) {
-				pr_err("Failed to decrypt post msg page: %d\n", ret);
-				/* Just leak the page, as it's unsafe to free the page. */
-				hv_cpu->post_msg_page = NULL;
+			ret = hv_alloc_page(cpu, &hv_cpu->post_msg_page,
+				decrypt, "post msg");
+			if (ret)
 				goto err;
-			}
-
-			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
 		}
 
 		/*
-		 * Synic message and event pages are allocated by paravisor.
-		 * Skip these pages allocation here.
+		 * If these SynIC pages are not allocated, SIEF and SIM pages
+		 * are configured using what the root partition or the paravisor
+		 * provides upon reading the SIEFP and SIMP registers.
 		 */
 		if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
-			hv_cpu->hyp_synic_message_page =
-				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->hyp_synic_message_page) {
-				pr_err("Unable to allocate SYNIC message page\n");
+			ret = hv_alloc_page(cpu, &hv_cpu->hyp_synic_message_page,
+				decrypt, "hypervisor SynIC msg");
+			if (ret)
 				goto err;
-			}
-
-			hv_cpu->hyp_synic_event_page =
-				(void *)get_zeroed_page(GFP_ATOMIC);
-			if (!hv_cpu->hyp_synic_event_page) {
-				pr_err("Unable to allocate SYNIC event page\n");
-
-				free_page((unsigned long)hv_cpu->hyp_synic_message_page);
-				hv_cpu->hyp_synic_message_page = NULL;
+			ret = hv_alloc_page(cpu, &hv_cpu->hyp_synic_event_page,
+				decrypt, "hypervisor SynIC event");
+			if (ret)
 				goto err;
 			}
-		}
 
-		if (!ms_hyperv.paravisor_present &&
-		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
-			ret = set_memory_decrypted((unsigned long)
-				hv_cpu->hyp_synic_message_page, 1);
-			if (ret) {
-				pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
-				hv_cpu->hyp_synic_message_page = NULL;
-
-				/*
-				 * Free the event page here so that hv_synic_free()
-				 * won't later try to re-encrypt it.
-				 */
-				free_page((unsigned long)hv_cpu->hyp_synic_event_page);
-				hv_cpu->hyp_synic_event_page = NULL;
+		if (vmbus_is_confidential()) {
+			ret = hv_alloc_page(cpu, &hv_cpu->para_synic_message_page,
+				decrypt, "paravisor SynIC msg");
+			if (ret)
 				goto err;
-			}
-
-			ret = set_memory_decrypted((unsigned long)
-				hv_cpu->hyp_synic_event_page, 1);
-			if (ret) {
-				pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
-				hv_cpu->hyp_synic_event_page = NULL;
+			ret = hv_alloc_page(cpu, &hv_cpu->para_synic_event_page,
+				decrypt, "paravisor SynIC event");
+			if (ret)
 				goto err;
-			}
-
-			memset(hv_cpu->hyp_synic_message_page, 0, PAGE_SIZE);
-			memset(hv_cpu->hyp_synic_event_page, 0, PAGE_SIZE);
 		}
 	}
 
@@ -205,48 +229,28 @@ int hv_synic_alloc(void)
 
 void hv_synic_free(void)
 {
-	int cpu, ret;
+	int cpu;
+	const bool encrypt = !vmbus_is_confidential();
 
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu =
 			per_cpu_ptr(hv_context.cpu_context, cpu);
 
-		/* It's better to leak the page if the encryption fails. */
-		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
-			if (hv_cpu->post_msg_page) {
-				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->post_msg_page, 1);
-				if (ret) {
-					pr_err("Failed to encrypt post msg page: %d\n", ret);
-					hv_cpu->post_msg_page = NULL;
-				}
-			}
+		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx())
+			hv_free_page(&hv_cpu->post_msg_page,
+				encrypt, "post msg");
+		if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
+			hv_free_page(&hv_cpu->hyp_synic_event_page,
+				encrypt, "hypervisor SynIC event");
+			hv_free_page(&hv_cpu->hyp_synic_message_page,
+				encrypt, "hypervisor SynIC msg");
 		}
-
-		if (!ms_hyperv.paravisor_present &&
-		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
-			if (hv_cpu->hyp_synic_message_page) {
-				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->hyp_synic_message_page, 1);
-				if (ret) {
-					pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
-					hv_cpu->hyp_synic_message_page = NULL;
-				}
-			}
-
-			if (hv_cpu->hyp_synic_event_page) {
-				ret = set_memory_encrypted((unsigned long)
-					hv_cpu->hyp_synic_event_page, 1);
-				if (ret) {
-					pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
-					hv_cpu->hyp_synic_event_page = NULL;
-				}
-			}
+		if (vmbus_is_confidential()) {
+			hv_free_page(&hv_cpu->para_synic_event_page,
+				encrypt, "paravisor SynIC event");
+			hv_free_page(&hv_cpu->para_synic_message_page,
+				encrypt, "paravisor SynIC msg");
 		}
-
-		free_page((unsigned long)hv_cpu->post_msg_page);
-		free_page((unsigned long)hv_cpu->hyp_synic_event_page);
-		free_page((unsigned long)hv_cpu->hyp_synic_message_page);
 	}
 
 	kfree(hv_context.hv_numa_map);
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index fc3cdb26ff1a..9619edcf9f88 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -120,8 +120,25 @@ enum {
  * Per cpu state for channel handling
  */
 struct hv_per_cpu_context {
+	/*
+	 * SynIC pages for communicating with the host.
+	 *
+	 * These pages are accessible to the host partition and the hypervisor,
+	 * so they can only be used for exchanging data when the host partition
+	 * and the hypervisor are trusted.
+	 */
 	void *hyp_synic_message_page;
 	void *hyp_synic_event_page;
+	/*
+	 * SynIC pages for communicating with the paravisor.
+	 *
+	 * These pages can be accessed only from within the guest partition.
+	 * Neither the host partition nor the hypervisor can access these pages,
+	 * so they can be used for exchanging data when the host partition and
+	 * the hypervisor are not trusted, such as in a confidential VM.
+	 */
+	void *para_synic_message_page;
+	void *para_synic_event_page;
 
 	/*
 	 * The page is only used in hv_post_message() for a TDX VM (with the
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (5 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 06/15] Drivers: hv: Allocate the paravisor SynIC pages when required Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-04 13:48   ` ALOK TIWARI
  2025-06-18 16:18   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 08/15] Drivers: hv: remove stale comment Roman Kisel
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

When the confidential VMBus is available, the guest should post
messages via the paravisor.

Update hv_post_message() to request posting messages from the paravisor
rather than through GHCB or TD calls.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index e25c91eb6af5..1f7cf1244509 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -74,7 +74,7 @@ int hv_post_message(union hv_connection_id connection_id,
 	aligned_msg->payload_size = payload_size;
 	memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-	if (ms_hyperv.paravisor_present) {
+	if (ms_hyperv.paravisor_present && !vmbus_is_confidential()) {
 		if (hv_isolation_type_tdx())
 			status = hv_tdx_hypercall(HVCALL_POST_MESSAGE,
 						  virt_to_phys(aligned_msg), 0);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 08/15] Drivers: hv: remove stale comment
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (6 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-04  0:43 ` [PATCH hyperv-next v3 09/15] Drivers: hv: Use memunmap() to check if the address is in IO map Roman Kisel
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The comment about the x2v shim is ancient and long since incorrect.

Remove the incorrect comment.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/hv.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 1f7cf1244509..6a4857def82d 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -257,11 +257,7 @@ void hv_synic_free(void)
 }
 
 /*
- * hv_synic_init - Initialize the Synthetic Interrupt Controller.
- *
- * If it is already initialized by another entity (ie x2v shim), we need to
- * retrieve the initialized message and event pages.  Otherwise, we create and
- * initialize the message and event pages.
+ * hv_synic_enable_regs - Initialize the Synthetic Interrupt Controller.
  */
 void hv_synic_enable_regs(unsigned int cpu)
 {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 09/15] Drivers: hv: Use memunmap() to check if the address is in IO map
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (7 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 08/15] Drivers: hv: remove stale comment Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:18   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 10/15] Drivers: hv: Rename the SynIC enable and disable routines Roman Kisel
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

It might happen that some hyp SynIC pages aren't IO mapped.

Use memunmap() that checks for that and only then calls iounmap()

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 6a4857def82d..9a66656d89e0 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -358,7 +358,7 @@ void hv_synic_disable_regs(unsigned int cpu)
 	 */
 	simp.simp_enabled = 0;
 	if (ms_hyperv.paravisor_present || hv_root_partition()) {
-		iounmap(hv_cpu->hyp_synic_message_page);
+		memunmap(hv_cpu->hyp_synic_message_page);
 		hv_cpu->hyp_synic_message_page = NULL;
 	} else {
 		simp.base_simp_gpa = 0;
@@ -370,7 +370,7 @@ void hv_synic_disable_regs(unsigned int cpu)
 	siefp.siefp_enabled = 0;
 
 	if (ms_hyperv.paravisor_present || hv_root_partition()) {
-		iounmap(hv_cpu->hyp_synic_event_page);
+		memunmap(hv_cpu->hyp_synic_event_page);
 		hv_cpu->hyp_synic_event_page = NULL;
 	} else {
 		siefp.base_siefp_gpa = 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 10/15] Drivers: hv: Rename the SynIC enable and disable routines
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (8 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 09/15] Drivers: hv: Use memunmap() to check if the address is in IO map Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:19   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 11/15] Drivers: hv: Functions for setting up and tearing down the paravisor SynIC Roman Kisel
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The confidential VMBus requires support for the both hypervisor
facing SynIC and the paravisor one.

Rename the functions that enable and disable SynIC with the
hypervisor.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/channel_mgmt.c |  2 +-
 drivers/hv/hv.c           | 11 ++++++-----
 drivers/hv/hyperv_vmbus.h |  4 ++--
 drivers/hv/vmbus_drv.c    |  6 +++---
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 6f87220e2ca3..ca2fe10c110a 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -845,7 +845,7 @@ static void vmbus_wait_for_unload(void)
 			/*
 			 * In a CoCo VM the hyp_synic_message_page is not allocated
 			 * in hv_synic_alloc(). Instead it is set/cleared in
-			 * hv_synic_enable_regs() and hv_synic_disable_regs()
+			 * hv_hyp_synic_enable_regs() and hv_hyp_synic_disable_regs()
 			 * such that it is set only when the CPU is online. If
 			 * not all present CPUs are online, the message page
 			 * might be NULL, so skip such CPUs.
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 9a66656d89e0..2b561825089a 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -257,9 +257,10 @@ void hv_synic_free(void)
 }
 
 /*
- * hv_synic_enable_regs - Initialize the Synthetic Interrupt Controller.
+ * hv_hyp_synic_enable_regs - Initialize the Synthetic Interrupt Controller
+ * with the hypervisor.
  */
-void hv_synic_enable_regs(unsigned int cpu)
+void hv_hyp_synic_enable_regs(unsigned int cpu)
 {
 	struct hv_per_cpu_context *hv_cpu =
 		per_cpu_ptr(hv_context.cpu_context, cpu);
@@ -325,14 +326,14 @@ void hv_synic_enable_regs(unsigned int cpu)
 
 int hv_synic_init(unsigned int cpu)
 {
-	hv_synic_enable_regs(cpu);
+	hv_hyp_synic_enable_regs(cpu);
 
 	hv_stimer_legacy_init(cpu, VMBUS_MESSAGE_SINT);
 
 	return 0;
 }
 
-void hv_synic_disable_regs(unsigned int cpu)
+void hv_hyp_synic_disable_regs(unsigned int cpu)
 {
 	struct hv_per_cpu_context *hv_cpu =
 		per_cpu_ptr(hv_context.cpu_context, cpu);
@@ -515,7 +516,7 @@ int hv_synic_cleanup(unsigned int cpu)
 always_cleanup:
 	hv_stimer_legacy_cleanup(cpu);
 
-	hv_synic_disable_regs(cpu);
+	hv_hyp_synic_disable_regs(cpu);
 
 	return ret;
 }
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9619edcf9f88..c1df611d1eb2 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -188,10 +188,10 @@ extern int hv_synic_alloc(void);
 
 extern void hv_synic_free(void);
 
-extern void hv_synic_enable_regs(unsigned int cpu);
+extern void hv_hyp_synic_enable_regs(unsigned int cpu);
 extern int hv_synic_init(unsigned int cpu);
 
-extern void hv_synic_disable_regs(unsigned int cpu);
+extern void hv_hyp_synic_disable_regs(unsigned int cpu);
 extern int hv_synic_cleanup(unsigned int cpu);
 
 /* Interface */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 695b1ba7113c..f7e82a4fe133 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2809,7 +2809,7 @@ static void hv_crash_handler(struct pt_regs *regs)
 	 */
 	cpu = smp_processor_id();
 	hv_stimer_cleanup(cpu);
-	hv_synic_disable_regs(cpu);
+	hv_hyp_synic_disable_regs(cpu);
 };
 
 static int hv_synic_suspend(void)
@@ -2834,14 +2834,14 @@ static int hv_synic_suspend(void)
 	 * interrupts-disabled context.
 	 */
 
-	hv_synic_disable_regs(0);
+	hv_hyp_synic_disable_regs(0);
 
 	return 0;
 }
 
 static void hv_synic_resume(void)
 {
-	hv_synic_enable_regs(0);
+	hv_hyp_synic_enable_regs(0);
 
 	/*
 	 * Note: we don't need to call hv_stimer_init(0), because the timer
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 11/15] Drivers: hv: Functions for setting up and tearing down the paravisor SynIC
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (9 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 10/15] Drivers: hv: Rename the SynIC enable and disable routines Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:19   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 12/15] Drivers: hv: Allocate encrypted buffers when requested Roman Kisel
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The confidential VMBus runs with the paravisor SynIC and requires
configuring it with the paravisor.

Add the functions for configuring the paravisor SynIC

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/hv.c | 180 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 169 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 2b561825089a..c9649ab3439e 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -203,7 +203,7 @@ int hv_synic_alloc(void)
 				decrypt, "hypervisor SynIC event");
 			if (ret)
 				goto err;
-			}
+		}
 
 		if (vmbus_is_confidential()) {
 			ret = hv_alloc_page(cpu, &hv_cpu->para_synic_message_page,
@@ -267,7 +267,6 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
 	union hv_synic_sint shared_sint;
-	union hv_synic_scontrol sctrl;
 
 	/* Setup the Synic's message page */
 	simp.as_uint64 = hv_get_msr(HV_MSR_SIMP);
@@ -288,7 +287,7 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
 
 	hv_set_msr(HV_MSR_SIMP, simp.as_uint64);
 
-	/* Setup the Synic's event page */
+	/* Setup the Synic's event page with the hypervisor. */
 	siefp.as_uint64 = hv_get_msr(HV_MSR_SIEFP);
 	siefp.siefp_enabled = 1;
 
@@ -316,6 +315,11 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
 	shared_sint.masked = false;
 	shared_sint.auto_eoi = hv_recommend_using_aeoi();
 	hv_set_msr(HV_MSR_SINT0 + VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+}
+
+static void hv_hyp_synic_enable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
 
 	/* Enable the global synic bit */
 	sctrl.as_uint64 = hv_get_msr(HV_MSR_SCONTROL);
@@ -324,13 +328,90 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
 	hv_set_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
 }
 
+/*
+ * The paravisor might not support proxying SynIC, and this
+ * function may fail.
+ */
+static int hv_para_synic_enable_regs(unsigned int cpu)
+{
+	int err;
+	union hv_synic_simp simp;
+	union hv_synic_siefp siefp;
+	struct hv_per_cpu_context *hv_cpu
+		= per_cpu_ptr(hv_context.cpu_context, cpu);
+
+	/* Setup the Synic's message page with the paravisor. */
+	err = hv_para_get_synic_register(HV_MSR_SIMP, &simp.as_uint64);
+	if (err)
+		return err;
+	simp.simp_enabled = 1;
+	simp.base_simp_gpa = virt_to_phys(hv_cpu->para_synic_message_page)
+			>> HV_HYP_PAGE_SHIFT;
+	err = hv_para_set_synic_register(HV_MSR_SIMP, simp.as_uint64);
+	if (err)
+		return err;
+
+	/* Setup the Synic's event page with the paravisor. */
+	err = hv_para_get_synic_register(HV_MSR_SIEFP, &siefp.as_uint64);
+	if (err)
+		return err;
+	siefp.siefp_enabled = 1;
+	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->para_synic_event_page)
+			>> HV_HYP_PAGE_SHIFT;
+	return hv_para_set_synic_register(HV_MSR_SIEFP, siefp.as_uint64);
+}
+
+static int hv_para_synic_enable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
+	int err;
+
+	/* Enable the global synic bit */
+	err = hv_para_get_synic_register(HV_MSR_SCONTROL, &sctrl.as_uint64);
+	if (err)
+		return err;
+	sctrl.enable = 1;
+
+	return hv_para_set_synic_register(HV_MSR_SCONTROL, sctrl.as_uint64);
+}
+
 int hv_synic_init(unsigned int cpu)
 {
+	int err;
+
+	/*
+	 * The paravisor may not support the confidential VMBus,
+	 * check on that first.
+	 */
+	if (vmbus_is_confidential()) {
+		err = hv_para_synic_enable_regs(cpu);
+		if (err)
+			goto fail;
+	}
+
 	hv_hyp_synic_enable_regs(cpu);
+	if (vmbus_is_confidential()) {
+		err = hv_para_synic_enable_interrupts();
+		if (err)
+			goto fail;
+	} else
+		hv_hyp_synic_enable_interrupts();
 
 	hv_stimer_legacy_init(cpu, VMBUS_MESSAGE_SINT);
 
 	return 0;
+
+fail:
+	/*
+	 * The failure may only come from enabling the paravisor SynIC.
+	 * That in turn means that the confidential VMBus cannot be used
+	 * which is not an error: the setup will be re-tried with the
+	 * non-confidential VMBus.
+	 *
+	 * We also don't bother attempting to reset the paravisor registers
+	 * as something isn't working there anyway.
+	 */
+	return err;
 }
 
 void hv_hyp_synic_disable_regs(unsigned int cpu)
@@ -340,7 +421,6 @@ void hv_hyp_synic_disable_regs(unsigned int cpu)
 	union hv_synic_sint shared_sint;
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
-	union hv_synic_scontrol sctrl;
 
 	shared_sint.as_uint64 = hv_get_msr(HV_MSR_SINT0 + VMBUS_MESSAGE_SINT);
 
@@ -378,14 +458,71 @@ void hv_hyp_synic_disable_regs(unsigned int cpu)
 	}
 
 	hv_set_msr(HV_MSR_SIEFP, siefp.as_uint64);
+}
+
+static void hv_hyp_synic_disable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
 
 	/* Disable the global synic bit */
 	sctrl.as_uint64 = hv_get_msr(HV_MSR_SCONTROL);
 	sctrl.enable = 0;
 	hv_set_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
+}
 
-	if (vmbus_irq != -1)
-		disable_percpu_irq(vmbus_irq);
+static void hv_para_synic_disable_regs(unsigned int cpu)
+{
+	/*
+	 * When a get/set register error is encountered, the function
+	 * returns as the paravisor may not support these registers.
+	 */
+	int err;
+	union hv_synic_simp simp;
+	union hv_synic_siefp siefp;
+
+	struct hv_per_cpu_context *hv_cpu
+		= per_cpu_ptr(hv_context.cpu_context, cpu);
+
+	/* Disable SynIC's message page in the paravisor. */
+	err = hv_para_get_synic_register(HV_MSR_SIMP, &simp.as_uint64);
+	if (err)
+		return;
+	simp.simp_enabled = 0;
+
+	if (hv_cpu->para_synic_message_page) {
+		free_page((u64)(hv_cpu->para_synic_message_page));
+		hv_cpu->para_synic_message_page = NULL;
+	}
+
+	err = hv_para_set_synic_register(HV_MSR_SIMP, simp.as_uint64);
+	if (err)
+		return;
+
+	/* Disable SynIC's event page in the paravisor. */
+	err = hv_para_get_synic_register(HV_MSR_SIEFP, &siefp.as_uint64);
+	if (err)
+		return;
+	siefp.siefp_enabled = 0;
+
+	if (hv_cpu->para_synic_event_page) {
+		free_page((u64)(hv_cpu->para_synic_event_page));
+		hv_cpu->para_synic_event_page = NULL;
+	}
+
+	hv_para_set_synic_register(HV_MSR_SIEFP, siefp.as_uint64);
+}
+
+static void hv_para_synic_disable_interrupts(void)
+{
+	union hv_synic_scontrol sctrl;
+	int err;
+
+	/* Disable the global synic bit */
+	err = hv_para_get_synic_register(HV_MSR_SCONTROL, &sctrl.as_uint64);
+	if (err)
+		return;
+	sctrl.enable = 0;
+	hv_para_set_synic_register(HV_MSR_SCONTROL, sctrl.as_uint64);
 }
 
 #define HV_MAX_TRIES 3
@@ -398,16 +535,18 @@ void hv_hyp_synic_disable_regs(unsigned int cpu)
  * that the normal interrupt handling mechanism will find and process the channel interrupt
  * "very soon", and in the process clear the bit.
  */
-static bool hv_synic_event_pending(void)
+static bool __hv_synic_event_pending(union hv_synic_event_flags *event, int sint)
 {
-	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
-	union hv_synic_event_flags *event =
-		(union hv_synic_event_flags *)hv_cpu->hyp_synic_event_page + VMBUS_MESSAGE_SINT;
-	unsigned long *recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
+	unsigned long *recv_int_page;
 	bool pending;
 	u32 relid;
 	int tries = 0;
 
+	if (!event)
+		return false;
+
+	event += sint;
+	recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
 retry:
 	pending = false;
 	for_each_set_bit(relid, recv_int_page, HV_EVENT_FLAGS_COUNT) {
@@ -424,6 +563,17 @@ static bool hv_synic_event_pending(void)
 	return pending;
 }
 
+static bool hv_synic_event_pending(void)
+{
+	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
+	union hv_synic_event_flags *hyp_synic_event_page = hv_cpu->hyp_synic_event_page;
+	union hv_synic_event_flags *para_synic_event_page = hv_cpu->para_synic_event_page;
+
+	return
+		__hv_synic_event_pending(hyp_synic_event_page, VMBUS_MESSAGE_SINT) ||
+		__hv_synic_event_pending(para_synic_event_page, VMBUS_MESSAGE_SINT);
+}
+
 static int hv_pick_new_cpu(struct vmbus_channel *channel)
 {
 	int ret = -EBUSY;
@@ -517,6 +667,14 @@ int hv_synic_cleanup(unsigned int cpu)
 	hv_stimer_legacy_cleanup(cpu);
 
 	hv_hyp_synic_disable_regs(cpu);
+	if (vmbus_is_confidential()) {
+		hv_para_synic_disable_regs(cpu);
+		hv_para_synic_disable_interrupts();
+	} else {
+		hv_hyp_synic_disable_interrupts();
+	}
+	if (vmbus_irq != -1)
+		disable_percpu_irq(vmbus_irq);
 
 	return ret;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 12/15] Drivers: hv: Allocate encrypted buffers when requested
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (10 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 11/15] Drivers: hv: Functions for setting up and tearing down the paravisor SynIC Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:19   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels Roman Kisel
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

Confidential VMBus is built around using buffers not shared with
the host.

Support allocating encrypted buffers when requested.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/channel.c      | 43 +++++++++++++++++++++++----------------
 drivers/hv/hyperv_vmbus.h |  3 ++-
 drivers/hv/ring_buffer.c  |  5 +++--
 3 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index fb8cd8469328..3e2891c4b800 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -443,20 +443,23 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
 		return ret;
 	}
 
-	/*
-	 * Set the "decrypted" flag to true for the set_memory_decrypted()
-	 * success case. In the failure case, the encryption state of the
-	 * memory is unknown. Leave "decrypted" as true to ensure the
-	 * memory will be leaked instead of going back on the free list.
-	 */
-	gpadl->decrypted = true;
-	ret = set_memory_decrypted((unsigned long)kbuffer,
-				   PFN_UP(size));
-	if (ret) {
-		dev_warn(&channel->device_obj->device,
-			 "Failed to set host visibility for new GPADL %d.\n",
-			 ret);
-		return ret;
+	gpadl->decrypted = !((channel->co_external_memory && type == HV_GPADL_BUFFER) ||
+		(channel->co_ring_buffer && type == HV_GPADL_RING));
+	if (gpadl->decrypted) {
+		/*
+		 * Set the "decrypted" flag to true for the set_memory_decrypted()
+		 * success case. In the failure case, the encryption state of the
+		 * memory is unknown. Leave "decrypted" as true to ensure the
+		 * memory will be leaked instead of going back on the free list.
+		 */
+		ret = set_memory_decrypted((unsigned long)kbuffer,
+					PFN_UP(size));
+		if (ret) {
+			dev_warn(&channel->device_obj->device,
+				"Failed to set host visibility for new GPADL %d.\n",
+				ret);
+			return ret;
+		}
 	}
 
 	init_completion(&msginfo->waitevent);
@@ -676,12 +679,13 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 		goto error_clean_ring;
 
 	err = hv_ringbuffer_init(&newchannel->outbound,
-				 page, send_pages, 0);
+				 page, send_pages, 0, newchannel->co_ring_buffer);
 	if (err)
 		goto error_free_gpadl;
 
 	err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
-				 recv_pages, newchannel->max_pkt_size);
+				 recv_pages, newchannel->max_pkt_size,
+				 newchannel->co_ring_buffer);
 	if (err)
 		goto error_free_gpadl;
 
@@ -862,8 +866,11 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl *gpad
 
 	kfree(info);
 
-	ret = set_memory_encrypted((unsigned long)gpadl->buffer,
-				   PFN_UP(gpadl->size));
+	if (gpadl->decrypted)
+		ret = set_memory_encrypted((unsigned long)gpadl->buffer,
+					PFN_UP(gpadl->size));
+	else
+		ret = 0;
 	if (ret)
 		pr_warn("Fail to set mem host visibility in GPADL teardown %d.\n", ret);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index c1df611d1eb2..0f02e163b0a0 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -199,7 +199,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-		       struct page *pages, u32 pagecnt, u32 max_pkt_size);
+		       struct page *pages, u32 pagecnt, u32 max_pkt_size,
+			   bool confidential);
 
 void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
 
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 3c9b02471760..05c2cd42fc75 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -183,7 +183,8 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
 
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
+		       struct page *pages, u32 page_cnt, u32 max_pkt_size,
+			   bool confidential)
 {
 	struct page **pages_wraparound;
 	int i;
@@ -207,7 +208,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 
 	ring_info->ring_buffer = (struct hv_ring_buffer *)
 		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
-			pgprot_decrypted(PAGE_KERNEL));
+			confidential ? PAGE_KERNEL : pgprot_decrypted(PAGE_KERNEL));
 
 	kfree(pages_wraparound);
 	if (!ring_info->ring_buffer)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (11 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 12/15] Drivers: hv: Allocate encrypted buffers when requested Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-04 14:15   ` ALOK TIWARI
  2025-06-18 16:19   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 14/15] Drivers: hv: Support establishing the confidential VMBus connection Roman Kisel
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

To run a confidential VMBus channels, one has to initialize the
co_ring_buffers and co_external_memory fields of the channel
structure.

Advertise support upon negoatiating the version and compute
values for those fields and initialize them.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/channel_mgmt.c | 19 +++++++++++++++++++
 drivers/hv/connection.c   |  3 +++
 2 files changed, 22 insertions(+)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index ca2fe10c110a..33bc29e826bd 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -1021,6 +1021,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 	struct vmbus_channel_offer_channel *offer;
 	struct vmbus_channel *oldchannel, *newchannel;
 	size_t offer_sz;
+	bool co_ring_buffer, co_external_memory;
 
 	offer = (struct vmbus_channel_offer_channel *)hdr;
 
@@ -1033,6 +1034,22 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 		return;
 	}
 
+	co_ring_buffer = is_co_ring_buffer(offer);
+	if (co_ring_buffer) {
+		if (vmbus_proto_version < VERSION_WIN10_V6_0 || !vmbus_is_confidential()) {
+			atomic_dec(&vmbus_connection.offer_in_progress);
+			return;
+		}
+	}
+
+	co_external_memory = is_co_external_memory(offer);
+	if (is_co_external_memory(offer)) {
+		if (vmbus_proto_version < VERSION_WIN10_V6_0 || !vmbus_is_confidential()) {
+			atomic_dec(&vmbus_connection.offer_in_progress);
+			return;
+		}
+	}
+
 	oldchannel = find_primary_channel_by_offer(offer);
 
 	if (oldchannel != NULL) {
@@ -1111,6 +1128,8 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 		pr_err("Unable to allocate channel object\n");
 		return;
 	}
+	newchannel->co_ring_buffer = co_ring_buffer;
+	newchannel->co_external_memory = co_external_memory;
 
 	vmbus_setup_channel_state(newchannel, offer);
 
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index be490c598785..eeb472019d69 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -105,6 +105,9 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 		vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
 	}
 
+	if (vmbus_is_confidential() && version >= VERSION_WIN10_V6_0)
+		msg->feature_flags = VMBUS_FEATURE_FLAG_CONFIDENTIAL_CHANNELS;
+
 	/*
 	 * shared_gpa_boundary is zero in non-SNP VMs, so it's safe to always
 	 * bitwise OR it
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 14/15] Drivers: hv: Support establishing the confidential VMBus connection
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (12 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:19   ` Michael Kelley
  2025-06-04  0:43 ` [PATCH hyperv-next v3 15/15] Drivers: hv: Set the default VMBus version to 6.0 Roman Kisel
  2025-06-18 16:13 ` [PATCH hyperv-next v3 00/15] Confidential VMBus Michael Kelley
  15 siblings, 1 reply; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

To establish the confidential VMBus connection the CoCo VM guest
first attempts to connect to the VMBus server run by the paravisor.
If that fails, the guest falls back to the non-confidential VMBus.

Implement that in the VMBus driver initialization.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/vmbus_drv.c | 169 +++++++++++++++++++++++++++--------------
 1 file changed, 110 insertions(+), 59 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index f7e82a4fe133..88701c3ad999 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1057,12 +1057,9 @@ static void vmbus_onmessage_work(struct work_struct *work)
 	kfree(ctx);
 }
 
-void vmbus_on_msg_dpc(unsigned long data)
+static void __vmbus_on_msg_dpc(void *message_page_addr)
 {
-	struct hv_per_cpu_context *hv_cpu = (void *)data;
-	void *page_addr = hv_cpu->hyp_synic_message_page;
-	struct hv_message msg_copy, *msg = (struct hv_message *)page_addr +
-				  VMBUS_MESSAGE_SINT;
+	struct hv_message msg_copy, *msg;
 	struct vmbus_channel_message_header *hdr;
 	enum vmbus_channel_message_type msgtype;
 	const struct vmbus_channel_message_table_entry *entry;
@@ -1070,6 +1067,10 @@ void vmbus_on_msg_dpc(unsigned long data)
 	__u8 payload_size;
 	u32 message_type;
 
+	if (!message_page_addr)
+		return;
+	msg = (struct hv_message *)message_page_addr + VMBUS_MESSAGE_SINT;
+
 	/*
 	 * 'enum vmbus_channel_message_type' is supposed to always be 'u32' as
 	 * it is being used in 'struct vmbus_channel_message_header' definition
@@ -1195,6 +1196,14 @@ void vmbus_on_msg_dpc(unsigned long data)
 	vmbus_signal_eom(msg, message_type);
 }
 
+void vmbus_on_msg_dpc(unsigned long data)
+{
+	struct hv_per_cpu_context *hv_cpu = (void *)data;
+
+	__vmbus_on_msg_dpc(hv_cpu->hyp_synic_message_page);
+	__vmbus_on_msg_dpc(hv_cpu->para_synic_message_page);
+}
+
 #ifdef CONFIG_PM_SLEEP
 /*
  * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
@@ -1233,21 +1242,19 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 #endif /* CONFIG_PM_SLEEP */
 
 /*
- * Schedule all channels with events pending
+ * Schedule all channels with events pending.
+ * The event page can be directly checked to get the id of
+ * the channel that has the interrupt pending.
  */
-static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
+static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu, void *event_page_addr)
 {
 	unsigned long *recv_int_page;
 	u32 maxbits, relid;
+	union hv_synic_event_flags *event;
 
-	/*
-	 * The event page can be directly checked to get the id of
-	 * the channel that has the interrupt pending.
-	 */
-	void *page_addr = hv_cpu->hyp_synic_event_page;
-	union hv_synic_event_flags *event
-		= (union hv_synic_event_flags *)page_addr +
-					 VMBUS_MESSAGE_SINT;
+	if (!event_page_addr)
+		return;
+	event = (union hv_synic_event_flags *)event_page_addr + VMBUS_MESSAGE_SINT;
 
 	maxbits = HV_EVENT_FLAGS_COUNT;
 	recv_int_page = event->flags;
@@ -1318,26 +1325,40 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 	}
 }
 
-static void vmbus_isr(void)
+static void vmbus_message_sched(struct hv_per_cpu_context *hv_cpu, void *message_page_addr)
 {
-	struct hv_per_cpu_context *hv_cpu
-		= this_cpu_ptr(hv_context.cpu_context);
-	void *page_addr;
 	struct hv_message *msg;
 
-	vmbus_chan_sched(hv_cpu);
-
-	page_addr = hv_cpu->hyp_synic_message_page;
-	msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
+	if (!message_page_addr)
+		return;
+	msg = (struct hv_message *)message_page_addr + VMBUS_MESSAGE_SINT;
 
 	/* Check if there are actual msgs to be processed */
 	if (msg->header.message_type != HVMSG_NONE) {
 		if (msg->header.message_type == HVMSG_TIMER_EXPIRED) {
 			hv_stimer0_isr();
 			vmbus_signal_eom(msg, HVMSG_TIMER_EXPIRED);
-		} else
+		} else {
 			tasklet_schedule(&hv_cpu->msg_dpc);
+		}
 	}
+}
+
+static void vmbus_isr(void)
+{
+	struct hv_per_cpu_context *hv_cpu
+		= this_cpu_ptr(hv_context.cpu_context);
+
+	/*
+	 * Suggested-by: Michael Kelley <mhklinux@outlook.com>
+	 * One possible optimization would be to keep track of the largest relID that's in use,
+	 * and only scan up to that relID.
+	 */
+	vmbus_chan_sched(hv_cpu, hv_cpu->hyp_synic_event_page);
+	vmbus_chan_sched(hv_cpu, hv_cpu->para_synic_event_page);
+
+	vmbus_message_sched(hv_cpu, hv_cpu->hyp_synic_message_page);
+	vmbus_message_sched(hv_cpu, hv_cpu->para_synic_message_page);
 
 	add_interrupt_randomness(vmbus_interrupt);
 }
@@ -1355,6 +1376,60 @@ static void vmbus_percpu_work(struct work_struct *work)
 	hv_synic_init(cpu);
 }
 
+static int vmbus_alloc_synic_and_connect(void)
+{
+	int ret, cpu;
+	struct work_struct __percpu *works;
+	int hyperv_cpuhp_online;
+
+	ret = hv_synic_alloc();
+	if (ret < 0)
+		goto err_alloc;
+
+
+	works = alloc_percpu(struct work_struct);
+	if (!works) {
+		ret = -ENOMEM;
+		goto err_alloc;
+	}
+
+	/*
+	 * Initialize the per-cpu interrupt state and stimer state.
+	 * Then connect to the host.
+	 */
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct work_struct *work = per_cpu_ptr(works, cpu);
+
+		INIT_WORK(work, vmbus_percpu_work);
+		schedule_work_on(cpu, work);
+	}
+
+	for_each_online_cpu(cpu)
+		flush_work(per_cpu_ptr(works, cpu));
+
+	/* Register the callbacks for possible CPU online/offline'ing */
+	ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
+						   hv_synic_init, hv_synic_cleanup);
+	cpus_read_unlock();
+	free_percpu(works);
+	if (ret < 0)
+		goto err_alloc;
+	hyperv_cpuhp_online = ret;
+
+	ret = vmbus_connect();
+	if (ret)
+		goto err_connect;
+	return 0;
+
+err_connect:
+	cpuhp_remove_state(hyperv_cpuhp_online);
+	return -ENODEV;
+err_alloc:
+	hv_synic_free();
+	return -ENOMEM;
+}
+
 /*
  * vmbus_bus_init -Main vmbus driver initialization routine.
  *
@@ -1365,8 +1440,7 @@ static void vmbus_percpu_work(struct work_struct *work)
  */
 static int vmbus_bus_init(void)
 {
-	int ret, cpu;
-	struct work_struct __percpu *works;
+	int ret;
 
 	ret = hv_init();
 	if (ret != 0) {
@@ -1401,41 +1475,21 @@ static int vmbus_bus_init(void)
 		}
 	}
 
-	ret = hv_synic_alloc();
-	if (ret)
-		goto err_alloc;
-
-	works = alloc_percpu(struct work_struct);
-	if (!works) {
-		ret = -ENOMEM;
-		goto err_alloc;
-	}
-
 	/*
-	 * Initialize the per-cpu interrupt state and stimer state.
-	 * Then connect to the host.
+	 * Attempt to establish the confidential VMBus connection first if this VM is
+	 * a hardware confidential VM, and the paravisor is present.
 	 */
-	cpus_read_lock();
-	for_each_online_cpu(cpu) {
-		struct work_struct *work = per_cpu_ptr(works, cpu);
+	ret = -ENODEV;
+	if (ms_hyperv.paravisor_present && (hv_isolation_type_tdx() || hv_isolation_type_snp())) {
+		is_confidential = true;
+		ret = vmbus_alloc_synic_and_connect();
+		is_confidential = !ret;
 
-		INIT_WORK(work, vmbus_percpu_work);
-		schedule_work_on(cpu, work);
+		pr_info("VMBus is confidential: %d\n", is_confidential);
 	}
 
-	for_each_online_cpu(cpu)
-		flush_work(per_cpu_ptr(works, cpu));
-
-	/* Register the callbacks for possible CPU online/offline'ing */
-	ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
-						   hv_synic_init, hv_synic_cleanup);
-	cpus_read_unlock();
-	free_percpu(works);
-	if (ret < 0)
-		goto err_alloc;
-	hyperv_cpuhp_online = ret;
-
-	ret = vmbus_connect();
+	if (!is_confidential)
+		ret = vmbus_alloc_synic_and_connect();
 	if (ret)
 		goto err_connect;
 
@@ -1451,9 +1505,6 @@ static int vmbus_bus_init(void)
 	return 0;
 
 err_connect:
-	cpuhp_remove_state(hyperv_cpuhp_online);
-err_alloc:
-	hv_synic_free();
 	if (vmbus_irq == -1) {
 		hv_remove_vmbus_handler();
 	} else {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH hyperv-next v3 15/15] Drivers: hv: Set the default VMBus version to 6.0
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (13 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 14/15] Drivers: hv: Support establishing the confidential VMBus connection Roman Kisel
@ 2025-06-04  0:43 ` Roman Kisel
  2025-06-18 16:13 ` [PATCH hyperv-next v3 00/15] Confidential VMBus Michael Kelley
  15 siblings, 0 replies; 36+ messages in thread
From: Roman Kisel @ 2025-06-04  0:43 UTC (permalink / raw)
  To: alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui, haiyangz,
	hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

The confidential VMBus is supported by the protocol version
6.0 onwards.

Attempt to establish the VMBus 6.0 connection thus enabling
the confidential VMBus features when available.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/connection.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index eeb472019d69..7cd43463f969 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -51,6 +51,7 @@ EXPORT_SYMBOL_GPL(vmbus_proto_version);
  * Linux guests and are not listed.
  */
 static __u32 vmbus_versions[] = {
+	VERSION_WIN10_V6_0,
 	VERSION_WIN10_V5_3,
 	VERSION_WIN10_V5_2,
 	VERSION_WIN10_V5_1,
@@ -65,7 +66,7 @@ static __u32 vmbus_versions[] = {
  * Maximal VMBus protocol version guests can negotiate.  Useful to cap the
  * VMBus version for testing and debugging purpose.
  */
-static uint max_version = VERSION_WIN10_V5_3;
+static uint max_version = VERSION_WIN10_V6_0;
 
 module_param(max_version, uint, S_IRUGO);
 MODULE_PARM_DESC(max_version,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH hyperv-next v3 01/15] Documentation: hyperv: Confidential VMBus
  2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
@ 2025-06-04  2:56   ` Randy Dunlap
  2025-06-04 12:53   ` Jonathan Corbet
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Randy Dunlap @ 2025-06-04  2:56 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari, arnd, bp, corbet, dave.hansen, decui,
	haiyangz, hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch,
	linux-doc, linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

Hi--

On 6/3/25 5:43 PM, Roman Kisel wrote:
> Define what the confidential VMBus is and describe what advantages
> it offers on the capable hardware.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
> ---
>  Documentation/virt/hyperv/coco.rst | 125 ++++++++++++++++++++++++++++-
>  1 file changed, 124 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/hyperv/coco.rst b/Documentation/virt/hyperv/coco.rst
> index c15d6fe34b4e..b4904b64219d 100644
> --- a/Documentation/virt/hyperv/coco.rst
> +++ b/Documentation/virt/hyperv/coco.rst
> @@ -178,7 +178,7 @@ These Hyper-V and VMBus memory pages are marked as decrypted:
>  
>  * VMBus monitor pages
>  
> -* Synthetic interrupt controller (synic) related pages (unless supplied by
> +* Synthetic interrupt controller (SynIC) related pages (unless supplied by
>    the paravisor)
>  
>  * Per-cpu hypercall input and output pages (unless running with a paravisor)
> @@ -258,3 +258,126 @@ normal page fault is generated instead of #VC or #VE, and the page-fault-
>  based handlers for load_unaligned_zeropad() fixup the reference. When the
>  encrypted/decrypted transition is complete, the pages are marked as "present"
>  again. See hv_vtom_clear_present() and hv_vtom_set_host_visibility().
> +
> +Confidential VMBus
> +------------------
> +
> +The confidential VMBus enables the confidential guest not to interact with the
> +untrusted host partition and the untrusted hypervisor. Instead, the guest relies
> +on the trusted paravisor to communicate with the devices processing sensitive
> +data. The hardware (SNP or TDX) encrypts the guest memory and the register state
> +while measuring the paravisor image using the platform security processor to
> +ensure trusted and confidential computing.
> +
> +Confidential VMBus provides a secure communication channel between the guest and
> +the paravisor, ensuring that sensitive data is protected from  hypervisor-level
                                                                ^^
                                                              s/  / /

> +access through memory encryption and register state isolation.
> +
> +The unencrypted data never leaves the VM so neither the host partition nor the
> +hypervisor can access it at all. In addition to that, the guest only needs to
> +establish a VMBus connection with the paravisor for the channels that process
> +sensitive data, and the paravisor abstracts the details of communicating with
> +the specific devices away.
> +
> +Confidential VMBus is an extension of Confidential Computing (CoCo) VMs
> +(a.k.a. "Isolated" VMs in Hyper-V terminology). Without Confidential VMBus,
> +guest VMBus device drivers (the "VSC"s in VMBus terminology) communicate
> +with VMBus servers (the VSPs) running on the Hyper-V host. The
> +communication must be through memory that has been decrypted so the
> +host can access it. With Confidential VMBus, one or more of the VSPs reside
> +in the trusted paravisor layer in the guest VM. Since the paravisor layer also
> +operates in encrypted memory, the memory used for communication with
> +such VSPs does not need to be decrypted and thereby exposed to the
> +Hyper-V host. The paravisor is responsible for communicating securely
> +with the Hyper-V host as necessary. In some cases (e.g. time synchonization,
> +key-value pairs exchange) the unencrypted data doesn't need to be communicated
> +with the host at all, and a conventional VMBus connection suffices.
> +
> +Here is the data flow for a conventional VMBus connection and the Confidential
> +VMBus connection (C stands for the client or VSC, S for the server or VSP):
> +
> ++---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> +|               |       |                 ==========                |
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> ++----- C -------+       +-----------------+        +------- S ------+
> +       ||                                                   ||
> +       ||                                                   ||
> ++------||------------------ VMBus --------------------------||------+
> +|                     Interrupts, MMIO                              |
> ++-------------------------------------------------------------------+
> +
> ++---- GUEST --------------- VTL0 ------+               +-- DEVICE --+
> +|                                      |               |            |
> +| +- PARAVISOR --------- VTL2 -----+   |               |            |
> +| |     +-- VMBus Relay ------+    ====+================            |
> +| |     |   Interrupts, MMIO  |    |   |               |            |
> +| |     +-------- S ----------+    |   |               +------------+
> +| |               ||               |   |
> +| +---------+     ||               |   |
> +| |  Linux  |     ||    OpenHCL    |   |
> +| |  kernel |     ||               |   |
> +| +---- C --+-----||---------------+   |
> +|       ||        ||                   |
> ++-------++------- C -------------------+               +------------+
> +        ||                                             |    HOST    |
> +        ||                                             +---- S -----+
> ++-------||----------------- VMBus ---------------------------||-----+
> +|                     Interrupts, MMIO                              |
> ++-------------------------------------------------------------------+
> +
> +An implementation of the VMBus relay that offers the Confidential VMBus channels
> +is available in the OpenVMM project as a part of the OpenHCL paravisor. Please
> +refer to https://openvmm.dev/ and https://github.com/microsoft/openvmm for more
> +information about the OpenHCL paravisor.
> +
> +A guest that is running with a paravisor must determine at runtime if
> +Confidential VMBus is supported by the current paravisor. It does so by first
> +trying to establish a Confidential VMBus connection with the paravisor using
> +standard mechanisms where the memory remains encrypted. If this succeeds,
> +then the guest can proceed to use Confidential VMBus. If it fails, then the
> +guest must fallback to establishing a non-Confidential VMBus connection with
> +the Hyper-V host.
> +
> +Confidential VMBus is a characteristic of the VMBus connection as a whole,
> +and of each VMBus channel that is created. When a Confidential VMBus
> +connection is established, the paravisor provides the guest the message-passing
> +path that is used for VMBus device creation and deletion, and it provides a
> +per-CPU synthetic interrupt controller (SynIC) just like the SynIC that is
> +offered by the Hyper-V host. Each VMBus device that is offered to the guest
> +indicates the degree to which it participates in Confidential VMBus. The offer
> +indicates if the device uses encrypted ring buffers, and if the device uses
> +encrypted memory for DMA that is done outside the ring buffer. These settings
> +may be different for different devices using the same Confidential VMBus
> +connection.

Various lines throughout:
Please try to keep line lengths to <= 80 characters foe people who read
documentation in a terminal window.

> +Although these settings are separate, in practice it'll always be encrypted
> +ring buffer only or both encrypted ring buffer and external data. If a channel
> +is offered by the paravisor with confidential VMBus, the ring buffer can always
> +be encrypted since it's strictly for communication between the VTL2 paravisor
> +and the VTL0 guest. However, other memory regions are often used for e.g. DMA,
> +so they need to be accessible by the underlying hardware, and must be unencrypted
> +(unless the device supports encrypted memory). Currently, there are no any VSPs

                                                                       not

> +in OpenHCL that support encrypted external memory, but we will use it in the
> +future.
> +
> +Because some devices on a Confidential VMBus may require decrypted ring buffers
> +and DMA transfers, the guest must interact with two SynICs -- the one provided
> +by the paravisor and the one provided by the Hyper-V host when Confidential
> +VMBus is not offered. Interrupts are always signaled by the paravisor SynIC, but
> +the guest must check for messages and for channel interrupts on both SynICs.
> +
> +In the case of a confidential VM, regular SynIC access by the guest is
> +intercepted by the paravisor (this includes various MSRs such as the SIMP and
> +SIEFP, as well as hypercalls like HvPostMessage and HvSignalEvent). If the guest
> +actually wants to communicate with the hypervisor, it has to use special mechanisms
> +(GHCB page on SNP, or tdcall on TDX). Messages will always be one or the other:
> +with confidential VMBus, all messages use the paravisor SynIC, otherwise they all
> +use the hypervisor SynIC. For interrupt signaling, though, some channels may be
> +running on the host (non-confidential, using the VMBus relay) and use the hypervisor
> +SynIC, and some on the paravisor and use its SynIC. The RelIDs are coordinated by
> +the OpenHCL VMBus server and are guaranteed to be unique regardless of whether
> +the channel originated on the host or the paravisor.

-- 
~Randy


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH hyperv-next v3 01/15] Documentation: hyperv: Confidential VMBus
  2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
  2025-06-04  2:56   ` Randy Dunlap
@ 2025-06-04 12:53   ` Jonathan Corbet
  2025-06-04 13:20   ` ALOK TIWARI
  2025-06-18 16:17   ` Michael Kelley
  3 siblings, 0 replies; 36+ messages in thread
From: Jonathan Corbet @ 2025-06-04 12:53 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari, arnd, bp, dave.hansen, decui,
	haiyangz, hpa, kys, mingo, mhklinux, tglx, wei.liu, linux-arch,
	linux-doc, linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut

Roman Kisel <romank@linux.microsoft.com> writes:

> Define what the confidential VMBus is and describe what advantages
> it offers on the capable hardware.
>
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
> ---
>  Documentation/virt/hyperv/coco.rst | 125 ++++++++++++++++++++++++++++-
>  1 file changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virt/hyperv/coco.rst b/Documentation/virt/hyperv/coco.rst
> index c15d6fe34b4e..b4904b64219d 100644
> --- a/Documentation/virt/hyperv/coco.rst
> +++ b/Documentation/virt/hyperv/coco.rst

[...]

> +
> +Here is the data flow for a conventional VMBus connection and the Confidential
> +VMBus connection (C stands for the client or VSC, S for the server or VSP):
> +
> ++---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> +|               |       |                 ==========                |
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> ++----- C -------+       +-----------------+        +------- S ------+

This will generate all kinds of errors during the docs build and will
not give the results you are looking for - please actually build the
docs after making a change like this.

You need to put the diagram into a literal block.

Thanks,

jon

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH hyperv-next v3 01/15] Documentation: hyperv: Confidential VMBus
  2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
  2025-06-04  2:56   ` Randy Dunlap
  2025-06-04 12:53   ` Jonathan Corbet
@ 2025-06-04 13:20   ` ALOK TIWARI
  2025-06-18 16:17   ` Michael Kelley
  3 siblings, 0 replies; 36+ messages in thread
From: ALOK TIWARI @ 2025-06-04 13:20 UTC (permalink / raw)
  To: Roman Kisel, arnd, bp, corbet, dave.hansen, decui, haiyangz, hpa,
	kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut



On 04-06-2025 06:13, Roman Kisel wrote:
> +Confidential VMBus is an extension of Confidential Computing (CoCo) VMs
> +(a.k.a. "Isolated" VMs in Hyper-V terminology). Without Confidential VMBus,
> +guest VMBus device drivers (the "VSC"s in VMBus terminology) communicate
> +with VMBus servers (the VSPs) running on the Hyper-V host. The
> +communication must be through memory that has been decrypted so the
> +host can access it. With Confidential VMBus, one or more of the VSPs reside
> +in the trusted paravisor layer in the guest VM. Since the paravisor layer also
> +operates in encrypted memory, the memory used for communication with
> +such VSPs does not need to be decrypted and thereby exposed to the
> +Hyper-V host. The paravisor is responsible for communicating securely
> +with the Hyper-V host as necessary. In some cases (e.g. time synchonization,

Typo synchonization -> synchronization

> +key-value pairs exchange) the unencrypted data doesn't need to be communicated
> +with the host at all, and a conventional VMBus connection suffices.



Thanks,
Alok

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor
  2025-06-04  0:43 ` [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor Roman Kisel
@ 2025-06-04 13:27   ` ALOK TIWARI
  2025-06-18 16:17   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: ALOK TIWARI @ 2025-06-04 13:27 UTC (permalink / raw)
  To: Roman Kisel, arnd, bp, corbet, dave.hansen, decui, haiyangz, hpa,
	kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut



On 04-06-2025 06:13, Roman Kisel wrote:
> The confidential VMBus is built on the guest talking to the
> paravisor only.
> 
> Provide functions that allow manipulating the SynIC registers
> via paravisor. No functional changes.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
> ---
>   arch/x86/kernel/cpu/mshyperv.c | 44 ++++++++++++++++++++++++++++++++++
>   drivers/hv/hv_common.c         | 13 ++++++++++
>   include/asm-generic/mshyperv.h |  2 ++
>   3 files changed, 59 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 3e2533954675..83a85d94bcb3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -89,6 +89,50 @@ void hv_set_non_nested_msr(unsigned int reg, u64 value)
>   }
>   EXPORT_SYMBOL_GPL(hv_set_non_nested_msr);
>   
> +/*
> + * Attempt to get the SynIC register value from the paravisor.
> + *
> + * Not all paravisors support reading SynIC registers, so this function
> + * may fail. The register for the SynIC of the running CPU is accessed.
> + *
> + * Writes the SynIC register value into the memory pointed by val,
> + * and ~0ULL is on failure.
> + *
> + * Returns -ENODEV if the MSR is not a SynIC register, or another error
> + * code if getting the MSR fails (meaning the paravisor doesn't support
> + * relaying VMBus comminucations).

comminucations -> communications

> + */
> +int hv_para_get_synic_register(unsigned int reg, u64 *val)
> +{
> +	u64 reg_val = ~0ULL;
> +	int err = -ENODEV;
> +
> +	if (hv_is_synic_msr(reg))
> +		reg_val = native_read_msr_safe(reg, &err);
> +	*val = reg_val;
> +
> +	return err;
> +}
> +
> +/*
> + * Attempt to set the SynIC register value with the paravisor.
> + *
> + * Not all paravisors support reading SynIC registers, so this function
> + * may fail. The register for the SynIC of the running CPU is accessed.

you are writing, not reading. ?
-> "Not all paravisors support writing SynIC registers"

> + *
> + * Sets the register to the value supplied.
> + *
> + * Returns: -ENODEV if the MSR is not a SynIC register, or another error
> + * code if writing to the MSR fails (meaning the paravisor doesn't support
> + * relaying VMBus comminucations).
> + */
> +int hv_para_set_synic_register(unsigned int reg, u64 val)
> +{
> +	if (!hv_is_synic_msr(reg))
> +		return -ENODEV;
> +	return wrmsrl_safe(reg, val);
> +}


Thanks,
Alok

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs
  2025-06-04  0:43 ` [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs Roman Kisel
@ 2025-06-04 13:38   ` ALOK TIWARI
  2025-06-18 16:18   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: ALOK TIWARI @ 2025-06-04 13:38 UTC (permalink / raw)
  To: Roman Kisel, arnd, bp, corbet, dave.hansen, decui, haiyangz, hpa,
	kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut



On 04-06-2025 06:13, Roman Kisel wrote:
> +			/*
> +			 * Write proxy bit in the case of non-confidential VMBus.
> +			 * Using wrmsl instruction so the following goes to the paravisor.

wrmsl -> wrmsrl

> +			 */
> +			u32 proxy = vmbus_is_confidential() ? 0 : 1;
> +
> +			value |= (proxy << 20);
> +			native_wrmsrl(reg, value);
> +		}
>   	} else {


Thanks,
Alok

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available
  2025-06-04  0:43 ` [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available Roman Kisel
@ 2025-06-04 13:48   ` ALOK TIWARI
  2025-06-18 16:18   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: ALOK TIWARI @ 2025-06-04 13:48 UTC (permalink / raw)
  To: Roman Kisel, arnd, bp, corbet, dave.hansen, decui, haiyangz, hpa,
	kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut



On 04-06-2025 06:13, Roman Kisel wrote:
> -	if (ms_hyperv.paravisor_present) {
> +	if (ms_hyperv.paravisor_present && !vmbus_is_confidential()) {

it is, not vmbus_is_confidential() ?

>   		if (hv_isolation_type_tdx())
>   			status = hv_tdx_hypercall(HVCALL_POST_MESSAGE,
>   						  virt_to_phys(aligned_msg), 0);


Thanks,
Alok

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels
  2025-06-04  0:43 ` [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels Roman Kisel
@ 2025-06-04 14:15   ` ALOK TIWARI
  2025-06-18 16:19   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: ALOK TIWARI @ 2025-06-04 14:15 UTC (permalink / raw)
  To: Roman Kisel, arnd, bp, corbet, dave.hansen, decui, haiyangz, hpa,
	kys, mingo, mhklinux, tglx, wei.liu, linux-arch, linux-doc,
	linux-hyperv, linux-kernel, x86
  Cc: apais, benhill, bperkins, sunilmut



On 04-06-2025 06:13, Roman Kisel wrote:
> To run a confidential VMBus channels, one has to initialize the
> co_ring_buffers and co_external_memory fields of the channel
> structure.
> 
> Advertise support upon negoatiating the version and compute
> values for those fields and initialize them.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>   drivers/hv/channel_mgmt.c | 19 +++++++++++++++++++
>   drivers/hv/connection.c   |  3 +++
>   2 files changed, 22 insertions(+)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index ca2fe10c110a..33bc29e826bd 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -1021,6 +1021,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>   	struct vmbus_channel_offer_channel *offer;
>   	struct vmbus_channel *oldchannel, *newchannel;
>   	size_t offer_sz;
> +	bool co_ring_buffer, co_external_memory;
>   
>   	offer = (struct vmbus_channel_offer_channel *)hdr;
>   
> @@ -1033,6 +1034,22 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>   		return;
>   	}
>   
> +	co_ring_buffer = is_co_ring_buffer(offer);
> +	if (co_ring_buffer) {
> +		if (vmbus_proto_version < VERSION_WIN10_V6_0 || !vmbus_is_confidential()) {
> +			atomic_dec(&vmbus_connection.offer_in_progress);
> +			return;
> +		}
> +	}
> +
> +	co_external_memory = is_co_external_memory(offer);
> +	if (is_co_external_memory(offer)) {

  Redundant call for is_co_external_memory()
  if(co_external_memory)

> +		if (vmbus_proto_version < VERSION_WIN10_V6_0 || !vmbus_is_confidential()) {
> +			atomic_dec(&vmbus_connection.offer_in_progress);
> +			return;
> +		}
> +	}
> +
>   	oldchannel = find_primary_channel_by_offer(offer);
>   
>   	if (oldchannel != NULL) {
> @@ -1111,6 +1128,8 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>   		pr_err("Unable to allocate channel object\n");
>   		return;
>   	}
> +	newchannel->co_ring_buffer = co_ring_buffer;
> +	newchannel->co_external_memory = co_external_memory;
>   
>   	vmbus_setup_channel_state(newchannel, offer);
>   
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index be490c598785..eeb472019d69 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -105,6 +105,9 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>   		vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
>   	}
>   
> +	if (vmbus_is_confidential() && version >= VERSION_WIN10_V6_0)
> +		msg->feature_flags = VMBUS_FEATURE_FLAG_CONFIDENTIAL_CHANNELS;
> +
>   	/*
>   	 * shared_gpa_boundary is zero in non-SNP VMs, so it's safe to always
>   	 * bitwise OR it


Thanks,
Alok

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 00/15] Confidential VMBus
  2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
                   ` (14 preceding siblings ...)
  2025-06-04  0:43 ` [PATCH hyperv-next v3 15/15] Drivers: hv: Set the default VMBus version to 6.0 Roman Kisel
@ 2025-06-18 16:13 ` Michael Kelley
  15 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:13 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:43 PM
> 
> The guests running on Hyper-V can be confidential where the memory and the
> register content are encrypted, provided that the hardware supports that
> (currently support AMD SEV-SNP and Intel TDX is implemented) and the guest
> is capable of using these features. The confidential guests cannot be
> introspected by the host nor the hypervisor without the guest sharing the
> memory contents upon doing which the memory is decrypted.
> 
> In the confidential guests, neither the host nor the hypervisor need to be
> trusted, and the guests processing sensitive data can take advantage of that.
> 
> Not trusting the host and the hypervisor (removing them from the Trusted
> Computing Base aka TCB) ncessitates that the method of communication
> between the host and the guest be changed. Below there is the breakdown of
> the options used in the both cases (in the diagrams below the server is
> marked as S, the client is marked as C):
> 
> 1. Without the paravisoor the devices are connected to the host, and the
> host provides the device emulation or translation to the guest:
> 
> +---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
> |               |       |                 |        |                |
> |               |       |                 |        |                |
> |               |       |                 ==========                |
> |               |       |                 |        |                |
> |               |       |                 |        |                |
> |               |       |                 |        |                |
> +----- C -------+       +-----------------+        +------- S ------+
>        ||                                                   ||
>        ||                                                   ||
> +------||------------------ VMBus --------------------------||------+
> |                     Interrupts, MMIO                              |
> +-------------------------------------------------------------------+
> 
> 2. With the paravisor, the devices are connected to the paravisor, and
> the paravisor provides the device emulation or translation to the guest.
> The guest doesn't communicate with the host directly, and the guest
> communicates with the paravisor via the VMBus. The host is not trusted
> in this model, and the paravisor is trusted:
> 
> +---- GUEST --------------- VTL0 ------+               +-- DEVICE --+
> |                                      |               |            |
> | +- PARAVISOR --------- VTL2 -----+   |               |            |
> | |     +-- VMBus Relay ------+    ====+================            |
> | |     |   Interrupts, MMIO  |    |   |               |            |
> | |     +-------- S ----------+    |   |               +------------+
> | |               ||               |   |
> | +---------+     ||               |   |
> | |  Linux  |     ||    OpenHCL    |   |
> | |  kernel |     ||               |   |
> | +---- C --+-----||---------------+   |
> |       ||        ||                   |
> +-------++------- C -------------------+               +------------+
>         ||                                             |    HOST    |
>         ||                                             +---- S -----+
> +-------||----------------- VMBus ---------------------------||-----+
> |                     Interrupts, MMIO                              |
> +-------------------------------------------------------------------+
> 
> Note that in the second case the guest doesn't need to share the memory
> with the host as it communicates only with the paravisor within their
> partition boundary. That is precisely the raison d'etre and the value
> proposition of this patch series: equip the confidential guest to use
> private (encrypted) memory and rely on the paravisor when this is
> available to be more secure.

I still have a fairly fundamental question about the value proposition.
The paravisor runs as part of the VM context and so is outside the host.
It should not trust the host any more than the guest does. For operations
that require the host, such as doing disk I/O, does the paravisor provide
any security benefit over the guest just talking directly to the host? Relying
on the paravisor is great, but doesn't that just push the problem from the
guest to the paravisor, which is no better equipped to solve it than the guest?
That's what I'm trying to get clear on.

> 
> An implementation of the VMBus relay that offers the Confidential VMBus channels
> is available in the OpenVMM project as a part of the OpenHCL paravisor. Please
> refer to https://openvmm.dev/guide/ for more
> information about the OpenHCL paravisor.
> 
> I'd like to thank the following people for their help with this
> patch series:
> 
> * Dexuan for help with validation and the fruitful discussions,
> * Easwar for reviewing the refactoring of the page allocating and
>   freeing in `hv.c`,
> * John and Sven for the design,
> * Mike for helping to avoid pitfalls when dealing with the GFP flags,
> * Sven for blazing the trail and implementing the design in few
>   codebases.
> 
> I made sure to validate the patch series on
> 
>     {TrustedLaunch(x86_64), OpenHCL} x
>     {SNP(x86_64), TDX(x86_64), No hardware isolation, No paravisor} x
>     {VMBus 5.0, VMBus 6.0} x
>     {arm64, x86_64}.
> 
> [V3]
>     - The patch series is rebased on top of the latest hyperv-next branch.
>     - Reworked the "wiring" diagram in the cover letter, added links to the
>       OpenVMM project and the OpenHCL paravisor.
> 
>     - More precise wording in the comments and clearer code.
>     **Thank you, Alok!**
> 
>     - Reworked the documentation patch.
>     - Split the patchset into much more granular patches.
>     - Various fixes and improvements throughout the patch series.
>     **Thank you, Michael!**

I have fully reviewed v3 and am sending comments on most of the
individual patches. The split into more granular patches works really
well and was a big help in doing the review. I did not have comments
on a couple of the patches, but I have not given my Reviewed-by yet,
pending resolution of the other issues I've pointed out.

Michael

> 
> [V2] https://lore.kernel.org/linux-hyperv/20250511230758.160674-1-romank@linux.microsoft.com/
>     - The patch series is rebased on top of the latest hyperv-next branch.
> 
>     - Better wording in the commit messages and the Documentation.
>     **Thank you, Alok and Wei!**
> 
>     - Removed the patches 5 and 6 concerning turning bounce buffering off from
>       the previous version of the patch series as they were found to be
>       architecturally unsound. The value proposition of the patch series is not
>       diminished by this removal: these patches were an optimization and only for
>       the storage (for the simplicity sake) but not for the network. These changes
>       might be proposed in the future again after revolving the issues.
>     ** Thanks you, Christoph, Dexuan, Dan, Michael, James, Robin! **
> 
> [V1] https://lore.kernel.org/linux-hyperv/20250409000835.285105-1-romank@linux.microsoft.com/
> 
> Roman Kisel (15):
>   Documentation: hyperv: Confidential VMBus
>   drivers: hv: VMBus protocol version 6.0
>   arch: hyperv: Get/set SynIC synth.registers via paravisor
>   arch/x86: mshyperv: Trap on access for some synthetic MSRs
>   Drivers: hv: Rename fields for SynIC message and event pages
>   Drivers: hv: Allocate the paravisor SynIC pages when required
>   Drivers: hv: Post messages via the confidential VMBus if available
>   Drivers: hv: remove stale comment
>   Drivers: hv: Use memunmap() to check if the address is in IO map
>   Drivers: hv: Rename the SynIC enable and disable routines
>   Drivers: hv: Functions for setting up and tearing down the paravisor
>     SynIC
>   Drivers: hv: Allocate encrypted buffers when requested
>   Drivers: hv: Support confidential VMBus channels
>   Drivers: hv: Support establishing the confidential VMBus connection
>   Drivers: hv: Set the default VMBus version to 6.0
> 
>  Documentation/virt/hyperv/coco.rst | 125 ++++++++-
>  arch/x86/kernel/cpu/mshyperv.c     |  67 ++++-
>  drivers/hv/channel.c               |  43 ++--
>  drivers/hv/channel_mgmt.c          |  27 +-
>  drivers/hv/connection.c            |   6 +-
>  drivers/hv/hv.c                    | 399 ++++++++++++++++++++---------
>  drivers/hv/hv_common.c             |  13 +
>  drivers/hv/hyperv_vmbus.h          |  28 +-
>  drivers/hv/mshv_root.h             |   2 +-
>  drivers/hv/mshv_synic.c            |   6 +-
>  drivers/hv/ring_buffer.c           |   5 +-
>  drivers/hv/vmbus_drv.c             | 187 +++++++++-----
>  include/asm-generic/mshyperv.h     |   3 +
>  include/linux/hyperv.h             |  69 +++--
>  14 files changed, 740 insertions(+), 240 deletions(-)
> 
> 
> base-commit: 96959283a58d91ae20d025546f00e16f0a555208
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 01/15] Documentation: hyperv: Confidential VMBus
  2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
                     ` (2 preceding siblings ...)
  2025-06-04 13:20   ` ALOK TIWARI
@ 2025-06-18 16:17   ` Michael Kelley
  3 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:17 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:43 PM
> 
> Define what the confidential VMBus is and describe what advantages
> it offers on the capable hardware.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
> ---
>  Documentation/virt/hyperv/coco.rst | 125 ++++++++++++++++++++++++++++-
>  1 file changed, 124 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/hyperv/coco.rst b/Documentation/virt/hyperv/coco.rst
> index c15d6fe34b4e..b4904b64219d 100644
> --- a/Documentation/virt/hyperv/coco.rst
> +++ b/Documentation/virt/hyperv/coco.rst
> @@ -178,7 +178,7 @@ These Hyper-V and VMBus memory pages are marked as
> decrypted:
> 
>  * VMBus monitor pages
> 
> -* Synthetic interrupt controller (synic) related pages (unless supplied by
> +* Synthetic interrupt controller (SynIC) related pages (unless supplied by
>    the paravisor)
> 
>  * Per-cpu hypercall input and output pages (unless running with a paravisor)
> @@ -258,3 +258,126 @@ normal page fault is generated instead of #VC or #VE, and the page-fault-
>  based handlers for load_unaligned_zeropad() fixup the reference. When the
>  encrypted/decrypted transition is complete, the pages are marked as "present"
>  again. See hv_vtom_clear_present() and hv_vtom_set_host_visibility().
> +
> +Confidential VMBus
> +------------------

Maybe put this section immediately after the "Guest communication with
Hyper-V" section, since it is an extension of that topic. Then leave the
load_unaligned_zeropad() section as last.

> +
> +The confidential VMBus enables the confidential guest not to interact with the
> +untrusted host partition and the untrusted hypervisor. Instead, the guest relies
> +on the trusted paravisor to communicate with the devices processing sensitive
> +data. The hardware (SNP or TDX) encrypts the guest memory and the register state
> +while measuring the paravisor image using the platform security processor to
> +ensure trusted and confidential computing.
> +
> +Confidential VMBus provides a secure communication channel between the guest and
> +the paravisor, ensuring that sensitive data is protected from  hypervisor-level
> +access through memory encryption and register state isolation.
> +
> +The unencrypted data never leaves the VM 

Is this statement really true? The VM contains both the guest and the paravisor,
so it's also a statement about the paravisor. I don't know what the paravisor
does for channels it is just proxying. Presumably it must communicate the
unencrypted data to the host.

> +so neither the host partition nor the
> +hypervisor can access it at all. In addition to that, the guest only needs to
> +establish a VMBus connection with the paravisor for the channels that process
> +sensitive data, and the paravisor abstracts the details of communicating with
> +the specific devices away.
> +
> +Confidential VMBus is an extension of Confidential Computing (CoCo) VMs
> +(a.k.a. "Isolated" VMs in Hyper-V terminology). Without Confidential VMBus,
> +guest VMBus device drivers (the "VSC"s in VMBus terminology) communicate
> +with VMBus servers (the VSPs) running on the Hyper-V host. The
> +communication must be through memory that has been decrypted so the
> +host can access it. With Confidential VMBus, one or more of the VSPs reside
> +in the trusted paravisor layer in the guest VM. Since the paravisor layer also
> +operates in encrypted memory, the memory used for communication with
> +such VSPs does not need to be decrypted and thereby exposed to the
> +Hyper-V host. The paravisor is responsible for communicating securely
> +with the Hyper-V host as necessary. In some cases (e.g. time synchonization,
> +key-value pairs exchange

I'm not sure about key-value pairs exchange. I thought the whole point of
KVP is to communicate information between the guest and the host.

> +) the unencrypted data doesn't need to be communicated
> +with the host at all, and a conventional VMBus connection suffices.
> +
> +Here is the data flow for a conventional VMBus connection and the Confidential
> +VMBus connection (C stands for the client or VSC, S for the server or VSP):
> +
> ++---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> +|               |       |                 ==========                |
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> +|               |       |                 |        |                |
> ++----- C -------+       +-----------------+        +------- S ------+
> +       ||                                                   ||
> +       ||                                                   ||
> ++------||------------------ VMBus --------------------------||------+
> +|                     Interrupts, MMIO                              |
> ++-------------------------------------------------------------------+
> +
> ++---- GUEST --------------- VTL0 ------+               +-- DEVICE --+
> +|                                      |               |            |
> +| +- PARAVISOR --------- VTL2 -----+   |               |            |
> +| |     +-- VMBus Relay ------+    ====+================            |
> +| |     |   Interrupts, MMIO  |    |   |               |            |
> +| |     +-------- S ----------+    |   |               +------------+
> +| |               ||               |   |
> +| +---------+     ||               |   |
> +| |  Linux  |     ||    OpenHCL    |   |
> +| |  kernel |     ||               |   |
> +| +---- C --+-----||---------------+   |
> +|       ||        ||                   |
> ++-------++------- C -------------------+               +------------+
> +        ||                                             |    HOST    |
> +        ||                                             +---- S -----+
> ++-------||----------------- VMBus ---------------------------||-----+
> +|                     Interrupts, MMIO                              |
> ++-------------------------------------------------------------------+
> +
> +An implementation of the VMBus relay that offers the Confidential VMBus channels
> +is available in the OpenVMM project as a part of the OpenHCL paravisor. Please
> +refer to https://openvmm.dev/guide/ for more
> +information about the OpenHCL paravisor.
> +
> +A guest that is running with a paravisor must determine at runtime if
> +Confidential VMBus is supported by the current paravisor. It does so by first
> +trying to establish a Confidential VMBus connection with the paravisor using
> +standard mechanisms where the memory remains encrypted. If this succeeds,
> +then the guest can proceed to use Confidential VMBus. If it fails, then the
> +guest must fallback to establishing a non-Confidential VMBus connection with
> +the Hyper-V host.
> +
> +Confidential VMBus is a characteristic of the VMBus connection as a whole,
> +and of each VMBus channel that is created. When a Confidential VMBus
> +connection is established, the paravisor provides the guest the message-passing
> +path that is used for VMBus device creation and deletion, and it provides a
> +per-CPU synthetic interrupt controller (SynIC) just like the SynIC that is
> +offered by the Hyper-V host. Each VMBus device that is offered to the guest
> +indicates the degree to which it participates in Confidential VMBus. The offer
> +indicates if the device uses encrypted ring buffers, and if the device uses
> +encrypted memory for DMA that is done outside the ring buffer. These settings
> +may be different for different devices using the same Confidential VMBus
> +connection.
> +
> +Although these settings are separate, in practice it'll always be encrypted
> +ring buffer only or both encrypted ring buffer and external data. If a channel

s/ring buffer only or both/ring buffer only, or both/

> +is offered by the paravisor with confidential VMBus, the ring buffer can always
> +be encrypted since it's strictly for communication between the VTL2 paravisor
> +and the VTL0 guest. However, other memory regions are often used for e.g. DMA,
> +so they need to be accessible by the underlying hardware, and must be unencrypted
> +(unless the device supports encrypted memory). Currently, there are no any VSPs

s/no any/not any/

> +in OpenHCL that support encrypted external memory, but we will use it in the

Avoid personal pronouns like "we".  Suggest this, or something similar:

"but future versions are expected to enable this capability."

> +future.
> +
> +Because some devices on a Confidential VMBus may require decrypted ring buffers
> +and DMA transfers, the guest must interact with two SynICs -- the one provided
> +by the paravisor and the one provided by the Hyper-V host when Confidential
> +VMBus is not offered. Interrupts are always signaled by the paravisor SynIC, but
> +the guest must check for messages and for channel interrupts on both SynICs.
> +
> +In the case of a confidential VM,

I think you mean "In the case of confidential VMBus,"

> +regular SynIC access by the guest is
> +intercepted by the paravisor (this includes various MSRs such as the SIMP and
> +SIEFP, as well as hypercalls like HvPostMessage and HvSignalEvent). If the guest
> +actually wants to communicate with the hypervisor, it has to use special mechanisms
> +(GHCB page on SNP, or tdcall on TDX). Messages will always be one or the other:
> +with confidential VMBus, all messages use the paravisor SynIC,

This statement seems to contradict the statement in the previous paragraph, and
the code. If all messages use the paravisor SynIC when Confidential VMBus is used,
why must the guest check for messages on both SynICs?

> +otherwise they all
> +use the hypervisor SynIC. For interrupt signaling, though, some channels may be
> +running on the host (non-confidential, using the VMBus relay) and use the hypervisor
> +SynIC, and some on the paravisor and use its SynIC. The RelIDs are coordinated by
> +the OpenHCL VMBus server and are guaranteed to be unique regardless of whether
> +the channel originated on the host or the paravisor.
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor
  2025-06-04  0:43 ` [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor Roman Kisel
  2025-06-04 13:27   ` ALOK TIWARI
@ 2025-06-18 16:17   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:17 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:43 PM
> 
> The confidential VMBus is built on the guest talking to the
> paravisor only.
> 
> Provide functions that allow manipulating the SynIC registers
> via paravisor. No functional changes.

I'd like to see a little more detailed explanation for "why" the
new functions. Something like:

The existing Hyper-V wrappers for getting and setting MSRs are
hv_get/set_msr(). Via hv_get/set_non_nested_msr(), they detect
when running in a CoCo VM with a paravisor, and use the TDX or
SNP guest-host communication protocol to bypass the paravisor
and go directly to the host hypervisor for SynIC MSRs. The "set"
function also implements the required special handling for the
SINT MSRs.

But in some Confidential VMBus cases, the guest wants to talk only
with the paravisor. To accomplish this, provide new functions for
accessing SynICs that always do direct accesses (i.e., not via
TDX or SNP GHCB), which will go to the paravisor. The mirroring
of the existing "set" function is also not needed. These functions
should be used only in the specific Confidential VMBus cases that
require them.

And I'm not sure "No functional changes" is correct. This is adding
new functionality. It's not just a cosmetic change or code
refactoring.

> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 44 ++++++++++++++++++++++++++++++++++
>  drivers/hv/hv_common.c         | 13 ++++++++++
>  include/asm-generic/mshyperv.h |  2 ++
>  3 files changed, 59 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 3e2533954675..83a85d94bcb3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -89,6 +89,50 @@ void hv_set_non_nested_msr(unsigned int reg, u64 value)
>  }
>  EXPORT_SYMBOL_GPL(hv_set_non_nested_msr);
> 
> +/*
> + * Attempt to get the SynIC register value from the paravisor.
> + *
> + * Not all paravisors support reading SynIC registers, so this function
> + * may fail. The register for the SynIC of the running CPU is accessed.
> + *
> + * Writes the SynIC register value into the memory pointed by val,
> + * and ~0ULL is on failure.
> + *
> + * Returns -ENODEV if the MSR is not a SynIC register, or another error
> + * code if getting the MSR fails (meaning the paravisor doesn't support
> + * relaying VMBus comminucations).

s/comminucations/communications/

> + */
> +int hv_para_get_synic_register(unsigned int reg, u64 *val)
> +{
> +	u64 reg_val = ~0ULL;
> +	int err = -ENODEV;
> +
> +	if (hv_is_synic_msr(reg))
> +		reg_val = native_read_msr_safe(reg, &err);
> +	*val = reg_val;
> +
> +	return err;
> +}
> +
> +/*
> + * Attempt to set the SynIC register value with the paravisor.
> + *
> + * Not all paravisors support reading SynIC registers, so this function

s/reading/setting/

> + * may fail. The register for the SynIC of the running CPU is accessed.
> + *
> + * Sets the register to the value supplied.
> + *
> + * Returns: -ENODEV if the MSR is not a SynIC register, or another error
> + * code if writing to the MSR fails (meaning the paravisor doesn't support
> + * relaying VMBus comminucations).

s/comminucations/communications/

> + */
> +int hv_para_set_synic_register(unsigned int reg, u64 val)
> +{
> +	if (!hv_is_synic_msr(reg))
> +		return -ENODEV;
> +	return wrmsrl_safe(reg, val);
> +}
> +
>  u64 hv_get_msr(unsigned int reg)
>  {
>  	if (hv_nested)
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 49898d10faff..a179ea482cb1 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -716,6 +716,19 @@ u64 __weak hv_tdx_hypercall(u64 control, u64 param1, u64
> param2)
>  }
>  EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
> 
> +int __weak hv_para_get_synic_register(unsigned int reg, u64 *val)
> +{
> +	*val = ~0ULL;
> +	return -ENODEV;
> +}
> +EXPORT_SYMBOL_GPL(hv_para_get_synic_register);
> +
> +int __weak hv_para_set_synic_register(unsigned int reg, u64 val)
> +{
> +	return -ENODEV;
> +}
> +EXPORT_SYMBOL_GPL(hv_para_set_synic_register);
> +
>  void hv_identify_partition_type(void)
>  {
>  	/* Assume guest role */
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 9722934a8332..f239b102fc53 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -333,6 +333,8 @@ bool hv_is_isolation_supported(void);
>  bool hv_isolation_type_snp(void);
>  u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
>  u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +int hv_para_get_synic_register(unsigned int reg, u64 *val);
> +int hv_para_set_synic_register(unsigned int reg, u64 val);
>  void hyperv_cleanup(void);
>  bool hv_query_ext_cap(u64 cap_query);
>  void hv_setup_dma_ops(struct device *dev, bool coherent);
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs
  2025-06-04  0:43 ` [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs Roman Kisel
  2025-06-04 13:38   ` ALOK TIWARI
@ 2025-06-18 16:18   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:18 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> To set up and run the confidential VMBus, the guest needs the paravisor
> to intercept access to some synthetic MSRs. In the non-confidential case,
> the guest continues using the vendor-specific guest-host communication
> protocol.
> 
> Update the hv_set_non_nested_msr() function to trap access to some
> synthetic MSRs.

"trap access" is somewhat generic, and it's not clear what the intent is.
I'd suggest something like:

hv_set_non_nested_msr() has special handling for SINT MSRs
when a paravisor is present. In addition to updating the MSR on the
host, the mirror MSR in the paravisor is updated, including with the
proxy bit. But with Confidential VMBus, the proxy bit must not be
used, so add a special case to skip it.

> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 83a85d94bcb3..db6f3e3db012 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -28,6 +28,7 @@
>  #include <asm/apic.h>
>  #include <asm/timer.h>
>  #include <asm/reboot.h>
> +#include <asm/msr.h>
>  #include <asm/nmi.h>
>  #include <clocksource/hyperv_timer.h>
>  #include <asm/numa.h>
> @@ -77,14 +78,28 @@ EXPORT_SYMBOL_GPL(hv_get_non_nested_msr);
> 
>  void hv_set_non_nested_msr(unsigned int reg, u64 value)
>  {
> +	if (reg == HV_X64_MSR_EOM && vmbus_is_confidential()) {
> +		/* Reach out to the paravisor. */
> +		native_wrmsrl(reg, value);
> +		return;
> +	}
> +

It seems a bit inconsistent to have this particular MSR treated as
a special case in the generic code path, when the new functions
hv_para_get/set_synic_register() have been introduced to handle
the unique requirements of Confidential VMBus. This MSR is set
only in vmbus_signal_eom(), so maybe vmbus_signal_eom()
should test for confidential VM, and call hv_para_set_synic_register()
instead?

>  	if (hv_is_synic_msr(reg) && ms_hyperv.paravisor_present) {
> +		/* The hypervisor will get the intercept. */
>  		hv_ivm_msr_write(reg, value);
> 
> -		/* Write proxy bit via wrmsl instruction */
> -		if (hv_is_sint_msr(reg))
> -			wrmsrl(reg, value | 1 << 20);
> +		if (hv_is_sint_msr(reg)) {
> +			/*
> +			 * Write proxy bit in the case of non-confidential VMBus.
> +			 * Using wrmsl instruction so the following goes to the paravisor.
> +			 */
> +			u32 proxy = vmbus_is_confidential() ? 0 : 1;
> +
> +			value |= (proxy << 20);
> +			native_wrmsrl(reg, value);
> +		}
>  	} else {
> -		wrmsrl(reg, value);
> +		native_wrmsrl(reg, value);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(hv_set_non_nested_msr);
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 05/15] Drivers: hv: Rename fields for SynIC message and event pages
  2025-06-04  0:43 ` [PATCH hyperv-next v3 05/15] Drivers: hv: Rename fields for SynIC message and event pages Roman Kisel
@ 2025-06-18 16:18   ` Michael Kelley
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:18 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> Support for the confidential VMBus requires using SynIC message
> and event pages shared with the host and separate ones accessible
> only to the paravisor.

Slight tweak -- tie this to having two SynICs:

Confidential VMBus requires interacting with two SynICs -- one
provided by the host hypervisor, and one provided by the paravisor.
Each SynIC requires its own message and event pages.

> 
> Rename the host-accessible SynIC message and event pages to be
> able to add the paravisor ones. No functional changes. The field
> name is also changed in mshv_root.* for consistency.

Rename the existing host-accessible SynIC message and event pages
with the "hyp_" prefix to clearly distinguish them from the paravisor
ones. The field name is also changed in mshv_root.* for consistency.

No functional change.

> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c |  6 ++--
>  drivers/hv/hv.c           | 66 +++++++++++++++++++--------------------
>  drivers/hv/hyperv_vmbus.h |  4 +--
>  drivers/hv/mshv_root.h    |  2 +-
>  drivers/hv/mshv_synic.c   |  6 ++--
>  drivers/hv/vmbus_drv.c    |  6 ++--
>  6 files changed, 45 insertions(+), 45 deletions(-)
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 06/15] Drivers: hv: Allocate the paravisor SynIC pages when required
  2025-06-04  0:43 ` [PATCH hyperv-next v3 06/15] Drivers: hv: Allocate the paravisor SynIC pages when required Roman Kisel
@ 2025-06-18 16:18   ` Michael Kelley
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:18 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> The paravisor needs the SynIC pages to communicate with the guest
> via the confidential VMBus.
> 
> Refactor and extaned the exisitng code to account for that.

Suggest adding a bit more context to the commit message:

Confidential VMBus requires interacting with two SynICs -- one
provided by the host hypervisor, and one provided by the paravisor.
Each SynIC requires its own message and event pages.

Refactor and extend the existing code to add allocating and freeing
the message and event pages for the paravisor SynIC when it is
present.

> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/hv.c           | 184 +++++++++++++++++++-------------------
>  drivers/hv/hyperv_vmbus.h |  17 ++++
>  2 files changed, 111 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 964b9102477d..e25c91eb6af5 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -94,10 +94,70 @@ int hv_post_message(union hv_connection_id connection_id,
>  	return hv_result(status);
>  }
> 
> +static int hv_alloc_page(unsigned int cpu, void **page, bool decrypt,
> +	const char *note)

Why have a "cpu" argument to this function? It's not used anywhere.

> +{
> +	int ret = 0;
> +
> +	/*
> +	 * After the page changes its encryption status, its contents might
> +	 * appear scrambled on some hardware. Thus `get_zeroed_page` would
> +	 * zero the page out in vain, we do that ourselves exactly once.

s/we do that/so do that explicitly exactly once/

Avoid personal pronouns like "we".

> +	 *
> +	 * By default, the page is allocated encrypted provided the system
> +	 * supports that.

More precise: "the page is allocated encrypted in a CoCo VM"

> +	 */
> +	*page = (void *)__get_free_page(GFP_KERNEL);
> +	if (!*page)
> +		return -ENOMEM;
> +
> +	if (decrypt)
> +		ret = set_memory_decrypted((unsigned long)*page, 1);
> +	if (ret)
> +		goto failed;
> +
> +	memset(*page, 0, PAGE_SIZE);
> +	return 0;
> +
> +failed:
> +

Don't need a blank line here.

> +	pr_err("allocation failed for %s page, error %d when allocating the page, decrypted %d\n",

The "when allocating the page" portion of the above message is somewhat redundant.
Compare with the similar message in hv_free_page().

> +		note, ret, decrypt);
> +	free_page((unsigned long)*page);

I don't think the page should be freed here. When set_memory_decrypted() fails, the
encryption state of the memory is unknown, so it should not be put back on the free list.
It's the same situation as in hv_free_page().

> +	*page = NULL;
> +	return ret;
> +}
> +
> +static int hv_free_page(void **page, bool encrypt, const char *note)
> +{
> +	int ret = 0;
> +
> +	if (!*page)
> +		return 0;
> +
> +	if (encrypt)
> +		ret = set_memory_encrypted((unsigned long)*page, 1);
> +
> +	/*
> +	 * In the case of the action failure, the page is leaked.
> +	 * Something is wrong, prefer to lose the page and stay afloat.
> +	 */
> +	if (ret) {
> +		pr_err("deallocation failed for %s page, error %d, encrypt %d\n",
> +			note, ret, encrypt);
> +	} else
> +		free_page((unsigned long)*page);
> +
> +	*page = NULL;
> +
> +	return ret;
> +}
> +
>  int hv_synic_alloc(void)
>  {
>  	int cpu, ret = -ENOMEM;
>  	struct hv_per_cpu_context *hv_cpu;
> +	const bool decrypt = !vmbus_is_confidential();
> 
>  	/*
>  	 * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -123,73 +183,37 @@ int hv_synic_alloc(void)
>  			     vmbus_on_msg_dpc, (unsigned long)hv_cpu);
> 
>  		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
> -			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
> -			if (!hv_cpu->post_msg_page) {
> -				pr_err("Unable to allocate post msg page\n");
> -				goto err;
> -			}
> -
> -			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
> -			if (ret) {
> -				pr_err("Failed to decrypt post msg page: %d\n", ret);
> -				/* Just leak the page, as it's unsafe to free the page. */
> -				hv_cpu->post_msg_page = NULL;
> +			ret = hv_alloc_page(cpu, &hv_cpu->post_msg_page,
> +				decrypt, "post msg");
> +			if (ret)
>  				goto err;
> -			}
> -
> -			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
>  		}
> 
>  		/*
> -		 * Synic message and event pages are allocated by paravisor.
> -		 * Skip these pages allocation here.
> +		 * If these SynIC pages are not allocated, SIEF and SIM pages
> +		 * are configured using what the root partition or the paravisor
> +		 * provides upon reading the SIEFP and SIMP registers.
>  		 */
>  		if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
> -			hv_cpu->hyp_synic_message_page =
> -				(void *)get_zeroed_page(GFP_ATOMIC);
> -			if (!hv_cpu->hyp_synic_message_page) {
> -				pr_err("Unable to allocate SYNIC message page\n");
> +			ret = hv_alloc_page(cpu, &hv_cpu->hyp_synic_message_page,
> +				decrypt, "hypervisor SynIC msg");
> +			if (ret)
>  				goto err;
> -			}
> -
> -			hv_cpu->hyp_synic_event_page =
> -				(void *)get_zeroed_page(GFP_ATOMIC);
> -			if (!hv_cpu->hyp_synic_event_page) {
> -				pr_err("Unable to allocate SYNIC event page\n");
> -
> -				free_page((unsigned long)hv_cpu->hyp_synic_message_page);
> -				hv_cpu->hyp_synic_message_page = NULL;
> +			ret = hv_alloc_page(cpu, &hv_cpu->hyp_synic_event_page,
> +				decrypt, "hypervisor SynIC event");
> +			if (ret)
>  				goto err;
>  			}
> -		}
> 
> -		if (!ms_hyperv.paravisor_present &&
> -		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> -			ret = set_memory_decrypted((unsigned long)
> -				hv_cpu->hyp_synic_message_page, 1);
> -			if (ret) {
> -				pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
> -				hv_cpu->hyp_synic_message_page = NULL;
> -
> -				/*
> -				 * Free the event page here so that hv_synic_free()
> -				 * won't later try to re-encrypt it.
> -				 */
> -				free_page((unsigned long)hv_cpu->hyp_synic_event_page);
> -				hv_cpu->hyp_synic_event_page = NULL;
> +		if (vmbus_is_confidential()) {
> +			ret = hv_alloc_page(cpu, &hv_cpu->para_synic_message_page,
> +				decrypt, "paravisor SynIC msg");

Shouldn't the "decrypt" parameter just always be passed as "false"? That's the
fundamental tenet of Confidential VMBus -- these pages should not be decrypted.

> +			if (ret)
>  				goto err;
> -			}
> -
> -			ret = set_memory_decrypted((unsigned long)
> -				hv_cpu->hyp_synic_event_page, 1);
> -			if (ret) {
> -				pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
> -				hv_cpu->hyp_synic_event_page = NULL;
> +			ret = hv_alloc_page(cpu, &hv_cpu->para_synic_event_page,
> +				decrypt, "paravisor SynIC event");

Same here.  "decrypt" is always "false".

> +			if (ret)
>  				goto err;
> -			}
> -
> -			memset(hv_cpu->hyp_synic_message_page, 0, PAGE_SIZE);
> -			memset(hv_cpu->hyp_synic_event_page, 0, PAGE_SIZE);
>  		}
>  	}

Refactoring this function with the hv_alloc_page() helper function works out
very nicely! The code is simpler and the error handling is much easier to get right.

> 
> @@ -205,48 +229,28 @@ int hv_synic_alloc(void)
> 
>  void hv_synic_free(void)
>  {
> -	int cpu, ret;
> +	int cpu;
> +	const bool encrypt = !vmbus_is_confidential();
> 
>  	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu =
>  			per_cpu_ptr(hv_context.cpu_context, cpu);
> 
> -		/* It's better to leak the page if the encryption fails. */
> -		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
> -			if (hv_cpu->post_msg_page) {
> -				ret = set_memory_encrypted((unsigned long)
> -					hv_cpu->post_msg_page, 1);
> -				if (ret) {
> -					pr_err("Failed to encrypt post msg page: %d\n", ret);
> -					hv_cpu->post_msg_page = NULL;
> -				}
> -			}
> +		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx())
> +			hv_free_page(&hv_cpu->post_msg_page,
> +				encrypt, "post msg");
> +		if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
> +			hv_free_page(&hv_cpu->hyp_synic_event_page,
> +				encrypt, "hypervisor SynIC event");
> +			hv_free_page(&hv_cpu->hyp_synic_message_page,
> +				encrypt, "hypervisor SynIC msg");
>  		}
> -
> -		if (!ms_hyperv.paravisor_present &&
> -		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> -			if (hv_cpu->hyp_synic_message_page) {
> -				ret = set_memory_encrypted((unsigned long)
> -					hv_cpu->hyp_synic_message_page, 1);
> -				if (ret) {
> -					pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
> -					hv_cpu->hyp_synic_message_page = NULL;
> -				}
> -			}
> -
> -			if (hv_cpu->hyp_synic_event_page) {
> -				ret = set_memory_encrypted((unsigned long)
> -					hv_cpu->hyp_synic_event_page, 1);
> -				if (ret) {
> -					pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
> -					hv_cpu->hyp_synic_event_page = NULL;
> -				}
> -			}
> +		if (vmbus_is_confidential()) {
> +			hv_free_page(&hv_cpu->para_synic_event_page,
> +				encrypt, "paravisor SynIC event");
> +			hv_free_page(&hv_cpu->para_synic_message_page,
> +				encrypt, "paravisor SynIC msg");

As with hv_synic_alloc(), for these two calls, always "false" for the "encrypt"
parameter.

>  		}
> -
> -		free_page((unsigned long)hv_cpu->post_msg_page);
> -		free_page((unsigned long)hv_cpu->hyp_synic_event_page);
> -		free_page((unsigned long)hv_cpu->hyp_synic_message_page);
>  	}

Same here on the refactoring. Very nice!

> 
>  	kfree(hv_context.hv_numa_map);
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index fc3cdb26ff1a..9619edcf9f88 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -120,8 +120,25 @@ enum {
>   * Per cpu state for channel handling
>   */
>  struct hv_per_cpu_context {
> +	/*
> +	 * SynIC pages for communicating with the host.
> +	 *
> +	 * These pages are accessible to the host partition and the hypervisor,
> +	 * so they can only be used for exchanging data when the host partition
> +	 * and the hypervisor are trusted.

This comment isn't quite accurate. The hypervisor SynIC and its message/event
pages are used in today's CoCo VMs (without Confidential VMBus) where the host
partition and hypervisor are not trusted. The guest must be prepared for malicious
behavior by the SynIC, but that doesn't prevent today's CoCo VMs from providing
the intended confidentiality.

> +	 */
>  	void *hyp_synic_message_page;
>  	void *hyp_synic_event_page;
> +	/*
> +	 * SynIC pages for communicating with the paravisor.
> +	 *
> +	 * These pages can be accessed only from within the guest partition.
> +	 * Neither the host partition nor the hypervisor can access these pages,
> +	 * so they can be used for exchanging data when the host partition and
> +	 * the hypervisor are not trusted, such as in a confidential VM.

Same here on this comment.

> +	 */
> +	void *para_synic_message_page;
> +	void *para_synic_event_page;
> 
>  	/*
>  	 * The page is only used in hv_post_message() for a TDX VM (with the
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available
  2025-06-04  0:43 ` [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available Roman Kisel
  2025-06-04 13:48   ` ALOK TIWARI
@ 2025-06-18 16:18   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:18 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> When the confidential VMBus is available, the guest should post
> messages via the paravisor.
> 
> Update hv_post_message() to request posting messages from the paravisor

"via the paravisor"?  I'm not sure what "from the paravisor" means. And
you used "via" in the previous sentence and patch Subject.

> rather than through GHCB or TD calls.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/hv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index e25c91eb6af5..1f7cf1244509 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -74,7 +74,7 @@ int hv_post_message(union hv_connection_id connection_id,
>  	aligned_msg->payload_size = payload_size;
>  	memcpy((void *)aligned_msg->payload, payload, payload_size);
> 
> -	if (ms_hyperv.paravisor_present) {
> +	if (ms_hyperv.paravisor_present && !vmbus_is_confidential()) {

Does this change make post_msg_page unnecessary when Confidential
VMBus is present? When using Confidential VMBus, the code path will be
to use a normal hypercall, which will go to the paravisor, and hence
doesn't need decrypted memory.

If my thinking is correct, the code in hv_synic_alloc() could be updated to
not allocate post_msg_page when vmbus_is_confidential().

>  		if (hv_isolation_type_tdx())
>  			status = hv_tdx_hypercall(HVCALL_POST_MESSAGE,
>  						  virt_to_phys(aligned_msg), 0);
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 09/15] Drivers: hv: Use memunmap() to check if the address is in IO map
  2025-06-04  0:43 ` [PATCH hyperv-next v3 09/15] Drivers: hv: Use memunmap() to check if the address is in IO map Roman Kisel
@ 2025-06-18 16:18   ` Michael Kelley
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:18 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> It might happen that some hyp SynIC pages aren't IO mapped.
> 
> Use memunmap() that checks for that and only then calls iounmap()

I'm concerned by the lack of symmetry in using io_remap_cache()
to do the mapping, and then memunmap() to remove it. The issue
is presumably that hyp_synic_[message/event]_page might be NULL?
Or is there some other case? But I'm thinking it would be better to 
explicitly test for NULL and only call iounmap() if non-NULL. Then there's
no dependence on the implementation of memumap().

Not doing the explicit test for NULL actually caused the problem in
the first place. When the paravisor and root partition code was
introduced, iounmap() did a test and just returned, so everything
worked. Then commit 50c6dbdfd16e was added in the 6.12 kernel,
and iounmap() started generating a WARN if NULL is passed in.

Michael

> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/hv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 6a4857def82d..9a66656d89e0 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -358,7 +358,7 @@ void hv_synic_disable_regs(unsigned int cpu)
>  	 */
>  	simp.simp_enabled = 0;
>  	if (ms_hyperv.paravisor_present || hv_root_partition()) {
> -		iounmap(hv_cpu->hyp_synic_message_page);
> +		memunmap(hv_cpu->hyp_synic_message_page);
>  		hv_cpu->hyp_synic_message_page = NULL;
>  	} else {
>  		simp.base_simp_gpa = 0;
> @@ -370,7 +370,7 @@ void hv_synic_disable_regs(unsigned int cpu)
>  	siefp.siefp_enabled = 0;
> 
>  	if (ms_hyperv.paravisor_present || hv_root_partition()) {
> -		iounmap(hv_cpu->hyp_synic_event_page);
> +		memunmap(hv_cpu->hyp_synic_event_page);
>  		hv_cpu->hyp_synic_event_page = NULL;
>  	} else {
>  		siefp.base_siefp_gpa = 0;
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 10/15] Drivers: hv: Rename the SynIC enable and disable routines
  2025-06-04  0:43 ` [PATCH hyperv-next v3 10/15] Drivers: hv: Rename the SynIC enable and disable routines Roman Kisel
@ 2025-06-18 16:19   ` Michael Kelley
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:19 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> The confidential VMBus requires support for the both hypervisor
> facing SynIC and the paravisor one.
> 
> Rename the functions that enable and disable SynIC with the
> hypervisor.

Add "No functional changes"?

> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c |  2 +-
>  drivers/hv/hv.c           | 11 ++++++-----
>  drivers/hv/hyperv_vmbus.h |  4 ++--
>  drivers/hv/vmbus_drv.c    |  6 +++---
>  4 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 6f87220e2ca3..ca2fe10c110a 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -845,7 +845,7 @@ static void vmbus_wait_for_unload(void)
>  			/*
>  			 * In a CoCo VM the hyp_synic_message_page is not allocated
>  			 * in hv_synic_alloc(). Instead it is set/cleared in
> -			 * hv_synic_enable_regs() and hv_synic_disable_regs()
> +			 * hv_hyp_synic_enable_regs() and hv_hyp_synic_disable_regs()
>  			 * such that it is set only when the CPU is online. If
>  			 * not all present CPUs are online, the message page
>  			 * might be NULL, so skip such CPUs.
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 9a66656d89e0..2b561825089a 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -257,9 +257,10 @@ void hv_synic_free(void)
>  }
> 
>  /*
> - * hv_synic_enable_regs - Initialize the Synthetic Interrupt Controller.
> + * hv_hyp_synic_enable_regs - Initialize the Synthetic Interrupt Controller
> + * with the hypervisor.
>   */
> -void hv_synic_enable_regs(unsigned int cpu)
> +void hv_hyp_synic_enable_regs(unsigned int cpu)
>  {
>  	struct hv_per_cpu_context *hv_cpu =
>  		per_cpu_ptr(hv_context.cpu_context, cpu);
> @@ -325,14 +326,14 @@ void hv_synic_enable_regs(unsigned int cpu)
> 
>  int hv_synic_init(unsigned int cpu)
>  {
> -	hv_synic_enable_regs(cpu);
> +	hv_hyp_synic_enable_regs(cpu);
> 
>  	hv_stimer_legacy_init(cpu, VMBUS_MESSAGE_SINT);
> 
>  	return 0;
>  }
> 
> -void hv_synic_disable_regs(unsigned int cpu)
> +void hv_hyp_synic_disable_regs(unsigned int cpu)
>  {
>  	struct hv_per_cpu_context *hv_cpu =
>  		per_cpu_ptr(hv_context.cpu_context, cpu);
> @@ -515,7 +516,7 @@ int hv_synic_cleanup(unsigned int cpu)
>  always_cleanup:
>  	hv_stimer_legacy_cleanup(cpu);
> 
> -	hv_synic_disable_regs(cpu);
> +	hv_hyp_synic_disable_regs(cpu);
> 
>  	return ret;
>  }
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 9619edcf9f88..c1df611d1eb2 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -188,10 +188,10 @@ extern int hv_synic_alloc(void);
> 
>  extern void hv_synic_free(void);
> 
> -extern void hv_synic_enable_regs(unsigned int cpu);
> +extern void hv_hyp_synic_enable_regs(unsigned int cpu);
>  extern int hv_synic_init(unsigned int cpu);
> 
> -extern void hv_synic_disable_regs(unsigned int cpu);
> +extern void hv_hyp_synic_disable_regs(unsigned int cpu);
>  extern int hv_synic_cleanup(unsigned int cpu);
> 
>  /* Interface */
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 695b1ba7113c..f7e82a4fe133 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2809,7 +2809,7 @@ static void hv_crash_handler(struct pt_regs *regs)
>  	 */
>  	cpu = smp_processor_id();
>  	hv_stimer_cleanup(cpu);
> -	hv_synic_disable_regs(cpu);
> +	hv_hyp_synic_disable_regs(cpu);
>  };
> 
>  static int hv_synic_suspend(void)
> @@ -2834,14 +2834,14 @@ static int hv_synic_suspend(void)
>  	 * interrupts-disabled context.
>  	 */
> 
> -	hv_synic_disable_regs(0);
> +	hv_hyp_synic_disable_regs(0);
> 
>  	return 0;
>  }
> 
>  static void hv_synic_resume(void)
>  {
> -	hv_synic_enable_regs(0);
> +	hv_hyp_synic_enable_regs(0);
> 
>  	/*
>  	 * Note: we don't need to call hv_stimer_init(0), because the timer
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 11/15] Drivers: hv: Functions for setting up and tearing down the paravisor SynIC
  2025-06-04  0:43 ` [PATCH hyperv-next v3 11/15] Drivers: hv: Functions for setting up and tearing down the paravisor SynIC Roman Kisel
@ 2025-06-18 16:19   ` Michael Kelley
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:19 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> The confidential VMBus runs with the paravisor SynIC and requires
> configuring it with the paravisor.
> 
> Add the functions for configuring the paravisor SynIC

Suggest:

Add the functions for configuring the paravisor SynIC. Update
overall SynIC initialization logic to initialize the SynIC if it is present.
Finally, break out SynIC interrupt enable/disable code into separate
functions so that SynIC interrupts can be enabled/disable via the
paravisor instead of the hypervisor if the paravisor SynIC is present.

> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/hv.c | 180 +++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 169 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 2b561825089a..c9649ab3439e 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -203,7 +203,7 @@ int hv_synic_alloc(void)
>  				decrypt, "hypervisor SynIC event");
>  			if (ret)
>  				goto err;
> -			}
> +		}

This looks like a code cleanup that should be part of Patch 6 of this series.

> 
>  		if (vmbus_is_confidential()) {
>  			ret = hv_alloc_page(cpu, &hv_cpu->para_synic_message_page,
> @@ -267,7 +267,6 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
>  	union hv_synic_simp simp;
>  	union hv_synic_siefp siefp;
>  	union hv_synic_sint shared_sint;
> -	union hv_synic_scontrol sctrl;
> 
>  	/* Setup the Synic's message page */
>  	simp.as_uint64 = hv_get_msr(HV_MSR_SIMP);
> @@ -288,7 +287,7 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
> 
>  	hv_set_msr(HV_MSR_SIMP, simp.as_uint64);
> 
> -	/* Setup the Synic's event page */
> +	/* Setup the Synic's event page with the hypervisor. */

You added "with the hypervisor" to this comment, but not the similar
comment above for the message page. Consistency .... :-)

>  	siefp.as_uint64 = hv_get_msr(HV_MSR_SIEFP);
>  	siefp.siefp_enabled = 1;
> 
> @@ -316,6 +315,11 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
>  	shared_sint.masked = false;
>  	shared_sint.auto_eoi = hv_recommend_using_aeoi();
>  	hv_set_msr(HV_MSR_SINT0 + VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +}
> +
> +static void hv_hyp_synic_enable_interrupts(void)
> +{
> +	union hv_synic_scontrol sctrl;
> 
>  	/* Enable the global synic bit */
>  	sctrl.as_uint64 = hv_get_msr(HV_MSR_SCONTROL);
> @@ -324,13 +328,90 @@ void hv_hyp_synic_enable_regs(unsigned int cpu)
>  	hv_set_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
>  }
> 
> +/*
> + * The paravisor might not support proxying SynIC, and this
> + * function may fail.
> + */
> +static int hv_para_synic_enable_regs(unsigned int cpu)
> +{
> +	int err;
> +	union hv_synic_simp simp;
> +	union hv_synic_siefp siefp;
> +	struct hv_per_cpu_context *hv_cpu
> +		= per_cpu_ptr(hv_context.cpu_context, cpu);
> +
> +	/* Setup the Synic's message page with the paravisor. */
> +	err = hv_para_get_synic_register(HV_MSR_SIMP, &simp.as_uint64);
> +	if (err)
> +		return err;
> +	simp.simp_enabled = 1;
> +	simp.base_simp_gpa = virt_to_phys(hv_cpu->para_synic_message_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	err = hv_para_set_synic_register(HV_MSR_SIMP, simp.as_uint64);
> +	if (err)
> +		return err;
> +
> +	/* Setup the Synic's event page with the paravisor. */
> +	err = hv_para_get_synic_register(HV_MSR_SIEFP, &siefp.as_uint64);
> +	if (err)
> +		return err;
> +	siefp.siefp_enabled = 1;
> +	siefp.base_siefp_gpa = virt_to_phys(hv_cpu->para_synic_event_page)
> +			>> HV_HYP_PAGE_SHIFT;
> +	return hv_para_set_synic_register(HV_MSR_SIEFP, siefp.as_uint64);
> +}
> +
> +static int hv_para_synic_enable_interrupts(void)
> +{
> +	union hv_synic_scontrol sctrl;
> +	int err;
> +
> +	/* Enable the global synic bit */
> +	err = hv_para_get_synic_register(HV_MSR_SCONTROL, &sctrl.as_uint64);
> +	if (err)
> +		return err;
> +	sctrl.enable = 1;
> +
> +	return hv_para_set_synic_register(HV_MSR_SCONTROL, sctrl.as_uint64);
> +}
> +
>  int hv_synic_init(unsigned int cpu)
>  {
> +	int err;
> +
> +	/*
> +	 * The paravisor may not support the confidential VMBus,
> +	 * check on that first.
> +	 */
> +	if (vmbus_is_confidential()) {
> +		err = hv_para_synic_enable_regs(cpu);
> +		if (err)
> +			goto fail;
> +	}
> +
>  	hv_hyp_synic_enable_regs(cpu);
> +	if (vmbus_is_confidential()) {
> +		err = hv_para_synic_enable_interrupts();
> +		if (err)
> +			goto fail;
> +	} else
> +		hv_hyp_synic_enable_interrupts();

It would be nice to have some detailed code comments somewhere
describing how VMBus interrupts work with Confidential VMBus. I
see that the SINT is set in hv_hyperv_synic_enable_regs() by calling
hv_set_msr().  hv_set_msr() in turn has special case code for the
SINT MSRs that write to the hypervisor version of the MSR *and*
the paravisor version of the MSR (but *without* the proxy bit when
VMBus is confidential). Then the code above enables interrupts
via the paravisor if VMBus is confidential, and otherwise via the
hypervisor.

Assuming my description is correct, maybe just writing that down
in a comment somewhere will confirm to later developers of this
code about what is supposed to be happening.

> 
>  	hv_stimer_legacy_init(cpu, VMBUS_MESSAGE_SINT);
> 
>  	return 0;
> +
> +fail:
> +	/*
> +	 * The failure may only come from enabling the paravisor SynIC.
> +	 * That in turn means that the confidential VMBus cannot be used
> +	 * which is not an error: the setup will be re-tried with the
> +	 * non-confidential VMBus.
> +	 *
> +	 * We also don't bother attempting to reset the paravisor registers
> +	 * as something isn't working there anyway.
> +	 */
> +	return err;
>  }
> 
>  void hv_hyp_synic_disable_regs(unsigned int cpu)
> @@ -340,7 +421,6 @@ void hv_hyp_synic_disable_regs(unsigned int cpu)
>  	union hv_synic_sint shared_sint;
>  	union hv_synic_simp simp;
>  	union hv_synic_siefp siefp;
> -	union hv_synic_scontrol sctrl;
> 
>  	shared_sint.as_uint64 = hv_get_msr(HV_MSR_SINT0 + VMBUS_MESSAGE_SINT);
> 
> @@ -378,14 +458,71 @@ void hv_hyp_synic_disable_regs(unsigned int cpu)
>  	}
> 
>  	hv_set_msr(HV_MSR_SIEFP, siefp.as_uint64);
> +}
> +
> +static void hv_hyp_synic_disable_interrupts(void)
> +{
> +	union hv_synic_scontrol sctrl;
> 
>  	/* Disable the global synic bit */
>  	sctrl.as_uint64 = hv_get_msr(HV_MSR_SCONTROL);
>  	sctrl.enable = 0;
>  	hv_set_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
> +}
> 
> -	if (vmbus_irq != -1)
> -		disable_percpu_irq(vmbus_irq);
> +static void hv_para_synic_disable_regs(unsigned int cpu)
> +{
> +	/*
> +	 * When a get/set register error is encountered, the function
> +	 * returns as the paravisor may not support these registers.
> +	 */
> +	int err;
> +	union hv_synic_simp simp;
> +	union hv_synic_siefp siefp;
> +
> +	struct hv_per_cpu_context *hv_cpu
> +		= per_cpu_ptr(hv_context.cpu_context, cpu);
> +
> +	/* Disable SynIC's message page in the paravisor. */
> +	err = hv_para_get_synic_register(HV_MSR_SIMP, &simp.as_uint64);
> +	if (err)
> +		return;
> +	simp.simp_enabled = 0;
> +
> +	if (hv_cpu->para_synic_message_page) {
> +		free_page((u64)(hv_cpu->para_synic_message_page));
> +		hv_cpu->para_synic_message_page = NULL;
> +	}

Freeing the para_synic_message_page memory here causes problems if a
CPU is taken offline, then back online. This function gets called when the CPU
goes offline, so the para_synic_message_page and para_synic_event_page
memory is freed. Then if the CPU comes back online,
hv_para_synic_enable_regs() runs without the memory being allocated
again.

Be sure to test the CPU offlining/onlining scenario to make sure it
works throughout the stack, including the paravisor.

> +
> +	err = hv_para_set_synic_register(HV_MSR_SIMP, simp.as_uint64);
> +	if (err)
> +		return;
> +
> +	/* Disable SynIC's event page in the paravisor. */
> +	err = hv_para_get_synic_register(HV_MSR_SIEFP, &siefp.as_uint64);
> +	if (err)
> +		return;
> +	siefp.siefp_enabled = 0;
> +
> +	if (hv_cpu->para_synic_event_page) {
> +		free_page((u64)(hv_cpu->para_synic_event_page));
> +		hv_cpu->para_synic_event_page = NULL;

As above, don't free the memory here.

> +	}
> +
> +	hv_para_set_synic_register(HV_MSR_SIEFP, siefp.as_uint64);
> +}
> +
> +static void hv_para_synic_disable_interrupts(void)
> +{
> +	union hv_synic_scontrol sctrl;
> +	int err;
> +
> +	/* Disable the global synic bit */
> +	err = hv_para_get_synic_register(HV_MSR_SCONTROL, &sctrl.as_uint64);
> +	if (err)
> +		return;
> +	sctrl.enable = 0;
> +	hv_para_set_synic_register(HV_MSR_SCONTROL, sctrl.as_uint64);
>  }
> 
>  #define HV_MAX_TRIES 3
> @@ -398,16 +535,18 @@ void hv_hyp_synic_disable_regs(unsigned int cpu)
>   * that the normal interrupt handling mechanism will find and process the channel interrupt
>   * "very soon", and in the process clear the bit.
>   */
> -static bool hv_synic_event_pending(void)
> +static bool __hv_synic_event_pending(union hv_synic_event_flags *event, int sint)
>  {
> -	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
> -	union hv_synic_event_flags *event =
> -		(union hv_synic_event_flags *)hv_cpu->hyp_synic_event_page + VMBUS_MESSAGE_SINT;
> -	unsigned long *recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
> +	unsigned long *recv_int_page;
>  	bool pending;
>  	u32 relid;
>  	int tries = 0;
> 
> +	if (!event)
> +		return false;
> +
> +	event += sint;
> +	recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
>  retry:
>  	pending = false;
>  	for_each_set_bit(relid, recv_int_page, HV_EVENT_FLAGS_COUNT) {
> @@ -424,6 +563,17 @@ static bool hv_synic_event_pending(void)
>  	return pending;
>  }
> 
> +static bool hv_synic_event_pending(void)
> +{
> +	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
> +	union hv_synic_event_flags *hyp_synic_event_page = hv_cpu->hyp_synic_event_page;
> +	union hv_synic_event_flags *para_synic_event_page = hv_cpu->para_synic_event_page;
> +
> +	return
> +		__hv_synic_event_pending(hyp_synic_event_page, VMBUS_MESSAGE_SINT) ||
> +		__hv_synic_event_pending(para_synic_event_page, VMBUS_MESSAGE_SINT);
> +}
> +
>  static int hv_pick_new_cpu(struct vmbus_channel *channel)
>  {
>  	int ret = -EBUSY;
> @@ -517,6 +667,14 @@ int hv_synic_cleanup(unsigned int cpu)
>  	hv_stimer_legacy_cleanup(cpu);
> 
>  	hv_hyp_synic_disable_regs(cpu);
> +	if (vmbus_is_confidential()) {
> +		hv_para_synic_disable_regs(cpu);
> +		hv_para_synic_disable_interrupts();

The ordering of the interrupt handling here is a bit scary to me, but maybe
it's right. hv_hyp_synic_disable_regs() first masks the SINT, and because it's
a SINT MSR, it does so first in the hypervisor and then in the paravisor.
Then hv_para_synic_disable_regs() resets the SIMP and SIEFP. Finally
hv_para_synic_disable_interrupts() does a global interrupt disable via
the paravisor. I worry about there being gaps between steps where bad
things can happen. Might want to confirm with the hypervisor and
paravisor folks.

When VMBus is not confidential, the sequencing is the same as it is
now, which is good. There's presumably no reason to change that.

This is another aspect of my earlier comment about adding comments
that clearly describe how interrupts are to be setup/disabled when both
SynICs are present.

> +	} else {
> +		hv_hyp_synic_disable_interrupts();
> +	}
> +	if (vmbus_irq != -1)
> +		disable_percpu_irq(vmbus_irq);
> 
>  	return ret;
>  }
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 12/15] Drivers: hv: Allocate encrypted buffers when requested
  2025-06-04  0:43 ` [PATCH hyperv-next v3 12/15] Drivers: hv: Allocate encrypted buffers when requested Roman Kisel
@ 2025-06-18 16:19   ` Michael Kelley
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:19 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> Confidential VMBus is built around using buffers not shared with
> the host.
> 
> Support allocating encrypted buffers when requested.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/channel.c      | 43 +++++++++++++++++++++++----------------
>  drivers/hv/hyperv_vmbus.h |  3 ++-
>  drivers/hv/ring_buffer.c  |  5 +++--
>  3 files changed, 30 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index fb8cd8469328..3e2891c4b800 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -443,20 +443,23 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  		return ret;
>  	}
> 
> -	/*
> -	 * Set the "decrypted" flag to true for the set_memory_decrypted()
> -	 * success case. In the failure case, the encryption state of the
> -	 * memory is unknown. Leave "decrypted" as true to ensure the
> -	 * memory will be leaked instead of going back on the free list.
> -	 */
> -	gpadl->decrypted = true;
> -	ret = set_memory_decrypted((unsigned long)kbuffer,
> -				   PFN_UP(size));
> -	if (ret) {
> -		dev_warn(&channel->device_obj->device,
> -			 "Failed to set host visibility for new GPADL %d.\n",
> -			 ret);
> -		return ret;
> +	gpadl->decrypted = !((channel->co_external_memory && type == HV_GPADL_BUFFER) ||
> +		(channel->co_ring_buffer && type == HV_GPADL_RING));
> +	if (gpadl->decrypted) {
> +		/*
> +		 * Set the "decrypted" flag to true for the set_memory_decrypted()
> +		 * success case. In the failure case, the encryption state of the
> +		 * memory is unknown. Leave "decrypted" as true to ensure the
> +		 * memory will be leaked instead of going back on the free list.
> +		 */

The wording in the comment seems a little weird since the code below isn't setting
the "decrypted" flag. Perhaps:

The "decrypted" flag being true assumes that set_memory_decrypted() succeeds.
But if it fails, the encryption state of the memory is unknown. In that case, leave
"decrypted" as true to ensure the memory is leaked instead of going back on the
free list.

> +		ret = set_memory_decrypted((unsigned long)kbuffer,
> +					PFN_UP(size));
> +		if (ret) {
> +			dev_warn(&channel->device_obj->device,
> +				"Failed to set host visibility for new GPADL %d.\n",
> +				ret);
> +			return ret;

This return is leaking the "msginfo" memory and any "submsginfo" memory that
was allocated by create_gpadl_header(). It looks like that leak is present in the
existing code as well.

> +		}

There's still a problem here. If VMBus is Confidential, then the buffer memory remains
encrypted. Then later in __vmbus_establish_gpadl() if we get to the "cleanup:" label
with ret != 0 due to some error along the way, set_memory_encrypted() is called on
memory that is already encrypted. That should not be done as
set_memory_encrypted/decrypted() are not idempotent.

>  	}
> 
>  	init_completion(&msginfo->waitevent);
> @@ -676,12 +679,13 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  		goto error_clean_ring;
> 
>  	err = hv_ringbuffer_init(&newchannel->outbound,
> -				 page, send_pages, 0);
> +				 page, send_pages, 0, newchannel->co_ring_buffer);
>  	if (err)
>  		goto error_free_gpadl;
> 
>  	err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
> -				 recv_pages, newchannel->max_pkt_size);
> +				 recv_pages, newchannel->max_pkt_size,
> +				 newchannel->co_ring_buffer);
>  	if (err)
>  		goto error_free_gpadl;
> 
> @@ -862,8 +866,11 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl *gpad
> 
>  	kfree(info);
> 
> -	ret = set_memory_encrypted((unsigned long)gpadl->buffer,
> -				   PFN_UP(gpadl->size));
> +	if (gpadl->decrypted)
> +		ret = set_memory_encrypted((unsigned long)gpadl->buffer,
> +					PFN_UP(gpadl->size));
> +	else
> +		ret = 0;
>  	if (ret)
>  		pr_warn("Fail to set mem host visibility in GPADL teardown %d.\n", ret);
> 
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index c1df611d1eb2..0f02e163b0a0 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -199,7 +199,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
>  void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
> 
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> -		       struct page *pages, u32 pagecnt, u32 max_pkt_size);
> +		       struct page *pages, u32 pagecnt, u32 max_pkt_size,
> +			   bool confidential);
> 
>  void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
> 
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 3c9b02471760..05c2cd42fc75 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -183,7 +183,8 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
> 
>  /* Initialize the ring buffer. */
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> -		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
> +		       struct page *pages, u32 page_cnt, u32 max_pkt_size,
> +			   bool confidential)
>  {
>  	struct page **pages_wraparound;
>  	int i;
> @@ -207,7 +208,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> 
>  	ring_info->ring_buffer = (struct hv_ring_buffer *)
>  		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
> -			pgprot_decrypted(PAGE_KERNEL));
> +			confidential ? PAGE_KERNEL : pgprot_decrypted(PAGE_KERNEL));
> 
>  	kfree(pages_wraparound);
>  	if (!ring_info->ring_buffer)
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels
  2025-06-04  0:43 ` [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels Roman Kisel
  2025-06-04 14:15   ` ALOK TIWARI
@ 2025-06-18 16:19   ` Michael Kelley
  1 sibling, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:19 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> To run a confidential VMBus channels, one has to initialize the

How about:

To make use of Confidential VMBus channels, initialize the

> co_ring_buffers and co_external_memory fields of the channel
> structure.
> 
> Advertise support upon negoatiating the version and compute

s/negoatiating/negotiating/

> values for those fields and initialize them.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 19 +++++++++++++++++++
>  drivers/hv/connection.c   |  3 +++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index ca2fe10c110a..33bc29e826bd 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -1021,6 +1021,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>  	struct vmbus_channel_offer_channel *offer;
>  	struct vmbus_channel *oldchannel, *newchannel;
>  	size_t offer_sz;
> +	bool co_ring_buffer, co_external_memory;
> 
>  	offer = (struct vmbus_channel_offer_channel *)hdr;
> 
> @@ -1033,6 +1034,22 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>  		return;
>  	}
> 
> +	co_ring_buffer = is_co_ring_buffer(offer);
> +	if (co_ring_buffer) {
> +		if (vmbus_proto_version < VERSION_WIN10_V6_0 || !vmbus_is_confidential()) {
> +			atomic_dec(&vmbus_connection.offer_in_progress);
> +			return;
> +		}
> +	}
> +
> +	co_external_memory = is_co_external_memory(offer);
> +	if (is_co_external_memory(offer)) {

Use the local variable co_external_memory instead of the function, like with co_ring_buffer?
Consistency .... :-)


> +		if (vmbus_proto_version < VERSION_WIN10_V6_0 || !vmbus_is_confidential()) {
> +			atomic_dec(&vmbus_connection.offer_in_progress);
> +			return;
> +		}

The test for valid vmbus_proto_version and VMBus being confidential is duplicated. You
could do:

	if (co_ring_buffer || co_external_memory)

and have just one copy of the tests. Also I'd suggest adding an error message like
with the "vmbus_is_valid_offer()" test at the start of vmbus_onoffer(). That way
incoming offers aren't silently ignored. Something is wrong on the paravisor or
host side if the tests fail.

Also, since the combination where co_external_memory = true and
co_ring_buffer = false is not allowed, perhaps a check for that invalid
combination should be made here as well.

> +	}
> +
>  	oldchannel = find_primary_channel_by_offer(offer);
> 
>  	if (oldchannel != NULL) {
> @@ -1111,6 +1128,8 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>  		pr_err("Unable to allocate channel object\n");
>  		return;
>  	}
> +	newchannel->co_ring_buffer = co_ring_buffer;
> +	newchannel->co_external_memory = co_external_memory;
> 
>  	vmbus_setup_channel_state(newchannel, offer);
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index be490c598785..eeb472019d69 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -105,6 +105,9 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>  		vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
>  	}
> 
> +	if (vmbus_is_confidential() && version >= VERSION_WIN10_V6_0)
> +		msg->feature_flags = VMBUS_FEATURE_FLAG_CONFIDENTIAL_CHANNELS;
> +
>  	/*
>  	 * shared_gpa_boundary is zero in non-SNP VMs, so it's safe to always
>  	 * bitwise OR it
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH hyperv-next v3 14/15] Drivers: hv: Support establishing the confidential VMBus connection
  2025-06-04  0:43 ` [PATCH hyperv-next v3 14/15] Drivers: hv: Support establishing the confidential VMBus connection Roman Kisel
@ 2025-06-18 16:19   ` Michael Kelley
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Kelley @ 2025-06-18 16:19 UTC (permalink / raw)
  To: Roman Kisel, alok.a.tiwari@oracle.com, arnd@arndb.de,
	bp@alien8.de, corbet@lwn.net, dave.hansen@linux.intel.com,
	decui@microsoft.com, haiyangz@microsoft.com, hpa@zytor.com,
	kys@microsoft.com, mingo@redhat.com, tglx@linutronix.de,
	wei.liu@kernel.org, linux-arch@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com,
	bperkins@microsoft.com, sunilmut@microsoft.com

From: Roman Kisel <romank@linux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
> 
> To establish the confidential VMBus connection the CoCo VM guest
> first attempts to connect to the VMBus server run by the paravisor.
> If that fails, the guest falls back to the non-confidential VMBus.
> 
> Implement that in the VMBus driver initialization.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 169 +++++++++++++++++++++++++++--------------
>  1 file changed, 110 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index f7e82a4fe133..88701c3ad999 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1057,12 +1057,9 @@ static void vmbus_onmessage_work(struct work_struct *work)
>  	kfree(ctx);
>  }
> 
> -void vmbus_on_msg_dpc(unsigned long data)
> +static void __vmbus_on_msg_dpc(void *message_page_addr)
>  {
> -	struct hv_per_cpu_context *hv_cpu = (void *)data;
> -	void *page_addr = hv_cpu->hyp_synic_message_page;
> -	struct hv_message msg_copy, *msg = (struct hv_message *)page_addr +
> -				  VMBUS_MESSAGE_SINT;
> +	struct hv_message msg_copy, *msg;
>  	struct vmbus_channel_message_header *hdr;
>  	enum vmbus_channel_message_type msgtype;
>  	const struct vmbus_channel_message_table_entry *entry;
> @@ -1070,6 +1067,10 @@ void vmbus_on_msg_dpc(unsigned long data)
>  	__u8 payload_size;
>  	u32 message_type;
> 
> +	if (!message_page_addr)
> +		return;
> +	msg = (struct hv_message *)message_page_addr + VMBUS_MESSAGE_SINT;
> +
>  	/*
>  	 * 'enum vmbus_channel_message_type' is supposed to always be 'u32' as
>  	 * it is being used in 'struct vmbus_channel_message_header' definition
> @@ -1195,6 +1196,14 @@ void vmbus_on_msg_dpc(unsigned long data)
>  	vmbus_signal_eom(msg, message_type);
>  }
> 
> +void vmbus_on_msg_dpc(unsigned long data)
> +{
> +	struct hv_per_cpu_context *hv_cpu = (void *)data;
> +
> +	__vmbus_on_msg_dpc(hv_cpu->hyp_synic_message_page);
> +	__vmbus_on_msg_dpc(hv_cpu->para_synic_message_page);
> +}
> +
>  #ifdef CONFIG_PM_SLEEP
>  /*
>   * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
> @@ -1233,21 +1242,19 @@ static void vmbus_force_channel_rescinded(struct
> vmbus_channel *channel)
>  #endif /* CONFIG_PM_SLEEP */
> 
>  /*
> - * Schedule all channels with events pending
> + * Schedule all channels with events pending.
> + * The event page can be directly checked to get the id of
> + * the channel that has the interrupt pending.
>   */
> -static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
> +static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu, void *event_page_addr)

The hv_cpu parameter to this function isn't used.  Couldn't it be removed? Perhaps there's a
case for making the function API more parallel to vmbus_message_sched(), but in my judgment
that's not enough value to warrant passing an unneeded parameter.

>  {
>  	unsigned long *recv_int_page;
>  	u32 maxbits, relid;
> +	union hv_synic_event_flags *event;
> 
> -	/*
> -	 * The event page can be directly checked to get the id of
> -	 * the channel that has the interrupt pending.
> -	 */
> -	void *page_addr = hv_cpu->hyp_synic_event_page;
> -	union hv_synic_event_flags *event
> -		= (union hv_synic_event_flags *)page_addr +
> -					 VMBUS_MESSAGE_SINT;
> +	if (!event_page_addr)
> +		return;
> +	event = (union hv_synic_event_flags *)event_page_addr + VMBUS_MESSAGE_SINT;
> 
>  	maxbits = HV_EVENT_FLAGS_COUNT;
>  	recv_int_page = event->flags;
> @@ -1318,26 +1325,40 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
>  	}
>  }
> 
> -static void vmbus_isr(void)
> +static void vmbus_message_sched(struct hv_per_cpu_context *hv_cpu, void *message_page_addr)
>  {
> -	struct hv_per_cpu_context *hv_cpu
> -		= this_cpu_ptr(hv_context.cpu_context);
> -	void *page_addr;
>  	struct hv_message *msg;
> 
> -	vmbus_chan_sched(hv_cpu);
> -
> -	page_addr = hv_cpu->hyp_synic_message_page;
> -	msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> +	if (!message_page_addr)
> +		return;
> +	msg = (struct hv_message *)message_page_addr + VMBUS_MESSAGE_SINT;
> 
>  	/* Check if there are actual msgs to be processed */
>  	if (msg->header.message_type != HVMSG_NONE) {
>  		if (msg->header.message_type == HVMSG_TIMER_EXPIRED) {
>  			hv_stimer0_isr();
>  			vmbus_signal_eom(msg, HVMSG_TIMER_EXPIRED);
> -		} else
> +		} else {
>  			tasklet_schedule(&hv_cpu->msg_dpc);
> +		}
>  	}
> +}
> +
> +static void vmbus_isr(void)
> +{
> +	struct hv_per_cpu_context *hv_cpu
> +		= this_cpu_ptr(hv_context.cpu_context);
> +
> +	/*
> +	 * Suggested-by: Michael Kelley <mhklinux@outlook.com>
> +	 * One possible optimization would be to keep track of the largest relID that's in use,
> +	 * and only scan up to that relID.
> +	 */

I'd put this comment in vmbus_chan_sched() just before the "for_each_set_bit()"
loop. That's the loop that is doing the scanning.

> +	vmbus_chan_sched(hv_cpu, hv_cpu->hyp_synic_event_page);
> +	vmbus_chan_sched(hv_cpu, hv_cpu->para_synic_event_page);
> +
> +	vmbus_message_sched(hv_cpu, hv_cpu->hyp_synic_message_page);
> +	vmbus_message_sched(hv_cpu, hv_cpu->para_synic_message_page);
> 
>  	add_interrupt_randomness(vmbus_interrupt);
>  }
> @@ -1355,6 +1376,60 @@ static void vmbus_percpu_work(struct work_struct *work)
>  	hv_synic_init(cpu);
>  }
> 
> +static int vmbus_alloc_synic_and_connect(void)
> +{
> +	int ret, cpu;
> +	struct work_struct __percpu *works;
> +	int hyperv_cpuhp_online;
> +
> +	ret = hv_synic_alloc();
> +	if (ret < 0)
> +		goto err_alloc;
> +

Extra blank line here.

> +
> +	works = alloc_percpu(struct work_struct);
> +	if (!works) {
> +		ret = -ENOMEM;
> +		goto err_alloc;
> +	}
> +
> +	/*
> +	 * Initialize the per-cpu interrupt state and stimer state.
> +	 * Then connect to the host.
> +	 */
> +	cpus_read_lock();
> +	for_each_online_cpu(cpu) {
> +		struct work_struct *work = per_cpu_ptr(works, cpu);
> +
> +		INIT_WORK(work, vmbus_percpu_work);
> +		schedule_work_on(cpu, work);
> +	}
> +
> +	for_each_online_cpu(cpu)
> +		flush_work(per_cpu_ptr(works, cpu));
> +
> +	/* Register the callbacks for possible CPU online/offline'ing */
> +	ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
> +						   hv_synic_init, hv_synic_cleanup);
> +	cpus_read_unlock();

There's a subtle problem here. As soon as the lock is released, there could be
vCPUs that start coming online (started by the udev daemon), and running
hv_synic_init(). My assumption is that hv_synic_init() might fail if this invocation
of vmbus_alloc_synic_and_connect() is trying to determine if Confidential VMBus
is supported. A failure will abort the new CPU coming online. That suggests that
hv_synic_init() should not return an error in the case where initializing the paravisor
SynIC doesn't work.

> +	free_percpu(works);
> +	if (ret < 0)
> +		goto err_alloc;
> +	hyperv_cpuhp_online = ret;
> +
> +	ret = vmbus_connect();

When doing vmbus_alloc_synic_and_connect() the first time with "is_confidential"
set to "true", where exactly does the failure occur if the paravisor doesn't support
Confidential VMBus? Since your patches add machinery to detect MSR accesses
against the paravisor that fail, I'm presuming hv_synic_init() will fail for all vCPUs.
But the error return from hv_synic_init() isn't checked (except as I mentioned above
for cpuhp_setup_state). So evidently the failure is detected in vmbus_connect() when
vmbus_negotiate_version() sends the INITIATE_CONTACT message. And presumably
sending the message fails because there's no way to wait for a failure response since
the SynICs didn't initialized. So is it the HV_POST_MESSAGE hypercall that is the
exact point of failure, and if so, what error status is returned? It looks like
vmbus_post_msg() would output an error message. Or is there some other failure
point that I'm missing?

I ask because there's a lot of work done before the failure is detected. Each
vCPU must try to initialize their paravisor SynIC and fail. Then each CPU does
hv_synic_cleanup() as part of cpuhp_remove_state(). Workqueues are created
in vmbus_connect(), and memory is allocated, all of which must be cleaned
up. Then everything is retried with "is_confidential" set to "false". Do you have
any sense of how much elapsed time it takes to get the initial failure and then
cleanup? Consider the case with a large number of vCPUs, all of which must
run hv_synic_init() and then hv_synic_cleanup(). See commit 87c9741a38c4
where this elapsed time was a concern in large VMs.

Can the failure be detected earlier by doing a simple MSR read against
the paravisor for an MSR that's only available if Confidential VMBus is
implemented? Then "is _confidential" could be set correctly before
all the work is done, and the work would only be done once even
if Confidential VMBus isn't supported.

> +	if (ret)
> +		goto err_connect;
> +	return 0;
> +
> +err_connect:
> +	cpuhp_remove_state(hyperv_cpuhp_online);
> +	return -ENODEV;
> +err_alloc:
> +	hv_synic_free();
> +	return -ENOMEM;
> +}
> +
>  /*
>   * vmbus_bus_init -Main vmbus driver initialization routine.
>   *
> @@ -1365,8 +1440,7 @@ static void vmbus_percpu_work(struct work_struct *work)
>   */
>  static int vmbus_bus_init(void)
>  {
> -	int ret, cpu;
> -	struct work_struct __percpu *works;
> +	int ret;
> 
>  	ret = hv_init();
>  	if (ret != 0) {
> @@ -1401,41 +1475,21 @@ static int vmbus_bus_init(void)
>  		}
>  	}
> 
> -	ret = hv_synic_alloc();
> -	if (ret)
> -		goto err_alloc;
> -
> -	works = alloc_percpu(struct work_struct);
> -	if (!works) {
> -		ret = -ENOMEM;
> -		goto err_alloc;
> -	}
> -
>  	/*
> -	 * Initialize the per-cpu interrupt state and stimer state.
> -	 * Then connect to the host.
> +	 * Attempt to establish the confidential VMBus connection first if this VM is
> +	 * a hardware confidential VM, and the paravisor is present.
>  	 */
> -	cpus_read_lock();
> -	for_each_online_cpu(cpu) {
> -		struct work_struct *work = per_cpu_ptr(works, cpu);
> +	ret = -ENODEV;
> +	if (ms_hyperv.paravisor_present && (hv_isolation_type_tdx() || hv_isolation_type_snp())) {
> +		is_confidential = true;
> +		ret = vmbus_alloc_synic_and_connect();
> +		is_confidential = !ret;
> 
> -		INIT_WORK(work, vmbus_percpu_work);
> -		schedule_work_on(cpu, work);
> +		pr_info("VMBus is confidential: %d\n", is_confidential);
>  	}
> 
> -	for_each_online_cpu(cpu)
> -		flush_work(per_cpu_ptr(works, cpu));
> -
> -	/* Register the callbacks for possible CPU online/offline'ing */
> -	ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
> -						   hv_synic_init, hv_synic_cleanup);
> -	cpus_read_unlock();
> -	free_percpu(works);
> -	if (ret < 0)
> -		goto err_alloc;
> -	hyperv_cpuhp_online = ret;
> -
> -	ret = vmbus_connect();
> +	if (!is_confidential)
> +		ret = vmbus_alloc_synic_and_connect();
>  	if (ret)
>  		goto err_connect;
> 
> @@ -1451,9 +1505,6 @@ static int vmbus_bus_init(void)
>  	return 0;
> 
>  err_connect:
> -	cpuhp_remove_state(hyperv_cpuhp_online);
> -err_alloc:
> -	hv_synic_free();
>  	if (vmbus_irq == -1) {
>  		hv_remove_vmbus_handler();
>  	} else {
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2025-06-18 16:19 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-04  0:43 [PATCH hyperv-next v3 00/15] Confidential VMBus Roman Kisel
2025-06-04  0:43 ` [PATCH hyperv-next v3 01/15] Documentation: hyperv: " Roman Kisel
2025-06-04  2:56   ` Randy Dunlap
2025-06-04 12:53   ` Jonathan Corbet
2025-06-04 13:20   ` ALOK TIWARI
2025-06-18 16:17   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 02/15] drivers: hv: VMBus protocol version 6.0 Roman Kisel
2025-06-04  0:43 ` [PATCH hyperv-next v3 03/15] arch: hyperv: Get/set SynIC synth.registers via paravisor Roman Kisel
2025-06-04 13:27   ` ALOK TIWARI
2025-06-18 16:17   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 04/15] arch/x86: mshyperv: Trap on access for some synthetic MSRs Roman Kisel
2025-06-04 13:38   ` ALOK TIWARI
2025-06-18 16:18   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 05/15] Drivers: hv: Rename fields for SynIC message and event pages Roman Kisel
2025-06-18 16:18   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 06/15] Drivers: hv: Allocate the paravisor SynIC pages when required Roman Kisel
2025-06-18 16:18   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 07/15] Drivers: hv: Post messages via the confidential VMBus if available Roman Kisel
2025-06-04 13:48   ` ALOK TIWARI
2025-06-18 16:18   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 08/15] Drivers: hv: remove stale comment Roman Kisel
2025-06-04  0:43 ` [PATCH hyperv-next v3 09/15] Drivers: hv: Use memunmap() to check if the address is in IO map Roman Kisel
2025-06-18 16:18   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 10/15] Drivers: hv: Rename the SynIC enable and disable routines Roman Kisel
2025-06-18 16:19   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 11/15] Drivers: hv: Functions for setting up and tearing down the paravisor SynIC Roman Kisel
2025-06-18 16:19   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 12/15] Drivers: hv: Allocate encrypted buffers when requested Roman Kisel
2025-06-18 16:19   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 13/15] Drivers: hv: Support confidential VMBus channels Roman Kisel
2025-06-04 14:15   ` ALOK TIWARI
2025-06-18 16:19   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 14/15] Drivers: hv: Support establishing the confidential VMBus connection Roman Kisel
2025-06-18 16:19   ` Michael Kelley
2025-06-04  0:43 ` [PATCH hyperv-next v3 15/15] Drivers: hv: Set the default VMBus version to 6.0 Roman Kisel
2025-06-18 16:13 ` [PATCH hyperv-next v3 00/15] Confidential VMBus Michael Kelley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).