* [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support @ 2025-11-24 18:29 Tianyu Lan 2025-11-27 5:15 ` vdso 2025-11-28 17:47 ` Michael Kelley 0 siblings, 2 replies; 8+ messages in thread From: Tianyu Lan @ 2025-11-24 18:29 UTC (permalink / raw) To: kys, haiyangz, wei.liu, decui, longli, vdso Cc: Tianyu Lan, linux-hyperv, linux-kernel In CVM(Confidential VM), system memory is encrypted by default. Device drivers typically use the swiotlb bounce buffer for DMA memory, which is decrypted and shared between the guest and host. Confidential Vmbus, however, supports a confidential channel that employs encrypted memory for the Vmbus ring buffer and external DMA memory. The support for the confidential ring buffer has already been integrated. In CVM, device drivers usually employ the standard DMA API to map DMA memory with the bounce buffer, which remains transparent to the device driver. For external DMA memory support, Hyper-V specific DMA operations are introduced, bypassing the bounce buffer when the confidential external memory flag is set. These DMA operations might also be reused for TDISP devices in the future, which also support DMA operations with encrypted memory. The DMA operations used are global architecture DMA operations (for details, see get_arch_dma_ops() and get_dma_ops()), and there is no need to set up for each device individually. Signed-off-by: Tianyu Lan <tiala@microsoft.com> --- drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 89 insertions(+), 1 deletion(-) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 0dc4692b411a..ca31231b2c32 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -39,6 +39,9 @@ #include <clocksource/hyperv_timer.h> #include <asm/mshyperv.h> #include "hyperv_vmbus.h" +#include "../../kernel/dma/direct.h" + +extern const struct dma_map_ops *dma_ops; struct vmbus_dynid { struct list_head node; @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) return -ENOMEM; } + +static bool hyperv_private_memory_dma(struct device *dev) +{ + struct hv_device *hv_dev = device_to_hv_device(dev); + + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) + return true; + else + return false; +} + +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + phys_addr_t phys = page_to_phys(page) + offset; + + if (hyperv_private_memory_dma(dev)) + return __phys_to_dma(dev, phys); + else + return dma_direct_map_phys(dev, phys, size, dir, attrs); +} + +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + if (!hyperv_private_memory_dma(dev)) + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); +} + +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, + int nelems, enum dma_data_direction dir, + unsigned long attrs) +{ + struct scatterlist *sg; + dma_addr_t dma_addr; + int i; + + if (hyperv_private_memory_dma(dev)) { + for_each_sg(sgl, sg, nelems, i) { + dma_addr = __phys_to_dma(dev, sg_phys(sg)); + sg_dma_address(sg) = dma_addr; + sg_dma_len(sg) = sg->length; + } + + return nelems; + } else { + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); + } +} + +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, + int nelems, enum dma_data_direction dir, unsigned long attrs) +{ + if 
(!hyperv_private_memory_dma(dev)) + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); +} + +static int hyperv_dma_supported(struct device *dev, u64 mask) +{ + dev->coherent_dma_mask = mask; + return 1; +} + +static size_t hyperv_dma_max_mapping_size(struct device *dev) +{ + if (hyperv_private_memory_dma(dev)) + return SIZE_MAX; + else + return swiotlb_max_mapping_size(dev); +} + +const struct dma_map_ops hyperv_dma_ops = { + .map_page = hyperv_dma_map_page, + .unmap_page = hyperv_dma_unmap_page, + .map_sg = hyperv_dma_map_sg, + .unmap_sg = hyperv_dma_unmap_sg, + .dma_supported = hyperv_dma_supported, + .max_mapping_size = hyperv_dma_max_mapping_size, +}; + /* * vmbus_bus_init -Main vmbus driver initialization routine. * @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) * doing that on each VP while initializing SynIC's wastes time. */ is_confidential = ms_hyperv.confidential_vmbus_available; - if (is_confidential) + if (is_confidential) { + dma_ops = &hyperv_dma_ops; pr_info("Establishing connection to the confidential VMBus\n"); + } + hv_para_set_sint_proxy(!is_confidential); ret = vmbus_alloc_synic_and_connect(); if (ret) -- 2.50.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
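[Illustration only, not part of the patch: a driver-side sketch of what the confidential path changes. The buffer and length names below are invented, and treating the DMA address as identical to the guest physical address is an assumption that holds only without a vIOMMU or bus offset.]

        /* Driver-side view of a streaming DMA mapping on a confidential channel. */
        void *buf = kmalloc(len, GFP_KERNEL);            /* stays private/encrypted */
        dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        /*
         * With hyperv_dma_ops installed and co_external_memory set, "dma" is
         * simply the guest physical address of "buf" -- no copy into the
         * shared (decrypted) swiotlb pool is made.  On the default CVM path,
         * the data would first be bounced into swiotlb memory and "dma"
         * would point into that shared pool instead.
         */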
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-24 18:29 [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support Tianyu Lan @ 2025-11-27 5:15 ` vdso 2025-11-27 9:16 ` Tianyu Lan 2025-11-28 17:47 ` Michael Kelley 1 sibling, 1 reply; 8+ messages in thread From: vdso @ 2025-11-27 5:15 UTC (permalink / raw) To: ltykernel Cc: decui, haiyangz, kys, linux-hyperv, linux-kernel, longli, tiala, vdso, wei.liu From: Roman Kisel <vdso@hexbites.dev>
Tianyu Lan wrote: > In CVM(Confidential VM), system memory is encrypted > by default. Device drivers typically use the swiotlb > bounce buffer for DMA memory, which is decrypted > and shared between the guest and host. Confidential > Vmbus, however, supports a confidential channel > that employs encrypted memory for the Vmbus ring > buffer and external DMA memory. The support for > the confidential ring buffer has already been > integrated. > > In CVM, device drivers usually employ the standard > DMA API to map DMA memory with the bounce buffer, > which remains transparent to the device driver. > For external DMA memory support, Hyper-V specific > DMA operations are introduced, bypassing the bounce > buffer when the confidential external memory flag > is set. These DMA operations might also be reused > for TDISP devices in the future, which also support > DMA operations with encrypted memory. > > The DMA operations used are global architecture > DMA operations (for details, see get_arch_dma_ops() > and get_dma_ops()), and there is no need to set up > for each device individually. > > Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Tianyu, Looks great to me! Reviewed-by: Roman Kisel <vdso@hexbites.dev>
P.S. For the inclined reader, here is my old attempt at solving this, which covered only storage and was not satisfactory in other ways: https://lore.kernel.org/linux-hyperv/20250409000835.285105-6-romank@linux.microsoft.com/ https://lore.kernel.org/linux-hyperv/20250409000835.285105-7-romank@linux.microsoft.com/ Maybe it'd be a good idea to CC folks who provided feedback back then.
> --- > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 89 insertions(+), 1 deletion(-) > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index 0dc4692b411a..ca31231b2c32 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -39,6 +39,9 @@ > #include <clocksource/hyperv_timer.h> > #include <asm/mshyperv.h> > #include "hyperv_vmbus.h" > +#include "../../kernel/dma/direct.h" > + > +extern const struct dma_map_ops *dma_ops; > > struct vmbus_dynid { > struct list_head node; > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > return -ENOMEM; > } > > + > +static bool hyperv_private_memory_dma(struct device *dev) > +{ > + struct hv_device *hv_dev = device_to_hv_device(dev); > + > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > + return true; > + else > + return false; > +} > + > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > + unsigned long offset, size_t size, > + enum dma_data_direction dir, > + unsigned long attrs) > +{ > + phys_addr_t phys = page_to_phys(page) + offset; > + > + if (hyperv_private_memory_dma(dev)) > + return __phys_to_dma(dev, phys); > + else > + return dma_direct_map_phys(dev, phys, size, dir, attrs); > +} > + > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > + size_t size, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > +} > + > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, > + unsigned long attrs) > +{ > + struct scatterlist *sg; > + dma_addr_t dma_addr; > + int i; > + > + if (hyperv_private_memory_dma(dev)) { > + for_each_sg(sgl, sg, nelems, i) { > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > + sg_dma_address(sg) = dma_addr; > + sg_dma_len(sg) = sg->length; > + } > + > + return nelems; > + } else { > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > + } > +} > + > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > +} > + > +static int hyperv_dma_supported(struct device *dev, u64 mask) > +{ > + dev->coherent_dma_mask = mask; > + return 1; > +} > + > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > +{ > + if (hyperv_private_memory_dma(dev)) > + return SIZE_MAX; > + else > + return swiotlb_max_mapping_size(dev); > +} > + > +const struct dma_map_ops hyperv_dma_ops = { > + .map_page = hyperv_dma_map_page, > + .unmap_page = hyperv_dma_unmap_page, > + .map_sg = hyperv_dma_map_sg, > + .unmap_sg = hyperv_dma_unmap_sg, > + .dma_supported = hyperv_dma_supported, > + .max_mapping_size = hyperv_dma_max_mapping_size, > +}; > + > /* > * vmbus_bus_init -Main vmbus driver initialization routine. > * > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > * doing that on each VP while initializing SynIC's wastes time. 
> */ > is_confidential = ms_hyperv.confidential_vmbus_available; > - if (is_confidential) > + if (is_confidential) { > + dma_ops = &hyperv_dma_ops; > pr_info("Establishing connection to the confidential VMBus\n"); > + } > + > hv_para_set_sint_proxy(!is_confidential); > ret = vmbus_alloc_synic_and_connect(); > if (ret) > -- > 2.50.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-27 5:15 ` vdso @ 2025-11-27 9:16 ` Tianyu Lan 0 siblings, 0 replies; 8+ messages in thread From: Tianyu Lan @ 2025-11-27 9:16 UTC (permalink / raw) To: vdso Cc: decui, haiyangz, kys, linux-hyperv, linux-kernel, longli, tiala, wei.liu On Thu, Nov 27, 2025 at 1:16 PM <vdso@hexbites.dev> wrote: > > From: Roman Kisel <vdso@hexbites.dev> > > Tianyu Lan wrote: > > > In CVM(Confidential VM), system memory is encrypted > > by default. Device drivers typically use the swiotlb > > bounce buffer for DMA memory, which is decrypted > > and shared between the guest and host. Confidential > > Vmbus, however, supports a confidential channel > > that employs encrypted memory for the Vmbus ring > > buffer and external DMA memory. The support for > > the confidential ring buffer has already been > > integrated. > > > > In CVM, device drivers usually employ the standard > > DMA API to map DMA memory with the bounce buffer, > > which remains transparent to the device driver. > > For external DMA memory support, Hyper-V specific > > DMA operations are introduced, bypassing the bounce > > buffer when the confidential external memory flag > > is set. These DMA operations might also be reused > > for TDISP devices in the future, which also support > > DMA operations with encrypted memory. > > > > The DMA operations used are global architecture > > DMA operations (for details, see get_arch_dma_ops() > > and get_dma_ops()), and there is no need to set up > > for each device individually. > > > > Signed-off-by: Tianyu Lan <tiala@microsoft.com> > > Tinayu, > > Looks great to me! > > Reviewed-by: Roman Kisel <vdso@hexbites.dev> > > P.S. For the inclined reader, here is how the old, only for > storage and not satisfactory in other ways my attempt to solve this: > > https://lore.kernel.org/linux-hyperv/20250409000835.285105-6-romank@linux.microsoft.com/ > https://lore.kernel.org/linux-hyperv/20250409000835.285105-7-romank@linux.microsoft.com/ > > Maybe it'd be a good idea to CC folks who provided feedback back then. > Hi Roman: Thanks for your review. I will follow your suggestion. 
Thanks > > --- > > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 89 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > > index 0dc4692b411a..ca31231b2c32 100644 > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -39,6 +39,9 @@ > > #include <clocksource/hyperv_timer.h> > > #include <asm/mshyperv.h> > > #include "hyperv_vmbus.h" > > +#include "../../kernel/dma/direct.h" > > + > > +extern const struct dma_map_ops *dma_ops; > > > > struct vmbus_dynid { > > struct list_head node; > > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > > return -ENOMEM; > > } > > > > + > > +static bool hyperv_private_memory_dma(struct device *dev) > > +{ > > + struct hv_device *hv_dev = device_to_hv_device(dev); > > + > > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > > + return true; > > + else > > + return false; > > +} > > + > > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > > + unsigned long offset, size_t size, > > + enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + phys_addr_t phys = page_to_phys(page) + offset; > > + > > + if (hyperv_private_memory_dma(dev)) > > + return __phys_to_dma(dev, phys); > > + else > > + return dma_direct_map_phys(dev, phys, size, dir, attrs); > > +} > > + > > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > > + size_t size, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > > +} > > + > > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + struct scatterlist *sg; > > + dma_addr_t dma_addr; > > + int i; > > + > > + if (hyperv_private_memory_dma(dev)) { > > + for_each_sg(sgl, sg, nelems, i) { > > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > > + sg_dma_address(sg) = dma_addr; > > + sg_dma_len(sg) = sg->length; > > + } > > + > > + return nelems; > > + } else { > > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > > + } > > +} > > + > > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > > +} > > + > > +static int hyperv_dma_supported(struct device *dev, u64 mask) > > +{ > > + dev->coherent_dma_mask = mask; > > + return 1; > > +} > > + > > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > > +{ > > + if (hyperv_private_memory_dma(dev)) > > + return SIZE_MAX; > > + else > > + return swiotlb_max_mapping_size(dev); > > +} > > + > > +const struct dma_map_ops hyperv_dma_ops = { > > + .map_page = hyperv_dma_map_page, > > + .unmap_page = hyperv_dma_unmap_page, > > + .map_sg = hyperv_dma_map_sg, > > + .unmap_sg = hyperv_dma_unmap_sg, > > + .dma_supported = hyperv_dma_supported, > > + .max_mapping_size = hyperv_dma_max_mapping_size, > > +}; > > + > > /* > > * vmbus_bus_init -Main vmbus driver initialization routine. > > * > > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > > * doing that on each VP while initializing SynIC's wastes time. 
> > */ > > is_confidential = ms_hyperv.confidential_vmbus_available; > > - if (is_confidential) > > + if (is_confidential) { > > + dma_ops = &hyperv_dma_ops; > > pr_info("Establishing connection to the confidential VMBus\n"); > > + } > > + > > hv_para_set_sint_proxy(!is_confidential); > > ret = vmbus_alloc_synic_and_connect(); > > if (ret) > > -- > > 2.50.1 -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-24 18:29 [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support Tianyu Lan 2025-11-27 5:15 ` vdso @ 2025-11-28 17:47 ` Michael Kelley 2025-12-03 14:21 ` Tianyu Lan 1 sibling, 1 reply; 8+ messages in thread From: Michael Kelley @ 2025-11-28 17:47 UTC (permalink / raw) To: Tianyu Lan, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev Cc: Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > In CVM(Confidential VM), system memory is encrypted > by default. Device drivers typically use the swiotlb > bounce buffer for DMA memory, which is decrypted > and shared between the guest and host. Confidential > Vmbus, however, supports a confidential channel s/Vmbus/VMBus/ [and elsewhere in this commit msg] > that employs encrypted memory for the Vmbus ring > buffer and external DMA memory. The support for > the confidential ring buffer has already been > integrated. > > In CVM, device drivers usually employ the standard > DMA API to map DMA memory with the bounce buffer, > which remains transparent to the device driver. > For external DMA memory support, The "external memory" terminology is not at all clear in the context of Linux. Presumably the terminology came from Hyper-V or the paravisor, but I never understood what the "external" refers to. Maybe it is memory that is "external" with respect to the hypervisor, and therefore not shared between the hypervisor and guest? But that meaning won't be evident to other reviewers in the Linux kernel community. In any case, some explanation of "external memory" is needed. And even consider changing the field names in the code to be something that makes more sense to Linux. > Hyper-V specific > DMA operations are introduced, bypassing the bounce > buffer when the confidential external memory flag > is set. This commit message never explains why it is important to not do bounce buffering. There's the obvious but unstated reason of improving performance, but that's not the main reason. The main reason is a confidentiality leak. When available, Confidential VMBus would be used because it keeps the DMA data private (i.e., encrypted) and confidential to the guest. Bounce buffering copies the data to shared (i.e., decrypted) swiotlb memory, where it is exposed to the hypervisor. That's a confidentiality leak, and is the primary reason the bounce buffering must be eliminated. This requirement was pointed out by Robin Murphy in the discussion of Roman Kisel's original code to eliminate the bounce buffering. Separately, I have major qualms about using an approach with Hyper-V specific DMA operations. As implemented here, these DMA operations bypass all the kernel DMA layer logic when the VMBus synthetic device indicates "external memory". In that case, the supplied physical address is just used as the DMA address and everything else is bypassed. While this actually works for VMBus synthetic devices because of their simple requirements, it also is implicitly making some assumptions. These assumptions are true today (as far as I know) but won't necessarily be true in the future. The assumptions include: * There's no vIOMMU in the guest VM. IOMMUs in CoCo VMs have their own issues to work out, but an implementation that bypasses any IOMMU logic in the DMA layer is very short-term thinking. 
* There are no PCI pass-thru devices in the guest. Since the implementation changes the global dma_ops, the drivers for such PCI pass-thru devices would use the same global dma_ops, and would fail. The code also has some other problems that I point out further down. In any case, an approach with Hyper-V specific DMA operations seems like a very temporary fix (hack?) at best. Further down, I'll proposed at alternate approach that preserves all the existing DMA layer functionality. > These DMA operations might also be reused > for TDISP devices in the future, which also support > DMA operations with encrypted memory. > > The DMA operations used are global architecture > DMA operations (for details, see get_arch_dma_ops() > and get_dma_ops()), and there is no need to set up > for each device individually. > > Signed-off-by: Tianyu Lan <tiala@microsoft.com> > --- > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 89 insertions(+), 1 deletion(-) > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index 0dc4692b411a..ca31231b2c32 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -39,6 +39,9 @@ > #include <clocksource/hyperv_timer.h> > #include <asm/mshyperv.h> > #include "hyperv_vmbus.h" > +#include "../../kernel/dma/direct.h" > + > +extern const struct dma_map_ops *dma_ops; > > struct vmbus_dynid { > struct list_head node; > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > return -ENOMEM; > } > > + > +static bool hyperv_private_memory_dma(struct device *dev) > +{ > + struct hv_device *hv_dev = device_to_hv_device(dev); device_to_hv_device() works only when "dev" is for a VMBus device. As noted above, if a CoCo VM were ever to have a PCI pass-thru device doing DMA, "dev" would be some PCI device, and device_to_hv_device() would return garbage. > + > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > + return true; > + else > + return false; > +} > + > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > + unsigned long offset, size_t size, > + enum dma_data_direction dir, > + unsigned long attrs) > +{ > + phys_addr_t phys = page_to_phys(page) + offset; > + > + if (hyperv_private_memory_dma(dev)) > + return __phys_to_dma(dev, phys); > + else > + return dma_direct_map_phys(dev, phys, size, dir, attrs); This code won't build when the VMBus driver is built as a module. dma_direct_map_phys() is in inline function that references several other DMA functions that aren't exported because they aren't intended to be used by drivers. Same problems occur with dma_direct_unmap_phys() and similar functions used below. 
> +} > + > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > + size_t size, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > +} > + > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, > + unsigned long attrs) > +{ > + struct scatterlist *sg; > + dma_addr_t dma_addr; > + int i; > + > + if (hyperv_private_memory_dma(dev)) { > + for_each_sg(sgl, sg, nelems, i) { > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > + sg_dma_address(sg) = dma_addr; > + sg_dma_len(sg) = sg->length; > + } > + > + return nelems; > + } else { > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > + } > +} > + > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > +} > + > +static int hyperv_dma_supported(struct device *dev, u64 mask) > +{ > + dev->coherent_dma_mask = mask; > + return 1; > +} > + > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > +{ > + if (hyperv_private_memory_dma(dev)) > + return SIZE_MAX; > + else > + return swiotlb_max_mapping_size(dev); > +} > + > +const struct dma_map_ops hyperv_dma_ops = { > + .map_page = hyperv_dma_map_page, > + .unmap_page = hyperv_dma_unmap_page, > + .map_sg = hyperv_dma_map_sg, > + .unmap_sg = hyperv_dma_unmap_sg, > + .dma_supported = hyperv_dma_supported, > + .max_mapping_size = hyperv_dma_max_mapping_size, > +}; > + > /* > * vmbus_bus_init -Main vmbus driver initialization routine. > * > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > * doing that on each VP while initializing SynIC's wastes time. > */ > is_confidential = ms_hyperv.confidential_vmbus_available; > - if (is_confidential) > + if (is_confidential) { > + dma_ops = &hyperv_dma_ops; arm64 doesn't have the global variable dma_ops, so this patch won't build on arm64, even if Confidential VMBus isn't yet available for arm64. > pr_info("Establishing connection to the confidential VMBus\n"); > + } > + > hv_para_set_sint_proxy(!is_confidential); > ret = vmbus_alloc_synic_and_connect(); > if (ret) > -- > 2.50.1 > Here's my idea for an alternate approach. The goal is to allow use of the swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the default swiotlb memory. For VMBus devices, the calling sequence is vmbus_device_register() -> device_register() -> device_initialize() -> swiotlb_dev_init(). But if vmbus_device_register() could override the dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations would be disabled on the device. Furthermore, is_swiotlb_force_bounce() would return "false", and the normal DMA functions would not force the use of bounce buffers. The entire code change looks like this: --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) child_device_obj->device.dma_mask = &child_device_obj->dma_mask; dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); + device_initialize(&child_device_obj->device); + if (child_device_obj->channel->co_external_memory) + child_device_obj->device.dma_io_tlb_mem = NULL; + /* * Register with the LDM. 
This will kick off the driver/device * binding...which will eventually call vmbus_match() and vmbus_probe() */ - ret = device_register(&child_device_obj->device); + ret = device_add(&child_device_obj->device); if (ret) { pr_err("Unable to register child device\n"); put_device(&child_device_obj->device); I've only compile tested the above since I don't have an environment where I can test Confidential VMBus. You would need to verify whether my thinking is correct and this produces the intended result. Directly setting dma_io_tlb_mem to NULL isn't great. It would be better to add an exported function swiotlb_dev_disable() to swiotlb code that sets dma_io_tlb_mem to NULL, but you get the idea. Other reviewers may still see this approach as a bit of a hack, but it's a lot less of a hack than introducing Hyper-V specific DMA functions. swiotlb_dev_disable() is conceptually needed for TDISP devices, as TDISP devices must similarly protect confidentiality by not allowing use of the swiotlb. So adding swiotlb_dev_disable() is a step in the right direction, even if the eventual TDISP code does it slightly differently. Doing the disable on a per-device basis is also the right thing in the long run. Michael ^ permalink raw reply [flat|nested] 8+ messages in thread
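[Sketch only, not existing kernel code: the swiotlb_dev_disable() helper suggested above could look roughly like the following. The dma_io_tlb_mem field is only present when CONFIG_SWIOTLB is enabled, so a real version would also need a stub for the !CONFIG_SWIOTLB case.]

        /* kernel/dma/swiotlb.c -- hypothetical helper */
        void swiotlb_dev_disable(struct device *dev)
        {
                /*
                 * Detach the device from the default swiotlb pool so that
                 * is_swiotlb_force_bounce() returns false and streaming DMA
                 * mappings are done in place instead of being bounced.
                 */
                dev->dma_io_tlb_mem = NULL;
        }
        EXPORT_SYMBOL_GPL(swiotlb_dev_disable);

[vmbus_device_register() would then call swiotlb_dev_disable(&child_device_obj->device) for channels with co_external_memory set, instead of poking dma_io_tlb_mem directly as in the diff above.]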
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-28 17:47 ` Michael Kelley @ 2025-12-03 14:21 ` Tianyu Lan 2025-12-04 3:35 ` Michael Kelley 0 siblings, 1 reply; 8+ messages in thread From: Tianyu Lan @ 2025-12-03 14:21 UTC (permalink / raw) To: Michael Kelley, Christoph Hellwig, Robin Murphy Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > > > In CVM(Confidential VM), system memory is encrypted > > by default. Device drivers typically use the swiotlb > > bounce buffer for DMA memory, which is decrypted > > and shared between the guest and host. Confidential > > Vmbus, however, supports a confidential channel >
Hi Michael: Thanks for your review. I spent some time testing and still need the paravisor (HCL) team to double check.
> s/Vmbus/VMBus/ [and elsewhere in this commit msg] >
Will update.
> > that employs encrypted memory for the Vmbus ring > > buffer and external DMA memory. The support for > > the confidential ring buffer has already been > > integrated. > > > > In CVM, device drivers usually employ the standard > > DMA API to map DMA memory with the bounce buffer, > > which remains transparent to the device driver. > > For external DMA memory support, > > The "external memory" terminology is not at all clear in the context > of Linux. Presumably the terminology came from Hyper-V or the > paravisor, but I never understood what the "external" refers to. Maybe > it is memory that is "external" with respect to the hypervisor, and therefore > not shared between the hypervisor and guest? But that meaning won't be > evident to other reviewers in the Linux kernel community. In any case, some > explanation of "external memory" is needed. And even consider changing the > field names in the code to be something that makes more sense to Linux. >
Agree. For VMbus devices, there are two major communication methods with the host: * VMbus ring buffer. * Dynamic DMA transactions. The confidential ring buffer has been upstreamed and merged into the Linux 6.18 kernel by Roman. Additionally, confidential DMA transactions, also known as external memory, allow the use of private/encrypted memory as DMA memory.
> > Hyper-V specific > > DMA operations are introduced, bypassing the bounce > > buffer when the confidential external memory flag > > is set. > > This commit message never explains why it is important to not do bounce > buffering. There's the obvious but unstated reason of improving performance, > but that's not the main reason. The main reason is a confidentiality leak. > When available, Confidential VMBus would be used because it keeps the > DMA data private (i.e., encrypted) and confidential to the guest. Bounce > buffering copies the data to shared (i.e., decrypted) swiotlb memory, where > it is exposed to the hypervisor. That's a confidentiality leak, and is the > primary reason the bounce buffering must be eliminated. This requirement > was pointed out by Robin Murphy in the discussion of Roman Kisel's > original code to eliminate the bounce buffering. >
For a CVM on Hyper-V, there are typically two layers: VTL0 and VTL2. VTL2 plays a role similar to that of an L1 hypervisor for the normal guest in VTL0.
Communication between these two layers can utilize private or encrypted memory directly, as they both reside within the same CVM. This differs from the communication between the host and VTL0/VTL2, where shared or decrypted memory (bounce buffer) is required. For Confidential VMbus devices in CVM, the device module (emulated device code) in VTL2 and interacts with the normal guest in VTL0. As a result, the Confidential VMbus device's ring buffer and DMA transactions may use private or encrypted memory. > Separately, I have major qualms about using an approach with Hyper-V specific > DMA operations. As implemented here, these DMA operations bypass all > the kernel DMA layer logic when the VMBus synthetic device indicates > "external memory". In that case, the supplied physical address is just used > as the DMA address and everything else is bypassed. While this actually works > for VMBus synthetic devices because of their simple requirements, it also > is implicitly making some assumptions. These assumptions are true today > (as far as I know) but won't necessarily be true in the future. The assumptions > include: > > * There's no vIOMMU in the guest VM. IOMMUs in CoCo VMs have their > own issues to work out, but an implementation that bypasses any IOMMU > logic in the DMA layer is very short-term thinking. > > * There are no PCI pass-thru devices in the guest. Since the implementation > changes the global dma_ops, the drivers for such PCI pass-thru devices would > use the same global dma_ops, and would fail. For PCI pass-thru devices, it may be reused with additional changes(Co-work with Dexuan for this support) and we may rework the current callbacks to meet the future requirement. > > The code also has some other problems that I point out further down. > > In any case, an approach with Hyper-V specific DMA operations seems like > a very temporary fix (hack?) at best. Further down, I'll proposed at alternate > approach that preserves all the existing DMA layer functionality. > > > These DMA operations might also be reused > > for TDISP devices in the future, which also support > > DMA operations with encrypted memory. > > > > The DMA operations used are global architecture > > DMA operations (for details, see get_arch_dma_ops() > > and get_dma_ops()), and there is no need to set up > > for each device individually. > > > > Signed-off-by: Tianyu Lan <tiala@microsoft.com> > > --- > > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 89 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > > index 0dc4692b411a..ca31231b2c32 100644 > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -39,6 +39,9 @@ > > #include <clocksource/hyperv_timer.h> > > #include <asm/mshyperv.h> > > #include "hyperv_vmbus.h" > > +#include "../../kernel/dma/direct.h" > > + > > +extern const struct dma_map_ops *dma_ops; > > > > struct vmbus_dynid { > > struct list_head node; > > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > > return -ENOMEM; > > } > > > > + > > +static bool hyperv_private_memory_dma(struct device *dev) > > +{ > > + struct hv_device *hv_dev = device_to_hv_device(dev); > > device_to_hv_device() works only when "dev" is for a VMBus device. As noted above, > if a CoCo VM were ever to have a PCI pass-thru device doing DMA, "dev" would > be some PCI device, and device_to_hv_device() would return garbage. Yes, Nice catch. 
PCI device support is not included in this patch and still on the way and post this with ConfidentialVMbus device for review first. > > > + > > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > > + return true; > > + else > > + return false; > > +} > > + > > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > > + unsigned long offset, size_t size, > > + enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + phys_addr_t phys = page_to_phys(page) + offset; > > + > > + if (hyperv_private_memory_dma(dev)) > > + return __phys_to_dma(dev, phys); > > + else > > + return dma_direct_map_phys(dev, phys, size, dir, attrs); > > This code won't build when the VMBus driver is built as a module. > dma_direct_map_phys() is in inline function that references several other > DMA functions that aren't exported because they aren't intended to be > used by drivers. Same problems occur with dma_direct_unmap_phys() > and similar functions used below. Yes, Nice catch! We may add a function to register dma ops in the built-in file and call it in the VMbus driver. > > > +} > > + > > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > > + size_t size, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > > +} > > + > > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + struct scatterlist *sg; > > + dma_addr_t dma_addr; > > + int i; > > + > > + if (hyperv_private_memory_dma(dev)) { > > + for_each_sg(sgl, sg, nelems, i) { > > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > > + sg_dma_address(sg) = dma_addr; > > + sg_dma_len(sg) = sg->length; > > + } > > + > > + return nelems; > > + } else { > > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > > + } > > +} > > + > > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > > +} > > + > > +static int hyperv_dma_supported(struct device *dev, u64 mask) > > +{ > > + dev->coherent_dma_mask = mask; > > + return 1; > > +} > > + > > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > > +{ > > + if (hyperv_private_memory_dma(dev)) > > + return SIZE_MAX; > > + else > > + return swiotlb_max_mapping_size(dev); > > +} > > + > > +const struct dma_map_ops hyperv_dma_ops = { > > + .map_page = hyperv_dma_map_page, > > + .unmap_page = hyperv_dma_unmap_page, > > + .map_sg = hyperv_dma_map_sg, > > + .unmap_sg = hyperv_dma_unmap_sg, > > + .dma_supported = hyperv_dma_supported, > > + .max_mapping_size = hyperv_dma_max_mapping_size, > > +}; > > + > > /* > > * vmbus_bus_init -Main vmbus driver initialization routine. > > * > > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > > * doing that on each VP while initializing SynIC's wastes time. > > */ > > is_confidential = ms_hyperv.confidential_vmbus_available; > > - if (is_confidential) > > + if (is_confidential) { > > + dma_ops = &hyperv_dma_ops; > > arm64 doesn't have the global variable dma_ops, so this patch > won't build on arm64, even if Confidential VMBus isn't yet available > for arm64. Yes, there should be a stub function for both ARM and x86 to implement it.. 
> > > pr_info("Establishing connection to the confidential VMBus\n"); > > + } > > + > > hv_para_set_sint_proxy(!is_confidential); > > ret = vmbus_alloc_synic_and_connect(); > > if (ret) > > -- > > 2.50.1 > > > > Here's my idea for an alternate approach. The goal is to allow use of the > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > default swiotlb memory. For VMBus devices, the calling sequence is > vmbus_device_register() -> device_register() -> device_initialize() -> > swiotlb_dev_init(). But if vmbus_device_register() could override the > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > would return "false", and the normal DMA functions would not force the > use of bounce buffers. The entire code change looks like this: > > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > + device_initialize(&child_device_obj->device); > + if (child_device_obj->channel->co_external_memory) > + child_device_obj->device.dma_io_tlb_mem = NULL; > + > /* > * Register with the LDM. This will kick off the driver/device > * binding...which will eventually call vmbus_match() and vmbus_probe() > */ > - ret = device_register(&child_device_obj->device); > + ret = device_add(&child_device_obj->device); > if (ret) { > pr_err("Unable to register child device\n"); > put_device(&child_device_obj->device); > > I've only compile tested the above since I don't have an environment where > I can test Confidential VMBus. You would need to verify whether my thinking > is correct and this produces the intended result. Thanks Michael. I tested it and it seems to hit an issue. Will double check.with HCL/paravisor team. We considered such a change before. From Roman's previous patch, it seems to need to change phys_to_dma() and force_dma_unencrypted().. > > Directly setting dma_io_tlb_mem to NULL isn't great. It would be better > to add an exported function swiotlb_dev_disable() to swiotlb code that sets > dma_io_tlb_mem to NULL, but you get the idea. > > Other reviewers may still see this approach as a bit of a hack, but it's a lot > less of a hack than introducing Hyper-V specific DMA functions. > swiotlb_dev_disable() is conceptually needed for TDISP devices, as TDISP > devices must similarly protect confidentiality by not allowing use of the swiotlb. > So adding swiotlb_dev_disable() is a step in the right direction, even if the > eventual TDISP code does it slightly differently. Doing the disable on a > per-device basis is also the right thing in the long run. > Agree. Include Christoph and Robin here. Now, we have three options to disable swiotlb bounce buffer to per-device. 1) Add platform DMA ops and check platform condition(e.g, Confidential VMbus or TDISP device) in platform callback before calling stand DMA API which may use a bounce buffer. 2) Directly setting dma_io_tlb_mem to NULL in the device/bus level driver. 
It does not seem to be enough for a Confidential VM, which may use a high address bit to identify memory as encrypted or decrypted (for details, please see Roman's previous patches): https://lore.kernel.org/linux-hyperv/20250409000835.285105-6-romank@linux.microsoft.com/ https://lore.kernel.org/linux-hyperv/20250409000835.285105-7-romank@linux.microsoft.com/ 3) Add a new standard swiotlb function to disable the use of the bounce buffer for the device. -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
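[For context on options 2) and 3): whether the direct-mapping path bounces is governed by a check along these lines, paraphrased from include/linux/swiotlb.h; the exact code may differ by kernel version.]

        static inline bool is_swiotlb_force_bounce(struct device *dev)
        {
                struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

                /*
                 * With dev->dma_io_tlb_mem set to NULL this returns false,
                 * and dma_direct_map_page()/dma_direct_map_sg() map the
                 * buffer in place rather than bouncing it.
                 */
                return mem && mem->force_bounce;
        }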
* RE: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-12-03 14:21 ` Tianyu Lan @ 2025-12-04 3:35 ` Michael Kelley 2025-12-04 11:35 ` Tianyu Lan 0 siblings, 1 reply; 8+ messages in thread From: Michael Kelley @ 2025-12-04 3:35 UTC (permalink / raw) To: Tianyu Lan, Christoph Hellwig, Robin Murphy Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org From: Tianyu Lan <ltykernel@gmail.com> Sent: Wednesday, December 3, 2025 6:21 AM > > On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM [snip] > > > > Here's my idea for an alternate approach. The goal is to allow use of the > > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > > default swiotlb memory. For VMBus devices, the calling sequence is > > vmbus_device_register() -> device_register() -> device_initialize() -> > > swiotlb_dev_init(). But if vmbus_device_register() could override the > > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > > would return "false", and the normal DMA functions would not force the > > use of bounce buffers. The entire code change looks like this: > > > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > > > + device_initialize(&child_device_obj->device); > > + if (child_device_obj->channel->co_external_memory) > > + child_device_obj->device.dma_io_tlb_mem = NULL; > > + > > /* > > * Register with the LDM. This will kick off the driver/device > > * binding...which will eventually call vmbus_match() and vmbus_probe() > > */ > > - ret = device_register(&child_device_obj->device); > > + ret = device_add(&child_device_obj->device); > > if (ret) { > > pr_err("Unable to register child device\n"); > > put_device(&child_device_obj->device); > > > > I've only compile tested the above since I don't have an environment where > > I can test Confidential VMBus. You would need to verify whether my thinking > > is correct and this produces the intended result. > > Thanks Michael. I tested it and it seems to hit an issue. Will double check.with > HCL/paravisor team. > > We considered such a change before. From Roman's previous patch, it seems to > need to change phys_to_dma() and force_dma_unencrypted(). In a Hyper-V SEV-SNP VM with a paravisor, I assert that phys_to_dma() and __phys_to_dma() do the same thing. phys_to_dma() calls dma_addr_encrypted(), which does __sme_set(). But in a Hyper-V VM using vTOM, sme_me_mask is always 0, so dma_addr_encrypted() is a no-op. dma_addr_unencrypted() and dma_addr_canonical() are also no-ops. See include/linux/mem_encrypt.h. So in a Hyper-V SEV-SNP VM, the DMA layer doesn't change anything related to encryption when translating between a physical address and a DMA address. Same thing is true for a Hyper-V TDX VM with paravisor. force_dma_unencrypted() will indeed return "true", and it is used in phys_to_dma_direct(). 
But both return paths in phys_to_dma_direct() return the same result because of dma_addr_unencrypted() and dma_addr_encrypted() being no-ops. Other uses of force_dma_unencrypted() are only in the dma_alloc_*() paths, but dma_alloc_*() isn't used by VMBus devices because the device control structures are in the ring buffer, which as you have noted, is already handled separately. So for the moment, I don't think the return value from force_dma_unencrypted() matters. So I'm guessing something else unexpected is happening such that just disabling the swiotlb on a per-device basis doesn't work. Assuming that Roman's original patch actually worked, I'm trying to figure out how my idea is different in a way that has a material effect on things. And if your patch works by going directly to __phys_to_dma(), it should also work when using phys_to_dma() instead. I will try a few experiments on a normal Confidential VM (i.e., without Confidential VMBus) to confirm that my conclusions from reading the code really are correct. FWIW, I'm looking at the linux-next20251119 code base. Michael > > > > > Directly setting dma_io_tlb_mem to NULL isn't great. It would be better > > to add an exported function swiotlb_dev_disable() to swiotlb code that sets > > dma_io_tlb_mem to NULL, but you get the idea. > > > > Other reviewers may still see this approach as a bit of a hack, but it's a lot > > less of a hack than introducing Hyper-V specific DMA functions. > > swiotlb_dev_disable() is conceptually needed for TDISP devices, as TDISP > > devices must similarly protect confidentiality by not allowing use of the swiotlb. > > So adding swiotlb_dev_disable() is a step in the right direction, even if the > > eventual TDISP code does it slightly differently. Doing the disable on a > > per-device basis is also the right thing in the long run. > > ^ permalink raw reply [flat|nested] 8+ messages in thread
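[To make the equivalence above concrete: with sme_me_mask == 0, the helpers reduce to roughly the following, paraphrased from include/linux/dma-direct.h and include/linux/mem_encrypt.h, not exact kernel source.]

        static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
        {
                /*
                 * dma_addr_encrypted() is __sme_set(), i.e. "addr | sme_me_mask".
                 * In a Hyper-V CVM using vTOM, sme_me_mask is 0, so this is a
                 * no-op and phys_to_dma() gives the same result as
                 * __phys_to_dma().
                 */
                return dma_addr_encrypted(__phys_to_dma(dev, paddr));
        }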
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-12-04 3:35 ` Michael Kelley @ 2025-12-04 11:35 ` Tianyu Lan 2025-12-05 4:10 ` Tianyu Lan 0 siblings, 1 reply; 8+ messages in thread From: Tianyu Lan @ 2025-12-04 11:35 UTC (permalink / raw) To: Michael Kelley Cc: Christoph Hellwig, Robin Murphy, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Dec 4, 2025 at 11:35 AM Michael Kelley <mhklinux@outlook.com> wrote: > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Wednesday, December 3, 2025 6:21 AM > > > > On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > [snip] > > > > > > > Here's my idea for an alternate approach. The goal is to allow use of the > > > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > > > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > > > default swiotlb memory. For VMBus devices, the calling sequence is > > > vmbus_device_register() -> device_register() -> device_initialize() -> > > > swiotlb_dev_init(). But if vmbus_device_register() could override the > > > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > > > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > > > would return "false", and the normal DMA functions would not force the > > > use of bounce buffers. The entire code change looks like this: > > > > > > --- a/drivers/hv/vmbus_drv.c > > > +++ b/drivers/hv/vmbus_drv.c > > > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > > > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > > > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > > > > > + device_initialize(&child_device_obj->device); > > > + if (child_device_obj->channel->co_external_memory) > > > + child_device_obj->device.dma_io_tlb_mem = NULL; > > > + > > > /* > > > * Register with the LDM. This will kick off the driver/device > > > * binding...which will eventually call vmbus_match() and vmbus_probe() > > > */ > > > - ret = device_register(&child_device_obj->device); > > > + ret = device_add(&child_device_obj->device); > > > if (ret) { > > > pr_err("Unable to register child device\n"); > > > put_device(&child_device_obj->device); > > > > > > I've only compile tested the above since I don't have an environment where > > > I can test Confidential VMBus. You would need to verify whether my thinking > > > is correct and this produces the intended result. > > > > Thanks Michael. I tested it and it seems to hit an issue. Will double check.with > > HCL/paravisor team. > > > > We considered such a change before. From Roman's previous patch, it seems to > > need to change phys_to_dma() and force_dma_unencrypted(). > > In a Hyper-V SEV-SNP VM with a paravisor, I assert that phys_to_dma() and > __phys_to_dma() do the same thing. phys_to_dma() calls dma_addr_encrypted(), > which does __sme_set(). But in a Hyper-V VM using vTOM, sme_me_mask is > always 0, so dma_addr_encrypted() is a no-op. dma_addr_unencrypted() and > dma_addr_canonical() are also no-ops. See include/linux/mem_encrypt.h. So > in a Hyper-V SEV-SNP VM, the DMA layer doesn't change anything related to > encryption when translating between a physical address and a DMA address. 
> Same thing is true for a Hyper-V TDX VM with paravisor. > > force_dma_unencrypted() will indeed return "true", and it is used in > phys_to_dma_direct(). But both return paths in phys_to_dma_direct() return the > same result because of dma_addr_unencrypted() and dma_addr_encrypted() > being no-ops. Other uses of force_dma_unencrypted() are only in the > dma_alloc_*() paths, but dma_alloc_*() isn't used by VMBus devices because > the device control structures are in the ring buffer, which as you have noted, is > already handled separately. So for the moment, I don't think the return value > from force_dma_unencrypted() matters. > > So I'm guessing something else unexpected is happening such that just disabling > the swiotlb on a per-device basis doesn't work. Assuming that Roman's original > patch actually worked, I'm trying to figure out how my idea is different in a way > that has a material effect on things. And if your patch works by going directly to > __phys_to_dma(), it should also work when using phys_to_dma() instead. >
I agree with your analysis, and your proposal is much simpler. The issue I hit was that a lot of user-space tasks were blocked after several minutes of FIO testing. I also reproduced it with the DMA ops patch after a longer test, so I need to double check whether we missed something or whether it's caused by another issue. If it's related to the private/shared addresses used in the normal guest, we may debug it in the paravisor. -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-12-04 11:35 ` Tianyu Lan @ 2025-12-05 4:10 ` Tianyu Lan 0 siblings, 0 replies; 8+ messages in thread From: Tianyu Lan @ 2025-12-05 4:10 UTC (permalink / raw) To: Michael Kelley Cc: Christoph Hellwig, Robin Murphy, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Dec 4, 2025 at 7:35 PM Tianyu Lan <ltykernel@gmail.com> wrote: > > On Thu, Dec 4, 2025 at 11:35 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Wednesday, December 3, 2025 6:21 AM > > > > > > On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > > > [snip] > > > > > > > > > > Here's my idea for an alternate approach. The goal is to allow use of the > > > > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > > > > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > > > > default swiotlb memory. For VMBus devices, the calling sequence is > > > > vmbus_device_register() -> device_register() -> device_initialize() -> > > > > swiotlb_dev_init(). But if vmbus_device_register() could override the > > > > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > > > > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > > > > would return "false", and the normal DMA functions would not force the > > > > use of bounce buffers. The entire code change looks like this: > > > > > > > > --- a/drivers/hv/vmbus_drv.c > > > > +++ b/drivers/hv/vmbus_drv.c > > > > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > > > > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > > > > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > > > > > > > + device_initialize(&child_device_obj->device); > > > > + if (child_device_obj->channel->co_external_memory) > > > > + child_device_obj->device.dma_io_tlb_mem = NULL; > > > > + > > > > /* > > > > * Register with the LDM. This will kick off the driver/device > > > > * binding...which will eventually call vmbus_match() and vmbus_probe() > > > > */ > > > > - ret = device_register(&child_device_obj->device); > > > > + ret = device_add(&child_device_obj->device); > > > > if (ret) { > > > > pr_err("Unable to register child device\n"); > > > > put_device(&child_device_obj->device); > > > > > > > > I've only compile tested the above since I don't have an environment where > > > > I can test Confidential VMBus. You would need to verify whether my thinking > > > > is correct and this produces the intended result. > > > > > > Thanks Michael. I tested it and it seems to hit an issue. Will double check.with > > > HCL/paravisor team. > > > > > > We considered such a change before. From Roman's previous patch, it seems to > > > need to change phys_to_dma() and force_dma_unencrypted(). > > > > In a Hyper-V SEV-SNP VM with a paravisor, I assert that phys_to_dma() and > > __phys_to_dma() do the same thing. phys_to_dma() calls dma_addr_encrypted(), > > which does __sme_set(). But in a Hyper-V VM using vTOM, sme_me_mask is > > always 0, so dma_addr_encrypted() is a no-op. dma_addr_unencrypted() and > > dma_addr_canonical() are also no-ops. See include/linux/mem_encrypt.h. 
So > > in a Hyper-V SEV-SNP VM, the DMA layer doesn't change anything related to > > encryption when translating between a physical address and a DMA address. > > Same thing is true for a Hyper-V TDX VM with paravisor. > > > > force_dma_unencrypted() will indeed return "true", and it is used in > > phys_to_dma_direct(). But both return paths in phys_to_dma_direct() return the > > same result because of dma_addr_unencrypted() and dma_addr_encrypted() > > being no-ops. Other uses of force_dma_unencrypted() are only in the > > dma_alloc_*() paths, but dma_alloc_*() isn't used by VMBus devices because > > the device control structures are in the ring buffer, which as you have noted, is > > already handled separately. So for the moment, I don't think the return value > > from force_dma_unencrypted() matters. > >
dma_alloc_*() is used by PCI device drivers (e.g., the MANA NIC driver), and if we need to support TDISP devices, the change in force_dma_unencrypted() is still necessary: the DMA address for a TDISP device should be encrypted memory with sme_me_mask set. From this point of view, Hyper-V specific DMA ops may resolve this without changes in the DMA core code. Otherwise, we still need to add a callback or other flag there for platforms to check whether it should return an encrypted or decrypted address to drivers.
> > So I'm guessing something else unexpected is happening such that just disabling > > the swiotlb on a per-device basis doesn't work. Assuming that Roman's original > > patch actually worked, I'm trying to figure out how my idea is different in a way > > that has a material effect on things. And if your patch works by going directly to > > __phys_to_dma(), it should also work when using phys_to_dma() instead. > >
The issue I hit should not be related to disabling the bounce buffer. I don't see any failure to map DMA memory with the bounce buffer. For disabling swiotlb per device, it looks like setting dma_io_tlb_mem to NULL works and makes is_swiotlb_force_bounce() return false. -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
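[Purely hypothetical sketch of the "callback or other flag" idea mentioned in the message above; neither the flag nor this check exists in the kernel today, and the existing x86 logic is simplified here.]

        /* x86 force_dma_unencrypted(), extended with a per-device opt-out: */
        bool force_dma_unencrypted(struct device *dev)
        {
                if (dev->dma_allows_encrypted)  /* hypothetical per-device flag */
                        return false;

                return cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT);
        }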