* [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support @ 2025-11-24 18:29 Tianyu Lan 2025-11-27 5:15 ` vdso 2025-11-28 17:47 ` Michael Kelley 0 siblings, 2 replies; 8+ messages in thread From: Tianyu Lan @ 2025-11-24 18:29 UTC (permalink / raw) To: kys, haiyangz, wei.liu, decui, longli, vdso Cc: Tianyu Lan, linux-hyperv, linux-kernel In CVM(Confidential VM), system memory is encrypted by default. Device drivers typically use the swiotlb bounce buffer for DMA memory, which is decrypted and shared between the guest and host. Confidential Vmbus, however, supports a confidential channel that employs encrypted memory for the Vmbus ring buffer and external DMA memory. The support for the confidential ring buffer has already been integrated. In CVM, device drivers usually employ the standard DMA API to map DMA memory with the bounce buffer, which remains transparent to the device driver. For external DMA memory support, Hyper-V specific DMA operations are introduced, bypassing the bounce buffer when the confidential external memory flag is set. These DMA operations might also be reused for TDISP devices in the future, which also support DMA operations with encrypted memory. The DMA operations used are global architecture DMA operations (for details, see get_arch_dma_ops() and get_dma_ops()), and there is no need to set up for each device individually. Signed-off-by: Tianyu Lan <tiala@microsoft.com> --- drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 89 insertions(+), 1 deletion(-) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 0dc4692b411a..ca31231b2c32 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -39,6 +39,9 @@ #include <clocksource/hyperv_timer.h> #include <asm/mshyperv.h> #include "hyperv_vmbus.h" +#include "../../kernel/dma/direct.h" + +extern const struct dma_map_ops *dma_ops; struct vmbus_dynid { struct list_head node; @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) return -ENOMEM; } + +static bool hyperv_private_memory_dma(struct device *dev) +{ + struct hv_device *hv_dev = device_to_hv_device(dev); + + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) + return true; + else + return false; +} + +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + phys_addr_t phys = page_to_phys(page) + offset; + + if (hyperv_private_memory_dma(dev)) + return __phys_to_dma(dev, phys); + else + return dma_direct_map_phys(dev, phys, size, dir, attrs); +} + +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + if (!hyperv_private_memory_dma(dev)) + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); +} + +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, + int nelems, enum dma_data_direction dir, + unsigned long attrs) +{ + struct scatterlist *sg; + dma_addr_t dma_addr; + int i; + + if (hyperv_private_memory_dma(dev)) { + for_each_sg(sgl, sg, nelems, i) { + dma_addr = __phys_to_dma(dev, sg_phys(sg)); + sg_dma_address(sg) = dma_addr; + sg_dma_len(sg) = sg->length; + } + + return nelems; + } else { + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); + } +} + +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, + int nelems, enum dma_data_direction dir, unsigned long attrs) +{ + if 
(!hyperv_private_memory_dma(dev)) + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); +} + +static int hyperv_dma_supported(struct device *dev, u64 mask) +{ + dev->coherent_dma_mask = mask; + return 1; +} + +static size_t hyperv_dma_max_mapping_size(struct device *dev) +{ + if (hyperv_private_memory_dma(dev)) + return SIZE_MAX; + else + return swiotlb_max_mapping_size(dev); +} + +const struct dma_map_ops hyperv_dma_ops = { + .map_page = hyperv_dma_map_page, + .unmap_page = hyperv_dma_unmap_page, + .map_sg = hyperv_dma_map_sg, + .unmap_sg = hyperv_dma_unmap_sg, + .dma_supported = hyperv_dma_supported, + .max_mapping_size = hyperv_dma_max_mapping_size, +}; + /* * vmbus_bus_init -Main vmbus driver initialization routine. * @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) * doing that on each VP while initializing SynIC's wastes time. */ is_confidential = ms_hyperv.confidential_vmbus_available; - if (is_confidential) + if (is_confidential) { + dma_ops = &hyperv_dma_ops; pr_info("Establishing connection to the confidential VMBus\n"); + } + hv_para_set_sint_proxy(!is_confidential); ret = vmbus_alloc_synic_and_connect(); if (ret) -- 2.50.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
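[Illustration only, not part of the patch: a driver-side sketch of what the confidential path changes. The buffer and length names below are invented, and treating the DMA address as identical to the guest physical address is an assumption that holds only without a vIOMMU or bus offset.]

        /* Driver-side view of a streaming DMA mapping on a confidential channel. */
        void *buf = kmalloc(len, GFP_KERNEL);            /* stays private/encrypted */
        dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        /*
         * With hyperv_dma_ops installed and co_external_memory set, "dma" is
         * simply the guest physical address of "buf" -- no copy into the
         * shared (decrypted) swiotlb pool is made.  On the default CVM path,
         * the data would first be bounced into swiotlb memory and "dma"
         * would point into that shared pool instead.
         */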
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-24 18:29 [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support Tianyu Lan @ 2025-11-27 5:15 ` vdso 2025-11-27 9:16 ` Tianyu Lan 2025-11-28 17:47 ` Michael Kelley 1 sibling, 1 reply; 8+ messages in thread From: vdso @ 2025-11-27 5:15 UTC (permalink / raw) To: ltykernel Cc: decui, haiyangz, kys, linux-hyperv, linux-kernel, longli, tiala, vdso, wei.liu From: Roman Kisel <vdso@hexbites.dev>
Tianyu Lan wrote: > In CVM(Confidential VM), system memory is encrypted > by default. Device drivers typically use the swiotlb > bounce buffer for DMA memory, which is decrypted > and shared between the guest and host. Confidential > Vmbus, however, supports a confidential channel > that employs encrypted memory for the Vmbus ring > buffer and external DMA memory. The support for > the confidential ring buffer has already been > integrated. > > In CVM, device drivers usually employ the standard > DMA API to map DMA memory with the bounce buffer, > which remains transparent to the device driver. > For external DMA memory support, Hyper-V specific > DMA operations are introduced, bypassing the bounce > buffer when the confidential external memory flag > is set. These DMA operations might also be reused > for TDISP devices in the future, which also support > DMA operations with encrypted memory. > > The DMA operations used are global architecture > DMA operations (for details, see get_arch_dma_ops() > and get_dma_ops()), and there is no need to set up > for each device individually. > > Signed-off-by: Tianyu Lan <tiala@microsoft.com>
Tianyu, Looks great to me! Reviewed-by: Roman Kisel <vdso@hexbites.dev>
P.S. For the inclined reader, here is my old attempt at solving this, which covered only storage and was not satisfactory in other ways: https://lore.kernel.org/linux-hyperv/20250409000835.285105-6-romank@linux.microsoft.com/ https://lore.kernel.org/linux-hyperv/20250409000835.285105-7-romank@linux.microsoft.com/ Maybe it'd be a good idea to CC folks who provided feedback back then.
> --- > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 89 insertions(+), 1 deletion(-) > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index 0dc4692b411a..ca31231b2c32 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -39,6 +39,9 @@ > #include <clocksource/hyperv_timer.h> > #include <asm/mshyperv.h> > #include "hyperv_vmbus.h" > +#include "../../kernel/dma/direct.h" > + > +extern const struct dma_map_ops *dma_ops; > > struct vmbus_dynid { > struct list_head node; > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > return -ENOMEM; > } > > + > +static bool hyperv_private_memory_dma(struct device *dev) > +{ > + struct hv_device *hv_dev = device_to_hv_device(dev); > + > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > + return true; > + else > + return false; > +} > + > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > + unsigned long offset, size_t size, > + enum dma_data_direction dir, > + unsigned long attrs) > +{ > + phys_addr_t phys = page_to_phys(page) + offset; > + > + if (hyperv_private_memory_dma(dev)) > + return __phys_to_dma(dev, phys); > + else > + return dma_direct_map_phys(dev, phys, size, dir, attrs); > +} > + > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > + size_t size, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > +} > + > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, > + unsigned long attrs) > +{ > + struct scatterlist *sg; > + dma_addr_t dma_addr; > + int i; > + > + if (hyperv_private_memory_dma(dev)) { > + for_each_sg(sgl, sg, nelems, i) { > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > + sg_dma_address(sg) = dma_addr; > + sg_dma_len(sg) = sg->length; > + } > + > + return nelems; > + } else { > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > + } > +} > + > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > +} > + > +static int hyperv_dma_supported(struct device *dev, u64 mask) > +{ > + dev->coherent_dma_mask = mask; > + return 1; > +} > + > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > +{ > + if (hyperv_private_memory_dma(dev)) > + return SIZE_MAX; > + else > + return swiotlb_max_mapping_size(dev); > +} > + > +const struct dma_map_ops hyperv_dma_ops = { > + .map_page = hyperv_dma_map_page, > + .unmap_page = hyperv_dma_unmap_page, > + .map_sg = hyperv_dma_map_sg, > + .unmap_sg = hyperv_dma_unmap_sg, > + .dma_supported = hyperv_dma_supported, > + .max_mapping_size = hyperv_dma_max_mapping_size, > +}; > + > /* > * vmbus_bus_init -Main vmbus driver initialization routine. > * > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > * doing that on each VP while initializing SynIC's wastes time. 
> */ > is_confidential = ms_hyperv.confidential_vmbus_available; > - if (is_confidential) > + if (is_confidential) { > + dma_ops = &hyperv_dma_ops; > pr_info("Establishing connection to the confidential VMBus\n"); > + } > + > hv_para_set_sint_proxy(!is_confidential); > ret = vmbus_alloc_synic_and_connect(); > if (ret) > -- > 2.50.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-27 5:15 ` vdso @ 2025-11-27 9:16 ` Tianyu Lan 0 siblings, 0 replies; 8+ messages in thread From: Tianyu Lan @ 2025-11-27 9:16 UTC (permalink / raw) To: vdso Cc: decui, haiyangz, kys, linux-hyperv, linux-kernel, longli, tiala, wei.liu On Thu, Nov 27, 2025 at 1:16 PM <vdso@hexbites.dev> wrote: > > From: Roman Kisel <vdso@hexbites.dev> > > Tianyu Lan wrote: > > > In CVM(Confidential VM), system memory is encrypted > > by default. Device drivers typically use the swiotlb > > bounce buffer for DMA memory, which is decrypted > > and shared between the guest and host. Confidential > > Vmbus, however, supports a confidential channel > > that employs encrypted memory for the Vmbus ring > > buffer and external DMA memory. The support for > > the confidential ring buffer has already been > > integrated. > > > > In CVM, device drivers usually employ the standard > > DMA API to map DMA memory with the bounce buffer, > > which remains transparent to the device driver. > > For external DMA memory support, Hyper-V specific > > DMA operations are introduced, bypassing the bounce > > buffer when the confidential external memory flag > > is set. These DMA operations might also be reused > > for TDISP devices in the future, which also support > > DMA operations with encrypted memory. > > > > The DMA operations used are global architecture > > DMA operations (for details, see get_arch_dma_ops() > > and get_dma_ops()), and there is no need to set up > > for each device individually. > > > > Signed-off-by: Tianyu Lan <tiala@microsoft.com> > > Tinayu, > > Looks great to me! > > Reviewed-by: Roman Kisel <vdso@hexbites.dev> > > P.S. For the inclined reader, here is how the old, only for > storage and not satisfactory in other ways my attempt to solve this: > > https://lore.kernel.org/linux-hyperv/20250409000835.285105-6-romank@linux.microsoft.com/ > https://lore.kernel.org/linux-hyperv/20250409000835.285105-7-romank@linux.microsoft.com/ > > Maybe it'd be a good idea to CC folks who provided feedback back then. > Hi Roman: Thanks for your review. I will follow your suggestion. 
Thanks > > --- > > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 89 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > > index 0dc4692b411a..ca31231b2c32 100644 > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -39,6 +39,9 @@ > > #include <clocksource/hyperv_timer.h> > > #include <asm/mshyperv.h> > > #include "hyperv_vmbus.h" > > +#include "../../kernel/dma/direct.h" > > + > > +extern const struct dma_map_ops *dma_ops; > > > > struct vmbus_dynid { > > struct list_head node; > > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > > return -ENOMEM; > > } > > > > + > > +static bool hyperv_private_memory_dma(struct device *dev) > > +{ > > + struct hv_device *hv_dev = device_to_hv_device(dev); > > + > > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > > + return true; > > + else > > + return false; > > +} > > + > > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > > + unsigned long offset, size_t size, > > + enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + phys_addr_t phys = page_to_phys(page) + offset; > > + > > + if (hyperv_private_memory_dma(dev)) > > + return __phys_to_dma(dev, phys); > > + else > > + return dma_direct_map_phys(dev, phys, size, dir, attrs); > > +} > > + > > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > > + size_t size, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > > +} > > + > > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + struct scatterlist *sg; > > + dma_addr_t dma_addr; > > + int i; > > + > > + if (hyperv_private_memory_dma(dev)) { > > + for_each_sg(sgl, sg, nelems, i) { > > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > > + sg_dma_address(sg) = dma_addr; > > + sg_dma_len(sg) = sg->length; > > + } > > + > > + return nelems; > > + } else { > > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > > + } > > +} > > + > > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > > +} > > + > > +static int hyperv_dma_supported(struct device *dev, u64 mask) > > +{ > > + dev->coherent_dma_mask = mask; > > + return 1; > > +} > > + > > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > > +{ > > + if (hyperv_private_memory_dma(dev)) > > + return SIZE_MAX; > > + else > > + return swiotlb_max_mapping_size(dev); > > +} > > + > > +const struct dma_map_ops hyperv_dma_ops = { > > + .map_page = hyperv_dma_map_page, > > + .unmap_page = hyperv_dma_unmap_page, > > + .map_sg = hyperv_dma_map_sg, > > + .unmap_sg = hyperv_dma_unmap_sg, > > + .dma_supported = hyperv_dma_supported, > > + .max_mapping_size = hyperv_dma_max_mapping_size, > > +}; > > + > > /* > > * vmbus_bus_init -Main vmbus driver initialization routine. > > * > > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > > * doing that on each VP while initializing SynIC's wastes time. 
> > */ > > is_confidential = ms_hyperv.confidential_vmbus_available; > > - if (is_confidential) > > + if (is_confidential) { > > + dma_ops = &hyperv_dma_ops; > > pr_info("Establishing connection to the confidential VMBus\n"); > > + } > > + > > hv_para_set_sint_proxy(!is_confidential); > > ret = vmbus_alloc_synic_and_connect(); > > if (ret) > > -- > > 2.50.1 -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-24 18:29 [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support Tianyu Lan 2025-11-27 5:15 ` vdso @ 2025-11-28 17:47 ` Michael Kelley 2025-12-03 14:21 ` Tianyu Lan 1 sibling, 1 reply; 8+ messages in thread From: Michael Kelley @ 2025-11-28 17:47 UTC (permalink / raw) To: Tianyu Lan, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev Cc: Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > In CVM(Confidential VM), system memory is encrypted > by default. Device drivers typically use the swiotlb > bounce buffer for DMA memory, which is decrypted > and shared between the guest and host. Confidential > Vmbus, however, supports a confidential channel s/Vmbus/VMBus/ [and elsewhere in this commit msg] > that employs encrypted memory for the Vmbus ring > buffer and external DMA memory. The support for > the confidential ring buffer has already been > integrated. > > In CVM, device drivers usually employ the standard > DMA API to map DMA memory with the bounce buffer, > which remains transparent to the device driver. > For external DMA memory support, The "external memory" terminology is not at all clear in the context of Linux. Presumably the terminology came from Hyper-V or the paravisor, but I never understood what the "external" refers to. Maybe it is memory that is "external" with respect to the hypervisor, and therefore not shared between the hypervisor and guest? But that meaning won't be evident to other reviewers in the Linux kernel community. In any case, some explanation of "external memory" is needed. And even consider changing the field names in the code to be something that makes more sense to Linux. > Hyper-V specific > DMA operations are introduced, bypassing the bounce > buffer when the confidential external memory flag > is set. This commit message never explains why it is important to not do bounce buffering. There's the obvious but unstated reason of improving performance, but that's not the main reason. The main reason is a confidentiality leak. When available, Confidential VMBus would be used because it keeps the DMA data private (i.e., encrypted) and confidential to the guest. Bounce buffering copies the data to shared (i.e., decrypted) swiotlb memory, where it is exposed to the hypervisor. That's a confidentiality leak, and is the primary reason the bounce buffering must be eliminated. This requirement was pointed out by Robin Murphy in the discussion of Roman Kisel's original code to eliminate the bounce buffering. Separately, I have major qualms about using an approach with Hyper-V specific DMA operations. As implemented here, these DMA operations bypass all the kernel DMA layer logic when the VMBus synthetic device indicates "external memory". In that case, the supplied physical address is just used as the DMA address and everything else is bypassed. While this actually works for VMBus synthetic devices because of their simple requirements, it also is implicitly making some assumptions. These assumptions are true today (as far as I know) but won't necessarily be true in the future. The assumptions include: * There's no vIOMMU in the guest VM. IOMMUs in CoCo VMs have their own issues to work out, but an implementation that bypasses any IOMMU logic in the DMA layer is very short-term thinking. 
* There are no PCI pass-thru devices in the guest. Since the implementation changes the global dma_ops, the drivers for such PCI pass-thru devices would use the same global dma_ops, and would fail. The code also has some other problems that I point out further down. In any case, an approach with Hyper-V specific DMA operations seems like a very temporary fix (hack?) at best. Further down, I'll proposed at alternate approach that preserves all the existing DMA layer functionality. > These DMA operations might also be reused > for TDISP devices in the future, which also support > DMA operations with encrypted memory. > > The DMA operations used are global architecture > DMA operations (for details, see get_arch_dma_ops() > and get_dma_ops()), and there is no need to set up > for each device individually. > > Signed-off-by: Tianyu Lan <tiala@microsoft.com> > --- > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 89 insertions(+), 1 deletion(-) > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index 0dc4692b411a..ca31231b2c32 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -39,6 +39,9 @@ > #include <clocksource/hyperv_timer.h> > #include <asm/mshyperv.h> > #include "hyperv_vmbus.h" > +#include "../../kernel/dma/direct.h" > + > +extern const struct dma_map_ops *dma_ops; > > struct vmbus_dynid { > struct list_head node; > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > return -ENOMEM; > } > > + > +static bool hyperv_private_memory_dma(struct device *dev) > +{ > + struct hv_device *hv_dev = device_to_hv_device(dev); device_to_hv_device() works only when "dev" is for a VMBus device. As noted above, if a CoCo VM were ever to have a PCI pass-thru device doing DMA, "dev" would be some PCI device, and device_to_hv_device() would return garbage. > + > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > + return true; > + else > + return false; > +} > + > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > + unsigned long offset, size_t size, > + enum dma_data_direction dir, > + unsigned long attrs) > +{ > + phys_addr_t phys = page_to_phys(page) + offset; > + > + if (hyperv_private_memory_dma(dev)) > + return __phys_to_dma(dev, phys); > + else > + return dma_direct_map_phys(dev, phys, size, dir, attrs); This code won't build when the VMBus driver is built as a module. dma_direct_map_phys() is in inline function that references several other DMA functions that aren't exported because they aren't intended to be used by drivers. Same problems occur with dma_direct_unmap_phys() and similar functions used below. 
> +} > + > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > + size_t size, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > +} > + > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, > + unsigned long attrs) > +{ > + struct scatterlist *sg; > + dma_addr_t dma_addr; > + int i; > + > + if (hyperv_private_memory_dma(dev)) { > + for_each_sg(sgl, sg, nelems, i) { > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > + sg_dma_address(sg) = dma_addr; > + sg_dma_len(sg) = sg->length; > + } > + > + return nelems; > + } else { > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > + } > +} > + > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > + int nelems, enum dma_data_direction dir, unsigned long attrs) > +{ > + if (!hyperv_private_memory_dma(dev)) > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > +} > + > +static int hyperv_dma_supported(struct device *dev, u64 mask) > +{ > + dev->coherent_dma_mask = mask; > + return 1; > +} > + > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > +{ > + if (hyperv_private_memory_dma(dev)) > + return SIZE_MAX; > + else > + return swiotlb_max_mapping_size(dev); > +} > + > +const struct dma_map_ops hyperv_dma_ops = { > + .map_page = hyperv_dma_map_page, > + .unmap_page = hyperv_dma_unmap_page, > + .map_sg = hyperv_dma_map_sg, > + .unmap_sg = hyperv_dma_unmap_sg, > + .dma_supported = hyperv_dma_supported, > + .max_mapping_size = hyperv_dma_max_mapping_size, > +}; > + > /* > * vmbus_bus_init -Main vmbus driver initialization routine. > * > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > * doing that on each VP while initializing SynIC's wastes time. > */ > is_confidential = ms_hyperv.confidential_vmbus_available; > - if (is_confidential) > + if (is_confidential) { > + dma_ops = &hyperv_dma_ops; arm64 doesn't have the global variable dma_ops, so this patch won't build on arm64, even if Confidential VMBus isn't yet available for arm64. > pr_info("Establishing connection to the confidential VMBus\n"); > + } > + > hv_para_set_sint_proxy(!is_confidential); > ret = vmbus_alloc_synic_and_connect(); > if (ret) > -- > 2.50.1 > Here's my idea for an alternate approach. The goal is to allow use of the swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the default swiotlb memory. For VMBus devices, the calling sequence is vmbus_device_register() -> device_register() -> device_initialize() -> swiotlb_dev_init(). But if vmbus_device_register() could override the dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations would be disabled on the device. Furthermore, is_swiotlb_force_bounce() would return "false", and the normal DMA functions would not force the use of bounce buffers. The entire code change looks like this: --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) child_device_obj->device.dma_mask = &child_device_obj->dma_mask; dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); + device_initialize(&child_device_obj->device); + if (child_device_obj->channel->co_external_memory) + child_device_obj->device.dma_io_tlb_mem = NULL; + /* * Register with the LDM. 
This will kick off the driver/device * binding...which will eventually call vmbus_match() and vmbus_probe() */ - ret = device_register(&child_device_obj->device); + ret = device_add(&child_device_obj->device); if (ret) { pr_err("Unable to register child device\n"); put_device(&child_device_obj->device); I've only compile tested the above since I don't have an environment where I can test Confidential VMBus. You would need to verify whether my thinking is correct and this produces the intended result. Directly setting dma_io_tlb_mem to NULL isn't great. It would be better to add an exported function swiotlb_dev_disable() to swiotlb code that sets dma_io_tlb_mem to NULL, but you get the idea. Other reviewers may still see this approach as a bit of a hack, but it's a lot less of a hack than introducing Hyper-V specific DMA functions. swiotlb_dev_disable() is conceptually needed for TDISP devices, as TDISP devices must similarly protect confidentiality by not allowing use of the swiotlb. So adding swiotlb_dev_disable() is a step in the right direction, even if the eventual TDISP code does it slightly differently. Doing the disable on a per-device basis is also the right thing in the long run. Michael ^ permalink raw reply [flat|nested] 8+ messages in thread
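[Sketch only, not existing kernel code: the swiotlb_dev_disable() helper suggested above could look roughly like the following. The dma_io_tlb_mem field is only present when CONFIG_SWIOTLB is enabled, so a real version would also need a stub for the !CONFIG_SWIOTLB case.]

        /* kernel/dma/swiotlb.c -- hypothetical helper */
        void swiotlb_dev_disable(struct device *dev)
        {
                /*
                 * Detach the device from the default swiotlb pool so that
                 * is_swiotlb_force_bounce() returns false and streaming DMA
                 * mappings are done in place instead of being bounced.
                 */
                dev->dma_io_tlb_mem = NULL;
        }
        EXPORT_SYMBOL_GPL(swiotlb_dev_disable);

[vmbus_device_register() would then call swiotlb_dev_disable(&child_device_obj->device) for channels with co_external_memory set, instead of poking dma_io_tlb_mem directly as in the diff above.]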
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-11-28 17:47 ` Michael Kelley @ 2025-12-03 14:21 ` Tianyu Lan 2025-12-04 3:35 ` Michael Kelley 0 siblings, 1 reply; 8+ messages in thread From: Tianyu Lan @ 2025-12-03 14:21 UTC (permalink / raw) To: Michael Kelley, Christoph Hellwig, Robin Murphy Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > > > In CVM(Confidential VM), system memory is encrypted > > by default. Device drivers typically use the swiotlb > > bounce buffer for DMA memory, which is decrypted > > and shared between the guest and host. Confidential > > Vmbus, however, supports a confidential channel >
Hi Michael: Thanks for your review. I spent some time testing and still need the paravisor (HCL) team to double check.
> s/Vmbus/VMBus/ [and elsewhere in this commit msg] >
Will update.
> > that employs encrypted memory for the Vmbus ring > > buffer and external DMA memory. The support for > > the confidential ring buffer has already been > > integrated. > > > > In CVM, device drivers usually employ the standard > > DMA API to map DMA memory with the bounce buffer, > > which remains transparent to the device driver. > > For external DMA memory support, > > The "external memory" terminology is not at all clear in the context > of Linux. Presumably the terminology came from Hyper-V or the > paravisor, but I never understood what the "external" refers to. Maybe > it is memory that is "external" with respect to the hypervisor, and therefore > not shared between the hypervisor and guest? But that meaning won't be > evident to other reviewers in the Linux kernel community. In any case, some > explanation of "external memory" is needed. And even consider changing the > field names in the code to be something that makes more sense to Linux. >
Agree. For VMbus devices, there are two major communication methods with the host: * VMbus ring buffer. * Dynamic DMA transactions. The confidential ring buffer has been upstreamed and merged into the Linux 6.18 kernel by Roman. Additionally, confidential DMA transactions, also known as external memory, allow the use of private/encrypted memory as DMA memory.
> > Hyper-V specific > > DMA operations are introduced, bypassing the bounce > > buffer when the confidential external memory flag > > is set. > > This commit message never explains why it is important to not do bounce > buffering. There's the obvious but unstated reason of improving performance, > but that's not the main reason. The main reason is a confidentiality leak. > When available, Confidential VMBus would be used because it keeps the > DMA data private (i.e., encrypted) and confidential to the guest. Bounce > buffering copies the data to shared (i.e., decrypted) swiotlb memory, where > it is exposed to the hypervisor. That's a confidentiality leak, and is the > primary reason the bounce buffering must be eliminated. This requirement > was pointed out by Robin Murphy in the discussion of Roman Kisel's > original code to eliminate the bounce buffering. >
For a CVM on Hyper-V, there are typically two layers: VTL0 and VTL2. VTL2 plays a role similar to that of an L1 hypervisor for the normal guest in VTL0.
Communication between these two layers can utilize private or encrypted memory directly, as they both reside within the same CVM. This differs from the communication between the host and VTL0/VTL2, where shared or decrypted memory (bounce buffer) is required. For Confidential VMbus devices in CVM, the device module (emulated device code) in VTL2 and interacts with the normal guest in VTL0. As a result, the Confidential VMbus device's ring buffer and DMA transactions may use private or encrypted memory. > Separately, I have major qualms about using an approach with Hyper-V specific > DMA operations. As implemented here, these DMA operations bypass all > the kernel DMA layer logic when the VMBus synthetic device indicates > "external memory". In that case, the supplied physical address is just used > as the DMA address and everything else is bypassed. While this actually works > for VMBus synthetic devices because of their simple requirements, it also > is implicitly making some assumptions. These assumptions are true today > (as far as I know) but won't necessarily be true in the future. The assumptions > include: > > * There's no vIOMMU in the guest VM. IOMMUs in CoCo VMs have their > own issues to work out, but an implementation that bypasses any IOMMU > logic in the DMA layer is very short-term thinking. > > * There are no PCI pass-thru devices in the guest. Since the implementation > changes the global dma_ops, the drivers for such PCI pass-thru devices would > use the same global dma_ops, and would fail. For PCI pass-thru devices, it may be reused with additional changes(Co-work with Dexuan for this support) and we may rework the current callbacks to meet the future requirement. > > The code also has some other problems that I point out further down. > > In any case, an approach with Hyper-V specific DMA operations seems like > a very temporary fix (hack?) at best. Further down, I'll proposed at alternate > approach that preserves all the existing DMA layer functionality. > > > These DMA operations might also be reused > > for TDISP devices in the future, which also support > > DMA operations with encrypted memory. > > > > The DMA operations used are global architecture > > DMA operations (for details, see get_arch_dma_ops() > > and get_dma_ops()), and there is no need to set up > > for each device individually. > > > > Signed-off-by: Tianyu Lan <tiala@microsoft.com> > > --- > > drivers/hv/vmbus_drv.c | 90 +++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 89 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > > index 0dc4692b411a..ca31231b2c32 100644 > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -39,6 +39,9 @@ > > #include <clocksource/hyperv_timer.h> > > #include <asm/mshyperv.h> > > #include "hyperv_vmbus.h" > > +#include "../../kernel/dma/direct.h" > > + > > +extern const struct dma_map_ops *dma_ops; > > > > struct vmbus_dynid { > > struct list_head node; > > @@ -1429,6 +1432,88 @@ static int vmbus_alloc_synic_and_connect(void) > > return -ENOMEM; > > } > > > > + > > +static bool hyperv_private_memory_dma(struct device *dev) > > +{ > > + struct hv_device *hv_dev = device_to_hv_device(dev); > > device_to_hv_device() works only when "dev" is for a VMBus device. As noted above, > if a CoCo VM were ever to have a PCI pass-thru device doing DMA, "dev" would > be some PCI device, and device_to_hv_device() would return garbage. Yes, Nice catch. 
PCI device support is not included in this patch and still on the way and post this with ConfidentialVMbus device for review first. > > > + > > + if (hv_dev && hv_dev->channel && hv_dev->channel->co_external_memory) > > + return true; > > + else > > + return false; > > +} > > + > > +static dma_addr_t hyperv_dma_map_page(struct device *dev, struct page *page, > > + unsigned long offset, size_t size, > > + enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + phys_addr_t phys = page_to_phys(page) + offset; > > + > > + if (hyperv_private_memory_dma(dev)) > > + return __phys_to_dma(dev, phys); > > + else > > + return dma_direct_map_phys(dev, phys, size, dir, attrs); > > This code won't build when the VMBus driver is built as a module. > dma_direct_map_phys() is in inline function that references several other > DMA functions that aren't exported because they aren't intended to be > used by drivers. Same problems occur with dma_direct_unmap_phys() > and similar functions used below. Yes, Nice catch! We may add a function to register dma ops in the built-in file and call it in the VMbus driver. > > > +} > > + > > +static void hyperv_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, > > + size_t size, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_phys(dev, dma_handle, size, dir, attrs); > > +} > > + > > +static int hyperv_dma_map_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, > > + unsigned long attrs) > > +{ > > + struct scatterlist *sg; > > + dma_addr_t dma_addr; > > + int i; > > + > > + if (hyperv_private_memory_dma(dev)) { > > + for_each_sg(sgl, sg, nelems, i) { > > + dma_addr = __phys_to_dma(dev, sg_phys(sg)); > > + sg_dma_address(sg) = dma_addr; > > + sg_dma_len(sg) = sg->length; > > + } > > + > > + return nelems; > > + } else { > > + return dma_direct_map_sg(dev, sgl, nelems, dir, attrs); > > + } > > +} > > + > > +static void hyperv_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, > > + int nelems, enum dma_data_direction dir, unsigned long attrs) > > +{ > > + if (!hyperv_private_memory_dma(dev)) > > + dma_direct_unmap_sg(dev, sgl, nelems, dir, attrs); > > +} > > + > > +static int hyperv_dma_supported(struct device *dev, u64 mask) > > +{ > > + dev->coherent_dma_mask = mask; > > + return 1; > > +} > > + > > +static size_t hyperv_dma_max_mapping_size(struct device *dev) > > +{ > > + if (hyperv_private_memory_dma(dev)) > > + return SIZE_MAX; > > + else > > + return swiotlb_max_mapping_size(dev); > > +} > > + > > +const struct dma_map_ops hyperv_dma_ops = { > > + .map_page = hyperv_dma_map_page, > > + .unmap_page = hyperv_dma_unmap_page, > > + .map_sg = hyperv_dma_map_sg, > > + .unmap_sg = hyperv_dma_unmap_sg, > > + .dma_supported = hyperv_dma_supported, > > + .max_mapping_size = hyperv_dma_max_mapping_size, > > +}; > > + > > /* > > * vmbus_bus_init -Main vmbus driver initialization routine. > > * > > @@ -1479,8 +1564,11 @@ static int vmbus_bus_init(void) > > * doing that on each VP while initializing SynIC's wastes time. > > */ > > is_confidential = ms_hyperv.confidential_vmbus_available; > > - if (is_confidential) > > + if (is_confidential) { > > + dma_ops = &hyperv_dma_ops; > > arm64 doesn't have the global variable dma_ops, so this patch > won't build on arm64, even if Confidential VMBus isn't yet available > for arm64. Yes, there should be a stub function for both ARM and x86 to implement it.. 
> > > pr_info("Establishing connection to the confidential VMBus\n"); > > + } > > + > > hv_para_set_sint_proxy(!is_confidential); > > ret = vmbus_alloc_synic_and_connect(); > > if (ret) > > -- > > 2.50.1 > > > > Here's my idea for an alternate approach. The goal is to allow use of the > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > default swiotlb memory. For VMBus devices, the calling sequence is > vmbus_device_register() -> device_register() -> device_initialize() -> > swiotlb_dev_init(). But if vmbus_device_register() could override the > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > would return "false", and the normal DMA functions would not force the > use of bounce buffers. The entire code change looks like this: > > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > + device_initialize(&child_device_obj->device); > + if (child_device_obj->channel->co_external_memory) > + child_device_obj->device.dma_io_tlb_mem = NULL; > + > /* > * Register with the LDM. This will kick off the driver/device > * binding...which will eventually call vmbus_match() and vmbus_probe() > */ > - ret = device_register(&child_device_obj->device); > + ret = device_add(&child_device_obj->device); > if (ret) { > pr_err("Unable to register child device\n"); > put_device(&child_device_obj->device); > > I've only compile tested the above since I don't have an environment where > I can test Confidential VMBus. You would need to verify whether my thinking > is correct and this produces the intended result. Thanks Michael. I tested it and it seems to hit an issue. Will double check.with HCL/paravisor team. We considered such a change before. From Roman's previous patch, it seems to need to change phys_to_dma() and force_dma_unencrypted().. > > Directly setting dma_io_tlb_mem to NULL isn't great. It would be better > to add an exported function swiotlb_dev_disable() to swiotlb code that sets > dma_io_tlb_mem to NULL, but you get the idea. > > Other reviewers may still see this approach as a bit of a hack, but it's a lot > less of a hack than introducing Hyper-V specific DMA functions. > swiotlb_dev_disable() is conceptually needed for TDISP devices, as TDISP > devices must similarly protect confidentiality by not allowing use of the swiotlb. > So adding swiotlb_dev_disable() is a step in the right direction, even if the > eventual TDISP code does it slightly differently. Doing the disable on a > per-device basis is also the right thing in the long run. > Agree. Include Christoph and Robin here. Now, we have three options to disable swiotlb bounce buffer to per-device. 1) Add platform DMA ops and check platform condition(e.g, Confidential VMbus or TDISP device) in platform callback before calling stand DMA API which may use a bounce buffer. 2) Directly setting dma_io_tlb_mem to NULL in the device/bus level driver. 
It does not seem to be enough for a Confidential VM, which may use a high address bit to identify memory as encrypted or decrypted (for details, please see Roman's previous patches): https://lore.kernel.org/linux-hyperv/20250409000835.285105-6-romank@linux.microsoft.com/ https://lore.kernel.org/linux-hyperv/20250409000835.285105-7-romank@linux.microsoft.com/ 3) Add a new standard swiotlb function to disable the use of the bounce buffer for the device. -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
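[For context on options 2) and 3): whether the direct-mapping path bounces is governed by a check along these lines, paraphrased from include/linux/swiotlb.h; the exact code may differ by kernel version.]

        static inline bool is_swiotlb_force_bounce(struct device *dev)
        {
                struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

                /*
                 * With dev->dma_io_tlb_mem set to NULL this returns false,
                 * and dma_direct_map_page()/dma_direct_map_sg() map the
                 * buffer in place rather than bouncing it.
                 */
                return mem && mem->force_bounce;
        }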
* RE: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-12-03 14:21 ` Tianyu Lan @ 2025-12-04 3:35 ` Michael Kelley 2025-12-04 11:35 ` Tianyu Lan 0 siblings, 1 reply; 8+ messages in thread From: Michael Kelley @ 2025-12-04 3:35 UTC (permalink / raw) To: Tianyu Lan, Christoph Hellwig, Robin Murphy Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org From: Tianyu Lan <ltykernel@gmail.com> Sent: Wednesday, December 3, 2025 6:21 AM > > On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM [snip] > > > > Here's my idea for an alternate approach. The goal is to allow use of the > > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > > default swiotlb memory. For VMBus devices, the calling sequence is > > vmbus_device_register() -> device_register() -> device_initialize() -> > > swiotlb_dev_init(). But if vmbus_device_register() could override the > > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > > would return "false", and the normal DMA functions would not force the > > use of bounce buffers. The entire code change looks like this: > > > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > > > + device_initialize(&child_device_obj->device); > > + if (child_device_obj->channel->co_external_memory) > > + child_device_obj->device.dma_io_tlb_mem = NULL; > > + > > /* > > * Register with the LDM. This will kick off the driver/device > > * binding...which will eventually call vmbus_match() and vmbus_probe() > > */ > > - ret = device_register(&child_device_obj->device); > > + ret = device_add(&child_device_obj->device); > > if (ret) { > > pr_err("Unable to register child device\n"); > > put_device(&child_device_obj->device); > > > > I've only compile tested the above since I don't have an environment where > > I can test Confidential VMBus. You would need to verify whether my thinking > > is correct and this produces the intended result. > > Thanks Michael. I tested it and it seems to hit an issue. Will double check.with > HCL/paravisor team. > > We considered such a change before. From Roman's previous patch, it seems to > need to change phys_to_dma() and force_dma_unencrypted(). In a Hyper-V SEV-SNP VM with a paravisor, I assert that phys_to_dma() and __phys_to_dma() do the same thing. phys_to_dma() calls dma_addr_encrypted(), which does __sme_set(). But in a Hyper-V VM using vTOM, sme_me_mask is always 0, so dma_addr_encrypted() is a no-op. dma_addr_unencrypted() and dma_addr_canonical() are also no-ops. See include/linux/mem_encrypt.h. So in a Hyper-V SEV-SNP VM, the DMA layer doesn't change anything related to encryption when translating between a physical address and a DMA address. Same thing is true for a Hyper-V TDX VM with paravisor. force_dma_unencrypted() will indeed return "true", and it is used in phys_to_dma_direct(). 
But both return paths in phys_to_dma_direct() return the same result because of dma_addr_unencrypted() and dma_addr_encrypted() being no-ops. Other uses of force_dma_unencrypted() are only in the dma_alloc_*() paths, but dma_alloc_*() isn't used by VMBus devices because the device control structures are in the ring buffer, which as you have noted, is already handled separately. So for the moment, I don't think the return value from force_dma_unencrypted() matters. So I'm guessing something else unexpected is happening such that just disabling the swiotlb on a per-device basis doesn't work. Assuming that Roman's original patch actually worked, I'm trying to figure out how my idea is different in a way that has a material effect on things. And if your patch works by going directly to __phys_to_dma(), it should also work when using phys_to_dma() instead. I will try a few experiments on a normal Confidential VM (i.e., without Confidential VMBus) to confirm that my conclusions from reading the code really are correct. FWIW, I'm looking at the linux-next20251119 code base. Michael > > > > > Directly setting dma_io_tlb_mem to NULL isn't great. It would be better > > to add an exported function swiotlb_dev_disable() to swiotlb code that sets > > dma_io_tlb_mem to NULL, but you get the idea. > > > > Other reviewers may still see this approach as a bit of a hack, but it's a lot > > less of a hack than introducing Hyper-V specific DMA functions. > > swiotlb_dev_disable() is conceptually needed for TDISP devices, as TDISP > > devices must similarly protect confidentiality by not allowing use of the swiotlb. > > So adding swiotlb_dev_disable() is a step in the right direction, even if the > > eventual TDISP code does it slightly differently. Doing the disable on a > > per-device basis is also the right thing in the long run. > > ^ permalink raw reply [flat|nested] 8+ messages in thread
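[To make the equivalence above concrete: with sme_me_mask == 0, the helpers reduce to roughly the following, paraphrased from include/linux/dma-direct.h and include/linux/mem_encrypt.h, not exact kernel source.]

        static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
        {
                /*
                 * dma_addr_encrypted() is __sme_set(), i.e. "addr | sme_me_mask".
                 * In a Hyper-V CVM using vTOM, sme_me_mask is 0, so this is a
                 * no-op and phys_to_dma() gives the same result as
                 * __phys_to_dma().
                 */
                return dma_addr_encrypted(__phys_to_dma(dev, paddr));
        }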
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-12-04 3:35 ` Michael Kelley @ 2025-12-04 11:35 ` Tianyu Lan 2025-12-05 4:10 ` Tianyu Lan 0 siblings, 1 reply; 8+ messages in thread From: Tianyu Lan @ 2025-12-04 11:35 UTC (permalink / raw) To: Michael Kelley Cc: Christoph Hellwig, Robin Murphy, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Dec 4, 2025 at 11:35 AM Michael Kelley <mhklinux@outlook.com> wrote: > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Wednesday, December 3, 2025 6:21 AM > > > > On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > [snip] > > > > > > > Here's my idea for an alternate approach. The goal is to allow use of the > > > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > > > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > > > default swiotlb memory. For VMBus devices, the calling sequence is > > > vmbus_device_register() -> device_register() -> device_initialize() -> > > > swiotlb_dev_init(). But if vmbus_device_register() could override the > > > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > > > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > > > would return "false", and the normal DMA functions would not force the > > > use of bounce buffers. The entire code change looks like this: > > > > > > --- a/drivers/hv/vmbus_drv.c > > > +++ b/drivers/hv/vmbus_drv.c > > > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > > > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > > > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > > > > > + device_initialize(&child_device_obj->device); > > > + if (child_device_obj->channel->co_external_memory) > > > + child_device_obj->device.dma_io_tlb_mem = NULL; > > > + > > > /* > > > * Register with the LDM. This will kick off the driver/device > > > * binding...which will eventually call vmbus_match() and vmbus_probe() > > > */ > > > - ret = device_register(&child_device_obj->device); > > > + ret = device_add(&child_device_obj->device); > > > if (ret) { > > > pr_err("Unable to register child device\n"); > > > put_device(&child_device_obj->device); > > > > > > I've only compile tested the above since I don't have an environment where > > > I can test Confidential VMBus. You would need to verify whether my thinking > > > is correct and this produces the intended result. > > > > Thanks Michael. I tested it and it seems to hit an issue. Will double check.with > > HCL/paravisor team. > > > > We considered such a change before. From Roman's previous patch, it seems to > > need to change phys_to_dma() and force_dma_unencrypted(). > > In a Hyper-V SEV-SNP VM with a paravisor, I assert that phys_to_dma() and > __phys_to_dma() do the same thing. phys_to_dma() calls dma_addr_encrypted(), > which does __sme_set(). But in a Hyper-V VM using vTOM, sme_me_mask is > always 0, so dma_addr_encrypted() is a no-op. dma_addr_unencrypted() and > dma_addr_canonical() are also no-ops. See include/linux/mem_encrypt.h. So > in a Hyper-V SEV-SNP VM, the DMA layer doesn't change anything related to > encryption when translating between a physical address and a DMA address. 
> Same thing is true for a Hyper-V TDX VM with paravisor. > > force_dma_unencrypted() will indeed return "true", and it is used in > phys_to_dma_direct(). But both return paths in phys_to_dma_direct() return the > same result because of dma_addr_unencrypted() and dma_addr_encrypted() > being no-ops. Other uses of force_dma_unencrypted() are only in the > dma_alloc_*() paths, but dma_alloc_*() isn't used by VMBus devices because > the device control structures are in the ring buffer, which as you have noted, is > already handled separately. So for the moment, I don't think the return value > from force_dma_unencrypted() matters. > > So I'm guessing something else unexpected is happening such that just disabling > the swiotlb on a per-device basis doesn't work. Assuming that Roman's original > patch actually worked, I'm trying to figure out how my idea is different in a way > that has a material effect on things. And if your patch works by going directly to > __phys_to_dma(), it should also work when using phys_to_dma() instead. >
I agree with your analysis, and your proposal is much simpler. The issue I hit was that a lot of user-space tasks were blocked after several minutes of FIO testing. I also reproduced it with the DMA ops patch after a longer test, so I need to double check whether we missed something or whether it's caused by another issue. If it's related to the private/shared addresses used in the normal guest, we may debug it in the paravisor. -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] Drivers: hv: Confidential VMBus exernal memory support 2025-12-04 11:35 ` Tianyu Lan @ 2025-12-05 4:10 ` Tianyu Lan 0 siblings, 0 replies; 8+ messages in thread From: Tianyu Lan @ 2025-12-05 4:10 UTC (permalink / raw) To: Michael Kelley Cc: Christoph Hellwig, Robin Murphy, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com, vdso@hexbites.dev, Tianyu Lan, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, Dec 4, 2025 at 7:35 PM Tianyu Lan <ltykernel@gmail.com> wrote: > > On Thu, Dec 4, 2025 at 11:35 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Wednesday, December 3, 2025 6:21 AM > > > > > > On Sat, Nov 29, 2025 at 1:47 AM Michael Kelley <mhklinux@outlook.com> wrote: > > > > > > > > From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, November 24, 2025 10:29 AM > > > > [snip] > > > > > > > > > > Here's my idea for an alternate approach. The goal is to allow use of the > > > > swiotlb to be disabled on a per-device basis. A device is initialized for swiotlb > > > > usage by swiotlb_dev_init(), which sets dev->dma_io_tlb_mem to point to the > > > > default swiotlb memory. For VMBus devices, the calling sequence is > > > > vmbus_device_register() -> device_register() -> device_initialize() -> > > > > swiotlb_dev_init(). But if vmbus_device_register() could override the > > > > dev->dma_io_tlb_mem value and put it back to NULL, swiotlb operations > > > > would be disabled on the device. Furthermore, is_swiotlb_force_bounce() > > > > would return "false", and the normal DMA functions would not force the > > > > use of bounce buffers. The entire code change looks like this: > > > > > > > > --- a/drivers/hv/vmbus_drv.c > > > > +++ b/drivers/hv/vmbus_drv.c > > > > @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj) > > > > child_device_obj->device.dma_mask = &child_device_obj->dma_mask; > > > > dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); > > > > > > > > + device_initialize(&child_device_obj->device); > > > > + if (child_device_obj->channel->co_external_memory) > > > > + child_device_obj->device.dma_io_tlb_mem = NULL; > > > > + > > > > /* > > > > * Register with the LDM. This will kick off the driver/device > > > > * binding...which will eventually call vmbus_match() and vmbus_probe() > > > > */ > > > > - ret = device_register(&child_device_obj->device); > > > > + ret = device_add(&child_device_obj->device); > > > > if (ret) { > > > > pr_err("Unable to register child device\n"); > > > > put_device(&child_device_obj->device); > > > > > > > > I've only compile tested the above since I don't have an environment where > > > > I can test Confidential VMBus. You would need to verify whether my thinking > > > > is correct and this produces the intended result. > > > > > > Thanks Michael. I tested it and it seems to hit an issue. Will double check.with > > > HCL/paravisor team. > > > > > > We considered such a change before. From Roman's previous patch, it seems to > > > need to change phys_to_dma() and force_dma_unencrypted(). > > > > In a Hyper-V SEV-SNP VM with a paravisor, I assert that phys_to_dma() and > > __phys_to_dma() do the same thing. phys_to_dma() calls dma_addr_encrypted(), > > which does __sme_set(). But in a Hyper-V VM using vTOM, sme_me_mask is > > always 0, so dma_addr_encrypted() is a no-op. dma_addr_unencrypted() and > > dma_addr_canonical() are also no-ops. See include/linux/mem_encrypt.h. 
So > > in a Hyper-V SEV-SNP VM, the DMA layer doesn't change anything related to > > encryption when translating between a physical address and a DMA address. > > Same thing is true for a Hyper-V TDX VM with paravisor. > > > > force_dma_unencrypted() will indeed return "true", and it is used in > > phys_to_dma_direct(). But both return paths in phys_to_dma_direct() return the > > same result because of dma_addr_unencrypted() and dma_addr_encrypted() > > being no-ops. Other uses of force_dma_unencrypted() are only in the > > dma_alloc_*() paths, but dma_alloc_*() isn't used by VMBus devices because > > the device control structures are in the ring buffer, which as you have noted, is > > already handled separately. So for the moment, I don't think the return value > > from force_dma_unencrypted() matters. > >
dma_alloc_*() is used by PCI device drivers (e.g., the MANA NIC driver), and if we need to support TDISP devices, the change in force_dma_unencrypted() is still necessary: the DMA address for a TDISP device should be encrypted memory with sme_me_mask set. From this point of view, Hyper-V specific DMA ops may resolve this without changes in the DMA core code. Otherwise, we still need to add a callback or other flag there for platforms to check whether it should return an encrypted or decrypted address to drivers.
> > So I'm guessing something else unexpected is happening such that just disabling > > the swiotlb on a per-device basis doesn't work. Assuming that Roman's original > > patch actually worked, I'm trying to figure out how my idea is different in a way > > that has a material effect on things. And if your patch works by going directly to > > __phys_to_dma(), it should also work when using phys_to_dma() instead. > >
The issue I hit should not be related to disabling the bounce buffer. I don't see any failure to map DMA memory with the bounce buffer. For disabling swiotlb per device, it looks like setting dma_io_tlb_mem to NULL works and makes is_swiotlb_force_bounce() return false. -- Thanks Tianyu Lan ^ permalink raw reply [flat|nested] 8+ messages in thread
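[Purely hypothetical sketch of the "callback or other flag" idea mentioned in the message above; neither the flag nor this check exists in the kernel today, and the existing x86 logic is simplified here.]

        /* x86 force_dma_unencrypted(), extended with a per-device opt-out: */
        bool force_dma_unencrypted(struct device *dev)
        {
                if (dev->dma_allows_encrypted)  /* hypothetical per-device flag */
                        return false;

                return cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT);
        }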