public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
@ 2024-04-08  4:11 mhkelley58
  2024-04-08  4:11 ` [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices mhkelley58
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: mhkelley58 @ 2024-04-08  4:11 UTC (permalink / raw)
  To: robin.murphy, joro, will, jgross, sstabellini,
	oleksandr_tyshchenko, hch, m.szyprowski, iommu, linux-kernel,
	xen-devel
  Cc: petr, roberto.sassu

From: Michael Kelley <mhklinux@outlook.com>

Currently swiotlb_tbl_map_single() takes alloc_align_mask and
alloc_size arguments to specify an swiotlb allocation that is
larger than mapping_size. This larger allocation is used solely
by iommu_dma_map_single() to handle untrusted devices that should
not have DMA visibility to memory pages that are partially used
for unrelated kernel data.

Having two arguments to specify the allocation is redundant. While
alloc_align_mask naturally specifies the alignment of the starting
address of the allocation, it can also implicitly specify the size
by rounding up the mapping_size to that alignment.

Additionally, the current approach has an edge case bug.
iommu_dma_map_page() already does the rounding up to compute the
alloc_size argument. But swiotlb_tbl_map_single() then calculates
the alignment offset based on the DMA min_align_mask, and adds
that offset to alloc_size. If the offset is non-zero, the addition
may result in a value that is larger than the max the swiotlb can
allocate. If the rounding up is done _after_ the alignment offset is
added to the mapping_size (and the original mapping_size conforms to
the value returned by swiotlb_max_mapping_size), then the max that the
swiotlb can allocate will not be exceeded.

In view of these issues, simplify the swiotlb_tbl_map_single() interface
by removing the alloc_size argument. Most call sites pass the same
value for mapping_size and alloc_size, and they pass alloc_align_mask
as zero. Just remove the redundant argument from these callers, as they
will see no functional change. For iommu_dma_map_page() also remove
the alloc_size argument, and have swiotlb_tbl_map_single() compute
the alloc_size by rounding up mapping_size after adding the offset
based on min_align_mask. This has the side effect of fixing the
edge case bug but with no other functional change.

Also add a sanity test on the alloc_align_mask. While IOMMU code
currently ensures the granule is not larger than PAGE_SIZE, if
that guarantee were to be removed in the future, the downstream
effect on the swiotlb might go unnoticed until strange allocation
failures occurred.

Tested on an ARM64 system with 16K page size and some kernel
test-only hackery to allow modifying the DMA min_align_mask and
the granule size that becomes the alloc_align_mask. Tested these
combinations with a variety of original memory addresses and
sizes, including those that reproduce the edge case bug:

* 4K granule and 0 min_align_mask
* 4K granule and 0xFFF min_align_mask (4K - 1)
* 16K granule and 0xFFF min_align_mask
* 64K granule and 0xFFF min_align_mask
* 64K granule and 0x3FFF min_align_mask (16K - 1)

With the changes, all combinations pass.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
I've haven't used any "Fixes:" tags. This patch really should be
backported only if all the other recent swiotlb fixes get backported,
and I'm unclear on whether that will happen.

I saw the brief discussion about removing the "dir" parameter from
swiotlb_tbl_map_single(). That removal could easily be done as part
of this patch, since it's already changing the swiotlb_tbl_map_single()
parameters. But I think the conclusion of the discussion was to leave
the "dir" parameter for symmetry with the swiotlb_sync_*() functions.
Please correct me if that's wrong, and I'll respin this patch to do
the removal.

 drivers/iommu/dma-iommu.c |  2 +-
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  2 +-
 kernel/dma/swiotlb.c      | 56 +++++++++++++++++++++++++++++----------
 4 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 07d087eecc17..c21ef1388499 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1165,7 +1165,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		trace_swiotlb_bounced(dev, phys, size);
 
 		aligned_size = iova_align(iovad, size);
-		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
+		phys = swiotlb_tbl_map_single(dev, phys, size,
 					      iova_mask(iovad), dir, attrs);
 
 		if (phys == DMA_MAPPING_ERROR)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 1c4ef5111651..6579ae3f6dac 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -216,7 +216,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	 */
 	trace_swiotlb_bounced(dev, dev_addr, size);
 
-	map = swiotlb_tbl_map_single(dev, phys, size, size, 0, dir, attrs);
+	map = swiotlb_tbl_map_single(dev, phys, size, 0, dir, attrs);
 	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ea23097e351f..14bc10c1bb23 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -43,7 +43,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 extern void __init swiotlb_update_mem_attributes(void);
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
-		size_t mapping_size, size_t alloc_size,
+		size_t mapping_size,
 		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
 		unsigned long attrs);
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a5e0dfc44d24..046da973a7e2 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1340,15 +1340,40 @@ static unsigned long mem_used(struct io_tlb_mem *mem)
 
 #endif /* CONFIG_DEBUG_FS */
 
+/**
+ * swiotlb_tbl_map_single() - bounce buffer map a single contiguous physical area
+ * @dev:		Device which maps the buffer.
+ * @orig_addr:		Original (non-bounced) physical IO buffer address
+ * @mapping_size:	Requested size of the actual bounce buffer, excluding
+ *			any pre- or post-padding for alignment
+ * @alloc_align_mask:	Required start and end alignment of the allocated buffer
+ * @dir:		DMA direction
+ * @attrs:		Optional DMA attributes for the map operation
+ *
+ * Find and allocate a suitable sequence of IO TLB slots for the request.
+ * The allocated space starts at an alignment specified by alloc_align_mask,
+ * and the size of the allocated space is rounded up so that the total amount
+ * of allocated space is a multiple of (alloc_align_mask + 1). If
+ * alloc_align_mask is zero, the allocated space may be at any alignment and
+ * the size is not rounded up.
+ *
+ * The returned address is within the allocated space and matches the bits
+ * of orig_addr that are specified in the DMA min_align_mask for the device. As
+ * such, this returned address may be offset from the beginning of the allocated
+ * space. The bounce buffer space starting at the returned address for
+ * mapping_size bytes is initialized to the contents of the original IO buffer
+ * area. Any pre-padding (due to an offset) and any post-padding (due to
+ * rounding-up the size) is not initialized.
+ */
 phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
-		size_t mapping_size, size_t alloc_size,
-		unsigned int alloc_align_mask, enum dma_data_direction dir,
-		unsigned long attrs)
+		size_t mapping_size, unsigned int alloc_align_mask,
+		enum dma_data_direction dir, unsigned long attrs)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned int offset;
 	struct io_tlb_pool *pool;
 	unsigned int i;
+	size_t size;
 	int index;
 	phys_addr_t tlb_addr;
 	unsigned short pad_slots;
@@ -1362,20 +1387,24 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
 		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
 
-	if (mapping_size > alloc_size) {
-		dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
-			      mapping_size, alloc_size);
-		return (phys_addr_t)DMA_MAPPING_ERROR;
-	}
+	/*
+	 * The default swiotlb memory pool is allocated with PAGE_SIZE
+	 * alignment. If a mapping is requested with larger alignment,
+	 * the mapping may be unable to use the initial slot(s) in all
+	 * sets of IO_TLB_SEGSIZE slots. In such case, a mapping request
+	 * of or near the maximum mapping size would always fail.
+	 */
+	dev_WARN_ONCE(dev, alloc_align_mask > ~PAGE_MASK,
+		"Alloc alignment may prevent fulfilling requests with max mapping_size\n");
 
 	offset = swiotlb_align_offset(dev, alloc_align_mask, orig_addr);
-	index = swiotlb_find_slots(dev, orig_addr,
-				   alloc_size + offset, alloc_align_mask, &pool);
+	size = ALIGN(mapping_size + offset, alloc_align_mask + 1);
+	index = swiotlb_find_slots(dev, orig_addr, size, alloc_align_mask, &pool);
 	if (index == -1) {
 		if (!(attrs & DMA_ATTR_NO_WARN))
 			dev_warn_ratelimited(dev,
 	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
-				 alloc_size, mem->nslabs, mem_used(mem));
+				 size, mem->nslabs, mem_used(mem));
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
@@ -1388,7 +1417,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	offset &= (IO_TLB_SIZE - 1);
 	index += pad_slots;
 	pool->slots[index].pad_slots = pad_slots;
-	for (i = 0; i < nr_slots(alloc_size + offset); i++)
+	for (i = 0; i < (nr_slots(size) - pad_slots); i++)
 		pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
 	tlb_addr = slot_addr(pool->start, index) + offset;
 	/*
@@ -1543,8 +1572,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
 
 	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size);
 
-	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, 0, dir,
-			attrs);
+	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, 0, dir, attrs);
 	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices
  2024-04-08  4:11 [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() mhkelley58
@ 2024-04-08  4:11 ` mhkelley58
  2024-05-06 15:48   ` Petr Tesařík
  2024-05-07  5:36   ` Christoph Hellwig
  2024-04-15 11:46 ` [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() Petr Tesařík
  2024-05-06 15:14 ` Michael Kelley
  2 siblings, 2 replies; 11+ messages in thread
From: mhkelley58 @ 2024-04-08  4:11 UTC (permalink / raw)
  To: robin.murphy, joro, will, jgross, sstabellini,
	oleksandr_tyshchenko, hch, m.szyprowski, iommu, linux-kernel,
	xen-devel
  Cc: petr, roberto.sassu

From: Michael Kelley <mhklinux@outlook.com>

iommu_dma_map_page() allocates swiotlb memory as a bounce buffer when
an untrusted device wants to map only part of the memory in an
granule. The goal is to disallow the untrusted device having
DMA access to unrelated kernel data that may be sharing the granule.
To meet this goal, the bounce buffer itself is zero'ed, and any
additional swiotlb memory up to alloc_size after the bounce buffer
end (i.e., "post-padding") is also zero'ed.

However, as of commit 901c7280ca0d ("Reinstate some of "swiotlb: rework
"fix info leak with DMA_FROM_DEVICE"""), swiotlb_tbl_map_single()
always initializes the contents of the bounce buffer to the original
memory. Zero'ing the bounce buffer is redundant and probably wrong per
the discussion in that commit. Only the post-padding needs to be
zero'ed.

Also, when the DMA min_align_mask is non-zero, the allocated bounce
buffer space may not start on a granule boundary. The swiotlb memory
from the granule boundary to the start of the allocated bounce buffer
might belong to some unrelated bounce buffer. So as described in the
"second issue" in [1], it can't be zero'ed to protect against untrusted
devices. But as of commit XXXXXXXXXXXX ("swiotlb: extend buffer
pre-padding to alloc_align_mask if necessary"), swiotlb_tbl_map_single()
allocates pre-padding slots when necessary to meet min_align_mask
requirements, making it possible to zero the pre-padding area as well.

Finally, iommu_dma_map_page() uses the swiotlb for untrusted devices
and also for certain kmalloc() memory. Current code does the zero'ing
for both cases, but it is needed only for the untrusted device case.

Fix all of this by updating iommu_dma_map_page() to zero both the
pre-padding and post-padding areas, but not the actual bounce buffer.
Do this only in the case where the bounce buffer is used because
of an untrusted device.

[1] https://lore.kernel.org/all/20210929023300.335969-1-stevensd@google.com/

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
I've wondered if this code for zero'ing the pre- and post-padding
should go in swiotlb_tbl_map_single(). The bounce buffer proper is
already being initialized there. But swiotlb_tbl_map_single()
would need to test for an untrusted device (or have a "zero the
padding" flag passed in as part of the "attrs" argument), which
adds complexity. Thoughts?

The commit ID of Petr's patch is X'ed out above because Petr's patch
hasn't gone into Linus' tree yet. We can add the real commit ID once
this patch is ready to go in.

Also I've haven't used any "Fixes:" tags. This patch really should
be backported only if all the other recent swiotlb fixes get
backported, and I'm unclear on whether that will happen.

 drivers/iommu/dma-iommu.c | 29 ++++++++++++++++-------------
 include/linux/iova.h      |  5 +++++
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c21ef1388499..ecac39b3190d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1154,9 +1154,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	 */
 	if (dev_use_swiotlb(dev, size, dir) &&
 	    iova_offset(iovad, phys | size)) {
-		void *padding_start;
-		size_t padding_size, aligned_size;
-
 		if (!is_swiotlb_active(dev)) {
 			dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unaligned transaction.\n");
 			return DMA_MAPPING_ERROR;
@@ -1164,24 +1161,30 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 
 		trace_swiotlb_bounced(dev, phys, size);
 
-		aligned_size = iova_align(iovad, size);
 		phys = swiotlb_tbl_map_single(dev, phys, size,
 					      iova_mask(iovad), dir, attrs);
 
 		if (phys == DMA_MAPPING_ERROR)
 			return DMA_MAPPING_ERROR;
 
-		/* Cleanup the padding area. */
-		padding_start = phys_to_virt(phys);
-		padding_size = aligned_size;
+		/*
+		 * Untrusted devices should not see padding areas with random
+		 * leftover kernel data, so zero the pre- and post-padding.
+		 * swiotlb_tbl_map_single() has initialized the bounce buffer
+		 * proper to the contents of the original memory buffer.
+		 */
+		if (dev_is_untrusted(dev)) {
+			size_t start, virt = (size_t)phys_to_virt(phys);
 
-		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-		    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
-			padding_start += size;
-			padding_size -= size;
-		}
+			/* Pre-padding */
+			start = iova_align_down(iovad, virt);
+			memset((void *)start, 0, virt - start);
 
-		memset(padding_start, 0, padding_size);
+			/* Post-padding */
+			start = virt + size;
+			memset((void *)start, 0,
+			       iova_align(iovad, start) - start);
+		}
 	}
 
 	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 83c00fac2acb..d2c4fd923efa 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -65,6 +65,11 @@ static inline size_t iova_align(struct iova_domain *iovad, size_t size)
 	return ALIGN(size, iovad->granule);
 }
 
+static inline size_t iova_align_down(struct iova_domain *iovad, size_t size)
+{
+	return ALIGN_DOWN(size, iovad->granule);
+}
+
 static inline dma_addr_t iova_dma_addr(struct iova_domain *iovad, struct iova *iova)
 {
 	return (dma_addr_t)iova->pfn_lo << iova_shift(iovad);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
  2024-04-08  4:11 [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() mhkelley58
  2024-04-08  4:11 ` [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices mhkelley58
@ 2024-04-15 11:46 ` Petr Tesařík
  2024-04-15 12:23   ` Michael Kelley
  2024-05-06 15:14 ` Michael Kelley
  2 siblings, 1 reply; 11+ messages in thread
From: Petr Tesařík @ 2024-04-15 11:46 UTC (permalink / raw)
  To: mhkelley58
  Cc: mhklinux, robin.murphy, joro, will, jgross, sstabellini,
	oleksandr_tyshchenko, hch, m.szyprowski, iommu, linux-kernel,
	xen-devel, roberto.sassu

On Sun,  7 Apr 2024 21:11:41 -0700
mhkelley58@gmail.com wrote:

> From: Michael Kelley <mhklinux@outlook.com>
> 
> Currently swiotlb_tbl_map_single() takes alloc_align_mask and
> alloc_size arguments to specify an swiotlb allocation that is
> larger than mapping_size. This larger allocation is used solely
> by iommu_dma_map_single() to handle untrusted devices that should
> not have DMA visibility to memory pages that are partially used
> for unrelated kernel data.
> 
> Having two arguments to specify the allocation is redundant. While
> alloc_align_mask naturally specifies the alignment of the starting
> address of the allocation, it can also implicitly specify the size
> by rounding up the mapping_size to that alignment.
> 
> Additionally, the current approach has an edge case bug.
> iommu_dma_map_page() already does the rounding up to compute the
> alloc_size argument. But swiotlb_tbl_map_single() then calculates
> the alignment offset based on the DMA min_align_mask, and adds
> that offset to alloc_size. If the offset is non-zero, the addition
> may result in a value that is larger than the max the swiotlb can
> allocate. If the rounding up is done _after_ the alignment offset is
> added to the mapping_size (and the original mapping_size conforms to
> the value returned by swiotlb_max_mapping_size), then the max that the
> swiotlb can allocate will not be exceeded.
> 
> In view of these issues, simplify the swiotlb_tbl_map_single() interface
> by removing the alloc_size argument. Most call sites pass the same
> value for mapping_size and alloc_size, and they pass alloc_align_mask
> as zero. Just remove the redundant argument from these callers, as they
> will see no functional change. For iommu_dma_map_page() also remove
> the alloc_size argument, and have swiotlb_tbl_map_single() compute
> the alloc_size by rounding up mapping_size after adding the offset
> based on min_align_mask. This has the side effect of fixing the
> edge case bug but with no other functional change.
> 
> Also add a sanity test on the alloc_align_mask. While IOMMU code
> currently ensures the granule is not larger than PAGE_SIZE, if
> that guarantee were to be removed in the future, the downstream
> effect on the swiotlb might go unnoticed until strange allocation
> failures occurred.
> 
> Tested on an ARM64 system with 16K page size and some kernel
> test-only hackery to allow modifying the DMA min_align_mask and
> the granule size that becomes the alloc_align_mask. Tested these
> combinations with a variety of original memory addresses and
> sizes, including those that reproduce the edge case bug:
> 
> * 4K granule and 0 min_align_mask
> * 4K granule and 0xFFF min_align_mask (4K - 1)
> * 16K granule and 0xFFF min_align_mask
> * 64K granule and 0xFFF min_align_mask
> * 64K granule and 0x3FFF min_align_mask (16K - 1)
> 
> With the changes, all combinations pass.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> I've haven't used any "Fixes:" tags. This patch really should be
> backported only if all the other recent swiotlb fixes get backported,
> and I'm unclear on whether that will happen.
> 
> I saw the brief discussion about removing the "dir" parameter from
> swiotlb_tbl_map_single(). That removal could easily be done as part
> of this patch, since it's already changing the swiotlb_tbl_map_single()
> parameters. But I think the conclusion of the discussion was to leave
> the "dir" parameter for symmetry with the swiotlb_sync_*() functions.
> Please correct me if that's wrong, and I'll respin this patch to do
> the removal.

Hi Michael,

sorry for taking so long to answer. Yes, there was no agreement on the
removal of the "dir" parameter, but I'm not sure it's because of
symmetry with swiotlb_sync_*(), because the topic was not really
discussed.

The discussion was about the KUnit test suite and whether direction is
a property of the bounce buffer or of each sync operation. Since DMA API
defines associates each DMA buffer with a direction, the direction
parameter passed to swiotlb_sync_*() should match what was passed to
swiotlb_tbl_map_single(), because that's how it is used by the generic
DMA code. In other words, if the parameter is kept, it should be kept
to match dma_map_*().

However, there is also symmetry with swiotlb_tbl_unmap_single(). This
function does use the parameter for the final sync. I believe there
should be a matching initial sync in swiotlb_tbl_map_single(). In
short, the buffer sync for DMA non-coherent devices should be moved from
swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
the flags parameter.

To sum it up:

* Do *NOT* remove the "dir" parameter.
* Let me send a patch which moves the initial buffer sync.

Petr T

>  drivers/iommu/dma-iommu.c |  2 +-
>  drivers/xen/swiotlb-xen.c |  2 +-
>  include/linux/swiotlb.h   |  2 +-
>  kernel/dma/swiotlb.c      | 56
> +++++++++++++++++++++++++++++---------- 4 files changed, 45
> insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 07d087eecc17..c21ef1388499 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1165,7 +1165,7 @@ static dma_addr_t iommu_dma_map_page(struct
> device *dev, struct page *page, trace_swiotlb_bounced(dev, phys,
> size); 
>  		aligned_size = iova_align(iovad, size);
> -		phys = swiotlb_tbl_map_single(dev, phys, size,
> aligned_size,
> +		phys = swiotlb_tbl_map_single(dev, phys, size,
>  					      iova_mask(iovad), dir,
> attrs); 
>  		if (phys == DMA_MAPPING_ERROR)
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 1c4ef5111651..6579ae3f6dac 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -216,7 +216,7 @@ static dma_addr_t xen_swiotlb_map_page(struct
> device *dev, struct page *page, */
>  	trace_swiotlb_bounced(dev, dev_addr, size);
>  
> -	map = swiotlb_tbl_map_single(dev, phys, size, size, 0, dir,
> attrs);
> +	map = swiotlb_tbl_map_single(dev, phys, size, 0, dir, attrs);
>  	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>  		return DMA_MAPPING_ERROR;
>  
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index ea23097e351f..14bc10c1bb23 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -43,7 +43,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  extern void __init swiotlb_update_mem_attributes(void);
>  
>  phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t
> phys,
> -		size_t mapping_size, size_t alloc_size,
> +		size_t mapping_size,
>  		unsigned int alloc_aligned_mask, enum
> dma_data_direction dir, unsigned long attrs);
>  
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index a5e0dfc44d24..046da973a7e2 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -1340,15 +1340,40 @@ static unsigned long mem_used(struct
> io_tlb_mem *mem) 
>  #endif /* CONFIG_DEBUG_FS */
>  
> +/**
> + * swiotlb_tbl_map_single() - bounce buffer map a single contiguous
> physical area
> + * @dev:		Device which maps the buffer.
> + * @orig_addr:		Original (non-bounced) physical IO
> buffer address
> + * @mapping_size:	Requested size of the actual bounce buffer,
> excluding
> + *			any pre- or post-padding for alignment
> + * @alloc_align_mask:	Required start and end alignment of the
> allocated buffer
> + * @dir:		DMA direction
> + * @attrs:		Optional DMA attributes for the map
> operation
> + *
> + * Find and allocate a suitable sequence of IO TLB slots for the
> request.
> + * The allocated space starts at an alignment specified by
> alloc_align_mask,
> + * and the size of the allocated space is rounded up so that the
> total amount
> + * of allocated space is a multiple of (alloc_align_mask + 1). If
> + * alloc_align_mask is zero, the allocated space may be at any
> alignment and
> + * the size is not rounded up.
> + *
> + * The returned address is within the allocated space and matches
> the bits
> + * of orig_addr that are specified in the DMA min_align_mask for the
> device. As
> + * such, this returned address may be offset from the beginning of
> the allocated
> + * space. The bounce buffer space starting at the returned address
> for
> + * mapping_size bytes is initialized to the contents of the original
> IO buffer
> + * area. Any pre-padding (due to an offset) and any post-padding
> (due to
> + * rounding-up the size) is not initialized.
> + */
>  phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t
> orig_addr,
> -		size_t mapping_size, size_t alloc_size,
> -		unsigned int alloc_align_mask, enum
> dma_data_direction dir,
> -		unsigned long attrs)
> +		size_t mapping_size, unsigned int alloc_align_mask,
> +		enum dma_data_direction dir, unsigned long attrs)
>  {
>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>  	unsigned int offset;
>  	struct io_tlb_pool *pool;
>  	unsigned int i;
> +	size_t size;
>  	int index;
>  	phys_addr_t tlb_addr;
>  	unsigned short pad_slots;
> @@ -1362,20 +1387,24 @@ phys_addr_t swiotlb_tbl_map_single(struct
> device *dev, phys_addr_t orig_addr, if
> (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) pr_warn_once("Memory
> encryption is active and system is using DMA bounce buffers\n"); 
> -	if (mapping_size > alloc_size) {
> -		dev_warn_once(dev, "Invalid sizes (mapping: %zd
> bytes, alloc: %zd bytes)",
> -			      mapping_size, alloc_size);
> -		return (phys_addr_t)DMA_MAPPING_ERROR;
> -	}
> +	/*
> +	 * The default swiotlb memory pool is allocated with
> PAGE_SIZE
> +	 * alignment. If a mapping is requested with larger
> alignment,
> +	 * the mapping may be unable to use the initial slot(s) in
> all
> +	 * sets of IO_TLB_SEGSIZE slots. In such case, a mapping
> request
> +	 * of or near the maximum mapping size would always fail.
> +	 */
> +	dev_WARN_ONCE(dev, alloc_align_mask > ~PAGE_MASK,
> +		"Alloc alignment may prevent fulfilling requests
> with max mapping_size\n"); 
>  	offset = swiotlb_align_offset(dev, alloc_align_mask,
> orig_addr);
> -	index = swiotlb_find_slots(dev, orig_addr,
> -				   alloc_size + offset,
> alloc_align_mask, &pool);
> +	size = ALIGN(mapping_size + offset, alloc_align_mask + 1);
> +	index = swiotlb_find_slots(dev, orig_addr, size,
> alloc_align_mask, &pool); if (index == -1) {
>  		if (!(attrs & DMA_ATTR_NO_WARN))
>  			dev_warn_ratelimited(dev,
>  	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots),
> used %lu (slots)\n",
> -				 alloc_size, mem->nslabs,
> mem_used(mem));
> +				 size, mem->nslabs, mem_used(mem));
>  		return (phys_addr_t)DMA_MAPPING_ERROR;
>  	}
>  
> @@ -1388,7 +1417,7 @@ phys_addr_t swiotlb_tbl_map_single(struct
> device *dev, phys_addr_t orig_addr, offset &= (IO_TLB_SIZE - 1);
>  	index += pad_slots;
>  	pool->slots[index].pad_slots = pad_slots;
> -	for (i = 0; i < nr_slots(alloc_size + offset); i++)
> +	for (i = 0; i < (nr_slots(size) - pad_slots); i++)
>  		pool->slots[index + i].orig_addr =
> slot_addr(orig_addr, i); tlb_addr = slot_addr(pool->start, index) +
> offset; /*
> @@ -1543,8 +1572,7 @@ dma_addr_t swiotlb_map(struct device *dev,
> phys_addr_t paddr, size_t size, 
>  	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size);
>  
> -	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size,
> size, 0, dir,
> -			attrs);
> +	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, 0,
> dir, attrs); if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
>  		return DMA_MAPPING_ERROR;
>  


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
  2024-04-15 11:46 ` [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() Petr Tesařík
@ 2024-04-15 12:23   ` Michael Kelley
  2024-04-15 12:50     ` Petr Tesařík
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Kelley @ 2024-04-15 12:23 UTC (permalink / raw)
  To: Petr Tesařík, mhkelley58@gmail.com
  Cc: robin.murphy@arm.com, joro@8bytes.org, will@kernel.org,
	jgross@suse.com, sstabellini@kernel.org,
	oleksandr_tyshchenko@epam.com, hch@lst.de,
	m.szyprowski@samsung.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
	roberto.sassu@huaweicloud.com

From: Petr Tesařík <petr@tesarici.cz> Sent: Monday, April 15, 2024 4:46 AM
> 
> Hi Michael,
> 
> sorry for taking so long to answer. Yes, there was no agreement on the
> removal of the "dir" parameter, but I'm not sure it's because of
> symmetry with swiotlb_sync_*(), because the topic was not really
> discussed.
> 
> The discussion was about the KUnit test suite and whether direction is
> a property of the bounce buffer or of each sync operation. Since DMA API
> defines associates each DMA buffer with a direction, the direction
> parameter passed to swiotlb_sync_*() should match what was passed to
> swiotlb_tbl_map_single(), because that's how it is used by the generic
> DMA code. In other words, if the parameter is kept, it should be kept
> to match dma_map_*().
> 
> However, there is also symmetry with swiotlb_tbl_unmap_single(). This
> function does use the parameter for the final sync. I believe there
> should be a matching initial sync in swiotlb_tbl_map_single(). In
> short, the buffer sync for DMA non-coherent devices should be moved from
> swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
> then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
> the flags parameter.
> 
> To sum it up:
> 
> * Do *NOT* remove the "dir" parameter.
> * Let me send a patch which moves the initial buffer sync.
> 

I'm not seeing the need to move the initial buffer sync.  All
callers of swiotlb_tbl_map_single() already have a subsequent
check for a non-coherent device, and a call to 
arch_sync_dma_for_device().  And the Xen code has some 
special handling that probably shouldn't go in
swiotlb_tbl_map_single().  Or am I missing something?

Michael

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
  2024-04-15 12:23   ` Michael Kelley
@ 2024-04-15 12:50     ` Petr Tesařík
  2024-04-15 13:03       ` Michael Kelley
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Tesařík @ 2024-04-15 12:50 UTC (permalink / raw)
  To: Michael Kelley
  Cc: mhkelley58@gmail.com, robin.murphy@arm.com, joro@8bytes.org,
	will@kernel.org, jgross@suse.com, sstabellini@kernel.org,
	oleksandr_tyshchenko@epam.com, hch@lst.de,
	m.szyprowski@samsung.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
	roberto.sassu@huaweicloud.com

On Mon, 15 Apr 2024 12:23:22 +0000
Michael Kelley <mhklinux@outlook.com> wrote:

> From: Petr Tesařík <petr@tesarici.cz> Sent: Monday, April 15, 2024 4:46 AM
> > 
> > Hi Michael,
> > 
> > sorry for taking so long to answer. Yes, there was no agreement on the
> > removal of the "dir" parameter, but I'm not sure it's because of
> > symmetry with swiotlb_sync_*(), because the topic was not really
> > discussed.
> > 
> > The discussion was about the KUnit test suite and whether direction is
> > a property of the bounce buffer or of each sync operation. Since DMA API
> > defines associates each DMA buffer with a direction, the direction
> > parameter passed to swiotlb_sync_*() should match what was passed to
> > swiotlb_tbl_map_single(), because that's how it is used by the generic
> > DMA code. In other words, if the parameter is kept, it should be kept
> > to match dma_map_*().
> > 
> > However, there is also symmetry with swiotlb_tbl_unmap_single(). This
> > function does use the parameter for the final sync. I believe there
> > should be a matching initial sync in swiotlb_tbl_map_single(). In
> > short, the buffer sync for DMA non-coherent devices should be moved from
> > swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
> > then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
> > the flags parameter.
> > 
> > To sum it up:
> > 
> > * Do *NOT* remove the "dir" parameter.
> > * Let me send a patch which moves the initial buffer sync.
> >   
> 
> I'm not seeing the need to move the initial buffer sync.  All
> callers of swiotlb_tbl_map_single() already have a subsequent
> check for a non-coherent device, and a call to 
> arch_sync_dma_for_device().  And the Xen code has some 
> special handling that probably shouldn't go in
> swiotlb_tbl_map_single().  Or am I missing something?

Oh, sure, there's nothing broken ATM. It's merely a cleanup. The API is
asymmetric and thus confusing. You get a final sync by default if you
call swiotlb_tbl_unmap_single(), but you don't get an initial sync by
default if you call swiotlb_tbl_map_single(). This is difficult to
remember, so potential new users of the API may incorrectly assume that
an initial sync is done, or that a final sync is not done.

And yes, when moving the code, all current users of
swiotlb_tbl_map_single() should specify DMA_ATTR_SKIP_CPU_SYNC.

Petr T

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
  2024-04-15 12:50     ` Petr Tesařík
@ 2024-04-15 13:03       ` Michael Kelley
  2024-04-15 13:19         ` Petr Tesařík
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Kelley @ 2024-04-15 13:03 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: mhkelley58@gmail.com, robin.murphy@arm.com, joro@8bytes.org,
	will@kernel.org, jgross@suse.com, sstabellini@kernel.org,
	oleksandr_tyshchenko@epam.com, hch@lst.de,
	m.szyprowski@samsung.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
	roberto.sassu@huaweicloud.com

From: Petr Tesařík <petr@tesarici.cz> Sent: Monday, April 15, 2024 5:50 AM
> 
> On Mon, 15 Apr 2024 12:23:22 +0000
> Michael Kelley <mhklinux@outlook.com> wrote:
> 
> > From: Petr Tesařík <petr@tesarici.cz> Sent: Monday, April 15, 2024 4:46 AM
> > >
> > > Hi Michael,
> > >
> > > sorry for taking so long to answer. Yes, there was no agreement on the
> > > removal of the "dir" parameter, but I'm not sure it's because of
> > > symmetry with swiotlb_sync_*(), because the topic was not really
> > > discussed.
> > >
> > > The discussion was about the KUnit test suite and whether direction is
> > > a property of the bounce buffer or of each sync operation. Since DMA API
> > > defines associates each DMA buffer with a direction, the direction
> > > parameter passed to swiotlb_sync_*() should match what was passed to
> > > swiotlb_tbl_map_single(), because that's how it is used by the generic
> > > DMA code. In other words, if the parameter is kept, it should be kept
> > > to match dma_map_*().
> > >
> > > However, there is also symmetry with swiotlb_tbl_unmap_single(). This
> > > function does use the parameter for the final sync. I believe there
> > > should be a matching initial sync in swiotlb_tbl_map_single(). In
> > > short, the buffer sync for DMA non-coherent devices should be moved from
> > > swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
> > > then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
> > > the flags parameter.
> > >
> > > To sum it up:
> > >
> > > * Do *NOT* remove the "dir" parameter.
> > > * Let me send a patch which moves the initial buffer sync.
> > >
> >
> > I'm not seeing the need to move the initial buffer sync.  All
> > callers of swiotlb_tbl_map_single() already have a subsequent
> > check for a non-coherent device, and a call to
> > arch_sync_dma_for_device().  And the Xen code has some
> > special handling that probably shouldn't go in
> > swiotlb_tbl_map_single().  Or am I missing something?
> 
> Oh, sure, there's nothing broken ATM. It's merely a cleanup. The API is
> asymmetric and thus confusing. You get a final sync by default if you
> call swiotlb_tbl_unmap_single(), 

I don't see that final sync in swiotlb_tbl_unmap_single().  It calls
swiotlb_bounce() to copy the data, but it doesn't deal with
non-coherent devices or call arch_sync_dma_for_cpu().

> but you don't get an initial sync by
> default if you call swiotlb_tbl_map_single(). This is difficult to
> remember, so potential new users of the API may incorrectly assume that
> an initial sync is done, or that a final sync is not done.
> 
> And yes, when moving the code, all current users of
> swiotlb_tbl_map_single() should specify DMA_ATTR_SKIP_CPU_SYNC.
> 
> Petr T

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
  2024-04-15 13:03       ` Michael Kelley
@ 2024-04-15 13:19         ` Petr Tesařík
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Tesařík @ 2024-04-15 13:19 UTC (permalink / raw)
  To: Michael Kelley
  Cc: mhkelley58@gmail.com, robin.murphy@arm.com, joro@8bytes.org,
	will@kernel.org, jgross@suse.com, sstabellini@kernel.org,
	oleksandr_tyshchenko@epam.com, hch@lst.de,
	m.szyprowski@samsung.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
	roberto.sassu@huaweicloud.com

On Mon, 15 Apr 2024 13:03:30 +0000
Michael Kelley <mhklinux@outlook.com> wrote:

> From: Petr Tesařík <petr@tesarici.cz> Sent: Monday, April 15, 2024 5:50 AM
> > 
> > On Mon, 15 Apr 2024 12:23:22 +0000
> > Michael Kelley <mhklinux@outlook.com> wrote:
> >   
> > > From: Petr Tesařík <petr@tesarici.cz> Sent: Monday, April 15, 2024 4:46 AM  
> > > >
> > > > Hi Michael,
> > > >
> > > > sorry for taking so long to answer. Yes, there was no agreement on the
> > > > removal of the "dir" parameter, but I'm not sure it's because of
> > > > symmetry with swiotlb_sync_*(), because the topic was not really
> > > > discussed.
> > > >
> > > > The discussion was about the KUnit test suite and whether direction is
> > > > a property of the bounce buffer or of each sync operation. Since DMA API
> > > > defines associates each DMA buffer with a direction, the direction
> > > > parameter passed to swiotlb_sync_*() should match what was passed to
> > > > swiotlb_tbl_map_single(), because that's how it is used by the generic
> > > > DMA code. In other words, if the parameter is kept, it should be kept
> > > > to match dma_map_*().
> > > >
> > > > However, there is also symmetry with swiotlb_tbl_unmap_single(). This
> > > > function does use the parameter for the final sync. I believe there
> > > > should be a matching initial sync in swiotlb_tbl_map_single(). In
> > > > short, the buffer sync for DMA non-coherent devices should be moved from
> > > > swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
> > > > then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
> > > > the flags parameter.
> > > >
> > > > To sum it up:
> > > >
> > > > * Do *NOT* remove the "dir" parameter.
> > > > * Let me send a patch which moves the initial buffer sync.
> > > >  
> > >
> > > I'm not seeing the need to move the initial buffer sync.  All
> > > callers of swiotlb_tbl_map_single() already have a subsequent
> > > check for a non-coherent device, and a call to
> > > arch_sync_dma_for_device().  And the Xen code has some
> > > special handling that probably shouldn't go in
> > > swiotlb_tbl_map_single().  Or am I missing something?  
> > 
> > Oh, sure, there's nothing broken ATM. It's merely a cleanup. The API is
> > asymmetric and thus confusing. You get a final sync by default if you
> > call swiotlb_tbl_unmap_single(),   
> 
> I don't see that final sync in swiotlb_tbl_unmap_single().  It calls
> swiotlb_bounce() to copy the data, but it doesn't deal with
> non-coherent devices or call arch_sync_dma_for_cpu().

Ouch. You're right! The buffer gets only bounced but not synced if
device DMA is non-coherent. So, how is this supposed to work?

Now I'm looking at the code in dma_direct_map_page(), and it calls
arch_sync_dma_for_device() explicitly, _except_ when using SWIOTLB. So,
maybe I should instead review all callers of swiotlb_map(), make sure
that they handle non-coherent devices, and then remove the sync from
swiotlb_map()?

I mean, the current situation seems somewhat disorganized to me.

Petr T

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
  2024-04-08  4:11 [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() mhkelley58
  2024-04-08  4:11 ` [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices mhkelley58
  2024-04-15 11:46 ` [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() Petr Tesařík
@ 2024-05-06 15:14 ` Michael Kelley
  2024-05-06 15:35   ` Petr Tesařík
  2 siblings, 1 reply; 11+ messages in thread
From: Michael Kelley @ 2024-05-06 15:14 UTC (permalink / raw)
  To: Michael Kelley, robin.murphy@arm.com, joro@8bytes.org,
	will@kernel.org, jgross@suse.com, sstabellini@kernel.org,
	oleksandr_tyshchenko@epam.com, hch@lst.de,
	m.szyprowski@samsung.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org
  Cc: petr@tesarici.cz, roberto.sassu@huaweicloud.com

From: mhkelley58@gmail.com <mhkelley58@gmail.com>
> 

Gentle ping ...

Anyone interested in reviewing this series of two patches?  It fixes
an edge case bug in the size of the swiotlb request coming from
dma-iommu, and plugs a hole that allows untrusted devices to see
kernel data unrelated to the intended DMA transfer.  I think these are
the last "known bugs" that came out of the extensive swiotlb discussion
and patches for 6.9.

Michael

> Currently swiotlb_tbl_map_single() takes alloc_align_mask and
> alloc_size arguments to specify an swiotlb allocation that is
> larger than mapping_size. This larger allocation is used solely
> by iommu_dma_map_single() to handle untrusted devices that should
> not have DMA visibility to memory pages that are partially used
> for unrelated kernel data.
> 
> Having two arguments to specify the allocation is redundant. While
> alloc_align_mask naturally specifies the alignment of the starting
> address of the allocation, it can also implicitly specify the size
> by rounding up the mapping_size to that alignment.
> 
> Additionally, the current approach has an edge case bug.
> iommu_dma_map_page() already does the rounding up to compute the
> alloc_size argument. But swiotlb_tbl_map_single() then calculates
> the alignment offset based on the DMA min_align_mask, and adds
> that offset to alloc_size. If the offset is non-zero, the addition
> may result in a value that is larger than the max the swiotlb can
> allocate. If the rounding up is done _after_ the alignment offset is
> added to the mapping_size (and the original mapping_size conforms to
> the value returned by swiotlb_max_mapping_size), then the max that the
> swiotlb can allocate will not be exceeded.
> 
> In view of these issues, simplify the swiotlb_tbl_map_single() interface
> by removing the alloc_size argument. Most call sites pass the same
> value for mapping_size and alloc_size, and they pass alloc_align_mask
> as zero. Just remove the redundant argument from these callers, as they
> will see no functional change. For iommu_dma_map_page() also remove
> the alloc_size argument, and have swiotlb_tbl_map_single() compute
> the alloc_size by rounding up mapping_size after adding the offset
> based on min_align_mask. This has the side effect of fixing the
> edge case bug but with no other functional change.
> 
> Also add a sanity test on the alloc_align_mask. While IOMMU code
> currently ensures the granule is not larger than PAGE_SIZE, if
> that guarantee were to be removed in the future, the downstream
> effect on the swiotlb might go unnoticed until strange allocation
> failures occurred.
> 
> Tested on an ARM64 system with 16K page size and some kernel
> test-only hackery to allow modifying the DMA min_align_mask and
> the granule size that becomes the alloc_align_mask. Tested these
> combinations with a variety of original memory addresses and
> sizes, including those that reproduce the edge case bug:
> 
> * 4K granule and 0 min_align_mask
> * 4K granule and 0xFFF min_align_mask (4K - 1)
> * 16K granule and 0xFFF min_align_mask
> * 64K granule and 0xFFF min_align_mask
> * 64K granule and 0x3FFF min_align_mask (16K - 1)
> 
> With the changes, all combinations pass.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> I've haven't used any "Fixes:" tags. This patch really should be
> backported only if all the other recent swiotlb fixes get backported,
> and I'm unclear on whether that will happen.
> 
> I saw the brief discussion about removing the "dir" parameter from
> swiotlb_tbl_map_single(). That removal could easily be done as part
> of this patch, since it's already changing the swiotlb_tbl_map_single()
> parameters. But I think the conclusion of the discussion was to leave
> the "dir" parameter for symmetry with the swiotlb_sync_*() functions.
> Please correct me if that's wrong, and I'll respin this patch to do
> the removal.
> 
>  drivers/iommu/dma-iommu.c |  2 +-
>  drivers/xen/swiotlb-xen.c |  2 +-
>  include/linux/swiotlb.h   |  2 +-
>  kernel/dma/swiotlb.c      | 56 +++++++++++++++++++++++++++++----------
>  4 files changed, 45 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 07d087eecc17..c21ef1388499 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1165,7 +1165,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>  		trace_swiotlb_bounced(dev, phys, size);
> 
>  		aligned_size = iova_align(iovad, size);
> -		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
> +		phys = swiotlb_tbl_map_single(dev, phys, size,
>  					      iova_mask(iovad), dir, attrs);
> 
>  		if (phys == DMA_MAPPING_ERROR)
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 1c4ef5111651..6579ae3f6dac 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -216,7 +216,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
>  	 */
>  	trace_swiotlb_bounced(dev, dev_addr, size);
> 
> -	map = swiotlb_tbl_map_single(dev, phys, size, size, 0, dir, attrs);
> +	map = swiotlb_tbl_map_single(dev, phys, size, 0, dir, attrs);
>  	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>  		return DMA_MAPPING_ERROR;
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index ea23097e351f..14bc10c1bb23 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -43,7 +43,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  extern void __init swiotlb_update_mem_attributes(void);
> 
>  phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> -		size_t mapping_size, size_t alloc_size,
> +		size_t mapping_size,
>  		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
>  		unsigned long attrs);
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index a5e0dfc44d24..046da973a7e2 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -1340,15 +1340,40 @@ static unsigned long mem_used(struct io_tlb_mem
> *mem)
> 
>  #endif /* CONFIG_DEBUG_FS */
> 
> +/**
> + * swiotlb_tbl_map_single() - bounce buffer map a single contiguous physical area
> + * @dev:		Device which maps the buffer.
> + * @orig_addr:		Original (non-bounced) physical IO buffer address
> + * @mapping_size:	Requested size of the actual bounce buffer, excluding
> + *			any pre- or post-padding for alignment
> + * @alloc_align_mask:	Required start and end alignment of the allocated buffer
> + * @dir:		DMA direction
> + * @attrs:		Optional DMA attributes for the map operation
> + *
> + * Find and allocate a suitable sequence of IO TLB slots for the request.
> + * The allocated space starts at an alignment specified by alloc_align_mask,
> + * and the size of the allocated space is rounded up so that the total amount
> + * of allocated space is a multiple of (alloc_align_mask + 1). If
> + * alloc_align_mask is zero, the allocated space may be at any alignment and
> + * the size is not rounded up.
> + *
> + * The returned address is within the allocated space and matches the bits
> + * of orig_addr that are specified in the DMA min_align_mask for the device. As
> + * such, this returned address may be offset from the beginning of the allocated
> + * space. The bounce buffer space starting at the returned address for
> + * mapping_size bytes is initialized to the contents of the original IO buffer
> + * area. Any pre-padding (due to an offset) and any post-padding (due to
> + * rounding-up the size) is not initialized.
> + */
>  phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> -		size_t mapping_size, size_t alloc_size,
> -		unsigned int alloc_align_mask, enum dma_data_direction dir,
> -		unsigned long attrs)
> +		size_t mapping_size, unsigned int alloc_align_mask,
> +		enum dma_data_direction dir, unsigned long attrs)
>  {
>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>  	unsigned int offset;
>  	struct io_tlb_pool *pool;
>  	unsigned int i;
> +	size_t size;
>  	int index;
>  	phys_addr_t tlb_addr;
>  	unsigned short pad_slots;
> @@ -1362,20 +1387,24 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>  	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>  		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> 
> -	if (mapping_size > alloc_size) {
> -		dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
> -			      mapping_size, alloc_size);
> -		return (phys_addr_t)DMA_MAPPING_ERROR;
> -	}
> +	/*
> +	 * The default swiotlb memory pool is allocated with PAGE_SIZE
> +	 * alignment. If a mapping is requested with larger alignment,
> +	 * the mapping may be unable to use the initial slot(s) in all
> +	 * sets of IO_TLB_SEGSIZE slots. In such case, a mapping request
> +	 * of or near the maximum mapping size would always fail.
> +	 */
> +	dev_WARN_ONCE(dev, alloc_align_mask > ~PAGE_MASK,
> +		"Alloc alignment may prevent fulfilling requests with max mapping_size\n");
> 
>  	offset = swiotlb_align_offset(dev, alloc_align_mask, orig_addr);
> -	index = swiotlb_find_slots(dev, orig_addr,
> -				   alloc_size + offset, alloc_align_mask, &pool);
> +	size = ALIGN(mapping_size + offset, alloc_align_mask + 1);
> +	index = swiotlb_find_slots(dev, orig_addr, size, alloc_align_mask, &pool);
>  	if (index == -1) {
>  		if (!(attrs & DMA_ATTR_NO_WARN))
>  			dev_warn_ratelimited(dev,
>  	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
> -				 alloc_size, mem->nslabs, mem_used(mem));
> +				 size, mem->nslabs, mem_used(mem));
>  		return (phys_addr_t)DMA_MAPPING_ERROR;
>  	}
> 
> @@ -1388,7 +1417,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>  	offset &= (IO_TLB_SIZE - 1);
>  	index += pad_slots;
>  	pool->slots[index].pad_slots = pad_slots;
> -	for (i = 0; i < nr_slots(alloc_size + offset); i++)
> +	for (i = 0; i < (nr_slots(size) - pad_slots); i++)
>  		pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
>  	tlb_addr = slot_addr(pool->start, index) + offset;
>  	/*
> @@ -1543,8 +1572,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> 
>  	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size);
> 
> -	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, 0, dir,
> -			attrs);
> +	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, 0, dir, attrs);
>  	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
>  		return DMA_MAPPING_ERROR;
> 
> --
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()
  2024-05-06 15:14 ` Michael Kelley
@ 2024-05-06 15:35   ` Petr Tesařík
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Tesařík @ 2024-05-06 15:35 UTC (permalink / raw)
  To: Michael Kelley
  Cc: robin.murphy@arm.com, joro@8bytes.org, will@kernel.org,
	jgross@suse.com, sstabellini@kernel.org,
	oleksandr_tyshchenko@epam.com, hch@lst.de,
	m.szyprowski@samsung.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
	roberto.sassu@huaweicloud.com

On Mon, 6 May 2024 15:14:05 +0000
Michael Kelley <mhklinux@outlook.com> wrote:

> From: mhkelley58@gmail.com <mhkelley58@gmail.com>
> >   
> 
> Gentle ping ...
> 
> Anyone interested in reviewing this series of two patches?  It fixes
> an edge case bug in the size of the swiotlb request coming from
> dma-iommu, and plugs a hole that allows untrusted devices to see
> kernel data unrelated to the intended DMA transfer.  I think these are
> the last "known bugs" that came out of the extensive swiotlb discussion
> and patches for 6.9.
> 
> Michael
> 
> > Currently swiotlb_tbl_map_single() takes alloc_align_mask and
> > alloc_size arguments to specify an swiotlb allocation that is
> > larger than mapping_size. This larger allocation is used solely
> > by iommu_dma_map_single() to handle untrusted devices that should
> > not have DMA visibility to memory pages that are partially used
> > for unrelated kernel data.
> > 
> > Having two arguments to specify the allocation is redundant. While
> > alloc_align_mask naturally specifies the alignment of the starting
> > address of the allocation, it can also implicitly specify the size
> > by rounding up the mapping_size to that alignment.
> > 
> > Additionally, the current approach has an edge case bug.
> > iommu_dma_map_page() already does the rounding up to compute the
> > alloc_size argument. But swiotlb_tbl_map_single() then calculates
> > the alignment offset based on the DMA min_align_mask, and adds
> > that offset to alloc_size. If the offset is non-zero, the addition
> > may result in a value that is larger than the max the swiotlb can
> > allocate. If the rounding up is done _after_ the alignment offset is
> > added to the mapping_size (and the original mapping_size conforms to
> > the value returned by swiotlb_max_mapping_size), then the max that the
> > swiotlb can allocate will not be exceeded.
> > 
> > In view of these issues, simplify the swiotlb_tbl_map_single() interface
> > by removing the alloc_size argument. Most call sites pass the same
> > value for mapping_size and alloc_size, and they pass alloc_align_mask
> > as zero. Just remove the redundant argument from these callers, as they
> > will see no functional change. For iommu_dma_map_page() also remove
> > the alloc_size argument, and have swiotlb_tbl_map_single() compute
> > the alloc_size by rounding up mapping_size after adding the offset
> > based on min_align_mask. This has the side effect of fixing the
> > edge case bug but with no other functional change.
> > 
> > Also add a sanity test on the alloc_align_mask. While IOMMU code
> > currently ensures the granule is not larger than PAGE_SIZE, if
> > that guarantee were to be removed in the future, the downstream
> > effect on the swiotlb might go unnoticed until strange allocation
> > failures occurred.
> > 
> > Tested on an ARM64 system with 16K page size and some kernel
> > test-only hackery to allow modifying the DMA min_align_mask and
> > the granule size that becomes the alloc_align_mask. Tested these
> > combinations with a variety of original memory addresses and
> > sizes, including those that reproduce the edge case bug:
> > 
> > * 4K granule and 0 min_align_mask
> > * 4K granule and 0xFFF min_align_mask (4K - 1)
> > * 16K granule and 0xFFF min_align_mask
> > * 64K granule and 0xFFF min_align_mask
> > * 64K granule and 0x3FFF min_align_mask (16K - 1)
> > 
> > With the changes, all combinations pass.
> > 
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>

Looks good to me. My previous discussion was not related to this
change; I was merely trying to find an answer to your question whether
anything else should be changed, and IIUC the result was that not.

Reviewed-by: Petr Tesarik <petr@tesarici.cz>

Petr T

> > ---
> > I've haven't used any "Fixes:" tags. This patch really should be
> > backported only if all the other recent swiotlb fixes get backported,
> > and I'm unclear on whether that will happen.
> > 
> > I saw the brief discussion about removing the "dir" parameter from
> > swiotlb_tbl_map_single(). That removal could easily be done as part
> > of this patch, since it's already changing the swiotlb_tbl_map_single()
> > parameters. But I think the conclusion of the discussion was to leave
> > the "dir" parameter for symmetry with the swiotlb_sync_*() functions.
> > Please correct me if that's wrong, and I'll respin this patch to do
> > the removal.
> > 
> >  drivers/iommu/dma-iommu.c |  2 +-
> >  drivers/xen/swiotlb-xen.c |  2 +-
> >  include/linux/swiotlb.h   |  2 +-
> >  kernel/dma/swiotlb.c      | 56 +++++++++++++++++++++++++++++----------
> >  4 files changed, 45 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 07d087eecc17..c21ef1388499 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1165,7 +1165,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> >  		trace_swiotlb_bounced(dev, phys, size);
> > 
> >  		aligned_size = iova_align(iovad, size);
> > -		phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
> > +		phys = swiotlb_tbl_map_single(dev, phys, size,
> >  					      iova_mask(iovad), dir, attrs);
> > 
> >  		if (phys == DMA_MAPPING_ERROR)
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 1c4ef5111651..6579ae3f6dac 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -216,7 +216,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> >  	 */
> >  	trace_swiotlb_bounced(dev, dev_addr, size);
> > 
> > -	map = swiotlb_tbl_map_single(dev, phys, size, size, 0, dir, attrs);
> > +	map = swiotlb_tbl_map_single(dev, phys, size, 0, dir, attrs);
> >  	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
> >  		return DMA_MAPPING_ERROR;
> > 
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index ea23097e351f..14bc10c1bb23 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -43,7 +43,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> >  extern void __init swiotlb_update_mem_attributes(void);
> > 
> >  phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> > -		size_t mapping_size, size_t alloc_size,
> > +		size_t mapping_size,
> >  		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
> >  		unsigned long attrs);
> > 
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index a5e0dfc44d24..046da973a7e2 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -1340,15 +1340,40 @@ static unsigned long mem_used(struct io_tlb_mem
> > *mem)
> > 
> >  #endif /* CONFIG_DEBUG_FS */
> > 
> > +/**
> > + * swiotlb_tbl_map_single() - bounce buffer map a single contiguous physical area
> > + * @dev:		Device which maps the buffer.
> > + * @orig_addr:		Original (non-bounced) physical IO buffer address
> > + * @mapping_size:	Requested size of the actual bounce buffer, excluding
> > + *			any pre- or post-padding for alignment
> > + * @alloc_align_mask:	Required start and end alignment of the allocated buffer
> > + * @dir:		DMA direction
> > + * @attrs:		Optional DMA attributes for the map operation
> > + *
> > + * Find and allocate a suitable sequence of IO TLB slots for the request.
> > + * The allocated space starts at an alignment specified by alloc_align_mask,
> > + * and the size of the allocated space is rounded up so that the total amount
> > + * of allocated space is a multiple of (alloc_align_mask + 1). If
> > + * alloc_align_mask is zero, the allocated space may be at any alignment and
> > + * the size is not rounded up.
> > + *
> > + * The returned address is within the allocated space and matches the bits
> > + * of orig_addr that are specified in the DMA min_align_mask for the device. As
> > + * such, this returned address may be offset from the beginning of the allocated
> > + * space. The bounce buffer space starting at the returned address for
> > + * mapping_size bytes is initialized to the contents of the original IO buffer
> > + * area. Any pre-padding (due to an offset) and any post-padding (due to
> > + * rounding-up the size) is not initialized.
> > + */
> >  phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > -		size_t mapping_size, size_t alloc_size,
> > -		unsigned int alloc_align_mask, enum dma_data_direction dir,
> > -		unsigned long attrs)
> > +		size_t mapping_size, unsigned int alloc_align_mask,
> > +		enum dma_data_direction dir, unsigned long attrs)
> >  {
> >  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> >  	unsigned int offset;
> >  	struct io_tlb_pool *pool;
> >  	unsigned int i;
> > +	size_t size;
> >  	int index;
> >  	phys_addr_t tlb_addr;
> >  	unsigned short pad_slots;
> > @@ -1362,20 +1387,24 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >  	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> >  		pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> > 
> > -	if (mapping_size > alloc_size) {
> > -		dev_warn_once(dev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
> > -			      mapping_size, alloc_size);
> > -		return (phys_addr_t)DMA_MAPPING_ERROR;
> > -	}
> > +	/*
> > +	 * The default swiotlb memory pool is allocated with PAGE_SIZE
> > +	 * alignment. If a mapping is requested with larger alignment,
> > +	 * the mapping may be unable to use the initial slot(s) in all
> > +	 * sets of IO_TLB_SEGSIZE slots. In such case, a mapping request
> > +	 * of or near the maximum mapping size would always fail.
> > +	 */
> > +	dev_WARN_ONCE(dev, alloc_align_mask > ~PAGE_MASK,
> > +		"Alloc alignment may prevent fulfilling requests with max mapping_size\n");
> > 
> >  	offset = swiotlb_align_offset(dev, alloc_align_mask, orig_addr);
> > -	index = swiotlb_find_slots(dev, orig_addr,
> > -				   alloc_size + offset, alloc_align_mask, &pool);
> > +	size = ALIGN(mapping_size + offset, alloc_align_mask + 1);
> > +	index = swiotlb_find_slots(dev, orig_addr, size, alloc_align_mask, &pool);
> >  	if (index == -1) {
> >  		if (!(attrs & DMA_ATTR_NO_WARN))
> >  			dev_warn_ratelimited(dev,
> >  	"swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
> > -				 alloc_size, mem->nslabs, mem_used(mem));
> > +				 size, mem->nslabs, mem_used(mem));
> >  		return (phys_addr_t)DMA_MAPPING_ERROR;
> >  	}
> > 
> > @@ -1388,7 +1417,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> >  	offset &= (IO_TLB_SIZE - 1);
> >  	index += pad_slots;
> >  	pool->slots[index].pad_slots = pad_slots;
> > -	for (i = 0; i < nr_slots(alloc_size + offset); i++)
> > +	for (i = 0; i < (nr_slots(size) - pad_slots); i++)
> >  		pool->slots[index + i].orig_addr = slot_addr(orig_addr, i);
> >  	tlb_addr = slot_addr(pool->start, index) + offset;
> >  	/*
> > @@ -1543,8 +1572,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> > 
> >  	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size);
> > 
> > -	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, 0, dir,
> > -			attrs);
> > +	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, 0, dir, attrs);
> >  	if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
> >  		return DMA_MAPPING_ERROR;
> > 
> > --
> > 2.25.1
> >   
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices
  2024-04-08  4:11 ` [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices mhkelley58
@ 2024-05-06 15:48   ` Petr Tesařík
  2024-05-07  5:36   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Petr Tesařík @ 2024-05-06 15:48 UTC (permalink / raw)
  To: mhkelley58
  Cc: mhklinux, robin.murphy, joro, will, jgross, sstabellini,
	oleksandr_tyshchenko, hch, m.szyprowski, iommu, linux-kernel,
	xen-devel, roberto.sassu

V Sun,  7 Apr 2024 21:11:42 -0700
mhkelley58@gmail.com napsáno:

> From: Michael Kelley <mhklinux@outlook.com>
> 
> iommu_dma_map_page() allocates swiotlb memory as a bounce buffer when
> an untrusted device wants to map only part of the memory in an
> granule. The goal is to disallow the untrusted device having
> DMA access to unrelated kernel data that may be sharing the granule.
> To meet this goal, the bounce buffer itself is zero'ed, and any
> additional swiotlb memory up to alloc_size after the bounce buffer
> end (i.e., "post-padding") is also zero'ed.
> 
> However, as of commit 901c7280ca0d ("Reinstate some of "swiotlb: rework
> "fix info leak with DMA_FROM_DEVICE"""), swiotlb_tbl_map_single()
> always initializes the contents of the bounce buffer to the original
> memory. Zero'ing the bounce buffer is redundant and probably wrong per
> the discussion in that commit. Only the post-padding needs to be
> zero'ed.
> 
> Also, when the DMA min_align_mask is non-zero, the allocated bounce
> buffer space may not start on a granule boundary. The swiotlb memory
> from the granule boundary to the start of the allocated bounce buffer
> might belong to some unrelated bounce buffer. So as described in the
> "second issue" in [1], it can't be zero'ed to protect against untrusted
> devices. But as of commit XXXXXXXXXXXX ("swiotlb: extend buffer
> pre-padding to alloc_align_mask if necessary"), swiotlb_tbl_map_single()

This is now commit af133562d5af.

> allocates pre-padding slots when necessary to meet min_align_mask
> requirements, making it possible to zero the pre-padding area as well.
> 
> Finally, iommu_dma_map_page() uses the swiotlb for untrusted devices
> and also for certain kmalloc() memory. Current code does the zero'ing
> for both cases, but it is needed only for the untrusted device case.
> 
> Fix all of this by updating iommu_dma_map_page() to zero both the
> pre-padding and post-padding areas, but not the actual bounce buffer.
> Do this only in the case where the bounce buffer is used because
> of an untrusted device.
> 
> [1] https://lore.kernel.org/all/20210929023300.335969-1-stevensd@google.com/
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> I've wondered if this code for zero'ing the pre- and post-padding
> should go in swiotlb_tbl_map_single(). The bounce buffer proper is
> already being initialized there. But swiotlb_tbl_map_single()
> would need to test for an untrusted device (or have a "zero the
> padding" flag passed in as part of the "attrs" argument), which
> adds complexity. Thoughts?

Historically, swiotlb has never cared about exposing data from a
previous user of a bounce buffer. I assume that's because it was
pointless to make an attempt at protecting system memory from a
malicious device that can do DMA to any address anyway. The situation
has changed with hardware IOMMUs, and that could be why the zeroing is
only done in the IOMMU path.

In short, if anybody can explain the value of concealing potentially
sensitive data from devices that are not behind an IOMMU, let's move
the zeroing to swiotlb. Otherwise, let's keep what we have.

Other than that (and the missing commit id), the patch looks good to me.

Reviewed-by: Petr Tesarik <petr@tesarici.cz>

Petr T

> 
> The commit ID of Petr's patch is X'ed out above because Petr's patch
> hasn't gone into Linus' tree yet. We can add the real commit ID once
> this patch is ready to go in.
> 
> Also I've haven't used any "Fixes:" tags. This patch really should
> be backported only if all the other recent swiotlb fixes get
> backported, and I'm unclear on whether that will happen.
> 
>  drivers/iommu/dma-iommu.c | 29 ++++++++++++++++-------------
>  include/linux/iova.h      |  5 +++++
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c21ef1388499..ecac39b3190d 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1154,9 +1154,6 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>  	 */
>  	if (dev_use_swiotlb(dev, size, dir) &&
>  	    iova_offset(iovad, phys | size)) {
> -		void *padding_start;
> -		size_t padding_size, aligned_size;
> -
>  		if (!is_swiotlb_active(dev)) {
>  			dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unaligned transaction.\n");
>  			return DMA_MAPPING_ERROR;
> @@ -1164,24 +1161,30 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>  
>  		trace_swiotlb_bounced(dev, phys, size);
>  
> -		aligned_size = iova_align(iovad, size);
>  		phys = swiotlb_tbl_map_single(dev, phys, size,
>  					      iova_mask(iovad), dir, attrs);
>  
>  		if (phys == DMA_MAPPING_ERROR)
>  			return DMA_MAPPING_ERROR;
>  
> -		/* Cleanup the padding area. */
> -		padding_start = phys_to_virt(phys);
> -		padding_size = aligned_size;
> +		/*
> +		 * Untrusted devices should not see padding areas with random
> +		 * leftover kernel data, so zero the pre- and post-padding.
> +		 * swiotlb_tbl_map_single() has initialized the bounce buffer
> +		 * proper to the contents of the original memory buffer.
> +		 */
> +		if (dev_is_untrusted(dev)) {
> +			size_t start, virt = (size_t)phys_to_virt(phys);
>  
> -		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> -		    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> -			padding_start += size;
> -			padding_size -= size;
> -		}
> +			/* Pre-padding */
> +			start = iova_align_down(iovad, virt);
> +			memset((void *)start, 0, virt - start);
>  
> -		memset(padding_start, 0, padding_size);
> +			/* Post-padding */
> +			start = virt + size;
> +			memset((void *)start, 0,
> +			       iova_align(iovad, start) - start);
> +		}
>  	}
>  
>  	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 83c00fac2acb..d2c4fd923efa 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -65,6 +65,11 @@ static inline size_t iova_align(struct iova_domain *iovad, size_t size)
>  	return ALIGN(size, iovad->granule);
>  }
>  
> +static inline size_t iova_align_down(struct iova_domain *iovad, size_t size)
> +{
> +	return ALIGN_DOWN(size, iovad->granule);
> +}
> +
>  static inline dma_addr_t iova_dma_addr(struct iova_domain *iovad, struct iova *iova)
>  {
>  	return (dma_addr_t)iova->pfn_lo << iova_shift(iovad);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices
  2024-04-08  4:11 ` [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices mhkelley58
  2024-05-06 15:48   ` Petr Tesařík
@ 2024-05-07  5:36   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2024-05-07  5:36 UTC (permalink / raw)
  To: mhklinux
  Cc: robin.murphy, joro, will, jgross, sstabellini,
	oleksandr_tyshchenko, hch, m.szyprowski, iommu, linux-kernel,
	xen-devel, petr, roberto.sassu

On Sun, Apr 07, 2024 at 09:11:42PM -0700, mhkelley58@gmail.com wrote:
> I've wondered if this code for zero'ing the pre- and post-padding
> should go in swiotlb_tbl_map_single(). The bounce buffer proper is
> already being initialized there. But swiotlb_tbl_map_single()
> would need to test for an untrusted device (or have a "zero the
> padding" flag passed in as part of the "attrs" argument), which
> adds complexity. Thoughts?

If we want to go down that route it should be the latter.  I'm
not sure if it is an improvement, but we'd have to implement it
to see if it does.

> The commit ID of Petr's patch is X'ed out above because Petr's patch
> hasn't gone into Linus' tree yet. We can add the real commit ID once
> this patch is ready to go in.

I've fixed that up and commit the series.

Thanks a lot!


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-05-07  5:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-08  4:11 [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() mhkelley58
2024-04-08  4:11 ` [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices mhkelley58
2024-05-06 15:48   ` Petr Tesařík
2024-05-07  5:36   ` Christoph Hellwig
2024-04-15 11:46 ` [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single() Petr Tesařík
2024-04-15 12:23   ` Michael Kelley
2024-04-15 12:50     ` Petr Tesařík
2024-04-15 13:03       ` Michael Kelley
2024-04-15 13:19         ` Petr Tesařík
2024-05-06 15:14 ` Michael Kelley
2024-05-06 15:35   ` Petr Tesařík

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox