* [PATCH v4 01/14] swiotlb: Remove external access to io_tlb_start
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 8:40 ` Christoph Hellwig
2021-02-09 6:21 ` [PATCH v4 02/14] swiotlb: Move is_swiotlb_buffer() to swiotlb.c Claire Chang
` (14 subsequent siblings)
15 siblings, 1 reply; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Add a new function, get_swiotlb_start(), and remove external access to
io_tlb_start, so we can entirely hide struct swiotlb inside of swiotlb.c
in the following patches.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
arch/powerpc/platforms/pseries/svm.c | 4 ++--
drivers/xen/swiotlb-xen.c | 4 ++--
include/linux/swiotlb.h | 1 +
kernel/dma/swiotlb.c | 5 +++++
4 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
index 7b739cc7a8a9..c10c51d49f3d 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -55,8 +55,8 @@ void __init svm_swiotlb_init(void)
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, false))
return;
- if (io_tlb_start)
- memblock_free_early(io_tlb_start,
+ if (vstart)
+ memblock_free_early(vstart,
PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
panic("SVM: Cannot allocate SWIOTLB buffer");
}
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 2b385c1b4a99..91f8c68d1a9b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -192,8 +192,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
/*
* IO TLB memory already allocated. Just use it.
*/
- if (io_tlb_start != 0) {
- xen_io_tlb_start = phys_to_virt(io_tlb_start);
+ if (is_swiotlb_active()) {
+ xen_io_tlb_start = phys_to_virt(get_swiotlb_start());
goto end;
}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d9c9fc9ca5d2..83200f3b042a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -81,6 +81,7 @@ void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
bool is_swiotlb_active(void);
+phys_addr_t get_swiotlb_start(void);
void __init swiotlb_adjust_size(unsigned long new_size);
#else
#define swiotlb_force SWIOTLB_NO_FORCE
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..e180211f6ad9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -719,6 +719,11 @@ bool is_swiotlb_active(void)
return io_tlb_end != 0;
}
+phys_addr_t get_swiotlb_start(void)
+{
+ return io_tlb_start;
+}
+
#ifdef CONFIG_DEBUG_FS
static int __init swiotlb_create_debugfs(void)
--
This can be dropped if Christoph's swiotlb cleanups are landed.
https://lore.kernel.org/linux-iommu/20210207160934.2955931-1-hch@lst.de/T/#m7124f29b6076d462101fcff6433295157621da09
2.30.0.478.g8a0d178c01-goog
* Re: [PATCH v4 01/14] swiotlb: Remove external access to io_tlb_start
2021-02-09 6:21 ` [PATCH v4 01/14] swiotlb: Remove external access to io_tlb_start Claire Chang
@ 2021-02-09 8:40 ` Christoph Hellwig
0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2021-02-09 8:40 UTC (permalink / raw)
To: Claire Chang
Cc: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski, benh, paulus, list@263.net:IOMMU DRIVERS,
sstabellini, Robin Murphy, grant.likely, xypron.glpk,
Thierry Reding, mingo, bauerman, peterz, Greg KH, Saravana Kannan,
Rafael J . Wysocki, heikki.krogerus, Andy Shevchenko,
Randy Dunlap, Dan Williams, Bartosz Golaszewski, linux-devicetree,
lkml, linuxppc-dev, xen-devel, Nicolas Boichat, Jim Quinlan
On Tue, Feb 09, 2021 at 02:21:18PM +0800, Claire Chang wrote:
> This can be dropped if Christoph's swiotlb cleanups are landed.
> https://lore.kernel.org/linux-iommu/20210207160934.2955931-1-hch@lst.de/T/#m7124f29b6076d462101fcff6433295157621da09
FYI, I've also started looking into additional cleanups based on your
struct in this branch, but I'd like to wait for all the previous
changes to settle first:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-struct
* [PATCH v4 02/14] swiotlb: Move is_swiotlb_buffer() to swiotlb.c
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
2021-02-09 6:21 ` [PATCH v4 01/14] swiotlb: Remove external access to io_tlb_start Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 03/14] swiotlb: Add struct swiotlb Claire Chang
` (13 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Move is_swiotlb_buffer() to swiotlb.c and make io_tlb_{start,end}
static, so we can entirely hide struct swiotlb inside of swiotlb.c in
the following patches.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
include/linux/swiotlb.h | 7 +------
kernel/dma/swiotlb.c | 7 ++++++-
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 83200f3b042a..041611bf3c2a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -70,13 +70,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
#ifdef CONFIG_SWIOTLB
extern enum swiotlb_force swiotlb_force;
-extern phys_addr_t io_tlb_start, io_tlb_end;
-
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
-{
- return paddr >= io_tlb_start && paddr < io_tlb_end;
-}
+bool is_swiotlb_buffer(phys_addr_t paddr);
void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e180211f6ad9..678490d39e55 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -69,7 +69,7 @@ enum swiotlb_force swiotlb_force;
* swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
* API.
*/
-phys_addr_t io_tlb_start, io_tlb_end;
+static phys_addr_t io_tlb_start, io_tlb_end;
/*
* The number of IO TLB blocks (in groups of 64) between io_tlb_start and
@@ -719,6 +719,11 @@ bool is_swiotlb_active(void)
return io_tlb_end != 0;
}
+bool is_swiotlb_buffer(phys_addr_t paddr)
+{
+ return paddr >= io_tlb_start && paddr < io_tlb_end;
+}
+
phys_addr_t get_swiotlb_start(void)
{
return io_tlb_start;
--
2.30.0.478.g8a0d178c01-goog
* [PATCH v4 03/14] swiotlb: Add struct swiotlb
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
2021-02-09 6:21 ` [PATCH v4 01/14] swiotlb: Remove external access to io_tlb_start Claire Chang
2021-02-09 6:21 ` [PATCH v4 02/14] swiotlb: Move is_swiotlb_buffer() to swiotlb.c Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 04/14] swiotlb: Refactor swiotlb_late_init_with_tbl Claire Chang
` (12 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Add a new struct, swiotlb, as the IO TLB memory pool descriptor and
move the relevant global variables into it.
This will be useful later when adding support for restricted DMA pools.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
kernel/dma/swiotlb.c | 327 +++++++++++++++++++++++--------------------
1 file changed, 172 insertions(+), 155 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 678490d39e55..28b7bfe7a2a8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -61,33 +61,43 @@
* allocate a contiguous 1MB, we're probably in trouble anyway.
*/
#define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
+#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
enum swiotlb_force swiotlb_force;
/*
- * Used to do a quick range check in swiotlb_tbl_unmap_single and
- * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
- * API.
- */
-static phys_addr_t io_tlb_start, io_tlb_end;
-
-/*
- * The number of IO TLB blocks (in groups of 64) between io_tlb_start and
- * io_tlb_end. This is command line adjustable via setup_io_tlb_npages.
- */
-static unsigned long io_tlb_nslabs;
-
-/*
- * The number of used IO TLB block
- */
-static unsigned long io_tlb_used;
-
-/*
- * This is a free list describing the number of free entries available from
- * each index
+ * struct swiotlb - Software IO TLB Memory Pool Descriptor
+ *
+ * @start: The start address of the swiotlb memory pool. Used to do a quick
+ * range check to see if the memory was in fact allocated by this
+ * API.
+ * @end: The end address of the swiotlb memory pool. Used to do a quick
+ * range check to see if the memory was in fact allocated by this
+ * API.
+ * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
+ * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @used: The number of used IO TLB block.
+ * @list: The free list describing the number of free entries available
+ * from each index.
+ * @index: The index to start searching in the next round.
+ * @orig_addr: The original address corresponding to a mapped entry for the
+ * sync operations.
+ * @lock: The lock to protect the above data structures in the map and
+ * unmap calls.
+ * @debugfs: The dentry to debugfs.
*/
-static unsigned int *io_tlb_list;
-static unsigned int io_tlb_index;
+struct swiotlb {
+ phys_addr_t start;
+ phys_addr_t end;
+ unsigned long nslabs;
+ unsigned long used;
+ unsigned int *list;
+ unsigned int index;
+ phys_addr_t *orig_addr;
+ spinlock_t lock;
+ struct dentry *debugfs;
+};
+static struct swiotlb default_swiotlb;
/*
* Max segment that we can provide which (if pages are contingous) will
@@ -95,27 +105,17 @@ static unsigned int io_tlb_index;
*/
static unsigned int max_segment;
-/*
- * We need to save away the original address corresponding to a mapped entry
- * for the sync operations.
- */
-#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
-static phys_addr_t *io_tlb_orig_addr;
-
-/*
- * Protect the above data structures in the map and unmap calls
- */
-static DEFINE_SPINLOCK(io_tlb_lock);
-
static int late_alloc;
static int __init
setup_io_tlb_npages(char *str)
{
+ struct swiotlb *swiotlb = &default_swiotlb;
+
if (isdigit(*str)) {
- io_tlb_nslabs = simple_strtoul(str, &str, 0);
+ swiotlb->nslabs = simple_strtoul(str, &str, 0);
/* avoid tail segment of size < IO_TLB_SEGSIZE */
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
}
if (*str == ',')
++str;
@@ -123,7 +123,7 @@ setup_io_tlb_npages(char *str)
swiotlb_force = SWIOTLB_FORCE;
} else if (!strcmp(str, "noforce")) {
swiotlb_force = SWIOTLB_NO_FORCE;
- io_tlb_nslabs = 1;
+ swiotlb->nslabs = 1;
}
return 0;
@@ -134,7 +134,7 @@ static bool no_iotlb_memory;
unsigned long swiotlb_nr_tbl(void)
{
- return unlikely(no_iotlb_memory) ? 0 : io_tlb_nslabs;
+ return unlikely(no_iotlb_memory) ? 0 : default_swiotlb.nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
@@ -156,13 +156,14 @@ unsigned long swiotlb_size_or_default(void)
{
unsigned long size;
- size = io_tlb_nslabs << IO_TLB_SHIFT;
+ size = default_swiotlb.nslabs << IO_TLB_SHIFT;
return size ? size : (IO_TLB_DEFAULT_SIZE);
}
void __init swiotlb_adjust_size(unsigned long new_size)
{
+ struct swiotlb *swiotlb = &default_swiotlb;
unsigned long size;
/*
@@ -170,10 +171,10 @@ void __init swiotlb_adjust_size(unsigned long new_size)
* architectures such as those supporting memory encryption to
* adjust/expand SWIOTLB size for their use.
*/
- if (!io_tlb_nslabs) {
+ if (!swiotlb->nslabs) {
size = ALIGN(new_size, 1 << IO_TLB_SHIFT);
- io_tlb_nslabs = size >> IO_TLB_SHIFT;
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ swiotlb->nslabs = size >> IO_TLB_SHIFT;
+ swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
}
@@ -181,14 +182,15 @@ void __init swiotlb_adjust_size(unsigned long new_size)
void swiotlb_print_info(void)
{
- unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ struct swiotlb *swiotlb = &default_swiotlb;
+ unsigned long bytes = swiotlb->nslabs << IO_TLB_SHIFT;
if (no_iotlb_memory) {
pr_warn("No low mem\n");
return;
}
- pr_info("mapped [mem %pa-%pa] (%luMB)\n", &io_tlb_start, &io_tlb_end,
+ pr_info("mapped [mem %pa-%pa] (%luMB)\n", &swiotlb->start, &swiotlb->end,
bytes >> 20);
}
@@ -200,57 +202,61 @@ void swiotlb_print_info(void)
*/
void __init swiotlb_update_mem_attributes(void)
{
+ struct swiotlb *swiotlb = &default_swiotlb;
void *vaddr;
unsigned long bytes;
if (no_iotlb_memory || late_alloc)
return;
- vaddr = phys_to_virt(io_tlb_start);
- bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ vaddr = phys_to_virt(swiotlb->start);
+ bytes = PAGE_ALIGN(swiotlb->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
memset(vaddr, 0, bytes);
}
int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
{
+ struct swiotlb *swiotlb = &default_swiotlb;
unsigned long i, bytes;
size_t alloc_size;
bytes = nslabs << IO_TLB_SHIFT;
- io_tlb_nslabs = nslabs;
- io_tlb_start = __pa(tlb);
- io_tlb_end = io_tlb_start + bytes;
+ swiotlb->nslabs = nslabs;
+ swiotlb->start = __pa(tlb);
+ swiotlb->end = swiotlb->start + bytes;
/*
* Allocate and initialize the free list array. This array is used
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
- * between io_tlb_start and io_tlb_end.
+ * between swiotlb->start and swiotlb->end.
*/
- alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(int));
- io_tlb_list = memblock_alloc(alloc_size, PAGE_SIZE);
- if (!io_tlb_list)
+ alloc_size = PAGE_ALIGN(swiotlb->nslabs * sizeof(int));
+ swiotlb->list = memblock_alloc(alloc_size, PAGE_SIZE);
+ if (!swiotlb->list)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);
- alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t));
- io_tlb_orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
- if (!io_tlb_orig_addr)
+ alloc_size = PAGE_ALIGN(swiotlb->nslabs * sizeof(phys_addr_t));
+ swiotlb->orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
+ if (!swiotlb->orig_addr)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);
- for (i = 0; i < io_tlb_nslabs; i++) {
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ for (i = 0; i < swiotlb->nslabs; i++) {
+ swiotlb->list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+ swiotlb->orig_addr[i] = INVALID_PHYS_ADDR;
}
- io_tlb_index = 0;
+ swiotlb->index = 0;
no_iotlb_memory = false;
if (verbose)
swiotlb_print_info();
- swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_max_segment(swiotlb->nslabs << IO_TLB_SHIFT);
+ spin_lock_init(&swiotlb->lock);
+
return 0;
}
@@ -261,26 +267,27 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
void __init
swiotlb_init(int verbose)
{
+ struct swiotlb *swiotlb = &default_swiotlb;
size_t default_size = IO_TLB_DEFAULT_SIZE;
unsigned char *vstart;
unsigned long bytes;
- if (!io_tlb_nslabs) {
- io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ if (!swiotlb->nslabs) {
+ swiotlb->nslabs = (default_size >> IO_TLB_SHIFT);
+ swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
}
- bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ bytes = swiotlb->nslabs << IO_TLB_SHIFT;
/* Get IO TLB memory from the low pages */
vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
- if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
+ if (vstart && !swiotlb_init_with_tbl(vstart, swiotlb->nslabs, verbose))
return;
- if (io_tlb_start) {
- memblock_free_early(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
- io_tlb_start = 0;
+ if (swiotlb->start) {
+ memblock_free_early(swiotlb->start,
+ PAGE_ALIGN(swiotlb->nslabs << IO_TLB_SHIFT));
+ swiotlb->start = 0;
}
pr_warn("Cannot allocate buffer");
no_iotlb_memory = true;
@@ -294,22 +301,23 @@ swiotlb_init(int verbose)
int
swiotlb_late_init_with_default_size(size_t default_size)
{
- unsigned long bytes, req_nslabs = io_tlb_nslabs;
+ struct swiotlb *swiotlb = &default_swiotlb;
+ unsigned long bytes, req_nslabs = swiotlb->nslabs;
unsigned char *vstart = NULL;
unsigned int order;
int rc = 0;
- if (!io_tlb_nslabs) {
- io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ if (!swiotlb->nslabs) {
+ swiotlb->nslabs = (default_size >> IO_TLB_SHIFT);
+ swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
}
/*
* Get IO TLB memory from the low pages
*/
- order = get_order(io_tlb_nslabs << IO_TLB_SHIFT);
- io_tlb_nslabs = SLABS_PER_PAGE << order;
- bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ order = get_order(swiotlb->nslabs << IO_TLB_SHIFT);
+ swiotlb->nslabs = SLABS_PER_PAGE << order;
+ bytes = swiotlb->nslabs << IO_TLB_SHIFT;
while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
vstart = (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
@@ -320,15 +328,15 @@ swiotlb_late_init_with_default_size(size_t default_size)
}
if (!vstart) {
- io_tlb_nslabs = req_nslabs;
+ swiotlb->nslabs = req_nslabs;
return -ENOMEM;
}
if (order != get_order(bytes)) {
pr_warn("only able to allocate %ld MB\n",
(PAGE_SIZE << order) >> 20);
- io_tlb_nslabs = SLABS_PER_PAGE << order;
+ swiotlb->nslabs = SLABS_PER_PAGE << order;
}
- rc = swiotlb_late_init_with_tbl(vstart, io_tlb_nslabs);
+ rc = swiotlb_late_init_with_tbl(vstart, swiotlb->nslabs);
if (rc)
free_pages((unsigned long)vstart, order);
@@ -337,22 +345,25 @@ swiotlb_late_init_with_default_size(size_t default_size)
static void swiotlb_cleanup(void)
{
- io_tlb_end = 0;
- io_tlb_start = 0;
- io_tlb_nslabs = 0;
+ struct swiotlb *swiotlb = &default_swiotlb;
+
+ swiotlb->end = 0;
+ swiotlb->start = 0;
+ swiotlb->nslabs = 0;
max_segment = 0;
}
int
swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
{
+ struct swiotlb *swiotlb = &default_swiotlb;
unsigned long i, bytes;
bytes = nslabs << IO_TLB_SHIFT;
- io_tlb_nslabs = nslabs;
- io_tlb_start = virt_to_phys(tlb);
- io_tlb_end = io_tlb_start + bytes;
+ swiotlb->nslabs = nslabs;
+ swiotlb->start = virt_to_phys(tlb);
+ swiotlb->end = swiotlb->start + bytes;
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
memset(tlb, 0, bytes);
@@ -360,39 +371,40 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
/*
* Allocate and initialize the free list array. This array is used
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
- * between io_tlb_start and io_tlb_end.
+ * between swiotlb->start and swiotlb->end.
*/
- io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs * sizeof(int)));
- if (!io_tlb_list)
+ swiotlb->list = (unsigned int *)__get_free_pages(GFP_KERNEL,
+ get_order(swiotlb->nslabs * sizeof(int)));
+ if (!swiotlb->list)
goto cleanup3;
- io_tlb_orig_addr = (phys_addr_t *)
+ swiotlb->orig_addr = (phys_addr_t *)
__get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs *
+ get_order(swiotlb->nslabs *
sizeof(phys_addr_t)));
- if (!io_tlb_orig_addr)
+ if (!swiotlb->orig_addr)
goto cleanup4;
- for (i = 0; i < io_tlb_nslabs; i++) {
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ for (i = 0; i < swiotlb->nslabs; i++) {
+ swiotlb->list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+ swiotlb->orig_addr[i] = INVALID_PHYS_ADDR;
}
- io_tlb_index = 0;
+ swiotlb->index = 0;
no_iotlb_memory = false;
swiotlb_print_info();
late_alloc = 1;
- swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_max_segment(swiotlb->nslabs << IO_TLB_SHIFT);
+ spin_lock_init(&swiotlb->lock);
return 0;
cleanup4:
- free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
- sizeof(int)));
- io_tlb_list = NULL;
+ free_pages((unsigned long)swiotlb->list,
+ get_order(swiotlb->nslabs * sizeof(int)));
+ swiotlb->list = NULL;
cleanup3:
swiotlb_cleanup();
return -ENOMEM;
@@ -400,23 +412,25 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
void __init swiotlb_exit(void)
{
- if (!io_tlb_orig_addr)
+ struct swiotlb *swiotlb = &default_swiotlb;
+
+ if (!swiotlb->orig_addr)
return;
if (late_alloc) {
- free_pages((unsigned long)io_tlb_orig_addr,
- get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
- free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
- sizeof(int)));
- free_pages((unsigned long)phys_to_virt(io_tlb_start),
- get_order(io_tlb_nslabs << IO_TLB_SHIFT));
+ free_pages((unsigned long)swiotlb->orig_addr,
+ get_order(swiotlb->nslabs * sizeof(phys_addr_t)));
+ free_pages((unsigned long)swiotlb->list,
+ get_order(swiotlb->nslabs * sizeof(int)));
+ free_pages((unsigned long)phys_to_virt(swiotlb->start),
+ get_order(swiotlb->nslabs << IO_TLB_SHIFT));
} else {
- memblock_free_late(__pa(io_tlb_orig_addr),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
- memblock_free_late(__pa(io_tlb_list),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
- memblock_free_late(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+ memblock_free_late(__pa(swiotlb->orig_addr),
+ PAGE_ALIGN(swiotlb->nslabs * sizeof(phys_addr_t)));
+ memblock_free_late(__pa(swiotlb->list),
+ PAGE_ALIGN(swiotlb->nslabs * sizeof(int)));
+ memblock_free_late(swiotlb->start,
+ PAGE_ALIGN(swiotlb->nslabs << IO_TLB_SHIFT));
}
swiotlb_cleanup();
}
@@ -465,7 +479,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
+ struct swiotlb *swiotlb = &default_swiotlb;
+ dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, swiotlb->start);
unsigned long flags;
phys_addr_t tlb_addr;
unsigned int nslots, stride, index, wrap;
@@ -516,13 +531,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
* Find suitable number of IO TLB entries size that will fit this
* request and allocate a buffer from that IO TLB pool.
*/
- spin_lock_irqsave(&io_tlb_lock, flags);
+ spin_lock_irqsave(&swiotlb->lock, flags);
- if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
+ if (unlikely(nslots > swiotlb->nslabs - swiotlb->used))
goto not_found;
- index = ALIGN(io_tlb_index, stride);
- if (index >= io_tlb_nslabs)
+ index = ALIGN(swiotlb->index, stride);
+ if (index >= swiotlb->nslabs)
index = 0;
wrap = index;
@@ -530,7 +545,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
while (iommu_is_span_boundary(index, nslots, offset_slots,
max_slots)) {
index += stride;
- if (index >= io_tlb_nslabs)
+ if (index >= swiotlb->nslabs)
index = 0;
if (index == wrap)
goto not_found;
@@ -541,40 +556,40 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
* contiguous buffers, we allocate the buffers from that slot
* and mark the entries as '0' indicating unavailable.
*/
- if (io_tlb_list[index] >= nslots) {
+ if (swiotlb->list[index] >= nslots) {
int count = 0;
for (i = index; i < (int) (index + nslots); i++)
- io_tlb_list[i] = 0;
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
- io_tlb_list[i] = ++count;
- tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
+ swiotlb->list[i] = 0;
+ for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && swiotlb->list[i]; i--)
+ swiotlb->list[i] = ++count;
+ tlb_addr = swiotlb->start + (index << IO_TLB_SHIFT);
/*
* Update the indices to avoid searching in the next
* round.
*/
- io_tlb_index = ((index + nslots) < io_tlb_nslabs
- ? (index + nslots) : 0);
+ swiotlb->index = ((index + nslots) < swiotlb->nslabs
+ ? (index + nslots) : 0);
goto found;
}
index += stride;
- if (index >= io_tlb_nslabs)
+ if (index >= swiotlb->nslabs)
index = 0;
} while (index != wrap);
not_found:
- tmp_io_tlb_used = io_tlb_used;
+ tmp_io_tlb_used = swiotlb->used;
- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ spin_unlock_irqrestore(&swiotlb->lock, flags);
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
- alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
+ alloc_size, swiotlb->nslabs, tmp_io_tlb_used);
return (phys_addr_t)DMA_MAPPING_ERROR;
found:
- io_tlb_used += nslots;
- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ swiotlb->used += nslots;
+ spin_unlock_irqrestore(&swiotlb->lock, flags);
/*
* Save away the mapping from the original address to the DMA address.
@@ -582,7 +597,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
* needed.
*/
for (i = 0; i < nslots; i++)
- io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+ swiotlb->orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -597,10 +612,11 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
+ struct swiotlb *swiotlb = &default_swiotlb;
unsigned long flags;
int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
- int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = swiotlb->orig_addr[index];
/*
* First, sync the memory before unmapping the entry
@@ -616,36 +632,37 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
* While returning the entries to the free list, we merge the entries
* with slots below and above the pool being returned.
*/
- spin_lock_irqsave(&io_tlb_lock, flags);
+ spin_lock_irqsave(&swiotlb->lock, flags);
{
count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
- io_tlb_list[index + nslots] : 0);
+ swiotlb->list[index + nslots] : 0);
/*
* Step 1: return the slots to the free list, merging the
* slots with superceeding slots
*/
for (i = index + nslots - 1; i >= index; i--) {
- io_tlb_list[i] = ++count;
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ swiotlb->list[i] = ++count;
+ swiotlb->orig_addr[i] = INVALID_PHYS_ADDR;
}
/*
* Step 2: merge the returned slots with the preceding slots,
* if available (non zero)
*/
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
- io_tlb_list[i] = ++count;
+ for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && swiotlb->list[i]; i--)
+ swiotlb->list[i] = ++count;
- io_tlb_used -= nslots;
+ swiotlb->used -= nslots;
}
- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ spin_unlock_irqrestore(&swiotlb->lock, flags);
}
void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
{
- int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ struct swiotlb *swiotlb = &default_swiotlb;
+ int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = swiotlb->orig_addr[index];
if (orig_addr == INVALID_PHYS_ADDR)
return;
@@ -713,31 +730,31 @@ size_t swiotlb_max_mapping_size(struct device *dev)
bool is_swiotlb_active(void)
{
/*
- * When SWIOTLB is initialized, even if io_tlb_start points to physical
- * address zero, io_tlb_end surely doesn't.
+ * When SWIOTLB is initialized, even if swiotlb->start points to
+ * physical address zero, swiotlb->end surely doesn't.
*/
- return io_tlb_end != 0;
+ return default_swiotlb.end != 0;
}
bool is_swiotlb_buffer(phys_addr_t paddr)
{
- return paddr >= io_tlb_start && paddr < io_tlb_end;
+ return paddr >= default_swiotlb.start && paddr < default_swiotlb.end;
}
phys_addr_t get_swiotlb_start(void)
{
- return io_tlb_start;
+ return default_swiotlb.start;
}
#ifdef CONFIG_DEBUG_FS
static int __init swiotlb_create_debugfs(void)
{
- struct dentry *root;
+ struct swiotlb *swiotlb = &default_swiotlb;
- root = debugfs_create_dir("swiotlb", NULL);
- debugfs_create_ulong("io_tlb_nslabs", 0400, root, &io_tlb_nslabs);
- debugfs_create_ulong("io_tlb_used", 0400, root, &io_tlb_used);
+ swiotlb->debugfs = debugfs_create_dir("swiotlb", NULL);
+ debugfs_create_ulong("io_tlb_nslabs", 0400, swiotlb->debugfs, &swiotlb->nslabs);
+ debugfs_create_ulong("io_tlb_used", 0400, swiotlb->debugfs, &swiotlb->used);
return 0;
}
--
2.30.0.478.g8a0d178c01-goog
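The kernel-doc above only names the fields; the underlying invariant is that
list[i] holds the number of contiguous free slots starting at slot i, never
counted past the end of the IO_TLB_SEGSIZE-sized segment containing i, which
is what keeps every allocation inside a single segment. A rough, self-contained
toy model of that bookkeeping (not kernel code; the helper names, the segment
size of 128 and the slot counts are only illustrative) behaves like this:

    #include <stdio.h>

    #define SEGSIZE   128                 /* IO_TLB_SEGSIZE in the kernel */
    #define NSLABS    (2 * SEGSIZE)
    /* Mask as the kernel's OFFSET() does, so i == -1 maps to SEGSIZE - 1. */
    #define OFFSET(i) ((unsigned int)(i) & (SEGSIZE - 1))

    static unsigned int list[NSLABS];

    static void init(void)
    {
        for (int i = 0; i < NSLABS; i++)
            list[i] = SEGSIZE - OFFSET(i);  /* free up to the segment end */
    }

    /* Claim nslots starting at index; shrink the free run preceding it. */
    static void claim(int index, int nslots)
    {
        int count = 0;

        for (int i = index; i < index + nslots; i++)
            list[i] = 0;                    /* 0 means "in use" */
        for (int i = index - 1; OFFSET(i) != SEGSIZE - 1 && list[i]; i--)
            list[i] = ++count;              /* free run now stops at index */
    }

    /* Release nslots at index, merging with neighbours in the segment. */
    static void release(int index, int nslots)
    {
        int seg_end = (index / SEGSIZE + 1) * SEGSIZE;
        int count = (index + nslots < seg_end) ? list[index + nslots] : 0;

        for (int i = index + nslots - 1; i >= index; i--)
            list[i] = ++count;              /* merge with the run above */
        for (int i = index - 1; OFFSET(i) != SEGSIZE - 1 && list[i]; i--)
            list[i] = ++count;              /* merge with the run below */
    }

    int main(void)
    {
        init();
        claim(4, 8);
        printf("%u %u %u %u\n", list[0], list[3], list[4], list[12]);  /* 4 1 0 116 */
        release(4, 8);
        printf("%u\n", list[0]);                                       /* 128 again */
        return 0;
    }

swiotlb->index then simply remembers where the next search should start, so
successive mappings do not keep rescanning the same slots.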
* [PATCH v4 04/14] swiotlb: Refactor swiotlb_late_init_with_tbl
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (2 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 03/14] swiotlb: Add struct swiotlb Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 05/14] swiotlb: Add DMA_RESTRICTED_POOL Claire Chang
` (11 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Refactor swiotlb_late_init_with_tbl to make the code reusable for
restricted DMA pool initialization.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
kernel/dma/swiotlb.c | 65 ++++++++++++++++++++++++++++----------------
1 file changed, 42 insertions(+), 23 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 28b7bfe7a2a8..dc37951c6924 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -353,20 +353,21 @@ static void swiotlb_cleanup(void)
max_segment = 0;
}
-int
-swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
+static int swiotlb_init_tlb_pool(struct swiotlb *swiotlb, phys_addr_t start,
+ size_t size)
{
- struct swiotlb *swiotlb = &default_swiotlb;
- unsigned long i, bytes;
+ unsigned long i;
+ void *vaddr = phys_to_virt(start);
- bytes = nslabs << IO_TLB_SHIFT;
+ size = ALIGN(size, 1 << IO_TLB_SHIFT);
+ swiotlb->nslabs = size >> IO_TLB_SHIFT;
+ swiotlb->nslabs = ALIGN(swiotlb->nslabs, IO_TLB_SEGSIZE);
- swiotlb->nslabs = nslabs;
- swiotlb->start = virt_to_phys(tlb);
- swiotlb->end = swiotlb->start + bytes;
+ swiotlb->start = start;
+ swiotlb->end = swiotlb->start + size;
- set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
- memset(tlb, 0, bytes);
+ set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
+ memset(vaddr, 0, size);
/*
* Allocate and initialize the free list array. This array is used
@@ -390,13 +391,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
swiotlb->orig_addr[i] = INVALID_PHYS_ADDR;
}
swiotlb->index = 0;
- no_iotlb_memory = false;
-
- swiotlb_print_info();
- late_alloc = 1;
-
- swiotlb_set_max_segment(swiotlb->nslabs << IO_TLB_SHIFT);
spin_lock_init(&swiotlb->lock);
return 0;
@@ -410,6 +405,27 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
return -ENOMEM;
}
+int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
+{
+ struct swiotlb *swiotlb = &default_swiotlb;
+ unsigned long bytes = nslabs << IO_TLB_SHIFT;
+ int ret;
+
+ ret = swiotlb_init_tlb_pool(swiotlb, virt_to_phys(tlb), bytes);
+ if (ret)
+ return ret;
+
+ no_iotlb_memory = false;
+
+ swiotlb_print_info();
+
+ late_alloc = 1;
+
+ swiotlb_set_max_segment(bytes);
+
+ return 0;
+}
+
void __init swiotlb_exit(void)
{
struct swiotlb *swiotlb = &default_swiotlb;
@@ -747,17 +763,20 @@ phys_addr_t get_swiotlb_start(void)
}
#ifdef CONFIG_DEBUG_FS
-
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs(struct swiotlb *swiotlb, const char *name,
+ struct dentry *node)
{
- struct swiotlb *swiotlb = &default_swiotlb;
-
- swiotlb->debugfs = debugfs_create_dir("swiotlb", NULL);
+ swiotlb->debugfs = debugfs_create_dir(name, node);
debugfs_create_ulong("io_tlb_nslabs", 0400, swiotlb->debugfs, &swiotlb->nslabs);
debugfs_create_ulong("io_tlb_used", 0400, swiotlb->debugfs, &swiotlb->used);
- return 0;
}
-late_initcall(swiotlb_create_debugfs);
+static int __init swiotlb_create_default_debugfs(void)
+{
+ swiotlb_create_debugfs(&default_swiotlb, "swiotlb", NULL);
+
+ return 0;
+}
+late_initcall(swiotlb_create_default_debugfs);
#endif
--
2.30.0.478.g8a0d178c01-goog
* [PATCH v4 05/14] swiotlb: Add DMA_RESTRICTED_POOL
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (3 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 04/14] swiotlb: Refactor swiotlb_late_init_with_tbl Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 06/14] swiotlb: Add restricted DMA pool Claire Chang
` (10 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Add a new kconfig symbol, DMA_RESTRICTED_POOL, for restricted DMA pools.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
kernel/dma/Kconfig | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 479fc145acfc..97ff9f8dd3c8 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -83,6 +83,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
+config DMA_RESTRICTED_POOL
+ bool "DMA Restricted Pool"
+ depends on OF && OF_RESERVED_MEM
+ select SWIOTLB
+ help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+ and <kernel/dma/swiotlb.c>.
+ If unsure, say "n".
+
#
# Should be selected if we can mmap non-coherent mappings to userspace.
# The only thing that is really required is a way to set an uncached bit
--
2.30.0.478.g8a0d178c01-goog
* [PATCH v4 06/14] swiotlb: Add restricted DMA pool
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (4 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 05/14] swiotlb: Add DMA_RESTRICTED_POOL Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 07/14] swiotlb: Update swiotlb API to gain a struct device argument Claire Chang
` (9 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
include/linux/device.h | 4 ++
kernel/dma/swiotlb.c | 94 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 97 insertions(+), 1 deletion(-)
diff --git a/include/linux/device.h b/include/linux/device.h
index 7619a84f8ce4..08d440627b93 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -415,6 +415,7 @@ struct dev_links_info {
* @dma_pools: Dma pools (if dma'ble device).
* @dma_mem: Internal for coherent mem override.
* @cma_area: Contiguous memory area for dma allocations
+ * @dev_swiotlb: Internal for swiotlb override.
* @archdata: For arch-specific additions.
* @of_node: Associated device tree node.
* @fwnode: Associated device node supplied by platform firmware.
@@ -517,6 +518,9 @@ struct device {
#ifdef CONFIG_DMA_CMA
struct cma *cma_area; /* contiguous memory area for dma
allocations */
+#endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+ struct swiotlb *dev_swiotlb;
#endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dc37951c6924..3a17451c5981 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
#endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/slab.h>
+#endif
#include <asm/io.h>
#include <asm/dma.h>
@@ -75,7 +82,8 @@ enum swiotlb_force swiotlb_force;
* range check to see if the memory was in fact allocated by this
* API.
* @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
* @used: The number of used IO TLB block.
* @list: The free list describing the number of free entries available
* from each index.
@@ -780,3 +788,87 @@ static int __init swiotlb_create_default_debugfs(void)
late_initcall(swiotlb_create_default_debugfs);
#endif
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+ struct device *dev)
+{
+ struct swiotlb *swiotlb = rmem->priv;
+ int ret;
+
+ if (dev->dev_swiotlb)
+ return -EBUSY;
+
+ /* Since multiple devices can share the same pool, the private data
+ * (the swiotlb struct) is initialized by the first device attached
+ * to the pool.
+ */
+ if (!swiotlb) {
+ swiotlb = kzalloc(sizeof(*swiotlb), GFP_KERNEL);
+ if (!swiotlb)
+ return -ENOMEM;
+#ifdef CONFIG_ARM
+ unsigned long pfn = PHYS_PFN(rmem->base);
+
+ if (!PageHighMem(pfn_to_page(pfn))) {
+ ret = -EINVAL;
+ goto cleanup;
+ }
+#endif /* CONFIG_ARM */
+
+ ret = swiotlb_init_tlb_pool(swiotlb, rmem->base, rmem->size);
+ if (ret)
+ goto cleanup;
+
+ rmem->priv = swiotlb;
+ }
+
+#ifdef CONFIG_DEBUG_FS
+ swiotlb_create_debugfs(swiotlb, rmem->name, default_swiotlb.debugfs);
+#endif /* CONFIG_DEBUG_FS */
+
+ dev->dev_swiotlb = swiotlb;
+
+ return 0;
+
+cleanup:
+ kfree(swiotlb);
+
+ return ret;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+ struct device *dev)
+{
+ if (!dev)
+ return;
+
+#ifdef CONFIG_DEBUG_FS
+ debugfs_remove_recursive(dev->dev_swiotlb->debugfs);
+#endif /* CONFIG_DEBUG_FS */
+ dev->dev_swiotlb = NULL;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+ .device_init = rmem_swiotlb_device_init,
+ .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+ unsigned long node = rmem->fdt_node;
+
+ if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+ of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+ of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+ of_get_flat_dt_prop(node, "no-map", NULL))
+ return -EINVAL;
+
+ rmem->ops = &rmem_swiotlb_ops;
+ pr_info("Reserved memory: created device swiotlb memory pool at %pa, size %ld MiB\n",
+ &rmem->base, (unsigned long)rmem->size / SZ_1M);
+ return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
--
2.30.0.478.g8a0d178c01-goog
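For context, the "restricted-dma-pool" compatible registered above is meant to
be used from a reserved-memory node, with the consuming device pointing at the
pool through the usual memory-region phandle; the devicetree binding and the
of/ plumbing are handled by later patches in this series, so the following is
only an illustrative sketch with made-up node names, addresses and sizes:

    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        wifi_pool: restricted-dma-pool@50000000 {
            compatible = "restricted-dma-pool";
            reg = <0x0 0x50000000 0x0 0x400000>;    /* 4 MiB pool */
            /* Note: "no-map" and "reusable" are rejected above. */
        };
    };

    wifi: wifi@10000000 {
        /* ... normal device properties ... */
        memory-region = <&wifi_pool>;
    };

rmem_swiotlb_device_init() then allocates and initializes one struct swiotlb
per pool on first use, and every device attached to the same node shares it.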
* [PATCH v4 07/14] swiotlb: Update swiotlb API to gain a struct device argument
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (5 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 06/14] swiotlb: Add restricted DMA pool Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 08/14] swiotlb: Use restricted DMA pool if available Claire Chang
` (8 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Introduce the get_swiotlb() getter and add a struct device argument to
is_swiotlb_active(), is_swiotlb_buffer() and get_swiotlb_start(), updating
all of their callers accordingly.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
drivers/iommu/dma-iommu.c | 12 ++++++------
drivers/xen/swiotlb-xen.c | 4 ++--
include/linux/swiotlb.h | 10 +++++-----
kernel/dma/direct.c | 8 ++++----
kernel/dma/direct.h | 6 +++---
kernel/dma/swiotlb.c | 23 +++++++++++++++++------
6 files changed, 37 insertions(+), 26 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f659395e7959..abdbe14472cc 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -503,7 +503,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
__iommu_dma_unmap(dev, dma_addr, size);
- if (unlikely(is_swiotlb_buffer(phys)))
+ if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size,
iova_align(iovad, size), dir, attrs);
}
@@ -580,7 +580,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
- if ((iova == DMA_MAPPING_ERROR) && is_swiotlb_buffer(phys))
+ if ((iova == DMA_MAPPING_ERROR) && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size,
aligned_size, dir, attrs);
@@ -753,7 +753,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
- if (is_swiotlb_buffer(phys))
+ if (is_swiotlb_buffer(dev, phys))
swiotlb_tbl_sync_single(dev, phys, size, dir, SYNC_FOR_CPU);
}
@@ -766,7 +766,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
- if (is_swiotlb_buffer(phys))
+ if (is_swiotlb_buffer(dev, phys))
swiotlb_tbl_sync_single(dev, phys, size, dir, SYNC_FOR_DEVICE);
if (!dev_is_dma_coherent(dev))
@@ -787,7 +787,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
- if (is_swiotlb_buffer(sg_phys(sg)))
+ if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length,
dir, SYNC_FOR_CPU);
}
@@ -804,7 +804,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
for_each_sg(sgl, sg, nelems, i) {
- if (is_swiotlb_buffer(sg_phys(sg)))
+ if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length,
dir, SYNC_FOR_DEVICE);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 91f8c68d1a9b..f424d46756b1 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -192,8 +192,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
/*
* IO TLB memory already allocated. Just use it.
*/
- if (is_swiotlb_active()) {
- xen_io_tlb_start = phys_to_virt(get_swiotlb_start());
+ if (is_swiotlb_active(NULL)) {
+ xen_io_tlb_start = phys_to_virt(get_swiotlb_start(NULL));
goto end;
}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 041611bf3c2a..f13a52a97382 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -71,16 +71,16 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
#ifdef CONFIG_SWIOTLB
extern enum swiotlb_force swiotlb_force;
-bool is_swiotlb_buffer(phys_addr_t paddr);
+bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr);
void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
-phys_addr_t get_swiotlb_start(void);
+bool is_swiotlb_active(struct device *dev);
+phys_addr_t get_swiotlb_start(struct device *dev);
void __init swiotlb_adjust_size(unsigned long new_size);
#else
#define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
return false;
}
@@ -96,7 +96,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
}
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
{
return false;
}
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 002268262c9a..30ccbc08e229 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, sg->length,
dir, SYNC_FOR_DEVICE);
@@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(paddr, sg->length, dir);
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, sg->length, dir,
SYNC_FOR_CPU);
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
size_t dma_direct_max_mapping_size(struct device *dev)
{
/* If SWIOTLB is active, use its maximum mapping size */
- if (is_swiotlb_active() &&
+ if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
@@ -504,7 +504,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
{
return !dev_is_dma_coherent(dev) ||
- is_swiotlb_buffer(dma_to_phys(dev, dma_addr));
+ is_swiotlb_buffer(dev, dma_to_phys(dev, dma_addr));
}
/**
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index b98615578737..7b83b1595989 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -56,7 +56,7 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
{
phys_addr_t paddr = dma_to_phys(dev, addr);
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);
if (!dev_is_dma_coherent(dev))
@@ -73,7 +73,7 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
arch_sync_dma_for_cpu_all();
}
- if (unlikely(is_swiotlb_buffer(paddr)))
+ if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
if (dir == DMA_FROM_DEVICE)
@@ -113,7 +113,7 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);
- if (unlikely(is_swiotlb_buffer(phys)))
+ if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
}
#endif /* _KERNEL_DMA_DIRECT_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3a17451c5981..e22e7ae75f1c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -107,6 +107,11 @@ struct swiotlb {
};
static struct swiotlb default_swiotlb;
+static inline struct swiotlb *get_swiotlb(struct device *dev)
+{
+ return &default_swiotlb;
+}
+
/*
* Max segment that we can provide which (if pages are contingous) will
* not be bounced (unless SWIOTLB_FORCE is set).
@@ -751,23 +756,29 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
}
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
{
+ struct swiotlb *swiotlb = get_swiotlb(dev);
+
/*
* When SWIOTLB is initialized, even if swiotlb->start points to
* physical address zero, swiotlb->end surely doesn't.
*/
- return default_swiotlb.end != 0;
+ return swiotlb->end != 0;
}
-bool is_swiotlb_buffer(phys_addr_t paddr)
+bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
- return paddr >= default_swiotlb.start && paddr < default_swiotlb.end;
+ struct swiotlb *swiotlb = get_swiotlb(dev);
+
+ return paddr >= swiotlb->start && paddr < swiotlb->end;
}
-phys_addr_t get_swiotlb_start(void)
+phys_addr_t get_swiotlb_start(struct device *dev)
{
- return default_swiotlb.start;
+ struct swiotlb *swiotlb = get_swiotlb(dev);
+
+ return swiotlb->start;
}
#ifdef CONFIG_DEBUG_FS
--
2.30.0.478.g8a0d178c01-goog
* [PATCH v4 08/14] swiotlb: Use restricted DMA pool if available
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (6 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 07/14] swiotlb: Update swiotlb API to gain a struct device argument Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 09/14] swiotlb: Refactor swiotlb_tbl_{map,unmap}_single Claire Chang
` (7 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Regardless of the swiotlb settings, the restricted DMA pool is preferred if
available.
The restricted DMA pools provide a basic level of protection against DMA
overwriting buffer contents at unexpected times. However, to protect against
general data leakage and system memory corruption, the system needs to
provide a way to lock down memory access, e.g., an MPU.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
include/linux/swiotlb.h | 13 +++++++++++++
kernel/dma/direct.h | 2 +-
kernel/dma/swiotlb.c | 20 +++++++++++++++++---
3 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f13a52a97382..76f86c684524 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -71,6 +71,15 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
#ifdef CONFIG_SWIOTLB
extern enum swiotlb_force swiotlb_force;
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+bool is_swiotlb_force(struct device *dev);
+#else
+static inline bool is_swiotlb_force(struct device *dev)
+{
+ return unlikely(swiotlb_force == SWIOTLB_FORCE);
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr);
void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
@@ -80,6 +89,10 @@ phys_addr_t get_swiotlb_start(struct device *dev);
void __init swiotlb_adjust_size(unsigned long new_size);
#else
#define swiotlb_force SWIOTLB_NO_FORCE
+static inline bool is_swiotlb_force(struct device *dev)
+{
+ return false;
+}
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
return false;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 7b83b1595989..b011db1b625d 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
- if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+ if (is_swiotlb_force(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e22e7ae75f1c..6fdebde8fb1f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -40,6 +40,7 @@
#include <linux/debugfs.h>
#endif
#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include <linux/device.h>
#include <linux/io.h>
#include <linux/of.h>
#include <linux/of_fdt.h>
@@ -109,6 +110,10 @@ static struct swiotlb default_swiotlb;
static inline struct swiotlb *get_swiotlb(struct device *dev)
{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+ if (dev && dev->dev_swiotlb)
+ return dev->dev_swiotlb;
+#endif
return &default_swiotlb;
}
@@ -508,7 +513,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- struct swiotlb *swiotlb = &default_swiotlb;
+ struct swiotlb *swiotlb = get_swiotlb(hwdev);
dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, swiotlb->start);
unsigned long flags;
phys_addr_t tlb_addr;
@@ -519,7 +524,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
unsigned long max_slots;
unsigned long tmp_io_tlb_used;
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+ if (no_iotlb_memory && !hwdev->dev_swiotlb)
+#else
if (no_iotlb_memory)
+#endif
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
if (mem_encrypt_active())
@@ -641,7 +650,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
{
- struct swiotlb *swiotlb = &default_swiotlb;
+ struct swiotlb *swiotlb = get_swiotlb(hwdev);
unsigned long flags;
int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
@@ -689,7 +698,7 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
{
- struct swiotlb *swiotlb = &default_swiotlb;
+ struct swiotlb *swiotlb = get_swiotlb(hwdev);
int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = swiotlb->orig_addr[index];
@@ -801,6 +810,11 @@ late_initcall(swiotlb_create_default_debugfs);
#endif
#ifdef CONFIG_DMA_RESTRICTED_POOL
+bool is_swiotlb_force(struct device *dev)
+{
+ return unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dev_swiotlb;
+}
+
static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
{
--
2.30.0.478.g8a0d178c01-goog
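Putting patches 06-08 together, the effect on the streaming-mapping path can
be condensed as the sketch below. It is not a separate implementation:
sketch_map_page() is an invented name, and the real logic lives in
kernel/dma/direct.h and kernel/dma/swiotlb.c.

    /*
     * Rough sketch of the decision a dma_map_page() call now goes through
     * for a device that has a restricted pool attached.
     */
    static dma_addr_t sketch_map_page(struct device *dev, struct page *page,
                                      unsigned long offset, size_t size,
                                      enum dma_data_direction dir,
                                      unsigned long attrs)
    {
        phys_addr_t phys = page_to_phys(page) + offset;

        /*
         * is_swiotlb_force() is now true either because swiotlb=force was
         * given on the command line or because dev->dev_swiotlb was set up
         * from a restricted-dma-pool node.
         */
        if (is_swiotlb_force(dev))
            /*
             * swiotlb_map() ends up in swiotlb_tbl_map_single(), whose
             * get_swiotlb(dev) returns dev->dev_swiotlb when present, so
             * the bounce buffer is carved out of the restricted pool
             * rather than the default swiotlb.
             */
            return swiotlb_map(dev, phys, size, dir, attrs);

        /* Otherwise the direct-mapping path is unchanged. */
        return phys_to_dma(dev, phys);
    }

The real dma_direct_map_page() also keeps its dma_capable() check and can
still bounce addressing-limited buffers; that path is omitted here for
brevity.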
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 09/14] swiotlb: Refactor swiotlb_tbl_{map,unmap}_single
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (7 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 08/14] swiotlb: Use restricted DMA pool if available Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 10/14] dma-direct: Add a new wrapper __dma_direct_free_pages() Claire Chang
` (6 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Refactor swiotlb_tbl_{map,unmap}_single to make the code reusable for
dev_swiotlb_{alloc,free}.
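The slot allocation moves into swiotlb_tbl_find_free_region() and the slot
release into swiotlb_tbl_release_region(), so a later patch can call them
directly, roughly along these lines (sketch only):

    index = swiotlb_tbl_find_free_region(dev, tbl_dma_addr, size, attrs);
    if (index < 0)
            return NULL;
    ...
    swiotlb_tbl_release_region(dev, index, size);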
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
kernel/dma/swiotlb.c | 116 ++++++++++++++++++++++++++-----------------
1 file changed, 71 insertions(+), 45 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6fdebde8fb1f..f64cbe6e84cc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -509,14 +509,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
}
}
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
- size_t mapping_size, size_t alloc_size,
- enum dma_data_direction dir, unsigned long attrs)
+static int swiotlb_tbl_find_free_region(struct device *hwdev,
+ dma_addr_t tbl_dma_addr,
+ size_t alloc_size, unsigned long attrs)
{
struct swiotlb *swiotlb = get_swiotlb(hwdev);
- dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, swiotlb->start);
unsigned long flags;
- phys_addr_t tlb_addr;
unsigned int nslots, stride, index, wrap;
int i;
unsigned long mask;
@@ -531,15 +529,6 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
#endif
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
- if (mem_encrypt_active())
- pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
-
- if (mapping_size > alloc_size) {
- dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
- mapping_size, alloc_size);
- return (phys_addr_t)DMA_MAPPING_ERROR;
- }
-
mask = dma_get_seg_boundary(hwdev);
tbl_dma_addr &= mask;
@@ -601,7 +590,6 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
swiotlb->list[i] = 0;
for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && swiotlb->list[i]; i--)
swiotlb->list[i] = ++count;
- tlb_addr = swiotlb->start + (index << IO_TLB_SHIFT);
/*
* Update the indices to avoid searching in the next
@@ -624,45 +612,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
alloc_size, swiotlb->nslabs, tmp_io_tlb_used);
- return (phys_addr_t)DMA_MAPPING_ERROR;
+ return -ENOMEM;
+
found:
swiotlb->used += nslots;
spin_unlock_irqrestore(&swiotlb->lock, flags);
- /*
- * Save away the mapping from the original address to the DMA address.
- * This is needed when we sync the memory. Then we sync the buffer if
- * needed.
- */
- for (i = 0; i < nslots; i++)
- swiotlb->orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
-
- return tlb_addr;
+ return index;
}
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, size_t alloc_size,
- enum dma_data_direction dir, unsigned long attrs)
+static void swiotlb_tbl_release_region(struct device *hwdev, int index, size_t size)
{
struct swiotlb *swiotlb = get_swiotlb(hwdev);
unsigned long flags;
- int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
- int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = swiotlb->orig_addr[index];
-
- /*
- * First, sync the memory before unmapping the entry
- */
- if (orig_addr != INVALID_PHYS_ADDR &&
- !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
- swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+ int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
/*
* Return the buffer to the free list by setting the corresponding
@@ -694,6 +657,69 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
spin_unlock_irqrestore(&swiotlb->lock, flags);
}
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct swiotlb *swiotlb = get_swiotlb(hwdev);
+ dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, swiotlb->start);
+ phys_addr_t tlb_addr;
+ unsigned int nslots, index;
+ int i;
+
+ if (mem_encrypt_active())
+ pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
+
+ if (mapping_size > alloc_size) {
+ dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
+ mapping_size, alloc_size);
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+ }
+
+ index = swiotlb_tbl_find_free_region(hwdev, tbl_dma_addr, alloc_size, attrs);
+ if (index < 0)
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+
+ tlb_addr = swiotlb->start + (index << IO_TLB_SHIFT);
+
+ /*
+ * Save away the mapping from the original address to the DMA address.
+ * This is needed when we sync the memory. Then we sync the buffer if
+ * needed.
+ */
+ nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+ for (i = 0; i < nslots; i++)
+ swiotlb->orig_addr[index + i] = orig_addr + (i << IO_TLB_SHIFT);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
+
+ return tlb_addr;
+}
+
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ struct swiotlb *swiotlb = get_swiotlb(hwdev);
+ int index = (tlb_addr - swiotlb->start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = swiotlb->orig_addr[index];
+
+ /*
+ * First, sync the memory before unmapping the entry
+ */
+ if (orig_addr != INVALID_PHYS_ADDR &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
+ swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+ swiotlb_tbl_release_region(hwdev, index, alloc_size);
+}
+
void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target)
--
2.30.0.478.g8a0d178c01-goog
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 10/14] dma-direct: Add a new wrapper __dma_direct_free_pages()
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (8 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 09/14] swiotlb: Refactor swiotlb_tbl_{map,unmap}_single Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 11/14] swiotlb: Add is_dev_swiotlb_force() Claire Chang
` (5 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Add a new wrapper __dma_direct_free_pages() that will be useful later
for dev_swiotlb_free().
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
kernel/dma/direct.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 30ccbc08e229..a76a1a2f24da 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,11 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
}
+static void __dma_direct_free_pages(struct device *dev, struct page *page, size_t size)
+{
+ dma_free_contiguous(dev, page, size);
+}
+
static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
{
@@ -237,7 +242,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
out_free_pages:
- dma_free_contiguous(dev, page, size);
+ __dma_direct_free_pages(dev, page, size);
return NULL;
}
@@ -273,7 +278,7 @@ void dma_direct_free(struct device *dev, size_t size,
else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
- dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+ __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size);
}
struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -310,7 +315,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return page;
out_free_pages:
- dma_free_contiguous(dev, page, size);
+ __dma_direct_free_pages(dev, page, size);
return NULL;
}
@@ -329,7 +334,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
- dma_free_contiguous(dev, page, size);
+ __dma_direct_free_pages(dev, page, size);
}
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
--
2.30.0.478.g8a0d178c01-goog
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 11/14] swiotlb: Add is_dev_swiotlb_force()
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (9 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 10/14] dma-direct: Add a new wrapper __dma_direct_free_pages() Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support Claire Chang
` (4 subsequent siblings)
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Add is_dev_swiotlb_force(), which returns true if the device has a
restricted DMA pool (i.e. dev->dev_swiotlb is set).
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
include/linux/swiotlb.h | 9 +++++++++
kernel/dma/swiotlb.c | 5 +++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 76f86c684524..b9f2a250c8da 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,11 +73,16 @@ extern enum swiotlb_force swiotlb_force;
#ifdef CONFIG_DMA_RESTRICTED_POOL
bool is_swiotlb_force(struct device *dev);
+bool is_dev_swiotlb_force(struct device *dev);
#else
static inline bool is_swiotlb_force(struct device *dev)
{
return unlikely(swiotlb_force == SWIOTLB_FORCE);
}
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+ return false;
+}
#endif /* CONFIG_DMA_RESTRICTED_POOL */
bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr);
@@ -93,6 +98,10 @@ static inline bool is_swiotlb_force(struct device *dev)
{
return false;
}
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+ return false;
+}
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
return false;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f64cbe6e84cc..fd9c1bd183ac 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -841,6 +841,11 @@ bool is_swiotlb_force(struct device *dev)
return unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dev_swiotlb;
}
+bool is_dev_swiotlb_force(struct device *dev)
+{
+ return dev->dev_swiotlb;
+}
+
static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
{
--
2.30.0.478.g8a0d178c01-goog
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support.
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (10 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 11/14] swiotlb: Add is_dev_swiotlb_force() Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-02-26 4:17 ` Claire Chang
2021-02-09 6:21 ` [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool Claire Chang
` (3 subsequent siblings)
15 siblings, 1 reply; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Add the functions dev_swiotlb_{alloc,free}() to support memory
allocation from the restricted DMA pool.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
include/linux/swiotlb.h | 2 ++
kernel/dma/direct.c | 30 ++++++++++++++++++++++--------
kernel/dma/swiotlb.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 58 insertions(+), 8 deletions(-)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b9f2a250c8da..2cd39e102915 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -74,6 +74,8 @@ extern enum swiotlb_force swiotlb_force;
#ifdef CONFIG_DMA_RESTRICTED_POOL
bool is_swiotlb_force(struct device *dev);
bool is_dev_swiotlb_force(struct device *dev);
+struct page *dev_swiotlb_alloc(struct device *dev, size_t size, gfp_t gfp);
+bool dev_swiotlb_free(struct device *dev, struct page *page, size_t size);
#else
static inline bool is_swiotlb_force(struct device *dev)
{
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a76a1a2f24da..f9a9321f7559 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -12,6 +12,7 @@
#include <linux/pfn.h>
#include <linux/vmalloc.h>
#include <linux/set_memory.h>
+#include <linux/swiotlb.h>
#include <linux/slab.h>
#include "direct.h"
@@ -77,6 +78,10 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
static void __dma_direct_free_pages(struct device *dev, struct page *page, size_t size)
{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+ if (dev_swiotlb_free(dev, page, size))
+ return;
+#endif
dma_free_contiguous(dev, page, size);
}
@@ -89,6 +94,12 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
WARN_ON_ONCE(!PAGE_ALIGNED(size));
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+ page = dev_swiotlb_alloc(dev, size, gfp);
+ if (page)
+ return page;
+#endif
+
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
&phys_limit);
page = dma_alloc_contiguous(dev, size, gfp);
@@ -147,7 +158,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
- !force_dma_unencrypted(dev)) {
+ !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -160,8 +171,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
- !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
- !dev_is_dma_coherent(dev))
+ !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+ !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
/*
@@ -171,7 +182,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
- (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
+ (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+ !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
/* we always manually zero the memory once we are done */
@@ -252,15 +265,15 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
- !force_dma_unencrypted(dev)) {
+ !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
}
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
- !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
- !dev_is_dma_coherent(dev)) {
+ !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+ !is_dev_swiotlb_force(dev)) {
arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
return;
}
@@ -288,7 +301,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
void *ret;
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
- force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
+ force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+ !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
page = __dma_direct_alloc_pages(dev, size, gfp);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index fd9c1bd183ac..8b77fd64199e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -836,6 +836,40 @@ late_initcall(swiotlb_create_default_debugfs);
#endif
#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *dev_swiotlb_alloc(struct device *dev, size_t size, gfp_t gfp)
+{
+ struct swiotlb *swiotlb;
+ phys_addr_t tlb_addr;
+ unsigned int index;
+
+ /* dev_swiotlb_alloc can be used only in the context which permits sleeping. */
+ if (!dev->dev_swiotlb || !gfpflags_allow_blocking(gfp))
+ return NULL;
+
+ swiotlb = dev->dev_swiotlb;
+ index = swiotlb_tbl_find_free_region(dev, swiotlb->start, size, 0);
+ if (index < 0)
+ return NULL;
+
+ tlb_addr = swiotlb->start + (index << IO_TLB_SHIFT);
+
+ return pfn_to_page(PFN_DOWN(tlb_addr));
+}
+
+bool dev_swiotlb_free(struct device *dev, struct page *page, size_t size)
+{
+ unsigned int index;
+ phys_addr_t tlb_addr = page_to_phys(page);
+
+ if (!is_swiotlb_buffer(dev, tlb_addr))
+ return false;
+
+ index = (tlb_addr - dev->dev_swiotlb->start) >> IO_TLB_SHIFT;
+ swiotlb_tbl_release_region(dev, index, size);
+
+ return true;
+}
+
bool is_swiotlb_force(struct device *dev)
{
return unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dev_swiotlb;
--
2.30.0.478.g8a0d178c01-goog
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support.
2021-02-09 6:21 ` [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support Claire Chang
@ 2021-02-26 4:17 ` Claire Chang
2021-02-26 5:17 ` Christoph Hellwig
0 siblings, 1 reply; 35+ messages in thread
From: Claire Chang @ 2021-02-26 4:17 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index fd9c1bd183ac..8b77fd64199e 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -836,6 +836,40 @@ late_initcall(swiotlb_create_default_debugfs);
> #endif
>
> #ifdef CONFIG_DMA_RESTRICTED_POOL
> +struct page *dev_swiotlb_alloc(struct device *dev, size_t size, gfp_t gfp)
> +{
> + struct swiotlb *swiotlb;
> + phys_addr_t tlb_addr;
> + unsigned int index;
> +
> + /* dev_swiotlb_alloc can be used only in the context which permits sleeping. */
> + if (!dev->dev_swiotlb || !gfpflags_allow_blocking(gfp))
Just noticed that !gfpflags_allow_blocking(gfp) shouldn't be here.
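In other words the check should just be whether the device has a
restricted pool at all, i.e. roughly (untested):

    if (!dev->dev_swiotlb)
            return NULL;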
Hi Christoph,
Do you think I should fix this and rebase on the latest linux-next
now? I wonder if more refactoring and cleanup are coming and whether I
should wait until after that.
Thanks,
Claire
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support.
2021-02-26 4:17 ` Claire Chang
@ 2021-02-26 5:17 ` Christoph Hellwig
2021-02-26 9:35 ` Claire Chang
0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2021-02-26 5:17 UTC (permalink / raw)
To: Claire Chang
Cc: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski, benh, paulus, list@263.net:IOMMU DRIVERS,
sstabellini, Robin Murphy, grant.likely, xypron.glpk,
Thierry Reding, mingo, bauerman, peterz, Greg KH, Saravana Kannan,
Rafael J . Wysocki, heikki.krogerus, Andy Shevchenko,
Randy Dunlap, Dan Williams, Bartosz Golaszewski, linux-devicetree,
lkml, linuxppc-dev, xen-devel, Nicolas Boichat, Jim Quinlan
On Fri, Feb 26, 2021 at 12:17:50PM +0800, Claire Chang wrote:
> Do you think I should fix this and rebase on the latest linux-next
> now? I wonder if more refactoring and cleanup are coming and whether I
> should wait until after that.
Here is my preferred plan:
1) wait for my series to support the min alignment in swiotlb to
land in Linus tree
2) I'll resend my series with the further swiotlb cleanup and
refactoring, which includes a slightly rebased version of your
patch to add the io_tlb_mem structure
3) resend your series on top of that as a baseline
This is my current WIP tree for 2:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-struct
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support.
2021-02-26 5:17 ` Christoph Hellwig
@ 2021-02-26 9:35 ` Claire Chang
0 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-26 9:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Marek Szyprowski,
benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan
On Fri, Feb 26, 2021 at 1:17 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Feb 26, 2021 at 12:17:50PM +0800, Claire Chang wrote:
> > Do you think I should fix this and rebase on the latest linux-next
> > now? I wonder if more refactoring and cleanup are coming and whether I
> > should wait until after that.
>
> Here is my preferred plan:
>
> 1) wait for my series to support the min alignment in swiotlb to
> land in Linus tree
> 2) I'll resend my series with the further swiotlb cleanup and
> refactoring, which includes a slightly rebased version of your
> patch to add the io_tlb_mem structure
> 3) resend your series on top of that as a baseline
>
> This is my current WIP tree for 2:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-struct
Sounds good to me. Thanks!
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (11 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 12/14] swiotlb: Add restricted DMA alloc/free support Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-03-10 16:07 ` Will Deacon
2021-02-09 6:21 ` [PATCH v4 14/14] of: Add plumbing for " Claire Chang
` (2 subsequent siblings)
15 siblings, 1 reply; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
.../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..fc9a12c2f679 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,20 @@ compatible (optional) - standard definition
used as a shared pool of DMA buffers for a set of devices. It can
be used by an operating system to instantiate the necessary pool
management subsystem if necessary.
+ - restricted-dma-pool: This indicates a region of memory meant to be
+ used as a pool of restricted DMA buffers for a set of devices. The
+ memory region would be the only region accessible to those devices.
+ When using this, the no-map and reusable properties must not be set,
+ so the operating system can create a virtual mapping that will be used
+ for synchronization. The main purpose for restricted DMA is to
+ mitigate the lack of DMA access control on systems without an IOMMU,
+ which could result in the DMA accessing the system memory at
+ unexpected times and/or unexpected addresses, possibly leading to data
+ leakage or corruption. The feature on its own provides a basic level
+ of protection against the DMA overwriting buffer contents at
+ unexpected times. However, to protect against general data leakage and
+ system memory corruption, the system needs to provide way to lock down
+ the memory access, e.g., MPU.
- vendor specific string in the form <vendor>,[<device>-]<usage>
no-map (optional) - empty property
- Indicates the operating system must not create a virtual mapping
@@ -120,6 +134,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x77000000 0x4000000>;
};
+
+ restricted_dma_mem_reserved: restricted_dma_mem_reserved {
+ compatible = "restricted-dma-pool";
+ reg = <0x50000000 0x400000>;
+ };
};
/* ... */
@@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
memory-region = <&multimedia_reserved>;
/* ... */
};
+
+ pcie_device: pcie_device@0,0 {
+ memory-region = <&restricted_dma_mem_reserved>;
+ /* ... */
+ };
};
--
2.30.0.478.g8a0d178c01-goog
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool
2021-02-09 6:21 ` [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool Claire Chang
@ 2021-03-10 16:07 ` Will Deacon
2021-03-10 21:40 ` Rob Herring
0 siblings, 1 reply; 35+ messages in thread
From: Will Deacon @ 2021-03-10 16:07 UTC (permalink / raw)
To: Claire Chang
Cc: Rob Herring, mpe, Joerg Roedel, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski, benh, paulus, list@263.net:IOMMU DRIVERS,
sstabellini, Robin Murphy, grant.likely, xypron.glpk,
Thierry Reding, mingo, bauerman, peterz, Greg KH, Saravana Kannan,
Rafael J . Wysocki, heikki.krogerus, Andy Shevchenko,
Randy Dunlap, Dan Williams, Bartosz Golaszewski, linux-devicetree,
lkml, linuxppc-dev, xen-devel, Nicolas Boichat, Jim Quinlan
Hi Claire,
On Tue, Feb 09, 2021 at 02:21:30PM +0800, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the reserved-memory node.
>
> Signed-off-by: Claire Chang <tientzu@chromium.org>
> ---
> .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..fc9a12c2f679 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> used as a shared pool of DMA buffers for a set of devices. It can
> be used by an operating system to instantiate the necessary pool
> management subsystem if necessary.
> + - restricted-dma-pool: This indicates a region of memory meant to be
> + used as a pool of restricted DMA buffers for a set of devices. The
> + memory region would be the only region accessible to those devices.
> + When using this, the no-map and reusable properties must not be set,
> + so the operating system can create a virtual mapping that will be used
> + for synchronization. The main purpose for restricted DMA is to
> + mitigate the lack of DMA access control on systems without an IOMMU,
> + which could result in the DMA accessing the system memory at
> + unexpected times and/or unexpected addresses, possibly leading to data
> + leakage or corruption. The feature on its own provides a basic level
> + of protection against the DMA overwriting buffer contents at
> + unexpected times. However, to protect against general data leakage and
> + system memory corruption, the system needs to provide way to lock down
> + the memory access, e.g., MPU.
As far as I can tell, these pools work with both static allocations (which
seem to match your use-case where firmware has preconfigured the DMA ranges)
but also with dynamic allocations where a 'size' property is present instead
of the 'reg' property and the kernel is responsible for allocating the
reservation during boot. Am I right and, if so, is that deliberate?
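(By dynamic I mean a reserved-memory child along these lines, purely
illustrative:

    restricted_dma: restricted-dma {
            compatible = "restricted-dma-pool";
            size = <0x400000>;
    };

i.e. no fixed 'reg', with the kernel choosing where to place the region
at boot.)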
I ask because I think that would potentially be useful to us for the
Protected KVM work, where we need to bounce virtio memory accesses via
guest-determined windows because the guest memory is generally inaccessible
to the host. We've been hacking this using a combination of "swiotlb=force"
and set_memory_{decrypted,encrypted}() but it would be much better to
leverage the stuff you have here.
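(Roughly, and purely as an illustration: boot the guest with
"swiotlb=force", then make the whole bounce buffer visible to the host,
where vaddr/bytes stand in for the pool's address and size:

    /* after swiotlb init: share the entire pool with the host */
    set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);

with set_memory_decrypted() ending up in our share-with-host path.)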
Also:
> +
> + restricted_dma_mem_reserved: restricted_dma_mem_reserved {
> + compatible = "restricted-dma-pool";
> + reg = <0x50000000 0x400000>;
> + };
> };
>
> /* ... */
> @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> memory-region = <&multimedia_reserved>;
> /* ... */
> };
> +
> + pcie_device: pcie_device@0,0 {
> + memory-region = <&restricted_dma_mem_reserved>;
> + /* ... */
> + };
I find this example a bit weird, as I didn't think we usually had DT nodes
for PCI devices; rather they are discovered as a result of probing config
space. Is the idea that you have one reserved memory region attached to the
RC and all the PCI devices below that share the region, or is there a need
for a mapping mechanism?
Will
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool
2021-03-10 16:07 ` Will Deacon
@ 2021-03-10 21:40 ` Rob Herring
2021-03-11 5:04 ` Florian Fainelli
0 siblings, 1 reply; 35+ messages in thread
From: Rob Herring @ 2021-03-10 21:40 UTC (permalink / raw)
To: Will Deacon
Cc: Claire Chang, Michael Ellerman, Joerg Roedel, Frank Rowand,
Konrad Rzeszutek Wilk, Boris Ostrovsky, Juergen Gross,
Christoph Hellwig, Marek Szyprowski, Benjamin Herrenschmidt,
Paul Mackerras, list@263.net:IOMMU DRIVERS, Stefano Stabellini,
Robin Murphy, Grant Likely, Heinrich Schuchardt, Thierry Reding,
Ingo Molnar, Thiago Jung Bauermann, Peter Zijlstra, Greg KH,
Saravana Kannan, Rafael J . Wysocki, Heikki Krogerus,
Andy Shevchenko, Randy Dunlap, Dan Williams, Bartosz Golaszewski,
linux-devicetree, lkml, linuxppc-dev, xen-devel, Nicolas Boichat,
Jim Quinlan
On Wed, Mar 10, 2021 at 9:08 AM Will Deacon <will@kernel.org> wrote:
>
> Hi Claire,
>
> On Tue, Feb 09, 2021 at 02:21:30PM +0800, Claire Chang wrote:
> > Introduce the new compatible string, restricted-dma-pool, for restricted
> > DMA. One can specify the address and length of the restricted DMA memory
> > region by restricted-dma-pool in the reserved-memory node.
> >
> > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > ---
> > .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > index e8d3096d922c..fc9a12c2f679 100644
> > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > used as a shared pool of DMA buffers for a set of devices. It can
> > be used by an operating system to instantiate the necessary pool
> > management subsystem if necessary.
> > + - restricted-dma-pool: This indicates a region of memory meant to be
> > + used as a pool of restricted DMA buffers for a set of devices. The
> > + memory region would be the only region accessible to those devices.
> > + When using this, the no-map and reusable properties must not be set,
> > + so the operating system can create a virtual mapping that will be used
> > + for synchronization. The main purpose for restricted DMA is to
> > + mitigate the lack of DMA access control on systems without an IOMMU,
> > + which could result in the DMA accessing the system memory at
> > + unexpected times and/or unexpected addresses, possibly leading to data
> > + leakage or corruption. The feature on its own provides a basic level
> > + of protection against the DMA overwriting buffer contents at
> > + unexpected times. However, to protect against general data leakage and
> > + system memory corruption, the system needs to provide way to lock down
> > + the memory access, e.g., MPU.
>
> As far as I can tell, these pools work with both static allocations (which
> seem to match your use-case where firmware has preconfigured the DMA ranges)
> but also with dynamic allocations where a 'size' property is present instead
> of the 'reg' property and the kernel is responsible for allocating the
> reservation during boot. Am I right and, if so, is that deliberate?
I believe so. I'm not keen on having size-only reservations in DT.
Yes, we allowed that already, but that's back from the days of needing
large CMA carveouts to be reserved early in boot. I've read that the
kernel is much better now at contiguous allocations, so do we really
need this in DT anymore?
> I ask because I think that would potentially be useful to us for the
> Protected KVM work, where we need to bounce virtio memory accesses via
> guest-determined windows because the guest memory is generally inaccessible
> to the host. We've been hacking this using a combination of "swiotlb=force"
> and set_memory_{decrypted,encrypted}() but it would be much better to
> leverage the stuff you have here.
>
> Also:
>
> > +
> > + restricted_dma_mem_reserved: restricted_dma_mem_reserved {
> > + compatible = "restricted-dma-pool";
> > + reg = <0x50000000 0x400000>;
> > + };
> > };
> >
> > /* ... */
> > @@ -138,4 +157,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
> > memory-region = <&multimedia_reserved>;
> > /* ... */
> > };
> > +
> > + pcie_device: pcie_device@0,0 {
> > + memory-region = <&restricted_dma_mem_reserved>;
> > + /* ... */
> > + };
>
> I find this example a bit weird, as I didn't think we usually had DT nodes
> for PCI devices; rather they are discovered as a result of probing config
> space. Is the idea that you have one reserved memory region attached to the
> RC and all the PCI devices below that share the region, or is there a need
> for a mapping mechanism?
We can have DT nodes for PCI. AIUI, IBM power systems always do. For
FDT, it's only if there are extra non-discoverable resources. It's
particularly fun when it's resources which need to be enabled for the
PCI device to be discovered. That seems to be a growing problem as PCI
becomes more common on embedded systems.
Rob
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool
2021-03-10 21:40 ` Rob Herring
@ 2021-03-11 5:04 ` Florian Fainelli
0 siblings, 0 replies; 35+ messages in thread
From: Florian Fainelli @ 2021-03-11 5:04 UTC (permalink / raw)
To: Rob Herring, Will Deacon
Cc: Claire Chang, Michael Ellerman, Joerg Roedel, Frank Rowand,
Konrad Rzeszutek Wilk, Boris Ostrovsky, Juergen Gross,
Christoph Hellwig, Marek Szyprowski, Benjamin Herrenschmidt,
Paul Mackerras, list@263.net:IOMMU DRIVERS, Stefano Stabellini,
Robin Murphy, Grant Likely, Heinrich Schuchardt, Thierry Reding,
Ingo Molnar, Thiago Jung Bauermann, Peter Zijlstra, Greg KH,
Saravana Kannan, Rafael J . Wysocki, Heikki Krogerus,
Andy Shevchenko, Randy Dunlap, Dan Williams, Bartosz Golaszewski,
linux-devicetree, lkml, linuxppc-dev, xen-devel, Nicolas Boichat,
Jim Quinlan
On 3/10/2021 1:40 PM, Rob Herring wrote:
> On Wed, Mar 10, 2021 at 9:08 AM Will Deacon <will@kernel.org> wrote:
>>
>> Hi Claire,
>>
>> On Tue, Feb 09, 2021 at 02:21:30PM +0800, Claire Chang wrote:
>>> Introduce the new compatible string, restricted-dma-pool, for restricted
>>> DMA. One can specify the address and length of the restricted DMA memory
>>> region by restricted-dma-pool in the reserved-memory node.
>>>
>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>> ---
>>> .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
>>> 1 file changed, 24 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>> index e8d3096d922c..fc9a12c2f679 100644
>>> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
>>> used as a shared pool of DMA buffers for a set of devices. It can
>>> be used by an operating system to instantiate the necessary pool
>>> management subsystem if necessary.
>>> + - restricted-dma-pool: This indicates a region of memory meant to be
>>> + used as a pool of restricted DMA buffers for a set of devices. The
>>> + memory region would be the only region accessible to those devices.
>>> + When using this, the no-map and reusable properties must not be set,
>>> + so the operating system can create a virtual mapping that will be used
>>> + for synchronization. The main purpose for restricted DMA is to
>>> + mitigate the lack of DMA access control on systems without an IOMMU,
>>> + which could result in the DMA accessing the system memory at
>>> + unexpected times and/or unexpected addresses, possibly leading to data
>>> + leakage or corruption. The feature on its own provides a basic level
>>> + of protection against the DMA overwriting buffer contents at
>>> + unexpected times. However, to protect against general data leakage and
>>> + system memory corruption, the system needs to provide way to lock down
>>> + the memory access, e.g., MPU.
>>
>> As far as I can tell, these pools work with both static allocations (which
>> seem to match your use-case where firmware has preconfigured the DMA ranges)
>> but also with dynamic allocations where a 'size' property is present instead
>> of the 'reg' property and the kernel is responsible for allocating the
>> reservation during boot. Am I right and, if so, is that deliberate?
>
> I believe so. I'm not keen on having size only reservations in DT.
> Yes, we allowed that already, but that's back from the days of needing
> large CMA carveouts to be reserved early in boot. I've read that the
> kernel is much better now at contiguous allocations, so do we really
> need this in DT anymore?
I would say yes, there can be a number of times when you want to
semi-statically partition your physical memory and its reserved regions.
Be it to pack everything together under the same protection rules, or
because you need to allocate memory from a particular address range in,
say, a non-uniform memory controller architecture where address windows
have different scheduling algorithms.
--
Florian
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v4 14/14] of: Add plumbing for restricted DMA pool
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (12 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 13/14] dt-bindings: of: Add restricted DMA pool Claire Chang
@ 2021-02-09 6:21 ` Claire Chang
2021-04-22 8:20 ` [PATCH v4 00/14] Restricted DMA Claire Chang
2025-03-21 15:38 ` Using Restricted DMA for virtio-pci David Woodhouse
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-02-09 6:21 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan, Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up restricted DMA when a restricted-dma-pool region is present.
Signed-off-by: Claire Chang <tientzu@chromium.org>
---
drivers/of/address.c | 25 +++++++++++++++++++++++++
drivers/of/device.c | 3 +++
drivers/of/of_private.h | 5 +++++
3 files changed, 33 insertions(+)
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..b6093c9b135d 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
#include <linux/logic_pio.h>
#include <linux/module.h>
#include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
#include <linux/pci.h>
#include <linux/pci_regs.h>
#include <linux/sizes.h>
@@ -1094,3 +1095,27 @@ bool of_dma_is_coherent(struct device_node *np)
return false;
}
EXPORT_SYMBOL_GPL(of_dma_is_coherent);
+
+int of_dma_set_restricted_buffer(struct device *dev)
+{
+ struct device_node *node;
+ int count, i;
+
+ if (!dev->of_node)
+ return 0;
+
+ count = of_property_count_elems_of_size(dev->of_node, "memory-region",
+ sizeof(phandle));
+ for (i = 0; i < count; i++) {
+ node = of_parse_phandle(dev->of_node, "memory-region", i);
+ /* There might be multiple memory regions, but only one
+ * restricted-dma-pool region is allowed.
+ */
+ if (of_device_is_compatible(node, "restricted-dma-pool") &&
+ of_device_is_available(node))
+ return of_reserved_mem_device_init_by_idx(
+ dev, dev->of_node, i);
+ }
+
+ return 0;
+}
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 1122daa8e273..38c631f1fafa 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -186,6 +186,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
+ if (!iommu)
+ return of_dma_set_restricted_buffer(dev);
+
return 0;
}
EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..28a2dfa197ba 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,17 @@ struct bus_dma_region;
#if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev);
#else
static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
{
return -ENODEV;
}
+static inline int of_dma_set_restricted_buffer(struct device *dev)
+{
+ return -ENODEV;
+}
#endif
#endif /* _LINUX_OF_PRIVATE_H */
--
2.30.0.478.g8a0d178c01-goog
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v4 00/14] Restricted DMA
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (13 preceding siblings ...)
2021-02-09 6:21 ` [PATCH v4 14/14] of: Add plumbing for " Claire Chang
@ 2021-04-22 8:20 ` Claire Chang
2025-03-21 15:38 ` Using Restricted DMA for virtio-pci David Woodhouse
15 siblings, 0 replies; 35+ messages in thread
From: Claire Chang @ 2021-04-22 8:20 UTC (permalink / raw)
To: Rob Herring, mpe, Joerg Roedel, Will Deacon, Frank Rowand,
Konrad Rzeszutek Wilk, boris.ostrovsky, jgross, Christoph Hellwig,
Marek Szyprowski
Cc: benh, paulus, list@263.net:IOMMU DRIVERS, sstabellini,
Robin Murphy, grant.likely, xypron.glpk, Thierry Reding, mingo,
bauerman, peterz, Greg KH, Saravana Kannan, Rafael J . Wysocki,
heikki.krogerus, Andy Shevchenko, Randy Dunlap, Dan Williams,
Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev,
xen-devel, Nicolas Boichat, Jim Quinlan
v5 is here: https://lore.kernel.org/patchwork/cover/1416899/, rebased
onto Christoph's swiotlb cleanups.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Using Restricted DMA for virtio-pci
2021-02-09 6:21 [PATCH v4 00/14] Restricted DMA Claire Chang
` (14 preceding siblings ...)
2021-04-22 8:20 ` [PATCH v4 00/14] Restricted DMA Claire Chang
@ 2025-03-21 15:38 ` David Woodhouse
2025-03-21 18:32 ` Michael S. Tsirkin
15 siblings, 1 reply; 35+ messages in thread
From: David Woodhouse @ 2025-03-21 15:38 UTC (permalink / raw)
To: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski
Cc: heikki.krogerus, peterz, benh, grant.likely, paulus, mingo,
sstabellini, Saravana Kannan, xypron.glpk, Rafael J . Wysocki,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
linuxppc-dev, Nicolas Boichat, Dan Williams, Andy Shevchenko,
Greg KH, Randy Dunlap, lkml, list@263.net:IOMMU DRIVERS,
Jim Quinlan, Robin Murphy, hch, Michael S. Tsirkin, Jason Wang,
Xuan Zhuo, Eugenio Pérez, virtualization, graf
On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
Replying to an ancient (2021) thread which has already been merged...
I'd like to be able to use this facility for virtio devices.
Virtio already has a complicated relationship with the DMA API, because
there were a bunch of early VMM bugs where the virtio devices were
magically exempted from IOMMU protection, but the VMM lied to the guest
and claimed they weren't.
With the advent of confidential computing, and the VMM (or whatever's
emulating the virtio device) not being *allowed* to arbitrarily access
all of the guest's memory, the DMA API becomes necessary again.
Either a virtual IOMMU needs to determine which guest memory the VMM
may access, or the DMA API is wrappers around operations which
share/unshare (or unencrypt/encrypt) the memory in question.
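(By the latter I mean dma_map_ops which look conceptually like this,
where share_page_with_host() is a stand-in for whatever hypercall or
set_memory_decrypted() dance the platform needs:

    static dma_addr_t cc_map_page(struct device *dev, struct page *page,
                                  unsigned long offset, size_t size,
                                  enum dma_data_direction dir,
                                  unsigned long attrs)
    {
            share_page_with_host(page);     /* hypothetical */
            return page_to_phys(page) + offset;
    }

with the unmap side doing the reverse.)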
All of which is complicated and slow, if we're looking at a minimal
privileged hypervisor stub like pKVM which enforces the lack of guest
memory access from VMM.
I'm thinking of defining a new type of virtio-pci device which cannot
do DMA to arbitrary system memory. Instead it has an additional memory
BAR which is used as a SWIOTLB for bounce buffering.
The driver for it would look much like the existing virtio-pci device
except that it would register the restricted-dma region first (and thus
the swiotlb dma_ops), and then just go through the rest of the setup
like any other virtio device.
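Something like this early in the probe path, very roughly;
swiotlb_dev_init_pool() is made up here and would do for the BAR what
rmem_swiotlb_device_init() does for the restricted-dma-pool DT node:

    static int vp_setup_restricted_dma(struct pci_dev *pci_dev, int bar)
    {
            phys_addr_t base = pci_resource_start(pci_dev, bar);
            resource_size_t size = pci_resource_len(pci_dev, bar);

            if (!base || !size)
                    return -ENODEV;

            /* register the BAR as this device's bounce-buffer pool */
            return swiotlb_dev_init_pool(&pci_dev->dev, base, size);
    }

and then the rest of the virtio-pci setup is unchanged.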
That seems like it ought to be fairly simple, and seems like a
reasonable way to allow an untrusted VMM to provide virtio devices with
restricted DMA access.
While I start actually doing the typing... does anyone want to start
yelling at me now? Christoph? mst? :)
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-21 15:38 ` Using Restricted DMA for virtio-pci David Woodhouse
@ 2025-03-21 18:32 ` Michael S. Tsirkin
2025-03-21 18:42 ` David Woodhouse
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-03-21 18:32 UTC (permalink / raw)
To: David Woodhouse
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, linuxppc-dev, Nicolas Boichat,
Dan Williams, Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On Fri, Mar 21, 2025 at 03:38:10PM +0000, David Woodhouse wrote:
> On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
>
> Replying to an ancient (2021) thread which has already been merged...
>
> I'd like to be able to use this facility for virtio devices.
>
> Virtio already has a complicated relationship with the DMA API, because
> there were a bunch of early VMM bugs where the virtio devices were
> magically exempted from IOMMU protection, but the VMM lied to the guest
> and claimed they weren't.
>
> With the advent of confidential computing, and the VMM (or whatever's
> emulating the virtio device) not being *allowed* to arbitrarily access
> all of the guest's memory, the DMA API becomes necessary again.
>
> Either a virtual IOMMU needs to determine which guest memory the VMM
> may access, or the DMA API is wrappers around operations which
> share/unshare (or unencrypt/encrypt) the memory in question.
>
> All of which is complicated and slow, if we're looking at a minimal
> privileged hypervisor stub like pKVM which enforces the lack of guest
> memory access from VMM.
>
> I'm thinking of defining a new type of virtio-pci device which cannot
> do DMA to arbitrary system memory. Instead it has an additional memory
> BAR which is used as a SWIOTLB for bounce buffering.
>
> The driver for it would look much like the existing virtio-pci device
> except that it would register the restricted-dma region first (and thus
> the swiotlb dma_ops), and then just go through the rest of the setup
> like any other virtio device.
>
> That seems like it ought to be fairly simple, and seems like a
> reasonable way to allow an untrusted VMM to provide virtio devices with
> restricted DMA access.
>
> While I start actually doing the typing... does anyone want to start
> yelling at me now? Christoph? mst? :)
I don't mind as such (though I don't understand completely), but since
this is changing the device anyway, I am a bit confused why you can't
just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
which will DTRT for you, will it not?
--
MST
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-21 18:32 ` Michael S. Tsirkin
@ 2025-03-21 18:42 ` David Woodhouse
2025-03-28 17:40 ` David Woodhouse
2025-03-30 17:06 ` Michael S. Tsirkin
0 siblings, 2 replies; 35+ messages in thread
From: David Woodhouse @ 2025-03-21 18:42 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, linuxppc-dev, Nicolas Boichat,
Dan Williams, Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On Fri, 2025-03-21 at 14:32 -0400, Michael S. Tsirkin wrote:
> On Fri, Mar 21, 2025 at 03:38:10PM +0000, David Woodhouse wrote:
> > On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> > > This series implements mitigations for lack of DMA access control on
> > > systems without an IOMMU, which could result in the DMA accessing the
> > > system memory at unexpected times and/or unexpected addresses, possibly
> > > leading to data leakage or corruption.
> >
> > Replying to an ancient (2021) thread which has already been merged...
> >
> > I'd like to be able to use this facility for virtio devices.
> >
> > Virtio already has a complicated relationship with the DMA API, because
> > there were a bunch of early VMM bugs where the virtio devices were
> > magically exempted from IOMMU protection, but the VMM lied to the guest
> > and claimed they weren't.
> >
> > With the advent of confidential computing, and the VMM (or whatever's
> > emulating the virtio device) not being *allowed* to arbitrarily access
> > all of the guest's memory, the DMA API becomes necessary again.
> >
> > Either a virtual IOMMU needs to determine which guest memory the VMM
> > may access, or the DMA API is wrappers around operations which
> > share/unshare (or unencrypt/encrypt) the memory in question.
> >
> > All of which is complicated and slow, if we're looking at a minimal
> > privileged hypervisor stub like pKVM which enforces the lack of guest
> > memory access from VMM.
> >
> > I'm thinking of defining a new type of virtio-pci device which cannot
> > do DMA to arbitrary system memory. Instead it has an additional memory
> > BAR which is used as a SWIOTLB for bounce buffering.
> >
> > The driver for it would look much like the existing virtio-pci device
> > except that it would register the restricted-dma region first (and thus
> > the swiotlb dma_ops), and then just go through the rest of the setup
> > like any other virtio device.
> >
> > That seems like it ought to be fairly simple, and seems like a
> > reasonable way to allow an untrusted VMM to provide virtio devices with
> > restricted DMA access.
> >
> > While I start actually doing the typing... does anyone want to start
> > yelling at me now? Christoph? mst? :)
>
>
> I don't mind as such (though I don't understand completely), but since
> this is changing the device anyway, I am a bit confused why you can't
> just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
> which will DTRT for you, will it not?
That would be necessary but not sufficient. The question is *what* does
the DMA API do?
For a real passthrough PCI device, perhaps we'd have a vIOMMU exposed
to the guest so that it can do real protection with two-stage page
tables (IOVA→GPA under control of the guest, GPA→HPA under control of
the hypervisor). For that to work in the pKVM model though, you'd need
pKVM to be walking the guest's stage1 I/O page tables to see if a given
access from the VMM ought to be permitted?
Or for confidential guests there could be DMA ops which are an
'enlightenment'; a hypercall into pKVM to share/unshare pages so that
the VMM can actually access them, or SEV-SNP guests might mark pages
unencrypted to have the same effect with hardware protection.
Doing any of those dynamically to allow the VMM to access buffers in
arbitrary guest memory (when it wouldn't normally have access to
arbitrary guest memory) is complex and doesn't perform very well. And
exposes a full 4KiB page for any byte that needs to be made available.
Thus the idea of having a fixed range of memory to use for a SWIOTLB,
which is fairly much what the restricted DMA setup is all about.
We're just proposing that we build it in to a virtio-pci device model,
which automatically uses the extra memory BAR instead of the
restricted-dma-pool DT node.
It's basically just allowing us to expose through PCI, what I believe
we can already do for virtio in DT.
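As a rough sketch (not code from this thread), the guest-side probe flow
for such a device might look something like the following, assuming a
hypothetical helper which registers a BAR-backed region as the device's
bounce-buffer pool. virtio_pci_restricted_dma_init(), vp_modern_probe_rest()
and VIRTIO_SWIOTLB_BAR are invented names; only the PCI resource helpers
are the standard Linux ones.

/*
 * Sketch only: probe-time setup for a virtio-pci device whose extra
 * memory BAR is used as a SWIOTLB.  The bounce-buffer pool is
 * registered before any virtqueue memory is allocated, so the swiotlb
 * dma_ops are already in place for the rest of the setup.
 */
#include <linux/pci.h>

static int vp_swiotlb_probe(struct pci_dev *pci_dev)
{
        phys_addr_t tlb_base = pci_resource_start(pci_dev, VIRTIO_SWIOTLB_BAR);
        size_t tlb_size = pci_resource_len(pci_dev, VIRTIO_SWIOTLB_BAR);
        int rc;

        /* Register the BAR as this device's restricted DMA pool first
         * (hypothetical helper). */
        rc = virtio_pci_restricted_dma_init(&pci_dev->dev, tlb_base, tlb_size);
        if (rc)
                return rc;

        /* ...then carry on exactly like any other virtio-pci device
         * (hypothetical helper standing in for the usual setup). */
        return vp_modern_probe_rest(pci_dev);
}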
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-21 18:42 ` David Woodhouse
@ 2025-03-28 17:40 ` David Woodhouse
2025-03-30 13:42 ` Michael S. Tsirkin
2025-03-30 17:06 ` Michael S. Tsirkin
1 sibling, 1 reply; 35+ messages in thread
From: David Woodhouse @ 2025-03-28 17:40 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, linuxppc-dev, Nicolas Boichat,
Dan Williams, Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
[-- Attachment #1: Type: text/plain, Size: 7499 bytes --]
On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
> On Fri, 2025-03-21 at 14:32 -0400, Michael S. Tsirkin wrote:
> > On Fri, Mar 21, 2025 at 03:38:10PM +0000, David Woodhouse wrote:
> > > On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> > > > This series implements mitigations for lack of DMA access control on
> > > > systems without an IOMMU, which could result in the DMA accessing the
> > > > system memory at unexpected times and/or unexpected addresses, possibly
> > > > leading to data leakage or corruption.
> > >
> > > Replying to an ancient (2021) thread which has already been merged...
> > >
> > > I'd like to be able to use this facility for virtio devices.
> > >
> > > Virtio already has a complicated relationship with the DMA API, because
> > > there were a bunch of early VMM bugs where the virtio devices were
> > > magically exempted from IOMMU protection, but the VMM lied to the guest
> > > and claimed they weren't.
> > >
> > > With the advent of confidential computing, and the VMM (or whatever's
> > > emulating the virtio device) not being *allowed* to arbitrarily access
> > > all of the guest's memory, the DMA API becomes necessary again.
> > >
> > > Either a virtual IOMMU needs to determine which guest memory the VMM
> > > may access, or the DMA API is wrappers around operations which
> > > share/unshare (or unencrypt/encrypt) the memory in question.
> > >
> > > All of which is complicated and slow, if we're looking at a minimal
> > > privileged hypervisor stub like pKVM which enforces the lack of guest
> > > memory access from VMM.
> > >
> > > I'm thinking of defining a new type of virtio-pci device which cannot
> > > do DMA to arbitrary system memory. Instead it has an additional memory
> > > BAR which is used as a SWIOTLB for bounce buffering.
> > >
> > > The driver for it would look much like the existing virtio-pci device
> > > except that it would register the restricted-dma region first (and thus
> > > the swiotlb dma_ops), and then just go through the rest of the setup
> > > like any other virtio device.
> > >
> > > That seems like it ought to be fairly simple, and seems like a
> > > reasonable way to allow an untrusted VMM to provide virtio devices with
> > > restricted DMA access.
> > >
> > > While I start actually doing the typing... does anyone want to start
> > > yelling at me now? Christoph? mst? :)
> >
> >
> > I don't mind as such (though I don't understand completely), but since
> > this is changing the device anyway, I am a bit confused why you can't
> > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
> > which will DTRT for you, will it not?
>
> That would be necessary but not sufficient. ...
My first cut at a proposed spec change looks something like this. I'll
post it to the virtio-comment list once I've done some corporate
bureaucracy and when the list stops sending me python tracebacks in
response to my subscribe request.
In the meantime I'll hack up some QEMU and guest Linux driver support
to match.
diff --git a/content.tex b/content.tex
index c17ffa6..1e6e1d6 100644
--- a/content.tex
+++ b/content.tex
@@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
Currently these device-independent feature bits are defined:
\begin{description}
+ \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
+ provides a memory region which is to be used for bounce buffering,
+ rather than permitting direct memory access to system memory.
\item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
flag set, as described in \ref{sec:Basic Facilities of a Virtio
@@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
addresses to the device.
+A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
+then pass only addresses within the Software IOTLB bounce buffer to the
+device.
+
A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
@@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
accepted.
+A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
+provide a Software IOTLB bounce buffer.
+A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
+
If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
buffers in the same order in which they have been available.
diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..23e0d57 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
\item ISR Status
\item Device-specific configuration (optional)
\item PCI configuration access
+\item SWIOTLB bounce buffer
\end{itemize}
Each structure can be mapped by a Base Address register (BAR) belonging to
@@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
/* Vendor-specific data */
#define VIRTIO_PCI_CAP_VENDOR_CFG 9
+/* Software IOTLB bounce buffer */
+#define VIRTIO_PCI_CAP_SWIOTLB 10
\end{lstlisting}
Any other value is reserved for future use.
@@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
The driver MUST qualify the \field{vendor_id} before
interpreting or writing into the Vendor data capability.
+\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Software IOTLB bounce buffer capability}
+
+The optional Software IOTLB bounce buffer capability allows the
+device to provide a memory region which can be used by the driver
+for bounce buffering. This allows a device on the PCI
+transport to operate without DMA access to system memory addresses.
+
+The Software IOTLB region is referenced by the
+VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
+range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
+capability, if negotiated.
+
+\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Software IOTLB bounce buffer capability}
+
+Devices which present the Software IOTLB bounce buffer capability
+SHOULD also offer the VIRTIO_F_SWIOTLB feature.
+
+\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Software IOTLB bounce buffer capability}
+
+The driver SHOULD use the offered buffer in preference to passing system
+memory addresses to the device. If the driver accepts the VIRTIO_F_SWIOTLB
+feature, then the driver MUST use the offered buffer and never pass system
+memory addresses to the device.
+
\subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
The VIRTIO_PCI_CAP_PCI_CFG capability
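For reference, a sketch of how a driver might see the new capability if it
reuses the generic virtio PCI capability layout from the spec (that reuse is
an assumption on my part; the patch above only defines the cfg_type value).
The bar/offset/length triple would then locate the bounce-buffer window:

/*
 * Sketch: generic virtio PCI capability layout, here with cfg_type
 * set to the proposed VIRTIO_PCI_CAP_SWIOTLB (10).
 */
struct virtio_pci_cap {
        __u8    cap_vndr;       /* Generic PCI field: PCI_CAP_ID_VNDR */
        __u8    cap_next;       /* Generic PCI field: next capability */
        __u8    cap_len;        /* Length of this capability structure */
        __u8    cfg_type;       /* VIRTIO_PCI_CAP_SWIOTLB for this one */
        __u8    bar;            /* BAR holding the bounce buffer */
        __u8    id;             /* Distinguishes multiple caps of one type */
        __u8    padding[2];     /* Pad to a full dword */
        __le32  offset;         /* Offset of the buffer within the BAR */
        __le32  length;         /* Size of the buffer, in bytes */
};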
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-28 17:40 ` David Woodhouse
@ 2025-03-30 13:42 ` Michael S. Tsirkin
2025-03-30 15:07 ` David Woodhouse
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-03-30 13:42 UTC (permalink / raw)
To: David Woodhouse
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, linuxppc-dev, Nicolas Boichat,
Dan Williams, Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
> On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
> > On Fri, 2025-03-21 at 14:32 -0400, Michael S. Tsirkin wrote:
> > > On Fri, Mar 21, 2025 at 03:38:10PM +0000, David Woodhouse wrote:
> > > > On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> > > > > This series implements mitigations for lack of DMA access control on
> > > > > systems without an IOMMU, which could result in the DMA accessing the
> > > > > system memory at unexpected times and/or unexpected addresses, possibly
> > > > > leading to data leakage or corruption.
> > > >
> > > > Replying to an ancient (2021) thread which has already been merged...
> > > >
> > > > I'd like to be able to use this facility for virtio devices.
> > > >
> > > > Virtio already has a complicated relationship with the DMA API, because
> > > > there were a bunch of early VMM bugs where the virtio devices were
> > > > magically exempted from IOMMU protection, but the VMM lied to the guest
> > > > and claimed they weren't.
> > > >
> > > > With the advent of confidential computing, and the VMM (or whatever's
> > > > emulating the virtio device) not being *allowed* to arbitrarily access
> > > > all of the guest's memory, the DMA API becomes necessary again.
> > > >
> > > > Either a virtual IOMMU needs to determine which guest memory the VMM
> > > > may access, or the DMA API is wrappers around operations which
> > > > share/unshare (or unencrypt/encrypt) the memory in question.
> > > >
> > > > All of which is complicated and slow, if we're looking at a minimal
> > > > privileged hypervisor stub like pKVM which enforces the lack of guest
> > > > memory access from VMM.
> > > >
> > > > I'm thinking of defining a new type of virtio-pci device which cannot
> > > > do DMA to arbitrary system memory. Instead it has an additional memory
> > > > BAR which is used as a SWIOTLB for bounce buffering.
> > > >
> > > > The driver for it would look much like the existing virtio-pci device
> > > > except that it would register the restricted-dma region first (and thus
> > > > the swiotlb dma_ops), and then just go through the rest of the setup
> > > > like any other virtio device.
> > > >
> > > > That seems like it ought to be fairly simple, and seems like a
> > > > reasonable way to allow an untrusted VMM to provide virtio devices with
> > > > restricted DMA access.
> > > >
> > > > While I start actually doing the typing... does anyone want to start
> > > > yelling at me now? Christoph? mst? :)
> > >
> > >
> > > I don't mind as such (though I don't understand completely), but since
> > > this is changing the device anyway, I am a bit confused why you can't
> > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
> > > which will DTRT for you, will it not?
> >
> > That would be necessary but not sufficient. ...
could you explain pls?
> My first cut at a proposed spec change looks something like this. I'll
> post it to the virtio-comment list once I've done some corporate
> bureaucracy and when the list stops sending me python tracebacks in
> response to my subscribe request.
the linux foundation one does this? maybe poke at the admins.
> In the meantime I'll hack up some QEMU and guest Linux driver support
> to match.
>
> diff --git a/content.tex b/content.tex
> index c17ffa6..1e6e1d6 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> Currently these device-independent feature bits are defined:
>
> \begin{description}
> + \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
> + provides a memory region which is to be used for bounce buffering,
> + rather than permitting direct memory access to system memory.
> \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
> that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> flag set, as described in \ref{sec:Basic Facilities of a Virtio
> @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
> addresses to the device.
>
> +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
> +then pass only addresses within the Software IOTLB bounce buffer to the
> +device.
> +
> A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
>
> A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
> @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
> accepted.
>
> +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
> +provide a Software IOTLB bounce buffer.
> +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
> +
> If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> buffers in the same order in which they have been available.
>
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..23e0d57 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> \item ISR Status
> \item Device-specific configuration (optional)
> \item PCI configuration access
> +\item SWIOTLB bounce buffer
> \end{itemize}
>
> Each structure can be mapped by a Base Address register (BAR) belonging to
> @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
> /* Vendor-specific data */
> #define VIRTIO_PCI_CAP_VENDOR_CFG 9
> +/* Software IOTLB bounce buffer */
> +#define VIRTIO_PCI_CAP_SWIOTLB 10
> \end{lstlisting}
>
> Any other value is reserved for future use.
> @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
> The driver MUST qualify the \field{vendor_id} before
> interpreting or writing into the Vendor data capability.
>
> +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
> +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> +Software IOTLB bounce buffer capability}
> +
> +The optional Software IOTLB bounce buffer capability allows the
> +device to provide a memory region which can be used by the driver
> +for bounce buffering. This allows a device on the PCI
> +transport to operate without DMA access to system memory addresses.
> +
> +The Software IOTLB region is referenced by the
> +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
> +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
> +capability, if negotiated.
why not? an optimization?
A mix of swiotlb and system memory might be very challenging from POV
of ordering.
> +
> +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> +Software IOTLB bounce buffer capability}
> +
> +Devices which present the Software IOTLB bounce buffer capability
> +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
> +
> +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> +Software IOTLB bounce buffer capability}
> +
> +The driver SHOULD use the offered buffer in preference to passing system
> +memory addresses to the device.
Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?
> If the driver accepts the VIRTIO_F_SWIOTLB
> +feature, then the driver MUST use the offered buffer and never pass system
> +memory addresses to the device.
> +
> \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
>
> The VIRTIO_PCI_CAP_PCI_CFG capability
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-30 13:42 ` Michael S. Tsirkin
@ 2025-03-30 15:07 ` David Woodhouse
2025-03-30 16:59 ` Michael S. Tsirkin
0 siblings, 1 reply; 35+ messages in thread
From: David Woodhouse @ 2025-03-30 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, linuxppc-dev, Nicolas Boichat,
Dan Williams, Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
[-- Attachment #1: Type: text/plain, Size: 7731 bytes --]
On Sun, 2025-03-30 at 09:42 -0400, Michael S. Tsirkin wrote:
> On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
> > On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
> > > >
> > > > I don't mind as such (though I don't understand completely), but since
> > > > this is changing the device anyway, I am a bit confused why you can't
> > > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
> > > > which will DTRT for you, will it not?
> > >
> > > That would be necessary but not sufficient. ...
>
> could you explain pls?
There was more to that in the previous email which I elided for this
followup.
https://lore.kernel.org/all/d1382a6ee959f22dc5f6628d8648af77f4702418.camel@infradead.org/
> > My first cut at a proposed spec change looks something like this. I'll
> > post it to the virtio-comment list once I've done some corporate
> > bureaucracy and when the list stops sending me python tracebacks in
> > response to my subscribe request.
>
> the linux foundation one does this? maybe poke at the admins.
>
> > In the meantime I'll hack up some QEMU and guest Linux driver support
> > to match.
> >
> > diff --git a/content.tex b/content.tex
> > index c17ffa6..1e6e1d6 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > Currently these device-independent feature bits are defined:
> >
> > \begin{description}
> > + \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
> > + provides a memory region which is to be used for bounce buffering,
> > + rather than permitting direct memory access to system memory.
> > \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
> > that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> > flag set, as described in \ref{sec:Basic Facilities of a Virtio
> > @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
> > addresses to the device.
> >
> > +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
> > +then pass only addresses within the Software IOTLB bounce buffer to the
> > +device.
> > +
> > A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
> >
> > A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
> > @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
> > accepted.
> >
> > +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
> > +provide a Software IOTLB bounce buffer.
> > +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
> > +
> > If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> > buffers in the same order in which they have been available.
> >
> > diff --git a/transport-pci.tex b/transport-pci.tex
> > index a5c6719..23e0d57 100644
> > --- a/transport-pci.tex
> > +++ b/transport-pci.tex
> > @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> > \item ISR Status
> > \item Device-specific configuration (optional)
> > \item PCI configuration access
> > +\item SWIOTLB bounce buffer
> > \end{itemize}
> >
> > Each structure can be mapped by a Base Address register (BAR) belonging to
> > @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> > #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
> > /* Vendor-specific data */
> > #define VIRTIO_PCI_CAP_VENDOR_CFG 9
> > +/* Software IOTLB bounce buffer */
> > +#define VIRTIO_PCI_CAP_SWIOTLB 10
> > \end{lstlisting}
> >
> > Any other value is reserved for future use.
> > @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
> > The driver MUST qualify the \field{vendor_id} before
> > interpreting or writing into the Vendor data capability.
> >
> > +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The optional Software IOTLB bounce buffer capability allows the
> > +device to provide a memory region which can be used by the driver
> > +for bounce buffering. This allows a device on the PCI
> > +transport to operate without DMA access to system memory addresses.
> > +
> > +The Software IOTLB region is referenced by the
> > +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
> > +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
> > +capability, if negotiated.
>
>
> why not? an optimization?
> A mix of swiotlb and system memory might be very challenging from POV
> of ordering.
Conceptually, these addresses are *on* the PCI device. If the device is
accessing addresses which are local to it, they aren't subject to IOMMU
translation/filtering because they never even make it to the PCI bus as
memory transactions.
>
> > +
> > +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +Devices which present the Software IOTLB bounce buffer capability
> > +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
> > +
> > +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The driver SHOULD use the offered buffer in preference to passing system
> > +memory addresses to the device.
>
> Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?
That part isn't strictly necessary, but I think it makes sense, for
cases where the SWIOTLB support is an *optimisation* even if it isn't
strictly necessary.
Why might it be an "optimisation"? Well... if we're thinking of a model
like pKVM where the VMM can't just arbitrarily access guest memory,
using the SWIOTLB is a simple way to avoid that (by using the on-board
memory instead, which *can* be shared with the VMM).
But if we want to go to extra lengths to support unenlightened guests,
an implementation might choose to just *disable* the memory protection
if the guest doesn't negotiate VIRTIO_F_SWIOTLB, instead of breaking
that guest.
Or it might have a complicated emulation/snooping of virtqueues in the
trusted part of the hypervisor so that it knows which addresses the
guest has truly *asked* the VMM to access. (And yes, of course that's
what an IOMMU is for, but when have you seen hardware companies design
a two-stage IOMMU which supports actual PCI passthrough *and* get it
right for the hypervisor to 'snoop' on the stage1 page tables to
support emulated devices too....)
Ultimately I think it was natural to advertise the location of the
buffer with the VIRTIO_PCI_CAP_SWIOTLB capability and then to have the
separate VIRTIO_F_SWIOTLB for negotiation... leaving the obvious
question of what a device should do if it sees one but *not* the other.
Obviously you can't have VIRTIO_F_SWIOTLB *without* there actually
being a buffer advertised with VIRTIO_PCI_CAP_SWIOTLB (or its
equivalent for other transports). But the converse seemed reasonable as
a *hint* even if the use of the SWIOTLB isn't mandatory.
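To make that combination concrete, a rough sketch of the driver-side logic
implied above (vp_find_swiotlb_region() and vp_use_swiotlb_region() are
invented names, virtio_has_feature() is the normal Linux helper, and
VIRTIO_F_SWIOTLB is the proposed bit 27):

/*
 * Sketch of the capability/feature combinations discussed above:
 * a buffer without the feature bit is treated as a hint, while the
 * feature bit without a buffer is a device bug.
 */
static int vp_setup_bounce_buffer(struct virtio_device *vdev,
                                  struct virtio_pci_device *vp_dev)
{
        bool have_region = vp_find_swiotlb_region(vp_dev);  /* hypothetical */
        bool negotiated = virtio_has_feature(vdev, VIRTIO_F_SWIOTLB);

        if (negotiated && !have_region)
                return -EINVAL;

        if (have_region)
                /* Mandatory if negotiated, otherwise just preferred. */
                return vp_use_swiotlb_region(vp_dev, negotiated);  /* hypothetical */

        return 0;       /* plain virtio-pci behaviour */
}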
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-30 15:07 ` David Woodhouse
@ 2025-03-30 16:59 ` Michael S. Tsirkin
2025-03-30 20:07 ` David Woodhouse
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-03-30 16:59 UTC (permalink / raw)
To: David Woodhouse
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, linuxppc-dev, Nicolas Boichat,
Dan Williams, Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On Sun, Mar 30, 2025 at 04:07:56PM +0100, David Woodhouse wrote:
> On Sun, 2025-03-30 at 09:42 -0400, Michael S. Tsirkin wrote:
> > On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
> > > On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
> > > > >
> > > > > I don't mind as such (though I don't understand completely), but since
> > > > > this is changing the device anyway, I am a bit confused why you can't
> > > > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
> > > > > which will DTRT for you, will it not?
> > > >
> > > > That would be necessary but not sufficient. ...
> >
> > could you explain pls?
>
> There was more to that in the previous email which I elided for this
> followup.
>
> https://lore.kernel.org/all/d1382a6ee959f22dc5f6628d8648af77f4702418.camel@infradead.org/
>
> > > My first cut at a proposed spec change looks something like this. I'll
> > > post it to the virtio-comment list once I've done some corporate
> > > bureaucracy and when the list stops sending me python tracebacks in
> > > response to my subscribe request.
> >
> > the linux foundation one does this? maybe poke at the admins.
> >
> > > In the meantime I'll hack up some QEMU and guest Linux driver support
> > > to match.
> > >
> > > diff --git a/content.tex b/content.tex
> > > index c17ffa6..1e6e1d6 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > > Currently these device-independent feature bits are defined:
> > >
> > > \begin{description}
> > > + \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
> > > + provides a memory region which is to be used for bounce buffering,
> > > + rather than permitting direct memory access to system memory.
> > > \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
> > > that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> > > flag set, as described in \ref{sec:Basic Facilities of a Virtio
> > > @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > > VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
> > > addresses to the device.
> > >
> > > +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
> > > +then pass only addresses within the Software IOTLB bounce buffer to the
> > > +device.
> > > +
> > > A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
> > >
> > > A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
> > > @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > > A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
> > > accepted.
> > >
> > > +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
> > > +provide a Software IOTLB bounce buffer.
> > > +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
> > > +
> > > If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> > > buffers in the same order in which they have been available.
> > >
> > > diff --git a/transport-pci.tex b/transport-pci.tex
> > > index a5c6719..23e0d57 100644
> > > --- a/transport-pci.tex
> > > +++ b/transport-pci.tex
> > > @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> > > \item ISR Status
> > > \item Device-specific configuration (optional)
> > > \item PCI configuration access
> > > +\item SWIOTLB bounce buffer
> > > \end{itemize}
> > >
> > > Each structure can be mapped by a Base Address register (BAR) belonging to
> > > @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> > > #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
> > > /* Vendor-specific data */
> > > #define VIRTIO_PCI_CAP_VENDOR_CFG 9
> > > +/* Software IOTLB bounce buffer */
> > > +#define VIRTIO_PCI_CAP_SWIOTLB 10
> > > \end{lstlisting}
> > >
> > > Any other value is reserved for future use.
> > > @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
> > > The driver MUST qualify the \field{vendor_id} before
> > > interpreting or writing into the Vendor data capability.
> > >
> > > +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > > +Software IOTLB bounce buffer capability}
> > > +
> > > +The optional Software IOTLB bounce buffer capability allows the
> > > +device to provide a memory region which can be used by the driver
> > > +for bounce buffering. This allows a device on the PCI
> > > +transport to operate without DMA access to system memory addresses.
> > > +
> > > +The Software IOTLB region is referenced by the
> > > +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
> > > +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
> > > +capability, if negotiated.
> >
> >
> > why not? an optimization?
> > A mix of swiotlb and system memory might be very challenging from POV
> > of ordering.
>
> Conceptually, these addresses are *on* the PCI device. If the device is
> accessing addresses which are local to it, they aren't subject to IOMMU
> translation/filtering because they never even make it to the PCI bus as
> memory transactions.
>
> >
> > > +
> > > +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > > +Software IOTLB bounce buffer capability}
> > > +
> > > +Devices which present the Software IOTLB bounce buffer capability
> > > +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
> > > +
> > > +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > > +Software IOTLB bounce buffer capability}
> > > +
> > > +The driver SHOULD use the offered buffer in preference to passing system
> > > +memory addresses to the device.
> >
> > Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?
>
> That part isn't strictly necessary, but I think it makes sense, for
> cases where the SWIOTLB support is an *optimisation* even if it isn't
> strictly necessary.
>
> Why might it be an "optimisation"? Well... if we're thinking of a model
> like pKVM where the VMM can't just arbitrarily access guest memory,
> using the SWIOTLB is a simple way to avoid that (by using the on-board
> memory instead, which *can* be shared with the VMM).
>
> But if we want to go to extra lengths to support unenlightened guests,
> an implementation might choose to just *disable* the memory protection
> if the guest doesn't negotiate VIRTIO_F_SWIOTLB, instead of breaking
> that guest.
>
> Or it might have a complicated emulation/snooping of virtqueues in the
> trusted part of the hypervisor so that it knows which addresses the
> guest has truly *asked* the VMM to access. (And yes, of course that's
> what an IOMMU is for, but when have you seen hardware companies design
> a two-stage IOMMU which supports actual PCI passthrough *and* get it
> right for the hypervisor to 'snoop' on the stage1 page tables to
> support emulated devices too....)
>
> Ultimately I think it was natural to advertise the location of the
> buffer with the VIRTIO_PCI_CAP_SWIOTLB capability and then to have the
> separate VIRTIO_F_SWIOTLB for negotiation... leaving the obvious
> question of what a device should do if it sees one but *not* the other.
>
> Obviously you can't have VIRTIO_F_SWIOTLB *without* there actually
> being a buffer advertised with VIRTIO_PCI_CAP_SWIOTLB (or its
> equivalent for other transports). But the converse seemed reasonable as
> a *hint* even if the use of the SWIOTLB isn't mandatory.
OK but I feel it's more work than you think, so we really need
a better reason than just "why not".
For example, it's not at all clear to me how the ordering is
going to work if buffers are in memory but the ring is swiotlb
or the reverse. Ordering will all be messed up.
--
MST
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-30 16:59 ` Michael S. Tsirkin
@ 2025-03-30 20:07 ` David Woodhouse
0 siblings, 0 replies; 35+ messages in thread
From: David Woodhouse @ 2025-03-30 20:07 UTC (permalink / raw)
To: linuxppc-dev, Michael S. Tsirkin
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, Nicolas Boichat, Dan Williams,
Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On 30 March 2025 17:59:13 BST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>On Sun, Mar 30, 2025 at 04:07:56PM +0100, David Woodhouse wrote:
>> On Sun, 2025-03-30 at 09:42 -0400, Michael S. Tsirkin wrote:
>> > On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
>> > > On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
>> > > > >
>> > > > > I don't mind as such (though I don't understand completely), but since
>> > > > > this is changing the device anyway, I am a bit confused why you can't
>> > > > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
>> > > > > which will DTRT for you, will it not?
>> > > >
>> > > > That would be necessary but not sufficient. ...
>> >
>> > could you explain pls?
>>
>> There was more to that in the previous email which I elided for this
>> followup.
>>
>> https://lore.kernel.org/all/d1382a6ee959f22dc5f6628d8648af77f4702418.camel@infradead.org/
>>
>> > > My first cut at a proposed spec change looks something like this. I'll
>> > > post it to the virtio-comment list once I've done some corporate
>> > > bureaucracy and when the list stops sending me python tracebacks in
>> > > response to my subscribe request.
>> >
>> > the linux foundation one does this? maybe poke at the admins.
>> >
>> > > In the meantime I'll hack up some QEMU and guest Linux driver support
>> > > to match.
>> > >
>> > > diff --git a/content.tex b/content.tex
>> > > index c17ffa6..1e6e1d6 100644
>> > > --- a/content.tex
>> > > +++ b/content.tex
>> > > @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>> > > Currently these device-independent feature bits are defined:
>> > >
>> > > \begin{description}
>> > > + \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
>> > > + provides a memory region which is to be used for bounce buffering,
>> > > + rather than permitting direct memory access to system memory.
>> > > \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
>> > > that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
>> > > flag set, as described in \ref{sec:Basic Facilities of a Virtio
>> > > @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>> > > VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
>> > > addresses to the device.
>> > >
>> > > +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
>> > > +then pass only addresses within the Software IOTLB bounce buffer to the
>> > > +device.
>> > > +
>> > > A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
>> > >
>> > > A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
>> > > @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>> > > A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
>> > > accepted.
>> > >
>> > > +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
>> > > +provide a Software IOTLB bounce buffer.
>> > > +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
>> > > +
>> > > If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
>> > > buffers in the same order in which they have been available.
>> > >
>> > > diff --git a/transport-pci.tex b/transport-pci.tex
>> > > index a5c6719..23e0d57 100644
>> > > --- a/transport-pci.tex
>> > > +++ b/transport-pci.tex
>> > > @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
>> > > \item ISR Status
>> > > \item Device-specific configuration (optional)
>> > > \item PCI configuration access
>> > > +\item SWIOTLB bounce buffer
>> > > \end{itemize}
>> > >
>> > > Each structure can be mapped by a Base Address register (BAR) belonging to
>> > > @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
>> > > #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
>> > > /* Vendor-specific data */
>> > > #define VIRTIO_PCI_CAP_VENDOR_CFG 9
>> > > +/* Software IOTLB bounce buffer */
>> > > +#define VIRTIO_PCI_CAP_SWIOTLB 10
>> > > \end{lstlisting}
>> > >
>> > > Any other value is reserved for future use.
>> > > @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
>> > > The driver MUST qualify the \field{vendor_id} before
>> > > interpreting or writing into the Vendor data capability.
>> > >
>> > > +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
>> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
>> > > +Software IOTLB bounce buffer capability}
>> > > +
>> > > +The optional Software IOTLB bounce buffer capability allows the
>> > > +device to provide a memory region which can be used by the driver
>> > > +for bounce buffering. This allows a device on the PCI
>> > > +transport to operate without DMA access to system memory addresses.
>> > > +
>> > > +The Software IOTLB region is referenced by the
>> > > +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
>> > > +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
>> > > +capability, if negotiated.
>> >
>> >
>> > why not? an optimization?
>> > A mix of swiotlb and system memory might be very challenging from POV
>> > of ordering.
>>
>> Conceptually, these addresses are *on* the PCI device. If the device is
>> accessing addresses which are local to it, they aren't subject to IOMMU
>> translation/filtering because they never even make it to the PCI bus as
>> memory transactions.
>>
>> >
>> > > +
>> > > +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
>> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
>> > > +Software IOTLB bounce buffer capability}
>> > > +
>> > > +Devices which present the Software IOTLB bounce buffer capability
>> > > +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
>> > > +
>> > > +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
>> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
>> > > +Software IOTLB bounce buffer capability}
>> > > +
>> > > +The driver SHOULD use the offered buffer in preference to passing system
>> > > +memory addresses to the device.
>> >
>> > Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?
>>
>> That part isn't strictly necessary, but I think it makes sense, for
>> cases where the SWIOTLB support is an *optimisation* even if it isn't
>> strictly necessary.
>>
>> Why might it be an "optimisation"? Well... if we're thinking of a model
>> like pKVM where the VMM can't just arbitrarily access guest memory,
>> using the SWIOTLB is a simple way to avoid that (by using the on-board
>> memory instead, which *can* be shared with the VMM).
>>
>> But if we want to go to extra lengths to support unenlightened guests,
>> an implementation might choose to just *disable* the memory protection
>> if the guest doesn't negotiate VIRTIO_F_SWIOTLB, instead of breaking
>> that guest.
>>
>> Or it might have a complicated emulation/snooping of virtqueues in the
>> trusted part of the hypervisor so that it knows which addresses the
>> guest has truly *asked* the VMM to access. (And yes, of course that's
>> what an IOMMU is for, but when have you seen hardware companies design
>> a two-stage IOMMU which supports actual PCI passthrough *and* get it
>> right for the hypervisor to 'snoop' on the stage1 page tables to
>> support emulated devices too....)
>>
>> Ultimately I think it was natural to advertise the location of the
>> buffer with the VIRTIO_PCI_CAP_SWIOTLB capability and then to have the
>> separate VIRTIO_F_SWIOTLB for negotiation... leaving the obvious
>> question of what a device should do if it sees one but *not* the other.
>>
>> Obviously you can't have VIRTIO_F_SWIOTLB *without* there actually
>> being a buffer advertised with VIRTIO_PCI_CAP_SWIOTLB (or its
>> equivalent for other transports). But the converse seemed reasonable as
>> a *hint* even if the use of the SWIOTLB isn't mandatory.
>
>OK but I feel it's more work than you think, so we really need
>a better reason than just "why not".
>
>For example, it's not at all clear to me how the ordering is
>going to work if buffers are in memory but the ring is swiotlb
>or the reverse. Ordering will all be messed up.
Maybe. Although by the time the driver has *observed* the data written to the swiotlb on the device's BAR, it has had to cross the same PCI bus.
But sure, we could require all-or-nothing. Or require that the SWIOTLB only be used if the driver negotiates VIRTIO_F_SWIOTLB.
Even in the latter case we can still allow for SWIOTLB to either be a requirement or a hint, purely down to whether the device *allows* the driver not to negotiate `VIRTIO_F_SWIOTLB`.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-21 18:42 ` David Woodhouse
2025-03-28 17:40 ` David Woodhouse
@ 2025-03-30 17:06 ` Michael S. Tsirkin
2025-03-30 21:27 ` David Woodhouse
1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-03-30 17:06 UTC (permalink / raw)
To: David Woodhouse
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, linuxppc-dev, Nicolas Boichat,
Dan Williams, Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On Fri, Mar 21, 2025 at 06:42:20PM +0000, David Woodhouse wrote:
> On Fri, 2025-03-21 at 14:32 -0400, Michael S. Tsirkin wrote:
> > On Fri, Mar 21, 2025 at 03:38:10PM +0000, David Woodhouse wrote:
> > > On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> > > > This series implements mitigations for lack of DMA access control on
> > > > systems without an IOMMU, which could result in the DMA accessing the
> > > > system memory at unexpected times and/or unexpected addresses, possibly
> > > > leading to data leakage or corruption.
> > >
> > > Replying to an ancient (2021) thread which has already been merged...
> > >
> > > I'd like to be able to use this facility for virtio devices.
> > >
> > > Virtio already has a complicated relationship with the DMA API, because
> > > there were a bunch of early VMM bugs where the virtio devices were
> > > magically exempted from IOMMU protection, but the VMM lied to the guest
> > > and claimed they weren't.
> > >
> > > With the advent of confidential computing, and the VMM (or whatever's
> > > emulating the virtio device) not being *allowed* to arbitrarily access
> > > all of the guest's memory, the DMA API becomes necessary again.
> > >
> > > Either a virtual IOMMU needs to determine which guest memory the VMM
> > > may access, or the DMA API is wrappers around operations which
> > > share/unshare (or unencrypt/encrypt) the memory in question.
> > >
> > > All of which is complicated and slow, if we're looking at a minimal
> > > privileged hypervisor stub like pKVM which enforces the lack of guest
> > > memory access from VMM.
> > >
> > > I'm thinking of defining a new type of virtio-pci device which cannot
> > > do DMA to arbitrary system memory. Instead it has an additional memory
> > > BAR which is used as a SWIOTLB for bounce buffering.
> > >
> > > The driver for it would look much like the existing virtio-pci device
> > > except that it would register the restricted-dma region first (and thus
> > > the swiotlb dma_ops), and then just go through the rest of the setup
> > > like any other virtio device.
> > >
> > > That seems like it ought to be fairly simple, and seems like a
> > > reasonable way to allow an untrusted VMM to provide virtio devices with
> > > restricted DMA access.
> > >
> > > While I start actually doing the typing... does anyone want to start
> > > yelling at me now? Christoph? mst? :)
> >
> >
> > I don't mind as such (though I don't understand completely), but since
> > this is changing the device anyway, I am a bit confused why you can't
> > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
> > which will DTRT for you, will it not?
>
> That would be necessary but not sufficient. The question is *what* does
> the DMA API do?
>
> For a real passthrough PCI device, perhaps we'd have a vIOMMU exposed
> to the guest so that it can do real protection with two-stage page
> tables (IOVA→GPA under control of the guest, GPA→HPA under control of
> the hypervisor). For that to work in the pKVM model though, you'd need
> pKVM to be walking the guest's stage1 I/O page tables to see if a given
> access from the VMM ought to be permitted?
>
> Or for confidential guests there could be DMA ops which are an
> 'enlightenment'; a hypercall into pKVM to share/unshare pages so that
> the VMM can actually access them, or SEV-SNP guests might mark pages
> unencrypted to have the same effect with hardware protection.
>
> Doing any of those dynamically to allow the VMM to access buffers in
> arbitrary guest memory (when it wouldn't normally have access to
> arbitrary guest memory) is complex and doesn't perform very well. And
> exposes a full 4KiB page for any byte that needs to be made available.
>
> Thus the idea of having a fixed range of memory to use for a SWIOTLB,
> which is fairly much what the restricted DMA setup is all about.
>
> We're just proposing that we build it in to a virtio-pci device model,
> which automatically uses the extra memory BAR instead of the
> restricted-dma-pool DT node.
>
> It's basically just allowing us to expose through PCI, what I believe
> we can already do for virtio in DT.
I am not saying I am against this extension.
The idea to restrict DMA has a lot of merit outside pkvm.
For example, with a physical device, limiting its DMA
to a fixed range can be good for security at a cost of
an extra data copy.
So I am not saying we have to block this specific hack.
what worries me fundamentally is I am not sure it works well
e.g. for physical virtio cards.
Attempts to pass data between devices will now also require
extra data copies.
Did you think about adding an swiotlb mode to virtio-iommu at all?
Much easier than parsing page tables.
--
MST
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-30 17:06 ` Michael S. Tsirkin
@ 2025-03-30 21:27 ` David Woodhouse
2025-03-30 21:48 ` Michael S. Tsirkin
0 siblings, 1 reply; 35+ messages in thread
From: David Woodhouse @ 2025-03-30 21:27 UTC (permalink / raw)
To: linuxppc-dev, Michael S. Tsirkin
Cc: Claire Chang, Rob Herring, mpe, Joerg Roedel, Will Deacon,
Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky, jgross,
Christoph Hellwig, Marek Szyprowski, heikki.krogerus, peterz,
benh, grant.likely, paulus, mingo, sstabellini, Saravana Kannan,
xypron.glpk, Rafael J . Wysocki, Bartosz Golaszewski, xen-devel,
Thierry Reding, linux-devicetree, Nicolas Boichat, Dan Williams,
Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On 30 March 2025 18:06:47 BST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> It's basically just allowing us to expose through PCI, what I believe
>> we can already do for virtio in DT.
>
>I am not saying I am against this extension.
>The idea to restrict DMA has a lot of merit outside pkvm.
>For example, with a physical device, limiting its DMA
>to a fixed range can be good for security at a cost of
>an extra data copy.
>
>So I am not saying we have to block this specific hack.
>
>what worries me fundamentally is I am not sure it works well
>e.g. for physical virtio cards.
Not sure why it doesn't work for physical cards. They don't need to be bus-mastering; they just take data from a buffer in their own RAM.
>Attempts to pass data between devices will now also require
>extra data copies.
Yes. I think that's acceptable, but if we really cared we could perhaps extend the capability to refer to a range inside a given BAR on a specific *device*? Or maybe just *function*, and allow sharing of SWIOTLB buffer within a multi-function device?
I think it's overkill though.
>Did you think about adding an swiotlb mode to virtio-iommu at all?
>Much easier than parsing page tables.
Often the guests which need this will have a real IOMMU for the true pass-through devices. Adding a virtio-iommu into the mix (or any other system-wide way of doing something different for certain devices) is problematic.
The on-device buffer keeps it nice and simple, and even allows us to do device support for operating systems like Windows where it's a lot harder to do anything generic in the core OS.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-30 21:27 ` David Woodhouse
@ 2025-03-30 21:48 ` Michael S. Tsirkin
2025-03-31 9:42 ` David Woodhouse
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-03-30 21:48 UTC (permalink / raw)
To: David Woodhouse
Cc: linuxppc-dev, Claire Chang, Rob Herring, mpe, Joerg Roedel,
Will Deacon, Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky,
jgross, Christoph Hellwig, Marek Szyprowski, heikki.krogerus,
peterz, benh, grant.likely, paulus, mingo, sstabellini,
Saravana Kannan, xypron.glpk, Rafael J . Wysocki,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
Nicolas Boichat, Dan Williams, Andy Shevchenko, Greg KH,
Randy Dunlap, lkml, list@263.net:IOMMU DRIVERS, Jim Quinlan,
Robin Murphy, hch, Jason Wang, Xuan Zhuo, Eugenio Pérez,
virtualization, graf
On Sun, Mar 30, 2025 at 10:27:58PM +0100, David Woodhouse wrote:
> On 30 March 2025 18:06:47 BST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> It's basically just allowing us to expose through PCI, what I believe
> >> we can already do for virtio in DT.
> >
> >I am not saying I am against this extension.
> >The idea to restrict DMA has a lot of merit outside pkvm.
> >For example, with a physical device, limiting its DMA
> >to a fixed range can be good for security at a cost of
> >an extra data copy.
> >
> >So I am not saying we have to block this specific hack.
> >
> >what worries me fundamentally is I am not sure it works well
> >e.g. for physical virtio cards.
>
> Not sure why it doesn't work for physical cards. They don't need to be bus-mastering; they just take data from a buffer in their own RAM.
I mean, it kind of does, it is just that CPU pulling data over the PCI bus
stalls it, so it is very expensive. It is not by chance that people switched to
DMA almost exclusively.
> >Attempts to pass data between devices will now also require
> >extra data copies.
>
> Yes. I think that's acceptable, but if we really cared we could perhaps extend the capability to refer to a range inside a given BAR on a specific *device*? Or maybe just *function*, and allow sharing of SWIOTLB buffer within a multi-function device?
Fundamentally, this is what dmabuf does.
> I think it's overkill though.
>
> >Did you think about adding an swiotlb mode to virtio-iommu at all?
> >Much easier than parsing page tables.
>
> Often the guests which need this will have a real IOMMU for the true
> pass-through devices.
Not sure I understand. You mean with things like stage 2 passthrough?
> Adding a virtio-iommu into the mix (or any other
> system-wide way of doing something different for certain devices) is
> problematic.
OK... but the issue isn't specific to no-DMA devices, is it?
> The on-device buffer keeps it nice and simple,
I am not saying it is not.
It's just a little boutique.
> and even allows us to
> do device support for operating systems like Windows where it's a lot
> harder to do anything generic in the core OS.
Well, we do need virtio-iommu Windows support sooner or later, anyway.
--
MST
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Using Restricted DMA for virtio-pci
2025-03-30 21:48 ` Michael S. Tsirkin
@ 2025-03-31 9:42 ` David Woodhouse
0 siblings, 0 replies; 35+ messages in thread
From: David Woodhouse @ 2025-03-31 9:42 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linuxppc-dev, Claire Chang, Rob Herring, mpe, Joerg Roedel,
Will Deacon, Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky,
jgross, Christoph Hellwig, Marek Szyprowski, heikki.krogerus,
peterz, benh, grant.likely, paulus, mingo, sstabellini,
Saravana Kannan, xypron.glpk, Rafael J . Wysocki, xen-devel,
Thierry Reding, linux-devicetree, Nicolas Boichat, Dan Williams,
Andy Shevchenko, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, hch,
Jason Wang, Xuan Zhuo, Eugenio Pérez, virtualization, graf
On Sun, 2025-03-30 at 17:48 -0400, Michael S. Tsirkin wrote:
> On Sun, Mar 30, 2025 at 10:27:58PM +0100, David Woodhouse wrote:
> > On 30 March 2025 18:06:47 BST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > It's basically just allowing us to expose through PCI, what I believe
> > > > we can already do for virtio in DT.
> > >
> > > I am not saying I am against this extension.
> > > The idea to restrict DMA has a lot of merit outside pkvm.
> > > For example, with a physical device, limiting its DMA
> > > to a fixed range can be good for security at the cost of
> > > an extra data copy.
> > >
> > > So I am not saying we have to block this specific hack.
> > >
> > > What worries me fundamentally is that I am not sure it works well,
> > > e.g. for physical virtio cards.
> >
> > Not sure why it doesn't work for physical cards. They don't need to
> > be bus-mastering; they just take data from a buffer in their own
> > RAM.
>
> I mean, it kind of does; it is just that the CPU pulling data over the
> PCI bus stalls it, so it is very expensive. It is not by chance that
> people switched to DMA almost exclusively.
Yes. For a physical implementation it would not be the highest-
performance option... unless DMA is somehow blocked, as it is in the
pKVM+virt case.
In the case of a virtual implementation, however, the performance is
not an issue because it'll be backed by host memory anyway. (It's just
that because it's presented to the guest and the trusted part of the
hypervisor as PCI BAR space instead of main memory, it's a whole lot
more practical to deal with the fact that it's *shared* with the VMM.)
> > > Attempts to pass data between devices will now also require
> > > extra data copies.
> >
> > Yes. I think that's acceptable, but if we really cared we could
> > perhaps extend the capability to refer to a range inside a given
> > BAR on a specific *device*? Or maybe just *function*, and allow
> > sharing of SWIOTLB buffer within a multi-function device?
>
> Fundamentally, this is what dmabuf does.
In software, yes. Extending it to hardware is a little harder.
In principle, it might be quite nice to offer a single SWIOTLB buffer
region (in a BAR of one device) and have multiple virtio devices share
it. Not just because of passing data between devices, as you mentioned,
but also because it'll be a more efficient use of memory than each
device having its own buffer and allocation pool.
So how would a device indicate that it can use a SWIOTLB buffer which
is in a BAR of a *different* device?
Not by physical address, because BARs get moved around.
Not even by PCI bus/dev/fn/BAR# because *buses* get renumbered.
You could limit it to sharing within one PCI "bus", and use just
dev/fn/BAR#? Or even within one PCI device and just fn/BAR#? The latter
could theoretically be usable by multi-function physical devices.
The standard struct virtio_pci_cap (which I used for
VIRTIO_PCI_CAP_SWIOTLB) just contains BAR and offset/length. We could
extend it with device + function, using -1 for 'self', to allow for
such sharing?
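Something like the sketch below, purely for illustration; the struct
name, the devfn field and its 0xffff ('self') encoding are made up for
this mail, nothing of the sort exists in the published spec:

#include <linux/virtio_pci.h>

struct virtio_pci_swiotlb_cap {
	struct virtio_pci_cap cap;	/* standard cap: cfg_type, bar,
					 * offset, length, ... */
	__le16 devfn;			/* device+function owning the BAR;
					 * 0xffff (i.e. -1) means 'this
					 * function' */
	__u8 padding[2];		/* keep the structure dword-aligned */
};

Resolving devfn only within the same bus (or only within the same
device) would sidestep the bus-renumbering problem, at least for the
multi-function case.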
Still not convinced it isn't overkill, but it's certainly easy enough
to add on the *spec* side. I haven't yet looked at how that sharing
would work in Linux on the guest side; thus far what I'm proposing is
intended to be almost identical to the per-device thing that should
already work with a `restricted-dma-pool` node in device-tree.
> > I think it's overkill though.
> >
> > > Did you think about adding an swiotlb mode to virtio-iommu at all?
> > > Much easier than parsing page tables.
> >
> > Often the guests which need this will have a real IOMMU for the true
> > pass-through devices.
>
> Not sure I understand. You mean with things like stage 2 passthrough?
Yes. AMD's latest IOMMU spec documents it, for example. Exposing a
'vIOMMU' to the guest which handles just stage 1 (IOVA→GPA) while the
hypervisor controls the normal GPA→HPA translation in stage 2.
Then the guest gets an accelerated path *directly* to the hardware for
its IOTLB flushes... which means the hypervisor doesn't get to *see*
those IOTLB flushes so it's a PITA to do device emulation as if it's
covered by that same IOMMU.
(Actually I haven't checked the AMD one in detail for that flaw; most
*other* 2-stage IOMMUs I've seen do have it, and I *bet* AMD does too).
> > Adding a virtio-iommu into the mix (or any other
> > system-wide way of doing something different for certain devices) is
> > problematic.
>
> OK... but the issue isn't specific to no-DMA devices, is it?
Hm? Allowing virtio devices to operate as "no-DMA devices" is a
*workaround* for the issue.
The issue is that the VMM may not have full access to the guest's
memory for emulating devices. These days, virtio covers a large
proportion of emulated devices.
So I do think the issue is fairly specific to virtio devices, and
suspect that's what you meant to type above?
We pondered teaching the trusted part of the hypervisor (e.g. pKVM) to
snoop on virtqueues enough to 'know' which memory the VMM was genuinely
being *invited* to read/write... and we ran away screaming. (In order
to have sufficient trust, you end up not just snooping but implementing
quite a lot of the emulation on the trusted side. And then complex
enlightenments in the VMM and the untrusted Linux/KVM which hosts it,
to interact with that.)
Then we realised that for existing DT guests it's trivial just to add
the `restricted-dma-pool` node. And wanted to do the same for the
guests who are afflicted with UEFI/ACPI too. So here we are, trying to
add the same capability to virtio-pci.
> > The on-device buffer keeps it nice and simple,
>
> I am not saying it is not.
> It's just a little boutique.
Fair. Although with the advent of confidential computing and
restrictions on guest memory access, perhaps becoming less boutique
over time?
And it should also be fairly low-friction; it's a whole lot cleaner in
the spec than the awful VIRTIO_F_ACCESS_PLATFORM legacy, and even in
the Linux guest driver it should work fairly simply given the existing
restricted-dma support (although of course that shouldn't entirely be
our guiding motivation).
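On the guest side I'd expect the probe to look roughly like the sketch
below. The VIRTIO_PCI_CAP_SWIOTLB cfg_type value and the helper name
are placeholders, and the hookup into the per-device bounce-buffer pool
(the part restricted-dma-pool already provides for DT) is deliberately
not shown:

#include <linux/pci.h>
#include <linux/stddef.h>
#include <linux/virtio_pci.h>

#define VIRTIO_PCI_CAP_SWIOTLB	0x0a	/* placeholder; not an allocated cfg_type */

/*
 * Walk the vendor-specific capability list and return the bus address
 * and size of the on-device bounce buffer, if the device offers one.
 */
static int virtio_pci_find_swiotlb(struct pci_dev *pdev,
				   phys_addr_t *base, size_t *size)
{
	u8 pos;

	for (pos = pci_find_capability(pdev, PCI_CAP_ID_VNDR); pos;
	     pos = pci_find_next_capability(pdev, pos, PCI_CAP_ID_VNDR)) {
		u8 cfg_type, bar;
		u32 offset, length;

		pci_read_config_byte(pdev,
			pos + offsetof(struct virtio_pci_cap, cfg_type),
			&cfg_type);
		if (cfg_type != VIRTIO_PCI_CAP_SWIOTLB)
			continue;

		pci_read_config_byte(pdev,
			pos + offsetof(struct virtio_pci_cap, bar), &bar);
		pci_read_config_dword(pdev,
			pos + offsetof(struct virtio_pci_cap, offset), &offset);
		pci_read_config_dword(pdev,
			pos + offsetof(struct virtio_pci_cap, length), &length);

		/* The bounce buffer lives in the device's own BAR space. */
		*base = pci_resource_start(pdev, bar) + offset;
		*size = length;
		return 0;
	}

	return -ENODEV;
}

The range it returns would then need feeding into the same per-device
bounce-buffer machinery that the DT restricted-dma-pool path sets up
today; that plumbing isn't shown here.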
> > and even allows us to
> > do device support for operating systems like Windows where it's a lot
> > harder to do anything generic in the core OS.
>
> Well, we do need virtio-iommu Windows support sooner or later, anyway.
Heh, good luck with that :)
And actually, doesn't that only support *DMA* remapping? So you still
wouldn't be able to boot a Windows guest with >255 vCPUs without some
further enlightenment (like Windows guests finally supporting the 15-
bit MSI extension that even Hyper-V supports on the host side...)
^ permalink raw reply [flat|nested] 35+ messages in thread