* [PATCH v1 00/16] dma-mapping: migrate to physical address-based API
@ 2025-08-04 12:42 Leon Romanovsky
  2025-08-04 12:42 ` [PATCH v1 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
                   ` (16 more replies)
  0 siblings, 17 replies; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

Changelog:
v1:
 * Added new DMA_ATTR_MMIO attribute to indicate the
   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
 * Rewrote dma_map_* functions to use this new attribute.
v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
------------------------------------------------------------------------

This series refactors the DMA mapping API to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality, where
DMA operations work with physical addresses, not page structures.

The series maintains export symbol backward compatibility by keeping
the old page-based API as wrapper functions around the new physical
address-based implementations.
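
For illustration, the wrapper approach can be sketched as follows (a
minimal sketch only: it assumes the dma_map_phys() interface that is
exported later in this series, so the exact name and signature may
differ):

    /* Sketch: old page-based entry point kept as a thin wrapper. */
    dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
                    size_t offset, size_t size, enum dma_data_direction dir,
                    unsigned long attrs)
    {
            return dma_map_phys(dev, page_to_phys(page) + offset, size,
                                dir, attrs);
    }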

Thanks

Leon Romanovsky (16):
  dma-mapping: introduce new DMA attribute to indicate MMIO memory
  iommu/dma: handle MMIO path in dma_iova_link
  dma-debug: refactor to use physical addresses for page mapping
  dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
  dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  kmsan: convert kmsan_handle_dma to use physical addresses
  dma-mapping: handle MMIO flow in dma_map|unmap_page
  xen: swiotlb: Open code map_resource callback
  dma-mapping: export new dma_*map_phys() interface
  mm/hmm: migrate to physical address-based DMA mapping API
  mm/hmm: properly take MMIO path
  block-dma: migrate to dma_map_phys instead of map_page
  block-dma: properly take MMIO path
  nvme-pci: unmap MMIO pages with appropriate interface

 Documentation/core-api/dma-api.rst        |   4 +-
 Documentation/core-api/dma-attributes.rst |   7 ++
 arch/powerpc/kernel/dma-iommu.c           |   4 +-
 block/blk-mq-dma.c                        |  15 ++-
 drivers/iommu/dma-iommu.c                 |  69 +++++++------
 drivers/nvme/host/pci.c                   |  18 +++-
 drivers/virtio/virtio_ring.c              |   4 +-
 drivers/xen/swiotlb-xen.c                 |  21 +++-
 include/linux/blk-mq-dma.h                |   6 +-
 include/linux/blk_types.h                 |   2 +
 include/linux/dma-direct.h                |   2 -
 include/linux/dma-map-ops.h               |   8 +-
 include/linux/dma-mapping.h               |  27 +++++
 include/linux/iommu-dma.h                 |  11 +--
 include/linux/kmsan.h                     |  12 ++-
 include/trace/events/dma.h                |   9 +-
 kernel/dma/debug.c                        |  71 ++++---------
 kernel/dma/debug.h                        |  37 ++-----
 kernel/dma/direct.c                       |  22 +----
 kernel/dma/direct.h                       |  50 ++++++----
 kernel/dma/mapping.c                      | 115 +++++++++++++---------
 kernel/dma/ops_helpers.c                  |   6 +-
 mm/hmm.c                                  |  19 ++--
 mm/kmsan/hooks.c                          |  36 +++++--
 rust/kernel/dma.rs                        |   3 +
 tools/virtio/linux/kmsan.h                |   2 +-
 26 files changed, 320 insertions(+), 260 deletions(-)

-- 
2.50.1



* [PATCH v1 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-06 17:31   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 02/16] iommu/dma: handle MMIO path in dma_iova_link Leon Romanovsky
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
that reside in memory-mapped I/O (MMIO) regions, such as device BARs
exposed through the host bridge, which are accessible for peer-to-peer
(P2P) DMA.

This attribute is especially useful for exporting device memory to other
devices for DMA without CPU involvement, and avoids unnecessary or
potentially detrimental CPU cache maintenance calls.
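
As an illustration only (this assumes the dma_map_phys() interface added
later in this series; bar_phys and len are hypothetical), a P2P exporter
could map a peer device's BAR like this:

    /* Sketch: map an MMIO BAR for DMA, no CPU cache maintenance. */
    dma_addr_t dma = dma_map_phys(dev, bar_phys, len, DMA_TO_DEVICE,
                                  DMA_ATTR_MMIO);
    if (dma_mapping_error(dev, dma))
            return -EIO;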

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/core-api/dma-attributes.rst |  7 +++++++
 include/linux/dma-mapping.h               | 14 ++++++++++++++
 include/trace/events/dma.h                |  3 ++-
 rust/kernel/dma.rs                        |  3 +++
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e926..91acd2684e506 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,10 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
 subsystem that the buffer is fully accessible at the elevated privilege
 level (and ideally inaccessible or at least read-only at the
 lesser-privileged levels).
+
+DMA_ATTR_MMIO
+-------------
+
+This attribute is especially useful for exporting device memory to other
+devices for DMA without CPU involvement, and avoids unnecessary or
+potentially detrimental CPU cache maintenance calls.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb3..afc89835c7457 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -58,6 +58,20 @@
  */
 #define DMA_ATTR_PRIVILEGED		(1UL << 9)
 
+/*
+ * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
+ *
+ * This attribute is used for MMIO memory regions that are exposed through
+ * the host bridge and are accessible for peer-to-peer (P2P) DMA. Memory
+ * marked with this attribute is not system RAM and may represent device
+ * BAR windows or peer-exposed memory.
+ *
+ * Typical usage is for mapping hardware memory BARs or exporting device
+ * memory to other devices for DMA without involving main system RAM.
+ * The attribute guarantees no CPU cache maintenance calls will be made.
+ */
+#define DMA_ATTR_MMIO		(1UL << 10)
+
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
  * be given to a device to use as a DMA source or target.  It is specific to a
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c8..ee90d6f1dcf35 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -31,7 +31,8 @@ TRACE_DEFINE_ENUM(DMA_NONE);
 		{ DMA_ATTR_FORCE_CONTIGUOUS, "FORCE_CONTIGUOUS" }, \
 		{ DMA_ATTR_ALLOC_SINGLE_PAGES, "ALLOC_SINGLE_PAGES" }, \
 		{ DMA_ATTR_NO_WARN, "NO_WARN" }, \
-		{ DMA_ATTR_PRIVILEGED, "PRIVILEGED" })
+		{ DMA_ATTR_PRIVILEGED, "PRIVILEGED" }, \
+		{ DMA_ATTR_MMIO, "MMIO" })
 
 DECLARE_EVENT_CLASS(dma_map,
 	TP_PROTO(struct device *dev, phys_addr_t phys_addr, dma_addr_t dma_addr,
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index 2bc8ab51ec280..61d9eed7a786e 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -242,6 +242,9 @@ pub mod attrs {
     /// Indicates that the buffer is fully accessible at an elevated privilege level (and
     /// ideally inaccessible or at least read-only at lesser-privileged levels).
     pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
+
+    /// Indicates that the buffer is MMIO memory.
+    pub const DMA_ATTR_MMIO: Attrs = Attrs(bindings::DMA_ATTR_MMIO);
 }
 
 /// An abstraction of the `dma_alloc_coherent` API.
-- 
2.50.1



* [PATCH v1 02/16] iommu/dma: handle MMIO path in dma_iova_link
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
  2025-08-04 12:42 ` [PATCH v1 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-06 18:10   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Make sure that the CPU cache is not synced when the MMIO path is taken.
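
For reference, a caller of the IOVA link API would request this behaviour
by passing the attribute at link time, roughly as below (sketch only;
error handling and the surrounding dma_iova_try_alloc() setup are
omitted, and bar_phys is a hypothetical MMIO physical address):

    /* Sketch: link an MMIO range; no arch_sync_dma_for_device() here. */
    ret = dma_iova_link(dev, &state, bar_phys, offset, len,
                        DMA_BIDIRECTIONAL, DMA_ATTR_MMIO);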

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fef..399838c17b705 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1837,13 +1837,20 @@ static int __dma_iova_link(struct device *dev, dma_addr_t addr,
 		phys_addr_t phys, size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
 {
-	bool coherent = dev_is_dma_coherent(dev);
+	int prot;
 
-	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		arch_sync_dma_for_device(phys, size, dir);
+	if (attrs & DMA_ATTR_MMIO)
+		prot = dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO;
+	else {
+		bool coherent = dev_is_dma_coherent(dev);
+
+		if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+			arch_sync_dma_for_device(phys, size, dir);
+		prot = dma_info_to_prot(dir, coherent, attrs);
+	}
 
 	return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
-			dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC);
+			prot, GFP_ATOMIC);
 }
 
 static int iommu_dma_iova_bounce_and_link(struct device *dev, dma_addr_t addr,
@@ -1949,9 +1956,13 @@ int dma_iova_link(struct device *dev, struct dma_iova_state *state,
 		return -EIO;
 
 	if (dev_use_swiotlb(dev, size, dir) &&
-	    iova_unaligned(iovad, phys, size))
+	    iova_unaligned(iovad, phys, size)) {
+		if (attrs & DMA_ATTR_MMIO)
+			return -EPERM;
+
 		return iommu_dma_iova_link_swiotlb(dev, state, phys, offset,
 				size, dir, attrs);
+	}
 
 	return __dma_iova_link(dev, state->addr + offset - iova_start_pad,
 			phys - iova_start_pad,
-- 
2.50.1



* [PATCH v1 03/16] dma-debug: refactor to use physical addresses for page mapping
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
  2025-08-04 12:42 ` [PATCH v1 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
  2025-08-04 12:42 ` [PATCH v1 02/16] iommu/dma: handle MMIO path in dma_iova_link Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-06 18:26   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert the DMA debug infrastructure from page-based to physical
address-based mapping, as a preparation for DMA mapping routines that
take physical addresses directly.

The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy type is introduced to distinguish physical address mappings
from other debug entry types. All callers throughout the codebase are updated
to pass physical addresses directly.

Dropping the page-to-physical conversion from the debug layer makes the
code more efficient and consistent with the DMA mapping API's move to
physical addresses.
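
The caller-side conversion is mechanical, as the dma_map_page_attrs()
hunk below shows; for example:

    /* Before */
    debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
    /* After */
    debug_dma_map_phys(dev, page_to_phys(page) + offset, size, dir,
                       addr, attrs);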

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/core-api/dma-api.rst |  4 ++--
 kernel/dma/debug.c                 | 28 +++++++++++++++++-----------
 kernel/dma/debug.h                 | 16 +++++++---------
 kernel/dma/mapping.c               | 15 ++++++++-------
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 3087bea715ed2..ca75b35416792 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -761,7 +761,7 @@ example warning message may look like this::
 	[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
 	[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
 	[<ffffffff803c7ea3>] check_unmap+0x203/0x490
-	[<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+	[<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
 	[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
 	[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
 	[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@@ -855,7 +855,7 @@ that a driver may be leaking mappings.
 dma-debug interface debug_dma_mapping_error() to debug drivers that fail
 to check DMA mapping errors on addresses returned by dma_map_single() and
 dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
 the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
 this flag is still set, prints warning message that includes call trace that
 leads up to the unmap. This interface can be called from dma_mapping_error()
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index e43c6de2bce4e..da6734e3a4ce9 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
 	dma_debug_sg,
 	dma_debug_coherent,
 	dma_debug_resource,
+	dma_debug_phy,
 };
 
 enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
 	[dma_debug_sg] = "scatter-gather",
 	[dma_debug_coherent] = "coherent",
 	[dma_debug_resource] = "resource",
+	[dma_debug_phy] = "phy",
 };
 
 static const char *dir2name[] = {
@@ -1201,9 +1203,8 @@ void debug_dma_map_single(struct device *dev, const void *addr,
 }
 EXPORT_SYMBOL(debug_dma_map_single);
 
-void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
-			size_t size, int direction, dma_addr_t dma_addr,
-			unsigned long attrs)
+void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		int direction, dma_addr_t dma_addr, unsigned long attrs)
 {
 	struct dma_debug_entry *entry;
 
@@ -1218,19 +1219,24 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 		return;
 
 	entry->dev       = dev;
-	entry->type      = dma_debug_single;
-	entry->paddr	 = page_to_phys(page) + offset;
+	entry->type      = dma_debug_phy;
+	entry->paddr	 = phys;
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
 	entry->map_err_type = MAP_ERR_NOT_CHECKED;
 
-	check_for_stack(dev, page, offset);
+	if (!(attrs & DMA_ATTR_MMIO)) {
+		struct page *page = phys_to_page(phys);
+		size_t offset = offset_in_page(phys);
 
-	if (!PageHighMem(page)) {
-		void *addr = page_address(page) + offset;
+		check_for_stack(dev, page, offset);
 
-		check_for_illegal_area(dev, addr, size);
+		if (!PageHighMem(page)) {
+			void *addr = page_address(page) + offset;
+
+			check_for_illegal_area(dev, addr, size);
+		}
 	}
 
 	add_dma_entry(entry, attrs);
@@ -1274,11 +1280,11 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 }
 EXPORT_SYMBOL(debug_dma_mapping_error);
 
-void debug_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+void debug_dma_unmap_phys(struct device *dev, dma_addr_t dma_addr,
 			  size_t size, int direction)
 {
 	struct dma_debug_entry ref = {
-		.type           = dma_debug_single,
+		.type           = dma_debug_phy,
 		.dev            = dev,
 		.dev_addr       = dma_addr,
 		.size           = size,
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae6..76adb42bffd5f 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -9,12 +9,11 @@
 #define _KERNEL_DMA_DEBUG_H
 
 #ifdef CONFIG_DMA_API_DEBUG
-extern void debug_dma_map_page(struct device *dev, struct page *page,
-			       size_t offset, size_t size,
-			       int direction, dma_addr_t dma_addr,
+extern void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+			       size_t size, int direction, dma_addr_t dma_addr,
 			       unsigned long attrs);
 
-extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+extern void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction);
 
 extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
@@ -55,14 +54,13 @@ extern void debug_dma_sync_sg_for_device(struct device *dev,
 					 struct scatterlist *sg,
 					 int nelems, int direction);
 #else /* CONFIG_DMA_API_DEBUG */
-static inline void debug_dma_map_page(struct device *dev, struct page *page,
-				      size_t offset, size_t size,
-				      int direction, dma_addr_t dma_addr,
-				      unsigned long attrs)
+static inline void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+				      size_t size, int direction,
+				      dma_addr_t dma_addr, unsigned long attrs)
 {
 }
 
-static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
 					size_t size, int direction)
 {
 }
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 107e4a4d251df..4c1dfbabb8ae5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -157,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
+	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
@@ -165,16 +166,15 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
+	    arch_dma_map_page_direct(dev, phys + size))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
-	trace_dma_map_page(dev, page_to_phys(page) + offset, addr, size, dir,
-			   attrs);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
+	trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
 	return addr;
 }
@@ -194,7 +194,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_page(dev, addr, size, dir, attrs);
-	debug_dma_unmap_page(dev, addr, size, dir);
+	debug_dma_unmap_phys(dev, addr, size, dir);
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
@@ -712,7 +712,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 	if (page) {
 		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
 				      size, dir, gfp, 0);
-		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+		debug_dma_map_phys(dev, page_to_phys(page), size, dir,
+				   *dma_handle, 0);
 	} else {
 		trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
 	}
@@ -738,7 +739,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 	trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
-	debug_dma_unmap_page(dev, dma_handle, size, dir);
+	debug_dma_unmap_phys(dev, dma_handle, size, dir);
 	__dma_free_pages(dev, size, page, dma_handle, dir);
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
-- 
2.50.1



* [PATCH v1 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (2 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-04 12:42 ` [PATCH v1 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

As a preparation for following map_page -> map_phys API conversion,
let's rename trace_dma_*map_page() to be trace_dma_*map_phys().

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/trace/events/dma.h | 4 ++--
 kernel/dma/mapping.c       | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index ee90d6f1dcf35..84416c7d6bfaa 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -72,7 +72,7 @@ DEFINE_EVENT(dma_map, name, \
 		 size_t size, enum dma_data_direction dir, unsigned long attrs), \
 	TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))
 
-DEFINE_MAP_EVENT(dma_map_page);
+DEFINE_MAP_EVENT(dma_map_phys);
 DEFINE_MAP_EVENT(dma_map_resource);
 
 DECLARE_EVENT_CLASS(dma_unmap,
@@ -110,7 +110,7 @@ DEFINE_EVENT(dma_unmap, name, \
 		 enum dma_data_direction dir, unsigned long attrs), \
 	TP_ARGS(dev, addr, size, dir, attrs))
 
-DEFINE_UNMAP_EVENT(dma_unmap_page);
+DEFINE_UNMAP_EVENT(dma_unmap_phys);
 DEFINE_UNMAP_EVENT(dma_unmap_resource);
 
 DECLARE_EVENT_CLASS(dma_alloc_class,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 4c1dfbabb8ae5..fe1f0da6dc507 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -173,7 +173,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
-	trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
 	return addr;
@@ -193,7 +193,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		iommu_dma_unmap_page(dev, addr, size, dir, attrs);
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
-	trace_dma_unmap_page(dev, addr, size, dir, attrs);
+	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
-- 
2.50.1



* [PATCH v1 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (3 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-06 18:44   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.

The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.

All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.
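
The scatterlist conversion relies on the identity
sg_phys(s) == page_to_phys(sg_page(s)) + s->offset, so the per-entry
mapping call simply becomes:

    /* Before */
    sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s), s->offset,
                                           s->length, dir, attrs);
    /* After */
    sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s), s->length,
                                           dir, attrs);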

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 14 ++++++--------
 include/linux/iommu-dma.h |  7 +++----
 kernel/dma/mapping.c      |  4 ++--
 kernel/dma/ops_helpers.c  |  6 +++---
 4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 399838c17b705..11c5d5f8c0981 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1190,11 +1190,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
 	return iova_offset(iovad, phys | size);
 }
 
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-	      unsigned long offset, size_t size, enum dma_data_direction dir,
-	      unsigned long attrs)
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs)
 {
-	phys_addr_t phys = page_to_phys(page) + offset;
 	bool coherent = dev_is_dma_coherent(dev);
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1222,7 +1220,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	return iova;
 }
 
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1341,7 +1339,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		iommu_dma_unmap_page(dev, sg_dma_address(s),
+		iommu_dma_unmap_phys(dev, sg_dma_address(s),
 				sg_dma_len(s), dir, attrs);
 }
 
@@ -1354,8 +1352,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
 	sg_dma_mark_swiotlb(sg);
 
 	for_each_sg(sg, s, nents, i) {
-		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
-				s->offset, s->length, dir, attrs);
+		sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s),
+				s->length, dir, attrs);
 		if (sg_dma_address(s) == DMA_MAPPING_ERROR)
 			goto out_unmap;
 		sg_dma_len(s) = s->length;
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 508beaa44c39e..485bdffed9888 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -21,10 +21,9 @@ static inline bool use_dma_iommu(struct device *dev)
 }
 #endif /* CONFIG_IOMMU_DMA */
 
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir,
-		unsigned long attrs);
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs);
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction dir, unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fe1f0da6dc507..58482536db9bb 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	    arch_dma_map_page_direct(dev, phys + size))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else if (use_dma_iommu(dev))
-		addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
+		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
@@ -190,7 +190,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	    arch_dma_unmap_page_direct(dev, addr + size))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
-		iommu_dma_unmap_page(dev, addr, size, dir, attrs);
+		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 9afd569eadb96..6f9d604d9d406 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -72,8 +72,8 @@ struct page *dma_common_alloc_pages(struct device *dev, size_t size,
 		return NULL;
 
 	if (use_dma_iommu(dev))
-		*dma_handle = iommu_dma_map_page(dev, page, 0, size, dir,
-						 DMA_ATTR_SKIP_CPU_SYNC);
+		*dma_handle = iommu_dma_map_phys(dev, page_to_phys(page), size,
+						 dir, DMA_ATTR_SKIP_CPU_SYNC);
 	else
 		*dma_handle = ops->map_page(dev, page, 0, size, dir,
 					    DMA_ATTR_SKIP_CPU_SYNC);
@@ -92,7 +92,7 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	if (use_dma_iommu(dev))
-		iommu_dma_unmap_page(dev, dma_handle, size, dir,
+		iommu_dma_unmap_phys(dev, dma_handle, size, dir,
 				     DMA_ATTR_SKIP_CPU_SYNC);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, dma_handle, size, dir,
-- 
2.50.1



* [PATCH v1 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (4 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 12:07   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Combine the iommu_dma_*map_phys and iommu_dma_*map_resource interfaces
in order to allow a single phys_addr_t flow.
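
In effect, mapping a resource becomes a special case of the phys mapping;
roughly (a behavioural sketch, not the literal code):

    dma = iommu_dma_map_resource(dev, phys, size, dir, attrs);
    /* ... is expected to behave the same as ... */
    dma = iommu_dma_map_phys(dev, phys, size, dir, attrs | DMA_ATTR_MMIO);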

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 11c5d5f8c0981..0a19ce50938b3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1193,12 +1193,17 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
 dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	bool coherent = dev_is_dma_coherent(dev);
-	int prot = dma_info_to_prot(dir, coherent, attrs);
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	dma_addr_t iova, dma_mask = dma_get_mask(dev);
+	bool coherent;
+	int prot;
+
+	if (attrs & DMA_ATTR_MMIO)
+		return __iommu_dma_map(dev, phys, size,
+				dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+				dma_get_mask(dev));
 
 	/*
 	 * If both the physical buffer start address and size are page aligned,
@@ -1211,6 +1216,9 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
 			return DMA_MAPPING_ERROR;
 	}
 
+	coherent = dev_is_dma_coherent(dev);
+	prot = dma_info_to_prot(dir, coherent, attrs);
+
 	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		arch_sync_dma_for_device(phys, size, dir);
 
@@ -1223,10 +1231,14 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
 void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	phys_addr_t phys;
 
-	phys = iommu_iova_to_phys(domain, dma_handle);
+	if (attrs & DMA_ATTR_MMIO) {
+		__iommu_dma_unmap(dev, dma_handle, size);
+		return;
+	}
+
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
 	if (WARN_ON(!phys))
 		return;
 
-- 
2.50.1



* [PATCH v1 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (5 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 12:13   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource(), and
dma_direct_map_phys() is extended to support the MMIO path as well.
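
For the DMA_ATTR_MMIO case the direct path reduces to a range check and
returns the physical address itself as the bus address; a behavioural
summary (not the literal kernel code):

    if (attrs & DMA_ATTR_MMIO) {
            /* bus address == physical address: no swiotlb bounce,
             * no CPU cache maintenance, and unmap is a no-op.
             */
            if (unlikely(!dma_capable(dev, phys, size, false)))
                    return DMA_MAPPING_ERROR;
            return phys;
    }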

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 arch/powerpc/kernel/dma-iommu.c |  4 +--
 include/linux/dma-map-ops.h     |  8 +++---
 kernel/dma/direct.c             |  6 ++--
 kernel/dma/direct.h             | 50 ++++++++++++++++++++-------------
 kernel/dma/mapping.c            |  8 +++---
 5 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 4d64a5db50f38..0359ab72cd3ba 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,7 +14,7 @@
 #define can_map_direct(dev, addr) \
 	((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))
 
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr)
 {
 	if (likely(!dev->bus_dma_limit))
 		return false;
@@ -24,7 +24,7 @@ bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
 
 #define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)
 
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle)
 {
 	if (likely(!dev->bus_dma_limit))
 		return false;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f48e5fb88bd5d..71f5b30254159 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -392,15 +392,15 @@ void *arch_dma_set_uncached(void *addr, size_t size);
 void arch_dma_clear_uncached(void *addr, size_t size);
 
 #ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle);
 bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
 		int nents);
 bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
 		int nents);
 #else
-#define arch_dma_map_page_direct(d, a)		(false)
-#define arch_dma_unmap_page_direct(d, a)	(false)
+#define arch_dma_map_phys_direct(d, a)		(false)
+#define arch_dma_unmap_phys_direct(d, a)	(false)
 #define arch_dma_map_sg_direct(d, s, n)		(false)
 #define arch_dma_unmap_sg_direct(d, s, n)	(false)
 #endif
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 24c359d9c8799..fa75e30700730 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -453,7 +453,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		if (sg_dma_is_bus_address(sg))
 			sg_dma_unmark_bus_address(sg);
 		else
-			dma_direct_unmap_page(dev, sg->dma_address,
+			dma_direct_unmap_phys(dev, sg->dma_address,
 					      sg_dma_len(sg), dir, attrs);
 	}
 }
@@ -476,8 +476,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 			 */
 			break;
 		case PCI_P2PDMA_MAP_NONE:
-			sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
-					sg->offset, sg->length, dir, attrs);
+			sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+					sg->length, dir, attrs);
 			if (sg->dma_address == DMA_MAPPING_ERROR) {
 				ret = -EIO;
 				goto out_unmap;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index d2c0b7e632fc0..2b442efc9b5a7 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -80,42 +80,54 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 		arch_dma_mark_clean(paddr, size);
 }
 
-static inline dma_addr_t dma_direct_map_page(struct device *dev,
-		struct page *page, unsigned long offset, size_t size,
-		enum dma_data_direction dir, unsigned long attrs)
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+		phys_addr_t phys, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
 {
-	phys_addr_t phys = page_to_phys(page) + offset;
-	dma_addr_t dma_addr = phys_to_dma(dev, phys);
+	bool is_mmio = attrs & DMA_ATTR_MMIO;
+	dma_addr_t dma_addr;
+	bool capable;
+
+	dma_addr = (is_mmio) ? phys : phys_to_dma(dev, phys);
+	capable = dma_capable(dev, dma_addr, size, is_mmio);
+	if (is_mmio) {
+	       if (unlikely(!capable))
+		       goto err_overflow;
+	       return dma_addr;
+	}
 
-	if (is_swiotlb_force_bounce(dev)) {
-		if (is_pci_p2pdma_page(page))
-			return DMA_MAPPING_ERROR;
+	if (is_swiotlb_force_bounce(dev))
 		return swiotlb_map(dev, phys, size, dir, attrs);
-	}
 
-	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
-	    dma_kmalloc_needs_bounce(dev, size, dir)) {
-		if (is_pci_p2pdma_page(page))
-			return DMA_MAPPING_ERROR;
+	if (unlikely(!capable) || dma_kmalloc_needs_bounce(dev, size, dir)) {
 		if (is_swiotlb_active(dev))
 			return swiotlb_map(dev, phys, size, dir, attrs);
 
-		dev_WARN_ONCE(dev, 1,
-			     "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
-			     &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
-		return DMA_MAPPING_ERROR;
+		goto err_overflow;
 	}
 
 	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		arch_sync_dma_for_device(phys, size, dir);
 	return dma_addr;
+
+err_overflow:
+	dev_WARN_ONCE(
+		dev, 1,
+		"DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+		&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+	return DMA_MAPPING_ERROR;
 }
 
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	phys_addr_t phys = dma_to_phys(dev, addr);
+	phys_addr_t phys;
+
+	if (attrs & DMA_ATTR_MMIO)
+		/* nothing to do: uncached and no swiotlb */
+		return;
 
+	phys = dma_to_phys(dev, addr);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58482536db9bb..80481a873340a 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -166,8 +166,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_page_direct(dev, phys + size))
-		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+	    arch_dma_map_phys_direct(dev, phys + size))
+		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
@@ -187,8 +187,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 
 	BUG_ON(!valid_dma_direction(dir));
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_unmap_page_direct(dev, addr + size))
-		dma_direct_unmap_page(dev, addr, size, dir, attrs);
+	    arch_dma_unmap_phys_direct(dev, addr + size))
+		dma_direct_unmap_phys(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
 	else
-- 
2.50.1



* [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (6 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 12:21   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert the KMSAN DMA handling function from page-based to physical
address-based interface.

The refactoring renames kmsan_handle_dma() parameters from accepting
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). A pfn_valid() check is added to prevent KMSAN operations
on non-page memory, i.e. on addresses that are not backed by struct page.

As part of this change, support for highmem addresses is implemented
using kmap_local_page() to handle both lowmem and highmem regions
properly. All callers throughout the codebase are updated to use the
new phys_addr_t based interface.
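
Callers now pass a physical address instead of page+offset, e.g.:

    /* scatterlist entry (struct-page backed): */
    kmsan_handle_dma(sg_phys(sg), sg->length, dir);
    /* lowmem virtual address: */
    kmsan_handle_dma(virt_to_phys(ptr), size, dir);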

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/virtio/virtio_ring.c |  4 ++--
 include/linux/kmsan.h        | 12 +++++++-----
 kernel/dma/mapping.c         |  2 +-
 mm/kmsan/hooks.c             | 36 +++++++++++++++++++++++++++++-------
 tools/virtio/linux/kmsan.h   |  2 +-
 5 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f5062061c4084..c147145a65930 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -378,7 +378,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
 		 * is initialized by the hardware. Explicitly check/unpoison it
 		 * depending on the direction.
 		 */
-		kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
+		kmsan_handle_dma(sg_phys(sg), sg->length, direction);
 		*addr = (dma_addr_t)sg_phys(sg);
 		return 0;
 	}
@@ -3157,7 +3157,7 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	if (!vq->use_dma_api) {
-		kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
+		kmsan_handle_dma(virt_to_phys(ptr), size, dir);
 		return (dma_addr_t)virt_to_phys(ptr);
 	}
 
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 2b1432cc16d59..6f27b9824ef77 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -182,8 +182,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
 
 /**
  * kmsan_handle_dma() - Handle a DMA data transfer.
- * @page:   first page of the buffer.
- * @offset: offset of the buffer within the first page.
+ * @phys:   physical address of the buffer.
  * @size:   buffer size.
  * @dir:    one of possible dma_data_direction values.
  *
@@ -191,8 +190,11 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
  * * checks the buffer, if it is copied to device;
  * * initializes the buffer, if it is copied from device;
  * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ *
+ * The function handles page lookup internally and supports both lowmem
+ * and highmem addresses.
  */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
 		      enum dma_data_direction dir);
 
 /**
@@ -372,8 +374,8 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
 {
 }
 
-static inline void kmsan_handle_dma(struct page *page, size_t offset,
-				    size_t size, enum dma_data_direction dir)
+static inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
+				    enum dma_data_direction dir)
 {
 }
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 80481a873340a..709405d46b2b4 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -172,7 +172,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
-	kmsan_handle_dma(page, offset, size, dir);
+	kmsan_handle_dma(phys, size, dir);
 	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f07..eab7912a3bf05 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -336,25 +336,48 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,
 }
 
 /* Helper function to handle DMA data transfers. */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
 		      enum dma_data_direction dir)
 {
 	u64 page_offset, to_go, addr;
+	struct page *page;
+	void *kaddr;
 
-	if (PageHighMem(page))
+	if (!pfn_valid(PHYS_PFN(phys)))
 		return;
-	addr = (u64)page_address(page) + offset;
+
+	page = phys_to_page(phys);
+	page_offset = offset_in_page(phys);
+
 	/*
 	 * The kernel may occasionally give us adjacent DMA pages not belonging
 	 * to the same allocation. Process them separately to avoid triggering
 	 * internal KMSAN checks.
 	 */
 	while (size > 0) {
-		page_offset = offset_in_page(addr);
 		to_go = min(PAGE_SIZE - page_offset, (u64)size);
+
+		if (PageHighMem(page))
+			/* Handle highmem pages using kmap */
+			kaddr = kmap_local_page(page);
+		else
+			/* Lowmem pages can be accessed directly */
+			kaddr = page_address(page);
+
+		addr = (u64)kaddr + page_offset;
 		kmsan_handle_dma_page((void *)addr, to_go, dir);
-		addr += to_go;
+
+		if (PageHighMem(page))
+			kunmap_local(kaddr);
+
+		phys += to_go;
 		size -= to_go;
+
+		/* Move to next page if needed */
+		if (size > 0) {
+			page = phys_to_page(phys);
+			page_offset = offset_in_page(phys);
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(kmsan_handle_dma);
@@ -366,8 +389,7 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
 	int i;
 
 	for_each_sg(sg, item, nents, i)
-		kmsan_handle_dma(sg_page(item), item->offset, item->length,
-				 dir);
+		kmsan_handle_dma(sg_phys(item), item->length, dir);
 }
 
 /* Functions from kmsan-checks.h follow. */
diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h
index 272b5aa285d5a..6cd2e3efd03dc 100644
--- a/tools/virtio/linux/kmsan.h
+++ b/tools/virtio/linux/kmsan.h
@@ -4,7 +4,7 @@
 
 #include <linux/gfp.h>
 
-inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
 			     enum dma_data_direction dir)
 {
 }
-- 
2.50.1



* [PATCH v1 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (7 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 13:08   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Extend the base DMA page API (dma_map_page_attrs() and
dma_unmap_page_attrs()) to handle the MMIO flow.
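
With this in place a caller that holds a struct-page-backed MMIO region
(for example a PCI P2PDMA page) can pass DMA_ATTR_MMIO through the
existing page API and the core will route it to the resource callbacks
where needed. A usage sketch (p2p_page and len are hypothetical):

    dma_addr_t dma = dma_map_page_attrs(dev, p2p_page, 0, len,
                                        DMA_TO_DEVICE, DMA_ATTR_MMIO);
    if (dma_mapping_error(dev, dma))
            return -EIO;
    /* ... issue DMA ... */
    dma_unmap_page_attrs(dev, dma, len, DMA_TO_DEVICE, DMA_ATTR_MMIO);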

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 kernel/dma/mapping.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 709405d46b2b4..f5f051737e556 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	phys_addr_t phys = page_to_phys(page) + offset;
+	bool is_mmio = attrs & DMA_ATTR_MMIO;
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
@@ -166,12 +167,23 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_phys_direct(dev, phys + size))
+	    (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
 		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
-	else
+	else if (is_mmio) {
+		if (!ops->map_resource)
+			return DMA_MAPPING_ERROR;
+
+		addr = ops->map_resource(dev, phys, size, dir, attrs);
+	} else {
+		/*
+		 * All platforms which implement .map_page() don't support
+		 * non-struct page backed addresses.
+		 */
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
+	}
+
 	kmsan_handle_dma(phys, size, dir);
 	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
@@ -184,14 +196,18 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
+	bool is_mmio = attrs & DMA_ATTR_MMIO;
 
 	BUG_ON(!valid_dma_direction(dir));
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_unmap_phys_direct(dev, addr + size))
+	    (!is_mmio && arch_dma_unmap_phys_direct(dev, addr + size)))
 		dma_direct_unmap_phys(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
-	else
+	else if (is_mmio) {
+		if (ops->unmap_resource)
+			ops->unmap_resource(dev, addr, size, dir, attrs);
+	} else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v1 10/16] xen: swiotlb: Open code map_resource callback
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (8 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 14:40   ` Jürgen Groß
  2025-08-04 12:42 ` [PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

The generic dma_direct_map_resource() is going to be removed
in the next patch, so simply open-code it in the Xen driver.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/xen/swiotlb-xen.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index da1a7d3d377cf..dd7747a2de879 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -392,6 +392,25 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
 	}
 }
 
+static dma_addr_t xen_swiotlb_direct_map_resource(struct device *dev,
+						  phys_addr_t paddr,
+						  size_t size,
+						  enum dma_data_direction dir,
+						  unsigned long attrs)
+{
+	dma_addr_t dma_addr = paddr;
+
+	if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
+		dev_err_once(dev,
+			     "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+			     &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+		WARN_ON_ONCE(1);
+		return DMA_MAPPING_ERROR;
+	}
+
+	return dma_addr;
+}
+
 /*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
@@ -426,5 +445,5 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
 	.alloc_pages_op = dma_common_alloc_pages,
 	.free_pages = dma_common_free_pages,
 	.max_mapping_size = swiotlb_max_mapping_size,
-	.map_resource = dma_direct_map_resource,
+	.map_resource = xen_swiotlb_direct_map_resource,
 };
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (9 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 13:38   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.

The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs() and dma_unmap_page_attrs() functions
converted to simple wrappers around the phys-based implementations.

The old page-based API is preserved in mapping.c so that existing code
won't be affected by the switch from EXPORT_SYMBOL to the
EXPORT_SYMBOL_GPL variant for dma_*map_phys().
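
As a rough usage sketch (not part of this patch; the device, buffer and
direction below are made-up placeholders), a driver that already holds a
physical address would use the new interface roughly like this:

	/* #include <linux/dma-mapping.h> */
	static int example_map_one(struct device *dev, phys_addr_t phys,
				   size_t len, dma_addr_t *out)
	{
		dma_addr_t dma;

		dma = dma_map_phys(dev, phys, len, DMA_TO_DEVICE, 0);
		if (dma_mapping_error(dev, dma))
			return -ENOMEM;

		*out = dma;
		return 0;
	}

	static void example_unmap_one(struct device *dev, dma_addr_t dma,
				      size_t len)
	{
		dma_unmap_phys(dev, dma, len, DMA_TO_DEVICE, 0);
	}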

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c   | 14 --------
 include/linux/dma-direct.h  |  2 --
 include/linux/dma-mapping.h | 13 +++++++
 include/linux/iommu-dma.h   |  4 ---
 include/trace/events/dma.h  |  2 --
 kernel/dma/debug.c          | 43 -----------------------
 kernel/dma/debug.h          | 21 ------------
 kernel/dma/direct.c         | 16 ---------
 kernel/dma/mapping.c        | 68 ++++++++++++++++++++-----------------
 9 files changed, 49 insertions(+), 134 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0a19ce50938b3..69f85209be7ab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1556,20 +1556,6 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		__iommu_dma_unmap(dev, start, end - start);
 }
 
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	return __iommu_dma_map(dev, phys, size,
-			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
-			dma_get_mask(dev));
-}
-
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	__iommu_dma_unmap(dev, handle, size);
-}
-
 static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
 {
 	size_t alloc_size = PAGE_ALIGN(size);
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index f3bc0bcd70980..c249912456f96 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -149,7 +149,5 @@ void dma_direct_free_pages(struct device *dev, size_t size,
 		struct page *page, dma_addr_t dma_addr,
 		enum dma_data_direction dir);
 int dma_direct_supported(struct device *dev, u64 mask);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
 
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index afc89835c7457..2aa43a6bed92b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -132,6 +132,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		unsigned long attrs);
 void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
 unsigned int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs);
 void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -186,6 +190,15 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 }
+static inline dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	return DMA_MAPPING_ERROR;
+}
+static inline void dma_unmap_phys(struct device *dev, dma_addr_t addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
 static inline unsigned int dma_map_sg_attrs(struct device *dev,
 		struct scatterlist *sg, int nents, enum dma_data_direction dir,
 		unsigned long attrs)
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 485bdffed9888..a92b3ff9b9343 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -42,10 +42,6 @@ size_t iommu_dma_opt_mapping_size(void);
 size_t iommu_dma_max_mapping_size(struct device *dev);
 void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t handle, unsigned long attrs);
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
 struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev, size_t size,
 		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
 void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index 84416c7d6bfaa..5da59fd8121db 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -73,7 +73,6 @@ DEFINE_EVENT(dma_map, name, \
 	TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))
 
 DEFINE_MAP_EVENT(dma_map_phys);
-DEFINE_MAP_EVENT(dma_map_resource);
 
 DECLARE_EVENT_CLASS(dma_unmap,
 	TP_PROTO(struct device *dev, dma_addr_t addr, size_t size,
@@ -111,7 +110,6 @@ DEFINE_EVENT(dma_unmap, name, \
 	TP_ARGS(dev, addr, size, dir, attrs))
 
 DEFINE_UNMAP_EVENT(dma_unmap_phys);
-DEFINE_UNMAP_EVENT(dma_unmap_resource);
 
 DECLARE_EVENT_CLASS(dma_alloc_class,
 	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index da6734e3a4ce9..06e31fd216e38 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -38,7 +38,6 @@ enum {
 	dma_debug_single,
 	dma_debug_sg,
 	dma_debug_coherent,
-	dma_debug_resource,
 	dma_debug_phy,
 };
 
@@ -141,7 +140,6 @@ static const char *type2name[] = {
 	[dma_debug_single] = "single",
 	[dma_debug_sg] = "scatter-gather",
 	[dma_debug_coherent] = "coherent",
-	[dma_debug_resource] = "resource",
 	[dma_debug_phy] = "phy",
 };
 
@@ -1448,47 +1446,6 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
 	check_unmap(&ref);
 }
 
-void debug_dma_map_resource(struct device *dev, phys_addr_t addr, size_t size,
-			    int direction, dma_addr_t dma_addr,
-			    unsigned long attrs)
-{
-	struct dma_debug_entry *entry;
-
-	if (unlikely(dma_debug_disabled()))
-		return;
-
-	entry = dma_entry_alloc();
-	if (!entry)
-		return;
-
-	entry->type		= dma_debug_resource;
-	entry->dev		= dev;
-	entry->paddr		= addr;
-	entry->size		= size;
-	entry->dev_addr		= dma_addr;
-	entry->direction	= direction;
-	entry->map_err_type	= MAP_ERR_NOT_CHECKED;
-
-	add_dma_entry(entry, attrs);
-}
-
-void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
-			      size_t size, int direction)
-{
-	struct dma_debug_entry ref = {
-		.type           = dma_debug_resource,
-		.dev            = dev,
-		.dev_addr       = dma_addr,
-		.size           = size,
-		.direction      = direction,
-	};
-
-	if (unlikely(dma_debug_disabled()))
-		return;
-
-	check_unmap(&ref);
-}
-
 void debug_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
 				   size_t size, int direction)
 {
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index 76adb42bffd5f..424b8f912aded 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -30,14 +30,6 @@ extern void debug_dma_alloc_coherent(struct device *dev, size_t size,
 extern void debug_dma_free_coherent(struct device *dev, size_t size,
 				    void *virt, dma_addr_t addr);
 
-extern void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
-				   size_t size, int direction,
-				   dma_addr_t dma_addr,
-				   unsigned long attrs);
-
-extern void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
-				     size_t size, int direction);
-
 extern void debug_dma_sync_single_for_cpu(struct device *dev,
 					  dma_addr_t dma_handle, size_t size,
 					  int direction);
@@ -88,19 +80,6 @@ static inline void debug_dma_free_coherent(struct device *dev, size_t size,
 {
 }
 
-static inline void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
-					  size_t size, int direction,
-					  dma_addr_t dma_addr,
-					  unsigned long attrs)
-{
-}
-
-static inline void debug_dma_unmap_resource(struct device *dev,
-					    dma_addr_t dma_addr, size_t size,
-					    int direction)
-{
-}
-
 static inline void debug_dma_sync_single_for_cpu(struct device *dev,
 						 dma_addr_t dma_handle,
 						 size_t size, int direction)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fa75e30700730..1062caac47e7b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -502,22 +502,6 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 	return ret;
 }
 
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	dma_addr_t dma_addr = paddr;
-
-	if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
-		dev_err_once(dev,
-			     "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
-			     &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
-		WARN_ON_ONCE(1);
-		return DMA_MAPPING_ERROR;
-	}
-
-	return dma_addr;
-}
-
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f5f051737e556..b747794448130 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -152,12 +152,10 @@ static inline bool dma_map_direct(struct device *dev,
 	return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
-dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
-		size_t offset, size_t size, enum dma_data_direction dir,
-		unsigned long attrs)
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-	phys_addr_t phys = page_to_phys(page) + offset;
 	bool is_mmio = attrs & DMA_ATTR_MMIO;
 	dma_addr_t addr;
 
@@ -177,6 +175,9 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 
 		addr = ops->map_resource(dev, phys, size, dir, attrs);
 	} else {
+		struct page *page = phys_to_page(phys);
+		size_t offset = offset_in_page(phys);
+
 		/*
 		 * All platforms which implement .map_page() don't support
 		 * non-struct page backed addresses.
@@ -190,9 +191,25 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 
 	return addr;
 }
+EXPORT_SYMBOL_GPL(dma_map_phys);
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+		size_t offset, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+	phys_addr_t phys = page_to_phys(page) + offset;
+
+	if (unlikely(attrs & DMA_ATTR_MMIO))
+		return DMA_MAPPING_ERROR;
+
+	if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
+		WARN_ON_ONCE(!pfn_valid(PHYS_PFN(phys)));
+
+	return dma_map_phys(dev, phys, size, dir, attrs);
+}
 EXPORT_SYMBOL(dma_map_page_attrs);
 
-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -212,6 +229,16 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
 }
+EXPORT_SYMBOL_GPL(dma_unmap_phys);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+		 enum dma_data_direction dir, unsigned long attrs)
+{
+	if (unlikely(attrs & DMA_ATTR_MMIO))
+		return;
+
+	dma_unmap_phys(dev, addr, size, dir, attrs);
+}
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
 static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -337,41 +364,18 @@ EXPORT_SYMBOL(dma_unmap_sg_attrs);
 dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-	dma_addr_t addr = DMA_MAPPING_ERROR;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	if (WARN_ON_ONCE(!dev->dma_mask))
+	if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+	    WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
-	if (dma_map_direct(dev, ops))
-		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
-	else if (use_dma_iommu(dev))
-		addr = iommu_dma_map_resource(dev, phys_addr, size, dir, attrs);
-	else if (ops->map_resource)
-		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
-
-	trace_dma_map_resource(dev, phys_addr, addr, size, dir, attrs);
-	debug_dma_map_resource(dev, phys_addr, size, dir, addr, attrs);
-	return addr;
+	return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
 }
 EXPORT_SYMBOL(dma_map_resource);
 
 void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-	if (dma_map_direct(dev, ops))
-		; /* nothing to do: uncached and no swiotlb */
-	else if (use_dma_iommu(dev))
-		iommu_dma_unmap_resource(dev, addr, size, dir, attrs);
-	else if (ops->unmap_resource)
-		ops->unmap_resource(dev, addr, size, dir, attrs);
-	trace_dma_unmap_resource(dev, addr, size, dir, attrs);
-	debug_dma_unmap_resource(dev, addr, size, dir);
+	dma_unmap_phys(dev, addr, size, dir, attrs | DMA_ATTR_MMIO);
 }
 EXPORT_SYMBOL(dma_unmap_resource);
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v1 12/16] mm/hmm: migrate to physical address-based DMA mapping API
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (10 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 13:14   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.

The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.

This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 mm/hmm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index d545e24949949..015ab243f0813 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -775,8 +775,8 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 		if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
 			goto error;
 
-		dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
-					DMA_BIDIRECTIONAL);
+		dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
+					DMA_BIDIRECTIONAL, 0);
 		if (dma_mapping_error(dev, dma_addr))
 			goto error;
 
@@ -819,8 +819,8 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
 		dma_iova_unlink(dev, state, idx * map->dma_entry_size,
 				map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
 	} else if (dma_need_unmap(dev))
-		dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
-			       DMA_BIDIRECTIONAL);
+		dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
+			       DMA_BIDIRECTIONAL, 0);
 
 	pfns[idx] &=
 		~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v1 13/16] mm/hmm: properly take MMIO path
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (11 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 13:14   ` Jason Gunthorpe
  2025-08-04 12:42 ` [PATCH v1 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

In case a peer-to-peer transaction traverses the host bridge, the IOMMU
mapping needs the IOMMU_MMIO flag, together with skipping the CPU cache
sync.

The latter was already handled by the provided DMA_ATTR_SKIP_CPU_SYNC
flag, but the IOMMU flag was missed, due to the assumption that such
memory can be treated as regular memory.

Reuse the newly introduced DMA_ATTR_MMIO attribute to properly take the
MMIO path.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 mm/hmm.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 015ab243f0813..6556c0e074ba8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -746,7 +746,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 	case PCI_P2PDMA_MAP_NONE:
 		break;
 	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
-		attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+		attrs |= DMA_ATTR_MMIO;
 		pfns[idx] |= HMM_PFN_P2PDMA;
 		break;
 	case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -776,7 +776,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 			goto error;
 
 		dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
-					DMA_BIDIRECTIONAL, 0);
+					DMA_BIDIRECTIONAL, attrs);
 		if (dma_mapping_error(dev, dma_addr))
 			goto error;
 
@@ -811,16 +811,17 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
 	if ((pfns[idx] & valid_dma) != valid_dma)
 		return false;
 
+	if (pfns[idx] & HMM_PFN_P2PDMA)
+		attrs |= DMA_ATTR_MMIO;
+
 	if (pfns[idx] & HMM_PFN_P2PDMA_BUS)
 		; /* no need to unmap bus address P2P mappings */
-	else if (dma_use_iova(state)) {
-		if (pfns[idx] & HMM_PFN_P2PDMA)
-			attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+	else if (dma_use_iova(state))
 		dma_iova_unlink(dev, state, idx * map->dma_entry_size,
 				map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
-	} else if (dma_need_unmap(dev))
+	else if (dma_need_unmap(dev))
 		dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
-			       DMA_BIDIRECTIONAL, 0);
+			       DMA_BIDIRECTIONAL, attrs);
 
 	pfns[idx] &=
 		~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v1 14/16] block-dma: migrate to dma_map_phys instead of map_page
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (12 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-04 12:42 ` [PATCH v1 15/16] block-dma: properly take MMIO path Leon Romanovsky
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

After the introduction of dma_map_phys(), there is no need to convert
a physical address to a struct page in order to map it, so use the new
interface directly.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index ad283017caef2..37e2142be4f7d 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
 static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
 		struct blk_dma_iter *iter, struct phys_vec *vec)
 {
-	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
-			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
+			rq_dma_dir(req), 0);
 	if (dma_mapping_error(dma_dev, iter->addr)) {
 		iter->status = BLK_STS_RESOURCE;
 		return false;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v1 15/16] block-dma: properly take MMIO path
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (13 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-04 12:42 ` [PATCH v1 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
  2025-08-07 14:19 ` [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Jason Gunthorpe
  16 siblings, 0 replies; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Make sure that the CPU cache is not synced and the IOMMU is configured
to take the MMIO path by providing the newly introduced DMA_ATTR_MMIO
attribute.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c         | 13 +++++++++++--
 include/linux/blk-mq-dma.h |  6 +++++-
 include/linux/blk_types.h  |  2 ++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 37e2142be4f7d..d415088ed9fd2 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,13 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
 static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
 		struct blk_dma_iter *iter, struct phys_vec *vec)
 {
+	unsigned int attrs = 0;
+
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
+
 	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
-			rq_dma_dir(req), 0);
+			rq_dma_dir(req), attrs);
 	if (dma_mapping_error(dma_dev, iter->addr)) {
 		iter->status = BLK_STS_RESOURCE;
 		return false;
@@ -103,14 +108,17 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
 {
 	enum dma_data_direction dir = rq_dma_dir(req);
 	unsigned int mapped = 0;
+	unsigned int attrs = 0;
 	int error;
 
 	iter->addr = state->addr;
 	iter->len = dma_iova_size(state);
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
 
 	do {
 		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
-				vec->len, dir, 0);
+				vec->len, dir, attrs);
 		if (error)
 			break;
 		mapped += vec->len;
@@ -176,6 +184,7 @@ bool blk_rq_dma_map_iter_start(struct request *req, struct device *dma_dev,
 			 * same as non-P2P transfers below and during unmap.
 			 */
 			req->cmd_flags &= ~REQ_P2PDMA;
+			req->cmd_flags |= REQ_MMIO;
 			break;
 		default:
 			iter->status = BLK_STS_INVAL;
diff --git a/include/linux/blk-mq-dma.h b/include/linux/blk-mq-dma.h
index c26a01aeae006..6c55f5e585116 100644
--- a/include/linux/blk-mq-dma.h
+++ b/include/linux/blk-mq-dma.h
@@ -48,12 +48,16 @@ static inline bool blk_rq_dma_map_coalesce(struct dma_iova_state *state)
 static inline bool blk_rq_dma_unmap(struct request *req, struct device *dma_dev,
 		struct dma_iova_state *state, size_t mapped_len)
 {
+	unsigned int attrs = 0;
+
 	if (req->cmd_flags & REQ_P2PDMA)
 		return true;
 
 	if (dma_use_iova(state)) {
+		if (req->cmd_flags & REQ_MMIO)
+			attrs = DMA_ATTR_MMIO;
 		dma_iova_destroy(dma_dev, state, mapped_len, rq_dma_dir(req),
-				 0);
+				 attrs);
 		return true;
 	}
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 09b99d52fd365..283058bcb5b14 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -387,6 +387,7 @@ enum req_flag_bits {
 	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
 	__REQ_ATOMIC,		/* for atomic write operations */
 	__REQ_P2PDMA,		/* contains P2P DMA pages */
+	__REQ_MMIO,		/* contains MMIO memory */
 	/*
 	 * Command specific flags, keep last:
 	 */
@@ -420,6 +421,7 @@ enum req_flag_bits {
 #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
 #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
 #define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)
+#define REQ_MMIO	(__force blk_opf_t)(1ULL << __REQ_MMIO)
 
 #define REQ_NOUNMAP	(__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v1 16/16] nvme-pci: unmap MMIO pages with appropriate interface
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (14 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 15/16] block-dma: properly take MMIO path Leon Romanovsky
@ 2025-08-04 12:42 ` Leon Romanovsky
  2025-08-07 13:45   ` Jason Gunthorpe
  2025-08-07 14:19 ` [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Jason Gunthorpe
  16 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-04 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

The block layer maps MMIO memory through the dma_map_phys() interface
with the help of the DMA_ATTR_MMIO attribute. Such memory needs to be
unmapped with the matching unmap function.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/nvme/host/pci.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 071efec25346f..0b624247948c5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -682,11 +682,15 @@ static void nvme_free_prps(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	unsigned int attrs = 0;
 	unsigned int i;
 
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
+
 	for (i = 0; i < iod->nr_dma_vecs; i++)
-		dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
-				iod->dma_vecs[i].len, rq_dma_dir(req));
+		dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+				iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
 	mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
 }
 
@@ -699,15 +703,19 @@ static void nvme_free_sgls(struct request *req)
 	unsigned int sqe_dma_len = le32_to_cpu(iod->cmd.common.dptr.sgl.length);
 	struct nvme_sgl_desc *sg_list = iod->descriptors[0];
 	enum dma_data_direction dir = rq_dma_dir(req);
+	unsigned int attrs = 0;
+
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
 
 	if (iod->nr_descriptors) {
 		unsigned int nr_entries = sqe_dma_len / sizeof(*sg_list), i;
 
 		for (i = 0; i < nr_entries; i++)
-			dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
-				le32_to_cpu(sg_list[i].length), dir);
+			dma_unmap_phys(dma_dev, le64_to_cpu(sg_list[i].addr),
+				le32_to_cpu(sg_list[i].length), dir, attrs);
 	} else {
-		dma_unmap_page(dma_dev, sqe_dma_addr, sqe_dma_len, dir);
+		dma_unmap_phys(dma_dev, sqe_dma_addr, sqe_dma_len, dir, attrs);
 	}
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory
  2025-08-04 12:42 ` [PATCH v1 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
@ 2025-08-06 17:31   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-06 17:31 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:35PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
> that reside in memory-mapped I/O (MMIO) regions, such as device BARs
> exposed through the host bridge, which are accessible for peer-to-peer
> (P2P) DMA.
> 
> This attribute is especially useful for exporting device memory to other
> devices for DMA without CPU involvement, and avoids unnecessary or
> potentially detrimental CPU cache maintenance calls.

It is worth mentioning here that dma_map_resource() and DMA_ATTR_MMIO
are intended to be the same thing.

> --- a/Documentation/core-api/dma-attributes.rst
> +++ b/Documentation/core-api/dma-attributes.rst
> @@ -130,3 +130,10 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
>  subsystem that the buffer is fully accessible at the elevated privilege
>  level (and ideally inaccessible or at least read-only at the
>  lesser-privileged levels).
> +
> +DMA_ATTR_MMIO
> +-------------
> +
> +This attribute is especially useful for exporting device memory to other
> +devices for DMA without CPU involvement, and avoids unnecessary or
> +potentially detrimental CPU cache maintenance calls.

How about

This attribute indicates the physical address is not normal system
memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
functions, it may not be cachable, and access using CPU load/store
instructions may not be allowed.

Usually this will be used to describe MMIO addresses, or other
non-cachable register addresses. When DMA mapping this sort of address,
we call the operation Peer to Peer, as one device is DMA'ing to another
device. For PCI devices the p2pdma APIs must be used to determine if
DMA_ATTR_MMIO is appropriate.

For architectures that require cache flushing for DMA coherence
DMA_ATTR_MMIO will not perform any cache flushing. The address
provided must never be mapped cachable into the CPU.
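
Purely as an illustration of that last point (a sketch, not the actual
p2pdma query flow; the helper name is hypothetical), the attribute
choice would look roughly like:

	/* Sketch: pick mapping attrs from a pci_p2pdma map type */
	static unsigned long example_p2p_attrs(enum pci_p2pdma_map_type type)
	{
		switch (type) {
		case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
			/* BAR memory routed through the host bridge */
			return DMA_ATTR_MMIO;
		case PCI_P2PDMA_MAP_BUS_ADDR:
			/* switch-local traffic, no host-bridge mapping needed */
			return 0;
		default:
			return 0;
		}
	}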

> +/*
> + * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
> + *
> + * This attribute is used for MMIO memory regions that are exposed through
> + * the host bridge and are accessible for peer-to-peer (P2P) DMA. Memory
> + * marked with this attribute is not system RAM and may represent device
> + * BAR windows or peer-exposed memory.
> + *
> + * Typical usage is for mapping hardware memory BARs or exporting device
> + * memory to other devices for DMA without involving main system RAM.
> + * The attribute guarantees no CPU cache maintenance calls will be made.
> + */

I'd copy the Documentation/ text

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 02/16] iommu/dma: handle MMIO path in dma_iova_link
  2025-08-04 12:42 ` [PATCH v1 02/16] iommu/dma: handle MMIO path in dma_iova_link Leon Romanovsky
@ 2025-08-06 18:10   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-06 18:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:36PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Make sure that CPU is not synced if MMIO path is taken.

Let's elaborate..

Implement DMA_ATTR_MMIO for dma_iova_link().

This will replace the hacky use of DMA_ATTR_SKIP_CPU_SYNC to avoid
touching the possibly non-KVA MMIO memory.

Also correct the incorrect caching attribute for the IOMMU: MMIO
memory should not be cachable inside the IOMMU mapping, or it can
possibly create system problems. Set IOMMU_MMIO for DMA_ATTR_MMIO.

> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ea2ef53bd4fef..399838c17b705 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1837,13 +1837,20 @@ static int __dma_iova_link(struct device *dev, dma_addr_t addr,
>  		phys_addr_t phys, size_t size, enum dma_data_direction dir,
>  		unsigned long attrs)
>  {
> -	bool coherent = dev_is_dma_coherent(dev);
> +	int prot;
>  
> -	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> -		arch_sync_dma_for_device(phys, size, dir);
> +	if (attrs & DMA_ATTR_MMIO)
> +		prot = dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO;

Yeah, exactly, we need the IOPTE on ARM to have the right cachability
or some systems might go wrong.


> +	else {
> +		bool coherent = dev_is_dma_coherent(dev);
> +
> +		if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +			arch_sync_dma_for_device(phys, size, dir);
> +		prot = dma_info_to_prot(dir, coherent, attrs);
> +	}
>  
>  	return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
> -			dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC);
> +			prot, GFP_ATOMIC);
>  }

Hmm, I missed this in prior series, ideally the GFP_ATOMIC should be
passed in as a gfp_t here so we can use GFP_KERNEL in callers that are
able.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 03/16] dma-debug: refactor to use physical addresses for page mapping
  2025-08-04 12:42 ` [PATCH v1 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
@ 2025-08-06 18:26   ` Jason Gunthorpe
  2025-08-06 18:38     ` Leon Romanovsky
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-06 18:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:37PM +0300, Leon Romanovsky wrote:
> +void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
> +		int direction, dma_addr_t dma_addr, unsigned long attrs)
>  {
>  	struct dma_debug_entry *entry;

Should this patch also absorb debug_dma_map_resource() here as well,
so we can have the caller of dma_map_resource() call
debug_dma_map_page with ATTR_MMIO?

If not, this looks OK

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 03/16] dma-debug: refactor to use physical addresses for page mapping
  2025-08-06 18:26   ` Jason Gunthorpe
@ 2025-08-06 18:38     ` Leon Romanovsky
  0 siblings, 0 replies; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-06 18:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Wed, Aug 06, 2025 at 03:26:30PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 04, 2025 at 03:42:37PM +0300, Leon Romanovsky wrote:
> > +void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
> > +		int direction, dma_addr_t dma_addr, unsigned long attrs)
> >  {
> >  	struct dma_debug_entry *entry;
> 
> Should this patch should also absorb debug_dma_map_resource() into
> here as well and we can have the caller of dma_dma_map_resource() call
> debug_dma_map_page with ATTR_MMIO?

It is done in "[PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface".

Thanks

> 
> If not, this looks OK
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  2025-08-04 12:42 ` [PATCH v1 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
@ 2025-08-06 18:44   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-06 18:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:39PM +0300, Leon Romanovsky wrote:
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 399838c17b705..11c5d5f8c0981 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1190,11 +1190,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
>  	return iova_offset(iovad, phys | size);
>  }
>  
> -dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> -	      unsigned long offset, size_t size, enum dma_data_direction dir,
> -	      unsigned long attrs)
> +dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
> +		enum dma_data_direction dir, unsigned long attrs)
>  {
> -	phys_addr_t phys = page_to_phys(page) + offset;
>  	bool coherent = dev_is_dma_coherent(dev);
>  	int prot = dma_info_to_prot(dir, coherent, attrs);
>  	struct iommu_domain *domain = iommu_get_dma_domain(dev);

No issue with pushing the page_to_phys() out to what looks like the two callers.

It is worth pointing out, though, that today if the page * was a
MEMORY_DEVICE_PCI_P2PDMA page then it is illegal to call the swiotlb
functions a few lines below this:

                phys = iommu_dma_map_swiotlb(dev, phys, size, dir, attrs);

i.e. a struct page alone as a type has not been sufficient to make this
function safe for a long time now.

So I would add some explanation in the commit message of how this will
be situated in the final call chains, and maybe leave behind a comment
that attrs may not contain ATTR_MMIO in this function.

I think the answer is iommu_dma_map_phys() is only called for
!ATTR_MMIO addresses, and that iommu_dma_map_resource() will be called
for ATTR_MMIO?

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
  2025-08-04 12:42 ` [PATCH v1 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
@ 2025-08-07 12:07   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 12:07 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:40PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Combine iommu_dma_*map_phys with iommu_dma_*map_resource interfaces in
> order to allow single phys_addr_t flow.

A later patch deletes iommu_dma_map_resource()? Mention that plan here?

> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1193,12 +1193,17 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
>  dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
>  		enum dma_data_direction dir, unsigned long attrs)
>  {
> -	bool coherent = dev_is_dma_coherent(dev);
> -	int prot = dma_info_to_prot(dir, coherent, attrs);
>  	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>  	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>  	struct iova_domain *iovad = &cookie->iovad;
>  	dma_addr_t iova, dma_mask = dma_get_mask(dev);
> +	bool coherent;
> +	int prot;
> +
> +	if (attrs & DMA_ATTR_MMIO)
> +		return __iommu_dma_map(dev, phys, size,
> +				dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
> +				dma_get_mask(dev));

I realize that iommu_dma_map_resource() doesn't today, but shouldn't
this be checking for swiotlb:

	if (dev_use_swiotlb(dev, size, dir) &&
	    iova_unaligned(iovad, phys, size)) {

Except we have to fail for ATTR_MMIO?

Now that we have ATTR_MMIO, should dma_info_to_prot() just handle it
directly instead of open coding the | IOMMU_MMIO and messing with the
coherent attribute?
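
(Just to sketch that idea — modelled on the current helper, assuming the
signature stays the same; not a tested change:)

	static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
				    unsigned long attrs)
	{
		/* MMIO overrides the coherent/cacheable choice */
		int prot = (attrs & DMA_ATTR_MMIO) ? IOMMU_MMIO :
			   (coherent ? IOMMU_CACHE : 0);

		if (attrs & DMA_ATTR_PRIVILEGED)
			prot |= IOMMU_PRIV;

		switch (dir) {
		case DMA_BIDIRECTIONAL:
			return prot | IOMMU_READ | IOMMU_WRITE;
		case DMA_TO_DEVICE:
			return prot | IOMMU_READ;
		case DMA_FROM_DEVICE:
			return prot | IOMMU_WRITE;
		default:
			return 0;
		}
	}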

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  2025-08-04 12:42 ` [PATCH v1 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
@ 2025-08-07 12:13   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 12:13 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:41PM +0300, Leon Romanovsky wrote:
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -80,42 +80,54 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
>  		arch_dma_mark_clean(paddr, size);
>  }
>  
> -static inline dma_addr_t dma_direct_map_page(struct device *dev,
> -		struct page *page, unsigned long offset, size_t size,
> -		enum dma_data_direction dir, unsigned long attrs)
> +static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> +		phys_addr_t phys, size_t size, enum dma_data_direction dir,
> +		unsigned long attrs)
>  {
> -	phys_addr_t phys = page_to_phys(page) + offset;
> -	dma_addr_t dma_addr = phys_to_dma(dev, phys);
> +	bool is_mmio = attrs & DMA_ATTR_MMIO;
> +	dma_addr_t dma_addr;
> +	bool capable;
> +
> +	dma_addr = (is_mmio) ? phys : phys_to_dma(dev, phys);
> +	capable = dma_capable(dev, dma_addr, size, is_mmio);
> +	if (is_mmio) {
> +	       if (unlikely(!capable))
> +		       goto err_overflow;
> +	       return dma_addr;

Similar remark here: shouldn't we be checking swiotlb things for
ATTR_MMIO and failing if swiotlb is needed?

> -	if (is_swiotlb_force_bounce(dev)) {
> -		if (is_pci_p2pdma_page(page))
> -			return DMA_MAPPING_ERROR;

This

> -	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
> -	    dma_kmalloc_needs_bounce(dev, size, dir)) {
> -		if (is_pci_p2pdma_page(page))
> -			return DMA_MAPPING_ERROR;

And this

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-04 12:42 ` [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
@ 2025-08-07 12:21   ` Jason Gunthorpe
  2025-08-13 15:07     ` Leon Romanovsky
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 12:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:42PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Convert the KMSAN DMA handling function from page-based to physical
> address-based interface.
> 
> The refactoring renames kmsan_handle_dma() parameters from accepting
> (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> size_t size). A PFN_VALID check is added to prevent KMSAN operations
> on non-page memory, preventing from non struct page backed address,
> 
> As part of this change, support for highmem addresses is implemented
> using kmap_local_page() to handle both lowmem and highmem regions
> properly. All callers throughout the codebase are updated to use the
> new phys_addr_t based interface.

Use the function Matthew pointed at: kmap_local_pfn().

Maybe introduce the kmap_local_phys() he suggested too.
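
For illustration only, such a helper could be a thin wrapper over the
existing kmap_local_pfn() (a sketch, not something in the tree):

	static inline void *kmap_local_phys(phys_addr_t phys)
	{
		/* map the page containing @phys, return a pointer to @phys itself */
		return kmap_local_pfn(PHYS_PFN(phys)) + offset_in_page(phys);
	}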

>  /* Helper function to handle DMA data transfers. */
> -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> +void kmsan_handle_dma(phys_addr_t phys, size_t size,
>  		      enum dma_data_direction dir)
>  {
>  	u64 page_offset, to_go, addr;
> +	struct page *page;
> +	void *kaddr;
>  
> -	if (PageHighMem(page))
> +	if (!pfn_valid(PHYS_PFN(phys)))
>  		return;

Not needed; the caller must pass in a phys that is kmap-compatible.
Maybe just leave a comment. FWIW, today this is not checking for P2P or
DEVICE non-kmap struct pages either, so it should be fine without
checks.

> -	addr = (u64)page_address(page) + offset;
> +
> +	page = phys_to_page(phys);
> +	page_offset = offset_in_page(phys);
> +
>  	/*
>  	 * The kernel may occasionally give us adjacent DMA pages not belonging
>  	 * to the same allocation. Process them separately to avoid triggering
>  	 * internal KMSAN checks.
>  	 */
>  	while (size > 0) {
> -		page_offset = offset_in_page(addr);
>  		to_go = min(PAGE_SIZE - page_offset, (u64)size);
> +
> +		if (PageHighMem(page))
> +			/* Handle highmem pages using kmap */
> +			kaddr = kmap_local_page(page);

No need for the PageHighMem() - just always call kmap_local_pfn().

I'd also propose that any debug/sanitizer checks that the passed phys
is valid for kmap (eg pfn valid, not zone_device, etc) should be
inside the kmap code.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page
  2025-08-04 12:42 ` [PATCH v1 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
@ 2025-08-07 13:08   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 13:08 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:43PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Extend base DMA page API to handle MMIO flow.

I would mention here that this follows the long-standing agreement that we
don't need to enable P2P in the legacy dma_ops area. Simply failing when
getting an ATTR_MMIO is OK.

> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  	phys_addr_t phys = page_to_phys(page) + offset;
> +	bool is_mmio = attrs & DMA_ATTR_MMIO;
>  	dma_addr_t addr;
>  
>  	BUG_ON(!valid_dma_direction(dir));
> @@ -166,12 +167,23 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>  		return DMA_MAPPING_ERROR;
>  
>  	if (dma_map_direct(dev, ops) ||
> -	    arch_dma_map_phys_direct(dev, phys + size))
> +	    (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
>  		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);

I don't know this area; maybe explain a bit in the commit message how
you see ATTR_MMIO interacting with arch_dma_map_phys_direct()?

>  	else if (use_dma_iommu(dev))
>  		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
> -	else
> +	else if (is_mmio) {
> +		if (!ops->map_resource)
> +			return DMA_MAPPING_ERROR;
> +
> +		addr = ops->map_resource(dev, phys, size, dir, attrs);
> +	} else {
> +		/*
> +		 * All platforms which implement .map_page() don't support
> +		 * non-struct page backed addresses.
> +		 */
>  		addr = ops->map_page(dev, page, offset, size, dir, attrs);

The comment could be clearer, maybe just:

 The dma_ops API contract for ops->map_page() requires kmappable memory, while
 ops->map_resource() does not.

But this approach looks good to me; it prevents non-kmappable phys
from going down to the legacy dma_ops map_page where it cannot work.

From here you could do what Marek and Christoph asked: flush the
struct page out of ops->map_page() and replace it with
kmap_local_phys().

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 13/16] mm/hmm: properly take MMIO path
  2025-08-04 12:42 ` [PATCH v1 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
@ 2025-08-07 13:14   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 13:14 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:47PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> In case a peer-to-peer transaction traverses the host bridge, the
> IOMMU mapping needs the IOMMU_MMIO flag, together with skipping the
> CPU sync.
> 
> The latter was handled by the provided DMA_ATTR_SKIP_CPU_SYNC flag,
> but the IOMMU flag was missed, due to the assumption that such memory
> can be treated as regular memory.
> 
> Reuse newly introduced DMA attribute to properly take MMIO path.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  mm/hmm.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 015ab243f0813..6556c0e074ba8 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -746,7 +746,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
>  	case PCI_P2PDMA_MAP_NONE:
>  		break;
>  	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> -		attrs |= DMA_ATTR_SKIP_CPU_SYNC;
> +		attrs |= DMA_ATTR_MMIO;
>  		pfns[idx] |= HMM_PFN_P2PDMA;
>  		break;

Yeah, this is a lot cleaner

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 12/16] mm/hmm: migrate to physical address-based DMA mapping API
  2025-08-04 12:42 ` [PATCH v1 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
@ 2025-08-07 13:14   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 13:14 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:46PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Convert HMM DMA operations from the legacy page-based API to the new
> physical address-based dma_map_phys() and dma_unmap_phys() functions.
> This demonstrates the preferred approach for new code that should use
> physical addresses directly rather than page+offset parameters.
> 
> The change replaces dma_map_page() and dma_unmap_page() calls with
> dma_map_phys() and dma_unmap_phys() respectively, using the physical
> address that was already available in the code. This eliminates the
> redundant page-to-physical address conversion and aligns with the
> DMA subsystem's move toward physical address-centric interfaces.
> 
> This serves as an example of how new code should be written to leverage
> the more efficient physical address API, which provides cleaner interfaces
> for drivers that already have access to physical addresses.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  mm/hmm.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Maybe the next patch should be squashed into here too if it is going
to be a full example.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface
  2025-08-04 12:42 ` [PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
@ 2025-08-07 13:38   ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 13:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:45PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
> that operate directly on physical addresses instead of page+offset
> parameters. This provides a more efficient interface for drivers that
> already have physical addresses available.
> 
> The new functions are implemented as the primary mapping layer, with
> the existing dma_map_page_attrs() and dma_unmap_page_attrs() functions
> converted to simple wrappers around the phys-based implementations.

Briefly explain how the existing functions are remapped into wrappers
calling the phys functions.

> +dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
> +		size_t offset, size_t size, enum dma_data_direction dir,
> +		unsigned long attrs)
> +{
> +	phys_addr_t phys = page_to_phys(page) + offset;
> +
> +	if (unlikely(attrs & DMA_ATTR_MMIO))
> +		return DMA_MAPPING_ERROR;
> +
> +	if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
> +		WARN_ON_ONCE(!pfn_valid(PHYS_PFN(phys)));

This is not useful; if we have a struct page and did page_to_phys() then
pfn_valid() is always true.

Instead this should check for any ZONE_DEVICE page and reject that.
And handle the error:

  if (WARN_ON_ONCE()) return DMA_MAPPING_ERROR;

I'd add another debug check inside dma_map_phys() that, if !ATTR_MMIO,
the phys is pfn_valid() and not ZONE_DEVICE.
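
A minimal sketch of what I mean, assuming the dma_map_phys() and
DMA_ATTR_MMIO names from this series:

	if (IS_ENABLED(CONFIG_DMA_API_DEBUG) && !(attrs & DMA_ATTR_MMIO)) {
		/* non-MMIO mappings must be kmappable, non-ZONE_DEVICE memory */
		if (WARN_ON_ONCE(!pfn_valid(PHYS_PFN(phys)) ||
				 is_zone_device_page(phys_to_page(phys))))
			return DMA_MAPPING_ERROR;
	}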

> @@ -337,41 +364,18 @@ EXPORT_SYMBOL(dma_unmap_sg_attrs);
>  dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
>  		size_t size, enum dma_data_direction dir, unsigned long attrs)
>  {

> -	const struct dma_map_ops *ops = get_dma_ops(dev);
> -	dma_addr_t addr = DMA_MAPPING_ERROR;
> -
> -	BUG_ON(!valid_dma_direction(dir));
> -
> -	if (WARN_ON_ONCE(!dev->dma_mask))
> +	if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
> +	    WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>  		return DMA_MAPPING_ERROR;
>  
> -	if (dma_map_direct(dev, ops))
> -		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
> -	else if (use_dma_iommu(dev))
> -		addr = iommu_dma_map_resource(dev, phys_addr, size, dir, attrs);
> -	else if (ops->map_resource)
> -		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
> -
> -	trace_dma_map_resource(dev, phys_addr, addr, size, dir, attrs);
> -	debug_dma_map_resource(dev, phys_addr, size, dir, addr, attrs);
> -	return addr;
> +	return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
>  }
>  EXPORT_SYMBOL(dma_map_resource);

I think this makes a lot of sense at least.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 16/16] nvme-pci: unmap MMIO pages with appropriate interface
  2025-08-04 12:42 ` [PATCH v1 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
@ 2025-08-07 13:45   ` Jason Gunthorpe
  2025-08-13 15:37     ` Leon Romanovsky
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 13:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:50PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Block layer maps MMIO memory through dma_map_phys() interface
> with help of DMA_ATTR_MMIO attribute. There is a need to unmap
> that memory with the appropriate unmap function.

Be specific. AFAICT the issue is that on dma_ops platforms the map
will call ops->map_resource for ATTR_MMIO, so we must have the unmap
call ops->unmap_resource.

Maybe these patches should be swapped then, as adding ATTR_MMIO seems
like it created this issue?

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/16] dma-mapping: migrate to physical address-based API
  2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (15 preceding siblings ...)
  2025-08-04 12:42 ` [PATCH v1 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
@ 2025-08-07 14:19 ` Jason Gunthorpe
  2025-08-08 18:51   ` Marek Szyprowski
  16 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-07 14:19 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Mon, Aug 04, 2025 at 03:42:34PM +0300, Leon Romanovsky wrote:
> Changelog:
> v1:
>  * Added new DMA_ATTR_MMIO attribute to indicate
>    PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
>  * Rewrote dma_map_* functions to use this new attribute
> v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
> ------------------------------------------------------------------------
> 
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.

Let's elaborate this as Robin asked:

This series refactors the DMA mapping API to provide a phys_addr_t
based, and struct-page free, external API that can handle all the
mapping cases we want in modern systems:

 - struct page based cachable DRAM
 - struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cachable MMIO
 - struct page-less PCI peer to peer non-cachable MMIO
 - struct page-less "resource" MMIO

Overall this gets much closer to Matthew's long term wish for
struct-pageless IO to cachable DRAM. The remaining primary work would
be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
phys_addr_t without a struct page.

The general design is to remove struct page usage entirely from the
DMA API inner layers. For flows that need to have a KVA for the
physical address they can use kmap_local_pfn() or phys_to_virt(). This
isolates the struct page requirements to MM code only. Long term all
removals of struct page usage are supporting Matthew's memdesc
project which seeks to substantially transform how struct page works.

Instead make the DMA API internals work on phys_addr_t. Internally
there are still dedicated 'page' and 'resource' flows, except they are
now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
flows use the same phys_addr_t.

When DMA_ATTR_MMIO is specified things work similar to the existing
'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
pfn_valid(), etc are never called on the phys_addr_t. This requires
rejecting any configuration that would need swiotlb. CPU cache
flushing is not required, and avoided, as ATTR_MMIO also indicates the
address has no cachable mappings. This effectively removes any
DMA API side requirement to have struct page when DMA_ATTR_MMIO is
used.

In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
except on the common path of no cache flush, no swiotlb it never
touches a struct page. When cache flushing or swiotlb copying
kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
usage. This was already the case on the unmap side, now the map side
is symmetric.

Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
path must also set it. This corrects some existing bugs where iommu
mappings for P2P MMIO were improperly marked IOMMU_CACHE.

Since ATTR_MMIO is made to work with all the existing DMA map entry
points, particularly dma_iova_link(), this finally allows a way to use
the new DMA API to map PCI P2P MMIO without creating struct page. The
VFIO DMABUF series demonstrates how this works. This is intended to
replace the incorrect driver use of dma_map_resource() on PCI BAR
addresses.
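
For illustration, driver usage would look roughly like this (sketch
only, assuming the dma_map_phys()/dma_unmap_phys() signatures from this
series; bar_phys and len are placeholder names):

	/* Map a PCI BAR address that has no struct page behind it */
	dma_addr_t dma = dma_map_phys(dev, bar_phys, len, DMA_BIDIRECTIONAL,
				      DMA_ATTR_MMIO);
	if (dma_mapping_error(dev, dma))
		return -EIO;
	...
	dma_unmap_phys(dev, dma, len, DMA_BIDIRECTIONAL, DMA_ATTR_MMIO);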

This series does the core code and modern flows. A followup series
will give the same treatment to the legacy dma_ops implementation.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 10/16] xen: swiotlb: Open code map_resource callback
  2025-08-04 12:42 ` [PATCH v1 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
@ 2025-08-07 14:40   ` Jürgen Groß
  0 siblings, 0 replies; 43+ messages in thread
From: Jürgen Groß @ 2025-08-07 14:40 UTC (permalink / raw)
  To: Leon Romanovsky, Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, kasan-dev, Keith Busch,
	linux-block, linux-doc, linux-kernel, linux-mm, linux-nvme,
	linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel


On 04.08.25 14:42, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> General dma_direct_map_resource() is going to be removed
> in next patch, so simply open-code it in xen driver.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/16] dma-mapping: migrate to physical address-based API
  2025-08-07 14:19 ` [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Jason Gunthorpe
@ 2025-08-08 18:51   ` Marek Szyprowski
  2025-08-09 13:34     ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Marek Szyprowski @ 2025-08-08 18:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On 07.08.2025 16:19, Jason Gunthorpe wrote:
> On Mon, Aug 04, 2025 at 03:42:34PM +0300, Leon Romanovsky wrote:
>> Changelog:
>> v1:
>>   * Added new DMA_ATTR_MMIO attribute to indicate
>>     PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
>>   * Rewrote dma_map_* functions to use this new attribute
>> v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
>> ------------------------------------------------------------------------
>>
>> This series refactors the DMA mapping to use physical addresses
>> as the primary interface instead of page+offset parameters. This
>> change aligns the DMA API with the underlying hardware reality where
>> DMA operations work with physical addresses, not page structures.
> Let's elaborate this as Robin asked:
>
> This series refactors the DMA mapping API to provide a phys_addr_t
> based, and struct-page free, external API that can handle all the
> mapping cases we want in modern systems:
>
>   - struct page based cachable DRAM
>   - struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cachable MMIO
>   - struct page-less PCI peer to peer non-cachable MMIO
>   - struct page-less "resource" MMIO
>
> Overall this gets much closer to Matthew's long term wish for
> struct-pageless IO to cachable DRAM. The remaining primary work would
> be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
> phys_addr_t without a struct page.
>
> The general design is to remove struct page usage entirely from the
> DMA API inner layers. For flows that need to have a KVA for the
> physical address they can use kmap_local_pfn() or phys_to_virt(). This
> isolates the struct page requirements to MM code only. Long term all
> removals of struct page usage are supporting Matthew's memdesc
> project which seeks to substantially transform how struct page works.
>
> Instead make the DMA API internals work on phys_addr_t. Internally
> there are still dedicated 'page' and 'resource' flows, except they are
> now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
> flows use the same phys_addr_t.
>
> When DMA_ATTR_MMIO is specified things work similar to the existing
> 'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
> pfn_valid(), etc are never called on the phys_addr_t. This requires
> rejecting any configuration that would need swiotlb. CPU cache
> flushing is not required, and avoided, as ATTR_MMIO also indicates the
> address has no cachable mappings. This effectively removes any
> DMA API side requirement to have struct page when DMA_ATTR_MMIO is
> used.
>
> In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
> except on the common path of no cache flush, no swiotlb it never
> touches a struct page. When cache flushing or swiotlb copying
> kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
> usage. This was already the case on the unmap side, now the map side
> is symmetric.
>
> Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
> must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
> path must also set it. This corrects some existing bugs where iommu
> mappings for P2P MMIO were improperly marked IOMMU_CACHE.
>
> Since ATTR_MMIO is made to work with all the existing DMA map entry
> points, particularly dma_iova_link(), this finally allows a way to use
> the new DMA API to map PCI P2P MMIO without creating struct page. The
> VFIO DMABUF series demonstrates how this works. This is intended to
> replace the incorrect driver use of dma_map_resource() on PCI BAR
> addresses.
>
> This series does the core code and modern flows. A followup series
> will give the same treatment to the legacy dma_ops implementation.

Thanks for the elaborate description, that's something that was missing 
in the previous attempt. I read again all the previous discussion and 
this explanation, and there are still two things that imho need more 
clarification.


First - basing the API on the phys_addr_t.

Page based API had the advantage that it was really hard to abuse it and 
call for something that is not 'a normal RAM'. I initially thought that 
a phys_addr_t based API would somehow simplify arch-specific 
implementations, as some of them indeed rely on phys_addr_t internally, 
but I missed other things pointed out by Robin. Do we have here any 
alternative?


Second - making dma_map_phys() a single API to handle all cases.

Do we really need such a single function to handle all cases? To handle 
the P2P case, the caller already must pass DMA_ATTR_MMIO, so it must somehow 
keep such information internally. Can't it just call the existing 
dma_map_resource(), so there will be a clear distinction between these two 
cases (DMA to RAM and P2P DMA)? Do we need an additional check for 
DMA_ATTR_MMIO for every typical DMA user? I know that branching is 
cheap, but this will probably increase code size for most of the typical 
users for no reason.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/16] dma-mapping: migrate to physical address-based API
  2025-08-08 18:51   ` Marek Szyprowski
@ 2025-08-09 13:34     ` Jason Gunthorpe
  2025-08-09 16:53       ` Demi Marie Obenour
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-09 13:34 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Fri, Aug 08, 2025 at 08:51:08PM +0200, Marek Szyprowski wrote:
> First - basing the API on the phys_addr_t.
> 
> Page based API had the advantage that it was really hard to abuse it and 
> call for something that is not 'a normal RAM'. 

This is not true anymore. Today we have ZONE_DEVICE as a struct page
type with a whole bunch of non-dram sub-types:

enum memory_type {
	/* 0 is reserved to catch uninitialized type fields */
	MEMORY_DEVICE_PRIVATE = 1,
	MEMORY_DEVICE_COHERENT,
	MEMORY_DEVICE_FS_DAX,
	MEMORY_DEVICE_GENERIC,
	MEMORY_DEVICE_PCI_P2PDMA,
};

Few of which are kmappable/page_to_virtable() in a way that is useful
for the DMA API.

DMA API sort of ignores all of this and relies on the caller to not
pass in an incorrect struct page. eg we rely on things like the block
stack to do the right stuff when a MEMORY_DEVICE_PCI_P2PDMA is present
in a bio_vec.

Which is not really fundamentally different from just using
phys_addr_t in the first place.

Sure, this was a stronger argument when this stuff was originally
written, before ZONE_DEVICE was invented.

> I initially thought that a phys_addr_t based API would somehow simplify
> arch-specific implementations, as some of them indeed rely on
> phys_addr_t internally, but I missed other things pointed out by
> Robin. Do we have here any alternative?

I think it is less of a code simplification and more a reduction in
conceptual load. When we can say directly there is no struct page type
anywhere in the DMA API layers then we only have to reason about
kmap/phys_to_virt compatibility.

This is also a weaker overall requirement than needing an actual
struct page, which allows optimizing other parts of the kernel. Like we
aren't forced to create MEMORY_DEVICE_PCI_P2PDMA struct pages just to
use the DMA API.

Again, every place in the kernel where we can get rid of struct page
makes the road smoother for the MM side struct page restructuring.

For example, one of the bigger eventual goals here is to make a bio_vec
store phys_addr_t, not struct page pointers.
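
Purely as a sketch of that direction (illustrative only, not claiming
this is the final structure):

	struct phys_vec {
		phys_addr_t	paddr;
		u32		len;
	};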

DMA API is not alone here, we have been de-struct-paging the kernel
for a long time now:

netdev: https://lore.kernel.org/linux-mm/20250609043225.77229-1-byungchul@sk.com/
slab: https://lore.kernel.org/linux-mm/20211201181510.18784-1-vbabka@suse.cz/
iommmu: https://lore.kernel.org/all/0-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com/
page tables: https://lore.kernel.org/linux-mm/20230731170332.69404-1-vishal.moola@gmail.com/
zswap: https://lore.kernel.org/all/20241216150450.1228021-1-42.hyeyoo@gmail.com/

With a long term goal that struct page only exists for legacy code,
and is maybe entirely compiled out of modern server kernels.

> Second - making dma_map_phys() a single API to handle all cases.
> 
> Do we really need such a single function to handle all cases? 

If we accept the direction to remove struct page then it makes little
sense to have a dma_map_ram(phys_addr) and dma_map_resource(phys_addr)
and force key callers (like block) to have more ifs - especially if
the conditional could become "free" inside the dma API (see below).

Plus if we keep the callchain split then adding a
"dma_link_resource"/etc is now needed as well.

> DMA_ATTR_MMIO for every typical DMA user? I know that branching is 
> cheap, but this will probably increase code size for most of the typical 
> users for no reason.

Well, having two call chains will increase the code size much more,
and 'resource' can't be compiled out. Arguably this unification should
reduce the .text size since many of the resource only functions go
away.

There are some branches, and I think the push toward re-using
DMA_ATTR_SKIP_CPU_SYNC was directly to try to reduce that branch
cost.

However, I think we should be looking for a design here that is "free"
on the fast no-swiotlb and no-cache-flush path. I think this can be
achieved by checking ATTR_MMIO only after seeing swiotlb is needed
(like today's P2P check). And we can probably freely fold it into
the existing sync check:

	if ((attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)) == 0)

I saw Leon hasn't done these micro optimizations, but it seems like it
could work out.
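
Concretely, something along these lines on the direct-map side (sketch
only, assuming the dma_direct_map_phys() naming from this series):

	if (!dev_is_dma_coherent(dev) &&
	    !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
		arch_sync_dma_for_device(phys, size, dir);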

Regards,
Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/16] dma-mapping: migrate to physical address-based API
  2025-08-09 13:34     ` Jason Gunthorpe
@ 2025-08-09 16:53       ` Demi Marie Obenour
  2025-08-10 17:02         ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Demi Marie Obenour @ 2025-08-09 16:53 UTC (permalink / raw)
  To: Jason Gunthorpe, Marek Szyprowski
  Cc: Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel


On 8/9/25 09:34, Jason Gunthorpe wrote:
> On Fri, Aug 08, 2025 at 08:51:08PM +0200, Marek Szyprowski wrote:
>> First - basing the API on the phys_addr_t.
>>
>> Page based API had the advantage that it was really hard to abuse it and 
>> call for something that is not 'a normal RAM'. 
> 
> This is not true anymore. Today we have ZONE_DEVICE as a struct page
> type with a whole bunch of non-dram sub-types:
> 
> enum memory_type {
> 	/* 0 is reserved to catch uninitialized type fields */
> 	MEMORY_DEVICE_PRIVATE = 1,
> 	MEMORY_DEVICE_COHERENT,
> 	MEMORY_DEVICE_FS_DAX,
> 	MEMORY_DEVICE_GENERIC,
> 	MEMORY_DEVICE_PCI_P2PDMA,
> };
> 
> Few of which are kmappable/page_to_virtable() in a way that is useful
> for the DMA API.
> 
> DMA API sort of ignores all of this and relies on the caller to not
> pass in an incorrect struct page. eg we rely on things like the block
> stack to do the right stuff when a MEMORY_DEVICE_PCI_P2PDMA is present
> in a bio_vec.
> 
> Which is not really fundamentally different from just using
> phys_addr_t in the first place.
> 
> Sure, this was a stronger argument when this stuff was originally
> written, before ZONE_DEVICE was invented.
> 
>> I initially thought that a phys_addr_t based API would somehow simplify
>> arch-specific implementations, as some of them indeed rely on
>> phys_addr_t internally, but I missed other things pointed out by
>> Robin. Do we have here any alternative?
> 
> I think it is less of a code simplification and more a reduction in
> conceptual load. When we can say directly there is no struct page type
> anywhere in the DMA API layers then we only have to reason about
> kmap/phys_to_virt compatibility.
> 
> This is also a weaker overall requirement than needing an actual
> struct page, which allows optimizing other parts of the kernel. Like we
> aren't forced to create MEMORY_DEVICE_PCI_P2PDMA struct pages just to
> use the DMA API.
> 
> Again, every place in the kernel where we can get rid of struct page
> makes the road smoother for the MM side struct page restructuring.
> 
> For example, one of the bigger eventual goals here is to make a bio_vec
> store phys_addr_t, not struct page pointers.
> 
> DMA API is not alone here, we have been de-struct-paging the kernel
> for a long time now:
> 
> netdev: https://lore.kernel.org/linux-mm/20250609043225.77229-1-byungchul@sk.com/
> slab: https://lore.kernel.org/linux-mm/20211201181510.18784-1-vbabka@suse.cz/
> iommmu: https://lore.kernel.org/all/0-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com/
> page tables: https://lore.kernel.org/linux-mm/20230731170332.69404-1-vishal.moola@gmail.com/
> zswap: https://lore.kernel.org/all/20241216150450.1228021-1-42.hyeyoo@gmail.com/
> 
> With a long term goal that struct page only exists for legacy code,
> and is maybe entirely compiled out of modern server kernels.

Why just server kernels?  I suspect client systems actually run
newer kernels than servers do.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 00/16] dma-mapping: migrate to physical address-based API
  2025-08-09 16:53       ` Demi Marie Obenour
@ 2025-08-10 17:02         ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-10 17:02 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Sat, Aug 09, 2025 at 12:53:09PM -0400, Demi Marie Obenour wrote:
> > With a long term goal that struct page only exists for legacy code,
> > and is maybe entirely compiled out of modern server kernels.
> 
> Why just server kernels?  I suspect client systems actually run
> newer kernels than servers do.

I would guess this is because of the people who are interested in this
work. Frankly there isn't much benefit for small-memory client
systems. Modern servers have > 1TB of memory and struct page really
hurts here.

The flip side of this is that the work is enormous, and I think there is a
general idea that the smaller set of server-related drivers and
subsystems will get ready well before the wider universe of stuff a
client or Android might use.

It is not that more can't happen; it just ultimately depends on
interest and time.

Many modern servers use quite new kernels if you ignore the enterprise
distros :\

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-07 12:21   ` Jason Gunthorpe
@ 2025-08-13 15:07     ` Leon Romanovsky
  2025-08-14 12:13       ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-13 15:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Thu, Aug 07, 2025 at 09:21:15AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 04, 2025 at 03:42:42PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Convert the KMSAN DMA handling function from page-based to physical
> > address-based interface.
> > 
> > The refactoring renames kmsan_handle_dma() parameters from accepting
> > (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> > size_t size). A pfn_valid() check is added to prevent KMSAN operations
> > on non-page memory, i.e. addresses not backed by struct page.
> > 
> > As part of this change, support for highmem addresses is implemented
> > using kmap_local_page() to handle both lowmem and highmem regions
> > properly. All callers throughout the codebase are updated to use the
> > new phys_addr_t based interface.
> 
> Use the function Matthew pointed at kmap_local_pfn()
> 
> Maybe introduce the kmap_local_phys() he suggested too.

At this point it doesn't give us anything.

> 
> >  /* Helper function to handle DMA data transfers. */
> > -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> > +void kmsan_handle_dma(phys_addr_t phys, size_t size,
> >  		      enum dma_data_direction dir)
> >  {
> >  	u64 page_offset, to_go, addr;
> > +	struct page *page;
> > +	void *kaddr;
> >  
> > -	if (PageHighMem(page))
> > +	if (!pfn_valid(PHYS_PFN(phys)))
> >  		return;
> 
> Not needed, the caller must pass in a phys that is kmap
> compatible. Maybe just leave a comment. FWIW today this is also not
> checking for P2P or DEVICE non-kmap struct pages either, so it should
> be fine without checks.

That is not true, as we will call kmsan_handle_dma() unconditionally in
dma_map_phys(). The reason for it is that kmsan_handle_dma() is guarded
by debug kconfig options, so the cost of pfn_valid() can be accommodated
in that case. It gives cleaner DMA code.

   155 dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
   156                 enum dma_data_direction dir, unsigned long attrs)
   157 {
   <...>
   187
   188         kmsan_handle_dma(phys, size, dir);
   189         trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
   190         debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
   191
   192         return addr;
   193 }
   194 EXPORT_SYMBOL_GPL(dma_map_phys);

So let's keep this patch as is.

Thanks

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 16/16] nvme-pci: unmap MMIO pages with appropriate interface
  2025-08-07 13:45   ` Jason Gunthorpe
@ 2025-08-13 15:37     ` Leon Romanovsky
  0 siblings, 0 replies; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-13 15:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Thu, Aug 07, 2025 at 10:45:33AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 04, 2025 at 03:42:50PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Block layer maps MMIO memory through dma_map_phys() interface
> > with help of DMA_ATTR_MMIO attribute. There is a need to unmap
> > that memory with the appropriate unmap function.
> 
> Be specific. AFAICT the issue is that on dma_ops platforms the map
> will call ops->map_resource for ATTR_MMIO, so we must have the unmap
> call ops->unmap_resource.
> 
> Maybe these patches should be swapped then, as adding ATTR_MMIO seems
> like it created this issue?

The best variant would be to squash the previous patch "block-dma: properly
take MMIO path" into this one, but I don't want to mix them as they are for
different kernel areas.

Thanks

> 
> Jason
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-13 15:07     ` Leon Romanovsky
@ 2025-08-14 12:13       ` Jason Gunthorpe
  2025-08-14 12:35         ` Leon Romanovsky
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-14 12:13 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Wed, Aug 13, 2025 at 06:07:18PM +0300, Leon Romanovsky wrote:
> > >  /* Helper function to handle DMA data transfers. */
> > > -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> > > +void kmsan_handle_dma(phys_addr_t phys, size_t size,
> > >  		      enum dma_data_direction dir)
> > >  {
> > >  	u64 page_offset, to_go, addr;
> > > +	struct page *page;
> > > +	void *kaddr;
> > >  
> > > -	if (PageHighMem(page))
> > > +	if (!pfn_valid(PHYS_PFN(phys)))
> > >  		return;
> > 
> > Not needed, the caller must pass in a phys that is kmap
> > compatible. Maybe just leave a comment. FWIW today this is also not
> > checking for P2P or DEVICE non-kmap struct pages either, so it should
> > be fine without checks.
> 
> > That is not true, as we will call kmsan_handle_dma() unconditionally in
> > dma_map_phys(). The reason for it is that kmsan_handle_dma() is guarded
> > by debug kconfig options, so the cost of pfn_valid() can be accommodated
> > in that case. It gives cleaner DMA code.

Then check attrs here, not pfn_valid.

> So let's keep this patch as is.

Still need to fix the remarks you clipped: do not check PageHighMem,
just call kmap_local_pfn(). All this PageHighMem stuff is new to this
patch and should not be here; it is the wrong way to use highmem.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-14 12:13       ` Jason Gunthorpe
@ 2025-08-14 12:35         ` Leon Romanovsky
  2025-08-14 12:44           ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-14 12:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Thu, Aug 14, 2025 at 09:13:16AM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 13, 2025 at 06:07:18PM +0300, Leon Romanovsky wrote:
> > > >  /* Helper function to handle DMA data transfers. */
> > > > -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> > > > +void kmsan_handle_dma(phys_addr_t phys, size_t size,
> > > >  		      enum dma_data_direction dir)
> > > >  {
> > > >  	u64 page_offset, to_go, addr;
> > > > +	struct page *page;
> > > > +	void *kaddr;
> > > >  
> > > > -	if (PageHighMem(page))
> > > > +	if (!pfn_valid(PHYS_PFN(phys)))
> > > >  		return;
> > > 
> > > Not needed, the caller must pass in a phys that is kmap
> > > compatible. Maybe just leave a comment. FWIW today this is also not
> > > checking for P2P or DEVICE non-kmap struct pages either, so it should
> > > be fine without checks.
> > 
> > > That is not true, as we will call kmsan_handle_dma() unconditionally in
> > > dma_map_phys(). The reason for it is that kmsan_handle_dma() is guarded
> > > by debug kconfig options, so the cost of pfn_valid() can be accommodated
> > > in that case. It gives cleaner DMA code.
> 
> Then check attrs here, not pfn_valid.

attrs are not available in kmsan_handle_dma(). I can add it if you prefer.

> 
> > So let's keep this patch as is.
> 
> Still need to fix the remarks you clipped: do not check PageHighMem,
> just call kmap_local_pfn(). All this PageHighMem stuff is new to this
> patch and should not be here; it is the wrong way to use highmem.

Sure, thanks

> 
> Jason
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-14 12:35         ` Leon Romanovsky
@ 2025-08-14 12:44           ` Jason Gunthorpe
  2025-08-14 13:31             ` Leon Romanovsky
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-14 12:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Thu, Aug 14, 2025 at 03:35:06PM +0300, Leon Romanovsky wrote:
> > Then check attrs here, not pfn_valid.
> 
> attrs are not available in kmsan_handle_dma(). I can add it if you prefer.

That makes more sense to the overall design. The comments I gave
before were driving at a promise to never try to touch a struct page
for ATTR_MMIO, and I think this should be comprehensive: never touch
a struct page even if pfn_valid().

> > > So let's keep this patch as is.
> > 
> > > Still need to fix the remarks you clipped: do not check PageHighMem,
> > > just call kmap_local_pfn(). All this PageHighMem stuff is new to this
> > > patch and should not be here; it is the wrong way to use highmem.
> 
> Sure, thanks

I am wondering if there is some reason it was written like this in the
first place. Maybe we can't even do kmap here... So perhaps, if there is
not a strong reason to change it, just continue to check PageHighMem
and fail.

if (!(attrs & ATTR_MMIO) && PageHighMem(phys_to_page(phys)))
   return;

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-14 12:44           ` Jason Gunthorpe
@ 2025-08-14 13:31             ` Leon Romanovsky
  2025-08-14 14:14               ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Leon Romanovsky @ 2025-08-14 13:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Thu, Aug 14, 2025 at 09:44:48AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 14, 2025 at 03:35:06PM +0300, Leon Romanovsky wrote:
> > > Then check attrs here, not pfn_valid.
> > 
> > attrs are not available in kmsan_handle_dma(). I can add it if you prefer.
> 
> > That makes more sense to the overall design. The comments I gave
> > before were driving at a promise to never try to touch a struct page
> > for ATTR_MMIO, and I think this should be comprehensive: never touch
> > a struct page even if pfn_valid().
> 
> > > > So let's keep this patch as is.
> > > 
> > > > Still need to fix the remarks you clipped: do not check PageHighMem,
> > > > just call kmap_local_pfn(). All this PageHighMem stuff is new to this
> > > > patch and should not be here; it is the wrong way to use highmem.
> > 
> > Sure, thanks
> 
> > I am wondering if there is some reason it was written like this in the
> > first place. Maybe we can't even do kmap here... So perhaps, if there is
> > not a strong reason to change it, just continue to check PageHighMem
> > and fail.
> 
> if (!(attrs & ATTR_MMIO) && PageHighMem(phys_to_page(phys)))
>    return;

Is this version good enough? There is no need to call
kmap_local_pfn() if we prevent PageHighMem pages.

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index eab7912a3bf0..d9cf70f4159c 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -337,13 +337,13 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,

 /* Helper function to handle DMA data transfers. */
 void kmsan_handle_dma(phys_addr_t phys, size_t size,
-                     enum dma_data_direction dir)
+                     enum dma_data_direction dir, unsigned long attrs)
 {
        u64 page_offset, to_go, addr;
        struct page *page;
        void *kaddr;

-       if (!pfn_valid(PHYS_PFN(phys)))
+       if ((attrs & DMA_ATTR_MMIO) || PageHighMem(phys_to_page(phys)))
                return;

        page = phys_to_page(phys);
@@ -357,19 +357,12 @@ void kmsan_handle_dma(phys_addr_t phys, size_t size,
        while (size > 0) {
                to_go = min(PAGE_SIZE - page_offset, (u64)size);

-               if (PageHighMem(page))
-                       /* Handle highmem pages using kmap */
-                       kaddr = kmap_local_page(page);
-               else
-                       /* Lowmem pages can be accessed directly */
-                       kaddr = page_address(page);
+               /* Lowmem pages can be accessed directly */
+               kaddr = page_address(page);

                addr = (u64)kaddr + page_offset;
                kmsan_handle_dma_page((void *)addr, to_go, dir);

-               if (PageHighMem(page))
-                       kunmap_local(page);
-
                phys += to_go;
                size -= to_go;

> 
> Jason
> 

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-14 13:31             ` Leon Romanovsky
@ 2025-08-14 14:14               ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-08-14 14:14 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Thu, Aug 14, 2025 at 04:31:06PM +0300, Leon Romanovsky wrote:
> On Thu, Aug 14, 2025 at 09:44:48AM -0300, Jason Gunthorpe wrote:
> > On Thu, Aug 14, 2025 at 03:35:06PM +0300, Leon Romanovsky wrote:
> > > > Then check attrs here, not pfn_valid.
> > > 
> > > attrs are not available in kmsan_handle_dma(). I can add it if you prefer.
> > 
> > That makes more sense to the overall design. The comments I gave
> > before were driving at a promise to never try to touch a struct page
> > for ATTR_MMIO and think this should be comphrensive to never touching
> > a struct page even if pfnvalid.
> > 
> > > > > So let's keep this patch as is.
> > > > 
> > > > Still need to fix the remarks you clipped, do not check PageHighMem
> > > > just call kmap_local_pfn(). All thie PageHighMem stuff is new to this
> > > > patch and should not be here, it is the wrong way to use highmem.
> > > 
> > > Sure, thanks
> > 
> > I am wondering if there is some reason it was written like this in the
> > first place. Maybe we can't even do kmap here.. So perhaps if there is
> > not a strong reason to change it just continue to check pagehighmem
> > and fail.
> > 
> > if (!(attrs & ATTR_MMIO) && PageHighMem(phys_to_page(phys)))
> >    return;
> 
> Is this version good enough? There is no need to call
> kmap_local_pfn() if we prevent PageHighMem pages.

Why make the rest of the changes though, isn't it just:

        if (PageHighMem(page))
                return;

Becomes:

        if (attrs & ATTR_MMIO)
                return;

	page = phys_to_page(phys);
	if (PageHighMem(page))
                 return;

Leave the rest as is?

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2025-08-14 14:14 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-04 12:42 [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
2025-08-04 12:42 ` [PATCH v1 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
2025-08-06 17:31   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 02/16] iommu/dma: handle MMIO path in dma_iova_link Leon Romanovsky
2025-08-06 18:10   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
2025-08-06 18:26   ` Jason Gunthorpe
2025-08-06 18:38     ` Leon Romanovsky
2025-08-04 12:42 ` [PATCH v1 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
2025-08-04 12:42 ` [PATCH v1 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
2025-08-06 18:44   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
2025-08-07 12:07   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
2025-08-07 12:13   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
2025-08-07 12:21   ` Jason Gunthorpe
2025-08-13 15:07     ` Leon Romanovsky
2025-08-14 12:13       ` Jason Gunthorpe
2025-08-14 12:35         ` Leon Romanovsky
2025-08-14 12:44           ` Jason Gunthorpe
2025-08-14 13:31             ` Leon Romanovsky
2025-08-14 14:14               ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
2025-08-07 13:08   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
2025-08-07 14:40   ` Jürgen Groß
2025-08-04 12:42 ` [PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
2025-08-07 13:38   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
2025-08-07 13:14   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
2025-08-07 13:14   ` Jason Gunthorpe
2025-08-04 12:42 ` [PATCH v1 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
2025-08-04 12:42 ` [PATCH v1 15/16] block-dma: properly take MMIO path Leon Romanovsky
2025-08-04 12:42 ` [PATCH v1 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
2025-08-07 13:45   ` Jason Gunthorpe
2025-08-13 15:37     ` Leon Romanovsky
2025-08-07 14:19 ` [PATCH v1 00/16] dma-mapping: migrate to physical address-based API Jason Gunthorpe
2025-08-08 18:51   ` Marek Szyprowski
2025-08-09 13:34     ` Jason Gunthorpe
2025-08-09 16:53       ` Demi Marie Obenour
2025-08-10 17:02         ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).