linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/8] dma-mapping: migrate to physical address-based API
@ 2025-06-25 13:18 ` Leon Romanovsky
  2025-06-25 13:18   ` [PATCH 1/8] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
                     ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:18 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

This series refactors the DMA mapping to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality where
DMA operations work with physical addresses, not page structures.

The series consists of 8 patches that progressively convert the DMA
mapping infrastructure from page-based to physical address-based APIs.

The series maintains backward compatibility by keeping the old
page-based API as wrapper functions around the new physical
address-based implementations.
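
To illustrate, a typical driver-side conversion would look roughly like
this (sketch only; dma_map_phys()/dma_unmap_phys() come from patch 7
below, while the surrounding driver variables are hypothetical):

	-	dma = dma_map_page(dev, page, offset, len, DMA_TO_DEVICE);
	+	dma = dma_map_phys(dev, page_to_phys(page) + offset, len,
	+			   DMA_TO_DEVICE, 0);
	 	...
	-	dma_unmap_page(dev, dma, len, DMA_TO_DEVICE);
	+	dma_unmap_phys(dev, dma, len, DMA_TO_DEVICE, 0);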

Thanks

Leon Romanovsky (8):
  dma-debug: refactor to use physical addresses for page mapping
  dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  kmsan: convert kmsan_handle_dma to use physical addresses
  dma-mapping: fail early if physical address is mapped through platform
    callback
  dma-mapping: export new dma_*map_phys() interface
  mm/hmm: migrate to physical address-based DMA mapping API

 Documentation/core-api/dma-api.rst |  4 +-
 arch/powerpc/kernel/dma-iommu.c    |  4 +-
 drivers/iommu/dma-iommu.c          | 14 +++----
 drivers/virtio/virtio_ring.c       |  4 +-
 include/linux/dma-map-ops.h        |  8 ++--
 include/linux/dma-mapping.h        | 13 ++++++
 include/linux/iommu-dma.h          |  7 ++--
 include/linux/kmsan.h              | 12 +++---
 include/trace/events/dma.h         |  4 +-
 kernel/dma/debug.c                 | 28 ++++++++-----
 kernel/dma/debug.h                 | 16 ++++---
 kernel/dma/direct.c                |  6 +--
 kernel/dma/direct.h                | 13 +++---
 kernel/dma/mapping.c               | 67 +++++++++++++++++++++---------
 kernel/dma/ops_helpers.c           |  6 +--
 mm/hmm.c                           |  8 ++--
 mm/kmsan/hooks.c                   | 36 ++++++++++++----
 tools/virtio/linux/kmsan.h         |  2 +-
 18 files changed, 159 insertions(+), 93 deletions(-)

-- 
2.49.0



* [PATCH 1/8] dma-debug: refactor to use physical addresses for page mapping
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
@ 2025-06-25 13:18   ` Leon Romanovsky
  2025-06-25 13:18   ` [PATCH 2/8] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:18 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

Convert the DMA debug infrastructure from page-based to physical
address-based mapping, in preparation for DMA mapping routines that
operate on physical addresses directly.

The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy type is introduced to distinguish physical address mappings
from other debug entry types. All callers throughout the codebase are updated
to pass physical addresses directly, eliminating the need for page-to-physical
conversion in the debug layer.

This makes the debug layer more efficient and keeps it consistent with
the DMA mapping API's move toward physical addresses.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/core-api/dma-api.rst |  4 ++--
 kernel/dma/debug.c                 | 28 +++++++++++++++++-----------
 kernel/dma/debug.h                 | 16 +++++++---------
 kernel/dma/mapping.c               | 15 ++++++++-------
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 2ad08517e626..7491ee85ab25 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -816,7 +816,7 @@ example warning message may look like this::
 	[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
 	[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
 	[<ffffffff803c7ea3>] check_unmap+0x203/0x490
-	[<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+	[<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
 	[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
 	[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
 	[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@@ -910,7 +910,7 @@ that a driver may be leaking mappings.
 dma-debug interface debug_dma_mapping_error() to debug drivers that fail
 to check DMA mapping errors on addresses returned by dma_map_single() and
 dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
 the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
 this flag is still set, prints warning message that includes call trace that
 leads up to the unmap. This interface can be called from dma_mapping_error()
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index e43c6de2bce4..517dc58329e0 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
 	dma_debug_sg,
 	dma_debug_coherent,
 	dma_debug_resource,
+	dma_debug_phy,
 };
 
 enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
 	[dma_debug_sg] = "scatter-gather",
 	[dma_debug_coherent] = "coherent",
 	[dma_debug_resource] = "resource",
+	[dma_debug_phy] = "phy",
 };
 
 static const char *dir2name[] = {
@@ -1201,9 +1203,8 @@ void debug_dma_map_single(struct device *dev, const void *addr,
 }
 EXPORT_SYMBOL(debug_dma_map_single);
 
-void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
-			size_t size, int direction, dma_addr_t dma_addr,
-			unsigned long attrs)
+void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		int direction, dma_addr_t dma_addr, unsigned long attrs)
 {
 	struct dma_debug_entry *entry;
 
@@ -1218,19 +1219,24 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 		return;
 
 	entry->dev       = dev;
-	entry->type      = dma_debug_single;
-	entry->paddr	 = page_to_phys(page) + offset;
+	entry->type      = dma_debug_phy;
+	entry->paddr	 = phys;
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
 	entry->map_err_type = MAP_ERR_NOT_CHECKED;
 
-	check_for_stack(dev, page, offset);
+	if (pfn_valid(PHYS_PFN(phys))) {
+		struct page *page = phys_to_page(phys);
+		size_t offset = offset_in_page(phys);
 
-	if (!PageHighMem(page)) {
-		void *addr = page_address(page) + offset;
+		check_for_stack(dev, page, offset);
 
-		check_for_illegal_area(dev, addr, size);
+		if (!PageHighMem(page)) {
+			void *addr = page_address(page) + offset;
+
+			check_for_illegal_area(dev, addr, size);
+		}
 	}
 
 	add_dma_entry(entry, attrs);
@@ -1274,11 +1280,11 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 }
 EXPORT_SYMBOL(debug_dma_mapping_error);
 
-void debug_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+void debug_dma_unmap_phys(struct device *dev, dma_addr_t dma_addr,
 			  size_t size, int direction)
 {
 	struct dma_debug_entry ref = {
-		.type           = dma_debug_single,
+		.type           = dma_debug_phy,
 		.dev            = dev,
 		.dev_addr       = dma_addr,
 		.size           = size,
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae..76adb42bffd5 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -9,12 +9,11 @@
 #define _KERNEL_DMA_DEBUG_H
 
 #ifdef CONFIG_DMA_API_DEBUG
-extern void debug_dma_map_page(struct device *dev, struct page *page,
-			       size_t offset, size_t size,
-			       int direction, dma_addr_t dma_addr,
+extern void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+			       size_t size, int direction, dma_addr_t dma_addr,
 			       unsigned long attrs);
 
-extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+extern void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction);
 
 extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
@@ -55,14 +54,13 @@ extern void debug_dma_sync_sg_for_device(struct device *dev,
 					 struct scatterlist *sg,
 					 int nelems, int direction);
 #else /* CONFIG_DMA_API_DEBUG */
-static inline void debug_dma_map_page(struct device *dev, struct page *page,
-				      size_t offset, size_t size,
-				      int direction, dma_addr_t dma_addr,
-				      unsigned long attrs)
+static inline void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+				      size_t size, int direction,
+				      dma_addr_t dma_addr, unsigned long attrs)
 {
 }
 
-static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
 					size_t size, int direction)
 {
 }
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 107e4a4d251d..4c1dfbabb8ae 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -157,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
+	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
@@ -165,16 +166,15 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
+	    arch_dma_map_page_direct(dev, phys + size))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
-	trace_dma_map_page(dev, page_to_phys(page) + offset, addr, size, dir,
-			   attrs);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
+	trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
 	return addr;
 }
@@ -194,7 +194,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_page(dev, addr, size, dir, attrs);
-	debug_dma_unmap_page(dev, addr, size, dir);
+	debug_dma_unmap_phys(dev, addr, size, dir);
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
@@ -712,7 +712,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 	if (page) {
 		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
 				      size, dir, gfp, 0);
-		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+		debug_dma_map_phys(dev, page_to_phys(page), size, dir,
+				   *dma_handle, 0);
 	} else {
 		trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
 	}
@@ -738,7 +739,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 	trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
-	debug_dma_unmap_page(dev, dma_handle, size, dir);
+	debug_dma_unmap_phys(dev, dma_handle, size, dir);
 	__dma_free_pages(dev, size, page, dma_handle, dir);
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
-- 
2.49.0



* [PATCH 2/8] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
  2025-06-25 13:18   ` [PATCH 1/8] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
@ 2025-06-25 13:18   ` Leon Romanovsky
  2025-06-25 13:19   ` [PATCH 3/8] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:18 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

In preparation for the upcoming map_page -> map_phys API conversion,
rename trace_dma_*map_page() to trace_dma_*map_phys().

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/trace/events/dma.h | 4 ++--
 kernel/dma/mapping.c       | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c..c77d478b6deb 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -71,7 +71,7 @@ DEFINE_EVENT(dma_map, name, \
 		 size_t size, enum dma_data_direction dir, unsigned long attrs), \
 	TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))
 
-DEFINE_MAP_EVENT(dma_map_page);
+DEFINE_MAP_EVENT(dma_map_phys);
 DEFINE_MAP_EVENT(dma_map_resource);
 
 DECLARE_EVENT_CLASS(dma_unmap,
@@ -109,7 +109,7 @@ DEFINE_EVENT(dma_unmap, name, \
 		 enum dma_data_direction dir, unsigned long attrs), \
 	TP_ARGS(dev, addr, size, dir, attrs))
 
-DEFINE_UNMAP_EVENT(dma_unmap_page);
+DEFINE_UNMAP_EVENT(dma_unmap_phys);
 DEFINE_UNMAP_EVENT(dma_unmap_resource);
 
 DECLARE_EVENT_CLASS(dma_alloc_class,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 4c1dfbabb8ae..fe1f0da6dc50 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -173,7 +173,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
-	trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
 	return addr;
@@ -193,7 +193,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		iommu_dma_unmap_page(dev, addr, size, dir, attrs);
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
-	trace_dma_unmap_page(dev, addr, size, dir, attrs);
+	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
-- 
2.49.0



* [PATCH 3/8] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
  2025-06-25 13:18   ` [PATCH 1/8] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
  2025-06-25 13:18   ` [PATCH 2/8] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
@ 2025-06-25 13:19   ` Leon Romanovsky
  2025-06-25 13:19   ` [PATCH 4/8] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.

The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.

All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.
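
For reviewers: the scatterlist hunk relies on sg_phys(), which in
include/linux/scatterlist.h is essentially the helper below, so
sg_phys(s) carries the same information as the old sg_page(s) +
s->offset pair:

	static inline dma_addr_t sg_phys(struct scatterlist *sg)
	{
		return page_to_phys(sg_page(sg)) + sg->offset;
	}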

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 14 ++++++--------
 include/linux/iommu-dma.h |  7 +++----
 kernel/dma/mapping.c      |  4 ++--
 kernel/dma/ops_helpers.c  |  6 +++---
 4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fe..cd4bc22efa96 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1190,11 +1190,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
 	return iova_offset(iovad, phys | size);
 }
 
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-	      unsigned long offset, size_t size, enum dma_data_direction dir,
-	      unsigned long attrs)
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs)
 {
-	phys_addr_t phys = page_to_phys(page) + offset;
 	bool coherent = dev_is_dma_coherent(dev);
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1222,7 +1220,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	return iova;
 }
 
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1341,7 +1339,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		iommu_dma_unmap_page(dev, sg_dma_address(s),
+		iommu_dma_unmap_phys(dev, sg_dma_address(s),
 				sg_dma_len(s), dir, attrs);
 }
 
@@ -1354,8 +1352,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
 	sg_dma_mark_swiotlb(sg);
 
 	for_each_sg(sg, s, nents, i) {
-		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
-				s->offset, s->length, dir, attrs);
+		sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s),
+				s->length, dir, attrs);
 		if (sg_dma_address(s) == DMA_MAPPING_ERROR)
 			goto out_unmap;
 		sg_dma_len(s) = s->length;
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 508beaa44c39..485bdffed988 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -21,10 +21,9 @@ static inline bool use_dma_iommu(struct device *dev)
 }
 #endif /* CONFIG_IOMMU_DMA */
 
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir,
-		unsigned long attrs);
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs);
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction dir, unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fe1f0da6dc50..58482536db9b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	    arch_dma_map_page_direct(dev, phys + size))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else if (use_dma_iommu(dev))
-		addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
+		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
@@ -190,7 +190,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	    arch_dma_unmap_page_direct(dev, addr + size))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
-		iommu_dma_unmap_page(dev, addr, size, dir, attrs);
+		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 9afd569eadb9..6f9d604d9d40 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -72,8 +72,8 @@ struct page *dma_common_alloc_pages(struct device *dev, size_t size,
 		return NULL;
 
 	if (use_dma_iommu(dev))
-		*dma_handle = iommu_dma_map_page(dev, page, 0, size, dir,
-						 DMA_ATTR_SKIP_CPU_SYNC);
+		*dma_handle = iommu_dma_map_phys(dev, page_to_phys(page), size,
+						 dir, DMA_ATTR_SKIP_CPU_SYNC);
 	else
 		*dma_handle = ops->map_page(dev, page, 0, size, dir,
 					    DMA_ATTR_SKIP_CPU_SYNC);
@@ -92,7 +92,7 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	if (use_dma_iommu(dev))
-		iommu_dma_unmap_page(dev, dma_handle, size, dir,
+		iommu_dma_unmap_phys(dev, dma_handle, size, dir,
 				     DMA_ATTR_SKIP_CPU_SYNC);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, dma_handle, size, dir,
-- 
2.49.0



* [PATCH 4/8] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
                     ` (2 preceding siblings ...)
  2025-06-25 13:19   ` [PATCH 3/8] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
@ 2025-06-25 13:19   ` Leon Romanovsky
  2025-06-25 13:19   ` [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with pfn_valid(PHYS_PFN(phys))
checks. This validates non-page-backed memory regions directly, without
requiring a "fake" struct page.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 arch/powerpc/kernel/dma-iommu.c |  4 ++--
 include/linux/dma-map-ops.h     |  8 ++++----
 kernel/dma/direct.c             |  6 +++---
 kernel/dma/direct.h             | 13 ++++++-------
 kernel/dma/mapping.c            |  8 ++++----
 5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 4d64a5db50f3..0359ab72cd3b 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,7 +14,7 @@
 #define can_map_direct(dev, addr) \
 	((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))
 
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr)
 {
 	if (likely(!dev->bus_dma_limit))
 		return false;
@@ -24,7 +24,7 @@ bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
 
 #define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)
 
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle)
 {
 	if (likely(!dev->bus_dma_limit))
 		return false;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f48e5fb88bd5..71f5b3025415 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -392,15 +392,15 @@ void *arch_dma_set_uncached(void *addr, size_t size);
 void arch_dma_clear_uncached(void *addr, size_t size);
 
 #ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle);
 bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
 		int nents);
 bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
 		int nents);
 #else
-#define arch_dma_map_page_direct(d, a)		(false)
-#define arch_dma_unmap_page_direct(d, a)	(false)
+#define arch_dma_map_phys_direct(d, a)		(false)
+#define arch_dma_unmap_phys_direct(d, a)	(false)
 #define arch_dma_map_sg_direct(d, s, n)		(false)
 #define arch_dma_unmap_sg_direct(d, s, n)	(false)
 #endif
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 24c359d9c879..fa75e3070073 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -453,7 +453,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		if (sg_dma_is_bus_address(sg))
 			sg_dma_unmark_bus_address(sg);
 		else
-			dma_direct_unmap_page(dev, sg->dma_address,
+			dma_direct_unmap_phys(dev, sg->dma_address,
 					      sg_dma_len(sg), dir, attrs);
 	}
 }
@@ -476,8 +476,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 			 */
 			break;
 		case PCI_P2PDMA_MAP_NONE:
-			sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
-					sg->offset, sg->length, dir, attrs);
+			sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+					sg->length, dir, attrs);
 			if (sg->dma_address == DMA_MAPPING_ERROR) {
 				ret = -EIO;
 				goto out_unmap;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index d2c0b7e632fc..10c1ba73c482 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -80,22 +80,21 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 		arch_dma_mark_clean(paddr, size);
 }
 
-static inline dma_addr_t dma_direct_map_page(struct device *dev,
-		struct page *page, unsigned long offset, size_t size,
-		enum dma_data_direction dir, unsigned long attrs)
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+		phys_addr_t phys, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
 {
-	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
 	if (is_swiotlb_force_bounce(dev)) {
-		if (is_pci_p2pdma_page(page))
+		if (!pfn_valid(PHYS_PFN(phys)))
 			return DMA_MAPPING_ERROR;
 		return swiotlb_map(dev, phys, size, dir, attrs);
 	}
 
 	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
 	    dma_kmalloc_needs_bounce(dev, size, dir)) {
-		if (is_pci_p2pdma_page(page))
+		if (!pfn_valid(PHYS_PFN(phys)))
 			return DMA_MAPPING_ERROR;
 		if (is_swiotlb_active(dev))
 			return swiotlb_map(dev, phys, size, dir, attrs);
@@ -111,7 +110,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
 	return dma_addr;
 }
 
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	phys_addr_t phys = dma_to_phys(dev, addr);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58482536db9b..80481a873340 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -166,8 +166,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_page_direct(dev, phys + size))
-		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+	    arch_dma_map_phys_direct(dev, phys + size))
+		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
@@ -187,8 +187,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 
 	BUG_ON(!valid_dma_direction(dir));
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_unmap_page_direct(dev, addr + size))
-		dma_direct_unmap_page(dev, addr, size, dir, attrs);
+	    arch_dma_unmap_phys_direct(dev, addr + size))
+		dma_direct_unmap_phys(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
 	else
-- 
2.49.0



* [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
                     ` (3 preceding siblings ...)
  2025-06-25 13:19   ` [PATCH 4/8] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
@ 2025-06-25 13:19   ` Leon Romanovsky
  2025-06-26 17:43     ` Alexander Potapenko
  2025-06-25 13:19   ` [PATCH 6/8] dma-mapping: fail early if physical address is mapped through platform callback Leon Romanovsky
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

Convert the KMSAN DMA handling function from a page-based to a
physical address-based interface.

The refactoring changes the kmsan_handle_dma() signature from
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). A pfn_valid() check is added so that KMSAN skips
addresses that are not backed by a struct page.

As part of this change, support for highmem addresses is implemented
using kmap_local_page() to handle both lowmem and highmem regions
properly. All callers throughout the codebase are updated to use the
new phys_addr_t based interface.
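
For example, the virtio lowmem caller converted in this patch goes from
page + offset to a plain physical address (excerpt of the change
below):

	-	kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
	+	kmsan_handle_dma(virt_to_phys(ptr), size, dir);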

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/virtio/virtio_ring.c |  4 ++--
 include/linux/kmsan.h        | 12 +++++++-----
 kernel/dma/mapping.c         |  2 +-
 mm/kmsan/hooks.c             | 36 +++++++++++++++++++++++++++++-------
 tools/virtio/linux/kmsan.h   |  2 +-
 5 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b784aab66867..dab49385e3e8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -378,7 +378,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
 		 * is initialized by the hardware. Explicitly check/unpoison it
 		 * depending on the direction.
 		 */
-		kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
+		kmsan_handle_dma(sg_phys(sg), sg->length, direction);
 		*addr = (dma_addr_t)sg_phys(sg);
 		return 0;
 	}
@@ -3149,7 +3149,7 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	if (!vq->use_dma_api) {
-		kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
+		kmsan_handle_dma(virt_to_phys(ptr), size, dir);
 		return (dma_addr_t)virt_to_phys(ptr);
 	}
 
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 2b1432cc16d5..6f27b9824ef7 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -182,8 +182,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
 
 /**
  * kmsan_handle_dma() - Handle a DMA data transfer.
- * @page:   first page of the buffer.
- * @offset: offset of the buffer within the first page.
+ * @phys:   physical address of the buffer.
  * @size:   buffer size.
  * @dir:    one of possible dma_data_direction values.
  *
@@ -191,8 +190,11 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
  * * checks the buffer, if it is copied to device;
  * * initializes the buffer, if it is copied from device;
  * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ *
+ * The function handles page lookup internally and supports both lowmem
+ * and highmem addresses.
  */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
 		      enum dma_data_direction dir);
 
 /**
@@ -372,8 +374,8 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
 {
 }
 
-static inline void kmsan_handle_dma(struct page *page, size_t offset,
-				    size_t size, enum dma_data_direction dir)
+static inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
+				    enum dma_data_direction dir)
 {
 }
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 80481a873340..709405d46b2b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -172,7 +172,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
-	kmsan_handle_dma(page, offset, size, dir);
+	kmsan_handle_dma(phys, size, dir);
 	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f0..eab7912a3bf0 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -336,25 +336,48 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,
 }
 
 /* Helper function to handle DMA data transfers. */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
 		      enum dma_data_direction dir)
 {
 	u64 page_offset, to_go, addr;
+	struct page *page;
+	void *kaddr;
 
-	if (PageHighMem(page))
+	if (!pfn_valid(PHYS_PFN(phys)))
 		return;
-	addr = (u64)page_address(page) + offset;
+
+	page = phys_to_page(phys);
+	page_offset = offset_in_page(phys);
+
 	/*
 	 * The kernel may occasionally give us adjacent DMA pages not belonging
 	 * to the same allocation. Process them separately to avoid triggering
 	 * internal KMSAN checks.
 	 */
 	while (size > 0) {
-		page_offset = offset_in_page(addr);
 		to_go = min(PAGE_SIZE - page_offset, (u64)size);
+
+		if (PageHighMem(page))
+			/* Handle highmem pages using kmap */
+			kaddr = kmap_local_page(page);
+		else
+			/* Lowmem pages can be accessed directly */
+			kaddr = page_address(page);
+
+		addr = (u64)kaddr + page_offset;
 		kmsan_handle_dma_page((void *)addr, to_go, dir);
-		addr += to_go;
+
+		if (PageHighMem(page))
+			kunmap_local(kaddr);
+
+		phys += to_go;
 		size -= to_go;
+
+		/* Move to next page if needed */
+		if (size > 0) {
+			page = phys_to_page(phys);
+			page_offset = offset_in_page(phys);
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(kmsan_handle_dma);
@@ -366,8 +389,7 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
 	int i;
 
 	for_each_sg(sg, item, nents, i)
-		kmsan_handle_dma(sg_page(item), item->offset, item->length,
-				 dir);
+		kmsan_handle_dma(sg_phys(item), item->length, dir);
 }
 
 /* Functions from kmsan-checks.h follow. */
diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h
index 272b5aa285d5..6cd2e3efd03d 100644
--- a/tools/virtio/linux/kmsan.h
+++ b/tools/virtio/linux/kmsan.h
@@ -4,7 +4,7 @@
 
 #include <linux/gfp.h>
 
-inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
 			     enum dma_data_direction dir)
 {
 }
-- 
2.49.0



* [PATCH 6/8] dma-mapping: fail early if physical address is mapped through platform callback
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
                     ` (4 preceding siblings ...)
  2025-06-25 13:19   ` [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
@ 2025-06-25 13:19   ` Leon Romanovsky
  2025-07-25 20:04     ` Robin Murphy
  2025-06-25 13:19   ` [PATCH 7/8] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

None of the platforms that implement the .map_page() callback support
physical addresses without a real struct page behind them. Add a check
to fail such mappings early.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 kernel/dma/mapping.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 709405d46b2b..74efb6909103 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	phys_addr_t phys = page_to_phys(page) + offset;
+	bool is_pfn_valid = true;
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
@@ -170,8 +171,20 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
-	else
+	else {
+		if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
+			is_pfn_valid = pfn_valid(PHYS_PFN(phys));
+
+		if (unlikely(!is_pfn_valid))
+			return DMA_MAPPING_ERROR;
+
+		/*
+		 * No platform which implements .map_page() supports
+		 * non-struct page backed addresses.
+		 */
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
+	}
+
 	kmsan_handle_dma(phys, size, dir);
 	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
-- 
2.49.0



* [PATCH 7/8] dma-mapping: export new dma_*map_phys() interface
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
                     ` (5 preceding siblings ...)
  2025-06-25 13:19   ` [PATCH 6/8] dma-mapping: fail early if physical address is mapped through platform callback Leon Romanovsky
@ 2025-06-25 13:19   ` Leon Romanovsky
  2025-06-25 13:19   ` [PATCH 8/8] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.

The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs() and dma_unmap_page_attrs() functions
converted to simple wrappers around the phys-based implementations.

The old page-based API is preserved in mapping.c so that existing code
is not affected by the new dma_*map_phys() interface being exported
with EXPORT_SYMBOL_GPL rather than EXPORT_SYMBOL.
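
A minimal usage sketch of the new interface (hypothetical caller; dev,
phys and len are illustrative, and error handling is reduced to the
essentials):

	dma_addr_t dma;

	dma = dma_map_phys(dev, phys, len, DMA_TO_DEVICE, 0);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... device performs DMA ... */

	dma_unmap_phys(dev, dma, len, DMA_TO_DEVICE, 0);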

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/dma-mapping.h | 13 +++++++++++++
 kernel/dma/mapping.c        | 25 ++++++++++++++++++++-----
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb..ba54bbeca861 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -118,6 +118,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		unsigned long attrs);
 void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
 unsigned int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs);
 void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -172,6 +176,15 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 }
+static inline dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	return DMA_MAPPING_ERROR;
+}
+static inline void dma_unmap_phys(struct device *dev, dma_addr_t addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
 static inline unsigned int dma_map_sg_attrs(struct device *dev,
 		struct scatterlist *sg, int nents, enum dma_data_direction dir,
 		unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 74efb6909103..29e8594a725a 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -152,12 +152,12 @@ static inline bool dma_map_direct(struct device *dev,
 	return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
-dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
-		size_t offset, size_t size, enum dma_data_direction dir,
-		unsigned long attrs)
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-	phys_addr_t phys = page_to_phys(page) + offset;
+	struct page *page = phys_to_page(phys);
+	size_t offset = offset_in_page(phys);
 	bool is_pfn_valid = true;
 	dma_addr_t addr;
 
@@ -191,9 +191,17 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 
 	return addr;
 }
+EXPORT_SYMBOL_GPL(dma_map_phys);
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+		size_t offset, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+	return dma_map_phys(dev, page_to_phys(page) + offset, size, dir, attrs);
+}
 EXPORT_SYMBOL(dma_map_page_attrs);
 
-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -209,6 +217,13 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
 }
+EXPORT_SYMBOL_GPL(dma_unmap_phys);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+		 enum dma_data_direction dir, unsigned long attrs)
+{
+	dma_unmap_phys(dev, addr, size, dir, attrs);
+}
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
 static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-- 
2.49.0



* [PATCH 8/8] mm/hmm: migrate to physical address-based DMA mapping API
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
                     ` (6 preceding siblings ...)
  2025-06-25 13:19   ` [PATCH 7/8] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
@ 2025-06-25 13:19   ` Leon Romanovsky
  2025-07-15 13:24     ` Will Deacon
  2025-06-27 13:44   ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Marek Szyprowski
  2025-07-25 20:05   ` Robin Murphy
  9 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-25 13:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

From: Leon Romanovsky <leonro@nvidia.com>

Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.

The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.

This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 mm/hmm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index feac86196a65..9354fae3ae06 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -779,8 +779,8 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 		if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
 			goto error;
 
-		dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
-					DMA_BIDIRECTIONAL);
+		dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
+					DMA_BIDIRECTIONAL, 0);
 		if (dma_mapping_error(dev, dma_addr))
 			goto error;
 
@@ -823,8 +823,8 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
 		dma_iova_unlink(dev, state, idx * map->dma_entry_size,
 				map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
 	} else if (dma_need_unmap(dev))
-		dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
-			       DMA_BIDIRECTIONAL);
+		dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
+			       DMA_BIDIRECTIONAL, 0);
 
 	pfns[idx] &=
 		~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
-- 
2.49.0



* Re: [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-06-25 13:19   ` [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
@ 2025-06-26 17:43     ` Alexander Potapenko
  2025-06-26 18:45       ` Leon Romanovsky
  0 siblings, 1 reply; 37+ messages in thread
From: Alexander Potapenko @ 2025-06-26 17:43 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Christoph Hellwig,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

On Wed, Jun 25, 2025 at 3:19 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>

Hi Leon,

>
> Convert the KMSAN DMA handling function from page-based to physical
> address-based interface.
>
> The refactoring renames kmsan_handle_dma() parameters from accepting
> (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> size_t size).

Could you please elaborate a bit why this is needed? Are you fixing
some particular issue?

> A PFN_VALID check is added to prevent KMSAN operations
> on non-page memory, preventing from non struct page backed address,
>
> As part of this change, support for highmem addresses is implemented
> using kmap_local_page() to handle both lowmem and highmem regions
> properly. All callers throughout the codebase are updated to use the
> new phys_addr_t based interface.

KMSAN only works on 64-bit systems, do we actually have highmem on any of these?


* Re: [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-06-26 17:43     ` Alexander Potapenko
@ 2025-06-26 18:45       ` Leon Romanovsky
  2025-06-27 16:28         ` Alexander Potapenko
  0 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-26 18:45 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Marek Szyprowski, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Thu, Jun 26, 2025 at 07:43:06PM +0200, Alexander Potapenko wrote:
> On Wed, Jun 25, 2025 at 3:19 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > From: Leon Romanovsky <leonro@nvidia.com>
> 
> Hi Leon,
> 
> >
> > Convert the KMSAN DMA handling function from page-based to physical
> > address-based interface.
> >
> > The refactoring renames kmsan_handle_dma() parameters from accepting
> > (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> > size_t size).
> 
> Could you please elaborate a bit why this is needed? Are you fixing
> some particular issue?

It is sort of a fix and an improvement at the same time.
Improvement:
It allows the newly introduced dma_map_phys() routine to call
kmsan_handle_dma() directly, without needing to convert from
phys_addr_t to struct page.

Fix:
It prevents us from executing KMSAN on addresses that don't have a
struct page (for example PCI_P2PDMA_MAP_THRU_HOST_BRIDGE pages), which
the original code does.

dma_map_sg_attrs()
 -> __dma_map_sg_attrs()
  -> dma_direct_map_sg()
   -> PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and nents > 0
    -> kmsan_handle_dma_sg();
     -> kmsan_handle_dma(sg_page(item), ...) <---- this is a "fake" page.

We are trying to build a DMA API that doesn't require struct pages.

> 
> > A PFN_VALID check is added to prevent KMSAN operations
> > on non-page memory, preventing from non struct page backed address,
> >
> > As part of this change, support for highmem addresses is implemented
> > using kmap_local_page() to handle both lowmem and highmem regions
> > properly. All callers throughout the codebase are updated to use the
> > new phys_addr_t based interface.
> 
> KMSAN only works on 64-bit systems, do we actually have highmem on any of these?

I don't know, but the original code had this check:
  344         if (PageHighMem(page)) 
  345                 return;

Thanks


* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
                     ` (7 preceding siblings ...)
  2025-06-25 13:19   ` [PATCH 8/8] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
@ 2025-06-27 13:44   ` Marek Szyprowski
  2025-06-27 17:02     ` Leon Romanovsky
  2025-07-25 20:05   ` Robin Murphy
  9 siblings, 1 reply; 37+ messages in thread
From: Marek Szyprowski @ 2025-06-27 13:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On 25.06.2025 15:18, Leon Romanovsky wrote:
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.
>
> The series consists of 8 patches that progressively convert the DMA
> mapping infrastructure from page-based to physical address-based APIs:
>
> The series maintains backward compatibility by keeping the old
> page-based API as wrapper functions around the new physical
> address-based implementations.

Thanks for this rework! I assume that the next step is to add map_phys 
callback also to the dma_map_ops and teach various dma-mapping providers 
to use it to avoid more phys-to-page-to-phys conversions.
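
For illustration, such a callback could look roughly like this (a
hypothetical sketch only, mirroring the existing .map_page() prototype
with the page+offset pair replaced by a physical address):

	struct dma_map_ops {
		...
		dma_addr_t (*map_phys)(struct device *dev, phys_addr_t phys,
				size_t size, enum dma_data_direction dir,
				unsigned long attrs);
		void (*unmap_phys)(struct device *dev, dma_addr_t dma_handle,
				size_t size, enum dma_data_direction dir,
				unsigned long attrs);
		...
	};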

I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys()
API is also suitable for the recently discussed PCI P2P DMA. While
adding a new API, maybe we should take this into account? My main concern
is the lack of the source phys addr passed to the dma_unmap_phys()
function, and I'm aware that this might somewhat complicate code
conversion from the old dma_map/unmap_page() API.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-06-26 18:45       ` Leon Romanovsky
@ 2025-06-27 16:28         ` Alexander Potapenko
  0 siblings, 0 replies; 37+ messages in thread
From: Alexander Potapenko @ 2025-06-27 16:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Thu, Jun 26, 2025 at 8:45 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Thu, Jun 26, 2025 at 07:43:06PM +0200, Alexander Potapenko wrote:
> > On Wed, Jun 25, 2025 at 3:19 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > From: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Alexander Potapenko <glider@google.com>

> >
> > Hi Leon,
> >
> > >
> > > Convert the KMSAN DMA handling function from page-based to physical
> > > address-based interface.
> > >
> > > The refactoring renames kmsan_handle_dma() parameters from accepting
> > > (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> > > size_t size).
> >
> > Could you please elaborate a bit on why this is needed? Are you fixing
> > some particular issue?
>
> It is sort of a fix and an improvement at the same time.
> Improvement:
> It allows a direct call to kmsan_handle_dma() without the need
> to convert from phys_addr_t to struct page for the newly introduced
> dma_map_phys() routine.
>
> Fix:
> It prevents us from executing KMSAN for addresses that don't have a struct
> page (for example PCI_P2PDMA_MAP_THRU_HOST_BRIDGE pages), which the
> original code does.
>
> dma_map_sg_attrs()
>  -> __dma_map_sg_attrs()
>   -> dma_direct_map_sg()
>    -> PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and nents > 0
>     -> kmsan_handle_dma_sg();
>      -> kmsan_handle_dma(sg_page(item), ...) <---- this is a "fake" page.
>
> We are trying to build a DMA API that doesn't require struct pages.

Thanks for clarifying that!

> > KMSAN only works on 64-bit systems, do we actually have highmem on any of these?
>
> I don't know, but the original code had this check:
> 	if (PageHighMem(page))
> 		return;
>
> Thanks

Ouch, I overlooked that, sorry!

I spent a while trying to understand where this code originated from,
and found the following discussion:
https://lore.kernel.org/all/20200327170647.GA22758@lst.de/

It's still unclear to me whether we actually need this check, because
with my config it doesn't produce any code.
But I think this shouldn't block your patch; I'd rather make a
follow-up fix.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-06-27 13:44   ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Marek Szyprowski
@ 2025-06-27 17:02     ` Leon Romanovsky
  2025-06-30 13:38       ` Christoph Hellwig
  2025-07-06  6:00       ` Leon Romanovsky
  0 siblings, 2 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-06-27 17:02 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On Fri, Jun 27, 2025 at 03:44:10PM +0200, Marek Szyprowski wrote:
> On 25.06.2025 15:18, Leon Romanovsky wrote:
> > This series refactors the DMA mapping to use physical addresses
> > as the primary interface instead of page+offset parameters. This
> > change aligns the DMA API with the underlying hardware reality where
> > DMA operations work with physical addresses, not page structures.
> >
> > The series consists of 8 patches that progressively convert the DMA
> > mapping infrastructure from page-based to physical address-based APIs:
> >
> > The series maintains backward compatibility by keeping the old
> > page-based API as wrapper functions around the new physical
> > address-based implementations.
> 
> Thanks for this rework! I assume that the next step is to add map_phys 
> callback also to the dma_map_ops and teach various dma-mapping providers 
> to use it to avoid more phys-to-page-to-phys conversions.

Probably Christoph will say yes; however, I personally don't see any
benefit in this. Maybe I am wrong here, but all existing .map_page()
implementation platforms don't support p2p anyway. They won't benefit
from such a conversion.

> 
> I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys() 
> API is also suitable for the recently discussed PCI P2P DMA? While 
> adding a new API maybe we should take this into account?

First, the immediate user (not related to p2p) is the blk layer:
https://lore.kernel.org/linux-nvme/bcdcb5eb-17ed-412f-bf5c-303079798fe2@nvidia.com/T/#m7e715697d4b2e3997622a3400243477c75cab406

+static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
+		struct blk_dma_iter *iter, struct phys_vec *vec)
+{
+	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
+			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+	if (dma_mapping_error(dma_dev, iter->addr)) {
+		iter->status = BLK_STS_RESOURCE;
+		return false;
+	}
+	iter->len = vec->len;
+	return true;
+}

The block layer has started to store phys addresses instead of struct pages,
so this phys_to_page() conversion in the data path will be avoided.
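
With dma_map_phys() the same helper becomes roughly the following (a
sketch of the intended conversion, not the final patch):

	static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
			struct blk_dma_iter *iter, struct phys_vec *vec)
	{
		/* map the physical range directly, no phys_to_page() round trip */
		iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
				rq_dma_dir(req), 0);
		if (dma_mapping_error(dma_dev, iter->addr)) {
			iter->status = BLK_STS_RESOURCE;
			return false;
		}
		iter->len = vec->len;
		return true;
	}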

> My main concern is the lack of the source phys addr passed to the dma_unmap_phys() 
> function, and I'm aware that this might somewhat complicate code 
> conversion from the old dma_map/unmap_page() API.
> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-06-27 17:02     ` Leon Romanovsky
@ 2025-06-30 13:38       ` Christoph Hellwig
  2025-07-08 10:27         ` Marek Szyprowski
  2025-07-06  6:00       ` Leon Romanovsky
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2025-06-30 13:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm, Jason Gunthorpe

On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> > Thanks for this rework! I assume that the next step is to add map_phys 
> > callback also to the dma_map_ops and teach various dma-mapping providers 
> > to use it to avoid more phys-to-page-to-phys conversions.
> 
> Probably Christoph will say yes; however, I personally don't see any
> benefit in this. Maybe I am wrong here, but all existing .map_page()
> implementation platforms don't support p2p anyway. They won't benefit
> from such a conversion.

I think that conversion should eventually happen, and rather sooner than
later.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-06-27 17:02     ` Leon Romanovsky
  2025-06-30 13:38       ` Christoph Hellwig
@ 2025-07-06  6:00       ` Leon Romanovsky
  1 sibling, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-07-06  6:00 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> On Fri, Jun 27, 2025 at 03:44:10PM +0200, Marek Szyprowski wrote:
> > On 25.06.2025 15:18, Leon Romanovsky wrote:
> > > This series refactors the DMA mapping to use physical addresses
> > > as the primary interface instead of page+offset parameters. This
> > > change aligns the DMA API with the underlying hardware reality where
> > > DMA operations work with physical addresses, not page structures.
> > >
> > > The series consists of 8 patches that progressively convert the DMA
> > > mapping infrastructure from page-based to physical address-based APIs:
> > >
> > > The series maintains backward compatibility by keeping the old
> > > page-based API as wrapper functions around the new physical
> > > address-based implementations.
> > 
> > Thanks for this rework! I assume that the next step is to add map_phys 
> > callback also to the dma_map_ops and teach various dma-mapping providers 
> > to use it to avoid more phys-to-page-to-phys conversions.
> 
> Probably Christoph will say yes; however, I personally don't see any
> benefit in this. Maybe I am wrong here, but all existing .map_page()
> implementation platforms don't support p2p anyway. They won't benefit
> from such a conversion.
> 
> > 
> > I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys() 
> > API is also suitable for the recently discussed PCI P2P DMA? While 
> > adding a new API maybe we should take this into account?
> 
> First, the immediate user (not related to p2p) is the blk layer:
> https://lore.kernel.org/linux-nvme/bcdcb5eb-17ed-412f-bf5c-303079798fe2@nvidia.com/T/#m7e715697d4b2e3997622a3400243477c75cab406
> 
> +static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
> +		struct blk_dma_iter *iter, struct phys_vec *vec)
> +{
> +	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> +			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> +	if (dma_mapping_error(dma_dev, iter->addr)) {
> +		iter->status = BLK_STS_RESOURCE;
> +		return false;
> +	}
> +	iter->len = vec->len;
> +	return true;
> +}
> 
> The block layer has started to store phys addresses instead of struct pages,
> so this phys_to_page() conversion in the data path will be avoided.

I have almost completed the main user of this dma_map_phys() callback. It is
a rewrite of this patch: [PATCH v3 3/3] vfio/pci: Allow MMIO regions to be exported through dma-buf
https://lore.kernel.org/all/20250307052248.405803-4-vivek.kasireddy@intel.com/

The whole populate_sgt()->dma_map_resource() block looks different now and
relies on dma_map_phys(), as we are exporting memory without
struct pages. It will be something like this:

	for (i = 0; i < priv->nr_ranges; i++) {
		phys = pci_resource_start(priv->vdev->pdev,
					  dma_ranges[i].region_index);
		phys += dma_ranges[i].offset;

		if (priv->bus_addr) {
			/* p2p through the PCI fabric: use a bus address */
			addr = pci_p2pdma_bus_addr_map(&p2pdma_state, phys);
			fill_sg_entry(sgl, dma_ranges[i].length, addr);
			sgl = sg_next(sgl);
		} else if (dma_use_iova(&priv->state)) {
			/* IOMMU path: link the range into one IOVA mapping */
			ret = dma_iova_link(attachment->dev, &priv->state, phys,
					    priv->mapped_len,
					    dma_ranges[i].length, dir, attrs);
			if (ret)
				goto err_unmap_dma;

			priv->mapped_len += dma_ranges[i].length;
		} else {
			/* direct mapping of the physical range */
			addr = dma_map_phys(attachment->dev, phys,
					    dma_ranges[i].length, dir, attrs);
			ret = dma_mapping_error(attachment->dev, addr);
			if (ret)
				goto err_unmap_dma;

			fill_sg_entry(sgl, dma_ranges[i].length, addr);
			sgl = sg_next(sgl);
		}
	}

	if (dma_use_iova(&priv->state) && !priv->bus_addr) {
		ret = dma_iova_sync(attachment->dev, &priv->state, 0,
				    priv->mapped_len);
		if (ret)
			goto err_unmap_dma;

		fill_sg_entry(sgl, priv->mapped_len, priv->state.addr);
	}

> 
> > My main concern is the lack of the source phys addr passed to the dma_unmap_phys() 
> > function, and I'm aware that this might somewhat complicate code 
> > conversion from the old dma_map/unmap_page() API.

It is not needed for now; all p2p logic is external to the DMA API.

Thanks

> > 
> > Best regards
> > -- 
> > Marek Szyprowski, PhD
> > Samsung R&D Institute Poland
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-06-30 13:38       ` Christoph Hellwig
@ 2025-07-08 10:27         ` Marek Szyprowski
  2025-07-08 11:00           ` Leon Romanovsky
  2025-07-30 11:11           ` Robin Murphy
  0 siblings, 2 replies; 37+ messages in thread
From: Marek Szyprowski @ 2025-07-08 10:27 UTC (permalink / raw)
  To: Christoph Hellwig, Leon Romanovsky
  Cc: Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On 30.06.2025 15:38, Christoph Hellwig wrote:
> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>> Thanks for this rework! I assume that the next step is to add map_phys
>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>> to use it to avoid more phys-to-page-to-phys conversions.
>> Probably Christoph will say yes; however, I personally don't see any
>> benefit in this. Maybe I am wrong here, but all existing .map_page()
>> implementation platforms don't support p2p anyway. They won't benefit
>> from such a conversion.
> I think that conversion should eventually happen, and rather sooner than
> later.

Agreed.

Applied patches 1-7 to my dma-mapping-next branch. Let me know if one 
needs a stable branch with it.

Leon, it would be great if You could also prepare an incremental patch 
adding a map_phys callback to dma_map_ops, so the individual 
arch-specific dma-mapping providers can then be converted (or simplified 
in many cases) too.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-08 10:27         ` Marek Szyprowski
@ 2025-07-08 11:00           ` Leon Romanovsky
  2025-07-08 11:45             ` Marek Szyprowski
  2025-07-30 11:11           ` Robin Murphy
  1 sibling, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-07-08 11:00 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
> On 30.06.2025 15:38, Christoph Hellwig wrote:
> > On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> >>> Thanks for this rework! I assume that the next step is to add map_phys
> >>> callback also to the dma_map_ops and teach various dma-mapping providers
> >>> to use it to avoid more phys-to-page-to-phys conversions.
> >> Probably Christoph will say yes; however, I personally don't see any
> >> benefit in this. Maybe I am wrong here, but all existing .map_page()
> >> implementation platforms don't support p2p anyway. They won't benefit
> >> from such a conversion.
> > I think that conversion should eventually happen, and rather sooner than
> > later.
> 
> Agreed.
> 
> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one 
> needs a stable branch with it.

Thanks a lot. I don't think that a stable branch is needed. Realistically
speaking, my VFIO DMA work won't be merged this cycle: we are in -rc5,
it is a complete rewrite of the RFC version, and it touches pci-p2p code
(to remove the dependency on struct page) in addition to VFIO, so it will
take time.

Regarding the last patch (hmm), it would be great if you could take it.
We didn't touch anything in hmm.c this cycle and have no plans to send a PR.
It can safely go through your tree.

> 
> Leon, it would be great if You could also prepare an incremental patch
> adding a map_phys callback to dma_map_ops, so the individual
> arch-specific dma-mapping providers can then be converted (or simplified
> in many cases) too.

Sure, will do.

> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-08 11:00           ` Leon Romanovsky
@ 2025-07-08 11:45             ` Marek Szyprowski
  2025-07-08 12:06               ` Leon Romanovsky
  0 siblings, 1 reply; 37+ messages in thread
From: Marek Szyprowski @ 2025-07-08 11:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On 08.07.2025 13:00, Leon Romanovsky wrote:
> On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>> Probably Christoph will say yes; however, I personally don't see any
>>>> benefit in this. Maybe I am wrong here, but all existing .map_page()
>>>> implementation platforms don't support p2p anyway. They won't benefit
>>>> from such a conversion.
>>> I think that conversion should eventually happen, and rather sooner than
>>> later.
>> Agreed.
>>
>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>> needs a stable branch with it.
> Thanks a lot. I don't think that a stable branch is needed. Realistically
> speaking, my VFIO DMA work won't be merged this cycle: we are in -rc5,
> it is a complete rewrite of the RFC version, and it touches pci-p2p code
> (to remove the dependency on struct page) in addition to VFIO, so it will
> take time.
>
> Regarding the last patch (hmm), it would be great if you could take it.
> We didn't touch anything in hmm.c this cycle and have no plans to send a PR.
> It can safely go through your tree.

Okay, then I would like to get an explicit ack from Jérôme for this.

>> Leon, it would be great if You could also prepare an incremental patch
>> adding a map_phys callback to dma_map_ops, so the individual
>> arch-specific dma-mapping providers can then be converted (or simplified
>> in many cases) too.
> Sure, will do.

Thanks!

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-08 11:45             ` Marek Szyprowski
@ 2025-07-08 12:06               ` Leon Romanovsky
  2025-07-08 12:56                 ` Marek Szyprowski
  0 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-07-08 12:06 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On Tue, Jul 08, 2025 at 01:45:20PM +0200, Marek Szyprowski wrote:
> On 08.07.2025 13:00, Leon Romanovsky wrote:
> > On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
> >> On 30.06.2025 15:38, Christoph Hellwig wrote:
> >>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> >>>>> Thanks for this rework! I assume that the next step is to add map_phys
> >>>>> callback also to the dma_map_ops and teach various dma-mapping providers
> >>>>> to use it to avoid more phys-to-page-to-phys conversions.
> >>>> Probably Christoph will say yes; however, I personally don't see any
> >>>> benefit in this. Maybe I am wrong here, but all existing .map_page()
> >>>> implementation platforms don't support p2p anyway. They won't benefit
> >>>> from such a conversion.
> >>> I think that conversion should eventually happen, and rather sooner than
> >>> later.
> >> Agreed.
> >>
> >> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
> >> needs a stable branch with it.
> > Thanks a lot. I don't think that a stable branch is needed. Realistically
> > speaking, my VFIO DMA work won't be merged this cycle: we are in -rc5,
> > it is a complete rewrite of the RFC version, and it touches pci-p2p code
> > (to remove the dependency on struct page) in addition to VFIO, so it will
> > take time.
> >
> > Regarding the last patch (hmm), it would be great if you could take it.
> > We didn't touch anything in hmm.c this cycle and have no plans to send a PR.
> > It can safely go through your tree.
> 
> Okay, then I would like to get an explicit ack from Jérôme for this.

Jerome has not been active in the HMM world for a long time already.
The HMM tree is managed by us (RDMA): https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
➜  kernel git:(m/dmabuf-vfio) git log --merges mm/hmm.c
...
Pull HMM updates from Jason Gunthorpe:
...

https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=58ba80c4740212c29a1cf9b48f588e60a7612209
+hmm		git	git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git#hmm

We just never bothered to reflect the current situation in the MAINTAINERS file.

Thanks

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-08 12:06               ` Leon Romanovsky
@ 2025-07-08 12:56                 ` Marek Szyprowski
  2025-07-08 15:57                   ` Marek Szyprowski
  0 siblings, 1 reply; 37+ messages in thread
From: Marek Szyprowski @ 2025-07-08 12:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On 08.07.2025 14:06, Leon Romanovsky wrote:
> On Tue, Jul 08, 2025 at 01:45:20PM +0200, Marek Szyprowski wrote:
>> On 08.07.2025 13:00, Leon Romanovsky wrote:
>>> On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
>>>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>>>> Probably Christoph will say yes; however, I personally don't see any
>>>>>> benefit in this. Maybe I am wrong here, but all existing .map_page()
>>>>>> implementation platforms don't support p2p anyway. They won't benefit
>>>>>> from such a conversion.
>>>>> I think that conversion should eventually happen, and rather sooner than
>>>>> later.
>>>> Agreed.
>>>>
>>>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>>>> needs a stable branch with it.
>>> Thanks a lot. I don't think that a stable branch is needed. Realistically
>>> speaking, my VFIO DMA work won't be merged this cycle: we are in -rc5,
>>> it is a complete rewrite of the RFC version, and it touches pci-p2p code
>>> (to remove the dependency on struct page) in addition to VFIO, so it will
>>> take time.
>>>
>>> Regarding the last patch (hmm), it would be great if you could take it.
>>> We didn't touch anything in hmm.c this cycle and have no plans to send a PR.
>>> It can safely go through your tree.
>> Okay, then I would like to get an explicit ack from Jérôme for this.
> Jerome has not been active in the HMM world for a long time already.
> The HMM tree is managed by us (RDMA): https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
> ➜  kernel git:(m/dmabuf-vfio) git log --merges mm/hmm.c
> ...
> Pull HMM updates from Jason Gunthorpe:
> ...
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=58ba80c4740212c29a1cf9b48f588e60a7612209
> +hmm		git	git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git#hmm
>
> We just never bothered to reflect the current situation in the MAINTAINERS file.

Maybe this is the time to update it :)

I was just a bit confused that no-one commented on the HMM patch, but if 
You maintain it, then this is okay.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-08 12:56                 ` Marek Szyprowski
@ 2025-07-08 15:57                   ` Marek Szyprowski
  0 siblings, 0 replies; 37+ messages in thread
From: Marek Szyprowski @ 2025-07-08 15:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy,
	Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On 08.07.2025 14:56, Marek Szyprowski wrote:
> On 08.07.2025 14:06, Leon Romanovsky wrote:
>> On Tue, Jul 08, 2025 at 01:45:20PM +0200, Marek Szyprowski wrote:
>>> On 08.07.2025 13:00, Leon Romanovsky wrote:
>>>> On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
>>>>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>>>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>>>>> Thanks for this rework! I assume that the next step is to add 
>>>>>>>> map_phys
>>>>>>>> callback also to the dma_map_ops and teach various dma-mapping 
>>>>>>>> providers
>>>>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>>>>> Probably Christoph will say yes; however, I personally don't see any
>>>>>>> benefit in this. Maybe I am wrong here, but all existing .map_page()
>>>>>>> implementation platforms don't support p2p anyway. They won't
>>>>>>> benefit from such a conversion.
>>>>>> I think that conversion should eventually happen, and rather 
>>>>>> sooner than
>>>>>> later.
>>>>> Agreed.
>>>>>
>>>>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>>>>> needs a stable branch with it.
>>>> Thanks a lot. I don't think that a stable branch is needed.
>>>> Realistically speaking, my VFIO DMA work won't be merged this cycle:
>>>> we are in -rc5, it is a complete rewrite of the RFC version, and it
>>>> touches pci-p2p code (to remove the dependency on struct page) in
>>>> addition to VFIO, so it will take time.
>>>>
>>>> Regarding the last patch (hmm), it would be great if you could take it.
>>>> We didn't touch anything in hmm.c this cycle and have no plans to
>>>> send a PR.
>>>> It can safely go through your tree.
>>> Okay, then I would like to get an explicit ack from Jérôme for this.
>> Jerome has not been active in the HMM world for a long time already.
>> The HMM tree is managed by us (RDMA):
>> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
>> ➜  kernel git:(m/dmabuf-vfio) git log --merges mm/hmm.c
>> ...
>> Pull HMM updates from Jason Gunthorpe:
>> ...
>>
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=58ba80c4740212c29a1cf9b48f588e60a7612209 
>>
>> +hmm        git 
>> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git#hmm
>>
>> We just never bothered to reflect the current situation in the MAINTAINERS file.
>
> Maybe this is the time to update it :)
>
> I was just a bit confused that no-one commented on the HMM patch, but if 
> You maintain it, then this is okay.


I've applied the last patch to dma-mapping-for-next branch.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 8/8] mm/hmm: migrate to physical address-based DMA mapping API
  2025-06-25 13:19   ` [PATCH 8/8] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
@ 2025-07-15 13:24     ` Will Deacon
  2025-07-15 13:58       ` Leon Romanovsky
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2025-07-15 13:24 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Christoph Hellwig,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

Hi Leon,

On Wed, Jun 25, 2025 at 04:19:05PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Convert HMM DMA operations from the legacy page-based API to the new
> physical address-based dma_map_phys() and dma_unmap_phys() functions.
> This demonstrates the preferred approach for new code that should use
> physical addresses directly rather than page+offset parameters.
> 
> The change replaces dma_map_page() and dma_unmap_page() calls with
> dma_map_phys() and dma_unmap_phys() respectively, using the physical
> address that was already available in the code. This eliminates the
> redundant page-to-physical address conversion and aligns with the
> DMA subsystem's move toward physical address-centric interfaces.
> 
> This serves as an example of how new code should be written to leverage
> the more efficient physical address API, which provides cleaner interfaces
> for drivers that already have access to physical addresses.

I'm struggling a little to see how this is cleaner or more efficient
than the old code.

From what I can tell, dma_map_page_attrs() takes a 'struct page *' and
converts it to a physical address using page_to_phys() whilst your new
dma_map_phys() interface takes a physical address and converts it to
a 'struct page *' using phys_to_page(). In both cases, hmm_dma_map_pfn()
still needs the page for other reasons. If anything, existing users of
dma_map_page_attrs() now end up with a redundant page-to-phys-to-page
conversion which hopefully the compiler folds away.
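
In other words, the round trip looks like this (a simplified sketch, not
verbatim kernel code):

	/* caller side: starts from a page, converts to phys */
	dma_addr = dma_map_phys(dev, page_to_phys(page) + offset, size,
				dir, attrs);

	/* implementation side (non-IOMMU path): converts straight back */
	struct page *page = phys_to_page(phys);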

I'm assuming there's future work which builds on top of the new API
and removes the reliance on 'struct page' entirely, is that right? If
so, it would've been nicer to be clearer about that as, on its own, I'm
not really sure this patch series achieves an awful lot and the
efficiency argument looks quite weak to me.

Cheers,

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 8/8] mm/hmm: migrate to physical address-based DMA mapping API
  2025-07-15 13:24     ` Will Deacon
@ 2025-07-15 13:58       ` Leon Romanovsky
  0 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-07-15 13:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marek Szyprowski, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Robin Murphy, Joerg Roedel, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Tue, Jul 15, 2025 at 02:24:38PM +0100, Will Deacon wrote:
> Hi Leon,
> 
> On Wed, Jun 25, 2025 at 04:19:05PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Convert HMM DMA operations from the legacy page-based API to the new
> > physical address-based dma_map_phys() and dma_unmap_phys() functions.
> > This demonstrates the preferred approach for new code that should use
> > physical addresses directly rather than page+offset parameters.
> > 
> > The change replaces dma_map_page() and dma_unmap_page() calls with
> > dma_map_phys() and dma_unmap_phys() respectively, using the physical
> > address that was already available in the code. This eliminates the
> > redundant page-to-physical address conversion and aligns with the
> > DMA subsystem's move toward physical address-centric interfaces.
> > 
> > This serves as an example of how new code should be written to leverage
> > the more efficient physical address API, which provides cleaner interfaces
> > for drivers that already have access to physical addresses.
> 
> I'm struggling a little to see how this is cleaner or more efficient
> than the old code.

It is not; the main reason for the hmm conversion is to show how the API is
used. HMM is built around struct page.

> 
> From what I can tell, dma_map_page_attrs() takes a 'struct page *' and
> converts it to a physical address using page_to_phys() whilst your new
> dma_map_phys() interface takes a physical address and converts it to
> a 'struct page *' using phys_to_page(). In both cases, hmm_dma_map_pfn()
> still needs the page for other reasons. If anything, existing users of
> dma_map_page_attrs() now end up with a redundant page-to-phys-to-page
> conversion which hopefully the compiler folds away.
> 
> I'm assuming there's future work which builds on top of the new API
> and removes the reliance on 'struct page' entirely, is that right? If
> so, it would've been nicer to be clearer about that as, on its own, I'm
> not really sure this patch series achieves an awful lot and the
> efficiency argument looks quite weak to me.

Yes, there is ongoing work which is built on top of the dma_map_phys() API
and can't be done without DMA phys.

My WIP branch, where I'm using it can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio

In that branch, we save one phys_to_page conversion in the block datapath:
block-dma: migrate to dma_map_phys instead of map_page

and implement a DMABUF exporter for MMIO pages:
vfio/pci: Allow MMIO regions to be exported through dma-buf
see the vfio_pci_dma_buf_map() function.

Thanks

> 
> Cheers,
> 
> Will
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] dma-mapping: fail early if physical address is mapped through platform callback
  2025-06-25 13:19   ` [PATCH 6/8] dma-mapping: fail early if physical address is mapped through platform callback Leon Romanovsky
@ 2025-07-25 20:04     ` Robin Murphy
  2025-07-27  6:30       ` Leon Romanovsky
  0 siblings, 1 reply; 37+ messages in thread
From: Robin Murphy @ 2025-07-25 20:04 UTC (permalink / raw)
  To: Leon Romanovsky, Marek Szyprowski
  Cc: Leon Romanovsky, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On 2025-06-25 2:19 pm, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> All platforms which implement the map_page interface don't support physical
> addresses without a real struct page. Add a condition to check this.

As-is, the condition also needs to cover iommu-dma, because that also 
still doesn't support non-page-backed addresses. You can't just do a 
simple s/page/phys/ rename and hope it's OK because you happen to get 
away with it for coherent, 64-bit, trusted devices.

Thanks,
Robin.

> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>   kernel/dma/mapping.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 709405d46b2b..74efb6909103 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   	phys_addr_t phys = page_to_phys(page) + offset;
> +	bool is_pfn_valid = true;
>   	dma_addr_t addr;
>   
>   	BUG_ON(!valid_dma_direction(dir));
> @@ -170,8 +171,20 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>   		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
>   	else if (use_dma_iommu(dev))
>   		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
> -	else
> +	else {
> +		if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
> +			is_pfn_valid = pfn_valid(PHYS_PFN(phys));
> +
> +		if (unlikely(!is_pfn_valid))
> +			return DMA_MAPPING_ERROR;
> +
> +		/*
> +		 * All platforms which implement .map_page() don't support
> +		 * non-struct page backed addresses.
> +		 */
>   		addr = ops->map_page(dev, page, offset, size, dir, attrs);
> +	}
> +
>   	kmsan_handle_dma(phys, size, dir);
>   	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
>   	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
                     ` (8 preceding siblings ...)
  2025-06-27 13:44   ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Marek Szyprowski
@ 2025-07-25 20:05   ` Robin Murphy
  2025-07-29 14:03     ` Jason Gunthorpe
  9 siblings, 1 reply; 37+ messages in thread
From: Robin Murphy @ 2025-07-25 20:05 UTC (permalink / raw)
  To: Leon Romanovsky, Marek Szyprowski
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On 2025-06-25 2:18 pm, Leon Romanovsky wrote:
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.

That is obvious nonsense - the DMA *API* does not exist in "hardware 
reality"; the DMA API abstracts *software* operations that must be 
performed before and after the actual hardware DMA operation in order to 
preserve memory coherency etc.

Streaming DMA API callers get their buffers from alloc_pages() or 
kmalloc(); they do not have physical addresses, they have a page or 
virtual address. The internal operations of pretty much every DMA API 
implementation that isn't a no-op also require a page and/or virtual 
address. It is 100% logical for the DMA API interfaces to take a page or 
virtual address (and since virt_to_page() is pretty trivial, we already 
consolidated the two interfaces ages ago).

Yes, once you get right down to the low-level arch_sync_dma_*() 
interfaces, those do pass a physical address, but that's mostly an 
artefact of them being factored out of old dma_sync_single_*() 
implementations that took a (physical) DMA address. Nearly all of them 
then use __va() or phys_to_virt() to actually consume it. Even though 
it's a phys_addr_t, the implicit guarantee that it represents 
page-backed memory is absolutely vital.
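
For instance, a typical implementation is shaped like this (a sketch
modelled on several arches; the cache primitive is a stand-in, details
vary per architecture):

	void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
				      enum dma_data_direction dir)
	{
		/* the phys_addr_t is assumed to be page-backed: it is
		 * turned straight into a kernel virtual address for the
		 * cache maintenance */
		void *vaddr = phys_to_virt(paddr);

		arch_wback_inv_cache_range(vaddr, size);
	}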

Take a step back; what do you imagine that a DMA API call on a 
non-page-backed physical address could actually *do*?

- Cache maintenance? No, it would be illogical for a P2P address to be 
cached in a CPU cache, and anyway it would almost always crash because 
it requires page-backed memory with a virtual address.

- Bounce buffering? Again no, that would be illogical, defeat the entire 
point of a P2P operation, and anyway would definitely crash because it 
requires page-backed memory with a virtual address.

- IOMMU mappings? Oh hey look that's exactly what dma_map_resource() has 
been doing for 9 years. Not to mention your new IOMMU API if callers 
want to be IOMMU-aware (although without the same guarantee of not also 
doing the crashy things.)

- Debug tracking? Again, already taken care of by dma_map_resource().

- Some entirely new concept? Well, I'm eager to be enlightened if so!

But given what we do already know of from decades of experience, obvious 
question: For the tiny minority of users who know full well when they're 
dealing with a non-page-backed physical address, what's wrong with using 
dma_map_resource?

Does it make sense to try to consolidate our p2p infrastructure so 
dma_map_resource() could return bus addresses where appropriate? Yes, 
almost certainly, if it makes it more convenient to use. And with only 
about 20 users it's not too impractical to add some extra arguments or 
even rejig the whole interface if need be. Indeed an overhaul might even 
help solve the current grey area as to when it should take dma_range_map 
into account or not for platform devices.

> The series consists of 8 patches that progressively convert the DMA
> mapping infrastructure from page-based to physical address-based APIs:

And as a result ends up making said DMA mapping infrastructure slightly 
more complicated and slightly less efficient for all its legitimate 
users, all so one or two highly specialised users can then pretend to 
call it in situations where it must be a no-op anyway? Please explain 
convincingly why that is not a giant waste of time.

Are we trying to remove struct page from the kernel altogether? If yes, 
then for goodness' sake lead with that, but even then I'd still prefer 
to see the replacements for critical related infrastructure like 
pfn_valid() in place before we start trying to reshape the DMA API to fit.

Thanks,
Robin.

> The series maintains backward compatibility by keeping the old
> page-based API as wrapper functions around the new physical
> address-based implementations.
> 
> Thanks
> 
> Leon Romanovsky (8):
>    dma-debug: refactor to use physical addresses for page mapping
>    dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
>    iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
>    dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
>    kmsan: convert kmsan_handle_dma to use physical addresses
>    dma-mapping: fail early if physical address is mapped through platform
>      callback
>    dma-mapping: export new dma_*map_phys() interface
>    mm/hmm: migrate to physical address-based DMA mapping API
> 
>   Documentation/core-api/dma-api.rst |  4 +-
>   arch/powerpc/kernel/dma-iommu.c    |  4 +-
>   drivers/iommu/dma-iommu.c          | 14 +++----
>   drivers/virtio/virtio_ring.c       |  4 +-
>   include/linux/dma-map-ops.h        |  8 ++--
>   include/linux/dma-mapping.h        | 13 ++++++
>   include/linux/iommu-dma.h          |  7 ++--
>   include/linux/kmsan.h              | 12 +++---
>   include/trace/events/dma.h         |  4 +-
>   kernel/dma/debug.c                 | 28 ++++++++-----
>   kernel/dma/debug.h                 | 16 ++++---
>   kernel/dma/direct.c                |  6 +--
>   kernel/dma/direct.h                | 13 +++---
>   kernel/dma/mapping.c               | 67 +++++++++++++++++++++---------
>   kernel/dma/ops_helpers.c           |  6 +--
>   mm/hmm.c                           |  8 ++--
>   mm/kmsan/hooks.c                   | 36 ++++++++++++----
>   tools/virtio/linux/kmsan.h         |  2 +-
>   18 files changed, 159 insertions(+), 93 deletions(-)
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] dma-mapping: fail early if physical address is mapped through platform callback
  2025-07-25 20:04     ` Robin Murphy
@ 2025-07-27  6:30       ` Leon Romanovsky
  0 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-07-27  6:30 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Marek Szyprowski, Christoph Hellwig, Jonathan Corbet,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Fri, Jul 25, 2025 at 09:04:50PM +0100, Robin Murphy wrote:
> On 2025-06-25 2:19 pm, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > All platforms which implement the map_page interface don't support physical
> > addresses without a real struct page. Add a condition to check this.
> 
> As-is, the condition also needs to cover iommu-dma, because that also still
> doesn't support non-page-backed addresses. You can't just do a simple
> s/page/phys/ rename and hope it's OK because you happen to get away with it
> for coherent, 64-bit, trusted devices.

It needs to be a follow-up patch. Is this what you envision?

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index e1586eb52ab34..31214fde88124 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -167,6 +167,12 @@ dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
            arch_dma_map_phys_direct(dev, phys + size))
                addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
-       else if (use_dma_iommu(dev))
+       else if (use_dma_iommu(dev)) {
+               if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+                   !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+                       is_pfn_valid = pfn_valid(PHYS_PFN(phys));
+
+               if (unlikely(!is_pfn_valid))
+                       return DMA_MAPPING_ERROR;
                addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
-       else {
+       } else {
                struct page *page = phys_to_page(phys);

Thanks

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-25 20:05   ` Robin Murphy
@ 2025-07-29 14:03     ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2025-07-29 14:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Leon Romanovsky, Marek Szyprowski, Christoph Hellwig,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

On Fri, Jul 25, 2025 at 09:05:46PM +0100, Robin Murphy wrote:

> But given what we do already know of from decades of experience, obvious
> question: For the tiny minority of users who know full well when they're
> dealing with a non-page-backed physical address, what's wrong with using
> dma_map_resource?

I was also pushing for this, that we would have two separate paths:

- the phys_addr was guaranteed to have a KVA (and today also struct page)
- the phys_addr is non-cacheable and no KVA may exist

This is basically already the distinction today between map resource
and map page.

The caller would have to look at what it is trying to map, do the P2P
evaluation, and then call the cacheable-phys or resource path(s).
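
A caller-side dispatch along those lines might look like the following
(hypothetical sketch; "is_mmio" stands for whatever knowledge the caller
already has from its own P2P evaluation):

	if (is_mmio)	/* non-cacheable, possibly no KVA */
		dma_addr = dma_map_resource(dev, phys, size, dir, attrs);
	else		/* normal, KVA-backed memory */
		dma_addr = dma_map_phys(dev, phys, size, dir, attrs);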

Leon, I think you should revive the work you had along these lines. It
would address my concerns with the dma_ops changes too. I continue to
think we should not push non-cacheable, non-KVA MMIO down the map_page
ops; those should use the map_resource op.

> Does it make sense to try to consolidate our p2p infrastructure so
> dma_map_resource() could return bus addresses where appropriate?

For some users but not entirely :( The sg path for P2P relies on
storing information inside the scatterlist so unmap knows what to do.

Changing map_resource to return a similar flag and then having drivers
somehow store that flag and give it back to unmap is not a trivial
change. It would be a good API for simple drivers, and I think we
could build such a helper calling through the new flow. But places
like DMABUF that have more complex lists will not like it.

For them we've been following the approach of BIO, where the
driver/subsystem will maintain a mapping list and be aware of when the
P2P information is changing. Then it has to do different map/unmap
sequences based on its own existing tracking.

I view this as all very low-level infrastructure; I'm really hoping we
can get an agreement with Christian and build a scatterlist replacement
for DMABUF that encapsulates all this away from drivers like BIO does
for block.

But we can't start that until we have a DMA API working fully for
non-struct page P2P memory. That is being driven by this series and
the VFIO DMABUF implementation on top of it.

> Are we trying to remove struct page from the kernel altogether? 

Yes, it is a very long-term project being pushed along with the
folios, memdesc conversion and so forth. It is huge, with many
aspects, but we can start to reasonably work on parts of it
independently.

A mid-term dream is to be able to go from pin_user_pages() -> DMA
without drivers needing to touch struct page at all. 

This is a huge project on its own, and we are progressing it slowly
"bottom up": by allowing phys_addr_t in the DMA API we can build
more infrastructure for subsystems to be struct-page free, culminating
in some pin_user_phyr() and phys_addr_t bio_vec someday.

Certainly a big part of this series is influenced by requirements to
advance pin_user_pages() -> DMA, while the other part is about
allowing P2P to work using phys_addr_t without struct page.

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-08 10:27         ` Marek Szyprowski
  2025-07-08 11:00           ` Leon Romanovsky
@ 2025-07-30 11:11           ` Robin Murphy
  2025-07-30 13:40             ` Leon Romanovsky
                               ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Robin Murphy @ 2025-07-30 11:11 UTC (permalink / raw)
  To: Marek Szyprowski, Christoph Hellwig, Leon Romanovsky
  Cc: Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm, Jason Gunthorpe

On 2025-07-08 11:27 am, Marek Szyprowski wrote:
> On 30.06.2025 15:38, Christoph Hellwig wrote:
>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>> Probably Christoph will say yes; however, I personally don't see any
>>> benefit in this. Maybe I am wrong here, but all existing .map_page()
>>> implementation platforms don't support p2p anyway. They won't benefit
>>> from such a conversion.
>> I think that conversion should eventually happen, and rather sooner than
>> later.
> 
> Agreed.
> 
> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
> needs a stable branch with it.

As the maintainer of iommu-dma, please drop the iommu-dma patch because 
it is broken. It does not in any way remove the struct page dependency 
from iommu-dma, it merely hides it so things can crash more easily in 
circumstances that clearly nobody's bothered to test.

> Leon, it would be great if You could also prepare an incremental patch
> adding a map_phys callback to dma_map_ops, so the individual
> arch-specific dma-mapping providers can then be converted (or simplified
> in many cases) too.

Marek, I'm surprised that even you aren't seeing why that would at best 
be pointless churn. The fundamental design of dma_map_page() operating 
on struct page is that it sits in between alloc_pages() at the caller 
and kmap_atomic() deep down in the DMA API implementation (which also 
subsumes any dependencies on having a kernel virtual address at the 
implementation end). The natural working unit for whatever replaces 
dma_map_page() will be whatever the replacement for alloc_pages() 
returns, and the replacement for kmap_atomic() operates on. Until that 
exists (and I simply cannot believe it would be an unadorned physical 
address) there cannot be any *meaningful* progress made towards removing 
the struct page dependency from the DMA API. If there is also a goal to 
kill off highmem before then, then logically we should just wait for 
that to land, then revert back to dma_map_single() being the first-class 
interface, and dma_map_page() can turn into a trivial page_to_virt() 
wrapper for the long tail of caller conversions.

Simply obfuscating the struct page dependency today by dressing it up as 
a phys_addr_t with implicit baggage is not in any way helpful. It 
only makes the code harder to understand and more bug-prone. Despite the 
disingenuous claims, it is quite blatantly the opposite of "efficient" 
for callers to do extra work to throw away useful information with 
page_to_phys(), and the implementation then have to re-derive that 
information with pfn_valid()/phys_to_page().

And by "bug-prone" I also include greater distractions like this 
misguided idea that the same API could somehow work for non-memory 
addresses too, so then everyone can move on to bikeshedding VFIO while 
overlooking the fundamental flaws in the whole premise. I mean, besides 
all the issues I've already pointed out in that regard, not least the 
glaring fact that it's literally just a worse version of *an API we 
already have*: as DMA API maintainer, do you *really* approve of a design 
that depends on callers abusing DMA_ATTR_SKIP_CPU_SYNC, yet will still 
readily blow up if they did then call a dma_sync op?

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-30 11:11           ` Robin Murphy
@ 2025-07-30 13:40             ` Leon Romanovsky
  2025-07-30 14:28               ` Jason Gunthorpe
  2025-07-30 16:32             ` Marek Szyprowski
  2025-07-31 17:37             ` Matthew Wilcox
  2 siblings, 1 reply; 37+ messages in thread
From: Leon Romanovsky @ 2025-07-30 13:40 UTC (permalink / raw)
  To: Robin Murphy, Marek Szyprowski
  Cc: Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm, Jason Gunthorpe

On Wed, Jul 30, 2025 at 12:11:32PM +0100, Robin Murphy wrote:
> On 2025-07-08 11:27 am, Marek Szyprowski wrote:
> > On 30.06.2025 15:38, Christoph Hellwig wrote:
> > > On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> > > > > Thanks for this rework! I assume that the next step is to add map_phys
> > > > > callback also to the dma_map_ops and teach various dma-mapping providers
> > > > > to use it to avoid more phys-to-page-to-phys conversions.
> > > > Probably Christoph will say yes, however I personally don't see any
> > > > benefit in this. Maybe I'm wrong here, but all existing .map_page()
> > > > implementation platforms don't support p2p anyway. They won't benefit
> > > > from such a conversion.
> > > I think that conversion should eventually happen, and rather sooner than
> > > later.
> > 
> > Agreed.
> > 
> > Applied patches 1-7 to my dma-mapping-next branch. Let me know if anyone
> > needs a stable branch with it.
> 
> As the maintainer of iommu-dma, please drop the iommu-dma patch because it
> is broken. It does not in any way remove the struct page dependency from
> iommu-dma, it merely hides it so things can crash more easily in
> circumstances that clearly nobody's bothered to test.
> 
> > Leon, it would be great if you could also prepare an incremental patch
> > adding a map_phys callback to dma_map_ops, so the individual
> > arch-specific dma-mapping providers can then be converted (or simplified
> > in many cases) too.
> 
> Marek, I'm surprised that even you aren't seeing why that would at best be
> pointless churn. The fundamental design of dma_map_page() operating on
> struct page is that it sits in between alloc_pages() at the caller and
> kmap_atomic() deep down in the DMA API implementation (which also subsumes
> any dependencies on having a kernel virtual address at the implementation
> end). The natural working unit for whatever replaces dma_map_page() will be
> whatever the replacement for alloc_pages() returns, and the replacement for
> kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> would be an unadorned physical address) there cannot be any *meaningful*
> progress made towards removing the struct page dependency from the DMA API.
> If there is also a goal to kill off highmem before then, then logically we
> should just wait for that to land, then revert back to dma_map_single()
> being the first-class interface, and dma_map_page() can turn into a trivial
> page_to_virt() wrapper for the long tail of caller conversions.
> 
> Simply obfuscating the struct page dependency today by dressing it up as a
> phys_addr_t with implicit baggage is not in any way helpful. It only
> makes the code harder to understand and more bug-prone. Despite the
> disingenuous claims, it is quite blatantly the opposite of "efficient" for
> callers to do extra work to throw away useful information with
> page_to_phys(), only for the implementation to have to re-derive that
> information with pfn_valid()/phys_to_page().
> 
> And by "bug-prone" I also include greater distractions like this misguided
> idea that the same API could somehow work for non-memory addresses too, so
> then everyone can move on to bikeshedding VFIO while overlooking the
> fundamental flaws in the whole premise. I mean, besides all the issues I've
> already pointed out in that regard, not least the glaring fact that it's
> literally just a worse version of *an API we already have*: as DMA API
> maintainer, do you *really* approve of a design that depends on callers
> abusing DMA_ATTR_SKIP_CPU_SYNC, yet will still readily blow up if they did
> then call a dma_sync op?

Robin, Marek

I would like to ask you not to drop this series and to allow me to
gradually change the code during my VFIO DMABUF adventure.

The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
introduce a new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
pass it to both dma_map_phys() and dma_iova_link(). This flag will
indicate that the p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call
the right callbacks, which will set the IOMMU_MMIO flag and skip the CPU sync.

dma_map_phys() isn't entirely wrong; it just needs a few extra tweaks.
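
A minimal sketch of what that could look like at a call site --
DMA_ATTR_MMIO and dma_map_phys() do not exist upstream yet, and the
flag value below is invented for illustration:

  /* hypothetical attribute: phys is MMIO, never kmap'd or CPU synced */
  #define DMA_ATTR_MMIO   (1UL << 10)

  /* p2p BAR address: no struct page, no CPU cache maintenance */
  dma_addr_t dma = dma_map_phys(dev, bar_phys, size, DMA_TO_DEVICE,
                                DMA_ATTR_MMIO);

  /* and symmetrically on the IOVA link path */
  ret = dma_iova_link(dev, &state, bar_phys, 0, size, DMA_TO_DEVICE,
                      DMA_ATTR_MMIO);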

Thanks

> 
> Thanks,
> Robin.
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-30 13:40             ` Leon Romanovsky
@ 2025-07-30 14:28               ` Jason Gunthorpe
  2025-07-31  6:01                 ` Leon Romanovsky
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2025-07-30 14:28 UTC (permalink / raw)
  To: Leon Romanovsky, Matthew Wilcox, David Hildenbrand
  Cc: Robin Murphy, Marek Szyprowski, Christoph Hellwig,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm

On Wed, Jul 30, 2025 at 04:40:26PM +0300, Leon Romanovsky wrote:

> > The natural working unit for whatever replaces dma_map_page() will be
> > whatever the replacement for alloc_pages() returns, and the replacement for
> > kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> > would be an unadorned physical address) there cannot be any
> > *meaningful*

alloc_pages becomes legacy.

There will be some new API 'memdesc alloc'. If I understand Matthew's
plan properly - here is a sketch of changing iommu-pages:

--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -36,9 +36,10 @@ static_assert(sizeof(struct ioptdesc) <= sizeof(struct page));
  */
 void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
 {
+       struct ioptdesc *desc;
        unsigned long pgcnt;
-       struct folio *folio;
        unsigned int order;
+       void *addr;
 
        /* This uses page_address() on the memory. */
        if (WARN_ON(gfp & __GFP_HIGHMEM))
@@ -56,8 +57,8 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();
 
-       folio = __folio_alloc_node(gfp | __GFP_ZERO, order, nid);
-       if (unlikely(!folio))
+       addr = memdesc_alloc_pages(&desc, gfp | __GFP_ZERO, order, nid);
+       if (unlikely(!addr))
                return NULL;
 
        /*
@@ -73,7 +74,7 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
        mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, pgcnt);
        lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, pgcnt);
 
-       return folio_address(folio);
+       return addr;
 }

Here memdesc_alloc_pages() would kmalloc a 'struct ioptdesc', with
some further change so that virt_to_ioptdesc() indirects through the
new memdesc. See here:

https://kernelnewbies.org/MatthewWilcox/Memdescs
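
For the sketch above to work, the assumed helper would be roughly of
this shape (purely hypothetical, extrapolated from the Memdescs
write-up):

  /*
   * Allocate 2^order pages on @nid according to @gfp, hang a
   * caller-owned 'struct ioptdesc' off them, and return the kernel
   * virtual address of the memory.
   */
  void *memdesc_alloc_pages(struct ioptdesc **desc, gfp_t gfp,
                            unsigned int order, int nid);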

We don't end up with some kind of catch-all struct to mean 'cachable
CPU memory' anymore because every user gets their own unique "struct
XXXdesc". So the thinking has been that the phys_addr_t is the best
option. I guess the alternative would be the memdesc as a handle, but
I'm not sure that is such a good idea. 

People still express a desire to be able to do IO to cachable memory
that has a KVA through phys_to_virt but no memdesc/page allocation. I
don't know if this will happen but it doesn't seem like a good idea to
make it impossible by forcing memdesc types into low level APIs that
don't use them.

Also, the bio/scatterlist code between pin_user_pages() and DMA
mapping consolidates physically contiguous ranges. This runs faster if
you don't have to call page_to_phys() because everything is already a
phys_addr_t.

> > progress made towards removing the struct page dependency from the DMA API.
> > If there is also a goal to kill off highmem before then, then logically we
> > should just wait for that to land, then revert back to dma_map_single()
> > being the first-class interface, and dma_map_page() can turn into a trivial
> > page_to_virt() wrapper for the long tail of caller conversions.

As I said, there are many, many related projects here and we can
meaningfully make progress in parts. It is not functionally harmful to
do the phys to page conversion before calling the legacy
dma_ops/SWIOTLB etc. This avoids creating patch dependencies with
highmem removal and other projects.

So long as the legacy things (highmem, dma_ops, etc.) continue to work,
I think it is OK to accept some obfuscation to allow the modern things
to work better. The majority flow - no highmem, no dma_ops, no
swiotlb - does not require struct page. Having to do

  PTE -> phys -> page -> phys -> DMA

Does have a cost.
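
Spelled out, that round-trip looks something like this (illustrative
only; dev, pte, len and dir are assumed to be in scope, and
phys_to_page() is not available on every arch):

  phys_addr_t phys = PFN_PHYS(pte_pfn(pte));   /* PTE -> phys */
  struct page *page = phys_to_page(phys);      /* phys -> page, only for the API */
  dma_addr_t dma = dma_map_page(dev, page, 0,  /* page -> phys again inside */
                                len, dir);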

> The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
> introduce a new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
> pass it to both dma_map_phys() and dma_iova_link(). This flag will
> indicate that the p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call
> the right callbacks, which will set the IOMMU_MMIO flag and skip the CPU sync.

So the idea is that if the memory is non-cachable and has no KVA, you'd
call dma_iova_link(phys_addr, DMA_ATTR_MMIO) and dma_map_phys(phys_addr,
DMA_ATTR_MMIO)?

And then internally the dma_ops and dma_iommu would use the existing
map_page/map_resource variations based on the flag, thus ensuring that
MMIO is never kmap'd or cache flushed?

dma_map_resource is really then just
dma_map_phys(phys_addr, DMA_ATTR_MMIO)?
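
i.e. something like this sketch, assuming the proposed dma_map_phys()
takes an attrs argument (none of this exists yet):

  dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys,
                              size_t size, enum dma_data_direction dir,
                              unsigned long attrs)
  {
          return dma_map_phys(dev, phys, size, dir,
                              attrs | DMA_ATTR_MMIO);
  }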

I like this, I think it well addresses the concerns.

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-30 11:11           ` Robin Murphy
  2025-07-30 13:40             ` Leon Romanovsky
@ 2025-07-30 16:32             ` Marek Szyprowski
  2025-07-31 17:37             ` Matthew Wilcox
  2 siblings, 0 replies; 37+ messages in thread
From: Marek Szyprowski @ 2025-07-30 16:32 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig, Leon Romanovsky
  Cc: Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm, Jason Gunthorpe

On 30.07.2025 13:11, Robin Murphy wrote:
> On 2025-07-08 11:27 am, Marek Szyprowski wrote:
>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>> Thanks for this rework! I assume that the next step is to add 
>>>>> map_phys
>>>>> callback also to the dma_map_ops and teach various dma-mapping 
>>>>> providers
>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>> Probably Christoph will say yes, however I personally don't see any
>>>> benefit in this. Maybe I'm wrong here, but all existing .map_page()
>>>> implementation platforms don't support p2p anyway. They won't benefit
>>>> from such a conversion.
>>> I think that conversion should eventually happen, and rather sooner 
>>> than
>>> later.
>>
>> Agreed.
>>
>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if anyone
>> needs a stable branch with it.
>
> As the maintainer of iommu-dma, please drop the iommu-dma patch 
> because it is broken. It does not in any way remove the struct page 
> dependency from iommu-dma, it merely hides it so things can crash more 
> easily in circumstances that clearly nobody's bothered to test.
>
>> Leon, it would be great if you could also prepare an incremental patch
>> adding a map_phys callback to dma_map_ops, so the individual
>> arch-specific dma-mapping providers can then be converted (or simplified
>> in many cases) too.
>
> Marek, I'm surprised that even you aren't seeing why that would at 
> best be pointless churn. The fundamental design of dma_map_page() 
> operating on struct page is that it sits in between alloc_pages() at 
> the caller and kmap_atomic() deep down in the DMA API implementation 
> (which also subsumes any dependencies on having a kernel virtual 
> address at the implementation end). The natural working unit for 
> whatever replaces dma_map_page() will be whatever the replacement for 
> alloc_pages() returns, and the replacement for kmap_atomic() operates 
> on. Until that exists (and I simply cannot believe it would be an 
> unadorned physical address) there cannot be any *meaningful* progress 
> made towards removing the struct page dependency from the DMA API. If 
> there is also a goal to kill off highmem before then, then logically 
> we should just wait for that to land, then revert back to 
> dma_map_single() being the first-class interface, and dma_map_page() 
> can turn into a trivial page_to_virt() wrapper for the long tail of 
> caller conversions.
>
> Simply obfuscating the struct page dependency today by dressing it up 
> as a phys_addr_t with implicit baggage is not in any way helpful. 
> It only makes the code harder to understand and more bug-prone. 
> Despite the disingenuous claims, it is quite blatantly the opposite of 
> "efficient" for callers to do extra work to throw away useful 
> information with page_to_phys(), only for the implementation to have to 
> re-derive that information with pfn_valid()/phys_to_page().
>
> And by "bug-prone" I also include greater distractions like this 
> misguided idea that the same API could somehow work for non-memory 
> addresses too, so then everyone can move on to bikeshedding VFIO while 
> overlooking the fundamental flaws in the whole premise. I mean, 
> besides all the issues I've already pointed out in that regard, not 
> least the glaring fact that it's literally just a worse version of *an 
> API we already have*: as DMA API maintainer, do you *really* approve of 
> a design that depends on callers abusing DMA_ATTR_SKIP_CPU_SYNC, yet 
> will still readily blow up if they did then call a dma_sync op?
>
Robin, your concerns are right. I missed the fact that making everything 
depend on phys_addr_t would make the DMA-mapping API prone to various 
abuses. I need to think a bit more about this and try to better understand 
the PCI P2P case, which means that I will probably miss this merge 
window. I'm sorry for not being more active in the discussion, but I 
just got back from my holidays and I'm trying to catch up.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-30 14:28               ` Jason Gunthorpe
@ 2025-07-31  6:01                 ` Leon Romanovsky
  0 siblings, 0 replies; 37+ messages in thread
From: Leon Romanovsky @ 2025-07-31  6:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Wilcox, David Hildenbrand, Robin Murphy, Marek Szyprowski,
	Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Wed, Jul 30, 2025 at 11:28:18AM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 30, 2025 at 04:40:26PM +0300, Leon Romanovsky wrote:

<...>

> > The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
> > introduce a new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
> > pass it to both dma_map_phys() and dma_iova_link(). This flag will
> > indicate that the p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call
> > the right callbacks, which will set the IOMMU_MMIO flag and skip the CPU sync.
> 
> So the idea is that if the memory is non-cachable and has no KVA, you'd
> call dma_iova_link(phys_addr, DMA_ATTR_MMIO) and dma_map_phys(phys_addr,
> DMA_ATTR_MMIO)?

Yes

> 
> And then internally the dma_ops and dma_iommu would use the existing
> map_page/map_resource variations based on the flag, thus ensuring that
> MMIO is never kmap'd or cache flushed?
> 
> dma_map_resource is really then just
> dma_map_phys(phys_addr, DMA_ATTR_MMIO)?
> 
> I like this, I think it well addresses the concerns.

Yes, I had this idea and implementation before. :(

> 
> Jason
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-30 11:11           ` Robin Murphy
  2025-07-30 13:40             ` Leon Romanovsky
  2025-07-30 16:32             ` Marek Szyprowski
@ 2025-07-31 17:37             ` Matthew Wilcox
  2025-08-03 15:59               ` Jason Gunthorpe
  2 siblings, 1 reply; 37+ messages in thread
From: Matthew Wilcox @ 2025-07-31 17:37 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Marek Szyprowski, Christoph Hellwig, Leon Romanovsky,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon,
	Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu,
	Mathieu Desnoyers, Jérôme Glisse, Andrew Morton,
	linux-doc, linux-kernel, linuxppc-dev, iommu, virtualization,
	kasan-dev, linux-trace-kernel, linux-mm, Jason Gunthorpe

Hi Robin,

I don't know the DMA mapping code well and haven't reviewed this
patch set in particular, but I wanted to comment on some of the things
you say here.

> Marek, I'm surprised that even you aren't seeing why that would at best be
> pointless churn. The fundamental design of dma_map_page() operating on
> struct page is that it sits in between alloc_pages() at the caller and
> kmap_atomic() deep down in the DMA API implementation (which also subsumes
> any dependencies on having a kernel virtual address at the implementation
> end). The natural working unit for whatever replaces dma_map_page() will be
> whatever the replacement for alloc_pages() returns, and the replacement for
> kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> would be an unadorned physical address) there cannot be any *meaningful*
> progress made towards removing the struct page dependency from the DMA API.
> If there is also a goal to kill off highmem before then, then logically we
> should just wait for that to land, then revert back to dma_map_single()
> being the first-class interface, and dma_map_page() can turn into a trivial
> page_to_virt() wrapper for the long tail of caller conversions.

While I'm sure we'd all love to kill off highmem, that's not a realistic
goal for another ten years or so.  There are meaningful improvements we
can make, for example pulling page tables out of highmem, but we need to
keep file data and anonymous memory in highmem, so we'll need to support
DMA to highmem for the foreseeable future.

The replacement for kmap_atomic() is already here -- it's
kmap_(atomic|local)_pfn().  If a simple wrapper like kmap_local_phys()
would make this more palatable, that would be fine by me.  Might save
a bit of messing around with calculating offsets in each caller.
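
Such a wrapper could be as small as this sketch (no kmap_local_phys()
exists in the tree today, only kmap_local_pfn() does):

  static inline void *kmap_local_phys(phys_addr_t phys)
  {
          /* map the containing page, then add back the sub-page offset */
          return kmap_local_pfn(PHYS_PFN(phys)) + offset_in_page(phys);
  }

The result unmaps with plain kunmap_local(), as usual.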

As far as replacing alloc_pages() goes, some callers will still use
alloc_pages().  Others will use folio_alloc() or have used kmalloc().
Or maybe the caller won't have used any kind of page allocation because
they're doing I/O to something that isn't part of Linux's memory at all.
Part of the Grand Plan here is for Linux to catch up with Xen's ability
to do I/O to guests without allocating struct pages for every page of
memory in the guests.

You say that a physical address will need some adornment -- can you
elaborate on that for me?  It may be that I'm missing something
important here.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-07-31 17:37             ` Matthew Wilcox
@ 2025-08-03 15:59               ` Jason Gunthorpe
  2025-08-04  3:37                 ` Matthew Wilcox
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2025-08-03 15:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Robin Murphy, Marek Szyprowski, Christoph Hellwig,
	Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Thu, Jul 31, 2025 at 06:37:11PM +0100, Matthew Wilcox wrote:

> The replacement for kmap_atomic() is already here -- it's
> kmap_(atomic|local)_pfn().  If a simple wrapper like kmap_local_phys()
> would make this more palatable, that would be fine by me.  Might save
> a bit of messing around with calculating offsets in each caller.

I think that makes the general plan clearer. We should be removing
struct pages entirely from the insides of the DMA API layer and use
phys_addr_t, kmap_XX_phys(), phys_to_virt(), and so on.

The request from Christoph and Marek to clean up the dma_ops makes
sense in that context: we'd have to go into the ops and replace the
struct page kmaps/etc. with the phys-based ones.

This confines the struct page requirement for getting to a KVA to the
core mm code, and that sort of modularity is exactly the sort of thing
that could help entirely remove the struct page requirement for some
kinds of DMA someday.

Matthew, do you think it makes sense to introduce types to make this
clearer? We have two kinds of values that a phys_addr_t can store -
something compatible with kmap_XX_phys(), and something that isn't.

This recently came up in a long discussion in ARM KVM as well, which
had a similar confusion: a phys_addr_t was actually two very different
things inside its logic.

So what about some dedicated types:
 kphys_addr_t - A physical address that can be passed to
     kmap_XX_phys(), phys_to_virt(), etc.

 raw_phys_addr_t - A physical address that may not be cachable, may
     not be DRAM, and does not work with kmap_XX_phys()/etc.

We clearly have these two different ideas floating around in code,
page tables, etc.
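
In the style of pte_t/pgprot_t, that could be as simple as this sketch
(neither type exists anywhere today):

  typedef struct { phys_addr_t v; } kphys_addr_t;    /* kmap-able, cachable */
  typedef struct { phys_addr_t v; } raw_phys_addr_t; /* possibly MMIO, no KVA */

  /* widening a kmap-able address into a raw one is always safe */
  static inline raw_phys_addr_t kphys_to_raw(kphys_addr_t k)
  {
          return (raw_phys_addr_t){ .v = k.v };
  }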

I read some of Robin's concern as being that the struct page provided a
certain amount of type safety in the DMA API; this could provide similar.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-08-03 15:59               ` Jason Gunthorpe
@ 2025-08-04  3:37                 ` Matthew Wilcox
  2025-08-05 15:36                   ` Jason Gunthorpe
  0 siblings, 1 reply; 37+ messages in thread
From: Matthew Wilcox @ 2025-08-04  3:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, Marek Szyprowski, Christoph Hellwig,
	Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Sun, Aug 03, 2025 at 12:59:06PM -0300, Jason Gunthorpe wrote:
> Matthew, do you think it makes sense to introduce types to make this
> clearer? We have two kinds of values that a phys_addr_t can store -
> something compatible with kmap_XX_phys(), and something that isn't.

I was with you up until this point.  And then you said "What if we have
a raccoon that isn't a raccoon" and my brain derailed.

> This was recently a long discussion in ARM KVM as well which had a
> similar confusion that a phys_addr_t was actually two very different
> things inside its logic.

No.  A phys_addr_t is a phys_addr_t.  If something's abusing a
phys_addr_t to store something entirely different then THAT is what
should be using a different type.  We've defined what a phys_addr_t
is.  That was in Documentation/core-api/bus-virt-phys-mapping.rst
before Arnd removed it; to excerpt the relevant bit:

---

- CPU untranslated.  This is the "physical" address.  Physical address
  0 is what the CPU sees when it drives zeroes on the memory bus.

[...]
So why do we care about the physical address at all? We do need the physical
address in some cases, it's just not very often in normal code.  The physical
address is needed if you use memory mappings, for example, because the
"remap_pfn_range()" mm function wants the physical address of the memory to
be remapped as measured in units of pages, a.k.a. the pfn.

---

So if somebody is stuffing something else into phys_addr_t, *THAT* is
what needs to be fixed, not adding a new sub-type of phys_addr_t for
things which are actually phys_addr_t.

> We clearly have these two different ideas floating around in code,
> page tables, etc.

No.  No, we don't.  I've never heard of this asininity before.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
  2025-08-04  3:37                 ` Matthew Wilcox
@ 2025-08-05 15:36                   ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2025-08-05 15:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Robin Murphy, Marek Szyprowski, Christoph Hellwig,
	Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel,
	Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers,
	Jérôme Glisse, Andrew Morton, linux-doc, linux-kernel,
	linuxppc-dev, iommu, virtualization, kasan-dev,
	linux-trace-kernel, linux-mm

On Mon, Aug 04, 2025 at 04:37:56AM +0100, Matthew Wilcox wrote:
> On Sun, Aug 03, 2025 at 12:59:06PM -0300, Jason Gunthorpe wrote:
> > Matthew, do you think it makes sense to introduce types to make this
> > clearer? We have two kinds of values that a phys_addr_t can store -
> > something compatible with kmap_XX_phys(), and something that isn't.
> 
> I was with you up until this point.  And then you said "What if we have
> a raccoon that isn't a raccoon" and my brain derailed.

I thought it was clear...

   kmap_local_pfn(phys >> PAGE_SHIFT)
   phys_to_virt(phys)

Does not work for all values of phys. It is definitely illegal for
non-cachable MMIO. Agree?

There is a subset of phys that is cachable and has a struct page, and
that subset is usable with kmap_local_pfn()/etc.

phys is always this:

> - CPU untranslated.  This is the "physical" address.  Physical address
>   0 is what the CPU sees when it drives zeroes on the memory bus.

But that is a pure HW perspective. It doesn't say which of our SW APIs
are allowed to use this address.

We have callchains in DMA API land that want to do a kmap at the
bottom. It would be nice to mark, across the whole call chain, that the
phys_addr being passed around is actually required to be kmappable.

Because if you pass a non-kmappable, MMIO-backed phys it will explode
in some way on some platforms.

> > We clearly have these two different ideas floating around in code,
> > page tables, etc.

> No.  No, we don't.  I've never heard of this asininity before.

Welcome to the fun world of cachable and non-cachable memory.

Consider: today we can create struct pages of type
MEMORY_DEVICE_PCI_P2PDMA for non-cachable MMIO. I think today you
"can" use kmap to establish a cachable mapping in the vmap.

But it is *illegal* to establish a cachable CPU mapping of MMIO. Archs
are free to MCE if you do this - speculative cache line load of MMIO
can just error in HW inside the interconnect.

So, the phys_addr is always a "CPU untranslated physical address" but
the cachable/non-cachable cases, or DRAM vs MMIO, are sometimes
semantically very different things for the SW!

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2025-08-05 15:36 UTC | newest]

Thread overview: 37+ messages
     [not found] <CGME20250625131920eucas1p271b196cde042bd39ac08fb12beff5baf@eucas1p2.samsung.com>
2025-06-25 13:18 ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Leon Romanovsky
2025-06-25 13:18   ` [PATCH 1/8] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
2025-06-25 13:18   ` [PATCH 2/8] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
2025-06-25 13:19   ` [PATCH 3/8] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
2025-06-25 13:19   ` [PATCH 4/8] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
2025-06-25 13:19   ` [PATCH 5/8] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
2025-06-26 17:43     ` Alexander Potapenko
2025-06-26 18:45       ` Leon Romanovsky
2025-06-27 16:28         ` Alexander Potapenko
2025-06-25 13:19   ` [PATCH 6/8] dma-mapping: fail early if physical address is mapped through platform callback Leon Romanovsky
2025-07-25 20:04     ` Robin Murphy
2025-07-27  6:30       ` Leon Romanovsky
2025-06-25 13:19   ` [PATCH 7/8] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
2025-06-25 13:19   ` [PATCH 8/8] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
2025-07-15 13:24     ` Will Deacon
2025-07-15 13:58       ` Leon Romanovsky
2025-06-27 13:44   ` [PATCH 0/8] dma-mapping: migrate to physical address-based API Marek Szyprowski
2025-06-27 17:02     ` Leon Romanovsky
2025-06-30 13:38       ` Christoph Hellwig
2025-07-08 10:27         ` Marek Szyprowski
2025-07-08 11:00           ` Leon Romanovsky
2025-07-08 11:45             ` Marek Szyprowski
2025-07-08 12:06               ` Leon Romanovsky
2025-07-08 12:56                 ` Marek Szyprowski
2025-07-08 15:57                   ` Marek Szyprowski
2025-07-30 11:11           ` Robin Murphy
2025-07-30 13:40             ` Leon Romanovsky
2025-07-30 14:28               ` Jason Gunthorpe
2025-07-31  6:01                 ` Leon Romanovsky
2025-07-30 16:32             ` Marek Szyprowski
2025-07-31 17:37             ` Matthew Wilcox
2025-08-03 15:59               ` Jason Gunthorpe
2025-08-04  3:37                 ` Matthew Wilcox
2025-08-05 15:36                   ` Jason Gunthorpe
2025-07-06  6:00       ` Leon Romanovsky
2025-07-25 20:05   ` Robin Murphy
2025-07-29 14:03     ` Jason Gunthorpe
