* [PATCH v4 00/16] dma-mapping: migrate to physical address-based API
@ 2025-08-19 17:36 Leon Romanovsky
  2025-08-19 17:36 ` [PATCH v4 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
                   ` (17 more replies)
  0 siblings, 18 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

Changelog:
v4:
 * Fixed kbuild error with mismatch in kmsan function declaration due to
   rebase error.
v3: https://lore.kernel.org/all/cover.1755193625.git.leon@kernel.org
 * Fixed typo in "cacheable" word
 * Simplified kmsan patch a lot to be simple argument refactoring
v2: https://lore.kernel.org/all/cover.1755153054.git.leon@kernel.org
 * Used commit messages and cover letter from Jason
 * Moved setting IOMMU_MMIO flag to dma_info_to_prot function
 * Micro-optimized the code
 * Rebased code on v6.17-rc1
v1: https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org
 * Added new DMA_ATTR_MMIO attribute to indicate
   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
 * Rewrote dma_map_* functions to use this new attribute
v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
------------------------------------------------------------------------

This series refactors the DMA mapping API to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality, where
DMA operations work with physical addresses, not page structures.

The series maintains export symbol backward compatibility by keeping
the old page-based API as wrapper functions around the new physical
address-based implementations.

This series refactors the DMA mapping API to provide a phys_addr_t-based,
struct-page-free external API that can handle all the mapping cases we
want in modern systems:

 - struct page based cacheable DRAM
 - struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer-to-peer non-cacheable
   MMIO
 - struct page-less PCI peer-to-peer non-cacheable MMIO
 - struct page-less "resource" MMIO

Overall this gets much closer to Matthew's long-term wish for
struct-pageless IO to cacheable DRAM. The remaining primary work would
be on the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
a phys_addr_t without a struct page.

The general design is to remove struct page usage entirely from the
DMA API inner layers. Flows that need a KVA for the physical address
can use kmap_local_pfn() or phys_to_virt(). This isolates the struct
page requirements to MM code only. Long term, all removals of struct
page usage support Matthew's memdesc project, which seeks to
substantially transform how struct page works.
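
For illustration, a minimal sketch of such a flow (the helper name is
hypothetical, and the physical address is assumed to be kmap-able
system RAM, never DMA_ATTR_MMIO memory):

  /*
   * Sketch only: get a temporary CPU address for a phys_addr_t that is
   * known to be kmap-able RAM. Highmem needs kmap_local_pfn() and a
   * matching kunmap_local() by the caller; lowmem can use phys_to_virt().
   */
  static void *cpu_addr_for_phys(phys_addr_t phys)
  {
  	if (PageHighMem(phys_to_page(phys)))
  		return kmap_local_pfn(PHYS_PFN(phys)) +
  		       offset_in_page(phys);
  	return phys_to_virt(phys);
  }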

Instead, the DMA API internals are made to work on phys_addr_t.
Internally there are still dedicated 'page' and 'resource' flows,
except they are now distinguished by a new DMA_ATTR_MMIO attribute
instead of by callchain. Both flows use the same phys_addr_t.

When DMA_ATTR_MMIO is specified, things work similarly to the existing
'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
pfn_valid(), etc. are never called on the phys_addr_t. This requires
rejecting any configuration that would need swiotlb. CPU cache
flushing is not required, and is avoided, as DMA_ATTR_MMIO also
indicates the address has no cacheable mappings. This effectively
removes any DMA API side requirement to have a struct page when
DMA_ATTR_MMIO is used.

In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
except that on the common path (no cache flush, no swiotlb) it never
touches a struct page. When cache flushing or swiotlb copying is
needed, kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
usage. This was already the case on the unmap side; now the map side
is symmetric.
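
As a rough sketch of the map-side pattern the dma-direct code ends up
with (see the dma_direct_map_phys() change in patch 7; simplified,
swiotlb bouncing and error handling omitted):

  	dma_addr_t dma_addr;

  	if (attrs & DMA_ATTR_MMIO)
  		dma_addr = phys;		/* bus address, no translation */
  	else
  		dma_addr = phys_to_dma(dev, phys);

  	/* cache maintenance is skipped entirely for DMA_ATTR_MMIO */
  	if (!dev_is_dma_coherent(dev) &&
  	    !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
  		arch_sync_dma_for_device(phys, size, dir);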

Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
path must also set it. This corrects some existing bugs where iommu
mappings for P2P MMIO were improperly marked IOMMU_CACHE.

Since DMA_ATTR_MMIO works with all the existing DMA map entry points,
particularly dma_iova_link(), this finally provides a way to use the
new DMA API to map PCI P2P MMIO without creating a struct page. The
VFIO DMABUF series demonstrates how this works. This is intended to
replace the incorrect driver use of dma_map_resource() on PCI BAR
addresses.
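
For example, a driver that today (incorrectly) calls dma_map_resource()
on a PCI BAR address could, with this series, do roughly the following
(sketch only; dma_map_phys()/dma_unmap_phys() are exported in patch 11,
and bar_phys is a hypothetical BAR physical address obtained from the
p2pdma machinery - dma_iova_link() accepts the same attribute):

  	dma_addr_t dma;

  	dma = dma_map_phys(dev, bar_phys, size, DMA_TO_DEVICE, DMA_ATTR_MMIO);
  	if (dma_mapping_error(dev, dma))
  		return -EIO;

  	/* ... issue the peer-to-peer transfer ... */

  	dma_unmap_phys(dev, dma, size, DMA_TO_DEVICE, DMA_ATTR_MMIO);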

This series does the core code and modern flows. A followup series
will give the same treatment to the legacy dma_ops implementation.

Thanks

Leon Romanovsky (16):
  dma-mapping: introduce new DMA attribute to indicate MMIO memory
  iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
  dma-debug: refactor to use physical addresses for page mapping
  dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
  dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  kmsan: convert kmsan_handle_dma to use physical addresses
  dma-mapping: handle MMIO flow in dma_map|unmap_page
  xen: swiotlb: Open code map_resource callback
  dma-mapping: export new dma_*map_phys() interface
  mm/hmm: migrate to physical address-based DMA mapping API
  mm/hmm: properly take MMIO path
  block-dma: migrate to dma_map_phys instead of map_page
  block-dma: properly take MMIO path
  nvme-pci: unmap MMIO pages with appropriate interface

 Documentation/core-api/dma-api.rst        |   4 +-
 Documentation/core-api/dma-attributes.rst |  18 ++++
 arch/powerpc/kernel/dma-iommu.c           |   4 +-
 block/blk-mq-dma.c                        |  15 ++-
 drivers/iommu/dma-iommu.c                 |  61 +++++------
 drivers/nvme/host/pci.c                   |  18 +++-
 drivers/virtio/virtio_ring.c              |   4 +-
 drivers/xen/swiotlb-xen.c                 |  21 +++-
 include/linux/blk-mq-dma.h                |   6 +-
 include/linux/blk_types.h                 |   2 +
 include/linux/dma-direct.h                |   2 -
 include/linux/dma-map-ops.h               |   8 +-
 include/linux/dma-mapping.h               |  33 ++++++
 include/linux/iommu-dma.h                 |  11 +-
 include/linux/kmsan.h                     |   9 +-
 include/trace/events/dma.h                |   9 +-
 kernel/dma/debug.c                        |  71 ++++---------
 kernel/dma/debug.h                        |  37 ++-----
 kernel/dma/direct.c                       |  22 +---
 kernel/dma/direct.h                       |  52 ++++++----
 kernel/dma/mapping.c                      | 117 +++++++++++++---------
 kernel/dma/ops_helpers.c                  |   6 +-
 mm/hmm.c                                  |  19 ++--
 mm/kmsan/hooks.c                          |   5 +-
 rust/kernel/dma.rs                        |   3 +
 tools/virtio/linux/kmsan.h                |   2 +-
 26 files changed, 305 insertions(+), 254 deletions(-)

-- 
2.50.1



* [PATCH v4 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 13:03   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 02/16] iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link() Leon Romanovsky
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
that reside in memory-mapped I/O (MMIO) regions, such as device BARs
exposed through the host bridge, which are accessible for peer-to-peer
(P2P) DMA.

This attribute is especially useful for exporting device memory to other
devices for DMA without CPU involvement, and avoids unnecessary or
potentially detrimental CPU cache maintenance calls.

DMA_ATTR_MMIO is intended to provide dma_map_resource() functionality
without requiring callers to call a separate function and branch on
the memory type.
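
A hedged before/after sketch of what this means for a caller
(dma_map_phys() is added later in this series; "is_mmio" stands for
whatever p2pdma-derived knowledge the caller already has):

  	/* before: callers must pick a different mapping function */
  	if (is_mmio)
  		dma = dma_map_resource(dev, phys, size, dir, 0);
  	else
  		dma = dma_map_page(dev, phys_to_page(phys),
  				   offset_in_page(phys), size, dir);

  	/* after: one entry point, the attribute carries the distinction */
  	dma = dma_map_phys(dev, phys, size, dir,
  			   is_mmio ? DMA_ATTR_MMIO : 0);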

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/core-api/dma-attributes.rst | 18 ++++++++++++++++++
 include/linux/dma-mapping.h               | 20 ++++++++++++++++++++
 include/trace/events/dma.h                |  3 ++-
 rust/kernel/dma.rs                        |  3 +++
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..0bdc2be65e57 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,21 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
 subsystem that the buffer is fully accessible at the elevated privilege
 level (and ideally inaccessible or at least read-only at the
 lesser-privileged levels).
+
+DMA_ATTR_MMIO
+-------------
+
+This attribute indicates the physical address is not normal system
+memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+functions, it may not be cacheable, and access using CPU load/store
+instructions may not be allowed.
+
+Usually this will be used to describe MMIO addresses, or other non-cacheable
+register addresses. When DMA mapping this sort of address we call
+the operation Peer to Peer as one device is DMA'ing to another device.
+For PCI devices the p2pdma APIs must be used to determine if
+DMA_ATTR_MMIO is appropriate.
+
+For architectures that require cache flushing for DMA coherence
+DMA_ATTR_MMIO will not perform any cache flushing. The address
+provided must never be mapped cacheable into the CPU.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb..4254fd9bdf5d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -58,6 +58,26 @@
  */
 #define DMA_ATTR_PRIVILEGED		(1UL << 9)
 
+/*
+ * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
+ *
+ * This attribute indicates the physical address is not normal system
+ * memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+ * functions, it may not be cacheable, and access using CPU load/store
+ * instructions may not be allowed.
+ *
+ * Usually this will be used to describe MMIO addresses, or other non-cacheable
+ * register addresses. When DMA mapping this sort of address we call
+ * the operation Peer to Peer as one device is DMA'ing to another device.
+ * For PCI devices the p2pdma APIs must be used to determine if DMA_ATTR_MMIO
+ * is appropriate.
+ *
+ * For architectures that require cache flushing for DMA coherence
+ * DMA_ATTR_MMIO will not perform any cache flushing. The address
+ * provided must never be mapped cacheable into the CPU.
+ */
+#define DMA_ATTR_MMIO		(1UL << 10)
+
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
  * be given to a device to use as a DMA source or target.  It is specific to a
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c..ee90d6f1dcf3 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -31,7 +31,8 @@ TRACE_DEFINE_ENUM(DMA_NONE);
 		{ DMA_ATTR_FORCE_CONTIGUOUS, "FORCE_CONTIGUOUS" }, \
 		{ DMA_ATTR_ALLOC_SINGLE_PAGES, "ALLOC_SINGLE_PAGES" }, \
 		{ DMA_ATTR_NO_WARN, "NO_WARN" }, \
-		{ DMA_ATTR_PRIVILEGED, "PRIVILEGED" })
+		{ DMA_ATTR_PRIVILEGED, "PRIVILEGED" }, \
+		{ DMA_ATTR_MMIO, "MMIO" })
 
 DECLARE_EVENT_CLASS(dma_map,
 	TP_PROTO(struct device *dev, phys_addr_t phys_addr, dma_addr_t dma_addr,
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index 2bc8ab51ec28..61d9eed7a786 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -242,6 +242,9 @@ pub mod attrs {
     /// Indicates that the buffer is fully accessible at an elevated privilege level (and
     /// ideally inaccessible or at least read-only at lesser-privileged levels).
     pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
+
+    /// Indicates that the buffer is MMIO memory.
+    pub const DMA_ATTR_MMIO: Attrs = Attrs(bindings::DMA_ATTR_MMIO);
 }
 
 /// An abstraction of the `dma_alloc_coherent` API.
-- 
2.50.1



* [PATCH v4 02/16] iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
  2025-08-19 17:36 ` [PATCH v4 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-19 17:36 ` [PATCH v4 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

This will replace the hacky use of DMA_ATTR_SKIP_CPU_SYNC to avoid
touching the possibly non-KVA MMIO memory.

Also correct the incorrect caching attribute for the IOMMU: MMIO
memory must not be cacheable inside the IOMMU mapping, or it can
cause system problems. Set IOMMU_MMIO for DMA_ATTR_MMIO.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fe..e1185ba73e23 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -724,7 +724,12 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, struct device *dev
 static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
 		     unsigned long attrs)
 {
-	int prot = coherent ? IOMMU_CACHE : 0;
+	int prot;
+
+	if (attrs & DMA_ATTR_MMIO)
+		prot = IOMMU_MMIO;
+	else
+		prot = coherent ? IOMMU_CACHE : 0;
 
 	if (attrs & DMA_ATTR_PRIVILEGED)
 		prot |= IOMMU_PRIV;
@@ -1838,12 +1843,13 @@ static int __dma_iova_link(struct device *dev, dma_addr_t addr,
 		unsigned long attrs)
 {
 	bool coherent = dev_is_dma_coherent(dev);
+	int prot = dma_info_to_prot(dir, coherent, attrs);
 
-	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+	if (!coherent && !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
 		arch_sync_dma_for_device(phys, size, dir);
 
 	return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
-			dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC);
+			prot, GFP_ATOMIC);
 }
 
 static int iommu_dma_iova_bounce_and_link(struct device *dev, dma_addr_t addr,
@@ -1949,9 +1955,13 @@ int dma_iova_link(struct device *dev, struct dma_iova_state *state,
 		return -EIO;
 
 	if (dev_use_swiotlb(dev, size, dir) &&
-	    iova_unaligned(iovad, phys, size))
+	    iova_unaligned(iovad, phys, size)) {
+		if (attrs & DMA_ATTR_MMIO)
+			return -EPERM;
+
 		return iommu_dma_iova_link_swiotlb(dev, state, phys, offset,
 				size, dir, attrs);
+	}
 
 	return __dma_iova_link(dev, state->addr + offset - iova_start_pad,
 			phys - iova_start_pad,
-- 
2.50.1



* [PATCH v4 03/16] dma-debug: refactor to use physical addresses for page mapping
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
  2025-08-19 17:36 ` [PATCH v4 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
  2025-08-19 17:36 ` [PATCH v4 02/16] iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link() Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 13:19   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert the DMA debug infrastructure from page-based to physical
address-based mapping in preparation for DMA mapping routines that
take physical addresses directly.

The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy type is introduced to distinguish physical address mappings
from other debug entry types. All callers throughout the codebase are updated
to pass physical addresses directly, eliminating the need for page-to-physical
conversion in the debug layer.

This refactoring eliminates the need to convert between page pointers and
physical addresses in the debug layer, making the code more efficient and
consistent with the DMA mapping API's physical address focus.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 Documentation/core-api/dma-api.rst |  4 ++--
 kernel/dma/debug.c                 | 28 +++++++++++++++++-----------
 kernel/dma/debug.h                 | 16 +++++++---------
 kernel/dma/mapping.c               | 15 ++++++++-------
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 3087bea715ed..ca75b3541679 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -761,7 +761,7 @@ example warning message may look like this::
 	[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
 	[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
 	[<ffffffff803c7ea3>] check_unmap+0x203/0x490
-	[<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+	[<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
 	[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
 	[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
 	[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@@ -855,7 +855,7 @@ that a driver may be leaking mappings.
 dma-debug interface debug_dma_mapping_error() to debug drivers that fail
 to check DMA mapping errors on addresses returned by dma_map_single() and
 dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
 the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
 this flag is still set, prints warning message that includes call trace that
 leads up to the unmap. This interface can be called from dma_mapping_error()
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index e43c6de2bce4..da6734e3a4ce 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
 	dma_debug_sg,
 	dma_debug_coherent,
 	dma_debug_resource,
+	dma_debug_phy,
 };
 
 enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
 	[dma_debug_sg] = "scatter-gather",
 	[dma_debug_coherent] = "coherent",
 	[dma_debug_resource] = "resource",
+	[dma_debug_phy] = "phy",
 };
 
 static const char *dir2name[] = {
@@ -1201,9 +1203,8 @@ void debug_dma_map_single(struct device *dev, const void *addr,
 }
 EXPORT_SYMBOL(debug_dma_map_single);
 
-void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
-			size_t size, int direction, dma_addr_t dma_addr,
-			unsigned long attrs)
+void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		int direction, dma_addr_t dma_addr, unsigned long attrs)
 {
 	struct dma_debug_entry *entry;
 
@@ -1218,19 +1219,24 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 		return;
 
 	entry->dev       = dev;
-	entry->type      = dma_debug_single;
-	entry->paddr	 = page_to_phys(page) + offset;
+	entry->type      = dma_debug_phy;
+	entry->paddr	 = phys;
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
 	entry->map_err_type = MAP_ERR_NOT_CHECKED;
 
-	check_for_stack(dev, page, offset);
+	if (!(attrs & DMA_ATTR_MMIO)) {
+		struct page *page = phys_to_page(phys);
+		size_t offset = offset_in_page(phys);
 
-	if (!PageHighMem(page)) {
-		void *addr = page_address(page) + offset;
+		check_for_stack(dev, page, offset);
 
-		check_for_illegal_area(dev, addr, size);
+		if (!PageHighMem(page)) {
+			void *addr = page_address(page) + offset;
+
+			check_for_illegal_area(dev, addr, size);
+		}
 	}
 
 	add_dma_entry(entry, attrs);
@@ -1274,11 +1280,11 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 }
 EXPORT_SYMBOL(debug_dma_mapping_error);
 
-void debug_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+void debug_dma_unmap_phys(struct device *dev, dma_addr_t dma_addr,
 			  size_t size, int direction)
 {
 	struct dma_debug_entry ref = {
-		.type           = dma_debug_single,
+		.type           = dma_debug_phy,
 		.dev            = dev,
 		.dev_addr       = dma_addr,
 		.size           = size,
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae..76adb42bffd5 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -9,12 +9,11 @@
 #define _KERNEL_DMA_DEBUG_H
 
 #ifdef CONFIG_DMA_API_DEBUG
-extern void debug_dma_map_page(struct device *dev, struct page *page,
-			       size_t offset, size_t size,
-			       int direction, dma_addr_t dma_addr,
+extern void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+			       size_t size, int direction, dma_addr_t dma_addr,
 			       unsigned long attrs);
 
-extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+extern void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction);
 
 extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
@@ -55,14 +54,13 @@ extern void debug_dma_sync_sg_for_device(struct device *dev,
 					 struct scatterlist *sg,
 					 int nelems, int direction);
 #else /* CONFIG_DMA_API_DEBUG */
-static inline void debug_dma_map_page(struct device *dev, struct page *page,
-				      size_t offset, size_t size,
-				      int direction, dma_addr_t dma_addr,
-				      unsigned long attrs)
+static inline void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+				      size_t size, int direction,
+				      dma_addr_t dma_addr, unsigned long attrs)
 {
 }
 
-static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
 					size_t size, int direction)
 {
 }
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 107e4a4d251d..4c1dfbabb8ae 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -157,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
+	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
@@ -165,16 +166,15 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
+	    arch_dma_map_page_direct(dev, phys + size))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
-	trace_dma_map_page(dev, page_to_phys(page) + offset, addr, size, dir,
-			   attrs);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
+	trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
 	return addr;
 }
@@ -194,7 +194,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_page(dev, addr, size, dir, attrs);
-	debug_dma_unmap_page(dev, addr, size, dir);
+	debug_dma_unmap_phys(dev, addr, size, dir);
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
@@ -712,7 +712,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 	if (page) {
 		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
 				      size, dir, gfp, 0);
-		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+		debug_dma_map_phys(dev, page_to_phys(page), size, dir,
+				   *dma_handle, 0);
 	} else {
 		trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
 	}
@@ -738,7 +739,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 	trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
-	debug_dma_unmap_page(dev, dma_handle, size, dir);
+	debug_dma_unmap_phys(dev, dma_handle, size, dir);
 	__dma_free_pages(dev, size, page, dma_handle, dir);
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
-- 
2.50.1



* [PATCH v4 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (2 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 13:27   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

As a preparation for following map_page -> map_phys API conversion,
let's rename trace_dma_*map_page() to be trace_dma_*map_phys().

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/trace/events/dma.h | 4 ++--
 kernel/dma/mapping.c       | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index ee90d6f1dcf3..84416c7d6bfa 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -72,7 +72,7 @@ DEFINE_EVENT(dma_map, name, \
 		 size_t size, enum dma_data_direction dir, unsigned long attrs), \
 	TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))
 
-DEFINE_MAP_EVENT(dma_map_page);
+DEFINE_MAP_EVENT(dma_map_phys);
 DEFINE_MAP_EVENT(dma_map_resource);
 
 DECLARE_EVENT_CLASS(dma_unmap,
@@ -110,7 +110,7 @@ DEFINE_EVENT(dma_unmap, name, \
 		 enum dma_data_direction dir, unsigned long attrs), \
 	TP_ARGS(dev, addr, size, dir, attrs))
 
-DEFINE_UNMAP_EVENT(dma_unmap_page);
+DEFINE_UNMAP_EVENT(dma_unmap_phys);
 DEFINE_UNMAP_EVENT(dma_unmap_resource);
 
 DECLARE_EVENT_CLASS(dma_alloc_class,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 4c1dfbabb8ae..fe1f0da6dc50 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -173,7 +173,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
-	trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
 	return addr;
@@ -193,7 +193,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		iommu_dma_unmap_page(dev, addr, size, dir, attrs);
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
-	trace_dma_unmap_page(dev, addr, size, dir, attrs);
+	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
-- 
2.50.1



* [PATCH v4 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (3 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 13:38   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.

The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.

All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 14 ++++++--------
 include/linux/iommu-dma.h |  7 +++----
 kernel/dma/mapping.c      |  4 ++--
 kernel/dma/ops_helpers.c  |  6 +++---
 4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e1185ba73e23..aea119f32f96 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1195,11 +1195,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
 	return iova_offset(iovad, phys | size);
 }
 
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-	      unsigned long offset, size_t size, enum dma_data_direction dir,
-	      unsigned long attrs)
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs)
 {
-	phys_addr_t phys = page_to_phys(page) + offset;
 	bool coherent = dev_is_dma_coherent(dev);
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1227,7 +1225,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 	return iova;
 }
 
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1346,7 +1344,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		iommu_dma_unmap_page(dev, sg_dma_address(s),
+		iommu_dma_unmap_phys(dev, sg_dma_address(s),
 				sg_dma_len(s), dir, attrs);
 }
 
@@ -1359,8 +1357,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
 	sg_dma_mark_swiotlb(sg);
 
 	for_each_sg(sg, s, nents, i) {
-		sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
-				s->offset, s->length, dir, attrs);
+		sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s),
+				s->length, dir, attrs);
 		if (sg_dma_address(s) == DMA_MAPPING_ERROR)
 			goto out_unmap;
 		sg_dma_len(s) = s->length;
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 508beaa44c39..485bdffed988 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -21,10 +21,9 @@ static inline bool use_dma_iommu(struct device *dev)
 }
 #endif /* CONFIG_IOMMU_DMA */
 
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir,
-		unsigned long attrs);
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs);
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction dir, unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fe1f0da6dc50..58482536db9b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	    arch_dma_map_page_direct(dev, phys + size))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else if (use_dma_iommu(dev))
-		addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
+		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	kmsan_handle_dma(page, offset, size, dir);
@@ -190,7 +190,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	    arch_dma_unmap_page_direct(dev, addr + size))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
-		iommu_dma_unmap_page(dev, addr, size, dir, attrs);
+		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
 	else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 9afd569eadb9..6f9d604d9d40 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -72,8 +72,8 @@ struct page *dma_common_alloc_pages(struct device *dev, size_t size,
 		return NULL;
 
 	if (use_dma_iommu(dev))
-		*dma_handle = iommu_dma_map_page(dev, page, 0, size, dir,
-						 DMA_ATTR_SKIP_CPU_SYNC);
+		*dma_handle = iommu_dma_map_phys(dev, page_to_phys(page), size,
+						 dir, DMA_ATTR_SKIP_CPU_SYNC);
 	else
 		*dma_handle = ops->map_page(dev, page, 0, size, dir,
 					    DMA_ATTR_SKIP_CPU_SYNC);
@@ -92,7 +92,7 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	if (use_dma_iommu(dev))
-		iommu_dma_unmap_page(dev, dma_handle, size, dir,
+		iommu_dma_unmap_phys(dev, dma_handle, size, dir,
 				     DMA_ATTR_SKIP_CPU_SYNC);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, dma_handle, size, dir,
-- 
2.50.1



* [PATCH v4 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (4 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 13:49   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Combine the iommu_dma_*map_phys and iommu_dma_*map_resource interfaces
in order to allow a single phys_addr_t flow.

In the following patches, iommu_dma_map_resource() will be removed
in favour of the iommu_dma_map_phys(..., attrs | DMA_ATTR_MMIO) flow.
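
After this change the single entry point serves both cases (sketch;
bar_phys is a hypothetical peer device BAR address):

  	/* regular kmap-able RAM */
  	dma = iommu_dma_map_phys(dev, page_to_phys(page) + offset,
  				 size, dir, 0);

  	/* MMIO, e.g. a PCI BAR: no swiotlb, no cache maintenance */
  	dma = iommu_dma_map_phys(dev, bar_phys, size, dir, DMA_ATTR_MMIO);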

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index aea119f32f96..6804aaf034a1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1211,16 +1211,19 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
 	 */
 	if (dev_use_swiotlb(dev, size, dir) &&
 	    iova_unaligned(iovad, phys, size)) {
+		if (attrs & DMA_ATTR_MMIO)
+			return DMA_MAPPING_ERROR;
+
 		phys = iommu_dma_map_swiotlb(dev, phys, size, dir, attrs);
 		if (phys == (phys_addr_t)DMA_MAPPING_ERROR)
 			return DMA_MAPPING_ERROR;
 	}
 
-	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+	if (!coherent && !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
 		arch_sync_dma_for_device(phys, size, dir);
 
 	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
-	if (iova == DMA_MAPPING_ERROR)
+	if (iova == DMA_MAPPING_ERROR && !(attrs & DMA_ATTR_MMIO))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 	return iova;
 }
@@ -1228,10 +1231,14 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
 void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	phys_addr_t phys;
 
-	phys = iommu_iova_to_phys(domain, dma_handle);
+	if (attrs & DMA_ATTR_MMIO) {
+		__iommu_dma_unmap(dev, dma_handle, size);
+		return;
+	}
+
+	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
 	if (WARN_ON(!phys))
 		return;
 
-- 
2.50.1



* [PATCH v4 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (5 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 14:19   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource(), and
dma_direct_map_phys() is extended to support the MMIO path as well.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 arch/powerpc/kernel/dma-iommu.c |  4 +--
 include/linux/dma-map-ops.h     |  8 ++---
 kernel/dma/direct.c             |  6 ++--
 kernel/dma/direct.h             | 52 +++++++++++++++++++++------------
 kernel/dma/mapping.c            |  8 ++---
 5 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 4d64a5db50f3..0359ab72cd3b 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,7 +14,7 @@
 #define can_map_direct(dev, addr) \
 	((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))
 
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr)
 {
 	if (likely(!dev->bus_dma_limit))
 		return false;
@@ -24,7 +24,7 @@ bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
 
 #define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)
 
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle)
 {
 	if (likely(!dev->bus_dma_limit))
 		return false;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f48e5fb88bd5..71f5b3025415 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -392,15 +392,15 @@ void *arch_dma_set_uncached(void *addr, size_t size);
 void arch_dma_clear_uncached(void *addr, size_t size);
 
 #ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle);
 bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
 		int nents);
 bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
 		int nents);
 #else
-#define arch_dma_map_page_direct(d, a)		(false)
-#define arch_dma_unmap_page_direct(d, a)	(false)
+#define arch_dma_map_phys_direct(d, a)		(false)
+#define arch_dma_unmap_phys_direct(d, a)	(false)
 #define arch_dma_map_sg_direct(d, s, n)		(false)
 #define arch_dma_unmap_sg_direct(d, s, n)	(false)
 #endif
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 24c359d9c879..fa75e3070073 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -453,7 +453,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		if (sg_dma_is_bus_address(sg))
 			sg_dma_unmark_bus_address(sg);
 		else
-			dma_direct_unmap_page(dev, sg->dma_address,
+			dma_direct_unmap_phys(dev, sg->dma_address,
 					      sg_dma_len(sg), dir, attrs);
 	}
 }
@@ -476,8 +476,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 			 */
 			break;
 		case PCI_P2PDMA_MAP_NONE:
-			sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
-					sg->offset, sg->length, dir, attrs);
+			sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+					sg->length, dir, attrs);
 			if (sg->dma_address == DMA_MAPPING_ERROR) {
 				ret = -EIO;
 				goto out_unmap;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index d2c0b7e632fc..92dbadcd3b2f 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -80,42 +80,56 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 		arch_dma_mark_clean(paddr, size);
 }
 
-static inline dma_addr_t dma_direct_map_page(struct device *dev,
-		struct page *page, unsigned long offset, size_t size,
-		enum dma_data_direction dir, unsigned long attrs)
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+		phys_addr_t phys, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
 {
-	phys_addr_t phys = page_to_phys(page) + offset;
-	dma_addr_t dma_addr = phys_to_dma(dev, phys);
+	dma_addr_t dma_addr;
+	bool capable;
 
 	if (is_swiotlb_force_bounce(dev)) {
-		if (is_pci_p2pdma_page(page))
-			return DMA_MAPPING_ERROR;
+		if (attrs & DMA_ATTR_MMIO)
+			goto err_overflow;
+
 		return swiotlb_map(dev, phys, size, dir, attrs);
 	}
 
-	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
-	    dma_kmalloc_needs_bounce(dev, size, dir)) {
-		if (is_pci_p2pdma_page(page))
-			return DMA_MAPPING_ERROR;
-		if (is_swiotlb_active(dev))
+	if (attrs & DMA_ATTR_MMIO)
+		dma_addr = phys;
+	else
+		dma_addr = phys_to_dma(dev, phys);
+
+	capable = dma_capable(dev, dma_addr, size, !(attrs & DMA_ATTR_MMIO));
+	if (unlikely(!capable) || dma_kmalloc_needs_bounce(dev, size, dir)) {
+		if (is_swiotlb_active(dev) && !(attrs & DMA_ATTR_MMIO))
 			return swiotlb_map(dev, phys, size, dir, attrs);
 
-		dev_WARN_ONCE(dev, 1,
-			     "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
-			     &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
-		return DMA_MAPPING_ERROR;
+		goto err_overflow;
 	}
 
-	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+	if (!dev_is_dma_coherent(dev) &&
+	    !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
 		arch_sync_dma_for_device(phys, size, dir);
 	return dma_addr;
+
+err_overflow:
+	dev_WARN_ONCE(
+		dev, 1,
+		"DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+		&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+	return DMA_MAPPING_ERROR;
 }
 
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	phys_addr_t phys = dma_to_phys(dev, addr);
+	phys_addr_t phys;
+
+	if (attrs & DMA_ATTR_MMIO)
+		/* nothing to do: uncached and no swiotlb */
+		return;
 
+	phys = dma_to_phys(dev, addr);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58482536db9b..80481a873340 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -166,8 +166,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_page_direct(dev, phys + size))
-		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+	    arch_dma_map_phys_direct(dev, phys + size))
+		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
@@ -187,8 +187,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 
 	BUG_ON(!valid_dma_direction(dir));
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_unmap_page_direct(dev, addr + size))
-		dma_direct_unmap_page(dev, addr, size, dir, attrs);
+	    arch_dma_unmap_phys_direct(dev, addr + size))
+		dma_direct_unmap_phys(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
 	else
-- 
2.50.1



* [PATCH v4 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (6 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 15:00   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert the KMSAN DMA handling function from a page-based to a
physical address-based interface.

The refactoring changes the kmsan_handle_dma() parameters from
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). The existing semantics, where callers are expected to
provide only kmap-able memory, are preserved.
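
The caller-side conversion is mechanical (sketch):

  	/* before */
  	kmsan_handle_dma(page, offset, size, dir);

  	/* after: same memory, expressed as a physical address */
  	kmsan_handle_dma(page_to_phys(page) + offset, size, dir);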

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/virtio/virtio_ring.c | 4 ++--
 include/linux/kmsan.h        | 9 ++++-----
 kernel/dma/mapping.c         | 3 ++-
 mm/kmsan/hooks.c             | 5 +++--
 tools/virtio/linux/kmsan.h   | 2 +-
 5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f5062061c408..c147145a6593 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -378,7 +378,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
 		 * is initialized by the hardware. Explicitly check/unpoison it
 		 * depending on the direction.
 		 */
-		kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
+		kmsan_handle_dma(sg_phys(sg), sg->length, direction);
 		*addr = (dma_addr_t)sg_phys(sg);
 		return 0;
 	}
@@ -3157,7 +3157,7 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	if (!vq->use_dma_api) {
-		kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
+		kmsan_handle_dma(virt_to_phys(ptr), size, dir);
 		return (dma_addr_t)virt_to_phys(ptr);
 	}
 
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 2b1432cc16d5..f2fd221107bb 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -182,8 +182,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
 
 /**
  * kmsan_handle_dma() - Handle a DMA data transfer.
- * @page:   first page of the buffer.
- * @offset: offset of the buffer within the first page.
+ * @phys:   physical address of the buffer.
  * @size:   buffer size.
  * @dir:    one of possible dma_data_direction values.
  *
@@ -192,7 +191,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
  * * initializes the buffer, if it is copied from device;
  * * does both, if this is a DMA_BIDIRECTIONAL transfer.
  */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
 		      enum dma_data_direction dir);
 
 /**
@@ -372,8 +371,8 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
 {
 }
 
-static inline void kmsan_handle_dma(struct page *page, size_t offset,
-				    size_t size, enum dma_data_direction dir)
+static inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
+				    enum dma_data_direction dir)
 {
 }
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 80481a873340..891e1fc3e582 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -172,7 +172,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
-	kmsan_handle_dma(page, offset, size, dir);
+
+	kmsan_handle_dma(phys, size, dir);
 	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f0..6de5c4820330 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -336,14 +336,15 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,
 }
 
 /* Helper function to handle DMA data transfers. */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
 		      enum dma_data_direction dir)
 {
+	struct page *page = phys_to_page(phys);
 	u64 page_offset, to_go, addr;
 
 	if (PageHighMem(page))
 		return;
-	addr = (u64)page_address(page) + offset;
+	addr = (u64)page_address(page) + offset_in_page(phys);
 	/*
 	 * The kernel may occasionally give us adjacent DMA pages not belonging
 	 * to the same allocation. Process them separately to avoid triggering
diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h
index 272b5aa285d5..6cd2e3efd03d 100644
--- a/tools/virtio/linux/kmsan.h
+++ b/tools/virtio/linux/kmsan.h
@@ -4,7 +4,7 @@
 
 #include <linux/gfp.h>
 
-inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
 			     enum dma_data_direction dir)
 {
 }
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (7 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 15:17   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Extend the base DMA page API to handle the MMIO flow. Follow the
existing dma_map_resource() implementation and rely on dma_map_direct()
alone to decide whether to take the DMA direct path.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 kernel/dma/mapping.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 891e1fc3e582..fdabfdaeff1d 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	phys_addr_t phys = page_to_phys(page) + offset;
+	bool is_mmio = attrs & DMA_ATTR_MMIO;
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
@@ -166,14 +167,25 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_map_phys_direct(dev, phys + size))
+	    (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
 		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
-	else
+	else if (is_mmio) {
+		if (!ops->map_resource)
+			return DMA_MAPPING_ERROR;
+
+		addr = ops->map_resource(dev, phys, size, dir, attrs);
+	} else {
+		/*
+		 * The dma_ops API contract for ops->map_page() requires
+		 * kmappable memory, while ops->map_resource() does not.
+		 */
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
+	}
 
-	kmsan_handle_dma(phys, size, dir);
+	if (!is_mmio)
+		kmsan_handle_dma(phys, size, dir);
 	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
 	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
 
@@ -185,14 +197,18 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
+	bool is_mmio = attrs & DMA_ATTR_MMIO;
 
 	BUG_ON(!valid_dma_direction(dir));
 	if (dma_map_direct(dev, ops) ||
-	    arch_dma_unmap_phys_direct(dev, addr + size))
+	    (!is_mmio && arch_dma_unmap_phys_direct(dev, addr + size)))
 		dma_direct_unmap_phys(dev, addr, size, dir, attrs);
 	else if (use_dma_iommu(dev))
 		iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
-	else
+	else if (is_mmio) {
+		if (ops->unmap_resource)
+			ops->unmap_resource(dev, addr, size, dir, attrs);
+	} else
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 10/16] xen: swiotlb: Open code map_resource callback
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (8 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-28 15:18   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

The generic dma_direct_map_resource() is going to be removed in the
next patch, so open-code it in the Xen swiotlb driver.

Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/xen/swiotlb-xen.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index da1a7d3d377c..dd7747a2de87 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -392,6 +392,25 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
 	}
 }
 
+static dma_addr_t xen_swiotlb_direct_map_resource(struct device *dev,
+						  phys_addr_t paddr,
+						  size_t size,
+						  enum dma_data_direction dir,
+						  unsigned long attrs)
+{
+	dma_addr_t dma_addr = paddr;
+
+	if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
+		dev_err_once(dev,
+			     "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+			     &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+		WARN_ON_ONCE(1);
+		return DMA_MAPPING_ERROR;
+	}
+
+	return dma_addr;
+}
+
 /*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
@@ -426,5 +445,5 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
 	.alloc_pages_op = dma_common_alloc_pages,
 	.free_pages = dma_common_free_pages,
 	.max_mapping_size = swiotlb_max_mapping_size,
-	.map_resource = dma_direct_map_resource,
+	.map_resource = xen_swiotlb_direct_map_resource,
 };
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (9 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-19 18:22   ` Keith Busch
  2025-08-28 16:01   ` Jason Gunthorpe
  2025-08-19 17:36 ` [PATCH v4 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.

The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs()/dma_map_resource() and
dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple
wrappers around the phys-based implementations.

In dma_map_page_attrs(), the struct page is converted to a physical
address with page_to_phys(), while dma_map_resource() passes the
physical address through as-is and adds the DMA_ATTR_MMIO attribute.

The old page-based API is preserved in mapping.c to ensure that existing
code won't be affected by the switch to the EXPORT_SYMBOL_GPL variant
used for the new dma_*map_phys() symbols.
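
A minimal caller-side sketch of the new interface (the device 'dev',
physical address 'paddr' and 'size' are hypothetical and assumed to be
already known to the driver):

  dma_addr_t dma_addr;

  dma_addr = dma_map_phys(dev, paddr, size, DMA_TO_DEVICE, 0);
  if (dma_mapping_error(dev, dma_addr))
          return -EIO;

  /* ... device performs the transfer ... */

  dma_unmap_phys(dev, dma_addr, size, DMA_TO_DEVICE, 0);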

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/dma-iommu.c   | 14 --------
 include/linux/dma-direct.h  |  2 --
 include/linux/dma-mapping.h | 13 +++++++
 include/linux/iommu-dma.h   |  4 ---
 include/trace/events/dma.h  |  2 --
 kernel/dma/debug.c          | 43 -----------------------
 kernel/dma/debug.h          | 21 -----------
 kernel/dma/direct.c         | 16 ---------
 kernel/dma/mapping.c        | 69 ++++++++++++++++++++-----------------
 9 files changed, 50 insertions(+), 134 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 6804aaf034a1..7944a3af4545 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1556,20 +1556,6 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		__iommu_dma_unmap(dev, start, end - start);
 }
 
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	return __iommu_dma_map(dev, phys, size,
-			dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
-			dma_get_mask(dev));
-}
-
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	__iommu_dma_unmap(dev, handle, size);
-}
-
 static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
 {
 	size_t alloc_size = PAGE_ALIGN(size);
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index f3bc0bcd7098..c249912456f9 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -149,7 +149,5 @@ void dma_direct_free_pages(struct device *dev, size_t size,
 		struct page *page, dma_addr_t dma_addr,
 		enum dma_data_direction dir);
 int dma_direct_supported(struct device *dev, u64 mask);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
 
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4254fd9bdf5d..8248ff9363ee 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -138,6 +138,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		unsigned long attrs);
 void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
+		enum dma_data_direction dir, unsigned long attrs);
 unsigned int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs);
 void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -192,6 +196,15 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 }
+static inline dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	return DMA_MAPPING_ERROR;
+}
+static inline void dma_unmap_phys(struct device *dev, dma_addr_t addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
 static inline unsigned int dma_map_sg_attrs(struct device *dev,
 		struct scatterlist *sg, int nents, enum dma_data_direction dir,
 		unsigned long attrs)
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 485bdffed988..a92b3ff9b934 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -42,10 +42,6 @@ size_t iommu_dma_opt_mapping_size(void);
 size_t iommu_dma_max_mapping_size(struct device *dev);
 void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t handle, unsigned long attrs);
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
 struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev, size_t size,
 		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
 void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index 84416c7d6bfa..5da59fd8121d 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -73,7 +73,6 @@ DEFINE_EVENT(dma_map, name, \
 	TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))
 
 DEFINE_MAP_EVENT(dma_map_phys);
-DEFINE_MAP_EVENT(dma_map_resource);
 
 DECLARE_EVENT_CLASS(dma_unmap,
 	TP_PROTO(struct device *dev, dma_addr_t addr, size_t size,
@@ -111,7 +110,6 @@ DEFINE_EVENT(dma_unmap, name, \
 	TP_ARGS(dev, addr, size, dir, attrs))
 
 DEFINE_UNMAP_EVENT(dma_unmap_phys);
-DEFINE_UNMAP_EVENT(dma_unmap_resource);
 
 DECLARE_EVENT_CLASS(dma_alloc_class,
 	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index da6734e3a4ce..06e31fd216e3 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -38,7 +38,6 @@ enum {
 	dma_debug_single,
 	dma_debug_sg,
 	dma_debug_coherent,
-	dma_debug_resource,
 	dma_debug_phy,
 };
 
@@ -141,7 +140,6 @@ static const char *type2name[] = {
 	[dma_debug_single] = "single",
 	[dma_debug_sg] = "scatter-gather",
 	[dma_debug_coherent] = "coherent",
-	[dma_debug_resource] = "resource",
 	[dma_debug_phy] = "phy",
 };
 
@@ -1448,47 +1446,6 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
 	check_unmap(&ref);
 }
 
-void debug_dma_map_resource(struct device *dev, phys_addr_t addr, size_t size,
-			    int direction, dma_addr_t dma_addr,
-			    unsigned long attrs)
-{
-	struct dma_debug_entry *entry;
-
-	if (unlikely(dma_debug_disabled()))
-		return;
-
-	entry = dma_entry_alloc();
-	if (!entry)
-		return;
-
-	entry->type		= dma_debug_resource;
-	entry->dev		= dev;
-	entry->paddr		= addr;
-	entry->size		= size;
-	entry->dev_addr		= dma_addr;
-	entry->direction	= direction;
-	entry->map_err_type	= MAP_ERR_NOT_CHECKED;
-
-	add_dma_entry(entry, attrs);
-}
-
-void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
-			      size_t size, int direction)
-{
-	struct dma_debug_entry ref = {
-		.type           = dma_debug_resource,
-		.dev            = dev,
-		.dev_addr       = dma_addr,
-		.size           = size,
-		.direction      = direction,
-	};
-
-	if (unlikely(dma_debug_disabled()))
-		return;
-
-	check_unmap(&ref);
-}
-
 void debug_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
 				   size_t size, int direction)
 {
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index 76adb42bffd5..424b8f912ade 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -30,14 +30,6 @@ extern void debug_dma_alloc_coherent(struct device *dev, size_t size,
 extern void debug_dma_free_coherent(struct device *dev, size_t size,
 				    void *virt, dma_addr_t addr);
 
-extern void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
-				   size_t size, int direction,
-				   dma_addr_t dma_addr,
-				   unsigned long attrs);
-
-extern void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
-				     size_t size, int direction);
-
 extern void debug_dma_sync_single_for_cpu(struct device *dev,
 					  dma_addr_t dma_handle, size_t size,
 					  int direction);
@@ -88,19 +80,6 @@ static inline void debug_dma_free_coherent(struct device *dev, size_t size,
 {
 }
 
-static inline void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
-					  size_t size, int direction,
-					  dma_addr_t dma_addr,
-					  unsigned long attrs)
-{
-}
-
-static inline void debug_dma_unmap_resource(struct device *dev,
-					    dma_addr_t dma_addr, size_t size,
-					    int direction)
-{
-}
-
 static inline void debug_dma_sync_single_for_cpu(struct device *dev,
 						 dma_addr_t dma_handle,
 						 size_t size, int direction)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fa75e3070073..1062caac47e7 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -502,22 +502,6 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 	return ret;
 }
 
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-		size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-	dma_addr_t dma_addr = paddr;
-
-	if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
-		dev_err_once(dev,
-			     "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
-			     &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
-		WARN_ON_ONCE(1);
-		return DMA_MAPPING_ERROR;
-	}
-
-	return dma_addr;
-}
-
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fdabfdaeff1d..0ca098d2e88d 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -152,12 +152,10 @@ static inline bool dma_map_direct(struct device *dev,
 	return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
-dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
-		size_t offset, size_t size, enum dma_data_direction dir,
-		unsigned long attrs)
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-	phys_addr_t phys = page_to_phys(page) + offset;
 	bool is_mmio = attrs & DMA_ATTR_MMIO;
 	dma_addr_t addr;
 
@@ -177,6 +175,9 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 
 		addr = ops->map_resource(dev, phys, size, dir, attrs);
 	} else {
+		struct page *page = phys_to_page(phys);
+		size_t offset = offset_in_page(phys);
+
 		/*
 		 * The dma_ops API contract for ops->map_page() requires
 		 * kmappable memory, while ops->map_resource() does not.
@@ -191,9 +192,26 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 
 	return addr;
 }
+EXPORT_SYMBOL_GPL(dma_map_phys);
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+		size_t offset, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+	phys_addr_t phys = page_to_phys(page) + offset;
+
+	if (unlikely(attrs & DMA_ATTR_MMIO))
+		return DMA_MAPPING_ERROR;
+
+	if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+	    WARN_ON_ONCE(is_zone_device_page(page)))
+		return DMA_MAPPING_ERROR;
+
+	return dma_map_phys(dev, phys, size, dir, attrs);
+}
 EXPORT_SYMBOL(dma_map_page_attrs);
 
-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -213,6 +231,16 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	trace_dma_unmap_phys(dev, addr, size, dir, attrs);
 	debug_dma_unmap_phys(dev, addr, size, dir);
 }
+EXPORT_SYMBOL_GPL(dma_unmap_phys);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+		 enum dma_data_direction dir, unsigned long attrs)
+{
+	if (unlikely(attrs & DMA_ATTR_MMIO))
+		return;
+
+	dma_unmap_phys(dev, addr, size, dir, attrs);
+}
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
 static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -338,41 +366,18 @@ EXPORT_SYMBOL(dma_unmap_sg_attrs);
 dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-	dma_addr_t addr = DMA_MAPPING_ERROR;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	if (WARN_ON_ONCE(!dev->dma_mask))
+	if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+	    WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
-	if (dma_map_direct(dev, ops))
-		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
-	else if (use_dma_iommu(dev))
-		addr = iommu_dma_map_resource(dev, phys_addr, size, dir, attrs);
-	else if (ops->map_resource)
-		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
-
-	trace_dma_map_resource(dev, phys_addr, addr, size, dir, attrs);
-	debug_dma_map_resource(dev, phys_addr, size, dir, addr, attrs);
-	return addr;
+	return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
 }
 EXPORT_SYMBOL(dma_map_resource);
 
 void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-	if (dma_map_direct(dev, ops))
-		; /* nothing to do: uncached and no swiotlb */
-	else if (use_dma_iommu(dev))
-		iommu_dma_unmap_resource(dev, addr, size, dir, attrs);
-	else if (ops->unmap_resource)
-		ops->unmap_resource(dev, addr, size, dir, attrs);
-	trace_dma_unmap_resource(dev, addr, size, dir, attrs);
-	debug_dma_unmap_resource(dev, addr, size, dir);
+	dma_unmap_phys(dev, addr, size, dir, attrs | DMA_ATTR_MMIO);
 }
 EXPORT_SYMBOL(dma_unmap_resource);
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 12/16] mm/hmm: migrate to physical address-based DMA mapping API
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (10 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-19 17:36 ` [PATCH v4 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.

The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.

This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.
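
As a condensed illustration of the change below (using the names from
the hmm code; 'paddr' is the physical address that is already at hand):

  /* before: a struct page round-trip was required */
  dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
                          DMA_BIDIRECTIONAL);

  /* after: the physical address is used directly */
  dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
                          DMA_BIDIRECTIONAL, 0);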

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 mm/hmm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index d545e2494994..015ab243f081 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -775,8 +775,8 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 		if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
 			goto error;
 
-		dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
-					DMA_BIDIRECTIONAL);
+		dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
+					DMA_BIDIRECTIONAL, 0);
 		if (dma_mapping_error(dev, dma_addr))
 			goto error;
 
@@ -819,8 +819,8 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
 		dma_iova_unlink(dev, state, idx * map->dma_entry_size,
 				map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
 	} else if (dma_need_unmap(dev))
-		dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
-			       DMA_BIDIRECTIONAL);
+		dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
+			       DMA_BIDIRECTIONAL, 0);
 
 	pfns[idx] &=
 		~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 13/16] mm/hmm: properly take MMIO path
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (11 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-19 17:36 ` [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

When a peer-to-peer transaction traverses the host bridge, the IOMMU
mapping needs the IOMMU_MMIO flag and CPU cache synchronization must
be skipped.

The latter was already handled by passing DMA_ATTR_SKIP_CPU_SYNC, but
the IOMMU flag was missed because such memory was assumed to behave
like regular cacheable memory.

Reuse the newly introduced DMA_ATTR_MMIO attribute to take the MMIO
path properly.
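
A rough sketch of why the attribute is sufficient here, assuming the
dma_info_to_prot() and sync handling added earlier in this series:

  /* IOMMU prot computation (simplified) */
  if (attrs & DMA_ATTR_MMIO)
          prot |= IOMMU_MMIO;

  /*
   * CPU cache maintenance is also skipped for DMA_ATTR_MMIO mappings,
   * which is what DMA_ATTR_SKIP_CPU_SYNC used to request explicitly.
   */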

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 mm/hmm.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 015ab243f081..6556c0e074ba 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -746,7 +746,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 	case PCI_P2PDMA_MAP_NONE:
 		break;
 	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
-		attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+		attrs |= DMA_ATTR_MMIO;
 		pfns[idx] |= HMM_PFN_P2PDMA;
 		break;
 	case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -776,7 +776,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 			goto error;
 
 		dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
-					DMA_BIDIRECTIONAL, 0);
+					DMA_BIDIRECTIONAL, attrs);
 		if (dma_mapping_error(dev, dma_addr))
 			goto error;
 
@@ -811,16 +811,17 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
 	if ((pfns[idx] & valid_dma) != valid_dma)
 		return false;
 
+	if (pfns[idx] & HMM_PFN_P2PDMA)
+		attrs |= DMA_ATTR_MMIO;
+
 	if (pfns[idx] & HMM_PFN_P2PDMA_BUS)
 		; /* no need to unmap bus address P2P mappings */
-	else if (dma_use_iova(state)) {
-		if (pfns[idx] & HMM_PFN_P2PDMA)
-			attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+	else if (dma_use_iova(state))
 		dma_iova_unlink(dev, state, idx * map->dma_entry_size,
 				map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
-	} else if (dma_need_unmap(dev))
+	else if (dma_need_unmap(dev))
 		dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
-			       DMA_BIDIRECTIONAL, 0);
+			       DMA_BIDIRECTIONAL, attrs);
 
 	pfns[idx] &=
 		~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (12 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-19 18:20   ` Keith Busch
  2025-09-02 20:49   ` Marek Szyprowski
  2025-08-19 17:36 ` [PATCH v4 15/16] block-dma: properly take MMIO path Leon Romanovsky
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

After the introduction of dma_map_phys(), there is no need to convert
a physical address to a struct page just to map it, so use the physical
address directly.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index ad283017caef..37e2142be4f7 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
 static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
 		struct blk_dma_iter *iter, struct phys_vec *vec)
 {
-	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
-			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
+			rq_dma_dir(req), 0);
 	if (dma_mapping_error(dma_dev, iter->addr)) {
 		iter->status = BLK_STS_RESOURCE;
 		return false;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (13 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
@ 2025-08-19 17:36 ` Leon Romanovsky
  2025-08-19 18:24   ` Keith Busch
  2025-08-28 15:19   ` Keith Busch
  2025-08-19 17:37 ` [PATCH v4 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

Pass the newly introduced DMA_ATTR_MMIO attribute so that CPU cache
syncing is skipped and the IOMMU is configured to take the MMIO path.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c         | 13 +++++++++++--
 include/linux/blk-mq-dma.h |  6 +++++-
 include/linux/blk_types.h  |  2 ++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 37e2142be4f7..d415088ed9fd 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,13 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
 static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
 		struct blk_dma_iter *iter, struct phys_vec *vec)
 {
+	unsigned int attrs = 0;
+
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
+
 	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
-			rq_dma_dir(req), 0);
+			rq_dma_dir(req), attrs);
 	if (dma_mapping_error(dma_dev, iter->addr)) {
 		iter->status = BLK_STS_RESOURCE;
 		return false;
@@ -103,14 +108,17 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
 {
 	enum dma_data_direction dir = rq_dma_dir(req);
 	unsigned int mapped = 0;
+	unsigned int attrs = 0;
 	int error;
 
 	iter->addr = state->addr;
 	iter->len = dma_iova_size(state);
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
 
 	do {
 		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
-				vec->len, dir, 0);
+				vec->len, dir, attrs);
 		if (error)
 			break;
 		mapped += vec->len;
@@ -176,6 +184,7 @@ bool blk_rq_dma_map_iter_start(struct request *req, struct device *dma_dev,
 			 * same as non-P2P transfers below and during unmap.
 			 */
 			req->cmd_flags &= ~REQ_P2PDMA;
+			req->cmd_flags |= REQ_MMIO;
 			break;
 		default:
 			iter->status = BLK_STS_INVAL;
diff --git a/include/linux/blk-mq-dma.h b/include/linux/blk-mq-dma.h
index c26a01aeae00..6c55f5e58511 100644
--- a/include/linux/blk-mq-dma.h
+++ b/include/linux/blk-mq-dma.h
@@ -48,12 +48,16 @@ static inline bool blk_rq_dma_map_coalesce(struct dma_iova_state *state)
 static inline bool blk_rq_dma_unmap(struct request *req, struct device *dma_dev,
 		struct dma_iova_state *state, size_t mapped_len)
 {
+	unsigned int attrs = 0;
+
 	if (req->cmd_flags & REQ_P2PDMA)
 		return true;
 
 	if (dma_use_iova(state)) {
+		if (req->cmd_flags & REQ_MMIO)
+			attrs = DMA_ATTR_MMIO;
 		dma_iova_destroy(dma_dev, state, mapped_len, rq_dma_dir(req),
-				 0);
+				 attrs);
 		return true;
 	}
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 09b99d52fd36..283058bcb5b1 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -387,6 +387,7 @@ enum req_flag_bits {
 	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
 	__REQ_ATOMIC,		/* for atomic write operations */
 	__REQ_P2PDMA,		/* contains P2P DMA pages */
+	__REQ_MMIO,		/* contains MMIO memory */
 	/*
 	 * Command specific flags, keep last:
 	 */
@@ -420,6 +421,7 @@ enum req_flag_bits {
 #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
 #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
 #define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)
+#define REQ_MMIO	(__force blk_opf_t)(1ULL << __REQ_MMIO)
 
 #define REQ_NOUNMAP	(__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v4 16/16] nvme-pci: unmap MMIO pages with appropriate interface
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (14 preceding siblings ...)
  2025-08-19 17:36 ` [PATCH v4 15/16] block-dma: properly take MMIO path Leon Romanovsky
@ 2025-08-19 17:37 ` Leon Romanovsky
  2025-08-19 19:58   ` Keith Busch
  2025-08-28 11:57 ` [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
  2025-08-29 13:16 ` Jason Gunthorpe
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 17:37 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

From: Leon Romanovsky <leonro@nvidia.com>

The block layer maps MMIO memory through the dma_map_phys() interface
with the DMA_ATTR_MMIO attribute. Such memory must be unmapped with the
matching unmap function and attribute, which was not possible before
the new REQ_MMIO flag was added to the block layer in the previous
patch.
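
A condensed sketch of the resulting symmetry in the unmap path (names
follow the diff below; REQ_MMIO comes from the previous patch):

  unsigned int attrs = 0;

  if (req->cmd_flags & REQ_MMIO)
          attrs = DMA_ATTR_MMIO;

  /* the attrs must mirror those used when the buffer was mapped */
  dma_unmap_phys(dma_dev, dma_addr, len, rq_dma_dir(req), attrs);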

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/nvme/host/pci.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2c6d9506b172..f8ecc0e0f576 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -682,11 +682,15 @@ static void nvme_free_prps(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	unsigned int attrs = 0;
 	unsigned int i;
 
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
+
 	for (i = 0; i < iod->nr_dma_vecs; i++)
-		dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
-				iod->dma_vecs[i].len, rq_dma_dir(req));
+		dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+				iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
 	mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
 }
 
@@ -699,15 +703,19 @@ static void nvme_free_sgls(struct request *req)
 	unsigned int sqe_dma_len = le32_to_cpu(iod->cmd.common.dptr.sgl.length);
 	struct nvme_sgl_desc *sg_list = iod->descriptors[0];
 	enum dma_data_direction dir = rq_dma_dir(req);
+	unsigned int attrs = 0;
+
+	if (req->cmd_flags & REQ_MMIO)
+		attrs = DMA_ATTR_MMIO;
 
 	if (iod->nr_descriptors) {
 		unsigned int nr_entries = sqe_dma_len / sizeof(*sg_list), i;
 
 		for (i = 0; i < nr_entries; i++)
-			dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
-				le32_to_cpu(sg_list[i].length), dir);
+			dma_unmap_phys(dma_dev, le64_to_cpu(sg_list[i].addr),
+				le32_to_cpu(sg_list[i].length), dir, attrs);
 	} else {
-		dma_unmap_page(dma_dev, sqe_dma_addr, sqe_dma_len, dir);
+		dma_unmap_phys(dma_dev, sqe_dma_addr, sqe_dma_len, dir, attrs);
 	}
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page
  2025-08-19 17:36 ` [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
@ 2025-08-19 18:20   ` Keith Busch
  2025-08-19 18:49     ` Leon Romanovsky
  2025-09-02 20:49   ` Marek Szyprowski
  1 sibling, 1 reply; 50+ messages in thread
From: Keith Busch @ 2025-08-19 18:20 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Tue, Aug 19, 2025 at 08:36:58PM +0300, Leon Romanovsky wrote:
>  static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
>  		struct blk_dma_iter *iter, struct phys_vec *vec)
>  {
> -	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> -			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> +	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> +			rq_dma_dir(req), 0);

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

Just a random thought when I had to double back to check what the "0"
means: many dma_ api's have a default macro without an "attrs" argument,
then an _attrs() version for when you need it. Not sure if you want to
strictly follow that pattern, but merely a suggestion.
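
(For reference, the existing convention looks roughly like the
dma-mapping.h macros below; whether dma_map_phys() should grow the same
pair is the suggestion here.)

  /* sketch of the existing pattern in include/linux/dma-mapping.h */
  #define dma_map_page(d, p, o, s, r)  dma_map_page_attrs(d, p, o, s, r, 0)
  #define dma_unmap_page(d, a, s, r)   dma_unmap_page_attrs(d, a, s, r, 0)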

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface
  2025-08-19 17:36 ` [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
@ 2025-08-19 18:22   ` Keith Busch
  2025-08-28 16:01   ` Jason Gunthorpe
  1 sibling, 0 replies; 50+ messages in thread
From: Keith Busch @ 2025-08-19 18:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Tue, Aug 19, 2025 at 08:36:55PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
> that operate directly on physical addresses instead of page+offset
> parameters. This provides a more efficient interface for drivers that
> already have physical addresses available.
> 
> The new functions are implemented as the primary mapping layer, with
> the existing dma_map_page_attrs()/dma_map_resource() and
> dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple
> wrappers around the phys-based implementations.
> 
> In case dma_map_page_attrs(), the struct page is converted to physical
> address with help of page_to_phys() function and dma_map_resource()
> provides physical address as is together with addition of DMA_ATTR_MMIO
> attribute.
> 
> The old page-based API is preserved in mapping.c to ensure that existing
> code won't be affected by changing EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
> variant for dma_*map_phys().

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-19 17:36 ` [PATCH v4 15/16] block-dma: properly take MMIO path Leon Romanovsky
@ 2025-08-19 18:24   ` Keith Busch
  2025-08-28 15:19   ` Keith Busch
  1 sibling, 0 replies; 50+ messages in thread
From: Keith Busch @ 2025-08-19 18:24 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Make sure that CPU is not synced and IOMMU is configured to take
> MMIO path by providing newly introduced DMA_ATTR_MMIO attribute.

We may have a minor patch conflict here with my unmerged dma metadata
series, but not a big deal.

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page
  2025-08-19 18:20   ` Keith Busch
@ 2025-08-19 18:49     ` Leon Romanovsky
  0 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-19 18:49 UTC (permalink / raw)
  To: Keith Busch
  Cc: Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel



On Tue, Aug 19, 2025, at 20:20, Keith Busch wrote:
> On Tue, Aug 19, 2025 at 08:36:58PM +0300, Leon Romanovsky wrote:
>>  static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
>>  		struct blk_dma_iter *iter, struct phys_vec *vec)
>>  {
>> -	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
>> -			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
>> +	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
>> +			rq_dma_dir(req), 0);
>
> Looks good.
>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
>
> Just a random thought when I had to double back to check what the "0"
> means: many dma_ api's have a default macro without an "attrs" argument,
> then an _attrs() version for when you need it. Not sure if you want to
> strictly follow that pattern, but merely a suggestion.

At some point I had both functions, with and without attrs, but Christoph said that it is an artefact and that I should introduce one function which accepts attrs but without _attrs in the name.

Thanks 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 16/16] nvme-pci: unmap MMIO pages with appropriate interface
  2025-08-19 17:37 ` [PATCH v4 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
@ 2025-08-19 19:58   ` Keith Busch
  0 siblings, 0 replies; 50+ messages in thread
From: Keith Busch @ 2025-08-19 19:58 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Tue, Aug 19, 2025 at 08:37:00PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Block layer maps MMIO memory through dma_map_phys() interface
> with help of DMA_ATTR_MMIO attribute. There is a need to unmap
> that memory with the appropriate unmap function, something which
> wasn't possible before adding new REQ attribute to block layer in
> previous patch.

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based API
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (15 preceding siblings ...)
  2025-08-19 17:37 ` [PATCH v4 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
@ 2025-08-28 11:57 ` Leon Romanovsky
  2025-09-01 21:47   ` Marek Szyprowski
  2025-08-29 13:16 ` Jason Gunthorpe
  17 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-28 11:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:44PM +0300, Leon Romanovsky wrote:
> Changelog:
> v4:
>  * Fixed kbuild error with mismatch in kmsan function declaration due to
>    rebase error.
> v3: https://lore.kernel.org/all/cover.1755193625.git.leon@kernel.org
>  * Fixed typo in "cacheable" word
>  * Simplified kmsan patch a lot to be simple argument refactoring
> v2: https://lore.kernel.org/all/cover.1755153054.git.leon@kernel.org
>  * Used commit messages and cover letter from Jason
>  * Moved setting IOMMU_MMIO flag to dma_info_to_prot function
>  * Micro-optimized the code
>  * Rebased code on v6.17-rc1
> v1: https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org
>  * Added new DMA_ATTR_MMIO attribute to indicate
>    PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
>  * Rewrote dma_map_* functions to use thus new attribute
> v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
> ------------------------------------------------------------------------
> 
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.
> 
> The series maintains export symbol backward compatibility by keeping
> the old page-based API as wrapper functions around the new physical
> address-based implementations.
> 
> This series refactors the DMA mapping API to provide a phys_addr_t
> based, and struct-page free, external API that can handle all the
> mapping cases we want in modern systems:
> 
>  - struct page based cachable DRAM
>  - struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cachable
>    MMIO
>  - struct page-less PCI peer to peer non-cachable MMIO
>  - struct page-less "resource" MMIO
> 
> Overall this gets much closer to Matthew's long term wish for
> struct-pageless IO to cachable DRAM. The remaining primary work would
> be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
> phys_addr_t without a struct page.
> 
> The general design is to remove struct page usage entirely from the
> DMA API inner layers. For flows that need to have a KVA for the
> physical address they can use kmap_local_pfn() or phys_to_virt(). This
> isolates the struct page requirements to MM code only. Long term all
> removals of struct page usage are supporting Matthew's memdesc
> project which seeks to substantially transform how struct page works.
> 
> Instead make the DMA API internals work on phys_addr_t. Internally
> there are still dedicated 'page' and 'resource' flows, except they are
> now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
> flows use the same phys_addr_t.
> 
> When DMA_ATTR_MMIO is specified things work similar to the existing
> 'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
> pfn_valid(), etc are never called on the phys_addr_t. This requires
> rejecting any configuration that would need swiotlb. CPU cache
> flushing is not required, and avoided, as ATTR_MMIO also indicates the
> address have no cachable mappings. This effectively removes any
> DMA API side requirement to have struct page when DMA_ATTR_MMIO is
> used.
> 
> In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
> except on the common path of no cache flush, no swiotlb it never
> touches a struct page. When cache flushing or swiotlb copying
> kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
> usage. This was already the case on the unmap side, now the map side
> is symmetric.
> 
> Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
> must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
> path must also set it. This corrects some existing bugs where iommu
> mappings for P2P MMIO were improperly marked IOMMU_CACHE.
> 
> Since ATTR_MMIO is made to work with all the existing DMA map entry
> points, particularly dma_iova_link(), this finally allows a way to use
> the new DMA API to map PCI P2P MMIO without creating struct page. The
> VFIO DMABUF series demonstrates how this works. This is intended to
> replace the incorrect driver use of dma_map_resource() on PCI BAR
> addresses.
> 
> This series does the core code and modern flows. A followup series
> will give the same treatment to the legacy dma_ops implementation.
> 
> Thanks
> 
> Leon Romanovsky (16):
>   dma-mapping: introduce new DMA attribute to indicate MMIO memory
>   iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
>   dma-debug: refactor to use physical addresses for page mapping
>   dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
>   iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
>   iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
>   dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
>   kmsan: convert kmsan_handle_dma to use physical addresses
>   dma-mapping: handle MMIO flow in dma_map|unmap_page
>   xen: swiotlb: Open code map_resource callback
>   dma-mapping: export new dma_*map_phys() interface
>   mm/hmm: migrate to physical address-based DMA mapping API
>   mm/hmm: properly take MMIO path
>   block-dma: migrate to dma_map_phys instead of map_page
>   block-dma: properly take MMIO path
>   nvme-pci: unmap MMIO pages with appropriate interface
> 
>  Documentation/core-api/dma-api.rst        |   4 +-
>  Documentation/core-api/dma-attributes.rst |  18 ++++
>  arch/powerpc/kernel/dma-iommu.c           |   4 +-
>  block/blk-mq-dma.c                        |  15 ++-
>  drivers/iommu/dma-iommu.c                 |  61 +++++------
>  drivers/nvme/host/pci.c                   |  18 +++-
>  drivers/virtio/virtio_ring.c              |   4 +-
>  drivers/xen/swiotlb-xen.c                 |  21 +++-
>  include/linux/blk-mq-dma.h                |   6 +-
>  include/linux/blk_types.h                 |   2 +
>  include/linux/dma-direct.h                |   2 -
>  include/linux/dma-map-ops.h               |   8 +-
>  include/linux/dma-mapping.h               |  33 ++++++
>  include/linux/iommu-dma.h                 |  11 +-
>  include/linux/kmsan.h                     |   9 +-
>  include/trace/events/dma.h                |   9 +-
>  kernel/dma/debug.c                        |  71 ++++---------
>  kernel/dma/debug.h                        |  37 ++-----
>  kernel/dma/direct.c                       |  22 +---
>  kernel/dma/direct.h                       |  52 ++++++----
>  kernel/dma/mapping.c                      | 117 +++++++++++++---------
>  kernel/dma/ops_helpers.c                  |   6 +-
>  mm/hmm.c                                  |  19 ++--
>  mm/kmsan/hooks.c                          |   5 +-
>  rust/kernel/dma.rs                        |   3 +
>  tools/virtio/linux/kmsan.h                |   2 +-
>  26 files changed, 305 insertions(+), 254 deletions(-)

Marek,

So what are the next steps here? This series is pre-requirement for the
VFIO MMIO patches.

Thanks

> 
> -- 
> 2.50.1
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory
  2025-08-19 17:36 ` [PATCH v4 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
@ 2025-08-28 13:03   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 13:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:45PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
> that reside in memory-mapped I/O (MMIO) regions, such as device BARs
> exposed through the host bridge, which are accessible for peer-to-peer
> (P2P) DMA.
> 
> This attribute is especially useful for exporting device memory to other
> devices for DMA without CPU involvement, and avoids unnecessary or
> potentially detrimental CPU cache maintenance calls.
> 
> DMA_ATTR_MMIO is supposed to provide dma_map_resource() functionality
> without need to call to special function and perform branching by
> the callers.

'branching when processing generic containers like bio_vec by the callers'

Many of the existing dma_map_resource() users already know the thing
is MMIO and don't have branching..
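
In other words, for a caller walking a generic container the single entry
point ends up roughly like this (sketch only; 'is_p2p' stands in for however
the caller tracks the P2P state of the element):

	/* the call site no longer branches to a different entry point,
	 * the attribute carries the MMIO-ness */
	dma_addr = dma_map_phys(dev, bvec_phys(&bv), bv.bv_len, dir,
				is_p2p ? DMA_ATTR_MMIO : 0);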

> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  Documentation/core-api/dma-attributes.rst | 18 ++++++++++++++++++
>  include/linux/dma-mapping.h               | 20 ++++++++++++++++++++
>  include/trace/events/dma.h                |  3 ++-
>  rust/kernel/dma.rs                        |  3 +++
>  4 files changed, 43 insertions(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 03/16] dma-debug: refactor to use physical addresses for page mapping
  2025-08-19 17:36 ` [PATCH v4 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
@ 2025-08-28 13:19   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 13:19 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:47PM +0300, Leon Romanovsky wrote:
> @@ -1218,19 +1219,24 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
>  		return;
>  
>  	entry->dev       = dev;
> -	entry->type      = dma_debug_single;
> -	entry->paddr	 = page_to_phys(page) + offset;
> +	entry->type      = dma_debug_phy;
> +	entry->paddr	 = phys;
>  	entry->dev_addr  = dma_addr;
>  	entry->size      = size;
>  	entry->direction = direction;
>  	entry->map_err_type = MAP_ERR_NOT_CHECKED;
>  
> -	check_for_stack(dev, page, offset);
> +	if (!(attrs & DMA_ATTR_MMIO)) {
> +		struct page *page = phys_to_page(phys);
> +		size_t offset = offset_in_page(page);
>  
> -	if (!PageHighMem(page)) {
> -		void *addr = page_address(page) + offset;
> +		check_for_stack(dev, page, offset);
>  
> -		check_for_illegal_area(dev, addr, size);
> +		if (!PageHighMem(page)) {
> +			void *addr = page_address(page) + offset;
> +
> +			check_for_illegal_area(dev, addr, size);
> +		}
>  	}

Nothing to change in this series, but I was looking at what it
would take to someday remove the struct page here, and it looks
reasonable.

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 06e31fd216e38e..0d6dd3eb9860ac 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -1051,28 +1051,28 @@ static void check_unmap(struct dma_debug_entry *ref)
 	dma_entry_free(entry);
 }
 
-static void check_for_stack(struct device *dev,
-			    struct page *page, size_t offset)
+static void check_for_stack(struct device *dev, phys_addr_t phys)
 {
 	void *addr;
 	struct vm_struct *stack_vm_area = task_stack_vm_area(current);
 
 	if (!stack_vm_area) {
 		/* Stack is direct-mapped. */
-		if (PageHighMem(page))
+		if (PhysHighMem(phys))
 			return;
-		addr = page_address(page) + offset;
+		addr = phys_to_virt(phys);
 		if (object_is_on_stack(addr))
 			err_printk(dev, NULL, "device driver maps memory from stack [addr=%p]\n", addr);
 	} else {
 		/* Stack is vmalloced. */
+		unsigned long pfn = phys >> PAGE_SHIFT;
 		int i;
 
 		for (i = 0; i < stack_vm_area->nr_pages; i++) {
-			if (page != stack_vm_area->pages[i])
+			if (pfn != page_to_pfn(stack_vm_area->pages[i]))
 				continue;
 
-			addr = (u8 *)current->stack + i * PAGE_SIZE + offset;
+			addr = (u8 *)current->stack + i * PAGE_SIZE + (phys % PAGE_SIZE);
 			err_printk(dev, NULL, "device driver maps memory from stack [probable addr=%p]\n", addr);
 			break;
 		}
@@ -1225,16 +1225,10 @@ void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
 	entry->map_err_type = MAP_ERR_NOT_CHECKED;
 
 	if (!(attrs & DMA_ATTR_MMIO)) {
-		struct page *page = phys_to_page(phys);
-		size_t offset = offset_in_page(page);
+		check_for_stack(dev, phys);
 
-		check_for_stack(dev, page, offset);
-
-		if (!PageHighMem(page)) {
-			void *addr = page_address(page) + offset;
-
-			check_for_illegal_area(dev, addr, size);
-		}
+		if (!PhysHighMem(phys))
+			check_for_illegal_area(dev, phys_to_virt(phys), size);
 	}
 
 	add_dma_entry(entry, attrs);

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  2025-08-19 17:36 ` [PATCH v4 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
@ 2025-08-28 13:27   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 13:27 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:48PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> As a preparation for following map_page -> map_phys API conversion,
> let's rename trace_dma_*map_page() to be trace_dma_*map_phys().
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  include/trace/events/dma.h | 4 ++--
>  kernel/dma/mapping.c       | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  2025-08-19 17:36 ` [PATCH v4 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
@ 2025-08-28 13:38   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 13:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:49PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Rename the IOMMU DMA mapping functions to better reflect their actual
> calling convention. The functions iommu_dma_map_page() and
> iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
> iommu_dma_unmap_phys() respectively, as they already operate on physical
> addresses rather than page structures.
> 
> The calling convention changes from accepting (struct page *page,
> unsigned long offset) to (phys_addr_t phys), which eliminates the need
> for page-to-physical address conversion within the functions. This
> renaming prepares for the broader DMA API conversion from page-based
> to physical address-based mapping throughout the kernel.
> 
> All callers are updated to pass physical addresses directly, including
> dma_map_page_attrs(), scatterlist mapping functions, and DMA page
> allocation helpers. The change simplifies the code by removing the
> page_to_phys() + offset calculation that was previously done inside
> the IOMMU functions.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/iommu/dma-iommu.c | 14 ++++++--------
>  include/linux/iommu-dma.h |  7 +++----
>  kernel/dma/mapping.c      |  4 ++--
>  kernel/dma/ops_helpers.c  |  6 +++---
>  4 files changed, 14 insertions(+), 17 deletions(-)

This looks fine

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

But related to other patches..

iommu_dma_map_phys() ends up like this:

	if (dev_use_swiotlb(dev, size, dir) &&
	    iova_unaligned(iovad, phys, size)) {
		if (attrs & DMA_ATTR_MMIO)
			return DMA_MAPPING_ERROR;

		phys = iommu_dma_map_swiotlb(dev, phys, size, dir, attrs);

But attrs is passed all the way down to swiotlb_tbl_map_single(),
maybe the DMA_ATTR_MMIO check should be moved there?

There are a few call chains with this redundancy:

dma_iova_link()
 -> iommu_dma_iova_link_swiotlb
  -> iommu_dma_iova_bounce_and_link
   -> iommu_dma_map_swiotlb
    -> swiotlb_tbl_map_single()

iommu_dma_map_phys() 
   -> iommu_dma_map_swiotlb
    -> swiotlb_tbl_map_single()

dma_direct_map_phys()
 -> swiotlb_map()
  -> swiotlb_tbl_map_single()

It makes a lot of sense to put the check for MMIO where
slots[].orig_addr is stored, because that is the point where we start
to require a pfn_to_page().
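
Roughly (untested sketch, placed in swiotlb_tbl_map_single() where orig_addr
would be recorded; as noted above, attrs is already passed down that far):

	if (attrs & DMA_ATTR_MMIO) {
		dev_WARN_ONCE(dev, 1, "refusing to bounce DMA_ATTR_MMIO memory\n");
		return (phys_addr_t)DMA_MAPPING_ERROR;
	}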

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
  2025-08-19 17:36 ` [PATCH v4 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
@ 2025-08-28 13:49   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 13:49 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:50PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Combine iommu_dma_*map_phys with iommu_dma_*map_resource interfaces in
> order to allow single phys_addr_t flow.
> 
> In the following patches, the iommu_dma_map_resource() will be removed
> in favour of iommu_dma_map_phys(..., attrs | DMA_ATTR_MMIO) flow.

I would reword this a little bit

iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys()

Make iommu_dma_map_phys() and iommu_dma_unmap_phys() respect
DMA_ATTR_MMIO.

DMA_ATTR_MMIO makes the functions behave the same as
iommu_dma_(un)map_resource():
 - No swiotlb is possible
 - No cache flushing is done (ATTR_MMIO should not be cached memory)
 - prot for iommu_map() has IOMMU_MMIO not IOMMU_CACHE 

This is preparation for replacing iommu_dma_map_resource() callers
with iommu_dma_map_phys(DMA_ATTR_MMIO) and removing
iommu_dma_(un)map_resource().

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  2025-08-19 17:36 ` [PATCH v4 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
@ 2025-08-28 14:19   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 14:19 UTC (permalink / raw)
  To: Leon Romanovsky, Suzuki K Poulose, Alexey Kardashevskiy
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:51PM +0300, Leon Romanovsky wrote:
> +static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> +		phys_addr_t phys, size_t size, enum dma_data_direction dir,
> +		unsigned long attrs)
>  {
> -	phys_addr_t phys = page_to_phys(page) + offset;
> -	dma_addr_t dma_addr = phys_to_dma(dev, phys);
> +	dma_addr_t dma_addr;
> +	bool capable;
>  
>  	if (is_swiotlb_force_bounce(dev)) {
> -		if (is_pci_p2pdma_page(page))
> -			return DMA_MAPPING_ERROR;
> +		if (attrs & DMA_ATTR_MMIO)
> +			goto err_overflow;
> +
>  		return swiotlb_map(dev, phys, size, dir, attrs);
>  	}
>  
> -	if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
> -	    dma_kmalloc_needs_bounce(dev, size, dir)) {
> -		if (is_pci_p2pdma_page(page))
> -			return DMA_MAPPING_ERROR;
> -		if (is_swiotlb_active(dev))
> +	if (attrs & DMA_ATTR_MMIO)
> +		dma_addr = phys;
> +	else
> +		dma_addr = phys_to_dma(dev, phys);

I've been trying to unpuzzle this CC-related mess for a while and
am still unsure what is right here... But judging from the comments I
think this should always call phys_to_dma(). Though I understand the
existing map_resource path didn't call it, so it would also be fine to
leave it like this..

Alexey do you know?

The only time this seems to do anything is on AMD and I have no idea
what AMD has done to their CC memory map with the iommu..

On ARM at least I would expect the DMA API to be dealing only with
canonical IPA, i.e. if the memory is encrypted it is in the protected IPA
region, and if it is decrypted then it is in the unprotected IPA region.

I think some of this 'dma encrypted' 'dma unencrypted' stuff is a bit
confused, at least on ARM, as I would expect the caller to have a
correct phys_addr_t with the correct IPA aliases already. Passing in
an ambiguous struct page for DMA mapping and then magically fixing it
seems really weird to me. I would expect that a correct phys_addr_t
should just translate 1:1 to a dma_addr_t or an iopte. Suzuki, is that
the right idea for ARM?

To that end this series seems like a big improvement for CCA, as the
caller can now specify either the protected or unprotected IPA
directly instead of an ambiguous struct page.

One of the things we are going to need for bounce buffering devices
like RDMA is to be able to allocate unencrypted folios, mmap them to
userspace, come back and then dma map them as unencrypted into a
MR.

So it looks to me like this series will be important for this use case
as well.

It looks OK though:

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
  2025-08-19 17:36 ` [PATCH v4 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
@ 2025-08-28 15:00   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 15:00 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:52PM +0300, Leon Romanovsky wrote:
>  /* Helper function to handle DMA data transfers. */
> -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> +void kmsan_handle_dma(phys_addr_t phys, size_t size,
>  		      enum dma_data_direction dir)
>  {
> +	struct page *page = phys_to_page(phys);
>  	u64 page_offset, to_go, addr;
>  
>  	if (PageHighMem(page))
>  		return;
> -	addr = (u64)page_address(page) + offset;
> +	addr = (u64)page_address(page) + offset_in_page(phys);

addr = phys_to_virt(phys);

And make addr a void *
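
i.e. the start of the function would end up roughly as (sketch only):

	void *addr;

	if (PageHighMem(phys_to_page(phys)))
		return;
	addr = phys_to_virt(phys);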

Otherwise looks fine

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page
  2025-08-19 17:36 ` [PATCH v4 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
@ 2025-08-28 15:17   ` Jason Gunthorpe
  2025-08-31 13:12     ` Leon Romanovsky
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 15:17 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:53PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Extend base DMA page API to handle MMIO flow and follow
> existing dma_map_resource() implementation to rely on dma_map_direct()
> only to take DMA direct path.

I would reword this a little bit too

dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs()

Make dma_map_page_attrs() and dma_unmap_page_attrs() respect
DMA_ATTR_MMIO.

DMA_ATTR_MMIO makes the functions behave the same as dma_(un)map_resource():
 - No swiotlb is possible
 - Legacy dma_ops arches use ops->map_resource()
 - No kmsan
 - No arch_dma_map_phys_direct()

The prior patches have made the internal functions called here support
DMA_ATTR_MMIO.

This is also preparation for turning dma_map_resource() into an inline
wrapper that calls dma_map_phys(DMA_ATTR_MMIO) to consolidate the flows.

> @@ -166,14 +167,25 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>  		return DMA_MAPPING_ERROR;
>  
>  	if (dma_map_direct(dev, ops) ||
> -	    arch_dma_map_phys_direct(dev, phys + size))
> +	    (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
>  		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);

PPC is the only user of arch_dma_map_phys_direct() and it looks like
it should be called on MMIO memory. Seems like another inconsistency
with map_resource. I'd leave it like the above though for this series.

>  	else if (use_dma_iommu(dev))
>  		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
> -	else
> +	else if (is_mmio) {
> +		if (!ops->map_resource)
> +			return DMA_MAPPING_ERROR;

Probably written like:

		if (ops->map_resource)
			addr = ops->map_resource(dev, phys, size, dir, attrs);
		else
			addr = DMA_MAPPING_ERROR;

As I think some of the design here is to run the trace even on the
failure path?

Otherwise looks OK

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 10/16] xen: swiotlb: Open code map_resource callback
  2025-08-19 17:36 ` [PATCH v4 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
@ 2025-08-28 15:18   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 15:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:54PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> General dma_direct_map_resource() is going to be removed
> in next patch, so simply open-code it in xen driver.
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/xen/swiotlb-xen.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-19 17:36 ` [PATCH v4 15/16] block-dma: properly take MMIO path Leon Romanovsky
  2025-08-19 18:24   ` Keith Busch
@ 2025-08-28 15:19   ` Keith Busch
  2025-08-28 16:54     ` Leon Romanovsky
  1 sibling, 1 reply; 50+ messages in thread
From: Keith Busch @ 2025-08-28 15:19 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 09b99d52fd36..283058bcb5b1 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -387,6 +387,7 @@ enum req_flag_bits {
>  	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
>  	__REQ_ATOMIC,		/* for atomic write operations */
>  	__REQ_P2PDMA,		/* contains P2P DMA pages */
> +	__REQ_MMIO,		/* contains MMIO memory */
>  	/*
>  	 * Command specific flags, keep last:
>  	 */
> @@ -420,6 +421,7 @@ enum req_flag_bits {
>  #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
>  #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
>  #define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)
> +#define REQ_MMIO	(__force blk_opf_t)(1ULL << __REQ_MMIO)

Now that my integrity metadata DMA series is staged, I don't think we
can use REQ flags like this because data and metadata may have different
mapping types. I think we should add a flags field to the dma_iova_state
instead.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface
  2025-08-19 17:36 ` [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
  2025-08-19 18:22   ` Keith Busch
@ 2025-08-28 16:01   ` Jason Gunthorpe
  1 sibling, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 16:01 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:55PM +0300, Leon Romanovsky wrote:
> The old page-based API is preserved in mapping.c to ensure that existing
> code won't be affected by changing EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
> variant for dma_*map_phys().
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/iommu/dma-iommu.c   | 14 --------
>  include/linux/dma-direct.h  |  2 --
>  include/linux/dma-mapping.h | 13 +++++++
>  include/linux/iommu-dma.h   |  4 ---
>  include/trace/events/dma.h  |  2 --
>  kernel/dma/debug.c          | 43 -----------------------
>  kernel/dma/debug.h          | 21 -----------
>  kernel/dma/direct.c         | 16 ---------
>  kernel/dma/mapping.c        | 69 ++++++++++++++++++++-----------------
>  9 files changed, 50 insertions(+), 134 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 15:19   ` Keith Busch
@ 2025-08-28 16:54     ` Leon Romanovsky
  2025-08-28 17:15       ` Keith Busch
  0 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-28 16:54 UTC (permalink / raw)
  To: Keith Busch
  Cc: Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 09:19:20AM -0600, Keith Busch wrote:
> On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > index 09b99d52fd36..283058bcb5b1 100644
> > --- a/include/linux/blk_types.h
> > +++ b/include/linux/blk_types.h
> > @@ -387,6 +387,7 @@ enum req_flag_bits {
> >  	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
> >  	__REQ_ATOMIC,		/* for atomic write operations */
> >  	__REQ_P2PDMA,		/* contains P2P DMA pages */
> > +	__REQ_MMIO,		/* contains MMIO memory */
> >  	/*
> >  	 * Command specific flags, keep last:
> >  	 */
> > @@ -420,6 +421,7 @@ enum req_flag_bits {
> >  #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
> >  #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
> >  #define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)
> > +#define REQ_MMIO	(__force blk_opf_t)(1ULL << __REQ_MMIO)
> 
> Now that my integrity metadata DMA series is staged, I don't think we
> can use REQ flags like this because data and metadata may have different
> mapping types. I think we should add a flags field to the dma_iova_state
> instead.

Before the integrity metadata code was merged, the assumption was that a
request is only one type, either p2p or host. Does that still hold now?

And we can't store it in dma_iova_state(), as the HMM/RDMA code works at
page granularity and one dma_iova_state() can mix different types.
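
For reference, with the link API the decision is made per dma_iova_link()
call, roughly (sketch only; 'is_p2p' stands in for whatever the caller
derives for each page while walking the range):

	/* one dma_iova_state for the whole range, per-page attrs */
	ret = dma_iova_link(dev, state, paddr, offset, PAGE_SIZE, dir,
			    is_p2p ? DMA_ATTR_MMIO : 0);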

Thanks

> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 16:54     ` Leon Romanovsky
@ 2025-08-28 17:15       ` Keith Busch
  2025-08-28 18:41         ` Jason Gunthorpe
  0 siblings, 1 reply; 50+ messages in thread
From: Keith Busch @ 2025-08-28 17:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 07:54:27PM +0300, Leon Romanovsky wrote:
> On Thu, Aug 28, 2025 at 09:19:20AM -0600, Keith Busch wrote:
> > On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> > > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > > index 09b99d52fd36..283058bcb5b1 100644
> > > --- a/include/linux/blk_types.h
> > > +++ b/include/linux/blk_types.h
> > > @@ -387,6 +387,7 @@ enum req_flag_bits {
> > >  	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
> > >  	__REQ_ATOMIC,		/* for atomic write operations */
> > >  	__REQ_P2PDMA,		/* contains P2P DMA pages */
> > > +	__REQ_MMIO,		/* contains MMIO memory */
> > >  	/*
> > >  	 * Command specific flags, keep last:
> > >  	 */
> > > @@ -420,6 +421,7 @@ enum req_flag_bits {
> > >  #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
> > >  #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
> > >  #define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)
> > > +#define REQ_MMIO	(__force blk_opf_t)(1ULL << __REQ_MMIO)
> > 
> > Now that my integrity metadata DMA series is staged, I don't think we
> > can use REQ flags like this because data and metadata may have different
> > mapping types. I think we should add a flags field to the dma_iova_state
> > instead.
> 
> > Before the integrity metadata code was merged, the assumption was that a
> > request is only one type, either p2p or host. Does that still hold now?

I don't think that was ever the case. Metadata is allocated
independently of the data payload, usually by the kernel in
bio_integrity_prep() just before dispatching the request. The bio may
have a p2p data payload, but the integrity metadata is just a kmalloc
buf in that path.

> And we can't store it in dma_iova_state(), as the HMM/RDMA code works at
> page granularity and one dma_iova_state() can mix different types.

I see.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 17:15       ` Keith Busch
@ 2025-08-28 18:41         ` Jason Gunthorpe
  2025-08-28 19:10           ` Keith Busch
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 18:41 UTC (permalink / raw)
  To: Keith Busch
  Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 11:15:20AM -0600, Keith Busch wrote:
> On Thu, Aug 28, 2025 at 07:54:27PM +0300, Leon Romanovsky wrote:
> > On Thu, Aug 28, 2025 at 09:19:20AM -0600, Keith Busch wrote:
> > > On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> > > > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > > > index 09b99d52fd36..283058bcb5b1 100644
> > > > --- a/include/linux/blk_types.h
> > > > +++ b/include/linux/blk_types.h
> > > > @@ -387,6 +387,7 @@ enum req_flag_bits {
> > > >  	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
> > > >  	__REQ_ATOMIC,		/* for atomic write operations */
> > > >  	__REQ_P2PDMA,		/* contains P2P DMA pages */
> > > > +	__REQ_MMIO,		/* contains MMIO memory */
> > > >  	/*
> > > >  	 * Command specific flags, keep last:
> > > >  	 */
> > > > @@ -420,6 +421,7 @@ enum req_flag_bits {
> > > >  #define REQ_FS_PRIVATE	(__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
> > > >  #define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
> > > >  #define REQ_P2PDMA	(__force blk_opf_t)(1ULL << __REQ_P2PDMA)
> > > > +#define REQ_MMIO	(__force blk_opf_t)(1ULL << __REQ_MMIO)
> > > 
> > > Now that my integrity metadata DMA series is staged, I don't think we
> > > can use REQ flags like this because data and metadata may have different
> > > mapping types. I think we should add a flags field to the dma_iova_state
> > > instead.
> > 
> > Before the integrity metadata code was merged, the assumption was that a
> > request is only one type, either p2p or host. Does that still hold now?
> 
> I don't think that was ever the case. Metadata is allocated
> independently of the data payload, usually by the kernel in
> bio_integrity_prep() just before dispatching the request. The bio may
> have a p2p data payload, but the integrity metadata is just a kmalloc
> buf in that path.

Then you should do two dma mapping operations today, that is how the
API was built. You shouldn't mix P2P and non P2P within a single
operation right now..

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 18:41         ` Jason Gunthorpe
@ 2025-08-28 19:10           ` Keith Busch
  2025-08-28 19:18             ` Jason Gunthorpe
  0 siblings, 1 reply; 50+ messages in thread
From: Keith Busch @ 2025-08-28 19:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 03:41:15PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 28, 2025 at 11:15:20AM -0600, Keith Busch wrote:
> > 
> > I don't think that was ever the case. Metadata is allocated
> > independently of the data payload, usually by the kernel in
> > bio_integrity_prep() just before dispatching the request. The bio may
> > have a p2p data payload, but the integrity metadata is just a kmalloc
> > buf in that path.
> 
> Then you should do two dma mapping operations today, that is how the
> API was built. You shouldn't mix P2P and non P2P within a single
> operation right now..

Data and metadata are mapped as separate operations. They're just
different parts of one blk-mq request.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 19:10           ` Keith Busch
@ 2025-08-28 19:18             ` Jason Gunthorpe
  2025-08-28 20:54               ` Keith Busch
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 19:18 UTC (permalink / raw)
  To: Keith Busch
  Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 01:10:32PM -0600, Keith Busch wrote:
> On Thu, Aug 28, 2025 at 03:41:15PM -0300, Jason Gunthorpe wrote:
> > On Thu, Aug 28, 2025 at 11:15:20AM -0600, Keith Busch wrote:
> > > 
> > > I don't think that was ever the case. Metadata is allocated
> > > independently of the data payload, usually by the kernel in
> > > bio_integrity_prep() just before dispatching the request. The bio may
> > > have a p2p data payload, but the integrity metadata is just a kmalloc
> > > buf in that path.
> > 
> > Then you should do two dma mapping operations today, that is how the
> > API was built. You shouldn't mix P2P and non P2P within a single
> > operation right now..
> 
> Data and metadata are mapped as separate operations. They're just
> different parts of one blk-mq request.

In that case the new bit Leon proposes should only be used for the
unmap of the data pages, and the metadata should always be
unmapped as CPU memory?

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 19:18             ` Jason Gunthorpe
@ 2025-08-28 20:54               ` Keith Busch
  2025-08-28 23:45                 ` Jason Gunthorpe
  0 siblings, 1 reply; 50+ messages in thread
From: Keith Busch @ 2025-08-28 20:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 04:18:20PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 28, 2025 at 01:10:32PM -0600, Keith Busch wrote:
> > 
> > Data and metadata are mapped as separate operations. They're just
> > different parts of one blk-mq request.
> 
> In that case the new bit Leon proposes should only be used for the
> unmap of the data pages, and the metadata should always be
> unmapped as CPU memory?

The common path uses host allocated memory to attach integrity metadata,
but that isn't the only path. A user can attach their own metadata with
nvme passthrough or the recent io_uring application metadata, and that
could have been allocated from anywhere.

In truth though, I hadn't tried p2p metadata before today, and it looks
like bio_integrity_map_user() is missing the P2P extraction flags to
make that work. Just added this patch below, now I can set p2p or host
memory independently for data and integrity payloads:

---
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 6b077ca937f6b..cf45603e378d5 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -265,6 +265,7 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
 	unsigned int align = blk_lim_dma_alignment_and_pad(&q->limits);
 	struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages;
 	struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec;
+	iov_iter_extraction_t extraction_flags = 0;
 	size_t offset, bytes = iter->count;
 	unsigned int nr_bvecs;
 	int ret, nr_vecs;
@@ -286,7 +287,12 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
 	}
 
 	copy = !iov_iter_is_aligned(iter, align, align);
-	ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs, 0, &offset);
+
+	if (blk_queue_pci_p2pdma(q))
+		extraction_flags |= ITER_ALLOW_P2PDMA;
+
+	ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs,
+					extraction_flags, &offset);
 	if (unlikely(ret < 0))
 		goto free_bvec;
 
--

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 20:54               ` Keith Busch
@ 2025-08-28 23:45                 ` Jason Gunthorpe
  2025-08-29 12:35                   ` Keith Busch
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-28 23:45 UTC (permalink / raw)
  To: Keith Busch
  Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 02:54:35PM -0600, Keith Busch wrote:

> In truth though, I hadn't tried p2p metadata before today, and it looks
> like bio_integrity_map_user() is missing the P2P extraction flags to
> make that work. Just added this patch below, now I can set p2p or host
> memory independently for data and integrity payloads:

I think it is a bit more than that: you have to make sure all the
metadata is the same, either all p2p or all CPU, and then record this
somehow so the DMA mapping knows what kind it is.

Once that is all done then the above should still be OK, the dma unmap
of the data can follow Leon's new flag and the dma unmap of the
integrity can follow however integrity kept track (in the
bio_integrity_payload perhaps?) ??
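
e.g. something like this (hypothetical sketch; BIP_MMIO, p2p, bip_dma_addr
and bip_len are made-up names, not from the series):

	/* at map time, remember how the integrity buffer was mapped */
	if (p2p)
		bip->bip_flags |= BIP_MMIO;

	/* at unmap time, pick the matching attrs */
	dma_unmap_phys(dev, bip_dma_addr, bip_len, dir,
		       (bip->bip_flags & BIP_MMIO) ? DMA_ATTR_MMIO : 0);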

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 15/16] block-dma: properly take MMIO path
  2025-08-28 23:45                 ` Jason Gunthorpe
@ 2025-08-29 12:35                   ` Keith Busch
  0 siblings, 0 replies; 50+ messages in thread
From: Keith Busch @ 2025-08-29 12:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Thu, Aug 28, 2025 at 08:45:42PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 28, 2025 at 02:54:35PM -0600, Keith Busch wrote:
> 
> > In truth though, I hadn't tried p2p metadata before today, and it looks
> > like bio_integrity_map_user() is missing the P2P extraction flags to
> > make that work. Just added this patch below, now I can set p2p or host
> > memory independently for data and integrity payloads:
> 
> I think it is a bit more than that: you have to make sure all the
> metadata is the same, either all p2p or all CPU, and then record this
> somehow so the DMA mapping knows what kind it is.

Sure, I can get all that added in for the real patch.
 
> Once that is all done then the above should still be OK, the dma unmap
> of the data can follow Leon's new flag and the dma unmap of the
> integrity can follow however integrity kept track (in the
> bio_integrity_payload perhaps?) ??

We have available bits in the bio_integrity_payload bip_flags, so that
sounds doable. I think we'll need to rearrange some things so we can
reuse the important code for data and metadata mapping/unmapping, but
doesn't look too bad. I'll get started on that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based API
  2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
                   ` (16 preceding siblings ...)
  2025-08-28 11:57 ` [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
@ 2025-08-29 13:16 ` Jason Gunthorpe
  17 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-08-29 13:16 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Tue, Aug 19, 2025 at 08:36:44PM +0300, Leon Romanovsky wrote:

> This series does the core code and modern flows. A followup series
> will give the same treatment to the legacy dma_ops implementation.

I took a quick look over this to check that it is sane. I think using
phys is an improvement for most of the dma_ops implementations.

  arch/sparc/kernel/pci_sun4v.c
  arch/sparc/kernel/iommu.c
    Uses __pa to get phys from the page, never touches page

  arch/alpha/kernel/pci_iommu.c
  arch/sparc/mm/io-unit.c
  drivers/parisc/ccio-dma.c
  drivers/parisc/sba_iommu.c
    Does page_address() and later does __pa on it. Doesn't touch struct page

  arch/x86/kernel/amd_gart_64.c
  drivers/xen/swiotlb-xen.c
  arch/mips/jazz/jazzdma.c
    Immediately does page_to_phys(), never touches struct page

  drivers/vdpa/vdpa_user/vduse_dev.c
    Does page_to_phys() to call iommu_map()

  drivers/xen/grant-dma-ops.c
    Does page_to_pfn() and nothing else

  arch/powerpc/platforms/ps3/system-bus.c
   This is a maze but I think it wants only phys and the virt is only
   used for debug prints.

The above all never touch a KVA and just want a phys_addr_t.

The below are touching the KVA somehow:

  arch/sparc/mm/iommu.c
  arch/arm/mm/dma-mapping.c
    Uses page_address to cache flush, would be happy with phys_to_virt()
    and a PhysHighMem()

  arch/powerpc/kernel/dma-iommu.c
  arch/powerpc/platforms/pseries/vio.c
   Uses iommu_map_page() which wants phys_to_virt(), doesn't touch
   struct page

  arch/powerpc/platforms/pseries/ibmebus.c
    Returns phys_to_virt() as dma_addr_t.

The two PPC ones are weird, I didn't figure out how that was working..

It would be easy to make map_phys patches for about half of these, in
the first grouping. Doing so would also grant those arches
map_resource capability.

Overall I didn't think there was any reduction in maintainability in
these places. Most are improvements eliminating code, and some are
just switching to phys_to_virt() from page_address(), which we could
further guard with DMA_ATTR_MMIO and a check for highmem.
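
i.e. the arch-side pattern in those files would become something like this
(sketch only; PhysHighMem() is the hypothetical helper from earlier in the
thread, arch_flush() stands in for whatever cache maintenance the arch does):

	/* only touch a KVA for cacheable, lowmem memory */
	if (!(attrs & DMA_ATTR_MMIO) && !PhysHighMem(phys))
		arch_flush(phys_to_virt(phys), size);	/* was page_address(page) + offset */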

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page
  2025-08-28 15:17   ` Jason Gunthorpe
@ 2025-08-31 13:12     ` Leon Romanovsky
  0 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-08-31 13:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Thu, Aug 28, 2025 at 12:17:30PM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 19, 2025 at 08:36:53PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Extend base DMA page API to handle MMIO flow and follow
> > existing dma_map_resource() implementation to rely on dma_map_direct()
> > only to take DMA direct path.
> 
> I would reword this a little bit too
> 
> dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs()
> 
> Make dma_map_page_attrs() and dma_unmap_page_attrs() respect
> DMA_ATTR_MMIO.
> 
> DMA_ATTR_MMIO makes the functions behave the same as dma_(un)map_resource():
>  - No swiotlb is possible
>  - Legacy dma_ops arches use ops->map_resource()
>  - No kmsan
>  - No arch_dma_map_phys_direct()
> 
> The prior patches have made the internal functions called here support
> DMA_ATTR_MMIO.
> 
> This is also preparation for turning dma_map_resource() into an inline
> wrapper that calls dma_map_phys(DMA_ATTR_MMIO) to consolidate the flows.
> 
> > @@ -166,14 +167,25 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
> >  		return DMA_MAPPING_ERROR;
> >  
> >  	if (dma_map_direct(dev, ops) ||
> > -	    arch_dma_map_phys_direct(dev, phys + size))
> > +	    (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
> >  		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
> 
> PPC is the only user of arch_dma_map_phys_direct() and it looks like
> it should be called on MMIO memory. Seems like another inconsistency
> with map_resource. I'd leave it like the above though for this series.
> 
> >  	else if (use_dma_iommu(dev))
> >  		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
> > -	else
> > +	else if (is_mmio) {
> > +		if (!ops->map_resource)
> > +			return DMA_MAPPING_ERROR;
> 
> Probably written like:
> 
> 		if (ops->map_resource)
> 			addr = ops->map_resource(dev, phys, size, dir, attrs);
> 		else
> 			addr = DMA_MAPPING_ERROR;

I'm a big fan of the "if (!ops->map_resource)" coding style and prefer to keep it.

> 
> As I think some of the design here is to run the trace even on the
> failure path?

Yes, this is how it worked before.

> 
> Otherwise looks OK
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based API
  2025-08-28 11:57 ` [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
@ 2025-09-01 21:47   ` Marek Szyprowski
  2025-09-01 22:23     ` Jason Gunthorpe
  0 siblings, 1 reply; 50+ messages in thread
From: Marek Szyprowski @ 2025-09-01 21:47 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel


On 28.08.2025 13:57, Leon Romanovsky wrote:
> On Tue, Aug 19, 2025 at 08:36:44PM +0300, Leon Romanovsky wrote:
>> Changelog:
>> v4:
>>   * Fixed kbuild error with mismatch in kmsan function declaration due to
>>     rebase error.
>> v3: https://lore.kernel.org/all/cover.1755193625.git.leon@kernel.org
>>   * Fixed typo in "cacheable" word
>>   * Simplified kmsan patch a lot to be simple argument refactoring
>> v2: https://lore.kernel.org/all/cover.1755153054.git.leon@kernel.org
>>   * Used commit messages and cover letter from Jason
>>   * Moved setting IOMMU_MMIO flag to dma_info_to_prot function
>>   * Micro-optimized the code
>>   * Rebased code on v6.17-rc1
>> v1: https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org
>>   * Added new DMA_ATTR_MMIO attribute to indicate
>>     PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
>>   * Rewrote dma_map_* functions to use thus new attribute
>> v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
>> ------------------------------------------------------------------------
>>
>> This series refactors the DMA mapping to use physical addresses
>> as the primary interface instead of page+offset parameters. This
>> change aligns the DMA API with the underlying hardware reality where
>> DMA operations work with physical addresses, not page structures.
>>
>> The series maintains export symbol backward compatibility by keeping
>> the old page-based API as wrapper functions around the new physical
>> address-based implementations.
>>
>> This series refactors the DMA mapping API to provide a phys_addr_t
>> based, and struct-page free, external API that can handle all the
>> mapping cases we want in modern systems:
>>
>>   - struct page based cachable DRAM
>>   - struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cachable
>>     MMIO
>>   - struct page-less PCI peer to peer non-cachable MMIO
>>   - struct page-less "resource" MMIO
>>
>> Overall this gets much closer to Matthew's long term wish for
>> struct-pageless IO to cachable DRAM. The remaining primary work would
>> be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
>> phys_addr_t without a struct page.
>>
>> The general design is to remove struct page usage entirely from the
>> DMA API inner layers. For flows that need to have a KVA for the
>> physical address they can use kmap_local_pfn() or phys_to_virt(). This
>> isolates the struct page requirements to MM code only. Long term all
>> removals of struct page usage are supporting Matthew's memdesc
>> project which seeks to substantially transform how struct page works.
>>
>> Instead make the DMA API internals work on phys_addr_t. Internally
>> there are still dedicated 'page' and 'resource' flows, except they are
>> now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
>> flows use the same phys_addr_t.
>>
>> When DMA_ATTR_MMIO is specified things work similar to the existing
>> 'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
>> pfn_valid(), etc are never called on the phys_addr_t. This requires
>> rejecting any configuration that would need swiotlb. CPU cache
>> flushing is not required, and avoided, as ATTR_MMIO also indicates the
>> address have no cachable mappings. This effectively removes any
>> DMA API side requirement to have struct page when DMA_ATTR_MMIO is
>> used.
>>
>> In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
>> except on the common path of no cache flush, no swiotlb it never
>> touches a struct page. When cache flushing or swiotlb copying
>> kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
>> usage. This was already the case on the unmap side, now the map side
>> is symmetric.
>>
>> Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
>> must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
>> path must also set it. This corrects some existing bugs where iommu
>> mappings for P2P MMIO were improperly marked IOMMU_CACHE.
>>
>> Since ATTR_MMIO is made to work with all the existing DMA map entry
>> points, particularly dma_iova_link(), this finally allows a way to use
>> the new DMA API to map PCI P2P MMIO without creating struct page. The
>> VFIO DMABUF series demonstrates how this works. This is intended to
>> replace the incorrect driver use of dma_map_resource() on PCI BAR
>> addresses.
>>
>> This series does the core code and modern flows. A followup series
>> will give the same treatment to the legacy dma_ops implementation.
>>
>> Thanks
>>
>> Leon Romanovsky (16):
>>    dma-mapping: introduce new DMA attribute to indicate MMIO memory
>>    iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
>>    dma-debug: refactor to use physical addresses for page mapping
>>    dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
>>    iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
>>    iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
>>    dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
>>    kmsan: convert kmsan_handle_dma to use physical addresses
>>    dma-mapping: handle MMIO flow in dma_map|unmap_page
>>    xen: swiotlb: Open code map_resource callback
>>    dma-mapping: export new dma_*map_phys() interface
>>    mm/hmm: migrate to physical address-based DMA mapping API
>>    mm/hmm: properly take MMIO path
>>    block-dma: migrate to dma_map_phys instead of map_page
>>    block-dma: properly take MMIO path
>>    nvme-pci: unmap MMIO pages with appropriate interface
>>
>>   Documentation/core-api/dma-api.rst        |   4 +-
>>   Documentation/core-api/dma-attributes.rst |  18 ++++
>>   arch/powerpc/kernel/dma-iommu.c           |   4 +-
>>   block/blk-mq-dma.c                        |  15 ++-
>>   drivers/iommu/dma-iommu.c                 |  61 +++++------
>>   drivers/nvme/host/pci.c                   |  18 +++-
>>   drivers/virtio/virtio_ring.c              |   4 +-
>>   drivers/xen/swiotlb-xen.c                 |  21 +++-
>>   include/linux/blk-mq-dma.h                |   6 +-
>>   include/linux/blk_types.h                 |   2 +
>>   include/linux/dma-direct.h                |   2 -
>>   include/linux/dma-map-ops.h               |   8 +-
>>   include/linux/dma-mapping.h               |  33 ++++++
>>   include/linux/iommu-dma.h                 |  11 +-
>>   include/linux/kmsan.h                     |   9 +-
>>   include/trace/events/dma.h                |   9 +-
>>   kernel/dma/debug.c                        |  71 ++++---------
>>   kernel/dma/debug.h                        |  37 ++-----
>>   kernel/dma/direct.c                       |  22 +---
>>   kernel/dma/direct.h                       |  52 ++++++----
>>   kernel/dma/mapping.c                      | 117 +++++++++++++---------
>>   kernel/dma/ops_helpers.c                  |   6 +-
>>   mm/hmm.c                                  |  19 ++--
>>   mm/kmsan/hooks.c                          |   5 +-
>>   rust/kernel/dma.rs                        |   3 +
>>   tools/virtio/linux/kmsan.h                |   2 +-
>>   26 files changed, 305 insertions(+), 254 deletions(-)
> Marek,
>
> So what are the next steps here? This series is pre-requirement for the
> VFIO MMIO patches.

I waited a bit in the hope of getting a comment from Robin. It looks 
like there is no alternative to the phys addr approach in the struct 
page removal process.

I would like to give those patches a try in linux-next, but in the 
meantime I tested them on my test farm and found a regression in 
dma_map_resource() handling. Namely, dma_map_resource() is no longer 
possible with a size that is not aligned as a kmalloc()'ed buffer 
would be, because dma_direct_map_phys() calls dma_kmalloc_needs_bounce(), 
which in turn calls dma_kmalloc_size_aligned(). It looks like the check 
for !(attrs & DMA_ATTR_MMIO) should be moved one level up in 
dma_direct_map_phys(). 
Here is the log:

------------[ cut here ]------------
dma-pl330 fe550000.dma-controller: DMA addr 0x00000000fe410024+4 
overflow (mask ffffffff, bus limit 0).
WARNING: kernel/dma/direct.h:116 at dma_map_phys+0x3a4/0x3ec, CPU#1: 
speaker-test/405
Modules linked in: ...
CPU: 1 UID: 0 PID: 405 Comm: speaker-test Not tainted 
6.17.0-rc4-next-20250901+ #10958 PREEMPT
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dma_map_phys+0x3a4/0x3ec
lr : dma_map_phys+0x3a4/0x3ec
...
Call trace:
  dma_map_phys+0x3a4/0x3ec (P)
  dma_map_resource+0x14/0x20
  pl330_prep_slave_fifo+0x78/0xd0
  pl330_prep_dma_cyclic+0x70/0x2b0
  snd_dmaengine_pcm_trigger+0xec/0x8bc [snd_pcm_dmaengine]
  dmaengine_pcm_trigger+0x18/0x24 [snd_soc_core]
  snd_soc_pcm_component_trigger+0x164/0x208 [snd_soc_core]
  soc_pcm_trigger+0xe4/0x1ec [snd_soc_core]
  snd_pcm_do_start+0x44/0x70 [snd_pcm]
  snd_pcm_action_single+0x48/0xa4 [snd_pcm]
  snd_pcm_action+0x7c/0x98 [snd_pcm]
  snd_pcm_action_lock_irq+0x48/0xb4 [snd_pcm]
  snd_pcm_common_ioctl+0xf00/0x1f1c [snd_pcm]
  snd_pcm_ioctl+0x30/0x48 [snd_pcm]
  __arm64_sys_ioctl+0xac/0x104
  invoke_syscall+0x48/0x110
  el0_svc_common.constprop.0+0x40/0xe8
  do_el0_svc+0x20/0x2c
  el0_svc+0x4c/0x160
  el0t_64_sync_handler+0xa0/0xe4
  el0t_64_sync+0x198/0x19c
irq event stamp: 6596
hardirqs last  enabled at (6595): [<ffff800081344624>] 
_raw_spin_unlock_irqrestore+0x74/0x78
hardirqs last disabled at (6596): [<ffff8000813439b0>] 
_raw_spin_lock_irq+0x78/0x7c
softirqs last  enabled at (6076): [<ffff8000800c2294>] 
handle_softirqs+0x4c4/0x4dc
softirqs last disabled at (6071): [<ffff800080010690>] 
__do_softirq+0x14/0x20
---[ end trace 0000000000000000 ]---
rockchip-i2s-tdm fe410000.i2s: ASoC error (-12): at 
soc_component_trigger() on fe410000.i2s
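
For reference, a minimal sketch of the kind of mapping that hits this 
path, modeled on the pl330_prep_slave_fifo() call in the backtrace; the 
device pointer, direction and constants are illustrative, not the actual 
driver code:

#include <linux/dma-mapping.h>

/*
 * Map a small MMIO FIFO register for slave DMA, as the pl330 driver
 * does. A 4-byte size is not "kmalloc aligned", so with the current
 * dma_direct_map_phys() the dma_kmalloc_needs_bounce() check can
 * reject the mapping even though no cache maintenance is ever done
 * on MMIO.
 */
static int sketch_map_fifo(struct device *dmac_dev)
{
	phys_addr_t fifo_phys = 0xfe410024;	/* 4-byte I2S FIFO register */
	dma_addr_t fifo_dma;

	fifo_dma = dma_map_resource(dmac_dev, fifo_phys, 4,
				    DMA_TO_DEVICE, 0);
	if (dma_mapping_error(dmac_dev, fifo_dma))
		return -ENOMEM;	/* the overflow warning above fires here */

	dma_unmap_resource(dmac_dev, fifo_dma, 4, DMA_TO_DEVICE, 0);
	return 0;
}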

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based API
  2025-09-01 21:47   ` Marek Szyprowski
@ 2025-09-01 22:23     ` Jason Gunthorpe
  2025-09-02  9:29       ` Leon Romanovsky
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2025-09-01 22:23 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko,
	Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
	iommu, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
	Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
	linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
	linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
	Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
	rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
	virtualization, Will Deacon, xen-devel

On Mon, Sep 01, 2025 at 11:47:59PM +0200, Marek Szyprowski wrote:
> I would like to give those patches a try in linux-next, but in the 
> meantime I tested them on my test farm and found a regression in 
> dma_map_resource() handling. Namely, dma_map_resource() is no longer 
> possible with a size that is not aligned as a kmalloc()'ed buffer 
> would be, because dma_direct_map_phys() calls dma_kmalloc_needs_bounce(),

Hmm, it's this bit:

	capable = dma_capable(dev, dma_addr, size, !(attrs & DMA_ATTR_MMIO));
	if (unlikely(!capable) || dma_kmalloc_needs_bounce(dev, size, dir)) {
		if (is_swiotlb_active(dev) && !(attrs & DMA_ATTR_MMIO))
			return swiotlb_map(dev, phys, size, dir, attrs);

		goto err_overflow;
	}

We shouldn't be checking dma_kmalloc_needs_bounce() on MMIO, as there
is no cache flushing, so the "dma safe alignment" for non-coherent DMA
does not apply.

Like you say, it looks good to me, and more of the surrounding code can
be pulled in too; no sense in repeating the boolean logic:

	if (attrs & DMA_ATTR_MMIO) {
		dma_addr = phys;
		if (unlikely(!dma_capable(dev, dma_addr, size, false)))
			goto err_overflow;
	} else {
		dma_addr = phys_to_dma(dev, phys);
		if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
		    dma_kmalloc_needs_bounce(dev, size, dir)) {
			if (is_swiotlb_active(dev))
				return swiotlb_map(dev, phys, size, dir, attrs);

			goto err_overflow;
		}
		if (!dev_is_dma_coherent(dev) &&
		    !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
			arch_sync_dma_for_device(phys, size, dir);
	}

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based API
  2025-09-01 22:23     ` Jason Gunthorpe
@ 2025-09-02  9:29       ` Leon Romanovsky
  0 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2025-09-02  9:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Marek Szyprowski
  Cc: Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On Mon, Sep 01, 2025 at 07:23:02PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 01, 2025 at 11:47:59PM +0200, Marek Szyprowski wrote:
> > I would like to give those patches a try in linux-next, but in the 
> > meantime I tested them on my test farm and found a regression in 
> > dma_map_resource() handling. Namely, dma_map_resource() is no longer 
> > possible with a size that is not aligned as a kmalloc()'ed buffer 
> > would be, because dma_direct_map_phys() calls dma_kmalloc_needs_bounce(),
> 
> Hmm, it's this bit:
> 
> 	capable = dma_capable(dev, dma_addr, size, !(attrs & DMA_ATTR_MMIO));
> 	if (unlikely(!capable) || dma_kmalloc_needs_bounce(dev, size, dir)) {
> 		if (is_swiotlb_active(dev) && !(attrs & DMA_ATTR_MMIO))
> 			return swiotlb_map(dev, phys, size, dir, attrs);
> 
> 		goto err_overflow;
> 	}
> 
> We shouldn't be checking dma_kmalloc_needs_bounce() on MMIO, as there
> is no cache flushing, so the "dma safe alignment" for non-coherent DMA
> does not apply.
> 
> Like you say, it looks good to me, and more of the surrounding code can
> be pulled in too; no sense in repeating the boolean logic:
> 
> 	if (attrs & DMA_ATTR_MMIO) {
> 		dma_addr = phys;
> 		if (unlikely(!dma_capable(dev, dma_addr, size, false)))
> 			goto err_overflow;
> 	} else {
> 		dma_addr = phys_to_dma(dev, phys);
> 		if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||

I tried to reuse the same code as much as possible :(

> 		    dma_kmalloc_needs_bounce(dev, size, dir)) {
> 			if (is_swiotlb_active(dev))
> 				return swiotlb_map(dev, phys, size, dir, attrs);
> 
> 			goto err_overflow;
> 		}
> 		if (!dev_is_dma_coherent(dev) &&
> 		    !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> 			arch_sync_dma_for_device(phys, size, dir);
> 	}

Like Jason wrote, but in diff format:

diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 92dbadcd3b2f..3f4792910604 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -85,7 +85,6 @@ static inline dma_addr_t dma_direct_map_phys(struct device *dev,
                unsigned long attrs)
 {
        dma_addr_t dma_addr;
-       bool capable;

        if (is_swiotlb_force_bounce(dev)) {
                if (attrs & DMA_ATTR_MMIO)
@@ -94,17 +93,19 @@ static inline dma_addr_t dma_direct_map_phys(struct device *dev,
                return swiotlb_map(dev, phys, size, dir, attrs);
        }

-       if (attrs & DMA_ATTR_MMIO)
+       if (attrs & DMA_ATTR_MMIO) {
                dma_addr = phys;
-       else
+               if (unlikely(!dma_capable(dev, dma_addr, size, false)))
+                       goto err_overflow;
+       } else {
                dma_addr = phys_to_dma(dev, phys);
+               if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
+                   dma_kmalloc_needs_bounce(dev, size, dir)) {
+                       if (is_swiotlb_active(dev))
+                               return swiotlb_map(dev, phys, size, dir, attrs);

-       capable = dma_capable(dev, dma_addr, size, !(attrs & DMA_ATTR_MMIO));
-       if (unlikely(!capable) || dma_kmalloc_needs_bounce(dev, size, dir)) {
-               if (is_swiotlb_active(dev) && !(attrs & DMA_ATTR_MMIO))
-                       return swiotlb_map(dev, phys, size, dir, attrs);
-
-               goto err_overflow;
+                       goto err_overflow;
+               }
        }

        if (!dev_is_dma_coherent(dev) &&


I created new tag with fixed code.
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/tag/?h=dma-phys-Sep-2

Thanks

> 
> Jason

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page
  2025-08-19 17:36 ` [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
  2025-08-19 18:20   ` Keith Busch
@ 2025-09-02 20:49   ` Marek Szyprowski
  2025-09-02 21:59     ` Keith Busch
  1 sibling, 1 reply; 50+ messages in thread
From: Marek Szyprowski @ 2025-09-02 20:49 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
	Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, Keith Busch, linux-block, linux-doc, linux-kernel,
	linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
	Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
	Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
	Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
	Will Deacon, xen-devel

On 19.08.2025 19:36, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> After introduction of dma_map_phys(), there is no need to convert
> from physical address to struct page in order to map page. So let's
> use it directly.
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>   block/blk-mq-dma.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> index ad283017caef..37e2142be4f7 100644
> --- a/block/blk-mq-dma.c
> +++ b/block/blk-mq-dma.c
> @@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
>   static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
>   		struct blk_dma_iter *iter, struct phys_vec *vec)
>   {
> -	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> -			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> +	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> +			rq_dma_dir(req), 0);
>   	if (dma_mapping_error(dma_dev, iter->addr)) {
>   		iter->status = BLK_STS_RESOURCE;
>   		return false;

I wonder where the corresponding dma_unmap_page() call is, and its 
change to dma_unmap_phys()...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page
  2025-09-02 20:49   ` Marek Szyprowski
@ 2025-09-02 21:59     ` Keith Busch
  2025-09-02 23:24       ` Jason Gunthorpe
  0 siblings, 1 reply; 50+ messages in thread
From: Keith Busch @ 2025-09-02 21:59 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Leon Romanovsky, Leon Romanovsky, Jason Gunthorpe,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Tue, Sep 02, 2025 at 10:49:48PM +0200, Marek Szyprowski wrote:
> On 19.08.2025 19:36, Leon Romanovsky wrote:
> > @@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
> >   static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
> >   		struct blk_dma_iter *iter, struct phys_vec *vec)
> >   {
> > -	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> > -			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> > +	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> > +			rq_dma_dir(req), 0);
> >   	if (dma_mapping_error(dma_dev, iter->addr)) {
> >   		iter->status = BLK_STS_RESOURCE;
> >   		return false;
> 
> I wonder where the corresponding dma_unmap_page() call is, and its 
> change to dma_unmap_phys()...

You can't do that in the generic layer, so it's up to the caller. The
dma addrs that blk_dma_iter yields are used in a caller-specific
structure. For example, for NVMe, they go into an NVMe PRP. The generic
layer doesn't know what that is, so the driver has to provide the
unmapping.
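
A rough sketch of that split (all names here are hypothetical, not the
real block or NVMe code): the driver records each mapped address and
length in its own per-request table and tears it down itself with
dma_unmap_phys():

#include <linux/dma-mapping.h>

struct mydrv_dma_vec {		/* hypothetical per-request bookkeeping */
	dma_addr_t addr;
	u32 len;
};

/* teardown owned by the driver, not by the generic block layer */
static void mydrv_unmap_data(struct device *dma_dev,
			     struct mydrv_dma_vec *vecs, unsigned int nr_vecs,
			     enum dma_data_direction dir)
{
	unsigned int i;

	for (i = 0; i < nr_vecs; i++)
		dma_unmap_phys(dma_dev, vecs[i].addr, vecs[i].len, dir, 0);
}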

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page
  2025-09-02 21:59     ` Keith Busch
@ 2025-09-02 23:24       ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 23:24 UTC (permalink / raw)
  To: Keith Busch
  Cc: Marek Szyprowski, Leon Romanovsky, Leon Romanovsky,
	Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton,
	Christoph Hellwig, Danilo Krummrich, iommu, Jason Wang,
	Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross,
	kasan-dev, linux-block, linux-doc, linux-kernel, linux-mm,
	linux-nvme, linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
	Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
	Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
	Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
	xen-devel

On Tue, Sep 02, 2025 at 03:59:37PM -0600, Keith Busch wrote:
> On Tue, Sep 02, 2025 at 10:49:48PM +0200, Marek Szyprowski wrote:
> > On 19.08.2025 19:36, Leon Romanovsky wrote:
> > > @@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
> > >   static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
> > >   		struct blk_dma_iter *iter, struct phys_vec *vec)
> > >   {
> > > -	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> > > -			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> > > +	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> > > +			rq_dma_dir(req), 0);
> > >   	if (dma_mapping_error(dma_dev, iter->addr)) {
> > >   		iter->status = BLK_STS_RESOURCE;
> > >   		return false;
> > 
> > I wonder where the corresponding dma_unmap_page() call is, and its 
> > change to dma_unmap_phys()...
> 
> You can't do that in the generic layer, so it's up to the caller. The
> dma addrs that blk_dma_iter yields are used in a caller-specific
> structure. For example, for NVMe, they go into an NVMe PRP. The generic
> layer doesn't know what that is, so the driver has to provide the
> unmapping.

To be specific, I think it is this hunk in another patch that matches
the above:

@@ -682,11 +682,15 @@ static void nvme_free_prps(struct request *req)
 {
        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
        struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+       unsigned int attrs = 0;
        unsigned int i;
 
+       if (req->cmd_flags & REQ_MMIO)
+               attrs = DMA_ATTR_MMIO;
+
        for (i = 0; i < iod->nr_dma_vecs; i++)
-               dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
-                               iod->dma_vecs[i].len, rq_dma_dir(req));
+               dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+                               iod->dma_vecs[i].len, rq_dma_dir(req), attrs);


And it is functionally fine to split the series like this because
unmap_page is a nop around unmap_phys:

void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
                 enum dma_data_direction dir, unsigned long attrs)
{
        if (unlikely(attrs & DMA_ATTR_MMIO))
                return;

        dma_unmap_phys(dev, addr, size, dir, attrs);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2025-09-02 23:25 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-19 17:36 [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
2025-08-19 17:36 ` [PATCH v4 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
2025-08-28 13:03   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 02/16] iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link() Leon Romanovsky
2025-08-19 17:36 ` [PATCH v4 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
2025-08-28 13:19   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
2025-08-28 13:27   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
2025-08-28 13:38   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 06/16] iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory Leon Romanovsky
2025-08-28 13:49   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
2025-08-28 14:19   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
2025-08-28 15:00   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 09/16] dma-mapping: handle MMIO flow in dma_map|unmap_page Leon Romanovsky
2025-08-28 15:17   ` Jason Gunthorpe
2025-08-31 13:12     ` Leon Romanovsky
2025-08-19 17:36 ` [PATCH v4 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
2025-08-28 15:18   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
2025-08-19 18:22   ` Keith Busch
2025-08-28 16:01   ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
2025-08-19 17:36 ` [PATCH v4 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
2025-08-19 17:36 ` [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
2025-08-19 18:20   ` Keith Busch
2025-08-19 18:49     ` Leon Romanovsky
2025-09-02 20:49   ` Marek Szyprowski
2025-09-02 21:59     ` Keith Busch
2025-09-02 23:24       ` Jason Gunthorpe
2025-08-19 17:36 ` [PATCH v4 15/16] block-dma: properly take MMIO path Leon Romanovsky
2025-08-19 18:24   ` Keith Busch
2025-08-28 15:19   ` Keith Busch
2025-08-28 16:54     ` Leon Romanovsky
2025-08-28 17:15       ` Keith Busch
2025-08-28 18:41         ` Jason Gunthorpe
2025-08-28 19:10           ` Keith Busch
2025-08-28 19:18             ` Jason Gunthorpe
2025-08-28 20:54               ` Keith Busch
2025-08-28 23:45                 ` Jason Gunthorpe
2025-08-29 12:35                   ` Keith Busch
2025-08-19 17:37 ` [PATCH v4 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
2025-08-19 19:58   ` Keith Busch
2025-08-28 11:57 ` [PATCH v4 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
2025-09-01 21:47   ` Marek Szyprowski
2025-09-01 22:23     ` Jason Gunthorpe
2025-09-02  9:29       ` Leon Romanovsky
2025-08-29 13:16 ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).