* [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
virtualization, Will Deacon, xen-devel
From: Leon Romanovsky <leonro@nvidia.com>
Changelog:
v6:
* Based on "dma-debug: don't enforce dma mapping check on noncoherent
allocations" patch.
* Removed some unused variables from kmsan conversion.
* Fixed missed ! in dma check.
v5: https://lore.kernel.org/all/cover.1756822782.git.leon@kernel.org
* Added Jason's and Keith's Reviewed-by tags
* Fixed DMA_ATTR_MMIO check in dma_direct_map_phys
* Jason's cleanup suggestions
v4: https://lore.kernel.org/all/cover.1755624249.git.leon@kernel.org/
* Fixed kbuild error with mismatch in kmsan function declaration due to
rebase error.
v3: https://lore.kernel.org/all/cover.1755193625.git.leon@kernel.org
* Fixed typo in "cacheable" word
* Simplified kmsan patch a lot to be simple argument refactoring
v2: https://lore.kernel.org/all/cover.1755153054.git.leon@kernel.org
* Used commit messages and cover letter from Jason
* Moved setting IOMMU_MMIO flag to dma_info_to_prot function
* Micro-optimized the code
* Rebased code on v6.17-rc1
v1: https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org
* Added new DMA_ATTR_MMIO attribute to indicate
PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
* Rewrote dma_map_* functions to use this new attribute
v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
------------------------------------------------------------------------
This series refactors the DMA mapping API to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality where
DMA operations work with physical addresses, not page structures.
The series maintains export symbol backward compatibility by keeping
the old page-based API as wrapper functions around the new physical
address-based implementations.
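To illustrate the wrapper arrangement described above, here is a minimal userspace sketch in C. It is not kernel code: `struct model_page`, `model_page_to_phys()`, `model_map_phys()` and `model_map_page()` are hypothetical stand-ins, a 4 KiB page size is assumed, and the "mapping" is a plain identity translation. The only point is that the page-based entry point computes the physical address and defers to the phys-based one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; every name below is a hypothetical stand-in. */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

#define MODEL_PAGE_SHIFT 12 /* assumes 4 KiB pages */

struct model_page {
	uint64_t pfn; /* page frame number backing this page */
};

static phys_addr_t model_page_to_phys(const struct model_page *page)
{
	return (phys_addr_t)page->pfn << MODEL_PAGE_SHIFT;
}

/* Physical address-based entry point; models an identity (direct) mapping. */
static dma_addr_t model_map_phys(phys_addr_t phys, size_t size,
				 unsigned long attrs)
{
	(void)size;
	(void)attrs;
	return (dma_addr_t)phys;
}

/* Page-based entry point kept as a thin wrapper around the phys-based one. */
static dma_addr_t model_map_page(const struct model_page *page, size_t offset,
				 size_t size, unsigned long attrs)
{
	assert(page != NULL);
	return model_map_phys(model_page_to_phys(page) + offset, size, attrs);
}
```

Under these assumptions, a caller that still holds a page and an offset gets the same result as one that passes the physical address directly, which is what keeps the old exported symbols working.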
This series refactors the DMA mapping API to provide a phys_addr_t
based, and struct-page free, external API that can handle all the
mapping cases we want in modern systems:
- struct page based cacheable DRAM
- struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cacheable
MMIO
- struct page-less PCI peer to peer non-cacheable MMIO
- struct page-less "resource" MMIO
Overall this gets much closer to Matthew's long term wish for
struct-pageless IO to cacheable DRAM. The remaining primary work would
be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
phys_addr_t without a struct page.
The general design is to remove struct page usage entirely from the
DMA API inner layers. For flows that need to have a KVA for the
physical address they can use kmap_local_pfn() or phys_to_virt(). This
isolates the struct page requirements to MM code only. Long term all
removals of struct page usage are supporting Matthew's memdesc
project which seeks to substantially transform how struct page works.
Instead, the DMA API internals now work on phys_addr_t. Internally
there are still dedicated 'page' and 'resource' flows, except they are
now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
flows use the same phys_addr_t.
When DMA_ATTR_MMIO is specified things work similarly to the existing
'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
pfn_valid(), etc. are never called on the phys_addr_t. This requires
rejecting any configuration that would need swiotlb. CPU cache
flushing is not required, and is avoided, as DMA_ATTR_MMIO also indicates
that the address has no cacheable mappings. This effectively removes any
DMA API side requirement to have struct page when DMA_ATTR_MMIO is
used.
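The DMA_ATTR_MMIO rules above can be summarised as a small decision function. The following is a simplified userspace model, not the kernel implementation: `enum model_action`, `model_map_decide()` and `model_map_decide_example()` are hypothetical names, and "needs_bounce" stands in for whatever condition would force a swiotlb copy. Only the DMA_ATTR_MMIO bit value is taken from the series.

```c
#include <assert.h>
#include <stdbool.h>

#define DMA_ATTR_MMIO (1UL << 10) /* value as defined in patch 01 */

/* Hypothetical outcomes of a map request in this model. */
enum model_action {
	MODEL_MAP_PLAIN,  /* map as-is, no CPU-side work */
	MODEL_MAP_SYNC,   /* map after CPU cache maintenance */
	MODEL_MAP_BOUNCE, /* copy through a bounce buffer (swiotlb) */
	MODEL_MAP_REJECT, /* mapping must fail */
};

static enum model_action model_map_decide(unsigned long attrs,
					  bool needs_bounce, bool coherent)
{
	/* MMIO: never touched by the CPU, so no bounce and no cache flush. */
	if (attrs & DMA_ATTR_MMIO)
		return needs_bounce ? MODEL_MAP_REJECT : MODEL_MAP_PLAIN;

	if (needs_bounce)
		return MODEL_MAP_BOUNCE;

	return coherent ? MODEL_MAP_PLAIN : MODEL_MAP_SYNC;
}

/* Usage example: MMIO on a non-coherent device skips cache maintenance. */
static void model_map_decide_example(void)
{
	assert(model_map_decide(DMA_ATTR_MMIO, false, false) == MODEL_MAP_PLAIN);
	assert(model_map_decide(0, false, false) == MODEL_MAP_SYNC);
}
```

The model deliberately ignores details such as the cache maintenance that accompanies a bounce copy; it only captures which path is taken.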
In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
except that on the common path of no cache flush and no swiotlb it never
touches a struct page. When cache flushing or swiotlb copying is needed,
kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
usage. This was already the case on the unmap side; now the map side
is symmetric.
Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
path must also set it. This corrects some existing bugs where iommu
mappings for P2P MMIO were improperly marked IOMMU_CACHE.
Since DMA_ATTR_MMIO is made to work with all the existing DMA map entry
points, particularly dma_iova_link(), this finally allows a way to use
the new DMA API to map PCI P2P MMIO without creating struct page. The
VFIO DMABUF series demonstrates how this works. This is intended to
replace the incorrect driver use of dma_map_resource() on PCI BAR
addresses.
This series does the core code and modern flows. A followup series
will give the same treatment to the legacy dma_ops implementation.
Thanks
Leon Romanovsky (16):
dma-mapping: introduce new DMA attribute to indicate MMIO memory
iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
dma-debug: refactor to use physical addresses for page mapping
dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys()
dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
kmsan: convert kmsan_handle_dma to use physical addresses
dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs()
xen: swiotlb: Open code map_resource callback
dma-mapping: export new dma_*map_phys() interface
mm/hmm: migrate to physical address-based DMA mapping API
mm/hmm: properly take MMIO path
block-dma: migrate to dma_map_phys instead of map_page
block-dma: properly take MMIO path
nvme-pci: unmap MMIO pages with appropriate interface
Documentation/core-api/dma-api.rst | 4 +-
Documentation/core-api/dma-attributes.rst | 18 ++++
arch/powerpc/kernel/dma-iommu.c | 4 +-
block/blk-mq-dma.c | 15 ++-
drivers/iommu/dma-iommu.c | 61 ++++++------
drivers/nvme/host/pci.c | 18 +++-
drivers/virtio/virtio_ring.c | 4 +-
drivers/xen/swiotlb-xen.c | 21 +++-
include/linux/blk-mq-dma.h | 6 +-
include/linux/blk_types.h | 2 +
include/linux/dma-direct.h | 2 -
include/linux/dma-map-ops.h | 8 +-
include/linux/dma-mapping.h | 33 +++++++
include/linux/iommu-dma.h | 11 +--
include/linux/kmsan.h | 9 +-
include/linux/page-flags.h | 1 +
include/trace/events/dma.h | 9 +-
kernel/dma/debug.c | 82 ++++------------
kernel/dma/debug.h | 37 ++-----
kernel/dma/direct.c | 22 +----
kernel/dma/direct.h | 57 +++++++----
kernel/dma/mapping.c | 112 +++++++++++++---------
kernel/dma/ops_helpers.c | 6 +-
mm/hmm.c | 19 ++--
mm/kmsan/hooks.c | 10 +-
rust/kernel/dma.rs | 3 +
tools/virtio/linux/kmsan.h | 2 +-
27 files changed, 312 insertions(+), 264 deletions(-)
--
2.51.0
* [PATCH v6 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
that reside in memory-mapped I/O (MMIO) regions, such as device BARs
exposed through the host bridge, which are accessible for peer-to-peer
(P2P) DMA.
This attribute is especially useful for exporting device memory to other
devices for DMA without CPU involvement, and avoids unnecessary or
potentially detrimental CPU cache maintenance calls.
DMA_ATTR_MMIO is intended to provide dma_map_resource() functionality
without requiring callers to call a special function or to branch when
processing generic containers like bio_vec.
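As a rough illustration of the "no branching" point, the userspace sketch below walks a mixed list of segments and selects only the attribute per segment, while the mapping path stays the same. Everything prefixed with `model_` is hypothetical and exists only for this example; the DMA_ATTR_MMIO value is the one defined by this patch. The model counts how many segments would need CPU cache maintenance on a non-coherent device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define DMA_ATTR_MMIO (1UL << 10) /* value as defined in this patch */

/* Hypothetical generic container element (loosely modelled on a bio_vec). */
struct model_seg {
	phys_addr_t phys;
	size_t len;
	bool is_mmio; /* e.g. determined up front via the p2pdma APIs */
};

/* One mapping path for all segments; only the attrs differ. */
static bool model_map_needs_cache_sync(unsigned long attrs, bool coherent)
{
	return !coherent && !(attrs & DMA_ATTR_MMIO);
}

/* Returns the number of segments that required CPU cache maintenance. */
static size_t model_map_segs(const struct model_seg *segs, size_t n,
			     bool coherent)
{
	size_t synced = 0;

	assert(segs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned long attrs = segs[i].is_mmio ? DMA_ATTR_MMIO : 0;

		if (model_map_needs_cache_sync(attrs, coherent))
			synced++;
	}
	return synced;
}
```

The caller never chooses between two mapping functions; it only decides whether to set the attribute for each segment.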
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Documentation/core-api/dma-attributes.rst | 18 ++++++++++++++++++
include/linux/dma-mapping.h | 20 ++++++++++++++++++++
include/trace/events/dma.h | 3 ++-
rust/kernel/dma.rs | 3 +++
4 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e926..0bdc2be65e575 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,21 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_MMIO
+-------------
+
+This attribute indicates the physical address is not normal system
+memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+functions, it may not be cacheable, and access using CPU load/store
+instructions may not be allowed.
+
+Usually this will be used to describe MMIO addresses, or other non-cacheable
+register addresses. When DMA mapping this sort of address we call
+the operation Peer to Peer as one device is DMA'ing to another device.
+For PCI devices the p2pdma APIs must be used to determine if
+DMA_ATTR_MMIO is appropriate.
+
+For architectures that require cache flushing for DMA coherence
+DMA_ATTR_MMIO will not perform any cache flushing. The address
+provided must never be mapped cacheable into the CPU.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb3..4254fd9bdf5dd 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -58,6 +58,26 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
+/*
+ * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
+ *
+ * This attribute indicates the physical address is not normal system
+ * memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+ * functions, it may not be cacheable, and access using CPU load/store
+ * instructions may not be allowed.
+ *
+ * Usually this will be used to describe MMIO addresses, or other non-cacheable
+ * register addresses. When DMA mapping this sort of address we call
+ * the operation Peer to Peer as one device is DMA'ing to another device.
+ * For PCI devices the p2pdma APIs must be used to determine if DMA_ATTR_MMIO
+ * is appropriate.
+ *
+ * For architectures that require cache flushing for DMA coherence
+ * DMA_ATTR_MMIO will not perform any cache flushing. The address
+ * provided must never be mapped cacheable into the CPU.
+ */
+#define DMA_ATTR_MMIO (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c8..ee90d6f1dcf35 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -31,7 +31,8 @@ TRACE_DEFINE_ENUM(DMA_NONE);
{ DMA_ATTR_FORCE_CONTIGUOUS, "FORCE_CONTIGUOUS" }, \
{ DMA_ATTR_ALLOC_SINGLE_PAGES, "ALLOC_SINGLE_PAGES" }, \
{ DMA_ATTR_NO_WARN, "NO_WARN" }, \
- { DMA_ATTR_PRIVILEGED, "PRIVILEGED" })
+ { DMA_ATTR_PRIVILEGED, "PRIVILEGED" }, \
+ { DMA_ATTR_MMIO, "MMIO" })
DECLARE_EVENT_CLASS(dma_map,
TP_PROTO(struct device *dev, phys_addr_t phys_addr, dma_addr_t dma_addr,
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index 2bc8ab51ec280..61d9eed7a786e 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -242,6 +242,9 @@ pub mod attrs {
/// Indicates that the buffer is fully accessible at an elevated privilege level (and
/// ideally inaccessible or at least read-only at lesser-privileged levels).
pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
+
+ /// Indicates that the buffer is MMIO memory.
+ pub const DMA_ATTR_MMIO: Attrs = Attrs(bindings::DMA_ATTR_MMIO);
}
/// An abstraction of the `dma_alloc_coherent` API.
--
2.51.0
* [PATCH v6 02/16] iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
This will replace the hacky use of DMA_ATTR_SKIP_CPU_SYNC to avoid
touching the possibly non-KVA MMIO memory.
Also correct the caching attribute used for the IOMMU: MMIO
memory should not be cacheable inside the IOMMU mapping, or it can
possibly create system problems. Set IOMMU_MMIO for DMA_ATTR_MMIO.
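The protection-flag selection described above can be modelled in a few lines of standalone C. This is a sketch under stated assumptions: the `MODEL_IOMMU_*` bit values are made up for the example (the real IOMMU_* flags are defined by the IOMMU core), the direction handling of the real dma_info_to_prot() is left out, and `model_info_to_prot()` is a hypothetical name. The two DMA_ATTR_* values are the ones shown in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Attribute bits as shown in include/linux/dma-mapping.h in this series. */
#define DMA_ATTR_PRIVILEGED (1UL << 9)
#define DMA_ATTR_MMIO       (1UL << 10)

/* Hypothetical bit values, for this model only. */
#define MODEL_IOMMU_CACHE (1 << 0)
#define MODEL_IOMMU_MMIO  (1 << 1)
#define MODEL_IOMMU_PRIV  (1 << 2)

/* Model of the cacheability part of the prot computation. */
static int model_info_to_prot(bool coherent, unsigned long attrs)
{
	int prot;

	if (attrs & DMA_ATTR_MMIO)
		prot = MODEL_IOMMU_MMIO; /* never cacheable, even if coherent */
	else
		prot = coherent ? MODEL_IOMMU_CACHE : 0;

	if (attrs & DMA_ATTR_PRIVILEGED)
		prot |= MODEL_IOMMU_PRIV;

	return prot;
}

/* Usage example: a coherent device mapping MMIO must not get the cache bit. */
static void model_info_to_prot_example(void)
{
	int prot = model_info_to_prot(true, DMA_ATTR_MMIO);

	assert(prot & MODEL_IOMMU_MMIO);
	assert(!(prot & MODEL_IOMMU_CACHE));
}
```

In this model the MMIO and cache bits are mutually exclusive by construction, which is the property the commit message is after.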
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/iommu/dma-iommu.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fef..e1185ba73e23a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -724,7 +724,12 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, struct device *dev
static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
unsigned long attrs)
{
- int prot = coherent ? IOMMU_CACHE : 0;
+ int prot;
+
+ if (attrs & DMA_ATTR_MMIO)
+ prot = IOMMU_MMIO;
+ else
+ prot = coherent ? IOMMU_CACHE : 0;
if (attrs & DMA_ATTR_PRIVILEGED)
prot |= IOMMU_PRIV;
@@ -1838,12 +1843,13 @@ static int __dma_iova_link(struct device *dev, dma_addr_t addr,
unsigned long attrs)
{
bool coherent = dev_is_dma_coherent(dev);
+ int prot = dma_info_to_prot(dir, coherent, attrs);
- if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ if (!coherent && !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
arch_sync_dma_for_device(phys, size, dir);
return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
- dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC);
+ prot, GFP_ATOMIC);
}
static int iommu_dma_iova_bounce_and_link(struct device *dev, dma_addr_t addr,
@@ -1949,9 +1955,13 @@ int dma_iova_link(struct device *dev, struct dma_iova_state *state,
return -EIO;
if (dev_use_swiotlb(dev, size, dir) &&
- iova_unaligned(iovad, phys, size))
+ iova_unaligned(iovad, phys, size)) {
+ if (attrs & DMA_ATTR_MMIO)
+ return -EPERM;
+
return iommu_dma_iova_link_swiotlb(dev, state, phys, offset,
size, dir, attrs);
+ }
return __dma_iova_link(dev, state->addr + offset - iova_start_pad,
phys - iova_start_pad,
--
2.51.0
* [PATCH v6 03/16] dma-debug: refactor to use physical addresses for page mapping
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
Convert the DMA debug infrastructure from page-based to physical address-based
mapping, in preparation for DMA mapping routines that rely on physical addresses.
The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy type is introduced to distinguish physical address mappings
from other debug entry types. All callers throughout the codebase are updated
to pass physical addresses directly.
This eliminates the need to convert between page pointers and
physical addresses in the debug layer, making the code more efficient and
consistent with the DMA mapping API's physical address focus.
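One consequence of dropping the (page, offset) pair is that the vmapped-stack check has to recover both pieces from the physical address: the page frame number by shifting and the in-page offset by taking the remainder. The standalone sketch below models that arithmetic in userspace. `model_stack_offset()` and the `MODEL_PAGE_*` constants are hypothetical names, a 4 KiB page size is assumed, and the stack is represented simply as an array of page frame numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define MODEL_PAGE_SHIFT 12 /* assumes 4 KiB pages */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/*
 * Returns the byte offset of 'phys' within a virtually contiguous stack
 * backed by the given page frames, or -1 if 'phys' is not in any of them.
 */
static long model_stack_offset(phys_addr_t phys, const uint64_t *stack_pfns,
			       size_t nr_pages)
{
	assert(stack_pfns != NULL || nr_pages == 0);

	for (size_t i = 0; i < nr_pages; i++) {
		/* Compare page frame numbers instead of struct page pointers. */
		if ((phys >> MODEL_PAGE_SHIFT) != stack_pfns[i])
			continue;

		/* The in-page offset is the low bits of the physical address. */
		return (long)(i * MODEL_PAGE_SIZE + (phys % MODEL_PAGE_SIZE));
	}
	return -1;
}
```

The backing pages need not be physically contiguous, which is why the position in the array, and not the physical address, determines the page-granular part of the result.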
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Documentation/core-api/dma-api.rst | 4 +--
include/linux/page-flags.h | 1 +
kernel/dma/debug.c | 39 +++++++++++++++---------------
kernel/dma/debug.h | 16 ++++++------
kernel/dma/mapping.c | 10 ++++----
5 files changed, 35 insertions(+), 35 deletions(-)
diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 3087bea715ed2..ca75b35416792 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -761,7 +761,7 @@ example warning message may look like this::
[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
[<ffffffff803c7ea3>] check_unmap+0x203/0x490
- [<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+ [<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@@ -855,7 +855,7 @@ that a driver may be leaking mappings.
dma-debug interface debug_dma_mapping_error() to debug drivers that fail
to check DMA mapping errors on addresses returned by dma_map_single() and
dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
this flag is still set, prints warning message that includes call trace that
leads up to the unmap. This interface can be called from dma_mapping_error()
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8d3fa3a91ce47..dfbc4ba86bba2 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -614,6 +614,7 @@ FOLIO_FLAG(dropbehind, FOLIO_HEAD_PAGE)
* available at this point.
*/
#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
+#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
#define folio_test_highmem(__f) is_highmem_idx(folio_zonenum(__f))
#else
PAGEFLAG_FALSE(HighMem, highmem)
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index b82399437db03..b275db9ca6a03 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -40,6 +40,7 @@ enum {
dma_debug_coherent,
dma_debug_resource,
dma_debug_noncoherent,
+ dma_debug_phy,
};
enum map_err_types {
@@ -143,6 +144,7 @@ static const char *type2name[] = {
[dma_debug_coherent] = "coherent",
[dma_debug_resource] = "resource",
[dma_debug_noncoherent] = "noncoherent",
+ [dma_debug_phy] = "phy",
};
static const char *dir2name[] = {
@@ -1054,17 +1056,16 @@ static void check_unmap(struct dma_debug_entry *ref)
dma_entry_free(entry);
}
-static void check_for_stack(struct device *dev,
- struct page *page, size_t offset)
+static void check_for_stack(struct device *dev, phys_addr_t phys)
{
void *addr;
struct vm_struct *stack_vm_area = task_stack_vm_area(current);
if (!stack_vm_area) {
/* Stack is direct-mapped. */
- if (PageHighMem(page))
+ if (PhysHighMem(phys))
return;
- addr = page_address(page) + offset;
+ addr = phys_to_virt(phys);
if (object_is_on_stack(addr))
err_printk(dev, NULL, "device driver maps memory from stack [addr=%p]\n", addr);
} else {
@@ -1072,10 +1073,12 @@ static void check_for_stack(struct device *dev,
int i;
for (i = 0; i < stack_vm_area->nr_pages; i++) {
- if (page != stack_vm_area->pages[i])
+ if (__phys_to_pfn(phys) !=
+ page_to_pfn(stack_vm_area->pages[i]))
continue;
- addr = (u8 *)current->stack + i * PAGE_SIZE + offset;
+ addr = (u8 *)current->stack + i * PAGE_SIZE +
+ (phys % PAGE_SIZE);
err_printk(dev, NULL, "device driver maps memory from stack [probable addr=%p]\n", addr);
break;
}
@@ -1204,9 +1207,8 @@ void debug_dma_map_single(struct device *dev, const void *addr,
}
EXPORT_SYMBOL(debug_dma_map_single);
-void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
- size_t size, int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ int direction, dma_addr_t dma_addr, unsigned long attrs)
{
struct dma_debug_entry *entry;
@@ -1221,19 +1223,18 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
return;
entry->dev = dev;
- entry->type = dma_debug_single;
- entry->paddr = page_to_phys(page) + offset;
+ entry->type = dma_debug_phy;
+ entry->paddr = phys;
entry->dev_addr = dma_addr;
entry->size = size;
entry->direction = direction;
entry->map_err_type = MAP_ERR_NOT_CHECKED;
- check_for_stack(dev, page, offset);
+ if (!(attrs & DMA_ATTR_MMIO)) {
+ check_for_stack(dev, phys);
- if (!PageHighMem(page)) {
- void *addr = page_address(page) + offset;
-
- check_for_illegal_area(dev, addr, size);
+ if (!PhysHighMem(phys))
+ check_for_illegal_area(dev, phys_to_virt(phys), size);
}
add_dma_entry(entry, attrs);
@@ -1277,11 +1278,11 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
}
EXPORT_SYMBOL(debug_dma_mapping_error);
-void debug_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+void debug_dma_unmap_phys(struct device *dev, dma_addr_t dma_addr,
size_t size, int direction)
{
struct dma_debug_entry ref = {
- .type = dma_debug_single,
+ .type = dma_debug_phy,
.dev = dev,
.dev_addr = dma_addr,
.size = size,
@@ -1305,7 +1306,7 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
return;
for_each_sg(sg, s, nents, i) {
- check_for_stack(dev, sg_page(s), s->offset);
+ check_for_stack(dev, sg_phys(s));
if (!PageHighMem(sg_page(s)))
check_for_illegal_area(dev, sg_virt(s), s->length);
}
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index 48757ca13f314..bedae973e725d 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -9,12 +9,11 @@
#define _KERNEL_DMA_DEBUG_H
#ifdef CONFIG_DMA_API_DEBUG
-extern void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
+extern void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction, dma_addr_t dma_addr,
unsigned long attrs);
-extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+extern void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction);
extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
@@ -62,14 +61,13 @@ extern void debug_dma_free_pages(struct device *dev, struct page *page,
size_t size, int direction,
dma_addr_t dma_addr);
#else /* CONFIG_DMA_API_DEBUG */
-static inline void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+static inline void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction,
+ dma_addr_t dma_addr, unsigned long attrs)
{
}
-static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction)
{
}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 56de28a3b1799..0b7e16c69bf18 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -157,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
+ phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t addr;
BUG_ON(!valid_dma_direction(dir));
@@ -165,16 +166,15 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;
if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
+ arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, page_to_phys(page) + offset, addr, size, dir,
- attrs);
- debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
+ trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
return addr;
}
@@ -194,7 +194,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_page(dev, addr, size, dir, attrs);
- debug_dma_unmap_page(dev, addr, size, dir);
+ debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);
--
2.51.0
* [PATCH v6 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
In preparation for the following map_page -> map_phys API conversion,
rename trace_dma_*map_page() to trace_dma_*map_phys().
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
include/trace/events/dma.h | 4 ++--
kernel/dma/mapping.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index ee90d6f1dcf35..84416c7d6bfaa 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -72,7 +72,7 @@ DEFINE_EVENT(dma_map, name, \
size_t size, enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))
-DEFINE_MAP_EVENT(dma_map_page);
+DEFINE_MAP_EVENT(dma_map_phys);
DEFINE_MAP_EVENT(dma_map_resource);
DECLARE_EVENT_CLASS(dma_unmap,
@@ -110,7 +110,7 @@ DEFINE_EVENT(dma_unmap, name, \
enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, addr, size, dir, attrs))
-DEFINE_UNMAP_EVENT(dma_unmap_page);
+DEFINE_UNMAP_EVENT(dma_unmap_phys);
DEFINE_UNMAP_EVENT(dma_unmap_resource);
DECLARE_EVENT_CLASS(dma_alloc_class,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0b7e16c69bf18..bd3bb6d59d722 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -173,7 +173,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
return addr;
@@ -193,7 +193,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
iommu_dma_unmap_page(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
- trace_dma_unmap_page(dev, addr, size, dir, attrs);
+ trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);
--
2.51.0
* [PATCH v6 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.
The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.
All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.
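As a rough userspace illustration of the calling-convention change described above (this is not kernel code; every toy_* name and the 4 KiB page size are assumptions made up for the sketch), the old convention has the callee compute page_to_phys() + offset, while the new one expects the caller to hand over the physical address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4 KiB pages */

typedef uint64_t toy_phys_addr_t;

/* Toy stand-in for struct page: it only remembers its page frame number. */
struct toy_page {
	uint64_t pfn;
};

/* Models page_to_phys(): page frame number turned into a byte address. */
static toy_phys_addr_t toy_page_to_phys(const struct toy_page *page)
{
	assert(page != NULL);
	return (toy_phys_addr_t)page->pfn << TOY_PAGE_SHIFT;
}

/* Old convention: the callee derives the physical address itself. */
static toy_phys_addr_t toy_map_page(const struct toy_page *page,
				    unsigned long offset)
{
	toy_phys_addr_t phys = toy_page_to_phys(page) + offset;

	return phys;	/* identity "mapping", enough for the comparison */
}

/* New convention: the caller already passes the physical address. */
static toy_phys_addr_t toy_map_phys(toy_phys_addr_t phys)
{
	return phys;
}
```

A caller that used to write toy_map_page(&p, off) would now write toy_map_phys(toy_page_to_phys(&p) + off), which is the same shape as the sg_phys() and page_to_phys() conversions in the diff below.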
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/iommu/dma-iommu.c | 14 ++++++--------
include/linux/iommu-dma.h | 7 +++----
kernel/dma/mapping.c | 4 ++--
kernel/dma/ops_helpers.c | 6 +++---
4 files changed, 14 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e1185ba73e23a..aea119f32f965 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1195,11 +1195,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
return iova_offset(iovad, phys | size);
}
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
int prot = dma_info_to_prot(dir, coherent, attrs);
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1227,7 +1225,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
return iova;
}
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1346,7 +1344,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
int i;
for_each_sg(sg, s, nents, i)
- iommu_dma_unmap_page(dev, sg_dma_address(s),
+ iommu_dma_unmap_phys(dev, sg_dma_address(s),
sg_dma_len(s), dir, attrs);
}
@@ -1359,8 +1357,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
sg_dma_mark_swiotlb(sg);
for_each_sg(sg, s, nents, i) {
- sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
- s->offset, s->length, dir, attrs);
+ sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s),
+ s->length, dir, attrs);
if (sg_dma_address(s) == DMA_MAPPING_ERROR)
goto out_unmap;
sg_dma_len(s) = s->length;
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 508beaa44c39e..485bdffed9888 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -21,10 +21,9 @@ static inline bool use_dma_iommu(struct device *dev)
}
#endif /* CONFIG_IOMMU_DMA */
-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs);
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs);
int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index bd3bb6d59d722..90ad728205b93 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
- addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
+ addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
@@ -190,7 +190,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
arch_dma_unmap_page_direct(dev, addr + size))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, addr, size, dir, attrs);
+ iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 9afd569eadb96..6f9d604d9d406 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -72,8 +72,8 @@ struct page *dma_common_alloc_pages(struct device *dev, size_t size,
return NULL;
if (use_dma_iommu(dev))
- *dma_handle = iommu_dma_map_page(dev, page, 0, size, dir,
- DMA_ATTR_SKIP_CPU_SYNC);
+ *dma_handle = iommu_dma_map_phys(dev, page_to_phys(page), size,
+ dir, DMA_ATTR_SKIP_CPU_SYNC);
else
*dma_handle = ops->map_page(dev, page, 0, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
@@ -92,7 +92,7 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
const struct dma_map_ops *ops = get_dma_ops(dev);
if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, dma_handle, size, dir,
+ iommu_dma_unmap_phys(dev, dma_handle, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
else if (ops->unmap_page)
ops->unmap_page(dev, dma_handle, size, dir,
--
2.51.0
* [PATCH v6 06/16] iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys()
From: Leon Romanovsky @ 2025-09-09 13:27 UTC
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
Make iommu_dma_map_phys() and iommu_dma_unmap_phys() respect
DMA_ATTR_MMIO.
DMA_ATTR_MMIO makes the functions behave the same as
iommu_dma_(un)map_resource():
- No swiotlb is possible
- No cache flushing is done (DMA_ATTR_MMIO memory should not be cached)
- prot for iommu_map() has IOMMU_MMIO not IOMMU_CACHE
This is preparation for replacing iommu_dma_map_resource() callers
with iommu_dma_map_phys(DMA_ATTR_MMIO) and removing
iommu_dma_(un)map_resource().
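The three rules listed above can be sketched as small predicates. This is a userspace model only: the TOY_* flags and toy_* helpers are hypothetical stand-ins for the DMA attributes and IOMMU prot bits, and the swiotlb condition (dev_use_swiotlb() plus the alignment check) is collapsed into one needs_bounce boolean.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ATTR_SKIP_CPU_SYNC	(1UL << 0)
#define TOY_ATTR_MMIO		(1UL << 1)

#define TOY_PROT_CACHE		(1 << 0)
#define TOY_PROT_MMIO		(1 << 1)

enum toy_plan { TOY_MAP_DIRECTLY, TOY_MAP_VIA_BOUNCE, TOY_MAP_FAIL };

/* Rule 1: bounce buffering is refused for MMIO instead of being attempted. */
static enum toy_plan toy_plan_mapping(bool needs_bounce, unsigned long attrs)
{
	if (!needs_bounce)
		return TOY_MAP_DIRECTLY;
	return (attrs & TOY_ATTR_MMIO) ? TOY_MAP_FAIL : TOY_MAP_VIA_BOUNCE;
}

/* Rule 2: MMIO skips the cache sync, just like SKIP_CPU_SYNC does. */
static bool toy_needs_cache_sync(bool coherent, unsigned long attrs)
{
	return !coherent &&
	       !(attrs & (TOY_ATTR_SKIP_CPU_SYNC | TOY_ATTR_MMIO));
}

/* Rule 3: MMIO mappings get the MMIO prot bit rather than the cache bit. */
static int toy_prot(bool coherent, unsigned long attrs)
{
	if (attrs & TOY_ATTR_MMIO)
		return TOY_PROT_MMIO;
	return coherent ? TOY_PROT_CACHE : 0;
}

/* Spot checks for the three rules, usable as a minimal usage example. */
static void toy_check_mmio_rules(void)
{
	assert(toy_plan_mapping(true, TOY_ATTR_MMIO) == TOY_MAP_FAIL);
	assert(!toy_needs_cache_sync(false, TOY_ATTR_MMIO));
	assert(toy_prot(true, TOY_ATTR_MMIO) == TOY_PROT_MMIO);
}
```

The model deliberately ignores everything else the real function does (IOVA allocation, error unwinding); it only mirrors the attribute checks added by this patch.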
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/iommu/dma-iommu.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index aea119f32f965..6804aaf034a16 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1211,16 +1211,19 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
*/
if (dev_use_swiotlb(dev, size, dir) &&
iova_unaligned(iovad, phys, size)) {
+ if (attrs & DMA_ATTR_MMIO)
+ return DMA_MAPPING_ERROR;
+
phys = iommu_dma_map_swiotlb(dev, phys, size, dir, attrs);
if (phys == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
}
- if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ if (!coherent && !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
arch_sync_dma_for_device(phys, size, dir);
iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
- if (iova == DMA_MAPPING_ERROR)
+ if (iova == DMA_MAPPING_ERROR && !(attrs & DMA_ATTR_MMIO))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
return iova;
}
@@ -1228,10 +1231,14 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- struct iommu_domain *domain = iommu_get_dma_domain(dev);
phys_addr_t phys;
- phys = iommu_iova_to_phys(domain, dma_handle);
+ if (attrs & DMA_ATTR_MMIO) {
+ __iommu_dma_unmap(dev, dma_handle, size);
+ return;
+ }
+
+ phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
if (WARN_ON(!phys))
return;
--
2.51.0
* [PATCH v6 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
From: Leon Romanovsky @ 2025-09-09 13:27 UTC
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.
The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).
Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().
The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource(), and
dma_direct_map_phys() is extended to support the MMIO path as well.
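The resulting control flow of the direct mapping path can be modelled in userspace roughly as follows. Everything named toy_* or TOY_* is hypothetical: the device is reduced to a mask, a fixed translation offset and two swiotlb switches, the bounce buffer is a single fixed address, and the dma_kmalloc_needs_bounce() and cache-sync steps are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ATTR_MMIO		(1UL << 0)
#define TOY_MAPPING_ERROR	UINT64_MAX

struct toy_dev {
	uint64_t dma_mask;	/* highest address the device can reach */
	uint64_t dma_offset;	/* models the phys_to_dma() translation */
	bool force_bounce;	/* models is_swiotlb_force_bounce() */
	bool swiotlb_active;	/* models is_swiotlb_active() */
	uint64_t bounce_addr;	/* where a bounced buffer would land */
};

/* Models dma_capable(): the whole range must fit under the mask. */
static bool toy_capable(const struct toy_dev *dev, uint64_t addr, size_t size)
{
	assert(size > 0);
	return addr + size - 1 <= dev->dma_mask;
}

static uint64_t toy_direct_map_phys(const struct toy_dev *dev, uint64_t phys,
				    size_t size, unsigned long attrs)
{
	uint64_t dma_addr;

	/* Forced bouncing cannot work for MMIO, so it is an error there. */
	if (dev->force_bounce)
		return (attrs & TOY_ATTR_MMIO) ? TOY_MAPPING_ERROR
					       : dev->bounce_addr;

	if (attrs & TOY_ATTR_MMIO) {
		/* MMIO: no translation and no swiotlb fallback. */
		dma_addr = phys;
		return toy_capable(dev, dma_addr, size) ? dma_addr
							: TOY_MAPPING_ERROR;
	}

	/* Regular memory: translate, then fall back to bouncing if needed. */
	dma_addr = phys + dev->dma_offset;
	if (!toy_capable(dev, dma_addr, size))
		return dev->swiotlb_active ? dev->bounce_addr
					   : TOY_MAPPING_ERROR;
	return dma_addr;
}
```

The point of the sketch is the asymmetry: an unreachable RAM address may still be rescued by the bounce buffer, whereas an unreachable MMIO address always fails.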
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
arch/powerpc/kernel/dma-iommu.c | 4 +--
include/linux/dma-map-ops.h | 8 ++---
kernel/dma/direct.c | 6 ++--
kernel/dma/direct.h | 57 +++++++++++++++++++++------------
kernel/dma/mapping.c | 8 ++---
5 files changed, 49 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 4d64a5db50f38..0359ab72cd3ba 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,7 +14,7 @@
#define can_map_direct(dev, addr) \
((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr)
{
if (likely(!dev->bus_dma_limit))
return false;
@@ -24,7 +24,7 @@ bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle)
{
if (likely(!dev->bus_dma_limit))
return false;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f48e5fb88bd5d..71f5b30254159 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -392,15 +392,15 @@ void *arch_dma_set_uncached(void *addr, size_t size);
void arch_dma_clear_uncached(void *addr, size_t size);
#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle);
bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
#else
-#define arch_dma_map_page_direct(d, a) (false)
-#define arch_dma_unmap_page_direct(d, a) (false)
+#define arch_dma_map_phys_direct(d, a) (false)
+#define arch_dma_unmap_phys_direct(d, a) (false)
#define arch_dma_map_sg_direct(d, s, n) (false)
#define arch_dma_unmap_sg_direct(d, s, n) (false)
#endif
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 24c359d9c8799..fa75e30700730 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -453,7 +453,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
if (sg_dma_is_bus_address(sg))
sg_dma_unmark_bus_address(sg);
else
- dma_direct_unmap_page(dev, sg->dma_address,
+ dma_direct_unmap_phys(dev, sg->dma_address,
sg_dma_len(sg), dir, attrs);
}
}
@@ -476,8 +476,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
*/
break;
case PCI_P2PDMA_MAP_NONE:
- sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
- sg->offset, sg->length, dir, attrs);
+ sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+ sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR) {
ret = -EIO;
goto out_unmap;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index d2c0b7e632fc0..da2fadf45bcd6 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -80,42 +80,57 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
arch_dma_mark_clean(paddr, size);
}
-static inline dma_addr_t dma_direct_map_page(struct device *dev,
- struct page *page, unsigned long offset, size_t size,
- enum dma_data_direction dir, unsigned long attrs)
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+ phys_addr_t phys, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
- dma_addr_t dma_addr = phys_to_dma(dev, phys);
+ dma_addr_t dma_addr;
if (is_swiotlb_force_bounce(dev)) {
- if (is_pci_p2pdma_page(page))
- return DMA_MAPPING_ERROR;
+ if (attrs & DMA_ATTR_MMIO)
+ goto err_overflow;
+
return swiotlb_map(dev, phys, size, dir, attrs);
}
- if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
- dma_kmalloc_needs_bounce(dev, size, dir)) {
- if (is_pci_p2pdma_page(page))
- return DMA_MAPPING_ERROR;
- if (is_swiotlb_active(dev))
- return swiotlb_map(dev, phys, size, dir, attrs);
-
- dev_WARN_ONCE(dev, 1,
- "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
- &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
- return DMA_MAPPING_ERROR;
+ if (attrs & DMA_ATTR_MMIO) {
+ dma_addr = phys;
+ if (unlikely(!dma_capable(dev, dma_addr, size, false)))
+ goto err_overflow;
+ } else {
+ dma_addr = phys_to_dma(dev, phys);
+ if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
+ dma_kmalloc_needs_bounce(dev, size, dir)) {
+ if (is_swiotlb_active(dev))
+ return swiotlb_map(dev, phys, size, dir, attrs);
+
+ goto err_overflow;
+ }
}
- if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ if (!dev_is_dma_coherent(dev) &&
+ !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
arch_sync_dma_for_device(phys, size, dir);
return dma_addr;
+
+err_overflow:
+ dev_WARN_ONCE(
+ dev, 1,
+ "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+ &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+ return DMA_MAPPING_ERROR;
}
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- phys_addr_t phys = dma_to_phys(dev, addr);
+ phys_addr_t phys;
+
+ if (attrs & DMA_ATTR_MMIO)
+ /* nothing to do: uncached and no swiotlb */
+ return;
+ phys = dma_to_phys(dev, addr);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 90ad728205b93..3ac7d15e095f9 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -166,8 +166,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;
if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, phys + size))
- addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+ arch_dma_map_phys_direct(dev, phys + size))
+ addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
@@ -187,8 +187,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops) ||
- arch_dma_unmap_page_direct(dev, addr + size))
- dma_direct_unmap_page(dev, addr, size, dir, attrs);
+ arch_dma_unmap_phys_direct(dev, addr + size))
+ dma_direct_unmap_phys(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
--
2.51.0
* [PATCH v6 08/16] kmsan: convert kmsan_handle_dma to use physical addresses
From: Leon Romanovsky @ 2025-09-09 13:27 UTC
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
Convert the KMSAN DMA handling function from page-based to physical
address-based interface.
The refactoring changes the kmsan_handle_dma() parameters from
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). The existing semantics, where callers are expected to
provide only kmap memory, are preserved.
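To illustrate what a physical-address-based interface has to do with a buffer that spans several pages, here is a userspace sketch of splitting a (phys, size) range into per-page chunks. It is only a model: the toy_* names and the 4 KiB page size are assumptions, and the loop reflects my reading of the "process adjacent pages separately" comment in mm/kmsan/hooks.c rather than code shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumed page size */

struct toy_chunk {
	uint64_t phys;	/* start of the chunk */
	uint64_t len;	/* bytes in the chunk, never crossing a page */
};

/*
 * Split [phys, phys + size) at page boundaries. Returns the number of
 * chunks written to out[], at most max_chunks.
 */
static size_t toy_split_by_page(uint64_t phys, uint64_t size,
				struct toy_chunk *out, size_t max_chunks)
{
	size_t n = 0;

	assert(out != NULL);
	while (size > 0 && n < max_chunks) {
		uint64_t page_offset = phys % TOY_PAGE_SIZE;
		uint64_t to_go = TOY_PAGE_SIZE - page_offset;

		if (to_go > size)
			to_go = size;
		out[n].phys = phys;
		out[n].len = to_go;
		n++;
		phys += to_go;
		size -= to_go;
	}
	return n;
}
```

Note that the first chunk starts at phys itself, so the offset within the first page is carried by the physical address and no separate offset argument is needed.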
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/virtio/virtio_ring.c | 4 ++--
include/linux/kmsan.h | 9 ++++-----
kernel/dma/mapping.c | 3 ++-
mm/kmsan/hooks.c | 10 ++++++----
tools/virtio/linux/kmsan.h | 2 +-
5 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f5062061c4084..c147145a65930 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -378,7 +378,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
* is initialized by the hardware. Explicitly check/unpoison it
* depending on the direction.
*/
- kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
+ kmsan_handle_dma(sg_phys(sg), sg->length, direction);
*addr = (dma_addr_t)sg_phys(sg);
return 0;
}
@@ -3157,7 +3157,7 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
struct vring_virtqueue *vq = to_vvq(_vq);
if (!vq->use_dma_api) {
- kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
+ kmsan_handle_dma(virt_to_phys(ptr), size, dir);
return (dma_addr_t)virt_to_phys(ptr);
}
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 2b1432cc16d59..f2fd221107bba 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -182,8 +182,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
/**
* kmsan_handle_dma() - Handle a DMA data transfer.
- * @page: first page of the buffer.
- * @offset: offset of the buffer within the first page.
+ * @phys: physical address of the buffer.
* @size: buffer size.
* @dir: one of possible dma_data_direction values.
*
@@ -192,7 +191,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
* * initializes the buffer, if it is copied from device;
* * does both, if this is a DMA_BIDIRECTIONAL transfer.
*/
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir);
/**
@@ -372,8 +371,8 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
{
}
-static inline void kmsan_handle_dma(struct page *page, size_t offset,
- size_t size, enum dma_data_direction dir)
+static inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
+ enum dma_data_direction dir)
{
}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 3ac7d15e095f9..e47bcf7cc43d7 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -172,7 +172,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
- kmsan_handle_dma(page, offset, size, dir);
+
+ kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f07..fa9475e5ec4e9 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -336,14 +336,16 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,
}
/* Helper function to handle DMA data transfers. */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
- u64 page_offset, to_go, addr;
+ struct page *page = phys_to_page(phys);
+ u64 page_offset, to_go;
+ void *addr;
- if (PageHighMem(page))
+ if (PhysHighMem(phys))
return;
- addr = (u64)page_address(page) + offset;
+ addr = page_to_virt(page);
/*
* The kernel may occasionally give us adjacent DMA pages not belonging
* to the same allocation. Process them separately to avoid triggering
diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h
index 272b5aa285d5a..6cd2e3efd03dc 100644
--- a/tools/virtio/linux/kmsan.h
+++ b/tools/virtio/linux/kmsan.h
@@ -4,7 +4,7 @@
#include <linux/gfp.h>
-inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
}
--
2.51.0
* [PATCH v6 09/16] dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs()
From: Leon Romanovsky @ 2025-09-09 13:27 UTC
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
Make dma_map_page_attrs() and dma_unmap_page_attrs() respect
DMA_ATTR_MMIO.
DMA_ATTR_MMIO makes the functions behave the same as
dma_(un)map_resource():
- No swiotlb is possible
- Legacy dma_ops arches use ops->map_resource()
- No kmsan
- No arch_dma_map_phys_direct()
The prior patches have made the internal functions called here
support DMA_ATTR_MMIO.
This is also preparation for turning dma_map_resource() into an inline
wrapper that calls dma_map_phys(DMA_ATTR_MMIO) to consolidate the flows.
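The dispatch rules described above can be condensed into a small decision function. This is a userspace sketch under stated assumptions: struct toy_cfg and the toy_* names are hypothetical, and each boolean stands in for the kernel predicate named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_path {
	TOY_PATH_DIRECT,	/* dma_direct_map_phys() */
	TOY_PATH_IOMMU,		/* iommu_dma_map_phys() */
	TOY_PATH_OPS_RESOURCE,	/* ops->map_resource() */
	TOY_PATH_OPS_PAGE,	/* ops->map_page() */
	TOY_PATH_ERROR,		/* DMA_MAPPING_ERROR */
};

struct toy_cfg {
	bool map_direct;	/* models dma_map_direct(dev, ops) */
	bool arch_direct;	/* models arch_dma_map_phys_direct() */
	bool use_iommu;		/* models use_dma_iommu(dev) */
	bool has_map_resource;	/* models ops->map_resource != NULL */
};

/* Pick the backend in the same order as the patched dma_map_page_attrs(). */
static enum toy_path toy_pick_path(const struct toy_cfg *cfg, bool is_mmio)
{
	assert(cfg != NULL);
	/* The arch direct shortcut is never taken for MMIO. */
	if (cfg->map_direct || (!is_mmio && cfg->arch_direct))
		return TOY_PATH_DIRECT;
	if (cfg->use_iommu)
		return TOY_PATH_IOMMU;
	if (is_mmio)
		return cfg->has_map_resource ? TOY_PATH_OPS_RESOURCE
					     : TOY_PATH_ERROR;
	return TOY_PATH_OPS_PAGE;
}

/* KMSAN instrumentation only applies to regular memory. */
static bool toy_runs_kmsan(bool is_mmio)
{
	return !is_mmio;
}
```

For a legacy dma_ops architecture without map_resource, the model returns TOY_PATH_ERROR for MMIO, matching the early DMA_MAPPING_ERROR return in the diff.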
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
kernel/dma/mapping.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index e47bcf7cc43d7..95eab531e2273 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
{
const struct dma_map_ops *ops = get_dma_ops(dev);
phys_addr_t phys = page_to_phys(page) + offset;
+ bool is_mmio = attrs & DMA_ATTR_MMIO;
dma_addr_t addr;
BUG_ON(!valid_dma_direction(dir));
@@ -166,14 +167,25 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;
if (dma_map_direct(dev, ops) ||
- arch_dma_map_phys_direct(dev, phys + size))
+ (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
- else
+ else if (is_mmio) {
+ if (!ops->map_resource)
+ return DMA_MAPPING_ERROR;
+
+ addr = ops->map_resource(dev, phys, size, dir, attrs);
+ } else {
+ /*
+ * The dma_ops API contract for ops->map_page() requires
+ * kmappable memory, while ops->map_resource() does not.
+ */
addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ }
- kmsan_handle_dma(phys, size, dir);
+ if (!is_mmio)
+ kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
@@ -185,14 +197,18 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
+ bool is_mmio = attrs & DMA_ATTR_MMIO;
BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops) ||
- arch_dma_unmap_phys_direct(dev, addr + size))
+ (!is_mmio && arch_dma_unmap_phys_direct(dev, addr + size)))
dma_direct_unmap_phys(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
- else
+ else if (is_mmio) {
+ if (ops->unmap_resource)
+ ops->unmap_resource(dev, addr, size, dir, attrs);
+ } else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
--
2.51.0
* [PATCH v6 10/16] xen: swiotlb: Open code map_resource callback
From: Leon Romanovsky @ 2025-09-09 13:27 UTC
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
The generic dma_direct_map_resource() is going to be removed in the next
patch, so simply open-code it in the Xen driver.
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/xen/swiotlb-xen.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index da1a7d3d377cf..dd7747a2de879 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -392,6 +392,25 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
}
}
+static dma_addr_t xen_swiotlb_direct_map_resource(struct device *dev,
+ phys_addr_t paddr,
+ size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ dma_addr_t dma_addr = paddr;
+
+ if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
+ dev_err_once(dev,
+ "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+ &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+ WARN_ON_ONCE(1);
+ return DMA_MAPPING_ERROR;
+ }
+
+ return dma_addr;
+}
+
/*
* Return whether the given device DMA address mask can be supported
* properly. For example, if your device can only drive the low 24-bits
@@ -426,5 +445,5 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
.max_mapping_size = swiotlb_max_mapping_size,
- .map_resource = dma_direct_map_resource,
+ .map_resource = xen_swiotlb_direct_map_resource,
};
--
2.51.0
* [PATCH v6 11/16] dma-mapping: export new dma_*map_phys() interface
From: Leon Romanovsky @ 2025-09-09 13:27 UTC
To: Marek Szyprowski
From: Leon Romanovsky <leonro@nvidia.com>
Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.
The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs()/dma_map_resource() and
dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple
wrappers around the phys-based implementations.
In the case of dma_map_page_attrs(), the struct page is converted to a
physical address with the help of page_to_phys(), while
dma_map_resource() passes the physical address as is and adds the
DMA_ATTR_MMIO attribute.
The old page-based API is preserved in mapping.c and keeps its
EXPORT_SYMBOL exports, so existing code is not affected by the new
dma_*map_phys() functions being exported with EXPORT_SYMBOL_GPL.
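The wrapper layering described above can be sketched in plain userspace C.
This is only an illustrative model, not the kernel code: every model_*
name, the attribute bit value and the page shift are assumptions made up
for the example, and the "mapping" is a simple identity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; none of these are the kernel definitions. */
typedef uint64_t model_phys_addr_t;
typedef uint64_t model_dma_addr_t;

#define MODEL_DMA_MAPPING_ERROR	(~(model_dma_addr_t)0)
#define MODEL_DMA_ATTR_MMIO	(1UL << 0)	/* bit position is assumed */
#define MODEL_PAGE_SHIFT	12		/* 4 KiB pages are assumed */

struct model_page {
	uint64_t pfn;
};

/* Records which path the core helper took, for illustration only. */
int model_last_was_mmio;

/* Core helper: both wrappers below funnel into this one function. */
model_dma_addr_t model_map_phys(model_phys_addr_t phys, size_t size,
				unsigned long attrs)
{
	assert(size > 0);
	model_last_was_mmio = !!(attrs & MODEL_DMA_ATTR_MMIO);
	return phys;	/* identity mapping, as a direct-mapped device sees it */
}

/* Page-based wrapper: page + offset becomes a physical address. */
model_dma_addr_t model_map_page_attrs(const struct model_page *page,
				      size_t offset, size_t size,
				      unsigned long attrs)
{
	model_phys_addr_t phys = (page->pfn << MODEL_PAGE_SHIFT) + offset;

	/* The page-based entry point refuses the MMIO attribute. */
	if (attrs & MODEL_DMA_ATTR_MMIO)
		return MODEL_DMA_MAPPING_ERROR;

	return model_map_phys(phys, size, attrs);
}

/* Resource wrapper: address passed as is, MMIO attribute added. */
model_dma_addr_t model_map_resource(model_phys_addr_t phys, size_t size,
				    unsigned long attrs)
{
	return model_map_phys(phys, size, attrs | MODEL_DMA_ATTR_MMIO);
}
```

In this model, calling model_map_resource() always reaches the core helper
with the MMIO attribute set, while model_map_page_attrs() never does.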
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/iommu/dma-iommu.c | 14 --------
include/linux/dma-direct.h | 2 --
include/linux/dma-mapping.h | 13 +++++++
include/linux/iommu-dma.h | 4 ---
include/trace/events/dma.h | 2 --
kernel/dma/debug.c | 43 -----------------------
kernel/dma/debug.h | 21 -----------
kernel/dma/direct.c | 16 ---------
kernel/dma/mapping.c | 69 ++++++++++++++++++++-----------------
9 files changed, 50 insertions(+), 134 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 6804aaf034a16..7944a3af4545e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1556,20 +1556,6 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
__iommu_dma_unmap(dev, start, end - start);
}
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- return __iommu_dma_map(dev, phys, size,
- dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
- dma_get_mask(dev));
-}
-
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- __iommu_dma_unmap(dev, handle, size);
-}
-
static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
{
size_t alloc_size = PAGE_ALIGN(size);
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index f3bc0bcd70980..c249912456f96 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -149,7 +149,5 @@ void dma_direct_free_pages(struct device *dev, size_t size,
struct page *page, dma_addr_t dma_addr,
enum dma_data_direction dir);
int dma_direct_supported(struct device *dev, u64 mask);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
#endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4254fd9bdf5dd..8248ff9363eed 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -138,6 +138,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs);
void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
unsigned int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -192,6 +196,15 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
}
+static inline dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ return DMA_MAPPING_ERROR;
+}
+static inline void dma_unmap_phys(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
static inline unsigned int dma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int nents, enum dma_data_direction dir,
unsigned long attrs)
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 485bdffed9888..a92b3ff9b9343 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -42,10 +42,6 @@ size_t iommu_dma_opt_mapping_size(void);
size_t iommu_dma_max_mapping_size(struct device *dev);
void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t handle, unsigned long attrs);
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev, size_t size,
enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index 84416c7d6bfaa..5da59fd8121db 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -73,7 +73,6 @@ DEFINE_EVENT(dma_map, name, \
TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))
DEFINE_MAP_EVENT(dma_map_phys);
-DEFINE_MAP_EVENT(dma_map_resource);
DECLARE_EVENT_CLASS(dma_unmap,
TP_PROTO(struct device *dev, dma_addr_t addr, size_t size,
@@ -111,7 +110,6 @@ DEFINE_EVENT(dma_unmap, name, \
TP_ARGS(dev, addr, size, dir, attrs))
DEFINE_UNMAP_EVENT(dma_unmap_phys);
-DEFINE_UNMAP_EVENT(dma_unmap_resource);
DECLARE_EVENT_CLASS(dma_alloc_class,
TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index b275db9ca6a03..1e5c64cb6a421 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -38,7 +38,6 @@ enum {
dma_debug_single,
dma_debug_sg,
dma_debug_coherent,
- dma_debug_resource,
dma_debug_noncoherent,
dma_debug_phy,
};
@@ -142,7 +141,6 @@ static const char *type2name[] = {
[dma_debug_single] = "single",
[dma_debug_sg] = "scatter-gather",
[dma_debug_coherent] = "coherent",
- [dma_debug_resource] = "resource",
[dma_debug_noncoherent] = "noncoherent",
[dma_debug_phy] = "phy",
};
@@ -1446,47 +1444,6 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
check_unmap(&ref);
}
-void debug_dma_map_resource(struct device *dev, phys_addr_t addr, size_t size,
- int direction, dma_addr_t dma_addr,
- unsigned long attrs)
-{
- struct dma_debug_entry *entry;
-
- if (unlikely(dma_debug_disabled()))
- return;
-
- entry = dma_entry_alloc();
- if (!entry)
- return;
-
- entry->type = dma_debug_resource;
- entry->dev = dev;
- entry->paddr = addr;
- entry->size = size;
- entry->dev_addr = dma_addr;
- entry->direction = direction;
- entry->map_err_type = MAP_ERR_NOT_CHECKED;
-
- add_dma_entry(entry, attrs);
-}
-
-void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
- size_t size, int direction)
-{
- struct dma_debug_entry ref = {
- .type = dma_debug_resource,
- .dev = dev,
- .dev_addr = dma_addr,
- .size = size,
- .direction = direction,
- };
-
- if (unlikely(dma_debug_disabled()))
- return;
-
- check_unmap(&ref);
-}
-
void debug_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
size_t size, int direction)
{
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index bedae973e725d..da7be0bddcf67 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -30,14 +30,6 @@ extern void debug_dma_alloc_coherent(struct device *dev, size_t size,
extern void debug_dma_free_coherent(struct device *dev, size_t size,
void *virt, dma_addr_t addr);
-extern void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
- size_t size, int direction,
- dma_addr_t dma_addr,
- unsigned long attrs);
-
-extern void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
- size_t size, int direction);
-
extern void debug_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size,
int direction);
@@ -95,19 +87,6 @@ static inline void debug_dma_free_coherent(struct device *dev, size_t size,
{
}
-static inline void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
- size_t size, int direction,
- dma_addr_t dma_addr,
- unsigned long attrs)
-{
-}
-
-static inline void debug_dma_unmap_resource(struct device *dev,
- dma_addr_t dma_addr, size_t size,
- int direction)
-{
-}
-
static inline void debug_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle,
size_t size, int direction)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fa75e30700730..1062caac47e7b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -502,22 +502,6 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
return ret;
}
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- dma_addr_t dma_addr = paddr;
-
- if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
- dev_err_once(dev,
- "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
- &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
- WARN_ON_ONCE(1);
- return DMA_MAPPING_ERROR;
- }
-
- return dma_addr;
-}
-
int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 95eab531e2273..fe7472f13b106 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -152,12 +152,10 @@ static inline bool dma_map_direct(struct device *dev,
return dma_go_direct(dev, *dev->dma_mask, ops);
}
-dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
- size_t offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- phys_addr_t phys = page_to_phys(page) + offset;
bool is_mmio = attrs & DMA_ATTR_MMIO;
dma_addr_t addr;
@@ -177,6 +175,9 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = ops->map_resource(dev, phys, size, dir, attrs);
} else {
+ struct page *page = phys_to_page(phys);
+ size_t offset = offset_in_page(phys);
+
/*
* The dma_ops API contract for ops->map_page() requires
* kmappable memory, while ops->map_resource() does not.
@@ -191,9 +192,26 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return addr;
}
+EXPORT_SYMBOL_GPL(dma_map_phys);
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+ size_t offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ phys_addr_t phys = page_to_phys(page) + offset;
+
+ if (unlikely(attrs & DMA_ATTR_MMIO))
+ return DMA_MAPPING_ERROR;
+
+ if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+ WARN_ON_ONCE(is_zone_device_page(page)))
+ return DMA_MAPPING_ERROR;
+
+ return dma_map_phys(dev, phys, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_map_page_attrs);
-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -213,6 +231,16 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
+EXPORT_SYMBOL_GPL(dma_unmap_phys);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ if (unlikely(attrs & DMA_ATTR_MMIO))
+ return;
+
+ dma_unmap_phys(dev, addr, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_unmap_page_attrs);
static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -338,41 +366,18 @@ EXPORT_SYMBOL(dma_unmap_sg_attrs);
dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- const struct dma_map_ops *ops = get_dma_ops(dev);
- dma_addr_t addr = DMA_MAPPING_ERROR;
-
- BUG_ON(!valid_dma_direction(dir));
-
- if (WARN_ON_ONCE(!dev->dma_mask))
+ if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+ WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
return DMA_MAPPING_ERROR;
- if (dma_map_direct(dev, ops))
- addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
- else if (use_dma_iommu(dev))
- addr = iommu_dma_map_resource(dev, phys_addr, size, dir, attrs);
- else if (ops->map_resource)
- addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
-
- trace_dma_map_resource(dev, phys_addr, addr, size, dir, attrs);
- debug_dma_map_resource(dev, phys_addr, size, dir, addr, attrs);
- return addr;
+ return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
}
EXPORT_SYMBOL(dma_map_resource);
void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_map_direct(dev, ops))
- ; /* nothing to do: uncached and no swiotlb */
- else if (use_dma_iommu(dev))
- iommu_dma_unmap_resource(dev, addr, size, dir, attrs);
- else if (ops->unmap_resource)
- ops->unmap_resource(dev, addr, size, dir, attrs);
- trace_dma_unmap_resource(dev, addr, size, dir, attrs);
- debug_dma_unmap_resource(dev, addr, size, dir);
+ dma_unmap_phys(dev, addr, size, dir, attrs | DMA_ATTR_MMIO);
}
EXPORT_SYMBOL(dma_unmap_resource);
--
2.51.0
* [PATCH v6 12/16] mm/hmm: migrate to physical address-based DMA mapping API
2025-09-09 13:27 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
` (10 preceding siblings ...)
2025-09-09 13:27 ` [PATCH v6 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
@ 2025-09-09 13:27 ` Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
` (4 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
virtualization, Will Deacon, xen-devel
From: Leon Romanovsky <leonro@nvidia.com>
Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.
The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.
This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
mm/hmm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index d545e24949949..015ab243f0813 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -775,8 +775,8 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
goto error;
- dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);
if (dma_mapping_error(dev, dma_addr))
goto error;
@@ -819,8 +819,8 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
dma_iova_unlink(dev, state, idx * map->dma_entry_size,
map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
} else if (dma_need_unmap(dev))
- dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);
pfns[idx] &=
~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
--
2.51.0
* [PATCH v6 13/16] mm/hmm: properly take MMIO path
2025-09-09 13:27 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
` (11 preceding siblings ...)
2025-09-09 13:27 ` [PATCH v6 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
@ 2025-09-09 13:27 ` Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
` (3 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
virtualization, Will Deacon, xen-devel
From: Leon Romanovsky <leonro@nvidia.com>
When a peer-to-peer transaction traverses the host bridge, the IOMMU
mapping needs the IOMMU_MMIO flag, and the CPU sync must be skipped.
The latter was handled by passing the DMA_ATTR_SKIP_CPU_SYNC flag, but
the IOMMU flag was missed, due to the assumption that such memory can
be treated as regular memory.
Reuse the newly introduced DMA_ATTR_MMIO attribute to properly take the
MMIO path.
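The difference between the old and the new attribute choice can be shown
with a small userspace model. It is a sketch under stated assumptions: the
model_* names and bit values are invented for the example, and the model
assumes that the MMIO attribute implies both "no CPU sync" and the IOMMU
MMIO protection bit, as the commit message describes.

```c
#include <assert.h>

/* Illustrative stand-ins; names and bit values are assumptions. */
enum model_p2p_map_type {
	MODEL_P2P_NONE,
	MODEL_P2P_THRU_HOST_BRIDGE,
	MODEL_P2P_BUS_ADDR,
};

#define MODEL_ATTR_SKIP_CPU_SYNC	(1UL << 0)
#define MODEL_ATTR_MMIO			(1UL << 1)
#define MODEL_IOMMU_MMIO		(1 << 0)

/* Old behaviour: host bridge P2P only skipped the CPU sync. */
unsigned long model_attrs_before(enum model_p2p_map_type type)
{
	assert(type <= MODEL_P2P_BUS_ADDR);
	return type == MODEL_P2P_THRU_HOST_BRIDGE ?
		MODEL_ATTR_SKIP_CPU_SYNC : 0;
}

/* New behaviour: host bridge P2P is marked as MMIO. */
unsigned long model_attrs_after(enum model_p2p_map_type type)
{
	assert(type <= MODEL_P2P_BUS_ADDR);
	return type == MODEL_P2P_THRU_HOST_BRIDGE ? MODEL_ATTR_MMIO : 0;
}

/* IOMMU protection bits derived from the DMA attributes. */
int model_iommu_prot(unsigned long attrs)
{
	return (attrs & MODEL_ATTR_MMIO) ? MODEL_IOMMU_MMIO : 0;
}

/* The CPU sync is skipped for either attribute in this model. */
int model_needs_cpu_sync(unsigned long attrs)
{
	return !(attrs & (MODEL_ATTR_MMIO | MODEL_ATTR_SKIP_CPU_SYNC));
}
```

With the old attributes the model skips the CPU sync but yields no
IOMMU MMIO bit; with the new attribute both conditions hold.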
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
mm/hmm.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index 015ab243f0813..6556c0e074ba8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -746,7 +746,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
case PCI_P2PDMA_MAP_NONE:
break;
case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
- attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+ attrs |= DMA_ATTR_MMIO;
pfns[idx] |= HMM_PFN_P2PDMA;
break;
case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -776,7 +776,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
goto error;
dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
- DMA_BIDIRECTIONAL, 0);
+ DMA_BIDIRECTIONAL, attrs);
if (dma_mapping_error(dev, dma_addr))
goto error;
@@ -811,16 +811,17 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
if ((pfns[idx] & valid_dma) != valid_dma)
return false;
+ if (pfns[idx] & HMM_PFN_P2PDMA)
+ attrs |= DMA_ATTR_MMIO;
+
if (pfns[idx] & HMM_PFN_P2PDMA_BUS)
; /* no need to unmap bus address P2P mappings */
- else if (dma_use_iova(state)) {
- if (pfns[idx] & HMM_PFN_P2PDMA)
- attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+ else if (dma_use_iova(state))
dma_iova_unlink(dev, state, idx * map->dma_entry_size,
map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
- } else if (dma_need_unmap(dev))
+ else if (dma_need_unmap(dev))
dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
- DMA_BIDIRECTIONAL, 0);
+ DMA_BIDIRECTIONAL, attrs);
pfns[idx] &=
~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
--
2.51.0
* [PATCH v6 14/16] block-dma: migrate to dma_map_phys instead of map_page
2025-09-09 13:27 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
` (12 preceding siblings ...)
2025-09-09 13:27 ` [PATCH v6 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
@ 2025-09-09 13:27 ` Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 15/16] block-dma: properly take MMIO path Leon Romanovsky
` (2 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
virtualization, Will Deacon, xen-devel
From: Leon Romanovsky <leonro@nvidia.com>
After the introduction of dma_map_phys(), there is no need to convert
a physical address to a struct page in order to map it. So let's
use dma_map_phys() directly.
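Why the conversion is redundant can be seen from the address arithmetic
alone. The following userspace sketch assumes 4 KiB pages and a flat
pfn <-> physical address relation; the model_* helpers are hypothetical
stand-ins for phys_to_page(), offset_in_page() and page_to_phys().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12	/* 4 KiB pages are assumed */
#define MODEL_PAGE_SIZE		(1ULL << MODEL_PAGE_SHIFT)

/* Stand-in for phys_to_page(): keep only the page frame number. */
uint64_t model_phys_to_pfn(uint64_t phys)
{
	return phys >> MODEL_PAGE_SHIFT;
}

/* Stand-in for offset_in_page(): keep only the low bits. */
uint64_t model_offset_in_page(uint64_t phys)
{
	return phys & (MODEL_PAGE_SIZE - 1);
}

/* Stand-in for page_to_phys(): rebuild the page base address. */
uint64_t model_pfn_to_phys(uint64_t pfn)
{
	return pfn << MODEL_PAGE_SHIFT;
}

/* Split into page + offset, then recombine as the page-based API did. */
uint64_t model_round_trip(uint64_t phys)
{
	uint64_t pfn = model_phys_to_pfn(phys);
	uint64_t off = model_offset_in_page(phys);
	uint64_t back = model_pfn_to_phys(pfn) + off;

	assert(back == phys);	/* the split loses no information */
	return back;
}
```

Splitting the address into page and offset only to have the mapping layer
add them back together yields the value the caller started with.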
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
block/blk-mq-dma.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index ad283017caef2..37e2142be4f7d 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter, struct phys_vec *vec)
{
- iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
- offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+ iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
+ rq_dma_dir(req), 0);
if (dma_mapping_error(dma_dev, iter->addr)) {
iter->status = BLK_STS_RESOURCE;
return false;
--
2.51.0
* [PATCH v6 15/16] block-dma: properly take MMIO path
2025-09-09 13:27 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
` (13 preceding siblings ...)
2025-09-09 13:27 ` [PATCH v6 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
@ 2025-09-09 13:27 ` Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
2025-09-11 22:25 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Marek Szyprowski
16 siblings, 0 replies; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
virtualization, Will Deacon, xen-devel
From: Leon Romanovsky <leonro@nvidia.com>
Make sure that the CPU is not synced and the IOMMU is configured to take
the MMIO path by passing the newly introduced DMA_ATTR_MMIO attribute.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
block/blk-mq-dma.c | 13 +++++++++++--
include/linux/blk-mq-dma.h | 6 +++++-
include/linux/blk_types.h | 2 ++
3 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 37e2142be4f7d..d415088ed9fd2 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,13 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter, struct phys_vec *vec)
{
+ unsigned int attrs = 0;
+
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
+
iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
- rq_dma_dir(req), 0);
+ rq_dma_dir(req), attrs);
if (dma_mapping_error(dma_dev, iter->addr)) {
iter->status = BLK_STS_RESOURCE;
return false;
@@ -103,14 +108,17 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
{
enum dma_data_direction dir = rq_dma_dir(req);
unsigned int mapped = 0;
+ unsigned int attrs = 0;
int error;
iter->addr = state->addr;
iter->len = dma_iova_size(state);
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
do {
error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
- vec->len, dir, 0);
+ vec->len, dir, attrs);
if (error)
break;
mapped += vec->len;
@@ -176,6 +184,7 @@ bool blk_rq_dma_map_iter_start(struct request *req, struct device *dma_dev,
* same as non-P2P transfers below and during unmap.
*/
req->cmd_flags &= ~REQ_P2PDMA;
+ req->cmd_flags |= REQ_MMIO;
break;
default:
iter->status = BLK_STS_INVAL;
diff --git a/include/linux/blk-mq-dma.h b/include/linux/blk-mq-dma.h
index c26a01aeae006..6c55f5e585116 100644
--- a/include/linux/blk-mq-dma.h
+++ b/include/linux/blk-mq-dma.h
@@ -48,12 +48,16 @@ static inline bool blk_rq_dma_map_coalesce(struct dma_iova_state *state)
static inline bool blk_rq_dma_unmap(struct request *req, struct device *dma_dev,
struct dma_iova_state *state, size_t mapped_len)
{
+ unsigned int attrs = 0;
+
if (req->cmd_flags & REQ_P2PDMA)
return true;
if (dma_use_iova(state)) {
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
dma_iova_destroy(dma_dev, state, mapped_len, rq_dma_dir(req),
- 0);
+ attrs);
return true;
}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 09b99d52fd365..283058bcb5b14 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -387,6 +387,7 @@ enum req_flag_bits {
__REQ_FS_PRIVATE, /* for file system (submitter) use */
__REQ_ATOMIC, /* for atomic write operations */
__REQ_P2PDMA, /* contains P2P DMA pages */
+ __REQ_MMIO, /* contains MMIO memory */
/*
* Command specific flags, keep last:
*/
@@ -420,6 +421,7 @@ enum req_flag_bits {
#define REQ_FS_PRIVATE (__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
#define REQ_ATOMIC (__force blk_opf_t)(1ULL << __REQ_ATOMIC)
#define REQ_P2PDMA (__force blk_opf_t)(1ULL << __REQ_P2PDMA)
+#define REQ_MMIO (__force blk_opf_t)(1ULL << __REQ_MMIO)
#define REQ_NOUNMAP (__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
--
2.51.0
* [PATCH v6 16/16] nvme-pci: unmap MMIO pages with appropriate interface
2025-09-09 13:27 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
` (14 preceding siblings ...)
2025-09-09 13:27 ` [PATCH v6 15/16] block-dma: properly take MMIO path Leon Romanovsky
@ 2025-09-09 13:27 ` Leon Romanovsky
2025-09-11 22:25 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Marek Szyprowski
16 siblings, 0 replies; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-09 13:27 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
virtualization, Will Deacon, xen-devel
From: Leon Romanovsky <leonro@nvidia.com>
The block layer maps MMIO memory through the dma_map_phys() interface
with the help of the DMA_ATTR_MMIO attribute. That memory has to be
unmapped with the matching unmap function, which wasn't possible
before the new REQ_MMIO flag was added to the block layer in the
previous patch.
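The idea of carrying the decision on the request, so that map and unmap
derive the same attributes, can be sketched as follows. This is a userspace
model only: struct model_request, the flag bits and the helper names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; flag positions are assumptions. */
#define MODEL_REQ_P2PDMA	(1ULL << 0)
#define MODEL_REQ_MMIO		(1ULL << 1)
#define MODEL_DMA_ATTR_MMIO	(1UL << 0)

struct model_request {
	unsigned long long cmd_flags;
};

/* Map time: a host bridge P2P request is re-tagged as MMIO. */
void model_mark_thru_host_bridge(struct model_request *req)
{
	assert(req != NULL);
	req->cmd_flags &= ~MODEL_REQ_P2PDMA;
	req->cmd_flags |= MODEL_REQ_MMIO;
}

/* Used by both the map and the unmap side to derive DMA attributes. */
unsigned long model_req_dma_attrs(const struct model_request *req)
{
	assert(req != NULL);
	return (req->cmd_flags & MODEL_REQ_MMIO) ? MODEL_DMA_ATTR_MMIO : 0;
}
```

Because the flag stays on the request until completion, the unmap side
computes the same attribute value that was used when mapping.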
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/nvme/host/pci.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2c6d9506b1725..f8ecc0e0f576d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -682,11 +682,15 @@ static void nvme_free_prps(struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+ unsigned int attrs = 0;
unsigned int i;
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
+
for (i = 0; i < iod->nr_dma_vecs; i++)
- dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
- iod->dma_vecs[i].len, rq_dma_dir(req));
+ dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+ iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
}
@@ -699,15 +703,19 @@ static void nvme_free_sgls(struct request *req)
unsigned int sqe_dma_len = le32_to_cpu(iod->cmd.common.dptr.sgl.length);
struct nvme_sgl_desc *sg_list = iod->descriptors[0];
enum dma_data_direction dir = rq_dma_dir(req);
+ unsigned int attrs = 0;
+
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
if (iod->nr_descriptors) {
unsigned int nr_entries = sqe_dma_len / sizeof(*sg_list), i;
for (i = 0; i < nr_entries; i++)
- dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
- le32_to_cpu(sg_list[i].length), dir);
+ dma_unmap_phys(dma_dev, le64_to_cpu(sg_list[i].addr),
+ le32_to_cpu(sg_list[i].length), dir, attrs);
} else {
- dma_unmap_page(dma_dev, sqe_dma_addr, sqe_dma_len, dir);
+ dma_unmap_phys(dma_dev, sqe_dma_addr, sqe_dma_len, dir, attrs);
}
}
--
2.51.0
* Re: [PATCH v6 03/16] dma-debug: refactor to use physical addresses for page mapping
2025-09-09 13:27 ` [PATCH v6 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
@ 2025-09-09 19:37 ` Leon Romanovsky
2025-09-10 5:26 ` Leon Romanovsky
0 siblings, 1 reply; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-09 19:37 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
David Hildenbrand, iommu, Jason Wang, Jens Axboe, Joerg Roedel,
Jonathan Corbet, Juergen Gross, kasan-dev, Keith Busch,
linux-block, linux-doc, linux-kernel, linux-mm, linux-nvme,
linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
xen-devel
On Tue, Sep 09, 2025 at 04:27:31PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
<...>
> include/linux/page-flags.h | 1 +
<...>
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -614,6 +614,7 @@ FOLIO_FLAG(dropbehind, FOLIO_HEAD_PAGE)
> * available at this point.
> */
> #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
> +#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
Adding PhysHighMem() here was not a great idea: the "else" branch
below unfolds into a maze of macros and automatically generated
functions with a "static inline int Page##uname ..." signature.
> #define folio_test_highmem(__f) is_highmem_idx(folio_zonenum(__f))
> #else
> PAGEFLAG_FALSE(HighMem, highmem)
Thanks
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 03/16] dma-debug: refactor to use physical addresses for page mapping
2025-09-09 19:37 ` Leon Romanovsky
@ 2025-09-10 5:26 ` Leon Romanovsky
2025-09-10 11:58 ` Jason Gunthorpe
2025-09-11 22:23 ` Marek Szyprowski
0 siblings, 2 replies; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-10 5:26 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
David Hildenbrand, iommu, Jason Wang, Jens Axboe, Joerg Roedel,
Jonathan Corbet, Juergen Gross, kasan-dev, Keith Busch,
linux-block, linux-doc, linux-kernel, linux-mm, linux-nvme,
linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
xen-devel
On Tue, Sep 09, 2025 at 10:37:48PM +0300, Leon Romanovsky wrote:
> On Tue, Sep 09, 2025 at 04:27:31PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
>
> <...>
>
> > include/linux/page-flags.h | 1 +
>
> <...>
>
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -614,6 +614,7 @@ FOLIO_FLAG(dropbehind, FOLIO_HEAD_PAGE)
> > * available at this point.
> > */
> > #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
> > +#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
>
> Adding PhysHighMem() here was not a great idea: the "else" branch
> below unfolds into a maze of macros and automatically generated
> functions with a "static inline int Page##uname ..." signature.
>
> > #define folio_test_highmem(__f) is_highmem_idx(folio_zonenum(__f))
> > #else
> > PAGEFLAG_FALSE(HighMem, highmem)
After sleeping on it, the following hunk will help:
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index dfbc4ba86bba2..2a1f346178024 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -614,11 +614,11 @@ FOLIO_FLAG(dropbehind, FOLIO_HEAD_PAGE)
* available at this point.
*/
#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
-#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
#define folio_test_highmem(__f) is_highmem_idx(folio_zonenum(__f))
#else
PAGEFLAG_FALSE(HighMem, highmem)
#endif
+#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
/* Does kmap_local_folio() only allow access to one page of the folio? */
#ifdef CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
>
> Thanks
>
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v6 03/16] dma-debug: refactor to use physical addresses for page mapping
2025-09-10 5:26 ` Leon Romanovsky
@ 2025-09-10 11:58 ` Jason Gunthorpe
2025-09-11 22:23 ` Marek Szyprowski
1 sibling, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2025-09-10 11:58 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko,
Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
David Hildenbrand, iommu, Jason Wang, Jens Axboe, Joerg Roedel,
Jonathan Corbet, Juergen Gross, kasan-dev, Keith Busch,
linux-block, linux-doc, linux-kernel, linux-mm, linux-nvme,
linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
xen-devel
On Wed, Sep 10, 2025 at 08:26:18AM +0300, Leon Romanovsky wrote:
> #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
> -#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
> #define folio_test_highmem(__f) is_highmem_idx(folio_zonenum(__f))
> #else
> PAGEFLAG_FALSE(HighMem, highmem)
> #endif
> +#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
Yeah, that's what I imagined, and I'd make it a static inline
static inline bool PhysHighMem(phys_addr_t phys)
These existing macros are old fashioned imho.
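
For illustration only, and not part of any posted patch: a minimal
userspace sketch of the typed-inline shape suggested above. The
struct page layout, the mock_mem_map array, MOCK_PAGE_SHIFT and the
bodies of phys_to_page() and PageHighMem() are stand-in assumptions so
the fragment compiles outside the kernel; only the PhysHighMem()
signature comes from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions), not the kernel definitions. */
typedef uint64_t phys_addr_t;

struct page {
	bool highmem;	/* hypothetical flag; the kernel derives this from the zone */
};

#define MOCK_PAGE_SHIFT	12
#define MOCK_NR_PAGES	4

/* Pretend the upper two 4 KiB pages live in a highmem zone. */
static struct page mock_mem_map[MOCK_NR_PAGES] = {
	{ false }, { false }, { true }, { true },
};

/* Stand-in for the kernel's phys_to_page(). */
static inline struct page *phys_to_page(phys_addr_t phys)
{
	return &mock_mem_map[(phys >> MOCK_PAGE_SHIFT) % MOCK_NR_PAGES];
}

/* Stand-in for PageHighMem(), which in the kernel is a macro or a
 * generated helper depending on CONFIG_HIGHMEM. */
static inline bool PageHighMem(const struct page *page)
{
	return page->highmem;
}

/* Typed inline wrapper instead of a function-like macro: the argument
 * is checked as a phys_addr_t and the result is a bool. */
static inline bool PhysHighMem(phys_addr_t phys)
{
	return PageHighMem(phys_to_page(phys));
}
```

Compared with the macro form, passing a struct page pointer here by
mistake becomes a compiler diagnostic rather than a silent conversion.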
Jason
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 03/16] dma-debug: refactor to use physical addresses for page mapping
2025-09-10 5:26 ` Leon Romanovsky
2025-09-10 11:58 ` Jason Gunthorpe
@ 2025-09-11 22:23 ` Marek Szyprowski
1 sibling, 0 replies; 30+ messages in thread
From: Marek Szyprowski @ 2025-09-11 22:23 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
David Hildenbrand, iommu, Jason Wang, Jens Axboe, Joerg Roedel,
Jonathan Corbet, Juergen Gross, kasan-dev, Keith Busch,
linux-block, linux-doc, linux-kernel, linux-mm, linux-nvme,
linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
xen-devel
On 10.09.2025 07:26, Leon Romanovsky wrote:
> On Tue, Sep 09, 2025 at 10:37:48PM +0300, Leon Romanovsky wrote:
>> On Tue, Sep 09, 2025 at 04:27:31PM +0300, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@nvidia.com>
>> <...>
>>
>>> include/linux/page-flags.h | 1 +
>> <...>
>>
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -614,6 +614,7 @@ FOLIO_FLAG(dropbehind, FOLIO_HEAD_PAGE)
>>> * available at this point.
>>> */
>>> #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
>>> +#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
>> Adding PhysHighMem() here was not a great idea: the "else" branch
>> below unfolds into a maze of macros and automatically generated
>> functions with a "static inline int Page##uname ..." signature.
>>
>>> #define folio_test_highmem(__f) is_highmem_idx(folio_zonenum(__f))
>>> #else
>>> PAGEFLAG_FALSE(HighMem, highmem)
> After sleeping on it, the following hunk will help:
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index dfbc4ba86bba2..2a1f346178024 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -614,11 +614,11 @@ FOLIO_FLAG(dropbehind, FOLIO_HEAD_PAGE)
> * available at this point.
> */
> #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
> -#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
> #define folio_test_highmem(__f) is_highmem_idx(folio_zonenum(__f))
> #else
> PAGEFLAG_FALSE(HighMem, highmem)
> #endif
> +#define PhysHighMem(__p) (PageHighMem(phys_to_page(__p)))
>
> /* Does kmap_local_folio() only allow access to one page of the folio? */
> #ifdef CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
Okay, I will add this fixup while applying the patches.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-09 13:27 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
` (15 preceding siblings ...)
2025-09-09 13:27 ` [PATCH v6 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
@ 2025-09-11 22:25 ` Marek Szyprowski
2025-09-12 9:03 ` Leon Romanovsky
16 siblings, 1 reply; 30+ messages in thread
From: Marek Szyprowski @ 2025-09-11 22:25 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, Keith Busch, linux-block, linux-doc,
linux-kernel, linux-mm, linux-nvme, linuxppc-dev,
linux-trace-kernel, Madhavan Srinivasan, Masami Hiramatsu,
Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy,
rust-for-linux, Sagi Grimberg, Stefano Stabellini, Steven Rostedt,
virtualization, Will Deacon, xen-devel
On 09.09.2025 15:27, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Changelog:
> v6:
> * Based on "dma-debug: don't enforce dma mapping check on noncoherent
> allocations" patch.
> * Removed some unused variables from kmsan conversion.
> * Fixed missed ! in dma check.
> v5: https://lore.kernel.org/all/cover.1756822782.git.leon@kernel.org
> * Added Jason's and Keith's Reviewed-by tags
> * Fixed DMA_ATTR_MMIO check in dma_direct_map_phys
> * Jason's cleanup suggestions
> v4: https://lore.kernel.org/all/cover.1755624249.git.leon@kernel.org/
> * Fixed kbuild error with mismatch in kmsan function declaration due to
> rebase error.
> v3: https://lore.kernel.org/all/cover.1755193625.git.leon@kernel.org
> * Fixed typo in "cacheable" word
> * Simplified kmsan patch a lot to be simple argument refactoring
> v2: https://lore.kernel.org/all/cover.1755153054.git.leon@kernel.org
> * Used commit messages and cover letter from Jason
> * Moved setting IOMMU_MMIO flag to dma_info_to_prot function
> * Micro-optimized the code
> * Rebased code on v6.17-rc1
> v1: https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org
> * Added new DMA_ATTR_MMIO attribute to indicate
> PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
> * Rewrote dma_map_* functions to use this new attribute
> v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
> ------------------------------------------------------------------------
>
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.
>
> The series maintains export symbol backward compatibility by keeping
> the old page-based API as wrapper functions around the new physical
> address-based implementations.
>
> This series refactors the DMA mapping API to provide a phys_addr_t
> based, and struct-page free, external API that can handle all the
> mapping cases we want in modern systems:
>
> - struct page based cacheable DRAM
> - struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cacheable
> MMIO
> - struct page-less PCI peer to peer non-cacheable MMIO
> - struct page-less "resource" MMIO
>
> Overall this gets much closer to Matthew's long term wish for
> struct-pageless IO to cacheable DRAM. The remaining primary work would
> be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
> phys_addr_t without a struct page.
>
> The general design is to remove struct page usage entirely from the
> DMA API inner layers. For flows that need to have a KVA for the
> physical address they can use kmap_local_pfn() or phys_to_virt(). This
> isolates the struct page requirements to MM code only. Long term all
> removals of struct page usage are supporting Matthew's memdesc
> project which seeks to substantially transform how struct page works.
>
> Instead make the DMA API internals work on phys_addr_t. Internally
> there are still dedicated 'page' and 'resource' flows, except they are
> now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
> flows use the same phys_addr_t.
>
> When DMA_ATTR_MMIO is specified things work similar to the existing
> 'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
> pfn_valid(), etc are never called on the phys_addr_t. This requires
> rejecting any configuration that would need swiotlb. CPU cache
> flushing is not required, and avoided, as ATTR_MMIO also indicates the
> address has no cacheable mappings. This effectively removes any
> DMA API side requirement to have struct page when DMA_ATTR_MMIO is
> used.
>
> In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
> except on the common path of no cache flush, no swiotlb it never
> touches a struct page. When cache flushing or swiotlb copying
> kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
> usage. This was already the case on the unmap side, now the map side
> is symmetric.
>
> Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
> must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
> path must also set it. This corrects some existing bugs where iommu
> mappings for P2P MMIO were improperly marked IOMMU_CACHE.
>
> Since ATTR_MMIO is made to work with all the existing DMA map entry
> points, particularly dma_iova_link(), this finally allows a way to use
> the new DMA API to map PCI P2P MMIO without creating struct page. The
> VFIO DMABUF series demonstrates how this works. This is intended to
> replace the incorrect driver use of dma_map_resource() on PCI BAR
> addresses.
>
> This series does the core code and modern flows. A followup series
> will give the same treatment to the legacy dma_ops implementation.
Applied patches 1-13 into dma-mapping-for-next branch. Let's check if it
works fine in linux-next.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-11 22:25 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Marek Szyprowski
@ 2025-09-12 9:03 ` Leon Romanovsky
2025-09-19 16:08 ` Keith Busch
0 siblings, 1 reply; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-12 9:03 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko,
Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich,
David Hildenbrand, iommu, Jason Wang, Jens Axboe, Joerg Roedel,
Jonathan Corbet, Juergen Gross, kasan-dev, Keith Busch,
linux-block, linux-doc, linux-kernel, linux-mm, linux-nvme,
linuxppc-dev, linux-trace-kernel, Madhavan Srinivasan,
Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin,
Miguel Ojeda, Robin Murphy, rust-for-linux, Sagi Grimberg,
Stefano Stabellini, Steven Rostedt, virtualization, Will Deacon,
xen-devel
On Fri, Sep 12, 2025 at 12:25:38AM +0200, Marek Szyprowski wrote:
> On 09.09.2025 15:27, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Changelog:
> > v6:
> > * Based on "dma-debug: don't enforce dma mapping check on noncoherent
> > allocations" patch.
> > * Removed some unused variables from kmsan conversion.
> > * Fixed missed ! in dma check.
> > v5: https://lore.kernel.org/all/cover.1756822782.git.leon@kernel.org
> > * Added Jason's and Keith's Reviewed-by tags
> > * Fixed DMA_ATTR_MMIO check in dma_direct_map_phys
> > * Jason's cleanup suggestions
> > v4: https://lore.kernel.org/all/cover.1755624249.git.leon@kernel.org/
> > * Fixed kbuild error with mismatch in kmsan function declaration due to
> > rebase error.
> > v3: https://lore.kernel.org/all/cover.1755193625.git.leon@kernel.org
> > * Fixed typo in "cacheable" word
> > * Simplified kmsan patch a lot to be simple argument refactoring
> > v2: https://lore.kernel.org/all/cover.1755153054.git.leon@kernel.org
> > * Used commit messages and cover letter from Jason
> > * Moved setting IOMMU_MMIO flag to dma_info_to_prot function
> > * Micro-optimized the code
> > * Rebased code on v6.17-rc1
> > v1: https://lore.kernel.org/all/cover.1754292567.git.leon@kernel.org
> > * Added new DMA_ATTR_MMIO attribute to indicate
> > PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
> > * Rewrote dma_map_* functions to use this new attribute
> > v0: https://lore.kernel.org/all/cover.1750854543.git.leon@kernel.org/
> > ------------------------------------------------------------------------
> >
> > This series refactors the DMA mapping to use physical addresses
> > as the primary interface instead of page+offset parameters. This
> > change aligns the DMA API with the underlying hardware reality where
> > DMA operations work with physical addresses, not page structures.
> >
> > The series maintains export symbol backward compatibility by keeping
> > the old page-based API as wrapper functions around the new physical
> > address-based implementations.
> >
> > This series refactors the DMA mapping API to provide a phys_addr_t
> > based, and struct-page free, external API that can handle all the
> > mapping cases we want in modern systems:
> >
> > - struct page based cacheable DRAM
> > - struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cacheable
> > MMIO
> > - struct page-less PCI peer to peer non-cacheable MMIO
> > - struct page-less "resource" MMIO
> >
> > Overall this gets much closer to Matthew's long term wish for
> > struct-pageless IO to cacheable DRAM. The remaining primary work would
> > be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
> > phys_addr_t without a struct page.
> >
> > The general design is to remove struct page usage entirely from the
> > DMA API inner layers. For flows that need to have a KVA for the
> > physical address they can use kmap_local_pfn() or phys_to_virt(). This
> > isolates the struct page requirements to MM code only. Long term all
> > removals of struct page usage are supporting Matthew's memdesc
> > project which seeks to substantially transform how struct page works.
> >
> > Instead make the DMA API internals work on phys_addr_t. Internally
> > there are still dedicated 'page' and 'resource' flows, except they are
> > now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
> > flows use the same phys_addr_t.
> >
> > When DMA_ATTR_MMIO is specified things work similar to the existing
> > 'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
> > pfn_valid(), etc are never called on the phys_addr_t. This requires
> > rejecting any configuration that would need swiotlb. CPU cache
> > flushing is not required, and avoided, as ATTR_MMIO also indicates the
> > address has no cacheable mappings. This effectively removes any
> > DMA API side requirement to have struct page when DMA_ATTR_MMIO is
> > used.
> >
> > In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
> > except on the common path of no cache flush, no swiotlb it never
> > touches a struct page. When cache flushing or swiotlb copying
> > kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
> > usage. This was already the case on the unmap side, now the map side
> > is symmetric.
> >
> > Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
> > must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
> > path must also set it. This corrects some existing bugs where iommu
> > mappings for P2P MMIO were improperly marked IOMMU_CACHE.
> >
> > Since ATTR_MMIO is made to work with all the existing DMA map entry
> > points, particularly dma_iova_link(), this finally allows a way to use
> > the new DMA API to map PCI P2P MMIO without creating struct page. The
> > VFIO DMABUF series demonstrates how this works. This is intended to
> > replace the incorrect driver use of dma_map_resource() on PCI BAR
> > addresses.
> >
> > This series does the core code and modern flows. A followup series
> > will give the same treatment to the legacy dma_ops implementation.
>
> Applied patches 1-13 into dma-mapping-for-next branch. Let's check if it
> works fine in linux-next.
Thanks a lot.
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-12 9:03 ` Leon Romanovsky
@ 2025-09-19 16:08 ` Keith Busch
2025-09-20 15:53 ` Leon Romanovsky
0 siblings, 1 reply; 30+ messages in thread
From: Keith Busch @ 2025-09-19 16:08 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, linux-block, linux-doc, linux-kernel,
linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
Will Deacon, xen-devel
On Fri, Sep 12, 2025 at 12:03:27PM +0300, Leon Romanovsky wrote:
> On Fri, Sep 12, 2025 at 12:25:38AM +0200, Marek Szyprowski wrote:
> > >
> > > This series does the core code and modern flows. A followup series
> > > will give the same treatment to the legacy dma_ops implementation.
> >
> > Applied patches 1-13 into dma-mapping-for-next branch. Let's check if it
> > works fine in linux-next.
>
> Thanks a lot.
Just fyi, when dma debug is enabled, we're seeing this new warning
below. I have not had a chance to look into it yet, so I'm just
reporting the observation.
DMA-API: nvme 0006:01:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported
WARNING: kernel/dma/debug.c:598 at add_dma_entry+0x26c/0x328, CPU#1: (udev-worker)/773
Modules linked in: acpi_power_meter(E) loop(E) efivarfs(E) autofs4(E)
CPU: 1 UID: 0 PID: 773 Comm: (udev-worker) Tainted: G E N 6.17.0-rc6-next-20250918-debug #6 PREEMPT(none)
Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : add_dma_entry+0x26c/0x328
lr : add_dma_entry+0x26c/0x328
sp : ffff80009fe0f460
x29: ffff80009fe0f470 x28: 0000000000000001 x27: 0000000000000001
x26: ffff8000835d7f38 x25: ffff8000835d7000 x24: ffff8000835d7e60
x23: 0000000000000000 x22: 0000000006e2cc00 x21: 0000000000000000
x20: ffff800082e8f218 x19: ffff0000a908ff80 x18: 00000000ffffffff
x17: ffff8000801972a0 x16: ffff800080197054 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000004 x12: 0000000000020006
x11: 0000000030e4ef9f x10: ffff800083443358 x9 : ffff80008019499c
x8 : 00000000fffeffff x7 : ffff800083443358 x6 : 0000000000000000
x5 : 00000000000bfff4 x4 : 0000000000000000 x3 : ffff0000bb005ac0
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000bb005ac0
Call trace:
add_dma_entry+0x26c/0x328 (P)
debug_dma_map_phys+0xc4/0xf0
dma_map_phys+0xe0/0x410
dma_map_page_attrs+0x94/0xf8
blk_dma_map_direct.isra.0+0x64/0xb8
blk_rq_dma_map_iter_next+0x6c/0xc8
nvme_prep_rq+0x894/0xa98
nvme_queue_rqs+0xb0/0x1a0
blk_mq_dispatch_queue_requests+0x268/0x3b8
blk_mq_flush_plug_list+0x90/0x188
__blk_flush_plug+0x104/0x170
blk_finish_plug+0x38/0x50
read_pages+0x1a4/0x3b8
page_cache_ra_unbounded+0x1a0/0x400
force_page_cache_ra+0xa8/0xd8
page_cache_sync_ra+0xa0/0x3f8
filemap_get_pages+0x104/0x950
filemap_read+0xf4/0x498
blkdev_read_iter+0x88/0x180
vfs_read+0x214/0x310
ksys_read+0x70/0x110
__arm64_sys_read+0x20/0x30
invoke_syscall+0x4c/0x118
el0_svc_common.constprop.0+0xc4/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x1a0/0x340
el0t_64_sync_handler+0x98/0xe0
el0t_64_sync+0x17c/0x180
---[ end trace 0000000000000000 ]---
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-19 16:08 ` Keith Busch
@ 2025-09-20 15:53 ` Leon Romanovsky
2025-09-21 0:47 ` Keith Busch
0 siblings, 1 reply; 30+ messages in thread
From: Leon Romanovsky @ 2025-09-20 15:53 UTC (permalink / raw)
To: Keith Busch
Cc: Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, linux-block, linux-doc, linux-kernel,
linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
Will Deacon, xen-devel
On Fri, Sep 19, 2025 at 10:08:21AM -0600, Keith Busch wrote:
> On Fri, Sep 12, 2025 at 12:03:27PM +0300, Leon Romanovsky wrote:
> > On Fri, Sep 12, 2025 at 12:25:38AM +0200, Marek Szyprowski wrote:
> > > >
> > > > This series does the core code and modern flows. A followup series
> > > > will give the same treatment to the legacy dma_ops implementation.
> > >
> > > Applied patches 1-13 into dma-mapping-for-next branch. Let's check if it
> > > works fine in linux-next.
> >
> > Thanks a lot.
>
> Just fyi, when dma debug is enabled, we're seeing this new warning
> below. I have not had a chance to look into it yet, so I'm just
> reporting the observation.
Did you apply all patches or only Marek's branch?
I don't get this warning when I run my NVMe tests on current dmabuf-vfio branch.
Thanks
>
> DMA-API: nvme 0006:01:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported
> WARNING: kernel/dma/debug.c:598 at add_dma_entry+0x26c/0x328, CPU#1: (udev-worker)/773
> Modules linked in: acpi_power_meter(E) loop(E) efivarfs(E) autofs4(E)
> CPU: 1 UID: 0 PID: 773 Comm: (udev-worker) Tainted: G E N 6.17.0-rc6-next-20250918-debug #6 PREEMPT(none)
> Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
> pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> pc : add_dma_entry+0x26c/0x328
> lr : add_dma_entry+0x26c/0x328
> sp : ffff80009fe0f460
> x29: ffff80009fe0f470 x28: 0000000000000001 x27: 0000000000000001
> x26: ffff8000835d7f38 x25: ffff8000835d7000 x24: ffff8000835d7e60
> x23: 0000000000000000 x22: 0000000006e2cc00 x21: 0000000000000000
> x20: ffff800082e8f218 x19: ffff0000a908ff80 x18: 00000000ffffffff
> x17: ffff8000801972a0 x16: ffff800080197054 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000004 x12: 0000000000020006
> x11: 0000000030e4ef9f x10: ffff800083443358 x9 : ffff80008019499c
> x8 : 00000000fffeffff x7 : ffff800083443358 x6 : 0000000000000000
> x5 : 00000000000bfff4 x4 : 0000000000000000 x3 : ffff0000bb005ac0
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000bb005ac0
> Call trace:
> add_dma_entry+0x26c/0x328 (P)
> debug_dma_map_phys+0xc4/0xf0
> dma_map_phys+0xe0/0x410
> dma_map_page_attrs+0x94/0xf8
> blk_dma_map_direct.isra.0+0x64/0xb8
> blk_rq_dma_map_iter_next+0x6c/0xc8
> nvme_prep_rq+0x894/0xa98
> nvme_queue_rqs+0xb0/0x1a0
> blk_mq_dispatch_queue_requests+0x268/0x3b8
> blk_mq_flush_plug_list+0x90/0x188
> __blk_flush_plug+0x104/0x170
> blk_finish_plug+0x38/0x50
> read_pages+0x1a4/0x3b8
> page_cache_ra_unbounded+0x1a0/0x400
> force_page_cache_ra+0xa8/0xd8
> page_cache_sync_ra+0xa0/0x3f8
> filemap_get_pages+0x104/0x950
> filemap_read+0xf4/0x498
> blkdev_read_iter+0x88/0x180
> vfs_read+0x214/0x310
> ksys_read+0x70/0x110
> __arm64_sys_read+0x20/0x30
> invoke_syscall+0x4c/0x118
> el0_svc_common.constprop.0+0xc4/0xf0
> do_el0_svc+0x24/0x38
> el0_svc+0x1a0/0x340
> el0t_64_sync_handler+0x98/0xe0
> el0t_64_sync+0x17c/0x180
> ---[ end trace 0000000000000000 ]---
>
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-20 15:53 ` Leon Romanovsky
@ 2025-09-21 0:47 ` Keith Busch
2025-09-23 17:09 ` Jason Gunthorpe
0 siblings, 1 reply; 30+ messages in thread
From: Keith Busch @ 2025-09-21 0:47 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, linux-block, linux-doc, linux-kernel,
linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
Will Deacon, xen-devel
On Sat, Sep 20, 2025 at 06:53:52PM +0300, Leon Romanovsky wrote:
> On Fri, Sep 19, 2025 at 10:08:21AM -0600, Keith Busch wrote:
> > On Fri, Sep 12, 2025 at 12:03:27PM +0300, Leon Romanovsky wrote:
> > > On Fri, Sep 12, 2025 at 12:25:38AM +0200, Marek Szyprowski wrote:
> > > > >
> > > > > This series does the core code and modern flows. A followup series
> > > > > will give the same treatment to the legacy dma_ops implementation.
> > > >
> > > > Applied patches 1-13 into dma-mapping-for-next branch. Let's check if it
> > > > works fine in linux-next.
> > >
> > > Thanks a lot.
> >
> > Just fyi, when dma debug is enabled, we're seeing this new warning
> > below. I have not had a chance to look into it yet, so I'm just
> > reporting the observation.
>
> Did you apply all patches or only Marek's branch?
> I don't get this warning when I run my NVMe tests on current dmabuf-vfio branch.
This was the snapshot of linux-next from the 20250918 tag. It doesn't
have the full patchset applied.
One other thing to note, this was running on an arm64 platform using smmu
configured with 64k pages. If your iommu granule is 4k instead, we
wouldn't use the blk_dma_map_direct path.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-21 0:47 ` Keith Busch
@ 2025-09-23 17:09 ` Jason Gunthorpe
2025-09-23 18:30 ` Keith Busch
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2025-09-23 17:09 UTC (permalink / raw)
To: Keith Busch
Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, linux-block, linux-doc, linux-kernel,
linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
Will Deacon, xen-devel
On Sat, Sep 20, 2025 at 06:47:27PM -0600, Keith Busch wrote:
> On Sat, Sep 20, 2025 at 06:53:52PM +0300, Leon Romanovsky wrote:
> > On Fri, Sep 19, 2025 at 10:08:21AM -0600, Keith Busch wrote:
> > > On Fri, Sep 12, 2025 at 12:03:27PM +0300, Leon Romanovsky wrote:
> > > > On Fri, Sep 12, 2025 at 12:25:38AM +0200, Marek Szyprowski wrote:
> > > > > >
> > > > > > This series does the core code and modern flows. A followup series
> > > > > > will give the same treatment to the legacy dma_ops implementation.
> > > > >
> > > > > Applied patches 1-13 into dma-mapping-for-next branch. Let's check if it
> > > > > works fine in linux-next.
> > > >
> > > > Thanks a lot.
> > >
> > > Just fyi, when dma debug is enabled, we're seeing this new warning
> > > below. I have not had a chance to look into it yet, so I'm just
> > > reporting the observation.
> >
> > Did you apply all patches or only Marek's branch?
> > I don't get this warning when I run my NVMe tests on current dmabuf-vfio branch.
>
> This was the snapshot of linux-next from the 20250918 tag. It doesn't
> have the full patchset applied.
>
> One other thing to note, this was running on an arm64 platform using smmu
> configured with 64k pages. If your iommu granule is 4k instead, we
> wouldn't use the blk_dma_map_direct path.
I spent some time looking to see if I could guess what this is and
came up empty. It seems most likely we are somehow leaking a DMA mapping
tracking entry? The DMA API side is pretty simple here, though.
Not sure the 64k/4k itself is a cause, but triggering the non-iova
flow is probably the issue.
Can you check the output of the debugfs dump produced by this function:
	/*
	 * Dump mappings entries on user space via debugfs
	 */
	static int dump_show(struct seq_file *seq, void *v)
If the system is idle and the dump has lots of entries, that probably
confirms the theory.
Jason
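[Editorial sketch, not part of Jason's message.] One way to act on the suggestion above is to count the entries in the dump while the system is idle. The helper below only counts lines in a file. The path /sys/kernel/debug/dma-api/dump, the need for CONFIG_DMA_API_DEBUG with debugfs mounted at /sys/kernel/debug, and the "one line per tracked mapping" layout are assumptions, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Count the lines in a dma-debug dump file. Each line is assumed to
 * describe one tracked mapping. Returns -1 if the file cannot be opened.
 */
long count_dma_debug_entries(const char *path)
{
	FILE *f;
	long lines = 0;
	int c, last = '\n';

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while ((c = fgetc(f)) != EOF) {
		if (c == '\n')
			lines++;
		last = c;
	}
	if (last != '\n')
		lines++;	/* final line without a trailing newline */

	fclose(f);
	return lines;
}
```

Calling count_dma_debug_entries("/sys/kernel/debug/dma-api/dump") a few times on an idle machine (root is normally required) and seeing a large or steadily growing count would be consistent with the leak theory, though it would not prove it.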
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-23 17:09 ` Jason Gunthorpe
@ 2025-09-23 18:30 ` Keith Busch
2025-09-23 22:22 ` Jason Gunthorpe
0 siblings, 1 reply; 30+ messages in thread
From: Keith Busch @ 2025-09-23 18:30 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, linux-block, linux-doc, linux-kernel,
linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
Will Deacon, xen-devel
On Tue, Sep 23, 2025 at 02:09:36PM -0300, Jason Gunthorpe wrote:
> On Sat, Sep 20, 2025 at 06:47:27PM -0600, Keith Busch wrote:
> >
> > One other thing to note, this was running on arm64 platform using smmu
> > configured with 64k pages. If your iommu granule is 4k instead, we
> > wouldn't use the blk_dma_map_direct path.
>
> I spent some time looking to see if I could guess what this is and
> came up empty. It seems most likely we are leaking a dma mapping
> tracking somehow? The DMA API side is pretty simple here though..
Yeah, nothing stood out to me here either.
> Not sure the 64k/4k itself is a cause, but triggering the non-iova
> flow is probably the issue.
>
> Can you check the output of this debugfs:
I don't have a system in this state at the moment, so we checked
previous logs on machines running older kernels. It's extremely
uncommon, but this error was happening prior to this series, so I don't
think this introduced any new problem here. I'll keep looking, but I
don't think we'll make much progress if I can't find a more reliable
reproducer.
Thanks!
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-23 18:30 ` Keith Busch
@ 2025-09-23 22:22 ` Jason Gunthorpe
2025-09-23 22:35 ` Keith Busch
0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2025-09-23 22:22 UTC (permalink / raw)
To: Keith Busch
Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, linux-block, linux-doc, linux-kernel,
linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
Will Deacon, xen-devel
On Tue, Sep 23, 2025 at 12:30:55PM -0600, Keith Busch wrote:
> I don't have a system in this state at the moment, so we checked
> previous logs on machines running older kernels. It's extremely
> uncommon, but this error was happening prior to this series, so I don't
> think this introduced any new problem here. I'll keep looking, but I
> don't think we'll make much progress if I can't find a more reliable
> reproducer.
Okay, that's great. It needs to get resolved but it is not this series
at fault.
Very rare puts this in a different light; I had mistakenly thought it
was reproducing all the time.
It seems to me it is actually legitimate for userspace to be able to
trigger this cacheline debug check. If you do concurrent O_DIRECT
to the very same memory it should trigger, if I read it right.
So it may not even be an actual bug???
Jason
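[Editorial sketch, not part of Jason's message.] The userspace pattern described above, concurrent O_DIRECT I/O to the very same memory, could look roughly like this: two processes share one buffer and both issue O_DIRECT reads into it. The function name, the 4096-byte length (assumed to meet the file's O_DIRECT alignment rules), and the use of fork() with a shared anonymous mapping are all illustrative choices. Whether this actually trips the dma-debug check depends on the device, the mapping path, and timing, and has not been verified here.

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUF_LEN 4096	/* assumed to satisfy O_DIRECT alignment rules */

/*
 * Issue O_DIRECT reads of the same file into one shared buffer from two
 * processes at once. Returns 0 if both sides ran, -1 on setup failure.
 */
int overlapping_direct_reads(const char *path, int iterations)
{
	void *buf;
	pid_t child;
	int fd, i, status;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0)
		return -1;

	/* MAP_SHARED keeps parent and child on the same physical pages. */
	buf = mmap(NULL, BUF_LEN, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		close(fd);
		return -1;
	}

	child = fork();
	if (child < 0) {
		munmap(buf, BUF_LEN);
		close(fd);
		return -1;
	}

	/* Both processes read into the same buffer; stop on the first error. */
	for (i = 0; i < iterations; i++) {
		if (pread(fd, buf, BUF_LEN, 0) < 0)
			break;
	}

	if (child == 0)
		_exit(0);

	waitpid(child, &status, 0);
	munmap(buf, BUF_LEN);
	close(fd);
	return 0;
}
```

Reads are used because they map the buffer for device-to-memory DMA, which is the direction where overlapping mappings of one cacheline are a concern. The path passed in should be a file on the block device under test; filesystems that reject O_DIRECT make the function return -1 at open().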
* Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API
2025-09-23 22:22 ` Jason Gunthorpe
@ 2025-09-23 22:35 ` Keith Busch
0 siblings, 0 replies; 30+ messages in thread
From: Keith Busch @ 2025-09-23 22:35 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue,
Alexander Potapenko, Alex Gaynor, Andrew Morton,
Christoph Hellwig, Danilo Krummrich, David Hildenbrand, iommu,
Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet,
Juergen Gross, kasan-dev, linux-block, linux-doc, linux-kernel,
linux-mm, linux-nvme, linuxppc-dev, linux-trace-kernel,
Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman,
Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-for-linux,
Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtualization,
Will Deacon, xen-devel
On Tue, Sep 23, 2025 at 07:22:16PM -0300, Jason Gunthorpe wrote:
> Very rare puts this in a different light; I had mistakenly thought it
> was reproducing all the time.
Yes, sorry for the false alarm. I think we got unlucky and hit it on one
of the first boots from testing linux-next, so the knee-jerk reaction was to
suspect the new code that showed up in the stack.
end of thread, other threads:[~2025-09-23 22:35 UTC | newest]
Thread overview: 30+ messages
[not found] <CGME20250909132821eucas1p1051ce9e0270ddbf520e105c913fa8db6@eucas1p1.samsung.com>
2025-09-09 13:27 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 01/16] dma-mapping: introduce new DMA attribute to indicate MMIO memory Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 02/16] iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link() Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 03/16] dma-debug: refactor to use physical addresses for page mapping Leon Romanovsky
2025-09-09 19:37 ` Leon Romanovsky
2025-09-10 5:26 ` Leon Romanovsky
2025-09-10 11:58 ` Jason Gunthorpe
2025-09-11 22:23 ` Marek Szyprowski
2025-09-09 13:27 ` [PATCH v6 04/16] dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 05/16] iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 06/16] iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 07/16] dma-mapping: convert dma_direct_*map_page to be phys_addr_t based Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 08/16] kmsan: convert kmsan_handle_dma to use physical addresses Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 09/16] dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs() Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 10/16] xen: swiotlb: Open code map_resource callback Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 11/16] dma-mapping: export new dma_*map_phys() interface Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 12/16] mm/hmm: migrate to physical address-based DMA mapping API Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 13/16] mm/hmm: properly take MMIO path Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 14/16] block-dma: migrate to dma_map_phys instead of map_page Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 15/16] block-dma: properly take MMIO path Leon Romanovsky
2025-09-09 13:27 ` [PATCH v6 16/16] nvme-pci: unmap MMIO pages with appropriate interface Leon Romanovsky
2025-09-11 22:25 ` [PATCH v6 00/16] dma-mapping: migrate to physical address-based API Marek Szyprowski
2025-09-12 9:03 ` Leon Romanovsky
2025-09-19 16:08 ` Keith Busch
2025-09-20 15:53 ` Leon Romanovsky
2025-09-21 0:47 ` Keith Busch
2025-09-23 17:09 ` Jason Gunthorpe
2025-09-23 18:30 ` Keith Busch
2025-09-23 22:22 ` Jason Gunthorpe
2025-09-23 22:35 ` Keith Busch