From: Leon Romanovsky <leon@kernel.org>
To: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
Abdiel Janulgue <abdiel.janulgue@gmail.com>,
Alexander Potapenko <glider@google.com>,
Alex Gaynor <alex.gaynor@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>,
Danilo Krummrich <dakr@kernel.org>,
iommu@lists.linux.dev, Jason Wang <jasowang@redhat.com>,
Jens Axboe <axboe@kernel.dk>, Joerg Roedel <joro@8bytes.org>,
Jonathan Corbet <corbet@lwn.net>, Juergen Gross <jgross@suse.com>,
kasan-dev@googlegroups.com, Keith Busch <kbusch@kernel.org>,
linux-block@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-nvme@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
linux-trace-kernel@vger.kernel.org,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>,
"Michael S. Tsirkin" <mst@redhat.com>,
Miguel Ojeda <ojeda@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
rust-for-linux@vger.kernel.org, Sagi Grimberg <sagi@grimberg.me>,
Stefano Stabellini <sstabellini@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
virtualization@lists.linux.dev, Will Deacon <will@kernel.org>,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH v4 00/16] dma-mapping: migrate to physical address-based API
Date: Fri, 5 Sep 2025 20:38:50 +0300
Message-ID: <20250905173850.GB25881@unreal>
In-Reply-To: <7557f31e-1504-4f62-b00b-70e25bb793cb@samsung.com>
On Fri, Sep 05, 2025 at 06:20:51PM +0200, Marek Szyprowski wrote:
> On 29.08.2025 15:16, Jason Gunthorpe wrote:
> > On Tue, Aug 19, 2025 at 08:36:44PM +0300, Leon Romanovsky wrote:
> >
> >> This series does the core code and modern flows. A followup series
> >> will give the same treatment to the legacy dma_ops implementation.
> > I took a quick check over this to see that it is sane. I think using
> > phys is an improvement for most of the dma_ops implementations.
> >
> > arch/sparc/kernel/pci_sun4v.c
> > arch/sparc/kernel/iommu.c
> > Uses __pa to get phys from the page, never touches page
> >
> > arch/alpha/kernel/pci_iommu.c
> > arch/sparc/mm/io-unit.c
> > drivers/parisc/ccio-dma.c
> > drivers/parisc/sba_iommu.c
> > Does page_address() and later does __pa on it. Doesn't touch struct page
> >
> > arch/x86/kernel/amd_gart_64.c
> > drivers/xen/swiotlb-xen.c
> > arch/mips/jazz/jazzdma.c
> > Immediately does page_to_phys(), never touches struct page
> >
> > drivers/vdpa/vdpa_user/vduse_dev.c
> > Does page_to_phys() to call iommu_map()
> >
> > drivers/xen/grant-dma-ops.c
> > Does page_to_pfn() and nothing else
> >
> > arch/powerpc/platforms/ps3/system-bus.c
> > This is a maze but I think it wants only phys and the virt is only
> > used for debug prints.
> >
> > The above all never touch a KVA and just want a phys_addr_t.
> >
> > The below are touching the KVA somehow:
> >
> > arch/sparc/mm/iommu.c
> > arch/arm/mm/dma-mapping.c
> > Uses page_address to cache flush, would be happy with phys_to_virt()
> > and a PhysHighMem()
> >
> > arch/powerpc/kernel/dma-iommu.c
> > arch/powerpc/platforms/pseries/vio.c
> > Uses iommu_map_page() which wants phys_to_virt(), doesn't touch
> > struct page
> >
> > arch/powerpc/platforms/pseries/ibmebus.c
> > Returns phys_to_virt() as dma_addr_t.
> >
> > The two PPC ones are weird, I didn't figure out how that was working..
> >
> > It would be easy to make map_phys patches for about half of these, in
> > the first grouping. Doing so would also grant those arches
> > map_resource capability.
> >
> > Overall I didn't think there was any reduction in maintainability in
> > these places. Most are improvements eliminating code, and some are
> > just switching to phys_to_virt() from page_address(), which we could
> > further guard with DMA_ATTR_MMIO and a check for highmem.
>
> Thanks for this summary.
>
> However I would still like to get an answer for the simple question -
> why all this work cannot be replaced by a simple use of dma_map_resource()?
>
> I've checked the most advertised use case in
> https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio
> and I still don't see the reason why it cannot be based
> on dma_map_resource() API? I'm aware of the little asymmetry of the
> client calls in such case, indeed it is not pretty, but this should work
> even now:
>
> phys = phys_vec[i].paddr;
>
> if (is_mmio)
>     dma_map_resource(phys, len, ...);
> else
>     dma_map_page(phys_to_page(phys), offset_in_page(phys), ...);
>
> What did I miss?

"Even now" can't work, mainly because neither of these interfaces
supports the p2p case (PCI_P2PDMA_MAP_BUS_ADDR).

It is unclear how to extend them without introducing new functions
and/or changing the whole kernel. In the PCI_P2PDMA_MAP_BUS_ADDR case
there is no struct page, so dma_map_page() is unlikely to be extensible,
and dma_map_resource() has no direct way to access the PCI bus_offset.
In theory it is doable, but it would be a layering violation, as the DMA
layer would need to rely on the PCI layer for address calculations.

If we don't extend them, then in the general case (for HMM, RDMA and
NVMe) the end result will be something like this:

  if (...PCI_P2PDMA_MAP_BUS_ADDR)
          pci_p2pdma_bus_addr_map
  else if (mmio)
          dma_map_resource
  else             <- this case is not applicable to VFIO-DMABUF
          dma_map_page

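To make the three-way split above concrete, here is a minimal userspace
sketch of what every caller would end up carrying. It is only a model
under stated assumptions: all model_* names are hypothetical, the stubs
return the physical address unchanged (as if there were no IOMMU), and
the bus-address stub simply adds a caller-supplied offset to stand in
for the PCI-side calculation. None of this is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for phys_addr_t / dma_addr_t (not kernel types). */
typedef uint64_t model_phys_t;
typedef uint64_t model_dma_t;

/* What kind of memory the caller is trying to map. */
enum model_kind { MODEL_BUS_ADDR, MODEL_MMIO, MODEL_PAGE };

/* Which mapping path ended up being taken. */
enum model_path { MODEL_PATH_P2P_BUS, MODEL_PATH_RESOURCE, MODEL_PATH_PAGE };

struct model_result {
	enum model_path path;
	model_dma_t addr;
};

/* Stub for the P2P bus-address case: needs a PCI-provided offset. */
static model_dma_t model_p2p_bus_addr(model_phys_t phys, uint64_t bus_offset)
{
	return phys + bus_offset;
}

/* Stubs for the other two paths; identity mapping is an assumption. */
static model_dma_t model_map_resource(model_phys_t phys) { return phys; }
static model_dma_t model_map_page(model_phys_t phys) { return phys; }

/* The caller-side dispatch: three branches for one logical operation. */
static struct model_result model_map_one(enum model_kind kind,
					 model_phys_t phys,
					 uint64_t bus_offset)
{
	struct model_result r;

	if (kind == MODEL_BUS_ADDR) {
		r.path = MODEL_PATH_P2P_BUS;
		r.addr = model_p2p_bus_addr(phys, bus_offset);
	} else if (kind == MODEL_MMIO) {
		r.path = MODEL_PATH_RESOURCE;
		r.addr = model_map_resource(phys);
	} else {
		assert(kind == MODEL_PAGE);
		r.path = MODEL_PATH_PAGE;
		r.addr = model_map_page(phys);
	}
	return r;
}
```

The point of the sketch is only that the branch has to live in each
caller, and that the first branch needs information (the offset) that
the other two interfaces have no way to receive.
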
If we do somehow extend these functions to support it, we will lose a
very important optimization: performing a single IOTLB sync for the
whole DMABUF region (== dma_iova_state), and I was told that it is a
very large region.

	for (i = 0; i < priv->nr_ranges; i++) {
		<...>
		} else if (dma_use_iova(state)) {
			ret = dma_iova_link(attachment->dev, state,
					    phys_vec[i].paddr, 0,
					    phys_vec[i].len, dir, attrs);
			if (ret)
				goto err_unmap_dma;

			mapped_len += phys_vec[i].len;
		<...>
	}

	if (state && dma_use_iova(state)) {
		WARN_ON_ONCE(mapped_len != priv->size);
		ret = dma_iova_sync(attachment->dev, state, 0, mapped_len);

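
The difference can be illustrated with a toy userspace model that only
counts sync operations. It is a sketch, not the kernel implementation:
the model_* names are hypothetical, and the assumption that a per-range
mapping call costs one sync each is part of the model, not something
measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical range, loosely mirroring paddr/len above. */
struct model_range {
	uint64_t paddr;
	size_t len;
};

/* Toy accounting state: bytes mapped and number of syncs issued. */
struct model_iova {
	size_t mapped_len;
	unsigned int syncs;
};

/* Per-range mapping: the model charges one sync for every call. */
static void model_map_each(struct model_iova *m,
			   const struct model_range *r, size_t n)
{
	size_t i;

	assert(m && r);
	for (i = 0; i < n; i++) {
		m->mapped_len += r[i].len;
		m->syncs++;
	}
}

/* Two-step flow: link every range first, then sync once for all. */
static void model_link_then_sync(struct model_iova *m,
				 const struct model_range *r, size_t n)
{
	size_t i;

	assert(m && r);
	for (i = 0; i < n; i++)
		m->mapped_len += r[i].len;	/* link only, no sync */

	if (n)
		m->syncs++;			/* one sync over mapped_len */
}
```

In this model both flows map the same number of bytes, but the sync
count grows with the number of ranges in the first flow and stays at
one in the second.
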
>
> I'm not against this rework, but I would really like to know the
> rationale. I know that the 2-step dma-mapping API also use phys
> addresses and this is the same direction.

This series is a continuation of the 2-step dma-mapping API. Providing
dma_map_phys() was the plan from the beginning.

Thanks