From: Jason Gunthorpe <jgg@ziepe.ca>
To: Leon Romanovsky <leon@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>
Cc: "Robin Murphy" <robin.murphy@arm.com>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Madhavan Srinivasan" <maddy@linux.ibm.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Christophe Leroy" <christophe.leroy@csgroup.eu>,
	"Joerg Roedel" <joro@8bytes.org>, "Will Deacon" <will@kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Alexander Potapenko" <glider@google.com>,
	"Marco Elver" <elver@google.com>,
	"Dmitry Vyukov" <dvyukov@google.com>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, iommu@lists.linux.dev,
	virtualization@lists.linux.dev, kasan-dev@googlegroups.com,
	linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 0/8] dma-mapping: migrate to physical address-based API
Date: Wed, 30 Jul 2025 11:28:18 -0300
Message-ID: <20250730142818.GL26511@ziepe.ca>
In-Reply-To: <20250730134026.GQ402218@unreal>

On Wed, Jul 30, 2025 at 04:40:26PM +0300, Leon Romanovsky wrote:

> > The natural working unit for whatever replaces dma_map_page() will be
> > whatever the replacement for alloc_pages() returns, and the replacement for
> > kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> > would be an unadorned physical address) there cannot be any
> > *meaningful*

alloc_pages becomes legacy.

There will be some new API 'memdesc alloc'. If I understand Matthew's
plan properly - here is a sketch of changing iommu-pages:

--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -36,9 +36,10 @@ static_assert(sizeof(struct ioptdesc) <= sizeof(struct page));
  */
 void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
 {
+       struct ioptdesc *desc;
        unsigned long pgcnt;
-       struct folio *folio;
        unsigned int order;
+       void *addr;
 
        /* This uses page_address() on the memory. */
        if (WARN_ON(gfp & __GFP_HIGHMEM))
@@ -56,8 +57,8 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();
 
-       folio = __folio_alloc_node(gfp | __GFP_ZERO, order, nid);
-       if (unlikely(!folio))
+       addr = memdesc_alloc_pages(&desc, gfp | __GFP_ZERO, order, nid);
+       if (unlikely(!addr))
                return NULL;
 
        /*
@@ -73,7 +74,7 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
        mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, pgcnt);
        lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, pgcnt);
 
-       return folio_address(folio);
+       return addr;
 }

Here memdesc_alloc_pages() will kmalloc a 'struct ioptdesc', and some
other change will make virt_to_ioptdesc() indirect through a new
memdesc. See here:

https://kernelnewbies.org/MatthewWilcox/Memdescs

We don't end up with some kind of catch-all struct to mean 'cachable
CPU memory' anymore because every user gets their own unique "struct
XXXdesc". So the thinking has been that the phys_addr_t is the best
option. I guess the alternative would be the memdesc as a handle, but
I'm not sure that is such a good idea. 

People still express a desire to be able to do IO to cachable memory
that has a KVA through phys_to_virt but no memdesc/page allocation. I
don't know if this will happen but it doesn't seem like a good idea to
make it impossible by forcing memdesc types into low level APIs that
don't use them.

Also, the bio/scatterlist code between pin_user_pages() and DMA
mapping is consolidating physical contiguity. This runs faster if you
don't have to do page_to_phys() because everything is already
phys_addr_t.
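
To illustrate what I mean by consolidating physical contiguity, here
is a rough userspace sketch, not kernel code. The struct phys_range
type and coalesce_phys_ranges() are names made up for this example,
and phys_addr_t is just a uint64_t stand-in. The point is only that
when every entry is already a physical address the merge test is one
add and one compare, with no page_to_phys() per entry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's phys_addr_t (assumption). */
typedef uint64_t phys_addr_t;

/* Hypothetical range type for this sketch; not a kernel structure. */
struct phys_range {
	phys_addr_t addr;
	size_t len;
};

/*
 * Merge neighbouring entries that are physically contiguous, in place.
 * Returns the number of entries left after merging.
 */
static size_t coalesce_phys_ranges(struct phys_range *r, size_t n)
{
	size_t out = 0;

	assert(r || n == 0);
	if (n == 0)
		return 0;

	for (size_t i = 1; i < n; i++) {
		if (r[out].addr + r[out].len == r[i].addr) {
			/* Next entry starts where the current one ends: extend. */
			r[out].len += r[i].len;
		} else {
			/* Gap or overlap: start a new output entry. */
			r[++out] = r[i];
		}
	}
	return out + 1;
}
```

With pages as the working unit, each comparison would first need a
page to physical conversion for both sides.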

> > progress made towards removing the struct page dependency from the DMA API.
> > If there is also a goal to kill off highmem before then, then logically we
> > should just wait for that to land, then revert back to dma_map_single()
> > being the first-class interface, and dma_map_page() can turn into a trivial
> > page_to_virt() wrapper for the long tail of caller conversions.

As I said, there are many, many related projects here and we can make
meaningful progress in parts. It is not functionally harmful to do the
phys to page conversion before calling the legacy dma_ops/SWIOTLB
etc. This avoids creating patch dependencies with highmem removal and
other projects.

So long as the legacy things (highmem, dma_ops, etc) continue to work
I think it is OK to accept some obfuscation to allow the modern things
to work better. The majority flow (no highmem, no dma ops, no
swiotlb) does not require struct page. Having to do

  PTE -> phys -> page -> phys -> DMA

does have a cost.

> The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
> introduce new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
> pass it to both dma_map_phys() and dma_iova_link(). This flag will
> indicate that p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call to
> right callbacks which will set IOMMU_MMIO flag and skip CPU sync,

So the idea is if the memory is non-cachable, no-KVA you'd call
dma_iova_link(phys_addr, DMA_ATTR_MMIO) and dma_map_phys(phys_addr,
DMA_ATTR_MMIO) ?

And then internally the dma_ops and dma_iommu would use the existing
map_page/map_resource variations based on the flag, thus ensuring that
MMIO is never kmap'd or cache flushed?

dma_map_resource is really then just
dma_map_phys(phys_addr, DMA_ATTR_MMIO)?
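
To make sure I understand the proposal, here is a toy userspace model
of the dispatch, not the real DMA API. Every model_* name and the
MODEL_DMA_ATTR_MMIO bit value are invented for this sketch, and the
identity phys to DMA translation is an assumption to keep it short:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types (assumption). */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/* Hypothetical bit value; the attribute is only a proposal so far. */
#define MODEL_DMA_ATTR_MMIO (1UL << 0)

enum model_path { MODEL_PATH_PAGE, MODEL_PATH_RESOURCE };

struct model_result {
	dma_addr_t dma;
	enum model_path path;
	int cpu_synced;	/* 1 if the CPU cache sync step ran */
};

/* Models the map_page flavour: cacheable memory, CPU sync happens. */
static struct model_result model_map_page(phys_addr_t phys)
{
	struct model_result res = { phys, MODEL_PATH_PAGE, 1 };
	return res;
}

/* Models the map_resource flavour: MMIO, no kmap and no cache flush. */
static struct model_result model_map_resource(phys_addr_t phys)
{
	struct model_result res = { phys, MODEL_PATH_RESOURCE, 0 };
	return res;
}

/* One entry point; the attribute alone selects the variation. */
static struct model_result model_dma_map_phys(phys_addr_t phys,
					      unsigned long attrs)
{
	struct model_result res;

	if (attrs & MODEL_DMA_ATTR_MMIO)
		res = model_map_resource(phys);
	else
		res = model_map_page(phys);

	/* Invariant from the discussion: MMIO is never cache flushed. */
	assert(!(attrs & MODEL_DMA_ATTR_MMIO) || !res.cpu_synced);
	return res;
}
```

In this model a dma_map_resource() equivalent is nothing more than
model_dma_map_phys(phys, MODEL_DMA_ATTR_MMIO).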

I like this, I think it addresses the concerns well.

Jason

