Re: [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration

* Re: [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration
       [not found]   ` <aikBIESiJftxBdfL@infradead.org>
@ 2026-06-10  9:54     ` David Hildenbrand (Arm)
  2026-06-10 11:34       ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-10  9:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: sw.prabhu6, axboe, io-uring, linux-kernel, dave, dongjoo.seo1,
	Swarna Prabhu, linux-mm@kvack.org, Matthew Wilcox, Zi Yan

On 6/10/26 08:16, Christoph Hellwig wrote:
> On Tue, Jun 09, 2026 at 08:36:43PM +0200, David Hildenbrand (Arm) wrote:
>> I really don't like arbitrary GUP users starting to special-case hugetlb
>> folios and making assumptions about what other pages they pinned look like
>> (IOW, what the page table mappings actually looked like).
> 
> Me neither, but the current interfaces are kind of forcing them :P

Yeah :)

But general rule: if you're outside of MM core and test for hugetlb folios, you
are doing something very wrong.

> 
>>
>> Ideally, we'd have a pin_user_pages_fast() variant that would give you a list of
>> folio ranges instead of individual pages.
> 
> Yes.  iov_iter_extract_bvecs and thus the block direct I/O fast path
> would instantly benefit from that.

The tricky bit for such an interface is that, soon, some pages won't be folios,
but we could still end up with non-folio pages in the address space (e.g.,
vm_insert_page()) and have to pin+return them. So using folios is not future-proof.

There are some long-term plans on providing an interface that would abstract how
you refcount something you GUP'ed. (because, some pages we GUP in the future
might not even have a dedicated refcount, all still fairly unclear). But it's
all not really finalized I think.

For now, we could expose a folio+page/offset+nr_pages interface, where we,
long-term, would not be able to return non-folio pages (e.g., vm_insert_page())
and would instead, in the future, fail the request if we stumble over a
non-folio thing in the page tables. That sounds reasonable for now.

Another solution would be, exposing page-ranges (e.g., page + nr_pages), whereby
we'd say, that all pages in a range belong to the same compound page, and that
we took a single reference for all pages in the range. IOW, page_folio() would
for now be the same for all pages in a range.
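
A minimal userspace sketch of the page-range idea, under loud assumptions:
`toy_page`, `toy_page_range` and `toy_coalesce()` are hypothetical names made up
for illustration, not kernel APIs, and the model coalesces an already-filled
page array after the fact rather than producing ranges inside GUP. It only shows
the invariant described above: every page in a range is physically consecutive
and shares the same compound (head) page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a pinned page (NOT struct page): its PFN and the PFN of
 * the head page of the compound page it belongs to. */
struct toy_page {
	unsigned long pfn;
	unsigned long head_pfn;
};

/* Hypothetical range descriptor: nr_pages consecutive pages that all belong
 * to one compound page and would be covered by a single reference. */
struct toy_page_range {
	unsigned long first_pfn;
	unsigned long head_pfn;
	size_t nr_pages;
};

/* Coalesce pages[] into ranges[]; returns the number of ranges written.
 * Stops early if more than max_ranges ranges would be needed. */
static size_t toy_coalesce(const struct toy_page *pages, size_t nr_pages,
			   struct toy_page_range *ranges, size_t max_ranges)
{
	size_t nr_ranges = 0;

	assert(pages && ranges);

	for (size_t i = 0; i < nr_pages; i++) {
		if (nr_ranges > 0) {
			struct toy_page_range *last = &ranges[nr_ranges - 1];

			/* Extend only if same compound page AND next PFN. */
			if (pages[i].head_pfn == last->head_pfn &&
			    pages[i].pfn == last->first_pfn + last->nr_pages) {
				last->nr_pages++;
				continue;
			}
		}
		if (nr_ranges == max_ranges)
			break;
		ranges[nr_ranges].first_pfn = pages[i].pfn;
		ranges[nr_ranges].head_pfn = pages[i].head_pfn;
		ranges[nr_ranges].nr_pages = 1;
		nr_ranges++;
	}
	return nr_ranges;
}
```

In this model, a fully pinned 2 MiB compound page made of 4 KiB pages collapses
from 512 array entries into a single range entry.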

As soon as some mapped pages are no longer folios, we'll likely have to modify
plenty of drivers either way, that blindly cast pages to folios ...

So maybe a folio-range based interface is good enough for now.

-- 
Cheers,

David


* Re: [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration
  2026-06-10  9:54     ` [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration David Hildenbrand (Arm)
@ 2026-06-10 11:34       ` Christoph Hellwig
  2026-06-10 13:18         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2026-06-10 11:34 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Christoph Hellwig, sw.prabhu6, axboe, io-uring, linux-kernel,
	dave, dongjoo.seo1, Swarna Prabhu, linux-mm@kvack.org,
	Matthew Wilcox, Zi Yan

On Wed, Jun 10, 2026 at 11:54:01AM +0200, David Hildenbrand (Arm) wrote:
> > Yes.  iov_iter_extract_bvecs and thus the block direct I/O fast path
> > would instantly benefit from that.
> The tricky bit for such an interface is that, soon, some pages won't be folios,
> but we could still end up with non-folio pages in the address space (e.g.,
> vm_insert_page()) and have to pin+return them. So using folios is not future-proof.

I'm still doubtful on the "soon" because of all the issues like this
in the I/O path.

> There are some long-term plans on providing an interface that would abstract how
> you refcount something you GUP'ed. (because, some pages we GUP in the future
> might not even have a dedicated refcount, all still fairly unclear). But it's
> all not really finalized I think.
> 
> For now, we could expose a folio+page/offset+nr_pages interface, where we,
> long-term, would not be able to return non-folio pages (e.g., vm_insert_page())
> and would instead, in the future, fail the request if we stumble over a
> non-folio thing in the page tables. That sounds reasonable for now.

I think whatever we're going to use for direct I/O has to also support
non-folio pages, especially PCI P2P memory.  So coming up with an
interface that supports this ASAP would be helpful.

> Another solution would be, exposing page-ranges (e.g., page + nr_pages), whereby
> we'd say, that all pages in a range belong to the same compound page, and that
> we took a single reference for all pages in the range. IOW, page_folio() would
> for now be the same for all pages in a range.

This does sound like a reasonable short-term improvement.  One annoying
issue with returning only order-0 pages in the current interfaces is
that it fills up the pages array in the caller for no good reason.




* Re: [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration
  2026-06-10 11:34       ` Christoph Hellwig
@ 2026-06-10 13:18         ` David Hildenbrand (Arm)
  2026-06-10 18:10           ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-10 13:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: sw.prabhu6, axboe, io-uring, linux-kernel, dave, dongjoo.seo1,
	Swarna Prabhu, linux-mm@kvack.org, Matthew Wilcox, Zi Yan

On 6/10/26 13:34, Christoph Hellwig wrote:
> On Wed, Jun 10, 2026 at 11:54:01AM +0200, David Hildenbrand (Arm) wrote:
>>> Yes.  iov_iter_extract_bvecs and thus the block direct I/O fast path
>>> would instantly benefit from that.
>> The tricky bit for such an interface is that, soon, some pages won't be folios,
>> but we could still end up with non-folio pages in the address space (e.g.,
>> vm_insert_page()) and have to pin+return them. So using folios is not future-proof.
> 
> I'm still doubtful on the "soon" because of all the issues like this
> in the I/O path.

Yeah, there are a bunch of very hairy things.

> 
>> There are some long-term plans on providing an interface that would abstract how
>> you refcount something you GUP'ed. (because, some pages we GUP in the future
>> might not even have a dedicated refcount, all still fairly unclear). But it's
>> all not really finalized I think.
>>
>> For now, we could expose a folio+page/offset+nr_pages interface, where we,
>> long-term, would not be able to return non-folio pages (e.g., vm_insert_page())
>> and would instead, in the future, fail the request if we stumble over a
>> non-folio thing in the page tables. That sounds reasonable for now.
> 
> I think whatever we're going to use for direct I/O has to also support
> non-folio pages, especially PCI P2P memory.  So coming up with an
> interface that supports this ASAP would be helpful.

Yes.

I think we can keep returning pages as long as the unpin interface knows the
right thing to do to unpin them.

> 
>> Another solution would be, exposing page-ranges (e.g., page + nr_pages), whereby
>> we'd say, that all pages in a range belong to the same compound page, and that
>> we took a single reference for all pages in the range. IOW, page_folio() would
>> for now be the same for all pages in a range.
> 
> This does sound like a reasonable short-term improvement.

Right, and as long as callers don't cast the returned thing to a folio, it would
be future-proof. But I guess quite a few GUP users cast to folios.

Would there be users for a new interface that returns page ranges as described
above, that would want to still unpin stuff partially? E.g., we give them a page
range that belongs to the same folio with only a single pin/reference, but they
would want to logically split that range and unpin pages individually?

-- 
Cheers,

David



* Re: [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration
  2026-06-10 13:18         ` David Hildenbrand (Arm)
@ 2026-06-10 18:10           ` Matthew Wilcox
  2026-06-10 18:45             ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2026-06-10 18:10 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Christoph Hellwig, sw.prabhu6, axboe, io-uring, linux-kernel,
	dave, dongjoo.seo1, Swarna Prabhu, linux-mm@kvack.org, Zi Yan

On Wed, Jun 10, 2026 at 03:18:52PM +0200, David Hildenbrand (Arm) wrote:
> On 6/10/26 13:34, Christoph Hellwig wrote:
> > On Wed, Jun 10, 2026 at 11:54:01AM +0200, David Hildenbrand (Arm) wrote:
> >> There are some long-term plans on providing an interface that would abstract how
> >> you refcount something you GUP'ed. (because, some pages we GUP in the future
> >> might not even have a dedicated refcount, all still fairly unclear). But it's
> >> all not really finalized I think.
> >>
> >> For now, we could expose a folio+page/offset+nr_pages interface, where we,
> >> long-term, would not be able to return non-folio pages (e.g., vm_insert_page())
> >> and would instead, in the future, fail the request if we stumble over a
> >> non-folio thing in the page tables. That sounds reasonable for now.
> > 
> > I think whatever we're going to use for direct I/O has to also support
> > non-folio pages, especially PCI P2P memory.  So coming up with an
> > interface that supports this ASAP would be helpful.
> 
> Yes.
> 
> I think we can keep returning pages as long as the unpin interface knows the
> right thing to do to unpin them.

This would be the get_user_phyrs() interface I've talked about before.

https://lore.kernel.org/all/ZbVO2RKhw-dLUMvf@casper.infradead.org/
and the long thread:
https://lore.kernel.org/all/YdyKWeU0HTv8m7wD@casper.infradead.org/

> Would there be users for a new interface that returns page ranges as described
> above, that would want to still unpin stuff partially? E.g., we give them a page
> range that belongs to the same folio with only a single pin/reference, but they
> would want to logically split that range and unpin pages individually?

Urgh, no, we shouldn't do that.  Ranges should be pinned / unpinned
as a whole.  I'm sympathetic to "for this special operation we need to
create a new range from this existing range and adjust the refcount(s)
appropriately so each of the two ranges can be put separately", but
I'm not sympathetic to "we need to allow each page to be individually
refcounted".
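
The distinction can be sketched in a small userspace model. Everything here is
an assumption made up for illustration (`toy_backing`, `toy_range`,
`toy_range_pin()`, `toy_range_split()`, `toy_range_put()`); none of it is an
existing or proposed kernel API. Each range holds exactly one reference on its
backing object, and a split hands the tail its own reference so the two halves
can be put independently, without ever refcounting individual pages.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for whatever backs a range (e.g. one compound page). */
struct toy_backing {
	int refcount;
};

/* Hypothetical range: one reference on 'backing' covers all nr_pages. */
struct toy_range {
	struct toy_backing *backing;
	size_t first;		/* index of the first page within the backing */
	size_t nr_pages;
};

/* Pin a whole range: exactly one reference, regardless of nr_pages. */
static struct toy_range toy_range_pin(struct toy_backing *b, size_t first,
				      size_t nr_pages)
{
	struct toy_range r = { .backing = b, .first = first,
			       .nr_pages = nr_pages };

	b->refcount++;
	return r;
}

/* Split 'r' at page offset 'at': 'r' keeps the first 'at' pages, the
 * returned tail covers the rest and takes its own reference. */
static struct toy_range toy_range_split(struct toy_range *r, size_t at)
{
	struct toy_range tail;

	assert(at > 0 && at < r->nr_pages);
	tail.backing = r->backing;
	tail.first = r->first + at;
	tail.nr_pages = r->nr_pages - at;
	r->nr_pages = at;
	r->backing->refcount++;
	return tail;
}

/* Put a whole range: drops the single reference that range holds. */
static void toy_range_put(struct toy_range *r)
{
	assert(r->backing && r->backing->refcount > 0);
	r->backing->refcount--;
	r->backing = NULL;
	r->nr_pages = 0;
}
```

In this model the reference count tracks the number of live ranges, not the
number of pages, which is the property being argued for.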



* Re: [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration
  2026-06-10 18:10           ` Matthew Wilcox
@ 2026-06-10 18:45             ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-10 18:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, sw.prabhu6, axboe, io-uring, linux-kernel,
	dave, dongjoo.seo1, Swarna Prabhu, linux-mm@kvack.org, Zi Yan

On 6/10/26 20:10, Matthew Wilcox wrote:
> On Wed, Jun 10, 2026 at 03:18:52PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/10/26 13:34, Christoph Hellwig wrote:
>>>
>>> I think whatever we're going to use for direct I/O has to also support
>>> non-folio pages, especially PCI P2P memory.  So coming up with an
>>> interface that supports this ASAP would be helpful.
>>
>> Yes.
>>
>> I think we can keep returning pages as long as the unpin interface knows the
>> right thing to do to unpin them.
> 
> This would be the get_user_phyrs() interface I've talked about before.
> 
> https://lore.kernel.org/all/ZbVO2RKhw-dLUMvf@casper.infradead.org/
> and the long thread:
> https://lore.kernel.org/all/YdyKWeU0HTv8m7wD@casper.infradead.org/
> 
>> Would there be users for a new interface that returns page ranges as described
>> above, that would want to still unpin stuff partially? E.g., we give them a page
>> range that belongs to the same folio with only a single pin/reference, but they
>> would want to logically split that range and unpin pages individually?
> 
> Urgh, no, we shouldn't do that.  Ranges should be pinned / unpinned
> as a whole.  I'm sympathetic to "for this special operation we need to
> create a new range from this existing range and adjust the refcount(s)
> appropriately so each of the two ranges can be put separately", but
> I'm not sympathetic to "we need to allow each page to be individually
> refcounted".

Yes, me too. I wanted to understand whether that is a common thing for users to
do, such that we would have to worry about it right from the start.

-- 
Cheers,

David



end of thread, other threads:[~2026-06-10 18:45 UTC | newest]

Thread overview: 5+ messages
     [not found] <20260608062937.804758-1-sw.prabhu6@gmail.com>
     [not found] ` <c924fb59-be47-4fa5-adbf-a50a831ccd7b@kernel.org>
     [not found]   ` <aikBIESiJftxBdfL@infradead.org>
2026-06-10  9:54     ` [RFC v1] io_uring/rsrc: add fast path huge page handling in buffer registration David Hildenbrand (Arm)
2026-06-10 11:34       ` Christoph Hellwig
2026-06-10 13:18         ` David Hildenbrand (Arm)
2026-06-10 18:10           ` Matthew Wilcox
2026-06-10 18:45             ` David Hildenbrand (Arm)