Linux-mm Archive on lore.kernel.org
* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
       [not found]     ` <ajGGpDvzZdkGtSbN@google.com>
@ 2026-06-18 14:10       ` Christoph Hellwig
  2026-06-18 18:20         ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2026-06-18 14:10 UTC
  To: Pranjal Shrivastava
  Cc: Trond Myklebust, linux-nfs, linux-kernel, Anna Schumaker,
	Shivaji Kant, Matthew Wilcox, linux-mm, linux-fsdevel

On Tue, Jun 16, 2026 at 05:23:48PM +0000, Pranjal Shrivastava wrote:
> AFAIU, the MM subsystem explicitly ensures that every valid struct page
> is part of a folio.

It is definitely not what the vision for the folio is, although if
I'm not mistaken it actually is still true right now.  This whole
area is a minefield unfortunately, and we also ran into it with
iov_iter_extract_bvecs and the earlier block code it was extracted
from.  Adding the relevant people and lists, but for now your best
bet is to stick to what the block code does or even better reuse
as much as possible of that code.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-06-18 14:10       ` [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests Christoph Hellwig
@ 2026-06-18 18:20         ` Matthew Wilcox
  2026-06-19 12:32           ` Pranjal Shrivastava
  2026-06-26  6:56           ` Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: Matthew Wilcox @ 2026-06-18 18:20 UTC
  To: Christoph Hellwig
  Cc: Pranjal Shrivastava, Trond Myklebust, linux-nfs, linux-kernel,
	Anna Schumaker, Shivaji Kant, linux-mm, linux-fsdevel

On Thu, Jun 18, 2026 at 07:10:45AM -0700, Christoph Hellwig wrote:
> On Tue, Jun 16, 2026 at 05:23:48PM +0000, Pranjal Shrivastava wrote:
> > AFAIU, the MM subsystem explicitly ensures that every valid struct page
> > is part of a folio.
> 
> It is definitely not what the vision for the folio is, although if
> I'm not mistaken it actually is still true right now.

It's not true, e.g., for slab.  While there's still a struct page there
for slab, there's no refcount and flags like PG_locked have different
meanings.  You'll get into a lot of trouble trying to treat slabs as
folios (and that will include assertions tripping).
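
To make the slab point concrete, here is a minimal userspace model, assuming a simplified page descriptor. The `model_page` struct, the `MODEL_PAGE_*` kinds and both helpers are hypothetical stand-ins invented for this sketch, not kernel APIs; the kernel tracks page kinds differently. It only shows that a folio-based path has to refuse non-folio memory such as slab before touching a refcount or testing PG_locked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page kinds for this model only (not kernel page types). */
enum model_page_kind {
	MODEL_PAGE_FOLIO,	/* ordinary page-cache or anonymous memory */
	MODEL_PAGE_SLAB,	/* slab: no usable refcount, flags mean other things */
};

struct model_page {
	enum model_page_kind kind;
	int refcount;		/* only meaningful for MODEL_PAGE_FOLIO */
};

/* Return the page as a "folio" only when that interpretation is valid;
 * NULL tells the caller it must not take refs or test PG_locked. */
static struct model_page *model_page_folio_checked(struct model_page *page)
{
	assert(page != NULL);
	if (page->kind != MODEL_PAGE_FOLIO)
		return NULL;
	return page;
}

/* Take a reference the way a folio-based direct I/O path might; slab
 * pages are refused instead of having a meaningless counter bumped. */
static int model_try_get_ref(struct model_page *page)
{
	struct model_page *folio = model_page_folio_checked(page);

	if (!folio)
		return -1;
	folio->refcount++;
	return 0;
}
```

In this model the check lives in one helper so every caller gets the same refusal; in the real kernel the equivalent decision belongs to the iov_iter / block helpers rather than to each filesystem.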

> This whole
> area is a minefield unfortunately, and we also ran into it with
> iov_iter_extract_bvecs and the earlier block code it was extracted
> from.  Adding the relevant people and lists, but for now your best
> bet is to stick to what the block code does or even better reuse
> as much as possible of that code.

Yes.  Fundamentally, it is no business of the filesystem what the iov_iter
refers to.  We can do direct I/O to slab memory, vmalloc memory, memory
that doesn't have a struct page (e.g. iomem), or whatever we choose.




* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-06-18 18:20         ` Matthew Wilcox
@ 2026-06-19 12:32           ` Pranjal Shrivastava
  2026-06-26  6:56           ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Pranjal Shrivastava @ 2026-06-19 12:32 UTC
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Trond Myklebust, linux-nfs, linux-kernel,
	Anna Schumaker, Shivaji Kant, linux-mm, linux-fsdevel

On Thu, Jun 18, 2026 at 07:20:06PM +0100, Matthew Wilcox wrote:

Hi Matthew, Christoph, Trond,

> On Thu, Jun 18, 2026 at 07:10:45AM -0700, Christoph Hellwig wrote:
> > On Tue, Jun 16, 2026 at 05:23:48PM +0000, Pranjal Shrivastava wrote:
> > > AFAIU, the MM subsystem explicitly ensures that every valid struct page
> > > is part of a folio.
> > 
> > It is definitely not what the vision for the folio is, although if
> > I'm not mistaken it actually is still true right now.
> 
> It's not true, e.g., for slab.  While there's still a struct page there
> for slab, there's no refcount and flags like PG_locked have different
> meanings.  You'll get into a lot of trouble trying to treat slabs as
> folios (and that will include assertions tripping).
> 
> > This whole
> > area is a minefield unfortunately, and we also ran into it with
> > iov_iter_extract_bvecs and the earlier block code it was extracted
> > from.  Adding the relevant people and lists, but for now your best
> > bet is to stick to what the block code does or even better reuse
> > as much as possible of that code.
> 
> Yes.  Fundamentally, it is no business of the filesystem what the iov_iter
> refers to.  We can do direct I/O to slab memory, vmalloc memory, memory
> that doesn't have a struct page (e.g. iomem), or whatever we choose.
> 

Thanks for the clarification. I understand the larger vision of keeping
filesystems agnostic to the underlying memory represented by the iov_iter.

The documentation for page_folio() [1] mentions that "Every page is part
of a folio," but it appears there are important nuances regarding slab
and other memory types that I was not aware of.

However, I am a bit confused on one point:
Looking at iov_iter_extract_bvecs(), it relies on
get_contig_folio_len() [2], which calls page_folio() on the pages
extracted via iov_iter_extract_pages() without any additional checks for
slab or vmalloc memory.

I am happy to refactor the NFS direct I/O path to reuse the same helper
(get_contig_folio_len()) from the bvec extractor, but I'm a little
confused: doesn't the bvec extractor carry the same risk?
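
For readers following this question, the following is a simplified userspace model of the kind of scan a contiguity helper performs. It is an assumption about its general shape, not the code of get_contig_folio_len(). Starting at page i, it counts how many following entries in the extracted page array are physically consecutive and belong to the same folio, capped at the bytes left. The `model_pinned_page` struct and `model_contig_len()` are hypothetical; the `folio_id` field stands in for the page_folio() lookup, which is exactly the step that presumes every page is a valid folio.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical per-page record: frame number and owning folio id.
 * Filling in folio_id is where real code would call page_folio(). */
struct model_pinned_page {
	unsigned long pfn;
	unsigned long folio_id;
};

/*
 * Return the number of bytes, starting at pages[i] + offset, that lie in
 * physically contiguous pages of the same folio, capped at 'left' bytes.
 * *num_pages is set to the number of pages consumed.
 */
static size_t model_contig_len(const struct model_pinned_page *pages,
			       size_t nr, size_t i, size_t offset,
			       size_t left, size_t *num_pages)
{
	size_t len, j;

	assert(i < nr && offset < MODEL_PAGE_SIZE);
	len = MODEL_PAGE_SIZE - offset;
	if (len > left)
		len = left;
	left -= len;
	for (j = i + 1; j < nr && left > 0; j++) {
		size_t next = left < MODEL_PAGE_SIZE ? left : MODEL_PAGE_SIZE;

		/* Stop at a folio boundary or a physical discontinuity. */
		if (pages[j].folio_id != pages[i].folio_id ||
		    pages[j].pfn != pages[i].pfn + (j - i))
			break;
		len += next;
		left -= next;
	}
	*num_pages = j - i;
	return len;
}
```

If a slab page were fed into this model, its folio_id would be meaningless, which is the same concern raised about calling page_folio() on arbitrary extracted pages.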

Is the recommendation to keep these details abstracted by the iov_iter
lib and eventually hide things like iov_iter_extract_pages() and manual
folio conversions from filesystems entirely?

If that's the case, would it help to export get_contig_folio_len() (or
introduce new helpers) in the iov_iter library for NFS and other
filesystems to use?

Thanks,
Praan

[1] https://elixir.bootlin.com/linux/v7.1-rc6/source/include/linux/page-flags.h#L291
[2] https://elixir.bootlin.com/linux/v7.1/source/lib/iov_iter.c#L1849





* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-06-18 18:20         ` Matthew Wilcox
  2026-06-19 12:32           ` Pranjal Shrivastava
@ 2026-06-26  6:56           ` Christoph Hellwig
  2026-07-03 12:46             ` Pranjal Shrivastava
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2026-06-26  6:56 UTC
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Pranjal Shrivastava, Trond Myklebust,
	linux-nfs, linux-kernel, Anna Schumaker, Shivaji Kant, linux-mm,
	linux-fsdevel

[sorry, dropped the ball a bit on this due to overload]

On Thu, Jun 18, 2026 at 07:20:06PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 18, 2026 at 07:10:45AM -0700, Christoph Hellwig wrote:
> > On Tue, Jun 16, 2026 at 05:23:48PM +0000, Pranjal Shrivastava wrote:
> > > AFAIU, the MM subsystem explicitly ensures that every valid struct page
> > > is part of a folio.
> > 
> > It is definitely not what the vision for the folio is, although if
> > I'm not mistaken it actually is still true right now.
> 
> It's not true, e.g., for slab.  While there's still a struct page there
> for slab, there's no refcount and flags like PG_locked have different
> meanings.  You'll get into a lot of trouble trying to treat slabs as
> folios (and that will include assertions tripping).

True.  But that is not relevant for pinning user memory for direct I/O.
If we stopped having valid folios for anything mapped into userspace,
iov_iter_extract_bvecs would run into problems, and we have discussed
before that, at least right now, it would be hard to fix.

Also, if iov_iter_extract_bvecs were used on kvec or bvec iters, we could
run into the slab problem.  The block code currently makes sure bvec
iters are not handed to iov_iter_extract_bvecs, but there is no such
check for kvec iters, although no one uses them for direct I/O
right now.  Not that I'd want to rely on that in the long run.
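
The kind of guard described here can be modelled as a dispatch on the iterator type before choosing a folio-based extraction path. The enum and `model_can_use_folio_path()` are hypothetical; in the kernel the corresponding tests would go through helpers such as user_backed_iter() and iov_iter_is_bvec(), but this sketch does not claim to reproduce their exact semantics or where the block code places its check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical iterator kinds loosely mirroring ITER_UBUF / ITER_IOVEC
 * (user memory) and ITER_KVEC / ITER_BVEC (kernel-supplied memory). */
enum model_iter_type {
	MODEL_ITER_UBUF,
	MODEL_ITER_IOVEC,
	MODEL_ITER_KVEC,
	MODEL_ITER_BVEC,
};

/* Only user-backed iterators are assumed to reference pinned pages that
 * are valid folios; kvec and bvec iterators may point at slab memory. */
static bool model_can_use_folio_path(enum model_iter_type type)
{
	switch (type) {
	case MODEL_ITER_UBUF:
	case MODEL_ITER_IOVEC:
		return true;
	case MODEL_ITER_KVEC:	/* the case with no guard today */
	case MODEL_ITER_BVEC:	/* excluded by the block code */
	default:
		return false;
	}
}
```

In this model, a caller that gets false would fall back to a path that does not assume folios, rather than relying on kvec iters never being used for direct I/O.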




* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-06-26  6:56           ` Christoph Hellwig
@ 2026-07-03 12:46             ` Pranjal Shrivastava
  2026-07-03 14:00               ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Pranjal Shrivastava @ 2026-07-03 12:46 UTC
  To: Christoph Hellwig
  Cc: Matthew Wilcox, Trond Myklebust, linux-nfs, linux-kernel,
	Anna Schumaker, Shivaji Kant, linux-mm, linux-fsdevel

On Thu, Jun 25, 2026 at 11:56:08PM -0700, Christoph Hellwig wrote:
> [sorry, dropped the ball a bit on this due to overload]
> 
> On Thu, Jun 18, 2026 at 07:20:06PM +0100, Matthew Wilcox wrote:
> > On Thu, Jun 18, 2026 at 07:10:45AM -0700, Christoph Hellwig wrote:
> > > On Tue, Jun 16, 2026 at 05:23:48PM +0000, Pranjal Shrivastava wrote:
> > > > AFAIU, the MM subsystem explicitly ensures that every valid struct page
> > > > is part of a folio.
> > > 
> > > It is definitely not what the vision for the folio is, although if
> > > I'm not mistaken it actually is still true right now.
> > 
> > It's not true, e.g., for slab.  While there's still a struct page there
> > for slab, there's no refcount and flags like PG_locked have different
> > meanings.  You'll get into a lot of trouble trying to treat slabs as
> > folios (and that will include assertions tripping).
> 
> True.  But that is not relevant for pinning user memory for direct I/O.
> If we stopped having valid folios for anything mapped into userspace,
> iov_iter_extract_bvecs would run into problems, and we have discussed
> before that, at least right now, it would be hard to fix.
> 

+1. I see that iov_iter_extract_bvecs also relies on user memory having
valid folios: even if we were to reuse parts of it (get_contig_folio_len),
it still relies on page_folio(), as detailed in my other reply.

> Also, if iov_iter_extract_bvecs were used on kvec or bvec iters, we could
> run into the slab problem.  The block code currently makes sure bvec
> iters are not handed to iov_iter_extract_bvecs, but there is no such
> check for kvec iters, although no one uses them for direct I/O
> right now.  Not that I'd want to rely on that in the long run.
> 

Are there use cases for direct I/O from kernel users? (Just curious to
know if there's something on the horizon.)

Thanks,
Praan



* Re: [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests
  2026-07-03 12:46             ` Pranjal Shrivastava
@ 2026-07-03 14:00               ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2026-07-03 14:00 UTC
  To: Pranjal Shrivastava
  Cc: Christoph Hellwig, Matthew Wilcox, Trond Myklebust, linux-nfs,
	linux-kernel, Anna Schumaker, Shivaji Kant, linux-mm,
	linux-fsdevel

On Fri, Jul 03, 2026 at 12:46:58PM +0000, Pranjal Shrivastava wrote:
> Are there use cases for direct I/O from kernel users? (Just curious to
> know if there's something on the horizon.)

Plenty.  Basically any storage-on-file driver or storage/file system
server: loop, zloop, nvmet, the SCSI target and nfsd are the ones I know
off the top of my head.




end of thread

Thread overview: 6+ messages
     [not found] <20260616134000.2733403-1-praan@google.com>
     [not found] ` <20260616134000.2733403-7-praan@google.com>
     [not found]   ` <7ee3bcfdd6126c93cbb1c219bf601182b95c10d9.camel@kernel.org>
     [not found]     ` <ajGGpDvzZdkGtSbN@google.com>
2026-06-18 14:10       ` [PATCH v2 6/7] nfs: Optimize direct I/O to use folios for requests Christoph Hellwig
2026-06-18 18:20         ` Matthew Wilcox
2026-06-19 12:32           ` Pranjal Shrivastava
2026-06-26  6:56           ` Christoph Hellwig
2026-07-03 12:46             ` Pranjal Shrivastava
2026-07-03 14:00               ` Christoph Hellwig
