linux-kernel.vger.kernel.org archive mirror

* [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Jaegeuk Kim @ 2025-11-21  3:27 UTC
  To: linux-kernel, linux-f2fs-devel
  Cc: Jaegeuk Kim, Matthew Wilcox (Oracle), Christian Brauner

This patch introduces a new POSIX_FADV_MLOCK advice which 1) invalidates the cached
pages in the range, 2) marks the mapping as inaccessible, and 3) falls through to
POSIX_FADV_WILLNEED, which loads the pages directly into the inaccessible mapping.

The inaccessible pages will be invalidated by evict_inode or explicit munlock().
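The sketch below is a minimal userspace example of how an application might issue the proposed advice. It assumes the value 8 from this RFC. That value is not part of any released uapi header, so the code carries a local fallback define. It treats EINVAL as "unsupported", which is what an unpatched kernel returns for unknown advice. The helper name fadv_mlock_file() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Value taken from this RFC; it is NOT in any released uapi header. */
#ifndef POSIX_FADV_MLOCK
#define POSIX_FADV_MLOCK 8
#endif

/*
 * Hypothetical helper: open @path and issue the proposed advice over the
 * whole file (len == 0 means "to end of file", as for other advice).
 * Returns 0 on success or an error number.  posix_fadvise() returns the
 * error number directly rather than setting errno.
 */
static int fadv_mlock_file(const char *path)
{
	int fd, err;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return errno;

	err = posix_fadvise(fd, 0, 0, POSIX_FADV_MLOCK);
	if (err == EINVAL)
		fprintf(stderr, "%s: POSIX_FADV_MLOCK not supported\n", path);
	else if (err)
		fprintf(stderr, "%s: posix_fadvise: %s\n", path, strerror(err));

	close(fd);
	return err;
}
```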

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 include/uapi/linux/fadvise.h |  2 ++
 mm/fadvise.c                 | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h
index 0862b87434c2..06018688b99b 100644
--- a/include/uapi/linux/fadvise.h
+++ b/include/uapi/linux/fadvise.h
@@ -19,4 +19,6 @@
 #define POSIX_FADV_NOREUSE	5 /* Data will be accessed once.  */
 #endif
 
+#define POSIX_FADV_MLOCK	8 /* Load pages into inaccessible map.  */
+
 #endif	/* FADVISE_H_INCLUDED */
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 588fe76c5a14..849b151d2024 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -56,6 +56,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		case POSIX_FADV_WILLNEED:
 		case POSIX_FADV_NOREUSE:
 		case POSIX_FADV_DONTNEED:
+		case POSIX_FADV_MLOCK:
 			/* no bad return value, but ignore advice */
 			break;
 		default:
@@ -93,6 +94,19 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		file->f_mode &= ~FMODE_RANDOM;
 		spin_unlock(&file->f_lock);
 		break;
+	case POSIX_FADV_MLOCK:
+		/* Remove the cached pages. */
+		if (!mapping_unevictable(mapping)) {
+			invalidate_inode_pages2_range(mapping,
+					offset >> PAGE_SHIFT,
+					endbyte >> PAGE_SHIFT);
+
+			/* Mark the mapping inaccessible (this implies unevictable). */
+			filemap_invalidate_lock(mapping);
+			mapping_set_inaccessible(mapping);
+			filemap_invalidate_unlock(mapping);
+		}
+		fallthrough;
 	case POSIX_FADV_WILLNEED:
 		/* First and last PARTIAL page! */
 		start_index = offset >> PAGE_SHIFT;
-- 
2.52.0.487.g5c8c507ade-goog
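The page-index arithmetic in the MLOCK hunk can be checked against the convention that generic_fadvise() already uses. Before its switch statement, that function computes an inclusive endbyte and treats len == 0, or an overflowing range, as "as much as possible". The code below is a userspace model of that conversion. The helper name and the 4 KiB page size are assumptions for illustration; this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, i.e. PAGE_SHIFT == 12. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical userspace model of how generic_fadvise() turns
 * (offset, len) into an inclusive page-index range: len == 0, or a sum
 * past the largest loff_t, means "as far as possible".
 */
static void fadvise_index_range(int64_t offset, int64_t len,
				uint64_t *start_index, uint64_t *end_index)
{
	uint64_t endbyte;

	assert(offset >= 0 && len >= 0);	/* the kernel rejects len < 0 */
	endbyte = (uint64_t)offset + (uint64_t)len;
	if (len == 0 || endbyte > (uint64_t)INT64_MAX)
		endbyte = (uint64_t)INT64_MAX;
	else
		endbyte--;			/* make it inclusive */

	*start_index = (uint64_t)offset >> SKETCH_PAGE_SHIFT;
	*end_index = endbyte >> SKETCH_PAGE_SHIFT;
}
```

Take offset = 8192 and len = 0 as an example. The model yields start index 2 and end index INT64_MAX >> 12. Evaluating (offset + len - 1) >> PAGE_SHIFT directly would instead give 1, which is below the start index.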


* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Matthew Wilcox @ 2025-11-21  4:22 UTC
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On Fri, Nov 21, 2025 at 03:27:18AM +0000, Jaegeuk Kim wrote:
> This patch introduces a new POSIX_FADV_MLOCK which 1) invalidates the range of
> cached pages, 2) sets the mapping as inaccessible, 3) POSIX_FADV_WILLNEED loads
> pages directly to the inaccessible mapping.

... what?

This seems like something which is completely different from mlock().
So it needs a different name.

But I don't understand the point of this, whatever it's called.  Need
more information.


* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Jaegeuk Kim @ 2025-11-21  4:46 UTC
  To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On 11/21, Matthew Wilcox wrote:
> On Fri, Nov 21, 2025 at 03:27:18AM +0000, Jaegeuk Kim wrote:
> > This patch introduces a new POSIX_FADV_MLOCK which 1) invalidates the range of
> > cached pages, 2) sets the mapping as inaccessible, 3) POSIX_FADV_WILLNEED loads
> > pages directly to the inaccessible mapping.
> 
> ... what?
> 
> This seems like something which is completely different from mlock().
> So it needs a different name.
> 
> But I don't understand the point of this, whatever it's called.  Need
> more information.

So, the sequence that I'd like to optimize is mmap(MAP_POPULATE) followed
by mlock(). For example, mmap() takes 1 second to load 4GB of data, and mlock()
takes an additional 330ms to migrate all the pages into the inaccessible
map, IIUC.

So, I'm thinking of combining the two operations into a single fadvise() call
with whatever advice. Does that make sense?
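The as-is sequence can be reproduced roughly with the sketch below, which times each step separately. The helper names are hypothetical. Timings depend entirely on the machine, the file size and the page cache state, so the sketch implies no particular numbers. For unprivileged callers, mlock() is also limited by RLIMIT_MEMLOCK.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC samples. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) * 1e3 +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Hypothetical helper for the as-is flow: mmap(MAP_POPULATE) the whole file
 * read-only, then mlock() it, timing each step.  On success returns 0 and
 * hands back the mapping (the caller must munmap() it); otherwise returns
 * an errno value.
 */
static int populate_then_mlock(const char *path, void **addr, size_t *len,
			       double *populate_ms, double *mlock_ms)
{
	struct timespec t0, t1, t2;
	struct stat st;
	void *p;
	int fd, err = 0;

	assert(path && addr && len && populate_ms && mlock_ms);
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return errno;
	if (fstat(fd, &st) < 0) {
		err = errno;
		goto out;
	}
	if (st.st_size == 0) {
		err = EINVAL;		/* nothing to map */
		goto out;
	}
	*len = (size_t)st.st_size;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* MAP_POPULATE reads the file in and prefaults the page tables. */
	p = mmap(NULL, *len, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
	if (p == MAP_FAILED) {
		err = errno;
		goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* The pages are already resident; mlock() marks them as locked. */
	if (mlock(p, *len) < 0) {
		err = errno;
		munmap(p, *len);
		goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t2);

	*addr = p;
	*populate_ms = elapsed_ms(&t0, &t1);
	*mlock_ms = elapsed_ms(&t1, &t2);
out:
	close(fd);		/* the mapping, if any, survives close() */
	return err;
}
```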


* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Matthew Wilcox @ 2025-11-21 14:27 UTC
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On Fri, Nov 21, 2025 at 04:46:14AM +0000, Jaegeuk Kim wrote:
> So, the sequence that I'd like to optimize is mmap(MAP_POPULATE) followed
> by  mlock(). For example, mmap() takes 1 second to load 4GB data, and mlock()
> takes 330ms additionally in order to migrate all the pages into inaccessible
> map, IIUC.

Oh, so the MLOCK part is right, but the inaccessible() part is wrong.
Inaccessible is special weird guest_memfd crap that has all kinds of
side-effects that you don't want.

Wouldn't you get the same effect by calling mlock2(MLOCK_ONFAULT) and
then calling readahead() for the desired range?
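The suggestion can be sketched in userspace as follows, assuming the file is already mmap()ed at offset 0. mlock2() with MLOCK_ONFAULT (glibc 2.27 or later) and readahead() are existing Linux interfaces; the helper name is hypothetical. readahead() only populates the page cache. Pages are mapped, and therefore locked, only as they are first touched.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper for the suggestion above.  @addr/@len must describe
 * an existing mmap() of @fd starting at file offset 0.  MLOCK_ONFAULT
 * locks pages only as they are faulted in; readahead() then starts I/O
 * into the page cache without mapping anything into this VMA.
 * Returns 0 or an errno value.
 */
static int lock_onfault_then_readahead(int fd, void *addr, size_t len)
{
	int err;

	assert(fd >= 0 && addr != NULL && len > 0);
	if (mlock2(addr, len, MLOCK_ONFAULT) < 0)
		return errno;

	if (readahead(fd, 0, len) < 0) {
		err = errno;
		munlock(addr, len);	/* undo the lock on failure */
		return err;
	}
	return 0;
}
```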


* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Jaegeuk Kim @ 2025-11-21 18:02 UTC
  To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On 11/21, Matthew Wilcox wrote:
> Oh, so the MLOCK part is right, but the inaccessible() part is wrong.
> Inaccessible is special weird guest_memfd crap that has all kinds of
> side-effects that you don't want.
> 
> Wouldn't you get the same effect by calling mlock2(MLOCK_ONFAULT) and
> then calling readahead() for the desired range?

Oh, thank you. Let me try.

* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Jaegeuk Kim @ 2025-11-21 19:52 UTC
  To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On 11/21, Jaegeuk Kim wrote:
> On 11/21, Matthew Wilcox wrote:
> > Oh, so the MLOCK part is right, but the inaccessible() part is wrong.
> > Inaccessible is special weird guest_memfd crap that has all kinds of
> > side-effects that you don't want.
> > 
> > Wouldn't you get the same effect by calling mlock2(MLOCK_ONFAULT) and
> > then calling readahead() for the desired range?
> 
> Oh, thank you. Let me try.

After checking the code and experimenting, I don't think that gives us what we need.
That flag only skips populate_vma_page_range(), but we need to allocate pages
in the inaccessible mapping and fill them afterwards.

* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Matthew Wilcox @ 2025-11-21 19:58 UTC
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On Fri, Nov 21, 2025 at 07:52:02PM +0000, Jaegeuk Kim wrote:
> After checking the code and experiment, I don't think that gives what we need.
> That flag skips populate_vma_page_range only, but we need to allocate pages
> in the inaccessible mapping and fill the pages afterwards.

Then either I don't understand what you're trying to do, or you don't
understand what the inaccessible mapping is for.  Is this just for
speeding up mlock() as you suggested earlier, or are you genuinely
trying to do something with the inaccessible mapping?

* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Jaegeuk Kim @ 2025-11-21 21:32 UTC
  To: Matthew Wilcox; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On 11/21, Matthew Wilcox wrote:
> Then either I don't understand what you're trying to do, or you don't
> understand what the inaccessible mapping is for.  Is this just for
> speeding up mlock() as you suggested earlier, or are you genuinely
> trying to do something with the inaccessible mapping?

The latter. I'd like to propose a new read flow with the inaccessible mapping.

As-Is:
 mmap() -> fadvise(fd, POSIX_FADV_WILLNEED) -> mlock()

1. fadvise() proposal
 mmap() -> fadvise(fd, POSIX_FADV_MLOCK)
 : all the pages will be loaded directly into the inaccessible page cache

2. mlock2() proposal
 mmap() -> mlock2(MLOCK_ONFAULT) -> madvise(MADV_POPULATE_READ)

If you mean #2, I need to find out whether we can get an address range for
madvise(), since we only have an fd when reading the pages. I also need to find
a way to handle the folio order directly instead of starting from order 0 in the
madvise() path. Let me think about it.
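Flow #2 can be sketched roughly as follows. The sketch illustrates the constraint just mentioned: madvise() needs a user address range, so the helper has to mmap() the fd itself. MADV_POPULATE_READ requires Linux 5.14 or later, and the fallback define uses its uapi value of 22. The helper name is hypothetical, and the sketch does not address the folio-order question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Added in Linux 5.14; older libc headers may not define it. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/*
 * Hypothetical helper for flow #2: map the file, lock it on fault, then
 * prefault the whole range readably, which should leave every page both
 * mapped and locked.  Returns 0 (with *addr/*len set) or an errno value.
 */
static int map_lock_populate(int fd, void **addr, size_t *len)
{
	struct stat st;
	size_t size;
	void *p;
	int err;

	assert(fd >= 0 && addr && len);
	if (fstat(fd, &st) < 0)
		return errno;
	if (st.st_size == 0)
		return EINVAL;
	size = (size_t)st.st_size;

	/* madvise() works on addresses, so an mmap() of the fd is required. */
	p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return errno;

	if (mlock2(p, size, MLOCK_ONFAULT) < 0 ||
	    madvise(p, size, MADV_POPULATE_READ) < 0) {
		err = errno;
		munmap(p, size);	/* also drops any lock taken */
		return err;
	}
	*addr = p;
	*len = size;
	return 0;
}
```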

* Re: [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK
From: Matthew Wilcox @ 2025-11-22  2:47 UTC
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, Christian Brauner

On Fri, Nov 21, 2025 at 09:32:12PM +0000, Jaegeuk Kim wrote:
> The latter. I'd like to propose a new read flow with the inaccessible mapping.

You REALLY REALLY REALLY need to explain what you're doing because this
all sounds completely bogus.

The inaccessible mapping is something special that guest_memfd does.
But here you are talking about it like it's some kind of normal
filesystem thing.

So, from the top.  What are you trying to accomplish?  Starting from "We
have application A.  It wants to ..."


Thread overview: 9+ messages
2025-11-21  3:27 [PATCH] [RFC] mm/fadvise: introduce POSIX_FADV_MLOCK Jaegeuk Kim
2025-11-21  4:22 ` Matthew Wilcox
2025-11-21  4:46   ` Jaegeuk Kim
2025-11-21 14:27     ` Matthew Wilcox
2025-11-21 18:02       ` Jaegeuk Kim
2025-11-21 19:52         ` Jaegeuk Kim
2025-11-21 19:58           ` Matthew Wilcox
2025-11-21 21:32             ` Jaegeuk Kim
2025-11-22  2:47               ` Matthew Wilcox
