* [PATCH 0/2] mm: Add readahead support for IOCB_NOWAIT
@ 2024-08-12 9:05 Yafang Shao
2024-08-12 9:05 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Yafang Shao
2024-08-12 9:05 ` [PATCH 2/2] mm: allow read-ahead with IOCB_NOWAIT set Yafang Shao
0 siblings, 2 replies; 37+ messages in thread
From: Yafang Shao @ 2024-08-12 9:05 UTC (permalink / raw)
To: akpm, viro, brauner, jack, david; +Cc: linux-fsdevel, linux-mm, Yafang Shao
We have use cases where we want to trigger readahead in preadv2(2) with
RWF_NOWAIT set [0].
Readahead support was originally added for IOCB_NOWAIT in commit
2e85abf053b9 ("mm: allow read-ahead with IOCB_NOWAIT set"). However, this
behavior was modified in commit efa8480a8316 ("fs: RWF_NOWAIT should imply
IOCB_NOIO") due to concerns about potential blocking during memory
reclamation.
To prevent blocking on memory reclamation, we can utilize
memalloc_nowait_{save,restore} to ensure non-blocking behavior. By
restoring the original behavior, we can allow preadv2(IOCB_NOWAIT) to read
data directly from disk if it's not already in the page cache.
Changes:
- fs: Add a new flag RWF_IOWAIT for preadv2(2)
https://lore.kernel.org/linux-fsdevel/20240804080251.21239-1-laoar.shao@gmail.com/ [0]
Yafang Shao (2):
mm: Add memalloc_nowait_{save,restore}
mm: allow read-ahead with IOCB_NOWAIT set
include/linux/fs.h | 1 -
include/linux/sched/mm.h | 30 ++++++++++++++++++++++++++++++
mm/filemap.c | 6 ++++++
3 files changed, 36 insertions(+), 1 deletion(-)
--
2.43.5
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 9:05 [PATCH 0/2] mm: Add readahead support for IOCB_NOWAIT Yafang Shao
@ 2024-08-12 9:05 ` Yafang Shao
2024-08-12 11:37 ` Christoph Hellwig
2024-08-14 0:28 ` Dave Chinner
2024-08-12 9:05 ` [PATCH 2/2] mm: allow read-ahead with IOCB_NOWAIT set Yafang Shao
1 sibling, 2 replies; 37+ messages in thread
From: Yafang Shao @ 2024-08-12 9:05 UTC (permalink / raw)
To: akpm, viro, brauner, jack, david
Cc: linux-fsdevel, linux-mm, Yafang Shao, Kent Overstreet
The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
this, let's add two helper functions, memalloc_nowait_{save,restore}, which
will be useful in scenarios where we want to avoid waiting for memory
reclamation.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
---
include/linux/sched/mm.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 91546493c43d..4fbae0013166 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -484,6 +484,36 @@ static inline void memalloc_pin_restore(unsigned int flags)
memalloc_flags_restore(flags);
}
+/**
+ * memalloc_nowait_save - Marks implicit ~__GFP_DIRECT_RECLAIM allocation scope.
+ *
+ * This functions marks the beginning of the ~__GFP_DIRECT_RECLAIM allocation
+ * scope. All further allocations will implicitly drop __GFP_DIRECT_RECLAIM flag
+ * and so they are won't wait for direct memory reclaiming. Use
+ * memalloc_nowait_restore to end the scope with flags returned by this
+ * function.
+ *
+ * Context: This function is safe to be used from any context.
+ * Return: The saved flags to be passed to memalloc_nowait_restore.
+ */
+static inline unsigned int memalloc_nowait_save(void)
+{
+ return memalloc_flags_save(PF_MEMALLOC_NORECLAIM);
+}
+
+/**
+ * memalloc_nofs_restore - Ends the implicit ~__GFP_DIRECT_RECLAIM scope.
+ * @flags: Flags to restore.
+ *
+ * Ends the implicit ~__GFP_DIRECT_RECLAIM scope started by memalloc_nowait_save
+ * function. Always make sure that the given flags is the return value from the
+ * pairing memalloc_nowait_save call.
+ */
+static inline void memalloc_nowait_restore(unsigned int flags)
+{
+ memalloc_flags_restore(flags);
+}
+
#ifdef CONFIG_MEMCG
DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
/**
--
2.43.5
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 2/2] mm: allow read-ahead with IOCB_NOWAIT set
2024-08-12 9:05 [PATCH 0/2] mm: Add readahead support for IOCB_NOWAIT Yafang Shao
2024-08-12 9:05 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Yafang Shao
@ 2024-08-12 9:05 ` Yafang Shao
1 sibling, 0 replies; 37+ messages in thread
From: Yafang Shao @ 2024-08-12 9:05 UTC (permalink / raw)
To: akpm, viro, brauner, jack, david
Cc: linux-fsdevel, linux-mm, Yafang Shao, Jens Axboe, Matthew Wilcox
Readahead support for IOCB_NOWAIT was introduced in commit 2e85abf053b9
("mm: allow read-ahead with IOCB_NOWAIT set"). However, this behavior was
altered in commit efa8480a8316 ("fs: RWF_NOWAIT should imply IOCB_NOIO") to
prevent potential blocking during memory reclamation.
To address the blocking issue during memory reclamation, we can use
memalloc_nowait_{save,restore} to ensure non-blocking behavior. With this
change, we can restore the original functionality, allowing
preadv2(IOCB_NOWAIT) to read data directly from the disk if it's not
already in the page cache.
Link: https://lore.gnuweeb.org/io-uring/20200624164127.GP21350@casper.infradead.org/
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
---
include/linux/fs.h | 1 -
mm/filemap.c | 6 ++++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..ced74b1b350d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3455,7 +3455,6 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags,
if (flags & RWF_NOWAIT) {
if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
return -EOPNOTSUPP;
- kiocb_flags |= IOCB_NOIO;
}
if (flags & RWF_ATOMIC) {
if (rw_type != WRITE)
diff --git a/mm/filemap.c b/mm/filemap.c
index 657bcd887fdb..93c690e1d1fd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -46,6 +46,7 @@
#include <linux/pipe_fs_i.h>
#include <linux/splice.h>
#include <linux/rcupdate_wait.h>
+#include <linux/sched/mm.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "internal.h"
@@ -2514,6 +2515,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
pgoff_t last_index;
struct folio *folio;
+ unsigned int flags;
int err = 0;
/* "last_index" is the index of the page beyond the end of the read */
@@ -2526,8 +2528,12 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & IOCB_NOIO)
return -EAGAIN;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ flags = memalloc_nowait_save();
page_cache_sync_readahead(mapping, ra, filp, index,
last_index - index);
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ memalloc_nowait_restore(flags);
filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
}
if (!folio_batch_count(fbatch)) {
--
2.43.5
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 9:05 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Yafang Shao
@ 2024-08-12 11:37 ` Christoph Hellwig
2024-08-12 12:59 ` Yafang Shao
2024-08-12 16:48 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Kent Overstreet
2024-08-14 0:28 ` Dave Chinner
1 sibling, 2 replies; 37+ messages in thread
From: Christoph Hellwig @ 2024-08-12 11:37 UTC (permalink / raw)
To: Yafang Shao
Cc: akpm, viro, brauner, jack, david, linux-fsdevel, linux-mm,
Kent Overstreet
On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> will be useful in scenarios where we want to avoid waiting for memory
> reclamation.
No, forcing nowait on callee contets is just asking for trouble.
Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
and thus will lead to kernel crashes.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 11:37 ` Christoph Hellwig
@ 2024-08-12 12:59 ` Yafang Shao
2024-08-12 13:21 ` Christoph Hellwig
2024-08-14 7:42 ` Michal Hocko
2024-08-12 16:48 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Kent Overstreet
1 sibling, 2 replies; 37+ messages in thread
From: Yafang Shao @ 2024-08-12 12:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: akpm, viro, brauner, jack, david, linux-fsdevel, linux-mm,
Kent Overstreet
On Mon, Aug 12, 2024 at 7:37 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > will be useful in scenarios where we want to avoid waiting for memory
> > reclamation.
>
> No, forcing nowait on callee contets is just asking for trouble.
> Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
I don’t see any incompatibility in __alloc_pages_slowpath(). The
~__GFP_DIRECT_RECLAIM flag only ensures that direct reclaim is not
performed, but it doesn’t prevent the allocation of pages from
ALLOC_MIN_RESERVE, correct?
> and thus will lead to kernel crashes.
Could you please explain in detail where this might lead to kernel crashes?
--
Regards
Yafang
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 12:59 ` Yafang Shao
@ 2024-08-12 13:21 ` Christoph Hellwig
2024-08-13 2:09 ` Yafang Shao
2024-08-14 7:42 ` Michal Hocko
1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2024-08-12 13:21 UTC (permalink / raw)
To: Yafang Shao
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Mon, Aug 12, 2024 at 08:59:53PM +0800, Yafang Shao wrote:
>
> I don’t see any incompatibility in __alloc_pages_slowpath(). The
> ~__GFP_DIRECT_RECLAIM flag only ensures that direct reclaim is not
> performed, but it doesn’t prevent the allocation of pages from
> ALLOC_MIN_RESERVE, correct?
>
> > and thus will lead to kernel crashes.
>
> Could you please explain in detail where this might lead to kernel crashes?
Sorry, I misread your patch as doing what your subject says.
A nestable noreclaim is probably fine, but please name it that way,
as memalloc_nowait_{save,restore} implies a context version
of GFP_NOWAIT.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 11:37 ` Christoph Hellwig
2024-08-12 12:59 ` Yafang Shao
@ 2024-08-12 16:48 ` Kent Overstreet
2024-08-14 5:24 ` Christoph Hellwig
1 sibling, 1 reply; 37+ messages in thread
From: Kent Overstreet @ 2024-08-12 16:48 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Yafang Shao, akpm, viro, brauner, jack, david, linux-fsdevel,
linux-mm
On Mon, Aug 12, 2024 at 04:37:58AM GMT, Christoph Hellwig wrote:
> On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > will be useful in scenarios where we want to avoid waiting for memory
> > reclamation.
>
> No, forcing nowait on callee contets is just asking for trouble.
> Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
> and thus will lead to kernel crashes.
No different from passing GFP_NOWAIT to mempoool_alloc(), and we're
trying to get away from passing gfp flags directly for multiple reasons
so we need it.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 13:21 ` Christoph Hellwig
@ 2024-08-13 2:09 ` Yafang Shao
2024-08-14 5:27 ` Christoph Hellwig
0 siblings, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-13 2:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: akpm, viro, brauner, jack, david, linux-fsdevel, linux-mm,
Kent Overstreet
On Mon, Aug 12, 2024 at 9:21 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Aug 12, 2024 at 08:59:53PM +0800, Yafang Shao wrote:
> >
> > I don’t see any incompatibility in __alloc_pages_slowpath(). The
> > ~__GFP_DIRECT_RECLAIM flag only ensures that direct reclaim is not
> > performed, but it doesn’t prevent the allocation of pages from
> > ALLOC_MIN_RESERVE, correct?
> >
> > > and thus will lead to kernel crashes.
> >
> > Could you please explain in detail where this might lead to kernel crashes?
>
> Sorry, I misread your patch as doing what your subject says.
> A nestable noreclaim is probably fine, but please name it that way,
> as memalloc_nowait_{save,restore} implies a context version
> of GFP_NOWAIT.
There are already memalloc_noreclaim_{save,restore} which imply __GFP_MEMALLOC:
memalloc_noreclaim_save - Marks implicit __GFP_MEMALLOC scope.
That is why I name it memalloc_nowait_{save,restore}. GFP_NOWAIT has
the same meaning with ~__GFP_DIRECT_RECLAIM:
%GFP_NOWAIT is for kernel allocations that should not stall for direct
reclaim, start physical IO or use any filesystem callback.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 9:05 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Yafang Shao
2024-08-12 11:37 ` Christoph Hellwig
@ 2024-08-14 0:28 ` Dave Chinner
2024-08-14 2:19 ` Yafang Shao
1 sibling, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2024-08-14 0:28 UTC (permalink / raw)
To: Yafang Shao
Cc: akpm, viro, brauner, jack, linux-fsdevel, linux-mm,
Kent Overstreet
On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> will be useful in scenarios where we want to avoid waiting for memory
> reclamation.
Readahead already uses this context:
static inline gfp_t readahead_gfp_mask(struct address_space *x)
{
return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
}
and __GFP_NORETRY means minimal direct reclaim should be performed.
Most filesystems already have GFP_NOFS context from
mapping_gfp_mask(), so how much difference does completely avoiding
direct reclaim actually make under memory pressure?
i.e. doing some direct reclaim without blocking when under memory
pressure might actually give better performance than skipping direct
reclaim and aborting readahead altogether....
This really, really needs some numbers (both throughput and IO
latency histograms) to go with it because we have no evidence either
way to determine what is the best approach here.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 0:28 ` Dave Chinner
@ 2024-08-14 2:19 ` Yafang Shao
2024-08-14 5:42 ` Dave Chinner
0 siblings, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-14 2:19 UTC (permalink / raw)
To: Dave Chinner
Cc: akpm, viro, brauner, jack, linux-fsdevel, linux-mm,
Kent Overstreet
On Wed, Aug 14, 2024 at 8:28 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > will be useful in scenarios where we want to avoid waiting for memory
> > reclamation.
>
> Readahead already uses this context:
>
> static inline gfp_t readahead_gfp_mask(struct address_space *x)
> {
> return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
> }
>
> and __GFP_NORETRY means minimal direct reclaim should be performed.
> Most filesystems already have GFP_NOFS context from
> mapping_gfp_mask(), so how much difference does completely avoiding
> direct reclaim actually make under memory pressure?
Besides the __GFP_NOFS , ~__GFP_DIRECT_RECLAIM also implies
__GPF_NOIO. If we don't set __GPF_NOIO, the readahead can wait for IO,
right?
>
> i.e. doing some direct reclaim without blocking when under memory
> pressure might actually give better performance than skipping direct
> reclaim and aborting readahead altogether....
>
> This really, really needs some numbers (both throughput and IO
> latency histograms) to go with it because we have no evidence either
> way to determine what is the best approach here.
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
--
Regards
Yafang
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 16:48 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Kent Overstreet
@ 2024-08-14 5:24 ` Christoph Hellwig
0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2024-08-14 5:24 UTC (permalink / raw)
To: Kent Overstreet
Cc: Christoph Hellwig, Yafang Shao, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm
On Mon, Aug 12, 2024 at 12:48:26PM -0400, Kent Overstreet wrote:
> > Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
> > and thus will lead to kernel crashes.
>
> No different from passing GFP_NOWAIT to mempoool_alloc(), and we're
> trying to get away from passing gfp flags directly for multiple reasons
> so we need it.
It is very different. Passing GFP_NOWAIT to mempool_alloc affects
the piece of code calling mempool_alloc. Setting a GFP_NOWAIT context
affects all called code, i.e. all block drivers under the file system,
and possible another file system under the loop driver. And none of
them expect the consequences of that.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-13 2:09 ` Yafang Shao
@ 2024-08-14 5:27 ` Christoph Hellwig
2024-08-14 7:33 ` Yafang Shao
0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2024-08-14 5:27 UTC (permalink / raw)
To: Yafang Shao
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet, Vlastimil Babka,
Michal Hocko
> There are already memalloc_noreclaim_{save,restore} which imply __GFP_MEMALLOC:
>
> memalloc_noreclaim_save - Marks implicit __GFP_MEMALLOC scope.
.. and those are horrible misnamed :(
If we can't even keep our APIs consistently name, who is supposed
to understand all this?
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 2:19 ` Yafang Shao
@ 2024-08-14 5:42 ` Dave Chinner
2024-08-14 7:32 ` Yafang Shao
0 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2024-08-14 5:42 UTC (permalink / raw)
To: Yafang Shao
Cc: akpm, viro, brauner, jack, linux-fsdevel, linux-mm,
Kent Overstreet
On Wed, Aug 14, 2024 at 10:19:36AM +0800, Yafang Shao wrote:
> On Wed, Aug 14, 2024 at 8:28 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > will be useful in scenarios where we want to avoid waiting for memory
> > > reclamation.
> >
> > Readahead already uses this context:
> >
> > static inline gfp_t readahead_gfp_mask(struct address_space *x)
> > {
> > return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
> > }
> >
> > and __GFP_NORETRY means minimal direct reclaim should be performed.
> > Most filesystems already have GFP_NOFS context from
> > mapping_gfp_mask(), so how much difference does completely avoiding
> > direct reclaim actually make under memory pressure?
>
> Besides the __GFP_NOFS , ~__GFP_DIRECT_RECLAIM also implies
> __GPF_NOIO. If we don't set __GPF_NOIO, the readahead can wait for IO,
> right?
There's a *lot* more difference between __GFP_NORETRY and
__GFP_NOWAIT than just __GFP_NOIO. I don't need you to try to
describe to me what the differences are; What I'm asking you is this:
> > i.e. doing some direct reclaim without blocking when under memory
> > pressure might actually give better performance than skipping direct
> > reclaim and aborting readahead altogether....
> >
> > This really, really needs some numbers (both throughput and IO
> > latency histograms) to go with it because we have no evidence either
> > way to determine what is the best approach here.
Put simply: does the existing readahead mechanism give better results
than the proposed one, and if so, why wouldn't we just reenable
readahead unconditionally instead of making it behave differently
for this specific case?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 5:42 ` Dave Chinner
@ 2024-08-14 7:32 ` Yafang Shao
2024-08-15 2:54 ` Dave Chinner
0 siblings, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-14 7:32 UTC (permalink / raw)
To: Dave Chinner
Cc: akpm, viro, brauner, jack, linux-fsdevel, linux-mm,
Kent Overstreet
On Wed, Aug 14, 2024 at 1:42 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Aug 14, 2024 at 10:19:36AM +0800, Yafang Shao wrote:
> > On Wed, Aug 14, 2024 at 8:28 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > reclamation.
> > >
> > > Readahead already uses this context:
> > >
> > > static inline gfp_t readahead_gfp_mask(struct address_space *x)
> > > {
> > > return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
> > > }
> > >
> > > and __GFP_NORETRY means minimal direct reclaim should be performed.
> > > Most filesystems already have GFP_NOFS context from
> > > mapping_gfp_mask(), so how much difference does completely avoiding
> > > direct reclaim actually make under memory pressure?
> >
> > Besides the __GFP_NOFS , ~__GFP_DIRECT_RECLAIM also implies
> > __GPF_NOIO. If we don't set __GPF_NOIO, the readahead can wait for IO,
> > right?
>
> There's a *lot* more difference between __GFP_NORETRY and
> __GFP_NOWAIT than just __GFP_NOIO. I don't need you to try to
> describe to me what the differences are; What I'm asking you is this:
>
> > > i.e. doing some direct reclaim without blocking when under memory
> > > pressure might actually give better performance than skipping direct
> > > reclaim and aborting readahead altogether....
> > >
> > > This really, really needs some numbers (both throughput and IO
> > > latency histograms) to go with it because we have no evidence either
> > > way to determine what is the best approach here.
>
> Put simply: does the existing readahead mechanism give better results
> than the proposed one, and if so, why wouldn't we just reenable
> readahead unconditionally instead of making it behave differently
> for this specific case?
Are you suggesting we compare the following change with the current proposal?
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..ced74b1b350d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3455,7 +3455,6 @@ static inline int kiocb_set_rw_flags(struct
kiocb *ki, rwf_t flags,
if (flags & RWF_NOWAIT) {
if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
return -EOPNOTSUPP;
- kiocb_flags |= IOCB_NOIO;
}
if (flags & RWF_ATOMIC) {
if (rw_type != WRITE)
Doesn't unconditional readahead break the semantics of RWF_NOWAIT,
which is supposed to avoid waiting for I/O? For example, it might
trigger a pageout for a dirty page.
--
Regards
Yafang
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 5:27 ` Christoph Hellwig
@ 2024-08-14 7:33 ` Yafang Shao
2024-09-01 20:24 ` Vlastimil Babka
0 siblings, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-14 7:33 UTC (permalink / raw)
To: Christoph Hellwig
Cc: akpm, viro, brauner, jack, david, linux-fsdevel, linux-mm,
Kent Overstreet, Vlastimil Babka, Michal Hocko
On Wed, Aug 14, 2024 at 1:27 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> > There are already memalloc_noreclaim_{save,restore} which imply __GFP_MEMALLOC:
> >
> > memalloc_noreclaim_save - Marks implicit __GFP_MEMALLOC scope.
>
> .. and those are horrible misnamed :(
What about renaming it to memalloc_memalloc_save ?
>
> If we can't even keep our APIs consistently name, who is supposed
> to understand all this?
>
--
Regards
Yafang
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-12 12:59 ` Yafang Shao
2024-08-12 13:21 ` Christoph Hellwig
@ 2024-08-14 7:42 ` Michal Hocko
2024-08-14 8:12 ` Yafang Shao
1 sibling, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2024-08-14 7:42 UTC (permalink / raw)
To: Yafang Shao
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Mon 12-08-24 20:59:53, Yafang Shao wrote:
> On Mon, Aug 12, 2024 at 7:37 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > will be useful in scenarios where we want to avoid waiting for memory
> > > reclamation.
> >
> > No, forcing nowait on callee contets is just asking for trouble.
> > Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
>
> I don’t see any incompatibility in __alloc_pages_slowpath(). The
> ~__GFP_DIRECT_RECLAIM flag only ensures that direct reclaim is not
> performed, but it doesn’t prevent the allocation of pages from
> ALLOC_MIN_RESERVE, correct?
Right but this means that you just made any potential nested allocation
within the scope that is GFP_NOFAIL a busy loop essentially. Not to
mention it BUG_ON as non-sleeping GFP_NOFAIL allocations are
unsupported. I believe this is what Christoph had in mind. I am really
surprised that we even have PF_MEMALLOC_NORECLAIM in the first place!
Unsurprisingly this was merged without any review by the MM community :/
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 7:42 ` Michal Hocko
@ 2024-08-14 8:12 ` Yafang Shao
2024-08-14 12:43 ` Michal Hocko
0 siblings, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-14 8:12 UTC (permalink / raw)
To: Michal Hocko
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Wed, Aug 14, 2024 at 3:42 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 12-08-24 20:59:53, Yafang Shao wrote:
> > On Mon, Aug 12, 2024 at 7:37 PM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > reclamation.
> > >
> > > No, forcing nowait on callee contets is just asking for trouble.
> > > Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
> >
> > I don’t see any incompatibility in __alloc_pages_slowpath(). The
> > ~__GFP_DIRECT_RECLAIM flag only ensures that direct reclaim is not
> > performed, but it doesn’t prevent the allocation of pages from
> > ALLOC_MIN_RESERVE, correct?
>
> Right but this means that you just made any potential nested allocation
> within the scope that is GFP_NOFAIL a busy loop essentially. Not to
> mention it BUG_ON as non-sleeping GFP_NOFAIL allocations are
> unsupported. I believe this is what Christoph had in mind.
If that's the case, I believe we should at least consider adding the
following code change to the kernel:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ecf99190ea2..89411ee23c7f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4168,6 +4168,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned int zonelist_iter_cookie;
int reserve_flags;
+ WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL && !(gfp_mask &
__GFP_DIRECT_RECLAIM));
restart:
compaction_retries = 0;
no_progress_loops = 0;
> I am really
> surprised that we even have PF_MEMALLOC_NORECLAIM in the first place!
There's use cases for it.
> Unsurprisingly this was merged without any review by the MM community :/
> --
> Michal Hocko
> SUSE Labs
--
Regards
Yafang
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 8:12 ` Yafang Shao
@ 2024-08-14 12:43 ` Michal Hocko
2024-08-15 3:26 ` Yafang Shao
0 siblings, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2024-08-14 12:43 UTC (permalink / raw)
To: Yafang Shao
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Wed 14-08-24 16:12:27, Yafang Shao wrote:
> On Wed, Aug 14, 2024 at 3:42 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 12-08-24 20:59:53, Yafang Shao wrote:
> > > On Mon, Aug 12, 2024 at 7:37 PM Christoph Hellwig <hch@infradead.org> wrote:
> > > >
> > > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > > reclamation.
> > > >
> > > > No, forcing nowait on callee contets is just asking for trouble.
> > > > Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
> > >
> > > I don’t see any incompatibility in __alloc_pages_slowpath(). The
> > > ~__GFP_DIRECT_RECLAIM flag only ensures that direct reclaim is not
> > > performed, but it doesn’t prevent the allocation of pages from
> > > ALLOC_MIN_RESERVE, correct?
> >
> > Right but this means that you just made any potential nested allocation
> > within the scope that is GFP_NOFAIL a busy loop essentially. Not to
> > mention it BUG_ON as non-sleeping GFP_NOFAIL allocations are
> > unsupported. I believe this is what Christoph had in mind.
>
> If that's the case, I believe we should at least consider adding the
> following code change to the kernel:
We already do have that
/*
* All existing users of the __GFP_NOFAIL are blockable, so warn
* of any new users that actually require GFP_NOWAIT
*/
if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
goto fail;
But Barry has patches to turn that into BUG because failing NOFAIL
allocations is not cool and cause unexpected failures. Have a look at
https://lore.kernel.org/all/20240731000155.109583-1-21cnbao@gmail.com/
> > I am really
> > surprised that we even have PF_MEMALLOC_NORECLAIM in the first place!
>
> There's use cases for it.
Right but there are certain constrains that we need to worry about to
have a maintainable code. Scope allocation contrains are really a good
feature when that has a well defined semantic. E.g. NOFS, NOIO or
NOMEMALLOC (although this is more self inflicted injury exactly because
PF_MEMALLOC had a "use case"). NOWAIT scope semantic might seem a good
feature but it falls appart on nested NOFAIL allocations! So the flag is
usable _only_ if you fully control the whole scoped context. Good luck
with that long term! This is fragile, hard to review and even harder to
keep working properly. The flag would have been Nacked on that ground.
But nobody asked...
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 7:32 ` Yafang Shao
@ 2024-08-15 2:54 ` Dave Chinner
2024-08-15 3:38 ` Yafang Shao
0 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2024-08-15 2:54 UTC (permalink / raw)
To: Yafang Shao
Cc: akpm, viro, brauner, jack, linux-fsdevel, linux-mm,
Kent Overstreet
On Wed, Aug 14, 2024 at 03:32:26PM +0800, Yafang Shao wrote:
> On Wed, Aug 14, 2024 at 1:42 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Aug 14, 2024 at 10:19:36AM +0800, Yafang Shao wrote:
> > > On Wed, Aug 14, 2024 at 8:28 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > > reclamation.
> > > >
> > > > Readahead already uses this context:
> > > >
> > > > static inline gfp_t readahead_gfp_mask(struct address_space *x)
> > > > {
> > > > return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
> > > > }
> > > >
> > > > and __GFP_NORETRY means minimal direct reclaim should be performed.
> > > > Most filesystems already have GFP_NOFS context from
> > > > mapping_gfp_mask(), so how much difference does completely avoiding
> > > > direct reclaim actually make under memory pressure?
> > >
> > > Besides the __GFP_NOFS , ~__GFP_DIRECT_RECLAIM also implies
> > > __GPF_NOIO. If we don't set __GPF_NOIO, the readahead can wait for IO,
> > > right?
> >
> > There's a *lot* more difference between __GFP_NORETRY and
> > __GFP_NOWAIT than just __GFP_NOIO. I don't need you to try to
> > describe to me what the differences are; What I'm asking you is this:
> >
> > > > i.e. doing some direct reclaim without blocking when under memory
> > > > pressure might actually give better performance than skipping direct
> > > > reclaim and aborting readahead altogether....
> > > >
> > > > This really, really needs some numbers (both throughput and IO
> > > > latency histograms) to go with it because we have no evidence either
> > > > way to determine what is the best approach here.
> >
> > Put simply: does the existing readahead mechanism give better results
> > than the proposed one, and if so, why wouldn't we just reenable
> > readahead unconditionally instead of making it behave differently
> > for this specific case?
>
> Are you suggesting we compare the following change with the current proposal?
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index fd34b5755c0b..ced74b1b350d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3455,7 +3455,6 @@ static inline int kiocb_set_rw_flags(struct
> kiocb *ki, rwf_t flags,
> if (flags & RWF_NOWAIT) {
> if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
> return -EOPNOTSUPP;
> - kiocb_flags |= IOCB_NOIO;
> }
> if (flags & RWF_ATOMIC) {
> if (rw_type != WRITE)
Yes.
> Doesn't unconditional readahead break the semantics of RWF_NOWAIT,
> which is supposed to avoid waiting for I/O? For example, it might
> trigger a pageout for a dirty page.
Yes, but only for *some filesystems* in *some configurations*.
Readahead allocation behaviour is specifically controlled by the gfp
mask set on the mapping by the filesystem at inode instantiation
time. i.e. via a call to mapping_set_gfp_mask().
XFS, for one, always clears __GFP_FS from this mask, and several
other filesystems set it to GFP_NOFS. Filesystems that do this will
not do pageout for a dirty page during memory allocation.
Further, memory reclaim can not write dirty pages to a filesystem
without a ->writepage implementation. ->writepage is almost
completely gone - neither ext4, btrfs or XFS have a ->writepage
implementation anymore - with f2fs being the only "major" filesystem
with a ->writepage implementation remaining.
IOWs, for most readahead cases right now, direct memory reclaim will
not issue writeback IO on dirty cached file pages and in the near
future that will change to -never-.
That means the only IO that direct reclaim will be able to do is for
swapping and compaction. Both of these can be prevented simply by
setting a GFP_NOIO allocation context. IOWs, in the not-to-distant
future we won't have to turn direct reclaim off to prevent IO from
and blocking in direct reclaim during readahead - GFP_NOIO context
will be all that is necessary for IOCB_NOWAIT readahead.
That's why I'm asking if just doing readahead as it stands from
RWF_NOWAIT causes any obvious problems. I think we really only need
need GFP_NOIO | __GFP_NORETRY allocation context for NOWAIT
readahead IO, and that's something we already have a context API
for.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 12:43 ` Michal Hocko
@ 2024-08-15 3:26 ` Yafang Shao
2024-08-15 6:22 ` Michal Hocko
0 siblings, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-15 3:26 UTC (permalink / raw)
To: Michal Hocko
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Wed, Aug 14, 2024 at 8:43 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 14-08-24 16:12:27, Yafang Shao wrote:
> > On Wed, Aug 14, 2024 at 3:42 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 12-08-24 20:59:53, Yafang Shao wrote:
> > > > On Mon, Aug 12, 2024 at 7:37 PM Christoph Hellwig <hch@infradead.org> wrote:
> > > > >
> > > > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > > > reclamation.
> > > > >
> > > > > No, forcing nowait on callee contets is just asking for trouble.
> > > > > Unlike NOIO or NOFS this is incompatible with NOFAIL allocations
> > > >
> > > > I don’t see any incompatibility in __alloc_pages_slowpath(). The
> > > > ~__GFP_DIRECT_RECLAIM flag only ensures that direct reclaim is not
> > > > performed, but it doesn’t prevent the allocation of pages from
> > > > ALLOC_MIN_RESERVE, correct?
> > >
> > > Right but this means that you just made any potential nested allocation
> > > within the scope that is GFP_NOFAIL a busy loop essentially. Not to
> > > mention it BUG_ON as non-sleeping GFP_NOFAIL allocations are
> > > unsupported. I believe this is what Christoph had in mind.
> >
> > If that's the case, I believe we should at least consider adding the
> > following code change to the kernel:
>
> We already do have that
> /*
> * All existing users of the __GFP_NOFAIL are blockable, so warn
> * of any new users that actually require GFP_NOWAIT
> */
> if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> goto fail;
I don't see a reason to place the `goto fail;` above the
`__alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);`
line. Since we've already woken up kswapd, it should be acceptable to
allocate memory from ALLOC_MIN_RESERVE temporarily. Why not consider
implementing the following changes instead?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ecf99190ea2..598d4df829cd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4386,13 +4386,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned
int order,
* we always retry
*/
if (gfp_mask & __GFP_NOFAIL) {
- /*
- * All existing users of the __GFP_NOFAIL are blockable, so warn
- * of any new users that actually require GFP_NOWAIT
- */
- if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
- goto fail;
-
/*
* PF_MEMALLOC request from this context is rather bizarre
* because we cannot reclaim anything and only can loop waiting
@@ -4419,6 +4412,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned
int order,
if (page)
goto got_pg;
+ /*
+ * All existing users of the __GFP_NOFAIL are blockable, so warn
+ * of any new users that actually require GFP_NOWAIT
+ */
+ if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) {
+ goto fail;
+ }
+
cond_resched();
goto retry;
}
>
> But Barry has patches to turn that into BUG because failing NOFAIL
> allocations is not cool and cause unexpected failures. Have a look at
> https://lore.kernel.org/all/20240731000155.109583-1-21cnbao@gmail.com/
>
> > > I am really
> > > surprised that we even have PF_MEMALLOC_NORECLAIM in the first place!
> >
> > There's use cases for it.
>
> Right but there are certain constrains that we need to worry about to
> have a maintainable code. Scope allocation contrains are really a good
> feature when that has a well defined semantic. E.g. NOFS, NOIO or
> NOMEMALLOC (although this is more self inflicted injury exactly because
> PF_MEMALLOC had a "use case"). NOWAIT scope semantic might seem a good
> feature but it falls appart on nested NOFAIL allocations! So the flag is
> usable _only_ if you fully control the whole scoped context. Good luck
> with that long term! This is fragile, hard to review and even harder to
> keep working properly. The flag would have been Nacked on that ground.
> But nobody asked...
It's already implemented, and complaints won't resolve the issue. How
about making the following change to provide a warning when this new
flag is used incorrectly?
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 4fbae0013166..5a1e1bcde347 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -267,9 +267,10 @@ static inline gfp_t current_gfp_context(gfp_t flags)
* Stronger flags before weaker flags:
* NORECLAIM implies NOIO, which in turn implies NOFS
*/
- if (pflags & PF_MEMALLOC_NORECLAIM)
+ if (pflags & PF_MEMALLOC_NORECLAIM) {
flags &= ~__GFP_DIRECT_RECLAIM;
- else if (pflags & PF_MEMALLOC_NOIO)
+ WARN_ON_ONCE_GFP(flags & __GFP_NOFAIL, flags)
+ } else if (pflags & PF_MEMALLOC_NOIO)
flags &= ~(__GFP_IO | __GFP_FS);
else if (pflags & PF_MEMALLOC_NOFS)
flags &= ~__GFP_FS;
--
Regards
Yafang
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-15 2:54 ` Dave Chinner
@ 2024-08-15 3:38 ` Yafang Shao
0 siblings, 0 replies; 37+ messages in thread
From: Yafang Shao @ 2024-08-15 3:38 UTC (permalink / raw)
To: Dave Chinner
Cc: akpm, viro, brauner, jack, linux-fsdevel, linux-mm,
Kent Overstreet
On Thu, Aug 15, 2024 at 10:54 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Aug 14, 2024 at 03:32:26PM +0800, Yafang Shao wrote:
> > On Wed, Aug 14, 2024 at 1:42 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Wed, Aug 14, 2024 at 10:19:36AM +0800, Yafang Shao wrote:
> > > > On Wed, Aug 14, 2024 at 8:28 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > >
> > > > > On Mon, Aug 12, 2024 at 05:05:24PM +0800, Yafang Shao wrote:
> > > > > > The PF_MEMALLOC_NORECLAIM flag was introduced in commit eab0af905bfc
> > > > > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"). To complement
> > > > > > this, let's add two helper functions, memalloc_nowait_{save,restore}, which
> > > > > > will be useful in scenarios where we want to avoid waiting for memory
> > > > > > reclamation.
> > > > >
> > > > > Readahead already uses this context:
> > > > >
> > > > > static inline gfp_t readahead_gfp_mask(struct address_space *x)
> > > > > {
> > > > > return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
> > > > > }
> > > > >
> > > > > and __GFP_NORETRY means minimal direct reclaim should be performed.
> > > > > Most filesystems already have GFP_NOFS context from
> > > > > mapping_gfp_mask(), so how much difference does completely avoiding
> > > > > direct reclaim actually make under memory pressure?
> > > >
> > > > Besides the __GFP_NOFS , ~__GFP_DIRECT_RECLAIM also implies
> > > > __GPF_NOIO. If we don't set __GPF_NOIO, the readahead can wait for IO,
> > > > right?
> > >
> > > There's a *lot* more difference between __GFP_NORETRY and
> > > __GFP_NOWAIT than just __GFP_NOIO. I don't need you to try to
> > > describe to me what the differences are; What I'm asking you is this:
> > >
> > > > > i.e. doing some direct reclaim without blocking when under memory
> > > > > pressure might actually give better performance than skipping direct
> > > > > reclaim and aborting readahead altogether....
> > > > >
> > > > > This really, really needs some numbers (both throughput and IO
> > > > > latency histograms) to go with it because we have no evidence either
> > > > > way to determine what is the best approach here.
> > >
> > > Put simply: does the existing readahead mechanism give better results
> > > than the proposed one, and if so, why wouldn't we just reenable
> > > readahead unconditionally instead of making it behave differently
> > > for this specific case?
> >
> > Are you suggesting we compare the following change with the current proposal?
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index fd34b5755c0b..ced74b1b350d 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -3455,7 +3455,6 @@ static inline int kiocb_set_rw_flags(struct
> > kiocb *ki, rwf_t flags,
> > if (flags & RWF_NOWAIT) {
> > if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
> > return -EOPNOTSUPP;
> > - kiocb_flags |= IOCB_NOIO;
> > }
> > if (flags & RWF_ATOMIC) {
> > if (rw_type != WRITE)
>
> Yes.
>
> > Doesn't unconditional readahead break the semantics of RWF_NOWAIT,
> > which is supposed to avoid waiting for I/O? For example, it might
> > trigger a pageout for a dirty page.
>
> Yes, but only for *some filesystems* in *some configurations*.
> Readahead allocation behaviour is specifically controlled by the gfp
> mask set on the mapping by the filesystem at inode instantiation
> time. i.e. via a call to mapping_set_gfp_mask().
>
> XFS, for one, always clears __GFP_FS from this mask, and several
> other filesystems set it to GFP_NOFS. Filesystems that do this will
> not do pageout for a dirty page during memory allocation.
>
> Further, memory reclaim can not write dirty pages to a filesystem
> without a ->writepage implementation. ->writepage is almost
> completely gone - neither ext4, btrfs or XFS have a ->writepage
> implementation anymore - with f2fs being the only "major" filesystem
> with a ->writepage implementation remaining.
>
> IOWs, for most readahead cases right now, direct memory reclaim will
> not issue writeback IO on dirty cached file pages and in the near
> future that will change to -never-.
>
> That means the only IO that direct reclaim will be able to do is for
> swapping and compaction. Both of these can be prevented simply by
> setting a GFP_NOIO allocation context. IOWs, in the not-to-distant
> future we won't have to turn direct reclaim off to prevent IO from
> and blocking in direct reclaim during readahead - GFP_NOIO context
> will be all that is necessary for IOCB_NOWAIT readahead.
>
> That's why I'm asking if just doing readahead as it stands from
> RWF_NOWAIT causes any obvious problems. I think we really only need
> need GFP_NOIO | __GFP_NORETRY allocation context for NOWAIT
> readahead IO, and that's something we already have a context API
> for.
Understood, thanks for your explanation.
so we need below changes,
@@ -2526,8 +2528,12 @@ static int filemap_get_pages(struct kiocb
*iocb, size_t count,
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & IOCB_NOIO)
return -EAGAIN;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ flags = memalloc_noio_save();
page_cache_sync_readahead(mapping, ra, filp, index,
last_index - index);
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ memalloc_noio_restore(flags);
filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
}
if (!folio_batch_count(fbatch)) {
What data would you recommend collecting after implementing the above
change? Should we measure the latency of preadv2(2) under high memory
pressure? Although latency can vary, it seems we have no choice but to
use memalloc_noio_save instead of memalloc_nowait_save, as the MM
folks are not in favor of the latter.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-15 3:26 ` Yafang Shao
@ 2024-08-15 6:22 ` Michal Hocko
2024-08-15 6:32 ` Yafang Shao
0 siblings, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2024-08-15 6:22 UTC (permalink / raw)
To: Yafang Shao
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Thu 15-08-24 11:26:09, Yafang Shao wrote:
> On Wed, Aug 14, 2024 at 8:43 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > > If that's the case, I believe we should at least consider adding the
> > > following code change to the kernel:
> >
> > We already do have that
> > /*
> > * All existing users of the __GFP_NOFAIL are blockable, so warn
> > * of any new users that actually require GFP_NOWAIT
> > */
> > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > goto fail;
>
> I don't see a reason to place the `goto fail;` above the
> `__alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);`
> line. Since we've already woken up kswapd, it should be acceptable to
> allocate memory from ALLOC_MIN_RESERVE temporarily. Why not consider
> implementing the following changes instead?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9ecf99190ea2..598d4df829cd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4386,13 +4386,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned
> int order,
> * we always retry
> */
> if (gfp_mask & __GFP_NOFAIL) {
> - /*
> - * All existing users of the __GFP_NOFAIL are blockable, so warn
> - * of any new users that actually require GFP_NOWAIT
> - */
> - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> - goto fail;
> -
> /*
> * PF_MEMALLOC request from this context is rather bizarre
> * because we cannot reclaim anything and only can loop waiting
> @@ -4419,6 +4412,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned
> int order,
> if (page)
> goto got_pg;
>
> + /*
> + * All existing users of the __GFP_NOFAIL are blockable, so warn
> + * of any new users that actually require GFP_NOWAIT
> + */
> + if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) {
> + goto fail;
> + }
> +
> cond_resched();
> goto retry;
> }
How does this solve anything. It will still eventually fail the NOFAIL
allocation. It might happen slightly later but that doesn't change the
fact it will _fail_. I have referenced a discussion why that is not
really desireable and why Barry wants that addressed. We have added that
WARN_ON_ONCE because we have assumed that people do understand that
NOFAIL without reclaim is just too much to ask. We were wrong there was
one user in the kernel. That one was not too hard to find out because
you can _grep_ for those flags. Scoped APIs make that impossible!
> > But Barry has patches to turn that into BUG because failing NOFAIL
> > allocations is not cool and cause unexpected failures. Have a look at
> > https://lore.kernel.org/all/20240731000155.109583-1-21cnbao@gmail.com/
> >
> > > > I am really
> > > > surprised that we even have PF_MEMALLOC_NORECLAIM in the first place!
> > >
> > > There's use cases for it.
> >
> > Right but there are certain constrains that we need to worry about to
> > have a maintainable code. Scope allocation contrains are really a good
> > feature when that has a well defined semantic. E.g. NOFS, NOIO or
> > NOMEMALLOC (although this is more self inflicted injury exactly because
> > PF_MEMALLOC had a "use case"). NOWAIT scope semantic might seem a good
> > feature but it falls appart on nested NOFAIL allocations! So the flag is
> > usable _only_ if you fully control the whole scoped context. Good luck
> > with that long term! This is fragile, hard to review and even harder to
> > keep working properly. The flag would have been Nacked on that ground.
> > But nobody asked...
>
> It's already implemented, and complaints won't resolve the issue. How
> about making the following change to provide a warning when this new
> flag is used incorrectly?
How does this solve anything at all? It will warn you that your code is
incorrect and what next? Are you going to remove GFP_NOFAIL from the
nested allocation side? NOFAIL is a strong requirement and it is not
used nilly willy. There must have been a very good reason to use it. Are
you going to drop the scope?
Let me repeat, nested NOFAIL allocations will BUG_ON on failure. Your
warning might catch those users slightly earlier when allocation succeed
but that doesn't make those crashes impossible. PF_MEMALLOC_NORECLAIM
might be already merged but this concept is inherently fragile and
should be reverted rather than finding impossible ways around it. And it
should be done before this spreads outside of bcachefs.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-15 6:22 ` Michal Hocko
@ 2024-08-15 6:32 ` Yafang Shao
2024-08-15 6:51 ` Michal Hocko
0 siblings, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-15 6:32 UTC (permalink / raw)
To: Michal Hocko
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Thu, Aug 15, 2024 at 2:22 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 15-08-24 11:26:09, Yafang Shao wrote:
> > On Wed, Aug 14, 2024 at 8:43 PM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > > > If that's the case, I believe we should at least consider adding the
> > > > following code change to the kernel:
> > >
> > > We already do have that
> > > /*
> > > * All existing users of the __GFP_NOFAIL are blockable, so warn
> > > * of any new users that actually require GFP_NOWAIT
> > > */
> > > if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > > goto fail;
> >
> > I don't see a reason to place the `goto fail;` above the
> > `__alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);`
> > line. Since we've already woken up kswapd, it should be acceptable to
> > allocate memory from ALLOC_MIN_RESERVE temporarily. Why not consider
> > implementing the following changes instead?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9ecf99190ea2..598d4df829cd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4386,13 +4386,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned
> > int order,
> > * we always retry
> > */
> > if (gfp_mask & __GFP_NOFAIL) {
> > - /*
> > - * All existing users of the __GFP_NOFAIL are blockable, so warn
> > - * of any new users that actually require GFP_NOWAIT
> > - */
> > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > - goto fail;
> > -
> > /*
> > * PF_MEMALLOC request from this context is rather bizarre
> > * because we cannot reclaim anything and only can loop waiting
> > @@ -4419,6 +4412,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned
> > int order,
> > if (page)
> > goto got_pg;
> >
> > + /*
> > + * All existing users of the __GFP_NOFAIL are blockable, so warn
> > + * of any new users that actually require GFP_NOWAIT
> > + */
> > + if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) {
> > + goto fail;
> > + }
> > +
> > cond_resched();
> > goto retry;
> > }
>
> How does this solve anything. It will still eventually fail the NOFAIL
> allocation. It might happen slightly later but that doesn't change the
> fact it will _fail_. I have referenced a discussion why that is not
> really desireable and why Barry wants that addressed. We have added that
> WARN_ON_ONCE because we have assumed that people do understand that
> NOFAIL without reclaim is just too much to ask. We were wrong there was
> one user in the kernel. That one was not too hard to find out because
> you can _grep_ for those flags. Scoped APIs make that impossible!
>
> > > But Barry has patches to turn that into BUG because failing NOFAIL
> > > allocations is not cool and cause unexpected failures. Have a look at
> > > https://lore.kernel.org/all/20240731000155.109583-1-21cnbao@gmail.com/
> > >
> > > > > I am really
> > > > > surprised that we even have PF_MEMALLOC_NORECLAIM in the first place!
> > > >
> > > > There's use cases for it.
> > >
> > > Right but there are certain constrains that we need to worry about to
> > > have a maintainable code. Scope allocation contrains are really a good
> > > feature when that has a well defined semantic. E.g. NOFS, NOIO or
> > > NOMEMALLOC (although this is more self inflicted injury exactly because
> > > PF_MEMALLOC had a "use case"). NOWAIT scope semantic might seem a good
> > > feature but it falls appart on nested NOFAIL allocations! So the flag is
> > > usable _only_ if you fully control the whole scoped context. Good luck
> > > with that long term! This is fragile, hard to review and even harder to
> > > keep working properly. The flag would have been Nacked on that ground.
> > > But nobody asked...
> >
> > It's already implemented, and complaints won't resolve the issue. How
> > about making the following change to provide a warning when this new
> > flag is used incorrectly?
>
> How does this solve anything at all? It will warn you that your code is
> incorrect and what next? Are you going to remove GFP_NOFAIL from the
> nested allocation side? NOFAIL is a strong requirement and it is not
> used nilly willy. There must have been a very good reason to use it. Are
> you going to drop the scope?
>
> Let me repeat, nested NOFAIL allocations will BUG_ON on failure.
The key question is whether it actually fails after we've already
woken up kswapd. Have we encountered real issues, or is this just
based on code review? Instead of allowing it to fail, why not allocate
from the reserve memory to prevent this from happening?
> Your
> warning might catch those users slightly earlier when allocation succeed
> but that doesn't make those crashes impossible. PF_MEMALLOC_NORECLAIM
> might be already merged but this concept is inherently fragile and
> should be reverted rather than finding impossible ways around it. And it
> should be done before this spreads outside of bcachefs.
> --
> Michal Hocko
> SUSE Labs
--
Regards
Yafang
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-15 6:32 ` Yafang Shao
@ 2024-08-15 6:51 ` Michal Hocko
2024-08-16 8:17 ` [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM Michal Hocko
0 siblings, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2024-08-15 6:51 UTC (permalink / raw)
To: Yafang Shao
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Thu 15-08-24 14:32:10, Yafang Shao wrote:
> On Thu, Aug 15, 2024 at 2:22 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > Let me repeat, nested NOFAIL allocations will BUG_ON on failure.
>
> The key question is whether it actually fails after we've already
> woken up kswapd. Have we encountered real issues, or is this just
> based on code review?
Depleting memory to the level that even min memory reserves is
insufficient is not a theoretical concern. OOMing the system is a real
thing!
> Instead of allowing it to fail, why not allocate
> from the reserve memory to prevent this from happening?
Because those memory reserves are shared and potentially required by
other more important users which cannot make forward progress without
them. And even with that, those can get depleted so the failure point is
just a matter of a specific memory consumption pattern. The failure
could be more rare but that also means much harder to predict and test
for. Really there are no ways around non sleeping GFP_NOFAIL, either you
disalow them or you just create a busy loop inside the allocator. We
have chosen the first option because that is a saner model to support.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-15 6:51 ` Michal Hocko
@ 2024-08-16 8:17 ` Michal Hocko
2024-08-16 8:22 ` Christoph Hellwig
2024-08-17 2:29 ` Yafang Shao
0 siblings, 2 replies; 37+ messages in thread
From: Michal Hocko @ 2024-08-16 8:17 UTC (permalink / raw)
To: Yafang Shao, Andrew Morton
Cc: Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
be removed from the tree altogether please? For the full context the
email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
---
From f17d36975ec343d9388aa6dbf9ca8d1b58ed09ce Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 16 Aug 2024 10:10:00 +0200
Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
that such a allocation contex is inherently unsafe if the context
doesn't fully control all allocations called from this context. Any
potential __GFP_NOFAIL request from withing PF_MEMALLOC_NORECLAIM
context would BUG_ON if the allocation would fail.
[1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
include/linux/sched.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f8d150343d42..0c9061d2a8bd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1657,7 +1657,12 @@ extern struct pid *cad_pid;
* I am cleaning dirty pages from some other bdi. */
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
-#define PF_MEMALLOC_NORECLAIM 0x00800000 /* All allocation requests will clear __GFP_DIRECT_RECLAIM */
+#define PF_MEMALLOC_NORECLAIM 0x00800000 /* All allocation requests will clear __GFP_DIRECT_RECLAIM.
+ * This is inherently unsafe unless the context fully controls
+ * all allocations used. Any potential __GFP_NOFAIL nested allocation
+ * could BUG_ON as the page allocator doesn't support non-sleeping
+ * __GFP_NOFAIL requests.
+ */
#define PF_MEMALLOC_NOWARN 0x01000000 /* All allocation requests will inherit __GFP_NOWARN */
#define PF__HOLE__02000000 0x02000000
#define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */
--
2.46.0
--
2.46.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-16 8:17 ` [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM Michal Hocko
@ 2024-08-16 8:22 ` Christoph Hellwig
2024-08-16 8:54 ` Michal Hocko
2024-08-17 2:29 ` Yafang Shao
1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2024-08-16 8:22 UTC (permalink / raw)
To: Michal Hocko
Cc: Yafang Shao, Andrew Morton, Christoph Hellwig, viro, brauner,
jack, david, linux-fsdevel, linux-mm, Kent Overstreet
On Fri, Aug 16, 2024 at 10:17:54AM +0200, Michal Hocko wrote:
> Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> be removed from the tree altogether please? For the full context the
> email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
I don't think that's enough. We just need to kill it given that it was
added without ACKs and despite explicit earlier objections to the API.
That might leave bcachefs with broken inode allocation handling, but so
what - it's marked experimental and there should be other ways to sort
this out.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-16 8:22 ` Christoph Hellwig
@ 2024-08-16 8:54 ` Michal Hocko
2024-08-16 14:26 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 37+ messages in thread
From: Michal Hocko @ 2024-08-16 8:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Yafang Shao, Andrew Morton, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Fri 16-08-24 01:22:37, Christoph Hellwig wrote:
> On Fri, Aug 16, 2024 at 10:17:54AM +0200, Michal Hocko wrote:
> > Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> > be removed from the tree altogether please? For the full context the
> > email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
>
> I don't think that's enough. We just need to kill it given that it was
> added without ACKs and despite explicit earlier objections to the API.
Yes, I think we should kill it before it spreads even more but I would
not like to make the existing user just broken. I have zero visibility
and understanding of the bcachefs code but from a quick look at __bch2_new_inode
it shouldn't be really terribly hard to push GFP_NOWAIT flag there
directly. It would require inode_init_always_gfp variant as well (to not
touch all existing callers that do not have any locking requirements but
I do not see any other nested allocations.
So a very quick and untested change would look as follows:
---
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 15fc41e63b6c..7a55167b9133 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -231,9 +231,9 @@ static struct inode *bch2_alloc_inode(struct super_block *sb)
BUG();
}
-static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
+static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp)
{
- struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS);
+ struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp);
if (!inode)
return NULL;
@@ -245,7 +245,7 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
mutex_init(&inode->ei_quota_lock);
memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush));
- if (unlikely(inode_init_always(c->vfs_sb, &inode->v))) {
+ if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
kmem_cache_free(bch2_inode_cache, inode);
return NULL;
}
@@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
*/
static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
{
- struct bch_inode_info *inode =
- memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN,
- __bch2_new_inode(trans->c));
+ struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
if (unlikely(!inode)) {
- int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c)) ? 0 : -ENOMEM);
+ int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM);
if (ret && inode) {
__destroy_inode(&inode->v);
kmem_cache_free(bch2_inode_cache, inode);
@@ -328,7 +326,7 @@ __bch2_create(struct mnt_idmap *idmap,
if (ret)
return ERR_PTR(ret);
#endif
- inode = __bch2_new_inode(c);
+ inode = __bch2_new_inode(c, GFP_NOFS);
if (unlikely(!inode)) {
inode = ERR_PTR(-ENOMEM);
goto err;
diff --git a/fs/inode.c b/fs/inode.c
index 86670941884b..95fd67a6cac3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file)
* These are initializations that need to be done on every inode
* allocation as the fields are not initialised by slab allocation.
*/
-int inode_init_always(struct super_block *sb, struct inode *inode)
+int inode_init_always(struct super_block *sb, struct inode *inode, gfp_t gfp)
{
static const struct inode_operations empty_iops;
static const struct file_operations no_open_fops = {.open = no_open};
@@ -230,14 +230,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
#endif
inode->i_flctx = NULL;
- if (unlikely(security_inode_alloc(inode)))
+ if (unlikely(security_inode_alloc(inode, gfp)))
return -ENOMEM;
this_cpu_inc(nr_inodes);
return 0;
}
-EXPORT_SYMBOL(inode_init_always);
+EXPORT_SYMBOL(inode_init_always_gfp);
void free_inode_nonrcu(struct inode *inode)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..5c613a89718b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
-extern int inode_init_always(struct super_block *, struct inode *);
+extern int inode_init_always(struct super_block *, struct inode *, gfp_t);
+static inline int inode_init_always(struct super_block *, struct inode *)
+{
+ return inode_init_always_gfp(GFP_NOFS);
+}
+
extern void inode_init_once(struct inode *);
extern void address_space_init_once(struct address_space *mapping);
extern struct inode * igrab(struct inode *);
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index a2ade0ffe9e7..b08472d64765 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -150,6 +150,6 @@ extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[];
__used __section(".early_lsm_info.init") \
__aligned(sizeof(unsigned long))
-extern int lsm_inode_alloc(struct inode *inode);
+extern int lsm_inode_alloc(struct inode *inode, gfp_t gfp);
#endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index 1390f1efb4f0..7c6b9b038a0d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -336,7 +336,7 @@ int security_dentry_create_files_as(struct dentry *dentry, int mode,
struct cred *new);
int security_path_notify(const struct path *path, u64 mask,
unsigned int obj_type);
-int security_inode_alloc(struct inode *inode);
+int security_inode_alloc(struct inode *inode, gfp_t gfp);
void security_inode_free(struct inode *inode);
int security_inode_init_security(struct inode *inode, struct inode *dir,
const struct qstr *qstr,
@@ -769,7 +769,7 @@ static inline int security_path_notify(const struct path *path, u64 mask,
return 0;
}
-static inline int security_inode_alloc(struct inode *inode)
+static inline int security_inode_alloc(struct inode *inode, gfp_t gfp)
{
return 0;
}
diff --git a/security/security.c b/security/security.c
index 8cee5b6c6e6d..8634d3bee56f 100644
--- a/security/security.c
+++ b/security/security.c
@@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file)
*
* Returns 0, or -ENOMEM if memory can't be allocated.
*/
-int lsm_inode_alloc(struct inode *inode)
+int lsm_inode_alloc(struct inode *inode, gfp)
{
if (!lsm_inode_cache) {
inode->i_security = NULL;
return 0;
}
- inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
+ inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
if (inode->i_security == NULL)
return -ENOMEM;
return 0;
@@ -1582,9 +1582,9 @@ int security_path_notify(const struct path *path, u64 mask,
*
* Return: Return 0 if operation was successful.
*/
-int security_inode_alloc(struct inode *inode)
+int security_inode_alloc(struct inode *inode, gfp_t gfp)
{
- int rc = lsm_inode_alloc(inode);
+ int rc = lsm_inode_alloc(inode, gfp);
if (unlikely(rc))
return rc;
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-16 8:54 ` Michal Hocko
@ 2024-08-16 14:26 ` Christoph Hellwig
2024-08-16 15:57 ` Michal Hocko
2024-08-21 7:30 ` Michal Hocko
2024-08-21 11:44 ` Christoph Hellwig
2 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2024-08-16 14:26 UTC (permalink / raw)
To: Michal Hocko
Cc: Christoph Hellwig, Yafang Shao, Andrew Morton, viro, brauner,
jack, david, linux-fsdevel, linux-mm, Kent Overstreet
On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> Yes, I think we should kill it before it spreads even more but I would
> not like to make the existing user just broken. I have zero visibility
> and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> directly. It would require inode_init_always_gfp variant as well (to not
> touch all existing callers that do not have any locking requirements but
> I do not see any other nested allocations.
I'll probably have to go down into security_inode_alloc as well.
That being said there is no explanation for the behavior here in the
commit logs or the code itself, so who knows.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-16 14:26 ` Christoph Hellwig
@ 2024-08-16 15:57 ` Michal Hocko
0 siblings, 0 replies; 37+ messages in thread
From: Michal Hocko @ 2024-08-16 15:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Yafang Shao, Andrew Morton, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Fri 16-08-24 07:26:06, Christoph Hellwig wrote:
> On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> > Yes, I think we should kill it before it spreads even more but I would
> > not like to make the existing user just broken. I have zero visibility
> > and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> > it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> > directly. It would require inode_init_always_gfp variant as well (to not
> > touch all existing callers that do not have any locking requirements but
> > I do not see any other nested allocations.
>
> I'll probably have to go down into security_inode_alloc as well.
yes, I have done that as well. I was not sure about
inode_alloc_security. It has alloc in the name but none of the caller
actually allocate from there. lsm_inode_alloc was trivial to update.
> That being said there is no explanation for the behavior here in the
> commit logs or the code itself, so who knows.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-16 8:17 ` [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-16 8:22 ` Christoph Hellwig
@ 2024-08-17 2:29 ` Yafang Shao
2024-08-19 7:57 ` Michal Hocko
1 sibling, 1 reply; 37+ messages in thread
From: Yafang Shao @ 2024-08-17 2:29 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Christoph Hellwig, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Fri, Aug 16, 2024 at 4:17 PM Michal Hocko <mhocko@suse.com> wrote:
>
> Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> be removed from the tree altogether please? For the full context the
> email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
> ---
> From f17d36975ec343d9388aa6dbf9ca8d1b58ed09ce Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 16 Aug 2024 10:10:00 +0200
> Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
>
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context. Any
> potential __GFP_NOFAIL request from withing PF_MEMALLOC_NORECLAIM
> context would BUG_ON if the allocation would fail.
>
> [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
Documenting the risk is a good first step. For this change:
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Even without the PF_MEMALLOC_NORECLAIM flag, the underlying risk
remains, as users can still potentially set both ~__GPF_DIRECT_RECLAIM
and __GFP_NOFAIL. PF_MEMALLOC_NORECLAIM does not create this risk; it
only exacerbates it. The core problem lies in the complexity of the
various GFP flags and the lack of robust management for them. While we
have extensive documentation on these flags, it can still be
confusing, particularly for new developers who haven't yet encountered
real-world issues.
For instance:
* %GFP_NOWAIT is for kernel allocations that should not stall for direct
* reclaim,
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
Initially, it wasn't clear to me why setting __GFP_KSWAPD_RECLAIM and
__GFP_NOWARN would prevent direct reclaim. It only became apparent
after I studied the entire code path of page allocation. I believe
other newcomers to kernel development may face similar confusion as I
did early in my experience.
The real issue we need to address is improving the management of these
GFP flags, though I don't have a concrete solution at this time.
Since I don't have a better alternative yet, I'm not strongly opposed
to reverting the PF_MEMALLOC_NORECLAIM flag if it is deemed the root
cause of the problem.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-17 2:29 ` Yafang Shao
@ 2024-08-19 7:57 ` Michal Hocko
0 siblings, 0 replies; 37+ messages in thread
From: Michal Hocko @ 2024-08-19 7:57 UTC (permalink / raw)
To: Yafang Shao
Cc: Andrew Morton, Christoph Hellwig, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Sat 17-08-24 10:29:31, Yafang Shao wrote:
> On Fri, Aug 16, 2024 at 4:17 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> > be removed from the tree altogether please? For the full context the
> > email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
> > ---
> > From f17d36975ec343d9388aa6dbf9ca8d1b58ed09ce Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Fri, 16 Aug 2024 10:10:00 +0200
> > Subject: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
> >
> > PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> > that such a allocation contex is inherently unsafe if the context
> > doesn't fully control all allocations called from this context. Any
> > potential __GFP_NOFAIL request from withing PF_MEMALLOC_NORECLAIM
> > context would BUG_ON if the allocation would fail.
> >
> > [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
>
> Documenting the risk is a good first step. For this change:
>
> Acked-by: Yafang Shao <laoar.shao@gmail.com>
>
> Even without the PF_MEMALLOC_NORECLAIM flag, the underlying risk
> remains, as users can still potentially set both ~__GPF_DIRECT_RECLAIM
> and __GFP_NOFAIL.
Users can configure all sorts of nonsensical gfp flags combination. That
is a sad reality of the interface. But we do assume that kernel code is
somehow sane.
Besides that Barry is working on making this less likely by droppong
__GFP_NOFAIL and replace it by GFP_NOFAIL which always includes
__GFP_DIRECT_RECLAIM. Sure nothing will prevent callers from clearing
that flag explicitly but we have no real defense afains broken code.
> PF_MEMALLOC_NORECLAIM does not create this risk; it
> only exacerbates it. The core problem lies in the complexity of the
> various GFP flags and the lack of robust management for them. While we
> have extensive documentation on these flags, it can still be
> confusing, particularly for new developers who haven't yet encountered
> real-world issues.
>
> For instance:
>
> * %GFP_NOWAIT is for kernel allocations that should not stall for direct
> * reclaim,
> #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
>
> Initially, it wasn't clear to me why setting __GFP_KSWAPD_RECLAIM and
> __GFP_NOWARN would prevent direct reclaim. It only became apparent
> after I studied the entire code path of page allocation. I believe
> other newcomers to kernel development may face similar confusion as I
> did early in my experience.
>
> The real issue we need to address is improving the management of these
> GFP flags, though I don't have a concrete solution at this time.
Welcome to the club. Changing this interface is a _huge_ undertaking.
Just have a look how many users of the gfp flags we have in the kernel.
I can tell you from a first hand experience that even minor tweaks are
really hard to make.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-16 8:54 ` Michal Hocko
2024-08-16 14:26 ` Christoph Hellwig
@ 2024-08-21 7:30 ` Michal Hocko
2024-08-21 11:44 ` Christoph Hellwig
2 siblings, 0 replies; 37+ messages in thread
From: Michal Hocko @ 2024-08-21 7:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Yafang Shao, Andrew Morton, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Fri 16-08-24 10:54:39, Michal Hocko wrote:
> On Fri 16-08-24 01:22:37, Christoph Hellwig wrote:
> > On Fri, Aug 16, 2024 at 10:17:54AM +0200, Michal Hocko wrote:
> > > Andrew, could you merge the following before PF_MEMALLOC_NORECLAIM can
> > > be removed from the tree altogether please? For the full context the
> > > email thread starts here: https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
> >
> > I don't think that's enough. We just need to kill it given that it was
> > added without ACKs and despite explicit earlier objections to the API.
>
> Yes, I think we should kill it before it spreads even more but I would
> not like to make the existing user just broken. I have zero visibility
> and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> directly. It would require inode_init_always_gfp variant as well (to not
> touch all existing callers that do not have any locking requirements but
> I do not see any other nested allocations.
>
> So a very quick and untested change would look as follows:
Anybody managed to give it a more detailed look? I have a fixup for
[...]
> diff --git a/security/security.c b/security/security.c
> index 8cee5b6c6e6d..8634d3bee56f 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file)
> *
> * Returns 0, or -ENOMEM if memory can't be allocated.
> */
> -int lsm_inode_alloc(struct inode *inode)
> +int lsm_inode_alloc(struct inode *inode, gfp)
this
> {
> if (!lsm_inode_cache) {
> inode->i_security = NULL;
> return 0;
> }
>
> - inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
> + inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
> if (inode->i_security == NULL)
> return -ENOMEM;
> return 0;
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-16 8:54 ` Michal Hocko
2024-08-16 14:26 ` Christoph Hellwig
2024-08-21 7:30 ` Michal Hocko
@ 2024-08-21 11:44 ` Christoph Hellwig
2024-08-21 12:37 ` Michal Hocko
2 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2024-08-21 11:44 UTC (permalink / raw)
To: Michal Hocko
Cc: Christoph Hellwig, Yafang Shao, Andrew Morton, viro, brauner,
jack, david, linux-fsdevel, linux-mm, Kent Overstreet
On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> Yes, I think we should kill it before it spreads even more but I would
> not like to make the existing user just broken. I have zero visibility
> and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> directly. >
I don't understand that sentence. You're adding the gfp_t argument to
it, which to mean counts as pushing it there directly.
> It would require inode_init_always_gfp variant as well (to not
> touch all existing callers that do not have any locking requirements but
> I do not see any other nested allocations.
inode_init_always only has 4 callers, so I'd just add the gfp_t
argument. Otherwise this looks good modulo the fix your posted:
Acked-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-21 11:44 ` Christoph Hellwig
@ 2024-08-21 12:37 ` Michal Hocko
2024-08-22 9:09 ` Christian Brauner
0 siblings, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2024-08-21 12:37 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Yafang Shao, Andrew Morton, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Wed 21-08-24 04:44:03, Christoph Hellwig wrote:
> On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> > Yes, I think we should kill it before it spreads even more but I would
> > not like to make the existing user just broken. I have zero visibility
> > and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> > it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> > directly. >
>
> I don't understand that sentence. You're adding the gfp_t argument to
> it, which to mean counts as pushing it there directly.
Sorry, what I meant to say is that pushing GFP_NOWAIT directly seem fine
unless I have missed some hidden corners down the call path which would
require a scope flag to override a hardcoded gfp flag.
> > It would require inode_init_always_gfp variant as well (to not
> > touch all existing callers that do not have any locking requirements but
> > I do not see any other nested allocations.
>
> inode_init_always only has 4 callers, so I'd just add the gfp_t
> argument. Otherwise this looks good modulo the fix your posted:
>
> Acked-by: Christoph Hellwig <hch@lst.de>
Thanks. I will wait for more review and post this as a real patch. I
would really appreciate any help with actual testing.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM
2024-08-21 12:37 ` Michal Hocko
@ 2024-08-22 9:09 ` Christian Brauner
0 siblings, 0 replies; 37+ messages in thread
From: Christian Brauner @ 2024-08-22 9:09 UTC (permalink / raw)
To: Michal Hocko
Cc: Christoph Hellwig, Yafang Shao, Andrew Morton, viro, jack, david,
linux-fsdevel, linux-mm, Kent Overstreet
On Wed, Aug 21, 2024 at 02:37:33PM GMT, Michal Hocko wrote:
> On Wed 21-08-24 04:44:03, Christoph Hellwig wrote:
> > On Fri, Aug 16, 2024 at 10:54:37AM +0200, Michal Hocko wrote:
> > > Yes, I think we should kill it before it spreads even more but I would
> > > not like to make the existing user just broken. I have zero visibility
> > > and understanding of the bcachefs code but from a quick look at __bch2_new_inode
> > > it shouldn't be really terribly hard to push GFP_NOWAIT flag there
> > > directly. >
> >
> > I don't understand that sentence. You're adding the gfp_t argument to
> > it, which to mean counts as pushing it there directly.
>
> Sorry, what I meant to say is that pushing GFP_NOWAIT directly seem fine
> unless I have missed some hidden corners down the call path which would
> require a scope flag to override a hardcoded gfp flag.
>
> > > It would require inode_init_always_gfp variant as well (to not
> > > touch all existing callers that do not have any locking requirements but
> > > I do not see any other nested allocations.
> >
> > inode_init_always only has 4 callers, so I'd just add the gfp_t
> > argument. Otherwise this looks good modulo the fix your posted:
> >
> > Acked-by: Christoph Hellwig <hch@lst.de>
>
> Thanks. I will wait for more review and post this as a real patch. I
> would really appreciate any help with actual testing.
Ugh, I really don't like that we have to add a flag argument especially
for an api that's broken but fine.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-08-14 7:33 ` Yafang Shao
@ 2024-09-01 20:24 ` Vlastimil Babka
2024-09-01 20:42 ` Kent Overstreet
0 siblings, 1 reply; 37+ messages in thread
From: Vlastimil Babka @ 2024-09-01 20:24 UTC (permalink / raw)
To: Yafang Shao, Christoph Hellwig
Cc: akpm, viro, brauner, jack, david, linux-fsdevel, linux-mm,
Kent Overstreet, Michal Hocko
On 8/14/24 09:33, Yafang Shao wrote:
> On Wed, Aug 14, 2024 at 1:27 PM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> > There are already memalloc_noreclaim_{save,restore} which imply __GFP_MEMALLOC:
>> >
>> > memalloc_noreclaim_save - Marks implicit __GFP_MEMALLOC scope.
>>
>> .. and those are horrible misnamed :(
Yes I agree, sorry about that.
> What about renaming it to memalloc_memalloc_save ?
While it looks weird, it could be indeed better than the current name. It's
not obvious, so it should force the user to read the description.
memalloc_noreclaim_save() might look too obviously "this disables reclaim"
but it's misleading as that's not the full story of PF_MEMALLOC.
>>
>> If we can't even keep our APIs consistently name, who is supposed
>> to understand all this?
>>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] mm: Add memalloc_nowait_{save,restore}
2024-09-01 20:24 ` Vlastimil Babka
@ 2024-09-01 20:42 ` Kent Overstreet
0 siblings, 0 replies; 37+ messages in thread
From: Kent Overstreet @ 2024-09-01 20:42 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Yafang Shao, Christoph Hellwig, akpm, viro, brauner, jack, david,
linux-fsdevel, linux-mm, Michal Hocko
On Sun, Sep 01, 2024 at 10:24:10PM GMT, Vlastimil Babka wrote:
> On 8/14/24 09:33, Yafang Shao wrote:
> > On Wed, Aug 14, 2024 at 1:27 PM Christoph Hellwig <hch@infradead.org> wrote:
> >>
> >> > There are already memalloc_noreclaim_{save,restore} which imply __GFP_MEMALLOC:
> >> >
> >> > memalloc_noreclaim_save - Marks implicit __GFP_MEMALLOC scope.
> >>
> >> .. and those are horrible misnamed :(
>
> Yes I agree, sorry about that.
>
> > What about renaming it to memalloc_memalloc_save ?
>
> While it looks weird, it could be indeed better than the current name. It's
> not obvious, so it should force the user to read the description.
> memalloc_noreclaim_save() might look too obviously "this disables reclaim"
> but it's misleading as that's not the full story of PF_MEMALLOC.
I was actually thinking about killing the helpers in favor of just
memalloc_flags_save() - I don't think the extra names get us anything
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2024-09-01 20:42 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-12 9:05 [PATCH 0/2] mm: Add readahead support for IOCB_NOWAIT Yafang Shao
2024-08-12 9:05 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Yafang Shao
2024-08-12 11:37 ` Christoph Hellwig
2024-08-12 12:59 ` Yafang Shao
2024-08-12 13:21 ` Christoph Hellwig
2024-08-13 2:09 ` Yafang Shao
2024-08-14 5:27 ` Christoph Hellwig
2024-08-14 7:33 ` Yafang Shao
2024-09-01 20:24 ` Vlastimil Babka
2024-09-01 20:42 ` Kent Overstreet
2024-08-14 7:42 ` Michal Hocko
2024-08-14 8:12 ` Yafang Shao
2024-08-14 12:43 ` Michal Hocko
2024-08-15 3:26 ` Yafang Shao
2024-08-15 6:22 ` Michal Hocko
2024-08-15 6:32 ` Yafang Shao
2024-08-15 6:51 ` Michal Hocko
2024-08-16 8:17 ` [PATCH] mm: document risk of PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-16 8:22 ` Christoph Hellwig
2024-08-16 8:54 ` Michal Hocko
2024-08-16 14:26 ` Christoph Hellwig
2024-08-16 15:57 ` Michal Hocko
2024-08-21 7:30 ` Michal Hocko
2024-08-21 11:44 ` Christoph Hellwig
2024-08-21 12:37 ` Michal Hocko
2024-08-22 9:09 ` Christian Brauner
2024-08-17 2:29 ` Yafang Shao
2024-08-19 7:57 ` Michal Hocko
2024-08-12 16:48 ` [PATCH 1/2] mm: Add memalloc_nowait_{save,restore} Kent Overstreet
2024-08-14 5:24 ` Christoph Hellwig
2024-08-14 0:28 ` Dave Chinner
2024-08-14 2:19 ` Yafang Shao
2024-08-14 5:42 ` Dave Chinner
2024-08-14 7:32 ` Yafang Shao
2024-08-15 2:54 ` Dave Chinner
2024-08-15 3:38 ` Yafang Shao
2024-08-12 9:05 ` [PATCH 2/2] mm: allow read-ahead with IOCB_NOWAIT set Yafang Shao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).