* [PATCH 1/3] btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer()
2024-07-05 6:15 [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Qu Wenruo
@ 2024-07-05 6:15 ` Qu Wenruo
2024-07-05 6:15 ` [PATCH 2/3] btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap() Qu Wenruo
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-05 6:15 UTC (permalink / raw)
To: linux-btrfs
Although extent buffer is a critical infrastructure, I'm not convinced
that it really needs to go __GFP_NOFAIL.
As an experimental to reduce the __GFP_NOFAIL usage, for
CONFIG_BTRFS_DEBUG builds remove the __GFP_NOFAIL flag for
alloc_extent_buffer(), so we can test this change enough before pushing
it to end users.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index aa7f8148cd0d..d43a3a9fc650 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2672,9 +2672,13 @@ static struct extent_buffer *
__alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
unsigned long len)
{
+ const bool debug = IS_ENABLED(CONFIG_BTRFS_DEBUG);
struct extent_buffer *eb = NULL;
- eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
+ eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS |
+ (!debug ? __GFP_NOFAIL : 0));
+ if (!eb)
+ return NULL;
eb->start = start;
eb->len = len;
eb->fs_info = fs_info;
--
2.45.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 2/3] btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap()
2024-07-05 6:15 [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Qu Wenruo
2024-07-05 6:15 ` [PATCH 1/3] btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer() Qu Wenruo
@ 2024-07-05 6:15 ` Qu Wenruo
2024-07-05 6:15 ` [PATCH 3/3] btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array() Qu Wenruo
2024-07-05 14:55 ` [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Josef Bacik
3 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-05 6:15 UTC (permalink / raw)
To: linux-btrfs
The gfp_flag passed into filemap_add_folio() is utilized by two
locations:
- mem_cgroup_charge()
I'm not confident mem cgroup is going to handle the different
combinations of folio order and __GFP_NOFAIL that well.
In fact I'm already hitting various soft lockup in mem cgroup related
code during larger metadata folio tests.
- Xarray split
I believe this is mostly fine.
Although extent buffer allocation is very critical, we can still handle
the errors, the worst case is to abort the current transaction.
So to enable more testing and hopefully to provide a smooth transaction,
for CONFIG_BTRFS_DEBUG builds do not use __GFP_NOFAIL flag.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d43a3a9fc650..97c3f272fcaa 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2966,6 +2966,7 @@ static int check_eb_alignment(struct btrfs_fs_info *fs_info, u64 start)
* and @found_eb_ret would be updated.
* Return -EAGAIN if the filemap has an existing folio but with different size
* than @eb.
+ * Return -ENOMEM if the memory allocation failed.
* The caller needs to free the existing folios and retry using the same order.
*/
static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
@@ -2977,6 +2978,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
struct address_space *mapping = fs_info->btree_inode->i_mapping;
const unsigned long index = eb->start >> PAGE_SHIFT;
struct folio *existing_folio = NULL;
+ const bool debug = IS_ENABLED(CONFIG_BTRFS_DEBUG);
int ret;
ASSERT(found_eb_ret);
@@ -2986,10 +2988,13 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
retry:
ret = filemap_add_folio(mapping, eb->folios[i], index + i,
- GFP_NOFS | __GFP_NOFAIL);
+ GFP_NOFS | (!debug ? __GFP_NOFAIL : 0));
if (!ret)
goto finish;
+ if (unlikely(ret == -ENOMEM))
+ return ret;
+ ASSERT(ret == -EEXIST);
existing_folio = filemap_lock_folio(mapping, index + i);
/* The page cache only exists for a very short time, just retry. */
if (IS_ERR(existing_folio)) {
@@ -3126,6 +3131,8 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
ASSERT(existing_eb);
goto out;
}
+ if (unlikely(ret == -ENOMEM))
+ goto out;
/*
* TODO: Special handling for a corner case where the order of
--
2.45.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 3/3] btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array()
2024-07-05 6:15 [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Qu Wenruo
2024-07-05 6:15 ` [PATCH 1/3] btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer() Qu Wenruo
2024-07-05 6:15 ` [PATCH 2/3] btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap() Qu Wenruo
@ 2024-07-05 6:15 ` Qu Wenruo
2024-07-05 14:55 ` [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Josef Bacik
3 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-05 6:15 UTC (permalink / raw)
To: linux-btrfs
Since all callers of this function is already properly handling the
allocation error, and for the worst case we're just going to abort the
current transaction, I do no believe we need __GFP_NOFAIL here.
So to enable more testing and hopefully to provide a smooth transaction,
for CONFIG_BTRFS_DEBUG builds do not use __GFP_NOFAIL flag.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 97c3f272fcaa..dadf0da171bf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -695,7 +695,8 @@ int btrfs_alloc_folio_array(unsigned int nr_folios, struct folio **folio_array)
int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
bool nofail)
{
- const gfp_t gfp = nofail ? (GFP_NOFS | __GFP_NOFAIL) : GFP_NOFS;
+ const bool debug = IS_ENABLED(CONFIG_BTRFS_DEBUG);
+ const gfp_t gfp = GFP_NOFS | ((!debug && nofail) ? __GFP_NOFAIL : 0);
unsigned int allocated;
for (allocated = 0; allocated < nr_pages;) {
--
2.45.2
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds
2024-07-05 6:15 [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Qu Wenruo
` (2 preceding siblings ...)
2024-07-05 6:15 ` [PATCH 3/3] btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array() Qu Wenruo
@ 2024-07-05 14:55 ` Josef Bacik
2024-07-05 22:40 ` Qu Wenruo
2024-07-30 13:55 ` David Sterba
3 siblings, 2 replies; 10+ messages in thread
From: Josef Bacik @ 2024-07-05 14:55 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Jul 05, 2024 at 03:45:37PM +0930, Qu Wenruo wrote:
> This patchset removes all __GFP_NOFAIL flags usage inside btrfs for
> DEBUG builds.
>
> There are 3 call sites utilizing __GFP_NOFAIL:
>
> - __alloc_extent_buffer()
> It's for the extent_buffer structure allocation.
> All callers are already handling the errors.
>
> - attach_eb_folio_to_filemap()
> It's for the filemap_add_folio() call, the flag is also passed to mem
> cgroup, which I suspect is not handling larger folio and __GFP_NOFAIL
> correctly, as I'm hitting soft lockups when testing larger folios
>
> New error handling is added.
>
> - btrfs_alloc_folio_array()
> This is for page allocation for extent buffers.
> All callers are already handling the errors.
>
> Furthermore, to enable more testing while not affecting end users, the
> change is only implemented for DEBUG builds.
>
> Qu Wenruo (3):
> btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer()
> btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap()
> btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array()
The reason I want to leave NOFAIL is because in a cgroup memory constrained
environment we could get an errant ENOMEM on some sort of metadata operation,
which then gets turned into an aborted transaction. I don't want a memory
constrained cgroup flipping the whole file system read only because it got an
ENOMEM in a place where we have no choice but to abort the transaction.
If we could eliminate that possibility then hooray, but that's not actually
possible, because any COW for a multi-modification case (think finish ordered
io) could result in an ENOMEM and thus a transaction abort. We need to live
with NOFAIL for these cases. Thanks,
Josef
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds
2024-07-05 14:55 ` [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Josef Bacik
@ 2024-07-05 22:40 ` Qu Wenruo
2024-07-30 13:55 ` David Sterba
1 sibling, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-07-05 22:40 UTC (permalink / raw)
To: Josef Bacik, Qu Wenruo; +Cc: linux-btrfs
在 2024/7/6 00:25, Josef Bacik 写道:
> On Fri, Jul 05, 2024 at 03:45:37PM +0930, Qu Wenruo wrote:
>> This patchset removes all __GFP_NOFAIL flags usage inside btrfs for
>> DEBUG builds.
>>
>> There are 3 call sites utilizing __GFP_NOFAIL:
>>
>> - __alloc_extent_buffer()
>> It's for the extent_buffer structure allocation.
>> All callers are already handling the errors.
>>
>> - attach_eb_folio_to_filemap()
>> It's for the filemap_add_folio() call, the flag is also passed to mem
>> cgroup, which I suspect is not handling larger folio and __GFP_NOFAIL
>> correctly, as I'm hitting soft lockups when testing larger folios
>>
>> New error handling is added.
>>
>> - btrfs_alloc_folio_array()
>> This is for page allocation for extent buffers.
>> All callers are already handling the errors.
>>
>> Furthermore, to enable more testing while not affecting end users, the
>> change is only implemented for DEBUG builds.
>>
>> Qu Wenruo (3):
>> btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer()
>> btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap()
>> btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array()
>
> The reason I want to leave NOFAIL is because in a cgroup memory constrained
> environment we could get an errant ENOMEM on some sort of metadata operation,
> which then gets turned into an aborted transaction. I don't want a memory
> constrained cgroup flipping the whole file system read only because it got an
> ENOMEM in a place where we have no choice but to abort the transaction.
Well, I'm already seeing mem_cgroup_charge() soft locks up, possible due
to the _NOFAIL flag and larger folios at filemap_add_folio().
Thus the _NOFAIL flag is not only affecting page allocation, but also
passed to mem cgroup related code, and that can already be problematic.
>
> If we could eliminate that possibility then hooray, but that's not actually
> possible, because any COW for a multi-modification case (think finish ordered
> io) could result in an ENOMEM and thus a transaction abort. We need to live
> with NOFAIL for these cases. Thanks,
In that cgroup controlled case, I'm wondering how it would even handle
btrfs extent buffer allocation.
Wouldn't all the eb charged to certain cgroup meanwhile the eb pages are
a shared resource for all btrfs operations?
If so, I'd say the mem cgroup on btree inode is already problematic, and
we should eliminate the behavior other than using _NOFAIL to workaround it.
Thanks,
Qu
>
> Josef
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds
2024-07-05 14:55 ` [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds Josef Bacik
2024-07-05 22:40 ` Qu Wenruo
@ 2024-07-30 13:55 ` David Sterba
2024-07-30 22:17 ` Qu Wenruo
1 sibling, 1 reply; 10+ messages in thread
From: David Sterba @ 2024-07-30 13:55 UTC (permalink / raw)
To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs
On Fri, Jul 05, 2024 at 10:55:43AM -0400, Josef Bacik wrote:
> On Fri, Jul 05, 2024 at 03:45:37PM +0930, Qu Wenruo wrote:
> > This patchset removes all __GFP_NOFAIL flags usage inside btrfs for
> > DEBUG builds.
> >
> > There are 3 call sites utilizing __GFP_NOFAIL:
> >
> > - __alloc_extent_buffer()
> > It's for the extent_buffer structure allocation.
> > All callers are already handling the errors.
> >
> > - attach_eb_folio_to_filemap()
> > It's for the filemap_add_folio() call, the flag is also passed to mem
> > cgroup, which I suspect is not handling larger folio and __GFP_NOFAIL
> > correctly, as I'm hitting soft lockups when testing larger folios
> >
> > New error handling is added.
> >
> > - btrfs_alloc_folio_array()
> > This is for page allocation for extent buffers.
> > All callers are already handling the errors.
> >
> > Furthermore, to enable more testing while not affecting end users, the
> > change is only implemented for DEBUG builds.
> >
> > Qu Wenruo (3):
> > btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer()
> > btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap()
> > btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array()
>
> The reason I want to leave NOFAIL is because in a cgroup memory constrained
> environment we could get an errant ENOMEM on some sort of metadata operation,
> which then gets turned into an aborted transaction. I don't want a memory
> constrained cgroup flipping the whole file system read only because it got an
> ENOMEM in a place where we have no choice but to abort the transaction.
>
> If we could eliminate that possibility then hooray, but that's not actually
> possible, because any COW for a multi-modification case (think finish ordered
> io) could result in an ENOMEM and thus a transaction abort. We need to live
> with NOFAIL for these cases. Thanks,
I agree with keeping NOFAIL. Please add the above as a comment to
btrfs_alloc_folio_array().
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds
2024-07-30 13:55 ` David Sterba
@ 2024-07-30 22:17 ` Qu Wenruo
2024-07-31 17:13 ` David Sterba
0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2024-07-30 22:17 UTC (permalink / raw)
To: dsterba, Josef Bacik; +Cc: linux-btrfs
在 2024/7/30 23:25, David Sterba 写道:
> On Fri, Jul 05, 2024 at 10:55:43AM -0400, Josef Bacik wrote:
>> On Fri, Jul 05, 2024 at 03:45:37PM +0930, Qu Wenruo wrote:
>>> This patchset removes all __GFP_NOFAIL flags usage inside btrfs for
>>> DEBUG builds.
>>>
>>> There are 3 call sites utilizing __GFP_NOFAIL:
>>>
>>> - __alloc_extent_buffer()
>>> It's for the extent_buffer structure allocation.
>>> All callers are already handling the errors.
>>>
>>> - attach_eb_folio_to_filemap()
>>> It's for the filemap_add_folio() call, the flag is also passed to mem
>>> cgroup, which I suspect is not handling larger folio and __GFP_NOFAIL
>>> correctly, as I'm hitting soft lockups when testing larger folios
>>>
>>> New error handling is added.
>>>
>>> - btrfs_alloc_folio_array()
>>> This is for page allocation for extent buffers.
>>> All callers are already handling the errors.
>>>
>>> Furthermore, to enable more testing while not affecting end users, the
>>> change is only implemented for DEBUG builds.
>>>
>>> Qu Wenruo (3):
>>> btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer()
>>> btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap()
>>> btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array()
>>
>> The reason I want to leave NOFAIL is because in a cgroup memory constrained
>> environment we could get an errant ENOMEM on some sort of metadata operation,
>> which then gets turned into an aborted transaction. I don't want a memory
>> constrained cgroup flipping the whole file system read only because it got an
>> ENOMEM in a place where we have no choice but to abort the transaction.
>>
>> If we could eliminate that possibility then hooray, but that's not actually
>> possible, because any COW for a multi-modification case (think finish ordered
>> io) could result in an ENOMEM and thus a transaction abort. We need to live
>> with NOFAIL for these cases. Thanks,
>
> I agree with keeping NOFAIL. Please add the above as a comment to
> btrfs_alloc_folio_array().
That will soon no longer be a problem.
The cgroup guys are fine if certain inode should not be limited by mem
cgroup, so I already have patches to use root mem cgroup so that it will
not be charged at all.
https://lore.kernel.org/linux-mm/6a9ba2c8e70c7b5c4316404612f281a031f847da.1721384771.git.wqu@suse.com/
Furthermore, using NOFAIL just to workaround mem cgroup looks like an
abuse to me.
Thanks,
Qu
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds
2024-07-30 22:17 ` Qu Wenruo
@ 2024-07-31 17:13 ` David Sterba
2024-08-04 9:22 ` Qu Wenruo
0 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2024-07-31 17:13 UTC (permalink / raw)
To: Qu Wenruo; +Cc: dsterba, Josef Bacik, linux-btrfs
On Wed, Jul 31, 2024 at 07:47:51AM +0930, Qu Wenruo wrote:
>
>
> 在 2024/7/30 23:25, David Sterba 写道:
> > On Fri, Jul 05, 2024 at 10:55:43AM -0400, Josef Bacik wrote:
> >> On Fri, Jul 05, 2024 at 03:45:37PM +0930, Qu Wenruo wrote:
> >>> This patchset removes all __GFP_NOFAIL flags usage inside btrfs for
> >>> DEBUG builds.
> >>>
> >>> There are 3 call sites utilizing __GFP_NOFAIL:
> >>>
> >>> - __alloc_extent_buffer()
> >>> It's for the extent_buffer structure allocation.
> >>> All callers are already handling the errors.
> >>>
> >>> - attach_eb_folio_to_filemap()
> >>> It's for the filemap_add_folio() call, the flag is also passed to mem
> >>> cgroup, which I suspect is not handling larger folio and __GFP_NOFAIL
> >>> correctly, as I'm hitting soft lockups when testing larger folios
> >>>
> >>> New error handling is added.
> >>>
> >>> - btrfs_alloc_folio_array()
> >>> This is for page allocation for extent buffers.
> >>> All callers are already handling the errors.
> >>>
> >>> Furthermore, to enable more testing while not affecting end users, the
> >>> change is only implemented for DEBUG builds.
> >>>
> >>> Qu Wenruo (3):
> >>> btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer()
> >>> btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap()
> >>> btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array()
> >>
> >> The reason I want to leave NOFAIL is because in a cgroup memory constrained
> >> environment we could get an errant ENOMEM on some sort of metadata operation,
> >> which then gets turned into an aborted transaction. I don't want a memory
> >> constrained cgroup flipping the whole file system read only because it got an
> >> ENOMEM in a place where we have no choice but to abort the transaction.
> >>
> >> If we could eliminate that possibility then hooray, but that's not actually
> >> possible, because any COW for a multi-modification case (think finish ordered
> >> io) could result in an ENOMEM and thus a transaction abort. We need to live
> >> with NOFAIL for these cases. Thanks,
> >
> > I agree with keeping NOFAIL. Please add the above as a comment to
> > btrfs_alloc_folio_array().
>
> That will soon no longer be a problem.
>
> The cgroup guys are fine if certain inode should not be limited by mem
> cgroup, so I already have patches to use root mem cgroup so that it will
> not be charged at all.
>
> https://lore.kernel.org/linux-mm/6a9ba2c8e70c7b5c4316404612f281a031f847da.1721384771.git.wqu@suse.com/
>
> Furthermore, using NOFAIL just to workaround mem cgroup looks like an
> abuse to me.
It's not just because of cgroup, the nofail semantics for metadata
means that MM subsystem should try to get some memory instead of
failing, where the filesystem operation can wait a bit. What josef
described regarding transaction aborts applies in general.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] btrfs: remove __GFP_NOFAIL usage for debug builds
2024-07-31 17:13 ` David Sterba
@ 2024-08-04 9:22 ` Qu Wenruo
0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2024-08-04 9:22 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: Josef Bacik, linux-btrfs
在 2024/8/1 02:43, David Sterba 写道:
> On Wed, Jul 31, 2024 at 07:47:51AM +0930, Qu Wenruo wrote:
>>
>>
>> 在 2024/7/30 23:25, David Sterba 写道:
>>> On Fri, Jul 05, 2024 at 10:55:43AM -0400, Josef Bacik wrote:
>>>> On Fri, Jul 05, 2024 at 03:45:37PM +0930, Qu Wenruo wrote:
>>>>> This patchset removes all __GFP_NOFAIL flags usage inside btrfs for
>>>>> DEBUG builds.
>>>>>
>>>>> There are 3 call sites utilizing __GFP_NOFAIL:
>>>>>
>>>>> - __alloc_extent_buffer()
>>>>> It's for the extent_buffer structure allocation.
>>>>> All callers are already handling the errors.
>>>>>
>>>>> - attach_eb_folio_to_filemap()
>>>>> It's for the filemap_add_folio() call, the flag is also passed to mem
>>>>> cgroup, which I suspect is not handling larger folio and __GFP_NOFAIL
>>>>> correctly, as I'm hitting soft lockups when testing larger folios
>>>>>
>>>>> New error handling is added.
>>>>>
>>>>> - btrfs_alloc_folio_array()
>>>>> This is for page allocation for extent buffers.
>>>>> All callers are already handling the errors.
>>>>>
>>>>> Furthermore, to enable more testing while not affecting end users, the
>>>>> change is only implemented for DEBUG builds.
>>>>>
>>>>> Qu Wenruo (3):
>>>>> btrfs: do not use __GFP_NOFAIL flag for __alloc_extent_buffer()
>>>>> btrfs: do not use __GFP_NOFAIL flag for attach_eb_folio_to_filemap()
>>>>> btrfs: do not use __GFP_NOFAIL flag for btrfs_alloc_folio_array()
>>>>
>>>> The reason I want to leave NOFAIL is because in a cgroup memory constrained
>>>> environment we could get an errant ENOMEM on some sort of metadata operation,
>>>> which then gets turned into an aborted transaction. I don't want a memory
>>>> constrained cgroup flipping the whole file system read only because it got an
>>>> ENOMEM in a place where we have no choice but to abort the transaction.
>>>>
>>>> If we could eliminate that possibility then hooray, but that's not actually
>>>> possible, because any COW for a multi-modification case (think finish ordered
>>>> io) could result in an ENOMEM and thus a transaction abort. We need to live
>>>> with NOFAIL for these cases. Thanks,
>>>
>>> I agree with keeping NOFAIL. Please add the above as a comment to
>>> btrfs_alloc_folio_array().
>>
>> That will soon no longer be a problem.
>>
>> The cgroup guys are fine if certain inode should not be limited by mem
>> cgroup, so I already have patches to use root mem cgroup so that it will
>> not be charged at all.
>>
>> https://lore.kernel.org/linux-mm/6a9ba2c8e70c7b5c4316404612f281a031f847da.1721384771.git.wqu@suse.com/
>>
>> Furthermore, using NOFAIL just to workaround mem cgroup looks like an
>> abuse to me.
>
> It's not just because of cgroup, the nofail semantics for metadata
> means that MM subsystem should try to get some memory instead of
> failing, where the filesystem operation can wait a bit. What josef
> described regarding transaction aborts applies in general.
>
If you are only talking about the original patch, yes, we should not
remove NOFAIL for folio allocation, at least for now.
But this patchset is already a little outdated, the latest way to remove
NOFAIL is to only remove the usage for filemap_add_folio().
Filemap_add_folio() is only doing two things:
- Cgroup charge
This will be addressed by manually using root memcgroup, so that
we will never trigger cgroup charge anymore.
- Xarray split (aka new slot allocation)
This is only allocating some xarray slots, if that fails I do not
really believe NOFAIL can do anything much better.
So all your argument on the NOFAIL and metadata allocation no longer stands.
Because I'm not even going to touch folio allocation part (even I still
believe we should not go NOFAIL for metadata folio allocation).
Thanks,
Qu
^ permalink raw reply [flat|nested] 10+ messages in thread