Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent_buffer() to allocate-then-attach method)
       [not found]   ` <71d723c9-8f36-4fd1-bea7-7d962da465e2@gmx.com>
@ 2023-11-27  5:10     ` Qu Wenruo
  2023-11-27 16:19       ` Josef Bacik
  2023-11-28 16:26       ` David Sterba
  0 siblings, 2 replies; 4+ messages in thread
From: Qu Wenruo @ 2023-11-27  5:10 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs, Linux FS Devel

On 2023/11/23 06:33, Qu Wenruo wrote:
[...]
>> I wonder if we still can keep the __GFP_NOFAIL for the fallback
>> allocation, it's there right now and seems to work on sysmtems under
>> stress and does not cause random failures due to ENOMEM.
>>
> Oh, I forgot the __NOFAIL gfp flags, that's not hard to fix, just
> re-introduce the gfp flags to btrfs_alloc_page_array().

BTW, I think it's a good time to start a new discussion on whether we
should go __GFP_NOFAIL.
(Although I have updated the patch to keep the GFP_NOFAIL behavior)

I totally understand that we need some memory for tree block during
transaction commitment and other critical sections.

And it's not that uncommon to see __GFP_NOFAIL usage in other mainstream
filesystems.

But my concern is, we also have a lot of memory allocation which can
lead to a lot of problems either, like btrfs_csum_one_bio() or even
join_transaction().

I doubt if btrfs (or any other filesystems) would be to blamed if we're
really running out of memory.
Should the memory hungry user space programs to be firstly killed far
before we failed to allocate memory?

Furthermore, at least for btrfs, I don't think we would hit a situation
where memory allocation failure for metadata would lead to any data
corruption.
The worst case is we hit transaction abort, and the fs flips RO.

Thus I'm wondering if we really need __NOFAIL for btrfs?

Thanks,
Qu
>
> Thanks,
> Qu
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent_buffer() to allocate-then-attach method)
  2023-11-27  5:10     ` Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent_buffer() to allocate-then-attach method) Qu Wenruo
@ 2023-11-27 16:19       ` Josef Bacik
  2023-11-28 16:26       ` David Sterba
  1 sibling, 0 replies; 4+ messages in thread
From: Josef Bacik @ 2023-11-27 16:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Linux FS Devel

On Mon, Nov 27, 2023 at 03:40:41PM +1030, Qu Wenruo wrote:
> On 2023/11/23 06:33, Qu Wenruo wrote:
> [...]
> > > I wonder if we still can keep the __GFP_NOFAIL for the fallback
> > > allocation, it's there right now and seems to work on sysmtems under
> > > stress and does not cause random failures due to ENOMEM.
> > > 
> > Oh, I forgot the __NOFAIL gfp flags, that's not hard to fix, just
> > re-introduce the gfp flags to btrfs_alloc_page_array().
> 
> BTW, I think it's a good time to start a new discussion on whether we
> should go __GFP_NOFAIL.
> (Although I have updated the patch to keep the GFP_NOFAIL behavior)
> 
> I totally understand that we need some memory for tree block during
> transaction commitment and other critical sections.
> 
> And it's not that uncommon to see __GFP_NOFAIL usage in other mainstream
> filesystems.
> 
> But my concern is, we also have a lot of memory allocation which can
> lead to a lot of problems either, like btrfs_csum_one_bio() or even
> join_transaction().
> 
> I doubt if btrfs (or any other filesystems) would be to blamed if we're
> really running out of memory.
> Should the memory hungry user space programs to be firstly killed far
> before we failed to allocate memory?
> 
> 
> Furthermore, at least for btrfs, I don't think we would hit a situation
> where memory allocation failure for metadata would lead to any data
> corruption.
> The worst case is we hit transaction abort, and the fs flips RO.
> 
> Thus I'm wondering if we really need __NOFAIL for btrfs?

It's my preference that we don't use GFP_NOFAIL at all, anywhere.  We don't
really have this option for some things, mostly lock_extent/unlock_extent, but
for extent buffers we check for errors here and generally can safely handle
ENOMEM in these cases.  I would prefer to drop it.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent_buffer() to allocate-then-attach method)
  2023-11-27  5:10     ` Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent_buffer() to allocate-then-attach method) Qu Wenruo
  2023-11-27 16:19       ` Josef Bacik
@ 2023-11-28 16:26       ` David Sterba
  2023-11-28 20:06         ` Qu Wenruo
  1 sibling, 1 reply; 4+ messages in thread
From: David Sterba @ 2023-11-28 16:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Linux FS Devel

On Mon, Nov 27, 2023 at 03:40:41PM +1030, Qu Wenruo wrote:
> On 2023/11/23 06:33, Qu Wenruo wrote:
> [...]
> >> I wonder if we still can keep the __GFP_NOFAIL for the fallback
> >> allocation, it's there right now and seems to work on sysmtems under
> >> stress and does not cause random failures due to ENOMEM.
> >>
> > Oh, I forgot the __NOFAIL gfp flags, that's not hard to fix, just
> > re-introduce the gfp flags to btrfs_alloc_page_array().
> 
> BTW, I think it's a good time to start a new discussion on whether we
> should go __GFP_NOFAIL.
> (Although I have updated the patch to keep the GFP_NOFAIL behavior)
> 
> I totally understand that we need some memory for tree block during
> transaction commitment and other critical sections.
> 
> And it's not that uncommon to see __GFP_NOFAIL usage in other mainstream
> filesystems.

The use of NOFAIL is either carefuly evaluated or it's there for
historical reasons. The comment for the flag says that,
https://elixir.bootlin.com/linux/latest/source/include/linux/gfp_types.h#L198
and I know MM people see the flag as problematic and that it should not
be used if possible.

> But my concern is, we also have a lot of memory allocation which can
> lead to a lot of problems either, like btrfs_csum_one_bio() or even
> join_transaction().

While I agree that there are many places that can fail due to memory
allocations, the extent buffer requires whole 4 pages, other structures
could be taken from the generic slabs or our named caches. The latter
has lower chance to fail.

> I doubt if btrfs (or any other filesystems) would be to blamed if we're
> really running out of memory.

Well, people blame btrfs for everything.

> Should the memory hungry user space programs to be firstly killed far
> before we failed to allocate memory?

That's up to the allocator and I think it does a good job of providing
the memory to kernel rather than to user space programs.

We do the critical allocations as GFP_NOFS which so far provides the "do
not fail" guarantees. It's a long going discussion,
https://lwn.net/Articles/653573/ (2015). We can let many allocations
fail with a fallback, but still a lot of them would lead to transaction
abort. And as Josef said, there are some that can't fail because they're
too deep or there's no clear exit path.

> Furthermore, at least for btrfs, I don't think we would hit a situation
> where memory allocation failure for metadata would lead to any data
> corruption.
> The worst case is we hit transaction abort, and the fs flips RO.

Yeah, corruption can't happen as long as we have all the error handling
in place and the transaction abort as the ultimate fallback.

> Thus I'm wondering if we really need __NOFAIL for btrfs?

It's hard to say if or when the NOFAIL semantics actually apply. Let's
say there are applications doing metadata operations, the system is
under load, memory is freed slowly by writing data etc. Application that
waits inside the eb allocation will continue eventually. Without the
NOFAIL it would exit early.

As a middle ground, we may want something like "try hard" that would not
fail too soon but it could eventually. That's __GFP_RETRY_MAYFAIL .

Right now there are several changes around the extent buffers, I'd like
do the conversion first and then replace/drop the NOFAIL flag so we
don't mix too many changes in one release. The extent buffers are
critical so one step a time, with lots of testing.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent_buffer() to allocate-then-attach method)
  2023-11-28 16:26       ` David Sterba
@ 2023-11-28 20:06         ` Qu Wenruo
  0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2023-11-28 20:06 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs, Linux FS Devel



On 2023/11/29 02:56, David Sterba wrote:
> On Mon, Nov 27, 2023 at 03:40:41PM +1030, Qu Wenruo wrote:
>> On 2023/11/23 06:33, Qu Wenruo wrote:
>> [...]
>>>> I wonder if we still can keep the __GFP_NOFAIL for the fallback
>>>> allocation, it's there right now and seems to work on sysmtems under
>>>> stress and does not cause random failures due to ENOMEM.
>>>>
>>> Oh, I forgot the __NOFAIL gfp flags, that's not hard to fix, just
>>> re-introduce the gfp flags to btrfs_alloc_page_array().
>>
>> BTW, I think it's a good time to start a new discussion on whether we
>> should go __GFP_NOFAIL.
>> (Although I have updated the patch to keep the GFP_NOFAIL behavior)
>>
>> I totally understand that we need some memory for tree block during
>> transaction commitment and other critical sections.
>>
>> And it's not that uncommon to see __GFP_NOFAIL usage in other mainstream
>> filesystems.
> 
> The use of NOFAIL is either carefuly evaluated or it's there for
> historical reasons. The comment for the flag says that,
> https://elixir.bootlin.com/linux/latest/source/include/linux/gfp_types.h#L198
> and I know MM people see the flag as problematic and that it should not
> be used if possible.
> 
>> But my concern is, we also have a lot of memory allocation which can
>> lead to a lot of problems either, like btrfs_csum_one_bio() or even
>> join_transaction().
> 
> While I agree that there are many places that can fail due to memory
> allocations, the extent buffer requires whole 4 pages, other structures
> could be taken from the generic slabs or our named caches. The latter
> has lower chance to fail.
> 
>> I doubt if btrfs (or any other filesystems) would be to blamed if we're
>> really running out of memory.
> 
> Well, people blame btrfs for everything.
> 
>> Should the memory hungry user space programs to be firstly killed far
>> before we failed to allocate memory?
> 
> That's up to the allocator and I think it does a good job of providing
> the memory to kernel rather than to user space programs.
> 
> We do the critical allocations as GFP_NOFS which so far provides the "do
> not fail" guarantees. It's a long going discussion,
> https://lwn.net/Articles/653573/ (2015). We can let many allocations
> fail with a fallback, but still a lot of them would lead to transaction
> abort. And as Josef said, there are some that can't fail because they're
> too deep or there's no clear exit path.

Yep, for those call sites (aka, extent io tree) we still need NOFAIL 
until we added error handling for all the call sites.

> 
>> Furthermore, at least for btrfs, I don't think we would hit a situation
>> where memory allocation failure for metadata would lead to any data
>> corruption.
>> The worst case is we hit transaction abort, and the fs flips RO.
> 
> Yeah, corruption can't happen as long as we have all the error handling
> in place and the transaction abort as the ultimate fallback.
> 
>> Thus I'm wondering if we really need __NOFAIL for btrfs?
> 
> It's hard to say if or when the NOFAIL semantics actually apply. Let's
> say there are applications doing metadata operations, the system is
> under load, memory is freed slowly by writing data etc. Application that
> waits inside the eb allocation will continue eventually. Without the
> NOFAIL it would exit early.
> 
> As a middle ground, we may want something like "try hard" that would not
> fail too soon but it could eventually. That's __GFP_RETRY_MAYFAIL .

This sounds good. Although I'd say the MM is already doing a too good 
job, thus I'm not sure if we even need the extra retry.

> 
> Right now there are several changes around the extent buffers, I'd like
> do the conversion first and then replace/drop the NOFAIL flag so we
> don't mix too many changes in one release. The extent buffers are
> critical so one step a time, with lots of testing.

This sounds very reasonable.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-11-28 20:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <ffeb6b667a9ff0cf161f7dcd82899114782c0834.1700609426.git.wqu@suse.com>
     [not found] ` <20231122143815.GD11264@twin.jikos.cz>
     [not found]   ` <71d723c9-8f36-4fd1-bea7-7d962da465e2@gmx.com>
2023-11-27  5:10     ` Should we still go __GFP_NOFAIL? (Was Re: [PATCH] btrfs: refactor alloc_extent_buffer() to allocate-then-attach method) Qu Wenruo
2023-11-27 16:19       ` Josef Bacik
2023-11-28 16:26       ` David Sterba
2023-11-28 20:06         ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).