* [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio
@ 2022-10-14 14:00 Josef Bacik
2022-10-14 14:00 ` [PATCH 1/3] btrfs: do not use GFP_ATOMIC in the read endio Josef Bacik
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Josef Bacik @ 2022-10-14 14:00 UTC
To: linux-btrfs, kernel-team
Hello,
As you can imagine we have workloads that don't behave super well sometimes, and
they'll OOM the box in a really spectacular fashion. Sometimes these trip the
BUG_ON(!prealloc) things inside of the extent io tree code.
We've talked about switching these allocations to mempools for a while, but
that's going to require some extra work. We can drastically reduce the
likelihood of failing these allocations by simply dropping the tree lock and
attempting to make the allocation with the original gfp_mask.
The main problem with that approach is we've been using GFP_ATOMIC in the endio
path for....reasons? I *think* the read endio work used to happen in IRQ
context, but it hasn't for at least a decade, and in fact if we get read
failures we do our failrec allocations with GFP_NOFS, so clearly GFP_ATOMIC
isn't really required in this path.
So kill the GFP_ATOMIC allocations in the endio path, which is where we see
these panics, and then change the extent io code to simply do the loop again if
it can't allocate the prealloc extent with GFP_ATOMIC so we can make the
allocation with the caller's gfp_mask.
This is perfectly safe, we'll drop the tree lock and loop around any time we
have to re-search the tree after modifying part of our range, we don't need to
hold the lock for our entire operation.
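To make that concrete, here is the shape of the flow after this series in
condensed pseudo-C (based on the patches below, with most of the tree
manipulation elided):

again:
	if (!prealloc) {
		/*
		 * Allocation failure is tolerated here because we may not
		 * end up needing the preallocated extent state at all.
		 */
		prealloc = alloc_extent_state(mask);
	}
	spin_lock(&tree->lock);
	/* ... search the tree, merge/set bits on whole states ... */
	if (state->start < start) {
		prealloc = alloc_extent_state_atomic(prealloc);
		if (!prealloc)		/* used to be BUG_ON(!prealloc) */
			goto search_again;
		err = split_state(tree, state, prealloc, start);
		/* ... */
	}
	/* ... more merge/split cases, each may goto search_again ... */
	spin_unlock(&tree->lock);
	return 0;

search_again:
	spin_unlock(&tree->lock);
	if (gfpflags_allow_blocking(mask))
		cond_resched();
	goto again;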
The only drawback here is that we could infinite loop if we can't make our
allocation. This is why a mempool would be the proper solution, as we can't
fail these allocations without bringing the box down, which is what we currently
do anyway.
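For reference, the mempool version would look something like this; a sketch
only, the pool name and minimum reserve size here are made up, and sizing
that reserve correctly is exactly the part that needs the extra work:

	/* One-time setup, e.g. at module init. */
	extent_state_pool = mempool_create_slab_pool(EXTENT_STATE_POOL_MIN_NR,
						     extent_state_cache);
	if (!extent_state_pool)
		return -ENOMEM;

	/*
	 * With a blocking mask, mempool_alloc() falls back to the emergency
	 * reserve and then sleeps waiting for a freed element instead of
	 * failing.
	 */
	state = mempool_alloc(extent_state_pool, GFP_NOFS);

	/* Freeing an element refills the reserve and wakes any waiters. */
	mempool_free(state, extent_state_pool);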
Josef
Josef Bacik (3):
btrfs: do not use GFP_ATOMIC in the read endio
btrfs: remove unlock_extent_atomic
btrfs: do not panic if we can't allocate a prealloc extent state
fs/btrfs/extent-io-tree.c | 22 ++++++++++++++--------
fs/btrfs/extent-io-tree.h | 7 -------
fs/btrfs/extent_io.c | 8 ++++----
3 files changed, 18 insertions(+), 19 deletions(-)
--
2.26.3
* [PATCH 1/3] btrfs: do not use GFP_ATOMIC in the read endio
2022-10-14 14:00 [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio Josef Bacik
@ 2022-10-14 14:00 ` Josef Bacik
2022-10-14 14:00 ` [PATCH 2/3] btrfs: remove unlock_extent_atomic Josef Bacik
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2022-10-14 14:00 UTC
To: linux-btrfs, kernel-team
We have done read endio in an async thread for a very, very long time,
which makes the use of GFP_ATOMIC and unlock_extent_atomic() unneeded in
our read endio path. We've noticed under heavy memory pressure in our
fleet that we can fail these allocations, and then often trip a
BUG_ON(!allocation), which isn't an ideal outcome. Begin to address
this by simply not using GFP_ATOMIC, which will allow us to do things
like actually allocate an extent state when doing
set_extent_bits(UPTODATE) in the endio handler.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/extent_io.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1eae68fbae21..53d6221d1110 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -900,9 +900,9 @@ static void end_sector_io(struct page *page, u64 offset, bool uptodate)
end_page_read(page, uptodate, offset, sectorsize);
if (uptodate)
set_extent_uptodate(&inode->io_tree, offset,
- offset + sectorsize - 1, &cached, GFP_ATOMIC);
- unlock_extent_atomic(&inode->io_tree, offset, offset + sectorsize - 1,
- &cached);
+ offset + sectorsize - 1, &cached, GFP_NOFS);
+ unlock_extent(&inode->io_tree, offset, offset + sectorsize - 1,
+ &cached);
}
static void submit_data_read_repair(struct inode *inode,
@@ -1106,7 +1106,7 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
* Now we don't have range contiguous to the processed range, release
* the processed range now.
*/
- unlock_extent_atomic(tree, processed->start, processed->end, &cached);
+ unlock_extent(tree, processed->start, processed->end, &cached);
update:
/* Update processed to current range */
--
2.26.3
* [PATCH 2/3] btrfs: remove unlock_extent_atomic
2022-10-14 14:00 [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio Josef Bacik
2022-10-14 14:00 ` [PATCH 1/3] btrfs: do not use GFP_ATOMIC in the read endio Josef Bacik
@ 2022-10-14 14:00 ` Josef Bacik
2022-10-14 14:00 ` [PATCH 3/3] btrfs: do not panic if we can't allocate a prealloc extent state Josef Bacik
2022-10-17 14:25 ` [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio David Sterba
3 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2022-10-14 14:00 UTC
To: linux-btrfs, kernel-team
As of "btrfs: do not use GFP_ATOMIC in the read endio" we no longer have
any users of unlock_extent_atomic(), so remove it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/extent-io-tree.h | 7 -------
1 file changed, 7 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index a855f40dd61d..8187f3360056 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -139,13 +139,6 @@ static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end,
GFP_NOFS, NULL);
}
-static inline int unlock_extent_atomic(struct extent_io_tree *tree, u64 start,
- u64 end, struct extent_state **cached)
-{
- return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached,
- GFP_ATOMIC, NULL);
-}
-
static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
u64 end, u32 bits)
{
--
2.26.3
* [PATCH 3/3] btrfs: do not panic if we can't allocate a prealloc extent state
2022-10-14 14:00 [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio Josef Bacik
2022-10-14 14:00 ` [PATCH 1/3] btrfs: do not use GFP_ATOMIC in the read endio Josef Bacik
2022-10-14 14:00 ` [PATCH 2/3] btrfs: remove unlock_extent_atomic Josef Bacik
@ 2022-10-14 14:00 ` Josef Bacik
2022-10-18 12:52 ` David Sterba
2022-10-17 14:25 ` [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio David Sterba
3 siblings, 1 reply; 9+ messages in thread
From: Josef Bacik @ 2022-10-14 14:00 UTC
To: linux-btrfs, kernel-team
We sometimes have to allocate new extent states when clearing or setting
new bits in an extent io tree. Generally we preallocate this before taking
the tree spin lock, but sometimes the preallocated state gets used up, and
then we have to attempt a GFP_ATOMIC allocation under the lock.
Unfortunately sometimes this fails, and then we hit the BUG_ON() and
bring the box down. This happens roughly 20 times a week in our fleet.
However the vast majority of callers use GFP_NOFS, which means that if
this GFP_ATOMIC allocation fails, we could simply drop the spin lock, go
back and allocate a new extent state with our given gfp mask, and begin
again from where we left off.
The remaining callers that do not use GFP_NOFS are generally using
GFP_NOWAIT, which still allows for some reclaim. So allow these allocations
to be attempted outside of the spin lock so we don't need to rely on
GFP_ATOMIC allocations.
This in essence creates an infinite loop for anything that isn't
GFP_NOFS. To address this we will want to migrate to using mempools for
extent states so that we will always have emergency reserves in order to
make our allocations.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/extent-io-tree.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 618275af19c4..6ad09ba28aae 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -572,7 +572,7 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (bits & (EXTENT_LOCKED | EXTENT_BOUNDARY))
clear = 1;
again:
- if (!prealloc && gfpflags_allow_blocking(mask)) {
+ if (!prealloc) {
/*
* Don't care for allocation failure here because we might end
* up not needing the pre-allocated extent state at all, which
@@ -636,7 +636,8 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
if (state->start < start) {
prealloc = alloc_extent_state_atomic(prealloc);
- BUG_ON(!prealloc);
+ if (!prealloc)
+ goto search_again;
err = split_state(tree, state, prealloc, start);
if (err)
extent_io_tree_panic(tree, err);
@@ -657,7 +658,8 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
*/
if (state->start <= end && state->end > end) {
prealloc = alloc_extent_state_atomic(prealloc);
- BUG_ON(!prealloc);
+ if (!prealloc)
+ goto search_again;
err = split_state(tree, state, prealloc, end + 1);
if (err)
extent_io_tree_panic(tree, err);
@@ -966,7 +968,7 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
else
ASSERT(failed_start == NULL);
again:
- if (!prealloc && gfpflags_allow_blocking(mask)) {
+ if (!prealloc) {
/*
* Don't care for allocation failure here because we might end
* up not needing the pre-allocated extent state at all, which
@@ -991,7 +993,8 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
state = tree_search_for_insert(tree, start, &p, &parent);
if (!state) {
prealloc = alloc_extent_state_atomic(prealloc);
- BUG_ON(!prealloc);
+ if (!prealloc)
+ goto search_again;
prealloc->start = start;
prealloc->end = end;
insert_state_fast(tree, prealloc, p, parent, bits, changeset);
@@ -1062,7 +1065,8 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
}
prealloc = alloc_extent_state_atomic(prealloc);
- BUG_ON(!prealloc);
+ if (!prealloc)
+ goto search_again;
err = split_state(tree, state, prealloc, start);
if (err)
extent_io_tree_panic(tree, err);
@@ -1099,7 +1103,8 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
this_end = last_start - 1;
prealloc = alloc_extent_state_atomic(prealloc);
- BUG_ON(!prealloc);
+ if (!prealloc)
+ goto search_again;
/*
* Avoid to free 'prealloc' if it can be merged with the later
@@ -1130,7 +1135,8 @@ static int __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
}
prealloc = alloc_extent_state_atomic(prealloc);
- BUG_ON(!prealloc);
+ if (!prealloc)
+ goto search_again;
err = split_state(tree, state, prealloc, end + 1);
if (err)
extent_io_tree_panic(tree, err);
--
2.26.3
* Re: [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio
2022-10-14 14:00 [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio Josef Bacik
` (2 preceding siblings ...)
2022-10-14 14:00 ` [PATCH 3/3] btrfs: do not panic if we can't allocate a prealloc extent state Josef Bacik
@ 2022-10-17 14:25 ` David Sterba
2022-10-17 18:08 ` Josef Bacik
3 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2022-10-17 14:25 UTC
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Fri, Oct 14, 2022 at 10:00:38AM -0400, Josef Bacik wrote:
> Hello,
>
> As you can imagine we have workloads that don't behave super well sometimes, and
> they'll OOM the box in a really spectacular fashion. Sometimes these trip the
> BUG_ON(!prealloc) things inside of the extent io tree code.
>
> We've talked about switching these allocations to mempools for a while, but
> that's going to require some extra work. We can drastically reduce the
> likelihood of failing these allocations by simply dropping the tree lock and
> attempting to make the allocation with the original gfp_mask.
>
> The main problem with that approach is we've been using GFP_ATOMIC in the endio
> path for....reasons? I *think* the read endio work used to happen in IRQ
> context, but it hasn't for at least a decade, and in fact if we get read
> failures we do our failrec allocations with GFP_NOFS, so clearly GFP_ATOMIC
> isn't really required in this path.
To my possibly dated knowledge, endio is done in irq context, so we need to
verify that. I did a quick check in block/ but the bare bio->end_io() is not
called under obvious irq protection (spin lock or local_irq save/restore),
though I could be mistaken due to the maze of the block layer.
> So kill the GFP_ATOMIC allocations in the endio path, which is where we see
> these panics, and then change the extent io code to simply do the loop again if
> it can't allocate the prealloc extent with GFP_ATOMIC so we can make the
> allocation with the caller's gfp_mask.
>
> This is perfectly safe, we'll drop the tree lock and loop around any time we
> have to re-search the tree after modifying part of our range, we don't need to
> hold the lock for our entire operation.
>
> The only drawback here is that we could infinite loop if we can't make our
> allocation. This is why a mempool would be the proper solution, as we can't
> fail these allocations without bringing the box down, which is what we currently
> do anyway.
Aren't the mempools just shifting the possibly infinite loop one layer
down? With the added bonus of creating indirect dependencies between the
allocating and freeing threads.
* Re: [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio
2022-10-17 14:25 ` [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio David Sterba
@ 2022-10-17 18:08 ` Josef Bacik
2022-10-18 12:42 ` David Sterba
0 siblings, 1 reply; 9+ messages in thread
From: Josef Bacik @ 2022-10-17 18:08 UTC
To: David Sterba; +Cc: linux-btrfs, kernel-team
On Mon, Oct 17, 2022 at 04:25:16PM +0200, David Sterba wrote:
> On Fri, Oct 14, 2022 at 10:00:38AM -0400, Josef Bacik wrote:
> > Hello,
> >
> > As you can imagine we have workloads that don't behave super well sometimes, and
> > they'll OOM the box in a really spectacular fashion. Sometimes these trip the
> > BUG_ON(!prealloc) things inside of the extent io tree code.
> >
> > We've talked about switching these allocations to mempools for a while, but
> > that's going to require some extra work. We can drastically reduce the
> > likelihood of failing these allocations by simply dropping the tree lock and
> > attempting to make the allocation with the original gfp_mask.
> >
> > The main problem with that approach is we've been using GFP_ATOMIC in the endio
> > path for....reasons? I *think* the read endio work used to happen in IRQ
> > context, but it hasn't for at least a decade, and in fact if we get read
> > failures we do our failrec allocations with GFP_NOFS, so clearly GFP_ATOMIC
> > isn't really required in this path.
>
> To my possibly dated knowledge, endio is done in irq context, so we need to
> verify that. I did a quick check in block/ but the bare bio->end_io() is not
> called under obvious irq protection (spin lock or local_irq save/restore),
> though I could be mistaken due to the maze of the block layer.
>
I went through and read all the code; every path that does a REQ_READ does
the actual endio work in an async worker, and only some of the write path
happens in IRQ context. Additionally we've been allocating failrecs in this
context for years, so if it was actually happening in IRQ context we would
have noticed by now. I definitely went and looked, though, because I was
super confused.
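For anybody else double checking, the deferral is the usual pattern of an
end_io handler that only queues work, with the real processing happening
later in process context where GFP_NOFS is safe. A simplified sketch, the
ctx and workqueue names here are made up and not the actual btrfs ones:

	/* Runs later in process context, so blocking allocations are fine. */
	static void read_endio_worker(struct work_struct *work)
	{
		struct read_ctx *ctx = container_of(work, struct read_ctx, work);

		/* csum verification, set_extent_uptodate(), unlock_extent(), ... */
	}

	/* May run in irq context: do nothing but hand off to the worker. */
	static void read_end_io(struct bio *bio)
	{
		struct read_ctx *ctx = bio->bi_private;

		INIT_WORK(&ctx->work, read_endio_worker);
		queue_work(read_endio_wq, &ctx->work);
	}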
> > So kill the GFP_ATOMIC allocations in the endio path, which is where we see
> > these panics, and then change the extent io code to simply do the loop again if
> > it can't allocate the prealloc extent with GFP_ATOMIC so we can make the
> > allocation with the caller's gfp_mask.
> >
> > This is perfectly safe, we'll drop the tree lock and loop around any time we
> > have to re-search the tree after modifying part of our range, we don't need to
> > hold the lock for our entire operation.
> >
> > The only drawback here is that we could infinite loop if we can't make our
> > allocation. This is why a mempool would be the proper solution, as we can't
> > fail these allocations without bringing the box down, which is what we currently
> > do anyway.
>
> Aren't the mempools just shifting the possibly infinite loop one layer
> down? With the added bonus of creating indirect dependencies between the
> allocating and freeing threads.
bios use mempools for the same reason: the emergency reserve exists so that
we are always able to make our allocations. Clearly we could still end up in
a bad situation if we exhaust the emergency reserve, but the extent states
in this particular case don't get allocated in huge numbers. Thanks,
Josef
* Re: [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio
2022-10-17 18:08 ` Josef Bacik
@ 2022-10-18 12:42 ` David Sterba
2022-10-18 14:26 ` Josef Bacik
0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2022-10-18 12:42 UTC
To: Josef Bacik; +Cc: David Sterba, linux-btrfs, kernel-team
On Mon, Oct 17, 2022 at 02:08:02PM -0400, Josef Bacik wrote:
> On Mon, Oct 17, 2022 at 04:25:16PM +0200, David Sterba wrote:
> > >
> > > This is perfectly safe, we'll drop the tree lock and loop around any time we
> > > have to re-search the tree after modifying part of our range, we don't need to
> > > hold the lock for our entire operation.
> > >
> > > The only drawback here is that we could infinite loop if we can't make our
> > > allocation. This is why a mempool would be the proper solution, as we can't
> > > fail these allocations without bringing the box down, which is what we currently
> > > do anyway.
> >
> > Aren't the mempools just shifting the possibly infinite loop one layer
> > down? With the added bonus of creating indirect dependencies between the
> > allocating and freeing threads.
>
> bios use mempools for the same reason: the emergency reserve exists so that
> we are always able to make our allocations. Clearly we could still end up in
> a bad situation if we exhaust the emergency reserve, but the extent states
> in this particular case don't get allocated in huge numbers. Thanks,
I think that bios are the only thing that works with mempools reliably,
because they satisfy the guaranteed forward progress requirement. Otherwise
the indirect dependencies lead to lockups in the allocation, which is
equivalent to the potentially infinite looping. The emergency reserve is
another finite resource, so it can get exhausted eventually, and it's not
scaled to the number of potential requests that could hit the same code
path and compete for the memory. It's true we'd be dealing with a system in
a bad state, and depending on other subsystems makes it less predictable.
The simplest options are to wait or to exit with an error.
Anyway, the mempools might work in this case, but I'm skeptical, so I'd be
very interested in some kind of proof that we understand the edge cases.
* Re: [PATCH 3/3] btrfs: do not panic if we can't allocate a prealloc extent state
2022-10-14 14:00 ` [PATCH 3/3] btrfs: do not panic if we can't allocate a prealloc extent state Josef Bacik
@ 2022-10-18 12:52 ` David Sterba
0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2022-10-18 12:52 UTC
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Fri, Oct 14, 2022 at 10:00:41AM -0400, Josef Bacik wrote:
> We sometimes have to allocate new extent states when clearing or setting
> new bits in an extent io tree. Generally we preallocate this before taking
> the tree spin lock, but sometimes the preallocated state gets used up, and
> then we have to attempt a GFP_ATOMIC allocation under the lock.
>
> Unfortunately sometimes this fails, and then we hit the BUG_ON() and
> bring the box down. This happens roughly 20 times a week in our fleet.
>
> However the vast majority of callers use GFP_NOFS, which means that if
> this GFP_ATOMIC allocation fails, we could simply drop the spin lock, go
> back and allocate a new extent state with our given gfp mask, and begin
> again from where we left off.
>
> The remaining callers that do not use GFP_NOFS are generally using
> GFP_NOWAIT, which still allows for some reclaim. So allow these allocations
> to be attempted outside of the spin lock so we don't need to rely on
> GFP_ATOMIC allocations.
>
> This in essence creates an infinite loop for anything that isn't
> GFP_NOFS. To address this we will want to migrate to using mempools for
> extent states so that we will always have emergency reserves in order to
> make our allocations.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/btrfs/extent-io-tree.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
> index 618275af19c4..6ad09ba28aae 100644
> --- a/fs/btrfs/extent-io-tree.c
> +++ b/fs/btrfs/extent-io-tree.c
> @@ -572,7 +572,7 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
> if (bits & (EXTENT_LOCKED | EXTENT_BOUNDARY))
> clear = 1;
> again:
> - if (!prealloc && gfpflags_allow_blocking(mask)) {
There's another call to gfpflags_allow_blocking(mask) at the end of the
loop before cond_resched(); if we only ever pass GFP_NOFS and GFP_NOWAIT,
can we assume blocking is always right? Eventually we can put an assert
there. On top of that, the mask argument can be refined to 'bool wait' and
we can drop passing GFP_NOFS everywhere in the set_extent_bit* helpers.
That's for a separate patch, but I think we should continue with the
cleanups here as this patchset enables a few more.
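Something in this direction, as a rough and untested sketch of the
suggested change:

	int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
			       u32 bits, struct extent_state **cached_state,
			       bool wait, struct extent_changeset *changeset)
	{
		/* Only two masks are possible now, so derive instead of passing. */
		gfp_t mask = wait ? GFP_NOFS : GFP_NOWAIT;
		...
	}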
* Re: [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio
2022-10-18 12:42 ` David Sterba
@ 2022-10-18 14:26 ` Josef Bacik
0 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2022-10-18 14:26 UTC
To: David Sterba; +Cc: linux-btrfs, kernel-team
On Tue, Oct 18, 2022 at 02:42:29PM +0200, David Sterba wrote:
> On Mon, Oct 17, 2022 at 02:08:02PM -0400, Josef Bacik wrote:
> > On Mon, Oct 17, 2022 at 04:25:16PM +0200, David Sterba wrote:
> > > >
> > > > This is perfectly safe, we'll drop the tree lock and loop around any time we
> > > > have to re-search the tree after modifying part of our range, we don't need to
> > > > hold the lock for our entire operation.
> > > >
> > > > The only drawback here is that we could infinite loop if we can't make our
> > > > allocation. This is why a mempool would be the proper solution, as we can't
> > > > fail these allocations without bringing the box down, which is what we currently
> > > > do anyway.
> > >
> > > Aren't the mempools just shifting the possibly infinite loop one layer
> > > down? With the added bonus of creating indirect dependencies between the
> > > allocating and freeing threads.
> >
> > bios use mempools for the same reason: the emergency reserve exists so that
> > we are always able to make our allocations. Clearly we could still end up in
> > a bad situation if we exhaust the emergency reserve, but the extent states
> > in this particular case don't get allocated in huge numbers. Thanks,
>
> I think that bios are the only thing that works with mempools reliably
> because it satisfies the guaranteed forward progress. Otherwise the
> indirect dependenices lead to lockups in the allocation, which is
> equivalent to the potentially infinite looping. The emergency reserve is
> another finite resource so it can get exhausted eventually, and it's
> not scaled to the number of potential requests that could hit the same
> code path and competing for the memory. It's true we'd be dealing with a
> system in a bad state and depending on another subsystems make it less
> predictable. The simplest options are wait or exit with error.
>
Yup, this is why I didn't do the work now, because I don't want to use an
emergency reserve for every extent io tree operation; otherwise we're going
to run into trouble.
I only want to do it in the case of the endio handlers, since that is for
forward progress, and for EXTENT_LOCKED, for the same reason. However I
don't know how it would look to have mempools where we allow failures in
some cases but use the emergency reserve for others. It's an area for
future investigation; for now this is a step in the right direction. Thanks,
Josef