From: Josef Bacik <josef@toxicpanda.com>
To: David Sterba <dsterba@suse.cz>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio
Date: Mon, 17 Oct 2022 14:08:02 -0400
Message-ID: <Y02aAoQDtAoit8xL@localhost.localdomain>
In-Reply-To: <20221017142516.GQ13389@twin.jikos.cz>

On Mon, Oct 17, 2022 at 04:25:16PM +0200, David Sterba wrote:
> On Fri, Oct 14, 2022 at 10:00:38AM -0400, Josef Bacik wrote:
> > Hello,
> > 
> > As you can imagine we have workloads that don't behave super well sometimes, and
> > they'll OOM the box in a really spectacular fashion.  Sometimes these trip the
> > BUG_ON(!prealloc) things inside of the extent io tree code.
> > 
> > We've talked about switching these allocations to mempools for a while, but
> > that's going to require some extra work.  We can drastically reduce the
> > likelihood of failing these allocations by simply dropping the tree lock and
> > attempting to make the allocation with the original gfp_mask.
> > 
> > The main problem with that approach is we've been using GFP_ATOMIC in the endio
> > path for....reasons?  I *think* the read endio work used to happen in IRQ
> > context, but it hasn't for at least a decade, and in fact if we get read
> > failures we do our failrec allocations with GFP_NOFS, so clearly GFP_ATOMIC
> > isn't really required in this path.
> 
> To my possibly dated knowledge, endio is done in irq context, so we
> need to verify that. I did a quick check in block/ and the bare bio->end_io()
> is not called under obvious irq protection (spin lock or local_irq
> save/restore), but I could be mistaken due to the maze of block layer.
> 

I went through and read all the code: every path that does a REQ_READ does the
actual endio work in an async worker; only some of the write path happens in IRQ
context.  Additionally, we've been allocating failrecs in this context for
years, so if it was actually happening in IRQ context we would have noticed by
now.  I definitely went and looked though, because I was super confused.

> > So kill the GFP_ATOMIC allocations in the endio path, which is where we see
> > these panics, and then change the extent io code to simply do the loop again if
> > it can't allocate the prealloc extent with GFP_ATOMIC so we can make the
> > allocation with the caller's gfp_mask.
> > 
> > This is perfectly safe, we'll drop the tree lock and loop around any time we
> > have to re-search the tree after modifying part of our range, we don't need to
> > hold the lock for our entire operation.
> > 
> > The only drawback here is that we could infinite loop if we can't make our
> > allocation.  This is why a mempool would be the proper solution, as we can't
> > fail these allocations without bringing the box down, which is what we currently
> > do anyway.
> 
> Aren't the mempools shifting the possibly infinite loop one layer down
> only? With some added bonus of creating indirect dependencies of the
> allocating and freeing threads.

Bios use mempools for the same reason: the emergency reserve exists so that we
always are able to make our allocations.  Clearly we could still end up in a bad
situation if we exhaust the emergency reserve, but the extent states in this
particular case don't get allocated a bunch.  Thanks,

Josef
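
For readers following along, the retry approach described in the quoted cover
letter can be sketched in userspace C. This is only an illustration, not the
btrfs code: struct tree, struct state, alloc_nonblocking(), alloc_blocking()
and set_range() are hypothetical names, a pthread mutex stands in for the tree
lock, and the atomic_failures counter is a made-up knob that forces the
non-blocking allocation to fail.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from btrfs. */
struct state { int start, end; };

struct tree {
	pthread_mutex_t lock;
	struct state *slot;	/* the single entry this toy "tree" holds */
	int atomic_failures;	/* test knob: fail this many non-blocking allocs */
	int retries;		/* times we dropped the lock and looped */
};

/* Models an allocation that must not sleep and may fail (GFP_ATOMIC-like). */
static struct state *alloc_nonblocking(struct tree *t)
{
	if (t->atomic_failures > 0) {
		t->atomic_failures--;
		return NULL;
	}
	return malloc(sizeof(struct state));
}

/* Models an allocation made with the caller's mask, which may sleep. */
static struct state *alloc_blocking(void)
{
	return malloc(sizeof(struct state));
}

static void tree_init(struct tree *t, int atomic_failures)
{
	pthread_mutex_init(&t->lock, NULL);
	t->slot = NULL;
	t->atomic_failures = atomic_failures;
	t->retries = 0;
}

/* Record [start, end] in the tree, retrying until an entry is allocated. */
static void set_range(struct tree *t, int start, int end)
{
	struct state *prealloc = NULL;

	assert(start <= end);
again:
	pthread_mutex_lock(&t->lock);
	if (!prealloc)
		prealloc = alloc_nonblocking(t);
	if (!prealloc) {
		/*
		 * Instead of giving up, drop the lock, allocate without the
		 * non-blocking restriction, and redo the operation.  If this
		 * allocation also fails we simply loop again, which is the
		 * potential infinite loop mentioned in the cover letter.
		 */
		t->retries++;
		pthread_mutex_unlock(&t->lock);
		prealloc = alloc_blocking();
		goto again;
	}
	prealloc->start = start;
	prealloc->end = end;
	free(t->slot);		/* replace whatever the toy tree held before */
	t->slot = prealloc;
	pthread_mutex_unlock(&t->lock);
}
```

Initializing a tree with tree_init(&t, 1) and calling set_range(&t, 0, 4095)
takes the retry path exactly once; later calls allocate under the lock without
retrying.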


Thread overview: 9+ messages
2022-10-14 14:00 [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio Josef Bacik
2022-10-14 14:00 ` [PATCH 1/3] btrfs: do not use GFP_ATOMIC in the read endio Josef Bacik
2022-10-14 14:00 ` [PATCH 2/3] btrfs: remove unlock_extent_atomic Josef Bacik
2022-10-14 14:00 ` [PATCH 3/3] btrfs: do not panic if we can't allocate a prealloc extent state Josef Bacik
2022-10-18 12:52   ` David Sterba
2022-10-17 14:25 ` [PATCH 0/3] btrfs: avoid GFP_ATOMIC allocation failures during endio David Sterba
2022-10-17 18:08   ` Josef Bacik [this message]
2022-10-18 12:42     ` David Sterba
2022-10-18 14:26       ` Josef Bacik
