Linux Btrfs filesystem development
From: David Sterba <dsterba@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: fix possible infinite loop in data async reclaim
Date: Wed, 26 Aug 2020 11:13:22 +0200
Message-ID: <20200826091322.GA28318@twin.jikos.cz>
In-Reply-To: <24f846bc8860cab91ca134d0a337cc290589a092.1598389008.git.josef@toxicpanda.com>

On Tue, Aug 25, 2020 at 04:56:59PM -0400, Josef Bacik wrote:
> Dave reported an issue where generic/102 would sometimes hang.  This
> turned out to be because we'd get into this spot where we were no longer
> making progress on data reservations because our exit condition was not
> met.  The logic is basically
> 
> while (!space_info->full && !list_empty(&space_info->tickets))
> 	flush_space(space_info, flush_state);
> 
> where flush state is our various flush states, but doesn't include
> ALLOC_CHUNK_FORCE.  This is because we actually lead with allocating
> chunks, and so the assumption was that once you got to the actual
> flushing states you could no longer allocate chunks.  This was a stupid
> assumption, because you could have deleted block groups that would be
> reclaimed by a transaction commit, thus unsetting space_info->full.
> This is essentially what happens with generic/102, and so sometimes
> you'd get stuck in the flushing loop because we weren't allocating
> chunks, but flushing space wasn't giving us what we needed to make
> progress.
> 
> Fix this by adding ALLOC_CHUNK_FORCE to the end of our flushing states,
> that way we will eventually bail out because we did end up with
> space_info->full if we free'd a chunk previously.  Otherwise, as is the
> case for this test, we'll allocate our chunk and continue on our happy
> merry way.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Thanks. As the flushing states are added one by one at the end of the
series, I'll add it as a separate patch. Folding it into another patch
would lose some of the information that's in the changelog. This leaves
a short window where the generic/102 hang could happen, but then again
the flushing sequence is not switched all at once.
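
For readers following the thread, the loop described in the quoted
changelog can be sketched as a small userspace model. This is not the
kernel code: every toy_* name, the three-state list, and the iteration
cap are assumptions made up for illustration. It only mirrors the quoted
"while (!full && tickets)" logic, starting from the point where a
transaction commit has already cleared the full flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified flush states; not the kernel's enum. */
enum toy_flush_state {
	TOY_FLUSH_DELALLOC,
	TOY_COMMIT_TRANS,
	TOY_ALLOC_CHUNK_FORCE,
};

/* Hypothetical stand-in for the fields the quoted loop looks at. */
struct toy_space_info {
	bool full;        /* no further chunk can be allocated */
	int tickets;      /* pending data reservation tickets */
	int free_chunks;  /* unallocated chunks left in this toy model */
};

static void toy_flush_space(struct toy_space_info *si,
			    enum toy_flush_state state)
{
	switch (state) {
	case TOY_FLUSH_DELALLOC:
	case TOY_COMMIT_TRANS:
		/* In the modelled scenario, flushing frees nothing. */
		break;
	case TOY_ALLOC_CHUNK_FORCE:
		if (si->free_chunks > 0) {
			/* A new chunk satisfies the waiting tickets. */
			si->free_chunks--;
			si->tickets = 0;
		} else {
			/* Nothing left to allocate: mark the space full. */
			si->full = true;
		}
		break;
	}
}

/*
 * Cycle through the given states while the quoted loop condition holds.
 * Returns the number of flush_space calls made, or -1 if max_iters is
 * reached, which stands in for the hang.
 */
int toy_data_reclaim(struct toy_space_info *si,
		     const enum toy_flush_state *states, size_t nr_states,
		     int max_iters)
{
	size_t i = 0;
	int iters = 0;

	assert(nr_states > 0);
	while (!si->full && si->tickets > 0) {
		if (iters == max_iters)
			return -1;
		toy_flush_space(si, states[i]);
		i = (i + 1) % nr_states;
		iters++;
	}
	return iters;
}
```

With a state list that lacks TOY_ALLOC_CHUNK_FORCE, the model never
leaves the loop and hits the cap. With that state appended, it either
allocates the chunk and clears the tickets, or sets full and bails out,
which are the two outcomes the changelog describes.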
