public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: fix race in subpage sync writeback handling
Date: Mon, 14 Apr 2025 17:40:37 -0700	[thread overview]
Message-ID: <20250415004037.GC3218113@zen.localdomain> (raw)
In-Reply-To: <e13cf6fa-edd2-454f-8635-da8559b97ccc@gmx.com>

On Tue, Apr 15, 2025 at 07:15:42AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2025/4/15 05:02, Josef Bacik 写道:
> > While debugging a fs corruption with subpage we noticed a potential race
> > in how we do tagging for writeback.  Boris came up with the following
> > diagram to describe the race potential
> > 
> > EB1                                                       EB2
> > btree_write_cache_pages
> >    tag_pages_for_writeback
> >    filemap_get_folios_tag(TOWRITE)
> >    submit_eb_page
> >      submit_eb_subpage
> >        for eb in folio:
> >          write_one_eb
> >                                                            set_extent_buffer_dirty
> >                                                            btrfs_meta_folio_set_dirty
> >                                                            ...
> >                                                            filemap_fdatawrite_range
> >                                                              btree_write_cache_pages
> >                                                              tag_pages_for_writeback
> >            folio_lock
> >            btrfs_meta_folio_clear_dirty
> >            btrfs_meta_folio_set_writeback // clear TOWRITE
> >            folio_unlock
> >                                                              filemap_get_folios_tag(TOWRITE) //missed
> > 
> > The problem here is that we'll call folio_start_writeback() the first
> > time we initiate writeback on one extent buffer, if we happened to dirty
> > the extent buffer behind the one we were currently writing in the first
> > task, and race in as described above, we would miss the TOWRITE tag as
> > it would have been cleared, and we will never initiate writeback on that
> > EB.
> > 
> > This is kind of complicated for us, the best thing to do here is to
> > simply leave the TOWRITE tag in place, and only clear it if we aren't
> > dirty after we finish processing all the eb's in the folio.
> > 
> > Fixes: c4aec299fa8f ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >   fs/btrfs/extent_io.c | 23 +++++++++++++++++++++++
> >   fs/btrfs/subpage.c   |  2 +-
> >   2 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 6cfd286b8bbc..5d09a47c1c28 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2063,6 +2063,29 @@ static int submit_eb_subpage(struct folio *folio, struct writeback_control *wbc)
> >   		}
> >   		free_extent_buffer(eb);
> >   	}
> > +	/*
> > +	 * Normally folio_start_writeback() will clear TAG_TOWRITE, but for
> > +	 * subpage we use __folio_start_writeback(folio, true), which keeps it
> > +	 * from clearing TOWRITE.  This is because we walk the bitmap and
> > +	 * process each eb one at a time, and then locking the folio when we
> > +	 * process the eb.  We could have somebody dirty behind us, and then
> > +	 * subsequently mark this range as TOWRITE.  In that case we must not
> > +	 * clear TOWRITE or we will skip writing back the dirty folio.
> > +	 *
> > +	 * So here lock the folio, if it is clean we know we are done with it,
> > +	 * and we can clear TOWRITE.
> > +	 */
> > +	folio_lock(folio);
> > +	if (!folio_test_dirty(folio)) {
> > +		XA_STATE(xas, &folio->mapping->i_pages, folio_index(folio));
> > +		unsigned long flags;
> > +
> > +		xas_lock_irqsave(&xas, flags);
> > +		xas_load(&xas);
> > +		xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
> > +		xas_unlock_irqrestore(&xas, flags);
> > +	}
> > +	folio_unlock(folio);
> >   	return submitted;
> >   }
> > diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
> > index d4f019233493..53140a9dad9d 100644
> > --- a/fs/btrfs/subpage.c
> > +++ b/fs/btrfs/subpage.c
> > @@ -454,7 +454,7 @@ void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
> >   	spin_lock_irqsave(&subpage->lock, flags);
> >   	bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
> >   	if (!folio_test_writeback(folio))
> > -		folio_start_writeback(folio);
> > +		__folio_start_writeback(folio, true);
> 
> This looks a little dangerous to me, as for data writeback we rely on
> folio_start_writeback() to clear the TOWRITE tag before this change.

During an extended test of a branch with both fixes, I did in fact see a
deadlock that looked like an extent_write_cache_pages going re-doing
writeback on a page that should have had the mark cleared. I'm not 100%
sure it is this, since it took 60+ runs of the logwrites crash test to hit
the deadlock during the instrumented workload

Anyway, I will take a look at and test your other proposed fix in
Josef's other email.

Thanks,
Boris

> 
> Can we do a test on the dirty bitmap and only call clear TOWRITE tag when
> there is no dirty blocks?
> 
> (This also means, we must clear the folio dirty before start writeback,
> there are some exceptions that needs to be addressed)
> 
> By that, we do not need the manual tag handling in submit_eb_subpage().
> 
> Thanks,
> Qu
> 
> >   	spin_unlock_irqrestore(&subpage->lock, flags);
> >   }
> 

  reply	other threads:[~2025-04-15  0:40 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-14 19:32 [PATCH] btrfs: fix race in subpage sync writeback handling Josef Bacik
2025-04-14 19:52 ` Boris Burkov
2025-04-14 21:45 ` Qu Wenruo
2025-04-15  0:40   ` Boris Burkov [this message]
2025-04-15 17:22   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250415004037.GC3218113@zen.localdomain \
    --to=boris@bur.io \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox