From: Boris Burkov <boris@bur.io>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: fix race in subpage sync writeback handling
Date: Mon, 14 Apr 2025 12:52:48 -0700 [thread overview]
Message-ID: <20250414195248.GA3218113@zen.localdomain> (raw)
In-Reply-To: <ff2b56ecb70e4db53de11a019530c768a24f48f1.1744659146.git.josef@toxicpanda.com>
On Mon, Apr 14, 2025 at 03:32:34PM -0400, Josef Bacik wrote:
> While debugging a fs corruption with subpage we noticed a potential race
> in how we do tagging for writeback. Boris came up with the following
> diagram to describe the race potential
>
> EB1 EB2
> btree_write_cache_pages
> tag_pages_for_writeback
> filemap_get_folios_tag(TOWRITE)
> submit_eb_page
> submit_eb_subpage
> for eb in folio:
> write_one_eb
> set_extent_buffer_dirty
> btrfs_meta_folio_set_dirty
> ...
> filemap_fdatawrite_range
> btree_write_cache_pages
> tag_pages_for_writeback
> folio_lock
> btrfs_meta_folio_clear_dirty
> btrfs_meta_folio_set_writeback // clear TOWRITE
> folio_unlock
> filemap_get_folios_tag(TOWRITE) //missed
>
> The problem here is that we'll call folio_start_writeback() the first
> time we initiate writeback on one extent buffer, if we happened to dirty
> the extent buffer behind the one we were currently writing in the first
> task, and race in as described above, we would miss the TOWRITE tag as
> it would have been cleared, and we will never initiate writeback on that
> EB.
>
> This is kind of complicated for us, the best thing to do here is to
> simply leave the TOWRITE tag in place, and only clear it if we aren't
> dirty after we finish processing all the eb's in the folio.
>
> Fixes: c4aec299fa8f ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/extent_io.c | 23 +++++++++++++++++++++++
> fs/btrfs/subpage.c | 2 +-
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6cfd286b8bbc..5d09a47c1c28 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2063,6 +2063,29 @@ static int submit_eb_subpage(struct folio *folio, struct writeback_control *wbc)
> }
> free_extent_buffer(eb);
> }
> + /*
> + * Normally folio_start_writeback() will clear TAG_TOWRITE, but for
> + * subpage we use __folio_start_writeback(folio, true), which keeps it
> + * from clearing TOWRITE. This is because we walk the bitmap and
> + * process each eb one at a time, and then locking the folio when we
> + * process the eb. We could have somebody dirty behind us, and then
> + * subsequently mark this range as TOWRITE. In that case we must not
> + * clear TOWRITE or we will skip writing back the dirty folio.
> + *
> + * So here lock the folio, if it is clean we know we are done with it,
> + * and we can clear TOWRITE.
> + */
> + folio_lock(folio);
> + if (!folio_test_dirty(folio)) {
> + XA_STATE(xas, &folio->mapping->i_pages, folio_index(folio));
> + unsigned long flags;
> +
> + xas_lock_irqsave(&xas, flags);
> + xas_load(&xas);
> + xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
> + xas_unlock_irqrestore(&xas, flags);
> + }
> + folio_unlock(folio);
> return submitted;
> }
>
> diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
> index d4f019233493..53140a9dad9d 100644
> --- a/fs/btrfs/subpage.c
> +++ b/fs/btrfs/subpage.c
> @@ -454,7 +454,7 @@ void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
> spin_lock_irqsave(&subpage->lock, flags);
> bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
> if (!folio_test_writeback(folio))
> - folio_start_writeback(folio);
> + __folio_start_writeback(folio, true);
> spin_unlock_irqrestore(&subpage->lock, flags);
> }
>
> --
> 2.48.1
>
next prev parent reply other threads:[~2025-04-14 19:52 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-14 19:32 [PATCH] btrfs: fix race in subpage sync writeback handling Josef Bacik
2025-04-14 19:52 ` Boris Burkov [this message]
2025-04-14 21:45 ` Qu Wenruo
2025-04-15 0:40 ` Boris Burkov
2025-04-15 17:22 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250414195248.GA3218113@zen.localdomain \
--to=boris@bur.io \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox