public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: fix race in subpage sync writeback handling
Date: Tue, 15 Apr 2025 07:15:42 +0930	[thread overview]
Message-ID: <e13cf6fa-edd2-454f-8635-da8559b97ccc@gmx.com> (raw)
In-Reply-To: <ff2b56ecb70e4db53de11a019530c768a24f48f1.1744659146.git.josef@toxicpanda.com>



在 2025/4/15 05:02, Josef Bacik 写道:
> While debugging a fs corruption with subpage we noticed a potential race
> in how we do tagging for writeback.  Boris came up with the following
> diagram to describe the race potential
> 
> EB1                                                       EB2
> btree_write_cache_pages
>    tag_pages_for_writeback
>    filemap_get_folios_tag(TOWRITE)
>    submit_eb_page
>      submit_eb_subpage
>        for eb in folio:
>          write_one_eb
>                                                            set_extent_buffer_dirty
>                                                            btrfs_meta_folio_set_dirty
>                                                            ...
>                                                            filemap_fdatawrite_range
>                                                              btree_write_cache_pages
>                                                              tag_pages_for_writeback
>            folio_lock
>            btrfs_meta_folio_clear_dirty
>            btrfs_meta_folio_set_writeback // clear TOWRITE
>            folio_unlock
>                                                              filemap_get_folios_tag(TOWRITE) //missed
> 
> The problem here is that we'll call folio_start_writeback() the first
> time we initiate writeback on one extent buffer, if we happened to dirty
> the extent buffer behind the one we were currently writing in the first
> task, and race in as described above, we would miss the TOWRITE tag as
> it would have been cleared, and we will never initiate writeback on that
> EB.
> 
> This is kind of complicated for us, the best thing to do here is to
> simply leave the TOWRITE tag in place, and only clear it if we aren't
> dirty after we finish processing all the eb's in the folio.
> 
> Fixes: c4aec299fa8f ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/extent_io.c | 23 +++++++++++++++++++++++
>   fs/btrfs/subpage.c   |  2 +-
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6cfd286b8bbc..5d09a47c1c28 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2063,6 +2063,29 @@ static int submit_eb_subpage(struct folio *folio, struct writeback_control *wbc)
>   		}
>   		free_extent_buffer(eb);
>   	}
> +	/*
> +	 * Normally folio_start_writeback() will clear TAG_TOWRITE, but for
> +	 * subpage we use __folio_start_writeback(folio, true), which keeps it
> +	 * from clearing TOWRITE.  This is because we walk the bitmap and
> +	 * process each eb one at a time, and then locking the folio when we
> +	 * process the eb.  We could have somebody dirty behind us, and then
> +	 * subsequently mark this range as TOWRITE.  In that case we must not
> +	 * clear TOWRITE or we will skip writing back the dirty folio.
> +	 *
> +	 * So here lock the folio, if it is clean we know we are done with it,
> +	 * and we can clear TOWRITE.
> +	 */
> +	folio_lock(folio);
> +	if (!folio_test_dirty(folio)) {
> +		XA_STATE(xas, &folio->mapping->i_pages, folio_index(folio));
> +		unsigned long flags;
> +
> +		xas_lock_irqsave(&xas, flags);
> +		xas_load(&xas);
> +		xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
> +		xas_unlock_irqrestore(&xas, flags);
> +	}
> +	folio_unlock(folio);
>   	return submitted;
>   }
>   
> diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
> index d4f019233493..53140a9dad9d 100644
> --- a/fs/btrfs/subpage.c
> +++ b/fs/btrfs/subpage.c
> @@ -454,7 +454,7 @@ void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
>   	spin_lock_irqsave(&subpage->lock, flags);
>   	bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
>   	if (!folio_test_writeback(folio))
> -		folio_start_writeback(folio);
> +		__folio_start_writeback(folio, true);

This looks a little dangerous to me, as for data writeback we rely on 
folio_start_writeback() to clear the TOWRITE tag before this change.

Can we do a test on the dirty bitmap and only call clear TOWRITE tag 
when there is no dirty blocks?

(This also means, we must clear the folio dirty before start writeback, 
there are some exceptions that needs to be addressed)

By that, we do not need the manual tag handling in submit_eb_subpage().

Thanks,
Qu

>   	spin_unlock_irqrestore(&subpage->lock, flags);
>   }
>   


  parent reply	other threads:[~2025-04-14 21:45 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-14 19:32 [PATCH] btrfs: fix race in subpage sync writeback handling Josef Bacik
2025-04-14 19:52 ` Boris Burkov
2025-04-14 21:45 ` Qu Wenruo [this message]
2025-04-15  0:40   ` Boris Burkov
2025-04-15 17:22   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e13cf6fa-edd2-454f-8635-da8559b97ccc@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox