public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: fix race in subpage sync writeback handling
Date: Mon, 14 Apr 2025 12:52:48 -0700	[thread overview]
Message-ID: <20250414195248.GA3218113@zen.localdomain> (raw)
In-Reply-To: <ff2b56ecb70e4db53de11a019530c768a24f48f1.1744659146.git.josef@toxicpanda.com>

On Mon, Apr 14, 2025 at 03:32:34PM -0400, Josef Bacik wrote:
> While debugging a fs corruption with subpage we noticed a potential race
> in how we do tagging for writeback.  Boris came up with the following
> diagram to describe the race potential
> 
> EB1                                                       EB2
> btree_write_cache_pages
>   tag_pages_for_writeback
>   filemap_get_folios_tag(TOWRITE)
>   submit_eb_page
>     submit_eb_subpage
>       for eb in folio:
>         write_one_eb
>                                                           set_extent_buffer_dirty
>                                                           btrfs_meta_folio_set_dirty
>                                                           ...
>                                                           filemap_fdatawrite_range
>                                                             btree_write_cache_pages
>                                                             tag_pages_for_writeback
>           folio_lock
>           btrfs_meta_folio_clear_dirty
>           btrfs_meta_folio_set_writeback // clear TOWRITE
>           folio_unlock
>                                                             filemap_get_folios_tag(TOWRITE) //missed
> 
> The problem here is that we'll call folio_start_writeback() the first
> time we initiate writeback on one extent buffer, if we happened to dirty
> the extent buffer behind the one we were currently writing in the first
> task, and race in as described above, we would miss the TOWRITE tag as
> it would have been cleared, and we will never initiate writeback on that
> EB.
> 
> This is kind of complicated for us, the best thing to do here is to
> simply leave the TOWRITE tag in place, and only clear it if we aren't
> dirty after we finish processing all the eb's in the folio.
> 
> Fixes: c4aec299fa8f ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/extent_io.c | 23 +++++++++++++++++++++++
>  fs/btrfs/subpage.c   |  2 +-
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6cfd286b8bbc..5d09a47c1c28 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2063,6 +2063,29 @@ static int submit_eb_subpage(struct folio *folio, struct writeback_control *wbc)
>  		}
>  		free_extent_buffer(eb);
>  	}
> +	/*
> +	 * Normally folio_start_writeback() will clear TAG_TOWRITE, but for
> +	 * subpage we use __folio_start_writeback(folio, true), which keeps it
> +	 * from clearing TOWRITE.  This is because we walk the bitmap and
> +	 * process each eb one at a time, and then locking the folio when we
> +	 * process the eb.  We could have somebody dirty behind us, and then
> +	 * subsequently mark this range as TOWRITE.  In that case we must not
> +	 * clear TOWRITE or we will skip writing back the dirty folio.
> +	 *
> +	 * So here lock the folio, if it is clean we know we are done with it,
> +	 * and we can clear TOWRITE.
> +	 */
> +	folio_lock(folio);
> +	if (!folio_test_dirty(folio)) {
> +		XA_STATE(xas, &folio->mapping->i_pages, folio_index(folio));
> +		unsigned long flags;
> +
> +		xas_lock_irqsave(&xas, flags);
> +		xas_load(&xas);
> +		xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
> +		xas_unlock_irqrestore(&xas, flags);
> +	}
> +	folio_unlock(folio);
>  	return submitted;
>  }
>  
> diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
> index d4f019233493..53140a9dad9d 100644
> --- a/fs/btrfs/subpage.c
> +++ b/fs/btrfs/subpage.c
> @@ -454,7 +454,7 @@ void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
>  	spin_lock_irqsave(&subpage->lock, flags);
>  	bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
>  	if (!folio_test_writeback(folio))
> -		folio_start_writeback(folio);
> +		__folio_start_writeback(folio, true);
>  	spin_unlock_irqrestore(&subpage->lock, flags);
>  }
>  
> -- 
> 2.48.1
> 

  reply	other threads:[~2025-04-14 19:52 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-14 19:32 [PATCH] btrfs: fix race in subpage sync writeback handling Josef Bacik
2025-04-14 19:52 ` Boris Burkov [this message]
2025-04-14 21:45 ` Qu Wenruo
2025-04-15  0:40   ` Boris Burkov
2025-04-15 17:22   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250414195248.GA3218113@zen.localdomain \
    --to=boris@bur.io \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox