From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Josef Bacik <josef@toxicpanda.com>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Cc: stable@vger.kernel.org, Boris Burkov <boris@bur.io>
Subject: Re: [PATCH] btrfs: adjust subpage bit start based on sectorsize
Date: Tue, 22 Apr 2025 05:47:04 +0930 [thread overview]
Message-ID: <1ffd7ba1-8e81-4dbc-8ff0-a2c376b7ffed@gmx.com> (raw)
In-Reply-To: <0914bad6138d2cfafc9cfe762bd06c1883ceb9d2.1745242535.git.josef@toxicpanda.com>
On 2025/4/21 23:07, Josef Bacik wrote:
> When running machines with a 64k page size and a 16k nodesize we started
> seeing tree log corruption in production. This turned out to be because
> we were sometimes not writing out dirty blocks, so this in fact affects
> all metadata writes.
>
> When writing out a subpage EB we scan the subpage bitmap for a dirty
> range. If the range isn't dirty we do
>
> bit_start++;
>
> to move onto the next bit. The problem is that the bitmap is indexed by
> sector, not by node. In this case we have a 64k pagesize, a 16k
> nodesize, and a 4k sectorsize. This means our bitmap has 4 bits for
> every node, and with a 64k page size we end up with 4 nodes per page.
>
> To make this easier this is how everything looks
>
> [0 16k 32k 48k ] logical address
> [0 4 8 12 ] radix tree offset
> [ 64k page ] folio
> [ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
> [ | | | | | | | | | | | | | | | | ] bitmap
>
> Now all of our addressing is based on fs_info->sectorsize_bits, so as
> you can see above, our 16k eb->start turns into radix entry 4.
>
> When we find a dirty range for our eb, we correctly do bit_start +=
> sectors_per_node, because if we start at bit 0, the next bit for the
> next eb is 4, to correspond to eb->start 16k.
>
> However if our range is clean, we will do bit_start++, which will now
> put us offset from our radix tree entries.
>
> In our case, assume the block is not dirty the first time we check the
> bitmap: we increment bit_start so that it is now 1, then loop around
> and check again. This time the range is dirty, and we go to find its
> start using the following equation
>
> start = folio_start + bit_start * fs_info->sectorsize;
>
> so in the case above, eb->start 0 is now dirty, and we calculate start
> as
>
> 0 + 1 * fs_info->sectorsize = 4096
> 4096 >> 12 = 1
>
> Now we're looking up the radix tree for 1, and we won't find an eb.
> What's worse, we're now at bit_start == 1, so we do bit_start +=
> sectors_per_node, which makes it 5. If that eb is dirty we run into the
> same thing: we look at an offset that is not populated in the radix
> tree, and so we skip the writeout of dirty extent buffers.
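The drift described above can be checked with a minimal sketch (Python, not the kernel code); the geometry (4k sectorsize, 16k nodesize, 64k pagesize) and the "first range is clean" assumption match the example in the commit message:

```python
# Simulate the subpage dirty-bitmap walk with the assumed geometry:
# 4k sectorsize, 16k nodesize, 64k pagesize.
SECTORSIZE = 4096
NODESIZE = 16 * 1024
PAGESIZE = 64 * 1024

sectors_per_node = NODESIZE // SECTORSIZE   # 4 bits per eb
bits_per_page = PAGESIZE // SECTORSIZE      # 16 bits per folio

# Valid radix tree offsets are the eb start bits: 0, 4, 8, 12.
valid_offsets = set(range(0, bits_per_page, sectors_per_node))

# Buggy walk: the first range is clean, so we advance by one bit;
# every later range is dirty, so we advance by sectors_per_node.
bit_start = 0
visited = []
while bit_start < bits_per_page:
    visited.append(bit_start)
    bit_start += 1 if bit_start == 0 else sectors_per_node

print(visited)                                 # [0, 1, 5, 9, 13]
print([b in valid_offsets for b in visited])   # only bit 0 is a real eb
```

After the single bit_start++, every subsequent offset the loop visits (1, 5, 9, 13) misses the radix tree entries at 4, 8, and 12, which is exactly the skipped writeout.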
>
> The best fix for this is to not use sectorsize_bits to address nodes,
> but that's a larger change. Since this is a fs corruption problem, fix
> it simply by always using sectors_per_node to increment the start bit.
>
> cc: stable@vger.kernel.org
> Fixes: c4aec299fa8f ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
> Reviewed-by: Boris Burkov <boris@bur.io>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
I'm still a little concerned about unaligned tree blocks, but in
practice there should rarely be any filesystem converted in the v4.6 era
that never received a full balance and is still in use.
So let's get the fix done and backported first, then reject unaligned
tree blocks completely.
Thanks,
Qu
> ---
> - Further testing indicated that the theoretical page tagging race isn't
> getting hit in practice, so we're going to limit the "hotfix" to this
> specific patch, and then send subsequent patches to address the other
> issues we're hitting. My metadata writeback simplification patches are
> the more holistic fix.
>
> fs/btrfs/extent_io.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 5f08615b334f..6cfd286b8bbc 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2034,7 +2034,7 @@ static int submit_eb_subpage(struct folio *folio, struct writeback_control *wbc)
> subpage->bitmaps)) {
> spin_unlock_irqrestore(&subpage->lock, flags);
> spin_unlock(&folio->mapping->i_private_lock);
> - bit_start++;
> + bit_start += sectors_per_node;
> continue;
> }
>
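With the one-line change above, a clean range advances by the same stride as a dirty one, so every bit the loop visits is a real eb boundary. A small sketch under the same assumed geometry (4k sectorsize, 16k nodesize, 64k pagesize):

```python
# Fixed walk: advance by sectors_per_node whether the range is clean or
# dirty, so bit_start only ever lands on eb start bits.
sectors_per_node = 16 * 1024 // 4096   # 4
bits_per_page = 64 * 1024 // 4096      # 16

bit_start = 0
visited = []
while bit_start < bits_per_page:
    visited.append(bit_start)
    bit_start += sectors_per_node      # same step in both cases

print(visited)   # [0, 4, 8, 12], mapping back to eb->start 0, 16k, 32k, 48k
```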