From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 8/9] btrfs: prepare btrfs_end_repair_bio() for larger data folios
Date: Mon, 7 Apr 2025 08:49:05 +0930 [thread overview]
Message-ID: <cac7e55d-5e60-4f72-984f-34073f7f2588@suse.com> (raw)
In-Reply-To: <8203647f525da730826857afe87cd673f1e42074.1742195085.git.wqu@suse.com>
在 2025/3/17 17:40, Qu Wenruo 写道:
> The function btrfs_end_repair_bio() has an ASSERT() making sure the
> folio is page sized.
>
> The reason is mostly related to the fact that later we pass a folio and
> its offset into btrfs_repair_io_failure().
> If we have larger folios passed in, later calculation of the folio and
> its offset can go wrong, as we have extra offset to the bv_page.
>
> Change the behavior by:
>
> - Doing a proper folio grab
> Instead of just page_folio(bv_page), we should get the real page (as the
> bv_offset can be larger than page size), then call page_folio().
>
> - Do extra folio offset calculation
> We can have the following cases of a bio_vec (this bv is moved
> forward by btrfs read verification):
>
> bv_page bv_offset
> | |
> | | | | | | | | | | | | | | | | |
> |<- folio_a ->|<- folio_b ->|
>
> | | = a page.
>
> In above case, the real folio should be folio_b, and offset inside that
> folio should be:
>
> bv_offset - ((&folio_b->page - &folio_a->page) << PAGE_SHIFT).
This part is a little confusing and resulted an incorrect calculation
and ASSERT().
The more accurate (with more corner cases) example to calculate the
offset would be something like this:
real_page
bv_page | bv_offset (10K)
| | |
v v v
| | | |
|<- F1 ->|<--- Folio 2 -->|
| result off |
In that case, we only need to care about one page pointer, the
@real_page, which is (bv_page + (bv_offset >> PAGE_SHIFT)).
Meanwhile the offset inside the folio 2 is consisted of two part:
- The offset between folio2's head page and real_page
We have an existing helper, folio_page_idx(), to do the calculation.
So this part is just (folio_page_idx() << PAGE_SHIFT).
- The offset inside that @real_page
Since it's now page based, we do not need to bother folio size, just
offset_in_page() will do the work.
So the resulted value should be:
(folio_page_idx() << PAGE_SHIFT) + offset_in_page(bv_offset)
Furthermore, the above cases shows that we can not use offset_in_folio()
either, as the previous folio is no in the same order.
(offset_in_folio() will result 2K, not the correct 6K).
I'll send out an updated series with this fixed, and enable large folios
for testing.
Thanks,
Qu
>
> With these changes, now btrfs_end_repair_bio() is able to handle larger
> folios properly.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/bio.c | 28 +++++++++++++++++++++-------
> 1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 8c2eee1f1878..292c79e0855f 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -156,6 +156,25 @@ static void btrfs_repair_done(struct btrfs_failed_bio *fbio)
> }
> }
>
> +static struct folio *bio_vec_get_folio(const struct bio_vec *bv)
> +{
> + return page_folio(bv->bv_page + (bv->bv_offset >> PAGE_SHIFT));
> +}
> +
> +static unsigned long bio_vec_get_folio_offset(const struct bio_vec *bv)
> +{
> + struct folio *folio = bio_vec_get_folio(bv);
> +
> + /*
> + * There can be multiple physically contiguous folios queued
> + * into the bio_vec.
> + * Thus the first page of our folio should be at or beyond
> + * the first page of the bio_vec.
> + */
> + ASSERT(&folio->page >= bv->bv_page);
> + return bv->bv_offset - ((&folio->page - bv->bv_page) << PAGE_SHIFT);
> +}
> +
> static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
> struct btrfs_device *dev)
> {
> @@ -165,12 +184,6 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
> struct bio_vec *bv = bio_first_bvec_all(&repair_bbio->bio);
> int mirror = repair_bbio->mirror_num;
>
> - /*
> - * We can only trigger this for data bio, which doesn't support larger
> - * folios yet.
> - */
> - ASSERT(folio_order(page_folio(bv->bv_page)) == 0);
> -
> if (repair_bbio->bio.bi_status ||
> !btrfs_data_csum_ok(repair_bbio, dev, 0, bv)) {
> bio_reset(&repair_bbio->bio, NULL, REQ_OP_READ);
> @@ -192,7 +205,8 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
> btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
> repair_bbio->file_offset, fs_info->sectorsize,
> repair_bbio->saved_iter.bi_sector << SECTOR_SHIFT,
> - page_folio(bv->bv_page), bv->bv_offset, mirror);
> + bio_vec_get_folio(bv), bio_vec_get_folio_offset(bv),
> + mirror);
> } while (mirror != fbio->bbio->mirror_num);
>
> done:
next prev parent reply other threads:[~2025-04-06 23:19 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-17 7:10 [PATCH v3 0/9] btrfs: remove ASSERT()s for folio_order() and folio_test_large() Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 1/9] btrfs: send: remove the again label inside put_file_data() Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 2/9] btrfs: send: prepare put_file_data() for larger data folios Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 3/9] btrfs: prepare btrfs_page_mkwrite() " Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 4/9] btrfs: prepare prepare_one_folio() " Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 5/9] btrfs: prepare end_bbio_data_write() " Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 6/9] btrfs: subpage: prepare " Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 7/9] btrfs: zlib: prepare copy_data_into_buffer() " Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 8/9] btrfs: prepare btrfs_end_repair_bio() " Qu Wenruo
2025-04-03 19:56 ` David Sterba
2025-04-06 23:19 ` Qu Wenruo [this message]
2025-04-07 6:18 ` Christoph Hellwig
2025-04-07 7:11 ` Qu Wenruo
2025-03-17 7:10 ` [PATCH v3 9/9] btrfs: enable larger data folios support for defrag Qu Wenruo
2025-03-17 13:55 ` [PATCH v3 0/9] btrfs: remove ASSERT()s for folio_order() and folio_test_large() David Sterba
2025-03-17 15:13 ` David Sterba
2025-03-31 4:47 ` Qu Wenruo
2025-04-03 19:54 ` David Sterba
2025-04-03 21:45 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cac7e55d-5e60-4f72-984f-34073f7f2588@suse.com \
--to=wqu@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox