Re: [PATCH v3 8/9] btrfs: prepare btrfs_end_repair_bio() for larger data folios

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 8/9] btrfs: prepare btrfs_end_repair_bio() for larger data folios
Date: Mon, 7 Apr 2025 08:49:05 +0930	[thread overview]
Message-ID: <cac7e55d-5e60-4f72-984f-34073f7f2588@suse.com> (raw)
In-Reply-To: <8203647f525da730826857afe87cd673f1e42074.1742195085.git.wqu@suse.com>



在 2025/3/17 17:40, Qu Wenruo 写道:
> The function btrfs_end_repair_bio() has an ASSERT() making sure the
> folio is page sized.
> 
> The reason is mostly related to the fact that later we pass a folio and
> its offset into btrfs_repair_io_failure().
> If we have larger folios passed in, later calculation of the folio and
> its offset can go wrong, as we have extra offset to the bv_page.
> 
> Change the behavior by:
> 
> - Doing a proper folio grab
>    Instead of just page_folio(bv_page), we should get the real page (as the
>    bv_offset can be larger than page size), then call page_folio().
> 
> - Do extra folio offset calculation
>    We can have the following cases of a bio_vec (this bv is moved
>    forward by btrfs read verification):
> 
>    bv_page             bv_offset
>    |                   |
>    | | | | | | | | | | | | | | | | |
>    |<-  folio_a  ->|<-  folio_b  ->|
> 
>    | | = a page.
> 
> In above case, the real folio should be folio_b, and offset inside that
> folio should be:
> 
>    bv_offset - ((&folio_b->page - &folio_a->page) << PAGE_SHIFT).

This part is a little confusing and resulted an incorrect calculation 
and ASSERT().

The more accurate (with more corner cases) example to calculate the 
offset would be something like this:

                             real_page
           bv_page           |   bv_offset (10K)
           |                 |   |
           v                 v   v
           |        |        |       |
           |<- F1 ->|<--- Folio 2 -->|
                    | result off |

In that case, we only need to care about one page pointer, the 
@real_page, which is (bv_page + (bv_offset >> PAGE_SHIFT)).

Meanwhile the offset inside the folio 2 is consisted of two part:

- The offset between folio2's head page and real_page
   We have an existing helper, folio_page_idx(), to do the calculation.
   So this part is just (folio_page_idx() << PAGE_SHIFT).

- The offset inside that @real_page
   Since it's now page based, we do not need to bother folio size, just
   offset_in_page() will do the work.

So the resulted value should be:
   (folio_page_idx() << PAGE_SHIFT) + offset_in_page(bv_offset)

Furthermore, the above cases shows that we can not use offset_in_folio() 
either, as the previous folio is no in the same order.
(offset_in_folio() will result 2K, not the correct 6K).

I'll send out an updated series with this fixed, and enable large folios 
for testing.

Thanks,
Qu

> 
> With these changes, now btrfs_end_repair_bio() is able to handle larger
> folios properly.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/bio.c | 28 +++++++++++++++++++++-------
>   1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 8c2eee1f1878..292c79e0855f 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -156,6 +156,25 @@ static void btrfs_repair_done(struct btrfs_failed_bio *fbio)
>   	}
>   }
>   
> +static struct folio *bio_vec_get_folio(const struct bio_vec *bv)
> +{
> +	return page_folio(bv->bv_page + (bv->bv_offset >> PAGE_SHIFT));
> +}
> +
> +static unsigned long bio_vec_get_folio_offset(const struct bio_vec *bv)
> +{
> +	struct folio *folio = bio_vec_get_folio(bv);
> +
> +	/*
> +	 * There can be multiple physically contiguous folios queued
> +	 * into the bio_vec.
> +	 * Thus the first page of our folio should be at or beyond
> +	 * the first page of the bio_vec.
> +	 */
> +	ASSERT(&folio->page >= bv->bv_page);
> +	return bv->bv_offset - ((&folio->page - bv->bv_page) << PAGE_SHIFT);
> +}
> +
>   static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
>   				 struct btrfs_device *dev)
>   {
> @@ -165,12 +184,6 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
>   	struct bio_vec *bv = bio_first_bvec_all(&repair_bbio->bio);
>   	int mirror = repair_bbio->mirror_num;
>   
> -	/*
> -	 * We can only trigger this for data bio, which doesn't support larger
> -	 * folios yet.
> -	 */
> -	ASSERT(folio_order(page_folio(bv->bv_page)) == 0);
> -
>   	if (repair_bbio->bio.bi_status ||
>   	    !btrfs_data_csum_ok(repair_bbio, dev, 0, bv)) {
>   		bio_reset(&repair_bbio->bio, NULL, REQ_OP_READ);
> @@ -192,7 +205,8 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
>   		btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
>   				  repair_bbio->file_offset, fs_info->sectorsize,
>   				  repair_bbio->saved_iter.bi_sector << SECTOR_SHIFT,
> -				  page_folio(bv->bv_page), bv->bv_offset, mirror);
> +				  bio_vec_get_folio(bv), bio_vec_get_folio_offset(bv),
> +				  mirror);
>   	} while (mirror != fbio->bbio->mirror_num);
>   
>   done:

next prev parent reply	other threads:[~2025-04-06 23:19 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17  7:10 [PATCH v3 0/9] btrfs: remove ASSERT()s for folio_order() and folio_test_large() Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 1/9] btrfs: send: remove the again label inside put_file_data() Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 2/9] btrfs: send: prepare put_file_data() for larger data folios Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 3/9] btrfs: prepare btrfs_page_mkwrite() " Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 4/9] btrfs: prepare prepare_one_folio() " Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 5/9] btrfs: prepare end_bbio_data_write() " Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 6/9] btrfs: subpage: prepare " Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 7/9] btrfs: zlib: prepare copy_data_into_buffer() " Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 8/9] btrfs: prepare btrfs_end_repair_bio() " Qu Wenruo
2025-04-03 19:56   ` David Sterba
2025-04-06 23:19   ` Qu Wenruo [this message]
2025-04-07  6:18   ` Christoph Hellwig
2025-04-07  7:11     ` Qu Wenruo
2025-03-17  7:10 ` [PATCH v3 9/9] btrfs: enable larger data folios support for defrag Qu Wenruo
2025-03-17 13:55 ` [PATCH v3 0/9] btrfs: remove ASSERT()s for folio_order() and folio_test_large() David Sterba
2025-03-17 15:13   ` David Sterba
2025-03-31  4:47     ` Qu Wenruo
2025-04-03 19:54       ` David Sterba
2025-04-03 21:45         ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cac7e55d-5e60-4f72-984f-34073f7f2588@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox