From: Qu Wenruo <wqu@suse.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/2] btrfs: prepare btrfs_end_repair_bio() for larger data folios
Date: Tue, 15 Apr 2025 08:28:29 +0930 [thread overview]
Message-ID: <d5826f39-da49-44be-ac3d-9e697ce54793@suse.com> (raw)
In-Reply-To: <CAL3q7H7x+cH6gAjuyO2pW=An6NkJwm_TCbGzVVygLHvvwzfz+g@mail.gmail.com>
在 2025/4/14 23:55, Filipe Manana 写道:
> On Mon, Apr 7, 2025 at 7:09 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> The function btrfs_end_repair_bio() has an ASSERT() making sure the
>> folio is page sized.
>>
>> The reason is mostly related to the fact that later we pass a folio and
>> its offset into btrfs_repair_io_failure().
>> If we have larger folios passed in, later calculation of the folio and
>> its offset can go wrong, as we have extra offset to the bv_page.
>>
>> Change the behavior by:
>>
>> - Doing a proper folio grab
>> Instead of just page_folio(bv_page), we should get the real page (as the
>> bv_offset can be larger than page size), then call page_folio().
>>
>> - Do extra folio offset calculation
>>
>> real_page
>> bv_page | bv_offset (10K)
>> | | |
>> v v v
>> | | | |
>> |<- F1 ->|<--- Folio 2 -->|
>> | result off |
>>
>> '|' is page boundary.
>>
>> The folio is the one containing that real_page.
>> We want the real offset inside that folio.
>>
>> The result offset we want is of two parts:
>> - the offset of the real page to the folio page
>
> "to the folio page", this is confusing for me. Isn't it the offset of
> the page inside the folio?
> I.e. "the offset of the real page inside the folio"
That's correct.
>
> Also, this terminology "real page", does it come from somewhere?
> Aren't all pages "real"?
The "real" part is compared to bv_page, which is not really the page
we're working on if the bvec is a multi-page one.
>
>> - the offset inside that real page
>>
>> We can not use offset_in_folio() which will give an incorrect result.
>> (2K instead of 6K, as folio 1 has a different order)
>
> I don't think anyone can understand where that 2K and 6K come from.
> The diagram doesn't mention how many pages per folio (folio order),
> page sizes, and file offset of each folio.
> So mentioning that 2K and 6K without all the relevant information,
> make it useless and confusing IMO.
>
>>
>> With these changes, now btrfs_end_repair_bio() is able to handle not
>> only large folios, but also multi-page bio vectors.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/bio.c | 61 ++++++++++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 54 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
>> index 8c2eee1f1878..3140aa19aadc 100644
>> --- a/fs/btrfs/bio.c
>> +++ b/fs/btrfs/bio.c
>> @@ -156,6 +156,58 @@ static void btrfs_repair_done(struct btrfs_failed_bio *fbio)
>> }
>> }
>>
>> +/*
>> + * Since a single bio_vec can merge multiple physically contiguous pages
>> + * into one bio_vec entry, we can have the following case:
>> + *
>> + * bv_page bv_offset
>> + * v v
>> + * | | | | | | |
>> + *
>> + * In that case we need to grab the real page where bv_offset is at.
>> + */
>> +static struct page *bio_vec_get_real_page(const struct bio_vec *bv)
>> +{
>> + return bv->bv_page + (bv->bv_offset >> PAGE_SHIFT);
>> +}
>> +static struct folio *bio_vec_get_folio(const struct bio_vec *bv)
>
> Please add a blank line between function delcarations.
>
>> +{
>> + return page_folio(bio_vec_get_real_page(bv));
>> +}
>> +
>> +static unsigned long bio_vec_get_folio_offset(const struct bio_vec *bv)
>> +{
>> + const struct page *real_page = bio_vec_get_real_page(bv);
>> + const struct folio *folio = page_folio(real_page);
>> +
>> + /*
>> + * The following ASCII chart is to show how the calculation is done.
>> + *
>> + * real_page
>> + * bv_page | bv_offset (10K)
>> + * | | |
>> + * v v v
>> + * | | | |
>> + * |<- F1 ->|<--- Folio 2 -->|
>> + * | result off |
>> + *
>> + * '|' is page boundary.
>> + *
>> + * The folio is the one containing that real_page.
>> + * We want the real offset inside that folio.
>> + *
>> + * The result offset we want is of two parts:
>> + * - the offset of the real page to the folio page
>> + * - the offset inside that real page
>> + *
>> + * We can not use offset_in_folio() which will give an incorrect result.
>> + * (2K instead of 6K, as folio 1 has a different order)
>
> Same comment here, as this is copied from the change log.
>
> Codewise this looks good to me, but those comments and terminology
> ("real page") make it confusing IMO.
Although this patch will be dropped, as Christoph's physical address
solution is simpler overall, and he is not happy with us poking with the
bvec internals.
Thanks,
Qu
>
> Thanks.
>
>> + */
>> + ASSERT(&folio->page <= real_page);
>> + return (folio_page_idx(folio, real_page) << PAGE_SHIFT) +
>> + offset_in_page(bv->bv_offset);
>> +}
>> +
>> static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
>> struct btrfs_device *dev)
>> {
>> @@ -165,12 +217,6 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
>> struct bio_vec *bv = bio_first_bvec_all(&repair_bbio->bio);
>> int mirror = repair_bbio->mirror_num;
>>
>> - /*
>> - * We can only trigger this for data bio, which doesn't support larger
>> - * folios yet.
>> - */
>> - ASSERT(folio_order(page_folio(bv->bv_page)) == 0);
>> -
>> if (repair_bbio->bio.bi_status ||
>> !btrfs_data_csum_ok(repair_bbio, dev, 0, bv)) {
>> bio_reset(&repair_bbio->bio, NULL, REQ_OP_READ);
>> @@ -192,7 +238,8 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
>> btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
>> repair_bbio->file_offset, fs_info->sectorsize,
>> repair_bbio->saved_iter.bi_sector << SECTOR_SHIFT,
>> - page_folio(bv->bv_page), bv->bv_offset, mirror);
>> + bio_vec_get_folio(bv), bio_vec_get_folio_offset(bv),
>> + mirror);
>> } while (mirror != fbio->bbio->mirror_num);
>>
>> done:
>> --
>> 2.49.0
>>
>>
next prev parent reply other threads:[~2025-04-14 22:58 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-07 6:09 [PATCH 0/2] btrfs: add large folio support to read-repair and defrag Qu Wenruo
2025-04-07 6:09 ` [PATCH 1/2] btrfs: prepare btrfs_end_repair_bio() for larger data folios Qu Wenruo
2025-04-14 14:25 ` Filipe Manana
2025-04-14 22:58 ` Qu Wenruo [this message]
2025-04-07 6:09 ` [PATCH 2/2] btrfs: enable larger data folios support for defrag Qu Wenruo
2025-04-14 14:38 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d5826f39-da49-44be-ac3d-9e697ce54793@suse.com \
--to=wqu@suse.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox