From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission
Date: Wed, 7 Aug 2024 10:18:05 -0400 [thread overview]
Message-ID: <20240807141805.GB242945@perftesting> (raw)
In-Reply-To: <9102c028537fbc1d94c4b092dd4a9940661bc58b.1723020573.git.wqu@suse.com>
On Wed, Aug 07, 2024 at 06:21:00PM +0930, Qu Wenruo wrote:
> Unlike the bitmap usage inside raid56, for __extent_writepage_io() we
> handle the subpage submission not sector-by-sector, but for each dirty
> range we found.
>
> This is not a big deal normally, as normally the subpage complex code is
> already mostly optimized out.
>
> For the sake of consistency and for the future of subpage sector-perfect
> compression support, this patch does:
>
> - Extract the sector submission code into submit_one_sector()
>
> - Add the needed code to extract the dirty bitmap for subpage case
>
> - Use bitmap_and() to calculate the target sectors we need to submit
>
> For x86_64, the dirty bitmap will be fixed to 1, with the length of 1,
> so we're still doing the same workload per sector.
>
> For larger page sizes, the overhead will be a little larger, as previous
> we only need to do one extent_map lookup per-dirty-range, but now it
> will be one extent_map lookup per-sector.
I'd like this to be a followup, because even in the normal page case (or hell
the subpage case) we shouldn't have to look up the extent map over and over
again, it could span a much larger area.
>
> But that is the same frequency as x86_64, so we're just aligning the
> behavior to x86_64.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> The plan of sector-perfect subpage compression is to introduce another
> bitmap, possibly inside bio_ctrl, to indicate which sectors do not need
> writeback submission (either inline, or async submission).
>
> So that __extent_writepage_io() can properly skip those ranges to
> support sector-perfect subpage compression.
> ---
> fs/btrfs/extent_io.c | 188 +++++++++++++++++--------------------------
> fs/btrfs/subpage.c | 17 ++++
> fs/btrfs/subpage.h | 3 +
> 3 files changed, 96 insertions(+), 112 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 040c92541bc9..6acd763e8f26 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1333,56 +1333,65 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
> }
>
> /*
> - * Find the first byte we need to write.
> + * Return 0 if we have submitted or queued the sector for submission.
> + * Return <0 for critical errors.
> *
> - * For subpage, one page can contain several sectors, and
> - * __extent_writepage_io() will just grab all extent maps in the page
> - * range and try to submit all non-inline/non-compressed extents.
> - *
> - * This is a big problem for subpage, we shouldn't re-submit already written
> - * data at all.
> - * This function will lookup subpage dirty bit to find which range we really
> - * need to submit.
> - *
> - * Return the next dirty range in [@start, @end).
> - * If no dirty range is found, @start will be page_offset(page) + PAGE_SIZE.
> + * Caller should make sure filepos < i_size and handle filepos >= i_size case.
> */
> -static void find_next_dirty_byte(const struct btrfs_fs_info *fs_info,
> - struct folio *folio, u64 *start, u64 *end)
> +static int submit_one_sector(struct btrfs_inode *inode,
> + struct folio *folio,
> + u64 filepos, struct btrfs_bio_ctrl *bio_ctrl,
> + loff_t i_size)
> {
> - struct btrfs_subpage *subpage = folio_get_private(folio);
> - struct btrfs_subpage_info *spi = fs_info->subpage_info;
> - u64 orig_start = *start;
> - /* Declare as unsigned long so we can use bitmap ops */
> - unsigned long flags;
> - int range_start_bit;
> - int range_end_bit;
> + struct btrfs_fs_info *fs_info = inode->root->fs_info;
> + struct extent_map *em;
> + u64 block_start;
> + u64 disk_bytenr;
> + u64 extent_offset;
> + u64 em_end;
> + u32 sectorsize = fs_info->sectorsize;
>
> - /*
> - * For regular sector size == page size case, since one page only
> - * contains one sector, we return the page offset directly.
> - */
> - if (!btrfs_is_subpage(fs_info, folio->mapping)) {
> - *start = folio_pos(folio);
> - *end = folio_pos(folio) + folio_size(folio);
> - return;
> + ASSERT(IS_ALIGNED(filepos, sectorsize));
> +
> + /* @filepos >= i_size case should be handled by the caller. */
> + ASSERT(filepos < i_size);
> +
> + em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
> + if (IS_ERR(em))
> + return PTR_ERR_OR_ZERO(em);
> +
> + extent_offset = filepos - em->start;
> + em_end = extent_map_end(em);
> + ASSERT(filepos <= em_end);
> + ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
> + ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
> +
> + block_start = extent_map_block_start(em);
> + disk_bytenr = extent_map_block_start(em) + extent_offset;
> +
> + ASSERT(!extent_map_is_compressed(em));
> + ASSERT(block_start != EXTENT_MAP_HOLE);
> + ASSERT(block_start != EXTENT_MAP_INLINE);
> +
> + free_extent_map(em);
> + em = NULL;
> +
> + btrfs_set_range_writeback(inode, filepos, filepos + sectorsize - 1);
> + if (!folio_test_writeback(folio)) {
> + btrfs_err(fs_info,
> + "folio %lu not writeback, cur %llu end %llu",
> + folio->index, filepos, filepos + sectorsize);
> }
> -
> - range_start_bit = spi->dirty_offset +
> - (offset_in_folio(folio, orig_start) >>
> - fs_info->sectorsize_bits);
> -
> - /* We should have the page locked, but just in case */
> - spin_lock_irqsave(&subpage->lock, flags);
> - bitmap_next_set_region(subpage->bitmaps, &range_start_bit, &range_end_bit,
> - spi->dirty_offset + spi->bitmap_nr_bits);
> - spin_unlock_irqrestore(&subpage->lock, flags);
> -
> - range_start_bit -= spi->dirty_offset;
> - range_end_bit -= spi->dirty_offset;
> -
> - *start = folio_pos(folio) + range_start_bit * fs_info->sectorsize;
> - *end = folio_pos(folio) + range_end_bit * fs_info->sectorsize;
> + /*
> + * Although the PageDirty bit is cleared before entering this
> + * function, subpage dirty bit is not cleared.
> + * So clear subpage dirty bit here so next time we won't submit
> + * folio for range already written to disk.
> + */
> + btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
> + submit_extent_folio(bio_ctrl, disk_bytenr, folio,
> + sectorsize, filepos - folio_pos(folio));
> + return 0;
> }
>
> /*
> @@ -1401,10 +1410,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
> {
> struct btrfs_fs_info *fs_info = inode->root->fs_info;
> u64 cur = start;
> - u64 end = start + len - 1;
> - u64 extent_offset;
> - u64 block_start;
> - struct extent_map *em;
> + unsigned long dirty_bitmap = 1;
> + unsigned long range_bitmap = 0;
> + unsigned int bitmap_size = 1;
> + int bit;
> int ret = 0;
> int nr = 0;
>
> @@ -1419,18 +1428,23 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
> return 1;
> }
>
> + if (btrfs_is_subpage(fs_info, inode->vfs_inode.i_mapping)) {
> + ASSERT(fs_info->subpage_info);
> + btrfs_get_subpage_dirty_bitmap(fs_info, folio, &dirty_bitmap);
> + bitmap_size = fs_info->subpage_info->bitmap_nr_bits;
> + }
> + for (cur = start; cur < start + len; cur += fs_info->sectorsize)
> + set_bit((cur - start) >> fs_info->sectorsize_bits, &range_bitmap);
> + bitmap_and(&dirty_bitmap, &dirty_bitmap, &range_bitmap, bitmap_size);
I'd rather move this completely under the btrfs_is_subpage() case, we don't need
to do this where sectorsize == page size.
Other than that this cleans things up nicely, you can add my
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
to v2. Thanks,
Josef
next prev parent reply other threads:[~2024-08-07 14:18 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-07 8:51 [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission Qu Wenruo
2024-08-07 14:18 ` Josef Bacik [this message]
2024-08-07 22:01 ` Qu Wenruo
2024-08-07 16:27 ` David Sterba
2024-08-07 22:05 ` Qu Wenruo
2024-08-07 23:22 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240807141805.GB242945@perftesting \
--to=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox