Re: [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission
Date: Wed, 7 Aug 2024 10:18:05 -0400	[thread overview]
Message-ID: <20240807141805.GB242945@perftesting> (raw)
In-Reply-To: <9102c028537fbc1d94c4b092dd4a9940661bc58b.1723020573.git.wqu@suse.com>

On Wed, Aug 07, 2024 at 06:21:00PM +0930, Qu Wenruo wrote:
> Unlike the bitmap usage inside raid56, for __extent_writepage_io() we
> handle the subpage submission not sector-by-sector, but for each dirty
> range we found.
> 
> This is not a big deal normally, as normally the subpage complex code is
> already mostly optimized out.
> 
> For the sake of consistency and for the future of subpage sector-perfect
> compression support, this patch does:
> 
> - Extract the sector submission code into submit_one_sector()
> 
> - Add the needed code to extract the dirty bitmap for subpage case
> 
> - Use bitmap_and() to calculate the target sectors we need to submit
> 
> For x86_64, the dirty bitmap will be fixed to 1, with the length of 1,
> so we're still doing the same workload per sector.
> 
> For larger page sizes, the overhead will be a little larger, as previous
> we only need to do one extent_map lookup per-dirty-range, but now it
> will be one extent_map lookup per-sector.

I'd like this to be a followup, because even in the normal page case (or hell
the subpage case) we shouldn't have to look up the extent map over and over
again, it could span a much larger area.

> 
> But that is the same frequency as x86_64, so we're just aligning the
> behavior to x86_64.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> The plan of sector-perfect subpage compression is to introduce another
> bitmap, possibly inside bio_ctrl, to indicate which sectors do not need
> writeback submission (either inline, or async submission).
> 
> So that __extent_writepage_io() can properly skip those ranges to
> support sector-perfect subpage compression.
> ---
>  fs/btrfs/extent_io.c | 188 +++++++++++++++++--------------------------
>  fs/btrfs/subpage.c   |  17 ++++
>  fs/btrfs/subpage.h   |   3 +
>  3 files changed, 96 insertions(+), 112 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 040c92541bc9..6acd763e8f26 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1333,56 +1333,65 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>  }
>  
>  /*
> - * Find the first byte we need to write.
> + * Return 0 if we have submitted or queued the sector for submission.
> + * Return <0 for critical errors.
>   *
> - * For subpage, one page can contain several sectors, and
> - * __extent_writepage_io() will just grab all extent maps in the page
> - * range and try to submit all non-inline/non-compressed extents.
> - *
> - * This is a big problem for subpage, we shouldn't re-submit already written
> - * data at all.
> - * This function will lookup subpage dirty bit to find which range we really
> - * need to submit.
> - *
> - * Return the next dirty range in [@start, @end).
> - * If no dirty range is found, @start will be page_offset(page) + PAGE_SIZE.
> + * Caller should make sure filepos < i_size and handle filepos >= i_size case.
>   */
> -static void find_next_dirty_byte(const struct btrfs_fs_info *fs_info,
> -				 struct folio *folio, u64 *start, u64 *end)
> +static int submit_one_sector(struct btrfs_inode *inode,
> +			     struct folio *folio,
> +			     u64 filepos, struct btrfs_bio_ctrl *bio_ctrl,
> +			     loff_t i_size)
>  {
> -	struct btrfs_subpage *subpage = folio_get_private(folio);
> -	struct btrfs_subpage_info *spi = fs_info->subpage_info;
> -	u64 orig_start = *start;
> -	/* Declare as unsigned long so we can use bitmap ops */
> -	unsigned long flags;
> -	int range_start_bit;
> -	int range_end_bit;
> +	struct btrfs_fs_info *fs_info = inode->root->fs_info;
> +	struct extent_map *em;
> +	u64 block_start;
> +	u64 disk_bytenr;
> +	u64 extent_offset;
> +	u64 em_end;
> +	u32 sectorsize = fs_info->sectorsize;
>  
> -	/*
> -	 * For regular sector size == page size case, since one page only
> -	 * contains one sector, we return the page offset directly.
> -	 */
> -	if (!btrfs_is_subpage(fs_info, folio->mapping)) {
> -		*start = folio_pos(folio);
> -		*end = folio_pos(folio) + folio_size(folio);
> -		return;
> +	ASSERT(IS_ALIGNED(filepos, sectorsize));
> +
> +	/* @filepos >= i_size case should be handled by the caller. */
> +	ASSERT(filepos < i_size);
> +
> +	em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
> +	if (IS_ERR(em))
> +		return PTR_ERR_OR_ZERO(em);
> +
> +	extent_offset = filepos - em->start;
> +	em_end = extent_map_end(em);
> +	ASSERT(filepos <= em_end);
> +	ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
> +	ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
> +
> +	block_start = extent_map_block_start(em);
> +	disk_bytenr = extent_map_block_start(em) + extent_offset;
> +
> +	ASSERT(!extent_map_is_compressed(em));
> +	ASSERT(block_start != EXTENT_MAP_HOLE);
> +	ASSERT(block_start != EXTENT_MAP_INLINE);
> +
> +	free_extent_map(em);
> +	em = NULL;
> +
> +	btrfs_set_range_writeback(inode, filepos, filepos + sectorsize - 1);
> +	if (!folio_test_writeback(folio)) {
> +		btrfs_err(fs_info,
> +			  "folio %lu not writeback, cur %llu end %llu",
> +			  folio->index, filepos, filepos + sectorsize);
>  	}
> -
> -	range_start_bit = spi->dirty_offset +
> -			  (offset_in_folio(folio, orig_start) >>
> -			   fs_info->sectorsize_bits);
> -
> -	/* We should have the page locked, but just in case */
> -	spin_lock_irqsave(&subpage->lock, flags);
> -	bitmap_next_set_region(subpage->bitmaps, &range_start_bit, &range_end_bit,
> -			       spi->dirty_offset + spi->bitmap_nr_bits);
> -	spin_unlock_irqrestore(&subpage->lock, flags);
> -
> -	range_start_bit -= spi->dirty_offset;
> -	range_end_bit -= spi->dirty_offset;
> -
> -	*start = folio_pos(folio) + range_start_bit * fs_info->sectorsize;
> -	*end = folio_pos(folio) + range_end_bit * fs_info->sectorsize;
> +	/*
> +	 * Although the PageDirty bit is cleared before entering this
> +	 * function, subpage dirty bit is not cleared.
> +	 * So clear subpage dirty bit here so next time we won't submit
> +	 * folio for range already written to disk.
> +	 */
> +	btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
> +	submit_extent_folio(bio_ctrl, disk_bytenr, folio,
> +			    sectorsize, filepos - folio_pos(folio));
> +	return 0;
>  }
>  
>  /*
> @@ -1401,10 +1410,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  {
>  	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>  	u64 cur = start;
> -	u64 end = start + len - 1;
> -	u64 extent_offset;
> -	u64 block_start;
> -	struct extent_map *em;
> +	unsigned long dirty_bitmap = 1;
> +	unsigned long range_bitmap = 0;
> +	unsigned int bitmap_size = 1;
> +	int bit;
>  	int ret = 0;
>  	int nr = 0;
>  
> @@ -1419,18 +1428,23 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  		return 1;
>  	}
>  
> +	if (btrfs_is_subpage(fs_info, inode->vfs_inode.i_mapping)) {
> +		ASSERT(fs_info->subpage_info);
> +		btrfs_get_subpage_dirty_bitmap(fs_info, folio, &dirty_bitmap);
> +		bitmap_size = fs_info->subpage_info->bitmap_nr_bits;
> +	}
> +	for (cur = start; cur < start + len; cur += fs_info->sectorsize)
> +		set_bit((cur - start) >> fs_info->sectorsize_bits, &range_bitmap);
> +	bitmap_and(&dirty_bitmap, &dirty_bitmap, &range_bitmap, bitmap_size);

I'd rather move this completely under the btrfs_is_subpage() case, we don't need
to do this where sectorsize == page size.

Other than that this cleans things up nicely, you can add my

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to v2.  Thanks,

Josef

next prev parent reply	other threads:[~2024-08-07 14:18 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-07  8:51 [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission Qu Wenruo
2024-08-07 14:18 ` Josef Bacik [this message]
2024-08-07 22:01   ` Qu Wenruo
2024-08-07 16:27 ` David Sterba
2024-08-07 22:05   ` Qu Wenruo
2024-08-07 23:22     ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240807141805.GB242945@perftesting \
    --to=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.