public inbox for linux-btrfs@vger.kernel.org
From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission
Date: Wed, 7 Aug 2024 10:18:05 -0400	[thread overview]
Message-ID: <20240807141805.GB242945@perftesting> (raw)
In-Reply-To: <9102c028537fbc1d94c4b092dd4a9940661bc58b.1723020573.git.wqu@suse.com>

On Wed, Aug 07, 2024 at 06:21:00PM +0930, Qu Wenruo wrote:
> Unlike the bitmap usage inside raid56, __extent_writepage_io() does not
> handle subpage submission sector-by-sector, but per dirty range it
> finds.
> 
> This is not a big deal normally, as the complex subpage code is already
> mostly optimized out in the common cases.
> 
> For the sake of consistency and for the future of subpage sector-perfect
> compression support, this patch does:
> 
> - Extract the sector submission code into submit_one_sector()
> 
> - Add the needed code to extract the dirty bitmap for subpage case
> 
> - Use bitmap_and() to calculate the target sectors we need to submit
> 
> For x86_64, the dirty bitmap is fixed to 1, with a length of 1 bit, so
> we're still doing the same workload per sector.
> 
> For larger page sizes, the overhead will be a little larger, as
> previously we only needed one extent_map lookup per dirty range, but now
> it will be one extent_map lookup per sector.

I'd like this to be a follow-up, because even in the normal page case (or hell,
the subpage case) we shouldn't have to look up the extent map over and over
again, it could span a much larger area.
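
Roughly what I have in mind, as a user-space sketch (struct fake_em and
lookup_extent_map() are made-up stand-ins here, not the real btrfs types;
the real version would cache the extent_map in or near bio_ctrl):

```c
/*
 * Sketch: reuse the cached extent map across sectors instead of doing
 * one lookup per sector.  All names below are invented for illustration.
 */
struct fake_em {
	unsigned long long start;
	unsigned long long len;
};

int lookup_count;

struct fake_em lookup_extent_map(unsigned long long filepos)
{
	/* Pretend every lookup returns a 64K extent map containing filepos. */
	struct fake_em em = { filepos & ~0xffffULL, 0x10000ULL };

	lookup_count++;
	return em;
}

int submit_sectors_cached(unsigned long long start, int nr_sectors,
			  unsigned int sectorsize)
{
	struct fake_em em = { 0, 0 };
	int submitted = 0;

	for (int i = 0; i < nr_sectors; i++) {
		unsigned long long filepos = start +
			(unsigned long long)i * sectorsize;

		/* Only look up a new map once we walk past the cached one. */
		if (em.len == 0 || filepos >= em.start + em.len)
			em = lookup_extent_map(filepos);
		submitted++;
	}
	return submitted;
}
```

With 4K sectors and a 64K extent, that's one lookup for 16 sectors instead
of 16 lookups.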

> 
> But that is the same frequency as on x86_64, so we're just aligning the
> behavior with x86_64.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> The plan of sector-perfect subpage compression is to introduce another
> bitmap, possibly inside bio_ctrl, to indicate which sectors do not need
> writeback submission (either inline, or async submission).
> 
> So that __extent_writepage_io() can properly skip those ranges to
> support sector-perfect subpage compression.
> ---
>  fs/btrfs/extent_io.c | 188 +++++++++++++++++--------------------------
>  fs/btrfs/subpage.c   |  17 ++++
>  fs/btrfs/subpage.h   |   3 +
>  3 files changed, 96 insertions(+), 112 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 040c92541bc9..6acd763e8f26 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1333,56 +1333,65 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
>  }
>  
>  /*
> - * Find the first byte we need to write.
> + * Return 0 if we have submitted or queued the sector for submission.
> + * Return <0 for critical errors.
>   *
> - * For subpage, one page can contain several sectors, and
> - * __extent_writepage_io() will just grab all extent maps in the page
> - * range and try to submit all non-inline/non-compressed extents.
> - *
> - * This is a big problem for subpage, we shouldn't re-submit already written
> - * data at all.
> - * This function will lookup subpage dirty bit to find which range we really
> - * need to submit.
> - *
> - * Return the next dirty range in [@start, @end).
> - * If no dirty range is found, @start will be page_offset(page) + PAGE_SIZE.
> + * Caller should make sure filepos < i_size and handle filepos >= i_size case.
>   */
> -static void find_next_dirty_byte(const struct btrfs_fs_info *fs_info,
> -				 struct folio *folio, u64 *start, u64 *end)
> +static int submit_one_sector(struct btrfs_inode *inode,
> +			     struct folio *folio,
> +			     u64 filepos, struct btrfs_bio_ctrl *bio_ctrl,
> +			     loff_t i_size)
>  {
> -	struct btrfs_subpage *subpage = folio_get_private(folio);
> -	struct btrfs_subpage_info *spi = fs_info->subpage_info;
> -	u64 orig_start = *start;
> -	/* Declare as unsigned long so we can use bitmap ops */
> -	unsigned long flags;
> -	int range_start_bit;
> -	int range_end_bit;
> +	struct btrfs_fs_info *fs_info = inode->root->fs_info;
> +	struct extent_map *em;
> +	u64 block_start;
> +	u64 disk_bytenr;
> +	u64 extent_offset;
> +	u64 em_end;
> +	u32 sectorsize = fs_info->sectorsize;
>  
> -	/*
> -	 * For regular sector size == page size case, since one page only
> -	 * contains one sector, we return the page offset directly.
> -	 */
> -	if (!btrfs_is_subpage(fs_info, folio->mapping)) {
> -		*start = folio_pos(folio);
> -		*end = folio_pos(folio) + folio_size(folio);
> -		return;
> +	ASSERT(IS_ALIGNED(filepos, sectorsize));
> +
> +	/* @filepos >= i_size case should be handled by the caller. */
> +	ASSERT(filepos < i_size);
> +
> +	em = btrfs_get_extent(inode, NULL, filepos, sectorsize);
> +	if (IS_ERR(em))
> +		return PTR_ERR_OR_ZERO(em);
> +
> +	extent_offset = filepos - em->start;
> +	em_end = extent_map_end(em);
> +	ASSERT(filepos <= em_end);
> +	ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
> +	ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
> +
> +	block_start = extent_map_block_start(em);
> +	disk_bytenr = extent_map_block_start(em) + extent_offset;
> +
> +	ASSERT(!extent_map_is_compressed(em));
> +	ASSERT(block_start != EXTENT_MAP_HOLE);
> +	ASSERT(block_start != EXTENT_MAP_INLINE);
> +
> +	free_extent_map(em);
> +	em = NULL;
> +
> +	btrfs_set_range_writeback(inode, filepos, filepos + sectorsize - 1);
> +	if (!folio_test_writeback(folio)) {
> +		btrfs_err(fs_info,
> +			  "folio %lu not writeback, cur %llu end %llu",
> +			  folio->index, filepos, filepos + sectorsize);
>  	}
> -
> -	range_start_bit = spi->dirty_offset +
> -			  (offset_in_folio(folio, orig_start) >>
> -			   fs_info->sectorsize_bits);
> -
> -	/* We should have the page locked, but just in case */
> -	spin_lock_irqsave(&subpage->lock, flags);
> -	bitmap_next_set_region(subpage->bitmaps, &range_start_bit, &range_end_bit,
> -			       spi->dirty_offset + spi->bitmap_nr_bits);
> -	spin_unlock_irqrestore(&subpage->lock, flags);
> -
> -	range_start_bit -= spi->dirty_offset;
> -	range_end_bit -= spi->dirty_offset;
> -
> -	*start = folio_pos(folio) + range_start_bit * fs_info->sectorsize;
> -	*end = folio_pos(folio) + range_end_bit * fs_info->sectorsize;
> +	/*
> +	 * Although the PageDirty bit is cleared before entering this
> +	 * function, subpage dirty bit is not cleared.
> +	 * So clear subpage dirty bit here so next time we won't submit
> +	 * folio for range already written to disk.
> +	 */
> +	btrfs_folio_clear_dirty(fs_info, folio, filepos, sectorsize);
> +	submit_extent_folio(bio_ctrl, disk_bytenr, folio,
> +			    sectorsize, filepos - folio_pos(folio));
> +	return 0;
>  }
>  
>  /*
> @@ -1401,10 +1410,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  {
>  	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>  	u64 cur = start;
> -	u64 end = start + len - 1;
> -	u64 extent_offset;
> -	u64 block_start;
> -	struct extent_map *em;
> +	unsigned long dirty_bitmap = 1;
> +	unsigned long range_bitmap = 0;
> +	unsigned int bitmap_size = 1;
> +	int bit;
>  	int ret = 0;
>  	int nr = 0;
>  
> @@ -1419,18 +1428,23 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  		return 1;
>  	}
>  
> +	if (btrfs_is_subpage(fs_info, inode->vfs_inode.i_mapping)) {
> +		ASSERT(fs_info->subpage_info);
> +		btrfs_get_subpage_dirty_bitmap(fs_info, folio, &dirty_bitmap);
> +		bitmap_size = fs_info->subpage_info->bitmap_nr_bits;
> +	}
> +	for (cur = start; cur < start + len; cur += fs_info->sectorsize)
> +		set_bit((cur - start) >> fs_info->sectorsize_bits, &range_bitmap);
> +	bitmap_and(&dirty_bitmap, &dirty_bitmap, &range_bitmap, bitmap_size);

I'd rather move this completely under the btrfs_is_subpage() case; we don't
need to do any of it when sectorsize == page size.
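
Something like this (user-space sketch with single-word bitmaps;
compute_submit_bitmap() is a made-up helper, not a btrfs function):

```c
/*
 * Sketch: only build the range bitmap and intersect it with the subpage
 * dirty bitmap in the subpage case.  When sectorsize == page size, the
 * single dirty bit already covers the whole folio.
 */
unsigned long compute_submit_bitmap(int is_subpage,
				    unsigned long subpage_dirty_bitmap,
				    unsigned int first_sector,
				    unsigned int nr_sectors)
{
	unsigned long range = 0;

	if (!is_subpage)
		return 1;	/* one sector per page, always submit it */

	/* Build the bitmap for [first_sector, first_sector + nr_sectors). */
	for (unsigned int i = 0; i < nr_sectors; i++)
		range |= 1UL << (first_sector + i);

	/* Single-word equivalent of bitmap_and(). */
	return subpage_dirty_bitmap & range;
}
```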

Other than that, this cleans things up nicely; you can add my

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to v2.  Thanks,

Josef

Thread overview: 6+ messages
2024-08-07  8:51 [PATCH] btrfs: refactor __extent_writepage_io() to do sector-by-sector submission Qu Wenruo
2024-08-07 14:18 ` Josef Bacik [this message]
2024-08-07 22:01   ` Qu Wenruo
2024-08-07 16:27 ` David Sterba
2024-08-07 22:05   ` Qu Wenruo
2024-08-07 23:22     ` David Sterba
