public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
Date: Thu, 29 Oct 2020 13:50:41 +0200	[thread overview]
Message-ID: <765375f6-7c4f-e9d9-f0f5-3de9bac74cce@suse.com> (raw)
In-Reply-To: <20201029071218.49860-4-wqu@suse.com>



On 29.10.20 г. 9:12 ч., Qu Wenruo wrote:
> Refactor btrfs_lookup_bio_sums() by:
> - Remove the @file_offset parameter
>   There are two factors making the @file_offset parameter useless:
>   * For csum lookup in csum tree, file offset makes no sense
>     We only need disk_bytenr, which is unrelated to file_offset
>   * page_offset (file offset) of each bvec is not contig
>     bio can be merged, meaning we could have pages at different
>     file offsets in the same bio.
>   Thus passing file_offset makes no sense any longer.
>   The only user of file_offset is for data reloc inode, we will use
>   a new function, search_file_offset_in_bio(), to handle it.
> 
> - Extract the csum tree lookup into find_csum_tree_sums()
>   The new function will handle the csum search in csum tree.
>   The return value is the same as btrfs_find_ordered_sum(), returning
>   the found number of sectors who has checksum.
> 
> - Change how we do the main loop
>   The only needed info from bio is:
>   * the on-disk bytenr
>   * the file_offset (if file_offset == (u64)-1)
>   * the length
> 
>   After extracting above info, we can do the search without bio
>   at all, which makes the main loop much simpler:
> 
> 	for (cur_disk_bytenr = orig_disk_bytenr;
> 	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
> 	     cur_disk_bytenr += count * sectorsize) {
> 
> 		/* Lookup csum tree */
> 		count = find_csum_tree_sums(fs_info, path, cur_disk_bytenr,
> 					    search_len, csum_dst);
> 		if (!count) {
> 			/* Csum hole handling */
> 		}
> 	}
> 
> - Use single variable as core to calculate all other offsets
>   Instead of all differnt type of variables, we use only one core
>   variable, cur_disk_bytenr, which represents the current disk bytenr.
> 
>   All involves values can be calculated from that core variable, and
>   all those variable will only be visible in the inner loop.
> 
> 	diff_sectors = div_u64(cur_disk_bytenr - orig_disk_bytenr,
> 			       sectorsize);
> 	cur_disk_bytenr = orig_disk_bytenr +
> 			  diff_sectors * sectorsize;
> 	csum_dst = csum + diff_sectors * csum_size;
> 
> All above refactor makes btrfs_lookup_bio_sums() way more robust than it
> used to, especially related to the file offset lookup.
> Now file_offset lookup is only related to data reloc inode, other wise
> we don't need to bother file_offset at all.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/compression.c |   4 +-
>  fs/btrfs/ctree.h       |   2 +-
>  fs/btrfs/file-item.c   | 256 ++++++++++++++++++++++++++---------------
>  fs/btrfs/inode.c       |   5 +-
>  4 files changed, 171 insertions(+), 96 deletions(-)
> 

<snip>


> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index acacdd0bfe2c..85e7a3618a07 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -239,39 +239,140 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +/*
> + * Helper to find csums for logical bytenr range
> + * [disk_bytenr, disk_bytenr + len) and restore the result to @dst.

nit: s/restore/store

> + *
> + * Return >0 for the number of sectors we found.
> + * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum
> + * for it. Caller may want to try next sector until one range is hit.
> + * Return <0 for fatal error.
> + */
> +static int find_csum_tree_sums(struct btrfs_fs_info *fs_info,
> +			       struct btrfs_path *path, u64 disk_bytenr,
> +			       u64 len, u8 *dst)

A better name would be search_csum_tree.

> +{
> +	struct btrfs_csum_item *item = NULL;
> +	struct btrfs_key key;
> +	u32 csum_size = btrfs_super_csum_size(fs_info->super_copy);

nit: Why u32, btrfs_super_csum_size is defined as returning 'int',
however this function is really returning u16 since "struct btrfs_csums"
is defined as u16.

> +	u32 sectorsize = fs_info->sectorsize;
> +	int ret;
> +	u64 csum_start;
> +	u64 csum_len;
> +
> +	ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) &&
> +	       IS_ALIGNED(len, sectorsize));
> +
> +	/* Check if the current csum item covers disk_bytenr */
> +	if (path->nodes[0]) {
> +		item = btrfs_item_ptr(path->nodes[0], path->slots[0],
> +				      struct btrfs_csum_item);
> +		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +		csum_start = key.offset;
> +		csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /

nit: put the division in brackets, it doesn't affect correctness but it
makes it more obvious since multiplication and division have same
precedence then associativitiy comes into play...

> +			   csum_size * sectorsize;
> +
> +		if (csum_start <= disk_bytenr &&
> +		    csum_start + csum_len > disk_bytenr)

if (in_range(disk_bytenr, csum_start, csum_len))

> +			goto found;
> +	}
> +
> +	/* Current item doesn't contain the desired range, re-search */
> +	btrfs_release_path(path);
> +	item = btrfs_lookup_csum(NULL, fs_info->csum_root, path,
> +				 disk_bytenr, 0);
> +	if (IS_ERR(item)) {
> +		ret = PTR_ERR(item);
> +		goto out;
> +	}
> +found:

The found label could be moved right before ret = div_u64 since if you
go directly to it it's guaranteed you have already done the
btrfs_item_to_key et al operations so you are simply duplicating them now.

> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +	csum_start = key.offset;
> +	csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
> +		   csum_size * sectorsize;

nit: put brackets around division.

> +	ASSERT(csum_start <= disk_bytenr &&
> +	       csum_start + csum_len > disk_bytenr);

Use in_range macro

> +
> +	ret = div_u64(min(csum_start + csum_len, disk_bytenr + len) -
> +		      disk_bytenr, sectorsize);
> +	read_extent_buffer(path->nodes[0], dst, (unsigned long)item,
> +			ret * csum_size);
> +out:
> +	if (ret == -ENOENT) {
> +		ret = 0;
> +		memset(dst, 0, csum_size);
> +	}
> +	return ret;
> +}
> +
> +/*
> + * A helper to locate the file_offset of @cur_disk_bytenr of a @bio.
> + *
> + * Bio of btrfs represents read range of
> + * [bi_sector << 9, bi_sector << 9 + bi_size).
> + * Knowing this, we can iterate through each bvec to locate the page belong to
> + * @cur_disk_bytenr and get the file offset.
> + *
> + * @inode is used to determine the bvec page really belongs to @inode.
> + *
> + * Return 0 if we can't find the file offset;
> + * Return >0 if we find the file offset and restore it to @file_offset_ret
> + */
> +static int search_file_offset_in_bio(struct bio *bio, struct inode *inode,
> +				     u64 disk_bytenr, u64 *file_offset_ret)
> +{
> +	struct bvec_iter iter;
> +	struct bio_vec bvec;
> +	u64 cur = bio->bi_iter.bi_sector << 9;
> +	int ret = 0;
> +
> +	bio_for_each_segment(bvec, bio, iter) {
> +		struct page *page = bvec.bv_page;
> +
> +		if (cur > disk_bytenr)
> +			break;
> +		if (cur + bvec.bv_len <= disk_bytenr) {
> +			cur += bvec.bv_len;
> +			continue;
> +		}
> +		ASSERT(cur <= disk_bytenr && cur + bvec.bv_len > disk_bytenr);

in_range(disk_bytenr, cur, bvec.bv_len)

> +		if (page && page->mapping && page->mapping->host &&
> +		    page->mapping->host == inode) {
We are guaranteed to always have a page, since this bvec will have been
aded via bio_add_page, since those will be data pages we are guaranteed
to have mapping and mapping->host, so you ought to only check if it's
the same inode, no ?

> +			ret = 1;
> +			*file_offset_ret = page_offset(page) + bvec.bv_offset
> +				+ disk_bytenr - cur;
> +			break;
> +		}
> +	}
> +	return ret;
> +}

> +
<snip>

> @@ -323,81 +427,53 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
>  		path->skip_locking = 1;
>  	}
>  
> -	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
> -
> -	bio_for_each_segment(bvec, bio, iter) {
> -		page_bytes_left = bvec.bv_len;
> -		if (count)
> -			goto next;
> -
> -		if (page_offsets)
> -			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
> +	for (cur_disk_bytenr = orig_disk_bytenr;
> +	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
> +	     cur_disk_bytenr += count * sectorsize) {
> +		int search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr;
> +		int diff_sectors;

 This variable is misnamed, it's  really offset_sector or sector_offset.

> +		u8 *csum_dst;> +
> +		diff_sectors = div_u64(cur_disk_bytenr - orig_disk_bytenr,
> +				       sectorsize);
> +		cur_disk_bytenr = orig_disk_bytenr +
> +				  diff_sectors * sectorsize;

Since you already increment cur_disk_bytenr with count * sector size
where count is the number of csums, why do you have recalculate it here?
Furthermore you convert to sectors in diff_sectors and then you multiply
it by sectorsize which gives you back just cur_disk_bytenr -
orig_disk_bytenr so the end expression is:

cur_disk_bytenr = orig_disk_bytenr + cur_disk_bytenr - orig_disk_bytenr
=> cur_disk_bytenr = cur_disk_bytenr - am I missing something?


> +		csum_dst = csum + diff_sectors * csum_size;
> +
> +		count = find_csum_tree_sums(fs_info, path, cur_disk_bytenr,
> +					    search_len, csum_dst);

Missing error handling if count is negative!

> +		if (!count) {
> +			/*
> +			 * For not found case, the csum has been zeroed
> +			 * in find_csum_tree_sums() already, just skip
> +			 * to next sector.
> +			 */
> +			count = 1;

<snip>


  reply	other threads:[~2020-10-29 11:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-29  7:12 [PATCH v2 0/3] btrfs: btrfs_lookup_bio_sums() related fixes Qu Wenruo
2020-10-29  7:12 ` [PATCH v2 1/3] btrfs: file-item: use nodesize to determine whether we need readhead for btrfs_lookup_bio_sums() Qu Wenruo
2020-10-29 18:57   ` Josef Bacik
2020-10-29 23:57     ` Qu Wenruo
2020-10-29  7:12 ` [PATCH v2 2/3] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums() Qu Wenruo
2020-10-29  7:47   ` Nikolay Borisov
2020-10-29  7:53     ` Qu Wenruo
2020-10-29 18:51   ` Josef Bacik
2020-10-29  7:12 ` [PATCH v2 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
2020-10-29 11:50   ` Nikolay Borisov [this message]
2020-10-29 12:43     ` Qu Wenruo
2020-10-29 13:54       ` David Sterba
2020-10-29 14:11         ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=765375f6-7c4f-e9d9-f0f5-3de9bac74cce@suse.com \
    --to=nborisov@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox