RFC: (almost) stop poking into bvec internals in btrfs

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* RFC: (almost) stop poking into bvec internals in btrfs
@ 2025-04-09 11:10 Christoph Hellwig
  2025-04-09 11:10 ` [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi Qu,

as you asked for patches to clean up the btrfs poking into bvec
internals, this series does an almost complete sweep.  The only bit
left is the page address in the decompression helper.  I though it
should be able to be replaced with logic based on bbio->file_offset,
but that leads to crashed and data corruption and I ran out of the
time I could allocate to this.

This passes a quick xfstests run with and without -o compress, but
otherwise should be taken as RFC quality code.  Maybe you can take
up the bits you like for your folios work?

Diffstat:
 bio.c         |    9 +-
 bio.h         |    3 
 btrfs_inode.h |    4 -
 compression.c |    9 +-
 disk-io.c     |    7 +-
 extent_io.c   |   60 ++++--------------
 inode.c       |   18 ++---
 raid56.c      |  193 ++++++++++++++++++++++------------------------------------
 scrub.c       |   80 +++++++-----------------
 9 files changed, 142 insertions(+), 241 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  2025-04-09 22:13   ` Qu Wenruo
  2025-04-09 11:10 ` [PATCH 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

end_bbio_data_read checks that each iterated folio fragment is aligned
and justifies that with block drivers advancing the bio.  But block
driver only advance bi_iter, while end_bbio_data_read uses
bio_for_each_folio_all to iterate the immutable bi_io_vec array that
can't be changed by drivers at all.

Remove the alignment checking and the misleading comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 27 +++------------------------
 1 file changed, 3 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 197f5e51c474..193736b07a0b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -512,43 +512,22 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
 	struct btrfs_fs_info *fs_info = bbio->fs_info;
 	struct bio *bio = &bbio->bio;
 	struct folio_iter fi;
-	const u32 sectorsize = fs_info->sectorsize;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_folio_all(fi, &bbio->bio) {
 		bool uptodate = !bio->bi_status;
 		struct folio *folio = fi.folio;
 		struct inode *inode = folio->mapping->host;
-		u64 start;
-		u64 end;
-		u32 len;
+		u64 start = folio_pos(folio) + fi.offset;
 
 		btrfs_debug(fs_info,
 			"%s: bi_sector=%llu, err=%d, mirror=%u",
 			__func__, bio->bi_iter.bi_sector, bio->bi_status,
 			bbio->mirror_num);
 
-		/*
-		 * We always issue full-sector reads, but if some block in a
-		 * folio fails to read, blk_update_request() will advance
-		 * bv_offset and adjust bv_len to compensate.  Print a warning
-		 * for unaligned offsets, and an error if they don't add up to
-		 * a full sector.
-		 */
-		if (!IS_ALIGNED(fi.offset, sectorsize))
-			btrfs_err(fs_info,
-		"partial page read in btrfs with offset %zu and length %zu",
-				  fi.offset, fi.length);
-		else if (!IS_ALIGNED(fi.offset + fi.length, sectorsize))
-			btrfs_info(fs_info,
-		"incomplete page read with offset %zu and length %zu",
-				   fi.offset, fi.length);
-
-		start = folio_pos(folio) + fi.offset;
-		end = start + fi.length - 1;
-		len = fi.length;
 
 		if (likely(uptodate)) {
+			u64 end = start + fi.length - 1;
 			loff_t i_size = i_size_read(inode);
 
 			/*
@@ -573,7 +552,7 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
 		}
 
 		/* Update page status and unlock. */
-		end_folio_read(folio, uptodate, start, len);
+		end_folio_read(folio, uptodate, start, fi.length);
 	}
 	bio_put(bio);
 }
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read
  2025-04-09 11:10 ` [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read Christoph Hellwig
@ 2025-04-09 22:13   ` Qu Wenruo
  2025-04-10  5:30     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2025-04-09 22:13 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> end_bbio_data_read checks that each iterated folio fragment is aligned
> and justifies that with block drivers advancing the bio.  But block
> driver only advance bi_iter, while end_bbio_data_read uses
> bio_for_each_folio_all to iterate the immutable bi_io_vec array that
> can't be changed by drivers at all.

The old comment indeed is incorrect, but can we still leave an ASSERT()
just in case to case any unaligned submission?

It shouldn't cause anything different for end users, but should greatly
improve the life of quality for developers.

Thanks,
Qu>
> Remove the alignment checking and the misleading comment.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 27 +++------------------------
>   1 file changed, 3 insertions(+), 24 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 197f5e51c474..193736b07a0b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -512,43 +512,22 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>   	struct btrfs_fs_info *fs_info = bbio->fs_info;
>   	struct bio *bio = &bbio->bio;
>   	struct folio_iter fi;
> -	const u32 sectorsize = fs_info->sectorsize;
>
>   	ASSERT(!bio_flagged(bio, BIO_CLONED));
>   	bio_for_each_folio_all(fi, &bbio->bio) {
>   		bool uptodate = !bio->bi_status;
>   		struct folio *folio = fi.folio;
>   		struct inode *inode = folio->mapping->host;
> -		u64 start;
> -		u64 end;
> -		u32 len;
> +		u64 start = folio_pos(folio) + fi.offset;
>
>   		btrfs_debug(fs_info,
>   			"%s: bi_sector=%llu, err=%d, mirror=%u",
>   			__func__, bio->bi_iter.bi_sector, bio->bi_status,
>   			bbio->mirror_num);
>
> -		/*
> -		 * We always issue full-sector reads, but if some block in a
> -		 * folio fails to read, blk_update_request() will advance
> -		 * bv_offset and adjust bv_len to compensate.  Print a warning
> -		 * for unaligned offsets, and an error if they don't add up to
> -		 * a full sector.
> -		 */
> -		if (!IS_ALIGNED(fi.offset, sectorsize))
> -			btrfs_err(fs_info,
> -		"partial page read in btrfs with offset %zu and length %zu",
> -				  fi.offset, fi.length);
> -		else if (!IS_ALIGNED(fi.offset + fi.length, sectorsize))
> -			btrfs_info(fs_info,
> -		"incomplete page read with offset %zu and length %zu",
> -				   fi.offset, fi.length);
> -
> -		start = folio_pos(folio) + fi.offset;
> -		end = start + fi.length - 1;
> -		len = fi.length;
>
>   		if (likely(uptodate)) {
> +			u64 end = start + fi.length - 1;
>   			loff_t i_size = i_size_read(inode);
>
>   			/*
> @@ -573,7 +552,7 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>   		}
>
>   		/* Update page status and unlock. */
> -		end_folio_read(folio, uptodate, start, len);
> +		end_folio_read(folio, uptodate, start, fi.length);
>   	}
>   	bio_put(bio);
>   }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read
  2025-04-09 22:13   ` Qu Wenruo
@ 2025-04-10  5:30     ` Christoph Hellwig
  2025-04-10  5:39       ` Qu Wenruo
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-10  5:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs

On Thu, Apr 10, 2025 at 07:43:54AM +0930, Qu Wenruo wrote:
> The old comment indeed is incorrect, but can we still leave an ASSERT()
> just in case to case any unaligned submission?
>
> It shouldn't cause anything different for end users, but should greatly
> improve the life of quality for developers.

What is is trying to check for? fi.offset isn't really a meaningful value
to check for as it's just an offset into the folio.  bi_sector would be
useful, but it's not owned by the file system after the bio was submitted.
Maybe adding that assert to the submission path would make sense?  Remember
that the bio completed is exactly the one submitted by the file system,
and cloned split bio never reaches back into the file systems.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read
  2025-04-10  5:30     ` Christoph Hellwig
@ 2025-04-10  5:39       ` Qu Wenruo
  0 siblings, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2025-04-10  5:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs



在 2025/4/10 15:00, Christoph Hellwig 写道:
> On Thu, Apr 10, 2025 at 07:43:54AM +0930, Qu Wenruo wrote:
>> The old comment indeed is incorrect, but can we still leave an ASSERT()
>> just in case to case any unaligned submission?
>>
>> It shouldn't cause anything different for end users, but should greatly
>> improve the life of quality for developers.
>
> What is is trying to check for? fi.offset isn't really a meaningful value
> to check for as it's just an offset into the folio.  bi_sector would be
> useful, but it's not owned by the file system after the bio was submitted.
> Maybe adding that assert to the submission path would make sense?  Remember
> that the bio completed is exactly the one submitted by the file system,
> and cloned split bio never reaches back into the file systems.

OK, it makes more sense, and we already have the alignment checks in the
write path (submit_one_sector()).

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
  2025-04-09 11:10 ` [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  2025-04-09 22:15   ` Qu Wenruo
  2025-04-09 11:10 ` [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Stop poking into bio implementation details and recalculating the pos
from the folio over and over and instead just track then end of the
current bio in logical file offsets in the btrfs_bio_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 193736b07a0b..36db23f7a6bb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -96,6 +96,7 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info)
  */
 struct btrfs_bio_ctrl {
 	struct btrfs_bio *bbio;
+	loff_t next_file_offset; /* last byte contained in bbio + 1 */
 	enum btrfs_compression_type compress_type;
 	u32 len_to_oe_boundary;
 	blk_opf_t opf;
@@ -643,13 +644,10 @@ static int alloc_eb_folio_array(struct extent_buffer *eb, bool nofail)
 }
 
 static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
-				struct folio *folio, u64 disk_bytenr,
-				unsigned int pg_offset)
+				u64 disk_bytenr, loff_t file_offset)
 {
 	struct bio *bio = &bio_ctrl->bbio->bio;
-	struct bio_vec *bvec = bio_last_bvec_all(bio);
 	const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
-	struct folio *bv_folio = page_folio(bvec->bv_page);
 
 	if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) {
 		/*
@@ -660,19 +658,11 @@ static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
 	}
 
 	/*
-	 * The contig check requires the following conditions to be met:
-	 *
-	 * 1) The folios are belonging to the same inode
-	 *    This is implied by the call chain.
-	 *
-	 * 2) The range has adjacent logical bytenr
-	 *
-	 * 3) The range has adjacent file offset
-	 *    This is required for the usage of btrfs_bio->file_offset.
+	 * To merge into a bio both the disk sector and the logical offset in
+	 * the file need to be contiguous.
 	 */
-	return bio_end_sector(bio) == sector &&
-		folio_pos(bv_folio) + bvec->bv_offset + bvec->bv_len ==
-		folio_pos(folio) + pg_offset;
+	return bio_ctrl->next_file_offset == file_offset &&
+		bio_end_sector(bio) == sector;
 }
 
 static void alloc_new_bio(struct btrfs_inode *inode,
@@ -690,6 +680,7 @@ static void alloc_new_bio(struct btrfs_inode *inode,
 	bbio->file_offset = file_offset;
 	bio_ctrl->bbio = bbio;
 	bio_ctrl->len_to_oe_boundary = U32_MAX;
+	bio_ctrl->next_file_offset = file_offset;
 
 	/* Limit data write bios to the ordered boundary. */
 	if (bio_ctrl->wbc) {
@@ -731,12 +722,13 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
 			       size_t size, unsigned long pg_offset)
 {
 	struct btrfs_inode *inode = folio_to_inode(folio);
+	loff_t file_offset = folio_pos(folio) + pg_offset;
 
 	ASSERT(pg_offset + size <= folio_size(folio));
 	ASSERT(bio_ctrl->end_io_func);
 
 	if (bio_ctrl->bbio &&
-	    !btrfs_bio_is_contig(bio_ctrl, folio, disk_bytenr, pg_offset))
+	    !btrfs_bio_is_contig(bio_ctrl, disk_bytenr, file_offset))
 		submit_one_bio(bio_ctrl);
 
 	do {
@@ -745,7 +737,7 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
 		/* Allocate new bio if needed */
 		if (!bio_ctrl->bbio) {
 			alloc_new_bio(inode, bio_ctrl, disk_bytenr,
-				      folio_pos(folio) + pg_offset);
+				      file_offset);
 		}
 
 		/* Cap to the current ordered extent boundary if there is one. */
@@ -760,14 +752,15 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
 			submit_one_bio(bio_ctrl);
 			continue;
 		}
+		bio_ctrl->next_file_offset += len;
 
 		if (bio_ctrl->wbc)
-			wbc_account_cgroup_owner(bio_ctrl->wbc, folio,
-						 len);
+			wbc_account_cgroup_owner(bio_ctrl->wbc, folio, len);
 
 		size -= len;
 		pg_offset += len;
 		disk_bytenr += len;
+		file_offset += len;
 
 		/*
 		 * len_to_oe_boundary defaults to U32_MAX, which isn't folio or
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl
  2025-04-09 11:10 ` [PATCH 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl Christoph Hellwig
@ 2025-04-09 22:15   ` Qu Wenruo
  0 siblings, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2025-04-09 22:15 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> Stop poking into bio implementation details and recalculating the pos
> from the folio over and over and instead just track then end of the
> current bio in logical file offsets in the btrfs_bio_ctrl.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu> ---
>   fs/btrfs/extent_io.c | 33 +++++++++++++--------------------
>   1 file changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 193736b07a0b..36db23f7a6bb 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -96,6 +96,7 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info)
>    */
>   struct btrfs_bio_ctrl {
>   	struct btrfs_bio *bbio;
> +	loff_t next_file_offset; /* last byte contained in bbio + 1 */
>   	enum btrfs_compression_type compress_type;
>   	u32 len_to_oe_boundary;
>   	blk_opf_t opf;
> @@ -643,13 +644,10 @@ static int alloc_eb_folio_array(struct extent_buffer *eb, bool nofail)
>   }
>
>   static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
> -				struct folio *folio, u64 disk_bytenr,
> -				unsigned int pg_offset)
> +				u64 disk_bytenr, loff_t file_offset)
>   {
>   	struct bio *bio = &bio_ctrl->bbio->bio;
> -	struct bio_vec *bvec = bio_last_bvec_all(bio);
>   	const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
> -	struct folio *bv_folio = page_folio(bvec->bv_page);
>
>   	if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) {
>   		/*
> @@ -660,19 +658,11 @@ static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
>   	}
>
>   	/*
> -	 * The contig check requires the following conditions to be met:
> -	 *
> -	 * 1) The folios are belonging to the same inode
> -	 *    This is implied by the call chain.
> -	 *
> -	 * 2) The range has adjacent logical bytenr
> -	 *
> -	 * 3) The range has adjacent file offset
> -	 *    This is required for the usage of btrfs_bio->file_offset.
> +	 * To merge into a bio both the disk sector and the logical offset in
> +	 * the file need to be contiguous.
>   	 */
> -	return bio_end_sector(bio) == sector &&
> -		folio_pos(bv_folio) + bvec->bv_offset + bvec->bv_len ==
> -		folio_pos(folio) + pg_offset;
> +	return bio_ctrl->next_file_offset == file_offset &&
> +		bio_end_sector(bio) == sector;
>   }
>
>   static void alloc_new_bio(struct btrfs_inode *inode,
> @@ -690,6 +680,7 @@ static void alloc_new_bio(struct btrfs_inode *inode,
>   	bbio->file_offset = file_offset;
>   	bio_ctrl->bbio = bbio;
>   	bio_ctrl->len_to_oe_boundary = U32_MAX;
> +	bio_ctrl->next_file_offset = file_offset;
>
>   	/* Limit data write bios to the ordered boundary. */
>   	if (bio_ctrl->wbc) {
> @@ -731,12 +722,13 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
>   			       size_t size, unsigned long pg_offset)
>   {
>   	struct btrfs_inode *inode = folio_to_inode(folio);
> +	loff_t file_offset = folio_pos(folio) + pg_offset;
>
>   	ASSERT(pg_offset + size <= folio_size(folio));
>   	ASSERT(bio_ctrl->end_io_func);
>
>   	if (bio_ctrl->bbio &&
> -	    !btrfs_bio_is_contig(bio_ctrl, folio, disk_bytenr, pg_offset))
> +	    !btrfs_bio_is_contig(bio_ctrl, disk_bytenr, file_offset))
>   		submit_one_bio(bio_ctrl);
>
>   	do {
> @@ -745,7 +737,7 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
>   		/* Allocate new bio if needed */
>   		if (!bio_ctrl->bbio) {
>   			alloc_new_bio(inode, bio_ctrl, disk_bytenr,
> -				      folio_pos(folio) + pg_offset);
> +				      file_offset);
>   		}
>
>   		/* Cap to the current ordered extent boundary if there is one. */
> @@ -760,14 +752,15 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
>   			submit_one_bio(bio_ctrl);
>   			continue;
>   		}
> +		bio_ctrl->next_file_offset += len;
>
>   		if (bio_ctrl->wbc)
> -			wbc_account_cgroup_owner(bio_ctrl->wbc, folio,
> -						 len);
> +			wbc_account_cgroup_owner(bio_ctrl->wbc, folio, len);
>
>   		size -= len;
>   		pg_offset += len;
>   		disk_bytenr += len;
> +		file_offset += len;
>
>   		/*
>   		 * len_to_oe_boundary defaults to U32_MAX, which isn't folio or


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
  2025-04-09 11:10 ` [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read Christoph Hellwig
  2025-04-09 11:10 ` [PATCH 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  2025-04-09 22:19   ` Qu Wenruo
  2025-04-10  6:06   ` Johannes Thumshirn
  2025-04-09 11:10 ` [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

This avoid stepping into bvec internals or complicated folio to
page offset calculations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c     | 9 ++++-----
 fs/btrfs/bio.h     | 3 +--
 fs/btrfs/disk-io.c | 7 ++++---
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 8c2eee1f1878..10f31ee1e8c0 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -192,7 +192,7 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
 		btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
 				  repair_bbio->file_offset, fs_info->sectorsize,
 				  repair_bbio->saved_iter.bi_sector << SECTOR_SHIFT,
-				  page_folio(bv->bv_page), bv->bv_offset, mirror);
+				  bvec_phys(bv), mirror);
 	} while (mirror != fbio->bbio->mirror_num);
 
 done:
@@ -803,8 +803,8 @@ void btrfs_submit_bbio(struct btrfs_bio *bbio, int mirror_num)
  * freeing the bio.
  */
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
-			    u64 length, u64 logical, struct folio *folio,
-			    unsigned int folio_offset, int mirror_num)
+			    u64 length, u64 logical, phys_addr_t paddr,
+			    int mirror_num)
 {
 	struct btrfs_io_stripe smap = { 0 };
 	struct bio_vec bvec;
@@ -835,8 +835,7 @@ int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 
 	bio_init(&bio, smap.dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
 	bio.bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
-	ret = bio_add_folio(&bio, folio, length, folio_offset);
-	ASSERT(ret);
+	__bio_add_page(&bio, phys_to_page(paddr), length, offset_in_page(paddr));
 	ret = submit_bio_wait(&bio);
 	if (ret) {
 		/* try to remap that extent elsewhere? */
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index e2fe16074ad6..dc2eb43b7097 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -110,7 +110,6 @@ void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status);
 void btrfs_submit_bbio(struct btrfs_bio *bbio, int mirror_num);
 void btrfs_submit_repair_write(struct btrfs_bio *bbio, int mirror_num, bool dev_replace);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
-			    u64 length, u64 logical, struct folio *folio,
-			    unsigned int folio_offset, int mirror_num);
+			    u64 length, u64 logical, phys_addr_t paddr, int mirror_num);
 
 #endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3dd555db3d32..7cd1a7a775ed 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -193,10 +193,11 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
 		u64 end = min_t(u64, eb->start + eb->len,
 				folio_pos(folio) + eb->folio_size);
 		u32 len = end - start;
+		phys_addr_t paddr = PFN_PHYS(folio_pfn(folio)) +
+				    offset_in_folio(folio, start);
 
-		ret = btrfs_repair_io_failure(fs_info, 0, start, len,
-					      start, folio, offset_in_folio(folio, start),
-					      mirror_num);
+		ret = btrfs_repair_io_failure(fs_info, 0, start, len, start,
+					      paddr, mirror_num);
 		if (ret)
 			break;
 	}
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure
  2025-04-09 11:10 ` [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure Christoph Hellwig
@ 2025-04-09 22:19   ` Qu Wenruo
  2025-04-10  5:31     ` Christoph Hellwig
  2025-04-10  6:06   ` Johannes Thumshirn
  1 sibling, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2025-04-09 22:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> This avoid stepping into bvec internals or complicated folio to
> page offset calculations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/bio.c     | 9 ++++-----
>   fs/btrfs/bio.h     | 3 +--
>   fs/btrfs/disk-io.c | 7 ++++---
>   3 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 8c2eee1f1878..10f31ee1e8c0 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -192,7 +192,7 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
>   		btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
>   				  repair_bbio->file_offset, fs_info->sectorsize,
>   				  repair_bbio->saved_iter.bi_sector << SECTOR_SHIFT,
> -				  page_folio(bv->bv_page), bv->bv_offset, mirror);
> +				  bvec_phys(bv), mirror);

Just one concern, this is highmem safe?

If it's highmem safe, I'm pretty happy to go the single physical address 
way instead of folio/page + offset.

That's way more convenient to handle block size < page size cases.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure
  2025-04-09 22:19   ` Qu Wenruo
@ 2025-04-10  5:31     ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-10  5:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, linux-btrfs

On Thu, Apr 10, 2025 at 07:49:30AM +0930, Qu Wenruo wrote:
>>   		btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
>>   				  repair_bbio->file_offset, fs_info->sectorsize,
>>   				  repair_bbio->saved_iter.bi_sector << SECTOR_SHIFT,
>> -				  page_folio(bv->bv_page), bv->bv_offset, mirror);
>> +				  bvec_phys(bv), mirror);
>
> Just one concern, this is highmem safe?
>
> If it's highmem safe, I'm pretty happy to go the single physical address 
> way instead of folio/page + offset.
>
> That's way more convenient to handle block size < page size cases.

Physicall addresses and thus bvec_phys work perfectly fine for highmem.
That's why I did that here instead of the more obvious virtual address.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure
  2025-04-09 11:10 ` [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure Christoph Hellwig
  2025-04-09 22:19   ` Qu Wenruo
@ 2025-04-10  6:06   ` Johannes Thumshirn
  2025-04-10  6:17     ` hch
  1 sibling, 1 reply; 26+ messages in thread
From: Johannes Thumshirn @ 2025-04-10  6:06 UTC (permalink / raw)
  To: hch, WenRuo Qu; +Cc: linux-btrfs@vger.kernel.org

On 09.04.25 13:11, Christoph Hellwig wrote:
>   	bio_init(&bio, smap.dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
>   	bio.bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
> -	ret = bio_add_folio(&bio, folio, length, folio_offset);
> -	ASSERT(ret);
> +	__bio_add_page(&bio, phys_to_page(paddr), length, offset_in_page(paddr));

Why are we going back to using a pages and __bio_add_page() here? Can we 
lift phys_to_folio() from s390 into asm-generic/memory_model.h?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure
  2025-04-10  6:06   ` Johannes Thumshirn
@ 2025-04-10  6:17     ` hch
  0 siblings, 0 replies; 26+ messages in thread
From: hch @ 2025-04-10  6:17 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: hch, WenRuo Qu, linux-btrfs@vger.kernel.org

On Thu, Apr 10, 2025 at 06:06:08AM +0000, Johannes Thumshirn wrote:
> On 09.04.25 13:11, Christoph Hellwig wrote:
> >   	bio_init(&bio, smap.dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
> >   	bio.bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
> > -	ret = bio_add_folio(&bio, folio, length, folio_offset);
> > -	ASSERT(ret);
> > +	__bio_add_page(&bio, phys_to_page(paddr), length, offset_in_page(paddr));
> 
> Why are we going back to using a pages and __bio_add_page() here?

Because it is the most direct interface right now, bio_add_folio is
just a pointless wrapper.   I plan to add a bio_add_phys in a bit,
but don't want this to depend on it for now.

> Can we 
> lift phys_to_folio() from s390 into asm-generic/memory_model.h?

Yes, volunteers please.  Also we shouldn't depend on that for now.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
                   ` (2 preceding siblings ...)
  2025-04-09 11:10 ` [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  2025-04-10  6:16   ` Johannes Thumshirn
  2025-04-16  4:51   ` Qu Wenruo
  2025-04-09 11:10 ` [PATCH 5/8] btrfs: simplify bvec iteration in index_one_bio Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Move kmapping the page out of btrfs_check_sector_csum.  This allows
using bvec_kmap_local where suitable and reduces the number of kmap
calls in the raid56 code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/btrfs_inode.h |  4 ++--
 fs/btrfs/inode.c       | 18 ++++++++----------
 fs/btrfs/raid56.c      | 19 +++++++++++--------
 fs/btrfs/scrub.c       |  5 ++++-
 4 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e2952cf5766..d48438332c7c 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -529,8 +529,8 @@ static inline void btrfs_update_inode_mapping_flags(struct btrfs_inode *inode)
 #define CSUM_FMT				"0x%*phN"
 #define CSUM_FMT_VALUE(size, bytes)		size, bytes
 
-int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
-			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
+int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, void *kaddr,
+		u8 *csum, const u8 * const csum_expected);
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct btrfs_inode *inode, u64 offset, u64 *len,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cc67d1a2d611..665df96af134 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3337,19 +3337,13 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered)
  * Verify the checksum for a single sector without any extra action that depend
  * on the type of I/O.
  */
-int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
-			    u32 pgoff, u8 *csum, const u8 * const csum_expected)
+int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, void *kaddr,
+			    u8 *csum, const u8 * const csum_expected)
 {
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
-	char *kaddr;
-
-	ASSERT(pgoff + fs_info->sectorsize <= PAGE_SIZE);
 
 	shash->tfm = fs_info->csum_shash;
-
-	kaddr = kmap_local_page(page) + pgoff;
 	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
-	kunmap_local(kaddr);
 
 	if (memcmp(csum, csum_expected, fs_info->csum_size))
 		return -EIO;
@@ -3378,6 +3372,7 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 	u64 end = file_offset + bv->bv_len - 1;
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
+	char *kaddr;
 
 	ASSERT(bv->bv_len == fs_info->sectorsize);
 
@@ -3395,9 +3390,12 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 
 	csum_expected = bbio->csum + (bio_offset >> fs_info->sectorsize_bits) *
 				fs_info->csum_size;
-	if (btrfs_check_sector_csum(fs_info, bv->bv_page, bv->bv_offset, csum,
-				    csum_expected))
+	kaddr = bvec_kmap_local(bv);
+	if (btrfs_check_sector_csum(fs_info, kaddr, csum, csum_expected)) {
+		kunmap_local(kaddr);
 		goto zeroit;
+	}
+	kunmap_local(kaddr);
 	return true;
 
 zeroit:
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index cdd373c27784..28dbded86cc2 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1575,11 +1575,11 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
 		return;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		int bv_offset;
+		void *kaddr = bvec_kmap_local(bvec);
+		unsigned int off;
 
-		for (bv_offset = bvec->bv_offset;
-		     bv_offset < bvec->bv_offset + bvec->bv_len;
-		     bv_offset += fs_info->sectorsize, total_sector_nr++) {
+		for (off = 0; off < bvec->bv_len;
+		     off += fs_info->sectorsize, total_sector_nr++) {
 			u8 csum_buf[BTRFS_CSUM_SIZE];
 			u8 *expected_csum = rbio->csum_buf +
 					    total_sector_nr * fs_info->csum_size;
@@ -1589,11 +1589,12 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
 			if (!test_bit(total_sector_nr, rbio->csum_bitmap))
 				continue;
 
-			ret = btrfs_check_sector_csum(fs_info, bvec->bv_page,
-				bv_offset, csum_buf, expected_csum);
+			ret = btrfs_check_sector_csum(fs_info, kaddr + off,
+					csum_buf, expected_csum);
 			if (ret < 0)
 				set_bit(total_sector_nr, rbio->error_bitmap);
 		}
+		kunmap_local(kaddr);
 	}
 }
 
@@ -1791,6 +1792,7 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
 	struct sector_ptr *sector;
 	u8 csum_buf[BTRFS_CSUM_SIZE];
 	u8 *csum_expected;
+	void *kaddr;
 	int ret;
 
 	if (!rbio->csum_bitmap || !rbio->csum_buf)
@@ -1811,11 +1813,12 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
 
 	ASSERT(sector->page);
 
+	kaddr = kmap_local_page(sector->page) + sector->pgoff;
 	csum_expected = rbio->csum_buf +
 			(stripe_nr * rbio->stripe_nsectors + sector_nr) *
 			fs_info->csum_size;
-	ret = btrfs_check_sector_csum(fs_info, sector->page, sector->pgoff,
-				      csum_buf, csum_expected);
+	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, csum_expected);
+	kunmap_local(kaddr);
 	return ret;
 }
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2c5edcee9450..49021765c17b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -694,6 +694,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 	struct page *page = scrub_stripe_get_page(stripe, sector_nr);
 	unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
 	u8 csum_buf[BTRFS_CSUM_SIZE];
+	void *kaddr;
 	int ret;
 
 	ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
@@ -737,7 +738,9 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 		return;
 	}
 
-	ret = btrfs_check_sector_csum(fs_info, page, pgoff, csum_buf, sector->csum);
+	kaddr = kmap_local_page(page) + pgoff;
+	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, sector->csum);
+	kunmap_local(kaddr);
 	if (ret < 0) {
 		set_bit(sector_nr, &stripe->csum_error_bitmap);
 		set_bit(sector_nr, &stripe->error_bitmap);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum
  2025-04-09 11:10 ` [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum Christoph Hellwig
@ 2025-04-10  6:16   ` Johannes Thumshirn
  2025-04-16  4:51   ` Qu Wenruo
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Thumshirn @ 2025-04-10  6:16 UTC (permalink / raw)
  To: hch, WenRuo Qu; +Cc: linux-btrfs@vger.kernel.org

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum
  2025-04-09 11:10 ` [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum Christoph Hellwig
  2025-04-10  6:16   ` Johannes Thumshirn
@ 2025-04-16  4:51   ` Qu Wenruo
  1 sibling, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2025-04-16  4:51 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> Move kmapping the page out of btrfs_check_sector_csum.  This allows
> using bvec_kmap_local where suitable and reduces the number of kmap
> calls in the raid56 code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/btrfs_inode.h |  4 ++--
>   fs/btrfs/inode.c       | 18 ++++++++----------
>   fs/btrfs/raid56.c      | 19 +++++++++++--------
>   fs/btrfs/scrub.c       |  5 ++++-
>   4 files changed, 25 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4e2952cf5766..d48438332c7c 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -529,8 +529,8 @@ static inline void btrfs_update_inode_mapping_flags(struct btrfs_inode *inode)
>   #define CSUM_FMT				"0x%*phN"
>   #define CSUM_FMT_VALUE(size, bytes)		size, bytes
>   
> -int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
> -			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
> +int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, void *kaddr,
> +		u8 *csum, const u8 * const csum_expected);
>   bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
>   			u32 bio_offset, struct bio_vec *bv);
>   noinline int can_nocow_extent(struct btrfs_inode *inode, u64 offset, u64 *len,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index cc67d1a2d611..665df96af134 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3337,19 +3337,13 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered)
>    * Verify the checksum for a single sector without any extra action that depend
>    * on the type of I/O.
>    */
> -int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
> -			    u32 pgoff, u8 *csum, const u8 * const csum_expected)
> +int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, void *kaddr,
> +			    u8 *csum, const u8 * const csum_expected)
>   {
>   	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
> -	char *kaddr;
> -
> -	ASSERT(pgoff + fs_info->sectorsize <= PAGE_SIZE);
>   
>   	shash->tfm = fs_info->csum_shash;
> -
> -	kaddr = kmap_local_page(page) + pgoff;
>   	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
> -	kunmap_local(kaddr);
>   
>   	if (memcmp(csum, csum_expected, fs_info->csum_size))
>   		return -EIO;
> @@ -3378,6 +3372,7 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
>   	u64 end = file_offset + bv->bv_len - 1;
>   	u8 *csum_expected;
>   	u8 csum[BTRFS_CSUM_SIZE];
> +	char *kaddr;
>   
>   	ASSERT(bv->bv_len == fs_info->sectorsize);
>   
> @@ -3395,9 +3390,12 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
>   
>   	csum_expected = bbio->csum + (bio_offset >> fs_info->sectorsize_bits) *
>   				fs_info->csum_size;
> -	if (btrfs_check_sector_csum(fs_info, bv->bv_page, bv->bv_offset, csum,
> -				    csum_expected))
> +	kaddr = bvec_kmap_local(bv);
> +	if (btrfs_check_sector_csum(fs_info, kaddr, csum, csum_expected)) {
> +		kunmap_local(kaddr);
>   		goto zeroit;
> +	}
> +	kunmap_local(kaddr);
>   	return true;
>   
>   zeroit:
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index cdd373c27784..28dbded86cc2 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1575,11 +1575,11 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
>   		return;
>   
>   	bio_for_each_segment_all(bvec, bio, iter_all) {
> -		int bv_offset;
> +		void *kaddr = bvec_kmap_local(bvec);
> +		unsigned int off;
>   
> -		for (bv_offset = bvec->bv_offset;
> -		     bv_offset < bvec->bv_offset + bvec->bv_len;
> -		     bv_offset += fs_info->sectorsize, total_sector_nr++) {
> +		for (off = 0; off < bvec->bv_len;
> +		     off += fs_info->sectorsize, total_sector_nr++) {
>   			u8 csum_buf[BTRFS_CSUM_SIZE];
>   			u8 *expected_csum = rbio->csum_buf +
>   					    total_sector_nr * fs_info->csum_size;
> @@ -1589,11 +1589,12 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
>   			if (!test_bit(total_sector_nr, rbio->csum_bitmap))
>   				continue;
>   
> -			ret = btrfs_check_sector_csum(fs_info, bvec->bv_page,
> -				bv_offset, csum_buf, expected_csum);
> +			ret = btrfs_check_sector_csum(fs_info, kaddr + off,
> +					csum_buf, expected_csum);
>   			if (ret < 0)
>   				set_bit(total_sector_nr, rbio->error_bitmap);
>   		}
> +		kunmap_local(kaddr);
>   	}
>   }
>   
> @@ -1791,6 +1792,7 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
>   	struct sector_ptr *sector;
>   	u8 csum_buf[BTRFS_CSUM_SIZE];
>   	u8 *csum_expected;
> +	void *kaddr;
>   	int ret;
>   
>   	if (!rbio->csum_bitmap || !rbio->csum_buf)
> @@ -1811,11 +1813,12 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
>   
>   	ASSERT(sector->page);
>   
> +	kaddr = kmap_local_page(sector->page) + sector->pgoff;
>   	csum_expected = rbio->csum_buf +
>   			(stripe_nr * rbio->stripe_nsectors + sector_nr) *
>   			fs_info->csum_size;
> -	ret = btrfs_check_sector_csum(fs_info, sector->page, sector->pgoff,
> -				      csum_buf, csum_expected);
> +	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, csum_expected);
> +	kunmap_local(kaddr);
>   	return ret;
>   }
>   
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 2c5edcee9450..49021765c17b 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -694,6 +694,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
>   	struct page *page = scrub_stripe_get_page(stripe, sector_nr);
>   	unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
>   	u8 csum_buf[BTRFS_CSUM_SIZE];
> +	void *kaddr;
>   	int ret;
>   
>   	ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
> @@ -737,7 +738,9 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
>   		return;
>   	}
>   
> -	ret = btrfs_check_sector_csum(fs_info, page, pgoff, csum_buf, sector->csum);
> +	kaddr = kmap_local_page(page) + pgoff;
> +	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, sector->csum);
> +	kunmap_local(kaddr);
>   	if (ret < 0) {
>   		set_bit(sector_nr, &stripe->csum_error_bitmap);
>   		set_bit(sector_nr, &stripe->error_bitmap);


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 5/8] btrfs: simplify bvec iteration in index_one_bio
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
                   ` (3 preceding siblings ...)
  2025-04-09 11:10 ` [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  2025-04-18  2:09   ` Qu Wenruo
  2025-04-09 11:10 ` [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Flatten the two loops by open coding bio_for_each_segment and advancing
the iterator one sector at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/raid56.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 28dbded86cc2..703e713bac03 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1195,23 +1195,20 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
 static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
 {
 	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
-	struct bio_vec bvec;
-	struct bvec_iter iter;
+	struct bvec_iter iter = bio->bi_iter;
 	u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
 		     rbio->bioc->full_stripe_logical;
 
-	bio_for_each_segment(bvec, bio, iter) {
-		u32 bvec_offset;
+	while (iter.bi_size) {
+		int index = offset / sectorsize;
+		struct sector_ptr *sector = &rbio->bio_sectors[index];
+		struct bio_vec bv = bio_iter_iovec(bio, iter);
 
-		for (bvec_offset = 0; bvec_offset < bvec.bv_len;
-		     bvec_offset += sectorsize, offset += sectorsize) {
-			int index = offset / sectorsize;
-			struct sector_ptr *sector = &rbio->bio_sectors[index];
+		sector->page = bv.bv_page;
+		sector->pgoff = bv.bv_offset;
+		ASSERT(sector->pgoff < PAGE_SIZE);
 
-			sector->page = bvec.bv_page;
-			sector->pgoff = bvec.bv_offset + bvec_offset;
-			ASSERT(sector->pgoff < PAGE_SIZE);
-		}
+		bio_advance_iter_single(bio, &iter, sectorsize);
 	}
 }
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/8] btrfs: simplify bvec iteration in index_one_bio
  2025-04-09 11:10 ` [PATCH 5/8] btrfs: simplify bvec iteration in index_one_bio Christoph Hellwig
@ 2025-04-18  2:09   ` Qu Wenruo
  0 siblings, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2025-04-18  2:09 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> Flatten the two loops by open coding bio_for_each_segment and advancing
> the iterator one sector at a time.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/raid56.c | 21 +++++++++------------
>   1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 28dbded86cc2..703e713bac03 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1195,23 +1195,20 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
>   static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
>   {
>   	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
> -	struct bio_vec bvec;
> -	struct bvec_iter iter;
> +	struct bvec_iter iter = bio->bi_iter;
>   	u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
>   		     rbio->bioc->full_stripe_logical;
>   
> -	bio_for_each_segment(bvec, bio, iter) {
> -		u32 bvec_offset;
> +	while (iter.bi_size) {
> +		int index = offset / sectorsize;

The value @offset is no longer updated, this means only the first sector 
of the current bio will be updated, and it's updated several times, 
resulting incorrect location.

Furthermore, for full stripe write since we do not allocate pages for 
the stripes_pages[], we will got incorrect bio_sectors[] pointing to NULL.

This will crash rmw_rbio(), easily reproduced by btrfs/011.

I'll also fix it in my local branch.

Thanks,
Qu

> +		struct sector_ptr *sector = &rbio->bio_sectors[index];
> +		struct bio_vec bv = bio_iter_iovec(bio, iter);
>   
> -		for (bvec_offset = 0; bvec_offset < bvec.bv_len;
> -		     bvec_offset += sectorsize, offset += sectorsize) {
> -			int index = offset / sectorsize;
> -			struct sector_ptr *sector = &rbio->bio_sectors[index];
> +		sector->page = bv.bv_page;
> +		sector->pgoff = bv.bv_offset;
> +		ASSERT(sector->pgoff < PAGE_SIZE);
>   
> -			sector->page = bvec.bv_page;
> -			sector->pgoff = bvec.bv_offset + bvec_offset;
> -			ASSERT(sector->pgoff < PAGE_SIZE);
> -		}
> +		bio_advance_iter_single(bio, &iter, sectorsize);
>   	}
>   }
>   


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
                   ` (4 preceding siblings ...)
  2025-04-09 11:10 ` [PATCH 5/8] btrfs: simplify bvec iteration in index_one_bio Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  2025-04-09 22:34   ` Qu Wenruo
  2025-04-09 11:10 ` [PATCH 7/8] btrfs: refactor getting the address of a stripe sector Christoph Hellwig
  2025-04-09 11:10 ` [PATCH 8/8] btrfs: use bvec_kmap_local in btrfs_decompress_buf2page Christoph Hellwig
  7 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

All data pointed to by struct sector_ptr is non-highmem kernel memory.
Simplify the code by using a void pointer instead of a page + offset
pair and dropping all the kmap calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/raid56.c | 167 +++++++++++++++++-----------------------------
 1 file changed, 62 insertions(+), 105 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 703e713bac03..3ccbf133b455 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -134,14 +134,12 @@ struct btrfs_stripe_hash_table {
 };
 
 /*
- * A bvec like structure to present a sector inside a page.
- *
- * Unlike bvec we don't need bvlen, as it's fixed to sectorsize.
+ * A structure to present a sector inside a page, the length is fixed to
+ * sectorsize;
  */
 struct sector_ptr {
-	struct page *page;
-	unsigned int pgoff:24;
-	unsigned int uptodate:8;
+	void *kaddr;
+	u8 uptodate;
 };
 
 static void rmw_rbio_work(struct work_struct *work);
@@ -253,7 +251,7 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
 
 	for (i = 0; i < rbio->nr_sectors; i++) {
 		/* Some range not covered by bio (partial write), skip it */
-		if (!rbio->bio_sectors[i].page) {
+		if (!rbio->bio_sectors[i].kaddr) {
 			/*
 			 * Even if the sector is not covered by bio, if it is
 			 * a data sector it should still be uptodate as it is
@@ -264,11 +262,9 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
 			continue;
 		}
 
-		ASSERT(rbio->stripe_sectors[i].page);
-		memcpy_page(rbio->stripe_sectors[i].page,
-			    rbio->stripe_sectors[i].pgoff,
-			    rbio->bio_sectors[i].page,
-			    rbio->bio_sectors[i].pgoff,
+		ASSERT(rbio->stripe_sectors[i].kaddr);
+		memcpy(rbio->stripe_sectors[i].kaddr,
+			    rbio->bio_sectors[i].kaddr,
 			    rbio->bioc->fs_info->sectorsize);
 		rbio->stripe_sectors[i].uptodate = 1;
 	}
@@ -326,8 +322,9 @@ static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
 		int page_index = offset >> PAGE_SHIFT;
 
 		ASSERT(page_index < rbio->nr_pages);
-		rbio->stripe_sectors[i].page = rbio->stripe_pages[page_index];
-		rbio->stripe_sectors[i].pgoff = offset_in_page(offset);
+		rbio->stripe_sectors[i].kaddr =
+			page_address(rbio->stripe_pages[page_index]) +
+					offset_in_page(offset);
 	}
 }
 
@@ -962,9 +959,9 @@ static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio,
 
 	spin_lock(&rbio->bio_list_lock);
 	sector = &rbio->bio_sectors[index];
-	if (sector->page || bio_list_only) {
+	if (sector->kaddr || bio_list_only) {
 		/* Don't return sector without a valid page pointer */
-		if (!sector->page)
+		if (!sector->kaddr)
 			sector = NULL;
 		spin_unlock(&rbio->bio_list_lock);
 		return sector;
@@ -1142,7 +1139,7 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
 			   rbio, stripe_nr);
 	ASSERT_RBIO_SECTOR(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors,
 			   rbio, sector_nr);
-	ASSERT(sector->page);
+	ASSERT(sector->kaddr);
 
 	stripe = &rbio->bioc->stripes[stripe_nr];
 	disk_start = stripe->physical + sector_nr * sectorsize;
@@ -1173,8 +1170,9 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
 		 */
 		if (last_end == disk_start && !last->bi_status &&
 		    last->bi_bdev == stripe->dev->bdev) {
-			ret = bio_add_page(last, sector->page, sectorsize,
-					   sector->pgoff);
+			ret = bio_add_page(last, virt_to_page(sector->kaddr),
+					sectorsize,
+					offset_in_page(sector->kaddr));
 			if (ret == sectorsize)
 				return 0;
 		}
@@ -1187,7 +1185,8 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
 	bio->bi_iter.bi_sector = disk_start >> SECTOR_SHIFT;
 	bio->bi_private = rbio;
 
-	__bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
+	__bio_add_page(bio, virt_to_page(sector->kaddr), sectorsize,
+			offset_in_page(sector->kaddr));
 	bio_list_add(bio_list, bio);
 	return 0;
 }
@@ -1204,10 +1203,7 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
 		struct sector_ptr *sector = &rbio->bio_sectors[index];
 		struct bio_vec bv = bio_iter_iovec(bio, iter);
 
-		sector->page = bv.bv_page;
-		sector->pgoff = bv.bv_offset;
-		ASSERT(sector->pgoff < PAGE_SIZE);
-
+		sector->kaddr = bvec_virt(&bv);
 		bio_advance_iter_single(bio, &iter, sectorsize);
 	}
 }
@@ -1298,14 +1294,13 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
 	/* First collect one sector from each data stripe */
 	for (stripe = 0; stripe < rbio->nr_data; stripe++) {
 		sector = sector_in_rbio(rbio, stripe, sectornr, 0);
-		pointers[stripe] = kmap_local_page(sector->page) +
-				   sector->pgoff;
+		pointers[stripe] = sector->kaddr;
 	}
 
 	/* Then add the parity stripe */
 	sector = rbio_pstripe_sector(rbio, sectornr);
 	sector->uptodate = 1;
-	pointers[stripe++] = kmap_local_page(sector->page) + sector->pgoff;
+	pointers[stripe++] = sector->kaddr;
 
 	if (has_qstripe) {
 		/*
@@ -1314,8 +1309,7 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
 		 */
 		sector = rbio_qstripe_sector(rbio, sectornr);
 		sector->uptodate = 1;
-		pointers[stripe++] = kmap_local_page(sector->page) +
-				     sector->pgoff;
+		pointers[stripe++] = sector->kaddr;
 
 		assert_rbio(rbio);
 		raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
@@ -1325,8 +1319,6 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
 		memcpy(pointers[rbio->nr_data], pointers[0], sectorsize);
 		run_xor(pointers + 1, rbio->nr_data - 1, sectorsize);
 	}
-	for (stripe = stripe - 1; stripe >= 0; stripe--)
-		kunmap_local(pointers[stripe]);
 }
 
 static int rmw_assemble_write_bios(struct btrfs_raid_bio *rbio,
@@ -1474,15 +1466,14 @@ static void set_rbio_range_error(struct btrfs_raid_bio *rbio, struct bio *bio)
  * stripe_pages[], thus we need to locate the sector.
  */
 static struct sector_ptr *find_stripe_sector(struct btrfs_raid_bio *rbio,
-					     struct page *page,
-					     unsigned int pgoff)
+					     void *kaddr)
 {
 	int i;
 
 	for (i = 0; i < rbio->nr_sectors; i++) {
 		struct sector_ptr *sector = &rbio->stripe_sectors[i];
 
-		if (sector->page == page && sector->pgoff == pgoff)
+		if (sector->kaddr == kaddr)
 			return sector;
 	}
 	return NULL;
@@ -1502,11 +1493,11 @@ static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		struct sector_ptr *sector;
-		int pgoff;
+		void *kaddr = bvec_virt(bvec);
+		int off;
 
-		for (pgoff = bvec->bv_offset; pgoff - bvec->bv_offset < bvec->bv_len;
-		     pgoff += sectorsize) {
-			sector = find_stripe_sector(rbio, bvec->bv_page, pgoff);
+		for (off = 0; off < bvec->bv_len; off += sectorsize) {
+			sector = find_stripe_sector(rbio, kaddr + off);
 			ASSERT(sector);
 			if (sector)
 				sector->uptodate = 1;
@@ -1516,17 +1507,13 @@ static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
 
 static int get_bio_sector_nr(struct btrfs_raid_bio *rbio, struct bio *bio)
 {
-	struct bio_vec *bv = bio_first_bvec_all(bio);
+	void *bvec_kaddr = bvec_virt(bio_first_bvec_all(bio));
 	int i;
 
 	for (i = 0; i < rbio->nr_sectors; i++) {
-		struct sector_ptr *sector;
-
-		sector = &rbio->stripe_sectors[i];
-		if (sector->page == bv->bv_page && sector->pgoff == bv->bv_offset)
+		if (rbio->stripe_sectors[i].kaddr == bvec_kaddr)
 			break;
-		sector = &rbio->bio_sectors[i];
-		if (sector->page == bv->bv_page && sector->pgoff == bv->bv_offset)
+		if (rbio->bio_sectors[i].kaddr == bvec_kaddr)
 			break;
 	}
 	ASSERT(i < rbio->nr_sectors);
@@ -1789,8 +1776,6 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
 	struct sector_ptr *sector;
 	u8 csum_buf[BTRFS_CSUM_SIZE];
 	u8 *csum_expected;
-	void *kaddr;
-	int ret;
 
 	if (!rbio->csum_bitmap || !rbio->csum_buf)
 		return 0;
@@ -1808,15 +1793,13 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
 		sector = rbio_stripe_sector(rbio, stripe_nr, sector_nr);
 	}
 
-	ASSERT(sector->page);
+	ASSERT(sector->kaddr);
 
-	kaddr = kmap_local_page(sector->page) + sector->pgoff;
 	csum_expected = rbio->csum_buf +
 			(stripe_nr * rbio->stripe_nsectors + sector_nr) *
 			fs_info->csum_size;
-	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, csum_expected);
-	kunmap_local(kaddr);
-	return ret;
+	return btrfs_check_sector_csum(fs_info, sector->kaddr, csum_buf,
+			csum_expected);
 }
 
 /*
@@ -1825,7 +1808,7 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
  * need to allocate/free the pointers again and again.
  */
 static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
-			    void **pointers, void **unmap_array)
+			    void **pointers)
 {
 	struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
 	struct sector_ptr *sector;
@@ -1872,10 +1855,8 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
 		} else {
 			sector = rbio_stripe_sector(rbio, stripe_nr, sector_nr);
 		}
-		ASSERT(sector->page);
-		pointers[stripe_nr] = kmap_local_page(sector->page) +
-				   sector->pgoff;
-		unmap_array[stripe_nr] = pointers[stripe_nr];
+		ASSERT(sector->kaddr);
+		pointers[stripe_nr] = sector->kaddr;
 	}
 
 	/* All raid6 handling here */
@@ -1889,7 +1870,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
 				 * We have nothing to do, just skip the
 				 * recovery for this stripe.
 				 */
-				goto cleanup;
+				return ret;
 			/*
 			 * a single failure in raid6 is rebuilt
 			 * in the pstripe code below
@@ -1911,7 +1892,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
 				 * We only care about data stripes recovery,
 				 * can skip this vertical stripe.
 				 */
-				goto cleanup;
+				return ret;
 			/*
 			 * Otherwise we have one bad data stripe and
 			 * a good P stripe.  raid5!
@@ -1960,7 +1941,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
 	if (faila >= 0) {
 		ret = verify_one_sector(rbio, faila, sector_nr);
 		if (ret < 0)
-			goto cleanup;
+			return ret;
 
 		sector = rbio_stripe_sector(rbio, faila, sector_nr);
 		sector->uptodate = 1;
@@ -1968,34 +1949,26 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
 	if (failb >= 0) {
 		ret = verify_one_sector(rbio, failb, sector_nr);
 		if (ret < 0)
-			goto cleanup;
+			return ret;
 
 		sector = rbio_stripe_sector(rbio, failb, sector_nr);
 		sector->uptodate = 1;
 	}
 
-cleanup:
-	for (stripe_nr = rbio->real_stripes - 1; stripe_nr >= 0; stripe_nr--)
-		kunmap_local(unmap_array[stripe_nr]);
 	return ret;
 }
 
 static int recover_sectors(struct btrfs_raid_bio *rbio)
 {
 	void **pointers = NULL;
-	void **unmap_array = NULL;
 	int sectornr;
 	int ret = 0;
 
 	/*
 	 * @pointers array stores the pointer for each sector.
-	 *
-	 * @unmap_array stores copy of pointers that does not get reordered
-	 * during reconstruction so that kunmap_local works.
 	 */
 	pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
-	unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
-	if (!pointers || !unmap_array) {
+	if (!pointers) {
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -2009,14 +1982,13 @@ static int recover_sectors(struct btrfs_raid_bio *rbio)
 	index_rbio_pages(rbio);
 
 	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
-		ret = recover_vertical(rbio, sectornr, pointers, unmap_array);
+		ret = recover_vertical(rbio, sectornr, pointers);
 		if (ret < 0)
 			break;
 	}
 
 out:
 	kfree(pointers);
-	kfree(unmap_array);
 	return ret;
 }
 
@@ -2326,7 +2298,7 @@ static bool need_read_stripe_sectors(struct btrfs_raid_bio *rbio)
 		 * thus this rbio can not be cached one, as cached one must
 		 * have all its data sectors present and uptodate.
 		 */
-		if (!sector->page || !sector->uptodate)
+		if (!sector->kaddr || !sector->uptodate)
 			return true;
 	}
 	return false;
@@ -2547,29 +2519,27 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
 	 */
 	clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
-	p_sector.page = alloc_page(GFP_NOFS);
-	if (!p_sector.page)
+	p_sector.kaddr = (void *)__get_free_page(GFP_NOFS);
+	if (!p_sector.kaddr)
 		return -ENOMEM;
-	p_sector.pgoff = 0;
 	p_sector.uptodate = 1;
 
 	if (has_qstripe) {
 		/* RAID6, allocate and map temp space for the Q stripe */
-		q_sector.page = alloc_page(GFP_NOFS);
-		if (!q_sector.page) {
-			__free_page(p_sector.page);
-			p_sector.page = NULL;
+		q_sector.kaddr = (void *)__get_free_page(GFP_NOFS);
+		if (!q_sector.kaddr) {
+			free_page((unsigned long)p_sector.kaddr);
+			p_sector.kaddr = NULL;
 			return -ENOMEM;
 		}
-		q_sector.pgoff = 0;
 		q_sector.uptodate = 1;
-		pointers[rbio->real_stripes - 1] = kmap_local_page(q_sector.page);
+		pointers[rbio->real_stripes - 1] = q_sector.kaddr;
 	}
 
 	bitmap_clear(rbio->error_bitmap, 0, rbio->nr_sectors);
 
 	/* Map the parity stripe just once */
-	pointers[nr_data] = kmap_local_page(p_sector.page);
+	pointers[nr_data] = p_sector.kaddr;
 
 	for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
 		struct sector_ptr *sector;
@@ -2578,8 +2548,7 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
 			sector = sector_in_rbio(rbio, stripe, sectornr, 0);
-			pointers[stripe] = kmap_local_page(sector->page) +
-					   sector->pgoff;
+			pointers[stripe] = sector->kaddr;
 		}
 
 		if (has_qstripe) {
@@ -2595,25 +2564,19 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
 
 		/* Check scrubbing parity and repair it */
 		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
-		parity = kmap_local_page(sector->page) + sector->pgoff;
+		parity = sector->kaddr;
 		if (memcmp(parity, pointers[rbio->scrubp], sectorsize) != 0)
 			memcpy(parity, pointers[rbio->scrubp], sectorsize);
 		else
 			/* Parity is right, needn't writeback */
 			bitmap_clear(&rbio->dbitmap, sectornr, 1);
-		kunmap_local(parity);
-
-		for (stripe = nr_data - 1; stripe >= 0; stripe--)
-			kunmap_local(pointers[stripe]);
 	}
 
-	kunmap_local(pointers[nr_data]);
-	__free_page(p_sector.page);
-	p_sector.page = NULL;
-	if (q_sector.page) {
-		kunmap_local(pointers[rbio->real_stripes - 1]);
-		__free_page(q_sector.page);
-		q_sector.page = NULL;
+	free_page((unsigned long)p_sector.kaddr);
+	p_sector.kaddr = NULL;
+	if (q_sector.kaddr) {
+		free_page((unsigned long)q_sector.kaddr);
+		q_sector.kaddr = NULL;
 	}
 
 	/*
@@ -2669,19 +2632,14 @@ static inline int is_data_stripe(struct btrfs_raid_bio *rbio, int stripe)
 static int recover_scrub_rbio(struct btrfs_raid_bio *rbio)
 {
 	void **pointers = NULL;
-	void **unmap_array = NULL;
 	int sector_nr;
 	int ret = 0;
 
 	/*
 	 * @pointers array stores the pointer for each sector.
-	 *
-	 * @unmap_array stores copy of pointers that does not get reordered
-	 * during reconstruction so that kunmap_local works.
 	 */
 	pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
-	unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
-	if (!pointers || !unmap_array) {
+	if (!pointers) {
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -2740,13 +2698,12 @@ static int recover_scrub_rbio(struct btrfs_raid_bio *rbio)
 			goto out;
 		}
 
-		ret = recover_vertical(rbio, sector_nr, pointers, unmap_array);
+		ret = recover_vertical(rbio, sector_nr, pointers);
 		if (ret < 0)
 			goto out;
 	}
 out:
 	kfree(pointers);
-	kfree(unmap_array);
 	return ret;
 }
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr
  2025-04-09 11:10 ` [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr Christoph Hellwig
@ 2025-04-09 22:34   ` Qu Wenruo
  2025-04-10  5:34     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2025-04-09 22:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> All data pointed to by struct sector_ptr is non-highmem kernel memory.
> Simplify the code by using a void pointer instead of a page + offset
> pair and dropping all the kmap calls.

But the higher level bio from btrfs filemaps can have highmem pages.

That's why we keep the kmap/kunmap.

Or is there a way to set the filemap to no use any highmem pages?

Thanks,
Qu

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/raid56.c | 167 +++++++++++++++++-----------------------------
>   1 file changed, 62 insertions(+), 105 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 703e713bac03..3ccbf133b455 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -134,14 +134,12 @@ struct btrfs_stripe_hash_table {
>   };
>   
>   /*
> - * A bvec like structure to present a sector inside a page.
> - *
> - * Unlike bvec we don't need bvlen, as it's fixed to sectorsize.
> + * A structure to present a sector inside a page, the length is fixed to
> + * sectorsize;
>    */
>   struct sector_ptr {
> -	struct page *page;
> -	unsigned int pgoff:24;
> -	unsigned int uptodate:8;
> +	void *kaddr;
> +	u8 uptodate;
>   };
>   
>   static void rmw_rbio_work(struct work_struct *work);
> @@ -253,7 +251,7 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
>   
>   	for (i = 0; i < rbio->nr_sectors; i++) {
>   		/* Some range not covered by bio (partial write), skip it */
> -		if (!rbio->bio_sectors[i].page) {
> +		if (!rbio->bio_sectors[i].kaddr) {
>   			/*
>   			 * Even if the sector is not covered by bio, if it is
>   			 * a data sector it should still be uptodate as it is
> @@ -264,11 +262,9 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
>   			continue;
>   		}
>   
> -		ASSERT(rbio->stripe_sectors[i].page);
> -		memcpy_page(rbio->stripe_sectors[i].page,
> -			    rbio->stripe_sectors[i].pgoff,
> -			    rbio->bio_sectors[i].page,
> -			    rbio->bio_sectors[i].pgoff,
> +		ASSERT(rbio->stripe_sectors[i].kaddr);
> +		memcpy(rbio->stripe_sectors[i].kaddr,
> +			    rbio->bio_sectors[i].kaddr,
>   			    rbio->bioc->fs_info->sectorsize);
>   		rbio->stripe_sectors[i].uptodate = 1;
>   	}
> @@ -326,8 +322,9 @@ static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
>   		int page_index = offset >> PAGE_SHIFT;
>   
>   		ASSERT(page_index < rbio->nr_pages);
> -		rbio->stripe_sectors[i].page = rbio->stripe_pages[page_index];
> -		rbio->stripe_sectors[i].pgoff = offset_in_page(offset);
> +		rbio->stripe_sectors[i].kaddr =
> +			page_address(rbio->stripe_pages[page_index]) +
> +					offset_in_page(offset);
>   	}
>   }
>   
> @@ -962,9 +959,9 @@ static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio,
>   
>   	spin_lock(&rbio->bio_list_lock);
>   	sector = &rbio->bio_sectors[index];
> -	if (sector->page || bio_list_only) {
> +	if (sector->kaddr || bio_list_only) {
>   		/* Don't return sector without a valid page pointer */
> -		if (!sector->page)
> +		if (!sector->kaddr)
>   			sector = NULL;
>   		spin_unlock(&rbio->bio_list_lock);
>   		return sector;
> @@ -1142,7 +1139,7 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
>   			   rbio, stripe_nr);
>   	ASSERT_RBIO_SECTOR(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors,
>   			   rbio, sector_nr);
> -	ASSERT(sector->page);
> +	ASSERT(sector->kaddr);
>   
>   	stripe = &rbio->bioc->stripes[stripe_nr];
>   	disk_start = stripe->physical + sector_nr * sectorsize;
> @@ -1173,8 +1170,9 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
>   		 */
>   		if (last_end == disk_start && !last->bi_status &&
>   		    last->bi_bdev == stripe->dev->bdev) {
> -			ret = bio_add_page(last, sector->page, sectorsize,
> -					   sector->pgoff);
> +			ret = bio_add_page(last, virt_to_page(sector->kaddr),
> +					sectorsize,
> +					offset_in_page(sector->kaddr));
>   			if (ret == sectorsize)
>   				return 0;
>   		}
> @@ -1187,7 +1185,8 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
>   	bio->bi_iter.bi_sector = disk_start >> SECTOR_SHIFT;
>   	bio->bi_private = rbio;
>   
> -	__bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
> +	__bio_add_page(bio, virt_to_page(sector->kaddr), sectorsize,
> +			offset_in_page(sector->kaddr));
>   	bio_list_add(bio_list, bio);
>   	return 0;
>   }
> @@ -1204,10 +1203,7 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
>   		struct sector_ptr *sector = &rbio->bio_sectors[index];
>   		struct bio_vec bv = bio_iter_iovec(bio, iter);
>   
> -		sector->page = bv.bv_page;
> -		sector->pgoff = bv.bv_offset;
> -		ASSERT(sector->pgoff < PAGE_SIZE);
> -
> +		sector->kaddr = bvec_virt(&bv);
>   		bio_advance_iter_single(bio, &iter, sectorsize);
>   	}
>   }
> @@ -1298,14 +1294,13 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
>   	/* First collect one sector from each data stripe */
>   	for (stripe = 0; stripe < rbio->nr_data; stripe++) {
>   		sector = sector_in_rbio(rbio, stripe, sectornr, 0);
> -		pointers[stripe] = kmap_local_page(sector->page) +
> -				   sector->pgoff;
> +		pointers[stripe] = sector->kaddr;
>   	}
>   
>   	/* Then add the parity stripe */
>   	sector = rbio_pstripe_sector(rbio, sectornr);
>   	sector->uptodate = 1;
> -	pointers[stripe++] = kmap_local_page(sector->page) + sector->pgoff;
> +	pointers[stripe++] = sector->kaddr;
>   
>   	if (has_qstripe) {
>   		/*
> @@ -1314,8 +1309,7 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
>   		 */
>   		sector = rbio_qstripe_sector(rbio, sectornr);
>   		sector->uptodate = 1;
> -		pointers[stripe++] = kmap_local_page(sector->page) +
> -				     sector->pgoff;
> +		pointers[stripe++] = sector->kaddr;
>   
>   		assert_rbio(rbio);
>   		raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
> @@ -1325,8 +1319,6 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
>   		memcpy(pointers[rbio->nr_data], pointers[0], sectorsize);
>   		run_xor(pointers + 1, rbio->nr_data - 1, sectorsize);
>   	}
> -	for (stripe = stripe - 1; stripe >= 0; stripe--)
> -		kunmap_local(pointers[stripe]);
>   }
>   
>   static int rmw_assemble_write_bios(struct btrfs_raid_bio *rbio,
> @@ -1474,15 +1466,14 @@ static void set_rbio_range_error(struct btrfs_raid_bio *rbio, struct bio *bio)
>    * stripe_pages[], thus we need to locate the sector.
>    */
>   static struct sector_ptr *find_stripe_sector(struct btrfs_raid_bio *rbio,
> -					     struct page *page,
> -					     unsigned int pgoff)
> +					     void *kaddr)
>   {
>   	int i;
>   
>   	for (i = 0; i < rbio->nr_sectors; i++) {
>   		struct sector_ptr *sector = &rbio->stripe_sectors[i];
>   
> -		if (sector->page == page && sector->pgoff == pgoff)
> +		if (sector->kaddr == kaddr)
>   			return sector;
>   	}
>   	return NULL;
> @@ -1502,11 +1493,11 @@ static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
>   
>   	bio_for_each_segment_all(bvec, bio, iter_all) {
>   		struct sector_ptr *sector;
> -		int pgoff;
> +		void *kaddr = bvec_virt(bvec);
> +		int off;
>   
> -		for (pgoff = bvec->bv_offset; pgoff - bvec->bv_offset < bvec->bv_len;
> -		     pgoff += sectorsize) {
> -			sector = find_stripe_sector(rbio, bvec->bv_page, pgoff);
> +		for (off = 0; off < bvec->bv_len; off += sectorsize) {
> +			sector = find_stripe_sector(rbio, kaddr + off);
>   			ASSERT(sector);
>   			if (sector)
>   				sector->uptodate = 1;
> @@ -1516,17 +1507,13 @@ static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
>   
>   static int get_bio_sector_nr(struct btrfs_raid_bio *rbio, struct bio *bio)
>   {
> -	struct bio_vec *bv = bio_first_bvec_all(bio);
> +	void *bvec_kaddr = bvec_virt(bio_first_bvec_all(bio));
>   	int i;
>   
>   	for (i = 0; i < rbio->nr_sectors; i++) {
> -		struct sector_ptr *sector;
> -
> -		sector = &rbio->stripe_sectors[i];
> -		if (sector->page == bv->bv_page && sector->pgoff == bv->bv_offset)
> +		if (rbio->stripe_sectors[i].kaddr == bvec_kaddr)
>   			break;
> -		sector = &rbio->bio_sectors[i];
> -		if (sector->page == bv->bv_page && sector->pgoff == bv->bv_offset)
> +		if (rbio->bio_sectors[i].kaddr == bvec_kaddr)
>   			break;
>   	}
>   	ASSERT(i < rbio->nr_sectors);
> @@ -1789,8 +1776,6 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
>   	struct sector_ptr *sector;
>   	u8 csum_buf[BTRFS_CSUM_SIZE];
>   	u8 *csum_expected;
> -	void *kaddr;
> -	int ret;
>   
>   	if (!rbio->csum_bitmap || !rbio->csum_buf)
>   		return 0;
> @@ -1808,15 +1793,13 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
>   		sector = rbio_stripe_sector(rbio, stripe_nr, sector_nr);
>   	}
>   
> -	ASSERT(sector->page);
> +	ASSERT(sector->kaddr);
>   
> -	kaddr = kmap_local_page(sector->page) + sector->pgoff;
>   	csum_expected = rbio->csum_buf +
>   			(stripe_nr * rbio->stripe_nsectors + sector_nr) *
>   			fs_info->csum_size;
> -	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, csum_expected);
> -	kunmap_local(kaddr);
> -	return ret;
> +	return btrfs_check_sector_csum(fs_info, sector->kaddr, csum_buf,
> +			csum_expected);
>   }
>   
>   /*
> @@ -1825,7 +1808,7 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
>    * need to allocate/free the pointers again and again.
>    */
>   static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
> -			    void **pointers, void **unmap_array)
> +			    void **pointers)
>   {
>   	struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
>   	struct sector_ptr *sector;
> @@ -1872,10 +1855,8 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
>   		} else {
>   			sector = rbio_stripe_sector(rbio, stripe_nr, sector_nr);
>   		}
> -		ASSERT(sector->page);
> -		pointers[stripe_nr] = kmap_local_page(sector->page) +
> -				   sector->pgoff;
> -		unmap_array[stripe_nr] = pointers[stripe_nr];
> +		ASSERT(sector->kaddr);
> +		pointers[stripe_nr] = sector->kaddr;
>   	}
>   
>   	/* All raid6 handling here */
> @@ -1889,7 +1870,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
>   				 * We have nothing to do, just skip the
>   				 * recovery for this stripe.
>   				 */
> -				goto cleanup;
> +				return ret;
>   			/*
>   			 * a single failure in raid6 is rebuilt
>   			 * in the pstripe code below
> @@ -1911,7 +1892,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
>   				 * We only care about data stripes recovery,
>   				 * can skip this vertical stripe.
>   				 */
> -				goto cleanup;
> +				return ret;
>   			/*
>   			 * Otherwise we have one bad data stripe and
>   			 * a good P stripe.  raid5!
> @@ -1960,7 +1941,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
>   	if (faila >= 0) {
>   		ret = verify_one_sector(rbio, faila, sector_nr);
>   		if (ret < 0)
> -			goto cleanup;
> +			return ret;
>   
>   		sector = rbio_stripe_sector(rbio, faila, sector_nr);
>   		sector->uptodate = 1;
> @@ -1968,34 +1949,26 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
>   	if (failb >= 0) {
>   		ret = verify_one_sector(rbio, failb, sector_nr);
>   		if (ret < 0)
> -			goto cleanup;
> +			return ret;
>   
>   		sector = rbio_stripe_sector(rbio, failb, sector_nr);
>   		sector->uptodate = 1;
>   	}
>   
> -cleanup:
> -	for (stripe_nr = rbio->real_stripes - 1; stripe_nr >= 0; stripe_nr--)
> -		kunmap_local(unmap_array[stripe_nr]);
>   	return ret;
>   }
>   
>   static int recover_sectors(struct btrfs_raid_bio *rbio)
>   {
>   	void **pointers = NULL;
> -	void **unmap_array = NULL;
>   	int sectornr;
>   	int ret = 0;
>   
>   	/*
>   	 * @pointers array stores the pointer for each sector.
> -	 *
> -	 * @unmap_array stores copy of pointers that does not get reordered
> -	 * during reconstruction so that kunmap_local works.
>   	 */
>   	pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
> -	unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
> -	if (!pointers || !unmap_array) {
> +	if (!pointers) {
>   		ret = -ENOMEM;
>   		goto out;
>   	}
> @@ -2009,14 +1982,13 @@ static int recover_sectors(struct btrfs_raid_bio *rbio)
>   	index_rbio_pages(rbio);
>   
>   	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
> -		ret = recover_vertical(rbio, sectornr, pointers, unmap_array);
> +		ret = recover_vertical(rbio, sectornr, pointers);
>   		if (ret < 0)
>   			break;
>   	}
>   
>   out:
>   	kfree(pointers);
> -	kfree(unmap_array);
>   	return ret;
>   }
>   
> @@ -2326,7 +2298,7 @@ static bool need_read_stripe_sectors(struct btrfs_raid_bio *rbio)
>   		 * thus this rbio can not be cached one, as cached one must
>   		 * have all its data sectors present and uptodate.
>   		 */
> -		if (!sector->page || !sector->uptodate)
> +		if (!sector->kaddr || !sector->uptodate)
>   			return true;
>   	}
>   	return false;
> @@ -2547,29 +2519,27 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
>   	 */
>   	clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
>   
> -	p_sector.page = alloc_page(GFP_NOFS);
> -	if (!p_sector.page)
> +	p_sector.kaddr = (void *)__get_free_page(GFP_NOFS);
> +	if (!p_sector.kaddr)
>   		return -ENOMEM;
> -	p_sector.pgoff = 0;
>   	p_sector.uptodate = 1;
>   
>   	if (has_qstripe) {
>   		/* RAID6, allocate and map temp space for the Q stripe */
> -		q_sector.page = alloc_page(GFP_NOFS);
> -		if (!q_sector.page) {
> -			__free_page(p_sector.page);
> -			p_sector.page = NULL;
> +		q_sector.kaddr = (void *)__get_free_page(GFP_NOFS);
> +		if (!q_sector.kaddr) {
> +			free_page((unsigned long)p_sector.kaddr);
> +			p_sector.kaddr = NULL;
>   			return -ENOMEM;
>   		}
> -		q_sector.pgoff = 0;
>   		q_sector.uptodate = 1;
> -		pointers[rbio->real_stripes - 1] = kmap_local_page(q_sector.page);
> +		pointers[rbio->real_stripes - 1] = q_sector.kaddr;
>   	}
>   
>   	bitmap_clear(rbio->error_bitmap, 0, rbio->nr_sectors);
>   
>   	/* Map the parity stripe just once */
> -	pointers[nr_data] = kmap_local_page(p_sector.page);
> +	pointers[nr_data] = p_sector.kaddr;
>   
>   	for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
>   		struct sector_ptr *sector;
> @@ -2578,8 +2548,7 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
>   		/* first collect one page from each data stripe */
>   		for (stripe = 0; stripe < nr_data; stripe++) {
>   			sector = sector_in_rbio(rbio, stripe, sectornr, 0);
> -			pointers[stripe] = kmap_local_page(sector->page) +
> -					   sector->pgoff;
> +			pointers[stripe] = sector->kaddr;
>   		}
>   
>   		if (has_qstripe) {
> @@ -2595,25 +2564,19 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
>   
>   		/* Check scrubbing parity and repair it */
>   		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
> -		parity = kmap_local_page(sector->page) + sector->pgoff;
> +		parity = sector->kaddr;
>   		if (memcmp(parity, pointers[rbio->scrubp], sectorsize) != 0)
>   			memcpy(parity, pointers[rbio->scrubp], sectorsize);
>   		else
>   			/* Parity is right, needn't writeback */
>   			bitmap_clear(&rbio->dbitmap, sectornr, 1);
> -		kunmap_local(parity);
> -
> -		for (stripe = nr_data - 1; stripe >= 0; stripe--)
> -			kunmap_local(pointers[stripe]);
>   	}
>   
> -	kunmap_local(pointers[nr_data]);
> -	__free_page(p_sector.page);
> -	p_sector.page = NULL;
> -	if (q_sector.page) {
> -		kunmap_local(pointers[rbio->real_stripes - 1]);
> -		__free_page(q_sector.page);
> -		q_sector.page = NULL;
> +	free_page((unsigned long)p_sector.kaddr);
> +	p_sector.kaddr = NULL;
> +	if (q_sector.kaddr) {
> +		free_page((unsigned long)q_sector.kaddr);
> +		q_sector.kaddr = NULL;
>   	}
>   
>   	/*
> @@ -2669,19 +2632,14 @@ static inline int is_data_stripe(struct btrfs_raid_bio *rbio, int stripe)
>   static int recover_scrub_rbio(struct btrfs_raid_bio *rbio)
>   {
>   	void **pointers = NULL;
> -	void **unmap_array = NULL;
>   	int sector_nr;
>   	int ret = 0;
>   
>   	/*
>   	 * @pointers array stores the pointer for each sector.
> -	 *
> -	 * @unmap_array stores copy of pointers that does not get reordered
> -	 * during reconstruction so that kunmap_local works.
>   	 */
>   	pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
> -	unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
> -	if (!pointers || !unmap_array) {
> +	if (!pointers) {
>   		ret = -ENOMEM;
>   		goto out;
>   	}
> @@ -2740,13 +2698,12 @@ static int recover_scrub_rbio(struct btrfs_raid_bio *rbio)
>   			goto out;
>   		}
>   
> -		ret = recover_vertical(rbio, sector_nr, pointers, unmap_array);
> +		ret = recover_vertical(rbio, sector_nr, pointers);
>   		if (ret < 0)
>   			goto out;
>   	}
>   out:
>   	kfree(pointers);
> -	kfree(unmap_array);
>   	return ret;
>   }
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr
  2025-04-09 22:34   ` Qu Wenruo
@ 2025-04-10  5:34     ` Christoph Hellwig
  2025-04-14  3:04       ` Qu Wenruo
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-10  5:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, linux-btrfs

On Thu, Apr 10, 2025 at 08:04:25AM +0930, Qu Wenruo wrote:
> 在 2025/4/9 20:40, Christoph Hellwig 写道:
>> All data pointed to by struct sector_ptr is non-highmem kernel memory.
>> Simplify the code by using a void pointer instead of a page + offset
>> pair and dropping all the kmap calls.
>
> But the higher level bio from btrfs filemaps can have highmem pages.
>
> That's why we keep the kmap/kunmap.

Where do filemap pages come into the raid code?   As far as I can see
they are always copied, and the memory is only allocated in the raid
code.  As seen in this code we have two direct allocations that I
convered to __get_free_page, and one case that uses the bulk page
allocator where I use page_address.  Or did I miss something?

> Or is there a way to set the filemap to no use any highmem pages?

You can restrict the allocation of a mapping to avoid highmem using
mapping_set_gfp_mask().  But that would not help with direct I/O IFF
user pages came into this code.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr
  2025-04-10  5:34     ` Christoph Hellwig
@ 2025-04-14  3:04       ` Qu Wenruo
  2025-04-17 23:41         ` Qu Wenruo
  0 siblings, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2025-04-14  3:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs



在 2025/4/10 15:04, Christoph Hellwig 写道:
> On Thu, Apr 10, 2025 at 08:04:25AM +0930, Qu Wenruo wrote:
>> 在 2025/4/9 20:40, Christoph Hellwig 写道:
>>> All data pointed to by struct sector_ptr is non-highmem kernel memory.
>>> Simplify the code by using a void pointer instead of a page + offset
>>> pair and dropping all the kmap calls.
>>
>> But the higher level bio from btrfs filemaps can have highmem pages.
>>
>> That's why we keep the kmap/kunmap.
> 
> Where do filemap pages come into the raid code?   As far as I can see
> they are always copied, and the memory is only allocated in the raid
> code.  As seen in this code we have two direct allocations that I
> convered to __get_free_page, and one case that uses the bulk page
> allocator where I use page_address.  Or did I miss something?

The function sector_in_rbio() can force to use the sector_ptr in 
bio_sectors, and by default we use bio_sectors first.

Thus all the kmap/kunmap pairs are needed for call sites that is 
grabbing the sector through sector_in_rbio().

This includes:

- generate_pq_vertical()
- rmw_assemble_write_bios()
- verify_one_sector() for READ_REBUILD
- recovery_vertical() for READ_REBUILD

So I'm afraid we still need the kmap/kunmap for now.

> 
>> Or is there a way to set the filemap to no use any highmem pages?
> 
> You can restrict the allocation of a mapping to avoid highmem using
> mapping_set_gfp_mask().  But that would not help with direct I/O IFF
> user pages came into this code.

Fine, my dream of getting rid of highmem inside btrfs is still really a 
dream...

Thanks,
Qu

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr
  2025-04-14  3:04       ` Qu Wenruo
@ 2025-04-17 23:41         ` Qu Wenruo
  0 siblings, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2025-04-17 23:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs



在 2025/4/14 12:34, Qu Wenruo 写道:
> 
> 
> 在 2025/4/10 15:04, Christoph Hellwig 写道:
>> On Thu, Apr 10, 2025 at 08:04:25AM +0930, Qu Wenruo wrote:
>>> 在 2025/4/9 20:40, Christoph Hellwig 写道:
>>>> All data pointed to by struct sector_ptr is non-highmem kernel memory.
>>>> Simplify the code by using a void pointer instead of a page + offset
>>>> pair and dropping all the kmap calls.
>>>
>>> But the higher level bio from btrfs filemaps can have highmem pages.
>>>
>>> That's why we keep the kmap/kunmap.
>>
>> Where do filemap pages come into the raid code?   As far as I can see
>> they are always copied, and the memory is only allocated in the raid
>> code.  As seen in this code we have two direct allocations that I
>> convered to __get_free_page, and one case that uses the bulk page
>> allocator where I use page_address.  Or did I miss something?
> 
> The function sector_in_rbio() can force to use the sector_ptr in 
> bio_sectors, and by default we use bio_sectors first.
> 
> Thus all the kmap/kunmap pairs are needed for call sites that is 
> grabbing the sector through sector_in_rbio().
> 
> This includes:
> 
> - generate_pq_vertical()
> - rmw_assemble_write_bios()
> - verify_one_sector() for READ_REBUILD
> - recovery_vertical() for READ_REBUILD
> 
> So I'm afraid we still need the kmap/kunmap for now.

Most of the series still looks very good to me, for this problem I'll 
changed it to use physical addresses, and keep the kmap/kunmap instead.

When the modified series is properly tested on PAE systems I'll send out 
the refreshed version for review.

Thanks,
Qu

> 
>>
>>> Or is there a way to set the filemap to no use any highmem pages?
>>
>> You can restrict the allocation of a mapping to avoid highmem using
>> mapping_set_gfp_mask().  But that would not help with direct I/O IFF
>> user pages came into this code.
> 
> Fine, my dream of getting rid of highmem inside btrfs is still really a 
> dream...
> 
> Thanks,
> Qu
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 7/8] btrfs: refactor getting the address of a stripe sector
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
                   ` (5 preceding siblings ...)
  2025-04-09 11:10 ` [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  2025-04-09 22:38   ` Qu Wenruo
  2025-04-19  1:01   ` Qu Wenruo
  2025-04-09 11:10 ` [PATCH 8/8] btrfs: use bvec_kmap_local in btrfs_decompress_buf2page Christoph Hellwig
  7 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Add a helper to get the actual kernel address of a stripe instead of
just the page and offset into it to simplify the code, and add another
helper to add the memory backing a sector to a bio using the above
helper.

This nicely abstracts away the page + offset representation from almost
all of the scrub code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/scrub.c | 81 +++++++++++++++---------------------------------
 1 file changed, 25 insertions(+), 56 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 49021765c17b..d014b728eb0d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -579,20 +579,13 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
 	return ret;
 }
 
-static struct page *scrub_stripe_get_page(struct scrub_stripe *stripe, int sector_nr)
+static void *scrub_stripe_get_kaddr(struct scrub_stripe *stripe, int sector_nr)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
-	int page_index = (sector_nr << fs_info->sectorsize_bits) >> PAGE_SHIFT;
+	u32 offset = sector_nr << fs_info->sectorsize_bits;
 
-	return stripe->pages[page_index];
-}
-
-static unsigned int scrub_stripe_get_page_offset(struct scrub_stripe *stripe,
-						 int sector_nr)
-{
-	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
-
-	return offset_in_page(sector_nr << fs_info->sectorsize_bits);
+	return page_address(stripe->pages[offset >> PAGE_SHIFT]) +
+			offset_in_page(offset);
 }
 
 static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
@@ -600,19 +593,17 @@ static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	const u32 sectors_per_tree = fs_info->nodesize >> fs_info->sectorsize_bits;
 	const u64 logical = stripe->logical + (sector_nr << fs_info->sectorsize_bits);
-	const struct page *first_page = scrub_stripe_get_page(stripe, sector_nr);
-	const unsigned int first_off = scrub_stripe_get_page_offset(stripe, sector_nr);
+	void *first_kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	u8 on_disk_csum[BTRFS_CSUM_SIZE];
 	u8 calculated_csum[BTRFS_CSUM_SIZE];
-	struct btrfs_header *header;
+	struct btrfs_header *header = first_kaddr;
 
 	/*
 	 * Here we don't have a good way to attach the pages (and subpages)
 	 * to a dummy extent buffer, thus we have to directly grab the members
 	 * from pages.
 	 */
-	header = (struct btrfs_header *)(page_address(first_page) + first_off);
 	memcpy(on_disk_csum, header->csum, fs_info->csum_size);
 
 	if (logical != btrfs_stack_header_bytenr(header)) {
@@ -648,14 +639,11 @@ static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr
 	/* Now check tree block csum. */
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
-	crypto_shash_update(shash, page_address(first_page) + first_off +
-			    BTRFS_CSUM_SIZE, fs_info->sectorsize - BTRFS_CSUM_SIZE);
+	crypto_shash_update(shash, first_kaddr + BTRFS_CSUM_SIZE,
+			fs_info->sectorsize - BTRFS_CSUM_SIZE);
 
 	for (int i = sector_nr + 1; i < sector_nr + sectors_per_tree; i++) {
-		struct page *page = scrub_stripe_get_page(stripe, i);
-		unsigned int page_off = scrub_stripe_get_page_offset(stripe, i);
-
-		crypto_shash_update(shash, page_address(page) + page_off,
+		crypto_shash_update(shash, scrub_stripe_get_kaddr(stripe, i),
 				    fs_info->sectorsize);
 	}
 
@@ -691,10 +679,8 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	struct scrub_sector_verification *sector = &stripe->sectors[sector_nr];
 	const u32 sectors_per_tree = fs_info->nodesize >> fs_info->sectorsize_bits;
-	struct page *page = scrub_stripe_get_page(stripe, sector_nr);
-	unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
+	void *kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
 	u8 csum_buf[BTRFS_CSUM_SIZE];
-	void *kaddr;
 	int ret;
 
 	ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
@@ -738,9 +724,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 		return;
 	}
 
-	kaddr = kmap_local_page(page) + pgoff;
 	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, sector->csum);
-	kunmap_local(kaddr);
 	if (ret < 0) {
 		set_bit(sector_nr, &stripe->csum_error_bitmap);
 		set_bit(sector_nr, &stripe->error_bitmap);
@@ -769,8 +753,7 @@ static int calc_sector_number(struct scrub_stripe *stripe, struct bio_vec *first
 	int i;
 
 	for (i = 0; i < stripe->nr_sectors; i++) {
-		if (scrub_stripe_get_page(stripe, i) == first_bvec->bv_page &&
-		    scrub_stripe_get_page_offset(stripe, i) == first_bvec->bv_offset)
+		if (scrub_stripe_get_kaddr(stripe, i) == bvec_virt(first_bvec))
 			break;
 	}
 	ASSERT(i < stripe->nr_sectors);
@@ -817,6 +800,15 @@ static int calc_next_mirror(int mirror, int num_copies)
 	return (mirror + 1 > num_copies) ? 1 : mirror + 1;
 }
 
+static void scrub_bio_add_sector(struct btrfs_bio *bbio,
+		struct scrub_stripe *stripe, int sector_nr)
+{
+	void *kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
+
+	__bio_add_page(&bbio->bio, virt_to_page(kaddr),
+			bbio->fs_info->sectorsize, offset_in_page(kaddr));
+}
+
 static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
 					    int mirror, int blocksize, bool wait)
 {
@@ -829,13 +821,6 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
 	ASSERT(atomic_read(&stripe->pending_io) == 0);
 
 	for_each_set_bit(i, &old_error_bitmap, stripe->nr_sectors) {
-		struct page *page;
-		int pgoff;
-		int ret;
-
-		page = scrub_stripe_get_page(stripe, i);
-		pgoff = scrub_stripe_get_page_offset(stripe, i);
-
 		/* The current sector cannot be merged, submit the bio. */
 		if (bbio && ((i > 0 && !test_bit(i - 1, &stripe->error_bitmap)) ||
 			     bbio->bio.bi_iter.bi_size >= blocksize)) {
@@ -854,8 +839,7 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
 				(i << fs_info->sectorsize_bits)) >> SECTOR_SHIFT;
 		}
 
-		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
-		ASSERT(ret == fs_info->sectorsize);
+		scrub_bio_add_sector(bbio, stripe, i);
 	}
 	if (bbio) {
 		ASSERT(bbio->bio.bi_iter.bi_size);
@@ -1202,10 +1186,6 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
 	int sector_nr;
 
 	for_each_set_bit(sector_nr, &write_bitmap, stripe->nr_sectors) {
-		struct page *page = scrub_stripe_get_page(stripe, sector_nr);
-		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
-		int ret;
-
 		/* We should only writeback sectors covered by an extent. */
 		ASSERT(test_bit(sector_nr, &stripe->extent_sector_bitmap));
 
@@ -1221,8 +1201,7 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
 				(sector_nr << fs_info->sectorsize_bits)) >>
 				SECTOR_SHIFT;
 		}
-		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
-		ASSERT(ret == fs_info->sectorsize);
+		scrub_bio_add_sector(bbio, stripe, sector_nr);
 	}
 	if (bbio)
 		scrub_submit_write_bio(sctx, stripe, bbio, dev_replace);
@@ -1675,9 +1654,6 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
 	atomic_inc(&stripe->pending_io);
 
 	for_each_set_bit(i, &stripe->extent_sector_bitmap, stripe->nr_sectors) {
-		struct page *page = scrub_stripe_get_page(stripe, i);
-		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, i);
-
 		/* We're beyond the chunk boundary, no need to read anymore. */
 		if (i >= nr_sectors)
 			break;
@@ -1730,7 +1706,7 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
 			bbio->bio.bi_iter.bi_sector = logical >> SECTOR_SHIFT;
 		}
 
-		__bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
+		scrub_bio_add_sector(bbio, stripe, i);
 	}
 
 	if (bbio) {
@@ -1768,15 +1744,8 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx,
 
 	bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
 	/* Read the whole range inside the chunk boundary. */
-	for (unsigned int cur = 0; cur < nr_sectors; cur++) {
-		struct page *page = scrub_stripe_get_page(stripe, cur);
-		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, cur);
-		int ret;
-
-		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
-		/* We should have allocated enough bio vectors. */
-		ASSERT(ret == fs_info->sectorsize);
-	}
+	for (unsigned int cur = 0; cur < nr_sectors; cur++)
+		scrub_bio_add_sector(bbio, stripe, cur);
 	atomic_inc(&stripe->pending_io);
 
 	/*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/8] btrfs: refactor getting the address of a stripe sector
  2025-04-09 11:10 ` [PATCH 7/8] btrfs: refactor getting the address of a stripe sector Christoph Hellwig
@ 2025-04-09 22:38   ` Qu Wenruo
  2025-04-19  1:01   ` Qu Wenruo
  1 sibling, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2025-04-09 22:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> Add a helper to get the actual kernel address of a stripe instead of
> just the page and offset into it to simplify the code, and add another
> helper to add the memory backing a sector to a bio using the above
> helper.
> 
> This nicely abstracts away the page + offset representation from almost
> all of the scrub code.

I love this change, and it should also be safe because all scrub pages 
are allocated by ourselves, and are ensured not to be high mem.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/scrub.c | 81 +++++++++++++++---------------------------------
>   1 file changed, 25 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 49021765c17b..d014b728eb0d 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -579,20 +579,13 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
>   	return ret;
>   }
>   
> -static struct page *scrub_stripe_get_page(struct scrub_stripe *stripe, int sector_nr)
> +static void *scrub_stripe_get_kaddr(struct scrub_stripe *stripe, int sector_nr)
>   {
>   	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
> -	int page_index = (sector_nr << fs_info->sectorsize_bits) >> PAGE_SHIFT;
> +	u32 offset = sector_nr << fs_info->sectorsize_bits;
>   
> -	return stripe->pages[page_index];
> -}
> -
> -static unsigned int scrub_stripe_get_page_offset(struct scrub_stripe *stripe,
> -						 int sector_nr)
> -{
> -	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
> -
> -	return offset_in_page(sector_nr << fs_info->sectorsize_bits);
> +	return page_address(stripe->pages[offset >> PAGE_SHIFT]) +
> +			offset_in_page(offset);
>   }
>   
>   static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
> @@ -600,19 +593,17 @@ static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr
>   	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
>   	const u32 sectors_per_tree = fs_info->nodesize >> fs_info->sectorsize_bits;
>   	const u64 logical = stripe->logical + (sector_nr << fs_info->sectorsize_bits);
> -	const struct page *first_page = scrub_stripe_get_page(stripe, sector_nr);
> -	const unsigned int first_off = scrub_stripe_get_page_offset(stripe, sector_nr);
> +	void *first_kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
>   	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>   	u8 on_disk_csum[BTRFS_CSUM_SIZE];
>   	u8 calculated_csum[BTRFS_CSUM_SIZE];
> -	struct btrfs_header *header;
> +	struct btrfs_header *header = first_kaddr;
>   
>   	/*
>   	 * Here we don't have a good way to attach the pages (and subpages)
>   	 * to a dummy extent buffer, thus we have to directly grab the members
>   	 * from pages.
>   	 */
> -	header = (struct btrfs_header *)(page_address(first_page) + first_off);
>   	memcpy(on_disk_csum, header->csum, fs_info->csum_size);
>   
>   	if (logical != btrfs_stack_header_bytenr(header)) {
> @@ -648,14 +639,11 @@ static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr
>   	/* Now check tree block csum. */
>   	shash->tfm = fs_info->csum_shash;
>   	crypto_shash_init(shash);
> -	crypto_shash_update(shash, page_address(first_page) + first_off +
> -			    BTRFS_CSUM_SIZE, fs_info->sectorsize - BTRFS_CSUM_SIZE);
> +	crypto_shash_update(shash, first_kaddr + BTRFS_CSUM_SIZE,
> +			fs_info->sectorsize - BTRFS_CSUM_SIZE);
>   
>   	for (int i = sector_nr + 1; i < sector_nr + sectors_per_tree; i++) {
> -		struct page *page = scrub_stripe_get_page(stripe, i);
> -		unsigned int page_off = scrub_stripe_get_page_offset(stripe, i);
> -
> -		crypto_shash_update(shash, page_address(page) + page_off,
> +		crypto_shash_update(shash, scrub_stripe_get_kaddr(stripe, i),
>   				    fs_info->sectorsize);
>   	}
>   
> @@ -691,10 +679,8 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
>   	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
>   	struct scrub_sector_verification *sector = &stripe->sectors[sector_nr];
>   	const u32 sectors_per_tree = fs_info->nodesize >> fs_info->sectorsize_bits;
> -	struct page *page = scrub_stripe_get_page(stripe, sector_nr);
> -	unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
> +	void *kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
>   	u8 csum_buf[BTRFS_CSUM_SIZE];
> -	void *kaddr;
>   	int ret;
>   
>   	ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
> @@ -738,9 +724,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
>   		return;
>   	}
>   
> -	kaddr = kmap_local_page(page) + pgoff;
>   	ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, sector->csum);
> -	kunmap_local(kaddr);
>   	if (ret < 0) {
>   		set_bit(sector_nr, &stripe->csum_error_bitmap);
>   		set_bit(sector_nr, &stripe->error_bitmap);
> @@ -769,8 +753,7 @@ static int calc_sector_number(struct scrub_stripe *stripe, struct bio_vec *first
>   	int i;
>   
>   	for (i = 0; i < stripe->nr_sectors; i++) {
> -		if (scrub_stripe_get_page(stripe, i) == first_bvec->bv_page &&
> -		    scrub_stripe_get_page_offset(stripe, i) == first_bvec->bv_offset)
> +		if (scrub_stripe_get_kaddr(stripe, i) == bvec_virt(first_bvec))
>   			break;
>   	}
>   	ASSERT(i < stripe->nr_sectors);
> @@ -817,6 +800,15 @@ static int calc_next_mirror(int mirror, int num_copies)
>   	return (mirror + 1 > num_copies) ? 1 : mirror + 1;
>   }
>   
> +static void scrub_bio_add_sector(struct btrfs_bio *bbio,
> +		struct scrub_stripe *stripe, int sector_nr)
> +{
> +	void *kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
> +
> +	__bio_add_page(&bbio->bio, virt_to_page(kaddr),
> +			bbio->fs_info->sectorsize, offset_in_page(kaddr));
> +}
> +
>   static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
>   					    int mirror, int blocksize, bool wait)
>   {
> @@ -829,13 +821,6 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
>   	ASSERT(atomic_read(&stripe->pending_io) == 0);
>   
>   	for_each_set_bit(i, &old_error_bitmap, stripe->nr_sectors) {
> -		struct page *page;
> -		int pgoff;
> -		int ret;
> -
> -		page = scrub_stripe_get_page(stripe, i);
> -		pgoff = scrub_stripe_get_page_offset(stripe, i);
> -
>   		/* The current sector cannot be merged, submit the bio. */
>   		if (bbio && ((i > 0 && !test_bit(i - 1, &stripe->error_bitmap)) ||
>   			     bbio->bio.bi_iter.bi_size >= blocksize)) {
> @@ -854,8 +839,7 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
>   				(i << fs_info->sectorsize_bits)) >> SECTOR_SHIFT;
>   		}
>   
> -		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> -		ASSERT(ret == fs_info->sectorsize);
> +		scrub_bio_add_sector(bbio, stripe, i);
>   	}
>   	if (bbio) {
>   		ASSERT(bbio->bio.bi_iter.bi_size);
> @@ -1202,10 +1186,6 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
>   	int sector_nr;
>   
>   	for_each_set_bit(sector_nr, &write_bitmap, stripe->nr_sectors) {
> -		struct page *page = scrub_stripe_get_page(stripe, sector_nr);
> -		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
> -		int ret;
> -
>   		/* We should only writeback sectors covered by an extent. */
>   		ASSERT(test_bit(sector_nr, &stripe->extent_sector_bitmap));
>   
> @@ -1221,8 +1201,7 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
>   				(sector_nr << fs_info->sectorsize_bits)) >>
>   				SECTOR_SHIFT;
>   		}
> -		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> -		ASSERT(ret == fs_info->sectorsize);
> +		scrub_bio_add_sector(bbio, stripe, sector_nr);
>   	}
>   	if (bbio)
>   		scrub_submit_write_bio(sctx, stripe, bbio, dev_replace);
> @@ -1675,9 +1654,6 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
>   	atomic_inc(&stripe->pending_io);
>   
>   	for_each_set_bit(i, &stripe->extent_sector_bitmap, stripe->nr_sectors) {
> -		struct page *page = scrub_stripe_get_page(stripe, i);
> -		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, i);
> -
>   		/* We're beyond the chunk boundary, no need to read anymore. */
>   		if (i >= nr_sectors)
>   			break;
> @@ -1730,7 +1706,7 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
>   			bbio->bio.bi_iter.bi_sector = logical >> SECTOR_SHIFT;
>   		}
>   
> -		__bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> +		scrub_bio_add_sector(bbio, stripe, i);
>   	}
>   
>   	if (bbio) {
> @@ -1768,15 +1744,8 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx,
>   
>   	bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
>   	/* Read the whole range inside the chunk boundary. */
> -	for (unsigned int cur = 0; cur < nr_sectors; cur++) {
> -		struct page *page = scrub_stripe_get_page(stripe, cur);
> -		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, cur);
> -		int ret;
> -
> -		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> -		/* We should have allocated enough bio vectors. */
> -		ASSERT(ret == fs_info->sectorsize);
> -	}
> +	for (unsigned int cur = 0; cur < nr_sectors; cur++)
> +		scrub_bio_add_sector(bbio, stripe, cur);
>   	atomic_inc(&stripe->pending_io);
>   
>   	/*


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/8] btrfs: refactor getting the address of a stripe sector
  2025-04-09 11:10 ` [PATCH 7/8] btrfs: refactor getting the address of a stripe sector Christoph Hellwig
  2025-04-09 22:38   ` Qu Wenruo
@ 2025-04-19  1:01   ` Qu Wenruo
  1 sibling, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2025-04-19  1:01 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



在 2025/4/9 20:40, Christoph Hellwig 写道:
> Add a helper to get the actual kernel address of a stripe instead of
> just the page and offset into it to simplify the code, and add another
> helper to add the memory backing a sector to a bio using the above
> helper.
> 
> This nicely abstracts away the page + offset representation from almost
> all of the scrub code.
[...]
> +static void scrub_bio_add_sector(struct btrfs_bio *bbio,
> +		struct scrub_stripe *stripe, int sector_nr)
> +{
> +	void *kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
> +
> +	__bio_add_page(&bbio->bio, virt_to_page(kaddr),
> +			bbio->fs_info->sectorsize, offset_in_page(kaddr));

Another bug here, on subpage cases (fs block size < page size), 
especially for 64K page size and 4K fs block size, it will trigger the 
WARN_ON_ONCE(bio_full()).

As a stripe is exactly 64K, thus for the initial read we only need a 
single bio with at most one bvec.

As we know the whole range is covered by exactly one page, thus all the 
submitted blocks can be merged into just one bvec.

But __bio_add_page() doesn't do the merge check, thus it will trigger 
the WARN_ON_ONCE() for the second block.


__bio_add_page() is not the same as "ret = bio_add_page(); ASSERT(ret == 
blocksize)".

Again will fix it by reverting to the old bio_add_page() with ASSERT().

Thanks,
Qu

> +}
> +
>   static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
>   					    int mirror, int blocksize, bool wait)
>   {
> @@ -829,13 +821,6 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
>   	ASSERT(atomic_read(&stripe->pending_io) == 0);
>   
>   	for_each_set_bit(i, &old_error_bitmap, stripe->nr_sectors) {
> -		struct page *page;
> -		int pgoff;
> -		int ret;
> -
> -		page = scrub_stripe_get_page(stripe, i);
> -		pgoff = scrub_stripe_get_page_offset(stripe, i);
> -
>   		/* The current sector cannot be merged, submit the bio. */
>   		if (bbio && ((i > 0 && !test_bit(i - 1, &stripe->error_bitmap)) ||
>   			     bbio->bio.bi_iter.bi_size >= blocksize)) {
> @@ -854,8 +839,7 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
>   				(i << fs_info->sectorsize_bits)) >> SECTOR_SHIFT;
>   		}
>   
> -		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> -		ASSERT(ret == fs_info->sectorsize);
> +		scrub_bio_add_sector(bbio, stripe, i);
>   	}
>   	if (bbio) {
>   		ASSERT(bbio->bio.bi_iter.bi_size);
> @@ -1202,10 +1186,6 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
>   	int sector_nr;
>   
>   	for_each_set_bit(sector_nr, &write_bitmap, stripe->nr_sectors) {
> -		struct page *page = scrub_stripe_get_page(stripe, sector_nr);
> -		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
> -		int ret;
> -
>   		/* We should only writeback sectors covered by an extent. */
>   		ASSERT(test_bit(sector_nr, &stripe->extent_sector_bitmap));
>   
> @@ -1221,8 +1201,7 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
>   				(sector_nr << fs_info->sectorsize_bits)) >>
>   				SECTOR_SHIFT;
>   		}
> -		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> -		ASSERT(ret == fs_info->sectorsize);
> +		scrub_bio_add_sector(bbio, stripe, sector_nr);
>   	}
>   	if (bbio)
>   		scrub_submit_write_bio(sctx, stripe, bbio, dev_replace);
> @@ -1675,9 +1654,6 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
>   	atomic_inc(&stripe->pending_io);
>   
>   	for_each_set_bit(i, &stripe->extent_sector_bitmap, stripe->nr_sectors) {
> -		struct page *page = scrub_stripe_get_page(stripe, i);
> -		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, i);
> -
>   		/* We're beyond the chunk boundary, no need to read anymore. */
>   		if (i >= nr_sectors)
>   			break;
> @@ -1730,7 +1706,7 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
>   			bbio->bio.bi_iter.bi_sector = logical >> SECTOR_SHIFT;
>   		}
>   
> -		__bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> +		scrub_bio_add_sector(bbio, stripe, i);
>   	}
>   
>   	if (bbio) {
> @@ -1768,15 +1744,8 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx,
>   
>   	bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
>   	/* Read the whole range inside the chunk boundary. */
> -	for (unsigned int cur = 0; cur < nr_sectors; cur++) {
> -		struct page *page = scrub_stripe_get_page(stripe, cur);
> -		unsigned int pgoff = scrub_stripe_get_page_offset(stripe, cur);
> -		int ret;
> -
> -		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
> -		/* We should have allocated enough bio vectors. */
> -		ASSERT(ret == fs_info->sectorsize);
> -	}
> +	for (unsigned int cur = 0; cur < nr_sectors; cur++)
> +		scrub_bio_add_sector(bbio, stripe, cur);
>   	atomic_inc(&stripe->pending_io);
>   
>   	/*


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 8/8] btrfs: use bvec_kmap_local in btrfs_decompress_buf2page
  2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
                   ` (6 preceding siblings ...)
  2025-04-09 11:10 ` [PATCH 7/8] btrfs: refactor getting the address of a stripe sector Christoph Hellwig
@ 2025-04-09 11:10 ` Christoph Hellwig
  7 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2025-04-09 11:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

This removes the last direct poke into bvec internals in btrfs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/compression.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e7f8ee5d48a4..9d0996c8370d 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1182,6 +1182,7 @@ int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
 		u32 copy_start;
 		/* Offset inside the full decompressed extent */
 		u32 bvec_offset;
+		void *kaddr;
 
 		bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
 		/*
@@ -1204,10 +1205,12 @@ int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
 		 * @buf + @buf_len.
 		 */
 		ASSERT(copy_start - decompressed < buf_len);
-		memcpy_to_page(bvec.bv_page, bvec.bv_offset,
-			       buf + copy_start - decompressed, copy_len);
-		cur_offset += copy_len;
 
+		kaddr = bvec_kmap_local(&bvec);
+		memcpy(kaddr, buf + copy_start - decompressed, copy_len);
+		kunmap_local(kaddr);
+
+		cur_offset += copy_len;
 		bio_advance(orig_bio, copy_len);
 		/* Finished the bio */
 		if (!orig_bio->bi_iter.bi_size)
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-04-19  1:01 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-09 11:10 RFC: (almost) stop poking into bvec internals in btrfs Christoph Hellwig
2025-04-09 11:10 ` [PATCH 1/8] btrfs: remove the alignment checks in end_bbio_data_read Christoph Hellwig
2025-04-09 22:13   ` Qu Wenruo
2025-04-10  5:30     ` Christoph Hellwig
2025-04-10  5:39       ` Qu Wenruo
2025-04-09 11:10 ` [PATCH 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl Christoph Hellwig
2025-04-09 22:15   ` Qu Wenruo
2025-04-09 11:10 ` [PATCH 3/8] btrfs: pass a physical address to btrfs_repair_io_failure Christoph Hellwig
2025-04-09 22:19   ` Qu Wenruo
2025-04-10  5:31     ` Christoph Hellwig
2025-04-10  6:06   ` Johannes Thumshirn
2025-04-10  6:17     ` hch
2025-04-09 11:10 ` [PATCH 4/8] btrfs: move kmapping out of btrfs_check_sector_csum Christoph Hellwig
2025-04-10  6:16   ` Johannes Thumshirn
2025-04-16  4:51   ` Qu Wenruo
2025-04-09 11:10 ` [PATCH 5/8] btrfs: simplify bvec iteration in index_one_bio Christoph Hellwig
2025-04-18  2:09   ` Qu Wenruo
2025-04-09 11:10 ` [PATCH 6/8] btrfs: store a kernel virtual address in struct sector_ptr Christoph Hellwig
2025-04-09 22:34   ` Qu Wenruo
2025-04-10  5:34     ` Christoph Hellwig
2025-04-14  3:04       ` Qu Wenruo
2025-04-17 23:41         ` Qu Wenruo
2025-04-09 11:10 ` [PATCH 7/8] btrfs: refactor getting the address of a stripe sector Christoph Hellwig
2025-04-09 22:38   ` Qu Wenruo
2025-04-19  1:01   ` Qu Wenruo
2025-04-09 11:10 ` [PATCH 8/8] btrfs: use bvec_kmap_local in btrfs_decompress_buf2page Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox