* [PATCH v2 0/8] btrfs: do not poke into the implementation details of bio_vec
@ 2025-04-19 7:17 Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 1/8] btrfs: remove the alignment checks in end_bbio_data_read Qu Wenruo
` (8 more replies)
0 siblings, 9 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests
- Fix a crash caused by incorrectly indexed sector_ptrs
The 5th patch, caused by a missing increment of the @offset variable.
- Use physical addresses instead of virtual addresses to handle HIGHMEM
The 6th patch, as we can still get a sector_ptr from bio_sectors[],
which uses pages from the bio, and those can be in highmem.
However I cannot properly test this, as the latest kernel has already
rejected the HIGHMEM64G option, thus even though my VM has an extra 3GB
for HIGHMEM (6GB total), I'm not sure the kernel can really utilize
that high memory.
Furthermore there seem to be other bugs in the mm layer related to
HIGHMEM + PAGE, resulting in a zswap crash when trying to compress a
page to be swapped out.
But at least several scrub/balance related test cases passed on 32-bit
x86 with HIGHMEM and PAGE.
I have also tested x86_64 (64-bit, no HIGHMEM) and aarch64 (64-bit, 64K
page size, no HIGHMEM) with no regressions.
- Fix an incorrect __bio_add_page() usage in scrub_bio_add_sector()
The 7th patch, as bio_add_page() does extra bvec merging before
allocating a new bvec, but __bio_add_page() does not.
This triggered the WARN_ON_ONCE() in __bio_add_page() that checks
whether the bio is full.
Fix it by using the old bio_add_page() and ASSERT(), with an extra
comment on why we cannot use __bio_add_page().
- Various minor commit message updates
And full proper test runs.
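The __bio_add_page() pitfall described in the changelog above can be sketched with a tiny userspace model (the struct and model_* helpers are hypothetical stand-ins, not the kernel's bio API): bio_add_page() first tries to merge a physically contiguous range into the last bvec, while __bio_add_page() unconditionally consumes a new bvec slot and trips its full-bio check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel implementation. */
struct model_bvec { unsigned long phys; unsigned int len; };

struct model_bio {
	struct model_bvec vecs[2];
	unsigned int vcnt;	/* used bvec slots */
	unsigned int max_vecs;
};

/* __bio_add_page() analogue: no merging, fails (kernel: WARNs) when full. */
static bool model___add_page(struct model_bio *bio, unsigned long phys,
			     unsigned int len)
{
	if (bio->vcnt >= bio->max_vecs)
		return false;
	bio->vecs[bio->vcnt].phys = phys;
	bio->vecs[bio->vcnt].len = len;
	bio->vcnt++;
	return true;
}

/* bio_add_page() analogue: merge contiguous ranges into the last bvec. */
static bool model_add_page(struct model_bio *bio, unsigned long phys,
			   unsigned int len)
{
	if (bio->vcnt) {
		struct model_bvec *last = &bio->vecs[bio->vcnt - 1];

		if (last->phys + last->len == phys) {
			last->len += len;
			return true;
		}
	}
	return model___add_page(bio, phys, len);
}
```

With a single-slot vector, adding two physically contiguous sectors succeeds through the merging variant but fails through the non-merging one, which is the situation the fix avoids.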
Christoph Hellwig (8):
btrfs: remove the alignment checks in end_bbio_data_read
btrfs: track the next file offset in struct btrfs_bio_ctrl
btrfs: pass a physical address to btrfs_repair_io_failure
btrfs: move kmapping out of btrfs_check_sector_csum
btrfs: simplify bvec iteration in index_one_bio
btrfs: raid56: store a physical address in structure sector_ptr
btrfs: scrub: use virtual addresses directly
btrfs: use bvec_kmap_local in btrfs_decompress_buf2page
fs/btrfs/bio.c | 9 +-
fs/btrfs/bio.h | 3 +-
fs/btrfs/btrfs_inode.h | 4 +-
fs/btrfs/compression.c | 9 +-
fs/btrfs/disk-io.c | 7 +-
fs/btrfs/extent_io.c | 60 ++++---------
fs/btrfs/inode.c | 20 ++---
fs/btrfs/raid56.c | 189 ++++++++++++++++++++++-------------------
fs/btrfs/scrub.c | 93 +++++++++-----------
9 files changed, 184 insertions(+), 210 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/8] btrfs: remove the alignment checks in end_bbio_data_read
2025-04-19 7:17 [PATCH v2 0/8] btrfs: do not poke into the implementation details of bio_vec Qu Wenruo
@ 2025-04-19 7:17 ` Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl Qu Wenruo
` (7 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
end_bbio_data_read() checks that each iterated folio fragment is aligned
and justifies that with block drivers advancing the bio. But block
drivers only advance bi_iter, while end_bbio_data_read() uses
bio_for_each_folio_all() to iterate the immutable bi_io_vec array, which
can't be changed by drivers at all.
Furthermore btrfs has already done the alignment check of the file
offset inside submit_one_sector(), and the size is fixed to the fs block
size, so there is no need to redo the alignment check inside the
endio function.
So just remove the unnecessary alignment check along with the incorrect
comment.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 27 +++------------------------
1 file changed, 3 insertions(+), 24 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5f08615b334f..f2fafad1add0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -499,43 +499,22 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
struct btrfs_fs_info *fs_info = bbio->fs_info;
struct bio *bio = &bbio->bio;
struct folio_iter fi;
- const u32 sectorsize = fs_info->sectorsize;
ASSERT(!bio_flagged(bio, BIO_CLONED));
bio_for_each_folio_all(fi, &bbio->bio) {
bool uptodate = !bio->bi_status;
struct folio *folio = fi.folio;
struct inode *inode = folio->mapping->host;
- u64 start;
- u64 end;
- u32 len;
+ u64 start = folio_pos(folio) + fi.offset;
btrfs_debug(fs_info,
"%s: bi_sector=%llu, err=%d, mirror=%u",
__func__, bio->bi_iter.bi_sector, bio->bi_status,
bbio->mirror_num);
- /*
- * We always issue full-sector reads, but if some block in a
- * folio fails to read, blk_update_request() will advance
- * bv_offset and adjust bv_len to compensate. Print a warning
- * for unaligned offsets, and an error if they don't add up to
- * a full sector.
- */
- if (!IS_ALIGNED(fi.offset, sectorsize))
- btrfs_err(fs_info,
- "partial page read in btrfs with offset %zu and length %zu",
- fi.offset, fi.length);
- else if (!IS_ALIGNED(fi.offset + fi.length, sectorsize))
- btrfs_info(fs_info,
- "incomplete page read with offset %zu and length %zu",
- fi.offset, fi.length);
-
- start = folio_pos(folio) + fi.offset;
- end = start + fi.length - 1;
- len = fi.length;
if (likely(uptodate)) {
+ u64 end = start + fi.length - 1;
loff_t i_size = i_size_read(inode);
/*
@@ -560,7 +539,7 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
}
/* Update page status and unlock. */
- end_folio_read(folio, uptodate, start, len);
+ end_folio_read(folio, uptodate, start, fi.length);
}
bio_put(bio);
}
--
2.49.0
* [PATCH v2 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl
2025-04-19 7:17 [PATCH v2 0/8] btrfs: do not poke into the implementation details of bio_vec Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 1/8] btrfs: remove the alignment checks in end_bbio_data_read Qu Wenruo
@ 2025-04-19 7:17 ` Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 3/8] btrfs: pass a physical address to btrfs_repair_io_failure Qu Wenruo
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
The bio implementation is not something we should really mess around
with, and we shouldn't recalculate the position from the folio over and
over. Instead just track the end of the current bio in logical file
offsets in the btrfs_bio_ctrl, which is much simpler and easier to read.
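The resulting contiguity condition can be modelled in a few lines of userspace C (model_bio_ctrl and model_is_contig are illustrative stand-ins for the patched btrfs_bio_is_contig()): a sector merges into the current bio only when both its disk location and its file offset continue exactly where the bio ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified model of the tracked bio state. */
struct model_bio_ctrl {
	uint64_t next_disk_bytenr;	/* end of current bio on disk */
	uint64_t next_file_offset;	/* last byte contained in bio + 1 */
};

/*
 * Both the disk location and the logical file offset must be
 * contiguous for the new range to merge into the bio.
 */
static bool model_is_contig(const struct model_bio_ctrl *ctrl,
			    uint64_t disk_bytenr, uint64_t file_offset)
{
	return ctrl->next_file_offset == file_offset &&
	       ctrl->next_disk_bytenr == disk_bytenr;
}
```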
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 33 +++++++++++++--------------------
1 file changed, 13 insertions(+), 20 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f2fafad1add0..51b6f25e94aa 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -96,6 +96,7 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info)
*/
struct btrfs_bio_ctrl {
struct btrfs_bio *bbio;
+ loff_t next_file_offset; /* last byte contained in bbio + 1 */
enum btrfs_compression_type compress_type;
u32 len_to_oe_boundary;
blk_opf_t opf;
@@ -630,13 +631,10 @@ static int alloc_eb_folio_array(struct extent_buffer *eb, bool nofail)
}
static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
- struct folio *folio, u64 disk_bytenr,
- unsigned int pg_offset)
+ u64 disk_bytenr, loff_t file_offset)
{
struct bio *bio = &bio_ctrl->bbio->bio;
- struct bio_vec *bvec = bio_last_bvec_all(bio);
const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
- struct folio *bv_folio = page_folio(bvec->bv_page);
if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) {
/*
@@ -647,19 +645,11 @@ static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
}
/*
- * The contig check requires the following conditions to be met:
- *
- * 1) The folios are belonging to the same inode
- * This is implied by the call chain.
- *
- * 2) The range has adjacent logical bytenr
- *
- * 3) The range has adjacent file offset
- * This is required for the usage of btrfs_bio->file_offset.
+ * To merge into a bio both the disk sector and the logical offset in
+ * the file need to be contiguous.
*/
- return bio_end_sector(bio) == sector &&
- folio_pos(bv_folio) + bvec->bv_offset + bvec->bv_len ==
- folio_pos(folio) + pg_offset;
+ return bio_ctrl->next_file_offset == file_offset &&
+ bio_end_sector(bio) == sector;
}
static void alloc_new_bio(struct btrfs_inode *inode,
@@ -677,6 +667,7 @@ static void alloc_new_bio(struct btrfs_inode *inode,
bbio->file_offset = file_offset;
bio_ctrl->bbio = bbio;
bio_ctrl->len_to_oe_boundary = U32_MAX;
+ bio_ctrl->next_file_offset = file_offset;
/* Limit data write bios to the ordered boundary. */
if (bio_ctrl->wbc) {
@@ -718,12 +709,13 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
size_t size, unsigned long pg_offset)
{
struct btrfs_inode *inode = folio_to_inode(folio);
+ loff_t file_offset = folio_pos(folio) + pg_offset;
ASSERT(pg_offset + size <= folio_size(folio));
ASSERT(bio_ctrl->end_io_func);
if (bio_ctrl->bbio &&
- !btrfs_bio_is_contig(bio_ctrl, folio, disk_bytenr, pg_offset))
+ !btrfs_bio_is_contig(bio_ctrl, disk_bytenr, file_offset))
submit_one_bio(bio_ctrl);
do {
@@ -732,7 +724,7 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
/* Allocate new bio if needed */
if (!bio_ctrl->bbio) {
alloc_new_bio(inode, bio_ctrl, disk_bytenr,
- folio_pos(folio) + pg_offset);
+ file_offset);
}
/* Cap to the current ordered extent boundary if there is one. */
@@ -747,14 +739,15 @@ static void submit_extent_folio(struct btrfs_bio_ctrl *bio_ctrl,
submit_one_bio(bio_ctrl);
continue;
}
+ bio_ctrl->next_file_offset += len;
if (bio_ctrl->wbc)
- wbc_account_cgroup_owner(bio_ctrl->wbc, folio,
- len);
+ wbc_account_cgroup_owner(bio_ctrl->wbc, folio, len);
size -= len;
pg_offset += len;
disk_bytenr += len;
+ file_offset += len;
/*
* len_to_oe_boundary defaults to U32_MAX, which isn't folio or
--
2.49.0
* [PATCH v2 3/8] btrfs: pass a physical address to btrfs_repair_io_failure
2025-04-19 7:17 [PATCH v2 0/8] btrfs: do not poke into the implementation details of bio_vec Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 1/8] btrfs: remove the alignment checks in end_bbio_data_read Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 2/8] btrfs: track the next file offset in struct btrfs_bio_ctrl Qu Wenruo
@ 2025-04-19 7:17 ` Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 4/8] btrfs: move kmapping out of btrfs_check_sector_csum Qu Wenruo
` (5 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
Using a physical address has the following advantages:
- All involved callers only need a single value
Instead of the old @folio + @offset pair.
- No complex poking into the bio_vec structure
As a bio_vec can be single- or multi-paged, grabbing the real page
can be quite complex if the bio_vec is a multi-page one.
Instead bvec_phys() will always give a single physical address, and it
can be easily converted to a page.
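The page + offset decomposition that a physical address enables is plain arithmetic, sketched here as a userspace model (the kernel's real helpers are phys_to_page() and offset_in_page(); MODEL_PAGE_SHIFT assumes 4K pages):

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 4K pages; the kernel derives this from PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12u
#define MODEL_PAGE_SIZE  (1ull << MODEL_PAGE_SHIFT)

/* phys_to_page() analogue: the page frame number holding the address. */
static uint64_t model_pfn(uint64_t paddr)
{
	return paddr >> MODEL_PAGE_SHIFT;
}

/* offset_in_page() analogue: the byte offset within that page. */
static uint64_t model_offset_in_page(uint64_t paddr)
{
	return paddr & (MODEL_PAGE_SIZE - 1);
}
```

The round trip is lossless: recombining the page frame number and the in-page offset gives back the original physical address.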
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/bio.c | 9 ++++-----
fs/btrfs/bio.h | 3 +--
fs/btrfs/disk-io.c | 7 ++++---
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 8c2eee1f1878..10f31ee1e8c0 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -192,7 +192,7 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
repair_bbio->file_offset, fs_info->sectorsize,
repair_bbio->saved_iter.bi_sector << SECTOR_SHIFT,
- page_folio(bv->bv_page), bv->bv_offset, mirror);
+ bvec_phys(bv), mirror);
} while (mirror != fbio->bbio->mirror_num);
done:
@@ -803,8 +803,8 @@ void btrfs_submit_bbio(struct btrfs_bio *bbio, int mirror_num)
* freeing the bio.
*/
int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
- u64 length, u64 logical, struct folio *folio,
- unsigned int folio_offset, int mirror_num)
+ u64 length, u64 logical, phys_addr_t paddr,
+ int mirror_num)
{
struct btrfs_io_stripe smap = { 0 };
struct bio_vec bvec;
@@ -835,8 +835,7 @@ int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
bio_init(&bio, smap.dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
bio.bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
- ret = bio_add_folio(&bio, folio, length, folio_offset);
- ASSERT(ret);
+ __bio_add_page(&bio, phys_to_page(paddr), length, offset_in_page(paddr));
ret = submit_bio_wait(&bio);
if (ret) {
/* try to remap that extent elsewhere? */
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index e2fe16074ad6..dc2eb43b7097 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -110,7 +110,6 @@ void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status);
void btrfs_submit_bbio(struct btrfs_bio *bbio, int mirror_num);
void btrfs_submit_repair_write(struct btrfs_bio *bbio, int mirror_num, bool dev_replace);
int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
- u64 length, u64 logical, struct folio *folio,
- unsigned int folio_offset, int mirror_num);
+ u64 length, u64 logical, phys_addr_t paddr, int mirror_num);
#endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 59da809b7d57..280bc0ee6046 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -193,10 +193,11 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
u64 end = min_t(u64, eb->start + eb->len,
folio_pos(folio) + eb->folio_size);
u32 len = end - start;
+ phys_addr_t paddr = PFN_PHYS(folio_pfn(folio)) +
+ offset_in_folio(folio, start);
- ret = btrfs_repair_io_failure(fs_info, 0, start, len,
- start, folio, offset_in_folio(folio, start),
- mirror_num);
+ ret = btrfs_repair_io_failure(fs_info, 0, start, len, start,
+ paddr, mirror_num);
if (ret)
break;
}
--
2.49.0
* [PATCH v2 4/8] btrfs: move kmapping out of btrfs_check_sector_csum
2025-04-19 7:17 [PATCH v2 0/8] btrfs: do not poke into the implementation details of bio_vec Qu Wenruo
` (2 preceding siblings ...)
2025-04-19 7:17 ` [PATCH v2 3/8] btrfs: pass a physical address to btrfs_repair_io_failure Qu Wenruo
@ 2025-04-19 7:17 ` Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 5/8] btrfs: simplify bvec iteration in index_one_bio Qu Wenruo
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
Move kmapping the page out of btrfs_check_sector_csum().
This allows using bvec_kmap_local() where suitable and reduces the number
of kmap*() calls in the raid56 code.
This also means btrfs_check_sector_csum() will only accept a properly
kmapped address.
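The new calling convention (the caller maps the page, the callee only sees a plain pointer) can be sketched in userspace; model_check_sector is a hypothetical stand-in that replaces crypto_shash_digest() with a trivial XOR "digest" so the sketch stays self-contained:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the patched btrfs_check_sector_csum() contract: the caller
 * is responsible for mapping (bvec_kmap_local() / kmap_local_page() in
 * the kernel) and passes a plain address; the checksum helper never
 * touches struct page.
 */
static int model_check_sector(const uint8_t *kaddr, uint32_t sectorsize,
			      const uint8_t *csum_expected, uint8_t *csum)
{
	uint8_t digest = 0;

	/* Stand-in for the real hash over the sector contents. */
	for (uint32_t i = 0; i < sectorsize; i++)
		digest ^= kaddr[i];
	*csum = digest;

	return (digest == csum_expected[0]) ? 0 : -5;	/* -EIO analogue */
}
```

The kmap/kunmap pair then brackets the call at each call site, exactly once, rather than being buried inside the helper.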
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/btrfs_inode.h | 4 ++--
fs/btrfs/inode.c | 20 ++++++++++----------
fs/btrfs/raid56.c | 19 +++++++++++--------
fs/btrfs/scrub.c | 5 ++++-
4 files changed, 27 insertions(+), 21 deletions(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e2952cf5766..d48438332c7c 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -529,8 +529,8 @@ static inline void btrfs_update_inode_mapping_flags(struct btrfs_inode *inode)
#define CSUM_FMT "0x%*phN"
#define CSUM_FMT_VALUE(size, bytes) size, bytes
-int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
- u32 pgoff, u8 *csum, const u8 * const csum_expected);
+int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, void *kaddr,
+ u8 *csum, const u8 * const csum_expected);
bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
u32 bio_offset, struct bio_vec *bv);
noinline int can_nocow_extent(struct btrfs_inode *inode, u64 offset, u64 *len,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ddbddf5d2423..f6600924e775 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3320,20 +3320,16 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered)
/*
* Verify the checksum for a single sector without any extra action that depend
* on the type of I/O.
+ *
+ * @kaddr must be a properly kmapped address.
*/
-int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
- u32 pgoff, u8 *csum, const u8 * const csum_expected)
+int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, void *kaddr,
+ u8 *csum, const u8 * const csum_expected)
{
SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
- char *kaddr;
-
- ASSERT(pgoff + fs_info->sectorsize <= PAGE_SIZE);
shash->tfm = fs_info->csum_shash;
-
- kaddr = kmap_local_page(page) + pgoff;
crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
- kunmap_local(kaddr);
if (memcmp(csum, csum_expected, fs_info->csum_size))
return -EIO;
@@ -3362,6 +3358,7 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
u64 end = file_offset + bv->bv_len - 1;
u8 *csum_expected;
u8 csum[BTRFS_CSUM_SIZE];
+ char *kaddr;
ASSERT(bv->bv_len == fs_info->sectorsize);
@@ -3379,9 +3376,12 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
csum_expected = bbio->csum + (bio_offset >> fs_info->sectorsize_bits) *
fs_info->csum_size;
- if (btrfs_check_sector_csum(fs_info, bv->bv_page, bv->bv_offset, csum,
- csum_expected))
+ kaddr = bvec_kmap_local(bv);
+ if (btrfs_check_sector_csum(fs_info, kaddr, csum, csum_expected)) {
+ kunmap_local(kaddr);
goto zeroit;
+ }
+ kunmap_local(kaddr);
return true;
zeroit:
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index cdd373c27784..28dbded86cc2 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1575,11 +1575,11 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
return;
bio_for_each_segment_all(bvec, bio, iter_all) {
- int bv_offset;
+ void *kaddr = bvec_kmap_local(bvec);
+ unsigned int off;
- for (bv_offset = bvec->bv_offset;
- bv_offset < bvec->bv_offset + bvec->bv_len;
- bv_offset += fs_info->sectorsize, total_sector_nr++) {
+ for (off = 0; off < bvec->bv_len;
+ off += fs_info->sectorsize, total_sector_nr++) {
u8 csum_buf[BTRFS_CSUM_SIZE];
u8 *expected_csum = rbio->csum_buf +
total_sector_nr * fs_info->csum_size;
@@ -1589,11 +1589,12 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
if (!test_bit(total_sector_nr, rbio->csum_bitmap))
continue;
- ret = btrfs_check_sector_csum(fs_info, bvec->bv_page,
- bv_offset, csum_buf, expected_csum);
+ ret = btrfs_check_sector_csum(fs_info, kaddr + off,
+ csum_buf, expected_csum);
if (ret < 0)
set_bit(total_sector_nr, rbio->error_bitmap);
}
+ kunmap_local(kaddr);
}
}
@@ -1791,6 +1792,7 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
struct sector_ptr *sector;
u8 csum_buf[BTRFS_CSUM_SIZE];
u8 *csum_expected;
+ void *kaddr;
int ret;
if (!rbio->csum_bitmap || !rbio->csum_buf)
@@ -1811,11 +1813,12 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
ASSERT(sector->page);
+ kaddr = kmap_local_page(sector->page) + sector->pgoff;
csum_expected = rbio->csum_buf +
(stripe_nr * rbio->stripe_nsectors + sector_nr) *
fs_info->csum_size;
- ret = btrfs_check_sector_csum(fs_info, sector->page, sector->pgoff,
- csum_buf, csum_expected);
+ ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, csum_expected);
+ kunmap_local(kaddr);
return ret;
}
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2c5edcee9450..49021765c17b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -694,6 +694,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
struct page *page = scrub_stripe_get_page(stripe, sector_nr);
unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
u8 csum_buf[BTRFS_CSUM_SIZE];
+ void *kaddr;
int ret;
ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
@@ -737,7 +738,9 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
return;
}
- ret = btrfs_check_sector_csum(fs_info, page, pgoff, csum_buf, sector->csum);
+ kaddr = kmap_local_page(page) + pgoff;
+ ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, sector->csum);
+ kunmap_local(kaddr);
if (ret < 0) {
set_bit(sector_nr, &stripe->csum_error_bitmap);
set_bit(sector_nr, &stripe->error_bitmap);
--
2.49.0
* [PATCH v2 5/8] btrfs: simplify bvec iteration in index_one_bio
2025-04-19 7:17 [PATCH v2 0/8] btrfs: do not poke into the implementation details of bio_vec Qu Wenruo
` (3 preceding siblings ...)
2025-04-19 7:17 ` [PATCH v2 4/8] btrfs: move kmapping out of btrfs_check_sector_csum Qu Wenruo
@ 2025-04-19 7:17 ` Qu Wenruo
2025-04-19 7:17 ` [PATCH v2 6/8] btrfs: raid56: store a physical address in structure sector_ptr Qu Wenruo
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
Flatten the two loops by open coding bio_for_each_segment and advancing
the iterator one sector at a time.
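The flattened loop shape, including the @offset increment that the v2 fix adds, can be modelled in userspace (model_index_sectors is an illustrative stand-in, not the kernel function): the iterator and the stripe offset advance together, one sector at a time.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORSIZE 4096u	/* assumed fs block size for the model */

/*
 * Model of the flattened index_one_bio() loop: compute the sector_ptr
 * index from @offset, then advance both the remaining size (the
 * bio_advance_iter_single() analogue) and @offset. Forgetting the
 * @offset increment was the crash fixed in v2.
 */
static unsigned int model_index_sectors(uint32_t bi_size,
					uint32_t start_offset,
					unsigned int *indices,
					unsigned int max)
{
	uint32_t offset = start_offset;
	unsigned int n = 0;

	while (bi_size) {
		assert(n < max);
		indices[n++] = offset / SECTORSIZE;	/* sector_ptr index */
		bi_size -= SECTORSIZE;
		offset += SECTORSIZE;
	}
	return n;
}
```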
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ Fix a bug where @offset was not increased. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 28dbded86cc2..b0938eff908e 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1195,23 +1195,21 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
{
const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
- struct bio_vec bvec;
- struct bvec_iter iter;
+ struct bvec_iter iter = bio->bi_iter;
u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
rbio->bioc->full_stripe_logical;
- bio_for_each_segment(bvec, bio, iter) {
- u32 bvec_offset;
+ while (iter.bi_size) {
+ int index = offset / sectorsize;
+ struct sector_ptr *sector = &rbio->bio_sectors[index];
+ struct bio_vec bv = bio_iter_iovec(bio, iter);
- for (bvec_offset = 0; bvec_offset < bvec.bv_len;
- bvec_offset += sectorsize, offset += sectorsize) {
- int index = offset / sectorsize;
- struct sector_ptr *sector = &rbio->bio_sectors[index];
+ sector->page = bv.bv_page;
+ sector->pgoff = bv.bv_offset;
+ ASSERT(sector->pgoff < PAGE_SIZE);
- sector->page = bvec.bv_page;
- sector->pgoff = bvec.bv_offset + bvec_offset;
- ASSERT(sector->pgoff < PAGE_SIZE);
- }
+ bio_advance_iter_single(bio, &iter, sectorsize);
+ offset += sectorsize;
}
}
--
2.49.0
* [PATCH v2 6/8] btrfs: raid56: store a physical address in structure sector_ptr
2025-04-19 7:17 [PATCH v2 0/8] btrfs: do not poke into the implementation details of bio_vec Qu Wenruo
` (4 preceding siblings ...)
2025-04-19 7:17 ` [PATCH v2 5/8] btrfs: simplify bvec iteration in index_one_bio Qu Wenruo
@ 2025-04-19 7:17 ` Qu Wenruo
2025-04-21 11:44 ` Christoph Hellwig
2025-04-19 7:17 ` [PATCH v2 7/8] btrfs: scrub: use virtual addresses directly Qu Wenruo
` (2 subsequent siblings)
8 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
Instead of using a @page + @pg_offset pair inside the sector_ptr
structure, use a single physical address.
This allows us to grab both the page and the offset from a single
phys_addr_t value, although we still need an extra bool, @has_paddr,
to indicate whether the sector is properly mapped (as a physical
address of 0 is perfectly valid).
This change doesn't alter the size of structure sector_ptr, but reduces
the parameters of several functions.
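The reason for the separate @has_paddr flag can be shown with a minimal model (model_sector_ptr is hypothetical): since physical address 0 is valid, a zero paddr cannot double as a "not mapped" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Minimal model of the new sector_ptr layout: an explicit flag, not a
 * zero check, distinguishes "not mapped yet" from "mapped at physical
 * address 0".
 */
struct model_sector_ptr {
	uint64_t paddr;		/* physical address of the sector */
	bool has_paddr;		/* is this sector mapped at all? */
	bool uptodate;
};

static bool model_sector_mapped(const struct model_sector_ptr *sector)
{
	/* Checking sector->paddr != 0 here would be wrong. */
	return sector->has_paddr;
}
```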
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ Use physical addresses instead to handle highmem. ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 158 +++++++++++++++++++++++++---------------------
1 file changed, 86 insertions(+), 72 deletions(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index b0938eff908e..f196682e40f1 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -134,14 +134,17 @@ struct btrfs_stripe_hash_table {
};
/*
- * A bvec like structure to present a sector inside a page.
- *
- * Unlike bvec we don't need bvlen, as it's fixed to sectorsize.
+ * A structure to present a sector inside a page, the length is fixed to
+ * sectorsize;
*/
struct sector_ptr {
- struct page *page;
- unsigned int pgoff:24;
- unsigned int uptodate:8;
+ /*
+ * Blocks from the bio list can still be highmem.
+ * So here we use physical address to present a page and the offset inside it.
+ */
+ phys_addr_t paddr;
+ bool has_paddr;
+ bool uptodate;
};
static void rmw_rbio_work(struct work_struct *work);
@@ -233,6 +236,14 @@ int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info)
return 0;
}
+static void memcpy_sectors(const struct sector_ptr *dst,
+ const struct sector_ptr *src, u32 blocksize)
+{
+ memcpy_page(phys_to_page(dst->paddr), offset_in_page(dst->paddr),
+ phys_to_page(src->paddr), offset_in_page(src->paddr),
+ blocksize);
+}
+
/*
* caching an rbio means to copy anything from the
* bio_sectors array into the stripe_pages array. We
@@ -253,7 +264,7 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
for (i = 0; i < rbio->nr_sectors; i++) {
/* Some range not covered by bio (partial write), skip it */
- if (!rbio->bio_sectors[i].page) {
+ if (!rbio->bio_sectors[i].has_paddr) {
/*
* Even if the sector is not covered by bio, if it is
* a data sector it should still be uptodate as it is
@@ -264,12 +275,8 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
continue;
}
- ASSERT(rbio->stripe_sectors[i].page);
- memcpy_page(rbio->stripe_sectors[i].page,
- rbio->stripe_sectors[i].pgoff,
- rbio->bio_sectors[i].page,
- rbio->bio_sectors[i].pgoff,
- rbio->bioc->fs_info->sectorsize);
+ memcpy_sectors(&rbio->stripe_sectors[i], &rbio->bio_sectors[i],
+ rbio->bioc->fs_info->sectorsize);
rbio->stripe_sectors[i].uptodate = 1;
}
set_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
@@ -326,8 +333,13 @@ static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
int page_index = offset >> PAGE_SHIFT;
ASSERT(page_index < rbio->nr_pages);
- rbio->stripe_sectors[i].page = rbio->stripe_pages[page_index];
- rbio->stripe_sectors[i].pgoff = offset_in_page(offset);
+ if (!rbio->stripe_pages[page_index])
+ continue;
+
+ rbio->stripe_sectors[i].has_paddr = true;
+ rbio->stripe_sectors[i].paddr =
+ page_to_phys(rbio->stripe_pages[page_index]) +
+ offset_in_page(offset);
}
}
@@ -962,9 +974,9 @@ static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio,
spin_lock(&rbio->bio_list_lock);
sector = &rbio->bio_sectors[index];
- if (sector->page || bio_list_only) {
+ if (sector->has_paddr || bio_list_only) {
/* Don't return sector without a valid page pointer */
- if (!sector->page)
+ if (!sector->has_paddr)
sector = NULL;
spin_unlock(&rbio->bio_list_lock);
return sector;
@@ -1142,7 +1154,7 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
rbio, stripe_nr);
ASSERT_RBIO_SECTOR(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors,
rbio, sector_nr);
- ASSERT(sector->page);
+ ASSERT(sector->has_paddr);
stripe = &rbio->bioc->stripes[stripe_nr];
disk_start = stripe->physical + sector_nr * sectorsize;
@@ -1173,8 +1185,8 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
*/
if (last_end == disk_start && !last->bi_status &&
last->bi_bdev == stripe->dev->bdev) {
- ret = bio_add_page(last, sector->page, sectorsize,
- sector->pgoff);
+ ret = bio_add_page(last, phys_to_page(sector->paddr),
+ sectorsize, offset_in_page(sector->paddr));
if (ret == sectorsize)
return 0;
}
@@ -1187,7 +1199,8 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
bio->bi_iter.bi_sector = disk_start >> SECTOR_SHIFT;
bio->bi_private = rbio;
- __bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
+ __bio_add_page(bio, phys_to_page(sector->paddr), sectorsize,
+ offset_in_page(sector->paddr));
bio_list_add(bio_list, bio);
return 0;
}
@@ -1204,10 +1217,8 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
struct sector_ptr *sector = &rbio->bio_sectors[index];
struct bio_vec bv = bio_iter_iovec(bio, iter);
- sector->page = bv.bv_page;
- sector->pgoff = bv.bv_offset;
- ASSERT(sector->pgoff < PAGE_SIZE);
-
+ sector->has_paddr = true;
+ sector->paddr = bvec_phys(&bv);
bio_advance_iter_single(bio, &iter, sectorsize);
offset += sectorsize;
}
@@ -1287,6 +1298,15 @@ static void assert_rbio(struct btrfs_raid_bio *rbio)
ASSERT_RBIO(rbio->nr_data < rbio->real_stripes, rbio);
}
+static inline void *kmap_local_sector(const struct sector_ptr *sector)
+{
+ /* The sector pointer must has a page mapped to it. */
+ ASSERT(sector->has_paddr);
+
+ return kmap_local_page(phys_to_page(sector->paddr)) +
+ offset_in_page(sector->paddr);
+}
+
/* Generate PQ for one vertical stripe. */
static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
{
@@ -1299,14 +1319,13 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
/* First collect one sector from each data stripe */
for (stripe = 0; stripe < rbio->nr_data; stripe++) {
sector = sector_in_rbio(rbio, stripe, sectornr, 0);
- pointers[stripe] = kmap_local_page(sector->page) +
- sector->pgoff;
+ pointers[stripe] = kmap_local_sector(sector);
}
/* Then add the parity stripe */
sector = rbio_pstripe_sector(rbio, sectornr);
sector->uptodate = 1;
- pointers[stripe++] = kmap_local_page(sector->page) + sector->pgoff;
+ pointers[stripe++] = kmap_local_sector(sector);
if (has_qstripe) {
/*
@@ -1315,8 +1334,7 @@ static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
*/
sector = rbio_qstripe_sector(rbio, sectornr);
sector->uptodate = 1;
- pointers[stripe++] = kmap_local_page(sector->page) +
- sector->pgoff;
+ pointers[stripe++] = kmap_local_sector(sector);
assert_rbio(rbio);
raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
@@ -1475,15 +1493,14 @@ static void set_rbio_range_error(struct btrfs_raid_bio *rbio, struct bio *bio)
* stripe_pages[], thus we need to locate the sector.
*/
static struct sector_ptr *find_stripe_sector(struct btrfs_raid_bio *rbio,
- struct page *page,
- unsigned int pgoff)
+ phys_addr_t paddr)
{
int i;
for (i = 0; i < rbio->nr_sectors; i++) {
struct sector_ptr *sector = &rbio->stripe_sectors[i];
- if (sector->page == page && sector->pgoff == pgoff)
+ if (sector->has_paddr && sector->paddr == paddr)
return sector;
}
return NULL;
@@ -1503,11 +1520,11 @@ static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
bio_for_each_segment_all(bvec, bio, iter_all) {
struct sector_ptr *sector;
- int pgoff;
+ phys_addr_t paddr = bvec_phys(bvec);
+ int off;
- for (pgoff = bvec->bv_offset; pgoff - bvec->bv_offset < bvec->bv_len;
- pgoff += sectorsize) {
- sector = find_stripe_sector(rbio, bvec->bv_page, pgoff);
+ for (off = 0; off < bvec->bv_len; off += sectorsize) {
+ sector = find_stripe_sector(rbio, paddr + off);
ASSERT(sector);
if (sector)
sector->uptodate = 1;
@@ -1517,17 +1534,14 @@ static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
static int get_bio_sector_nr(struct btrfs_raid_bio *rbio, struct bio *bio)
{
- struct bio_vec *bv = bio_first_bvec_all(bio);
+ phys_addr_t bvec_paddr = bvec_phys(bio_first_bvec_all(bio));
int i;
for (i = 0; i < rbio->nr_sectors; i++) {
- struct sector_ptr *sector;
-
- sector = &rbio->stripe_sectors[i];
- if (sector->page == bv->bv_page && sector->pgoff == bv->bv_offset)
+ if (rbio->stripe_sectors[i].paddr == bvec_paddr)
break;
- sector = &rbio->bio_sectors[i];
- if (sector->page == bv->bv_page && sector->pgoff == bv->bv_offset)
+ if (rbio->bio_sectors[i].has_paddr &&
+ rbio->bio_sectors[i].paddr == bvec_paddr)
break;
}
ASSERT(i < rbio->nr_sectors);
@@ -1788,10 +1802,10 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
{
struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
struct sector_ptr *sector;
- u8 csum_buf[BTRFS_CSUM_SIZE];
- u8 *csum_expected;
void *kaddr;
int ret;
+ u8 csum_buf[BTRFS_CSUM_SIZE];
+ u8 *csum_expected;
if (!rbio->csum_bitmap || !rbio->csum_buf)
return 0;
@@ -1809,12 +1823,10 @@ static int verify_one_sector(struct btrfs_raid_bio *rbio,
sector = rbio_stripe_sector(rbio, stripe_nr, sector_nr);
}
- ASSERT(sector->page);
-
- kaddr = kmap_local_page(sector->page) + sector->pgoff;
csum_expected = rbio->csum_buf +
(stripe_nr * rbio->stripe_nsectors + sector_nr) *
fs_info->csum_size;
+ kaddr = kmap_local_sector(sector);
ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, csum_expected);
kunmap_local(kaddr);
return ret;
@@ -1873,9 +1885,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
} else {
sector = rbio_stripe_sector(rbio, stripe_nr, sector_nr);
}
- ASSERT(sector->page);
- pointers[stripe_nr] = kmap_local_page(sector->page) +
- sector->pgoff;
+ pointers[stripe_nr] = kmap_local_sector(sector);
unmap_array[stripe_nr] = pointers[stripe_nr];
}
@@ -2327,7 +2337,7 @@ static bool need_read_stripe_sectors(struct btrfs_raid_bio *rbio)
* thus this rbio can not be cached one, as cached one must
* have all its data sectors present and uptodate.
*/
- if (!sector->page || !sector->uptodate)
+ if (!sector->has_paddr || !sector->uptodate)
return true;
}
return false;
@@ -2517,6 +2527,7 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
int stripe;
int sectornr;
bool has_qstripe;
+ struct page *page;
struct sector_ptr p_sector = { 0 };
struct sector_ptr q_sector = { 0 };
struct bio_list bio_list;
@@ -2548,29 +2559,34 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
*/
clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
- p_sector.page = alloc_page(GFP_NOFS);
- if (!p_sector.page)
+ page = alloc_page(GFP_NOFS);
+ if (!page)
return -ENOMEM;
- p_sector.pgoff = 0;
+ p_sector.has_paddr = true;
+ p_sector.paddr = page_to_phys(page);
p_sector.uptodate = 1;
+ page = NULL;
if (has_qstripe) {
/* RAID6, allocate and map temp space for the Q stripe */
- q_sector.page = alloc_page(GFP_NOFS);
- if (!q_sector.page) {
- __free_page(p_sector.page);
- p_sector.page = NULL;
+ page = alloc_page(GFP_NOFS);
+ if (!page) {
+ __free_page(phys_to_page(p_sector.paddr));
+ p_sector.has_paddr = false;
return -ENOMEM;
}
- q_sector.pgoff = 0;
+ q_sector.has_paddr = true;
+ q_sector.paddr = page_to_phys(page);
q_sector.uptodate = 1;
- pointers[rbio->real_stripes - 1] = kmap_local_page(q_sector.page);
+ page = NULL;
+ pointers[rbio->real_stripes - 1] =
+ kmap_local_sector(&q_sector);
}
bitmap_clear(rbio->error_bitmap, 0, rbio->nr_sectors);
/* Map the parity stripe just once */
- pointers[nr_data] = kmap_local_page(p_sector.page);
+ pointers[nr_data] = kmap_local_sector(&p_sector);
for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
struct sector_ptr *sector;
@@ -2579,8 +2595,7 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
/* first collect one page from each data stripe */
for (stripe = 0; stripe < nr_data; stripe++) {
sector = sector_in_rbio(rbio, stripe, sectornr, 0);
- pointers[stripe] = kmap_local_page(sector->page) +
- sector->pgoff;
+ pointers[stripe] = kmap_local_sector(sector);
}
if (has_qstripe) {
@@ -2596,7 +2611,7 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
/* Check scrubbing parity and repair it */
sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
- parity = kmap_local_page(sector->page) + sector->pgoff;
+ parity = kmap_local_sector(sector);
if (memcmp(parity, pointers[rbio->scrubp], sectorsize) != 0)
memcpy(parity, pointers[rbio->scrubp], sectorsize);
else
@@ -2609,12 +2624,11 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
}
kunmap_local(pointers[nr_data]);
- __free_page(p_sector.page);
- p_sector.page = NULL;
- if (q_sector.page) {
- kunmap_local(pointers[rbio->real_stripes - 1]);
- __free_page(q_sector.page);
- q_sector.page = NULL;
+ __free_page(phys_to_page(p_sector.paddr));
+ p_sector.has_paddr = false;
+ if (q_sector.has_paddr) {
+ __free_page(phys_to_page(q_sector.paddr));
+ q_sector.has_paddr = false;
}
/*
--
2.49.0
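The core of patch 6 is that `sector_ptr` is now keyed on a physical address plus a `has_paddr` flag instead of a page/offset pair, and lookups like `find_stripe_sector()` compare that address directly. The following is a hypothetical userspace sketch of that lookup scheme, not the kernel code — `struct sector_ptr` and `find_sector()` here are simplified stand-ins, and `phys_addr_t` is modeled as a plain integer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Simplified stand-in for the reworked sector_ptr: a physical address
 * plus a validity flag replace the old page + pgoff pair. The flag is
 * needed because 0 is a perfectly valid physical address, so paddr
 * alone cannot signal "no sector assigned". */
struct sector_ptr {
	phys_addr_t paddr;
	bool has_paddr;
	bool uptodate;
};

/* Mirrors the shape of find_stripe_sector(): a linear scan keyed on
 * the physical address alone, skipping unassigned slots. */
static struct sector_ptr *find_sector(struct sector_ptr *sectors, int nr,
				      phys_addr_t paddr)
{
	for (int i = 0; i < nr; i++) {
		struct sector_ptr *sector = &sectors[i];

		if (sector->has_paddr && sector->paddr == paddr)
			return sector;
	}
	return NULL;
}
```

The `has_paddr` flag is what lets code like `need_read_stripe_sectors()` distinguish "sector never populated" from "sector at physical address 0".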
* [PATCH v2 7/8] btrfs: scrub: use virtual addresses directly
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
Instead of the old @page and @page_offset pair inside scrub, we can
directly use the virtual address of a sector.
This has the following benefits:
- Simplified parameters
A single @kaddr will replace @page and @page_offset.
- No more unnecessary kmap/kunmap calls
Since all pages utilized by scrub are allocated by scrub, and no
highmem is allowed, we do not need to do any kmap/kunmap.
An ASSERT() is added inside the new scrub_stripe_get_kaddr() to
catch any unexpected highmem page.
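The page/offset-to-address mapping that scrub_stripe_get_kaddr() performs can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: `struct stripe_model` and `model_get_kaddr()` are invented names, backing pages are modeled as plain buffers, and a 4K `PAGE_SIZE` is assumed:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical userspace model: a stripe's backing pages are plain
 * buffers, and one sector number maps to exactly one address, so no
 * separate page/offset pair needs to travel through the call chain. */
struct stripe_model {
	char *pages[16];              /* stand-in for stripe->pages[] */
	unsigned int sectorsize_bits; /* e.g. 12 for 4K sectors */
};

static void *model_get_kaddr(const struct stripe_model *stripe, int sector_nr)
{
	/* Byte offset of the sector from the start of the stripe. */
	uint32_t offset = (uint32_t)sector_nr << stripe->sectorsize_bits;
	/* Pick the backing page, then the offset within it, mirroring
	 * page_address(page) + offset_in_page(offset). */
	char *page = stripe->pages[offset >> PAGE_SHIFT];

	return page + (offset & (PAGE_SIZE - 1));
}
```

With subpage sector sizes several sectors share one backing page, and with sector size equal to page size each sector gets its own page; both cases fall out of the same arithmetic.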
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/scrub.c | 94 ++++++++++++++++++++----------------------------
1 file changed, 38 insertions(+), 56 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 49021765c17b..4ff3465c430c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -579,20 +579,16 @@ static int fill_writer_pointer_gap(struct scrub_ctx *sctx, u64 physical)
return ret;
}
-static struct page *scrub_stripe_get_page(struct scrub_stripe *stripe, int sector_nr)
+static void *scrub_stripe_get_kaddr(struct scrub_stripe *stripe, int sector_nr)
{
struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
- int page_index = (sector_nr << fs_info->sectorsize_bits) >> PAGE_SHIFT;
+ u32 offset = sector_nr << fs_info->sectorsize_bits;
+ const struct page *page = stripe->pages[offset >> PAGE_SHIFT];
- return stripe->pages[page_index];
-}
-
-static unsigned int scrub_stripe_get_page_offset(struct scrub_stripe *stripe,
- int sector_nr)
-{
- struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
-
- return offset_in_page(sector_nr << fs_info->sectorsize_bits);
+ ASSERT(page);
+ /* stripe->pages[] is allocated by us and no highmem is allowed. */
+ ASSERT(!PageHighMem(page));
+ return page_address(page) + offset_in_page(offset);
}
static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
@@ -600,19 +596,17 @@ static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr
struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
const u32 sectors_per_tree = fs_info->nodesize >> fs_info->sectorsize_bits;
const u64 logical = stripe->logical + (sector_nr << fs_info->sectorsize_bits);
- const struct page *first_page = scrub_stripe_get_page(stripe, sector_nr);
- const unsigned int first_off = scrub_stripe_get_page_offset(stripe, sector_nr);
+ void *first_kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
u8 on_disk_csum[BTRFS_CSUM_SIZE];
u8 calculated_csum[BTRFS_CSUM_SIZE];
- struct btrfs_header *header;
+ struct btrfs_header *header = first_kaddr;
/*
* Here we don't have a good way to attach the pages (and subpages)
* to a dummy extent buffer, thus we have to directly grab the members
* from pages.
*/
- header = (struct btrfs_header *)(page_address(first_page) + first_off);
memcpy(on_disk_csum, header->csum, fs_info->csum_size);
if (logical != btrfs_stack_header_bytenr(header)) {
@@ -648,14 +642,11 @@ static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr
/* Now check tree block csum. */
shash->tfm = fs_info->csum_shash;
crypto_shash_init(shash);
- crypto_shash_update(shash, page_address(first_page) + first_off +
- BTRFS_CSUM_SIZE, fs_info->sectorsize - BTRFS_CSUM_SIZE);
+ crypto_shash_update(shash, first_kaddr + BTRFS_CSUM_SIZE,
+ fs_info->sectorsize - BTRFS_CSUM_SIZE);
for (int i = sector_nr + 1; i < sector_nr + sectors_per_tree; i++) {
- struct page *page = scrub_stripe_get_page(stripe, i);
- unsigned int page_off = scrub_stripe_get_page_offset(stripe, i);
-
- crypto_shash_update(shash, page_address(page) + page_off,
+ crypto_shash_update(shash, scrub_stripe_get_kaddr(stripe, i),
fs_info->sectorsize);
}
@@ -691,10 +682,8 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
struct scrub_sector_verification *sector = &stripe->sectors[sector_nr];
const u32 sectors_per_tree = fs_info->nodesize >> fs_info->sectorsize_bits;
- struct page *page = scrub_stripe_get_page(stripe, sector_nr);
- unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
+ void *kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
u8 csum_buf[BTRFS_CSUM_SIZE];
- void *kaddr;
int ret;
ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
@@ -738,9 +727,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
return;
}
- kaddr = kmap_local_page(page) + pgoff;
ret = btrfs_check_sector_csum(fs_info, kaddr, csum_buf, sector->csum);
- kunmap_local(kaddr);
if (ret < 0) {
set_bit(sector_nr, &stripe->csum_error_bitmap);
set_bit(sector_nr, &stripe->error_bitmap);
@@ -769,8 +756,7 @@ static int calc_sector_number(struct scrub_stripe *stripe, struct bio_vec *first
int i;
for (i = 0; i < stripe->nr_sectors; i++) {
- if (scrub_stripe_get_page(stripe, i) == first_bvec->bv_page &&
- scrub_stripe_get_page_offset(stripe, i) == first_bvec->bv_offset)
+ if (scrub_stripe_get_kaddr(stripe, i) == bvec_virt(first_bvec))
break;
}
ASSERT(i < stripe->nr_sectors);
@@ -817,6 +803,25 @@ static int calc_next_mirror(int mirror, int num_copies)
return (mirror + 1 > num_copies) ? 1 : mirror + 1;
}
+static void scrub_bio_add_sector(struct btrfs_bio *bbio,
+ struct scrub_stripe *stripe, int sector_nr)
+{
+ void *kaddr = scrub_stripe_get_kaddr(stripe, sector_nr);
+ int ret;
+
+ ret = bio_add_page(&bbio->bio, virt_to_page(kaddr),
+ bbio->fs_info->sectorsize, offset_in_page(kaddr));
+ /*
+ * Caller should ensure the bbio has enough size.
+ * And we can not use __bio_add_page(), which doesn't do any merge.
+ *
+ * Meanwhile for scrub_submit_initial_read() we fully rely on the merge
+ * to create the minimal amount of bio vectors, for fs block size < page
+ * size cases.
+ */
+ ASSERT(ret == bbio->fs_info->sectorsize);
+}
+
static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
int mirror, int blocksize, bool wait)
{
@@ -829,13 +834,6 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
ASSERT(atomic_read(&stripe->pending_io) == 0);
for_each_set_bit(i, &old_error_bitmap, stripe->nr_sectors) {
- struct page *page;
- int pgoff;
- int ret;
-
- page = scrub_stripe_get_page(stripe, i);
- pgoff = scrub_stripe_get_page_offset(stripe, i);
-
/* The current sector cannot be merged, submit the bio. */
if (bbio && ((i > 0 && !test_bit(i - 1, &stripe->error_bitmap)) ||
bbio->bio.bi_iter.bi_size >= blocksize)) {
@@ -854,8 +852,7 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
(i << fs_info->sectorsize_bits)) >> SECTOR_SHIFT;
}
- ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
- ASSERT(ret == fs_info->sectorsize);
+ scrub_bio_add_sector(bbio, stripe, i);
}
if (bbio) {
ASSERT(bbio->bio.bi_iter.bi_size);
@@ -1202,10 +1199,6 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
int sector_nr;
for_each_set_bit(sector_nr, &write_bitmap, stripe->nr_sectors) {
- struct page *page = scrub_stripe_get_page(stripe, sector_nr);
- unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
- int ret;
-
/* We should only writeback sectors covered by an extent. */
ASSERT(test_bit(sector_nr, &stripe->extent_sector_bitmap));
@@ -1221,8 +1214,7 @@ static void scrub_write_sectors(struct scrub_ctx *sctx, struct scrub_stripe *str
(sector_nr << fs_info->sectorsize_bits)) >>
SECTOR_SHIFT;
}
- ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
- ASSERT(ret == fs_info->sectorsize);
+ scrub_bio_add_sector(bbio, stripe, sector_nr);
}
if (bbio)
scrub_submit_write_bio(sctx, stripe, bbio, dev_replace);
@@ -1675,9 +1667,6 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
atomic_inc(&stripe->pending_io);
for_each_set_bit(i, &stripe->extent_sector_bitmap, stripe->nr_sectors) {
- struct page *page = scrub_stripe_get_page(stripe, i);
- unsigned int pgoff = scrub_stripe_get_page_offset(stripe, i);
-
/* We're beyond the chunk boundary, no need to read anymore. */
if (i >= nr_sectors)
break;
@@ -1730,7 +1719,7 @@ static void scrub_submit_extent_sector_read(struct scrub_stripe *stripe)
bbio->bio.bi_iter.bi_sector = logical >> SECTOR_SHIFT;
}
- __bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
+ scrub_bio_add_sector(bbio, stripe, i);
}
if (bbio) {
@@ -1768,15 +1757,8 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx,
bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
/* Read the whole range inside the chunk boundary. */
- for (unsigned int cur = 0; cur < nr_sectors; cur++) {
- struct page *page = scrub_stripe_get_page(stripe, cur);
- unsigned int pgoff = scrub_stripe_get_page_offset(stripe, cur);
- int ret;
-
- ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
- /* We should have allocated enough bio vectors. */
- ASSERT(ret == fs_info->sectorsize);
- }
+ for (unsigned int cur = 0; cur < nr_sectors; cur++)
+ scrub_bio_add_sector(bbio, stripe, cur);
atomic_inc(&stripe->pending_io);
/*
--
2.49.0
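The comment in scrub_bio_add_sector() above hinges on the difference between bio_add_page(), which first tries to merge a physically contiguous page into the last bvec, and __bio_add_page(), which always consumes a new bvec slot (and warns when the bio is full). A hypothetical userspace model of that merge behavior, under simplified assumptions (a fixed 4-slot vector table, invented `mini_bio`/`add_merging` names):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Simplified model of a bio's vector table: each entry covers one
 * physically contiguous [paddr, paddr + len) range. */
struct vec { phys_addr_t paddr; uint32_t len; };
struct mini_bio { struct vec vecs[4]; int cnt; };

/* Models bio_add_page(): try to extend the last vec first, and only
 * consume a new slot when the new range is discontiguous; returns the
 * number of bytes added, or 0 when the table is full. */
static uint32_t add_merging(struct mini_bio *bio, phys_addr_t paddr,
			    uint32_t len)
{
	if (bio->cnt > 0) {
		struct vec *last = &bio->vecs[bio->cnt - 1];

		if (last->paddr + last->len == paddr) { /* contiguous: merge */
			last->len += len;
			return len;
		}
	}
	if (bio->cnt == 4)
		return 0; /* full -- bio_add_page() fails gracefully here */
	bio->vecs[bio->cnt++] = (struct vec){ paddr, len };
	return len;
}
```

A model of __bio_add_page() would skip the merge branch entirely, so adding subpage sectors one by one would exhaust the slots almost immediately — which is the WARN_ON_ONCE() the cover letter describes hitting, and why the merging variant plus an ASSERT() on the return value is used instead.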
* [PATCH v2 8/8] btrfs: use bvec_kmap_local in btrfs_decompress_buf2page
From: Qu Wenruo @ 2025-04-19 7:17 UTC (permalink / raw)
To: linux-btrfs, fstests; +Cc: Christoph Hellwig
From: Christoph Hellwig <hch@lst.de>
This removes the last direct poke into bvec internals in btrfs.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/compression.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 6911eaa6b408..1f8bc9e4acb6 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1198,6 +1198,7 @@ int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
u32 copy_start;
/* Offset inside the full decompressed extent */
u32 bvec_offset;
+ void *kaddr;
bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
/*
@@ -1220,10 +1221,12 @@ int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
* @buf + @buf_len.
*/
ASSERT(copy_start - decompressed < buf_len);
- memcpy_to_page(bvec.bv_page, bvec.bv_offset,
- buf + copy_start - decompressed, copy_len);
- cur_offset += copy_len;
+ kaddr = bvec_kmap_local(&bvec);
+ memcpy(kaddr, buf + copy_start - decompressed, copy_len);
+ kunmap_local(kaddr);
+
+ cur_offset += copy_len;
bio_advance(orig_bio, copy_len);
/* Finished the bio */
if (!orig_bio->bi_iter.bi_size)
--
2.49.0
* Re: [PATCH v2 6/8] btrfs: raid56: store a physical address in structure sector_ptr
From: Christoph Hellwig @ 2025-04-21 11:44 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests, Christoph Hellwig
On Sat, Apr 19, 2025 at 04:47:13PM +0930, Qu Wenruo wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [ Use physical addresses instead to handle highmem. ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>
I think you should take full credit and authorship here as this is quite
different from what I posted.
The changes look good to me, though:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 0/8] btrfs: do not poking into the implementation details of bio_vec
From: Josef Bacik @ 2025-04-21 13:29 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, fstests
On Sat, Apr 19, 2025 at 04:47:07PM +0930, Qu Wenruo wrote:
> - Fix a crash caused by incorrectly indexed sector_ptrs
> The 5th patch, caused by a missing increment of the @offset variable.
>
> - Use physical addresses instead of virtual addresses to handle HIGHMEM
> The 6th patch, as we can still get sector_ptr from bio_sectors[] which
> is using pages from the bio, and that can be highmem.
>
> However I cannot do a proper test on this, as the latest kernel has
> already rejected the HIGHMEM64G option, thus even if my VM has an
> extra 3GB for HIGHMEM (total 6GB), I'm not sure if the kernel can
> really utilize that high memory.
>
> Furthermore there seem to be other bugs in the mm layer related to
> HIGHMEM + PAGE, resulting in a zswap crash when trying to compress a
> page to be swapped out.
> But at least several scrub/balance related test cases passed on x86
> 32bit with HIGHMEM and PAGE.
>
> I have tested with x86_64 (64 bit, no HIGHMEM), aarch64 (64bit, 64K
> page size, no HIGHMEM) with no regression.
>
> - Fix an incorrect __bio_add_page() usage in scrub_bio_add_sector()
> The 6th patch, as bio_add_page() does extra bvec merging before
> allocating a new bvec, but __bio_add_page() does not.
>
> This triggers WARN_ON_ONCE() in __bio_add_page() checking if the bio
> is full.
>
> Fix it by using the old bio_add_page() and ASSERT(), with an extra
> comment on why we cannot use __bio_add_page().
>
> - Various minor commit message update
> And full proper test runs.
>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
* Re: [PATCH v2 6/8] btrfs: raid56: store a physical address in structure sector_ptr
From: David Sterba @ 2025-04-22 11:31 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, fstests
On Mon, Apr 21, 2025 at 01:44:16PM +0200, Christoph Hellwig wrote:
> On Sat, Apr 19, 2025 at 04:47:13PM +0930, Qu Wenruo wrote:
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > [ Use physical addresses instead to handle highmem. ]
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> I think you should take full credit and authorship here as this is quite
> different from what I posted.
>
> The changes look good to me, though:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
For the record, Qu added the patches to for-next with you as the author,
so I'll change it and mention it's based on your patches instead.
* Re: [PATCH v2 6/8] btrfs: raid56: store a physical address in structure sector_ptr
From: Qu Wenruo @ 2025-04-23 9:40 UTC (permalink / raw)
To: dsterba, Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, fstests
On 2025/4/22 21:01, David Sterba wrote:
> On Mon, Apr 21, 2025 at 01:44:16PM +0200, Christoph Hellwig wrote:
>> On Sat, Apr 19, 2025 at 04:47:13PM +0930, Qu Wenruo wrote:
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> [ Use physical addresses instead to handle highmem. ]
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>
>> I think you should take full credit and authorship here as this is quite
>> different from what I posted.
>>
>> The changes look good to me, though:
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> For the record, Qu added the patches to for-next with you as the author,
> so I'll change it and mention it's based on your patches instead.
>
I'm OK with this author change, but to be honest, I didn't dig deep
enough into the page frame number/physical/virtual address topics
before, and all the existing bio_add_folio/page() interfaces require a
page/folio with an offset, thus I never considered anything else.
So although this one patch changed a lot, without HCH's initial patch
the bulb would never have been lit.
I do not care that much who the author is, especially since I'm mostly
here for patch 3, to prepare for large folios support.