* [PATCH v2] btrfs: don't merge pages into bio if their page offset is not continuous
@ 2022-08-13 8:06 Qu Wenruo
2022-08-13 8:09 ` Christoph Hellwig
2022-08-18 12:50 ` David Sterba
0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2022-08-13 8:06 UTC (permalink / raw)
To: linux-btrfs; +Cc: Zygo Blaxell, Christoph Hellwig
[BUG]
Zygo reported on latest devel branch, he can hit ASSERT()/BUG_ON()
caused crash when doing RAID5 recovery (intentionally corrupt one disk,
and let btrfs to recovery the data during read/scrub).
And The following minimal reproducer can cause extent state leakage at
rmmod time:
mkfs.btrfs -f -d raid5 -m raid5 $dev1 $dev2 $dev3 -b 1G > /dev/null
mount $dev1 $mnt
fsstress -w -d $mnt -n 25 -s 1660807876
sync
fssum -A -f -w /tmp/fssum.saved $mnt
umount $mnt
# Wipe the dev1 but keeps its super block
xfs_io -c "pwrite -S 0x0 1m 1023m" $dev1
mount $dev1 $mnt
fssum -r /tmp/fssum.saved $mnt > /dev/null
umount $mnt
rmmod btrfs
This will lead to the following extent states leakage:
BTRFS: state leak: start 499712 end 503807 state 5 in tree 1 refs 1
BTRFS: state leak: start 495616 end 499711 state 5 in tree 1 refs 1
BTRFS: state leak: start 491520 end 495615 state 5 in tree 1 refs 1
BTRFS: state leak: start 487424 end 491519 state 5 in tree 1 refs 1
BTRFS: state leak: start 483328 end 487423 state 5 in tree 1 refs 1
BTRFS: state leak: start 479232 end 483327 state 5 in tree 1 refs 1
BTRFS: state leak: start 475136 end 479231 state 5 in tree 1 refs 1
BTRFS: state leak: start 471040 end 475135 state 5 in tree 1 refs 1
[CAUSE]
Since commit 7aa51232e204 ("btrfs: pass a btrfs_bio to
btrfs_repair_one_sector"), we always use btrfs_bio->file_offset to
determine the file offset of a page.
But that usage assume that, one bio has all its page having a continuous
page offsets.
Unfortunately that's not true, btrfs only requires the logical bytenr
continuous when assembling its bios.
From above script, we have one bio looks like this:
fssum-27671 submit_one_bio: bio logical=217739264 len=36864
fssum-27671 submit_one_bio: r/i=5/261 page_offset=466944 <<<
fssum-27671 submit_one_bio: r/i=5/261 page_offset=724992 <<<
fssum-27671 submit_one_bio: r/i=5/261 page_offset=729088
fssum-27671 submit_one_bio: r/i=5/261 page_offset=733184
fssum-27671 submit_one_bio: r/i=5/261 page_offset=737280
fssum-27671 submit_one_bio: r/i=5/261 page_offset=741376
fssum-27671 submit_one_bio: r/i=5/261 page_offset=745472
fssum-27671 submit_one_bio: r/i=5/261 page_offset=749568
fssum-27671 submit_one_bio: r/i=5/261 page_offset=753664
Note that the 1st and the 2nd page has non-continuous page offsets.
This means, at repair time, we will have completely wrong file offset
passed in:
kworker/u32:2-19927 btrfs_repair_one_sector: r/i=5/261 page_off=729088 file_off=475136 bio_offset=8192
Since the file offset is incorrect, we latter incorrectly set the extent
states, and no way to really release them.
Thus later it causes the leakage.
In fact, this can be even worse, since the file offset is incorrect, we
can hit cases like the incorrect file offset belongs to a HOLE, and
later cause btrfs_num_copies() to trigger error, finally hit
BUG_ON()/ASSERT() later.
[FIX]
This patch will add an extra condition in btrfs_bio_add_page() for
uncompressed IO.
Now we will have more strict requirement for bio pages:
- They should all have the same mapping
(the mapping check is already implied by the call chain)
- Their logical bytenr should be adjacent
This is the same as the old condition.
- Their page_offset() (file offset) should be adjacent
This is the new check.
This would result a slightly increased amount of bios from btrfs
(needs holes and inside the same stripe boundary to trigger).
But this would greatly reduce the confusion, as it's pretty common
to assume a btrfs bio would only contain continuous page cache.
Later we may need extra cleanups, as we no longer needs to handle gaps
between page offsets in endio functions.
Currently this should be the minimal patch to fix commit 7aa51232e204
("btrfs: pass a btrfs_bio to btrfs_repair_one_sector").
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Christoph Hellwig <hch@lst.de>
Fixes: 7aa51232e204 ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
- Code style cleanups to improve readability
---
fs/btrfs/extent_io.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bc30c0a4a7f2..56dd6997c0ec 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3256,7 +3256,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
u32 bio_size = bio->bi_iter.bi_size;
u32 real_size;
const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
- bool contig;
+ bool contig = false;
int ret;
ASSERT(bio);
@@ -3265,10 +3265,34 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
if (bio_ctrl->compress_type != compress_type)
return 0;
- if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
+
+ if (bio->bi_iter.bi_size == 0) {
+ /* We can always add a page into an empty bio. */
+ contig = true;
+ } else if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE) {
+ struct bio_vec *bvec = bio_last_bvec_all(bio);
+
+ /*
+ * The contig check requires the following conditions to be met:
+ * 1) The pages are belonging to the same inode
+ * This is implied by the call chain.
+ *
+ * 2) The range has adjacent logical bytenr
+ *
+ * 3) The range has adjacent file offset (NEW)
+ * This is required for the usage of btrfs_bio->file_offset.
+ */
+ contig = bio_end_sector(bio) == sector &&
+ page_offset(bvec->bv_page) + bvec->bv_offset +
+ bvec->bv_len == page_offset(page) + pg_offset;
+ } else {
+ /*
+ * For compression, all IO should have its logical bytenr
+ * set to the starting bytenr of the compressed extent.
+ */
contig = bio->bi_iter.bi_sector == sector;
- else
- contig = bio_end_sector(bio) == sector;
+ }
+
if (!contig)
return 0;
--
2.37.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2] btrfs: don't merge pages into bio if their page offset is not continuous
2022-08-13 8:06 [PATCH v2] btrfs: don't merge pages into bio if their page offset is not continuous Qu Wenruo
@ 2022-08-13 8:09 ` Christoph Hellwig
2022-08-18 12:50 ` David Sterba
1 sibling, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2022-08-13 8:09 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Zygo Blaxell, Christoph Hellwig
On Sat, Aug 13, 2022 at 04:06:53PM +0800, Qu Wenruo wrote:
> - bool contig;
> + bool contig = false;
Nit: the initialization isn't strictly needed in this new version.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] btrfs: don't merge pages into bio if their page offset is not continuous
2022-08-13 8:06 [PATCH v2] btrfs: don't merge pages into bio if their page offset is not continuous Qu Wenruo
2022-08-13 8:09 ` Christoph Hellwig
@ 2022-08-18 12:50 ` David Sterba
1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2022-08-18 12:50 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Zygo Blaxell, Christoph Hellwig
On Sat, Aug 13, 2022 at 04:06:53PM +0800, Qu Wenruo wrote:
> [BUG]
> Zygo reported on latest devel branch, he can hit ASSERT()/BUG_ON()
> caused crash when doing RAID5 recovery (intentionally corrupt one disk,
> and let btrfs to recovery the data during read/scrub).
>
> And The following minimal reproducer can cause extent state leakage at
> rmmod time:
>
> mkfs.btrfs -f -d raid5 -m raid5 $dev1 $dev2 $dev3 -b 1G > /dev/null
> mount $dev1 $mnt
> fsstress -w -d $mnt -n 25 -s 1660807876
> sync
> fssum -A -f -w /tmp/fssum.saved $mnt
> umount $mnt
>
> # Wipe the dev1 but keeps its super block
> xfs_io -c "pwrite -S 0x0 1m 1023m" $dev1
> mount $dev1 $mnt
> fssum -r /tmp/fssum.saved $mnt > /dev/null
> umount $mnt
> rmmod btrfs
>
> This will lead to the following extent states leakage:
>
> BTRFS: state leak: start 499712 end 503807 state 5 in tree 1 refs 1
> BTRFS: state leak: start 495616 end 499711 state 5 in tree 1 refs 1
> BTRFS: state leak: start 491520 end 495615 state 5 in tree 1 refs 1
> BTRFS: state leak: start 487424 end 491519 state 5 in tree 1 refs 1
> BTRFS: state leak: start 483328 end 487423 state 5 in tree 1 refs 1
> BTRFS: state leak: start 479232 end 483327 state 5 in tree 1 refs 1
> BTRFS: state leak: start 475136 end 479231 state 5 in tree 1 refs 1
> BTRFS: state leak: start 471040 end 475135 state 5 in tree 1 refs 1
>
> [CAUSE]
> Since commit 7aa51232e204 ("btrfs: pass a btrfs_bio to
> btrfs_repair_one_sector"), we always use btrfs_bio->file_offset to
> determine the file offset of a page.
>
> But that usage assume that, one bio has all its page having a continuous
> page offsets.
>
> Unfortunately that's not true, btrfs only requires the logical bytenr
> continuous when assembling its bios.
>
> >From above script, we have one bio looks like this:
> fssum-27671 submit_one_bio: bio logical=217739264 len=36864
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=466944 <<<
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=724992 <<<
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=729088
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=733184
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=737280
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=741376
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=745472
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=749568
> fssum-27671 submit_one_bio: r/i=5/261 page_offset=753664
>
> Note that the 1st and the 2nd page has non-continuous page offsets.
>
> This means, at repair time, we will have completely wrong file offset
> passed in:
>
> kworker/u32:2-19927 btrfs_repair_one_sector: r/i=5/261 page_off=729088 file_off=475136 bio_offset=8192
>
> Since the file offset is incorrect, we latter incorrectly set the extent
> states, and no way to really release them.
>
> Thus later it causes the leakage.
>
> In fact, this can be even worse, since the file offset is incorrect, we
> can hit cases like the incorrect file offset belongs to a HOLE, and
> later cause btrfs_num_copies() to trigger error, finally hit
> BUG_ON()/ASSERT() later.
>
> [FIX]
> This patch will add an extra condition in btrfs_bio_add_page() for
> uncompressed IO.
>
> Now we will have more strict requirement for bio pages:
>
> - They should all have the same mapping
> (the mapping check is already implied by the call chain)
>
> - Their logical bytenr should be adjacent
> This is the same as the old condition.
>
> - Their page_offset() (file offset) should be adjacent
> This is the new check.
> This would result a slightly increased amount of bios from btrfs
> (needs holes and inside the same stripe boundary to trigger).
>
> But this would greatly reduce the confusion, as it's pretty common
> to assume a btrfs bio would only contain continuous page cache.
>
> Later we may need extra cleanups, as we no longer needs to handle gaps
> between page offsets in endio functions.
>
> Currently this should be the minimal patch to fix commit 7aa51232e204
> ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector").
>
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Fixes: 7aa51232e204 ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Thanks for fixing it, added to misc-next.
> @@ -3265,10 +3265,34 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
> if (bio_ctrl->compress_type != compress_type)
> return 0;
>
> - if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
> +
> + if (bio->bi_iter.bi_size == 0) {
> + /* We can always add a page into an empty bio. */
> + contig = true;
> + } else if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE) {
> + struct bio_vec *bvec = bio_last_bvec_all(bio);
> +
> + /*
> + * The contig check requires the following conditions to be met:
> + * 1) The pages are belonging to the same inode
> + * This is implied by the call chain.
> + *
> + * 2) The range has adjacent logical bytenr
> + *
> + * 3) The range has adjacent file offset (NEW)
> + * This is required for the usage of btrfs_bio->file_offset.
> + */
> + contig = bio_end_sector(bio) == sector &&
> + page_offset(bvec->bv_page) + bvec->bv_offset +
> + bvec->bv_len == page_offset(page) + pg_offset;
I've converted this to an if(), the one line condition assignments
should be used for simple expressions like "X = (A == B)", otherwise
it's hard to read.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-08-18 12:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-13 8:06 [PATCH v2] btrfs: don't merge pages into bio if their page offset is not continuous Qu Wenruo
2022-08-13 8:09 ` Christoph Hellwig
2022-08-18 12:50 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox