linux-btrfs.vger.kernel.org archive mirror
* [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW
@ 2022-08-04  7:07 Qu Wenruo
  2022-08-04  7:07 ` [PATCH STABLE 5.10 5.15 1/2] btrfs: only write the sectors in the vertical stripe which has data stripes Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-08-04  7:07 UTC (permalink / raw)
  To: linux-btrfs, stable

Hi Greg and Sasha,

These two patches are backports for the v5.15 and v5.10 stable branches
(for v5.10 the conflicts can be auto-resolved).

(For older branches from v4.9 to v5.4, due to some naming changes,
the patches can be applied with auto-resolve but won't compile.)

These two patches reduce the chance of a destructive RMW cycle, where
btrfs can use corrupted data to generate new P/Q, thus making some
repairable data unrepairable.

These patches turned out to be more important than I initially thought,
which is why, unfortunately, they were not CCed to stable by themselves.

Furthermore, due to recent refactors/renames, there are quite a few
member changes related to these patches, thus they have to be manually
backported.


One of the fastest ways to verify the behavior is the existing btrfs/125
test case from fstests (not in the auto group AFAIK).

Qu Wenruo (2):
  btrfs: only write the sectors in the vertical stripe which has data
    stripes
  btrfs: raid56: don't trust any cached sector in
    __raid56_parity_recover()

 fs/btrfs/raid56.c | 74 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 57 insertions(+), 17 deletions(-)

-- 
2.37.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH STABLE 5.10 5.15 1/2] btrfs: only write the sectors in the vertical stripe which has data stripes
  2022-08-04  7:07 [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Qu Wenruo
@ 2022-08-04  7:07 ` Qu Wenruo
  2022-08-04  7:07 ` [PATCH STABLE 5.10 5.15 2/2] btrfs: raid56: don't trust any cached sector in __raid56_parity_recover() Qu Wenruo
  2022-08-04 10:25 ` [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Wang Yugui
  2 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-08-04  7:07 UTC (permalink / raw)
  To: linux-btrfs, stable; +Cc: David Sterba

commit bd8f7e627703ca5707833d623efcd43f104c7b3f upstream.

If we have only an 8K partial write at the beginning of a full RAID56
stripe, we will write the following contents:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXX|

|X| means the sector will be written back to disk.

Note that although we won't write any sectors to disk 2, we will still
write the full 64KiB of parity to disk.

This behavior is fine for now, but not for the future (especially for
RAID56J, as we waste quite some space to journal the unused parity
stripes).

So here we also utilize btrfs_raid_bio::dbitmap: any time we queue a
higher level bio into an rbio, we update rbio::dbitmap to indicate
which vertical stripes need to be written back.

And at finish_rmw(), we check dbitmap to see if we need to write any
sector in each vertical stripe.

So after the patch, the above example will only lead to the following
writeback pattern:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XX|            |               |
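
To illustrate the mapping (a standalone userspace sketch, not the kernel
code; the 4K sector size, 64K stripe length and the offsets are assumed
example values):

  #include <stdio.h>

  int main(void)
  {
      const unsigned int sectorsize = 4096;       /* assumed */
      const unsigned int stripe_len = 65536;      /* assumed */
      const unsigned int nbits = stripe_len / sectorsize;
      const unsigned int full_stripe_start = 0;   /* assumed logical start */
      const unsigned int write_start = 0;         /* the 8K partial write */
      const unsigned int write_len = 8192;

      for (unsigned int cur = write_start; cur < write_start + write_len;
           cur += sectorsize) {
          unsigned int bit = ((cur - full_stripe_start) / sectorsize) % nbits;
          printf("logical %u -> dbitmap bit %u\n", cur, bit);
      }
      return 0; /* prints dbitmap bits 0 and 1 only, i.e. the |XX| range */
  }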

Cc: stable@vger.kernel.org # 5.10 5.15
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/raid56.c | 55 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 893d93e3c516..ac1e8d2714b0 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -324,6 +324,9 @@ static void merge_rbio(struct btrfs_raid_bio *dest,
 {
 	bio_list_merge(&dest->bio_list, &victim->bio_list);
 	dest->bio_list_bytes += victim->bio_list_bytes;
+	/* Also inherit the bitmaps from @victim. */
+	bitmap_or(dest->dbitmap, victim->dbitmap, dest->dbitmap,
+		  dest->stripe_npages);
 	dest->generic_bio_cnt += victim->generic_bio_cnt;
 	bio_list_init(&victim->bio_list);
 }
@@ -865,6 +868,12 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err)
 
 	if (rbio->generic_bio_cnt)
 		btrfs_bio_counter_sub(rbio->fs_info, rbio->generic_bio_cnt);
+	/*
+	 * Clear the data bitmap, as the rbio may be cached for later usage.
+	 * Do this before unlock_stripe() so there will be no new bio for
+	 * this rbio.
+	 */
+	bitmap_clear(rbio->dbitmap, 0, rbio->stripe_npages);
 
 	/*
 	 * At this moment, rbio->bio_list is empty, however since rbio does not
@@ -1197,6 +1206,9 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	else
 		BUG();
 
+	/* We should have at least one data sector. */
+	ASSERT(bitmap_weight(rbio->dbitmap, rbio->stripe_npages));
+
 	/* at this point we either have a full stripe,
 	 * or we've read the full stripe from the drive.
 	 * recalculate the parity and write the new results.
@@ -1268,6 +1280,11 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
 		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *page;
+
+			/* This vertical stripe has no data, skip it. */
+			if (!test_bit(pagenr, rbio->dbitmap))
+				continue;
+
 			if (stripe < rbio->nr_data) {
 				page = page_in_rbio(rbio, stripe, pagenr, 1);
 				if (!page)
@@ -1292,6 +1309,11 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 
 		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *page;
+
+			/* This vertical stripe has no data, skip it. */
+			if (!test_bit(pagenr, rbio->dbitmap))
+				continue;
+
 			if (stripe < rbio->nr_data) {
 				page = page_in_rbio(rbio, stripe, pagenr, 1);
 				if (!page)
@@ -1715,6 +1737,33 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	run_plug(plug);
 }
 
+/* Add the original bio into rbio->bio_list, and update rbio::dbitmap. */
+static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio)
+{
+	const struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
+	const u64 orig_logical = orig_bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	const u64 full_stripe_start = rbio->bioc->raid_map[0];
+	const u32 orig_len = orig_bio->bi_iter.bi_size;
+	const u32 sectorsize = fs_info->sectorsize;
+	u64 cur_logical;
+
+	ASSERT(orig_logical >= full_stripe_start &&
+	       orig_logical + orig_len <= full_stripe_start +
+	       rbio->nr_data * rbio->stripe_len);
+
+	bio_list_add(&rbio->bio_list, orig_bio);
+	rbio->bio_list_bytes += orig_bio->bi_iter.bi_size;
+
+	/* Update the dbitmap. */
+	for (cur_logical = orig_logical; cur_logical < orig_logical + orig_len;
+	     cur_logical += sectorsize) {
+		int bit = ((u32)(cur_logical - full_stripe_start) >>
+			   fs_info->sectorsize_bits) % rbio->stripe_npages;
+
+		set_bit(bit, rbio->dbitmap);
+	}
+}
+
 /*
  * our main entry point for writes from the rest of the FS.
  */
@@ -1731,9 +1780,8 @@ int raid56_parity_write(struct btrfs_fs_info *fs_info, struct bio *bio,
 		btrfs_put_bioc(bioc);
 		return PTR_ERR(rbio);
 	}
-	bio_list_add(&rbio->bio_list, bio);
-	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 	rbio->operation = BTRFS_RBIO_WRITE;
+	rbio_add_bio(rbio, bio);
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 	rbio->generic_bio_cnt = 1;
@@ -2135,8 +2183,7 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
 	}
 
 	rbio->operation = BTRFS_RBIO_READ_REBUILD;
-	bio_list_add(&rbio->bio_list, bio);
-	rbio->bio_list_bytes = bio->bi_iter.bi_size;
+	rbio_add_bio(rbio, bio);
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
 	if (rbio->faila == -1) {
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH STABLE 5.10 5.15 2/2] btrfs: raid56: don't trust any cached sector in __raid56_parity_recover()
  2022-08-04  7:07 [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Qu Wenruo
  2022-08-04  7:07 ` [PATCH STABLE 5.10 5.15 1/2] btrfs: only write the sectors in the vertical stripe which has data stripes Qu Wenruo
@ 2022-08-04  7:07 ` Qu Wenruo
  2022-08-04 10:25 ` [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Wang Yugui
  2 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-08-04  7:07 UTC (permalink / raw)
  To: linux-btrfs, stable; +Cc: David Sterba

commit f6065f8edeb25f4a9dfe0b446030ad995a84a088 upstream.

[BUG]
There is a small workload which will always fail with recent kernels:
(a simplified version of the btrfs/125 test case)

  mkfs.btrfs -f -m raid5 -d raid5 -b 1G $dev1 $dev2 $dev3
  mount $dev1 $mnt
  xfs_io -f -c "pwrite -S 0xee 0 1M" $mnt/file1
  sync
  umount $mnt
  btrfs dev scan -u $dev3
  mount -o degraded $dev1 $mnt
  xfs_io -f -c "pwrite -S 0xff 0 128M" $mnt/file2
  umount $mnt
  btrfs dev scan
  mount $dev1 $mnt
  btrfs balance start --full-balance $mnt
  umount $mnt

The failure is always about failing to read some tree blocks:

  BTRFS info (device dm-4): relocating block group 217710592 flags data|raid5
  BTRFS error (device dm-4): parent transid verify failed on 38993920 wanted 9 found 7
  BTRFS error (device dm-4): parent transid verify failed on 38993920 wanted 9 found 7
  ...

[CAUSE]
With the recently added debug output, we can see all RAID56 operations
related to full stripe 38928384:

  56.1183: raid56_read_partial: full_stripe=38928384 devid=2 type=DATA1 offset=0 opf=0x0 physical=9502720 len=65536
  56.1185: raid56_read_partial: full_stripe=38928384 devid=3 type=DATA2 offset=16384 opf=0x0 physical=9519104 len=16384
  56.1185: raid56_read_partial: full_stripe=38928384 devid=3 type=DATA2 offset=49152 opf=0x0 physical=9551872 len=16384
  56.1187: raid56_write_stripe: full_stripe=38928384 devid=3 type=DATA2 offset=0 opf=0x1 physical=9502720 len=16384
  56.1188: raid56_write_stripe: full_stripe=38928384 devid=3 type=DATA2 offset=32768 opf=0x1 physical=9535488 len=16384
  56.1188: raid56_write_stripe: full_stripe=38928384 devid=1 type=PQ1 offset=0 opf=0x1 physical=30474240 len=16384
  56.1189: raid56_write_stripe: full_stripe=38928384 devid=1 type=PQ1 offset=32768 opf=0x1 physical=30507008 len=16384
  56.1218: raid56_write_stripe: full_stripe=38928384 devid=3 type=DATA2 offset=49152 opf=0x1 physical=9551872 len=16384
  56.1219: raid56_write_stripe: full_stripe=38928384 devid=1 type=PQ1 offset=49152 opf=0x1 physical=30523392 len=16384
  56.2721: raid56_parity_recover: full stripe=38928384 eb=39010304 mirror=2
  56.2723: raid56_parity_recover: full stripe=38928384 eb=39010304 mirror=2
  56.2724: raid56_parity_recover: full stripe=38928384 eb=39010304 mirror=2

Before we enter raid56_parity_recover(), we have triggered some metadata
writes for the full stripe 38928384, which leads us to read all the
sectors from disk.

Furthermore, the btrfs raid56 write path will cache its calculated P/Q
sectors to avoid unnecessary reads.

This means that, for that full stripe, after any partial write we will
have stale data cached, along with P/Q calculated using that stale data.

Thankfully, due to the patch "btrfs: only write the sectors in the vertical
stripe which has data stripes", we haven't submitted all the corrupted P/Q
to disk.

When we really need to recover a certain range, i.e. in
raid56_parity_recover(), we will use the cached rbio, along with its
cached sectors (the full stripe is all cached).

This explains why no raid56_scrub_read_recover() event is triggered.

Since the cached P/Q was calculated using the stale data, the recovered
data will just be stale as well.

In our particular test case, it will always return the same incorrect
metadata, thus causing the same error message "parent transid verify
failed on 39010304 wanted 9 found 7" again and again.

[BTRFS DESTRUCTIVE RMW PROBLEM]

Test case btrfs/125 (and the above workload) always runs into trouble
with the destructive read-modify-write (RMW) cycle:

        0       32K     64K
Data1:  | Good  | Good  |
Data2:  | Bad   | Bad   |
Parity: | Good  | Good  |

In the above case, if we trigger any write into Data1, we will use the
bad data in Data2 to re-generate parity, killing the only chance to
recover Data2; thus Data2 is lost forever.
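
As a minimal illustration of why the RMW cycle is destructive (a
standalone sketch using single-byte "stripes" and made-up values, not
btrfs code):

  #include <stdio.h>

  int main(void)
  {
      unsigned char d1 = 0xAA, d2 = 0x55;  /* original data stripes */
      unsigned char p  = d1 ^ d2;          /* good on-disk parity */
      unsigned char d2_bad = 0x00;         /* Data2 got corrupted */

      /* Proper recovery: rebuild Data2 from Data1 and the old parity. */
      printf("recovered d2 = 0x%02x\n", d1 ^ p);           /* 0x55, good */

      /*
       * Destructive RMW: write a new Data1 and regenerate parity using
       * the bad Data2 still on disk.
       */
      unsigned char d1_new = 0xF0;
      unsigned char p_new  = d1_new ^ d2_bad;

      /* Rebuilding Data2 now only reproduces the bad data. */
      printf("after RMW, d2 = 0x%02x\n", d1_new ^ p_new);  /* 0x00, lost */
      return 0;
  }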

This destructive RMW cycle is not specific to btrfs RAID56, but there
are some btrfs specific behaviors making the case even worse:

- Btrfs will cache sectors for unrelated vertical stripes.

  In the above example, if we're only writing into the 0~32K range, btrfs
  will still read the data range (32K ~ 64K) of Data1, and (64K ~ 128K) of
  Data2. This behavior is to cache sectors for later updates.

  Incidentally, commit d4e28d9b5f04 ("btrfs: raid56: make steal_rbio()
  subpage compatible") has a bug which makes RAID56 never trust the
  cached sectors, thus slightly improving the situation for recovery.

  Unfortunately, the follow-up fix "btrfs: update stripe_sectors::uptodate
  in steal_rbio" reverts the behavior back to the old one.

- Btrfs raid56 partial write will update all P/Q sectors and cache them

  This means that even if the data at (64K ~ 96K) of Data2 is free
  space, and only (96K ~ 128K) of Data2 is really stale data, a write
  into that (96K ~ 128K) range will update all the parity sectors for
  the full stripe.

  This unnecessary behavior will completely kill the chance of recovery.

  Thankfully, an unrelated optimization, "btrfs: only write the sectors
  in the vertical stripe which has data stripes", prevents submitting
  the write bios for untouched vertical sectors.

  That optimization keeps the on-disk P/Q untouched, leaving a chance
  for later recovery.

[FIX]
Although we have no good way to completely fix the destructive RMW cycle
(unless we do a full scrub for each partial write), we can still limit
the damage.

With the patch "btrfs: only write the sectors in the vertical stripe
which has data stripes", we no longer submit the P/Q of unrelated
vertical stripes, so the on-disk P/Q should still be fine.

Now all we really need to do is drop all the cached sectors when doing
recovery.

By doing this, we have a chance to read the original P/Q from disk and
recover the stale data, while still keeping the cache to speed up the
regular write path.

In fact, just dropping all the cache for the recovery path is good enough
to allow the test case btrfs/125, along with the small script above, to
pass reliably.

The lack of metadata writes after the degraded mount, and forced metadata
COW, is what saves us this time.

So this patch fixes the behavior by not trusting any cache in
__raid56_parity_recover(), solving the problem while still keeping the
cache useful.

But please note that passing this test DOES NOT mean we have solved the
destructive RMW problem; we just do damage control a little better.

Related patches:

- btrfs: only write the sectors in the vertical stripe
- d4e28d9b5f04 ("btrfs: raid56: make steal_rbio() subpage compatible")
- btrfs: update stripe_sectors::uptodate in steal_rbio

Cc: stable@vger.kernel.org # 5.10 5.15
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/raid56.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index ac1e8d2714b0..4415c8917019 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2085,9 +2085,12 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 	atomic_set(&rbio->error, 0);
 
 	/*
-	 * read everything that hasn't failed.  Thanks to the
-	 * stripe cache, it is possible that some or all of these
-	 * pages are going to be uptodate.
+	 * Read everything that hasn't failed. However this time we will
+	 * not trust any cached sector.
+	 * The cached sectors may contain stale data for ranges that the
+	 * higher layer is not reading, along with P/Q calculated from it.
+	 *
+	 * So here we always re-read everything in the recovery path.
 	 */
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
 		if (rbio->faila == stripe || rbio->failb == stripe) {
@@ -2096,16 +2099,6 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 		}
 
 		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *p;
-
-			/*
-			 * the rmw code may have already read this
-			 * page in
-			 */
-			p = rbio_stripe_page(rbio, stripe, pagenr);
-			if (PageUptodate(p))
-				continue;
-
 			ret = rbio_add_io_page(rbio, &bio_list,
 				       rbio_stripe_page(rbio, stripe, pagenr),
 				       stripe, pagenr, rbio->stripe_len);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW
  2022-08-04  7:07 [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Qu Wenruo
  2022-08-04  7:07 ` [PATCH STABLE 5.10 5.15 1/2] btrfs: only write the sectors in the vertical stripe which has data stripes Qu Wenruo
  2022-08-04  7:07 ` [PATCH STABLE 5.10 5.15 2/2] btrfs: raid56: don't trust any cached sector in __raid56_parity_recover() Qu Wenruo
@ 2022-08-04 10:25 ` Wang Yugui
  2022-08-04 11:26   ` Qu Wenruo
  2 siblings, 1 reply; 6+ messages in thread
From: Wang Yugui @ 2022-08-04 10:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, stable

[-- Attachment #1: Type: text/plain, Size: 1727 bytes --]

Hi,

xfstest btrfs/158 triggered a panic after these 2 patches were applied.

btrfs-158-dmesg.txt
	dmesg output when the panic happened
btrfs-158-dmesg-decoded.txt
	dmesg output decoded by decode_stacktrace.sh,
	with some source code added too.

reproduce rate:
	not 100%, but it happened 2 times here.

xfstest './check -g scrub' seems to reproduce this problem at a higher
rate than './check test/btrfs/158'.

linux kernel: 5.15.59 with some local backport patches too.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2022/08/04

> Hi Greg and Sasha,
> 
> These two patches are backports for the v5.15 and v5.10 stable branches
> (for v5.10 the conflicts can be auto-resolved).
>
> (For older branches from v4.9 to v5.4, due to some naming changes,
> the patches can be applied with auto-resolve but won't compile.)
>
> These two patches reduce the chance of a destructive RMW cycle, where
> btrfs can use corrupted data to generate new P/Q, thus making some
> repairable data unrepairable.
>
> These patches turned out to be more important than I initially thought,
> which is why, unfortunately, they were not CCed to stable by themselves.
>
> Furthermore, due to recent refactors/renames, there are quite a few
> member changes related to these patches, thus they have to be manually
> backported.
>
>
> One of the fastest ways to verify the behavior is the existing btrfs/125
> test case from fstests (not in the auto group AFAIK).
> 
> Qu Wenruo (2):
>   btrfs: only write the sectors in the vertical stripe which has data
>     stripes
>   btrfs: raid56: don't trust any cached sector in
>     __raid56_parity_recover()
> 
>  fs/btrfs/raid56.c | 74 ++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 57 insertions(+), 17 deletions(-)
> 
> -- 
> 2.37.0


[-- Attachment #2: btrfs-158-dmesg.txt --]
[-- Type: application/octet-stream, Size: 6377 bytes --]

[ 1852.190978] run fstests btrfs/158 at 2022-08-04 18:00:39
[ 1852.373676] BTRFS info (device sdb1): enabling tiering(tier=auto)
[ 1852.380583] BTRFS info (device sdb1): using free space tree
[ 1852.389925] BTRFS info (device sdb1): enabling ssd optimizations
[ 1852.697009] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 1 transid 6 /dev/sdb2 scanned by systemd-udevd (198490)
[ 1852.709663] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 2 transid 6 /dev/sdb3 scanned by systemd-udevd (198488)
[ 1852.722186] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 3 transid 6 /dev/sdb4 scanned by systemd-udevd (198489)
[ 1852.734797] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 4 transid 6 /dev/sdb5 scanned by systemd-udevd (200492)
[ 1852.935269] BTRFS info (device sdb2): enabling tiering(tier=auto)
[ 1852.942020] BTRFS info (device sdb2): using free space tree
[ 1852.950910] BTRFS info (device sdb2): enabling ssd optimizations
[ 1852.957933] BTRFS info (device sdb2): checking UUID tree
[ 1853.330957] BTRFS info (device sdb2): enabling tiering(tier=auto)
[ 1853.337734] BTRFS info (device sdb2): using free space tree
[ 1853.346797] BTRFS info (device sdb2): enabling ssd optimizations
[ 1853.355651] BTRFS info (device sdb2): scrub: started on devid 1
[ 1853.355666] BTRFS info (device sdb2): scrub: started on devid 3
[ 1853.355683] BTRFS info (device sdb2): scrub: started on devid 2
[ 1853.355764] BTRFS info (device sdb2): scrub: started on devid 4
[ 1853.384159] BTRFS warning (device sdb2): checksum error at logical 298909696 on dev /dev/sdb3, physical 1048576, root 5, inode 257, offset 65536, length 4096, links 1 (path: foobar)
[ 1853.401724] BTRFS error (device sdb2): bdev /dev/sdb3 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 1853.411355] BUG: kernel NULL pointer dereference, address: 0000000000000cec
[ 1853.416714] #PF: supervisor read access in kernel mode
[ 1853.421717] #PF: error_code(0x0000) - not-present page
[ 1853.427714] PGD 0 P4D 0
[ 1853.431718] Oops: 0000 [#1] SMP NOPTI
[ 1853.436714] CPU: 9 PID: 88073 Comm: kworker/u81:5 Not tainted 5.15.59-3.el7.x86_64 #1
[ 1853.446713] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[ 1853.453729] Workqueue: btrfs-scrub btrfs_work_helper [btrfs]
[ 1853.459728] RIP: 0010:rbio_add_bio+0x49/0xc0 [btrfs]
[ 1853.466713] Code: 39 d0 0f 82 dc 69 02 00 8b 87 bc 00 00 00 0f af 87 b8 00 00 00 4d 01 c2 48 98 48 01 d0 49 39 c2 0f 87 be 69 02 00 48 8b 59 08 <44> 8b 9b ec 0c 00 00 48 c7 06 00 00 00 00 48 8b 87 90 00 00 00 48
[ 1853.487713] RSP: 0018:ffffb9bb8edf7c80 EFLAGS: 00010246
[ 1853.493715] RAX: 0000000011d20000 RBX: 0000000000000000 RCX: ffff93601b156b00
[ 1853.498714] RDX: 0000000011d00000 RSI: ffff933ed3ba3b38 RDI: ffff934169de6000
[ 1853.508714] RBP: ffff93601b156b00 R08: 0000000011d10000 R09: ffff934169de6000
[ 1853.513717] R10: 0000000011d20000 R11: 0000000000000000 R12: 0000000000000000
[ 1853.523718] R13: ffff93410745c000 R14: ffff934169de6000 R15: ffff933ed3ba3b38
[ 1853.532723] FS:  0000000000000000(0000) GS:ffff935dafa40000(0000) knlGS:0000000000000000
[ 1853.542714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1853.547713] CR2: 0000000000000cec CR3: 0000001fd0010001 CR4: 00000000001706e0
[ 1853.555726] Call Trace:
[ 1853.560714]  <TASK>
[ 1853.560714]  raid56_parity_recover+0x65/0x1d0 [btrfs]
[ 1853.569716]  scrub_recheck_block+0x271/0x2f0 [btrfs]
[ 1853.574715]  scrub_handle_errored_block+0x7e8/0x10b0 [btrfs]
[ 1853.579721]  scrub_bio_end_io_worker+0xef/0x2f0 [btrfs]
[ 1853.588715]  ? put_prev_task_fair+0x21/0x40
[ 1853.593714]  ? pick_next_task+0x96/0xbe0
[ 1853.598715]  btrfs_work_helper+0xbf/0x300 [btrfs]
[ 1853.603717]  process_one_work+0x1cb/0x370
[ 1853.608715]  worker_thread+0x30/0x380
[ 1853.613716]  ? process_one_work+0x370/0x370
[ 1853.617717]  kthread+0x118/0x140
[ 1853.622714]  ? set_kthread_struct+0x50/0x50
[ 1853.627715]  ret_from_fork+0x1f/0x30
[ 1853.631717]  </TASK>
[ 1853.636721] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp radeon ledtrig_audio snd_hda_codec_hdmi btrfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hda_codec snd_hda_core i2c_algo_bit drm_ttm_helper mei_wdt kvm snd_hwdep raid6_pq zstd_compress snd_seq ttm dcdbas zstd_decompress snd_seq_device irqbypass iTCO_wdt rapl iTCO_vendor_support dell_smm_hwmon snd_pcm intel_cstate drm_kms_helper mei_me snd_timer i2c_i801 syscopyarea intel_uncore pcspkr i2c_smbus sysfillrect dm_mod lpc_ich mei snd sysimgblt fb_sys_fops cec soundcore drm fuse xfs sd_mod t10_pi sr_mod cdrom sg bnx2x ahci crct10dif_pclmul crc32_pclmul crc32c_intel libahci mpt3sas libata ghash_clmulni_intel e1000e mdio raid_class scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[ 1853.728714] CR2: 0000000000000cec
[ 1853.733713] ---[ end trace 7f32564f450c4714 ]---
[ 1853.888717] RIP: 0010:rbio_add_bio+0x49/0xc0 [btrfs]
[ 1853.930725] Code: 39 d0 0f 82 dc 69 02 00 8b 87 bc 00 00 00 0f af 87 b8 00 00 00 4d 01 c2 48 98 48 01 d0 49 39 c2 0f 87 be 69 02 00 48 8b 59 08 <44> 8b 9b ec 0c 00 00 48 c7 06 00 00 00 00 48 8b 87 90 00 00 00 48
[ 1853.949725] RSP: 0018:ffffb9bb8edf7c80 EFLAGS: 00010246
[ 1853.959723] RAX: 0000000011d20000 RBX: 0000000000000000 RCX: ffff93601b156b00
[ 1853.965720] RDX: 0000000011d00000 RSI: ffff933ed3ba3b38 RDI: ffff934169de6000
[ 1853.974724] RBP: ffff93601b156b00 R08: 0000000011d10000 R09: ffff934169de6000
[ 1853.983726] R10: 0000000011d20000 R11: 0000000000000000 R12: 0000000000000000
[ 1853.991725] R13: ffff93410745c000 R14: ffff934169de6000 R15: ffff933ed3ba3b38
[ 1854.001717] FS:  0000000000000000(0000) GS:ffff935dafa40000(0000) knlGS:0000000000000000
[ 1854.011721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1854.016719] CR2: 0000000000000cec CR3: 0000001fd0010001 CR4: 00000000001706e0
[ 1854.026721] Kernel panic - not syncing: Fatal exception
[ 1854.030722] Kernel Offset: 0x2e400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1854.030722] ---[ end Kernel panic - not syncing: Fatal exception ]---
                                                                                

[-- Attachment #3: btrfs-158-dmesg-decoded.txt --]
[-- Type: application/octet-stream, Size: 10203 bytes --]

[ 1852.190978] run fstests btrfs/158 at 2022-08-04 18:00:39
[ 1852.373676] BTRFS info (device sdb1): enabling tiering(tier=auto)
[ 1852.380583] BTRFS info (device sdb1): using free space tree
[ 1852.389925] BTRFS info (device sdb1): enabling ssd optimizations
[ 1852.697009] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 1 transid 6 /dev/sdb2 scanned by systemd-udevd (198490)
[ 1852.709663] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 2 transid 6 /dev/sdb3 scanned by systemd-udevd (198488)
[ 1852.722186] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 3 transid 6 /dev/sdb4 scanned by systemd-udevd (198489)
[ 1852.734797] BTRFS: device fsid cb1b521a-5287-4161-bf81-b409f6bd9cc4 devid 4 transid 6 /dev/sdb5 scanned by systemd-udevd (200492)
[ 1852.935269] BTRFS info (device sdb2): enabling tiering(tier=auto)
[ 1852.942020] BTRFS info (device sdb2): using free space tree
[ 1852.950910] BTRFS info (device sdb2): enabling ssd optimizations
[ 1852.957933] BTRFS info (device sdb2): checking UUID tree
[ 1853.330957] BTRFS info (device sdb2): enabling tiering(tier=auto)
[ 1853.337734] BTRFS info (device sdb2): using free space tree
[ 1853.346797] BTRFS info (device sdb2): enabling ssd optimizations
[ 1853.355651] BTRFS info (device sdb2): scrub: started on devid 1
[ 1853.355666] BTRFS info (device sdb2): scrub: started on devid 3
[ 1853.355683] BTRFS info (device sdb2): scrub: started on devid 2
[ 1853.355764] BTRFS info (device sdb2): scrub: started on devid 4
[ 1853.384159] BTRFS warning (device sdb2): checksum error at logical 298909696 on dev /dev/sdb3, physical 1048576, root 5, inode 257, offset 65536, length 4096, links 1 (path: foobar)
[ 1853.401724] BTRFS error (device sdb2): bdev /dev/sdb3 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 1853.411355] BUG: kernel NULL pointer dereference, address: 0000000000000cec
[ 1853.416714] #PF: supervisor read access in kernel mode
[ 1853.421717] #PF: error_code(0x0000) - not-present page
[ 1853.427714] PGD 0 P4D 0
[ 1853.431718] Oops: 0000 [#1] SMP NOPTI
[ 1853.436714] CPU: 9 PID: 88073 Comm: kworker/u81:5 Not tainted 5.15.59-3.el7.x86_64 #1
[ 1853.446713] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[ 1853.453729] Workqueue: btrfs-scrub btrfs_work_helper [btrfs]
[ 1853.459728] RIP: 0010:rbio_add_bio (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/raid56.c:1747) btrfs
[ 1853.466713] Code: 39 d0 0f 82 dc 69 02 00 8b 87 bc 00 00 00 0f af 87 b8 00 00 00 4d 01 c2 48 98 48 01 d0 49 39 c2 0f 87 be 69 02 00 48 8b 59 08 <44> 8b 9b ec 0c 00 00 48 c7 06 00 00 00 00 48 8b 87 90 00 00 00 48
All code
========
   0:	39 d0                	cmp    %edx,%eax
   2:	0f 82 dc 69 02 00    	jb     0x269e4
   8:	8b 87 bc 00 00 00    	mov    0xbc(%rdi),%eax
   e:	0f af 87 b8 00 00 00 	imul   0xb8(%rdi),%eax
  15:	4d 01 c2             	add    %r8,%r10
  18:	48 98                	cltq   
  1a:	48 01 d0             	add    %rdx,%rax
  1d:	49 39 c2             	cmp    %rax,%r10
  20:	0f 87 be 69 02 00    	ja     0x269e4
  26:	48 8b 59 08          	mov    0x8(%rcx),%rbx
  2a:*	44 8b 9b ec 0c 00 00 	mov    0xcec(%rbx),%r11d		<-- trapping instruction
  31:	48 c7 06 00 00 00 00 	movq   $0x0,(%rsi)
  38:	48 8b 87 90 00 00 00 	mov    0x90(%rdi),%rax
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	44 8b 9b ec 0c 00 00 	mov    0xcec(%rbx),%r11d
   7:	48 c7 06 00 00 00 00 	movq   $0x0,(%rsi)
   e:	48 8b 87 90 00 00 00 	mov    0x90(%rdi),%rax
  15:	48                   	rex.W
[ 1853.487713] RSP: 0018:ffffb9bb8edf7c80 EFLAGS: 00010246
[ 1853.493715] RAX: 0000000011d20000 RBX: 0000000000000000 RCX: ffff93601b156b00
[ 1853.498714] RDX: 0000000011d00000 RSI: ffff933ed3ba3b38 RDI: ffff934169de6000
[ 1853.508714] RBP: ffff93601b156b00 R08: 0000000011d10000 R09: ffff934169de6000
[ 1853.513717] R10: 0000000011d20000 R11: 0000000000000000 R12: 0000000000000000
[ 1853.523718] R13: ffff93410745c000 R14: ffff934169de6000 R15: ffff933ed3ba3b38
[ 1853.532723] FS:  0000000000000000(0000) GS:ffff935dafa40000(0000) knlGS:0000000000000000
[ 1853.542714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1853.547713] CR2: 0000000000000cec CR3: 0000001fd0010001 CR4: 00000000001706e0
[ 1853.555726] Call Trace:
[ 1853.560714]  <TASK>
[ 1853.560714] raid56_parity_recover (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/raid56.c:1385 /usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/raid56.c:2181) btrfs

static int find_logical_bio_stripe(struct btrfs_raid_bio *rbio,
                   struct bio *bio)
{
L1385:    u64 logical = bio->bi_iter.bi_sector << 9;


    rbio->operation = BTRFS_RBIO_READ_REBUILD;
    rbio_add_bio(rbio, bio);

L2181:    rbio->faila = find_logical_bio_stripe(rbio, bio);
    if (rbio->faila == -1) {
        btrfs_warn(fs_info,

[ 1853.569716] scrub_recheck_block (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/scrub.c:1406 /usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/scrub.c:1435 /usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/scrub.c:1469) btrfs
[ 1853.574715] scrub_handle_errored_block (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/scrub.c:1046) btrfs
[ 1853.579721] scrub_bio_end_io_worker (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/scrub.c:2465 /usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/scrub.c:2388) btrfs
[ 1853.588715] ? put_prev_task_fair (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/sched/fair.c:7430 (discriminator 2)) 
[ 1853.593714] ? pick_next_task (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/sched/sched.h:2186 /usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/sched/core.c:5611 /usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/sched/core.c:5725) 
[ 1853.598715] btrfs_work_helper (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/async-thread.c:325) btrfs
[ 1853.603717] process_one_work (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/workqueue.c:2306) 
[ 1853.608715] worker_thread (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/include/linux/list.h:290 /usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/workqueue.c:2454) 
[ 1853.613716] ? process_one_work (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/workqueue.c:2396) 
[ 1853.617717] kthread (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/kthread.c:319) 
[ 1853.622714] ? set_kthread_struct (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/kernel/kthread.c:272) 
[ 1853.627715] ret_from_fork (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/arch/x86/entry/entry_64.S:298) 
[ 1853.631717]  </TASK>
[ 1853.636721] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill ib_core sunrpc dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic coretemp radeon ledtrig_audio snd_hda_codec_hdmi btrfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_hda_codec snd_hda_core i2c_algo_bit drm_ttm_helper mei_wdt kvm snd_hwdep raid6_pq zstd_compress snd_seq ttm dcdbas zstd_decompress snd_seq_device irqbypass iTCO_wdt rapl iTCO_vendor_support dell_smm_hwmon snd_pcm intel_cstate drm_kms_helper mei_me snd_timer i2c_i801 syscopyarea intel_uncore pcspkr i2c_smbus sysfillrect dm_mod lpc_ich mei snd sysimgblt fb_sys_fops cec soundcore drm fuse xfs sd_mod t10_pi sr_mod cdrom sg bnx2x ahci crct10dif_pclmul crc32_pclmul crc32c_intel libahci mpt3sas libata ghash_clmulni_intel e1000e mdio raid_class scsi_transport_sas wmi i2c_dev ipmi_devintf ipmi_msghandler
[ 1853.728714] CR2: 0000000000000cec
[ 1853.733713] ---[ end trace 7f32564f450c4714 ]---
[ 1853.888717] RIP: 0010:rbio_add_bio (/usr/src/debug/kernel-5.15.59/linux-5.15.59-3.el7.x86_64/fs/btrfs/raid56.c:1747) btrfs
[ 1853.930725] Code: 39 d0 0f 82 dc 69 02 00 8b 87 bc 00 00 00 0f af 87 b8 00 00 00 4d 01 c2 48 98 48 01 d0 49 39 c2 0f 87 be 69 02 00 48 8b 59 08 <44> 8b 9b ec 0c 00 00 48 c7 06 00 00 00 00 48 8b 87 90 00 00 00 48
All code
========
   0:	39 d0                	cmp    %edx,%eax
   2:	0f 82 dc 69 02 00    	jb     0x269e4
   8:	8b 87 bc 00 00 00    	mov    0xbc(%rdi),%eax
   e:	0f af 87 b8 00 00 00 	imul   0xb8(%rdi),%eax
  15:	4d 01 c2             	add    %r8,%r10
  18:	48 98                	cltq   
  1a:	48 01 d0             	add    %rdx,%rax
  1d:	49 39 c2             	cmp    %rax,%r10
  20:	0f 87 be 69 02 00    	ja     0x269e4
  26:	48 8b 59 08          	mov    0x8(%rcx),%rbx
  2a:*	44 8b 9b ec 0c 00 00 	mov    0xcec(%rbx),%r11d		<-- trapping instruction
  31:	48 c7 06 00 00 00 00 	movq   $0x0,(%rsi)
  38:	48 8b 87 90 00 00 00 	mov    0x90(%rdi),%rax
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	44 8b 9b ec 0c 00 00 	mov    0xcec(%rbx),%r11d
   7:	48 c7 06 00 00 00 00 	movq   $0x0,(%rsi)
   e:	48 8b 87 90 00 00 00 	mov    0x90(%rdi),%rax
  15:	48                   	rex.W
[ 1853.949725] RSP: 0018:ffffb9bb8edf7c80 EFLAGS: 00010246
[ 1853.959723] RAX: 0000000011d20000 RBX: 0000000000000000 RCX: ffff93601b156b00
[ 1853.965720] RDX: 0000000011d00000 RSI: ffff933ed3ba3b38 RDI: ffff934169de6000
[ 1853.974724] RBP: ffff93601b156b00 R08: 0000000011d10000 R09: ffff934169de6000
[ 1853.983726] R10: 0000000011d20000 R11: 0000000000000000 R12: 0000000000000000
[ 1853.991725] R13: ffff93410745c000 R14: ffff934169de6000 R15: ffff933ed3ba3b38
[ 1854.001717] FS:  0000000000000000(0000) GS:ffff935dafa40000(0000) knlGS:0000000000000000
[ 1854.011721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1854.016719] CR2: 0000000000000cec CR3: 0000001fd0010001 CR4: 00000000001706e0
[ 1854.026721] Kernel panic - not syncing: Fatal exception
[ 1854.030722] Kernel Offset: 0x2e400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1854.030722] ---[ end Kernel panic - not syncing: Fatal exception ]---

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW
  2022-08-04 10:25 ` [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Wang Yugui
@ 2022-08-04 11:26   ` Qu Wenruo
  2022-08-04 11:31     ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2022-08-04 11:26 UTC (permalink / raw)
  To: Wang Yugui, Qu Wenruo; +Cc: linux-btrfs, stable



On 2022/8/4 18:25, Wang Yugui wrote:
> Hi,
>
> xfstest btrfs/158 triggered a panic after these 2 patches were applied.
>
> btrfs-158-dmesg.txt
> 	dmesg output when the panic happened
> btrfs-158-dmesg-decoded.txt
> 	dmesg output decoded by decode_stacktrace.sh,
> 	with some source code added too.
>
> reproduce rate:
> 	not 100%, but it happened 2 times here.
>
> xfstest './check -g scrub' seems to reproduce this problem at a higher
> rate than './check test/btrfs/158'.

Also reproduced here running that in a loop.

>
> linux kernel: 5.15.59 with some local backport patches too.

Got the reason pinned down: a missing dependency.

The code triggering the crash is "const u32 sectorsize =
fs_info->sectorsize", and @fs_info is from bioc.

But bioc initialization doesn't ensure every bioc has its fs_info
initialized.

That is only ensured by commit 731ccf15c952 ("btrfs: make sure
btrfs_io_context::fs_info is always initialized").

So I also need to backport that patch.

Weirdly, I ran my tests with "-g raid -g replace -g scrub" but didn't
trigger this on even older branches.

I'll do more tests to make sure it doesn't cause problems.

Thanks,
Qu


>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2022/08/04
>
>> Hi Greg and Sasha,
>>
>> These two patches are backports for the v5.15 and v5.10 stable branches
>> (for v5.10 the conflicts can be auto-resolved).
>>
>> (For older branches from v4.9 to v5.4, due to some naming changes,
>> the patches can be applied with auto-resolve but won't compile.)
>>
>> These two patches reduce the chance of a destructive RMW cycle, where
>> btrfs can use corrupted data to generate new P/Q, thus making some
>> repairable data unrepairable.
>>
>> These patches turned out to be more important than I initially thought,
>> which is why, unfortunately, they were not CCed to stable by themselves.
>>
>> Furthermore, due to recent refactors/renames, there are quite a few
>> member changes related to these patches, thus they have to be manually
>> backported.
>>
>>
>> One of the fastest ways to verify the behavior is the existing btrfs/125
>> test case from fstests (not in the auto group AFAIK).
>>
>> Qu Wenruo (2):
>>    btrfs: only write the sectors in the vertical stripe which has data
>>      stripes
>>    btrfs: raid56: don't trust any cached sector in
>>      __raid56_parity_recover()
>>
>>   fs/btrfs/raid56.c | 74 ++++++++++++++++++++++++++++++++++++-----------
>>   1 file changed, 57 insertions(+), 17 deletions(-)
>>
>> --
>> 2.37.0
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW
  2022-08-04 11:26   ` Qu Wenruo
@ 2022-08-04 11:31     ` Qu Wenruo
  0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-08-04 11:31 UTC (permalink / raw)
  To: Wang Yugui, Qu Wenruo; +Cc: linux-btrfs, stable



On 2022/8/4 19:26, Qu Wenruo wrote:
>
>
> On 2022/8/4 18:25, Wang Yugui wrote:
>> Hi,
>>
>> xfstest btrfs/158 triggered a panic after these 2 patches were applied.
>>
>> btrfs-158-dmesg.txt
>>     dmesg output when the panic happened
>> btrfs-158-dmesg-decoded.txt
>>     dmesg output decoded by decode_stacktrace.sh,
>>     with some source code added too.
>>
>> reproduce rate:
>>     not 100%, but it happened 2 times here.
>>
>> xfstest './check -g scrub' seems to reproduce this problem at a higher
>> rate than './check test/btrfs/158'.
>
> Also reproduced here running that in a loop.
>
>>
>> linux kernel: 5.15.59 with some local backport patches too.
>
> Got the reason pinned down: a missing dependency.
>
> The code triggering the crash is "const u32 sectorsize =
> fs_info->sectorsize", and @fs_info is from bioc.
>
> But bioc initialization doesn't ensure every bioc has its fs_info
> initialized.
>
> That is only ensured by commit 731ccf15c952 ("btrfs: make sure
> btrfs_io_context::fs_info is always initialized").

Wait, it can be done without that dependency, just use the old
btrfs_raid_bio::fs_info member.
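
For reference, a minimal sketch of that alternative against the
rbio_add_bio() added in patch 1 (assuming the v5.15 backport keeps the
btrfs_raid_bio::fs_info member, as already used elsewhere in these
patches):

-	const struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
+	const struct btrfs_fs_info *fs_info = rbio->fs_info;

That avoids dereferencing bioc->fs_info before commit 731ccf15c952 is
in place.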

Thanks,
Qu

>
> So I also need to backport that patch.
>
> Weirdly, I ran my tests with "-g raid -g replace -g scrub" but didn't
> trigger this on even older branches.
>
> I'll do more tests to make sure it doesn't cause problems.
>
> Thanks,
> Qu
>
>
>>
>> Best Regards
>> Wang Yugui (wangyugui@e16-tech.com)
>> 2022/08/04
>>
>>> Hi Greg and Sasha,
>>>
>>> These two patches are backports for the v5.15 and v5.10 stable
>>> branches (for v5.10 the conflicts can be auto-resolved).
>>>
>>> (For older branches from v4.9 to v5.4, due to some naming changes,
>>> the patches can be applied with auto-resolve but won't compile.)
>>>
>>> These two patches reduce the chance of a destructive RMW cycle, where
>>> btrfs can use corrupted data to generate new P/Q, thus making some
>>> repairable data unrepairable.
>>>
>>> These patches turned out to be more important than I initially thought,
>>> which is why, unfortunately, they were not CCed to stable by themselves.
>>>
>>> Furthermore, due to recent refactors/renames, there are quite a few
>>> member changes related to these patches, thus they have to be manually
>>> backported.
>>>
>>>
>>> One of the fastest ways to verify the behavior is the existing btrfs/125
>>> test case from fstests (not in the auto group AFAIK).
>>>
>>> Qu Wenruo (2):
>>>    btrfs: only write the sectors in the vertical stripe which has data
>>>      stripes
>>>    btrfs: raid56: don't trust any cached sector in
>>>      __raid56_parity_recover()
>>>
>>>   fs/btrfs/raid56.c | 74 ++++++++++++++++++++++++++++++++++++-----------
>>>   1 file changed, 57 insertions(+), 17 deletions(-)
>>>
>>> --
>>> 2.37.0
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-04 11:32 UTC | newest]

Thread overview: 6+ messages
2022-08-04  7:07 [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Qu Wenruo
2022-08-04  7:07 ` [PATCH STABLE 5.10 5.15 1/2] btrfs: only write the sectors in the vertical stripe which has data stripes Qu Wenruo
2022-08-04  7:07 ` [PATCH STABLE 5.10 5.15 2/2] btrfs: raid56: don't trust any cached sector in __raid56_parity_recover() Qu Wenruo
2022-08-04 10:25 ` [PATCH STABLE 5.10 5.15 0/2] btrfs: raid56 backports to reduce destructive RMW Wang Yugui
2022-08-04 11:26   ` Qu Wenruo
2022-08-04 11:31     ` Qu Wenruo
