* Buggy behaviour after replacing failed disk in RAID1
From: 小太 @ 2023-01-01 0:05 UTC
To: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 4634 bytes --]
dmesg (with spammed similar lines removed): See attachment
uname -a: Linux home.kota.moe 6.0.0-5-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.0.10-2 (2022-12-01) x86_64 GNU/Linux
btrfs --version: btrfs-progs v6.1
Recently one of my disks in my BTRFS RAID1 array failed, so I
disconnected it from the system:
> kota@home:~$ sudo btrfs fi show
> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
> Total devices 3 FS bytes used 1.47TiB
> devid 1 size 0 used 0 path /dev/sda1 MISSING
> devid 2 size 1.82TiB used 956.00GiB path /dev/sdc1
> devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
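The failed disk shows up as `devid 1 ... MISSING` in the listing above. As a minimal sketch, assuming the plain-text layout quoted here (btrfs-progs does not promise it as a stable machine-readable interface), the missing devids can be pulled out of captured `btrfs fi show` output. `Device`, `parse_fi_show` and `missing_devids` are hypothetical names, not part of btrfs-progs.

```python
import re
from typing import NamedTuple


class Device(NamedTuple):
    devid: int
    size: str
    used: str
    path: str
    missing: bool


# Matches `btrfs fi show` device lines such as
#   devid 2 size 1.82TiB used 956.00GiB path /dev/sdc1
#   devid 1 size 0 used 0 path /dev/sda1 MISSING
# Leading whitespace and an optional "> " (quoted mail text) are tolerated.
DEVICE_RE = re.compile(
    r"^\s*(?:>\s*)?devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)"
    r"\s+path\s+(\S+)(\s+MISSING)?\s*$"
)


def parse_fi_show(text: str) -> list[Device]:
    """Return one Device per devid line; all other lines are ignored."""
    devices = []
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            devid, size, used, path, missing = m.groups()
            devices.append(Device(int(devid), size, used, path, missing is not None))
    return devices


def missing_devids(text: str) -> list[int]:
    """Devids that btrfs flagged as MISSING."""
    return [d.devid for d in parse_fi_show(text) if d.missing]


if __name__ == "__main__":
    sample = """\
Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
Total devices 3 FS bytes used 1.47TiB
devid 1 size 0 used 0 path /dev/sda1 MISSING
devid 2 size 1.82TiB used 956.00GiB path /dev/sdc1
devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
"""
    print(missing_devids(sample))  # [1]
```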
I then connected a new disk and started a btrfs replace:
> kota@home:~$ sudo btrfs replace start 1 /dev/sdd1 /media/Data
> kota@home:~$ sudo btrfs fi show
> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
> Total devices 4 FS bytes used 1.47TiB
> devid 0 size 1.36TiB used 925.03GiB path /dev/sdd1
> devid 1 size 0 used 0 path /dev/sda1 MISSING
> devid 2 size 1.82TiB used 956.00GiB path /dev/sdc1
> devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
> kota@home:~$ sudo btrfs replace status /media/Data
> Started on 31.Dec 22:03:20, finished on 1.Jan 02:30:26, 0 write errs, 0 uncorr. read errs
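The status line reports 0 uncorrectable read errors even though the old disk kept failing reads. That is expected if each failed read on `/dev/sda1` was repaired from the RAID1 copy on another device, which the "fixed up error" lines below suggest. A small sketch for extracting the counters from this status line, assuming exactly the wording shown (other states, such as a progress percentage while running, are not handled); `parse_replace_status` is a hypothetical helper.

```python
import re


def parse_replace_status(line: str) -> dict:
    """Extract the completion flag and error counters from one
    `btrfs replace status` line worded like the one in this report."""
    write = re.search(r"(\d+) write errs", line)
    uncorr = re.search(r"(\d+) uncorr\. read errs", line)
    return {
        "finished": "finished on" in line,
        "write_errs": int(write.group(1)) if write else None,
        "uncorr_read_errs": int(uncorr.group(1)) if uncorr else None,
    }


if __name__ == "__main__":
    status = ("Started on 31.Dec 22:03:20, finished on 1.Jan 02:30:26, "
              "0 write errs, 0 uncorr. read errs")
    print(parse_replace_status(status))
    # {'finished': True, 'write_errs': 0, 'uncorr_read_errs': 0}
```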
This operation spammed my dmesg with lines all looking like the following:
> [1762170.345526] BTRFS error (device sda1): fixed up error at logical 6863996235776 on dev /dev/sda1
> ...
> [1762174.289119] btrfs_dev_stat_inc_and_print: 91111 callbacks suppressed
> [1762174.289123] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 4147, rd 220063627, flush 0, corrupt 0, gen 0
> ...
> [1762175.320807] scrub_handle_errored_block: 91075 callbacks suppressed
> [1762175.320850] BTRFS warning (device sda1): i/o error at logical 6909072957440 on dev /dev/sda1, physical 1395819732992, root 257, inode 452632, offset 1293344768, length 4096, links 1 (path: phoronix-test-suite/config/installed-tests/pts/unigine-super-1.0.7/Unigine_Superposition-1.0.run)
> ...
> [1762175.348848] scrub_handle_errored_block: 91080 callbacks suppressed
> [1762175.348851] BTRFS error (device sda1): fixed up error at logical 6909075079168 on dev /dev/sda1
> ...
> [1762176.154002] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
> ...
> [1762176.172957] BTRFS error (device sda1): error writing primary super block to device 1
>
> [1762176.196418] BTRFS info (device sda1): dev_replace from /dev/sda1 (devid 1) to /dev/sdd1 finished
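The `bdev /dev/sda1 errs: wr ..., rd ...` lines print the per-device error counters that `btrfs device stats` also reports (write, read and flush I/O errors, corruption errors, generation errors), and the "callbacks suppressed" lines mean the kernel rate-limited most of those messages. A sketch that keeps the latest counters per device from saved dmesg text, assuming the message format quoted above; `parse_dev_errs` and `latest_counters` are hypothetical names.

```python
import re

ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_dev_errs(line: str):
    """Return (device, counters) for a 'bdev ... errs:' line, or None."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


def latest_counters(lines):
    """Keep the most recent counters seen for each device."""
    latest = {}
    for line in lines:
        parsed = parse_dev_errs(line)
        if parsed:
            dev, counters = parsed
            latest[dev] = counters
    return latest


if __name__ == "__main__":
    log = [
        "[1762174.289123] BTRFS error (device sda1): bdev /dev/sda1 errs: "
        "wr 4147, rd 220063627, flush 0, corrupt 0, gen 0",
        "[1762176.154002] BTRFS warning (device sda1): lost page write "
        "due to IO error on /dev/sda1 (-5)",
    ]
    print(latest_counters(log))
    # {'/dev/sda1': {'wr': 4147, 'rd': 220063627, 'flush': 0, 'corrupt': 0, 'gen': 0}}
```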
Once it finished, I wanted to check the filesystem and do a scrub:
> kota@home:~$ sudo btrfs fi show
> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
> Total devices 4 FS bytes used 1.47TiB
> devid 0 size 0 used 0 path /dev/sda1 MISSING
> devid 1 size 1.36TiB used 925.03GiB path /dev/sdd1
> devid 2 size 1.82TiB used 956.03GiB path /dev/sdc1
> devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
> kota@home:~$ sudo btrfs fi df /media/Data
> Data, RAID1: total=1.59TiB, used=1.46TiB
> System, RAID1: total=64.00MiB, used=272.00KiB
> Metadata, RAID1: total=5.00GiB, used=3.42GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> kota@home:~$ sudo btrfs device delete missing /media/Data
> ERROR: unable to start device remove, another exclusive operation 'device replace' in progress
> kota@home:~$ sudo btrfs scrub start /media/Data
At this point, the command froze and has been stuck in uninterruptible
sleep ever since (with no further lines in dmesg either).
"btrfs device stats" and "btrfs scrub status" also hang in
uninterruptible sleep.
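The earlier `device delete missing` refusal shows btrfs still had the replace registered. btrfs runs at most one exclusive operation (balance, device add/remove/replace, resize and a few others) per filesystem at a time, and recent kernels (the btrfs documentation lists 5.10 as the first) publish the current one in `/sys/fs/btrfs/<FSID>/exclusive_operation`, where `<FSID>` is the uuid printed by `btrfs fi show`. btrfs-progs appears to take the operation name in that error message from this file. A minimal sketch that reads it, with the sysfs root as a parameter so it can be tested against a fake directory tree; whether this read also blocks on a wedged filesystem like the ioctls above has not been verified. `exclusive_operation` and `can_start_exclusive_op` are hypothetical helper names.

```python
from pathlib import Path

SYSFS_BTRFS = Path("/sys/fs/btrfs")


def exclusive_operation(fsid: str, base: Path = SYSFS_BTRFS) -> str:
    """Return the exclusive operation the kernel reports for `fsid`,
    e.g. "none" or "device replace".

    Raises FileNotFoundError if the filesystem is not mounted or the
    kernel does not provide the file.
    """
    return (base / fsid / "exclusive_operation").read_text().strip()


def can_start_exclusive_op(fsid: str, base: Path = SYSFS_BTRFS) -> bool:
    """True when no exclusive operation is currently registered."""
    return exclusive_operation(fsid, base) == "none"


if __name__ == "__main__":
    fsid = "b7e4da98-b885-4e70-b0a4-510fb77fa744"  # uuid from `btrfs fi show`
    try:
        print(exclusive_operation(fsid))
    except FileNotFoundError:
        print("no exclusive_operation file for", fsid)
```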
I also notice that the original "btrfs replace" command still seems to be running:
> kota@home:~$ ps aux | grep btrfs
> root 3626272 0.5 0.0 7044 2312 ? Ds 2022 3:56 btrfs replace start 1 /dev/sdd1 /media/Data
> root 3832422 0.0 0.0 4948 1408 pts/4 D+ 09:54 0:00 btrfs scrub start /media/Data
> root 3832891 0.0 0.0 4948 1412 pts/7 D+ 09:56 0:00 btrfs scrub status /media/Data
> root 3832943 0.0 0.0 4948 1452 pts/9 D+ 09:56 0:00 btrfs device stats /media/Data
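The post-replace `btrfs fi show` output fits with the replace still running. During a replace, btrfs gives the target the reserved devid 0 (as `/dev/sdd1` shows in the first listing); on completion the target takes over the source's devid and the source is removed. In the post-replace listing the first half happened (`/dev/sdd1` is now devid 1), but `/dev/sda1` is still listed, now under devid 0. That would be consistent with the replace stalling in its final step, though this is an interpretation rather than a confirmed diagnosis. A sketch that flags such a leftover entry, assuming the `btrfs fi show` layout quoted above (`devid_paths` and `leftover_replace_device` are hypothetical names):

```python
import re

# The kernel reserves devid 0 (BTRFS_DEV_REPLACE_DEVID) for the replace
# target while a replace runs; normally no device keeps devid 0 once the
# replace has fully finished.
REPLACE_DEVID = 0

DEVID_RE = re.compile(r"devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)")


def devid_paths(fi_show_text: str) -> dict[int, str]:
    """Map devid -> path for every device line in `btrfs fi show` output."""
    return {int(devid): path for devid, path in DEVID_RE.findall(fi_show_text)}


def leftover_replace_device(fi_show_text: str):
    """Return the path still listed under devid 0, or None if there is none."""
    return devid_paths(fi_show_text).get(REPLACE_DEVID)


if __name__ == "__main__":
    after_replace = """\
devid 0 size 0 used 0 path /dev/sda1 MISSING
devid 1 size 1.36TiB used 925.03GiB path /dev/sdd1
devid 2 size 1.82TiB used 956.03GiB path /dev/sdc1
devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
"""
    print(leftover_replace_device(after_replace))  # /dev/sda1
```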
And all (non-cached) accesses to the filesystem (such as ls -l) also
hang similarly.
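The `Ds`/`D+` entries in the STAT column of the `ps aux` output above mark uninterruptible sleep: the process is blocked inside a kernel call and generally cannot be interrupted by signals until that call returns, which matches the hung commands and `ls -l`. A sketch that lists such processes from `ps aux` output, assuming the default column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); `d_state_processes` is a hypothetical name.

```python
def d_state_processes(ps_aux_output: str) -> list[tuple[int, str]]:
    """Return (pid, command) for every `ps aux` row whose STAT starts with "D"."""
    found = []
    for line in ps_aux_output.splitlines():
        # Split into the 10 fixed columns plus the full COMMAND string.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("D"):
            found.append((pid, command))
    return found


if __name__ == "__main__":
    sample = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3626272 0.5 0.0 7044 2312 ? Ds 2022 3:56 btrfs replace start 1 /dev/sdd1 /media/Data
root 3832422 0.0 0.0 4948 1408 pts/4 D+ 09:54 0:00 btrfs scrub start /media/Data
"""
    for pid, command in d_state_processes(sample):
        print(pid, command)
```

For each PID found, `/proc/<pid>/stack` (readable by root) shows the kernel stack the process is blocked in.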
Given that this is a "reliable" RAID1 setup where a disk dropping dead
shouldn't cause problems, BTRFS failing like this sounds like a bug to
me.
What's going on here, and how should I recover from this?
I haven't tried remounting or restarting the system yet, since this
isn't a critical filesystem for me and leaving it in its current state
might help debug the issue.
[-- Attachment #2: dmesg.log --]
[-- Type: application/octet-stream, Size: 45946 bytes --]
[1533863.933772] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0xc0000 action 0x0
[1533863.933778] ata4.00: irq_stat 0x40000001
[1533863.933779] ata4: SError: { CommWake 10B8B }
[1533863.933782] ata4.00: failed command: READ DMA EXT
[1533863.933783] ata4.00: cmd 25/00:08:68:fc:e5/00:00:96:00:00/e0 tag 4 dma 4096 in
res 51/40:00:68:fc:e5/00:00:96:00:00/e0 Emask 0x9 (media error)
[1533863.933787] ata4.00: status: { DRDY ERR }
[1533863.933788] ata4.00: error: { UNC }
[1533863.941067] ata4.00: configured for UDMA/133
[1533863.941077] sd 3:0:0:0: [sda] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
[1533863.941080] sd 3:0:0:0: [sda] tag#4 Sense Key : Medium Error [current]
[1533863.941082] sd 3:0:0:0: [sda] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
[1533863.941084] sd 3:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 96 e5 fc 68 00 00 08 00
[1533863.941085] I/O error, dev sda, sector 2531654760 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 2
[1533863.941093] ata4: EH complete
[1533866.948142] BTRFS warning (device sda1): i/o error at logical 5180593061888 on dev /dev/sda1, physical 1296206184448: metadata leaf (level 0) in tree 2
[1533866.948148] BTRFS warning (device sda1): i/o error at logical 5180593061888 on dev /dev/sda1, physical 1296206184448: metadata leaf (level 0) in tree 2
[1533866.948151] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 3, rd 555, flush 0, corrupt 0, gen 0
[1533867.038990] BTRFS error (device sda1): fixed up error at logical 5180610412544 on dev /dev/sda1
[1533867.054925] BTRFS error (device sda1): fixed up error at logical 5180593061888 on dev /dev/sda1
[1533875.301298] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0xc0000 action 0x0
[1533875.301304] ata4.00: irq_stat 0x40000001
[1533875.301305] ata4: SError: { CommWake 10B8B }
[1533875.301307] ata4.00: failed command: READ DMA EXT
[1533875.301309] ata4.00: cmd 25/00:20:00:4f:f6/00:00:96:00:00/e0 tag 6 dma 16384 in
res 51/40:0f:00:4f:f6/00:00:96:00:00/e0 Emask 0x9 (media error)
[1533875.301313] ata4.00: status: { DRDY ERR }
[1533875.301314] ata4.00: error: { UNC }
[1533875.308506] ata4.00: configured for UDMA/133
[1533875.308517] sd 3:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[1533875.308520] sd 3:0:0:0: [sda] tag#6 Sense Key : Medium Error [current]
[1533875.308521] sd 3:0:0:0: [sda] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
[1533875.308523] sd 3:0:0:0: [sda] tag#6 CDB: Read(10) 28 00 96 f6 4f 00 00 00 20 00
[1533875.308524] I/O error, dev sda, sector 2532724480 op 0x0:(READ) flags 0x0 phys_seg 3 prio class 3
[1533875.308534] ata4: EH complete
[1533879.009144] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0xc0000 action 0x0
[1533879.009150] ata4.00: irq_stat 0x40000001
[1533879.009151] ata4: SError: { CommWake 10B8B }
[1533879.009153] ata4.00: failed command: READ DMA EXT
[1533879.009155] ata4.00: cmd 25/00:20:e0:66:f6/00:00:96:00:00/e0 tag 9 dma 16384 in
res 51/40:0f:e0:66:f6/00:00:96:00:00/e0 Emask 0x9 (media error)
[1533879.009159] ata4.00: status: { DRDY ERR }
[1533879.009160] ata4.00: error: { UNC }
[1533879.016370] ata4.00: configured for UDMA/133
[1533879.016382] sd 3:0:0:0: [sda] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[1533879.016385] sd 3:0:0:0: [sda] tag#9 Sense Key : Medium Error [current]
[1533879.016386] sd 3:0:0:0: [sda] tag#9 Add. Sense: Unrecovered read error - auto reallocate failed
[1533879.016388] sd 3:0:0:0: [sda] tag#9 CDB: Read(10) 28 00 96 f6 66 e0 00 00 20 00
[1533879.016390] I/O error, dev sda, sector 2532730592 op 0x0:(READ) flags 0x0 phys_seg 3 prio class 3
[1533879.016401] ata4: EH complete
[1533882.097021] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0xc0000 action 0x0
[1533882.097026] ata4.00: irq_stat 0x40000001
[1533882.097027] ata4: SError: { CommWake 10B8B }
[1533882.097030] ata4.00: failed command: READ DMA EXT
[1533882.097031] ata4.00: cmd 25/00:08:00:4f:f6/00:00:96:00:00/e0 tag 31 dma 4096 in
res 51/40:00:00:4f:f6/00:00:96:00:00/e0 Emask 0x9 (media error)
[1533882.097036] ata4.00: status: { DRDY ERR }
[1533882.097037] ata4.00: error: { UNC }
[1533882.104231] ata4.00: configured for UDMA/133
[1533882.104241] sd 3:0:0:0: [sda] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
[1533882.104244] sd 3:0:0:0: [sda] tag#31 Sense Key : Medium Error [current]
[1533882.104246] sd 3:0:0:0: [sda] tag#31 Add. Sense: Unrecovered read error - auto reallocate failed
[1533882.104248] sd 3:0:0:0: [sda] tag#31 CDB: Read(10) 28 00 96 f6 4f 00 00 00 08 00
[1533882.104249] I/O error, dev sda, sector 2532724480 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 2
[1533882.104259] ata4: EH complete
[1533885.204897] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0xc0000 action 0x0
[1533885.204903] ata4.00: irq_stat 0x40000001
[1533885.204904] ata4: SError: { CommWake 10B8B }
[1533885.204907] ata4.00: failed command: READ DMA EXT
[1533885.204908] ata4.00: cmd 25/00:08:e0:66:f6/00:00:96:00:00/e0 tag 18 dma 4096 in
res 51/40:00:e0:66:f6/00:00:96:00:00/e0 Emask 0x9 (media error)
[1533885.204912] ata4.00: status: { DRDY ERR }
[1533885.204913] ata4.00: error: { UNC }
[1533885.212123] ata4.00: configured for UDMA/133
[1533885.212140] sd 3:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
[1533885.212143] sd 3:0:0:0: [sda] tag#18 Sense Key : Medium Error [current]
[1533885.212145] sd 3:0:0:0: [sda] tag#18 Add. Sense: Unrecovered read error - auto reallocate failed
[1533885.212147] sd 3:0:0:0: [sda] tag#18 CDB: Read(10) 28 00 96 f6 66 e0 00 00 08 00
[1533885.212149] I/O error, dev sda, sector 2532730592 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 2
[1533885.212159] ata4: EH complete
[1533891.792624] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0xc0000 action 0x0
[1533891.792630] ata4.00: irq_stat 0x40000001
[1533891.792631] ata4: SError: { CommWake 10B8B }
[1533891.792634] ata4.00: failed command: READ DMA EXT
[1533891.792635] ata4.00: cmd 25/00:20:60:f7:f7/00:00:96:00:00/e0 tag 13 dma 16384 in
res 51/40:00:70:f7:f7/00:00:96:00:00/e0 Emask 0x9 (media error)
[1533891.792640] ata4.00: status: { DRDY ERR }
[1533891.792641] ata4.00: error: { UNC }
[1533891.799870] ata4.00: configured for UDMA/133
[1533891.799883] sd 3:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=5s
[1533891.799885] sd 3:0:0:0: [sda] tag#13 Sense Key : Medium Error [current]
[1533891.799887] sd 3:0:0:0: [sda] tag#13 Add. Sense: Unrecovered read error - auto reallocate failed
[1533891.799889] sd 3:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 96 f7 f7 60 00 00 20 00
[1533891.799891] I/O error, dev sda, sector 2532833136 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 3
[1533891.799902] ata4: EH complete
[1533896.660005] BTRFS warning (device sda1): i/o error at logical 5181140762624 on dev /dev/sda1, physical 1296753885184: metadata leaf (level 0) in tree 2
[1533896.660012] BTRFS warning (device sda1): i/o error at logical 5181140762624 on dev /dev/sda1, physical 1296753885184: metadata leaf (level 0) in tree 2
[1533896.660016] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 3, rd 556, flush 0, corrupt 0, gen 0
[1533896.660233] BTRFS warning (device sda1): i/o error at logical 5181143891968 on dev /dev/sda1, physical 1296757014528: metadata leaf (level 0) in tree 7
[1533896.660237] BTRFS warning (device sda1): i/o error at logical 5181143891968 on dev /dev/sda1, physical 1296757014528: metadata leaf (level 0) in tree 7
[1533896.660239] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 3, rd 557, flush 0, corrupt 0, gen 0
[1533897.118086] BTRFS error (device sda1): fixed up error at logical 5181143891968 on dev /dev/sda1
[1533897.126813] BTRFS error (device sda1): fixed up error at logical 5181140762624 on dev /dev/sda1
[1534038.019140] BTRFS info (device sda1): scrub: finished on devid 1 with status: 0
...
[1744757.386462] ata4: SATA link down (SStatus 0 SControl 300)
[1744762.810285] ata4: SATA link down (SStatus 0 SControl 300)
[1744768.190059] ata4: SATA link down (SStatus 0 SControl 300)
[1744768.190072] ata4.00: disable device
[1744768.190097] ata4.00: detaching (SCSI 3:0:0:0)
[1744768.295895] sd 3:0:0:0: [sda] Stopping disk
[1744768.295913] sd 3:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[1744830.216445] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 4, rd 557, flush 0, corrupt 0, gen 0
[1744830.217535] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 5, rd 557, flush 0, corrupt 0, gen 0
[1744830.217680] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 6, rd 557, flush 0, corrupt 0, gen 0
[1744830.217779] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 7, rd 557, flush 0, corrupt 0, gen 0
[1744830.217863] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 8, rd 557, flush 0, corrupt 0, gen 0
[1744830.224749] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 9, rd 557, flush 0, corrupt 0, gen 0
[1744862.845264] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 10, rd 557, flush 0, corrupt 0, gen 0
[1744862.845483] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 11, rd 557, flush 0, corrupt 0, gen 0
[1744862.845681] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 12, rd 557, flush 0, corrupt 0, gen 0
[1744862.858758] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 13, rd 557, flush 0, corrupt 0, gen 0
[1744863.073335] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
[1744863.073342] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 14, rd 557, flush 0, corrupt 0, gen 0
[1744863.073347] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
[1744863.073349] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 15, rd 557, flush 0, corrupt 0, gen 0
[1744863.073353] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
[1744863.073354] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 16, rd 557, flush 0, corrupt 0, gen 0
[1744863.073394] BTRFS error (device sda1): error writing primary super block to device 1
[1745468.057120] ata5: SATA link down (SStatus 0 SControl 300)
[1745523.320657] ata4: found unknown device (class 0)
[1745527.965324] ata4: softreset failed (1st FIS failed)
[1745533.288241] ata4: found unknown device (class 0)
[1745533.452246] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[1745533.453025] ata4.00: ATA-9: MB2000ECWCR, HPG4, max UDMA/133
[1745533.453306] ata4.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[1745533.454136] ata4.00: configured for UDMA/133
[1745533.464339] scsi 3:0:0:0: Direct-Access ATA MB2000ECWCR HPG4 PQ: 0 ANSI: 5
[1745533.464556] sd 3:0:0:0: Attached scsi generic sg3 type 0
[1745533.464652] sd 3:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[1745533.464667] sd 3:0:0:0: [sdd] Write Protect is off
[1745533.464671] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[1745533.464684] sd 3:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[1745533.464700] sd 3:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
[1745533.492586] sd 3:0:0:0: [sdd] Attached SCSI disk
[1745533.620802] BTRFS: device fsid abc798d4-91bc-4a47-b9f1-1826e230d944 devid 1 transid 14987 /dev/sdd scanned by systemd-udevd (3623259)
[1745909.851279] BTRFS info (device sdd): using crc32c (crc32c-intel) checksum algorithm
[1745909.851286] BTRFS info (device sdd): disk space caching is enabled
[1746006.393946] sdd: sdd1
[1746007.499129] sdd: sdd1
[1746013.998416] sdd: sdd1
[1746151.233000] BTRFS info (device sda1): dev_replace from /dev/sda1 (devid 1) to /dev/sdd1 started
[1746151.233810] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 17, rd 557, flush 0, corrupt 0, gen 0
...
[1746151.392955] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
[1746151.392959] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 25, rd 557, flush 0, corrupt 0, gen 0
...
[1746151.416764] BTRFS error (device sda1): error writing primary super block to device 1
[1746151.445394] BTRFS warning (device sda1): i/o error at logical 2752700743680 on dev /dev/sda1, physical 1048576, root 257, inode 108025, offset 140640256, length 4096, links 1 (path: CD Images/オーライフジャパン/東方奔放戯.cdda)
[1746151.458920] BTRFS error (device sda1): fixed up error at logical 2752700743680 on dev /dev/sda1
...
[1746156.234501] btrfs_dev_stat_inc_and_print: 59562 callbacks suppressed
[1746156.234505] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 27, rd 60119, flush 0, corrupt 0, gen 0
...
[1746421.382718] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39583744 csum 0xea07a559 expected csum 0x23385e4c mirror 3
[1746421.382720] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39587840 csum 0xd1df5454 expected csum 0x7d24c1a7 mirror 3
[1746421.382726] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39591936 csum 0xefa56703 expected csum 0xf2d812fb mirror 3
[1746421.382731] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39596032 csum 0x34677cd6 expected csum 0x59ab4dfa mirror 3
[1746421.382739] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39600128 csum 0xf9fea637 expected csum 0xe4633da4 mirror 3
[1746421.382742] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39604224 csum 0x1e727699 expected csum 0x9f9dcc9d mirror 3
[1746421.382745] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39612416 csum 0x9db7efbd expected csum 0xb78117b8 mirror 3
[1746421.382746] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39608320 csum 0xe7ebf730 expected csum 0xcf96ce7f mirror 3
[1746421.382747] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39616512 csum 0x71395e4c expected csum 0x12abfd74 mirror 3
[1746421.382748] BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39624704 csum 0xb0d18c75 expected csum 0x4dae9c5e mirror 3
[1746421.396876] ------------[ cut here ]------------
[1746421.396876] ------------[ cut here ]------------
[1746421.396878] ------------[ cut here ]------------
[1746421.396878] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.396880] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.396881] ------------[ cut here ]------------
[1746421.396882] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.396884] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.396884] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[1746421.396887] CPU: 9 PID: 3614331 Comm: kworker/u257:2 Tainted: G OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.396892] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.396894] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.396926] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.396952] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.396954] RSP: 0018:ffffa9c116d87c70 EFLAGS: 00010293
[1746421.396956] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b1b21ed1e00
[1746421.396957] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b1b21ed1e00
[1746421.396958] RBP: 0000036289aac000 R08: 0000000000000003 R09: 0000000000000001
[1746421.396959] R10: 00000000b7d77995 R11: 0000000000000000 R12: 0000000000001000
[1746421.396961] R13: ffffa9c116d87ca8 R14: ffffd03606151ec0 R15: 0000000000000000
[1746421.396962] FS: 0000000000000000(0000) GS:ffff9b29ff040000(0000) knlGS:0000000000000000
[1746421.396963] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.396965] CR2: 000000195917eff8 CR3: 00000004f7af8000 CR4: 0000000000350ee0
[1746421.396966] Call Trace:
[1746421.396969] <TASK>
[1746421.396972] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.396998] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.397022] ? __switch_to+0x106/0x420
[1746421.397026] process_one_work+0x1c7/0x380
[1746421.397029] worker_thread+0x4d/0x380
[1746421.397031] ? rescuer_thread+0x3a0/0x3a0
[1746421.397033] kthread+0xe9/0x110
[1746421.397035] ? kthread_complete_and_exit+0x20/0x20
[1746421.397037] ret_from_fork+0x22/0x30
[1746421.397041] </TASK>
[1746421.397041] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.397085] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.397128] ---[ end trace 0000000000000000 ]---
[1746421.397128] invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
[1746421.397132] CPU: 11 PID: 3614332 Comm: kworker/u257:4 Tainted: G D OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.397135] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.397137] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.397171] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.397197] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.397199] RSP: 0018:ffffa9c116d8fc70 EFLAGS: 00010293
[1746421.397201] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b1e19cd26c0
[1746421.397202] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b1e19cd26c0
[1746421.397203] RBP: 0000036289aa9000 R08: 0000000000000003 R09: 0000000000000001
[1746421.397205] R10: 000000001d687359 R11: 0000000000000000 R12: 0000000000001000
[1746421.397206] R13: ffffa9c116d8fca8 R14: ffffd03604eb4500 R15: 0000000000000000
[1746421.397207] FS: 0000000000000000(0000) GS:ffff9b29ff0c0000(0000) knlGS:0000000000000000
[1746421.397209] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.397210] CR2: ffff890cb2688008 CR3: 00000001f13a2000 CR4: 0000000000350ee0
[1746421.397211] Call Trace:
[1746421.397213] <TASK>
[1746421.397216] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.397234] ------------[ cut here ]------------
[1746421.397236] ------------[ cut here ]------------
[1746421.397236] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.397238] ------------[ cut here ]------------
[1746421.397240] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.397241] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.397245] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.397270] ? try_to_wake_up+0x93/0x5d0
[1746421.397273] process_one_work+0x1c7/0x380
[1746421.397275] worker_thread+0x4d/0x380
[1746421.397277] ? rescuer_thread+0x3a0/0x3a0
[1746421.397279] kthread+0xe9/0x110
[1746421.397281] ? kthread_complete_and_exit+0x20/0x20
[1746421.397283] ret_from_fork+0x22/0x30
[1746421.397286] </TASK>
[1746421.397287] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.397328] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper
[1746421.397347] ------------[ cut here ]------------
[1746421.397349] kernel BUG at fs/btrfs/extent_io.c:2380!
[1746421.397349] ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.397373] ---[ end trace 0000000000000000 ]---
[1746421.397373] invalid opcode: 0000 [#3] PREEMPT SMP NOPTI
[1746421.397377] CPU: 31 PID: 3625741 Comm: kworker/u257:16 Tainted: G D OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.397380] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.397382] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.397415] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.397442] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.397445] RSP: 0018:ffffa9c11f907c70 EFLAGS: 00010293
[1746421.397447] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b1ca239cc00
[1746421.397449] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b1ca239cc00
[1746421.397451] RBP: 0000036289aaa000 R08: 0000000000000003 R09: 0000000000000001
[1746421.397453] R10: 00000000767028db R11: 0000000000000000 R12: 0000000000001000
[1746421.397454] R13: ffffa9c11f907ca8 R14: ffffd03637614140 R15: 0000000000000000
[1746421.397456] FS: 0000000000000000(0000) GS:ffff9b29ff2c0000(0000) knlGS:0000000000000000
[1746421.397458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.397460] CR2: 00000000004208a0 CR3: 00000001f13a2000 CR4: 0000000000350ee0
[1746421.397462] Call Trace:
[1746421.397465] <TASK>
[1746421.397469] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.397496] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.397521] ? try_to_wake_up+0x93/0x5d0
[1746421.397525] process_one_work+0x1c7/0x380
[1746421.397528] worker_thread+0x4d/0x380
[1746421.397530] ? _raw_spin_lock_irqsave+0x23/0x50
[1746421.397534] ? rescuer_thread+0x3a0/0x3a0
[1746421.397536] kthread+0xe9/0x110
[1746421.397539] ? kthread_complete_and_exit+0x20/0x20
[1746421.397541] ret_from_fork+0x22/0x30
[1746421.397545] </TASK>
[1746421.397546] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.397613] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.397674] invalid opcode: 0000 [#4] PREEMPT SMP NOPTI
[1746421.397677] CPU: 1 PID: 3622076 Comm: kworker/u257:3 Tainted: G D OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.397680] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.397681] ---[ end trace 0000000000000000 ]---
[1746421.397682] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.397715] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.397741] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.397743] RSP: 0018:ffffa9c1199d3c70 EFLAGS: 00010293
[1746421.397745] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b1f09c18540
[1746421.397747] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b1f09c18540
[1746421.397748] RBP: 0000036289aab000 R08: 0000000000000003 R09: 0000000000000001
[1746421.397750] R10: 000000007986f0fc R11: 0000000000000000 R12: 0000000000001000
[1746421.397751] R13: ffffa9c1199d3ca8 R14: ffffd03604faf480 R15: 0000000000000000
[1746421.397753] FS: 0000000000000000(0000) GS:ffff9b29fee40000(0000) knlGS:0000000000000000
[1746421.397755] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.397756] CR2: 000001e03248a23e CR3: 000000010efba000 CR4: 0000000000350ee0
[1746421.397758] Call Trace:
[1746421.397761] <TASK>
[1746421.397765] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.397791] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.397815] ? try_to_wake_up+0x93/0x5d0
[1746421.397819] process_one_work+0x1c7/0x380
[1746421.397823] worker_thread+0x4d/0x380
[1746421.397825] ? rescuer_thread+0x3a0/0x3a0
[1746421.397827] kthread+0xe9/0x110
[1746421.397829] ? kthread_complete_and_exit+0x20/0x20
[1746421.397831] ret_from_fork+0x22/0x30
[1746421.397836] </TASK>
[1746421.397837] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.397900] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.397952] ---[ end trace 0000000000000000 ]---
[1746421.397953] invalid opcode: 0000 [#5] PREEMPT SMP NOPTI
[1746421.397957] CPU: 6 PID: 3625697 Comm: kworker/u257:15 Tainted: G D OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.397960] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.397962] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.397993] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.398019] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.398022] RSP: 0018:ffffa9c126227c70 EFLAGS: 00010293
[1746421.398024] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b20affa32c0
[1746421.398025] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b20affa32c0
[1746421.398027] RBP: 0000036289aae000 R08: 0000000000000003 R09: 0000000000000001
[1746421.398029] R10: 000000001ec1455d R11: 0000000000000000 R12: 0000000000001000
[1746421.398030] R13: ffffa9c126227ca8 R14: ffffd03606a85740 R15: 0000000000000000
[1746421.398032] FS: 0000000000000000(0000) GS:ffff9b29fef80000(0000) knlGS:0000000000000000
[1746421.398033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.398035] CR2: 00000000002503c0 CR3: 00000001810f8000 CR4: 0000000000350ee0
[1746421.398037] Call Trace:
[1746421.398038] <TASK>
[1746421.398041] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.398068] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.398093] ? try_to_wake_up+0x93/0x5d0
[1746421.398095] process_one_work+0x1c7/0x380
[1746421.398098] worker_thread+0x4d/0x380
[1746421.398100] ? _raw_spin_lock_irqsave+0x23/0x50
[1746421.398103] ? rescuer_thread+0x3a0/0x3a0
[1746421.398105] kthread+0xe9/0x110
[1746421.398107] ? kthread_complete_and_exit+0x20/0x20
[1746421.398109] ret_from_fork+0x22/0x30
[1746421.398112] </TASK>
[1746421.398113] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.398174] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.398216] ---[ end trace 0000000000000000 ]---
[1746421.398217] invalid opcode: 0000 [#6] PREEMPT SMP NOPTI
[1746421.398220] CPU: 8 PID: 3601183 Comm: kworker/u257:5 Tainted: G D OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.398223] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.398225] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.398255] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.398282] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.398284] RSP: 0018:ffffa9c11d1e7c70 EFLAGS: 00010293
[1746421.398287] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b2213d78cc0
[1746421.398288] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b2213d78cc0
[1746421.398290] RBP: 0000036289aad000 R08: 0000000000000003 R09: 0000000000000001
[1746421.398291] R10: 00000000aa2f26bf R11: 0000000000000000 R12: 0000000000001000
[1746421.398293] R13: ffffa9c11d1e7ca8 R14: ffffd036044b0980 R15: 0000000000000000
[1746421.398294] FS: 0000000000000000(0000) GS:ffff9b29ff000000(0000) knlGS:0000000000000000
[1746421.398296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.398298] CR2: 00000000004208a0 CR3: 000000017df8a000 CR4: 0000000000350ee0
[1746421.398300] Call Trace:
[1746421.398301] <TASK>
[1746421.398304] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.398330] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.398355] ? try_to_wake_up+0x93/0x5d0
[1746421.398358] process_one_work+0x1c7/0x380
[1746421.398360] worker_thread+0x4d/0x380
[1746421.398363] ? rescuer_thread+0x3a0/0x3a0
[1746421.398365] kthread+0xe9/0x110
[1746421.398366] ? kthread_complete_and_exit+0x20/0x20
[1746421.398368] ret_from_fork+0x22/0x30
[1746421.398372] </TASK>
[1746421.398373] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.398427] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.398468] ---[ end trace 0000000000000000 ]---
[1746421.398469] invalid opcode: 0000 [#7] PREEMPT SMP NOPTI
[1746421.398473] CPU: 2 PID: 3626648 Comm: kworker/u257:17 Tainted: G D OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.398475] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.398477] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.398507] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.398533] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.398535] RSP: 0018:ffffa9c127fc3c70 EFLAGS: 00010293
[1746421.398537] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b1b0f805800
[1746421.398539] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b1b0f805800
[1746421.398540] RBP: 0000036289aaf000 R08: 0000000000000003 R09: 0000000000000001
[1746421.398542] R10: 00000000aa522b07 R11: 0000000000000000 R12: 0000000000001000
[1746421.398543] R13: ffffa9c127fc3ca8 R14: ffffd03605715580 R15: 0000000000000000
[1746421.398545] FS: 0000000000000000(0000) GS:ffff9b29fee80000(0000) knlGS:0000000000000000
[1746421.398547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.398548] CR2: 00007ff6eeed0000 CR3: 00000004f7af8000 CR4: 0000000000350ee0
[1746421.398550] Call Trace:
[1746421.398552] <TASK>
[1746421.398555] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.398580] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.398604] ? __switch_to+0x106/0x420
[1746421.398609] process_one_work+0x1c7/0x380
[1746421.398612] worker_thread+0x4d/0x380
[1746421.398614] ? _raw_spin_lock_irqsave+0x23/0x50
[1746421.398618] ? rescuer_thread+0x3a0/0x3a0
[1746421.398620] kthread+0xe9/0x110
[1746421.398622] ? kthread_complete_and_exit+0x20/0x20
[1746421.398624] ret_from_fork+0x22/0x30
[1746421.398627] </TASK>
[1746421.398628] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.398684] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.398727] invalid opcode: 0000 [#8] PREEMPT SMP NOPTI
[1746421.398730] CPU: 10 PID: 3624726 Comm: kworker/u257:6 Tainted: G D OE 6.0.0-5-amd64 #1 Debian 6.0.10-2
[1746421.398733] Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
[1746421.398740] ---[ end trace 0000000000000000 ]---
[1746421.398735] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[1746421.398772] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
[1746421.398801] Code: dc cd be 01 00 00 00 48 89 df e8 e1 be 06 00 e9 5f fe ff ff 0f 0b 41 bf fb ff ff ff eb e4 31 f6 48 89 ef e8 f8 87 01 00 eb d0 <0f> 0b e8 ff 08 2f ce 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[1746421.398803] RSP: 0018:ffffa9c122efbc70 EFLAGS: 00010293
[1746421.398804] RAX: 0000000000000003 RBX: ffff9b1b14f30000 RCX: ffff9b1bb7ee4900
[1746421.398806] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9b1bb7ee4900
[1746421.398807] RBP: 0000036289ab2000 R08: 0000000000000003 R09: 0000000000000001
[1746421.398808] R10: 0000000008084d4e R11: 0000000000000000 R12: 0000000000001000
[1746421.398809] R13: ffffa9c122efbca8 R14: ffffd036060dc9c0 R15: 0000000000000000
[1746421.398811] FS: 0000000000000000(0000) GS:ffff9b29ff080000(0000) knlGS:0000000000000000
[1746421.398812] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1746421.398813] CR2: ffff890cb2688008 CR3: 00000001f13a2000 CR4: 0000000000350ee0
[1746421.398815] Call Trace:
[1746421.398816] <TASK>
[1746421.398818] clean_io_failure+0x14d/0x180 [btrfs]
[1746421.398845] end_bio_extent_readpage+0x412/0x6e0 [btrfs]
[1746421.398869] ? __switch_to+0x236/0x420
[1746421.398872] process_one_work+0x1c7/0x380
[1746421.398875] worker_thread+0x4d/0x380
[1746421.398877] ? _raw_spin_lock_irqsave+0x23/0x50
[1746421.398880] ? rescuer_thread+0x3a0/0x3a0
[1746421.398882] kthread+0xe9/0x110
[1746421.398883] ? kthread_complete_and_exit+0x20/0x20
[1746421.398885] ret_from_fork+0x22/0x30
[1746421.398888] </TASK>
[1746421.398889] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nfnetlink cpuid uinput snd_seq_dummy snd_seq cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii ext4 mbcache jbd2 uas usb_storage msr vhost_net vhost vhost_iotlb tap tun bridge 8021q garp stp max6697(OE) mrp llc qrtr bnep i2c_tiny_usb binfmt_misc nls_ascii nls_cp437 vfat fat btusb btrtl btbcm btintel btmtk intel_rapl_msr bluetooth intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi jitterentropy_rng snd_usbmidi_lib kvm_amd snd_hda_codec snd_rawmidi drbg snd_seq_device iwlwifi snd_hda_core kvm mc ansi_cprng snd_hwdep joydev ecdh_generic cfg80211 snd_pcm ecc rapl crc16 snd_timer zenpower(OE) wmi_bmof snd ccp sp5100_tco pcspkr rfkill soundcore k10temp rng_core watchdog acpi_cpufreq sg evdev nct6775 nct6775_core hwmon_vid jc42 fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress amdgpu hid_generic usbhid hid
[1746421.398925] vendor_reset(OE) sha512_ssse3 sha512_generic vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod gpu_sched sr_mod xor raid6_pq drm_buddy libcrc32c cdrom sd_mod crc32c_generic drm_display_helper dm_mod crc32_pclmul cec crc32c_intel rc_core drm_ttm_helper ahci ttm libahci ghash_clmulni_intel libata drm_kms_helper xhci_pci nvme aesni_intel xhci_hcd nvme_core scsi_mod igb crypto_simd cryptd drm t10_pi firewire_ohci dca usbcore ptp crc64_rocksoft_generic firewire_core pps_core scsi_common crc64_rocksoft i2c_algo_bit crc_t10dif crc_itu_t crct10dif_generic crct10dif_pclmul i2c_piix4 usb_common crc64 crct10dif_common mxm_wmi wmi button [last unloaded: vboxdrv(OE)]
[1746421.398962] ---[ end trace 0000000000000000 ]---
[1746421.488952] btrfs_dev_stat_inc_and_print: 62002 callbacks suppressed
[1746421.488957] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 88, rd 3419233, flush 0, corrupt 0, gen 0
...
[1762175.349337] BTRFS error (device sda1): fixed up error at logical 6909075116032 on dev /dev/sda1
[1762176.154002] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
[1762176.154009] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
[1762176.154012] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
[1762176.172957] BTRFS error (device sda1): error writing primary super block to device 1
[1762176.196418] BTRFS info (device sda1): dev_replace from /dev/sda1 (devid 1) to /dev/sdd1 finished
* Re: Buggy behaviour after replacing failed disk in RAID1
From: Qu Wenruo @ 2023-01-01 0:38 UTC
To: linux-btrfs
On 2023/1/1 08:05, 小太 wrote:
> dmesg (with spammed similar lines removed): See attachment
> uname -a: Linux home.kota.moe 6.0.0-5-amd64 #1 SMP PREEMPT_DYNAMIC
> Debian 6.0.10-2 (2022-12-01) x86_64 GNU/Linux
> btrfs --version: btrfs-progs v6.1
>
> Recently one of my disks in my BTRFS RAID1 array failed, so I
> disconnected it from the system:
>
>> kota@home:~$ sudo btrfs fi show
>> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
>> Total devices 3 FS bytes used 1.47TiB
>> devid 1 size 0 used 0 path /dev/sda1 MISSING
>> devid 2 size 1.82TiB used 956.00GiB path /dev/sdc1
>> devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
Weirdly, dmesg does not show devid 1 as missing; in fact, it still
shows the device as present, just with tons of IO errors (ata4, sd 3:0:0:0).
I guess that has something to do with the BUG_ON() you hit.
>
> I then connected a new disk, and started a btrfs replace:
>
>> kota@home:~$ sudo btrfs replace start 1 /dev/sdd1 /media/Data
>> kota@home:~$ sudo btrfs fi show
>> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
>> Total devices 4 FS bytes used 1.47TiB
>> devid 0 size 1.36TiB used 925.03GiB path /dev/sdd1
>> devid 1 size 0 used 0 path /dev/sda1 MISSING
>> devid 2 size 1.82TiB used 956.00GiB path /dev/sdc1
>> devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
>> kota@home:~$ sudo btrfs replace status /media/Data
>> Started on 31.Dec 22:03:20, finished on 1.Jan 02:30:26, 0 write errs, 0 uncorr. read errs
>
>
> This operation spammed my dmesg with lines all looking like the following:
>
>> [1762170.345526] BTRFS error (device sda1): fixed up error at logical 6863996235776 on dev /dev/sda1
>> ...
>> [1762174.289119] btrfs_dev_stat_inc_and_print: 91111 callbacks suppressed
>> [1762174.289123] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 4147, rd 220063627, flush 0, corrupt 0, gen 0
>> ...
>> [1762175.320807] scrub_handle_errored_block: 91075 callbacks suppressed
>> [1762175.320850] BTRFS warning (device sda1): i/o error at logical 6909072957440 on dev /dev/sda1, physical 1395819732992, root 257, inode 452632, offset 1293344768, length 4096, links 1 (path: phoronix-test-suite/config/installed-tests/pts/unigine-super-1.0.7/Unigine_Superposition-1.0.run)
>> ...
>> [1762175.348848] scrub_handle_errored_block: 91080 callbacks suppressed
>> [1762175.348851] BTRFS error (device sda1): fixed up error at logical 6909075079168 on dev /dev/sda1
>> ...
>> [1762176.154002] BTRFS warning (device sda1): lost page write due to IO error on /dev/sda1 (-5)
>> ...
>> [1762176.172957] BTRFS error (device sda1): error writing primary super block to device 1
>>
>> [1762176.196418] BTRFS info (device sda1): dev_replace from /dev/sda1 (devid 1) to /dev/sdd1 finished
In fact, it didn't finish.
Several BUG_ON()s were triggered before that, effectively crashing the fs.
The BUG_ON() itself is in the IO repair path, which writes the repaired
contents back to the corrupted disk.
The problem is the writeback target: it looks like by the time of the
writeback, we had already finished the replace, causing a mirror_num mismatch.
Normally this would be avoided if the target device were completely
missing, but in your case the corrupted disk is still accessible to the
kernel, causing the problem.
>
>
> Once it finished, I wanted to check the filesystem and do a scrub:
>
>> kota@home:~$ sudo btrfs fi show
>> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
>> Total devices 4 FS bytes used 1.47TiB
>> devid 0 size 0 used 0 path /dev/sda1 MISSING
>> devid 1 size 1.36TiB used 925.03GiB path /dev/sdd1
>> devid 2 size 1.82TiB used 956.03GiB path /dev/sdc1
>> devid 3 size 1.82TiB used 1.35TiB path /dev/sdb1
>> kota@home:~$ sudo btrfs fi df /media/Data
>> Data, RAID1: total=1.59TiB, used=1.46TiB
>> System, RAID1: total=64.00MiB, used=272.00KiB
>> Metadata, RAID1: total=5.00GiB, used=3.42GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>> kota@home:~$ sudo btrfs device delete missing /media/Data
>> ERROR: unable to start device remove, another exclusive operation 'device replace' in progress
>> kota@home:~$ sudo btrfs scrub start /media/Data
>
> At this point, the command froze and has been stuck in uninterruptible
> sleep ever since (and no further lines in dmesg either).
> "btrfs device stats" and "btrfs scrub status" also hang in
> uninterruptible sleep as well.
> And I notice the original "btrfs replace" command seems to be still running?
>
>> kota@home:~$ ps aux | grep btrfs
>> root 3626272 0.5 0.0 7044 2312 ? Ds 2022 3:56 btrfs replace start 1 /dev/sdd1 /media/Data
>> root 3832422 0.0 0.0 4948 1408 pts/4 D+ 09:54 0:00 btrfs scrub start /media/Data
>> root 3832891 0.0 0.0 4948 1412 pts/7 D+ 09:56 0:00 btrfs scrub status /media/Data
>> root 3832943 0.0 0.0 4948 1452 pts/9 D+ 09:56 0:00 btrfs device stats /media/Data
>
> And all (non-cached) accesses to the filesystem (such as ls -l) also
> hang similarly.
That's caused by the kernel panic.
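The stuck commands in the `ps` listing above are in state D (uninterruptible sleep), which is what you would expect once the btrfs worker threads have crashed. As a minimal sketch, assuming the standard `ps aux` column layout with STAT as the eighth field and COMMAND starting at the eleventh, the hypothetical helper `list_d_state` prints the PID and command of every such process:

```shell
#!/usr/bin/env bash
# list_d_state (hypothetical helper): read `ps aux`-style text on stdin and
# print "PID COMMAND" for every process whose STAT field starts with "D",
# i.e. is in uninterruptible sleep. Assumes STAT is field 8 and COMMAND
# starts at field 11, as in the standard `ps aux` layout; the header line
# ("STAT") never matches.
list_d_state() {
    awk '$8 ~ /^D/ {
        cmd = $11
        for (i = 12; i <= NF; i++) cmd = cmd " " $i
        print $2, cmd
    }'
}

# Usage on a live system:
#   ps aux | list_d_state
```

Processes in this state generally do not respond to signals; as the follow-ups in this thread show, only a reboot (here forced with magic SysRq) cleared them.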
>
> Given that this is a "reliable" RAID1 setup where a disk dropping dead
> shouldn't cause problems, BTRFS failing like this sounds like a bug to
> me.
As explained, the unreliable disk itself seems to be the cause.
If you had initially removed the hard disk completely, btrfs could have
handled it well.
(Admittedly, this is a bug in btrfs and we should be able to fix it.)
> What's going on here, and how should I recover from this?
Since the btrfs module crashed, you have to reset the system.
And if you're going to reset the system, please remove the offending
disk (sda) completely.
Then at the next boot, the fs cannot be mounted without "-o degraded".
And at the next mount, btrfs may or may not resume the replace.
If btrfs doesn't resume the replace, the previous replace has finished.
If btrfs does resume the replace, let it finish first.
Then run a scrub to fix any corrupted sectors that weren't properly
repaired during the replace.
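This advice hinges on telling whether the replace is still pending. During a replace, btrfs registers the target device under the reserved devid 0, so a devid 0 entry in `btrfs filesystem show` output (as in the listings earlier in this thread) suggests the replace has not completed; once mounted, `btrfs replace status` reports progress directly. The hypothetical helper `check_fi_show` is a minimal sketch, assuming the output format shown in this thread, that reports both that and any devices marked MISSING:

```shell
#!/usr/bin/env bash
# check_fi_show (hypothetical helper): read `btrfs filesystem show` output
# on stdin and report
#   (a) every devid whose line ends in MISSING, and
#   (b) whether a devid 0 entry exists, which is the temporary ID btrfs
#       gives a device-replace target while the replace is unfinished.
# awk's default field splitting ignores the leading indentation.
check_fi_show() {
    awk '
        $1 == "devid" && $2 == "0"        { replacing = 1 }
        $1 == "devid" && $NF == "MISSING" { print "missing devid " $2 }
        END { print (replacing ? "replace in progress" : "no replace target") }
    '
}

# Usage:
#   sudo btrfs filesystem show /media/Data | check_fi_show
```

Applied to the listing taken after the reboot but before mounting (devid 0 on /dev/sdb1, three devices), it would report a replace in progress, consistent with the kernel then continuing the replace at 93%.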
Thanks,
Qu
> I haven't tried remounting or restarting the system yet since this
> isn't a critical filesystem for me, and it might help debug the issue.
* Re: Buggy behaviour after replacing failed disk in RAID1
From: 小太 @ 2023-01-01 5:03 UTC
To: linux-btrfs; +Cc: Qu Wenruo
> Weirdly, dmesg does not show devid 1 as missing; in fact, it still
> shows the device as present, just with tons of IO errors (ata4, sd 3:0:0:0).
> If you had initially removed the hard disk completely, btrfs could have
> handled it well.
> (Admittedly, this is a bug in btrfs and we should be able to fix it.)
I did completely remove the drive. In fact, I used the very same SATA port for
the replacement drive. See my dmesg lines:
> [1744757.386462] ata4: SATA link down (SStatus 0 SControl 300)
> [1744762.810285] ata4: SATA link down (SStatus 0 SControl 300)
> [1744768.190059] ata4: SATA link down (SStatus 0 SControl 300)
> [1744768.190072] ata4.00: disable device
> [1744768.190097] ata4.00: detaching (SCSI 3:0:0:0)
> [1744768.295895] sd 3:0:0:0: [sda] Stopping disk
> [1744768.295913] sd 3:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> ...
> [1745523.320657] ata4: found unknown device (class 0)
> [1745527.965324] ata4: softreset failed (1st FIS failed)
> [1745533.288241] ata4: found unknown device (class 0)
> [1745533.452246] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [1745533.453025] ata4.00: ATA-9: MB2000ECWCR, HPG4, max UDMA/133
> [1745533.453306] ata4.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> [1745533.454136] ata4.00: configured for UDMA/133
> [1745533.464339] scsi 3:0:0:0: Direct-Access ATA MB2000ECWCR HPG4 PQ: 0 ANSI: 5
> [1745533.464556] sd 3:0:0:0: Attached scsi generic sg3 type 0
> [1745533.464652] sd 3:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> [1745533.464667] sd 3:0:0:0: [sdd] Write Protect is off
> [1745533.464671] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> [1745533.464684] sd 3:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [1745533.464700] sd 3:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
> [1745533.492586] sd 3:0:0:0: [sdd] Attached SCSI disk
I also verified that the device file /dev/sda was gone at the time (despite
"btrfs fi show" thinking it still existed).
Maybe there's some other bug where the kernel still thinks the drive exists, even
though it was disconnected?
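The mismatch described here, a path that btrfs still lists although its device node has disappeared, can be spotted by checking each listed path. A minimal sketch, assuming the `btrfs filesystem show` line format quoted in this thread; `find_stale_paths` is a hypothetical helper that only detects the symptom and does not make btrfs release the device:

```shell
#!/usr/bin/env bash
# find_stale_paths (hypothetical helper): read `btrfs filesystem show`
# output on stdin and report every devid whose listed path is not
# currently a block device node (e.g. because the disk was unplugged).
find_stale_paths() {
    awk '$1 == "devid" {
        # The path is the word that follows the literal "path".
        for (i = 2; i < NF; i++) if ($i == "path") { print $2, $(i + 1); break }
    }' | while read -r devid path; do
        if [ ! -b "$path" ]; then
            printf 'devid %s: %s is listed but is not a block device\n' "$devid" "$path"
        fi
    done
}

# Usage on a live system:
#   sudo btrfs filesystem show /media/Data | find_stale_paths
```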
* Re: Buggy behaviour after replacing failed disk in RAID1
From: Qu Wenruo @ 2023-01-01 9:27 UTC
To: 小太, linux-btrfs
On 2023/1/1 13:03, 小太 wrote:
>> Weirdly, dmesg does not show devid 1 as missing; in fact, it still
>> shows the device as present, just with tons of IO errors (ata4, sd 3:0:0:0).
>
>> If you had initially removed the hard disk completely, btrfs could have
>> handled it well.
>> (Admittedly, this is a bug in btrfs and we should be able to fix it.)
>
> I did completely remove the drive. In fact, I used the very same SATA port for
> the replacement drive. See my dmesg lines:
>
>> [1744757.386462] ata4: SATA link down (SStatus 0 SControl 300)
>> [1744762.810285] ata4: SATA link down (SStatus 0 SControl 300)
>> [1744768.190059] ata4: SATA link down (SStatus 0 SControl 300)
>> [1744768.190072] ata4.00: disable device
>> [1744768.190097] ata4.00: detaching (SCSI 3:0:0:0)
>> [1744768.295895] sd 3:0:0:0: [sda] Stopping disk
>> [1744768.295913] sd 3:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>> ...
>> [1745523.320657] ata4: found unknown device (class 0)
>> [1745527.965324] ata4: softreset failed (1st FIS failed)
>> [1745533.288241] ata4: found unknown device (class 0)
>> [1745533.452246] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [1745533.453025] ata4.00: ATA-9: MB2000ECWCR, HPG4, max UDMA/133
>> [1745533.453306] ata4.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 32), AA
>> [1745533.454136] ata4.00: configured for UDMA/133
>> [1745533.464339] scsi 3:0:0:0: Direct-Access ATA MB2000ECWCR HPG4 PQ: 0 ANSI: 5
>> [1745533.464556] sd 3:0:0:0: Attached scsi generic sg3 type 0
>> [1745533.464652] sd 3:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
>> [1745533.464667] sd 3:0:0:0: [sdd] Write Protect is off
>> [1745533.464671] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
>> [1745533.464684] sd 3:0:0:0: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> [1745533.464700] sd 3:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
>> [1745533.492586] sd 3:0:0:0: [sdd] Attached SCSI disk
>
> I also verified that the device file /dev/sda was gone at the time (despite
> "btrfs fi show" thinking it still existed).
OK, I guess there is really no way to make btrfs release that faulty
device until it has been replaced.
And since the root cause is not the lingering sda but a bug in the btrfs
repair code (patch already sent), the lingering device is unrelated to this bug.
Thanks,
Qu
> Maybe there's some other bug where the kernel still thinks the drive exists, even
> though it was disconnected?
* Re: Buggy behaviour after replacing failed disk in RAID1
From: 小太 @ 2023-01-01 23:31 UTC
To: Qu Wenruo; +Cc: linux-btrfs
For posterity, I will record what happened after I restarted my system
in case anyone else encounters my problem.
On Sun, 1 Jan 2023 at 11:38, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Since the btrfs module crashed, you have to reset the system.
>
> And if you're going to reset the system, please remove the offending
> disk (sda) completely.
>
> Then at the next boot, the fs cannot be mounted without "-o degraded".
>
> And at the next mount, btrfs may or may not resume the replace.
> If btrfs doesn't resume the replace, the previous replace has finished.
> If btrfs does resume the replace, let it finish first.
>
> Then run a scrub to fix any corrupted sectors that weren't properly
> repaired during the replace.
I restarted my system today (which ended up requiring the magic SysRq
sequence to get around the processes stuck in uninterruptible sleep).
Indeed, the file system then needed to be mounted with -o degraded, but
it finished the replace immediately after mounting:
> kota@home:~$ sudo btrfs fi show
> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
> Total devices 3 FS bytes used 1.47TiB
> devid 0 size 1.36TiB used 925.03GiB path /dev/sdb1
> devid 2 size 1.82TiB used 956.03GiB path /dev/sda1
> devid 3 size 1.82TiB used 1.35TiB path /dev/sdc1
> kota@home:~$ sudo mount /media/Data -o degraded
> kota@home:~$ sudo dmesg
> ...
> [ 261.442582] BTRFS info (device sdb1): using crc32c (crc32c-intel) checksum algorithm
> [ 261.442590] BTRFS info (device sdb1): allowing degraded mounts
> [ 261.442592] BTRFS info (device sdb1): disk space caching is enabled
> [ 261.442917] BTRFS warning (device sdb1): devid 1 uuid 1f6d045d-ff05-43ee-99b6-0517b0240656 is missing
> [ 261.460749] BTRFS warning (device sdb1): devid 1 uuid 1f6d045d-ff05-43ee-99b6-0517b0240656 is missing
> [ 261.729290] BTRFS info (device sdb1): bdev (efault) errs: wr 4147, rd 220094697, flush 0, corrupt 0, gen 0
> [ 261.729294] BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 521, gen 0
> [ 268.369240] BTRFS info (device sdb1): continuing dev_replace from <missing disk> (devid 1) to target /dev/sdb1 @93%
> [ 269.123140] BTRFS info (device sdb1): dev_replace from <missing disk> (devid 1) to /dev/sdb1 finished
> kota@home:~$ sudo btrfs fi show
> Label: none uuid: b7e4da98-b885-4e70-b0a4-510fb77fa744
> Total devices 3 FS bytes used 1.47TiB
> devid 1 size 1.36TiB used 925.03GiB path /dev/sdb1
> devid 2 size 1.82TiB used 956.03GiB path /dev/sda1
> devid 3 size 1.82TiB used 1.35TiB path /dev/sdc1
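Qu's rule above (a resumed replace must be allowed to finish; a replace that is not resumed had already finished) can be checked against the kernel log. Below is a minimal Python sketch, assuming the dev_replace messages keep the exact wording seen in this dmesg output (other kernel versions may word them differently) and that the relevant lines have not rotated out of the kernel ring buffer. `replace_state` and `advice` are hypothetical helpers, not part of btrfs-progs.

```python
import re

# Patterns copied from the two dev_replace lines in the log above; the exact
# wording is an assumption and may differ between kernel versions.
RESUME_RE = re.compile(r"continuing dev_replace from .* to target (\S+) @(\d+)%")
FINISH_RE = re.compile(r"dev_replace from .* to (\S+) finished")

SAMPLE = """\
[  268.369240] BTRFS info (device sdb1): continuing dev_replace from <missing disk> (devid 1) to target /dev/sdb1 @93%
[  269.123140] BTRFS info (device sdb1): dev_replace from <missing disk> (devid 1) to /dev/sdb1 finished
"""


def replace_state(dmesg_text):
    """Return (resumed_at_percent or None, finished, target device or None)."""
    resumed_at, finished, target = None, False, None
    for line in dmesg_text.splitlines():
        m = RESUME_RE.search(line)
        if m:
            target, resumed_at = m.group(1), int(m.group(2))
            continue
        m = FINISH_RE.search(line)
        if m:
            target, finished = m.group(1), True
    return resumed_at, finished, target


def advice(resumed_at, finished):
    """Map the log state onto the steps suggested in the thread (hypothetical)."""
    if resumed_at is None:
        # Assumes a replace was in progress before the reboot.
        return "replace not resumed: the previous replace already finished; run a scrub"
    if not finished:
        return f"replace resumed at {resumed_at}%: let it finish before scrubbing"
    return "replace resumed and finished: run a scrub"


if __name__ == "__main__":
    state = replace_state(SAMPLE)
    print(state)               # (93, True, '/dev/sdb1')
    print(advice(*state[:2]))  # replace resumed and finished: run a scrub
```

Parsing captured text rather than calling `dmesg` directly keeps the sketch free of side effects; on a live system you would feed it the output of `sudo dmesg`.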
A scrub afterwards found no errors, even though "btrfs device stats"
still reported corruption errors:
> kota@home:~$ sudo btrfs device stats /media/Data/
> [/dev/sdb1].write_io_errs 0
> [/dev/sdb1].read_io_errs 0
> [/dev/sdb1].flush_io_errs 0
> [/dev/sdb1].corruption_errs 521
> [/dev/sdb1].generation_errs 0
> [/dev/sda1].write_io_errs 0
> [/dev/sda1].read_io_errs 0
> [/dev/sda1].flush_io_errs 0
> [/dev/sda1].corruption_errs 0
> [/dev/sda1].generation_errs 0
> [/dev/sdc1].write_io_errs 0
> [/dev/sdc1].read_io_errs 0
> [/dev/sdc1].flush_io_errs 0
> [/dev/sdc1].corruption_errs 0
> [/dev/sdc1].generation_errs 0
> kota@home:~$ sudo btrfs scrub start /media/Data
> scrub started on /media/Data, fsid b7e4da98-b885-4e70-b0a4-510fb77fa744 (pid=5993)
> kota@home:~$ sudo btrfs scrub status /media/Data
> UUID: b7e4da98-b885-4e70-b0a4-510fb77fa744
> Scrub started: Sun Jan 1 22:00:57 2023
> Status: finished
> Duration: 3:13:29
> Total to scrub: 2.94TiB
> Rate: 265.23MiB/s
> Error summary: no errors found
> kota@home:~$ sudo dmesg
> ...
> [ 320.159758] BTRFS info (device sdb1): scrub: started on devid 1
> [ 320.180088] BTRFS info (device sdb1): scrub: started on devid 2
> [ 320.180491] BTRFS info (device sdb1): scrub: started on devid 3
> [11864.015290] BTRFS info (device sdb1): scrub: finished on devid 1 with status: 0
> [11920.924076] BTRFS info (device sdb1): scrub: finished on devid 2 with status: 0
> [11928.142109] BTRFS info (device sdb1): scrub: finished on devid 3 with status: 0
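The apparent contradiction is that "btrfs device stats" reports persistent, cumulative counters (the mount-time dmesg line above shows the same corrupt 521 surviving the reboot), whereas the scrub reports what it found on this pass. A minimal Python sketch for pulling the two outputs apart, assuming the text formats shown in this transcript; `parse_device_stats`, `nonzero_counters` and `parse_scrub_status` are hypothetical helpers, not btrfs-progs APIs.

```python
import re

# Format taken from the `btrfs device stats` output above: "[device].counter value".
STAT_RE = re.compile(r"^\s*\[(.+?)\]\.(\w+)\s+(\d+)\s*$")

STATS_SAMPLE = """\
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 521
[/dev/sdb1].generation_errs 0
[/dev/sda1].corruption_errs 0
"""

SCRUB_SAMPLE = """\
UUID: b7e4da98-b885-4e70-b0a4-510fb77fa744
Scrub started: Sun Jan 1 22:00:57 2023
Status: finished
Duration: 3:13:29
Total to scrub: 2.94TiB
Rate: 265.23MiB/s
Error summary: no errors found
"""


def parse_device_stats(text):
    """Return {device: {counter: value}} from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        m = STAT_RE.match(line)
        if m:
            device, counter, value = m.group(1), m.group(2), int(m.group(3))
            stats.setdefault(device, {})[counter] = value
    return stats


def nonzero_counters(stats):
    """List (device, counter, value) for every counter that is not zero."""
    return [(dev, name, value)
            for dev, counters in stats.items()
            for name, value in counters.items() if value]


def parse_scrub_status(text):
    """Split `btrfs scrub status` "Key: value" lines into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    flagged = nonzero_counters(parse_device_stats(STATS_SAMPLE))
    scrub = parse_scrub_status(SCRUB_SAMPLE)
    print(flagged)                 # [('/dev/sdb1', 'corruption_errs', 521)]
    print(scrub["Error summary"])  # no errors found
```

A nonzero counter alongside a clean scrub is consistent with stale errors recorded earlier; resetting such counters is what `btrfs device stats -z` is for.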
* Re: Buggy behaviour after replacing failed disk in RAID1
2023-01-01 23:31 ` 小太
@ 2023-01-02 0:15 ` Qu Wenruo
0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2023-01-02 0:15 UTC
Cc: linux-btrfs
On 2023/1/2 07:31, 小太 wrote:
> Performing the scrub afterwards showed there were no errors, despite
> "btrfs device stats" claiming there were corruptions
That's fine. I believe sdb1 is now the old sdd1 (the replace target),
which showed some false errors during the replace (the ones with
mirror 3 during the replace).
You can use "btrfs device stats -z <device>" to clear the false errors.
Thanks,
Qu
end of thread, other threads:[~2023-01-02 0:16 UTC | newest]
Thread overview: 6+ messages
[not found] <CACsxjPaFgBMRkeEgbHcGwM7czSrjtakX9hSKXQq7RL2wJZYYCA@mail.gmail.com>
2023-01-01 0:05 ` Buggy behaviour after replacing failed disk in RAID1 小太
2023-01-01 0:38 ` Qu Wenruo
2023-01-01 5:03 ` 小太
2023-01-01 9:27 ` Qu Wenruo
2023-01-01 23:31 ` 小太
2023-01-02 0:15 ` Qu Wenruo