Linux Btrfs filesystem development
* Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
@ 2026-03-31 20:39 Jaron Viëtor
  2026-03-31 21:13 ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Jaron Viëtor @ 2026-03-31 20:39 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I have a machine with a 4-drive RAID1 btrfs filesystem attached to it
over a USB3-to-SATA bridge. The drives are not connected directly over
SATA because it's an Intel NUC, so USB3 is pretty much the only
sensible option.
A few days ago, one of the drives started failing, so I connected a
second USB-to-SATA bridge (the first one can only hold four drives),
inserted a new drive into the new bridge, and ran:

btrfs replace start -r 7 /dev/sdk1 /media
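
(For context: the -r flag tells replace to read from the failing
source drive only when no other good mirror copy exists, which is
exactly what you want when the source is throwing errors.)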

Devid 7 was the failing drive, and sdk1 is the new (larger) replacement.
This all went fine so far, and the replace was happily chugging along
for several hours.
Unfortunately, at around 7.1% done, a kernel crash happened (I believe
it was unrelated to the replace operation, but I can't be sure -
regrettably I didn't save the errors it printed) and I had to reboot
the machine.
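
(For reference, the progress percentage comes from checking something
like:

  btrfs replace status /media

while the replace is running.)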

After the reboot, attempting to mount the filesystem gives these messages:

BTRFS info (device sdk1): first mount of filesystem
d18c93f8-d80a-4aa7-adc5-86d457ddde20
BTRFS info (device sdk1): using crc32c (crc32c-lib) checksum algorithm
BTRFS error (device sdk1): devid 0 path /dev/sdk1 is registered but
not found in chunk tree
BTRFS error (device sdk1): remove the above devices or use 'btrfs
device scan --forget <dev>' to unregister them before mount
BTRFS error (device sdk1): open_ctree failed: -117
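
(For completeness, the suggested forget command takes the form:

  btrfs device scan --forget /dev/sdk1

with the device path adjusted to whichever entry the error names.)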

Running that command and/or unplugging the new replacement drive
instead gives me the following (either action results in the same
messages):

BTRFS info (device sdg1): first mount of filesystem
d18c93f8-d80a-4aa7-adc5-86d457ddde20
BTRFS info (device sdg1): using crc32c (crc32c-lib) checksum algorithm
BTRFS info (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 16, flush 0,
corrupt 1054, gen 0
BTRFS info (device sdg1): bdev /dev/sdh1 errs: wr 0, rd 0, flush 0,
corrupt 379, gen 0
BTRFS info (device sdg1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0,
corrupt 1652, gen 0
BTRFS info (device sdg1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0,
corrupt 1522, gen 0
BTRFS warning (device sdg1): cannot mount because device replace
operation is ongoing and
BTRFS warning (device sdg1): tgtdev (devid 0) is missing, need to run
'btrfs dev scan'?
BTRFS error (device sdg1): failed to init dev_replace: -5
BTRFS error (device sdg1): open_ctree failed: -5

So... it seems to be stuck thinking the new drive both should -and-
shouldn't be there. Huh.

I already asked for help with this issue in the IRC channel, but the
friendly folks there told me after some debugging that this was a
problem for the mailing list. So... here I am!
I do have access to other machines I could potentially connect the
drives directly to... but I'm not inclined to think the USB-to-SATA
bridge(s) is/are the problem here. (Unless somebody here says
otherwise, of course.)

Thanks in advance for any help you may be able to provide!

Kind regards,
Jaron Viëtor


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 20:39 Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash Jaron Viëtor
@ 2026-03-31 21:13 ` Qu Wenruo
  2026-03-31 21:23   ` Jaron Viëtor
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-03-31 21:13 UTC (permalink / raw)
  To: Jaron Viëtor, linux-btrfs



On 2026/4/1 07:09, Jaron Viëtor wrote:
> Hello,
> 
> I have a machine with a 4-drive RAID1 btrfs filesystem attached to it
> over a USB3-to-SATA bridge. The drives are not connected directly over
> SATA because it's an Intel NUC, so USB3 is pretty much the only
> sensible option.
> A few days ago, one of the drives started failing, so I connected a
> second USB-to-SATA bridge (the first one can only hold four drives),
> inserted a new drive into the new bridge, and ran:
> 
> btrfs replace start -r 7 /dev/sdk1 /media
> 
> Devid 7 was the failing drive, and sdk1 is the new (larger) replacement.
> This all went fine so far, and the replace was happily chugging along
> for several hours.
> Unfortunately, at around 7.1% done, a kernel crash happened (I believe
> it was unrelated to the replace operation, but I can't be sure -
> regrettably I didn't save the errors it printed) and I had to reboot
> the machine.
> 
> After the reboot, attempting to mount the filesystem gives these messages:
> 
> BTRFS info (device sdk1): first mount of filesystem
> d18c93f8-d80a-4aa7-adc5-86d457ddde20
> BTRFS info (device sdk1): using crc32c (crc32c-lib) checksum algorithm
> BTRFS error (device sdk1): devid 0 path /dev/sdk1 is registered but
> not found in chunk tree
> BTRFS error (device sdk1): remove the above devices or use 'btrfs
> device scan --forget <dev>' to unregister them before mount
> BTRFS error (device sdk1): open_ctree failed: -117
> 
> Running that command and/or unplugging the new replacement drive
> instead gives me the following (either action results in the same
> messages):

Kernel version please.

And with all devices connected (including the new and failing disks) and
after "btrfs dev scan", does the mount still fail with the same message?

If so, mount with the "degraded" mount option, try to cancel the
replacement, then try mounting again.
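
Something like the following (mount point and device path are only
examples, adjust to yours):

  # mount -o degraded /dev/sdg1 /mnt
  # btrfs replace cancel /mnt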

Thanks,
Qu

> 
> BTRFS info (device sdg1): first mount of filesystem
> d18c93f8-d80a-4aa7-adc5-86d457ddde20
> BTRFS info (device sdg1): using crc32c (crc32c-lib) checksum algorithm
> BTRFS info (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 16, flush 0,
> corrupt 1054, gen 0
> BTRFS info (device sdg1): bdev /dev/sdh1 errs: wr 0, rd 0, flush 0,
> corrupt 379, gen 0
> BTRFS info (device sdg1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0,
> corrupt 1652, gen 0
> BTRFS info (device sdg1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0,
> corrupt 1522, gen 0
> BTRFS warning (device sdg1): cannot mount because device replace
> operation is ongoing and
> BTRFS warning (device sdg1): tgtdev (devid 0) is missing, need to run
> 'btrfs dev scan'?
> BTRFS error (device sdg1): failed to init dev_replace: -5
> BTRFS error (device sdg1): open_ctree failed: -5
> 
> So... it seems to be stuck thinking the new drive both should -and-
> shouldn't be there. Huh.
> 
> I already asked for help with this issue in the IRC channel, but the
> friendly folks there told me after some debugging that this was a
> problem for the mailing list. So... here I am!
> I do have access to other machines I could potentially connect the
> drives directly to... but I'm not inclined to think the USB-to-SATA
> bridge(s) is/are the problem here. (Unless somebody here says
> otherwise, of course.)
> 
> Thanks in advance for any help you may be able to provide!
> 
> Kind regards,
> Jaron Viëtor
> 



* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 21:13 ` Qu Wenruo
@ 2026-03-31 21:23   ` Jaron Viëtor
  2026-03-31 21:31     ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Jaron Viëtor @ 2026-03-31 21:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Mar 31, 2026 at 11:14 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2026/4/1 07:09, Jaron Viëtor wrote:
> > Hello,
> >
> > I have a machine with a 4-drive RAID1 btrfs filesystem attached to it
> > over a USB3-to-SATA bridge. The drives are not connected directly over
> > SATA because it's an Intel NUC, so USB3 is pretty much the only
> > sensible option.
> > A few days ago, one of the drives started failing, so I connected a
> > second USB-to-SATA bridge (the first one can only hold four drives),
> > inserted a new drive into the new bridge, and ran:
> >
> > btrfs replace start -r 7 /dev/sdk1 /media
> >
> > Devid 7 was the failing drive, and sdk1 is the new (larger) replacement.
> > This all went fine so far, and the replace was happily chugging along
> > for several hours.
> > Unfortunately, at around 7.1% done, a kernel crash happened (I believe
> > it was unrelated to the replace operation, but I can't be sure -
> > regrettably I didn't save the errors it printed) and I had to reboot
> > the machine.
> >
> > After the reboot, attempting to mount the filesystem gives these messages:
> >
> > BTRFS info (device sdk1): first mount of filesystem
> > d18c93f8-d80a-4aa7-adc5-86d457ddde20
> > BTRFS info (device sdk1): using crc32c (crc32c-lib) checksum algorithm
> > BTRFS error (device sdk1): devid 0 path /dev/sdk1 is registered but
> > not found in chunk tree
> > BTRFS error (device sdk1): remove the above devices or use 'btrfs
> > device scan --forget <dev>' to unregister them before mount
> > BTRFS error (device sdk1): open_ctree failed: -117
> >
> > Running that command and/or unplugging the new replacement drive
> > instead gives me the following (either action results in the same
> > messages):
>
> Kernel version please.
>
> And with all devices connected (including the new and failing disks) and
> after "btrfs dev scan", does the mount still fail with the same message?
>
> If so, mount with the "degraded" mount option, try to cancel the
> replacement, then try mounting again.

Thanks for your reply!
This is on kernel 6.19.10; the problem (and the replace operation
itself) originally started on kernel 6.18.2. I updated in the hope
that it would fix things, but it made no difference.
Yes, with all drives connected the mount fails with the same message.
Attempting to mount degraded (or ro,degraded) gives the exact same
message, so I have no way to cancel the replacement (that I know of).
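(The attempts were along the lines of 'mount -o degraded /dev/sdg1
/media' and 'mount -o ro,degraded /dev/sdg1 /media'.)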

Kind regards,
Jaron

>
> Thanks,
> Qu
>
> >
> > BTRFS info (device sdg1): first mount of filesystem
> > d18c93f8-d80a-4aa7-adc5-86d457ddde20
> > BTRFS info (device sdg1): using crc32c (crc32c-lib) checksum algorithm
> > BTRFS info (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 16, flush 0,
> > corrupt 1054, gen 0
> > BTRFS info (device sdg1): bdev /dev/sdh1 errs: wr 0, rd 0, flush 0,
> > corrupt 379, gen 0
> > BTRFS info (device sdg1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0,
> > corrupt 1652, gen 0
> > BTRFS info (device sdg1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0,
> > corrupt 1522, gen 0
> > BTRFS warning (device sdg1): cannot mount because device replace
> > operation is ongoing and
> > BTRFS warning (device sdg1): tgtdev (devid 0) is missing, need to run
> > 'btrfs dev scan'?
> > BTRFS error (device sdg1): failed to init dev_replace: -5
> > BTRFS error (device sdg1): open_ctree failed: -5
> >
> > So... it seems to be stuck thinking the new drive both should -and-
> > shouldn't be there. Huh.
> >
> > I already asked for help with this issue in the IRC channel, but the
> > friendly folks there told me after some debugging that this was a
> > problem for the mailing list. So... here I am!
> > I do have access to other machines I could potentially connect the
> > drives directly to... but I'm not inclined to think the USB-to-SATA
> > bridge(s) is/are the problem here. (Unless somebody here says
> > otherwise, of course.)
> >
> > Thanks in advance for any help you may be able to provide!
> >
> > Kind regards,
> > Jaron Viëtor
> >
>


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 21:23   ` Jaron Viëtor
@ 2026-03-31 21:31     ` Qu Wenruo
  2026-03-31 21:54       ` Jaron Viëtor
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-03-31 21:31 UTC (permalink / raw)
  To: Jaron Viëtor; +Cc: linux-btrfs



On 2026/4/1 07:53, Jaron Viëtor wrote:
> On Tue, Mar 31, 2026 at 11:14 PM Qu Wenruo <wqu@suse.com> wrote:
[...]
>>
>> Kernel version please.
>>
>> And with all devices connected (including the new and failing disks) and
>> after "btrfs dev scan", does the mount still fail with the same message?
>>
>> If so, mount with the "degraded" mount option, try to cancel the
>> replacement, then try mounting again.
> 
> Thanks for your reply!
> This is on kernel 6.19.10; the problem (and the replace operation
> itself) originally started on kernel 6.18.2. I updated in the hope
> that it would fix things, but it made no difference.
> Yes, with all drives connected the mount fails with the same message.
> Attempting to mount degraded (or ro,degraded) gives the exact same
> message, so I have no way to cancel the replacement (that I know of).

Then please provide the following dump; any device is fine (except
perhaps the replace target/source device):

  # btrfs ins dump-tree -t root <device>

Then the following dump for each device, including the target and source 
device:

  # btrfs ins dump-super -f <device>
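
For example, to capture all of them in one go (device names taken from
your log, adjust as needed):

  # for d in /dev/sd{g,h,i,j,k}1; do \
        btrfs ins dump-super -f "$d" > "super_${d##*/}.txt"; done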

I remember handling a similar bug before, which turned out to be a
bitflip in the devid.

And just in case, please also run a memtest to rule out any hardware 
memory problems.
You wouldn't believe how frequently such problems are observed,
especially if your NUC is running DDR4 memory.

Thanks,
Qu

> 
> Kind regards,
> Jaron
> 
>>
>> Thanks,
>> Qu


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 21:31     ` Qu Wenruo
@ 2026-03-31 21:54       ` Jaron Viëtor
  2026-03-31 21:58         ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Jaron Viëtor @ 2026-03-31 21:54 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Mar 31, 2026 at 11:31 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2026/4/1 07:53, Jaron Viëtor wrote:
> > On Tue, Mar 31, 2026 at 11:14 PM Qu Wenruo <wqu@suse.com> wrote:
> [...]
> >>
> >> Kernel version please.
> >>
> >> And with all devices connected (including the new and failing disks) and
> >> after "btrfs dev scan", does the mount still fail with the same message?
> >>
> >> If so, mount with the "degraded" mount option, try to cancel the
> >> replacement, then try mounting again.
> >
> > Thanks for your reply!
> > This is on kernel 6.19.10; the problem (and the replace operation
> > itself) originally started on kernel 6.18.2. I updated in the hope
> > that it would fix things, but it made no difference.
> > Yes, with all drives connected the mount fails with the same message.
> > Attempting to mount degraded (or ro,degraded) gives the exact same
> > message, so I have no way to cancel the replacement (that I know of).
>
> Then please provide the following dump; any device is fine (except
> perhaps the replace target/source device):
>
>   # btrfs ins dump-tree -t root <device>
>
> Then the following dump for each device, including the target and source
> device:
>
>   # btrfs ins dump-super -f <device>
>

I've uploaded the dumps here:
https://transfer.ddvtech.com/obowqfcDsl/dumps.tar.gz
/dev/sdh is the failing drive, /dev/sdk is the replacement drive.

> I remember handling a similar bug before, which turned out to be a
> bitflip in the devid.
>
> And just in case, please also run a memtest to rule out any hardware
> memory problems.
> You wouldn't believe how frequently such problems are observed,
> especially if your NUC is running DDR4 memory.

Ok, I'll run a memtest overnight just in case.
Though I should note this system had been running for several months
with no observed problems, so it's -probably- not bad memory.

>
> Thanks,
> Qu
>
> >
> > Kind regards,
> > Jaron
> >
> >>
> >> Thanks,
> >> Qu


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 21:54       ` Jaron Viëtor
@ 2026-03-31 21:58         ` Qu Wenruo
  2026-03-31 22:01           ` Jaron Viëtor
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-03-31 21:58 UTC (permalink / raw)
  To: Jaron Viëtor; +Cc: linux-btrfs



On 2026/4/1 08:24, Jaron Viëtor wrote:
> On Tue, Mar 31, 2026 at 11:31 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> On 2026/4/1 07:53, Jaron Viëtor wrote:
>>> On Tue, Mar 31, 2026 at 11:14 PM Qu Wenruo <wqu@suse.com> wrote:
>> [...]
>>>>
>>>> Kernel version please.
>>>>
>>>> And with all devices connected (including the new and failing disks) and
>>>> after "btrfs dev scan", does the mount still fail with the same message?
>>>>
>>>> If so, mount with the "degraded" mount option, try to cancel the
>>>> replacement, then try mounting again.
>>>
>>> Thanks for your reply!
>>> This is on kernel 6.19.10; the problem (and the replace operation
>>> itself) originally started on kernel 6.18.2. I updated in the hope
>>> that it would fix things, but it made no difference.
>>> Yes, with all drives connected the mount fails with the same message.
>>> Attempting to mount degraded (or ro,degraded) gives the exact same
>>> message, so I have no way to cancel the replacement (that I know of).
>>
>> Then please provide the following dump; any device is fine (except
>> perhaps the replace target/source device):
>>
>>    # btrfs ins dump-tree -t root <device>
>>
>> Then the following dump for each device, including the target and source
>> device:
>>
>>    # btrfs ins dump-super -f <device>
>>
> 
> I've uploaded the dumps here:
> https://transfer.ddvtech.com/obowqfcDsl/dumps.tar.gz
> /dev/sdh is the failing drive, /dev/sdk is the replacement drive.

My bad, I forgot to ask for a dump of the dev tree:

# btrfs ins dump-tree -t dev <device>

Again it's fine to dump it from any good device.

Thanks,
Qu
> 
>> I remember handling a similar bug before, which turned out to be a
>> bitflip in the devid.
>>
>> And just in case, please also run a memtest to rule out any hardware
>> memory problems.
>> You wouldn't believe how frequently such problems are observed,
>> especially if your NUC is running DDR4 memory.
> 
> Ok, I'll run a memtest overnight just in case.
> Though I should note this system had been running for several months
> with no observed problems, so it's -probably- not bad memory.
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> Kind regards,
>>> Jaron
>>>
>>>>
>>>> Thanks,
> >>>> Qu



* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 21:58         ` Qu Wenruo
@ 2026-03-31 22:01           ` Jaron Viëtor
  2026-03-31 22:07             ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Jaron Viëtor @ 2026-03-31 22:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Mar 31, 2026 at 11:58 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2026/4/1 08:24, Jaron Viëtor wrote:
> > On Tue, Mar 31, 2026 at 11:31 PM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >>
> >>
> >> On 2026/4/1 07:53, Jaron Viëtor wrote:
> >>> On Tue, Mar 31, 2026 at 11:14 PM Qu Wenruo <wqu@suse.com> wrote:
> >> [...]
> >>>>
> >>>> Kernel version please.
> >>>>
> >>>> And with all devices connected (including the new and failing disks) and
> >>>> after "btrfs dev scan", does the mount still fail with the same message?
> >>>>
> >>>> If so, mount with the "degraded" mount option, try to cancel the
> >>>> replacement, then try mounting again.
> >>>
> >>> Thanks for your reply!
> >>> This is on kernel 6.19.10; the problem (and the replace operation
> >>> itself) originally started on kernel 6.18.2. I updated in the hope
> >>> that it would fix things, but it made no difference.
> >>> Yes, with all drives connected the mount fails with the same message.
> >>> Attempting to mount degraded (or ro,degraded) gives the exact same
> >>> message, so I have no way to cancel the replacement (that I know of).
> >>
> >> Then please provide the following dump; any device is fine (except
> >> perhaps the replace target/source device):
> >>
> >>    # btrfs ins dump-tree -t root <device>
> >>
> >> Then the following dump for each device, including the target and source
> >> device:
> >>
> >>    # btrfs ins dump-super -f <device>
> >>
> >
> > I've uploaded the dumps here:
> > https://transfer.ddvtech.com/obowqfcDsl/dumps.tar.gz
> > /dev/sdh is the failing drive, /dev/sdk is the replacement drive.
>
> My bad, I forgot to ask for dump-tree of dev tree:
>
> # btrfs ins dump-tree -t dev <device>
>
> Again it's fine to dump it from any good device.

No worries! Here it is: https://transfer.ddvtech.com/pY3LV9yw1L/dump_tree.txt

>
> Thanks,
> Qu
> >
> >> I remember handling a similar bug before, which turned out to be a
> >> bitflip in the devid.
> >>
> >> And just in case, please also run a memtest to rule out any hardware
> >> memory problems.
> >> You wouldn't believe how frequently such problems are observed,
> >> especially if your NUC is running DDR4 memory.
> >
> > Ok, I'll run a memtest overnight just in case.
> > Though I should note this system had been running for several months
> > with no observed problems, so it's -probably- not bad memory.
> >
> >>
> >> Thanks,
> >> Qu
> >>
> >>>
> >>> Kind regards,
> >>> Jaron
> >>>
> >>>>
> >>>> Thanks,
> >>>> Qu
>


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 22:01           ` Jaron Viëtor
@ 2026-03-31 22:07             ` Qu Wenruo
  2026-03-31 22:11               ` Jaron Viëtor
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-03-31 22:07 UTC (permalink / raw)
  To: Jaron Viëtor; +Cc: linux-btrfs



On 2026/4/1 08:31, Jaron Viëtor wrote:
> On Tue, Mar 31, 2026 at 11:58 PM Qu Wenruo <wqu@suse.com> wrote:
[...]
>> My bad, I forgot to ask for a dump of the dev tree:
>>
>> # btrfs ins dump-tree -t dev <device>
>>
>> Again it's fine to dump it from any good device.
> 
> No worries! Here it is: https://transfer.ddvtech.com/pY3LV9yw1L/dump_tree.txt

And the last one, from the chunk tree:

# btrfs ins dump-tree -t chunk <device>

So far the results look fine; nothing wrong yet.
All device items look correct, with unique UUIDs and correct devids.
The dev-replace item in the dev tree also looks sane.

The next thing to check is the device items in the chunk tree.

Thanks,
Qu


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 22:07             ` Qu Wenruo
@ 2026-03-31 22:11               ` Jaron Viëtor
  2026-03-31 22:33                 ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Jaron Viëtor @ 2026-03-31 22:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Apr 1, 2026 at 12:07 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2026/4/1 08:31, Jaron Viëtor wrote:
> > On Tue, Mar 31, 2026 at 11:58 PM Qu Wenruo <wqu@suse.com> wrote:
> [...]
> >> My bad, I forgot to ask for a dump of the dev tree:
> >>
> >> # btrfs ins dump-tree -t dev <device>
> >>
> >> Again it's fine to dump it from any good device.
> >
> > No worries! Here it is: https://transfer.ddvtech.com/pY3LV9yw1L/dump_tree.txt
>
> And the last one, from the chunk tree:
>
> # btrfs ins dump-tree -t chunk <device>

Here you go:
https://transfer.ddvtech.com/FAXxEEq71p/dump_tree_chunk.txt

>
> So far the results look fine; nothing wrong yet.
> All device items look correct, with unique UUIDs and correct devids.
> The dev-replace item in the dev tree also looks sane.
>
> The next thing to check is the device items in the chunk tree.
>
> Thanks,
> Qu


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 22:11               ` Jaron Viëtor
@ 2026-03-31 22:33                 ` Qu Wenruo
  2026-03-31 22:47                   ` Jaron Viëtor
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-03-31 22:33 UTC (permalink / raw)
  To: Jaron Viëtor; +Cc: linux-btrfs



On 2026/4/1 08:41, Jaron Viëtor wrote:
> On Wed, Apr 1, 2026 at 12:07 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> On 2026/4/1 08:31, Jaron Viëtor wrote:
>>> On Tue, Mar 31, 2026 at 11:58 PM Qu Wenruo <wqu@suse.com> wrote:
>> [...]
>>>> My bad, I forgot to ask for a dump of the dev tree:
>>>>
>>>> # btrfs ins dump-tree -t dev <device>
>>>>
>>>> Again it's fine to dump it from any good device.
>>>
>>> No worries! Here it is: https://transfer.ddvtech.com/pY3LV9yw1L/dump_tree.txt
>>
>> And the last one, from the chunk tree:
>>
>> # btrfs ins dump-tree -t chunk <device>
> 
> Here you go:
> https://transfer.ddvtech.com/FAXxEEq71p/dump_tree_chunk.txt

Thanks a lot.

All the dumps show that everything is fine, except we do not have a 
device item in the chunk tree for the replace target device.
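
(You can see this directly in your dump, e.g. with something like:

  # btrfs ins dump-tree -t chunk /dev/sdg1 | grep DEV_ITEM

which lists items for the four existing devids only, none for the
target.)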

After some more digging, it looks like we didn't actually insert the
device item for the target device into the chunk tree.

So it triggered a false alert from btrfs_verify_dev_items(), which
rejected the mount.

Thankfully the check was only introduced in v6.19 and should not have
been backported to older kernels.

You can try to downgrade the kernel to v6.18 or even the latest LTS
(v6.12), and try to mount again.
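
For example, after booting the older kernel (paths are just examples):

  # uname -r                # confirm a pre-6.19 kernel is running
  # mount /dev/sdg1 /media

The interrupted replace should then resume on its own after the mount.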

Meanwhile I'll need to address the regression properly.

Thanks,
Qu


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 22:33                 ` Qu Wenruo
@ 2026-03-31 22:47                   ` Jaron Viëtor
  2026-04-07  5:41                     ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Jaron Viëtor @ 2026-03-31 22:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Apr 1, 2026 at 12:33 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2026/4/1 08:41, Jaron Viëtor wrote:
> > On Wed, Apr 1, 2026 at 12:07 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >>
> >>
> >> On 2026/4/1 08:31, Jaron Viëtor wrote:
> >>> On Tue, Mar 31, 2026 at 11:58 PM Qu Wenruo <wqu@suse.com> wrote:
> >> [...]
> >>>> My bad, I forgot to ask for a dump of the dev tree:
> >>>>
> >>>> # btrfs ins dump-tree -t dev <device>
> >>>>
> >>>> Again it's fine to dump it from any good device.
> >>>
> >>> No worries! Here it is: https://transfer.ddvtech.com/pY3LV9yw1L/dump_tree.txt
> >>
> >> And the last one, from the chunk tree:
> >>
> >> # btrfs ins dump-tree -t chunk <device>
> >
> > Here you go:
> > https://transfer.ddvtech.com/FAXxEEq71p/dump_tree_chunk.txt
>
> Thanks a lot.
>
> All the dumps show that everything is fine, except we do not have a
> device item in the chunk tree for the replace target device.
>
> After some more digging, it looks like we didn't actually insert the
> device item for the target device into the chunk tree.
>
> So it triggered a false alert from btrfs_verify_dev_items(), which
> rejected the mount.
>
> Thankfully the check was only introduced in v6.19 and should not have
> been backported to older kernels.
>
> You can try to downgrade the kernel to v6.18 or even the latest LTS
> (v6.12), and try to mount again.
>
> Meanwhile I'll need to address the regression properly.

That seems to have done the trick!

BTRFS info (device sdb1): first mount of filesystem
d18c93f8-d80a-4aa7-adc5-86d457ddde20
BTRFS info (device sdb1): using crc32c (crc32c-lib) checksum algorithm
BTRFS info (device sdb1): bdev /dev/sdh1 errs: wr 0, rd 16, flush 0,
corrupt 1054, gen 0
BTRFS info (device sdb1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0,
corrupt 379, gen 0
BTRFS info (device sdb1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0,
corrupt 1652, gen 0
BTRFS info (device sdb1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0,
corrupt 1522, gen 0
BTRFS info (device sdb1): start tree-log replay
BTRFS info (device sdb1): enabling free space tree
BTRFS info (device sdb1): continuing dev_replace from /dev/sdi1 (devid
7) to target /dev/sdb1 @7%

Thanks for the super fast response, and good luck patching the regression.

>
> Thanks,
> Qu


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-03-31 22:47                   ` Jaron Viëtor
@ 2026-04-07  5:41                     ` Qu Wenruo
  2026-04-07 11:31                       ` Jaron Viëtor
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2026-04-07  5:41 UTC (permalink / raw)
  To: Jaron Viëtor; +Cc: linux-btrfs



On 2026/4/1 09:17, Jaron Viëtor wrote:
> On Wed, Apr 1, 2026 at 12:33 AM Qu Wenruo <wqu@suse.com> wrote:
[...]
> 
> That seems to have done the trick!
> 
> BTRFS info (device sdb1): first mount of filesystem
> d18c93f8-d80a-4aa7-adc5-86d457ddde20
> BTRFS info (device sdb1): using crc32c (crc32c-lib) checksum algorithm
> BTRFS info (device sdb1): bdev /dev/sdh1 errs: wr 0, rd 16, flush 0,
> corrupt 1054, gen 0
> BTRFS info (device sdb1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0,
> corrupt 379, gen 0
> BTRFS info (device sdb1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0,
> corrupt 1652, gen 0
> BTRFS info (device sdb1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0,
> corrupt 1522, gen 0
> BTRFS info (device sdb1): start tree-log replay
> BTRFS info (device sdb1): enabling free space tree
> BTRFS info (device sdb1): continuing dev_replace from /dev/sdi1 (devid
> 7) to target /dev/sdb1 @7%
> 
> Thanks for the super fast response, and good luck patching the regression.

Sorry to bother you again - I hope the replace finished without
problems.

If it did, would you mind dumping the dev tree again?

# btrfs ins dump-tree -t dev <device>

Recently I believe I found another minor bug: if there is a device
stats item for the replace target device (recording how many errors
were hit on that device), it will stay there forever.

Normally that dev stats item should show no errors, but if it does,
then the next time a new dev-replace is initialized, the new device
will inherit those old numbers.
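
(Those per-device counters are the same ones reported by, for example:

  # btrfs device stats <mount>

i.e. the wr/rd/flush/corrupt/gen numbers seen in your mount logs.)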

It would help a lot if such a dump could be provided.

Thanks,
Qu


* Re: Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash
  2026-04-07  5:41                     ` Qu Wenruo
@ 2026-04-07 11:31                       ` Jaron Viëtor
  0 siblings, 0 replies; 13+ messages in thread
From: Jaron Viëtor @ 2026-04-07 11:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 7, 2026 at 7:41 AM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2026/4/1 09:17, Jaron Viëtor wrote:
> > On Wed, Apr 1, 2026 at 12:33 AM Qu Wenruo <wqu@suse.com> wrote:
> [...]
> >
> > That seems to have done the trick!
> >
> > BTRFS info (device sdb1): first mount of filesystem
> > d18c93f8-d80a-4aa7-adc5-86d457ddde20
> > BTRFS info (device sdb1): using crc32c (crc32c-lib) checksum algorithm
> > BTRFS info (device sdb1): bdev /dev/sdh1 errs: wr 0, rd 16, flush 0,
> > corrupt 1054, gen 0
> > BTRFS info (device sdb1): bdev /dev/sdi1 errs: wr 0, rd 0, flush 0,
> > corrupt 379, gen 0
> > BTRFS info (device sdb1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0,
> > corrupt 1652, gen 0
> > BTRFS info (device sdb1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0,
> > corrupt 1522, gen 0
> > BTRFS info (device sdb1): start tree-log replay
> > BTRFS info (device sdb1): enabling free space tree
> > BTRFS info (device sdb1): continuing dev_replace from /dev/sdi1 (devid
> > 7) to target /dev/sdb1 @7%
> >
> > Thanks for the super fast response, and good luck patching the regression.
>
> Sorry to bother you again - I hope the replace finished without
> problems.

It did, thanks! (Though a day after the replace finished, another
drive failed - just bad luck, I guess - I'm going to start another
replace today...)

>
> If it did, would you mind dumping the dev tree again?
>
> # btrfs ins dump-tree -t dev <device>

Here you go: https://transfer.ddvtech.com/koToYlIWj9/replaced_device_dev_tree.txt

>
> Recently I believe I found another minor bug: if there is a device
> stats item for the replace target device (recording how many errors
> were hit on that device), it will stay there forever.
>
> Normally that dev stats item should show no errors, but if it does,
> then the next time a new dev-replace is initialized, the new device
> will inherit those old numbers.
>
> It would help a lot if such a dump could be provided.
>
> Thanks,
> Qu


Thread overview: 13+ messages
2026-03-31 20:39 Problem mounting 4-drive RAID1 fs after replace was interrupted by kernel crash Jaron Viëtor
2026-03-31 21:13 ` Qu Wenruo
2026-03-31 21:23   ` Jaron Viëtor
2026-03-31 21:31     ` Qu Wenruo
2026-03-31 21:54       ` Jaron Viëtor
2026-03-31 21:58         ` Qu Wenruo
2026-03-31 22:01           ` Jaron Viëtor
2026-03-31 22:07             ` Qu Wenruo
2026-03-31 22:11               ` Jaron Viëtor
2026-03-31 22:33                 ` Qu Wenruo
2026-03-31 22:47                   ` Jaron Viëtor
2026-04-07  5:41                     ` Qu Wenruo
2026-04-07 11:31                       ` Jaron Viëtor
