* Re: replacing a disk in a btrfs multi disk array with raid10
From: Norbert Preining @ 2020-08-03 7:47 UTC
To: Btrfs BTRFS
Hi Chris,
thanks for your answer, that is very much appreciated.
On Mon, 03 Aug 2020, Chris Murphy wrote:
> Some of these are considered normal. I suggest making sure each
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
Thanks, will read up on that.
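(For my own reference, checking and adjusting this should look roughly like
the following, assuming /dev/sdX stands for the drive in question:

  smartctl -l scterc /dev/sdX          # show the drive's SCT ERC setting, if supported
  smartctl -l scterc,70,70 /dev/sdX    # set read/write error recovery to 7 seconds
  cat /sys/block/sdX/device/timeout    # kernel SCSI command timer in seconds, 30 by default

i.e. the first value needs to stay below the second.)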
> Once you've done that, do a btrfs scrub.
Happening regularly, but I will kick one off anyway.
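For the record, that boils down to something like:

  btrfs scrub start /     # start a scrub of the whole filesystem
  btrfs scrub status /    # check progress and error counters later on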
> btrfs replace will work whether the drive is present or not. It's just
> safer to do it with the drive present because you don't have to mount
> degraded.
Ok.
I wasn't sure whether I could mount without -o degraded, since all the
metadata and data are on raid1. And then there is the question of what the
Debian initramfs does in that case - that is probably where the more
interesting surprises are.
> > - add the new device
>
> Use 'btrfs replace'
Thanks, noted.
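If I read the man page correctly, with the old disk still attached that
would be roughly (old/new device names are placeholders):

  btrfs replace start /dev/old-disk /dev/new-disk /   # copy data over to the new device
  btrfs replace status /                              # watch progress

and with the old disk already gone, its numeric devid from
'btrfs filesystem show' goes in place of /dev/old-disk.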
> Currently 'btrfs replace' does require a separate resize step. 'device
> add' doesn't, resize is implied by the command.
That seems like a logical approach, I agree.
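So after the replace, something like this should grow the filesystem onto
the bigger disk, with <devid> being the id of the replaced device as shown
by 'btrfs filesystem show':

  btrfs filesystem resize <devid>:max /   # expand that device's slice to the full disk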
> > - start a new rebalancing
> > (for the rebalance, do I need to give the
> > same -dconvert=raid1 -mconvert=raid1 arguments?)
>
> Not necessary. But it's worth checking 'btrfs fi us -T' and making
> sure everything is raid1 as you expect.
Thanks, good to know.
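That would be:

  btrfs filesystem usage -T /   # per-device table; Data, Metadata and System should all show RAID1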
Again, thanks a lot for all the details - I couldn't deduce most of them
from the wiki page on multiple devices. Your email is extremely helpful!
All the best
Norbert
--
PREINING Norbert https://www.preining.info
Accelia Inc. + IFMGA ProGuide + TU Wien + JAIST + TeX Live + Debian Dev
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
* Re: replacing a disk in a btrfs multi disk array with raid10
From: Norbert Preining @ 2020-10-09 4:20 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS
Hi Chris,
(please Cc)
sorry for the late reply - real life.
It turned out that the disk I am using is well known to misreport this
property, so the warning can be ignored.
But I had to deal with the (temporary) loss of one disk. Fortunately,
Debian's initramfs dropped me into a proper shell where I could mount
the array in degraded mode and just remove the missing device.
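From memory (device name and mount point are only illustrative), that was
roughly:

  mount -o degraded /dev/sda1 /mnt     # mount the array writable despite the missing disk
  btrfs device remove missing /mnt     # drop the absent device and let btrfs relocate its chunks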
Just one hiccup along the way: *after* some time I was able to re-connect
the one disk from the array that had gone missing (I needed an x1 NVMe
extender, which I didn't have at the beginning). I thought reconnecting
would be as simple as
btrfs device add -f /dev/nvme0n1p1 /
but it turned out that, because the disk had been part of the array
before, it was rejected. Even the -f option did not help. In the end I
had to fdisk the drive and wipe both the partition table and the btrfs
signature to get it ready to be re-added.
Full story https://www.preining.info/blog/2020/09/dealing-with-lost-disks-in-a-btrfs-array/
Anyway, all surprisingly smooth. Thanks to all of you.
Best
Norbert
On Mon, 03 Aug 2020, Chris Murphy wrote:
> On Sun, Aug 2, 2020 at 11:51 PM Norbert Preining <norbert@preining.info> wrote:
> >
> > Hi all
> >
> > (please Cc)
> >
> > I am running Linux 5.7 or 5.8 on a btrfs array of 7 disks, with metadata
> > and data both on raid1, which contains the complete system.
> > (btrfs balance start -dconvert=raid1 -mconvert=raid1 /)
> >
> > Although btrfs device stats / doesn't show any errors, SMART warns about
> > one disk (reallocated sector count property) and I was pondering
> > replacing the device.
>
> Some of these are considered normal. I suggest making sure each
> drive's SCT ERC value is less than the SCSI command timer. You want
> the drive to give up on reading a sector before the kernel considers
> the command "overdue" and does a link reset - losing the contents of
> the command queue. Upon read error, the drive reports the sector LBA
> so that Btrfs can automatically do a fixup.
>
> More info here. It applies to mdadm, lvm, and Btrfs raid.
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>
> Once you've done that, do a btrfs scrub.
>
> >
> > What is the currently suggested method given that I cannot plug in
> > another disk into the computer, all slots are used up (thus a btrfs
> > replace will not work as far as I understand).
>
> btrfs replace will work whether the drive is present or not. It's just
> safer to do it with the drive present because you don't have to mount
> degraded.
>
>
> > Do I need to:
> > - shutdown
> > - pysically replace disk
> > - reboot into rescue system
> > - mount in degraded mode
> > - add the new device
>
> Use 'btrfs replace'
>
> > - resize the file system (new disk would be bigger)
>
> Currently 'btrfs replace' does require a separate resize step. 'device
> add' doesn't, resize is implied by the command.
>
>
> > - start a new rebalancing
> > (for the rebalance, do I need to give the
> > same -dconvert=raid1 -mconvert=raid1 arguments?)
>
> Not necessary. But it's worth checking 'btrfs fi us -T' and making
> sure everything is raid1 as you expect.
--
PREINING Norbert https://www.preining.info
Accelia Inc. + IFMGA ProGuide + TU Wien + JAIST + TeX Live + Debian Dev
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13