Re: How to replace a failing device

From: Matt Huszagh <huszaghmatt@gmail.com>
To: Roman Mamedov <rm@romanrm.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: How to replace a failing device
Date: Wed, 02 Nov 2022 20:51:11 -0700
Message-ID: <87v8nw3dcg.fsf@gmail.com>
In-Reply-To: <20221102003232.097748e7@nvm>

Roman Mamedov <rm@romanrm.net> writes:

> Remove this cryptsdc2 from the FS (btrfs dev remove), stop the crypto device,
> wipe sdc2 entirely with wipefs.
>
> Then power-off, boot into a rescue system such as grml.org, and use "ddrescue"
> to copy the entire content of ex-sdd1 to ex-sdc2.
>
> After "ddrescue" manages to copy whatever it could, power-off and remove the
> old failing "sdd" from the system. Do not boot the main OS with both disks
> still plugged in. You can wipe the failing one later (after verifying the
> created copy is good) on some other PC, or booting into the same rescue system
> again.

Thanks so much for the help Roman. I was able to mostly recover the old
device and am now running my computer with ex-sdc2. ddrescue reported
99.99% data rescued with 2 bad sectors. Since I didn't have any trouble
backing up recent data, I'm optimistic I haven't lost anything of value.
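
To judge what the 2 bad sectors might have touched, the ddrescue mapfile can be read back to list the unrecovered byte ranges. The following is only a sketch, assuming ddrescue was run with a mapfile and that the mapfile follows the usual GNU ddrescue layout (comment lines starting with "#", one current-status line, then "pos size status" lines where "-" marks a bad-sector block). The function name bad_ranges is hypothetical.

```python
def bad_ranges(mapfile_text):
    """Return (offset, size) byte ranges that ddrescue marked as bad ('-').

    Assumes the GNU ddrescue mapfile layout: '#' comment lines, then one
    current-status line, then data lines of the form 'pos size status'.
    """
    ranges = []
    seen_status_line = False
    for line in mapfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not seen_status_line:
            # First non-comment line holds current_pos/current_status; skip it.
            seen_status_line = True
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        # Offsets and sizes are normally hex with a 0x prefix; base 0 accepts
        # either hex or decimal.
        pos, size, status = int(fields[0], 0), int(fields[1], 0), fields[2]
        if status == "-":
            ranges.append((pos, size))
    return ranges
```

Dividing each offset by the sector size gives the sector numbers on the source device. Mapping those back to files on the filesystem is a separate step not covered here.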

I'm investigating RAID configurations (probably RAID10) as a way to make
the process of replacing faulty drives somewhat smoother in the
future. If you have any opinions on this, I'd be curious to hear
them. I'll probably also set up a periodic systemd service to run
smartctl and detect issues (hopefully) earlier.
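
One possible shape for such a periodic check, as a minimal sketch: smartctl reports problems through a bitmask in its exit status (documented in smartctl(8)), so a script run from a systemd timer could decode that status and log or alert on any set bit. The function names below are hypothetical, the message strings are paraphrased from the man page, and the option set passed to smartctl is an assumption rather than a recommendation.

```python
import subprocess

# Meaning of each bit in smartctl's exit status, paraphrased from smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or other command to the disk failed",
    3: "SMART status reports DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return a list of human-readable problems for a smartctl exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


def check_device(device):
    """Run smartctl on one device and return the decoded problems.

    Requires smartctl to be installed and usually root privileges.
    """
    result = subprocess.run(
        ["smartctl", "-H", "-A", "-l", "error", "-l", "selftest", device],
        capture_output=True,
        text=True,
    )
    return decode_smartctl_status(result.returncode)
```

A timer-driven service could call check_device("/dev/disk/by-id/...") for each disk and exit non-zero when the returned list is not empty, so that the failure shows up in the journal.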

> The reason for calling them
> "ex", is because it is extremely unfortunate to refer to disks by their sd*
> names like that, as those can vary per kernel, distro, random delays during
> detection, etc -- and of course, the controller ports used. So double-check to
> ensure you get the source and destination right.

Yeah, this was a poor choice that I made a while back and never
updated. I've now changed my configuration to refer to them by the
device serial number (as reported by smartctl -i) and partition (e.g.,
/dev/mapper/S5H9NC0MC06674P_p2). The disks are found by UUID, so this
should be more correct.
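
For double-checking which kernel name a given serial number currently maps to, the udev-maintained symlinks under /dev/disk/by-id can be resolved. This is a small sketch and not the configuration described above. It assumes a system where udev populates /dev/disk/by-id, and the function name stable_names is hypothetical.

```python
import os


def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each resolved device node to the persistent names that point at it."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # realpath follows the symlink to the current kernel name (e.g. an sd* node).
        target = os.path.realpath(link)
        mapping.setdefault(target, []).append(name)
    return mapping
```

Printing the result before a destructive step such as wipefs or ddrescue gives one more chance to confirm that source and destination are the intended disks.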

> Of course this eschews the question of why Btrfs is not behaving in a more
> desirable way in these circumstances (maybe someone can weigh in on that), and
> does not use its native tools to recover, but this feels to be just the most
> straightforward idea of "what you can do" right now, to bring the system back
> to working order.

Not sure if it will help, but I'll update my Linux kernel version, which
I haven't done in a while. Still curious why scrub wasn't helping
though.

Thank you!
Matt

Thread overview: 9+ messages
2022-11-01 19:13 How to replace a failing device Matt Huszagh
2022-11-01 19:32 ` Roman Mamedov
2022-11-01 19:44   ` Roman Mamedov
2022-11-03  3:51   ` Matt Huszagh [this message]
2022-11-03  4:19     ` Andrei Borzenkov
2022-11-03  4:25       ` Matt Huszagh
2022-11-03 12:18     ` Roman Mamedov
2022-11-03 21:39       ` Matt Huszagh
2022-11-03 22:32         ` Roman Mamedov
