Re: How to replace a failing device

From: Roman Mamedov <rm@romanrm.net>
To: Matt Huszagh <huszaghmatt@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: How to replace a failing device
Date: Fri, 4 Nov 2022 03:32:40 +0500
Message-ID: <20221104033240.7f219ed4@nvm>
In-Reply-To: <87sfiz3egt.fsf@gmail.com>

On Thu, 03 Nov 2022 14:39:14 -0700
Matt Huszagh <huszaghmatt@gmail.com> wrote:

> I was able to run btrfs scrub successfully with the problematic drive
> removed. Logs show that the following file has a checksum error:
> 
> BTRFS warning (device dm-0): checksum error at logical 10087829524480 on dev /dev/dm-4, physical 1883324207104, root 28842, inode 27543115, offset 74526720, length 4096, links 1 (path: matt/.recoll/library/xapiandb/docdata.glass)
> 
> What can I do to get BTRFS to no longer report a checksum error? Do I
> need to delete this along with all snapshots that contain it?

I believe that's one way, yes; or truncate it to zero bytes, or fully
overwrite it with a copy from a backup.
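
As a rough sketch of the last two options (not a tested recovery procedure),
the Python below pulls the fields out of a warning line in the format quoted
above and then either truncates the file or overwrites it from a backup. The
function names and the regular expression are my own assumptions; other
kernel versions may word the message differently, and the reported path is
relative to the subvolume, so you have to supply the absolute path yourself.

```python
import os
import re
import shutil

# Assumed pattern, written against the one warning line quoted in this thread.
WARNING_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"physical (?P<physical>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links (?P<links>\d+) "
    r"\(path: (?P<path>.*)\)"
)

NUMERIC_FIELDS = ("logical", "physical", "root", "inode", "offset", "length", "links")


def parse_checksum_warning(line):
    """Return the fields of a scrub checksum warning as a dict, or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in NUMERIC_FIELDS:
        info[key] = int(info[key])
    return info


def replace_damaged_file(path, backup=None):
    """Truncate `path` to zero bytes, or overwrite it with `backup` if given."""
    if backup is None:
        os.truncate(path, 0)
    else:
        # copyfile rewrites the whole destination, so every block is new data.
        shutil.copyfile(backup, path)
```

Snapshots that still contain the file are not touched by this, so they would
have to be dealt with separately.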

> Ok, thanks for the input. But in theory, BTRFS with a redundant data
> RAID (such as RAID1 or RAID10) should allow scrub to preserve all data
> if a single drive fails, no?

It could; what I meant is that the experience you get when going through a
disk failure does not seem polished enough.

For a start, the filesystem will refuse to mount and will require the
"degraded" mount option. Then, newly written data may get the "single" data
profile, and mounting with "degraded" a second time may fail. To work around
this you could add the "ro" mount option as well, except that FS-level
operations like adding or replacing a device will then not work. How one gets
out of this, I am not entirely sure.
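
To make the two mount variants concrete, here is a minimal sketch that only
builds the mount(8) command line and does not run it. The helper name, device
and mount point are hypothetical; "degraded" and "ro" are the options
mentioned above.

```python
import shlex


def degraded_mount_cmd(device, mountpoint, read_only=False):
    """Build (but do not execute) a mount command using the degraded option."""
    options = ["degraded"]
    if read_only:
        # Read-only avoids new writes, but also blocks device add/replace.
        options.append("ro")
    return ["mount", "-o", ",".join(options), device, mountpoint]


if __name__ == "__main__":
    # /dev/sdb and /mnt are placeholders, not taken from the original report.
    print(shlex.join(degraded_mount_cmd("/dev/sdb", "/mnt")))
    print(shlex.join(degraded_mount_cmd("/dev/sdb", "/mnt", read_only=True)))
```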

There have also been some reports of bad experiences with a disk failing
during normal operation (i.e. a device going away, or starting to fail reads
or writes). In theory a RAID system should barely notice such an event and
let the user smoothly disconnect the failed drive and pop in a replacement,
then continue working without even a reboot. But judging by the general
impression from user reports, in Btrfs that currently remains a theory at best.

-- 
With respect,
Roman
