Re: BTRFS bad block management. Does it exist?

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: waxhead@dirtcellar.net, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS bad block management. Does it exist?
Date: Mon, 15 Oct 2018 08:09:37 -0400
Message-ID: <3af0d10e-f7d8-806e-8ce5-91295e0ed6d7@gmail.com>
In-Reply-To: <42b1965a-356c-25c9-8c49-788a9a8a11aa@dirtcellar.net>

On 2018-10-14 07:08, waxhead wrote:
> If BTRFS fails to WRITE to a disk, what happens?
> Does the bad area get mapped out somehow? Does it try again until it
> succeeds, "times out", or reaches a threshold counter?
> Does it eventually try to write to a different disk (when using
> the raid1/10 profile)?

Building on Qu's answer (which is absolutely correct), BTRFS makes the
perfectly reasonable assumption that you're not trying to use known bad
hardware.  It's not alone in this respect either: pretty much every
Linux filesystem makes the exact same assumption (and almost all
non-Linux ones too).  The only exception is ext[234], but they only
support it statically (you can set the bad block list at mkfs time, but
not afterwards, and they don't update it at runtime), and it's a holdover
from earlier filesystems which originated at a time when storage was
sufficiently expensive _and_ unreliable that you kept using disks until
they were essentially completely dead.

The reality is that with modern storage hardware, if you have 
persistently bad sectors the device is either defective (and should be 
returned under warranty), or it's beyond expected EOL (and should just 
be replaced).  Most people know about SSDs doing block remapping to
avoid bad blocks, but hard drives do it too, and they're actually rather
good at it.  In both cases, enough spare blocks are provided that the
device can handle average rates of media errors through the entirety of
its average life expectancy without running out of spare blocks.

On top of all of that though, it's fully possible to work around bad 
blocks in the block layer if you take the time to actually do it.  With 
a bit of reasonably simple math, you can easily set up an LVM volume 
that actively avoids all the bad blocks on a disk while still fully 
utilizing the rest of the volume.  Similarly, with a bit of work (and a 
partition table that supports _lots_ of partitions) you can work around 
bad blocks with an MD concatenated device.
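
To make the "reasonably simple math" for the LVM case concrete, here is a
rough sketch in Python.  It is only an illustration under stated
assumptions, not a tested procedure: the bad sector numbers are assumed to
be 512-byte sectors counted from the start of the PV (not the whole disk),
the values for the PE start offset, PE size and PE count are made up and
would have to come from your own PV, and the function names and the device
name /dev/sdX1 are hypothetical.  It maps each bad sector to the physical
extent (PE) that contains it, then lists the remaining contiguous PE ranges
in the PV:PE-PE form that the LVM tools accept for restricting allocation.

```python
from typing import Iterable, List, Tuple


def bad_extents(bad_sectors: Iterable[int], pe_start: int,
                pe_size: int, pe_count: int) -> List[int]:
    """Return sorted PE indices that contain at least one bad sector.

    All arguments are in sectors, relative to the start of the PV.
    pe_start is the offset of the first PE (the metadata area precedes it).
    """
    bad = set()
    for sector in bad_sectors:
        if sector < pe_start:
            # Falls in the LVM metadata area; this sketch does not handle it.
            continue
        pe = (sector - pe_start) // pe_size
        if pe < pe_count:
            bad.add(pe)
    return sorted(bad)


def good_ranges(bad: List[int], pe_count: int) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) PE ranges that skip every bad PE.

    'bad' must be sorted and free of duplicates, as bad_extents() returns it.
    """
    ranges = []
    start = 0
    for pe in bad:
        if pe > start:
            ranges.append((start, pe - 1))
        start = pe + 1
    if start < pe_count:
        ranges.append((start, pe_count - 1))
    return ranges


def pv_argument(device: str, ranges: List[Tuple[int, int]]) -> str:
    """Format the ranges as a PV:PE-PE[:PE-PE...] argument string."""
    return device + "".join(f":{first}-{last}" for first, last in ranges)


if __name__ == "__main__":
    # Illustrative values only: 1 MiB metadata area, 4 MiB extents.
    PE_START, PE_SIZE, PE_COUNT = 2048, 8192, 1000
    sectors = [PE_START + 5 * PE_SIZE + 17,
               PE_START + 5 * PE_SIZE + 900,
               PE_START + 42 * PE_SIZE]
    bad = bad_extents(sectors, PE_START, PE_SIZE, PE_COUNT)
    print(pv_argument("/dev/sdX1", good_ranges(bad, PE_COUNT)))
```

The same range computation would apply to the MD approach, with partition
boundaries in place of PE ranges.  Note that a bad sector costs you the
whole extent it sits in, so a smaller PE size wastes less space.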

Thread overview: 4+ messages
2018-10-14 11:08 BTRFS bad block management. Does it exist? waxhead
2018-10-14 11:31 ` Qu Wenruo
2018-10-15 12:09 ` Austin S. Hemmelgarn [this message]
2018-10-16  9:57 ` Anand Jain