From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Anand Jain <anand.jain@oracle.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
Date: Mon, 9 Nov 2015 09:09:07 -0500
Message-ID: <5640A903.9030209@gmail.com>
In-Reply-To: <1447066589-3835-1-git-send-email-anand.jain@oracle.com>
On 2015-11-09 05:56, Anand Jain wrote:
> These set of patches provides btrfs hot spare and auto replace support
> for you review and comments.
It's absolutely awesome to see that someone has picked up this project.
It's very useful and helps BTRFS compete with many established storage
technologies. I've got some specific questions below.
>
> First, here below are the simple example steps to configure the same:
>
> Add a spare device:
> btrfs spare add /dev/sde -f
>
> OR if there is a spare device which is already added before the, just
> run
>
> btrfs dev scan [/dev/sde]
>
> this will register the spare device to the kernel.
>
> btrfs fi show
> Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
> Total devices 2 FS bytes used 112.00KiB
> devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
> devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>
> Global spare
> device size 3.00GiB path /dev/sde
Would I be correct in assuming that we can have more than one hot-spare
device at a time? If so, what method is used to select which one to use
when one is needed?
>
> Thats it.
>
> Auto replace:
> Replace happens automatically, that is when there is any write
> failed or flush failed, the device will be marked as failed, which
> will stop any further IO attempt to that device. And in the next commit
> thread cycle the auto replace will pick the spare device (/dev/sde is
> above example) to replace the failed device. And so the btrfs volume is
> back to a healthy state.
Is there any possibility we could add a knob to control how many errors
are needed before the device is marked as failed? For an enterprise
environment, immediately marking the device failed is the right thing to
do, but for home usage it may make more sense to retry the I/O at least
once before marking the device failed. Most home users don't have ECC
memory, and a transient memory error can cause an I/O request to fail
(I've actually had this happen on my laptop before).
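
To make the request concrete, here is a rough userspace sketch of the
kind of knob I mean. It is purely illustrative: DeviceErrorPolicy,
max_errors and record_io_error are made-up names, and nothing here
corresponds to code in the patches.

```python
from dataclasses import dataclass


@dataclass
class DeviceErrorPolicy:
    """Hypothetical per-device failure policy (not part of the patch set)."""

    max_errors: int = 1   # assumed knob: 1 means "fail on the first error"
    errors_seen: int = 0
    failed: bool = False

    def record_io_error(self) -> bool:
        """Count one write/flush error.

        Returns True if the caller should retry the I/O, False if the
        device has been (or already was) marked as failed.
        """
        if self.failed:
            return False
        self.errors_seen += 1
        if self.errors_seen >= self.max_errors:
            # Threshold reached: stop issuing I/O to this device.
            self.failed = True
            return False
        return True
```

With max_errors=1 this matches the behavior described in the cover
letter (fail on the first error); with max_errors=2 the first error is
retried and only the second one marks the device as failed.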
>
>
> Its btrfs Global spare:
> as of now only global hot spare is supported, that is hot spare(s)
> are for all the btrfs FS in the system.
How hard would it be to eventually extend this to per-filesystem hot-spares?
>
> No spare when device failed:
> It would scan for spare device at the rate of transaction commit
> and will trigger the auto replace when ever spare device is added.
Does this absolutely have to be polled on every commit? This has serious
potential to make running on a degraded array hurt performance much more
than it does now. While we obviously want people to notice that their
array is degraded, killing performance is not the proper way to do that.
Couldn't we instead have a callback when a hot-spare is added that checks
for failed devices and initiates the replacement automatically for the
first one found? Ideally, when there is no hot-spare available, we should
keep the current behavior (assume the error was transient, and retry the
I/O).
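
As a rough illustration of the callback idea, here is a small userspace
model. Everything in it is hypothetical (Volume, SparePool, add_spare
and on_device_failed are names I made up, and the real work would be in
the kernel), but it shows the two events that would trigger a replace
without any per-commit polling.

```python
from typing import List, Optional


class Volume:
    """Toy model of one filesystem and its member devices."""

    def __init__(self, name: str, devices: List[str]) -> None:
        self.name = name
        self.devices = list(devices)
        self.failed: List[str] = []  # devices marked failed, oldest first

    def mark_failed(self, dev: str) -> None:
        if dev in self.devices and dev not in self.failed:
            self.failed.append(dev)

    def replace(self, old: str, new: str) -> None:
        self.devices[self.devices.index(old)] = new
        self.failed.remove(old)


class SparePool:
    """Global spare list shared by all volumes (hypothetical)."""

    def __init__(self, volumes: List[Volume]) -> None:
        self.volumes = list(volumes)
        self.spares: List[str] = []

    def add_spare(self, dev: str) -> Optional[str]:
        """Event 1: a spare is added. Use it for the first failed device found."""
        for vol in self.volumes:
            if vol.failed:
                old = vol.failed[0]
                vol.replace(old, dev)
                return f"{vol.name}: {old} -> {dev}"
        self.spares.append(dev)  # nothing is degraded, keep it for later
        return None

    def on_device_failed(self, vol: Volume, dev: str) -> Optional[str]:
        """Event 2: a device fails. Replace it now if a spare is registered."""
        vol.mark_failed(dev)
        if dev not in vol.failed or not self.spares:
            return None  # unknown device, or no spare: stay degraded
        spare = self.spares.pop(0)
        vol.replace(dev, spare)
        return spare
```

The point is only that the degraded case with no spare costs nothing
extra per commit: the array stays degraded until add_spare runs.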