linux-btrfs.vger.kernel.org archive mirror
* raid6 + hot spare question
From: Peter Keše @ 2015-09-08 11:59 UTC
  To: linux-btrfs


I'm planning to set up a raid6 array with 4 x 4TB drives.
Presumably that would result in 8TB of usable space + parity, which is 
about enough for my data (my data is currently 5TB in raid1, slowly 
growing at about 1 TB per year, but I often keep some additional backups 
if space permits).

However I'd like to be prepared for a disk failure. Because my server is 
not easily accessible and disk replacement times can be long, I'm 
considering the idea of making a 5-drive raid6, thus getting 12TB 
usable space + parity. In this case, the extra 4TB drive would serve as 
some sort of a hot spare.
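
For reference, the kind of creation command I have in mind (device
names are placeholders, data and metadata both raid6):

  mkfs.btrfs -L pool -d raid6 -m raid6 \
      /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
  mount /dev/sda /mnt/pool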

My assumption is that if one hard drive fails before the volume is more 
than 8TB full, I can just rebalance and resize the volume from 12 TB 
back to 8 TB (essentially going from 5-drive raid6 to 4-drive raid6).

Can anyone confirm my assumption? Can I indeed rebalance from 5-drive 
raid6 to 4-drive raid6 if the volume is not too big?

Thanks,
     Peter


* Re: raid6 + hot spare question
From: Hugo Mills @ 2015-09-08 12:12 UTC
  To: Peter Keše; +Cc: linux-btrfs

On Tue, Sep 08, 2015 at 01:59:19PM +0200, Peter Keše wrote:
> 
> I'm planning to set up a raid6 array with 4 x 4TB drives.
> Presumably that would result in 8TB of usable space + parity, which
> is about enough for my data (my data is currently 5TB in raid1,
> slowly growing at about 1 TB per year, but I often keep some
> additional backups if space permits).
> 
> However I'd like to be prepared for a disk failure. Because my
> server is not easily accessible and disk replacement times can be
> long, I'm considering the idea of making a 5-drive raid6, thus
> getting 12TB useable space + parity. In this case, the extra 4TB
> drive would serve as some sort of a hot spare.
> 
> My assumption is that if one hard drive fails before the volume is
> more than 8TB full, I can just rebalance and resize the volume from
> 12 TB back to 8 TB essentially going from 5-drive raid6 to 4-drive
> raid6).
> 
> Can anyone confirm my assumption? Can I indeed rebalance from
> 5-drive raid6 to 4-drive raid6 if the volume is not too big?

   Yes, you can, provided, as you say, the data is small enough to fit
into the reduced filesystem.
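
   The rough sequence would be something like this (a sketch only;
/mnt and the device names are placeholders):

   # make sure the data will fit on the four remaining devices
   btrfs filesystem usage /mnt

   # with the failed device absent, mount degraded and remove it;
   # "delete missing" relocates its chunks onto the survivors
   mount -o degraded /dev/sda /mnt
   btrfs device delete missing /mnt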

   Hugo.

-- 
Hugo Mills             | "What's so bad about being drunk?"
hugo@... carfax.org.uk | "You ask a glass of water"
http://carfax.org.uk/  |                                         Arthur & Ford
PGP: E2AB1DE4          |                 The Hitch-Hiker's Guide to the Galaxy


* Re: raid6 + hot spare question
From: Brendan Hide @ 2015-09-09 15:48 UTC
  To: Hugo Mills, Peter Keše, linux-btrfs

Things can be a little more nuanced.

First off, I'm not even sure btrfs supports a hot spare currently. I 
haven't seen anything along those lines recently in the list - and don't 
recall anything along those lines before either. The current mention of 
it in the Project Ideas page on the wiki implies it hasn't been looked 
at yet.

Also, depending on your experience with btrfs, some of the tasks 
involved in fixing up a missing/dead disk might be daunting.

See further (queries for btrfs-devs too) inline below:

On 2015-09-08 14:12, Hugo Mills wrote:
> On Tue, Sep 08, 2015 at 01:59:19PM +0200, Peter Keše wrote:
>> <snip>
>> However I'd like to be prepared for a disk failure. Because my
>> server is not easily accessible and disk replacement times can be
>> long, I'm considering the idea of making a 5-drive raid6, thus
>> getting 12TB useable space + parity. In this case, the extra 4TB
>> drive would serve as some sort of a hot spare.

From the above I'm reading one of two situations:
a) 6 drives, raid6 across 5 drives and 1 unused/hot spare
b) 5 drives, raid6 across 5 drives and zero unused/hot spare
>>
>> My assumption is that if one hard drive fails before the volume is
>> more than 8TB full, I can just rebalance and resize the volume from
>> 12 TB back to 8 TB essentially going from 5-drive raid6 to 4-drive
>> raid6).
>>
>> Can anyone confirm my assumption? Can I indeed rebalance from
>> 5-drive raid6 to 4-drive raid6 if the volume is not too big?
>     Yes, you can, provided, as you say, the data is small enough to fit
> into the reduced filesystem.
>
>     Hugo.
>
This is true - however, I'd be hesitant to build this up due to the 
current process not being very "smooth" depending on how unlucky you 
are. If you have scenario b above, will the filesystem still be 
read/write or read-only post-reboot? Will it "just work" with the only 
requirement being free space on the four working disks?

RAID6 is intended to be tolerant of two disk failures. In the case of 
there being a double failure and only 5 disks, the ease with which the 
user can balance/convert to a 3-disk raid5 is also important.
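
If that conversion does work, I'd expect it to look roughly like this
(a sketch only, untested, with device and mount names as placeholders):

  mount -o degraded /dev/sda /mnt
  # drop the dead devices; I believe "missing" removes one per invocation
  btrfs device delete missing /mnt
  btrfs device delete missing /mnt
  # then convert the remaining raid6 chunks down to raid5
  btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt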

Please shoot down my concerns. :)

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



* Re: raid6 + hot spare question
From: Chris Murphy @ 2015-09-09 23:14 UTC
  To: Brendan Hide; +Cc: Hugo Mills, Peter Keše, Btrfs BTRFS

On Wed, Sep 9, 2015 at 9:48 AM, Brendan Hide <brendan@swiftspirit.co.za> wrote:
> Things can be a little more nuanced.
>
> First off, I'm not even sure btrfs supports a hot spare currently. I haven't
> seen anything along those lines recently in the list - and don't recall
> anything along those lines before either. The current mention of it in the
> Project Ideas page on the wiki implies it hasn't been looked at yet.
>
> Also, depending on your experience with btrfs, some of the tasks involved in
> fixing up a missing/dead disk might be daunting.
>
> See further (queries for btrfs-devs too) inline below:
>
> On 2015-09-08 14:12, Hugo Mills wrote:
>>
>> On Tue, Sep 08, 2015 at 01:59:19PM +0200, Peter Keše wrote:
>>>
>>> <snip>
>>> However I'd like to be prepared for a disk failure. Because my
>>> server is not easily accessible and disk replacement times can be
>>> long, I'm considering the idea of making a 5-drive raid6, thus
>>> getting 12TB useable space + parity. In this case, the extra 4TB
>>> drive would serve as some sort of a hot spare.
>
> From the above I'm reading one of two situations:
> a) 6 drives, raid6 across 5 drives and 1 unused/hot spare
> b) 5 drives, raid6 across 5 drives and zero unused/hot spare
>>>
>>>
>>> My assumption is that if one hard drive fails before the volume is
>>> more than 8TB full, I can just rebalance and resize the volume from
>>> 12 TB back to 8 TB essentially going from 5-drive raid6 to 4-drive
>>> raid6).
>>>
>>> Can anyone confirm my assumption? Can I indeed rebalance from
>>> 5-drive raid6 to 4-drive raid6 if the volume is not too big?
>>
>>     Yes, you can, provided, as you say, the data is small enough to fit
>> into the reduced filesystem.
>>
>>     Hugo.
>>
> This is true - however, I'd be hesitant to build this up due to the current
> process not being very "smooth" depending on how unlucky you are. If you
> have scenario b above, will the filesystem still be read/write or read-only
> post-reboot? Will it "just work" with the only requirement being free space
> on the four working disks?


There isn't even a need to rebalance separately: dev delete will shrink
the fs and balance. At least that's what I'm seeing here, though I found
a failure in a really simple (I think) case, which I just made a new
post about:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg46296.html

This should work whether you're dealing with a failed/missing disk or a
normally operating volume, so long as (a) the removal doesn't go below
the minimum number of devices and (b) there's enough space for the data
after the volume shrink.
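
For reference, the sort of invocation I'm testing (device names and
mount point are only examples):

  # healthy volume: shrink from 5 devices to 4
  btrfs device delete /dev/sde /mnt

  # failed/missing device: mount degraded, then remove the ghost
  mount -o degraded /dev/sdb /mnt
  btrfs device delete missing /mnt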



-- 
Chris Murphy


* Re: raid6 + hot spare question
From: Duncan @ 2015-09-10  0:28 UTC
  To: linux-btrfs

Brendan Hide posted on Wed, 09 Sep 2015 17:48:11 +0200 as excerpted:

> Things can be a little more nuanced.
> 
> First off, I'm not even sure btrfs supports a hot spare currently. I
> haven't seen anything along those lines recently in the list - and don't
> recall anything along those lines before either. The current mention of
> it in the Project Ideas page on the wiki implies it hasn't been looked
> at yet.

Btrfs doesn't support hot spares... yet.  As mentioned it's on the 
project ideas list, and given its practicality it's likely to be 
implemented at some point, but given the reality of btrfs development 
speed, that's likely to be some years away.

The best that can be done is a "warm spare": connected up but 
(presumably) spun down and not part of the raid, so it can (remotely if 
necessary) be brought online and added to the raid as needed.  That's 
certainly possible, but not as a btrfs-specific feature; rather, it's a 
general part of the Linux infrastructure.
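
A minimal sketch of what I mean, assuming the spare is /dev/sdf and the 
filesystem is mounted at /mnt (both placeholders), and untested here:

  # keep the spare attached but spun down
  hdparm -y /dev/sdf

  # when a device fails, look up its devid and replace it in place...
  btrfs filesystem show /mnt
  btrfs replace start <failed-devid> /dev/sdf /mnt

  # ...or add the spare and then drop the missing device instead
  btrfs device add /dev/sdf /mnt
  btrfs device delete missing /mnt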

> Also, depending on your experience with btrfs, some of the tasks
> involved in fixing up a missing/dead disk might be daunting.

Yes...

> On 2015-09-08 14:12, Hugo Mills wrote:
>> On Tue, Sep 08, 2015 at 01:59:19PM +0200, Peter Keše wrote:
>>> <snip>
>>> My assumption is that if one hard drive fails before the volume is
>>> more than 8TB full, I can just rebalance and resize the volume from 12
>>> TB back to 8 TB essentially going from 5-drive raid6 to 4-drive
>>> raid6).
>>>
>>> Can anyone confirm my assumption? Can I indeed rebalance from 5-drive
>>> raid6 to 4-drive raid6 if the volume is not too big?
>> 
>> Yes, you can, provided, as you say, the data is small enough to fit
>> into the reduced filesystem.
>>
> This is true - however, I'd be hesitant to build this up due to the
> current process not being very "smooth" depending on how unlucky you
> are.  [W]ill the filesystem still be read/write or read-only post-
> reboot? Will it "just work" with the only requirement being free space
> on the four working disks?

As long as there are four working devices and chunk-unallocated[1] space 
on them, yes, reducing to a 4-device raid6 should be fine.  What happens 
is that raid6 normally requires writing across at least four devices[2]: 
a two-way data stripe plus two parities.  If devices drop out and 
existing chunks with free space are no longer available in fours, btrfs 
will leave them be and try to allocate additional chunks across the 
remaining devices, down to four[2].  If it can do so, writing continues 
in the now reduced-stripe-width raid6.  If not, there's a chance of 
going read-only, as it can no longer satisfy the raid6 requirements.[3]
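
To check whether there's still unallocated space to carve those 
narrower chunks from, something like the following should do (the 
mount point is a placeholder):

  # per-device allocated vs. unallocated space
  btrfs filesystem usage /mnt

  # or, with older btrfs-progs, combine these two views
  btrfs filesystem show /mnt
  btrfs filesystem df /mnt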
 
> RAID6 is intended to be tolerant of two disk failures. In the case of
> there being a double failure and only 5 disks, the ease with which the
> user can balance/convert to a 3-disk raid5 is also important.

Again, see footnotes [2] and [3] below.

---
[1] Btrfs allocates space in two stages: first in largish chunks 
dedicated to either data or metadata (nominal chunk size 1 GiB for 
data, 256 MiB for metadata), then by actually using space from a chunk 
until it's gone and a new one needs to be allocated.  It's quite 
possible to have normal df, etc, report space left, but have it all 
locked up in pre-allocated chunks (typically data), with no unallocated 
space left from which to allocate new chunks (typically metadata) when 
needed.  That used to be a big issue, because btrfs could automatically 
allocate chunks but it took a balance to deallocate them.  Now btrfs 
deallocates entirely empty chunks on its own, returning their space to 
the unallocated pool, so while the problem can still occur, especially 
over time as existing chunks fragment and more of them end up only 
partially used, it's not the /huge/ problem it once was.
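
If partially used chunks do pile up, a filtered balance can compact 
them and return the space to the unallocated pool.  A sketch, with the 
10% threshold and mount point chosen arbitrarily:

  # rewrite only chunks that are less than 10% used
  btrfs balance start -dusage=10 -musage=10 /mnt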

[2] While traditional raid6 requires a minimum of four devices (a 
two-way data stripe plus two parities), and raid5 a minimum of three (a 
two-way data stripe plus single parity), btrfs raid5, at least, 
degrades to single data plus single parity, which is in effect raid1, 
thus allowing a two-device "raid5".  I am not actually sure whether 
btrfs raid6 similarly allows degrading to single data plus double 
parity, thus three devices, or not.  Of course, running the full 
filesystem this way would require that the data and metadata fit on a 
single device, since the others are parity.  But as a temporary 
fallback, with existing chunks simply left as-is (data/metadata 
reconstructed from parity where necessary) and only new data written in 
the single-data/metadata mode, it can keep the filesystem writable.
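
One cheap way to probe that without real hardware is a loop-device 
experiment.  A sketch (untested here, sizes and paths arbitrary):

  # three small backing files and loop devices
  truncate -s 4G /tmp/img1 /tmp/img2 /tmp/img3
  L1=$(losetup -f --show /tmp/img1)
  L2=$(losetup -f --show /tmp/img2)
  L3=$(losetup -f --show /tmp/img3)

  # does mkfs accept a three-device raid6 at all?
  mkfs.btrfs -d raid6 -m raid6 "$L1" "$L2" "$L3"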

[3] In actuality, given a device dropout situation, as long as the 
filesystem isn't unmounted, btrfs will continue to try to write to the 
failed/dropped device, writing to the other devices and buffering writes 
for the failed device in case it reappears, until memory is exhausted, at 
which point it presumably crashes.

I'm unsure about raid56 behavior on reboot/remount, but at least with 
raid1, dropping below the minimum required devices (2) to maintain the 
raid1 still allows mounting rw degraded... for one mount.  In that case 
the formerly-raidN writes force allocation of new single chunks (or, 
for metadata on a single non-ssd device, dup), and writing continues to 
them, allowing device delete/add/replace and rebalance as the admin 
considers appropriate.
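
The important bit is to do the repair during that first writable 
degraded mount.  A sketch, assuming a replacement disk /dev/sdf and a 
mount point of /mnt (and noting that how well this behaves on raid56 
is exactly the open question):

  mount -o degraded /dev/sda /mnt
  btrfs device add /dev/sdf /mnt
  btrfs device delete missing /mnt
  # convert any single/dup chunks written while degraded back to the
  # original profile; "soft" skips chunks already in the target profile
  btrfs balance start -dconvert=raid6,soft -mconvert=raid6,soft /mnt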

The problem appears on the /second/ attempt to mount degraded writable, 
after there are existing single-mode chunks on the filesystem, since 
btrfs sees the single chunks and thinks that there are single chunks on 
the missing device as well, and blocks writes in order to prevent 
further damage.  It's not smart enough to know that the only single 
chunks written are all on still-available devices.

Awareness of (the cause of) this problem is fairly recent, and there are 
patches that I think made it into 4.2 to allow writable degraded mount 
even with single chunks, but I'm not sure of the 4.2 status, and in any 
event, being new, the patches may not catch all corner cases.  
Additionally, while I /think/ the same situation and thus patches apply 
to raid56, I'm not entirely sure of that, so some testing (or 
verification from others who have tested in raid56 mode) would be needed 
if you want to be sure.
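
The kind of throwaway test I have in mind (entirely untested here; the 
loop numbering assumes no other loop devices are in use):

  # scratch 5-device raid6 on loop devices
  for i in 1 2 3 4 5; do truncate -s 4G /tmp/r6-$i; done
  for i in 1 2 3 4 5; do losetup -f --show /tmp/r6-$i; done
  mkfs.btrfs -d raid6 -m raid6 /dev/loop0 /dev/loop1 /dev/loop2 \
      /dev/loop3 /dev/loop4
  mkdir -p /mnt/test
  mount /dev/loop0 /mnt/test
  # ... write some data, then fake a dead disk ...
  umount /mnt/test
  losetup -d /dev/loop4
  # does it mount writable degraded, and does a /second/ degraded
  # mount after writes still work?
  mount -o degraded /dev/loop0 /mnt/test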

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

