* mdadm vs zfs for home server?
@ 2013-05-27 18:09 Matt Garman
2013-05-27 19:02 ` Roy Sigurd Karlsbakk
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Matt Garman @ 2013-05-27 18:09 UTC (permalink / raw)
To: linux-raid
Anyone out there have a home (or maybe small office) file server
where they've thought about native Linux software RAID (mdadm)
versus ZFS on Linux?
I currently have a raid6 array built from five low power (5400 rpm)
3TB drives. I put an ext4 filesystem right on top of the md device
(no lvm). This array used to be comprised of 2TB drives; I've been
slowly replacing drives with 3TB versions as they went on sale.
I run a weekly check on the array ("raid-check" script on CentOS,
which is basically a fancy wrapper for "echo check >>
/sys/block/mdX/md/sync_action"). I shouldn't be surprised, but I've
noticed that this check now takes substantially longer (than it did
with the 2TB drives).
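For reference, the manual equivalent of what that script kicks off looks
roughly like this (md0 is just a placeholder for the real array name):

echo check > /sys/block/md0/md/sync_action   # start a consistency check
cat /proc/mdstat                             # progress and estimated finish time
cat /sys/block/md0/md/mismatch_cnt           # non-zero afterwards = inconsistencies found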
I got to thinking about the chances of data loss. First off: I do
have backups. But I want to take every "reasonable" precaution
against having to use the backups. Initially I started thinking
about zfs's raid-z3 (basically, triple-parity raid, the next logical
step in the raid5, raid6 progression). But then I decided that,
based on the check speed of my current raid6, maybe I want to get
away from parity-based raid altogether.
Now I've got another 3TB drive on the way (rounding out the total to
six) and am leaning towards RAID-10. I don't need the performance,
but it should be more performant than raid6. And I assume (though I
could be very wrong) that the weekly "check" action ought to be much
faster than it is with raid6. Is this correct?
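As a rough sketch, the six-drive layout I'm picturing would be created
along these lines (device names are placeholders):

mdadm --create /dev/md0 --level=10 --raid-devices=6 /dev/sd[b-g]1   # default near-2 layout
mkfs.ext4 /dev/md0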
But after all that zfs reading, I'm wondering if that might not be
the way to go. I don't know how necessary it is, but I like the
idea of having the in-filesystem checksums to prevent "silent" data
corruption.
I went through a zfs tutorial, building a little raid10 pool out of
files (just to play with). Seems pretty straightforward. But I'm
still much more familiar with mdadm (not an expert by any means, but
quite comfortable with typical uses).  So, does my lack of
experience with zfs offset the benefit of its data integrity
checks?  And furthermore, zfs on linux has only recently been marked
stable, although there are plenty of anecdotal comments that it's
been stable for much longer (the zfs on linux guys are just
ultra-conservative).
Still, doesn't mdadm have the considerable edge in terms of
"longtime stability"?
As I said initially, I'm in the thinking-it-through stage, just
looking to maybe get a discussion going as to why I should go one
way or the other.
Thanks,
Matt
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-27 18:09 mdadm vs zfs for home server? Matt Garman
@ 2013-05-27 19:02 ` Roy Sigurd Karlsbakk
2013-05-28 15:00 ` Matt Garman
2013-05-27 19:20 ` Roman Mamedov
2013-05-27 22:33 ` Stan Hoeppner
2 siblings, 1 reply; 11+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-05-27 19:02 UTC (permalink / raw)
To: Matt Garman; +Cc: linux-raid
Short answer: ZFS will guarantee the data is free of errors, but MD will
give you the flexibility of moving between RAID levels and adding drives
to existing RAIDs. I have worked with ZFS on some 400TB of storage, and I
considered using it for my home server, but chose MD because of its
flexibility. ZFS requires you to plan your setup. It allows you to add
VDEVs, but existing data isn't rebalanced across the VDEVs. That would
require block pointer rewrite, something that's been talked about for at
least four years but still hasn't surfaced.
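For example, growing a pool by adding another mirror VDEV is just something
like (pool and device names hypothetical)

zpool add tank mirror /dev/sdc /dev/sdd

but blocks that were already written stay where they are; only new writes
take advantage of the added VDEV.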
just my 2c
roy
----- Original message -----
> Anyone out there have a home (or maybe small office) file server
> where they've thought about native Linux software RAID (mdadm)
> versus ZFS on Linux?
>
> I currently have a raid6 array built from five low power (5400 rpm)
> 3TB drives. I put an ext4 filesystem right on top of the md device
> (no lvm). This array used to be comprised of 2TB drives; I've been
> slowly replacing drives with 3TB versions as they went on sale.
>
> I run a weekly check on the array ("raid-check" script on CentOS,
> which is basically a fancy wrapper for "echo check >>
> /sys/block/mdX/md/sync_action"). I shouldn't be surprised, but I've
> noticed that this check now takes substantially longer (than it did
> with the 2TB drives).
>
> I got to thinking about the chances of data loss. First off: I do
> have backups. But I want to take every "reasonable" precaution
> against having to use the backups. Initially I started thinking
> about zfs's raid-z3 (basically, triple-parity raid, the next logical
> step in the raid5, raid6 progression). But then I decided that,
> based on the check speed of my current raid6, maybe I want to get
> away from parity-based raid altogether.
>
> Now I've got another 3TB drive on the way (rounding out the total to
> six) and am leaning towards RAID-10. I don't need the performance,
> but it should be more performant than raid6. And I assume (though I
> could be very wrong) that the weekly "check" action ought to be much
> faster than it is with raid6. Is this correct?
>
> But after all that zfs reading, I'm wondering if that might not be
> the way to go. I don't know how necessary it is, but I like the
> idea of having the in-filesystem checksums to prevent "silent" data
> corruption.
>
> I went through a zfs tutorial, building a little raid10 pool out of
> files (just to play with). Seems pretty straightforward. But I'm
> still much more familiar with mdadm (not an expert by any means, but
> quite comfortable with typical uses).  So, does my lack of
> experience with zfs offset the benefit of its data integrity
> checks?  And furthermore, zfs on linux has only recently been marked
> stable, although there are plenty of anecdotal comments that it's
> been stable for much longer (the zfs on linux guys are just
> ultra-conservative).
> Still, doesn't mdadm have the considerable edge in terms of
> "longtime stability"?
>
> As I said initially, I'm in the thinking-it-through stage, just
> looking to maybe get a discussion going as to why I should go one
> way or the other.
>
> Thanks,
> Matt
>
--
Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-27 19:02 ` Roy Sigurd Karlsbakk
@ 2013-05-28 15:00 ` Matt Garman
2013-05-28 15:18 ` Jon Nelson
0 siblings, 1 reply; 11+ messages in thread
From: Matt Garman @ 2013-05-28 15:00 UTC (permalink / raw)
To: Roy Sigurd Karlsbakk; +Cc: linux-raid
On Mon, May 27, 2013 at 09:02:08PM +0200, Roy Sigurd Karlsbakk wrote:
> Short answer: ZFS will guarantee the data is free of errors, but
> MD will give you the flexibility of moving between RAID levels and
> adding drives to existing RAIDs. I have worked with ZFS on some
> 400TB of storage, and I considered using it for my home server,
> but chose MD because of its flexibility. ZFS requires you to plan
> your setup. It allows you to add VDEVs, but existing data isn't
> rebalanced across the VDEVs. That would require block pointer
> rewrite, something that's been talked about for at least four
> years but still hasn't surfaced.
In the raid-10 case, does Linux MD automatically "rebalance" the
data? I could be wrong, but my understanding is that it will let
you grow the array, but in the same way that ZFS would (for raid10
anyway): the extra space is there, but not striped across the
original disks.
If that's true, then it somewhat "evens the score" for me, as I'm
leaning towards raid-10.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-28 15:00 ` Matt Garman
@ 2013-05-28 15:18 ` Jon Nelson
0 siblings, 0 replies; 11+ messages in thread
From: Jon Nelson @ 2013-05-28 15:18 UTC (permalink / raw)
To: Matt Garman; +Cc: Roy Sigurd Karlsbakk, linux-raid
On Tue, May 28, 2013 at 10:00 AM, Matt Garman <matthew.garman@gmail.com> wrote:
> In the raid-10 case, does Linux MD automatically "rebalance" the
> data? I could be wrong, but my understanding is that it will let
> you grow the array, but in the same way that ZFS would (for raid10
> anyway): the extra space is there, but not striped across the
> original disks.
IIRC, as of March or so of last year (2012), kernels gained the
ability to grow MD RAID10 arrays *provided* they are not using the
"far" offset layout (sadly, my favorite).
--
Jon
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-27 18:09 mdadm vs zfs for home server? Matt Garman
2013-05-27 19:02 ` Roy Sigurd Karlsbakk
@ 2013-05-27 19:20 ` Roman Mamedov
2013-05-27 22:33 ` Stan Hoeppner
2 siblings, 0 replies; 11+ messages in thread
From: Roman Mamedov @ 2013-05-27 19:20 UTC (permalink / raw)
To: Matt Garman; +Cc: linux-raid
On Mon, 27 May 2013 13:09:12 -0500
Matt Garman <matthew.garman@gmail.com> wrote:
> the way to go. I don't know how necessary it is, but I like the
> idea of having the in-filesystem checksums to prevent "silent" data
> corruption.
On some machines I run btrfs on top of MD RAID. In this configuration btrfs
can't heal checksum errors, but will still detect them if they appear.
btrfs now also has built-in RAID5 and RAID6, which *can* heal errors, but
that's still way too immature to actually use. In fact, one may consider
btrfs as a whole to be not mature enough yet, but in my experience it
generally works as long as you avoid the fancy cutting-edge features like
RAID, and I don't remember seeing mailing list reports of data loss or
corruption from anyone in a long time.
--
With respect,
Roman
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-27 18:09 mdadm vs zfs for home server? Matt Garman
2013-05-27 19:02 ` Roy Sigurd Karlsbakk
2013-05-27 19:20 ` Roman Mamedov
@ 2013-05-27 22:33 ` Stan Hoeppner
2013-05-27 23:50 ` Phil Turmel
2013-05-28 15:24 ` Matt Garman
2 siblings, 2 replies; 11+ messages in thread
From: Stan Hoeppner @ 2013-05-27 22:33 UTC (permalink / raw)
To: Matt Garman; +Cc: linux-raid
On 5/27/2013 1:09 PM, Matt Garman wrote:
...
> I got to thinking about the chances of data loss. First off: I do
> have backups. But I want to take every "reasonable" precaution
> against having to use the backups. Initially I started thinking
> about zfs's raid-z3 (basically, triple-parity raid, the next logical
> step in the raid5, raid6 progression). But then I decided that,
> based on the check speed of my current raid6, maybe I want to get
> away from parity-based raid altogether.
>
> Now I've got another 3TB drive on the way (rounding out the total to
> six) and am leaning towards RAID-10. I don't need the performance,
> but it should be more performant than raid6. And I assume (though I
> could be very wrong) that the weekly "check" action ought to be much
> faster than it is with raid6. Is this correct?
The primary reason RAID6 came into use is that a second drive failure
during RAID5's lengthy rebuild can cause total array loss. RAID10 rebuilds
are the same as a mirror rebuild. Takes ~4-6 hours with 3TB drives.
Over the ~20 years RAID10 has been in use in both soft/hardware
solutions it has been shown that partner drive loss during rebuild is
extremely rare. RAID6 rebuild times will be double/triple or more that
of RAID10. And these will stress all drives in the array. A RAID10
rebuild only stresses the two drives in the mirror being rebuilt.
RAID10 rebuild time is constant regardless of array size. RAID6 rebuild
times tend to increase as the number of drives increases. You may not
need the application performance of RAID10, but you would surely benefit
from the drastically lower rebuild time. The only downside to md/RAID10
is that it cannot be expanded. Many hardware RAID controllers can
expand RAID10 arrays, however.
WRT scheduled scrubbing, I don't do it, and I don't believe in it. While
it may give you some peace of mind, it simply puts extra wear on the
drives. RAID6 is self-healing, right, so why bother with scrubbing?
It's a self-fulfilling prophecy kind of thing--the more you scrub, the
more likely you are to need to scrub due to the wear of previous scrubs.
I don't do it on any arrays. It just wears the drives out quicker.
If "losing" another 3TB to redundancy isn't a problem for you, I'd go
RAID10, and format the md device directly with XFS. You may not need
the application performance of XFS, but backups using xfsdump are faster
than you can possibly imagine. Why? They're performed entirely in
kernel space inside the filesystem driver, no user space calls as with
traditional Linux backup utils such as rsync. Targets include a local
drive, a local or remote file (NFS), tape, etc.
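A minimal sketch of that combination (device, mount point and dump target are
just examples):

mkfs.xfs /dev/md0
mount /dev/md0 /data
xfsdump -l 0 -L weekly -M local -f /backup/data.level0 /data   # level-0 dump
xfsrestore -f /backup/data.level0 /mnt/restore                 # restore (e.g. to verify)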
--
Stan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-27 22:33 ` Stan Hoeppner
@ 2013-05-27 23:50 ` Phil Turmel
2013-05-28 5:12 ` Stan Hoeppner
2013-05-28 15:24 ` Matt Garman
1 sibling, 1 reply; 11+ messages in thread
From: Phil Turmel @ 2013-05-27 23:50 UTC (permalink / raw)
To: stan; +Cc: Matt Garman, linux-raid
Hi all,
On 05/27/2013 06:33 PM, Stan Hoeppner wrote:
[trim /]
> WRT scheduled scrubbing, I don't do it, and I don't believe in it. While
> it may give you some peace of mind, it simply puts extra wear on the
> drives. RAID6 is self-healing, right, so why bother with scrubbing?
> It's a self-fulfilling prophecy kind of thing--the more you scrub, the
> more likely you are to need to scrub due to the wear of previous scrubs.
> I don't do it on any arrays. It just wears the drives out quicker.
I'm going to go out on a limb here and disagree with Stan. I do "check"
scrubs in lieu of SMART long self tests, on a weekly basis. They both
read the entire drive--necessary to uncover "pending" sectors. But a
check scrub will rewrite that pending sector, immediately turning it into
a relocation if it cannot be fixed. An enterprise drive's better error
rate (an order of magnitude better, from the specs I've read) reduces
the need to do any scrub, but if you are doing long self tests anyway,
you should scrub.
In my humble opinion, *relocations* are the key indicator of approaching
drive failure, and they won't happen if pending sectors don't get rewritten.
Arguably, if you want to be anal, one could analyze the error reports
from a long self test, and "scrub" just the sectors with errors. I
find that to be an unnecessary complication.
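A rough sketch of that weekly routine (md0 and sda are placeholders):

echo check > /sys/block/md0/md/sync_action              # whole-array read; rewrites pending sectors it can reconstruct
smartctl -A /dev/sda | grep -Ei 'Reallocated|Pending'   # then watch the relocation counters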
Phil
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-27 23:50 ` Phil Turmel
@ 2013-05-28 5:12 ` Stan Hoeppner
0 siblings, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2013-05-28 5:12 UTC (permalink / raw)
To: Phil Turmel; +Cc: Matt Garman, linux-raid
On 5/27/2013 6:50 PM, Phil Turmel wrote:
> Hi all,
>
> On 05/27/2013 06:33 PM, Stan Hoeppner wrote:
>
> [trim /]
>
>> WRT scheduled scrubbing, I don't do it, and I don't believe in it. While
>> it may give you some peace of mind, it simply puts extra wear on the
>> drives. RAID6 is self-healing, right, so why bother with scrubbing?
>> It's a self-fulfilling prophecy kind of thing--the more you scrub, the
>> more likely you are to need to scrub due to the wear of previous scrubs.
>> I don't do it on any arrays. It just wears the drives out quicker.
>
> I'm going to go out on a limb here and disagree with Stan. I do "check"
> scrubs in lieu of SMART long self tests, on a weekly basis. They both
> read the entire drive--necessary to uncover "pending" sectors. But a
> check scrub will rewrite that pending sector to immediately turn it into
> a relocation, if it cannot be fixed. An enterprise drive's better error
> rate (an order of magnitude better, from the specs I've read) reduces
> the need to do any scrub, but if you are doing long self tests anyways,
> you should scrub.
...
I should have qualified my statement above because as with many IO
related things, "to scrub or not to scrub" depends largely on one's
workload as well as the quality of the drives. If one treats an array
of WDEARS drives as a WORM device, such as in the home media server
case, scrubbing may not be a bad idea as surface defects may develop and
never be discovered until "it's too late".
At the other end of the spectrum we have a busy SMTP/POP server with an
array of Seagate SAS drives w/XFS atop and running at average ~70% of
storage capacity. It is going to see write/read/delete cycles daily
across nearly the entire array "surface". In this case the application
itself is performing the "scrubbing", albeit not every sector of every
drive. But this isn't necessary, as over a period of a week or so most
sectors will be overwritten. Now, if such a system runs at 70% of peak
IOPS capacity 24x7, running a scrub may take days to complete, and will
invariably slow down user IO, no matter how it's prioritized. And at
this high a duty cycle the drives are sustaining wear at a good clip
already. Running scheduled scrubs, again, simply puts more wear on the
drives. So in this case it doesn't make a lot of sense to do scheduled
scrubs.
And of course there are all kinds of workloads and hardware quality
combinations in between. This is why I spoke from first person up
above, stating what -I- do. Most of the advice I give to others on this
list is formulated as "this is what -you- should do". I won't do that
with this subject because there's too much variability.
What people should gain from this subtopic of the thread is that
scheduled scrubbing is neither universally good nor bad, but that, as
with many things IO related, it "depends".
--
Stan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-27 22:33 ` Stan Hoeppner
2013-05-27 23:50 ` Phil Turmel
@ 2013-05-28 15:24 ` Matt Garman
2013-05-28 15:55 ` Ryan Wagoner
1 sibling, 1 reply; 11+ messages in thread
From: Matt Garman @ 2013-05-28 15:24 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: linux-raid
On Mon, May 27, 2013 at 05:33:15PM -0500, Stan Hoeppner wrote:
> The primary reason RAID6 came into use is that a second drive failure
> during RAID5's lengthy rebuild can cause total array loss. RAID10 rebuilds
> are the same as a mirror rebuild. Takes ~4-6 hours with 3TB drives.
And, based on what I've read about ZFS, since it knows about the
data, it only resyncs ("resilvers" in zfs lingo) actual data, not
the whole drive. So depending on how full the array is, resilvering
could take even less time.
That should therefore *decrease* the chances of another disk failing
during rebuild/resilver, right? That is, if rebuild times are
proportional to the amount of actual disk utilization (which is
assumed to be less than 100%).
> Over the ~20 years RAID10 has been in use in both soft/hardware
> solutions it has been shown that partner drive loss during rebuild
> is extremely rare.
Is that based on your experience, or have you read studies and such?
That's an honest question, not trying to start a debate, but I've
read anecdotal experience to the contrary. One of the limitations
of doing informal internet research. :)
> RAID10 rebuild time is constant regardless of array size. RAID6 rebuild
> times tend to increase as the number of drives increases. You may not
> need the application performance of RAID10, but you would surely benefit
> from the drastically lower rebuild time. The only downside to md/RAID10
> is that it cannot be expanded. Many hardware RAID controllers can
> expand RAID10 arrays, however.
And again, in general, my understanding is that lower rebuild times
equate to lowered chances of 2nd drive failure during the rebuild.
I don't have the math skills to predict partner drive failure (in
raid10), but intuitively, it seems like it should be fairly rare.
And in my personal case, I intend to build each partner (i.e.
mirror) set from drives from different manufacturers. Again, it seems
like this should give me even better statistical odds of not having
both drives in a mirror set fail at the same time. And failing
that, that's what backups are for. :)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
2013-05-28 15:24 ` Matt Garman
@ 2013-05-28 15:55 ` Ryan Wagoner
0 siblings, 0 replies; 11+ messages in thread
From: Ryan Wagoner @ 2013-05-28 15:55 UTC (permalink / raw)
To: Matt Garman; +Cc: Stan Hoeppner, linux-raid
On Tue, May 28, 2013 at 11:24 AM, Matt Garman <matthew.garman@gmail.com> wrote:
> And, based on what I've read about ZFS, since it knows about the
> data, it only resyncs ("resilvers" in zfs lingo) actual data, not
> the whole drive. So depending on how full the array is, resilvering
> could take even less time.
>
> That should therefore *decrease* the chances of another disk failing
> during rebuild/resilver, right? That is, if rebuild times are
> proportional to the amount of actual disk utilization (which is
> assumed to be less than 100%).
Yep, ZFS will only resilver used space. When I was testing failure
scenarios with a few GB of data, the resilver would complete in
seconds. Additionally, if an unrecoverable error is detected, ZFS
will list the affected files. You can even have ZFS store multiple
copies of a file, so if that does happen it will replace the
damaged copy with a good one.
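For example (dataset name is hypothetical):

zfs set copies=2 data/important   # keep two copies of every block in this dataset
zpool status -v                   # lists files affected by unrecoverable errors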
Ryan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: mdadm vs zfs for home server?
@ 2013-05-28 3:09 Ryan Wagoner
0 siblings, 0 replies; 11+ messages in thread
From: Ryan Wagoner @ 2013-05-28 3:09 UTC (permalink / raw)
To: Matt Garman; +Cc: linux-raid
On Mon, May 27, 2013 at 2:09 PM, Matt Garman <matthew.garman@gmail.com> wrote:
>
>
> Anyone out there have a home (or maybe small office) file server
> that where they thought about native Linux software RAID (mdadm)
> versus ZFS on Linux?
>
I have a 4 x 1TB drive setup that was running CentOS 5 with mdadm and
ext4 for the last 4 years. About 3 weeks ago I reinstalled with CentOS
6 and ZFS on Linux. One of the deciding factors was that I wanted the
Previous Versions tab in Windows to work, since I access the shares
mainly from Windows systems.
I've looked at LVM solutions in the past, but there were multiple
drawbacks. The recent LVM thin provisioning addresses some of the
issues, but it was still cumbersome and drawn out to deal with the
various layers.
I also looked at Solaris (OpenIndiana and OmniOS) and FreeBSD.
Obviously ZFS on Solaris just works and the performance seemed good.
However I'm not as familiar with Solaris and there isn't a large
community following for support. FreeBSD had terrible performance out
of the box accessing the Samba shares. I would see spikes where I
would get 70% of gigabit and then drop to 30% and back again. FreeBSD
seems to always require tweaking for performance, which seems
unnecessary when Linux has good performance out of the box.
Going back to CentOS 6 I followed the http://zfsonlinux.org/
directions and was up and running in minutes. Performance with Samba
was great and the system has been rock solid. Accessing shares from
Windows I can achieve 80-90% of gigabit. With ext4 I would see 90-100%
utilization on a large copy, but the features are worth the small
performance hit.
I did turn compression on and atime off. I also set the recommended
options for interoperability with Windows when creating the datasets.
zfs set compression=on data
zfs set atime=off data
zfs create -o casesensitivity=mixed -o nbmand=on data/share
I am using the https://github.com/zfsonlinux/zfs-auto-snapshot script
to create daily and weekly snapshots. You can disable snapshots per
zfs dataset with
zfs set com.sun:auto-snapshot=false data/share2
With CentOS 6.4, Samba 3.6.9 supports the format option of the
shadow_copy2 VFS module. I added the following to /etc/samba/smb.conf
and the Previous Versions tab populated.
unix extensions = no
[share]
path = /data/share
wide links = yes
vfs objects = shadow_copy2
shadow: snapdir = .zfs/snapshot
shadow: format = zfs-auto-snap_daily-%Y-%m-%d-%H%M
I added a cron.d job to weekly scrub the array like the raid-check script does.
# Run system wide zfs scrub once a week on Sunday at 3am by default
0 3 * * Sun root /usr/local/sbin/zfs-scrub
Contents of the /usr/local/sbin/zfs-scrub file.
#!/bin/sh
# Start a scrub on every imported pool (ZFS equivalent of raid-check).
for pool in `/sbin/zpool list -H | cut -f 1`
do
    /sbin/zpool scrub "$pool"
done
The only missing part is a script to check the zpool status command
for errors and send an email alert.
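Something along these lines could work as a starting point (untested sketch;
assumes a working local MTA and the mail command):

#!/bin/sh
# Mail an alert if any pool reports errors or is degraded.
status=`/sbin/zpool status -x`
if [ "$status" != "all pools are healthy" ]; then
    echo "$status" | mail -s "zpool alert on `hostname`" root
fi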
Ryan
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread (newest: 2013-05-28 15:55 UTC)
Thread overview: 11+ messages
2013-05-27 18:09 mdadm vs zfs for home server? Matt Garman
2013-05-27 19:02 ` Roy Sigurd Karlsbakk
2013-05-28 15:00 ` Matt Garman
2013-05-28 15:18 ` Jon Nelson
2013-05-27 19:20 ` Roman Mamedov
2013-05-27 22:33 ` Stan Hoeppner
2013-05-27 23:50 ` Phil Turmel
2013-05-28 5:12 ` Stan Hoeppner
2013-05-28 15:24 ` Matt Garman
2013-05-28 15:55 ` Ryan Wagoner
-- strict thread matches above, loose matches on Subject: below --
2013-05-28 3:09 Ryan Wagoner