From: Bill Davidsen <davidsen@tmr.com>
To: "Keld Jørn Simonsen" <keld@dkuug.dk>
Cc: Neil Brown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: suns raid-z / zfs
Date: Tue, 26 Feb 2008 15:27:26 -0500
Message-ID: <47C4762E.4080700@tmr.com>
In-Reply-To: <20080218204529.GA17984@rap.rap.dk>
Keld Jørn Simonsen wrote:
> On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
>
>> On Monday February 18, keld@dkuug.dk wrote:
>>
>>> On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
>>>
>>>> On Sunday February 17, keld@dkuug.dk wrote:
>>>>
>>>>> Hi
>>>>>
>>>>>
>>>>> It seems like a good way to avoid the performance problems of
>>>>> raid-5/raid-6.
>>>>>
>>>> I think there are better ways.
>>>>
>>> Interesting! What do you have in mind?
>>>
>> A "Log Structured Filesystem" always does large contiguous writes.
>> Aligning these to the raid5 stripes wouldn't be too hard and then you
>> would never have to do any pre-reading.
>>
>>
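To make Neil's point concrete, here is a minimal sketch of the
arithmetic; the geometry is hypothetical (a 5-drive RAID5 with 64 KiB
chunks), but it shows why stripe-aligned writes never need a pre-read:

    # Parity for a full stripe can be computed from the new data alone;
    # any partial-stripe write forces md to pre-read old data or parity
    # (the read-modify-write penalty this thread is about).
    CHUNK = 64 * 1024              # bytes per chunk (assumed)
    DATA_DISKS = 4                 # 5-drive RAID5 = 4 data + 1 parity
    STRIPE = CHUNK * DATA_DISKS    # 256 KiB of data per stripe

    def needs_preread(offset, length):
        return offset % STRIPE != 0 or length % STRIPE != 0

    print(needs_preread(0, STRIPE))    # False: full stripe, no pre-read
    print(needs_preread(4096, 4096))   # True: partial write must read
                                       # old data before updating parity
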
>>> and what are the problems with zfs?
>>>
>> Recovery after a failed drive would not be an easy operation, and I
>> cannot imagine it being even close to the raw speed of the device.
>>
>
> I thought this was a problem with most RAID types: while
> reconstructing, performance is quite slow. And as there has been some
> damage, this is expected, and there is probably not much to be done
> about it.
>
> Or is there? Are there any RAID types that perform reasonably well
> while one disk is under repair? The performance could be crucial
> for some applications.
>
>
If that's a requirement, RAID1 with multiple copies would probably be
your best bet. You could probably design a test from existing software
and a script; I'm basing that thought on having once run load against a
recovering 4-way mirror. The load was 100-250 random reads/sec, and
response time stayed acceptable.
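Something along these lines could serve as that test. A rough sketch
only: the device path, size, and rate below are made up for
illustration, and it needs root plus a real (rebuilding) array to run:

    # Hold ~200 random 4 KiB reads/sec against an array for a minute
    # and report the worst response time seen.
    import os, random, time

    DEV = "/dev/md0"            # hypothetical md device under rebuild
    SIZE = 100 * 2**30          # assumed usable size: 100 GiB
    RATE = 200                  # target reads per second
    BLK = 4096

    fd = os.open(DEV, os.O_RDONLY)
    worst = 0.0
    start = time.time()
    for i in range(RATE * 60):
        off = random.randrange(SIZE // BLK) * BLK   # aligned offset
        t0 = time.time()
        os.pread(fd, BLK, off)                      # one random read
        worst = max(worst, time.time() - t0)
        # sleep just enough to hold the target rate
        time.sleep(max(0.0, (i + 1) / RATE - (time.time() - start)))
    os.close(fd)
    print(f"worst read latency: {worst * 1000:.1f} ms")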
> One could think of clever arrangements so that, say, two disks could go
> down and the rest of an array with 10-20 drives could still function
> reasonably well, even during reconstruction. As far as I can tell
> from the code, the reconstruction itself does not impede normal
> performance much, as normal operations take priority over reconstruction.
>
> Hmm, my understanding would then be that, for both random reads and
> writes, performance in typical RAIDs would only be reduced by the IO
> bandwidth of the failed disks.
>
> Sequential R/W performance for raid10,f would
> be hurt, degrading to random-IO performance on the drives involved.
>
> RAID5/6 would be hurt badly for reading, as all surviving drives need
> to be read to reconstruct correct data during recovery.
>
>
> So it looks like, if performance is important to you during
> reconstruction, then you should avoid RAID5/6 and use the mirrored RAID
> types. Given a big installation with a load mix of heavy
> random reading and writing, it does not matter much which mirrored
> RAID type you choose, as they all perform about equally for random
> IO, even when reconstructing. Is that correct advice?
>
>
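On the RAID5/6 point, a toy illustration (3 data disks plus parity,
4-byte "chunks"; sizes are illustrative) of why one degraded read fans
out to every surviving drive:

    # A chunk on the failed disk can only be recovered by XOR-ing the
    # matching chunks on *all* surviving disks -- so each read of the
    # missing chunk costs n-1 physical reads.
    from functools import reduce

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    data = [b"AAAA", b"BBBB", b"CCCC"]       # chunks on 3 data disks
    disks = data + [reduce(xor, data)]       # 4th disk holds parity

    failed = 1                               # disk 1 dies
    survivors = [d for i, d in enumerate(disks) if i != failed]
    assert reduce(xor, survivors) == disks[failed]   # 3 reads, 1 chunk
    print("rebuilt:", reduce(xor, survivors))
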
>>>>> But does it stripe? One could think that rewriting stripes
>>>>> other places would damage the striping effects.
>>>>>
>>>> I'm not sure what you mean exactly. But I suspect your concerns here
>>>> are unjustified.
>>>>
>>> More precisely: I understand that ZFS always writes the data anew.
>>> That would mean to other blocks on the partitions, for the logical blocks
>>> of the file in question. So the blocks on the partitions will not be
>>> adjacent, and striping will generally not be possible.
>>>
>> The important part of striping is that a write is spread out over
>> multiple devices, isn't it?
>>
>> If ZFS can choose where to put each block that it writes, it can
>> easily choose to write a series of blocks to a collection of different
>> devices, thus getting the major benefit of striping.
>>
>
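Neil's point, in a sketch: the allocation policy below is invented for
illustration (it is not ZFS's actual allocator), but it shows how a
copy-on-write filesystem that is free to place blocks can still fan
one write out across every device:

    NDEV = 4
    next_free = [0] * NDEV       # next free block on each device

    def allocate(nblocks):
        """Round-robin new blocks across devices (invented policy)."""
        placement = []
        for i in range(nblocks):
            dev = i % NDEV
            placement.append((dev, next_free[dev]))
            next_free[dev] += 1
        return placement

    # An 8-block write lands on all 4 devices, giving the parallel-
    # transfer benefit of striping without a fixed layout.
    print(allocate(8))
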
> I see 2 major benefits of striping: one is that many drives are involved,
> and the other is that the stripes are allocated adjacently, so that IO
> on one drive can just proceed to the next physical blocks when one
> stripe has been processed. Depending on the size of the IO operations
> involved, first one or more disks in a stripe are processed, and then the
> following stripes are processed. ZFS misses the second part of the
> optimization, I think.
>
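For what it's worth, that second benefit is just the usual striped
address mapping. A sketch with assumed geometry (4 drives, 64 KiB
chunks): consecutive logical chunks land at consecutive physical
offsets on each drive, which is exactly the guarantee a
relocate-on-write layout gives up:

    NDEV, CHUNK = 4, 64 * 1024

    def raid0_map(logical_byte):
        """Map a logical byte offset to (drive, physical offset)."""
        chunk = logical_byte // CHUNK
        dev = chunk % NDEV
        phys = (chunk // NDEV) * CHUNK + logical_byte % CHUNK
        return dev, phys

    # Chunks 0..7: each drive sees offsets 0 then 65536 -- adjacent,
    # so sequential IO stays sequential on every spindle.
    for c in range(8):
        print(c, raid0_map(c * CHUNK))
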
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck