From: Bill Davidsen <davidsen@tmr.com>
To: "Keld Jørn Simonsen" <keld@dkuug.dk>
Cc: Neil Brown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: suns raid-z / zfs
Date: Tue, 26 Feb 2008 15:27:26 -0500
Message-ID: <47C4762E.4080700@tmr.com>
In-Reply-To: <20080218204529.GA17984@rap.rap.dk>
Keld Jørn Simonsen wrote:
> On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
>
>> On Monday February 18, keld@dkuug.dk wrote:
>>
>>> On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
>>>
>>>> On Sunday February 17, keld@dkuug.dk wrote:
>>>>
>>>>> Hi
>>>>>
>>>>>
>>>>> It seems like a good way to avoid the performance problems of
>>>>> raid-5/raid-6.
>>>>>
>>>> I think there are better ways.
>>>>
>>> Interesting! What do you have in mind?
>>>
>> A "Log Structured Filesystem" always does large contiguous writes.
>> Aligning these to the raid5 stripes wouldn't be too hard and then you
>> would never have to do any pre-reading.
>>
>>
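To make Neil's point concrete, here is a minimal sketch of the
arithmetic; the geometry is hypothetical (a 5-drive RAID5 with 64 KiB
chunks), but it shows why stripe-aligned writes never need a pre-read:

    # Parity for a full stripe can be computed from the new data alone;
    # any partial-stripe write forces md to pre-read old data or parity
    # (the read-modify-write penalty this thread is about).
    CHUNK = 64 * 1024              # bytes per chunk (assumed)
    DATA_DISKS = 4                 # 5-drive RAID5 = 4 data + 1 parity
    STRIPE = CHUNK * DATA_DISKS    # 256 KiB of data per stripe

    def needs_preread(offset, length):
        return offset % STRIPE != 0 or length % STRIPE != 0

    print(needs_preread(0, STRIPE))    # False: full stripe, no pre-read
    print(needs_preread(4096, 4096))   # True: partial write must read
                                       # old data before updating parity
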
>>> and what are the problems with zfs?
>>>
>> Recovery after a failed drive would not be an easy operation, and I
>> cannot imagine it being even close to the raw speed of the device.
>>
>
> I thought this was a problem with most RAID types: while
> reconstructing, performance is quite slow. And as there has been some
> damage, this is expected, and there is probably not much to be done
> about it.
>
> Or is there? Are there any RAID types that perform reasonably well
> while one disk is under repair? The performance could be crucial
> for some applications.
>
>
If that's a requirement, RAID1 with multiple copies would probably be
your best bet. You could probably design a test from existing software
and a script; I'm basing that thought on having once run load against a
recovering 4-way mirror. The load was 100-250 random reads/sec, and
response time stayed acceptable.
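Something along these lines could serve as that test. A rough sketch
only: the device path, size, and rate below are made up for
illustration, and it needs root plus a real (rebuilding) array to run:

    # Hold ~200 random 4 KiB reads/sec against an array for a minute
    # and report the worst response time seen.
    import os, random, time

    DEV = "/dev/md0"            # hypothetical md device under rebuild
    SIZE = 100 * 2**30          # assumed usable size: 100 GiB
    RATE = 200                  # target reads per second
    BLK = 4096

    fd = os.open(DEV, os.O_RDONLY)
    worst = 0.0
    start = time.time()
    for i in range(RATE * 60):
        off = random.randrange(SIZE // BLK) * BLK   # aligned offset
        t0 = time.time()
        os.pread(fd, BLK, off)                      # one random read
        worst = max(worst, time.time() - t0)
        # sleep just enough to hold the target rate
        time.sleep(max(0.0, (i + 1) / RATE - (time.time() - start)))
    os.close(fd)
    print(f"worst read latency: {worst * 1000:.1f} ms")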
> One could think of clever arrangements so that, say, two disks could go
> down and the rest of an array with 10-20 drives could still function
> reasonably well, even during reconstruction. As far as I can tell
> from the code, the reconstruction itself does not impede normal
> performance much, as normal operations take priority over reconstruction.
>
> Hmm, my understanding would then be that, for both random reads and
> writes, performance in typical RAIDs would only be reduced by the IO
> bandwidth of the failed disks.
>
> Sequential R/W performance for raid10,f would
> be hurt, degrading to random-IO performance on the drives involved.
>
> RAID5/6 would be hurt badly for reading, as all surviving drives need
> to be read to reconstruct correct data during recovery.
>
>
> So it looks like, if performance is important to you during
> reconstruction, then you should avoid RAID5/6 and use the mirrored RAID
> types. Given a big installation with a load mix of heavy
> random reading and writing, it does not matter much which mirrored
> RAID type you choose, as they all perform about equally for random
> IO, even when reconstructing. Is that correct advice?
>
>
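On the RAID5/6 point, a toy illustration (3 data disks plus parity,
4-byte "chunks"; sizes are illustrative) of why one degraded read fans
out to every surviving drive:

    # A chunk on the failed disk can only be recovered by XOR-ing the
    # matching chunks on *all* surviving disks -- so each read of the
    # missing chunk costs n-1 physical reads.
    from functools import reduce

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    data = [b"AAAA", b"BBBB", b"CCCC"]       # chunks on 3 data disks
    disks = data + [reduce(xor, data)]       # 4th disk holds parity

    failed = 1                               # disk 1 dies
    survivors = [d for i, d in enumerate(disks) if i != failed]
    assert reduce(xor, survivors) == disks[failed]   # 3 reads, 1 chunk
    print("rebuilt:", reduce(xor, survivors))
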
>>>>> But does it stripe? One could think that rewriting stripes
>>>>> other places would damage the striping effects.
>>>>>
>>>> I'm not sure what you mean exactly. But I suspect your concerns here
>>>> are unjustified.
>>>>
>>> More precisely: I understand that ZFS always writes the data anew.
>>> That would mean to other blocks on the partitions, for the logical blocks
>>> of the file in question. So the blocks on the partitions will not be
>>> adjacent, and striping will generally not be possible.
>>>
>> The important part of striping is that a write is spread out over
>> multiple devices, isn't it?
>>
>> If ZFS can choose where to put each block that it writes, it can
>> easily choose to write a series of blocks to a collection of different
>> devices, thus getting the major benefit of striping.
>>
>
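Neil's point, in a sketch: the allocation policy below is invented for
illustration (it is not ZFS's actual allocator), but it shows how a
copy-on-write filesystem that is free to place blocks can still fan
one write out across every device:

    NDEV = 4
    next_free = [0] * NDEV       # next free block on each device

    def allocate(nblocks):
        """Round-robin new blocks across devices (invented policy)."""
        placement = []
        for i in range(nblocks):
            dev = i % NDEV
            placement.append((dev, next_free[dev]))
            next_free[dev] += 1
        return placement

    # An 8-block write lands on all 4 devices, giving the parallel-
    # transfer benefit of striping without a fixed layout.
    print(allocate(8))
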
> I see 2 major benefits of striping: one is that many drives are involved,
> and the other is that the stripes are allocated adjacently, so that IO
> on one drive can just proceed to the next physical blocks when one
> stripe has been processed. Depending on the size of the IO operations
> involved, first one or more disks in a stripe are processed, and then the
> following stripes are processed. ZFS misses the second part of the
> optimization, I think.
>
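For what it's worth, that second benefit is just the usual striped
address mapping. A sketch with assumed geometry (4 drives, 64 KiB
chunks): consecutive logical chunks land at consecutive physical
offsets on each drive, which is exactly the guarantee a
relocate-on-write layout gives up:

    NDEV, CHUNK = 4, 64 * 1024

    def raid0_map(logical_byte):
        """Map a logical byte offset to (drive, physical offset)."""
        chunk = logical_byte // CHUNK
        dev = chunk % NDEV
        phys = (chunk // NDEV) * CHUNK + logical_byte % CHUNK
        return dev, phys

    # Chunks 0..7: each drive sees offsets 0 then 65536 -- adjacent,
    # so sequential IO stays sequential on every spindle.
    for c in range(8):
        print(c, raid0_map(c * CHUNK))
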
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck