linux-raid.vger.kernel.org archive mirror
* suns raid-z / zfs
@ 2008-02-17 16:04 Keld Jørn Simonsen
  2008-02-18  4:07 ` Neil Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Keld Jørn Simonsen @ 2008-02-17 16:04 UTC (permalink / raw)
  To: linux-raid

Hi

Any opinions on Sun's ZFS/RAID-Z?
It seems like a good way to avoid the performance problems of raid-5
/raid-6.

But does it stripe? One could think that rewriting stripes
in other places would damage the striping effects.

Or is the performance only meant to be good for random read/write?

Can the code be lifted to Linux? I understand that it is already in
FreeBSD. Does Sun's licence prevent this?

And could something like this be built into existing file systems like
ext3 and xfs? They could have a multipartition layer in their code, and
then the heuristics to optimize block access could also apply to stripe
access.

best regards
keld


* Re: suns raid-z / zfs
  2008-02-17 16:04 suns raid-z / zfs Keld Jørn Simonsen
@ 2008-02-18  4:07 ` Neil Brown
  2008-02-18  5:33   ` Keld Jørn Simonsen
  0 siblings, 1 reply; 7+ messages in thread
From: Neil Brown @ 2008-02-18  4:07 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: linux-raid

On Sunday February 17, keld@dkuug.dk wrote:
> Hi
> 
> Any opinions on Sun's ZFS/RAID-Z?

It's vaguely interesting.  I'm not sold on the idea though.

> It seems like a good way to avoid the performance problems of raid-5
> /raid-6

I think there are better ways.

> 
> But does it stripe? One could think that rewriting stripes
> other places would damage the striping effects.

I'm not sure what you mean exactly.  But I suspect your concerns here
are unjustified.

> 
> Or is the performance only meant to be good for random read/write?

I suspect it is meant to be good for everything.  But you would have to
ask SUN that.

> 
> Can the code be lifted to Linux? I understand that it is already in
> FreeBSD. Does Sun's licence prevent this?

My understanding is that the Sun license prevents it.

However raid-z only makes sense in the context of a specific
filesystem such as ZFS.  It isn't something that you could just layer
any filesystem on top of.

> 
> And could something like this be built into existing file systems like
> ext3 and xfs? They could have a multipartition layer in their code, and
> then the heuristics to optimize block access could also apply to stripe
> access.

I doubt it, but I haven't thought deeply enough about it to see if
there might be some relatively non-intrusive way.

NeilBrown

> 
> best regards
> keld
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: suns raid-z / zfs
  2008-02-18  4:07 ` Neil Brown
@ 2008-02-18  5:33   ` Keld Jørn Simonsen
  2008-02-18 10:51     ` Neil Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Keld Jørn Simonsen @ 2008-02-18  5:33 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
> On Sunday February 17, keld@dkuug.dk wrote:
> > Hi
> > 
> 
> > It seems like a good way to avoid the performance problems of raid-5
> > /raid-6
> 
> I think there are better ways.

Interesting! What do you have in mind?

and what are the problems with zfs?

> > 
> > But does it stripe? One could think that rewriting stripes
> > other places would damage the striping effects.
> 
> I'm not sure what you mean exactly.  But I suspect your concerns here
> are unjustified.

More precisely: I understand that zfs always writes the data anew.
That would mean at other blocks on the partitions, for the logical blocks
of the file in question. So the blocks on the partitions will not be
adjacent. And striping will not be possible, generally.
> 
> > 
> > Can the code be lifted to Linux? I understand that it is already in
> > FreeBSD. Does Sun's licence prevent this?
> 
> My understanding is that the Sun license prevents it.
> 
> However raid-z only makes sense in the context of a specific
> filesystem such as ZFS.  It isn't something that you could just layer
> any filesystem on top of.

That is understood.
> 
> > 
> > And could something like this be built into existing file systems like
> > ext3 and xfs? They could have a multipartition layer in their code, and
> > then the heuristics to optimize block access could also apply to stripe
> > access.
> 
> I doubt it, but I haven't thought deeply enough about it to see if
> there might be some relatively non-intrusive way.

Hmm, I think this is not for the raid layer to do, as I understand it.

Best regards
keld


* Re: suns raid-z / zfs
  2008-02-18  5:33   ` Keld Jørn Simonsen
@ 2008-02-18 10:51     ` Neil Brown
  2008-02-18 20:45       ` Keld Jørn Simonsen
  0 siblings, 1 reply; 7+ messages in thread
From: Neil Brown @ 2008-02-18 10:51 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: Neil Brown, linux-raid

On Monday February 18, keld@dkuug.dk wrote:
> On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
> > On Sunday February 17, keld@dkuug.dk wrote:
> > > Hi
> > > 
> > 
> > > It seems like a good way to avoid the performance problems of raid-5
> > > /raid-6
> > 
> > I think there are better ways.
> 
> Interesting! What do you have in mind?

A "Log Structured Filesystem" always does large contiguous writes.
Aligning these to the raid5 stripes wouldn't be too hard and then you
would never have to do any pre-reading.
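
A minimal sketch of the pre-reading point, assuming single-parity (raid5
style) XOR parity. The function names and the I/O counters are hypothetical
and only illustrate the idea; this is not the md implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (raid5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_write(new_data):
    """Every data chunk of the stripe is new, so parity is computed from
    the new data alone and nothing has to be read from disk first."""
    parity = xor_blocks(new_data)
    return parity, {"reads": 0, "writes": len(new_data) + 1}


def small_write(old_chunk, old_parity, new_chunk):
    """Only one chunk changes, so the old chunk and the old parity must be
    pre-read before the new parity can be computed."""
    parity = xor_blocks([old_parity, old_chunk, new_chunk])
    return parity, {"reads": 2, "writes": 2}
```

With writes that always cover whole stripes, only the first path is ever
taken, which is why the pre-reads disappear.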

> 
> and what are the problems with zfs?

Recovery after a failed drive would not be an easy operation, and I
cannot imagine it being even close to the raw speed of the device.

> 
> > > 
> > > But does it stripe? One could think that rewriting stripes
> > > other places would damage the striping effects.
> > 
> > I'm not sure what you mean exactly.  But I suspect your concerns here
> > are unjustified.
> 
> More precisely: I understand that zfs always writes the data anew.
> That would mean at other blocks on the partitions, for the logical blocks
> of the file in question. So the blocks on the partitions will not be
> adjacent. And striping will not be possible, generally.

The important part of striping is that a write is spread out over
multiple devices, isn't it?

If ZFS can choose where to put each block that it writes, it can
easily choose to write a series of blocks to a collection of different
devices, thus getting the major benefit of striping.
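
As a rough illustration only: a filesystem that is free to place each block
could use something like the round-robin allocator below to spread a write
over all devices. The class and its fields are hypothetical and are not
taken from ZFS.

```python
class RoundRobinAllocator:
    """Hypothetical allocator: hands out (device, offset) pairs, rotating
    over the devices so consecutive blocks land on different devices."""

    def __init__(self, n_devices):
        self.n_devices = n_devices
        self.next_free = [0] * n_devices  # next unused block on each device
        self._turn = 0                    # device that gets the next block

    def allocate(self):
        dev = self._turn
        self._turn = (self._turn + 1) % self.n_devices
        offset = self.next_free[dev]
        self.next_free[dev] += 1
        return dev, offset


def place_write(allocator, n_blocks):
    """Return the placement chosen for a write of n_blocks blocks."""
    return [allocator.allocate() for _ in range(n_blocks)]
```

A write of eight blocks on four devices touches every device twice, even
though no fixed stripe geometry is involved.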


NeilBrown


* Re: suns raid-z / zfs
  2008-02-18 10:51     ` Neil Brown
@ 2008-02-18 20:45       ` Keld Jørn Simonsen
  2008-02-21 10:37         ` Mario 'BitKoenig' Holbe
  2008-02-26 20:27         ` Bill Davidsen
  0 siblings, 2 replies; 7+ messages in thread
From: Keld Jørn Simonsen @ 2008-02-18 20:45 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
> On Monday February 18, keld@dkuug.dk wrote:
> > On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
> > > On Sunday February 17, keld@dkuug.dk wrote:
> > > > Hi
> > > > 
> > > 
> > > > It seems like a good way to avoid the performance problems of raid-5
> > > > /raid-6
> > > 
> > > I think there are better ways.
> > 
> > Interesting! What do you have in mind?
> 
> A "Log Structured Filesystem" always does large contiguous writes.
> Aligning these to the raid5 stripes wouldn't be too hard and then you
> would never have to do any pre-reading.
> 
> > 
> > and what are the problems with zfs?
> 
> Recovery after a failed drive would not be an easy operation, and I
> cannot imagine it being even close to the raw speed of the device.

I thought this was a problem with most raid types: while
reconstructing, performance is quite slow. And as there has been some
damage, this is expected. And there probably is not much to do about it.

Or is there? Are there any RAID types that perform reasonably well
given that one disk is under repair? The performance could be crucial
for some applications.

One could think of clever arrangements so that, say, two disks could go
down and the rest of the array with 10-20 drives could still function
reasonably well, even under the reconstruction. As far as I can tell
from the code, the reconstruction itself is not impeding normal
performance much, as normal operation bars reconstruction operations.

Hmm, my understanding would then be, for both random reads and writes
that performance in typical raids would only be reduced by the IO bandwidth
of the failing disks.

For sequential R/W, performance for raid10,f would
be hurt, downgrading its performance to random IO for the drives involved.

Raid5/6 would be hurt much for reading, as all drives need to be read for giving
correct information during reconstruction.
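
A small sketch of why this is so, assuming a single-parity (raid5 style)
stripe: a read that lands on the failed drive's chunk can only be answered
by reading every surviving chunk of that stripe and XORing them. The helper
names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def degraded_read(stripe, failed):
    """stripe holds the data chunks plus the parity chunk of one stripe;
    stripe[failed] is unavailable. Returns the rebuilt chunk and the
    number of device reads that were needed to rebuild it."""
    survivors = [chunk for i, chunk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors), len(survivors)
```

On a healthy array the same request costs one device read; in degraded
mode it costs one read on each remaining drive.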


So it looks like, if your performance is important under a
reconstruction, then you should avoid raid5/6 and use the mirrored raid
types. Given you have a big operation, with a load balance of a lot of
random reading and writing, it does not matter much which mirrored
raid type you would choose, as they all perform about equal for random
IO, even when reconstructing. Is that correct advice?

> > 
> > > > 
> > > > But does it stripe? One could think that rewriting stripes
> > > > other places would damage the striping effects.
> > > 
> > > I'm not sure what you mean exactly.  But I suspect your concerns here
> > > are unjustified.
> > 
> > More precisely: I understand that zfs always writes the data anew.
> > That would mean at other blocks on the partitions, for the logical blocks
> > of the file in question. So the blocks on the partitions will not be
> > adjacent. And striping will not be possible, generally.
> 
> The important part of striping is that a write is spread out over
> multiple devices, isn't it.
> 
> If ZFS can choose where to put each block that it writes, it can
> easily choose to write a series of blocks to a collection of different
> devices, thus getting the major benefit of striping.

I see 2 major benefits of striping: one is that many drives are involved,
and the other is that the stripes are allocated adjacently, so that io
on one drive can just proceed to the next physical blocks when one
stripe has been processed. Dependent on the size of the IO operations
involved, first one or more disks in a stripe is processed, and then the
following stripes are processed. ZFS misses the second part of the
optimization, I think.
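
To make the second benefit concrete, here is a sketch of a conventional
fixed striping layout (raid0 style), with hypothetical function names.
Consecutive logical chunks rotate over the disks, and the chunks that end
up on any one disk are physically adjacent.

```python
def raid0_map(logical_chunk, n_disks):
    """Map a logical chunk number to (disk, chunk offset on that disk)."""
    return logical_chunk % n_disks, logical_chunk // n_disks


def per_disk_offsets(n_chunks, n_disks):
    """For a sequential run of n_chunks logical chunks, list the chunk
    offsets each disk has to visit, in order."""
    offsets = {disk: [] for disk in range(n_disks)}
    for chunk in range(n_chunks):
        disk, offset = raid0_map(chunk, n_disks)
        offsets[disk].append(offset)
    return offsets
```

For a sequential run, every disk sees offsets 0, 1, 2, ... in order, so its
head just moves on to the next physical blocks.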

Best regards
Keld


* Re: suns raid-z / zfs
  2008-02-18 20:45       ` Keld Jørn Simonsen
@ 2008-02-21 10:37         ` Mario 'BitKoenig' Holbe
  2008-02-26 20:27         ` Bill Davidsen
  1 sibling, 0 replies; 7+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2008-02-21 10:37 UTC (permalink / raw)
  To: linux-raid

Keld Jørn Simonsen <keld@dkuug.dk> wrote:
> On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
>> Recovery after a failed drive would not be an easy operation, and I
>> cannot imagine it being even close to the raw speed of the device.
> I thought this was a problem with most raid types: while
> reconstructing, performance is quite slow. And as there has been some

There is a difference between "recovery is quite slow" and "raid device
access is quite slow". The former is an issue since it stretches the time
where you're in non-redundant danger, while the latter is just
inconvenient.


regards
   Mario
-- 
I heard, if you play a NT-CD backwards, you get satanic messages...
That's nothing. If you play it forwards, it installs NT.


* Re: suns raid-z / zfs
  2008-02-18 20:45       ` Keld Jørn Simonsen
  2008-02-21 10:37         ` Mario 'BitKoenig' Holbe
@ 2008-02-26 20:27         ` Bill Davidsen
  1 sibling, 0 replies; 7+ messages in thread
From: Bill Davidsen @ 2008-02-26 20:27 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: Neil Brown, linux-raid

Keld Jørn Simonsen wrote:
> On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
>   
>> On Monday February 18, keld@dkuug.dk wrote:
>>     
>>> On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
>>>       
>>>> On Sunday February 17, keld@dkuug.dk wrote:
>>>>         
>>>>> Hi
>>>>>
>>>>>           
>>>>> It seems like a good way to avoid the performance problems of raid-5
>>>>> /raid-6
>>>>>           
>>>> I think there are better ways.
>>>>         
>>> Interesting! What do you have in mind?
>>>       
>> A "Log Structured Filesystem" always does large contiguous writes.
>> Aligning these to the raid5 stripes wouldn't be too hard and then you
>> would never have to do any pre-reading.
>>
>>     
>>> and what are the problems with zfs?
>>>       
>> Recovery after a failed drive would not be an easy operation, and I
>> cannot imagine it being even close to the raw speed of the device.
>>     
>
> I thought this was a problem with most raid types: while
> reconstructing, performance is quite slow. And as there has been some
> damage, this is expected. And there probably is not much to do about it.
>
> Or is there? Are there any RAID types that perform reasonably well
> given that one disk is under repair? The performance could be crucial
> for some applications.
>
>   
If that's a requirement, RAID1 with multiple copies would probably be
your best bet. You could probably design a test from existing software
and a script; I'm just basing the thought on having run load on a
recovering 4 way mirror at one time. The load was 100-250 random
reads/sec, and response time stayed acceptable.
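
One possible shape for such a script, as a sketch only: issue block-aligned
random reads against a file or block device and record how long each one
takes. The function name, the defaults and the use of os.pread (Unix only)
are choices made for this example, not something from the test described
above.

```python
import os
import random
import time


def random_read_latencies(path, n_reads=100, block_size=4096, seed=0):
    """Issue n_reads block-aligned random reads against path and return
    the per-read latencies in seconds."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and for
        # block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        n_blocks = size // block_size
        if n_blocks == 0:
            raise ValueError("target is smaller than one block")
        latencies = []
        for _ in range(n_reads):
            offset = rng.randrange(n_blocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)
            latencies.append(time.perf_counter() - start)
        return latencies
    finally:
        os.close(fd)
```

Running it against the array once while it is healthy and once while it is
recovering gives two latency lists to compare. Note that reads served from
the page cache will look far faster than the disks really are, and that the
sketch does not pace itself to a fixed rate such as 100-250 reads/sec.
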
> One could think of clever arrangements so that, say, two disks could go
> down and the rest of the array with 10-20 drives could still function
> reasonably well, even under the reconstruction. As far as I can tell
> from the code, the reconstruction itself is not impeding normal
> performance much, as normal operation bars reconstruction operations.
>
> Hmm, my understanding would then be, for both random reads and writes
> that performance in typical raids would only be reduced by the IO bandwidth
> of the failing disks.
>
> For sequential R/W performance for raid10,f would
> be hurt, downgrading its performance to random IO for the drives involved.
>
> Raid5/6 would be hurt much for reading, as all drives need to be read for giving
> correct information during reconstruction.
>
>
> So it looks like, if your performance is important under a
> reconstruction, then you should avoid raid5/6 and use the mirrored raid
> types. Given you have a big operation, with a load balance of a lot of
> random reading and writing, it does not matter much which mirrored
> raid type you would choose, as they all perform about equal for random
> IO, even when reconstructing. Is that correct advice?
>
>   
>>>>> But does it stripe? One could think that rewriting stripes
>>>>> other places would damage the striping effects.
>>>>>           
>>>> I'm not sure what you mean exactly.  But I suspect your concerns here
>>>> are unjustified.
>>>>         
>>> More precisely: I understand that zfs always writes the data anew.
>>> That would mean at other blocks on the partitions, for the logical blocks
>>> of the file in question. So the blocks on the partitions will not be
>>> adjacent. And striping will not be possible, generally.
>>>       
>> The important part of striping is that a write is spread out over
>> multiple devices, isn't it.
>>
>> If ZFS can choose where to put each block that it writes, it can
>> easily choose to write a series of blocks to a collection of different
>> devices, thus getting the major benefit of striping.
>>     
>
> I see 2 major benefits of striping: one is that many drives are involved,
> and the other is that the stripes are allocated adjacently, so that io
> on one drive can just proceed to the next physical blocks when one
> stripe has been processed. Dependent on the size of the IO operations
> involved, first one or more disks in a stripe is processed, and then the
> following stripes are processed. ZFS misses the second part of the
> optimization, I think.
>   


-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 


