RAID5

* RAID5
@ 2010-04-19  3:46 Kaushal Shriyan
  2010-04-19  4:21 ` RAID5 Michael Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Kaushal Shriyan @ 2010-04-19  3:46 UTC (permalink / raw)
  To: linux-raid

Hi

I am a newbie to RAID. Are strip size and block size the same? How is
the strip size calculated? Is it 64 KB by default? What should the
strip size be?

I have referred to
http://en.wikipedia.org/wiki/Raid5#RAID_5_parity_handling. How is
parity handled in RAID 5?

Please explain with an example.

Thanks and Regards,

Kaushal


* Re: RAID5
  2010-04-19  3:46 RAID5 Kaushal Shriyan
@ 2010-04-19  4:21 ` Michael Evans
  2010-04-21 13:32   ` RAID5 Bill Davidsen
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Evans @ 2010-04-19  4:21 UTC (permalink / raw)
  To: Kaushal Shriyan; +Cc: linux-raid

On Sun, Apr 18, 2010 at 8:46 PM, Kaushal Shriyan
<kaushalshriyan@gmail.com> wrote:
> Hi
>
> I am a newbie to RAID. is strip size and block size same. How is it
> calculated. is it 64Kb by default. what should be the strip size ?
>
> I have referred to
> http://en.wikipedia.org/wiki/Raid5#RAID_5_parity_handling. How is
> parity handled in case of RAID 5.
>
> Please explain me with an example.
>
> Thanks and Regards,
>
> Kaushal

You already have one good resource.

I wrote this a while ago, and the preface may answer some questions
you have about the terminology used.

http://wiki.tldp.org/LVM-on-RAID

However the question you're asking is more or less borderline
off-topic for this mailing list.  If the linked information is
insufficient I suggest using the Wikipedia article's links to learn
more.

* Re: RAID5
  2010-04-19  4:21 ` RAID5 Michael Evans
@ 2010-04-21 13:32   ` Bill Davidsen
  2010-04-21 19:43     ` RAID5 Michael Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Davidsen @ 2010-04-21 13:32 UTC (permalink / raw)
  To: Michael Evans; +Cc: Kaushal Shriyan, linux-raid

Michael Evans wrote:
> On Sun, Apr 18, 2010 at 8:46 PM, Kaushal Shriyan
> <kaushalshriyan@gmail.com> wrote:
>   
>> Hi
>>
>> I am a newbie to RAID. is strip size and block size same. How is it
>> calculated. is it 64Kb by default. what should be the strip size ?
>>
>> I have referred to
>> http://en.wikipedia.org/wiki/Raid5#RAID_5_parity_handling. How is
>> parity handled in case of RAID 5.
>>
>> Please explain me with an example.
>>
>> Thanks and Regards,
>>
>> Kaushal
>
> You already have one good resource.
>
> I wrote this a while ago, and the preface may answer some questions
> you have about the terminology used.
>
> http://wiki.tldp.org/LVM-on-RAID
>
> However the question you're asking is more or less borderline
> off-topic for this mailing list.  If the linked information is
> insufficient I suggest using the Wikipedia article's links to learn
> more.
>   

I have some recent experience with this, gained the hard way: by looking
for a problem rather than out of curiosity. My experience with LVM on
RAID is that, at least for RAID-5, write performance sucks. I created two
partitions on each of three drives, and two raid-5 arrays using those
partitions. Same block size, same tuning for stripe-cache, etc. I put
ext4 directly on one array and LVM on the other, put ext4 on the LVM
volume, and copied 500GB to each. LVM had a 50% performance penalty: the
copy took twice as long. I repeated the test with four drives (all I
could spare) and found that writes were roughly 3x slower with LVM than
directly on the array.

I did not look into it further. I know why the performance is bad, but I
don't have the hardware to change things right now, so I live with it.
When I get back from a trip I will change that.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


* Re: RAID5
  2010-04-21 13:32   ` RAID5 Bill Davidsen
@ 2010-04-21 19:43     ` Michael Evans
  2010-04-23 14:26       ` RAID5 Michael Tokarev
  2010-05-02 22:45       ` RAID5 Bill Davidsen
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Evans @ 2010-04-21 19:43 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Kaushal Shriyan, linux-raid

On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
> Michael Evans wrote:
>>
>> On Sun, Apr 18, 2010 at 8:46 PM, Kaushal Shriyan
>> <kaushalshriyan@gmail.com> wrote:
>>
>>>
>>> Hi
>>>
>>> I am a newbie to RAID. is strip size and block size same. How is it
>>> calculated. is it 64Kb by default. what should be the strip size ?
>>>
>>> I have referred to
>>> http://en.wikipedia.org/wiki/Raid5#RAID_5_parity_handling. How is
>>> parity handled in case of RAID 5.
>>>
>>> Please explain me with an example.
>>>
>>> Thanks and Regards,
>>>
>>> Kaushal
>>
>> You already have one good resource.
>>
>> I wrote this a while ago, and the preface may answer some questions
>> you have about the terminology used.
>>
>> http://wiki.tldp.org/LVM-on-RAID
>>
>> However the question you're asking is more or less borderline
>> off-topic for this mailing list.  If the linked information is
>> insufficient I suggest using the Wikipedia article's links to learn
>> more.
>>
>
> I have some recent experience with this gained the hard way, by looking for
> a problem rather than curiousity. My experience with LVM on RAID is that, at
> least for RAID-5, write performance sucks. I created two partitions on each
> of three drives, and two raid-5 arrays using those partitions. Same block
> size, same tuning for stripe-cache, etc. I dropped an ext4 on on array, and
> LVM on the other, put ext4 on the LVM drive, and copied 500GB to each. LVM
> had a 50% performance penalty, took twice as long. Repeated with four drives
> (all I could spare) and found that the speed right on an array was roughly
> 3x slower with LVM.
>
> I did not look into it further, I know why the performance is bad, I don't
> have the hardware to change things right now, so I live with it. When I get
> back from a trip I will change that.
>
> --
> Bill Davidsen <davidsen@tmr.com>
>  "We can't solve today's problems by using the same thinking we
>  used in creating them." - Einstein
>
>

This issue sounds very likely to be write-barrier related.  Were you
using an external journal on a write-barrier honoring device?

* Re: RAID5
  2010-04-21 19:43     ` RAID5 Michael Evans
@ 2010-04-23 14:26       ` Michael Tokarev
  2010-04-23 14:57         ` RAID5 MRK
                           ` (3 more replies)
  2010-05-02 22:45       ` RAID5 Bill Davidsen
  1 sibling, 4 replies; 16+ messages in thread
From: Michael Tokarev @ 2010-04-23 14:26 UTC (permalink / raw)
  To: Michael Evans; +Cc: Bill Davidsen, Kaushal Shriyan, linux-raid

Michael Evans wrote:
> On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
[]
>> I have some recent experience with this gained the hard way, by looking for
>> a problem rather than curiousity. My experience with LVM on RAID is that, at
>> least for RAID-5, write performance sucks. I created two partitions on each
>> of three drives, and two raid-5 arrays using those partitions. Same block
>> size, same tuning for stripe-cache, etc. I dropped an ext4 on on array, and
>> LVM on the other, put ext4 on the LVM drive, and copied 500GB to each. LVM
>> had a 50% performance penalty, took twice as long. Repeated with four drives
>> (all I could spare) and found that the speed right on an array was roughly
>> 3x slower with LVM.
>>
> This issues sounds very likely to be write barrier related.  Were you
> using an external journal on a write-barrier honoring device?

This is most likely due to the read-modify-write cycle, which is present
on lvm-on-raid[456] if the number of data drives is not a power of two.
LVM requires the block size to be a power of two, so if you can't fit
a whole number of LVM blocks into the full raid stripe, your write speed
is expected to be ~3 times worse...
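
For readers unfamiliar with the term, the read-modify-write cycle can be
illustrated with a minimal Python sketch. It assumes a hypothetical
4-drive RAID-5 (three data chunks plus one parity chunk per stripe) with
tiny 2-byte chunks; the helper name xor_blocks is made up for the
illustration and is not part of MD or LVM.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID-5: three data chunks + parity.
data = [b"\x0f\x0f", b"\xf0\x01", b"\xaa\x55"]

# Full-stripe write: parity comes from the new data alone, no reads needed.
parity = xor_blocks(*data)

# Partial-stripe write (read-modify-write): the old chunk and the old
# parity must be read back before the new parity can be computed.
old_chunk = data[1]
new_chunk = b"\x12\x34"
new_parity = xor_blocks(parity, old_chunk, new_chunk)
data[1] = new_chunk

# If the drive holding chunk 1 fails, it is rebuilt from the survivors.
rebuilt = xor_blocks(new_parity, data[0], data[2])
```

The extra reads in the partial-stripe path are what make unaligned or
sub-stripe writes slower than full-stripe writes.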

Even creating partitions on such raid array is difficult.

'Hwell.

Unfortunately very few people understand this.

As for write barriers, it looks like they either already work
(in 2.6.33) or will (in 2.6.34) for the whole raid5-lvm stack.

/mjt


* Re: RAID5
  2010-04-23 14:26       ` RAID5 Michael Tokarev
@ 2010-04-23 14:57         ` MRK
  2010-04-23 20:57         ` RAID5 Michael Evans
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: MRK @ 2010-04-23 14:57 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Michael Evans, Bill Davidsen, Kaushal Shriyan, linux-raid

On 04/23/2010 04:26 PM, Michael Tokarev wrote:
>
> This is most likely due to read-modify-write cycle which is present on
> lvm-on-raid[456] if the number of data drives is not a power of two.
> LVM requires the block size to be a power of two, so if you can't fit
> some number of LVM blocks on whole raid stripe size your write speed
> is expected to be ~3 times worse...
>
> Even creating partitions on such raid array is difficult.
>
>
>
>    

Seriously?
Requiring the number of data drives to be a power of 2 would be an
immense limitation.
Why should that be? I understand that LVM blocks would not be aligned to
raid stripes, and this can worsen the problem for random writes, but if
the write is sequential, the raid stripe will still be filled by the
next block that LVM outputs.
Maybe the very first stripe you write will get an RMW, but the next ones
will be filled while waiting. Also consider that MD has the
preread_bypass_threshold feature, which helps with this.

Also if you really need to put an integer number of LVM blocks in an MD 
stripe (which I doubt, as I wrote above), this still does not mean that 
the number of drives needs to be a power of 2: e.g. you could put 10 LVM 
blocks in 5 data disks, couldn't you?
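
The arithmetic behind that example can be sketched in a few lines of
Python. The sizes below (64 KiB chunks, 32 KiB and 4 MiB blocks) are
assumptions chosen for illustration, not values taken from anyone's
setup, and the function name alignment is hypothetical.

```python
from math import gcd

def alignment(chunk_kib: int, data_disks: int, block_kib: int):
    """Return (full stripe size in KiB,
               blocks per aligned group,
               full stripes per aligned group).

    An "aligned group" is the smallest span that is both a whole number
    of blocks and a whole number of full stripes (their least common
    multiple).
    """
    stripe_kib = chunk_kib * data_disks
    period_kib = stripe_kib * block_kib // gcd(stripe_kib, block_kib)
    return stripe_kib, period_kib // block_kib, period_kib // stripe_kib

# 5 data disks, 64 KiB chunks, 32 KiB blocks: 10 blocks fill one stripe.
five_disks = alignment(64, 5, 32)

# 3 data disks, 64 KiB chunks, 4 MiB blocks: 3 blocks span 64 stripes.
three_disks = alignment(64, 3, 4096)
```

In both cases a whole number of blocks lines up with a whole number of
stripes even though the data-disk count is not a power of two.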

I would lean more toward a barriers issue... I'd try repeating the test
with nobarrier on the ext4 mount and see.
But Bill says that he "knows" what the problem is, so maybe he will tell
us sooner or later :-)


* Re: RAID5
  2010-04-23 14:26       ` RAID5 Michael Tokarev
  2010-04-23 14:57         ` RAID5 MRK
@ 2010-04-23 20:57         ` Michael Evans
  2010-04-24  1:47           ` RAID5 Mikael Abrahamsson
  2010-05-02 22:51         ` RAID5 Bill Davidsen
  2010-05-03  5:51         ` RAID5 Luca Berra
  3 siblings, 1 reply; 16+ messages in thread
From: Michael Evans @ 2010-04-23 20:57 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Bill Davidsen, Kaushal Shriyan, linux-raid

On Fri, Apr 23, 2010 at 7:26 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> Michael Evans wrote:
>> On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
> []
>>> I have some recent experience with this gained the hard way, by looking for
>>> a problem rather than curiousity. My experience with LVM on RAID is that, at
>>> least for RAID-5, write performance sucks. I created two partitions on each
>>> of three drives, and two raid-5 arrays using those partitions. Same block
>>> size, same tuning for stripe-cache, etc. I dropped an ext4 on on array, and
>>> LVM on the other, put ext4 on the LVM drive, and copied 500GB to each. LVM
>>> had a 50% performance penalty, took twice as long. Repeated with four drives
>>> (all I could spare) and found that the speed right on an array was roughly
>>> 3x slower with LVM.
>>>
>> This issues sounds very likely to be write barrier related.  Were you
>> using an external journal on a write-barrier honoring device?
>
> This is most likely due to read-modify-write cycle which is present on
> lvm-on-raid[456] if the number of data drives is not a power of two.
> LVM requires the block size to be a power of two, so if you can't fit
> some number of LVM blocks on whole raid stripe size your write speed
> is expected to be ~3 times worse...
>
> Even creating partitions on such raid array is difficult.
>
> 'Hwell.
>
> Unfortunately very few people understand this.
>
> As of write barriers, it looks like either they already work
> (in 2.6.33) or will be (in 2.6.34) for whole raid5-lvm stack.
>
> /mjt
>

Even when write barriers are supported what will a typical transaction
look like?

Journal Flush
Data Flush
Journal Flush (maybe)

If the operations are small (which the journal ops should be) then
you're forced to wait for a read, and then make a write barrier after
it.

J.read(2 drives)
J.write(2 drives) -- Barrier
D.read(2 drives)
D.write(2 drives) -- Barrier
Then maybe
J.read(2 drives) (Hopefully cached, but could cross in to a new stripe...)
J.write(2 drives) -- Barrier
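
As a rough tally of the sequence above, here is a small Python sketch.
It uses the same simplified model as the list (each small write reads
two drives, the old data and the old parity, then writes two drives) and
ignores caching; the function name array_ios is made up for this
illustration.

```python
def array_ios(small_writes: int) -> dict:
    """Count per-drive operations on the array for sub-stripe writes.

    Simplified model: every small write reads old data + old parity
    (2 drive reads) and writes new data + new parity (2 drive writes).
    """
    return {"reads": 2 * small_writes, "writes": 2 * small_writes}

# Journal on the array: journal, data, then (maybe) journal again.
journal_on_array = array_ios(3)

# External journal: only the data write touches the array.
journal_external = array_ios(1)
```

Under this model the array sees a third of the drive operations per
transaction once the journal lives elsewhere, and two of the three
barriers no longer wait on the array.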

This is why an external journal on another device is a great idea.
Unfortunately what I really want is something like 512 MB of
battery-backed RAM (at any vaguely modern speed) to split up as journal
devices, but now everyone is selling SSDs, which are broken for such
needs.  Any RAM drive units still being sold seem to be more along
data-center grade sizes.

* Re: RAID5
  2010-04-23 20:57         ` RAID5 Michael Evans
@ 2010-04-24  1:47           ` Mikael Abrahamsson
  2010-04-24  3:34             ` RAID5 Michael Evans
  0 siblings, 1 reply; 16+ messages in thread
From: Mikael Abrahamsson @ 2010-04-24  1:47 UTC (permalink / raw)
  To: Michael Evans; +Cc: Michael Tokarev, Bill Davidsen, Kaushal Shriyan, linux-raid

On Fri, 23 Apr 2010, Michael Evans wrote:

> devices, but now everyone is selling SDDs which are broken for such
> needs.  Any ram drive units still being sold seem to be more along
> data-center grade sizes.

http://benchmarkreviews.com/index.php?option=com_content&task=view&id=308&Itemid=60

Basically it's DRAM with a battery backup and a CF slot where the data
goes in case of power failure. It's a bit big and so on, but it should be
perfect for journals... Or is this the kind of device you were referring
to as "data center grade size"?

Some of the SSDs sold today have a capacitor for power failure as well, so
all writes will complete, but they're not so common.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: RAID5
  2010-04-24  1:47           ` RAID5 Mikael Abrahamsson
@ 2010-04-24  3:34             ` Michael Evans
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Evans @ 2010-04-24  3:34 UTC (permalink / raw)
  To: Mikael Abrahamsson
  Cc: Michael Tokarev, Bill Davidsen, Kaushal Shriyan, linux-raid

On Fri, Apr 23, 2010 at 6:47 PM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> On Fri, 23 Apr 2010, Michael Evans wrote:
>
>> devices, but now everyone is selling SDDs which are broken for such
>> needs.  Any ram drive units still being sold seem to be more along
>> data-center grade sizes.
>
> http://benchmarkreviews.com/index.php?option=com_content&task=view&id=308&Itemid=60
>
> Basically it's DRAM with a battery backup and a CF slot where the data goes
> in case of poewr failure. It's a bit big and so on, but it should be perfect
> for journals... Or is this the kind of device you were referring to as "data
> center grade size"?
>
> Some of te SSDs sold today have a capacitor for power failure as well, so
> all writes will complete, but they're not so common.
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se
>

Yeah, that's in the range I call 'data center grade' since the least
expensive model I can find using search tools is about 236 USD.  For
that price I could /buy/ two to three hard drives and get nearly the
same effect by reusing old drives (but wasting more power).

I should be able to find something with a cheap plastic shell for
mounting and a very simple PCB that has slots for older RAM of my
selection, and a minimal onboard CPU for less than 50 USD; I seriously
doubt the components cost that much.

* Re: RAID5
  2010-04-23 14:26       ` RAID5 Michael Tokarev
  2010-04-23 14:57         ` RAID5 MRK
  2010-04-23 20:57         ` RAID5 Michael Evans
@ 2010-05-02 22:51         ` Bill Davidsen
  2010-05-03  5:51         ` RAID5 Luca Berra
  3 siblings, 0 replies; 16+ messages in thread
From: Bill Davidsen @ 2010-05-02 22:51 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Michael Evans, Kaushal Shriyan, linux-raid

Michael Tokarev wrote:
> Michael Evans wrote:
>   
>> On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
>>     
> []
>   
>>> I have some recent experience with this gained the hard way, by looking for
>>> a problem rather than curiousity. My experience with LVM on RAID is that, at
>>> least for RAID-5, write performance sucks. I created two partitions on each
>>> of three drives, and two raid-5 arrays using those partitions. Same block
>>> size, same tuning for stripe-cache, etc. I dropped an ext4 on on array, and
>>> LVM on the other, put ext4 on the LVM drive, and copied 500GB to each. LVM
>>> had a 50% performance penalty, took twice as long. Repeated with four drives
>>> (all I could spare) and found that the speed right on an array was roughly
>>> 3x slower with LVM.
>>>
>>>       
>> This issues sounds very likely to be write barrier related.  Were you
>> using an external journal on a write-barrier honoring device?
>>     
>
> This is most likely due to read-modify-write cycle which is present on
> lvm-on-raid[456] if the number of data drives is not a power of two.
> LVM requires the block size to be a power of two, so if you can't fit
> some number of LVM blocks on whole raid stripe size your write speed
> is expected to be ~3 times worse...
>
>   
Since I tried 3 and 4 drive setups, with several chunk sizes, I would 
hope that no matter how lvm counts data drives (why does it care?) it 
would find a power of two there.
> Even creating partitions on such raid array is difficult.
>
> 'Hwell.
>
> Unfortunately very few people understand this.
>
> As of write barriers, it looks like either they already work
> (in 2.6.33) or will be (in 2.6.34) for whole raid5-lvm stack.
>
> /mjt


-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: RAID5
  2010-04-23 14:26       ` RAID5 Michael Tokarev
                           ` (2 preceding siblings ...)
  2010-05-02 22:51         ` RAID5 Bill Davidsen
@ 2010-05-03  5:51         ` Luca Berra
  3 siblings, 0 replies; 16+ messages in thread
From: Luca Berra @ 2010-05-03  5:51 UTC (permalink / raw)
  To: linux-raid

On Fri, Apr 23, 2010 at 06:26:20PM +0400, Michael Tokarev wrote:
>This is most likely due to read-modify-write cycle which is present on
>lvm-on-raid[456] if the number of data drives is not a power of two.
>LVM requires the block size to be a power of two, so if you can't fit
>some number of LVM blocks on whole raid stripe size your write speed
>is expected to be ~3 times worse...
uh?
PE size != block size.
PE size is not used for I/O; it is only used for laying out data.
It will influence data alignment, but I believe the issue may be
bypassed if we make PE size == chunk_size and do all creation/extension
of LVs in multiples of data_disks; the resulting device-mapper tables
should then be aligned.
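
A minimal Python sketch of that alignment check follows. It assumes
sizes in KiB and an LV segment described by its first physical extent
and extent count; the function name and the data_offset_kib parameter
(where the PV's data area starts on the array) are hypothetical and not
taken from the LVM tools.

```python
def lv_segment_aligned(first_pe: int, pe_count: int, pe_kib: int,
                       chunk_kib: int, data_disks: int,
                       data_offset_kib: int = 0) -> bool:
    """True if the segment starts and ends on full-stripe boundaries."""
    stripe_kib = chunk_kib * data_disks
    start_kib = data_offset_kib + first_pe * pe_kib
    length_kib = pe_count * pe_kib
    return start_kib % stripe_kib == 0 and length_kib % stripe_kib == 0

# PE size == chunk size (64 KiB), 3 data disks, so a stripe is 3 PEs.
multiple_of_disks = lv_segment_aligned(0, 300, 64, 64, 3)   # 300 = 3 * 100
not_a_multiple = lv_segment_aligned(0, 100, 64, 64, 3)      # 100 % 3 != 0
```

With PE size equal to the chunk size, the check reduces to the rule in
the message above: keep the extent counts (and start offsets) at
multiples of data_disks.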

L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \


* Re: RAID5
  2010-04-21 19:43     ` RAID5 Michael Evans
  2010-04-23 14:26       ` RAID5 Michael Tokarev
@ 2010-05-02 22:45       ` Bill Davidsen
  1 sibling, 0 replies; 16+ messages in thread
From: Bill Davidsen @ 2010-05-02 22:45 UTC (permalink / raw)
  To: Michael Evans; +Cc: Kaushal Shriyan, linux-raid

Michael Evans wrote:
> On Wed, Apr 21, 2010 at 6:32 AM, Bill Davidsen <davidsen@tmr.com> wrote:
>   
>> Michael Evans wrote:
>>     
>>> On Sun, Apr 18, 2010 at 8:46 PM, Kaushal Shriyan
>>> <kaushalshriyan@gmail.com> wrote:
>>>
>>>       
>>>> Hi
>>>>
>>>> I am a newbie to RAID. is strip size and block size same. How is it
>>>> calculated. is it 64Kb by default. what should be the strip size ?
>>>>
>>>> I have referred to
>>>> http://en.wikipedia.org/wiki/Raid5#RAID_5_parity_handling. How is
>>>> parity handled in case of RAID 5.
>>>>
>>>> Please explain me with an example.
>>>>
>>>> Thanks and Regards,
>>>>
>>>> Kaushal
>>>>         
>>> You already have one good resource.
>>>
>>> I wrote this a while ago, and the preface may answer some questions
>>> you have about the terminology used.
>>>
>>> http://wiki.tldp.org/LVM-on-RAID
>>>
>>> However the question you're asking is more or less borderline
>>> off-topic for this mailing list.  If the linked information is
>>> insufficient I suggest using the Wikipedia article's links to learn
>>> more.
>>>
>>>       
>> I have some recent experience with this gained the hard way, by looking for
>> a problem rather than curiousity. My experience with LVM on RAID is that, at
>> least for RAID-5, write performance sucks. I created two partitions on each
>> of three drives, and two raid-5 arrays using those partitions. Same block
>> size, same tuning for stripe-cache, etc. I dropped an ext4 on on array, and
>> LVM on the other, put ext4 on the LVM drive, and copied 500GB to each. LVM
>> had a 50% performance penalty, took twice as long. Repeated with four drives
>> (all I could spare) and found that the speed right on an array was roughly
>> 3x slower with LVM.
>>
>> I did not look into it further, I know why the performance is bad, I don't
>> have the hardware to change things right now, so I live with it. When I get
>> back from a trip I will change that.
>>
>>
>>     
>
> This issues sounds very likely to be write barrier related.  Were you
> using an external journal on a write-barrier honoring device?
>   

Not at all, just taking 60G of free space from the drives, creating two
partitions (on 64 sector boundaries) and using them for raid-5. I tried
various chunk sizes: better for some things, not so much for others.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* raid5
@ 2016-03-01 14:24 John Smith
  2016-03-01 21:44 ` raid5 Duncan
  0 siblings, 1 reply; 16+ messages in thread
From: John Smith @ 2016-03-01 14:24 UTC (permalink / raw)
  To: linux-btrfs

Hi,
What is the status of btrfs raid5 in kernel 4.4? Thank you


* Re: raid5
  2016-03-01 14:24 raid5 John Smith
@ 2016-03-01 21:44 ` Duncan
  2016-03-02 13:43   ` raid5 Austin S. Hemmelgarn
  0 siblings, 1 reply; 16+ messages in thread
From: Duncan @ 2016-03-01 21:44 UTC (permalink / raw)
  To: linux-btrfs

John Smith posted on Tue, 01 Mar 2016 15:24:04 +0100 as excerpted:

> what is the status of  btrfs raid5 in kernel 4.4? Thank you

That is a very good question. =:^)

The answer, to the best I can give it, is, btrfs raid56 mode has no known 
outstanding bugs specific to it at this time (unless a dev knows of any, 
but I've not seen any confirmed on-list), and hasn't had any, at least 
nothing major, since early in the 4.1 cycle, so 4.2 thru 4.4 should be 
clean of /known/ raid56 bugs.

However, there was (at least) /one/ report of a problem on raid56, that 
to my knowledge hasn't been traced, so it's unknown if it would have 
happened on say raid1 or raid10, as opposed to the raid5/6 it did happen 
on, or not.  Of course with raid56 mode still being relatively new, 
that's one suspect, but we simply don't know at this point, and I've seen 
nothing further on that thread in say 10 days or so, so I'm not sure we 
ever will.

So the best recommendation I can give is that raid56 mode is definitely 
coming along, but depending on how cautious you are, may or may not yet 
be considered quite as stable as the rest of btrfs in general.

If your use-case calls for raid5 or raid6 and provided you have backups
in place if you value the data (a caveat which still applies to btrfs in
general even more than it does to fully stable and mature filesystems, as
in general it's "stabilizing, but not yet fully stable and mature", and
here, even incrementally more to raid56 mode), raid56 mode is in what I'd
call the barely acceptable, tolerably stable range, but I'd still be much
more comfortable with, and recommend, waiting another couple of kernel
cycles just to be sure if you don't have an /immediate/ need, or
alternatively, using say raid10 mode.

Again, that's assuming backups and that you're prepared to use them if 
necessary, if you care about the data, but again, that still applies to 
btrfs in general, and indeed, to a slightly lesser extent to /any/ data
on /any/ filesystem.  Because as the sysadmin's rule of backups states 
(in simple form), you either have at least one level of backup, or by 
your (in)actions, you are literally defining that data as worth less than 
the time and resources necessary to do that backup.  Which means if you 
lose the data and don't have it backed up elsewhere to restore it from, 
you can still be happy, as obviously you considered the time and 
resources necessary to do that backup as of more worth than the data, so 
even if you lose the data, you saved what was obviously more important to 
you, the time and resources you otherwise would have put into ensuring 
that you had a backup, if the data was worth it.

So take heed and don't decide only AFTER you've lost it, that the data 
was actually worth more than the time and resources you DIDN'T spend on 
backing it up.  And while that definitely applies a bit more to btrfs in 
its current "stabilizing but not yet fully stable and mature" state than 
it does to fully stable and mature filesystems, it applies well enough to 
all of them, that figuring out the data was worth more than you thought 
is /always/ an experience you'd rather avoid, /regardless/ of the 
filesystem and hardware that data is on. =:^)

And with it either backed up or of only trivial value regardless, you 
don't have anything to lose, and even if raid56 mode /doesn't/ prove so 
stable for you after all, you can still rest easy knowing you aren't 
going to lose anything of value. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: raid5
  2016-03-01 21:44 ` raid5 Duncan
@ 2016-03-02 13:43   ` Austin S. Hemmelgarn
  2016-03-03  4:16     ` raid5 Duncan
  0 siblings, 1 reply; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-02 13:43 UTC (permalink / raw)
  To: linux-btrfs

On 2016-03-01 16:44, Duncan wrote:
> John Smith posted on Tue, 01 Mar 2016 15:24:04 +0100 as excerpted:
>
>> what is the status of  btrfs raid5 in kernel 4.4? Thank you
>
> That is a very good question. =:^)
>
> The answer, to the best I can give it, is, btrfs raid56 mode has no known
> outstanding bugs specific to it at this time (unless a dev knows of any,
> but I've not seen any confirmed on-list), and hasn't had any, at least
> nothing major, since early in the 4.1 cycle, so 4.2 thru 4.4 should be
> clean of /known/ raid56 bugs.
That really depends on what you consider to be a bug...

For example, for most production usage, the insanely long 
rebuild/rebalance times that people are seeing with BTRFS raid56 (on the 
order of multiple days per terabyte of data to be rebuilt, compared to a 
couple of hours for a rebuild on the same hardware using MDRAID or LVM) 
would very much be considered a serious bug, as it significantly 
increases the chances of data loss due to further disk failures. 
Personally, my recommendation would be to not use BTRFS raid56 for 
anything other than testing if you're working with data-sets bigger than 
about 250G until this particular issue gets fixed, which may be a while 
as we can't seem to figure out what exactly is causing the problem.


* Re: raid5
  2016-03-02 13:43   ` raid5 Austin S. Hemmelgarn
@ 2016-03-03  4:16     ` Duncan
  0 siblings, 0 replies; 16+ messages in thread
From: Duncan @ 2016-03-03  4:16 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Wed, 02 Mar 2016 08:43:17 -0500 as
excerpted:

> On 2016-03-01 16:44, Duncan wrote:
>> John Smith posted on Tue, 01 Mar 2016 15:24:04 +0100 as excerpted:
>>
>>> what is the status of  btrfs raid5 in kernel 4.4? Thank you
>>
>> That is a very good question. =:^)
>>
>> The answer, to the best I can give it, is, btrfs raid56 mode has no
>> known outstanding bugs specific to it at this time (unless a dev knows
>> of any,
>> but I've not seen any confirmed on-list), and hasn't had any, at least
>> nothing major, since early in the 4.1 cycle, so 4.2 thru 4.4 should be
>> clean of /known/ raid56 bugs.
> That really depends on what you consider to be a bug...
> 
> For example, for most production usage, the insanely long
> rebuild/rebalance times that people are seeing with BTRFS raid56 (on the
> order of multiple days per terabyte of data to be rebuilt, compared to a
> couple of hours for a rebuild on the same hardware using MDRAID or LVM)

Very good point.  I wasn't considering that a bug as it's not a direct
dataloss danger (only the indirect danger of another device dying during
the extremely long rebuilds), but you're correct, in practice it's
potentially a blocker-level bug.

But from what I've seen, it isn't affecting everyone, which is of course 
part of the problem from a developer POV, since that makes it harder to 
replicate and trace down.  And it's equally a problem from a user POV, as 
until it's fixed, even if your testing demonstrates that it's not 
affecting you ATM, until we actually pin down what's triggering it, 
there's no way of knowing whether or when it /might/ trigger, which means 
even if it's not affecting you in testing, you gotta assume it's going to 
affect you if you end up trying to do data recovery.

So agreed, tho the effect is pretty much the same as my preferred 
recommendation in any case, effectively, hold off another couple kernel 
cycles and ask again.  I simply wasn't thinking of this specific bug at 
the time and thus couldn't specifically mention it as a concrete reason.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


end of thread, other threads:[~2016-03-03  4:17 UTC | newest]

Thread overview: 16+ messages
2010-04-19  3:46 RAID5 Kaushal Shriyan
2010-04-19  4:21 ` RAID5 Michael Evans
2010-04-21 13:32   ` RAID5 Bill Davidsen
2010-04-21 19:43     ` RAID5 Michael Evans
2010-04-23 14:26       ` RAID5 Michael Tokarev
2010-04-23 14:57         ` RAID5 MRK
2010-04-23 20:57         ` RAID5 Michael Evans
2010-04-24  1:47           ` RAID5 Mikael Abrahamsson
2010-04-24  3:34             ` RAID5 Michael Evans
2010-05-02 22:51         ` RAID5 Bill Davidsen
2010-05-03  5:51         ` RAID5 Luca Berra
2010-05-02 22:45       ` RAID5 Bill Davidsen
  -- strict thread matches above, loose matches on Subject: below --
2016-03-01 14:24 raid5 John Smith
2016-03-01 21:44 ` raid5 Duncan
2016-03-02 13:43   ` raid5 Austin S. Hemmelgarn
2016-03-03  4:16     ` raid5 Duncan
