linux-raid.vger.kernel.org archive mirror
* RAID6 questions
@ 2009-07-02 15:22 Marek
  2009-07-02 16:23 ` Robin Hill
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Marek @ 2009-07-02 15:22 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb

Hi,

I'm trying to build a RAID6 array out of 6x1TB disks, and would like
to ask the following:

1. Is it possible to convert from a 0.9 superblock to 1.x with mdadm
3.0? The reason I ask is that most distributions ship mdadm 2.6.x,
which seems to use the 0.9 superblock by default. I wasn't able to find
any info on mdadm 2.6.x using or switching to 1.x superblocks, so it
seems that unless I use mdadm 3.0, which is practically unavailable,
I'm stuck with 0.9.

2. Is it safe to upgrade to mdadm 3.x?

3. Is it possible to use 0xDA with a 0.9 superblock and omit autodetect
with mdadm 2.6.x? I couldn't find any information regarding this, since
most RAID-related sources either still suggest 0xFD and autodetect
(even with mdadm 3.0, by using the -e 0.9 option) or do not state which
version of mdadm to use in the case of 1.x superblocks. Since
autodetect is deprecated, is there a safe way (without losing any data)
to convert away from autodetect + 0xFD in the future?

4. (Probably a stupid question, but...) Should an extended 0x05
partition be ignored when building the RAID? This is not directly
related to mdadm, but many tutorials basically suggest something like:
for i in `seq 1 x`; do mdadm --create (...) /dev/md$i /dev/sda$i /dev/sdb$i (...); done
It's not obvious what to do if one decides to partition the drives into
many small partitions, e.g. 1TB into 20x 50GB. In that case you get 3
primary partitions and one extended partition containing (or pointing
to?) the remaining logical partitions; the extended partition shows up
as e.g. /dev/sda4, while the logical partitions appear as /dev/sda5,
/dev/sda6 etc., so the above loop would also try to create a RAID array
from the extended partitions.
It would seem more logical to lay out the logical partitions as
/dev/sda4l1 /dev/sda4l2 .... /dev/sda4l17, but udev doesn't seem to do
that. Is it safe to ignore /dev/sdX4 and just create RAIDs out of
/dev/sdX(1..3,5..20)?

5. If one decides on a partitioned approach - does mdadm kick out
faulty partitions or whole drives? I have read several sources,
including some comments on Slashdot, saying that it's much better to
split large drives into many small partitions, but no one clarified
this in detail.  A possible though unlikely scenario would be a
simultaneous failure of all HDDs in the array:

 md1 RAID6 sda1[_] sdb1[_] sdc1[U] sdd1[U] sde1[U] sdf1[U]
 md2 RAID6 sda2[U] sdb2[_] sdc2[_] sdd2[U] sde2[U] sdf2[U]
 md3 RAID6 sda3[U] sdb3[U] sdc3[_] sdd3[_] sde3[U] sdf3[U]
 md4 RAID6 sda4[U] sdb4[U] sdc4[U] sdd4[_] sde4[_] sdf4[U]
 md5 RAID6 sda5[U] sdb5[U] sdc5[U] sdd5[U] sde5[_] sdf5[_]
(...)

If mdadm kicks out only the faulty partitions, but keeps the remaining
part of the drive going as long as it is still able to read it, would
that mean that even if every single HDD in the array failed somewhere
(for example due to a rising Reallocated_Sector_Ct), mdadm would keep
the healthy partitions of each failed drive running, so the entire
system would still be running in degraded mode without loss of data?

6. Is it safe to have 20+ partitions for a RAID5/6 system? Most
RAID-related sources state that there's a limit on the number of
partitions one can have on SATA drives (AFAIK 16), but I dug up some
information about a recent patch that would remove this limitation and
which, according to another source, has also been accepted into the
mainline kernel, though I'm not sure about that.
http://thread.gmane.org/gmane.linux.kernel/701825
http://lwn.net/Articles/289927/

7. A question about the special metadata used with X58/ICH10R
controllers - since the 3.0 announcement states that the Intel Matrix
metadata format used by recent Intel ICH controllers is also supported,
I'd like to ask whether there are any instructions available on how to
use it and what benefits it would bring to the user.

8. Most RAID-related sources seem to deal with rather simple scenarios
such as RAID0 or RAID1. There are only a few brief examples available
on how to build RAID5 and none for RAID6. Does anyone know of a recent
& decent RAID6 tutorial?

thanks,

Marek


* Re: RAID6 questions
  2009-07-02 15:22 RAID6 questions Marek
@ 2009-07-02 16:23 ` Robin Hill
  2009-07-02 16:27 ` Andre Noll
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Robin Hill @ 2009-07-02 16:23 UTC (permalink / raw)
  To: linux-raid


On Thu Jul 02, 2009 at 05:22:54PM +0200, Marek wrote:

> Hi,
> 
> I'm trying to build a RAID6 array out of 6x1TB disks, and would like
> to ask the following:
> 
> 1. Is it possible to convert from 0.9 superblock to 1.x with mdadm
> 3.0? The reason is that most distributions ship with mdadm 2.6.x which
> seems to use 0.9 superblock by default. I wasn't able to find any info
> on mdadm 2.6.x using or switching to 1.x superblocks, so it seems that
> unless I'm using mdadm 3.0 which is practically unavailable, I'm stuck
> with 0.9.
> 
You can certainly use 1.x superblocks with mdadm 2.6.x (just specify the
--metadata= switch).  You can't (easily) switch superblock versions with
2.6.x though, and I've not heard anything to suggest 3.0 supports it yet
either.
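
For example, something along these lines should work with 2.6.x
(untested here; the device names, array name and the 1.1 version are
just placeholders to adapt):

    mdadm --create /dev/md0 --metadata=1.1 --level=6 --raid-devices=6 \
          /dev/sd[abcdef]1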

> 2. Is it safe to upgrade to mdadm 3.x?
> 
It certainly should be safe to use, and it's backward compatible, so I
don't see why there should be any issues with upgrading.

> 3. Is it possible to use 0xDA with 0.9 superblock and omit autodetect
> with mdadm 2.6.x? I couldn't find any information regarding this since
> most RAID related sources either still suggest 0xFD and
> autodetect(even with mdadm 3.0 by using -e 0.9 option) or they do not
> state which version of mdadm to use in case of 1.x superblocks. Since
> autodetect is deprecated, is there a safe way(without losing any data)
> to convert from autodetect + 0xFD in the future?
> 
You can use 0xDA with any superblock version.  If you're not using
autodetect then you have to make sure you're using an initrd and that it
has the correct mdadm.conf in it.  Most distros will take care of this
for you.  You can switch back and forth between autodetect and
non-autodetect whenever you like (provided you're using 0.9 metadata,
anyway).
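
As a sketch (the initrd rebuild step is distro-specific, so treat the
second command as an assumption to adapt):

    mdadm --examine --scan >> /etc/mdadm.conf
    update-initramfs -u    # Debian/Ubuntu; others use mkinitrd or similar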

> 4. (probably a stupid question but..) Should an extended 0x05
> partition be ignored on RAID build? This is not directly related to
> mdadm, but many tutorials basically suggest to
> for i in `seq 1 x`; do mdadm --create (...) /dev/md$i /dev/sda$i
> /dev/sdb$i (...)
> It's not obvious in case one decides to partition the drives into many
> small partitions e.g. 1TB into 20x 50GB, in such case he gets 3
> primary partitions and one extended containing(or pointing to?) the
> remaining logical partitions, however the extended partition shows up
> as e.g. /dev/sda4, while the logical partitions appear as /dev/sda5,
> /dev/sda6 etc., so in the above mentioned case it would basically also
> try to create a RAID array from extended partitions.
> It would seem more logical to lay out the logical partitions as
> /dev/sda4l1 /dev/sda4l2 .... /dev/sda4l17 but udev doesn't seem to do
> that. Is it safe to ignore /dev/sdX4 and just create RAIDs out of
> /dev/sdX(1..3,5..20)?
> 
I'm not sure how mdadm would cope if you passed it an extended partition
- it's certainly safest not to do so!  You could also just use a single
partitionable array instead of using logical partitions.
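
Roughly something like this (untested sketch; whole-disk members and
the md_d0 name are just one possible choice):

    mdadm --create /dev/md_d0 --auto=part --level=6 --raid-devices=6 \
          /dev/sd[abcdef]
    fdisk /dev/md_d0     # then partition the array itself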

> 5. In case one decides for a partitioned approach - does mdadm kick
> out faulty partitions or whole drives? I have read several sources
> including some comments on slashdot that it's much better to split
> large drives into many small partitions, but noone clarified in
> detail.  A possible though unlikely scenario would be simultaneous
> failure of all hdds in the array:
> 
>  md1 RAID6 sda1[_] sdb1[_] sdc1[U] sdd1[U] sde1[U] sdf1[U]
>  md2 RAID6 sda2[U] sdb2[_] sdc2[_] sdd2[U] sde2[U] sdf2[U]
>  md3 RAID6 sda3[U] sdb3[U] sdc3[_] sdd3[_] sde3[U] sdf3[U]
>  md4 RAID6 sda4[U] sdb4[U] sdc4[U] sdd4[_] sde4[_] sdf4[U]
>  md5 RAID6 sda5[U] sdb5[U] sdc5[U] sdd5[U] sde5[_] sdf5[_]
> (...)
> 
> If mdadm kicks out faulty partitions only, but leaves the remaining
> part of drive going as long as it's able to read it, would it mean
> that even if every single hdd in the array failed somewhere (for
> example due to Reallocated_Sector_Ct), mdadm would keep the healthy
> partitions of that failed drive running, thus the entire system would
> be still running in degraded mode without loss of data?
> 
This depends on the failure mode.  Drives usually deal with soft
failures themselves (by reallocating sectors), so a failure that md
actually sees tends to take out the whole drive.  In my experience,
though, md will only kick out the failed partitions.

> 6. Is it safe to have 20+ partitions for a RAID5,6 system? Most RAID
> related sources state that there's a limitation on number of
> partitions one can have on SATA drives(AFAIK 16), but i digged out
> some information about a recent patch which would remove this
> limitation and which according to some other source had also been
> accepted into mainline kernel, though I'm not sure about it.
> http://thread.gmane.org/gmane.linux.kernel/701825
> http://lwn.net/Articles/289927/
> 
If your system can handle that many partitions then md should be fine.

> 7. Question about special metadata with X58 ICH10R controllers - since
> the 3.0 announcement states that the Intel Matrix metadata format used
> by recent Intel ICH controlers is also supported, I'd like to ask if
> there's some instructions available on how to use it and what benefits
> it would bring to the user.
> 
Pass.  You'd probably be best off searching the archives of this list,
though.

> 8. Most RAID related sources seem to deal with rather simple scenarios
> such as RAID0 or RAID1. There are only a few brief examples avaliable
> on how to build RAID5 and none for RAID6. Does anyone know of any
> recent & decent RAID6 tutorial?
> 
Not that I've seen.  The process doesn't really differ between RAID
types though, and RAID5/RAID6 should take exactly the same parameters.
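
For instance, these two commands differ only in the --level argument
(device names are placeholders):

    mdadm --create /dev/md0 --level=5 --raid-devices=6 /dev/sd[abcdef]1
    mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[abcdef]1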

HTH,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: RAID6 questions
  2009-07-02 15:22 RAID6 questions Marek
  2009-07-02 16:23 ` Robin Hill
@ 2009-07-02 16:27 ` Andre Noll
  2009-07-02 16:42 ` Goswin von Brederlow
  2009-07-03  6:40 ` Luca Berra
  3 siblings, 0 replies; 9+ messages in thread
From: Andre Noll @ 2009-07-02 16:27 UTC (permalink / raw)
  To: Marek; +Cc: linux-raid, neilb


On 17:22, Marek wrote:
> it seems that unless I'm using mdadm 3.0 which is practically
> unavailable, I'm stuck with 0.9.

Nope, mdadm-2.6 supports v1.2 superblocks.

> 3. Is it possible to use 0xDA with 0.9 superblock and omit autodetect
> with mdadm 2.6.x?

Yes, unless of course you have your root partition on md and want the
kernel (rather than your initramfs scripts) to detect the md device.

> is there a safe way(without losing any data) to convert from
> autodetect + 0xFD in the future?

Yes, just change the partition types. No sane program relies on these
types anyway.
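
For example with fdisk's 't' command, or non-interactively with an
sfdisk that has --change-id (check your version; this is only a sketch):

    sfdisk --change-id /dev/sda 1 da    # flip /dev/sda1 from 0xfd to 0xda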

> 4. (probably a stupid question but..) Should an extended 0x05
> partition be ignored on RAID build? This is not directly related to
> mdadm, but many tutorials basically suggest to
> for i in `seq 1 x`; do mdadm --create (...) /dev/md$i /dev/sda$i
> /dev/sdb$i (...)

Of course you cannot have an md device on both the extended
partition and a logical partition contained within it. I'd
recommend staying away from the extended partition craziness whenever
possible. Especially since you are planning to

> partition the drives into many small partitions e.g. 1TB into 20x
> 50GB,

If you are planning to have that many devices, I'd rather use LVM on
top of md, which is much more flexible.
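
Something along these lines (the vg0/data names are made up for
illustration):

    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 50G -n data vg0
    mkfs.ext3 /dev/vg0/data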

> does mdadm kick out faulty partitions or whole drives?

It kicks out whichever component device is failing, so if the md is made
from partitions only, just the faulty partition would be kicked out.

> I have read several sources including some comments on slashdot that
> it's much better to split large drives into many small partitions, but
> noone clarified in detail.

Yeah, and if you use emacs rather than vi, your disks won't fail at
all. ;)

> If mdadm kicks out faulty partitions only, but leaves the remaining
> part of drive going as long as it's able to read it, would it mean
> that even if every single hdd in the array failed somewhere (for
> example due to Reallocated_Sector_Ct), mdadm would keep the healthy
> partitions of that failed drive running, thus the entire system would
> be still running in degraded mode without loss of data?

True. It's up to you to estimate the likelihood of this scenario.
Usually, if a disk starts to fail, it will soon return errors for
the other partitions as well. Also, you should be aware of the fact
that on a read error md tries to re-write the bad sector with (valid)
data reconstructed from the remaining good drives. So md will "fix"
the read error if the drive can remap the bad sector.
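
You can also trigger such a pass proactively via sysfs (paths assume
the array is md0):

    echo check > /sys/block/md0/md/sync_action   # read everything; read errors get rewritten from parity
    cat /sys/block/md0/md/mismatch_cnt           # blocks found inconsistent during the pass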

> 6. Is it safe to have 20+ partitions for a RAID5,6 system?

Yes, as the number of partitions is not as critical as the number of
component devices. The latter is bounded by 26 for raid6 and v0.90
superblocks IIRC.

> Most RAID related sources state that there's a limitation on number of
> partitions one can have on SATA drives(AFAIK 16)

This limitation is not imposed by the disk, but by the type of the
partition table.

> 8. Most RAID related sources seem to deal with rather simple scenarios
> such as RAID0 or RAID1. There are only a few brief examples avaliable
> on how to build RAID5 and none for RAID6. Does anyone know of any
> recent & decent RAID6 tutorial?

At least for md, creating and using a raid5/raid6 array is not much
different from the raid0/raid1 case. If you want to understand the
algorithm behind raid6, I'd recommend reading hpa's paper [1].

Regards
Andre

[1] http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: RAID6 questions
  2009-07-02 15:22 RAID6 questions Marek
  2009-07-02 16:23 ` Robin Hill
  2009-07-02 16:27 ` Andre Noll
@ 2009-07-02 16:42 ` Goswin von Brederlow
  2009-07-02 16:53   ` Doug Ledford
  2009-07-02 22:13   ` Greg Freemyer
  2009-07-03  6:40 ` Luca Berra
  3 siblings, 2 replies; 9+ messages in thread
From: Goswin von Brederlow @ 2009-07-02 16:42 UTC (permalink / raw)
  To: Marek; +Cc: linux-raid, neilb

Marek <mlf.conv@gmail.com> writes:

> Hi,
>
> I'm trying to build a RAID6 array out of 6x1TB disks, and would like
> to ask the following:
>
> 1. Is it possible to convert from 0.9 superblock to 1.x with mdadm
> 3.0? The reason is that most distributions ship with mdadm 2.6.x which
> seems to use 0.9 superblock by default. I wasn't able to find any info
> on mdadm 2.6.x using or switching to 1.x superblocks, so it seems that
> unless I'm using mdadm 3.0 which is practically unavailable, I'm stuck
> with 0.9.
>
> 2. Is it safe to upgrade to mdadm 3.x?
>
> 3. Is it possible to use 0xDA with 0.9 superblock and omit autodetect
> with mdadm 2.6.x? I couldn't find any information regarding this since
> most RAID related sources either still suggest 0xFD and
> autodetect(even with mdadm 3.0 by using -e 0.9 option) or they do not
> state which version of mdadm to use in case of 1.x superblocks. Since
> autodetect is deprecated, is there a safe way(without losing any data)
> to convert from autodetect + 0xFD in the future?

If you have the RAID code built as a module then the kernel does no
autodetection. Otherwise you can pass a kernel command-line option; see
the docs.
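
For reference, the relevant boot options from Documentation/md.txt are
roughly these (double-check against your kernel's copy of the docs):

    raid=noautodetect            # disable 0xFD partition autodetection
    md=0,/dev/sda1,/dev/sdb1     # assemble md0 from the listed devices at boot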

> 4. (probably a stupid question but..) Should an extended 0x05
> partition be ignored on RAID build? This is not directly related to
> mdadm, but many tutorials basically suggest to
> for i in `seq 1 x`; do mdadm --create (...) /dev/md$i /dev/sda$i
> /dev/sdb$i (...)
> It's not obvious in case one decides to partition the drives into many
> small partitions e.g. 1TB into 20x 50GB, in such case he gets 3
> primary partitions and one extended containing(or pointing to?) the
> remaining logical partitions, however the extended partition shows up
> as e.g. /dev/sda4, while the logical partitions appear as /dev/sda5,
> /dev/sda6 etc., so in the above mentioned case it would basically also
> try to create a RAID array from extended partitions.
> It would seem more logical to lay out the logical partitions as
> /dev/sda4l1 /dev/sda4l2 .... /dev/sda4l17 but udev doesn't seem to do
> that. Is it safe to ignore /dev/sdX4 and just create RAIDs out of
> /dev/sdX(1..3,5..20)?

Obviously you need to skip the extended partition. I also see no reason
to create multiple raid6 arrays over partitions on the same drives.
Create one big raid6 and use LVM or partitioning on top of that.

> 5. In case one decides for a partitioned approach - does mdadm kick
> out faulty partitions or whole drives? I have read several sources
> including some comments on slashdot that it's much better to split
> large drives into many small partitions, but noone clarified in
> detail.  A possible though unlikely scenario would be simultaneous
> failure of all hdds in the array:
>
>  md1 RAID6 sda1[_] sdb1[_] sdc1[U] sdd1[U] sde1[U] sdf1[U]
>  md2 RAID6 sda2[U] sdb2[_] sdc2[_] sdd2[U] sde2[U] sdf2[U]
>  md3 RAID6 sda3[U] sdb3[U] sdc3[_] sdd3[_] sde3[U] sdf3[U]
>  md4 RAID6 sda4[U] sdb4[U] sdc4[U] sdd4[_] sde4[_] sdf4[U]
>  md5 RAID6 sda5[U] sdb5[U] sdc5[U] sdd5[U] sde5[_] sdf5[_]
> (...)
>
> If mdadm kicks out faulty partitions only, but leaves the remaining
> part of drive going as long as it's able to read it, would it mean
> that even if every single hdd in the array failed somewhere (for
> example due to Reallocated_Sector_Ct), mdadm would keep the healthy
> partitions of that failed drive running, thus the entire system would
> be still running in degraded mode without loss of data?

The raid code kicks out one partition at a time when it gets errors. But
that means there must first be an access to the partition for the kernel
to notice that it returns errors. So even if sda fails completely, only
the arrays you actually access will notice it and fail their sdaX
component.

In case of read errors the raid code also tries to restore the block
using the parity data and rewrite it, so the drive can remap it to a
healthy sector.

> 6. Is it safe to have 20+ partitions for a RAID5,6 system? Most RAID
> related sources state that there's a limitation on number of
> partitions one can have on SATA drives(AFAIK 16), but i digged out
> some information about a recent patch which would remove this
> limitation and which according to some other source had also been
> accepted into mainline kernel, though I'm not sure about it.
> http://thread.gmane.org/gmane.linux.kernel/701825
> http://lwn.net/Articles/289927/

It should be 15 or unlimited. Look at the major/minor numbers of sda*
and sdb: after sda15 there is no space left before sdb starts. So unless
sda16 gets a dynamic major/minor, it can't be accessed.

It certainly is safe. But it seems stupid as well.

> 7. Question about special metadata with X58 ICH10R controllers - since
> the 3.0 announcement states that the Intel Matrix metadata format used
> by recent Intel ICH controlers is also supported, I'd like to ask if
> there's some instructions available on how to use it and what benefits
> it would bring to the user.
>
> 8. Most RAID related sources seem to deal with rather simple scenarios
> such as RAID0 or RAID1. There are only a few brief examples avaliable
> on how to build RAID5 and none for RAID6. Does anyone know of any
> recent & decent RAID6 tutorial?

I don't see how the raid level is really relevant here, especially
between raid5 and raid6. Raid6 just protects against 2 drives failing;
nothing changes in how you set it up or maintain it.

> thanks,
>
> Marek

MfG
        Goswin


* Re: RAID6 questions
  2009-07-02 16:42 ` Goswin von Brederlow
@ 2009-07-02 16:53   ` Doug Ledford
  2009-07-02 22:13   ` Greg Freemyer
  1 sibling, 0 replies; 9+ messages in thread
From: Doug Ledford @ 2009-07-02 16:53 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Marek, linux-raid, neilb


On Jul 2, 2009, at 12:42 PM, Goswin von Brederlow wrote:
> Marek <mlf.conv@gmail.com> writes:
>>
>> 3. Is it possible to use 0xDA with 0.9 superblock and omit autodetect
>> with mdadm 2.6.x? I couldn't find any information regarding this  
>> since
>> most RAID related sources either still suggest 0xFD and
>> autodetect(even with mdadm 3.0 by using -e 0.9 option) or they do not
>> state which version of mdadm to use in case of 1.x superblocks. Since
>> autodetect is deprecated, is there a safe way(without losing any  
>> data)
>> to convert from autodetect + 0xFD in the future?
>
> If you have raid build as module then the kernel does no
> autodetect. Otherwise you can give some kernel commandline option, see
> docs.

Most (if not all) sane distributions have not used raid autodetect in  
quite a while.  They use mdadm in the initrd and later during the init  
script sequence to start raid arrays instead.  Mdadm doesn't care what  
the partition type is.  I regularly use 0x83 for older raid devices,  
and 0xda on newer ones, with no ill effects.
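
The type doesn't matter because mdadm identifies arrays by their
superblocks (UUIDs), not by partition IDs.  A minimal mdadm.conf sketch
(the UUID is a placeholder, filled in from mdadm --detail --scan):

    DEVICE partitions
    ARRAY /dev/md0 UUID=<uuid from mdadm --detail --scan>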


--

Doug Ledford <dledford@redhat.com>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband







* Re: RAID6 questions
  2009-07-02 16:42 ` Goswin von Brederlow
  2009-07-02 16:53   ` Doug Ledford
@ 2009-07-02 22:13   ` Greg Freemyer
  2009-07-02 22:57     ` Goswin von Brederlow
  1 sibling, 1 reply; 9+ messages in thread
From: Greg Freemyer @ 2009-07-02 22:13 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Marek, linux-raid, neilb

On Thu, Jul 2, 2009 at 12:42 PM, Goswin von Brederlow<goswin-v-b@web.de> wrote:
> Marek <mlf.conv@gmail.com> writes:
<snip>
>> 6. Is it safe to have 20+ partitions for a RAID5,6 system? Most RAID
>> related sources state that there's a limitation on number of
>> partitions one can have on SATA drives(AFAIK 16), but i digged out
>> some information about a recent patch which would remove this
>> limitation and which according to some other source had also been
>> accepted into mainline kernel, though I'm not sure about it.
>> http://thread.gmane.org/gmane.linux.kernel/701825
>> http://lwn.net/Articles/289927/
>
> Should be 15 or unlimited. Look at the major/minor numbers of sda* and
> sdb. After sda15 there is no space before sdb comes. So unless sda16
> gets a dynamic major/minor it can't be accessed.
>
> It certainly is safe. But it seems stupid as well.

That patch went into 2.6.29, I'm pretty sure.  Not that I have ever
needed more than 15 partitions on one drive.

And yes, I believe the major/minor numbers after the first 15 are now
dynamic.

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: RAID6 questions
  2009-07-02 22:13   ` Greg Freemyer
@ 2009-07-02 22:57     ` Goswin von Brederlow
  0 siblings, 0 replies; 9+ messages in thread
From: Goswin von Brederlow @ 2009-07-02 22:57 UTC (permalink / raw)
  To: Greg Freemyer; +Cc: Goswin von Brederlow, Marek, linux-raid, neilb

Greg Freemyer <greg.freemyer@gmail.com> writes:

> On Thu, Jul 2, 2009 at 12:42 PM, Goswin von Brederlow<goswin-v-b@web.de> wrote:
>> Marek <mlf.conv@gmail.com> writes:
> <snip>
>>> 6. Is it safe to have 20+ partitions for a RAID5,6 system? Most RAID
>>> related sources state that there's a limitation on number of
>>> partitions one can have on SATA drives(AFAIK 16), but i digged out
>>> some information about a recent patch which would remove this
>>> limitation and which according to some other source had also been
>>> accepted into mainline kernel, though I'm not sure about it.
>>> http://thread.gmane.org/gmane.linux.kernel/701825
>>> http://lwn.net/Articles/289927/
>>
>> Should be 15 or unlimited. Look at the major/minor numbers of sda* and
>> sdb. After sda15 there is no space before sdb comes. So unless sda16
>> gets a dynamic major/minor it can't be accessed.
>>
>> It certainly is safe. But it seems stupid as well.
>
> That patch went in 2.6.29 I'm pretty sure.  Not that I have ever
> needed more than 15 partitions on one drive.
>
> And yes major/minor after the first 15 are now dynamic I believe.
>
> Greg

In case someone misunderstands: the "It certainly is safe. But it
seems stupid as well." refers to creating 20+ raid6 arrays, not to the
major/minor problem. :)

MfG
        Goswin


* Re: RAID6 questions
  2009-07-02 15:22 RAID6 questions Marek
                   ` (2 preceding siblings ...)
  2009-07-02 16:42 ` Goswin von Brederlow
@ 2009-07-03  6:40 ` Luca Berra
  2009-07-03  8:24   ` Goswin von Brederlow
  3 siblings, 1 reply; 9+ messages in thread
From: Luca Berra @ 2009-07-03  6:40 UTC (permalink / raw)
  To: linux-raid

On Thu, Jul 02, 2009 at 05:22:54PM +0200, Marek wrote:
>5. In case one decides for a partitioned approach - does mdadm kick
>out faulty partitions or whole drives? I have read several sources
>including some comments on slashdot that it's much better to split
>large drives into many small partitions, but noone clarified in
maybe because those suggesting this are not able to come up with a
reasonable explanation?
>detail.  A possible though unlikely scenario would be simultaneous
>failure of all hdds in the array:
>
> md1 RAID6 sda1[_] sdb1[_] sdc1[U] sdd1[U] sde1[U] sdf1[U]
> md2 RAID6 sda2[U] sdb2[_] sdc2[_] sdd2[U] sde2[U] sdf2[U]
> md3 RAID6 sda3[U] sdb3[U] sdc3[_] sdd3[_] sde3[U] sdf3[U]
> md4 RAID6 sda4[U] sdb4[U] sdc4[U] sdd4[_] sde4[_] sdf4[U]
> md5 RAID6 sda5[U] sdb5[U] sdc5[U] sdd5[U] sde5[_] sdf5[_]
>(...)
>
>If mdadm kicks out faulty partitions only, but leaves the remaining
>part of drive going as long as it's able to read it, would it mean
>that even if every single hdd in the array failed somewhere (for
>example due to Reallocated_Sector_Ct), mdadm would keep the healthy
>partitions of that failed drive running, thus the entire system would
>be still running in degraded mode without loss of data?
This really depends on your priorities; I would have replaced my drives
well before getting into a similar situation.
The only reason I can imagine for splitting a disk into many partitions
and raiding them together is avoiding lengthy rebuilds when a single
drive is kicked from an array due to a correctable read error.
In practice the above scenario should not happen any more, since md will
retry writing the stripe if it gets a read error.  Besides, you are
planning on using raid6, so a single drive failure will still leave you
with a nice degree of protection.

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \


* Re: RAID6 questions
  2009-07-03  6:40 ` Luca Berra
@ 2009-07-03  8:24   ` Goswin von Brederlow
  0 siblings, 0 replies; 9+ messages in thread
From: Goswin von Brederlow @ 2009-07-03  8:24 UTC (permalink / raw)
  To: linux-raid

Luca Berra <bluca@comedia.it> writes:

> On Thu, Jul 02, 2009 at 05:22:54PM +0200, Marek wrote:
>>5. In case one decides for a partitioned approach - does mdadm kick
>>out faulty partitions or whole drives? I have read several sources
>>including some comments on slashdot that it's much better to split
>>large drives into many small partitions, but noone clarified in
> maybe because those suggesting this are not able to come up with a
> reasonable explanation?
>>detail.  A possible though unlikely scenario would be simultaneous
>>failure of all hdds in the array:
>>
>> md1 RAID6 sda1[_] sdb1[_] sdc1[U] sdd1[U] sde1[U] sdf1[U]
>> md2 RAID6 sda2[U] sdb2[_] sdc2[_] sdd2[U] sde2[U] sdf2[U]
>> md3 RAID6 sda3[U] sdb3[U] sdc3[_] sdd3[_] sde3[U] sdf3[U]
>> md4 RAID6 sda4[U] sdb4[U] sdc4[U] sdd4[_] sde4[_] sdf4[U]
>> md5 RAID6 sda5[U] sdb5[U] sdc5[U] sdd5[U] sde5[_] sdf5[_]
>>(...)
>>
>>If mdadm kicks out faulty partitions only, but leaves the remaining
>>part of drive going as long as it's able to read it, would it mean
>>that even if every single hdd in the array failed somewhere (for
>>example due to Reallocated_Sector_Ct), mdadm would keep the healthy
>>partitions of that failed drive running, thus the entire system would
>>be still running in degraded mode without loss of data?
> This really depends on your priorities, i would have replaced my drives
> well in advance of a similar situation.
> The only reason i can imagine for splitting a disk into many partitions
> and raiding them together is avoiding lenghty rebuilds when a single
> drive is kicked from an array due to a correctable read error.
> In practice the above scenario should not happen anymore, since md will
> retry writing a stripe if it gets a read-error, besides you are planning
> on using raid6, so a single drive failure will still leave you with a
> nice degree of protection.
>
> Regards,
> L.

Plus, write-intent bitmaps do that (avoid lengthy rebuilds) much better.
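
E.g. something like this on an existing array (md0 is just a
placeholder; untested):

    mdadm --grow --bitmap=internal /dev/md0   # add a write-intent bitmap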

MfG
        Goswin

