* How many drives are bad?
@ 2008-02-19 17:23 Norman Elton
2008-02-19 17:31 ` Justin Piszcz
2008-02-21 4:28 ` Neil Brown
0 siblings, 2 replies; 15+ messages in thread
From: Norman Elton @ 2008-02-19 17:23 UTC (permalink / raw)
To: linux-raid
So I had my first "failure" today, when I got a report that one drive
(/dev/sdam) failed. I've attached the output of "mdadm --detail". It
appears that two drives are listed as "removed", but the array is
still functioning. What does this mean? How many drives actually
failed?
This is all a test system, so I can dink around as much as necessary.
Thanks for any advice!
Norman Elton
====== OUTPUT OF MDADM =====
Version : 00.90.03
Creation Time : Fri Jan 18 13:17:33 2008
Raid Level : raid5
Array Size : 6837319552 (6520.58 GiB 7001.42 GB)
Device Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 8
Total Devices : 7
Preferred Minor : 4
Persistence : Superblock is persistent
Update Time : Mon Feb 18 11:49:13 2008
State : clean, degraded
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : b16bdcaf:a20192fb:39c74cb8:e5e60b20
Events : 0.110
Number Major Minor RaidDevice State
0 66 1 0 active sync /dev/sdag1
1 66 17 1 active sync /dev/sdah1
2 66 33 2 active sync /dev/sdai1
3 66 49 3 active sync /dev/sdaj1
4 66 65 4 active sync /dev/sdak1
5 0 0 5 removed
6 0 0 6 removed
7 66 113 7 active sync /dev/sdan1
8 66 97 - faulty spare /dev/sdam1
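
For anyone following along, the usual way to pin down which members actually dropped is to compare the kernel's view with each member's superblock. A minimal sketch, assuming the array is /dev/md4 (suggested by the "Preferred Minor : 4" line) and the sdag1..sdan1 members listed above; comparing the Events counters usually shows which disk fell out first:

  # Sketch: check what the kernel and each member superblock think the state is.
  # Assumes the array is /dev/md4; member names taken from the table above.
  cat /proc/mdstat                      # kernel's view, e.g. [UUUUU__U]
  for d in /dev/sda[g-n]1; do
      echo "== $d"
      mdadm --examine "$d" | egrep 'this|Update Time|Events'
  done
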
* Re: How many drives are bad?
  2008-02-19 17:23 How many drives are bad? Norman Elton
@ 2008-02-19 17:31 ` Justin Piszcz
  2008-02-19 18:24   ` Norman Elton
  2008-02-21  4:28 ` Neil Brown
  1 sibling, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2008-02-19 17:31 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid

How many drives actually failed?

> Failed Devices : 1

On Tue, 19 Feb 2008, Norman Elton wrote:

> So I had my first "failure" today, when I got a report that one drive
> (/dev/sdam) failed. I've attached the output of "mdadm --detail".
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 17:31 ` Justin Piszcz
@ 2008-02-19 18:24   ` Norman Elton
  2008-02-19 18:33     ` Justin Piszcz
  0 siblings, 1 reply; 15+ messages in thread
From: Norman Elton @ 2008-02-19 18:24 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

But why do two show up as "removed"?? I would expect /dev/sdal1 to show
up someplace, either active or failed.

Any ideas?

Thanks,

Norman

On Feb 19, 2008, at 12:31 PM, Justin Piszcz wrote:

> How many drives actually failed?
>> Failed Devices : 1
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 18:24 ` Norman Elton
@ 2008-02-19 18:33   ` Justin Piszcz
  2008-02-19 18:38     ` Norman Elton
  0 siblings, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2008-02-19 18:33 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid

Neil,

Is this a bug?

Also, I have a question for Norman -- how come your drives are
sda[a-z]1? Typically it is /dev/sda1, /dev/sdb1, etc.

Justin.

On Tue, 19 Feb 2008, Norman Elton wrote:

> But why do two show up as "removed"?? I would expect /dev/sdal1 to show
> up someplace, either active or failed.
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 18:33 ` Justin Piszcz
@ 2008-02-19 18:38   ` Norman Elton
  2008-02-19 19:13     ` Justin Piszcz
  0 siblings, 1 reply; 15+ messages in thread
From: Norman Elton @ 2008-02-19 18:38 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

Justin,

This is a Sun X4500 (Thumper) box, so it's got 48 drives inside.
/dev/sd[a-z] are all there as well, just in other RAID sets. Once you
get past /dev/sdz, the names continue at /dev/sdaa, sdab, etc.

I'd be curious whether what I'm experiencing is a bug. What should I try
in order to restore the array?

Norman

On 2/19/08, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Neil,
>
> Is this a bug?
>
> Also, I have a question for Norman -- how come your drives are
> sda[a-z]1? Typically it is /dev/sda1, /dev/sdb1, etc.
>
> [ ... ]

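
For readers puzzled by that naming: once the kernel runs out of single letters at /dev/sdz it moves to two-letter suffixes, so 48 disks enumerate as sda..sdz followed by sdaa..sdav. A throwaway sketch of the sequence, purely illustrative:

  # Sketch: print the sd names assigned to disks 0..47 (sda..sdz, sdaa..sdav).
  letters=(a b c d e f g h i j k l m n o p q r s t u v w x y z)
  for i in $(seq 0 47); do
      if [ "$i" -lt 26 ]; then
          echo "sd${letters[i]}"
      else
          echo "sda${letters[i-26]}"
      fi
  done
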
* Re: How many drives are bad?
  2008-02-19 18:38 ` Norman Elton
@ 2008-02-19 19:13   ` Justin Piszcz
  2008-02-19 19:25     ` Norman Elton
  0 siblings, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2008-02-19 19:13 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid, Alan Piszcz

Norman,

I am extremely interested in what distribution you are running on it and
what type of SW RAID you are employing (besides the one you showed here).
Are all 48 drives filled, or?

Justin.

On Tue, 19 Feb 2008, Norman Elton wrote:

> This is a Sun X4500 (Thumper) box, so it's got 48 drives inside.
> /dev/sd[a-z] are all there as well, just in other RAID sets.
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 19:13 ` Justin Piszcz
@ 2008-02-19 19:25   ` Norman Elton
  2008-02-19 19:44     ` Steve Fairbairn
  2008-02-20  7:21     ` Peter Grandi
  0 siblings, 2 replies; 15+ messages in thread
From: Norman Elton @ 2008-02-19 19:25 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid, Alan Piszcz

Justin,

There was actually a discussion I fired off a few weeks ago about how to
best run SW RAID on this hardware. Here's the recap:

We're running RHEL, so no access to ZFS/XFS. I really wish we could do
ZFS, but no luck.

The box presents 48 drives, split across 6 SATA controllers. So disks
sda-sdh are on one controller, etc. In our configuration, I run a RAID5
MD array for each controller, then run LVM on top of these to form one
large VolGroup.

I found that it was easiest to set up ext3 with a max of 2TB partitions.
So running on top of the massive LVM VolGroup are a handful of ext3
partitions, each mounted in the filesystem. This is less than ideal (ZFS
would allow us one large partition), but we're rewriting some software
to utilize the multi-partition scheme.

In this setup, we should be fairly protected against drive failure. We
are vulnerable to a controller failure. If such a failure occurred, we'd
have to restore from backup.

Hope this helps, let me know if you have any questions or suggestions.
I'm certainly no expert here!

Thanks,

Norman

On 2/19/08, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Norman,
>
> I am extremely interested in what distribution you are running on it and
> what type of SW RAID you are employing (besides the one you showed here).
> Are all 48 drives filled, or?
>
> [ ... ]

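
A rough sketch of the layout described above: one RAID5 per 8-disk controller, LVM across the six arrays, and sub-2TB ext3 volumes on top. The device names, array numbers, volume sizes, and mount point are illustrative; the actual commands used are not given in the thread:

  # Sketch: one RAID5 per controller, LVM spanning the six arrays.
  mdadm --create /dev/md0 --level=5 --raid-devices=8 /dev/sd[a-h]1
  mdadm --create /dev/md1 --level=5 --raid-devices=8 /dev/sd[i-p]1
  # ... md2..md5 for the remaining four controllers ...

  pvcreate /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5
  vgcreate bigvg /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5

  # Carve out 2TB logical volumes and put ext3 on each.
  lvcreate -L 2T -n data01 bigvg
  mkfs.ext3 /dev/bigvg/data01
  mount /dev/bigvg/data01 /srv/data01
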
* RE: How many drives are bad?
  2008-02-19 19:25 ` Norman Elton
@ 2008-02-19 19:44   ` Steve Fairbairn
  2008-02-20  0:22     ` Guy Watkins
  1 sibling, 1 reply; 15+ messages in thread
From: Steve Fairbairn @ 2008-02-19 19:44 UTC (permalink / raw)
To: 'Norman Elton'; +Cc: linux-raid

> The box presents 48 drives, split across 6 SATA controllers.
> So disks sda-sdh are on one controller, etc. In our
> configuration, I run a RAID5 MD array for each controller,
> then run LVM on top of these to form one large VolGroup.

I might be missing something here, and I realise you'd lose 8 drives to
redundancy rather than 6, but wouldn't it have been better to have 8
arrays of 6 drives, each array using a single drive from each
controller? That way a single controller failure (assuming no other HD
failures) wouldn't actually take any array down? I do realise that 2
controller failures at the same time would lose everything.

Steve.

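
A sketch of the alternative Steve describes: eight 6-disk arrays, each taking one disk from every controller, so any one controller can fail without degrading an array past recovery. The names assume the sda-sdh / sdi-sdp / ... controller mapping mentioned earlier and are illustrative only:

  # Sketch: eight 6-disk RAID5 arrays, one member per controller each.
  mdadm --create /dev/md10 --level=5 --raid-devices=6 \
        /dev/sda1 /dev/sdi1 /dev/sdq1 /dev/sdy1 /dev/sdag1 /dev/sdao1
  mdadm --create /dev/md11 --level=5 --raid-devices=6 \
        /dev/sdb1 /dev/sdj1 /dev/sdr1 /dev/sdz1 /dev/sdah1 /dev/sdap1
  # ... and so on through /dev/md17, shifting one drive letter each time.
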
* RE: How many drives are bad?
  2008-02-19 19:44 ` Steve Fairbairn
@ 2008-02-20  0:22   ` Guy Watkins
  0 siblings, 0 replies; 15+ messages in thread
From: Guy Watkins @ 2008-02-20 0:22 UTC (permalink / raw)
To: 'Steve Fairbairn', 'Norman Elton'; +Cc: linux-raid

} I might be missing something here, and I realise you'd lose 8 drives to
} redundancy rather than 6, but wouldn't it have been better to have 8
} arrays of 6 drives, each array using a single drive from each
} controller? That way a single controller failure (assuming no other HD
} failures) wouldn't actually take any array down? I do realise that 2
} controller failures at the same time would lose everything.

Wow. Sounds like what I said a few months ago. I think I also
recommended RAID6.

Guy

* Re: How many drives are bad?
  2008-02-19 19:25 ` Norman Elton
  2008-02-19 19:44 ` Steve Fairbairn
@ 2008-02-20  7:21   ` Peter Grandi
  2008-02-21 18:12     ` Norman Elton
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Grandi @ 2008-02-20 7:21 UTC (permalink / raw)
To: Linux RAID

>>> On Tue, 19 Feb 2008 14:25:28 -0500, "Norman Elton"
>>> <normelton@gmail.com> said:

[ ... ]

normelton> The box presents 48 drives, split across 6 SATA
normelton> controllers. So disks sda-sdh are on one controller,
normelton> etc. In our configuration, I run a RAID5 MD array for
normelton> each controller, then run LVM on top of these to form
normelton> one large VolGroup.

Pure genius! I wonder how many Thumpers have been configured in
this well thought out way :-).

BTW, just to be sure -- you are running LVM in default linear
mode over those 6 RAID5s, aren't you?

normelton> I found that it was easiest to set up ext3 with a max
normelton> of 2TB partitions. So running on top of the massive
normelton> LVM VolGroup are a handful of ext3 partitions, each
normelton> mounted in the filesystem.

Uhm, assuming 500GB drives each RAID set has a capacity of
3.5TB, and odds are that a bit over half of those 2TB volumes
will straddle array boundaries. Such attention to detail is
quite remarkable :-).

normelton> This is less than ideal (ZFS would allow us one large
normelton> partition),

That would be another stroke of genius! (especially if you were
still using a set of underlying RAID5s instead of letting ZFS do
its RAIDZ thing). :-)

normelton> but we're rewriting some software to utilize the
normelton> multi-partition scheme.

Good luck!

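
For reference, the arithmetic behind that estimate: an N-disk RAID5 stores N-1 disks' worth of data, so 8 x 500GB gives 7 x 500GB = 3.5TB. The same formula matches the mdadm output at the top of the thread:

  # RAID5 usable capacity = (raid devices - 1) * device size.
  # For the array in the original post (8 members of 976759936 KiB each):
  echo $(( 7 * 976759936 ))     # 6837319552 KiB, the reported "Array Size"
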
* Re: How many drives are bad?
  2008-02-20  7:21 ` Peter Grandi
@ 2008-02-21 18:12   ` Norman Elton
  2008-02-21 20:54     ` pg_mh, Peter Grandi
  0 siblings, 1 reply; 15+ messages in thread
From: Norman Elton @ 2008-02-21 18:12 UTC (permalink / raw)
To: Linux RAID

> Pure genius! I wonder how many Thumpers have been configured in
> this well thought out way :-).

I'm sorry I missed your contributions to the discussion a few weeks ago.
As I said up front, this is a test system. We're still trying a number
of different configurations, and are learning how best to recover from a
fault. Guy Watkins proposed one a few weeks ago that we haven't yet
tried, but given our current situation... it may be a good time to give
it a shot.

I'm still not convinced we were running a degraded array before this.
One drive mysteriously dropped from the array, showing up as "removed"
but not failed. We did not receive the notification that we did when the
second actually failed. I'm still thinking it's just one drive that
actually failed.

Assuming we go with Guy's layout of 8 arrays of 6 drives (picking one
from each controller), how would you set up the LVM VolGroups on top of
these already distributed arrays?

Thanks again,

Norman

On Feb 20, 2008, at 2:21 AM, Peter Grandi wrote:

> [ ... ]

* Re: How many drives are bad?
  2008-02-21 18:12 ` Norman Elton
@ 2008-02-21 20:54   ` pg_mh, Peter Grandi
  2008-02-21 21:45     ` Peter Rabbitson
  0 siblings, 1 reply; 15+ messages in thread
From: pg_mh, Peter Grandi @ 2008-02-21 20:54 UTC (permalink / raw)
To: Linux RAID

>>> On Thu, 21 Feb 2008 13:12:30 -0500, Norman Elton
>>> <normelton@gmail.com> said:

[ ... ]

normelton> Assuming we go with Guy's layout of 8 arrays of 6
normelton> drives (picking one from each controller),

Guy Watkins proposed another one too:

  «Assuming the 6 controllers are equal, I would make 3 16 disk RAID6
   arrays using 2 disks from each controller. That way any 1 controller
   can fail and your system will still be running. 6 disks will be used
   for redundancy.

   Or 6 8 disk RAID6 arrays using 1 disk from each controller. That way
   any 2 controllers can fail and your system will still be running. 12
   disks will be used for redundancy. Might be too excessive!»

So, I would not be overjoyed with either physical configuration, except
in a few particular cases. It is very amusing to read such worries about
host adapter failures, and somewhat depressing to see "too excessive"
used to describe 4+2 parity RAID.

normelton> how would you set up the LVM VolGroups on top of
normelton> these already distributed arrays?

That looks like a trick question, or at least an incorrect question,
because I would rather not do anything like that except in a very few
cases. However, if one wants to do a bad thing in the least bad way,
perhaps a volume group per array would be least bad.

Going back to your original question:

  «So... we're curious how Linux will handle such a beast. Has anyone
   run MD software RAID over so many disks? Then piled LVM/ext3 on top
   of that?»

I haven't, because it sounds rather inappropriate to me.

  «Any suggestions?»

Not easy to respond without a clear statement of what the array will be
used for: RAID levels and file systems are very anisotropic in both
performance and resilience, so a particular configuration may be very
good for something but not for something else. For example a 48 drive
RAID0 with 'ext2' on top would be very good for some cases, but perhaps
not for archival :-).

In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in very few
cases, and RAID6 almost never.

In general, current storage practices do not handle large single-computer
storage pools well (just consider 'fsck' times), and beyond 10TB I reckon
that currently only multi-host parallel/cluster file systems are good
enough, for example Lustre (for smaller multi-TB filesystems I'd use JFS
or XFS). But then Lustre can also be used on a single machine with
multiple (say 2TB) block devices, and this may be the best choice here
too if a single virtual filesystem is the goal:

  http://wiki.Lustre.org/index.php?title=Lustre_Howto

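
A minimal sketch of the "volume group per array" arrangement, so that losing one array only takes out the logical volumes carved from its own VG. The array count follows the 8-array layout being discussed; names, sizes, and the filesystem choice are illustrative only:

  # Sketch: one volume group (and its own logical volumes) per md array.
  for n in 0 1 2 3 4 5 6 7; do
      pvcreate /dev/md$n
      vgcreate vg$n /dev/md$n
      lvcreate -L 500G -n data vg$n
      mkfs.ext3 /dev/vg$n/data    # or JFS/XFS where available, per the advice above
  done
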
* Re: How many drives are bad?
  2008-02-21 20:54 ` pg_mh, Peter Grandi
@ 2008-02-21 21:45   ` Peter Rabbitson
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Rabbitson @ 2008-02-21 21:45 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux RAID

Peter Grandi wrote:
> In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in

Interesting movement. What do you think is their stance on Raid Fix? :)

* Re: How many drives are bad?
  2008-02-19 17:23 How many drives are bad? Norman Elton
  2008-02-19 17:31 ` Justin Piszcz
@ 2008-02-21  4:28 ` Neil Brown
  1 sibling, 0 replies; 15+ messages in thread
From: Neil Brown @ 2008-02-21 4:28 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid

On Tuesday February 19, normelton@gmail.com wrote:
> So I had my first "failure" today, when I got a report that one drive
> (/dev/sdam) failed. I've attached the output of "mdadm --detail". It
> appears that two drives are listed as "removed", but the array is
> still functioning. What does this mean? How many drives actually
> failed?

The array is configured for 8 devices, but only 6 are active. So you
have lost data.

Of the two missing devices, one is still in the array and is marked as
faulty. One is simply not present at all. Hence "Failed Devices : 1",
i.e. there is one failed device in the array.

It looks like you have been running a degraded array for a while (maybe
not a long while) and the device has then failed.

"mdadm --monitor" will send you mail if you have a degraded array.

NeilBrown

> [ ... ]

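
For reference, a minimal way to get that mail (a sketch; most distributions ship an init script or an mdadm.conf entry that does the equivalent):

  # Sketch: watch all arrays and mail alerts on failure/degradation.
  # Daemonizes and polls the arrays; the address and interval are illustrative.
  mdadm --monitor --scan --daemonise --mail=root@localhost --delay=300
  # Equivalently, set a MAILADDR line in /etc/mdadm.conf and let the
  # distribution's mdadm monitor service pick it up.
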
* RE: How many drives are bad?
@ 2008-02-20  4:03 Guy Watkins
  0 siblings, 0 replies; 15+ messages in thread
From: Guy Watkins @ 2008-02-20 4:03 UTC (permalink / raw)
To: 'Guy Watkins', 'Steve Fairbairn', 'Norman Elton'
Cc: linux-raid

} I might be missing something here, and I realise you'd lose 8 drives to
} redundancy rather than 6, but wouldn't it have been better to have 8
} arrays of 6 drives, each array using a single drive from each
} controller? That way a single controller failure (assuming no other HD
} failures) wouldn't actually take any array down? I do realise that 2
} controller failures at the same time would lose everything.

Wow. Sounds like what I said a few months ago. I think I also
recommended RAID6.

Guy

end of thread, other threads: [~2008-02-21 21:45 UTC | newest]

Thread overview: 15+ messages
2008-02-19 17:23 How many drives are bad? Norman Elton
2008-02-19 17:31 ` Justin Piszcz
2008-02-19 18:24 ` Norman Elton
2008-02-19 18:33 ` Justin Piszcz
2008-02-19 18:38 ` Norman Elton
2008-02-19 19:13 ` Justin Piszcz
2008-02-19 19:25 ` Norman Elton
2008-02-19 19:44 ` Steve Fairbairn
2008-02-20  0:22 ` Guy Watkins
2008-02-20  7:21 ` Peter Grandi
2008-02-21 18:12 ` Norman Elton
2008-02-21 20:54 ` pg_mh, Peter Grandi
2008-02-21 21:45 ` Peter Rabbitson
2008-02-21  4:28 ` Neil Brown

-- strict thread matches above, loose matches on Subject: below --
2008-02-20  4:03 Guy Watkins