* How many drives are bad?
@ 2008-02-19 17:23 Norman Elton
2008-02-19 17:31 ` Justin Piszcz
2008-02-21 4:28 ` Neil Brown
0 siblings, 2 replies; 15+ messages in thread
From: Norman Elton @ 2008-02-19 17:23 UTC (permalink / raw)
To: linux-raid
So I had my first "failure" today, when I got a report that one drive
(/dev/sdam) failed. I've attached the output of "mdadm --detail". It
appears that two drives are listed as "removed", but the array is
still functioning. What does this mean? How many drives actually
failed?
This is all a test system, so I can dink around as much as necessary.
Thanks for any advice!
Norman Elton
====== OUTPUT OF MDADM =====
Version : 00.90.03
Creation Time : Fri Jan 18 13:17:33 2008
Raid Level : raid5
Array Size : 6837319552 (6520.58 GiB 7001.42 GB)
Device Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 8
Total Devices : 7
Preferred Minor : 4
Persistence : Superblock is persistent
Update Time : Mon Feb 18 11:49:13 2008
State : clean, degraded
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : b16bdcaf:a20192fb:39c74cb8:e5e60b20
Events : 0.110
Number Major Minor RaidDevice State
0 66 1 0 active sync /dev/sdag1
1 66 17 1 active sync /dev/sdah1
2 66 33 2 active sync /dev/sdai1
3 66 49 3 active sync /dev/sdaj1
4 66 65 4 active sync /dev/sdak1
5 0 0 5 removed
6 0 0 6 removed
7 66 113 7 active sync /dev/sdan1
8 66 97 - faulty spare /dev/sdam1
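
For anyone following along, the usual way to pin down which members actually dropped is to compare the kernel's view with each member's superblock. A minimal sketch, assuming the array is /dev/md4 (suggested by the "Preferred Minor : 4" line) and the sdag1..sdan1 members listed above; comparing the Events counters usually shows which disk fell out first:

  # Sketch: check what the kernel and each member superblock think the state is.
  # Assumes the array is /dev/md4; member names taken from the table above.
  cat /proc/mdstat                      # kernel's view, e.g. [UUUUU__U]
  for d in /dev/sda[g-n]1; do
      echo "== $d"
      mdadm --examine "$d" | egrep 'this|Update Time|Events'
  done
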
* Re: How many drives are bad?
  2008-02-19 17:23 How many drives are bad? Norman Elton
@ 2008-02-19 17:31 ` Justin Piszcz
  2008-02-19 18:24   ` Norman Elton
  2008-02-21  4:28 ` Neil Brown
  1 sibling, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2008-02-19 17:31 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid

How many drives actually failed?

> Failed Devices : 1

On Tue, 19 Feb 2008, Norman Elton wrote:

> So I had my first "failure" today, when I got a report that one drive
> (/dev/sdam) failed. I've attached the output of "mdadm --detail".
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 17:31 ` Justin Piszcz
@ 2008-02-19 18:24   ` Norman Elton
  2008-02-19 18:33     ` Justin Piszcz
  0 siblings, 1 reply; 15+ messages in thread
From: Norman Elton @ 2008-02-19 18:24 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

But why do two show up as "removed"?? I would expect /dev/sdal1 to show
up someplace, either active or failed.

Any ideas?

Thanks,

Norman

On Feb 19, 2008, at 12:31 PM, Justin Piszcz wrote:

> How many drives actually failed?
>> Failed Devices : 1
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 18:24 ` Norman Elton
@ 2008-02-19 18:33   ` Justin Piszcz
  2008-02-19 18:38     ` Norman Elton
  0 siblings, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2008-02-19 18:33 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid

Neil,

Is this a bug?

Also, I have a question for Norman -- how come your drives are
sda[a-z]1? Typically it is /dev/sda1, /dev/sdb1, etc.

Justin.

On Tue, 19 Feb 2008, Norman Elton wrote:

> But why do two show up as "removed"?? I would expect /dev/sdal1 to show
> up someplace, either active or failed.
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 18:33 ` Justin Piszcz
@ 2008-02-19 18:38   ` Norman Elton
  2008-02-19 19:13     ` Justin Piszcz
  0 siblings, 1 reply; 15+ messages in thread
From: Norman Elton @ 2008-02-19 18:38 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid

Justin,

This is a Sun X4500 (Thumper) box, so it's got 48 drives inside.
/dev/sd[a-z] are all there as well, just in other RAID sets. Once you
get past /dev/sdz, the names continue at /dev/sdaa, sdab, etc.

I'd be curious whether what I'm experiencing is a bug. What should I try
in order to restore the array?

Norman

On 2/19/08, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Neil,
>
> Is this a bug?
>
> Also, I have a question for Norman -- how come your drives are
> sda[a-z]1? Typically it is /dev/sda1, /dev/sdb1, etc.
>
> [ ... ]

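
For readers puzzled by that naming: once the kernel runs out of single letters at /dev/sdz it moves to two-letter suffixes, so 48 disks enumerate as sda..sdz followed by sdaa..sdav. A throwaway sketch of the sequence, purely illustrative:

  # Sketch: print the sd names assigned to disks 0..47 (sda..sdz, sdaa..sdav).
  letters=(a b c d e f g h i j k l m n o p q r s t u v w x y z)
  for i in $(seq 0 47); do
      if [ "$i" -lt 26 ]; then
          echo "sd${letters[i]}"
      else
          echo "sda${letters[i-26]}"
      fi
  done
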
* Re: How many drives are bad?
  2008-02-19 18:38 ` Norman Elton
@ 2008-02-19 19:13   ` Justin Piszcz
  2008-02-19 19:25     ` Norman Elton
  0 siblings, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2008-02-19 19:13 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid, Alan Piszcz

Norman,

I am extremely interested in what distribution you are running on it and
what type of SW RAID you are employing (besides the one you showed here).
Are all 48 drives filled, or?

Justin.

On Tue, 19 Feb 2008, Norman Elton wrote:

> This is a Sun X4500 (Thumper) box, so it's got 48 drives inside.
> /dev/sd[a-z] are all there as well, just in other RAID sets.
>
> [ ... ]

* Re: How many drives are bad?
  2008-02-19 19:13 ` Justin Piszcz
@ 2008-02-19 19:25   ` Norman Elton
  2008-02-19 19:44     ` Steve Fairbairn
  2008-02-20  7:21     ` Peter Grandi
  0 siblings, 2 replies; 15+ messages in thread
From: Norman Elton @ 2008-02-19 19:25 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-raid, Alan Piszcz

Justin,

There was actually a discussion I fired off a few weeks ago about how to
best run SW RAID on this hardware. Here's the recap:

We're running RHEL, so no access to ZFS/XFS. I really wish we could do
ZFS, but no luck.

The box presents 48 drives, split across 6 SATA controllers. So disks
sda-sdh are on one controller, etc. In our configuration, I run a RAID5
MD array for each controller, then run LVM on top of these to form one
large VolGroup.

I found that it was easiest to set up ext3 with a max of 2TB partitions.
So running on top of the massive LVM VolGroup are a handful of ext3
partitions, each mounted in the filesystem. This is less than ideal (ZFS
would allow us one large partition), but we're rewriting some software
to utilize the multi-partition scheme.

In this setup, we should be fairly protected against drive failure. We
are vulnerable to a controller failure. If such a failure occurred, we'd
have to restore from backup.

Hope this helps, let me know if you have any questions or suggestions.
I'm certainly no expert here!

Thanks,

Norman

On 2/19/08, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Norman,
>
> I am extremely interested in what distribution you are running on it and
> what type of SW RAID you are employing (besides the one you showed here).
> Are all 48 drives filled, or?
>
> [ ... ]

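
A rough sketch of the layout described above: one RAID5 per 8-disk controller, LVM across the six arrays, and sub-2TB ext3 volumes on top. The device names, array numbers, volume sizes, and mount point are illustrative; the actual commands used are not given in the thread:

  # Sketch: one RAID5 per controller, LVM spanning the six arrays.
  mdadm --create /dev/md0 --level=5 --raid-devices=8 /dev/sd[a-h]1
  mdadm --create /dev/md1 --level=5 --raid-devices=8 /dev/sd[i-p]1
  # ... md2..md5 for the remaining four controllers ...

  pvcreate /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5
  vgcreate bigvg /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5

  # Carve out 2TB logical volumes and put ext3 on each.
  lvcreate -L 2T -n data01 bigvg
  mkfs.ext3 /dev/bigvg/data01
  mount /dev/bigvg/data01 /srv/data01
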
* RE: How many drives are bad?
  2008-02-19 19:25 ` Norman Elton
@ 2008-02-19 19:44   ` Steve Fairbairn
  2008-02-20  0:22     ` Guy Watkins
  1 sibling, 1 reply; 15+ messages in thread
From: Steve Fairbairn @ 2008-02-19 19:44 UTC (permalink / raw)
To: 'Norman Elton'; +Cc: linux-raid

> The box presents 48 drives, split across 6 SATA controllers.
> So disks sda-sdh are on one controller, etc. In our
> configuration, I run a RAID5 MD array for each controller,
> then run LVM on top of these to form one large VolGroup.

I might be missing something here, and I realise you'd lose 8 drives to
redundancy rather than 6, but wouldn't it have been better to have 8
arrays of 6 drives, each array using a single drive from each
controller? That way a single controller failure (assuming no other HD
failures) wouldn't actually take any array down? I do realise that 2
controller failures at the same time would lose everything.

Steve.

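
A sketch of the alternative Steve describes: eight 6-disk arrays, each taking one disk from every controller, so any one controller can fail without degrading an array past recovery. The names assume the sda-sdh / sdi-sdp / ... controller mapping mentioned earlier and are illustrative only:

  # Sketch: eight 6-disk RAID5 arrays, one member per controller each.
  mdadm --create /dev/md10 --level=5 --raid-devices=6 \
        /dev/sda1 /dev/sdi1 /dev/sdq1 /dev/sdy1 /dev/sdag1 /dev/sdao1
  mdadm --create /dev/md11 --level=5 --raid-devices=6 \
        /dev/sdb1 /dev/sdj1 /dev/sdr1 /dev/sdz1 /dev/sdah1 /dev/sdap1
  # ... and so on through /dev/md17, shifting one drive letter each time.
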
* RE: How many drives are bad?
  2008-02-19 19:44 ` Steve Fairbairn
@ 2008-02-20  0:22   ` Guy Watkins
  0 siblings, 0 replies; 15+ messages in thread
From: Guy Watkins @ 2008-02-20 0:22 UTC (permalink / raw)
To: 'Steve Fairbairn', 'Norman Elton'; +Cc: linux-raid

} I might be missing something here, and I realise you'd lose 8 drives to
} redundancy rather than 6, but wouldn't it have been better to have 8
} arrays of 6 drives, each array using a single drive from each
} controller? That way a single controller failure (assuming no other HD
} failures) wouldn't actually take any array down? I do realise that 2
} controller failures at the same time would lose everything.

Wow. Sounds like what I said a few months ago. I think I also
recommended RAID6.

Guy

* Re: How many drives are bad?
  2008-02-19 19:25 ` Norman Elton
  2008-02-19 19:44 ` Steve Fairbairn
@ 2008-02-20  7:21   ` Peter Grandi
  2008-02-21 18:12     ` Norman Elton
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Grandi @ 2008-02-20 7:21 UTC (permalink / raw)
To: Linux RAID

>>> On Tue, 19 Feb 2008 14:25:28 -0500, "Norman Elton"
>>> <normelton@gmail.com> said:

[ ... ]

normelton> The box presents 48 drives, split across 6 SATA
normelton> controllers. So disks sda-sdh are on one controller,
normelton> etc. In our configuration, I run a RAID5 MD array for
normelton> each controller, then run LVM on top of these to form
normelton> one large VolGroup.

Pure genius! I wonder how many Thumpers have been configured in
this well thought out way :-).

BTW, just to be sure -- you are running LVM in default linear
mode over those 6 RAID5s, aren't you?

normelton> I found that it was easiest to set up ext3 with a max
normelton> of 2TB partitions. So running on top of the massive
normelton> LVM VolGroup are a handful of ext3 partitions, each
normelton> mounted in the filesystem.

Uhm, assuming 500GB drives each RAID set has a capacity of
3.5TB, and odds are that a bit over half of those 2TB volumes
will straddle array boundaries. Such attention to detail is
quite remarkable :-).

normelton> This is less than ideal (ZFS would allow us one large
normelton> partition),

That would be another stroke of genius! (especially if you were
still using a set of underlying RAID5s instead of letting ZFS do
its RAIDZ thing). :-)

normelton> but we're rewriting some software to utilize the
normelton> multi-partition scheme.

Good luck!

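
For reference, the arithmetic behind that estimate: an N-disk RAID5 stores N-1 disks' worth of data, so 8 x 500GB gives 7 x 500GB = 3.5TB. The same formula matches the mdadm output at the top of the thread:

  # RAID5 usable capacity = (raid devices - 1) * device size.
  # For the array in the original post (8 members of 976759936 KiB each):
  echo $(( 7 * 976759936 ))     # 6837319552 KiB, the reported "Array Size"
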
* Re: How many drives are bad?
  2008-02-20  7:21 ` Peter Grandi
@ 2008-02-21 18:12   ` Norman Elton
  2008-02-21 20:54     ` pg_mh, Peter Grandi
  0 siblings, 1 reply; 15+ messages in thread
From: Norman Elton @ 2008-02-21 18:12 UTC (permalink / raw)
To: Linux RAID

> Pure genius! I wonder how many Thumpers have been configured in
> this well thought out way :-).

I'm sorry I missed your contributions to the discussion a few weeks ago.
As I said up front, this is a test system. We're still trying a number
of different configurations, and are learning how best to recover from a
fault. Guy Watkins proposed one a few weeks ago that we haven't yet
tried, but given our current situation... it may be a good time to give
it a shot.

I'm still not convinced we were running a degraded array before this.
One drive mysteriously dropped from the array, showing up as "removed"
but not failed. We did not receive the notification that we did when the
second actually failed. I'm still thinking it's just one drive that
actually failed.

Assuming we go with Guy's layout of 8 arrays of 6 drives (picking one
from each controller), how would you set up the LVM VolGroups on top of
these already distributed arrays?

Thanks again,

Norman

On Feb 20, 2008, at 2:21 AM, Peter Grandi wrote:

> [ ... ]

* Re: How many drives are bad?
  2008-02-21 18:12 ` Norman Elton
@ 2008-02-21 20:54   ` pg_mh, Peter Grandi
  2008-02-21 21:45     ` Peter Rabbitson
  0 siblings, 1 reply; 15+ messages in thread
From: pg_mh, Peter Grandi @ 2008-02-21 20:54 UTC (permalink / raw)
To: Linux RAID

>>> On Thu, 21 Feb 2008 13:12:30 -0500, Norman Elton
>>> <normelton@gmail.com> said:

[ ... ]

normelton> Assuming we go with Guy's layout of 8 arrays of 6
normelton> drives (picking one from each controller),

Guy Watkins proposed another one too:

  «Assuming the 6 controllers are equal, I would make 3 16 disk RAID6
   arrays using 2 disks from each controller. That way any 1 controller
   can fail and your system will still be running. 6 disks will be used
   for redundancy.

   Or 6 8 disk RAID6 arrays using 1 disk from each controller. That way
   any 2 controllers can fail and your system will still be running. 12
   disks will be used for redundancy. Might be too excessive!»

So, I would not be overjoyed with either physical configuration, except
in a few particular cases. It is very amusing to read such worries about
host adapter failures, and somewhat depressing to see "too excessive"
used to describe 4+2 parity RAID.

normelton> how would you set up the LVM VolGroups on top of
normelton> these already distributed arrays?

That looks like a trick question, or at least an incorrect question,
because I would rather not do anything like that except in a very few
cases. However, if one wants to do a bad thing in the least bad way,
perhaps a volume group per array would be least bad.

Going back to your original question:

  «So... we're curious how Linux will handle such a beast. Has anyone
   run MD software RAID over so many disks? Then piled LVM/ext3 on top
   of that?»

I haven't, because it sounds rather inappropriate to me.

  «Any suggestions?»

Not easy to respond without a clear statement of what the array will be
used for: RAID levels and file systems are very anisotropic in both
performance and resilience, so a particular configuration may be very
good for something but not for something else. For example a 48 drive
RAID0 with 'ext2' on top would be very good for some cases, but perhaps
not for archival :-).

In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in very few
cases, and RAID6 almost never.

In general, current storage practices do not handle large single-computer
storage pools well (just consider 'fsck' times), and beyond 10TB I reckon
that currently only multi-host parallel/cluster file systems are good
enough, for example Lustre (for smaller multi-TB filesystems I'd use JFS
or XFS). But then Lustre can also be used on a single machine with
multiple (say 2TB) block devices, and this may be the best choice here
too if a single virtual filesystem is the goal:

  http://wiki.Lustre.org/index.php?title=Lustre_Howto

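
A minimal sketch of the "volume group per array" arrangement, so that losing one array only takes out the logical volumes carved from its own VG. The array count follows the 8-array layout being discussed; names, sizes, and the filesystem choice are illustrative only:

  # Sketch: one volume group (and its own logical volumes) per md array.
  for n in 0 1 2 3 4 5 6 7; do
      pvcreate /dev/md$n
      vgcreate vg$n /dev/md$n
      lvcreate -L 500G -n data vg$n
      mkfs.ext3 /dev/vg$n/data    # or JFS/XFS where available, per the advice above
  done
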
* Re: How many drives are bad?
  2008-02-21 20:54 ` pg_mh, Peter Grandi
@ 2008-02-21 21:45   ` Peter Rabbitson
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Rabbitson @ 2008-02-21 21:45 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux RAID

Peter Grandi wrote:
> In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in

Interesting movement. What do you think is their stance on Raid Fix? :)

* Re: How many drives are bad?
  2008-02-19 17:23 How many drives are bad? Norman Elton
  2008-02-19 17:31 ` Justin Piszcz
@ 2008-02-21  4:28 ` Neil Brown
  1 sibling, 0 replies; 15+ messages in thread
From: Neil Brown @ 2008-02-21 4:28 UTC (permalink / raw)
To: Norman Elton; +Cc: linux-raid

On Tuesday February 19, normelton@gmail.com wrote:
> So I had my first "failure" today, when I got a report that one drive
> (/dev/sdam) failed. I've attached the output of "mdadm --detail". It
> appears that two drives are listed as "removed", but the array is
> still functioning. What does this mean? How many drives actually
> failed?

The array is configured for 8 devices, but only 6 are active. So you
have lost data.

Of the two missing devices, one is still in the array and is marked as
faulty. One is simply not present at all. Hence "Failed Devices : 1",
i.e. there is one failed device in the array.

It looks like you have been running a degraded array for a while (maybe
not a long while) and the device has then failed.

"mdadm --monitor" will send you mail if you have a degraded array.

NeilBrown

> [ ... ]

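
For reference, a minimal way to get that mail (a sketch; most distributions ship an init script or an mdadm.conf entry that does the equivalent):

  # Sketch: watch all arrays and mail alerts on failure/degradation.
  # Daemonizes and polls the arrays; the address and interval are illustrative.
  mdadm --monitor --scan --daemonise --mail=root@localhost --delay=300
  # Equivalently, set a MAILADDR line in /etc/mdadm.conf and let the
  # distribution's mdadm monitor service pick it up.
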
* RE: How many drives are bad?
@ 2008-02-20  4:03 Guy Watkins
  0 siblings, 0 replies; 15+ messages in thread
From: Guy Watkins @ 2008-02-20 4:03 UTC (permalink / raw)
To: 'Guy Watkins', 'Steve Fairbairn', 'Norman Elton'
Cc: linux-raid

} I might be missing something here, and I realise you'd lose 8 drives to
} redundancy rather than 6, but wouldn't it have been better to have 8
} arrays of 6 drives, each array using a single drive from each
} controller? That way a single controller failure (assuming no other HD
} failures) wouldn't actually take any array down? I do realise that 2
} controller failures at the same time would lose everything.

Wow. Sounds like what I said a few months ago. I think I also
recommended RAID6.

Guy

end of thread, other threads: [~2008-02-21 21:45 UTC | newest]

Thread overview: 15+ messages
2008-02-19 17:23 How many drives are bad? Norman Elton
2008-02-19 17:31 ` Justin Piszcz
2008-02-19 18:24 ` Norman Elton
2008-02-19 18:33 ` Justin Piszcz
2008-02-19 18:38 ` Norman Elton
2008-02-19 19:13 ` Justin Piszcz
2008-02-19 19:25 ` Norman Elton
2008-02-19 19:44 ` Steve Fairbairn
2008-02-20  0:22 ` Guy Watkins
2008-02-20  7:21 ` Peter Grandi
2008-02-21 18:12 ` Norman Elton
2008-02-21 20:54 ` pg_mh, Peter Grandi
2008-02-21 21:45 ` Peter Rabbitson
2008-02-21  4:28 ` Neil Brown

-- strict thread matches above, loose matches on Subject: below --
2008-02-20  4:03 Guy Watkins