* RE: [PATCH 000 of 5] md: Introduction @ 2006-01-17 21:38 Lincoln Dale (ltd) 2006-01-18 13:27 ` Jan Engelhardt 0 siblings, 1 reply; 72+ messages in thread From: Lincoln Dale (ltd) @ 2006-01-17 21:38 UTC (permalink / raw) To: Michael Tokarev, NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson > Neil, is this online resizing/reshaping really needed? I understand > all those words means alot for marketing persons - zero downtime, > online resizing etc, but it is much safer and easier to do that stuff > 'offline', on an inactive array, like raidreconf does - safer, easier, > faster, and one have more possibilities for more complex changes. It > isn't like you want to add/remove drives to/from your arrays every day... > Alot of good hw raid cards are unable to perform such reshaping too. RAID resize/restripe may not be so common with cheap / PC-based RAID systems, but it is common with midrange and enterprise storage subsystems from vendors such as EMC, HDS, IBM & HP. in fact, I'd say it's the exception rather than the rule if a midrange/enterprise storage subsystem doesn't have an _online_ resize capability. personally, I think this is useful functionality, but my personal preference is that this would be in DM/LVM2 rather than MD. but given Neil is the MD author/maintainer, I can see why he'd prefer to do it in MD. :) cheers, lincoln. ^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: [PATCH 000 of 5] md: Introduction 2006-01-17 21:38 [PATCH 000 of 5] md: Introduction Lincoln Dale (ltd) @ 2006-01-18 13:27 ` Jan Engelhardt 2006-01-18 23:19 ` Neil Brown 0 siblings, 1 reply; 72+ messages in thread From: Jan Engelhardt @ 2006-01-18 13:27 UTC (permalink / raw) To: Lincoln Dale (ltd) Cc: Michael Tokarev, NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson >personally, I think this this useful functionality, but my personal >preference is that this would be in DM/LVM2 rather than MD. but given >Neil is the MD author/maintainer, I can see why he'd prefer to do it in >MD. :) Why don't MD and DM merge some bits? Jan Engelhardt -- ^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: [PATCH 000 of 5] md: Introduction 2006-01-18 13:27 ` Jan Engelhardt @ 2006-01-18 23:19 ` Neil Brown 2006-01-19 15:33 ` Mark Hahn ` (2 more replies) 0 siblings, 3 replies; 72+ messages in thread From: Neil Brown @ 2006-01-18 23:19 UTC (permalink / raw) To: Jan Engelhardt Cc: Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Wednesday January 18, jengelh@linux01.gwdg.de wrote: > > >personally, I think this this useful functionality, but my personal > >preference is that this would be in DM/LVM2 rather than MD. but given > >Neil is the MD author/maintainer, I can see why he'd prefer to do it in > >MD. :) > > Why don't MD and DM merge some bits? > Which bits? Why? My current opinion is that you should: Use md for raid1, raid5, raid6 - anything with redundancy. Use dm for multipath, crypto, linear, LVM, snapshot Use either for raid0 (I don't think dm has particular advantages over md or md over dm). These can be mixed together quite effectively: You can have dm/lvm over md/raid1 over dm/multipath with no problems. If there is functionality missing from any of these recommended components, then make a noise about it, preferably but not necessarily with code, and it will quite possibly be fixed. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
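As a rough sketch of the stacking Neil describes (an LVM2 volume group on top of an md raid1), with illustrative device and volume names and the multipath layer left out; exact flags depend on the mdadm and LVM2 versions in use:

    # redundancy handled by md: a two-disk raid1
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # volume management handled by dm/lvm2 on top of the md device
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 10G -n home vg0
    mkfs.ext3 /dev/vg0/home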
* RE: [PATCH 000 of 5] md: Introduction 2006-01-18 23:19 ` Neil Brown @ 2006-01-19 15:33 ` Mark Hahn 2006-01-19 20:12 ` Jan Engelhardt 2006-01-19 22:17 ` Phillip Susi 2 siblings, 0 replies; 72+ messages in thread From: Mark Hahn @ 2006-01-19 15:33 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson > Use either for raid0 (I don't think dm has particular advantages > for md or md over dm). I measured this a few months ago, and was surprised to find that DM raid0 was very noticably slower than MD raid0. same machine, same disks/controller/kernel/settings/stripe-size. I didn't try to find out why, since I usually need redundancy... regards, mark hahn. ^ permalink raw reply [flat|nested] 72+ messages in thread
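For anyone who wants to repeat that comparison, the two stripe sets can be built roughly as follows (device names and chunk size are illustrative, the two disks are assumed equal-sized, and the dm 'striped' target takes its chunk size in 512-byte sectors):

    # md raid0 over two disks, 64k chunks
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sdc /dev/sdd

    # dm striped mapping: <start> <length> striped <#stripes> <chunk> <dev> <offset> ...
    LEN=$(( $(blockdev --getsz /dev/sdc) * 2 ))
    echo "0 $LEN striped 2 128 /dev/sdc 0 /dev/sdd 0" | dmsetup create stripe0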
* RE: [PATCH 000 of 5] md: Introduction 2006-01-18 23:19 ` Neil Brown 2006-01-19 15:33 ` Mark Hahn @ 2006-01-19 20:12 ` Jan Engelhardt 2006-01-19 21:22 ` Lars Marowsky-Bree 2006-01-19 22:17 ` Phillip Susi 2 siblings, 1 reply; 72+ messages in thread From: Jan Engelhardt @ 2006-01-19 20:12 UTC (permalink / raw) To: Neil Brown Cc: Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson >> >personally, I think this this useful functionality, but my personal >> >preference is that this would be in DM/LVM2 rather than MD. but given >> >Neil is the MD author/maintainer, I can see why he'd prefer to do it in >> >MD. :) >> >> Why don't MD and DM merge some bits? > >Which bits? >Why? > >My current opinion is that you should: > > Use md for raid1, raid5, raid6 - anything with redundancy. > Use dm for multipath, crypto, linear, LVM, snapshot There are pairs of files that look like they would do the same thing: raid1.c <-> dm-raid1.c linear.c <-> dm-linear.c Jan Engelhardt -- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 20:12 ` Jan Engelhardt @ 2006-01-19 21:22 ` Lars Marowsky-Bree 0 siblings, 0 replies; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-19 21:22 UTC (permalink / raw) To: Jan Engelhardt, Neil Brown Cc: Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-19T21:12:02, Jan Engelhardt <jengelh@linux01.gwdg.de> wrote: > > Use md for raid1, raid5, raid6 - anything with redundancy. > > Use dm for multipath, crypto, linear, LVM, snapshot > There are pairs of files that look like they would do the same thing: > > raid1.c <-> dm-raid1.c > linear.c <-> dm-linear.c Sure there's some historical overlap. It'd make sense if DM used the md raid personalities, yes. Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 23:19 ` Neil Brown 2006-01-19 15:33 ` Mark Hahn 2006-01-19 20:12 ` Jan Engelhardt @ 2006-01-19 22:17 ` Phillip Susi 2006-01-19 22:32 ` Neil Brown 2 siblings, 1 reply; 72+ messages in thread From: Phillip Susi @ 2006-01-19 22:17 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson I'm currently of the opinion that dm needs a raid5 and raid6 module added, then the user land lvm tools fixed to use them, and then you could use dm instead of md. The benefit being that dm pushes things like volume autodetection and management out of the kernel to user space where it belongs. But that's just my opinion... I'm using dm at home because I have a sata hardware fakeraid raid-0 between two WD 10,000 rpm raptors, and the dmraid utility correctly recognizes that and configures device mapper to use it. Neil Brown wrote: > Which bits? > Why? > > My current opinion is that you should: > > Use md for raid1, raid5, raid6 - anything with redundancy. > Use dm for multipath, crypto, linear, LVM, snapshot > Use either for raid0 (I don't think dm has particular advantages > for md or md over dm). > > These can be mixed together quite effectively: > You can have dm/lvm over md/raid1 over dm/multipath > with no problems. > > If there is functionality missing from any of these recommended > components, then make a noise about it, preferably but not necessarily > with code, and it will quite possibly be fixed. ^ permalink raw reply [flat|nested] 72+ messages in thread
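For reference, the dmraid workflow described here is roughly the following (standard dmraid options; the set names it reports depend on the controller's metadata format):

    dmraid -r    # list the RAID sets found in the vendor metadata on each disk
    dmraid -s    # show the discovered sets and their state
    dmraid -ay   # activate all sets as /dev/mapper/<setname> via device-mapper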
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 22:17 ` Phillip Susi @ 2006-01-19 22:32 ` Neil Brown 2006-01-19 23:26 ` Phillip Susi 2006-01-20 7:51 ` Reuben Farrelly 0 siblings, 2 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 22:32 UTC (permalink / raw) To: Phillip Susi Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Thursday January 19, psusi@cfl.rr.com wrote: > I'm currently of the opinion that dm needs a raid5 and raid6 module > added, then the user land lvm tools fixed to use them, and then you > could use dm instead of md. The benefit being that dm pushes things > like volume autodetection and management out of the kernel to user space > where it belongs. But that's just my opinion... The in-kernel autodetection in md is purely legacy support as far as I am concerned. md does volume detection in user space via 'mdadm'. What other "things like" were you thinking of. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
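The userspace detection Neil mentions boils down to something like this (a sketch; exact behaviour depends on the mdadm version and on /etc/mdadm.conf):

    # scan block devices for md superblocks and print the arrays they describe
    mdadm --examine --scan

    # assemble everything listed in mdadm.conf (or found by the scan)
    mdadm --assemble --scan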
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 22:32 ` Neil Brown @ 2006-01-19 23:26 ` Phillip Susi 2006-01-19 23:43 ` Neil Brown 2006-01-20 7:51 ` Reuben Farrelly 1 sibling, 1 reply; 72+ messages in thread From: Phillip Susi @ 2006-01-19 23:26 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson Neil Brown wrote: > > The in-kernel autodetection in md is purely legacy support as far as I > am concerned. md does volume detection in user space via 'mdadm'. > > What other "things like" were you thinking of. > Oh, I suppose that's true. Well, another thing is your new mods to support on the fly reshaping, which dm could do from user space. Then of course, there's multipath and snapshots and other lvm things which you need dm for, so why use both when one will do? That's my take on it. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:26 ` Phillip Susi @ 2006-01-19 23:43 ` Neil Brown 2006-01-20 2:17 ` Phillip Susi ` (2 more replies) 0 siblings, 3 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 23:43 UTC (permalink / raw) To: Phillip Susi Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Thursday January 19, psusi@cfl.rr.com wrote: > Neil Brown wrote: > > > > The in-kernel autodetection in md is purely legacy support as far as I > > am concerned. md does volume detection in user space via 'mdadm'. > > > > What other "things like" were you thinking of. > > > > Oh, I suppose that's true. Well, another thing is your new mods to > support on the fly reshaping, which dm could do from user space. Then > of course, there's multipath and snapshots and other lvm things which > you need dm for, so why use both when one will do? That's my take on it. Maybe the problem here is thinking of md and dm as different things. Try just not thinking of them at all. Think about it like this: The linux kernel supports lvm The linux kernel supports multipath The linux kernel supports snapshots The linux kernel supports raid0 The linux kernel supports raid1 The linux kernel supports raid5 Use the bits that you want, and not the bits that you don't. dm and md are just two different interface styles to various bits of this. Neither is clearly better than the other, partly because different people have different tastes. Maybe what you really want is for all of these functions to be managed under the one umbrella application. I think that is what EVMS tried to do. One big selling point that 'dm' has is 'dmraid' - a tool that allows you to use a lot of 'fakeraid' cards. People would like dmraid to work with raid5 as well, and that is a good goal. However it doesn't mean that dm needs to get its own raid5 implementation or that md/raid5 needs to be merged with dm. It can be achieved by giving md/raid5 the right interfaces so that metadata can be managed from userspace (and I am nearly there). Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some raid levels and 'md' interfaces for others. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:43 ` Neil Brown @ 2006-01-20 2:17 ` Phillip Susi 2006-01-20 10:53 ` Lars Marowsky-Bree 2006-01-20 18:41 ` Heinz Mauelshagen 2006-01-20 17:29 ` Ross Vandegrift 2006-01-20 18:36 ` Heinz Mauelshagen 2 siblings, 2 replies; 72+ messages in thread From: Phillip Susi @ 2006-01-20 2:17 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson Neil Brown wrote: >Maybe the problem here is thinking of md and dm as different things. >Try just not thinking of them at all. >Think about it like this: > The linux kernel support lvm > The linux kernel support multipath > The linux kernel support snapshots > The linux kernel support raid0 > The linux kernel support raid1 > The linux kernel support raid5 > >Use the bits that you want, and not the bits that you don't. > >dm and md are just two different interface styles to various bits of >this. Neither is clearly better than the other, partly because >different people have different tastes. > >Maybe what you really want is for all of these functions to be managed >under the one umbrella application. I think that is was EVMS tried to >do. > > > I am under the impression that dm is simpler/cleaner than md. That impression very well may be wrong, but if it is simpler, then that's a good thing. >One big selling point that 'dm' has is 'dmraid' - a tool that allows >you to use a lot of 'fakeraid' cards. People would like dmraid to >work with raid5 as well, and that is a good goal. > > AFAIK, the hardware fakeraid solutions on the market don't support raid5 anyhow ( at least mine doesn't ), so dmraid won't either. >However it doesn't mean that dm needs to get it's own raid5 >implementation or that md/raid5 needs to be merged with dm. >It can be achieved by giving md/raid5 the right interfaces so that >metadata can be managed from userspace (and I am nearly there). >Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some >raid levels and 'md' interfaces for others. > Having two sets of interfaces and retrofiting a new interface onto a system that wasn't designed for it seems likely to bloat the kernel with complex code. I don't really know if that is the case because I have not studied the code, but that's the impression I get, and if it's right, then I'd say it is better to stick with dm rather than retrofit md. In either case, it seems overly complex to have to deal with both. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 2:17 ` Phillip Susi @ 2006-01-20 10:53 ` Lars Marowsky-Bree 2006-01-20 12:06 ` Jens Axboe 2006-01-20 18:38 ` Heinz Mauelshagen 2006-01-20 18:41 ` Heinz Mauelshagen 1 sibling, 2 replies; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-20 10:53 UTC (permalink / raw) To: Phillip Susi, Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-19T21:17:12, Phillip Susi <psusi@cfl.rr.com> wrote: > I am under the impression that dm is simpler/cleaner than md. That > impression very well may be wrong, but if it is simpler, then that's a > good thing. That impression is wrong in that general form. Both have advantages and disadvantages. I've been an advocate of seeing both of them merged, mostly because I think it would be beneficial if they'd share the same interface to user-space to make the tools easier to write and maintain. However, rewriting the RAID personalities for DM is a thing only a fool would do without really good cause. Sure, everybody can write a RAID5/RAID6 parity algorithm. But getting the failure/edge cases stable is not trivial and requires years of maturing. Which is why I think gentle evolution of both source bases towards some common API (for example) is much preferable to reinventing one within the other. Oversimplifying to "dm is better than md" is just stupid. Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 10:53 ` Lars Marowsky-Bree @ 2006-01-20 12:06 ` Jens Axboe 2006-01-20 18:38 ` Heinz Mauelshagen 1 sibling, 0 replies; 72+ messages in thread From: Jens Axboe @ 2006-01-20 12:06 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20 2006, Lars Marowsky-Bree wrote: > Oversimplifying to "dm is better than md" is just stupid. Indeed. But "generally" md is faster and more efficient in the way it handles ios, it doesn't do any splitting unless it has to. -- Jens Axboe ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 10:53 ` Lars Marowsky-Bree 2006-01-20 12:06 ` Jens Axboe @ 2006-01-20 18:38 ` Heinz Mauelshagen 2006-01-20 22:09 ` Lars Marowsky-Bree 1 sibling, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-20 18:38 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 11:53:06AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-19T21:17:12, Phillip Susi <psusi@cfl.rr.com> wrote: > > > I am under the impression that dm is simpler/cleaner than md. That > > impression very well may be wrong, but if it is simpler, then that's a > > good thing. > > That impression is wrong in that general form. Both have advantages and > disadvantages. > > I've been an advocate of seeing both of them merged, mostly because I > think it would be beneficial if they'd share the same interface to > user-space to make the tools easier to write and maintain. > > However, rewriting the RAID personalities for DM is a thing only a fool > would do without really good cause. Thanks Lars ;) > Sure, everybody can write a > RAID5/RAID6 parity algorithm. But getting the failure/edge cases stable > is not trivial and requires years of maturing. > > Which is why I think gentle evolution of both source bases towards some > common API (for example) is much preferable to reinventing one within > the other. > > Oversimplifying to "dm is better than md" is just stupid. > > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 18:38 ` Heinz Mauelshagen @ 2006-01-20 22:09 ` Lars Marowsky-Bree 2006-01-21 0:06 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-20 22:09 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-20T19:38:40, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > However, rewriting the RAID personalities for DM is a thing only a fool > > would do without really good cause. > > Thanks Lars ;) Well, I assume you have a really good cause then, don't you? ;-) Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 22:09 ` Lars Marowsky-Bree @ 2006-01-21 0:06 ` Heinz Mauelshagen 0 siblings, 0 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-21 0:06 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 11:09:51PM +0100, Lars Marowsky-Bree wrote: > On 2006-01-20T19:38:40, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > However, rewriting the RAID personalities for DM is a thing only a fool > > > would do without really good cause. > > > > Thanks Lars ;) > > Well, I assume you have a really good cause then, don't you? ;-) Well, I'll share your assumption ;-) > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 2:17 ` Phillip Susi 2006-01-20 10:53 ` Lars Marowsky-Bree @ 2006-01-20 18:41 ` Heinz Mauelshagen 1 sibling, 0 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-20 18:41 UTC (permalink / raw) To: Phillip Susi Cc: Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Thu, Jan 19, 2006 at 09:17:12PM -0500, Phillip Susi wrote: > Neil Brown wrote: > > >Maybe the problem here is thinking of md and dm as different things. > >Try just not thinking of them at all. > >Think about it like this: > > The linux kernel support lvm > > The linux kernel support multipath > > The linux kernel support snapshots > > The linux kernel support raid0 > > The linux kernel support raid1 > > The linux kernel support raid5 > > > >Use the bits that you want, and not the bits that you don't. > > > >dm and md are just two different interface styles to various bits of > >this. Neither is clearly better than the other, partly because > >different people have different tastes. > > > >Maybe what you really want is for all of these functions to be managed > >under the one umbrella application. I think that is was EVMS tried to > >do. > > > > > > > > I am under the impression that dm is simpler/cleaner than md. That > impression very well may be wrong, but if it is simpler, then that's a > good thing. > > > >One big selling point that 'dm' has is 'dmraid' - a tool that allows > >you to use a lot of 'fakeraid' cards. People would like dmraid to > >work with raid5 as well, and that is a good goal. > > > > > > AFAIK, the hardware fakeraid solutions on the market don't support raid5 > anyhow ( at least mine doesn't ), so dmraid won't either. Well, some do (eg, Nvidia). > > >However it doesn't mean that dm needs to get it's own raid5 > >implementation or that md/raid5 needs to be merged with dm. > >It can be achieved by giving md/raid5 the right interfaces so that > >metadata can be managed from userspace (and I am nearly there). > >Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > >raid levels and 'md' interfaces for others. > > > > Having two sets of interfaces and retrofiting a new interface onto a > system that wasn't designed for it seems likely to bloat the kernel with > complex code. I don't really know if that is the case because I have > not studied the code, but that's the impression I get, and if it's > right, then I'd say it is better to stick with dm rather than retrofit > md. In either case, it seems overly complex to have to deal with both. I agree, but dm will need to mature before it'll be able to substitute md. > > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Regards, Heinz -- The LVM Guy -- *** Software bugs are stupid. Nevertheless it needs not so stupid people to solve them *** =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:43 ` Neil Brown 2006-01-20 2:17 ` Phillip Susi @ 2006-01-20 17:29 ` Ross Vandegrift 2006-01-20 18:36 ` Heinz Mauelshagen 2 siblings, 0 replies; 72+ messages in thread From: Ross Vandegrift @ 2006-01-20 17:29 UTC (permalink / raw) To: Neil Brown Cc: Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 10:43:13AM +1100, Neil Brown wrote: > dm and md are just two different interface styles to various bits of > this. Neither is clearly better than the other, partly because > different people have different tastes. Here's why it's great to have both: they have different toolkits. I'm really familiar with md's toolkit. I can do most anything I need. But I'll bet that I've never gotten a pvmove to finish successfully because I am doing something wrong and I don't know it. Because we're talking about data integrity, the toolkit issue alone makes it worth keeping both code paths. md does 90% of what I need, so why should I spend the time to learn a new system that doesn't offer any advantages? [1] I'm intentionally neglecting the 4k stack issue -- Ross Vandegrift ross@lug.udel.edu "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:43 ` Neil Brown 2006-01-20 2:17 ` Phillip Susi 2006-01-20 17:29 ` Ross Vandegrift @ 2006-01-20 18:36 ` Heinz Mauelshagen 2006-01-20 22:57 ` Lars Marowsky-Bree 2 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-20 18:36 UTC (permalink / raw) To: Neil Brown Cc: Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 10:43:13AM +1100, Neil Brown wrote: > On Thursday January 19, psusi@cfl.rr.com wrote: > > Neil Brown wrote: > > > > > > The in-kernel autodetection in md is purely legacy support as far as I > > > am concerned. md does volume detection in user space via 'mdadm'. > > > > > > What other "things like" were you thinking of. > > > > > > > Oh, I suppose that's true. Well, another thing is your new mods to > > support on the fly reshaping, which dm could do from user space. Then > > of course, there's multipath and snapshots and other lvm things which > > you need dm for, so why use both when one will do? That's my take on it. > > Maybe the problem here is thinking of md and dm as different things. > Try just not thinking of them at all. > Think about it like this: > The linux kernel support lvm > The linux kernel support multipath > The linux kernel support snapshots > The linux kernel support raid0 > The linux kernel support raid1 > The linux kernel support raid5 > > Use the bits that you want, and not the bits that you don't. > > dm and md are just two different interface styles to various bits of > this. Neither is clearly better than the other, partly because > different people have different tastes. > > Maybe what you really want is for all of these functions to be managed > under the one umbrella application. I think that is was EVMS tried to > do. > > One big selling point that 'dm' has is 'dmraid' - a tool that allows > you to use a lot of 'fakeraid' cards. People would like dmraid to > work with raid5 as well, and that is a good goal. > However it doesn't mean that dm needs to get it's own raid5 > implementation or that md/raid5 needs to be merged with dm. That's a valid point to make but it can ;) > It can be achieved by giving md/raid5 the right interfaces so that > metadata can be managed from userspace (and I am nearly there). Yeah, and I'm nearly there to have a RAID4 and RAID5 target for dm (which took advantage of the raid address calculation and the bio to stripe cache copy code of md raid5). See http://people.redhat.com/heinzm/sw/dm/dm-raid45/dm-raid45_2.6.15_200601201914.patch.bz2 (no Makefile / no Kconfig changes) for early code reference. > Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > raid levels and 'md' interfaces for others. Yes, that's possible but there's recommendations to have a native target for dm to do RAID5, so I started to implement it. 
> > NeilBrown > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 18:36 ` Heinz Mauelshagen @ 2006-01-20 22:57 ` Lars Marowsky-Bree 2006-01-21 0:01 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-20 22:57 UTC (permalink / raw) To: Heinz Mauelshagen, Neil Brown Cc: Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-20T19:36:21, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > > raid levels and 'md' interfaces for others. > Yes, that's possible but there's recommendations to have a native target > for dm to do RAID5, so I started to implement it. Can you answer me what the recommendations are based on? I understand wanting to manage both via the same framework, but duplicating the code is just ... wrong. What's gained by it? Why not provide a dm-md wrapper which could then load/interface to all md personalities? Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 22:57 ` Lars Marowsky-Bree @ 2006-01-21 0:01 ` Heinz Mauelshagen 2006-01-21 0:03 ` Lars Marowsky-Bree 0 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-21 0:01 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 11:57:24PM +0100, Lars Marowsky-Bree wrote: > On 2006-01-20T19:36:21, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > > > raid levels and 'md' interfaces for others. > > Yes, that's possible but there's recommendations to have a native target > > for dm to do RAID5, so I started to implement it. > > Can you answer me what the recommendations are based on? Partner requests. > > I understand wanting to manage both via the same framework, but > duplicating the code is just ... wrong. > > What's gained by it? > > Why not provide a dm-md wrapper which could then > load/interface to all md personalities? > As we want to enrich the mapping flexibility (ie, multi-segment fine grained mappings) of dm by adding targets as we go, a certain degree and transitional existence of duplicate code is the price to gain that flexibility. > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" Warm regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:01 ` Heinz Mauelshagen @ 2006-01-21 0:03 ` Lars Marowsky-Bree 2006-01-21 0:08 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-21 0:03 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-21T01:01:42, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Why not provide a dm-md wrapper which could then > > load/interface to all md personalities? > As we want to enrich the mapping flexibility (ie, multi-segment fine grained > mappings) of dm by adding targets as we go, a certain degree and transitional > existence of duplicate code is the price to gain that flexibility. A dm-md wrapper would give you the same? Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:03 ` Lars Marowsky-Bree @ 2006-01-21 0:08 ` Heinz Mauelshagen 2006-01-21 0:13 ` Lars Marowsky-Bree 0 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-21 0:08 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Sat, Jan 21, 2006 at 01:03:44AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-21T01:01:42, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Why not provide a dm-md wrapper which could then > > > load/interface to all md personalities? > > As we want to enrich the mapping flexibility (ie, multi-segment fine grained > > mappings) of dm by adding targets as we go, a certain degree and transitional > > existence of duplicate code is the price to gain that flexibility. > > A dm-md wrapper would give you the same? No, we'ld need to stack more complex to achieve mappings. Think lvm2 and logical volume level raid5. > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:08 ` Heinz Mauelshagen @ 2006-01-21 0:13 ` Lars Marowsky-Bree 2006-01-23 9:44 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-21 0:13 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-21T01:08:06, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > A dm-md wrapper would give you the same? > No, we'ld need to stack more complex to achieve mappings. > Think lvm2 and logical volume level raid5. How would you not get that if you had a wrapper around md which made it into a dm personality/target? Besides, stacking between dm devices so far (ie, if I look how kpartx does it, or LVM2 on top of MPIO etc, which works just fine) is via the block device layer anyway - and nothing stops you from putting md on top of LVM2 LVs either. I use them regularly to play with md and other stuff... So I remain unconvinced that code duplication is worth it for more than "hark we want it so!" ;-) ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:13 ` Lars Marowsky-Bree @ 2006-01-23 9:44 ` Heinz Mauelshagen 2006-01-23 10:26 ` Lars Marowsky-Bree 2006-01-23 12:54 ` Ville Herva 0 siblings, 2 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 9:44 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Sat, Jan 21, 2006 at 01:13:11AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-21T01:08:06, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > A dm-md wrapper would give you the same? > > No, we'ld need to stack more complex to achieve mappings. > > Think lvm2 and logical volume level raid5. > > How would you not get that if you had a wrapper around md which made it > into an dm personality/target? You could with deeper stacking. That's why I mentioned it above. > > Besides, stacking between dm devices so far (ie, if I look how kpartx > does it, or LVM2 on top of MPIO etc, which works just fine) is via the > block device layer anyway - and nothing stops you from putting md on top > of LVM2 LVs either. > > I use the regularly to play with md and other stuff... Me too but for production, I want to avoid the additional stacking overhead and complexity. > > So I remain unconvinced that code duplication is worth it for more than > "hark we want it so!" ;-) Shall I remove you from the list of potential testers of dm-raid45 then ;-) > > -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 9:44 ` Heinz Mauelshagen @ 2006-01-23 10:26 ` Lars Marowsky-Bree 2006-01-23 10:38 ` Heinz Mauelshagen 2006-01-23 12:54 ` Ville Herva 1 sibling, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-23 10:26 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-23T10:44:18, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Besides, stacking between dm devices so far (ie, if I look how kpartx > > does it, or LVM2 on top of MPIO etc, which works just fine) is via the > > block device layer anyway - and nothing stops you from putting md on top > > of LVM2 LVs either. > > > > I use the regularly to play with md and other stuff... > > Me too but for production, I want to avoid the > additional stacking overhead and complexity. Ok, I still didn't get that. I must be slow. Did you implement some DM-internal stacking now to avoid the above mentioned complexity? Otherwise, even DM-on-DM is still stacked via the block device abstraction... Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 10:26 ` Lars Marowsky-Bree @ 2006-01-23 10:38 ` Heinz Mauelshagen 2006-01-23 10:45 ` Lars Marowsky-Bree 0 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 10:38 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 11:26:01AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-23T10:44:18, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Besides, stacking between dm devices so far (ie, if I look how kpartx > > > does it, or LVM2 on top of MPIO etc, which works just fine) is via the > > > block device layer anyway - and nothing stops you from putting md on top > > > of LVM2 LVs either. > > > > > > I use the regularly to play with md and other stuff... > > > > Me too but for production, I want to avoid the > > additional stacking overhead and complexity. > > Ok, I still didn't get that. I must be slow. > > Did you implement some DM-internal stacking now to avoid the above > mentioned complexity? > > Otherwise, even DM-on-DM is still stacked via the block device > abstraction... No, not necessary because a single-level raid4/5 mapping will do it. Ie. it supports <offset> parameters in the constructor as other targets do as well (eg. mirror or linear). > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
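For comparison, this is what the offset parameter looks like in the existing linear target that Heinz points to; the raid45 constructor itself is experimental and its exact parameter list is not shown in this thread, but presumably it takes a device/offset pair per member in the same style:

    # dm 'linear' table line: <start> <length> linear <device> <offset on device>
    # maps a 1 GiB region beginning 2048 sectors into /dev/sdb1
    echo "0 2097152 linear /dev/sdb1 2048" | dmsetup create lin0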
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 10:38 ` Heinz Mauelshagen @ 2006-01-23 10:45 ` Lars Marowsky-Bree 2006-01-23 11:00 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-23 10:45 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-23T11:38:51, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Ok, I still didn't get that. I must be slow. > > > > Did you implement some DM-internal stacking now to avoid the above > > mentioned complexity? > > > > Otherwise, even DM-on-DM is still stacked via the block device > > abstraction... > > No, not necessary because a single-level raid4/5 mapping will do it. > Ie. it supports <offset> parameters in the constructor as other targets > do as well (eg. mirror or linear). An dm-md wrapper would not support such a basic feature (which is easily added to md too) how? I mean, "I'm rewriting it because I want to and because I understand and own the code then" is a perfectly legitimate reason, but let's please not pretend there's really sound and good technical reasons ;-) Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 10:45 ` Lars Marowsky-Bree @ 2006-01-23 11:00 ` Heinz Mauelshagen 0 siblings, 0 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 11:00 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 11:45:22AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-23T11:38:51, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Ok, I still didn't get that. I must be slow. > > > > > > Did you implement some DM-internal stacking now to avoid the above > > > mentioned complexity? > > > > > > Otherwise, even DM-on-DM is still stacked via the block device > > > abstraction... > > > > No, not necessary because a single-level raid4/5 mapping will do it. > > Ie. it supports <offset> parameters in the constructor as other targets > > do as well (eg. mirror or linear). > > An dm-md wrapper would not support such a basic feature (which is easily > added to md too) how? > > I mean, "I'm rewriting it because I want to and because I understand and > own the code then" is a perfectly legitimate reason Sure :-) >, but let's please > not pretend there's really sound and good technical reasons ;-) Mind you that there's no need to argue about that: this is based on requests to do it. > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 9:44 ` Heinz Mauelshagen 2006-01-23 10:26 ` Lars Marowsky-Bree @ 2006-01-23 12:54 ` Ville Herva 2006-01-23 13:00 ` Steinar H. Gunderson ` (2 more replies) 0 siblings, 3 replies; 72+ messages in thread From: Ville Herva @ 2006-01-23 12:54 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 10:44:18AM +0100, you [Heinz Mauelshagen] wrote: > > > > I use the regularly to play with md and other stuff... > > Me too but for production, I want to avoid the > additional stacking overhead and complexity. > > > So I remain unconvinced that code duplication is worth it for more than > > "hark we want it so!" ;-) > > Shall I remove you from the list of potential testers of dm-raid45 then ;-) Heinz, If you really want the rest of us to convert from md to lvm, you should perhaps give some attention to the brittle userland (scripts and binaries). It is very tedious to have to debug a production system for a few hours in order to get the rootfs mounted after each kernel update. The lvm error messages give almost no clue on the problem. Worse yet, problem reports on these issues are completely ignored on the lvm mailing list, even when a patch is attached. (See http://marc.theaimsgroup.com/?l=linux-lvm&m=113775502821403&w=2 http://linux.msede.com/lvm_mlist/archive/2001/06/0205.html http://linux.msede.com/lvm_mlist/archive/2001/06/0271.html for reference.) Such experience gives the impression that lvm is not yet ready for serious production use. No offense intended; lvm kernel code (neither lvm1 nor lvm2) has never given me trouble, and is probably as solid as anything. -- v -- v@iki.fi PS: Speaking of debugging failing initrd init scripts; it would be nice if the kernel gave an error message on wrong initrd format rather than silently failing... Yes, I forgot to make the cpio with the "-H newc" option :-/. ^ permalink raw reply [flat|nested] 72+ messages in thread
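For the record, the initramfs image the kernel expects here is a newc-format cpio archive (usually gzip-compressed), built along these lines (paths are illustrative):

    cd /path/to/initramfs-root
    find . | cpio -o -H newc | gzip -9 > /boot/initrd.img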
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 12:54 ` Ville Herva @ 2006-01-23 13:00 ` Steinar H. Gunderson 2006-01-23 13:54 ` Heinz Mauelshagen 2006-01-24 2:02 ` Phillip Susi 2 siblings, 0 replies; 72+ messages in thread From: Steinar H. Gunderson @ 2006-01-23 13:00 UTC (permalink / raw) To: Ville Herva Cc: Heinz Mauelshagen, Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel On Mon, Jan 23, 2006 at 02:54:20PM +0200, Ville Herva wrote: > If you really want the rest of us to convert from md to lvm, you should > perhaps give some attention to thee brittle userland (scripts and and > binaries). If you do not like the LVM userland, you might want to try the EVMS userland, which uses the same kernel code and (mostly) the same on-disk formats, but has a different front-end. > It is very tedious to have to debug a production system for a few hours in > order to get the rootfs mounted after each kernel update. This sounds a bit like an issue with your distribution, which should normally fix initrd/initramfs issues for you. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 12:54 ` Ville Herva 2006-01-23 13:00 ` Steinar H. Gunderson @ 2006-01-23 13:54 ` Heinz Mauelshagen 2006-01-23 17:33 ` Ville Herva 2006-01-24 2:02 ` Phillip Susi 2 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 13:54 UTC (permalink / raw) To: Ville Herva Cc: Heinz Mauelshagen, Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 02:54:20PM +0200, Ville Herva wrote: > On Mon, Jan 23, 2006 at 10:44:18AM +0100, you [Heinz Mauelshagen] wrote: > > > > > > I use the regularly to play with md and other stuff... > > > > Me too but for production, I want to avoid the > > additional stacking overhead and complexity. > > > > > So I remain unconvinced that code duplication is worth it for more than > > > "hark we want it so!" ;-) > > > > Shall I remove you from the list of potential testers of dm-raid45 then ;-) > > Heinz, > > If you really want the rest of us to convert from md to lvm, you should > perhaps give some attention to thee brittle userland (scripts and and > binaries). Sure :-) > > It is very tedious to have to debug a production system for a few hours in > order to get the rootfs mounted after each kernel update. > > The lvm error messages give almost no clue on the problem. > > Worse yet, problem reports on these issues are completely ignored on the lvm > mailing list, even when a patch is attached. > > (See > http://marc.theaimsgroup.com/?l=linux-lvm&m=113775502821403&w=2 > http://linux.msede.com/lvm_mlist/archive/2001/06/0205.html > http://linux.msede.com/lvm_mlist/archive/2001/06/0271.html > for reference.) Hrm, those are initscripts related, not lvm directly > > Such experience gives an impression lvm is not yet ready for serious > production use. initscripts/initramfs surely need to do the right thing in case root is on lvm. > > No offense intended, lvm kernel (lvm1 nor lvm2) code has never given me > trouble, and is probably as solid as anything. Alright. Is the initscript issue fixed now or still open ? Had you filed a bug against the distros initscripts ? > > > -- v -- > > v@iki.fi > > PS: Speaking of debugging failing initrd init scripts; it would be nice if > the kernel gave an error message on wrong initrd format rather than silently > failing... Yes, I forgot to make the cpio with the "-H newc" option :-/. -- Regards, Heinz -- The LVM Guy -- *** Software bugs are stupid. Nevertheless it needs not so stupid people to solve them *** =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 13:54 ` Heinz Mauelshagen @ 2006-01-23 17:33 ` Ville Herva 0 siblings, 0 replies; 72+ messages in thread From: Ville Herva @ 2006-01-23 17:33 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 02:54:28PM +0100, you [Heinz Mauelshagen] wrote: > > > > It is very tedious to have to debug a production system for a few hours in > > order to get the rootfs mounted after each kernel update. > > > > The lvm error messages give almost no clue on the problem. > > > > Worse yet, problem reports on these issues are completely ignored on the lvm > > mailing list, even when a patch is attached. > > > > (See > > http://marc.theaimsgroup.com/?l=linux-lvm&m=113775502821403&w=2 > > http://linux.msede.com/lvm_mlist/archive/2001/06/0205.html > > http://linux.msede.com/lvm_mlist/archive/2001/06/0271.html > > for reference.) > > Hrm, those are initscripts related, not lvm directly With the ancient LVM1 issue, my main problem was indeed that mkinitrd did not reserve enough space for the initrd. The LVM issue I posted to the LVM list was that LVM userland (vg_cfgbackup.c) did not check for errors while writing to the fs. The (ignored) patch added some error checking. But that's ancient, I think we can forget about that. The current issue (please see the first link) is about the need to add a "sleep 5" between lvm vgmknodes and mount -o defaults --ro -t ext3 /dev/root /sysroot . Otherwise, mounting fails. (Actually, I added "sleep 5" after every lvm command in the init script and did not narrow it down any more, since this was a production system, each boot took ages, and I had to get the system up as soon as possible.) To me it seemed some kind of problem with the lvm utilities, not with the initscripts. At least, the correct solution cannot be adding "sleep 5" here and there in the initscripts... > Alright. > Is the initscript issue fixed now or still open ? It is still open. Sadly, the only two systems this currently happens are production boxes and I cannot boot them at will for debugging. It is, however, 100% reproducible and I can try reasonable suggestions when I boot them the next time. Sorry about this. > Had you filed a bug against the distros initscripts ? No, since I wasn't sure the problem actually was in the initscript. Perhaps it does do something wrong, but the "sleep 5" workaround is pretty suspicious. Thanks for the reply. -- v -- v@iki.fi ^ permalink raw reply [flat|nested] 72+ messages in thread
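The workaround being described amounts to a fragment like this in the initrd's init script (the mount line is as quoted in the earlier report, the surrounding lvm calls are assumed context, and the sleep is a band-aid rather than a fix):

    lvm vgscan
    lvm vgchange -a y
    lvm vgmknodes
    sleep 5    # without this pause the following mount fails on the affected systems
    mount -o defaults --ro -t ext3 /dev/root /sysroot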
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 12:54 ` Ville Herva 2006-01-23 13:00 ` Steinar H. Gunderson 2006-01-23 13:54 ` Heinz Mauelshagen @ 2006-01-24 2:02 ` Phillip Susi 2 siblings, 0 replies; 72+ messages in thread From: Phillip Susi @ 2006-01-24 2:02 UTC (permalink / raw) To: vherva Cc: Heinz Mauelshagen, Lars Marowsky-Bree, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson Ville Herva wrote: > PS: Speaking of debugging failing initrd init scripts; it would be nice if > the kernel gave an error message on wrong initrd format rather than silently > failing... Yes, I forgot to make the cpio with the "-H newc" option :-/. > LOL, yea, that one got me too when I was first getting back into linux a few months ago and had to customize my initramfs to include dmraid to recognize my hardware fakeraid raid0. Then I discovered the mkinitramfs utility which makes things much nicer ;) ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 22:32 ` Neil Brown 2006-01-19 23:26 ` Phillip Susi @ 2006-01-20 7:51 ` Reuben Farrelly 2006-01-20 3:43 ` Andre' Breiler 1 sibling, 1 reply; 72+ messages in thread From: Reuben Farrelly @ 2006-01-20 7:51 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid On 20/01/2006 11:32 a.m., Neil Brown wrote: > On Thursday January 19, psusi@cfl.rr.com wrote: >> I'm currently of the opinion that dm needs a raid5 and raid6 module >> added, then the user land lvm tools fixed to use them, and then you >> could use dm instead of md. The benefit being that dm pushes things >> like volume autodetection and management out of the kernel to user space >> where it belongs. But that's just my opinion... > > The in-kernel autodetection in md is purely legacy support as far as I > am concerned. md does volume detection in user space via 'mdadm'. Hrm. <puzzled look> How would I then start my md0 raid-1 array that is mounted as the root partition / if I'm not doing this when the kernel is starting up? Because without it I've got no userspace to actually execute. Some of the other arrays with things like /var and /home could obviously be easily assembled soon after the kernel hands over control to userspace before the filesystem points are mounted, but for the root I am not quite sure how it could work... reuben ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 7:51 ` Reuben Farrelly @ 2006-01-20 3:43 ` Andre' Breiler 2006-01-21 0:42 ` David Greaves 0 siblings, 1 reply; 72+ messages in thread From: Andre' Breiler @ 2006-01-20 3:43 UTC (permalink / raw) To: linux-raid Hi, On Fri, 20 Jan 2006, Reuben Farrelly wrote: > On 20/01/2006 11:32 a.m., Neil Brown wrote: > > > > The in-kernel autodetection in md is purely legacy support as far as I > > am concerned. md does volume detection in user space via 'mdadm'. > > Hrm. <puzzled look> How would I then start my md0 raid-1 array that is > mounted as the root partition / if I'm not doing this when the kernel is > starting up? Because without it I've got no userspace to actually execute. Indeed you won't be able to use a 'plain kernel' anymore but will have to switch to kernel + initrd. This adds a good amount of useful flexibility (but makes life a little bit harder). The autodetect feature has been discussed multiple times with pros/cons for it (so I will skip that here), with the overall outcome that it is too dangerous a feature to leave in kernel space in the long term (btw. on some architectures it doesn't work at all anyway). > Some of the other arrays with things like /var and /home could obviously be > easily assembled soon after the kernel hands over control to userspace before > the filesystem points are mounted, but for the root I am not quite sure > how it > could work... A simple initrd. If you look at random distributions (e.g. Debian) you will see it done that way (yes, I don't think it's perfect as it is yet). Personally, on systems which change often I run without autodetect. On systems which hardly change, and where any disks that go in have been wiped first, I use the autodetection. Andre' ^ permalink raw reply [flat|nested] 72+ messages in thread
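To make the initrd route concrete, the userspace assembly of a root array comes down to a couple of lines in the initramfs init script. This is a sketch only; the UUID and the member device names are made-up example values:

  # assemble the root array explicitly instead of relying on kernel autodetect
  mdadm --assemble /dev/md0 --uuid=b87a211e:8a9bef34:ccc334d8:f84216d6 \
        /dev/sda1 /dev/sdb1
  mount -o ro /dev/md0 /sysroot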
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 3:43 ` Andre' Breiler @ 2006-01-21 0:42 ` David Greaves 0 siblings, 0 replies; 72+ messages in thread From: David Greaves @ 2006-01-21 0:42 UTC (permalink / raw) To: Andre' Breiler, neilb; +Cc: linux-raid Andre' Breiler wrote: >Hi, > >On Fri, 20 Jan 2006, Reuben Farrelly wrote: > > >>On 20/01/2006 11:32 a.m., Neil Brown wrote: >> >> >>>The in-kernel autodetection in md is purely legacy support as far as I >>>am concerned. md does volume detection in user space via 'mdadm'. >>> >>> >>Hrm. <puzzled look> How would I then start my md0 raid-1 array that is >>mounted as the root partition / if I'm not doing this when the kernel is >>starting up? Because without it I've got no userspace to actually execute. >> >> > >Indeed you won't be able to use a 'plain kernel' anymore but switch to >kernel + initrd. > > I understand the anti-autodetect arguments and was easily persuaded by them... Can I however suggest we have autodetect only actually triggers if the kernel is supplied with the UUID of an md0 as a boot option. ie: root=/dev/md0 md0=b87a211e:8a9bef34:ccc334d8:f84216d6 I'm not sure I ever saw this proposed during the debate. It's just *so* nice not to have to bother with initrd (having just moved to a mirrored root and finding it wonderfully easy to specify root=/dev/md0 and have it 'just work') It looks easy enough to add into autorun_devices() in md.c Maybe then deprecate autodetect more quickly by having it optional in 2.6.17(?) - but if even one UUID *is* specified then no other devices are autodetected. Eventually having no devices autodetected *unless* a UUID is specified (for convenience printk the UUIDs of devices that *are* found just in case you mistype it...) Please forgive me if I've missed the reason that this is a bad idea. David -- ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH 000 of 5] md: Introduction
@ 2006-01-17 6:56 NeilBrown
2006-01-17 8:17 ` Michael Tokarev
` (3 more replies)
0 siblings, 4 replies; 72+ messages in thread
From: NeilBrown @ 2006-01-17 6:56 UTC (permalink / raw)
To: linux-raid, linux-kernel; +Cc: Steinar H. Gunderson
Greetings.
In line with the principle of "release early", following are 5 patches
against md in 2.6.latest which implement reshaping of a raid5 array.
By this I mean adding 1 or more drives to the array and then re-laying
out all of the data.
This is still EXPERIMENTAL and could easily eat your data. Don't use it on
valuable data. Only use it for review and testing.
This release does not make ANY attempt to record how far the reshape
has progressed on stable storage. That means that if the process is
interrupted either by a crash or by "mdadm -S", then you completely
lose your data. All of it.
So don't use it on valuable data.
There are 5 patches to (hopefully) ease review. Comments are most
welcome, as are test results (providing they aren't done on valuable data:-).
You will need to enable the experimental MD_RAID5_RESHAPE config option
for this to work. Please read the help message that comes with it.
It gives an example mdadm command to effect a reshape (you do not need
a new mdadm; any vaguely recent version should work).
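For reference, the sort of invocation meant here looks roughly like the following; the device names and the target disk count are only illustrative:

  mdadm /dev/md0 --add /dev/sdd1          # add the new disk as a spare first
  mdadm --grow /dev/md0 --raid-disks=4    # then reshape the 3-disk raid5 onto it
  cat /proc/mdstat                        # reshape progress shows up here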
This code is based in part on earlier work by
"Steinar H. Gunderson" <sgunderson@bigfoot.com>
Though little of his code remains, having access to it, and having
discussed the issues with him greatly eased the process of creating
these patches. Thanks Steinar.
NeilBrown
[PATCH 001 of 5] md: Split disks array out of raid5 conf structure so it is easier to grow.
[PATCH 002 of 5] md: Allow stripes to be expanded in preparation for expanding an array.
[PATCH 003 of 5] md: Infrastructure to allow normal IO to continue while array is expanding.
[PATCH 004 of 5] md: Core of raid5 resize process
[PATCH 005 of 5] md: Final stages of raid5 expand code.
^ permalink raw reply [flat|nested] 72+ messages in thread* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown @ 2006-01-17 8:17 ` Michael Tokarev [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> ` (2 more replies) 2006-01-17 15:07 ` Mr. James W. Laferriere ` (2 subsequent siblings) 3 siblings, 3 replies; 72+ messages in thread From: Michael Tokarev @ 2006-01-17 8:17 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson NeilBrown wrote: > Greetings. > > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. Neil, is this online resizing/reshaping really needed? I understand all those words means alot for marketing persons - zero downtime, online resizing etc, but it is much safer and easier to do that stuff 'offline', on an inactive array, like raidreconf does - safer, easier, faster, and one have more possibilities for more complex changes. It isn't like you want to add/remove drives to/from your arrays every day... Alot of good hw raid cards are unable to perform such reshaping too. /mjt ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <fd8d0180601170121s1e6a55b7o@mail.gmail.com>]
* [PATCH 000 of 5] md: Introduction [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> @ 2006-01-17 9:38 ` Francois Barre 2006-01-19 0:35 ` Neil Brown 0 siblings, 1 reply; 72+ messages in thread From: Francois Barre @ 2006-01-17 9:38 UTC (permalink / raw) To: linux-raid 2006/1/17, Michael Tokarev <mjt@tls.msk.ru>: > NeilBrown wrote: > > Greetings. > > > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > > Neil, is this online resizing/reshaping really needed? Congratulations Neil, I was really expecting this feature, and will test as soon as possible. IMHO, being able to resize 'online' is really interesting, so I would thank Neil and Steinar much more than I would blame them, Michael :-p. Regarding box crash and process interruption, what is the remaining work to be done to save the process status efficiently, in order to resume resize process ? In my case, I would really wish to trust resizing enough to use it on working env. May I help you ? Anyway, the patchset you submitted appeared to me so clearly, neat and simple, that it looks a piece of cake to make it secure. I know it's wrong, but you know, you can take it as a congratulation for your code quality :-p Regards, F.-E.B. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 9:38 ` Francois Barre @ 2006-01-19 0:35 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:35 UTC (permalink / raw) To: Francois Barre; +Cc: linux-raid On Tuesday January 17, francois.barre@gmail.com wrote: > Regarding box crash and process interruption, what is the remaining > work to be done to save the process status efficiently, in order to > resume resize process ? design and implement ... It's not particularly hard, but it is a separate task and I wanted to keep it separate to ease code review. > In my case, I would really wish to trust resizing enough to use it on > working env. > May I help you ? Testing and code review is probably the most helpful thing you can do, thanks. > > Anyway, the patchset you submitted appeared to me so clearly, neat and > simple, that it looks a piece of cake to make it secure. I know it's > wrong, but you know, you can take it as a congratulation for your code > quality :-p Thanks :-) NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 8:17 ` Michael Tokarev [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> @ 2006-01-17 9:50 ` Sander 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 14:10 ` Steinar H. Gunderson 2 siblings, 1 reply; 72+ messages in thread From: Sander @ 2006-01-17 9:50 UTC (permalink / raw) To: Michael Tokarev; +Cc: NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson Michael Tokarev wrote (ao): > NeilBrown wrote: > > Greetings. > > > > In line with the principle of "release early", following are 5 > > patches against md in 2.6.latest which implement reshaping of a > > raid5 array. By this I mean adding 1 or more drives to the array and > > then re-laying out all of the data. > > Neil, is this online resizing/reshaping really needed? I understand > all those words means alot for marketing persons - zero downtime, > online resizing etc, but it is much safer and easier to do that stuff > 'offline', on an inactive array, like raidreconf does - safer, easier, > faster, and one have more possibilities for more complex changes. It > isn't like you want to add/remove drives to/from your arrays every > day... Alot of good hw raid cards are unable to perform such reshaping > too. I like the feature. Not only marketing prefers zero downtime you know :-) Actually, I don't understand why you bother at all. One writes the feature. Another uses it. How would this feature harm you? Kind regards, Sander -- Humilis IT Services and Solutions http://www.humilis.net ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 9:50 ` Sander @ 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 11:37 ` Francois Barre ` (3 more replies) 0 siblings, 4 replies; 72+ messages in thread From: Michael Tokarev @ 2006-01-17 11:26 UTC (permalink / raw) To: sander; +Cc: NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson Sander wrote: > Michael Tokarev wrote (ao): [] >>Neil, is this online resizing/reshaping really needed? I understand >>all those words means alot for marketing persons - zero downtime, >>online resizing etc, but it is much safer and easier to do that stuff >>'offline', on an inactive array, like raidreconf does - safer, easier, >>faster, and one have more possibilities for more complex changes. It >>isn't like you want to add/remove drives to/from your arrays every >>day... Alot of good hw raid cards are unable to perform such reshaping >>too. [] > Actually, I don't understand why you bother at all. One writes the > feature. Another uses it. How would this feature harm you? This is about code complexity/bloat. It's already complex enough. I rely on the stability of the linux softraid subsystem, and want it to be reliable. Adding more features, especially non-trivial ones, does not buy you a bug-free raid subsystem, just the opposite: it will have more chances to crash, to eat your data etc, and will be harder in finding/fixing bugs. Raid code is already too fragile; I'm afraid "simple" I/O errors (which is what we need raid for) may crash the system already, and I am waiting for the next whole system crash due to e.g. a superblock update error or whatnot. I saw all sorts of failures due to linux softraid already (we use it here a lot), including ones which required complete array rebuild with heavy data loss. Any "unnecessary bloat" (note the quotes: I understand some people like this and other features) makes the whole system even more fragile than it is already. Compare this with my statement about the "offline" "reshaper" above: a separate userspace program (easier to write/debug compared with kernel space) which operates on an inactive array (no locking needed, no need to worry about other I/O operations going to the array at the time of reshaping etc), with an ability to plan its I/O strategy in a much more efficient and safer way... Yes, this approach has one downside: the array has to be inactive. But in my opinion it's worth it, compared to more possibilities to lose your data, even if you do NOT use that feature at all... /mjt ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev @ 2006-01-17 11:37 ` Francois Barre 2006-01-17 14:03 ` Kyle Moffett ` (2 subsequent siblings) 3 siblings, 0 replies; 72+ messages in thread From: Francois Barre @ 2006-01-17 11:37 UTC (permalink / raw) To: linux-raid 2006/1/17, Michael Tokarev <mjt@tls.msk.ru>: > Sander wrote: > This is about code complexity/bloat. It's already complex enouth. > I rely on the stability of the linux softraid subsystem, and want > it to be reliable. Adding more features, especially non-trivial > ones, does not buy you bugfree raid subsystem, just the opposite: > it will have more chances to crash, to eat your data etc, and will > be harder in finding/fixing bugs. > > Raid code is already too fragile, i'm afraid "simple" I/O errors > (which is what we need raid for) may crash the system already, and > am waiting for the next whole system crash due to eg superblock > update error or whatnot. I saw all sorts of failures due to > linux softraid already (we use it here alot), including ones > which required complete array rebuild with heavy data loss. > > Any "unnecessary bloat" (note the quotes: I understand some > people like this and other features) makes whole system even > more fragile than it is already. > > Compare this with my statement about "offline" "reshaper" above: > separate userspace (easier to write/debug compared with kernel > space) program which operates on an inactive array (no locking > needed, no need to worry about other I/O operations going to the > array at the time of reshaping etc), with an ability to plan it's > I/O strategy in alot more efficient and safer way... Yes this > apprpach has one downside: the array has to be inactive. But in > my opinion it's worth it, compared to more possibilities to lose > your data, even if you do NOT use that feature at all... > > /mjt I do agree with you about that : the lesser the code, the fewer the bugs. Of course. But I do think that Linux would not have become what it is now if each-and-every new feature was debated for years and years about their risk. Anyway, having a suspiously bogus raid5 resize is not a fatality : what about having a 'paranoïd' option/strategy, decreasing performance but mirroring superblocks on another device (not necessarily on the array), logging/journalling loads of stuff (metadata quite exclusively), and making resize much much stronger ? I prefer having my raid5 online and have a resize time of 12 hours than putting it offline and have a resize time of 2 hours. This is not marketting, this is the way computers shall behave :-p. Trustworthy features and flexibility in the same box. F.-E.B. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 11:37 ` Francois Barre @ 2006-01-17 14:03 ` Kyle Moffett 2006-01-19 0:28 ` Neil Brown 0 siblings, 1 reply; 72+ messages in thread From: Kyle Moffett @ 2006-01-17 14:03 UTC (permalink / raw) To: Michael Tokarev Cc: sander, NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson On Jan 17, 2006, at 06:26, Michael Tokarev wrote: > This is about code complexity/bloat. It's already complex enouth. > I rely on the stability of the linux softraid subsystem, and want > it to be reliable. Adding more features, especially non-trivial > ones, does not buy you bugfree raid subsystem, just the opposite: > it will have more chances to crash, to eat your data etc, and will > be harder in finding/fixing bugs. What part of: "You will need to enable the experimental MD_RAID5_RESHAPE config option for this to work." isn't obvious? If you don't want this feature, either don't turn on CONFIG_MD_RAID5_RESHAPE, or don't use the raid5 mdadm reshaping command. This feature might be extremely useful for some people (including me on occasion), but I would not trust it even on my family's fileserver (let alone a corporate one) until it's been through several generations of testing and bugfixing. Cheers, Kyle Moffett -- There is no way to make Linux robust with unreliable memory subsystems, sorry. It would be like trying to make a human more robust with an unreliable O2 supply. Memory just has to work. -- Andi Kleen ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 14:03 ` Kyle Moffett @ 2006-01-19 0:28 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:28 UTC (permalink / raw) To: Kyle Moffett Cc: Michael Tokarev, sander, linux-raid, linux-kernel, Steinar H. Gunderson On Tuesday January 17, mrmacman_g4@mac.com wrote: > On Jan 17, 2006, at 06:26, Michael Tokarev wrote: > > This is about code complexity/bloat. It's already complex enouth. > > I rely on the stability of the linux softraid subsystem, and want > > it to be reliable. Adding more features, especially non-trivial > > ones, does not buy you bugfree raid subsystem, just the opposite: > > it will have more chances to crash, to eat your data etc, and will > > be harder in finding/fixing bugs. > > What part of: "You will need to enable the experimental > MD_RAID5_RESHAPE config option for this to work." isn't bvious? If > you don't want this feature, either don't turn on > CONFIG_MD_RAID5_RESHAPE, or don't use the raid5 mdadm reshaping > command. This isn't really a fair comment. CONFIG_MD_RAID5_RESHAPE just enables the code. All the code is included whether this config option is set or not. So if code-bloat were an issue, the config option wouldn't answer it. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 11:37 ` Francois Barre 2006-01-17 14:03 ` Kyle Moffett @ 2006-01-17 16:08 ` Ross Vandegrift 2006-01-17 18:12 ` Michael Tokarev 2006-01-17 22:38 ` Phillip Susi 3 siblings, 1 reply; 72+ messages in thread From: Ross Vandegrift @ 2006-01-17 16:08 UTC (permalink / raw) To: Michael Tokarev; +Cc: linux-raid, linux-kernel On Tue, Jan 17, 2006 at 02:26:11PM +0300, Michael Tokarev wrote: > Raid code is already too fragile, i'm afraid "simple" I/O errors > (which is what we need raid for) may crash the system already, and > am waiting for the next whole system crash due to eg superblock > update error or whatnot. I think you've got some other problem if simple I/O errors cause issues. I've managed hundreds of MD arrays over the past ~ten years. MD is rock solid. I'd guess that I've recovered at least a hundred disk failures where data was saved by mdadm. What is your setup like? It's also possible that you've found a bug. > I saw all sorts of failures due to > linux softraid already (we use it here alot), including ones > which required complete array rebuild with heavy data loss. Are you sure? The one thing that's not always intuitive about MD - a failed array often still has your data and you can recover it. Unlike hardware RAID solutions, you have a lot of control over how the disks are assembled and used - this can be a major advantage. I'd say once a week someone comes on the linux-raid list and says "Oh no! I accidentally ruined my RAID array!". Neil almost always responds "Well, don't do that! But since you did, this might help...". -- Ross Vandegrift ross@lug.udel.edu "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 ^ permalink raw reply [flat|nested] 72+ messages in thread
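The usual shape of that "this might help" advice is a forced assemble. A sketch with made-up device names; --force re-marks recently failed members as usable, so it should only be tried after reading the kernel log to see which member dropped out last:

  mdadm --stop /dev/md0                   # if a half-assembled array is in the way
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
  mdadm --detail /dev/md0                 # check which members came back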
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 16:08 ` Ross Vandegrift @ 2006-01-17 18:12 ` Michael Tokarev 2006-01-18 8:14 ` Sander 2006-01-19 0:22 ` Neil Brown 0 siblings, 2 replies; 72+ messages in thread From: Michael Tokarev @ 2006-01-17 18:12 UTC (permalink / raw) To: Ross Vandegrift; +Cc: linux-raid, linux-kernel Ross Vandegrift wrote: > On Tue, Jan 17, 2006 at 02:26:11PM +0300, Michael Tokarev wrote: > >>Raid code is already too fragile, i'm afraid "simple" I/O errors >>(which is what we need raid for) may crash the system already, and >>am waiting for the next whole system crash due to eg superblock >>update error or whatnot. > > I think you've got some other issue if simple I/O errors cause issues. > I've managed hundreds of MD arrays over the past ~ten years. MD is > rock solid. I'd guess that I've recovered at least a hundred disk failures > where data was saved by mdadm. > > What is your setup like? It's also possible that you've found a bug. We've about 500 systems with raid1, raid5 and raid10 running for about 5 or 6 years (since 0.90 beta patched into 2.2 kernel -- I don't think linux softraid existed before, or, rather, I can't say that was something which was possible to use in production). Most problematic case so far, which I described numerous times (like, "why linux raid isn't Raid really, why it can be worse than plain disk") is when, after single sector read failure, md kicks the whole disk off the array, and when you start resync (after replacing the "bad" drive or just remapping that bad sector or even doing nothing, as it will be remapped in almost all cases during write, on real drives anyway), you find another "bad sector" on another drive. After this, the array can't be started anymore, at least not w/o --force (ie, requires some user intervention, which is sometimes quite difficult if the server is several 100s miles away). More, it's quite difficult to recover it even manually (after --force'ing it to start), without fixing that bad sector somehow -- if first drive failure is "recent enouth" we've a hope that this very sector can be read from that first drive. if the alot of filesystem activity happened since that time, that chances are quite small; and with raid5 it's quite difficult to say where the error is in the filesystem, due to the complex layout of raid5. But this has been described here numerous times, and - hopefully - with current changes (re-writing of bad blocks) this very issue will go away, at least most common scenario of it (i'd try to keep even "bad" drive, even after some write errors, because it still contains some data which can be read; but that's problematic to say the best because one has to store a list of bad blocks somewhere...). (And no, I don't have all bad/cheap drives - it's just when you have hundreds or 1000s of drives, you've quite high probability that some of them will fail sometimes, or will develop a bad sector etc). >>I saw all sorts of failures due to >>linux softraid already (we use it here alot), including ones >>which required complete array rebuild with heavy data loss. > > Are you sure? The one thing that's not always intuitive about MD - a > faild array often still has your data and you can recover it. Unlike > hardware RAID solutions, you have a lot of control over how the disks > are assembled and used - this can be a major advantage. > > I'd say once a week someone comes on the linux-raid list and says "Oh no! > I accidently ruined my RAID array!". Neil almost always responds "Well, > don't do that! 
But since you did, this might help...". I know that. And I've quite some expirience too, and I studied mdadm source. There was in fact two cases like that, not one. First was mostly due to operator error, or lack of better choice at 2.2 (or early 2.4) times -- I relied on raid autodetection (which I don't do anymore, and strongly suggest others to switch to mdassemble or something like that) -- a drive failed (for real, not bad blocks) and needed to be replaced, and I forgot to clear the partition table on the replacement drive (which was in our testing box) - in a result, kernel assembled a raid5 out of components which belonged to different arrays.. I only vaguely remember what it was at that time -- maybe kernel or I started reconstruction (not noticiyng the wrong array), or i mounted the filesystem - can't say anymore for sure, but the result was that I wasn't able to restore the filesystem, because i didn't have that filesystem anymore. (it should have been assembling boot raid1 array but assembled a degraided raid5 instead) And second case was when, after an attempt to resync the array (after that famous 'bad block kicked off the whole disk) which resulted in an OOPS (which I didn't notice immediately, but it continued the resync), it wrote some garbage all over, resulting in badly broken filesystem, and somewhat broken nearby partition too (which I was able to recover). It was at about 2.4.19 or so, and I had that situation only once. Granted, I can't blame raid code for all this, because I don't even know what was in the oops (machine locked hard but someone who was near the server noticied it OOPSed) - it sure may be a bug somewhere else. As a sort of conclusion. There are several features that can be implemented in linux softraid code to make it real Raid, with data safety goal. One example is to be able to replace a "to-be-failed" drive (think SMART failure predictions for example) without removing it from the array with a (hot)spare (or just a replacement) -- by adding the new drive to the array *first*, and removing the to-be-replaced one only after new is fully synced. Another example is to implement some NVRAM-like storage for metadata (this will require the necessary hardware as well, like eg a flash card -- I dunno how safe it can be). And so on. The current MD code is "almost here", almost real. It still has some (maybe minor) problems, it still lacks some (again maybe minor) features wrt data safety. Ie, it still can fail, but it's almost here. While current development is going to implement some new and non-trivial features which are of little use in real life. Face it: yes it's good when you're able to reshape your array online keeping servicing your users, but i'd go for even 12 hours downtime if i know my data is safe, instead of unknown downtime after I realize the reshape failed for some reason and I dont have my data anymore. And yes it's very rarely used (which adds to the problem - rarely used code paths with bugs with stays unfound for alot of time, and bite you at a very unexpected moment, when you think it's all ok...) Well, not all is that bad really. I really apprecate Neil's work, it's all his baby after all, and I owe him alot of stuff because of all our machines which, due to raid code, are running fine (most of them anyway). I had a hopefully small question, whenever the new features are really useful, and just described my point of view to the topic.. And answered your, Ross, questions as well.. ;) Thank you. 
/mjt ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 18:12 ` Michael Tokarev @ 2006-01-18 8:14 ` Sander 2006-01-18 8:37 ` Brad Campbell ` (2 more replies) 2006-01-19 0:22 ` Neil Brown 1 sibling, 3 replies; 72+ messages in thread From: Sander @ 2006-01-18 8:14 UTC (permalink / raw) To: Michael Tokarev; +Cc: Ross Vandegrift, linux-raid, linux-kernel Michael Tokarev wrote (ao): > Most problematic case so far, which I described numerous times (like, > "why linux raid isn't Raid really, why it can be worse than plain > disk") is when, after single sector read failure, md kicks the whole > disk off the array, and when you start resync (after replacing the > "bad" drive or just remapping that bad sector or even doing nothing, > as it will be remapped in almost all cases during write, on real > drives anyway), If the (harddisk internal) remap succeeded, the OS doesn't see the bad sector at all I believe. If you (the OS) do see a bad sector, the disk couldn't remap, and goes downhill from there, right? Sander -- Humilis IT Services and Solutions http://www.humilis.net ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 8:14 ` Sander @ 2006-01-18 8:37 ` Brad Campbell 2006-01-18 9:03 ` Alan Cox 2006-01-18 12:46 ` John Hendrikx 2 siblings, 0 replies; 72+ messages in thread From: Brad Campbell @ 2006-01-18 8:37 UTC (permalink / raw) To: sander; +Cc: Michael Tokarev, Ross Vandegrift, linux-raid linux-kernel snipped from cc list. Sander wrote: > Michael Tokarev wrote (ao): >> Most problematic case so far, which I described numerous times (like, >> "why linux raid isn't Raid really, why it can be worse than plain >> disk") is when, after single sector read failure, md kicks the whole >> disk off the array, and when you start resync (after replacing the >> "bad" drive or just remapping that bad sector or even doing nothing, >> as it will be remapped in almost all cases during write, on real >> drives anyway), This particular case has been addressed in the latest kernels. md will now attempt to write the bad block back using reconstructed data and the disk will only be punted after multiple failures or a write failure (if my understanding of the patches is any good anyway) > If the (harddisk internal) remap succeeded, the OS doesn't see the bad > sector at all I believe. If the disk can get a good read then it will re-map on the fly and the OS has no idea there was an issue. If not then it returns a read error to the OS. When that sector is next written it will be re-mapped by the drive and the error disappears. > If you (the OS) do see a bad sector, the disk couldn't remap, and goes > downhill from there, right? With the older md code, yes, however as stated above this should almost become a non-issue now. (yay!) Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 8:14 ` Sander 2006-01-18 8:37 ` Brad Campbell @ 2006-01-18 9:03 ` Alan Cox 2006-01-18 12:46 ` John Hendrikx 2 siblings, 0 replies; 72+ messages in thread From: Alan Cox @ 2006-01-18 9:03 UTC (permalink / raw) To: sander; +Cc: Michael Tokarev, Ross Vandegrift, linux-raid, linux-kernel On Mer, 2006-01-18 at 09:14 +0100, Sander wrote: > If the (harddisk internal) remap succeeded, the OS doesn't see the bad > sector at all I believe. True for ATA, in the SCSI case you may be told about the remap having occurred but its a "by the way" type message not an error proper. > If you (the OS) do see a bad sector, the disk couldn't remap, and goes > downhill from there, right? If a hot spare is configured it will be dropped into the configuration at that point. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 8:14 ` Sander 2006-01-18 8:37 ` Brad Campbell 2006-01-18 9:03 ` Alan Cox @ 2006-01-18 12:46 ` John Hendrikx 2006-01-18 12:51 ` Gordon Henderson 2006-01-18 23:54 ` Neil Brown 2 siblings, 2 replies; 72+ messages in thread From: John Hendrikx @ 2006-01-18 12:46 UTC (permalink / raw) To: linux-raid Sander wrote: > Michael Tokarev wrote (ao): > >> Most problematic case so far, which I described numerous times (like, >> "why linux raid isn't Raid really, why it can be worse than plain >> disk") is when, after single sector read failure, md kicks the whole >> disk off the array, and when you start resync (after replacing the >> "bad" drive or just remapping that bad sector or even doing nothing, >> as it will be remapped in almost all cases during write, on real >> drives anyway), >> > > If the (harddisk internal) remap succeeded, the OS doesn't see the bad > sector at all I believe. > Most hard disks will not remap sectors when reading fails, because then the contents would be lost permanently. Instead, they will report a failure to the OS, hoping that the sector might be readable at some later time. What Linux Raid could do is reconstructing the sector that failed from the other drives and then writing it to disk. Because the original contents of the sector will be lost on writing, your hard disk can safely remap the sector (and it will -- I often "repaired" bad sectors by writing to them). > If you (the OS) do see a bad sector, the disk couldn't remap, and goes > downhill from there, right? > Not necessarily, if you see a bad sector after *writing* to it (several times), then your hard disk will probably go bad soon. Most hard disks only remap sectors on write, so a simple full format can fix sectors that failed on read. I agree with the original poster though, I'd really love to see Linux Raid take special action on sector read failures. It happens about 5-6 times a year here that a disk gets kicked out of the array for a simple read failure. A rebuild of the array will fix it without a trace, but a rebuild takes about 3 hours :) --John ^ permalink raw reply [flat|nested] 72+ messages in thread
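For completeness, the "repair by writing" trick looks something like the following. Illustrative only: the LBA and device are made up, the write destroys whatever was in that sector, and on an md member you would normally want the raid layer to do the rewrite with reconstructed data instead:

  # overwrite one 512-byte sector so the drive gets a chance to remap it on write
  dd if=/dev/zero of=/dev/sdb bs=512 seek=1234567 count=1
  # then see whether the drive's remap counters moved (assumes smartmontools)
  smartctl -A /dev/sdb | grep -i -e reallocated -e pending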
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 12:46 ` John Hendrikx @ 2006-01-18 12:51 ` Gordon Henderson 2006-01-18 23:51 ` Neil Brown 1 sibling, 1 reply; 72+ messages in thread From: Gordon Henderson @ 2006-01-18 12:51 UTC (permalink / raw) To: linux-raid On Wed, 18 Jan 2006, John Hendrikx wrote: > I agree with the original poster though, I'd really love to see Linux > Raid take special action on sector read failures. It happens about 5-6 > times a year here that a disk gets kicked out of the array for a simple > read failure. A rebuild of the array will fix it without a trace, but a > rebuild takes about 3 hours :) One thing that's well worth doing before simply fail/remove/add the drive with the bad sector, is to do a read-only test on the other drives/partitions in the rest of the set. That way you won't find out half-way through the resync that other drives have failures, and then lose the lot. It adds time to the whole operation, but it's worth it IMO. Gordon ^ permalink raw reply [flat|nested] 72+ messages in thread
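That read-only sweep can be as simple as the following; the member names are examples, and a non-zero exit from dd is the "think twice before rebuilding" signal:

  for d in /dev/sda1 /dev/sdb1 /dev/sdc1; do
      echo "reading $d"
      dd if="$d" of=/dev/null bs=1M || echo "READ ERROR on $d"
  done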
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 12:51 ` Gordon Henderson @ 2006-01-18 23:51 ` Neil Brown 2006-01-19 7:20 ` PFC 0 siblings, 1 reply; 72+ messages in thread From: Neil Brown @ 2006-01-18 23:51 UTC (permalink / raw) To: Gordon Henderson; +Cc: linux-raid On Wednesday January 18, gordon@drogon.net wrote: > On Wed, 18 Jan 2006, John Hendrikx wrote: > > > I agree with the original poster though, I'd really love to see Linux > > Raid take special action on sector read failures. It happens about 5-6 > > times a year here that a disk gets kicked out of the array for a simple > > read failure. A rebuild of the array will fix it without a trace, but a > > rebuild takes about 3 hours :) > > One thing that's well worth doing before simply fail/remove/add the drive > with the bad sector, is to do a read-only test on the other > drives/paritions in the rest of the set. That way you won't find out > half-way through the resync that other drives have failures, and then lose > the lot. It adds time to the whole operation, but it's worth it IMO. But what do you do if the read-only test fails... I guess you try to reconstruct using the nearly-failed drive... What might be good and practical is to not remove a failed drive completely, but to hold on to it and only read from it in desperation while reconstructing a spare. That might be worth the effort... NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 23:51 ` Neil Brown @ 2006-01-19 7:20 ` PFC 2006-01-19 8:01 ` dean gaudet 0 siblings, 1 reply; 72+ messages in thread From: PFC @ 2006-01-19 7:20 UTC (permalink / raw) To: Neil Brown, Gordon Henderson; +Cc: linux-raid While we're at it, here's a little issue I had with RAID5 ; not really the fault of md, but you might want to know... I have a 5x250GB RAID5 array for home storage (digital photo, my lossless ripped cds, etc). 1 IDE Drive ave 4 SATA Drives. Now, turns out one of the SATA drives is a Maxtor 6V250F0, and these have problems ; it died, then was RMA'd, then died again. Finally, it turned out this drive series is incompatible with nvidia sata chipsets. A third drive seems to work, setting the jumper to SATA 150. Back to the point. Failure mode of these drives is an IDE command timeout. This takes a long time ! So, when the drive has failed, each command to it takes forever. md will eventually reject said drive, but it takes hours ; and meanwhile, the computer is unusable and data is offline... In this case, the really tempting solution is to hit the windows key (er, the hard reset button) ; but doing this, makes the array dirty and degraded, and it won't mount, and all data is seemingly lost. (well, recoverable with a bit of hacking /* goto error; */, but that's not very clean...) This isn't really a md issue, but it's really annoying only when using RAID, because it makes a normal process (kicking a dead drive out) so slow it's almost non-functional. Is there a way to modify the timeout in question ? Note that, re-reading the log below, it writes "Disk failure on sdd1, disabling device. Operation continuing on 4 devices", but errors continue to come, and the array is still unreachable (ie. cat /proc/mdstat hangs, etc). Hmm... Thanks for the time. Jan 8 21:38:41 apollo13 ReiserFS: md2: checking transaction log (md2) Jan 8 21:39:11 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21 Jan 8 21:39:11 apollo13 ata4: translated ATA stat/err 0xca/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jan 8 21:39:11 apollo13 ata4: status=0xca { Busy } Jan 8 21:39:11 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002 Jan 8 21:39:11 apollo13 sdd: Current: sense key=0xb Jan 8 21:39:11 apollo13 ASC=0x47 ASCQ=0x0 Jan 8 21:39:11 apollo13 Info fld=0x3f Jan 8 21:39:11 apollo13 end_request: I/O error, dev sdd, sector 63 Jan 8 21:39:11 apollo13 raid5: Disk failure on sdd1, disabling device. 
Operation continuing on 4 devices Jan 8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:41 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21 Jan 8 21:39:41 apollo13 ata4: translated ATA stat/err 0xca/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jan 8 21:39:41 apollo13 ata4: status=0xca { Busy } Jan 8 21:39:41 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002 Jan 8 21:39:41 apollo13 sdd: Current: sense key=0xb Jan 8 21:39:41 apollo13 ASC=0x47 ASCQ=0x0 Jan 8 21:39:41 apollo13 Info fld=0x9840097 Jan 8 21:39:41 apollo13 end_request: I/O error, dev sdd, sector 159645847 Jan 8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:40:01 apollo13 cron[17973]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons ) Jan 8 21:40:11 apollo13 ata4: command 0x35 timeout, stat 0xd0 host_stat 0x21 Jan 8 21:40:11 apollo13 ata4: translated ATA stat/err 0x35/00 to SCSI SK/ASC/ASCQ 0x4/00/00 Jan 8 21:40:11 apollo13 ata4: status=0x35 { DeviceFault SeekComplete CorrectedError Error } Jan 8 21:40:11 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002 Jan 8 21:40:11 apollo13 sdd: Current: sense key=0x4 Jan 8 21:40:11 apollo13 ASC=0x0 ASCQ=0x0 Jan 8 21:40:11 apollo13 end_request: I/O error, dev sdd, sector 465232831 ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 7:20 ` PFC @ 2006-01-19 8:01 ` dean gaudet 0 siblings, 0 replies; 72+ messages in thread From: dean gaudet @ 2006-01-19 8:01 UTC (permalink / raw) To: PFC; +Cc: Neil Brown, Gordon Henderson, linux-raid On Thu, 19 Jan 2006, PFC wrote: > This isn't really a md issue, but it's really annoying only when using > RAID, because it makes a normal process (kicking a dead drive out) so slow > it's almost non-functional. Is there a way to modify the timeout in question ? yeah i posted to l-k about similar problems a while back... i've got a disk which boots fine but fails all writes... useful for showing just how bad the system can become with a dead/dying disk. western digital is selling "raid edition" disks now -- and part of their marketing material discusses the long timeout which commodity disks implement <http://www.westerndigital.com/en/library/sata/2579-001098.pdf>. the raid edition disks give up earlier on the assumption the raid layer is going to take care of things. it's really too bad this isn't just a tunable parameter of the disk. even still -- the linux kernel could probably do something about this... drivers could have a blockdev(8) tunable timeout, and a mode where the driver just gives up entirely on the device at the first error/timeout and return EIO for all outstanding requests at that point... and the driver could remain in this state until an explicit request to re-attempt normal operations. -dean ^ permalink raw reply [flat|nested] 72+ messages in thread
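There is at least one tunable in roughly that direction already: the SCSI midlayer exposes a per-device command timeout through sysfs. Note this bounds how long the kernel waits per command, not how long the drive retries internally, and whether shortening it actually helps depends on the low-level driver; treat the path and the value as assumptions to verify, not a fix:

  cat /sys/block/sdd/device/timeout        # seconds the midlayer allows per command
  echo 10 > /sys/block/sdd/device/timeout  # shorten it for a raid member (example value)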
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 12:46 ` John Hendrikx 2006-01-18 12:51 ` Gordon Henderson @ 2006-01-18 23:54 ` Neil Brown 1 sibling, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-18 23:54 UTC (permalink / raw) To: John Hendrikx; +Cc: linux-raid On Wednesday January 18, hjohn@xs4all.nl wrote: > > I agree with the original poster though, I'd really love to see Linux > Raid take special action on sector read failures. It happens about 5-6 > times a year here that a disk gets kicked out of the array for a simple > read failure. A rebuild of the array will fix it without a trace, but a > rebuild takes about 3 hours :) See 2.6.15 (for raid5) or 2.6.16-rc1 (for raid1). You'll love it! NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 18:12 ` Michael Tokarev 2006-01-18 8:14 ` Sander @ 2006-01-19 0:22 ` Neil Brown 2006-01-19 9:01 ` Jakob Oestergaard 1 sibling, 1 reply; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:22 UTC (permalink / raw) To: Michael Tokarev; +Cc: Ross Vandegrift, linux-raid, linux-kernel On Tuesday January 17, mjt@tls.msk.ru wrote: > > As a sort of conclusion. > > There are several features that can be implemented in linux softraid > code to make it real Raid, with data safety goal. One example is to > be able to replace a "to-be-failed" drive (think SMART failure > predictions for example) without removing it from the array with a > (hot)spare (or just a replacement) -- by adding the new drive to the > array *first*, and removing the to-be-replaced one only after new is > fully synced. Another example is to implement some NVRAM-like storage > for metadata (this will require the necessary hardware as well, like > eg a flash card -- I dunno how safe it can be). And so on. proactive replacement before complete failure is a good idea and is (just recently) on my todo list. It shouldn't be too hard. > > The current MD code is "almost here", almost real. It still has some > (maybe minor) problems, it still lacks some (again maybe minor) features > wrt data safety. Ie, it still can fail, but it's almost here. concrete suggestions are always welcome (though sometimes you might have to put some effort into convincing me...) > > While current development is going to implement some new and non-trivial > features which are of little use in real life. Face it: yes it's good > when you're able to reshape your array online keeping servicing your > users, but i'd go for even 12 hours downtime if i know my data is safe, > instead of unknown downtime after I realize the reshape failed for some > reason and I dont have my data anymore. And yes it's very rarely used > (which adds to the problem - rarely used code paths with bugs with stays > unfound for alot of time, and bite you at a very unexpected moment, when > you think it's all ok...) If you look at the amount of code in the 'reshape raid5' patch you will notice that it isn't really very much. It reuses a lot of the infrastructure that is already present in md/raid5. So a reshape actually uses a lot of code that is used very often. Compare this to an offline solution (raidreconfig) where all the code is only used occasionally. You could argue that the online version has more code safety than the offline version.... NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 0:22 ` Neil Brown @ 2006-01-19 9:01 ` Jakob Oestergaard 0 siblings, 0 replies; 72+ messages in thread From: Jakob Oestergaard @ 2006-01-19 9:01 UTC (permalink / raw) To: Neil Brown; +Cc: Michael Tokarev, Ross Vandegrift, linux-raid, linux-kernel On Thu, Jan 19, 2006 at 11:22:31AM +1100, Neil Brown wrote: ... > Compare this to an offline solution (raidreconfig) where all the code > is only used occasionally. You could argue that the online version > has more code safety than the offline version.... Correct. raidreconf, however, can convert a 2 disk RAID-0 to a 4 disk RAID-5 for example - the whole design of raidreconf is fundamentally different (of course) from the on-line reshape. The on-line reshape can be (and should be) much simpler. Now, back when I wrote raidreconf, my thoughts were that md would be merged into dm, and that raidreconf should evolve into something like 'pvmove' - a user-space tool that moves blocks around, interfacing with the kernel as much as strictly necessary, allowing hot reconfiguration of RAID setups. That was the idea. Reality, however, seems to be that MD is not moving quickly into DM (for whatever reasons). Also, I haven't had the time to actually just move on this myself. Today, raidreconf is used by some, but it is not maintained, and it is often too slow for comfortable off-line usage (reconfiguration of TB sized arrays is slow - not so much because of raidreconf, but because there simply is a lot of data that needs to be moved around). I still think that putting MD into DM and extending pvmove to include raidreconf functionality, would be the way to go. The final solution should also be tolerant (like pvmove is today) of power cycles during reconfiguration - the operation should be re-startable. Anyway - this is just me dreaming - I don't have time to do this and it seems that currently noone else has either. Great initiative with the reshape Neil - hot reconfiguration is much needed - personally I still hope to see MD move into DM and pvmove including raidreconf functionality, but I guess that when we're eating an elephant we should be satisfied with taking one bite at a time :) -- / jakob ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev ` (2 preceding siblings ...) 2006-01-17 16:08 ` Ross Vandegrift @ 2006-01-17 22:38 ` Phillip Susi 2006-01-17 22:57 ` Neil Brown 3 siblings, 1 reply; 72+ messages in thread From: Phillip Susi @ 2006-01-17 22:38 UTC (permalink / raw) To: Michael Tokarev Cc: sander, NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson Michael Tokarev wrote: <snip> > Compare this with my statement about "offline" "reshaper" above: > separate userspace (easier to write/debug compared with kernel > space) program which operates on an inactive array (no locking > needed, no need to worry about other I/O operations going to the > array at the time of reshaping etc), with an ability to plan it's > I/O strategy in alot more efficient and safer way... Yes this > apprpach has one downside: the array has to be inactive. But in > my opinion it's worth it, compared to more possibilities to lose > your data, even if you do NOT use that feature at all... > > I also like the idea of this kind of thing going in user space. I was also under the impression that md was going to be phased out and replaced by the device mapper. I've been kicking around the idea of a user space utility that manipulates the device mapper tables and performs block moves itself to reshape a raid array. It doesn't seem like it would be that difficult and would not require modifying the kernel at all. The basic idea is something like this: /dev/mapper/raid is your raid array, which is mapped to a stripe between /dev/sda, /dev/sdb. When you want to expand the stripe to add /dev/sdc to the array, you create three new devices: /dev/mapper/raid-old: copy of the old mapper table, striping sda and sdb /dev/mapper/raid-progress: linear map with size = new stripe width, and pointing to raid-new /dev/mapper/raid-new: what the raid will look like when done, i.e. stripe of sda, sdb, and sdc Then you replace /dev/mapper/raid with a linear map to raid-new, raid-progress, and raid-old, in that order. Initially the length of the chunks from raid-progress and raid-new are zero, so you will still be entirely accessing raid-old. For each stripe in the array, you change raid-progress to point to the corresponding blocks in raid-new, but suspended, so IO to this stripe will block. Then you update the raid map so raid-progress overlays the stripe you are working on to catch IO instead of allowing it to go to raid-old. After you read that stripe from raid-old and write it to raid-new, resume raid-progress to flush any blocked writes to the raid-new stripe. Finally update raid so the previously in progress stripe now maps to raid-new. Repeat for each stripe in the array, and finally replace the raid table with raid-new's table, and delete the 3 temporary devices. Adding transaction logging to the user mode utility wouldn't be very hard either. ^ permalink raw reply [flat|nested] 72+ messages in thread
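A rough dmsetup sketch of the layering described above, just to make the idea concrete. All sizes (in 512-byte sectors), chunk sizes and device names are invented for illustration, and the per-stripe copy/reload loop is only indicated in comments:

  # old geometry: 2-disk stripe; new geometry: 3-disk stripe; 64KB (128-sector) chunks
  echo "0 1048576 striped 2 128 /dev/sda 0 /dev/sdb 0" | dmsetup create raid-old
  echo "0 1572864 striped 3 128 /dev/sda 0 /dev/sdb 0 /dev/sdc 0" | dmsetup create raid-new
  # raid-progress covers one new-geometry stripe (3 * 128 sectors); keep it
  # suspended so I/O to the stripe being copied blocks until the copy is done
  echo "0 384 linear /dev/mapper/raid-new 0" | dmsetup create raid-progress
  dmsetup suspend raid-progress
  # top-level device: initially everything still maps to the old layout
  echo "0 1048576 linear /dev/mapper/raid-old 0" | dmsetup create raid
  # per stripe: reload 'raid' so the stripe being moved points at raid-progress,
  # copy that stripe from raid-old to raid-new (e.g. with dd), resume
  # raid-progress to release blocked writes, then reload 'raid' again so the
  # finished stripe maps to raid-new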
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 22:38 ` Phillip Susi @ 2006-01-17 22:57 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-17 22:57 UTC (permalink / raw) To: Phillip Susi Cc: Michael Tokarev, sander, linux-raid, linux-kernel, Steinar H. Gunderson On Tuesday January 17, psusi@cfl.rr.com wrote: > I was > also under the impression that md was going to be phased out and > replaced by the device mapper. I wonder where this sort of idea comes from.... Obviously individual distributions are free to support or not support whatever bits of code they like. And developers are free to add duplicate functionality to the kernel (I believe someone is working on a raid5 target for dm). But that doesn't mean that anything is going to be 'phased out'. md and dm, while similar, are quite different. They can both comfortably co-exist even if they have similar functionality. What I expect will happen (in line with what normally happens in Linux) is that both will continue to evolve as long as there is interest and developer support. They will quite possibly borrow ideas from each other where that is relevant. Parts of one may lose support and eventually die (as md/multipath is on the way to doing) but there is no wholesale 'phasing out' going to happen in either direction. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 8:17 ` Michael Tokarev [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> 2006-01-17 9:50 ` Sander @ 2006-01-17 14:10 ` Steinar H. Gunderson 2 siblings, 0 replies; 72+ messages in thread From: Steinar H. Gunderson @ 2006-01-17 14:10 UTC (permalink / raw) To: Michael Tokarev; +Cc: NeilBrown, linux-raid, linux-kernel On Tue, Jan 17, 2006 at 11:17:15AM +0300, Michael Tokarev wrote: > Neil, is this online resizing/reshaping really needed? I understand > all those words means alot for marketing persons - zero downtime, > online resizing etc, but it is much safer and easier to do that stuff > 'offline', on an inactive array, like raidreconf does - safer, easier, > faster, and one have more possibilities for more complex changes. Try the scenario where the resize takes a week, and you don't have enough spare disks to move it onto another server -- besides, that would take several days alone... This is the kind of use-case for which I wrote the original patch, and I'm grateful that Neil has picked it up again so we can finally get something working in. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown 2006-01-17 8:17 ` Michael Tokarev @ 2006-01-17 15:07 ` Mr. James W. Laferriere 2006-01-19 0:23 ` Neil Brown 2006-01-22 4:42 ` Adam Kropelin 2006-01-23 1:08 ` John Hendrikx 3 siblings, 1 reply; 72+ messages in thread From: Mr. James W. Laferriere @ 2006-01-17 15:07 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid maillist Hello Neil , On Tue, 17 Jan 2006, NeilBrown wrote: > Greetings. > > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. Please inform me of which of the 2.6.latest to use ? Tia , JimL The latest stable version of the Linux kernel is: 2.6.15.1 2006-01-15 06:14 UTC F V C Changelog The latest prepatch for the stable Linux kernel tree is: 2.6.16-rc1 2006-01-17 08:09 UTC V C Changelog The latest snapshot for the stable Linux kernel tree is: 2.6.15-git12 2006-01-16 08:04 UTC V C Changelog > This is still EXPERIMENTAL and could easily eat your data. Don't use it on > valuable data. Only use it for review and testing. > > This release does not make ANY attempt to record how far the reshape > has progressed on stable storage. That means that if the process is > interrupted either by a crash or by "mdadm -S", then you completely > lose your data. All of it. > So don't use it on valuable data. > > There are 5 patches to (hopefully) ease review. Comments are most > welcome, as are test results (providing they aren't done on valuable data:-). > > You will need to enable the experimental MD_RAID5_RESHAPE config option > for this to work. Please read the help message that come with it. > It gives an example mdadm command to effect a reshape (you do not need > a new mdadm, and vaguely recent version should work). > > This code is based in part on earlier work by > "Steinar H. Gunderson" <sgunderson@bigfoot.com> > Though little of his code remains, having access to it, and having > discussed the issues with him greatly eased the processed of creating > these patches. Thanks Steinar. > > NeilBrown > > [PATCH 001 of 5] md: Split disks array out of raid5 conf structure so it is easier to grow. > [PATCH 002 of 5] md: Allow stripes to be expanded in preparation for expanding an array. > [PATCH 003 of 5] md: Infrastructure to allow normal IO to continue while array is expanding. > [PATCH 004 of 5] md: Core of raid5 resize process > [PATCH 005 of 5] md: Final stages of raid5 expand code. -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | 3542 Broken Yoke Dr. | Give me Linux | | babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP | | http://www.asteriskhelpdesk.com/cgi-bin/astlance/r.cgi?babydr | +------------------------------------------------------------------+ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 15:07 ` Mr. James W. Laferriere @ 2006-01-19 0:23 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:23 UTC (permalink / raw) To: Mr. James W. Laferriere; +Cc: linux-raid maillist On Tuesday January 17, babydr@baby-dragons.com wrote: > Hello Neil , > > On Tue, 17 Jan 2006, NeilBrown wrote: > > Greetings. > > > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > Please inform me of which of the 2.6.latest to use ? Tia , JimL > > The latest stable version of the Linux kernel is: 2.6.15.1 2006-01-15 06:14 UTC F V C Changelog > The latest prepatch for the stable Linux kernel tree is: 2.6.16-rc1 2006-01-17 08:09 UTC V C Changelog > The latest snapshot for the stable Linux kernel tree is: 2.6.15-git12 2006-01-16 08:04 UTC V C Changelog Yes, any of those would be fine. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
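The help text mentioned in the introduction above is not reproduced in this thread, so the following is only a rough sketch of the documented reshape procedure (device names and the three-to-four-disk layout are illustrative assumptions, and the kernel must be built with the experimental MD_RAID5_RESHAPE option); Adam Kropelin's transcript below shows the same sequence run against a real array:

    # Assumed example: /dev/md0 is an existing 3-disk RAID-5, /dev/sdd is the new disk.
    mdadm /dev/md0 --add /dev/sdd            # new disk joins the array as a spare
    mdadm --grow /dev/md0 --raid-devices=4   # start the reshape onto 4 disks
    cat /proc/mdstat                         # watch reshape/recovery progress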
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown 2006-01-17 8:17 ` Michael Tokarev 2006-01-17 15:07 ` Mr. James W. Laferriere @ 2006-01-22 4:42 ` Adam Kropelin 2006-01-22 22:52 ` Neil Brown 2006-01-23 1:08 ` John Hendrikx 3 siblings, 1 reply; 72+ messages in thread From: Adam Kropelin @ 2006-01-22 4:42 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson NeilBrown <neilb@suse.de> wrote: > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. I've been looking forward to a feature like this, so I took the opportunity to set up a vmware session and give the patches a try. I encountered both success and failure, and here are the details of both. On the first try I neglected to read the directions and increased the number of devices first (which worked) and then attempted to add the physical device (which didn't work; at least not the way I intended). The result was an array of size 4, operating in degraded mode, with three active drives and one spare. I was unable to find a way to coax mdadm into adding the 4th drive as an active device instead of a spare. I'm not an mdadm guru, so there may be a method I overlooked. Here's what I did, interspersed with trimmed /proc/mdstat output: mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc md0 : active raid5 sda[0] sdc[2] sdb[1] 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] mdadm --grow -n4 /dev/md0 md0 : active raid5 sda[0] sdc[2] sdb[1] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] mdadm --manage --add /dev/md0 /dev/sdd md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] mdadm --misc --stop /dev/md0 mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] For my second try I actually read the directions and things went much better, aside from a possible /proc/mdstat glitch shown below. mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc md0 : active raid5 sda[0] sdc[2] sdb[1] 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] mdadm --manage --add /dev/md0 /dev/sdd md0 : active raid5 sdd[3](S) sdc[2] sdb[1] sda[0] 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] mdadm --grow -n4 /dev/md0 md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] ...should this be... --> [4/3] [UUU_] perhaps? [>....................] recovery = 0.4% (5636/1048512) finish=9.1min speed=1878K/sec [...time passes...] md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] My final test was a repeat of #2, but with data actively being written to the array during the reshape (the previous tests were on an idle, unmounted array). This one failed pretty hard, with several processes ending up in the D state. I repeated it twice and sysrq-t dumps can be found at <http://www.kroptech.com/~adk0212/md-raid5-reshape-wedge.txt>. The writeout load was a kernel tree untar started shortly before the 'mdadm --grow' command was given. mdadm hung, as did tar. Any process which subsequently attempted to access the array hung as well.
A second attempt at the same thing hung similarly, although only pdflush shows up hung in that trace. mdadm and tar are missing for some reason. I'm happy to do more tests. It's easy to conjure up virtual disks and load them with irrelevant data (like kernel trees ;) --Adam ^ permalink raw reply [flat|nested] 72+ messages in thread
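For anyone trying to reproduce the hung-writer case or capture comparable sysrq-t dumps, a minimal sketch follows; the filesystem type, mount point and tarball name are assumptions rather than details taken from Adam's setup:

    # Generate sustained write load on the array while the reshape runs.
    mkfs.ext3 /dev/md0
    mount /dev/md0 /mnt/test
    tar xf linux-2.6.15.tar -C /mnt/test &   # kernel-tree untar as writeout load
    mdadm --grow /dev/md0 --raid-devices=4   # start the reshape under load

    # If processes wedge in the D state, dump kernel task traces
    # (requires CONFIG_MAGIC_SYSRQ):
    echo t > /proc/sysrq-trigger
    dmesg > sysrq-t.txt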
* Re: [PATCH 000 of 5] md: Introduction 2006-01-22 4:42 ` Adam Kropelin @ 2006-01-22 22:52 ` Neil Brown 2006-01-23 23:02 ` Adam Kropelin 0 siblings, 1 reply; 72+ messages in thread From: Neil Brown @ 2006-01-22 22:52 UTC (permalink / raw) To: Adam Kropelin; +Cc: NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson On Saturday January 21, akropel1@rochester.rr.com wrote: > NeilBrown <neilb@suse.de> wrote: > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > > I've been looking forward to a feature like this, so I took the > opportunity to set up a vmware session and give the patches a try. I > encountered both success and failure, and here are the details of both. > > On the first try I neglected to read the directions and increased the > number of devices first (which worked) and then attempted to add the > physical device (which didn't work; at least not the way I intended). > The result was an array of size 4, operating in degraded mode, with > three active drives and one spare. I was unable to find a way to coax > mdadm into adding the 4th drive as an active device instead of a > spare. I'm not an mdadm guru, so there may be a method I overlooked. > Here's what I did, interspersed with trimmed /proc/mdstat output: Thanks, this is exactly the sort of feedback I was hoping for - people testing things that I didn't think to... > > mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --grow -n4 /dev/md0 > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] I assume that no "resync" started at this point? It should have done. > > mdadm --manage --add /dev/md0 /dev/sdd > > md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] > > mdadm --misc --stop /dev/md0 > mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd > > md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] This really should have started a recovery.... I'll look into that too. > > For my second try I actually read the directions and things went much > better, aside from a possible /proc/mdstat glitch shown below. > > mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --manage --add /dev/md0 /dev/sdd > > md0 : active raid5 sdd[3](S) sdc[2] sdb[1] sda[0] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --grow -n4 /dev/md0 > > md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] > 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > ...should this be... --> [4/3] [UUU_] perhaps? Well, part of the array is "4/4 UUUU" and part is "3/3 UUU". How do you represent that? I think "4/4 UUUU" is best. > [>....................] recovery = 0.4% (5636/1048512) finish=9.1min speed=1878K/sec > > [...time passes...] > > md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > > My final test was a repeat of #2, but with data actively being written > to the array during the reshape (the previous tests were on an idle, > unmounted array). 
This one failed pretty hard, with several processes > ending up in the D state. I repeated it twice and sysrq-t dumps can be > found at <http://www.kroptech.com/~adk0212/md-raid5-reshape-wedge.txt>. > The writeout load was a kernel tree untar started shortly before the > 'mdadm --grow' command was given. mdadm hung, as did tar. Any process > which subsequently attempted to access the array hung as well. A second > attempt at the same thing hung similarly, although only pdflush shows up > hung in that trace. mdadm and tar are missing for some reason. Hmmm... I tried similar things but didn't get this deadlock. Somehow the fact that mdadm is holding the reconfig_sem semaphore means that some IO cannot proceed and so mdadm cannot grab and resize all the stripe heads... I'll have to look more deeply into this. > > I'm happy to do more tests. It's easy to conjure up virtual disks and > load them with irrelevant data (like kernel trees ;) Great. I'll probably be putting out a new patch set late this week or early next. Hopefully it will fix the issues you found and you can try it again. Thanks again, NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-22 22:52 ` Neil Brown @ 2006-01-23 23:02 ` Adam Kropelin 0 siblings, 0 replies; 72+ messages in thread From: Adam Kropelin @ 2006-01-23 23:02 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson Neil Brown wrote: > On Saturday January 21, akropel1@rochester.rr.com wrote: >> On the first try I neglected to read the directions and increased the >> number of devices first (which worked) and then attempted to add the >> physical device (which didn't work; at least not the way I intended). > > Thanks, this is exactly the sort of feedback I was hoping for - people > testing things that I didn't think to... > >> mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc >> >> md0 : active raid5 sda[0] sdc[2] sdb[1] >> 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] >> >> mdadm --grow -n4 /dev/md0 >> >> md0 : active raid5 sda[0] sdc[2] sdb[1] >> 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] > > I assume that no "resync" started at this point? It should have done. Actually, it did start a resync. Sorry, I should have mentioned that. I waited until the resync completed before I issued the 'mdadm --add' command. >> md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] >> 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] >> ...should this be... --> [4/3] >> [UUU_] perhaps? > > Well, part of the array is "4/4 UUUU" and part is "3/3 UUU". How do > you represent that? I think "4/4 UUUU" is best. I see your point. I was expecting some indication that my array was vulnerable and that the new disk was not fully utilized yet. I guess the resync in progress indicator is sufficient. >> My final test was a repeat of #2, but with data actively being >> written >> to the array during the reshape (the previous tests were on an idle, >> unmounted array). This one failed pretty hard, with several processes >> ending up in the D state. > > Hmmm... I tried similar things but didn't get this deadlock. Somehow > the fact that mdadm is holding the reconfig_sem semaphore means that > some IO cannot proceed and so mdadm cannot grab and resize all the > stripe heads... I'll have to look more deeply into this. For what it's worth, I'm using the Buslogic SCSI driver for the disks in the array. >> I'm happy to do more tests. It's easy to conjure up virtual disks and >> load them with irrelevant data (like kernel trees ;) > > Great. I'll probably be putting out a new patch set late this week > or early next. Hopefully it will fix the issues you found and you > can try it again. Looking forward to it... --Adam ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown ` (2 preceding siblings ...) 2006-01-22 4:42 ` Adam Kropelin @ 2006-01-23 1:08 ` John Hendrikx 2006-01-23 1:25 ` Neil Brown 3 siblings, 1 reply; 72+ messages in thread From: John Hendrikx @ 2006-01-23 1:08 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson NeilBrown wrote: > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. > I think my question is already answered by this, but... Would this also allow changing the size of each raid device? Let's say I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I only allowed to add more 160 GB devices? ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 1:08 ` John Hendrikx @ 2006-01-23 1:25 ` Neil Brown 2006-01-23 1:54 ` Kyle Moffett 2006-01-23 2:09 ` Mr. James W. Laferriere 0 siblings, 2 replies; 72+ messages in thread From: Neil Brown @ 2006-01-23 1:25 UTC (permalink / raw) To: John Hendrikx; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson On Monday January 23, hjohn@xs4all.nl wrote: > NeilBrown wrote: > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > > > I think my question is already answered by this, but... > > Would this also allow changing the size of each raid device? Let's say > I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I > only allowed to add more 160 GB devices? Changing the size of the devices is a separate operation that has been supported for a while. For each device in turn, you fail it and replace it with a larger device. (This means the array runs degraded for a while, which isn't ideal and might be fixed one day). Once all the devices in the array are of the desired size, you run mdadm --grow /dev/mdX --size=max and the array (raid1, raid5, raid6) will use up all available space on the devices, and a resync will start to make sure that extra space is in-sync. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
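Spelled out as commands, the device-size upgrade Neil describes amounts to something like the sketch below; device names are illustrative, and each resync must finish before the next disk is swapped, since the array runs degraded during each swap:

    # Repeat for every member in turn; /dev/sdX is the member being replaced.
    mdadm /dev/md0 --fail /dev/sdX --remove /dev/sdX
    # ...physically replace the disk with a larger one (same device name assumed)...
    mdadm /dev/md0 --add /dev/sdX
    cat /proc/mdstat                  # wait for the resync to complete

    # Once every member is larger, claim the extra space:
    mdadm --grow /dev/md0 --size=max  # a resync of the new space follows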
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 1:25 ` Neil Brown @ 2006-01-23 1:54 ` Kyle Moffett 2006-01-23 2:09 ` Mr. James W. Laferriere 1 sibling, 0 replies; 72+ messages in thread From: Kyle Moffett @ 2006-01-23 1:54 UTC (permalink / raw) To: Neil Brown; +Cc: John Hendrikx, linux-raid, linux-kernel, Steinar H. Gunderson On Jan 22, 2006, at 20:25, Neil Brown wrote: > Changing the size of the devices is a separate operation that has > been supported for a while. For each device in turn, you fail it > and replace it with a larger device. (This means the array runs > degraded for a while, which isn't ideal and might be fixed one day). > > Once all the devices in the array are of the desired size, you run > mdadm --grow /dev/mdX --size=max > and the array (raid1, raid5, raid6) will use up all available space > on the devices, and a resync will start to make sure that extra > space is in-sync. One option I can think of that would make it much safer would be to originally set up your RAID like this:

                 md3 (RAID-5)
      __________/      |      \__________
     /                 |                 \
  md0 (RAID-1)    md1 (RAID-1)    md2 (RAID-1)

Each of md0-2 would only have a single drive, and therefore provide no redundancy. When you wanted to grow the RAID-5, you would first add a new larger disk to each of md0-md2 and trigger each resync. Once that is complete, remove the old drives from md0-2 and run: mdadm --grow /dev/md0 --size=max mdadm --grow /dev/md1 --size=max mdadm --grow /dev/md2 --size=max Then once all that has completed, run: mdadm --grow /dev/md3 --size=max This will enlarge the top-level array. If you have LVM on the top-level, you can allocate new LVs, resize existing ones, etc. With the newly added code, you could also add new drives dynamically by creating a /dev/md4 out of the single drive, and adding that as a new member of /dev/md3. Cheers, Kyle Moffett -- I lost interest in "blade servers" when I found they didn't throw knives at people who weren't supposed to be in your machine room. -- Anthony de Boer ^ permalink raw reply [flat|nested] 72+ messages in thread
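As an illustration of Kyle's layout (device names are assumptions, and the single-drive RAID-1s are built here as degraded two-member arrays with a 'missing' slot, which is one common way to create them), the initial construction and a later disk upgrade might look roughly like:

    # Three one-disk RAID-1s, with a RAID-5 layered on top.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda missing
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb missing
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc missing
    mdadm --create /dev/md3 --level=5 --raid-devices=3 /dev/md0 /dev/md1 /dev/md2

    # Later, to move md0 onto a larger disk without ever degrading md3:
    mdadm /dev/md0 --add /dev/sdd                      # RAID-1 resyncs onto the new disk
    mdadm /dev/md0 --fail /dev/sda --remove /dev/sda   # drop the old, smaller disk
    mdadm --grow /dev/md0 --size=max
    # ...repeat for md1 and md2, then grow the top level:
    mdadm --grow /dev/md3 --size=max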
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 1:25 ` Neil Brown 2006-01-23 1:54 ` Kyle Moffett @ 2006-01-23 2:09 ` Mr. James W. Laferriere 2006-01-23 2:33 ` Neil Brown 1 sibling, 1 reply; 72+ messages in thread From: Mr. James W. Laferriere @ 2006-01-23 2:09 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid maillist Hello Neil , On Mon, 23 Jan 2006, Neil Brown wrote: > On Monday January 23, hjohn@xs4all.nl wrote: >> NeilBrown wrote: >>> In line with the principle of "release early", following are 5 patches >>> against md in 2.6.latest which implement reshaping of a raid5 array. >>> By this I mean adding 1 or more drives to the array and then re-laying >>> out all of the data. >>> >> I think my question is already answered by this, but... >> >> Would this also allow changing the size of each raid device? Let's say >> I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I >> only allowed to add more 160 GB devices? > > Changing the size of the devices is a separate operation that has been > supported for a while. > For each device in turn, you fail it and replace it with a larger > device. (This means the array runs degraded for a while, which isn't > ideal and might be fixed one day). > > Once all the devices in the array are of the desired size, you run > mdadm --grow /dev/mdX --size=max > and the array (raid1, raid5, raid6) will use up all available space on > the devices, and a resync will start to make sure that extra space is > in-sync. How does one come up with a accurate '--size=max' ? I thought someone had asked this question before , but the message where this was mentioned eluded me . Tia , JimL -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | 3542 Broken Yoke Dr. | Give me Linux | | babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP | | http://www.asteriskhelpdesk.com/cgi-bin/astlance/r.cgi?babydr | +------------------------------------------------------------------+ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 2:09 ` Mr. James W. Laferriere @ 2006-01-23 2:33 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-23 2:33 UTC (permalink / raw) To: Mr. James W. Laferriere; +Cc: linux-raid maillist On Sunday January 22, babydr@baby-dragons.com wrote: > Hello Neil , > > On Mon, 23 Jan 2006, Neil Brown wrote: > > On Monday January 23, hjohn@xs4all.nl wrote: > >> NeilBrown wrote: > >>> In line with the principle of "release early", following are 5 patches > >>> against md in 2.6.latest which implement reshaping of a raid5 array. > >>> By this I mean adding 1 or more drives to the array and then re-laying > >>> out all of the data. > >>> > >> I think my question is already answered by this, but... > >> > >> Would this also allow changing the size of each raid device? Let's say > >> I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I > >> only allowed to add more 160 GB devices? > > > > Changing the size of the devices is a separate operation that has been > > supported for a while. > > For each device in turn, you fail it and replace it with a larger > > device. (This means the array runs degraded for a while, which isn't > > ideal and might be fixed one day). > > > > Once all the devices in the array are of the desired size, you run > > mdadm --grow /dev/mdX --size=max > > and the array (raid1, raid5, raid6) will use up all available space on > > the devices, and a resync will start to make sure that extra space is > > in-sync. > How does one come up with a accurate '--size=max' ? > I thought someone had asked this question before , but the > message where this was mentioned eluded me . > Tia , JimL --size=max is literal. If you say 'max', mdadm will either figure out the maximum, or tell the kernel to (I don't remember which). NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
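One way to see what 'max' actually resolved to after the grow, sketched with standard mdadm query commands (output field names vary a little between mdadm versions):

    mdadm --detail /dev/md0   # reports the array size and per-device size after the grow
    cat /proc/mdstat          # shows the resync of the newly added space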
Thread overview: 72+ messages:
2006-01-17 21:38 [PATCH 000 of 5] md: Introduction Lincoln Dale (ltd)
2006-01-18 13:27 ` Jan Engelhardt
2006-01-18 23:19 ` Neil Brown
2006-01-19 15:33 ` Mark Hahn
2006-01-19 20:12 ` Jan Engelhardt
2006-01-19 21:22 ` Lars Marowsky-Bree
2006-01-19 22:17 ` Phillip Susi
2006-01-19 22:32 ` Neil Brown
2006-01-19 23:26 ` Phillip Susi
2006-01-19 23:43 ` Neil Brown
2006-01-20 2:17 ` Phillip Susi
2006-01-20 10:53 ` Lars Marowsky-Bree
2006-01-20 12:06 ` Jens Axboe
2006-01-20 18:38 ` Heinz Mauelshagen
2006-01-20 22:09 ` Lars Marowsky-Bree
2006-01-21 0:06 ` Heinz Mauelshagen
2006-01-20 18:41 ` Heinz Mauelshagen
2006-01-20 17:29 ` Ross Vandegrift
2006-01-20 18:36 ` Heinz Mauelshagen
2006-01-20 22:57 ` Lars Marowsky-Bree
2006-01-21 0:01 ` Heinz Mauelshagen
2006-01-21 0:03 ` Lars Marowsky-Bree
2006-01-21 0:08 ` Heinz Mauelshagen
2006-01-21 0:13 ` Lars Marowsky-Bree
2006-01-23 9:44 ` Heinz Mauelshagen
2006-01-23 10:26 ` Lars Marowsky-Bree
2006-01-23 10:38 ` Heinz Mauelshagen
2006-01-23 10:45 ` Lars Marowsky-Bree
2006-01-23 11:00 ` Heinz Mauelshagen
2006-01-23 12:54 ` Ville Herva
2006-01-23 13:00 ` Steinar H. Gunderson
2006-01-23 13:54 ` Heinz Mauelshagen
2006-01-23 17:33 ` Ville Herva
2006-01-24 2:02 ` Phillip Susi
2006-01-20 7:51 ` Reuben Farrelly
2006-01-20 3:43 ` Andre' Breiler
2006-01-21 0:42 ` David Greaves
-- strict thread matches above, loose matches on Subject: below --
2006-01-17 6:56 NeilBrown
2006-01-17 8:17 ` Michael Tokarev
[not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com>
2006-01-17 9:38 ` Francois Barre
2006-01-19 0:35 ` Neil Brown
2006-01-17 9:50 ` Sander
2006-01-17 11:26 ` Michael Tokarev
2006-01-17 11:37 ` Francois Barre
2006-01-17 14:03 ` Kyle Moffett
2006-01-19 0:28 ` Neil Brown
2006-01-17 16:08 ` Ross Vandegrift
2006-01-17 18:12 ` Michael Tokarev
2006-01-18 8:14 ` Sander
2006-01-18 8:37 ` Brad Campbell
2006-01-18 9:03 ` Alan Cox
2006-01-18 12:46 ` John Hendrikx
2006-01-18 12:51 ` Gordon Henderson
2006-01-18 23:51 ` Neil Brown
2006-01-19 7:20 ` PFC
2006-01-19 8:01 ` dean gaudet
2006-01-18 23:54 ` Neil Brown
2006-01-19 0:22 ` Neil Brown
2006-01-19 9:01 ` Jakob Oestergaard
2006-01-17 22:38 ` Phillip Susi
2006-01-17 22:57 ` Neil Brown
2006-01-17 14:10 ` Steinar H. Gunderson
2006-01-17 15:07 ` Mr. James W. Laferriere
2006-01-19 0:23 ` Neil Brown
2006-01-22 4:42 ` Adam Kropelin
2006-01-22 22:52 ` Neil Brown
2006-01-23 23:02 ` Adam Kropelin
2006-01-23 1:08 ` John Hendrikx
2006-01-23 1:25 ` Neil Brown
2006-01-23 1:54 ` Kyle Moffett
2006-01-23 2:09 ` Mr. James W. Laferriere
2006-01-23 2:33 ` Neil Brown