* RE: [PATCH 000 of 5] md: Introduction @ 2006-01-17 21:38 Lincoln Dale (ltd) 2006-01-18 13:27 ` Jan Engelhardt 0 siblings, 1 reply; 72+ messages in thread From: Lincoln Dale (ltd) @ 2006-01-17 21:38 UTC (permalink / raw) To: Michael Tokarev, NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson > Neil, is this online resizing/reshaping really needed? I understand > all those words means alot for marketing persons - zero downtime, > online resizing etc, but it is much safer and easier to do that stuff > 'offline', on an inactive array, like raidreconf does - safer, easier, > faster, and one have more possibilities for more complex changes. It > isn't like you want to add/remove drives to/from your arrays every day... > Alot of good hw raid cards are unable to perform such reshaping too. RAID resize/restripe may not be so common with cheap / PC-based RAID systems, but it is common with midrange and enterprise storage subsystems from vendors such as EMC, HDS, IBM & HP. in fact, I'd say it's the exception rather than the rule if a midrange/enterprise storage subsystem doesn't have an _online_ resize capability. personally, I think this is useful functionality, but my personal preference is that this would be in DM/LVM2 rather than MD. but given Neil is the MD author/maintainer, I can see why he'd prefer to do it in MD. :) cheers, lincoln. ^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: [PATCH 000 of 5] md: Introduction 2006-01-17 21:38 [PATCH 000 of 5] md: Introduction Lincoln Dale (ltd) @ 2006-01-18 13:27 ` Jan Engelhardt 2006-01-18 23:19 ` Neil Brown 0 siblings, 1 reply; 72+ messages in thread From: Jan Engelhardt @ 2006-01-18 13:27 UTC (permalink / raw) To: Lincoln Dale (ltd) Cc: Michael Tokarev, NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson >personally, I think this this useful functionality, but my personal >preference is that this would be in DM/LVM2 rather than MD. but given >Neil is the MD author/maintainer, I can see why he'd prefer to do it in >MD. :) Why don't MD and DM merge some bits? Jan Engelhardt -- ^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: [PATCH 000 of 5] md: Introduction 2006-01-18 13:27 ` Jan Engelhardt @ 2006-01-18 23:19 ` Neil Brown 2006-01-19 15:33 ` Mark Hahn ` (2 more replies) 0 siblings, 3 replies; 72+ messages in thread From: Neil Brown @ 2006-01-18 23:19 UTC (permalink / raw) To: Jan Engelhardt Cc: Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Wednesday January 18, jengelh@linux01.gwdg.de wrote: > > >personally, I think this this useful functionality, but my personal > >preference is that this would be in DM/LVM2 rather than MD. but given > >Neil is the MD author/maintainer, I can see why he'd prefer to do it in > >MD. :) > > Why don't MD and DM merge some bits? > Which bits? Why? My current opinion is that you should: Use md for raid1, raid5, raid6 - anything with redundancy. Use dm for multipath, crypto, linear, LVM, snapshot Use either for raid0 (I don't think dm has particular advantages over md or md over dm). These can be mixed together quite effectively: You can have dm/lvm over md/raid1 over dm/multipath with no problems. If there is functionality missing from any of these recommended components, then make a noise about it, preferably but not necessarily with code, and it will quite possibly be fixed. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
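As a rough sketch of the stacking Neil describes (an LVM2 volume group on top of an md raid1), with illustrative device and volume names and the multipath layer left out; exact flags depend on the mdadm and LVM2 versions in use:

    # redundancy handled by md: a two-disk raid1
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # volume management handled by dm/lvm2 on top of the md device
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 10G -n home vg0
    mkfs.ext3 /dev/vg0/home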
* RE: [PATCH 000 of 5] md: Introduction 2006-01-18 23:19 ` Neil Brown @ 2006-01-19 15:33 ` Mark Hahn 2006-01-19 20:12 ` Jan Engelhardt 2006-01-19 22:17 ` Phillip Susi 2 siblings, 0 replies; 72+ messages in thread From: Mark Hahn @ 2006-01-19 15:33 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson > Use either for raid0 (I don't think dm has particular advantages > for md or md over dm). I measured this a few months ago, and was surprised to find that DM raid0 was very noticably slower than MD raid0. same machine, same disks/controller/kernel/settings/stripe-size. I didn't try to find out why, since I usually need redundancy... regards, mark hahn. ^ permalink raw reply [flat|nested] 72+ messages in thread
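For anyone who wants to repeat that comparison, the two stripe sets can be built roughly as follows (device names and chunk size are illustrative, the two disks are assumed equal-sized, and the dm 'striped' target takes its chunk size in 512-byte sectors):

    # md raid0 over two disks, 64k chunks
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sdc /dev/sdd

    # dm striped mapping: <start> <length> striped <#stripes> <chunk> <dev> <offset> ...
    LEN=$(( $(blockdev --getsz /dev/sdc) * 2 ))
    echo "0 $LEN striped 2 128 /dev/sdc 0 /dev/sdd 0" | dmsetup create stripe0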
* RE: [PATCH 000 of 5] md: Introduction 2006-01-18 23:19 ` Neil Brown 2006-01-19 15:33 ` Mark Hahn @ 2006-01-19 20:12 ` Jan Engelhardt 2006-01-19 21:22 ` Lars Marowsky-Bree 2006-01-19 22:17 ` Phillip Susi 2 siblings, 1 reply; 72+ messages in thread From: Jan Engelhardt @ 2006-01-19 20:12 UTC (permalink / raw) To: Neil Brown Cc: Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson >> >personally, I think this this useful functionality, but my personal >> >preference is that this would be in DM/LVM2 rather than MD. but given >> >Neil is the MD author/maintainer, I can see why he'd prefer to do it in >> >MD. :) >> >> Why don't MD and DM merge some bits? > >Which bits? >Why? > >My current opinion is that you should: > > Use md for raid1, raid5, raid6 - anything with redundancy. > Use dm for multipath, crypto, linear, LVM, snapshot There are pairs of files that look like they would do the same thing: raid1.c <-> dm-raid1.c linear.c <-> dm-linear.c Jan Engelhardt -- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 20:12 ` Jan Engelhardt @ 2006-01-19 21:22 ` Lars Marowsky-Bree 0 siblings, 0 replies; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-19 21:22 UTC (permalink / raw) To: Jan Engelhardt, Neil Brown Cc: Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-19T21:12:02, Jan Engelhardt <jengelh@linux01.gwdg.de> wrote: > > Use md for raid1, raid5, raid6 - anything with redundancy. > > Use dm for multipath, crypto, linear, LVM, snapshot > There are pairs of files that look like they would do the same thing: > > raid1.c <-> dm-raid1.c > linear.c <-> dm-linear.c Sure there's some historical overlap. It'd make sense if DM used the md raid personalities, yes. Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 23:19 ` Neil Brown 2006-01-19 15:33 ` Mark Hahn 2006-01-19 20:12 ` Jan Engelhardt @ 2006-01-19 22:17 ` Phillip Susi 2006-01-19 22:32 ` Neil Brown 2 siblings, 1 reply; 72+ messages in thread From: Phillip Susi @ 2006-01-19 22:17 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson I'm currently of the opinion that dm needs a raid5 and raid6 module added, then the user land lvm tools fixed to use them, and then you could use dm instead of md. The benefit being that dm pushes things like volume autodetection and management out of the kernel to user space where it belongs. But that's just my opinion... I'm using dm at home because I have a sata hardware fakeraid raid-0 between two WD 10,000 rpm raptors, and the dmraid utility correctly recognizes that and configures device mapper to use it. Neil Brown wrote: > Which bits? > Why? > > My current opinion is that you should: > > Use md for raid1, raid5, raid6 - anything with redundancy. > Use dm for multipath, crypto, linear, LVM, snapshot > Use either for raid0 (I don't think dm has particular advantages > for md or md over dm). > > These can be mixed together quite effectively: > You can have dm/lvm over md/raid1 over dm/multipath > with no problems. > > If there is functionality missing from any of these recommended > components, then make a noise about it, preferably but not necessarily > with code, and it will quite possibly be fixed. ^ permalink raw reply [flat|nested] 72+ messages in thread
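For reference, the dmraid workflow described here is roughly the following (standard dmraid options; the set names it reports depend on the controller's metadata format):

    dmraid -r    # list the RAID sets found in the vendor metadata on each disk
    dmraid -s    # show the discovered sets and their state
    dmraid -ay   # activate all sets as /dev/mapper/<setname> via device-mapper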
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 22:17 ` Phillip Susi @ 2006-01-19 22:32 ` Neil Brown 2006-01-19 23:26 ` Phillip Susi 2006-01-20 7:51 ` Reuben Farrelly 0 siblings, 2 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 22:32 UTC (permalink / raw) To: Phillip Susi Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Thursday January 19, psusi@cfl.rr.com wrote: > I'm currently of the opinion that dm needs a raid5 and raid6 module > added, then the user land lvm tools fixed to use them, and then you > could use dm instead of md. The benefit being that dm pushes things > like volume autodetection and management out of the kernel to user space > where it belongs. But that's just my opinion... The in-kernel autodetection in md is purely legacy support as far as I am concerned. md does volume detection in user space via 'mdadm'. What other "things like" were you thinking of. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
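The userspace detection Neil mentions boils down to something like this (a sketch; exact behaviour depends on the mdadm version and on /etc/mdadm.conf):

    # scan block devices for md superblocks and print the arrays they describe
    mdadm --examine --scan

    # assemble everything listed in mdadm.conf (or found by the scan)
    mdadm --assemble --scan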
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 22:32 ` Neil Brown @ 2006-01-19 23:26 ` Phillip Susi 2006-01-19 23:43 ` Neil Brown 2006-01-20 7:51 ` Reuben Farrelly 1 sibling, 1 reply; 72+ messages in thread From: Phillip Susi @ 2006-01-19 23:26 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson Neil Brown wrote: > > The in-kernel autodetection in md is purely legacy support as far as I > am concerned. md does volume detection in user space via 'mdadm'. > > What other "things like" were you thinking of. > Oh, I suppose that's true. Well, another thing is your new mods to support on the fly reshaping, which dm could do from user space. Then of course, there's multipath and snapshots and other lvm things which you need dm for, so why use both when one will do? That's my take on it. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:26 ` Phillip Susi @ 2006-01-19 23:43 ` Neil Brown 2006-01-20 2:17 ` Phillip Susi ` (2 more replies) 0 siblings, 3 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 23:43 UTC (permalink / raw) To: Phillip Susi Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Thursday January 19, psusi@cfl.rr.com wrote: > Neil Brown wrote: > > > > The in-kernel autodetection in md is purely legacy support as far as I > > am concerned. md does volume detection in user space via 'mdadm'. > > > > What other "things like" were you thinking of. > > > > Oh, I suppose that's true. Well, another thing is your new mods to > support on the fly reshaping, which dm could do from user space. Then > of course, there's multipath and snapshots and other lvm things which > you need dm for, so why use both when one will do? That's my take on it. Maybe the problem here is thinking of md and dm as different things. Try just not thinking of them at all. Think about it like this: The linux kernel supports lvm The linux kernel supports multipath The linux kernel supports snapshots The linux kernel supports raid0 The linux kernel supports raid1 The linux kernel supports raid5 Use the bits that you want, and not the bits that you don't. dm and md are just two different interface styles to various bits of this. Neither is clearly better than the other, partly because different people have different tastes. Maybe what you really want is for all of these functions to be managed under the one umbrella application. I think that is what EVMS tried to do. One big selling point that 'dm' has is 'dmraid' - a tool that allows you to use a lot of 'fakeraid' cards. People would like dmraid to work with raid5 as well, and that is a good goal. However it doesn't mean that dm needs to get its own raid5 implementation or that md/raid5 needs to be merged with dm. It can be achieved by giving md/raid5 the right interfaces so that metadata can be managed from userspace (and I am nearly there). Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some raid levels and 'md' interfaces for others. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:43 ` Neil Brown @ 2006-01-20 2:17 ` Phillip Susi 2006-01-20 10:53 ` Lars Marowsky-Bree 2006-01-20 18:41 ` Heinz Mauelshagen 2006-01-20 17:29 ` Ross Vandegrift 2006-01-20 18:36 ` Heinz Mauelshagen 2 siblings, 2 replies; 72+ messages in thread From: Phillip Susi @ 2006-01-20 2:17 UTC (permalink / raw) To: Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson Neil Brown wrote: >Maybe the problem here is thinking of md and dm as different things. >Try just not thinking of them at all. >Think about it like this: > The linux kernel support lvm > The linux kernel support multipath > The linux kernel support snapshots > The linux kernel support raid0 > The linux kernel support raid1 > The linux kernel support raid5 > >Use the bits that you want, and not the bits that you don't. > >dm and md are just two different interface styles to various bits of >this. Neither is clearly better than the other, partly because >different people have different tastes. > >Maybe what you really want is for all of these functions to be managed >under the one umbrella application. I think that is was EVMS tried to >do. > > > I am under the impression that dm is simpler/cleaner than md. That impression very well may be wrong, but if it is simpler, then that's a good thing. >One big selling point that 'dm' has is 'dmraid' - a tool that allows >you to use a lot of 'fakeraid' cards. People would like dmraid to >work with raid5 as well, and that is a good goal. > > AFAIK, the hardware fakeraid solutions on the market don't support raid5 anyhow ( at least mine doesn't ), so dmraid won't either. >However it doesn't mean that dm needs to get it's own raid5 >implementation or that md/raid5 needs to be merged with dm. >It can be achieved by giving md/raid5 the right interfaces so that >metadata can be managed from userspace (and I am nearly there). >Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some >raid levels and 'md' interfaces for others. > Having two sets of interfaces and retrofiting a new interface onto a system that wasn't designed for it seems likely to bloat the kernel with complex code. I don't really know if that is the case because I have not studied the code, but that's the impression I get, and if it's right, then I'd say it is better to stick with dm rather than retrofit md. In either case, it seems overly complex to have to deal with both. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 2:17 ` Phillip Susi @ 2006-01-20 10:53 ` Lars Marowsky-Bree 2006-01-20 12:06 ` Jens Axboe 2006-01-20 18:38 ` Heinz Mauelshagen 2006-01-20 18:41 ` Heinz Mauelshagen 1 sibling, 2 replies; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-20 10:53 UTC (permalink / raw) To: Phillip Susi, Neil Brown Cc: Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-19T21:17:12, Phillip Susi <psusi@cfl.rr.com> wrote: > I am under the impression that dm is simpler/cleaner than md. That > impression very well may be wrong, but if it is simpler, then that's a > good thing. That impression is wrong in that general form. Both have advantages and disadvantages. I've been an advocate of seeing both of them merged, mostly because I think it would be beneficial if they'd share the same interface to user-space to make the tools easier to write and maintain. However, rewriting the RAID personalities for DM is a thing only a fool would do without really good cause. Sure, everybody can write a RAID5/RAID6 parity algorithm. But getting the failure/edge cases stable is not trivial and requires years of maturing. Which is why I think gentle evolution of both source bases towards some common API (for example) is much preferable to reinventing one within the other. Oversimplifying to "dm is better than md" is just stupid. Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 10:53 ` Lars Marowsky-Bree @ 2006-01-20 12:06 ` Jens Axboe 2006-01-20 18:38 ` Heinz Mauelshagen 1 sibling, 0 replies; 72+ messages in thread From: Jens Axboe @ 2006-01-20 12:06 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20 2006, Lars Marowsky-Bree wrote: > Oversimplifying to "dm is better than md" is just stupid. Indeed. But "generally" md is faster and more efficient in the way it handles ios, it doesn't do any splitting unless it has to. -- Jens Axboe ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 10:53 ` Lars Marowsky-Bree 2006-01-20 12:06 ` Jens Axboe @ 2006-01-20 18:38 ` Heinz Mauelshagen 2006-01-20 22:09 ` Lars Marowsky-Bree 1 sibling, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-20 18:38 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 11:53:06AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-19T21:17:12, Phillip Susi <psusi@cfl.rr.com> wrote: > > > I am under the impression that dm is simpler/cleaner than md. That > > impression very well may be wrong, but if it is simpler, then that's a > > good thing. > > That impression is wrong in that general form. Both have advantages and > disadvantages. > > I've been an advocate of seeing both of them merged, mostly because I > think it would be beneficial if they'd share the same interface to > user-space to make the tools easier to write and maintain. > > However, rewriting the RAID personalities for DM is a thing only a fool > would do without really good cause. Thanks Lars ;) > Sure, everybody can write a > RAID5/RAID6 parity algorithm. But getting the failure/edge cases stable > is not trivial and requires years of maturing. > > Which is why I think gentle evolution of both source bases towards some > common API (for example) is much preferable to reinventing one within > the other. > > Oversimplifying to "dm is better than md" is just stupid. > > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 18:38 ` Heinz Mauelshagen @ 2006-01-20 22:09 ` Lars Marowsky-Bree 2006-01-21 0:06 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-20 22:09 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-20T19:38:40, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > However, rewriting the RAID personalities for DM is a thing only a fool > > would do without really good cause. > > Thanks Lars ;) Well, I assume you have a really good cause then, don't you? ;-) Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 22:09 ` Lars Marowsky-Bree @ 2006-01-21 0:06 ` Heinz Mauelshagen 0 siblings, 0 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-21 0:06 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Phillip Susi, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 11:09:51PM +0100, Lars Marowsky-Bree wrote: > On 2006-01-20T19:38:40, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > However, rewriting the RAID personalities for DM is a thing only a fool > > > would do without really good cause. > > > > Thanks Lars ;) > > Well, I assume you have a really good cause then, don't you? ;-) Well, I'll share your assumption ;-) > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 2:17 ` Phillip Susi 2006-01-20 10:53 ` Lars Marowsky-Bree @ 2006-01-20 18:41 ` Heinz Mauelshagen 1 sibling, 0 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-20 18:41 UTC (permalink / raw) To: Phillip Susi Cc: Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Thu, Jan 19, 2006 at 09:17:12PM -0500, Phillip Susi wrote: > Neil Brown wrote: > > >Maybe the problem here is thinking of md and dm as different things. > >Try just not thinking of them at all. > >Think about it like this: > > The linux kernel support lvm > > The linux kernel support multipath > > The linux kernel support snapshots > > The linux kernel support raid0 > > The linux kernel support raid1 > > The linux kernel support raid5 > > > >Use the bits that you want, and not the bits that you don't. > > > >dm and md are just two different interface styles to various bits of > >this. Neither is clearly better than the other, partly because > >different people have different tastes. > > > >Maybe what you really want is for all of these functions to be managed > >under the one umbrella application. I think that is was EVMS tried to > >do. > > > > > > > > I am under the impression that dm is simpler/cleaner than md. That > impression very well may be wrong, but if it is simpler, then that's a > good thing. > > > >One big selling point that 'dm' has is 'dmraid' - a tool that allows > >you to use a lot of 'fakeraid' cards. People would like dmraid to > >work with raid5 as well, and that is a good goal. > > > > > > AFAIK, the hardware fakeraid solutions on the market don't support raid5 > anyhow ( at least mine doesn't ), so dmraid won't either. Well, some do (eg, Nvidia). > > >However it doesn't mean that dm needs to get it's own raid5 > >implementation or that md/raid5 needs to be merged with dm. > >It can be achieved by giving md/raid5 the right interfaces so that > >metadata can be managed from userspace (and I am nearly there). > >Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > >raid levels and 'md' interfaces for others. > > > > Having two sets of interfaces and retrofiting a new interface onto a > system that wasn't designed for it seems likely to bloat the kernel with > complex code. I don't really know if that is the case because I have > not studied the code, but that's the impression I get, and if it's > right, then I'd say it is better to stick with dm rather than retrofit > md. In either case, it seems overly complex to have to deal with both. I agree, but dm will need to mature before it'll be able to substitute md. > > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Regards, Heinz -- The LVM Guy -- *** Software bugs are stupid. Nevertheless it needs not so stupid people to solve them *** =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:43 ` Neil Brown 2006-01-20 2:17 ` Phillip Susi @ 2006-01-20 17:29 ` Ross Vandegrift 2006-01-20 18:36 ` Heinz Mauelshagen 2 siblings, 0 replies; 72+ messages in thread From: Ross Vandegrift @ 2006-01-20 17:29 UTC (permalink / raw) To: Neil Brown Cc: Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 10:43:13AM +1100, Neil Brown wrote: > dm and md are just two different interface styles to various bits of > this. Neither is clearly better than the other, partly because > different people have different tastes. Here's why it's great to have both: they have different toolkits. I'm really familiar with md's toolkit. I can do most anything I need. But I'll bet that I've never gotten a pvmove to finish successfully because I am doing something wrong and I don't know it. Because we're talking about data integrity, the toolkit issue alone makes it worth keeping both code paths. md does 90% of what I need, so why should I spend the time to learn a new system that doesn't offer any advantages? [1] I'm intentionally neglecting the 4k stack issue -- Ross Vandegrift ross@lug.udel.edu "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 23:43 ` Neil Brown 2006-01-20 2:17 ` Phillip Susi 2006-01-20 17:29 ` Ross Vandegrift @ 2006-01-20 18:36 ` Heinz Mauelshagen 2006-01-20 22:57 ` Lars Marowsky-Bree 2 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-20 18:36 UTC (permalink / raw) To: Neil Brown Cc: Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 10:43:13AM +1100, Neil Brown wrote: > On Thursday January 19, psusi@cfl.rr.com wrote: > > Neil Brown wrote: > > > > > > The in-kernel autodetection in md is purely legacy support as far as I > > > am concerned. md does volume detection in user space via 'mdadm'. > > > > > > What other "things like" were you thinking of. > > > > > > > Oh, I suppose that's true. Well, another thing is your new mods to > > support on the fly reshaping, which dm could do from user space. Then > > of course, there's multipath and snapshots and other lvm things which > > you need dm for, so why use both when one will do? That's my take on it. > > Maybe the problem here is thinking of md and dm as different things. > Try just not thinking of them at all. > Think about it like this: > The linux kernel support lvm > The linux kernel support multipath > The linux kernel support snapshots > The linux kernel support raid0 > The linux kernel support raid1 > The linux kernel support raid5 > > Use the bits that you want, and not the bits that you don't. > > dm and md are just two different interface styles to various bits of > this. Neither is clearly better than the other, partly because > different people have different tastes. > > Maybe what you really want is for all of these functions to be managed > under the one umbrella application. I think that is was EVMS tried to > do. > > One big selling point that 'dm' has is 'dmraid' - a tool that allows > you to use a lot of 'fakeraid' cards. People would like dmraid to > work with raid5 as well, and that is a good goal. > However it doesn't mean that dm needs to get it's own raid5 > implementation or that md/raid5 needs to be merged with dm. That's a valid point to make but it can ;) > It can be achieved by giving md/raid5 the right interfaces so that > metadata can be managed from userspace (and I am nearly there). Yeah, and I'm nearly there to have a RAID4 and RAID5 target for dm (which took advantage of the raid address calculation and the bio to stripe cache copy code of md raid5). See http://people.redhat.com/heinzm/sw/dm/dm-raid45/dm-raid45_2.6.15_200601201914.patch.bz2 (no Makefile / no Kconfig changes) for early code reference. > Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > raid levels and 'md' interfaces for others. Yes, that's possible but there's recommendations to have a native target for dm to do RAID5, so I started to implement it. 
> > NeilBrown > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 18:36 ` Heinz Mauelshagen @ 2006-01-20 22:57 ` Lars Marowsky-Bree 2006-01-21 0:01 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-20 22:57 UTC (permalink / raw) To: Heinz Mauelshagen, Neil Brown Cc: Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-20T19:36:21, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > > raid levels and 'md' interfaces for others. > Yes, that's possible but there's recommendations to have a native target > for dm to do RAID5, so I started to implement it. Can you answer me what the recommendations are based on? I understand wanting to manage both via the same framework, but duplicating the code is just ... wrong. What's gained by it? Why not provide a dm-md wrapper which could then load/interface to all md personalities? Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 22:57 ` Lars Marowsky-Bree @ 2006-01-21 0:01 ` Heinz Mauelshagen 2006-01-21 0:03 ` Lars Marowsky-Bree 0 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-21 0:01 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Fri, Jan 20, 2006 at 11:57:24PM +0100, Lars Marowsky-Bree wrote: > On 2006-01-20T19:36:21, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Then 'dmraid' (or a similar tool) can use 'dm' interfaces for some > > > raid levels and 'md' interfaces for others. > > Yes, that's possible but there's recommendations to have a native target > > for dm to do RAID5, so I started to implement it. > > Can you answer me what the recommendations are based on? Partner requests. > > I understand wanting to manage both via the same framework, but > duplicating the code is just ... wrong. > > What's gained by it? > > Why not provide a dm-md wrapper which could then > load/interface to all md personalities? > As we want to enrich the mapping flexibility (ie, multi-segment fine grained mappings) of dm by adding targets as we go, a certain degree and transitional existence of duplicate code is the price to gain that flexibility. > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" Warm regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:01 ` Heinz Mauelshagen @ 2006-01-21 0:03 ` Lars Marowsky-Bree 2006-01-21 0:08 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-21 0:03 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-21T01:01:42, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Why not provide a dm-md wrapper which could then > > load/interface to all md personalities? > As we want to enrich the mapping flexibility (ie, multi-segment fine grained > mappings) of dm by adding targets as we go, a certain degree and transitional > existence of duplicate code is the price to gain that flexibility. A dm-md wrapper would give you the same? Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:03 ` Lars Marowsky-Bree @ 2006-01-21 0:08 ` Heinz Mauelshagen 2006-01-21 0:13 ` Lars Marowsky-Bree 0 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-21 0:08 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Sat, Jan 21, 2006 at 01:03:44AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-21T01:01:42, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Why not provide a dm-md wrapper which could then > > > load/interface to all md personalities? > > As we want to enrich the mapping flexibility (ie, multi-segment fine grained > > mappings) of dm by adding targets as we go, a certain degree and transitional > > existence of duplicate code is the price to gain that flexibility. > > A dm-md wrapper would give you the same? No, we'ld need to stack more complex to achieve mappings. Think lvm2 and logical volume level raid5. > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:08 ` Heinz Mauelshagen @ 2006-01-21 0:13 ` Lars Marowsky-Bree 2006-01-23 9:44 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-21 0:13 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-21T01:08:06, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > A dm-md wrapper would give you the same? > No, we'ld need to stack more complex to achieve mappings. > Think lvm2 and logical volume level raid5. How would you not get that if you had a wrapper around md which made it into a dm personality/target? Besides, stacking between dm devices so far (ie, if I look how kpartx does it, or LVM2 on top of MPIO etc, which works just fine) is via the block device layer anyway - and nothing stops you from putting md on top of LVM2 LVs either. I use them regularly to play with md and other stuff... So I remain unconvinced that code duplication is worth it for more than "hark we want it so!" ;-) ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-21 0:13 ` Lars Marowsky-Bree @ 2006-01-23 9:44 ` Heinz Mauelshagen 2006-01-23 10:26 ` Lars Marowsky-Bree 2006-01-23 12:54 ` Ville Herva 0 siblings, 2 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 9:44 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Sat, Jan 21, 2006 at 01:13:11AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-21T01:08:06, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > A dm-md wrapper would give you the same? > > No, we'ld need to stack more complex to achieve mappings. > > Think lvm2 and logical volume level raid5. > > How would you not get that if you had a wrapper around md which made it > into an dm personality/target? You could with deeper stacking. That's why I mentioned it above. > > Besides, stacking between dm devices so far (ie, if I look how kpartx > does it, or LVM2 on top of MPIO etc, which works just fine) is via the > block device layer anyway - and nothing stops you from putting md on top > of LVM2 LVs either. > > I use the regularly to play with md and other stuff... Me too but for production, I want to avoid the additional stacking overhead and complexity. > > So I remain unconvinced that code duplication is worth it for more than > "hark we want it so!" ;-) Shall I remove you from the list of potential testers of dm-raid45 then ;-) > > -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 9:44 ` Heinz Mauelshagen @ 2006-01-23 10:26 ` Lars Marowsky-Bree 2006-01-23 10:38 ` Heinz Mauelshagen 2006-01-23 12:54 ` Ville Herva 1 sibling, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-23 10:26 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-23T10:44:18, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Besides, stacking between dm devices so far (ie, if I look how kpartx > > does it, or LVM2 on top of MPIO etc, which works just fine) is via the > > block device layer anyway - and nothing stops you from putting md on top > > of LVM2 LVs either. > > > > I use the regularly to play with md and other stuff... > > Me too but for production, I want to avoid the > additional stacking overhead and complexity. Ok, I still didn't get that. I must be slow. Did you implement some DM-internal stacking now to avoid the above mentioned complexity? Otherwise, even DM-on-DM is still stacked via the block device abstraction... Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 10:26 ` Lars Marowsky-Bree @ 2006-01-23 10:38 ` Heinz Mauelshagen 2006-01-23 10:45 ` Lars Marowsky-Bree 0 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 10:38 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 11:26:01AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-23T10:44:18, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Besides, stacking between dm devices so far (ie, if I look how kpartx > > > does it, or LVM2 on top of MPIO etc, which works just fine) is via the > > > block device layer anyway - and nothing stops you from putting md on top > > > of LVM2 LVs either. > > > > > > I use the regularly to play with md and other stuff... > > > > Me too but for production, I want to avoid the > > additional stacking overhead and complexity. > > Ok, I still didn't get that. I must be slow. > > Did you implement some DM-internal stacking now to avoid the above > mentioned complexity? > > Otherwise, even DM-on-DM is still stacked via the block device > abstraction... No, not necessary because a single-level raid4/5 mapping will do it. Ie. it supports <offset> parameters in the constructor as other targets do as well (eg. mirror or linear). > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
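For comparison, this is what the offset parameter looks like in the existing linear target that Heinz points to; the raid45 constructor itself is experimental and its exact parameter list is not shown in this thread, but presumably it takes a device/offset pair per member in the same style:

    # dm 'linear' table line: <start> <length> linear <device> <offset on device>
    # maps a 1 GiB region beginning 2048 sectors into /dev/sdb1
    echo "0 2097152 linear /dev/sdb1 2048" | dmsetup create lin0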
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 10:38 ` Heinz Mauelshagen @ 2006-01-23 10:45 ` Lars Marowsky-Bree 2006-01-23 11:00 ` Heinz Mauelshagen 0 siblings, 1 reply; 72+ messages in thread From: Lars Marowsky-Bree @ 2006-01-23 10:45 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On 2006-01-23T11:38:51, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > Ok, I still didn't get that. I must be slow. > > > > Did you implement some DM-internal stacking now to avoid the above > > mentioned complexity? > > > > Otherwise, even DM-on-DM is still stacked via the block device > > abstraction... > > No, not necessary because a single-level raid4/5 mapping will do it. > Ie. it supports <offset> parameters in the constructor as other targets > do as well (eg. mirror or linear). An dm-md wrapper would not support such a basic feature (which is easily added to md too) how? I mean, "I'm rewriting it because I want to and because I understand and own the code then" is a perfectly legitimate reason, but let's please not pretend there's really sound and good technical reasons ;-) Sincerely, Lars Marowsky-Brée -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 10:45 ` Lars Marowsky-Bree @ 2006-01-23 11:00 ` Heinz Mauelshagen 0 siblings, 0 replies; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 11:00 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Heinz Mauelshagen, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 11:45:22AM +0100, Lars Marowsky-Bree wrote: > On 2006-01-23T11:38:51, Heinz Mauelshagen <mauelshagen@redhat.com> wrote: > > > > Ok, I still didn't get that. I must be slow. > > > > > > Did you implement some DM-internal stacking now to avoid the above > > > mentioned complexity? > > > > > > Otherwise, even DM-on-DM is still stacked via the block device > > > abstraction... > > > > No, not necessary because a single-level raid4/5 mapping will do it. > > Ie. it supports <offset> parameters in the constructor as other targets > > do as well (eg. mirror or linear). > > An dm-md wrapper would not support such a basic feature (which is easily > added to md too) how? > > I mean, "I'm rewriting it because I want to and because I understand and > own the code then" is a perfectly legitimate reason Sure :-) >, but let's please > not pretend there's really sound and good technical reasons ;-) Mind you that there's no need to argue about that: this is based on requests to do it. > > > Sincerely, > Lars Marowsky-Brée > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 9:44 ` Heinz Mauelshagen 2006-01-23 10:26 ` Lars Marowsky-Bree @ 2006-01-23 12:54 ` Ville Herva 2006-01-23 13:00 ` Steinar H. Gunderson ` (2 more replies) 0 siblings, 3 replies; 72+ messages in thread From: Ville Herva @ 2006-01-23 12:54 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 10:44:18AM +0100, you [Heinz Mauelshagen] wrote: > > > > I use the regularly to play with md and other stuff... > > Me too but for production, I want to avoid the > additional stacking overhead and complexity. > > > So I remain unconvinced that code duplication is worth it for more than > > "hark we want it so!" ;-) > > Shall I remove you from the list of potential testers of dm-raid45 then ;-) Heinz, If you really want the rest of us to convert from md to lvm, you should perhaps give some attention to the brittle userland (scripts and binaries). It is very tedious to have to debug a production system for a few hours in order to get the rootfs mounted after each kernel update. The lvm error messages give almost no clue on the problem. Worse yet, problem reports on these issues are completely ignored on the lvm mailing list, even when a patch is attached. (See http://marc.theaimsgroup.com/?l=linux-lvm&m=113775502821403&w=2 http://linux.msede.com/lvm_mlist/archive/2001/06/0205.html http://linux.msede.com/lvm_mlist/archive/2001/06/0271.html for reference.) Such experience gives the impression that lvm is not yet ready for serious production use. No offense intended; lvm kernel code (neither lvm1 nor lvm2) has never given me trouble, and is probably as solid as anything. -- v -- v@iki.fi PS: Speaking of debugging failing initrd init scripts; it would be nice if the kernel gave an error message on wrong initrd format rather than silently failing... Yes, I forgot to make the cpio with the "-H newc" option :-/. ^ permalink raw reply [flat|nested] 72+ messages in thread
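For the record, the initramfs image the kernel expects here is a newc-format cpio archive (usually gzip-compressed), built along these lines (paths are illustrative):

    cd /path/to/initramfs-root
    find . | cpio -o -H newc | gzip -9 > /boot/initrd.img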
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 12:54 ` Ville Herva @ 2006-01-23 13:00 ` Steinar H. Gunderson 2006-01-23 13:54 ` Heinz Mauelshagen 2006-01-24 2:02 ` Phillip Susi 2 siblings, 0 replies; 72+ messages in thread From: Steinar H. Gunderson @ 2006-01-23 13:00 UTC (permalink / raw) To: Ville Herva Cc: Heinz Mauelshagen, Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel On Mon, Jan 23, 2006 at 02:54:20PM +0200, Ville Herva wrote: > If you really want the rest of us to convert from md to lvm, you should > perhaps give some attention to thee brittle userland (scripts and and > binaries). If you do not like the LVM userland, you might want to try the EVMS userland, which uses the same kernel code and (mostly) the same on-disk formats, but has a different front-end. > It is very tedious to have to debug a production system for a few hours in > order to get the rootfs mounted after each kernel update. This sounds a bit like an issue with your distribution, which should normally fix initrd/initramfs issues for you. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 12:54 ` Ville Herva 2006-01-23 13:00 ` Steinar H. Gunderson @ 2006-01-23 13:54 ` Heinz Mauelshagen 2006-01-23 17:33 ` Ville Herva 2006-01-24 2:02 ` Phillip Susi 2 siblings, 1 reply; 72+ messages in thread From: Heinz Mauelshagen @ 2006-01-23 13:54 UTC (permalink / raw) To: Ville Herva Cc: Heinz Mauelshagen, Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 02:54:20PM +0200, Ville Herva wrote: > On Mon, Jan 23, 2006 at 10:44:18AM +0100, you [Heinz Mauelshagen] wrote: > > > > > > I use the regularly to play with md and other stuff... > > > > Me too but for production, I want to avoid the > > additional stacking overhead and complexity. > > > > > So I remain unconvinced that code duplication is worth it for more than > > > "hark we want it so!" ;-) > > > > Shall I remove you from the list of potential testers of dm-raid45 then ;-) > > Heinz, > > If you really want the rest of us to convert from md to lvm, you should > perhaps give some attention to thee brittle userland (scripts and and > binaries). Sure :-) > > It is very tedious to have to debug a production system for a few hours in > order to get the rootfs mounted after each kernel update. > > The lvm error messages give almost no clue on the problem. > > Worse yet, problem reports on these issues are completely ignored on the lvm > mailing list, even when a patch is attached. > > (See > http://marc.theaimsgroup.com/?l=linux-lvm&m=113775502821403&w=2 > http://linux.msede.com/lvm_mlist/archive/2001/06/0205.html > http://linux.msede.com/lvm_mlist/archive/2001/06/0271.html > for reference.) Hrm, those are initscripts related, not lvm directly > > Such experience gives an impression lvm is not yet ready for serious > production use. initscripts/initramfs surely need to do the right thing in case root is on lvm. > > No offense intended, lvm kernel (lvm1 nor lvm2) code has never given me > trouble, and is probably as solid as anything. Alright. Is the initscript issue fixed now or still open ? Had you filed a bug against the distros initscripts ? > > > -- v -- > > v@iki.fi > > PS: Speaking of debugging failing initrd init scripts; it would be nice if > the kernel gave an error message on wrong initrd format rather than silently > failing... Yes, I forgot to make the cpio with the "-H newc" option :-/. -- Regards, Heinz -- The LVM Guy -- *** Software bugs are stupid. Nevertheless it needs not so stupid people to solve them *** =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 Cluster and Storage Development 56242 Marienrachdorf Germany Mauelshagen@RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 13:54 ` Heinz Mauelshagen @ 2006-01-23 17:33 ` Ville Herva 0 siblings, 0 replies; 72+ messages in thread From: Ville Herva @ 2006-01-23 17:33 UTC (permalink / raw) To: Heinz Mauelshagen Cc: Lars Marowsky-Bree, Neil Brown, Phillip Susi, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson On Mon, Jan 23, 2006 at 02:54:28PM +0100, you [Heinz Mauelshagen] wrote: > > > > It is very tedious to have to debug a production system for a few hours in > > order to get the rootfs mounted after each kernel update. > > > > The lvm error messages give almost no clue on the problem. > > > > Worse yet, problem reports on these issues are completely ignored on the lvm > > mailing list, even when a patch is attached. > > > > (See > > http://marc.theaimsgroup.com/?l=linux-lvm&m=113775502821403&w=2 > > http://linux.msede.com/lvm_mlist/archive/2001/06/0205.html > > http://linux.msede.com/lvm_mlist/archive/2001/06/0271.html > > for reference.) > > Hrm, those are initscripts related, not lvm directly With the ancient LVM1 issue, my main problem was indeed that mkinitrd did not reserve enough space for the initrd. The LVM issue I posted to the LVM list was that LVM userland (vg_cfgbackup.c) did not check for errors while writing to the fs. The (ignored) patch added some error checking. But that's ancient, I think we can forget about that. The current issue (please see the first link) is about the need to add a "sleep 5" between lvm vgmknodes and mount -o defaults --ro -t ext3 /dev/root /sysroot . Otherwise, mounting fails. (Actually, I added "sleep 5" after every lvm command in the init script and did not narrow it down any more, since this was a production system, each boot took ages, and I had to get the system up as soon as possible.) To me it seemed some kind of problem with the lvm utilities, not with the initscripts. At least, the correct solution cannot be adding "sleep 5" here and there in the initscripts... > Alright. > Is the initscript issue fixed now or still open ? It is still open. Sadly, the only two systems this currently happens are production boxes and I cannot boot them at will for debugging. It is, however, 100% reproducible and I can try reasonable suggestions when I boot them the next time. Sorry about this. > Had you filed a bug against the distros initscripts ? No, since I wasn't sure the problem actually was in the initscript. Perhaps it does do something wrong, but the "sleep 5" workaround is pretty suspicious. Thanks for the reply. -- v -- v@iki.fi ^ permalink raw reply [flat|nested] 72+ messages in thread
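The workaround being described amounts to a fragment like this in the initrd's init script (the mount line is as quoted in the earlier report, the surrounding lvm calls are assumed context, and the sleep is a band-aid rather than a fix):

    lvm vgscan
    lvm vgchange -a y
    lvm vgmknodes
    sleep 5    # without this pause the following mount fails on the affected systems
    mount -o defaults --ro -t ext3 /dev/root /sysroot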
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 12:54 ` Ville Herva 2006-01-23 13:00 ` Steinar H. Gunderson 2006-01-23 13:54 ` Heinz Mauelshagen @ 2006-01-24 2:02 ` Phillip Susi 2 siblings, 0 replies; 72+ messages in thread From: Phillip Susi @ 2006-01-24 2:02 UTC (permalink / raw) To: vherva Cc: Heinz Mauelshagen, Lars Marowsky-Bree, Neil Brown, Jan Engelhardt, Lincoln Dale (ltd), Michael Tokarev, linux-raid, linux-kernel, Steinar H. Gunderson Ville Herva wrote: > PS: Speaking of debugging failing initrd init scripts; it would be nice if > the kernel gave an error message on wrong initrd format rather than silently > failing... Yes, I forgot to make the cpio with the "-H newc" option :-/. > LOL, yea, that one got me too when I was first getting back into linux a few months ago and had to customize my initramfs to include dmraid to recognize my hardware fakeraid raid0. Then I discovered the mkinitramfs utility which makes things much nicer ;) ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 22:32 ` Neil Brown 2006-01-19 23:26 ` Phillip Susi @ 2006-01-20 7:51 ` Reuben Farrelly 2006-01-20 3:43 ` Andre' Breiler 1 sibling, 1 reply; 72+ messages in thread From: Reuben Farrelly @ 2006-01-20 7:51 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid On 20/01/2006 11:32 a.m., Neil Brown wrote: > On Thursday January 19, psusi@cfl.rr.com wrote: >> I'm currently of the opinion that dm needs a raid5 and raid6 module >> added, then the user land lvm tools fixed to use them, and then you >> could use dm instead of md. The benefit being that dm pushes things >> like volume autodetection and management out of the kernel to user space >> where it belongs. But that's just my opinion... > > The in-kernel autodetection in md is purely legacy support as far as I > am concerned. md does volume detection in user space via 'mdadm'. Hrm. <puzzled look> How would I then start my md0 raid-1 array that is mounted as the root partition / if I'm not doing this when the kernel is starting up? Because without it I've got no userspace to actually execute. Some of the other arrays with things like /var and /home could obviously be easily assembled soon after the kernel hands over control to userspace before the filesystem points are mounted, but for the root I am not quite sure how it could work... reuben ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 7:51 ` Reuben Farrelly @ 2006-01-20 3:43 ` Andre' Breiler 2006-01-21 0:42 ` David Greaves 0 siblings, 1 reply; 72+ messages in thread From: Andre' Breiler @ 2006-01-20 3:43 UTC (permalink / raw) To: linux-raid Hi, On Fri, 20 Jan 2006, Reuben Farrelly wrote: > On 20/01/2006 11:32 a.m., Neil Brown wrote: > > > > The in-kernel autodetection in md is purely legacy support as far as I > > am concerned. md does volume detection in user space via 'mdadm'. > > Hrm. <puzzled look> How would I then start my md0 raid-1 array that is > mounted as the root partition / if I'm not doing this when the kernel is > starting up? Because without it I've got no userspace to actually execute. Indeed you won't be able to use a 'plain kernel' anymore but will have to switch to kernel + initrd. This adds a good amount of useful flexibility (but makes life a little bit harder). The autodetect feature has been discussed multiple times with pros/cons for it (so I will skip that here), with the overall outcome that it is too dangerous a feature to leave in kernel space in the long term (btw. on some architectures it doesn't work at all anyway). > Some of the other arrays with things like /var and /home could obviously be > easily assembled soon after the kernel hands over control to userspace before > the filesystem points are mounted, but for the root I am not quite sure > how it > could work... A simple initrd. If you look at random distributions (e.g. Debian) you will see it done that way (yes, I don't think it's perfect as it is yet). Personally, on systems which change often I run without autodetect. On systems which hardly change, and where any disks that go in have been wiped first, I use the autodetection. Andre' ^ permalink raw reply [flat|nested] 72+ messages in thread
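To make the initrd route concrete, the userspace assembly of a root array comes down to a couple of lines in the initramfs init script. This is a sketch only; the UUID and the member device names are made-up example values:

  # assemble the root array explicitly instead of relying on kernel autodetect
  mdadm --assemble /dev/md0 --uuid=b87a211e:8a9bef34:ccc334d8:f84216d6 \
        /dev/sda1 /dev/sdb1
  mount -o ro /dev/md0 /sysroot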
* Re: [PATCH 000 of 5] md: Introduction 2006-01-20 3:43 ` Andre' Breiler @ 2006-01-21 0:42 ` David Greaves 0 siblings, 0 replies; 72+ messages in thread From: David Greaves @ 2006-01-21 0:42 UTC (permalink / raw) To: Andre' Breiler, neilb; +Cc: linux-raid Andre' Breiler wrote: >Hi, > >On Fri, 20 Jan 2006, Reuben Farrelly wrote: > > >>On 20/01/2006 11:32 a.m., Neil Brown wrote: >> >> >>>The in-kernel autodetection in md is purely legacy support as far as I >>>am concerned. md does volume detection in user space via 'mdadm'. >>> >>> >>Hrm. <puzzled look> How would I then start my md0 raid-1 array that is >>mounted as the root partition / if I'm not doing this when the kernel is >>starting up? Because without it I've got no userspace to actually execute. >> >> > >Indeed you won't be able to use a 'plain kernel' anymore but switch to >kernel + initrd. > > I understand the anti-autodetect arguments and was easily persuaded by them... Can I however suggest we have autodetect only actually triggers if the kernel is supplied with the UUID of an md0 as a boot option. ie: root=/dev/md0 md0=b87a211e:8a9bef34:ccc334d8:f84216d6 I'm not sure I ever saw this proposed during the debate. It's just *so* nice not to have to bother with initrd (having just moved to a mirrored root and finding it wonderfully easy to specify root=/dev/md0 and have it 'just work') It looks easy enough to add into autorun_devices() in md.c Maybe then deprecate autodetect more quickly by having it optional in 2.6.17(?) - but if even one UUID *is* specified then no other devices are autodetected. Eventually having no devices autodetected *unless* a UUID is specified (for convenience printk the UUIDs of devices that *are* found just in case you mistype it...) Please forgive me if I've missed the reason that this is a bad idea. David -- ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH 000 of 5] md: Introduction
@ 2006-01-17 6:56 NeilBrown
2006-01-17 8:17 ` Michael Tokarev
` (3 more replies)
0 siblings, 4 replies; 72+ messages in thread
From: NeilBrown @ 2006-01-17 6:56 UTC (permalink / raw)
To: linux-raid, linux-kernel; +Cc: Steinar H. Gunderson
Greetings.
In line with the principle of "release early", following are 5 patches
against md in 2.6.latest which implement reshaping of a raid5 array.
By this I mean adding 1 or more drives to the array and then re-laying
out all of the data.
This is still EXPERIMENTAL and could easily eat your data. Don't use it on
valuable data. Only use it for review and testing.
This release does not make ANY attempt to record how far the reshape
has progressed on stable storage. That means that if the process is
interrupted either by a crash or by "mdadm -S", then you completely
lose your data. All of it.
So don't use it on valuable data.
There are 5 patches to (hopefully) ease review. Comments are most
welcome, as are test results (providing they aren't done on valuable data:-).
You will need to enable the experimental MD_RAID5_RESHAPE config option
for this to work. Please read the help message that comes with it.
It gives an example mdadm command to effect a reshape (you do not need
a new mdadm; any vaguely recent version should work).
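For reference, the sort of invocation meant here looks roughly like the following; the device names and the target disk count are only illustrative:

  mdadm /dev/md0 --add /dev/sdd1          # add the new disk as a spare first
  mdadm --grow /dev/md0 --raid-disks=4    # then reshape the 3-disk raid5 onto it
  cat /proc/mdstat                        # reshape progress shows up here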
This code is based in part on earlier work by
"Steinar H. Gunderson" <sgunderson@bigfoot.com>
Though little of his code remains, having access to it, and having
discussed the issues with him greatly eased the process of creating
these patches. Thanks Steinar.
NeilBrown
[PATCH 001 of 5] md: Split disks array out of raid5 conf structure so it is easier to grow.
[PATCH 002 of 5] md: Allow stripes to be expanded in preparation for expanding an array.
[PATCH 003 of 5] md: Infrastructure to allow normal IO to continue while array is expanding.
[PATCH 004 of 5] md: Core of raid5 resize process
[PATCH 005 of 5] md: Final stages of raid5 expand code.
^ permalink raw reply [flat|nested] 72+ messages in thread* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown @ 2006-01-17 8:17 ` Michael Tokarev [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> ` (2 more replies) 2006-01-17 15:07 ` Mr. James W. Laferriere ` (2 subsequent siblings) 3 siblings, 3 replies; 72+ messages in thread From: Michael Tokarev @ 2006-01-17 8:17 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson NeilBrown wrote: > Greetings. > > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. Neil, is this online resizing/reshaping really needed? I understand all those words means alot for marketing persons - zero downtime, online resizing etc, but it is much safer and easier to do that stuff 'offline', on an inactive array, like raidreconf does - safer, easier, faster, and one have more possibilities for more complex changes. It isn't like you want to add/remove drives to/from your arrays every day... Alot of good hw raid cards are unable to perform such reshaping too. /mjt ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <fd8d0180601170121s1e6a55b7o@mail.gmail.com>]
* [PATCH 000 of 5] md: Introduction [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> @ 2006-01-17 9:38 ` Francois Barre 2006-01-19 0:35 ` Neil Brown 0 siblings, 1 reply; 72+ messages in thread From: Francois Barre @ 2006-01-17 9:38 UTC (permalink / raw) To: linux-raid 2006/1/17, Michael Tokarev <mjt@tls.msk.ru>: > NeilBrown wrote: > > Greetings. > > > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > > Neil, is this online resizing/reshaping really needed? Congratulations Neil, I was really expecting this feature, and will test as soon as possible. IMHO, being able to resize 'online' is really interesting, so I would thank Neil and Steinar much more than I would blame them, Michael :-p. Regarding box crash and process interruption, what is the remaining work to be done to save the process status efficiently, in order to resume resize process ? In my case, I would really wish to trust resizing enough to use it on working env. May I help you ? Anyway, the patchset you submitted appeared to me so clearly, neat and simple, that it looks a piece of cake to make it secure. I know it's wrong, but you know, you can take it as a congratulation for your code quality :-p Regards, F.-E.B. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 9:38 ` Francois Barre @ 2006-01-19 0:35 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:35 UTC (permalink / raw) To: Francois Barre; +Cc: linux-raid On Tuesday January 17, francois.barre@gmail.com wrote: > Regarding box crash and process interruption, what is the remaining > work to be done to save the process status efficiently, in order to > resume resize process ? design and implement ... It's not particularly hard, but it is a separate task and I wanted to keep it separate to ease code review. > In my case, I would really wish to trust resizing enough to use it on > working env. > May I help you ? Testing and code review is probably the most helpful thing you can do, thanks. > > Anyway, the patchset you submitted appeared to me so clearly, neat and > simple, that it looks a piece of cake to make it secure. I know it's > wrong, but you know, you can take it as a congratulation for your code > quality :-p Thanks :-) NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 8:17 ` Michael Tokarev [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> @ 2006-01-17 9:50 ` Sander 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 14:10 ` Steinar H. Gunderson 2 siblings, 1 reply; 72+ messages in thread From: Sander @ 2006-01-17 9:50 UTC (permalink / raw) To: Michael Tokarev; +Cc: NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson Michael Tokarev wrote (ao): > NeilBrown wrote: > > Greetings. > > > > In line with the principle of "release early", following are 5 > > patches against md in 2.6.latest which implement reshaping of a > > raid5 array. By this I mean adding 1 or more drives to the array and > > then re-laying out all of the data. > > Neil, is this online resizing/reshaping really needed? I understand > all those words means alot for marketing persons - zero downtime, > online resizing etc, but it is much safer and easier to do that stuff > 'offline', on an inactive array, like raidreconf does - safer, easier, > faster, and one have more possibilities for more complex changes. It > isn't like you want to add/remove drives to/from your arrays every > day... Alot of good hw raid cards are unable to perform such reshaping > too. I like the feature. Not only marketing prefers zero downtime you know :-) Actually, I don't understand why you bother at all. One writes the feature. Another uses it. How would this feature harm you? Kind regards, Sander -- Humilis IT Services and Solutions http://www.humilis.net ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 9:50 ` Sander @ 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 11:37 ` Francois Barre ` (3 more replies) 0 siblings, 4 replies; 72+ messages in thread From: Michael Tokarev @ 2006-01-17 11:26 UTC (permalink / raw) To: sander; +Cc: NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson Sander wrote: > Michael Tokarev wrote (ao): [] >>Neil, is this online resizing/reshaping really needed? I understand >>all those words means alot for marketing persons - zero downtime, >>online resizing etc, but it is much safer and easier to do that stuff >>'offline', on an inactive array, like raidreconf does - safer, easier, >>faster, and one have more possibilities for more complex changes. It >>isn't like you want to add/remove drives to/from your arrays every >>day... Alot of good hw raid cards are unable to perform such reshaping >>too. [] > Actually, I don't understand why you bother at all. One writes the > feature. Another uses it. How would this feature harm you? This is about code complexity/bloat. It's already complex enough. I rely on the stability of the linux softraid subsystem, and want it to be reliable. Adding more features, especially non-trivial ones, does not buy you a bug-free raid subsystem, just the opposite: it will have more chances to crash, to eat your data etc, and will be harder in finding/fixing bugs. Raid code is already too fragile; I'm afraid "simple" I/O errors (which is what we need raid for) may crash the system already, and I am waiting for the next whole system crash due to e.g. a superblock update error or whatnot. I saw all sorts of failures due to linux softraid already (we use it here a lot), including ones which required complete array rebuild with heavy data loss. Any "unnecessary bloat" (note the quotes: I understand some people like this and other features) makes the whole system even more fragile than it is already. Compare this with my statement about the "offline" "reshaper" above: a separate userspace program (easier to write/debug compared with kernel space) which operates on an inactive array (no locking needed, no need to worry about other I/O operations going to the array at the time of reshaping etc), with an ability to plan its I/O strategy in a much more efficient and safer way... Yes, this approach has one downside: the array has to be inactive. But in my opinion it's worth it, compared to more possibilities to lose your data, even if you do NOT use that feature at all... /mjt ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev @ 2006-01-17 11:37 ` Francois Barre 2006-01-17 14:03 ` Kyle Moffett ` (2 subsequent siblings) 3 siblings, 0 replies; 72+ messages in thread From: Francois Barre @ 2006-01-17 11:37 UTC (permalink / raw) To: linux-raid 2006/1/17, Michael Tokarev <mjt@tls.msk.ru>: > Sander wrote: > This is about code complexity/bloat. It's already complex enouth. > I rely on the stability of the linux softraid subsystem, and want > it to be reliable. Adding more features, especially non-trivial > ones, does not buy you bugfree raid subsystem, just the opposite: > it will have more chances to crash, to eat your data etc, and will > be harder in finding/fixing bugs. > > Raid code is already too fragile, i'm afraid "simple" I/O errors > (which is what we need raid for) may crash the system already, and > am waiting for the next whole system crash due to eg superblock > update error or whatnot. I saw all sorts of failures due to > linux softraid already (we use it here alot), including ones > which required complete array rebuild with heavy data loss. > > Any "unnecessary bloat" (note the quotes: I understand some > people like this and other features) makes whole system even > more fragile than it is already. > > Compare this with my statement about "offline" "reshaper" above: > separate userspace (easier to write/debug compared with kernel > space) program which operates on an inactive array (no locking > needed, no need to worry about other I/O operations going to the > array at the time of reshaping etc), with an ability to plan it's > I/O strategy in alot more efficient and safer way... Yes this > apprpach has one downside: the array has to be inactive. But in > my opinion it's worth it, compared to more possibilities to lose > your data, even if you do NOT use that feature at all... > > /mjt I do agree with you about that : the lesser the code, the fewer the bugs. Of course. But I do think that Linux would not have become what it is now if each-and-every new feature was debated for years and years about their risk. Anyway, having a suspiously bogus raid5 resize is not a fatality : what about having a 'paranoïd' option/strategy, decreasing performance but mirroring superblocks on another device (not necessarily on the array), logging/journalling loads of stuff (metadata quite exclusively), and making resize much much stronger ? I prefer having my raid5 online and have a resize time of 12 hours than putting it offline and have a resize time of 2 hours. This is not marketting, this is the way computers shall behave :-p. Trustworthy features and flexibility in the same box. F.-E.B. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 11:37 ` Francois Barre @ 2006-01-17 14:03 ` Kyle Moffett 2006-01-19 0:28 ` Neil Brown 0 siblings, 1 reply; 72+ messages in thread From: Kyle Moffett @ 2006-01-17 14:03 UTC (permalink / raw) To: Michael Tokarev Cc: sander, NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson On Jan 17, 2006, at 06:26, Michael Tokarev wrote: > This is about code complexity/bloat. It's already complex enouth. > I rely on the stability of the linux softraid subsystem, and want > it to be reliable. Adding more features, especially non-trivial > ones, does not buy you bugfree raid subsystem, just the opposite: > it will have more chances to crash, to eat your data etc, and will > be harder in finding/fixing bugs. What part of: "You will need to enable the experimental MD_RAID5_RESHAPE config option for this to work." isn't obvious? If you don't want this feature, either don't turn on CONFIG_MD_RAID5_RESHAPE, or don't use the raid5 mdadm reshaping command. This feature might be extremely useful for some people (including me on occasion), but I would not trust it even on my family's fileserver (let alone a corporate one) until it's been through several generations of testing and bugfixing. Cheers, Kyle Moffett -- There is no way to make Linux robust with unreliable memory subsystems, sorry. It would be like trying to make a human more robust with an unreliable O2 supply. Memory just has to work. -- Andi Kleen ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 14:03 ` Kyle Moffett @ 2006-01-19 0:28 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:28 UTC (permalink / raw) To: Kyle Moffett Cc: Michael Tokarev, sander, linux-raid, linux-kernel, Steinar H. Gunderson On Tuesday January 17, mrmacman_g4@mac.com wrote: > On Jan 17, 2006, at 06:26, Michael Tokarev wrote: > > This is about code complexity/bloat. It's already complex enouth. > > I rely on the stability of the linux softraid subsystem, and want > > it to be reliable. Adding more features, especially non-trivial > > ones, does not buy you bugfree raid subsystem, just the opposite: > > it will have more chances to crash, to eat your data etc, and will > > be harder in finding/fixing bugs. > > What part of: "You will need to enable the experimental > MD_RAID5_RESHAPE config option for this to work." isn't bvious? If > you don't want this feature, either don't turn on > CONFIG_MD_RAID5_RESHAPE, or don't use the raid5 mdadm reshaping > command. This isn't really a fair comment. CONFIG_MD_RAID5_RESHAPE just enables the code. All the code is included whether this config option is set or not. So if code-bloat were an issue, the config option wouldn't answer it. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev 2006-01-17 11:37 ` Francois Barre 2006-01-17 14:03 ` Kyle Moffett @ 2006-01-17 16:08 ` Ross Vandegrift 2006-01-17 18:12 ` Michael Tokarev 2006-01-17 22:38 ` Phillip Susi 3 siblings, 1 reply; 72+ messages in thread From: Ross Vandegrift @ 2006-01-17 16:08 UTC (permalink / raw) To: Michael Tokarev; +Cc: linux-raid, linux-kernel On Tue, Jan 17, 2006 at 02:26:11PM +0300, Michael Tokarev wrote: > Raid code is already too fragile, i'm afraid "simple" I/O errors > (which is what we need raid for) may crash the system already, and > am waiting for the next whole system crash due to eg superblock > update error or whatnot. I think you've got some other problem if simple I/O errors cause issues. I've managed hundreds of MD arrays over the past ~ten years. MD is rock solid. I'd guess that I've recovered at least a hundred disk failures where data was saved by mdadm. What is your setup like? It's also possible that you've found a bug. > I saw all sorts of failures due to > linux softraid already (we use it here alot), including ones > which required complete array rebuild with heavy data loss. Are you sure? The one thing that's not always intuitive about MD - a failed array often still has your data and you can recover it. Unlike hardware RAID solutions, you have a lot of control over how the disks are assembled and used - this can be a major advantage. I'd say once a week someone comes on the linux-raid list and says "Oh no! I accidentally ruined my RAID array!". Neil almost always responds "Well, don't do that! But since you did, this might help...". -- Ross Vandegrift ross@lug.udel.edu "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 ^ permalink raw reply [flat|nested] 72+ messages in thread
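The usual shape of that "this might help" advice is a forced assemble. A sketch with made-up device names; --force re-marks recently failed members as usable, so it should only be tried after reading the kernel log to see which member dropped out last:

  mdadm --stop /dev/md0                   # if a half-assembled array is in the way
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
  mdadm --detail /dev/md0                 # check which members came back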
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 16:08 ` Ross Vandegrift @ 2006-01-17 18:12 ` Michael Tokarev 2006-01-18 8:14 ` Sander 2006-01-19 0:22 ` Neil Brown 0 siblings, 2 replies; 72+ messages in thread From: Michael Tokarev @ 2006-01-17 18:12 UTC (permalink / raw) To: Ross Vandegrift; +Cc: linux-raid, linux-kernel Ross Vandegrift wrote: > On Tue, Jan 17, 2006 at 02:26:11PM +0300, Michael Tokarev wrote: > >>Raid code is already too fragile, i'm afraid "simple" I/O errors >>(which is what we need raid for) may crash the system already, and >>am waiting for the next whole system crash due to eg superblock >>update error or whatnot. > > I think you've got some other issue if simple I/O errors cause issues. > I've managed hundreds of MD arrays over the past ~ten years. MD is > rock solid. I'd guess that I've recovered at least a hundred disk failures > where data was saved by mdadm. > > What is your setup like? It's also possible that you've found a bug. We've about 500 systems with raid1, raid5 and raid10 running for about 5 or 6 years (since 0.90 beta patched into 2.2 kernel -- I don't think linux softraid existed before, or, rather, I can't say that was something which was possible to use in production). Most problematic case so far, which I described numerous times (like, "why linux raid isn't Raid really, why it can be worse than plain disk") is when, after single sector read failure, md kicks the whole disk off the array, and when you start resync (after replacing the "bad" drive or just remapping that bad sector or even doing nothing, as it will be remapped in almost all cases during write, on real drives anyway), you find another "bad sector" on another drive. After this, the array can't be started anymore, at least not w/o --force (ie, requires some user intervention, which is sometimes quite difficult if the server is several 100s miles away). More, it's quite difficult to recover it even manually (after --force'ing it to start), without fixing that bad sector somehow -- if first drive failure is "recent enouth" we've a hope that this very sector can be read from that first drive. if the alot of filesystem activity happened since that time, that chances are quite small; and with raid5 it's quite difficult to say where the error is in the filesystem, due to the complex layout of raid5. But this has been described here numerous times, and - hopefully - with current changes (re-writing of bad blocks) this very issue will go away, at least most common scenario of it (i'd try to keep even "bad" drive, even after some write errors, because it still contains some data which can be read; but that's problematic to say the best because one has to store a list of bad blocks somewhere...). (And no, I don't have all bad/cheap drives - it's just when you have hundreds or 1000s of drives, you've quite high probability that some of them will fail sometimes, or will develop a bad sector etc). >>I saw all sorts of failures due to >>linux softraid already (we use it here alot), including ones >>which required complete array rebuild with heavy data loss. > > Are you sure? The one thing that's not always intuitive about MD - a > faild array often still has your data and you can recover it. Unlike > hardware RAID solutions, you have a lot of control over how the disks > are assembled and used - this can be a major advantage. > > I'd say once a week someone comes on the linux-raid list and says "Oh no! > I accidently ruined my RAID array!". Neil almost always responds "Well, > don't do that! 
But since you did, this might help...". I know that. And I've quite some expirience too, and I studied mdadm source. There was in fact two cases like that, not one. First was mostly due to operator error, or lack of better choice at 2.2 (or early 2.4) times -- I relied on raid autodetection (which I don't do anymore, and strongly suggest others to switch to mdassemble or something like that) -- a drive failed (for real, not bad blocks) and needed to be replaced, and I forgot to clear the partition table on the replacement drive (which was in our testing box) - in a result, kernel assembled a raid5 out of components which belonged to different arrays.. I only vaguely remember what it was at that time -- maybe kernel or I started reconstruction (not noticiyng the wrong array), or i mounted the filesystem - can't say anymore for sure, but the result was that I wasn't able to restore the filesystem, because i didn't have that filesystem anymore. (it should have been assembling boot raid1 array but assembled a degraided raid5 instead) And second case was when, after an attempt to resync the array (after that famous 'bad block kicked off the whole disk) which resulted in an OOPS (which I didn't notice immediately, but it continued the resync), it wrote some garbage all over, resulting in badly broken filesystem, and somewhat broken nearby partition too (which I was able to recover). It was at about 2.4.19 or so, and I had that situation only once. Granted, I can't blame raid code for all this, because I don't even know what was in the oops (machine locked hard but someone who was near the server noticied it OOPSed) - it sure may be a bug somewhere else. As a sort of conclusion. There are several features that can be implemented in linux softraid code to make it real Raid, with data safety goal. One example is to be able to replace a "to-be-failed" drive (think SMART failure predictions for example) without removing it from the array with a (hot)spare (or just a replacement) -- by adding the new drive to the array *first*, and removing the to-be-replaced one only after new is fully synced. Another example is to implement some NVRAM-like storage for metadata (this will require the necessary hardware as well, like eg a flash card -- I dunno how safe it can be). And so on. The current MD code is "almost here", almost real. It still has some (maybe minor) problems, it still lacks some (again maybe minor) features wrt data safety. Ie, it still can fail, but it's almost here. While current development is going to implement some new and non-trivial features which are of little use in real life. Face it: yes it's good when you're able to reshape your array online keeping servicing your users, but i'd go for even 12 hours downtime if i know my data is safe, instead of unknown downtime after I realize the reshape failed for some reason and I dont have my data anymore. And yes it's very rarely used (which adds to the problem - rarely used code paths with bugs with stays unfound for alot of time, and bite you at a very unexpected moment, when you think it's all ok...) Well, not all is that bad really. I really apprecate Neil's work, it's all his baby after all, and I owe him alot of stuff because of all our machines which, due to raid code, are running fine (most of them anyway). I had a hopefully small question, whenever the new features are really useful, and just described my point of view to the topic.. And answered your, Ross, questions as well.. ;) Thank you. 
/mjt ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 18:12 ` Michael Tokarev @ 2006-01-18 8:14 ` Sander 2006-01-18 8:37 ` Brad Campbell ` (2 more replies) 2006-01-19 0:22 ` Neil Brown 1 sibling, 3 replies; 72+ messages in thread From: Sander @ 2006-01-18 8:14 UTC (permalink / raw) To: Michael Tokarev; +Cc: Ross Vandegrift, linux-raid, linux-kernel Michael Tokarev wrote (ao): > Most problematic case so far, which I described numerous times (like, > "why linux raid isn't Raid really, why it can be worse than plain > disk") is when, after single sector read failure, md kicks the whole > disk off the array, and when you start resync (after replacing the > "bad" drive or just remapping that bad sector or even doing nothing, > as it will be remapped in almost all cases during write, on real > drives anyway), If the (harddisk internal) remap succeeded, the OS doesn't see the bad sector at all I believe. If you (the OS) do see a bad sector, the disk couldn't remap, and goes downhill from there, right? Sander -- Humilis IT Services and Solutions http://www.humilis.net ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 8:14 ` Sander @ 2006-01-18 8:37 ` Brad Campbell 2006-01-18 9:03 ` Alan Cox 2006-01-18 12:46 ` John Hendrikx 2 siblings, 0 replies; 72+ messages in thread From: Brad Campbell @ 2006-01-18 8:37 UTC (permalink / raw) To: sander; +Cc: Michael Tokarev, Ross Vandegrift, linux-raid linux-kernel snipped from cc list. Sander wrote: > Michael Tokarev wrote (ao): >> Most problematic case so far, which I described numerous times (like, >> "why linux raid isn't Raid really, why it can be worse than plain >> disk") is when, after single sector read failure, md kicks the whole >> disk off the array, and when you start resync (after replacing the >> "bad" drive or just remapping that bad sector or even doing nothing, >> as it will be remapped in almost all cases during write, on real >> drives anyway), This particular case has been addressed in the latest kernels. md will now attempt to write the bad block back using reconstructed data and the disk will only be punted after multiple failures or a write failure (if my understanding of the patches is any good anyway) > If the (harddisk internal) remap succeeded, the OS doesn't see the bad > sector at all I believe. If the disk can get a good read then it will re-map on the fly and the OS has no idea there was an issue. If not then it returns a read error to the OS. When that sector is next written it will be re-mapped by the drive and the error disappears. > If you (the OS) do see a bad sector, the disk couldn't remap, and goes > downhill from there, right? With the older md code, yes, however as stated above this should almost become a non-issue now. (yay!) Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 8:14 ` Sander 2006-01-18 8:37 ` Brad Campbell @ 2006-01-18 9:03 ` Alan Cox 2006-01-18 12:46 ` John Hendrikx 2 siblings, 0 replies; 72+ messages in thread From: Alan Cox @ 2006-01-18 9:03 UTC (permalink / raw) To: sander; +Cc: Michael Tokarev, Ross Vandegrift, linux-raid, linux-kernel On Mer, 2006-01-18 at 09:14 +0100, Sander wrote: > If the (harddisk internal) remap succeeded, the OS doesn't see the bad > sector at all I believe. True for ATA, in the SCSI case you may be told about the remap having occurred but its a "by the way" type message not an error proper. > If you (the OS) do see a bad sector, the disk couldn't remap, and goes > downhill from there, right? If a hot spare is configured it will be dropped into the configuration at that point. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 8:14 ` Sander 2006-01-18 8:37 ` Brad Campbell 2006-01-18 9:03 ` Alan Cox @ 2006-01-18 12:46 ` John Hendrikx 2006-01-18 12:51 ` Gordon Henderson 2006-01-18 23:54 ` Neil Brown 2 siblings, 2 replies; 72+ messages in thread From: John Hendrikx @ 2006-01-18 12:46 UTC (permalink / raw) To: linux-raid Sander wrote: > Michael Tokarev wrote (ao): > >> Most problematic case so far, which I described numerous times (like, >> "why linux raid isn't Raid really, why it can be worse than plain >> disk") is when, after single sector read failure, md kicks the whole >> disk off the array, and when you start resync (after replacing the >> "bad" drive or just remapping that bad sector or even doing nothing, >> as it will be remapped in almost all cases during write, on real >> drives anyway), >> > > If the (harddisk internal) remap succeeded, the OS doesn't see the bad > sector at all I believe. > Most hard disks will not remap sectors when reading fails, because then the contents would be lost permanently. Instead, they will report a failure to the OS, hoping that the sector might be readable at some later time. What Linux Raid could do is reconstructing the sector that failed from the other drives and then writing it to disk. Because the original contents of the sector will be lost on writing, your hard disk can safely remap the sector (and it will -- I often "repaired" bad sectors by writing to them). > If you (the OS) do see a bad sector, the disk couldn't remap, and goes > downhill from there, right? > Not necessarily, if you see a bad sector after *writing* to it (several times), then your hard disk will probably go bad soon. Most hard disks only remap sectors on write, so a simple full format can fix sectors that failed on read. I agree with the original poster though, I'd really love to see Linux Raid take special action on sector read failures. It happens about 5-6 times a year here that a disk gets kicked out of the array for a simple read failure. A rebuild of the array will fix it without a trace, but a rebuild takes about 3 hours :) --John ^ permalink raw reply [flat|nested] 72+ messages in thread
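For completeness, the "repair by writing" trick looks something like the following. Illustrative only: the LBA and device are made up, the write destroys whatever was in that sector, and on an md member you would normally want the raid layer to do the rewrite with reconstructed data instead:

  # overwrite one 512-byte sector so the drive gets a chance to remap it on write
  dd if=/dev/zero of=/dev/sdb bs=512 seek=1234567 count=1
  # then see whether the drive's remap counters moved (assumes smartmontools)
  smartctl -A /dev/sdb | grep -i -e reallocated -e pending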
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 12:46 ` John Hendrikx @ 2006-01-18 12:51 ` Gordon Henderson 2006-01-18 23:51 ` Neil Brown 1 sibling, 1 reply; 72+ messages in thread From: Gordon Henderson @ 2006-01-18 12:51 UTC (permalink / raw) To: linux-raid On Wed, 18 Jan 2006, John Hendrikx wrote: > I agree with the original poster though, I'd really love to see Linux > Raid take special action on sector read failures. It happens about 5-6 > times a year here that a disk gets kicked out of the array for a simple > read failure. A rebuild of the array will fix it without a trace, but a > rebuild takes about 3 hours :) One thing that's well worth doing before simply fail/remove/add the drive with the bad sector, is to do a read-only test on the other drives/partitions in the rest of the set. That way you won't find out half-way through the resync that other drives have failures, and then lose the lot. It adds time to the whole operation, but it's worth it IMO. Gordon ^ permalink raw reply [flat|nested] 72+ messages in thread
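That read-only sweep can be as simple as the following; the member names are examples, and a non-zero exit from dd is the "think twice before rebuilding" signal:

  for d in /dev/sda1 /dev/sdb1 /dev/sdc1; do
      echo "reading $d"
      dd if="$d" of=/dev/null bs=1M || echo "READ ERROR on $d"
  done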
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 12:51 ` Gordon Henderson @ 2006-01-18 23:51 ` Neil Brown 2006-01-19 7:20 ` PFC 0 siblings, 1 reply; 72+ messages in thread From: Neil Brown @ 2006-01-18 23:51 UTC (permalink / raw) To: Gordon Henderson; +Cc: linux-raid On Wednesday January 18, gordon@drogon.net wrote: > On Wed, 18 Jan 2006, John Hendrikx wrote: > > > I agree with the original poster though, I'd really love to see Linux > > Raid take special action on sector read failures. It happens about 5-6 > > times a year here that a disk gets kicked out of the array for a simple > > read failure. A rebuild of the array will fix it without a trace, but a > > rebuild takes about 3 hours :) > > One thing that's well worth doing before simply fail/remove/add the drive > with the bad sector, is to do a read-only test on the other > drives/paritions in the rest of the set. That way you won't find out > half-way through the resync that other drives have failures, and then lose > the lot. It adds time to the whole operation, but it's worth it IMO. But what do you do if the read-only test fails... I guess you try to reconstruct using the nearly-failed drive... What might be good and practical is to not remove a failed drive completely, but to hold on to it and only read from it in desperation while reconstructing a spare. That might be worth the effort... NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 23:51 ` Neil Brown @ 2006-01-19 7:20 ` PFC 2006-01-19 8:01 ` dean gaudet 0 siblings, 1 reply; 72+ messages in thread From: PFC @ 2006-01-19 7:20 UTC (permalink / raw) To: Neil Brown, Gordon Henderson; +Cc: linux-raid While we're at it, here's a little issue I had with RAID5 ; not really the fault of md, but you might want to know... I have a 5x250GB RAID5 array for home storage (digital photo, my lossless ripped cds, etc). 1 IDE Drive ave 4 SATA Drives. Now, turns out one of the SATA drives is a Maxtor 6V250F0, and these have problems ; it died, then was RMA'd, then died again. Finally, it turned out this drive series is incompatible with nvidia sata chipsets. A third drive seems to work, setting the jumper to SATA 150. Back to the point. Failure mode of these drives is an IDE command timeout. This takes a long time ! So, when the drive has failed, each command to it takes forever. md will eventually reject said drive, but it takes hours ; and meanwhile, the computer is unusable and data is offline... In this case, the really tempting solution is to hit the windows key (er, the hard reset button) ; but doing this, makes the array dirty and degraded, and it won't mount, and all data is seemingly lost. (well, recoverable with a bit of hacking /* goto error; */, but that's not very clean...) This isn't really a md issue, but it's really annoying only when using RAID, because it makes a normal process (kicking a dead drive out) so slow it's almost non-functional. Is there a way to modify the timeout in question ? Note that, re-reading the log below, it writes "Disk failure on sdd1, disabling device. Operation continuing on 4 devices", but errors continue to come, and the array is still unreachable (ie. cat /proc/mdstat hangs, etc). Hmm... Thanks for the time. Jan 8 21:38:41 apollo13 ReiserFS: md2: checking transaction log (md2) Jan 8 21:39:11 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21 Jan 8 21:39:11 apollo13 ata4: translated ATA stat/err 0xca/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jan 8 21:39:11 apollo13 ata4: status=0xca { Busy } Jan 8 21:39:11 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002 Jan 8 21:39:11 apollo13 sdd: Current: sense key=0xb Jan 8 21:39:11 apollo13 ASC=0x47 ASCQ=0x0 Jan 8 21:39:11 apollo13 Info fld=0x3f Jan 8 21:39:11 apollo13 end_request: I/O error, dev sdd, sector 63 Jan 8 21:39:11 apollo13 raid5: Disk failure on sdd1, disabling device. 
Operation continuing on 4 devices Jan 8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:41 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21 Jan 8 21:39:41 apollo13 ata4: translated ATA stat/err 0xca/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jan 8 21:39:41 apollo13 ata4: status=0xca { Busy } Jan 8 21:39:41 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002 Jan 8 21:39:41 apollo13 sdd: Current: sense key=0xb Jan 8 21:39:41 apollo13 ASC=0x47 ASCQ=0x0 Jan 8 21:39:41 apollo13 Info fld=0x9840097 Jan 8 21:39:41 apollo13 end_request: I/O error, dev sdd, sector 159645847 Jan 8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977 Jan 8 21:40:01 apollo13 cron[17973]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons ) Jan 8 21:40:11 apollo13 ata4: command 0x35 timeout, stat 0xd0 host_stat 0x21 Jan 8 21:40:11 apollo13 ata4: translated ATA stat/err 0x35/00 to SCSI SK/ASC/ASCQ 0x4/00/00 Jan 8 21:40:11 apollo13 ata4: status=0x35 { DeviceFault SeekComplete CorrectedError Error } Jan 8 21:40:11 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002 Jan 8 21:40:11 apollo13 sdd: Current: sense key=0x4 Jan 8 21:40:11 apollo13 ASC=0x0 ASCQ=0x0 Jan 8 21:40:11 apollo13 end_request: I/O error, dev sdd, sector 465232831 ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 7:20 ` PFC @ 2006-01-19 8:01 ` dean gaudet 0 siblings, 0 replies; 72+ messages in thread From: dean gaudet @ 2006-01-19 8:01 UTC (permalink / raw) To: PFC; +Cc: Neil Brown, Gordon Henderson, linux-raid On Thu, 19 Jan 2006, PFC wrote: > This isn't really a md issue, but it's really annoying only when using > RAID, because it makes a normal process (kicking a dead drive out) so slow > it's almost non-functional. Is there a way to modify the timeout in question ? yeah i posted to l-k about similar problems a while back... i've got a disk which boots fine but fails all writes... useful for showing just how bad the system can become with a dead/dying disk. western digital is selling "raid edition" disks now -- and part of their marketing material discusses the long timeout which commodity disks implement <http://www.westerndigital.com/en/library/sata/2579-001098.pdf>. the raid edition disks give up earlier on the assumption the raid layer is going to take care of things. it's really too bad this isn't just a tunable parameter of the disk. even still -- the linux kernel could probably do something about this... drivers could have a blockdev(8) tunable timeout, and a mode where the driver just gives up entirely on the device at the first error/timeout and return EIO for all outstanding requests at that point... and the driver could remain in this state until an explicit request to re-attempt normal operations. -dean ^ permalink raw reply [flat|nested] 72+ messages in thread
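There is at least one tunable in roughly that direction already: the SCSI midlayer exposes a per-device command timeout through sysfs. Note this bounds how long the kernel waits per command, not how long the drive retries internally, and whether shortening it actually helps depends on the low-level driver; treat the path and the value as assumptions to verify, not a fix:

  cat /sys/block/sdd/device/timeout        # seconds the midlayer allows per command
  echo 10 > /sys/block/sdd/device/timeout  # shorten it for a raid member (example value)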
* Re: [PATCH 000 of 5] md: Introduction 2006-01-18 12:46 ` John Hendrikx 2006-01-18 12:51 ` Gordon Henderson @ 2006-01-18 23:54 ` Neil Brown 1 sibling, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-18 23:54 UTC (permalink / raw) To: John Hendrikx; +Cc: linux-raid On Wednesday January 18, hjohn@xs4all.nl wrote: > > I agree with the original poster though, I'd really love to see Linux > Raid take special action on sector read failures. It happens about 5-6 > times a year here that a disk gets kicked out of the array for a simple > read failure. A rebuild of the array will fix it without a trace, but a > rebuild takes about 3 hours :) See 2.6.15 (for raid5) or 2.6.16-rc1 (for raid1). You'll love it! NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 18:12 ` Michael Tokarev 2006-01-18 8:14 ` Sander @ 2006-01-19 0:22 ` Neil Brown 2006-01-19 9:01 ` Jakob Oestergaard 1 sibling, 1 reply; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:22 UTC (permalink / raw) To: Michael Tokarev; +Cc: Ross Vandegrift, linux-raid, linux-kernel On Tuesday January 17, mjt@tls.msk.ru wrote: > > As a sort of conclusion. > > There are several features that can be implemented in linux softraid > code to make it real Raid, with data safety goal. One example is to > be able to replace a "to-be-failed" drive (think SMART failure > predictions for example) without removing it from the array with a > (hot)spare (or just a replacement) -- by adding the new drive to the > array *first*, and removing the to-be-replaced one only after new is > fully synced. Another example is to implement some NVRAM-like storage > for metadata (this will require the necessary hardware as well, like > eg a flash card -- I dunno how safe it can be). And so on. proactive replacement before complete failure is a good idea and is (just recently) on my todo list. It shouldn't be too hard. > > The current MD code is "almost here", almost real. It still has some > (maybe minor) problems, it still lacks some (again maybe minor) features > wrt data safety. Ie, it still can fail, but it's almost here. concrete suggestions are always welcome (though sometimes you might have to put some effort into convincing me...) > > While current development is going to implement some new and non-trivial > features which are of little use in real life. Face it: yes it's good > when you're able to reshape your array online keeping servicing your > users, but i'd go for even 12 hours downtime if i know my data is safe, > instead of unknown downtime after I realize the reshape failed for some > reason and I dont have my data anymore. And yes it's very rarely used > (which adds to the problem - rarely used code paths with bugs with stays > unfound for alot of time, and bite you at a very unexpected moment, when > you think it's all ok...) If you look at the amount of code in the 'reshape raid5' patch you will notice that it isn't really very much. It reuses a lot of the infrastructure that is already present in md/raid5. So a reshape actually uses a lot of code that is used very often. Compare this to an offline solution (raidreconfig) where all the code is only used occasionally. You could argue that the online version has more code safety than the offline version.... NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-19 0:22 ` Neil Brown @ 2006-01-19 9:01 ` Jakob Oestergaard 0 siblings, 0 replies; 72+ messages in thread From: Jakob Oestergaard @ 2006-01-19 9:01 UTC (permalink / raw) To: Neil Brown; +Cc: Michael Tokarev, Ross Vandegrift, linux-raid, linux-kernel On Thu, Jan 19, 2006 at 11:22:31AM +1100, Neil Brown wrote: ... > Compare this to an offline solution (raidreconfig) where all the code > is only used occasionally. You could argue that the online version > has more code safety than the offline version.... Correct. raidreconf, however, can convert a 2 disk RAID-0 to a 4 disk RAID-5 for example - the whole design of raidreconf is fundamentally different (of course) from the on-line reshape. The on-line reshape can be (and should be) much simpler. Now, back when I wrote raidreconf, my thoughts were that md would be merged into dm, and that raidreconf should evolve into something like 'pvmove' - a user-space tool that moves blocks around, interfacing with the kernel as much as strictly necessary, allowing hot reconfiguration of RAID setups. That was the idea. Reality, however, seems to be that MD is not moving quickly into DM (for whatever reasons). Also, I haven't had the time to actually just move on this myself. Today, raidreconf is used by some, but it is not maintained, and it is often too slow for comfortable off-line usage (reconfiguration of TB sized arrays is slow - not so much because of raidreconf, but because there simply is a lot of data that needs to be moved around). I still think that putting MD into DM and extending pvmove to include raidreconf functionality, would be the way to go. The final solution should also be tolerant (like pvmove is today) of power cycles during reconfiguration - the operation should be re-startable. Anyway - this is just me dreaming - I don't have time to do this and it seems that currently noone else has either. Great initiative with the reshape Neil - hot reconfiguration is much needed - personally I still hope to see MD move into DM and pvmove including raidreconf functionality, but I guess that when we're eating an elephant we should be satisfied with taking one bite at a time :) -- / jakob ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 11:26 ` Michael Tokarev ` (2 preceding siblings ...) 2006-01-17 16:08 ` Ross Vandegrift @ 2006-01-17 22:38 ` Phillip Susi 2006-01-17 22:57 ` Neil Brown 3 siblings, 1 reply; 72+ messages in thread From: Phillip Susi @ 2006-01-17 22:38 UTC (permalink / raw) To: Michael Tokarev Cc: sander, NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson Michael Tokarev wrote: <snip> > Compare this with my statement about "offline" "reshaper" above: > separate userspace (easier to write/debug compared with kernel > space) program which operates on an inactive array (no locking > needed, no need to worry about other I/O operations going to the > array at the time of reshaping etc), with an ability to plan it's > I/O strategy in alot more efficient and safer way... Yes this > apprpach has one downside: the array has to be inactive. But in > my opinion it's worth it, compared to more possibilities to lose > your data, even if you do NOT use that feature at all... > > I also like the idea of this kind of thing going in user space. I was also under the impression that md was going to be phased out and replaced by the device mapper. I've been kicking around the idea of a user space utility that manipulates the device mapper tables and performs block moves itself to reshape a raid array. It doesn't seem like it would be that difficult and would not require modifying the kernel at all. The basic idea is something like this: /dev/mapper/raid is your raid array, which is mapped to a stripe between /dev/sda, /dev/sdb. When you want to expand the stripe to add /dev/sdc to the array, you create three new devices: /dev/mapper/raid-old: copy of the old mapper table, striping sda and sdb /dev/mapper/raid-progress: linear map with size = new stripe width, and pointing to raid-new /dev/mapper/raid-new: what the raid will look like when done, i.e. stripe of sda, sdb, and sdc Then you replace /dev/mapper/raid with a linear map to raid-new, raid-progress, and raid-old, in that order. Initially the length of the chunks from raid-progress and raid-new are zero, so you will still be entirely accessing raid-old. For each stripe in the array, you change raid-progress to point to the corresponding blocks in raid-new, but suspended, so IO to this stripe will block. Then you update the raid map so raid-progress overlays the stripe you are working on to catch IO instead of allowing it to go to raid-old. After you read that stripe from raid-old and write it to raid-new, resume raid-progress to flush any blocked writes to the raid-new stripe. Finally update raid so the previously in progress stripe now maps to raid-new. Repeat for each stripe in the array, and finally replace the raid table with raid-new's table, and delete the 3 temporary devices. Adding transaction logging to the user mode utility wouldn't be very hard either. ^ permalink raw reply [flat|nested] 72+ messages in thread
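A rough dmsetup sketch of the layering described above, just to make the idea concrete. All sizes (in 512-byte sectors), chunk sizes and device names are invented for illustration, and the per-stripe copy/reload loop is only indicated in comments:

  # old geometry: 2-disk stripe; new geometry: 3-disk stripe; 64KB (128-sector) chunks
  echo "0 1048576 striped 2 128 /dev/sda 0 /dev/sdb 0" | dmsetup create raid-old
  echo "0 1572864 striped 3 128 /dev/sda 0 /dev/sdb 0 /dev/sdc 0" | dmsetup create raid-new
  # raid-progress covers one new-geometry stripe (3 * 128 sectors); keep it
  # suspended so I/O to the stripe being copied blocks until the copy is done
  echo "0 384 linear /dev/mapper/raid-new 0" | dmsetup create raid-progress
  dmsetup suspend raid-progress
  # top-level device: initially everything still maps to the old layout
  echo "0 1048576 linear /dev/mapper/raid-old 0" | dmsetup create raid
  # per stripe: reload 'raid' so the stripe being moved points at raid-progress,
  # copy that stripe from raid-old to raid-new (e.g. with dd), resume
  # raid-progress to release blocked writes, then reload 'raid' again so the
  # finished stripe maps to raid-new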
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 22:38 ` Phillip Susi @ 2006-01-17 22:57 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-17 22:57 UTC (permalink / raw) To: Phillip Susi Cc: Michael Tokarev, sander, linux-raid, linux-kernel, Steinar H. Gunderson On Tuesday January 17, psusi@cfl.rr.com wrote: > I was > also under the impression that md was going to be phased out and > replaced by the device mapper. I wonder where this sort of idea comes from.... Obviously individual distributions are free to support or not support whatever bits of code they like. And developers are free to add duplicate functionality to the kernel (I believe someone is working on a raid5 target for dm). But that doesn't mean that anything is going to be 'phased out'. md and dm, while similar, are quite different. They can both comfortably co-exist even if they have similar functionality. What I expect will happen (in line with what normally happens in Linux) is that both will continue to evolve as long as there is interest and developer support. They will quite possibly borrow ideas from each other where that is relevant. Parts of one may lose support and eventually die (as md/multipath is on the way to doing) but there is no wholesale 'phasing out' going to happen in either direction. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 8:17 ` Michael Tokarev [not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com> 2006-01-17 9:50 ` Sander @ 2006-01-17 14:10 ` Steinar H. Gunderson 2 siblings, 0 replies; 72+ messages in thread From: Steinar H. Gunderson @ 2006-01-17 14:10 UTC (permalink / raw) To: Michael Tokarev; +Cc: NeilBrown, linux-raid, linux-kernel On Tue, Jan 17, 2006 at 11:17:15AM +0300, Michael Tokarev wrote: > Neil, is this online resizing/reshaping really needed? I understand > all those words means alot for marketing persons - zero downtime, > online resizing etc, but it is much safer and easier to do that stuff > 'offline', on an inactive array, like raidreconf does - safer, easier, > faster, and one have more possibilities for more complex changes. Try the scenario where the resize takes a week, and you don't have enough spare disks to move it onto another server -- besides, that would take several days alone... This is the kind of use-case for which I wrote the original patch, and I'm grateful that Neil has picked it up again so we can finally get something working in. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown 2006-01-17 8:17 ` Michael Tokarev @ 2006-01-17 15:07 ` Mr. James W. Laferriere 2006-01-19 0:23 ` Neil Brown 2006-01-22 4:42 ` Adam Kropelin 2006-01-23 1:08 ` John Hendrikx 3 siblings, 1 reply; 72+ messages in thread From: Mr. James W. Laferriere @ 2006-01-17 15:07 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid maillist Hello Neil , On Tue, 17 Jan 2006, NeilBrown wrote: > Greetings. > > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. Please inform me of which of the 2.6.latest to use ? Tia , JimL The latest stable version of the Linux kernel is: 2.6.15.1 2006-01-15 06:14 UTC F V C Changelog The latest prepatch for the stable Linux kernel tree is: 2.6.16-rc1 2006-01-17 08:09 UTC V C Changelog The latest snapshot for the stable Linux kernel tree is: 2.6.15-git12 2006-01-16 08:04 UTC V C Changelog > This is still EXPERIMENTAL and could easily eat your data. Don't use it on > valuable data. Only use it for review and testing. > > This release does not make ANY attempt to record how far the reshape > has progressed on stable storage. That means that if the process is > interrupted either by a crash or by "mdadm -S", then you completely > lose your data. All of it. > So don't use it on valuable data. > > There are 5 patches to (hopefully) ease review. Comments are most > welcome, as are test results (providing they aren't done on valuable data:-). > > You will need to enable the experimental MD_RAID5_RESHAPE config option > for this to work. Please read the help message that come with it. > It gives an example mdadm command to effect a reshape (you do not need > a new mdadm, and vaguely recent version should work). > > This code is based in part on earlier work by > "Steinar H. Gunderson" <sgunderson@bigfoot.com> > Though little of his code remains, having access to it, and having > discussed the issues with him greatly eased the processed of creating > these patches. Thanks Steinar. > > NeilBrown > > [PATCH 001 of 5] md: Split disks array out of raid5 conf structure so it is easier to grow. > [PATCH 002 of 5] md: Allow stripes to be expanded in preparation for expanding an array. > [PATCH 003 of 5] md: Infrastructure to allow normal IO to continue while array is expanding. > [PATCH 004 of 5] md: Core of raid5 resize process > [PATCH 005 of 5] md: Final stages of raid5 expand code. -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | 3542 Broken Yoke Dr. | Give me Linux | | babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP | | http://www.asteriskhelpdesk.com/cgi-bin/astlance/r.cgi?babydr | +------------------------------------------------------------------+ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 15:07 ` Mr. James W. Laferriere @ 2006-01-19 0:23 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-19 0:23 UTC (permalink / raw) To: Mr. James W. Laferriere; +Cc: linux-raid maillist On Tuesday January 17, babydr@baby-dragons.com wrote: > Hello Neil , > > On Tue, 17 Jan 2006, NeilBrown wrote: > > Greetings. > > > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > Please inform me of which of the 2.6.latest to use ? Tia , JimL > > The latest stable version of the Linux kernel is: 2.6.15.1 2006-01-15 06:14 UTC F V C Changelog > The latest prepatch for the stable Linux kernel tree is: 2.6.16-rc1 2006-01-17 08:09 UTC V C Changelog > The latest snapshot for the stable Linux kernel tree is: 2.6.15-git12 2006-01-16 08:04 UTC V C Changelog Yes, any of those would be fine. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
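The help text mentioned in the introduction above is not reproduced in this thread, so the following is only a rough sketch of the documented reshape procedure (device names and the three-to-four-disk layout are illustrative assumptions, and the kernel must be built with the experimental MD_RAID5_RESHAPE option); Adam Kropelin's transcript below shows the same sequence run against a real array:

    # Assumed example: /dev/md0 is an existing 3-disk RAID-5, /dev/sdd is the new disk.
    mdadm /dev/md0 --add /dev/sdd            # new disk joins the array as a spare
    mdadm --grow /dev/md0 --raid-devices=4   # start the reshape onto 4 disks
    cat /proc/mdstat                         # watch reshape/recovery progress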
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown 2006-01-17 8:17 ` Michael Tokarev 2006-01-17 15:07 ` Mr. James W. Laferriere @ 2006-01-22 4:42 ` Adam Kropelin 2006-01-22 22:52 ` Neil Brown 2006-01-23 1:08 ` John Hendrikx 3 siblings, 1 reply; 72+ messages in thread From: Adam Kropelin @ 2006-01-22 4:42 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson NeilBrown <neilb@suse.de> wrote: > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. I've been looking forward to a feature like this, so I took the opportunity to set up a vmware session and give the patches a try. I encountered both success and failure, and here are the details of both. On the first try I neglected to read the directions and increased the number of devices first (which worked) and then attempted to add the physical device (which didn't work; at least not the way I intended). The result was an array of size 4, operating in degraded mode, with three active drives and one spare. I was unable to find a way to coax mdadm into adding the 4th drive as an active device instead of a spare. I'm not an mdadm guru, so there may be a method I overlooked. Here's what I did, interspersed with trimmed /proc/mdstat output: mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc md0 : active raid5 sda[0] sdc[2] sdb[1] 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] mdadm --grow -n4 /dev/md0 md0 : active raid5 sda[0] sdc[2] sdb[1] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] mdadm --manage --add /dev/md0 /dev/sdd md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] mdadm --misc --stop /dev/md0 mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] For my second try I actually read the directions and things went much better, aside from a possible /proc/mdstat glitch shown below. mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc md0 : active raid5 sda[0] sdc[2] sdb[1] 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] mdadm --manage --add /dev/md0 /dev/sdd md0 : active raid5 sdd[3](S) sdc[2] sdb[1] sda[0] 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] mdadm --grow -n4 /dev/md0 md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] ...should this be... --> [4/3] [UUU_] perhaps? [>....................] recovery = 0.4% (5636/1048512) finish=9.1min speed=1878K/sec [...time passes...] md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] 3145536 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] My final test was a repeat of #2, but with data actively being written to the array during the reshape (the previous tests were on an idle, unmounted array). This one failed pretty hard, with several processes ending up in the D state. I repeated it twice and sysrq-t dumps can be found at <http://www.kroptech.com/~adk0212/md-raid5-reshape-wedge.txt>. The writeout load was a kernel tree untar started shortly before the 'mdadm --grow' command was given. mdadm hung, as did tar. Any process which subsequently attempted to access the array hung as well.
A second attempt at the same thing hung similarly, although only pdflush shows up hung in that trace. mdadm and tar are missing for some reason. I'm happy to do more tests. It's easy to conjure up virtual disks and load them with irrelevant data (like kernel trees ;) --Adam ^ permalink raw reply [flat|nested] 72+ messages in thread
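For anyone trying to reproduce the hung-writer case or capture comparable sysrq-t dumps, a minimal sketch follows; the filesystem type, mount point and tarball name are assumptions rather than details taken from Adam's setup:

    # Generate sustained write load on the array while the reshape runs.
    mkfs.ext3 /dev/md0
    mount /dev/md0 /mnt/test
    tar xf linux-2.6.15.tar -C /mnt/test &   # kernel-tree untar as writeout load
    mdadm --grow /dev/md0 --raid-devices=4   # start the reshape under load

    # If processes wedge in the D state, dump kernel task traces
    # (requires CONFIG_MAGIC_SYSRQ):
    echo t > /proc/sysrq-trigger
    dmesg > sysrq-t.txt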
* Re: [PATCH 000 of 5] md: Introduction 2006-01-22 4:42 ` Adam Kropelin @ 2006-01-22 22:52 ` Neil Brown 2006-01-23 23:02 ` Adam Kropelin 0 siblings, 1 reply; 72+ messages in thread From: Neil Brown @ 2006-01-22 22:52 UTC (permalink / raw) To: Adam Kropelin; +Cc: NeilBrown, linux-raid, linux-kernel, Steinar H. Gunderson On Saturday January 21, akropel1@rochester.rr.com wrote: > NeilBrown <neilb@suse.de> wrote: > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > > I've been looking forward to a feature like this, so I took the > opportunity to set up a vmware session and give the patches a try. I > encountered both success and failure, and here are the details of both. > > On the first try I neglected to read the directions and increased the > number of devices first (which worked) and then attempted to add the > physical device (which didn't work; at least not the way I intended). > The result was an array of size 4, operating in degraded mode, with > three active drives and one spare. I was unable to find a way to coax > mdadm into adding the 4th drive as an active device instead of a > spare. I'm not an mdadm guru, so there may be a method I overlooked. > Here's what I did, interspersed with trimmed /proc/mdstat output: Thanks, this is exactly the sort of feedback I was hoping for - people testing things that I didn't think to... > > mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --grow -n4 /dev/md0 > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] I assume that no "resync" started at this point? It should have done. > > mdadm --manage --add /dev/md0 /dev/sdd > > md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] > > mdadm --misc --stop /dev/md0 > mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd > > md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] This really should have started a recovery.... I'll look into that too. > > For my second try I actually read the directions and things went much > better, aside from a possible /proc/mdstat glitch shown below. > > mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --manage --add /dev/md0 /dev/sdd > > md0 : active raid5 sdd[3](S) sdc[2] sdb[1] sda[0] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --grow -n4 /dev/md0 > > md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] > 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > ...should this be... --> [4/3] [UUU_] perhaps? Well, part of the array is "4/4 UUUU" and part is "3/3 UUU". How do you represent that? I think "4/4 UUUU" is best. > [>....................] recovery = 0.4% (5636/1048512) finish=9.1min speed=1878K/sec > > [...time passes...] > > md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > > My final test was a repeat of #2, but with data actively being written > to the array during the reshape (the previous tests were on an idle, > unmounted array). 
This one failed pretty hard, with several processes > ending up in the D state. I repeated it twice and sysrq-t dumps can be > found at <http://www.kroptech.com/~adk0212/md-raid5-reshape-wedge.txt>. > The writeout load was a kernel tree untar started shortly before the > 'mdadm --grow' command was given. mdadm hung, as did tar. Any process > which subsequently attempted to access the array hung as well. A second > attempt at the same thing hung similarly, although only pdflush shows up > hung in that trace. mdadm and tar are missing for some reason. Hmmm... I tried similar things but didn't get this deadlock. Somehow the fact that mdadm is holding the reconfig_sem semaphore means that some IO cannot proceed and so mdadm cannot grab and resize all the stripe heads... I'll have to look more deeply into this. > > I'm happy to do more tests. It's easy to conjure up virtual disks and > load them with irrelevant data (like kernel trees ;) Great. I'll probably be putting out a new patch set late this week or early next. Hopefully it will fix the issues you found and you can try it again. Thanks again, NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-22 22:52 ` Neil Brown @ 2006-01-23 23:02 ` Adam Kropelin 0 siblings, 0 replies; 72+ messages in thread From: Adam Kropelin @ 2006-01-23 23:02 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson Neil Brown wrote: > On Saturday January 21, akropel1@rochester.rr.com wrote: >> On the first try I neglected to read the directions and increased the >> number of devices first (which worked) and then attempted to add the >> physical device (which didn't work; at least not the way I intended). > > Thanks, this is exactly the sort of feedback I was hoping for - people > testing things that I didn't think to... > >> mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc >> >> md0 : active raid5 sda[0] sdc[2] sdb[1] >> 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] >> >> mdadm --grow -n4 /dev/md0 >> >> md0 : active raid5 sda[0] sdc[2] sdb[1] >> 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] > > I assume that no "resync" started at this point? It should have done. Actually, it did start a resync. Sorry, I should have mentioned that. I waited until the resync completed before I issued the 'mdadm --add' command. >> md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] >> 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] >> ...should this be... --> [4/3] >> [UUU_] perhaps? > > Well, part of the array is "4/4 UUUU" and part is "3/3 UUU". How do > you represent that? I think "4/4 UUUU" is best. I see your point. I was expecting some indication that my array was vulnerable and that the new disk was not fully utilized yet. I guess the resync in progress indicator is sufficient. >> My final test was a repeat of #2, but with data actively being >> written >> to the array during the reshape (the previous tests were on an idle, >> unmounted array). This one failed pretty hard, with several processes >> ending up in the D state. > > Hmmm... I tried similar things but didn't get this deadlock. Somehow > the fact that mdadm is holding the reconfig_sem semaphore means that > some IO cannot proceed and so mdadm cannot grab and resize all the > stripe heads... I'll have to look more deeply into this. For what it's worth, I'm using the Buslogic SCSI driver for the disks in the array. >> I'm happy to do more tests. It's easy to conjure up virtual disks and >> load them with irrelevant data (like kernel trees ;) > > Great. I'll probably be putting out a new patch set late this week > or early next. Hopefully it will fix the issues you found and you > can try it again. Looking forward to it... --Adam ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-17 6:56 NeilBrown ` (2 preceding siblings ...) 2006-01-22 4:42 ` Adam Kropelin @ 2006-01-23 1:08 ` John Hendrikx 2006-01-23 1:25 ` Neil Brown 3 siblings, 1 reply; 72+ messages in thread From: John Hendrikx @ 2006-01-23 1:08 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson NeilBrown wrote: > In line with the principle of "release early", following are 5 patches > against md in 2.6.latest which implement reshaping of a raid5 array. > By this I mean adding 1 or more drives to the array and then re-laying > out all of the data. > I think my question is already answered by this, but... Would this also allow changing the size of each raid device? Let's say I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I only allowed to add more 160 GB devices? ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 1:08 ` John Hendrikx @ 2006-01-23 1:25 ` Neil Brown 2006-01-23 1:54 ` Kyle Moffett 2006-01-23 2:09 ` Mr. James W. Laferriere 0 siblings, 2 replies; 72+ messages in thread From: Neil Brown @ 2006-01-23 1:25 UTC (permalink / raw) To: John Hendrikx; +Cc: linux-raid, linux-kernel, Steinar H. Gunderson On Monday January 23, hjohn@xs4all.nl wrote: > NeilBrown wrote: > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > > > I think my question is already answered by this, but... > > Would this also allow changing the size of each raid device? Let's say > I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I > only allowed to add more 160 GB devices? Changing the size of the devices is a separate operation that has been supported for a while. For each device in turn, you fail it and replace it with a larger device. (This means the array runs degraded for a while, which isn't ideal and might be fixed one day). Once all the devices in the array are of the desired size, you run mdadm --grow /dev/mdX --size=max and the array (raid1, raid5, raid6) will use up all available space on the devices, and a resync will start to make sure that extra space is in-sync. NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
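Spelled out as commands, the device-size upgrade Neil describes amounts to something like the sketch below; device names are illustrative, and each resync must finish before the next disk is swapped, since the array runs degraded during each swap:

    # Repeat for every member in turn; /dev/sdX is the member being replaced.
    mdadm /dev/md0 --fail /dev/sdX --remove /dev/sdX
    # ...physically replace the disk with a larger one (same device name assumed)...
    mdadm /dev/md0 --add /dev/sdX
    cat /proc/mdstat                  # wait for the resync to complete

    # Once every member is larger, claim the extra space:
    mdadm --grow /dev/md0 --size=max  # a resync of the new space follows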
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 1:25 ` Neil Brown @ 2006-01-23 1:54 ` Kyle Moffett 2006-01-23 2:09 ` Mr. James W. Laferriere 1 sibling, 0 replies; 72+ messages in thread From: Kyle Moffett @ 2006-01-23 1:54 UTC (permalink / raw) To: Neil Brown; +Cc: John Hendrikx, linux-raid, linux-kernel, Steinar H. Gunderson On Jan 22, 2006, at 20:25, Neil Brown wrote: > Changing the size of the devices is a separate operation that has > been supported for a while. For each device in turn, you fail it > and replace it with a larger device. (This means the array runs > degraded for a while, which isn't ideal and might be fixed one day). > > Once all the devices in the array are of the desired size, you run > mdadm --grow /dev/mdX --size=max > and the array (raid1, raid5, raid6) will use up all available space > on the devices, and a resync will start to make sure that extra > space is in-sync. One option I can think of that would make it much safer would be to originally set up your RAID like this:

                 md3 (RAID-5)
      __________/      |      \__________
     /                 |                 \
  md0 (RAID-1)    md1 (RAID-1)    md2 (RAID-1)

Each of md0-2 would only have a single drive, and therefore provide no redundancy. When you wanted to grow the RAID-5, you would first add a new larger disk to each of md0-md2 and trigger each resync. Once that is complete, remove the old drives from md0-2 and run: mdadm --grow /dev/md0 --size=max mdadm --grow /dev/md1 --size=max mdadm --grow /dev/md2 --size=max Then once all that has completed, run: mdadm --grow /dev/md3 --size=max This will enlarge the top-level array. If you have LVM on the top-level, you can allocate new LVs, resize existing ones, etc. With the newly added code, you could also add new drives dynamically by creating a /dev/md4 out of the single drive, and adding that as a new member of /dev/md3. Cheers, Kyle Moffett -- I lost interest in "blade servers" when I found they didn't throw knives at people who weren't supposed to be in your machine room. -- Anthony de Boer ^ permalink raw reply [flat|nested] 72+ messages in thread
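As an illustration of Kyle's layout (device names are assumptions, and the single-drive RAID-1s are built here as degraded two-member arrays with a 'missing' slot, which is one common way to create them), the initial construction and a later disk upgrade might look roughly like:

    # Three one-disk RAID-1s, with a RAID-5 layered on top.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda missing
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb missing
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc missing
    mdadm --create /dev/md3 --level=5 --raid-devices=3 /dev/md0 /dev/md1 /dev/md2

    # Later, to move md0 onto a larger disk without ever degrading md3:
    mdadm /dev/md0 --add /dev/sdd                      # RAID-1 resyncs onto the new disk
    mdadm /dev/md0 --fail /dev/sda --remove /dev/sda   # drop the old, smaller disk
    mdadm --grow /dev/md0 --size=max
    # ...repeat for md1 and md2, then grow the top level:
    mdadm --grow /dev/md3 --size=max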
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 1:25 ` Neil Brown 2006-01-23 1:54 ` Kyle Moffett @ 2006-01-23 2:09 ` Mr. James W. Laferriere 2006-01-23 2:33 ` Neil Brown 1 sibling, 1 reply; 72+ messages in thread From: Mr. James W. Laferriere @ 2006-01-23 2:09 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid maillist Hello Neil , On Mon, 23 Jan 2006, Neil Brown wrote: > On Monday January 23, hjohn@xs4all.nl wrote: >> NeilBrown wrote: >>> In line with the principle of "release early", following are 5 patches >>> against md in 2.6.latest which implement reshaping of a raid5 array. >>> By this I mean adding 1 or more drives to the array and then re-laying >>> out all of the data. >>> >> I think my question is already answered by this, but... >> >> Would this also allow changing the size of each raid device? Let's say >> I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I >> only allowed to add more 160 GB devices? > > Changing the size of the devices is a separate operation that has been > supported for a while. > For each device in turn, you fail it and replace it with a larger > device. (This means the array runs degraded for a while, which isn't > ideal and might be fixed one day). > > Once all the devices in the array are of the desired size, you run > mdadm --grow /dev/mdX --size=max > and the array (raid1, raid5, raid6) will use up all available space on > the devices, and a resync will start to make sure that extra space is > in-sync. How does one come up with a accurate '--size=max' ? I thought someone had asked this question before , but the message where this was mentioned eluded me . Tia , JimL -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | 3542 Broken Yoke Dr. | Give me Linux | | babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP | | http://www.asteriskhelpdesk.com/cgi-bin/astlance/r.cgi?babydr | +------------------------------------------------------------------+ ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH 000 of 5] md: Introduction 2006-01-23 2:09 ` Mr. James W. Laferriere @ 2006-01-23 2:33 ` Neil Brown 0 siblings, 0 replies; 72+ messages in thread From: Neil Brown @ 2006-01-23 2:33 UTC (permalink / raw) To: Mr. James W. Laferriere; +Cc: linux-raid maillist On Sunday January 22, babydr@baby-dragons.com wrote: > Hello Neil , > > On Mon, 23 Jan 2006, Neil Brown wrote: > > On Monday January 23, hjohn@xs4all.nl wrote: > >> NeilBrown wrote: > >>> In line with the principle of "release early", following are 5 patches > >>> against md in 2.6.latest which implement reshaping of a raid5 array. > >>> By this I mean adding 1 or more drives to the array and then re-laying > >>> out all of the data. > >>> > >> I think my question is already answered by this, but... > >> > >> Would this also allow changing the size of each raid device? Let's say > >> I currently have 160 GB x 6, could I change that to 300 GB x 6 or am I > >> only allowed to add more 160 GB devices? > > > > Changing the size of the devices is a separate operation that has been > > supported for a while. > > For each device in turn, you fail it and replace it with a larger > > device. (This means the array runs degraded for a while, which isn't > > ideal and might be fixed one day). > > > > Once all the devices in the array are of the desired size, you run > > mdadm --grow /dev/mdX --size=max > > and the array (raid1, raid5, raid6) will use up all available space on > > the devices, and a resync will start to make sure that extra space is > > in-sync. > How does one come up with a accurate '--size=max' ? > I thought someone had asked this question before , but the > message where this was mentioned eluded me . > Tia , JimL --size=max is literal. If you say 'max', mdadm will either figure out the maximum, or tell the kernel to (I don't remember which). NeilBrown ^ permalink raw reply [flat|nested] 72+ messages in thread
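One way to see what 'max' actually resolved to after the grow, sketched with standard mdadm query commands (output field names vary a little between mdadm versions):

    mdadm --detail /dev/md0   # reports the array size and per-device size after the grow
    cat /proc/mdstat          # shows the resync of the newly added space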
Thread overview: 72+ messages:
2006-01-17 21:38 [PATCH 000 of 5] md: Introduction Lincoln Dale (ltd)
2006-01-18 13:27 ` Jan Engelhardt
2006-01-18 23:19 ` Neil Brown
2006-01-19 15:33 ` Mark Hahn
2006-01-19 20:12 ` Jan Engelhardt
2006-01-19 21:22 ` Lars Marowsky-Bree
2006-01-19 22:17 ` Phillip Susi
2006-01-19 22:32 ` Neil Brown
2006-01-19 23:26 ` Phillip Susi
2006-01-19 23:43 ` Neil Brown
2006-01-20 2:17 ` Phillip Susi
2006-01-20 10:53 ` Lars Marowsky-Bree
2006-01-20 12:06 ` Jens Axboe
2006-01-20 18:38 ` Heinz Mauelshagen
2006-01-20 22:09 ` Lars Marowsky-Bree
2006-01-21 0:06 ` Heinz Mauelshagen
2006-01-20 18:41 ` Heinz Mauelshagen
2006-01-20 17:29 ` Ross Vandegrift
2006-01-20 18:36 ` Heinz Mauelshagen
2006-01-20 22:57 ` Lars Marowsky-Bree
2006-01-21 0:01 ` Heinz Mauelshagen
2006-01-21 0:03 ` Lars Marowsky-Bree
2006-01-21 0:08 ` Heinz Mauelshagen
2006-01-21 0:13 ` Lars Marowsky-Bree
2006-01-23 9:44 ` Heinz Mauelshagen
2006-01-23 10:26 ` Lars Marowsky-Bree
2006-01-23 10:38 ` Heinz Mauelshagen
2006-01-23 10:45 ` Lars Marowsky-Bree
2006-01-23 11:00 ` Heinz Mauelshagen
2006-01-23 12:54 ` Ville Herva
2006-01-23 13:00 ` Steinar H. Gunderson
2006-01-23 13:54 ` Heinz Mauelshagen
2006-01-23 17:33 ` Ville Herva
2006-01-24 2:02 ` Phillip Susi
2006-01-20 7:51 ` Reuben Farrelly
2006-01-20 3:43 ` Andre' Breiler
2006-01-21 0:42 ` David Greaves
-- strict thread matches above, loose matches on Subject: below --
2006-01-17 6:56 NeilBrown
2006-01-17 8:17 ` Michael Tokarev
[not found] ` <fd8d0180601170121s1e6a55b7o@mail.gmail.com>
2006-01-17 9:38 ` Francois Barre
2006-01-19 0:35 ` Neil Brown
2006-01-17 9:50 ` Sander
2006-01-17 11:26 ` Michael Tokarev
2006-01-17 11:37 ` Francois Barre
2006-01-17 14:03 ` Kyle Moffett
2006-01-19 0:28 ` Neil Brown
2006-01-17 16:08 ` Ross Vandegrift
2006-01-17 18:12 ` Michael Tokarev
2006-01-18 8:14 ` Sander
2006-01-18 8:37 ` Brad Campbell
2006-01-18 9:03 ` Alan Cox
2006-01-18 12:46 ` John Hendrikx
2006-01-18 12:51 ` Gordon Henderson
2006-01-18 23:51 ` Neil Brown
2006-01-19 7:20 ` PFC
2006-01-19 8:01 ` dean gaudet
2006-01-18 23:54 ` Neil Brown
2006-01-19 0:22 ` Neil Brown
2006-01-19 9:01 ` Jakob Oestergaard
2006-01-17 22:38 ` Phillip Susi
2006-01-17 22:57 ` Neil Brown
2006-01-17 14:10 ` Steinar H. Gunderson
2006-01-17 15:07 ` Mr. James W. Laferriere
2006-01-19 0:23 ` Neil Brown
2006-01-22 4:42 ` Adam Kropelin
2006-01-22 22:52 ` Neil Brown
2006-01-23 23:02 ` Adam Kropelin
2006-01-23 1:08 ` John Hendrikx
2006-01-23 1:25 ` Neil Brown
2006-01-23 1:54 ` Kyle Moffett
2006-01-23 2:09 ` Mr. James W. Laferriere
2006-01-23 2:33 ` Neil Brown